Paging in a REST Collection

Accepted answer
Score: 33

I don't really agree with some of you guys. I've been working for weeks on this feature for my REST service. What I ended up doing is really simple. My solution only makes sense for what REST people call a collection.

The client MUST include a "Range" header to indicate which part of the collection it needs, or otherwise be ready to handle a 413 REQUEST ENTITY TOO LARGE error when the requested collection is too large to be retrieved in a single round-trip.

The server sends a 206 PARTIAL CONTENT response, with the Content-Range header specifying which part of the resource has been sent, and an ETag header to identify the current version of the collection. I usually use a Facebook-like ETag, {last_modification_timestamp}-{resource_id}, and I consider the ETag of a collection to be that of the most recently modified resource it contains.

To request a specific part of a collection, the client MUST use the "Range" header, and fill the "If-Match" header with the ETag of the collection obtained from previous requests for other parts of the same collection. The server can therefore verify that the collection hasn't changed before sending the requested portion. If a more recent version exists, a 412 PRECONDITION FAILED response is returned to invite the client to retrieve the collection from scratch. This is necessary because a newer version could mean that some resources have been added or removed before or after the currently requested part.
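
A minimal sketch of the exchange described above, not the author's actual implementation. It assumes a hypothetical range unit named `items`, an in-memory list of records with `id` and `modified` fields, and a hypothetical cap `MAX_ITEMS`. The 416 response for a malformed or out-of-bounds range is also an assumption; the answer only names 206, 412 and 413.

```python
import re

MAX_ITEMS = 100  # hypothetical limit on what one round-trip may carry


def collection_etag(resources):
    """ETag of the collection: that of its most recently modified resource."""
    newest = max(resources, key=lambda r: r["modified"], default=None)
    if newest is None:
        return '"empty"'
    return f'"{newest["modified"]}-{newest["id"]}"'


def handle_get(resources, headers):
    """Return (status, response_headers, body) for a GET on the collection."""
    etag = collection_etag(resources)
    total = len(resources)

    range_header = headers.get("Range")
    if range_header is None:
        # No Range header: serve everything only if the collection is small.
        if total > MAX_ITEMS:
            return 413, {"Accept-Ranges": "items"}, []
        return 200, {"ETag": etag}, list(resources)

    match = re.fullmatch(r"items=(\d+)-(\d+)", range_header.strip())
    if match is None:
        return 416, {"Content-Range": f"items */{total}"}, []
    first, last = int(match.group(1)), int(match.group(2))
    if first > last or first >= total:
        return 416, {"Content-Range": f"items */{total}"}, []

    # If the client pinned a version and the collection has changed since,
    # ask it to start over.
    if_match = headers.get("If-Match")
    if if_match is not None and if_match != etag:
        return 412, {"ETag": etag}, []

    last = min(last, total - 1)
    response_headers = {
        "ETag": etag,
        "Content-Range": f"items {first}-{last}/{total}",
    }
    return 206, response_headers, resources[first:last + 1]
```

A client would send `Range: items=0-9` first, keep the returned ETag, and pass it back as `If-Match` when asking for `items=10-19`.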

I use ETag/If-Match in tandem with Last-Modified/If-Unmodified-Since to optimize caching. Browsers and proxies might rely on one or both of them for their caching algorithms.

I think that a URL should be clean unless it's to include a search/filter query. If you think about it, a search is nothing more than a partial view of a collection. Instead of the cars/search?q=BMW type of URLs, we should see more cars?manufacturer=BMW.

Score: 24

My gut feeling is that the HTTP range extensions aren't designed for your use case, and thus you shouldn't try. A partial response implies 206, and 206 must only be sent if the client asked for it.

You may want to consider a different approach, such as the one used in Atom (where the representation by design may be partial, and is returned with a status 200, and potentially paging links). See RFC 4287 and RFC 5005.

Score: 8

You can still return Accept-Ranges and Content-Range with a 200 response code. These two response headers give you enough information to infer what a 206 response code provides explicitly.

I would use Range for pagination, and have it simply return a 200 for a plain GET.

This feels 100% RESTful and doesn't make browsing any more difficult.

Edit: I wrote a blog post about this: http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html

Score: 5

If there is more than one page of responses, and you don't want to offer the whole collection at once, does that mean there are multiple choices?

On a request to /db/questions, return 300 Multiple Choices with Link headers that specify how to get to each page as well as a JSON object or HTML page with a list of URLs.

Link: <>; rel="http://paged.collection.example/relation/paged"
Link: <>; rel="http://paged.collection.example/relation/paged"

You'd have one Link header for each page of results (an empty string means the current URL, and the URL is the same for each page, just accessed with different ranges), and the relationship is defined as a custom one per the upcoming Link spec. This relationship would explain your custom 266, or your violation of 206. These headers are your machine-readable version, since all of your examples require an understanding client anyway.

(If you stick with the "range" route, I believe your own 2xx return code, as you described it, would be the best behavior here. You're expected to do this for your applications and such ["HTTP status codes are extensible."], and you have good reasons.)

300 Multiple Choices says you SHOULD also provide a body with a way for the user agent to pick. If your client is understanding, it should use the Link headers. If it's a user manually browsing, perhaps an HTML page with links to a special "paged" root resource that can handle rendering that particular page based on the URL? /humanpage/1/db/questions or something hideous like that?

The comments on Richard Levasseur's post remind me of an additional option: the Accept header (RFC 2616, section 14.1). Back when the oEmbed spec came out, I wondered why it hadn't been done entirely using HTTP, and wrote up an alternative using them.

Keep the 300 Multiple Choices, the Link headers and the HTML page for an initial naive HTTP GET, but rather than use ranges, have your new paging relationship define the use of the Accept header. Your subsequent HTTP request might look like this:

GET /db/questions HTTP/1.1
Host: paged.collection.example
Accept: application/json;PagingSpec=1.0;page=1

The Accept header allows you to define an acceptable content type (your JSON return), plus extensible parameters for that type (your page number). Riffing on my notes from my oEmbed writeup (can't link to it here, I'll list it in my profile), you could be very explicit and provide a spec/relation version here in case you need to redefine what the page parameter means in the future.
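
On the server side, reading those parameters could look roughly like the sketch below. It is only an illustration: the function names are made up, it handles a single media range (no comma-separated lists or q-values), and it falls back to page 1 whenever the type is not application/json or the page value is not an integer.

```python
def parse_accept_item(value):
    """Split one Accept media range into (media_type, params)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, val = part.partition("=")
            # Parameter names are case-insensitive; strip optional quotes.
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def requested_page(accept_header, default=1):
    """Return the page number carried in the Accept header, or `default`."""
    media_type, params = parse_accept_item(accept_header)
    if media_type != "application/json":
        return default
    try:
        return int(params.get("page", default))
    except ValueError:
        return default
```

For the request shown above, `requested_page("application/json;PagingSpec=1.0;page=1")` yields page 1, and the `pagingspec` parameter stays available for version checks.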

Score: 4


After thinking about it a bit more, I'm inclined to agree that Range headers aren't appropriate for pagination. The logic being, the Range header is intended for the server's response, not the application's. If you served 100 megabytes of results, but the server (or client) could only process 1 megabyte at a time, well, that's what the Range header is for.

I'm also of the opinion that a subset of resources is its own resource (similar to relational algebra), so it deserves representation in the URL.

So basically, I recant my original answer (below) about using a header.

I think you answered your own question, more or less - return 200 or 206 with Content-Range and optionally use a query parameter. I would sniff the user agent and content type and, depending on those, check for a query parameter. Otherwise, require the range headers.

You essentially have conflicting goals - let people use their browser to explore (which doesn't easily allow custom headers), or force people to use a special client that can set headers (which doesn't let them explore).

You could just provide them with the special client depending on the request - if it looks like a plain browser, send down a small ajax app that renders the page and sets the necessary headers.

Of course, there is also the debate about whether the URL should contain all the necessary state for this sort of thing. Specifying the range using headers can be considered "un-restful" by some.

As an aside, it would be nice if servers could respond with a "Can-Specify: Header1, header2" header, and web browsers would present a UI so users could fill in values, if they desired.

Score: 3

You might consider using a model something like the Atom Feed Protocol since it has a sane HTTP model of collections and how to manipulate them (where insane means WebDAV).

There's the Atom Publishing Protocol, which defines the collection model and REST operations, plus you can use RFC 5005 - Feed Paging and Archiving to page through big collections.

Switching from Atom XML to JSON content should not affect the idea.

Score: 3

I think the real problem here is that there is nothing in the spec that tells us how to do automatic redirects when faced with 413 - Requested Entity Too Large.

I was struggling with this same problem recently and I looked for inspiration in the RESTful Web Services book. Personally I don't think 206 is appropriate due to the header requirement. My thoughts also led me to 300, but I thought that was more for different MIME types, so I looked up what Richardson and Ruby had to say on the subject in Appendix B, page 377. They suggest that the server just pick the preferred representation and send it back with a 200, basically ignoring the notion that it should be a 300.

That also jibes with the notion of links to next resources that we have from Atom. The solution I implemented was to add "next" and "previous" keys to the JSON map I was sending back and be done with it.
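
A rough sketch of that kind of response body, assuming hypothetical `offset` and `limit` query parameters and an `items` key (the answer only specifies the "next" and "previous" keys):

```python
def page_document(items, offset, limit, base_url="/db/questions"):
    """Build a JSON-ready dict holding one page plus its paging links."""
    doc = {"items": items[offset:offset + limit]}
    if offset + limit < len(items):
        # More items follow this page.
        doc["next"] = f"{base_url}?offset={offset + limit}&limit={limit}"
    if offset > 0:
        # Clamp at 0 so the first page is always reachable.
        previous_offset = max(offset - limit, 0)
        doc["previous"] = f"{base_url}?offset={previous_offset}&limit={limit}"
    return doc
```

The result can be passed straight to `json.dumps` and returned with a plain 200; a client just follows "next" until the key is absent.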

Later on I started thinking maybe the thing to do is send a 307 - Temporary Redirect to a link that would be something like /db/questions/1,25 - that leaves the original URI as the canonical resource name, but it gets you to an appropriately named subordinate resource. This is behavior I'd like to see out of a 413, but 307 seems a good compromise. Haven't actually tried this in code yet though. What would be even better is for the redirect to go to a URL containing the actual IDs of the most recently asked questions. For example, if each question has an integer ID, there are 100 questions in the system, and you want to show the ten most recent, requests to /db/questions should be 307'd to /db/questions/100,91.
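
The arithmetic behind that redirect target is small enough to sketch. The function name and the `page_size` parameter are hypothetical, and IDs are assumed to be contiguous integers starting at 1:

```python
def redirect_for_latest(latest_id, page_size=10, base="/db/questions"):
    """Return (status, headers) redirecting to the newest `page_size` IDs."""
    # Never go below ID 1 when fewer than `page_size` questions exist.
    oldest_id = max(latest_id - page_size + 1, 1)
    return 307, {"Location": f"{base}/{latest_id},{oldest_id}"}
```

With 100 questions and a page size of ten this produces the /db/questions/100,91 location from the example.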

This is a very good question, thanks for asking it. You confirmed for me that I'm not nuts for having spent days thinking about it.

Score: 2

With the publication of RFC 723x, unregistered range units do go against an explicit recommendation in the spec. Consider RFC 7233 (which obsoletes RFC 2616):

"New range units ought to be registered with IANA" (along with a reference to an HTTP Range Unit Registry).

Score: 2

One of the big problems with range headers is that a lot of corporate proxies filter them out. I'd advise using a query parameter instead.

Score: 1

You can detect the Range header, and mimic Dojo if it is present, and mimic Atom if it is not. It seems to me that this neatly divides the use cases. If you are responding to a REST query from your application, you expect it to be formatted with a Range header. If you are responding to a casual browser, then if you return paging links it will let the tool provide an easy way to explore the collection.

Score: 0

Seems to me that the best way to do this is to include the range as query parameters, e.g., GET /db/questions/?date>mindate&date<maxdate. Upon a GET to /db/questions/ with no query parameters, return 303 with Location: /db/questions/?query-parameters-to-retrieve-the-default-page. Then provide a different URL where whoever is consuming your API can get statistics about the collection (e.g., what query parameters to use if s/he wants the entire collection).
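
One way the 303 step might look, as a sketch only. It swaps the `date>mindate` syntax above for ordinary `mindate` and `maxdate` query parameters, which are an assumption, as is the seven-day default window:

```python
from datetime import date, timedelta
from urllib.parse import parse_qs, urlencode, urlsplit


def handle_collection_get(url, today, window_days=7):
    """Redirect a bare GET to the default page; serve anything with a query."""
    query = parse_qs(urlsplit(url).query)
    if not query:
        default = {
            "mindate": (today - timedelta(days=window_days)).isoformat(),
            "maxdate": today.isoformat(),
        }
        return 303, {"Location": f"/db/questions/?{urlencode(default)}"}
    # A real handler would filter the collection here using `query`.
    return 200, {}
```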

Score: 0

While it's possible to use the Range header for this purpose, I don't think that was the intent. It seems to have been designed for handling flaky connections as well as limiting the data (so the client can request part of the response if something was missing or the size was too large to process). You are hacking pagination into something that is likely used for other purposes at the communication layer. The "proper" way to handle pagination is with the types you return. Rather than returning a questions object, you should be returning a new type instead.

So if questions is like this:

<questions>
  <question index="1"></question>
  <question index="2"></question>
  ...
</questions>

The new type could be something like this:

<questionPage>
  <startIndex>50</startIndex>
  <returnedCount>10</returnedCount>
  <totalCount>1203</totalCount>
  <questions>
    <question index="50"></question>
    <question index="51"></question>
    ...
  </questions>
</questionPage>

Of course you control your media types, so you can make your "pages" a format that suits your needs. If you make it something generic, you can have a single parser on the client to handle paging the same for all types. I think that is more in the spirit of the HTTP specification, rather than fudging the Range parameter for something else.
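
As an illustration of the generic variant, here is a sketch that builds such a page envelope with Python's standard `xml.etree.ElementTree`. The helper name and its parameters are hypothetical; the element names follow the questionPage example above.

```python
import xml.etree.ElementTree as ET


def build_page(container_tag, item_tag, items, start_index, total_count):
    """Wrap `items` in a <{item_tag}Page> envelope with paging metadata."""
    page = ET.Element(f"{item_tag}Page")
    ET.SubElement(page, "startIndex").text = str(start_index)
    ET.SubElement(page, "returnedCount").text = str(len(items))
    ET.SubElement(page, "totalCount").text = str(total_count)
    container = ET.SubElement(page, container_tag)
    for offset, _item in enumerate(items):
        # Item content is left empty here; only the index is filled in.
        ET.SubElement(container, item_tag, index=str(start_index + offset))
    return page
```

Because the metadata elements are the same for every item type, a client can read `startIndex`, `returnedCount` and `totalCount` with one parser and hand the inner container to type-specific code. `ET.tostring(page, encoding="unicode")` serializes the result.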
