The current advice implicitly in this doc and explicitly in the <a href="https://devce

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

I imagine with some data (as pseudo-CSV) <div class="snippet-clipboard-content not

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Paginating/ranging over non-unique fields about http-api-design HOT 15 OPEN

interagent commented on September 23, 2024

Paginating/ranging over non-unique fields

from http-api-design.

Comments (15)

mathias commented on September 23, 2024

The simple solution I have now is that the query to sort the results is always something like WHERE conditions ORDER BY deploys DESC id ASC where deploys is some non-unique count. The ORDER BY would be part of a default scope in my models, and my equivalent of API's Ranger class would handle the pagination inside that sorted list.

For the small-ish number of records that we might need to serve (1-10,000 likely, or no more than 10K for awhile) in the service I'm proposing, sorting like this in each query without saving an index should be ok, but that probably can be improved in the future.

from http-api-design.

geemus commented on September 23, 2024

@mathias I had originally started pondering down a very similar path. Unfortunately although a convention based approach helps in terms of consistent order, it doesn't help in terms of boundaries. ie if the main key has several, sorted by id, when you ask for the next page, you would could either end up skipping the differing id ones (or duplicating). In the pathological case, there could even be a case where an entire range (or more) of records shared the same primary sorting key.

No particular suggestion just yet, need to ponder/research a bit more. Will try to work on something soon.

from http-api-design.

mathias commented on September 23, 2024

@geemus Doesn't using the id of the record (which should be unique) as the second ORDER BY argument prevent those cases you mentioned?

from http-api-design.

mathias commented on September 23, 2024

I imagine with some data (as pseudo-CSV)

id,deploys,another_attr
1,123,foo
2,4,bar
3,123,baz

When we want to sort by deploys descending, it'd really sort like this:

deploys/id
123/1
123/3
4/2

And if we sorted by another_attr ascending:

another_attr/id
bar/2
baz/3
foo/1

Essentially the unique ID is being used to break ties for the sort order.

from http-api-design.

geemus commented on September 23, 2024

Correct. The issue is, how do you ask for the second range if the last value of the first range and first value of the second range have the same primary sort column value? So, for instance:

1/1
2/1
---
2/2
3/1

Let's say this is the data, and we are using ranges with 2 items each. The first range would be 1..2 and presumably the next-range would then be 2..1, but this would include the 2/1 again, since it matches with that specification also. Does that help clarify my meaning?

from http-api-design.

mathias commented on September 23, 2024

@geemus The right side would have to be a unique value for each of those. You couldn't have a list like

1/1
2/1
---
2/2
3/1

Because the 1 on the right is reused. "Inside" any value on the left side, the right side would also be sorted.

A better version of the above data would hopefully look like:

1/1
2/4
---
2/2
3/3

where the values on the right side are the database ID (or UUID) -- basically, where they're unique.

However, this leads to a cumbersome thing from the client side: We'll have to know that internally that's what pagination needs, and send both values to paginate:

Range: deploys 1..2; id 1..4; max=2

Passing order=desc doesn't make sense, unless it is guaranteed to apply to the first attr sorted on and the second attr is always assumed to be ASC (for simplicity.)

from http-api-design.

geemus commented on September 23, 2024

Well, my example was bad, but yeah, you would have to pass both keys (at least in some cases) to get the next range. And there are already semantics around asking for multiple ranges in the RFCs that are NOT at all like this, which seems troublesome. In the RFCs it is more like bytes 1-10, 20-30, so the same range unit, but more than one range at once (which might or might not be contiguous). Doesn't mean we can't do something like your suggestion, but might mean that doing it as a doubled range could be problematic.

I was pondering using something like marker from S3 in cases like this, ie you would pass an extra param that would always contain the last-seen uuid or something. Similar really to your suggestion, but just putting the arg elsewhere, ie Range: deploys 1..2; marker=4, max=2; or something like that.

from http-api-design.

mathias commented on September 23, 2024

You're right, and I suspect that the fact that the Range header is intended for ordered and unique values (bytes in a file), it won't work here without expanding what we allow passed in / expanding the definition to some degree.

Adding in the marker is an interesting twist, and it means the client doesn't have to be quite so clever about the values it sends it, it can simply use the Next-Range headers from the last response as is intended with this design.

Sounds like S3 uses Marker for something slightly different:

Specifies the key to start with when listing objects in a bucket. Amazon S3 returns object keys in alphabetical order, starting with key after the marker in order.

Type: String

Default: None

They also have a NextMarker value which corresponds to our Next-Range to some degree.

Would the response headers look like this?

Content-Range: id 5..10, marker=3
Next-Range: ]10..; marker=4, max=2

from http-api-design.

mathias commented on September 23, 2024

Follow-up to that: I guess the Next-Range shouldn't use ]10..; there unless we're sure that there's no 10 on the next page, so rather than check, we'd want it to look more like this:

Content-Range: id 5..10, marker=3
Next-Range: 10..; marker=]3, max=2

Edit: Gah, can't seem to get this right. We wouldn't include a range in marker in Next-Range either, right? It'd be assumed that the range started from 3 on above?

from http-api-design.

geemus commented on September 23, 2024

Yeah, I think whatever we end up with, it should be such that client can just pass Next-Range. Since the markers in S3 give you back things AFTER the marker, but not including the marker, already (if I'm reading that right), I think the marker values in the examples could just be 3 and not need the exclusivity marker, though perhaps it would be good to include it to be explicit, I think it would probably always be exclusive. I don't know, this seems like a pretty weird conglomeration of barely associated things, which worries me. Can't say I've figured out a good alternative just yet though.

from http-api-design.

bjeanes commented on September 23, 2024

Marker seems like a close precedent, though it is used to set a boundary on the primary sort order by S3. Nonetheless, worth thinking about applying it here! I appreciate you dwelling on this for us @geemus!

from http-api-design.

geemus commented on September 23, 2024

Still working on this, but ran out of time as I leave for vacation. A couple thoughts (though I'm midstream in research/pondering).

Many of the other implementations of pagination via range do it with natural numbers, more like byte ranges. I imagine this often points to incrementing primary keys, but it could also just be used as a means of pointing to offsets/limits. ie something like releases 1-100 could indicate the first hundred in some ordering, which might actually be encoded later on (or default to primary key/uuid).

To make it more generic it could be something like resources 1-100 instead of being resource-specific. If we did it this way (with dashes instead of double dots), it could probably be done in a backwards compatible way. This undermines the level of meaning of a range and makes them somewhat less stable, but maybe that doesn't really matter.

Anyway, I have a ton of tabs open I'm still working through to try and get something solidified, but I'm also just out of time (it's already a couple hours past my EOD, but wanted to share where I was for reference and so that I can hopefully more easily pick the thread back up upon my return).

from http-api-design.

geemus commented on September 23, 2024

See also a detailed discussion around a possible way to work around this (and the many tradeoffs/issues): interagent/pliny#234

from http-api-design.

geemus commented on September 23, 2024

Here is a proposal I wrote a little while back that would be much closer to the original Range stuff: https://gist.github.com/geemus/8dfc01bd99367346c25891c915327220

from http-api-design.

gudmundur commented on September 23, 2024

I know there were some performance concerns around using the object offset for paginating. But I do wonder how large the dataset has to be for that to become an actual issue.

On interagent/pliny#234 I proposed a design using the unique identifiers with row value expressions. The only question remaining was what separator to use for the ranges.

It seems to me that the performance of pagination is highly dependent on the underlying data store, so I'm not sure we can have a one size fit all.

from http-api-design.

Paginating/ranging over non-unique fields about http-api-design HOT 15 OPEN

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent