Comments (12)

gomme600 commented on May 31, 2024

I have also seen this on Android

from photoprism-mobile.

thielepaul commented on May 31, 2024

The reason for this is that the backend currently has an issue here:

  • the app requests the first 100 photos via the API
  • the backend may return an array of photos with a size less than 100 if there are multiple files for one photo (e.g. if it is a live photo or a video)

I already created a pull request to fix this in the backend.
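The mismatch can be reproduced with a toy model. The sketch below is not PhotoPrism's implementation; the row layout and the `search` helper are assumptions made purely for illustration. It takes a window of `count` file rows and merges consecutive rows that share a photo ID, so the result is shorter than `count` whenever a photo in the window has more than one file.

```python
from itertools import groupby

# Hypothetical file rows: (photo_id, file_name). A live photo or a video
# contributes more than one file to the same photo.
FILES = [
    ("p1", "p1.jpg"),
    ("p2", "p2.jpg"), ("p2", "p2.mov"),
    ("p3", "p3.jpg"),
    ("p4", "p4.jpg"), ("p4", "p4.mp4"),
]


def search(files, offset, count):
    """Return merged results for the file rows in [offset, offset + count)."""
    window = files[offset:offset + count]
    return [
        {"photo_id": photo_id, "files": [name for _, name in rows]}
        for photo_id, rows in groupby(window, key=lambda row: row[0])
    ]


results = search(FILES, offset=0, count=4)
print(len(results))  # 3 results for 4 requested file rows
```

In this model a client that asks for 4 rows and advances its offset by the 3 results it received would re-read one file row on the next request.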

lastzero commented on May 31, 2024

That's not entirely correct: API clients request files, not photos. If you set the primary flag, the number of files and photos will match (but then, you don't get stacks of related files):

https://github.com/photoprism/photoprism/blob/develop/internal/query/photo_search.go#L55

The API offers to merge files to reduce response size and improve client-side rendering performance. Stacks / files might still need to be merged on the client at the beginning and end of result sets, as you otherwise see duplicates or broken previews when one of the files in an affected stack is not a JPEG. To get the number of photos, you count the number of results (JS: response.data.length). The file count and offset can be found in the response header.
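A minimal sketch of that client-side merge at page boundaries, assuming each result is a dictionary with a `photo_id` and a list of `files` (hypothetical field names, not the actual response schema). When the first result of a new page belongs to the same photo as the last result already collected, the two are joined instead of being shown twice.

```python
def append_page(results, page):
    """Append a page of merged results, joining a stack split across pages."""
    for item in page:
        if results and results[-1]["photo_id"] == item["photo_id"]:
            # Same photo as the previous result: extend its file list
            # instead of showing a duplicate entry.
            results[-1]["files"].extend(item["files"])
        else:
            # Copy the file list so the original page is left untouched.
            results.append({"photo_id": item["photo_id"],
                            "files": list(item["files"])})
    return results


# The stack for photo "p2" is split between the end of page 1
# and the beginning of page 2.
page1 = [{"photo_id": "p1", "files": ["p1.jpg"]},
         {"photo_id": "p2", "files": ["p2.jpg"]}]
page2 = [{"photo_id": "p2", "files": ["p2.mov"]},
         {"photo_id": "p3", "files": ["p3.jpg"]}]

photos = append_page(append_page([], page1), page2)
print(len(photos))  # 3 photos, not 4
```

The sketch only joins adjacent results, which matches the case described here; it does not read the file count and offset from the response header.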

The contents of the photos and files tables may be highly dynamic, especially when indexing. They can also grow very large, up to millions of rows. Therefore it typically doesn't make sense to get a total row count as it may change every second and forces the database to scan the complete table (or index if you're lucky) for results when returning the first X files would have been enough.

thielepaul commented on May 31, 2024

I see why the API currently works the way it does, and I also see why this is not a big issue for the web frontend as it is currently implemented. Still, I believe it would be better if the API returned photos with their (stacked) files, along with correct counts and offsets, rather than files as you described.

I think this would also help to fix photoprism/photoprism#500

Regarding database performance, I can only say that the implementation from my pull request (photoprism/photoprism#723) works fine for me with more than 6K photos, but obviously I haven't tested it with millions of files. For me, the count of the photos is also not that dynamic and won't change more often than a few times per day. Maybe we have different use cases for photoprism in mind here.

lastzero commented on May 31, 2024

We are days before a release. There is no way to change the API in such a fundamental way at this point. Let me explain a bit more though.

The reason the API operates on files is because our users are searching for image data and file names, like when you group by similarity:

photoprism/photoprism#715

When the same photo (id) exists in different rows that don't come next to each other, would this count as one photo or two? Or must the files be similar, although one is a red cat and the other a blue elephant? Using a relational database the way it is designed ensures we're following the path of logic. It was a real pain developing that frontend JS code, but it makes sense the way it is.

When it comes to performance, a user on GitHub reported 400k+ files. That must be our standard as we don't want to break things for these people, even though we might not have these requirements personally.

thielepaul commented on May 31, 2024

I agree that it is sensible not to change the API shortly before the release.

If anyone can test my branch with a large photo collection to evaluate the performance, that would be awesome! (There is a Docker image of my branch: docker pull thielepaul/photoprism:photo-search-count)

> The reason the API operates on files is because our users are searching for image data and file names, like when you group by similarity:

Also in this case, if I ask the backend for let's say 10 results, I expect it to return 10 elements and not 3 or so which then contain 10 files.

> When the same photo (id) exists in different rows that don't come next to each other, would this count as one photo or two? Or must the files be similar, although one is a red cat and the other a blue elephant? Using a relational database the way it is designed ensures we're following the path of logic. It was a real pain developing that frontend JS code, but it makes sense the way it is.

I can not follow this argument, why would there be two files with the same photo id, if one shows a red cat and the other a blue elephant?
Whether the two rows are adjacent or not does not matter, in my opinion. As I see it, there should always be one result per photo id.

I would prefer that we discuss the details of the API changes within the PR.
As you write yourself, writing the frontend logic for the current API is rather difficult and in my opinion that means that there is a problem in the backend.

lastzero commented on May 31, 2024

> Also in this case, if I ask the backend for let's say 10 results, I expect it to return 10 elements and not 3 or so which then contain 10 files.

Don't set the merge flag. It will be that way then. As mentioned, that's a performance optimization for clients. It is not a change in logic like the one you suggest.

> I can not follow this argument, why would there be two files with the same photo id, if one shows a red cat and the other a blue elephant?

They can be stacked based on the same file name prefix or unique image id.

lastzero commented on May 31, 2024

> why would there be two files with the same photo id, if one shows a red cat and the other a blue elephant?

To give a practical example: There may be one image file showing both and multiple crops only showing one animal each - some might also be edited in other ways (aspect ratio, chroma, brightness,...). All of that is stored in the files table, not photos. When you explicitly sort by aspect ratio, you don't want files with different aspect ratios merged in one "photo" result.

lastzero commented on May 31, 2024

This is a good example of results that are not merged, so you can see the editing differences and file names:

[Screenshot 2020-12-20 at 12 31 35]

thielepaul commented on May 31, 2024

Thank you for your extensive explanations regarding the different options of the photo_search API!

Yet, I am unsure what your main point of criticism regarding the PR is:

  • the motivation behind the API change (get the absolute counts of the results and results at specified offsets to allow fluid scrolling in the mobile app)
  • the additional API parameter format=count to get the absolute count
  • the API change that offset and count are consistent with the size of elements in the response array if merged=true
  • the implementation of the API changes (including performance concerns)

lastzero commented on May 31, 2024

Your motivation is fine and completely understandable, but it remains to be seen if this is the best and only way to get the app fixed. Our JS frontend clearly demonstrates that the API works as designed - there are currently no open bug reports that would indicate issues with the search results.

Why don't you query the API with the parameter "primary:true" for now? It seems this would solve the issues described in a much less complex and impactful way. You don't need more to show a result list with previews. Note that you might still run into issues when new files are added, as the offset becomes useless. In this case, you need to start with 0 and a count that includes additional results. We use websockets with push events to notify the client. (Edit: Querying pages in an interleaved way is an alternative to detect dirty results that require a full refresh in many cases. You shift the offset by one to see if the last id from the previous page matches the first from the next. That one result gets discarded, so the performance impact is minimal.)
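The interleaved paging idea could be sketched roughly as follows. `fetch(offset, count)` is a hypothetical callable standing in for the API request, and it is assumed to return a plain list of IDs; neither the name nor the return shape comes from the PhotoPrism API. Each page after the first starts one row early, and its first row is compared with the last ID already collected.

```python
def fetch_checked(fetch, count):
    """Page through results, overlapping each page with the previous one by one row.

    Returns (ids, dirty). dirty=True means the overlapping row did not match,
    so the rows shifted and the list should be reloaded from offset 0.
    Assumes count >= 1.
    """
    ids = fetch(0, count)
    full = len(ids) == count
    while full:
        # Start one row early so the first row overlaps the previous page.
        page = fetch(len(ids) - 1, count + 1)
        if not page or page[0] != ids[-1]:
            return ids, True  # rows shifted: results are dirty
        ids.extend(page[1:])  # discard the overlapping row
        full = len(page) == count + 1
    return ids, False


# Stable data: every overlap matches.
stable = list(range(10))
stable_ids, stable_dirty = fetch_checked(lambda o, c: stable[o:o + c], 4)

# Data that changes between requests: a new row is added at the front
# after the first page has been fetched.
state = {"rows": ["d", "c", "b", "a"], "calls": 0}


def shifting_fetch(offset, count):
    if state["calls"] == 1:
        state["rows"].insert(0, "e")
    state["calls"] += 1
    return state["rows"][offset:offset + count]


shifted_ids, shifted_dirty = fetch_checked(shifting_fetch, 2)
print(stable_dirty, shifted_dirty)  # False True
```

The check only costs one extra row per request, but it detects changes only at page boundaries, so a push notification remains the more direct signal.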

Not sure what you ultimately need the total count for? Stop querying when the result length is less than what you requested, as this must be the end. Even if you know a count from the past, it might be wrong because something has changed. Users will report it as a bug if they scroll down and don't see their new photos.
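A minimal sketch of that stop condition, again using a hypothetical `fetch(offset, count)` callable in place of the real request. It assumes unmerged results (for example with the primary flag set), because merged pages can come back shorter than requested before the end is reached, as described at the top of this thread.

```python
def fetch_all(fetch, count):
    """Collect rows page by page until a page comes back short."""
    if count < 1:
        raise ValueError("count must be at least 1")
    rows = []
    offset = 0
    while True:
        page = fetch(offset, count)
        rows.extend(page)
        if len(page) < count:
            return rows  # short page: this must be the end
        offset += count


data = list(range(7))
everything = fetch_all(lambda o, c: data[o:o + c], 3)
print(len(everything))  # 7
```

When the number of rows is an exact multiple of `count`, the loop makes one extra request that returns an empty page, which is the price of not knowing the total in advance.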

Counting all rows will always have a performance impact, unless you have fewer than a certain number of rows, so that your complete database fits in memory. Just because 6,000 files work well doesn't mean it's not a major issue with 100,000+ files.

As a side note, I remember testing a count of all rows and finding certain edge cases where the count didn't match the actual number of results. That was a long time ago, so I can't give you all the details from memory. I need to focus on updating our docs for the release now.

lastzero commented on May 31, 2024

We can take a look at the API together, after the release is done 🎄
