mtalcott / google-photos-deduper

Locally run web app and Chrome extension to remove duplicates from Google Photos

License: MIT License

Google Photos Deduper


Locally run web app + Chrome extension to delete duplicates in Google Photos. Built with:

Google Photos API Python MediaPipe TypeScript Vite React MUI CRXJS Docker

Demo


Getting Started

While a hosted web app would be ideal, one is not currently provided due to API usage limits, the overhead of Google's app verification process, cost, and user privacy considerations. Instead, follow these instructions to get the app up and running locally:

Setup

1. Install Docker Desktop on your system.

2. Clone this repository.

3. Create a Google Cloud project and OAuth credentials.
  • Create a Google Cloud project (Guide)
    • Project name: Enter Photos Deduper
    • Select the project
  • Go to APIs & Services > Enable APIs and Services
    • Search for Photos Library API
    • Enable
  • Go to APIs & Services > OAuth consent screen
    • User Type: Choose External
    • Create
      • App name: Enter Photos Deduper
      • User support email: Choose your email
      • Developer contact information: Enter your email
      • Save and Continue
    • Add or remove scopes:
      • Manually add scopes:
        • https://www.googleapis.com/auth/userinfo.profile
        • https://www.googleapis.com/auth/userinfo.email
        • https://www.googleapis.com/auth/photoslibrary
      • Update
      • Save and Continue
    • Test users:
      • Add your email (and any others you want to use the tool with)
      • Save and Continue
  • Go to APIs & Services > Credentials > Create Credentials > OAuth client ID
    • Application type: Choose Web application
    • Name: Enter Photos Deduper Web Client
    • Authorized JavaScript origins: Enter http://localhost
    • Authorized redirect URIs: Enter http://localhost/auth/google/callback
    • Create
  • Download the JSON file

4. Set up local environment variables.

  • cp example.env .env
  • Generate FLASK_SECRET_KEY with python -c 'import secrets; print(secrets.token_hex())' and add it to .env.
  • Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to .env, using the client_id and client_secret values from the client secret JSON file downloaded above.
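Assuming the downloaded credentials file uses Google's standard `{"web": {...}}` layout for "Web application" OAuth clients, a small helper like the following (hypothetical, not part of this repo) prints the two lines to append to `.env`:

```python
import json
import sys


def read_oauth_values(path):
    """Extract client_id and client_secret from a downloaded
    Google OAuth client secret JSON file ("Web application" type)."""
    with open(path) as f:
        data = json.load(f)
    web = data["web"]  # top-level key for web-application clients
    return web["client_id"], web["client_secret"]


if __name__ == "__main__":
    client_id, client_secret = read_oauth_values(sys.argv[1])
    print(f"GOOGLE_CLIENT_ID={client_id}")
    print(f"GOOGLE_CLIENT_SECRET={client_secret}")
```

Run it with the path to the downloaded `client_secret_....json` file and paste the output into `.env`.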

Start

  1. Run docker-compose up from the project directory.
  2. Load http://localhost and follow the instructions from there!

Support

If you found a bug or have a feature request, please open an issue.

If you have questions about the tool, please post on the discussions page.

Development

  • Python app
    • Flask is set to debug mode, so live reloading is enabled.
    • Debugging with debugpy is supported. See launch.json.
  • React app
    • Utilizes Vite for HMR and building.
  • Chrome extension

Motivation

I've been a long-time user of Google Photos. When Picasa Web Albums retired, my cloud photos and albums moved to Google Photos. I have used nearly every desktop client Google has provided, from Picasa, to the old Google Photos desktop uploader, to Google Drive's built-in Photos integration, and finally to Backup and Sync.

Google has improved duplicate detection upon upload in recent years, but that wasn't always the case. I have tens of thousands of photos across hundreds of albums that were at some point duplicated by a desktop client. Also, even today, deleting, re-uploading, then restoring a photo results in a duplicate.

This could probably be solved by clearing out my Photos data and re-uploading everything. However, that would wipe out all album organization and photo descriptions, so I'd rather remove duplicates in place. Searches show interest in this feature among the Google Photos user base, but it has never made its way into the product.

The existing tools I could find for this problem only worked with media on the local computer, felt scammy, or didn't fully automate the deletion process. So I created this one.

It turns out the Google Photos API is quite limited. While apps can read limited metadata about the media items in a user's library, they cannot delete media items (photos and videos), and they can only modify media items uploaded by the app itself. This means we can't, for example, add all of the duplicates to an album for the user to review. Automating the deletion of duplicates therefore requires some kind of tool outside the API. Since Photos users have already bought into the Google ecosystem, I chose to do this with a complementary Chrome extension.
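To illustrate how limited that surface is, here is a minimal sketch (assuming a valid OAuth access token with the photoslibrary scope; this is not code from this repo) that pages through the read-only mediaItems list endpoint. Note there is no corresponding delete endpoint to pair with it:

```python
import requests


def list_media_items(access_token, page_size=100):
    """Yield media item metadata from the user's library.

    The Photos Library API exposes read-only metadata here; deleting
    items is not possible through the API, hence the Chrome extension.
    """
    url = "https://photoslibrary.googleapis.com/v1/mediaItems"
    headers = {"Authorization": f"Bearer {access_token}"}
    page_token = None
    while True:
        params = {"pageSize": page_size}
        if page_token:
            params["pageToken"] = page_token
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        data = resp.json()
        yield from data.get("mediaItems", [])
        page_token = data.get("nextPageToken")
        if not page_token:
            return
```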

Say Thanks

If you found this project useful, give it a star!

google-photos-deduper's People

Contributors

mtalcott


google-photos-deduper's Issues

Bad Gateway 502

I am getting a 502 Bad Gateway when running Get Started. I noticed the same issue on two other endpoints in the console too. I have attached screenshots in case anyone knows the answer. Thanks in advance.


Any plans to automate installation and set up?

Hello

Your app looks extremely helpful. I note that it includes some steps that look rather daunting such as:
4. Set up local environment variables.
cp example.env .env
Generate FLASK_SECRET_KEY with python -c 'import secrets; print(secrets.token_hex())' and add it to .env.
Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET from the client_id and client_secret values from the client secret file created above.

Do you have any plans to package it in such a way that installation and operation are easier for people who are less technical? Thank you

Show full previews

Currently the previews are cropped to horizontal rectangles. That hides part of the photo for photos with different aspect ratios, which makes it harder to compare them. It would be more convenient if I could see the full photo independently of aspect ratio.

Cannot build Chrome Extension

Firstly, really cool project. I can't wait to see the results, it's been running for a while so far :)

I was going to build the extension for Chrome, but I am getting this error

jkowall@jkhome:~/google-photos-deduper$ docker-compose -f chrome_extension/docker-compose.yml run node npm run build
ERROR: The Compose file is invalid because:
Service node has neither an image nor a build context specified. At least one must be provided.

KeyError: 'storageFilename'

Hi,

Sadly getting an error when processing duplicates at 38%

Final bit of the log:

38%|███▊      | 9857/26081 [02:13<03:40, 73.64it/s]G/ForkPoolWorker-31]

2023-09-27 00:24:00 [2023-09-26 23:24:00,422: ERROR/ForkPoolWorker-31] Task app.tasks.process_duplicates[7cdb40e7-fee2-40b1-99ee-8eaa89969e53] raised unexpected: KeyError('storageFilename')

2023-09-27 00:24:00 Traceback (most recent call last):

2023-09-27 00:24:00   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 477, in trace_task

2023-09-27 00:24:00     R = retval = fun(*args, **kwargs)

2023-09-27 00:24:00   File "/usr/src/app/app/__init__.py", line 25, in __call__

2023-09-27 00:24:00     return self.run(*args, **kwargs)

2023-09-27 00:24:00   File "/usr/src/app/app/tasks.py", line 90, in process_duplicates

2023-09-27 00:24:00     results = task_instance.run()

2023-09-27 00:24:00   File "/usr/src/app/app/lib/process_duplicates_task.py", line 109, in run

2023-09-27 00:24:00     similarity_map = duplicate_detector.calculate_similarity_map()

2023-09-27 00:24:00   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 61, in calculate_similarity_map

2023-09-27 00:24:00     embeddings = self._calculate_embeddings()

2023-09-27 00:24:00   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 113, in _calculate_embeddings

2023-09-27 00:24:00     storage_path = self._get_storage_path(media_item)

2023-09-27 00:24:00   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 297, in _get_storage_path

2023-09-27 00:24:00     return self.image_store.get_storage_path(media_item["storageFilename"])

2023-09-27 00:24:00 KeyError: 'storageFilename'

(Sorry for the weird line breaks; without them it just seemed to paste as one big string.)

MaxRetryError

After running the process for some time it starts throwing MaxRetryError.

google-photos-deduper-worker-1 | [2023-09-21 20:09:07,653: ERROR/ForkPoolWorker-25] Task app.tasks.store_images[04069794-2d4d-4e60-9c1c-8b7c3ec287cc] raised unexpected: ConnectionError(MaxRetryError("HTTPSConnectionPool(host='lh3.googleusercontent.com', port=443): Max retries exceeded with url: /lr/AAJ1LKfwsu7RuSBKHHMfm0N9oHB-ukYJf_aJQ8AXs-51YZ4g8IPOPVlEtie0F_guBhuRsTzLO0ZzGEpaCBbpoBrV-UbhtAHWjCyUmiGgFU6q2QwKQf9iQCK7Fp07lDJKtMX1I4q1nfmc4YR6JIo1KQI5LEpypvdmxswZvFotFyDXNKhgN674UK2QipnwSpHnflkt16lbPRwi589MrnqtTuRsefVod9KQRwq74mL3jefNHPbQFcWt8JYbnGFTEpThs4w0p-AMiw3hiOAr3anCm7Yy1yBettFaQC3bCkpq39X-u-LdSr9RPEIKrvOpXXyDhuZWB3Slz0Mh0d18Y_mNOj65r9zt9qDII3hOWqT0T3wp_lP7gyNVbsyc5faWy4dH8JLesgComiSNy80_E4dE2LlLmaqC8J2pvnNHfknJoSNYPRMMCANTOmOV27rDNvpHp5IQmsnScIK_vhCLmFVxbzzFJAe4HyY99XF1Z3fhbDg66Zv9GqhUpx1ESAI8oUZ9WbVKpUWacjuIAahQ53VSn0680JQqZ3wsE4nvZyCWNpomDvB4eNrQHuJ4QFGojLpZwkae8zDSJgUa6V2QwBCdFkxZapksVLPurjf1RIvgx7KVlDL_T0FmLtBk-YlGPMbarQuWVzoIIzr1STnVVb-3oaVDI7Ye1Ym4bw80aaCWdbOd7jmgDviYT4I9acqe3pgy1JJktfWRRuFAnnE2txWB3pgfO1W00bVUZ_aQy0fek92C4SUW-tXoxfwGsFCjce5QyLlBQAEIpwkejdH4OJgQaDq68_XNtLL88BOfqgNeoroJNeF2T_NMVr-cMF73wD_xgdGI6Hhm0_k9juostJHaSqM-l88cs5O49b4EboDaXqFSjwRU7f7UhOYq4eYarwNxYobkgj7wmI-UA_ctlkiOx5BeW5VTyEmaycTrQCmcFCn9ZQo9p9nutUA7h9lvBbVY39Eg-4mPEOuVTJKCkf-gXfpL3gbAsHcgsPhhGyRjuKPC3D7prswAh2GmM7kSIOXLrA=w250-h250 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff6dce9160>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))"))
google-photos-deduper-worker-1 | Traceback (most recent call last):
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
google-photos-deduper-worker-1 | conn = connection.create_connection(
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 72, in create_connection
google-photos-deduper-worker-1 | for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/socket.py", line 954, in getaddrinfo
google-photos-deduper-worker-1 | for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
google-photos-deduper-worker-1 | socket.gaierror: [Errno -3] Temporary failure in name resolution
google-photos-deduper-worker-1 |
google-photos-deduper-worker-1 | During handling of the above exception, another exception occurred:
google-photos-deduper-worker-1 |
google-photos-deduper-worker-1 | Traceback (most recent call last):
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 714, in urlopen
google-photos-deduper-worker-1 | httplib_response = self._make_request(
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 403, in _make_request
google-photos-deduper-worker-1 | self._validate_conn(conn)
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1053, in _validate_conn
google-photos-deduper-worker-1 | conn.connect()
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 363, in connect
google-photos-deduper-worker-1 | self.sock = conn = self._new_conn()
google-photos-deduper-worker-1 | File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
google-photos-deduper-worker-1 | raise NewConnectionError(
google-photos-deduper-worker-1 | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xffff6dce9160>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

403 Client Error: Forbidden for url: https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=100

I have double-checked the scopes, the client secret, and the client ID.


When I click Start on the Select Options page, I get an error on the screen (screenshot attached),

and the following in the console logs:

google-photos-deduper-worker-1  | [2024-03-13 13:40:42,302: ERROR/ForkPoolWorker-31] Task app.tasks.process_duplicates[09d00cc7-3f48-4930-a4c5-523cebc4a9d9] raised unexpected: HTTPError('403 Client Error: Forbidden for url: https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=100')
google-photos-deduper-worker-1  | Traceback (most recent call last):
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 477, in trace_task
google-photos-deduper-worker-1  |     R = retval = fun(*args, **kwargs)
google-photos-deduper-worker-1  |   File "/usr/src/app/app/__init__.py", line 25, in __call__
google-photos-deduper-worker-1  |     return self.run(*args, **kwargs)
google-photos-deduper-worker-1  |   File "/usr/src/app/app/tasks.py", line 111, in process_duplicates
google-photos-deduper-worker-1  |     results = task_instance.run()
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/process_duplicates_task.py", line 91, in run
google-photos-deduper-worker-1  |     self._fetch_media_items(client)
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/process_duplicates_task.py", line 196, in _fetch_media_items
google-photos-deduper-worker-1  |     client.fetch_media_items(callback=fetch_callback)
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/google_photos_client.py", line 44, in fetch_media_items
google-photos-deduper-worker-1  |     resp_json = self._refresh_credentials_if_invalid(func)
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/google_api_client.py", line 131, in _refresh_credentials_if_invalid
google-photos-deduper-worker-1  |     raise error
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/google_api_client.py", line 124, in _refresh_credentials_if_invalid
google-photos-deduper-worker-1  |     return func()
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/google_photos_client.py", line 39, in func
google-photos-deduper-worker-1  |     return self.session.get(
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
google-photos-deduper-worker-1  |     return self.request("GET", url, **kwargs)
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/google/auth/transport/requests.py", line 549, in request
google-photos-deduper-worker-1  |     response = super(AuthorizedSession, self).request(
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
google-photos-deduper-worker-1  |     resp = self.send(prep, **send_kwargs)
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 710, in send
google-photos-deduper-worker-1  |     r = dispatch_hook("response", hooks, r, **kwargs)
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/requests/hooks.py", line 30, in dispatch_hook
google-photos-deduper-worker-1  |     _hook_data = hook(hook_data, **kwargs)
google-photos-deduper-worker-1  |   File "/usr/src/app/app/lib/google_api_client.py", line 141, in <lambda>
google-photos-deduper-worker-1  |     session.hooks = {"response": lambda r, *args, **kwargs: r.raise_for_status()}
google-photos-deduper-worker-1  |   File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
google-photos-deduper-worker-1  |     raise HTTPError(http_error_msg, response=self)
google-photos-deduper-worker-1  | requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=100

Some duplicate images not in resultlist

Hello,

The scan seems to miss some photos that are exactly the same (image, filename, dimensions). I don't know why they don't show up in the similarity map. Sometimes running it a few times eventually finds them. I suppose it's because the model compares only the content of the images, not the metadata?

This got me thinking that a simple scanning feature could be just a comparison of photo metadata instead of content.
So this is somewhere between a bug report and a feature request.
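For illustration, such a metadata-only pass could be as simple as grouping items by (filename, dimensions). This is a hypothetical sketch, not code from this project, assuming items shaped like the API's mediaItem objects:

```python
from collections import defaultdict


def group_exact_candidates(media_items):
    """Group media items by (filename, width, height).

    Any group with more than one item is an exact-duplicate candidate,
    found without downloading or comparing image content at all.
    """
    groups = defaultdict(list)
    for item in media_items:
        meta = item.get("mediaMetadata", {})
        key = (item.get("filename"), meta.get("width"), meta.get("height"))
        groups[key].append(item)
    return {key: items for key, items in groups.items() if len(items) > 1}
```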

The deletion feature is also something that could be used separately, e.g. load a list of IDs and run that through the plugin.

PS: it's a really cool project

I encountered an error


The log:

worker_1  | [2023-10-30 10:19:52,049: ERROR/ForkPoolWorker-31] Task app.tasks.process_duplicates[f30624a7-2829-4fc4-819f-dae7856e41c3] raised unexpected: RuntimeError('Expected image with 1 (grayscale), 3 (RGB) or 4 (RGBA) channels, found 2 channels.')
worker_1  | Traceback (most recent call last):
worker_1  |   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 477, in trace_task
worker_1  |     R = retval = fun(*args, **kwargs)
worker_1  |   File "/usr/src/app/app/__init__.py", line 25, in __call__
worker_1  |     return self.run(*args, **kwargs)
worker_1  |   File "/usr/src/app/app/tasks.py", line 90, in process_duplicates
worker_1  |     results = task_instance.run()
worker_1  |   File "/usr/src/app/app/lib/process_duplicates_task.py", line 109, in run
worker_1  |     similarity_map = duplicate_detector.calculate_similarity_map()
worker_1  |   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 61, in calculate_similarity_map
worker_1  |     embeddings = self._calculate_embeddings()
worker_1  |   File "/usr/src/app/app/lib/duplicate_image_detector.py", line 114, in _calculate_embeddings
worker_1  |     mp_image = mp.Image.create_from_file(storage_path)
worker_1  | RuntimeError: Expected image with 1 (grayscale), 3 (RGB) or 4 (RGBA) channels, found 2 channels.

FileNotFoundError: [Errno 2] No such file or directory: 'client_secret_ ... .apps.googleusercontent.com.json'

 docker-compose run python python -m google_photos_deduper
time="2022-04-19T16:16:49+10:00" level=warning msg="The \"ACCESS_TOKEN\" variable is not set. Defaulting to a blank string."
[+] Running 1/0
 - Container google-photos-deduper-python-mongo-1  Running                                                                                                         0.0s
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/google_photos_deduper/__main__.py", line 64, in <module>
    run()
  File "/usr/src/app/google_photos_deduper/__main__.py", line 24, in run
    flow = Flow.from_client_secrets_file(
  File "/usr/local/lib/python3.9/site-packages/google_auth_oauthlib/flow.py", line 201, in from_client_secrets_file
    with open(client_secrets_file, "r") as json_file:
FileNotFoundError: [Errno 2] No such file or directory: 'client_secret_792370018435-b70v71v2e7cb3i3s67cdg74qtfao5a1t.apps.googleusercontent.com.json'

Suggest improvements to README

Hi Mack! Thanks for the project!

However, there are some points that go unmentioned (or are easily missed when reading the official docs) to get the app up and running. So I would suggest mentioning the following:

  1. Enable the Photos Library API in the Google Cloud console.
  2. In the OAuth client configuration, add http://localhost/auth/google/callback to "Authorized redirect URIs".
  3. In the OAuth consent screen settings, add your Gmail address to the test users.

Help with "Set up local environment variables."

Apologies in advance, a complete "terminal" noobie here.... but nearly there... diligently followed all steps and now at this one.

  1. cp example.env .env - DONE and verified by using "vim .env" and I am editing a file

  2. Generate FLASK_SECRET_KEY with python -c 'import secrets; print(secrets.token_hex())' and add it to .env. - DONE - I had to run "python3 xxx", but it generated a long key that I copied; in the first line of .env I replaced the "XYZ" in "FLASK_SECRET_KEY=XYZ" with my secret key, overwriting what was there from (I assume) the example.env file

  3. Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET from the client_id and client_secret values from the client secret file created above.

This is where I am stuck: where do I get GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET from the client secret file? Is this the JSON file created in the step before, "Go to APIs & Services > Credentials > Create Credentials > OAuth client ID"? I would appreciate unpacking this step with some more detail.

Really appreciate any help. Thanks

No upload into the Duplicate folder

Hi there,
I succeeded in running the script, but unfortunately it does not move the duplicates into the album created, so I cannot remove the files.
Any idea how to fix this?

Thank you!

Server 404

Hi! No matter what I try and in which port, I'm always getting 404 when trying to connect to the server through http://localhost

Does anyone have this issue and can give me ideas? I have no idea what's going wrong since it seems to be running correctly, but when I try to access localhost in my browser, that's what my terminal shows:

google-photos-deduper-main-server-1 | 192.168.65.1 - - [04/Oct/2023 20:26:13] "GET / HTTP/1.1" 404 -
google-photos-deduper-main-server-1 | 192.168.65.1 - - [04/Oct/2023 20:26:13] "GET / HTTP/1.1" 404 -

I already tried every solution from ChatGPT and it's still not working. Flask has to be the problem, since the client opens a page at localhost:3000

Thanks in advance for any help!

Deleting Photo Fails

After selecting duplicates and clicking the Delete Duplicates button, a new window is launched with each photo in question, but fails to delete. Inspecting the service worker yields the following error:

service_worker.ts-72f4954d.js:1 navigateAndDelete error Timeout: failed to delete mediaItem <id> within 10s, skipping

It doesn't appear as though the extension is clicking the delete button for some reason. Has Google changed things a bit to make the extension no longer function?

Standalone deleter

The Chrome plugin by itself is interesting. Suppose I have a list of MediaItemIds that I would like to run through the plugin; how would that work? I can handle the upload of a JSON/CSV, but do you have some React skeleton code that I might use to invoke the plugin?

thanks

Bob

Error processing duplicates

Seemed to be running, but I think I hit some rate limits. The task was moving forward slowly. Finally it got an error in the web UI (screenshot attached), but I am not sure which logs to check. Might be good to add that to the docs.

Error Building Image

Safely at the next install step, wonder if there is any advice you can give on where to go next ;-)

anthonyjclarke@Anthonys-MBP google-photos-deduper % docker-compose up
[+] Running 36/36
✔ flower 9 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 37.3s
✔ 9fda8d8052c6 Pull complete 27.1s
✔ e7ae3e644d56 Pull complete 28.3s
✔ 4da8ee49dcce Pull complete 30.3s
✔ 7c6a85daf644 Pull complete 30.4s
✔ 35f63c7ece91 Pull complete 30.7s
✔ c4c0cbd43a96 Pull complete 31.4s
✔ adcb2fdba780 Pull complete 32.7s
✔ ae1f36545ccf Pull complete 31.7s
✔ aa6324cb4929 Pull complete 32.4s
✔ redis 8 layers [⣿⣿⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 13.0s
✔ a5573528b1f0 Pull complete 3.6s
✔ 5510d86d1248 Pull complete 0.8s
✔ da38f099d0c0 Pull complete 0.8s
✔ 1c7eb85776c1 Pull complete 3.8s
✔ b01ad51b2004 Pull complete 8.6s
✔ ced83491d1f3 Pull complete 4.4s
✔ 4f4fb700ef54 Pull complete 4.6s
✔ 4ee968e6f056 Pull complete 5.1s
✔ mongo 9 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 31.8s
✔ d519a3a2a796 Pull complete 9.1s
✔ 352ba6b7451f Pull complete 6.9s
✔ a6ded4191389 Pull complete 10.2s
✔ c0ab25682bfe Pull complete 10.3s
✔ fb81d91cc097 Pull complete 10.8s
✔ ac8819c2b7ec Pull complete 10.9s
✔ 73d757d8e05c Pull complete 11.4s
✔ bc3edf585167 Pull complete 26.4s
✔ 304d69c595fa Pull complete 11.9s
✔ nginx 6 layers [⣿⣿⣿⣿⣿⣿] 0B/0B Pulled 33.8s
✔ 8897d65c8417 Pull complete 29.3s
✔ fbc138d1d206 Pull complete 13.1s
✔ 06f386eb9182 Pull complete 16.4s
✔ aeb2f3db77c3 Pull complete 19.1s
✔ 64fb762834ec Pull complete 21.3s
✔ e5a7e61f6ff4 Pull complete 26.0s
[+] Building 214.2s (30/35) docker:desktop-linux
=> [worker internal] load .dockerignore 10.1s
=> => transferring context: 328B 10.0s
=> [worker internal] load build definition from Dockerfile 10.1s
=> => transferring dockerfile: 1.07kB 10.0s
=> [server internal] load build definition from Dockerfile 10.1s
=> => transferring dockerfile: 1.07kB 10.0s
=> [server internal] load .dockerignore 10.1s
=> => transferring context: 328B 10.0s
=> [client internal] load .dockerignore 10.1s
=> => transferring context: 136B 10.0s
=> [client internal] load build definition from Dockerfile 10.0s
=> => transferring dockerfile: 271B 10.0s
=> [worker internal] load metadata for docker.io/library/python:3.9 24.8s
=> [client internal] load metadata for docker.io/library/node:20-alpine 23.6s
=> [worker auth] library/python:pull token for registry-1.docker.io 0.0s
=> [client auth] library/node:pull token for registry-1.docker.io 0.0s
=> [client 1/6] FROM docker.io/library/node:20-alpine@sha256:8e6a472eb9742f4f486ca9ef13321b7fc2e54f2f60814f339eeda2aff3037573 19.8s
=> => resolve docker.io/library/node:20-alpine@sha256:8e6a472eb9742f4f486ca9ef13321b7fc2e54f2f60814f339eeda2aff3037573 0.0s
=> => sha256:8e6a472eb9742f4f486ca9ef13321b7fc2e54f2f60814f339eeda2aff3037573 1.43kB / 1.43kB 0.0s
=> => sha256:6dbf56a08bcade5ee1e2196cce346182ab52bad9dcf308f4bc7b36eefb318662 1.16kB / 1.16kB 0.0s
=> => sha256:9d9d9a7b83de49f3f37b8410fb205135c4fb279835e43d6ecaf75eb755dc2b9e 7.15kB / 7.15kB 0.0s
=> => sha256:c303524923177661067f7eb378c3dd5277088c2676ebd1cd78e68397bb80fdbf 3.35MB / 3.35MB 0.7s
=> => sha256:2ec53874fe288a72a8b7207e1695ec626ea0bc80a7c3fa3e872c694d1605add7 42.01MB / 42.01MB 18.8s
=> => sha256:3946ff1ba9858829b3ccbb2f4b3e06515829d3847659ea96287591a76801fe29 2.34MB / 2.34MB 1.1s
=> => extracting sha256:c303524923177661067f7eb378c3dd5277088c2676ebd1cd78e68397bb80fdbf 0.1s
=> => sha256:aa2987d39b19e8c4f23a1841693e2ea0b850c3817af22d9a0623b2c9c351ef8b 446B / 446B 0.9s
=> => extracting sha256:2ec53874fe288a72a8b7207e1695ec626ea0bc80a7c3fa3e872c694d1605add7 0.8s
=> => extracting sha256:3946ff1ba9858829b3ccbb2f4b3e06515829d3847659ea96287591a76801fe29 0.0s
=> => extracting sha256:aa2987d39b19e8c4f23a1841693e2ea0b850c3817af22d9a0623b2c9c351ef8b 0.0s
=> [client internal] load build context 10.0s
=> => transferring context: 192.87kB 10.0s
=> [server base 1/5] FROM docker.io/library/python:3.9@sha256:3d9dbe78e1f45ed2eb525b462cdb02247cc0956713325aeeffa37cb5f2c8c42e 50.1s
=> => resolve docker.io/library/python:3.9@sha256:3d9dbe78e1f45ed2eb525b462cdb02247cc0956713325aeeffa37cb5f2c8c42e 0.0s
=> => sha256:5665c1f9a9e17acd68ae05b2839df402eac34afdd095f8c115f09886d757840c 49.59MB / 49.59MB 32.9s
=> => sha256:3d9dbe78e1f45ed2eb525b462cdb02247cc0956713325aeeffa37cb5f2c8c42e 1.86kB / 1.86kB 0.0s
=> => sha256:a22b9266997b4821003361a296574dd60cd31603abd7483f6d8d5e6308b273bc 7.34kB / 7.34kB 0.0s
=> => sha256:f419b1a62fc83850ab3cb43274970bb20a18ae6e674535478a48f5bee11559b6 23.58MB / 23.58MB 3.1s
=> => sha256:219b621d810b25485a046dbb4aa5ba50cd1190a775438449a245b2558a06c39e 2.01kB / 2.01kB 0.0s
=> => sha256:76b4f1810f998c1f1580e2404b2e7fed8e264902d898bbe531443ea9789b7641 63.99MB / 63.99MB 11.7s
=> => sha256:1c176cbf649709b5d8a03720a6c53e18e33ad50feef33abe83c5ae95c5aabdb2 202.50MB / 202.50MB 45.4s
=> => sha256:ba0d9396537e9f0e9dfcfdbc88e19bf081ba7c18180e6db53fa370789e309f4d 6.47MB / 6.47MB 25.6s
=> => sha256:cf458769c92c44dc19dd1117e06e84d0c974725309b0cffd50cad029495ec3db 15.54MB / 15.54MB 37.5s
=> => extracting sha256:5665c1f9a9e17acd68ae05b2839df402eac34afdd095f8c115f09886d757840c 1.4s
=> => sha256:a76a1914532c78be4cfa274a122f28dec35a574ec6a723116e1a757df7b89a9e 245B / 245B 33.7s
=> => sha256:03729fef6de7c3f230cffcba7e29e29cdc6f1f51b14acd24d0d546ca940333ca 2.85MB / 2.85MB 35.6s
=> => extracting sha256:f419b1a62fc83850ab3cb43274970bb20a18ae6e674535478a48f5bee11559b6 0.4s
=> => extracting sha256:76b4f1810f998c1f1580e2404b2e7fed8e264902d898bbe531443ea9789b7641 1.6s
=> => extracting sha256:1c176cbf649709b5d8a03720a6c53e18e33ad50feef33abe83c5ae95c5aabdb2 3.9s
=> => extracting sha256:ba0d9396537e9f0e9dfcfdbc88e19bf081ba7c18180e6db53fa370789e309f4d 0.2s
=> => extracting sha256:cf458769c92c44dc19dd1117e06e84d0c974725309b0cffd50cad029495ec3db 0.3s
=> => extracting sha256:a76a1914532c78be4cfa274a122f28dec35a574ec6a723116e1a757df7b89a9e 0.0s
=> => extracting sha256:03729fef6de7c3f230cffcba7e29e29cdc6f1f51b14acd24d0d546ca940333ca 0.1s
=> [worker internal] load build context 10.1s
=> => transferring context: 266.71kB 10.0s
=> [server internal] load build context 10.0s
=> => transferring context: 266.71kB 10.0s
=> [client 2/6] WORKDIR /app 0.2s
=> [client 3/6] COPY package.json ./ 0.0s
=> [client 4/6] COPY package-lock.json ./ 0.0s
=> [client 5/6] RUN npm ci && npm cache clean --force 18.9s
=> [client 6/6] COPY ./ ./ 0.0s
=> [client] exporting to image 1.7s
=> => exporting layers 1.7s
=> => writing image sha256:e5fcf62adc8ed54a2bdbebe28e271d7724f044cfdceb863aafff60b7562c59f1 0.0s
=> => naming to docker.io/library/google-photos-deduper-client 0.0s
=> [server base 2/5] RUN apt-get update && apt-get install -y python3-opencv && rm -rf /var/lib/apt/lists/* 41.4s
=> [worker base 3/5] WORKDIR /usr/src/app 0.0s
=> [server base 4/5] COPY requirements.txt ./ 0.0s
=> [server base 5/5] RUN pip install --no-cache-dir -r requirements.txt 61.4s
=> [server dev 1/3] COPY requirements-dev.txt ./ 0.0s
=> [server dev 2/3] RUN pip install --no-cache-dir -r requirements-dev.txt 4.2s
=> [server dev 3/3] COPY . . 0.0s
=> [server] exporting to image 2.0s
=> => exporting layers 2.0s
=> => writing image sha256:0398252e2ec3119865f40d877561ae5148a7d4e78b129bf736aa69b8f8d370e2 0.0s
=> => naming to docker.io/library/google-photos-deduper-server 0.0s
=> [worker] exporting to image 2.0s
=> => exporting layers 2.0s
=> => writing image sha256:117005aa23030cf4aa886bc7d37489070a24de5c09f8bae8aa2a08b32cbdd893 0.0s
=> => naming to docker.io/library/google-photos-deduper-worker 0.0s
[+] Running 10/10
✔ Network google-photos-deduper_default Created 0.0s
✔ Volume "google-photos-deduper_image-volume" Created 0.0s
✔ Volume "google-photos-deduper_client-node-modules" Created 0.0s
✔ Container google-photos-deduper-client-1 Created 2.8s
✔ Container google-photos-deduper-mongo-1 Created 0.1s
✔ Container google-photos-deduper-redis-1 Created 0.1s
✔ Container google-photos-deduper-flower-1 Created 0.1s
✔ Container google-photos-deduper-server-1 Created 0.1s
✔ Container google-photos-deduper-worker-1 Created 0.1s
✔ Container google-photos-deduper-nginx-1 Created 0.1s
Attaching to client-1, flower-1, mongo-1, nginx-1, redis-1, server-1, worker-1
redis-1 | 10:C 20 Jan 2024 01:49:57.885 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis-1 | 10:C 20 Jan 2024 01:49:57.887 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-1 | 10:C 20 Jan 2024 01:49:57.887 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=10, just started
redis-1 | 10:C 20 Jan 2024 01:49:57.887 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis-1 | 10:M 20 Jan 2024 01:49:57.887 * monotonic clock: POSIX clock_gettime
redis-1 | 10:M 20 Jan 2024 01:49:57.888 * Running mode=standalone, port=6379.
redis-1 | 10:M 20 Jan 2024 01:49:57.888 * Server initialized
redis-1 | 10:M 20 Jan 2024 01:49:57.888 * Ready to accept connections tcp
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.933+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.937+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.937+00:00"},"s":"I", "c":"STORAGE", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"fdc8b05dec19"}}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.937+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.28","gitVersion":"61c2baf63a060f7c12bd76e779044800ae18710b","openSSLVersion":"OpenSSL 1.1.1f 31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2004","distarch":"aarch64","target_arch":"aarch64"}}}}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.937+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"20.04"}}}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.937+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"}}}}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.938+00:00"},"s":"I", "c":"STORAGE", "id":22297, "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]}
mongo-1 | {"t":{"$date":"2024-01-20T01:49:57.938+00:00"},"s":"I", "c":"STORAGE", "id":22315, "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=3411M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}
Gracefully stopping... (press Ctrl+C again to force)
[+] Killing 4/4
 ✔ Container google-photos-deduper-worker-1 Killed 0.3s
 ✔ Container google-photos-deduper-flower-1 Killed 0.3s
 ✔ Container google-photos-deduper-redis-1 Killed 0.2s
 ✔ Container google-photos-deduper-mongo-1 Killed 0.2s
[+] Stopping 7/7
✔ Container google-photos-deduper-worker-1 Stopped 0.3s
✔ Container google-photos-deduper-nginx-1 Stopped 0.0s
✔ Container google-photos-deduper-flower-1 Stopped 0.3s
✔ Container google-photos-deduper-client-1 Stopped 0.3s
✔ Container google-photos-deduper-server-1 Stopped 0.0s
✔ Container google-photos-deduper-redis-1 Stopped 0.0s
✔ Container google-photos-deduper-mongo-1 Stopped 0.0s
redis-1 exited with code 137
mongo-1 exited with code 137
flower-1 exited with code 0
flower-1 exited with code 137
client-1 exited with code 0
client-1 exited with code 137
worker-1 exited with code 0
worker-1 exited with code 137
Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:5000 -> 0.0.0.0:0: listen tcp 0.0.0.0:5000: bind: address already in use
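The "address already in use" error means some other process is already listening on port 5000 on the host (on macOS this is often the AirPlay Receiver, which binds port 5000 by default); either stop that process or change the published port in `docker-compose.yml`. A minimal sketch for checking whether the port is free before starting the stack:

```python
import socket

def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if something is already listening on the given TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # bind() fails with EADDRINUSE when another process holds the port
            return True
    return False

print("Port 5000 in use:", port_in_use(5000))
```

If the port is taken, remapping the host side of the port binding (e.g. `"5001:5000"`) avoids the conflict without touching the other process.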

Allow keeping more than one photo

Sometimes, when there are multiple duplicates that are not exact copies, I'd want to keep more than one photo (say 2 out of 5). Having, for example, a checkbox to keep an additional duplicate would be helpful in such situations.
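One way this could work is to rank each duplicate group and keep the top k items instead of exactly one. A hypothetical sketch (the `width`/`height` fields and the ranking-by-resolution rule are assumptions for illustration, not the app's actual selection logic):

```python
from typing import Dict, List

MediaItem = Dict[str, int]

def keep_top_k(group: List[MediaItem], k: int = 2) -> List[MediaItem]:
    """Rank a duplicate group by pixel count (largest first), keep the top k,
    and return the remaining items as deletion candidates."""
    ranked = sorted(group, key=lambda m: m["width"] * m["height"], reverse=True)
    return ranked[k:]

group = [
    {"id": "a", "width": 4032, "height": 3024},
    {"id": "b", "width": 1920, "height": 1080},
    {"id": "c", "width": 800, "height": 600},
]
print([m["id"] for m in keep_top_k(group, k=2)])  # → ['c']
```

In the UI this would map naturally to a per-group "keep" checkbox on each photo rather than a single radio selection.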

403 Forbidden, possibly due to rate limiting

Hi,

I started the process last evening. It ran quite quickly until it exhausted the 75,000 "Base URL requests per day" quota.
Overnight it continued advancing, but very slowly, logging "WARNING/ForkPoolWorker-32] Received 429 Client Error: Too Many Requests for url" (maybe it skipped those items), which is fine.

"Waiting for subtasks to complete... (934 / 2056)" advanced by about 200 over this period of 429 errors.
At Pacific midnight the quota reset and the app started chugging along again, but now I got:

"WARNING/ForkPoolWorker-23] Received 403 Client Error: Forbidden for url: https://lh3.googleusercontent.com/lr/... getting media item size"

tons of them, quickly. The task numbers rose a bit, perhaps 100-200, then stopped rising; by then the logs showed tens of thousands of requests failing with 403 Forbidden.

From reading about this, it appears to be another kind of rate limit; here it's mentioned to add "referrerpolicy". So what are these 403 errors?
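Two things are likely going on here. The 429s are straightforward rate limiting and respond well to exponential backoff. The 403s on `lh3.googleusercontent.com` URLs are often not retryable at all: Photos Library API base URLs expire roughly 60 minutes after they are issued, so once they are stale every request to the same URL fails and the media item's `baseUrl` must be re-fetched from the API. A minimal backoff sketch for the 429 case (`fetch` is a hypothetical callable returning an HTTP status code and body):

```python
import random
import time

def fetch_with_backoff(fetch, max_tries=5, base=1.0, sleep=time.sleep):
    """Retry `fetch` (a hypothetical callable returning (status, body)) with
    exponential backoff plus jitter while it returns HTTP 429."""
    status, body = fetch()
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 429:
            return status, body
        # Back off 1s, 2s, 4s, ... plus up to 1s of random jitter.
        sleep(base * 2 ** attempt + random.random())
    return status, body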
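Two things are likely going on here. The 429s are straightforward rate limiting and respond well to exponential backoff. The 403s on `lh3.googleusercontent.com` URLs are often not retryable at all: Photos Library API base URLs expire roughly 60 minutes after they are issued, so once they are stale every request to the same URL fails and the media item's `baseUrl` must be re-fetched from the API. A minimal backoff sketch for the 429 case (`fetch` is a hypothetical callable returning an HTTP status code and body):

```python
import random
import time

def fetch_with_backoff(fetch, max_tries=5, base=1.0, sleep=time.sleep):
    """Retry `fetch` (a hypothetical callable returning (status, body)) with
    exponential backoff plus jitter while it returns HTTP 429."""
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 429:
            return status, body
        # Back off 1s, 2s, 4s, ... plus up to 1s of random jitter.
        sleep(base * 2 ** attempt + random.random())
    return status, body
```

For a 403 on a base URL, the better recovery is to call `mediaItems.get` again for that item and retry with the fresh `baseUrl` rather than hammering the expired one.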

Ability to limit the library scan by date

Currently, it tries to download ALL photos from my Google Photos library, and with more than 100K images it's easy to exceed the API quota. There should be either an option to specify how far back the app should scan the library (e.g., the last 3 years), or two date fields to specify start and end dates for the scan.
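The Photos Library API already supports this server-side: `mediaItems.search` accepts a `filters.dateFilter` with one or more date ranges (note that `mediaItems.list` does not take filters, so the scan would have to use the search endpoint). A sketch of building such a request body:

```python
from datetime import date

def date_range_filter(start: date, end: date) -> dict:
    """Build a mediaItems.search request body limited to a date range,
    using the Photos Library API's filters.dateFilter.ranges shape."""
    def to_api(d: date) -> dict:
        return {"year": d.year, "month": d.month, "day": d.day}

    return {
        "pageSize": 100,
        "filters": {
            "dateFilter": {
                "ranges": [{"startDate": to_api(start), "endDate": to_api(end)}]
            }
        },
    }

body = date_range_filter(date(2021, 1, 1), date(2024, 12, 31))
```

This body would then be POSTed to `https://photoslibrary.googleapis.com/v1/mediaItems:search`, paging via `nextPageToken` as usual.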

Add duplicate filtering

It would be convenient to be able to filter duplicates by attributes: for example, show only groups with different resolutions, with the same filename, or with the same file size.
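Implementation-wise, this could be a set of predicates applied to each duplicate group before display. A sketch under assumed data shapes (each group as a list of media-item dicts with `filename`, `width`, and `height`; these field names are illustrative, not the app's actual schema):

```python
from typing import Callable, Dict, List

MediaItem = Dict[str, object]
Group = List[MediaItem]

def filter_groups(groups: List[Group],
                  predicate: Callable[[Group], bool]) -> List[Group]:
    """Keep only the duplicate groups matching the given predicate."""
    return [g for g in groups if predicate(g)]

def differing_resolution(group: Group) -> bool:
    """True when the group contains more than one distinct resolution."""
    return len({(m["width"], m["height"]) for m in group}) > 1

def same_filename(group: Group) -> bool:
    """True when every item in the group shares one filename."""
    return len({m["filename"] for m in group}) == 1
```

In the UI these predicates would map to filter toggles, composable with `and`/`or` for combinations like "same name but different resolution".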
