Pinboard Link Checker and Archiver (PBLCA)

Lately I have become increasingly worried about link rot. I've been using Pinboard for more than 5 years and have over 1200 bookmarks, so out of curiosity I decided to check how many of these links were dead. More than 5% of them no longer existed or redirected to a 403 page. Since I have been adding bookmarks continuously over this time, many of them are much younger than 5 years, which means the link rot rate is (significantly) higher than 5% per 5 years.
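Checking a link comes down to requesting it and looking at the final status code. The following is a minimal sketch under stated assumptions, not PBLCA's actual code. The helper names (fetch_status, is_dead, rot_fraction), the browser-like User-Agent value, and the rule that a failed request or any 4xx/5xx final status counts as dead are all hypothetical choices.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent; some servers reject Python's default one.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"}


def fetch_status(url: str, timeout: float = 15.0) -> Optional[int]:
    """Return the final HTTP status after redirects, or None if the request failed."""
    try:
        # urlopen follows redirects, so a link that redirects to a 403 page
        # surfaces here as an HTTPError with code 403.
        with urlopen(Request(url, headers=HEADERS), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except (OSError, ValueError):
        # DNS failures, refused connections, timeouts, malformed URLs.
        return None


def is_dead(status: Optional[int]) -> bool:
    """Assumed rule: no response at all, or any 4xx/5xx final status, counts as dead."""
    return status is None or status >= 400


def rot_fraction(statuses: list[Optional[int]]) -> float:
    """Fraction of checked bookmarks that count as dead."""
    if not statuses:
        return 0.0
    return sum(is_dead(s) for s in statuses) / len(statuses)
```

For example, `rot_fraction([fetch_status(u) for u in urls])` gives the share of dead links in a list of bookmark URLs.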

I've also added the option to look up the Internet Archive snapshot of each dead link that is closest to the bookmark's creation date, and to update the bookmark to point to it. If no snapshot exists, the script asks you whether you want to delete the bookmark or keep it.
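The Internet Archive offers a Wayback Machine availability API (https://archive.org/wayback/available) that returns the snapshot closest to a given timestamp. A minimal sketch of building that query and reading the answer is shown here. It assumes the bookmark's creation time is an ISO 8601 UTC string such as 2013-02-15T22:45:46Z (the format Pinboard's API uses), and the helper names are hypothetical; PBLCA's own implementation may differ.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"


def wayback_query_url(url: str, created: str) -> str:
    """Build an availability-API query for the snapshot closest to `created`."""
    dt = datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
    # The Wayback Machine expects timestamps as YYYYMMDDhhmmss.
    params = {"url": url, "timestamp": dt.strftime("%Y%m%d%H%M%S")}
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Extract the closest snapshot URL from the API's JSON response, if there is one."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # no snapshot: the caller would ask whether to delete or keep
```

The JSON response would be fetched with any HTTP client and passed to closest_snapshot as a dict.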

PBLCA uses multiprocessing to query all your bookmarks, so it should be relatively fast. Checking 1250 of my bookmarks takes around 6 minutes.
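Link checking is I/O-bound: most of the time is spent waiting on remote servers, so running many checks concurrently is what keeps the total time down. A sketch of fanning out the checks with a worker pool follows. It uses the thread-backed multiprocessing.pool.ThreadPool as an assumption (PBLCA may well use a process pool instead), and check_all and the default of 16 workers are hypothetical.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Optional


def check_all(
    urls: list[str],
    check: Callable[[str], Optional[int]],
    workers: int = 16,
) -> dict[str, Optional[int]]:
    """Run `check` on every URL concurrently and map each URL to its result."""
    with ThreadPool(processes=workers) as pool:
        # map() blocks until all checks finish and preserves input order,
        # so results line up with urls.
        results = pool.map(check, urls)
    return dict(zip(urls, results))
```

Threads avoid pickling the check function and are cheap for network waits; a process pool only pays off if per-link work becomes CPU-heavy.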

Usage

Clone the repo and cd to the folder:

git clone https://github.com/Fackelmann/PBLCA
cd PBLCA

Create the poetry virtual environment:

poetry update

And run it, providing your Pinboard API token:

poetry run pblca --token USERNAME:API_TOKEN
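The token is sent to Pinboard's v1 API as the auth_token query parameter. As a hedged sketch (the helper names are hypothetical, and this is not necessarily how PBLCA does it), fetching every bookmark as JSON looks roughly like this:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PINBOARD_API = "https://api.pinboard.in/v1"


def posts_all_url(token: str) -> str:
    """Build the URL that returns all bookmarks as JSON."""
    if ":" not in token:
        raise ValueError("expected a token of the form USERNAME:API_TOKEN")
    return f"{PINBOARD_API}/posts/all?" + urlencode({"auth_token": token, "format": "json"})


def fetch_bookmarks(token: str) -> list[dict]:
    """Download all bookmarks; each entry is a dict with keys such as href and time."""
    with urlopen(posts_all_url(token), timeout=60) as resp:
        return json.load(resp)
```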

If you don't have poetry installed, you'll need to install it first:

pip3 install poetry

Testing

To run the tests with pytest, go to the top-level directory and run:

make pytest

and, for static type checking:

make mypy

Before running the tests, create config.py in the tests directory with a valid Pinboard API token and a deliberately invalid one:

tests/config.py

VALID_TOKEN = "USERNAME:TOKEN"
INVALID_TOKEN = "AN_INVALID:TOKEN"

TODO

  • Add options for batch processing

Known issues

  • There is no real way to change a bookmark's URL via the Pinboard API, since the URL is the bookmark's key. PBLCA therefore creates a new bookmark with the same attributes (including the creation date) and deletes the old one. This is not a real issue from a functionality perspective, but it is worth mentioning.
  • A (very) few bookmarks will show up as dead even though you can still open the page in your browser. This seems to be an issue with the request headers.
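The add-then-delete workaround from the first known issue can be sketched with the v1 posts/add and posts/delete endpoints. This is a minimal illustration, not PBLCA's code: the helper name is hypothetical, and the bookmark dict is assumed to have the keys Pinboard's posts/all returns (href, description, extended, tags, time, shared, toread).

```python
from urllib.parse import urlencode

PINBOARD_API = "https://api.pinboard.in/v1"


def replace_bookmark_requests(bookmark: dict, new_url: str, token: str) -> tuple[str, str]:
    """Return the posts/add and posts/delete request URLs that move a bookmark to new_url."""
    add_params = {
        "auth_token": token,
        "format": "json",
        "url": new_url,
        "description": bookmark["description"],  # the bookmark title
        "extended": bookmark.get("extended", ""),
        "tags": bookmark.get("tags", ""),
        "dt": bookmark["time"],  # reuse the original creation date
        "shared": bookmark.get("shared", "yes"),
        "toread": bookmark.get("toread", "no"),
    }
    delete_params = {"auth_token": token, "format": "json", "url": bookmark["href"]}
    return (
        f"{PINBOARD_API}/posts/add?" + urlencode(add_params),
        f"{PINBOARD_API}/posts/delete?" + urlencode(delete_params),
    )
```

Sending the posts/add request first and issuing posts/delete only once it succeeds avoids losing the bookmark if the add fails.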

Disclaimer

  • Please use at your own risk. ALWAYS have a backup of your data.

Contributors

fackelmann
