
retriever's Introduction

Retriever tracks files on the internet and notifies you when they are updated.

I currently use it to find out when new editions of some financial research papers that I follow become available, but it would work for any other kind of file (text files, images, videos, etc.).
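To illustrate the idea, here is a minimal sketch of checksum-based change detection. It is not taken from retriever.py: the function names, the SHA-256 choice, and the `.sha256` file suffix are assumptions made for the example. Note that a file seen for the first time is also reported as changed.

```python
import hashlib
from pathlib import Path


def file_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the downloaded bytes."""
    return hashlib.sha256(content).hexdigest()


def has_changed(name: str, content: bytes, checksum_dir: Path) -> bool:
    """Compare the digest of `content` with the one stored for `name`.

    Stores the new digest and returns True when the file is new or updated;
    returns False when the stored digest is identical.
    """
    checksum_file = checksum_dir / f"{name}.sha256"  # hypothetical naming scheme
    new_digest = file_digest(content)
    old_digest = checksum_file.read_text().strip() if checksum_file.exists() else None
    if new_digest == old_digest:
        return False
    checksum_file.write_text(new_digest)
    return True
```

The bytes passed as `content` could come from something like `urllib.request.urlopen(url).read()`, and a True result would be the signal to send a notification.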

How to run?

Pre-requisite

A Mailjet account. If you prefer another email provider, you can adapt utils/emailing.py accordingly.
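For reference, the sketch below shows roughly what a notification sent through Mailjet's Send API v3.1 looks like, using only the standard library. It is an illustration and not the contents of utils/emailing.py: `build_message` and `send` are hypothetical helper names, and the project itself may well rely on a Mailjet client library instead.

```python
import base64
import json
import urllib.request

MAILJET_SEND_URL = "https://api.mailjet.com/v3.1/send"


def build_message(sender_email, receiver_email, subject, text, receiver_name=""):
    """Build a Send API v3.1 payload containing a single plain-text message."""
    recipient = {"Email": receiver_email}
    if receiver_name:  # the receiver name is optional
        recipient["Name"] = receiver_name
    return {
        "Messages": [
            {
                "From": {"Email": sender_email},
                "To": [recipient],
                "Subject": subject,
                "TextPart": text,
            }
        ]
    }


def send(payload, api_key, api_secret):
    """POST the payload to Mailjet with HTTP basic auth and return the HTTP status."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        MAILJET_SEND_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Switching to another provider would mostly mean replacing `send` while keeping the rest of the script unchanged.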

Steps

  1. Set up a Python virtual environment (3.7 or above)

    python3 -m venv .venv
    
    source .venv/bin/activate
  2. Install the dependencies

    pip install -r requirements.txt
  3. Create a checksums directory somewhere in the filesystem

  4. Set the project environment variables

Environment variable          Description                                               Required
RETRIEVER_MJ_API_KEY          Mailjet API key                                           Yes
RETRIEVER_MJ_API_SECRET       Mailjet API secret                                        Yes
RETRIEVER_CHECKSUM_DIR_PATH   Path to the checksums directory (with a / at the end!)    Yes
RETRIEVER_RECEIVER_EMAIL      Email address where notifications should be sent          Yes
RETRIEVER_RECEIVER_NAME       Name of the receiver, if any                              No
RETRIEVER_SENDER_EMAIL        Email address to use for the sender                       Yes
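A quick way to check that everything is set before running the script could look like the sketch below. The variable names come from the table above; `load_settings` and the way errors are reported are assumptions for illustration, not part of retriever itself.

```python
import os
from typing import Mapping

REQUIRED = (
    "RETRIEVER_MJ_API_KEY",
    "RETRIEVER_MJ_API_SECRET",
    "RETRIEVER_CHECKSUM_DIR_PATH",
    "RETRIEVER_RECEIVER_EMAIL",
    "RETRIEVER_SENDER_EMAIL",
)
OPTIONAL = ("RETRIEVER_RECEIVER_NAME",)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the retriever settings, failing early if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name in OPTIONAL:
        settings[name] = env.get(name, "")  # optional values default to empty
    # The checksums path is expected to end with a slash (see the table above).
    if not settings["RETRIEVER_CHECKSUM_DIR_PATH"].endswith("/"):
        raise RuntimeError("RETRIEVER_CHECKSUM_DIR_PATH must end with a /")
    return settings
```

Calling `load_settings()` with no argument reads from the current process environment; passing a plain dict makes it easy to try out.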

You're done, you can now execute the script! 🎉

    python retriever.py

How to run retriever frequently?

If you have a Linux instance at your disposal, one way to run retriever regularly is to add a crontab entry (crontab -e).

If you want this script to run, say, every day at 8am, that would look something like:

    SHELL=/bin/bash
    MAILTO=""
    0 8 * * * (cd ~/retriever && . .venv/bin/activate && . .env && python retriever.py) >> ~/retriever/logs.txt 2>&1

where:

  • SHELL=/bin/bash sets the shell to bash, a standard shell with more features than sh
  • MAILTO="" makes sure cron doesn't try to send us emails. Without it, cron will try and will likely fail, because most Linux instances don't have an SMTP server configured by default.
  • 0 8 * * * sets the schedule on which cron runs the command, i.e. every day of every month at 8:00 (A.M.)
  • cd ~/retriever sets the current directory of the bash session to the location of the project
  • . .venv/bin/activate activates the Python virtual environment
  • . .env sources a shell script that exports the environment variables
  • python retriever.py calls the Python interpreter to execute our program
  • >> ~/retriever/logs.txt 2>&1 tells bash to append the stdout and stderr of all the preceding commands to ~/retriever/logs.txt (and create the file if it doesn't exist)
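To make the schedule concrete, the sketch below computes when a daily entry such as 0 8 * * * would fire next. It only mirrors this one "minute hour * * *" pattern and is not a general cron parser; `next_daily_run` is a name made up for the example, and it assumes naive local datetimes.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, hour: int = 8, minute: int = 0) -> datetime:
    """Return the next time a daily `minute hour * * *` schedule fires after `now`."""
    candidate = datetime.combine(now.date(), time(hour, minute))
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For instance, at 07:30 the next run is the same day at 08:00, while at 23:00 it is the following day at 08:00.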

retriever's People

Contributors

nathandem
