- Install WSL
- Set up Apache Airflow
- Install MongoDB on WSL
- (Optional) Set up a virtual environment if you want to use one
- Rename this repo to `dags` and put it inside your Airflow folder:
  - airflow
    - dags
      - twitter_dag.py
      - ...
- Run `pip install -r requirements`
- Create a `.env` file, copy the contents of `.sample-env` into it, and replace the variables with your own values
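A quick way to confirm that nothing was left out when copying `.sample-env` is to compare the variable names in both files. The sketch below is only an illustration: it assumes both files use plain `KEY=VALUE` lines with optional `#` comments, and the helper names `env_keys` and `missing_keys` are hypothetical, not part of this repo.

```python
from pathlib import Path


def env_keys(path):
    """Return the set of variable names defined in a KEY=VALUE style file."""
    keys = set()
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(sample_path=".sample-env", env_path=".env"):
    """List variables present in the sample file but absent from .env."""
    return sorted(env_keys(sample_path) - env_keys(env_path))
```

Calling `missing_keys()` from the repo folder returns an empty list when every variable from `.sample-env` also appears in `.env`. It does not check that the values themselves are valid.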
- Download Chrome and ChromeDriver so that Selenium can run
  - Go to https://googlechromelabs.github.io/chrome-for-testing/last-known-good-versions-with-downloads.json
  - Click the first URL under channels -> Stable -> downloads -> chrome
  - Click the first URL under channels -> Stable -> downloads -> chromedriver
  - Unzip both archives and copy the two extracted folders into the `$HOME` directory of your WSL
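The same two URLs can also be read from the JSON file programmatically. The following is a minimal sketch, assuming the file keeps the `channels -> Stable -> downloads` layout described above and that each download entry carries a `platform` and a `url` field. The `linux64` default is an assumption about which build suits WSL, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# URL taken from the steps above.
VERSIONS_URL = (
    "https://googlechromelabs.github.io/chrome-for-testing/"
    "last-known-good-versions-with-downloads.json"
)


def stable_download_urls(data, platform="linux64"):
    """Pick the Stable chrome and chromedriver URLs for one platform."""
    downloads = data["channels"]["Stable"]["downloads"]
    urls = {}
    for name in ("chrome", "chromedriver"):
        # Each entry is assumed to look like {"platform": ..., "url": ...}.
        matches = [d["url"] for d in downloads[name] if d["platform"] == platform]
        if not matches:
            raise LookupError(f"no {name} download for platform {platform!r}")
        urls[name] = matches[0]
    return urls


def fetch_stable_download_urls(platform="linux64"):
    """Download the JSON file and return the two URLs for the platform."""
    with urlopen(VERSIONS_URL, timeout=30) as response:
        return stable_download_urls(json.load(response), platform)
```

`fetch_stable_download_urls()` needs network access and returns a dict with the keys `chrome` and `chromedriver`. Downloading and unzipping the archives is still done as described in the steps above.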
- Run `airflow scheduler`
- Open another terminal and run `airflow webserver --port 8080`
- Open localhost:8080 and find the DAG named `is459_assignment1_twitter`
- Click the run button and click `Trigger DAG`
- Use the web UI to check whether the DAG run succeeded
- Check whether the documents were inserted into MongoDB using the mongo shell
  - Open another terminal and run `mongo` (or `mongosh` if `mongo` does not work)
  - Run `show dbs`
  - Find the name of the db, which is `twitter_db`, and run `use twitter_db`
  - Run `show collections`, which should give you `{topic}_collection` (replace {topic} with your topic)
  - Run `db.{topic}_collection.find()` to see your tweets
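The same check can be done from Python instead of the mongo shell. This is a rough sketch, assuming `pymongo` is installed and MongoDB listens on its default address `mongodb://localhost:27017`. The function names are hypothetical, and only the database name `twitter_db` and the `{topic}_collection` naming come from the steps above.

```python
def collection_name(topic):
    """Build the collection name used in the steps above."""
    return f"{topic}_collection"


def summarize_tweets(client, topic, db_name="twitter_db", limit=3):
    """Return the document count and a few sample documents for a topic."""
    collection = client[db_name][collection_name(topic)]
    total = collection.count_documents({})
    sample = list(collection.find().limit(limit))
    return total, sample


def connect(uri="mongodb://localhost:27017"):
    """Open a client; the default URI is an assumption about the local setup."""
    # Imported lazily so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    return MongoClient(uri)
```

For example, `summarize_tweets(connect(), "your_topic")` returns the number of stored tweets and up to three of them. A count of zero suggests the DAG run did not insert anything.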