For more details, see the introduction slides, project survey, and demo.
- Python 3.x
- We recommend creating a new virtual environment to manage your Python project.
- Install the Python packages from `requirements.txt`: `$ pip install -r requirements.txt`.
- Download the NLTK data: `$ python -m nltk.downloader all`.
- Download the SpaCy `en_core_web_md` model: `$ python -m spacy download en_core_web_md`.
- Download the Stanford NER model zip file, `stanford-ner-xxxx-xx-xx`:
  - Download it from the official website.
  - Unzip it and place the `stanford-ner-xxxx-xx-xx` folder in the project root path. The folder name should also be `stanford-ner/`.
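The setup steps above can be sanity-checked with a short script before moving on. The following is a minimal sketch, not part of the project: `check_setup` and `REQUIRED_MODULES` are hypothetical names, and the module list assumes that the packages are importable as `nltk`, `spacy`, and `en_core_web_md`. It only checks that the modules can be found and that a `stanford-ner/` folder exists in the project root.

```python
import importlib.util
from pathlib import Path

# Assumed import names for the packages listed above; adjust to your setup.
REQUIRED_MODULES = ["nltk", "spacy", "en_core_web_md"]


def check_setup(project_root=".", modules=REQUIRED_MODULES):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not found: {name}")

    # The Stanford NER folder is expected directly under the project root.
    ner_dir = Path(project_root) / "stanford-ner"
    if not ner_dir.is_dir():
        problems.append(f"Stanford NER folder not found: {ner_dir}")

    return problems
```

Calling `check_setup()` from the project root returns a list of messages that you can print; an empty list suggests the requirements are in place. It does not verify that the NLTK data was downloaded.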
- Create Twitter and Reddit accounts, and follow the accounts that you are interested in.
- Copy `config-sample.json` and rename the copy to `config.json` in the same directory. Remember to fill in the keys in `config.json`. (Go to your Twitter/Reddit developer console, create an application, and get the keys.)
- To crawl Twitter data, run the script `crawlers/twitter_crawler.py`. It automatically crawls data and saves it to `dataset/twitter/` by default.
- You can customize data entities by modifying `domains.json` and `types.json`. (See demo)
- Currently, you can execute `demo/demo_howard.ipynb` or other notebooks to see the daily digest.
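Since the crawler depends on the keys in `config.json`, it can help to confirm that none were left blank before running it. The sketch below is an illustration under assumptions, not the project's own loader: `find_empty_keys` and `load_config` are hypothetical helpers, and the code assumes that `config.json` holds a JSON object (possibly nested) in which unfilled keys are empty strings or `null`. It makes no assumption about the actual key names.

```python
import json
from pathlib import Path


def find_empty_keys(config, prefix=""):
    """Return dotted paths of entries whose value is None or a blank string."""
    empty = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            empty.extend(find_empty_keys(value, prefix=f"{path}."))
        elif value is None or (isinstance(value, str) and not value.strip()):
            empty.append(path)
    return empty


def load_config(path="config.json"):
    """Load the config file and raise ValueError if any key is still blank."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = find_empty_keys(config)
    if missing:
        raise ValueError(f"Fill in these keys in {path}: {', '.join(missing)}")
    return config
```

If the sample file uses placeholder text rather than blank values, the emptiness test in `find_empty_keys` would need to be adapted to match it.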