Currently, when the scraper requests a page and the website is down or unreachable, the scraper fails. It should instead log an error and move on to the next website to scrape.
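A minimal sketch of that behaviour, assuming a Python scraper (the site list, function names, and use of the standard library's `urllib` here are illustrative, not the project's actual code):

```python
import logging
from urllib.request import urlopen
from urllib.error import URLError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")


def fetch(url, timeout=10):
    """Return the page body, or None if the site is down or unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (URLError, OSError) as exc:
        # Log the failure instead of letting the exception kill the run.
        logger.error("Could not fetch %s: %s", url, exc)
        return None


def scrape_all(urls):
    """Fetch every site, skipping any that could not be reached."""
    pages = {}
    for url in urls:
        html = fetch(url)
        if html is None:
            continue  # move on to the next website
        pages[url] = html
    return pages
```

The key point is that `fetch` converts a network failure into a logged error and a `None` return, so the loop in `scrape_all` simply skips the dead site.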
We currently scrape only the Standard Newspaper's stories, but we should also scrape other newsrooms that are covering financial, business, and budget stories.
Currently each news outlet has its own separate JSON file.
For the purpose of displaying the stories on https://taxclock.codeforkenya.org/, we should combine them into a single data/news.json, sorted by time and limited to 7 stories.