
siddeshsambasivam / newscastapi


Newscast API is a simple REST API to get you all the news articles for any given query word.

Home Page: https://newscastapi.readthedocs.io/en/latest/

License: GNU Lesser General Public License v2.1

Python 98.63% Dockerfile 1.37%
data-crawling data-preprocessing rest-api software-engineering

newscastapi's Introduction

NewscastAPI


NewscastAPI is a web service that returns news from various sources for a given query word.

The API provides the following data for each news article:

  1. Headline
  2. Source
  3. URL of the article
  4. Published timestamp
  5. Category
  6. Country
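
As a rough illustration, one article record could be modelled as follows. The class name `NewsArticle`, the field names, and the sample values are assumptions made for this sketch; the actual response schema is whatever the project documentation defines.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class NewsArticle:
    """Hypothetical shape of one article record (field names are assumed)."""
    headline: str
    source: str
    url: str
    published: datetime
    category: str
    country: str


# Sample values are invented purely for illustration.
article = NewsArticle(
    headline="Example headline",
    source="Example Source",
    url="https://example.com/article",
    published=datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc),
    category="technology",
    country="sg",
)

# Plain dict with one key per field listed above.
record = asdict(article)
```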

Applications: tracking the sentiment around a specific person in the news, searching for buzzwords, etc.

Inspiration

I was working on a personal project to track the sentiment of a given word across news articles and tweets, so I wanted an API to fetch all the news headlines for a given word.

Luckily, I found quite a few alternatives that provided exactly this service, but all of them were either expensive or placed a lot of restrictions on usage. So I decided to build something that does the job with acceptable performance.

Releases

  • 0.1.0
    • ADD: Google news crawler, Endpoint to access news.

Contributing

When contributing to this repository, please first discuss the change you wish to make via issue or email.

  1. Implement crawlers for new sites

newscastapi's People

Contributors

abhinav112, jwright707, siddeshsambasivam


newscastapi's Issues

Refactor: Parsing results dataframe

  • The results dataframe is converted to a list of records, in which each record is converted to a dict.
  • This operation is duplicated in many places in the get_results() function, so it could be factored out into a single reusable function.

News crawler not working

The news crawler is not working, so recent news is not being added to the database.

Potential cause: the MongoDB database has reached its maximum size.
Suggested solution: create a new collection whenever the limit is reached, and write a merge function that merges all collections during local caching.
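
The merge step of the suggested solution might look roughly like the sketch below. It uses in-memory lists of dicts as stand-ins for MongoDB collections and makes no database calls. The function name `merge_collections` and the assumption that each article is uniquely identified by a "url" key are hypothetical and not taken from the project.

```python
from typing import Dict, Iterable, List

Article = Dict[str, str]


def merge_collections(collections: Iterable[Iterable[Article]]) -> List[Article]:
    """Merge several article collections into one list, dropping duplicates.

    Assumes each article dict has a "url" key that identifies it uniquely;
    the first occurrence of a URL wins.
    """
    merged: Dict[str, Article] = {}
    for collection in collections:
        for article in collection:
            # setdefault keeps the article already stored for this URL.
            merged.setdefault(article["url"], article)
    return list(merged.values())


# Two stand-in collections with one overlapping article.
batch_1 = [{"url": "https://example.com/a", "headline": "A"}]
batch_2 = [
    {"url": "https://example.com/a", "headline": "A (duplicate)"},
    {"url": "https://example.com/b", "headline": "B"},
]
merged = merge_collections([batch_1, batch_2])
```

Keying on the URL keeps the merge linear in the total number of articles, which matters if the collections are merged on every cache refresh.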

Update README.md

The examples in the usage section could be shown as images rather than a code block, and their code could live in the examples folder.

Automated crawling for given query with zero results

Is your feature request related to a problem? Please describe.
A query that is not present in the database returns zero search results. This happens when the query has never been a trending or buzz topic at any point in time.

Describe the solution you'd like
Build a crawler that crawls for the given query when there are zero results.
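
One possible shape for this fallback is sketched below. The names `search_with_fallback`, `lookup`, and `crawl` are hypothetical; the database lookup and the crawler are passed in as plain callables, so the sketch does not assume anything about how the project implements either.

```python
from typing import Callable, Dict, List

Article = Dict[str, str]


def search_with_fallback(
    query: str,
    lookup: Callable[[str], List[Article]],
    crawl: Callable[[str], List[Article]],
) -> List[Article]:
    """Return stored results, crawling on demand only when none exist."""
    results = lookup(query)
    if results:
        return results
    # Nothing stored for this query: crawl now and return what was found.
    return crawl(query)


# Stubs standing in for the real database lookup and crawler.
stored = {"python": [{"headline": "Stored headline", "url": "https://example.com/stored"}]}


def lookup_stub(query: str) -> List[Article]:
    return stored.get(query, [])


def crawl_stub(query: str) -> List[Article]:
    return [{"headline": f"Fresh result for {query}", "url": "https://example.com/fresh"}]


hit = search_with_fallback("python", lookup_stub, crawl_stub)
miss = search_with_fallback("rare topic", lookup_stub, crawl_stub)
```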

Restructure src files

  • Move the app.py to the src folder
  • Create an __init__.py file in the root directory
  • Make necessary changes to Procfile (app:app -> src.app:app)

Perform Sentiment Analysis

Implement sentiment analysis for the news data

The sentiment value is significant for various analytical tasks; hence a sentiment analysis (SA) model has to be integrated into the backend.

As for the implementation, you can fork the project and work on a separate branch to create the inference code for the model, which will later be added to the backend by the maintainers.

  • Perform cost analysis
    • check the price of Google ML services
    • compare the price with the cost of hosting the model on AWS
    • make a decision on which option to implement

Please share the relevant results and information in this issue.

Provide data from other information sources

Is your feature request related to a problem? Please describe.
No. In addition to news articles, the API could provide tweets and Reddit threads.

Describe the solution you'd like
Create a new crawler which scrapes from various other sources

Provide context news

Feature: For a given news article, provide context by returning related articles from the past.

Naive Implementation:

  • convert news headlines to word embeddings
  • use KNN to find the top 5 relevant news articles (cosine similarity)
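
The second step of the naive implementation can be sketched as a brute-force nearest-neighbour search ranked by cosine similarity. The embedding step is out of scope here: the vectors below are hand-written toy values rather than real word embeddings, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_related(
    query_vec: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank every stored headline by similarity to the query and keep the top k."""
    scored = [
        (headline, cosine_similarity(query_vec, vec))
        for headline, vec in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings", invented for illustration only.
toy_embeddings = {
    "headline a": [1.0, 0.0],
    "headline b": [0.0, 1.0],
    "headline c": [1.0, 1.0],
}
related = top_k_related([1.0, 0.0], toy_embeddings, k=2)
```

Scanning every headline is fine for a small archive; a larger one would likely need an approximate nearest-neighbour index instead.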
