
yedhink / covid19-kerala-api-deprecated


Deprecated - A fast API service for retrieving day-to-day stats about the Coronavirus (COVID-19, SARS-CoV-2) outbreak in Kerala (India).

Home Page: https://covid19-kerala-api.herokuapp.com/

License: MIT License

Makefile 1.27% Go 45.62% Python 37.14% HTML 15.91% Procfile 0.06%
covid19 covid19-data covid-data coronavirus coronavirus-tracking coronavirus-real-time covid19india covid19kerala covid19dataindia covid19datakerala

covid19-kerala-api-deprecated's Introduction

COVID19 Kerala API

Note

The project is currently unmaintained because the source of the COVID data, published as PDFs, became highly inconsistent. Parsing PDFs whose layout changes on a daily basis requires many modifications to the existing scripts and a large time investment.

Why?

Manually collecting and updating the data from the PDF sources is time consuming and energy draining! Use this API to automatically retrieve the latest, as well as some of the older, COVID19 data specific to Kerala in JSON format for your applications.



Source

Currently the API automatically collects the data from http://dhs.kerala.gov.in/. This site provides reliable data in a very unreliable and inconsistent format, so the data for certain dates is still missing from the dataset. A solution for extracting data from some of the inconsistent PDFs is still being sought.

Usage

All you have to do is make a simple GET request to get the JSON.


API Details

The API currently has three endpoints: /api, /api/location and /api/timeline.

The data can be viewed from the browser by visiting say https://covid19-kerala-api.herokuapp.com/api or just use curl magic, sugar coated by jq to view a neat response:

curl "https://covid19-kerala-api.herokuapp.com/api" | jq

Notes

  • All JSON responses contain a success key, which denotes whether the user request was valid and whether it retrieved any data
  • All the timestamps in results follow ISO 8601
  • Sometimes the response might be slow, because Heroku shuts down its dynos after a certain interval of inactivity and has to restart them when a request arrives in that state
  • The JSON responses are not indented, for the sake of performance. All examples are shown in indented format for readability only
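The first two notes translate into two client-side checks: test the success key before using a response, and parse timestamps as ISO 8601. Below is a minimal Python sketch, assuming timestamps look like 2020-02-28T00:00:00Z as in the timeline example. The helper names parse_timestamp and check_response are hypothetical and not part of the API.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2020-02-28T00:00:00Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11,
    # so swap it for an explicit UTC offset to support older versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def check_response(body: str) -> dict:
    """Decode a JSON body and raise if 'success' is missing or false."""
    payload = json.loads(body)
    if not payload.get("success", False):
        raise ValueError("API reported an unsuccessful request")
    return payload
```

Raising on a false success value keeps the rest of the client from silently working on an empty or invalid result.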

API Endpoint

The /api endpoint serves all of the available data in the following JSON format (this is only a rough outline):

{
    "success": true | false, // whether the request is valid or not
    {oldest-timestamp} : {
        {district1}: {
            {cases-deaths-etc}: {int(cases)},
            ...,
            ...,
            ...,
            "other_districts": {
                {district} : {number_of_persons}
            }
        },
        {district2}:{...}
        ...,
        "total":{...}
    },
    ...,
    ...,
    ...,
    {latest-timestamp}: {
        {similar-to-above-entry-but-data-values-corresponds-to-timestamp}
    }
}

Example

curl "https://covid19-kerala-api.herokuapp.com/api" | jq
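Because the service is deprecated and may be offline, the following Python sketch works on a hand-written sample shaped like the rough format above rather than on a live response. The field name positive_cases, the numbers, and the helper names split_entries and latest_entry are all made up for illustration.

```python
# Hand-written sample; real responses contain more districts and fields.
SAMPLE = {
    "success": True,
    "2020-04-08T00:00:00Z": {
        "kasaragod": {"positive_cases": 3},
        "total": {"positive_cases": 5},
    },
    "2020-04-09T00:00:00Z": {
        "kasaragod": {"positive_cases": 4},
        "total": {"positive_cases": 7},
    },
}


def split_entries(payload: dict) -> dict:
    """Return only the timestamp -> districts entries, dropping 'success'."""
    return {key: value for key, value in payload.items() if key != "success"}


def latest_entry(payload: dict) -> tuple:
    """Return (timestamp, districts) for the most recent timestamp."""
    entries = split_entries(payload)
    # ISO 8601 timestamps in one format and one zone sort correctly as strings.
    latest = max(entries)
    return latest, entries[latest]
```

The success key sits at the same level as the timestamps, so it has to be filtered out before iterating over dates.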

Location Endpoint

The /api/location endpoint can serve a variety of data based on the query parameters that the user provides. The default response is an array of the possible location values acceptable by loc parameter:

curl "https://covid19-kerala-api.herokuapp.com/api/location" | jq

The parameters that are currently supported are loc (specify location) and date (specify date/timestamp).

Example


Loc

We can specify several locations to filter on by repeating the loc parameter:

curl "https://covid19-kerala-api.herokuapp.com/api/location?loc=kasaragod&loc=ernakulam" | jq

The above request provides the data pertaining to Kasaragod and Ernakulam districts from the oldest timestamp till latest.


Date

We can also filter using the date parameter, formatted as date={dd-mm-yyyy|dd/mm/yyyy}. The value may be prefixed with a < or > character to compare against the given date, and the keyword latest returns only the latest data.

Retrieve the data of all locations for the date 1st April 2020:

curl "https://covid19-kerala-api.herokuapp.com/api/location?date=01-04-2020" | jq

Retrieve the data from all locations with dates (timestamps) greater than 7th April 2020, up to the last updated date:

curl "https://covid19-kerala-api.herokuapp.com/api/location?date=>07-04-2020" | jq
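When building the date value in code, the formatting and the optional comparison prefix can be handled in one place. This is a minimal Python sketch; date_filter and date_query are hypothetical helper names, and it assumes the server decodes a percent-encoded > in the usual way.

```python
from datetime import date
from urllib.parse import quote


def date_filter(day: date, operator: str = "") -> str:
    """Format a date as dd-mm-yyyy, optionally prefixed with '<' or '>'."""
    if operator not in ("", "<", ">"):
        raise ValueError("operator must be '', '<' or '>'")
    return operator + day.strftime("%d-%m-%Y")


def date_query(value: str) -> str:
    """Return a 'date=...' query string with the value percent-encoded."""
    # quote() turns '>' into '%3E'; the server is assumed to decode it back.
    return "date=" + quote(value, safe="")
```

For example, date_query(date_filter(date(2020, 4, 7), ">")) produces the encoded form of the date=>07-04-2020 query used above, and date_query("latest") covers the latest keyword.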

Combination

We can also combine these parameters for querying specific entries:

Get the total summary from the latest data:

curl "https://covid19-kerala-api.herokuapp.com/api/location?loc=total&date=latest" | jq

Retrieving the data of Ernakulam and Kannur districts for all dates after 4th April 2020 till latest timestamp.

curl "https://covid19-kerala-api.herokuapp.com/api/location?date=>04-04-2020&loc=ernakulam&loc=kannur" | jq
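The combined query can also be assembled programmatically. The sketch below uses Python's urllib.parse.urlencode with doseq=True so that a list of locations becomes repeated loc parameters. The function name build_query is hypothetical, and it is assumed that the server accepts the percent-encoded form of >.

```python
from urllib.parse import urlencode

BASE_URL = "https://covid19-kerala-api.herokuapp.com/api/location"


def build_query(locations=(), date_value=None) -> str:
    """Build a /api/location URL from optional loc and date filters."""
    params = {}
    if date_value is not None:
        params["date"] = date_value
    if locations:
        # doseq=True expands the list into loc=a&loc=b.
        params["loc"] = list(locations)
    query = urlencode(params, doseq=True)
    return f"{BASE_URL}?{query}" if query else BASE_URL
```

Calling build_query(["ernakulam", "kannur"], ">04-04-2020") yields the same filters as the last curl command, with > encoded as %3E.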

Timeline Endpoint

The /api/timeline endpoint serves the timeline of the number of cases in each district [WIP]. An example request and the response format:

curl "https://covid19-kerala-api.herokuapp.com/api/timeline" | jq
{
    "success": true,
    "total_no_of_positive_cases_admitted": {
        "latest": 256,
        "timeline": {
            "2020-02-28T00:00:00Z": 0,
            "...": 1,
            ...,
            ...,
            ...,
            {latest-timestamp}: 256
        }
    }
}
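A timeline in this shape is easy to post-process on the client. The following Python sketch computes the change between consecutive timeline points on a hand-written sample. Only the first value (0) and the latest value (256) come from the example above; the middle entry, the later timestamps, and the helper name daily_changes are made up for illustration.

```python
# Hand-written sample following the response format shown above.
SAMPLE = {
    "success": True,
    "total_no_of_positive_cases_admitted": {
        "latest": 256,
        "timeline": {
            "2020-02-28T00:00:00Z": 0,
            "2020-02-29T00:00:00Z": 1,
            "2020-03-01T00:00:00Z": 256,
        },
    },
}


def daily_changes(timeline: dict) -> list:
    """Return (timestamp, change) pairs between consecutive timeline points."""
    # Sorting by the ISO 8601 key puts the points in chronological order.
    points = sorted(timeline.items())
    return [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(points, points[1:])
    ]
```

The result has one entry fewer than the timeline, since the oldest point has nothing to be compared against.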

Contributing

This is a general idea of the structure I have used. I'll happily accept new contributions and ideas. Before making your PR, check out the existing issues and claim one, or raise a new one, and follow the contribution guidelines.

Libraries

Golang

  • gin - highly performant web framework
  • favicon - middleware for gin
  • cron - scheduler
  • jsoniter - just faster json encode/decode
  • soup - scraper(a bs4 clone)
  • color - rainbow puke

Python

  • pdftotext - not the best, but still provides a layout
  • jsonpickle - encode/decode objects into json

Running

i 'gnu' that make is gods own creation, the moment i laid my hands on it

Start off by installing the go and python packages - only needs to be done the first time:-

make init

The python script is invoked from within the gin-server. Therefore activate the pipenv shell first:-

pipenv shell

Then run the server (note that the executable will be stored in bin/):-

To run in production mode

make build

To run in development mode (i.e. gin server will be in development mode)

make run

Once everything is set up, running make build or make run (as required) from the project root restarts the server every time.


Project Structure

├── bin-------------------------------->covid19keralaapi executable
├── cmd
│   └── covid19keralaapi
│       └── main.go-------------------->Entry point and initialization of all pkgs
├── data
│   ├── 09-04-2020.pdf----------------->Latest pdf data collected
│   └── data.json---------------------->Latest json data extracted from the above pdf
├── go.mod---------------------------| 
├── go.sum---------------------------|->Go Modules tracker
├── internal--------------------------->Pkgs for internal use only
│   ├── date
│   │   └── date.go-------------------->All date related functions and validations
│   ├── controller
│   │   └── controller.go-------------->Deserialization, Timeline Generation, Location Array Gen
│   ├── logger
│   │   └── logger.go------------------>Custom logging for all pkgs
│   ├── model
│   │   └── model.go------------------->Primarily for all json unmarshalling
│   ├── scheduler
│   │   └── scheduler.go--------------->Scheduling of scraper,downloader and exec python script
│   ├── scraper
│   │   └── scraper.go----------------->Interface to scrape any website with limited attrs
│   ├── server
│   │   ├── no_route_handler.go-----|-->handles invalid routes
│   │   ├── api_handler.go----------|-->'/api' serves the server.JsonData.All.Data
│   │   ├── api_location_handler.go-|-->'/api/location' filters based on loc and date params
│   │   ├── api_timeline_handler.go-|-->'/api/timeline' serves the TimeLine struct
│   │   ├── root_handler.go---------|-->'/' endpoint renders the html frontpage
│   │   └── server.go---------------|-->Server running, allotting handlers to url
│   ├── storage
│   │   └── storage.go----------------->PDF,json filenames, deletion of old pdf file
│   └── website
│       ├── error.go------------------->Custom error handler while scraping the netzz
│       └── website.go----------------->Implements scraper functions and downloads latest data
├── Makefile--------------------------->For easier building and running
├── readme.md
├── scripts
│   ├── extract-text-data.py----------->Messy script - converts pdf to json
│   ├── Pipfile
│   └── Pipfile.lock
└── web
    ├── index.html--------------------->Frontpage
    └── assets/------------------------>No css yet. Just favicons

License

Use this repo in the name of Freeeeeedommmmmmmm!! and open source. or this would do - license

covid19-kerala-api-deprecated's People

Contributors

dependabot[bot], mohitrajane, yedhink

covid19-kerala-api-deprecated's Issues

The python script to extract json from pdf data is a mess - should it be refactored? Alternative libraries?

Currently the script works pretty well. But since the PDFs that we get from DHS Kerala are very inconsistent, I had to use a lot of hacks to make the script work as intended. Also, the pdftotext library used is not very effective on all PDFs (for example, the PDF of date 02/12/2020). So what are your thoughts on refactoring the whole script? What alternative libraries are there to extract tabular data from PDFs in a much better way? Or is manually adding the data for missing dates the only solution (which destroys the purpose of this repo, but still)?

Extract data from annexure-2 table - will provide number of deaths,number discharged and an awesome timeline.

Currently important stats like number of deaths, number of discharged are missing from our response data since annex-2 table hasn't been extracted.

Extracting data from annex-2 table will be a challenge. But the data from it can help to create a timeline data from starting till latest dates with number of deaths per district, total number of deaths, number of people discharged per district, number of total discharged etc.
