1x-eng / chart_data_extractor

This web service scrapes data out of the chart(s) presented on any given website. (At the moment, I only support scraping from HighCharts and AmCharts. Other libraries, maybe next time.)

License: MIT License

Python 100.00%
scraper scraper-engine python-scraper python-chart-scraper chart-data-extractor chart-data charts highcharts amcharts amcharts-js-charts highcharts-js beautifulsoup python-web-scraper hacktoberfest hacktoberfest2018

chart_data_extractor's Introduction

Chart Data Extractor - SDK | API

NOTE: This service uses gunicorn (https://docs.gunicorn.org/en/stable/index.html), which is a WSGI HTTP server for *nix systems. On Windows, you might want to swap gunicorn for uWSGI or another alternative.

Features:

  • REST services for extracting data via URL.
  • Simple to get started.

Getting Started:

  • Clone this repo, then `cd chart_data_extractor`.
  • Run `pip install -r requirements.txt` in a Python 3 environment of your choice.
  • Start the service:
gunicorn -b localhost:8000 scraper_service:app --threads 3 --reload
  • To extract data from a (supported) chart, request a URL like this one:
http://localhost:8000/v1/chartDataExtractor?targetUrl=http://www.google.com
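
The same request can be made from Python. The sketch below is only an illustration: it assumes the service is running locally on port 8000 as started above, and the helper names `build_extractor_url` and `fetch_chart_data` are hypothetical, not part of this repo. Since the response format is not documented here, the body is returned as raw text. Percent-encoding `targetUrl` matters once the target page has its own query string.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the Getting Started example; host and port are assumptions
# that match the gunicorn command shown there.
SERVICE_ENDPOINT = "http://localhost:8000/v1/chartDataExtractor"


def build_extractor_url(target_url, endpoint=SERVICE_ENDPOINT):
    """Return the service URL with targetUrl percent-encoded as a query parameter."""
    query = urlencode({"targetUrl": target_url})
    return f"{endpoint}?{query}"


def fetch_chart_data(target_url, timeout=30):
    """Call the running service and return the raw response body as text.

    Assumes the response is UTF-8 text; the actual payload format is not
    documented in this README.
    """
    with urlopen(build_extractor_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no running service.
print(build_extractor_url("http://www.google.com"))
```

With the service running, `fetch_chart_data("http://www.google.com")` would perform the actual request.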

MIT License

Copyright (c) 2018 Pruthvi Kumar

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

chart_data_extractor's People

Contributors

1x-eng, renovate-bot

chart_data_extractor's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Detected dependencies

pip_requirements
chart_data_extractor/requirements.txt
  • requests ==2.27.1
  • pygogo ==1.3.0
  • beautifulsoup4 ==4.11.1
  • demjson ==2.2.4
  • falcon ==3.1.0
  • falcon_cors ==1.1.7
  • selenium ==4.1.3

FileNotFoundError

Hi, I've upgraded demjson to demjson4 because I had some dependency issues and it wasn't working for me. Now when I run the code I get this error:

[2023-10-19 15:38:32 +0200] [33433] [INFO] Starting gunicorn 21.2.0
[2023-10-19 15:38:32 +0200] [33433] [INFO] Listening at: http://127.0.0.1:8000 (33433)
[2023-10-19 15:38:32 +0200] [33433] [INFO] Using worker: gthread
[2023-10-19 15:38:32 +0200] [33434] [INFO] Booting worker with pid: 33434
[2023-10-19 15:38:32 +0200] [33434] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 95, in init_process
    super().init_process()
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
                ^^^^^^^^^^^^^^^
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
                    ^^^^^^^^^^^
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/aishahalane/chart_data_extractor/chart_data_extractor/scraper_service.py", line 18, in <module>
    app = falcon.API(middleware=[cors.middleware, PersistApiCalls()])
                                                  ^^^^^^^^^^^^^^^^^
  File "/Users/aishahalane/chart_data_extractor/chart_data_extractor/middlewares/logCapture.py", line 22, in __init__
    low_hdlr=gogo.handlers.file_hdlr('./logs/pk_scraperService_middleware.log'),
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aishahalane/chart_data_extractor/extract/lib/python3.11/site-packages/pygogo/handlers.py", line 107, in file_hdlr
    return logging.FileHandler(filename, **fkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1181, in __init__
    StreamHandler.__init__(self, self._open())
                                 ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/logging/__init__.py", line 1213, in _open
    return open_func(self.baseFilename, self.mode,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/Users/aishahalane/chart_data_extractor/chart_data_extractor/logs/pk_scraperService_middleware.log'
[2023-10-19 15:38:32 +0200] [33434] [INFO] Worker exiting (pid: 33434)
[2023-10-19 15:38:32 +0200] [33433] [ERROR] Worker (pid:33434) exited with code 3
[2023-10-19 15:38:32 +0200] [33433] [ERROR] Shutting down: Master
[2023-10-19 15:38:32 +0200] [33433] [ERROR] Reason: Worker failed to boot.
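
A note on the likely cause, offered as a reading of the traceback rather than a confirmed diagnosis: the failure happens inside `logging.FileHandler`, which creates the log file if needed but does not create missing parent directories. The path `./logs/pk_scraperService_middleware.log` is relative, so the `logs` folder has to exist in the directory gunicorn is started from. Creating that folder by hand before starting the service should be enough. Alternatively, the handler setup could create it first, as in this minimal sketch. The helper name `make_file_handler` is hypothetical and uses the standard library directly instead of pygogo's `file_hdlr`.

```python
import logging
from pathlib import Path


def make_file_handler(log_path):
    """Create the parent directory if it is missing, then open a FileHandler.

    Hypothetical helper for illustration; the repo itself builds its handler
    through pygogo, which ends up calling logging.FileHandler.
    """
    path = Path(log_path)
    # FileHandler creates the file but not the directories above it.
    path.parent.mkdir(parents=True, exist_ok=True)
    return logging.FileHandler(path)
```

Calling `make_file_handler('./logs/pk_scraperService_middleware.log')` from the service directory would create `logs/` on first run instead of raising `FileNotFoundError`.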
