
TickerScrape

Scrape the universe of exchange-traded security tickers.

TickerScrape is a package for scraping financial security ticker data. It leverages Scrapy.

Every publicly traded security for every asset class is scraped to a SQL database using SQLAlchemy. The ORM is configured to create database tables mapping securities to asset classes, countries, industries, and exchanges. It also creates relationships between countries and currencies, as well as between industries and sectors (based on NAICS codes). The securities table has columns for fundamental data, metadata, accounting ratios, and analyst estimates.

Country metadata such as the ISO 3166 code, continent, territory status, region, economic grouping, and geopolitical grouping is pulled from a local CSV file. The country table also has empty columns for economic data such as GDP. Currency metadata such as the symbol, ISO 4217 code, ticker, and minor unit is pulled from a local CSV file. The currency table also has empty columns for economic data such as interest rates.
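The table layout described above can be sketched with SQLAlchemy's declarative ORM. The model and column names below are illustrative assumptions, not TickerScrape's actual schema:

```python
# Minimal sketch of the ORM layout described above (illustrative names,
# not TickerScrape's actual schema).
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Currency(Base):
    __tablename__ = "currencies"
    id = Column(Integer, primary_key=True)
    iso_4217 = Column(String(3))          # metadata pulled from a local CSV
    interest_rate = Column(Float)         # empty column for economic data
    countries = relationship("Country", back_populates="currency")

class Country(Base):
    __tablename__ = "countries"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    iso_3166 = Column(String(2))          # metadata pulled from a local CSV
    gdp = Column(Float)                   # empty column for economic data
    currency_id = Column(Integer, ForeignKey("currencies.id"))
    currency = relationship("Currency", back_populates="countries")

class Security(Base):
    __tablename__ = "securities"
    id = Column(Integer, primary_key=True)
    ticker = Column(String, nullable=False)
    pe_ratio = Column(Float)              # ratios, estimates, etc.
    country_id = Column(Integer, ForeignKey("countries.id"))
    country = relationship("Country")

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    usd = Currency(iso_4217="USD")
    us = Country(name="United States", iso_3166="US", currency=usd)
    session.add(Security(ticker="AAPL", country=us))
    session.commit()
    # Follow the security -> country -> currency relationships:
    print(session.query(Security).first().country.currency.iso_4217)  # USD
```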

The repository can be found at: https://github.com/Saran33/TickerScrape

To install from git:

pip install git+git://github.com/Saran33/TickerScrape.git

or clone the repository:

git clone https://github.com/Saran33/TickerScrape.git

Dependencies

TickerScrape requires Docker, Splash, and this fork of Aquarium to scrape websites that render content in JavaScript.

  1. After installing TickerScrape with pip, download Docker at the above link.
  2. As per the above Splash installation docs, pull the Splash image with:
Linux:
$ sudo docker pull scrapinghub/splash
OS X / Windows:
$ docker pull scrapinghub/splash
  3. Start the container:
Linux:
$ sudo docker run -it -p 8050:8050 --rm scrapinghub/splash

(Splash is now available at 0.0.0.0 on port 8050 (http).)

OS X / Windows:
$ docker run -it -p 8050:8050 --rm scrapinghub/splash

(Splash is now available at 0.0.0.0 on port 8050 (http).)

  • Alternatively, use the Docker Desktop app. Splash is found under the 'Images' tab. Hover over it and click 'Run'. Under additional settings, name the container 'splash' and select a port such as 8050. Click 'Run' and switch on the container before running Scrapy. Switch it off afterwards.

  • In a browser, enter localhost:8050 (or whichever port you chose), and you should see that Splash is working.

  • The other dependencies will be installed automatically, and you can run TickerScrape as normal.

  4. Aquarium creates multiple Splash instances behind an HAProxy load balancer, in order to distribute parallel Scrapy requests across a Splash Docker cluster. The instances collaborate to render a given website. It may be necessary for preventing 504 (timeout) errors on some sites. It also speeds up the scraping of JavaScript pages and can facilitate Tor proxies. To install Aquarium, navigate to your home directory and run the command:
cookiecutter gh:Saran33/aquarium

Choose the default settings (or whatever suits): set splash_version to latest, set a user and password, and set Tor to 0.
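If you are configuring the Scrapy side of the Splash connection yourself, the scrapy-splash settings look roughly like this. The user and password values are assumptions standing in for whatever you chose in the cookiecutter prompts:

```python
# settings.py — scrapy-splash configuration (per the scrapy-splash docs).
# Replace user:password with the credentials you set for Aquarium/HAProxy.
SPLASH_URL = "http://user:password@localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```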

  5. a. To start the container (without Aquarium):
Linux:

$ sudo docker run -it --restart always -p 8050:8050 scrapinghub/splash

(Splash is now available at 0.0.0.0 on port 8050 (http).)

OS X / Windows:

$ docker run -it --restart always -p 8050:8050 scrapinghub/splash

(Splash is now available at 0.0.0.0 on port 8050 (http).)

  • Alternatively, use the Docker Desktop app. Splash is found in the 'Images' tab. Hover over it and click 'Run'. Under additional settings, name the container 'splash' and select a port such as 8050. Click 'Run'.
  • In a browser, enter localhost:8050 (or whichever port you chose) and you should see Splash.
  • The other dependencies will be installed automatically, and you can run TickerScrape as normal.
  5. b. Or, to start the Splash cluster with Aquarium:

Go to the new aquarium folder and start the Splash cluster:

cd ./aquarium
docker-compose up

In a browser window, visit http://localhost:8050/ to verify that Splash is working. To see the stats of the cluster, visit http://localhost:8036/

To run TickerScrape (download every security for every country):

  1. Navigate to the outer directory of TickerScrape.
  2. Open a terminal and run:
python3 TickerScrape.py 

To scrape U.S. stocks only:

scrapy crawl mw_stocks -a country=us

To run TickerScrape GUI:

python3 TickerScrape_gui.py
  3. The default settings save the tickers to a local SQLite database (this can be changed in settings.py). The DB can be read via SQL queries such as:
sqlite3 TickerScrape.db
.tables
.schema stocks
.schema bonds
select * from stocks limit 3;
.quit

Alternatively, the DB can be opened in DB Browser for SQLite.
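The same queries can also be run from Python with the standard-library sqlite3 module. The snippet below uses an in-memory database with a simplified, stand-in schema (the real stocks table has many more columns):

```python
import sqlite3

# In-memory stand-in for TickerScrape.db, with a simplified stocks schema
# (the real table has many more columns).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stocks (ticker TEXT, name TEXT, country TEXT)")
con.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?)",
    [("AAPL", "Apple Inc.", "US"), ("MSFT", "Microsoft Corp.", "US")],
)

# Equivalent of: select * from stocks limit 3;
rows = con.execute("SELECT * FROM stocks LIMIT 3").fetchall()
for row in rows:
    print(row)
```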

To save the scraped data to a CSV as well as the DB, run:

scrapy crawl marketwatch -o output.csv -t csv
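Alternatively, the CSV export can be made permanent with the FEEDS setting in the project's settings.py (Scrapy 2.1 and later), so the -o/-t flags are not needed on every run:

```python
# settings.py — persistent feed export, equivalent to `-o output.csv -t csv`:
FEEDS = {
    "output.csv": {"format": "csv"},
}
```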
