
This project is a fork of eliangcs/pystock-crawler.


Crawl and parse financial reports (XBRL) from SEC EDGAR, and daily stock prices from Yahoo

License: MIT



pystock-crawler


pystock-crawler is a utility for crawling historical data of US stocks, including:

  • Ticker symbols listed on NYSE, NASDAQ and AMEX
  • Daily prices from Yahoo Finance
  • Fundamentals from 10-Q and 10-K filings (XBRL) on SEC EDGAR

Example Output

NYSE ticker symbols:

DDD   3D Systems Corporation
MMM   3M Company
WBAI  500.com Limited
...

Apple's daily prices:

symbol,date,open,high,low,close,volume,adj_close
AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77
...

Google's fundamentals:

symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
...
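
The CSV output shown above can be loaded with Python's standard csv module. A minimal sketch (read_reports is an illustrative helper, not part of pystock-crawler; the sample row is copied from the example output above):

```python
import csv
import io

# Header and one row copied from the example output above.
SAMPLE = """\
symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
"""

def read_reports(fileobj):
    """Yield one dict per CSV row, parsing a few numeric columns to float."""
    for row in csv.DictReader(fileobj):
        for key in ('revenues', 'net_income', 'eps_basic', 'eps_diluted'):
            row[key] = float(row[key])
        yield row

rows = list(read_reports(io.StringIO(SAMPLE)))
print(rows[0]['symbol'], rows[0]['eps_diluted'])  # GOOG 4.66
```

In real use you would pass an open file handle for out.csv instead of the inline sample.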

Installation

Prerequisites:

  • Python 2.7

pystock-crawler is based on Scrapy, so you will also need to install prerequisites such as lxml and libffi for Scrapy and its dependencies. On Ubuntu, for example, you can install them like this:

sudo apt-get update
sudo apt-get install -y gcc python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev build-essential

See Scrapy's installation guide for more details.

After installing prerequisites, you can then install pystock-crawler with pip:

(sudo) pip install pystock-crawler

Quickstart

Example 1. Fetch Google's and Yahoo's daily prices ordered by date:

pystock-crawler prices GOOG,YHOO -o out.csv --sort

Example 2. Fetch daily prices of all companies listed in ./symbols.txt:

pystock-crawler prices ./symbols.txt -o out.csv

Example 3. Fetch Facebook's fundamentals during 2013:

pystock-crawler reports FB -o out.csv -s 20130101 -e 20131231

Example 4. Fetch fundamentals of all companies in ./nyse.txt and direct the log to ./crawling.log:

pystock-crawler reports ./nyse.txt -o out.csv -l ./crawling.log

Example 5. Fetch all ticker symbols in NYSE, NASDAQ and AMEX:

pystock-crawler symbols NYSE,NASDAQ,AMEX -o out.txt

Usage

Type pystock-crawler -h to see command help:

Usage:
  pystock-crawler symbols <exchanges> (-o OUTPUT) [-l LOGFILE] [-w WORKING_DIR]
                                      [--sort]
  pystock-crawler prices <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                   [-l LOGFILE] [-w WORKING_DIR] [--sort]
  pystock-crawler reports <symbols> (-o OUTPUT) [-s YYYYMMDD] [-e YYYYMMDD]
                                    [-l LOGFILE] [-w WORKING_DIR]
                                    [-b BATCH_SIZE] [--sort]
  pystock-crawler (-h | --help)
  pystock-crawler (-v | --version)

Options:
  -h --help       Show this screen
  -o OUTPUT       Output file
  -s YYYYMMDD     Start date [default: ]
  -e YYYYMMDD     End date [default: ]
  -l LOGFILE      Log output [default: ]
  -w WORKING_DIR  Working directory [default: .]
  -b BATCH_SIZE   Batch size [default: 500]
  --sort          Sort the result

There are three commands available:

  • pystock-crawler symbols grabs ticker symbol lists
  • pystock-crawler prices grabs daily prices
  • pystock-crawler reports grabs fundamentals

<exchanges> is a comma-separated string specifying the stock exchanges you want to include. Currently, NYSE, NASDAQ and AMEX are supported.

The output file of pystock-crawler symbols can be used as the <symbols> argument of the pystock-crawler prices and pystock-crawler reports commands.

<symbols> can be an inline comma-separated string or a text file that lists one symbol per line. For example, the inline string can be something like AAPL,GOOG,FB, and the text file may look like this:

# This line is a comment
AAPL    Put anything you want here
GOOG    Since the text here is ignored
FB
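
The convention above (only the first whitespace-separated token on each line counts; # lines and blank lines are skipped) can be sketched in a few lines of Python. This is an illustration of the file format, not the crawler's actual parser:

```python
def parse_symbols(text):
    """Extract ticker symbols from a symbols file: keep the first token on
    each line, skipping blank lines and lines starting with '#'."""
    symbols = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        symbols.append(line.split()[0])
    return symbols

sample = """\
# This line is a comment
AAPL    Put anything you want here
GOOG    Since the text here is ignored
FB
"""
print(parse_symbols(sample))  # ['AAPL', 'GOOG', 'FB']
```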

Use -o to specify the output file. For the pystock-crawler symbols command, the output is a plain text file; for pystock-crawler prices and pystock-crawler reports, it is CSV.

-l specifies where the crawling logs go. If not specified, logs go to stdout.

By default, the crawler uses the current directory as the working directory. If you don't want to use the current directory, specify another with the -w option. The crawler keeps an HTTP cache in a directory named .scrapy under the working directory. The cache saves time by avoiding re-downloading the same web pages, but it can grow quite large. If you don't need it, just delete the .scrapy directory once you're done crawling.
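
If you want to see how much space the cache takes before deleting it, a small sketch (dir_size is a hypothetical helper; the .scrapy path follows the description above):

```python
import os
import shutil

def dir_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

cache = os.path.join('.', '.scrapy')
if os.path.isdir(cache):
    print('HTTP cache uses %.1f MB' % (dir_size(cache) / 1e6))
    shutil.rmtree(cache)  # delete the cache when you no longer need it
```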

The -b option is only available to the pystock-crawler reports command. It lets you split a large symbol list into smaller batches. This is actually a workaround for an unresolved bug (#2). Normally you don't have to specify this option; the default value (500) works just fine.
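
Conceptually, batching is plain list slicing; a sketch of the idea (batches is an illustrative name, not the crawler's internal API):

```python
def batches(symbols, batch_size=500):
    """Split a symbol list into consecutive chunks of at most batch_size."""
    for i in range(0, len(symbols), batch_size):
        yield symbols[i:i + batch_size]

syms = ['S%d' % n for n in range(1200)]
print([len(b) for b in batches(syms)])  # [500, 500, 200]
```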

The rows in the output file are in arbitrary order by default. Use the --sort option to sort them by symbol and date. Avoid --sort on a large output file, though, because it is slow and consumes a lot of memory.
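
Because dates are in ISO YYYY-MM-DD format, sorting by (symbol, date) is a plain lexicographic sort, so you can also sort the CSV yourself after crawling. A sketch using sample rows from the price output above (sort_prices is an illustrative helper):

```python
import csv
import io

# Rows copied from the daily-price example output, deliberately unsorted.
PRICES = """\
symbol,date,open,high,low,close,volume,adj_close
AAPL,2014-04-28,572.80,595.75,572.55,594.09,23890900,594.09
AAPL,2014-04-25,564.53,571.99,563.96,571.94,13922800,571.94
AAPL,2014-04-24,568.21,570.00,560.73,567.77,27092600,567.77
"""

def sort_prices(fileobj):
    """Return (header, rows) with rows sorted by (symbol, date)."""
    reader = csv.reader(fileobj)
    header = next(reader)
    rows = sorted(reader, key=lambda row: (row[0], row[1]))
    return header, rows

header, rows = sort_prices(io.StringIO(PRICES))
print([r[1] for r in rows])  # ['2014-04-24', '2014-04-25', '2014-04-28']
```

Note that this loads every row into memory, which is exactly why --sort is slow and memory-hungry on large files.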

Developer Guide

Installing Dependencies

pip install -r requirements.txt

Running Tests

Install test requirements:

pip install -r requirements-test.txt

Then run the test:

py.test

This downloads the test data (a lot of XML/XBRL files) from SEC EDGAR on the fly, so it takes some time and disk space. The test data is saved to the pystock_crawler/tests/sample_data directory and can be reused the next time you run the test. If you don't need it, just delete the sample_data directory.

Contributors

  • eliangcs