elijas / sec-downloader

A better version of sec-edgar-downloader.

Home Page: https://elijas.github.io/sec-downloader/

License: MIT License

sec-downloader

A better version of sec-edgar-downloader. Includes an alternative implementation (a wrapper instead of a fork), to keep compatibility with new sec-edgar-downloader releases. This library partially uses nbdev.

Features

Advantages over sec-edgar-downloader:

Flexibility in Download Process

  • Tailored for choosing what, where, and how to download.
  • Files stored in memory for faster operations and no unnecessary disk clutter.

Separate Metadata and File Downloads

  • Easily skip unneeded files.
  • Download metadata first, then selectively download files.
  • Option to save metadata for better organization.

More Input Options

  • Ticker or CIK (e.g., AAPL, 0000320193) for latest filings.
  • Accession Number (e.g., 0000320193-23-000077). Not supported in sec-edgar-downloader.
  • SEC EDGAR URL (e.g., https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm). Not supported in sec-edgar-downloader.

Install

pip install sec-downloader

How to use

Download the metadata

Note: The company name and email address are used to form a user-agent string that adheres to SEC EDGAR’s fair access policy for programmatic downloading.

from sec_downloader import Downloader

dl = Downloader("MyCompanyName", "email@example.com")  # placeholder values; use your own name and email
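To illustrate the note above, here is a minimal sketch of how a company name and contact email could be combined into a single user-agent value. The helper name build_user_agent is hypothetical and is not part of sec-downloader, which builds its own header internally; the exact string the library sends may differ.

```python
def build_user_agent(company_name: str, email: str) -> str:
    """Hypothetical helper: join a company name and a contact email into one string."""
    company_name = company_name.strip()
    email = email.strip()
    # Very light validation only; this is not a full email check.
    if not company_name or "@" not in email:
        raise ValueError("expected a non-empty company name and an email address")
    return f"{company_name} {email}"


# Placeholder values, matching the style of the example above.
headers = {"User-Agent": build_user_agent("MyCompanyName", "email@example.com")}
print(headers)
```

A header dictionary like this is what an HTTP client would typically attach to each request made to SEC EDGAR.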

Find a filing with an Accession Number

metadatas = dl.get_filing_metadatas("AAPL/0000320193-23-000077")
print(metadatas)
[FilingMetadata(accession_number='0000320193-23-000077',
                form_type='10-Q',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230701.htm',
                items='',
                primary_doc_description='10-Q',
                filing_date='2023-08-04',
                report_date='2023-07-01',
                cik='0000320193',
                company_name='Apple Inc.',
                tickers=[Ticker(symbol='AAPL', exchange='Nasdaq')])]

Alternatively, you can use any of these to get the same answer:

metadatas = dl.get_filing_metadatas("aapl/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/000032019323000077")
metadatas = dl.get_filing_metadatas("320193/0000320193-23-000077")
metadatas = dl.get_filing_metadatas("0000320193/0000320193-23-000077")
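from sec_downloader.types import CompanyAndAccessionNumber  # assumption: defined alongside RequestedFilings

metadatas = dl.get_filing_metadatas(CompanyAndAccessionNumber(ticker_or_cik="320193", accession_number="0000320193-23-000077"))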
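The equivalent inputs above differ only in zero-padding of the CIK and in whether the accession number contains dashes. The following sketch shows one way such inputs could be brought to a canonical form. The function names are hypothetical and this is not the library's own parsing code.

```python
import re


def normalize_accession_number(raw: str) -> str:
    """Return the dashed 10-2-6 digit form of an accession number."""
    digits = raw.strip().replace("-", "")
    if not re.fullmatch(r"\d{18}", digits):
        raise ValueError(f"not an 18-digit accession number: {raw!r}")
    return f"{digits[:10]}-{digits[10:12]}-{digits[12:]}"


def normalize_cik(raw: str) -> str:
    """Zero-pad a numeric CIK to 10 digits."""
    digits = raw.strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {raw!r}")
    return digits.zfill(10)


print(normalize_accession_number("000032019323000077"))
print(normalize_cik("320193"))
```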

Find the filing matching a SEC EDGAR Filing URL. Only CIK and Accession Number are used from the URL:

metadatas = dl.get_filing_metadatas(
    "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
)
print(metadatas)
[FilingMetadata(accession_number='0001193125-23-272204',
                form_type='8-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm',
                items='2.02,9.01',
                primary_doc_description='8-K',
                filing_date='2023-11-07',
                report_date='2023-11-04',
                cik='0001067983',
                company_name='BERKSHIRE HATHAWAY INC',
                tickers=[Ticker(symbol='BRK-B', exchange='NYSE'),
                         Ticker(symbol='BRK-A', exchange='NYSE')])]

Alternatively, you can use URLs in other formats and get the same answer:

metadatas = dl.get_filing_metadatas("https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm")
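Since only the CIK and Accession Number are taken from the URL, both URL shapes shown above resolve to the same filing. The sketch below extracts the two values with the standard library. parse_edgar_url is a hypothetical name, and the regular expression is an assumption based on the two example URLs, not the library's actual implementation.

```python
import re
from urllib.parse import parse_qs, urlparse

# CIK directory followed by an 18-digit accession number without dashes.
_ARCHIVE_PATH = re.compile(r"/Archives/edgar/data/(\d+)/(\d{18})/")


def parse_edgar_url(url: str) -> tuple[str, str]:
    """Return (zero-padded CIK, dashed accession number) from an EDGAR URL."""
    parsed = urlparse(url)
    # Viewer links of the form /ix?doc=... carry the archive path in the "doc" parameter.
    doc = parse_qs(parsed.query).get("doc", [""])[0]
    match = _ARCHIVE_PATH.search(doc or parsed.path)
    if match is None:
        raise ValueError(f"no CIK/accession number found in {url!r}")
    cik, accession = match.groups()
    return cik.zfill(10), f"{accession[:10]}-{accession[10:12]}-{accession[12:]}"


viewer_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001067983/000119312523272204/d564412d8k.htm"
archive_url = "https://www.sec.gov/Archives/edgar/data/1067983/000119312523272204/d564412d8k.htm"
print(parse_edgar_url(viewer_url))
print(parse_edgar_url(archive_url))
```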

Find latest filings by company ticker or CIK:

from sec_downloader.types import RequestedFilings

metadatas = dl.get_filing_metadatas(
    RequestedFilings(ticker_or_cik="MSFT", form_type="10-K", limit=2)
)
print(metadatas)
[FilingMetadata(accession_number='0000950170-23-035122',
                form_type='10-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm',
                items='',
                primary_doc_description='10-K',
                filing_date='2023-07-27',
                report_date='2023-06-30',
                cik='0000789019',
                company_name='MICROSOFT CORP',
                tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')]),
 FilingMetadata(accession_number='0001564590-22-026876',
                form_type='10-K',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm',
                items='',
                primary_doc_description='10-K',
                filing_date='2022-07-28',
                report_date='2022-06-30',
                cik='0000789019',
                company_name='MICROSOFT CORP',
                tickers=[Ticker(symbol='MSFT', exchange='Nasdaq')])]

Alternatively, you can use any of these to get the same answer:

metadatas = dl.get_filing_metadatas("2/msft/10-K")
metadatas = dl.get_filing_metadatas("2/789019/10-K")
metadatas = dl.get_filing_metadatas("2/0000789019/10-K")

The parameters limit and form_type are optional. If omitted, limit defaults to 1, and form_type defaults to ‘10-Q’.

metadatas = dl.get_filing_metadatas("NFLX")
print(metadatas)
[FilingMetadata(accession_number='0001065280-23-000273',
                form_type='10-Q',
                primary_doc_url='https://www.sec.gov/Archives/edgar/data/1065280/000106528023000273/nflx-20230930.htm',
                items='',
                primary_doc_description='10-Q',
                filing_date='2023-10-20',
                report_date='2023-09-30',
                cik='0001065280',
                company_name='NETFLIX INC',
                tickers=[Ticker(symbol='NFLX', exchange='Nasdaq')])]

Alternatively, you can use any of these to get the same answer:

metadatas = dl.get_filing_metadatas("nflx")
metadatas = dl.get_filing_metadatas("1/NFLX")
metadatas = dl.get_filing_metadatas("NFLX/10-Q")
metadatas = dl.get_filing_metadatas("1/NFLX/10-Q")
metadatas = dl.get_filing_metadatas(RequestedFilings(ticker_or_cik="NFLX"))
metadatas = dl.get_filing_metadatas(RequestedFilings(limit=1, ticker_or_cik="NFLX", form_type="10-Q"))
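To make the defaults concrete, here is a sketch of how the "limit/company/form_type" shorthand strings could be mapped to the same three values. FilingQuery, parse_query, and the small KNOWN_FORM_TYPES set are all hypothetical and exist only for this illustration; the library's real parser may use different rules, and strings containing an accession number are out of scope here.

```python
from dataclasses import dataclass

# Illustrative subset only; used to tell "NFLX/10-Q" apart from "1/NFLX".
KNOWN_FORM_TYPES = {"10-K", "10-Q", "8-K"}


@dataclass(frozen=True)
class FilingQuery:
    ticker_or_cik: str
    form_type: str = "10-Q"  # default stated in the text above
    limit: int = 1           # default stated in the text above


def parse_query(text: str) -> FilingQuery:
    """Parse 'company', 'limit/company', 'company/form' or 'limit/company/form'."""
    parts = text.strip().split("/")
    if len(parts) == 1:
        return FilingQuery(ticker_or_cik=parts[0].upper())
    if len(parts) == 2:
        first, second = parts
        if second.upper() in KNOWN_FORM_TYPES:
            return FilingQuery(ticker_or_cik=first.upper(), form_type=second.upper())
        return FilingQuery(ticker_or_cik=second.upper(), limit=int(first))
    if len(parts) == 3:
        limit, company, form_type = parts
        return FilingQuery(company.upper(), form_type.upper(), int(limit))
    raise ValueError(f"unrecognized query: {text!r}")


for query in ("nflx", "1/NFLX", "NFLX/10-Q", "1/NFLX/10-Q", "2/msft/10-K"):
    print(query, "->", parse_query(query))
```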

Download the HTML files

After obtaining the Primary Document URL, for example from the metadata, you can proceed to download the HTML using this URL.

for metadata in metadatas:
    html = dl.download_filing(url=metadata.primary_doc_url).decode()
    print(html[:50])
    break  # same for all filings, let's just print the first one
'<?xml version="1.0" ?><!--XBRL Document Created wi'
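Because metadata and files are fetched separately, the metadata list can be filtered before any document is downloaded. The sketch below uses a stand-in class, FilingStub, that carries only three of the FilingMetadata fields shown earlier, so it runs without network access. FilingStub and select_filings are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FilingStub:
    """Stand-in with a few of the fields printed in the metadata examples."""
    form_type: str
    filing_date: str  # ISO "YYYY-MM-DD", as in the printed metadata
    primary_doc_url: str


def select_filings(metadatas, form_type: str, filed_on_or_after: date):
    """Keep filings of one form type filed on or after the given date."""
    return [
        m for m in metadatas
        if m.form_type == form_type
        and date.fromisoformat(m.filing_date) >= filed_on_or_after
    ]


filings = [
    FilingStub("10-K", "2023-07-27",
               "https://www.sec.gov/Archives/edgar/data/789019/000095017023035122/msft-20230630.htm"),
    FilingStub("10-K", "2022-07-28",
               "https://www.sec.gov/Archives/edgar/data/789019/000156459022026876/msft-10k_20220630.htm"),
]

wanted = select_filings(filings, "10-K", date(2023, 1, 1))
for metadata in wanted:
    # With real metadata, this URL is what would be passed to dl.download_filing(url=...).
    print(metadata.primary_doc_url)
```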

Alternative implementation: Wrapper

Files are downloaded to a temporary folder, immediately read into memory, and then deleted. Let’s demonstrate how to download a single file (latest 10-Q filing details in HTML format) to memory. The “glob” pattern is used to select which files are read to memory.

from sec_edgar_downloader import Downloader as SecEdgarDownloader
from sec_downloader.download_storage import DownloadStorage

ONLY_HTML = "**/*.htm*"

storage = DownloadStorage(filter_pattern=ONLY_HTML)
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)  # placeholder name and email
    dl.get("10-Q", "AAPL", limit=1, download_details=True)
# all files are now deleted and only stored in memory

content = storage.get_file_contents()[0].content
print(f"{content[:50]}...")
"<?xml version='1.0' encoding='ASCII'?>\n<html xmlns..."

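To see which files a pattern such as "**/*.htm*" selects, the sketch below builds a throwaway directory tree and applies the same pattern with pathlib. It assumes pathlib-style recursive glob semantics; how DownloadStorage applies filter_pattern internally is not shown in this README, and the file names used here are only illustrative.

```python
import tempfile
from pathlib import Path

ONLY_HTML = "**/*.htm*"


def matching_files(root: Path, pattern: str) -> list[str]:
    """Return the relative POSIX paths of files under root that match the pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    filing_dir = root / "sec-edgar-filings" / "AAPL" / "10-Q" / "0000320193-23-000077"
    filing_dir.mkdir(parents=True)
    (filing_dir / "full-submission.txt").write_text("<SEC-DOCUMENT>")
    (filing_dir / "primary-document.html").write_text("<html></html>")
    matched = matching_files(root, ONLY_HTML)

# Only the .html file matches; the .txt submission is skipped.
print(matched)
```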
Downloading multiple documents:

storage = DownloadStorage()
with storage as path:
    dl = SecEdgarDownloader("MyCompanyName", "email@example.com", path)  # placeholder name and email
    dl.get("10-K", "GOOG", limit=2)
# all files are now deleted and only stored in memory

for path, content in storage.get_file_contents():
    print(f"Path: {path}\nContent [len={len(content)}]: {content[:30]}...\n")
('Path: sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt\n'
 'Content [len=13927595]: <SEC-DOCUMENT>0001652044-24-00...\n')
('Path: sec-edgar-filings/GOOG/10-K/0001652044-23-000016/full-submission.txt\n'
 'Content [len=15264470]: <SEC-DOCUMENT>0001652044-23-00...\n')
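The paths printed above encode the company, form type, and accession number, so they can be split back into those parts when organizing several documents. This is a sketch that assumes the forward-slash layout shown in the output; FilingPathInfo and parse_filing_path are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FilingPathInfo(NamedTuple):
    ticker_or_cik: str
    form_type: str
    accession_number: str
    file_name: str


def parse_filing_path(path) -> FilingPathInfo:
    """Split '<root>/<company>/<form>/<accession>/<file>' into its last four parts."""
    parts = PurePosixPath(str(path)).parts
    if len(parts) < 4:
        raise ValueError(f"unexpected path layout: {path!r}")
    return FilingPathInfo(*parts[-4:])


info = parse_filing_path(
    "sec-edgar-filings/GOOG/10-K/0001652044-24-000022/full-submission.txt"
)
print(info)
```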

Contributing

Follow these steps to install the project locally for development:

  1. Install the project with the command pip install -e ".[dev]".

Note: We highly recommend using virtual environments for Python development. To use one, follow these steps instead:

  • Create a virtual environment: python3 -m venv .venv
  • Activate the virtual environment: source .venv/bin/activate
  • Install the project: pip install -e ".[dev]"

sec-downloader's Issues

Need instructions for installing `sec_downloader`

The import statement for sec_downloader in the "How to use" section will not work because instructions for installing sec_downloader are not provided. This results in:

ModuleNotFoundError: No module named 'sec_downloader'

Two ways to handle this:

Either add pip install sec_downloader in the README file's "Getting Started" section.

(or)

Add a line with sec-downloader = "^0.2.3" in pyproject.toml's [tool.poetry.dependencies] instead of just adding it in dev dependencies. This will make sure that the import statement will work with just pip install sec-ai
