dgunning / edgartools

Python library for working with SEC Edgar

License: MIT License

Python 98.25% HTML 1.75%

edgartools's Introduction

The world's easiest, most powerful edgar library

Features

🧠 Intuitive and easy to use: edgartools has a super simple API that is easy to use.
🛠️ Works as a library or a CLI: You can use edgartools as a library in your code or as a CLI tool.
📁 Access any SEC filing: You can access any SEC filing since 1994.
📅 List filings for any date range: List filings by year, by quarter, or by a date range, e.g. 2024-02-29:2024-03-15
🌟 Best looking edgar library: Uses rich library to display SEC Edgar data in a beautiful way.
🔄 Page through filings: Use filings.next() and filings.previous() to page through filings
🏗️ Build Data Pipelines: Build data pipelines by finding, filtering, transforming and saving filings
✅ Select a filing: You can select a filing from the list of filings.
📄 View the filing as HTML or text: Find a filing then get the content as HTML or text.
🔢 Chunk filing text: You can chunk the filing text into sections for vector embedding.
🔍 Preview the filing: You can preview the filing in the terminal or a notebook.
🔎 Search through a filing: You can search through a filing for a keyword.
📊 Parse XBRL: If a filing has XBRL, you can parse it to a dataframe.
💾 Data Objects: Automatically downloads and parses filings into data objects.
📥 Download any attachment: You can download any attachment from the filing.
🕒 Automatic throttling: Automatically throttles requests to Edgar to avoid being blocked.
📥 Bulk downloads: Faster batch processing through bulk downloads of filings and facts
🔢 Get company by ticker or CIK: Get a company by ticker, Company("SNOW"), or by CIK, Company(1640147)
📚 Get company filings: You can get all the company's historical filings using company.get_filings()
📈 Get company facts: You can get company facts using company.get_facts()
💰 Company Financials: You can get company financials using company.financials
🔍 Lookup Ticker by CUSIP: You can lookup a ticker by CUSIP
📑 Dataset of SEC entities: You can get a dataset of SEC companies and persons
📈 Fund Reports: Search for and get 13F-HR fund reports
👤 Insider Transactions: Search for and get insider transactions

Getting started

Install using pip

pip install edgartools

Import and start using

from edgar import *

# Tell the SEC who you are
set_identity("Michael Mccallum [email protected]")

filings = get_filings()

Key Concepts

How do I find a filing?

It depends on what you know.

A. I know the accession number

filing = find("0001065280-23-000273")

B. I know the company ticker or cik

filings = Company("NFLX").get_filings(form="10-Q").latest(1)

C. Show me a list of filings

filings = get_filings(form="10-Q")
filing = filings[0]
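Which route to take depends on the kind of identifier you have. The sketch below is a hypothetical helper, not part of edgartools, that distinguishes an accession number from a CIK and a ticker. It assumes that accession numbers use the dashed 10-2-6 digit layout seen in the examples above and that CIKs are purely numeric strings of up to 10 digits.

```python
import re

# Accession number: 10 digits identifying the submitter, a 2-digit year and a
# 6-digit sequence, e.g. "0001065280-23-000273"
ACCESSION_RE = re.compile(r"\d{10}-\d{2}-\d{6}")
# CIK: purely numeric, up to 10 digits, e.g. "1640147" or "0000320193"
CIK_RE = re.compile(r"\d{1,10}")


def classify_identifier(value: str) -> str:
    """Return 'accession', 'cik' or 'ticker' for a user-supplied identifier.

    Hypothetical helper for illustration; not part of edgartools.
    """
    value = value.strip()
    if ACCESSION_RE.fullmatch(value):
        return "accession"
    if CIK_RE.fullmatch(value):
        return "cik"
    # Anything else is treated as a ticker symbol
    return "ticker"


for ident in ["0001065280-23-000273", "0000320193", "NFLX"]:
    print(ident, "->", classify_identifier(ident))
```

An accession number can then be passed to find(), while a CIK or a ticker can be passed to Company().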

What can I do with a filing

You can view it in the terminal or open it in the browser, get the filing as html, xml or text, and download attachments. You can extract data from the filing into a data object.

What can I do with a company

You can get the company's filings, facts and financials.

How to use edgartools

Task	Code
Set your EDGAR identity in Linux/Mac	`export EDGAR_IDENTITY="First Last [email protected]"`
Set your EDGAR identity in Windows Command Prompt	`set "EDGAR_IDENTITY=First Last [email protected]"`
Set identity in Windows PowerShell	`$env:EDGAR_IDENTITY="First Last [email protected]"`
Set identity in Python	`set_identity("First Last [email protected]")`
Importing the library	`from edgar import *`

Working with filings

Task	Code
Get filings for the year to date	`filings = get_filings()`
Get only xbrl filings	`filings = get_filings(index="xbrl")`
Get filings for a specific year	`filings = get_filings(2020)`
Get filings for a specific quarter	`filings = get_filings(2020, 1)`
Get filings for multiple years	`filings = get_filings([2020, 2021])`
Get filings for a range of years	`filings = get_filings(year=range(2010, 2020))`
Get filings for a specific form	`filings = get_filings(form="10-K")`
Get filings for a list of forms	`filings = get_filings(form=["10-K", "10-Q"])`
Show the next page of filings	`filings.next()`
Show the previous page of filings	`filings.prev()`
Get the first n filings	`filings.head(20)`
Get the last n filings	`filings.tail(20)`
Get the latest n filings by date	`filings.latest(20)`
Get a random sample of the filings	`filings.sample(20)`
Filter filings on a date	`filings = filings.filter(date="2020-01-01")`
Filter filings between dates	`filings.filter(date="2020-01-01:2020-03-01")`
Filter filings before a date	`filings.filter(date=":2020-03-01")`
Filter filings after a date	`filings.filter(date="2020-03-01:")`
Get filings as a pandas dataframe	`filings.to_pandas()`
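The date strings accepted by filter() follow a `start:end` convention, and either side can be omitted. The hypothetical parse_date_range helper below is not the library's implementation. It shows one way to interpret that syntax with the standard library, and it assumes both endpoints are inclusive.

```python
from datetime import date
from typing import Optional, Tuple


def parse_date_range(spec: str) -> Tuple[Optional[date], Optional[date]]:
    """Parse 'YYYY-MM-DD', 'start:end', ':end' or 'start:' into (start, end).

    A single date becomes a one-day range. A missing endpoint is None,
    meaning the range is open on that side. Hypothetical helper.
    """
    if ":" not in spec:
        day = date.fromisoformat(spec)
        return day, day
    start_str, end_str = spec.split(":", 1)
    start = date.fromisoformat(start_str) if start_str else None
    end = date.fromisoformat(end_str) if end_str else None
    if start is not None and end is not None and start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


def in_range(day: date, spec: str) -> bool:
    """True if `day` falls inside the inclusive range described by `spec`."""
    start, end = parse_date_range(spec)
    return (start is None or day >= start) and (end is None or day <= end)


print(parse_date_range("2020-01-01:2020-03-01"))
print(in_range(date(2020, 2, 15), ":2020-03-01"))  # True
print(in_range(date(2020, 2, 15), "2020-03-01:"))  # False
```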

Working with a filing

Task	Code
Get a single filing	`filing = filings[3]`
Get a filing by accession number	`filing = get_by_accession_number("0000320193-20-34576")`
Get the filing homepage	`filing.homepage`
Open a filing in the browser	`filing.open()`
Open the filing homepage in the browser	`filing.homepage.open()`
View the filing in the terminal	`filing.view()`
Get the html of the filing document	`filing.html()`
Get the XBRL of the filing document	`filing.xbrl()`
Get the filing document as markdown	`filing.markdown()`
Get the full submission text of a filing	`filing.text()`
Get and parse the data object of a filing	`filing.obj()`
Get the filing attachments	`filing.attachments`
Get a single attachment	`attachment = filing.attachments[0]`
Open an attachment in the browser	`attachment.open()`
Download an attachment	`content = attachment.download()`

Working with a company

Task	Code
Get a company by ticker	`company = Company("AAPL")`
Get a company by CIK	`company = Company("0000320193")`
Get company facts	`company.get_facts()`
Get company facts as a pandas dataframe	`company.get_facts().to_pandas()`
Get company filings	`company.get_filings()`
Get company filings by form	`company.get_filings(form="10-K")`
Get a company filing by accession_number	`company.get_filing(accession_number="0000320193-21-000139")`
Get the company's financials	`company.financials`
Get the company's balance sheet	`company.financials.balance_sheet`
Get the company's income statement	`company.financials.income_statement`
Get the company's cash flow statement	`company.financials.cash_flow_statement`

Installation

pip install edgartools

Usage

Set your Edgar user identity

Before you can access the SEC Edgar API, you need to set the identity that you will use to access Edgar. This is usually your name and email, or a company name and email, for example:

Sample Company Name AdminContact@<sample company domain>.com

The user identity is sent in the User-Agent string and the Edgar API will refuse to respond to your request without it.

EdgarTools looks for an environment variable called EDGAR_IDENTITY and uses it in each request, so you need to set this environment variable before using the library.
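To make the User-Agent requirement concrete, here is a standard-library sketch of what a direct request to EDGAR would carry, assuming the identity is read from EDGAR_IDENTITY. It only builds the request object and never sends it. build_edgar_request is a hypothetical name, not an edgartools function, and the identity string is a placeholder.

```python
import os
import urllib.request


def build_edgar_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request whose User-Agent is the EDGAR identity.

    Hypothetical helper; edgartools adds the identity to its own requests.
    """
    identity = os.environ.get("EDGAR_IDENTITY")
    if not identity:
        raise RuntimeError("Set EDGAR_IDENTITY before making requests to EDGAR")
    return urllib.request.Request(url, headers={"User-Agent": identity})


# Placeholder identity, in the "Name email" form described above
os.environ["EDGAR_IDENTITY"] = "Sample Company Name AdminContact@example.com"
req = build_edgar_request("https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent")
# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
```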

Setting EDGAR_IDENTITY in Linux/Mac

export EDGAR_IDENTITY="Michael Mccallum [email protected]"

Setting EDGAR_IDENTITY in Windows Powershell

$Env:EDGAR_IDENTITY="Michael Mccallum [email protected]"

Alternatively, you can call set_identity, which does the same thing.

from edgar import set_identity
set_identity("Michael Mccallum [email protected]")

For more detail see https://www.sec.gov/os/accessing-edgar-data

Importing edgar

from edgar import *

Using the Filing API

Use the Filing API when you are not working with a specific company, but want to get a list of filings.

For details on how to use the Filing API, see Using the Filing API.

Using the Company API

With the Company API you can find a company by ticker or CIK, and get the company's filings, facts and financials.

# Wrap the chain in parentheses so it can span several lines
tenq = (Company("AAPL")
        .get_filings(form="10-Q")
        .latest(1)
        .obj())

See Using the Company API

Viewing and downloading attachments

Every filing has a list of attachments, which you can view using filing.attachments.

# View the attachments
filing.attachments

You can access each attachment using the bracket operator [] and the index of the attachment.

# Get the first attachment
attachment = filing.attachments[0]

You can download an attachment using attachment.download(). This downloads the attachment into memory as a string or as bytes.
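Because download() can return either a string or bytes, code that saves the result should handle both. For example, writing a string to a file opened in "wb" mode raises a TypeError. Here is a minimal sketch in which save_content is a hypothetical helper and UTF-8 is an assumed encoding for text content.

```python
import tempfile
from pathlib import Path
from typing import Union


def save_content(content: Union[str, bytes], path: Union[str, Path]) -> Path:
    """Write downloaded attachment content to `path`, whatever its type.

    Hypothetical helper; assumes text content should be stored as UTF-8.
    """
    path = Path(path)
    if isinstance(content, bytes):
        path.write_bytes(content)
    else:
        path.write_text(content, encoding="utf-8")
    return path


# Demo with in-memory content written to a temporary directory
demo_dir = Path(tempfile.mkdtemp())
text_file = save_content("<html></html>", demo_dir / "doc.htm")
binary_file = save_content(b"%PDF-1.4", demo_dir / "doc.pdf")

# With a real filing (needs network access and an identity):
# save_content(filing.attachments[0].download(), "attachment.htm")
```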

Automatic parsing of filing data

The main reason you may want to download attachments is to get at the information in data files. For example, 13F-HR filings have an attached infotable.xml file containing the holdings data for that filing.

Fortunately, the library handles this for you. For several form types, calling filing.obj() automatically downloads and parses the data files into a data object. The following forms are currently supported:

Form	Data Object	Description
10-K	`TenK`	Annual report
10-Q	`TenQ`	Quarterly report
8-K	`EightK`	Current report
MA-I	`MunicipalAdvisorForm`	Municipal advisor initial filing
Form 144	`Form144`	Notice of proposed sale of securities
C, C-U, C-AR, C-TR	`FormC`	Form C Crowdfunding Offering
D	`FormD`	Form D Offering
3,4,5	`Ownership`	Ownership reports
13F-HR	`ThirteenF`	13F Holdings Report
NPORT-P	`FundReport`	Fund Report
EFFECT	`Effect`	Notice of Effectiveness
Any other filing with XBRL	`FilingXbrl`

For example, to get the data object for a 13F-HR filing you can do the following:

filings = get_filings(form="13F-HR")
filing = filings[0]
thirteenf = filing.obj()

If you call obj() on a filing that does not have a data file, then it will return None.
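You can think of obj() as a lookup from a filing's form type to a data object class, with None returned when nothing matches. The sketch below models that lookup with a plain dictionary built from the table above. It is a hypothetical illustration that returns class names as strings and does not reproduce edgartools' actual dispatch code. The has_xbrl flag stands in for the "any other filing with XBRL" row.

```python
from typing import Optional

# Form type -> data object class name, taken from the table above
FORM_TO_OBJECT = {
    "10-K": "TenK",
    "10-Q": "TenQ",
    "8-K": "EightK",
    "MA-I": "MunicipalAdvisorForm",
    "144": "Form144",  # listed as "Form 144" in the table
    "C": "FormC", "C-U": "FormC", "C-AR": "FormC", "C-TR": "FormC",
    "D": "FormD",
    "3": "Ownership", "4": "Ownership", "5": "Ownership",
    "13F-HR": "ThirteenF",
    "NPORT-P": "FundReport",
    "EFFECT": "Effect",
}


def data_object_name(form: str, has_xbrl: bool = False) -> Optional[str]:
    """Return the data object class name for `form`, or None if unsupported."""
    name = FORM_TO_OBJECT.get(form.strip().upper())
    if name is None and has_xbrl:
        # Fallback for other filings that carry XBRL
        return "FilingXbrl"
    return name


print(data_object_name("13F-HR"))  # ThirteenF
print(data_object_name("S-1"))     # None
```

Whatever the dispatch looks like internally, check the result of filing.obj() for None before using it.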

Working with XBRL filings

Some filings are in XBRL (eXtensible Business Reporting Language) format. These are mainly newer filings, because the SEC has started requiring XBRL for them.

If a filing is in XBRL format, you have many more ways to get structured data about that specific filing and about the company it refers to.

The Filing class has an xbrl() method that downloads, parses and structures the filing's XBRL document if one exists. If it does not exist, filing.xbrl() returns None.

filing.xbrl() returns a FilingXbrl instance, which wraps the data and provides convenient ways of working with it.

filing_xbrl = filing.xbrl()

Financials

Some filings, notably 10-K and 10-Q filings, contain financial statements in XBRL format. You can get the financials from the XBRL data using the Financials class.

from edgar.financials import Financials
financials = Financials.from_xbrl(filing.xbrl())
financials.balance_sheet
financials.income_statement
financials.cash_flow_statement

You can also get them automatically through the TenK and TenQ data objects.

Here is an example that gets the latest Apple financials:

tenk = Company("AAPL").get_filings(form="10-K").latest(1).obj()
financials = tenk.financials
financials.balance_sheet

Get the financial data as a pandas dataframe

Each of the financial statements (BalanceSheet, IncomeStatement and CashFlowStatement) has a to_dataframe() method that returns the data as a pandas dataframe.

balance_sheet_df = financials.balance_sheet.to_dataframe()

Downloading Edgar Data

The library is designed to make real time calls to EDGAR to get the latest data. However, you may want to download data for offline use or to build a dataset.

Download Bulk Company Data

You can download all the company submissions and facts from Edgar using the download_edgar_data function. This stores a JSON file of facts and of submissions for each company, but it does not include the actual HTML or other attachments. It does, however, dramatically speed up loading companies by CIK or ticker.

The submissions and facts bulk data files are each over 1 GB in size, and each takes a few minutes to download. By default the data is stored in the ~/.edgar directory. You can change this by setting the EDGAR_LOCAL_DATA_DIR environment variable.
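The default location and its override fit in a few lines. The sketch below encodes the rule described above. It assumes the variable holds a directory path and that a leading `~` should be expanded. local_data_dir is a hypothetical name, not a library function.

```python
import os
from pathlib import Path


def local_data_dir() -> Path:
    """Return $EDGAR_LOCAL_DATA_DIR if it is set, otherwise ~/.edgar.

    Hypothetical helper mirroring the documented default and override.
    """
    override = os.environ.get("EDGAR_LOCAL_DATA_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".edgar"


print(local_data_dir())
```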

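```python
# Signature of download_edgar_data, shown for reference:
# def download_edgar_data(submissions: bool = True, facts: bool = True):
#     """
#     Download all the company data from Edgar
#     :param submissions: Download all the company submissions
#     :param facts: Download all the company facts
#     """

# Usage: download both the submissions and the facts bulk files
download_edgar_data()
```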

Using Bulk Data

If you want edgartools to use the bulk data files, call use_local_storage() before you start making calls with the library. Alternatively, set EDGAR_USE_LOCAL_DATA to True in your environment.

Downsides of using bulk data

The filings downloaded for each company are limited to the most recent 1000.
You will need to download the latest data periodically to keep it up to date.
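Because the bulk files are snapshots, it helps to know when they are out of date. The following is a minimal sketch that judges freshness by a file's modification time. is_stale, the seven-day default and the demo file name are illustrative choices, not edgartools behavior.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Union

SECONDS_PER_DAY = 24 * 60 * 60


def is_stale(path: Union[str, Path], max_age_days: float = 7.0) -> bool:
    """True if `path` is missing or was last modified more than max_age_days ago."""
    path = Path(path)
    if not path.exists():
        return True
    age_seconds = time.time() - path.stat().st_mtime
    return age_seconds > max_age_days * SECONDS_PER_DAY


# Demo on a throwaway file ("facts.json" is a made-up name)
demo_file = Path(tempfile.mkdtemp()) / "facts.json"
demo_file.write_text("{}")
fresh_result = is_stale(demo_file)  # just written, so not stale
thirty_days_ago = time.time() - 30 * SECONDS_PER_DAY
os.utime(demo_file, (thirty_days_ago, thirty_days_ago))
aged_result = is_stale(demo_file)  # backdated 30 days, so stale
print(fresh_result, aged_result)  # False True
```

When a file you rely on is stale, re-run download_edgar_data().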

Downloading Attachments

You can download a filing's attachments using the download method on filing.attachments. This saves all the attached files to a folder of your choice.

from pathlib import Path
from typing import Union


# Signature of Attachments.download, shown for reference (the class body is elided)
class Attachments:

    ...

    def download(self, path: Union[str, Path], archive: bool = False):
        """
        Download all the attachments to a specified path.
        If the path is a directory, each file is saved with its original name in that directory.
        If the path is a file, the file is saved with the given path name.
        If archive is True, the attachments are saved in a zip file.

        path: str or Path - The path to save the attachments
        archive: bool (default False) - If True, save the attachments in a zip file
        """
        ...


# Usage: save all of a filing's attachments into a directory of your choice
path = Path("filing_attachments")
path.mkdir(exist_ok=True)
filing.attachments.download(path)
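The docstring for Attachments.download describes three cases: a directory path, a file path and archive mode. The sketch below reproduces those rules with the standard library for an in-memory mapping of file names to contents. save_attachments and its input format are assumptions for illustration, not the library's code. The sketch also assumes that a path counts as a directory only when it already exists as one, and that archive mode writes a single zip file at the given path.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List, Mapping, Union


def save_attachments(files: Mapping[str, bytes], path: Union[str, Path],
                     archive: bool = False) -> List[Path]:
    """Save attachments to a directory, a single file, or a zip archive.

    Hypothetical helper: `files` maps each attachment's original file name
    to its content.
    """
    path = Path(path)
    if archive:
        # Archive mode: one zip file at `path`; entries keep their original names
        path.parent.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for name, data in files.items():
                zf.writestr(name, data)
        return [path]
    if path.is_dir():
        # Directory: each attachment is saved under its original name
        saved_paths = []
        for name, data in files.items():
            target = path / name
            target.write_bytes(data)
            saved_paths.append(target)
        return saved_paths
    if len(files) == 1:
        # File path: a single attachment is saved under the given name
        (data,) = files.values()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return [path]
    raise ValueError("pass a directory or archive=True when saving several attachments")


demo_dir = Path(tempfile.mkdtemp())
attachments = {"primary.htm": b"<html></html>", "ex99-1.htm": b"<p>Press release</p>"}
saved = save_attachments(attachments, demo_dir)
zip_path = save_attachments(attachments, demo_dir / "all.zip", archive=True)[0]
with zipfile.ZipFile(zip_path) as zf:
    print(sorted(zf.namelist()))
```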

Contributing

Contributions are welcome! We would love to hear your thoughts on how this library could be better at working with SEC Edgar.

Reporting Issues

We use GitHub issues to track public bugs. Report a bug by opening a new issue; it's that easy!

Making code changes

Fork the repo and create your branch from master.
If you've added code that should be tested, add tests.
If you've changed APIs, update the documentation.
Ensure the test suite passes.
Make sure your code lints.
Issue that pull request!

License

edgartools is distributed under the terms of the MIT license.

edgartools's Issues

Partially wrong parsing of Apple's 10-K for 2016

Python version: 3.12.2 (main, Mar 12 2024, 08:01:18) [GCC 12.2.0]
edgartools version: 2.21.1

Minimum Reproducible Example

from edgar import Company, set_identity


if __name__ == "__main__":
    set_identity("Ilias Antonopoulos ([email protected])")

    filings = Company("AAPL").get_filings(form="10-K", filing_date="2016-01-01:2016-12-31")

    filing = filings.get(0).obj()

    print(filing.financials.income_statement)
    print(filing.financials.income_statement.to_dataframe())

yields the following:

That is, apart from the missing Net Sales, some numerical values are also incorrect:

Gross Profit
Net Income
Earnings Per Share

Upon closer examination, it seems that the parser may have "picked up" the values not from the CONSOLIDATED STATEMENTS OF OPERATIONS but from Note 12 – Selected Quarterly Financial Information (Unaudited) (see screenshot below).

P.S. This may also happen with 10-Ks from other fiscal years prior to 2021 (based on my findings, 2021 and later are parsed correctly).
P.P.S. Keep up the great work, this library is a true gift to the community <3

How to download the submission file

I really appreciate that you created such an awesome library.

Here is one issue we came across:

After fetching the attachments, I need to download the last text file. Here is my code:

with open("my_file.txt", "wb") as binary_file:
    submission_txt = filing.attachments[14].download()
    # Write bytes to file
    binary_file.write(submission_txt)

It raised an error. How can I resolve this issue?

[bug/feature] fetch latest filings

Hi,

I'm trying to see whether it is possible to fetch the latest data via edgartools.
For example, I was looking at https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=4&owner=only&count=40&action=getcurrent
and trying to get the latest records with their exact accepted date and time. However, edgartools' most recent records seem to go up only to "yesterday", not "today". I also tried edgar.get_by_accession_number but could not find a specific record from today.

Would it be possible to achieve "live" records fetching using edgartools?

Thank you for the great tool.

Display multiple financial statements

I am trying to print the financial statements of quarterly or annual reports. However, the API only allows printing one at a time. Is there a way to show the information from, say, the last 3 statements in a single table?

For example
Company("AAPL").get_filings(form="10-K", date="2021-01-01:").latest(1).obj()

The code above prints only one statement. Can the API consolidate all the requested filings and show them in table format?

10-Q filings Facts to pandas dataframe

First of all, this is really amazing! I am enjoying using it, and it makes finding filings much easier than the SEC EDGAR website.

Is there a way to convert the latest 10-Q filing data to a pandas dataframe?

from edgar import *
set_identity("first last [email protected]")
company = Company(909832)
last10q = company.get_filings(form="10-Q").latest()
last10q.xbrl()

filing financials: some facts are missing when converting income statement to dataframe

Python version: 3.12.2 (main, Mar 12 2024, 08:01:18) [GCC 12.2.0]
edgartools version: 2.21.0

Minimum Reproducible Example

from edgar import Company, set_identity


if __name__ == "__main__":
    set_identity("Ilias Antonopoulos ([email protected])")

    filings = Company("AAPL").get_filings(form="10-K", filing_date="2023-01-01:2023-12-31")

    filing = filings.get(0).obj()

    print(filing.financials.income_statement)
    print(filing.financials.income_statement.to_dataframe())

I have noticed that, although the facts present in filing.financials.income_statement are correct, when compared with the corresponding income statement table from SEC, the facts:

Total Net Sales
Cost Goods and Services Sold
Selling General and Administrative Expenses (a bit more minor)

are missing from the .to_dataframe() version of it (see screenshot below).

Feature: get the filing URLs from a given Filing

Hi!

More of a feature request, although this may already be supported; I may just be missing it.

If I have a Filing, it would be great to be able to get the URL links of the filing (screenshot) programmatically, as well:

How can I programmatically get the URL https://sec.gov/Archives/edgar/data/789019/0000950170-23-035122-index.html from the filing?

Here is the code I used to get the filing:

company = Company("MSFT")
company.get_filings(accession_number="0000950170-23-035122")[0]
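Until a dedicated property exists, you can assemble the index URL from the CIK and the accession number. The helper follows the same pattern as the URL in the question. build_index_url is a hypothetical helper, and the filing.homepage property listed in the README tables may already cover this need.

```python
from typing import Union


def build_index_url(cik: Union[int, str], accession_number: str) -> str:
    """Return the filing index URL, following the pattern in the question.

    Hypothetical helper; the CIK is written without leading zeros.
    """
    return f"https://sec.gov/Archives/edgar/data/{int(cik)}/{accession_number}-index.html"


print(build_index_url(789019, "0000950170-23-035122"))
# https://sec.gov/Archives/edgar/data/789019/0000950170-23-035122-index.html
```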

import error

Import error

from edgar import *
This worked well 2 days ago. Now I get this:

ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from edgar import *

File ~\anaconda3\Lib\site-packages\edgar\__init__.py:9
6 from functools import partial
7 from typing import Optional, Union, List
----> 9 from edgar.entities import (Company,
10 CompanyData,
11 CompanyFacts,
12 CompanySearchResults,
13 CompanyFilings,
14 CompanyFiling,
15 Entity,
16 EntityData,
17 find_company,
18 get_entity,
19 get_company_facts,
20 get_company_tickers,
21 get_entity_submissions,
22 get_ticker_to_cik_lookup,
23 get_cik_lookup_data)
24 from edgar._filings import (Filing,
25 Filings,
26 FilingHeader,
(...)
32 get_by_accession_number,
33 FilingHomepage)
34 from edgar.core import (edgar_mode,
35 CRAWL,
36 CAUTION,
37 NORMAL,
38 get_identity,
39 set_identity)

File ~\anaconda3\Lib\site-packages\edgar\entities.py:17
14 from rich.panel import Panel
15 from rich.text import Text
---> 17 from edgar._filings import Filing, Filings, FilingsState
18 from edgar._rich import df_to_rich_table, repr_rich
19 from rich import box

File ~\anaconda3\Lib\site-packages\edgar\_filings.py:36
33 from rich.text import Text
35 from edgar._markdown import html_to_markdown, text_to_markdown
---> 36 from edgar._party import Address
37 from edgar._rich import df_to_rich_table, repr_rich
38 from edgar._xbrl import FilingXbrl

File ~\anaconda3\Lib\site-packages\edgar\_party.py:12
10 from edgar._rich import repr_rich
11 from edgar._xml import child_text, child_value
---> 12 from edgar.core import IntString
14 __all__ = [
15 'Address',
16 'Issuer',
(...)
20 'get_addresses_as_columns'
21 ]
24 class Address(BaseModel):

File ~\anaconda3\Lib\site-packages\edgar\core.py:20
18 import pyarrow as pa
19 import pyarrow.compute as pc
---> 20 from charset_normalizer import detect
21 from fastcore.basics import listify
22 from rich.logging import RichHandler

File ~\anaconda3\Lib\site-packages\charset_normalizer\__init__.py:23
1 """
2 Charset-Normalizer
3 ~~~~~~~~~~~~~~
(...)
21 :license: MIT, see LICENSE for more details.
22 """
---> 23 from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
24 from charset_normalizer.legacy import detect
25 from charset_normalizer.version import version, VERSION

File ~\anaconda3\Lib\site-packages\charset_normalizer\api.py:10
7 PathLike = Union[str, 'os.PathLike[str]'] # type: ignore
9 from charset_normalizer.constant import TOO_SMALL_SEQUENCE, TOO_BIG_SEQUENCE, IANA_SUPPORTED
---> 10 from charset_normalizer.md import mess_ratio
11 from charset_normalizer.models import CharsetMatches, CharsetMatch
12 from warnings import warn

File ~\anaconda3\Lib\site-packages\charset_normalizer\md.py:5
2 from typing import Optional, List
4 from charset_normalizer.constant import UNICODE_SECONDARY_RANGE_KEYWORD
----> 5 from charset_normalizer.utils import is_punctuation, is_symbol, unicode_range, is_accentuated, is_latin, \
6 remove_accent, is_separator, is_cjk, is_case_variable, is_hangul, is_katakana, is_hiragana, is_ascii, is_thai
9 class MessDetectorPlugin:
10 """
11 Base abstract class used for mess detection plugins.
12 All detectors MUST extend and implement given methods.
13 """

ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (C:\Users\crunc\anaconda3\Lib\site-packages\charset_normalizer\constant.py)

I assume this is affecting other users?

Thank you for your work.
-m

Unhashable type Series for filings xbrl

After running the following code


from edgar import get_filings, Filings, Filing, get_company, set_identity, Company

set_identity('Isla Sthiss [email protected]')

forms = ['10-Q',"10-K","10-K/A","10-Q/A"]

filings = get_filings(range(2013, 2024), form=forms)
file_10k = filings.filter(form="10-K", amendments=True) 
filing = filings[3]
filing.homepage
filing_xbrl = filing.xbrl()

TypeError: unhashable type: 'Series'

Here you have an interactive notebook to reproduce the error: https://colab.research.google.com/drive/12m_ohER2LGLe0khppGeKyqoqUkMLh_Ks?usp=sharing

NVDA 10-K "Item 1" does not provide the full item text

NVDA 10-K "Item 1" does not seem to provide the full item text.
February 24, 2023 - 10-K: Annual report for year ending January 29, 2023

The "Item 1" text of the NVDA 10-K filing spans pages 4 to 15. However, the code below returns only part of it, from page 4 through the middle of page 11.

tenk = Company("NVDA").get_filings(form="10-K").latest(1).obj()
print(f"NVDA item 1 text:\n{tenk['Item 1']}")

Is there another way to get the full "Item 1" text correctly?

Bug: Not Capturing 10-K Item 1 text

I've been running some tests on "Item 1" extraction across multiple symbols and have found a few that don't get picked up by the code below.

Code:

symbol = "PR" #EE also has this behavior if you need another test
filing = Company(symbol).get_filings(form="10-K").latest(1)

tenk = filing.obj()
output = tenk['Item 1']

Feature: Implement batch sample tests using Github actions

To detect issues earlier and across a wide range of filings, we need to sample filings across years and filing times.
Implement a series of GitHub Actions that sample filings and run tests.

error from import edgar statement

Hi, I'm getting an error at the initial import statement:
from edgar import *

error is:

ImportError Traceback (most recent call last)
Cell In[5], line 1
----> 1 from edgar import *

File ~\AppData\Roaming\Python\Python311\site-packages\edgar\muniadvisors.py:14
11 from rich.table import Table
12 from rich.text import Text
---> 14 from edgar import Filing
15 from edgar._party import Name, Address
16 from edgar._rich import repr_rich

ImportError: cannot import name 'Filing' from 'edgar' (C:\Users\Geoffrey\AppData\Roaming\Python\Python311\site-packages\edgar\__init__.py)

Getting the issue below when using filing.text()

2024-04-14 15:25:01,042 - root - INFO - Attachment for 0001047469-02-007674.txt -> EX-99.1.txt downloaded.
Traceback (most recent call last):
File "/Users/test.py", line 94, in <module>
download_filings_and_attachments(fillings10K, dir_path_10K)
File "/Users/test.py", line 57, in download_filings_and_attachments
f.write(filing.text())
File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/_filings.py", line 1671, in text
return HtmlDocument.from_html(html_content).text
File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/documents.py", line 422, in from_html
root: Tag = cls.get_root(html)
File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/documents.py", line 412, in get_root
if "" in html[:500]:
TypeError: a bytes-like object is required, not 'str'
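The traceback shows a string membership check on `html[:500]` being applied to a bytes object. One defensive pattern is to normalize the content to str before any string operations. ensure_text is a hypothetical helper. Falling back from UTF-8 to Latin-1 is an assumption about how to treat undecodable input, not a documented library behavior.

```python
from typing import Union


def ensure_text(content: Union[str, bytes]) -> str:
    """Return `content` as str, decoding bytes first.

    Tries UTF-8, then falls back to Latin-1, which can decode any byte
    sequence (possibly with wrong characters for non-Latin-1 text).
    """
    if isinstance(content, str):
        return content
    try:
        return content.decode("utf-8")
    except UnicodeDecodeError:
        return content.decode("latin-1")


html = ensure_text(b"<html><body>Annual report</body></html>")
print("<html>" in html[:500])  # True
```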

Use local data for Company facts

Can the bulk data be downloaded from the SEC (https://www.sec.gov/edgar/sec-api-documentation)? For a Company, the data would be accessed locally from the companyfacts file (https://www.sec.gov/Archives/edgar/daily-index/xbrl/companyfacts.zip), with the hope of speeding up large data downloads, like when accessing all filings from the SP500.

Get latest press release per company / query by attachment name and date

I think press releases are typically filed as EX-99.1, an attachment of the 8-K. Iterating through every 8-K and checking for that attachment worked for me in ~200 out of 500 companies, but the others were missing.

Is there an easier way to get the latest press release per company?

There are functions to get the latest 8-K, but it doesn't always have the correct attachment.
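One way to widen the search is to accept any Exhibit 99 document type (EX-99, EX-99.1, EX-99.01, ...) rather than requiring exactly EX-99.1, in case some filers number the exhibit differently. That reason is an assumption and should be checked against the missing companies. The sketch works on plain (document_type, filename) tuples. How you obtain those from a filing's attachments is left open, and the file names are made up.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches "EX-99", "EX-99.1", "EX-99.01", ... but not "EX-9" or "EX-990"
EX99_RE = re.compile(r"EX-99(\.\d+)?", re.IGNORECASE)


def press_release_candidates(attachments: Iterable[Tuple[str, str]]) -> List[str]:
    """Return filenames whose document type looks like an Exhibit 99."""
    return [filename for doc_type, filename in attachments
            if EX99_RE.fullmatch(doc_type.strip())]


def first_press_release(attachments: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the first Exhibit 99 filename, or None if there is none."""
    candidates = press_release_candidates(attachments)
    return candidates[0] if candidates else None


sample = [("8-K", "form8-k.htm"), ("EX-99.1", "ex991.htm"), ("EX-99.2", "ex992.htm")]
print(first_press_release(sample))  # ex991.htm
```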

Not able to run EDGAR in Python 3.11.8

I recently updated my Conda environment to version 24.1.2 and then updated Python to 3.11.8. Using pip, I installed EDGAR to get version 5.4.3.

However, upon trying to import EDGAR and run setup, I got the following error:

What could be the cause of this? Thanks.

ability to load previously saved filings

In the spirit of ETL, it would be beneficial for downstream applications of edgartools to be able to discretize company filings persistence (e.g. in a storage backend like S3) and company filings parsing (e.g. extracting the balance sheet data) in order to engineer more flexible data pipelines.

The library right now provides the ability to persist a company filing (e.g. through .full_text_submission() on filing or through .download() on attachments) but - based on my experience as an application user - there doesn't seem to be a straightforward way to load a saved company filing and continue the parsing (e.g. filing.obj()) from there.

I suspect that this might not be the vision and philosophy of edgartools (and i totally respect it), just pitching the angle in order to discuss whether this resonates with you and the community.

p.s. I really, really like how the library is designed! Kudos for the effort, this is monumental work! <3
p.s.2 happy to help with the design / implementation of said feature (if it first passes feedback ofc)

strip_ixbrl_tags fails if 'style' is None

Hello I'm back 😅

I seem to have found another edge case for strip_ixbrl_tags.

edgartools version: 2.8.1

import edgar

edgar.set_identity("id")
company = edgar.Company("ABNB")  # Ticker is important, not all filings run into this issue
filings = company.get_filings(form=["10-K"])
if filings:
    print(filings[0].text())

Stacktrace

Traceback (most recent call last):
  File "/workspace/test.py", line 10, in <module>
    print(filings[0].text())
          ^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/_filings.py", line 1512, in text
    return html_to_text(html_content, ignore_tables=ignore_tables, sep=sep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 87, in html_to_text
    html_str = try_to_strip_ixbrl_tags(html_str)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 176, in try_to_strip_ixbrl_tags
    return strip_ixbrl_tags(html_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 191, in strip_ixbrl_tags
    if parent.tag == '{http://www.w3.org/1999/xhtml}div' and 'display:inline' in parent.get('style'):
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable
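The failing check evaluates `'display:inline' in parent.get('style')`, and get() returns None when the attribute is absent. A None-safe version substitutes an empty string. The snippet below demonstrates the idea with xml.etree.ElementTree elements, whose get() also returns None for a missing attribute. It is a sketch of a fix, not the patch applied to edgartools.

```python
import xml.etree.ElementTree as ET

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


def is_inline_div(element: ET.Element) -> bool:
    """True for an XHTML <div> whose style contains 'display:inline'.

    element.get('style') is None when the attribute is missing, so fall
    back to an empty string instead of testing membership on None.
    """
    style = element.get("style") or ""
    return element.tag == XHTML_DIV and "display:inline" in style


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<div style="display:inline">a</div><div>b</div></html>'
)
print([is_inline_div(div) for div in doc.iter(XHTML_DIV)])  # [True, False]
```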

get_bool function parses String field value 'true' as boolean False

The current implementation of the edgar.core.get_bool function returns False when the field value is the string 'true', so the reporting_relationship is wrong for some filings.
For example, this filing will be parsed to is_officer = False, but should be is_officer = True.
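A parser that accepts the common truthy spellings avoids the 'true' → False problem. parse_bool below is a hypothetical replacement, not the current edgar.core.get_bool. The accepted spellings ('1', 'true', 'yes', 'y' and their false counterparts) are an assumption that should be checked against the values that actually appear in ownership XML.

```python
from typing import Optional

TRUE_VALUES = {"1", "true", "yes", "y"}
FALSE_VALUES = {"0", "false", "no", "n"}


def parse_bool(value: Optional[str]) -> Optional[bool]:
    """Parse an XML text value into a bool.

    Returns None for a missing or blank value and raises ValueError for
    anything that is not a recognized spelling.
    """
    if value is None or not value.strip():
        return None
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognized boolean value: {value!r}")


print(parse_bool("true"), parse_bool("1"), parse_bool("0"))  # True True False
```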

NFLX 10-K income statement is not parsed correctly

The NFLX 10-K income statement appears to use different GAAP tags, and multiple facts are missing from the income statement after parsing.
https://www.sec.gov/ix?doc=/Archives/edgar/data/1065280/000106528023000035/nflx-20221231.htm#id2a4f0e6b6dd43c49a69044f9260d065_115
"Total Revenue" tag used in 10-K filing: us-gaap:Revenues

    incdf = Company("NFLX").get_filings(form="10-K").latest(1).obj().income_statement.income_statement_dataframe
    print(f"{incdf.to_markdown()}")

The code above produces only the three facts below from the NFLX 10-K filing.

	Fact	Value
0	Operating Income or Loss	$5,632,831,000
1	Net Income	$4,491,924,000
2	Interest Expense	$706,212,000

Congratulations

One of the nicest looking EDGAR libraries out there, thanks for your open source contribution.

Bug: Apply line breaks after headers when rendering HTML

When displaying HTML, sometimes there is no line break after a heading, so the heading text continues into the next line.

This happens in a press release, but probably also with html().

Filing 144

How much do you know about Form 144 filings? For some reason AAPL only has two, but maybe I am missing something. In any case, here is some code that can help with parsing if you are looking into adding it: https://colab.research.google.com/drive/1VyH7hQ__W1Ab-HWRmPS2gywDHHZgsbrl?usp=sharing

AMD 10-K income_statement generates error

When trying to get the data from the income statement in the latest 10-K form from AMD, I am getting an error. I am using the following commands:

filing = Company("AMD").get_filings(form="10-K").latest(1)
income_statement = filing.obj().income_statement

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.11/site-packages/edgar/company_reports.py", line 56, in financials
    return Financials.from_xbrl(xbrl)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/edgar/financials.py", line 301, in from_xbrl
    balance_sheet = BalanceSheet(xbrl.facts)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/edgar/financials.py", line 174, in __init__
    super().__init__(facts, end_date)
  File "/opt/homebrew/lib/python3.11/site-packages/edgar/financials.py", line 66, in __init__
    self.end_date: str = end_date or self.facts.period_end_date
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/edgar/_xbrl.py", line 126, in period_end_date
    return self.get_dei('DocumentPeriodEndDate')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/edgar/_xbrl.py", line 122, in get_dei
    return res.value.item()
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pandas/core/base.py", line 418, in item
    raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar
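The last frame shows pandas' Series.item() failing, which happens whenever the Series does not hold exactly one element. That suggests the DocumentPeriodEndDate lookup matched zero rows or several rows for this 10-K (likely several, if the fact is reported in more than one context). A minimal sketch of a more forgiving lookup follows, assuming the DEI facts sit in a DataFrame with hypothetical 'concept' and 'value' columns; get_dei_value is a hypothetical helper, not edgartools' get_dei, and the values are placeholders.

```python
from typing import Optional

import pandas as pd


def get_dei_value(facts: pd.DataFrame, concept: str) -> Optional[str]:
    """Return one value for a DEI concept without calling Series.item().

    Duplicate rows carrying the same value collapse to one; if the concept is
    absent, None is returned; if distinct values remain, the first one wins.
    """
    values = facts.loc[facts["concept"] == concept, "value"].dropna().unique()
    if len(values) == 0:
        return None
    return values[0]


# Placeholder data: the same DEI fact reported twice, as in the failing case.
facts = pd.DataFrame(
    {
        "concept": ["dei:DocumentPeriodEndDate", "dei:DocumentPeriodEndDate", "dei:DocumentType"],
        "value": ["2023-12-30", "2023-12-30", "10-K"],
    }
)
print(get_dei_value(facts, "dei:DocumentPeriodEndDate"))
```

Silently taking the first of several distinct values is a design choice; logging a warning in that branch would make genuinely conflicting facts visible.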

Publish Library documentation

Finish and publish the edgartools documentation.
The work in progress uses MkDocs and is hosted on the gh-pages branch.

How would you feel about adding the CAT reference data? It is not technically EDGAR, but it could be useful.

https://www.catnmsplan.com/reference-data

set_identity is not working

It shows the error "cannot import name 'set_identity' from 'edgar'". I tried this in both Jupyter Notebook and Google Colab. My Python version on Google Colab is 3.10.12, and on Jupyter it is 3.11.7. Please let me know how to resolve this. Edit: it is working now.

Implement OwnershipDocument XML parsing

Implement parsing of ownership XML documents for Forms 3, 4, and 5.
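A minimal sketch of what such parsing might look like with the standard library's ElementTree. The element names follow the SEC ownership XML schema as I understand it (ownershipDocument, issuer, reportingOwner, reportingOwnerRelationship); the sample document is trimmed and hypothetical, and parse_ownership is an illustrative name, not an edgartools API. Note the flag reader accepts both 1/0 and true/false.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical Form 4 document; real filings contain many more elements.
SAMPLE = """<ownershipDocument>
  <documentType>4</documentType>
  <issuer>
    <issuerCik>0000000001</issuerCik>
    <issuerName>Example Corp</issuerName>
    <issuerTradingSymbol>EXMP</issuerTradingSymbol>
  </issuer>
  <reportingOwner>
    <reportingOwnerId>
      <rptOwnerCik>0000000002</rptOwnerCik>
      <rptOwnerName>Doe Jane</rptOwnerName>
    </reportingOwnerId>
    <reportingOwnerRelationship>
      <isDirector>0</isDirector>
      <isOfficer>true</isOfficer>
      <officerTitle>Chief Financial Officer</officerTitle>
    </reportingOwnerRelationship>
  </reportingOwner>
</ownershipDocument>"""


def flag(element, path):
    """Read a boolean flag encoded as 1/0 or true/false; missing means False."""
    text = element.findtext(path, default="")
    return text.strip().lower() in ("1", "true")


def parse_ownership(xml_text):
    root = ET.fromstring(xml_text)
    rel = "reportingOwnerRelationship"
    owners = []
    for owner in root.findall("reportingOwner"):
        owners.append({
            "cik": owner.findtext("reportingOwnerId/rptOwnerCik"),
            "name": owner.findtext("reportingOwnerId/rptOwnerName"),
            "is_director": flag(owner, f"{rel}/isDirector"),
            "is_officer": flag(owner, f"{rel}/isOfficer"),
            "officer_title": owner.findtext(f"{rel}/officerTitle"),
        })
    return {
        "form": root.findtext("documentType"),
        "issuer": root.findtext("issuer/issuerName"),
        "ticker": root.findtext("issuer/issuerTradingSymbol"),
        "owners": owners,
    }


print(parse_ownership(SAMPLE))
```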

user agent hard coded

Users will have the author's rate limiting applied to their own usage. The library should use the user's own user agent from set_identity instead.

Line 81 of edgar/_filings.py hard-codes the author's user-agent identity.

The same issue exists on line 214 of tests/test_filing.py.
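A minimal sketch of the suggested fix: build the request headers from the caller's identity, taken either from an explicit argument or from the EDGAR_IDENTITY environment variable that edgartools documents for setting your identity, and fail loudly instead of falling back to a hard-coded value. edgar_headers is a hypothetical helper, not the code at line 81.

```python
import os
from typing import Dict, Optional


def edgar_headers(identity: Optional[str] = None) -> Dict[str, str]:
    """Build request headers from the user's own identity.

    Falls back to the EDGAR_IDENTITY environment variable and raises rather
    than silently using a hard-coded default identity.
    """
    identity = identity or os.environ.get("EDGAR_IDENTITY")
    if not identity:
        raise ValueError(
            "No identity set: pass one explicitly or set EDGAR_IDENTITY, "
            "e.g. 'Jane Doe jane@example.com'"
        )
    return {"User-Agent": identity}


print(edgar_headers("Jane Doe jane@example.com"))
```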

Bug in FundReport pulling ISIN

import pandas as pd
from edgar import *

set_identity("Sample [email protected]")
# Columns that are not needed for this comparison
to_drop = ['ticker', 'maturity_date', 'annualized_rate', 'desc_other_units',
           'is_default', 'cash_collateral', 'non_cash_collateral', 'restricted']
fund = Company("MDY")
# Most recent NPORT-P filing for the fund
latest_holdings = fund.get_filings(form="NPORT-P")[0]
fund_report = latest_holdings.obj().investment_data()
invest_table = fund_report.drop(to_drop, axis=1)

print(invest_table)

All the ISINs are the same (US3719011096), even though the CUSIPs differ:

       name                          title                         lei                   cusip      isin          balance           units
    0  Hubbell Inc                   Hubbell Inc                   54930088VDQ6840Y6597  443510607  US3719011096  453329.00000000   NS
    1  Builders FirstSource Inc      Builders FirstSource Inc      549300W0SKP6L3H7DP63  12008R107  US3719011096  1084228.00000000  NS
    2  Reliance Steel & Aluminum Co  Reliance Steel & Aluminum Co  549300E287ZOFT3C5Z56  759509102  US3719011096  497354.00000000   NS
    3  Graco Inc                     Graco Inc                     4T5VJ4S81BRT6Q7GGT78  384109104  US3719011096  1424046.00000000  NS
    4  Jabil Inc                     Jabil Inc                     5493000CGCQY2OQU7669  466313103  US3719011096  1122419.00000000  NS
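For US securities the ISIN embeds the CUSIP: "US", then the 9-character CUSIP, then a check digit computed with the Luhn algorithm after letters are converted to numbers (A=10 through Z=35). That makes this bug easy to detect programmatically: the CUSIP embedded in US3719011096 is 371901109, which matches none of the rows shown. A minimal sketch; us_isin_from_cusip and isin_matches_cusip are hypothetical helpers.

```python
def isin_check_digit(payload: str) -> int:
    """Luhn check digit for an 11-character ISIN payload (country code + NSIN)."""
    # Letters become two-digit numbers (A=10 through Z=35); digits stay as-is.
    digits = "".join(str(int(ch, 36)) for ch in payload.upper())
    total = 0
    # Double every second digit, starting from the rightmost payload digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return (10 - total % 10) % 10


def us_isin_from_cusip(cusip: str) -> str:
    payload = "US" + cusip.upper()
    return payload + str(isin_check_digit(payload))


def isin_matches_cusip(isin: str, cusip: str) -> bool:
    return isin.upper() == us_isin_from_cusip(cusip)


# Two rows from the output above
rows = [("Hubbell Inc", "443510607", "US3719011096"),
        ("Builders FirstSource Inc", "12008R107", "US3719011096")]
for name, cusip, isin in rows:
    print(name, isin, "expected", us_isin_from_cusip(cusip), isin_matches_cusip(isin, cusip))
```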

Downloading attachments of a filing

Hello!

I really appreciate this Python package. Many thanks to its authors and contributors; you have made my life so much easier!

I hope it is okay to ask a usage question. I would like to download only the attachments of a filing (i.e., not the primary document). Is that possible? Any suggestions would be appreciated.

Best regards,
G

Bug: filings.get(accession_no) throws: LiveError: Only one live display may be active at once

This one is quite strange and may be related to #20

It looks like only NEW filings throw this error, though I have not confirmed this yet. When I take one of the most recent filing accession numbers from https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-K&owner=include&count=100&action=getcurrent&start=2 and pass it to find, the following error is thrown:

>>> accession_no = '0001493152-24-001619'
>>> f = find(accession_no)
WARNING  _filings.py:626
    Provide a year between 1994 and 2024 and optionally a quarter (1-4) for which the SEC has filings.
        e.g. filings = get_filings(2023) OR
             filings = get_filings(2023, 1)
    (You specified the year 2024 and quarter 2)
WARNING  _filings.py:626
    Provide a year between 1994 and 2024 and optionally a quarter (1-4) for which the SEC has filings.
        e.g. filings = get_filings(2023) OR
             filings = get_filings(2023, 1)
    (You specified the year 2024 and quarter 3)
WARNING  _filings.py:626
    Provide a year between 1994 and 2024 and optionally a quarter (1-4) for which the SEC has filings.
        e.g. filings = get_filings(2023) OR
             filings = get_filings(2023, 1)
    (You specified the year 2024 and quarter 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blahblah/python3.11/site-packages/edgar/__init__.py", line 77, in find
    return get_by_accession_number(search_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 2050, in get_by_accession_number
    return filings.get(accession_number)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 812, in get
    with Status(f"[bold deep_sky_blue1]Searching through the most recent filings for {accession_number}...",
  File "/blahblah/python3.11/site-packages/rich/status.py", line 97, in __enter__
    self.start()
  File "/blahblah/python3.11/site-packages/rich/status.py", line 87, in start
    self._live.start()
  File "/blahblah/python3.11/site-packages/rich/live.py", line 113, in start
    self.console.set_live(self)
  File "/blahblah/python3.11/site-packages/rich/console.py", line 836, in set_live
    raise errors.LiveError("Only one live display may be active at once")
rich.errors.LiveError: Only one live display may be active at once
>>>

If attempted again, the same error is raised, this time without the warnings:

>>> f = find('0001493152-24-001619')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blahblah/python3.11/site-packages/edgar/__init__.py", line 77, in find
    return get_by_accession_number(search_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 2050, in get_by_accession_number
    return filings.get(accession_number)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 812, in get
    with Status(f"[bold deep_sky_blue1]Searching through the most recent filings for {accession_number}...",
  File "/blahblah/python3.11/site-packages/rich/status.py", line 97, in __enter__
    self.start()
  File "/blahblah/python3.11/site-packages/rich/status.py", line 87, in start
    self._live.start()
  File "/blahblah/python3.11/site-packages/rich/live.py", line 113, in start
    self.console.set_live(self)
  File "/blahblah/python3.11/site-packages/rich/console.py", line 836, in set_live
    raise errors.LiveError("Only one live display may be active at once")
rich.errors.LiveError: Only one live display may be active at once

Rendering images when downloading HTML

When you use the .download() method, attachments such as images aren't downloaded, so they are not rendered in the downloaded HTML.

Example:

from edgar import Company

ticker = 'TSLA'
after = '2013-01-01'

filings = Company(ticker).get_filings(form="8-K").filter(date=f"{after}:")

filings[0].attachments[1].url

This leads to https://www.sec.gov/Archives/edgar/data/1318605/000095017024046895/tsla-ex99_1.htm, which is a slide deck.
The .download() method returns the HTML, but the images are referenced relative to the file URL (e.g. https://www.sec.gov/Archives/edgar/data/1318605/000095017024046895/tsla-ex99_1s2.jpg).

These images can also be found in the filings[0].attachments field, under the GRAPHIC tag.

What do you think would be the best pattern for the .download() method to include this? Or are there other alternatives you propose?
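One possible pattern is to rewrite relative image URLs to absolute ones before saving, so the saved HTML loads the images from the SEC archive. The alternative is to download the GRAPHIC attachments next to the HTML file. A minimal sketch of the first option using urllib.parse.urljoin; absolutize_image_urls is a hypothetical helper, and a regex is only adequate for a sketch (an HTML parser would be more robust).

```python
import re
from urllib.parse import urljoin

# Matches src="..." or src='...' attributes, capturing prefix, quote and URL.
SRC_PATTERN = re.compile(r"""(src\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def absolutize_image_urls(html: str, document_url: str) -> str:
    """Rewrite relative src attributes so they resolve against document_url."""
    def replace(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        # urljoin leaves already-absolute URLs unchanged.
        return f"{prefix}{quote}{urljoin(document_url, src)}{quote}"
    return SRC_PATTERN.sub(replace, html)


page_url = "https://www.sec.gov/Archives/edgar/data/1318605/000095017024046895/tsla-ex99_1.htm"
print(absolutize_image_urls('<img src="tsla-ex99_1s2.jpg">', page_url))
```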

Bug: Filing item text is cut off after '$' character

I was looking at the items of a particular filing and noticed that the text is cut off after a dollar sign. This example is an 8-K.

from edgar import Company

# get the company
c = Company('0001588272')

# get the most recent 8-K filing
f = c.get_filings().filter(form='8-K')[0]

# print the filing text
print(f.text())

Notice the cut-off at:

which the Company issues shares under its distribution reinvestment plan (the “DRP”) at $

Some line items are missing in balance sheet report

Thank you for building and sharing such a useful library!

While testing it, I noticed that some line items under current assets are missing from the balance sheet report, compared with the interactive data on the sec.gov website (please see the images pasted below).

This issue also reproduces with other companies (e.g., MSFT). Is this behavior by design?

Thank you very much!

How to pass proxies in requests

I think this is a fantastic library! Is it possible to pass an HTTP proxy in the requests?
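Whether edgartools exposes a proxy option isn't covered here. If the underlying HTTP client honors the standard HTTP_PROXY/HTTPS_PROXY environment variables (requests and httpx both do by default), setting those may be enough. As a library-agnostic illustration, the standard library's urllib.request can route requests through a proxy; the proxy address is hypothetical and build_proxied_opener is an illustrative name.

```python
import urllib.request

# Hypothetical proxy address; replace with your own.
PROXIES = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}


def build_proxied_opener(proxies, user_agent):
    """Return an opener that sends every request through the given proxies."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    # The SEC expects a descriptive User-Agent on every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = build_proxied_opener(PROXIES, "Jane Doe jane@example.com")
# opener.open(url) would now route the request through the proxy.
```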

Feature request: Item text parsing support for 10-K/Q

First, excellent library. Well done! 👏 Thanks for creating this.

I'm curious whether you plan to add text parsing support for 10-Ks, 10-Qs, and other filings, much like you provided for 8-Ks?

Like the following:

On a side note, is there an easy way to convert the results above to JSON?
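If the parsed items can be collected into a dict mapping item names to text (an assumption, since the structure of the results isn't shown here), the standard json module handles the conversion. If the results are already a pandas DataFrame, DataFrame.to_json() is an alternative. items_to_json is a hypothetical helper and the item texts are placeholders.

```python
import json
from typing import Dict


def items_to_json(items: Dict[str, str]) -> str:
    """Serialize an item-name -> text mapping as pretty-printed JSON."""
    # ensure_ascii=False keeps curly quotes and other non-ASCII text readable.
    return json.dumps(items, indent=2, ensure_ascii=False)


items = {"Item 1": "Business overview text", "Item 1A": "Risk factors text"}
print(items_to_json(items))
```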

Cheers!

Bug: find method does not appear to work with accession number

I came across a curious situation where the find method does not appear to work when passed a fairly new filing accession number.

However, finding the filing via the Company method fetches it just fine.

Example:

from edgar import Company, find

accession_no = '0001193125-23-300021'
# not working as of 12/20/21 @ 20:43 ET
find(accession_no)

cik = '0001588272'
# vs this, which works fine
Company(cik).get_filings().filter(form='8-K')[0]

Nothing critical just thought I'd log it! Thanks :)
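One thing worth ruling out before blaming the index lookup is a malformed ID. A minimal sketch of validating and normalizing an accession number, assuming the standard 10-2-6 digit layout (filer or agent CIK, two-digit year, sequence number); normalize_accession is a hypothetical helper.

```python
import re

# 10 digits, 2 digits, 6 digits, with the dashes optional.
ACCESSION_RE = re.compile(r"^(\d{10})-?(\d{2})-?(\d{6})$")


def normalize_accession(value: str) -> str:
    """Return an accession number in the dashed 10-2-6 form, or raise ValueError."""
    match = ACCESSION_RE.match(value.strip())
    if not match:
        raise ValueError(f"Not an accession number: {value!r}")
    return "-".join(match.groups())


print(normalize_accession("000119312523300021"))
```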

How to get all 13F-HR holdings per ticker?

Can we easily aggregate the 13F-HR holdings per ticker to see which filers hold a given ticker? For example, similar to what SEC-API shows here:

https://sec-api.io/sandbox/13f-filings-holding-tesla

I have gotten as far as listing all the issuers, but I need to map them to tickers.
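Once each holding row has a CUSIP, the aggregation itself is a group-by on a CUSIP-to-ticker mapping. The mapping has to come from elsewhere (for example a data vendor or the OpenFIGI mapping API); the rows and mapping below are placeholders, and holders_by_ticker is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def holders_by_ticker(holdings: Iterable[dict], cusip_to_ticker: Dict[str, str]) -> Dict[str, List[str]]:
    """Group 13F holding rows by ticker, listing each filer that holds it once."""
    result: Dict[str, List[str]] = defaultdict(list)
    for row in holdings:
        ticker = cusip_to_ticker.get(row["cusip"])
        if ticker is None:
            continue  # unmapped CUSIP: skip it (or collect it for review)
        if row["filer"] not in result[ticker]:
            result[ticker].append(row["filer"])
    return dict(result)


# Placeholder data, not real filings or CUSIPs.
holdings = [
    {"filer": "Fund A", "cusip": "CUSIP_1"},
    {"filer": "Fund B", "cusip": "CUSIP_1"},
    {"filer": "Fund B", "cusip": "CUSIP_2"},
    {"filer": "Fund C", "cusip": "CUSIP_3"},
]
mapping = {"CUSIP_1": "TSLA", "CUSIP_2": "AAPL"}
print(holders_by_ticker(holdings, mapping))
```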

Balance Sheet

I am experiencing a similar issue: the same items are missing from the balance sheet. Otherwise, a very intuitive design.

Interest Expenses and EBITDA

Hello,
is there a way to retrieve a company's interest expense (needed to calculate its EBITDA)?
Or is it reported under a different name?

Regards,
Peter
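EBITDA is net income plus interest expense, income taxes, and depreciation and amortization. A minimal sketch, assuming the values are available in a dict keyed by us-gaap concept; the concepts listed are ones commonly used for each component, but filers may use others (or report interest net), and the figures are placeholders.

```python
from typing import Mapping

# us-gaap concepts that commonly carry each EBITDA component.
COMPONENTS = {
    "net_income": "us-gaap:NetIncomeLoss",
    "interest": "us-gaap:InterestExpense",
    "taxes": "us-gaap:IncomeTaxExpenseBenefit",
    "d_and_a": "us-gaap:DepreciationDepletionAndAmortization",
}


def ebitda(facts: Mapping[str, float]) -> float:
    """EBITDA = net income + interest expense + income taxes + D&A.

    Raises KeyError naming the first missing concept, so gaps stay visible.
    """
    return sum(facts[concept] for concept in COMPONENTS.values())


# Placeholder figures, not taken from any filing.
facts = {
    "us-gaap:NetIncomeLoss": 100.0,
    "us-gaap:InterestExpense": 10.0,
    "us-gaap:IncomeTaxExpenseBenefit": 25.0,
    "us-gaap:DepreciationDepletionAndAmortization": 15.0,
}
print(ebitda(facts))  # 150.0
```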

S-3 issues

The primary document and full submission text links do not work when clicked, although the information is present on the filing homepage. CIK 1145255 is one example.

rate limits

Hi,
is there an internal mechanism that tracks the number of requests so as to stay within the rate limit imposed by data.sec.gov?

The limit is currently 10 requests per second.

https://www.sec.gov/oit/announcement/new-rate-control-limits

Thanks!
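The project's feature list mentions automatic throttling, so this may already be handled internally. As an illustration only, here is a minimal sliding-window limiter that allows at most 10 requests in any rolling one-second window; RateLimiter is a hypothetical class, and the clock and sleep parameters exist so it can be tested without real waiting.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Sliding-window limiter: at most max_calls acquisitions per period seconds."""

    def __init__(
        self,
        max_calls: int = 10,
        period: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        with self.lock:
            while True:
                now = self.clock()
                # Drop timestamps that have left the rolling window.
                while self.calls and now - self.calls[0] >= self.period:
                    self.calls.popleft()
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                # Wait until the oldest recorded call falls out of the window.
                self.sleep(self.period - (now - self.calls[0]))


limiter = RateLimiter(max_calls=10, period=1.0)
# Call limiter.acquire() immediately before each request to sec.gov.
```

Because the limiter lives in one process's memory, it only coordinates threads. With a ProcessPoolExecutor, every worker process gets its own copy, so the combined rate can still exceed the limit.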

strip_ixbrl_tags breaks when name() is missing

edgartools raises this XML parsing exception when name is missing in the XML.

Simple repro:

import edgar

edgar.set_identity("id")
company = edgar.Company("HON")
filings = company.get_filings(form=["10-K"])
filings[0].text()  # <-- throws here

Python: 3.11.6
Edgartools version: 2.7.2

Using ProcessPoolExecutor with get_filings

I am trying to run Company(ticker).get_filings(form=["10-K","10-Q"]).filter(date=f'{start_date}:{end_date}') for around 1000 stocks using ProcessPoolExecutor. I keep getting HTTP 429 (Too Many Requests) errors, even with max_workers = 8 (the SEC allows 10 requests per second). I was wondering: 1. does form=["10-K","10-Q"] count as 1 or 2 requests, and 2. is there a better way to send the requests, e.g. downloading all of the SEC filings at once (one request) and then accessing each company's filings from that? Any help would be appreciated!

get_facts_for_namespace returns AttributeError

Problem

Per the README documentation:

company = Company("SNOW")
company_facts = company.get_facts_for_namespace()

is expected to work; however, get_facts_for_namespace raises an AttributeError.

Minimum Reproducible Example

# tested with Python 3.12.2
pip install edgartools==2.9.0

and then:

from edgar import *

set_identity("Ilias Antonopoulos [email protected]")  # enter valid email here
company = Company("SNOW")
company_facts = company.get_facts_for_namespace()


P.S. Happy to help with the fix if it's a source code issue and not simply a documentation one :)

When I run

filings = get_filings(2023, 1)

it finishes in about 7 to 8 seconds on my laptop. However, when I run

filings = get_filings(2023)

the code appears to run indefinitely. As a workaround, I wrote a for loop to get the filings for each quarter:

from edgar import get_filings

filings_list = []
for quarter in range(1, 5):
    filings_list.append(get_filings(2023, quarter))

It takes 24 to 25 seconds to finish. My Python version is 3.10.14 and I am using edgartools==2.19.2. I am not sure whether this is a bug or I am doing something wrong. I would appreciate any help. Thank you.

