edgartools's Introduction

The world's easiest, most powerful edgar library

Features

  • Intuitive and easy to use: edgartools has a super simple API.
  • Works as a library or a CLI: You can use edgartools as a library in your code or as a CLI tool.
  • Access any SEC filing: You can access any SEC filing since 1994.
  • List filings for any date range: List filings for a year, a quarter, or a date range, e.g. 2024-02-29:2024-03-15
  • Best looking edgar library: Uses the rich library to display SEC Edgar data in a beautiful way.
  • Page through filings: Use filings.next() and filings.previous() to page through filings
  • Build Data Pipelines: Build data pipelines by finding, filtering, transforming and saving filings
  • Select a filing: You can select a filing from the list of filings.
  • View the filing as HTML or text: Find a filing then get the content as HTML or text.
  • Chunk filing text: You can chunk the filing text into sections for vector embedding.
  • Preview the filing: You can preview the filing in the terminal or a notebook.
  • Search through a filing: You can search through a filing for a keyword.
  • Parse XBRL: If a filing has XBRL, you can parse it to a dataframe.
  • Data Objects: Automatically downloads and parses filings into data objects.
  • Download any attachment: You can download any attachment from the filing.
  • Automatic throttling: Automatically throttles requests to Edgar to avoid being blocked.
  • Bulk downloads: Faster batch processing through bulk downloads of filings and facts
  • Get company by Ticker or Cik: Get a company by ticker Company("SNOW") or cik Company(1640147)
  • Get company filings: You can get all the company's historical filings using company.get_filings()
  • Get company facts: You can get company facts using company.get_facts()
  • Company Financials: You can get company financials using company.financials
  • Lookup Ticker by CUSIP: You can look up a ticker by CUSIP
  • Dataset of SEC entities: You can get a dataset of SEC companies and persons
  • Fund Reports: Search for and get 13F-HR fund reports
  • Insider Transactions: Search for and get insider transactions

Getting started

Install using pip

pip install edgartools

Import and start using

from edgar import *

# Tell the SEC who you are
set_identity("Michael Mccallum [email protected]")

filings = get_filings()

Key Concepts

How do I find a filing?

Depends on what you know

A. I know the accession number

filing = find("0001065280-23-000273")

B. I know the company ticker or cik

filings = Company("NFLX").get_filings(form="10-Q").latest(1)

C. Show me a list of filings

filings = get_filings(form="10-Q")
filing = filings[0]

What can I do with a filing

You can view it in the terminal or open it in the browser, get the filing as html, xml or text, and download attachments. You can extract data from the filing into a data object.

What can I do with a company

You can get the company's filings, facts and financials.

How to use edgartools

| Task | Code |
|------|------|
| Set your EDGAR identity in Linux/Mac | export EDGAR_IDENTITY="First Last [email protected]" |
| Set your EDGAR identity in Windows | set EDGAR_IDENTITY="First Last [email protected]" |
| Set identity in Windows PowerShell | $env:EDGAR_IDENTITY="First Last [email protected]" |
| Set identity in Python | set_identity("First Last [email protected]") |
| Import the library | from edgar import * |

Working with filings

| Task | Code |
|------|------|
| Get filings for the year to date | filings = get_filings() |
| Get only xbrl filings | filings = get_filings(index="xbrl") |
| Get filings for a specific year | filings = get_filings(2020) |
| Get filings for a specific quarter | filings = get_filings(2020, 1) |
| Get filings for multiple years | filings = get_filings([2020, 2021]) |
| Get filings for a range of years | filings = get_filings(year=range(2010, 2020)) |
| Get filings for a specific form | filings = get_filings(form="10-K") |
| Get filings for a list of forms | filings = get_filings(form=["10-K", "10-Q"]) |
| Show the next page of filings | filings.next() |
| Show the previous page of filings | filings.prev() |
| Get the first n filings | filings.head(20) |
| Get the last n filings | filings.tail(20) |
| Get the latest n filings by date | filings.latest(20) |
| Get a random sample of the filings | filings.sample(20) |
| Filter filings on a date | filings = filings.filter(date="2020-01-01") |
| Filter filings between dates | filings.filter(date="2020-01-01:2020-03-01") |
| Filter filings before a date | filings.filter(date=":2020-03-01") |
| Filter filings after a date | filings.filter(date="2020-03-01:") |
| Get filings as a pandas dataframe | filings.to_pandas() |
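
The date filters above use a start:end string in which either side may be left out. As a rough illustration of how such a string can be interpreted, here is a standalone sketch. The helper name parse_date_range is hypothetical, and this is not edgartools' own parser.

```python
from datetime import date
from typing import Optional, Tuple


def parse_date_range(spec: str) -> Tuple[Optional[date], Optional[date]]:
    """Split a 'start:end' string into (start, end); a missing side becomes None.

    Hypothetical helper for illustration only.
    """
    if ":" not in spec:
        # A single date is treated as "exactly this day"
        day = date.fromisoformat(spec)
        return day, day
    start_text, end_text = spec.split(":", 1)
    start = date.fromisoformat(start_text) if start_text else None
    end = date.fromisoformat(end_text) if end_text else None
    return start, end


print(parse_date_range("2020-01-01:2020-03-01"))  # both bounds set
print(parse_date_range(":2020-03-01"))            # open start
print(parse_date_range("2020-03-01:"))            # open end
```

An open bound is returned as None, so a caller can skip that side of the comparison.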

Working with a filing

| Task | Code |
|------|------|
| Get a single filing | filing = filings[3] |
| Get a filing by accession number | filing = get_by_accession_number("0000320193-20-34576") |
| Get the filing homepage | filing.homepage |
| Open a filing in the browser | filing.open() |
| Open the filing homepage in the browser | filing.homepage.open() |
| View the filing in the terminal | filing.view() |
| Get the html of the filing document | filing.html() |
| Get the XBRL of the filing document | filing.xbrl() |
| Get the filing document as markdown | filing.markdown() |
| Get the full submission text of a filing | filing.text() |
| Get and parse the data object of a filing | filing.obj() |
| Get the filing attachments | filing.attachments |
| Get a single attachment | attachment = filing.attachments[0] |
| Open an attachment in the browser | attachment.open() |
| Download an attachment | content = attachment.download() |

Working with a company

| Task | Code |
|------|------|
| Get a company by ticker | company = Company("AAPL") |
| Get a company by CIK | company = Company("0000320193") |
| Get company facts | company.get_facts() |
| Get company facts as a pandas dataframe | company.get_facts().to_pandas() |
| Get company filings | company.get_filings() |
| Get company filings by form | company.get_filings(form="10-K") |
| Get a company filing by accession_number | company.get_filing(accession_number="0000320193-21-000139") |
| Get the company's financials | company.financials |
| Get the company's balance sheet | company.financials.balance_sheet |
| Get the company's income statement | company.financials.income_statement |
| Get the company's cash flow statement | company.financials.cash_flow_statement |

Installation

pip install edgartools

Usage

Set your Edgar user identity

Before you can access the SEC Edgar API you need to set the identity that you will use to access Edgar. This is usually your name and email, or a company name and email.

Sample Company Name AdminContact@<sample company domain>.com

The user identity is sent in the User-Agent string and the Edgar API will refuse to respond to your request without it.

EdgarTools will look for an environment variable called EDGAR_IDENTITY and use that in each request. So, you need to set this environment variable before using the library.

Setting EDGAR_IDENTITY in Linux/Mac

export EDGAR_IDENTITY="Michael Mccallum [email protected]"

Setting EDGAR_IDENTITY in Windows Powershell

 $Env:EDGAR_IDENTITY="Michael Mccallum [email protected]"

Alternatively, you can call set_identity which does the same thing.

from edgar import set_identity
set_identity("Michael Mccallum [email protected]")

For more detail see https://www.sec.gov/os/accessing-edgar-data
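
To make the mechanism concrete, here is a minimal sketch of how an identity string might be turned into a User-Agent header, falling back to the EDGAR_IDENTITY environment variable. The function name build_headers and the example identity are assumptions for illustration; edgartools' internal code may differ.

```python
import os
from typing import Dict, Optional


def build_headers(identity: Optional[str] = None) -> Dict[str, str]:
    """Return request headers carrying the identity as the User-Agent.

    Hypothetical helper: falls back to the EDGAR_IDENTITY environment variable.
    """
    identity = identity or os.environ.get("EDGAR_IDENTITY")
    if not identity:
        raise ValueError("Set EDGAR_IDENTITY or pass an identity string")
    return {"User-Agent": identity}


# Example identity only; use your own name and email
os.environ["EDGAR_IDENTITY"] = "Sample Company Name admin@example.com"
print(build_headers())
```

Failing early when no identity is available gives a clearer error than a refused request from the server.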

Usage

Importing edgar

from edgar import *

Use the Filing API when you are not working with a specific company, but want to get a list of filings.

For details on how to use the Filing API see Using the Filing API

With the Company API you can find a company by ticker or CIK, and get the company's filings, facts and financials.

Company("AAPL")
        .get_filings(form="10-Q")
        .latest(1)
        .obj()

See Using the Company API

Viewing and downloading attachments

Every filing has a list of attachments. You can view the attachments using filing.attachments

# View the attachments
filing.attachments

You can access each attachment using the bracket operator [] and the index of the attachment.

# Get the first attachment
attachment = filing.attachments[0]

You can download the attachment using attachment.download(). This returns the attachment's content in memory, as a string or as bytes.
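
Since the downloaded content may be either a string or bytes, code that saves it to disk needs to handle both. The following is a small sketch of one way to do that; save_content is a hypothetical helper, not part of edgartools.

```python
from pathlib import Path
from typing import Union


def save_content(content: Union[str, bytes], path: Union[str, Path]) -> Path:
    """Write downloaded content to disk, whether it arrived as text or bytes."""
    target = Path(path)
    if isinstance(content, bytes):
        target.write_bytes(content)
    else:
        # Text content: UTF-8 is an assumption, adjust if the source differs
        target.write_text(content, encoding="utf-8")
    return target


# Intended use with the library (not executed here):
# content = attachment.download()
# save_content(content, "attachment.txt")
```

Checking the type first avoids the error raised when a string is written to a file opened in binary mode.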

Automatic parsing of filing data

Now the reason you may want to download attachments is to get information contained in data files. For example, 13F-HR filings have attached infotable.xml files containing data from the holding report for that filing.

Fortunately, the library handles this for you. If you call filing.obj() it will automatically download and parse the data files into a data object, for several different form types. Currently, the following forms are supported:

| Form | Data Object | Description |
|------|-------------|-------------|
| 10-K | TenK | Annual report |
| 10-Q | TenQ | Quarterly report |
| 8-K | EightK | Current report |
| MA-I | MunicipalAdvisorForm | Municipal advisor initial filing |
| Form 144 | Form144 | Notice of proposed sale of securities |
| C, C-U, C-AR, C-TR | FormC | Form C Crowdfunding Offering |
| D | FormD | Form D Offering |
| 3, 4, 5 | Ownership | Ownership reports |
| 13F-HR | ThirteenF | 13F Holdings Report |
| NPORT-P | FundReport | Fund Report |
| EFFECT | Effect | Notice of Effectiveness |
| Any other filing with XBRL | FilingXbrl | |

For example, to get the data object for a 13F-HR filing you can do the following:

filings = get_filings(form="13F-HR")
filing = filings[0]
thirteenf = filing.obj()

If you call obj() on a filing that does not have a data file, then it will return None.
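
Because obj() can return None, calling code should check the result before using it. The sketch below mirrors the table of supported forms as a plain lookup to show the idea. The dictionary, the key "144" for Form 144 and the function data_object_name are assumptions for illustration; they are not how edgartools dispatches internally, and the XBRL fallback row is left out.

```python
from typing import Optional

# Form type -> data object name, copied from the table of supported forms
FORM_TO_DATA_OBJECT = {
    "10-K": "TenK",
    "10-Q": "TenQ",
    "8-K": "EightK",
    "MA-I": "MunicipalAdvisorForm",
    "144": "Form144",
    "C": "FormC", "C-U": "FormC", "C-AR": "FormC", "C-TR": "FormC",
    "D": "FormD",
    "3": "Ownership", "4": "Ownership", "5": "Ownership",
    "13F-HR": "ThirteenF",
    "NPORT-P": "FundReport",
    "EFFECT": "Effect",
}


def data_object_name(form: str) -> Optional[str]:
    """Return the data object name for a form, or None if the form is not listed."""
    return FORM_TO_DATA_OBJECT.get(form)


for form in ("13F-HR", "S-1"):
    name = data_object_name(form)
    if name is None:
        print(f"{form}: no data object")
    else:
        print(f"{form}: {name}")
```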

Working with XBRL filings

Some filings are in XBRL (eXtensible Business Reporting Language) format. These are mainly the newer filings, as the SEC now requires XBRL for them.

If a filing is in XBRL format then it opens up a lot more ways to get structured data about that specific filing and also about the company referred to in that filing.

The Filing class has an xbrl function that will download, parse and structure the filing's XBRL document if one exists. If it does not exist, then filing.xbrl() will return None.

The function filing.xbrl() returns a FilingXbrl instance, which wraps the data, and provides convenient ways of working with the xbrl data.

filing_xbrl = filing.xbrl()

Financials

Some filings, notably 10-K and 10-Q filings, contain financial statements in XBRL format. You can get the financials from the XBRL data using the Financials class.

from edgar.financials import Financials
financials = Financials.from_xbrl(filing.xbrl())
financials.balance_sheet
financials.income_statement
financials.cash_flow_statement

Or automatically through the TenK and TenQ data objects.

Here is an example that gets the latest Apple financials

tenk = Company("AAPL").get_filings(form="10-K").latest(1).obj()
financials = tenk.financials
financials.balance_sheet

Get the financial data as a pandas dataframe

Each of the financial statements - BalanceSheet, IncomeStatement and CashFlowStatement - has a to_dataframe() method that will return the data as a pandas dataframe.

balance_sheet_df = financials.balance_sheet.to_dataframe()

Downloading Edgar Data

The library is designed to make real time calls to EDGAR to get the latest data. However, you may want to download data for offline use or to build a dataset.

Download Bulk Company Data

You can download all the company filings and facts from Edgar using the download_edgar_data function. Note that this will store json files for each company of their facts and submissions, but it will not include the actual HTML or other attachments. It will however dramatically speed up loading companies by cik or ticker.

The submissions and facts bulk data files are each over 1 GB in size, and each takes a few minutes to download. The data is stored by default in the ~/.edgar directory. You can change this by setting the EDGAR_LOCAL_DATA_DIR environment variable.

```python
def download_edgar_data(submissions: bool = True, facts: bool = True):
    """
    Download all the company data from Edgar
    :param submissions: Download all the company submissions
    :param facts: Download all the company facts
    """
    ...

# Usage
download_edgar_data()
```

Using Bulk Data

If you want edgartools to use the bulk data files you can call use_local_storage() before you start making calls using the library. Alternatively, set EDGAR_USE_LOCAL_DATA to True in your environment.
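
The two environment variables mentioned in this section can be read as in the sketch below. It only illustrates the configuration described above. The function names are hypothetical, and the set of values accepted as "true" is an assumption, since the text only says to set the variable to True.

```python
import os
from pathlib import Path


def local_data_dir() -> Path:
    """Resolve the data directory: EDGAR_LOCAL_DATA_DIR if set, else ~/.edgar."""
    return Path(os.environ.get("EDGAR_LOCAL_DATA_DIR", "~/.edgar")).expanduser()


def use_local_data() -> bool:
    """Interpret EDGAR_USE_LOCAL_DATA as a flag (accepted spellings are an assumption)."""
    value = os.environ.get("EDGAR_USE_LOCAL_DATA", "")
    return value.strip().lower() in {"1", "true", "yes"}
```

Calling local_data_dir() with no variable set returns the expanded ~/.edgar path, which matches the default location described above.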

Downsides of using bulk data

  • The filings downloaded for each company are limited to the last 1000
  • You will need to download the latest data every so often to keep it up to date.

Downloading Attachments

You can download attachments from a filing using the download method on the attachments. This will download all the attached files to a folder of your choice.

class Attachments:
    
    ...
    
    def download(self, path: Union[str, Path], archive: bool = False):
        """
        Download all the attachments to a specified path.
        If the path is a directory, the file is saved with its original name in that directory.
        If the path is a file, the file is saved with the given path name.
        If archive is True, the attachments are saved in a zip file.
        path: str or Path - The path to save the attachments
        archive: bool (default False) - If True, save the attachments in a zip file
        """ 
        ...
        
# Usage
filing.attachments.download(path)
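
To illustrate the directory-versus-archive behaviour described in the docstring, here is a standalone sketch built on the standard library. It works on a plain dictionary of file names to bytes. The function save_all is hypothetical and is not the library's implementation.

```python
import zipfile
from pathlib import Path
from typing import Dict, Union


def save_all(files: Dict[str, bytes], path: Union[str, Path], archive: bool = False) -> Path:
    """Save named byte payloads into a directory, or into one zip file when archive=True."""
    target = Path(path)
    if archive:
        # Treat the path as the zip file to create
        target.parent.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
            for name, payload in files.items():
                zf.writestr(name, payload)
    else:
        # Treat the path as a directory and keep the original file names
        target.mkdir(parents=True, exist_ok=True)
        for name, payload in files.items():
            (target / name).write_bytes(payload)
    return target
```

For example, save_all({"infotable.xml": b"<xml/>"}, "out") writes out/infotable.xml, while passing archive=True with a path such as "out.zip" produces a single zip file.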

Contributing

Contributions are welcome! We would love to hear your thoughts on how this library could be better at working with SEC Edgar.

Reporting Issues

We use GitHub issues to track public bugs. Report a bug by opening a new issue; it's that easy!

Making code changes

  • Fork the repo and create your branch from master.
  • If you've added code that should be tested, add tests.
  • If you've changed APIs, update the documentation.
  • Ensure the test suite passes.
  • Make sure your code lints.
  • Issue that pull request!

License

edgartools is distributed under the terms of the MIT license.

Contact

LinkedIn

edgartools's People

Contributors

chence08, dgunning, jekozyra, linusbiostat, rla3rd

edgartools's Issues

Partially wrong parsing of Apple's 10K for 2016

Python version: 3.12.2 (main, Mar 12 2024, 08:01:18) [GCC 12.2.0]
edgartools version: 2.21.1

Minimum Reproducible Example

from edgar import Company, set_identity


if __name__ == "__main__":
    set_identity("Ilias Antonopoulos ([email protected])")

    filings = Company("AAPL").get_filings(form="10-K", filing_date="2016-01-01:2016-12-31")

    filing = filings.get(0).obj()

    print(filing.financials.income_statement)
    print(filing.financials.income_statement.to_dataframe())

yields the following:

image

i.e. apart from the missing Net Sales, one can also identify some incorrect numerical values for:

  • Gross Profit
  • Net Income
  • Earnings Per Share

Upon closer examination, it seems that the parser may have "picked up" the values not from the CONSOLIDATED STATEMENTS OF OPERATIONS but from Note 12 – Selected Quarterly Financial Information (Unaudited) (see screenshot below).

image

p.s. May also happen to 10Ks from other fiscal years prior to 2021 (based on my findings, 2021 and beyond is correct).
p.s.#2 Keep up the great work, this library is a true gift to the community <3

How to download submission file

Really appreciate you have created such an awesome library.

Here is one issue we came across:

After I fetched the attachments, I need to download the last text file. Here is my code:

with open("my_file.txt", "wb") as binary_file:
    submission_txt = filing.attachments[14].download()
    # Write bytes to file
    binary_file.write(submission_txt)

It showed an error. How can I resolve this issue?

[bug/feature] fetch latest filings

Hi,

I'm recently trying to see if it is possible to fetch latest data via edgartools.
eg. I was looking at https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=4&owner=only&count=40&action=getcurrent
and try to get the latest records with exact accepted date time. However, it seems edgartools' most recent records only went up to "yesterday" not "today". I also tried edgar.get_by_accession_number but cannot find a specific "today's" record.

Would it be possible to achieve "live" records fetching using edgartools?

Thank you for the great tool.

Display multiple financial statements

I am trying to print financial statements of quarterly or annual reports. However, the API allows printing only one at a time. Is there a way to dump the information of, say, the last 3 statements in a single table?

For example
Company("AAPL").get_filings(form="10-K", date="2021-01-01:").latest(1).obj()

The above one prints only one statement. Can the API consolidate all the requested filings and dump in a table format?

10-Q filings Facts to pandas dataframe

First of all, this is really amazing! I am enjoying using it, and it makes it much easier to find filings than the EDGAR SEC website.

Is there a way to convert the latest 10-Q filing data to a pandas dataframe?

from edgar import *
set_identity("first last [email protected]")
company = Company(909832)
last10q = company.get_filings(form="10-Q").latest()
last10q.xbrl()

filing financials: some facts are missing when converting income statement to dataframe

Python version: 3.12.2 (main, Mar 12 2024, 08:01:18) [GCC 12.2.0]
edgartools version: 2.21.0

Minimum Reproducible Example

from edgar import Company, set_identity


if __name__ == "__main__":
    set_identity("Ilias Antonopoulos ([email protected])")

    filings = Company("AAPL").get_filings(form="10-K", filing_date="2023-01-01:2023-12-31")

    filing = filings.get(0).obj()

    print(filing.financials.income_statement)
    print(filing.financials.income_statement.to_dataframe())

I have noticed that, although the facts present in filing.financials.income_statement are correct, when compared with the corresponding income statement table from SEC, the facts:

  • Total Net Sales
  • Cost Goods and Services Sold
  • Selling General and Administrative Expenses (a bit more minor)

are missing from the .to_dataframe() version of it (see screenshot below).

image

Feature: get the filing URLs from a given Filing

Hi!

More of a feature request, although this may already be supported; I may just be missing it.

If I have a Filing, it would be great to be able to get the URL links of the filing (screenshot) programmatically, as well:

Screenshot 2024-05-01 at 5 39 11 PM

How can I programmatically get the URL https://sec.gov/Archives/edgar/data/789019/0000950170-23-035122-index.html from the filing?

Here is the code I used to get the filing:

company = Company("MSFT")
company.get_filings(accession_number="0000950170-23-035122")[0]

import error

Import error

from edgar import *
This worked well 2 days ago. Now I get this:


ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from edgar import *

File ~\anaconda3\Lib\site-packages\edgar\__init__.py:9
6 from functools import partial
7 from typing import Optional, Union, List
----> 9 from edgar.entities import (Company,
10 CompanyData,
11 CompanyFacts,
12 CompanySearchResults,
13 CompanyFilings,
14 CompanyFiling,
15 Entity,
16 EntityData,
17 find_company,
18 get_entity,
19 get_company_facts,
20 get_company_tickers,
21 get_entity_submissions,
22 get_ticker_to_cik_lookup,
23 get_cik_lookup_data)
24 from edgar._filings import (Filing,
25 Filings,
26 FilingHeader,
(...)
32 get_by_accession_number,
33 FilingHomepage)
34 from edgar.core import (edgar_mode,
35 CRAWL,
36 CAUTION,
37 NORMAL,
38 get_identity,
39 set_identity)

File ~\anaconda3\Lib\site-packages\edgar\entities.py:17
14 from rich.panel import Panel
15 from rich.text import Text
---> 17 from edgar._filings import Filing, Filings, FilingsState
18 from edgar._rich import df_to_rich_table, repr_rich
19 from rich import box

File ~\anaconda3\Lib\site-packages\edgar\_filings.py:36
33 from rich.text import Text
35 from edgar._markdown import html_to_markdown, text_to_markdown
---> 36 from edgar._party import Address
37 from edgar._rich import df_to_rich_table, repr_rich
38 from edgar._xbrl import FilingXbrl

File ~\anaconda3\Lib\site-packages\edgar\_party.py:12
10 from edgar._rich import repr_rich
11 from edgar._xml import child_text, child_value
---> 12 from edgar.core import IntString
14 __all__ = [
15 'Address',
16 'Issuer',
(...)
20 'get_addresses_as_columns'
21 ]
24 class Address(BaseModel):

File ~\anaconda3\Lib\site-packages\edgar\core.py:20
18 import pyarrow as pa
19 import pyarrow.compute as pc
---> 20 from charset_normalizer import detect
21 from fastcore.basics import listify
22 from rich.logging import RichHandler

File ~\anaconda3\Lib\site-packages\charset_normalizer\__init__.py:23
1 """
2 Charset-Normalizer
3 ~~~~~~~~~~~~~~
(...)
21 :license: MIT, see LICENSE for more details.
22 """
---> 23 from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
24 from charset_normalizer.legacy import detect
25 from charset_normalizer.version import __version__, VERSION

File ~\anaconda3\Lib\site-packages\charset_normalizer\api.py:10
7 PathLike = Union[str, 'os.PathLike[str]'] # type: ignore
9 from charset_normalizer.constant import TOO_SMALL_SEQUENCE, TOO_BIG_SEQUENCE, IANA_SUPPORTED
---> 10 from charset_normalizer.md import mess_ratio
11 from charset_normalizer.models import CharsetMatches, CharsetMatch
12 from warnings import warn

File ~\anaconda3\Lib\site-packages\charset_normalizer\md.py:5
2 from typing import Optional, List
4 from charset_normalizer.constant import UNICODE_SECONDARY_RANGE_KEYWORD
----> 5 from charset_normalizer.utils import is_punctuation, is_symbol, unicode_range, is_accentuated, is_latin,
6 remove_accent, is_separator, is_cjk, is_case_variable, is_hangul, is_katakana, is_hiragana, is_ascii, is_thai
9 class MessDetectorPlugin:
10 """
11 Base abstract class used for mess detection plugins.
12 All detectors MUST extend and implement given methods.
13 """

ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (C:\Users\crunc\anaconda3\Lib\site-packages\charset_normalizer\constant.py)


I assume this is affecting other users?

Thank you for your work.
-m

Unhashable type Series for filings xbrl

After running the following code


from edgar import get_filings, Filings, Filing, get_company, set_identity, Company

set_identity('Isla Sthiss [email protected]')

forms = ['10-Q',"10-K","10-K/A","10-Q/A"]

filings = get_filings(range(2013, 2024), form=forms)
file_10k = filings.filter(form="10-K", amendments=True) 
filing = filings[3]
filing.homepage
filing_xbrl = filing.xbrl()

TypeError: unhashable type: 'Series'

Here you have an interactive notebook to reproduce the error: https://colab.research.google.com/drive/12m_ohER2LGLe0khppGeKyqoqUkMLh_Ks?usp=sharing

NVDA 10-K "Item 1" does not provide the full item text

NVDA 10-K "Item 1" does not seem to provide the full item text.
February 24, 2023 - 10-K: Annual report for year ending January 29, 2023

NVDA 10-K filing "Item 1" text spans from page 4 to page 15. However, the code below provides only part of the "Item 1" text, from page 4 through the middle of page 11.

tenk = Company("NVDA").get_filings(form="10-K").latest(1).obj()
print(f"NVDA item 1 text:\n{tenk['Item 1']}")

Is there another way to get the full "Item 1" text correctly?

Bug: Not Capturing 10-K Item 1 text

I've been running some tests on "Item 1" extraction across multiple symbols and have found a few that don't get picked up in the code below.

Code:

symbol = "PR" #EE also has this behavior if you need another test
filing = Company(symbol).get_filings(form="10-K").latest(1)

tenk = filing.obj()
output = tenk['Item 1']

error from import edgar statement

Hi I'm getting an error at the initial import statement
from edgar import *

error is:


ImportError Traceback (most recent call last)
Cell In[5], line 1
----> 1 from edgar import *

File ~\AppData\Roaming\Python\Python311\site-packages\edgar\muniadvisors.py:14
11 from rich.table import Table
12 from rich.text import Text
---> 14 from edgar import Filing
15 from edgar._party import Name, Address
16 from edgar._rich import repr_rich

ImportError: cannot import name 'Filing' from 'edgar' (C:\Users\Geoffrey\AppData\Roaming\Python\Python311\site-packages\edgar\__init__.py)

Getting the below issue when using filing.text()

2024-04-14 15:25:01,042 - root - INFO - Attachment for 0001047469-02-007674.txt -> EX-99.1.txt downloaded.
Traceback (most recent call last):
  File "/Users/test.py", line 94, in <module>
    download_filings_and_attachments(fillings10K, dir_path_10K)
  File "/Users/test.py", line 57, in download_filings_and_attachments
    f.write(filing.text())
  File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/_filings.py", line 1671, in text
    return HtmlDocument.from_html(html_content).text
  File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/documents.py", line 422, in from_html
    root: Tag = cls.get_root(html)
  File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/documents.py", line 412, in get_root
    if "" in html[:500]:
TypeError: a bytes-like object is required, not 'str'
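
The error message suggests that html was a bytes object at the point where a string membership test was applied to it. A minimal sketch of normalising such content to text first is shown below. The helper ensure_text, the UTF-8 default and the "<html>" marker are assumptions for illustration (the marker in the trace above is not preserved), and this is not the library's actual fix.

```python
from typing import Union


def ensure_text(content: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return content as str, decoding bytes and replacing undecodable bytes."""
    if isinstance(content, bytes):
        return content.decode(encoding, errors="replace")
    return content


raw = b"<html><body>EX-99.1</body></html>"
# A str-in-bytes test such as "<html>" in raw[:500] raises TypeError;
# after decoding, the same check works on text.
print("<html>" in ensure_text(raw)[:500])
```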

Get latest press release per company / query by attachment name and date

I think press releases are typically encoded as EX 99.1, which is an attachment of 8-K. This worked for me in ~200 out of 500 companies, by iterating through every 8-K and checking for that - but the other ones were missing

Is there an easier way to get the latest press release per company?

There are functions to get the latest 8-K but this doesn't always have the correct attachment

Not able to run EDGAR in Python 3.11.8

I recently updated my Conda environment to version 24.1.2 and then updated Python to 3.11.8. Using pip, I installed EDGAR to get version 5.4.3.
image

However, upon trying to import EDGAR and run setup, I got the following error:
image

What could be the cause of this? Thanks.

ability to load previously saved filings

In the spirit of ETL, it would be beneficial for downstream applications of edgartools to be able to discretize company filings persistence (e.g. in a storage backend like S3) and company filings parsing (e.g. extracting the balance sheet data) in order to engineer more flexible data pipelines.

The library right now provides the ability to persist a company filing (e.g. through .full_text_submission() on filing or through .download() on attachments) but - based on my experience as an application user - there doesn't seem to be a straightforward way to load a saved company filing and continue the parsing (e.g. filing.obj()) from there.

I suspect that this might not be the vision and philosophy of edgartools (and i totally respect it), just pitching the angle in order to discuss whether this resonates with you and the community.

p.s. I really, really like how the library is designed! Kudos for the effort, this is monumental work! <3
p.s.2 happy to help with the design / implementation of said feature (if it first passes feedback ofc)

strip_ixbrl_tags fails if 'style' is None

Hello I'm back

I seem to have found another edge case for strip_ixbrl_tags.

edgartools version: 2.8.1

import edgar

edgar.set_identity("id")
company = edgar.Company("ABNB")  # Ticker is important, not all filings run into this issue
filings = company.get_filings(form=["10-K"])
if filings:
    print(filings[0].text()) 

Stacktrace

Traceback (most recent call last):
  File "/workspace/test.py", line 10, in <module>
    print(filings[0].text())
          ^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/_filings.py", line 1512, in text
    return html_to_text(html_content, ignore_tables=ignore_tables, sep=sep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 87, in html_to_text
    html_str = try_to_strip_ixbrl_tags(html_str)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 176, in try_to_strip_ixbrl_tags
    return strip_ixbrl_tags(html_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 191, in strip_ixbrl_tags
    if parent.tag == '{http://www.w3.org/1999/xhtml}div' and 'display:inline' in parent.get('style'):
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable 
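
The failing line assumes every matching div has a style attribute, but get('style') returns None when the attribute is absent. One possible guard is sketched below using the standard library's ElementTree as a stand-in (the library's own parser may be a different one), with a hypothetical helper name, is_inline_div.

```python
import xml.etree.ElementTree as ET

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


def is_inline_div(element: ET.Element) -> bool:
    """True when the element is an XHTML div whose style contains display:inline."""
    style = element.get("style") or ""  # a missing attribute yields None
    return element.tag == XHTML_DIV and "display:inline" in style


plain = ET.fromstring('<div xmlns="http://www.w3.org/1999/xhtml"/>')
inline = ET.fromstring('<div xmlns="http://www.w3.org/1999/xhtml" style="display:inline"/>')
print(is_inline_div(plain), is_inline_div(inline))
```

Substituting an empty string for a missing style keeps the membership test valid without changing the result for divs that do carry the attribute.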

NFLX 10-K income statement is not parsed correctly

NFLX 10-K filing income statement appears to be using different gaap tags and missing multiple facts from income statement after parsing.
https://www.sec.gov/ix?doc=/Archives/edgar/data/1065280/000106528023000035/nflx-20221231.htm#id2a4f0e6b6dd43c49a69044f9260d065_115
"Total Revenue" tag used in 10-K filing: us-gaap:Revenues

    incdf = Company("NFLX").get_filings(form="10-K").latest(1).obj().income_statement.income_statement_dataframe
    print(f"{incdf.to_markdown()}")

above code produces only the below three facts from NFLX 10-K filing.

|   | Fact | Value |
|---|------|-------|
| 0 | Operating Income or Loss | $5,632,831,000 |
| 1 | Net Income | $4,491,924,000 |
| 2 | Interest Expense | $706,212,000 |

Congratulations

One of the nicest looking EDGAR libraries out there, thanks for your open source contribution.

AMD 10-K income_statement generates error

When trying to get the data from the income statement in the latest 10-K form from AMD, I am getting an error. I am using the following commands:

filing = Company("AMD").get_filings(form="10-K").latest(1)
income_statement = filing.obj().income_statement

Traceback (most recent call last):
File "", line 1, in
File "/opt/homebrew/lib/python3.11/site-packages/edgar/company_reports.py", line 56, in financials
return Financials.from_xbrl(xbrl)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/edgar/financials.py", line 301, in from_xbrl
balance_sheet = BalanceSheet(xbrl.facts)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/edgar/financials.py", line 174, in init
super().init(facts, end_date)
File "/opt/homebrew/lib/python3.11/site-packages/edgar/financials.py", line 66, in init
self.end_date: str = end_date or self.facts.period_end_date
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/edgar/_xbrl.py", line 126, in period_end_date
return self.get_dei('DocumentPeriodEndDate')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/edgar/_xbrl.py", line 122, in get_dei
return res.value.item()
^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/pandas/core/base.py", line 418, in item
raise ValueError("can only convert an array of size 1 to a Python scalar")
ValueError: can only convert an array of size 1 to a Python scalar
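
The last frames show the immediate cause: pandas' `.item()` only succeeds when the selection holds exactly one element, so the `DocumentPeriodEndDate` lookup fails if the filing yields zero or several matching facts. Which of those applies to the AMD filing cannot be told from the traceback. Below is a minimal sketch of a more tolerant lookup in plain Python, assuming the matching values have already been collected into a list. The helper name `single_distinct_value` is hypothetical and not part of edgartools.

```python
from typing import Optional, Sequence


def single_distinct_value(values: Sequence[str]) -> Optional[str]:
    """Return the one distinct value in `values`, or None if it is empty.

    Repeated copies of the same value are tolerated; a ValueError is
    raised only when the values genuinely disagree.
    """
    distinct = list(dict.fromkeys(values))  # de-duplicate, keeping order
    if not distinct:
        return None
    if len(distinct) > 1:
        raise ValueError(f"conflicting values: {distinct}")
    return distinct[0]


# The same date reported twice is treated as a single answer.
print(single_distinct_value(["2023-12-30", "2023-12-30"]))
```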

set_identity is not working

123

It shows the error "cannot import name 'set_identity' from 'edgar'". I tried this in both a Jupyter notebook and Google Colab. My Python version is 3.10.12 on Google Colab and 3.11.7 on Jupyter. Please let me know how to resolve this. Edit: working now.

user agent hard coded

Users will have the author's rate limiting applied to their own use. It should use the user's user agent from set_identity.

edgar/_filings.py, line 81, has the author's user-agent identity hard-coded.

The same issue is in tests/test_filing.py on line 214.
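
One way to avoid a hard-coded identity is to store whatever the user passes in and read it back when building request headers. Here is a minimal sketch, assuming the identity is kept in an environment variable. The variable name `EDGAR_IDENTITY` and both function names are illustrative assumptions, not the library's actual code.

```python
import os
from typing import Dict

# Assumed name of the variable that holds the caller's identity.
IDENTITY_ENV_VAR = "EDGAR_IDENTITY"


def remember_identity(identity: str) -> None:
    """Store the caller's identity so later requests can use it."""
    os.environ[IDENTITY_ENV_VAR] = identity


def build_headers() -> Dict[str, str]:
    """Build request headers from the stored identity, never a hard-coded one."""
    identity = os.environ.get(IDENTITY_ENV_VAR)
    if not identity:
        raise RuntimeError("No identity set; call remember_identity() first")
    return {"User-Agent": identity}


remember_identity("Jane Doe jane.doe@example.com")
print(build_headers())
```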

Bug in FundReport pulling ISIN

import pandas as pd
from edgar import *

set_identity("Sample [email protected]")
to_drop = ['ticker', 'maturity_date', 'annualized_rate','desc_other_units',
'is_default', 'cash_collateral','non_cash_collateral', 'restricted']
fund = Company("MDY")
latest_holdings = Company("MDY").get_filings(form="NPORT-P")[0]
fund_report = latest_holdings.obj().investment_data()
invest_table = fund_report.drop(to_drop, axis=1)

print(invest_table)

# all the ISINs are the same:
                           name                         title
0                   Hubbell Inc                   Hubbell Inc
1      Builders FirstSource Inc      Builders FirstSource Inc
2  Reliance Steel & Aluminum Co  Reliance Steel & Aluminum Co
3                     Graco Inc                     Graco Inc
4                     Jabil Inc                     Jabil Inc

                    lei      cusip          isin           balance units  \
0  54930088VDQ6840Y6597  443510607  US3719011096   453329.00000000    NS
1  549300W0SKP6L3H7DP63  12008R107  US3719011096  1084228.00000000    NS
2  549300E287ZOFT3C5Z56  759509102  US3719011096   497354.00000000    NS
3  4T5VJ4S81BRT6Q7GGT78  384109104  US3719011096  1424046.00000000    NS
4  5493000CGCQY2OQU7669  466313103  US3719011096  1122419.00000000    NS
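
A quick way to confirm the ISIN column is wrong rather than merely repetitive: a US ISIN is the prefix "US", then the 9-character CUSIP, then one check digit, so characters 3 to 11 of the ISIN should equal the CUSIP in the same row. Here is a minimal sketch of that check using the five rows above. The helper name `isin_matches_cusip` is made up for illustration, and the check only applies to ISINs with the "US" prefix.

```python
def isin_matches_cusip(isin: str, cusip: str) -> bool:
    """True if a US ISIN embeds the given 9-character CUSIP."""
    return len(isin) == 12 and isin.startswith("US") and isin[2:11] == cusip


# (name, cusip, isin) copied from the output above.
rows = [
    ("Hubbell Inc", "443510607", "US3719011096"),
    ("Builders FirstSource Inc", "12008R107", "US3719011096"),
    ("Reliance Steel & Aluminum Co", "759509102", "US3719011096"),
    ("Graco Inc", "384109104", "US3719011096"),
    ("Jabil Inc", "466313103", "US3719011096"),
]

# Collect the holdings whose ISIN does not embed their own CUSIP.
mismatches = [name for name, cusip, isin in rows
              if not isin_matches_cusip(isin, cusip)]
print(f"{len(mismatches)} of {len(rows)} rows have an ISIN that does not match the CUSIP")
```

None of the five rows passes the check, which suggests the same ISIN is being read once and repeated for every holding.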

Downloading attachments of a filing

Hello!

I really appreciate this Python package and thank you heaps to the authors and contributors of this package, you have made my life so much easier!

I hope it is okay to ask a question about using the package. I would like to download only the attachments (i.e., not the primary document) of a filing. Is that possible? Any suggestions would be appreciated.

Best regards,
G

Bug: filings.get(accession_no) throws: LiveError: Only one live display may be active at once

This one is quite strange and may be related to #20

It looks like only NEW filings throw this error, though I have not yet confirmed that. When I take one of the most recent filing accession numbers from https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=&type=8-K&owner=include&count=100&action=getcurrent&start=2 and pass it to find, the following error is thrown:

accession_no = '0001493152-24-001619'
f = find(accession_no)
           WARNING                                                                                                        _filings.py:626
                        Provide a year between 1994 and 2024 and optionally a quarter (1-4) for which the SEC has                        
                    filings.                                                                                   
                            e.g. filings = get_filings(2023) OR                                                                          
                                 filings = get_filings(2023, 1)                                                
                        (You specified the year 2024 and quarter 2)                                            
           WARNING                                                                                                        _filings.py:626
                        Provide a year between 1994 and 2024 and optionally a quarter (1-4) for which the SEC has                        
                    filings.                                                                                   
                            e.g. filings = get_filings(2023) OR                                                                          
                                 filings = get_filings(2023, 1)                                                
                        (You specified the year 2024 and quarter 3)                                            
           WARNING                                                                                                        _filings.py:626
                        Provide a year between 1994 and 2024 and optionally a quarter (1-4) for which the SEC has                        
                    filings.                                                                                   
                            e.g. filings = get_filings(2023) OR                                                                          
                                 filings = get_filings(2023, 1)                                                
                        (You specified the year 2024 and quarter 4)                                            
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blahblah/python3.11/site-packages/edgar/__init__.py", line 77, in find
    return get_by_accession_number(search_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 2050, in get_by_accession_number
    return filings.get(accession_number)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 812, in get
    with Status(f"[bold deep_sky_blue1]Searching through the most recent filings for {accession_number}...",
  File "/blahblah/python3.11/site-packages/rich/status.py", line 97, in __enter__
    self.start()
  File "/blahblah/python3.11/site-packages/rich/status.py", line 87, in start
    self._live.start()
  File "/blahblah/python3.11/site-packages/rich/live.py", line 113, in start
    self.console.set_live(self)
  File "/blahblah/python3.11/site-packages/rich/console.py", line 836, in set_live
    raise errors.LiveError("Only one live display may be active at once")
rich.errors.LiveError: Only one live display may be active at once
>>> 

and if attempted again...

>>> f = find('0001493152-24-001619')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/blahblah/python3.11/site-packages/edgar/__init__.py", line 77, in find
    return get_by_accession_number(search_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 2050, in get_by_accession_number
    return filings.get(accession_number)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blahblah/python3.11/site-packages/edgar/_filings.py", line 812, in get
    with Status(f"[bold deep_sky_blue1]Searching through the most recent filings for {accession_number}...",
  File "/blahblah/python3.11/site-packages/rich/status.py", line 97, in __enter__
    self.start()
  File "/blahblah/python3.11/site-packages/rich/status.py", line 87, in start
    self._live.start()
  File "/blahblah/python3.11/site-packages/rich/live.py", line 113, in start
    self.console.set_live(self)
  File "/blahblah/python3.11/site-packages/rich/console.py", line 836, in set_live
    raise errors.LiveError("Only one live display may be active at once")
rich.errors.LiveError: Only one live display may be active at once

Rendering images when downloading HTML

When you use the .download() method, attachments such as images aren't downloaded (and therefore aren't rendered) in the downloaded HTML.

Example:

cik = 'TSLA'

after = '2013-01-01'

filings = Company(cik).get_filings(form="8-K").filter(date=f"{after}:")

filings[0].attachments[1].url

This leads to https://www.sec.gov/Archives/edgar/data/1318605/000095017024046895/tsla-ex99_1.htm, which is a slide deck.
The .download() method returns the HTML, but the images are hosted relative to the file URL (e.g. https://www.sec.gov/Archives/edgar/data/1318605/000095017024046895/tsla-ex99_1s2.jpg).

We can also find these images in the filings[0].attachments field, under the GRAPHIC tag.

What do you think would be the best pattern for the .download() method to include this? Or are there other alternatives you propose?
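
One possible pattern, sketched with only the standard library: collect the `src` of every `<img>` tag in the downloaded HTML and resolve it against the attachment's URL, so the images can be fetched or the links rewritten. The class and function names are hypothetical, and the one-line HTML string stands in for the real document.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImageSourceCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def absolute_image_urls(html: str, document_url: str) -> List[str]:
    """Resolve each image src relative to the URL the HTML came from."""
    collector = ImageSourceCollector()
    collector.feed(html)
    return [urljoin(document_url, src) for src in collector.sources]


document_url = ("https://www.sec.gov/Archives/edgar/data/1318605/"
                "000095017024046895/tsla-ex99_1.htm")
html = '<html><body><img src="tsla-ex99_1s2.jpg"></body></html>'  # stand-in HTML
print(absolute_image_urls(html, document_url))
```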

Bug: Filing item text is cut off after '$' character

I have a particular filing and was looking at the items and noticed it cuts off after the dollar sign. This particular example is an 8-K.

# get the company
c = Company('0001588272')

# get the filing
f = c.get_filings().filter(form='8-K')[0]

# display the item text 
print(f.text())

Notice the cut-off at:

which the Company issues shares under its distribution reinvestment plan (the “DRP”) at $

Some line items are missing in balance sheet report

Thank you for building and sharing such a useful library!

While testing it, I noticed that some line items under current assets are missing in the balance sheet report when compared to the interactive data on the sec.gov website (Please see images pasted below).

image
image

This issue also reproduces with other companies (e.g., MSFT). Is this behavior by design?

Thank you very much!

Feature request: Item text parsing support for 10-K/Q

First, excellent library. Well done! Thanks for creating this.

I'm curious to know if you plan to add text parsing support for 10-Ks and 10-Qs and other filings, much like you provided for 8-Ks?

Like the following:

Image

On a side note, is there an easy way to convert the results above to JSON?

Cheers!

Bug: find method does not appear to work with accession number

I came across a curious situation where the find method does not appear to work when passing a fairly new filing accession number.

However, finding the filing via Company method seems to fetch the filing just fine.

Example:

accession_no = '0001193125-23-300021'
# not working as of 12/20/21 @ 20:43 ET
find(accession_no)

cik = '0001588272'
# vs this which works fine
Company(cik).get_filings().filter(form='8-K')[0]

Nothing critical just thought I'd log it! Thanks :)

Balance Sheet

I'm experiencing a similar issue: items are missing from the balance sheet. Otherwise, the design is very intuitive.

Interest Expenses and EBITDA

Hello,
is there a way to retrieve the information on interest expenses (needed to calculate EBITDA for a company)?
Or is a synonym used?

Regards,
Peter
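
For reference, EBITDA is commonly computed as net income plus interest expense, income taxes, and depreciation and amortization. Below is a minimal sketch of that arithmetic. The function name is hypothetical and the figures are placeholders for illustration, not values from any filing.

```python
def ebitda(net_income: float, interest_expense: float,
           income_taxes: float, depreciation_and_amortization: float) -> float:
    """EBITDA = net income + interest + taxes + depreciation and amortization."""
    return (net_income + interest_expense
            + income_taxes + depreciation_and_amortization)


# Placeholder figures for illustration only, not taken from any filing.
example = ebitda(net_income=100.0, interest_expense=10.0,
                 income_taxes=20.0, depreciation_and_amortization=30.0)
print(example)
```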

S-3 issues

The primary document and full submission text links do not work when clicked, although there is information on the homepage. CIK 1145255 is an example.

strip_ixbrl_tags breaks when name() is missing

edgartools runs into this XML parsing exception when name is missing in the XML

Screenshot 2024-01-31 at 4 01 19 PM Screenshot 2024-01-31 at 3 58 28 PM

Simple repro:

import edgar

edgar.set_identity("id")
company = edgar.Company("HON")
filings = company.get_filings(form=["10-K"])
filings[0].text()  # <-- throws here

Python: 3.11.6
Edgartools version: 2.7.2

Using ProcessPoolExecutor with get_filings

I am trying to run Company(ticker).get_filings(form=["10-K","10-Q"]).filter(date=f'{start_date}:{end_date}') for around 1000 stocks using ProcessPoolExecutor. I keep getting error 429 even with max_workers = 8 (the SEC says it allows 10 requests per second). I was wondering: 1. does form=["10-K","10-Q"] count as 1 or 2 requests, and 2. is there a better way to send the requests, i.e. accessing a list of company filings AFTER importing all of the SEC filings at once (one request)? Any help would be appreciated!
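
A note on why 8 workers can still trigger 429s: a throttle that lives in process memory is not shared across the processes of a ProcessPoolExecutor, so each worker can run at the full rate on its own. Whether edgartools' throttle works this way is an assumption here. Below is a minimal sketch that gives each worker an equal share of the 10 requests per second mentioned above. `MinIntervalThrottle` and `per_worker_rate` are hypothetical helpers, not part of edgartools.

```python
import time


def per_worker_rate(total_per_second: float, workers: int) -> float:
    """Split a global request budget evenly across worker processes."""
    return total_per_second / workers


class MinIntervalThrottle:
    """Enforces a minimum gap between consecutive requests in one process."""

    def __init__(self, max_per_second: float,
                 clock=time.monotonic, sleep=time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following request one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# With 8 workers sharing 10 requests/second, each worker gets 1.25 requests/second.
throttle = MinIntervalThrottle(per_worker_rate(10, 8))
print(throttle.min_interval)
```

Each worker would create one throttle and call `wait()` before every request, which keeps the combined rate at or below the global budget.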

get_facts_for_namespace returns AttributeError

Problem

Per README documentation:

company = Company("SNOW")
company_facts = company.get_facts_for_namespace()

is expected to work; however, get_facts_for_namespace raises an AttributeError.

Minimum Reproducible Example

# e.g. tested it in Python 3.12.2
pip install edgartools==2.9.0

and then:

from edgar import *

set_identity("Ilias Antonopoulos [email protected]")  # enter valid email here
company = Company("SNOW")
company_facts = company.get_facts_for_namespace()

Screenshot

image

P.S. Happy to assist with its resolution if it's a source code issue and not simply a documentation one :)

How to download the attachments?

I am trying to download the attachments:
filings = get_filings(range(2022, 2025), form="NPORT-P", amendments=True)
filings[0].attachments.download(path=r'D:/Karie')
This produces some errors:
image
How can I solve this?
Thank you!

Fail to Get Filings for a Specific Year

I get the expected result when I run

filings = get_filings(2023, 1)

It finishes in about 7~8 secs on my laptop. However, when I run

filings = get_filings(2023)

the code runs indefinitely. I have to write a for loop to get the filings for each quarter

filings_list = []
for i in range(1,5):
    filings_list.append(get_filings(2023, i))

It takes 24~25 secs to finish. My Python version is 3.10.14 and I am using edgartools==2.19.2. I am not sure if this is a bug or if I am doing something wrong. I would appreciate any help. Thank you.
