py-edgar's Introduction

EDGAR

A small library for accessing filings from the SEC's EDGAR database.

Installation

pip install edgar

Example

To get a company's five most recent 10-Ks, run

from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)

or

from edgar import Company, TXTML

company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)

To get all companies and find a specific one, run

from edgar import Edgar
edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")

To avoid pulling all company data from sec.gov when Edgar is initialized, pass in a local path to the data

from edgar import Edgar
edgar = Edgar("/path/to/cik-lookup-data.txt")
possible_companies = edgar.find_company_name("Cisco System")

To get XBRL data, run

from edgar import Company, XBRL, XBRLElement

company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict()  # returns a dictionary of name, value, and schemaRef

API

Company

Company(name, cik, timeout=10)
  • name (company name)
  • cik (company CIK number)
  • timeout (optional) (default: 10)

Methods

get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str

Returns a URL to fetch filings data

  • filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, it'll return all documents
  • prior_to: Date before which documents are to be retrieved. If not specified, it'll return all documents
  • ownership: defaults to include. Options are include, exclude, only.
  • no_of_entries: defaults to 100. The number of entries to be returned. Maximum is 100.
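
As a rough illustration of the kind of URL these parameters describe, the sketch below builds an EDGAR company-browse query with the standard library. The helper name build_filings_url, the endpoint, and the query-parameter names are assumptions made for illustration; they are not taken from the library's source and may differ from what get_filings_url actually produces.

```python
from urllib.parse import urlencode

# Assumed endpoint for EDGAR's company browse page (not confirmed from py-edgar).
BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_filings_url(cik, filing_type="", prior_to="", ownership="include", no_of_entries=100):
    """Hypothetical helper that mirrors the documented parameters."""
    if ownership not in ("include", "exclude", "only"):
        raise ValueError("ownership must be 'include', 'exclude', or 'only'")
    params = {
        "action": "getcompany",
        "CIK": cik,
        "type": filing_type,   # empty string means all filing types
        "dateb": prior_to,     # empty string means no date cutoff
        "owner": ownership,
        "count": min(no_of_entries, 100),  # the documented maximum is 100
    }
    return BROWSE_URL + "?" + urlencode(params)


url = build_filings_url("0001341439", filing_type="10-K")
print(url)
```

Capping count inside the helper keeps a caller from silently asking for more entries than the documented maximum.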

get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement

Returns the HTML as an lxml.html element

  • filing_type: The type of document you want, e.g. 10-K, S-8, 8-K. If not specified, it'll return all documents
  • prior_to: Date before which documents are to be retrieved. If not specified, it'll return all documents
  • ownership: defaults to include. Options are include, exclude, only.
  • no_of_entries: defaults to 100. The number of entries to be returned. Maximum is 100.

get_10Ks(self, no_of_documents=1, as_documents=False) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the concatenation of all the documents in each 10-K

  • no_of_documents (default: 1): number of documents to be retrieved
  • as_documents (default: False): when set to True, it returns a list of Documents (List[edgar.document.Documents]) instead

get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the document within the 10-K

  • document_type: The type of document you want, e.g. 10-K, EX-3.2
  • no_of_documents (default: 1): number of documents to be retrieved

get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]

Returns the HTML, as lxml.html elements, of the data file within the 10-K

  • document_type: The type of document you want, e.g. EX-101.INS
  • no_of_documents (default: 1): number of documents to be retrieved
  • isxml (default: False): by default, the data file is parsed with html in lxml, which is not case sensitive. If this is True, it is parsed with etree, which is case sensitive
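
The difference between case-insensitive HTML parsing and case-sensitive XML parsing can be seen with the standard library alone. The sketch below uses html.parser and xml.etree.ElementTree as stand-ins for lxml's html and etree parsers (an assumption made so the example has no dependencies), and the sample markup and tag names are made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up sample with mixed-case tag names, similar in spirit to XBRL tags.
SAMPLE = "<Root><SharesOutstanding>100</SharesOutstanding></Root>"


class TagCollector(HTMLParser):
    """Records the start tags exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SAMPLE)
html_tags = collector.tags  # the HTML parser lowercases tag names

root = ET.fromstring(SAMPLE)
exact = root.find("SharesOutstanding")    # matches: XML keeps the original case
lowered = root.find("sharesoutstanding")  # no match: XML lookups are case sensitive

print(html_tags, exact.text, lowered)
```

This is why a lookup written against mixed-case tag names only works reliably when the data file is parsed as XML.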

Class Method

get_documents(self, tree: lxml.html.HtmlElement, no_of_documents=1, debug=False, as_documents=False) -> List[lxml.html.HtmlElement]

Returns a list of strings, each containing the body of the specified document from the input

  • tree: lxml.html form that is returned from Company.get_all_filings
  • no_of_documents (default: 1): number of documents returned. If it is 1, the returned result is just one string, instead of a list of strings.
  • debug (default: False): if True, displays the URL and form
  • as_documents (default: False): when set to True, it returns a list of Documents (List[edgar.document.Documents]) instead

Edgar

Gets all companies from EDGAR

get_cik_by_company_name(company_name: str) -> str: Returns the CIK if given the exact name of the company

get_company_name_by_cik(cik: str) -> str: Returns the company name if given the CIK (including the leading zeros)

find_company_name(words: str) -> List[str]: Returns a list of company names by exact word matching

match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]: Returns a list of dictionaries, with company names, CIK, and their fuzzy match score

  • top (default: 5) returns the top number of fuzzy matches. If set to None, it'll return the whole list (which is a lot)
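
To make the idea of a fuzzy match score and the top parameter concrete, here is a minimal sketch using difflib from the standard library. The scoring method, the function name match_names, the dictionary keys, and the sample company list are all assumptions for illustration; the library may compute its scores differently.

```python
from difflib import SequenceMatcher


def match_names(query, names, top=5):
    """Hypothetical fuzzy matcher: scores every name and returns the best ones."""
    scored = [
        {
            "company_name": name,
            # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones
            "score": SequenceMatcher(None, query.lower(), name.lower()).ratio(),
        }
        for name in names
    ]
    scored.sort(key=lambda entry: entry["score"], reverse=True)
    # top=None returns the whole ranked list, mirroring the documented behaviour
    return scored if top is None else scored[:top]


# Made-up sample data for the sketch.
names = ["CISCO SYSTEMS, INC.", "SYSCO CORP", "ORACLE CORP"]
best = match_names("Cisco Systems", names, top=1)
everything = match_names("Cisco Systems", names, top=None)
print(best[0]["company_name"])
```

Ranking the full list and slicing afterwards keeps top=None and top=5 on the same code path.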

XBRL

Parses data from XBRL

Properties

relevant_children

  • get children that are not context

relevant_children_parsed

  • get children that are not context, unit, schemaRef
  • cleans tags

Documents

Filing and Documents Details for the SEC EDGAR Form (such as 10-K)

Documents(url, timeout=10)

Properties

url: str: URL of the document

content: dict: Dictionary of metadata of the document

content['Filing Date']: str: Document filing date

content['Accepted']: str: Document accepted datetime

content['Period of Report']: str: The date period that the document is for

element: lxml.html.HtmlElement: The HTML element for the Document (from the url) so it can be further parsed

Contribution

Buy Me A Coffee

py-edgar's People

Contributors

eabase, ipl31, joeyism, kbennatti

py-edgar's Issues

Issue with parsing 10K filings

I am using edgar 5.4.1 to get 10-K filings from the SEC database, but I cannot parse the HTML elements to see the actual text content of the documents.
The following seems to be the proper approach for version 5.4.1, but it raises an attribute error.

import edgar
company = edgar.Company("Facebook Inc", "0001326801")
tree = company.get_all_filings(filing_type="10-K")
docs = edgar.Company.get_documents(tree=tree, no_of_documents=5, as_documents=True)
sample_text = edgar.TXTML.parse_full_10K(docs[3].element) 
sample_text

I also tried the approach below, but I think it only works in an earlier version.

import edgar
company = edgar.Company("Facebook Inc", "0001326801")
tree = company.get_all_filings(filing_type="10-K")
docs = edgar.Company.get_documents(tree=tree, no_of_documents=5)
docs[3].element.text_content()

Is there something I'm missing? What is the issue here?

Thanks, in advance!

Getting 10-K for a company returns empty String

company = edgar.Company("AARON'S INC", "706688")
a = company.get_10K()
text = edgar.TXTML.parse_full_10K(a)
print(text)

This code for accessing an Aaron's Inc 10-K returns an empty string. It works correctly for other companies. What is the issue here, and how can I fix it? I am using the latest version.

IndexError: list index out of range running sample code "To get XBRL data"

Hello Joey!

I want to say thanks so much for working on this project, it is exactly what I hoped to find.

I have been working with the package and its sample code, and I am getting an index error when running your sample code from the section "To get XBRL data".

Specifically, "results" is a list with no elements after running the following line:

results = company.get_data_files_from_10K("EX-101.INS", isxml=True)

The error occurs during execution of the next line:

xbrl = XBRL(results[0])

I originally tried this in a Jupyter notebook, but also tried in an interactive interpreter session to make sure it wasn't just related to the environment.

I would appreciate it if you could provide some assistance and let me know if this problem is a known issue or something I am doing wrong, please?

I specifically would like to collect "facts" from the 10-K reports like number of common shares outstanding, for instance.

Thanks, sincerely

P.S. - I followed the BuyMeACoffee.com link with the intent of providing support, and found you don't have a support button on your page...
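
For the IndexError described in this issue, one defensive pattern is to check that the result list is non-empty before indexing into it. The sketch below is a generic guard, not a fix for why the list is empty; the helper name first_result is hypothetical and not part of the library.

```python
def first_result(results, description):
    """Return the first item, or raise a readable error if the list is empty."""
    if not results:
        raise LookupError(f"no {description} found; the filing may not include one")
    return results[0]


# Intended use with the sample code from the issue (left as comments, since it needs network access):
# results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
# xbrl = XBRL(first_result(results, "EX-101.INS data file"))

try:
    first_result([], "EX-101.INS data file")
    message = None
except LookupError as exc:
    message = str(exc)

print(message)
```

The guard turns a bare "list index out of range" into a message that names what was missing.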

Retrieving latest 10-K file does not give the actual filing

I'm running this code to get the most recent 10-K file for 1st Source Corp:

company = edgar.Company("1ST SOURCE CORP", "34782")
a = company.getAllFilings(filingType = "10-K",noOfEntries = 1)
filings = edgar.getDocuments(tree = a, noOfDocuments = 1)
soup = BeautifulSoup(filings,'html5lib')
text = soup.get_text(strip = True)
print(text)

The output does not give me the actual filing, but gives me this:

	This application relies heavily on JavaScript, you
	will need to allow JavaScript to use this application.Inline Viewer
	Menu
	Information
	Save XBRL Instance
	Save XBRL Zip File
	Open as HTML
	Settings
	Help
	Sections
	Additional Search Options
	...
	Loading Inline Form.
	...

[The rest of the output is the Inline XBRL Viewer's own interface text: search options, data and tag filters, help sections, settings, and fact navigation controls. It contains no text from the 10-K filing itself.]

One important note: when I tried retrieving the last two filings, the second was correctly the filing for 12-31-2018, but the first, which should be the one for 12-31-2019, was still incorrect.

Release new version?

There have been quite a few commits to master since the last release. It would be nice to get an official PyPI release, to avoid having to pin a commit as a dependency in order to pull the latest changes.

Excluding amendments

How would you suggest excluding amendments? I know the sec_edgar_downloader library has a parameter for this: include_amends=False. Do your methods have such a parameter, or can you point me to how I could apply that to the code below?
Thanks again!

I'm using the following code to pull the documents:

from edgar import Company
from edgar import TXTML

company = Company("BOSTON SCIENTIFIC CORP", "0000885725")
tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)

text=TXTML.parse_full_10K(docs[3])
text
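
Amended filings on EDGAR carry a form type ending in "/A" (for example, 10-K/A), so one way to exclude them is to filter on the form type. The sketch below assumes the form type of each filing is available as a string; the function names, the dictionary layout, and the sample entries are hypothetical and not part of the library.

```python
def is_amendment(form_type):
    """Amended EDGAR forms use a '/A' suffix, e.g. '10-K/A'."""
    return form_type.strip().upper().endswith("/A")


def exclude_amendments(filings):
    """Keep only filings whose form type is not an amendment."""
    return [filing for filing in filings if not is_amendment(filing["form_type"])]


# Made-up entries standing in for rows scraped from a filings index.
filings = [
    {"form_type": "10-K", "label": "original annual report"},
    {"form_type": "10-K/A", "label": "amendment"},
    {"form_type": "10-K", "label": "earlier annual report"},
]

originals = exclude_amendments(filings)
print([filing["label"] for filing in originals])
```

Applying this to the code in the issue would require reading the form type for each row of the tree returned by get_all_filings before fetching the documents.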

Edgar issue, but I need help from experts here

This is an EDGAR issue, but I don't know how to make the SEC aware of it.

Any advice is welcome, and you can close this, but I think it would benefit users here to address it properly.
The company is Apple (ticker AAPL, CIK 320193).
Two concerns: the request takes more than 4 minutes to complete, and a header that is consistent with their request header specification fails.
This works, but takes 4 minutes 20 seconds:

import requests
header = {
  "User-Agent": "[email protected]"
}

CIK = '320193'
url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{str(CIK).zfill(10)}.json"
info = requests.get(url, headers=header).json()

This fails and also takes 4 minutes 20 seconds:

import requests
header = {
  "User-Agent": "[email protected]", "Host": "www.sec.gov"
}
CIK = '320193'
url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{str(CIK).zfill(10)}.json"
info = requests.get(url, headers=header).json()
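
One visible difference between the two snippets is that the failing one sends "Host": "www.sec.gov" while the URL points at data.sec.gov. Whether that mismatch explains the failure is an assumption, but a simple thing to try is leaving Host unset so the HTTP client derives it from the URL. The helper names below are hypothetical, and the contact string is a placeholder.

```python
def companyfacts_url(cik):
    """Build the companyfacts URL used in the issue, zero-padding the CIK to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{str(cik).zfill(10)}.json"


def sec_headers(contact):
    """Headers that declare the client; no Host entry, so it follows the URL's host."""
    return {"User-Agent": contact}


url = companyfacts_url("320193")
headers = sec_headers("your-contact-address")  # placeholder, replace with real contact details

# The request itself is left as a comment because it needs network access:
# import requests
# info = requests.get(url, headers=headers, timeout=30).json()

print(url)
```

Passing an explicit timeout also keeps a stalled request from hanging for minutes.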

Unable to Retrieve Most Updated 10-Q

company = Company('COSTCO WHOLESALE CORP.', "909832")
tree = company.get_all_filings(filing_type="10-Q")
docs = get_documents(tree, no_of_documents=5)

texter = tostring(docs[1])
soup=BeautifulSoup(texter,'html.parser')
print(soup.get_text())
When I run the above, the 10-Q returned is for the quarterly period ended May 12, 2019, while the most recent 10-Q available on EDGAR for this company is for the quarterly period ended November 24, 2019.
I checked a couple of other companies as well, and the last two or three 10-Qs filed are not available.

get_documents does not work

I tried the three examples on https://pypi.org/project/edgar/

There is, however, one example that does not work:

from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type = "10-K")
docs = edgar.get_documents(tree, no_of_documents=5)

AttributeError: 'Edgar' object has no attribute 'get_documents'

Rate limit edgar requests

The SEC website recently (within the last couple of months) added rate limiting. Currently, none of this library's requests handle it properly. This leads to hard-to-decode errors and makes the library much less usable in scripts. When an IP address is flagged for rate limiting, the SEC website returns a 403 response with a body like the text below.

<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SEC.gov | Request Rate Threshold Exceeded</title>
<style>
html {height: 100%}
body {height: 100%; margin:0; padding:0;}
#header {background-color:#003968; color:#fff; padding:15px 20px 10px 20px;font-family:Arial, Helvetica, sans-serif; font-size:20px; border-bottom:solid 5px #000;}
#footer {background-color:#003968; color:#fff; padding:15px 20px;font-family:Arial, Helvetica, sans-serif; font-size:20px;}
#content {max-width:650px;margin:60px auto; padding:0 20px 100px 20px; background-image:url(seal_bw.png);background-repeat:no-repeat;background-position:50% 100%;}
h1 {font-family:Georgia, Times, serif; font-size:20px;}
h2 {text-align:center; font-family:Georgia, Times, serif; font-size:20px; width:100%; border-bottom:solid #999 1px;padding-bottom:10px; margin-bottom:20px;}
h3 {font-family:Georgia, Times, serif; font-size:16px; margin:25px 0 0 0;}
p {font-family:Verdana, Geneva, sans-serif;font-size:14px;line-height:1.3;}
.grey_box {background-color:#eee; padding:5px 40px 20px 40px;margin-top:75px;}
.grey_box p {font-size:12px;line-height:1.5}
.note {padding: 0 40px; font-style: italic;}
</style>
</head>

<body>
<div id="header">U.S. Securities and Exchange Commission</div>
<div id="content">
<h1>Your Request Originates from an Undeclared Automated Tool</h1>
<p>To allow for equitable access to all users, SEC reserves the right to limit requests originating from undeclared automated tools. Your request has been identified as part of a network of automated tools outside of the acceptable policy and will be managed until action is taken to declare your traffic.</p>

<p>Please declare your traffic by updating your user agent to include company specific information.</p>


<p>For best practices on efficiently downloading information from SEC.gov, including the latest EDGAR filings, visit <a href="https://www.sec.gov/developer" target="_blank">sec.gov/developer</a>. You can also <a href="https://public.govdelivery.com/accounts/USSEC/subscriber/new?topic_id=USSEC_260" target="_blank">sign up for email updates</a> on the SEC open data program, including best practices that make it more efficient to download data, and SEC.gov enhancements that may impact scripted downloading processes. For more information, contact <a href="mailto:[email protected]">[email protected]</a>.</p>

<p>For more information, please see the SEC’s <a href="#internet">Web Site Privacy and Security Policy</a>. Thank you for your interest in the U.S. Securities and Exchange Commission.
</p><p>Reference ID: 0.8fa83817.1621385100.1c3be594</p>
<div class="grey_box">
<h2>More Information</h2>
<h3><a name="internet" id="internet">Internet Security Policy</a></h3>

<p>By using this site, you are agreeing to security monitoring and auditing. For security purposes, and to ensure that the public service remains available to users, this government computer system employs programs to monitor network traffic to identify unauthorized attempts to upload or change information or to otherwise cause damage, including attempts to deny service to users.</p>

<p>Unauthorized attempts to upload information and/or change information on any portion of this site are strictly prohibited and are subject to prosecution under the Computer Fraud and Abuse Act of 1986 and the National Information Infrastructure Protection Act of 1996 (see Title 18 U.S.C. §§ 1001 and 1030).</p>

<p>To ensure our website performs well for all users, the SEC monitors the frequency of requests for SEC.gov content to ensure automated searches do not impact the ability of others to access SEC.gov content. We reserve the right to block IP addresses that submit excessive requests.  Current guidelines limit users to a total of no more than 10 requests per second, regardless of the number of machines used to submit requests. </p>

<p>If a user or application submits more than 10 requests per second, further requests from the IP address(es) may be limited for a brief period. Once the rate of requests has dropped below the threshold for 10 minutes, the user may resume accessing content on SEC.gov. This SEC practice is designed to limit excessive automated searches on SEC.gov and is not intended or expected to impact individuals browsing the SEC.gov website. </p>

<p>Note that this policy may change as the SEC manages SEC.gov to ensure that the website performs efficiently and remains available to all users.</p>
</div>
<br />
<p class="note"><b>Note:</b> We do not offer technical support for developing or debugging scripted downloading processes.</p>
</div>
</body>
</html>
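
The page above asks callers to declare their traffic through the user agent. A minimal sketch of what that could look like with the standard library is below. The helper name build_sec_request and the contact string are hypothetical, and the library's own request code may set headers differently; this only shows the header being attached, without sending anything.

```python
import urllib.request


def build_sec_request(url, contact):
    """Return a Request whose User-Agent declares who is making the call.

    `contact` is a free-form string such as a company name and an email
    address (hypothetical values here, not a format taken from the library).
    """
    return urllib.request.Request(url, headers={"User-Agent": contact})


# Build (but do not send) a request for the developer page mentioned above.
request = build_sec_request(
    "https://www.sec.gov/developer",
    "Example Corp admin@example.com",
)
declared_agent = request.get_header("User-agent")
```

Passing the resulting object to urllib.request.urlopen would send the request with the declared header.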

Issue with match_company_by_company_name

Thank you for the amazing library that you have developed and shared!

I am currently trying to find matching company names in the SEC EDGAR database, given that I have a file where the names are not exact matches of their real ones. I have been trying to execute the following code:

edgar.Edgar.match_company_by_company_name("Neoforma.com Inc")

but get the following error:

match_company_by_company_name() missing 1 required positional argument: 'name'

I believe the first argument is the self in the Python class, an issue I don't know how to overcome.
Any help would be greatly appreciated!
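
The error message is what Python reports when an instance method is called on the class itself: the string is bound to self, so name is left unfilled. A minimal sketch with a hypothetical stand-in class (Directory is not part of the library; it only mirrors the shape of the method) shows the difference. With the real library, the equivalent would presumably be to create an Edgar() instance first, as the README examples do, and call the method on that instance.

```python
class Directory:
    """Hypothetical stand-in for a class exposing an instance method."""

    def __init__(self, names):
        self.names = names

    def match_company_by_company_name(self, name):
        # Case-insensitive substring match, for illustration only.
        return [n for n in self.names if name.lower() in n.lower()]


# Called on the class: the string fills `self`, so `name` is missing.
try:
    Directory.match_company_by_company_name("Neoforma.com Inc")
except TypeError as exc:
    error_message = str(exc)

# Called on an instance: `self` is supplied automatically.
directory = Directory(["NEOFORMA.COM INC", "ORACLE CORP"])
matches = directory.match_company_by_company_name("Neoforma.com Inc")
```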

Having trouble with getting 8-K

I tried the very first sample code, and it only returns the HTML element, not the actual document. When I use the second sample code, I am able to get the document. I want to use the first sample code for 8-Ks but cannot parse the HTML. Is there a way to do this, such as some sort of "get_8-K" function? I tried using "TXTML.get_HTML_from_document" to parse the 8-K, but I get a "list index out of range" error if the number of documents is set to one; if it is more than one, I get "list object has no attribute tag". Is this the right function to use? Thanks (sorry if this is a noob question).

Filing Date?

Aside from scraping the headers within the SEC url, do you have a method for pulling the respective Filing Date for each document? Thanks!

Read timed out error

I'm trying to pull the past few 10-K documents for each company in the S&P, but after a few successful returns in a for-loop (typically between 5 and 10 companies), I get the following error:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.sec.gov', port=443): Read timed out. (read timeout=10)

Is the request I'm making hitting the site too hard, or could something else be causing this issue? Would you recommend adding a sleep between requests, or some other alternative?
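
One way to approach this, sketched under assumptions, is to space calls out and retry a timed-out call with a growing delay. Throttle and call_with_retries below are hypothetical helpers, not part of the library. In real use, retry_on would likely be requests.exceptions.ReadTimeout, and func would wrap one call such as company.get_all_filings(...). The Company constructor's documented timeout argument (default 10) could also be raised.

```python
import time


class Throttle:
    """Spaces out calls so at most `max_per_second` happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retries(func, attempts=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with doubling delay."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

In a loop over companies, calling throttle.wait() before each call_with_retries(...) keeps the request rate under a chosen limit while tolerating an occasional timeout.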

Issues with parsing 10K filings

I've tried this on several companies and despite pulling 5 documents, the first element [0] returns a 'list index out of range' error.
I'm using TXTML.parse_full_10K(), which works for the subsequent 4 elements but never for the first. Is there a different method for the first element I should be using? I'm working with version 5.3.4. Thanks!

from edgar import Company
from edgar import TXTML

company = Company("Oracle Corp", "0001341439")
#company = Company("Amazon Com Inc", "0001018724")   
#company = Company('AbbVie Inc.','0001551152')
#company = Company('Alphabet Inc.', '0001652044')

tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)

text = TXTML.parse_full_10K(docs[0])  # fails with 'list index out of range'; the other 4 elements parse fine
text
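
To narrow down which documents fail, one option is to parse each one separately and record the failures instead of stopping at the first. This is a sketch with a hypothetical helper, parse_each, which takes the parser as an argument. With the code above it would be called as parse_each(docs, TXTML.parse_full_10K), assuming docs is a list.

```python
def parse_each(docs, parser):
    """Apply parser to every document, collecting IndexError failures.

    Returns (parsed, failures): parsed holds (index, result) pairs and
    failures holds (index, error message) pairs.
    """
    parsed, failures = [], []
    for index, doc in enumerate(docs):
        try:
            parsed.append((index, parser(doc)))
        except IndexError as exc:
            failures.append((index, str(exc)))
    return parsed, failures
```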

Edgar class init fails

Looks like the CIK file the class downloads from SEC/EDGAR now 404s.
Repro:
>>> Edgar()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "edgar/edgar.py", line 31, in __init__
    all_companies_array[i] = (item_arr[0], item_arr[1])
IndexError: list index out of range

Do you plan on continuing to maintain and use this library? I might have some time to try a fix for this but would prefer to avoid it if the library is abandoned. I suspect you will need to migrate to doing CIK lookups on demand instead of pre-caching them since it looks like the SEC no longer has that file available for download.
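
The IndexError suggests that each downloaded line is split and indexed without checking its shape, so an error page in place of the data file breaks the loop. A more defensive parse could look like the sketch below. It assumes each valid line has the form NAME:CIK: with a numeric CIK. The function parse_cik_lookup is hypothetical and is not the library's code.

```python
def parse_cik_lookup(text):
    """Parse lines assumed to look like 'NAME:CIK:'; skip anything else.

    Returns (companies, skipped), where companies is a list of
    (name, cik) tuples and skipped counts the malformed lines.
    """
    companies = []
    skipped = 0
    for line in text.splitlines():
        # Drop the trailing ':' and split on the last remaining ':' so
        # that names containing ':' stay intact.
        parts = line.strip().rstrip(":").rsplit(":", 1)
        if len(parts) != 2 or not parts[1].isdigit():
            skipped += 1
            continue
        companies.append((parts[0], parts[1]))
    return companies, skipped
```

A caller could treat an empty companies list as a sign that the download returned something other than the expected file.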

Installation error for latest as python-levenshtein is not maintained

Installation error for latest as python-levenshtein is not maintained.
Also fuzzywuzzy is not maintained and has probably moved to thefuzz.

Building wheels for collected packages: python-levenshtein
  Building wheel for python-levenshtein (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [67 lines of output]
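
Until the dependencies are sorted out, approximate name matching can be sketched with the standard library's difflib, which needs no compiled extension. This is a swapped-in technique, not what the library does internally, and closest_names is a hypothetical helper. Its scores will differ from those of fuzzywuzzy or thefuzz.

```python
import difflib


def closest_names(query, names, n=3, cutoff=0.6):
    """Return up to n names most similar to query, ignoring case."""
    # Map lowercase forms back to the original spellings.
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


candidates = closest_names(
    "Cisco System",
    ["CISCO SYSTEMS, INC.", "SYSCO CORP", "ORACLE CORP"],
)
```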
