lukerosiak / pysec
Parse XBRL filings from the SEC's EDGAR in Python
Looks like an interesting project. As I am new to python/git, please excuse any ignorance on my part.
It seems like the current logic is to get the XBRL zip file, as well as other files, from the Full Index, which points to the complete filing in TXT. Since pysec is only for XBRL, would it be better to use the RSS feed (see ftp://ftp.sec.gov/edgar/monthly/) to get information on the XBRL filing? The RSS feed gives the zip file explicitly in most cases (some older filings under the voluntary program will not have a zip).
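A rough sketch of the RSS approach: in RSS 2.0 each item can carry an enclosure element with a url attribute, and the EDGAR XBRL feeds appear to use it to point at the filing's zip. Assuming that layout, the zip URLs can be pulled out with the standard library alone, skipping items that have no zip enclosure (such as the older voluntary-program filings). The function name xbrl_zip_urls is hypothetical and not part of pysec; fetching the feed is left out.

```python
import xml.etree.ElementTree as ET


def xbrl_zip_urls(rss_text):
    """Return (title, zip_url) pairs for RSS items whose enclosure is a .zip.

    Assumption: the feed is plain RSS 2.0, so <item> and <enclosure> are
    un-namespaced; EDGAR-specific (namespaced) children are simply ignored.
    """
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # Older filings may have no enclosure at all; skip those.
        if enclosure is not None and enclosure.get("url", "").endswith(".zip"):
            title = item.findtext("title", default="")
            results.append((title, enclosure.get("url")))
    return results
```

Each returned URL could then be handed to whatever download step pysec already uses for the zip, instead of locating it through the Full Index.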
If you are interested in the difference between the HTML, TXT, and XML (XBRL) files, let me know and I will elaborate.
Hi, I have 3 Qs:
a) Is this project actively maintained? I had to modify the examples to make them work, and the .py file that generates a CSV file crashes after 40 companies, etc.
b) I am at the stage where I can retrieve financial information for a particular CIK. I love that pysec autopopulates its DB (I am using sqlite3), so the next access is faster. The data shown by print x.fields (from example.py) is only a handful of values; is it complete? For example, how do I print the market cap for a filing?
c) Last question: using pysec, is it possible to print, say, a list of all companies with a market cap above $1B in a particular industry?
This project looks interesting. However, it would be easier to incorporate into projects if it were structured as a proper setuptools-based Python package, versioned, and registered on pypi.python.org.
Thank you for sharing your code with the community :)
Though at the bottom of your README you state the project is released under the GNU license, it may help prospective contributors and users to know which GNU license (GPL?) you are releasing it under.
Again, thank you for sharing your work, and I hope this helps drive more folks to your project.
I ran into an issue with 819793/0000891092-14-001589/ain-20131231.xml; I get the exception
'Undefined namespace prefix'
at x.getNodeList("//xbrli:entity/xbrli:segment/xbrldi:explicitMember")
I don't have much experience with lxml. Can someone please help me solve this? Thanks.
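That error usually means the XPath expression uses a prefix (here xbrldi) that is missing from the prefix-to-URI mapping given to the XPath engine. I haven't checked how getNodeList builds its mapping, so this is a sketch under the assumption that xbrldi is simply absent from it. It uses the standard library's ElementTree so it runs without extra dependencies; with lxml, the equivalent fix is to pass the same dict as the namespaces= argument of xpath(). The names XBRL_NS and explicit_members are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map. The XPath prefixes only need to match these
# keys, not the prefixes the filing itself happens to declare.
XBRL_NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "xbrldi": "http://xbrl.org/2006/xbrldi",  # the prefix missing in the error
}


def explicit_members(xml_text):
    """Return (dimension, member) pairs from context segments."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(
        ".//xbrli:entity/xbrli:segment/xbrldi:explicitMember", XBRL_NS
    )
    return [(n.get("dimension"), (n.text or "").strip()) for n in nodes]
```

Leaving xbrldi out of the dict reproduces the same class of failure in ElementTree (it raises SyntaxError: prefix 'xbrldi' not found in prefix map).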
The library does not work with quarterly reports (10-Qs). I'm using the .xml file downloaded directly, and GetBaseInformation in xbrl.py fails.
Any plans to look at this issue?
See for example pandas-dev/pandas#4407
The data structures returned by pysec would be pandas DataFrames.
Currently pandas is able to use a number of other projects for data loading such as xlrd.
Do we want to make it work?
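As a sketch of what a DataFrame-returning interface might look like, assuming each filing's fields is a flat dict of concept name to value (as x.fields in example.py appears to be), filings could become rows indexed by accession number. The function fields_to_frame and the input shape are hypothetical, not an existing pysec or pandas API.

```python
import pandas as pd


def fields_to_frame(filings):
    """Build one DataFrame row per filing.

    filings: iterable of (accession_number, fields_dict) pairs (assumed shape).
    Concepts missing from a filing become NaN in that row.
    """
    records = [dict(fields, accession=accession) for accession, fields in filings]
    if not records:
        return pd.DataFrame()
    return pd.DataFrame.from_records(records).set_index("accession")
```

Keeping one row per filing and one column per concept makes cross-company filtering a plain pandas operation on the result.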
hey is pysec/pysec/management identical to pysec/management? Can one of 'em be killed?
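One quick way to answer the "identical?" question locally is the standard library's filecmp.dircmp, applied recursively. This is a generic sketch; the helper name dirs_identical is hypothetical. Note that dircmp compares files shallowly: files whose type, size and modification time all match are treated as equal without reading their contents.

```python
import filecmp


def dirs_identical(left, right):
    """Recursively check that two directory trees contain the same files."""
    cmp = filecmp.dircmp(left, right)
    if (cmp.left_only or cmp.right_only or cmp.diff_files
            or cmp.funny_files or cmp.common_funny):
        return False
    # subdirs maps each common subdirectory name to its own dircmp object.
    return all(dirs_identical(sub.left, sub.right) for sub in cmp.subdirs.values())
```

For example, dirs_identical("pysec/pysec/management", "pysec/management") run from the directory containing the checkout.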
In Models.xbrl, xbrl_localpath() assumes the XBRL filename has an .xml extension. But in some cases the XML/XBRL documents appear to have been included in a larger text submission. For example, see here: http://www.sec.gov/Archives/edgar/data/320193/0001193125-12-023398.txt (there are several distinct XBRL files included there). It looks like the text file also includes a binary zip file within a block, all inside a .txt file, which is, uh, odd.
I came across this while looking into handling 10-Q filings; perhaps this isn't an issue for 10-Ks. What do you think is the best way to handle this? Should parsing XML from within the .txt file be part of the download() step? Or is there another file location these should be pulled from?
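EDGAR complete-submission .txt files wrap each document in <DOCUMENT> ... </DOCUMENT> with header lines such as <TYPE> and <FILENAME>, and XBRL documents are additionally wrapped in <XBRL> ... </XBRL> inside <TEXT>. Assuming that layout, one option is to split the .txt during the download step and write each XBRL part out under its <FILENAME>, so xbrl_localpath() can keep pointing at an .xml file. The helper below is hypothetical (not part of pysec) and ignores the embedded binary zip entirely.

```python
import re

# Non-greedy, DOTALL patterns: each <DOCUMENT> block is handled separately.
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
FILENAME_RE = re.compile(r"<FILENAME>([^\s<]+)")
XBRL_RE = re.compile(r"<XBRL>(.*?)</XBRL>", re.DOTALL)


def extract_xbrl_documents(submission_text):
    """Map each embedded XBRL document's <FILENAME> to its XML text."""
    docs = {}
    for body in DOC_RE.findall(submission_text):
        name = FILENAME_RE.search(body)
        xbrl = XBRL_RE.search(body)
        if name and xbrl:  # skip HTML, exhibits and binary parts
            docs[name.group(1)] = xbrl.group(1).strip()
    return docs
```

Each value could then be written to DATA_DIR under its key, after which the existing .xml-based parsing applies unchanged.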
Has anyone noticed any reliability issues? Can we feel comfortable using this tool without saying anything patently wrong about a company as a result?
I'm having issues implementing this with Django and SQLite3. Your README says to change DATA_DIR in the settings.py module, but there is no DATA_DIR to change unless it is copied and pasted into settings. Is this what we're supposed to do? I also didn't understand "Put this Django app under manage.py". What exactly do you mean?
Ultimately I'm trying to get the sec_import_index and sec_xbrl_to_csv commands to work. A step-by-step guide to getting those working with Django and SQLite3 might help others. Thanks in advance for any help.
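One reading of the README, offered as an assumption rather than the author's instructions: DATA_DIR is a setting you add yourself to the project's settings.py, and "put this Django app under manage.py" means placing the pysec app package in the project directory next to manage.py and listing it in INSTALLED_APPS. A minimal SQLite excerpt along those lines follows; the DATA_DIR location, database filename, and the "pysec" app label are guesses.

```python
# settings.py (excerpt): a sketch, not pysec's documented configuration.
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent

# Assumption: pysec reads DATA_DIR from Django settings as the place to keep
# downloaded EDGAR index and XBRL files. The directory name is arbitrary.
DATA_DIR = str(BASE_DIR / "edgar_data")

INSTALLED_APPS = [
    # ...your project's existing apps...
    "pysec",  # assumption: the app package sits next to manage.py
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": str(BASE_DIR / "pysec.sqlite3"),
    }
}
```

With that in place, the commands would be run through manage.py, e.g. python manage.py sec_import_index; check each command's source for any arguments it expects.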
I've tried both MySQL and PostgreSQL, but I get a segmentation fault after the program hangs for a few minutes at ftp://ftp.sec.gov/edgar/full-index/2013/QTR1/company.zip; company.zip is downloaded but not extracted.
My settings.py looks like:
```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',  # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': 'pysec1',                      # Or path to database file if using sqlite3.
        # The following settings are not used with sqlite3:
        'USER': 'blah',
        'PASSWORD': 'blahblah',
        'HOST': '127.0.0.1',                   # Empty for localhost through domain sockets or '127.0.0.1' for localhost through TCP.
        'PORT': '',                            # Set to empty string for default.
    }
}
```
The database appears to get set up with no issues, but this is as far as the program gets with both databases I've tried.
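To rule out a damaged download before digging into the database drivers, one possible check is to open the downloaded company.zip with Python's standard zipfile module outside Django. This is a diagnostic sketch only: it does not use pysec's own download code, and whether the segfault is related to extraction at all is an assumption. The helper name check_index_zip is hypothetical.

```python
import zipfile


def check_index_zip(path_or_file):
    """Return the member names of an index zip after verifying its CRCs.

    Raises zipfile.BadZipFile if the file is truncated or not a zip at all,
    and ValueError if a member fails its CRC check.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        bad = archive.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return archive.namelist()
```

If check_index_zip("company.zip") succeeds and lists the expected index file, the problem is more likely further along, in parsing or in the database layer.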