lukerosiak / pysec

Parse XBRL filings from the SEC's EDGAR in Python

Python 100.00%

pysec's People

jsfenfen vincentraia dberkholz blacksburg98 tonydot28 huangt rchenmit ncaarules rahimnathwani chrisspen digitallexicon dmitrybelsky bigspotteddog hpei1011 johnconnelly75 ifanchu cjones2500 joeycallaghan cademarkegard joejohnston leandroloi decause plditallo bossadvisors spurnaye hiroakip jansel dkdndes sethdandridge davidkunio spikelynch along1x somethingnew2-0 dalamar66 johnsontrey chreko jackhbarnes popeyesurfer faraday1221 ubuntuevangelist wizardshowing richardkhoo an-li-github omitroom13 dta613 skatingboy2006 tranluuha abhigupta4 jeanandhao riskiuniverse hal2001 exactlywrong willianlone dylan1218 jermellb keita1 dashstander rtvt123 killerdarcy sudarshansarolkar abesleistigal sichao92 aferreiramartinez cpfrer ilovenorway shlomitsur guildary vico colingwuyu samtamp95 gpkc totrit pjkonicki amoyquantum rontwo jliang4 frostbyte303 sammyonline boazde alexanu kthouz liuhoward grogoyle chriswongwr tangunner cuperto zldoty stjordanis daejungkim sshuster tungvuthanh btw1027 mjdhasan triplofilotto joepfortunato sweisser noke8868 willmw97 wycliffwasonga wayinone

pysec's Issues

Using RSS Feed

Looks like an interesting project. As I am new to python/git, please excuse any ignorance on my part.

It seems like the current logic is to get the XBRL zip file, as well as other files, from the Full Index, which points to the complete filing in TXT. Would it be better to use the RSS feed (see ftp://ftp.sec.gov/edgar/monthly/) to get information on the XBRL filing, considering pysec is only for XBRL? The RSS feed gives the zip file explicitly in most cases (some older filings under the voluntary program will not have a zip).

If you are interested in the difference between the HTML, TXT, and XML (XBRL) files, let me know and I will elaborate.
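A minimal sketch of the RSS idea above, assuming each feed item carries its zip link in a standard RSS 2.0 enclosure element. The sample feed, company names, and URLs are hypothetical placeholders, and the real EDGAR feed has more fields than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down feed in plain RSS 2.0 form (not real EDGAR data).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample XBRL filings</title>
    <item>
      <title>EXAMPLE CORP (0000000001) (Filer)</title>
      <enclosure url="http://example.com/0000000001-13-000001-xbrl.zip" type="application/zip"/>
    </item>
    <item>
      <title>OLD VOLUNTARY FILER (0000000002) (Filer)</title>
    </item>
  </channel>
</rss>
"""


def xbrl_zip_urls(feed_xml):
    """Return (title, zip_url) pairs; zip_url is None when an item has no enclosure."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        url = enclosure.get("url") if enclosure is not None else None
        results.append((title, url))
    return results


for title, url in xbrl_zip_urls(SAMPLE_FEED):
    print(title, "->", url or "no zip listed")
```

Items without an enclosure are kept with a None URL so that older filings without a zip can be handled separately instead of being silently dropped.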

Is this project actively maintained?

Hi, I have 3 Qs:

a) Is this project actively maintained? I had to modify the examples to make them work, the .py file that generates a CSV file crashes after 40 companies, etc.
b) I am at a stage right now where I am able to retrieve financial information for a particular CIK. I love that pysec auto-populates its DB (I am using sqlite3), so the next access is faster. But the data that print x.fields prints (from example.py) is only a handful of fields. Is it complete? For example, how do I print the market cap for a filing?

c) Last question: using pysec, is it possible to print, say, a "list of all companies with market cap > $1B in a particular industry"?

How do I install this?

Structure as formal Python package

This project looks interesting. However, it would be easier to incorporate into projects if it were structured as a proper setuptools-based Python package, versioned, and registered on pypi.python.org.

Missing LICENSE?

Thank you for sharing your code with the community :)

Though at the bottom of your README you state the project is released under the GNU license, it may be helpful to prospective contributors/users to know which flavor (GPL?) you are releasing under.

Again, thank you for sharing your work, and I hope this helps drive more folks to your project.

'Undefined namespace prefix' in GetCurrentPeriodAndContextInformation

I ran into an issue with 819793/0000891092-14-001589/ain-20131231.xml. I get the exception
'Undefined namespace prefix'
at x.getNodeList("//xbrli:entity/xbrli:segment/xbrldi:explicitMember")

I don't have a lot of experience with lxml. Can someone please help me solve this? Thanks
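This kind of error typically means the query uses a prefix that the evaluator has no URI for. One plausible cause here (an assumption, not confirmed against this filing) is that the xbrldi prefix is missing from the prefix map the query is run with. The sketch below uses the standard library's ElementTree rather than lxml, with a hypothetical sample instance, to show an explicit prefix map being passed in; lxml's xpath() accepts a similar namespaces mapping.

```python
import xml.etree.ElementTree as ET

# Standard XBRL namespace URIs for the two prefixes used in the query.
NAMESPACES = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "xbrldi": "http://xbrl.org/2006/xbrldi",
}

# Hypothetical, minimal instance document for illustration only.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:xbrldi="http://xbrl.org/2006/xbrldi">
  <xbrli:context id="ctx1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000001</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="ex:SampleAxis">ex:SampleMember</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
  </xbrli:context>
</xbrli:xbrl>
"""


def explicit_members(instance_xml):
    """Return (dimension, member) pairs for every explicitMember in a segment."""
    root = ET.fromstring(instance_xml)
    path = ".//xbrli:entity/xbrli:segment/xbrldi:explicitMember"
    # Passing the prefix map explicitly avoids relying on what the root declares.
    return [(node.get("dimension"), node.text)
            for node in root.findall(path, NAMESPACES)]


print(explicit_members(SAMPLE))
```

Supplying the prefix map by hand means the query no longer depends on where, or whether, the document declares those prefixes.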

10Q's

The library does not work with quarterly reports (10-Qs). I am using an .xml file downloaded directly, and GetBaseInformation in xbrl.py fails.
Any plans to look at this issue?

Consider adding pysec to Pandas

See for example pandas-dev/pandas#4407

The data structures returned by pysec would be pandas DataFrames.

Currently pandas is able to use a number of other projects for data loading such as xlrd.

how does this work on quarterly 8-Ks?

Do we want to make it work?

duplicate management folder

hey is pysec/pysec/management identical to pysec/management? Can one of 'em be killed?

xbrl / xml documents embedded in .txt submission

In Models.xbrl, xbrl_localpath() assumes the xbrl filename has an .xml extension. But in some cases the xml/xbrl documents appear to have been included in a larger text submission. For example, see here: http://www.sec.gov/Archives/edgar/data/320193/0001193125-12-023398.txt (there are several distinct xbrl files included there). It looks like the text file also includes a binary zip file within a block. All inside a .txt file. Which is, uh, odd.

I came across this while looking into handling 10-Q filings--perhaps this isn't an issue for 10-K's. What do you think is the best way to handle this? Should parsing xml from within the .txt file be part of the download() step? Or is there another file location that these should be pulled from?
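As a rough illustration of pulling the xml out of the .txt file, here is a sketch that assumes the submission wraps each file in DOCUMENT tags with FILENAME and TEXT markers, and that XBRL bodies may sit inside an extra XBRL wrapper. The function name and sample text are hypothetical, and the sketch does not attempt to decode the embedded zip.

```python
import re

DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
FILENAME_RE = re.compile(r"<FILENAME>([^\r\n]+)")
TEXT_RE = re.compile(r"<TEXT>\s*(.*?)\s*</TEXT>", re.DOTALL)
XBRL_WRAPPER_RE = re.compile(r"^<XBRL>\s*(.*?)\s*</XBRL>$", re.DOTALL)

# Hypothetical, heavily shortened submission text for illustration only.
SAMPLE = """<DOCUMENT>
<TYPE>10-Q
<FILENAME>d10q.htm
<TEXT>
<html>...</html>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-101.INS
<FILENAME>sample-20111231.xml
<TEXT>
<XBRL>
<?xml version="1.0"?>
<xbrl/>
</XBRL>
</TEXT>
</DOCUMENT>
"""


def embedded_xml_documents(submission_text):
    """Map filename -> body for each embedded document whose name ends in .xml."""
    found = {}
    for match in DOCUMENT_RE.finditer(submission_text):
        block = match.group(1)
        name = FILENAME_RE.search(block)
        text = TEXT_RE.search(block)
        if not name or not text:
            continue  # skip blocks missing a filename or body
        filename = name.group(1).strip()
        if not filename.lower().endswith(".xml"):
            continue
        body = text.group(1)
        wrapped = XBRL_WRAPPER_RE.match(body)
        # Drop the outer XBRL wrapper when present so the body parses as XML.
        found[filename] = wrapped.group(1) if wrapped else body
    return found


for filename, body in embedded_xml_documents(SAMPLE).items():
    print(filename, len(body), "characters")
```

Whether this belongs in the download() step or in a separate pass is left open; the sketch only shows that the split itself can be done with the text already on disk.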

checking reliability

anyone notice any reliability issues? can we feel comfortable using this tool without saying anything patently wrong about a company as a result?

PySEC with Django & Sqlite3

I'm having issues implementing this with Django and Sqlite3. Your README says to change DATA_DIR in the settings.py module, but there is no DATA_DIR to change unless it is copied and pasted into settings. Is this what we're supposed to do? I also didn't understand "Put this Django app under manage.py". What exactly do you mean?

Ultimately I'm trying to get the sec_import_index and sec_xbrl_to_csv to work. Maybe a step by step to getting those to work with Django & Sqlite3 will help others. Thanks in advance for any help.
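For anyone hitting the same question, a settings.py fragment along these lines is one reading of the README instruction. Everything here is an assumption: that DATA_DIR simply has to be defined in settings, that the app is installed under the label pysec, and the directory and database file names, which are placeholders.

```python
import os

# Assumed location for downloaded EDGAR files; any writable absolute path should do.
DATA_DIR = os.path.join(os.path.expanduser("~"), "pysec_data")

# Assumed app label: the pysec directory sitting next to manage.py.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "pysec",
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        # Placeholder database file kept alongside the downloaded data.
        "NAME": os.path.join(DATA_DIR, "pysec.sqlite3"),
    }
}
```

With an app installed this way, Django normally exposes its management commands through manage.py, so the two commands named above would be run as python manage.py sec_import_index and python manage.py sec_xbrl_to_csv.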

Segmentation fault

I've tried both MySQL and PostgreSQL but get a segmentation fault after a few minutes of it hanging at ftp://ftp.sec.gov/edgar/full-index/2013/QTR1/company.zip. The company.zip file is downloaded but not extracted.

My settings.py looks like:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql', # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': 'pysec1',                      # Or path to database file if using sqlite3.
        # The following settings are not used with sqlite3:
        'USER': 'blah',
        'PASSWORD': 'blahblah',
        'HOST': '127.0.0.1',                      # Empty for localhost through domain sockets or '127.0.0.1' for localhost through TCP.
        'PORT': '',                      # Set to empty string for default.
    }
}

The database appears to get set up with no issues, but this is as far as the program gets with both databases that I've tried.
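One way to narrow this down, independent of pysec and the database, is to check whether the downloaded archive itself is intact and can be extracted by hand. The sketch below is a generic standard-library check, not pysec's own extraction code; the helper names and the in-memory sample archive are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def extract_if_valid(zip_source, dest_dir):
    """Extract a zip archive and return its member names, or raise if it is corrupt."""
    with zipfile.ZipFile(zip_source) as archive:
        bad_member = archive.testzip()  # name of the first corrupt member, or None
        if bad_member is not None:
            raise zipfile.BadZipFile("corrupt member: %s" % bad_member)
        archive.extractall(dest_dir)
        return archive.namelist()


def make_sample_zip():
    """Build a small in-memory archive standing in for the downloaded file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("company.idx", "sample index contents\n")
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as tmp:
    print(extract_if_valid(make_sample_zip(), tmp))
    print(os.listdir(tmp))
```

In practice the first argument would be the path to the downloaded company.zip. If the archive extracts cleanly this way, the download is probably fine and the fault lies elsewhere in the import step.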