palafrank / edgar
A crawler to get company filing data from XBRL filings
License: Apache License 2.0
Hey Palafrank,
Love what you're doing. Would be happy to help out if needed. Could you include a license for this (hopefully Apache 2 - https://www.apache.org/licenses/LICENSE-2.0)?
I ran every test in the edgar package, but the tests that use live data report missing fields. For example, the test "TestLiveAMGNParsing" reports these fields:
Missing fields in Entity Info[ShareCount,]
Missing fields in Assets[Liab,]
I changed the company ticker from "AMGN" to "IBM" and parsing completed without missing fields. I then tried another ticker, "AAPL", and the parser failed again:
Missing fields in Entity Info[ShareCount,]
So I understand that this parsing error happens only with some companies, but I don't know how to fix it.
https://github.com/palafrank/edgar/blob/master/parser_test.go#L889
https://github.com/palafrank/edgar/blob/master/xbrltags.go
Dps specifically needs to be checked; otherwise math.Inf or math.NaN values can break the JSON encoding and cause a fatal crash, because the encoding library does not handle these values for you.
data_def.go changes:
import (
	"errors"
	"fmt"
	"log"
	"math"
	"reflect"
)
// -- snipped
func generateData(fin *financialReport, name string) float64 {
	log.Println("Generating data: ", name)
	switch name {
	case "GrossMargin":
		// Do this only when the parsing is complete for required fields
		if isCollectedDataSet(fin.Ops, "Revenue") && isCollectedDataSet(fin.Ops, "CostOfSales") {
			log.Println("Generating Gross Margin")
			// Return the value only if it is finite; otherwise fall through to 0
			if v := fin.Ops.Revenue - fin.Ops.CostOfSales; !math.IsInf(v, 0) && !math.IsNaN(v) {
				return v
			}
		}
	case "Dps":
		if isCollectedDataSet(fin.Cf, "Dividends") {
			if isCollectedDataSet(fin.Ops, "WAShares") {
				// A zero share count yields Inf or NaN here, hence the check
				if v := round(fin.Cf.Dividends * -1 / fin.Ops.WAShares); !math.IsInf(v, 0) && !math.IsNaN(v) {
					return v
				}
			} else if isCollectedDataSet(fin.Entity, "ShareCount") {
				if v := round(fin.Cf.Dividends * -1 / fin.Entity.ShareCount); !math.IsInf(v, 0) && !math.IsNaN(v) {
					return v
				}
			}
		}
	case "OpExpense":
		if isCollectedDataSet(fin.Ops, "Revenue") &&
			isCollectedDataSet(fin.Ops, "CostOfSales") &&
			isCollectedDataSet(fin.Ops, "OpIncome") {
			if v := round(fin.Ops.Revenue - fin.Ops.CostOfSales - fin.Ops.OpIncome); !math.IsInf(v, 0) && !math.IsNaN(v) {
				return v
			}
		}
	}
	return 0
}
Hi there,
My fork is highly specialized to my needs at this point, but I thought I would post my code for how I do a company lookup by name and get the corresponding CIK.
changes to parser.go
// cikPostPageParser walks the lookup result page and returns the CIK found in
// a link's href, zero-padded to 10 digits. If several links match, the last wins.
func cikPostPageParser(page io.Reader) (string, error) {
	doc, err := html.Parse(page)
	if err != nil {
		return "", err
	}
	r := regexp.MustCompile(`CIK=[+]?\d{2,}$`)
	var CIK string
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					m := r.FindStringSubmatch(a.Val)
					if len(m) > 0 {
						CIK = strings.Split(m[0], "=")[1]
					}
					break
				}
			}
		}
		// Recurse into child nodes
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(doc)
	if CIK != "" {
		// Left-pad with zeros to the 10-digit CIK form
		for len(CIK) < 10 {
			CIK = "0" + CIK
		}
		return CIK, nil
	}
	return CIK, errors.New("could not find CIK")
}
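// postPage submits the company name to the lookup form and returns the
// response body, or nil on failure. The caller must close the returned body.
func postPage(url1 string, cn string) io.ReadCloser {
	resp, err := http.PostForm(url1, url.Values{"company": {cn}})
	if err != nil {
		// log.Fatal would exit the process and make the return unreachable,
		// so log the failure and let the caller handle the nil result.
		log.Println("Query to SEC page", url1, "failed:", err)
		return nil
	}
	return resp.Body
}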
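The href matching and zero-padding used by cikPostPageParser can be tried without fetching a page. This is a minimal sketch under stated assumptions: `extractCIK` is a hypothetical name, the sample href is illustrative, and the pattern differs from the posted one by adding a capture group so that an optional leading `+` is left out of the result.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cikRe mirrors the posted pattern, with a capture group around the digits.
var cikRe = regexp.MustCompile(`CIK=[+]?(\d{2,})$`)

// extractCIK is a hypothetical helper: it returns the CIK at the end of href,
// left-padded with zeros to 10 digits, or "" when href carries no CIK.
func extractCIK(href string) string {
	m := cikRe.FindStringSubmatch(href)
	if m == nil {
		return ""
	}
	cik := m[1]
	if len(cik) < 10 {
		cik = strings.Repeat("0", 10-len(cik)) + cik
	}
	return cik
}

func main() {
	// Illustrative href; the real result page may format its links differently.
	fmt.Println(extractCIK("browse-edgar?action=getcompany&CIK=320193"))
	fmt.Println(extractCIK("/about.html") == "")
}
```

Because the pattern is anchored with `$`, a link with further query parameters after the CIK would not match, which is also true of the posted code.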
changes to page.go
var (
	baseURL   string = "https://www.sec.gov/"
	cikURL    string = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&output=xml&CIK=%s"
	backupCIK string = "https://www.sec.gov/cgi-bin/cik_lookup"
	queryURL  string = "cgi-bin/browse-edgar?action=getcompany&CIK=%s&type=%s&dateb=&owner=exclude&count=10"
	searchURL string = baseURL + queryURL
)
func getCompanyCIK(ticker string) string {
	fmt.Println("getting company CIK")
	var t bool // true when the lookup must go through the company-name form
	if strings.Contains(ticker, " ") {
		// If the "ticker" has a space in it, we assume it is a company name
		t = true
	} else {
		// Otherwise we assume it is a ticker and try
		url1 := fmt.Sprintf(cikURL, ticker)
		r := getPage(url1)
		// This is inefficient: upstream requires an unclosed resp.Body, so
		// testing whether the ticker worked costs this call plus one later.
		rb, _ := ioutil.ReadAll(r)
		t = strings.Contains(string(rb), "No matching Ticker Symbol.")
	}
	if !t {
		url1 := fmt.Sprintf(cikURL, ticker)
		r2 := getPage(url1) // the inefficient second call
		if cik, err := cikPageParser(r2); err == nil {
			return cik
		}
		return ""
	}
	r := postPage(backupCIK, ticker)
	if r != nil {
		if cik, err := cikPostPageParser(r); err == nil {
			fmt.Println(cik)
			return cik
		}
	}
	return ""
}
It works. It's not pretty, but it removes the limitation of searching only by CIK or symbol (many smaller companies do not automatically work that way).
Also, I am working on a mass collection of words to correlate them to tags, so the number of tags mapped to concepts should increase when I am finished. I will submit some additional tags for you if you want.
Hi Palafrank,
Thanks again for open sourcing this. I was wondering how to save a company folder, and whether it saves the high-level info parsed from company documents or the actual documents?
Thanks,
Brock