yanlesin / sec13flist

Returns a data frame with the SEC Official List of Section 13F Securities for a given year and quarter by parsing the official list

License: Creative Commons Zero v1.0 Universal

Languages: R 82.73%, C++ 17.27%
Topics: r, sec, 13f, parse, pdf, dataframe, rstats, cusip, sedol, isin


SEC13Flist

The goal of the SEC13Flist package is to provide functions for working with the official list of Section 13(f) securities.

The functions SEC_13F_list and SEC_13F_list_local parse the PDF list published on SEC.gov for a supplied year and quarter and return a data frame of securities that keeps the same structure as the official list. Both functions append YEAR and QUARTER columns to the list. The returned data frame can be customized and filtered according to your needs.

SEC_13F_list fetches the list from the SEC.gov website and may need updating if the landing page changes. If a change to the landing page breaks it, you can use SEC_13F_list_local to parse a file downloaded to a local folder.

SEC_13F_list requires a user agent to be set before it attempts a download from the sec.gov website. For details on setting the user agent and the maximum request rate, see https://www.sec.gov/os/accessing-edgar-data.

The user agent can be set via options(HTTPUserAgent=...).
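A minimal sketch of setting the user agent for the current R session. The name and email below are placeholders (assumptions, not values required by the package); the SEC page linked above describes the expected format. Saving the previous option value lets you restore it afterwards.

```r
# Placeholder identity: replace with your own name/company and contact email.
old_opts <- options(HTTPUserAgent = "Jane Doe jane.doe@example.com")

# Confirm the option is set before calling SEC_13F_list()
print(getOption("HTTPUserAgent"))

# ... download and parse lists here ...

# Optionally restore the previous user agent afterwards:
# options(old_opts)
```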

The functions isCusip, isSedol, and isIsin verify the check digit of a security identifier: they compute the check digit from the leading characters of the identifier (all except the final check digit) and compare it with the final character. They return TRUE for a correct identifier and FALSE for an incorrect one.
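To illustrate what such a check involves, here is a base-R sketch of the CUSIP check digit following the published algorithm: letters map to 10 through 35 and `*`, `@`, `#` to 36, 37, 38; every second character is doubled; and the digits of each value are summed. The helper names are hypothetical and this is not the package's isCusip implementation. Because letters are mapped explicitly, CUSIPs containing letters (such as CINS numbers) are handled too.

```r
# Hypothetical helpers (not SEC13Flist's isCusip): compute and verify the
# CUSIP check digit with the "double every second character" algorithm.
cusip_char_value <- function(ch) {
  if (grepl("^[0-9]$", ch)) return(as.integer(ch))
  if (grepl("^[A-Z]$", ch)) return(match(ch, LETTERS) + 9L)  # A = 10, ..., Z = 35
  switch(ch, "*" = 36L, "@" = 37L, "#" = 38L,
         stop("invalid CUSIP character: ", ch))
}

cusip_check_digit <- function(cusip) {
  chars <- strsplit(toupper(substr(cusip, 1, 8)), "")[[1]]
  if (length(chars) != 8) stop("a CUSIP needs at least 8 characters")
  total <- 0L
  for (i in seq_along(chars)) {
    v <- cusip_char_value(chars[i])
    if (i %% 2 == 0) v <- v * 2L            # double characters in even positions
    total <- total + v %/% 10L + v %% 10L   # add the digits of v
  }
  (10L - total %% 10L) %% 10L
}

is_valid_cusip <- function(cusip) {
  cusip <- toupper(cusip)
  # Shape check first, then compare the computed and supplied check digits
  grepl("^[0-9A-Z*@#]{8}[0-9]$", cusip) &&
    cusip_check_digit(cusip) == as.integer(substr(cusip, 9, 9))
}

is_valid_cusip("037833100")  # Apple Inc. common stock
```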

Pseudocode for the CUSIP, SEDOL, and ISIN check-digit calculations is at Wikipedia - CUSIP, Wikipedia - SEDOL, and Wikipedia - ISIN; R/C/C++ implementations are at Rosettacode - CUSIP, Rosettacode - SEDOL, and Rosettacode - ISIN.
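For comparison with that pseudocode, a base-R sketch of the SEDOL and ISIN checks. The helper names are hypothetical and this is not the package's isSedol or isIsin. SEDOL weights the first six characters by 1, 3, 1, 7, 3, 9 (letters count as 10 through 35, vowels are not allowed); ISIN expands letters to two-digit numbers and applies the Luhn algorithm to the resulting digit string.

```r
# Hypothetical helper: digits keep their value, letters map to A = 10, ..., Z = 35
id_char_value <- function(ch) {
  if (grepl("^[0-9]$", ch)) as.integer(ch) else match(ch, LETTERS) + 9L
}

is_valid_sedol <- function(sedol) {
  sedol <- toupper(sedol)
  # Six alphanumeric characters without vowels, then a numeric check digit
  if (!grepl("^[0-9B-DF-HJ-NP-TV-Z]{6}[0-9]$", sedol)) return(FALSE)
  chars <- strsplit(sedol, "")[[1]]
  values <- vapply(chars[1:6], id_char_value, integer(1))
  weights <- c(1L, 3L, 1L, 7L, 3L, 9L)
  check <- (10L - sum(values * weights) %% 10L) %% 10L
  check == as.integer(chars[7])
}

is_valid_isin <- function(isin) {
  isin <- toupper(isin)
  # Two-letter country code, nine alphanumeric characters, numeric check digit
  if (!grepl("^[A-Z]{2}[0-9A-Z]{9}[0-9]$", isin)) return(FALSE)
  chars <- strsplit(isin, "")[[1]]
  # Expand letters into two digits each, e.g. "US" becomes "3028"
  expanded <- paste(vapply(chars, id_char_value, integer(1)), collapse = "")
  digits <- rev(as.integer(strsplit(expanded, "")[[1]]))
  # Luhn: double every second digit counting from the right (check digit first)
  doubled <- seq_along(digits) %% 2 == 0
  digits[doubled] <- digits[doubled] * 2L
  digits <- digits %/% 10L + digits %% 10L
  sum(digits) %% 10L == 0L
}

is_valid_isin("US0378331005")  # Apple Inc.
```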

Installation

You can install the current development version from GitHub with:

remotes::install_github("yanlesin/SEC13Flist")

Description of returned data for SEC_13F_list

CUSIP: chr - CUSIP number of the security
HAS_LISTED_OPTION: chr - An asterisk indicates that the security has a listed option; each option is listed individually, with its own CUSIP number, immediately below the name of the security that has the option
ISSUER_NAME: chr - Issuer Name
ISSUER_DESCRIPTION: chr - Issuer Description
STATUS: chr - “ADDED” (The security has become a Section 13(f) security) or “DELETED” (The security ceases to be a 13(f) security since the date of the last list)
YEAR: int - Year of the list
QUARTER: int - Quarter of the list
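A small sketch that checks a data frame against the column names and types listed above, for example after customizing the output. `check_13f_columns` and the one-row example are hypothetical; the row values are illustrative only.

```r
# Hypothetical helper (not part of SEC13Flist): verify columns and types
# against the documented SEC_13F_list output.
expected_13f_types <- c(
  CUSIP = "character", HAS_LISTED_OPTION = "character",
  ISSUER_NAME = "character", ISSUER_DESCRIPTION = "character",
  STATUS = "character", YEAR = "integer", QUARTER = "integer"
)

check_13f_columns <- function(df) {
  missing_cols <- setdiff(names(expected_13f_types), names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # First class of each expected column, in the documented order
  actual <- vapply(df[names(expected_13f_types)],
                   function(col) class(col)[1], character(1))
  bad <- names(actual)[actual != expected_13f_types]
  if (length(bad) > 0) stop("unexpected types in: ", paste(bad, collapse = ", "))
  invisible(TRUE)
}

# Illustrative one-row data frame
toy <- data.frame(
  CUSIP = "037833100", HAS_LISTED_OPTION = "*", ISSUER_NAME = "APPLE INC",
  ISSUER_DESCRIPTION = "COM", STATUS = "ADDED", YEAR = 2018L, QUARTER = 3L,
  stringsAsFactors = FALSE
)
check_13f_columns(toy)
```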

Examples

These are basic examples of usage:

library(SEC13Flist)
library(tidyverse)

## SEC_13F_list() needs a user agent set first, e.g. options(HTTPUserAgent = ...), as described above

## Return list for Q3 2018
SEC13Flist_2018_Q3 <- SEC_13F_list(2018, 3)

## Current list from SEC website
SEC13Flist_current <- SEC_13F_list()

## Customizing
SEC13Flist_current <- SEC_13F_list() |>
  filter(!STATUS %in% "DELETED") |>  # Drop records with STATUS "DELETED" (keeps NA/blank STATUS)
  select(-YEAR, -QUARTER)            # Remove YEAR and QUARTER columns

## Verifying CUSIP
verify_CUSIP <- SEC_13F_list() |>
  rowwise() |>  # CUSIPs are not unique, and isCusip is not vectorized (it takes a single nine-character CUSIP)
  mutate(VALID_CUSIP = isCusip(CUSIP)) |>  # Validate each CUSIP
  ungroup()                                # Drop the row-wise grouping

Use of CUSIP Codes

According to the FAQ section of CUSIP Global Services:

Can firms take CGS Data from public sources and create their own database without signing a license agreement with CGS?

CGS Data is publicly available in some offering documents and from other sources. Firms can elect to collect this information and store it in their internal databases for non-commercial use, provided that the source of such information permitted the reproduction and use of such information. However, CGS’s experience has been that the CGS data generally has not come from publicly available sources but rather from other sources such as a CGS Authorized Distributor or through improperly scraping websites of CGS customers with valid CGS’ licenses. Most end-user customers of CGS Data prefer to enter into a license agreement with CGS for authorized use and to enjoy the benefits of the integrity and functionality of downloadable, timely and accurate data (either from CGS directly or from an Authorized Distributor).

Known issues with CUSIP codes supplied in SEC’s Official List of 13(f) securities

This discussion on Stack Exchange describes a problem with the CUSIP codes for CALL and PUT options that is still present in the current list.

This FundApps support article describes how FundApps (a software provider for regulatory compliance) addresses the quality issue with CUSIP codes: it includes all option securities whose CUSIP shares its first six characters with the main issue (marked with * in the HAS_LISTED_OPTION field of the list).
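A minimal sketch of that six-character matching in base R, assuming a data frame with the CUSIP and HAS_LISTED_OPTION columns described above. The function name `related_by_issuer` and the rows below (with fictional CUSIPs) are hypothetical; this is not FundApps' implementation.

```r
# Hypothetical helper: keep every row whose CUSIP shares its first six
# characters (the issuer part) with a security marked "*" in HAS_LISTED_OPTION.
related_by_issuer <- function(df) {
  df$ISSUER_ID <- substr(df$CUSIP, 1, 6)
  # %in% treats NA in HAS_LISTED_OPTION as "no listed option"
  optioned <- unique(df$ISSUER_ID[df$HAS_LISTED_OPTION %in% "*"])
  df[df$ISSUER_ID %in% optioned, , drop = FALSE]
}

# Fictional rows: one main issue with two option rows, plus an unrelated issuer
toy <- data.frame(
  CUSIP = c("ABC123100", "ABC123900", "ABC123950", "XYZ789101"),
  HAS_LISTED_OPTION = c("*", "", "", ""),
  ISSUER_DESCRIPTION = c("COM", "CALL", "PUT", "COM"),
  stringsAsFactors = FALSE
)
related_by_issuer(toy)
```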


sec13flist's Issues

CUSIP functions return incorrect checksum digit for CINS numbers?

In some cases, for CINS numbers or CUSIPs with a letter in the middle, the function incorrectly reports that the identifier is not valid.
The root cause could be that the initial CUSIP code was written with numeric CUSIPs in mind and needs adjustment. Better code could be borrowed from the isIsin function, which handles alphabetic characters better, while still using the same check-digit algorithm.

File name handling and scope limit

  1. Limit the scope of the function to files that can be parsed (starting from Q1 2004)
  2. Special handling of the file name for Q1 2004
  3. Check the parameters passed into the function (starting from Q1 2004)
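A sketch of the parameter check in item 3, assuming the Q1 2004 lower bound stated above. `validate_year_quarter` is a hypothetical name, not the package's actual implementation.

```r
# Hypothetical validator: accept only a single whole-number year and quarter,
# with quarter in 1:4 and nothing earlier than Q1 2004.
validate_year_quarter <- function(year, quarter) {
  if (!is.numeric(year) || !is.numeric(quarter) ||
      length(year) != 1 || length(quarter) != 1 ||
      is.na(year) || is.na(quarter) ||
      year %% 1 != 0 || quarter %% 1 != 0) {
    stop("year and quarter must be single whole numbers")
  }
  if (!quarter %in% 1:4) stop("quarter must be 1, 2, 3, or 4")
  if (year < 2004) stop("lists before Q1 2004 are out of scope")
  invisible(c(year = as.integer(year), quarter = as.integer(quarter)))
}

validate_year_quarter(2018, 3)
```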

Remove `tidyr` dependency by re-implementing `filldown` function

The fill function is the only part of the package that requires tidyr. To remove this dependency, the C++ filldown implementation from tidyr could be included as a non-exported function.
The remaining dependencies would then be pdftools and rvest.
rvest has changed in the past in ways that broke functionality, so it should also be considered for removal as a dependency.
pdftools is fine to keep as a dependency.
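As an alternative to vendoring tidyr's C++ code, fill-down (last observation carried forward) can be written in base R with a cumulative-maximum index. A minimal sketch: `fill_down` is a hypothetical name, and it only mirrors the default downward direction of `tidyr::fill()`, not its other options.

```r
# Hypothetical base-R fill-down: each NA takes the most recent non-NA value
# above it; leading NAs stay NA.
fill_down <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L           # NA positions point nowhere yet
  last_seen <- cummax(idx)      # index of the latest non-NA value so far
  out <- x
  has_prior <- last_seen > 0L
  out[has_prior] <- x[last_seen[has_prior]]
  out
}

fill_down(c(NA, "a", NA, NA, "b", NA))
```

Applied column by column (for example `df$some_column <- fill_down(df$some_column)`, where `some_column` is a placeholder), this would avoid both tidyr and compiled code.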

Resolve "no visible binding..." note during package build

checking R code for possible problems ... NOTE
SEC_13F_list: no visible binding for global variable 'PDF_STRING'
SEC_13F_list: no visible binding for global variable 'CUSIP_start'
SEC_13F_list: no visible binding for global variable
'ISSUER_NAME_start'
SEC_13F_list: no visible binding for global variable
'ISSUER_DESCRIPTION_start'
SEC_13F_list: no visible binding for global variable 'STATUS_start'
SEC_13F_list: no visible binding for global variable 'CUSIP_end'
SEC_13F_list: no visible binding for global variable 'STATUS_end'
SEC_13F_list: no visible binding for global variable
'HAS_LISTED_OPTION_start'
SEC_13F_list: no visible binding for global variable
'HAS_LISTED_OPTION_end'
SEC_13F_list: no visible binding for global variable 'ISSUER_NAME_end'
SEC_13F_list: no visible binding for global variable
'ISSUER_DESCRIPTION_end'
SEC_13F_list: no visible binding for global variable 'CUSIP'
Undefined global functions or variables:
CUSIP CUSIP_end CUSIP_start HAS_LISTED_OPTION_end
HAS_LISTED_OPTION_start ISSUER_DESCRIPTION_end
ISSUER_DESCRIPTION_start ISSUER_NAME_end ISSUER_NAME_start PDF_STRING
STATUS_end STATUS_start
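A common way to silence this NOTE is to declare the column names used through non-standard evaluation with `utils::globalVariables()` in a package source file. A sketch, assuming a file such as `R/globals.R` (the file and vector names are hypothetical); the names come from the check output above.

```r
# Sketch for a package source file such as R/globals.R.
# Declares columns referenced via non-standard evaluation in SEC_13F_list so
# R CMD check no longer reports "no visible binding for global variable".
sec13f_nse_columns <- c(
  "CUSIP", "CUSIP_end", "CUSIP_start",
  "HAS_LISTED_OPTION_end", "HAS_LISTED_OPTION_start",
  "ISSUER_DESCRIPTION_end", "ISSUER_DESCRIPTION_start",
  "ISSUER_NAME_end", "ISSUER_NAME_start",
  "PDF_STRING", "STATUS_end", "STATUS_start"
)
utils::globalVariables(sec13f_nse_columns)
```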

Outstanding Issues:

The following will be fixed in the next couple of releases:

  1. Description of returned data frame
