A bunch of downloaders and parsers for data delivered from B3

Home Page: https://docs.ropensci.org/rb3/

License: Other

rb3's Introduction

rb3

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

B3 is the main financial exchange in Brazil, offering support and access to trading systems for equity and fixed income markets. On its website you can find a vast number of datasets regarding prices and transactions for contracts available for trading in these markets, including:

  • equities/stocks
  • futures
  • FIIs (REITs)
  • options
  • BDRs
  • historical yield curves (calculated from futures contracts)
  • B3 indexes composition

For example, you can find the current yield curve at this link. Package rb3 uses web scraping tools to download and read these datasets from B3, making it easy to consume them in R in a structured way.

The available datasets are highly valuable, going back as early as the 2000s, and can be used by industry practitioners or academics. None of these datasets are available anywhere else, which makes rb3 a unique package for importing data from the Brazilian financial exchange.

Documentation

The documentation is available on its pkgdown page, where articles (vignettes) with real applications can be found.

Installation

Package rb3 is available in its stable form on CRAN and in its development version on GitHub. The installation commands are below:

# stable (CRAN)
install.packages("rb3")

# github (Development branch)
if (!require(devtools)) install.packages("devtools")
devtools::install_github("ropensci/rb3")

Examples

Yield curve

In this first example we’ll import and plot the historical yield curve for Brazil using the function yc_mget.

library(rb3)
library(ggplot2)
library(stringr)

df_yc <- yc_mget(
  first_date = Sys.Date() - 255 * 5,
  last_date = Sys.Date(),
  by = 255
)
#> Warning: Automatic coercion from double to character was deprecated in purrr 1.0.0.
#> ℹ Please use an explicit call to `as.character()` within `map_chr()` instead.
#> ℹ The deprecated feature was likely used in the rb3 package.
#>   Please report the issue at <https://github.com/wilsonfreitas/rb3/issues>.

p <- ggplot(
  df_yc,
  aes(
    x = forward_date,
    y = r_252,
    group = refdate,
    color = factor(refdate)
  )
) +
  geom_line() +
  labs(
    title = "Yield Curves for Brazil",
    subtitle = "Built using interest rates future contracts",
    caption = str_glue("Data imported using rb3 at {Sys.Date()}"),
    x = "Forward Date",
    y = "Annual Interest Rate",
    color = "Reference Date"
  ) +
  theme_light() +
  scale_y_continuous(labels = scales::percent)

print(p)

Futures prices

Get settlement prices of futures contracts with futures_mget.

library(rb3)
library(dplyr)

df <- futures_mget(
  first_date = "2022-04-01",
  last_date = "2022-04-29",
  by = 5
)

glimpse(
  df |>
    filter(commodity == "DI1")
)
#> Rows: 153
#> Columns: 8
#> $ refdate          <date> 2022-04-01, 2022-04-01, 2022-04-01, 2022-04-01, 2022…
#> $ commodity        <chr> "DI1", "DI1", "DI1", "DI1", "DI1", "DI1", "DI1", "DI1…
#> $ maturity_code    <chr> "J22", "K22", "M22", "N22", "Q22", "U22", "V22", "X22…
#> $ symbol           <chr> "DI1J22", "DI1K22", "DI1M22", "DI1N22", "DI1Q22", "DI…
#> $ price_previous   <dbl> 99999.99, 99172.50, 98159.27, 97181.87, 96199.14, 951…
#> $ price            <dbl> 100000.00, 99172.31, 98160.23, 97185.43, 96210.42, 95…
#> $ change           <dbl> 0.01, -0.19, 0.96, 3.56, 11.28, 21.61, 34.93, 48.85, …
#> $ settlement_value <dbl> 0.01, 0.19, 0.96, 3.56, 11.28, 21.61, 34.93, 48.85, 5…

Equity data

Equity closing data (without any price adjustments) is available through cotahist_get.

library(rb3)
library(bizdays)
#> 
#> Attaching package: 'bizdays'
#> The following object is masked from 'package:stats':
#> 
#>     offset

# workaround for SSL errors (Linux only); note that this disables
# certificate verification for httr requests
if (Sys.info()["sysname"] == "Linux") {
  httr::set_config(
    httr::config(ssl_verifypeer = FALSE)
  )
}

date <- preceding(Sys.Date() - 1, "Brazil/ANBIMA") # last business day
ch <- cotahist_get(date, "daily")

glimpse(
  cotahist_equity_get(ch)
)
#> Rows: 367
#> Columns: 13
#> $ refdate               <date> 2023-03-03, 2023-03-03, 2023-03-03, 2023-03-03,…
#> $ symbol                <chr> "AERI3", "AESB3", "AFLT3", "AGRO3", "AGXY3", "BR…
#> $ open                  <dbl> 1.15, 10.00, 8.89, 26.15, 8.45, 25.89, 28.83, 14…
#> $ high                  <dbl> 1.17, 10.18, 8.89, 26.60, 9.06, 26.35, 29.24, 15…
#> $ low                   <dbl> 1.11, 9.96, 8.71, 26.12, 8.39, 25.80, 28.76, 14.…
#> $ close                 <dbl> 1.12, 10.16, 8.71, 26.33, 9.06, 25.92, 28.92, 14…
#> $ average               <dbl> 1.14, 10.08, 8.80, 26.39, 8.75, 26.13, 29.00, 14…
#> $ best_bid              <dbl> 1.12, 10.14, 8.71, 26.32, 8.92, 25.92, 28.81, 14…
#> $ best_ask              <dbl> 1.13, 10.16, 8.99, 26.33, 9.06, 26.08, 28.92, 14…
#> $ volume                <dbl> 4481724, 19128087, 1760, 7341008, 234691, 689933…
#> $ traded_contracts      <dbl> 3926400, 1897500, 200, 278100, 26800, 26400, 305…
#> $ transactions_quantity <int> 2818, 5573, 2, 1399, 177, 140, 7403, 17092, 1464…
#> $ distribution_id       <int> 101, 105, 119, 113, 101, 145, 145, 123, 102, 166…

Funds data

One can also download ETF (exchange-traded fund) data with cotahist_etfs_get.

glimpse(
  cotahist_etfs_get(ch)
)
#> Rows: 100
#> Columns: 13
#> $ refdate               <date> 2023-03-03, 2023-03-03, 2023-03-03, 2023-03-03,…
#> $ symbol                <chr> "AGRI11", "IBOB11", "OGIN11", "BOVB11", "BOVS11"…
#> $ open                  <dbl> 43.51, 84.31, 9.51, 104.52, 79.95, 104.50, 10.40…
#> $ high                  <dbl> 43.63, 84.31, 9.63, 105.11, 80.57, 105.74, 10.51…
#> $ low                   <dbl> 42.91, 84.17, 9.35, 104.52, 79.74, 104.49, 10.40…
#> $ close                 <dbl> 43.10, 84.17, 9.57, 104.68, 80.13, 104.90, 10.45…
#> $ average               <dbl> 43.40, 84.17, 9.50, 104.68, 80.22, 105.17, 10.46…
#> $ best_bid              <dbl> 41.50, 83.51, 9.44, 104.68, 80.13, 104.90, 10.45…
#> $ best_ask              <dbl> 46.24, 84.18, 9.57, 113.00, 90.00, 105.11, 10.49…
#> $ volume                <dbl> 2274762.37, 1375010.67, 14024.14, 241499.09, 444…
#> $ traded_contracts      <dbl> 52405, 16336, 1476, 2307, 554, 1733901, 336092, …
#> $ transactions_quantity <int> 50, 6, 24, 15, 452, 11558, 1949, 5, 25, 4402, 4,…
#> $ distribution_id       <int> 100, 100, 102, 100, 100, 101, 100, 101, 100, 100…

FIIs (Brazilian REITs) data

Download FII (Fundo de Investimento Imobiliário) data with cotahist_fiis_get:

glimpse(
  cotahist_fiis_get(ch)
)
#> Rows: 268
#> Columns: 13
#> $ refdate               <date> 2023-03-03, 2023-03-03, 2023-03-03, 2023-03-03,…
#> $ symbol                <chr> "BZLI11", "AFHI11", "IBCR11", "IDGR11", "LUGG11"…
#> $ open                  <dbl> 17.00, 94.69, 76.73, 80.50, 71.98, 94.83, 11.50,…
#> $ high                  <dbl> 17.00, 94.70, 77.00, 80.50, 72.00, 95.88, 11.50,…
#> $ low                   <dbl> 17.00, 93.72, 76.39, 80.50, 71.16, 94.02, 11.49,…
#> $ close                 <dbl> 17.00, 94.20, 77.00, 80.50, 71.61, 94.31, 11.49,…
#> $ average               <dbl> 17.00, 94.19, 76.70, 80.50, 71.53, 94.59, 11.49,…
#> $ best_bid              <dbl> 17.00, 94.20, 76.74, 1.00, 71.42, 94.31, 10.96, …
#> $ best_ask              <dbl> 18.00, 94.42, 77.00, 80.50, 71.61, 94.32, 11.49,…
#> $ volume                <dbl> 357.00, 453376.17, 54150.68, 8050.00, 46068.54, …
#> $ traded_contracts      <dbl> 21, 4813, 706, 100, 644, 11197, 10, 70, 207, 100…
#> $ transactions_quantity <int> 2, 1670, 75, 1, 203, 3116, 4, 6, 58, 73, 2650, 2…
#> $ distribution_id       <int> 100, 124, 121, 112, 140, 152, 213, 102, 226, 144…

BDRs data

Download BDR (Brazilian depositary receipts) with cotahist_bdrs_get:

glimpse(
  cotahist_bdrs_get(ch)
)
#> Rows: 507
#> Columns: 13
#> $ refdate               <date> 2023-03-03, 2023-03-03, 2023-03-03, 2023-03-03,…
#> $ symbol                <chr> "ADBE34", "CSCO34", "I1QV34", "I1QY34", "I1RM34"…
#> $ open                  <dbl> 34.60, 50.53, 283.92, 19.76, 283.50, 500.19, 60.…
#> $ high                  <dbl> 35.86, 51.44, 287.00, 20.00, 283.50, 500.19, 61.…
#> $ low                   <dbl> 34.60, 50.53, 283.92, 19.70, 283.50, 500.19, 60.…
#> $ close                 <dbl> 35.48, 51.44, 287.00, 19.96, 283.50, 500.19, 61.…
#> $ average               <dbl> 35.40, 51.18, 285.46, 19.75, 283.50, 500.19, 60.…
#> $ best_bid              <dbl> 35.48, 48.45, 0.00, 19.55, 249.50, 200.00, 59.40…
#> $ best_ask              <dbl> 39.98, 51.44, 335.10, 0.00, 310.00, 0.00, 70.00,…
#> $ volume                <dbl> 600948.59, 437857.01, 570.92, 39665.04, 1134.00,…
#> $ traded_contracts      <dbl> 16972, 8555, 2, 2008, 4, 472, 77, 500, 535693, 1…
#> $ transactions_quantity <int> 230, 144, 2, 25, 1, 2, 2, 2, 1041, 13, 54, 92, 2…
#> $ distribution_id       <int> 101, 149, 100, 100, 112, 113, 102, 146, 117, 112…

Equity options

Download equity options contracts with cotahist_equity_options_get:

glimpse(
  cotahist_equity_options_get(ch)
)
#> Rows: 6,656
#> Columns: 14
#> $ refdate               <date> 2023-03-03, 2023-03-03, 2023-03-03, 2023-03-03,…
#> $ symbol                <chr> "ABCBC200", "ABCBO176", "ABEVF160", "ABEVP300", …
#> $ type                  <fct> Call, Put, Call, Put, Put, Put, Put, Put, Call, …
#> $ strike                <dbl> 19.69, 17.69, 14.82, 18.82, 28.82, 14.82, 13.82,…
#> $ maturity_date         <date> 2023-03-17, 2023-03-17, 2023-06-16, 2023-04-20,…
#> $ open                  <dbl> 0.07, 0.28, 0.30, 5.48, 15.30, 1.50, 0.90, 2.76,…
#> $ high                  <dbl> 0.07, 0.28, 0.30, 5.48, 15.30, 1.50, 0.98, 2.76,…
#> $ low                   <dbl> 0.06, 0.28, 0.28, 5.48, 15.28, 1.50, 0.90, 2.76,…
#> $ close                 <dbl> 0.06, 0.28, 0.28, 5.48, 15.28, 1.50, 0.98, 2.76,…
#> $ average               <dbl> 0.06, 0.28, 0.28, 5.48, 15.28, 1.50, 0.92, 2.76,…
#> $ volume                <dbl> 76, 140, 86, 5480, 7643, 1500, 7530, 1380, 10423…
#> $ traded_contracts      <dbl> 1100, 500, 300, 1000, 500, 1000, 8100, 500, 3898…
#> $ transactions_quantity <int> 2, 1, 3, 2, 4, 1, 3, 1, 89, 2, 13, 8, 2, 174, 12…
#> $ distribution_id       <int> 142, 143, 124, 124, 124, 124, 124, 124, 125, 125…

Indexes composition

The list with available B3 indexes can be obtained with indexes_get.

indexes_get()
#>  [1] "AGFS" "BDRX" "GPTW" "IBOV" "IBRA" "IBXL" "IBXX" "ICO2" "ICON" "IDIV"
#> [11] "IEEX" "IFIL" "IFIX" "IFNC" "IGCT" "IGCX" "IGNM" "IMAT" "IMOB" "INDX"
#> [21] "ISEE" "ITAG" "IVBX" "MLCX" "SMLL" "UTIL"

And the composition of a specific index with index_comp_get.

(ibov_comp <- index_comp_get("IBOV"))
#>  [1] "ABEV3"  "ALPA4"  "AMER3"  "ASAI3"  "AZUL4"  "B3SA3"  "BBAS3"  "BBDC3" 
#>  [9] "BBDC4"  "BBSE3"  "BEEF3"  "BIDI11" "BPAC11" "BPAN4"  "BRAP4"  "BRFS3" 
#> [17] "BRKM5"  "BRML3"  "CASH3"  "CCRO3"  "CIEL3"  "CMIG4"  "CMIN3"  "COGN3" 
#> [25] "CPFE3"  "CPLE6"  "CRFB3"  "CSAN3"  "CSNA3"  "CVCB3"  "CYRE3"  "DXCO3" 
#> [33] "ECOR3"  "EGIE3"  "ELET3"  "ELET6"  "EMBR3"  "ENBR3"  "ENEV3"  "ENGI11"
#> [41] "EQTL3"  "EZTC3"  "FLRY3"  "GGBR4"  "GOAU4"  "GOLL4"  "HAPV3"  "HYPE3" 
#> [49] "IGTI11" "IRBR3"  "ITSA4"  "ITUB4"  "JBSS3"  "JHSF3"  "KLBN11" "LCAM3" 
#> [57] "LREN3"  "LWSA3"  "MGLU3"  "MRFG3"  "MRVE3"  "MULT3"  "NTCO3"  "PCAR3" 
#> [65] "PETR3"  "PETR4"  "PETZ3"  "POSI3"  "PRIO3"  "QUAL3"  "RADL3"  "RAIL3" 
#> [73] "RDOR3"  "RENT3"  "RRRP3"  "SANB11" "SBSP3"  "SLCE3"  "SOMA3"  "SULA11"
#> [81] "SUZB3"  "TAEE11" "TIMS3"  "TOTS3"  "UGPA3"  "USIM5"  "VALE3"  "VBBR3" 
#> [89] "VIIA3"  "VIVT3"  "WEGE3"  "YDUQ3"

With the index composition in hand, you can select the quotes of its constituents from the COTAHIST data.

glimpse(
  cotahist_get_symbols(ch, ibov_comp)
)
#> Rows: 88
#> Columns: 13
#> $ refdate               <date> 2023-03-03, 2023-03-03, 2023-03-03, 2023-03-03,…
#> $ symbol                <chr> "BRAP4", "CSAN3", "CSNA3", "LREN3", "LWSA3", "PC…
#> $ open                  <dbl> 28.83, 14.73, 17.90, 18.27, 4.58, 14.36, 13.12, …
#> $ high                  <dbl> 29.24, 15.06, 18.52, 18.27, 4.81, 14.70, 13.22, …
#> $ low                   <dbl> 28.76, 14.64, 17.85, 17.74, 4.58, 14.19, 12.99, …
#> $ close                 <dbl> 28.92, 14.74, 18.24, 17.75, 4.75, 14.41, 13.11, …
#> $ average               <dbl> 29.00, 14.85, 18.29, 17.93, 4.70, 14.47, 13.09, …
#> $ best_bid              <dbl> 28.81, 14.73, 18.24, 17.74, 4.74, 14.41, 13.11, …
#> $ best_ask              <dbl> 28.92, 14.74, 18.25, 17.76, 4.75, 14.45, 13.12, …
#> $ volume                <dbl> 88684490, 108449481, 151985388, 179220052, 42526…
#> $ traded_contracts      <dbl> 3057800, 7302400, 8307600, 9995000, 9041300, 263…
#> $ transactions_quantity <int> 7403, 17092, 13794, 28476, 12587, 7990, 34400, 2…
#> $ distribution_id       <int> 145, 123, 258, 215, 102, 166, 126, 115, 134, 144…

Template System

One important part of the rb3 infrastructure is its Template System.

All datasets handled by the rb3 package are configured in a template, which is a YAML file. The template carries information about the dataset, such as its description and the metadata that describes its columns, their types, and how they have to be parsed. The template fully describes its dataset.

Once a template is implemented, you can fetch and read the data directly with the functions download_marketdata and read_marketdata.

For example, let’s use the template FPR to download and read data on the primitive risk factors used by B3 in its risk engine.

f <- download_marketdata("FPR", refdate = as.Date("2022-05-10"))
f
#> [1] "C:/Users/wilso/R/rb3-cache/FPR/7a2422cc97221426a3b2bd4419215481/FP220510/FatoresPrimitivosRisco.txt"

download_marketdata returns the path for the downloaded file.

fpr <- read_marketdata(f, "FPR")
fpr
#> $Header
#> # A tibble: 1 × 2
#>   tipo_registro data_geracao_arquivo
#>           <int> <date>              
#> 1             1 2022-05-10          
#> 
#> $Data
#> # A tibble: 3,204 × 11
#>    tipo_r…¹ id_fpr nome_…² forma…³ id_gr…⁴ id_ca…⁵ id_in…⁶ orige…⁷  base base_…⁸
#>       <int>  <int> <chr>   <fct>     <dbl> <chr>     <dbl>   <int> <int>   <int>
#>  1        2   1422 VLRAPT4 Basis …       1 BVMF    2.00e11       8     0       0
#>  2        2   1423 VLPETR3 Basis …       1 BVMF    2.00e11       8     0       0
#>  3        2   1424 VLSEER3 Basis …       1 BVMF    2.00e11       8     0       0
#>  4        2   1426 VLJBSS3 Basis …       1 BVMF    2.00e11       8     0       0
#>  5        2   1427 VLKLBN… Basis …       1 BVMF    2.00e11       8     0       0
#>  6        2   1428 VLITUB3 Basis …       1 BVMF    2.00e11       8     0       0
#>  7        2   1429 VLITSA4 Basis …       1 BVMF    2.00e11       8     0       0
#>  8        2   1430 VLHYPE3 Basis …       1 BVMF    2.00e11       8     0       0
#>  9        2   1431 VLGRND3 Basis …       1 BVMF    2.00e11       8     0       0
#> 10        2   1433 VLUGPA3 Basis …       1 BVMF    2.00e11       8     0       0
#> # … with 3,194 more rows, 1 more variable: criterio_capitalizacao <int>, and
#> #   abbreviated variable names ¹​tipo_registro, ²​nome_fpr, ³​formato_variacao,
#> #   ⁴​id_grupo_fpr, ⁵​id_camara_indicador, ⁶​id_instrumento_indicador,
#> #   ⁷​origem_instrumento, ⁸​base_interpolacao
#> 
#> attr(,"class")
#> [1] "parts"

read_marketdata parses the downloaded file according to the metadata configured in the template FPR.

Below is a view of the show_templates add-in, which lists the available templates.

show_templates()

rb3's People

Contributors

msperlin, wilsonfreitas

rb3's Issues

Can't read 2014 COTAHIST

Hello! Reading the 2014 file with the read_marketdata function and template = 'COTAHIST_YEARLY' returns this error:

Error in stri_trim_both(string) :
invalid UTF-8 byte sequence detected; try calling stri_enc_toutf8()
It only happens for 2014.

Best regards

downloaded files should be unique

rb3 works with downloaded files.
These files should be unique.
If two files have the same content, they are the same.
Currently, the downloaded files are saved with a key that is related to the argument list used to download the file.
This is misleading due to the fact that some files can have different content for the same argument list when downloaded at different times.
To solve this, the downloaded files should be saved with a hash code built from their content.
So all files would be downloaded.
Further, the timestamp of the download should be used to identify different downloads.
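
As a rough illustration of the proposal, the sketch below names cached files by a hash of their content, so that two downloads with identical content map to the same cache entry. The helpers content_key and store_by_content are hypothetical and not part of rb3; tools::md5sum is used only as one readily available hash.

```r
# Hypothetical helpers (not part of rb3): key a file by its content.
content_key <- function(path) {
  # tools::md5sum() returns a named character vector; drop the names
  unname(tools::md5sum(path))
}

store_by_content <- function(path, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, content_key(path))
  # Identical content maps to the same destination, so it is stored once
  if (!file.exists(dest)) file.copy(path, dest)
  dest
}

cache_dir <- tempfile("rb3-content-cache")
f1 <- tempfile(); writeLines("same content", f1)
f2 <- tempfile(); writeLines("same content", f2)
f3 <- tempfile(); writeLines("different content", f3)

dest1 <- store_by_content(f1, cache_dir)
dest2 <- store_by_content(f2, cache_dir)
dest3 <- store_by_content(f3, cache_dir)
```

Here dest1 and dest2 point to the same cached file, while dest3 gets its own entry. A download timestamp could be recorded separately (for example in a small index file) to tell apart downloads made at different moments.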

Filter only one vertex of the term structure of interest rates (ETTJ) with yc_mget

Hello everyone, good afternoon!

Is it possible for yc_mget to pull only one specific vertex, for example the 252 vertex, instead of the whole nominal curve for several business days?

Thank you very much.

Templates with only a reader

Good morning, everyone!

How can I fetch the data for templates that define only a reader?

For example:

option volatility curves

[image]

Thanks!

Fix images in documentation

Images in the documentation (README and vignettes) are a mess: some are huge and others are small.
The pkgdown vignettes are OK; this problem appears on the CRAN website.

Don't cache NULL data read_marketdata returns

read_marketdata returns NULL when downloaded data is invalid.
In this situation the downloaded file is cached and the parsed data, in this case NULL, is also cached.
In practice, NULL is written to an rds file and the invalid downloaded file is kept in the cache as well.

If read_marketdata returns NULL, the rds file should not be created and the input file should be removed as well.

Simply, don't cache invalid or incorrect data.
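
A minimal sketch of that rule, assuming a hypothetical wrapper read_and_cache and a toy reader standing in for the template's parser (neither is part of rb3):

```r
# Hypothetical wrapper: `parse_fn` stands in for the template's reader.
read_and_cache <- function(input_file, parse_fn) {
  rds_file <- paste0(input_file, ".rds")
  parsed <- parse_fn(input_file)
  if (is.null(parsed)) {
    # Invalid data: remove the downloaded file and write no rds file
    unlink(c(input_file, rds_file))
    return(NULL)
  }
  saveRDS(parsed, rds_file)
  parsed
}

# Toy reader: an empty file counts as invalid data
toy_reader <- function(path) {
  lines <- readLines(path)
  if (length(lines) == 0) NULL else data.frame(value = lines)
}

good <- tempfile(); writeLines(c("a", "b"), good)
bad <- tempfile(); invisible(file.create(bad))

res_good <- read_and_cache(good, toy_reader)
res_bad <- read_and_cache(bad, toy_reader)
```

After these calls only the valid input has an rds file next to it; the invalid input file is gone and nothing was cached for it.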

Column

Hello,

In the read_marketdata function with template = 'COTAHIST_YEARLY', the column qtd_titulos_negociados has type character instead of numeric.

Best regards

Use objects as arguments to field

The only required argument of field is the name, which is a character.
The remaining arguments are checked by name.

field <- function(field, ...) {
    parms <- list(...)
    attr(field, 'width')   <- if (!is.null(parms[['width']]))   parms[['width']]   else 0
    attr(field, 'handler') <- if (!is.null(parms[['handler']])) parms[['handler']] else identity
    class(field) <- 'field'
    field
}

field('Identificação da transação', width=6, handler=to_numeric())

Instead of checking the arguments by name, which burdens the function signature, check them by type, which keeps the function signature cleaner.

field('Identificação da transação', width(6), to_numeric())
field('Data de geração do arquivo', width(8), to_date(format='%Y%m%d'))
field('Data de emissão do título', to_date('%Y%m%d'))
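
One way this could look, as a sketch only: the constructors width and to_numeric below are hypothetical stand-ins that tag their values with a class, and field inspects the class of each extra argument instead of its name.

```r
# Hypothetical constructors that tag their values with a class
width <- function(n) structure(list(value = n), class = "width")
to_numeric <- function() structure(list(fn = as.numeric), class = "handler")

field <- function(name, ...) {
  # Defaults mirror the original: width 0 and the identity handler
  attr(name, "width") <- 0
  attr(name, "handler") <- identity
  for (arg in list(...)) {
    if (inherits(arg, "width")) {
      attr(name, "width") <- arg$value
    } else if (inherits(arg, "handler")) {
      attr(name, "handler") <- arg$fn
    } else {
      stop("unsupported argument of class: ", class(arg)[1])
    }
  }
  class(name) <- "field"
  name
}

f <- field("transaction_id", width(6), to_numeric())
g <- field("issue_date")

attr(f, "width")
attr(f, "handler")("000042")
```

Because dispatch is by class, the order of the extra arguments does not matter and unknown argument types fail early.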

Review rb3 dependencies

Change rb3 dependencies, moving from base R to the tidyverse.

Use more readr functions, more tibble, and rlang for error management.

For example, there are many calls to data.frame; replace them with tibble.

The same applies to read.csv calls and other legacy code.

load failed for 'rb3'

Hello, I checked the existing topics first and did not find anything similar, so I created this new issue.

I tried to install the package as described in the documentation, and loading the library already fails with an error.
Although I don't use R much, I have the bizdays library installed and run routines with it every day. I don't know whether that is the problem; the message follows:

[image]

I would appreciate any help!

Rename fwf_field to field

To define the fields (columns) of fwf files I use fwf_field.
However, the notation should be uniform, so only field is needed.

Scrape PDF for IBOVESPA historical composition

This PDF

https://www.b3.com.br/data/files/48/56/93/D5/96E615107623A41592D828A8/SERIE-RETROATIVA-DO-IBOV-METODOLOGIA-VALIDA-A-PARTIR-09-2013.pdf

has the historical IBOVESPA composition from Jan/2003 to Jan/2014.

However, the data has to be extracted from the PDF.

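library(tidyverse)
library(janitor)

# URL of the PDF linked above
url_pdf <- "https://www.b3.com.br/data/files/48/56/93/D5/96E615107623A41592D828A8/SERIE-RETROATIVA-DO-IBOV-METODOLOGIA-VALIDA-A-PARTIR-09-2013.pdf"

# extract the tables from pages 62 to 77
pdf_tables <- tabulizer::extract_tables(url_pdf, pages = seq(62, 77))

# reshape the first table to long format, with the month in column mes
pdf_tables[[1]] %>%
  as_tibble(.name_repair = "unique") %>%
  row_to_names(1) %>%
  clean_names() %>%
  pivot_longer(tidyselect::contains("_20"),
    names_to = c("mes")
  )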

This code seems to solve that.

Task list

Package infrastructure

  • GitHub Actions for CHECK/pkgdown/coverage
  • coverage tests
  • metacode.json
  • pkgdown ??
  • improve covtests to at least 75% (run covr::report() for details)

GitHub

  • remove (or solve) old issues

Download/Parsers

Vignettes

  • Reading and downloading files (download_data and read_marketdata)
  • equities and equity options
  • commodity futures (backwardation and contango)

Improve the way data is cached

The cached data uses the template name and a hash built from the arguments used to generate it.
Further, data is always stored in rds format.

One interesting alternative is to store data in parquet files.
With that, duckdb could be used to query the data in these files and the cache would form a database.

To get this done in a way that is easy to debug and follow, a naming rule for cached files would be useful.

A naming function should be created to name cached data.

This naming function would be declared in the template file, the same way it is done for the downloader and reader.
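
A naming function of that kind might look like the sketch below. The function name cache_file_name, the template/refdate layout, and the parquet extension are all assumptions made for illustration; the issue does not fix any of them.

```r
# Hypothetical naming rule: <template>/<refdate>.<ext>
cache_file_name <- function(template, refdate, ext = "parquet") {
  date_part <- format(as.Date(refdate), "%Y-%m-%d")
  file.path(template, paste0(date_part, ".", ext))
}

name_daily <- cache_file_name("COTAHIST_DAILY", "2022-05-10")
name_fpr <- cache_file_name("FPR", as.Date("2022-05-10"), ext = "rds")
```

A predictable name like this makes it possible to tell from the file system alone which template and reference date a cached file belongs to.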

Create cotahist_equity_options_superset

The function cotahist_equity_options returns a table with all options.

As an improvement, cotahist_equity_options_superset should return a table with all options and stock prices (closing or OHLC?) and, if yc is provided, the interest rate at maturity in an additional column.

This table should contain all the information necessary to compute implied volatility.

Save files per template in the cache

Saving files per template in the cache helps reduce the clutter of many files from different templates inside the cache directory.

This way, the cache directory would look like

<cachedir>/<template>/<files>

Build error

I am getting this error when installing the package:

Error: Failed to install 'rbmfbovespa' from GitHub:
System command error, exit status: 1, stdout + stderr:
E> * checking for file 'C:\Users\pc\AppData\Local\Temp\RtmpIrueqA\remotes34b05eb278d9\wilsonfreitas-rbmfbovespa-9f6622b/DESCRIPTION' ... OK
E> * preparing 'rbmfbovespa':
E> * checking DESCRIPTION meta-information ... OK
E> Warning in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
E> erro de compilação de padrão PCRE
E> 'nothing to repeat'
E> at '.json'
E> Error in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
E> expressão regular inválida '
.json'
E> Execução interrompida

Can you help me? Thank you.

cdi_get caches downloaded data, but it shouldn't

The cdi_get function does not have a refdate argument, yet it caches the downloaded data.
Furthermore, it does not have the use_cache and cache_folder arguments for cache handling.
This is a problem because once the data is downloaded and cached, it is not updated when the source changes.

Update historical data in cache

Create an engine to update historical data in cache.

Maybe the binary data, already processed inside a data.frame, could be saved in a SQLite database like a BLOB.

Considering that all data downloaded is indexed to a reference date, one alternative is: update the cache with all missing reference dates.
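
The "missing reference dates" idea can be sketched in base R as follows. The helper missing_refdates is hypothetical, and the sketch treats every Monday to Friday as a business day, ignoring holidays (a real implementation would presumably use a calendar such as the ones in bizdays).

```r
# Hypothetical helper: which expected reference dates are not cached yet?
missing_refdates <- function(cached, first_date, last_date) {
  expected <- seq(as.Date(first_date), as.Date(last_date), by = "day")
  # Keep Monday to Friday only; holidays are ignored in this sketch
  wday <- as.POSIXlt(expected)$wday
  expected <- expected[wday %in% 1:5]
  expected[!expected %in% as.Date(cached)]
}

cached <- as.Date(c("2022-05-09", "2022-05-10"))
to_fetch <- missing_refdates(cached, "2022-05-09", "2022-05-16")
to_fetch
```

An update routine would then download and parse only the dates in to_fetch.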

Example:

update_cache(template)
#> Updating <template> cache with <n> dates

When the cache is empty

update_cache(template)
#> x Error no cache for template <template>
check_cache(template)
#> cache for <template> is up to date

Update to bizdays version 0.1.10

bizdays version 0.1.10 brings the function load_builtin_calendars, which can be called through the namespace to load the built-in calendars.
It is intended for package development.
Packages that use bizdays and refer to calendars by name should call this function at load time (.onAttach or .onLoad).
Packages can now declare bizdays in the Imports field instead of Depends in the DESCRIPTION file.

B3 consolidated data

Hello, I need trading volumes for futures, which today I can get from the URL:

It returns a .csv with around 40k rows, covering futures, stocks, and so on.
Looking at the rb3 documentation I found the COTAHIST_DAILY template, which has basically the same information as the consolidated file mentioned above, but it only has data for stocks.

I searched the whole documentation again and did not find it, though I may have missed it. My question is: can I get the data from B3's consolidated file for all assets, not only stocks?

Below is an example of the file that comes from B3 (it is in Python, but ignore that detail):
[image]

And here is an example I got using rb3's cotahist template:
[image]

Indexes information

An implementation similar to these examples from tidyquant:

# Load libraries
library(tidyquant)

# Get the list of stock index options
tq_index_options()

# lists all symbols in the given index
tq_index("DOW")

Handle files that contain special characters

Reading BDIN raises the following problem:

> bdin <- read_marketdata('inst/extdata/BDIN')
Error in substr(lines, dx[1], dx[2]) : 
  string multibyte inválida em '<cd>NDI<43>E UTIL P<da>B (UTIL)00021400002136000216400021490002157+00077                                                                                                                                                                                                                                                                   '

This file is encoded in latin1.

How should this be handled?
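
One possible approach, sketched with base R only: read the lines as they are, then convert them from latin1 to UTF-8 with iconv before slicing by position. The file below is a small stand-in written for the example, not the real BDIN file.

```r
path <- tempfile(fileext = ".txt")
# latin1 bytes for an accented "I" (0xCD) followed by "NDICE UTIL"
writeBin(
  as.raw(c(0xCD, 0x4E, 0x44, 0x49, 0x43, 0x45, 0x20,
           0x55, 0x54, 0x49, 0x4C, 0x0A)),
  path
)

raw_lines <- readLines(path, warn = FALSE)
# Declare the source encoding and convert to UTF-8 before slicing
lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")

first_word <- substr(lines, 1, 6)
first_word
```

After the conversion, substr works on characters rather than raw bytes, so fixed-width slicing no longer fails on multibyte input.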

Implement template visualization (print)

Implement a print (or summary) method to display templates.
The project documentation may become complicated as new templates are added, so displaying the templates themselves would be the best documentation for the library.

COTAHIST_YEARLY wrongly uses cache

For years earlier than 2000, the cache is used incorrectly.

When the download and read are executed only for 1999, it works fine.

library(rb3)
f <- download_marketdata("COTAHIST_YEARLY", refdate = as.Date("1999-01-01"))
ch <- read_marketdata(f, "COTAHIST_YEARLY")
ch$Header
#> # A tibble: 1 x 5
#>   tipo_registro nome_arquivo  cod_origem data_geracao_arquivo reserva
#>           <int> <chr>         <chr>      <date>               <chr>  
#> 1             0 COTAHIST.1999 BOVESPA    2000-01-03           ""

When the download and read are executed for 2000 and then for 1999, the read of the 1999 file is wrong: it returns the cached version that refers to the 2000 file.

library(rb3)
f <- download_marketdata("COTAHIST_YEARLY", refdate = as.Date("2000-01-01"))
ch <- read_marketdata(f, "COTAHIST_YEARLY")
#> Warning in rule$handler(.data, str_match(.data, rule$regex)): NAs introduced by
#> coercion
ch$Header
#> # A tibble: 1 x 5
#>   tipo_registro nome_arquivo  cod_origem data_geracao_arquivo reserva
#>           <int> <chr>         <chr>      <date>               <chr>  
#> 1             0 COTAHIST.2000 BOVESPA    2001-05-02           ""
f <- download_marketdata("COTAHIST_YEARLY", refdate = as.Date("1999-01-01"))
ch <- read_marketdata(f, "COTAHIST_YEARLY")
ch$Header
#> # A tibble: 1 x 5
#>   tipo_registro nome_arquivo  cod_origem data_geracao_arquivo reserva
#>           <int> <chr>         <chr>      <date>               <chr>  
#> 1             0 COTAHIST.2000 BOVESPA    2001-05-02           ""

Created on 2022-07-02 by the reprex package (v2.0.1)

Use yc_mget for multiple dates and yc_get for a single date

I rarely use yield curves for multiple dates.
In instrument pricing, the common use is to fetch the yield curve for one specific date and price the financial instruments with that curve.

So, my suggestion is: use yc_mget for multiple dates, following the same interface that is now used in yc_get.
And yc_get should return a single curve for the given date.

yc_mget(first_date, last_date, by = 5, cachedir(), do_cache = FALSE)
yc_get(refdate, cachedir(), do_cache = FALSE)

The same can be applied to futures_get: futures_mget for multiple dates and futures_get for a single date.

The names get and mget are commonly used in NoSQL databases to express the same behavior.
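
To illustrate the relationship between the two interfaces, the sketch below derives a multi-date function from any single-date one. The names make_mget, toy_get, and toy_mget are hypothetical, and the date sequence steps in calendar days, whereas rb3 would presumably step in business days.

```r
# Build an "mget" from any single-date "get" function (hypothetical names)
make_mget <- function(get_fn) {
  function(first_date, last_date, by = 1, ...) {
    # Calendar-day steps; a business-day calendar is not considered here
    dates <- seq(as.Date(first_date), as.Date(last_date), by = by)
    # One call per date, then stack the results
    do.call(rbind, lapply(dates, function(d) get_fn(d, ...)))
  }
}

# Toy single-date getter returning one row per reference date
toy_get <- function(refdate) {
  data.frame(refdate = refdate, value = as.numeric(refdate) %% 7)
}

toy_mget <- make_mget(toy_get)
df_multi <- toy_mget("2022-04-01", "2022-04-29", by = 7)
df_multi
```

With this split, the single-date function stays simple and the multi-date one only handles iteration and binding.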

@msperlin what do you think about that?

Handler composition

Handlers could be composed: more than one handler would be declared, and they would be applied in the order in which they are declared.

field('Data de emissão do título', to_date('%Y%m%d'), to_string(format='%Y/%b/%d'))

Here the string is converted to a date and then the date is formatted.

field('Preço de exercício (opç)', width(13), 
  to_numeric(),
  decimal_places(field='Número de casas decimais dos campos com *'))

Here the value is converted to numeric and then the decimal places are applied to the number, according to the specified field.
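
Applying handlers in declaration order is a left fold over the list of handlers. A minimal sketch, assuming each handler is a plain function of one argument (compose_handlers, to_date, and to_string below are hypothetical stand-ins, not rb3 functions):

```r
# Apply handlers left to right, feeding each result into the next handler
compose_handlers <- function(...) {
  handlers <- list(...)
  function(x) Reduce(function(acc, h) h(acc), handlers, init = x)
}

# Hypothetical handler factories
to_date <- function(fmt) function(x) as.Date(x, format = fmt)
to_string <- function(fmt) function(x) format(x, format = fmt)

handler <- compose_handlers(to_date("%Y%m%d"), to_string("%Y/%m/%d"))
formatted <- handler("20220510")
formatted
```

The numeric month (%m) is used here instead of %b so that the result does not depend on the locale.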

Organize imports

This package has many dependencies.
It is a good practice to be more specific in the imports.

Prefer @importFrom in place of @import.

Fix the mapping of the BD_Arbit template

The quote fields are being adjusted by the wrong attribute.

field('Cotação do maior negócio do dia', width(8),
    to_numeric(dec='Número de casas decimais dos campos com *', sign='Sinal da cotação do maior negócio do dia')),

The correct attribute is Número de casas decimais dos ajustes.

Review all the fields related to ajustes (settlement prices).

Improve fields creation

Fields have a handler attribute that defines the variable type used to parse the column.
The generic pass-thru handler is declared by setting the handler type to character.
By doing that you tell the parser that this column has no specific type and should remain a character.

fields:
- name: company
  description: Nome da companhia
  handler:
    type: character

Instead of defining a character handler, a better approach is to define no handler, like this:

fields:
- name: company
  description: Nome da companhia

When a field has no handler, it is assumed to use the generic pass-thru handler.

This simplifies the config file and improves configuration.
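
On the parsing side, the default could be resolved roughly as below. This is a sketch under assumptions: fields are represented as plain R lists mirroring the YAML, and resolve_handler and the numeric type are hypothetical names used only for the example.

```r
# Hypothetical resolver: a missing handler means the pass-thru handler
resolve_handler <- function(field) {
  handler <- field$handler
  if (is.null(handler)) {
    return(identity)
  }
  switch(handler$type,
    character = identity,
    numeric = as.numeric,
    stop("unknown handler type: ", handler$type)
  )
}

# Fields as they might look after reading the YAML template
fields <- list(
  list(name = "company", description = "Nome da companhia"),
  list(name = "price", handler = list(type = "numeric"))
)

handlers <- lapply(fields, resolve_handler)
handlers[[1]]("ACME")
handlers[[2]]("12.5")
```

The first field declares no handler and keeps its value as a character; the second one converts to numeric.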
