statistikat / STATcubeR
R interface for the STATcube REST API and data.statistik.gv.at
Home Page: https://statistikat.github.io/STATcubeR/
License: GNU General Public License v2.0
The cube sc_table_saved("str:table:defaulttable_delufapi004") (external info page) uses time codes of the form {prefix}-YYYY-Q which are currently not parsed correctly into a date format because the parser expects {prefix}-YYYYQ.
x <- sc_table_saved("str:table:defaulttable_delufapi004")
x$field()
# STATcubeR metadata: 6 x 3
code label parsed
<chr> <chr> <chr>
1 APIQ10-2020-1 1. quarter 2020 1. quarter 2020
2 APIQ10-2020-2 2. quarter 2020 2. quarter 2020
3 APIQ10-2020-3 3. quarter 2020 3. quarter 2020
4 APIQ10-2020-4 4. quarter 2020 4. quarter 2020
5 APIQ10-2020-5 annual 2020 annual 2020
6 APIQ10-2021-1 1. quarter 2021 1. quarter 2021
After a proper update of the parser, the parsed column should be of type <Date>. We can assume that the client skips the "annual" values by providing appropriate recodes in the JSON. The parser should therefore be able to work with strings of the form
c("APIQ10-2020-1", "APIQ10-2020-2", "APIQ10-2020-3", "APIQ10-2020-4", "APIQ10-2021-1")
As mentioned in #11 (more precisely: here), it would be useful to add a new "Week" type for time variables. The dataset od_table("OGD_gest_kalwo_alter_GEST_KALWOCHE_5J_100") uses codes of the form {prefix}-YYYYWW for the time variable, where WW is the calendar week. This is very similar to the {prefix}-YYYYMM notation for months, which is why the following table fails to parse the time correctly.
x <- od_table("OGD_gest_kalwo_alter_GEST_KALWOCHE_5J_100")
x$raw$extras$metadata_modified
#> [1] "2021-07-22T09:02:59"
x$meta$fields[, c(1, 3, 4, 5)]
#> code label_en nitems type
#> 1 C-KALWOCHE-0 Calendar week 1121 Time (month)
#> 2 C-B00-0 Province (NUTS 2 unit) of deceased 9 Category
#> 3 C-ALTER5-0 5 years age group of deceased 20 Category
#> 4 C-C11-0 Gender of deceased 2 Category
x$field("Calendar week")[11:14, c(1, 3, 4)]
#> code label_en parsed
#> 2 KALW-200011 11. Calendar week 2000 (week from 13.3.2000 to 19.3.2000) 2000-11-01
#> 3 KALW-200012 12. Calendar week 2000 (week from 20.3.2000 to 26.3.2000) 2000-12-01
#> 4 KALW-200013 13. Calendar week 2000 (week from 27.3.2000 to 2.4.2000) NA
#> 5 KALW-200014 14. Calendar week 2000 (week from 3.4.2000 to 9.4.2000) NA
Implementing this will require modifications in sc_fields_type() (line 62 in 8416f63).
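A possible approach for the new "Week" type, sketched under the assumption that WW counts ISO weeks starting on Monday (the helper name is hypothetical, and ISO 8601 edge cases around the year boundary would need more care in the real parser):

```r
# Hypothetical helper: map {prefix}-YYYYWW codes to the Monday of that ISO week.
parse_week_codes <- function(codes) {
  m <- regmatches(codes, regexec("^.+-(\\d{4})(\\d{2})$", codes))
  out <- vapply(m, function(x) {
    if (length(x) < 3) return(NA_real_)
    jan4 <- as.Date(sprintf("%s-01-04", x[2]))  # January 4 is always in ISO week 1
    week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
    as.numeric(week1_monday + (as.integer(x[3]) - 1) * 7)
  }, numeric(1))
  as.Date(out, origin = "1970-01-01")
}
parse_week_codes("KALW-200011")  # week from 13.3.2000 to 19.3.2000
#> [1] "2000-03-13"
```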
Add option to render the tables in a non-tidy format that is more similar to what we see in the STATcube GUI. Make sure everything looks nice even if there are several variables that are used as columns. This requires a rendering engine that allows cell merging. Options:
Example table with several variables in rows and columns:
sc_table_saved-list()
Add functionalities that allow modifications of labels and similar. Define a new R6 class and put an instance into x$recode.
x <- od_table()
# set labels for fields
x$recode$label(code_field, new_label, language)
# set labels for measures
x$recode$label(code_measure, new_label, language)
# set labels of levels
x$recode$level(code_field, code_level, new_label, language)
# set total codes similar to x$total_codes() but for programmatic usage
x$recode$total_code(code_field, code_level)
# codes_levels is a permutation of x$field(code_field)$code
x$recode$order(code_field, codes_levels)
# define which levels are included in $tabulate().
x$recode$visible(code_field, code_level, visible = TRUE)
# undo all recodes
x$recode$reset()
All modifications should directly overwrite x$meta and x$fields(i). The functionality should be bilingual, i.e. it should be possible to define German and English labels.
# in initialize()
private$recoder <- recoder_class$new(self, private)
# in active bindings
recode = function(value) {
private$recoder
}
It would probably be useful to add extra columns visible and order in x$field(i) to store this part of the "recode state". We could also just store them in private$p_fields[[i]] and omit them in the active binding field:
active = list(field = function(i) {
  private$p_fields[[i]][, setdiff(names(private$p_fields[[i]]), c("order", "visible"))]
})
We could also add "pluralized" versions that implement the recodes such as
x$recode$labels(code_measures, new_labels, language)
x$recode$levels(code_field, code_levels, new_labels, language)
# ...
For completeness, add functions that use those simple endpoints and parse the results into classes sc_info and sc_rate_limit.
Add support for caching of /schema responses via the ETag header. More generally, document and export the caching behavior of API responses.
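A rough sketch of what ETag-based caching could look like with {httr}. The cache layout, function name, and header handling are assumptions for illustration, not the package's actual implementation:

```r
# Hypothetical cached GET: send If-None-Match and reuse the cached body on 304.
sc_schema_cached <- local({
  cache <- new.env(parent = emptyenv())
  function(url, key) {
    entry <- get0(url, envir = cache)
    resp <- httr::GET(url, httr::add_headers(
      APIKey = key,
      `If-None-Match` = if (!is.null(entry)) entry$etag
    ))
    if (httr::status_code(resp) == 304L)  # not modified: serve from cache
      return(entry$content)
    content <- httr::content(resp, as = "parsed")
    assign(url, list(etag = httr::headers(resp)[["etag"]], content = content),
           envir = cache)
    content
  }
})
```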
Currently, the colors used in the print methods of STATcubeR
only really work with dark editor themes. This is why there are setup-scripts like these to make the pkgdown-docs look nice despite having a light theme.
Lines 15 to 17 in 4537d3e
Lines 5 to 7 in 99659dc
Since a substantial share of R users work with light editor themes, make sure that a freshly installed version of STATcubeR works with both light and dark editors. Additionally, keep the current color palettes as a "dark theme" and add some way to switch between the default theme and the dark theme. Simplify the pkgdown setup by just using the new default theme.
In order to make the theming system powerful enough to include all current "theme adaptations" for pkgdown, it is necessary to provide {cli} options (possibly a bad idea, TBD). There is already some prototyping which uses theme definitions in inst/themes/{theme}.json with the following structure.
{
"description": "default theme for STATcubeR",
"schema": {"FOLDER": "#4400cc", "DATABASE": "#186868", "TABLE": "#624918", "...": "..."},
"annotations": ["#4400cc", "#186868", "#624918", "..."],
"cli": {".field": {"color": "#0d0d73"}, "...": "..."}
}
It would be possible to autodetect whether a light or dark mode is appropriate via rstudioapi::getThemeInfo(). But this would only be applicable for RStudio users. It is probably better to provide a neutral theme that works in both dark and light editors as the default and make optimized themes for dark and light mode opt-in.
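Loading such a theme definition could be as simple as the following sketch (sc_theme_load() is a made-up name; it assumes the JSON files are shipped in inst/themes/ as described above):

```r
# Hypothetical theme loader for inst/themes/{theme}.json.
sc_theme_load <- function(theme = "default") {
  path <- system.file("themes", paste0(theme, ".json"), package = "STATcubeR")
  if (path == "")
    stop("no theme definition found for '", theme, "'")
  jsonlite::fromJSON(path, simplifyVector = FALSE)
}
```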
The CI testing of STATcubeR is not working anymore due to changes on Travis CI. Check whether it is possible to switch to https://travis-ci.com or set up a new CI with GitHub Actions: https://github.com/r-lib/actions
There is already a first attempt to include unit tests for the STATcube API using {httptest}
in #40 . The basic idea is to have a way to test the parsers and print methods for sc_table()
and friends when submitting the package to CRAN.
One important question here is which cubes/databases should be used in the tests. One recommendation is the "Gemeindedaten (Demo)" database. However, in order to maximize code coverage, some databases with annotations and missing values would be required. Unfortunately, the "Gemeindedaten (Demo)" database only provides missings/annotations of the kind "X: cross tabulation not allowed". Another useful thing would be to have different types of time variables (half year, month, week, quarter, year).
Candidate databases
Check whether the tables in inst/json_examples are restricted to certain user groups and replace them accordingly.
sc_example() should be extended so a list of all available examples can be displayed. Either do this with a function like sc_examples_list() or display an error message with the available examples if sc_example() is called with an invalid argument.
Currently, as.data.frame() inserts NA values whenever the annotation "X" is applied to a cell value.
Lines 52 to 54 in 1158d37
Figure out if this makes sense for other annotations and handle those cases in as.data.frame()
accordingly.
Since the pkgdown website of STATcubeR uses Bootstrap 5, there are currently some issues related to r-lib/pkgdown#2207. These should be resolved once the website is rebuilt with the development version of pkgdown.
So far, all I am able to do is produce "JSONDecodeError"s: "Expecting value: line 1 column 1 (char 0)"
I tried something like this:
import requests
api_url = "https://statcubeapi.statistik.at/statistik.at/ext/statcube/rest/v1e/table"
api_key = "<quitesomekeyhere>"
headers = {'APIKey': api_key, "Content-Type": "application/json"}
query = {
"database" : "str:database:debevstprog",
"measures" : [ "str:statfn:debevstprog:F-BEVSTPROG:F-S25V1:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V2:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V3:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V4:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V5:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V6:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V7:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V8:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V9:SUM", "str:statfn:debevstprog:F-BEVSTPROG:F-S25V10:SUM" ],
"recodes" : {
"str:field:debevstprog:F-BEVSTPROG:C-C11-0" : {
"map" : [ [ "str:value:debevstprog:F-BEVSTPROG:C-C11-0:C-C11-0:C11-1" ], [ "str:value:debevstprog:F-BEVSTPROG:C-C11-0:C-C11-0:C11-2" ] ],
"total" : False
},
"str:field:debevstprog:F-BEVSTPROG:C-A10-0" : {
"map" : [ [ "str:value:debevstprog:F-BEVSTPROG:C-A10-0:C-A10-0:A10-2000" ], [ "str:value:debevstprog:F-BEVSTPROG:C-A10-0:C-A10-0:A10-2010" ], [ "str:value:debevstprog:F-BEVSTPROG:C-A10-0:C-A10-0:A10-2020" ], [ "str:value:debevstprog:F-BEVSTPROG:C-A10-0:C-A10-0:A10-2030" ] ],
"total" : False
}
},
"dimensions" : [ [ "str:field:debevstprog:F-BEVSTPROG:C-A10-0" ], [ "str:field:debevstprog:F-BEVSTPROG:C-C11-0" ] ]
}
# the /table endpoint expects a POST request with a JSON body;
# a GET with query parameters yields a non-JSON body, hence the JSONDecodeError
response = requests.post(api_url, json=query, headers=headers)
response_data = response.json()
print(response_data)
Also had to replace the "false" in the JSON query with "False", as the IDE was throwing an error otherwise (Python boolean literals are capitalized). I am not very familiar with REST APIs; can anybody offer some insight into how this might work?
The following json file is not handled correctly by sc_table()
{
"database" : "str:database:deenenea",
"measures" : [ "str:statfn:deenenea:F-DATA:F-EBIL:SUM" ],
"recodes" : {
"str:field:deenenea:F-DATA:C-VERWEND0-0" : {
"map" : [
[
"str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-1",
"str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-2"
],
[ "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-1" ]
]
}
},
"dimensions" : [ [ "str:field:deenenea:F-DATA:C-VERWEND0-0" ] ]
}
It results in duplicate codes for the field C-VERWEND0-0, which causes all kinds of issues with $tabulate() because of implicit assumptions.
sc_table('test.json')$field("C-VERWEND0-0")
#> # STATcubeR metadata: 3 x 7
#> code label parsed
#> <chr> <chr> <chr>
#> 1 VERWEND0-1 Space and water heating Space and water heating
#> 2 VERWEND0-1 Space and water heating Space and water heating
#> 3 SC_TOTAL Total Total
#> # … with 4 more columns: 'label_de', 'label_en', 'visible', 'order'
The reason for that is that the map field in the JSON contains several URIs and only the first URI is used to generate the code column in $field(). It should be made sure that unique codes are generated in this case, possibly by concatenating the codes of the individual URIs. A fixed version might create a field definition like this
sc_table('test.json')$field("C-VERWEND0-0")
#> # STATcubeR metadata: 3 x 7
#> code label parsed
#> <chr> <chr> <chr>
#> 1 VERWEND0-1;VERWEND0-2 Space and water heating Space and water heating
#> 2 VERWEND0-1 Space and water heating Space and water heating
#> 3 SC_TOTAL Total Total
#> # … with 4 more columns: 'label_de', 'label_en', 'visible', 'order'
Time variables should be converted to type category in this case, with a warning. Labels could also be concatenated. However, this would lead to very long labels which might not be ideal.
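Generating unique codes from multi-URI map entries could be sketched as follows, assuming ";" never occurs in level codes (a plain illustration, not the actual fix):

```r
make_code <- function(uris) {
  # the level code is the last ":"-separated component of each URI
  paste(vapply(uris, function(u) tail(strsplit(u, ":")[[1]], 1), character(1)),
        collapse = ";")
}
make_code(c(
  "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-1",
  "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-2"
))
#> [1] "VERWEND0-1;VERWEND0-2"
```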
Implement a function that takes ids for a database, measures and fields and sends a json request. Here is a snippet on how to do that manually at the moment
# pick a dataset
db_id <- "detouextregsai"
db_schema <- sc_schema_db(db_id)
db_uid <- paste0("str:database:", db_id)
# browse the schema to obtain resource ids
id_arrivals <- db_schema$Facts$Arrivals$Arrivals$id
id_time <- db_schema$`Mandatory fields`$`Season/Tourism Month`$`Season/Tourism Month`$id
# get the response
json_list <- list(database = db_uid, measures = list(id_arrivals), dimensions = list(list(id_time)))
response <- httr::POST(
url = paste0(STATcubeR:::base_url, "/table"),
body = jsonlite::toJSON(json_list, auto_unbox = TRUE),
encode = "raw",
config = httr::add_headers(APIKey = sc_key())
)
# convert to class sc_table
my_table <- STATcubeR:::sc_table_class$new(response)
Transform this snippet into a function
sc_table_custom(database_id, measures, fields)
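Based on the snippet above, a first sketch of the function could look like this. It relies on the internals STATcubeR:::base_url and STATcubeR:::sc_table_class shown above, which may change:

```r
# Sketch: build the request body from ids and return an sc_table object.
sc_table_custom <- function(database_id, measures, fields) {
  json_list <- list(
    database   = paste0("str:database:", database_id),
    measures   = as.list(measures),
    dimensions = lapply(fields, list)  # one list() per field, as in the snippet
  )
  response <- httr::POST(
    url    = paste0(STATcubeR:::base_url, "/table"),
    body   = jsonlite::toJSON(json_list, auto_unbox = TRUE),
    encode = "raw",
    config = httr::add_headers(APIKey = sc_key())
  )
  STATcubeR:::sc_table_class$new(response)
}
```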
The STATcube API contains codes and labels for all variables. Currently, as.data.frame() always uses codes for column names and field entries. Make this behavior optional so users can also work with labels.
The conversion between codes and labels is pretty straightforward when sc_meta()
or sc_meta_field()
is used because those functions can be used as "translators".
json_path <- sc_example("bev_seit_1982.json")
my_response <- sc_get_response(json_path)
sc_meta(my_response)
## $database
## label code
## 1 Bevölkerung zu Jahresbeginn ab 1982 debevstandjb
##
## $measures
## label code fun precision
## 1 Fallzahl F-ISIS-1 SUM 0
##
## $fields
## label code nitems
## 1 Jahr C-A10-0 40
## 2 Bundesland C-BB00-0 11
## 3 Geburtsland C-GEBLAND-0 3
sc_meta_field(my_response, 2)
## label code type
## 1 Burgenland <AT11> 1 RecodeItem
## 2 Kärnten <AT21> 2 RecodeItem
## 3 Niederösterreich <AT12> 3 RecodeItem
## 4 Oberösterreich <AT31> 4 RecodeItem
## 5 Salzburg <AT32> 5 RecodeItem
## 6 Steiermark <AT22> 6 RecodeItem
## 7 Tirol <AT33> 7 RecodeItem
## 8 Vorarlberg <AT34> 8 RecodeItem
## 9 Wien <AT13> 9 RecodeItem
## 10 Nicht klassifizierbar <0> 0 RecodeItem
## 11 Zusammen Total
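Since the metadata tables above pair every label with its code, a translator is essentially a match() lookup. A small illustration with a mocked metadata frame (contents shortened from the output above):

```r
field_info <- data.frame(
  label = c("Burgenland <AT11>", "Kärnten <AT21>", "Zusammen"),
  code  = c("1", "2", NA)
)
code_to_label <- function(codes, info) info$label[match(codes, info$code)]
code_to_label(c("2", "1"), field_info)  # returns the labels in code order
```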
Describe the bug
od_table()
throws an error when trying to download dataset OGD__steuer_lst_ab_2008_4_LST_4
, whereas all other available datasets discovered with od_list()
worked.
To Reproduce
od_table("OGD__steuer_lst_ab_2008_4_LST_4")
Error in `$<-.data.frame`(`*tmp*`, "parsed", value = NA_character_) :
  replacement has 1 row, data has 0
Expected behavior
I expected to get the corresponding R6-class object, e.g.
od_table("OGD__steuer_lst_ab_2008_2_LST_2")
Wage Tax Statistics from 2008: Extent of Employment, Sex and Economic Activities
Dataset: OGD__steuer_lst_ab_2008_2_LST_2 (data.statistik.gv.at)
Measures: Number of entities (= persons) subject to wage tax, Gross total income (EUR), Other income according to §67 par. 1-2 (EUR), Entity count: Other
income according to §67 par. 1-2, Other income according to §67 par. 3-8 with fixed tax rate (EUR), Entity count: Other income according to §67 par. 3-8 with
fixed tax rate, NTSONST (nach Tarif versteuerte sonstige Bezüge) (EUR), Z_NTSONST (Fallzahl NTSONST), LFBEZ (laufende Bezüge inkl. KZ220) (EUR), Z_LFBEZ
(Fallzahl LFBEZ), … (48 more)
Fields: Year <15>, Gender <2> <2>, Duration of income <2> <2>, ÖNACE 2008 Abteilungen (2-Steller) <89> [teilw. ABO] (Ebene +1) <22>
Request: [2024-02-20 14:09:45.180562]
STATcubeR: 0.5.0 (@4537d3e)
Environment
> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.0 (2023-04-21)
os Debian GNU/Linux 10 (buster)
system x86_64, linux-gnu
ui RStudio
language (EN)
collate de_AT.UTF-8
ctype de_AT.UTF-8
tz Europe/Vienna
date 2024-02-20
rstudio 2023.12.1+402 Ocean Storm (server)
pandoc NA
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
arrow * 13.0.0.1 2023-09-22 [1] CRAN (R 4.3.0)
assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.0.3)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
bit64 4.0.5 2020-08-30 [2] CRAN (R 4.0.3)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
curl 5.2.0 2023-12-08 [1] CRAN (R 4.3.0)
data.table * 1.15.0 2024-01-30 [1] CRAN (R 4.3.0)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.0)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0)
here 1.0.1 2020-12-13 [2] CRAN (R 4.2.1)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.3)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.0)
remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.3.0)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.0)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.0)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
rvest * 1.0.3 2022-08-19 [2] CRAN (R 4.2.1)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.0)
selectr 0.4-2 2019-11-20 [2] CRAN (R 4.2.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
STATcubeR * 0.5.0 2023-06-12 [1] Github (statistikat/STATcubeR@4537d3e)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyjson 0.3.2 2023-01-07 [1] CRAN (R 4.3.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.0)
xml2 1.3.6 2023-12-04 [1] CRAN (R 4.3.0)
[1] /home/zenz/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
National accounts cubes such as sc_example("foreign_trade") do not provide values for total codes.
Therefore, totals should be aggregated directly in $tabulate(), because otherwise the result would be a table filled with NAs in all measure columns.
sc_example("foreign_trade") %>%
sc_table() %$%
tabulate("Reference year")
# A STATcubeR tibble: 11 x 5
`Reference year` `Import, number… `Import, value … `Export, number… `Export, value …
* <date> <dbl> <dbl> <dbl> <dbl>
1 2008-01-01 NA NA NA NA
2 2009-01-01 NA NA NA NA
3 2010-01-01 NA NA NA NA
4 2011-01-01 NA NA NA NA
5 2012-01-01 NA NA NA NA
6 2013-01-01 NA NA NA NA
7 2014-01-01 NA NA NA NA
8 2015-01-01 NA NA NA NA
9 2016-01-01 NA NA NA NA
10 2017-01-01 NA NA NA NA
11 2018-01-01 NA NA NA NA
In one of our internal projects, we currently use the condition "T" %in% table$annotation_legend$annotation to determine whether a direct aggregation via rowsum() should be applied.
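The direct aggregation itself is straightforward with base R's rowsum(); a minimal sketch on a mocked tidy table (column names are invented for illustration):

```r
df <- data.frame(
  year  = rep(2008:2010, each = 2),
  value = c(1, 2, 3, 4, 5, 6)
)
# sum the measure column within each year: 3, 7, 11
rowsum(df["value"], group = df$year)
```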
sc_browse()
and sc_browse_preferences()
$edit()
method to
It would be very useful if {STATcubeR} could support "SDMX archives" which are generated from STATcube. SDMX archives consist of a metadata component called the "structure definition" and a data part which contains the actual cell values. In order to support them, we would need to add parsers for the XML-based data format.
The generated archives are more or less compatible with the CRAN package rsdmx: https://cran.r-project.org/package=rsdmx, which could be used as a starting point to develop parsers.
Possible usage: a parser function sdmx_table() which generates an object of class sc_data (the parent class for OGD and STATcube API datasets)
x <- STATcubeR::sdmx_table("path/to/sdmx_archive.zip")
class(x)
#> [1] "sdmx_table" "sc_data"    "R6"
There are several advantages of the SDMX format compared to the API.
The last point is probably the most compelling one, since a direct interface to SuperCROSS would be very helpful for the internal workflows of Statistics Austria.
It might be a nice addition to make the current pkgdown articles (or some of them) also available as vignettes so that they can be used offline. Currently, the articles use some customizations (in vignettes/R/) that might make this not 100% straightforward.
Presumably, the tooltips should be disabled in the offline version because tippy.js is currently loaded via a CDN. Dependencies on {fansi} should also be avoided and possibly replaced with cli::ansi_html().
For quite some time there has been a hidden feature that allows caching of API responses from the STATcube REST API. This is very useful for our internal web application and we will have to decide how to deal with it in the upcoming CRAN release.
Leaving it as a hidden feature might create a bad impression during reviews. Removing it would make it necessary to implement the caching logic elsewhere, which might be tricky. Therefore, it is probably best to document and export the behavior. Documentation is already available in ?sc_cache.
One problem with the current implementation is that the hashes are created via serialize() and are therefore not reusable across different R versions. It would be very handy to use something like digest::digest(), but adding another dependency package just for the hashes seems unnecessary. Maybe tools::md5sum() could be used in a different way to get a satisfying result.
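A possible workaround: tools::md5sum() only accepts file paths, but writing the request string to a temporary file first gives a stable hash without a new dependency (a sketch; the helper name is made up):

```r
md5_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(x), tmp)  # write raw bytes so the hash is platform-independent
  unname(tools::md5sum(tmp))
}
md5_string('{"database": "str:database:detouextregsai"}')
```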
TODOs
- Use @export to make the caching available without environment variables.
- Use @internal to include the man pages in the index page of the documentation.
- Add od_cache_summary(), which provides an overview of the cache contents. This would probably require some kind of cache_index.csv so we don't need to parse the cache entries.
Allow users to switch to English responses (as opposed to German, which is the server standard) with a new parameter language in sc_get_response() and sc_saved_table().
Hopefully, this will only affect variable descriptions so the parsers (as.data.frame()
, sc_meta()
, ...) should not need any updates due to those changes.
Similar to #25 but for od_table(). Currently, the caches use something like ~/.cache/STATcubeR/open_data/{id}.csv which basically mimics the file format from the servers. We will need a second cache directory or to disable caching for the editing server.
Extend od_table() to switch between the external server and the editing server.
Describe the bug
Function od_catalogue() throws an error; the examples in the documentation don't work.
To Reproduce
catalogue <- od_catalogue()
Error in strsplit(., "?id=") : non-character argument
Expected behavior
Expected a data.frame containing metadata on the various datasets.
Environment
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.0 (2023-04-21)
os Debian GNU/Linux 10 (buster)
system x86_64, linux-gnu
ui RStudio
language (EN)
collate de_AT.UTF-8
ctype de_AT.UTF-8
tz Europe/Vienna
date 2024-02-22
rstudio 2023.12.1+402 Ocean Storm (server)
pandoc NA
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
data.table * 1.15.0 2024-01-30 [1] CRAN (R 4.3.0)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.0)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.0)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0)
here 1.0.1 2020-12-13 [2] CRAN (R 4.2.1)
highr 0.10 2022-12-22 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.3)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.0)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.0)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.0)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
STATcubeR * 0.5.0 2023-06-12 [1] Github (statistikat/STATcubeR@4537d3e)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.0)
xfun 0.41 2023-11-01 [1] CRAN (R 4.3.0)
[1] /home/zenz/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The classification files of OGD datasets contain an optional column "FK" (foreign key) which can point to the parent element of the classification element. This allows a definition of hierarchical classifications. Currently, the FK column is ignored by STATcubeR, but it could be used to automatically detect "total codes". Example:
Code | Name | FK |
---|---|---|
WEST | Label for West | TOTAL |
EAST | Label for East | TOTAL |
TOTAL | Label for Total |
Here, there is a single classification element (Code: TOTAL) that has no parent, and all other elements point to that element via FK. In cases like this, it is reasonable to regard TOTAL as the total code for this classification.
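Total-code detection from the FK column could then be sketched like this, assuming the classification file has been read into a data frame with columns code and fk (NA where no parent is given):

```r
detect_total_code <- function(classification) {
  roots <- classification$code[is.na(classification$fk)]
  # a single parentless element that all other elements point to
  if (length(roots) == 1 &&
      all(classification$fk[classification$code != roots] == roots))
    roots
  else
    NULL
}
classification <- data.frame(
  code = c("WEST", "EAST", "TOTAL"),
  fk   = c("TOTAL", "TOTAL", NA)
)
detect_total_code(classification)
#> [1] "TOTAL"
```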
There are now multiple internal STATcube API servers running inside our firewalls. Extend the package in order to
sc_key()
In early versions, STATcubeR used to include annotations in the output of as.data.frame.sc_table(). This was dropped when support for OGD datasets was introduced in #11. Back then, the annotations were included using separate columns.
It is planned to re-implement this feature in a slightly different manner using {tibble} and {vctrs} by providing a custom vector class that acts as an "annotated numeric". The result of printing those values should look something like this:
Annotations should either replace the values while printing or use color coding to reference a specific annotation.
The "annotation legend" (which color corresponds to which annotation) can then be included in the footer of the tibble. Some technical details:
- The default of sc_tabulate() and as.data.frame.sc_table() should be to return simple tibbles that only include columns of type numeric and factor. Adding annotations should be "opt-in".
- Provide an as.numeric() method which drops the annotations and returns a canonical double vector.
- If sc_tabulate() is called in a way where aggregation via rowsum() is necessary and annotations is set to TRUE, an error will be thrown.
There have now been several requests to support filtering in sc_table_custom(). Currently, the only way to do this is to generate the request.json by hand.
library(STATcubeR)
schema <- sc_schema_db("detouextregsai")
region <- schema$`Other Classifications`$
`Tourism commune [ABO]`$`Regionale Gliederung (Ebene +1)`
request <- list(
database = schema$id,
dimensions = list(I(region$id)),
recodes = setNames(
list(list(
map = list(
I(region$Bregenzerwald$id),
I(region$`Vorarlberg Rest`$id),
I(region$`Bodensee-Vorarlberg`$id)
)
)),
region$id
)
)
jsonlite::write_json(request, "request.json", pretty = TRUE, auto_unbox = TRUE)
readLines("request.json") %>% cat(sep = "\n")
x <- sc_table("request.json", add_totals = FALSE)
x$tabulate()
It might be sensible to extend the functionality of sc_table_custom()
to support filters (or possibly other recodes) via additional parameters. The syntax might look like this
library(STATcubeR)
schema <- sc_schema_db("detouextregsai")
region <- schema$`Other Classifications`$
`Tourism commune [ABO]`$`Regionale Gliederung (Ebene +1)`
sc_table_custom(
schema,
region,
sc_recode(region, c(region$Bregenzerwald,
region$`Vorarlberg Rest`, region$`Bodensee-Vorarlberg`))
)
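A sketch of the proposed sc_recode() helper: it merely assembles the "recodes" entry for one field from schema objects, mirroring the manual JSON construction above (signature and behavior are suggestions, not an implemented API):

```r
# Hypothetical helper: build the "recodes" list entry for one field.
sc_recode <- function(field, values) {
  setNames(
    list(list(map = lapply(values, function(v) I(v$id)))),
    field$id
  )
}
# usage with plain lists standing in for schema objects
r <- sc_recode(list(id = "str:field:db:F:C-X"),
               list(list(id = "str:value:a"), list(id = "str:value:b")))
str(r, max.level = 2)
```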