robsteranium / csvwr
Read and write CSV on the Web (csvw) tables and metadata in R
Home Page: https://robsteranium.github.io/csvwr
csvw can define 'separators' in field definitions e.g.
{
"datatype": "string",
"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
"required": false,
"separator": ";",
"name": "Source"
}
...which means that the field should be parsed from "a;b" to something like c("a", "b"). It would be nice to support this.
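As a rough illustration of the requested behaviour (not csvwr's API; `split_separated` is a hypothetical helper name), a separated field could be parsed with base R's `strsplit`, giving a list column with one character vector per cell:

```r
# Hypothetical helper: split each cell of a character column on the
# separator declared in the column's csvw field definition.
split_separated <- function(x, separator = ";") {
  # fixed = TRUE treats the separator literally rather than as a regex
  strsplit(x, separator, fixed = TRUE)
}

source_column <- c("a;b", "c")
parsed <- split_separated(source_column, separator = ";")
# parsed[[1]] is c("a", "b") and parsed[[2]] is "c"
```

This sketch ignores the other steps the csvw spec describes for separated values (such as trimming and null handling), so it only covers the basic split.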
Is there a way to read multiple csv files with one metadata json file?
What's the recommended way to work with files from the internet?
I'm providing my users with a notebook so they can explore a remote dataset. If I download my json and csv to temp files, the uri of the table breaks, so I'm having to download them to named local files which, ok might be the best for large files anyway, but if I'm wanting a toe in the water with a dataset, it would be cool to be able to just slurp a few lines of csv into a string and fire those into a dataframe...
Same issue if the uri of the table in the json file is absolute.
How do you imagine best practice for this sort of workflow? Thanks!
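For the "toe in the water" case, one possible workaround that bypasses csvwr and the metadata entirely is to read only the first few lines from a connection with base R. This is a sketch under assumptions: `read_csv_head` is a hypothetical helper, the URL in the comment is a placeholder, and it will not cope with quoted fields that contain line breaks.

```r
# Hypothetical helper: read the header plus the first n rows from a
# connection and parse them as CSV, without downloading the whole file.
read_csv_head <- function(con, n = 5) {
  lines <- readLines(con, n = n + 1)  # header line + n data rows
  read.csv(text = paste(lines, collapse = "\n"), stringsAsFactors = FALSE)
}

# With a remote file this might look like (placeholder URL):
#   preview <- read_csv_head(url("https://example.org/data.csv"), n = 5)

# Self-contained demonstration using an in-memory connection:
con <- textConnection("a,b\n1,x\n2,y\n3,z")
preview <- read_csv_head(con, n = 2)
close(con)
```

Because the metadata is not applied, column types here are whatever `read.csv` guesses rather than the datatypes declared in the csvw schema.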
Following the basic usage instructions gets me:
require(csvwr)
#> Loading required package: csvwr
d <- iris[sample(150,10),]
s <- derive_table_schema(d)
m <- create_metadata(tables=s)
j <- jsonlite::toJSON(m)
cat(j, file="metadata.json")
write.csv(d, 'data.csv')
read_csvw_dataframe("data.csv", "metadata.json")
#> $error
#> <simpleError: `col_names` must be TRUE, FALSE or a character vector>
#>
#> $filename
#> [1] "data.csv"
#>
#> $dialect
#> list()
#>
#> $group_schema
#> list()
Hi, I'm one of the creators of a data standard (CLDF) which is based on csvw. Unfortunately we decided to allow a dialect property specifying only the properties which deviate from the default dialect. E.g. here.
This seems not to work with current csvwr:
> d <- read_csvw("projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json")
> d$tables[[1]]$dataframe
$error
<simpleError: Expected single integer value>
$filename
[1] "projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json"
$dialect
$dialect$commentPrefix
NULL
$group_schema
list()
When I remove the dialect property from the metadata (falling back to the default) all is good:
> d <- read_csvw("projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json")
> d$tables[[1]]$dataframe
# A tibble: 131,048 × 8
ID Language_ID Parameter_ID Value Code_ID Comment Source codeReference
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 more1… more1255 level family level-… NA NA NA
2 more1… more1255 category Family catego… NA NA NA
3 more1… more1255 subclassific… ((((b… NA **hh:hv… hh:hv… NA
4 mong1… mong1349 level family level-… NA NA NA
5 mong1… mong1349 category Family catego… NA NA NA
6 mong1… mong1349 subclassific… (kita… NA NA NA NA
7 kolp1… kolp1236 level langu… level-… NA NA NA
8 kolp1… kolp1236 category Spoke… catego… NA NA NA
9 kolp1… kolp1236 subclassific… (koll… NA NA NA NA
10 kolp1… kolp1236 aes 3 aes-sh… Kol (10… hh:he… NA
# … with 131,038 more rows
If dialect properties were merged in (Line 217 in 3c65f95), csvwr could handle our data out-of-the-box.
Would you consider switching to such a dialect parsing behaviour as an option for csvwr?
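To make the suggestion concrete, here is a minimal sketch of merging a partial dialect over defaults with `utils::modifyList`. It is not csvwr's implementation: `default_dialect` and `merge_dialect` are hypothetical names, and the default values are a subset taken from my reading of the CSVW metadata spec, so they should be checked against it.

```r
# Assumed subset of the csvw default dialect (verify against the spec).
default_dialect <- list(
  encoding = "utf-8",
  delimiter = ",",
  quoteChar = "\"",
  doubleQuote = TRUE,
  commentPrefix = "#",
  header = TRUE,
  headerRowCount = 1L,
  skipRows = 0L
)

# Hypothetical helper: properties given in the metadata override the
# defaults; everything else falls back to the default value.
merge_dialect <- function(overrides, defaults = default_dialect) {
  # keep.null = TRUE keeps an explicit null (e.g. commentPrefix) as an
  # entry instead of silently dropping it from the list
  utils::modifyList(defaults, overrides, keep.null = TRUE)
}

dialect <- merge_dialect(list(commentPrefix = NULL))
```

With this approach a metadata file that only sets `commentPrefix` would still end up with a `headerRowCount` and the other defaults.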
Templates with {#var} are rewritten as #{var}, this doesn't comply with the spec for fragment expansion if the var is undefined.
Templates with an absolute URL e.g. http://example.org/countries.csv{#countryCode} have {+url} prepended erroneously (see e.g. "manifest-validation#test030").
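For reference, the expected behaviour for fragment expansion can be sketched as below: a defined variable expands to "#" plus its value, while an undefined one expands to nothing at all. `expand_fragment` is a hypothetical helper, not part of csvwr, and it deliberately skips the percent-encoding and multi-variable forms that full URI Template expansion would need.

```r
# Hypothetical helper: expand {#var} expressions in a URI template.
expand_fragment <- function(template, vars = list()) {
  matches <- gregexpr("\\{#([A-Za-z0-9_]+)\\}", template)
  tokens <- regmatches(template, matches)[[1]]
  for (token in tokens) {
    name <- sub("^\\{#(.*)\\}$", "\\1", token)
    value <- vars[[name]]
    # An undefined variable contributes nothing, not even the "#"
    replacement <- if (is.null(value)) "" else paste0("#", value)
    template <- sub(token, replacement, template, fixed = TRUE)
  }
  template
}

template <- "http://example.org/countries.csv{#countryCode}"
defined <- expand_fragment(template, list(countryCode = "AD"))
undefined <- expand_fragment(template)
```

Here `defined` ends in "#AD", while `undefined` is just the bare URL with no trailing "#".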
The spec permits a few top level objects: table group, table, schema, dialect or transformation.
It would be good if create_metadata and read_csvw could support tables and not just table groups, perhaps switching depending on whether a property named "url" vs "tables" is present.
It might also make sense to handle the other types too.
This puts more responsibility on the user to construct lists that comply with the spec (rather than having csvwr take responsibility for this).
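The "url" vs "tables" switch might look something like the following. This is only a sketch of the idea: `metadata_type` and `as_table_group` are hypothetical names, and the other top-level types (schema, dialect, transformation) are not handled.

```r
# Hypothetical helper: guess the top-level type of a parsed metadata list.
metadata_type <- function(metadata) {
  # [[ ]] matches names exactly; $ would partially match e.g. "urlTemplate"
  if (!is.null(metadata[["tables"]])) {
    "table group"
  } else if (!is.null(metadata[["url"]])) {
    "table"
  } else {
    "unknown"
  }
}

# Hypothetical helper: wrap a lone table so downstream code can always
# work with a table group.
as_table_group <- function(metadata) {
  if (metadata_type(metadata) == "table") list(tables = list(metadata)) else metadata
}

single_table <- list(url = "data.csv", tableSchema = list(columns = list()))
group <- as_table_group(single_table)
```

Normalising to a table group up front would let the rest of the reading code stay unchanged, at the cost of losing track of which form the user originally supplied.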