robsteranium / csvwr

Read and write CSV on the Web (csvw) tables and metadata in R

Home Page: https://robsteranium.github.io/csvwr

R 1.20% HTML 97.62% Ruby 0.39% Haml 0.79% Dockerfile 0.01%

csvw

csvwr's Issues

Support for csvw spec 'separator'

csvw can define 'separators' in field definitions e.g.

    {
        "datatype": "string",
        "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
        "required": false,
        "separator": ";",
        "name": "Source"
    }

...which means that the field should be parsed from "a;b" to something like c("a", "b"). It would be nice to support this.
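A minimal sketch of the requested behaviour in base R, assuming the column has already been read as a character vector. `split_separated` is a hypothetical helper for illustration, not part of csvwr:

```r
# Hypothetical helper (not part of csvwr): split each cell of a character
# column on the literal `separator` declared in the column's metadata.
split_separated <- function(values, separator) {
  # fixed = TRUE treats the separator as a literal string, not a regex.
  # Missing cells (NA) come back as NA rather than raising an error.
  strsplit(values, separator, fixed = TRUE)
}

source_column <- c("a;b", "c", NA)
parsed <- split_separated(source_column, ";")
parsed[[1]]  # c("a", "b")
```

The result is a list column, since each cell can hold a different number of values.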

How to read from multiple CSV files?

Is there a way to read multiple csv files with one metadata json file?

Use-case working with temp files

What's the recommended way to work with files from the internet?

I'm providing my users with a notebook so they can explore a remote dataset. If I download my JSON and CSV to temp files, the URI of the table breaks, so I have to download them to named local files. That may be the best approach for large files anyway, but when I just want to dip a toe in the water with a dataset, it would be nice to slurp a few lines of CSV into a string and load those into a data frame.

The same issue arises if the URI of the table in the JSON file is absolute.

How do you imagine best practice for this sort of workflow? Thanks!

simpleError: `col_names` must be TRUE, FALSE or a character vector

Following the basic usage instructions gets me:

require(csvwr)
#> Loading required package: csvwr

d <- iris[sample(150,10),]
s <- derive_table_schema(d)
m <- create_metadata(tables=s)
j <- jsonlite::toJSON(m)
cat(j, file="metadata.json")
write.csv(d, 'data.csv')

read_csvw_dataframe("data.csv", "metadata.json")
#> $error
#> <simpleError: `col_names` must be TRUE, FALSE or a character vector>
#> 
#> $filename
#> [1] "data.csv"
#> 
#> $dialect
#> list()
#> 
#> $group_schema
#> list()

More flexible dialect parsing

Hi, I'm one of the creators of a data standard (CLDF) which is based on csvw. Unfortunately we decided to allow a dialect property specifying only the properties which deviate from the default dialect. E.g. here.
This seems not to work with current csvwr:

> d <- read_csvw("projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json")
> d$tables[[1]]$dataframe
$error
<simpleError: Expected single integer value>

$filename
[1] "projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json"

$dialect
$dialect$commentPrefix
NULL


$group_schema
list()

When I remove the dialect property from the metadata (falling back to the default) all is good:

> d <- read_csvw("projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json")
> d$tables[[1]]$dataframe
# A tibble: 131,048 × 8
   ID     Language_ID Parameter_ID  Value  Code_ID Comment  Source codeReference
   <chr>  <chr>       <chr>         <chr>  <chr>   <chr>    <chr>  <chr>        
 1 more1… more1255    level         family level-… NA       NA     NA           
 2 more1… more1255    category      Family catego… NA       NA     NA           
 3 more1… more1255    subclassific… ((((b… NA      **hh:hv… hh:hv… NA           
 4 mong1… mong1349    level         family level-… NA       NA     NA           
 5 mong1… mong1349    category      Family catego… NA       NA     NA           
 6 mong1… mong1349    subclassific… (kita… NA      NA       NA     NA           
 7 kolp1… kolp1236    level         langu… level-… NA       NA     NA           
 8 kolp1… kolp1236    category      Spoke… catego… NA       NA     NA           
 9 kolp1… kolp1236    subclassific… (koll… NA      NA       NA     NA           
10 kolp1… kolp1236    aes           3      aes-sh… Kol (10… hh:he… NA           
# … with 131,038 more rows

If dialect properties were merged with the defaults at

csvwr/R/csvwr.R

Line 217 in 3c65f95

dialect <- dialect %||% default_dialect

rather than just choosing a fully specified custom dialect or the default, csvwr could handle our data out-of-the-box.

Would you consider switching csvwr to this dialect-parsing behaviour?
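The merge being proposed can be sketched with base R's `utils::modifyList`. The `default_dialect` list below is an abbreviated, illustrative subset of the CSVW defaults, and `merge_dialect` is a hypothetical name; neither is taken from csvwr's source:

```r
# Illustrative subset of CSVW dialect defaults; csvwr's own
# `default_dialect` object may differ in names and contents.
default_dialect <- list(
  delimiter = ",",
  header = TRUE,
  headerRowCount = 1,
  skipRows = 0
)

# A partial dialect, as it might appear in metadata that lists only the
# properties deviating from the default.
partial_dialect <- list(delimiter = "\t")

# Hypothetical helper: overlay the supplied properties on the defaults.
merge_dialect <- function(dialect, defaults = default_dialect) {
  if (is.null(dialect)) return(defaults)
  utils::modifyList(defaults, dialect)
}

merged <- merge_dialect(partial_dialect)
merged$delimiter  # "\t"
merged$header     # TRUE, inherited from the defaults
```

One design point to be aware of: by default `modifyList` removes any entry whose override is `NULL`, so a property given as `null` in the metadata would be dropped from the result unless `keep.null = TRUE` is passed.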

It might also make sense to handle the other types too.

This puts more responsibility on the user to construct lists that comply with the spec (rather than having csvwr take responsibility for this).

Support for factors

Fix URI template rendering

Handle all top-level objects in create_metadata?
