csvwr's People

Contributors

hadley, robsteranium

Forkers

hadley

csvwr's Issues

Support for csvw spec 'separator'

CSVW can define a 'separator' in column definitions, e.g.

    {
        "datatype": "string",
        "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
        "required": false,
        "separator": ";",
        "name": "Source"
    }

...which means that the field should be parsed from "a;b" to something like c("a", "b"). It would be nice to support this.
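As a rough illustration of the requested behaviour (not how csvwr implements or would implement it), a base R sketch could split each cell on the declared separator. The helper name `split_by_separator` is hypothetical:

```r
# Hypothetical helper: split each cell value on a column's declared separator.
# fixed = TRUE treats the separator as a literal string rather than a regex,
# which matters for separators such as "|" or ".".
split_by_separator <- function(values, separator) {
  strsplit(values, separator, fixed = TRUE)
}

# One list element per cell, each holding that cell's individual values.
sources <- split_by_separator(c("a;b", "c"), ";")
print(sources)
```

The result is a list column candidate: one character vector per row, which is the shape `c("a", "b")` suggested above.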

Use-case working with temp files

What's the recommended way to work with files from the internet?

I'm providing my users with a notebook so they can explore a remote dataset. If I download my JSON and CSV to temp files, the URI of the table breaks, so I have to download them to named local files. That might be best for large files anyway, but when I just want to dip a toe in the water with a dataset, it would be cool to slurp a few lines of CSV into a string and fire those into a dataframe...

Same issue if the uri of the table in the json file is absolute.

How do you imagine best practice for this sort of workflow? Thanks!
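One possible stopgap for the toe-in-the-water case, sketched in base R only: it bypasses csvwr entirely, so no CSVW metadata (datatypes, separators, and so on) is applied. The variable names and the sample text are made up for illustration. For a remote file, the string could be built with something like `paste(readLines(url, n = 5), collapse = "\n")`.

```r
# Illustrative CSV text standing in for the first few lines of a remote file.
csv_text <- "id,name\n1,alpha\n2,beta\n3,gamma"

# read.csv accepts the data as a string via `text`; nrows caps the preview size.
preview <- read.csv(text = csv_text, nrows = 2, stringsAsFactors = FALSE)
print(preview)
```

This only gives a quick look at the raw columns. Reading the data according to its table schema would still need the metadata and the table URL to resolve.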

simpleError: `col_names` must be TRUE, FALSE or a character vector

Following the basic usage instructions gets me:

require(csvwr)
#> Loading required package: csvwr

d <- iris[sample(150,10),]
s <- derive_table_schema(d)
m <- create_metadata(tables=s)
j <- jsonlite::toJSON(m)
cat(j, file="metadata.json")
write.csv(d, 'data.csv')

read_csvw_dataframe("data.csv", "metadata.json")
#> $error
#> <simpleError: `col_names` must be TRUE, FALSE or a character vector>
#> 
#> $filename
#> [1] "data.csv"
#> 
#> $dialect
#> list()
#> 
#> $group_schema
#> list()

More flexible dialect parsing

Hi, I'm one of the creators of a data standard (CLDF) which is based on csvw. Unfortunately we decided to allow a dialect property specifying only the properties which deviate from the default dialect. E.g. here.
This seems not to work with current csvwr:

> d <- read_csvw("projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json")
> d$tables[[1]]$dataframe
$error
<simpleError: Expected single integer value>

$filename
[1] "projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json"

$dialect
$dialect$commentPrefix
NULL


$group_schema
list()

When I remove the dialect property from the metadata (falling back to the default) all is good:

> d <- read_csvw("projects/glottolog/glottolog-cldf/cldf/cldf-metadata.json")
> d$tables[[1]]$dataframe
# A tibble: 131,048 × 8
   ID     Language_ID Parameter_ID  Value  Code_ID Comment  Source codeReference
   <chr>  <chr>       <chr>         <chr>  <chr>   <chr>    <chr>  <chr>        
 1 more1… more1255    level         family level-… NA       NA     NA           
 2 more1… more1255    category      Family catego… NA       NA     NA           
 3 more1… more1255    subclassific… ((((b… NA      **hh:hv… hh:hv… NA           
 4 mong1… mong1349    level         family level-… NA       NA     NA           
 5 mong1… mong1349    category      Family catego… NA       NA     NA           
 6 mong1… mong1349    subclassific… (kita… NA      NA       NA     NA           
 7 kolp1… kolp1236    level         langu… level-… NA       NA     NA           
 8 kolp1… kolp1236    category      Spoke… catego… NA       NA     NA           
 9 kolp1… kolp1236    subclassific… (koll… NA      NA       NA     NA           
10 kolp1… kolp1236    aes           3      aes-sh… Kol (10… hh:he… NA           
# … with 131,038 more rows

If the dialect properties were merged into the defaults at csvwr/R/csvwr.R, line 217 in 3c65f95:

dialect <- dialect %||% default_dialect

rather than choosing either a fully specified custom dialect or the default, csvwr could handle our data out of the box.

Would you consider adopting this dialect-merging behaviour in csvwr?
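A minimal sketch of what such merging could look like, assuming both dialects are plain named lists. The `default_dialect` below is an illustrative subset of the CSVW defaults, not csvwr's internal object, and `custom_dialect` mimics the partial dialect from the error output above:

```r
# Illustrative subset of the CSVW default dialect (not csvwr's internal object).
default_dialect <- list(
  delimiter = ",",
  quoteChar = "\"",
  header = TRUE,
  commentPrefix = "#",
  skipRows = 0
)

# A partial dialect as it might arrive from metadata: only deviations are listed.
custom_dialect <- list(commentPrefix = NULL)

# Overlay the custom properties on the defaults. keep.null = TRUE keeps an
# explicit NULL override as NULL instead of deleting the entry from the list.
merged <- utils::modifyList(default_dialect, custom_dialect, keep.null = TRUE)
str(merged)
```

With this approach every property the metadata omits falls back to its default, while an explicit null still overrides the default comment prefix.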

Fix URI template rendering

Templates with {#var} are rewritten as #{var}. This doesn't comply with the spec for fragment expansion if the var is undefined: in that case the "#" should be omitted along with the value.

Templates with an absolute URL e.g. http://example.org/countries.csv{#countryCode} have {+url} prepended erroneously (see e.g. "manifest-validation#test030").
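To make the expected fragment behaviour concrete, here is a simplified base R sketch. The function `expand_fragment` is hypothetical, handles only a single `{#var}` expression with one variable, and skips percent-encoding, so it is an illustration of the rule rather than a full URI template implementation:

```r
# Hypothetical helper: expand one {#var} expression in a URI template.
# A defined variable yields "#value"; an undefined one yields an empty string.
expand_fragment <- function(template, vars = list()) {
  pattern <- "\\{#([A-Za-z0-9_]+)\\}"
  match <- regmatches(template, regexec(pattern, template))[[1]]
  if (length(match) == 0) {
    return(template)  # no fragment expression present
  }
  value <- vars[[match[2]]]  # NULL when the variable is undefined
  replacement <- if (is.null(value)) "" else paste0("#", value)
  sub(match[1], replacement, template, fixed = TRUE)
}

template <- "http://example.org/countries.csv{#countryCode}"
defined <- expand_fragment(template, list(countryCode = "AD"))
undefined <- expand_fragment(template, list())
print(c(defined, undefined))
```

Note that the absolute URL is left untouched in both cases, with nothing prepended to it.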

Handle all top-level objects in create_metadata?

The spec permits a few top level objects: table group, table, schema, dialect or transformation.

It would be good if create_metadata and read_csvw could support tables and not just table groups, perhaps switching depending on whether a property named "url" vs "tables" is present.

It might also make sense to handle the other types too.

This puts more responsibility on the user to construct lists that comply with the spec (rather than having csvwr take responsibility for this).
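The "url" vs "tables" switch suggested above might be sketched like this. The function name `metadata_kind` and its return labels are hypothetical, and the sketch only distinguishes tables from table groups:

```r
# Hypothetical helper: guess the kind of top-level metadata object from its keys.
# names() is checked explicitly because `$` on a list does partial matching.
metadata_kind <- function(metadata) {
  keys <- names(metadata)
  if ("tables" %in% keys) {
    "table group"
  } else if ("url" %in% keys) {
    "table"
  } else {
    "other"
  }
}

group <- list(tables = list(list(url = "data.csv")))
table <- list(url = "data.csv", tableSchema = list())
print(c(metadata_kind(group), metadata_kind(table)))
```

Schema, dialect and transformation objects would fall into the "other" branch here and need their own checks.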
