
Use of Data Packages about tnc HOT 8 CLOSED

tdwg commented on July 17, 2024
Use of Data Packages

from tnc.

Comments (8)

mdoering commented on July 17, 2024

I am very much interested in an application of the standard that allows us to exchange data in CSV files, as we can with DwC. Data Packages appear to be the strongest candidate for an existing standard in that area to build on. But if TCS NG becomes an RDF-only standard, I will frankly be rather disappointed.


baskaufs commented on July 17, 2024

@mdoering I just gave Data Packages a look and it's really interesting. Here are a few relevant points:

  • Section 4 of the Standards Documentation Specification is very careful to say that the machine-readable metadata for a standard MAY be expressed as RDF, but that other methods can be used as long as the same relationships are expressed in a machine-processable way. So RDF is not required.

  • I believe strongly that we should try to keep our standards definitions simple enough that they can be expressed as CSV tables. At this point, all of the existing TDWG vocabulary standards ARE simple enough to be expressed as CSV tables. I don't think that people have paid much attention to the rs.tdwg.org repo, but it contains all of the information required to describe TDWG vocabularies from CSV data and to turn those CSV data into machine-readable RDF. In each of the folders, there's one core CSV file (like this one for the dwc: terms) and other files that describe how to map the table columns to well-known properties (like this one). I just took a look at the Table Schema information for Data Package, and all of the information in the "other files" I just mentioned could be expressed as a Table Schema JSON file. So the Data Package system could be used to create machine-readable CSV files that are directly translatable to RDF and that would contain equivalent information.

  • Guid-O-Matic is the software that I wrote to turn CSV files into RDF serializations. I have been thinking of making a version 3.0 in Python, so maybe the Data Package specification would be the way to describe the CSV files. I see that they have a Python library, but I haven't yet investigated everything it can do. One thing I don't know is how widely adopted Data Package is. Do you know?

  • I said earlier that all existing TDWG standards can be easily expressed in CSV tables. However, some of the models we have been talking about so far in TNC are getting complicated to the point where that might be difficult. We should keep that in mind as we try to balance our desire to express complex ideas in the standard against keeping it simple enough for CSV.
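As a rough illustration of the Table Schema idea in the second point above, here is a minimal Python sketch using only the standard library. It assumes a Table Schema-style descriptor in which each field carries a custom `propertyUri` key; that key, the column names, and the `example.org` URI are made up for illustration and are not taken from the rs.tdwg.org files or from the Data Package specification (`fields`, `name`, `type`, and `primaryKey` are standard Table Schema properties).

```python
import json

# Hypothetical Table Schema-style descriptor for one core CSV file.
# "propertyUri" is an illustrative custom key, not a standard property.
schema = {
    "fields": [
        {"name": "term_localName", "type": "string",
         "propertyUri": "http://example.org/terms/localName"},
        {"name": "label", "type": "string",
         "propertyUri": "http://www.w3.org/2000/01/rdf-schema#label"},
        {"name": "definition", "type": "string",
         "propertyUri": "http://www.w3.org/2004/02/skos/core#definition"},
    ],
    "primaryKey": "term_localName",
}


def column_to_property(table_schema):
    """Return a dict mapping each column name to its property URI."""
    return {
        field["name"]: field["propertyUri"]
        for field in table_schema["fields"]
        if "propertyUri" in field
    }


mapping = column_to_property(schema)
print(json.dumps(mapping, indent=2))
```

A single JSON descriptor like this could, in principle, stand in for the separate column-mapping files, since the mapping travels with the description of the table itself.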


nielsklazenga commented on July 17, 2024

@mdoering, that is not at all the intention; we intend to make a specification that is broadly applicable, and not just because that's what the Vocabulary Maintenance Specification requires from TDWG standards. Your use case is definitely a very important one and is very much on our radar. If a lot of the examples were in Turtle, that is just because it is easy to read and write and useful to quickly get an idea across. Speaking for myself, when I think about data models I think of tables in a database, and if I were to produce RDF with real data, it would be something else first (database tables) and there would be something else again (JSON) between the database table and the RDF.

The models we discussed so far may look complicated, but we really have been talking about only two classes/tables, so they aren't really (and I think everything you get into a database structure you can get into CSV). I am pretty sure I could shoehorn everything we discussed so far into the Darwin Core Taxon class if I had a good crack at it. I might just do that (not right now). The use I see for Label objects is in the interface between identifications and taxonomic names.

If I recall correctly, you were the one who suggested we should look at a domain model. We will definitely have a much closer look at serialization. I created this issue because I thought it would be good to have on the radar for the time when we really start looking at that (and because I had a look when you first mentioned it and it looked promising), but maybe it turns out to be timely for keeping our eyes on the ball. I will try to use more varied ways of showing examples. To be fair, @baskaufs had CSV examples in the document in which he was spruiking the use of SKOS-XL.


nielsklazenga commented on July 17, 2024

Sorry @baskaufs, I had a better look at your post just now and see that you had already addressed pretty much everything that I just did.

If I provide you with a set of CSV files with data from a taxonomic revision, perhaps even in the form of a Data Package, would you be interested in doing your Guid-O-Matic thing on it? I would be interested to see the result, both the CSV and the RDF.


baskaufs commented on July 17, 2024

Sure, I can give it a go. We just need to do some mapping of column headers to property URIs. The URIs can just be made-up; it doesn't matter if they are "real" or not.
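To make the header-to-URI mapping step concrete, here is a minimal Python sketch that turns CSV rows into N-Triples lines. It is not Guid-O-Matic and does not reproduce how that tool works; the base URI, column names, property URIs, and sample rows are all made up, and literal escaping covers only backslashes and double quotes.

```python
import csv
import io

# Made-up base URI and column-to-property mapping (all hypothetical).
BASE = "http://example.org/name/"
COLUMN_TO_PROPERTY = {
    "scientificName": "http://example.org/terms/scientificName",
    "rank": "http://example.org/terms/rank",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_ntriples(csv_text, id_column="id"):
    """Emit one triple per non-empty mapped cell; the subject comes from id_column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = row.get(column, "")
            if value:  # skip empty or missing cells
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


sample = "id,scientificName,rank\n1,Acacia dealbata,species\n2,Acacia,\n"
triples = csv_to_ntriples(sample)
for triple in triples:
    print(triple)
```

Because the URIs only need to be consistent, not resolvable, swapping in real property URIs later is a change to the mapping dictionary alone.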


nielsklazenga commented on July 17, 2024

Thanks @baskaufs. The time it took me to create my example for issue #30 made me realise it will take some time for me to get all the data together.


baskaufs commented on July 17, 2024

No problem. Just let me know...


nielsklazenga commented on July 17, 2024

I have added an example Data Package to the examples in this repository: /examples/datapackage.
