nschorgh / cavexml

Experimental implementation of a CaveXML standard to facilitate exchange of data about caves through the definition and implementation of a data interchange format.

License: Other

CSS 1.64% Python 86.28% XSLT 4.44% XQuery 7.64%
speleology dataset interchange-format

cavexml's Introduction

README

CaveXML is a data interchange format for the purpose of facilitating scientific research on caves. This repository provides: CaveXML element and data type definitions, an example database, and tools to work with CaveXML-formatted data.

The most important files:
cavexml.md (explains the CaveXML standard)
cavexml.xsd (XML Schema Definition of CaveXML)
allcaves-database.xml (Master version of the database in native format)
Derivatives/allcaves-database.csv (csv version of the database generated from the XML version)
Utilities/cavexml.py (Python functions for CaveXML)

Auxiliary files:
cavexml-db-table.css (minimalist style sheet so the XML database can be viewed in a web browser)
Derivatives/allcaves-database.md (Full records in Markdown format, generated from the XML version)
Derivatives/allcaves-database.rdf (XML/RDF version of database for KarstLink)
Derivatives/list-of-ice-caves.md (list of caves with permanent ice)
Derivatives/list-of-lava-tubes.md (list of volcanic caves)
Derivatives/list-of-longest-lava-tubes.md (list of lava tubes longer than 1 km)
Derivatives/metadata.rdf (a file used by the RDF ontology)
Utilities/cavexml2csv.py (converts database to comma-separated-values using Python)
Utilities/cavexml2html.py (outputs full records in HTML format)
Utilities/cavexml2kml.py (converts coordinates into KML format)
Utilities/cavexml2md.py (creates filtered list of entries in Markdown format)
Utilities/cavexml2md-full.py (creates allcaves-database.md)
Utilities/CaveXML2rdf.xquery (converts database to RDF/XML using XQuery)
Utilities/CaveXML2rdf.py (converts database to RDF/XML using Python)
Utilities/cavexml-warnings.py (issues informative warnings)
Utilities/cavesystemfinder.py (connects cave branches with cave systems)
Utilities/filter.xquery (creates simple filtered list of entries using XQuery)
Utilities/reorder.xslt (sorts elements within each record)
Utilities/stats.py (extracts statistical information about a database)

This is a pilot project to explore the capabilities of a CaveXML implementation end-to-end. The database itself is for demonstration purposes and mainly contains ice caves and lava tubes.

Notes

cavexml.md describes the meaning of the XML elements used to organize the data, and the restrictions on their entries.

For non-programmers:
The file Derivatives/allcaves-database.csv can be opened as a spreadsheet and contains a flattened version of the database. The Derivatives/ directory also contains other derived data products, such as allcaves-database.md, where the full records can be viewed directly on GitHub. A search interface, hosted on Google Colaboratory, is available at https://tinyurl.com/cavexmlsearch. It requires a Google account, and a warning message will appear at the beginning of each session.

For programmers:
The Python programs in the Utilities/ directory serve as examples of how a CaveXML database can be loaded and analyzed in Python. Parsing functions for quasi-numerical entries are found in cavexml.py, which also contains many other functions useful for working with CaveXML data. The same directory also includes a few XQuery scripts as an alternative to Python.
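As a minimal sketch of loading a CaveXML file with only Python's standard library, the function below counts how many records use each element. It assumes, hypothetically, that every direct child of the root element is one cave record and that the file declares no XML namespace (with a namespace, ElementTree reports tags as `{uri}name`). The root and record element names in the sample (`caves`, `cave`) are placeholders; for quasi-numerical values, the parsing functions in cavexml.py remain the reference.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def element_usage(source):
    """Return (number of records, Counter of element tags).

    Assumption (hypothetical): each direct child of the root element is one
    cave record. Each tag is counted at most once per record, so the counter
    shows in how many records an element occurs.
    """
    root = ET.parse(source).getroot()
    counts = Counter()
    for record in root:
        # A set, so repeated elements such as <reference> count once per record.
        counts.update({child.tag for child in record})
    return len(root), counts


# Placeholder data: <caves> and <cave> are hypothetical element names.
SAMPLE = """<caves>
  <cave><country-code>US</country-code><reference>A</reference><reference>B</reference></cave>
  <cave><country-code>IS</country-code><branch-name>X</branch-name></cave>
</caves>"""

n_records, usage = element_usage(io.StringIO(SAMPLE))
```

With the real database, a file path can be passed instead, e.g. `element_usage("allcaves-database.xml")`.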

For data creators:
The XML schema definition cavexml.xsd incorporates all CaveXML requirements. The following command validates a database against the schema:

xmllint --noout --schema cavexml.xsd allcaves-database.xml

Alternatively, various online tools can be used to validate an XML database against an XSD document. If you create a public CaveXML-formatted database, I would be happy to link to it and include it in the search path. Conversely, programmers can write tools that search multiple databases.

Acknowledgments

Thanks to Jean-Marc Vanel for help with XML and XQuery - March 2021

cavexml's People

Contributors

nschorgh

cavexml's Issues

record identifiers

For unique RDF record identifiers, generate hash codes based on the first five elements (country-code to other-cave-name).

To do that, either use the existing MD5 hash generator in cavexml.py to generate an auxiliary XML file before applying CaveXML2rdf.xquery, or re-implement hash generation within the XQuery script.
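A minimal sketch of such a hash-based identifier in plain Python, using `hashlib` rather than the generator in cavexml.py (whose exact input format is not reproduced here, so the digests will not necessarily match it). It assumes the identifying fields are the first five child elements of a record, in document order; `field-2` to `field-4` and `cave` in the sample are placeholder names, not real CaveXML elements.

```python
import hashlib
import xml.etree.ElementTree as ET


def record_hash(record, n_fields=5):
    """MD5 hex digest of the first n_fields child elements of a record.

    Assumption: the identifying fields (country-code ... other-cave-name)
    are the first five children, in document order.
    """
    parts = []
    for child in list(record)[:n_fields]:
        text = (child.text or "").strip()
        # Include the tag so that values cannot silently shift between fields.
        parts.append(f"{child.tag}={text}")
    # U+001F (unit separator) is unlikely to appear in cave names.
    key = "\x1f".join(parts)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# field-2 ... field-4 are placeholder names, not real CaveXML elements.
rec = ET.fromstring(
    "<cave><country-code>US</country-code><field-2>a</field-2>"
    "<field-3>b</field-3><field-4>c</field-4>"
    "<other-cave-name>d</other-cave-name><reference>r</reference></cave>"
)
rid = record_hash(rec)
```

If any of the five elements can be omitted from a record, selecting them by tag name rather than by position would keep a later element from sliding into the key.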

Add void:Dataset description in RDF/XML

For writing an RDF void:Dataset description, one can take inspiration from the current list of void:Dataset entries in the KarstLink database:
https://data.grottocenter.org/history?uri=http%3A%2F%2Frdfs.org%2Fns%2Fvoid%23Dataset

I propose that you write the void:Dataset description in RDF/XML, as a file dataset.rdf in the Git repository.
Here is an example:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
>
  <foaf:Document rdf:about="http://data.grottocenter.org/ldp/1612772267045-338797407009437">
    <void:uriRegexPattern rdf:resource="https://data.grottocenter.org/ldp/thai/.*"/>
    <rdfs:label xml:lang="fr">Thai</rdfs:label>
    <void:exampleResource rdf:resource="https://data.grottocenter.org/ldp/thai/3428"/>
    <foaf:primaryTopic rdf:resource="http://dbpedia.org/resource/Speleology"/>
    <void:sparqlEndpoint rdf:resource="https://data.grottocenter.org/sparql"/>
    <void:uriSpace xml:lang="fr">https://data.grottocenter.org/ldp/thai/</void:uriSpace>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <void:dataDump rdf:resource="https://ontology.uis-speleo.org/data/thai.csv.ttl.zip"/>
  </foaf:Document>
</rdf:RDF>
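If the description is to be generated rather than written by hand, a sketch with Python's `xml.etree.ElementTree` can mirror the structure of the example above (a `foaf:Document` typed as `void:Dataset`). All URIs and values passed in below are hypothetical placeholders on example.org, not the actual addresses of the CaveXML dataset.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "void": "http://rdfs.org/ns/void#",
}
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)  # keep the familiar prefixes in the output


def q(prefix, local):
    """Qualified name in ElementTree's {uri}local notation."""
    return f"{{{NS[prefix]}}}{local}"


def void_description(about, label, uri_space, example, dump):
    """Build an rdf:RDF tree describing one void:Dataset."""
    root = ET.Element(q("rdf", "RDF"))
    doc = ET.SubElement(root, q("foaf", "Document"), {q("rdf", "about"): about})
    ET.SubElement(doc, q("rdf", "type"), {q("rdf", "resource"): NS["void"] + "Dataset"})
    ET.SubElement(doc, q("rdfs", "label")).text = label
    ET.SubElement(doc, q("void", "uriSpace")).text = uri_space
    ET.SubElement(doc, q("void", "exampleResource"), {q("rdf", "resource"): example})
    ET.SubElement(doc, q("void", "dataDump"), {q("rdf", "resource"): dump})
    ET.SubElement(doc, q("foaf", "primaryTopic"),
                  {q("rdf", "resource"): "http://dbpedia.org/resource/Speleology"})
    return ET.ElementTree(root)


# Placeholder values only.
tree = void_description(
    about="https://example.org/cavexml/dataset",
    label="CaveXML demonstration database",
    uri_space="https://example.org/cavexml/",
    example="https://example.org/cavexml/1",
    dump="https://example.org/cavexml/allcaves-database.rdf",
)
```

The tree could then be saved with `tree.write("dataset.rdf", encoding="utf-8", xml_declaration=True)`.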

It would also be good to state the coverage of the cave dataset. My understanding is that the caves have been visited or otherwise studied by you.

longitude range

On Earth, geographic longitude conventionally ranges from -180 to +180 degrees, and CaveXML restricts it to that range. On the Moon and Mars, on the other hand, it ranges from 0 to 360 degrees. XML Schema 1.0 (XSD 1.0) doesn't allow restrictions that are conditional on other fields, so this would require changing over to XSD 1.1. A quick fix would be to extend the allowed longitude range to [-180,+360].
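Because the conditional range cannot be expressed in XSD 1.0, one option is to check it outside the schema, for example in a Python script in the spirit of cavexml-warnings.py. This is a minimal sketch that assumes, hypothetically, that the body a record refers to is available as a string such as "Earth", "Moon", or "Mars"; how CaveXML would record this is not specified here. The quick-fix range is included for comparison.

```python
# Longitude conventions from the issue text: Earth uses -180..+180 degrees,
# the Moon and Mars use 0..360 degrees.
LONGITUDE_RANGE = {
    "Earth": (-180.0, 180.0),
    "Moon": (0.0, 360.0),
    "Mars": (0.0, 360.0),
}

# The proposed quick fix: one wider range that admits both conventions.
QUICK_FIX_RANGE = (-180.0, 360.0)


def longitude_ok(lon, body="Earth"):
    """Check a longitude against the convention of the given body.

    `body` is a hypothetical parameter; unknown bodies raise KeyError.
    """
    lo, hi = LONGITUDE_RANGE[body]
    return lo <= lon <= hi


def longitude_ok_quick_fix(lon):
    """Check a longitude against the single widened range [-180, +360]."""
    lo, hi = QUICK_FIX_RANGE
    return lo <= lon <= hi
```

The trade-off of the quick fix shows up immediately: a longitude of 270 passes the widened range even for a cave on Earth, whereas `longitude_ok(270.0, "Earth")` returns False.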

RDF: references should be URIs, not literals

Currently, for this XML source:

      <reference>https://rpif.asu.edu/LTdatabase/result.php?RECNO=1134</reference>
      <reference>Simons (1998) doi:10.5038/1827-806X.27.1.4</reference>
      <reference>Middleton (1999) http://www.vulcanospeleology.org/1998.pdf</reference>

we get this RDF:

    <dct:references>https://rpif.asu.edu/LTdatabase/result.php?RECNO=1134</dct:references>
    <dct:references>Simons (1998) doi:10.5038/1827-806X.27.1.4</dct:references>
    <dct:references>Middleton (1999) http://www.vulcanospeleology.org/1998.pdf</dct:references>

The first line, beginning with https: or http:, should become:

      <dct:references rdf:resource="https://rpif.asu.edu/LTdatabase/result.php?RECNO=1134" />

For the other lines, which do not begin with https: or http:, this is more complex:

  <karstlink:UndergroundCavity rdf:about="...">
    <!-- ... etc unchanged -->
    <dct:references rdf:resource="doi:10.5038/1827-806X.27.1.4" />
  </karstlink:UndergroundCavity>
  <foaf:Document rdf:about="doi:10.5038/1827-806X.27.1.4">
    <rdfs:label>Simons (1998)</rdfs:label>
  </foaf:Document>
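The same splitting can be sketched in Python, for example for CaveXML2rdf.py. This is a minimal sketch, not that script's actual code. It assumes that the identifier is the first token starting with http://, https://, or doi:, that the remaining text is the label, and that the dct prefix is bound to http://purl.org/dc/terms/ (the namespace declarations are not shown in this issue). It produces a dct:references element linking to the identifier with rdf:resource (the RDF/XML attribute for pointing a property at a resource) and, when a label remains, a foaf:Document node carrying that label as rdfs:label. Trailing punctuation after an identifier would end up inside it.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCT = "http://purl.org/dc/terms/"  # assumed binding of the dct: prefix
FOAF = "http://xmlns.com/foaf/0.1/"

# First token that looks like an identifier: an http(s) URL or a doi: URI.
ID_PATTERN = re.compile(r"(?:https?://|doi:)\S+")


def split_reference(text):
    """Return (uri, label); either may be None."""
    m = ID_PATTERN.search(text)
    if m is None:
        return None, text.strip() or None
    # Whatever surrounds the identifier becomes the label, whitespace-normalized.
    label = " ".join((text[:m.start()] + " " + text[m.end():]).split())
    return m.group(0), label or None


def reference_to_rdf(text):
    """Return (dct:references element, foaf:Document element or None)."""
    uri, label = split_reference(text)
    ref = ET.Element(f"{{{DCT}}}references")
    if uri is None:
        ref.text = label  # no identifier found: keep the literal
        return ref, None
    ref.set(f"{{{RDF}}}resource", uri)
    if label is None:
        return ref, None
    doc = ET.Element(f"{{{FOAF}}}Document", {f"{{{RDF}}}about": uri})
    ET.SubElement(doc, f"{{{RDFS}}}label").text = label
    return ref, doc
```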

XQuery change

For the first case (references beginning with http: or https:):

declare function local:processReferences($tags as element()*) as element()* {
  for $tag in $tags
  return local:processReference($tag)
};

(: A reference beginning with http:// or https:// becomes a link to that
   resource (rdf:resource); any other non-empty reference stays a literal. :)
declare function local:processReference($tag as element()) as element()* {
  let $ref := fn:normalize-space(fn:string($tag))
  return
    if ($ref = "") then ()
    else if (fn:starts-with($ref, "http://") or fn:starts-with($ref, "https://")) then
      <dct:references rdf:resource="{$ref}"/>
    else
      element dct:references { $ref }
};
(: ............. :)
    { local:processTag( $rec/branch-name, "karstlink:relatedToUndergroundCavity" ) }
    { local:processReferences( $rec/reference ) }
