kit-data-manager / ro-crate-java

Java library to create and modify RO-Crates.

Home Page: https://kit-data-manager.github.io/webpage/ro-crate-java/

License: Apache License 2.0

Java 98.36% HTML 1.64%
file-based folder-structure linked-data research-object ro-crate package-format

ro-crate-java's Introduction

ro-crate-java

A Java library to create and modify RO-Crates. The aim of this implementation is to spare users from needing deep knowledge of the specification while, at the same time, preventing crates that do not fully comply with it.

Use it in your application

Build the library / documentation

Build and run tests: ./gradlew build
Build documentation: ./gradlew javadoc

On Windows, replace ./gradlew with gradlew.bat.

RO-Crate Specification Compatibility

  • ✅ Version 1.1
  • 🛠️ Version 1.2-DRAFT

Quick-start

Example for a basic crate from RO-Crate website

RoCrate roCrate = new RoCrateBuilder("name", "description", "datePublished", "licenseIdentifier").build();

Example adding a File (Data Entity) and a context pair

RoCrate roCrate = new RoCrateBuilder("name", "description", "datePublished", "licenseIdentifier")
    .addValuePairToContext("Station", "www.station.com")
    .addUrlToContext("contextUrl")
    .addDataEntity(
      new FileEntity.FileEntityBuilder()
        .setId("survey-responses-2019.csv")
        .addProperty("name", "Survey responses")
        .addProperty("contentSize", "26452")
        .addProperty("encodingFormat", "text/csv")
        .build()
    )
    .addDataEntity(...)
    ...
    .addContextualEntity(...)
    ...
    .build();

The library currently comes with three specialized DataEntities:

  1. DataSetEntity
  2. FileEntity (used in the example above)
  3. WorkflowEntity

If another type of DataEntity is required, the base class DataEntity can be used. Example:

new DataEntity.DataEntityBuilder()
    .addType("CreativeWork")
    .setId("ID")
    .addProperty("property from schema.org/Creativework", "value")
    .build();

Note that here you have to add the type of your DataEntity yourself, because the base class cannot know it.

A DataEntity and its subclasses can refer to a file located on the web. Example adding a remote file:

new FileEntity.FileEntityBuilder()
    .addContent(URI.create("https://github.com/kit-data-manager/ro-crate-java/issues/5"))
    .addProperty("description", "my new file that I added")
    .build();

Instead of a file located on the web (whose link is the ID of the data entity), a DataEntity and its subclasses can have a local file associated with them. Example adding a local file:

new FileEntity.FileEntityBuilder()
    .addContent(Paths.get("file"), "new_file.txt")
    .addProperty("description", "my new local file that I added")
    .build();

Contextual Entities

Contextual entities cannot be associated with a file (they are pure metadata).

To add a contextual entity to a crate, use the function .addContextualEntity(ContextualEntity entity). Some derived (specialized) entity types are:

  1. OrganizationEntity
  2. PersonEntity
  3. PlaceEntity

If you need another type of contextual entity, use the base class ContextualEntity.

The library provides a way to automatically create contextual entities from external providers. Currently, support for ORCID and ROR is implemented. Example:

PersonEntity person = ORCIDProvider.getPerson("https://orcid.org/*");
OrganizationEntity organization = RORProvider.getOrganization("https://ror.org/*");

Writing Crate to folder or zip file

Writing to folder:

RoCrateWriter folderRoCrateWriter = new RoCrateWriter(new FolderWriter());
folderRoCrateWriter.save(roCrate, "destination");

Writing to zip file:

RoCrateWriter roCrateZipWriter = new RoCrateWriter(new ZipWriter());
roCrateZipWriter.save(roCrate, "destination");

More writing strategies can be implemented, if required.

Reading / importing Crate from folder or zip

Reading from folder:

RoCrateReader roCrateFolderReader = new RoCrateReader(new FolderReader());
RoCrate res = roCrateFolderReader.readCrate("source");

Reading from zip file:

RoCrateReader roCrateZipReader = new RoCrateReader(new ZipReader());
RoCrate crate = roCrateZipReader.readCrate("source");

RO-Crate Website (HTML preview file)

By setting the preview to an AutomaticPreview, the library will automatically create a preview using the ro-crate-html-js tool. It has to be installed using npm install --global ro-crate-html-js in order to use it. If you want to use a custom-made preview, you can set it using the CustomPreview class. AutomaticPreview is currently not set by default.

RoCrate roCrate = new RoCrateBuilder("name", "description", "datePublished", "licenseIdentifier")
    .setPreview(new AutomaticPreview())
    .build();

RO-Crate validation (machine-readable crate profiles)

Right now, the only implemented way of validating an RO-Crate is to use a JSON Schema that the crate's metadata JSON file should match. JSON Schema is an established standard and therefore a good choice for a crate profile. Example:

Validator validator = new Validator(new JsonSchemaValidation("./schema.json"));
boolean valid = validator.validate(crate);

Adapting the specification examples

This section describes how to generate the examples from the official specification. Each example first shows the ro-crate-metadata.json and, below that, the Java code required to generate it.

{ "@context": "https://w3id.org/ro/crate/1.1/context", 
  "@graph": [

 {
    "@type": "CreativeWork",
    "@id": "ro-crate-metadata.json",
    "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
    "about": {"@id": "./"}
 },  
 {
    "@id": "./",
    "identifier": "https://doi.org/10.4225/59/59672c09f4a4b",
    "@type": "Dataset",
    "datePublished": "2017",
    "name": "Data files associated with the manuscript:Effects of facilitated family case conferencing for ...",
    "description": "Palliative care planning for nursing home residents with advanced dementia ...",
    "license": {"@id": "https://creativecommons.org/licenses/by-nc-sa/3.0/au/"}
 },
 {
  "@id": "https://creativecommons.org/licenses/by-nc-sa/3.0/au/",
  "@type": "CreativeWork",
  "description": "This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Australia License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/au/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.",
  "identifier": "https://creativecommons.org/licenses/by-nc-sa/3.0/au/",
  "name": "Attribution-NonCommercial-ShareAlike 3.0 Australia (CC BY-NC-SA 3.0 AU)"
 }
 ]
}

Here, everything is created manually. For the following examples, more convenient creation methods are used.

  RoCrate crate = new RoCrate();

    ContextualEntity license = new ContextualEntity.ContextualEntityBuilder()
        .addType("CreativeWork")
        .setId("https://creativecommons.org/licenses/by-nc-sa/3.0/au/")
        .addProperty("description", "This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Australia License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/au/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.")
        .addProperty("identifier", "https://creativecommons.org/licenses/by-nc-sa/3.0/au/")
        .addProperty("name", "Attribution-NonCommercial-ShareAlike 3.0 Australia (CC BY-NC-SA 3.0 AU)")
        .build();

    crate.setRootDataEntity(new RootDataEntity.RootDataEntityBuilder()
        .addProperty("identifier", "https://doi.org/10.4225/59/59672c09f4a4b")
        .addProperty("datePublished", "2017")
        .addProperty("name", "Data files associated with the manuscript:Effects of facilitated family case conferencing for ...")
        .addProperty("description", "Palliative care planning for nursing home residents with advanced dementia ...")
        .setLicense(license)
        .build());

    crate.setJsonDescriptor(new ContextualEntity.ContextualEntityBuilder()
        .setId("ro-crate-metadata.json")
        .addType("CreativeWork")
        .addIdProperty("about", "./")
        .addIdProperty("conformsTo", "https://w3id.org/ro/crate/1.1")
        .build()
    );
    crate.addContextualEntity(license);
{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },  
    {
      "@id": "./",
      "@type": [
        "Dataset"
      ],
      "hasPart": [
        {
          "@id": "cp7glop.ai"
        },
        {
          "@id": "lots_of_little_files/"
        }
      ]
    },
    {
      "@id": "cp7glop.ai",
      "@type": "File",
      "name": "Diagram showing trend to increase",
      "contentSize": "383766",
      "description": "Illustrator file for Glop Pot",
      "encodingFormat": "application/pdf"
    },
    {
      "@id": "lots_of_little_files/",
      "@type": "Dataset",
      "name": "Too many files",
      "description": "This directory contains many small files, that we're not going to describe in detail."
    }
  ]
}

Here we use the inner builder classes to construct the crate. This way, the Metadata File Descriptor and the Root Data Entity are added automatically. The path given to addContent() provides the actual location of these Data Entities (if they are not remote). Within the crate, the Data Entity's file is named after the entity's ID.

  RoCrate crate = new RoCrate.RoCrateBuilder()
        .addDataEntity(
            new FileEntity.FileEntityBuilder()
                .addContent(Paths.get("path to file"), "cp7glop.ai")
                .addProperty("name", "Diagram showing trend to increase")
                .addProperty("contentSize", "383766")
                .addProperty("description", "Illustrator file for Glop Pot")
                .setEncodingFormat("application/pdf")
                .build()
        )
        .addDataEntity(
            new DataSetEntity.DataSetBuilder()
                .addContent(Paths.get("path_to_files"), "lots_of_little_files/")
                .addProperty("name", "Too many files")
                .addProperty("description", "This directory contains many small files, that we're not going to describe in detail.")
                .build()
        )
        .build();
{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
        "@type": "CreativeWork",
        "@id": "ro-crate-metadata.json",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"}
  },  
  {
    "@id": "./",
    "@type": [
      "Dataset"
    ],
    "hasPart": [
      {
        "@id": "survey-responses-2019.csv"
      },
      {
        "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf"
      }
    ]
  },
  {
    "@id": "survey-responses-2019.csv",
    "@type": "File",
    "name": "Survey responses",
    "contentSize": "26452",
    "encodingFormat": "text/csv"
  },
  {
    "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf",
    "@type": "File",
    "name": "RO-Crate specification",
    "contentSize": "310691",
    "description": "RO-Crate specification",
    "encodingFormat": "application/pdf"
  }
]
}

The web resource is not given a local path; its ID (the URL) indicates the file's location.

 RoCrate crate = new RoCrate.RoCrateBuilder()
        .addDataEntity(
            new FileEntity.FileEntityBuilder()
                .addContent(Paths.get("README.md"), "survey-responses-2019.csv")
                .addProperty("name", "Survey responses")
                .addProperty("contentSize", "26452")
                .setEncodingFormat("text/csv")
                .build()
        )
        .addDataEntity(
            new FileEntity.FileEntityBuilder()
                .addContent(URI.create("https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf"))
                .addProperty("name", "RO-Crate specification")
                .addProperty("contentSize", "310691")
                .addProperty("description", "RO-Crate specification")
                .setEncodingFormat("application/pdf")
                .build()
        )
        .build();
{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [

    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"},
      "description": "RO-Crate Metadata File Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example RO-Crate",
      "description": "The RO-Crate Root Data Entity",
      "datePublished": "2020",
      "license": {"@id": "https://spdx.org/licenses/CC-BY-NC-SA-4.0"},
      "hasPart": [
        {"@id": "data1.txt"},
        {"@id": "data2.txt"}
      ]
    },


    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities",
      "author": {"@id": "#alice"},
      "contentLocation":  {"@id": "http://sws.geonames.org/8152662/"}
    },
    {
      "@id": "data2.txt",
      "@type": "File"
    },

    {
      "@id": "#alice",
      "@type": "Person",
      "name": "Alice",
      "description": "One of hopefully many Contextual Entities"
    },
    {
      "@id": "http://sws.geonames.org/8152662/",
      "@type": "Place",
      "name": "Catalina Park"
    }
 ]
}

If there is no dedicated method for referencing a related entity (an ID property), one can use .addIdProperty("key", "value").

 PersonEntity alice = new PersonEntity.PersonEntityBuilder()
        .setId("#alice")
        .addProperty("name", "Alice")
        .addProperty("description", "One of hopefully many Contextual Entities")
        .build();
    PlaceEntity park = new PlaceEntity.PlaceEntityBuilder()
        .addContent(URI.create("http://sws.geonames.org/8152662/"))
        .addProperty("name", "Catalina Park")
        .build();

    RoCrate crate = new RoCrate.RoCrateBuilder("Example RO-Crate", "The RO-Crate Root Data Entity", "2020", "https://spdx.org/licenses/CC-BY-NC-SA-4.0")
        .addContextualEntity(park)
        .addContextualEntity(alice)
        .addDataEntity(
            new FileEntity.FileEntityBuilder()
                .addContent(Paths.get("......."), "data2.txt")
                .build()
        )
        .addDataEntity(
            new FileEntity.FileEntityBuilder()
                .addContent(Paths.get("......."), "data1.txt")
                .addProperty("description", "One of hopefully many Data Entities")
                .addAuthor(alice.getId())
                .addIdProperty("contentLocation", park)
                .build()
        )
        .build();
{ "@context": "https://w3id.org/ro/crate/1.1/context", 
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example RO-Crate",
      "description": "The RO-Crate Root Data Entity",
      "datePublished": "2020",
      "license": {"@id": "https://spdx.org/licenses/CC-BY-NC-SA-4.0"},
      "hasPart": [
          { "@id": "workflow/alignment.knime" }
      ]
    },
    {
      "@id": "workflow/alignment.knime",  
      "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
      "conformsTo": 
        {"@id": "https://bioschemas.org/profiles/ComputationalWorkflow/0.5-DRAFT-2020_07_21/"},
      "name": "Sequence alignment workflow",
      "programmingLanguage": {"@id": "#knime"},
      "creator": {"@id": "#alice"},
      "dateCreated": "2020-05-23",
      "license": { "@id": "https://spdx.org/licenses/CC-BY-NC-SA-4.0"},
      "input": [
        { "@id": "#36aadbd4-4a2d-4e33-83b4-0cbf6a6a8c5b"}
      ],
      "output": [
        { "@id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044"},
        { "@id": "#2f32b861-e43c-401f-8c42-04fd84273bdf"}
      ],
      "sdPublisher": {"@id": "#workflow-hub"},
      "url": "http://example.com/workflows/alignment",
      "version": "0.5.0"
    },
    {
      "@id": "#36aadbd4-4a2d-4e33-83b4-0cbf6a6a8c5b",
      "@type": "FormalParameter",
      "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/"},
      "name": "genome_sequence",
      "valueRequired": true,
      "additionalType": {"@id": "http://edamontology.org/data_2977"},
      "format": {"@id": "http://edamontology.org/format_1929"}
    },
    {
      "@id": "#6c703fee-6af7-4fdb-a57d-9e8bc4486044",
      "@type": "FormalParameter",
      "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/"},
      "name": "cleaned_sequence",
      "additionalType": {"@id": "http://edamontology.org/data_2977"},
      "encodingFormat": {"@id": "http://edamontology.org/format_2572"}
    },
    {
      "@id": "#2f32b861-e43c-401f-8c42-04fd84273bdf",
      "@type": "FormalParameter",
      "conformsTo": {"@id": "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/"},
      "name": "sequence_alignment",
      "additionalType": {"@id": "http://edamontology.org/data_1383"},
      "encodingFormat": {"@id": "http://edamontology.org/format_1982"}
    },
    {
      "@id": "https://spdx.org/licenses/CC-BY-NC-SA-4.0",
      "@type": "CreativeWork",
      "name": "Creative Commons Attribution Non Commercial Share Alike 4.0 International",
      "alternateName": "CC-BY-NC-SA-4.0"
    },
    {
      "@id": "#knime",
      "@type": "ProgrammingLanguage",
      "name": "KNIME Analytics Platform",
      "alternateName": "KNIME",
      "url": "https://www.knime.com/whats-new-in-knime-41",
      "version": "4.1.3"
    },
    {
      "@id": "#alice",
      "@type": "Person",
      "name": "Alice Brown"
    },
    {
      "@id": "#workflow-hub",
      "@type": "Organization",
      "name": "Example Workflow Hub",
      "url":"http://example.com/workflows/"
    },
    {
      "@id": "http://edamontology.org/format_1929",
      "@type": "Thing",
      "name": "FASTA sequence format"
    },
    {
      "@id": "http://edamontology.org/format_1982",
      "@type": "Thing",
      "name": "ClustalW alignment format"
    },
    {
      "@id": "http://edamontology.org/format_2572",
      "@type": "Thing",
      "name": "BAM format"
    },
    {
      "@id": "http://edamontology.org/data_2977",
      "@type": "Thing",
      "name": "Nucleic acid sequence"
    },
    {
      "@id": "http://edamontology.org/data_1383",
      "@type": "Thing",
      "name": "Nucleic acid sequence alignment"
    }
  ]
}
   ContextualEntity license = new ContextualEntity.ContextualEntityBuilder()
        .addType("CreativeWork")
        .setId("https://spdx.org/licenses/CC-BY-NC-SA-4.0")
        .addProperty("name", "Creative Commons Attribution Non Commercial Share Alike 4.0 International")
        .addProperty("alternateName", "CC-BY-NC-SA-4.0")
        .build();
    ContextualEntity knime = new ContextualEntity.ContextualEntityBuilder()
        .setId("#knime")
        .addType("ProgrammingLanguage")
        .addProperty("name", "KNIME Analytics Platform")
        .addProperty("alternateName", "KNIME")
        .addProperty("url", "https://www.knime.com/whats-new-in-knime-41")
        .addProperty("version", "4.1.3")
        .build();
    OrganizationEntity workflowHub = new OrganizationEntity.OrganizationEntityBuilder()
        .setId("#workflow-hub")
        .addProperty("name", "Example Workflow Hub")
        .addProperty("url", "http://example.com/workflows/")
        .build();
    ContextualEntity fasta = new ContextualEntity.ContextualEntityBuilder()
        .setId("http://edamontology.org/format_1929")
        .addType("Thing")
        .addProperty("name", "FASTA sequence format")
        .build();
    ContextualEntity clustalW = new ContextualEntity.ContextualEntityBuilder()
        .setId("http://edamontology.org/format_1982")
        .addType("Thing")
        .addProperty("name", "ClustalW alignment format")
        .build();
    ContextualEntity ban = new ContextualEntity.ContextualEntityBuilder()
        .setId("http://edamontology.org/format_2572")
        .addType("Thing")
        .addProperty("name", "BAM format")
        .build();
    ContextualEntity nucSec = new ContextualEntity.ContextualEntityBuilder()
        .setId("http://edamontology.org/data_2977")
        .addType("Thing")
        .addProperty("name", "Nucleic acid sequence")
        .build();
    ContextualEntity nucAlign = new ContextualEntity.ContextualEntityBuilder()
        .setId("http://edamontology.org/data_1383")
        .addType("Thing")
        .addProperty("name", "Nucleic acid sequence alignment")
        .build();
    PersonEntity alice = new PersonEntity.PersonEntityBuilder()
        .setId("#alice")
        .addProperty("name", "Alice Brown")
        .build();
    ContextualEntity requiredParam = new ContextualEntity.ContextualEntityBuilder()
        .addType("FormalParameter")
        .setId("#36aadbd4-4a2d-4e33-83b4-0cbf6a6a8c5b")
        .addProperty("name", "genome_sequence")
        .addProperty("valueRequired", true)
        .addIdProperty("conformsTo", "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/")
        .addIdProperty("additionalType", nucSec)
        .addIdProperty("encodingFormat", fasta)
        .build();

    ContextualEntity clnParam = new ContextualEntity.ContextualEntityBuilder()
        .addType("FormalParameter")
        .setId("#6c703fee-6af7-4fdb-a57d-9e8bc4486044")
        .addProperty("name", "cleaned_sequence")
        .addIdProperty("conformsTo", "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/")
        .addIdProperty("additionalType", nucSec)
        .addIdProperty("encodingFormat", ban)
        .build();

    ContextualEntity alignParam = new ContextualEntity.ContextualEntityBuilder()
        .addType("FormalParameter")
        .setId("#2f32b861-e43c-401f-8c42-04fd84273bdf")
        .addProperty("name", "sequence_alignment")
        .addIdProperty("conformsTo", "https://bioschemas.org/profiles/FormalParameter/0.1-DRAFT-2020_07_21/")
        .addIdProperty("additionalType", nucAlign)
        .addIdProperty("encodingFormat", clustalW)
        .build();

    RoCrate crate = new RoCrate.RoCrateBuilder("Example RO-Crate", "The RO-Crate Root Data Entity", "2020", "https://spdx.org/licenses/CC-BY-NC-SA-4.0")
        .addContextualEntity(license)
        .addContextualEntity(knime)
        .addContextualEntity(workflowHub)
        .addContextualEntity(fasta)
        .addContextualEntity(clustalW)
        .addContextualEntity(ban)
        .addContextualEntity(nucSec)
        .addContextualEntity(nucAlign)
        .addContextualEntity(alice)
        .addContextualEntity(requiredParam)
        .addContextualEntity(clnParam)
        .addContextualEntity(alignParam)
        .addDataEntity(
            new WorkflowEntity.WorkflowEntityBuilder()
                .setId("workflow/alignment.knime")
                .setSource(new File("src"))
                .addIdProperty("conformsTo", "https://bioschemas.org/profiles/ComputationalWorkflow/0.5-DRAFT-2020_07_21/")
                .addProperty("name", "Sequence alignment workflow")
                .addIdProperty("programmingLanguage", "#knime")
                .addAuthor("#alice")
                .addProperty("dateCreated", "2020-05-23")
                .setLicense("https://spdx.org/licenses/CC-BY-NC-SA-4.0")
                .addInput("#36aadbd4-4a2d-4e33-83b4-0cbf6a6a8c5b")
                .addOutput("#6c703fee-6af7-4fdb-a57d-9e8bc4486044")
                .addOutput("#2f32b861-e43c-401f-8c42-04fd84273bdf")
                .addProperty("url", "http://example.com/workflows/alignment")
                .addProperty("version", "0.5.0")
                .addIdProperty("sdPublisher", "#workflow-hub")
                .build()

        )
        .build();

ro-crate-java's People

Contributors

code42cate, dependabot[bot], nikolatzotchev, pfeil, sabrineche, thomasjejkal

ro-crate-java's Issues

Default constructor creates invalid crate

This is a really small issue, but if you create an empty RO-Crate with the builder and then print the JSON metadata:

RoCrate crate = new RoCrate.RoCrateBuilder().build();
System.out.println(crate.getJsonMetadata());

You get this:

{
   "@context":"https://w3id.org/ro/crate/1.1/context",
   "@graph":[
      {
         "@id":"./",
         "@type":"Dataset"
      },
      {
         "about":{
            "@id":"./"
         },
         "conformsTo":{
            "@id":"https://w3id.org/ro/crate/1.1"
         },
         "@id":"ro-crate-metadata.json",
         "@type":"CreativeWork"
      }
   ]
}

This seems to be the minimum valid RO-Crate.

But if you do the same with the RoCrate constructor:

RoCrate crate = new RoCrate();
System.out.println(crate.getJsonMetadata());

You get this:

{
   "@context":"https://w3id.org/ro/crate/1.1/context",
   "@graph":[
      null,
      null
   ]
}

Shouldn't they be the same?

Adding urls/pairs to the context without RoCrateBuilder

There is no way to add to the context without using the RoCrateBuilder (which is not possible when using the RoCrateReader).

There are probably more features missing, and the nicest fix IMO would be to extend the RoCrateBuilder to allow using an existing RoCrate as a template.

RoCrate crate = new RoCrate.RoCrateBuilder(existingCrate)[...]build();

Multiple conformsTo values

First of all, thanks for the library!

Currently the library targets RO-Crate 1.1 and the current behaviour regarding the use of conformsTo is implemented according to this:

https://www.researchobject.org/ro-crate/1.1/root-data-entity.html#finding-the-root-data-entity

if the conformsTo property is a URI that starts with https://w3id.org/ro/crate/
...

This is implemented here and expects to have a single object for conformsTo with an @id value:

JsonNode type = node.get("conformsTo");
if (type != null) {
    String uri = type.get("@id").asText();

In the proposed 1.2 draft the conformsTo may be an array of such objects.

https://www.researchobject.org/ro-crate/1.2-DRAFT/profiles.html

It is valid for a crate to conform to multiple profiles, in which case conformsTo is an unordered array.

We are trying to use the multiple-profile support of RO-Crate 1.2 in our project, so it would be great if you could support multiple conformsTo values. Currently, the library throws an error because type.get("@id").asText() fails when type is an array.
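
A minimal sketch of how a reader could accept both shapes of conformsTo. It deliberately uses plain Map and List values as a stand-in for a parsed JSON node, so it is not the library's Jackson-based code. The class name, the method names, and the https://example.org/profile URL are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of ro-crate-java.
final class ConformsToSketch {

    // Collects every "@id" from a conformsTo value, which may be a single
    // object ({"@id": ...}) or an array of such objects.
    static List<String> conformsToIds(Object conformsTo) {
        List<String> ids = new ArrayList<>();
        if (conformsTo instanceof Map) {
            addId((Map<?, ?>) conformsTo, ids);
        } else if (conformsTo instanceof List) {
            for (Object item : (List<?>) conformsTo) {
                if (item instanceof Map) {
                    addId((Map<?, ?>) item, ids);
                }
            }
        }
        return ids; // empty when conformsTo is missing or has another shape
    }

    private static void addId(Map<?, ?> node, List<String> ids) {
        Object id = node.get("@id");
        if (id instanceof String) {
            ids.add((String) id);
        }
    }

    // True if at least one entry points at an RO-Crate specification URI.
    static boolean conformsToRoCrate(Object conformsTo) {
        for (String id : conformsToIds(conformsTo)) {
            if (id.startsWith("https://w3id.org/ro/crate/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Object single = Map.of("@id", "https://w3id.org/ro/crate/1.1");
        Object many = List.of(
                Map.of("@id", "https://example.org/profile"), // assumed profile URL
                Map.of("@id", "https://w3id.org/ro/crate/1.2-DRAFT"));
        System.out.println(conformsToRoCrate(single));
        System.out.println(conformsToRoCrate(many));
    }
}
```

Normalizing to a list first keeps the 1.1 behaviour (a single object) as a special case of the 1.2 draft behaviour (an array).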

ZipReader unpacks in ./temp

  • should be unpacked in subfolder so different imported crates do not interfere
  • the content should be removed at some point; currently this happens "on Java exit"? The question is who is responsible for deletion. The reader may create a Crate object, which has no direct relation to the folder. So, as soon as one (or multiple) crates have been created, deletion is no longer the reader's responsibility. We need to add some control over this, I think, and we need to see which possibilities we have in Java. Thoughts:
    • Recently we fixed an issue where a reader could not be used multiple times to read a crate. This might have been on purpose, though? Is this why the content is deleted so late, because the reader can now live so long? @Code42Cate
    • Responsibility could be tracked by some kind of file lock, which is initially held by the reader and then by the exported crate. Further crates on the same folder then cannot be created.
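
One possible direction for both points, sketched with JDK classes only: every import gets its own uniquely named directory, and whoever owns that directory deletes it explicitly. The class ExtractionDirSketch and the "ro-crate-import-" prefix are hypothetical; this is not how ZipReader currently works.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of ro-crate-java.
final class ExtractionDirSketch implements AutoCloseable {

    private final Path dir;

    private ExtractionDirSketch(Path dir) {
        this.dir = dir;
    }

    // Creates a fresh, uniquely named directory under the system temp
    // location, so two imported crates never share a folder.
    static ExtractionDirSketch create() {
        try {
            return new ExtractionDirSketch(Files.createTempDirectory("ro-crate-import-"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path path() {
        return dir;
    }

    // Deletes the directory and its content, deepest paths first.
    @Override
    public void close() {
        if (!Files.exists(dir)) {
            return; // already cleaned up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ExtractionDirSketch first = create(); ExtractionDirSketch second = create()) {
            Files.writeString(first.path().resolve("ro-crate-metadata.json"), "{}");
            System.out.println(first.path().equals(second.path()));
        }
    }
}
```

Because the handle is AutoCloseable, ownership can be passed from the reader to the crate object, and try-with-resources makes the deletion point explicit instead of deferring it to JVM exit.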

Publish the library on a maven/gradle compatible repository.

People should be able to use it more easily in their software, so we should have the pre-built library in a repository.

Version: As we have some minor open issues, I am considering staying with 1.0-SNAPSHOT or even 0.9 or something like that.

Missing points in documentation

The API is documented by examples and javadocs (the non-trivial parts at least) and is pretty straightforward. But there is room for improvement, as far as I can see:

Internals docs

  • How is import different from export, which class members store what in which case?
  • How are invalid crates being handled on import? What is being tolerated (with warnings), what kind of invalid crates can we not import?
  • When exactly are which parts being validated against what?

External docs

  • We need a better Getting Started Guide, see for example #41

Removing urls/key-value pairs from the context is not supported

We can already add URLs and key-value pairs to the context, but there is no way to delete them from existing crates. (There is also no way to read them besides raw JSON, but that's a different issue.)

I propose adding these two functions to the RoCrate:

public void deleteUrlFromContext(String Url)
public void deleteValuePairFromContext(String key)
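
To make the proposal concrete, here is a small sketch of the intended behaviour on a stand-in context class. ContextSketch is hypothetical and does not mirror the library's internal context representation; unlike the proposed void signatures, the delete methods here return a boolean so callers can tell whether anything was removed.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a crate context; not the library's class.
final class ContextSketch {

    private final Set<String> urls = new LinkedHashSet<>();
    private final Map<String, String> pairs = new LinkedHashMap<>();

    void addUrl(String url) {
        urls.add(url);
    }

    void addValuePair(String key, String value) {
        pairs.put(key, value);
    }

    // Returns true if the URL was present and has been removed.
    boolean deleteUrl(String url) {
        return urls.remove(url);
    }

    // Returns true if the key was present and has been removed.
    boolean deleteValuePair(String key) {
        return pairs.remove(key) != null;
    }

    boolean containsUrl(String url) {
        return urls.contains(url);
    }

    boolean containsKey(String key) {
        return pairs.containsKey(key);
    }

    public static void main(String[] args) {
        ContextSketch context = new ContextSketch();
        context.addUrl("https://w3id.org/ro/crate/1.1/context");
        context.addValuePair("Station", "www.station.com");
        context.deleteValuePair("Station");
        System.out.println(context.containsKey("Station"));
    }
}
```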

I am going to make a PR for this later today.

Encoded file paths do not work

According to paragraph 7.2.1 (Encoding file paths), file paths may be percent-encoded. Currently, the library does not decode any URLs. This leads to a few errors while reading crates, especially when checking whether files exist and when creating a DataEntity. There are probably a lot of places where this needs to be fixed.

Note that all @id identifiers must be valid URI references, care must be taken to express
any relative paths using / separator, correct casing, and escape special characters like
space ( %20 ) and percent ( %25 ), for instance a File Data Entity from the Windows path
Results and Diagrams\almost-50%.png becomes "@id":
"Results%20and%20Diagrams/almost-50%25.png" in the RO-Crate JSON-LD.


  • decode URLs when we need them in decoded form. Example: for URL validation (or consider encoded URLs in validation) #9
  • encode URLs when creating/exporting a crate #67
  • Write tests to see if importing both encoded and non-encoded URLs works #9
  • Write tests to make sure that URLs get encoded (and decoding results in the same url), but other IDs stay the same #67
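
The encoding and decoding steps in the list above can be sketched with java.net.URI alone. This is only an illustration of the round trip using the example from the specification quote; PathEncodingSketch and its method names are hypothetical, and the library may need a different strategy for absolute URLs and other IDs.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of ro-crate-java.
final class PathEncodingSketch {

    // Turns a (possibly Windows-style) relative path into a URI reference
    // that can be used as an "@id".
    static String encode(String path) {
        String withSlashes = path.replace('\\', '/');
        try {
            // The multi-argument URI constructors percent-encode illegal
            // characters such as spaces and always encode a literal '%'.
            return new URI(null, null, withSlashes, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode path: " + path, e);
        }
    }

    // Reverses encode(): getPath() returns the path with escapes decoded.
    // The result uses '/' as separator, not the original backslashes.
    static String decode(String id) {
        return URI.create(id).getPath();
    }

    public static void main(String[] args) {
        String id = encode("Results and Diagrams\\almost-50%.png");
        System.out.println(id);
        System.out.println(decode(id));
    }
}
```

A round-trip test of this kind (encode, then decode, then compare) is one way to cover the last two list items.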

Removing properties from entities is too complicated

AFAIK, the easiest way to delete a property by name from an entity is this one:

entity.setProperties(entity.getProperties().remove("property_name"));

I don't like that we have to work with JsonNodes to delete properties, maybe adding a very simple

void removeProperty(String key)

would be nicer? I think we would only need to add this to the AbstractEntity class.
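
A sketch of the intended semantics, using a plain map as a stand-in for the entity's JSON properties. EntitySketch is hypothetical; the real AbstractEntity stores its properties as JSON nodes, so the actual implementation would look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an entity; not the library's AbstractEntity.
final class EntitySketch {

    private final Map<String, Object> properties = new LinkedHashMap<>();

    void addProperty(String key, Object value) {
        properties.put(key, value);
    }

    // Removes the property if present; does nothing otherwise, so callers
    // never have to touch the underlying property container.
    void removeProperty(String key) {
        properties.remove(key);
    }

    boolean hasProperty(String key) {
        return properties.containsKey(key);
    }

    public static void main(String[] args) {
        EntitySketch entity = new EntitySketch();
        entity.addProperty("name", "Survey responses");
        entity.removeProperty("name");
        System.out.println(entity.hasProperty("name"));
    }
}
```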

Unreferenced data entities are invalid

According to the specification, every Data Entity MUST be linked to the Root Data Entity:

Where files and folders are represented as Data Entities in the RO-Crate JSON-LD, these MUST be linked to, either directly or indirectly, from the Root Data Entity using the hasPart property.

Adding a Data Entity like this is valid according to the library:

crate.addDataEntity(new DataSetEntity.DataSetBuilder().setId("./some_file").build(), false);

Now at this point, the crate is not valid. If you add a DataSetEntity that has a hasPart property linking to ./some_file and the Root Data Entity has a hasPart property linking to this DataSetEntity it would be valid again.
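
A validation step for this rule could walk the hasPart links from the Root Data Entity and report every Data Entity it never reaches. The sketch below works on plain ID collections rather than on library classes; HasPartReachabilitySketch, its parameters, and the IDs "data/" and "orphan.txt" are assumptions for illustration. It also compares IDs literally and does not normalize "./some_file" against "some_file".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of ro-crate-java.
final class HasPartReachabilitySketch {

    // hasPart maps an entity ID to the IDs listed in its hasPart property.
    // Returns the Data Entity IDs that are not linked, directly or
    // indirectly, from rootId.
    static Set<String> unreferenced(String rootId,
                                    Map<String, List<String>> hasPart,
                                    Set<String> dataEntityIds) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(rootId);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!seen.add(current)) {
                continue; // already visited, also guards against cycles
            }
            for (String part : hasPart.getOrDefault(current, List.of())) {
                todo.push(part);
            }
        }
        Set<String> missing = new TreeSet<>(dataEntityIds);
        missing.removeAll(seen);
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hasPart = Map.of(
                "./", List.of("data/"),
                "data/", List.of("./some_file"));
        Set<String> dataEntities = Set.of("data/", "./some_file", "orphan.txt");
        System.out.println(unreferenced("./", hasPart, dataEntities));
    }
}
```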

Adding an entity with the same @id as RootDataEntity/JsonDescriptor should not be allowed

You can do

crate.addDataEntity(new DataSetEntity.DataSetBuilder().setId("./").addProperty("not_root", true).build(), true);

which then leads to this JSON:

{
   "@context":"https://w3id.org/ro/crate/1.1/context",
   "@graph":[
      {
         "@id":"./",
         "@type":"Dataset",
         "hasPart":{
            "@id":"./"
         }
      },
      {
         "about":{
            "@id":"./"
         },
         "conformsTo":{
            "@id":"https://w3id.org/ro/crate/1.1"
         },
         "@id":"ro-crate-metadata.json",
         "@type":"CreativeWork"
      },
      {
         "not_root":true,
         "@id":"./",
         "@type":"Dataset"
      }
   ]
}

The same can happen if you use ro-crate-metadata.json as the @id. This should not be possible!
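
A guard along the following lines could reject such entities before they are added. It is a sketch only: the class ReservedIdGuard and its method are hypothetical, and it assumes the two default @ids shown in the JSON above. Where exactly such a check would be called in the library is left open.

```java
import java.util.Set;

// Hypothetical helper, not part of ro-crate-java.
class ReservedIdGuard {

    // Assumed reserved @ids: the Root Data Entity and the metadata descriptor,
    // as they appear in the JSON example above.
    private static final Set<String> RESERVED_IDS = Set.of("./", "ro-crate-metadata.json");

    // Throws if the given @id collides with one of the reserved ids.
    static void requireNotReserved(String id) {
        if (RESERVED_IDS.contains(id)) {
            throw new IllegalArgumentException(
                "The @id '" + id + "' is reserved and cannot be used for another entity.");
        }
    }

    public static void main(String[] args) {
        requireNotReserved("./some_file"); // accepted
        try {
            requireNotReserved("./");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```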

Transmitting an RoCrate instance over a network

I am attempting to send RoCrates over HTTP and AMQP. This appears to require serializing RoCrates to JSON or a byte array and deserializing them again.

I have attempted to use FasterXML Jackson, but was unable to successfully use readValue / convertValue to create RoCrate objects (see error below).

Is there another way to serialize and deserialize RoCrates so that they can be transmitted over the network?

Stuart

com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of edu.kit.datamanager.ro_crate.entities.data.RootDataEntity (no Creators, like default constructor, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
at [Source: (String)"{"rootDataEntity":{"name":"Request","description":"74e0d9a7-f175-450d-8726-73c3d07579c0","@id":"./","@type":"Dataset","hasPart":[{"@id":"http://example.org/8447fd4d-fbb8-45cf-9c55-f590ed775eb0"},{"@id":"http://example.org/35f215f2-2aec-42a7-bea9-c01e31ed3dcb"}]},"jsonDescriptor":{"about":{"@id":"./"},"conformsTo":{"@id":"https://w3id.org/ro/crate/1.1"},"@id":"ro-crate-metadata.json","@type":"CreativeWork"},"untrackedFiles":[],"allDataEntities":[{"Answer to the Ultimate Question of Life, The Univ"[truncated 1097 chars]; line: 1, column: 20] (through reference chain: edu.kit.datamanager.ro_crate.RoCrate["rootDataEntity"])

Re-evaluate group and package names.

  • In build.gradle it says group 'org.example'. It should be changed, but I am not sure which parts exactly it will affect.
  • Also, the library uses edu.kit.crate as the high level package name. Should we reconsider this to be e.g. edu.kit.ro_crate? Or even edu.kit.datamanager.ro_crate?

Guidance implementation for creating Profile Crates and using them for validation

Note:
This is not yet part of the specification! What we already have is support for multiple "conformsTo" values, meaning that multiple profiles can be specified for a crate. Please remember that both are features of the 1.2-DRAFT specification, and there is no guarantee they will make it into the final version.

If you have an urgent need, feel free to discuss how to proceed here. Code contributions are always welcome. What we may implement at some point:

  • A specific builder pattern for the suggested profile crates
  • A specific validator supporting the suggested methods for describing the crate using a profile crate

Resources:
