atescomp / rdf-transform

RDF Transform is an extension for OpenRefine to transform data into RDF formats.

License: Other

Java 38.10% JavaScript 24.81% HTML 35.58% CSS 1.50%
rdf knowledge-graph transformer semantic-web openrefine

rdf-transform's Introduction

RDF Transform

Build Note: On failed builds, Maven repositories may need to be reset. Review the Actions tab for issues. If needed, run the "Maven Reset Dependencies" workflow.

Introduction

This project uses a graphical user interface (GUI) for transforming OpenRefine project data to RDF-based formats. The transform maps the data with a template graph designed using the GUI.

RDF Transform is based on the venerable "RDF Extension" (grefine-rdf-extension). However, it has been thoroughly rewritten to incorporate the newer Java and JavaScript technologies, techniques, and processing enhancements.

The latest releases (2.2.0 and above) only work with OpenRefine 3.6 or better.

Documentation

See the wiki for more information.

Download

See the Prerequisites section of the Install page on the wiki for important Java version information.

Latest Release

RDF Transform v2.2.1

Previous Releases

RDF Transform v2.2.0
RDF Transform v2.1.1-beta
RDF Transform v2.1.0-beta
RDF Transform v2.0.5-alpha
RDF Transform v2.0.4-beta
RDF Transform v2.0.3-alpha
RDF Transform v2.0.2-alpha
RDF Transform v2.0.1-alpha
RDF Transform v2.0.0-alpha

Install

See the Install page on the wiki for more information.

Issues

General interaction issues with OpenRefine versions, web browsers, OSes, etc., that are not specifically code related.

NOTE: It is recommended that you have an active Internet connection when using the extension as it can download ontologies from specified namespaces (such as rdf, rdfs, owl and foaf). You can (re)add namespaces and specify whether to download the ontology (or not) from the namespace declaration URL. If you must run OpenRefine from an offline location, you can copy the ontologies to files in your offline space and use the "from file" feature to load the ontologies.

OpenRefine

As an extension, RDF Transform runs under the control of OpenRefine and its JVM. As such, the libraries included with OpenRefine override any of the same libraries included with the extension. This limits the extension to OpenRefine's version of those library functions and features.

The latest releases (2.2.0 and above) only work with OpenRefine 3.6 or better due to upgraded Apache Jena library features that are not backward compatible.

See the wiki for more information.

OSes

See the Install page on the wiki for related information.

Linux

RDF Transform has been tested against OpenRefine 3.5.2 and above on a modern Debian-based OS (Ubuntu derivative) using Chrome. No system-related issues were found under these conditions.

Windows

Test runs on MS Windows 10 have indicated that the JVM operates slightly differently than on Linux. The MS Windows version tends to be more sensitive to certain statements.

  1. The version of Simile Butterfly that processes the limited server-side JavaScript engine can fail on unused declarative statements such as "importPackage()". If the package is not found, Windows systems may silently fail to run any following statements whereas Linux systems will continue. To mitigate against server-side JavaScript issues, all possible server-side JavaScript code has been migrated to Java.
  2. The JVM relies on OS-specific services to process network connections. It may process web-based content negotiation differently on a particular OS. On Windows, if the URL does not produce the expected response, negotiation and the related response processing may lock the process for an unreasonably long time, whereas Linux may fail safely and quickly. To mitigate web content negotiation issues, a Faulty Content Negotiation processor is used to identify known fault-intolerant processing. As faults become known, they are added to the processor.

Mac

In all instances, the macOS versions of OpenRefine are currently bundled with a Java 8 JRE. Since RDF Transform requires Java 11 to 17, the bundled Java should be overridden with:

  1. A later Java install, preferably Java 11 JDK or Java 17 JDK
    • Java installs later than 8 do not have a separate JRE install
  2. Setting the JAVA_HOME env variable to the later Java install directory

Reporting

Please report any problem using RDF Transform to the code repository's Issues.

rdf-transform's People

Contributors

atescomp, higa4

rdf-transform's Issues

Dialog loading is confused by new license comments

Support HTML files that contain the new license text as a comment ("<!-- -->") gain an additional HTML element, which confuses the bind processing after the loadHTML() call. The jQuery object no longer points to the dialog as element 0 (the only element).
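
One possible guard, sketched here under assumptions rather than taken from the extension's code, is to pick the first element node from the parsed result instead of trusting position 0. The node objects below are plain stand-ins for DOM nodes, and `firstElement` is a hypothetical helper; only the node type values (1 for elements, 8 for comments) come from the DOM standard.

```javascript
// DOM node type constants (values defined by the DOM standard).
const ELEMENT_NODE = 1;
const COMMENT_NODE = 8;

// Hypothetical helper: return the first element node from a parsed
// node list, skipping comments (such as a license header) and text.
function firstElement(nodes) {
    for (const node of nodes) {
        if (node.nodeType === ELEMENT_NODE) {
            return node;
        }
    }
    return null;
}

// Stand-in for what parsing "<!-- license --><div>...</div>" might yield.
const parsed = [
    { nodeType: COMMENT_NODE, nodeValue: " license text " },
    { nodeType: ELEMENT_NODE, tagName: "DIV" }
];

const dialog = firstElement(parsed);
console.log(dialog.tagName); // DIV
```

With a jQuery object, a comparable effect could probably be had by filtering on `nodeType` before binding, although that has not been checked against the extension's dialogs.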

RDF Transform extension not showing on OpenRefine GUI

I installed RDF Transform according to your instructions on my Mac, where I already had a working OpenRefine (v. 3.5.2), but it doesn't show in the OpenRefine GUI. I stopped the OpenRefine service, installed the RDF Transform extension and started OpenRefine again with no success.
I tried to install it both in ~/Library/Application Support/OpenRefine/extensions/rdf-transform and in the OpenRefine package (Contents/Resources/webapp/extensions/rdf-transform), also with no success. Is there anything else I should do to make it work?

Custom dummy namespace doesn't work?

using RDF transform v2.0.3-alpha with OR 3.5.2 and Java 17 in WSL Ubuntu 20.04

I've added abbrev ex for http://example.com/ as namespace and this is also my base IRI. It does not retrieve any real namespace, it doesn't need to.

The problem is that the abbrev does not show up in the preview nor the turtle. It looks like this:

@prefix : <http://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

:mammals a skos:Concept;
  skos:narrower :felines, :bovines .

It should look like this:

@prefix ex: <http://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ex:mammals a skos:Concept;
  skos:narrower ex:felines, ex:bovines .

Should you have any hints as to how I can fix this myself by tweaking something (I don't know Java), please feel free to share.

In addition, I have to say I'm really happy that RDF Transform came along. Repeated occurrences (e.g. the two narrowers here) are a real pain in the neck to deal with in the "old" RDF-extension, see here.

License clarification

Looking at LICENSE.txt it appears to be reporting on the licenses of dependencies only, but not your code under the src tree. Since there are no license headers, it is unclear whether your code is also licensed under Apache 2.0 or is unlicensed.

If you intend your software to be permissively licensed, do you mind clarifying LICENSE.txt with your desired license (preferably Apache 2.0) and/or adding license headers to your source files?

Much appreciated!

Tier 2+ Literals NOT Processing

When in record mode, any literal declared at the 2nd tier or later (beyond the root level) will not process into the preview or export.

RDF Transform extension causes javascript error in Safari

To recreate:

  • Installed RDF Transform 2.2.0 with OpenRefine 3.6.2 (Linux build)
  • Ran OpenRefine, accessed the UI using Safari, and tried to open a project
  • Project doesn't display, and in developer console errors can be seen

Error is:

[Error] SyntaxError: Unexpected token '{'
	(anonymous function) (project-bundle.js:54566)

which references this part of the code

    // Setup default Master Root Node (copy as needed)...
    static gnodeMasterRoot = {};
    static {
        this.gnodeMasterRoot.valueSource = {};
        this.gnodeMasterRoot.valueSource.source = null; // ...to be replaced with row / record index
        this.gnodeMasterRoot.expression = {};
        this.gnodeMasterRoot.expression.language = RDFTransform.gstrDefaultExpLang;
        this.gnodeMasterRoot.expression.code = null; // ...to be replaced with default language expression
        this.gnodeMasterRoot.propertyMappings = [];
    }

specifically the error occurs at

If I instead access the UI via Chrome, the project loads and no errors are reported in the console
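
Class static initialization blocks (`static { ... }`) are an ES2022 feature, so a syntax error at that token would be consistent with a browser engine that does not yet support them. A minimal sketch of one workaround, assuming the same object shape as the snippet above, is to assign the property after the class declaration. The class name `NodeDefaults` and the value `"grel"` are placeholders for illustration; the reported snippet shows neither.

```javascript
// Stand-in for the extension's RDFTransform class; only the one
// property referenced in the reported snippet is mirrored here.
class RDFTransform {}
RDFTransform.gstrDefaultExpLang = "grel"; // assumed value for illustration

// Hypothetical class name; the original class is not shown in the report.
class NodeDefaults {}

// Build the default Master Root Node without a static block by
// assigning the property after the class declaration.
NodeDefaults.gnodeMasterRoot = {
    valueSource: { source: null }, // ...to be replaced with row / record index
    expression: {
        language: RDFTransform.gstrDefaultExpLang,
        code: null // ...to be replaced with default language expression
    },
    propertyMappings: []
};

console.log(NodeDefaults.gnodeMasterRoot.expression.language); // grel
```

The resulting object has the same fields as the one built by the static block, so code that copies the master root node should not need to change.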

Allow Property Expressions

Currently, all properties are "constant" IRIs. Modify the UI Property code to allow for column expressions.

ERROR: Could not retrieve default namespaces

  • Windows 10 Pro v 21H2 build 19044.1645
  • OpenRefine 3.5.2
  • Java runtime 1.8.0_331
  • RDF Transform 2.0.3-alpha

FYI: This issue isn't really that urgent (to me). After running into it, I managed to install & run OR with RDF Transform under WSL Ubuntu, and this issue does not appear there. However, see issue #5.

Steps to reproduce

Start OpenRefine, select any project, click RDF Transform -> Edit RDF Transform -> Message appears ERROR: Could not retrieve default namespaces

Click OK. RDF Transform pane appears. Click Add, in Prefix box type ex, in IRI box type http://www.example.com/, click OK -> Message appears An IRI is required! The given IRI is invalid: http://www.example.com/

config info

RDF Transform is in C:\Users\<myname>\AppData\Roaming\OpenRefine\extensions\rdf-transform

I normally run OR from here: C:\Users\<myname>\Downloads\openrefine-3.5.2\openrefine.exe
I also tried to run a copy here: C:\Program Files\OpenRefine\openrefine-3.5.2\openrefine.exe
and here: C:\Users\<myname>\AppData\Roaming\OpenRefine\openrefine-3.5.2\openrefine.exe

Same problem always. I think it's looking for these:

"C:\Users\<myname>\AppData\Roaming\OpenRefine\extensions\rdf-transform\module\MOD-INF\classes\files\PredefinedVocabs"
"C:\Users\<myname>\AppData\Roaming\OpenRefine\extensions\rdf-transform\module\MOD-INF\classes\files\Namespaces"

Any tweaks I could try would be welcome.

Editable Sample Record / Row Count for Preview (WIP)

Provide functionality to edit the sample record / row count for preview. Provide for first row or record to start from and the number of following. Or, design it like page printing (1-4, 7-9, 11). Something like that.
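
The page-printing style of selection could be parsed along the following lines. This is only a sketch: `parseRangeSpec` is a hypothetical function, and it assumes 1-based row / record numbers with ranges written as `start-end`.

```javascript
// Hypothetical parser: turn a spec such as "1-4, 7-9, 11" into a sorted
// list of unique 1-based row / record numbers.
function parseRangeSpec(spec) {
    const selected = new Set();
    for (const part of spec.split(",")) {
        const text = part.trim();
        if (text === "") continue; // tolerate stray commas
        const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(text);
        if (!match) {
            throw new Error("Invalid range: " + text);
        }
        const start = Number(match[1]);
        const end = match[2] === undefined ? start : Number(match[2]);
        if (start < 1 || end < start) {
            throw new Error("Invalid range: " + text);
        }
        for (let i = start; i <= end; i++) {
            selected.add(i);
        }
    }
    return [...selected].sort((a, b) => a - b);
}

console.log(parseRangeSpec("1-4, 7-9, 11").join(", "));
// prints "1, 2, 3, 4, 7, 8, 9, 11"
```

Collecting the numbers in a set means overlapping ranges such as "1-3, 2-5" select each row only once.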

Namespace Add by File Import Error

The Prefixed Namespace Adder fails to process file imports resulting in a hung / locked condition.

The Namespace Adder has three modes:

  1. Add a simple Prefix and Namespace
  2. Add a Prefix and Namespace along with its ontology class and property hints from the web
  3. Add a Prefix and Namespace along with its ontology class and property hints from a file

RDFRefine - can't access imported ontology?

I'm using OpenRefine v.3.5.2 and its extension "rdf-extension v.1.3.1". I'm trying to import ontologies like dbpedia and schema. When I add a prefix and insert the URI, it gives me no problem. However, when I try to access its properties, it says: 'Not in the imported vocabulary definition.' I tried to upload the ontology.owl as a local file and the same issue still exists. Could you help with that?

Change Expression on RDF Node type change

When the user selects any of the RDF Node types (index, column, constant) from another type, change the expression to the default expression of that type unless it's changing back to the original type. If changing back to the original, restore the original expression.

Also, consider column expressions may be compatible between columns, so preserve a column expression if just changing columns.

For index types, a change between record and row mode already does this:

  • Changes between "row.record.index" and "row.index".
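
The rules above could be sketched as follows. The node shape and function names are hypothetical, and the default expression `"value"` for column and constant types is an assumption; only `row.index` and `row.record.index` come from the description.

```javascript
// Assumed default expressions per node type. "row.index" and
// "row.record.index" come from the text above; "value" is an assumption.
function defaultExpression(type, recordMode) {
    if (type === "index") {
        return recordMode ? "row.record.index" : "row.index";
    }
    return "value";
}

// Hypothetical node state: remembers the original type and expression
// so they can be restored when the user switches back.
function createNode(type, expression) {
    return { type, expression, originalType: type, originalExpression: expression };
}

// Apply a type change following the rules described above.
function changeType(node, newType, recordMode) {
    if (newType === node.type) {
        return node; // e.g. column to column: keep the expression
    }
    node.type = newType;
    node.expression = (newType === node.originalType)
        ? node.originalExpression
        : defaultExpression(newType, recordMode);
    return node;
}

const node = createNode("column", "value.toUppercase()");
changeType(node, "index", true);
console.log(node.expression); // row.record.index
changeType(node, "column", true);
console.log(node.expression); // value.toUppercase()
```

Keeping the original type and expression on the node itself means no separate undo history is needed for this one behavior.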

Process Sub-Records of a Record in Record Mode

Currently, when exporting an RDF file, the exporter uses OpenRefine's record indicator to process Root Node subject maps as records: a record starts on a non-empty cell of the first column, all subsequent empty cells in that column indicate the rest of the record, and the next non-empty cell begins a new record. Once a record is identified, record processing changes to row processing as the exporter processes the other columns in the transform. This is done to process data as independent lines in the record. Otherwise, the record elements would be considered one long row (or rather, a squashed row), and complementary subjects of a column would acquire properties and objects belonging to other, otherwise separate, subjects.

However, there may be sub-record structures in a given record designated by the same first column logic but applied to other columns and the initial record processing would be desirable on these columns. The transform should reset the row processing on these sub-records to record mode via a setting on an object mapping to act as a subject record processor by detecting the object column's non-blank and blank cells--forming sub-record ranges mirroring OpenRefine's reported record range--and processing them accordingly.

This is predicated on the initial record or row setting for the data--it should process sub-records only when the data has been set to record mode in OpenRefine.
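
The sub-record detection described above might look roughly like this. It is a sketch under assumptions: `subRecordRanges` and `isBlank` are hypothetical helpers, the input is the list of cells one record has in the object column, and ranges are half-open `[start, end)` offsets within that record.

```javascript
// Hypothetical helper: treat null, undefined, and whitespace-only
// values as blank cells.
function isBlank(cell) {
    return cell === null || cell === undefined || String(cell).trim() === "";
}

// Split one record's rows into sub-record ranges based on a column's
// cells. A non-blank cell starts a sub-record; the blank cells that
// follow belong to it. Blank cells before the first non-blank cell are
// not assigned to any sub-record.
function subRecordRanges(cells) {
    const ranges = [];
    let start = -1;
    cells.forEach((cell, i) => {
        if (!isBlank(cell)) {
            if (start >= 0) ranges.push([start, i]);
            start = i;
        }
    });
    if (start >= 0) ranges.push([start, cells.length]);
    return ranges;
}

// One record's cells in an object column (made-up sample data).
const column = ["felines", "", "", "bovines", ""];
console.log(JSON.stringify(subRecordRanges(column)));
// prints [[0,3],[3,5]]
```

Each range could then be processed the way a top-level record range is, provided the project is in record mode.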

Update Preview to use a Pretty Print Presentation

Currently, the RDF Preview tab uses Jena's Stream processing to present the data. Modify the preview to use a whole-graph pretty printer approach with a Jena Model. The code uses a Stream because it's modeled on the Export process. However, the Export will also need a pretty printer to support other output types that don't stream well.

The pretty printer approach presents an optimized output as it can examine the entire graph for the most condensed format. Streams only get smaller chunks of the entire graph for output, so cannot optimize.
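
The difference can be illustrated with a deliberately simplified sketch. It does not use Jena and does not reflect how Jena's writers are implemented; the function names are hypothetical and the triples are sample data in abbreviated form.

```javascript
// Triples as [subject, predicate, object] with already-abbreviated terms.
const triples = [
    ["ex:mammals", "a", "skos:Concept"],
    ["ex:mammals", "skos:narrower", "ex:felines"],
    ["ex:mammals", "skos:narrower", "ex:bovines"]
];

// Stream style (simplified): each triple is written as soon as it arrives.
function writeStreamed(list) {
    return list.map(([s, p, o]) => `${s} ${p} ${o} .`);
}

// Whole-graph style: group by subject and predicate first, then write
// one condensed statement per subject.
function writeCondensed(list) {
    const bySubject = new Map();
    for (const [s, p, o] of list) {
        if (!bySubject.has(s)) bySubject.set(s, new Map());
        const byPredicate = bySubject.get(s);
        if (!byPredicate.has(p)) byPredicate.set(p, []);
        byPredicate.get(p).push(o);
    }
    const lines = [];
    for (const [s, byPredicate] of bySubject) {
        const parts = [...byPredicate].map(([p, objs]) => `${p} ${objs.join(", ")}`);
        lines.push(`${s} ${parts.join(" ; ")} .`);
    }
    return lines;
}

console.log(writeStreamed(triples).length); // 3
console.log(writeCondensed(triples)[0]);
// ex:mammals a skos:Concept ; skos:narrower ex:felines, ex:bovines .
```

The condensed writer needs every triple before it can emit anything, which is the memory trade-off the whole-graph approach accepts in exchange for compact output.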

Run transformation in batch mode

I found this extension very useful. We have a very large set of XML files that share the same structure, so one mapping file can apply to them all. However, it would be tedious to manually load them one by one into OpenRefine. I'd like to ask whether there is a batch mode that allows us to run the transformation from the command line. Thanks.

Optimize Export Buffer

Provide an enhancement to examine the project data size and system memory to determine an optimal buffer size and allocations.

Maintaining this repository in the OpenRefine github organization?

Hi @AtesComp,

Again, thank you so much for working on this! This new RDF extension looks really great.
Before you published it, we were in talks with the current maintainer of the older RDF extension to transfer it to OpenRefine's GitHub organization, as a means to give it more visibility (this has not progressed because the maintainer has not responded to this invitation yet).

If that is of interest to you, we could do this for your extension instead. We would of course give you all admin rights on the repository.

I still have in mind your request for more documentation about the migration process for OpenRefine 4.0 - I hope to find the time to work on this soon.

If there is anything else we can do to support your work, let us know :)
