maastrichtu-cds / datafairifier

A system that supports the creation and validation of mappings and the creation of RDF data from relational data.

License: Apache License 2.0

Shell 15.54% PLpgSQL 2.27% JavaScript 0.64% Jupyter Notebook 17.65% Python 55.32% Dockerfile 8.37% Batchfile 0.20%

fair docker rdf

datafairifier's People

nlesc trevismd l-ippel fair-data-for-capacity quanpan302

datafairifier's Issues

Connect Jupyter with Virtuoso via SQL

View db table header in Jupyter nb to support SQL query writing

In the Jupyter notebook:
When writing the SQL query, including mapping the column headers to the required variable names,
it would be convenient for the user to see a list of these column headers displayed in the notebook. Otherwise, they have to open and inspect the database elsewhere.
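A minimal sketch of how a notebook cell could list the column headers, assuming a DB-API 2.0 connection (e.g. psycopg2 for PostgreSQL). It relies only on `cursor.description`, which drivers fill in even when the query returns no rows. The function name `list_columns` is hypothetical, and the demo uses an in-memory SQLite table so it runs without a database server.

```python
import sqlite3


def list_columns(conn, table):
    """Return the column names of `table` using the DB-API `cursor.description`.

    Works with any DB-API 2.0 connection (e.g. psycopg2 for PostgreSQL or the
    built-in sqlite3). `table` is a single, unqualified table name as stored in
    the catalog (lowercase for unquoted PostgreSQL names).
    """
    # Double-quote the identifier (standard SQL) and escape embedded quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.cursor()
    try:
        # LIMIT 0 fetches no rows, but the driver still describes the result columns.
        cur.execute(f"SELECT * FROM {quoted} LIMIT 0")
        return [col[0] for col in cur.description]
    finally:
        cur.close()


# Demo with an in-memory SQLite table; the column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id TEXT, age TEXT, sex TEXT)")
print(list_columns(conn, "patients"))  # ['patient_id', 'age', 'sex']
```

In the notebook, the same call with the PostgreSQL connection would print the headers right above the cell where the SQL query is written.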

R2RML breaks with NaN values in data

When the age column (stored as a string in PostgreSQL) contains NaN values, the R2RML conversion tool breaks on these values and discards all subsequent rows. As a result, patients are missing from the output.
The same happens with NULL values in the database.
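Until the converter handles these values, a pre-check can at least show which rows are affected, so missing patients are identified rather than silently lost. This is a minimal sketch, assuming the rows have been fetched as dicts; `is_missing` and `split_rows` are hypothetical helpers and do not change the conversion tool itself.

```python
import math


def is_missing(value):
    """True for NULL (None) and NaN, whether NaN arrives as a float or as a string."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    # The age column is stored as text, so NaN may appear as the string "NaN".
    return isinstance(value, str) and value.strip().lower() == "nan"


def split_rows(rows, column):
    """Split rows (dicts) into (clean, flagged) by whether `column` holds a missing value.

    A row without the key at all is treated like NULL.
    """
    clean, flagged = [], []
    for row in rows:
        (flagged if is_missing(row.get(column)) else clean).append(row)
    return clean, flagged


rows = [
    {"patient_id": "p1", "age": "63"},
    {"patient_id": "p2", "age": "NaN"},
    {"patient_id": "p3", "age": None},
    {"patient_id": "p4", "age": "71"},
]
clean, flagged = split_rows(rows, "age")
print([r["patient_id"] for r in flagged])  # ['p2', 'p3']
```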

Number the Jupyter notebooks as steps

SeDI Docker

We need to run SeDI in a Docker container.

Test Ontop performance vs Blazegraph

We have defined R2RML mappings, but we still need to compare the performance of Ontop, which queries the relational data through these mappings, with constructing (materializing) all triples up front. Does Ontop improve or reduce performance?
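One way to run the comparison is to send the same SPARQL query to an Ontop endpoint and to a triple store holding the materialized triples (e.g. Blazegraph), and compare median wall-clock times. The sketch below assumes both services expose a standard SPARQL 1.1 Protocol endpoint; `sparql_runner` and `time_query` are hypothetical helpers using only the Python standard library.

```python
import statistics
import time
import urllib.parse
import urllib.request


def sparql_runner(endpoint_url, query):
    """Return a zero-argument callable that POSTs `query` to a SPARQL endpoint.

    Uses the form-encoded POST binding of the SPARQL 1.1 Protocol and asks for
    JSON results; the response body is read fully so transfer time is included.
    """
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")

    def run():
        req = urllib.request.Request(
            endpoint_url,
            data=body,
            headers={
                "Accept": "application/sparql-results+json",
                "Content-Type": "application/x-www-form-urlencoded",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    return run


def time_query(run, repeats=5, warmup=1):
    """Call `run` `warmup` times untimed, then return the median of `repeats` timed calls (seconds)."""
    for _ in range(warmup):
        run()  # warm caches before measuring
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For each query of interest, compare `time_query(sparql_runner(ontop_url, q))` with `time_query(sparql_runner(blazegraph_url, q))`, where both URLs are placeholders for wherever the services run. The median is used so a single slow outlier does not dominate the result; the one-off cost of loading the materialized triples should be measured separately.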

Dockerize the Ontop service

Adapt the GraphDB memory threshold in Docker

The RAM threshold (1 GB) appears to limit the loading of RDF triples into GraphDB.
Proposed fix: increase the threshold.

Jupyter: various graphs to display SQL output

Nice to have: select boxes for the X and Y variables.
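A minimal sketch of the select boxes, assuming the SQL output is available as a list of dicts (one per row) and that ipywidgets and matplotlib are installed in the notebook environment. `ipywidgets.interact` turns a list of options into a drop-down per argument; `xy_choices` and `show_xy_selector` are hypothetical names.

```python
def xy_choices(rows):
    """Return (x_options, y_options) for SQL output given as a list of dicts.

    Any column can go on the X axis; Y options are limited to columns whose
    non-NULL values are all numbers (an all-NULL column also qualifies).
    """
    columns = list(rows[0]) if rows else []

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    y_options = [
        c for c in columns
        if all(is_number(r[c]) for r in rows if r[c] is not None)
    ]
    return columns, y_options


def show_xy_selector(rows):
    """Render X/Y drop-downs in a Jupyter notebook and redraw a scatter plot on change."""
    # Imported here so the helper above can be used without these packages.
    import ipywidgets as widgets
    import matplotlib.pyplot as plt

    x_options, y_options = xy_choices(rows)
    if not y_options:
        raise ValueError("no numeric column available for the Y axis")

    def plot(x, y):
        # Skip rows where either selected value is NULL.
        pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
        plt.figure()
        plt.scatter([p[0] for p in pairs], [p[1] for p in pairs])
        plt.xlabel(x)
        plt.ylabel(y)
        plt.show()

    # interact() creates one drop-down per list-valued keyword argument.
    widgets.interact(plot, x=x_options, y=y_options)
```

Other graph types (bar, histogram) could be offered the same way by adding a third drop-down for the plot kind.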

Create a dcm4che Docker image

Create a dcm4che Docker image to use in the image pathway.

Make data mapping easier using UI

This is possible in Protégé, but only as an Ontop-specific (proprietary) solution.

The question is how we can create an R2RML mapping with a graphical tool that helps the user select the right tables, columns, and values.
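Whatever the UI looks like, its output could be a standard R2RML TriplesMap built from the user's selections (table, subject IRI template, class, and a predicate per column). This is a minimal sketch of that last step, assuming one table per map and one plain column per predicate; `r2rml_triples_map` and `turtle_string` are hypothetical helpers, while the `rr:` terms are the W3C R2RML vocabulary.

```python
def turtle_string(value):
    """Quote a Python string as a Turtle string literal (escaping backslashes and quotes)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def r2rml_triples_map(name, table, subject_template, rdf_class, columns):
    """Build one R2RML TriplesMap (Turtle) from a UI-style selection.

    name: local name for the map; table: SQL table name;
    subject_template: rr:template for subject IRIs, e.g. ".../patient/{patient_id}";
    rdf_class: class IRI for the subjects; columns: {predicate IRI: column name}.
    """
    lines = [
        "@prefix rr: <http://www.w3.org/ns/r2rml#> .",
        "",
        f"<#{name}> a rr:TriplesMap ;",
        f"    rr:logicalTable [ rr:tableName {turtle_string(table)} ] ;",
        f"    rr:subjectMap [ rr:template {turtle_string(subject_template)} ; rr:class <{rdf_class}> ]",
    ]
    for predicate, column in columns.items():
        lines[-1] += " ;"  # another predicate-object map follows
        lines.append(
            f"    rr:predicateObjectMap [ rr:predicate <{predicate}> ; "
            f"rr:objectMap [ rr:column {turtle_string(column)} ] ]"
        )
    lines[-1] += " ."  # terminate the TriplesMap statement
    return "\n".join(lines)


print(r2rml_triples_map(
    "PatientMap",
    "patients",
    "http://example.org/patient/{patient_id}",
    "http://example.org/Patient",
    {"http://example.org/age": "age"},
))
```

The example IRIs are placeholders; in a real tool the drop-downs would be filled from the database catalog and the predicates from the target ontology.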

Jupyter: user input SQL query

SQL syntax highlighting
index for pandas dataframe

Start GraphDB with an empty database and the default R2RML

Currently, the user has to manually create two new repositories in GraphDB when starting the system:

One for the database
One for the R2RML file
This should be automated as part of starting GraphDB.
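A minimal sketch of such a startup step, assuming GraphDB exposes the RDF4J-style `GET /repositories` listing and accepts a repository config file (Turtle) uploaded as multipart field `config` to `POST /rest/repositories`; both should be checked against the GraphDB version in use. The base URL, repository IDs, config file paths, and function names are hypothetical.

```python
import json
import os
import urllib.request
import uuid

GRAPHDB_URL = "http://localhost:7200"  # assumption: adjust to the GraphDB container's address


def parse_repository_ids(sparql_json):
    """Extract repository IDs from the SPARQL-results JSON of GET /repositories."""
    data = json.loads(sparql_json)
    return {b["id"]["value"] for b in data["results"]["bindings"]}


def list_repositories(base_url=GRAPHDB_URL):
    """List existing repository IDs via the RDF4J-style /repositories endpoint."""
    req = urllib.request.Request(
        base_url + "/repositories",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_repository_ids(resp.read())


def missing_repositories(existing, required):
    """Return the required repository IDs that do not exist yet, in order."""
    return [r for r in required if r not in existing]


def create_repository(config_path, base_url=GRAPHDB_URL):
    """Upload a repository config (Turtle) as multipart field 'config' to /rest/repositories."""
    boundary = uuid.uuid4().hex
    with open(config_path, "rb") as f:
        payload = f.read()
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="config"; '
        f'filename="{os.path.basename(config_path)}"\r\n'
        "Content-Type: text/turtle\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    req = urllib.request.Request(
        base_url + "/rest/repositories",
        data=head + payload + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def ensure_repositories(configs, base_url=GRAPHDB_URL):
    """Create every repository in `configs` ({repo_id: config_path}) that is missing.

    Each config file must declare the same repository ID as its key.
    """
    existing = list_repositories(base_url)
    for repo_id in missing_repositories(existing, list(configs)):
        create_repository(configs[repo_id], base_url)
```

A container entrypoint could call something like `ensure_repositories({"database": "database-config.ttl", "r2rml": "r2rml-config.ttl"})` once GraphDB is up. Because existing repositories are skipped, restarting the system does not fail or duplicate them.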

SQL server support for the R2RML conversion tool

The R2RML conversion tool cannot handle different SQL dialects, e.g. accessing an external database server versus directly accessing the PostgreSQL database within our system.
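To make "different SQL dialects" concrete: even a simple preview query differs between PostgreSQL and, assuming the server database is Microsoft SQL Server, T-SQL, in both identifier quoting and row limiting. This sketch only illustrates two such differences a converter would need to handle; it is not how the conversion tool works, and the function names are hypothetical.

```python
def quote_identifier(name, dialect):
    """Quote a table/column name for the given dialect, escaping the closing delimiter."""
    if dialect == "postgresql":
        return '"' + name.replace('"', '""') + '"'
    if dialect == "mssql":
        return "[" + name.replace("]", "]]") + "]"
    raise ValueError(f"unsupported dialect: {dialect}")


def preview_query(table, columns, n, dialect):
    """Build a 'first n rows' query; PostgreSQL uses LIMIT, SQL Server uses TOP."""
    cols = ", ".join(quote_identifier(c, dialect) for c in columns)
    tbl = quote_identifier(table, dialect)
    if dialect == "postgresql":
        return f"SELECT {cols} FROM {tbl} LIMIT {int(n)}"
    if dialect == "mssql":
        return f"SELECT TOP ({int(n)}) {cols} FROM {tbl}"
    raise ValueError(f"unsupported dialect: {dialect}")


print(preview_query("patients", ["patient_id", "age"], 5, "postgresql"))
print(preview_query("patients", ["patient_id", "age"], 5, "mssql"))
```

Keeping dialect-specific SQL behind one small function like this lets the rest of a pipeline stay dialect-agnostic.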

MIA Docker Image

A Docker image containing all MIA microservices except the UniversalWorker

Clinical Trial Processor Docker

We need a standard CAT CTP Docker image to anonymize the DICOM data

ReadMe - move "configuration of the infrastructure"?

@jvsoest: The paragraph on configuring the infrastructure is not entirely clear to me. Is it aimed at people attending a workshop or hackathon at Maastro?
Maybe we could move this paragraph to the documentation files and keep only generic, high-level install, run, and usage instructions in the ReadMe.

Advantages:

we don't need to reload the ontology every time
we can use the ontology during querying (using SERVICE <….> clauses in SPARQL)
we can use it in the mapping tool, including subClassOf* (transitive subclass) reasoning. Some terminology options are logical conjunctions of concepts, such as "delineation" AND "specific organ/tissue".
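The subClassOf* reasoning mentioned above is what the SPARQL 1.1 property path `?c rdfs:subClassOf* <root>` computes: every class reachable from the root by following subclass links backwards, including the root itself (zero-length path). A mapping tool could offer exactly this set as valid choices. A minimal sketch over (child, parent) pairs; the function name and the class labels are hypothetical.

```python
from collections import defaultdict, deque


def subclasses_star(pairs, root):
    """Return every class C with C rdfs:subClassOf* root, given (child, parent) pairs.

    Like the SPARQL property path, the result includes `root` itself; the `seen`
    set also makes the search terminate on cyclic hierarchies.
    """
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk down the hierarchy
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


pairs = [("GTV", "Delineation"), ("GTVPrimary", "GTV"), ("Lung", "Organ")]
print(sorted(subclasses_star(pairs, "Delineation")))  # ['Delineation', 'GTV', 'GTVPrimary']
```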