aksw / xlsx2owl

xlsx2owl is a tool for collecting classes and properties with a spreadsheet and converting the input to an ontology

License: Mozilla Public License 2.0

Dockerfile 2.16% Nextflow 2.57% Python 81.93% Shell 13.34%
ontology-engineering rdf rml yarrrml

xlsx2owl's Introduction

xlsx2owl

Collecting concepts for an ontology in a spreadsheet benefits from the large number of people who work with spreadsheet software every day. With xlsx2owl, domain experts can add and edit concepts in a special spreadsheet structure. The spreadsheet can then easily be translated to a Turtle-encoded OWL file with the YARRRML/RML mapping and shell script contained in this repository.

[Figure: conversion process]

Usage

Usage information as printed with the --help option:

usage:
xlsx2owl.sh [options] [download-url]

positional parameters:
  [download-url]:
            optional url from where the spreadsheet gets downloaded.
            Take local input file if no download-url is given.
            needs to start with a protocoll (e.g. https://...),
            supported protocolls are http, https and file.
named parameters:
  -h, --help:
            produces this usage info and exits.
  -d, --debug :
            enable debug output
  -y, --yarrrml <FILE> :
            path <FILE> to the yarrrml mapping file to use.
            defaults to 'yarrrml.yml' in the scripts directory.
  -o, --outputPrefix <PREFIX> :
            set prefix <PREFIX> for the generated output.
            default 'rdf-out', relativ to the current working directory.
  -i, --input <FILE> :
            path <FILE> to the input spreadsheet file to use
            or where to store spreadfile if download url is given.
            default 'xlsx2owl-tmp.xlsx'
  --noPreprocess :
            skip csv preprocessing.
            Otherwise additional metadata columns get added to CSV files:
            "xlsx2owl_rowNumber", "xlsx2owl_filename", "xlsx2owl_sheetname", "xlsx2owl_datetime", "xlsx2owl_version"
  --test <FILE>:
            enable test mode, use <FILE> as expected result ttl file to diff against.
            In test mode the current time value is fixed to '2024-01-01T00:00:00+00:00'.
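
The preprocessing step described in the help text can be pictured with a small Python sketch. It is not the code xlsx2owl runs: the function name `add_metadata_columns`, the sample sheet name, the placement of the new columns at the end of each row and the row numbering starting at 1 are assumptions made for illustration. Only the five column names and the fixed test-mode timestamp come from the usage text above.

```python
import csv
import io

# Column names as listed in the --help output.
METADATA_COLUMNS = [
    "xlsx2owl_rowNumber",
    "xlsx2owl_filename",
    "xlsx2owl_sheetname",
    "xlsx2owl_datetime",
    "xlsx2owl_version",
]


def add_metadata_columns(csv_text, filename, sheetname, datetime_value, version):
    """Return csv_text with the metadata columns appended to every row.

    Hypothetical helper: numbering the first data row as 1 is an
    assumption, not documented behaviour of xlsx2owl.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        # Empty input: nothing to annotate.
        return ""
    writer.writerow(header + METADATA_COLUMNS)
    for row_number, row in enumerate(reader, start=1):
        writer.writerow(
            row + [row_number, filename, sheetname, datetime_value, version]
        )
    return out.getvalue()


sample = "name,label\nPerson,person\nAgent,agent\n"
result = add_metadata_columns(
    sample,
    filename="xlsx2owl-Example.xlsx",
    sheetname="classes",  # hypothetical sheet name
    datetime_value="2024-01-01T00:00:00+00:00",  # fixed value used in test mode
    version="2.2.1",
)
print(result)
```

With such columns available, a mapping can refer to the origin of each row (file, sheet, row number) in the generated output. Passing --noPreprocess skips this step, so the CSV files reach the mapping unchanged.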

Run as docker container

  • Build the docker image:
    • either manually via $ buildah bud -t xlsx2owl ./
    • or with nextflow via `
  • Run xlsx2owl with the docker image on xlsx2owl-Example.xlsx in the current directory, e.g. via $ podman run -it --rm -v "./:/data/" xlsx2owl-sd --input '/data/xlsx2owl-Example.xlsx'

Run test

Run the docker image with the test input, e.g. via $ podman run -it --rm -v "./test:/data/" xlsx2owl --debug -o /data/test-out --test /data/test-vocab-output.ttl 'file:///data/test-input.xlsx'. With the --test option, diff is called at the end to compare the result with the expected file. Successful output should end like the following:

diffing against test file
diff test passed
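
What the --test comparison amounts to can be sketched in Python with the standard difflib module. This is an illustration only: xlsx2owl itself calls diff from its shell script, and the function name `diff_test`, the example triples and the "diff test failed" message are assumptions. Only the two messages shown above come from the tool's output.

```python
import difflib


def diff_test(actual_text, expected_text):
    """Compare generated Turtle text with the expected text.

    Returns (passed, diff_lines). Hypothetical helper for illustration.
    """
    diff_lines = list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
    # An empty diff means both texts are line-by-line identical.
    return (not diff_lines, diff_lines)


expected = "<http://example.org/A> a <http://www.w3.org/2002/07/owl#Class> .\n"
actual_ok = expected
actual_bad = expected.replace("/A>", "/B>")

print("diffing against test file")
passed, diff_lines = diff_test(actual_ok, expected)
print("diff test passed" if passed else "diff test failed")

# A deviating result produces a non-empty diff that can be inspected.
passed_bad, diff_bad = diff_test(actual_bad, expected)
print("\n".join(diff_bad))
```

A plain line-based comparison like this only works when the output is reproducible, which is why test mode fixes the current time value.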

Dependencies

  • YARRRML parser
  • RMLMapper
  • xlsx2csv
  • curl
  • Python 3 (for xlsx2csv)
  • Java 17 (for RMLMapper)
  • node.js 20 (for YARRRML)
  • bash >=5 (for bash regex support)
  • getopt from util-linux (for argument parsing)

History

  • Version 2.2.1 (2024-03-11):
    • fixed xlsx2owl_filename metadata column when url and --input parameter given
    • minor updates in documentation
  • Version 2.2.0 (2024-02-20):
    • added additional metadata
    • added static csv columns (row number, sheetname, date, ...) as additional mapping input (disable with parameter --noPreprocess)
    • added option --test to automate basic testing by comparing computed result to expected result
  • Version 2.1.1 (2023-11-09):
    • fixed unintended rendering of (foreign) prefixed subjects of classes, relations or attributes
    • fixed Dockerfile bash install
    • updated to SeMiFuLi 0.2.1 (adds 'contains' function), RML-Mapper 6.2.2, xlsx2csv 0.8.1+ (from 11/2023)
    • updated Dockerfile to java 21 (from eclipse temurin), nodejs 21, [email protected]
    • switched Dockerfile to node:21-alpine for a reduced Dockerfile and faster build
    • minor fixes in xlsx2owl.sh
    • switched GitLab-CI container build process to kaniko
  • Version 2.0 (2023-07-24):
    • renamed script from xlsx2owl-StahlDigital.sh to xlsx2owl.sh
    • updated to RML-Mapper 6.2.1, SeMiFuLi 0.2, java 17
    • updated Dockerfile to debian 11, java 17, nodejs 20
    • added resources/prefixes.csv to Dockerfile
  • Version 1.0 (2023-05-05):
    • first public version

Acknowledgement

We want to kindly thank Sebastian Tramp and eccenca GmbH for the initial idea, spreadsheet structure and rich input.

Work for this has been funded by the German Federal Ministry of Education and Research under grant number 13XP5116B.

License

This project is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

xlsx2owl's People

Contributors

kibubu, lpmeyer, seebi

xlsx2owl's Issues

Add basic test mode

At the moment we have a test directory with test input and expected output. Maybe we could automate this more.

Suggestion: add a --test parameter, use a fixed time value (where needed) and do a simple diff at the end between the result and a given file (or a constant test-vocab-output.ttl) containing the expected result.
