
dasch-swiss / dsp-api


DaSCH Service Platform API

Home Page: http://admin.dasch.swiss

License: Apache License 2.0

Topics: ontologies, humanities, rdf, triplestore

dsp-api's Introduction

DSP-API – DaSCH Service Platform API


DSP is a server application for storing, sharing, and working with primary sources and data in the humanities.

It is developed by the Swiss National Data and Service Center for the Humanities at the University of Basel, and is supported by the Swiss Academy of Humanities and Social Sciences and the Swiss National Science Foundation.

DSP-API is free software, released under the Apache License, Version 2.0.

Features

  • Stores humanities data as industry-standard RDF graphs, plus files for binary data such as digitized primary sources.
    • Designed to work with any standards-compliant RDF triplestore. Tested with Jena Fuseki.
  • Based on OWL ontologies that express abstract, cross-disciplinary commonalities in the structure and semantics of research data.
  • Offers a generic HTTP-based API, implemented in Scala, for querying, annotating, and linking together heterogeneous data in a unified way.
    • Handles authentication and authorization.
    • Provides automatic versioning of data.
  • Uses Sipi, a high-performance media server implemented in C++.
  • Designed to be used with DSP-APP, a general-purpose, browser-based virtual research environment, as well as with custom user interfaces.

Requirements

For developing and testing DSP-API

Each developer machine should have the following prerequisites installed:

JDK Temurin 21

Follow the steps described on https://sdkman.io/ to install SDKMAN. Then, follow these steps:

sdk ls java  # choose the latest version of Temurin 21
sdk install java 21.x.y-tem

SDKMAN will take care of the environment variable JAVA_HOME.

For building the documentation

See docs/Readme.md.

Try it out

Run DSP-API

Create a test repository, load some test data into the triplestore, and start DSP-API:

just stack-init-test

Open http://localhost:4200/ in a web browser.

On first installation, errors similar to the following can come up:

error decoding 'Volumes[0]': invalid spec: :/fuseki:delegated: empty section between colons

To solve this, you need to deactivate Docker Compose V2. This can be done in Docker Desktop either by unchecking the "Use Docker Compose V2" flag under "Preferences > General" or by running

docker-compose disable-v2

Shut down DSP-API:

just stack-stop

Run the automated tests

Automated tests are split into two source sets: slow-running integration tests (i.e., tests that do IO or use Testcontainers) and fast-running unit tests.

Run unit tests:

sbt test

Run integration tests:

make integration-test

Run all tests:

make test-all

Release Versioning Convention

The DSP-API release versioning follows the Semantic Versioning convention:

Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes.

Additionally, we increment the MAJOR version whenever any change to existing data would be necessary, e.g., changes to the knora-base ontology that are not backwards compatible.
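As a rough illustration of the convention (a sketch only; the Change type and its cases are invented for this example and are not project code):

// Sketch: deciding which version component to bump under this convention.
sealed trait Change
case object IncompatibleApiOrDataChange extends Change // includes non-backwards-compatible knora-base changes
case object BackwardsCompatibleFeature extends Change
case object BackwardsCompatibleBugfix extends Change

final case class Version(major: Int, minor: Int, patch: Int) {
  def bump(change: Change): Version = change match {
    case IncompatibleApiOrDataChange => Version(major + 1, 0, 0)
    case BackwardsCompatibleFeature  => Version(major, minor + 1, 0)
    case BackwardsCompatibleBugfix   => Version(major, minor, patch + 1)
  }
}

For example, Version(2, 3, 1).bump(BackwardsCompatibleFeature) yields Version(2, 4, 0).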

dsp-api's People

Contributors

alexellis, balduinlandolt, daschbot, dependabot[bot], ericsommerhalder, gfoo, jnussbaum, kilchenmann, loicjaouen, lrosenth, lukasstoeckli, mdelez, mpro7, musicenfanthen, nemmk, nora-olivia-ammann, ralfbarkow, ridaayed, samuelboerlin, seakayone, sepidehalassi, siers, snyk-bot, subotic, tobiasschweizer


dsp-api's Issues

Fix inconsistencies in SALSAH export script

The export script produces some inconsistencies in the Incunabula ontology and data, and I haven't figured out how to deal with them.

Cardinalities

In incunabula:book

  • title has owl:cardinality 1, but book http://data.knora.org/e41ab5695c has two titles (so the cardinality could be owl:minCardinality 1; see the sketch after this section).
  • publoc has owl:cardinality 1, but book http://data.knora.org/de6d83112e04 has no publoc (so the cardinality could be owl:maxCardinality 1).
  • publisher has owl:maxCardinality 1, but book http://data.knora.org/9311a421b501 has two publishers (so the cardinality could be owl:minCardinality 0).

In incunabula:page

  • pagenum has owl:cardinality 1, but page http://data.knora.org/f84653d34004 has no pagenum (so the cardinality could be owl:maxCardinality 1).
  • seqnum has owl:cardinality 1, but page http://data.knora.org/f84653d34004 has no seqnum (so the cardinality could be owl:maxCardinality 1).

In incunabula:Sideband

  • description has owl:cardinality 1, but sideband http://data.knora.org/684a7a4ec5d7 has no description (so the cardinality could be owl:maxCardinality 1).
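For the title case above, relaxing the restriction might look as follows (a Turtle sketch, wrapped in a Scala string for illustration; prefixes are abbreviated and the real ontology may differ):

// Sketch of a relaxed cardinality for incunabula:title (illustrative Turtle).
val relaxedTitleCardinality: String =
  """:book rdfs:subClassOf [ rdf:type owl:Restriction ;
    |                        owl:onProperty :title ;
    |                        owl:minCardinality "1"^^xsd:nonNegativeInteger ] .
    |""".stripMargin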

Property subject class constraints

  • incunabula:description has knora-base:subjectClassConstraint :Sideband, but in the exported data, its subject is sometimes a :book or a :page.
  • incunabula:citation has knora-base:subjectClassConstraint :page, but in the exported data, its subject can be either a :book or a :page.

Reference to nonexistent resource

Page http://data.knora.org/9374cc0a4f03 says it has a left sideband http://data.knora.org/bb4a4d4758e722, but that sideband doesn't exist. If I view the page in the old SALSAH (page l8r in [Das] Narrenschiff (dt.)), it's clear that there's no sideband on that page.

Incorrect references to external resources

http://data.knora.org/c30e76e55204 is a knora-base:ExternalResource. It should have a knora-base:hasExtResValue property pointing to a knora-base:ExternalResValue, but instead it has knora-base:extResId and knora-base:extResProvider, each of which points to a knora-base:ExternalResValue. The two instances of knora-base:ExternalResValue have the same contents.

The other external resources have the same problem.
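If I understand the intended model correctly, the expected shape would be something like this (a Turtle sketch wrapped in a Scala string for illustration; the properties on the ExternalResValue are my assumption, and the literal values are placeholders):

// Sketch of the expected structure (illustrative Turtle; values are placeholders).
val expectedExternalResource: String =
  """<http://data.knora.org/c30e76e55204> rdf:type knora-base:ExternalResource ;
    |    knora-base:hasExtResValue [ rdf:type knora-base:ExternalResValue ;
    |                                knora-base:extResId "some-id" ;
    |                                knora-base:extResProvider "some-provider" ] .
    |""".stripMargin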

Regions without isRegionOf

These regions have no isRegionOf (required by the cardinality on Region):

  • http://data.knora.org/c9905d015030
  • http://data.knora.org/8e96ec3b5030
  • http://data.knora.org/d8e282a38b32
  • http://data.knora.org/707f8edf3935
  • http://data.knora.org/d44533274407

It makes no sense to have a region that isn't a region of something, so probably these regions should be excluded from the export.

Page without isPartOf

The page http://data.knora.org/3c94b29fb90f02 has no incunabula:partOf (required by the cardinality on page). It makes no sense to have a page that isn't part of a book, so probably this page should be excluded from the export.

Off by one

In incunabula:seqnum, valueHasString is different from valueHasInteger. For example, in page http://data.knora.org/9374cc0a4f03, the seqnum is http://data.knora.org/9374cc0a4f03/values/ac5f87e52f0d. Its valueHasInteger is 177, but its valueHasString is 176.

Implement automated performance tests

  • Test the performance and scalability of the Knora API server with the fake triplestore
  • Test performance with different real triplestores
  • Have the performance tests run automatically on the test server

FOAF Properties Naming

In OntologyConstants we have the following:

object Foaf {
    val FirstName = "http://xmlns.com/foaf/0.1/firstName"
    val LastName = "http://xmlns.com/foaf/0.1/lastName"
}

In knora-base.ttl we have the following definition:

###  http://www.knora.org/ontology/knora-base#User

:User rdf:type owl:Class ;

      rdfs:subClassOf <http://xmlns.com/foaf/0.1/Person> ,
                      [ rdf:type owl:Restriction ;
                        owl:onProperty <http://xmlns.com/foaf/0.1/familyName> ;
                        owl:cardinality "1"^^xsd:nonNegativeInteger
                      ] ,
                      ...
                      [ rdf:type owl:Restriction ;
                        owl:onProperty <http://xmlns.com/foaf/0.1/givenName> ;
                        owl:cardinality "1"^^xsd:nonNegativeInteger
                      ], 
                      ...

      rdfs:comment "Represents a Knora user"@en .

According to the FOAF spec, we have these options: foaf:givenName, which is used alongside foaf:familyName, and foaf:firstName, which is used alongside foaf:lastName. The spec prefers the givenName/familyName combination.

I suppose that we should then stick to foaf:givenName / foaf:familyName?

Also, there is a mix of both usages in the test/demo data, which I would like to clean up as soon as it is clear what we will use.

@lrosenth
@benjamingeer
@tobiasschweizer

Ensure that WHERE clauses in updates don't return too many rows

If a WHERE clause in a SPARQL update returns more rows than expected, this can cause an INSERT clause to insert duplicate data. For example, in addValueVersion.scala.txt, the WHERE clause should return at most one row. However, this depends on certain assumptions about the data in the triplestore, e.g. that a resource will have at most one knora-base:lastModificationDate. If the resource being updated has two lastModificationDate triples, the WHERE clause will return two rows, one for each, and the INSERT clause will be executed twice. If the value being inserted contains standoff markup, which is represented as blank nodes, duplicate standoff nodes will be inserted.

My first thought was that this could be prevented by adding LIMIT 1 to the end of the WHERE clause, but it seems that SPARQL Update doesn't allow this. I think it should be possible to use LIMIT by putting the contents of the WHERE clause in a subquery. I made a brief attempt to get this to work, but it wasn't successful; the WHERE clause just returned no results. It would be worth trying it again.
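The subquery idea might look roughly like this (a sketch, with the SPARQL held in a Scala string for illustration; whether the triplestore honours the LIMIT here is exactly what needs testing):

// Sketch: wrapping the WHERE contents in a subquery so that LIMIT can be applied (illustrative SPARQL).
val limitedUpdate: String =
  """PREFIX knora-base: <http://www.knora.org/ontology/knora-base#>
    |DELETE { ?resource knora-base:lastModificationDate ?current . }
    |INSERT { ?resource knora-base:lastModificationDate ?new . }
    |WHERE {
    |  {
    |    SELECT ?resource ?current
    |    WHERE { ?resource knora-base:lastModificationDate ?current . }
    |    LIMIT 1
    |  }
    |  BIND(NOW() AS ?new)
    |}
    |""".stripMargin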

Change in CoreBooted prevented server from starting, needs explanation

One of the changes in pull request #35 prevented re-start from starting the API server:

-    implicit lazy val system = ActorSystem("webapi")
+    implicit def system = ActorSystem("webapi")

The symptom was that the server would appear to start (the console messages looked normal), but any HTTP request from a browser would time out. The user would just get the message "The server was not able to produce a timely response to your request" in the browser, and there would be no additional output on the console.

I changed it back to lazy val, and now re-start works again. I don't understand why this needs to be a lazy val, or why it was changed to a def. This needs a comment in the code to explain it, so the same problem doesn't happen again.

Also, the Scaladoc comments for Core and CoreBooted are not very helpful. They say that these classes are part of the cake pattern, but it would be better to explain what the cake pattern is accomplishing here. Is the intention that we could make another class like KnoraService, but with a fake ActorSystem? If so, why would we want to do that?

add delete file route to Sipi

This route is meant to delete a file after it has already been successfully created by Sipi, because when looking at its type we find out that it is not what Knora expects (e.g., we expect an image, but Sipi created an audio file).

So we have to delete the file from Sipi because its file value will not be created in Knora.

Search responder queries are not deterministic

Two problems:

  1. In a full-text search (e.g. search for 'Orationes' in Incunabula), if a resource has a matching label and a matching value, Fuseki and GraphDB return different results: Fuseki returns the match for the resource label, while GraphDB returns the one for the text value. Both appear to be correct: we are using SAMPLE, so each triplestore is returning a different random result. Try to find another approach so that they return the same result (see the sketch after this list).
  2. GraphDB returns ontology entity labels in the wrong language (again, this seems to be because we have erroneously requested a random one, rather than the one in the user's preferred language as intended). A good solution would probably be to remove all ontology-related things from the search queries, and instead to have the search responder ask the ontology responder for that information.
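One possible direction for the first problem (a sketch, with the SPARQL held in a Scala string for illustration; replacing SAMPLE with MIN makes the representative match deterministic, at the cost of always picking the lexicographically smallest one):

// Sketch: a deterministic aggregate instead of SAMPLE (illustrative SPARQL).
val deterministicMatch: String =
  """PREFIX knora-base: <http://www.knora.org/ontology/knora-base#>
    |SELECT ?matchingSubject (MIN(?literal) AS ?match)
    |WHERE {
    |  ?matchingSubject knora-base:valueHasString ?literal .
    |  # full-text match condition on ?literal goes here
    |}
    |GROUP BY ?matchingSubject
    |""".stripMargin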

Ontology responder caches entity info filtered by language

So if someone requests information about a resource class in German, the ontology responder will cache that, and then if someone requests information about the same resource class in French, they'll get the cached German version.

The ontology responder should either:

  • Cache the raw SPARQL query results, and filter them by language on each request (sketched below), or
  • Cache them separately by language.
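The first option could look roughly like this (a sketch; EntityInfo and queryEntityInfo are invented names, not actual DSP-API code):

import scala.collection.concurrent.TrieMap

// Sketch: cache unfiltered query results per entity, filter by language per request.
final case class EntityInfo(labels: Map[String, String]) // language code -> label

object OntologyCacheSketch {
  private val cache = TrieMap.empty[String, EntityInfo] // keyed by entity IRI only, not by language

  // Placeholder for the raw SPARQL query returning labels in all languages.
  def queryEntityInfo(entityIri: String): EntityInfo = ???

  def getLabel(entityIri: String, preferredLang: String, fallbackLang: String): Option[String] = {
    val info = cache.getOrElseUpdate(entityIri, queryEntityInfo(entityIri))
    info.labels.get(preferredLang).orElse(info.labels.get(fallbackLang))
  }
}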

Userdata includes the user's password hash

When authenticator.getUserProfileV1 returns a UserProfileV1, that object includes a UserDataV1 containing a hash of the user's password. This hash can easily end up being returned to the client in an API response. Can we remove it from UserDataV1?
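One option would be to keep the hash out of UserDataV1 entirely and carry it only on the profile used during authentication, stripping it before the profile can reach a response (a sketch; the field lists are invented for illustration):

// Sketch: keep the password hash off the object that gets serialized to clients.
case class UserDataV1(userIri: String, username: String) // no password hash here

case class UserProfileV1(userData: UserDataV1, passwordHash: Option[String]) {
  // Drop credentials before the profile is included in an API response.
  def withoutCredentials: UserProfileV1 = copy(passwordHash = None)
}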

Creating file information for Sipi

In Sipi responder's method getFileInfoForSipiV1, a SipiFileInfoGetResponseV1 is created and returned containing the permissions the user has on the file and the path/name of the file.

So this allows Sipi to send a fileValueIri to Knora and get the user's permissions on the file and also the actual file name (so Sipi can find it on the file system and read it).

This happens after the client has sent an IIIF URL naming a fileValueIri to the Sipi server. Why would the Sipi responder return another URL for accessing the file, when the client already used such a URL to make the request?

fileValueV1: FileValueV1 => valueUtilV1.makeSipiFileGetUrlFromFileValueV1(fileValueV1)

...


    /**
      * Creates a URL for accessing a file via Sipi. // TODO: implement this correctly.
      *
      * @param fileValueV1 the file value that the URL will point to.
      * @return a Sipi URL.
      */
    // TODO: if this is a StillImageFileValue, create a IIIF URL
    def makeSipiFileGetUrlFromFileValueV1(fileValueV1: FileValueV1): String = {
        makeSipiFileGetUrlFromFilename(fileValueV1.internalFilename)
    }

    /**
      * Creates a URL for accessing a file via Sipi. // TODO: implement this correctly.
      *
      * @param filename the name of the file that the URL will point to.
      * @return a Sipi URL.
      */
    def makeSipiFileGetUrlFromFilename(filename: String): String = {
        s"${settings.sipiUrl}/$filename"
    }

I think we have two different cases here:

  1. Knora creates an IIIF URL that the client can use to request an image from Sipi.
  2. Sipi has to ask Knora about the user's permissions on the file and about the internal filename, in case it got a fileValueIri in the IIIF URL (possibly this isn't even necessary, as Knora could directly put the file's name in the URL, also making use of the prefix to declare the project).

Don't use rdfs:range to check data consistency

Currently we use rdfs:range for consistency checks when creating values: if a property P has an rdfs:range of class C, we check that the value submitted is a C. In other words, we use rdfs:range to mean that if a value is not a C, it's invalid to make it an object of P.

Ontotext pointed out that this isn't actually what rdfs:range means. As its definition says, it really means that if something is an object of P, it is a C. In other words, rdfs:range is for inferring the type of the value, not for imposing constraints. They suggest we use another predicate to avoid confusion.

In fact, as far as I can tell, there's no standard predicate for this, so perhaps we should add one to knora-base.
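Such a predicate could be declared along these lines (a sketch; the name objectClassConstraint simply mirrors the knora-base:subjectClassConstraint mentioned elsewhere in this tracker, and the Turtle is wrapped in a Scala string for illustration):

// Sketch of a constraint predicate for knora-base (illustrative Turtle, invented name).
val objectClassConstraintDecl: String =
  """knora-base:objectClassConstraint rdf:type rdf:Property ;
    |    rdfs:comment "Requires the object of a property to belong to the given class"@en .
    |""".stripMargin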

Consider refactoring the ontology cache

Currently each combination of entity IRI and preferred language is queried and cached separately. Consider instead caching the unfiltered query results as discussed in pull request #26.

How to create an ActorRef outside an actor?

Hi

What would be the best way to create an ActorRef outside an Actor (outside a class that extends Actor)?

On my branch wip/setup_fake_responders, I created a TestResponderManagerV1 that extends ResponderManager and has access to all its definitions of the live routes.

However, each of these routes can be overridden in case someone passes a mock or fake responder:

override val sipiRouter = actor("mocksipi")(new Act {
    become {
        case sipiResponderConversionFileRequest: SipiResponderConversionFileRequestV1 =>
            future2Message(sender(), imageConversionResponse(sipiResponderConversionFileRequest), log)
        case sipiResponderConversionPathRequest: SipiResponderConversionPathRequestV1 =>
            future2Message(sender(), imageConversionResponse(sipiResponderConversionPathRequest), log)
    }
})

What would be the best way to create an actor in a test like SipiV1E2ESpec to pass it to TestResponderManagerV1 then?

The problem is that extending Actor and ActorLogging provides a lot of things that you otherwise have to deal with explicitly, like the logger or the ActorContext (an implicit ActorRefFactory is required: outside of an Actor you need an implicit ActorSystem, while inside an actor this is the implicit ActorContext).

So somehow I am breaking the design by creating those actors outside of an Actor.
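For what it's worth, outside an actor the ActorSystem itself can serve as the ActorRefFactory, and in tests Akka TestKit's TestProbe hands you a ready-made ActorRef (a sketch, assuming classic Akka actors; MockSipiResponder is invented for illustration):

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.testkit.TestProbe

object OutsideActorSketch {
  implicit val system: ActorSystem = ActorSystem("test")

  // 1. The ActorSystem is an ActorRefFactory, so actorOf works outside any actor.
  class MockSipiResponder extends Actor {
    def receive: Receive = { case msg => sender() ! msg } // echo, just for illustration
  }
  val mockSipi: ActorRef = system.actorOf(Props(new MockSipiResponder), "mocksipi")

  // 2. In a test, a TestProbe provides an ActorRef plus expectations, with no Actor subclass at all.
  val probe = TestProbe()
  val probeRef: ActorRef = probe.ref
}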

CC: @subotic @benjamingeer

On resource creation, lastModificationDate is created more than once for the resource

When creating a resource, knora-base:lastModificationDate is created several times instead of just once.

How to reproduce this:

  • go to the branch wipi/sipi_integration
  • run Knora
  • in another terminal, call ./create_page_with_binaries.py in webapi/_test_data/test_route
  • copy the IRI of the new resource returned by Knora and search for it in the triplestore.

You will see that the property exists several times:

prop value
:lastModificationDate 2016-02-11T14:40:38.256+01:00
:lastModificationDate 2016-02-11T14:40:38.257+01:00
:lastModificationDate 2016-02-11T14:40:38.262+01:00
:lastModificationDate 2016-02-11T14:40:38.271+01:00
:lastModificationDate 2016-02-11T14:40:38.303+01:00

Now why is that? We both had a look at the SPARQL template createValue.scala.txt. There is a statement for lastModificationDate in the DELETE, INSERT, and WHERE clauses.

Store information about projects in the triplestore, not in application.conf

Currently application.conf contains a list of projects, along with the named graphs used by each project. It should not be necessary to change application.conf to create a project, because this requires restarting the server, and involves a Catch-22: you can't get the project IRI until the project is created, but you can't create it until you put its IRI in application.conf.

Instead, we could create a named graph called something like http://www.knora.org/data/config where a list of projects would be stored, along with the named graphs used for each project. Probably users should be stored in the same named graph, too, rather than with each project, since a user can belong to multiple projects.
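The proposed graph might contain entries along these lines (a Turtle sketch wrapped in a Scala string; the class and property names are invented for illustration):

// Sketch of the proposed config named graph (illustrative Turtle, invented vocabulary).
val configGraphSketch: String =
  """<http://data.knora.org/projects/incunabula> rdf:type knora-base:knoraProject ;
    |    knora-base:projectNamedGraph <http://www.knora.org/data/incunabula> .
    |""".stripMargin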

Password checking does not seem to work (authentication)

When I submit credentials (username and password), the password does not seem to be checked correctly. Only the existence of the username seems to be checked.

This happens both with HTTP header authentication and when the parameters are set in the URL (GET params).

Check the triplestore for inconsistencies

This could be either:

  • A program that we can run periodically to find inconsistencies.
  • A consistency checker that runs each time an update is performed. But this would pose a problem. To create a resource, we first create an empty resource, and then we add each value, one by one. The constraints won't be met until all the values are added.

Ideally, it would use the existing OWL constraints that are defined in the ontologies in the triplestore.

GraphDB can do consistency checks, but the documentation is not clear to me. I've emailed them to ask for clarification.

Implement monitoring

Steps:

  • measure response times (done in PR #1347)
  • expose metrics to Prometheus
  • provide a simple GUI to visualize the metrics

Running tests with graphdb

I have installed GraphDB SE 6.62 on my machine (Mac).

I ran the following scripts:

  • webapi/scripts/graphdb-se-load-test-data.sh (test)
  • webapi/scripts/graphdb-free-ci-prepare.sh (test-unit)

Now, using GraphDB with Knora (the test repo) works fine (set in application.conf), but the tests do not work with GraphDB:

Load test data *** FAILED *** (1 second, 828 milliseconds)

The test data cannot be loaded into test-unit. But the test-unit repository exists (I can access it over the web interface). I remember that we had this problem before, but I thought that running the script would solve it.

What am I doing wrong?

Make changing image servers easier

I would like to propose a change for application.conf to make exchanging image server settings easier. application.conf could look something like this:

app {
    ...
    imageserver {
        // type = webapi
        type = sipi
        webapi {
            url = "http://localhost"
            path = "/v1/assets"
            port = 3333
        }
        sipi {
            url = "http://localhost"
            path = ""
            port = 1024
            path-conversion-route = "convert_path"
            file-conversion-route = "convert_file"
            image-mime-types = ["image/tiff", "image/jpeg", "image/png", "image/jp2"]
            movie-mime-types = []
            sound-mime-types = []
        }
    }
    ...
}

Default permissions for knora-base:hasStillImageFileValue are missing in the ontology

For knora-base:hasStillImageFileValue, no default permissions are set.

This is because in incunabula-onto.ttl, incunabula:page is made a subclass of knora-base:StillImageRepresentation and inherits its property knora-base:hasStillImageFileValue (defined in knora-base.ttl), but without any default permission.

I suggest defining those default permissions in incunabula-onto.ttl, because this should be project-specific.

CC @benjamingeer

Handle valueHasComment in Search

In addition to valueHasString and rdfs:label, valueHasComment was added to the full-text index for the triplestore(s).

Do we have to adapt the fulltext search templates?

For Jena/Fuseki:

@if(triplestore == "embedded-jena-tdb" || triplestore == "fuseki") {
            ?matchingSubject knora-base:valueHasString ?literal .
            #?matchingSubject ?p ?literal .
            #FILTER(?p = knora-base:valueHasString || ?matchingProperty = rdfs:label)
        }

The literal could also be a comment!
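A minimal adaptation might follow the commented-out variant in the fragment above, matching on the predicate and filtering (a sketch, untested; rdfs:label matching may need the same treatment):

@if(triplestore == "embedded-jena-tdb" || triplestore == "fuseki") {
    ?matchingSubject ?p ?literal .
    FILTER(?p = knora-base:valueHasString || ?p = knora-base:valueHasComment)
}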

how to set up a fake sipi responder for testing?

I am now implementing the integration of Sipi into Knora.

In order to test this, it would be good to have a fake sipi responder that simulates the HTTP conversion request sent to Sipi. Otherwise, we could not test it without having Sipi running.

Could you help me with that?

Missing persons in images demo data

In images-demo-data, the consistency checker is finding a lot of references to missing resources of type images:person, which are the objects of properties like images:urheber. There are too many of these references to change them all manually. (In the original source data, there are hundreds of different :person resources.) We need to fix this somehow before consistency checking can be merged.

Sesame/GraphDB bug affecting the import of carriage returns

When we import RDF data containing a triple-quoted string that ends with a carriage return, the carriage return is stripped, apparently by Sesame:

https://openrdf.atlassian.net/browse/SES-425?jql=text%20~%20%22carriage%20return%22

https://openrdf.atlassian.net/browse/SES-1736?focusedCommentId=14423&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14423

It may be fixed in Sesame 2.7.9, in which case the fix should appear in GraphDB 7.0, which will use Sesame 2.8:

http://graphdb.ontotext.com/documentation/standard/roadmap.html

Problems when Adding Default Permissions to hasStillImageFileValue

When adding default permissions to hasStillImageFileValue, the ontology responder does not work properly anymore. The tests fail.

For example, if you want to create a page, this error occurs:

{"status":4,"error":"org.knora.webapi.InconsistentTriplestoreDataException: Resource class
 http://www.knora.org/ontology/incunabula#page has cardinalities for one or more link properties 
without corresponding link value properties. The missing link value property or properties: 
http://www.knora.org/ontology/incunabula#hasLeftSidebandValue, http://www.knora.org/ontology
/incunabula#hasRightSidebandValue"}

Steps to reproduce: add default permissions to hasStillImageFileValue in knora-base.ttl:

knora-base:hasDefaultRestrictedViewPermission knora-base:UnknownUser ;
knora-base:hasDefaultViewPermission knora-base:KnownUser ;
knora-base:hasDefaultModifyPermission knora-base:ProjectMember ,
                                      knora-base:Owner .

file values have no valueHasOrder

In the WHERE clause of addValueVersion.scala.txt, the pattern ?currentValue knora-base:valueHasOrder ?order had to be made optional, because otherwise file values would never be updated: they do not have a valueHasOrder.
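That is, the pattern in the template presumably now reads something like this (a sketch):

OPTIONAL { ?currentValue knora-base:valueHasOrder ?order . }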
