sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store that stores immutable, lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.

Home Page: https://sirix.io

License: BSD 3-Clause "New" or "Revised" License

Java 93.45% XSLT 0.10% XQuery 0.19% Kotlin 6.12% Dockerfile 0.03% Shell 0.06% HTML 0.06%
xquery java temporal-data storage snapshot comparison ssd json versioning hashing

sirix's Introduction

An Embeddable, Bitemporal, Append-Only Database System and Event Store

SirixDB stores small, immutable snapshots of your data in an append-only manner. It facilitates querying and reconstructing the entire history, as well as easy audits.


Download ZIP | Join us on Discord | Community Forum | Documentation | Architecture & Concepts

Working on your first Pull Request? You can learn how from this free series, How to Contribute to an Open Source Project on GitHub, and another tutorial: How YOU can contribute to OSS, a beginner's guide

"Remember that you're lucky, even if you don't think you are because there's always something that you can be thankful for." - Esther Grace Earl (http://tswgo.org)

We want to build the database system together with you. Help us and become a maintainer yourself. Why? You may like the software and want to help us improve it. Furthermore, you'll learn a lot. You may want to fix a bug or add a feature. Do you want to add an awesome project to your portfolio? Do you want to grow your network? All of these are valid reasons, besides probably many more: Collaborating on Open Source Software

SirixDB appends data to an indexed log file without the need for a WAL. It can be embedded and used as a library from your favorite language on the JVM to store and query data locally, or via a simple CLI. An asynchronous HTTP server, which adds the core and query modules as dependencies, can interact with SirixDB over the network, using Keycloak for authentication/authorization. One file stores the data with all revisions and, possibly, secondary indexes. A second file stores offsets into this file so that a revision can be quickly looked up by a given timestamp using an in-memory binary search. Furthermore, a few maintenance files exist, which store the configuration of a resource and the definitions of secondary indexes (if any are configured). Other JSON files keep track of changes in delta files, if enabled.
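A minimal sketch of embedded usage, based on the (older) low-level API snippets quoted in the issue reports further down; class names, packages, and builder methods differ between releases, and the paths and resource names here are made up, so treat this as an illustration of the embedding idea rather than up-to-date API documentation:

import java.io.File;

// Packages follow the old snippets and stack traces quoted below and may differ in current releases.
import org.sirix.access.conf.DatabaseConfiguration;
import org.sirix.access.conf.ResourceConfiguration;
import org.sirix.access.Databases;

final class EmbeddedUsageSketch {
  public static void main(final String[] args) throws Exception {
    final File location = new File("/tmp/sirix-data", "mydocs.col");   // made-up location

    // Bootstrap the database directory and create a resource inside it.
    final DatabaseConfiguration dbConf = new DatabaseConfiguration(location);
    Databases.createDatabase(dbConf);
    final var database = Databases.openDatabase(dbConf.getFile());
    database.createResource(ResourceConfiguration.newBuilder("resource", dbConf).build());

    // Open a session on "resource" and begin read-only/read-write transactions,
    // as shown in the sketch after the design goals and in the issue snippets below.
  }
}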

It currently supports the storage and (time travel) querying of XML and JSON data in a binary encoding tailored to support versioning. The index structures and the whole storage engine have been written from scratch to support versioning natively. We might also implement storing and querying other data formats, such as relational data.

SirixDB uses a huge persistent (in the functional sense) tree of tries, wherein the committed snapshots share unchanged pages and even common records in changed pages. To reduce write amplification, the system only stores page fragments during a copy-on-write out-of-place operation instead of full pages during a commit. During read operations, the system reads the page fragments in parallel to reconstruct an in-memory page. Thus, a fast, random-access storage device such as a PCIe SSD (or, soon, even byte-addressable storage such as Intel Optane DC memory) is best suited, as SirixDB stores fine-granular, cache-size-aligned (not page-aligned) modifications in a single file.

Please consider sponsoring our Open Source work if you like the project.

Note: Let us know if you'd like to build a brand-new frontend with, for instance, Svelte, D3.js, and TypeScript.

Discuss it in the Community Forum.


Keeping All Versions of Your Data By Sharing Structure

We could write a lot about why keeping all states of your data in a storage system is of great value. In a nutshell, it's all about looking at the evolution of your data, finding trends, doing audits, and implementing efficient undo-/redo-operations. The Wikipedia page has a bunch of examples. We recently also added use cases over here.

We firmly believe that a temporal storage system must address the issues that arise from keeping past states far better than traditional approaches do. Storing time-varying, temporal data in database systems that do not support it natively results in many unwanted hurdles: they waste storage space, query performance for retrieving past states of your data is far from ideal, and temporal operations are usually missing altogether.

The DBS must store data so that storage space is used as effectively as possible while supporting the reconstruction of each revision, as the database saw it during the commits. All this should be handled in linear time, whether it's the first revision or the most recent revision. Ideally, the query time of old/past revisions and the most recent revision should be in the same runtime complexity (logarithmic when querying for specific records).

SirixDB not only supports snapshot-based versioning on a record granular level through a novel versioning algorithm called sliding snapshot, but also time travel queries, efficient diffing between revisions, and storing semi-structured data.

Executing the following time-travel query on our binary JSON representation of Twitter sample data gives an initial impression of the possibilities:

let $statuses := jn:open('mycol.jn','mydoc.jn', xs:dateTime('2019-04-13T16:24:27Z')).statuses
let $foundStatus := for $status in $statuses
  let $dateTimeCreated := xs:dateTime($status.created_at)
  where $dateTimeCreated > xs:dateTime("2018-02-01T00:00:00") and not(exists(jn:previous($status)))
  order by $dateTimeCreated
  return $status
return {"revision": sdb:revision($foundStatus), $foundStatus{text}}

The query opens a database/resource in a specific revision based on a timestamp (2019-04-13T16:24:27Z) and searches for all statuses that have a created_at timestamp greater than the 1st of February 2018 and did not exist in the previous revision. The dot (.) is a dereferencing operator used to dereference keys in JSON objects. Array values can be accessed as shown, by looping over the values, or by specifying an index starting at zero: array[0], for instance, selects the first value of the array. Brackit, our query processor, also supports Python-like array slices to simplify tasks.

JSONiq examples

To verify changes in a node or its subtree, first select the node in the revision and then query for changes using our stored Merkle hash tree, which builds and updates a hash for each node and its subtree; the hash of an item can be checked with sdb:hash($item). The function jn:all-times delivers the node in all revisions in which it exists. jn:previous delivers the node in the previous revision or an empty sequence if there's none.

let $node := jn:doc('mycol.jn','mydoc.jn').fieldName[1]
let $result := for $node-in-rev in jn:all-times($node)
               let $nodeInPreviousRevision := jn:previous($node-in-rev)
               return
                 if ((not(exists($nodeInPreviousRevision)))
                      or (sdb:hash($node-in-rev) ne sdb:hash($nodeInPreviousRevision))) then
                   $node-in-rev
                 else
                   ()
return [
  for $jsonItem in $result
  return { "node": $jsonItem, "revision": sdb:revision($jsonItem) }
]

Emit all diffs between the revisions in a JSON format:

let $maxRevision := sdb:revision(jn:doc('mycol.jn','mydoc.jn'))
let $result := for $i in (1 to $maxRevision)
               return
                 if ($i > 1) then
                   jn:diff('mycol.jn','mydoc.jn',$i - 1, $i)
                 else
                   ()
return [
  for $diff at $pos in $result
  return {"diffRev" || $pos || "toRev" || $pos + 1: jn:parse($diff).diffs}
]

We support easy updates as in

let $array := jn:doc('mycol.jn','mydoc.jn')
return insert json {"bla":true} into $array at position 0

to insert a JSON object into a resource whose root node is an array, at the first position (0). The transaction is implicitly committed. Thus, a new revision is created, and a specific revision can be queried using a single third argument, either a simple integer ID or a timestamp. The following query reads the first revision (thus without the changes):

jn:doc('mycol.jn','mydoc.jn',1)

Omitting the third argument opens the resource in the most recent revision, but you could, in this case, also specify revision number 2. You can also use a timestamp as in:

jn:open('mycol.jn','mydoc.jn',xs:dateTime('2022-03-01T00:00:00Z'))

A simple join (joins are optimized in our query processor, Brackit):

(* first: store stores in a stores resource *)
sdb:store('mycol.jn','stores','
[
  { "store number" : 1, "state" : "MA" },
  { "store number" : 2, "state" : "MA" },
  { "store number" : 3, "state" : "CA" },
  { "store number" : 4, "state" : "CA" }
]')


(* second: store sales in a sales resource *)
sdb:store('mycol.jn','sales','
[
  { "product" : "broiler", "store number" : 1, "quantity" : 20  },
  { "product" : "toaster", "store number" : 2, "quantity" : 100 },
  { "product" : "toaster", "store number" : 2, "quantity" : 50 },
  { "product" : "toaster", "store number" : 3, "quantity" : 50 },
  { "product" : "blender", "store number" : 3, "quantity" : 100 },
  { "product" : "blender", "store number" : 3, "quantity" : 150 },
  { "product" : "socks", "store number" : 1, "quantity" : 500 },
  { "product" : "socks", "store number" : 2, "quantity" : 10 },
  { "product" : "shirt", "store number" : 3, "quantity" : 10 }
]')

let $stores := jn:doc('mycol.jn','stores')
let $sales := jn:doc('mycol.jn','sales')
let $join :=
  for $store in $stores, $sale in $sales
  where $store."store number" = $sale."store number"
  return {
    "nb" : $store."store number",
    "state" : $store.state,
    "sold" : $sale.product
  }
return [$join]

SirixDB, through Brackit, also supports array slices. The start index is 0, the end index is 1 (exclusive), and the step is 1 in the next query:

let $array := [{"foo": 0}, "bar", {"baz": true}]
return $array[0:1:1]

The query returns the first object {"foo":0}.

With the function sdb:nodekey you can find out the internal, unique node key of a node, which will never change. You might, for instance, be interested in the revision in which it was removed. The following query uses the function sdb:select-item, which takes a context node as the first argument and the key of the item or node to select as the second. jn:last-existing finds the most recent version, and sdb:revision retrieves the revision number.

sdb:revision(jn:last-existing(sdb:select-item(jn:doc('mycol.jn','mydoc.jn',1), 26)))

Index types

SirixDB has three types of indexes along with a path summary tree, which is basically a tree of all distinct paths:

  • name indexes, to index a set of object fields
  • path indexes, to index a set of paths (or all paths in a resource)
  • CAS indexes, so-called content-and-structure indexes, which index paths and typed values (for instance, all xs:integer values). In this case, only integer values on the specified paths are indexed, no other types

We base the indexes on the following serialization of three revisions of a very small SirixDB resource.

{
  "sirix": [
    {
      "revisionNumber": 1,
      "revision": {
        "foo": [
          "bar",
          null,
          2.33
        ],
        "bar": {
          "hello": "world",
          "helloo": true
        },
        "baz": "hello",
        "tada": [
          {
            "foo": "bar"
          },
          {
            "baz": false
          },
          "boo",
          {},
          []
        ]
      }
    },
    {
      "revisionNumber": 2,
      "revision": {
        "tadaaa": "todooo",
        "foo": [
          "bar",
          null,
          103
        ],
        "bar": {
          "hello": "world",
          "helloo": true
        },
        "baz": "hello",
        "tada": [
          {
            "foo": "bar"
          },
          {
            "baz": false
          },
          "boo",
          {},
          []
        ]
      }
    },
    {
      "revisionNumber": 3,
      "revision": {
        "tadaaa": "todooo",
        "foo": [
          "bar",
          null,
          23.76
        ],
        "bar": {
          "hello": "world",
          "helloo": true
        },
        "baz": "hello",
        "tada": [
          {
            "foo": "bar"
          },
          {
            "baz": false
          },
          "boo",
          {},
          [
            {
              "foo": "bar"
            }
          ]
        ]
      }
    }
  ]
}
The following query creates a name index on the object fields "foo" and "bar":

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $stats := jn:create-name-index($doc, ('foo','bar'))
return {"revision": sdb:commit($doc)}

The index is created for the "foo" and "bar" object fields. You can then query for "foo" fields, for instance:

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $nameIndexNumber := jn:find-name-index($doc, 'foo')
for $node in jn:scan-name-index($doc, $nameIndexNumber, 'foo')
order by sdb:revision($node), sdb:nodekey($node)
return {"nodeKey": sdb:nodekey($node), "path": sdb:path($node), "revision": sdb:revision($node)}

Second, whole paths are indexable.

The following path index is applicable to both of these query paths: .sirix[].revision.tada[].foo and .sirix[].revision.tada[][4].foo. Essentially, both foo nodes are indexed, and the first child has to be fetched afterwards. For the second query, the array index 4 also has to be checked to verify that the indexed node really is at index 4.

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $stats := jn:create-path-index($doc, '/sirix/[]/revision/tada//[]/foo')
return {"revision": sdb:commit($doc)}

The index might be scanned as follows:

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $pathIndexNumber := jn:find-path-index($doc, '/sirix/[]/revision/tada//[]/foo')
for $node in jn:scan-path-index($doc, $pathIndexNumber, '/sirix/[]/revision/tada//[]/foo')
order by sdb:revision($node), sdb:nodekey($node)
return {"nodeKey": sdb:nodekey($node), "path": sdb:path($node)}

CAS indexes index a path plus its values. The values are typed (in the following case, we index only decimal values on the given path).

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $stats := jn:create-cas-index($doc, 'xs:decimal', '/sirix/[]/revision/foo/[]')
return {"revision": sdb:commit($doc)}

We can do an index range scan, for instance via the next query (2.33 and 100 are the min and max; the next two arguments are booleans that denote whether the min and max themselves should be included in the result or whether the bounds are exclusive, i.e. >min and <max). The last argument is usually a path if more than one path is indexed in the same index (in this case, we only index /sirix/[]/revision/foo/[]).

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $casIndexNumber := jn:find-cas-index($doc, 'xs:decimal', '/sirix/[]/revision/foo/[]')
for $node in jn:scan-cas-index-range($doc, $casIndexNumber, 2.33, 100, false(), true(), ())
order by sdb:revision($node), sdb:nodekey($node)
return {"nodeKey": sdb:nodekey($node), "node": $node}

You can also create a CAS index on all string values on all paths (all object fields: //*; all arrays: //[]):

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $stats := jn:create-cas-index($doc,'xs:string',('//*','//[]'))
return {"revision": sdb:commit($doc)}

To query for string values equal to "bar" on all paths (the empty sequence () means no path restriction):

let $doc := jn:doc('mycol.jn','mydoc.jn')
let $casIndexNumber := jn:find-cas-index($doc, 'xs:string', '//*')
for $node in jn:scan-cas-index($doc, $casIndexNumber, 'bar', '==', ())
order by sdb:revision($node), sdb:nodekey($node)
return {"nodeKey": sdb:nodekey($node), "node": $node, "path": sdb:path(sdb:select-parent($node))}

The argument == checks for string equality. Other comparison operators, which make more sense for integers and decimals, are <, <=, >=, and >.

SirixDB Features

SirixDB is a log-structured, temporal JSON and XML database system, which stores evolutionary data. It never overwrites any data on disk. Thus, we're able to restore and query the full revision history of a resource in the database.

Design Goals

Some of the most important core principles and design goals are:

Embeddable
Similar to SQLite and DuckDB, SirixDB is embeddable at its core. Other APIs, such as the non-blocking REST API, are built on top.
Minimize Storage Overhead
SirixDB shares unchanged data pages as well as records between revisions, depending on the versioning algorithm chosen during the initial bootstrapping of a resource. SirixDB aims to balance read and write performance in its default configuration.
Concurrent
SirixDB contains very few locks and aims to be as suitable for multithreaded systems as possible.
Asynchronous
Operations can happen independently; each transaction is bound to a specific revision, and only one read/write transaction on a resource is permitted concurrently with N read-only transactions (see the sketch after this list of design goals).
Versioning/Revision history
SirixDB stores a revision history of every resource in the database without imposing extra overhead. It uses a huge persistent, durable page tree for indexing revisions and data.
Data integrity
SirixDB, like ZFS, stores full checksums of the pages in their parent pages. That means that almost all data corruption can be detected upon reading; we aim to partition and replicate databases in the future.
Copy-on-write semantics
Similar to the file systems Btrfs and ZFS, SirixDB uses CoW semantics, meaning that SirixDB never overwrites data. Instead, database-page fragments are copied/written to a new location. SirixDB does not simply copy whole pages; it only copies changed records plus records that fall out of a sliding window.
Per revision and page versioning
SirixDB versions not only per revision but also per page. Thus, whenever we change a potentially small fraction of records in a data page, it does not have to copy the whole page and write it to a new location on a disk or flash drive. Instead, we can specify one of several versioning strategies known from backup systems, or a novel sliding snapshot algorithm, during the creation of a database resource. SirixDB uses the specified versioning type to version data pages.
Guaranteed atomicity and consistency (without a WAL)
The system will never enter an inconsistent state (unless there is a hardware failure), meaning that an unexpected power-off won't ever damage the system. This is accomplished without the overhead of a write-ahead log (WAL).
Log-structured and SSD friendly
SirixDB batches writes and syncs everything sequentially to a flash drive during commits. It never overwrites committed data.
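A minimal sketch of the transaction model mentioned under "Asynchronous", again using the (older) low-level API that appears in the issue snippets further down; the package and method names are assumptions based on those snippets and on the quoted stack traces, and they differ in current releases:

// Types as they appear in the stack traces quoted in the issues below.
import org.sirix.api.NodeReadTrx;
import org.sirix.api.NodeWriteTrx;
import org.sirix.api.Session;

final class TransactionModelSketch {
  static void illustrate(final Session session) throws Exception {
    // Any number of read-only transactions may be open, each bound to one revision.
    final NodeReadTrx readOnly = session.beginNodeReadTrx();   // most recent revision

    // ...but only a single read/write transaction per resource at a time.
    final NodeWriteTrx wtx = session.beginNodeWriteTrx();
    // ... apply changes through wtx ...
    wtx.commit();                                              // the commit creates a new revision

    wtx.close();
    readOnly.close();
  }
}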

Revision Histories

Keeping the revision history is one of the main features of SirixDB. You can revert to any earlier version or back up the system automatically without the overhead of copying. SirixDB only ever copies changed database pages and, depending on the versioning algorithm chosen during the creation of a database/resource, only page fragments and ancestor index pages, to create a new revision.

You can reconstruct every revision in O(n), where n denotes the number of nodes in the revision. Binary search is used on an in-memory (linked) map to load the revision, thus finding the revision root page has an asymptotic runtime complexity of O(log n), where n, in this case, is the number of stored revisions.
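For illustration (these are not SirixDB's actual classes), the timestamp lookup boils down to a binary search for the newest revision committed at or before the requested point in time:

import java.time.Instant;
import java.util.List;

final class RevisionLookupSketch {
  // One entry per revision, as it might be kept in the in-memory map built from the offsets file.
  record RevisionInfo(int revisionNumber, Instant commitTimestamp, long revisionRootOffset) {}

  // Returns the newest revision committed at or before pointInTime, or null if none exists.
  static RevisionInfo find(final List<RevisionInfo> revisionsSortedByTime, final Instant pointInTime) {
    int low = 0, high = revisionsSortedByTime.size() - 1, match = -1;
    while (low <= high) {
      final int mid = (low + high) >>> 1;
      if (!revisionsSortedByTime.get(mid).commitTimestamp().isAfter(pointInTime)) {
        match = mid;    // committed at or before the requested time
        low = mid + 1;  // a later revision might still qualify
      } else {
        high = mid - 1;
      }
    }
    return match >= 0 ? revisionsSortedByTime.get(match) : null;
  }
}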

Currently, SirixDB offers two built-in native data models, namely a binary XML store and a JSON store.


Articles published on Medium:

Status

SirixDB as of now has not been tested in production. It is recommended for experiments, testing, benchmarking, etc., but is not recommended for production usage. Let us know if you'd like to use SirixDB in production and get in touch. We'd like to test real-world datasets and fix issues we encounter along the way.

Please also get in touch if you like our vision and want to sponsor us, help with manpower, or use SirixDB as a research system. We'd be glad to get input from the database and scientific communities.

Getting started

Download ZIP or Git Clone

git clone https://github.com/sirixdb/sirix.git

or use the following dependencies in your Maven or Gradle project.

SirixDB uses Java 21; thus, you need an up-to-date Gradle (if you want to work on SirixDB) and an IDE (for instance, IntelliJ or Eclipse). Also, make sure to use the provided Gradle wrapper.

Maven artifacts

At this stage of development, you should use the latest SNAPSHOT artifacts from the OSS snapshot repository to get the most recent changes.

Just add the following repository section to your POM or build.gradle file:

<repository>
  <id>sonatype-nexus-snapshots</id>
  <name>Sonatype Nexus Snapshots</name>
  <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
or, for Gradle:

repositories {
    maven {
        url "https://oss.sonatype.org/content/repositories/snapshots/"
        mavenContent {
            snapshotsOnly()
        }
    }
}

Note that we changed the groupId from com.github.sirixdb.sirix to io.sirix.

Maven artifacts are deployed to the central Maven repository (however, please use the SNAPSHOT variants as of now). Currently, the following artifacts are available:

Core project:

<dependency>
  <groupId>io.sirix</groupId>
  <artifactId>sirix-core</artifactId>
  <version>0.x.y-SNAPSHOT</version>
</dependency>
implementation group: 'io.sirix', name: 'sirix-core', version: '0.x.y-SNAPSHOT'

Brackit binding:

<dependency>
  <groupId>io.sirix</groupId>
  <artifactId>sirix-query</artifactId>
  <version>0.x.y-SNAPSHOT</version>
</dependency>
implementation group: 'io.sirix', name: 'sirix-query', version: '0.x.y-SNAPSHOT'

Asynchronous, RESTful API with Vert.x, Kotlin and Keycloak (the latter for authentication via OAuth2/OpenID-Connect):

<dependency>
  <groupId>io.sirix</groupId>
  <artifactId>sirix-rest-api</artifactId>
  <version>0.x.y-SNAPSHOT</version>
</dependency>
implementation group: 'io.sirix', name: 'sirix-rest-api', version: '0.x.y-SNAPSHOT'

Other modules are currently not available (namely the GUI, the distributed package as well as an outdated Saxon binding).

Currently, you have to add the following JVM parameters:

-ea
--enable-preview
--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED
--add-exports=java.base/sun.nio.ch=ALL-UNNAMED
--add-exports=jdk.unsupported/sun.misc=ALL-UNNAMED
--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED
--add-opens=jdk.compiler/com.sun.tools.javac=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED

We also recommend using the Shenandoah GC or ZGC (if possible, the generational versions in the future):

-XX:+UseZGC
-Xlog:gc
-XX:+AlwaysPreTouch
-XX:+UseLargePages
-XX:-UseBiasedLocking
-XX:+DisableExplicitGC

We've also had perfect results using GraalVM, possibly due to its JIT compiler and the improved escape analysis.

Setup of the SirixDB HTTP-Server and Keycloak to use the REST-API

The REST-API is asynchronous at its very core. We use Vert.x, a toolkit built on top of Netty. It is heavily inspired by Node.js but made for the JVM. As such, it uses event loops, that is, threads that should never be blocked by long-running CPU tasks or disk-bound I/O. We are using Kotlin with coroutines to keep the code simple. SirixDB uses OAuth2 (Password Credentials / Resource Owner Flow) with a Keycloak authorization server instance.
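A hedged sketch of a client interaction, using only the JDK's HTTP client: obtain an access token from Keycloak via the OAuth2 password (resource owner) flow and pass it as a Bearer token to the SirixDB HTTP server. The realm name, client id/secret, ports, and the database/resource path are assumptions based on the setup steps below; adapt them to your sirix-conf.json and your Keycloak version (older Keycloak versions prefix the token endpoint with /auth):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class RestApiClientSketch {
  public static void main(final String[] args) throws Exception {
    final HttpClient client = HttpClient.newHttpClient();

    // OAuth2 password grant against the (assumed) "sirixdb" realm.
    final HttpRequest tokenRequest = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/realms/sirixdb/protocol/openid-connect/token"))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(
            "grant_type=password&client_id=sirix&client_secret=<client.secret>"
                + "&username=admin&password=admin"))
        .build();
    final String tokenJson = client.send(tokenRequest, HttpResponse.BodyHandlers.ofString()).body();
    final String accessToken = extractAccessToken(tokenJson);

    // Read a (hypothetical) database/resource from the SirixDB HTTP server (default port 9443;
    // the demo cert.pem is self-signed, so you may need to trust it first).
    final HttpRequest query = HttpRequest.newBuilder()
        .uri(URI.create("https://localhost:9443/mycol.jn/mydoc.jn"))
        .header("Authorization", "Bearer " + accessToken)
        .GET()
        .build();
    System.out.println(client.send(query, HttpResponse.BodyHandlers.ofString()).body());
  }

  private static String extractAccessToken(final String tokenJson) {
    // Use a real JSON library in practice; this naive extraction is only for the sketch.
    final int start = tokenJson.indexOf("\"access_token\":\"") + "\"access_token\":\"".length();
    return tokenJson.substring(start, tokenJson.indexOf('"', start));
  }
}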

Keycloak setup (Standalone Setup / Docker Hub Image)

You can set up Keycloak as described in this excellent tutorial. Our docker-compose file imports a sirix realm with a default admin user with all available roles assigned. You can skip steps 3 - 7 and 10, 11, and simply recreate a client secret and change oAuthFlowType to "PASSWORD". If you want to run or modify the integration tests, the client secret must not be changed. Make sure to delete the line "build: ." in the docker-compose.yml file for the server image if you want to use the Docker Hub image.

  1. Open your browser. URL: http://localhost:8080
  2. Login with username "admin" and password "admin"
  3. Create a new realm with the name "sirixdb"
  4. Go to Clients => account
  5. Change client-id to "sirix"
  6. Make sure access-type is set to confidential
  7. Go to the Credentials tab
  8. Put the client secret into the SirixDB HTTP-Server configuration file: change the value of "client.secret" to whatever Keycloak generated.
  9. If "oAuthFlowType" is specified in the same configuration file, change its value to "PASSWORD" (if not specified, the default is "PASSWORD").
  10. In Keycloak, the direct access grant on the settings tab must be enabled.
  11. Our (user-/group-)roles are "create" to allow creating databases/resources, "view" to allow querying database resources, "modify" to modify a database resource, and "delete" to allow deletion thereof. You can also assign ${databaseName}- prefixed roles.

Start Docker Keycloak-Container using docker-compose

For setting up the SirixDB HTTP-Server and a basic Keycloak-instance with a test realm:

  1. git clone https://github.com/sirixdb/sirix.git
  2. sudo docker-compose up keycloak

Start the SirixDB HTTP-Server and the Keycloak-Container using docker-compose

This section describes setting up Keycloak using Docker Compose. If you are looking to configure Keycloak from scratch, instructions are given in the previous section.

For setting up the SirixDB HTTP-Server and a basic Keycloak-instance with a test sirixdb realm:

  1. First, clone this repository with the following command (or download the ZIP):

    git clone https://github.com/sirixdb/sirix.git
    
  2. cd into the sirix folder that was just cloned.

    cd sirix/
    
  3. Run the Keycloak container using docker compose

    sudo docker compose up keycloak
    
  4. Visit http://localhost:8080 and login to the admin console using username: admin and password: admin

  5. From the navigation panel on the left, select Realm Settings and verify that the Name field is set to sirixdb

  6. Select the client with Client ID sirix

    1. Verify direct access grant is enabled.
    2. Verify that Access Type is set to confidential
    3. In the credentials tab
      1. Verify that the Client Authentication is set to Client Id and Secret
      2. Click on Regenerate Secret to generate a new secret. Set the value of the field named client.secret of the configuration file to this secret.
  7. Finally, run the SirixDB HTTP-Server and the Keycloak container with Docker Compose:

     docker compose up
    

SirixDB HTTP-Server Setup Without Docker/docker-compose

To create a fat-JAR, download our ZIP file (for instance), then:

  1. cd bundles/sirix-rest-api
  2. ./gradlew build -x test

A fat-JAR with all required dependencies should now have been created in your target folder.

Furthermore, a key.pem and a cert.pem file are needed. These two files have to be in your user home directory in a directory called "sirix-data", where Sirix stores the databases. For demo purposes, they can be copied from our resources directory.

Once Keycloak is also set up, we can start the server via:

java -jar -Duser.home=/opt/sirix sirix-rest-api-*-SNAPSHOT-fat.jar -conf sirix-conf.json -cp /opt/sirix/*

Use -Duser.home if you'd like to change the user home directory to /opt/sirix, for instance.

In the future, the fat-JAR will be downloadable from the Maven repository.

Run the Integration Tests

In order to run the integration tests under bundles/sirix-rest-api/src/test/kotlin, make sure that you assign your admin user all the user roles you have created in the Keycloak setup (last step). Make sure that Keycloak is running first, then execute the tests, for instance in your favorite IDE.

Note that the following VM parameters currently are needed: -ea --add-modules=jdk.incubator.foreign --enable-preview

Command-line tool

We ship a (very) simple command-line tool for the sirix-query bundle:

Get the latest sirix-xquery JAR with dependencies.

Documentation

We are currently working on the documentation. You may find first drafts and snippets in the documentation and in this README. Furthermore, you are kindly invited to ask any question you might have (and you likely have many) in the community forum (preferred) or in the Discord channel. Please also have a look at and play with our sirix-example bundle, which is available via Maven, or our new asynchronous RESTful API (shown above).

Getting Help

Community Forum

If you have any questions or are considering contributing or using Sirix, please use the Community Forum to ask them. Any kind of question is welcome, be it an API question, an enhancement proposal, or a question regarding use cases. Don't hesitate to ask questions or make suggestions for improvements. At the moment, API-related suggestions and critiques are especially important.

Join us on Discord

You may find us on Discord for quick questions.

Contributors ✨

SirixDB is maintained by

  • Johannes Lichtenberger

And the Open Source Community.

As the project was forked from a university project called Treetank, my deepest gratitude goes to Marc Kramis, who came up with the idea of building a versioned, secure, and energy-efficient data store that retains the history of resources during his Ph.D. Furthermore, Sebastian Graf came up with a lot of ideas and greatly improved the implementation during his Ph.D. Besides, a lot of students have worked on and improved the project considerably.

Thanks go to these wonderful people, who greatly improved SirixDB lately. SirixDB couldn't exist without the help of the Open Source community:


  • Ilias YAHIA 💻
  • BirokratskaZila 📖
  • Andrei Buiza 💻
  • Bondar Dmytro 💻
  • santoshkumarkannur 📖
  • Lars Eckart 💻
  • Jayadeep K M 📆
  • Keith Kim 🎨
  • Theofanis Despoudis 📖
  • Mario Iglesias Alarcón 🎨
  • Antonio Nuno Monteiro 📆
  • Fulton Browne 📖
  • Felix Rabe 📖
  • Ethan Willis 📖
  • Erik Axelsson 💻
  • Sérgio Batista 📖
  • chaensel 📖
  • Balaji Vijayakumar 💻
  • Fernanda Campos 💻
  • Joel Lau 💻
  • add09 💻
  • Emil Gedda 💻
  • Andreas Rohlén 💻
  • Marcin Bielecki 💻
  • Manfred Nentwig 💻
  • Raj 💻
  • Moshe Uminer 💻

Contributions of any kind are highly welcome!

License

This work is released under the BSD 3-clause license.


sirix's Issues

rtx.getType() returns the wrong thing.

Code:
rtx.moveTo(diffTuple.getNewNodeKey());
System.out.println("Is element: " +rtx.isElement());
System.out.println("Is attr: " +rtx.isAttribute());
System.out.println("Is text: " +rtx.isText());
System.out.println("type: "+rtx.getType());

Output:
Element in:
Is element: true
Is attr: false
Is text: false
type: xs:untyped

Expected:
type: xs:element

Attribute in:
Is element: false
Is attr: true
Is text: false
type: xs:untyped
type: xs:attribute

Text in:
Is element: false
Is attr: false
Is text: true
[error] play - Cannot invoke the action, eventually got an error: java.lang.IllegalStateException: No other node types supported!
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10.jar:2.2.2]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
at scala.Option.map(Option.scala:145) [scala-library.jar:na]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2.applyOrElse(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
Caused by: java.lang.IllegalStateException: No other node types supported!
at org.sirix.page.NamePage.getName(NamePage.java:159) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.access.PageReadTrxImpl.getName(PageReadTrxImpl.java:484) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.access.NodeReadTrxImpl.getType(NodeReadTrxImpl.java:375) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at models.SirixHandler.getDiffs(SirixHandler.java:613) ~[na:na]
at models.SirixHandler.generateDiffs(SirixHandler.java:572) ~[na:na]
at models.SirixHandler.getDiff(SirixHandler.java:751) ~[na:na]

Expected: xs:text or xs:string

Static number of indirect pages should be dynamic

During insertion, we have to check whether a node exceeds the maximum number of nodes that can be stored in the record pages, and add an indirect page if the node to insert has the ID max-number-of-nodes-storable + 1. We then have to keep track of how many levels we have. This will improve performance considerably.
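For illustration only (this is not SirixDB code), with a fixed fan-out per indirect page, the number of indirect levels actually needed can be derived on demand from the highest record ID instead of being a static constant:

final class IndirectLevelsSketch {
  // How many indirect levels are needed so that maxRecordId is addressable, given the
  // number of records per record page and the fan-out of each indirect page.
  static int requiredLevels(final long maxRecordId, final int recordsPerPage, final int fanout) {
    long capacity = recordsPerPage;
    int levels = 0;
    while (maxRecordId >= capacity) {
      capacity *= fanout;  // each additional level multiplies the addressable records
      levels++;
    }
    return levels;
  }
}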

Features in a nutshell not clear.

Features in a nutshell

Import of differences between two XML-documents, that is after the first version of an XML-document is imported an algorithm tries to update the Sirix resource with a minimum of operations to change the first version into the new version.

By reading the first part, "Import of differences between two XML-documents", one assumes that one can import a 'patch file' and apply it to the existing version in the repository.

But from the second part, one gets the idea that when adding a second XML file, Sirix generates the 'patch' internally and applies the differences.

It would be quite nice to have the first one; is there any import of a 'patch' available or planned?

JSON node layer

Each node has a unique 64-bit identifier (long).

  • Object node, consisting of record nodes (simple name/value pairs). The value itself must be another node, referenced by a long.
  • Array node, consisting of array-item nodes, which can hold any type of node (long values). We can simply store pointers to the right sibling for each array-item node.
  • Null-node.
  • Boolean-node.
  • String-node.
  • Number-node, maybe the same as string but with a bit set, if it's a number or a string.

This simple design allows fine-granular copy-on-write operations.
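Purely as an illustration of the design sketched above (these are not the actual SirixDB node classes), the node layer could roughly look like this:

// Every node is identified by a 64-bit key and references other nodes only by key.
sealed interface JsonNode permits ObjectNode, ObjectRecordNode, ArrayNode, NullNode, BooleanNode, StringNode, NumberNode {
  long nodeKey();
}

record ObjectNode(long nodeKey, long firstChildKey) implements JsonNode {}
// A record is a (name, value) pair; the value is another node, referenced by its key.
record ObjectRecordNode(long nodeKey, String name, long valueNodeKey, long rightSiblingKey) implements JsonNode {}
// Array items are chained via right-sibling pointers.
record ArrayNode(long nodeKey, long firstChildKey) implements JsonNode {}
record NullNode(long nodeKey, long rightSiblingKey) implements JsonNode {}
record BooleanNode(long nodeKey, boolean value, long rightSiblingKey) implements JsonNode {}
record StringNode(long nodeKey, String value, long rightSiblingKey) implements JsonNode {}
record NumberNode(long nodeKey, double value, long rightSiblingKey) implements JsonNode {}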

How to reset the database in between XQuery queries

We have been resetting the database when using the low-level API with the following snippet:

//deletes the database (mydocs.col)
//uses org.apache.commons.io.FileUtils
File databaseLocation = new File(DATA_LOCATION, "mydocs.col");
FileUtils.deleteQuietly(databaseLocation);

//recreates the database
final DatabaseConfiguration dbConf = new DatabaseConfiguration(databaseLocation);
Databases.truncateDatabase(dbConf);
Databases.createDatabase(dbConf);
Database database;
try {
    database = Databases.openDatabase(dbConf.getFile());
    database.createResource(ResourceConfiguration.newBuilder(SirixHandler.RESOURCE, dbConf).useDeweyIDs(true).useTextCompression(true).buildPathSummary(false).build());
    final Session session = database.getSession(new SessionConfiguration.Builder(SirixHandler.RESOURCE).build());
} catch (SirixException e) { e.printStackTrace(); }

//DO OTHER THINGS...

We want to be able to do the same in between XQuery queries, but we seem to be failing miserably in that.
We can't use the previous code because, after removing and recreating the database, we get a
SirixIOException: org.sirix.exception.SirixIOException: java.io.EOFException
when running any query after the removal at run time.

I assume that we are not recreating the database properly for the query to be performed.

Any help would be, as always, highly appreciated 😃

Node.js GUI

Hi!

I want to build a Node.js application for selecting a JSON XPath via a drag-and-drop / element-selection GUI like the SunburstView. Is there a way to implement this in Node / JavaScript? It looks really great and would be purely intuitive for changing matching XPath elements.

Kind regards!

GUI errors

I have tried to build the GUI and got the project started, but as soon as I choose to open a resource and select a folder, it crashes. The same happens when opening an XML document to shredder.
I don't really know how to use the program, as some documentation is needed. I want to visualize the database in some way. Can you give me some pointers?

GUI

  • Introduce a presentation model, capturing all this ugly mutable state inside the GUIs.
  • Code review.
  • Fixing PApplet/PApplet issues (two or more processing views activated are not working as of now).
  • Unit tests for the models with very simple basic trees.

Transaction over many resource transactions

Finishing the work. For instance, fix truncate to the just-committed revision - 1. Add a resource lock manager to lock the just-created revision until all resource transactions have been committed, to prevent dirty reads. Add a method to explicitly retry failed transactions.

Refactoring of page references

Use the idea of a bitmap to mark which entries in an array are null/not null. See array-mapped tries in Clojure, for instance.
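For illustration (not SirixDB code), the bitmap idea works as in hash-array-mapped tries: a set bit marks an occupied logical slot, and counting the set bits below a slot yields its index into a compact array that stores only the non-null references:

final class BitmapReferencesSketch<T> {
  private long bitmap;                       // bit i set => logical slot i holds a reference
  private Object[] entries = new Object[0];  // compact array of the non-null references only

  boolean contains(final int slot) {
    return (bitmap & (1L << slot)) != 0;
  }

  @SuppressWarnings("unchecked")
  T get(final int slot) {
    if (!contains(slot)) {
      return null;
    }
    // The number of set bits below `slot` is the index into the compact array.
    final int index = Long.bitCount(bitmap & ((1L << slot) - 1));
    return (T) entries[index];
  }

  void put(final int slot, final T value) {
    final int index = Long.bitCount(bitmap & ((1L << slot) - 1));
    if (contains(slot)) {
      entries[index] = value;
    } else {
      final Object[] grown = new Object[entries.length + 1];
      System.arraycopy(entries, 0, grown, 0, index);
      grown[index] = value;
      System.arraycopy(entries, index, grown, index + 1, entries.length - index);
      entries = grown;
      bitmap |= 1L << slot;
    }
  }
}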

java/util/Optional

When I updated the library today I got this error, and I can't really find what you have updated for me to get it:
java.lang.NoSuchMethodError: org.sirix.node.interfaces.immutable.ImmutableNode.getDeweyID()Ljava/util/Optional;
at org.sirix.xquery.node.DBNode.(DBNode.java:105)
at org.sirix.xquery.node.SubtreeBuilder.startElement(SubtreeBuilder.java:195)
at org.brackit.xquery.node.parser.SAX2SubtreeHandlerAdapter.startElement(SAX2SubtreeHandlerAdapter.java:255)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.brackit.xquery.node.parser.DocumentParser.parse(DocumentParser.java:130)
at org.sirix.xquery.node.DBStore.create(DBStore.java:189)
at org.sirix.xquery.node.DBStore.create(DBStore.java:164)
at org.sirix.xquery.node.DBStore.create(DBStore.java:45)
at org.brackit.xquery.function.bit.Load.create(Load.java:139)
at org.brackit.xquery.function.bit.Load.execute(Load.java:98)
at org.brackit.xquery.function.FunctionExpr.evaluate(FunctionExpr.java:113)
at org.brackit.xquery.XQuery.run(XQuery.java:88)
at org.brackit.xquery.XQuery.evaluate(XQuery.java:79)
at models.XQueryUsage.init(XQueryUsage.java:218)
at models.XQueryUsage.loadDocumentAndQueryTemporal(XQueryUsage.java:241)
at controllers.Application.testSirix(Application.java:284)

Brackit(.org) binding.

Provide a brackit-binding for full XQuery/XQuery Update Facility support and integrate indexes into the query processor.

Eventually got an error: java.lang.IllegalStateException: Failed to move to nodeKey: X

XML in:
<log tstamp="Sun May 04 23:29:47 CEST 2014" severity="high" foo="bar"> <src>192.168.0.1</src> <content> <a> <a>init file</a> </a> </content> </log>

Query:
replace node doc('mydocs.col')/log/content with <a><div>hej</div></a>
correct result:
<log tstamp="Sun May 04 23:29:47 CEST 2014" severity="high" foo="bar"> <src>192.168.0.1</src> <a> <div>hej</div> </a> </log>

If I use the same input but change the query to:
replace node doc('mydocs.col')/log/content/a with <a><div>hej</div></a>

The next time I run the query doc('mydocs.col')/log/all-time::*
I get the error:
Eventually got an error: java.lang.IllegalStateException: Failed to move to nodeKey: X
I use the new SirixCompileChain, if that has something to do with it.

Provide indexes.

Providing revisioned indexes on the fly during update-operations.

Docker compose

As the RESTful API, and thus the Docker container, needs a running Keycloak instance, we could provide a Docker Compose file.

Correct Name + SVG Logo?

I've added your new system to my encyclopedia of databases:

https://dbdb.io/db/sirix

Two questions:

  1. Is the official name of the database Sirix or SirixDB? The Github repo uses both.

  2. Is there an SVG logo available?

-- Andy

Link NodePages to previous versions

Link NodePages to previous versions to skip the whole tree traversal (even if it is based on bit shifting), so that NodePage reconstruction should be faster.

Insert attribute in element

I am not sure I am using the correct syntax.
XML before in mydocs.col:

Running query:
insert node attribute { 'a' } { 5 } into doc('mydocs.col')/a
Is this wrong syntax or is this not implemented?

Running
insert node attribute into doc('mydocs.col')/a
Works.

With the first one I get the error:
[error] play - Cannot invoke the action, eventually got an error: java.lang.ClassCastException: org.sirix.access.NodeReadTrxImpl cannot be cast to org.sirix.api.NodeWriteTrx
[error] application -

! @6ie84fmpf - Internal server error, for (POST) [/commit] ->

play.api.Application$$anon$1: Execution exception[[ClassCastException: org.sirix.access.NodeReadTrxImpl cannot be cast to org.sirix.api.NodeWriteTrx]]
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10.jar:2.2.2]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
at scala.Option.map(Option.scala:145) [scala-library.jar:na]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2.applyOrElse(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
Caused by: java.lang.ClassCastException: org.sirix.access.NodeReadTrxImpl cannot be cast to org.sirix.api.NodeWriteTrx
at org.sirix.xquery.node.DBNode.setAttribute(DBNode.java:1371) ~[sirix-xquery-0.1.2-SNAPSHOT.jar:na]
at org.sirix.xquery.node.DBNode.setAttribute(DBNode.java:1) ~[sirix-xquery-0.1.2-SNAPSHOT.jar:na]
at org.brackit.xquery.update.op.InsertAttributesOp.doInsert(InsertAttributesOp.java:46) ~[brackit-0.1.3-SNAPSHOT.jar:na]
at org.brackit.xquery.update.op.AbstractInsertOp.apply(AbstractInsertOp.java:56) ~[brackit-0.1.3-SNAPSHOT.jar:na]
at org.brackit.xquery.update.UpdateList.apply(UpdateList.java:94) ~[brackit-0.1.3-SNAPSHOT.jar:na]
at org.brackit.xquery.QueryContext.applyUpdates(QueryContext.java:105) ~[brackit-0.1.3-SNAPSHOT.jar:na]

JSON transaction layer.

Either we use the XdmNodeRead / Write transactions or we provide a simple means to insert.

  • empty objects.
  • object-records.
  • empty arrays.
  • array-items (with an index).
  • null-nodes.
  • boolean-nodes.
  • string/number-nodes.

We also should be able to delete those in the same way.

Refactoring the XdmNodeReadTrx

Use the command pattern, basically a factory which creates several commands internally, such that more or less the content of each method is encapsulated in a separate class, so we can finally write simple unit tests (multiple small classes are way better than this monster class).
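A generic sketch of the proposed refactoring (not existing SirixDB code): each operation becomes a small command object produced by a factory, so it can be unit-tested in isolation:

// Generic command-pattern sketch; the state a command needs (cursor, page transaction)
// would be passed in via the constructor or the execute method.
interface TrxCommand<R> {
  R execute();
}

final class MoveToCommand implements TrxCommand<Boolean> {
  private final long nodeKey;

  MoveToCommand(final long nodeKey) {
    this.nodeKey = nodeKey;
  }

  @Override
  public Boolean execute() {
    // Resolve nodeKey against the page layer here; trivially testable in isolation.
    return nodeKey >= 0;
  }
}

final class TrxCommandFactory {
  TrxCommand<Boolean> moveTo(final long nodeKey) {
    return new MoveToCommand(nodeKey);
  }
}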

Cluster nodes.

Cluster nodes during full dumps. However, this is an open issue.

Revision the database itself, not only the resources

The idea is simply to create a metadata Sirix resource which stores the names of the resources, their current revision, and whether they are marked as deleted.

This depends on the database transaction issue, which needs to be finished first.

Fixing the docker image

Issue is:

docker run -t -i -p 9443:9443 sirixdb/sirix
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.net.ConnectException: Connection refused
... 11 more
Dec 27, 2018 3:09:07 PM io.vertx.core.impl.launcher.commands.VertxIsolatedDeployer
SEVERE: Failed in deploying verticle
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.net.ConnectException: Connection refused
... 11 more

Encryption of the page-fragments

I wanted to use Google Tink, but something is way off the rails, because it says that I'm using a different key for encrypting/decrypting.

Caused by: java.io.IOException: No matching key found for the ciphertext in the stream.
at com.google.crypto.tink.streamingaead.InputStreamDecrypter.read(InputStreamDecrypter.java:187)
at com.google.crypto.tink.streamingaead.InputStreamDecrypter.read(InputStreamDecrypter.java:130)
at com.google.crypto.tink.streamingaead.InputStreamDecrypter.read(InputStreamDecrypter.java:121)
at java.base/java.io.DataInputStream.readByte(DataInputStream.java:270)
at org.sirix.page.PagePersister.deserializePage(PagePersister.java:49)
at org.sirix.io.file.FileReader.read(FileReader.java:125)

TextValue/AttributeValue-indexes, Path-index

Keep in mind that once the text-index has been included as well as the binding to Brackit, that we have to make sure that Path-index-updates are executed before Text-index updates.

Docker setup / Fix README

  • pull the image from Docker Hub
  • create a docker container
  • update the container with an adapted sirix-conf.json file (the client.secret value for Keycloak auth must be set, and maybe the listening port for the HTTP(S) verticle)
  • start the docker container

Make it work (sadly, I'm by no means a Docker guru) and fix the README section accordingly.

JSON-resource manager

More or less the same as the XdmResourceManager, to start/end transactions on a JSON-resource.

XQuery replace followed by insert gives unexpected result

I want to do a replace followed by an insert query and am getting the wrong result.
Code:

try (DBStore store= DBStore.newBuilder().build();){
  CompileChain compileChain = new SirixCompileChain(store);
  QueryContext ctx1 = new SirixQueryContext(store);
  String query2="replace node doc('" + databaseName + "')/log/src with <node>aaa</node>";
  new XQuery(compileChain, query2).evaluate(ctx1);
  String query3="insert nodes <ab>abc</ab> into doc('" + databaseName+ "')/log/content";
  new XQuery(compileChain,query3 ).evaluate(ctx1);
}

Running the query doc('mydocs.col')/log/all-time::* yields:

<log tstamp="Tue May 06 12:56:10 CEST 2014" severity="high" foo="bar">
  <src>192.168.0.1</src>
  <content>
    <a.txt/>
  </content>
</log>

<log tstamp="Tue May 06 12:56:10 CEST 2014" severity="high" foo="bar">
  <node>aaa</node>
  <content>
    <a.txt/>
  </content>
</log>

<log tstamp="Tue May 06 12:56:10 CEST 2014" severity="high" foo="bar">
  <content>
    <a.txt/>
  </content>
</log>

The expectation is that the last one is:

<log tstamp="Tue May 06 12:56:10 CEST 2014" severity="high" foo="bar">
  <node>aaa</node>
  <content>
    <ab>abc</ab>
    <a.txt/>
  </content>
</log>

If we use two inserts, it works; or if I use a new store, it also works.

JSON

Implementing two node types, JSONArray and JSONObject. Implementing a JSONReadTrx and a JSONReadWriteTrx on top of the page reader/writer transactions just as in the XML case. We also need a second resource manager as well as methods in the database implementation to create a resource and open a resource manager.

ENTITY doesn't work

XML file:

]>

Create a volume with 8 GB of space, and specify the availability zone and image:

Giving error:
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,142]
Message: The entity "nbsp" was referenced, but not declared.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:601) ~[na:1.8.0_20]
at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:276) ~[na:1.8.0_20]
at org.sirix.service.xml.shredder.XMLShredder.insertNewContent(XMLShredder.java:262) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.service.xml.shredder.XMLShredder.call(XMLShredder.java:217) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.diff.service.FMSEImport.shredder(FMSEImport.java:99) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.diff.service.FMSEImport.dataImport(FMSEImport.java:122) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]

Moving a subtree inside its sibling fails with mention of DeletedNode.

I'm trying to move a subtree inside its sibling and am not succeeding.

my input xml:
<bar>
  <foz>
    <baz/>
  </foz>
  <foo>
    <bao/>
  </foo>
</bar>
what I intend as output xml:
<bar>
  <foz>
    <foo>
      <bao/>
    </foo>
    <baz/>
  </foz>
</bar>
function I am calling to achieve this:
NodeWriteTrx moveSubtreeToFirstChild(long fromKey) 
    throws SirixException
relevant calling code
// ...
NodeWriteTrx wtx = session.beginNodeWriteTrx();
// ... code that writes the input xml...
NodeReadTrx rtx = session.beginNodeReadTrx();
// .... code that moves rtx to 'foo' and wtx to 'foz'
long fromKey = rtx.getNodeKey();
wtx.moveSubtreeToFirstChild(fromKey).commit();
// ...
error output
java.lang.ClassCastException: org.sirix.node.DeletedNode cannot be cast to org.sirix.node.interfaces.StructNode

Caused by: java.lang.ClassCastException: org.sirix.node.DeletedNode cannot be cast to org.sirix.node.interfaces.StructNode
    at org.sirix.index.path.summary.PathSummaryWriter.removePathSummaryNode(PathSummaryWriter.java:660) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
    at org.sirix.index.path.summary.PathSummaryWriter.adaptPathForChangedNode(PathSummaryWriter.java:443) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
    at org.sirix.access.NodeWriteTrxImpl.moveSubtreeToFirstChild(NodeWriteTrxImpl.java:322) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]

Any help with this would be much appreciated 😃

PS: Some context on your users: I'm working on the same project with @bark; we're doing our master's thesis on XML versioning.

SirixUsageException when using FMSEImport

We're getting a SirixUsageException when trying to use the FMSEImport.

SirixUsageException: Resource could not be opened (since it was not created?) at location /.../sirix-data/mydocs.col/resources/shredded

We have an initial commit with:

<log tstamp='%s' severity='%s' foo='bar'>
<src></src>
<msg><bbb></bbb></msg>
<content></content>
</log>

And we're importing another file with:

<log><a></a></log>

Using the following code:

// XML document which should be imported as the new revision.
   File resNewRev = File.createTempFile("temp-file-name", ".tmp"); 
BufferedWriter bw = new BufferedWriter(new FileWriter(resNewRev));
   bw.write("<log><a></a></log>");
   bw.close();
// Determine and import differences between the sirix resource and the
// provided XML document.
final FMSEImport fmse = new FMSEImport();
fmse.main(new String[]{LOCATION.getAbsolutePath()+"/"+databaseName+"/",resNewRev.getAbsolutePath()}); 

The stack trace:

play.api.Application$$anon$1: Execution exception[[SirixUsageException: Resource could not be opened (since it was not created?) at location /home/bark/sirix-data/mydocs.col/resources/shredded ]]
at play.api.Application$class.handleError(Application.scala:293) ~[play_2.10.jar:2.2.2]
at play.api.DefaultApplication.handleError(Application.scala:399) [play_2.10.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2$$anonfun$applyOrElse$3.apply(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
at scala.Option.map(Option.scala:145) [scala-library.jar:na]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$2.applyOrElse(PlayDefaultUpstreamHandler.scala:261) [play_2.10.jar:2.2.2]
Caused by: org.sirix.exception.SirixUsageException: Resource could not be opened (since it was not created?) at location /home/bark/sirix-data/mydocs.col/resources/shredded 
at org.sirix.access.DatabaseImpl.getSession(DatabaseImpl.java:249) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.diff.service.FMSEImport.dataImport(FMSEImport.java:126) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at org.sirix.diff.service.FMSEImport.main(FMSEImport.java:169) ~[sirix-core-0.1.2-SNAPSHOT.jar:na]
at models.SirixHandler.commit(SirixHandler.java:171) ~[na:na]
at controllers.Application.commit(Application.java:216) ~[na:na]
at Routes$$anonfun$routes$1$$anonfun$applyOrElse$14$$anonfun$apply$14.apply(routes_routing.scala:229) ~[na:na]

We are a bit out of ideas as to why it isn't able to shred the file; can you shed any light?
