cumulusrdf's Issues

Sort-merge join instead of nested-loop join

Currently, we use the standard nested-loop join (with index support) from
Sesame.

However, storing SPO-style indexes in sorted order is fairly easy in
Cassandra (and already implemented to some extent). Thus, a sort-merge join
could be implemented without much work. See, e.g., [1].

- Andreas

[1] http://www.informatik.uni-freiburg.de/~mschmidt/docs/sp2b_exp.pdf
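
To make the idea concrete, here is a minimal sketch of a sort-merge join over two inputs that are already sorted by the join key, as a scan over a sorted SPO-style index would deliver them. The names SortMergeJoinSketch, Row and join are hypothetical and not part of CumulusRDF or Sesame; real code would iterate over Cassandra columns instead of in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: merge join of two inputs that are both sorted by join key. */
final class SortMergeJoinSketch {

    /** One binding row: the join key (e.g. the subject) plus one bound value. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Joins two key-sorted inputs; each result is "key|leftValue|rightValue". */
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // left key is smaller: advance left
            } else if (cmp > 0) {
                j++; // right key is smaller: advance right
            } else {
                String key = left.get(i).key;
                // Find the end of the run of equal keys on the right side.
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key.equals(key)) {
                    jEnd++;
                }
                // Pair every left row of this key with every right row of this key.
                while (i < left.size() && left.get(i).key.equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(key + "|" + left.get(i).value + "|" + right.get(k).value);
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> knows = Arrays.asList(new Row("x1", "y1"), new Row("x1", "y2"), new Row("x3", "y1"));
        List<Row> age = Arrays.asList(new Row("x1", "18"), new Row("x2", "18"), new Row("x3", "18"));
        for (String row : join(knows, age)) {
            System.out.println(row);
        }
    }
}
```

Each input is read once from front to back, which is why the approach depends on both sides arriving in key order.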

Original issue reported on code.google.com by andreas.josef.wagner on 29 Jan 2014 at 9:14

$pageName on load page

The Web GUI page shows the literal "$pageName" on load after a successful upload.


Original issue reported on code.google.com by andreas.josef.wagner on 7 Mar 2014 at 2:28

Evaluation: Composites vs Byte arrays

This is not really a bug. Instead, as discussed here

https://groups.google.com/forum/#!topic/cumulusrdf-dev-list/vOKdDAXJEqg

we could run some benchmarks / tests to see whether we really need Composites.

We are already working with the low-level form of serialization (byte arrays),
so maybe the abstraction and the "complexity" offered by Composites could be
avoided.

Original issue reported on code.google.com by [email protected] on 16 Feb 2014 at 2:25

  • Merged into: #36

The RepositoryConnection object cannot clear the repository with a null context

What steps will reproduce the problem?
1. The HTTPRepository sends a clear request with a null context.
2. The RepositoryConnection obtained from the repository in the servletContext
executes conn.clear(null).
3. The following message is returned: "not supported: contexts == null ||
contexts.length == 0"

What is the expected output? What do you see instead?
According to the Sesame API, if the context is null, the whole repository is
cleared. So this operation should be supported.


Original issue reported on code.google.com by [email protected] on 8 Apr 2014 at 12:51

Timeout to connect to Cassandra too low

What steps will reproduce the problem?
1. Start Cassandra
2. Start Tomcat

What is the expected output? What do you see instead?

The Cumulus webapp should connect to Cassandra, but it fails when Cassandra is
still booting up. Increase the connection timeout (or retry the connection).
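
A minimal sketch of the retry option, assuming the connection attempt can be wrapped in a Callable. ConnectRetrySketch and withRetries are hypothetical names; the actual Hector/Cassandra call that opens the connection is not shown and would go inside the Callable.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry an operation with a doubling delay between attempts. */
final class ConnectRetrySketch {

    /**
     * Calls attempt up to maxAttempts times. After each failure except the last
     * it sleeps, doubling the delay each time, and finally rethrows the last error.
     */
    static <T> T withRetries(Callable<T> attempt, int maxAttempts, long initialDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int n = 1; n <= maxAttempts; n++) {
            try {
                return attempt.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (n < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for the real connection attempt: fails twice, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Cassandra still booting");
            }
            return "connected";
        }, 5, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Suitable values for the attempt count and initial delay depend on how long Cassandra takes to boot in a given deployment; the numbers above are placeholders.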

Original issue reported on code.google.com by [email protected] on 25 Jan 2013 at 2:21

Logging framework + message catalog

Two enhancements are included in this issue:

1) Refactor the code to use a more flexible and faster logging framework
(log4j or logback). At the moment JULI is used, which is optimized for Tomcat
but relies on standard java.util.logging, which offers a limited set of
capabilities.

2) A message catalog, which would consist of an enumerative interface
(IMessageCatalog) where all CumulusRDF messages are defined. That would allow a
structured log with (for example) messages like this

...
2014-01-15 17:05:42,105 INFO  <CRDF-00011> : CUMULUS-RDF 1.0.0 open for 
e-business.  
...

As you can see, besides having all messages classified, we could associate a
code with each message and, for relevant messages (e.g. errors), we could
create a Wiki page with something like:

- Code: CDRF-000034
- Level: ERROR
- Message: Malformed configuration file.
- Suggested action: check your configuration file blabalbla

I know, that would require a bit more effort each time we need to write an
additional log message, but at the same time it would provide a very powerful
and meaningful log subsystem.
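
One possible shape for such a catalog, sketched as a plain Java enum. The name MessageCatalogSketch, the constants and the exact code format are illustrative assumptions, not an agreed design; the message texts are taken from the examples above.

```java
/** Hypothetical sketch of a message catalog: every log message has a code, a level and a template. */
enum MessageCatalogSketch {
    CRDF_00011("CRDF-00011", "INFO", "CUMULUS-RDF %s open for e-business."),
    CRDF_00034("CRDF-00034", "ERROR", "Malformed configuration file.");

    private final String code;
    private final String level;
    private final String template;

    MessageCatalogSketch(String code, String level, String template) {
        this.code = code;
        this.level = level;
        this.template = template;
    }

    String code() {
        return code;
    }

    String level() {
        return level;
    }

    /** Renders the message as "<CODE> : text", filling template placeholders with args. */
    String format(Object... args) {
        return "<" + code + "> : " + String.format(template, args);
    }

    public static void main(String[] args) {
        System.out.println(CRDF_00011.level() + "  " + CRDF_00011.format("1.0.0"));
        System.out.println(CRDF_00034.level() + " " + CRDF_00034.format());
    }
}
```

Because every message lives in one place, the same enum could also be used to generate the Wiki page entries (code, level, message, suggested action).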

Original issue reported on code.google.com by [email protected] on 27 Jan 2014 at 10:11

Support further RDF serializations

Support further RDF serializations, e.g., JSON-LD, Turtle, etc. These 
serializations could be used, e.g., in

* Dump CLI
* Loader CLI
* Servlets

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:50

Test failures in build 23

Lots of test failures after running the whole suite with the new Asynch Bulk loader.

See
http://dev.aifb.kit.edu/jenkins/job/CumulusRDF-Milestone-v1.1/lastBuild/testReport/


Original issue reported on code.google.com by [email protected] on 5 Mar 2014 at 4:29

Support transactions in Sesame

No support for transactions in Sesame, see
[http://openrdf.callimachus.net/sesame/2.7/docs/users.docbook?view#section-repository-api6 Sesame documentation].

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:29

LoadCLI does not support multithreading any more ...

LoadCLI does not support multithreading any more. It simply uses Sesame to add 
the file. This is not the intended way LoadCLI should work.

Original issue reported on code.google.com by andreas.josef.wagner on 13 Mar 2014 at 11:03

CumulusRDF webapp GUI

Although this is not a real priority for CumulusRDF, I believe we should create
a nicer (simple) GUI for web pages.

In order to keep things simple, lightweight and fast, I suggest using

- bootstrap [1] for graphical things: there's a dashboard [2] sample page that
should perfectly fit our needs;
- velocity [3] for dynamic pages: it has a very easy and powerful scripting
language

In this way we could, at least, substitute the info and the welcome page with a
more attractive dashboard. On top of that, we could gradually add some
additional functionality to the sidebar, as in the Sesame admin console
(e.g. summary, reports, export, add data, query, explore, remove data, SPARQL
query & update)

[1] http://getbootstrap.com/
[2] http://getbootstrap.com/examples/dashboard
[3] http://velocity.apache.org/


Original issue reported on code.google.com by [email protected] on 16 Feb 2014 at 4:24

Complex Accept header parsing does not work

What steps will reproduce the problem?
1. access a CumulusRDF URI with a complex accept header (e.g., using multiple 
content types with preferences)
2. problem

What is the expected output? What do you see instead?

The client should get the correctly negotiated format.
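
For illustration, a minimal sketch of how an Accept header with several content types and q-values could be negotiated. AcceptHeaderSketch and negotiate are hypothetical names and this is not the current CumulusRDF code. It is deliberately simplified: it picks the supported type with the highest q-value and ignores the rule that more specific media ranges take precedence over wildcards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: pick the supported media type with the highest q-value. */
final class AcceptHeaderSketch {

    /** Returns the best supported type for the given Accept header, or null if none is acceptable. */
    static String negotiate(String acceptHeader, List<String> supported) {
        String best = null;
        double bestQ = 0.0;
        for (String part : acceptHeader.split(",")) {
            String[] tokens = part.trim().split(";");
            String range = tokens[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default quality when no q parameter is given
            for (int k = 1; k < tokens.length; k++) {
                String param = tokens[k].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2).trim());
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat an unparsable q-value as "not acceptable"
                    }
                }
            }
            for (String type : supported) {
                if (q > bestQ && matches(range, type)) {
                    best = type;
                    bestQ = q;
                }
            }
        }
        return best;
    }

    /** Matches an exact type, a "type/*" range or the "*" + "/*" wildcard. */
    static boolean matches(String range, String type) {
        if (range.equals("*/*")) {
            return true;
        }
        if (range.endsWith("/*")) {
            return type.startsWith(range.substring(0, range.length() - 1));
        }
        return range.equals(type);
    }

    public static void main(String[] args) {
        List<String> supported = Arrays.asList("application/rdf+xml", "text/turtle");
        System.out.println(negotiate("application/rdf+xml;q=0.5, text/turtle;q=0.9, */*;q=0.1", supported));
    }
}
```

With equal q-values, the first entry in the header (and then the first entry in the supported list) wins.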


Original issue reported on code.google.com by [email protected] on 3 Feb 2013 at 3:34

Add HTTPRepository

Implement a Sesame HTTPRepository. See:

* org.openrdf.http.client.HTTPClient
* http://answers.semanticweb.com/questions/22068/exposing-a-triple-store-as-a-sesame-http-repository


Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 9:04

Dictionary Performance

SimpleCassandraMapDictionary performs terribly. This, in turn, leads to bad
performance for RDF insert operations.


Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:32

Loading large files gives timeouts

What steps will reproduce the problem?
1. Load a large (> 2 million triples) file.
2. You will see timeout messages.

What is the expected output? What do you see instead?

Higher timeouts, perhaps slowing down input.

Original issue reported on code.google.com by [email protected] on 25 Jan 2013 at 2:23

Defect in CLI Loader

CLI Loader does not load data ... 


Original issue reported on code.google.com by andreas.josef.wagner on 11 Dec 2013 at 1:42

Error in build 20

00:37:45,114 ERROR [edu.kit.aifb.cumulus.util.hector.CassandraHectorCounterFactory] counter: TRIPLE_COUNTER suffered an overflow! current counter value: -3

Original issue reported on code.google.com by andreas.josef.wagner on 3 Mar 2014 at 2:01

NodeDictionaryBase fails to create (datatyped) literals

NodeDictionaryBase:136 creates literals assuming the n3 string contains only
the value (no language, no datatype).
For example, given

n3 = "2012-02-01T09:53:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

the Literal instance creation 

Literal l = ValueFactory.createLiteral(n3)

leads to a wrong value because the datatype (and language) part is seen as part
of the value. That is, a new Literal is created with the following value:

""2012-02-01T09:53:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>"

What is the expected output? What do you see instead?
I would expect a Literal correctly created, with value, language and datatype.

This blocks a lot of unit tests, because the system indexes triples but is not
able to correctly return them as part of a SELECT or DESCRIBE query.
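
A minimal sketch of splitting such an n3 string into its label, language and datatype parts before creating the Literal. N3LiteralSketch and parse are hypothetical names; the sketch does not unescape characters inside the label and assumes the literal is well formed.

```java
/** Hypothetical sketch: split an N3 literal into label, language tag and datatype IRI. */
final class N3LiteralSketch {
    final String label;
    final String language; // null if absent
    final String datatype; // null if absent

    private N3LiteralSketch(String label, String language, String datatype) {
        this.label = label;
        this.language = language;
        this.datatype = datatype;
    }

    static N3LiteralSketch parse(String n3) {
        if (n3 == null || n3.length() < 2 || n3.charAt(0) != '"') {
            throw new IllegalArgumentException("not a literal: " + n3);
        }
        // Language tags and datatype IRIs cannot contain a quote,
        // so the last quote in the string closes the label.
        int close = n3.lastIndexOf('"');
        if (close == 0) {
            throw new IllegalArgumentException("unterminated literal: " + n3);
        }
        String label = n3.substring(1, close);
        String rest = n3.substring(close + 1);
        if (rest.isEmpty()) {
            return new N3LiteralSketch(label, null, null);
        }
        if (rest.startsWith("@") && rest.length() > 1) {
            return new N3LiteralSketch(label, rest.substring(1), null);
        }
        if (rest.startsWith("^^<") && rest.endsWith(">")) {
            return new N3LiteralSketch(label, null, rest.substring(3, rest.length() - 1));
        }
        throw new IllegalArgumentException("unexpected suffix: " + rest);
    }

    public static void main(String[] args) {
        N3LiteralSketch l = parse("\"2012-02-01T09:53:00Z\"^^<http://www.w3.org/2001/XMLSchema#dateTime>");
        System.out.println(l.label + " / " + l.language + " / " + l.datatype);
    }
}
```

With the parts separated, the Literal could then be created through the two-argument createLiteral variants of the Sesame ValueFactory (label plus language, or label plus datatype URI). Sesame's own N-Triples utilities may already offer equivalent parsing, which would be preferable to a hand-written parser.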

Original issue reported on code.google.com by [email protected] on 2 Feb 2014 at 9:36

Sparql test cases

As briefly discussed with Andreas, I would like to create a whole SPARQL test
suite that covers as many scenarios as possible.

To do that, we could use the examples (both ttl and rq files) in the book
"Learning SPARQL" [1]. I asked the author and it is OK with him; I'm waiting
for permission from O'Reilly.

So we will create a test case with several test methods that use and assert the
examples in the book.

In case O'Reilly doesn't allow such usage, I'll use those examples as a basis
for creating our own set of data files.

[1] http://www.learningsparql.com/


Original issue reported on code.google.com by [email protected] on 19 Feb 2014 at 2:42

Remove unnecessary sesame dependencies

We currently have 

<groupId>org.openrdf.sesame</groupId>
<artifactId>sesame-runtime</artifactId>

in our current pom. This simply adds (almost) all Sesame libs, regardless of
whether they are needed. TODO: remove unnecessary Sesame dependencies. This
would make the jar/war more lightweight in terms of space.

Original issue reported on code.google.com by andreas.josef.wagner on 24 Jan 2014 at 8:24

Simple keyword search

Simple keyword search: just a conjunction of terms tokenised from literals. 

  * Could be done using CQL collections: http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_using/use_collections_c.html#useCollections
  * Lucene/Solr integration
    * Stargate: http://tuplejump.github.io/stargate/index.html //looks cool
    * Lucandra/Solandra: https://github.com/tjake/Solandra //not maintained
    * Datastax Enterprise search(DSE) //not open-source
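
Independent of which backend is chosen, the "conjunction of terms tokenised from literals" idea can be sketched with an in-memory inverted index. KeywordIndexSketch and its methods are hypothetical names; a real implementation would keep the postings in Cassandra or Lucene rather than in a HashMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: inverted index from literal terms to subjects, queried conjunctively. */
final class KeywordIndexSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenises the literal and records the subject under each term. */
    void index(String subject, String literal) {
        for (String term : tokenize(literal)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(subject);
        }
    }

    /** Returns the subjects whose indexed literals contain every query term. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits); // conjunction: keep only subjects seen for all terms
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        KeywordIndexSketch idx = new KeywordIndexSketch();
        idx.index("s1", "Karlsruhe Institute of Technology");
        idx.index("s2", "Institute for Applied Informatics");
        System.out.println(idx.search("institute technology"));
    }
}
```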

Original issue reported on code.google.com by andreas.josef.wagner on 12 Feb 2014 at 4:18

Upgrade to Sesame 2.7.11

Upgrade to Sesame 2.7.11, see [1].

[1] https://openrdf.atlassian.net/browse/SES/fixforversion/11701

Original issue reported on code.google.com by andreas.josef.wagner on 14 Apr 2014 at 11:14

New (Maven) project layout

As discussed here [1], in order to enable several perspectives of the project
test suite, we should change the project layout a bit. The layout that comes
from the initial discussion [1] looks something like this:

cumulusrdf
--cumulusrdf-kernel
--cumulusrdf-integration-tests
--cumulusrdf-benchmark
--??

Where 

a) cumulusrdf: a top level project with pom packaging
b) cumulusrdf-kernel: please suggest a more appropriate name :), this is the 
current cumulusrdf module (war packaging). It includes sources and unit tests.
c) cumulusrdf-integration-tests: as the name suggests, this module includes 
only integration / system tests
d) cumulusrdf-benchmark: a special test module dedicated to benchmarking the 
corresponding release artifact

Another interesting module could be a "distribution" module that uses the Maven
assembly plugin to produce different kinds of artifacts (e.g. onejar, war,
directory)


[1] https://groups.google.com/forum/#!topicsearchin/cumulusrdf-dev-list/maven|sort:date|spell:true/cumulusrdf-dev-list/z3JegSK17gY

Original issue reported on code.google.com by [email protected] on 16 Feb 2014 at 4:15

Remove dependencies on NxParser and Yars

Remove dependencies on NxParser and Yars, only use Sesame
model/parsers/writers.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:18

Running several cassandra-unit tests is not possible

What steps will reproduce the problem?
1. Run more than one unit test that uses cassandra-unit for starting Cassandra

What is the expected output? What do you see instead?
I expect all tests to run correctly, but only the first one succeeds: from the
second test onwards, the embedded Cassandra complains about a duplicate index.
This seems to be related to cassandra-unit, which doesn't provide a way to shut
down the embedded instance between tests.

Original issue reported on code.google.com by [email protected] on 31 Jan 2014 at 2:21

Evaluate entity queries via a single scan

Currently, entity queries, e.g.,

?x knows ?y .
?x name "x" .
?x age "18" .

are evaluated via joins along their subject (in the above example: ?x). That
is, one would need to compute bindings for each triple pattern, and join them
using two equi-joins.

However, this (probably) could be done much more efficiently with a single 
scan. That is, one would start with a scan of the pattern with the least 
matches (e.g., ?x age "18"):

x1 age "18" --> scan for x1 ?p ?o
x2 age "18" --> scan for x2 ?p ?o
x3 age "18" --> scan for x3 ?p ?o
...

Each such scan (x1 ?p ?o) would result in additional property/object pairs -
these could be pushed to subsequent triple pattern accesses. For instance,

"x1 ?p ?o" could find "x1 knows y1", "x1 knows y2", "x1 name "x"" ... The
former two triples could be pushed to the access for ?x knows ?y, the latter
triple ("x1 name "x"") to the pattern access for ?x name "x".

The key advantage is really that scans (sorted accesses) are fairly cheap in
comparison to random access probes. Thus, when finding the first potential
result entity (e.g., x1), we could just scan over (all) its associated triples
...

- Andreas
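
A minimal sketch of the single-scan idea, under the assumption that the candidate subjects (the "seeds") have already been found via the most selective pattern and that each subject's predicate/object pairs can be read in one pass. EntityScanSketch, TriplePattern and matchEntities are hypothetical names; the map stands in for an SPO index row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: check all patterns of an entity query during one scan per subject. */
final class EntityScanSketch {

    /** One pattern "?x predicate object"; object == null stands for a variable. */
    static final class TriplePattern {
        final String predicate;
        final String object;

        TriplePattern(String predicate, String object) {
            this.predicate = predicate;
            this.object = object;
        }

        boolean matches(String p, String o) {
            return predicate.equals(p) && (object == null || object.equals(o));
        }
    }

    /** Returns the seed subjects for which every pattern is matched by at least one triple. */
    static List<String> matchEntities(Map<String, List<String[]>> spo,
                                      List<String> seeds,
                                      List<TriplePattern> patterns) {
        List<String> result = new ArrayList<>();
        for (String subject : seeds) {
            List<String[]> row = spo.get(subject);
            if (row == null) {
                continue;
            }
            boolean[] satisfied = new boolean[patterns.size()];
            // Single scan "subject ?p ?o": push each pair to every pattern it matches.
            for (String[] po : row) {
                for (int i = 0; i < patterns.size(); i++) {
                    if (patterns.get(i).matches(po[0], po[1])) {
                        satisfied[i] = true;
                    }
                }
            }
            boolean all = true;
            for (boolean b : satisfied) {
                all &= b;
            }
            if (all) {
                result.add(subject);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> spo = new TreeMap<>();
        spo.put("x1", Arrays.asList(new String[] {"age", "18"}, new String[] {"knows", "y1"},
                new String[] {"name", "x"}));
        spo.put("x2", Arrays.asList(new String[] {"age", "18"}, new String[] {"name", "z"}));
        List<TriplePattern> query = Arrays.asList(new TriplePattern("knows", null),
                new TriplePattern("name", "x"), new TriplePattern("age", "18"));
        System.out.println(matchEntities(spo, Arrays.asList("x1", "x2"), query));
    }
}
```

The sketch only reports which subjects qualify; collecting the actual bindings (e.g. all ?y values) would mean keeping the matching pairs per pattern instead of a boolean.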

Original issue reported on code.google.com by andreas.josef.wagner on 29 Jan 2014 at 9:33

Better documentation re configuration

The documentation is unclear.

The webapp can be configured using either a config file in /etc or WEB-INF
properties.

The client does not read the config file.

Possible solutions:
* improve documentation to make current setup clearer
* get rid of client and do loading also via webapp HTTP interface (so only 
webapp needs to be configured) - should be possible with current setup as 
thread buffers input and thus can iterate over the in-memory buffer for 
multiple index construction
* generate *.deb which installs webapp and config file (and log files with 
logrotate) in the right directories and with cassandra dependencies
* ?

Original issue reported on code.google.com by [email protected] on 2 May 2012 at 8:56

Better selectivity estimation

Better selectivity estimation, i.e., collect meaningful statistics for, e.g.,
triple patterns and join patterns.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:21

Upgrade to CQL

Switch from the Hector Thrift client to the Datastax CQL client.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:24

SimpleRDFXMLFormatter escapes Literals too much

What steps will reproduce the problem?
1. Serve data with the SimpleRDFXMLFormatter that contains a Literal that 
contains a space 

What is the expected output? What do you see instead?
Expected output is something like >Luiz Felipe<
Instead we get >Luiz+Felipe<

The reason is that the same escape function is used for Literals and Resources.
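
A minimal sketch of what a separate escape function for literal text could look like: only the XML markup characters are replaced, and spaces are left alone. XmlLiteralEscapeSketch and escapeText are hypothetical names, not existing SimpleRDFXMLFormatter methods.

```java
/** Hypothetical sketch: escape literal text for use as XML element content. */
final class XmlLiteralEscapeSketch {

    /** Replaces only the characters that are markup in XML text content. */
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                default:
                    sb.append(c); // spaces and all other characters stay as they are
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(">" + escapeText("Luiz Felipe") + "<");
    }
}
```

Resource URIs would keep their own escaping path; quote characters would additionally need escaping if literal text were ever written into an XML attribute.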

Original issue reported on code.google.com by [email protected] on 23 May 2012 at 7:46

Build errors in branch 1.0.1

What steps will reproduce the problem?
1. svn co https://cumulusrdf.googlecode.com/svn/branches/1.0.1 cumulusRDF
2. cd cumulusRDF
3. mvn clean install

Expected output is a build success but instead a build failure is reported.
Specifically, there are two problems

1) cannot find symbol LRUMap

LRUMap (used for example in NodeDictionaryBase) comes from sesame-sail-rdbms. 
Now, I'm not able to build the project using maven because that jar is 
(indirectly) declared with "runtime" scope.

That means  

- in an Eclipse workspace all works fine (no compilation errors) because m2e
imports runtime jars into the build path (it actually makes no distinction
between scopes);
- running an m2e or a Maven build will fail because that dependency is not
found at compile time.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
...
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project cumulusrdf: Compilation failure: Compilation failure:
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/store/dict/NodeDictionaryBase.java:[13,34] package org.openrdf.sail.rdbms.util does not exist
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/store/dict/NodeDictionaryBase.java:[47,9] cannot find symbol
...
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/util/hector/CassandraHectorMap.java:[29,34] package org.openrdf.sail.rdbms.util does not exist
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/util/hector/CassandraHectorMap.java:[118,9] cannot find symbol
[ERROR] symbol  : class LRUMap
[ERROR] location: class edu.kit.aifb.cumulus.util.hector.CassandraHectorMap<K,V>

2) RestServletPojoTest

This class, which is in the test/src folder, is referenced in a @see comment
in RESTApplicationResource (line 483), which belongs to the main/src folder.
As a consequence, RESTApplicationResource imports a class that belongs to the
tests, which are not visible during the build.

That is not immediately visible in an IDE (e.g. Eclipse), where there are no
compilation errors, but running an m2e or a Maven build I get

[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/webapp/rest/RESTApplicationResource.java:[44,34] cannot find symbol
[ERROR] symbol  : class RestServletPojoTest
[ERROR] location: package edu.kit.aifb.cumulus.webapp

Original issue reported on code.google.com by [email protected] on 23 Jan 2014 at 10:31

Output complete URI in error msg

What steps will reproduce the problem?
1. Proxy Mode
2. curl -x http://localhost:8080 -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Karlsruhe

This returns:
ERROR /resource/Karlsruhe 404: resource not found

The error message should contain the full URI, including the host part.


Original issue reported on code.google.com by [email protected] on 2 May 2012 at 8:36

CRUDServlet.Put assumes objects are URI

What steps will reproduce the problem?
Send a PUT request with 

s=<http://a.b.c#d>
p=<http://a.b.c#e>
o="A literal"

s2=<http://a.b.c#d>
p2=<http://a.b.c#e>
o2="Another literal"

What is the expected output? What do you see instead?
I would expect the following triple in the store

<http://a.b.c#d> <http://a.b.c#e> "Another literal"

Instead, the servlet throws an exception because the object is always assumed
to be a valid URI (i.e. the following line fails: URI o =
valueFactory.createURI("A literal"))


Original issue reported on code.google.com by [email protected] on 5 Feb 2014 at 3:40

Make proper shell based on our plain CLI

Make a proper shell based on our plain CLI functionality. 

See also: 
* http://stackoverflow.com/questions/14080604/libraries-for-constructing-an-interactive-shell-for-java-application
* http://java.dzone.com/announcements/clamshell-cli-framework

Original issue reported on code.google.com by andreas.josef.wagner on 14 Mar 2014 at 12:10

Bad link on Project Home

This is just a little problem with the home page of the software, rather than 
the software itself. 

What steps will reproduce the problem?
1. Go to Project Home (https://code.google.com/p/cumulusrdf/)
2. Under overview, click on the link to Apache Cassandra
3. You will be redirected to the dead link http://casssandra.apache.org/ 
(cassandra with the letter s 3 times). 

What is the expected output? What do you see instead?

I suppose it should be http://cassandra.apache.org/ (with two s's)



Original issue reported on code.google.com by [email protected] on 30 Oct 2013 at 1:16

Support for Cassandra 2.x

CumulusRDF currently only supports Cassandra 1.x. Add support for Cassandra 2.x.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:12

Error in build 20

00:38:06,669 ERROR [edu.kit.aifb.cumulus.store.CassandraRdfHectorTriple] caught java.lang.ArithmeticException: / by zero while inserting 0 [0, tries left: 10]
java.lang.ArithmeticException: / by zero
    at com.ecyrd.speed4j.StopWatch.toString(StopWatch.java:258)
    at edu.kit.aifb.cumulus.util.Util.logAndStopTimer(Util.java:245)
    at edu.kit.aifb.cumulus.util.Util.logAndStopTimer(Util.java:218)
    at edu.kit.aifb.cumulus.store.CassandraRdfHectorTriple.batchInsert(CassandraRdfHectorTriple.java:419)

Original issue reported on code.google.com by andreas.josef.wagner on 3 Mar 2014 at 1:59

TTL (Time to live) support

Time to live for added data, to be able to use CumulusRDF as a buffer for 
streams (e.g., always keep one year's worth of data of a given stream).

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:22

Deletion performance can be very bad

Deleting from the store triggers one test for deletion from a secondary
index for every triple. Using a hash table or a sorted tree as a buffer would
increase performance here.

Original issue reported on code.google.com by [email protected] on 11 Feb 2014 at 11:02

Remove CompositeColumns wherever possible

As discussed in [1], we could remove the CompositeColumns in favor of simple 
byte arrays (byte array concatenations).

[1] https://groups.google.com/d/msgid/cumulusrdf-dev-list/52FCE956.3040606%40gmail.com
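
As a rough illustration of "byte array concatenation", here is a sketch that packs several components into one array with a 4-byte length prefix per component and unpacks them again. ByteConcatSketch, concat and split are hypothetical names, and the length-prefixed layout is an assumption: if all components had a fixed length, the prefix could be dropped. Note that the prefix also affects the byte-wise sort order of the packed values.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: pack components into one byte array with length prefixes. */
final class ByteConcatSketch {

    /** Concatenates the parts, writing a 4-byte length before each one. */
    static byte[] concat(byte[]... parts) {
        int size = 0;
        for (byte[] p : parts) {
            size += 4 + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : parts) {
            buf.putInt(p.length).put(p);
        }
        return buf.array();
    }

    /** Splits an array produced by concat back into its parts. */
    static List<byte[]> split(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        List<byte[]> parts = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte[] p = new byte[buf.getInt()];
            buf.get(p);
            parts.add(p);
        }
        return parts;
    }

    public static void main(String[] args) {
        byte[] packed = concat(new byte[] {1, 2}, new byte[] {3}, new byte[] {4, 5, 6});
        for (byte[] part : split(packed)) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```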

Original issue reported on code.google.com by andreas.josef.wagner on 16 Feb 2014 at 1:01
