
duke's Issues

SPARQL data source hangs

From Michael.Hausenblas on May 21, 2011 16:46:06

What steps will reproduce the problem?
1. java no.priv.garshol.duke.Duke --showmatches dogfood.xml

What is the expected output? What do you see instead?
When I do a kill -SIGQUIT {PID} I get the following trace:

2011-05-21 15:39:25
Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02-334 mixed mode):

"Low Memory Detector" daemon prio=5 tid=10184e800 nid=0x108b69000 runnable [00000000]
java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=9 tid=10184d000 nid=0x108a66000 waiting on condition [00000000]
java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=9 tid=10184b800 nid=0x108963000 waiting on condition [00000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=9 tid=10184a800 nid=0x108860000 waiting on condition [00000000]
java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (CMS)" daemon prio=5 tid=101849000 nid=0x10875d000 waiting on condition [00000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=101830000 nid=0x108643000 in Object.wait() [108642000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <7f3001300> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <7f3001300> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=10182f000 nid=0x108532000 in Object.wait() [108531000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <7f30011d8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <7f30011d8> (a java.lang.ref.Reference$Lock)

"main" prio=5 tid=101801800 nid=0x100501000 runnable [100500000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <7f3e12df0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
- locked <7f3e10418> (a sun.net.www.protocol.http.HttpURLConnection)
at no.priv.garshol.duke.SparqlClient.getResponse(SparqlClient.java:52)
at no.priv.garshol.duke.SparqlClient.execute(SparqlClient.java:30)
at no.priv.garshol.duke.SparqlDataSource$SparqlIterator.fetchNextPage(SparqlDataSource.java:106)
at no.priv.garshol.duke.SparqlDataSource$SparqlIterator.next(SparqlDataSource.java:92)
at no.priv.garshol.duke.SparqlDataSource$SparqlIterator.next(SparqlDataSource.java:43)
at no.priv.garshol.duke.Duke.main(Duke.java:82)

"VM Thread" prio=9 tid=10182a000 nid=0x10842f000 runnable

"Gang worker#0 (Parallel GC Threads)" prio=9 tid=101804800 nid=0x1007c7000 runnable

"Gang worker#1 (Parallel GC Threads)" prio=9 tid=101805800 nid=0x1017cc000 runnable

"Concurrent Mark-Sweep GC Thread" prio=9 tid=101808000 nid=0x1080b6000 runnable

"VM Periodic Task Thread" prio=10 tid=101850800 nid=0x108c6c000 waiting on condition

"Exception Catcher Thread" prio=10 tid=101802800 nid=0x100604000 runnable

JNI global references: 1704

Heap
par new generation total 19136K, used 14785K [7f3000000, 7f44c0000, 7f44c0000)
eden space 17024K, 86% used [7f3000000, 7f3e707e8, 7f40a0000)
from space 2112K, 0% used [7f40a0000, 7f40a0000, 7f42b0000)
to space 2112K, 0% used [7f42b0000, 7f42b0000, 7f44c0000)
concurrent mark-sweep generation total 63872K, used 0K [7f44c0000, 7f8320000, 7fae00000)
concurrent-mark-sweep perm gen total 21248K, used 8748K [7fae00000, 7fc2c0000, 800000000)

What version of the product are you using? On what operating system?
Using duke-0.2-SNAPSHOT.jar built from source with Java version "1.6.0_24" on Mac OS X 10.5.8
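
The trace shows the main thread blocked in SocketInputStream.socketRead0 beneath HttpURLConnection.getInputStream, called from SparqlClient.getResponse. HttpURLConnection applies no read timeout by default, so an endpoint that accepts the connection but never answers can block the read indefinitely. Below is a minimal sketch of opening a connection with explicit timeouts. The helper name TimeoutConnections, the timeout values and the Accept header are assumptions for illustration; the source does not show how SparqlClient configures its connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Duke: opens an HTTP connection that gives up
// instead of blocking forever when the endpoint stops responding.
final class TimeoutConnections {
    private TimeoutConnections() {}

    static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            // openConnection() only creates the connection object; no network I/O happens yet.
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
            conn.setReadTimeout(readTimeoutMs);       // limit on each blocking read
            // Assumed media type for SPARQL XML results; adjust to what the endpoint serves.
            conn.setRequestProperty("Accept", "application/sparql-results+xml");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a read timeout set, a stalled read raises java.net.SocketTimeoutException instead of hanging, which the caller can report or retry.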

Original issue: http://code.google.com/p/duke/issues/detail?id=16

PersonNameComparator: handling of short words

From [email protected] on October 28, 2011 02:06:13

I may be wrong but it seems the PersonNameComparator has a couple of bugs:

  1. Execution never seems to reach the else if responsible for handling short tokens (line 88):
    } else if (t1[ix].length() + t2[ix].length() <= 4)
    // it's not an initial, so if the strings are 4 characters
    // or less, we quadruple the edit dist
    d = d * 4;
    else
  2. In line 72, should t1.length be t2.length, since t1 is always the longer token?
    } else if (d > 1 && (ix + 1) <= t1.length)

What do you think?

Original issue: http://code.google.com/p/duke/issues/detail?id=45

Save/Retrieve multiple property values in the lucene database

From [email protected] on June 13, 2011 08:32:15

Hello,

I would like to index a record in which a property has more than one value.
At the moment I can see that only the first value gets saved in the Lucene database:

String value = record.getValue(propname);

As a result, when a candidate is retrieved from the database, all other values are lost and the comparison is not what I'd like it to be.

Would it be possible to correctly save/retrieve the whole collection of property values in the database?

Thanks
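
To make the request concrete, here is a minimal sketch of a record that keeps every value of a property. MultiValueRecord and its methods are hypothetical and are not Duke's Record interface. On the storage side, a Lucene Document can hold several fields with the same name, and Document.getValues(String) returns all of them, so the index itself should be able to round-trip such a record.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type for illustration; Duke's own Record interface may differ.
final class MultiValueRecord {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    void addValue(String property, String value) {
        values.computeIfAbsent(property, k -> new ArrayList<>()).add(value);
    }

    // Mirrors the single-value accessor quoted in the issue: only the first value is visible.
    String getValue(String property) {
        List<String> all = values.get(property);
        return (all == null || all.isEmpty()) ? null : all.get(0);
    }

    // Returns every stored value for the property, in insertion order.
    Collection<String> getValues(String property) {
        List<String> all = values.get(property);
        return all == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(all);
    }
}
```

A comparison step could then compare every value of one record against every value of the candidate and keep the best score, rather than looking at the first value only.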

Original issue: http://code.google.com/p/duke/issues/detail?id=22

Try opening reader directly from the writer

From [email protected] on August 25, 2011 21:30:46

One of the costliest operations we perform right now is IndexWriter.commit(), and in fact we introduced the whole troublesome batching concept specifically to be able to live with this limitation. It's possible to open a special reader from a writer to get "near real-time" searching, and we should try out whether this works better. http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/index/IndexReader.html#open(org.apache.lucene.index.IndexWriter, boolean)

Original issue: http://code.google.com/p/duke/issues/detail?id=31

Add MatchListener method for debugging record scores

From [email protected] on June 13, 2011 08:50:13

Would it be possible to add a method to the MatchListener that is called regardless of whether a match has been identified? For debugging and fine-tuning purposes it would be nice to see what probability each record pair scores. It would be even better if each property showed the probability it scored on its own.

Thanks
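
One possible shape for such a callback, sketched under assumptions: ComparisonListener, compared and ScoreLog are hypothetical names chosen so they do not clash with Duke's actual MatchListener, and records are reduced to string IDs to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback, deliberately not named MatchListener: it is meant to be
// invoked for every compared pair, whether or not the pair counts as a match.
interface ComparisonListener {
    void compared(String id1, String id2, String property, double probability);
}

// Collects one readable line per property comparison for later inspection.
final class ScoreLog implements ComparisonListener {
    final List<String> lines = new ArrayList<>();

    @Override
    public void compared(String id1, String id2, String property, double probability) {
        lines.add(id1 + " ~ " + id2 + " [" + property + "] = " + probability);
    }
}
```

Calling compared once per property gives the per-property breakdown asked for above; the overall score could be reported through the same method under a reserved property name.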

Original issue: http://code.google.com/p/duke/issues/detail?id=23

Should be possible for data sources to assert difference

From [email protected] on November 01, 2011 11:23:39

It should be possible to assert A owl:differentFrom B in a data source, and for this to prevent Duke from ever claiming that A owl:sameAs B.

The JDBCLinkDatabase component in Duke already supports this. If a row (A, B, DIFFERENT, ASSERTED) were to appear in the database, Duke would never add an owl:sameAs between A and B. However, Duke currently has no way to get this information from the UMIC into the link database.

To add support for that we'd need to:

  • Add a Collection getLinks() method to the Record interface, so that records can arrive in Duke with pre-known link information.
  • Add support for populating this data to individual data sources.

Oh, and we also need to update the code so that this gets written correctly to the link database.
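
The rule described above (an asserted link is never overridden by an inferred one) can be sketched with an in-memory stand-in for the link database. LinkStore, Kind and Status are hypothetical names for illustration and are not the JDBCLinkDatabase API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a link database; names are illustrative only.
final class LinkStore {
    enum Kind { SAME, DIFFERENT }
    enum Status { ASSERTED, INFERRED }

    private static final class Link {
        final Kind kind;
        final Status status;
        Link(Kind kind, Status status) { this.kind = kind; this.status = status; }
    }

    private final Map<String, Link> links = new HashMap<>();

    // Order-independent key so (A, B) and (B, A) refer to the same link.
    // Assumes IDs never contain a newline.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\n" + b : b + "\n" + a;
    }

    // Returns true if the link was stored; an inferred link never replaces an asserted one.
    boolean add(String a, String b, Kind kind, Status status) {
        String k = key(a, b);
        Link existing = links.get(k);
        if (existing != null
                && existing.status == Status.ASSERTED
                && status == Status.INFERRED) {
            return false;
        }
        links.put(k, new Link(kind, status));
        return true;
    }

    // Returns the stored kind for the pair, or null if no link is known.
    Kind kindOf(String a, String b) {
        Link link = links.get(key(a, b));
        return link == null ? null : link.kind;
    }
}
```

A data source that knows A and B are different would add (A, B, DIFFERENT, ASSERTED) before matching starts, and any later inferred SAME link for that pair is rejected.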

Original issue: http://code.google.com/p/duke/issues/detail?id=46

Configure Database connection via JNDI

From [email protected] on November 04, 2011 09:21:45

We have a web application where all database connections are configured via JNDI. This allows us, for example, to set up different database connections for our test and production systems without maintaining different war files.

A datasource configuration could look like this:

<column name=.../>

The actual data source would be configured within the context of the web application's servlet container.

In any case: nice project!

Original issue: http://code.google.com/p/duke/issues/detail?id=47

Support for multithreaded processing

From [email protected] on September 04, 2011 15:22:56

We should be able to use threads to make use of all the processor cores in modern machines. Below is an outline of how it might be done.

  • One thread runs the data source and collects records from there into a queue.
  • Another set of threads collects records from the queue and indexes them. It seems that multiple threads doing indexing should work: http://darksleep.com/lucene/ Once indexed, the records are stuffed into a second queue.
  • A pool of threads picks records from the second queue and does the matching on them.
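
The outline above can be sketched with two bounded queues and a fixed thread pool. This is a sketch under assumptions, not Duke's implementation: Pipeline, the queue capacity of 100 and the "indexed:"/"matched:" string tagging are placeholders standing in for real records, indexing and matching. It also assumes at least one indexer and one matcher thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical three-stage pipeline: read -> index -> match.
final class Pipeline {
    // End-of-stream marker, recognised by identity rather than by content.
    private static final String POISON = new String("<end>");

    static List<String> run(List<String> records, int indexers, int matchers) {
        BlockingQueue<String> toIndex = new LinkedBlockingQueue<>(100);
        BlockingQueue<String> toMatch = new LinkedBlockingQueue<>(100);
        Queue<String> results = new ConcurrentLinkedQueue<>();
        // One thread per task so no stage waits for a free pool thread.
        ExecutorService pool = Executors.newFixedThreadPool(1 + indexers + matchers);
        try {
            // Stage 1: feed records into the first queue, then one marker per indexer.
            Callable<Void> read = () -> {
                for (String r : records) toIndex.put(r);
                for (int i = 0; i < indexers; i++) toIndex.put(POISON);
                return null;
            };
            // Stage 2: "index" each record and pass it on to the second queue.
            Callable<Void> index = () -> {
                for (String r = toIndex.take(); r != POISON; r = toIndex.take()) {
                    toMatch.put("indexed:" + r); // placeholder for real indexing
                }
                return null;
            };
            // Stage 3: "match" each indexed record and collect the result.
            Callable<Void> match = () -> {
                for (String r = toMatch.take(); r != POISON; r = toMatch.take()) {
                    results.add("matched:" + r); // placeholder for real matching
                }
                return null;
            };

            Future<Void> reader = pool.submit(read);
            List<Future<Void>> indexJobs = new ArrayList<>();
            for (int i = 0; i < indexers; i++) indexJobs.add(pool.submit(index));
            List<Future<Void>> matchJobs = new ArrayList<>();
            for (int i = 0; i < matchers; i++) matchJobs.add(pool.submit(match));

            reader.get();
            for (Future<Void> f : indexJobs) f.get();
            // All indexers are done, so the matchers can now be told to stop.
            for (int i = 0; i < matchers; i++) toMatch.put(POISON);
            for (Future<Void> f : matchJobs) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("pipeline failed", e);
        } finally {
            pool.shutdownNow();
        }
        return new ArrayList<>(results);
    }
}
```

Results arrive in no particular order, since several matcher threads run at once. The bounded queues keep a fast data source from filling memory when indexing or matching is the slower stage.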

Original issue: http://code.google.com/p/duke/issues/detail?id=35
