icij / datashare


A self-hosted search engine for documents.

Home Page: https://datashare.icij.org

License: GNU Affero General Public License v3.0

Java 98.88% Shell 0.94% Makefile 0.09% Dockerfile 0.09% HTML 0.01%
datashare docker elasticsearch extract investigative-journalism named-entity-recognition text-extraction web-gui

datashare's People

Contributors

annelhote, bamthomas, caro3801, clemdoum, dependabot[bot], dmytromolkov, henryclw, jonashaag, julm, madoleary, marieglr, millaguie, mvanzalu, neoreuters, pirhoo, sabrinadz, soliine, stefanbirkner


datashare's Issues

Error while discovering named entities

On my localhost, execute ./lauchBack.sh.
Then, on the indexing page, I try to start the named-entity discovery task
and get this stack trace:
2018-07-05 15:31:11,162 [CORENLP-0] INFO CoreNlpNerModels - sync models : true
2018-07-05 15:31:11,942 [CORENLP-0] INFO CoreNlpNerModels - downloading models for language ENGLISH under dist/models/corenlp/3-8-0/en
2018-07-05 15:31:14,442 [CORENLP-0] WARN NlpConsumer - error in consumer main loop
com.amazonaws.SdkClientException: Unable to execute HTTP request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:834)
at com.amazonaws.services.s3.transfer.TransferManager.downloadDirectory(TransferManager.java:1211)
at com.amazonaws.services.s3.transfer.TransferManager.downloadDirectory(TransferManager.java:1157)
at com.amazonaws.services.s3.transfer.TransferManager.downloadDirectory(TransferManager.java:1148)
at org.icij.datashare.io.RemoteFiles.download(RemoteFiles.java:67)
at org.icij.datashare.text.nlp.AbstractModels.downloadIfNecessary(AbstractModels.java:114)
at org.icij.datashare.text.nlp.AbstractModels.load(AbstractModels.java:66)
at org.icij.datashare.text.nlp.AbstractModels.get(AbstractModels.java:55)
at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.initializeNerAnnotator(CorenlpPipeline.java:180)
at org.icij.datashare.text.nlp.corenlp.CorenlpPipeline.initialize(CorenlpPipeline.java:77)
at org.icij.datashare.text.nlp.NlpConsumer.extractNamedEntities(NlpConsumer.java:67)
at org.icij.datashare.text.nlp.NlpConsumer.run(NlpConsumer.java:41)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at java.base/sun.security.ssl.Alerts.getSSLException(Alerts.java:214)
at java.base/sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1974)
at java.base/sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1926)
at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1909)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1436)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:132)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
at com.amazonaws.http.conn.$Proxy44.connect(Unknown Source)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1236)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
... 24 common frames omitted
Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at java.base/sun.security.validator.PKIXValidator.&lt;init&gt;(PKIXValidator.java:89)
at java.base/sun.security.validator.Validator.getInstance(Validator.java:181)
at java.base/sun.security.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:330)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrustedInit(X509TrustManagerImpl.java:180)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:192)
at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:133)
at java.base/sun.security.ssl.ClientHandshaker.checkServerCerts(ClientHandshaker.java:1947)
at java.base/sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1777)
at java.base/sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:264)
at java.base/sun.security.ssl.Handshaker.processLoop(Handshaker.java:1098)
at java.base/sun.security.ssl.Handshaker.processRecord(Handshaker.java:1026)
at java.base/sun.security.ssl.SSLSocketImpl.processInputRecord(SSLSocketImpl.java:1137)
at java.base/sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1074)
at java.base/sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at java.base/sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1402)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1429)
... 45 common frames omitted
Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at java.base/java.security.cert.PKIXParameters.setTrustAnchors(PKIXParameters.java:200)
at java.base/java.security.cert.PKIXParameters.&lt;init&gt;(PKIXParameters.java:120)
at java.base/java.security.cert.PKIXBuilderParameters.&lt;init&gt;(PKIXBuilderParameters.java:104)
at java.base/sun.security.validator.PKIXValidator.&lt;init&gt;(PKIXValidator.java:86)
... 60 common frames omitted
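The root cause is the last frame: the JVM could find no CA trust anchors at all, which usually means the truststore (cacerts) is missing, empty, or unreadable — a common symptom on Debian-based setups where ca-certificates-java was not (re)generated, or where javax.net.ssl.trustStore points at a bad file. A minimal diagnostic sketch, assuming the default cacerts location under java.home, that counts the trust anchors the running JVM can load:

```java
import java.io.FileInputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.util.Enumeration;

public class TrustStoreCheck {
    // Counts the entries in the default JVM truststore (cacerts).
    // A count of zero reproduces "the trustAnchors parameter must be non-empty".
    public static int countTrustAnchors() throws Exception {
        Path cacerts = Paths.get(System.getProperty("java.home"), "lib", "security", "cacerts");
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream(cacerts.toFile())) {
            ks.load(in, "changeit".toCharArray()); // default cacerts password
        }
        int anchors = 0;
        for (Enumeration<String> aliases = ks.aliases(); aliases.hasMoreElements(); aliases.nextElement()) {
            anchors++;
        }
        return anchors;
    }
}
```

If the count is zero, reinstalling the CA bundle (on Debian, the ca-certificates-java package) or pointing -Djavax.net.ssl.trustStore at a valid keystore typically makes the S3 model download work again.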

Bug: persistent filters visible in URL, invisible in facets

Will Fitzgibbon experienced this bug during our call with him:

  • He had filtered results by the Italian language in the past
  • Italian stayed in the URL and was still applied
  • But it was not visible in the left language facet (no "Italian" selected)

He is on Windows and was using a local Datashare. I didn't manage to reproduce the bug (Soline).

Option `--redisAddress` is ignored

The option --redisAddress is ignored by Datashare.

Steps to reproduce:

  • Set up a Redis container called redis-test
  • Ensure there is no running container called redis
  • Pass redis://redis-test:6379 to Datashare with --redisAddress
  • Try to parse named entities in a document
  • The extraction crashes because redis is not a known host, which shows the default address is still being used.
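The symptom (Datashare still resolving the host redis from redis://redis:6379) suggests the CLI value never overrides the built-in default when properties are merged. A hypothetical sketch of the intended precedence — the property name redisAddress matches the option, everything else is illustrative:

```java
import java.util.Map;

public class RedisAddressResolution {
    static final String DEFAULT_REDIS_ADDRESS = "redis://redis:6379";

    // A value given on the command line should win over the built-in
    // default; a missing or empty option falls back to the default.
    public static String resolveRedisAddress(Map<String, String> cliOptions) {
        String fromCli = cliOptions.get("redisAddress");
        return (fromCli == null || fromCli.isEmpty()) ? DEFAULT_REDIS_ADDRESS : fromCli;
    }
}
```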

QDS Task monitoring & scaling

How could we design task execution monitoring so that it can scale?

The current design of task handling does not scale for:

  • indexing of documents
  • named-entity finding

It is possible to scale Datashare to scan lots of documents, but not through the web API: it has to be done "by hand".

No enum constant in org.icij.datashare.text.Language

When indexing Offshore Leaks there are a lot of errors like:

java.lang.IllegalArgumentException: No enum constant org.icij.datashare.text.Language.CY
at java.lang.Enum.valueOf(Enum.java:238)
at org.icij.datashare.text.Language.valueOf(Language.java:12)
at org.icij.datashare.text.Language.parse(Language.java:64)
at org.icij.datashare.text.indexing.elasticsearch.language.OptimaizeLanguageGuesser.guess(OptimaizeLanguageGuesser.java:27)
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer.getMap(ElasticsearchSpewer.java:116)
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer.prepareRequest(ElasticsearchSpewer.java:81)
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer.indexDocument(ElasticsearchSpewer.java:134)
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer.write(ElasticsearchSpewer.java:72)
at org.icij.extract.extractor.Extractor.extract(Extractor.java:272)
at org.icij.extract.extractor.DocumentConsumer.lambda$accept$0(DocumentConsumer.java:125)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
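Enum.valueOf throws as soon as the language guesser returns a code (here CY, Welsh) that has no constant in org.icij.datashare.text.Language. A hedged sketch of a tolerant parse, using a stand-in enum (the real one has many more members), that falls back to UNKNOWN instead of aborting the document:

```java
public class SafeLanguageParse {
    // Stand-in for org.icij.datashare.text.Language (illustrative subset).
    enum Language { ENGLISH, FRENCH, ITALIAN, UNKNOWN }

    // valueOf throws IllegalArgumentException on unknown codes;
    // catching it lets indexing continue with UNKNOWN instead of failing.
    static Language parse(String name) {
        try {
            return Language.valueOf(name.toUpperCase());
        } catch (IllegalArgumentException e) {
            return Language.UNKNOWN;
        }
    }
}
```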

Enable GateNLP

GateNLP was disabled in 7741a4e because its dependencies clashed with extract-lib.

GateNLP pulls in Tika and its dependencies (pdfbox, jpeg codecs...) just like extract-lib does.
extract-lib uses Tika 1.18 while GateNLP uses Tika 1.7 (in the latest GateNLP version).

Besides, our GateNLP dependency is on version 1.1 while GateNLP is now at 8.5.1, so it also has to be upgraded.

Replace "Start a new task" with 2 buttons

On http://localhost:8080/#/indexing, replace the button "Start a new task" with:

  • Button 1: "Extract text"

Clicking it opens the pop-in window "Do you want to extract text also from images and PDFs?" with "Yes / No" radio buttons and just a "Next" button (no "Previous"). We then remove the recap pop-in window "Ready?".

  • Button 2: "Find named entities"

Clicking it opens the pop-in window "Which technology do you want to apply in order to find named entities?" with the NLP options and just a "Next" button (no "Previous"). We then remove the recap pop-in window "Ready?".

Interdependence of these 2 buttons:

  • While any text-extraction task is still "running", the "Find named entities" button is grey and users cannot click on it. When they hover it and/or try to click on it, a message says: "Texts must be extracted before finding named entities".
  • When all text-extraction tasks are "done", the "Find named entities" button is burgundy and users can click on it.

(screenshot: github 2 buttons)

Blank Screen

  • Synchronize timeouts between the scripts and the Elasticsearch ping
  • What if Elasticsearch does not start?
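One way to synchronize the scripts with the Elasticsearch ping is to poll the HTTP endpoint until it answers or a deadline passes, instead of sleeping for a fixed time. A JDK-only sketch (the URL and timeout values are assumptions, not Datashare's actual ones):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class WaitForHttp {
    // Polls the given URL until it returns a 2xx status or the deadline passes.
    // Returns false if the service never came up in time.
    public static boolean waitFor(String url, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(1000);
                conn.setReadTimeout(1000);
                int code = conn.getResponseCode();
                conn.disconnect();
                if (code >= 200 && code < 300) return true;
            } catch (Exception notUpYet) {
                // connection refused or timed out: retry until the deadline
            }
            try { Thread.sleep(500); } catch (InterruptedException e) { return false; }
        }
        return false;
    }
}
```

With a scheme like this, the "blank screen" case degrades into an explicit failure (waitFor returns false) instead of a page served before Elasticsearch is reachable.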

Changing project does not update facets

On the server datashare.cloud.icij.org, when we select a project, the search results are updated, but the facets (and the named entities) are still the ones of the user's index.

Datashare placeholder starting page

To avoid having users go to Datashare's URL before it is ready, create a placeholder starting page which refreshes every 2 seconds and displays "Datashare is starting... Please wait".
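The periodic refresh needs nothing more than a meta refresh tag in the served HTML. A minimal JDK-only sketch of such a placeholder responder — the wording comes from the issue, while the class and port handling are assumptions:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class PlaceholderPage {
    // HTML page that reloads itself every 2 seconds while Datashare boots.
    static final String PAGE =
        "<html><head><meta http-equiv=\"refresh\" content=\"2\"></head>"
      + "<body>Datashare is starting... Please wait</body></html>";

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

In practice the placeholder would hold the web port until the real server is ready, then release it so the next refresh lands on Datashare itself.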

NER via HTTP API?

Congratulations on making the Datashare code available! I think the multi-NER pipeline is really cool, so allow me two questions:

a) are you planning an API endpoint that makes the chained NER available independently of the content extraction and indexing parts of the application? It'd be really cool if there was an API that just allowed submitting a bunch of strings and then returned an annotated version of them.

b) Is it possible to boot the server without defining an ES index at all (i.e. neither remote nor the built-in)?

Multithreading with Jedis : JedisDataException

When running Datashare in a multithreaded environment, we see:

2018-05-19 11:03:02,115 [pool-5-thread-1] ERROR DocumentConsumer - Exception while consuming file:  "/home/datashare/data/apollo-global-management-2009-tax-ruling.pdf".
redis.clients.jedis.exceptions.JedisDataException: ERR Protocol error: expected '$', got ' '
at redis.clients.jedis.Protocol.processError(Protocol.java:127)
at redis.clients.jedis.Protocol.process(Protocol.java:161)
at redis.clients.jedis.Protocol.read(Protocol.java:215)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:265)
at redis.clients.jedis.Jedis.publish(Jedis.java:2690)
at org.icij.datashare.com.redis.RedisPublisher.publish(RedisPublisher.java:22)
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer.indexDocument(ElasticsearchSpewer.java:133)
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchSpewer.write(ElasticsearchSpewer.java:66)
at org.icij.extract.extractor.Extractor.extract(Extractor.java:272)
at org.icij.extract.extractor.DocumentConsumer.lambda$accept$0(DocumentConsumer.java:125)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

This is probably caused by redis/jedis#1358.
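A single Jedis instance is not thread-safe (that is the substance of redis/jedis#1358): two threads interleaving writes on one socket corrupt the protocol stream, which matches the "expected '$', got ' '" error above. The usual remedy is a JedisPool with one connection checked out per operation; a sketch under that assumption (host and port are illustrative, and the snippet requires the Jedis dependency):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class SafePublisher {
    private final JedisPool pool;

    public SafePublisher(String host, int port) {
        // Each caller borrows its own connection instead of sharing one socket.
        this.pool = new JedisPool(new JedisPoolConfig(), host, port);
    }

    public void publish(String channel, String message) {
        try (Jedis jedis = pool.getResource()) { // returned to the pool on close
            jedis.publish(channel, message);
        }
    }
}
```

Applied to Datashare, RedisPublisher.publish (visible in the stack trace) would draw from a pool rather than reuse one shared Jedis instance across consumer threads.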

Use half of the memory of the user's computer

  • Get the memory of the user's computer
  • Take half of it

Logs of a case where we could not run CoreNLP:

2019-01-18 14:35:17,311 [CORENLP-0] WARN NlpConsumer - error in consumer main loop
java.lang.OutOfMemoryError: GC overhead limit exceeded
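The JVM can read the machine's physical memory through the com.sun.management extension of OperatingSystemMXBean; halving that value gives a heap size to pass when launching the worker. A sketch — the cast relies on a HotSpot-specific interface, so it is an assumption that the user's JVM provides it:

```java
import java.lang.management.ManagementFactory;

public class HalfMemory {
    // Returns half of the machine's physical memory, in bytes.
    // Relies on the com.sun.management extension available on HotSpot JVMs.
    public static long halfOfPhysicalMemoryBytes() {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        return os.getTotalPhysicalMemorySize() / 2;
    }
}
```

The computed value could then be turned into an -Xmx flag for the process that runs CoreNLP, which would avoid the "GC overhead limit exceeded" failure above on small default heaps.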

Install Datashare with Avast (and other firewalls)

A beta user wrote: "I haven't been able to open it; Avast kept interfering but I can tell it the file's ok; however, Windows shows an error message, saying I don't have the administrator rights (it's my personal computer so I don't get it). The setup wizard never appears." A few hours later, she said: "Good news: The set up wizard now opens. The only reason I can think of, is that Avast was blocking the wizard until it had analysed the software internally."

Bruno says this happens with BitDefender and could happen with other firewalls.

Error installing

While installing on a brand new Debian Stretch I ran into this error. I supposed that version 0.17 would be mostly stable. It's for a small but urgent zero-budget project, so I have to be time-efficient on it.

Here is the full log, from a brand new machine (as the root user):
$ wget https://raw.githubusercontent.com/ICIJ/datashare/master/datashare-dist/src/main/datashare.sh
$ apt-get install apt-transport-https dirmngr
$ echo 'deb https://apt.dockerproject.org/repo debian-stretch main' >> /etc/apt/sources.list
$ apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys F76221572C52609D
$ apt-get update
$ apt-get install docker git curl docker-compose vim htop
$ apt-get install docker-engine
$ mkdir /mnt/data/
$ mkdir /mnt/data/documents/
$ cd /mnt/data/documents/
$ chmod a+x /home/user/datashare.sh
$ /home/user/datashare.sh -w
Creating network "datashare_default" with the default driver
Pulling elasticsearch (docker.elastic.co/elasticsearch/elasticsearch:6.3.0)...
6.3.0: Pulling from elasticsearch/elasticsearch
7dc0dca2b151: Pull complete
a50481268b4a: Pull complete
ee5228de771f: Pull complete
da55b983e8eb: Pull complete
f2b07ded3946: Pull complete
667f672695bc: Pull complete
243fca89bf48: Pull complete
Digest: sha256:3169b4009903c47245d3bf81d8a7b11ab22ce2965598823384e5c4bbd4298316
Status: Downloaded newer image for docker.elastic.co/elasticsearch/elasticsearch:6.3.0
Pulling redis (redis:4.0.1-alpine)...
4.0.1-alpine: Pulling from library/redis
88286f41530e: Pull complete
5b66139f3b4e: Pull complete
111233a1622e: Pull complete
df34d5b23fd6: Pull complete
d313ea2920e6: Pull complete
b82e355658cc: Pull complete
Digest: sha256:ff5db5ad4c91771af708c05aa2b44fccd5107687cf5caed4618c9a1f9471d226
Status: Downloaded newer image for redis:4.0.1-alpine
Creating datashare_elasticsearch_1
Creating datashare_redis_1
Folder path that contains documents [/mnt/data/documents] : /mnt/data/documents/
Folder path for cache (datashare will store models here) [/tmp/dist] :
waiting for index to be up...OK
Unable to find image 'icij/datashare:0.17' locally
0.17: Pulling from icij/datashare
8e3ba11ec2a2: Pull complete
3dd014c6e448: Pull complete
8b6257160d2f: Pull complete
91e262098646: Pull complete
03d86f381922: Pull complete
8798e221b80a: Pull complete
9fb071007ffc: Pull complete
e1eeb27a4711: Pull complete
19f3d1c18994: Pull complete
f1fc0ab88ab6: Pull complete
Digest: sha256:219dee5b161c4fee04780531db6c137389f4ae6f9a5081e2139cfb32b17beeec
Status: Downloaded newer image for icij/datashare:0.17
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-nlp-gate-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-cli-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-nlp-opennlp-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-nlp-mitie-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-nlp-ixapipe-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-nlp-corenlp-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/datashare/lib/datashare-web-0.17-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14:02:11,124 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
14:02:11,124 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
14:02:11,124 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/datashare/lib/datashare-nlp-gate-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,125 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
14:02:11,126 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/datashare/lib/datashare-nlp-mitie-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,126 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/datashare/lib/datashare-nlp-ixapipe-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,126 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/datashare/lib/datashare-nlp-gate-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,126 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/datashare/lib/datashare-nlp-opennlp-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,126 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/datashare/lib/datashare-nlp-corenlp-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,126 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/home/datashare/lib/datashare-cli-0.17-jar-with-dependencies.jar!/logback.xml]
14:02:11,151 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@6cc4c815 - URL [jar:file:/home/datashare/lib/datashare-nlp-gate-0.17-jar-with-dependencies.jar!/logback.xml] is not of type file
14:02:11,316 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
14:02:11,324 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
14:02:11,335 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
14:02:11,345 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
14:02:11,426 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
14:02:11,429 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILE]
14:02:11,441 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy@981661423 - No compression will be used
14:02:11,443 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy@981661423 - Will use the pattern ./logs/datashare_gatenlp.%d{yyyy-MM-dd}.log for the active file
14:02:11,449 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - The date pattern is 'yyyy-MM-dd' from file name pattern './logs/datashare_gatenlp.%d{yyyy-MM-dd}.log'.
14:02:11,449 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Roll-over at midnight.
14:02:11,454 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Setting initial period to Fri Aug 03 14:02:11 GMT 2018
14:02:11,456 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
14:02:11,459 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Active log file name: ./logs/datashare.log
14:02:11,459 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - File property is set to [./logs/datashare.log]
14:02:11,461 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
14:02:11,461 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
14:02:11,462 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [FILE] to Logger[ROOT]
14:02:11,462 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
14:02:11,463 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@100fc185 - Registering current configuration as safe fallback point

SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
2018-08-03 14:02:11,596 [main] INFO DatashareCli - Running datashare web server
2018-08-03 14:02:11,603 [main] INFO DatashareCli - with properties: {nlpParallelism=1, parserParallelism=1, stages=SCAN,INDEX,NLP, elasticsearchAddress=http://elasticsearch:9200, messageBusAddress=redis, resume=false, parallelism=2, help=false, web=true, mode=LOCAL, enableOcr=false, nlpPipelines=CORENLP,GATENLP,IXAPIPE,MITIE,OPENNLP, cors=no-cors, redisAddress=redis://redis:6379, clusterName=datashare, indexName=local-datashare, dataDir=/home/datashare/data}
2018-08-03 14:02:11,610 [main] INFO PropertiesProvider - reading properties from jar:file:/home/datashare/lib/datashare-web-0.17-jar-with-dependencies.jar!/datashare.properties
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.http.HttpHost.create(Ljava/lang/String;)Lorg/apache/http/HttpHost;
at org.icij.datashare.text.indexing.elasticsearch.ElasticsearchConfiguration.createESClient(ElasticsearchConfiguration.java:66)
at org.icij.datashare.mode.LocalMode.configure(LocalMode.java:33)
at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
at com.google.inject.spi.Elements.getElements(Elements.java:110)
at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
at com.google.inject.Guice.createInjector(Guice.java:99)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at net.codestory.http.injection.GuiceAdapter.&lt;init&gt;(GuiceAdapter.java:28)
at org.icij.datashare.mode.CommonMode.defaultRoutes(CommonMode.java:53)
at org.icij.datashare.mode.CommonMode.lambda$createWebConfiguration$c4402707$1(CommonMode.java:47)
at net.codestory.http.routes.RouteCollection.configure(RouteCollection.java:88)
at net.codestory.http.reload.FixedRoutesProvider.&lt;init&gt;(FixedRoutesProvider.java:27)
at net.codestory.http.reload.RoutesProvider.fixed(RoutesProvider.java:29)
at net.codestory.http.AbstractWebServer.configure(AbstractWebServer.java:62)
at org.icij.datashare.WebApp.start(WebApp.java:36)
at org.icij.datashare.cli.DatashareCli.main(DatashareCli.java:31)

Indexing form refactor

  • Delete choose_one_pipeline
  • Add a subtitle on the OCR frame and on the NLP one
  • Add a "(default)" after CoreNLP

Go back to all folders in Path Facet

After clicking on a specific folder to navigate, I can't go back to previous folders to choose other folders to explore, unless I click on search and start the search over.

(screenshot: dspath)

Security: S3 write access key

The S3 access key/secret with write access was saved into GitHub.

We have to revoke the key and allow read-only access.

[unresolved] Elasticsearch overhead, spent [xxxms] collecting in the last [yys]

January 14th, 2019: Will wrote: "I can't get in to Datashare." Bruno checked on the internet and it apparently could be related to running Docker on Windows.

Here are Will's logs:

C:\Users\Will>docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b55618bf572d icij/datashare:1.02 "/entrypoint.sh -w" 3 hours ago Up 3 hours 0.0.0.0:8080->8080/tcp datashare102_datashare_1
6e88c06aa8af docker.elastic.co/elasticsearch/elasticsearch:6.3.0 "/usr/local/bin/dock…" 3 hours ago Up 3 hours 0.0.0.0:9200->9200/tcp, 9300/tcp datashare102_elasticsearch_1
cfe3f4f7ab68 redis:4.0.1-alpine "docker-entrypoint.s…" 3 hours ago Up 3 hours 0.0.0.0:6379->6379/tcp datashare102_redis_1

C:\Users\Will>docker logs 6e88c06aa8af
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2019-01-11T15:49:46,436][INFO ][o.e.n.Node ] [] initializing ...
[2019-01-11T15:49:46,538][INFO ][o.e.e.NodeEnvironment ] [DfhKXPP] using [1] data paths, mounts [[/ (overlay)]], net usable_space [52.5gb], net total_space [58.4gb], types [overlay]
[2019-01-11T15:49:46,539][INFO ][o.e.e.NodeEnvironment ] [DfhKXPP] heap size [1007.3mb], compressed ordinary object pointers [true]
[2019-01-11T15:49:46,540][INFO ][o.e.n.Node ] [DfhKXPP] node name derived from node ID [DfhKXPPQQNyzV_xpUXfeAg]; set [node.name] to override
[2019-01-11T15:49:46,540][INFO ][o.e.n.Node ] [DfhKXPP] version[6.3.0], pid[1], build[default/tar/424e937/2018-06-11T23:38:03.357887Z], OS[Linux/4.9.125-linuxkit/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/10.0.1/10.0.1+10]
[2019-01-11T15:49:46,540][INFO ][o.e.n.Node ] [DfhKXPP] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.oiPgU1hW, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -Des.cgroups.hierarchy.override=/, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2019-01-11T15:49:49,863][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [aggs-matrix-stats]
[2019-01-11T15:49:49,864][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [analysis-common]
[2019-01-11T15:49:49,864][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [ingest-common]
[2019-01-11T15:49:49,864][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [lang-expression]
[2019-01-11T15:49:49,865][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [lang-mustache]
[2019-01-11T15:49:49,865][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [lang-painless]
[2019-01-11T15:49:49,865][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [mapper-extras]
[2019-01-11T15:49:49,865][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [parent-join]
[2019-01-11T15:49:49,865][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [percolator]
[2019-01-11T15:49:49,865][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [rank-eval]
[2019-01-11T15:49:49,866][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [reindex]
[2019-01-11T15:49:49,866][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [repository-url]
[2019-01-11T15:49:49,866][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [transport-netty4]
[2019-01-11T15:49:49,866][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [tribe]
[2019-01-11T15:49:49,867][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-core]
[2019-01-11T15:49:49,867][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-deprecation]
[2019-01-11T15:49:49,868][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-graph]
[2019-01-11T15:49:49,874][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-logstash]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-ml]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-monitoring]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-rollup]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-security]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-sql]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-upgrade]
[2019-01-11T15:49:49,875][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded module [x-pack-watcher]
[2019-01-11T15:49:49,876][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded plugin [ingest-geoip]
[2019-01-11T15:49:49,876][INFO ][o.e.p.PluginsService ] [DfhKXPP] loaded plugin [ingest-user-agent]
[2019-01-11T15:49:54,608][INFO ][o.e.x.s.a.s.FileRolesStore] [DfhKXPP] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2019-01-11T15:49:55,987][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/77] [Main.cc@109] controller (64 bit): Version 6.3.0 (Build 0f0a34c67965d7) Copyright (c) 2018 Elasticsearch BV
[2019-01-11T15:49:57,116][INFO ][o.e.d.DiscoveryModule ] [DfhKXPP] using discovery type [single-node]
[2019-01-11T15:49:58,125][INFO ][o.e.n.Node ] [DfhKXPP] initialized
[2019-01-11T15:49:58,125][INFO ][o.e.n.Node ] [DfhKXPP] starting ...
[2019-01-11T15:49:58,341][INFO ][o.e.t.TransportService ] [DfhKXPP] publish_address {172.18.0.4:9300}, bound_addresses {0.0.0.0:9300}
[2019-01-11T15:49:58,414][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [DfhKXPP] publish_address {172.18.0.4:9200}, bound_addresses {0.0.0.0:9200}
[2019-01-11T15:49:58,415][INFO ][o.e.n.Node ] [DfhKXPP] started
[2019-01-11T15:49:58,505][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [DfhKXPP] Failed to clear cache for realms [[]]
[2019-01-11T15:49:58,634][INFO ][o.e.g.GatewayService ] [DfhKXPP] recovered [0] indices into cluster_state
[2019-01-11T15:49:58,977][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.watches] for index patterns [.watches*]
[2019-01-11T15:49:59,080][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.watch-history-7] for index patterns [.watcher-history-7*]
[2019-01-11T15:49:59,181][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.triggered_watches] for index patterns [.triggered_watches*]
[2019-01-11T15:49:59,259][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
[2019-01-11T15:49:59,380][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
[2019-01-11T15:49:59,425][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
[2019-01-11T15:49:59,520][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
[2019-01-11T15:49:59,585][INFO ][o.e.c.m.MetaDataIndexTemplateService] [DfhKXPP] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
[2019-01-11T15:49:59,741][INFO ][o.e.l.LicenseService ] [DfhKXPP] license [f294b254-4b32-423d-8a7c-2662b4f35365] mode [basic] - valid
[2019-01-11T15:49:59,919][INFO ][o.e.c.m.MetaDataCreateIndexService] [DfhKXPP] [local-datashare] creating index, cause [api], templates [], shards [5]/[1], mappings [doc]
[2019-01-11T15:52:17,427][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T15:52:18,699][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][140] overhead, spent [469ms] collecting in the last [1.4s]
[2019-01-11T15:54:19,214][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][260] overhead, spent [429ms] collecting in the last [1.4s]
[2019-01-11T15:58:42,777][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:07:02,523][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:07:19,789][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:08:03,459][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:08:05,379][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:08:09,873][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:08:10,008][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:09:45,300][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:10:48,598][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:10:50,414][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][1250] overhead, spent [325ms] collecting in the last [1s]
[2019-01-11T16:10:52,375][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:22:51,577][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:22:52,043][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:29:37,864][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:34:26,966][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:36:00,544][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][2759] overhead, spent [345ms] collecting in the last [1s]
[2019-01-11T16:58:38,153][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T16:58:47,294][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][4125] overhead, spent [555ms] collecting in the last [1s]
[2019-01-11T16:59:56,668][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][young][4194][77] duration [1s], collections [1]/[1.3s], total [1s]/[6s], memory [253.7mb]->[217.1mb]/[1007.3mb], all_pools {[young] [69.3mb]->[5.8mb]/[133.1mb]}{[survivor] [16.6mb]->[13mb]/[16.6mb]}{[old] [167.7mb]->[198.2mb]/[857.6mb]}
[2019-01-11T16:59:56,668][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][4194] overhead, spent [1s] collecting in the last [1.3s]
[2019-01-11T17:00:34,676][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][4232] overhead, spent [484ms] collecting in the last [1s]
[2019-01-11T17:18:01,071][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5278] overhead, spent [338ms] collecting in the last [1.1s]
[2019-01-11T17:18:07,075][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][young][5284][94] duration [858ms], collections [1]/[1s], total [858ms]/[8.3s], memory [418.3mb]->[347.2mb]/[1007.3mb], all_pools {[young] [102.4mb]->[16mb]/[133.1mb]}{[survivor] [16.6mb]->[12.2mb]/[16.6mb]}{[old] [299.2mb]->[318.9mb]/[857.6mb]}
[2019-01-11T17:18:07,075][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5284] overhead, spent [858ms] collecting in the last [1s]
[2019-01-11T17:18:24,079][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5301] overhead, spent [361ms] collecting in the last [1s]
[2019-01-11T17:18:55,167][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][young][5332][98] duration [726ms], collections [1]/[1s], total [726ms]/[9.5s], memory [459.3mb]->[384.4mb]/[1007.3mb], all_pools {[young] [107.6mb]->[16.2mb]/[133.1mb]}{[survivor] [16.6mb]->[16.6mb]/[16.6mb]}{[old] [335mb]->[351.5mb]/[857.6mb]}
[2019-01-11T17:18:55,170][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5332] overhead, spent [726ms] collecting in the last [1s]
[2019-01-11T17:18:56,877][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][young][5333][99] duration [702ms], collections [1]/[1.7s], total [702ms]/[10.3s], memory [384.4mb]->[384.3mb]/[1007.3mb], all_pools {[young] [16.2mb]->[674.4kb]/[133.1mb]}{[survivor] [16.6mb]->[15.9mb]/[16.6mb]}{[old] [351.5mb]->[367.9mb]/[857.6mb]}
[2019-01-11T17:18:56,877][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5333] overhead, spent [702ms] collecting in the last [1.7s]
[2019-01-11T17:19:12,883][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5349] overhead, spent [615ms] collecting in the last [1s]
[2019-01-11T17:26:28,000][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][5778] overhead, spent [370ms] collecting in the last [1s]
[2019-01-11T17:41:30,904][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6600] overhead, spent [301ms] collecting in the last [1s]
[2019-01-11T17:41:31,905][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6601] overhead, spent [300ms] collecting in the last [1s]
[2019-01-11T17:41:32,905][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6602] overhead, spent [519ms] collecting in the last [1s]
[2019-01-11T17:41:34,351][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6603] overhead, spent [827ms] collecting in the last [1.4s]
[2019-01-11T17:41:38,535][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6607] overhead, spent [637ms] collecting in the last [1.1s]
[2019-01-11T17:41:43,352][INFO ][o.e.c.m.MetaDataMappingService] [DfhKXPP] [local-datashare/dBluMRVLRgKc1xC-QAxW-A] update_mapping [doc]
[2019-01-11T17:41:44,556][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6613] overhead, spent [466ms] collecting in the last [1s]
[2019-01-11T17:41:48,666][INFO ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6617] overhead, spent [547ms] collecting in the last [1.1s]
[2019-01-11T17:41:53,369][WARN ][o.e.m.j.JvmGcMonitorService] [DfhKXPP] [gc][6619] overhead, spent [3.5s] collecting in the last [3.7s]


Error while launching NLP pipeline as cli

Command line:
./datashare.sh -r -s NLP --nlpp CORENLP

Error:
Exception in thread "main" com.google.inject.ConfigurationException: Guice configuration errors:

  1. No implementation for org.icij.extract.queue.DocumentQueue was bound.
    while locating org.icij.extract.queue.DocumentQueue

1 error
    at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1045)
    at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1004)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
    at org.icij.datashare.cli.CliApp.start(CliApp.java:66)
    at org.icij.datashare.cli.DatashareCli.main(DatashareCli.java:34)

Index optimisation

We have seen that on an Amazon t2.2xlarge instance (8 cores), indexing a corpus of several thousand files takes all the CPUs and can freeze the machine.

Indexing is much lighter without OCR (roughly 10x faster).
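A common mitigation for this kind of CPU saturation is to cap the extraction worker pool below the machine's core count, leaving headroom for the rest of the system. The sketch below is illustrative only; the class and method names are not Datashare's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: a bounded pool caps how many files are extracted at once,
// so indexing a large corpus does not monopolize every core.
class BoundedIndexer {
    static int indexAll(List<String> paths, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> extract(path)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every extraction to finish
            }
            return paths.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    private static void extract(String path) {
        // stand-in for text extraction / OCR of one file
    }
}
```

On an 8-core machine, a pool of 6 or 7 threads keeps the instance responsive while still using most of the available CPU.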

Polling tasks never stops after calling action

To reproduce:

  1. Go to the Datashare welcome screen
  2. Click on "Analyze documents"
    => The app starts polling tasks every 2 seconds
  3. Click on "Extract text" to extract the text of your documents
  4. Click on the "search" button
    => The app keeps polling tasks every 2 seconds instead of stopping

PS: If you skip step 3, everything works as expected!
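The expected behaviour can be sketched as a scheduled poll that is cancelled when the user leaves the analysis screen. This is illustrative Java (the real client is JavaScript, and these names are not Datashare's):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: polling starts on "Analyze documents" and must be cancelled
// when the user clicks "search" — the step the report says is skipped.
class TaskPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> polling;

    synchronized void startPolling() {
        if (polling == null || polling.isCancelled()) {
            // poll the task list every 2 seconds, as the app does
            polling = scheduler.scheduleAtFixedRate(this::pollTasks, 0, 2, TimeUnit.SECONDS);
        }
    }

    private void pollTasks() {
        // stand-in for fetching the task list from the server
    }

    synchronized void stopPolling() {
        if (polling != null) polling.cancel(false);
    }

    synchronized boolean isPolling() {
        return polling != null && !polling.isCancelled();
    }

    void shutdown() { scheduler.shutdownNow(); }
}
```

The bug described above corresponds to navigating away without the `stopPolling()` call ever running.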

Go offline at runtime

When the user is offline, the NLP pipelines block when they try to check whether new models are available, so they can't run without a network connection.
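One way to avoid the block is to gate the remote version check on connectivity and fall back to models already on disk. This is a sketch under assumed names, not Datashare's actual model loader:

```java
import java.util.function.BooleanSupplier;

// Sketch: when offline, skip the remote model-version check entirely and
// use the locally cached model instead of blocking on the network.
class ModelLoader {
    private final BooleanSupplier isOnline;

    ModelLoader(BooleanSupplier isOnline) {
        this.isOnline = isOnline;
    }

    String resolveModel(String localVersion) {
        if (!isOnline.getAsBoolean()) {
            return localVersion; // offline: fall back to the model on disk
        }
        return checkRemoteForUpdate(localVersion);
    }

    private String checkRemoteForUpdate(String localVersion) {
        // placeholder for the real HTTP check against the model repository;
        // a short connect timeout here would also bound the worst case
        return localVersion;
    }
}
```

Injecting the connectivity check as a supplier also makes the offline path easy to test without pulling the network cable.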

Pass actual `dataDir` to the client config

In order to display the current data directory, the client needs to be aware of the current configuration.

The data directory is displayed in the footer:

(screenshot)

As well as under each document:

(screenshot)
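One way to do this is for the backend to serialize a small config object that forwards the resolved `dataDir` to the client. The shape below is an assumption for illustration, not Datashare's actual endpoint:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch (assumed shape): expose only the settings the UI needs, including
// the real data directory, instead of letting the client hard-code one.
class ClientConfig {
    static Map<String, String> forClient(Map<String, String> serverProps) {
        Map<String, String> config = new LinkedHashMap<>();
        // the fallback path here is illustrative, not Datashare's default
        config.put("dataDir", serverProps.getOrDefault("dataDir", "/home/datashare/data"));
        return config;
    }
}
```

The client can then read `dataDir` from this object to render both the footer and the per-document path.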
