Comments (5)
Here is another indexing speed optimization:
8d4169a
from mindbender.
Thanks for the suggestions! Yeah, I was anticipating we'd need parallel indexing pretty soon. I had a bad experience with GNU parallel (it was unstable, bloated, and its CLI changed too much across versions), but I will backport these soon, maybe using the more familiar xargs or embedding an exact version of parallel.
Side question: After parallelizing, is there any sign of ES being the new bottleneck? Would adding more nodes to the ES cluster help? The keep-elasticsearch-during currently launches an isolated single node ES server, but we could enhance it and introduce a subcommand like mindbender search join-cluster to make it easy to scale out.
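For illustration only, the idea of parallel indexing can be sketched as splitting the documents into batches and sending the batches concurrently. The thread talks about doing this with GNU parallel or xargs; the sketch below swaps in a Python thread pool purely to show the shape of the technique. The names `chunked`, `index_in_parallel`, and `send_batch` are hypothetical and are not part of mindbender or any ES client.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_parallel(docs, send_batch, batch_size=1000, workers=4):
    """Send batches concurrently and return the total number of docs indexed.

    `send_batch` is a hypothetical callable (for example, a wrapper around a
    bulk request to ES) that takes a list of docs and returns how many it
    indexed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, chunked(docs, batch_size)))


if __name__ == "__main__":
    # Stand-in sender that only counts documents; no ES server is contacted.
    total = index_in_parallel(range(2500), len, batch_size=1000, workers=4)
    print(total)
```

The batch size and worker count here are arbitrary placeholders; sensible values would depend on document size and on how many cores the ES node has.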
from mindbender.
No, ES seems to have a very flexible thread pool scheme within a single node and can saturate all cores. I suspect that even with only one shard, it can still saturate all cores. If hardware becomes the bottleneck, then yes, we could add support for new nodes.
from mindbender.
I see. Sounds like the cluster size should be decided by the query-time latency requirement.
from mindbender.
Another key performance knob is ES_HEAP_SIZE:
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
But ES's default heap size is 0.25-1G. We may want to use a different default.
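As a rough sketch of how a different default might be chosen: the heap-sizing guide linked above recommends giving about half of the machine's memory to the ES heap and staying below roughly 32 GB. The helper names and the 31 GB cap below are assumptions made for illustration, not something mindbender or ES provides.

```python
def suggested_heap_mb(total_ram_mb, cap_mb=31 * 1024):
    """Return half of total RAM in MB, capped at `cap_mb`.

    The 31 GB cap is an assumed conservative value chosen to stay under the
    ~32 GB limit the heap-sizing guide warns about.
    """
    return min(total_ram_mb // 2, cap_mb)


def es_heap_size_setting(total_ram_mb):
    """Format the suggestion as a value for ES_HEAP_SIZE, e.g. '8192m'."""
    return f"{suggested_heap_mb(total_ram_mb)}m"


if __name__ == "__main__":
    # A 16 GB machine gets an 8 GB heap; a 128 GB machine is capped at 31 GB.
    print(es_heap_size_setting(16 * 1024))
    print(es_heap_size_setting(128 * 1024))
```

The resulting string could then be exported as ES_HEAP_SIZE before launching the ES server.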
from mindbender.