visallo / vertexium
High-security graph database
Home Page: http://vertexium.org/
License: Apache License 2.0
It seems that the API usage example at https://github.com/v5analytics/vertexium, which shows how to create an AccumuloGraph, does not use ElasticSearch by default.
I am using the following configuration to create an AccumuloGraph so that ElasticSearch indexes are used when saving data to or retrieving data from the graph. I am using ElasticSearch version 1.4.5.
Map<String, Object> mapConfig = new HashMap<>();
mapConfig.put(AccumuloGraphConfiguration.USE_SERVER_SIDE_ELEMENT_VISIBILITY_ROW_FILTER, false);
mapConfig.put(AccumuloGraphConfiguration.ACCUMULO_INSTANCE_NAME, "instance_name");
mapConfig.put(AccumuloGraphConfiguration.ACCUMULO_USERNAME, "username");
mapConfig.put(AccumuloGraphConfiguration.ACCUMULO_PASSWORD, "password");
mapConfig.put(AccumuloGraphConfiguration.ZOOKEEPER_SERVERS, "localhost");
mapConfig.put("search", "org.vertexium.elasticsearch.ElasticsearchSingleDocumentSearchIndex");
mapConfig.put("search.indexName", "instance-graph");
mapConfig.put("search.indexEdges", "false");
mapConfig.put("search.locations", "172.xxx.xxx.xxx");
mapConfig.put("search.indicesToQuery", "instance-graph");
mapConfig.put("search.clusterName", "elasticsearch");
AccumuloGraphConfiguration graphConfig = new AccumuloGraphConfiguration(mapConfig);
Graph graph = AccumuloGraph.create(graphConfig);
I can successfully insert a new vertex into the graph store. Let us assume that I have created a vertex with id=1, a property name="john", and an email property.
I can also find the stored vertex by its vertex id (1 in this example). However, when I search for an existing vertex by a property value (say name="john"), ElasticSearch returns no hits. Sample code:
Query graphQuery = graph.query("", authorizations);
graphQuery.has("name", "john");
// zero vertices are returned
Iterable<Vertex> iterableVertices = graphQuery.vertices();
Please note that if I do not use ElasticSearch (by commenting out all properties starting with "search" when populating mapConfig in the code snippet above), I can find vertices by either vertex id or property value.
Could one of the experts please throw some light on how to get past this issue?
With a combination of Accumulo and Elasticsearch, blindly updating a single value of a multi-valued property on a vertex results in the Elasticsearch document containing only that single value. This is because, at the time that property is inserted, the mutation has not loaded the values for the other keys from Accumulo.
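A minimal, library-free model of the problem, assuming the index document is rebuilt only from the values carried by the mutation. The class and method names are hypothetical, not Vertexium API: indexFromMutationOnly mirrors the behavior described, while indexMergedWithStored shows what the document should contain if the stored values for the other keys were loaded first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one multi-valued property stored as key -> value,
// standing in for the per-key values Accumulo holds for a single vertex.
final class BlindUpdateModel {
    // What the search index receives if it only sees the mutation's values.
    static List<String> indexFromMutationOnly(Map<String, String> mutation) {
        return new ArrayList<>(mutation.values());
    }

    // What the search index should receive: stored values overlaid by the mutation.
    static List<String> indexMergedWithStored(Map<String, String> stored,
                                              Map<String, String> mutation) {
        Map<String, String> merged = new LinkedHashMap<>(stored);
        merged.putAll(mutation); // the mutation replaces only the keys it touches
        return new ArrayList<>(merged.values());
    }
}
```

With stored keys k1 and k2 and a blind update of k2 only, the first method yields just the new k2 value, while the second keeps k1's value alongside it.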
If you have multiple graph instances, the metadata is synced using Curator, but GraphBase keeps a local cache of all PropertyDefinitions.
We need to invalidate that cache when we receive defineProperty metadata changes.
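A sketch of that invalidation, assuming each graph instance receives a callback when another instance writes a defineProperty metadata change (for example from a Curator watcher). PropertyDefinitionCache, onMetadataChanged, and the "propertyDefinition." key prefix are hypothetical names, not Vertexium API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical local cache of property definitions (here just name -> data type name).
final class PropertyDefinitionCache {
    static final String DEFINITION_KEY_PREFIX = "propertyDefinition."; // hypothetical key layout

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // reads the shared metadata store

    PropertyDefinitionCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Serves from the local cache, loading from shared metadata on a miss.
    String get(String propertyName) {
        return cache.computeIfAbsent(propertyName, loader);
    }

    // Called when a synced metadata key changes on another graph instance.
    void onMetadataChanged(String metadataKey) {
        if (metadataKey.startsWith(DEFINITION_KEY_PREFIX)) {
            cache.remove(metadataKey.substring(DEFINITION_KEY_PREFIX.length()));
        }
    }
}
```

Without the onMetadataChanged call, get keeps returning the stale definition after another instance redefines the property.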
The following should throw an exception because the properties were not retrieved.
Vertex v = g.getVertex("v1", FetchHint.NONE, auths);
v.getProperties();
Expected: Properties should be empty.
Found: Properties contain the properties from before the soft delete.
We are using the vertex/edge/extended data id as the Elasticsearch document id, which takes up a large amount of cache.
To fix this, I propose we make the following changes:
_type: from element to e.
The ElasticSearch code indexes everything into a single index by the id of the vertex or edge. Therefore, an edge with id=A will overwrite a previously indexed vertex with id=A and vice versa.
The question is whether this is expected behavior or not. Are ids assumed to be unique across the graph or only within their own type (vertex or edge)?
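A minimal model of the collision, with a Map standing in for the single index. Prefixing the document id with the element type is one hypothetical way to keep vertex and edge ids independent; the "v:" and "e:" prefixes are illustrative, not what Vertexium does.

```java
import java.util.HashMap;
import java.util.Map;

final class DocumentIdModel {
    enum ElementType { VERTEX, EDGE }

    // Hypothetical alternative: namespace the document id by element type.
    static String typedId(ElementType type, String id) {
        return (type == ElementType.VERTEX ? "v:" : "e:") + id;
    }

    // Indexes a vertex and an edge that share an id; returns the resulting document count.
    static int indexBoth(String sharedId, boolean useTypedIds) {
        Map<String, String> index = new HashMap<>();
        for (ElementType type : ElementType.values()) {
            // Current behavior as described: the raw element id is the document id.
            String docId = useTypedIds ? typedId(type, sharedId) : sharedId;
            index.put(docId, type.name()); // put() overwrites a document with the same id
        }
        return index.size();
    }
}
```

With raw ids only one document survives; with typed ids both do. Which is correct depends on the answer to the question above: if ids are meant to be unique across the whole graph, the overwrite is arguably expected.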
Setting termAggregation.shardSize to a fixed value limits requests that ask for more terms: if you ask for 100 terms, a shard size of 10 is too small.
If we instead make it a function of the number of requested terms (possibly 2 times that number), it could accommodate more requests.
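A sketch of the proposal, assuming the shard size is derived from the requested number of terms with the factor of 2 suggested above and a hypothetical floor of 10. The class and method names are hypothetical. For comparison, Elasticsearch's own default shard_size for terms aggregations is size * 1.5 + 10.

```java
final class TermsShardSize {
    static final int MIN_SHARD_SIZE = 10; // hypothetical floor
    static final int FACTOR = 2;          // "possibly 2 times the number of requested terms"

    static int shardSizeFor(int requestedTerms) {
        if (requestedTerms < 0) {
            throw new IllegalArgumentException("requestedTerms must be >= 0");
        }
        // long arithmetic avoids int overflow for very large requests
        long scaled = (long) requestedTerms * FACTOR;
        return (int) Math.min(Integer.MAX_VALUE, Math.max(MIN_SHARD_SIZE, scaled));
    }
}
```

A request for 100 terms would then use a shard size of 200 instead of 10, while small requests keep the floor.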
QueryResultsIterable<Edge> edgeInVertexIdMatch = graph.query(authorizations)
.has(Edge.VERTEX_IDS_PROPERTY_NAME, Contains.IN, vertexIds)
.edges();
This only occurs with SQL/InMemory from what I can tell. I'm guessing we need to record metadata changes in the mutation log.
See branch: https://github.com/v5analytics/vertexium/tree/sql-metadata-history
Create an Accumulo iterator that will run during compaction that reduces duplicate rows with different timestamps but otherwise identical data into the oldest row.
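Accumulo sorts entries by row, column family, qualifier, and visibility, with timestamps descending, so entries that differ only by timestamp are adjacent and the oldest comes last. The sketch below models the proposed reduction over a plain list rather than implementing Accumulo's iterator interface; the Entry record is a hypothetical stand-in for a key/value pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class OldestDuplicateReducer {
    // Hypothetical stand-in for an Accumulo key/value pair.
    record Entry(String row, String family, String qualifier, String visibility,
                 long timestamp, String value) {
        boolean sameDataAs(Entry other) {
            return row.equals(other.row) && family.equals(other.family)
                && qualifier.equals(other.qualifier) && visibility.equals(other.visibility)
                && Objects.equals(value, other.value);
        }
    }

    // Input must be in Accumulo order (newest timestamp first for identical coordinates).
    // Each run of adjacent entries that differ only by timestamp collapses to its oldest entry.
    static List<Entry> reduce(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : sorted) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).sameDataAs(e)) {
                out.set(last, e); // later in the run means older, so it replaces the kept entry
            } else {
                out.add(e);
            }
        }
        return out;
    }
}
```

Entries with the same coordinates but different values are left alone, so genuine version history is preserved.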
To reproduce:
1. With auths A, B, create a vertex with auth A and a property with auth B.
2. Get the vertex with auth A.
3. Set a property with auth A.
4. Query for the property from step 1. The vertex does not come back, but it should.
In an effort to get Rexster running with Vertexium, I followed the provided README (with some modifications, such as providing the full class name), but I have run into an issue I can't figure out. Any guidance would be appreciated.
My rexster.xml:
<graph>
<graph-name>org.vertexium.accumulo.AccumuloGraph</graph-name>
<graph-type>org.vertexium.accumulo.blueprints.AccumuloVertexiumRexsterGraphConfiguration</graph-type>
<storage>org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory</storage>
<graph-useServerSideElementVisibilityRowFilter>false</graph-useServerSideElementVisibilityRowFilter>
<graph-accumuloInstanceName>[INSTANCE_NAME]</graph-accumuloInstanceName>
<graph-username>root</graph-username>
<graph-password>[PASSWORD]</graph-password>
<graph-tableNamePrefix>baseball</graph-tableNamePrefix>
<graph-zookeeperServers>[IP_ADDRESS]</graph-zookeeperServers>
<graph-serializer>org.vertexium.accumulo.serializer.JavaValueSerializer</graph-serializer>
<graph-idgenerator>org.vertexium.id.UUIDIdGenerator</graph-idgenerator>
<graph-search>org.vertexium.elasticsearch.ElasticSearchSearchIndex</graph-search>
<graph-search-locations>[IP_ADDRESS]</graph-search-locations>
<graph-search-indexName>baseball</graph-search-indexName>
<visibilityProvider>org.vertexium.blueprints.DefaultVisibilityProvider</visibilityProvider>
<authorizationsProvider>org.vertexium.accumulo.blueprints.AccumuloAuthorizationsProvider</authorizationsProvider>
<authorizationsProvider-auths>1,2,3,A,B,C,HIGH_INTEREST,SENSITIVE,SOME_INTEREST,UNKNOWN_INTEREST</authorizationsProvider-auths>
<extensions>
<allows>
<allow>tp:gremlin</allow>
</allows>
</extensions>
</graph>
...just to show that the jar is on my classpath and that the AccumuloAuthorizationsProvider class is in it...
$ jar tf /usr/local/rexster/lib/vertexium-accumulo-blueprints-0.10.1-SNAPSHOT.jar
META-INF/
META-INF/MANIFEST.MF
org/
org/vertexium/
org/vertexium/accumulo/
org/vertexium/accumulo/blueprints/
org/vertexium/accumulo/blueprints/AccumuloAuthorizationsProvider.class
org/vertexium/accumulo/blueprints/AccumuloVertexiumBlueprintsGraph.class
org/vertexium/accumulo/blueprints/AccumuloVertexiumBlueprintsGraphFactory.class
org/vertexium/accumulo/blueprints/AccumuloVertexiumRexsterGraphConfiguration.class
META-INF/maven/
META-INF/maven/org.vertexium/
META-INF/maven/org.vertexium/vertexium-accumulo-blueprints/
META-INF/maven/org.vertexium/vertexium-accumulo-blueprints/pom.xml
META-INF/maven/org.vertexium/vertexium-accumulo-blueprints/pom.properties
The exception after running ./bin/rexster.sh --start:
[WARN] GraphConfigurationContainer - Could not create authorization provider
org.vertexium.VertexiumException: Could not create authorization provider
at org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory.createAuthorizationsProvider(AccumuloVertexiumBlueprintsGraphFactory.java:42)
at org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory.createGraph(AccumuloVertexiumBlueprintsGraphFactory.java:17)
at org.vertexium.accumulo.blueprints.AccumuloVertexiumRexsterGraphConfiguration.configureGraphInstance(AccumuloVertexiumRexsterGraphConfiguration.java:30)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:119)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.<init>(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.<init>(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.<init>(Application.java:96)
at com.tinkerpop.rexster.Application.main(Application.java:188)
Caused by: org.vertexium.VertexiumException: java.lang.NullPointerException
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:52)
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:16)
at org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory.createAuthorizationsProvider(AccumuloVertexiumBlueprintsGraphFactory.java:40)
... 8 more
Caused by: java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:24)
... 10 more
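The innermost cause, a NullPointerException thrown from Class.forName, is what Class.forName(null) produces, which suggests the provider class name looked up from the configuration was null, i.e. the key being read was not present. A defensive sketch of such a lookup, using a hypothetical createProvider helper that is not Vertexium's ConfigurationUtils:

```java
import java.util.Map;

final class ProviderFactory {
    // Hypothetical helper: instantiate the class named by config[key], failing with a clear message.
    static Object createProvider(Map<String, String> config, String key) {
        String className = config.get(key);
        if (className == null || className.isBlank()) {
            // Without this check, Class.forName(null) fails with a bare NullPointerException.
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Could not create provider " + className, e);
        }
    }
}
```

A message naming the missing key makes it easier to see which configuration entry the factory actually expected.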
When markPropertyHidden and markPropertyVisible are called on the AccumuloGraph, the search index is not updated to reflect it.
extended_bounds allows you to specify the min and max bounds over which the aggregation's buckets are generated. This is especially useful with date histograms, for example to start on Jan 1 with an interval of one year.
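A library-free model of what extended_bounds changes for a yearly date histogram: buckets are emitted for every year between the bounds even when no document falls in them, so the series starts on Jan 1 of the min year. The class is hypothetical and only mimics the bucketing; it is not an Elasticsearch client call.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class YearHistogramModel {
    // Counts documents per calendar year (bucket key = year, i.e. starting Jan 1), then fills
    // empty buckets from min(first doc, boundMin) to max(last doc, boundMax), mimicking
    // min_doc_count = 0 combined with extended_bounds. Buckets outside the bounds are kept.
    static SortedMap<Integer, Long> histogram(List<LocalDate> docDates,
                                              int boundMinYear, int boundMaxYear) {
        TreeMap<Integer, Long> buckets = new TreeMap<>();
        for (LocalDate d : docDates) {
            buckets.merge(d.getYear(), 1L, Long::sum);
        }
        int from = buckets.isEmpty() ? boundMinYear : Math.min(buckets.firstKey(), boundMinYear);
        int to = buckets.isEmpty() ? boundMaxYear : Math.max(buckets.lastKey(), boundMaxYear);
        for (int year = from; year <= to; year++) {
            buckets.putIfAbsent(year, 0L);
        }
        return buckets;
    }
}
```

Extended bounds widen the bucket range but do not filter documents, which is why a year before the min bound still gets its own bucket.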
These classes are used in CypherAstParser (see code), but I cannot find them anywhere.
This is only an issue when running Accumulo + ES. The problem is that we read the input stream when we save the data into Accumulo (https://github.com/visallo/vertexium/blob/master/core/src/main/java/org/vertexium/property/MutableProperty.java#L26), so when we update the index in ES the value becomes an empty string (https://github.com/visallo/vertexium/blob/master/elasticsearch5/src/main/java/org/vertexium/elasticsearch5/Elasticsearch5SearchIndex.java#L998).
Test that reproduces the issue:
@Test
public void testReAddingStreamingPropertyValue() throws IOException {
PropertyValue propSmall = StreamingPropertyValue.create(new ByteArrayInputStream("value1".getBytes()), String.class, 6L);
graph.prepareVertex("v1", VISIBILITY_A)
.setProperty("propSmall", propSmall, VISIBILITY_A)
.save(AUTHORIZATIONS_A_AND_B);
graph.flush();
List<Vertex> vertexHits = toList(graph.query(AUTHORIZATIONS_A_AND_B)
.has("propSmall", "value1")
.vertices());
assertEquals(1, vertexHits.size());
assertEquals("v1", vertexHits.get(0).getId());
Vertex v1 = graph.getVertex("v1", AUTHORIZATIONS_A_AND_B);
Iterable<Object> propSmallValues = v1.getPropertyValues("propSmall");
Assert.assertEquals(1, count(propSmallValues));
Object propSmallValue = propSmallValues.iterator().next();
assertTrue("propSmallValue was " + propSmallValue.getClass().getName(), propSmallValue instanceof StreamingPropertyValue);
StreamingPropertyValue value = (StreamingPropertyValue) propSmallValue;
assertEquals(String.class, value.getValueType());
assertEquals("value1".getBytes().length, (long) value.getLength());
assertEquals("value1", IOUtils.toString(value.getInputStream()));
propSmall = StreamingPropertyValue.create(new ByteArrayInputStream("value2".getBytes()), String.class, 6L);
v1 = graph.getVertex("v1", AUTHORIZATIONS_A_AND_B);
v1.prepareMutation()
.setProperty("propSmall", propSmall, VISIBILITY_A)
.save(AUTHORIZATIONS_A_AND_B);
graph.flush();
vertexHits = toList(graph.query(AUTHORIZATIONS_A_AND_B)
.has("propSmall", "value2")
.vertices());
assertEquals(1, vertexHits.size());
assertEquals("v1", vertexHits.get(0).getId());
v1 = graph.getVertex("v1", AUTHORIZATIONS_A_AND_B);
propSmallValues = v1.getPropertyValues("propSmall");
Assert.assertEquals(1, count(propSmallValues));
propSmallValue = propSmallValues.iterator().next();
assertTrue("propSmallValue was " + propSmallValue.getClass().getName(), propSmallValue instanceof StreamingPropertyValue);
value = (StreamingPropertyValue) propSmallValue;
assertEquals(String.class, value.getValueType());
assertEquals("value2".getBytes().length, (long) value.getLength());
assertEquals("value2", IOUtils.toString(value.getInputStream()));
}
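The underlying failure mode can be reproduced with plain java.io: once a stream has been read to store the value, a second read for indexing yields nothing. Buffering the bytes once and giving each consumer its own stream is one possible fix, assuming the value is small enough to hold in memory; store and indexText are hypothetical stand-ins, not Vertexium code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamReuseDemo {
    // Stand-in for saving the value: reads the stream to EOF.
    static byte[] store(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Stand-in for building the search-index text from the property value.
    static String indexText(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Described behavior: the same stream is stored first, then indexed.
    static String storeThenIndexSameStream(InputStream value) throws IOException {
        store(value);
        return indexText(value); // stream is already at EOF, so this is ""
    }

    // Possible fix: buffer once, then give each consumer its own stream over the bytes.
    static String storeThenIndexBuffered(InputStream value) throws IOException {
        byte[] bytes = value.readAllBytes();
        store(new ByteArrayInputStream(bytes));
        return indexText(new ByteArrayInputStream(bytes));
    }
}
```

For large streaming values, re-reading from the stored copy would avoid holding everything in memory; buffering is simply the easiest way to show the issue.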
Either clean up overflow data or, preferably, move it into Accumulo and split up rows.
Map<String, Object> config = new HashMap<>();
InMemoryGraphConfiguration configuration = new InMemoryGraphConfiguration(config);
InMemoryGraph inMemoryGraph = InMemoryGraph.create(configuration);
String securityTag = "manager";
Authorizations noAuthorizations = inMemoryGraph.createAuthorizations();
Authorizations higherLevelAuthorizations = inMemoryGraph.createAuthorizations(securityTag);
Visibility visibility = Visibility.EMPTY;
Visibility highLevelVisibility = new Visibility(securityTag);
Vertex myVertex = inMemoryGraph.prepareVertex("myVertex", highLevelVisibility)
.setProperty("pro1", "val1", visibility)
.save(higherLevelAuthorizations);
inMemoryGraph.markVertexHidden(myVertex, highLevelVisibility, higherLevelAuthorizations);
inMemoryGraph.markVertexVisible(myVertex, visibility, higherLevelAuthorizations);
// but when I pass highLevelVisibility here instead, I get an error:
// inMemoryGraph.markVertexVisible(myVertex, highLevelVisibility, higherLevelAuthorizations);
inMemoryGraph.flush();
System.out.println("==== first query");
Iterable<Vertex> vertices2 = inMemoryGraph.getVertices(higherLevelAuthorizations);
for(Vertex vertex : vertices2){
System.out.println(vertex.getId());
}
I ran into a case in which Elasticsearch became unusable after a large document was ingested into the system. The only resolution was to remove that document and truncate future documents so they could not get that large.
A possible mitigation would be to throw an exception instead of ingesting a document that is too large (500MB?), and possibly to log a warning for smaller documents.
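A sketch of that mitigation, assuming the serialized document size is known before it is sent to Elasticsearch. The 500MB limit comes from the suggestion above; the 100MB warning threshold, the class name, and the use of java.util.logging are hypothetical choices.

```java
import java.util.logging.Logger;

final class DocumentSizeGuard {
    private static final Logger LOGGER = Logger.getLogger(DocumentSizeGuard.class.getName());
    static final long MB = 1024L * 1024L;
    static final long MAX_BYTES = 500 * MB;  // reject above this (from the suggestion above)
    static final long WARN_BYTES = 100 * MB; // hypothetical warning threshold

    enum Decision { ACCEPT, ACCEPT_WITH_WARNING }

    // Throws instead of ingesting an oversized document; warns for large but allowed ones.
    static Decision check(String elementId, long documentBytes) {
        if (documentBytes > MAX_BYTES) {
            throw new IllegalArgumentException("Document for " + elementId + " is "
                + documentBytes + " bytes, exceeding the limit of " + MAX_BYTES);
        }
        if (documentBytes > WARN_BYTES) {
            LOGGER.warning("Large document for " + elementId + ": " + documentBytes + " bytes");
            return Decision.ACCEPT_WITH_WARNING;
        }
        return Decision.ACCEPT;
    }
}
```

Both thresholds would likely need to be configurable, since a safe size depends on the cluster's heap and request-size settings.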
Graph looks like:
v1 -> v2
Finding paths between v1 and v2 with a max hop count of 2 should return 1 path. Currently it returns no paths.
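For reference, a bounded depth-first search over a plain adjacency map returns the expected single path for this graph, since a one-hop path is within a max hop count of 2. This is a hypothetical illustration of the expected semantics, not Vertexium's path-finding implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BoundedPathFinder {
    // Returns every simple path from source to dest that uses at most maxHops edges.
    static List<List<String>> findPaths(Map<String, List<String>> adjacency,
                                        String source, String dest, int maxHops) {
        List<List<String>> results = new ArrayList<>();
        List<String> path = new ArrayList<>();
        path.add(source);
        walk(adjacency, dest, maxHops, path, results);
        return results;
    }

    private static void walk(Map<String, List<String>> adjacency, String dest, int hopsLeft,
                             List<String> path, List<List<String>> results) {
        String current = path.get(path.size() - 1);
        if (current.equals(dest) && path.size() > 1) {
            results.add(new ArrayList<>(path)); // reached dest within the hop budget
            return;
        }
        if (hopsLeft == 0) {
            return;
        }
        for (String next : adjacency.getOrDefault(current, List.of())) {
            if (!path.contains(next)) { // keep paths simple (no revisits)
                path.add(next);
                walk(adjacency, dest, hopsLeft - 1, path, results);
                path.remove(path.size() - 1);
            }
        }
    }
}
```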
Given:
v1->v2
v1->v3
v2->v3
Running v1.query().edges() should return only edges v1->v2 and v1->v3. It currently returns all edges.
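The expected result amounts to keeping only the edges incident to the queried vertex, in either direction. A minimal model with a hypothetical Edge record (not Vertexium's Edge interface):

```java
import java.util.List;
import java.util.stream.Collectors;

final class VertexEdgeFilter {
    // Hypothetical edge: just its endpoint vertex ids.
    record Edge(String outVertexId, String inVertexId) {
        @Override
        public String toString() {
            return outVertexId + "->" + inVertexId;
        }
    }

    // Keeps only the edges that touch vertexId, in either direction.
    static List<Edge> edgesOf(String vertexId, List<Edge> allEdges) {
        return allEdges.stream()
            .filter(e -> e.outVertexId().equals(vertexId) || e.inVertexId().equals(vertexId))
            .collect(Collectors.toList());
    }
}
```

Applied to the three edges above with vertex v1, this keeps v1->v2 and v1->v3 and drops v2->v3.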
org.vertexium.VertexiumException: Failed to find edge with id: PERSON_john_smith_HAS_TOPIC_SKILL_technology
at org.vertexium.accumulo.AccumuloGraph.getEdge(AccumuloGraph.java:1429)
at org.vertexium.GraphBase.getEdge(GraphBase.java:239)
at org.vertexium.GraphBase.getEdge(GraphBase.java:255)
at com.visallo.meetup.workers.MeetupGetMemberWorker.saveMemberToTopicEdge(MeetupGetMemberWorker.java:73)
at com.visallo.meetup.workers.MeetupGetMemberWorker.saveMemberTopics(MeetupGetMemberWorker.java:67)
at com.visallo.meetup.workers.MeetupGetMemberWorker.process(MeetupGetMemberWorker.java:53)
at com.visallo.meetup.workers.MeetupGetMemberWorker.process(MeetupGetMemberWorker.java:43)
at com.visallo.meetup.MeetupExternalResourceWorker.process(MeetupExternalResourceWorker.java:30)
at org.visallo.core.externalResource.QueueExternalResourceWorker.run(QueueExternalResourceWorker.java:62)
at org.visallo.core.externalResource.ExternalResourceRunner$2.run(ExternalResourceRunner.java:74)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: More than 1 item found.
at org.vertexium.util.IterableUtils.singleOrDefault(IterableUtils.java:142)
at org.vertexium.accumulo.AccumuloGraph.getEdge(AccumuloGraph.java:1427)
... 10 more
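The Caused by line shows IterableUtils.singleOrDefault throwing because the lookup for that edge id matched more than one item. A minimal re-creation of those semantics (a hypothetical helper matching the exception message, not the actual IterableUtils source) shows why a duplicate result for the same id turns a lookup into this failure:

```java
import java.util.Iterator;

final class SingleOrDefault {
    // Returns the only element, defaultValue if there are none, and fails if there are several.
    static <T> T singleOrDefault(Iterable<T> items, T defaultValue) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return defaultValue;
        }
        T first = it.next();
        if (it.hasNext()) {
            throw new IllegalStateException("More than 1 item found.");
        }
        return first;
    }
}
```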
I have about 90K vertices of a specific kind, which corresponds to about 4M rows in Accumulo, and it takes about 10 seconds to get the full list of vertices. Most of the time is spent getting the data from Accumulo; however, I have tried different configurations with very strong machines on AWS and the performance does not get better than about 10 seconds. Is that reasonable? Does anyone have experience with such queries?
Using:
Vertexium 4.4 with suggested ES / Accumulo / Hadoop
4vCPU, 16GB RAM (AWS m5d.xlarge), 5 servers running Accumulo tservers and Hadoop data nodes, one name node running Accumulo master.
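Working through the numbers above gives a sense of the scan rate being achieved. The per-server figure assumes the rows are spread evenly across the 5 tablet servers, which the post does not state.

```java
final class ScanRate {
    static double rowsPerVertex(long rows, long vertices) {
        return (double) rows / vertices;
    }

    static double perSecond(long count, double seconds) {
        return count / seconds;
    }

    public static void main(String[] args) {
        long vertices = 90_000;
        long rows = 4_000_000;
        double seconds = 10.0;
        int tabletServers = 5;
        System.out.printf("rows per vertex: %.1f%n", rowsPerVertex(rows, vertices));  // 44.4
        System.out.printf("rows per second: %.0f%n", perSecond(rows, seconds));        // 400000
        System.out.printf("rows per second per tserver: %.0f%n",
            perSecond(rows, seconds) / tabletServers);                                   // 80000 if evenly spread
        System.out.printf("vertices per second: %.0f%n", perSecond(vertices, seconds)); // 9000
    }
}
```

At roughly 44 rows per vertex, reducing the number of rows read per vertex may matter more than adding hardware.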
Currently the limit is hard-coded at 100. It would be nice if this limit were configurable at a global level.
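One way to make such a limit configurable globally, sketched under assumptions: the configuration key search.defaultLimit and the class name are hypothetical, and 100 remains the default when the key is absent.

```java
import java.util.Map;

final class LimitConfig {
    static final String LIMIT_KEY = "search.defaultLimit"; // hypothetical configuration key
    static final int DEFAULT_LIMIT = 100;                  // the currently hard-coded value

    private final int limit;

    // Reads the limit once from the graph configuration, falling back to the old default.
    LimitConfig(Map<String, ?> config) {
        Object raw = config.get(LIMIT_KEY);
        int value = raw == null ? DEFAULT_LIMIT : Integer.parseInt(raw.toString().trim());
        if (value <= 0) {
            throw new IllegalArgumentException(LIMIT_KEY + " must be positive, got " + value);
        }
        this.limit = value;
    }

    int limit() {
        return limit;
    }
}
```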
In aggregations this returns incorrect numbers.
In search this results in incorrect result counts.
Also see #57
Vertexium Exception: Could not find property
Currently SPV data is not deleted on property deletion.
I would encourage this project to consider adding a link to the Accumulo website. If you are interested, I am happy to assist with this if you know what content you would like. You could also make a PR to change pages/related-projects.md
Sample Graph:
A,B
A,C
Steps to reproduce:
Expected Result: AB would still be hidden
Actual Result: AB is marked visible when A is marked visible
Currently, if you have altered the visibility of a property in the past and then delete that property, the historical value is not deleted.
GraphQuery query = defaultGraph.query(authA);
long time = new Date().getTime();
// Query has = query.has("name", "代乐嘿");
Query has = query
.has("name","318");
// .has("age",43);
System.out.println("edge start==================");
QueryResultsIterable<Edge> edges = has.edges();
edges.forEach(x->System.out.println(x));
System.out.println("vertex start==================");
QueryResultsIterable<Vertex> vertices = has.vertices();
vertices.forEach(x->System.out.println(x));
long l = new Date().getTime() - time;
System.out.println(l+"ms");
DEBUG INFO
2018-11-22 17:01:17.953/CST INFO [vertexium.accumulo.AccumuloGraphConfiguration] Connecting to accumulo instance [accumulo] zookeeper servers [sinan10:2181,sinan11:2181,sinan12:2181]
2018-11-22 17:01:17.981/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.elasticsearch5.Elasticsearch5SearchIndex'
2018-11-22 17:01:17.981/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.elasticsearch5.DefaultIndexSelectionStrategy'
2018-11-22 17:01:17.981/CST INFO [vertexium.elasticsearch5.DefaultIndexSelectionStrategy] Default index name: spvertexium
2018-11-22 17:01:17.981/CST INFO [vertexium.elasticsearch5.DefaultIndexSelectionStrategy] Extended data index name prefix: spvertexium_extdata_
2018-11-22 17:01:17.981/CST INFO [vertexium.elasticsearch5.DefaultIndexSelectionStrategy] Split edges and vertices: true
2018-11-22 17:01:17.981/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.elasticsearch5.MetadataTablePropertyNameVisibilitiesStore'
2018-11-22 17:01:18.142/CST WARN [vertexium.elasticsearch5.Elasticsearch5SearchIndex] Running without the server side Vertexium plugin will disable some features.
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'com.wxss.vertexium.conf.SnowflakeIdWorker'
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.JavaVertexiumSerializer'
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.id.IdentityNameSubstitutionStrategy'
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.accumulo.util.OverflowIntoHdfsStreamingPropertyValueStorageStrategy'
2018-11-22 17:01:18.265/CST INFO [vertexium.accumulo.AccumuloGraph] accumulo.graph.version=2
//Why are there two queries?
2018-11-22 17:01:18.804/CST DEBUG [vertexium.elasticsearch5.ElasticsearchSearchQueryBase] elasticsearch results (vertices: 0 + edges: 1 + extended data: 0 = 1)
2018-11-22 17:01:18.919/CST DEBUG [vertexium.elasticsearch5.ElasticsearchSearchQueryBase] elasticsearch results (vertices: 0 + edges: 0 + extended data: 0 = 0)
vertex start==================
2018-11-22 17:01:18.968/CST DEBUG [vertexium.elasticsearch5.ElasticsearchSearchQueryBase] elasticsearch results (vertices: 0 + edges: 0 + extended data: 0 = 0)
672ms
When I set "Split edges and vertices" to false, the problem does not occur.
If the queries return org.securegraph.query.IterableWithScores, sort by score in org.securegraph.query.CompositeGraphQuery.
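A sketch of the requested behavior: when each sub-query already returns results ordered by descending score, a composite query can merge them with a priority queue instead of concatenating them. Scored and mergeByScore are hypothetical stand-ins, not the IterableWithScores API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class ScoreMerger {
    record Scored<T>(T item, double score) {}

    // Current head of one source, plus the rest of that source.
    private record Head<T>(Scored<T> current, Iterator<Scored<T>> rest) {}

    // K-way merge of sources that are each sorted by descending score.
    static <T> List<Scored<T>> mergeByScore(List<List<Scored<T>>> sources) {
        PriorityQueue<Head<T>> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Head<T> h) -> h.current().score()).reversed());
        for (List<Scored<T>> source : sources) {
            Iterator<Scored<T>> it = source.iterator();
            if (it.hasNext()) {
                queue.add(new Head<>(it.next(), it));
            }
        }
        List<Scored<T>> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Head<T> head = queue.poll(); // highest remaining score across all sources
            merged.add(head.current());
            if (head.rest().hasNext()) {
                queue.add(new Head<>(head.rest().next(), head.rest()));
            }
        }
        return merged;
    }
}
```

Merging lazily like this keeps the composite result ordered without materializing and re-sorting every sub-query's full result set.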
If I am not mistaken, the API currently does not support developer-specified timestamps for attributes. If that is the case, are there any plans to support them soon?
For example, if I wanted to keep track of the outdoor temperature of an area on a daily basis, I might treat the place as a vertex and the temperature as an attribute of that vertex. For each temperature reading, I would want to use the timestamp reflecting when that temperature was taken.
Many similar graph databases are on this list: http://tinkerpop.apache.org/providers.html