visallo / vertexium

High-security graph database

Home Page: http://vertexium.org/

License: Apache License 2.0

Java 83.71% ANTLR 0.65% Gherkin 15.61% Shell 0.01% XSLT 0.02% Dockerfile 0.01%
accumulo elasticsearch graph-database java

vertexium's People

Contributors

dependabot[bot], diegogrz, dsingley, dspoja, jharwig, joeferner, joeybrk372, kunklejr, mwizeman, rygim, sfeng88, srfarley, sugargreenbean

vertexium's Issues

ElasticSearch does not work with Vertexium

It seems that the API usage example at https://github.com/v5analytics/vertexium for creating an AccumuloGraph does not use ElasticSearch by default.

I am using the following configuration to create the AccumuloGraph so that ElasticSearch indexes are used when saving data to or retrieving data from the graph. I am using ElasticSearch version 1.4.5.

Map<String, Object> mapConfig = new HashMap<>();
mapConfig.put(AccumuloGraphConfiguration.USE_SERVER_SIDE_ELEMENT_VISIBILITY_ROW_FILTER, false);
mapConfig.put(AccumuloGraphConfiguration.ACCUMULO_INSTANCE_NAME, "instance_name");
mapConfig.put(AccumuloGraphConfiguration.ACCUMULO_USERNAME, "username");
mapConfig.put(AccumuloGraphConfiguration.ACCUMULO_PASSWORD, "password");
mapConfig.put(AccumuloGraphConfiguration.ZOOKEEPER_SERVERS, "localhost");
mapConfig.put("search", "org.vertexium.elasticsearch.ElasticsearchSingleDocumentSearchIndex");
mapConfig.put("search.indexName", "instance-graph");
mapConfig.put("search.indexEdges", "false");
mapConfig.put("search.locations", "172.xxx.xxx.xxx");
mapConfig.put("search.indicesToQuery", "instance-graph");
mapConfig.put("search.clusterName", "elasticsearch");
AccumuloGraphConfiguration graphConfig = new AccumuloGraphConfiguration(mapConfig);
Graph graph = AccumuloGraph.create(graphConfig);

I can successfully insert a new vertex into the graph store. Let us assume that I have created a vertex with id=1 and the properties name="john", email="[email protected]".

I can successfully retrieve the stored vertex by vertex id (1 in this example). However, when I search for the same vertex by a property value (say name="john"), I get no hits from ElasticSearch. Sample code:
Query graphQuery = graph.query("", authorizations);
graphQuery.has("name", "john");
// zero vertices are returned
Iterable<Vertex> iterableVertices = graphQuery.vertices();

Please note that if I do not use ElasticSearch, by commenting out all properties starting with "search" when populating the mapConfig object in the code snippet above, I can search for vertices by either vertex id or property value.

Could one of the experts please shed some light on how to get past this issue?

Blind Update of a Multi-valued Property May Index Improperly

When Accumulo and Elasticsearch are used together, blindly updating a single value of a multi-valued property on a vertex leaves the Elasticsearch document with only that single value. This happens because, at the time the property is inserted, the mutation has not loaded the values for the other keys from Accumulo.

See test: https://github.com/visallo/vertexium/blob/135-query-with-fetchhint-none/test/src/main/java/org/vertexium/test/GraphTestBase.java#L4137-L4138
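
The effect described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Vertexium's indexing code: MultiValueIndexDemo, its method names, and the idea of representing a multi-valued property as a map from property key to value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a multi-valued property: one value per property key.
// All names here are hypothetical; this is not Vertexium code.
final class MultiValueIndexDemo {
    private MultiValueIndexDemo() {}

    /** A blind update only knows the value it is writing, so only that value is indexed. */
    static List<String> blindIndexValues(String updatedValue) {
        return Collections.singletonList(updatedValue);
    }

    /** Loading the stored values first lets the index document keep the other keys' values. */
    static List<String> mergedIndexValues(Map<String, String> storedValuesByKey,
                                          String updatedKey, String updatedValue) {
        Map<String, String> merged = new TreeMap<>(storedValuesByKey);
        merged.put(updatedKey, updatedValue);
        return new ArrayList<>(merged.values());
    }
}
```

In this model the merged variant costs an extra read of the stored values, which is exactly what a blind update is meant to avoid, so it shows the trade-off rather than a recommended fix.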

Soft deleting a vertex does not remove properties

  1. Create a vertex with a property
  2. Soft delete that vertex
  3. Create a new vertex with the same id

Expected
Properties should be empty.

Found
Properties contain the properties from before the soft delete

Elasticsearch large field data _uid cached

We are using the vertex/edge/extended data id as the Elasticsearch id, which takes up a large amount of cache.

To fix this, I propose we make the following changes:

  • Change _type from element to e.
  • Hash the id with MD5, then encode the result with ASCII85.
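
The second proposal can be sketched as follows. This is a minimal illustration, assuming the id is hashed as UTF-8 and encoded with plain ASCII85 (no delimiters and no "z" shortcut); the class and method names are hypothetical and do not come from Vertexium.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns an arbitrary-length element id into a fixed 20-character id.
final class CompactDocumentIds {
    private CompactDocumentIds() {}

    /** Hashes the id with MD5 (16 bytes) and encodes the digest as 20 ASCII85 characters. */
    static String compactId(String elementId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return ascii85(md5.digest(elementId.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Plain ASCII85: every 4 input bytes become 5 characters in the range '!'..'u'. */
    static String ascii85(byte[] data) {
        if (data.length % 4 != 0) {
            throw new IllegalArgumentException("input length must be a multiple of 4");
        }
        StringBuilder out = new StringBuilder(data.length / 4 * 5);
        char[] group = new char[5];
        for (int i = 0; i < data.length; i += 4) {
            // Assemble the 4 bytes into one unsigned 32-bit value.
            long value = ((data[i] & 0xFFL) << 24) | ((data[i + 1] & 0xFFL) << 16)
                    | ((data[i + 2] & 0xFFL) << 8) | (data[i + 3] & 0xFFL);
            // Emit 5 base-85 digits, most significant first.
            for (int j = 4; j >= 0; j--) {
                group[j] = (char) ('!' + (value % 85));
                value /= 85;
            }
            out.append(group);
        }
        return out.toString();
    }
}
```

The mapping is one-way, so the original id would still need to be stored in the document if it has to be read back from a search hit.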

Vertices and edges with same id conflict during indexing with ElasticSearch

The ElasticSearch code indexes everything into a single index by the id of the vertex or edge. Therefore, an edge with id=A will overwrite a previously indexed vertex with id=A and vice-versa.

The question is whether this is expected behavior or not. Are ids assumed to be unique across the graph or only within their own type (vertex or edge)?
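
If ids turn out to be unique only within their own type, one way to avoid the overwrite is to include the element type in the document id. The sketch below only illustrates that idea with an in-memory map standing in for the index; TypedDocumentIds, the "v:" and "e:" prefixes, and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a HashMap stands in for a single search index keyed by document id.
final class TypedDocumentIds {
    enum ElementType { VERTEX, EDGE }

    private TypedDocumentIds() {}

    /** Hypothetical scheme: prefix the element id with a short type marker. */
    static String documentId(ElementType type, String elementId) {
        String prefix = (type == ElementType.VERTEX) ? "v:" : "e:";
        return prefix + elementId;
    }

    /** Indexes a vertex and an edge that share one id and returns how many documents survive. */
    static int documentsAfterIndexing(String sharedId, boolean useTypedIds) {
        Map<String, String> index = new HashMap<>();
        index.put(useTypedIds ? documentId(ElementType.VERTEX, sharedId) : sharedId, "vertex");
        index.put(useTypedIds ? documentId(ElementType.EDGE, sharedId) : sharedId, "edge");
        return index.size();
    }
}
```

With raw ids the second write replaces the first, so one document remains; with typed ids both remain.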

Elasticsearch: look at changing `termAggregation.shardSize` to be a function of requested terms

https://github.com/visallo/vertexium/blob/master/elasticsearch5/src/main/java/org/vertexium/elasticsearch5/ElasticsearchSearchIndexConfiguration.java#L25

Setting termAggregation.shardSize to a fixed value limits requests that ask for a larger number of terms.

For example, if you ask for 100 terms, a shard size of 10 is too small.

If we instead make this a function of the number of requested terms, it could accommodate more requests. One option is 2 times the number of requested terms.
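
A minimal sketch of that idea, assuming the configured value is kept as a lower bound and the factor of 2 from the suggestion above is used as-is. The class and method names are hypothetical and are not part of ElasticsearchSearchIndexConfiguration.

```java
// Hypothetical helper: derives the shard size from the number of requested terms.
final class TermAggregationShardSize {
    private TermAggregationShardSize() {}

    /**
     * Returns twice the requested term count, but never less than the configured minimum.
     * The multiplication is done in long arithmetic so very large requests cannot overflow.
     */
    static int shardSizeFor(int requestedTerms, int configuredMinimum) {
        if (requestedTerms < 0 || configuredMinimum < 0) {
            throw new IllegalArgumentException("sizes must not be negative");
        }
        long scaled = 2L * requestedTerms;
        return (int) Math.min((long) Integer.MAX_VALUE, Math.max(scaled, (long) configuredMinimum));
    }
}
```

With a configured minimum of 10, a request for 100 terms would use a shard size of 200, while a request for 3 terms would still use 10.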

Difficulty with Rexster Integration

In an effort to get Rexster running with Vertexium, I followed the provided README (with some modifications, such as providing the full class name), but I have run into an issue that I can't figure out. Any guidance would be appreciated.

My rexster.xml

<graph>
    <graph-name>org.vertexium.accumulo.AccumuloGraph</graph-name>
    <graph-type>org.vertexium.accumulo.blueprints.AccumuloVertexiumRexsterGraphConfiguration</graph-type>
    <storage>org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory</storage>
    <graph-useServerSideElementVisibilityRowFilter>false</graph-useServerSideElementVisibilityRowFilter>
    <graph-accumuloInstanceName>[INSTANCE_NAME]</graph-accumuloInstanceName>
    <graph-username>root</graph-username>
    <graph-password>[PASSWORD]</graph-password>
    <graph-tableNamePrefix>baseball</graph-tableNamePrefix>
    <graph-zookeeperServers>[IP_ADDRESS]</graph-zookeeperServers>
    <graph-serializer>org.vertexium.accumulo.serializer.JavaValueSerializer</graph-serializer>
    <graph-idgenerator>org.vertexium.id.UUIDIdGenerator</graph-idgenerator>
    <graph-search>org.vertexium.elasticsearch.ElasticSearchSearchIndex</graph-search>
    <graph-search-locations>[IP_ADDRESS]</graph-search-locations>
    <graph-search-indexName>baseball</graph-search-indexName>
    <visibilityProvider>org.vertexium.blueprints.DefaultVisibilityProvider</visibilityProvider>
    <authorizationsProvider>org.vertexium.accumulo.blueprints.AccumuloAuthorizationsProvider</authorizationsProvider>
    <authorizationsProvider-auths>1,2,3,A,B,C,HIGH_INTEREST,SENSITIVE,SOME_INTEREST,UNKNOWN_INTEREST</authorizationsProvider-auths>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
        </allows>
    </extensions>
</graph>

...Just to show that I have the jar in my class path and the AccumuloAuthorizationsProvider class is there...

$ jar tf /usr/local/rexster/lib/vertexium-accumulo-blueprints-0.10.1-SNAPSHOT.jar 
META-INF/
META-INF/MANIFEST.MF
org/
org/vertexium/
org/vertexium/accumulo/
org/vertexium/accumulo/blueprints/
org/vertexium/accumulo/blueprints/AccumuloAuthorizationsProvider.class
org/vertexium/accumulo/blueprints/AccumuloVertexiumBlueprintsGraph.class
org/vertexium/accumulo/blueprints/AccumuloVertexiumBlueprintsGraphFactory.class
org/vertexium/accumulo/blueprints/AccumuloVertexiumRexsterGraphConfiguration.class
META-INF/maven/
META-INF/maven/org.vertexium/
META-INF/maven/org.vertexium/vertexium-accumulo-blueprints/
META-INF/maven/org.vertexium/vertexium-accumulo-blueprints/pom.xml
META-INF/maven/org.vertexium/vertexium-accumulo-blueprints/pom.properties

The Exception after running ./bin/rexster.sh --start....
Caused by: java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:24)
... 10 more
[WARN] GraphConfigurationContainer - Could not create authorization provider
org.vertexium.VertexiumException: Could not create authorization provider
at org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory.createAuthorizationsProvider(AccumuloVertexiumBlueprintsGraphFactory.java:42)
at org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory.createGraph(AccumuloVertexiumBlueprintsGraphFactory.java:17)
at org.vertexium.accumulo.blueprints.AccumuloVertexiumRexsterGraphConfiguration.configureGraphInstance(AccumuloVertexiumRexsterGraphConfiguration.java:30)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:119)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.<init>(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.<init>(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.<init>(Application.java:96)
at com.tinkerpop.rexster.Application.main(Application.java:188)
Caused by: org.vertexium.VertexiumException: java.lang.NullPointerException
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:52)
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:16)
at org.vertexium.accumulo.blueprints.AccumuloVertexiumBlueprintsGraphFactory.createAuthorizationsProvider(AccumuloVertexiumBlueprintsGraphFactory.java:40)
... 8 more
Caused by: java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.vertexium.util.ConfigurationUtils.createProvider(ConfigurationUtils.java:24)
... 10 more
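
The innermost frames show Class.forName throwing a NullPointerException, which is what the JDK does when it is handed a null class name. One possible reading, offered as an assumption rather than a diagnosis, is that the configuration value holding the provider class name was not found under the key the factory looks up, even though the class itself is on the class path. The sketch below shows how a lookup could report that case more clearly; ProviderLookup and its parameters are hypothetical and are not Vertexium's ConfigurationUtils.

```java
import java.util.Map;

// Hypothetical lookup helper: fails with a descriptive message instead of a bare NullPointerException.
final class ProviderLookup {
    private ProviderLookup() {}

    static Class<?> providerClass(Map<String, String> config, String key) {
        String className = config.get(key);
        if (className == null) {
            // Class.forName(null) would throw NullPointerException here, hiding which key was missing.
            throw new IllegalArgumentException(
                    "Missing configuration key '" + key + "'; available keys: " + config.keySet());
        }
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                    "Class '" + className + "' configured for key '" + key + "' was not found", e);
        }
    }
}
```

Printing the available keys makes a mismatch between the key in rexster.xml and the key the code expects visible at a glance.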

Re-adding SPV properties with the same key changes the ES field value to ""

This is only an issue when running Accumulo + ES. The problem is that the input stream of the StreamingPropertyValue (SPV) is read when we save the data into Accumulo (https://github.com/visallo/vertexium/blob/master/core/src/main/java/org/vertexium/property/MutableProperty.java#L26), so by the time we update the index in ES the stream has already been consumed and the value becomes an empty string (https://github.com/visallo/vertexium/blob/master/elasticsearch5/src/main/java/org/vertexium/elasticsearch5/Elasticsearch5SearchIndex.java#L998).

Test that reproduces the issue:

    @Test
    public void testReAddingStreamingPropertyValue() throws IOException {
        PropertyValue propSmall = StreamingPropertyValue.create(new ByteArrayInputStream("value1".getBytes()), String.class, 6L);

        graph.prepareVertex("v1", VISIBILITY_A)
            .setProperty("propSmall", propSmall, VISIBILITY_A)
            .save(AUTHORIZATIONS_A_AND_B);
        graph.flush();

        List<Vertex> vertexHits = toList(graph.query(AUTHORIZATIONS_A_AND_B)
            .has("propSmall", "value1")
            .vertices());
        assertEquals(1, vertexHits.size());
        assertEquals("v1", vertexHits.get(0).getId());

        Vertex v1 = graph.getVertex("v1", AUTHORIZATIONS_A_AND_B);
        Iterable<Object> propSmallValues = v1.getPropertyValues("propSmall");
        Assert.assertEquals(1, count(propSmallValues));
        Object propSmallValue = propSmallValues.iterator().next();
        assertTrue("propSmallValue was " + propSmallValue.getClass().getName(), propSmallValue instanceof StreamingPropertyValue);
        StreamingPropertyValue value = (StreamingPropertyValue) propSmallValue;
        assertEquals(String.class, value.getValueType());
        assertEquals("value1".getBytes().length, (long) value.getLength());
        assertEquals("value1", IOUtils.toString(value.getInputStream()));

        propSmall = StreamingPropertyValue.create(new ByteArrayInputStream("value2".getBytes()), String.class, 6L);
        v1 = graph.getVertex("v1", AUTHORIZATIONS_A_AND_B);
        v1.prepareMutation()
            .setProperty("propSmall", propSmall, VISIBILITY_A)
            .save(AUTHORIZATIONS_A_AND_B);
        graph.flush();

        vertexHits = toList(graph.query(AUTHORIZATIONS_A_AND_B)
            .has("propSmall", "value2")
            .vertices());
        assertEquals(1, vertexHits.size());
        assertEquals("v1", vertexHits.get(0).getId());
        v1 = graph.getVertex("v1", AUTHORIZATIONS_A_AND_B);
        propSmallValues = v1.getPropertyValues("propSmall");
        Assert.assertEquals(1, count(propSmallValues));
        propSmallValue = propSmallValues.iterator().next();
        assertTrue("propSmallValue was " + propSmallValue.getClass().getName(), propSmallValue instanceof StreamingPropertyValue);
        value = (StreamingPropertyValue) propSmallValue;
        assertEquals(String.class, value.getValueType());
        assertEquals("value2".getBytes().length, (long) value.getLength());
        assertEquals("value2", IOUtils.toString(value.getInputStream()));
    }

inMemoryGraph.markVertexVisible() Visibility

        Map<String, Object> config = new HashMap<>();
        InMemoryGraphConfiguration configuration = new InMemoryGraphConfiguration(config);
        InMemoryGraph inMemoryGraph = InMemoryGraph.create(configuration);

        String securityTag = "manager";
        Authorizations noAuthorizations = inMemoryGraph.createAuthorizations();
        Authorizations higherLevelAuthorizations = inMemoryGraph.createAuthorizations(securityTag);
        Visibility visibility = Visibility.EMPTY;
        Visibility highLevelVisibility = new Visibility(securityTag);

        Vertex myVertex = inMemoryGraph.prepareVertex("myVertex", highLevelVisibility)
                .setProperty("pro1", "val1", visibility)
                .save(higherLevelAuthorizations);
        inMemoryGraph.markVertexHidden(myVertex,highLevelVisibility,higherLevelAuthorizations);

        inMemoryGraph.markVertexVisible(myVertex,visibility,higherLevelAuthorizations);
        // but passing highLevelVisibility instead, as in the next line, causes an error
//        inMemoryGraph.markVertexVisible(myVertex,highLevelVisibility,higherLevelAuthorizations);
        inMemoryGraph.flush();
        System.out.println("====  first query");
        Iterable<Vertex> vertices2 = inMemoryGraph.getVertices(higherLevelAuthorizations);
        for(Vertex vertex : vertices2){
            System.out.println(vertex.getId());
        }

Throw an exception on large Elasticsearch documents

I ran into a case in which Elasticsearch became unusable after a large document was ingested into the system. The only resolution was to remove that document and truncate future documents so that they do not grow that large.

A possible mitigation would be to throw an exception instead of ingesting a document that is too large (500MB?), and possibly to log a warning for smaller but still large documents.
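
The mitigation could look roughly like the sketch below. It is a minimal illustration, assuming the caller already knows the document size in bytes; DocumentSizeGuard and both thresholds are hypothetical, and the 500MB figure above is only a suggestion from this report.

```java
// Hypothetical guard: rejects oversized documents and warns about large ones before indexing.
final class DocumentSizeGuard {
    private final long warnBytes;
    private final long maxBytes;

    DocumentSizeGuard(long warnBytes, long maxBytes) {
        if (warnBytes < 0 || maxBytes < warnBytes) {
            throw new IllegalArgumentException("expected 0 <= warnBytes <= maxBytes");
        }
        this.warnBytes = warnBytes;
        this.maxBytes = maxBytes;
    }

    /**
     * Throws if the document exceeds the hard limit.
     * Returns true if a warning was logged, false if the size is unremarkable.
     */
    boolean check(String elementId, long documentBytes) {
        if (documentBytes > maxBytes) {
            throw new IllegalArgumentException("Document for element '" + elementId + "' is "
                    + documentBytes + " bytes, which exceeds the limit of " + maxBytes + " bytes");
        }
        if (documentBytes > warnBytes) {
            System.err.println("WARN: large document for element '" + elementId + "': "
                    + documentBytes + " bytes");
            return true;
        }
        return false;
    }
}
```

A guard created as new DocumentSizeGuard(50L * 1024 * 1024, 500L * 1024 * 1024) would warn above 50MB and reject above 500MB; the warning threshold is an arbitrary choice for the example.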

IllegalStateException: "More than 1 item found" when trying to getEdge by id

org.vertexium.VertexiumException: Failed to find edge with id: PERSON_john_smith_HAS_TOPIC_SKILL_technology
at org.vertexium.accumulo.AccumuloGraph.getEdge(AccumuloGraph.java:1429)
at org.vertexium.GraphBase.getEdge(GraphBase.java:239)
at org.vertexium.GraphBase.getEdge(GraphBase.java:255)
at com.visallo.meetup.workers.MeetupGetMemberWorker.saveMemberToTopicEdge(MeetupGetMemberWorker.java:73)
at com.visallo.meetup.workers.MeetupGetMemberWorker.saveMemberTopics(MeetupGetMemberWorker.java:67)
at com.visallo.meetup.workers.MeetupGetMemberWorker.process(MeetupGetMemberWorker.java:53)
at com.visallo.meetup.workers.MeetupGetMemberWorker.process(MeetupGetMemberWorker.java:43)
at com.visallo.meetup.MeetupExternalResourceWorker.process(MeetupExternalResourceWorker.java:30)
at org.visallo.core.externalResource.QueueExternalResourceWorker.run(QueueExternalResourceWorker.java:62)
at org.visallo.core.externalResource.ExternalResourceRunner$2.run(ExternalResourceRunner.java:74)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: More than 1 item found.
at org.vertexium.util.IterableUtils.singleOrDefault(IterableUtils.java:142)
at org.vertexium.accumulo.AccumuloGraph.getEdge(AccumuloGraph.java:1427)
... 10 more

Performance with mid-size dataset

I have about 90K vertices of a specific kind, which corresponds to about 4M rows in Accumulo, and it takes about 10 seconds to get the full list of vertices. Most of the time is spent getting the data from Accumulo. I have tried different configurations with very powerful machines on AWS, and the time does not drop below about 10 seconds. Is that reasonable? Does anyone have experience with such queries?
Using:
Vertexium 4.4 with suggested ES / Accumulo / Hadoop
4vCPU, 16GB RAM (AWS m5d.xlarge), 5 servers running Accumulo tservers and Hadoop data nodes, one name node running Accumulo master.

Too Many Edges Marked Visible

Sample Graph:
A,B
A,C

Steps to reproduce:

  1. Mark edge AB as hidden
  2. Verify only AB is hidden
  3. Mark A hidden
  4. Verify that A, AB, and AC are all hidden
  5. Mark A as visible

Expected Result: AB would still be hidden
Actual Result: AB is marked visible when A is marked visible
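
The expected behaviour can be expressed with a small model in which a hidden marker is stored per element and an edge's visibility is derived from its own marker and those of its endpoints. This is a toy sketch of the expectation only, not a description of how Vertexium stores hidden state; all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: explicit hidden markers per element, edge visibility derived from them.
final class HiddenStateModel {
    private final Set<String> hiddenVertices = new HashSet<>();
    private final Set<String> hiddenEdges = new HashSet<>();

    void markVertexHidden(String vertexId) { hiddenVertices.add(vertexId); }

    /** Removes only the vertex's own marker; markers set directly on edges are untouched. */
    void markVertexVisible(String vertexId) { hiddenVertices.remove(vertexId); }

    void markEdgeHidden(String edgeId) { hiddenEdges.add(edgeId); }

    boolean isVertexVisible(String vertexId) { return !hiddenVertices.contains(vertexId); }

    /** An edge is visible only if it is not hidden itself and neither endpoint is hidden. */
    boolean isEdgeVisible(String edgeId, String outVertexId, String inVertexId) {
        return !hiddenEdges.contains(edgeId)
                && !hiddenVertices.contains(outVertexId)
                && !hiddenVertices.contains(inVertexId);
    }
}
```

Replaying the steps above against this model leaves AB hidden and AC visible after A is marked visible, which matches the expected result.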

Information cannot be queried when split edges and vertices is true

GraphQuery query = defaultGraph.query(authA);
        long time = new Date().getTime();
//        Query has = query.has("name", "代乐嘿");
        Query has = query
                .has("name","318");
//                .has("age",43);
        System.out.println("edge start==================");
        QueryResultsIterable<Edge> edges = has.edges();
        edges.forEach(x->System.out.println(x));

        System.out.println("vertex start==================");
        QueryResultsIterable<Vertex> vertices = has.vertices();
        vertices.forEach(x->System.out.println(x));
        long l = new Date().getTime() - time;

        System.out.println(l+"ms");

DEBUG INFO
2018-11-22 17:01:17.953/CST INFO [vertexium.accumulo.AccumuloGraphConfiguration] Connecting to accumulo instance [accumulo] zookeeper servers [sinan10:2181,sinan11:2181,sinan12:2181]
2018-11-22 17:01:17.981/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.elasticsearch5.Elasticsearch5SearchIndex'
2018-11-22 17:01:17.981/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.elasticsearch5.DefaultIndexSelectionStrategy'
2018-11-22 17:01:17.981/CST INFO [vertexium.elasticsearch5.DefaultIndexSelectionStrategy] Default index name: spvertexium
2018-11-22 17:01:17.981/CST INFO [vertexium.elasticsearch5.DefaultIndexSelectionStrategy] Extended data index name prefix: spvertexium_extdata_
2018-11-22 17:01:17.981/CST INFO [vertexium.elasticsearch5.DefaultIndexSelectionStrategy] Split edges and vertices: true
2018-11-22 17:01:17.981/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.elasticsearch5.MetadataTablePropertyNameVisibilitiesStore'
2018-11-22 17:01:18.142/CST WARN [vertexium.elasticsearch5.Elasticsearch5SearchIndex] Running without the server side Vertexium plugin will disable some features.
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'com.wxss.vertexium.conf.SnowflakeIdWorker'
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.JavaVertexiumSerializer'
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.id.IdentityNameSubstitutionStrategy'
2018-11-22 17:01:18.142/CST DEBUG [vertexium.util.ConfigurationUtils] creating provider 'org.vertexium.accumulo.util.OverflowIntoHdfsStreamingPropertyValueStorageStrategy'
2018-11-22 17:01:18.265/CST INFO [vertexium.accumulo.AccumuloGraph] accumulo.graph.version=2

//Why are there two queries?
2018-11-22 17:01:18.804/CST DEBUG [vertexium.elasticsearch5.ElasticsearchSearchQueryBase] elasticsearch results (vertices: 0 + edges: 1 + extended data: 0 = 1)
2018-11-22 17:01:18.919/CST DEBUG [vertexium.elasticsearch5.ElasticsearchSearchQueryBase] elasticsearch results (vertices: 0 + edges: 0 + extended data: 0 = 0)
vertex start==================
2018-11-22 17:01:18.968/CST DEBUG [vertexium.elasticsearch5.ElasticsearchSearchQueryBase] elasticsearch results (vertices: 0 + edges: 0 + extended data: 0 = 0)
672ms

When I set split edges and vertices to false, there is no such problem.

Enable record timestamp for attributes

If I am not mistaken, the API currently does not support developer-specified timestamps for attributes. If this is not supported, are there any plans to add it soon?

For example, if I wanted to keep track of the outdoor temperature of an area on a daily basis, I might treat the place as a vertex and the temperature as an attribute of that vertex. For each temperature reading, I would want to use the timestamp reflecting when that temperature was taken.
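
The temperature use case can be sketched independently of any graph API as a set of values keyed by the time each reading was taken. The class below is purely illustrative and hypothetical; it does not show an existing or planned Vertexium feature.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical container: values of one attribute keyed by a caller-supplied timestamp.
final class TimestampedValues<T> {
    private final TreeMap<Instant, T> readings = new TreeMap<>();

    /** Stores a value under the time it was measured, not the time it was written. */
    void record(Instant measuredAt, T value) {
        readings.put(measuredAt, value);
    }

    /** Returns the most recent value measured at or before the given time, or null if none. */
    T valueAsOf(Instant when) {
        Map.Entry<Instant, T> entry = readings.floorEntry(when);
        return entry == null ? null : entry.getValue();
    }

    int size() {
        return readings.size();
    }
}
```

Keeping the readings sorted by timestamp makes "what was the temperature on a given day" a single floor lookup.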
