microsoft / graphengine

Microsoft Graph Engine

Home Page: https://www.graphengine.io/

License: MIT License

C# 48.83% C 0.26% CMake 0.72% C++ 35.72% HTML 12.12% Scala 0.19% Shell 0.24% Batchfile 0.02% Lex 0.52% Yacc 0.40% PowerShell 0.25% R 0.72% sed 0.01%
graph-engine graph-query-language in-memory-storage in-memory-computations distributed-computing dotnet

graphengine's Introduction

Microsoft Graph Engine

This repository contains the source code of Microsoft Graph Engine and its graph query language -- Language Integrated Knowledge Query (LIKQ).

Microsoft Graph Engine is a distributed in-memory data processing engine, underpinned by a strongly-typed in-memory key-value store and a general-purpose distributed computation engine.

LIKQ is a versatile graph query language built atop Graph Engine. It combines the capability of fast graph exploration with the flexibility of lambda expressions. Server-side computations can be expressed in lambda expressions, embedded in LIKQ, and executed on the Graph Engine servers during graph traversal.

Getting started

Recommended operating system: Windows 10 or Ubuntu 22.04.

Building on Windows

Download and install Visual Studio with the following "workloads" and "individual components" selected:

  • The ".NET desktop development" and "Desktop development with C++" workloads.
  • The ".NET Portable Library targeting pack" individual component.

Open a PowerShell window and run .\tools\build.ps1 to build the NuGet packages. The script has been tested on Windows 10 (22H2) with Visual Studio 2022.

Building on Linux

Install g++, cmake, and libssl-dev. For example, on Ubuntu, run:

sudo apt update && sudo apt install g++ cmake libssl-dev

Install .NET SDK x64 8.0. For example, on Ubuntu 22.04, run sudo apt update && sudo apt install -y dotnet-sdk-8.0. Then, build GraphEngine with the following command:

bash tools/build.sh

The build script has been tested on Ubuntu 22.04 with g++ 11.4.0.

Using the built packages

You can find the built NuGet packages build/GraphEngine**._version_.nupkg in the build folder. During the build, the build directory is registered as a local NuGet repository and the local package cache for GraphEngine.Core is cleared. After the packages are built, run dotnet restore to use the newly built packages.

Running your first Graph Engine app

Go to the samples/Friends folder, then execute dotnet restore followed by dotnet run to run the sample project.

Contributing

Pull requests, issue reports, and suggestions are welcome.

Please read the code of conduct before contributing code.

Follow these instructions for reporting security issues.

License

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT license.

Disclaimer

Microsoft Graph Engine is a research project. It is not an officially supported Microsoft product.

References

We kindly request that any published paper that makes use of Microsoft Graph Engine cites the following paper:

If you want to learn more about the algorithms and applications built on top of Microsoft Graph Engine, please refer to these publications.

graphengine's People

Contributors

chaosddp, fexio, jiabaohan, jsoref, leasunhy, leoxia, lianghe-ms, mandelliant, microsoftopensource, nvankaam, phomes, shaobin, tavitruman, thautwarm, v-yadli, yatli, z-shang

graphengine's Issues

LIKQ: 0-hop traverse action

This is particularly useful when the starting nodes come from an index but we still want logic to be applied to them.
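To make the request concrete, here is a minimal C++ sketch of a hop-bounded traversal in which a hop count of zero applies the predicate to the starting nodes without following any edge. It is not LIKQ and does not use the Graph Engine API: Graph, CellId, and traverse are hypothetical names chosen for illustration, and the graph is a plain in-memory adjacency map.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Graph Engine types.
using CellId = std::int64_t;
using Graph = std::unordered_map<CellId, std::vector<CellId>>;

// Expands `origins` by `hops` levels, then keeps the frontier nodes accepted
// by `pred`. With hops == 0 no edge is followed, so the predicate runs on the
// origins themselves (for example, ids that came from an index lookup).
std::vector<CellId> traverse(const Graph& g,
                             const std::vector<CellId>& origins,
                             int hops,
                             const std::function<bool(CellId)>& pred) {
    std::vector<CellId> frontier = origins;
    for (int h = 0; h < hops; ++h) {
        std::vector<CellId> next;
        std::unordered_set<CellId> seen;  // de-duplicates nodes within one level
        for (CellId id : frontier) {
            auto it = g.find(id);
            if (it == g.end()) continue;  // node without outgoing edges
            for (CellId n : it->second)
                if (seen.insert(n).second) next.push_back(n);
        }
        frontier = std::move(next);
    }
    std::vector<CellId> result;
    for (CellId id : frontier)
        if (pred(id)) result.push_back(id);
    return result;
}
```

Calling traverse(g, ids_from_index, 0, pred) then acts as a filter over the indexed starting nodes, which is the situation the issue describes.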

More data structures in TSL

Currently we only have primitive data types, structs, and lists. It would be very useful to introduce other data types such as set and map.

Trinity.TSL: consider overriding object.Equals and object.GetHashCode

When we first built the codegen, we provided overridden implementations for == and !=, but not for Equals and GetHashCode. This causes trouble for expressions like if (a == null) (we have to write if ((object)a == null) instead). The compiler gives warnings about the missing overrides. This makes it easy for us to add the missing functions once we migrate to the new template-based codegen, as we can have the compiler warnings available to us even at the template stage. On the other hand, we have to make sure that this is not a breaking change, because overriding these two functions may change the behavior of existing code.

Bring up Trinity.TSL

The TSL compiler consists of multiple projects using C#, C++ and C++/CLI.
We first have to clean it up a bit and then bring the projects up here gradually.
Also, C++/CLI will not be cross-platform in the near future, so we have to port that part back to C#.

Feature req: Tinkerpop support

Hey guys!
Kudos on releasing this to the OS community :D

I wanted to ask if you're planning on implementing Apache TinkerPop in MS Graph Engine.
This feature would be crucial for most users of Neo4j, Titan (r.i.p.), and other graph databases who want to port their systems to MS Graph Engine.

Generic fields backed by ITrinityStruct/IAccessor

This could be very useful for implementing generic algorithms on different cell types.
Take this example:

struct E
{
    [GraphEdge]
    cellid neighbor;
    [GraphEdgeWeight]
    double weight;
}

cell V
{
    [CompositeEdge]
    List<E> edges;
}

Currently ICell.GetField<T> only accepts non-generic types, so user-defined composite edge types are not possible. If we implement a generic field, we can then write ICell.EnumerateValues<IField>("CompositeEdge") without worrying about the type. Then, an algorithm can proceed and inspect [GraphEdge] and [GraphEdgeWeight] from the generic field respectively.

Implement a TSL-neutral type conversion system

There are a few benefits from this, for example:

  1. It will greatly reduce the complexity of the codegen.
  2. It will accept open types, instead of sticking to a closed type system which only contains types defined in TSL. (for example, GetField is invalid when the type system only contains float, which is counterintuitive)

Trinity.TSL: move [Index] support out of codegen

Currently, the [Index] attribute is interpreted by the TSL codegen, which then generates an inverted index service for each of these fields. However, to make indexing more flexible, we should remove the index codegen and migrate the indexing capabilities to an ICell-based framework, where indexer modules claim their indexing capabilities and are arranged by our system. It is even possible for one field to be indexed by multiple indexer modules claiming different capabilities (some good at range searches, some good at FTS, etc.).
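One way to picture the proposed arrangement is a registry in which each indexer module declares a set of capability flags and the system picks the modules that can serve a given query on a field. The C++ sketch below is only an illustration under that assumption: Capability, IndexerModule, and IndexRegistry are hypothetical names, and nothing here reflects an existing Graph Engine interface.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical capability flags that an indexer module may claim.
enum Capability : std::uint32_t {
    ExactMatch  = 1u << 0,
    RangeSearch = 1u << 1,
    FullText    = 1u << 2,
};

// Hypothetical description of one indexer module.
struct IndexerModule {
    std::string name;
    std::uint32_t capabilities;  // bitwise OR of Capability flags
};

class IndexRegistry {
public:
    // Several modules may be registered for the same field.
    void register_indexer(const std::string& field, IndexerModule m) {
        entries_.push_back({field, std::move(m)});
    }

    // Names of all modules registered for `field` that claim every
    // capability bit in `required`, in registration order.
    std::vector<std::string> find(const std::string& field,
                                  std::uint32_t required) const {
        std::vector<std::string> out;
        for (const Entry& e : entries_)
            if (e.field == field &&
                (e.indexer.capabilities & required) == required)
                out.push_back(e.indexer.name);
        return out;
    }

private:
    struct Entry {
        std::string field;
        IndexerModule indexer;
    };
    std::vector<Entry> entries_;
};
```

With this shape, a field can be covered by one module for range searches and another for full-text search, and the lookup decides which one answers a particular query.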

Do you have an install document?

I have tried to install GraphEngine on Windows and on Linux (CentOS 7, not Ubuntu), but every attempt failed. Do you have a document that explains how to install the graphsql?

Cell type mismatched : Global.LocalStorage.UseFriendship(Monica.CellID)

Hi,
I followed the Friends graph example to try to understand how Graph Engine works.
First, I tried accessing the character and performer cells, and it worked fine:

            long spouse_id = -1;
            long cast_id = -1;
            using (var cm = Global.LocalStorage.UseCharacter(Monica.CellID))
            {
                // Only read the spouse when the character is married.
                // Braces added: without them only the first statement was guarded.
                if (cm.Married)
                {
                    Console.WriteLine(cm.Married);
                    spouse_id = cm.Spouse;
                    Console.WriteLine("Spouse Id : " + spouse_id);
                }

                if (cm.Performer != null)
                {
                    cast_id = cm.Performer;
                    Console.WriteLine("Cast ID: " + cast_id);
                }
            }

            Console.WriteLine("The Cast of Monica is: ");
            
            using (var ca = Global.LocalStorage.UsePerformer(cast_id))
            {
                Console.WriteLine(ca.Name);
            }

After that I tried Global.LocalStorage.UseFriendship(Monica.CellID), but I got an error:

An unhandled exception of type 'Trinity.Storage.CellTypeNotMatchException' occurred in FriendsCell.dll
Additional information: Cell type mismatched.
Please help me understand what I did wrong and how to fix this error.
Thanks so much!

Trinity.C: MT_SHADOW_ENUMERATOR needs to be polished

https://github.com/Microsoft/GraphEngine/blob/multi_cell_lock/src/Trinity.C/src/Storage/MTHash/MT_SHADOW_ENUMERATOR.cpp#L43

The enumerator reports the status code of the local memory storage use-cell operation back to the LMS enumeration routine. If someone has deleted the cell, that status code is E_CELL_NOT_FOUND, which causes the enumeration routine to stop enumerating the current trunk and move to the next one before the remaining entries are enumerated.

Also, https://github.com/Microsoft/GraphEngine/blob/multi_cell_lock/src/Trinity.C/src/Storage/MTHash/MT_SHADOW_ENUMERATOR.cpp#L20

Currently the memory cost of an enumerator grows linearly with the data size. This is bad if we have a lot of threads doing enumerations together.
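The first problem can be sketched as follows: the enumeration loop should treat a "cell not found" status as a reason to skip one entry rather than to abandon the trunk. This is a simplified C++ illustration, not the code in MT_SHADOW_ENUMERATOR.cpp. Status, enumerate_trunk, and the use_cell callback are hypothetical stand-ins, with Status::CellNotFound playing the role of E_CELL_NOT_FOUND.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using CellId = std::int64_t;

// Hypothetical stand-in for the storage status codes mentioned in the issue.
enum class Status { Success, CellNotFound, Failure };

// Visits every cell id recorded in one trunk snapshot.
// A cell deleted after the snapshot was taken yields CellNotFound and is
// skipped; any other failure aborts the rest of the trunk.
// Returns the number of cells visited successfully.
std::size_t enumerate_trunk(const std::vector<CellId>& snapshot,
                            const std::function<Status(CellId)>& use_cell) {
    std::size_t visited = 0;
    for (CellId id : snapshot) {
        Status s = use_cell(id);
        if (s == Status::CellNotFound) continue;  // deleted concurrently: skip it
        if (s != Status::Success) break;          // genuine error: stop this trunk
        ++visited;
    }
    return visited;
}
```

The sketch keeps a full snapshot of ids per trunk, so it does nothing about the second problem, the memory cost that grows with the data size.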

cross platform build

I want a build procedure that works cross platform from the command line at the root level of the project.

i.e., I want to:

make && make install

or even something like

sudo apt-get install msbuild && msbuild buildall.sln

The build procedure should be documented in the README.md.

FanoutSearch unit tests failed

64 Tests Failed and 68 Tests Passed.

Test Name: JsonDSLTest_2hop
Test FullName: FanoutSearch.UnitTest.JsonDSLTest.JsonDSLTest_2hop
Test Source: D:\GraphEngine\src\LIKQ\FanoutSearch.UnitTest\JsonDSLTest.cs : line 40
Test Outcome: Failed
Test Duration: 0:00:00.2089944

Result StackTrace:
at FanoutSearch.ExpressionSerializer.EnsureSerializer()
at FanoutSearch.FanoutSearchDescriptor.<>c.b__22_0(Expression pred)
at System.Linq.Enumerable.WhereSelectListIterator`2.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at FanoutSearch.FanoutSearchDescriptor.Serialize()
at FanoutSearch.FanoutSearchDescriptor._ExecuteQuery()
at FanoutSearch.FanoutSearchDescriptor.GetEnumerator()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at FanoutSearch.FanoutSearchModule._SerializePaths(FanoutSearchDescriptor search, TextWriter writer)
at FanoutSearch.FanoutSearchModule.JsonQuery(String queryString, String queryPath)
Result Message:
Test method FanoutSearch.UnitTest.JsonDSLTest.JsonDSLTest_2hop threw exception:
System.NullReferenceException: Object reference not set to an instance of an object.

A series of walkthrough tutorials to help everyone to better understand the system

I think it is better that we maintain both an (unordered) set of rather complete examples and a (linear) series of tutorials, each covering a very specific topic of our system. A new user can go through the tutorials one by one and get a better understanding of the system than by looking at examples, which are more task-oriented.

Of course, if we design the series around incremental learning, at some stages one can reach a checkpoint where a rather complete example can be introduced.

@leoxia @shaobin thoughts?

TSL module keyword

@shaobin I see the "module" keyword in the LIKQ FanoutSearch project; what does the "module" keyword do? Is there any documentation that describes it? It does not appear in any of the documentation that I have access to.

Tavi

GraphEngine: Save Cell

Saving a cell larger than 1 GB in GraphEngine crashes the process, yet the return value is "E_SUCCESS". This is not reasonable; the call should return an error such as "cell too big".
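The behavior asked for here amounts to validating the size before the cell is written and reporting a dedicated error instead of success. A minimal C++ sketch follows, with loud assumptions: SaveResult, kMaxCellSize, and validate_cell_size are hypothetical names, and the 1 GiB bound is taken from the size quoted in this report, not from the actual storage limit, which may differ.

```cpp
#include <cstdint>

// Hypothetical result codes; CellTooBig is the error this issue asks for.
enum class SaveResult { Success, CellTooBig };

// Assumed upper bound of 1 GiB, based only on the size quoted in the report.
constexpr std::uint64_t kMaxCellSize = 1ull << 30;

// Rejects oversized cells up front so that the caller sees an error code
// instead of a crash followed by a misleading success status.
SaveResult validate_cell_size(std::uint64_t size_in_bytes) {
    return size_in_bytes > kMaxCellSize ? SaveResult::CellTooBig
                                        : SaveResult::Success;
}
```

A save routine would call such a check first and return its result unchanged when the cell is rejected.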

What is the most canonical way to represent a relationship?

In the Friends example, relationships are implied by storing the cell id (or a List of cell ids) in the related nodes. In the Freebase example, though, the notion of a [GraphEdge] is introduced. I was hoping that the documentation (and perhaps this thread) could give a clear understanding of how to properly model relationships/edges using GraphEngine.

GraphEngine.DiskImageVerifier: offline data image analysis

It would be useful to have an offline tool for analyzing the disk images.
In general it could perform the following tasks:

  • Verify the cell entries: Duplication? Invalid addresses? Dead locks? Invalid types?
  • Verify low-level cell content: Compare & diff two images.
  • Verify high-level cell integrity: extract metadata from a TSL assembly, and check the content of each cell according to the type information.
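As an illustration of the first task, the C++ sketch below scans a table of cell entries for duplicated ids and for entries whose byte range falls outside the image. The layout is an assumption made for the example: CellEntry, Report, and verify_entries are hypothetical, and the real disk image format is not described in this issue.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

using CellId = std::int64_t;

// Hypothetical entry layout: where a cell lives inside the image.
struct CellEntry {
    CellId id;
    std::uint64_t offset;  // byte offset of the cell within the image
    std::uint32_t size;    // cell size in bytes
};

struct Report {
    std::vector<CellId> duplicates;    // ids that appear more than once
    std::vector<CellId> out_of_range;  // ids whose bytes exceed the image
};

// Checks every entry against an image of `image_size` bytes.
Report verify_entries(const std::vector<CellEntry>& entries,
                      std::uint64_t image_size) {
    Report report;
    std::unordered_set<CellId> seen;
    for (const CellEntry& e : entries) {
        if (!seen.insert(e.id).second) report.duplicates.push_back(e.id);
        // Written to avoid overflow in offset + size.
        if (e.offset > image_size || e.size > image_size - e.offset)
            report.out_of_range.push_back(e.id);
    }
    return report;
}
```

Checks for invalid types or for cell content would need the TSL metadata mentioned in the last task and are outside this sketch.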

How to run a sample on linux command line?

Hello,

I built GraphEngine from source on Ubuntu 16.04 and tried to run the samples/Friends project with ./tools/dotnet/dotnet restore ./samples/Friends/Friends.sln, but I get the error:

Nothing to do. None of the projects specified contain packages to restore.

How do I run a sample from the Linux command line?

CoreCLR: Cross-platform P/Invoke

Trinity has a lot of interop interfaces for communication between the managed part (Trinity.Core) and the unmanaged one (Trinity.C). When we first evaluated the cross-platform capabilities, we found that Mono provided two ways to address the platform-specific native calls: the DllMap, and that one can register InternalCall entries into the runtime host, which perfectly matched our current interop model.

On the CoreCLR side, currently it is required that a developer only specifies the name of a library, while the runtime "guesses" the full library name ("name.suffix"). See: dotnet/coreclr#1248

It is unclear whether there will be a feature like the one discussed here: https://github.com/dotnet/coreclr/issues/930, but currently our best option is to follow the convention and update all our interop routines to point to "Trinity.C", so that it will be "guessed" as "Trinity.C.dll" on Windows and "libTrinity.C.so" on Linux. (I'm not sure whether the runtime will stop guessing because it sees a suffix ".C" here.)
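The naming convention described above can be written down as a small mapping from a bare library name to the platform-specific file name. The C++ sketch below only restates that convention for the two platforms named in this issue. Platform and guess_library_file_name are hypothetical names, and the sketch assumes the prefix and suffix are always added, which is exactly the point the note about the ".C" suffix is unsure about. It is not the runtime's actual probing logic.

```cpp
#include <string>

// The two platforms discussed in this issue.
enum class Platform { Windows, Linux };

// Maps a bare library name such as "Trinity.C" to the file name the
// convention expects on each platform. Assumes the prefix and suffix are
// always added, even when the name already contains a dot.
std::string guess_library_file_name(const std::string& name, Platform p) {
    switch (p) {
        case Platform::Windows:
            return name + ".dll";
        case Platform::Linux:
            return "lib" + name + ".so";
    }
    return name;  // unreachable for the enumerators above
}
```

Under this assumption, pointing every interop declaration at the single name "Trinity.C" yields "Trinity.C.dll" on Windows and "libTrinity.C.so" on Linux, matching the file names given in the text.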
