
vaticle / typedb-ml


TypeDB-ML is the Machine Learning integrations library for TypeDB

Home Page: https://vaticle.com

License: Apache License 2.0

Python 79.86% Starlark 20.14%
grakn graql machine-learning artificial-intelligence ml knowledge-graph knowledgebase link-prediction relational-learning knowledge-graph-completion

typedb-ml's People

Contributors

dmitrii-ubskii, flyingsilverfin, gowtham1997, grabl, haikalpribadi, jamesreprise, jmsfltchr, lolski, trellixvulnteam, vmax


typedb-ml's Issues

Create Grakn schema traversal

Walk the schema to find its concept types and their hierarchies. This is required in order to encode information about these types in the TensorFlow pipeline, so that the framework has the capacity to learn the impact of a node having a particular type, and the influence of that type's super-types.
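A minimal sketch of such a traversal, assuming the schema has already been fetched into a parent-to-children mapping (the type names and the `subtypes` structure here are hypothetical stand-ins, not the client API):

```python
def walk_schema(subtypes, root="thing", supertypes=()):
    """Depth-first walk over a schema type hierarchy, yielding each type
    together with the chain of super-types above it."""
    yield root, supertypes
    for child in subtypes.get(root, ()):
        yield from walk_schema(subtypes, child, supertypes + (root,))

# Hypothetical schema fragment: thing -> entity -> {person, company}
schema = {"thing": ["entity", "relation"], "entity": ["person", "company"]}
types = dict(walk_schema(schema))
# types["person"] == ("thing", "entity")
```

The per-type super-type chains are exactly what a type encoder downstream would consume.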

Needed by #13

BLAST: define the schema

This issue was originally posted by @sorsaffari on 2018-09-17 19:49.

Based on the response(s) returned from the API, as implemented here, the schema for the Grakn knowledge graph needs to be defined and loaded into a keyspace.

Add a "Use Cases" section in the README

The README.md describes what KGCN is, but it does not describe how it will be beneficial for users.

We should have a use-case section describing the kind of problems in which KGCN makes sense as a solution.

Network architecture is not type-centric

At present, the approach used is close to the one described in GraphSAGE, which assumes homogeneous data.

The downsides of this approach are:

  1. When querying for a Thing's neighbours, we often receive them sorted by type, which makes sampling them randomly difficult
  2. Random sampling is biased by point 1, but also by the number of neighbour instances of different Types. For instance, there may be only one neighbour of Type A but 10,000 of Type B; how do we choose to sample these?
  3. When pseudo-randomly sampling neighbours we set a limit on the number of Things we are willing to consider, in order to reduce expense. Since results often come back sorted by Type, doing this means we may not see any examples of some neighbour types that are actually present

Working with Grakn, we have type information that lets us understand the nature of the neighbours a Thing has. The network architecture should make use of this.
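One type-aware alternative to plain random sampling, sketched here as a round-robin over per-type neighbour pools (the helper name and data shapes are illustrative assumptions):

```python
import random

def stratified_sample(neighbours_by_type, k, rng=random):
    """Sample up to k neighbours, cycling round-robin over types so that
    rare types are represented even when one type dominates."""
    pools = {t: rng.sample(ns, len(ns)) for t, ns in neighbours_by_type.items()}
    out = []
    while len(out) < k and any(pools.values()):
        for t in list(pools):
            if pools[t]:
                out.append(pools[t].pop())
                if len(out) == k:
                    break
    return out

# Type A has 1 neighbour, Type B has 10,000: a plain random sample of 10
# would almost never include A, but round-robin sampling guarantees it.
neigh = {"A": ["a0"], "B": ["b{}".format(i) for i in range(10_000)]}
sample = stratified_sample(neigh, 10)
```

This addresses points 1-3 above: the sort order no longer matters, and a type with a single instance still appears in the sample.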

Encode traversal raw data into float tensors

Ingest data that describes the traversals from a batch of starting concepts and build float tensors to feed into the main trunk of the pipeline.

Requires:

  • Schema type encoder
  • Role type encoder
  • Role direction encoder
  • Long encoder
  • Double encoder
  • Boolean encoder
  • Date encoder
  • String encoder - potentially using a drop-in from TensorFlow Hub

Needed by #13
Needs #17
Needs #15
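A minimal sketch of two of these encoders (schema-type one-hot, plus long/double/boolean values packed into one float vector) using NumPy; the type vocabulary is a hypothetical example, and a real pipeline would add the role, date and string encoders listed above:

```python
import numpy as np

TYPE_VOCAB = ["person", "company", "employment"]  # hypothetical schema types

def encode_type(type_name):
    """One-hot encoding of a schema type."""
    vec = np.zeros(len(TYPE_VOCAB), dtype=np.float32)
    vec[TYPE_VOCAB.index(type_name)] = 1.0
    return vec

def encode_value(long_val=0, double_val=0.0, bool_val=False):
    """Long, double and boolean encoders concatenated into one float vector."""
    return np.array([float(long_val), double_val, float(bool_val)],
                    dtype=np.float32)

features = np.concatenate([encode_type("person"), encode_value(long_val=42)])
# a length-6 float32 vector, ready to feed into the main trunk of the pipeline
```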

Extend the deployment test to be more thorough

Right now the deployment test:

  1. Verifies that it can deploy the pip package to test.pypi.org
  2. Verifies that it can install it using pip install

These tests do not verify that the pip package is well-formed; therefore we should have a test which performs basic sanity checks on the installed package, for example, by attempting to import and instantiate kglib from a real Python program.
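The sanity check could be as small as the sketch below, run inside the fresh virtualenv that installed the package (the `kglib` usage shown in the comment is the intended target, not something this snippet verifies by itself):

```python
import importlib

def sanity_check(package_name):
    """Return True if the installed package can actually be imported,
    catching wheels that deploy and install but are not well-formed."""
    try:
        importlib.import_module(package_name)
        return True
    except ImportError:
        return False

# In the deployment pipeline (hypothetical usage):
# assert sanity_check("kglib"), "installed pip package is not importable"
```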

End-to-end-test may use the wrong PyPi version

If two workflows run at the same time, then since the date is used as the VERSION number for test PyPI, and the highest number is taken as the latest, the wrong version may be used by the next job in one of those workflows. That job is end-to-end-test.
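One way to sidestep the race, sketched here under the assumption that the CI system exposes a build number (the `BUILD_NUM` variable is hypothetical), is for each workflow to generate a unique version and then pin that exact version in the end-to-end-test job rather than taking the latest:

```python
import datetime
import os

def snapshot_version(base="0.1a1"):
    """Combine a seconds-resolution timestamp with the CI build number so
    that concurrent workflows cannot collide on the same test-PyPI version,
    and each job can install exactly the version it deployed."""
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
    build = os.environ.get("BUILD_NUM", "0")
    return "{}.dev{}{}".format(base, stamp, build)

version = snapshot_version()
```

The `.devN` suffix keeps the string a valid PEP 440 pre-release identifier.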

Expert Systems Research

This issue was originally posted by @jmsfltchr on 2018-09-14 12:42.

Why:
Expert Systems are critical in a variety of domains, including chatbots and medical diagnostics (for their transparency compared to ML systems).
How:
Research and disseminate how to build a general ES framework for Grakn, to demonstrate Grakn's usefulness in this domain.

Just a quick question

Hi there! This is neither a bug report nor a feature request, so I hope you don't mind me posting this here.

My name is Reed, and I'm a software engineering researcher at Sandia National Laboratories in the US. I've created an issue on your repo just to ask a quick question. If you don't have time or don't care to respond, feel free to ignore me and/or delete this issue.

Where I work, we have a very diverse ecosystem of cutting-edge research codes spanning every discipline you could imagine. I'm part of our software engineering research department, and it's our job to keep that ecosystem robust and healthy. Part of that means helping scientists to adopt good software practices. Right now, my mind has been on software versioning/release schemes (e.g. semantic versioning).

In order to build a case for/against getting my people on-board with the practice, I figured I should ask people who already use versioning to release their software to see what they think. So I gathered up a list of scientific software repositories on GitHub, then I selected those that tracked versioned releases, that were reasonably active, etc. Finally, I picked a handful of those repos and decided to reach out to them. You were on that list.

Anyway, here's the question:

What do you believe are the benefits (or drawbacks) of having versioned releases of your software (i.e. 1.0.0, 1.1.0, 2.0.0...)? When should someone start thinking about versioning/releasing their code?

Just a sentence or two, that's all I need. For context, imagine the preceding sentence is this: "But don't just take my word for it, just listen to what these accomplished researchers have to say!".

Thank you so much!

Reed Milewicz
[email protected]

Prototype a Concept Feature Embedding Framework

The objective is to prototype a method of building vector representations of Concepts in a Knowledge Graph. These vectors can then subsequently be used in machine learning pipelines in order to perform learning across the graph.

End-to-end test requires hard-coded data source

As below, the dataset has been hard-coded. Ideally we shouldn't piggyback on release data for testing.

http_file(
    name = "animaltrade_dist",
    urls = [
        "https://github.com/graknlabs/kglib/releases/download/v0.1a1/grakn-animaltrade.zip",  # TODO: how to update to the latest release each time?
    ],
)

Implement type-wise Attribute value normalisation

Presently it is very difficult to architect a concise way to normalise attribute values.

Current problems include:

  • All Things in the graph are treated as if they could be an attribute. This means that:
    • All Things must support a field for long, double, string, date and boolean in case they are an Attribute. When a Thing is an Attribute, only one of these values will be set to non-default.
    • The vast majority of the attribute value fields are set to a default value. This obfuscates the meaning of zero: in some cases it means an actual value of zero, in others it is present only because the Thing is not an Attribute. This is particularly difficult to handle for dates, where Unix time zero is Thursday, 1 January 1970.
  • Attributes need to be normalised by Type, otherwise the distribution of values from one type will impact that of another
  • Normalisation needs to be calibrated on the training set, and the parameters then used to normalise data passed subsequently
  • Encoding of the input data takes place inside the TensorFlow computation graph; adding normalisation there may be non-trivial, and TensorFlow has no out-of-the-box components like scikit-learn's preprocessing.StandardScaler()

Should be made easier to accomplish by solving #51
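The fit-on-train, apply-later shape of the normaliser could look like this sketch (done outside the TensorFlow graph, per the last bullet; the class and its data layout are illustrative assumptions):

```python
import numpy as np

class PerTypeNormaliser:
    """Standardise attribute values separately for each attribute type.
    Parameters are calibrated on the training set only, then reused to
    normalise any data passed subsequently."""

    def fit(self, values_by_type):
        self.params = {t: (np.mean(v), np.std(v) or 1.0)
                       for t, v in values_by_type.items()}
        return self

    def transform(self, attr_type, values):
        mean, std = self.params[attr_type]
        return (np.asarray(values, dtype=np.float64) - mean) / std

# Normalising by type keeps "age" and "salary" distributions independent:
norm = PerTypeNormaliser().fit({"age": [20, 30, 40],
                                "salary": [1e4, 5e4, 9e4]})
scaled_age = norm.transform("age", [30])  # centred to 0.0
```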

Implement end-to-end test using test deployment

We require a way to test that kglib can be imported via pip. This requires a dedicated test that can be run independent of other typical tests. This test, using bazel, should depend on the latest deployment to the test PyPI server.

Pytorch Issue Windows

I've been trying to install PyTorch for days and I'm stuck with a big problem. I've read a lot of "how to install PyTorch" articles. I tried pip install, but it didn't work for me, so I installed it with Anaconda instead. PyTorch does appear installed in Anaconda: when I type conda list, it shows up as pytorch 1.0.1 py3.7_cuda100_cudnn7_1. I have Python 3.7, but when I run code with import torch I get a message like this:

(screenshot)

And when I try to import torch in Python 3.7:

(screenshot)

Pip install error:

(screenshot)

How do I get past these errors? Please help, thanks.

Add SonarCloud to KGLIB repo

Given that KGLIB's codebase is still in its infancy, it's good to put a code quality enforcement system in place early on. Let's add this as part of the PR and master workflows.

Feature normaliser by attribute type

Once encoded, feature values need to be normalised relative to the other values for the same attribute type. This is necessary since we can expect that different attribute types (of the same datatype) will have wildly different distributions.

Needed by #13

Machine Learning Research

This issue was originally posted by @jmsfltchr on 2018-08-31 17:29.

Why
To explore the benefits of combining Grakn with Machine Learning
How
Investigate the integration of machine learning with Grakn.
Efforts to include:

  • Feature extraction from Grakn (or equivalent)
  • Running ML at query-time, triggered by querying, therefore also by reasoning

This issue needs #9, needs #8, needs #4, needs #13

Initial investigation into Random Forests in Grakn

This issue was originally posted by @jmsfltchr on 2018-08-31 17:27.

Is it possible to create a random forest that sits inside Grakn, so that it can be used for classification/regression at query-time?

This experiment has not yet gone far enough to determine feasibility in terms of speed. The blocker encountered before that point was being able to perform aggregations in rules: we need to compute an aggregate mode inside a rule in order to implement the majority voting of the trees in a forest when classifying an example. Performing this operation outside Grakn seems to defeat the point of embedding the forest in Grakn at all.

I have made no effort to consider how to build or "train" (*1) the forest. This training could be done in application code and the trees then translated into Grakn.

*1: by "training" I mean that the trees are not built totally randomly; the discrimination boundary picked for each node (and which feature to use, picked from a random set?) is chosen on the basis of what divides the data the most.
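The aggregate-mode step that is blocked inside rules is just a majority vote; in application code it reduces to a few lines (the stub "trees" and their thresholds below are hypothetical):

```python
from collections import Counter

def forest_predict(trees, example):
    """Classify an example by majority vote over the trees of a forest --
    the mode aggregation the issue wants to express inside a Grakn rule."""
    votes = [tree(example) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three stub "trees" with hypothetical decision boundaries voting on x:
trees = [lambda x: "yes" if x > 2 else "no",
         lambda x: "yes" if x > 5 else "no",
         lambda x: "yes" if x > 9 else "no"]
label = forest_predict(trees, 7)  # two of the three trees vote "yes"
```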

Sync dependencies upon graknlabs repos

Problem to Solve

KGLIB should be automatically updated to depend upon the latest commit of each graknlabs repository it depends on.

Current Workaround

Currently the dependencies must be updated manually.

Proposed Solution

Have Grabl automatically update the commits that are depended upon.

Create a CI pipeline which performs tests and release (if manually approved)

Scope of the CI pipeline:

  1. tests:
    1. runs unit tests
    2. runs a deployment test which deploys the artifact to test.pypi.org
    3. runs an end-to-end test which verifies if the deployed artifact can be used
  2. a manual approval prompt which should trigger a release process
  3. release:
    1. deploy to pypi.org
    2. deploy a release draft to GitHub

Use cached test results for unchanged source code

Problem to Solve

All tests rerun from cold, which is unnecessary if large amounts of source code are unchanged.

Current Workaround

Ignore this computational penalty

Proposed Solution

Use RBE with bazel to cache test results

Cannot depend upon client-python and grakn releases due to conflicting transitive build-tools dependencies

Problem to Solve

In KGLIB we wish to conduct tests in CI against the latest releases of graknlabs/client-python and graknlabs/grakn. This is to ensure that user experience is aligned with the testing conditions in CI. We wish to do this by depending upon git repositories by tag with bazel, using sync-dependencies to auto-update the tags to reflect the latest releases.

It is common that the latest release of graknlabs/client-python and the latest release of graknlabs/grakn depend upon different commits of graknlabs/build-tools. Using bazel there is no way to use graknlabs/build-tools with two different versions. This is due to the fact that both of these repositories refer to graknlabs/build-tools as @graknlabs_build_tools.

This transitive dependency misalignment makes it impossible to both use bazel and test against the latest releases of graknlabs/client-python and graknlabs/grakn.

Current Workaround

Depend upon graknlabs/client-python and graknlabs/grakn by commit and use sync-dependencies, in which case they both use the same version of graknlabs/build-tools, hence resolving the conflict.

In this case we only test against the latest releases of graknlabs/client-python and graknlabs/grakn in the test-deployment-pip job in CI, when we use a deployed snapshot of KGLIB, which will depend upon a released version of client-python. This version must be manually updated in install_requires of assemble_pip.
This test will also use a released version of Grakn, retrieved as a zip.

Proposed Solution

Add functionality to bazel to permit including transitive dependencies in a scoped way, such that graknlabs/client-python, graknlabs/grakn and graknlabs/kglib can each depend upon a different version of graknlabs/build-tools without conflicting.

BLAST: Try the API

This issue was originally posted by @sorsaffari on 2018-09-05 18:39.

As the first step in writing an example to illustrate how BLAST can be used with a Grakn Knowledge Graph:

  • try with a single protein sequence
  • try with a file containing multiple protein sequences
  • assess the results

Build Tensorflow implementation of supervised GraphSAGE

Translate the approach of GraphSAGE to the context of Grakn, taking inspiration from the authors' code where applicable. Implement a first cut of inference, loss and optimisation. Test using dummy data as a stand-in for an encoding pipeline.

Needed by #13
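The core of the layer being translated is GraphSAGE's mean aggregator, sketched here in NumPy with random weights as stand-ins (consistent with testing on dummy data):

```python
import numpy as np

def sage_mean_layer(h_self, h_neigh, w_self, w_neigh):
    """One GraphSAGE mean-aggregator layer: combine a node's own
    representation with the mean of its sampled neighbours'
    representations via learned linear maps, then apply a ReLU."""
    agg = h_neigh.mean(axis=0)           # mean over sampled neighbours
    z = w_self @ h_self + w_neigh @ agg  # linear maps of self and aggregate
    return np.maximum(z, 0.0)            # ReLU non-linearity

rng = np.random.default_rng(0)
h_self = rng.normal(size=4)            # the node's feature vector
h_neigh = rng.normal(size=(3, 4))      # 3 sampled neighbours' features
w_self = rng.normal(size=(8, 4))       # stand-in learned weights
w_neigh = rng.normal(size=(8, 4))
out = sage_mean_layer(h_self, h_neigh, w_self, w_neigh)  # shape (8,)
```

In the TensorFlow implementation the two weight matrices become trainable variables and the ReLU output feeds the next hop's aggregation.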

A Clerical Error in the README.md

There may be a clerical error in this sentence: "Delete all appendix attributes from both animaltrade_train and animaltrade_test keyspaces. This is the label we will predict in this example, so it should not be present in Grakn otherwise the network can cheat."

Here animaltrade_train should perhaps be animaltrade_eval. Just let me know whether that's right.

Build a knowledge graph of the Graph ML space

This issue was originally posted by @jmsfltchr on 2018-09-11 11:34.

Track the papers of interest found during my research into how to do ML over a knowledge graph. Develop a schema sophisticated enough to capture this information fully.

Unapproved but successful workflows should show as green

Problem to Solve

A CI workflow which reaches an approval step, where approval is not given, is indicated as pending (orange). This is misleading as it gives the impression that tests have not passed. Approval is only given when releasing, so this problem is very common.

Current Workaround

Inspect the pending flag to see how far the workflow progressed

Proposed Solution

Use a custom approval system

Cannot install kglib - no matching distribution found for tensorflow

I can't install grakn-kglib. After I run "pip3 install grakn-kglib" (inside a new venv), I get this error:

Could not find a version that satisfies the requirement tensorflow==1.11.0 (from grakn-kglib) (from versions: ) No matching distribution found for tensorflow==1.11.0 (from grakn-kglib)

I'm running python 3.7.1.
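The pin tensorflow==1.11.0 predates Python 3.7 support (to the best of my knowledge, TensorFlow wheels for 3.7 only appeared around 1.13), so pip finds no distribution under a 3.7 venv. A quick compatibility check, with the supported-version set stated as an assumption:

```python
# Assumed wheel matrix for tensorflow==1.11.0: CPython 2.7, 3.5 and 3.6 only.
TF_1_11_PYTHONS = {(2, 7), (3, 5), (3, 6)}

def tf_1_11_wheel_available(major, minor):
    """Return True if pip could find a tensorflow==1.11.0 wheel for this
    interpreter; under Python 3.7 this is False, matching the error above."""
    return (major, minor) in TF_1_11_PYTHONS

# The reporter's Python 3.7.1 interpreter has no matching wheel:
ok = tf_1_11_wheel_available(3, 7)
```

Running the install under a Python 3.6 venv, or bumping the TensorFlow pin, would sidestep the error.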

Add a terminology section to KGCN README

At present there are several terms that may need explaining to avoid confusion. For example:

  • What we mean by neighbourhood
  • What an example is (either a Thing we want to embed/classify or example code)
