
bio-ontology-research-group / walking-rdf-and-owl


Feature learning over RDF data and OWL ontologies

License: BSD 3-Clause "New" or "Revised" License

Groovy 33.60% Python 44.43% Makefile 8.00% C++ 5.85% Shell 1.10% TeX 0.72% Batchfile 6.31%
rdf-graph machine-learning owl classification feature-learning semantic-web

walking-rdf-and-owl's Introduction

Walking RDF and OWL

Feature learning on RDF and OWL (i.e., Description Logic theories).

Here are some scripts to facilitate building the graph, classifying it and learning its node representations:

To run, execute groovy RDFWrapper and follow the instructions. The input is an RDF graph, and the output file can be used as input for the modified DeepWalk tool available as part of this repository (https://github.com/bio-ontology-research-group/walking-rdf-and-owl/tree/master/deepwalk_rdf).

For example, to classify the RDF graph RDFgraph.nt using the OWL ontologies in onto_dir with the ELK reasoner and write an edge list representation of the inferred graph to outWrapper.txt, use the following command:

groovy RDFWrapper.groovy -i RDFgraph.nt -o outWrapper.txt -m mappingFile.txt -d onto_dir -c true 
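As a rough illustration of what "an edge list representation" of an RDF graph can look like, the following Python sketch reads N-Triples lines and assigns integer identifiers to IRIs. It is not the RDFWrapper implementation: the three-column layout (source, target, relation), the shared id space for nodes and relations, and the function name triples_to_edge_list are all assumptions made for this example, and the actual layout of outWrapper.txt and mappingFile.txt should be checked against the script itself.

```python
import re

# Matches one simple N-Triples statement whose three terms are all IRIs.
# Literals and blank nodes are deliberately skipped in this sketch.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def triples_to_edge_list(lines):
    """Map IRIs to integer ids and return (edges, mapping).

    edges   -- list of (source_id, target_id, relation_id) tuples
               (column order is an assumption of this sketch)
    mapping -- dict from IRI to integer id, shared by nodes and relations
    """
    mapping = {}
    edges = []

    def get_id(iri):
        # Assign the next free integer the first time an IRI is seen.
        if iri not in mapping:
            mapping[iri] = len(mapping)
        return mapping[iri]

    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # comment, blank line, or a statement with a literal
        subj, pred, obj = match.groups()
        edges.append((get_id(subj), get_id(obj), get_id(pred)))
    return edges, mapping
```

Calling triples_to_edge_list(open("RDFgraph.nt")) would return the edges together with the IRI-to-integer dictionary, which can then be written out as two plain-text files.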

To generate representations (embeddings) of the nodes (and edges) in the RDF graph, run

deepwalk --workers 64 --representation-size 256 --format edgelist --input outWrapper.txt  --output out.txt --window-size 5 --number-walks 500 --walk-length 40

This learns embeddings of size 256 using 64 parallel workers, based on 500 walks of length 40 from each node. The deepwalk executable must be the modified version contained in this repository so that object properties are taken into account during the walks.
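Once out.txt has been written, the embeddings can be read back for downstream use. The sketch below assumes the output follows the word2vec text format used by the original DeepWalk (a header line with the number of vectors and their size, then one line per node with its identifier followed by the vector components). Whether the modified version in this repository keeps exactly that format is an assumption, and load_embeddings is a hypothetical helper name.

```python
def load_embeddings(lines):
    """Parse word2vec-style text lines into {node_id: [float, ...]}.

    Assumes the first line is "<count> <size>" and every following
    non-empty line is "<node_id> <v1> ... <v_size>".
    """
    rows = iter(lines)
    header = next(rows).split()
    count, size = int(header[0]), int(header[1])

    vectors = {}
    for line in rows:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != size + 1:
            raise ValueError("unexpected number of columns: %r" % line)
        vectors[parts[0]] = [float(value) for value in parts[1:]]

    if len(vectors) != count:
        raise ValueError("header announced %d vectors, found %d"
                         % (count, len(vectors)))
    return vectors
```

With the file on disk, load_embeddings(open("out.txt")) returns a dictionary keyed by the node identifiers used in the edge list, so checking whether a given node received an embedding is a simple membership test.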

We also provide an option to start walks from specific nodes only. The excludelist parameter takes the identifiers of nodes to be excluded as starting points, so walks start only from the remaining nodes. This may make training faster.

deepwalk --workers 48  --walk-length 20 --window-size 10 --number-walks 100 --representation-size 512 --format edgelist --excludlist exnodes.txt  --input outWrapper.txt --output outDeep.txt
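The idea behind the two commands above (walks that record the relation followed at each step, optionally restricted to certain start nodes) can be sketched in a few lines of Python. This is a simplified illustration rather than the code in deepwalk_rdf: the function names, the uniform choice among outgoing edges, the treatment of edges as directed, and the interpretation of walk length as a number of nodes are all assumptions of this sketch.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Index (source, target, relation) edges by their source node."""
    adjacency = defaultdict(list)
    for source, target, relation in edges:
        adjacency[source].append((relation, target))
    return adjacency


def generate_walks(edges, number_walks, walk_length, excluded=(), seed=0):
    """Return walks as token lists: node, relation, node, relation, ...

    Nodes in `excluded` are never used as start nodes, but a walk may
    still pass through them.
    """
    rng = random.Random(seed)
    adjacency = build_adjacency(edges)
    nodes = {s for s, _, _ in edges} | {t for _, t, _ in edges}
    start_nodes = sorted(nodes - set(excluded))

    walks = []
    for _ in range(number_walks):
        for start in start_nodes:
            walk = [start]
            current = start
            # walk_length counts nodes here, so a full walk holds
            # 2 * walk_length - 1 tokens; it stops early at dead ends.
            while len(walk) < 2 * walk_length - 1 and adjacency.get(current):
                relation, current = rng.choice(adjacency[current])
                walk.extend([relation, current])
            walks.append(walk)
    return walks
```

Because relation labels appear as tokens between nodes, a skip-gram model trained on these walks learns vectors for relations as well as for nodes. Excluding start nodes reduces the number of walks, and therefore the size of the training corpus, without removing those nodes from the graph.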

To run the multi-threaded C++ implementation of the corpus generation module, you need to have the C++ Boost libraries installed. On an Ubuntu system, you can install them with:

sudo apt-get install libboost-all-dev

You also need to install the Boost Threadpool header files. Once all header files and libraries are installed, type make to compile, then run deepwalk:

./deepwalk edgelistfile.txt walksfile.txt

Classification support

The RDFWrapper script comes with built-in support for OWL classification. Use this when your RDF dataset refers to ontologies and the full ontologies are available. In the script we provide, we use the ELK reasoner (which supports the OWL 2 EL profile) to classify the ontology and infer class assertion axioms for all individuals. These inferred axioms are added to the RDF dataset after classification and are used to build the graph.
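To make the effect of classification concrete, the toy Python sketch below derives additional class assertions for individuals by following subclass links upward. It is only an illustration under strong assumptions: it handles a plain, already-known subclass hierarchy, whereas ELK performs full OWL 2 EL reasoning; the function name and the data layout are hypothetical and do not reflect how RDFWrapper calls the reasoner.

```python
def infer_class_assertions(asserted_types, subclass_of):
    """Return, per individual, every class entailed by the hierarchy.

    asserted_types -- dict: individual -> set of directly asserted classes
    subclass_of    -- dict: class -> set of direct superclasses
    """
    inferred = {}
    for individual, classes in asserted_types.items():
        seen = set()
        stack = list(classes)
        # Depth-first traversal up the hierarchy; `seen` guards
        # against revisiting classes reachable by several paths.
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            stack.extend(subclass_of.get(cls, ()))
        inferred[individual] = seen
    return inferred
```

Each inferred (individual, class) pair corresponds to an extra rdf:type edge in the graph, which gives random walks more paths between individuals that share superclasses.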

Example

An example knowledge graph and the resulting embeddings can be found here:

How to cite

If you use our code, please cite our paper: Alshahrani et al. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics, 2017.

walking-rdf-and-owl's People

Contributors

coolmaksat, leechuck, miguelangelrg, monaalsh, omarmaddouri


walking-rdf-and-owl's Issues

Disease Ontology terms not present

Hi,

Are the embeddings present in bio-knowldge-graph.embeddings complete? I know it was mentioned that these are the embeddings that were used in the Walking RDF and OWL paper but I've noticed some very basic disease ontology terms like DOID_552 (pneumonia) are not present in the embeddings.

Thanks!
Abby

Understanding and reformatting the knowledge graph.

Thanks for sharing this knowledge graph! I would love to be able to do a compare and contrast with some other methods, and ideally expand it a bit by joining it with other resources.

My apologies for the question of ignorance, but as a preliminary step, I am trying to convert the knowledge graph into a simpler triple format that I can load as a flat file into something like numpy. As such, I want to be sure I correctly understand the structure.

Could you confirm if I am reading this correctly? It appears that each triple forms two rows that look like this

<http://www.ncbi.nlm.nih.gov/gene/448835> <http://purl.obolibrary.org/obo/RO_0000085> <http://aber-owl.net/go/instance_0> .
<http://aber-owl.net/go/instance_0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/GO_0031424> .

Of the six sets of angle brackets, it appears that the first identifies the source node, the second encodes the edge's relationship, and the sixth identifies the target node. The third and fourth appear to be an identifier for the tuple, and the fifth appears to be the same everywhere.

Is the above interpretation correct? If so, is there an easy way to build a simple dictionary of the node and edge URLs? I'd prefer to encode them as simple numbers with a separate table mapping each number to a string, but I couldn't find a node/edge dictionary in the repo.

Thanks so much again for all your work, and I hope this isn't a pain for you to answer.
