smartdataanalytics / knowledge-graph-analysis-programming-exercises

Exercises for the Analysis of Knowledge Graphs

Home Page: https://sewiki.iai.uni-bonn.de/teaching/lectures/kga/2017/start

License: Apache License 2.0

Jupyter Notebook 98.38% Python 1.62%
knowledge-graph exercises semantics machine-learning

knowledge-graph-analysis-programming-exercises's Introduction

Programming Exercises for the Analysis of Knowledge Graphs

This repository allows interested students and researchers to perform hands-on analysis of knowledge graphs. It is primarily developed as part of the knowledge graph analysis lecture of the SDA Group at the University of Bonn, but the material is also useful for anyone else.

Knowledge Graphs - Things, not Strings!

Knowledge graphs represent knowledge in terms of entities and their relationships, as shown in the figure below. The nodes of a knowledge graph are the objects relevant in your domain, each with a unique identifier (so they represent real-world "things" rather than just string labels). The edges are the connections between those objects. Since knowledge graphs are intuitive and offer a number of benefits, they have become very popular over the past decade. Some of the best-known knowledge graphs are the Google Knowledge Graph (a major component of Google Search and other services), DBpedia (a knowledge graph extracted from Wikipedia), Wikidata, YAGO, the Facebook Social Graph, Satori (the Microsoft knowledge graph) and the LinkedIn Knowledge Graph.

Many knowledge graphs are very large; their creation is crowdsourced and/or they are generated from various sources. Relational learning methods can then be employed on knowledge graphs for a variety of tasks: link prediction finds missing edges (e.g. suggesting friends via your social graph is predicting missing edges to other people), link correction finds incorrect edges, entity resolution maps entities mentioned in text to knowledge graph entities, and clustering groups entities based on their similarity. In the exercises, you will learn about relational learning methods for knowledge graphs.

The two knowledge representation formalisms for knowledge graphs used in the exercises are RDF knowledge graphs and property graph databases. Since knowledge graphs represent a whole network of entities, the methods for solving the above problems often go beyond simple feature-based machine learning. In the exercises, you will learn about the creation of knowledge graph embeddings via tensors and tensor factorisation as well as neural network based techniques. You will also learn about Markov networks.
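To give a flavour of what an embedding-based scorer looks like, here is a minimal sketch (not taken from the exercises; a DistMult-style diagonal bilinear scoring function and made-up entity names are assumed): each entity and relation gets a learned vector, and a triple's plausibility is the three-way elementwise product of those vectors, summed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension (tiny, for illustration)

# toy embeddings for two entities and one relation (normally learned by training)
e_bonn = rng.normal(size=d)
e_germany = rng.normal(size=d)
w_located_in = rng.normal(size=d)

def distmult_score(head, rel, tail):
    """DistMult score: sum_i head_i * rel_i * tail_i (a diagonal bilinear form)."""
    return float(np.sum(head * rel * tail))

# a higher score means the model considers (Bonn, locatedIn, Germany) more plausible
print(distmult_score(e_bonn, w_located_in, e_germany))
```

Link prediction then amounts to scoring candidate triples and ranking them by this value.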

(Figure: an example knowledge graph)

Exercise Overview

Each individual exercise contains a description of tasks and background. We first start with the formalisms to create and query knowledge graphs and then proceed with relational learning methods.
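The querying formalisms boil down to matching patterns over (subject, predicate, object) triples. As a dependency-free sketch (the entity and relation names below are made up for illustration, and `None` stands in for a SPARQL variable):

```python
# a tiny knowledge graph as a set of (subject, predicate, object) triples
graph = {
    ("Bonn", "locatedIn", "Germany"),
    ("Berlin", "locatedIn", "Germany"),
    ("Bonn", "type", "City"),
}

def match(pattern):
    """Return all triples matching a pattern; None acts like a query variable."""
    return [t for t in graph
            if all(p is None or p == v for p, v in zip(pattern, t))]

# analogous to the SPARQL query: SELECT ?s WHERE { ?s :locatedIn :Germany }
cities = sorted(s for s, _, _ in match((None, "locatedIn", "Germany")))
print(cities)  # → ['Berlin', 'Bonn']
```

Real RDF stores add datatypes, namespaces and joins over multiple patterns, but the core idea is this pattern matching.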

Contributing and Feedback

Please use the issue tracker to report problems and suggest improvements. Feel free to submit pull requests for improvements of the exercises. Please send other feedback via mail to Prof. Jens Lehmann.

Authors

License

The repository itself is under the Apache License 2.0. For the individual libraries and tools used in the exercises, please check their respective license conditions.

Acknowledgements

We thank the students of the Knowledge Graph Analysis lecture in Bonn as well as the developers of the frameworks we are using for their support in creating this learning resource.

knowledge-graph-analysis-programming-exercises's People

Contributors: mdasifkhan, mehrdadbozorg, s6fikass


knowledge-graph-analysis-programming-exercises's Issues

Alyawarra (kinship) data dictionary is missing names for entities/relations

Hello!

Thank you for putting together and releasing these exercises. I am currently going through Exercise 6 and I noticed that the data dictionary lookups for the Alyawarra (kinship) dataset (data/kinship/bin/idx2ent.npy and data/kinship/bin/idx2rel.npy) appear to only contain indices and not the names of the entities/relations. It would seem that this information is necessary for the interpretability of the t-SNE visualization and k-NN of the entity/relation embeddings.

I did find a dictionary for the original Alyawarra dataset from 1971 at Kinsources. However, the codes for the relation types there range from 1 to 29, while the indices in data/kinship/bin/idx2rel.npy range from 0 to 25, so I am not sure of the mapping between these two sets of values.
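To make the mismatch concrete, here is a small self-contained check (the arrays below are stand-ins for the actual files, not their real contents):

```python
import numpy as np

# hypothetical stand-ins for the two sets of identifiers
idx2rel_indices = np.arange(26)      # 0..25, as observed in data/kinship/bin/idx2rel.npy
kinsources_codes = np.arange(1, 30)  # 1..29, from the 1971 Kinsources dictionary

# even a naive shift-by-one mapping leaves three codes unaccounted for
unmatched = sorted(set((kinsources_codes - 1).tolist()) - set(idx2rel_indices.tolist()))
print(unmatched)  # → [26, 27, 28]
```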

Is there anything that I am missing? Any insights would be very helpful.

Many thanks!

Max margin loss function for ER-MLP seems wrong

Hi there! First of all thank you for the detailed programming exercises you have provided here.

I am trying to implement ER-MLP in Ex.6 with the max-margin loss function. The variables y_pos and y_neg used there seem a little confusing.

    M = y_pos.size(0)
    y_pos = y_pos.view(-1).repeat(C)  # repeat to match y_neg
    y_neg = y_neg.view(-1)

    # target = [-1, -1, ..., -1], i.e. y_neg should be higher than y_pos
    target = -np.ones(M*C, dtype=np.float32)
    loss = F.margin_ranking_loss(
        y_pos, y_neg, target, margin=margin, size_average=average
    )


  1. Shouldn't y_neg contain the scores derived for the sampled negatives from the network? The code mentions they are binary-valued containing the true labels, but that doesn't seem right to me
  2. After the repeating, y_pos becomes an (M*C)-sized tensor, while y_neg still has only M entries. This shape mismatch may cause issues when passing them to the loss function
  3. The variable target is initialized as -1, i.e. it would learn to rank y_neg higher, but don't we want the network to rank y_pos higher than y_neg?

Thanks in advance!
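For comparison, the standard max-margin ranking objective (a hinge on the score difference, under the reading in point 3 that positives should rank higher than negatives) can be sketched in plain NumPy:

```python
import numpy as np

def max_margin_loss(y_pos, y_neg, margin=1.0):
    """Hinge ranking loss: penalise negatives scoring within `margin` of their positive.
    y_pos: shape (M,) scores of true triples; y_neg: shape (M, C) scores of C negatives each."""
    diff = margin - (y_pos[:, None] - y_neg)  # broadcast to (M, C)
    return float(np.maximum(0.0, diff).mean())

# toy scores: 2 positives, 2 negatives each
y_pos = np.array([2.0, 3.0])
y_neg = np.array([[0.5, 1.5],
                  [2.5, 0.0]])
print(max_margin_loss(y_pos, y_neg))  # → 0.25
```

This is equivalent to `F.margin_ranking_loss(y_pos_expanded, y_neg_flat, target=+1)` in PyTorch, with the positive scores as the first argument and target +1 so that they are pushed above the negatives.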
