dobraczka / kiez

🏘️ Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

entity-resolution embedding knowledge-graph hubness nearest-neighbors knowledge-graph-embedding approximate-nearest-neighbor-search entity-alignment
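The tagline describes hubness-reduced nearest neighbor search. To show what "hubness reduction" means in practice, here is a minimal NumPy sketch of one well-known method, cross-domain similarity local scaling (CSLS). It rescores cosine similarity by penalizing points that are close to many others ("hubs"). This is not kiez's implementation or API. The function name and parameters are hypothetical, the search is brute force, and the sketch assumes every row is a nonzero vector.

```python
import numpy as np


def csls_neighbors(source, target, k=10, n_neighbors=5):
    """Hypothetical helper: top n_neighbors targets per source row under CSLS.

    CSLS(x, y) = 2 * cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean
    cosine similarity of x to its k most similar targets and r_S(y) the mean
    cosine similarity of y to its k most similar sources.
    """
    # Normalize rows so that dot products are cosine similarities
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = s @ t.T  # shape (n_source, n_target)

    # r_T(x): mean similarity of each source row to its k most similar targets
    r_source = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
    # r_S(y): mean similarity of each target row to its k most similar sources
    r_target = np.sort(sim, axis=0)[-k:, :].mean(axis=0)

    csls = 2 * sim - r_source[:, None] - r_target[None, :]
    # Highest CSLS score first
    return np.argsort(-csls, axis=1)[:, :n_neighbors]
```

Targets that are similar to everything get a large r_S(y) and are pushed down the ranking. This is the effect hubness reduction aims for.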

kiez's Introduction

Hi there 👋

🔭 I’m currently working on knowledge graph embeddings for entity resolution
🌱 I’m always trying to learn better ways to write clean Python code
👯 I’m looking to collaborate on knowledge graph embeddings and data integration

kiez's People

cthoyt daima2017 oceantangwei

kiez's Issues

Implement GPU-based ANN libraries

Candidates are:

Simplify deletion of ANN indices

Annoy and NGT create permanent indices in the file system. At the moment it is the user's responsibility to remove them. It would be nice to provide a function that users can call to remove these files easily once they are no longer needed.
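One possible shape for such a helper is a context manager that hands out a fresh directory for index files and deletes it afterwards. This is a minimal sketch that assumes the caller can choose where Annoy or NGT write their index files. The name `temporary_index_dir` is hypothetical and is not part of kiez.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_index_dir(prefix="kiez_index_"):
    """Hypothetical helper: yield a fresh directory for ANN index files."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Remove the directory and every index file written into it,
        # even if an exception was raised inside the with-block
        shutil.rmtree(path, ignore_errors=True)


# Usage: build the index inside the block; the files are gone afterwards
with temporary_index_dir() as index_dir:
    # Placeholder for an index file an ANN library would write here
    (index_dir / "example.idx").write_bytes(b"")
```

With a context manager, cleanup is tied to scope, so users cannot forget it. A plain `remove_index(path)` function would also work but relies on the caller to call it.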

Installation problems with Python 3.11

NGT is not available for Python 3.11, and the Faiss installation may need to be adapted.

Avoid hardcoded version string

Use a single source for the version setting in kiez/__init__.py.

Use case: assessing goodness of fit between two PyKEEN models

If I have two different embedding spaces describing the same entities, for example after training two models on the same dataset in PyKEEN, how can I use Kiez to assess how well they correspond? Or is there a notion of how "good" the Kiez fit is?

A naive idea: I could iterate through each entity, calculate the overlap coefficient of its nearest neighbors in both embedding spaces, and then report the average overlap coefficient. I'm sure I could come up with a few things like this, but I bet you know better! Any ideas appreciated.

I would start with code like this:

from kiez import Kiez
from pykeen.datasets import Nations
from pykeen.pipeline import pipeline

dataset = Nations()

# Train the same dataset with two different models
r1 = pipeline(
    model='TransE',
    dataset=dataset,
    epochs=1,  # change this to ~25 for real usage on Nations
)

r2 = pipeline(
    model='PairRE',
    dataset=dataset,
    epochs=1,  # change this to ~25 for real usage on Nations
)

# Calling the representation without indices returns all entity embeddings;
# .cpu() keeps the NumPy conversion working if training ran on a GPU
emb1 = r1.model.entity_representations[0]().detach().cpu().numpy()
emb2 = r2.model.entity_representations[0]().detach().cpu().numpy()

k_inst = Kiez()
k_inst.fit(emb1, emb2)

# How do I assess how well these spaces correspond? Is there a metric for how "good" the fit is?
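The naive idea from this issue can be sketched directly in NumPy without kiez. Compute each entity's k nearest neighbors in both spaces, take the overlap coefficient |A ∩ B| / min(|A|, |B|) of the two neighbor sets, and average it over all entities. The sketch assumes that row i of both matrices is the same entity, which should hold when both models share the dataset's entity-to-id mapping. The function names are hypothetical, and the search is brute-force Euclidean.

```python
import numpy as np


def knn_indices(emb, k):
    """Hypothetical helper: brute-force k nearest neighbors, excluding self."""
    emb = np.asarray(emb, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n)
    sq = (emb ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * emb @ emb.T
    np.fill_diagonal(dist, np.inf)  # an entity is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def mean_neighbor_overlap(emb_a, emb_b, k=10):
    """Hypothetical helper: average overlap coefficient of k-NN sets in two spaces."""
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    overlaps = []
    for row_a, row_b in zip(nn_a, nn_b):
        a, b = set(row_a.tolist()), set(row_b.tolist())
        # Overlap coefficient; both sets have size k here
        overlaps.append(len(a & b) / min(len(a), len(b)))
    return float(np.mean(overlaps))
```

With the embeddings above, `mean_neighbor_overlap(emb1, emb2, k=5)` gives a score between 0 (no shared neighbors) and 1 (identical neighborhoods). Nations has few entities, so k must stay below the entity count.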

Provide a simple way to load pre-calculated benchmark knowledge graph embeddings

Update docs

Explain the new possibilities enabled by class-resolver
Incorporate an architecture picture with an explanation
Link to Read the Docs in the README
Give new examples (including for single-source)

Use objects as arguments for algorithms and hubness

Accepting only predefined strings limits extensibility. Allowing objects as arguments would let users plug in their preferred (A)NN libraries and their own hubness reduction methods.
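The idea can be illustrated with a small resolver that accepts either a registered name or a ready-made object implementing a common interface. In this minimal sketch, `NNAlgorithm`, `BruteForce`, `REGISTRY`, and `resolve_algorithm` are all hypothetical and are not kiez's API. The sketch uses a plain dict as the registry. The class-resolver library mentioned under "Update docs" is built for this kind of name-or-object lookup.

```python
from typing import Protocol, Union

import numpy as np


class NNAlgorithm(Protocol):
    """Hypothetical interface any (A)NN backend would implement."""

    def fit(self, target: np.ndarray) -> None: ...

    def query(self, source: np.ndarray, k: int) -> np.ndarray: ...


class BruteForce:
    """Exact nearest neighbors by Euclidean distance."""

    def fit(self, target):
        self._target = np.asarray(target, dtype=float)

    def query(self, source, k):
        source = np.asarray(source, dtype=float)
        dist = np.linalg.norm(source[:, None, :] - self._target[None, :, :], axis=-1)
        return np.argsort(dist, axis=1)[:, :k]


# Built-in backends stay reachable by (case-insensitive) name
REGISTRY = {"bruteforce": BruteForce}


def resolve_algorithm(algorithm: Union[str, NNAlgorithm]) -> NNAlgorithm:
    """Return a backend instance from a registered name, or pass an object through."""
    if isinstance(algorithm, str):
        return REGISTRY[algorithm.lower()]()
    # Any user-supplied object with fit/query is used as-is
    return algorithm
```

With this approach, strings remain a convenient shorthand. Any library wrapped in a class with `fit` and `query` methods works without changes to the core code.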

Improve single source handling

Working with a single source is awkward because Kiez.fit() expects both a source and a target. See #7 for an example use case.
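One way to make single sources convenient is to make the target optional. When the target is omitted, the source is searched against itself, and each point is kept from matching itself. This is a minimal brute-force sketch. The `kneighbors` function here is hypothetical and does not reflect kiez's actual signature.

```python
from typing import Optional

import numpy as np


def kneighbors(source: np.ndarray, target: Optional[np.ndarray] = None, k: int = 5):
    """Hypothetical k-NN query; with no target, search the source against itself."""
    single_source = target is None
    if single_source:
        target = source
    # Pairwise Euclidean distances, shape (n_source, n_target)
    dist = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    if single_source:
        # Otherwise every point would be its own nearest neighbor
        np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx
```

Excluding the diagonal is the key design choice. Without it, the first neighbor in single-source mode would always be the query point itself at distance 0.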
