If I have two different embedding spaces describing the same entities, e.g. if I train two models on the same dataset in PyKEEN, how can I use Kiez to assess how well they correspond? Or maybe there's a notion of how "good" the Kiez fit is?
A naive idea is that I could iterate through each entity, calculate the overlap coefficient of its nearest neighbors in both embedding spaces, and then report the average overlap coefficient. I'm sure I could come up with a few things like this, but I bet you know better! Any ideas appreciated.
I would start with code like this:
from pykeen.pipeline import pipeline
from pykeen.datasets import Nations
dataset = Nations()
# Train the same dataset with two different models
r1 = pipeline(
    model='TransE',
    dataset=dataset,
    epochs=1,  # change this to ~25 for real usage on Nations
)
r2 = pipeline(
    model='PairRE',
    dataset=dataset,
    epochs=1,  # change this to ~25 for real usage on Nations
)
from kiez import Kiez
k_inst = Kiez()
k_inst.fit(
    # .cpu() so the conversion to NumPy also works if the model was trained on a GPU
    r1.model.entity_representations[0]().detach().cpu().numpy(),
    r2.model.entity_representations[0]().detach().cpu().numpy(),
)
# How do I assess how well these spaces correspond? Is there a metric for how "good" the fit is?
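To make the naive idea concrete, here is a rough sketch of the average overlap coefficient using plain NumPy rather than Kiez. It assumes brute-force cosine-similarity neighbors within each space, that row i in both matrices is the same entity, and that no embedding is all zeros. The names `knn_indices`, `mean_neighbor_overlap`, and `k` are made up for illustration and are not part of Kiez or PyKEEN.

```python
import numpy as np


def knn_indices(emb, k):
    """Indices of the k nearest neighbors (by cosine similarity) of each row, excluding the row itself."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an entity is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def mean_neighbor_overlap(emb_a, emb_b, k=5):
    """Average overlap coefficient of the k-NN sets of each entity in the two spaces."""
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    # Both neighbor sets have size k, so the overlap coefficient
    # |A & B| / min(|A|, |B|) reduces to |A & B| / k.
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))


# Toy stand-ins for the two entity embedding matrices (14 entities, as in Nations)
rng = np.random.default_rng(0)
x = rng.normal(size=(14, 8))
y = rng.normal(size=(14, 16))  # the two spaces may have different dimensions

print(mean_neighbor_overlap(x, x, k=3))  # identical spaces give 1.0
print(mean_neighbor_overlap(x, y, k=3))  # unrelated spaces give something lower
```

With the real models, the two arrays passed to `k_inst.fit(...)` above would take the place of `x` and `y`. Because this only compares neighborhoods within each space, it does not need the two models to share an embedding dimension.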