nju-websoft / openea Goto Github PK

A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs, VLDB 2020

License: GNU General Public License v3.0

openea's Introduction

Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. This study surveys 23 recent embedding-based entity alignment approaches and categorizes them based on their techniques and characteristics. We further observe that current approaches use different datasets in evaluation, and the degree distributions of entities in these datasets are inconsistent with real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with various heterogeneity and distributions for a realistic evaluation. This study also produces an open-source library, which includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets, to understand their strengths and limitations. Additionally, for several directions that have not been explored in current approaches, we perform exploratory experiments and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained.

Key contributors ✨

Zequn Sun (NJU)

Wei Hu (NJU)

Muhao Chen (UC Davis)

Haofen Wang (TONGJI)

*** UPDATE ***

  • Aug. 1, 2021: We release the source code for entity alignment with dangling cases.

  • June 29, 2021: We release the DBP2.0 dataset for entity alignment with dangling cases.

  • Jan. 8, 2021: The results of AliNet on OpenEA datasets are available at Google docs.

  • Nov. 30, 2020: We release a new version (v2.0) of the OpenEA dataset, where the URIs of DBpedia and YAGO entities are encoded to resolve the name bias issue. It is strongly recommended to use the v2.0 dataset for evaluating attribute-based entity alignment methods, such that the results can better reflect the robustness of these methods in real-world situations.

  • Sep. 24, 2020: We add AliNet.

Table of contents

  1. Library for Embedding-based Entity Alignment
    1. Overview
    2. Getting Started
      1. Code Package Description
      2. Dependencies
      3. Installation
      4. Usage
  2. KG Sampling Method and Datasets
    1. Iterative Degree-based Sampling
    2. Dataset Overview
    3. Dataset Description
  3. Experiment and Results
    1. Experiment Settings
    2. Detailed Results
  4. License
  5. Citation

Library for Embedding-based Entity Alignment

Overview

We use Python and Tensorflow to develop an open-source library, namely OpenEA, for embedding-based entity alignment. The software architecture is illustrated in the following Figure.

The design goals and features of OpenEA include three aspects, i.e., loose coupling, functionality and extensibility, and off-the-shelf solutions.

Getting Started

These instructions cover how to get a copy of the library and how to install and run it on your local machine for development and testing purposes. They also provide an overview of the package structure of the source code.

Package Description

src/
├── openea/
│   ├── approaches/: package of the implementations for existing embedding-based entity alignment approaches
│   ├── models/: package of the implementations for unexplored relationship embedding models
│   ├── modules/: package of the implementations for the framework of embedding module, alignment module, and their interaction
│   ├── expriment/: package of the implementations for evaluation methods

Dependencies

  • Python 3.x (tested on Python 3.6)
  • Tensorflow 1.x (tested on Tensorflow 1.8 and 1.12)
  • Scipy
  • Numpy
  • Graph-tool or igraph or NetworkX
  • Pandas
  • Scikit-learn
  • Matching==0.1.1
  • Gensim

Installation

We recommend creating a new conda environment to install and run OpenEA. You should first install tensorflow-gpu (tested on 1.8 and 1.12), graph-tool (tested on 2.27 and 2.29, the latest version would cause a bug), and python-igraph using conda:

conda create --name openea python=3.6 graph-tool==2.40 -c conda-forge
conda activate openea
conda install tensorflow-gpu==1.12
conda install -c conda-forge python-igraph

Then, OpenEA can be installed using pip with the following steps:

git clone https://github.com/nju-websoft/OpenEA.git OpenEA
cd OpenEA
pip install -e .

Usage

The following is an example of how to use OpenEA in Python. (We assume that you have already downloaded our datasets and configured the hyperparameters as in the examples.)

import openea as oa

model = oa.kge_model.TransE
args = load_args("hyperparameter file folder")
kgs = read_kgs_from_folder("data folder")
model.set_args(args)
model.set_kgs(kgs)
model.init()
model.run()
model.test()
model.save()

More examples are available here

To run the off-the-shelf approaches on our datasets and reproduce our experiments, change into the ./run/ directory and use the following script:

python main_from_args.py "predefined_arguments" "dataset_name" "split"

For example, if you want to run BootEA on D-W-15K (V1) using the first split, please execute the following script:

python main_from_args.py ./args/bootea_args_15K.json D_W_15K_V1 721_5fold/1/
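To repeat the same run over all five folds, the command above can be generated in a loop. The following is a minimal sketch, assuming that the split folders are named 721_5fold/1/ through 721_5fold/5/ and that the commands are launched from the ./run/ directory. The helper name build_fold_commands is hypothetical and not part of OpenEA.

```python
def build_fold_commands(args_file, dataset, folds=range(1, 6)):
    """Build one main_from_args.py command line per fold (hypothetical helper)."""
    return [
        ["python", "main_from_args.py", args_file, dataset, f"721_5fold/{fold}/"]
        for fold in folds
    ]


if __name__ == "__main__":
    for cmd in build_fold_commands("./args/bootea_args_15K.json", "D_W_15K_V1"):
        # Only print the commands here; each list can be passed to
        # subprocess.run(cmd, check=True) to actually start a run.
        print(" ".join(cmd))
```

Building the argument lists first keeps the loop easy to inspect before any long-running training job is started.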

KG Sampling Method and Datasets

As the current widely-used datasets are quite different from real-world KGs, we present a new dataset sampling algorithm to generate a benchmark dataset for embedding-based entity alignment.

Iterative Degree-based Sampling

The proposed iterative degree-based sampling (IDS) algorithm simultaneously deletes entities in two source KGs with reference alignment until achieving the desired size, meanwhile retaining a similar degree distribution of the sampled dataset as the source KG. The following figure describes the sampling procedure.
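One way to check whether a sample keeps a degree distribution similar to its source KG is to compare the normalized degree histograms of the two graphs. The sketch below is an illustration, not the IDS implementation: it assumes relation triples are (head, relation, tail) tuples, and the choice of the Jensen-Shannon divergence as the comparison measure is an assumption made here. The function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(triples):
    """Map each degree value to the fraction of entities having that degree."""
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    histogram = Counter(degree.values())
    total = sum(histogram.values())
    return {d: count / total for d, count in histogram.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, m):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)
```

With base-2 logarithms the divergence is 0 for identical distributions and 1 for distributions with no degree value in common, so smaller values indicate a sample that is closer to the source KG.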

Dataset Overview

We choose three well-known KGs as our sources: DBpedia (2016-10), Wikidata (20160801) and YAGO3. Also, we consider two cross-lingual versions of DBpedia: English--French and English--German. We follow the conventions in JAPE and BootEA to generate datasets of two sizes with 15K and 100K entities, using the IDS algorithm:

| # Entities | Languages     | Dataset names          |
|------------|---------------|------------------------|
| 15K        | Cross-lingual | EN-FR-15K, EN-DE-15K   |
| 15K        | English       | D-W-15K, D-Y-15K       |
| 100K       | Cross-lingual | EN-FR-100K, EN-DE-100K |
| 100K       | English       | D-W-100K, D-Y-100K     |

The v1.1 datasets used in this paper can be downloaded from figshare, Dropbox or Baidu Wangpan (password: 9feb). (Note that we have fixed a minor format issue in YAGO of our v1.0 datasets. Please download our v1.1 datasets from the above links and use this version for evaluation.)

(Recommended) The v2.0 datasets can be downloaded from figshare, Dropbox or Baidu Wangpan (password: nub1).

Dataset Statistics

We generate two versions of datasets for each pair of KGs to be aligned. V1 is generated by directly using the IDS algorithm. For V2, we first randomly delete entities with low degrees (d <= 5) in the source KG to make the average degree doubled, and then execute IDS to fit the new KG. The statistics of the datasets are shown below.
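The V2 preprocessing step can be sketched as follows. This is a simplified illustration and not the code used to build the datasets: it assumes triples are (head, relation, tail) tuples, picks entities uniformly at random, and recomputes degrees after every deletion. The function names and the stopping rule when no low-degree entity remains are assumptions.

```python
import random
from collections import defaultdict


def average_degree(triples):
    """Average entity degree of a list of (head, relation, tail) triples."""
    entities = {e for h, _, t in triples for e in (h, t)}
    return 2 * len(triples) / len(entities) if entities else 0.0


def prune_low_degree(triples, max_degree=5, factor=2.0, seed=0):
    """Randomly drop entities with degree <= max_degree until the average
    degree reaches factor times the original, or no candidate is left."""
    rng = random.Random(seed)
    triples = list(triples)
    target = factor * average_degree(triples)
    while triples and average_degree(triples) < target:
        degree = defaultdict(int)
        for h, _, t in triples:
            degree[h] += 1
            degree[t] += 1
        candidates = sorted(e for e, d in degree.items() if d <= max_degree)
        if not candidates:
            break  # the target cannot be reached by deleting low-degree entities
        victim = rng.choice(candidates)
        # Deleting an entity removes every triple it takes part in.
        triples = [(h, r, t) for h, r, t in triples if victim not in (h, t)]
    return triples
```

The densified graph returned by prune_low_degree would then be the input to IDS, as described above.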

Dataset Description

We hereby take the EN_FR_15K_V1 dataset as an example to introduce the files in each dataset. In the 721_5fold folder, we divide the reference entity alignment into five disjoint folds, each of which accounts for 20% of the total alignment. For each fold, we pick this fold (20%) as training data and leave the remaining (80%) for validation (10%) and testing (70%). The directory structure of each dataset is listed as follows:

EN_FR_15K_V1/
├── attr_triples_1: attribute triples in KG1
├── attr_triples_2: attribute triples in KG2
├── rel_triples_1: relation triples in KG1
├── rel_triples_2: relation triples in KG2
├── ent_links: entity alignment between KG1 and KG2
├── 721_5fold/: entity alignment with test/train/valid (7:2:1) splits
│   ├── 1/: the first fold
│   │   ├── test_links
│   │   ├── train_links
│   │   └── valid_links
│   ├── 2/
│   ├── 3/
│   ├── 4/
│   ├── 5/

Experiment and Results

Experiment Settings

The common hyper-parameters used for OpenEA are shown below.

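|                             | 15K   | 100K        |
|-----------------------------|-------|-------------|
| Batch size for rel. triples | 5,000 | 20,000      |
| Termination condition       | Early stop when the Hits@1 score begins to drop on the validation sets, checked every 10 epochs. | Same as 15K |
| Max. epochs                 | 2,000 | 2,000       |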
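The termination condition can be read as the following loop. This is a minimal sketch of the rule as stated in the table, assuming that training stops at the first check where validation Hits@1 is lower than at the previous check. The two callables are hypothetical placeholders supplied by the caller, and the stopping rule implemented in the library may be more tolerant.

```python
def train_with_early_stop(train_one_epoch, validate_hits1,
                          max_epochs=2000, check_every=10):
    """Train until validation Hits@1 drops between two consecutive checks.

    train_one_epoch(epoch) and validate_hits1() are caller-supplied
    callables (hypothetical placeholders, not OpenEA functions).
    Returns the number of epochs actually run.
    """
    previous = None
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(epoch)
        if epoch % check_every == 0:
            current = validate_hits1()
            if previous is not None and current < previous:
                return epoch  # Hits@1 began to drop: stop early
            previous = current
    return max_epochs
```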

Besides, it is well-recognized to split a dataset into training, validation and test sets. The details are shown below.

| # Ref. alignment | # Training | # Validation | # Test |
|------------------|------------|--------------|--------|
| 15K              | 3,000      | 1,500        | 10,500 |
| 100K             | 20,000     | 10,000       | 70,000 |
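The sizes in the table follow from the 20% / 10% / 70% split of the reference alignment. The sketch below reproduces the arithmetic for five disjoint training folds. It is an illustration under the assumption of a simple shuffled split, not the script used to create the released folds, and the function name is hypothetical.

```python
import random


def five_fold_splits(links, seed=0):
    """Yield (train, valid, test) for five disjoint folds of the alignment.

    Each fold (20%) is used for training; the remaining 80% is divided
    into validation (10% of all links) and test (70% of all links).
    """
    links = list(links)
    random.Random(seed).shuffle(links)
    n = len(links)
    fold_size = n // 5
    valid_size = n // 10
    for i in range(5):
        train = links[i * fold_size:(i + 1) * fold_size]
        rest = links[:i * fold_size] + links[(i + 1) * fold_size:]
        yield train, rest[:valid_size], rest[valid_size:]
```

For 15,000 reference links each fold yields 3,000 training, 1,500 validation and 10,500 test pairs, matching the first row of the table.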

We use Hits@m (m = 1, 5, 10, 50), mean rank (MR) and mean reciprocal rank (MRR) as the evaluation metrics. Higher Hits@m and MRR scores as well as lower MR scores indicate better performance.
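Given the rank of the correct counterpart for every test entity, the three metrics reduce to a few lines. This is a minimal sketch, assuming ranks are 1-based integers. The function name ranking_metrics is hypothetical and not taken from the OpenEA code.

```python
def ranking_metrics(ranks, hits_at=(1, 5, 10, 50)):
    """Compute Hits@m, MR and MRR from 1-based ranks of the true counterparts."""
    n = len(ranks)
    # Hits@m: fraction of test entities whose counterpart is ranked within the top m.
    hits = {m: sum(r <= m for r in ranks) / n for m in hits_at}
    # MR: mean of the ranks (lower is better).
    mr = sum(ranks) / n
    # MRR: mean of the reciprocal ranks (higher is better).
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mr, mrr
```

For the ranks [1, 2, 10, 100], Hits@1 is 0.25, Hits@10 is 0.75, MR is 28.25 and MRR is 0.4025. The single badly ranked entity dominates MR but barely affects MRR.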

Detailed Results

The detailed and supplementary experimental results are listed as follows:

Detailed results of current approaches on the 15K datasets

detailed_results_current_approaches_15K.csv

Detailed results of current approaches on the 100K datasets

detailed_results_current_approaches_100K.csv

Running time (sec.) of current approaches

running_time.csv

Unexplored KG Embedding Models

Detailed results of unexplored KG embedding models on the 15K datasets

detailed_results_unexplored_models_15K.csv

Detailed results of unexplored KG embedding models on the 100K datasets

detailed_results_unexplored_models_100K.csv

License

This project is licensed under the GPL License - see the LICENSE file for details

Citation

If you find the benchmark datasets, the OpenEA library or the experimental results useful, please kindly cite the following paper:

@article{OpenEA,
  author    = {Zequn Sun and
               Qingheng Zhang and
               Wei Hu and
               Chengming Wang and
               Muhao Chen and
               Farahnaz Akrami and
               Chengkai Li},
  title     = {A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs},
  journal   = {Proceedings of the VLDB Endowment},
  volume    = {13},
  number    = {11},
  pages     = {2326--2340},
  year      = {2020},
  url       = {http://www.vldb.org/pvldb/vol13/p2326-sun.pdf}
}

If you use the DBP2.0 dataset, please kindly cite the following paper:

@inproceedings{DBP2,
  author    = {Zequn Sun and
               Muhao Chen and
               Wei Hu},
  title     = {Knowing the No-match: Entity Alignment with Dangling Cases},
  booktitle = {ACL},
  year      = {2021}
}

openea's Issues

Fatal Python error: Segmentation fault

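Hello authors. On Linux, using Anaconda, I set up the environment by following the requirements and steps in the Dependencies and Installation sections of README.md. When I run the example in the tutorial, I get "Fatal Python error: Segmentation fault". What could cause this error, and how can it be resolved?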

List index out of range exception in mwgm_graph_tool - BootEA

Hello
First, congratulations for the great paper and repository!
I successfully ran RDGCN matching the numbers of the paper, but I am running into troubles with BootEA. I followed the README to run BootEA on D-W V2 (15k) with python main_from_args.py ./args/bootea_args_15k.json D_W_15k_V1 721_5fold/fold_number. However, I am getting an index out of range exception at the 3rd fold inside the mwgm_graph_tool function. Check the error below:

File ..., line 112, in mwgm_graph_tool
    matched_pairs.add(pairs[index])
IndexError: list index out of range

I checked that when it throws the exception index=10, while the length of pairs is 6.
I also ran BootEA on D-W V1 (15k), and got the same error on the 2nd fold. Interestingly, I re-ran the code on this fold, and at the third attempt it ran smoothly. However, the average performance is 10 points below what is reported in your recent VLDB paper (MRR around 0.52).
Any idea about what might be happening?
Thanks!

about the implementation of RDGCN

Hi,

I am not very familiar with TensorFlow, but I think there may be an error in the implementation of RDGCN. In the training method in src\openea\approaches\rdgcn.py, negative sampling is executed outside the epoch loop, so the negative samples, and with them the feed dict, never change. The original RDGCN code, however, performs negative sampling every 10 epochs. Are the two equivalent, or is this a mistake in your code? I am not sure about this, so I hope you can clarify.

Many thanks

something about the output

I am a beginner in this research direction, thank you for your patience to answer my doubts.

I am confused about the alignment_results_12 in output/results. Is it the paired entity id after KG1 and KG2 are aligned?
I checked some of the ids in alignment_results_12 and found that there are differences between the paired entities

Dataset with the entity label

Hello author, in the paper, the following part is mentioned.

"Considering that DBpedia,
Wikidata and YAGO collect data from very similar sources
(mainly, Wikipedia), the aligned entities usually have identical labels. They would become “tricky” features for entity
alignment and influence the evaluation of real performance.
According to the suggestion in [95], we delete entity label"

  1. May I know if this label refers to the type of that entity?( for example, the type of Michael_Jordan is Person)

  2. Do you still have the dataset with all the labels? I would like to see whether this label could help to embed in some interesting way. If not, I might have to do some crawling to DBpedia and wikidata.

Thanks!

Marginal Ranking and Background Ranking

Dear authors,

I am very curious about the training time of the three dangling detection modules.
Could you release the original code of all these modules? I would like to try them on my machine.
Thanks in advance, and I appreciate your work.

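Steps after entity alignment

Hello Professor,
After the aligned entities are obtained, how can the sub-knowledge-graphs of the two entities be merged? How should identical attributes and relations be combined to form a new, fused knowledge graph?

Some questions about the output file alignment_results_12

Hello, I noticed that the IDs of the entity pairs in the file alignment_results_12 are consistent with the entity IDs in kg1_ent_ids and kg2_ent_ids, but it seems that not all entities are aligned. In the issue "something about the output" I saw that you have updated the code. Which file do the IDs of the entity pairs in alignment_results_12 correspond to now? Thank you for your answer.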

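Unable to reproduce the results of MultiKE on the individual datasets

Hello, I tried to reproduce the results of several models on the datasets. So far, the Hits@1 of MultiKE on every dataset is only slightly above 20. I tried changing the TensorFlow version (1.12, 1.15), with no effect.
Am I missing some detail?

BTW: MultiKE uses word2vec for the name embeddings. Is that reasonable for French and German?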

Predict methods for models

Hi,

thank you for your great library.
I would like to use it for KG matching.
As far as I can see, you evaluated the models on a test set
but do not provide the full extracted alignment (or am I wrong?).
I think it would be a good idea to have two kinds of predict methods:

  1. it gets a correspondence between e1 and e2 and returns the distance/confidence between them
  2. given a k value (for top-k retrieval), extract all correspondences (for example via greedy search)
    and write them to a file where each line contains e1, e2, and distance/confidence.
    Then one can use and further experiment with them as well as evaluating these correspondences 'offline'.
    What do you think?

I have two more questions:

  1. Regarding the greedy search: If the number of entities in the two graphs are not equal (contrary to your datasets), then always picking the first KG to search for correspondences can lead to many/few correspondences.
    The question is whether to choose the KG with fewer or more entities.
  2. During test, you look up the embeddings of the entities appearing only in the test set
    (see line 117 in the basic model as well as in the other models ).
    And based on these embeddings you calculate the greedy alignment and the evaluation measures (e.g. hits@k - see line 131).
    Is this correct or do I oversee something?
    Because if this is the case you would only rank elements which appear in the test set.
    And this would be some kind of leakage because in a prediction step you would rank all elements and choose the best k.
    In case the model rank elements not in the test set very high, then these would not appear in the evaluation.
    If the computation of the nearest neighbours is very costly, maybe the libraries for nearest neighbor search help
    (faiss, annoy, and a benchmark )?
    Maybe you can elaborate a bit more on this?

Stay healthy
Best regards
Sven

Accessing aligned pairs of trained model

I am following the entity alignment tutorial tutorial/entity_alignment/main.py and would like to store all closest alignments for each entity in model.kgs.kg1.entities_set.

Question: How can I access them?

Currently, I am using the following repurposed code (function added to tutorial/entity_alignment/mtranse.py):

def get_matches(self, remove_dangling=False):

     candidate_list_kg2 = self.kgs.kg2.entities_set
     if remove_dangling:
         candidate_list_kg2 = (
             candidate_list_kg2
             - set([x for x, _ in self.kgs.train_unlinked_entities2])
             - set([x for x, _ in self.kgs.valid_unlinked_entities2])
         )

     candidate_list_kg2 = list(candidate_list_kg2)

     embeds1 = tf.nn.embedding_lookup(self.ent_embeds, list(self.kgs.kg1.entities_set)).eval(session=self.session)
     embeds2 = tf.nn.embedding_lookup(self.ent_embeds, candidate_list_kg2).eval(session=self.session)
     mapping = self.mapping_mat.eval(session=self.session) if self.mapping_mat is not None else None

     alignment_rest_12, _, _, sim_list = test(
         embeds1, embeds2, mapping, self.args.top_k, self.args.test_threads_num,
         metric=self.args.eval_metric, normalize=self.args.eval_norm, csls_k=0, accurate=True
     )
     
     return alignment_rest_12

But the problem is that the tuple in alignment_rest_12 cannot be mapped to the entity URIs. The integer values seem not to be referring to entity IDs since

assert all([i[0] in model.kgs.kg1.entities_id_dict.values() for i in alignment_rest_12])

fails.

Problem with setting up the environment

(openea) root@container-b8cd11b052-ce6b11ab:~# conda install tensorflow-gpu==1.12
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
Why does a version conflict occur when I install tensorflow-gpu 1.12? I used Python 3.6 when I created the virtual environment.

About IMUSE

Hello, I am looking to get the value of "λ" (trained by the bivariate regression model, as mentioned in original paper of IMUSE). Could you please confirm that this happens and if yes, where ?

Thank you in advance

AttributeError: 'Graph' object has no attribute 'label'

I ran the main.py demo, but I got the following error:

Traceback (most recent call last):
  File "/Users/youbo/PycharmProjects/demo/others/OpenEA-tutorial/ontology_matching/src/main.py", line 33, in <module>
    main()
  File "/Users/youbo/PycharmProjects/demo/others/OpenEA-tutorial/ontology_matching/src/main.py", line 17, in main
    src_rdf = RdfParser(src_file, 'http://oaei.ontologymatching.org/2007/benchmarks/101/onto.rdf#')
  File "/Users/youbo/PycharmProjects/demo/others/OpenEA-tutorial/ontology_matching/src/data_input.py", line 26, in __init__
    self.class_labels = [self._graph.label(uri) for uri in self.class_uris]
  File "/Users/youbo/PycharmProjects/demo/others/OpenEA-tutorial/ontology_matching/src/data_input.py", line 26, in <listcomp>
    self.class_labels = [self._graph.label(uri) for uri in self.class_uris]
AttributeError: 'Graph' object has no attribute 'label'

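Some models reach 0.99 accuracy when reproduced on the D_Y datasets

Some models, such as MultiKE, reach a performance of 0.99 on D_Y_15K_V1. The configuration file was not modified, and the command was:

python main_from_args.py ./args/multike_args_15K.json D_Y_15K_V1 721_5fold/1/

What could be going wrong here?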

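A few questions about the input and output

Hello authors. I have recently started working on entity alignment. Is the OpenEA framework intended for aligning entities between two different knowledge graphs? Is that equivalent to entity linking? I ask because the input data contains the relation triples and attribute triples of two knowledge graphs, KG1 and KG2. Also, are the training data such as train_links the pairs of entities in the two knowledge graphs that refer to the same object? The output contains a file alignment_results_12. Does it hold the IDs of the predicted matching entity pairs from the two knowledge graphs? And are kg1_ent_embeds_txt and kg2_ent_embeds_txt the embeddings of each entity in the two knowledge graphs?
One last question: in our application we want to align entities obtained from entity extraction. That is, there is no knowledge graph, only the extracted entities to be aligned. Can OpenEA be used in this scenario? If so, how should the input be set up? Thank you for your answers.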

Error in bootEA (maybe because of different number of entities?)

Hi,

I tested bootEA with my dataset.
Unfortunately after some epochs it returned with the following error:

File "main.py", line 26, in
model.run()
File "/src/openea/approaches/bootea.py", line 311, in run
neighbors_num1, self.args.batch_threads_num)
File "/src/openea/modules/train/batch.py", line 152, in generate_neighbours_single_thread
entity_embeds, neighbors_num)
File "/src/openea/modules/train/batch.py", line 161, in find_neighbours
sort_index = np.argpartition(-sim_mat[i, :], k)
File "<array_function internals>", line 6, in argpartition
File "/envs/openea/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 832, in argpartition
return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
File "/envs/openea/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)
ValueError: kth(=25453) out of bounds (890)

maybe you have an idea what is going on there.
I can also provide you with the dataset I used (to reproduced the error).
However, I haven't tried the other approaches yet.

Best regards
Sven
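The ValueError in the traceback is raised by np.argpartition whenever kth is not smaller than the length of the partitioned axis (here 25453 versus 890 candidates). A minimal sketch of a guard follows, assuming that clamping the number of neighbours to the number of available candidates is acceptable for the use case. The function name is hypothetical and this is not the BootEA code.

```python
import numpy as np


def top_k_neighbours(sim_row, k):
    """Return indices of the k most similar candidates, clamping k to the row length."""
    n = sim_row.shape[0]
    k = min(k, n)
    if k == n:
        # All candidates are requested, so a full sort is sufficient.
        return np.argsort(-sim_row)
    # argpartition requires kth < n; the first k entries are the k largest
    # similarities, in no particular order.
    return np.argpartition(-sim_row, k)[:k]
```

Whether the requested neighbour count should instead be derived from the smaller KG is a separate design question that the guard does not answer.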

Hosting datasets at zenodo or figshare

Thank you for this nice library. I was wondering if it would be possible to put the datasets in a more stable location e.g. Zenodo or FigShare. Zenodo for example enables versioning, which as seen in the original benchmark datasets is a common occurrence.

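About fusing RDF graph data

Hello, I have only recently started working with knowledge graph construction. I have a project that requires knowledge graph fusion to merge two Chinese domain knowledge graphs into one. Can your tool be used for this? If so, how should I proceed? Any guidance would be appreciated.

Where is the dataset folder?

python main_from_args.py ./args/bootea_args_15K.json D_W_15K_V1 721_5fold/1/
The example uses the D_W_15K_V1 dataset under the dataset folder given by "training_data": "../../datasets/", but I cannot find it.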

Pip Package

Hi,

would it be possible to release the library on conda-forge or as a pip package?
I think the pip package should be fairly easy to upload with your given structure.
Let me know what you think about it.

Furthermore I get the following warning during execution:

/anaconda3/envs/openea/lib/python3.6/site-packages/graph_tool/draw/cairo_draw.py:67: RuntimeWarning: Error importing matplotlib module. Graph drawing will not work.
  warnings.warn(msg, RuntimeWarning)
/anaconda3/envs/openea/lib/python3.6/site-packages/graph_tool/draw/cairo_draw.py:67: RuntimeWarning: Error importing matplotlib module. Graph drawing will not work.
  warnings.warn(msg, RuntimeWarning)
/anaconda3/envs/openea/lib/python3.6/site-packages/graph_tool/all.py:40: RuntimeWarning: Error importing draw module, proceeding nevertheless: No module named 'matplotlib'
  warnings.warn(msg, RuntimeWarning)
graph via graph_tool <Graph object, directed, with 148 vertices and 105 edges at 0x7f82d88afdd8>

It just misses the dependency to matplotlib.
I think it is a good idea to add it to the requirements.txt as well as to the readme of this repository which explains the conda environment creation.

Best regards
Sven

Is it possible to have a PyTorch version of OpenEA?

Hi,

Thanks for the great work on entity alignment. I am just wondering whether there is any possibility of publishing a PyTorch version, as the current TensorFlow version is hard to debug while the session is running.

Cheers,
Yu

about get_pretrained_input in RDGCN

Hi,

I found that in the method get_pretrained_input(), the initial embedding of each entity is not normalized; the l2_normalize call in the original RDGCN code is commented out. Could you explain why you removed l2_normalize here? I tried adding l2_normalize for the EN_DE dataset and found that the performance decreases. But omitting l2_normalize also seems strange and easily causes inf values. Could you suggest a possible solution for this issue?

Best Regards

dataset not found

Hi,
When I try to run the RDGCN model, I cannot find the file 'wiki-news-300d-1M.vec'. Where did you upload it?

Many thanks

AttributeError: module 'openea' has no attribute 'kge_model'

Dear author,

There are some errors when I run this code:
import openea as oa

model = oa.kge_model.TransE
args = load_args("hyperparameter file folder")
kgs = read_kgs_from_folder("data folder")
model.set_args(args)
model.set_kgs(kgs)
model.init()
model.run()
model.test()
model.save()

AttributeError: module 'openea' has no attribute 'kge_model'

Looking for your reply! Thank you!

Best regards,
Xu.

about AttrE

Hello,

in the original paper of AttrE, it is mentioned that "The predicate alignment module merges two KGs by renaming the predicates of both KGs with a unified naming scheme ...". Did you adopt this technique in your implementation? If yes, could you please point me to the relevant part of your code?

Thank you

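Experimental results of AliNet

Hello Professor,
I ran AliNet on the D_Y_15K_V2 dataset. Why are the Hits scores essentially 0? The scores also do not improve over the iterations, which leads to early stopping.
My command is
python main_from_args.py ./args/alinet_args_15K.json D_Y_15K_V2 721_5fold/1/
The result is
accurate results: hits@[1,5,10,50]=[0.029 0.038 0.152 0.933]%, mr=4695.347, mrr=0.001430,time=4.345s

Question about the datasets

Hello authors. I am a newcomer to knowledge graphs, and to entity fusion in particular. We were fortunate to find your tool, but we are inexperienced and have a question about the datasets: does a dataset have to consist of individual tagged pages, such as HTML or XML? What we actually need is to group two similar entities under one ontology concept. For example, Messi and Cristiano Ronaldo both belong to the concept "football player". How should we prepare such a dataset? Do we need to generate individual web pages beforehand? Is the format user-defined? Or can we use a piece of plain text directly as the dataset? These are basic questions, so thank you for your patience.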

graph-tool version 2.29 not available through conda

Hello,
I have tried to create the conda environment following the instructions in the README file, but it is not possible to find the graph-tool package (neither version 2.27 nor version 2.29).
Could you please suggest me a way to solve this problem?
Thank you in advance.

Source code for generating subsampled datasets

Hello,

Thanks for compiling the embedding-based entity alignment methods, and for open-sourcing the GitHub repository.

I found the methods for generating subsampled version of various KG datasets to be very practical and useful. While you have publicly shared all the datasets (https://github.com/nju-websoft/OpenEA#dataset-overview), I couldn't find the source code for generating these datasets in the repo. Do you plan to share the implementation of IDS? If yes, is there an ETA?

Looking forward to your response.

Thanks,
Sainyam

tensorflow version issue

Has anyone tried this code in a newer environment, such as TF 1.15?
TF 1.12 needs Python 3.6, but the VS Code debugger no longer supports Python 3.6.
Also, the only TF 1.x version that can run on RTX 30 series GPUs is TF 1.15.
If someone could kindly provide a newer set of compatible requirement versions, I would be extremely grateful.

How to resolve DBPedia and Yago Resource IDs?

Hi,
thank you very much for your work and datasets! This saves me a lot of time and effort :)

To make full use of the data, I would need to resolve the DBPedia and YAGO Resource IDs you provide in the entity links files.

More specifically, the question is how to resolve the IDs below? How to get the "true / real" resources for, e.g., http://dbpedia.org/resource/E513085 or YAGO/E791619 ?

❯ head D_Y_100K_V2/ent_links
http://dbpedia.org/resource/E513085     YAGO/E791619
http://dbpedia.org/resource/E239159     YAGO/E095972
http://dbpedia.org/resource/E878692     YAGO/E595277
http://dbpedia.org/resource/E833901     YAGO/E766339
http://dbpedia.org/resource/E819974     YAGO/E726068
http://dbpedia.org/resource/E951846     YAGO/E483839
http://dbpedia.org/resource/E141239     YAGO/E534868
http://dbpedia.org/resource/E711446     YAGO/E523000
http://dbpedia.org/resource/E075843     YAGO/E283859
http://dbpedia.org/resource/E880611     YAGO/E193140

AttrE and IMUSE

Hello,

I am trying to reproduce the results for AttrE and IMUSE, but I can see that you used a small portion of seed alignment for training. When I used the seed alignment only for testing and validation (no training), the performance of the models was very low. Also, in original papers of AttrE and IMUSE they used the seed alignment only for testing.

Thank you

which version is your gensim?

I have an error:

Traceback (most recent call last):
  File "main_from_args.py", line 94, in <module>
    model.init()
  File "/data/home/wuyuming/wxl/OpenEA-master/src/openea/approaches/multi_ke.py", line 327, in init
    self._generate_literal_vectors()
  File "/data/home/wuyuming/wxl/OpenEA-master/src/openea/approaches/multi_ke.py", line 404, in _generate_literal_vectors
    literal_encoder = LiteralEncoder(self.literal_list, word2vec, self.args, 300)
  File "/data/home/wuyuming/wxl/OpenEA-master/src/openea/approaches/literal_encoder.py", line 198, in __init__
    self.word2vec = generate_unlisted_word2vec(word2vec, literal_list, word2vec_dimension)
  File "/data/home/wuyuming/wxl/OpenEA-master/src/openea/approaches/literal_encoder.py", line 175, in generate_unlisted_word2vec
    model = Word2Vec(char_sequences, size=vector_dimension, window=5, min_count=1)
TypeError: __init__() got an unexpected keyword argument 'size'
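This TypeError is what gensim 4.x raises for the old keyword, because gensim 4.0 renamed the Word2Vec parameter size to vector_size. Two options follow from that: install a gensim 3.x release, or pass the new keyword. The helper below is a hypothetical sketch that picks the keyword from the installed version string. It is not part of OpenEA, and the usage lines in the comments assume that gensim is installed.

```python
def word2vec_dimension_kwarg(gensim_version, dimension):
    """Return the keyword argument that sets the embedding size for Word2Vec.

    gensim 4.0 renamed `size` to `vector_size`; older releases accept `size`.
    """
    major = int(gensim_version.split(".")[0])
    name = "vector_size" if major >= 4 else "size"
    return {name: dimension}


# Usage (requires gensim to be installed):
#   import gensim
#   from gensim.models import Word2Vec
#   kwargs = word2vec_dimension_kwarg(gensim.__version__, 300)
#   model = Word2Vec(char_sequences, window=5, min_count=1, **kwargs)
```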

F1,precision and recall

In entity alignment, if greedy alignment is used, each entity must have exactly one answer. So len(prediction) = len(gold) => precision = recall?
If stable alignment is used, len(prediction) != len(gold) => precision != recall?
Is that true?
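The relationship can be checked directly from the definitions: precision divides the number of correct pairs by the number of predicted pairs, and recall divides it by the number of gold pairs, so the two coincide whenever both sets have the same size. A minimal sketch follows, assuming predictions and gold alignment are given as collections of (entity1, entity2) pairs. The function name is hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of predicted alignment pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1
```

With three gold pairs, three predictions of which one is correct give precision = recall = 1/3. Two predictions that are both correct give precision 1.0 and recall 2/3, which is the situation that arises when a method leaves some entities unmatched.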
