
ict-bigdatalab / vnel

Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"

License: Apache License 2.0

cross-modal-retrieval entity-linking image-retrieval multimodal

vnel's Introduction

Visual Named Entity Linking

This repository contains the dataset and code for the EMNLP 2022 paper Visual Named Entity Linking: A New Dataset and A Baseline, which introduces the first large-scale, visual-input-oriented multimodal named entity linking dataset. The dataset contains over 48K annotated images linked to a knowledge base of 120K Wikipedia entities.

Task introduction

VNEL (Visual Named Entity Linking) takes an image as input, recognizes visual mentions by drawing bounding boxes around them, and links each mention to its corresponding entity in a large knowledge base. As a named entity linking task, it moves the scenario from the traditional textual modality to a purely visual one.

(Figure: VNEL task)
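The input and output of the task described above can be pictured with a small data model. This is only an illustrative sketch: the class names, the corner-based box convention, and the placeholder values are assumptions made for this example, not structures taken from the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoundingBox:
    # Corner coordinates in pixels (an assumed convention for this sketch).
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        """Return the box area, or 0 for a degenerate box."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class VisualMention:
    box: BoundingBox
    # Knowledge-base identifier of the linked entity; None until linking is done.
    entity_id: Optional[str] = None


@dataclass
class VNELOutput:
    image_path: str
    mentions: List[VisualMention]


# Hypothetical example: one image with a single linked mention.
output = VNELOutput(
    image_path="example.jpg",
    mentions=[
        VisualMention(BoundingBox(10, 20, 110, 220), entity_id="Q_PLACEHOLDER"),
    ],
)
print(len(output.mentions), output.mentions[0].box.area())
```

Keeping `entity_id` optional separates the two halves of the task: mention detection fills in the boxes, and linking fills in the identifiers afterwards.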

Dataset

WIKIPerson is a high-quality, human-annotated visual person linking dataset built for Visual Named Entity Linking. The images are drawn from a news-related dataset covering diverse agencies such as USA TODAY, BBC, and the Washington Post, so their quality is much higher than that of images retrieved directly from a search engine.

Dataset Example

(Figure: dataset example)

Each image in the dataset is annotated with the entity's bounding box and the corresponding Wikidata ID. The released JSON data can be found in the released_data folder. The full image data is hosted separately; see Getting Data below.
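A record that pairs an image with bounding boxes and Wikidata IDs could be read along the following lines. The field names (`image`, `mentions`, `bbox`, `wikidata_id`), the file name, and the placeholder IDs are all hypothetical; check the released JSON files for the actual schema before relying on this.

```python
import json

# Hypothetical record layout; the real field names may differ.
sample = """
[
  {
    "image": "news_0001.jpg",
    "mentions": [
      {"bbox": [34, 50, 180, 260], "wikidata_id": "Q_EXAMPLE_1"},
      {"bbox": [210, 40, 330, 250], "wikidata_id": "Q_EXAMPLE_2"}
    ]
  }
]
"""


def iter_mentions(records):
    """Yield (image, bbox, wikidata_id) triples from parsed records."""
    for record in records:
        for mention in record["mentions"]:
            yield record["image"], tuple(mention["bbox"]), mention["wikidata_id"]


records = json.loads(sample)
triples = list(iter_mentions(records))
for image, bbox, qid in triples:
    print(image, bbox, qid)
```

Flattening the records into triples makes it straightforward to crop each mention region and pair it with its entity label.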

Getting Data

To access the dataset, see WIKIPerson_Kaggle on Kaggle, which also describes the data format in detail. Note that many entities in the knowledge base never appear in the input images, so the task calls for a general-purpose feature extraction model.

                  #Image   #Covered Entity   #Knowledge Base
WIKIPerson_V1.0   48K      13K               120K

Legal Notices

Any contributors grant you a license to the WIKIPerson Dataset and other content in this repository under the MIT License; see the LICENSE.md file.

Any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.

Model

Following named entity linking on traditional text, we divide the task into two parts: entity recall and entity disambiguation. Entity recall is based on image similarity, while entity disambiguation is based on the image's entity context and the entity meta information. We find that the capabilities of current cross-modal models at the fine-grained entity level are far from satisfactory. More information can be found in VNEL.

(Figure: model)
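The two-stage idea of recall followed by disambiguation can be illustrated with a toy example. This is not the authors' implementation: the cosine-similarity scoring, the weighted combination with `alpha`, the assumption that context and meta information live in one shared embedding space, and every vector and entity ID below are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_candidates(
    mention_vec: Sequence[float],
    entity_image_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Stage 1: keep the k entities whose image embedding is most similar."""
    scored = [(eid, cosine(mention_vec, vec)) for eid, vec in entity_image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def disambiguate(
    candidates: List[Tuple[str, float]],
    context_vec: Sequence[float],
    entity_meta_vecs: Dict[str, Sequence[float]],
    alpha: float = 0.5,
) -> str:
    """Stage 2: rerank candidates by mixing image and meta-information scores."""
    def combined(pair: Tuple[str, float]) -> float:
        eid, image_score = pair
        meta_score = cosine(context_vec, entity_meta_vecs[eid])
        return alpha * image_score + (1 - alpha) * meta_score

    return max(candidates, key=combined)[0]


# Toy embeddings (hypothetical values, two dimensions for readability).
entity_image_vecs = {"E1": [1.0, 0.0], "E2": [0.9, 0.1], "E3": [0.0, 1.0]}
entity_meta_vecs = {"E1": [0.0, 1.0], "E2": [1.0, 0.0], "E3": [0.0, 1.0]}
mention_vec = [1.0, 0.0]   # embedding of the cropped mention region
context_vec = [1.0, 0.0]   # embedding of the surrounding image context

candidates = recall_candidates(mention_vec, entity_image_vecs, k=2)
linked = disambiguate(candidates, context_vec, entity_meta_vecs)
print([eid for eid, _ in candidates], linked)
```

In this toy setup the recall stage ranks E1 first on image similarity alone, while the disambiguation stage prefers E2 once the meta-information score is mixed in, which shows why a second stage can change the final link.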

Citation

If you find our work useful, please consider citing our paper:

@article{sun2022VNEL,
  title={Visual Named Entity Linking: A New Dataset and A Baseline},
  author={Sun, Wen  and Fan, Yixing  and Guo, Jiafeng  and Zhang, Ruqing  and Cheng, Xueqi},
  journal={Findings of EMNLP 2022},
  year={2022}
}


vnel's Issues

Experiment code

Hello, I would like to follow up on your work in the near future. Would it be possible to open-source the code?

The method of candidate entity retrieval

Thanks for your work!
I have a question about the candidate retrieval method. Are the candidate entities obtained by calculating the similarity between the input image and the entity images?
