nerel's Introduction

NEREL

A Russian dataset with nested named entities, relations, events and linked entities.

Version history

1.1

Added: Linked entities

1.0

First version:

  • Nested named entities
  • Events
  • Relations

Entities

List of entity types

No. Entity type No. Entity type No. Entity type
1. AGE 11. FAMILY 21. PENALTY
2. AWARD 12. IDEOLOGY 22. PERCENT
3. CITY 13. LANGUAGE 23. PERSON
4. COUNTRY 14. LAW 24. PRODUCT
5. CRIME 15. LOCATION 25. PROFESSION
6. DATE 16. MONEY 26. RELIGION
7. DISEASE 17. NATIONALITY 27. STATE_OR_PROV
8. DISTRICT 18. NUMBER 28. TIME
9. EVENT 19. ORDINAL 29. WORK_OF_ART
10. FACILITY 20. ORGANIZATION
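
To make the idea of nested entities concrete, here is a minimal sketch of how one entity span can sit inside another. The example text, the `Span` class, and the `is_nested` helper are illustrative assumptions and are not taken from the dataset or its tooling; only the labels `ORGANIZATION` and `CITY` come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labelled character span (hypothetical helper, not part of NEREL)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def is_nested(inner: Span, outer: Span) -> bool:
    """Return True if `inner` lies inside `outer` and is not the same span."""
    inside = outer.start <= inner.start and inner.end <= outer.end
    same = (inner.start, inner.end) == (outer.start, outer.end)
    return inside and not same


# Hypothetical example text, not a sentence from the corpus.
text = "Moscow State University"
spans = [
    Span(0, 23, "ORGANIZATION"),  # the whole phrase
    Span(0, 6, "CITY"),           # "Moscow", nested inside the organization
]

# Collect (inner label, outer label) pairs for every nested combination.
nested_pairs = [
    (a.label, b.label) for a in spans for b in spans if is_nested(a, b)
]
print(nested_pairs)
```

A flat NER scheme would have to pick one of the two spans; a nested scheme keeps both.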

Baselines for nested NER

Word representations used with all models are fastText (fT) and pre-trained RuBERT-cased embeddings.

For more details, please see here.

Results of nested NER for NEREL

Method P R F1
Biaffine, fT 81.64 77.69 79.62
Biaffine, RuBERT, ft 80.71 77.84 79.25
Pyramid, fT 75.87 72.40 74.09
Pyramid, RuBERT, ft 79.54 79.91 79.73
SpERT, RuBERT 82.90 82.14 82.52
MRC 85.04 84.95 84.99
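
As a quick sanity check on the table, the sketch below recomputes F1 from the reported precision and recall, assuming the F1 column is the usual harmonic mean of P and R. The function name `f1_score` is our own. Because P and R are already rounded in the table, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the "Biaffine, fT" row above.
biaffine_ft = f1_score(81.64, 77.69)
print(round(biaffine_ft, 2))
```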

Relations

List of relation types

No. Relation type No. Relation type No. Relation type
1. ABBREVIATION 18. HEADQUARTERED_IN 35. PLACE_RESIDES_IN
2. AGE_DIED_AT 19. IDEOLOGY_OF 36. POINT_IN_TIME
3. AGE_IS 20. INANIMATE_INVOLVED 37. PRICE_OF
4. AGENT 21. INCOME 38. PRODUCES
5. ALTERNATIVE_NAME 22. KNOWS 39. RELATIVE
6. AWARDED_WITH 23. LOCATED_IN 40. RELIGION_OF
7. CAUSE_OF_DEATH 24. MEDICAL_CONDITION 41. SCHOOLS_ATTENDED
8. CONVICTED_OF 25. MEMBER_OF 42. SIBLING
9. DATE_DEFUNCT_IN 26. ORGANIZES 43. SPOUSE
10. DATE_FOUNDED_IN 27. ORIGINS_FROM 44. START_TIME
11. DATE_OF_BIRTH 28. OWNER_OF 45. SUBEVENT_OF
12. DATE_OF_CREATION 29. PARENT_OF 46. SUBORDINATE_OF
13. DATE_OF_DEATH 30. PART_OF 47. TAKES_PLACE_IN
14. END_TIME 31. PARTICIPANT_IN 48. WORKPLACE
15. EXPENDITURE 32. PENALIZED_AS 49. WORKS_AS
16. FOUNDED_BY 33. PLACE_OF_BIRTH
17. HAS_CAUSE 34. PLACE_OF_DEATH

Baselines for In-sentence relation extraction

Baselines for nested relation extraction

Baseline for Document-level relation extraction

The encoders used with SpanBERT and OpenNRE are multilingual BERT and RuBERT.

Results of relation extraction for NEREL

Method P R F1
In-sentence relations
OpenNRE, mBERT 81.7 81.6 81.7
OpenNRE, RuBERT 85.3 84.6 84.9
SpanBERT, mBERT 76.8 75.4 76.1
SpanBERT, RuBERT 77.4 78.6 78.0
TRE 66.4 68.1 67.2
In-sentence nested relations
OpenNRE, mBERT 74.3 77.7 76.0
OpenNRE, RuBERT 77.8 79.6 78.7
IntModel 76.3 72.4 74.3
Document-level relations
OpenNRE, mBERT 35.7 51.2 42.1
OpenNRE, RuBERT 52.1 51.3 51.7

Useful links

📓 Update 1 November 2023: this collection is now available in arekit-ss for quick sampling of contexts with most subject-object relation mentions, using a single script, into JSONL/CSV/SQLite, including (optional) language transfer 🔥 [Learn more ...]

NEREL-BIO

NEREL-BIO is an extension of the NEREL dataset, introducing biomedical entity types in addition to the general-domain entities.

Citing & Authors

If you find this repository helpful, feel free to cite our papers:

[1] Loukachevitch N. et al. NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links. Language Resources and Evaluation (2023). https://doi.org/10.1007/s10579-023-09674-z

@article{loukachevitch2023nerel,
  title={NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links},
  author={Loukachevitch, Natalia and Artemova, Ekaterina and Batura, Tatiana and Braslavski, Pavel and Ivanov, Vladimir and Manandhar, Suresh and Pugachev, Alexander and Rozhkov, Igor and Shelmanov, Artem and Tutubalina, Elena and others},
  journal={Language Resources and Evaluation},
  pages={1--37},
  year={2023},
  publisher={Springer}
}

[2] Loukachevitch N., Artemova E., Batura T., Braslavski P., Denisov I., Ivanov V., Manandhar S., Pugachev A., Tutubalina E. NEREL: A Russian Dataset with Nested Named Entities, Relations and Events. Proceedings of RANLP. 2021. pp. 880–889.

@inproceedings{loukachevitch2021nerel,
  title={{NEREL: A Russian} Dataset with Nested Named Entities, Relations and Events},
  author={Loukachevitch, Natalia and Artemova, Ekaterina and Batura, Tatiana and Braslavski, Pavel and Denisov, Ilia and Ivanov, Vladimir and Manandhar, Suresh and Pugachev, Alexander and Tutubalina, Elena},
  booktitle={Proceedings of RANLP},
  pages={876--885},
  year={2021}
}

[3] Loukachevitch N., Braslavski P., Ivanov V., Batura T., Manandhar S., Shelmanov A., Tutubalina E. Entity Linking over Nested Named Entities for Russian. Proceedings of LREC. 2022. pp. 4458–4466.

@inproceedings{nerel-el-nne, 
  title={{Entity Linking over Nested Named Entities for Russian}},
  author={Loukachevitch, Natalia and Braslavski, Pavel and Ivanov, Vladimir and Batura, Tatiana and Manandhar, Suresh and Shelmanov, Artem and Tutubalina, Elena},
  booktitle={Proceedings of LREC},
  year={2022},
}

nerel's People

Contributors

1ytic, nerel-ds, tutubalinaev, tvbat

nerel's Issues

Mention Handy Dataset Reading Option in README

Dear resource maintainers,

Thank you for sharing such a large and richly annotated collection!
Since the original collection is distributed as BRAT-formatted documents, getting started quickly with relations may require writing an additional service that parses the annotations and extracts the text parts in which relations are mentioned.
To address this limitation, I would like to contribute a handy solution that extracts most relations between mentioned objects with a single command, using the following open-source framework:

python3 -m arekit_ss.sample --writer jsonl --source nerel --sampler bert --text_parser lm --output_dir "NEREL-samples"

Basically, it converts the BRAT-based representation of the NEREL collection into JSONL.
Other formats, such as CSV or SQLite3, as well as entity masking, are supported; the complete list of formats can be found here.
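
For readers who would rather read the BRAT annotations directly, the sketch below shows roughly what such a parser involves. It assumes the standard BRAT standoff layout (tab-separated `T` lines for text-bound entities and `R` lines for binary relations); the `Entity` and `Relation` types, the `parse_brat` function, and the sample annotation are hypothetical and are not part of arekit-ss or the NEREL repository. Event, attribute, and note lines are skipped.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    id: str
    label: str
    start: int
    end: int
    text: str


class Relation(NamedTuple):
    id: str
    label: str
    arg1: str  # id of the first argument entity
    arg2: str  # id of the second argument entity


def parse_brat(ann: str) -> tuple[dict[str, Entity], list[Relation]]:
    """Parse entity (T) and relation (R) lines of a BRAT .ann string."""
    entities: dict[str, Entity] = {}
    relations: list[Relation] = []
    for line in ann.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            label, offsets = fields[1].split(" ", 1)
            # Discontinuous spans look like "0 5;10 15"; keep the outer bounds.
            parts = offsets.replace(";", " ").split()
            surface = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(
                ann_id, label, int(parts[0]), int(parts[-1]), surface
            )
        elif ann_id.startswith("R"):
            label, arg1, arg2 = fields[1].split()
            relations.append(
                Relation(
                    ann_id, label, arg1.split(":", 1)[1], arg2.split(":", 1)[1]
                )
            )
    return entities, relations


# Hypothetical annotation for the text "Anna lives in Moscow".
sample = (
    "T1\tPERSON 0 4\tAnna\n"
    "T2\tCITY 14 20\tMoscow\n"
    "R1\tPLACE_RESIDES_IN Arg1:T1 Arg2:T2\n"
)
entities, relations = parse_brat(sample)
for rel in relations:
    print(entities[rel.arg1].text, rel.label, entities[rel.arg2].text)
```

Because entities are stored by id with their offsets, nested entities need no special handling here: overlapping spans are simply separate `T` lines.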

Proposal for a quick README modification

I hope this is beneficial both for quick application of your collection by others and for my personal interest in maintaining open-source solutions that contribute to studies based on semantic relations in texts.

Here is an example of how to add the reading info to the README:

[![](https://img.shields.io/badge/AREkit--ss_Compatible-0.23.1-purple.svg)](https://github.com/nicolay-r/arekit-ss#usage)

> 📓 **Update 25 October 2023**: this collection **is now available in [arekit-ss](https://github.com/nicolay-r/arekit-ss)**
> for a [quick sampling](https://github.com/nicolay-r/arekit-ss#usage) of contexts with most subject-object relation mentions with just **single script into
> `JSONL/CSV/SqLite`** including (optional) language transferring 🔥 [[Learn more ...]](https://github.com/nicolay-r/arekit-ss#usage)

Which will look as follows:

📓 Update 25 October 2023: this collection is now available in arekit-ss
for a quick sampling of contexts with most subject-object relation mentions with just single script into
JSONL/CSV/SqLite
including (optional) language transferring 🔥 [Learn more ...]
