
This is the official leaderboard for the RuSentRel-1.1 dataset, originally described in the paper (arXiv:1808.08932).

Home Page: https://github.com/nicolay-r/RuSentRel

License: MIT License

Topics: sentiment-analysis, relation-extraction, language-models, neural-networks, bert-model, low-resource-nlp, benchmark, leaderboard, cnn, bilstm


RuSentRel Leaderboard

📓 Update 01 October 2023: this collection is now available in arekit-ss for quick sampling of contexts with all subject-object relation mentions into JSONL/CSV/SQLite with a single script, including optional language transfer 🔥 [Learn more ...]

Dataset description: the RuSentRel collection consists of analytical articles from the Internet portal inosmi.ru. These are texts in the domain of international politics, obtained from foreign authoritative sources and translated into Russian. The collected articles contain both the author's opinion on the subject matter of the article and a large number of relations between the participants of the described situations. In total, 73 large analytical texts were labeled with about 2000 relations.

This repository is the official results benchmark for the automatic sentiment attitude extraction task on the RuSentRel collection. See the Task section for greater details.

Contributing: please feel free to open pull requests here, and especially at awesome-sentiment-attitude-extraction!

For more details about RuSentRel, please refer to the related repository.

Contents

Task

Given a subset of documents from the RuSentRel collection, where each document is represented by a pair: (1) a text, and (2) a list of selected named entities. For each document, the task is to compose a list of entity pairs (es, eo) for which the text conveys a sentiment relation from es (subject) towards eo (object). The assigned label can be neg or pos.

Example
... При этом Москва неоднократно подчеркивала, что ее активность на Балтике является ответом именно на действия НАТО и эскалацию враждебного подхода к России вблизи ее восточных границ ... (... Meanwhile Moscow has repeatedly emphasized that its activity in the Baltic Sea is a response precisely to actions of NATO and the escalation of the hostile approach to Russia near its eastern borders ...)
(NATO->Russia, neg), (Russia->NATO, neg)
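For illustration only, below is a minimal Python sketch of how a document and the expected answer could be represented; the variable names and the tuple layout are illustrative and are not the repository's submission format.

```python
# Illustrative only: a document is a text plus a list of selected named
# entities; the expected answer is a list of (subject, object, label)
# attitudes, with labels restricted to "pos" / "neg".
document = {
    "text": ("... Moscow has repeatedly emphasized that its activity in the "
             "Baltic Sea is a response precisely to actions of NATO ..."),
    "entities": ["Moscow", "NATO", "Russia"],
}

expected_attitudes = [
    ("NATO", "Russia", "neg"),
    ("Russia", "NATO", "neg"),
]
```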

Task paper: https://arxiv.org/pdf/1808.08932.pdf

Approaches

The task is treated as a context classification problem, in which a context is a text region that contains a mention of the pair (the attitude participants). The classified context-level attitudes are then transferred to the document level by aggregating the context labels of the related pair (an averaging/voting scheme); a minimal sketch of this step is given below.
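The following sketch illustrates the context-to-document aggregation, assuming context-level predictions are already available as (subject, object, label) records; the function name and data layout are illustrative and not AREkit's API.

```python
from collections import Counter, defaultdict

def aggregate_to_document_level(context_predictions):
    """Majority voting over the context-level labels of each (subject, object)
    pair; the winning label becomes the document-level attitude."""
    votes = defaultdict(Counter)
    for subject, obj, label in context_predictions:
        votes[(subject, obj)][label] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}

# Three contexts mention the pair (Russia, NATO); "neg" wins the vote.
print(aggregate_to_document_level([
    ("Russia", "NATO", "neg"),
    ("Russia", "NATO", "neg"),
    ("Russia", "NATO", "pos"),
]))  # {('Russia', 'NATO'): 'neg'}
```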

We develop the AREkit toolkit, which serves as a framework for the following applications:

  • BERT-based language models [code];
  • Neural Networks with (and w/o) Attention mechanism [code];
  • Conventional Machine Learning methods [code];

Back to Top

Submission Evaluation

The source code is exported from the AREkit-0.21.0 library and consists of:

  • the Evaluation directory, with details of the evaluator implementation and the related dependencies;
  • the Test directory, which includes test scripts for applying the evaluator to the archived results.

Use evaluate.py to evaluate your submissions. Below is an example of assessing the results of ChatGPT-3.5-0613:

python3 evaluate.py --input data/chatgpt-avg.zip --mode classification --split cv3

Back to Top

Leaderboard

Results are ordered from the most recent to the oldest. We measure F1 (scaled by 100) across the following foldings (see the evaluator section for greater details; a minimal scoring sketch follows this list):

  • F1cv -- the average F1 over a 3-fold cross-validation check; the folds are formed so that each preserves the same number of sentences;
  • F1t -- F1 over the predefined TEST set.
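Here is a minimal sketch of how these scores can be computed from gold and predicted labels, assuming per-fold predictions are available; it illustrates the metric (average F1 of the positive and negative classes, averaged over folds) and is not the evaluator's actual implementation.

```python
def f1_for_class(gold, pred, label):
    """F1 of a single sentiment class ('pos' or 'neg')."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(gold, pred):
    """Average F1 of the positive and negative classes."""
    return (f1_for_class(gold, pred, "pos") + f1_for_class(gold, pred, "neg")) / 2

def f1_cv(folds):
    """F1cv: macro F1 averaged over the folds, scaled by 100."""
    return 100 * sum(macro_f1(g, p) for g, p in folds) / len(folds)
```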

The result assessment is organized into two experiments:

  • 3l -- extraction of subject-object pairs;
  • 2l -- classification of already given subject-object pairs at the document level.
| Methods | F1cv (3l) | F1t (3l) | F1cv (2l) | F1t (2l) |
|---|---|---|---|---|
| Expert Agreement** [1] | 55.0 | 55.0 | - | - |
| ChatGPT zero-shot with prompting*** [7] |  |  |  |  |
| ChatGPT-3.5-0613, avg [200 words distance] | - | - | 37.7 | 39.6 |
| ChatGPT-3.5-0613, avg [50 words distance] | - | - | 66.19 | 74.47 |
| ChatGPT-3.5-0613, first [50 words distance] | - | - | 69.23 | 74.09 |
| Distant Supervision (RA-2.0-large) for Language Models (BERT-based) [6] [pt -- pretrained, ft -- fine-tuned] |  |  |  |  |
| SentenceRuBERT (NLIpt + NLIft) | 39.0 | 38.0 | 70.2 | 67.7 |
| SentenceRuBERT (NLIpt + QAft) | 38.4 | 41.9 | 69.6 | 64.2 |
| SentenceRuBERT (NLIpt + Cft) | 37.9 | 39.8 | 70.0 | 69.8 |
| RuBERT (NLIpt + NLIft) | 36.8 | 39.9 | 71.0 | 68.6 |
| RuBERT (NLIpt + QAft) | 34.8 | 37.0 | 69.6 | 68.2 |
| RuBERT (NLIpt + Cft) | 35.6 | 35.4 | 70.0 | 69.8 |
| mBase (NLIpt + NLIft) | 33.6 | 36.0 | 69.4 | 68.2 |
| mBase (NLIpt + QAft) | 30.1 | 35.5 | 69.6 | 65.2 |
| mBase (NLIpt + Cft) | 30.5 | 31.1 | 68.9 | 67.7 |
| Distant Supervision (RA-2.0-large) for (Attentive) Neural Networks + Frames annotation [Joined Training] ([6] reproduced, [4] original) |  |  |  |  |
| PCNNends | 32.2 | 39.9 | 70.2 | 67.8 |
| BiLSTM | 32.0 | 38.8 | 71.2 | 68.4 |
| PCNN | 31.6 | 39.7 | 69.5 | 70.5 |
| LSTM | 31.6 | 39.5 | 68.0 | 75.4 |
| Att-BiLSTM [P. Zhou et al.] | 31.0 | 37.3 | 66.2 | 71.2 |
| AttCNNends | 30.9 | 39.9 | 66.8 | 72.7 |
| IANends | 30.7 | 36.7 | 69.1 | 72.6 |
| Distant Supervision (RA-1.0) for Multi-Instance Neural Networks [Joined Training] [5] |  |  |  |  |
| MI-PCNN | - | - | - | 68.0 |
| MI-CNN | - | - | - | 62.0 |
| PCNN | - | - | - | 67.0 |
| CNN | - | - | - | 63.0 |
| Language Models (BERT-based) [6] |  |  |  |  |
| SentenceRuBERT (NLI) | 33.4 | 32.7 | 69.8 | 67.6 |
| SentenceRuBERT (QA) | 34.3 | 38.9 | 70.2 | 67.1 |
| SentenceRuBERT (C) | 34.0 | 35.2 | 69.3 | 65.5 |
| RuBERT (NLI) | 29.4 | 39.6 | 68.9 | 66.4 |
| RuBERT (QA) | 32.0 | 35.3 | 69.5 | 66.2 |
| RuBERT (C) | 36.8 | 37.6 | 67.8 | 66.2 |
| mBase (NLI) | 29.2 | 37.0 | 67.8 | 58.4 |
| mBase (QA) | 28.6 | 33.8 | 66.5 | 65.4 |
| mBase (C) | 26.9 | 30.0 | 67.0 | 68.9 |
| (Attentive) Neural Networks + Frames annotation ([6] reproduced, [3] original) |  |  |  |  |
| IANends | 30.8 | 32.2 | 60.8 | 63.5 |
| AttPCNNends | 29.9 | 32.6 | 64.3 | 63.3 |
| PCNN | 29.6 | 32.5 | 64.4 | 63.3 |
| CNN | 28.7 | 31.4 | 63.6 | 65.9 |
| BiLSTM | 28.6 | 32.4 | 62.3 | 71.2 |
| LSTM | 27.9 | 31.6 | 61.9 | 65.3 |
| AttCNNends | 27.6 | 29.7 | 65.0 | 66.2 |
| Att-BiLSTM [P. Zhou et al.] | 27.5 | 32.3 | 65.7 | 68.2 |
| Convolutional networks [2] |  |  |  |  |
| PCNN [code] | - | 31.0 | - | - |
| CNN | - | 30.0 | - | - |
| Conventional methods [1] [code] |  |  |  |  |
| Gradient Boosting (Grid search) | 20.3* | 28.0 | - | - |
| Random Forest (Grid search) | 19.1* | 27.0 | - | - |
| Random Forest | 15.7* | 27.0 | - | - |
| Naive Bayes (Bernoulli) | 15.2* | 16.0 | - | - |
| SVM | 15.1* | 15.0 | - | - |
| Gradient Boosting | 14.4* | 27.0 | - | - |
| SVM (Grid search) | 14.3* | 15.0 | - | - |
| Naive Bayes (Gauss) | 9.2* | 11.0 | - | - |
| KNN | 7.0* | 9.0 | - | - |
| Baseline (School) [link] | - | 12.0 | - | - |
| Baseline (Distr) | - | 8.0 | - | - |
| Baseline (Random) | 7.4* | 8.0 | - | - |
| Baseline (Pos) | 3.9* | 4.0 | - | - |
| Baseline (Neg) | 5.2* | 5.0 | - | - |

*: Results that were not mentioned in papers.

**: We asked another super-annotator to label the collection and compared her annotation with our gold standard using the average F-measure of the positive and negative classes, in the same way as for the automatic approaches. In such a way, we can reveal the upper bound for automatic algorithms. The resulting F-measure of human labeling is reported as Expert Agreement (55.0) in the table above. [1]

***: We obtain English samples via arekit-ss by translating the texts into English first and then wrapping them into prompts. We consider a k-word distance (50 by default, in English) between entity mentions as an upper bound for pair composition; because of this limit, and since translation increases the distance in words, results might be lower than on the original texts.
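For clarity, here is a small sketch of the distance-based pair filtering described above, assuming the token position of each entity mention is known; it only illustrates the idea and is not the arekit-ss implementation.

```python
def pairs_within_distance(mention_positions, k=50):
    """Keep only ordered (subject, object) candidates whose mentions occur
    at most k words apart; mention_positions maps an entity to its token index."""
    return [
        (subject, obj)
        for subject, s_pos in mention_positions.items()
        for obj, o_pos in mention_positions.items()
        if subject != obj and abs(s_pos - o_pos) <= k
    ]

# "NATO" is mentioned 120 tokens away from "Russia", so that pair is dropped.
print(pairs_within_distance({"Russia": 10, "Moscow": 40, "NATO": 130}, k=50))
```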

Back to Top

Neural Networks Optimization

The training process is described in Rusnachenko et al., 2020 (Section 7.1) and relies on the Multi-Instance learning approach originally proposed in the Zeng et al., 2015 paper (SGD application, bag terminology, instance selection within bags). All context samples of a batch are gathered into bags. The authors propose to select the best instance in every bag as follows: take the maximum value of p(y_j | m_i,j) across the i-th instances within a particular j-th bag. This allows them to define the loss function at the bag level.
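A minimal numpy sketch of this max-based selection, assuming bag_probs[j][i] holds p(y_j | m_i,j) for the i-th instance of the j-th bag; it illustrates the idea rather than the original implementation.

```python
import numpy as np

def bag_loss_max(bag_probs):
    """Bag-level negative log-likelihood in the style of Zeng et al., 2015:
    only the most confident instance of each bag contributes to the loss."""
    best_per_bag = np.array([probs.max() for probs in bag_probs])
    return float(-np.log(best_per_bag).mean())

# Two bags with three and two context instances respectively.
print(bag_loss_max([np.array([0.2, 0.7, 0.4]), np.array([0.6, 0.5])]))
```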

In our works, we adopt bags for gathering synonymous contexts. Therefore, for the gradient calculation within bags, we choose the avg function instead. The assumption here is to take the other synonymous attitudes into account during the gradient calculation procedure. We used BagSize > 1 in the earlier work Rusnachenko, 2018. In the latest experiments, we consider BagSize = 1 and therefore do not exploit averaging of bag values.
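For comparison, a sketch of the averaging alternative used for bags of synonymous contexts (again an illustration, not the training code); with BagSize = 1 it coincides with the ordinary per-sample loss.

```python
import numpy as np

def bag_loss_avg(bag_probs):
    """Average the label probabilities of all synonymous contexts in a bag,
    so every instance contributes to the gradients, then take the loss."""
    avg_per_bag = np.array([probs.mean() for probs in bag_probs])
    return float(-np.log(avg_per_bag).mean())

# With bags of size 1 this reduces to the usual per-sample loss.
print(bag_loss_avg([np.array([0.2, 0.7, 0.4]), np.array([0.6, 0.5])]))
```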

Back to Top

Related works

Awesome

Awesome Sentiment Attitude Extraction

Back to Top

References

[1] Natalia Loukachevitch, Nicolay Rusnachenko. Extracting Sentiment Attitudes from Analytical Texts. Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies Dialogue-2018 (arXiv:1808.08932) [paper] [code]

[2] Nicolay Rusnachenko, Natalia Loukachevitch. Using Convolutional Neural Networks for Sentiment Attitude Extraction from Analytical Texts. EPiC Series in Language and Linguistics 4, 1-10, 2019 [paper] [code]

[3] Nicolay Rusnachenko, Natalia Loukachevitch. Studying Attention Models in Sentiment Attitude Extraction Task. In: Métais E., Meziane F., Horacek H., Cimiano P. (eds) Natural Language Processing and Information Systems. NLDB 2020. Lecture Notes in Computer Science, vol. 12089. Springer, Cham [paper] [code]

[4] Nicolay Rusnachenko, Natalia Loukachevitch. Attention-Based Neural Networks for Sentiment Attitude Extraction using Distant Supervision. The 10th International Conference on Web Intelligence, Mining and Semantics (WIMS 2020), June 30 - July 3 (arXiv:2006.13730) [paper] [code]

[5] Nicolay Rusnachenko, Natalia Loukachevitch, Elena Tutubalina. Distant Supervision for Sentiment Attitude Extraction. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) [paper] [code]

[6] Nicolay Rusnachenko. Language Models Application in Sentiment Attitude Extraction Task. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS), 2021;33(3):199-222 (in Russian) [paper] [code-networks] [code-bert]

[7] Bowen Zhang, Daijun Ding, Liwen Jing. How would Stance Detection Techniques Evolve after the Launch of ChatGPT? [paper]

Back to Top
