dkorenci / pan-clef-2024-oppositional

Utilities, baselines, instructions, and guidelines for the PAN CLEF 2024 shared task "Oppositional thinking analysis: Conspiracy vs critical narratives"

License: Apache License 2.0

Python 99.17% Shell 0.83%

pan-clef-2024-oppositional's Introduction

This repository contains code with utilities, baselines, instructions, and guidelines for the
PAN CLEF 2024 shared task "Oppositional thinking analysis: Conspiracy vs critical narratives"
https://pan.webis.de/clef24/pan24-web/oppositional-thinking-analysis.html
The code is licensed under the APACHE 2.0 license, see LICENSE file.
The exception is the span_f1_metric.py file, which is licensed under the GPL license, see the file for details.

This document contains a high-level overview of the code
and instructions for setting up the environment for the project.

For details of the data, see README-DATA.
For a conceptual overview of the task, along with guidelines and possible approaches for participants,
see the slides in pan2024_oppositional_overview_guidelines.pdf.

For the details on the evaluation metrics and procedure, see the README-EVALUATION file.

IMPORTANT: keep up with the updates on the official repository,
as the code and especially the README files might be improved and updated.

# 1. High-level Overview

The code is organized into three packages:

- data_tools
loading and preprocessing of the data

- classif_experim
baselines for the classification task, and the supporting functionality

- sequence_labeling
baselines for the sequence labeling task, and the supporting functionality

Getting started:
First, create and modify local copies of .gitignore and settings.py from their templates;
only the templates are git-tracked, to facilitate the local setup.
Second, set up the environment (see below). Third, run the experiments (see below).

# 2. Setup of the environment using conda

conda create --name env-name python=3.X # 3.10 is recommended
conda activate env-name
# from the root of the repository
./generate_requirements.sh # mind the additional dependencies, see the code
pip install -r requirements.txt

If you rely on spaCy docs as the input format for the sequence labeling baseline
(this is the default, so you probably will), run:
python -m spacy download en_core_web_sm
python -m spacy download es_core_news_sm
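
A quick way to confirm that both models were installed is to check whether they are importable, since each spaCy model is installed as a regular Python package. The following is a minimal sketch using only the standard library; the helper name missing_spacy_models is hypothetical and not part of this repository.

```python
import importlib.util

# spaCy models named in the setup instructions; each installs as a Python package.
REQUIRED_MODELS = ("en_core_web_sm", "es_core_news_sm")


def missing_spacy_models(models=REQUIRED_MODELS):
    """Return the names of the models that are not importable in this environment."""
    return [name for name in models if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_spacy_models()
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All required spaCy models are installed.")
```

Run it inside the activated conda environment; any model it reports as missing can be installed with the corresponding spacy download command.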

# 3. Running the experiments
Rename setting_template.py to settings.py and fill in the paths to the dataset files.

The Python entry point for running the classification experiment (subtask 1) is:
classif_experiment_runner.run_classif_experiments

The Python entry point for running the sequence labeling experiment (subtask 2) is:
seqlabel_experiment_runner.run_seqlab_experiments
The CLI entry point is seqlabel_experiment_runner.main;
to run the module, follow run_seqlabel.sh.template, creating a local .sh copy first.

pan-clef-2024-oppositional's Issues

Question Regarding CAMPAIGNER/AGENT/FACILITATOR Labels

We are currently working on the Oppositional Analysis task but we are not sure about the difference between the labels: Campaigner, Agent, and Facilitator. Could you please provide some insight into how these categories were assigned?

Issues when running the baselines

While running the baselines (classif_experiment_runner.py), I encountered the following issues:

The predict method of SklearnTransformerClassif raises an error because the method from_list does not exist. It can be fixed by replacing the function with:

def predict(self, X):
    '''
    :param X: list-like of texts
    :return: array of label predictions
    '''
    ...
    # wrap the texts in a DataFrame and build the Dataset from it
    df = pd.DataFrame([{'text': txt} for txt in X])
    dset = datasets.Dataset.from_pandas(df)

Another issue occurs in the SklearnTransformerBase class, where the CUDA device is not set properly:

        if device: self._device = device
        else: self._device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

should be replaced by:

        if device: self._device = device
        else: self._device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

Now I'm stuck in the score calculation phase, where I am unable to get past the error:

Traceback (most recent call last):
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 170, in <module>
    run_all_critic_conspi()
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 165, in run_all_critic_conspi
    run_classif_experiments(lang=lang, num_folds=5, rnd_seed=seed, test=None,
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 142, in run_classif_experiments
    res = run_classif_crossvalid(lang=lang, model_label=model, model_params=params, num_folds=num_folds,
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 52, in run_classif_crossvalid
    scores = pd.DataFrame({fname: [f(cls_tst, cls_pred)] for fname, f in score_fns.items()})
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 52, in <dictcomp>
    scores = pd.DataFrame({fname: [f(cls_tst, cls_pred)] for fname, f in score_fns.items()})
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1136, in f1_score
    return fbeta_score(
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1277, in fbeta_score
    _, _, f, _ = precision_recall_fscore_support(
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1563, in precision_recall_fscore_support
    labels = _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1372, in _check_set_wise_labels
    raise ValueError(
ValueError: pos_label=1 is not a valid label. It should be one of ['CONSPIRACY', 'CRITICAL']
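
The final ValueError comes from binary F1 scoring in scikit-learn: with the default pos_label=1, the scorer looks for a class labeled 1, while the labels here are the strings 'CONSPIRACY' and 'CRITICAL'. One possible workaround (a sketch only, not a fix confirmed by the maintainers) is to encode the string labels as integers before scoring. The example below shows the idea in plain Python; treating CONSPIRACY as the positive class, the helper names encode_labels and binary_f1, and the toy labels are all assumptions made for illustration.

```python
# Assumption: CONSPIRACY is treated as the positive class and encoded as 1.
LABEL_TO_INT = {"CRITICAL": 0, "CONSPIRACY": 1}


def encode_labels(labels, mapping=LABEL_TO_INT):
    """Map string class labels to integers so that pos_label=1 names a real class."""
    return [mapping[label] for label in labels]


def binary_f1(y_true, y_pred, pos_label=1):
    """Plain-Python F1 score of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold labels and predictions, made up for illustration only.
cls_tst = ["CONSPIRACY", "CRITICAL", "CONSPIRACY", "CRITICAL"]
cls_pred = ["CONSPIRACY", "CONSPIRACY", "CRITICAL", "CRITICAL"]

score = binary_f1(encode_labels(cls_tst), encode_labels(cls_pred))
print(score)
```

With scikit-learn, the encoded lists can be passed to f1_score unchanged. Alternatively, the string labels can be kept and the positive class named explicitly, as in f1_score(cls_tst, cls_pred, pos_label='CONSPIRACY').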
