
dkorenci / pan-clef-2024-oppositional


Utilities, baselines, instructions, and guidelines for the PAN CLEF 2024 shared task "Oppositional thinking analysis: Conspiracy vs critical narratives"

License: Apache License 2.0


pan-clef-2024-oppositional's Introduction

This repository contains utilities, baselines, instructions, and guidelines for the
PAN CLEF 2024 shared task "Oppositional thinking analysis: Conspiracy vs critical narratives":
https://pan.webis.de/clef24/pan24-web/oppositional-thinking-analysis.html
The code is licensed under the Apache 2.0 license (see the LICENSE file).
The exception is span_f1_metric.py, which is licensed under the GPL (see the file for details).

This document gives a high-level overview of the code
and the instructions for setting up the project environment.

For details of the data, see README-DATA.
For the conceptual overview of the task, the guidelines, and possible approaches for participants,
see the slides in pan2024_oppositional_overview_guidelines.pdf.

For details of the evaluation metrics and procedure, see README-EVALUATION.

IMPORTANT: keep up with updates to the official repository,
since the code, and especially the README files, may be improved and updated.


# 1. High-level Overview

The code is organized into three packages:

- data_tools: loading and preprocessing of the data

- classif_experim: baselines for the classification task, and the supporting functionality

- sequence_labeling: baselines for the sequence labeling task, and the supporting functionality

Getting started: first, create local variants of .gitignore and setting.py from their templates
(only the templates are tracked by git, to make the local setup easier).
Second, set up the environment (see section 2). Third, run the experiments (see section 3).


# 2. Setup of the environment using conda

```bash
conda create --name env-name python=3.10  # 3.10 is the recommended version
conda activate env-name
# from the root of the repository:
./generate_requirements.sh  # check the script's code for additional dependencies
pip install -r requirements.txt
```

If you use spaCy docs as the input format for the sequence labeling baseline
(this is the default, so you probably will), also download the English and Spanish spaCy models:
```bash
python -m spacy download en_core_web_sm
python -m spacy download es_core_news_sm
```
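spaCy pipelines installed with `python -m spacy download` are ordinary Python packages, so one quick way to confirm that both models are available before running the sequence labeling baseline is to look them up with `importlib`. This is a minimal sketch; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
import importlib.util

# Models needed by the seq. labeling baseline when spaCy docs are the input format.
REQUIRED_SPACY_MODELS = ('en_core_web_sm', 'es_core_news_sm')


def missing_packages(names=REQUIRED_SPACY_MODELS):
    """Return the package names that cannot be found in the current environment.

    Hypothetical helper: relies on spaCy models being installed as regular packages.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_packages()
    if missing:
        print('Missing spaCy models:', ', '.join(missing))
        print('Install each with: python -m spacy download <model>')
    else:
        print('All required spaCy models are installed.')
```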


# 3. Running the experiments
Copy setting_template.py to settings.py and fill in the paths to the dataset files.
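A settings file of this kind typically just holds module-level path constants. The sketch below shows one way it might look, plus a check that the configured files exist before an experiment starts. The variable name `DATASET_PATHS`, the file names, and `missing_dataset_files` are all hypothetical; the actual names are defined by the repository's settings template.

```python
# settings.py (hypothetical sketch; the real variable names come from the template)
import os

# Hypothetical names: one dataset file per language, edited to match your local copy.
DATASET_PATHS = {
    'en': '/path/to/dataset_en_train.json',
    'es': '/path/to/dataset_es_train.json',
}


def missing_dataset_files(paths=DATASET_PATHS):
    """Return the language codes whose configured dataset file does not exist."""
    return [lang for lang, path in paths.items() if not os.path.isfile(path)]
```

Running `missing_dataset_files()` after editing the file gives an early, readable failure instead of an error deep inside the data loading code.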

The Python entrypoint for the classification experiment (subtask 1) is:
classif_experiment_runner.run_classif_experiments

The Python entrypoint for the sequence labeling experiment (subtask 2) is:
seqlabel_experiment_runner.run_seqlab_experiments

The CLI entrypoint is seqlabel_experiment_runner.main. To run the module from the command line,
first create a local .sh copy of run_seqlabel.sh.template, then run it as shown in the template.
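Both Python entrypoints are given as dotted `module.function` paths. If you want to select one programmatically, for example from a small driver script, a helper like the one below resolves such a path with `importlib`. The helper name `resolve_entrypoint` is hypothetical, and using it on the repository's modules assumes the repository root is on `sys.path`.

```python
import importlib


def resolve_entrypoint(dotted):
    """Turn a 'package.module.function' string into the object it names."""
    module_name, _, attr = dotted.rpartition('.')
    if not module_name:
        raise ValueError(f'expected "module.attribute", got {dotted!r}')
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

The traceback reported in the issues places classif_experiment_runner inside the classif_experim package, so the classification runner would be resolved as `resolve_entrypoint('classif_experim.classif_experiment_runner.run_classif_experiments')`; check the module for the function's full signature before calling it.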


pan-clef-2024-oppositional's Issues

Question Regarding CAMPAIGNER/AGENT/FACILITATOR Labels

We are working on the oppositional thinking analysis task, but we are not sure about the differences between the labels CAMPAIGNER, AGENT, and FACILITATOR. Could you please provide some insight into how these categories were assigned?

Issues when running the baselines

While running the baselines (classif_experiment_runner.py), I encountered the following issues:

  • The predict method of SklearnTransformerClassif throws an error because Dataset.from_list does not exist in the installed version of the datasets library. It can be fixed by replacing the function with:
```python
# module-level imports used by the method
import datasets
import pandas as pd

def predict(self, X):
    '''
    :param X: list-like of texts
    :return: array of label predictions
    '''
    ...
    df = pd.DataFrame([{'text': txt} for txt in X])
    dset = datasets.Dataset.from_pandas(df)
```
  • Another issue occurs in the SklearnTransformerBase class, where the CUDA device is not set properly:
```python
if device:
    self._device = device
else:
    self._device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
should be replaced by:
```python
if device:
    self._device = device
else:
    self._device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
```
  • Now I'm stuck in the score calculation phase, where I cannot get past this error:
```
Traceback (most recent call last):
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 170, in <module>
    run_all_critic_conspi()
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 165, in run_all_critic_conspi
    run_classif_experiments(lang=lang, num_folds=5, rnd_seed=seed, test=None,
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 142, in run_classif_experiments
    res = run_classif_crossvalid(lang=lang, model_label=model, model_params=params, num_folds=num_folds,
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 52, in run_classif_crossvalid
    scores = pd.DataFrame({fname: [f(cls_tst, cls_pred)] for fname, f in score_fns.items()})
  File "/notebooks/pan-clef-2024-oppositional/classif_experim/classif_experiment_runner.py", line 52, in <dictcomp>
    scores = pd.DataFrame({fname: [f(cls_tst, cls_pred)] for fname, f in score_fns.items()})
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1136, in f1_score
    return fbeta_score(
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1277, in fbeta_score
    _, _, f, _ = precision_recall_fscore_support(
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1563, in precision_recall_fscore_support
    labels = _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
  File "/usr/local/lib/python3.9/dist-packages/sklearn/metrics/_classification.py", line 1372, in _check_set_wise_labels
    raise ValueError(
ValueError: pos_label=1 is not a valid label. It should be one of ['CONSPIRACY', 'CRITICAL']
```
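The reported fix for predict swaps Dataset.from_list for from_pandas. A version-tolerant variant could prefer from_list when the installed datasets release provides it and fall back to from_pandas otherwise. This is a minimal sketch, not the repository's code; `texts_to_dataset` and its `dataset_cls` parameter are hypothetical names, and the parameter exists only so another class can be substituted.

```python
def texts_to_dataset(texts, dataset_cls=None):
    """Build a one-column ('text') dataset from a list of strings.

    Uses dataset_cls.from_list when available, else dataset_cls.from_pandas.
    """
    if dataset_cls is None:
        # Imported lazily so the helper can also be used with a substitute class.
        from datasets import Dataset
        dataset_cls = Dataset
    rows = [{'text': txt} for txt in texts]
    if hasattr(dataset_cls, 'from_list'):
        return dataset_cls.from_list(rows)
    import pandas as pd
    return dataset_cls.from_pandas(pd.DataFrame(rows))
```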
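The ValueError arises because sklearn's f1_score defaults to average='binary' with pos_label=1, while the class labels here are the strings 'CONSPIRACY' and 'CRITICAL'. With sklearn itself, passing pos_label='CONSPIRACY' (or average='macro', which does not use pos_label) avoids that check. The pure-Python sketch below illustrates the same idea as a stand-in, not the repository's scoring code: bind the positive class explicitly when building the score-function dict. Treating CONSPIRACY as the positive class is an assumption, and `binary_f1`, `macro_f1`, and the dict keys are hypothetical names.

```python
from functools import partial


def binary_f1(y_true, y_pred, pos_label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    return sum(binary_f1(y_true, y_pred, lbl) for lbl in labels) / len(labels)


# Hypothetical score-function dict: the positive class is bound explicitly
# instead of relying on a default of pos_label=1.
score_fns = {
    'F1-conspiracy': partial(binary_f1, pos_label='CONSPIRACY'),
    'F1-macro': macro_f1,
}
```

Binding the label with functools.partial keeps every entry callable as f(cls_tst, cls_pred), which matches how the runner's score loop in the traceback calls its score functions.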
