toloka / crowd-kit

Control the quality of your labeled data with the Python tools you already know.

Home Page: https://crowd-kit.readthedocs.io/

License: Other

Python 100.00%
data-science data-mining crowd toloka labeling annotation aggregations python crowdsourcing quality-control

crowd-kit's Introduction

Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with aggregate;
  • loaders for popular crowdsourced datasets.

Also, the learning subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.

Installing

To install Crowd-Kit, run the following command: pip install crowd-kit. If you also want to use the learning subpackage, type pip install crowd-kit[learning].

If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies: pipenv install --dev. We use pytest for testing.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical Responses

Method Status
Majority Vote ✅
One-coin Dawid-Skene ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
KOS ✅
MACE ✅

Multi-Label Responses

Method Status
Binary Relevance ✅

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅

Learning from Crowds

Method Status
CrowdLayer ✅
CoNAL ✅

Citation

@article{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2024},
  journal   = {Journal of Open Source Software},
  volume    = {9},
  number    = {96},
  pages     = {6227},
  publisher = {The Open Journal},
  doi       = {10.21105/joss.06227},
  issn      = {2475-9066},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  language  = {english},
}

Support and Contributions

Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub according to the guidelines in CONTRIBUTING.md.

License

© Crowd-Kit team authors, 2020–2024. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

crowd-kit's People

Contributors

alexandervnuchkov, alexdremov, alexdrydew, aliskin, arcadia-devtools, denaxen, dependabot[bot], drhf, dustalov, losik, natalyl3, ortemij, pavelgein, pilot7747, senarect, shadchin, shenxiangzhuang, tulinev, varfolomeii, yulian-gilyazev

crowd-kit's Issues

[BUG] Wrong columns names in crowd speech dataset

Observed behavior

I was using the CrowdSpeech dataset from Crowd-Kit and wanted to apply some aggregation methods. It turned out that fit_predict works only with columns named 'task', 'worker', 'output', but in this dataset the columns are named 'task', 'performer', 'text'. So I got an error saying that worker and output were not in the index.

Expected behavior

Nikita Pavlichenko suggested that I create this issue so that the column names in the dataset get changed.

Python Version

3.7

Crowd-Kit Version

1.0.0

Other Packages Versions

No response

Example code

from crowdkit.datasets import load_dataset
from crowdkit.aggregation import TextHRRASA
from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer('all-mpnet-base-v2')
hrrasa = TextHRRASA(encoder=encoder.encode)
df, gt = load_dataset('crowdspeech-test-clean')
df['text'] = df['text'].apply(lambda s: s.lower())
result = hrrasa.fit_predict(df)

Relevant log output

No response

[BUG] Mypy check error in newest pandas-stubs version

Observed behavior

I ran the CI on my cloned repo and got the following error with Python >= 3.9:

Run mypy crowdkit tests
crowdkit/aggregation/classification/glad.py:160: error: Unexpected keyword argument "copy" for "merge"  [call-arg]
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas-stubs/core/reshape/merge.pyi:22: note: "merge" defined here
Found 1 error in 1 file (checked 79 source files)
Error: Process completed with exit code 1.

I looked up the changelog of the pandas-stubs package and found that the copy parameter was removed: https://github.com/pandas-dev/pandas-stubs/pull/904/files#diff-8867b41003793df20a65f8c9bf8b4085caf372f603165a8fc0d9dad51ca37441

According to the latest pandas documentation, maybe we can just remove the use of copy here. I will create a PR for this later; feel free to give your suggestions about this!

Expected behavior

No response

Python Version

3.9

Crowd-Kit Version

1.3.0.post0

Other Packages Versions

No response

Example code

Just trigger the CI; the mypy error will be raised.

Relevant log output

No response

import crowdkit [BUG]

Observed behavior


import crowdkit
# ...
    mmsr = crowdkit.aggregation.classification.m_msr.MMSR(
        n_iter=10000,
        tol=1e-10,
        n_workers=len(worker_to_id),
        n_tasks=len(st2_int),
        n_labels=2,  # Assuming binary responses
        workers_mapping=worker_to_id,
        tasks_mapping=task_to_id,
        labels_mapping=label_to_id,
    )

Exception has occurred: AttributeError
module 'crowdkit' has no attribute 'aggregation'
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 62, in assess_worker_responses
    mmsr = crowdkit.aggregation.classification.m_msr.MMSR(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 120, in statistical_analysis
    worker_skills = assess_worker_responses(binary_rank_df)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 378, in main
    aggregated_df = statistical_analysis(combined_df, args.network_models)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 381, in <module>
    main()
AttributeError: module 'crowdkit' has no attribute 'aggregation'

bugreport.py:

import crowdkit

def test_mmsr():
    try:
        mmsr = crowdkit.aggregation.classification.m_msr.MMSR
    except AttributeError as e:
        print(f"An error occurred: {e}")
    print('it worked!')

test_mmsr()

Expected behavior

MMSR constructor to be called.

Note this is how it is literally specified on the website, which should work if copied:
https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.m_msr.MMSR/

MMSR
crowdkit.aggregation.classification.m_msr.MMSR | [Source code](https://github.com/Toloka/crowd-kit/blob/v1.2.1/crowdkit/aggregation/classification/m_msr.py#L17)

MMSR(
    self,
    n_iter: int = 10000,
    tol: float = 1e-10,
    random_state: Optional[int] = 0,
    observation_matrix: ... = _Nothing.NOTHING,
    covariation_matrix: ... = _Nothing.NOTHING,
    n_common_tasks: ... = _Nothing.NOTHING,
    n_workers: int = 0,
    n_tasks: int = 0,
    n_labels: int = 0,
    labels_mapping: Dict[Any, int] = _Nothing.NOTHING,
    workers_mapping: Dict[Any, int] = _Nothing.NOTHING,
    tasks_mapping: Dict[Any, int] = _Nothing.NOTHING
)

The following does work, but the form shown in the report should work too!

from crowdkit.aggregation import MMSR

def test_mmsr():
    try:
        mmsr = MMSR
    except AttributeError as e:
        print(f"An error occurred: {e}")
    print('it worked!')

test_mmsr()

Thanks for giving this a look!

Python Version

3.11

Crowd-Kit Version

1.2.1

Other Packages Versions

athundt@MacBook-Pro m3c % pip freeze
aiohttp==3.8.6
aiohttp-retry==2.8.3
aiosignal==1.3.1
amqp==5.1.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
async-timeout==4.0.3
asyncssh==2.14.0
atpublic==4.0
attrs==23.1.0
billiard==4.1.0
blinker==1.7.0
boto3==1.28.82
botocore==1.31.82
celery==5.3.4
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.6
configobj==5.0.8
crowd-kit==1.2.1
cryptography==41.0.5
dictdiffer==0.9.0
diskcache==5.6.3
distro==1.8.0
docopt==0.6.2
dpath==2.1.6
dulwich==0.21.6
dvc==3.28.0
dvc-data==2.20.0
dvc-http==2.30.2
dvc-objects==1.1.0
dvc-render==0.6.0
dvc-studio-client==0.15.0
dvc-task==0.3.0
dvclive==3.2.0
entrypoints==0.4
filelock==3.13.1
Flask==3.0.0
flatten-dict==0.4.2
flufl.lock==7.1.1
frozenlist==1.4.0
fsspec==2023.10.0
funcy==2.0
gitdb==4.0.11
GitPython==3.1.40
grandalf==0.8
gto==1.5.0
huggingface-hub==0.17.3
hydra-core==1.3.2
idna==3.4
iterative-telemetry==0.0.8
itsdangerous==2.1.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
kombu==5.3.2
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
networkx==3.2.1
nltk==3.8.1
numpy==1.26.1
omegaconf==2.3.0
orjson==3.9.10
packaging==23.2
pandas==2.1.2
pathspec==0.11.2
pipreqs==0.4.13
platformdirs==3.11.0
prompt-toolkit==3.0.39
psutil==5.9.6
pycparser==2.21
pydantic==2.4.2
pydantic_core==2.10.1
pydot==1.4.2
pygit2==1.13.2
Pygments==2.16.1
pygtrie==2.5.0
pyparsing==3.1.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
rich==13.6.0
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
s3transfer==0.7.0
safetensors==0.4.0
scikit-learn==1.3.2
scipy==1.11.3
scmrepo==1.4.1
semver==3.0.2
shortuuid==1.0.11
shtab==1.6.4
six==1.16.0
smmap==5.0.1
sqltrie==0.8.0
sympy==1.12
tabulate==0.9.0
threadpoolctl==3.2.0
tokenizers==0.14.1
tomlkit==0.12.2
torch==2.1.0
tqdm==4.66.1
transformers==4.35.0
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.7
vine==5.0.0
voluptuous==0.13.1
wcwidth==0.2.9
Werkzeug==3.0.1
yarg==0.1.9
yarl==1.9.2
zc.lockfile==3.0.post1

Example code

import crowdkit

def test_mmsr():
    try:
        mmsr = crowdkit.aggregation.classification.m_msr.MMSR
    except AttributeError as e:
        print(f"An error occurred: {e}")

test_mmsr()

Relevant log output

An error occurred: module 'crowdkit' has no attribute 'aggregation'
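
The AttributeError in this report is consistent with general Python import behavior: importing a package runs only its __init__.py, and a submodule becomes an attribute of the package only after something imports it. The sketch below reproduces that behavior with a throwaway package; demo_pkg and sub are hypothetical names, and the sketch does not check what any particular crowdkit version does in its own __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: demo_pkg/__init__.py (empty) and demo_pkg/sub.py.
# "demo_pkg" and "sub" are hypothetical names used only for this illustration.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "sub.py").write_text("VALUE = 42\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_pkg

# The parent package is imported, but nothing has imported its submodule yet.
before = hasattr(demo_pkg, "sub")

# Importing the submodule explicitly binds it as an attribute of the parent.
importlib.import_module("demo_pkg.sub")
after = hasattr(demo_pkg, "sub")

print(before, after)
```

Applied to the report, the explicit form would be import crowdkit.aggregation, or the from crowdkit.aggregation import MMSR form that the reporter confirmed as working.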

Cannot use integer series in input data

When I use a matrix like

data = pd.DataFrame(
        [
            [1, 1, 0],
            [1, 2, 1],
            [1, 4, 1],
            [1, 5, 0],

            [2, 1, 1],
            [2, 2, 1],
            [2, 3, 1],
            [2, 4, 0],
            [2, 5, 0],

            [3, 1, 1],
            [3, 2, 0],
            [3, 3, 0],
            [3, 4, 1],
            [3, 5, 0],

            [4, 1, 1],
            [4, 2, 1],
            [4, 3, 1],
            [4, 4, 1],
            [4, 5, 1],

            [5, 1, 1],
            [5, 2, 0],
            [5, 3, 0],
            [5, 4, 0],
            [5, 5, 0],
        ],
        columns=['task', 'performer', 'label']
    )

and try

DawidSkene(n_iter=100).fit_predict(data)

then I get

Traceback (most recent call last):

  File "<input>", line 1, in <module>
  File "Redacted:\crowdkit\aggregation\dawid_skene.py", line 112, in fit_predict
    return self.fit(data).labels_
  File "Redacted:\crowdkit\aggregation\dawid_skene.py", line 94, in fit
    probas = self._e_step(data, priors, errors)
  File "Redacted:\crowdkit\aggregation\dawid_skene.py", line 62, in _e_step
    joined = data.join(np.log2(errors), on=['performer', 'label'])
  File "Redacted:\pandas\core\frame.py", line 8110, in join
    return self._join_compat(
  File "Redacted:\pandas\core\frame.py", line 8135, in _join_compat
    return merge(
  File "Redacted:\pandas\core\reshape\merge.py", line 89, in merge
    return op.get_result()
  File "Redacted:\pandas\core\reshape\merge.py", line 686, in get_result
    llabels, rlabels = _items_overlap_with_suffix(
  File "Redacted:\pandas\core\reshape\merge.py", line 2178, in _items_overlap_with_suffix
    raise ValueError(f"columns overlap but no suffix specified: {to_rename}")
ValueError: columns overlap but no suffix specified: Index(['task'], dtype='object')

When I convert all the series to strings (object dtype), it works.

[FEATURE] Make the DS algorithm take information from gold standard questions

Problem description

In the real world, when doing truth inference, we have a small set of gold standard questions mixed in with other unlabeled questions. I thought it would help if we could use the true labels of these gold standard questions to adjust the estimation of the user confusion matrix and the prior distribution of the options.

Feature description

Make the DS algorithm accept an optional argument, gt for example, which can be used to adjust the estimation of each user's confusion matrix and the prior distribution of the options.

Potential alternatives

No response

Papers connected with feature

No response

Additional information

This feature may make the EM process in DS more complicated if we add the argument and make some changes directly on the original implementation. Maybe there are better ways to do this?
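
One way to picture the request is a Dawid-Skene EM loop in which tasks with a known answer have their posterior clamped to that answer, so the confusion matrices and priors in the M-step are estimated with the gold labels counted as certain. The following is a minimal pure-Python sketch under that assumption. It is not Crowd-Kit's implementation or API; the function name, the gold argument, and the smoothing constant are all hypothetical.

```python
from collections import defaultdict


def dawid_skene_with_gold(records, gold=None, n_iter=50, smoothing=0.01):
    """records: (task, worker, label) triples; gold: dict task -> known true label."""
    records = list(records)
    gold = gold or {}
    labels = sorted({label for _, _, label in records} | set(gold.values()))
    workers = sorted({worker for _, worker, _ in records})
    by_task = defaultdict(list)
    for task, worker, label in records:
        by_task[task].append((worker, label))

    def clamp(task, probs):
        # Gold tasks keep a one-hot posterior instead of the estimated one.
        if task in gold:
            return {l: 1.0 if l == gold[task] else 0.0 for l in labels}
        return probs

    # Initialise posteriors from normalised vote counts.
    posteriors = {}
    for task, votes in by_task.items():
        counts = {l: 0.0 for l in labels}
        for _, label in votes:
            counts[label] += 1.0
        total = sum(counts.values())
        posteriors[task] = clamp(task, {l: c / total for l, c in counts.items()})

    for _ in range(n_iter):
        # M-step: label priors and per-worker confusion matrices.
        priors = {l: sum(p[l] for p in posteriors.values()) / len(posteriors)
                  for l in labels}
        confusion = {w: {t: {o: smoothing for o in labels} for t in labels}
                     for w in workers}
        for task, votes in by_task.items():
            for worker, observed in votes:
                for true in labels:
                    confusion[worker][true][observed] += posteriors[task][true]
        for w in workers:
            for true in labels:
                row_total = sum(confusion[w][true].values())
                for o in labels:
                    confusion[w][true][o] /= row_total

        # E-step: recompute posteriors, then re-clamp the gold tasks.
        for task, votes in by_task.items():
            scores = {}
            for true in labels:
                score = priors[true]
                for worker, observed in votes:
                    score *= confusion[worker][true][observed]
                scores[true] = score
            total = sum(scores.values())
            posteriors[task] = clamp(task, {l: s / total for l, s in scores.items()})

    return {task: max(labels, key=lambda l: p[l]) for task, p in posteriors.items()}
```

Clamping keeps the EM structure unchanged: only the posterior of a gold task is overridden, which speaks to the concern about complicating the original implementation. Whether this is the best design for the library is left open here.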

[BUG] The ROVER from crowd-kit expects wrong column name

Observed behavior

I was using ROVER for aggregating textual responses and found out that ROVER expects a column named 'text'. This is inconsistent, because analogous aggregators such as TextRASA and TextHRRASA expect a column named 'output'.

Expected behavior

I suggest unifying the functions' input, for example by using the name 'output' everywhere.

Python Version

3.7

Crowd-Kit Version

1.0.0

Other Packages Versions

No response

Example code

from crowdkit.datasets import load_dataset  # load_dataset lives in crowdkit.datasets
from crowdkit.aggregation import ROVER

df, gt = load_dataset('crowdspeech-test-clean')
df['text'] = df['text'].str.lower()
# Rename to the column names that TextRASA and TextHRRASA expect.
# Per this report, ROVER in 1.0.0 still looks for a 'text' column.
df = df.rename(columns={'performer': 'worker', 'text': 'output'})

tokenizer = lambda s: s.split(' ')
detokenizer = lambda tokens: ' '.join(tokens)
result = ROVER(tokenizer, detokenizer).fit_predict(df)

Relevant log output

No response

How to deal with features in pairwise comparison models?

I am working on a dataset of ATP (Association of Tennis Professionals, men only) tennis matches spanning several years. I want to predict match outcomes, and one way to do that is the Bradley-Terry model, which is a probability model. How should I do feature selection, feature engineering (not the domain-knowledge kind), or any preprocessing that must be applied before training the model?
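
For context on the question above: in its basic form, the Bradley-Terry model has one strength parameter per player and is fit from win/loss outcomes alone, so the only preprocessing it strictly needs is reshaping matches into (winner, loser) pairs. The sketch below fits it with the standard minorization-maximization update in pure Python. It is an illustration with made-up function names, not Crowd-Kit's BradleyTerry class, and it does not cover extensions of the model that take covariates.

```python
from collections import defaultdict


def bradley_terry(matches, n_iter=200):
    """matches: list of (winner, loser) pairs. Returns player -> strength (sums to 1)."""
    players = sorted({p for match in matches for p in match})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in matches:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {p: 1.0 / len(players) for p in players}
    for _ in range(n_iter):
        updated = {}
        for p in players:
            # MM update: wins of p divided by sum over opponents of n_pq / (s_p + s_q).
            denom = 0.0
            for q in players:
                if q == p:
                    continue
                n_pq = pair_counts.get(frozenset((p, q)), 0)
                if n_pq:
                    denom += n_pq / (strength[p] + strength[q])
            updated[p] = wins[p] / denom if denom else strength[p]
        total = sum(updated.values())
        strength = {p: s / total for p, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Model probability that player a beats player b."""
    return strength[a] / (strength[a] + strength[b])
```

A caveat of this plain version: a player with no recorded wins ends up with strength 0, so sparse match histories may need regularization or pooling across seasons.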

Ordinal Labels

Is it possible to support aggregation of ordinal labels as a part of this toolkit via the following reduction algorithm?

  • Labels are categorical but have an ordering defined 1 < ... < K.
  • The K-class ordinal labels are transformed into K−1 sets of binary label data.
  • Each binary task is then aggregated via crowdkit to estimate Pr[yi > c] for c = 1,...,K−1.
  • The probability of the actual class values can then be obtained as Pr[yi = c] = Pr[yi > c−1 and yi ≤ c] = Pr[yi > c−1] − Pr[yi > c].
  • The class with the maximum probability is assigned to the instance.
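
The last three steps of the reduction above can be sketched in plain Python. The sketch assumes the K−1 binary aggregations have already produced the estimates Pr[yi > c]; the function names are hypothetical, and clipping negative differences to zero is an added assumption, since independently aggregated binary tasks need not give monotone estimates.

```python
def binarize(label, threshold):
    """Map an ordinal label to the binary question 'is label > threshold?'."""
    return int(label > threshold)


def class_probabilities(exceed_probs):
    """exceed_probs[c - 1] = Pr[y > c] for c = 1..K-1. Returns [Pr[y = 1], ..., Pr[y = K]]."""
    # Pad with Pr[y > 0] = 1 and Pr[y > K] = 0 so every class is a difference.
    padded = [1.0] + list(exceed_probs) + [0.0]
    # Pr[y = c] = Pr[y > c - 1] - Pr[y > c]; clip in case the estimates are not monotone.
    raw = [max(padded[c - 1] - padded[c], 0.0) for c in range(1, len(padded))]
    total = sum(raw)
    return [r / total for r in raw]


def predict_class(exceed_probs):
    """Return the 1-based class with the maximum probability."""
    probs = class_probabilities(exceed_probs)
    return max(range(len(probs)), key=probs.__getitem__) + 1
```

For K = 4 and estimates Pr[y > 1] = 0.9, Pr[y > 2] = 0.6, Pr[y > 3] = 0.1, the class probabilities are approximately 0.1, 0.3, 0.5, 0.1, so class 3 is assigned.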

Add MACE

Would it be possible to add MACE? It is often used in my field, but there is only a Java implementation, which is hard to integrate into Python projects.

[DOCS] ROVER Example snippet not working

Problem description

Hi,
I was trying to execute the code snippet provided as an example but it seems that the function is now in another castle.
This is the original snippet:

from crowdkit.aggregation import load_dataset
from crowdkit.aggregation import ROVER
df, gt = load_dataset('crowdspeech-test-clean')
df['text'] = df['text'].str.lower()
tokenizer = lambda s: s.split(' ')
detokenizer = lambda tokens: ' '.join(tokens)
result = ROVER(tokenizer, detokenizer).fit_predict(df)

and this is the same snippet with the first line corrected:

from crowdkit.datasets.load_dataset import load_dataset
from crowdkit.aggregation import ROVER
df, gt = load_dataset('crowdspeech-test-clean')
df['text'] = df['text'].str.lower()
tokenizer = lambda s: s.split(' ')
detokenizer = lambda tokens: ' '.join(tokens)
result = ROVER(tokenizer, detokenizer).fit_predict(df)

Thanks,
Marceau

Documentation links

https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.texts.rover.ROVER/
https://crowd-kit.readthedocs.io/en/latest/texts/#crowdkit.aggregation.texts.ROVER

Potential fix suggestion

I think that the first line from crowdkit.aggregation import load_dataset should be changed to from crowdkit.datasets.load_dataset import load_dataset (or from crowdkit.datasets import load_dataset)

[FEATURE] Unifying classification aggregators interfaces

Problem description

I wanted to test quality metrics of several different algorithms from crowdkit.aggregation.classification and found myself writing this kind of function:

def get_scores(model, data, fit=True):
    if fit:
        model.fit(data)
    probas = getattr(model, "probas_", None)
    if probas is not None:
        return probas
    predictor = getattr(model, "predict_score", None)
    if predictor is None:
        predictor = model.predict_proba
    return predictor(data)

That's because different models have different methods for retrieving scores. For example, MMSR has predict_score, while almost all others have predict_proba. Some have a probas_ field, while others don't.

This seems strange and inconsistent.

Feature description

Unify the naming of the predict_score functions and the presence of the probas_ field.

Renamed columns?

Hi,
the guide says

df = pd.read_csv('results.csv') # should contain columns: task, performer, label

but when I load this file, the second column is worker, not performer. I had used crowdkit with DataFrames that had the columns task, performer, label, but after an update it broke.
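
For files written with the older header, one workaround is to rewrite the header row before loading. The sketch below uses only the standard csv module. The performer to worker mapping follows this report and the current README; the helper name and the mapping constant are hypothetical.

```python
import csv
import io

# Legacy column names mapped to the names the current README lists.
LEGACY_TO_CURRENT = {"performer": "worker"}


def rename_columns(source, target, mapping=LEGACY_TO_CURRENT):
    """Copy CSV rows from source to target, renaming legacy header names."""
    reader = csv.reader(source)
    writer = csv.writer(target)
    header = next(reader)
    writer.writerow([mapping.get(name, name) for name in header])
    writer.writerows(reader)


# Usage with in-memory files; real code would open files with newline="".
legacy = io.StringIO("task,performer,label\n1,a,x\n")
fixed = io.StringIO()
rename_columns(legacy, fixed)
print(fixed.getvalue())
```

With a DataFrame already in memory, df.rename(columns={'performer': 'worker'}), as used in the ROVER report in this thread, does the same job.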

[BUG] MajorityVote() doesn't return what is expected

Observed behavior

MajorityVote() doesn't return the label chosen by most workers.

Expected behavior

MajorityVote() should return the label chosen by most workers.

Python Version

3.11

Crowd-Kit Version

1.2.1

Other Packages Versions

No response

Example code

mv = MajorityVote()
resultmv = mv.fit_predict(df_crowd)

Relevant log output

No response
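
The report includes no data, so the cause cannot be determined from it. A plain reference count can help narrow such a case down by showing, per task, which label has the most votes and whether the top count is tied. The sketch below is a hypothetical helper in pure Python, not Crowd-Kit's MajorityVote, and it makes no claim about how the library breaks ties or weighs workers.

```python
from collections import Counter, defaultdict


def majority_vote(records):
    """records: (task, worker, label) triples. Returns task -> (top label, tied flag)."""
    votes = defaultdict(Counter)
    for task, _worker, label in records:
        votes[task][label] += 1

    result = {}
    for task, counter in votes.items():
        ranked = counter.most_common()
        top_label, top_count = ranked[0]
        # A tie means at least one other label shares the top count.
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result[task] = (top_label, tied)
    return result
```

Tasks flagged as tied are the first place to look when two implementations disagree, since any label with the top count is a valid majority answer there.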
