
POLIcy design ANNotAtions (POLIANNA)

This repository contains the code for the journal article Sewerin, S., Kaack, L.H., Küttel, J. et al. Towards understanding policy design through text-as-data approaches: The policy design annotations (POLIANNA) dataset. Sci Data 10, 896 (2023). https://doi.org/10.1038/s41597-023-02801-z.

This README is not intended to be self-explanatory, but rather should be read together with the original paper. Below we give an overview of the project and how to use this repository.

Paper abstract

Despite the importance of ambitious policy action for addressing climate change, large and systematic assessments of public policies and their design are lacking as analysing text manually is labour-intensive and costly. POLIANNA is a dataset of policy texts from the European Union (EU) that are annotated based on theoretical concepts of policy design, which can be used to develop supervised machine learning approaches for scaling policy analysis. The dataset consists of 20,577 annotated spans, drawn from 18 EU climate change mitigation and renewable energy policies. We developed a novel coding scheme translating existing taxonomies of policy design elements to a method for annotating text spans that consist of one or several words. Here, we provide the coding scheme, a description of the annotated corpus, and an analysis of inter-annotator agreement, and discuss potential applications. As understanding policy texts is still difficult for current text-processing algorithms, we envision this database to be used for building tools that help with manual coding of policy texts by automatically proposing paragraphs containing relevant information.

Installing dependencies

Conda

conda env create -f environment.yml

Then install the pygamma-agreement package [1] manually with pip:

conda activate POLIANNA
pip install git+https://github.com/bootphon/pygamma-agreement.git

Data

The annotated corpus is publicly available at https://doi.org/10.5281/zenodo.8284380. The published folder contains a file with the data preprocessed as a dataframe, which is the form this repository works with. If you wish to process data from scratch, for example to add your own annotations, we also provide a way to process data exported from Inception [2].

Processed data

Download this folder and store the subfolder 02_processed_to_dataframe under the directory /data/02_processed_to_dataframe.
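Once the folder is in place, the preprocessed dataframe can be loaded with pandas. The following is a minimal sketch, assuming the folder holds a pickle or CSV file; the helper name load_processed and the file name in the usage comment are hypothetical and not part of the repository.

```python
from pathlib import Path

import pandas as pd


def load_processed(path):
    """Load a preprocessed dataframe from a .pkl or .csv file (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".pkl":
        # Only unpickle files that come from a source you trust.
        return pd.read_pickle(path)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")


# Usage (the file name is an assumption; check the downloaded folder for the real one):
# df = load_processed("data/02_processed_to_dataframe/preprocessed_dataframe.pkl")
# print(df.head())
```

The tutorial notebook remains the reference for how the repository itself loads the data.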

Processing from Inception

Store the path to your raw data folder in the RAW_DATA_PATH variable in the definitions.py file. Note that the data published at the link above is not the raw Inception export, which is not anonymous, but a version converted into a generally usable format.

Process the raw data by running the src/d01_data/load_data.py script. The cleaned corpus is then written to data/d02_processed_to_dataframe in both pkl and csv format. The script can anonymize the data with the following flag:

python load_data.py --anonymous_annotators=1
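How load_data.py handles this flag internally is not described here. As a rough illustration only, the sketch below shows one way a flag with this name could be parsed and how annotator names could be replaced by stable pseudonyms. The functions build_parser and anonymize, the pseudonym format, and the sample names are all assumptions.

```python
import argparse


def build_parser():
    """Parser with a flag named like the one above (the internals are an assumption)."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument("--anonymous_annotators", type=int, choices=[0, 1], default=0)
    return parser


def anonymize(annotators):
    """Map each distinct name to a stable pseudonym, in order of first appearance."""
    mapping = {}
    for name in annotators:
        if name not in mapping:
            mapping[name] = f"annotator_{len(mapping) + 1}"
    return [mapping[name] for name in annotators]


args = build_parser().parse_args(["--anonymous_annotators=1"])
names = ["alice", "bob", "alice"]  # made-up names
if args.anonymous_annotators:
    names = anonymize(names)
print(names)  # ['annotator_1', 'annotator_2', 'annotator_1']
```

Keeping the mapping stable means the same annotator receives the same pseudonym everywhere, so per-annotator statistics remain meaningful after anonymization.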

Corpus Class

The Corpus class in src/d02_corpus_statistics/corpus.py contains the functions for calculating corpus statistics such as annotation counts and tag frequencies. A working example of all the functions can be found in the Jupyter Notebook notebooks/Tutorial.ipynb.
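To give a sense of what such statistics look like, the sketch below counts tags over a handful of made-up spans. It does not use the Corpus class, whose interface is documented in the tutorial notebook; the record fields, tag names, and function names here are assumptions chosen for illustration.

```python
from collections import Counter

# Made-up spans; field names and tag names are placeholders, not the coding scheme.
spans = [
    {"article": "doc1_art1", "annotator": "annotator_1", "tag": "tag_a"},
    {"article": "doc1_art1", "annotator": "annotator_2", "tag": "tag_a"},
    {"article": "doc1_art2", "annotator": "annotator_1", "tag": "tag_b"},
    {"article": "doc1_art2", "annotator": "annotator_1", "tag": "tag_a"},
]


def tag_counts(records):
    """Number of annotated spans per tag."""
    return Counter(record["tag"] for record in records)


def tag_frequencies(records):
    """Share of all annotated spans that carry each tag."""
    counts = tag_counts(records)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


print(tag_counts(spans))       # Counter({'tag_a': 3, 'tag_b': 1})
print(tag_frequencies(spans))  # {'tag_a': 0.75, 'tag_b': 0.25}
```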

Inter-Annotator Agreement Class

The Inter-Annotator Agreement class in src/d03_inter_annotator_agreement contains the functions for calculating inter-annotator agreement scores. A working example of all the functions can be found in the Jupyter Notebook notebooks/Inter_Annotator_Agreement.ipynb.
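As a simplified illustration of span-level agreement, the sketch below computes an exact-match F1 score between two annotators. This is a deliberately simpler stand-in and not the gamma measure provided by pygamma-agreement, nor the repository's own implementation; the span representation (start, stop, tag) and the function name are assumptions.

```python
def span_f1(spans_a, spans_b):
    """Exact-match F1 between two annotators' spans, given as (start, stop, tag) tuples."""
    set_a, set_b = set(spans_a), set(spans_b)
    if not set_a and not set_b:
        # Two empty annotations agree trivially.
        return 1.0
    matches = len(set_a & set_b)
    return 2 * matches / (len(set_a) + len(set_b))


# Made-up annotations of the same text by two annotators.
annotator_1 = [(0, 5, "tag_a"), (10, 20, "tag_b")]
annotator_2 = [(0, 5, "tag_a"), (10, 18, "tag_b"), (30, 35, "tag_a")]

print(span_f1(annotator_1, annotator_2))  # 0.4
```

Exact matching penalizes spans that differ by a single character as heavily as spans that do not overlap at all, which is one reason to prefer measures that account for partial overlap when comparing span annotations.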

Coding scheme

A file containing the coding scheme in JSON format can be found in the provided dataset at 01_policy_info/Coding_Scheme.json. Store this under the data directory as well.
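The coding scheme can be read with the standard json module. The sketch below makes no assumption about the internal structure of the file and only lists its top-level entries; the helper names and the path in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def load_coding_scheme(path):
    """Parse the coding scheme JSON file and return the resulting Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def top_level_entries(scheme):
    """Top-level keys of a JSON object, or indices of a JSON array."""
    if isinstance(scheme, dict):
        return list(scheme.keys())
    if isinstance(scheme, list):
        return list(range(len(scheme)))
    return []


# Usage (the location under the data directory is an assumption):
# scheme = load_coding_scheme("data/01_policy_info/Coding_Scheme.json")
# print(top_level_entries(scheme))
```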

Getting started

The Jupyter Notebook notebooks/Tutorial.ipynb introduces the classes and gives simple examples of how to work with the data.

Analysis

The notebook notebooks/Analysis.ipynb contains the descriptive analysis of the data and figures used in the paper.

Tests and more examples

Test notebooks can be found under notebooks/test_notebooks, and more examples (such as a sentence-wise implementation) can be found under notebooks/other.

Other scripts to process your own data

For the labeled data, we have split the EU laws into articles. We provide the scripts to download EU laws as text_processing/download_searches.py and to split those into articles as text_processing/process_text.ipynb.
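The idea of splitting a law into articles can be sketched with a regular expression. This is not the logic of process_text.ipynb; it assumes that each article starts with a heading of the form "Article N" on its own line, and the pattern, function name, and sample text are illustrative.

```python
import re

# Assumed heading format: "Article 12" or "Article 12a" alone on a line.
ARTICLE_HEADING = re.compile(r"^Article[ \t]+(\d+[a-z]?)[ \t]*$", re.MULTILINE)


def split_into_articles(text):
    """Return a dict mapping article number to the text that follows its heading."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = {}
    for i, match in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles[match.group(1)] = text[match.end():end].strip()
    return articles


sample = "Preamble text\nArticle 1\nSubject matter\nArticle 2\nDefinitions\n"
print(split_into_articles(sample))  # {'1': 'Subject matter', '2': 'Definitions'}
```

Text before the first heading, such as a preamble, is dropped by this sketch, and cross-references like "see Article 3" inside a sentence are not treated as headings because the pattern must match a whole line.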

References

Footnotes

  1. https://pygamma-agreement.readthedocs.io/en/latest/

  2. https://github.com/inception-project

