mediabias_x_chatgpt's Introduction

Media Bias x ChatGPT

An ML model and app for detecting bias in media and AI-generated content on a set of topics.

This repository explores approaches to classifying sentences sourced from various media outlets by topic, bias and political bias, using OpenAI ADA embeddings. It additionally uses the model trained on human-labeled data to explore the political bias of content generated by ChatGPT.

Dependencies

The code in this repository utilizes the following packages:

  • numpy
  • pandas
  • matplotlib
  • scipy
  • plotly
  • seaborn
  • scikit-learn
  • umap-learn
  • openai
  • tiktoken

The accompanying web app additionally depends on streamlit, which was used to build it and is necessary to run it locally.

Data and Modeling

The data used for training the machine learning models was obtained from the BABE dataset on Kaggle. I used the largest dataset of sentences labeled by human experts (SG2). Approaches to topic, bias and outlet bias classification were explored with sklearn and I found that:

  • a neighbors-based (distance-based) classifier works best for topic classification due to the nature of ADA embeddings
  • LogisticRegression was the best classifier for bias, closely followed by RandomForest and MLP classifiers
  • an MLP classifier performed best for the political (outlet) bias prediction of the sentences.

Additionally, all model hyperparameters were tuned using a grid search with 5-fold cross-validation. Because the classes in the dataset are unbalanced, particularly the topic labels, the weighted F1 score was used as the main metric to assess model performance across the board.
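To make the choice of metric concrete, here is a minimal pure-Python sketch of a support-weighted F1 score, the quantity scikit-learn reports under the scoring name `f1_weighted`. The topic labels and predictions below are made up for illustration and are not taken from the BABE data; the function names are hypothetical helpers, not code from this repository.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 score for a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_label(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, shown for comparison."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical, deliberately unbalanced toy labels (8 vs. 2 examples).
y_true = ["economy"] * 8 + ["sport"] * 2
y_pred = ["economy"] * 8 + ["economy", "sport"]

print("weighted F1:", weighted_f1(y_true, y_pred))
print("macro F1:   ", macro_f1(y_true, y_pred))
```

With unbalanced labels the two averages differ: the weighted score lets each class contribute in proportion to how many examples it has, while the macro score counts every class equally.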

Repository structure

The Project.ipynb notebook in the top-level directory covers the entire process of this project with explanations and visualizations.

The notebooks/ directory contains all the weird and random steps I took in the process of data exploration and model selection / hyperparameter tuning, including many that didn't make it into the final project deliverable.

All data and models used can be found in the data/ and models/ directories, respectively.

The top-level .py files in the repository contain everything needed to run the streamlit app.

ChatGPT bias

The main question I tried to answer in the final deliverable of this project is whether content generated with ChatGPT is perceived as politically biased relative to content generally reported by media outlets. For this purpose, I prompted ChatGPT to produce several sentences on a small set of topics present in the training data. All content generated by ChatGPT was classified as non-biased, but in terms of political bias, a left-leaning classification proved to be more prevalent. The accompanying interactive app can be used to test any content with a valid OpenAI API key!

Running the app locally

Clone this repo and install the dependencies with

pip install -r requirements.txt

Run the app with

streamlit run app.py

If you have an OpenAI API key and want to try out the content analyzer, all you need to do is create a new file called .env in the app directory and store your API key in it as:

OPENAI_API_KEY = 'your-api-key-here'

The app will then read the key from your local environment and open the content analyzer section!
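For readers curious how a .env entry like the one above can end up in the environment, here is a minimal standard-library sketch. It is an illustration only: this README does not show how the app actually loads the key, and the helper names `parse_env_line`, `load_env_file` and `analyzer_enabled` are hypothetical. Packages such as python-dotenv handle the same job more robustly.

```python
import os


def parse_env_line(line):
    """Parse one KEY = 'value' line; return (key, value), or None for blanks and comments."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    # Drop surrounding whitespace and optional single or double quotes.
    return key.strip(), value.strip().strip("'\"")


def load_env_file(path=".env"):
    """Copy entries from a .env file into os.environ without overwriting existing values."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            parsed = parse_env_line(raw_line)
            if parsed is not None:
                os.environ.setdefault(*parsed)


def analyzer_enabled():
    """Hypothetical gate: only enable the content analyzer when a key is present."""
    return bool(os.environ.get("OPENAI_API_KEY"))


load_env_file()  # does nothing if there is no .env file in the working directory
print("content analyzer enabled:", analyzer_enabled())
```

Keeping the key in a local .env file, rather than in the code, also makes it easier to avoid committing it to the repository by accident.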

Live App

Some of the interactive functionality, like exploring the data and fitting different models, is available in a streamlit app. Unfortunately, the content analyzer can only run locally for the time being due to the OpenAI API key restrictions.

mediabias_x_chatgpt's People

Contributors

gecheline
