
Multi-source Weak Social Supervision for Fake News Detection

Authors: Guoqing Zheng ([email protected]), Yichuan Li ([email protected]), Kai Shu ([email protected])

This repository contains code for fake news detection with Multi-source Weak Social Supervision (MWSS), from the ECML-PKDD 2020 paper Early Detection of Fake News with Multi-source Weak Social Supervision.

Model Structure

The structure of the MWSS model: [model structure figure]

The training process of the MWSS model: [training process figure]

Requirements

torch 1.x

transformers 2.4.0
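
Starting from a fresh environment, something along the lines of pip install torch transformers==2.4.0 should work; the exact torch 1.x build to install depends on your Python and CUDA versions.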

Prepare dataset

The weakly labeled dataset has one text column followed by K weak-label columns, one per weak source:

news Weak_Label_1 Weak_Label_2 ... Weak_Label_K
Hello this is MWSS readme 1 0 ... 1

The clean (gold) labeled dataset has a text column and a single label column:

news label
Hello this is MWSS readme 1
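
For a concrete illustration, here is a minimal sketch that writes toy files in these two formats with pandas. The file names, CSV layout, and directory structure are assumptions for illustration only; check the loading code in run_classifiy.py and the expected layout of --train_path for the exact conventions.

    import pandas as pd

    # Toy weakly labeled set: one text column plus K weak-label columns
    # (K = 3 here, matching --multi_head 3). Names are illustrative.
    weak = pd.DataFrame({
        "news": ["Hello this is MWSS readme", "Another example headline"],
        "Weak_Label_1": [1, 0],
        "Weak_Label_2": [0, 1],
        "Weak_Label_3": [1, 1],
    })
    weak.to_csv("./data/political/weak_train.csv", index=False)  # hypothetical path

    # Toy clean (gold) labeled set: text plus a single binary label column.
    clean = pd.DataFrame({
        "news": ["Hello this is MWSS readme"],
        "label": [1],
    })
    clean.to_csv("./data/political/test.csv", index=False)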

Usage

a. --train_type selects the training data: {0: "clean", 1: "noise", 2: "clean+noise"}.

b. --meta_learn learns an instance weight for each noisy sample via meta-learning (a minimal sketch of the idea follows this list).

c. --multi_head sets the number of weak sources; if you have three different weak sources, set it to 3.

d. --group_opt selects the optimizer for the group weights; you can choose Adam or SGD.

e. --gold_ratio is the float fraction of clean (gold) training data. The default 0 runs all the ratios [0.02, 0.04, 0.06, 0.08, 0.1]; to use a single ratio such as 0.02, pass --gold_ratio 0.02.

  • Finetune on RoBERTa Group Weight

    python3 run_classifiy.py \
    --model_name_or_path roberta-base \
    --evaluate_during_training --do_train --do_eval \
    --num_train_epochs 15 \
    --output_dir ./output/ \
    --logging_steps 100 \
    --max_seq_length 256 \
    --train_type 0 \
    --per_gpu_eval_batch_size 16 \
    --g_train_batch_size 5 \
    --s_train_batch_size 5 \
    --clf_model "robert" \
    --meta_learn \
    --weak_type "none" \
    --multi_head 3 \
    --use_group_net \
    --group_opt "adam" \
    --train_path "./data/political/weak" \
    --eval_path "./data/political/test.csv" \
    --fp16 \
    --fp16_opt_level O1 \
    --learning_rate 1e-4 \
    --group_adam_epsilon 1e-9 \
    --group_lr 1e-3 \
    --gold_ratio 0.04 \
    --id "ParameterGroup1"

The log information will be stored in

~/output

  • CNN Baseline Model

    python run_classifiy.py \
    --model_name_or_path distilbert-base-uncased \
    --evaluate_during_training --do_train --do_eval --do_lower_case \
    --num_train_epochs 30 \
    --output_dir ./output/ \
    --logging_steps 10 \
    --max_seq_length 256 \
    --train_type 0 \
    --weak_type most_vote \
    --per_gpu_train_batch_size 256 \
    --per_gpu_eval_batch_size 256 \
    --learning_rate 1e-3 \
    --clf_model cnn

  • CNN Instance Weight Model with multi classification heads

    python run_classifiy.py \
    --model_name_or_path distilbert-base-uncased \
    --evaluate_during_training --do_train --do_eval --do_lower_case \
    --num_train_epochs 256 \
    --output_dir ./output/ \
    --logging_steps 10 \
    --max_seq_length 256 \
    --train_type 0 \
    --per_gpu_eval_batch_size 256 \
    --g_train_batch_size 256 \
    --s_train_batch_size 256 \
    --learning_rate 1e-3 \
    --clf_model cnn \
    --meta_learn \
    --weak_type "none"

  • CNN group weight

    python run_classifiy.py \
    --model_name_or_path distilbert-base-uncased \
    --evaluate_during_training --do_train --do_eval --do_lower_case \
    --num_train_epochs 256 \
    --output_dir ./output/ \
    --logging_steps 10 \
    --max_seq_length 256 \
    --train_type 0 \
    --per_gpu_eval_batch_size 256 \
    --g_train_batch_size 256 \
    --s_train_batch_size 256 \
    --learning_rate 1e-3 \
    --clf_model cnn \
    --meta_learn \
    --weak_type "none" \
    --multi_head 3 \
    --use_group_weight \
    --group_opt "SGD" \
    --group_momentum 0.9 \
    --group_lr 1e-5

  • RoBERTa Baseline Model

    python run_classifiy.py \
    --model_name_or_path roberta-base \
    --evaluate_during_training \
    --do_train --do_eval --do_lower_case \
    --num_train_epochs 30 \
    --output_dir ./output/ \
    --logging_steps 10 \
    --max_seq_length 256 \
    --train_type 0 \
    --weak_type most_vote \
    --per_gpu_train_batch_size 16 \
    --per_gpu_eval_batch_size 16 \
    --learning_rate 5e-5 \
    --clf_model robert

  • RoBERTa Instance Weight with Multi Head Classification

    python run_classifiy.py \
    --model_name_or_path roberta-base \
    --evaluate_during_training --do_train --do_eval --do_lower_case \
    --num_train_epochs 256 \
    --output_dir ./output/ \
    --logging_steps 10 \
    --max_seq_length 256 \
    --per_gpu_eval_batch_size 16 \
    --g_train_batch_size 16 \
    --s_train_batch_size 16 \
    --learning_rate 5e-5 \
    --clf_model robert \
    --meta_learn \
    --weak_type "none" \
    --multi_head 3

  • RoBERTa Group Weight

    python run_classifiy.py \
    --model_name_or_path roberta-base \
    --evaluate_during_training --do_train --do_eval --do_lower_case \
    --num_train_epochs 256 \
    --output_dir ./output/ \
    --logging_steps 10 \
    --max_seq_length 256 \
    --per_gpu_eval_batch_size 16 \
    --g_train_batch_size 16 \
    --s_train_batch_size 16 \
    --learning_rate 5e-5 \
    --clf_model robert \
    --meta_learn \
    --weak_type "none" \
    --multi_head 3 \
    --use_group_weight \
    --group_opt "SGD" \
    --group_momentum 0.9 \
    --group_lr 1e-5

  • Hyperparameter Search for RoBERTa Group Weight

    python3 run_classifiy.py \
    --model_name_or_path roberta-base \
    --evaluate_during_training --do_train --do_eval \
    --num_train_epochs 15 \
    --output_dir ./output/ \
    --logging_steps 100 \
    --max_seq_length 256 \
    --train_type 0 \
    --per_gpu_eval_batch_size 16 \
    --g_train_batch_size 1 \
    --s_train_batch_size 1 \
    --clf_model "robert" \
    --meta_learn \
    --weak_type "none" \
    --multi_head 3 \
    --use_group_net \
    --group_opt "adam" \
    --train_path "./data/political/weak" \
    --eval_path "./data/political/test.csv" \
    --fp16 \
    --fp16_opt_level O1 \
    --learning_rate "1e-4,5e-4,1e-5,5e-5" \
    --group_adam_epsilon "1e-9, 1e-8, 5e-8" \
    --group_lr "1e-3,1e-4,3e-4,5e-4,1e-5,5e-5" \
    --gold_ratio 0.04

The log information will be stored in

~/ray_results/GoldRatio_{}_GroupNet

where {} is filled in with the gold ratio of the run.

You can run the following command to extract the best result, sorted by the average of accuracy and F1:

export LOG_FILE=~/ray_results/GoldRatio_{}_GroupNet
python read_json.py --file_name $LOG_FILE --save_dir ./output

In the meantime, you can visualize the logs with TensorBoard:

tensorboard --logdir $LOG_FILE
