machine_learning_project's Introduction

User Guide

Part1 : Code Guide

  1. We have two datasets: '100Votelevel_touse.dta' and '100CASELEVEL_Touse.dta'. '100Votelevel_touse.dta' is used to extract case information and judge information.

  2. Data Cleaning

  • Change the path to the original file '100Votelevel_touse.dta' in 'clean_data.py'.

  • In terminal, type in

    $ python clean_data.py
  • The output is 'votelevel_cleaned.csv'.

  3. Target Generation
  • Change the path to 'votelevel_cleaned.csv' in 'target_generation.py'.

  • To generate the target, which indicates whether two judges agree, type in

    python target_generation.py
  • The output is 'case_pro_list.p' and 'case_dict.p'.

  • Hand-coding part: 'case_pro_list.p' contains cases where the judges' names are badly formatted (about 1000 entries), so the target for these cases has to be coded by hand.

  • After hand-coding, type in

    python get_target_csv.py

    This produces the target file 'target.csv'.
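Before hand-coding, it can help to look at what the two output files contain. The following is a minimal sketch, assuming both '.p' files are standard Python pickle files and that they hold a list and a dict as their names suggest; the helper names `load_pickle` and `summarize` are hypothetical and not part of the project. If the files were written by Python 2, `pickle.load` may need an `encoding` argument such as `"latin1"`.

```python
import pickle


def load_pickle(path):
    """Load one pickled object from `path`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(obj, limit=5):
    """Return the container type name, its size, and the first few entries."""
    if isinstance(obj, dict):
        sample = list(obj.items())[:limit]
    else:
        sample = list(obj)[:limit]
    return type(obj).__name__, len(obj), sample


if __name__ == "__main__":
    # File names come from the guide; their internal layout is an assumption.
    for name in ("case_pro_list.p", "case_dict.p"):
        kind, size, sample = summarize(load_pickle(name))
        print(name, kind, size)
        for entry in sample:
            print("  ", entry)
```

Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.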

  4. MapReduce
  • Run the MapReduce code locally or on Hadoop to generate the dataset we need, in the format 'case_information + judge1's_information + judge2's_information + inter_information + target'.

    cat xxx.csv | python map_interact.py | sort -n | python reduce_interact.py > output1
    cat xxx.csv | python map_sit.py | sort -n | python reduce_sit.py > output2
    cat xxx.csv | python map_name_date.py | sort -n | python reduce_name_date.py > output3
  • Then, type in

    python af_mapreduce.py
  • The output is 'data_mapreduce_clean.csv'.
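The shell pipelines above follow the usual streaming pattern: a mapper emits key/value pairs, `sort` brings equal keys together, and a reducer processes each group. The sketch below imitates that flow in a single process. It is not the project's mapper or reducer: the input layout (an integer key in the first CSV field) and the function names are assumptions made for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit (key, value) pairs; the key is assumed to be the first CSV field."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, rest = line.partition(",")
        yield key, rest


def reducer(sorted_pairs):
    """Collect all values that share a key; the input must be sorted by key."""
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield key, [value for _, value in group]


def run_local(lines):
    """Mimic `cat file | mapper | sort -n | reducer` for integer keys."""
    pairs = sorted(mapper(lines), key=lambda pair: int(pair[0]))
    return list(reducer(pairs))


if __name__ == "__main__":
    demo = ["2,judgeB,1", "1,judgeA,0", "2,judgeC,1"]  # made-up rows
    for key, values in run_local(demo):
        print(key, values)
```

The reducer relies on sorted input, which is why the `sort` stage sits between the two scripts in the commands above.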

  5. Model Fitting
  • Change the path to the file '0504_normalize_data.csv' in 'fit_model.py'.

  • In terminal, type in

    python fit_model.py
  • Output: 'output_plot_auc.csv' for AUC plotting.
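For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustration only: how 'fit_model.py' computes AUC is not stated in this guide, and the function name `auc` and the 0/1 label encoding are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve by comparing every positive/negative pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels (1 = judges agree, 0 = disagree) and model scores.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of examples, so it suits small checks rather than the full dataset.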

  6. F-test
  • Change the path to the file '0504_normalize_data.csv' in 'f_test.py'.

  • In terminal, type in

    $ python f_test.py
  • Output: two lists: one contains the column names and the other contains the corresponding F-value for each column.
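As a rough illustration of what a per-column F-value measures, the sketch below computes the one-way ANOVA F statistic (between-class variance divided by within-class variance) for each feature column and returns the two lists described above. This is an assumption about the kind of test involved, not a description of 'f_test.py'; the names `f_value` and `f_test` and the dict-of-columns input are hypothetical.

```python
def f_value(values, labels):
    """One-way ANOVA F statistic of one feature across the label classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(float(value))
    n = sum(len(g) for g in groups.values())
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(sum(g) for g in groups.values()) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2
                  for label, g in groups.items())
    within = sum((x - means[label]) ** 2
                 for label, g in groups.items() for x in g)
    # Raises ZeroDivisionError if the feature is constant within every class.
    return (between / (k - 1)) / (within / (n - k))


def f_test(columns, labels):
    """Return (column names, F-values) for a dict mapping name -> values."""
    names = list(columns)
    return names, [f_value(columns[name], labels) for name in names]


if __name__ == "__main__":
    demo = {"feature_a": [1, 2, 3, 2, 3, 4]}  # made-up data
    print(f_test(demo, [0, 0, 0, 1, 1, 1]))
```

A larger F-value suggests the column separates the target classes more strongly.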

Part2 : Poster

  • The poster is made with LaTeX, using a template from ShareLaTeX. The final version of the poster is available.

Part3 : Report

machine_learning_project's People

Contributors

shanglanyu, jz2327, xc918
