brand-safety-project's Introduction

Brand Safety

Set up the project:

make setup

Installs the project's requirements.

Prepare data:

Step 1

make gen_links
  • Search for article links using 50 keywords per label (accident or ordinary)
  • Collect 10 news links for each keyword
  • Result: 500 accident links + 500 ordinary links
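
The counts above follow from 50 keywords × 10 links for each label. A minimal sketch of that collection loop is shown below. It assumes a hypothetical `search` callable that returns article URLs for a keyword; the search backend the project actually uses is not described here, so `fake_search` and its example.com URLs are stand-ins only.

```python
from typing import Callable, Dict, Iterable, List


def collect_links(
    keywords_by_label: Dict[str, Iterable[str]],
    search: Callable[[str], List[str]],
    per_keyword: int = 10,
) -> Dict[str, List[str]]:
    """Gather up to `per_keyword` links for every keyword of every label."""
    links: Dict[str, List[str]] = {}
    for label, keywords in keywords_by_label.items():
        collected: List[str] = []
        for keyword in keywords:
            # Keep only the first `per_keyword` results for this keyword.
            collected.extend(search(keyword)[:per_keyword])
        links[label] = collected
    return links


def fake_search(keyword: str) -> List[str]:
    """Stand-in for a real search backend (hypothetical URLs)."""
    return [f"https://example.com/{keyword}/{i}" for i in range(25)]


if __name__ == "__main__":
    keywords = {
        "accident": [f"accident_kw{i}" for i in range(50)],
        "ordinary": [f"ordinary_kw{i}" for i in range(50)],
    }
    result = collect_links(keywords, fake_search)
    # Number of links gathered per label.
    print({label: len(urls) for label, urls in result.items()})
```

Passing the search function in as an argument keeps the loop independent of any particular search service.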

Step 2

make scrape_pages
  • Crawl the 1000 links and fetch the content of each page
  • Extract the nouns from each page's content and save them as txt data

Step 3

make gen_csv
  • Choose the 3000 most common nouns for each label, giving 6000 nouns
  • Make a csv file with the 6000 nouns as field names and 1000 rows, one per page
  • Each row contains the frequencies of the 6000 nouns on that page
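
The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, a `label` column is added so a later training step has a target (the README does not say how labels are stored), and nouns that rank in the top list of both labels are kept only once, so the vocabulary can end up smaller than 2 × `n_per_label`.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def top_nouns(pages: List[List[str]], n: int) -> List[str]:
    """Return the `n` most common nouns across a list of pages."""
    totals: Counter = Counter()
    for nouns in pages:
        totals.update(nouns)
    return [noun for noun, _ in totals.most_common(n)]


def build_csv(pages_by_label: Dict[str, List[List[str]]], n_per_label: int) -> str:
    """Build csv text: one column per noun, one row of frequencies per page."""
    candidates: List[str] = []
    for pages in pages_by_label.values():
        candidates.extend(top_nouns(pages, n_per_label))
    # Assumption: drop duplicates while keeping first-seen order.
    vocabulary = list(dict.fromkeys(candidates))

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumption: a trailing `label` column holds the page's class.
    writer.writerow(vocabulary + ["label"])
    for label, pages in pages_by_label.items():
        for nouns in pages:
            counts = Counter(nouns)
            # A Counter returns 0 for nouns that do not occur on the page.
            writer.writerow([counts[noun] for noun in vocabulary] + [label])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {
        "accident": [["crash", "car", "crash"], ["fire", "car"]],
        "ordinary": [["recipe", "car"], ["recipe", "travel"]],
    }
    print(build_csv(sample, n_per_label=2))
```

With the real data, `n_per_label` would be 3000 and each label would contribute 500 pages.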

Train:

make train
  • Read the csv dataset from the previous stage
  • Train a random forest classifier on it

Predict the test set:

make predict_test_set
  • Apply the classifier to the test set and predict a label for each page
  • Print the confusion matrix
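
For readers unfamiliar with the confusion matrix, here is a small pure-Python sketch of how one is computed for the two labels. The function names and sample labels are illustrative assumptions; the project may well rely on a library routine instead. Rows correspond to the true label and columns to the predicted label.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(
    actual: Sequence[str], predicted: Sequence[str], labels: Sequence[str]
) -> List[List[int]]:
    """matrix[i][j] counts samples whose true label is labels[i]
    and whose predicted label is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(true, pred)] for pred in labels] for true in labels]


def accuracy(matrix: List[List[int]]) -> float:
    """Share of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    labels = ["accident", "ordinary"]
    actual = ["accident", "accident", "ordinary", "ordinary", "accident"]
    predicted = ["accident", "ordinary", "ordinary", "ordinary", "accident"]
    matrix = confusion_matrix(actual, predicted, labels)
    for label, row in zip(labels, matrix):
        print(label, row)
    print("accuracy:", accuracy(matrix))
```

In this sample, one accident page is misclassified as ordinary, which appears as the off-diagonal entry in the first row.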

Predict a specific url:

make predict_specific_url url=your_url
  • Pass in a url and get the result (accident or not)

Contributors:

limmiehoang
