frankgu3528 / naive-bayes-classifier-demo

A Naive Bayes Classifier Demo from Scratch

Python 100.00%

naive-bayes-classifier-demo's Introduction

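A Spam Email Classifier Based on Naive Bayes

Dataset

https://plg.uwaterloo.ca/cgi-bin/cgiwrap/gvcormac/foo06 Email data collected by the University of Waterloo. This code uses the Chinese email version (trec06c), which contains more than 60,000 Chinese emails. The index already labels each email as SPAM (junk mail) or HAM (normal mail).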
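As a rough illustration of how the labels in the index could be read, here is a minimal sketch. It assumes each index line has the form `<label> <relative path>` (for example `spam ../data/000/000`). That layout is an assumption to check against your copy of the dataset. The function name `parse_index` is hypothetical and is not taken from this repository.

```python
from typing import Iterable, List, Tuple


def parse_index(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse index lines of the ASSUMED form '<label> <path>'.

    Returns (label, path) pairs with the label lower-cased.
    Blank, malformed, or unknown-label lines are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # blank or malformed line
        label, path = parts[0].lower(), parts[1].strip()
        if label in ("spam", "ham"):
            entries.append((label, path))
    return entries


# Illustrative sample lines (not copied from the real index file).
sample = ["spam ../data/000/000", "ham ../data/000/001", ""]
entries = parse_index(sample)
```

With a real index file, the same function can be fed an open file handle, since iterating over a file yields its lines.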

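Usage

Download the dataset into the project directory and run PreData(SPAM).py to generate the preprocessed data directory. Put the text you want to classify in the text variable, then run train.py to get the prediction.

The first step, word segmentation, takes a long time. The preprocessed output of that step is already included under data/, so you can go straight to the second step.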
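For readers who want to see the idea behind the second step, below is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the code in train.py. The function names `train` and `predict`, the smoothing parameter `alpha`, and the toy training data are all assumptions made for illustration. The sketch also assumes the emails have already been segmented into word lists.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train(docs: List[Tuple[str, List[str]]]):
    """Fit on (label, tokens) pairs; return (log priors, word counts, vocab)."""
    class_docs: Counter = Counter()
    word_counts: Dict[str, Counter] = {}
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_docs.values())
    log_prior = {c: math.log(k / total_docs) for c, k in class_docs.items()}
    return log_prior, word_counts, vocab


def predict(model, tokens: List[str], alpha: float = 1.0) -> str:
    """Return the label with the highest log posterior score."""
    log_prior, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in log_prior.items():
        counts = word_counts[label]
        # Laplace smoothing: add alpha to every vocabulary word's count.
        denom = sum(counts.values()) + alpha * len(vocab)
        score = prior
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-made training data (hypothetical, already segmented).
docs = [
    ("spam", ["发票", "代开", "公司"]),
    ("spam", ["发票", "付款"]),
    ("ham", ["学习", "艺术"]),
    ("ham", ["留学生", "学习", "毛笔字"]),
]
model = train(docs)
spam_guess = predict(model, ["发票", "公司"])
ham_guess = predict(model, ["学习", "毛笔字"])
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long emails. Smoothing keeps a word that is absent from one class from driving that class's probability to zero.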

Demo

Testing with the samples below, text1 is classified as a normal email and text2 as spam, which matches the facts.

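# text1 (English gloss): "I am a foreign exchange student. I really like the
# traditional arts of **. I would like to learn calligraphy from you."
text1 = "我是一个外国的留学生\
        我很喜欢**的传统艺术\
        我想跟您学习毛笔字"
# text2 (English gloss): "Hello! We are 深圳宏易实业有限公司. We have branches in
# major cities nationwide and more input invoices than our monthly sales quota
# can absorb, so we can issue spare invoices for your company at a low rate,
# for bookkeeping and tax deduction. Payment can follow verification."
text2 = "您好！我深圳宏易实业有限公司。\
    公司在全国各地大城市设有分公司，因进项较多完成不了每月销售额度,\
    现我公司有多余的发票可以低点向贵公司代开，供贵公司财务做帐及抵扣,可以验证后付款。"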
