Homework2
http://stackoverflow.com/questions/38728366/pandas-cannot-load-data-csv-encoding-mystery
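It seems that the input file itself is the problem: it contains invalid byte sequences throughout, so it cannot be decoded cleanly when pandas loads it.
One workaround is to read the CSV file in binary mode, decode the bytes as GB18030 with `errors='replace'` (so undecodable bytes become the Unicode replacement character U+FFFD), and write the repaired text out as UTF-8.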
import codecs
from functools import partial

in_filename = '/Users/chengjun/github/cjc/data/try.txt'
out_filename = '/Users/chengjun/github/cjc/data/try4.txt'
chunksize = 100 * 1024 * 1024  # read 100 MB at a time

# Decode the raw bytes as GB18030, replacing undecodable bytes with U+FFFD,
# and write the result as UTF-8 text. The incremental decoder keeps a
# multi-byte character that straddles two chunks intact.
decoder = codecs.getincrementaldecoder('gb18030')(errors='replace')
with open(in_filename, 'rb') as in_file, \
        open(out_filename, 'w', encoding='utf-8') as out_file:
    for byte_fragment in iter(partial(in_file.read, chunksize), b''):
        out_file.write(decoder.decode(byte_fragment))
    out_file.write(decoder.decode(b'', final=True))  # flush any trailing partial bytes
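When the file is read in fixed-size chunks (the original snippet's commented-out `chunksize` line suggests 100 MB pieces), a two-byte GB18030 character can be split across two chunks. Decoding each chunk on its own then turns both halves into replacement characters, while an incremental decoder from Python's standard `codecs` module carries the partial bytes over to the next chunk. The sketch below contrasts the two; the helper names `decode_chunks` and `decode_chunks_naively` are hypothetical, chosen for illustration.

```python
import codecs


def decode_chunks(chunks, encoding='gb18030'):
    """Decode byte chunks, carrying partial multi-byte characters across chunk boundaries."""
    decoder = codecs.getincrementaldecoder(encoding)(errors='replace')
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b'', final=True))  # flush any leftover incomplete bytes
    return ''.join(parts)


def decode_chunks_naively(chunks, encoding='gb18030'):
    """Decode every chunk independently; a character split between chunks is lost."""
    return ''.join(chunk.decode(encoding, errors='replace') for chunk in chunks)


raw = '中文'.encode('gb18030')   # 4 bytes: 2 per character
chunks = [raw[:1], raw[1:]]      # split in the middle of the first character

print(decode_chunks(chunks))                    # 中文
print(decode_chunks_naively(chunks) == '中文')  # False
```

The incremental decoder only emits U+FFFD for bytes that are genuinely invalid, or for an incomplete sequence left over at the very end of the file.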
# Now read the repaired file into a dataframe
import pandas as pd
df = pd.read_csv(out_filename, sep=';')
df.head()
| | 公众号昵称 | 微信号 | 公众号类别 | 作者 | 发布位置 | 是否原创 | 标题 | 文章链接 | 摘要 | 正文 | ... | 更新时间 | Unnamed: 19 | Unnamed: 20 | Unnamed: 21 | Unnamed: 22 | Unnamed: 23 | Unnamed: 24 | Unnamed: 25 | Unnamed: 26 | Unnamed: 27 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | **政府网 | zhengfu | 政务 | NaN | 0 | 0 | 李克强“盯”住农民工欠薪:决不能让他们背井离乡流汗再流泪 | http://mp.weixin.qq.com/s?__biz=MzA4MDA0MzcwMA... | “农民工在外打工非常不易,决不能让他们背井离乡流汗再流泪!”李克强斩钉截铁地说。 | 丨来源:新京报新媒体鲁甸地震受灾群众甘永荣的一句话,让李克强总理的表情立刻凝重起来。“你打工... | ... | 2017-01-27 11:32:16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | **政府网 | zhengfu | 政务 | NaN | 0 | 0 | 总理对话农民工,问过哪些问题? | http://mp.weixin.qq.com/s?__biz=MzA4MDA0MzcwMA... | 总理考察活动时和农民工聊过什么话题?说过哪些话?**政府网为你一一梳理。 | 总理考察活动时和农民工聊过什么话题?说过哪些话?**政府网为你一一梳理。 总理和农民工聊过这... | ... | 2017-02-02 11:32:48 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | **政府网 | zhengfu | 政务 | NaN | 0 | 0 | 云南考察 \| 李克强:农民工欠薪问题必须反复抓、抓到底 | http://mp.weixin.qq.com/s?__biz=MzA4MDA0MzcwMA... | 李克强23日考察灾后重建的云南鲁甸,再三问询围拢人群,有没有没领到工资的农民工?现场陆续有人... | 李克强春节前重回鲁甸李克强23日重回云南鲁甸考察灾后重建。看到这里焕然一新的面貌,总理说,你... | ... | 2017-01-26 13:16:40 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | **政府网 | zhengfu | 政务 | NaN | 0 | 0 | 李克强:决不能让农民工的辛勤付出得不到回报 | http://mp.weixin.qq.com/s?__biz=MzA4MDA0MzcwMA... | 李克强:决不能让农民工的辛勤付出得不到回报 | 2月3日,春节后的首个工作日,国务院召开常务会议,其中议题之一便是部署建立解决农民工工资拖欠... | ... | 2017-02-07 11:38:30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | **政府网 | zhengfu | 政务 | NaN | 0 | 0 | 48小时!总理帮震区农民工“讨”回欠薪 | http://mp.weixin.qq.com/s?__biz=MzA4MDA0MzcwMA... | 48小时!总理帮震区农民工“讨”回欠薪 | 丨来源:新京报新媒体1月25日早上8点半,甘永荣的银行卡里打进来5.8万元。这是李克强总理帮... | ... | 2017-01-29 11:57:45 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 28 columns
df.shape
(1231, 28)
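Nine of the 28 columns are labelled `Unnamed: 19` to `Unnamed: 27`, which is the name pandas gives a column whose header cell is empty. In the five rows shown they hold only NaN, so they are plausibly produced by trailing `;` separators; that is an assumption, since only the first rows are visible. A minimal sketch on a small made-up sample that drops columns which are both unnamed and entirely empty, leaving real columns untouched even if they happen to contain NaN:

```python
import io

import pandas as pd

# Hypothetical sample: two real columns, each line ending in two extra ';' separators
sample = "title;author;;\nA;x;;\nB;y;;\n"
df = pd.read_csv(io.StringIO(sample), sep=';')
print(list(df.columns))  # ['title', 'author', 'Unnamed: 2', 'Unnamed: 3']

# Boolean mask over the columns: auto-generated "Unnamed: N" header AND no values at all
unnamed = df.columns.str.startswith('Unnamed')
all_empty = df.isna().all().to_numpy()
cleaned = df.loc[:, ~(unnamed & all_empty)]
print(list(cleaned.columns))  # ['title', 'author']
```

Requiring both conditions is deliberate: a column such as 作者, which is NaN in the rows shown above, is kept because it has a real header.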
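%matplotlib inline
import matplotlib.pyplot as plt

xi = [1, 2, 3, 4, 5]
y = [3, 5, 9, 13, 16]
# 'gs' draws green square markers without a connecting line
plt.plot(xi, y, 'gs')
# Raw strings keep the backslash in mathtext commands such as '\,' (thin space)
plt.xlabel(r'$x_i$', fontsize=20)
plt.ylabel(r'$y$', fontsize=20)
plt.title(r'$Scatter\,Plot$', fontsize=20)
plt.show()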
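Outside Jupyter, where the `%matplotlib inline` magic is not available, the same scatter plot can be drawn with matplotlib's object-oriented API. This sketch assumes the non-interactive Agg backend and a hypothetical output file name `scatter.png`; it is one way to write the plot, not the only one.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

xi = [1, 2, 3, 4, 5]
y = [3, 5, 9, 13, 16]

fig, ax = plt.subplots()
# Format string 'gs': colour 'g' (green) + marker 's' (square); no line style is given,
# so matplotlib draws markers only, which is what makes this a scatter plot
(line,) = ax.plot(xi, y, 'gs')
ax.set_xlabel(r'$x_i$', fontsize=20)
ax.set_ylabel(r'$y$', fontsize=20)
ax.set_title(r'$Scatter\,Plot$', fontsize=20)
fig.savefig('scatter.png')  # hypothetical output path
```

Holding on to `fig` and `ax` instead of relying on pyplot's implicit current figure makes it easier to add further subplots or save several figures from one script.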
from cjc.
- task 1
- task 2
- task 3