stonycat / ml-in-action-code-and-note

Machine Learning code in Python 3.x (a Python 3 cleanup of the code from Machine Learning in Action). Some notes about the practices are included (for reference only).

Home Page: http://blog.csdn.net/sinat_17196995

Python 100.00%
machine-learning python3

ml-in-action-code-and-note's Introduction

ML-in-Action-Code-and-Note

Machine Learning code in Python 3.x. Notes on the practices are linked by chapter:

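ch2: k-Nearest Neighbors
ch3: Decision Trees
ch4: Classifying with Probability Theory: Naive Bayes
ch5: Logistic Regression
ch6: Support Vector Machines
ch7: Improving Classification with the AdaBoost Meta-Algorithm
ch8: Predicting Numeric Values: Regression (Linear Regression)
ch9: Tree-Based Regression
ch10: K-Means Clustering
ch11: Association Analysis with the Apriori Algorithm
ch12: Efficiently Finding Frequent Itemsets with FP-growth
ch13: Simplifying Data with PCA
ch14: Simplifying Data with SVD
ch15: Big Data and MapReduce

General note: the code is mainly based on the source code provided with the book. @author: Peter Harrington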


ml-in-action-code-and-note's Issues

A runtime problem to point out

ch3

def getTreeDepth(myTree):
maxDepth=0
# The next three lines are the Python 3 replacement for the two commented-out lines
firstSides = list(myTree.keys())
firstStr = firstSides[0]
secondDict = myTree[firstStr]
#firstStr=myTree.keys()[0]
#secondDict=myTree[firstStr]  # get the label used for the split
for key in secondDict.keys():
if type(secondDict[key]) == dict:
thisDepth = 1 + getTreeDepth(secondDict[key])
else:
thisDepth = 1
if thisDepth > maxDepth: maxDepth = thisDepth
return maxDepth

In PyCharm it should be formatted as:

def getTreeDepth(myTree):
    maxDepth = 0
    # The next three lines are the Python 3 replacement for the two commented-out lines
    firstSides = list(myTree.keys())
    firstStr = firstSides[0]
    secondDict = myTree[firstStr]
    #firstStr=myTree.keys()[0]
    #secondDict=myTree[firstStr]  # get the label used for the split
    for key in secondDict.keys():
        if type(secondDict[key]) == dict:
            thisDepth = 1 + getTreeDepth(secondDict[key])
        else:
            thisDepth = 1
        if thisDepth > maxDepth: maxDepth = thisDepth
    return maxDepth
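
As a quick check of the depth function, the sketch below runs an equivalent version on a small nested-dict tree. The sample tree and the name `get_tree_depth` are assumptions made up for illustration; they do not come from this issue. `next(iter(tree))` is used as another Python 3 way to take the first key, since `dict.keys()` can no longer be indexed directly.

```python
def get_tree_depth(tree):
    """Return the number of decision levels in a nested-dict tree."""
    # In Python 3, dict.keys() returns a view that cannot be indexed,
    # so take the first key through an iterator instead of keys()[0].
    first_key = next(iter(tree))
    children = tree[first_key]
    max_depth = 0
    for child in children.values():
        # A dict child is a further decision node; anything else is a leaf.
        depth = 1 + get_tree_depth(child) if isinstance(child, dict) else 1
        max_depth = max(max_depth, depth)
    return max_depth


# Hypothetical sample tree, made up for this example.
sample_tree = {'featA': {0: 'no', 1: {'featB': {0: 'no', 1: 'yes'}}}}
print(get_tree_depth(sample_tree))  # 2
```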

A bug that needs fixing

def getNumleafs(myTree):
	numLeafs = 0  # initialize the leaf count
	firstSides = list(myTree.keys())  # list the top-level keys
	firstStr = firstSides[0]  # take the first key
	secondDict = myTree[firstStr]  # get the sub-dict under the first key

#=>Traceback (most recent call last):
  File "/Users/admin/PycharmProjects/untitled20/机器学习/决策树的实现.py", line 317, in <module>
    createPlot(myDat)
  File "/Users/admin/PycharmProjects/untitled20/机器学习/决策树的实现.py", line 98, in createPlot
    plotTree.totalW = float(getNumleafs(inTree))
  File "/Users/admin/PycharmProjects/untitled20/机器学习/决策树的实现.py", line 18, in getNumleafs
    firstSides = dict(myTree.keys())#先把最广的key遍历
AttributeError: 'list' object has no attribute 'keys'

The problem appeared to be that the rules for using dict changed between Python 3.5 and Python 3.6.
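
The traceback above ends with `AttributeError: 'list' object has no attribute 'keys'`, which indicates that the object reaching `getNumleafs` was a list rather than a nested dict. The following is a minimal sketch, assuming the tree is a nested dict as in ch3, of a leaf counter that rejects non-dict input with a clearer message. The name `count_leaves` and the sample values are hypothetical.

```python
def count_leaves(tree):
    """Count leaf nodes in a nested-dict decision tree."""
    if not isinstance(tree, dict):
        # Fail early with a clear message instead of an AttributeError on .keys().
        raise TypeError(f"expected a dict tree, got {type(tree).__name__}")
    first_key = next(iter(tree))
    leaves = 0
    for child in tree[first_key].values():
        # Recurse into sub-trees; any non-dict child counts as one leaf.
        leaves += count_leaves(child) if isinstance(child, dict) else 1
    return leaves


# Hypothetical sample tree, made up for this example.
sample_tree = {'featA': {0: 'no', 1: {'featB': {0: 'no', 1: 'yes'}}}}
print(count_leaves(sample_tree))  # 3

try:
    count_leaves([[1, 1, 'yes'], [1, 0, 'no']])  # a raw dataset, not a tree
except TypeError as err:
    print(err)  # expected a dict tree, got list
```

Passing the dataset itself where the tree is expected would trigger exactly this kind of error, so it may be worth checking what is handed to `createPlot`.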

K-means bug fix

#@wheniseeyou K-means
from numpy import *



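# Step 1: load the dataset
def loadDataSet(filename):
	'''
	Read a tab-separated file of floats into a list of rows.
	The result must be converted with mat() before it is used.
	:param filename: path to the data file
	:return: list of rows, each a list of floats
	'''
	dataMat = []  # data matrix (list of rows)
	with open(filename) as fr:
		for line in fr:
			# split the line on tabs
			curLine = line.strip().split('\t')
			fltLine = list(map(float,curLine))
			dataMat.append(fltLine)
	return dataMat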

'''
[[1.658985, 4.285136], [-3.453687, 3.424321], [4.838138, -1.151539], [-5.37971
'''
#======test====
#x=loadDataSet('/Users/admin/Desktop/testSet_K.txt')
#step2 calc distance
def distEclud(vecA,vecB):
	return sqrt(sum(power(vecA-vecB,2)))

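# Initialize the k centroids

def randCent(dataSet,k):
	'''
	dataSet may be a plain list here, so the column min/max lookups below
	go through array(dataSet) instead of indexing dataSet directly.
	Looking at the algorithm: each coordinate of a random centroid (for example
	the second feature, y) is a random value between the minimum and maximum
	of that feature in the data.
	:param dataSet: dataset (list of rows or matrix)
	:param k: number of centroids
	:return: k x n matrix of random centroids
	'''
	# number of columns (features) in the dataset
	n = shape(dataSet)[1]
	# create a zero matrix with k rows and n columns
	centroids = mat(zeros((k,n)))
	'''
	[[ 0.  0.  0.]
    [ 0.  0.  0.]
    [ 0.  0.  0.]
    [ 0.  0.  0.]]
	'''
	# create the random centroids
	for j in range(n):  # iterate over the columns
		# smallest value in column j
		#minJ = min(dataSet[:,j])
		minJ = min(array(dataSet)[:,j])
		rangeJ = float(max(array(dataSet)[:,j])-minJ)
		#rangeJ = float(max(dataSet[:,j])-minJ)  # largest value of the column minus the smallest
		# generate the random centroid coordinates
		'''
		np.random.rand(3,1)
		#=>
		[[ 0.70504333]
         [ 0.88452734]
         [ 0.31081878]]
		'''
		centroids[:,j] = mat(minJ + rangeJ*random.rand(k,1))  # random.rand(k,1) yields k random values in [0, 1)
	return centroids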

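'''
The k-means clustering algorithm takes four parameters.
The two required parameters are the dataset and the value of k;
the other two are the distance function and the initialization function (both replaceable).
The algorithm iterates: compute centroids, assign points, recompute centroids,
until no point's assignment changes.
The flag for this is clusterChanged = True.
'''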
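# k-means algorithm:
def KMeans(dataSet,k,distMeas = distEclud,createCent = randCent):
	'''
	Cluster the rows of dataSet into k groups.
	:param dataSet: dataset (a matrix, e.g. mat(dataMat))
	:param k: number of clusters
	:param distMeas: distance function
	:param createCent: function that creates the k initial centroids
	:return: centroids, clusterAssment
	'''
	# number of rows in the dataset
	m = shape(dataSet)[0]
	# cluster assignment matrix, initialized to zeros:
	# column 0 = cluster index, column 1 = squared distance to that centroid
	clusterAssment = mat(zeros((m,2)))
	'''
	[[ 0.  0.]
    [ 0.  0.]
    [ 0.  0.]
    [ 0.  0.]]
	'''
	# create the random centroids
	centroids = createCent(dataSet,k)
	# loop until the cluster assignments stop changing
	clusterChanged = True
	while clusterChanged:
		clusterChanged = False
		# for every point in the dataset
		for i in range(m):
			minDist = inf  # infinity
			minIndex = -1  # initial cluster index
			for j in range(k):  # for every centroid
				distJI = distMeas(centroids[j,:],dataSet[i,:])
				if distJI < minDist:  # keep the smallest distance so far
					minDist = distJI
					# record which cluster the smallest distance belongs to (cluster index = centroid index)
					minIndex = j
			# if the stored cluster index of this point differs from the nearest centroid, keep iterating
			if clusterAssment[i,0] != minIndex:
				clusterChanged = True
			clusterAssment[i,:] = minIndex,minDist**2
		print(centroids)
		# update the k centroids
		for cent in range(k):
			'''
			nonzero(a) returns the indices of the nonzero elements of array a.
			Its return value is a tuple of length a.ndim (the number of axes of a);
			each element of the tuple is an integer array holding the nonzero indices along that axis.
			dataSet =[
			[1, 1],
			[1, 1],
			[1, 0],
			[0, 1],
			[0, 1]
			]
			print(np.nonzero(dataSet))
			#=>
			(array([0, 0, 1, 1, 2, 3, 4]), array([0, 1, 0, 1, 0, 1, 1]))
			The two arrays pair up element by element: row indices on the left, column indices on the right.
			'''
			# all points in dataSet that belong to cluster cent
			ptsInClust = dataSet[nonzero(clusterAssment[:,0].A == cent)[0]]
			# update the centroid
			centroids[cent,:] = mean(ptsInClust,axis =0)
	return centroids,clusterAssment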


dataMat = loadDataSet('/Users/admin/Desktop/testSet_K.txt')
print(mat(dataMat))

print(KMeans(mat(dataMat),2))

In Python 3.6, the original code raises an error. To fix it, the input must be converted to a matrix (as in KMeans(mat(dataMat),2) above).
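
A likely reason for the conversion (an assumption, since the error message itself is not shown) is that `loadDataSet` returns a plain Python list, while expressions such as `dataSet[i,:]` only work on NumPy arrays or matrices. A minimal sketch with made-up sample rows, using `np.asarray` in place of the `mat()` call from the issue:

```python
import numpy as np

# Hypothetical stand-in for the list returned by loadDataSet.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

try:
    rows[0, :]  # tuple indexing is not supported by a plain list
except TypeError as err:
    print("list indexing failed:", err)

# Converting the list first makes row/column indexing available.
data = np.asarray(rows)
print(data[0, :])   # first row
print(data.shape)   # (3, 2)
```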

[[ 1.658985  4.285136]
 [-3.453687  3.424321]
 [ 4.838138 -1.151539]
 [-5.379713 -3.362104]
 [ 0.972564  2.924086]
 [-3.567919  1.531611]
 [ 0.450614 -3.302219]
 [-3.487105 -1.724432]
 [ 2.668759  1.594842]
 [-3.156485  3.191137]
 [ 3.165506 -3.999838]
 [-2.786837 -3.099354]
 [ 4.208187  2.984927]
 [-2.123337  2.943366]
 [ 0.704199 -0.479481]
 [-0.39237  -3.963704]
 [ 2.831667  1.574018]
 [-0.790153  3.343144]
 [ 2.943496 -3.357075]
 [-3.195883 -2.283926]
 [ 2.336445  2.875106]
 [-1.786345  2.554248]
 [ 2.190101 -1.90602 ]
 [-3.403367 -2.778288]
 [ 1.778124  3.880832]
 [-1.688346  2.230267]
 [ 2.592976 -2.054368]
 [-4.007257 -3.207066]
 [ 2.257734  3.387564]
 [-2.679011  0.785119]
 [ 0.939512 -4.023563]
 [-3.674424 -2.261084]
 [ 2.046259  2.735279]
 [-3.18947   1.780269]
 [ 4.372646 -0.822248]
 [-2.579316 -3.497576]
 [ 1.889034  5.1904  ]
 [-0.798747  2.185588]
 [ 2.83652  -2.658556]
 [-3.837877 -3.253815]
 [ 2.096701  3.886007]
 [-2.709034  2.923887]
 [ 3.367037 -3.184789]
 [-2.121479 -4.232586]
 [ 2.329546  3.179764]
 [-3.284816  3.273099]
 [ 3.091414 -3.815232]
 [-3.762093 -2.432191]
 [ 3.542056  2.778832]
 [-1.736822  4.241041]
 [ 2.127073 -2.98368 ]
 [-4.323818 -3.938116]
 [ 3.792121  5.135768]
 [-4.786473  3.358547]
 [ 2.624081 -3.260715]
 [-4.009299 -2.978115]
 [ 2.493525  1.96371 ]
 [-2.513661  2.642162]
 [ 1.864375 -3.176309]
 [-3.171184 -3.572452]
 [ 2.89422   2.489128]
 [-2.562539  2.884438]
 [ 3.491078 -3.947487]
 [-2.565729 -2.012114]
 [ 3.332948  3.983102]
 [-1.616805  3.573188]
 [ 2.280615 -2.559444]
 [-2.651229 -3.103198]
 [ 2.321395  3.154987]
 [-1.685703  2.939697]
 [ 3.031012 -3.620252]
 [-4.599622 -2.185829]
 [ 4.196223  1.126677]
 [-2.133863  3.093686]
 [ 4.668892 -2.562705]
 [-2.793241 -2.149706]
 [ 2.884105  3.043438]
 [-2.967647  2.848696]
 [ 4.479332 -1.764772]
 [-4.905566 -2.91107 ]]
[[ 0.83986288 -1.83976116]
 [-4.0469936  -1.40467133]]
[[ 2.47925453  0.21182184]
 [-3.10532438 -0.12876332]]
[[ 2.71473038  0.18858278]
 [-2.9219568  -0.07998038]]
(matrix([[ 2.71473038,  0.18858278],
        [-2.9219568 , -0.07998038]]), matrix([[  0.        ,  17.89634662],
        [  1.        ,  12.56286513],
        [  0.        ,   6.30478631],
        [  1.        ,  16.81290103],
        [  0.        ,  10.51812157],
        [  1.        ,   3.01449392],
        [  0.        ,  17.31191999],
        [  1.        ,   3.02361363],
        [  0.        ,   1.97967838],
        [  1.        ,  10.75521236],
        [  0.        ,  17.74606725],
        [  1.        ,   9.13487445],
        [  0.        ,  10.04995372],
        [  1.        ,   9.77841689],
        [  0.        ,   4.48854562],
        [  1.        ,  21.48211857],
        [  0.        ,   1.93310494],
        [  1.        ,  16.26236793],
        [  0.        ,  12.62402277],
        [  1.        ,   4.93241188],
        [  0.        ,   7.36050686],
        [  1.        ,   8.22877329],
        [  0.        ,   4.66259677],
        [  1.        ,   7.51261982],
        [  0.        ,  14.50993584],
        [  1.        ,   6.85903854],
        [  0.        ,   5.04565231],
        [  1.        ,  10.95654103],
        [  0.        ,  10.44232656],
        [  1.        ,   0.80741959],
        [  0.        ,  20.89357231],
        [  1.        ,   5.32341991],
        [  0.        ,   6.93251564],
        [  1.        ,   3.53209105],
        [  0.        ,   3.77046308],
        [  1.        ,  11.79736257],
        [  0.        ,  25.69995006],
        [  1.        ,   9.64081992],
        [  0.        ,   8.12103192],
        [  1.        ,  10.91213604],
        [  0.        ,  14.05290621],
        [  1.        ,   9.06855533],
        [  0.        ,  11.80514107],
        [  1.        ,  17.88489819],
        [  0.        ,   9.09553212],
        [  1.        ,  11.37480809],
        [  0.        ,  16.17242331],
        [  1.        ,   6.23872366],
        [  0.        ,   7.39385874],
        [  1.        ,  20.07577022],
        [  0.        ,  10.4085923 ],
        [  1.        ,  16.85042532],
        [  0.        ,  25.63541221],
        [  1.        ,  15.29989117],
        [  0.        ,  11.90587245],
        [  1.        ,   9.58149736],
        [  0.        ,   3.20000848],
        [  1.        ,   7.57676457],
        [  0.        ,  12.04560092],
        [  1.        ,  12.25947225],
        [  0.        ,   5.32472486],
        [  1.        ,   8.91695746],
        [  0.        ,  17.70978882],
        [  1.        ,   3.86003859],
        [  0.        ,  14.78056918],
        [  1.        ,  15.0490604 ],
        [  0.        ,   7.74010731],
        [  1.        ,   9.21313835],
        [  0.        ,   8.95426674],
        [  1.        ,  10.64677491],
        [  0.        ,  14.60725641],
        [  1.        ,   7.24915895],
        [  0.        ,   3.07484117],
        [  1.        ,  10.6932501 ],
        [  0.        ,  11.38833208],
        [  1.        ,   4.30033192],
        [  0.        ,   8.17888612],
        [  1.        ,   8.5792329 ],
        [  0.        ,   6.92941377],
        [  1.        ,  11.94977392]]))

Process finished with exit code 0
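
In the returned `clusterAssment`, the first column is the cluster index and the second is the squared distance to that cluster's centroid (see `clusterAssment[i,:] = minIndex,minDist**2` in the code). The sketch below, which assumes plain NumPy arrays and uses only the first four rows of the output above, shows one way the per-cluster sum of squared distances could be read off.

```python
import numpy as np

# First four rows of the clusterAssment output printed above.
assign = np.array([
    [0.0, 17.89634662],
    [1.0, 12.56286513],
    [0.0, 6.30478631],
    [1.0, 16.81290103],
])

labels = assign[:, 0].astype(int)  # cluster index per point
sq_dist = assign[:, 1]             # squared distance per point
# Sum the squared distances separately for each cluster index.
sse = np.bincount(labels, weights=sq_dist)
for cluster, value in enumerate(sse):
    print(f"cluster {cluster}: sum of squared distances = {value:.4f}")
```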

gtsam IMUKITTIExampleGPS data format description

Hello, nice to meet you, and thank you very much for providing such a useful example for new gtsam users. I have successfully run the program (IMUKITTIExampleGPS.m). Now I want to test it with different data. As a first step, I am trying to generate data in the same format from the KITTI raw data. I would like to know what the data fields in KittiEquivBiasedImu.txt and KittiGps_converted.txt represent. For example, what do X, Y and Z mean in KittiGps_converted.txt, and what do they mean in KittiEquivBiasedImu.txt? Looking forward to your reply!
