700字范文,内容丰富有趣,生活中的好帮手!
700字范文 > 《机器学习实战》支持向量机(手稿+代码)

《机器学习实战》支持向量机(手稿+代码)

时间:2019-09-26 14:41:55

相关推荐

《机器学习实战》支持向量机(手稿+代码)

注释:已经看过有半年了,而且当时只是看了理论和opencv的函数调用,现在忘的九霄云外了,Ng的视频也没看SVM,现在准备系统的再学一遍吧。

之前记录的博客:/wjy-lulu/p/6979436.html

一.SVM理论重点和难点剖析

注释:这里主要讲解公式推导和证明的重难点,不是按部就班的讲解SVM的求解过程,算是对推导过程的补充吧!

一点未接触过SVM的请看大神的博客:

/c406495762/article/details/78072313

/jerrylead/archive//03/13/1982684.html

/zouxy09/article/details/17291543

/?p=685

/v_july_v/article/details/7624837

1.1点到直线距离的由来

我们先讨论点到平面的距离,由此推广到点到直线和点到超平面的距离公式。

点到平面公式推导

SVM公式推导一

SVM公式推导二

1.2拉格朗日对偶问题

用于求解带条件的最优化问题,其实到最后你就明白了SVM从头到尾最主要做的就是如何高效的求解目标值。而其它的学习算法做的都是对数据的求解优化问题,这点是SVM和其它算法根本的区别。

原始问题

对偶问题一

对偶问题2

原始问题和对偶问题的关系

KKT条件

SVM公式推导三

SVM公式推导四

1.3核函数的推导

目的:1.处理线性不可分的情况。2.求解方便。

过程:二维情况的不可分割,就映射到三维、四维....等高维空间去分割。

通俗解释:/question/24627666 知乎大神开始装逼的时刻了。

理论部分:如下公式推导.......

核函数引出一

1.4松弛变量的引入

目的:防止部分异常点的干扰。

原理:和其它算法引入惩罚系数一样的,允许有异常点但是要接受惩罚。比如:异常的点肯定都是偏离群体的点,既然偏离群体,那么它的值就为负数且绝对值愈大惩罚程度越大。

具体推导:见下文......

松弛变量的引入

1.5.SMO算法

SMO算法一

SMO算法二

注释:后面还有参数如何最优选择,有点看不懂而且也有点不想看了,干脆从下面的代码去分析SMO的具体过程吧!

二.程序实现

代码实现强烈推荐:/c406495762/article/details/78072313

给了程序伪代码很详细,程序读起来很方便。

2.1.SMO实现

2.1.1简化版SMO

简化版:针对理论中“SMO”的最后一句话,最优选择的问题!简化版是随机选择,选择上不做优化。

1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6dataMat = []7labelMat = []8fr = open(fileName,'r')9for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13a = np.mat(dataMat)14b = np.mat(labelMat).transpose()15DataLabel = np.array(np.hstack((a,b)))16return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19j = i20while(j==i):21 j = int(np.random.uniform(0,m))22return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25if aj>H:26 aj = H27elif aj<L:28 aj = L29return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smoSimple(dataMatIn,classLabels,C,toler,maxIter):33dataMatraix = np.mat(dataMatIn)34labelMatraix = np.mat(classLabels).transpose()35b =036m,n = dataMatraix.shape37alphas = np.mat(np.zeros((m,1)))#系数都初始化为038iter = 039while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpose()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66- dataMatraix[i,:]*dataMatraix[i,:].T\67- dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95return b, alphas

2.1.2效果图

main.py文件

1 import svm2 import matplotlib.pyplot as plt3 import numpy as np4 5 if __name__ == '__main__':6fig = plt.figure()7axis = fig.add_subplot(111)8dataMat, labelMat,DataLabel= svm.loadDataSet("testSet.txt")9#b, alphas = svm.smoSimple(dataMat,labelMat,0.6,0.001,40)10#ws = svm.calsWs(alphas,dataMat,labelMat)11pData0 = [0,0,0]12pData1 = [0,0,0]13for hLData in DataLabel:14 if (hLData[-1]==1):pData0 = np.vstack((pData0,hLData))15 elif(hLData[-1]==-1):pData1 = np.vstack((pData1,hLData))16 else:continue17vmax = np.max(pData0[:,0:1])18vmin = np.min(pData0[:,0:1])19axis.scatter(pData0[:,0:1],pData0[:,1:2],marker = 'v')20axis.scatter(pData1[:,0:1],pData1[:,1:2],marker = 's')21xdata = np.random.uniform(2.0,8.0,[1,20])22ydata = xdata*(0.81445/0.27279) - (3.837/0.27279)23axis.plot(xdata.tolist()[0],ydata.tolist()[0],'r')2425fig.show()2627#print("alphas = ",alphas[alphas>0])

View Code

svm.py文件

1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6dataMat = []7labelMat = []8fr = open(fileName,'r')9for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13a = np.mat(dataMat)14b = np.mat(labelMat).transpose()15DataLabel = np.array(np.hstack((a,b)))16return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19j = i20while(j==i):21 j = int(np.random.uniform(0,m))22return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25if aj>H:26 aj = H27elif aj<L:28 aj = L29return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smoSimple(dataMatIn,classLabels,C,toler,maxIter):33dataMatraix = np.mat(dataMatIn)34labelMatraix = np.mat(classLabels).transpose()35b =036m,n = dataMatraix.shape37alphas = np.mat(np.zeros((m,1)))#系数都初始化为038iter = 039while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpose()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66- dataMatraix[i,:]*dataMatraix[i,:].T\67- dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95return b, alphas96 97 #分类函数98 def calsWs(alphas,dataArr,classLabels):99X = np.mat(dataArr)100labelMat = np.mat(classLabels).T101alphas = np.mat(np.array(alphas).reshape((1,100))).T102m, n = X.shape103w = np.zeros([n,1])104for i in range(m):105 w += np.multiply(alphas[i]*labelMat[i],X[i,:].T)106return w

View Code

2.1.3非线性分类(核向量)

程序代码:

1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6dataMat = []7labelMat = []8fr = open(fileName,'r')9for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13a = np.mat(dataMat)14b = np.mat(labelMat).T15DataLabel = np.array(np.hstack((a,b)))16return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19j = i20while(j==i):21 j = int(np.random.uniform(0,m))22return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25if aj>H:26 aj = H27elif aj<L:28 aj = L29return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smselfimple(dataMatIn,classLabels,C,toler,maxIter):33dataMatraix = np.mat(dataMatIn)34labelMatraix = np.mat(classLabels).transpselfe()35b =036m,n = dataMatraix.shape37alphas = np.mat(np.zerself((m,1)))#系数都初始化为038iter = 039while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpselfe()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66- dataMatraix[i,:]*dataMatraix[i,:].T\67- dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95return b, alphas96 97 #分类函数98 def calsWs(alphas,dataArr,classLabels):99X = np.mat(dataArr)100labelMat = np.mat(classLabels).T101alphas = np.mat(np.array(alphas).reshape((1,100))).T102w = np.sum(np.multiply(np.multiply(alphas,labelMat),X),axis=0)103return w104 105 #完整版SMO106 class optStruct:107def __init__(self,dataMatIn,classLabel,C,toler,kTup):108 self.X = dataMatIn109 self.labelMat = classLabel110 self.C = C111 self.tol = toler112 self.m = np.shape(dataMatIn)[0]113 self.alphas = np.mat(np.zeros((self.m,1)))114 self.b = 0115 self.eCache = np.mat(np.zeros((self.m,2)))#存储误差,用于启发式搜索116 self.K = np.mat(np.zeros((self.m,self.m)))#存储核函数转化后的dataMatIn117 for i in range(self.m):118 self.K[:,i] = kernelTrans(self.X,self.X[i,:],kTup)119def clacEk(self,k):120 #计算当前值121 fXk = float(np.multiply(self.alphas,self.labelMat).T*self.K[:,k]+ self.b)122 Ek = fXk - float(self.labelMat[k])123 return Ek124def selecJ(self,i,Ei):125 maxK = -1126 maxDeltaE = 0127 Ej = 0128 self.eCache[i] = [1,Ei]#存储偏差129 #A代表mat转换成array类型,nonzero返回非零元素的下标130 validEcacheList = np.nonzero(self.eCache[:,0].A)[0]131 if (len(validEcacheList)>1):#启发式选择132 for k in validEcacheList:133 if (k==i):continue#k不能等于i134 Ek = self.clacEk(k)#计算绝对偏差135 deltaE = abs(Ei - Ek)#相对偏差136 if (deltaE >maxDeltaE):137 maxK = k138 maxDeltaE = deltaE139 Ej = Ek140 return maxK, Ej141 142 else:143 j = selectJrand(i,self.m)#随机选择144 Ej = self.clacEk(j)#随机绝对偏差145 return j,Ej146def updataEk(self,k):147 Ek = self.clacEk(k)148 self.eCache[k] = [k,Ek]149def innerL(self,i):150 Ei = self.clacEk(i)151 if ((self.labelMat[i]*Ei<-self.tol and self.alphas[i]<self.C)\152 or(self.labelMat[i]*Ei>self.tol and self.alphas[i]>0)):153 j,Ej = self.selecJ(i,Ei)#选择J154 alphaIold = self.alphas[i].copy()155 alphaJold = self.alphas[j].copy()156 #计算L和H的值157 if (self.labelMat[i] != self.labelMat[j]):158 L = max(0,self.alphas[j]-self.alphas[i])159 H = min(self.C,self.C+self.alphas[j]-self.alphas[i])160 else:161 L = max(0,self.alphas[j]+self.alphas[i]-self.C)162 H = min(self.C,self.alphas[i] +self.alphas[j])163 if (L==H): return 0164 #eta = 2.0* self.X[i,:]*self.X[j,:].T - self.X[i,:]*self.X[i,:].T - \165 # self.X[j,:]*self.X[j,:].T166 eta = 2.0*self.K[i,j] - self.K[i,i] - self.K[j,j]#在此应用核函数167 if (eta>=0): return 0168 self.alphas[j] -= self.labelMat[j] * (Ei - Ej)/eta169 self.alphas[j] = clipAlpha(self.alphas[j],H,L)170 #更新新出现的aj171 self.updataEk(j)172 if (abs(self.alphas[j] - alphaJold)<0.00001):173 print('J not move')174 self.alphas[i] += self.labelMat[j]*self.labelMat[i]*(alphaJold-self.alphas[j])175 self.updataEk(i)#更新新出现的ai176 b1 = self.b - Ei - self.labelMat[i] *(self.alphas[i] - alphaIold)*\177 self.K[i,i] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[i,j]178 b2 = self.b - Ej - self.labelMat[i] *(self.alphas[i] - alphaIold)*\179 self.K[i,j] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[j,j]180 if (self.alphas[i]>0 and self.alphas[i]<self.C): self.b = b1181 elif (self.alphas[j]>0 and self.alphas[j]<self.C): self.b = b2182 else:self.b = (b1+b2)/2.0183 return 1184 else:185 return 0186 187 def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup): # full Platt SMO188self = optStruct(np.mat(dataMatIn), np.mat(classLabels).transpose(), C, toler, kTup)189iter = 0190entireSet = True;191alphaPairsChanged = 0192while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):193 alphaPairsChanged = 0194 if entireSet: # go over all195 for i in range(self.m):196 alphaPairsChanged += self.innerL(i)197 print('fullSet iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))198 iter += 1199 else: # go over non-bound (railed) alphas200 nonBoundIs = np.nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]201 for i in nonBoundIs:202 alphaPairsChanged += self.innerL(i)203 print('nonBound iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))204 iter += 1205 if entireSet:206 entireSet = False # toggle entire set loop207 elif (alphaPairsChanged == 0):208 entireSet = True209 print("iteration number: %d" % iter)210 return self.b, self.alphas211 #生成核函数,注意这里核函数的计算公式,见博文对其进行说明!212 def kernelTrans(X,A,kTup):213m,n = X.shape214K = np.mat(np.zeros([m,1]))215if (kTup[0]=='lin'):K = X*A.T216elif(kTup[0]=='rbf'):217 for j in range(m):218 deltaRow = X[j,:] - A219 K[j] = deltaRow * deltaRow.T220 K = np.exp(K/(-1*kTup[1]**2))221else:raise NameError('Houston We have a problem')222return K223 def testRbf(k1 = 1.3):224dataArr, labelArr, dataLbel = loadDataSet('testSetRBF.txt')225b, alphas = smoP(dataArr,labelArr,200,0.0001,10000,('rbf',k1))226dataMat = np.mat(dataArr)227labelMat = np.mat(labelArr).T228svInd = np.nonzero(alphas.A>0)[0]#支持向量a229sVs = dataMat[svInd]#支持向量X230labelSV = labelMat[svInd]231print("There are %d Support Vector"%(sVs.shape[0]))232m, n = dataMat.shape233errorCount = 0234for i in range(m):235 kernelEval = kernelTrans(sVs,dataMat[i,:],('rbf',k1))236 predict = kernelEval.T * np.multiply(labelSV,alphas[svInd])+b237 if(np.sign(predict)!=np.sign(labelArr[i])):errorCount+=1238print("The training error is: %f"%(float(errorCount/m)))239dataArr, labelArr, datalabel = loadDataSet('testSetRBF2.txt')240errorCount = 0241dataMat = np.mat(dataArr)242labelMat = np.mat(labelArr).T243m, n = dataMat.shape244for i in range(m):245 kernelEval = kernelTrans(sVs, dataMat[i,:], ('rbf', k1))246 predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b247 if (np.sign(predict) != np.sign(labelArr[i])): errorCount += 1248print("The test error is: %f" %(float(errorCount / m)))249 250 if __name__ == '__main__':251 252testRbf(1.3)

2.1.4手写数字识别测试

1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6dataMat = []7labelMat = []8fr = open(fileName,'r')9for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13a = np.mat(dataMat)14b = np.mat(labelMat).T15DataLabel = np.array(np.hstack((a,b)))16return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19j = i20while(j==i):21 j = int(np.random.uniform(0,m))22return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25if aj>H:26 aj = H27elif aj<L:28 aj = L29return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smselfimple(dataMatIn,classLabels,C,toler,maxIter):33dataMatraix = np.mat(dataMatIn)34labelMatraix = np.mat(classLabels).transpselfe()35b =036m,n = dataMatraix.shape37alphas = np.mat(np.zerself((m,1)))#系数都初始化为038iter = 039while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpselfe()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66- dataMatraix[i,:]*dataMatraix[i,:].T\67- dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95return b, alphas96 97 #分类函数98 def calsWs(alphas,dataArr,classLabels):99X = np.mat(dataArr)100labelMat = np.mat(classLabels).T101alphas = np.mat(np.array(alphas).reshape((1,100))).T102w = np.sum(np.multiply(np.multiply(alphas,labelMat),X),axis=0)103return w104 #文本转化为int105 def img2vector(filename):106returnVect = np.zeros((1,1024))107fr = open(filename)108for i in range(32):109 lineStr = fr.readline()110 for j in range(32):111 returnVect[0,32*i+j] = int(lineStr[j])112return returnVect113 #完整版SMO114 class optStruct:115def __init__(self,dataMatIn,classLabel,C,toler,kTup):116 self.X = dataMatIn117 self.labelMat = classLabel118 self.C = C119 self.tol = toler120 self.m = np.shape(dataMatIn)[0]121 self.alphas = np.mat(np.zeros((self.m,1)))122 self.b = 0123 self.eCache = np.mat(np.zeros((self.m,2)))#存储误差,用于启发式搜索124 self.K = np.mat(np.zeros((self.m,self.m)))#存储核函数转化后的dataMatIn125 for i in range(self.m):126 self.K[:,i] = kernelTrans(self.X,self.X[i,:],kTup)127def clacEk(self,k):128 #计算当前值129 fXk = float(np.multiply(self.alphas,self.labelMat).T*self.K[:,k]+ self.b)130 Ek = fXk - float(self.labelMat[k])131 return Ek132def selecJ(self,i,Ei):133 maxK = -1134 maxDeltaE = 0135 Ej = 0136 self.eCache[i] = [1,Ei]#存储偏差137 #A代表mat转换成array类型,nonzero返回非零元素的下标138 validEcacheList = np.nonzero(self.eCache[:,0].A)[0]139 if (len(validEcacheList)>1):#启发式选择140 for k in validEcacheList:141 if (k==i):continue#k不能等于i142 Ek = self.clacEk(k)#计算绝对偏差143 deltaE = abs(Ei - Ek)#相对偏差144 if (deltaE >maxDeltaE):145 maxK = k146 maxDeltaE = deltaE147 Ej = Ek148 return maxK, Ej149 150 else:151 j = selectJrand(i,self.m)#随机选择152 Ej = self.clacEk(j)#随机绝对偏差153 return j,Ej154def updataEk(self,k):155 Ek = self.clacEk(k)156 self.eCache[k] = [k,Ek]157def innerL(self,i):158 Ei = self.clacEk(i)159 if ((self.labelMat[i]*Ei<-self.tol and self.alphas[i]<self.C)\160 or(self.labelMat[i]*Ei>self.tol and self.alphas[i]>0)):161 j,Ej = self.selecJ(i,Ei)#选择J162 alphaIold = self.alphas[i].copy()163 alphaJold = self.alphas[j].copy()164 #计算L和H的值165 if (self.labelMat[i] != self.labelMat[j]):166 L = max(0,self.alphas[j]-self.alphas[i])167 H = min(self.C,self.C+self.alphas[j]-self.alphas[i])168 else:169 L = max(0,self.alphas[j]+self.alphas[i]-self.C)170 H = min(self.C,self.alphas[i] +self.alphas[j])171 if (L==H): return 0172 #eta = 2.0* self.X[i,:]*self.X[j,:].T - self.X[i,:]*self.X[i,:].T - \173 # self.X[j,:]*self.X[j,:].T174 eta = 2.0*self.K[i,j] - self.K[i,i] - self.K[j,j]#在此应用核函数175 if (eta>=0): return 0176 self.alphas[j] -= self.labelMat[j] * (Ei - Ej)/eta177 self.alphas[j] = clipAlpha(self.alphas[j],H,L)178 #更新新出现的aj179 self.updataEk(j)180 if (abs(self.alphas[j] - alphaJold)<0.00001):181 print('J not move')182 self.alphas[i] += self.labelMat[j]*self.labelMat[i]*(alphaJold-self.alphas[j])183 self.updataEk(i)#更新新出现的ai184 b1 = self.b - Ei - self.labelMat[i] *(self.alphas[i] - alphaIold)*\185 self.K[i,i] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[i,j]186 b2 = self.b - Ej - self.labelMat[i] *(self.alphas[i] - alphaIold)*\187 self.K[i,j] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[j,j]188 if (self.alphas[i]>0 and self.alphas[i]<self.C): self.b = b1189 elif (self.alphas[j]>0 and self.alphas[j]<self.C): self.b = b2190 else:self.b = (b1+b2)/2.0191 return 1192 else:193 return 0194 195 def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup): # full Platt SMO196self = optStruct(np.mat(dataMatIn), np.mat(classLabels).transpose(), C, toler, kTup)197iter = 0198entireSet = True;199alphaPairsChanged = 0200while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):201 alphaPairsChanged = 0202 if entireSet: # go over all203 for i in range(self.m):204 alphaPairsChanged += self.innerL(i)205 print('fullSet iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))206 iter += 1207 else: # go over non-bound (railed) alphas208 nonBoundIs = np.nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]209 for i in nonBoundIs:210 alphaPairsChanged += self.innerL(i)211 print('nonBound iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))212 iter += 1213 if entireSet:214 entireSet = False # toggle entire set loop215 elif (alphaPairsChanged == 0):216 entireSet = True217 print("iteration number: %d" % iter)218 return self.b, self.alphas219 #生成核函数,注意这里核函数的计算公式,见博文对其进行说明!220 def kernelTrans(X,A,kTup):221m,n = X.shape222K = np.mat(np.zeros([m,1]))223if (kTup[0]=='lin'):K = X*A.T224elif(kTup[0]=='rbf'):225 for j in range(m):226 deltaRow = X[j,:] - A227 K[j] = deltaRow * deltaRow.T228 K = np.exp(K/(-1*kTup[1]**2))229else:raise NameError('Houston We have a problem')230return K231 232 def testRbf(k1 = 1.3):233dataArr, labelArr, dataLbel = loadDataSet('testSetRBF.txt')234b, alphas = smoP(dataArr,labelArr,200,0.0001,10000,('rbf',k1))235dataMat = np.mat(dataArr)236labelMat = np.mat(labelArr).T237svInd = np.nonzero(alphas.A>0)[0]#支持向量a238sVs = dataMat[svInd]#支持向量X239labelSV = labelMat[svInd]240print("There are %d Support Vector"%(sVs.shape[0]))241m, n = dataMat.shape242errorCount = 0243for i in range(m):244 kernelEval = kernelTrans(sVs,dataMat[i,:],('rbf',k1))245 predict = kernelEval.T * np.multiply(labelSV,alphas[svInd])+b246 if(np.sign(predict)!=np.sign(labelArr[i])):errorCount+=1247print("The training error is: %f"%(float(errorCount/m)))248dataArr, labelArr, datalabel = loadDataSet('testSetRBF2.txt')249errorCount = 0250dataMat = np.mat(dataArr)251labelMat = np.mat(labelArr).T252m, n = dataMat.shape253for i in range(m):254 kernelEval = kernelTrans(sVs, dataMat[i,:], ('rbf', k1))255 predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b256 if (np.sign(predict) != np.sign(labelArr[i])): errorCount += 1257print("The test error is: %f" %(float(errorCount / m)))258 259 def loadImage(dirName):260from os import listdir261hwLabel = []262trainingFileList = listdir(dirName)263m = len(trainingFileList)264trainingMat = np.zeros((m,1024))265for i in range(m):266 fileNameStr = trainingFileList[i]267 fileStr = fileNameStr.split('.')[0]268 classNumStr = int(fileStr.split('_')[0])269 if (classNumStr==9):270 hwLabel.append(-1)271 else:hwLabel.append(1)272 trainingMat[i,:] = img2vector('%s/%s'%(dirName,fileNameStr))273return trainingMat, np.array(hwLabel)274 275 def testDigits(kTup = ('rbf',10)):276dataArr, labelArr = loadImage('trainingDigits')277b, alphas = smoP(dataArr,labelArr,200,0.0001,10000,kTup)278dataMat = np.mat(dataArr);labelMat = np.mat(labelArr).T279svInd = np.nonzero(alphas.A>0)[0]280sVs = dataMat[svInd]281labelSV = labelMat[svInd]282print("there are %d Support Vectors"%(int(sVs.shape[0])))283m,n = dataMat.shape284errorCount = 0285for i in range(m):286 kernelEval = kernelTrans(sVs,dataMat[i,:],kTup)287 predict = kernelEval.T * np.multiply(labelSV,alphas[svInd]) + b288 if (np.sign(predict)!=np.sign(labelArr[i])):errorCount+=1289print("The training error is: %f"%(float((errorCount)/m)))290dataArr, labelArr = loadImage('testDigits')291errorCount = 0292dataMat = np.mat(dataArr);labelMat = np.mat(labelArr).T293m, n = dataMat.shape294for i in range(m):295 kernelEval = kernelTrans(sVs, dataMat[i, :], kTup)296 predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b297 if (np.sign(predict) != np.sign(labelArr[i])): errorCount += 1298print("The test error is: %f" % (float((errorCount) / m)))

三.参考文献

参考:

注释:以下是参考链接里面的内容,按以下中文描述排列!都是大神的结晶,没有主观次序。

点到平面距离、拉格朗日二次优化、二次曲线公式推导、SMO公式推导

/graphics/archive//07/10/1774809.html

/90zeng/p/Lagrange_duality.html

/view/e28aa3040b1c59eef9c7b40a.html

/xuanyuansen/article/details/41153023

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。