700字范文,内容丰富有趣,生活中的好帮手!
700字范文 > 项目:机器学习+FLD分类+python图像处理mnist数据集

项目:机器学习+FLD分类+python图像处理mnist数据集

时间:2023-06-25 05:25:53

相关推荐

项目:机器学习+FLD分类+python图像处理mnist数据集

机器学习+FLD分类+python图像处理mnist数据集

**

以mnist数据集实现Fisher Linear Discriminant(FLD)的分类以及降维功能

任务一如下所示

以下任务是teacher qizhi 's require

thanks teacher for give me a clear concept in machine learning

在任务一中,着重在于数据集的处理与FLD算法原理的实现

在任务二中,着重留意降维与分类

在这里插入代码片from mnist import MNISTimport matplotlib.pyplot as pltimport numpy as npimport randomfrom scipy.spatial.distance import cdistimport copyimport osimport structfrom struct import unpackfrom numpy import *import randomfrom numpy.linalg import pinvimport sklearnfrom sklearn.datasets import make_classification#***********初始化********************************#请将文件的路径填到下面的格式里,格式里不要有中文# 训练集文件train_images_idx3_ubyte_file = r'C:\Users\**Desktop\111\DataSet\train-images-idx3-ubyte\train-images.idx3-ubyte'# 训练集标签文件train_labels_idx1_ubyte_file = r'C:\Users\**\Desktop\111\DataSet\train-labels-idx1-ubyte\train-labels.idx1-ubyte'# 测试集文件test_images_idx3_ubyte_file = r'C:\Users\**\Desktop\111\DataSet\t10k-images-idx3-ubyte\t10k-images.idx3-ubyte'# 测试集标签文件test_labels_idx1_ubyte_file = r'C:\Users\****\Desktop\111\DataSet\t10k-labels-idx1-ubyte\t10k-labels.idx1-ubyte'#**********软件版本问题:必须解析读取数据集函数*******************def decode_idx3_ubyte(idx3_ubyte_file):"""解析idx3文件的通用函数:param idx3_ubyte_file: idx3文件路径:return: 数据集"""# 读取二进制数据bin_data = open(idx3_ubyte_file, 'rb').read()# 解析文件头信息,依次为魔数、图片数量、每张图片高、每张图片宽offset = 0fmt_header = '>iiii' #因为数据结构中前4行的数据类型都是32位整型,所以采用i格式,但我们需要读取前4行数据,所以需要4个i。我们后面会看到标签集中,只使用2个ii。magic_number, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, offset)print('魔数:%d, 图片数量: %d张, 图片大小: %d*%d' % (magic_number, num_images, num_rows, num_cols))# 解析数据集image_size = num_rows * num_colsoffset += struct.calcsize(fmt_header) #获得数据在缓存中的指针位置,从前面介绍的数据结构可以看出,读取了前4行之后,指针位置(即偏移位置offset)指向0016。print(offset)fmt_image = '>' + str(image_size) + 'B' #图像数据像素值的类型为unsigned char型,对应的format格式为B。这里还有加上图像大小784,是为了读取784个B格式数据,如果没有则只会读取一个值(即一副图像中的一个像素值)print(fmt_image,offset,struct.calcsize(fmt_image))images = np.empty((num_images, num_rows, num_cols))for i in range(num_images):images[i] = np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols))offset += struct.calcsize(fmt_image)return imagesdef decode_idx1_ubyte(idx1_ubyte_file):"""解析idx1文件的通用函数:param idx1_ubyte_file: idx1文件路径:return: 数据集"""# 读取二进制数据bin_data = open(idx1_ubyte_file, 'rb').read()# 解析文件头信息,依次为魔数和标签数offset = 0fmt_header = '>ii'magic_number, num_images = struct.unpack_from(fmt_header, bin_data, offset)print('魔数:%d, 图片数量: %d张' % (magic_number, num_images))# 解析数据集offset += struct.calcsize(fmt_header)fmt_image = '>B'labels = np.empty(num_images)for i in range(num_images):# if (i + 1) % 10000 == 0:# print ('已解析 %d' % (i + 1) + '张')labels[i] = struct.unpack_from(fmt_image, bin_data, offset)[0]offset += struct.calcsize(fmt_image)return labelsdef load_train_images(idx_ubyte_file=train_images_idx3_ubyte_file):"""TRAINING SET IMAGE FILE (train-images-idx3-ubyte):[offset] [type][value][description]000032 bit integer 0x00000803(2051) magic number000432 bit integer 60000 number of images000832 bit integer 28number of rows001232 bit integer 28number of columns0016unsigned byte ??pixel0017unsigned byte ??pixel........xxxxunsigned byte ??pixelPixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).:param idx_ubyte_file: idx文件路径:return: n*row*col维np.array对象,n为图片数量"""return decode_idx3_ubyte(idx_ubyte_file)def load_train_labels(idx_ubyte_file=train_labels_idx1_ubyte_file):"""TRAINING SET LABEL FILE (train-labels-idx1-ubyte):[offset] [type][value][description]000032 bit integer 0x00000801(2049) magic number (MSB first)000432 bit integer 60000 number of items0008unsigned byte ??label0009unsigned byte ??label........xxxxunsigned byte ??labelThe labels values are 0 to 9.:param idx_ubyte_file: idx文件路径:return: n*1维np.array对象,n为图片数量"""return decode_idx1_ubyte(idx_ubyte_file)if __name__ == '__main__':train_images = load_train_images()train_labels = load_train_labels()#******************按照要求**************************path=r'C:\Users\shirui\Desktop\111\DataSet\train-images-idx3-ubyte\train-images.idx3-ubyte'mndata = MNIST(rb'DataSet')#images,labels = mndata.load_training()images=load_train_images(r'C:\Users\shirui\Desktop\111\DataSet\train-images-idx3-ubyte\train-images.idx3-ubyte')labels=load_train_labels(r'C:\Users\shirui\Desktop\111\DataSet\train-labels-idx1-ubyte\train-labels.idx1-ubyte')images=images.tolist()labels=labels.tolist()images = array(images, dtype ='int32' ) # define dtypelabels = array(labels, dtype ='int32' ) # define dtypeprint(type(images))print(len(images))print(np.size(images[6000]))print(type(labels))print(np.size(labels))images_new=np.reshape(np.asarray(images),(-1,28,28))#将列表转换成3维数组,第一、第二轴尺寸定为28, 第零轴尺寸-1表示将自动计算该维度尺寸。\n",print(len(images_new))print(images_new.shape)images_list=[[],[],[],[],[],[],[],[],[],[]]print(len(images_list))for image_id in range(len(images)): images_list[labels[image_id]].append(images[image_id])print(len(images_list[0]))print(len(images_list[1]))print(len(images_list[2]))print(len(images_list[3]))print(len(images_list[4]))print(len(images_list[5]))print(len(images_list[6]))print(len(images_list[7]))print(len(images_list[8]))print(len(images_list[9]))#以上根据label将sample分放在10个子列表中,每个子列表中存放对应数字类的所有样本。##以下为your code,生成images_train,images_test两个四维数组,分别存放数字0-9的训练集以及测试集。#images_train的尺寸是10,800,28,28.\n",##images_test的尺寸是10,200,28,28.\n",##1)推荐使用numpy函数random.choice或者random.sample来随机选取某类数字#任意1000张中的800张组成训练集,剩余200张组成测试集。\n",##2)像素数值归一化到0-1之间\n",##此步骤 4分\n"##$$$$$$$$$$$$$$$$$$$$$$ 代码1 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#注:该部分代码是将不重复抽取的1000个数据分800放在训练集中,200放在测试集中images0_sum = random.sample(range(len(images_list[0])), 1000)#不重复抽取1000images1_sum = random.sample(range(len(images_list[1])), 1000)#不重复抽取1000images2_sum = random.sample(range(len(images_list[2])), 1000)#不重复抽取1000images3_sum = random.sample(range(len(images_list[3])), 1000)#不重复抽取1000images4_sum = random.sample(range(len(images_list[4])), 1000)#不重复抽取1000images5_sum = random.sample(range(len(images_list[5])), 1000)#不重复抽取1000images6_sum = random.sample(range(len(images_list[6])), 1000)#不重复抽取1000images7_sum = random.sample(range(len(images_list[7])), 1000)#不重复抽取1000images8_sum = random.sample(range(len(images_list[8])), 1000)#不重复抽取1000images9_sum = random.sample(range(len(images_list[9])), 1000)#不重复抽取1000tra0=[]tra1=[]tra2=[]tra3=[]tra4=[]tra5=[]tra6=[]tra7=[]tra8=[]tra9=[]for i in range(0,800):tra0.append(images0_sum[i])#将800个训练数据添加到训练集中tra1.append(images1_sum[i])tra2.append(images2_sum[i])tra3.append(images3_sum[i])tra4.append(images4_sum[i])tra5.append(images5_sum[i])tra6.append(images6_sum[i])tra7.append(images7_sum[i])tra8.append(images8_sum[i])tra9.append(images9_sum[i])test0=[]test1=[]test2=[]test3=[]test4=[]test5=[]test6=[]test7=[]test8=[]test9=[]for i in range(800,1000):test0.append(images0_sum[i])#将200个训练数据添加到测试集中test1.append(images1_sum[i])test2.append(images2_sum[i])test3.append(images3_sum[i])test4.append(images4_sum[i])test5.append(images5_sum[i])test6.append(images6_sum[i])test7.append(images7_sum[i])test8.append(images8_sum[i])test9.append(images9_sum[i])images0_tra_v=[]#定义训练集800images1_tra_v=[]images2_tra_v=[]images3_tra_v=[]images4_tra_v=[]images5_tra_v=[]images6_tra_v=[]images7_tra_v=[]images8_tra_v=[]images9_tra_v=[]images0_test_v=[]#定义测试集200images1_test_v=[]images2_test_v=[]images3_test_v=[]images4_test_v=[]images5_test_v=[]images6_test_v=[]images7_test_v=[]images8_test_v=[]images9_test_v=[]images0_01=[]#训练集归一化images1_01=[]#训练集归一化images2_01=[]#训练集归一化images3_01=[]#训练集归一化images4_01=[]#训练集归一化images5_01=[]#训练集归一化images6_01=[]#训练集归一化images7_01=[]#训练集归一化images8_01=[]#训练集归一化images9_01=[]#训练集归一化images0_01t=[]#测试集归一化images1_01t=[]#测试集归一化images2_01t=[]#测试集归一化images3_01t=[]#测试集归一化images4_01t=[]#测试集归一化images5_01t=[]#测试集归一化images6_01t=[]#测试集归一化images7_01t=[]#测试集归一化images8_01t=[]#测试集归一化images9_01t=[]#测试集归一化for i in tra0:#训练集生成&归一images0_tra_v.append(images_list[0][i])images0_01.append(images_list[0][i]/255)for i in tra1:images1_tra_v.append(images_list[1][i])images1_01.append(images_list[1][i]/255)for i in tra2:images2_tra_v.append(images_list[2][i])images2_01.append(images_list[2][i]/255)for i in tra3:images3_tra_v.append(images_list[3][i])images3_01.append(images_list[3][i]/255)for i in tra4:images4_tra_v.append(images_list[4][i])images4_01.append(images_list[4][i]/255)for i in tra5:images5_tra_v.append(images_list[5][i])images5_01.append(images_list[5][i]/255)for i in tra6:images6_tra_v.append(images_list[6][i])images6_01.append(images_list[6][i]/255)for i in tra7:images7_tra_v.append(images_list[7][i])images7_01.append(images_list[7][i]/255)for i in tra8:images8_tra_v.append(images_list[8][i])images8_01.append(images_list[8][i]/255)for i in tra9:images9_tra_v.append(images_list[9][i])images9_01.append(images_list[9][i]/255)for i in test0:#测试集生成&归一images0_test_v.append(images_list[0][i])images0_01t.append(images_list[0][i]/255)for i in test1:images1_test_v.append(images_list[1][i])images1_01t.append(images_list[1][i]/255)for i in test2:images2_test_v.append(images_list[2][i])images2_01t.append(images_list[2][i]/255)for i in test3:images3_test_v.append(images_list[3][i])images3_01t.append(images_list[3][i]/255)for i in test4:images4_test_v.append(images_list[4][i])images4_01t.append(images_list[4][i]/255)for i in test5:images5_test_v.append(images_list[5][i])images5_01t.append(images_list[5][i]/255)for i in test6:images6_test_v.append(images_list[6][i])images6_01t.append(images_list[6][i]/255)for i in test7:images7_test_v.append(images_list[7][i])images7_01t.append(images_list[7][i]/255)for i in test8:images8_test_v.append(images_list[8][i])images8_01t.append(images_list[8][i]/255)for i in test9:images9_test_v.append(images_list[9][i])images9_01t.append(images_list[9][i]/255)#训练集与测试集的合并生成#未归一化images_train01=[images0_tra_v,images1_tra_v,images2_tra_v,images3_tra_v,images4_tra_v,images5_tra_v,images6_tra_v,images7_tra_v,images8_tra_v,images9_tra_v]images_test01=[images0_test_v,images1_test_v,images2_test_v,images3_test_v,images4_test_v,images5_test_v,images6_test_v,images7_test_v,images8_test_v,images9_test_v]#像素归一化0~1images_train=[images0_01,images1_01,images2_01,images3_01,images4_01,images5_01,images6_01,images7_01,images8_01,images9_01]images_test=[images0_01t,images1_01t,images2_01t,images3_01t,images4_01t,images5_01t,images6_01t,images7_01t,images8_01t,images9_01t]images_train=np.array(images_train)images_test=np.array(images_test)##$$$$$$$$$$$$$$$$$$$$$$ 代码1 结束 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$# 计算并画出数字“3”training set以及testing set的均值图像。#此步骤得2分print("images_train.shape=",images_train.shape)print("images_test.shape=",images_test.shape)#nub=3 #数字3## 以下your code 可使用plt.imshow画图##$$$$$$$$$$$$$$$$$$$$$$ 代码2 开始 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#注:为了均值图像的可视化,我这边按照28*28的格式处理,后面会转成784维度进行处理sum3_it=[[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]]s=0for x in range(0,28):for y in range(0,28):for num in range(0,800):a=images_train01[3][num][x][y]s=a+ssum3_it[x].append(s)s=0for x in range(0,28):for y in range(0,28):sum3_it[x][y]=sum3_it[x][y]/800sum3_it=np.array(sum3_it)sum3_it=array(sum3_it, dtype ='int32' )plt.imshow(sum3_it,cmap='gray')#两个的均值都是28*28plt.show()sum6_it=[[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]]s=0for x in range(0,28):for y in range(0,28):for num in range(0,800):a=images_train01[6][num][x][y]s=a+ssum6_it[x].append(s)s=0for x in range(0,28):for y in range(0,28):sum6_it[x][y]=sum6_it[x][y]/800sum6_it=np.array(sum6_it)sum6_it=array(sum6_it, dtype ='int32' )plt.imshow(sum6_it,cmap='gray')plt.show()##$$$$$$$$$$$$$$$$$$$$$$ 代码2 结束 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$positive_nub=3 #以positive_nub为images_train第一轴的索引,获得“3”为正样本,negative_nub=6 #以negative_nub为索引获得数字“6”为负样本\n",##以下your code,计算正样本均值,负样本均值,Sw,利用linalg.svd函数求Sw的逆,由公式3.39计算出最佳投影方向W。\n",##此步骤得4分\n",##$$$$$$$$$$$$$$$$$$$$$$ 代码3 开始 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$aa=np.ones((800,1))images3_01=np.array(images3_01)images3_01=np.reshape(images3_01,(800,784))images3_01h=np.column_stack((images3_01,aa)).Tprint("images3_01=",images3_01.shape)images6_01=np.array(images6_01)images6_01=np.reshape(images6_01,(800,784))images6_01h=np.column_stack((images6_01,aa)).Tbb=np.ones((200,1))images3_01t=np.array(images3_01t)images3_01t=np.reshape(images3_01t,(200,784))images3_01th=np.column_stack((images3_01t,bb)).Timages6_01t=np.array(images6_01t)images6_01t=np.reshape(images6_01t,(200,784))images6_01th=np.column_stack((images6_01t,bb)).T#训练集与测试集变换为800*785def FDA(images3_01h,images6_01h):mean3_1= np.mean(images3_01h,axis = 1)mean6_1= np.mean(images6_01h,axis = 1)uuuu=((mean3_1-mean6_1))cov1=np.empty(shape=[785,785])for num in range(800):pp=((images3_01h[:,num]-mean3_1))cov1=np.mat(pp).T*np.mat(pp)+cov1cov2=np.empty(shape=[785,785])for num in range(800):qq=((images6_01h[:,num]-mean6_1))cov2=np.mat(qq).T*np.mat(qq)+cov2Sw = cov1 + cov2#可以用奇异值分解求伪逆,也可以用np.linalg.pinv#奇异值分解法U,sigma,VT = np.linalg.svd(Sw)sigma=sorted(sigma,reverse=True)sigma=np.diag(sigma)Sw_1=np.dot(np.dot(VT.T,np.linalg.pinv(sigma)),U.T)#求Sw_1Sb=uuuu*(uuuu.T)U,sigma,VT = np.linalg.svd(Sw)# 计算ww=np.dot(np.linalg.pinv(Sw),uuuu.T)#np.linalg.pinv求伪逆较为方便return w##$$$$$$$$$$$$$$$$$$$$$$ 代码3 结束 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$##以下your code,获得投影方向W之后,将共1600个正、负样本投影到该方向,#并统计正样本投影结果的分布以及负样本投影结果的分布,绘制在一张图中,由此寻找恰当的threshold。\n",##最后,调用images_test中正负样本集,通过W'X是否大于threshold来判断正负性,并统计“3”“6”的分类error。#请输出 confusion matrix。至此,任务一结束。\n",#此步骤得4分。\n",##$$$$$$$$$$$$$$$$$$$$$$ 代码4 开始 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$w=FDA(images3_01h,images6_01h)y3=np.mat(w)*np.mat(images3_01h)y6=np.mat(w)*np.mat(images6_01h) weight=np.ones([800])aa=weight*0.000125weightss=np.ones([200])bb=weightss*0.0005y3=reshape(array(y3),[800])y6=reshape(array(y6),[800])#weights= [1./ len(数组)] * len(数组)plt.hist(y3, label='3',weights= [1./ len(y3)] * len(y3))plt.hist(y6, label='6',weights= [1./ len(y6)] * len(y6))plt.legend(loc='upper right')plt.show()threshold=0y3_test=np.mat(w)*np.mat(images3_01th)y6_test=np.mat(w)*np.mat(images6_01th)count3=[]#计数:将正类误判为负类count6=[]#计数:将负类误判为正类for i in range(200):if y3_test[:,i]<threshold:count3.append(1)if y3_test[:,i]>=threshold:count3.append(0)print('count3_error=',sum(count3))for i in range(200):if y6_test[:,i]>threshold:count6.append(1)if y6_test[:,i]<=threshold:count6.append(0)print('count6_error=',sum(count6))print("**** confusion matrix *****")print("fn=",sum(count3))print("tp=",200-sum(count3))print("fp=",sum(count6))print("tn=",200-sum(count6))y3_test=reshape(array(y3_test),[200])y6_test=reshape(array(y6_test),[200])plt.hist(y3_test, label='3',weights= [1./ len(y3_test)] * len(y3_test),alpha=0.6)plt.hist(y6_test, label='6',weights= [1./ len(y6_test)] * len(y6_test),alpha=0.6)plt.legend(loc='upper right')plt.show()##$$$$$$$$$$$$$$$$$$$$$$ 代码4 结束 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$##以下任务二的your code。请输出三维空间的数字投影图,以不同颜色区分数字。 请在W1,W2, W3确定的三维空间中,#计算“3”,“6”,以及“未知数字”三类之间Sw,Sb,\n",##以数字类为横轴,以Sb/Sw为纵轴,绘制曲线,并确定能与“3”“6”最大程度区分的数字。 \n",##此步骤得6分\n",##$$$$$$$$$$$$$$$$$$$$$$ 代码5 开始 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#此问中我的代码思路:#1首先处理数据#2投影图1:将十个数投影到以3,6,分类寻找到的三维特征空间中,查看分类情况#3投影图2:计算3 & 6的SW & SB,并将十个数据放到根据3 6找出的特征向量中测试性能#最后尝试计算sb/swi#有两个投影图,请等待程序运行images0_sum = random.sample(range(len(images_list[0])), 5000)#不重复抽取5000images1_sum = random.sample(range(len(images_list[1])), 5000)#不重复抽取5000images2_sum = random.sample(range(len(images_list[2])), 5000)#不重复抽取5000images3_sum = random.sample(range(len(images_list[3])), 5000)#不重复抽取5000images4_sum = random.sample(range(len(images_list[4])), 5000)images5_sum = random.sample(range(len(images_list[5])), 5000)images6_sum = random.sample(range(len(images_list[6])), 5000)images7_sum = random.sample(range(len(images_list[7])), 5000)images8_sum = random.sample(range(len(images_list[8])), 5000)images9_sum = random.sample(range(len(images_list[9])), 5000)tra0=[]tra1=[]tra2=[]tra3=[]tra4=[]tra5=[]tra6=[]tra7=[]tra8=[]tra9=[]for i in range(5000):tra0.append(images0_sum[i])tra1.append(images1_sum[i])tra2.append(images2_sum[i])tra3.append(images3_sum[i])tra4.append(images4_sum[i])tra5.append(images5_sum[i])tra6.append(images6_sum[i])tra7.append(images7_sum[i])tra8.append(images8_sum[i])tra9.append(images9_sum[i])images0_01=[]#训练集归一化images1_01=[]#训练集归一化images2_01=[]#训练集归一化images3_01=[]#训练集归一化images4_01=[]#训练集归一化images5_01=[]#训练集归一化images6_01=[]#训练集归一化images7_01=[]#训练集归一化images8_01=[]#训练集归一化images9_01=[]#训练集归一化for i in tra0:#测试集生成&归一images0_tra_v.append(images_list[0][i])images0_01.append(images_list[0][i]/255)for i in tra1:images1_tra_v.append(images_list[1][i])images1_01.append(images_list[1][i]/255)for i in tra2:images2_tra_v.append(images_list[2][i])images2_01.append(images_list[2][i]/255)for i in tra3:images3_tra_v.append(images_list[3][i])images3_01.append(images_list[3][i]/255)for i in tra4:images4_tra_v.append(images_list[4][i])images4_01.append(images_list[4][i]/255)for i in tra5:images5_tra_v.append(images_list[5][i])images5_01.append(images_list[5][i]/255)for i in tra6:images6_tra_v.append(images_list[6][i])images6_01.append(images_list[6][i]/255)for i in tra7:images7_tra_v.append(images_list[7][i])images7_01.append(images_list[7][i]/255)for i in tra8:images8_tra_v.append(images_list[8][i])images8_01.append(images_list[8][i]/255)for i in tra9:images9_tra_v.append(images_list[9][i])images9_01.append(images_list[9][i]/255)#images_train 5000样本归一化完毕aa=np.ones((5000,1))images0_01=np.array(images0_01)images0_01=np.reshape(images0_01,(5000,784))images0_01h=np.column_stack((images0_01,aa)).Timages1_01=np.array(images1_01)images1_01=np.reshape(images1_01,(5000,784))images1_01h=np.column_stack((images1_01,aa)).Timages2_01=np.array(images2_01)images2_01=np.reshape(images2_01,(5000,784))images2_01h=np.column_stack((images2_01,aa)).Timages3_01=np.array(images3_01)images3_01=np.reshape(images3_01,(5000,784))images3_01h=np.column_stack((images3_01,aa)).Timages4_01=np.array(images4_01)images4_01=np.reshape(images4_01,(5000,784))images4_01h=np.column_stack((images4_01,aa)).Timages5_01=np.array(images5_01)images5_01=np.reshape(images5_01,(5000,784))images5_01h=np.column_stack((images5_01,aa)).Timages6_01=np.array(images6_01)images6_01=np.reshape(images6_01,(5000,784))images6_01h=np.column_stack((images6_01,aa)).Timages7_01=np.array(images7_01)images7_01=np.reshape(images7_01,(5000,784))images7_01h=np.column_stack((images7_01,aa)).Timages8_01=np.array(images8_01)images8_01=np.reshape(images8_01,(5000,784))images8_01h=np.column_stack((images8_01,aa)).Timages9_01=np.array(images9_01)images9_01=np.reshape(images9_01,(5000,784))images9_01h=np.column_stack((images9_01,aa)).Timages_train=[images0_01h,images1_01h,images2_01h,images3_01h,images4_01h,images5_01h,images6_01h,images7_01h,images8_01h,images9_01h]images_train=np.array(images_train)print("images3_01=",images3_01.shape)images_train=np.reshape(images_train,(10,5000,785))print(images_train.shape)#计算swi & SWdef swi(images_data):aa=np.ones((5000,1))images_data=np.array(images_data)images_data=np.column_stack((images_data,aa)).Timages_data=np.reshape(images_data,(5000,785))images_datah=images_data.Tmean= np.mean(images_datah,axis = 1)cov=np.empty(shape=[785,785])for num in range(5000):pp=((images_datah[:,num]-mean))cov=np.mat(pp).T*np.mat(pp)+covreturn cov#计算十分类问题#将十组数据投影到根据之组数据计算出的三维特征向量的空间中#print(SW)images_all=np.column_stack((images0_01h,images1_01h,images2_01h,images3_01h,images4_01h,images5_01h,images6_01h,images7_01h,images8_01h,images9_01h))mean_all=np.mean(images_all,axis = 1)mean0=np.mean(images0_01h,axis = 1)mean1=np.mean(images1_01h,axis = 1)mean2=np.mean(images2_01h,axis = 1)mean3=np.mean(images3_01h,axis = 1)mean4=np.mean(images4_01h,axis = 1)mean5=np.mean(images5_01h,axis = 1)mean6=np.mean(images6_01h,axis = 1)mean7=np.mean(images7_01h,axis = 1)mean8=np.mean(images8_01h,axis = 1)mean9=np.mean(images9_01h,axis = 1)c0=((mean0-mean_all))c1=((mean1-mean_all))c2=((mean2-mean_all))c3=((mean3-mean_all))c4=((mean4-mean_all))c5=((mean5-mean_all))c6=((mean6-mean_all))c7=((mean7-mean_all))c8=((mean8-mean_all))c9=((mean9-mean_all))sb0=5000*dot(c0.T,c0)sb1=5000*dot(c1.T,c1)sb2=5000*dot(c2.T,c2)sb3=5000*dot(c3.T,c3)sb4=5000*dot(c4.T,c4)sb5=5000*dot(c5.T,c5)sb6=5000*dot(c6.T,c6)sb7=5000*dot(c7.T,c7)sb8=5000*dot(c8.T,c8)sb9=5000*dot(c9.T,c9)SB=sb0+sb1+sb2+sb3+sb4+sb5+sb6+sb7+sb8+sb9SW=np.empty(shape=[785,785])SW=swi(images0_01)+swi(images1_01)+swi(images2_01)+swi(images3_01)+swi(images4_01)+swi(images5_01)+swi(images6_01)+swi(images7_01)+swi(images8_01)+swi(images9_01)eig_valsaa, eig_vecsaa = np.linalg.eig(np.linalg.pinv(SW)*(SB))#print("w1_val=",eig_valsaa[0],"w2_val=",eig_valsaa[1],"w3_val=",eig_valsaa[2])ss=np.hstack((eig_vecsaa[:,0],eig_vecsaa[:,1],eig_vecsaa[:,2]))print("**")yy0=np.mat(ss).T*np.mat(images0_01h)yy1=np.mat(ss).T*np.mat(images1_01h)yy2=np.mat(ss).T*np.mat(images2_01h)yy3=np.mat(ss).T*np.mat(images3_01h)yy4=np.mat(ss).T*np.mat(images4_01h)yy5=np.mat(ss).T*np.mat(images5_01h)yy6=np.mat(ss).T*np.mat(images6_01h)yy7=np.mat(ss).T*np.mat(images7_01h)yy8=np.mat(ss).T*np.mat(images8_01h)yy9=np.mat(ss).T*np.mat(images9_01h)from mpl_toolkits.mplot3d import Axes3Dimport csvfrom PIL import Imagefig = plt.figure()X=yy3[0,:].astype('float32')Y=yy3[1,:].astype('float32')Z=yy3[2,:].astype('float32')ax=plt.subplot(111,projection='3d')ax.scatter(X,Y,Z,c='g')X=yy6[0,:].astype('float32')Y=yy6[1,:].astype('float32')Z=yy6[2,:].astype('float32')ax.scatter(X,Y,Z,c='r')X=yy0[0,:].astype('float32')Y=yy0[1,:].astype('float32')Z=yy0[2,:].astype('float32')ax.scatter(X,Y,Z,c='b')X=yy1[0,:].astype('float32')Y=yy1[1,:].astype('float32')Z=yy1[2,:].astype('float32')ax.scatter(X,Y,Z,c='k')X=yy2[0,:].astype('float32')Y=yy2[1,:].astype('float32')Z=yy2[2,:].astype('float32')ax.scatter(X,Y,Z,c='y')X=yy4[0,:].astype('float32')Y=yy4[1,:].astype('float32')Z=yy4[2,:].astype('float32')ax.scatter(X,Y,Z,c='m')X=yy5[0,:].astype('float32')Y=yy5[1,:].astype('float32')Z=yy5[2,:].astype('float32')ax.scatter(X,Y,Z,c='#A52A2A')X=yy7[0,:].astype('float32')Y=yy7[1,:].astype('float32')Z=yy7[2,:].astype('float32')ax.scatter(X,Y,Z,c='#FFFF00')X=yy8[0,:].astype('float32')Y=yy8[1,:].astype('float32')Z=yy8[2,:].astype('float32')ax.scatter(X,Y,Z,c='#00FF00')X=yy9[0,:].astype('float32')Y=yy9[1,:].astype('float32')Z=yy9[2,:].astype('float32')ax.scatter(X,Y,Z,c='#E6E6FA')plt.xlabel('LD1')plt.ylabel('LD2')plt.legend(loc='upper right', fancybox=True)plt.legend(loc='upper center', fancybox=True, ncol=5, labels=['3','6','0','1','2','4','5','7','8','9'])plt.title('LDA: Data is projected onto the first 3 feature vectors')plt.show()#将十组数据投影到根据之组数据计算出的三维特征向量的空间中#观察分类投影图认为1时离36最远的数字#十分类问题后的思考投影到n维度的问题,#私以为如果需要无特征损耗降维则需要求解(sw_1*sb)矩阵的秩,#秩数=特征向量的数量时没有特征损耗地降维,#但当讨论到1,2,3维度问题时,需要分析(sw_1*sb)矩阵的特征值的大小,#如果前两个特征值明显大于第三维特征值时,那么可以认为第三维度相对前两个维度时没那么重要的#个人看法,有不对的请多指导#计算“3”,“6”,以及“未知数字”三类之间Sw,Sb,\n",#将十个数投影到以3,6,分类寻找到的三维特征空间中,查看分类情况SW36=swi(images3_01)+swi(images6_01)train36=np.column_stack((images3_01h,images6_01h))mean3= np.mean(images3_01h,axis = 1)mean6= np.mean(images6_01h,axis = 1)mean_all=np.mean(train36,axis = 1)print("mean_all=",mean_all.shape)uuuu1=((mean3-mean_all))uuuu2=((mean6-mean_all))Sb=5000*dot(uuuu1.T,uuuu1)+5000*dot(uuuu2.T,uuuu2)eig_vals, eig_vecs = np.linalg.eig(np.linalg.pinv(SW36)*(Sb))#print("w1_val=",eig_vals[0],"w2_val=",eig_vals[1],"w3_val=",eig_vals[2])#特征向量#vec=[eig_vecs[:,0],eig_vecs[:,1],eig_vecs[:,2]]#vec=np.array(vec)s=np.hstack((eig_vecs[:,0],eig_vecs[:,1],eig_vecs[:,2]))#print(s.shape)SW_1=np.linalg.pinv(SW36)#U,sigma,VT = np.linalg.svd(np.mat(SW_1)*np.mat(Sb))#sigma=sorted(sigma,reverse=True)#w1_value=sigma[1]#w2_value=sigma[2]#w3_value=sigma[3]#print("w1=",w1_value,"w2=",w2_value,"w3=",w3_value)y0=np.mat(s).T*np.mat(images0_01h)y1=np.mat(s).T*np.mat(images1_01h)y2=np.mat(s).T*np.mat(images2_01h)y3=np.mat(s).T*np.mat(images3_01h)y4=np.mat(s).T*np.mat(images4_01h)y5=np.mat(s).T*np.mat(images5_01h)y6=np.mat(s).T*np.mat(images6_01h)y7=np.mat(s).T*np.mat(images7_01h)y8=np.mat(s).T*np.mat(images8_01h)y9=np.mat(s).T*np.mat(images9_01h)from mpl_toolkits.mplot3d import Axes3Dimport csvfrom PIL import Imagefig = plt.figure()X=y3[0,:].astype('float32')Y=y3[1,:].astype('float32')Z=y3[2,:].astype('float32')ax=plt.subplot(111,projection='3d')ax.scatter(X,Y,Z,c='g')X=y6[0,:].astype('float32')Y=y6[1,:].astype('float32')Z=y6[2,:].astype('float32')ax.scatter(X,Y,Z,c='r')X=y0[0,:].astype('float32')Y=y0[1,:].astype('float32')Z=y0[2,:].astype('float32')ax.scatter(X,Y,Z,c='b')X=y1[0,:].astype('float32')Y=y1[1,:].astype('float32')Z=y1[2,:].astype('float32')ax.scatter(X,Y,Z,c='k')X=y2[0,:].astype('float32')Y=y2[1,:].astype('float32')Z=y2[2,:].astype('float32')ax.scatter(X,Y,Z,c='y')X=y4[0,:].astype('float32')Y=y4[1,:].astype('float32')Z=y4[2,:].astype('float32')ax.scatter(X,Y,Z,c='m')X=y5[0,:].astype('float32')Y=y5[1,:].astype('float32')Z=y5[2,:].astype('float32')ax.scatter(X,Y,Z,c='#A52A2A')X=y7[0,:].astype('float32')Y=y7[1,:].astype('float32')Z=y7[2,:].astype('float32')ax.scatter(X,Y,Z,c='#F5FFFA')X=y8[0,:].astype('float32')Y=y8[1,:].astype('float32')Z=y8[2,:].astype('float32')ax.scatter(X,Y,Z,c='#00FF00')X=y9[0,:].astype('float32')Y=y9[1,:].astype('float32')Z=y9[2,:].astype('float32')ax.scatter(X,Y,Z,c='#E6E6FA')plt.xlabel('LD1')plt.ylabel('LD2')plt.legend(loc='upper right', fancybox=True)plt.legend(loc='upper center', fancybox=True, ncol=5, labels=['3','6','0','1','2','4','5','7','8','9'])plt.title('LDA: Data is projected onto the first 3 feature vectors')plt.show()#将十个数投影到以3,6分类寻找到的三维特征空间中,查看分类情况#十分类问题后的尝试画sb/sw#np.linalg.det()sw0=swi(images0_01)+swi(images3_01)+swi(images6_01)sb0=sb0+sb3+sb6sw0=np.linalg.det(sw0)value0=sb0/sw0sw1=swi(images1_01)+swi(images3_01)+swi(images6_01)sb1=sb1+sb3+sb6sw1=np.linalg.det(sw1)value1=sb1/sw1sw2=swi(images2_01)+swi(images3_01)+swi(images6_01)sb2=sb2+sb3+sb6sw2=np.linalg.det(sw2)value1=sb2/sw2sw4=swi(images4_01)+swi(images3_01)+swi(images6_01)sb4=sb4+sb3+sb6sw4=np.linalg.det(sw4)value4=sb4/sw4sw5=swi(images5_01)+swi(images3_01)+swi(images6_01)sb5=sb5+sb3+sb6sw5=np.linalg.det(sw5)value5=sb5/sw5sw7=swi(images7_01)+swi(images3_01)+swi(images6_01)sb7=sb7+sb3+sb6sw7=np.linalg.det(sw7)value7=sb7/sw7sw8=swi(images8_01)+swi(images3_01)+swi(images6_01)sb8=sb8+sb3+sb6sw8=np.linalg.det(sw8)value8=sb8/sw8sw9=swi(images9_01)+swi(images3_01)+swi(images6_01)sb9=sb9+sb3+sb6sw9=np.linalg.det(sw9)value9=sb9/sw9x = [0,1,2,4,5,7,8,9]y = [value0,value1,value2,value4,value5,value7,value8,value9]plt.plot(x, y, 'ro-')plt.show()

注:

#使用编程环境:python3.8 由于软件问题,必须解析读取数据集函数,解析函数已经写好,

#解析函数是csdn某博主写的,我这里代码时使用他的,如有版权问题请与我联系

#请将训练集文件等的!!路径!!填到下面的格式里,格式里不要有中文

问题

任务二:

#此问中我的代码思路:

#1首先处理数据

#2投影图1:将十个数投影到以3,6,分类寻找到的三维特征空间中,查看分类情况

#3投影图2:计算3 & 6的SW & SB,并将十个数据放到根据3 6找出的特征向量中测试性能

#4最后尝试计算sb/swi

#有两个投影图,请等待程序运行

#将十组数据投影到根据之组数据计算出的三维特征向量的空间中

#观察分类投影图认为1时离36最远的数字

#十分类问题后的思考投影到n维度的问题,

#私以为如果需要无特征损耗降维则需要求解(sw_1sb)矩阵的秩,

#秩数=特征向量的数量时没有特征损耗地降维,

#但当讨论到1,2,3维度问题时,需要分析(sw_1sb)矩阵的特征值的大小,

#如果前两个特征值明显大于第三维特征值时,那么可以认为第三维度相对前两个维度时没那么重要的

#个人看法,如有不对的请多指导

#由于多分类问题是通过比较Sb/Sw来做出判断,

因此,对问题做出以下思路:可以找到十分类的w1w2w3,然后将每个数据点投影到空间内,

先求出每组数据再三位空间中各自的平均值,类内散度计算通过每个数据与空间平均值的几何距离之和

先求出10组数据再三位空间中公有的平均值,类间散度计算通过每个数据与空间平均值的几何距离之和

最后比较Sb/Sw,商越大的该数据越容易与其他数据区分。

.11月完成

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。