
Machine Learning Example (4): The Iris Dataset

Date: 2024-07-22 16:59:09

Data description

Number of Instances: 150 (50 in each of three classes)

Number of Attributes: 4 numeric, predictive attributes and the class

Missing Attribute Values: None

Attribute Information:

sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
class: Iris-Setosa, Iris-Versicolour, Iris-Virginica

Class Distribution: 33.3% for each of 3 classes.

The data comes from the Iris dataset bundled with the Scikit-learn toolkit.

# Import the Iris data loader from sklearn.datasets
from sklearn.datasets import load_iris
# Load the data with the loader and store it in the variable iris
iris = load_iris()
# Check the size of the data
iris.data.shape
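As a quick sanity check on the description above, the following sketch counts the samples per class and prints the feature names. The variable name class_counts is introduced here for illustration only; everything else uses the same load_iris loader as the article.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many samples fall into each of the three classes.
class_counts = np.bincount(iris.target)

print(iris.data.shape)            # (number of samples, number of features)
print(list(iris.feature_names))   # the four sepal/petal measurements
for name, count in zip(iris.target_names, class_counts):
    print(name, count)
```

If the counts come out equal, the classes are balanced, which is why plain accuracy is a reasonable first metric for this dataset.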

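The Iris dataset contains 150 iris samples, evenly distributed across three species; each sample is described by four shape features of the petals and sepals. Following common practice, the data is randomly split into a training set and a test set.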

# Import train_test_split from sklearn.model_selection to split the data
from sklearn.model_selection import train_test_split
# Use train_test_split with the random seed random_state=33 to hold out 25% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=33)
# Import the standardization module from sklearn.preprocessing
from sklearn.preprocessing import StandardScaler
# Standardize the training and test features; the scaler is fitted on the training set only
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)
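The split and scaling steps can be checked with a short sketch. It repeats the same calls as the article and then inspects the result; the names X_train_std and X_test_std are introduced here so the raw and standardized arrays can be compared side by side.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

# Fit the scaler on the training features only, then reuse the same
# mean and standard deviation to transform the test features.
ss = StandardScaler()
X_train_std = ss.fit_transform(X_train)
X_test_std = ss.transform(X_test)

print(X_train.shape, X_test.shape)          # sizes of the two subsets
print(X_train_std.mean(axis=0).round(6))    # per-feature mean of the training set
print(X_train_std.std(axis=0).round(6))     # per-feature std of the training set
print(X_test_std.mean(axis=0).round(3))     # test means are close to, but not exactly, zero
```

Fitting on the training set alone keeps information about the test set out of the preprocessing step, which is why the test features do not end up with an exact zero mean.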

Trying a model

The model tried here is the K-nearest neighbors classifier.

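# Import KNeighborsClassifier, the K-nearest neighbors classifier, from sklearn.neighbors
from sklearn.neighbors import KNeighborsClassifier
# Fit the classifier on the training data and predict the classes of the test data; the predictions are stored in y_predict
knc = KNeighborsClassifier()
knc.fit(X_train, y_train)
y_predict = knc.predict(X_test)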

Model evaluation

As before, the model is evaluated with accuracy, precision, recall and the F1 score.

# Measure accuracy with the model's built-in scoring function
print('The accuracy of K-Nearest Neighbor Classifier is', knc.score(X_test, y_test))
# As before, use classification_report from sklearn.metrics for a more detailed analysis of the predictions
from sklearn.metrics import classification_report
print(classification_report(y_test, y_predict, target_names=iris.target_names))

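On the 38 iris test samples, the K-nearest neighbors classifier reaches an accuracy of about 89.474%; the average precision, recall and F1 score are 0.92, 0.89 and 0.90 respectively.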
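An accuracy of 89.474% on 38 samples corresponds to 34 correct predictions (34/38 ≈ 0.89474). To make the relationship between these metrics concrete, the sketch below recomputes them from a confusion matrix. It is one possible way to derive the numbers, not the internals of classification_report; the names cm, accuracy, precision, recall and f1 are introduced here, and the per-class division assumes every class is predicted at least once.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

knc = KNeighborsClassifier()
knc.fit(X_train, y_train)
y_predict = knc.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_predict)

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()
# Precision: of the samples predicted as a class, the share that is correct.
precision = np.diag(cm) / cm.sum(axis=0)
# Recall: of the samples truly in a class, the share that is found.
recall = np.diag(cm) / cm.sum(axis=1)
# F1: harmonic mean of precision and recall, per class.
f1 = 2 * precision * recall / (precision + recall)

print(cm)
print('accuracy ', accuracy)
print('precision', precision.round(2))
print('recall   ', recall.round(2))
print('f1       ', f1.round(2))
```

The accuracy computed this way should agree with knc.score(X_test, y_test), since both count the fraction of test samples whose predicted label matches the true label.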

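K-nearest neighbors is one of the simplest non-parametric models. However, this kind of decision procedure leads to very high computational cost and memory consumption: for every test sample, the model has to traverse all training samples held in memory, compute the similarity to each one, sort the results, and take the labels of the K nearest training samples before it can make a classification decision.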
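The procedure described above can be sketched in plain Python. This is a minimal brute-force illustration under stated assumptions, not how scikit-learn implements KNeighborsClassifier: the function knn_predict and the toy data are hypothetical, and Euclidean distance is assumed as the (dis)similarity measure.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify one query point by brute-force K-nearest-neighbor voting."""
    # Traverse every stored training sample and compute its distance to the query.
    distances = [(math.dist(x, query), label)
                 for x, label in zip(train_X, train_y)]
    # Sort by distance and keep the labels of the k closest samples.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # A majority vote among the k neighbors decides the class.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data made up for illustration: two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ['a', 'a', 'a', 'b', 'b', 'b']

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.1, 5.0)))
```

Each call touches all training samples and sorts them, so the cost of a single prediction grows with the size of the training set, and the whole training set has to stay in memory. That is the overhead the paragraph above refers to.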
