scikit-learn Machine Learning: Boston House Price Prediction (Linear Regression)

Date: 2019-05-22 10:57:15

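In this section:

First we load the data and check how many samples and features it contains. We then train a linear regression model on the 80% training split, find that its score is fairly low, and switch to a polynomial model. Finally we plot learning curves to judge which polynomial degree fits best, and use that model to predict house prices.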

Import modules

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

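Load the data and check the number of samples

from sklearn.datasets import load_boston

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
boston = load_boston()
X = boston.data
y = boston.target
X.shape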

Check the feature names

boston.feature_names

Split the dataset

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

Train a linear regression model

time.clock was removed in Python 3.8; use time.perf_counter() instead.

import time
from sklearn.linear_model import LinearRegression

model = LinearRegression()
start = time.perf_counter()
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)
cv_score = model.score(X_test, y_test)
print('elapsed: {0:.6f}; train_score: {1:0.6f}; cv_score: {2:.6f}'.format(time.perf_counter()-start, train_score, cv_score))

The predictions are not very good, so we optimize the model.

Second-degree polynomial model: degree=2

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

def polynomial_model(degree=1):
    polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)
    # Note: the normalize parameter was removed in scikit-learn 1.2
    linear_regression = LinearRegression(normalize=True)
    pipeline = Pipeline([("polynomial_features", polynomial_features),
                         ("linear_regression", linear_regression)])
    return pipeline

model = polynomial_model(degree=2)
start = time.perf_counter()
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)
cv_score = model.score(X_test, y_test)
print('elapsed: {0:.6f}; train_score: {1:0.6f}; cv_score: {2:.6f}'.format(time.perf_counter()-start, train_score, cv_score))
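To make concrete what the PolynomialFeatures step does to the inputs, here is a minimal pure-Python sketch of the expansion. The helper names polynomial_terms and n_polynomial_features are hypothetical, and the figure of 13 input features assumes the classic Boston dataset. It only mimics how the terms are enumerated when include_bias=False and is not scikit-learn's implementation.

```python
from itertools import combinations_with_replacement
from math import comb


def polynomial_terms(names, degree):
    """List all monomials of total degree 1..degree (no bias term)."""
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            terms.append("*".join(combo))
    return terms


def n_polynomial_features(n_features, degree):
    """Count the monomials of total degree 1..degree in n_features variables."""
    return comb(n_features + degree, degree) - 1


# Two inputs a and b expanded to degree 2
print(polynomial_terms(["a", "b"], 2))  # ['a', 'b', 'a*a', 'a*b', 'b*b']

# Assumption: 13 input features, as in the classic Boston dataset
for degree in (1, 2, 3):
    print(degree, n_polynomial_features(13, degree))
```

Under that assumption the feature count grows from 13 to 104 to 559. At degree 3 the model would have more features than the roughly 400 training samples an 80/20 split of about 500 rows leaves, which is consistent with a high-degree model overfitting.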

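Plot the learning curves

degree = 1, 2, 3, i.e. first-, second- and third-degree models

For the second-degree model the gap between the training curve and the cross-validation curve is large, which suggests that more training data is needed.

# common.utils is not part of scikit-learn; it appears to be a helper module shipped with the book's code
from common.utils import plot_learning_curve
from sklearn.model_selection import ShuffleSplit

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
title = 'Learning Curves (degree={0})'
degrees = [1, 2, 3]

start = time.perf_counter()
plt.figure(figsize=(18, 4), dpi=200)
for i in range(len(degrees)):
    plt.subplot(1, 3, i + 1)
    plot_learning_curve(plt, polynomial_model(degrees[i]), title.format(degrees[i]), X, y, ylim=(0.01, 1.01), cv=cv)

print('elapsed: {0:.6f}'.format(time.perf_counter()-start))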
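The plot_learning_curve helper imported from common.utils is not part of scikit-learn. As a rough stand-in, the sketch below computes the same kind of curve data with scikit-learn's learning_curve function. The function name learning_curve_scores and the synthetic data from make_regression are assumptions made for illustration, and the pipeline omits normalize=True, so the numbers will not match the Boston results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def learning_curve_scores(estimator, X, y, cv):
    """Return training-set sizes plus mean train and CV scores at each size."""
    sizes, train_scores, cv_scores = learning_curve(
        estimator, X, y, cv=cv, train_sizes=np.linspace(0.1, 1.0, 5))
    # Each row holds one score per CV split; average across the splits
    return sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)


# Synthetic stand-in data (assumption: this is not the Boston dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=5.0,
                                 random_state=0)
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

model = Pipeline([
    ("polynomial_features", PolynomialFeatures(degree=1, include_bias=False)),
    ("linear_regression", LinearRegression()),
])
sizes, train_mean, cv_mean = learning_curve_scores(model, X_demo, y_demo, cv)
for n, tr, te in zip(sizes, train_mean, cv_mean):
    print('n={0}: train_score={1:.3f}; cv_score={2:.3f}'.format(n, tr, te))
```

Plotting train_mean and cv_mean against sizes with plt.plot gives curves of the same kind as those discussed here. A wide gap between the two lines is the pattern read as a sign that more training data is needed.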

With degree 3

the score becomes negative, which indicates severe overfitting, so this degree is not suitable.
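For a regressor, score returns the coefficient of determination R², which is 1 minus the ratio of the residual sum of squares to the total sum of squares. It therefore drops below zero whenever the predictions are worse than always predicting the mean of the targets. The pure-Python sketch below illustrates this with made-up numbers; r2_score_manual is a hypothetical helper, not the scikit-learn function.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]  # made-up targets

print(r2_score_manual(y_true, [1.0, 2.0, 3.0, 4.0]))  # perfect fit -> 1.0
print(r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5]))  # always the mean -> 0.0
print(r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean -> -3.0
```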

In practice these linear regression models are all much alike: the code stays the same and only degree needs to change.
