
Building a House Price Prediction Model with Linear Regression

Date: 2023-07-23 20:29:01


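Contents

Preface
1. Using the normal equation
    1.1 Importing the libraries needed to build the model
    1.2 Model code
2. Using gradient descent
    2.1 Importing libraries
    2.2 Building the model
Summary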

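Preface

This article uses the load_boston dataset that ships with scikit-learn and fits a linear regression model to predict house prices. The model is optimized in two ways: with the normal equation and with gradient descent.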

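The normal equation and gradient descent are the two optimization methods most commonly used for linear regression. Both aim to find the model parameters that minimize the loss function: the normal equation solves for them in closed form, while gradient descent approaches them iteratively.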
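
To make the contrast concrete, here is a minimal NumPy sketch of both approaches on synthetic data. The data, the function names, the learning rate, and the iteration count are all assumptions chosen for illustration; this is not how scikit-learn implements either estimator internally.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 3*x1 - 2*x2 + 5 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 5.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is learned as the last weight.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])


def normal_equation(Xb, y):
    """Solve (X^T X) w = X^T y directly."""
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)


def gradient_descent(Xb, y, lr=0.1, n_iter=1000):
    """Minimize the mean squared error with batch gradient descent."""
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        # Gradient of mean((Xb @ w - y) ** 2) with respect to w.
        grad = 2.0 / len(y) * Xb.T @ (Xb @ w - y)
        w -= lr * grad
    return w


w_ne = normal_equation(Xb, y)
w_gd = gradient_descent(Xb, y)
print("normal equation: ", w_ne)
print("gradient descent:", w_gd)
```

On a well-conditioned problem like this one, both routes should arrive at essentially the same weights; they differ in cost and in how they scale to large or ill-conditioned data.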

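The standard steps for building a model:

1. Get the data
2. Basic data processing: splitting the data, handling missing values, etc.
3. Feature engineering: standardization, normalization, etc.
4. Model training
5. Model evaluation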

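scikit-learn version: 0.20.4 (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older release such as this one)

Python version: 3.7.10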

1. Using the normal equation

1.1 Importing the libraries needed to build the model

    # Load the dataset
    from sklearn.datasets import load_boston
    # Split the data
    from sklearn.model_selection import train_test_split
    # Standardize the features
    from sklearn.preprocessing import StandardScaler
    # scikit-learn's built-in linear model solved with the normal equation
    from sklearn.linear_model import LinearRegression
    # Evaluate the model with the mean squared error
    from sklearn.metrics import mean_squared_error

1.2 Model code

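    def linear_model1():
        """Linear regression fitted with the normal equation."""
        # 1. Get the data
        boston = load_boston()
        # print(boston)

        # 2. Basic data processing
        # 2.1 Split the data
        x_train, x_test, y_train, y_test = train_test_split(
            boston.data, boston.target, test_size=0.2)

        # 3. Feature engineering: standardization
        # The features are standardized; the target values are not.
        transfer = StandardScaler()
        x_train = transfer.fit_transform(x_train)
        # Reuse the training-set statistics; do not refit on the test set.
        x_test = transfer.transform(x_test)

        # 4. Machine learning: linear regression (normal equation)
        estimator = LinearRegression()
        estimator.fit(x_train, y_train)  # pass the training features and targets

        # 5. Model evaluation
        # 5.1 Predictions: test features in, predicted targets out
        y_pre = estimator.predict(x_test)
        # print("Predictions:\n", y_pre)

        # 5.2 Mean squared error between the true and predicted targets
        ret = mean_squared_error(y_test, y_pre)
        print("Mean squared error:\n", ret)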
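
Standardization statistics are normally estimated on the training split only and then reused on the test split, so that the test data does not influence preprocessing. The following NumPy sketch shows the arithmetic involved; the toy values are made up for illustration, and the computation mirrors what StandardScaler does by default (centering, then dividing by the population standard deviation).

```python
import numpy as np

# Toy feature matrices (illustrative values, not taken from the Boston data).
x_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
x_test = np.array([[2.0, 250.0], [4.0, 500.0]])

# The statistics come from the training split only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)  # population standard deviation (ddof=0)

# Both splits are scaled with the same training-set mean and std.
x_train_scaled = (x_train - mean) / std
x_test_scaled = (x_test - mean) / std

print(x_train_scaled.mean(axis=0))
print(x_test_scaled)
```

After scaling, the training columns have zero mean and unit variance, while the test columns generally do not; that is expected, since they are measured against the training distribution.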

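2. Using gradient descent

As algorithms, gradient descent and the normal equation are very different, but when training a model with scikit-learn the only change is which estimator class is instantiated.

To use gradient descent, import SGDRegressor, scikit-learn's stochastic gradient descent regressor.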
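
The point that only the estimator changes can be sketched with a shared helper. The helper name `evaluate`, the synthetic dataset from make_regression, and the fixed random seeds are assumptions for illustration; the synthetic data stands in for the housing data so the snippet does not depend on load_boston.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def evaluate(estimator, X, y, seed=0):
    """Split, standardize, fit, and return the test-set mean squared error."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)
    estimator.fit(x_train, y_train)
    return mean_squared_error(y_test, estimator.predict(x_test))


# Synthetic stand-in for the housing data (an assumption for illustration).
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

mse_ne = evaluate(LinearRegression(), X, y)
mse_sgd = evaluate(SGDRegressor(max_iter=2000, tol=0.001, random_state=0), X, y)
print("LinearRegression MSE:", mse_ne)
print("SGDRegressor MSE:    ", mse_sgd)
```

Everything except the estimator argument is identical between the two calls, which is why the two walkthroughs in this article look so similar.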

2.1 Importing libraries

The imports are the same as before, apart from the estimator class.

    # Load the dataset
    from sklearn.datasets import load_boston
    # Split the data
    from sklearn.model_selection import train_test_split
    # Standardize the features
    from sklearn.preprocessing import StandardScaler
    # scikit-learn's built-in stochastic gradient descent regressor
    from sklearn.linear_model import SGDRegressor
    # Evaluate the model with the mean squared error
    from sklearn.metrics import mean_squared_error

2.2 Building the model

Building the model is also the same as before; only the estimator that is instantiated differs.

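    def linear_model2():
        """Linear regression fitted with gradient descent."""
        # 1. Get the data
        boston = load_boston()
        # print(boston)

        # 2. Basic data processing
        # 2.1 Split the data
        x_train, x_test, y_train, y_test = train_test_split(
            boston.data, boston.target, test_size=0.2)

        # 3. Feature engineering: standardization
        transfer = StandardScaler()
        x_train = transfer.fit_transform(x_train)
        # Reuse the training-set statistics; do not refit on the test set.
        x_test = transfer.transform(x_test)

        # 4. Machine learning: linear regression (gradient descent)
        # Since scikit-learn 0.19, the stopping rule is set through two
        # parameters: max_iter and tol.
        # The default learning-rate schedule, invscaling, shrinks the
        # learning rate as the number of updates grows.
        estimator = SGDRegressor(max_iter=2000, tol=0.001)
        estimator.fit(x_train, y_train)

        # 5. Model evaluation
        # 5.1 Predictions
        y_pre = estimator.predict(x_test)

        # 5.2 Mean squared error
        ret = mean_squared_error(y_test, y_pre)
        print("Mean squared error:\n", ret)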
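
For intuition about the invscaling schedule, the scikit-learn documentation gives it as eta = eta0 / pow(t, power_t), where t counts the updates. The sketch below evaluates that formula in plain Python; the helper name `invscaling` is hypothetical, and the default values eta0=0.01 and power_t=0.25 are those documented for SGDRegressor.

```python
def invscaling(t, eta0=0.01, power_t=0.25):
    """Learning rate at update step t (t >= 1) under the invscaling schedule."""
    return eta0 / (t ** power_t)


# The rate starts at eta0 and decays slowly as training proceeds.
for t in (1, 16, 10_000):
    print(t, invscaling(t))
```

With power_t=0.25 the decay is gentle: the step count has to grow sixteenfold for the learning rate to halve.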