
Python crawler + data analysis: NBA player LBJ's data over 13 seasons

Time: 2023-04-25 14:08:45


Python crawler

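I have been reading books on data analysis lately and wanted to analyze something myself. I am an NBA fan, so the NBA was the natural first subject. The first step is to get the most complete NBA data available. After comparing search results, I found that the stat- site has very comprehensive and detailed data, a real credit to the industry.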

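With the data source found, the first task is to fetch the data. Here I crawl it directly with plain Python (urllib and re). Personally I think BeautifulSoup is the better choice; I simply did not use it at the start, so I never switched to it later. Enough talk, straight to the code.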

#coding=utf-8
# Python 2 script: it relies on urllib.urlopen and the file() builtin.
import urllib
import re
import csv

# Each scraped row has 27 fields, in this order:
# player name, season, win/loss, game result (opponent name, opponent score,
# own score, own team name), starter, minutes, FG%, FG made, FG attempted,
# 3P%, 3P made, 3P attempted, FT%, FT made, FT attempted, total rebounds,
# offensive rebounds, defensive rebounds, assists, steals, blocks,
# turnovers, fouls, points
rows = []

# Fetch one page
def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

# Fetch the data page by page (it is written to CSV at the end)
for k in range(0, 51):
    # These pages hold only LBJ's career regular-season games, as of .12.26
    html = getHtml("http://www.stat-/query.php?QueryType=game&GameType=season&Player_id=1862&crtcol=season&order=1&page=" + str(k))
    # Extract every field of every table row with one regular expression
    playerdata = re.findall(
        r'<td class ="normal player_name_out change_color col1 row.+"><a.*>(.*)</a></td>'
        r'\s*<td class ="current season change_color col2 row.+"><a.*>(.*)</a></td>'
        r'\s*<td class ="normal wl change_color col3 row.+">(.*)</td>'
        r'\s*<td class ="normal result_out change_color col4 row.+"><a.*>(\D*|76人)(\d+)-(\d+)(\D*)</a></td>'
        r'\s*<td class ="normal gs change_color col5 row.+">(.*)</td>'
        r'\s*<td class ="normal mp change_color col6 row.+">(.*)</td>'
        r'\s*<td class ="normal fgper change_color col7 row.+">(.*%|\s*)</td>'
        r'\s*<td class ="normal fg change_color col8 row.+">(.*)</td>'
        r'\s*<td class ="normal fga change_color col9 row.+">(.*)</td>'
        r'\s*<td class ="normal threepper change_color col10 row.+">(.*%|\s*)</td>'
        r'\s*<td class ="normal threep change_color col11 row.+">(.*)</td>'
        r'\s*<td class ="normal threepa change_color col12 row.+">(.*)</td>'
        r'\s*<td class ="normal ftper change_color col13 row.+">(.*%|\s*)</td>'
        r'\s*<td class ="normal ft change_color col14 row.+">(.*)</td>'
        r'\s*<td class ="normal fta change_color col15 row.+">(.*)</td>'
        r'\s*<td class ="normal trb change_color col16 row.+">(.*)</td>'
        r'\s*<td class ="normal orb change_color col17 row.+">(.*)</td>'
        r'\s*<td class ="normal drb change_color col18 row.+">(.*)</td>'
        r'\s*<td class ="normal ast change_color col19 row.+">(.*)</td>'
        r'\s*<td class ="normal stl change_color col20 row.+">(.*)</td>'
        r'\s*<td class ="normal blk change_color col21 row.+">(.*)</td>'
        r'\s*<td class ="normal tov change_color col22 row.+">(.*)</td>'
        r'\s*<td class ="normal pf change_color col23 row.+">(.*)</td>'
        r'\s*<td class ="normal pts change_color col24 row.+">(.*)</td>', html)
    # Process each matched row
    for data in playerdata:
        # Copy the tuple into a list so the values can be modified
        data1 = list(data)
        data1[4] = int(data1[4])
        # Fields 9, 12 and 15 are percentages: blank cells become 0,
        # otherwise drop the percent sign and keep the number
        for i in (9, 12, 15):
            if data1[i].strip() == '':
                data1[i] = 0
            else:
                data1[i] = float(data1[i].strip().rstrip('%'))
        rows.append(tuple(data1))

# Create the CSV output file ('wb' would overwrite; 'ab+' appends,
# so running the script twice duplicates the rows)
csvfile = file('nbadata.csv', 'ab+')
writer = csv.writer(csvfile)
# Write all collected rows to the CSV
writer.writerows(rows)
csvfile.close()
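As a point of comparison with the long regular expression above, here is a minimal sketch of the same two steps (pull the cells out of each table row, then clean the percentage cells) using an HTML parser. It is written for Python 3 and uses the standard library's html.parser as a dependency-free stand-in for BeautifulSoup. The sample markup, `CellCollector`, `extract_rows` and `parse_percent` are all hypothetical names invented for illustration; the real page has many more columns.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def parse_percent(cell):
    """Turn '45.5%' into 45.5; blank cells become 0, as in the script above."""
    text = cell.strip()
    if not text:
        return 0.0
    return float(text.rstrip("%"))


# Hypothetical sample markup, far simpler than the real page.
SAMPLE_HTML = """
<table>
<tr><td class="name"><a href="#">Player A</a></td><td class="fgper">45.5%</td><td class="pts">27</td></tr>
<tr><td class="name"><a href="#">Player A</a></td><td class="fgper"> </td><td class="pts">0</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
cleaned = [[name, parse_percent(fg), int(pts)] for name, fg, pts in rows]
print(cleaned)
```

The parser does not care about the exact class attribute of each cell, so it is less brittle than the regular expression when the page layout shifts slightly.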

Crawling produces the nbadata.csv file. With the data in hand, the next step is the analysis.

Data analysis and visualization

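This is only a simple analysis of LBJ's career regular-season data. Time was limited, so I did just two analyses. The first counts how many regular-season games ended with each points total, which gives the number of games for every score. The second takes five stats (points, rebounds, assists, blocks, steals) for each of the past 13 seasons and produces 13 ability charts (radar charts), one per season.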
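The two analyses described above can be sketched on a toy dataset. This is a Python 3, standard-library-only illustration, not the pandas code used in the post; the rows in `GAMES` are made up, and `points_frequency` and `season_averages` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (season, points, assists, rebounds, steals, blocks).
GAMES = [
    ("03-04", 25, 9, 6, 4, 0),
    ("03-04", 21, 8, 7, 2, 1),
    ("04-05", 25, 5, 8, 1, 2),
]


def points_frequency(games):
    """Return [(points, number_of_games)] sorted by points total."""
    return sorted(Counter(game[1] for game in games).items())


def season_averages(games):
    """Return {season: [avg points, assists, rebounds, steals, blocks]}."""
    by_season = defaultdict(list)
    for season, *stats in games:
        by_season[season].append(stats)
    return {
        season: [round(sum(column) / len(rows), 1) for column in zip(*rows)]
        for season, rows in by_season.items()
    }


print(points_frequency(GAMES))   # [(21, 1), (25, 2)]
print(season_averages(GAMES)["03-04"])  # [23.0, 8.5, 6.5, 3.0, 0.5]
```

The first result feeds a bar chart (points on the x-axis, game counts on the y-axis); each list in the second result is the five-value input for one season's radar chart.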

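The visualization is a web site built with the Flask framework. On the front end, the charts are drawn with Baidu's ECharts (echarts.js). I strongly recommend ECharts; it makes charting remarkably easy.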

Straight to the code:

mydata.py

#coding:utf-8
# Python 2 script: reload(sys), sys.setdefaultencoding and file() are Python 2 only.
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
from flask import Flask, render_template
from flask_bootstrap import Bootstrap
from pandas import DataFrame
import csv

# File path
srcFilePath = "c:/myflask/nbadata.csv"
# Read the CSV data file
reader = csv.reader(file(srcFilePath, 'rb'))
# Meaning of each CSV column; (1) is the first column:
# player name (1), season (2), win/loss (3), opponent team name (4),
# opponent total score (5), own team total score (6), own team name (7),
# starter (8) [1 = started, 0 = came off the bench], minutes played (9),
# FG% (10), FG made (11), FG attempted (12), 3P% (13), 3P made (14),
# 3P attempted (15), FT% (16), FT made (17), FT attempted (18),
# total rebounds (19), offensive rebounds (20), defensive rebounds (21),
# assists (22), steals (23), blocks (24), turnovers (25), fouls (26), points (27)
records = [line for line in reader]
frame = DataFrame(records)

# Number of games for each points total
pts_count = frame[26].value_counts()
# CSV values are strings, so convert to int before sorting by points
c = dict((int(k), int(v)) for k, v in zip(pts_count.keys(), pts_count))
d = sorted(c.items())
# Points totals
e = [i[0] for i in d]
# Number of games with each points total
f = [i[1] for i in d]

# Per-season averages of points, assists, rebounds, steals and blocks
seasons = ['03-04', '04-05', '05-06', '06-07', '07-08', '08-09', '09-10',
           '10-11', '11-12', '12-13', '13-14', '14-15', '15-16']
g = []
for season in seasons:
    records_p = [(int(line[26]), int(line[21]), int(line[18]), int(line[22]), int(line[23]))
                 for line in records if line[1] == season]
    # Round each average to one decimal place
    g.append([float('%0.1f' % i) for i in DataFrame(records_p).mean()])

app = Flask(__name__)
# Load the Bootstrap front-end framework
bootstrap = Bootstrap(app)

@app.route('/')
def hello_world():
    # index.html expects the season averages as c1 ... c13
    averages = dict(('c%d' % (n + 1), g[n]) for n in range(len(g)))
    return render_template('index.html', a=e, b=f, **averages)

if __name__ == '__main__':
    app.run(debug=True)
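The script above addresses columns by bare index (26 for points, 21 for assists, and so on), which is easy to get wrong. A minimal Python 3 sketch of naming the columns instead is shown here. The English names in `COLUMNS` are hypothetical labels chosen for this example, following the column order documented in mydata.py; `read_games` and the sample row are likewise illustrative.

```python
import csv
import io

# Hypothetical names for the 27 CSV columns, in the order listed in mydata.py.
COLUMNS = [
    "player", "season", "win_loss", "opponent", "opponent_score", "team_score",
    "team", "started", "minutes", "fg_pct", "fg", "fga", "three_pct", "three",
    "three_att", "ft_pct", "ft", "fta", "trb", "orb", "drb", "ast", "stl",
    "blk", "tov", "pf", "pts",
]


def read_games(lines):
    """Yield one dict per CSV row, keyed by COLUMNS; malformed rows are skipped."""
    for row in csv.reader(lines):
        if len(row) != len(COLUMNS):
            continue
        yield dict(zip(COLUMNS, row))


# One made-up row in the same layout as nbadata.csv.
SAMPLE = (
    "Player A,03-04,W,Team X,92,106,Team Y,1,42,45.5,10,22,0,0,2,"
    "80.0,4,5,6,1,5,9,4,0,2,3,25\n"
)

games = list(read_games(io.StringIO(SAMPLE)))
print(games[0]["season"], games[0]["pts"], games[0]["ast"])
```

With named fields, `int(game["pts"])` replaces `int(line[26])`, and a reader no longer has to count columns to check the code.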

Front end:

index.html

{% extends "base.html" %}
{% block title %}Flasky{% endblock %}
{% block page_content %}
<div class="page-header">
  <h1>Data analysis</h1>
</div>
<!-- Prepare a DOM element with a size (width and height) for ECharts -->
<div id="main" style="height:400px; width: auto"></div>
<div id="main2" style="height:600px; width: auto; background-color: #333">
  <div id="s1" style="height:600px; width: auto"></div>
</div>
<!-- Load ECharts as a single file -->
<script src="../static/echarts.js"></script>
<!-- Load the theme file -->
<script src="../static/dark.js"></script>
<script type="text/javascript">
// Initialize the ECharts chart on the prepared DOM element
var myChart = echarts.init(document.getElementById('main'));
var option = {
  title: {text: 'Games by points scored', subtext: 'Data source: www.stat-'},
  tooltip: {trigger: 'axis'},
  legend: {data: ['Games']},
  calculable: true,
  xAxis: [{
    type: 'category',
    boundaryGap: false,
    axisLabel: {formatter: '{value} pts', rotate: 45},
    data: {{ a }}
  }],
  yAxis: [{type: 'value', axisLabel: {formatter: '{value} games'}}],
  series: [{
    name: 'Games',
    type: 'bar',
    data: {{ b }},
    markPoint: {data: [{type: 'max', name: 'Most games'}, {type: 'min', name: 'Fewest games'}]}
  }]
};
// Load the data into the ECharts object
myChart.setOption(option);
</script>
<script type="text/javascript">
// Initialize the ECharts chart on the prepared DOM element
var myChart1 = echarts.init(document.getElementById('s1'), 'dark');
option = {
  legend: {
    data: ['03-04 season', '04-05 season', '05-06 season', '06-07 season', '07-08 season', '08-09 season', '09-10 season', '10-11 season', '11-12 season', '12-13 season', '13-14 season', '14-15 season', '15-16 season'],
    textStyle: {fontSize: 8}
  },
  radar: [
    // 03-04 season
    {indicator: [{name: 'Points', max: 28.0}, {name: 'Assists', max: 9.2}, {name: 'Rebounds', max: 13.9}, {name: 'Steals', max: 2.4}, {name: 'Blocks', max: 3.6}],
     center: ['10%', '25%'], radius: 80, name: {textStyle: {color: '#67d15d', fontSize: 6}}},
    // 04-05 season
    {indicator: [{name: 'Points', max: 30.7}, {name: 'Assists', max: 11.5}, {name: 'Rebounds', max: 13.5}, {name: 'Steals', max: 2.9}, {name: 'Blocks', max: 3.0}],
     center: ['30%', '25%'], radius: 80, name: {textStyle: {color: '#d1c373', fontSize: 6}}},
    // 05-06 season
    {indicator: [{name: 'Points', max: 35.4}, {name: 'Assists', max: 10.5}, {name: 'Rebounds', max: 12.7}, {name: 'Steals', max: 2.3}, {name: 'Blocks', max: 3.2}],
     center: ['50%', '25%'], radius: 80, name: {textStyle: {color: '#d16a62', fontSize: 6}}},
    // 06-07 season
    {indicator: [{name: 'Points', max: 31.6}, {name: 'Assists', max: 11.6}, {name: 'Rebounds', max: 12.8}, {name: 'Steals', max: 2.1}, {name: 'Blocks', max: 3.3}],
     center: ['70%', '25%'], radius: 80, name: {textStyle: {color: '#d170b6', fontSize: 6}}},
    // 07-08 season
    {indicator: [{name: 'Points', max: 30.0}, {name: 'Assists', max: 11.6}, {name: 'Rebounds', max: 14.2}, {name: 'Steals', max: 2.7}, {name: 'Blocks', max: 3.6}],
     center: ['90%', '25%'], radius: 80, name: {textStyle: {color: '#8f45d1', fontSize: 6}}},
    // 08-09 season
    {indicator: [{name: 'Points', max: 30.2}, {name: 'Assists', max: 11.0}, {name: 'Rebounds', max: 13.8}, {name: 'Steals', max: 2.8}, {name: 'Blocks', max: 2.9}],
     center: ['10%', '55%'], radius: 80, name: {textStyle: {color: '#4048d1', fontSize: 6}}},
    // 09-10 season
    {indicator: [{name: 'Points', max: 30.1}, {name: 'Assists', max: 11.0}, {name: 'Rebounds', max: 13.2}, {name: 'Steals', max: 2.3}, {name: 'Blocks', max: 2.8}],
     center: ['30%', '55%'], radius: 80, name: {textStyle: {color: '#d11872', fontSize: 6}}},
    // 10-11 season
    {indicator: [{name: 'Points', max: 27.7}, {name: 'Assists', max: 11.4}, {name: 'Rebounds', max: 15.2}, {name: 'Steals', max: 2.4}, {name: 'Blocks', max: 2.6}],
     center: ['50%', '55%'], radius: 80, name: {textStyle: {color: '#d1c80e', fontSize: 6}}},
    // 11-12 season
    {indicator: [{name: 'Points', max: 28.0}, {name: 'Assists', max: 11.7}, {name: 'Rebounds', max: 14.5}, {name: 'Steals', max: 2.5}, {name: 'Blocks', max: 3.7}],
     center: ['70%', '55%'], radius: 80, name: {textStyle: {color: '#09e8ac', fontSize: 6}}},
    // 12-13 season
    {indicator: [{name: 'Points', max: 28.7}, {name: 'Assists', max: 9.7}, {name: 'Rebounds', max: 12.4}, {name: 'Steals', max: 2.4}, {name: 'Blocks', max: 3.0}],
     center: ['90%', '55%'], radius: 80, name: {textStyle: {color: '#9c8eca', fontSize: 6}}},
    // 13-14 season
    {indicator: [{name: 'Points', max: 32.0}, {name: 'Assists', max: 10.7}, {name: 'Rebounds', max: 13.7}, {name: 'Steals', max: 2.5}, {name: 'Blocks', max: 2.8}],
     center: ['10%', '85%'], radius: 80, name: {textStyle: {color: '#a6fdaa', fontSize: 6}}},
    // 14-15 season
    {indicator: [{name: 'Points', max: 28.1}, {name: 'Assists', max: 10.2}, {name: 'Rebounds', max: 15.0}, {name: 'Steals', max: 2.3}, {name: 'Blocks', max: 2.9}],
     center: ['30%', '85%'], radius: 80, name: {textStyle: {color: '#faa60d', fontSize: 6}}},
    // 15-16 season
    {indicator: [{name: 'Points', max: 30.1}, {name: 'Assists', max: 11.7}, {name: 'Rebounds', max: 14.8}, {name: 'Steals', max: 2.1}, {name: 'Blocks', max: 3.7}],
     center: ['50%', '85%'], radius: 80, name: {textStyle: {color: '#72ACD1', fontSize: 6}}}
  ],
  series: [
    // 03-04
    {name: '03-04 season', type: 'radar', radarIndex: 0, textStyle: {color: '#fff'},
     data: [{value: {{ c1 }}, name: '03-04 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 04-05
    {name: '04-05 season', type: 'radar', radarIndex: 1, textStyle: {color: '#fff'},
     data: [{value: {{ c2 }}, name: '04-05 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 05-06
    {name: '05-06 season', type: 'radar', radarIndex: 2, textStyle: {color: '#fff'},
     data: [{value: {{ c3 }}, name: '05-06 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 06-07
    {name: '06-07 season', type: 'radar', radarIndex: 3, textStyle: {color: '#fff'},
     data: [{value: {{ c4 }}, name: '06-07 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 07-08
    {name: '07-08 season', type: 'radar', radarIndex: 4, textStyle: {color: '#fff'},
     data: [{value: {{ c5 }}, name: '07-08 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 08-09
    {name: '08-09 season', type: 'radar', radarIndex: 5, textStyle: {color: '#fff'},
     data: [{value: {{ c6 }}, name: '08-09 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 09-10
    {name: '09-10 season', type: 'radar', radarIndex: 6, textStyle: {color: '#fff'},
     data: [{value: {{ c7 }}, name: '09-10 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 10-11
    {name: '10-11 season', type: 'radar', radarIndex: 7, textStyle: {color: '#fff'},
     data: [{value: {{ c8 }}, name: '10-11 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 11-12
    {name: '11-12 season', type: 'radar', radarIndex: 8, textStyle: {color: '#fff'},
     data: [{value: {{ c9 }}, name: '11-12 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 12-13
    {name: '12-13 season', type: 'radar', radarIndex: 9, textStyle: {color: '#fff'},
     data: [{value: {{ c10 }}, name: '12-13 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 13-14
    {name: '13-14 season', type: 'radar', radarIndex: 10, textStyle: {color: '#fff'},
     data: [{value: {{ c11 }}, name: '13-14 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 14-15
    {name: '14-15 season', type: 'radar', radarIndex: 11, textStyle: {color: '#fff'},
     data: [{value: {{ c12 }}, name: '14-15 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]},
    // 15-16
    {name: '15-16 season', type: 'radar', radarIndex: 12, textStyle: {color: '#fff'},
     data: [{value: {{ c13 }}, name: '15-16 season',
             label: {normal: {show: true, textStyle: {color: "#fff", fontSize: 8}}},
             areaStyle: {normal: {color: 'rgba(100, 100, 255, 0.5)'}}}]}
  ]
};
// Load the data into the ECharts object
myChart1.setOption(option);
</script>
{% endblock %}
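The template interpolates Python lists straight into JavaScript (for example `data: {{ a }}`). That works here because a list of numbers prints the same way in Python and JavaScript, but a list containing strings may not render as valid JavaScript. A minimal Python 3 sketch of a safer hand-off, serializing the chart data with the standard json module, is shown below; `chart_payload` and its keys are hypothetical names for illustration.

```python
import json


def chart_payload(points, counts):
    """Serialize chart data as JSON text, which is also valid JavaScript."""
    return json.dumps({"x": points, "y": counts})


payload = chart_payload([21, 25], [1, 2])
print(payload)  # {"x": [21, 25], "y": [1, 2]}
```

In a Flask template, the built-in `tojson` filter serves the same purpose, for example `data: {{ a|tojson }}`.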

Result:

I'm about to leave work; I'll write more when I have time.
