
Scraping Real-Time COVID-19 Data with Python and Drawing an nCoV Outbreak Map with Pyecharts

Date: 2021-11-12 10:06:28


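Preface

I wrote this blog post two years ago. Since then many readers have messaged me to ask for the source code or to point out bugs. Thank you for the support!

The reason the data could no longer be fetched is that the data API changed substantially. I recently had some free time, so I reworked the post.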

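Part 1: Web Page Analysis

Data source: Tencent's real-time COVID-19 tracker (腾讯疫情实时追踪)

While reworking the post I found that the data structure differs from the earlier version, so the concrete steps are described here as well. Firefox is recommended for opening the site. Press F12 to open the developer tools and refresh the page. All of the data can be obtained through the following API endpoints: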

Domestic data endpoint:

https://api./newsqa/v1/query/inner/publish/modules/list?modules=

Provincial historical data endpoint:

https://api./newsqa/v1/query/pubished/daily/list?adCode=

Overseas data endpoint:

https://api./newsqa/v1/automation/modules/list?modules=

Part 2: Fetching the Data

Importing modules

import time
import json
import requests
from datetime import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False
plt.style.use('ggplot')

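Scraping the data

Steps:

First define a helper function that fetches data by module name through the endpoint, then use pandas to turn the result into a DataFrame.

def catch_data(api_name):
    url = 'https://api./newsqa/v1/query/inner/publish/modules/list?modules=' + api_name
    response = requests.get(url=url).json()
    return response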

Domestic detail module: chinaDayList

# Summary of domestic figures for roughly the last 60 days
chinadaylist = catch_data('chinaDayList')
chinadaylist = pd.DataFrame(chinadaylist['data']['chinaDayList'])
chinadaylist['date'] = pd.to_datetime(chinadaylist['y'].astype('str') + '.' + chinadaylist['date'])
chinadaylist = chinadaylist[['date','confirm','heal','dead','importedCase','nowConfirm','nowSevere','localConfirm']]
# Columns: date, cumulative confirmed, cumulative recovered, cumulative deaths,
# cumulative imported, active confirmed, active severe, active local confirmed
chinadaylist.columns = ['日期','累计确诊','累计治愈','累计死亡','累计境外输入','现有确诊','现有重症','本土现有确诊']
chinadaylist.tail()
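The date column above is built by joining the year field `y` with the `date` field and passing the result to `pd.to_datetime`. The sketch below performs the same join with the standard library on a made-up record. It assumes, as the `'.'` separator in the code suggests, that `date` arrives as a month.day string such as `'11.12'`. The helper name `full_date` is illustrative only.

```python
from datetime import datetime


def full_date(record):
    """Combine the year ('y') and month.day ('date') fields into a date."""
    text = str(record['y']) + '.' + record['date']  # e.g. '2021.11.12'
    return datetime.strptime(text, '%Y.%m.%d').date()


sample = {'y': '2021', 'date': '11.12', 'confirm': 0}  # made-up record
print(full_date(sample))  # 2021-11-12
```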

Domestic daily increase module: chinaDayAddListNew

# Domestic daily new cases
chinanewadd = catch_data('chinaDayAddListNew')
chinanewadd = pd.DataFrame(chinanewadd['data']['chinaDayAddListNew'])
chinanewadd['date'] = pd.to_datetime(chinanewadd['y'].astype('str') + '.' + chinanewadd['date'])
chinanewadd = chinanewadd[['date','confirm','dead','heal','infect','importedCase','localConfirmadd','localinfectionadd']]
# Columns: date, new confirmed, new deaths, new recovered, new asymptomatic,
# new imported, new local confirmed, new local asymptomatic
chinanewadd.columns = ['日期','新增确诊','新增死亡','新增治愈','新增无症状','新增境外','本土新增确诊','本土新增无症状']
chinanewadd.tail()

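Domestic city module: diseaseh5Shelf

How province and city data are processed:

Looking at the page structure, the province data comes from the diseaseh5Shelf module.
diseaseh5Shelf returns a dict, and the data sits under areaTree.
areaTree is a list, and the children of its first element is a list that holds the province data.
children has 34 elements. Each element is a dict for one province containing name, adcode, total, today and children. The first four describe the province as a whole, and children holds the city breakdown.
City data has the same structure as province data, but each children list holds a different number of cities. Use province_catch_data[i]['children'] to find out how many.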
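The nesting described above can be walked with two loops. The sketch below does this on a made-up miniature of the structure, with two provinces instead of 34 and invented numbers. `iter_regions` is a hypothetical helper, and only the keys named in the description (`name`, `adcode`, `total`, `today`, `children`) are assumed.

```python
def iter_regions(area_tree):
    """Yield (province_name, city_name_or_None, total_dict) for every region."""
    country = area_tree[0]  # the first element is the national node
    for province in country.get('children', []):
        yield province['name'], None, province['total']
        for city in province.get('children', []):
            yield province['name'], city['name'], city['total']


# Made-up miniature of the areaTree structure.
sample_tree = [{
    'name': 'China',
    'children': [
        {'name': 'A', 'adcode': '1', 'total': {'confirm': 5}, 'today': {'confirm': 1},
         'children': [
             {'name': 'A1', 'adcode': '11', 'total': {'confirm': 3}, 'today': {'confirm': 1}},
             {'name': 'A2', 'adcode': '12', 'total': {'confirm': 2}, 'today': {'confirm': 0}},
         ]},
        {'name': 'B', 'adcode': '2', 'total': {'confirm': 0}, 'today': {'confirm': 0},
         'children': []},
    ],
}]

regions = list(iter_regions(sample_tree))
for row in regions:
    print(row)
```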

# Province-level details
province_data = pd.DataFrame()
# To get all city data, process the province data first
province_catch_data = catch_data('diseaseh5Shelf')['data']['diseaseh5Shelf']['areaTree'][0]['children']
for i in range(len(province_catch_data)):
    province_total = province_catch_data[i]['total']  # province totals
    province_total['name'] = province_catch_data[i]['name']  # province name
    province_total['adcode'] = province_catch_data[i]['adcode']  # province code
    province_total['date'] = province_catch_data[i]['date']  # last update date
    province_today = province_catch_data[i]['today']  # province figures for today
    province_today['name'] = province_catch_data[i]['name']  # province name
    province_total = pd.DataFrame(province_total, index=[i])
    province_today = pd.DataFrame(province_today, index=[i])
    # 'confirm' inside 'today' is actually the daily increase
    province_today.rename({'confirm': 'confirm_add'}, inplace=True, axis=1)
    merge_data = province_total.merge(province_today, how='left', on='name')  # merge totals with today's figures
    province_data = pd.concat([province_data, merge_data])  # append this province
province_data = province_data[['name','adcode','date','confirm','provinceLocalConfirm','heal','dead','nowConfirm','confirm_add','local_confirm_add','wzz_add','abroad_confirm_add','dead_add','mediumRiskAreaNum','highRiskAreaNum','isUpdated']]
# Columns: province, code, date, cumulative confirmed, cumulative local, cumulative recovered,
# cumulative deaths, active confirmed, new today, new local, new asymptomatic, new imported,
# new deaths, medium-risk areas, high-risk areas, updated flag
province_data.columns = ['省份','代码','日期','累计确诊','本土累计','累计治愈','累计死亡','现有确诊','当日新增','新增本土','新增无症状','新增境外','新增死亡','中风险数量','高风险数量','是否更新']
province_data = province_data.sort_values(by='累计确诊', ascending=False, ignore_index=True)
province_data.head()

# City-level details
df_city_data_total = pd.DataFrame()
for x in range(len(province_catch_data)):
    province_dict = province_catch_data[x]['children']
    province_name = province_catch_data[x]['name']
    df_city_data = pd.DataFrame()
    for i in range(len(province_dict)):
        city_total = province_dict[i]['total']
        city_total['province_name'] = province_name  # province name
        city_total['name'] = province_dict[i]['name']  # city or district name
        city_total['adcode'] = province_dict[i]['adcode']  # city or district code
        city_total['date'] = province_dict[i]['date']  # last update date
        city_today = province_dict[i]['today']  # figures for today
        city_today['province_name'] = province_name  # province name
        city_today['name'] = province_dict[i]['name']  # city or district name
        city_total = pd.DataFrame(city_total, index=[i])
        city_today = pd.DataFrame(city_today, index=[i])
        # 'confirm' inside 'today' is actually the daily increase
        city_today.rename({'confirm': 'confirm_add'}, inplace=True, axis=1)
        merge_city = city_total.merge(city_today, how='left', on=['province_name', 'name'])
        df_city_data = pd.concat([df_city_data, merge_city])
    df_city_data_total = pd.concat([df_city_data_total, df_city_data])
df_city_data_total = df_city_data_total[['province_name','name','adcode','date','confirm','provinceLocalConfirm','heal','dead','nowConfirm','confirm_add','local_confirm_add','wzz_add','mediumRiskAreaNum','highRiskAreaNum']]
# Columns: province, city, code, date, cumulative confirmed, cumulative local, cumulative recovered,
# cumulative deaths, active confirmed, new today, new local, new asymptomatic,
# medium-risk areas, high-risk areas
df_city_data_total.columns = ['省份','城市','代码','日期','累计确诊','本土累计','累计治愈','累计死亡','现有确诊','当日新增','新增本土','新增无症状','中风险数量','高风险数量']
df_city_data_total = df_city_data_total.sort_values(by='累计确诊', ascending=False, ignore_index=True)
df_city_data_total.head()
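The province and city loops both build two one-row DataFrames per region and merge them on the name. An alternative is to combine `total` and `today` into one flat record with plain dictionaries first, as sketched below. The sample values are made up and `flatten_region` is a hypothetical helper. The only behaviour carried over from the code above is that `confirm` inside `today` is renamed to `confirm_add`.

```python
def flatten_region(region, province_name=None):
    """Merge a region's 'total' and 'today' dicts into one flat record."""
    record = {'name': region['name'], 'adcode': region.get('adcode', '')}
    if province_name is not None:
        record['province_name'] = province_name
    record.update(region['total'])
    for key, value in region['today'].items():
        # 'confirm' inside 'today' is the daily increase, so it is renamed
        # to keep it from overwriting the cumulative 'confirm' from 'total'.
        record['confirm_add' if key == 'confirm' else key] = value
    return record


sample = {'name': 'A', 'adcode': '1',
          'total': {'confirm': 120, 'heal': 100, 'dead': 2},
          'today': {'confirm': 3, 'wzz_add': 1}}  # made-up figures
row = flatten_region(sample)
print(row)
```

A list of such records can be passed to `pd.DataFrame` in a single call, which avoids concatenating inside the loop. Unlike the loops above, this version does not modify the dictionaries returned by the API.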

Provincial historical details

# Historical details for each province. Taiwan, Hong Kong and Macau are missing.
# For city history, pass the city code instead.
province_history_data = pd.DataFrame()
for code in province_data['代码']:
    if code != '':
        history_data = requests.get('https://api./newsqa/v1/query/pubished/daily/list?adCode=' + str(code)).json()['data']
        history_df = pd.DataFrame(history_data)
        history_df['date'] = pd.to_datetime(history_df['year'].astype('str') + '.' + history_df['date'])
        history_df_use = history_df[['date','province','confirm','dead','heal','wzz','newConfirm','newHeal','newDead','wzz_add']]
        # Columns: date, province, cumulative confirmed, cumulative deaths, cumulative recovered,
        # asymptomatic, new confirmed, new recovered, new deaths, new asymptomatic
        history_df_use.columns = ['日期','省份','累计确诊','累计死亡','累计治愈','无症状','新增确诊','新增治愈','新增死亡','新增无症状']
        province_history_data = pd.concat([province_history_data, history_df_use])
province_history_data.shape
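The loop above issues one request per province code and skips empty codes. The sketch below isolates that control flow so it can be tried without network access. `collect_history`, the `fetch` callable and the `pause` parameter are hypothetical names, and the rows returned by the stand-in fetcher are made up.

```python
import time


def collect_history(adcodes, fetch, pause=0.0):
    """Call fetch(adcode) for each non-empty code and flatten the results."""
    rows = []
    for code in adcodes:
        if code == '':  # empty codes are skipped, as in the loop above
            continue
        rows.extend(fetch(str(code)))
        time.sleep(pause)  # optional delay between consecutive requests
    return rows


# Stand-in fetcher that returns made-up rows, so the sketch runs offline.
def fake_fetch(adcode):
    return [{'adcode': adcode, 'date': '11.12', 'confirm': 1}]


history = collect_history(['110000', '', '310000'], fake_fetch)
print(len(history))  # 2
```

With a real fetcher, a small non-zero `pause` spaces the requests out.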

Latest cumulative overseas data

# Latest overseas data
aboard_data = requests.get('https://api./newsqa/v1/automation/modules/list?modules=WomAboard').json()['data']['WomAboard']
aboard_data = pd.DataFrame(aboard_data)
aboard_data_use = aboard_data[['pub_date','continent','name','confirm','dead','heal','nowConfirm','confirmAdd']]
# Columns: date, continent, country, cumulative confirmed, cumulative deaths,
# cumulative recovered, active confirmed, new confirmed
aboard_data_use.columns = ['日期','大洲','国家','累计确诊','累计死亡','累计治愈','现有确诊','新增确诊']
aboard_data_use.head()

Part 3: Data Visualization

Importing the pyecharts plotting packages

from pyecharts.charts import *  # import all chart types
from pyecharts import options as opts
# pyecharts themes (skip this import if you do not use them)
from pyecharts.globals import ThemeType
from pyecharts.commons.utils import JsCode
from pyecharts.globals import CurrentConfig, NotebookType
CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_NOTEBOOK

Data details: Table

from pyecharts.components import Table
from pyecharts.options import ComponentTitleOpts

table = Table()
headers = list(chinadaylist.columns)
rows = chinadaylist.sort_values(by='日期', ascending=False).head(1).values  # most recent day only
table.add(headers=headers, rows=rows)
table.set_global_opts(
    title_opts=ComponentTitleOpts(
        title="国内最新数据",
        subtitle="更新日期:" + chinadaylist['日期'].astype('str').max()))
table.render_notebook()

Combined chart (bar/line)

bar = Bar()
bar.add_xaxis(list(chinadaylist["日期"].astype('str')))
bar.add_yaxis(series_name='累计确诊', y_axis=list(chinadaylist["累计确诊"]))
bar.add_yaxis(series_name="现有确诊", y_axis=list(chinadaylist['现有确诊']))
# Secondary y axis for the recovery rate
bar.extend_axis(yaxis=opts.AxisOpts(name='治愈率', axislabel_opts=opts.LabelOpts(formatter="{value}%")))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=False))  # hide data labels
bar.set_global_opts(
    title_opts=opts.TitleOpts(
        title="国内累计确诊趋势",
        subtitle="数据来自腾讯疫情数据(含港澳台)",  # subtitle
        pos_left="center",  # title position
        pos_top="top"),
    legend_opts=opts.LegendOpts(pos_left="left"),  # legend on the left
    xaxis_opts=opts.AxisOpts(type_="category", axislabel_opts=opts.AxisTickOpts()),
    yaxis_opts=opts.AxisOpts(name="人数"))

line = Line()
line.add_xaxis(list(chinadaylist["日期"].astype('str')))
line.add_yaxis(
    series_name="治愈率(%)",
    y_axis=(chinadaylist['累计治愈'] / chinadaylist['累计确诊']).round(decimals=3) * 100,
    yaxis_index=1,  # plot against the secondary axis
    symbol_size=3,
    is_smooth=True,
    label_opts=opts.LabelOpts(is_show=False),
    tooltip_opts=opts.TooltipOpts(
        formatter=JsCode("function (params) {return params.value+ '%'}"),
        is_show_content=True))
bar.overlap(line)  # overlay the line on the bar chart
bar.render_notebook()
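The line series above plots the recovery rate, computed as cumulative recovered divided by cumulative confirmed and expressed as a percentage. The sketch below shows the same arithmetic for a single pair of made-up numbers. `recovery_rate` is a hypothetical helper. It rounds after scaling to a percentage, which is a small departure from the chart code (that rounds the ratio to three decimals and then multiplies by 100), and it returns 0.0 when there are no confirmed cases.

```python
def recovery_rate(healed, confirmed):
    """Return the recovery rate as a percentage rounded to one decimal place."""
    if confirmed == 0:
        return 0.0  # avoid dividing by zero when there are no cases
    return round(healed / confirmed * 100, 1)


print(recovery_rate(945, 1000))  # 94.5
```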

Styled line chart

background_color_js = (
    "new echarts.graphic.LinearGradient(0, 0, 0,1, "
    "[{offset: 0, color: '#99cccc'}, {offset: 1, color: '#00bfff'}], false)"
)
line1 = Line(init_opts=opts.InitOpts(theme=ThemeType.ROMA, bg_color=JsCode(background_color_js)))  # theme and background color
line1.add_xaxis(list(chinanewadd["日期"].astype('str')))  # x axis
line1.add_yaxis(
    series_name="新增确诊",
    y_axis=list(chinanewadd["新增确诊"]),  # y-axis data
    is_smooth=True,  # smooth curve
    areastyle_opts=opts.AreaStyleOpts(opacity=0.3),  # opacity of the shaded area
    is_symbol_show=True,
    label_opts=opts.LabelOpts(is_show=False),  # hide labels
    yaxis_index=0,  # which y axis to plot against
)
line1.add_yaxis(
    series_name="新增本土",
    y_axis=list(chinanewadd["本土新增确诊"]),
    is_smooth=True,
    areastyle_opts=opts.AreaStyleOpts(opacity=0.3),
    is_symbol_show=True,  # show markers
    # symbol='circle',  # marker type: 'circle', 'rect', 'roundRect', 'triangle', 'diamond', 'pin', 'arrow', 'none'
    label_opts=opts.LabelOpts(is_show=False),
    yaxis_index=1,
)
# Add the secondary axis
line1.extend_axis(
    yaxis=opts.AxisOpts(
        name="新增本土(人)",
        name_location="end",  # axis title position
        type_="value",  # axis type
        is_inverse=False,  # whether to reverse the scale
        axistick_opts=opts.AxisTickOpts(is_show=True),
        splitline_opts=opts.SplitLineOpts(is_show=True)))
# Chart-level options
line1.set_global_opts(
    title_opts=opts.TitleOpts(
        title="国内每日新增趋势",  # main title
        subtitle="数据来自腾讯疫情数据(含港澳台)",  # subtitle
        subtitle_textstyle_opts=opts.TextStyleOpts(color='#000000'),
        pos_left="center",  # title position
        pos_top="top"),
    legend_opts=opts.LegendOpts(pos_left="40%", pos_top='10%'),  # legend position
    xaxis_opts=opts.AxisOpts(type_="category", axislabel_opts=opts.AxisTickOpts()),
    yaxis_opts=opts.AxisOpts(
        name="新增确诊(人)",
        type_="value",
        # max_=100000
    ),
    datazoom_opts=opts.DataZoomOpts(
        type_='slider',
        range_start=80,  # start of the visible range, in percent
        range_end=100),  # end of the visible range, in percent
    toolbox_opts=opts.ToolboxOpts(
        is_show=True,  # show the toolbox
        orient='vertical',  # stack the tools vertically
        pos_left='95%',
        pos_top='middle'))
line1.render_notebook()

Multiple tabs

map1 = Map(init_opts=opts.InitOpts(width="900px", height="500px", bg_color=None))
map1.add(
    series_name="累计确诊",
    data_pair=[list(z) for z in zip(province_data['省份'], province_data['累计确诊'])],
    maptype="china",
    is_map_symbol_show=False)
map1.set_global_opts(
    title_opts=opts.TitleOpts(
        title="全国疫情地图-累计确诊",
        subtitle="更新日期:" + province_data['日期'].astype('str').max(),
        subtitle_textstyle_opts=opts.TextStyleOpts(color='#ffffff'),
        pos_left="center"),
    legend_opts=opts.LegendOpts(is_show=True, pos_top="40px", pos_left="30px"),
    visualmap_opts=opts.VisualMapOpts(
        is_piecewise=True,
        range_text=['高', '低'],
        pieces=[
            {"min": 50000, "color": "#751d0d"},
            {"min": 10000, "max": 49999, "color": "#ae2a23"},
            {"min": 5000, "max": 9999, "color": "#d6564c"},
            {"min": 1000, "max": 4999, "color": "#f19178"},
            {"min": 500, "max": 999, "color": "#f7d3a6"},
            {"min": 100, "max": 499, "color": "#fdf2d3"},
            {"min": 0, "max": 99, "color": "#FFFFFF"}]),
    toolbox_opts=opts.ToolboxOpts(
        is_show=True,  # show the toolbox
        orient='vertical',  # stack the tools vertically
        pos_left='95%',
        pos_top='middle'),
)

map2 = Map(init_opts=opts.InitOpts(width="900px", height="500px", bg_color=None))
map2.add(
    series_name="现有确诊",
    data_pair=[list(z) for z in zip(province_data['省份'], province_data['现有确诊'])],
    maptype="china",
    is_map_symbol_show=False)
map2.set_global_opts(
    title_opts=opts.TitleOpts(
        title="全国疫情地图-现有确诊",
        subtitle="更新日期:" + province_data['日期'].astype('str').max(),
        subtitle_textstyle_opts=opts.TextStyleOpts(color='#ffffff'),
        pos_left="center"),
    legend_opts=opts.LegendOpts(is_show=True, pos_top="40px", pos_left="30px"),
    visualmap_opts=opts.VisualMapOpts(
        is_piecewise=True,
        range_text=['高', '低'],
        pieces=[
            {"min": 10000, "color": "#751d0d"},
            {"min": 1000, "max": 9999, "color": "#ae2a23"},
            {"min": 500, "max": 999, "color": "#d6564c"},
            {"min": 100, "max": 499, "color": "#f19178"},
            {"min": 10, "max": 99, "color": "#f7d3a6"},
            {"min": 1, "max": 9, "color": "#fdf2d3"},
            {"min": 0, "max": 0, "color": "#FFFFFF"}]),
    toolbox_opts=opts.ToolboxOpts(
        is_show=True,  # show the toolbox
        orient='vertical',  # stack the tools vertically
        pos_left='95%',
        pos_top='middle'),
)

# Add the two maps as tabs
tab = Tab()
tab.add(map1, "累计确诊地图")
tab.add(map2, "现有确诊地图")
tab.render_notebook()
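The `pieces` list passed to `VisualMapOpts` assigns a color to each value range. To see which bucket a given count lands in, the lookup can be mimicked in plain Python, as in the sketch below. It reuses the ranges of the cumulative-confirmed map. `color_for` is a hypothetical helper that only illustrates the ranges and makes no claim about how ECharts performs the lookup internally.

```python
PIECES = [
    {"min": 50000, "color": "#751d0d"},
    {"min": 10000, "max": 49999, "color": "#ae2a23"},
    {"min": 5000, "max": 9999, "color": "#d6564c"},
    {"min": 1000, "max": 4999, "color": "#f19178"},
    {"min": 500, "max": 999, "color": "#f7d3a6"},
    {"min": 100, "max": 499, "color": "#fdf2d3"},
    {"min": 0, "max": 99, "color": "#FFFFFF"},
]


def color_for(value, pieces=PIECES):
    """Return the color of the first piece whose [min, max] range contains value."""
    for piece in pieces:
        low = piece.get("min", float("-inf"))   # a missing bound means open-ended
        high = piece.get("max", float("inf"))
        if low <= value <= high:
            return piece["color"]
    return None  # value falls outside every range


for count in (68000, 750, 0):
    print(count, color_for(count))
```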

Timeline carousel: map
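The timeline in this section starts by ranking the dates with `rank(method='dense')`, so that all province rows for the same day share one rank and the first N ranks select the first N days. The sketch below reproduces that idea with the standard library on made-up dates. `dense_rank` is a hypothetical helper, not a pandas function.

```python
def dense_rank(values):
    """Map each distinct value to its 1-based ascending rank, with no gaps."""
    return {value: rank for rank, value in enumerate(sorted(set(values)), start=1)}


dates = ['2021-11-02', '2021-11-01', '2021-11-02', '2021-11-03']  # made-up dates
ranks = dense_rank(dates)
first_two_days = [d for d in dates if ranks[d] <= 2]
print(ranks)
print(first_two_days)
```

Because duplicates share a rank, filtering on `rank <= N` keeps every row of the first N distinct days regardless of how many provinces report on each day.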

```python
# Rank the dates so that all rows for the same day share one rank
province_history_data['date_rank'] = province_history_data['日期'].rank(method='dense', ascending=True)
df_list = []
# Take the first 15 days of data (adjust as needed)
for i in range(1, 16):
    df_list.append(province_history_data.loc[province_history_data['date_rank'] == i])

tl = Timeline(init_opts=opts.InitOpts(theme=ThemeType.CHALK, width="900px", height="600px"))  # timeline
for idx in range(len(df_list)):  # add one map per day to the timeline
    date = df_list[idx]['日期'].astype('str').unique()[0]
    zipped = zip(df_list[idx]['省份'], df_list[idx]['累计确诊'])
    f_map = Map(init_opts=opts.InitOpts(width="800px", height="500px"))
    f_map.add(
        series_name="确诊数量",
        data_pair=[list(z) for z in zipped],
        maptype="china",
        is_map_symbol_show=False)
    f_map.set_global_opts(
        title_opts=opts.TitleOpts(
            title="全国疫情地图-累计确诊",
            subtitle="更新日期:" + date,
            subtitle_textstyle_opts=opts.TextStyleOpts(color='#ffffff'),
            pos_left="center"),
        legend_opts=opts.LegendOpts(is_show=False, pos_top="40px", pos_left="30px"),
        visualmap_opts=opts.VisualMapOpts(
            is_piecewise=True,
            range_text=['高', '低'],
            pieces=[
                {"min": 1000, "color": "#CC0033"},
                {"min": 200, "max": 999, "color": "#FF4500"},
                {"min": 50, "max": 199, "color": "#FF8C00"},
                {"min": 1, "max": 49, "color": "#FFDAB9"},
                {"min": 0, "max": 0, "color": "#F5F5F5"}],
            textstyle_opts=opts.TextStyleOpts(color='#ffffff'),
            pos_bottom='15%',
            pos_left='5%'))
    tl.add(f_map, "{}".format(date))  # add the map for this date

tl.add_schema(
    is_timeline_show=True,  # show the timeline control
    play_interval=1200,  # playback interval in milliseconds
    symbol=None,  # marker symbol
    is_loop_play=True,  # loop playback
    is_auto_play=True)
tl.render_notebook()
```
