Python homework log: a web scraper for NetEase Cloud Music hot comments + word cloud generation

Time: 2024-04-30 11:34:07

import json
from base64 import b64encode

import jieba  # word segmentation for the comments
import numpy as np  # the word-cloud mask is passed in as an array
import requests
from Crypto.Cipher import AES
from PIL import Image
from wordcloud import WordCloud

# Information the user has to enter
id_song = input("Enter the id of the song to look up: ")
song_name = input("Enter the song name: ")

# Basic request data.
# Both URLs appear here without scheme and host; requests needs a full URL,
# so the site's base URL has to be prepended before running.
url1 = "/weapi/comment/resource/comments/get?csrf_token="
url2 = "/weapi/song/enhance/player/url/v1"
data1 = {
    "csrf_token": "",
    "cursor": "-1",
    "offset": "0",
    "orderType": "1",
    "pageNo": "1",
    "pageSize": "20",
    "rid": f"R_SO_4_{id_song}",
    "threadId": f"R_SO_4_{id_song}",
}
data2 = {
    "csrf_token": "",
    "encodeType": "aac",
    "ids": f"[{id_song}]",
    "level": "standard",
}
e = "010001"
f = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
g = "0CoJUm6Qyw8W8jud"
i = "ijot13Ww76mRTqj0"  # fixed by hand: one value picked from the random strings NetEase generates


# With i fixed, return the matching encSecKey
def get_encSecKey():
    # Because i is fixed, encSecKey is fixed as well. This is the result of
    # function c in the NetEase source, captured with a breakpoint: first
    # intercept i, then intercept the encSecKey that belongs to it.
    return "c03958575b7fa52443046d59ddcabb5f31fb3222d6a02ae8ecd8fba614e7e97f74a8a409bc46b4a502720437794036dfe4a0b802d17c0b5c6a34436225e9df0ab892f91a6163d02dc93706d96625c5cd839dead646170e7acfe263e5c6b7b2e8142a409646091ab67b0a18063079e1359ce29e4691c4d045c34fa52a8e6a3b39"


# CBC encryption, applied twice, produces params
def get_params(data):  # data is expected to be a string
    first = enc_params(data, g)
    second = enc_params(first, i)
    return second  # this is params


def to_16(data):
    # Pad the length to a multiple of 16, as the AES step below requires
    pad = 16 - len(data) % 16
    data += chr(pad) * pad
    return data


def enc_params(data, key):
    # One AES-CBC encryption pass
    iv = "0102030405060708"
    data = to_16(data)
    aes = AES.new(key.encode("utf-8"), AES.MODE_CBC, iv=iv.encode("utf-8"))  # create the cipher
    bs = aes.encrypt(data.encode("utf-8"))  # the input length must be a multiple of 16
    return str(b64encode(bs), "utf-8")  # return the Base64 text as a string


# Request the comment data
def get_comment():
    params1 = get_params(json.dumps(data1))
    encSecKey1 = get_encSecKey()
    data3 = {"params": params1, "encSecKey": encSecKey1}
    resp1 = requests.post(url=url1, data=data3)
    comment_page = resp1.json()
    resp1.close()
    # Write the comments to a .text file. Slicing instead of indexing 0..14
    # avoids an IndexError when fewer than 15 hot comments come back.
    hot_comments = comment_page["data"]["hotComments"][:15]
    with open(f"some//{song_name}.text", mode="w", encoding="utf-8") as out:
        for comment in hot_comments:
            out.write(comment["content"] + "\n")
    print("Comments written!")
    # make_cloud()


# Request the song download data
def get_song():
    params2 = get_params(json.dumps(data2))
    encSecKey2 = get_encSecKey()
    data4 = {"params": params2, "encSecKey": encSecKey2}
    resp2 = requests.post(url=url2, data=data4)
    song = resp2.json()["data"][0]["url"]
    resp2.close()
    with open(f"{song_name}.m4a", "wb") as out:
        out.write(requests.get(url=song).content)
    print("Song downloaded!")


# Generate the word cloud
def make_cloud():
    mask = np.array(Image.open("some//云图.PNG"))  # image used as the word-cloud mask
    # Read the text for the word cloud
    with open(f"some//{song_name}.text", "r", encoding="utf-8") as src:
        text = src.read()
    # Process the text
    ciyu = jieba.lcut(text)  # segment the long text into words; returns a list
    x = " ".join(ciyu)  # join with spaces so WordCloud can split the words
    # Basic word-cloud settings: size, background, whether words repeat, and so on
    wcd = WordCloud(background_color=None, repeat=False, max_words=500,
                    height=1080, width=1920, max_font_size=100,
                    colormap="winter", mask=mask, font_path="fonts/simkai.ttf")
    wcd.generate(x)
    wcd.to_file("词云.png")
    print("Word cloud generated!")


# get_song()
# get_comment()
make_cloud()

Results:

Word cloud:

The song is saved directly as an .m4a file.

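The program is not very polished, and with finals approaching I did not have much time to work on it, but I am happy to have finished my first little web scraper.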
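One rough edge worth spelling out is the padding step. The to_16 helper in the script pads by character count, which works there because json.dumps emits ASCII by default and the first pass outputs Base64 text. A minimal byte-level sketch of the same padding scheme (PKCS#7) is shown below; the names pkcs7_pad and pkcs7_unpad are hypothetical and not part of the original script.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes([pad]) * pad


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, raising ValueError if the padding is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input length is not a positive multiple of the block size")
    pad = padded[-1]
    if pad < 1 or pad > block_size or padded[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid padding bytes")
    return padded[:-pad]


if __name__ == "__main__":
    sample = "热评".encode("utf-8")  # 6 bytes, although only 2 characters
    padded = pkcs7_pad(sample)
    print(len(sample), len(padded))
    print(pkcs7_unpad(padded).decode("utf-8"))
```

Padding the encoded bytes rather than the string keeps the result block-aligned even when the plaintext contains non-ASCII characters.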

"""
Overview:
The overall idea of this project is to imitate NetEase's encryption routine: find the
unencrypted request data, encrypt it locally (in my PyCharm), and send it to NetEase
Cloud Music, which decrypts it and returns the requested data. This makes batch
downloading possible, but as the saying goes: scrape well enough and you will eat your
fill of prison food...

1. Produce the two required parameters, params and encSecKey, and wrap the two
   encryption steps in functions. The encSecKey step depends only on values inside the
   NetEase source, so to keep things simple, the parameter i that encSecKey depends on
   and the encSecKey itself are random values captured by pausing the NetEase source at
   a breakpoint. From reading the source, the only requirement is that i and encSecKey
   belong together.
   I captured three pairs in total, each one matching:

   Pair 1:
   i = "8t3BQN6WTmRlYzlc"
   def get_encSecKey():
       return "7de9146c5ce13da2d638a867c80c9bee41710d6fc0f2a696bb3f71ad18e146e104e50fbd7da7f62e05e1ec3648435891ec23765b0b44bd078d22d6c2a67cc51bfc7febe04ed3fe20ec84a86b8367fa5eb2a29b976283f163a8c6784250ff054d79106a7123a0e98401d8f54ca70917cb460eca9a3504f15039958155c0241ea1"

   Pair 2:
   i = "ijot13Ww76mRTqj0"
   def get_encSecKey():
       return "c03958575b7fa52443046d59ddcabb5f31fb3222d6a02ae8ecd8fba614e7e97f74a8a409bc46b4a502720437794036dfe4a0b802d17c0b5c6a34436225e9df0ab892f91a6163d02dc93706d96625c5cd839dead646170e7acfe263e5c6b7b2e8142a409646091ab67b0a18063079e1359ce29e4691c4d045c34fa52a8e6a3b39"

   Pair 3:
   i = "3zzeNzWueP6Pv6lC"
   def get_encSecKey():
       return "6042ab0944a88f26cee001ac73da357c607649be165369970eb4a893c17f848f4154bbf815d72cc5c9c7cf85e2a4ec5d5bee2fcb9b7cc7b3b755b61986b78f001acf1ce8f6dc264ceacb7d6e74c303183fafa075a1722cc0e06a3097a3cbc1e07f2327d2a9581ca8bc471c445282c00ac564bb3ac5d03c18095b0518be5641e9"

   In every pair i is fixed by hand (one value picked from the random strings NetEase
   generates), so encSecKey is fixed too. It is the result of function c, captured with
   a breakpoint in the source: first intercept i, then the encSecKey that belongs to it.
   Once encSecKey is settled, what remains is the two encryption passes that produce
   params (AES in CBC mode).

2. The project fetches the top 15 hot comments of a given NetEase Cloud Music song and
   downloads the song to disk. The hot comments come from the get?csrf_token= request,
   which is a POST; the download address comes from the url/v1 request, also a POST.
   The two requests take different parameters. These can be found directly through JS
   reverse engineering, roughly in the third or fourth frame from the bottom of the
   call stack.
   The real parameters for the hot comments are:
   data1 = {"csrf_token": "", "cursor": "-1", "offset": "0", "orderType": "1",
            "pageNo": "1", "pageSize": "20", "rid": f"R_SO_4_{id_song}",
            "threadId": f"R_SO_4_{id_song}"}
   The real parameters for the song download are:
   data2 = {"csrf_token": "", "encodeType": "aac", "ids": f"[{id_song}]",
            "level": "standard"}

3. Pass the song id into data1 and data2, call the encryption function twice, and
   encrypt the real parameters following NetEase's encryption logic.

4. Send the encrypted data to NetEase and receive the hot comments and the song's
   download address (both as dictionaries).

5. Store the hot comments and generate the word cloud; download and save the song.
"""
"""
Encryption-related source (copied from the NetEase source)

1. Hot comments:
i6c = {"csrf_token": "", "cursor": "-1", "offset": "0", "orderType": "1", "pageNo": "1", "pageSize": "20", "rid": "R_SO_4_1325905146", "threadId": "R_SO_4_1325905146"}
var bUM2x = window.asrsea(JSON.stringify(i6c), bsG7z(["流泪", "强"]), bsG7z(WW3x.md), bsG7z(["爱心", "女孩", "惊恐", "大笑"]));
e6c.data = j6d.cs6m({
    params: bUM2x.encText,
    encSecKey: bUM2x.encSecKey
})

2. Download:
i6c = {"csrf_token": "", "encodeType": "aac", "ids": "[1408046609]", "level": "standard"}
var bUM2x = window.asrsea(JSON.stringify(i6c), bsG7z(["流泪", "强"]), bsG7z(WW3x.md), bsG7z(["爱心", "女孩", "惊恐", "大笑"]));
e6c.data = j6d.cs6m({
    params: bUM2x.encText,
    encSecKey: bUM2x.encSecKey
})

3. The encryption routine shared by both:
!function() {
    function a(a=16) {
        var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
        for (d = 0; a > d; d += 1)       // loop 16 times
            e = Math.random() * b.length, // generate a random number
            e = Math.floor(e),            // round down
            c += b.charAt(e);             // take the character at that position
        return c                          // return the 16-character random string
    }
    function b(a, b) {                    // CBC encryption; a: data, b: fixed value
        var c = CryptoJS.enc.Utf8.parse(b)
          , d = CryptoJS.enc.Utf8.parse("0102030405060708")
          , e = CryptoJS.enc.Utf8.parse(a) // e is the data
          , f = CryptoJS.AES.encrypt(e, c, { // the three AES inputs: plaintext e, key c, IV d
            iv: d,
            mode: CryptoJS.mode.CBC
        });
        return f.toString()
    }
    function c(a, b, c) {                 // RSA encryption; a = i (16-character random value), b and c are fixed values
        var d, e;
        return setMaxDigits(131),
        d = new RSAKeyPair(b,"",c),
        e = encryptedString(d, a)
    }
    function d(d, e, f, g) {              // d = data; e, f, g are fixed values
        var h = {}
          , i = a(16);
        h.encText = b(d, g);
        h.encText = b(h.encText, i);      // two CBC encryption passes
        h.encSecKey = c(i, e, f);
        return h
    }
    window.asrsea = d,
    window.ecnonasr = e

"""
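The notes above capture i and encSecKey with a breakpoint instead of computing them. As a hedged alternative, functions a and c from the excerpt can be approximated in plain Python. This is only a sketch, assuming function c is unpadded ("textbook") RSA and that the JS RSA library reads the text in reversed byte order; the reversal is commonly reported for this library but is not confirmed by the source here. The names random_key and rsa_enc_sec_key are hypothetical; the exponent and modulus are the e and f values from the script.

```python
import random
import string

# Public exponent and modulus, copied from the script (variables e and f).
PUB_EXP = "010001"
MODULUS = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"


def random_key(length: int = 16) -> str:
    """Counterpart of JS function a: a random string over [a-zA-Z0-9]."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


def rsa_enc_sec_key(i: str, pub_exp: str = PUB_EXP, modulus: str = MODULUS) -> str:
    """Approximation of JS function c: unpadded RSA over the random key i.

    ASSUMPTION: the key is reversed before being read as a big-endian
    integer. Verify against a captured (i, encSecKey) pair before use.
    """
    message = int.from_bytes(i[::-1].encode("utf-8"), "big")
    cipher = pow(message, int(pub_exp, 16), int(modulus, 16))  # m^e mod n
    return format(cipher, "x").zfill(256)  # 1024-bit result as 256 hex digits


if __name__ == "__main__":
    key = random_key()
    print(key, rsa_enc_sec_key(key))
```

If the reversal assumption holds, rsa_enc_sec_key("ijot13Ww76mRTqj0") should reproduce the second captured encSecKey listed in the notes; comparing the two is a quick way to check the sketch before relying on a freshly generated i.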
