700字范文,内容丰富有趣,生活中的好帮手!
700字范文 > pandas 利用 正则表达式 从文本中提取数字

pandas 利用 正则表达式 从文本中提取数字

时间:2021-09-13 04:47:28

相关推荐

pandas 利用 正则表达式 从文本中提取数字

需要从text特征中提取形如 13.5/10 这样的字符串,再分别提取分子分母。

1)可以利用str.extract()方法。

2)利用正则表达式\d+\.?\d*\/\d+进行匹配

3)再利用.split()方法提取分子分母

代码:

test.text.tolist()# output['This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948',"This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS","This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq",'Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD']

test['rating'] = test['text'].str.extract(r'(\d+\.?\d*\/\d+)', expand=False)# 提取分子test['rating_numerator'] = test.rating.apply(lambda x: eval(x.split('/')[0]))# 提取分母test['rating_denominator_fix'] = test.rating.apply(lambda x: eval(x.split('/')[1]))# 删除中间量test.drop(['rating'], axis=1, inplace=True)

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。