Counting characters in text with Python: a simple Python application for article character statistics

Date: 2020-01-05 10:31:28

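Python is a good tool for data processing. As a small warm-up exercise, this post uses Python to count how often each character appears in an article.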

System: Ubuntu 16.04

Python version: 3.4

Text: an excerpt from Journey to the West (《西游记》) in txt format

Result: stored in result.csv

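# The next two lines print Python's default string encoding; the result is: utf-8
# (open() itself falls back to the locale's preferred encoding, so the
# encoding is passed explicitly when the file is opened below)
import sys

print(sys.getdefaultencoding())

fw = open('data.txt.utf8', 'r', encoding='utf-8')

# characters list: stores every distinct Chinese character that appears
# stat dict: each character is a key, its number of occurrences is the value
characters = []
stat = {}

for line in fw:
    line = line.strip()
    # If a line is empty after stripping whitespace, skip it
    if len(line) == 0:
        continue
    for ch in line:
        # Brute-force list of punctuation marks that may appear; skip them when counting characters
        if ch in [' ', '\n', '\t', '，', '。', '？', '《', '》', '！', '、', '：', '“', '”', '；']:
            continue
        # If the current character is not yet in the characters list, append it
        if ch not in characters:
            characters.append(ch)
        # If the current character is not yet a key of stat, add it with a value of 0
        # (Python 2 version: if not stat.has_key(ch):)
        if ch not in stat:
            stat[ch] = 0
        # Increment the count of the current character in stat
        stat[ch] += 1

fw.close()

# Print the result
print(characters)
for key, value in stat.items():
    print(key, value)

# Check the lengths of characters and stat, i.e. the number of elements they contain
print('Length of the characters list: ' + str(len(characters)))
print('Length of the stat dict: ' + str(len(stat)))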
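The counting loop above can also be sketched with the standard library's `collections.Counter`, which handles the "add the key if missing, then increment" step by itself. This is only an alternative sketch, not the script used in this post: the function name `count_han_characters` and the sample lines are made up for illustration, and it assumes that "Chinese character" means a code point in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which replaces the hand-written punctuation list.

```python
import re
from collections import Counter

# Assumption: a "Chinese character" is any code point in the basic
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_PATTERN = re.compile('[\u4e00-\u9fff]')


def count_han_characters(lines):
    """Return a Counter mapping each Han character to its number of occurrences."""
    counts = Counter()
    for line in lines:
        # findall returns every matching character in the line; punctuation,
        # digits, Latin letters and whitespace never match, so no skip list is needed.
        counts.update(HAN_PATTERN.findall(line))
    return counts


if __name__ == '__main__':
    # Made-up sample lines, not taken from the article's data file
    sample = ['天地玄黄，宇宙洪荒。', '', '天地 abc 123！']
    counts = count_han_characters(sample)
    print(counts['天'], counts['玄'], len(counts))
```

Because an open text file is an iterable of lines, a file object opened with `encoding='utf-8'` could be passed to the function directly. The distinct characters are then simply the keys of the returned counter, so a separate list is not required.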

The full output of the counting script is too long to display conveniently, so some simple data processing is applied:

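# Some simple data processing
# Convert the stat dict into a list of (character, count) tuples, sorted by count in descending order
stat = sorted(stat.items(), key=lambda d: d[1], reverse=True)

# Print the type of stat at this point, and its length
print(type(stat), len(stat))

# Print the first ten characters in the characters list
for ch in characters[:10]:
    print(ch)

print('******************************')

# Print the first ten entries in the stat list
for item in stat[:10]:
    print(item[0], item[1])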
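The "sort by count, then show the first ten" step can be wrapped in a small helper. The sketch below is illustrative only: `top_n` is a hypothetical name, the counts are invented, and it expects the dict form of `stat` (before it is turned into a list). It breaks ties by character so the order is reproducible, and it uses slicing, which starts at the first entry and does not fail when there are fewer than `n` entries.

```python
def top_n(stat, n=10):
    """Return up to n (character, count) pairs, highest count first."""
    # Sort by descending count, then by character, so that characters with
    # equal counts always come out in the same order.
    ordered = sorted(stat.items(), key=lambda kv: (-kv[1], kv[0]))
    # Slicing returns fewer than n items when the dict is small.
    return ordered[:n]


if __name__ == '__main__':
    stat = {'行': 3, '者': 5, '道': 5, '了': 1}  # hypothetical counts
    for ch, count in top_n(stat, 3):
        print(ch, count)
```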

# Save the results to a CSV file
fw = open('result.csv', 'w', encoding='utf-8')

# The count is an int, so it must be converted to a string before writing
for item in stat:
    fw.write(item[0] + ',' + str(item[1]) + '\n')

fw.close()
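Writing the rows by hand works for this data. As a hedged alternative, the standard library's `csv` module could do the same job and would also quote any field that happens to contain a comma. In the sketch below, `save_counts` is a hypothetical helper, and the header row and the temporary output path are additions for illustration that are not part of the original script.

```python
import csv
import os
import tempfile


def save_counts(pairs, path):
    """Write (character, count) pairs to path as UTF-8 CSV, one pair per row."""
    # newline='' lets the csv module control line endings itself;
    # the with-block closes the file even if writing raises an error.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['character', 'count'])  # header row: an addition, not in the original
        writer.writerows(pairs)


if __name__ == '__main__':
    # Hypothetical data, written to a temporary directory to avoid
    # overwriting a real result.csv
    out_path = os.path.join(tempfile.mkdtemp(), 'result_sketch.csv')
    save_counts([('者', 5), ('行', 3)], out_path)
    print(out_path)
```

The sorted list of `(character, count)` tuples produced earlier has the shape this helper expects, so it could be passed in as `pairs` unchanged.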
