
Python: Website Traffic Statistics

Date: 2024-04-28 20:11:32

Please credit the source when reposting: /l1028386804/article/details/79056976

1. Scenario

For how the data source is prepared, see the blog post "Python: Automatically Uploading Local Log Files to HDFS (Based on Hadoop 2.5.2)".

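Website traffic is an important measure of a site's value and popularity. In CDN services, traffic also determines billing, so analyzing a site's current traffic data quickly and accurately is essential. This example computes website traffic at one-minute granularity. In the mapper, the "hour:minute" of each Web log entry is used as the key and the number of bytes sent as the value; in the reducer, the values that share a key are summed.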
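To make the mapper/reducer split concrete, here is a minimal plain-Python sketch of the same idea that runs without Hadoop. The sample log lines and the helper names `map_line` and `reduce_pairs` are made up for illustration. The sketch assumes the common access-log layout in which the 4th whitespace-separated field is the timestamp and the 10th is the number of bytes sent.

```python
from collections import defaultdict

# Hypothetical sample lines in the common access-log layout
# (illustrative only, not taken from the original data set).
SAMPLE_LOG = [
    '10.0.0.1 - - [14/Jan/2018:08:41:24 +0800] "GET /index.html HTTP/1.1" 200 572',
    '10.0.0.2 - - [14/Jan/2018:08:41:50 +0800] "GET /logo.png HTTP/1.1" 200 1000',
    '10.0.0.3 - - [14/Jan/2018:08:42:03 +0800] "GET /missing HTTP/1.1" 404 -',
    '10.0.0.1 - - [14/Jan/2018:08:42:10 +0800] "GET /about.html HTTP/1.1" 200 287',
]


def map_line(line):
    """Map step: yield ("HH:MM", bytes_sent) for one log line."""
    fields = line.split()
    if len(fields) < 10 or not fields[9].isdigit():
        return  # skip short lines and "-" byte counts
    time_parts = fields[3].split(":")
    yield time_parts[1] + ":" + time_parts[2], int(fields[9])


def reduce_pairs(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = [pair for line in SAMPLE_LOG for pair in map_line(line)]
traffic_per_minute = reduce_pairs(pairs)
print(traffic_per_minute)
```

With the four sample lines, the two requests at 08:41 are summed to 1572 bytes, and 08:42 keeps only the 287-byte response because the 404 line has no byte count.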

2. Implementing MapReduce

【/usr/local/python/source/httpflow.py】

# -*- coding:UTF-8 -*-
'''
Created on January 14
@author: liuyazhuang
'''
import re

from mrjob.job import MRJob


class MRCounter(MRJob):

    def mapper(self, key, line):
        for i, flow in enumerate(line.split()):
            # The timestamp is the 4th column of the log, e.g. [14/Jan/:08:41:24
            if i == 3:
                timerow = flow.split(":")
                # Use "hour:minute" as the key
                hm = timerow[1] + ":" + timerow[2]
            # The 10th column is the number of bytes sent; use it as the value.
            # The "$" anchor rejects fields such as "12abc" that int() cannot parse.
            if i == 9 and re.match(r"\d+$", flow):
                # Emit the key-value pair
                yield hm, int(flow)

    def reducer(self, key, occurrences):
        # Sum the values that share the same "hour:minute" key
        yield key, sum(occurrences)


if __name__ == '__main__':
    MRCounter.run()
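The mapper above relies on positional splitting, so a line whose 4th field is not a timestamp of the expected shape raises an IndexError. As one possible hardening, the sketch below extracts both fields with a regular expression and returns None for lines it cannot parse. The pattern and the function name `parse_minute_and_bytes` are assumptions for illustration, written for the Apache common/combined log layout; they would need adjusting to the actual log format.

```python
import re

# Assumed layout: ... [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes ...
LOG_PATTERN = re.compile(
    r'\[[^:\]]*:(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}[^\]]*\]'  # timestamp
    r'\s+"[^"]*"'                                                  # request line
    r'\s+\d{3}'                                                    # status code
    r'\s+(?P<size>\d+|-)(?!\S)'                                    # bytes sent
)


def parse_minute_and_bytes(line):
    """Return ("HH:MM", bytes_sent), or None if the line cannot be parsed."""
    match = LOG_PATTERN.search(line)
    if match is None or match.group("size") == "-":
        return None
    minute_key = match.group("hour") + ":" + match.group("minute")
    return minute_key, int(match.group("size"))


# Hypothetical sample lines, for illustration only.
good = '10.0.0.1 - - [14/Jan/2018:08:41:24 +0800] "GET /index.html HTTP/1.1" 200 572'
no_size = '10.0.0.3 - - [14/Jan/2018:08:42:03 +0800] "GET /missing HTTP/1.1" 404 -'
print(parse_minute_and_bytes(good))
print(parse_minute_and_bytes(no_size))
print(parse_minute_and_bytes("not a log line"))
```

Inside the mapper, the result could be yielded only when it is not None, so malformed lines are skipped instead of failing the task.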

3. Generating the MapReduce Job

Run the following command:

python httpflow.py -r hadoop --jobconf mapreduce.job.priority=VERY_HIGH --jobconf mapreduce.map.tasks=2 --jobconf mapduce.reduce.tasks=1 -o hdfs://liuyazhuang121:9000/output/httpflow hdfs://liuyazhuang121:9000/user/root//0114

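Note that `mapduce.reduce.tasks` in the command is a misspelling, and Hadoop 2 names these two settings `mapreduce.job.maps` and `mapreduce.job.reduces`. The command is shown here as it was actually run. The job prints the following log: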

[root@liuyazhuang121 source]# python httpflow.py -r hadoop --jobconf mapreduce.job.priority=VERY_HIGH --jobconf mapreduce.map.tasks=2 --jobconf mapduce.reduce.tasks=1 -o hdfs://liuyazhuang121:9000/output/httpflow hdfs://liuyazhuang121:9000/user/root//0114
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in $PATH...
Found hadoop binary: /usr/local/hadoop-2.5.2/bin/hadoop
Using Hadoop version 2.5.2
Looking for Hadoop streaming jar in /usr/local/hadoop-2.5.2...
Found Hadoop streaming jar: /usr/local/hadoop-2.5.2/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar
Creating temp directory /tmp/httpflow.root.0114.073946.471689
Copying local files to hdfs:///user/root/tmp/mrjob/httpflow.root.0114.073946.471689/files/...
Running step 1 of 1...
  packageJobJar: [/usr/local/hadoop-2.5.2/tmp/hadoop-unjar4177700266549256253/] [] /tmp/streamjob6159676598743062299.jar tmpDir=null
  Connecting to ResourceManager at liuyazhuang121/192.168.209.121:8032
  Connecting to ResourceManager at liuyazhuang121/192.168.209.121:8032
  Total input paths to process : 1
  number of splits:2
  Submitting tokens for job: job_1515893542122_0006
  Submitted application application_1515893542122_0006
  The url to track the job: http://liuyazhuang121:8088/proxy/application_1515893542122_0006/
  Running job: job_1515893542122_0006
  Job job_1515893542122_0006 running in uber mode : false
   map 0% reduce 0%
   map 100% reduce 0%
   map 100% reduce 100%
  Job job_1515893542122_0006 completed successfully
  Output directory: hdfs://liuyazhuang121:9000/output/httpflow
Counters: 49
    File Input Format Counters
        Bytes Read=2355499
    File Output Format Counters
        Bytes Written=5445
    File System Counters
        FILE: Number of bytes read=55559
        FILE: Number of bytes written=415983
        FILE: Number of large read operations=0
        FILE: Number of read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2355749
        HDFS: Number of bytes written=5445
        HDFS: Number of large read operations=0
        HDFS: Number of read operations=9
        HDFS: Number of write operations=2
    Job Counters
        Data-local map tasks=2
        Launched map tasks=2
        Launched reduce tasks=1
        Total megabyte-seconds taken by all map tasks=6390784
        Total megabyte-seconds taken by all reduce tasks=3116032
        Total time spent by all map tasks (ms)=6241
        Total time spent by all maps in occupied slots (ms)=6241
        Total time spent by all reduce tasks (ms)=3043
        Total time spent by all reduces in occupied slots (ms)=3043
        Total vcore-seconds taken by all map tasks=6241
        Total vcore-seconds taken by all reduce tasks=3043
    Map-Reduce Framework
        CPU time spent (ms)=2760
        Combine input records=0
        Combine output records=0
        Failed Shuffles=0
        GC time elapsed (ms)=58
        Input split bytes=250
        Map input records=7555
        Map output bytes=47795
        Map output materialized bytes=55565
        Map output records=3879
        Merged Map outputs=2
        Physical memory (bytes) snapshot=652185600
        Reduce input groups=430
        Reduce input records=3879
        Reduce output records=430
        Reduce shuffle bytes=55565
        Shuffled Maps =2
        Spilled Records=7758
        Total committed heap usage (bytes)=468189184
        Virtual memory (bytes) snapshot=2668351488
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
Streaming final output from hdfs://liuyazhuang121:9000/output/httpflow...
"00:00" 572
"00:18" 572
"00:35" 287
"00:55" 573
"01:10" 285
"01:38" 574
"01:53" 574
"02:16" 570
"02:31" 287
"02:51" 572
"03:06" 286
"03:07" 287
"03:26" 573
"03:37" 569
"03:58" 571
"04:11" 570
"04:42" 571
"04:49" 574
"04:58" 569
"05:17" 572
"05:23" 572
"05:47" 285
"06:07" 285
"06:18" 288
"06:38" 572
"06:55" 285
"07:09" 574
"07:29" 573
"07:39" 572
"08:07" 570
"08:13" 573
"08:32" 573
"08:33" 571
"08:34" 574
"08:35" 575
"08:36" 572
"08:37" 575
"08:38" 576
"08:39" 570
"08:40" 575
"08:41" 570
"08:42" 571
"08:43" 575
"08:44" 574
"08:45" 575
"08:46" 574
"08:47" 571
"08:48" 574
"08:49" 520452
"08:50" 769
"08:51" 568
"08:52" 574
"08:53" 573
"08:54" 572
"08:55" 571
"08:56" 573
"08:57" 862
"08:58" 571
"08:59" 570
"09:00" 575
"09:01" 148168
"09:02" 570
"09:03" 755
"09:04" 151052
"09:05" 153894
"09:06" 571
"09:07" 148374
"09:08" 857
"09:09" 857
"09:10" 148446
"09:11" 860
"09:12" 1148
"09:13" 6901
"09:14" 8062
"09:15" 11603
"09:16" 2297730
"09:17" 3352029
"09:18" 1670086
"09:19" 859
"09:20" 1042
"09:21" 574
"09:22" 857
"09:23" 572
"09:24" 573
"09:25" 2734
"09:26" 1174
"09:27" 1646607
"09:28" 486805
"09:29" 271606
"09:30" 55121
"09:31" 2593
"09:32" 4079807
"09:33" 574
"09:34" 288
"09:35" 287
"09:36" 286
"09:37" 574
"09:38" 284
"09:39" 280
"09:40" 3875259
"09:41" 147800
"09:42" 859
"09:43" 296035
"09:44" 287
"09:45" 287
"09:46" 32419
"09:47" 186591
"09:48" 576
"09:49" 570
"09:50" 147798
"09:51" 753099
"09:52" 149511
"09:53" 754
"09:54" 286
"09:55" 286
"09:56" 148452
"09:57" 853
"09:58" 569
"09:59" 2887
"10:00" 2887
"10:01" 5199
"10:02" 858
"10:03" 571
"10:04" 148923
"10:05" 571
"10:06" 287
"10:07" 148078
"10:08" 571
"10:09" 939
"10:10" 1532
"10:11" 573
"10:12" 857
"10:13" 573
"10:14" 569
"10:15" 575
"10:16" 571
"10:17" 570
"10:18" 148452
"10:19" 964
"10:20" 287
"10:21" 286
"10:22" 568
"10:23" 285
"10:24" 283
"10:25" 288
"10:26" 286
"10:27" 571
"10:28" 286
"10:29" 3983
"10:30" 2812
"10:31" 922
"10:32" 1860
"10:33" 613
"10:34" 3173
"10:35" 359727
"10:36" 2298366
"10:37" 857
"10:38" 1529
"10:39" 570
"10:40" 572
"10:41" 574
"10:42" 859
"10:43" 571
"10:44" 5520
"10:45" 1221
"10:46" 5818
"10:47" 148724
"10:48" 1856
"10:49" 574
"10:50" 568
"10:51" 574
"10:52" 857
"10:53" 572
"10:54" 896
"10:55" 573
"10:56" 575
"10:57" 1181
"10:58" 573
"10:59" 571
"11:00" 146738
"11:01" 413761
"11:02" 148951
"11:03" 576
"11:04" 574
"11:05" 659
"11:06" 575
"11:07" 1430
"11:08" 97080
"11:09" 573
"11:10" 858
"11:11" 573
"11:12" 859
"11:13" 574
"11:14" 573
"11:15" 571
"11:16" 573
"11:17" 859
"11:18" 570
"11:19" 572
"11:20" 570
"11:21" 576
"11:22" 4975
"11:23" 148383
"11:24" 3653
"11:25" 858
"11:26" 860
"11:27" 1149
"11:28" 858
"11:29" 855
"11:30" 862
"11:31" 1009620
"11:32" 1146
"11:33" 860
"11:34" 860
"11:35" 946
"11:36" 857
"11:37" 1145
"11:38" 859
"11:39" 860
"11:40" 857
"11:41" 856
"11:42" 1147
"11:43" 855
"11:44" 860
"11:45" 5721
"11:46" 857
"11:47" 1146
"11:48" 854
"11:49" 858
"11:50" 860
"11:51" 861
"11:52" 1333
"11:53" 857
"11:54" 857
"11:55" 857
"11:56" 857
"11:57" 1143
"11:58" 856
"11:59" 858
"12:00" 858
"12:01" 856
"12:02" 1142
"12:03" 859
"12:04" 861
"12:05" 1244
"12:06" 862
"12:07" 1148
"12:08" 858
"12:09" 856
"12:10" 860
"12:11" 859
"12:12" 1144
"12:13" 857
"12:14" 858
"12:15" 1047
"12:16" 853
"12:17" 1144
"12:18" 856
"12:19" 857
"12:20" 857
"12:21" 860
"12:22" 1340
"12:23" 861
"12:24" 857
"12:25" 857
"12:26" 860
"12:27" 1142
"12:28" 856
"12:29" 858
"12:30" 573
"12:31" 573
"12:32" 857
"12:33" 573
"12:34" 571
"12:35" 571
"12:36" 571
"12:37" 854
"12:38" 571
"12:39" 571
"12:40" 572
"12:41" 577
"12:42" 858
"12:43" 573
"12:44" 571
"12:45" 573
"12:46" 573
"12:47" 856
"12:48" 574
"12:49" 571
"12:50" 571
"12:51" 572
"12:52" 1054
"12:53" 572
"12:54" 570
"12:55" 571
"12:56" 573
"12:57" 858
"12:58" 573
"12:59" 573
"13:00" 574
"13:01" 574
"13:02" 854
"13:03" 572
"13:04" 574
"13:05" 575
"13:06" 576
"13:07" 858
"13:08" 572
"13:09" 572
"13:10" 573
"13:11" 574
"13:12" 858
"13:13" 573
"13:14" 572
"13:15" 571
"13:16" 575
"13:17" 857
"13:18" 572
"13:19" 573
"13:20" 574
"13:21" 572
"13:22" 1054
"13:23" 573
"13:24" 575
"13:25" 569
"13:26" 572
"13:27" 856
"13:28" 572
"13:29" 574
"13:30" 572
"13:31" 573
"13:32" 857
"13:33" 571
"13:34" 573
"13:35" 570
"13:36" 574
"13:37" 857
"13:38" 747352
"13:39" 1548813
"13:40" 1548
"13:41" 574
"13:42" 3865293
"13:43" 6170
"13:44" 3331
"13:45" 1545861
"13:46" 901
"13:47" 1722453
"13:48" 3839352
"13:49" 1672340
"13:50" 2280
"13:51" 1818880
"13:52" 2548977
"13:53" 3401
"13:54" 862
"13:55" 858
"13:56" 572
"13:57" 1329
"13:58" 575
"13:59" 574
"14:00" 1526
"14:01" 1530
"14:02" 12994
"14:03" 2391
"14:04" 1149
"14:05" 149010
"14:06" 5492
"14:07" 857
"14:08" 857
"14:09" 1148
"14:10" 851
"14:11" 854
"14:12" 3894
"14:13" 149041
"14:14" 145109
"14:15" 754
"14:16" 1330
"14:17" 861
"14:18" 1223
"14:19" 127167
"14:20" 571
"14:21" 285
"14:22" 287
"14:23" 572
"14:24" 35292
"14:25" 569
"14:26" 570
"14:27" 867
"14:28" 2534
"14:29" 856
"14:30" 570
"14:31" 573
"14:32" 573
"14:33" 574
"14:34" 1595
"14:35" 574
"14:36" 571
"14:37" 148726
"14:38" 148452
"14:39" 148727
"14:40" 861
"14:41" 148441
"14:42" 859
"14:43" 889605
"14:44" 1144
"14:45" 858
"14:46" 857
"14:47" 862
"14:48" 1522546
"14:49" 7094
"14:50" 861
"14:51" 767325
"14:52" 1051
"14:53" 148723
"14:54" 860
"14:55" 148743
"14:56" 149333
"14:57" 857
"14:58" 5771
"14:59" 5961
"15:00" 59869
"15:01" 10255
"15:02" 859
"15:03" 2892
"15:04" 858
"15:05" 2523173
"15:06" 1547763
"15:07" 1530
"15:08" 2296079
"15:09" 7799
"15:10" 3482555
Removing HDFS temp directory hdfs:///user/root/tmp/mrjob/httpflow.root.0114.073946.471689...
Removing temp directory /tmp/httpflow.root.0114.073946.471689...

As the log shows, the results are printed at the end of the run. Now run the command:

hadoop fs -ls /output/httpflow

to list the generated result files:

[root@liuyazhuang121 source]# hadoop fs -ls /output/httpflow
Found 2 items
-rw-r--r--   1 root supergroup          0 -01-14 15:40 /output/httpflow/_SUCCESS
-rw-r--r--   1 root supergroup       5445 -01-14 15:40 /output/httpflow/part-00000

Then run the command

hadoop fs -cat /output/httpflow/part-00000

to view the output:

[root@liuyazhuang121 source]# hadoop fs -cat /output/httpflow/part-00000
"00:00" 572
"00:18" 572
"00:35" 287
"00:55" 573
"01:10" 285
"01:38" 574
"01:53" 574
"02:16" 570
"02:31" 287
"02:51" 572
"03:06" 286
"03:07" 287
"03:26" 573
"03:37" 569
"03:58" 571
"04:11" 570
"04:42" 571
"04:49" 574
"04:58" 569
"05:17" 572
"05:23" 572
"05:47" 285
"06:07" 285
"06:18" 288
"06:38" 572
"06:55" 285
"07:09" 574
"07:29" 573
"07:39" 572
"08:07" 570
"08:13" 573
"08:32" 573
"08:33" 571
"08:34" 574
"08:35" 575
"08:36" 572
"08:37" 575
"08:38" 576
"08:39" 570
"08:40" 575
"08:41" 570
"08:42" 571
"08:43" 575
"08:44" 574
"08:45" 575
"08:46" 574
"08:47" 571
"08:48" 574
"08:49" 520452
"08:50" 769
"08:51" 568
"08:52" 574
"08:53" 573
"08:54" 572
"08:55" 571
"08:56" 573
"08:57" 862
"08:58" 571
"08:59" 570
"09:00" 575
"09:01" 148168
"09:02" 570
"09:03" 755
"09:04" 151052
"09:05" 153894
"09:06" 571
"09:07" 148374
"09:08" 857
"09:09" 857
"09:10" 148446
"09:11" 860
"09:12" 1148
"09:13" 6901
"09:14" 8062
"09:15" 11603
"09:16" 2297730
"09:17" 3352029
"09:18" 1670086
"09:19" 859
"09:20" 1042
"09:21" 574
"09:22" 857
"09:23" 572
"09:24" 573
"09:25" 2734
"09:26" 1174
"09:27" 1646607
"09:28" 486805
"09:29" 271606
"09:30" 55121
"09:31" 2593
"09:32" 4079807
"09:33" 574
"09:34" 288
"09:35" 287
"09:36" 286
"09:37" 574
"09:38" 284
"09:39" 280
"09:40" 3875259
"09:41" 147800
"09:42" 859
"09:43" 296035
"09:44" 287
"09:45" 287
"09:46" 32419
"09:47" 186591
"09:48" 576
"09:49" 570
"09:50" 147798
"09:51" 753099
"09:52" 149511
"09:53" 754
"09:54" 286
"09:55" 286
"09:56" 148452
"09:57" 853
"09:58" 569
"09:59" 2887
"10:00" 2887
"10:01" 5199
"10:02" 858
"10:03" 571
"10:04" 148923
"10:05" 571
"10:06" 287
"10:07" 148078
"10:08" 571
"10:09" 939
"10:10" 1532
"10:11" 573
"10:12" 857
"10:13" 573
"10:14" 569
"10:15" 575
"10:16" 571
"10:17" 570
"10:18" 148452
"10:19" 964
"10:20" 287
"10:21" 286
"10:22" 568
"10:23" 285
"10:24" 283
"10:25" 288
"10:26" 286
"10:27" 571
"10:28" 286
"10:29" 3983
"10:30" 2812
"10:31" 922
"10:32" 1860
"10:33" 613
"10:34" 3173
"10:35" 359727
"10:36" 2298366
"10:37" 857
"10:38" 1529
"10:39" 570
"10:40" 572
"10:41" 574
"10:42" 859
"10:43" 571
"10:44" 5520
"10:45" 1221
"10:46" 5818
"10:47" 148724
"10:48" 1856
"10:49" 574
"10:50" 568
"10:51" 574
"10:52" 857
"10:53" 572
"10:54" 896
"10:55" 573
"10:56" 575
"10:57" 1181
"10:58" 573
"10:59" 571
"11:00" 146738
"11:01" 413761
"11:02" 148951
"11:03" 576
"11:04" 574
"11:05" 659
"11:06" 575
"11:07" 1430
"11:08" 97080
"11:09" 573
"11:10" 858
"11:11" 573
"11:12" 859
"11:13" 574
"11:14" 573
"11:15" 571
"11:16" 573
"11:17" 859
"11:18" 570
"11:19" 572
"11:20" 570
"11:21" 576
"11:22" 4975
"11:23" 148383
"11:24" 3653
"11:25" 858
"11:26" 860
"11:27" 1149
"11:28" 858
"11:29" 855
"11:30" 862
"11:31" 1009620
"11:32" 1146
"11:33" 860
"11:34" 860
"11:35" 946
"11:36" 857
"11:37" 1145
"11:38" 859
"11:39" 860
"11:40" 857
"11:41" 856
"11:42" 1147
"11:43" 855
"11:44" 860
"11:45" 5721
"11:46" 857
"11:47" 1146
"11:48" 854
"11:49" 858
"11:50" 860
"11:51" 861
"11:52" 1333
"11:53" 857
"11:54" 857
"11:55" 857
"11:56" 857
"11:57" 1143
"11:58" 856
"11:59" 858
"12:00" 858
"12:01" 856
"12:02" 1142
"12:03" 859
"12:04" 861
"12:05" 1244
"12:06" 862
"12:07" 1148
"12:08" 858
"12:09" 856
"12:10" 860
"12:11" 859
"12:12" 1144
"12:13" 857
"12:14" 858
"12:15" 1047
"12:16" 853
"12:17" 1144
"12:18" 856
"12:19" 857
"12:20" 857
"12:21" 860
"12:22" 1340
"12:23" 861
"12:24" 857
"12:25" 857
"12:26" 860
"12:27" 1142
"12:28" 856
"12:29" 858
"12:30" 573
"12:31" 573
"12:32" 857
"12:33" 573
"12:34" 571
"12:35" 571
"12:36" 571
"12:37" 854
"12:38" 571
"12:39" 571
"12:40" 572
"12:41" 577
"12:42" 858
"12:43" 573
"12:44" 571
"12:45" 573
"12:46" 573
"12:47" 856
"12:48" 574
"12:49" 571
"12:50" 571
"12:51" 572
"12:52" 1054
"12:53" 572
"12:54" 570
"12:55" 571
"12:56" 573
"12:57" 858
"12:58" 573
"12:59" 573
"13:00" 574
"13:01" 574
"13:02" 854
"13:03" 572
"13:04" 574
"13:05" 575
"13:06" 576
"13:07" 858
"13:08" 572
"13:09" 572
"13:10" 573
"13:11" 574
"13:12" 858
"13:13" 573
"13:14" 572
"13:15" 571
"13:16" 575
"13:17" 857
"13:18" 572
"13:19" 573
"13:20" 574
"13:21" 572
"13:22" 1054
"13:23" 573
"13:24" 575
"13:25" 569
"13:26" 572
"13:27" 856
"13:28" 572
"13:29" 574
"13:30" 572
"13:31" 573
"13:32" 857
"13:33" 571
"13:34" 573
"13:35" 570
"13:36" 574
"13:37" 857
"13:38" 747352
"13:39" 1548813
"13:40" 1548
"13:41" 574
"13:42" 3865293
"13:43" 6170
"13:44" 3331
"13:45" 1545861
"13:46" 901
"13:47" 1722453
"13:48" 3839352
"13:49" 1672340
"13:50" 2280
"13:51" 1818880
"13:52" 2548977
"13:53" 3401
"13:54" 862
"13:55" 858
"13:56" 572
"13:57" 1329
"13:58" 575
"13:59" 574
"14:00" 1526
"14:01" 1530
"14:02" 12994
"14:03" 2391
"14:04" 1149
"14:05" 149010
"14:06" 5492
"14:07" 857
"14:08" 857
"14:09" 1148
"14:10" 851
"14:11" 854
"14:12" 3894
"14:13" 149041
"14:14" 145109
"14:15" 754
"14:16" 1330
"14:17" 861
"14:18" 1223
"14:19" 127167
"14:20" 571
"14:21" 285
"14:22" 287
"14:23" 572
"14:24" 35292
"14:25" 569
"14:26" 570
"14:27" 867
"14:28" 2534
"14:29" 856
"14:30" 570
"14:31" 573
"14:32" 573
"14:33" 574
"14:34" 1595
"14:35" 574
"14:36" 571
"14:37" 148726
"14:38" 148452
"14:39" 148727
"14:40" 861
"14:41" 148441
"14:42" 859
"14:43" 889605
"14:44" 1144
"14:45" 858
"14:46" 857
"14:47" 862
"14:48" 1522546
"14:49" 7094
"14:50" 861
"14:51" 767325
"14:52" 1051
"14:53" 148723
"14:54" 860
"14:55" 148743
"14:56" 149333
"14:57" 857
"14:58" 5771
"14:59" 5961
"15:00" 59869
"15:01" 10255
"15:02" 859
"15:03" 2892
"15:04" 858
"15:05" 2523173
"15:06" 1547763
"15:07" 1530
"15:08" 2296079
"15:09" 7799
"15:10" 3482555

The file contains the per-minute traffic totals.
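Once the per-minute totals are available, a small script can pick out the traffic peaks or roll the data up by hour. The sketch below parses a few records copied from the output above. The function names are hypothetical, and the sample string stands in for the contents of part-00000, assuming each line is a quoted "HH:MM" key followed by whitespace and an integer.

```python
import json
from collections import defaultdict

# A few records copied from the job output; in practice, read part-00000 instead.
SAMPLE_OUTPUT = '''"08:48"\t574
"08:49"\t520452
"08:50"\t769
"09:16"\t2297730
"09:17"\t3352029
"09:18"\t1670086
'''


def parse_output(text):
    """Parse lines of the form '"HH:MM" <bytes>' into (minute, bytes) tuples."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(None, 1)
        # The key is a JSON-encoded string, so json.loads strips the quotes.
        records.append((json.loads(key), int(value)))
    return records


def busiest_minutes(records, n=3):
    """Return the n records with the largest byte counts, largest first."""
    return sorted(records, key=lambda record: record[1], reverse=True)[:n]


def hourly_totals(records):
    """Roll the per-minute totals up to per-hour totals."""
    totals = defaultdict(int)
    for minute, size in records:
        totals[minute.split(":")[0]] += size
    return dict(totals)


records = parse_output(SAMPLE_OUTPUT)
print(busiest_minutes(records, 2))
print(hourly_totals(records))
```

For these six records, the two busiest minutes are 09:17 and 09:16, and the hourly roll-up gives 521795 bytes for hour 08 and 7319845 bytes for hour 09.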

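Finally, it is recommended to load the analysis results into MySQL on a regular schedule and generate the corresponding reports from them.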
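As one possible shape for that import step, the sketch below stores per-minute totals in a table keyed by date and minute, so that re-running an import overwrites rows instead of duplicating them. It uses Python's built-in `sqlite3` module as a stand-in so the example runs without a database server; this is a swapped-in substitute, not MySQL. The table name `http_flow`, its columns, and the date value are assumptions for illustration. With MySQL, the connection would come from a DB-API driver, the placeholder style is typically `%s` rather than `?`, and the upsert syntax differs.

```python
import sqlite3

# Records copied from the job output above.
RECORDS = [("08:48", 574), ("08:49", 520452), ("08:50", 769)]

# Placeholder date for illustration; use the real log date in practice.
STAT_DATE = "2000-01-01"


def save_traffic(conn, stat_date, records):
    """Insert or overwrite the per-minute byte totals for one day."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS http_flow (
               stat_date  TEXT    NOT NULL,
               minute     TEXT    NOT NULL,
               bytes_sent INTEGER NOT NULL,
               PRIMARY KEY (stat_date, minute)
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO http_flow (stat_date, minute, bytes_sent) "
        "VALUES (?, ?, ?)",
        [(stat_date, minute, size) for minute, size in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save_traffic(conn, STAT_DATE, RECORDS)
save_traffic(conn, STAT_DATE, RECORDS)  # a repeated import does not duplicate rows

row_count = conn.execute("SELECT COUNT(*) FROM http_flow").fetchone()[0]
total_bytes = conn.execute("SELECT SUM(bytes_sent) FROM http_flow").fetchone()[0]
print(row_count, total_bytes)
```

Keying the table on (date, minute) is a design choice that makes the scheduled import idempotent, which matters if a job is retried after a partial failure.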
