关于pv的那些事!!

遗留问题:whid=1969的日志记录是什么意思?

网站站点信息未分配的时候,会用1969去代替站点信息。

PV:页面浏览量(page view),用户每次打开或刷新一次网页即被计算一次.

关于pv的计算,我们首先考虑下最简单的情况:计算易迅网每天的的pv,这个pv不分站点和渠道.

我们打开跳板机,进入75机器的shell目录.

打开shell目录里的siteKeyData_AllSite.sh,开始研究这个文件.

我们注意到这个脚本文件里,有一个getPV函数,我们来看下这个函数.

pv计算规则:

①页面跳转,并且当前页面id合法;

②网站端的,不计无线;

③IFrame内嵌页卡不计入PV;

④外部投放的页卡不计入PV;

⑤符合以上4点的就算做一条pv,pv数加1.

 

我们根据以上5点,写一个测试脚本vic_AllSite_pv.sh.算一下20140909号的的一天的易迅网的pv量.

  1 #!/bin/bash
  2 
  3 ## file name: vic_AllSite_pv.sh
  4 
  5 
  6 ## example: ./vic_AllSite_pv.sh  20140909
  7 
  8 #获取当前脚本所在路径
  9 currdir=$(cd "$(dirname "$0")"; pwd)
 10 
 11 #进入脚本所在目录
 12 cd $currdir
 13 
 14 ## PHP path
 15 PHP_BIN="/usr/local/php/bin/php"
 16 
 17 ## 显示脚本的正确用法
 18 function showUsage()
 19 {
 20     echo "Usage: sh vic_AllSite_pv.sh  [log_file_time]" >&2
 21     exit 1;
 22 }
 23 
 24 ## 如果参数为0,则显示该脚本的用法
 25 if [ $# = 0 ]; then
 26     showUsage
 27 fi
 28 
 29 ## 变量初始化
 30 start_time=`date "+%Y-%m-%d %H:%M:%S.%N"`
 31 if [ $# = 1 ]; then
 32     log_file_time=$1
 33 fi
 34 
 35 ## 处理结果的存储文件名
 36 result_pv_filename="../data_today/""vic_ALLSITE_PV""_""$log_file_time"
 37 
 38 ## site_key
 39 ## warehouseid    - 各分站总体数据
 40 ## warehouseid_1  - 各分站商品详情页数据
 41 
 42 ## get PV
 43 function getPV()
 44 {
 45     
 46     log_file_time=$1
 47     log_filename="row_data_"$1
 48     log_filename=$log_filename"*"
 49 
 50     awk -v fdate=$log_file_time -v ftype=$period_type -F"\t" '{
 51         ## 不是页面跳转或者当前页面id不合法的请求,过滤掉
 52         if($3 != 1 || int($9) < 0) next;
 53 
 54         ## 不计无线
 55         wh_id = int($6);
 56         ## if ( (wh_id == 1990) || (wh_id == 1991) || (wh_id == 1999) || (wh_id == 1992) ) next;
 57         if ( (wh_id !=    1) &&
 58              (wh_id != 1001) &&
 59              (wh_id != 2001) &&
 60              (wh_id != 3001) &&
 61              (wh_id != 4001) &&
 62              (wh_id != 5001) &&
 63              (wh_id != 1969) ) next;
 64 
 65         ## IFrame内嵌页卡不计入PV
 66         if( (int($9) == 12376) || (int($9) == 1237017) ||
 67             (int($9) == 63786) || (int($9) == 6378017) ||
 68             (int($9) == 63796) || (int($9) == 6379017) ||
 69             (int($9) == 63806) || (int($9) == 6380017) ||
 70             (int($9) == 63816) || (int($9) == 6381017) ||
 71             (int($9) == 63826) || (int($9) == 6382017) ||
 72             (int($9) == 5020)  || (int($9) == 11718320) ||
 73             (int($9) == 9428027) || (int($9) == 9443027) ||
 74             (int($9) == 9445027) || (int($9) == 9446027) ||
 75             (int($9) == 9444017) || (int($9) == 9448017) ) next;
 76 
 77         ## 外部投放的页卡不计入PV
 78         if(0 == $10 && !match($11, /buy\.(51buy|yixun)\.com/)) next;
 79 
 80         if((int($9) % 10 == 1) || match($11, /item\.(51buy|yixun)\.com/)) {
 81             pvArr["0_1"] = pvArr["0_1"] + 1;
 82         }
 83 
 84         if((int($9) == 220) || match($11, /buy\.(51buy|yixun)\.com\/cart\.html/)) {
 85             pvArr["0_2"] = pvArr["0_2"] + 1;
 86         }
 87 
 88         if((int($9) == 230) || (int($9) == 250) || match($11, /buy\.(51buy|yixun)\.com\/order.html/)) {
 89             pvArr["0_3"] = pvArr["0_3"] + 1;
 90         }
 91 
 92         ## key: 分站ID
 93         pvArr["0"] = pvArr["0"] + 1;
 94         } END {
 95             for(key in pvArr)
 96             {
 97                print key"\t"pvArr[key] >  "../data_today/""ALLSITE_PV_"ftype"_"fdate
 98             }
 99         }' ../data_today/$log_filename
100 }
101 
102 
103 ## start execute
104 getPV  $log_file_time
105 
106 ## store result to DB
107 ##${PHP_BIN} -c . -f siteKeyDataStore_AllSite.php $result_pv_filename $result_uv_filename $result_btnuv_filename
108 
109 ## process end time
110 end_time=`date "+%Y-%m-%d %H:%M:%S.%N"`
111 
112 echo -e "ALLSITE_PV_Data:""["",""$log_file_time""]:""$start_time""~""$end_time"
vic_AllSite_pv.sh

计算结果如下:

0 3430852 //全站一天的pv
0_1 1071985 //商品详情页pv
0_2 76135 //购物车pv
0_3 40871 //下单页pv

 

全站的pv比较好算,那么各分站的pv又该怎么去算呢.

其实有了前面的基础,分站的计算就比较简单了,我们更改一下脚本.

  1 #!/bin/bash
  2 
  3 ## file name: vic_Sites_pv.sh
  4 
  5 
  6 ## example: ./vic_Sites_pv.sh  20140909
  7 
  8 #获取当前脚本所在路径
  9 currdir=$(cd "$(dirname "$0")"; pwd)
 10 
 11 #进入脚本所在目录
 12 cd $currdir
 13 
 14 ## PHP path
 15 PHP_BIN="/usr/local/php/bin/php"
 16 
 17 ## 显示脚本的正确用法
 18 function showUsage()
 19 {
 20     echo "Usage: sh vic_Sites_pv.sh  [log_file_time]" >&2
 21     exit 1;
 22 }
 23 
 24 ## 如果参数为0,则显示该脚本的用法
 25 if [ $# = 0 ]; then
 26     showUsage
 27 fi
 28 
 29 ## 变量初始化
 30 start_time=`date "+%Y-%m-%d %H:%M:%S.%N"`
 31 if [ $# = 1 ]; then
 32     log_file_time=$1
 33 fi
 34 
 35 ## 处理结果的存储文件名
 36 result_pv_filename="../data_today/""vic_SITES_PV""_""$log_file_time"
 37 
 38 ## site_key
 39 ## warehouseid    - 各分站总体数据
 40 ## warehouseid_1  - 各分站商品详情页数据
 41 
 42 ## get PV
 43 function getPV()
 44 {
 45     
 46     log_file_time=$1
 47     log_filename="row_data_"$1
 48     log_filename=$log_filename"*"
 49 
 50     awk -v fdate=$log_file_time -v ftype=$period_type -F"\t" '{
 51         ## 不是页面跳转或者当前页面id不合法的请求,过滤掉
 52         if($3 != 1 || int($9) < 0) next;
 53 
 54         ## 不计无线
 55         wh_id = int($6);
 56         ## if ( (wh_id == 1990) || (wh_id == 1991) || (wh_id == 1999) || (wh_id == 1992) ) next;
 57         if ( (wh_id !=    1) &&
 58              (wh_id != 1001) &&
 59              (wh_id != 2001) &&
 60              (wh_id != 3001) &&
 61              (wh_id != 4001) &&
 62              (wh_id != 5001) &&
 63              (wh_id != 1969) ) next;
 64 
 65         ## IFrame内嵌页卡不计入PV
 66         if( (int($9) == 12376) || (int($9) == 1237017) ||
 67             (int($9) == 63786) || (int($9) == 6378017) ||
 68             (int($9) == 63796) || (int($9) == 6379017) ||
 69             (int($9) == 63806) || (int($9) == 6380017) ||
 70             (int($9) == 63816) || (int($9) == 6381017) ||
 71             (int($9) == 63826) || (int($9) == 6382017) ||
 72             (int($9) == 5020)  || (int($9) == 11718320) ||
 73             (int($9) == 9428027) || (int($9) == 9443027) ||
 74             (int($9) == 9445027) || (int($9) == 9446027) ||
 75             (int($9) == 9444017) || (int($9) == 9448017) ) next;
 76 
 77         ## 外部投放的页卡不计入PV
 78         if(0 == $10 && !match($11, /buy\.(51buy|yixun)\.com/)) next;
 79 
 80         if((int($9) % 10 == 1) || match($11, /item\.(51buy|yixun)\.com/)) {
 81             pvArr[$6"_1"] = pvArr[$6"_1"] + 1;
 82         }
 83 
 84         if((int($9) == 220) || match($11, /buy\.(51buy|yixun)\.com\/cart\.html/)) {
 85             pvArr[$6"_2"] = pvArr[$6"_2"] + 1;
 86         }
 87 
 88         if((int($9) == 230) || (int($9) == 250) || match($11, /buy\.(51buy|yixun)\.com\/order.html/)) {
 89             pvArr[$6"_3"] = pvArr[$6"_3"] + 1;
 90         }
 91 
 92         ## key: 分站ID
 93         pvArr[$6] = pvArr[$6] + 1;
 94         } END {
 95             for(key in pvArr)
 96             {
 97                print key"\t"pvArr[key] >  "../data_today/""SITES_PV_"ftype"_"fdate
 98             }
 99         }' ../data_today/$log_filename
100 }
101 
102 
103 ## start execute
104 getPV  $log_file_time
105 
106 ## store result to DB
107 ##${PHP_BIN} -c . -f siteKeyDataStore_AllSite.php $result_pv_filename $result_uv_filename $result_btnuv_filename
108 
109 ## process end time
110 end_time=`date "+%Y-%m-%d %H:%M:%S.%N"`
111 
112 echo -e "SITES_PV_Data:""["",""$log_file_time""]:""$start_time""~""$end_time"

计算结果如下:

2001_3 5789
1_3 16977
4001_1 76058
2001 550962
1969_1 22
4001_2 4795
3001 314763
1969_2 1
4001_3 2252
4001 239516
5001 134831
3001_1 93959
3001_2 6710
1001_1 291933
3001_3 3231
5001_1 40875
1001_2 19668
5001_2 2934
1001_3 11163
5001_3 1459
1969 19877
1 1283055
2001_1 181135
1_1 388003
2001_2 10970
1_2 31057
1001 887848

 剩下的就是渠道pv数据的计算了.我们另起一章,详细解说.

posted on 2014-09-10 10:08  阮佳佳  阅读(602)  评论(0编辑  收藏  举报

导航