在电子商务网站中,假如我们需要分析用户的地区的分布的时候,虽然一般的网站都有记录用户所在省份和地区的字段,但是一般的网站都不会去核实用户的这些信息,因此数
据的准确性就有比较大的偏差,甚至很多用户都不填写这个信息(如果不是必填项的话),那么这个时候用户注册时候的IP地址就显得可靠多了,但是要把IP地址转换成地区并
非易事,倒不是说这个技术很难实现,是因为IP分配规则太乱了,不象邮政编码或者区号那么定的那么死,所以这个就成为了一项比较浩大的工程。IP地址库有商业的可以购买
,也有免费的可以使用。我就是使用的广大网友经常使用的QQWry.Data 这个东东,我先叫人家把这里面的数据整理成我定好的格式(这是一项比较大的工程,3个人弄大概弄了2
天),然后导入到数据仓库中我的地区维度表。
数据仓库中的地区维度的表结构如下:
GeographyKey int
Province varchar(50)
City varchar(50)
IPSections text
我的源系统中的用户的表结构如下:
CustomerID varchar(50)
Account nvarchar(50)
RegisterTime datetime
IPAddress varchar(50)
UpdateTime datetime
数据仓库中的用户的表结构如下:
CustomerKey int
GeographyKey int
RegisterDateKey int
CustomerCodeAlternateKey varchar(50)
Account nvarchar(50)
IPAddress varchar(50)
RegisterTime datetime
UpdateTime datetime
接下来还是使用我们的好帮手SSIS。
其中关键在于中间的那个脚本组件,里面包含一个类IPCompare
Code
Imports System
Imports System.Net
Public Class IPCompare
Shared Function AreEqual(ByVal IPAddr1 As String, ByVal IPAddr2 As String) As Boolean
Return IPAddressToLongBackwards(IPAddr1) = IPAddressToLongBackwards(IPAddr2)
End Function
Shared Function IsGreater(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) > IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function IsLess(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) < IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function IsGreaterOrEqual(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) >= IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function IsLessOrEqual(ByVal ToCompare As String, ByVal CompareAgainst As String) As Boolean
Return IPAddressToLongBackwards(ToCompare) <= IPAddressToLongBackwards(CompareAgainst)
End Function
Shared Function LongToIPAddress(ByVal IPAddr As Long) As String
Return New System.Net.IPAddress(IPAddr).ToString()
End Function
Shared Function IPAddressToLongBackwards(ByVal IPAddr As String) As Long
Dim i As Integer
Dim IpByte As String() = IPAddr.Split(New Char() {"."c})
Dim nUpperBound As Integer = IpByte.GetUpperBound(0)
If (nUpperBound <> 3) Then
IpByte = New String(4 - 1) {}
i = 1
Do While (i <= (3 - nUpperBound))
IpByte((nUpperBound + i)) = "0"
i += 1
Loop
End If
Dim TempByte4 As Byte() = New Byte(4 - 1) {}
i = 0
Do While (i <= 3)
If IPCompare.IsNumeric(IpByte(i)) Then
TempByte4((3 - i)) = CByte((Convert.ToInt32(IpByte(i)) And &HFF))
End If
i += 1
Loop
Return BitConverter.ToUInt32(TempByte4, 0)
End Function
Private Shared Function IsNumeric(ByVal str As String) As Boolean
If (System.Text.RegularExpressions.Regex.IsMatch(str, "^-?\d+$")) Then
Return True
End If
Return False
End Function
End Class
脚本组件里面的代码
Code
1' Microsoft SQL Server Integration Services user script component
2' This is your new script component in Microsoft Visual Basic .NET
3' ScriptMain is the entrypoint class for script components
4
5Imports System
6Imports System.Xml
7Imports System.Net
8Imports System.Data
9Imports System.Data.SqlClient
10Imports System.Math
11Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
12Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
13
14Public Class ScriptMain
15 Inherits UserComponent
16
17 Private dataSet As DataSet = New DataSet()
18
19 Public Overrides Sub 输入0_ProcessInputRow(ByVal Row As 输入0Buffer)
20
21 Dim GeographyKey As Int32 = 0
22 Dim ipcurrent As String = String.Empty
23
24 Try
25 If (Row.IPAddress_IsNull()) Then
26 Row.GeographyKey = -1
27 Return
28 End If
29
30 ipcurrent = Row.IPAddress
31
32 If (ipcurrent.Length = 0) Then
33 Row.GeographyKey = -1
34 Return
35 End If
36
37 Dim i As Integer
38 For i = 0 To dataSet.Tables("DimGeography").Rows.Count - 1
39 Dim dtRow As DataRow = dataSet.Tables("DimGeography").Rows(i)
40 Dim ipsections As String()
41 ipsections = dtRow("IPSections").ToString().Split(CChar(";"))
42
43 Dim ipsection As String
44 For Each ipsection In ipsections
45 If (IpBetween(ipcurrent, ipsection)) Then
46 GeographyKey = Convert.ToInt32(dtRow("GeographyKey").ToString())
47 Exit For
48 End If
49 Next
50 If (GeographyKey <> 0) Then
51 Row.GeographyKey = GeographyKey
52 Return
53 End If
54 Next
55 GeographyKey = -1
56 Row.GeographyKey = GeographyKey
57 Catch
58 GeographyKey = -1
59 Row.GeographyKey = GeographyKey
60 End Try
61
62 End Sub
63
64 Public Overrides Sub PreExecute()
65 MyBase.PreExecute()
66
67 Dim conn As OleDb.OleDbConnection
68 Dim adapter As OleDb.OleDbDataAdapter
69 Dim sql As String = "select GeographyKey,IPSections from DimGeography"
70
71 conn = New OleDb.OleDbConnection(Connections.DataCenterDW.ConnectionString + ";pwd=work8520")
72
73 adapter = New OleDb.OleDbDataAdapter(sql, conn)
74
75 Try
76 conn.Open()
77 adapter.Fill(dataSet, "DimGeography")
78 Catch ex As Exception
79
80 Finally
81 conn.Close()
82 End Try
83
84 End Sub
85
86 Private Function IpBetween(ByVal targetip As String, ByVal ipsection As String) As Boolean
87 Dim ipstart As String = ipsection.Split(CChar("-"))(0)
88 Dim ipend As String = ipsection.Split(CChar("-"))(1)
89
90 If (IPCompare.IsGreaterOrEqual(targetip, ipstart) And IPCompare.IsLessOrEqual(targetip, ipend)) Then
91 Return True
92 End If
93
94 Return False
95 End Function
96
97End Class
98
99
这样就OK了。
附件里面我提供了这个包文件和地区维度表数据的下载,希望能够给朋友们提供一点微薄的帮助。
附件下载