Data Mining UrlScan 3.0 Logs using LogParser 2.2
Original post: http://blogs.msdn.com/robert_mcmurray/archive/2008/09/10/data-mining-urlscan-3-0-logs-using-logparser-2-2.aspx
We released a new version of UrlScan recently, and one of the great new features in this version is log files that conform to the W3C Extended Log File Format. What this means to administrators is that they can now parse their UrlScan activity using almost any common log utility, including Microsoft's LogParser 2.2. For anyone who hasn't heard of LogParser, it is a freeware utility from Microsoft that allows you to write SQL-style queries to extract useful information from log files and other data sources. Eventually I'd like the following information to show up on the http://learn.iis.net/ web site, but for now I'd like everyone to at least have access to it.
Getting Started
Downloading and Installing UrlScan and LogParser
The download locations for UrlScan are listed in the "Getting UrlScan" section of the following topic:
http://learn.iis.net/page.aspx/473/
Note: There are separate downloads for 32-bit and 64-bit Windows.
LogParser 2.2 is available as a free download from the Microsoft Download Center.
Note: It's a 32-bit application, but you can use it on a 64-bit Windows system.
While it's not essential, copying Logparser.exe to a folder in your PATH will allow you to run LogParser from any folder on your system.
UrlScan Log Files
Locating your UrlScan Log Files
By default, UrlScan installs to %WinDir%\System32\Inetsrv\urlscan, and this is where your Urlscan.ini file should be located. (If you installed to a different location, you can find your installation by searching for the Urlscan.ini file.) In the [Options] section of Urlscan.ini you should see a LoggingDirectory entry that is set to "Logs" by default, which means that the default directory for UrlScan log files is %WinDir%\System32\Inetsrv\urlscan\Logs.
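For reference, the relevant part of a default Urlscan.ini looks roughly like the following. This is a sketch of the [Options] section showing only the LoggingDirectory entry, not a complete file, so your copy will contain additional settings:
[Options]
; Relative paths are resolved against the UrlScan installation folder,
; so the default value of "Logs" points to %WinDir%\System32\Inetsrv\urlscan\Logs
LoggingDirectory=Logs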
Log File Names
UrlScan log files are named using the following syntax:
urlscan.mmddyy.log
Where:
mm = Month
dd = Day
yy = Year
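For example, the log file for September 10, 2008 would be named:
urlscan.091008.log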
Log File Fields
The following table lists the fields that are available:
Field | Description
Date | Request date, in YYYY-MM-DD format
Time | Request time, in UTC
c-ip | IP address of the client that made the request
s-siteid | ID of the web site that received the request
cs-method | HTTP method for the request
cs-uri | URL for the request
x-action | Action that UrlScan took; for example, "Rejected"
x-reason | Reason for the action
x-context | Part of the request that triggered the action
cs-data | The request data that triggered the action
x-control | The configuration data that caused the trigger; for example, if a request was rejected because of a deny string from a UrlScan rule, the specific deny string is listed
Querying Data
Simple Queries
Selecting Everything
The simplest (and least useful) query that you can issue is the following, which is listed here just to get us started:
logparser.exe "SELECT * FROM urlscan.*.log" -i:w3c
Let's analyze what this query does:
- Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
- Specifies the log file input type as "W3C Format" ("-i:w3c")
Note: Specifying the input file format as "W3C Format" is essential; without that step, LogParser would not be able to parse your UrlScan logs.
Selecting Specific Fields to Query
You can restrict the amount of data that you see by specifying which fields to return:
logparser.exe "SELECT date,time,c-ip,x-action FROM urlscan.*.log" -i:w3c
Here's what this query does:
- Selects only four specific fields ("SELECT date,time,c-ip,x-action") from all records in all log files ("FROM urlscan.*.log")
- Specifies the log file input type as "W3C Format" ("-i:w3c")
This allows you to greatly reduce the amount of information that you see in your results.
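One side note that applies to any query whose results go to the console: LogParser pauses after a handful of rows and waits for a keypress. If I recall the switch correctly, adding -rtp:-1 to the command line turns that paging off, which is handy when you just want the output to scroll by:
logparser.exe "SELECT date,time,c-ip,x-action FROM urlscan.*.log" -i:w3c -rtp:-1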
Exporting Logs into a Different Format
As mentioned earlier, selecting all of the records in all of your logs is probably not all that helpful, unless you want to analyze your UrlScan information in another application, such as Microsoft Excel, Microsoft Access, or SQL Server. In that case, you might want to export all of your data into a comma-delimited file, as in the following example:
logparser.exe "SELECT * INTO UrlScan.csv FROM urlscan.*.log" -i:w3c -o:csv
Here's what this query does:
- Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
- Writes all of the selected records into a comma-delimited file ("INTO UrlScan.csv")
- Specifies the input file type as "W3C Format" ("-i:w3c")
- Specifies the output file type as "CSV Format" ("-o:csv")
You could just as easily export into a tab-separated file using the following syntax:
logparser.exe "SELECT * INTO UrlScan.tsv FROM urlscan.*.log" -i:w3c -o:tsv
Filtering Queries
One of the great things about LogParser and SQL queries in general is that you can filter the amount of data that you see by specifying criteria using a WHERE clause. The following examples illustrate a few common filters.
Filtering based on IP Address
For example, to see all of the requests from a specific client IP address, you could use the following syntax:
logparser.exe "SELECT * FROM urlscan.*.log WHERE c-ip='192.168.1.1'" -i:w3c
Here's what this query does:
- Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
- Filters the results based on a specific client IP address ("WHERE c-ip='192.168.1.1'")
- Specifies the input file type as "W3C Format" ("-i:w3c")
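If you want to match a range of addresses rather than one specific client, LogParser's SQL dialect also supports the LIKE operator with the % wildcard. The subnet below is just a placeholder for illustration:
logparser.exe "SELECT * FROM urlscan.*.log WHERE c-ip LIKE '192.168.%'" -i:w3c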
Filtering based on UrlScan Action
You can also filter based on UrlScan rejections using the following syntax:
logparser.exe "SELECT * FROM urlscan.*.log WHERE x-action='Rejected'" -i:w3c
Here's what this query does:
- Selects all fields ("SELECT *") from all records in all log files ("FROM urlscan.*.log")
- Filters the results where UrlScan rejected the request ("WHERE x-action='Rejected'")
- Specifies the input file type as "W3C Format" ("-i:w3c")
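The two filters above can also be combined with AND. For example, the following query returns only the rejected requests that came from a particular client IP address (again, the address is just a placeholder):
logparser.exe "SELECT * FROM urlscan.*.log WHERE c-ip='192.168.1.1' AND x-action='Rejected'" -i:w3c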
Filtering based on Uniqueness
You can also filter the amount of data that you see by specifying which fields to return and grouping the data based on uniqueness. This is done by adding the DISTINCT keyword, as shown in the following syntax:
logparser "SELECT DISTINCT c-ip FROM urlscan.*.log" -i:w3c
Here's what this query does:
- Selects only the unique client IP addresses ("SELECT DISTINCT c-ip") from all records in all log files ("FROM urlscan.*.log")
- Specifies the input file type as "W3C Format" ("-i:w3c")
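As far as I know, LogParser also accepts DISTINCT inside the COUNT aggregate, so a small variation on this query reports how many unique client IP addresses appear in your logs; the c-unique-clients alias is simply a name I made up:
logparser.exe "SELECT COUNT(DISTINCT c-ip) AS c-unique-clients FROM urlscan.*.log" -i:w3c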
Grouping, Tabulating, and Sorting Queries
Counting Requests by Fields
Another great feature of SQL-style queries is the ability to tabulate data. The following example shows how to retrieve a count of requests based on the client IP address:
logparser.exe "SELECT c-ip, COUNT(*) FROM urlscan.*.log GROUP BY c-ip" -i:w3c
Here's what this query does:
- Selects only the client IP address and request count ("SELECT c-ip, COUNT(*)") from all records in all log files ("FROM urlscan.*.log")
- Groups the request count based on the client IP address ("GROUP BY c-ip")
- Specifies the input file type as "W3C Format" ("-i:w3c")
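If you only care about the noisier clients, the grouped counts can be filtered with a HAVING clause; the threshold of 10 requests below is arbitrary, and this assumes your copy of LogParser handles HAVING the same way mine does:
logparser.exe "SELECT c-ip, COUNT(*) FROM urlscan.*.log GROUP BY c-ip HAVING COUNT(*) > 10" -i:w3c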
Sorting Data by Fields
You can reorder your results into something more useful using the ORDER BY clause, as illustrated in the following example:
logparser.exe "SELECT x-action,x-reason,c-ip,cs-uri FROM urlscan.*.log ORDER BY x-action,x-reason" -i:w3c
Here's what this query does:
- Selects four specific fields ("SELECT x-action,x-reason,c-ip,cs-uri") from all records in all log files ("FROM urlscan.*.log")
- Sorts the data in ascending order based on the UrlScan action and reason for the action ("ORDER BY x-action,x-reason")
- Specifies the input file type as "W3C Format" ("-i:w3c")
Putting it All Together
Counting Requests by Client IP and Sorting Data by Request Count
Starting with the request count query from a previous example, you can rename the request count field to something more useful using the AS clause, as illustrated in the following example:
logparser.exe "SELECT c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip" -i:w3c
Having a named alias for the count makes it easy to reorder the data based on that count, as shown in the following example:
logparser.exe "SELECT c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip ORDER BY c-request-count DESC" -i:w3c
Here's what this query does:
- Selects only the client IP address and request count ("SELECT c-ip, COUNT(*)") from all records in all log files ("FROM urlscan.*.log")
- Groups the request count based on the client IP address ("GROUP BY c-ip")
- Sorts the data in descending order based on the request count ("ORDER BY c-request-count DESC")
- Specifies the input file type as "W3C Format" ("-i:w3c")
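When the full list is longer than you need, LogParser's TOP keyword limits the number of rows returned. For example, the following variation shows only the ten busiest client IP addresses (the number 10 is arbitrary):
logparser.exe "SELECT TOP 10 c-ip, COUNT(*) AS c-request-count FROM urlscan.*.log GROUP BY c-ip ORDER BY c-request-count DESC" -i:w3c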
Counting Rejected Requests based on the Reason for Rejection
While somewhat similar to the previous example, the following syntax will filter the results based on rejections and count the number of times that each reason occurs:
logparser.exe "SELECT x-reason, COUNT(*) AS c-request-count FROM urlscan.*.log WHERE x-action='Rejected' GROUP BY x-reason ORDER BY c-request-count DESC" -i:w3c
Here's what this query does:
- Selects only the reason and request count ("SELECT x-reason, COUNT(*)") from all records in all log files ("FROM urlscan.*.log")
- Filters the results based on rejections ("WHERE x-action='Rejected'")
- Groups the request count based on the reason ("GROUP BY x-reason")
- Sorts the data in descending order based on the request count ("ORDER BY c-request-count DESC")
- Specifies the input file type as "W3C Format" ("-i:w3c")
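A slight variation on the same query, grouping on cs-uri instead of x-reason, shows which URLs were rejected most often. This one is my own addition rather than part of the original walkthrough, but it uses only the fields described earlier in this post:
logparser.exe "SELECT cs-uri, COUNT(*) AS c-request-count FROM urlscan.*.log WHERE x-action='Rejected' GROUP BY cs-uri ORDER BY c-request-count DESC" -i:w3c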
Splitting Requests into Separate Log Files based on Site ID
UrlScan writes all log entries to a common log file, but you may want to see your data separated by site. The following example splits your UrlScan logs into separate files by site by creating unique log files that are named for each site ID. This example is a little more complex than previous examples, and is written as a batch file.
logparser.exe "SELECT DISTINCT s-siteid INTO siteids.tsv FROM urlscan.*.log" -i:W3C -o:tsv -headers:off
for /f %%a in (siteids.tsv) do (
logparser.exe "SELECT * INTO W3SVC%%a.log FROM urlscan.*.log WHERE s-siteid='%%a' ORDER BY date,time" -i:W3C -o:w3c
)
Here's what this batch file does:
- Creates a tab-separated file that contains a list of unique site IDs
- Creates a loop to step through the tab-separated file containing the site IDs
- Selects all fields for the specific site ID and creates a new W3C format file for each site
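To try it, save the commands above to a batch file in your UrlScan Logs folder (any name will do; something like SplitLogs.cmd works) and run it from a command prompt. Keep in mind that %%a is the batch-file form of the loop variable; if you paste the for command directly into a command prompt instead, use %a.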