Statistical Indexes

SQL Server 2000 and 2005 both use a "cost-based" optimizer to quickly locate data in its databases. 

Indexes are actually internal tables, and work like the index in a book – it contains a list of the terms you've indicated should be tracked and lists where those terms can physically be found. If the index is current, and if the data is large, the SQL Server engine makes a determination to use the index — sometimes. At times it's actually quicker for the engine to read all of the rows in a table to find the one it's looking for (called a scan) or take some other path to the data.

The "cost" in the cost-based optimizer is based on many factors. One factor is a set of information that is actually another set of indexes. These indexes are called "statistics." The system can generate these indexes, so you don't have to specify the columns that it uses, although you can. You can specify when the statistics are created, and you can specify when they are updated.

By default, SQL Server automatically creates and updates statistics for you. If you leave these setting on, the system will create an index whenever you perform certain operations against tables. You can change this behavior to a manual setting if the creation and updates on these indexes becomes a performance issue. Whenever the system creates or updates the indexes, it takes time, memory, storage and other resources. Most of the time the performance you gain by having these statistics current outweighs the cost of letting the system create and maintain them.

But sometimes it's more advantageous to take control of the statistics yourself. In a moment I'll show you how to find out if the statistics are being used, and when you should turn them off and on.

Automatic and Manual Statistics

First, let's take a look at how you can determine the setting for the statistics. They are either created manually or automatically, and they can be updated manually or automatically. In other words, you have several combinations you can use — you can have the system automatically create the statistics and you can maintain them yourself, you can create the statistics and have the system maintain them, or you can do it all manually or let the system decide. As I mentioned, normally it's best to let the system decide on both choices, unless you determine that it's becoming a system impact.

So what kind of information does SQL Server use to determine when the statistics need to be updated? Many types of data are used to make the decision to create a statistic index or to update it:

  • The last time that the statistics were updated
  • The average length of keys in a particular column
  • The number of rows and pages occupied by a table’s data
  • A histogram that displays the distribution of data
  • String summaries created when performing LIKE queries on character data
  • Index or key columns in your tables

You can see that you'll have to think long and hard about deciding to take this on yourself.

You can check to see what the setting for your database is turned on or off by right-clicking the database in either Enterprise Manager (SQL Server 2000) or SQL Server Management Studio (SQL Server 2005) and selecting "Properties" from the menu that appears, in the "Options" area. A simple click there turns the creation and the maintenance on and off.

You can also use a query to find the same information using the sp_dboption stored procedure. This sample query shows the setting for the AdventureWorks database in a SQL Server 2005 instance:

/* Show whether statistics are enabled for a database */ 
EXEC sp_dboption AdventureWorks,’auto update statistics’; 
GO 
You can use that same stored procedure to turn the automatic setting on (True) or off (False):
/* Turn on automatic statistics */ 
EXEC sp_dboption AdventureWorks, ’auto update statistics’, ’TRUE’; 
GO 

There's another stored procedure, called sp_autostats that also controls the statistics. It just needs the database name and the state you want:

/* Turn on automatic statistics - use OFF to turn them off*/ 
EXEC sp_autostats AdventureWorks, ’ON’; 
GO 

If you turn that setting off, you can create the statistics manually with the CREATE STATISTICScommand. There several options with this command, which you can read more about in Books Online. The primary parts are the table name for the index, the columns to include in the index, and importantly, how many rows to sample for the index, using a PERCENT or FULLSCAN option. This last part is important because if it is a really big table, this can take some time. If you're doing this manually, then the odds are that you did so for performance reason.

There's a tradeoff here, just like many performance tuning decisions. If you don't sample enough rows, the statistical index won't be as useful. If you sample more rows, or all of them using theFULLSCAN option, it can impact performance again.

Here's the general syntax — as I mentioned, you can find the full syntax in Books Online:

CREATE STATISTICS statistical index name 
ON table or view ( column 1, column 2, etc. ) 
 WITH  FULLSCAN or SAMPLE number of PERCENT or number of ROWS 

Once you create the statistics, you can also manually update them. Remember, you can have SQL Server automatically update statistics, even if you created them manually.

/* Update statistics on a table – replace tablename with your table's name */
UPDATE STATISTICS tablename;
GO
/* or you can update statistics on all tables */
EXEC sp_updatestats;
GO

Statistics Meta Data

Now that you know how the statistics are created, you can start finding out about them. You can use the sp_helpstats stored procedure. It takes one parameter — the name of the object the statistics are on:

USE AdventureWorks;
GO
EXEC sp_helpstats [Sales.Customer];
GO

That will get you the statistics for an object. But what if you don't know the name? Well, you can go at the system tables for the names that Microsoft uses that start with an underscore character. Unfortunately, not all statistics are named this way:

/* Show the statistics by table – 2000, if named with an underscore */ 
SELECT a.name, b.name 
  FROM sysindexes a 
      INNER JOIN sysobjects b 
      ON a.id = b.id 
  WHERE a.name LIKE ’[_]%’ 
  ORDER BY b.name;
GO 

For 2005, it's a bit simpler, and has :

/* Show statistics from SQL Server 2005 */
SELECT a.name AS ’Statistic’, b.name AS ’Table’
  FROM sys.stats a 
      INNER JOIN sys.sysobjects b 
      ON a.object_id = b.id 
	WHERE a.auto_created = 1 or a.user_created = 1
  ORDER BY b.name;
GO 
/* This stored procedure shows statistics per table */ 
EXEC sp_helpstats ’Production.Product’; 
GO

Evaluating Statistics

Are these statistics even being used? You can find out in two places: in the Query Analyzer or SQL Server Management Studio Query windows using the Showplan tool that I described here. A red line on a query output will show you that the statistics are out of date and you can right-click that line to update them.

Also, SQL Server Profiler can show you that Hash warnings or that a column is missing statistics. Once you use one of these two tools you can find more information about them using these scripts:

/* Find out when statistics were last updated for SQL 2K5 */
SELECT name AS ’Statistic’, 
STATS_DATE(object_id, index_id) AS ’Last Updated’
FROM sys.indexes 
WHERE object_id = OBJECT_ID(’Production.Product’);
GO
 -- Can use this for 2K
DBCC SHOW_STATISTICS (’Person.Address’, AK_Address_rowguid);
GO

The upshot is this — it's often best just to leave the statistics automatically set and maintained by SQL Server. If you do change this setting, use these techniques to work with them.

InformIT Articles and Sample Chapters

If you're using SQL Server version 7, the statistics work in a similar way, but you maintain them differently. You can read more here.

Online Resources

If you really want to know more, go here.

posted @ 2010-05-13 11:53  programming snail  阅读(369)  评论(0编辑  收藏  举报