
OLAP Distinct Counts and Performance Analysis

Using Microsoft Analysis Services 2000, SQL Server Accelerator for Business Intelligence schema, and a multiprocessor Unisys ES7000 Server

Sanjay Soni

March 2003

Applies to:
    Microsoft® SQL Server™ 2000

Summary: Learn techniques to answer useful Distinct Count business questions using the Sales and Marketing scenario, and learn about the performance characteristics of Analysis Services 2000 when dealing with Distinct Counts on a multiprocessor ES7000 server. (28 printed pages)

Contents

Introduction
Distinct Count Analysis
Microsoft Solution Offerings: SSABI Sales and Marketing
Building DC Cubes Using SSABI Schema
Performance Study Results
Query Performance
Measuring MDX Query Response Times (QRT) for Cold/Warm Cache with Up to 500 Concurrent Users
Performance Optimization Tips
Conclusion
Appendix A
Appendix B
Appendix C

Introduction

This white paper explains Distinct Counts (DC) by giving you insight into a Sales and Marketing Business Analytics scenario using the Microsoft® SQL Server™ Accelerator for Business Intelligence (SSABI) Sales and Marketing schema. It shows cube design techniques used to solve useful business questions, and then focuses on a performance study of Distinct Counts using Microsoft Analysis Services 2000 and a Unisys ES7000 server.

This white paper is organized in two parts. In the first half of the paper, you are given a brief introduction to the Distinct Counts problem in online analytical processing (OLAP). You are shown how to extend the standard SSABI Sales and Marketing schema to answer more advanced business questions using Distinct Counts. More importantly, this first section serves as the groundwork and explains the series of steps that were followed to prepare OLAP cubes for the performance study in the second part of the white paper.

The second half of the white paper focuses on the performance study of OLAP cubes involving Distinct Counts. It uses the cube design from the first part of the paper and shares results of performance tests that were conducted on cubes involving Distinct Counts. The study used a 16-processor partition on a 32-way Unisys ES7000 Server to show the scalability of Analysis Services. This white paper also includes optimization techniques for getting optimal performance when working with Distinct Count OLAP cubes.

Analysis Services 2000 can aggregate millions of rows of data to deliver quick results by categorizing data into dimensions that are apparent from the user interface. Analysis Services 2000 also includes very powerful analysis tools and techniques to enhance large data warehouse environments, such as partitioning of cubes and compression of data when storing relational data in multidimensional format.

More advanced and challenging analysis capabilities of Analysis Services 2000 are not obvious from the user interface and may require more user expertise to take full advantage of them. One such technique is Distinct Counts in OLAP. Distinct Count analysis is one of the most popular types of analyses and one of the toughest problems for an OLAP system.

When working with Distinct Counts, OLAP performance considerations consist of three things that need to be balanced:

·                     Cube processing times

·                     Space used for the cubes

·                     Multidimensional Expressions (MDX) query execution times

The factors that influence the balance among cube processing times, space used for cubes, and MDX query execution times are:

·                     Number of concurrent users querying the cubes

·                     Aggregation levels

·                     Parallel processing and hardware used

Using a Distinct Count scenario, MOLAP (Multidimensional OLAP storage mode), and parallel processing of cube partitions, you can vary different aggregation levels, concurrent users querying the cubes, and hardware.

The following were measured:

·                     The processing times for a cube

·                     A comparison of disk space requirements

·                     A comparison of average MDX query response times with varying (up to 500) concurrent users

·                     A comparison of average MDX query response times with warm and cold caches

·                     A comparison of average MDX query response times on an 8-processor server versus a 16-processor partition on an ES7000 Server

·                     A comparison of average MDX query response times with varying aggregation levels for 250 concurrent users

Many optimization tips for Distinct Count cubes are included in this white paper. The shared results show very good concurrent-user query performance on the ES7000 Server compared with an 8-processor system.

The tests and results come from a set of business questions that originate from the Sales and Marketing Business Intelligence (SMBI) Sales cube scenario. Results from this study can be easily applied to any other cube that involves Distinct Counts.

Conclusions and recommendations are drawn using some of the preceding factors for processing a large data set.

Test Setup for Distinct Count Performance

Some of the material in this section of the OLAP Distinct Counts and Performance Analysis white paper was taken from another white paper, written by Amir Netz and entitled, "Distinct Counts Using OLAP Services 7.0." However, some of the content in this paper has been modified as it applies to Analysis Services 2000 and to the SSABI example.

Distinct Count Analysis

Distinct Count Analysis is one of the most popular types of analyses, and yet, it is also one of the toughest problems for an OLAP system. It is often referred to as the many-to-many problem, because it involves analyzing relationships between entities that have many-to-many relationships.

The following represents typical usage for Distinct Count analysis.

·                     Sales and marketing: counting distinct numbers of customers and numbers of Sales Representatives, and so on.

·                     Insurance claims: one claim may have many damages.

These examples help to explain the Sales and Marketing scenario. For example, suppose you have a cube that analyzes sales transactions. The cube has dimensions that describe the customer's geography, education, income level, and gender; the product's category, model, and size; time; and sales representatives. The measures include information about sales amounts, quantities, and discounts.

One of the most common OLAP questions is, "How many customers bought a specific product in a particular time period?" A more general question is, "How many customers are buying each product?"

Although this last question seems simple, it is not. A regular COUNT aggregate function for the measure will not provide correct results, because double counting may occur: if a single customer buys a product more than once, a regular COUNT counts that customer twice. To get correct results, each customer needs to be counted only once. This is the classic Distinct Count problem, and it requires a fairly complex resolution in the online analytical processing (OLAP) environment.
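The double-counting issue is easy to demonstrate on a small, hypothetical transaction list (the customer IDs and product names below are illustrative, not from the SSABI schema):

```python
# Hypothetical sales transactions: (customer_id, product) pairs.
# Customer 1 bought a laptop twice; a plain COUNT double-counts that customer.
transactions = [
    (1, "Laptop"),
    (1, "Laptop"),   # repeat purchase by the same customer
    (2, "Laptop"),
    (2, "PC"),
    (3, "PC"),
]

laptop_rows = [t for t in transactions if t[1] == "Laptop"]

# Regular COUNT: counts every transaction row for the product.
count = len(laptop_rows)                                 # 3 rows

# DISTINCT COUNT: counts each customer at most once.
distinct_count = len({cust for cust, _ in laptop_rows})  # 2 customers

print(count, distinct_count)
```

Three laptop transactions involve only two distinct customers, which is exactly the gap between the Total Sales Count and Distinct Customers Count columns in the query below.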

Consider the following MDX query on a sample Sales cube:

SELECT
  { [Total Sales Count], [Distinct Customers Count] } ON COLUMNS,
  Products.Members ON ROWS
FROM Sales

The query above returns the following results:

Table 1. Query Results

Products            Total sales count    Distinct customers count
All Products        8000                 200
  Computers         3300                 80
    Laptop          2000                 70
    PC              800                  60
    Workstations    500                  30
  Peripherals       4700                 150
    Monitors        1500                 100
    Printers        2500                 100
    Games           700                  80

Understanding the Problem

In the Total Sales Count column, the numbers add up to subtotals and then to totals. This is the expected behavior of a SUM measure. However, in the Distinct Customers Count column, the numbers do not add up.

When looking at the Distinct Customers Count column, you see that 70 customers bought laptops, 60 customers bought PCs, and 30 customers bought workstations. However, the total number of customers who bought computers is not 70+60+30, or 160; the query results display an actual count of 80 total computer customers. The irregularity arises because many customers bought more than one product. A few customers bought both laptops and PCs, while others bought the whole three-piece package (laptop, PC, and workstation), and still others bought just one item, such as a laptop. The end result is that there is no way to infer directly from the lower-level results what the customer subtotal really is. This discrepancy continues through the upper levels as well: 80 customers bought computers, 150 bought peripherals, and all together all products totaled only 200 customers.
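The behavior in Table 1 follows from set arithmetic: a parent's distinct customer count is the size of the union of its children's customer sets, not the sum of their sizes. A small hypothetical illustration (the customer IDs are made up):

```python
# Hypothetical customer sets per child product of "Computers".
laptop_customers      = {1, 2, 3, 4}
pc_customers          = {3, 4, 5}
workstation_customers = {1, 5, 6}

# Summing the children's distinct counts double-counts overlapping customers...
summed = (len(laptop_customers) + len(pc_customers)
          + len(workstation_customers))                # 4 + 3 + 3 = 10

# ...while the parent's true distinct count is the size of the union.
computers_customers = laptop_customers | pc_customers | workstation_customers
actual = len(computers_customers)                      # |{1,2,3,4,5,6}| = 6

print(summed, actual)
```

Because the union cannot be recovered from the children's counts alone, distinct counts are non-additive, which is exactly why roll-ups fail in the bullets that follow.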

These kinds of irregularities pose challenges for OLAP systems. Non-additive measures create the following problems on a typical OLAP system:

·                     Roll-ups are not possible. When pre-calculating results during cube processing, the system cannot deduce summaries from other summaries. All results must be calculated from the detail data, which places a heavy burden on processing time.

·                     All results must be pre-calculated. With non-additive measures, there is no way to deduce the result for a higher-level summary query from pre-calculated aggregations; if a result is not pre-calculated in advance, it is not available. This also makes it impossible to perform and maintain incremental updates: a single transaction added to the cube usually invalidates huge portions of previously pre-calculated results, and a complete recalculation is needed to recover.

Distinct Counts in Analysis Services 2000

Standard aggregate functions available with OLAP Services 7.0 included Sum, Min, Max, and Count. In the Analysis Services 2000 cube editor, another base aggregate function, Distinct Count, has been added to the standard functions. This makes a big difference in simplifying the implementation of Distinct Counts, because you no longer have to write complex MDX queries as you did in OLAP Services 7.0; the performance burden shifts from querying to processing. In many businesses the number of customers can be very large, and even with these optimizations, Distinct Counts can be slower than additive measures and can be processor intensive. Later in this document, techniques to optimize Distinct Count performance are discussed.

Microsoft Solution Offerings: SSABI Sales and Marketing

Microsoft developed the Microsoft SQL Server Accelerator for Business Intelligence (BI) to simplify the process of building an analytical application in a variety of business scenarios. One such offering is Sales and Marketing Analytics, a BI solution for a Sales and Marketing scenario. Building a BI solution that delivers insight into product sales, marketing campaigns, and customer behavior is often a difficult, costly, and time-consuming undertaking, whether you create a custom solution or use a packaged BI solution specific to an industry.

The Microsoft SQL Server Accelerator for BI consists of the following:

·                     Rapid deployment tools (the BI accelerator software), which you can use to create a customized analytical application based on a customized data model.

·                     Reference data models addressing BI analytics.

·                     Pre-configured client views that are based on templates and generated by client generators.

·                     A data-processing and management application (the analytical application extraction, transformation, and loading process) consisting of Data Transformation Services (DTS) packages, meta data tables, and stored procedures that moves the initial and incremental loads of data within the analytical application and processes the multidimensional cubes within the Analysis database.

·                     Prescriptive Architecture Guides (PAG) that assist you in developing, deploying, operating, and maintaining the analytical application.

Overall, the BI accelerator contains the Analytics builder workbook, Analysis Database, SQL Server subject matter database, DTS packages and client views. However, the cubes and the schema included with the BI accelerator do not support Distinct Count cubes by default. In this white paper, you are shown how to modify some of those cubes to answer queries that use Distinct Counts.

Building DC Cubes Using SSABI Schema

This section and the sections that follow describe how the Distinct Count cubes were built for performance testing using BI accelerator sales and marketing data. The following steps were used to modify the SSABI schema:

·                     Setting up the test configuration

·                     Identifying common business questions

·                     Identifying existing information

·                     Building OLAP cubes including Distinct Count measures

Although the source SQL Server database had more than 1 TB of data, only a portion (a few GB) of that data was used as source data for the partitioned Sales cube. This Sales cube had sales transaction information spanning three years and 2.5 million customers; for this study, each cube covered three months of data in three partitions.

Test System Configuration

The hardware used for this set of experiments is shown in Figure 1:

Figure 1. System configuration

As shown in Figure 1, a 16-processor partition on a 32-processor Unisys ES7000 Server was used for this study.

Other ES7000 details are as follows:

·                     16 Intel Xeon 900 MHz CPUs with 2-MB L2 cache

·                     16 GB of RAM

·                     9 PCI buses

·                     8 Emulex LP8000 PCI cards

Unisys ESM7800 Fibre Channel data storage consists of the following:

·                     RAID 0 disk arrays

·                     Network: 100-MB network

Software Installed on the ES7000 partition:

·                     Windows 2000 Data Center Edition with service pack 2 (SP2)

·                     SQL Server 2000 and Analysis Services 2000 Enterprise Edition with SP2.

As shown in Figure 1, SQL Server and Analysis Services were housed in the same partition.

Two-processor client machines were used to issue the MDX queries.

Defining Business Questions

Table 2 lists questions that involve Distinct Counts. These questions are representative of typical questions a Sales and Marketing analyst may need to answer. These questions served as a starting point for the Distinct Counts study.

Table 2. Sales and Marketing Questions

Query number   Question
1              How many distinct customers bought product X in year 2000?
2              How many distinct products in category X were sold in each of the regions last year?
3              How many distinct sales people sold product X in particular months?
4              How many distinct customers shopped at various stores in a particular quarter?
5              What is the average sales amount per distinct customer for various years?

Choosing a Dimensional Schema from SSABI

The preceding questions help to identify the appropriate source information in the existing BI accelerator Sales and Marketing schema. The relational tables in the Sales snowflake schema were used to identify the information needed to answer the questions. Here is a list of some of the dimension tables:

·                     Product dimension tables contain information about products sold.

·                     Sales Force representatives dimension tables contain information about the sales force.

·                     Time dimension tables contain information about time periods.

·                     Customer dimension tables contain information about customers and the industries they belong to.

·                     Geography dimension tables contain information about customers as well as direct marketing areas.

·                     Customer status dimension tables contain information about customer status, for example, whether a customer is active or not.

The Fact table records the sales transaction fact information for various dimensions. Figure 2 shows part of that Snowflake schema design.

Figure 2. Snowflake schema with facts and dimensions

Details of all the individual relational database management system (RDBMS) tables involved in the schema are shown in Appendix A.

Building OLAP Cube(s) with DC Measures from Default Schema

The snowflake schema shown above is used by the Sales cube included with the Accelerator for BI, and the Sales cube was used to answer the specific business questions. Because the Sales cube does not support Distinct Counts by default, the cube schema was modified to support them.

Dimensions Summary: Eight dimensions were included in the cubes, based on the dimension tables in the dimensional data mart. The following table is a quick summary of the OLAP dimensions and the RDBMS tables on which they are based.

OLAP dimensions           RDBMS tables on which dimensions are built                                                       Total size of RDBMS tables (MB)
CustomerSimple2           Dim_Cust_Direct_Cust, Dim_Cust_Direct_Industry                                                   999.7
TimeSimple2               Dim_Time_Day, Dim_Time_Std_Mon, Dim_Time_Std_Qtr, Dim_Time_Std_Year                              1.05
Sales ForceSimple2        Dim_SF_Std_Rep, Dim_SF_Std_SFArea, Dim_SF_Std_SFRegion                                           2.47
Sales Rep StatusSimple2   Dim_SF_Std_Rep, Dim_SF_Std_SFArea, Dim_SF_Std_SFRegion                                           2.47
Customer StatusSimple2    Dim_CustStat_Std_Active, Dim_CustStat_Std_Range                                                  0.09
GeographyMktgSimple2      Dim_Geog_Mktg_DMA, Dim_Geog_Mktg_DMR, Dim_Geog_Std_Zip                                           7.96
GeographyStdSimple2       Dim_Geog_Std_City, Dim_Geog_Std_Country, Dim_Geog_Std_Region, Dim_Geog_Std_State, Dim_Geog_Std_Zip   12.46
ProductSimple2            Dim_Prod_Std_ProdGrp, Dim_Prod_Std_ProdItem, Dim_Prod_Std_ProdLine                               0.56

Measures Summary: Table 3 describes a subset of measures that were chosen in various cubes.

Table 3. Subset Measures

Fact measure                Simple description
Actual Net Sales Amount     Actual net sales amount for the transaction.
Distinct Customer Count     Distinct Count of customers for various dimensions.
Distinct Product Count      Distinct Count of products sold for various dimensions.
Distinct Sales Reps Count   Distinct Count of Sales Representatives involved for various dimensions.
Actual Sales Units          Actual sales units involved in the transaction.

Figure 3 shows the design of one of the cubes modified for Distinct Counts.

Figure 3. Design of one of the cubes (click thumbnail for larger image)

The following steps were used to implement Distinct Counts using the SSABI cubes, and to conduct the performance study:

1.                  Create a new OLAP database from the default OLAP database provided with the offering.

2.                  Copy and paste only the Sales cube and the shared dimensions used by the Sales cube into that OLAP database. (The Sales cube is called AllMSODDDB.)

3.                  Modify all shared dimensions by removing custom rollups used by various levels of the dimensions. This step is necessary because you cannot define a Distinct Count measure in a cube that contains a dimension with a custom rollup.

4.                  Add a Distinct Count measure in the Sales cube for the customer ID (cust_direct_cust_Idx) field in the Fact table. This measure is used to calculate the Distinct Count of customers. We called this modified cube SaleswithDistinctCustomersM15; M15 denotes that it is a MOLAP cube with 15 percent aggregations. It is shown in Figure 4.

Figure 4. Added distinct count measure to the cube design

5.                  Create another cube, SaleswithDistinctSalesRepsM15, that has a Distinct Count measure in the Sales cube for the Sales Representative ID (SF_std_rep_idx) field in the Fact table. This measure is used later to calculate the Distinct Count of Sales Representatives.

6.                  Create another cube, SaleswithDistinctProductsM15, that has a Distinct Count measure in the Sales cube for the product ID (prod_std_Proditem_idx) field in the Fact table. This measure is used later to calculate the Distinct Count of Products.

7.                  Create a virtual cube called SaleswithDistinctCountsM15VC consisting of all the measures from the three cubes created in the preceding steps and all the shared dimensions.

8.                  Repeat steps 4 through 7 for 30 percent and then 60 percent aggregation levels. For each aggregation level, there is one virtual cube and three base cubes, each with one Distinct Count measure for counting distinct customers, distinct products purchased, and distinct sales representatives involved.

9.                  All end users query only the virtual cubes.

The following is a summary of the cubes built:

Note   Although the cubes have identical structure, their aggregation levels are different. Each cube had three partitions, for March 1998, March 1999, and March 2000. Usage-based optimization (UBO) was not used in this study due to insufficient time for performance testing; it is recommended that you use UBO for good query performance with Distinct Count cubes.

Table 4. Cube Summary

Cube name                       Aggregation (%)   Virtual cube?   Distinct count measure
SaleswithDistinctCustomersM15   15                No              Distinct Count of customers
SaleswithDistinctProductsM15    15                No              Distinct Count of products
SaleswithDistinctSalesRepsM15   15                No              Distinct Count of sales representatives
SaleswithDistinctCountsM15VC    15                Yes             Distinct Count of customers, products, and sales representatives
SaleswithDistinctCustomersM30   30                No              Distinct Count of customers
SaleswithDistinctProductsM30    30                No              Distinct Count of products
SaleswithDistinctSalesRepsM30   30                No              Distinct Count of sales representatives
SaleswithDistinctCountsM30VC    30                Yes             Distinct Count of customers, products, and sales representatives
SaleswithDistinctCustomersM60   60                No              Distinct Count of customers
SaleswithDistinctProductsM60    60                No              Distinct Count of products
SaleswithDistinctSalesRepsM60   60                No              Distinct Count of sales representatives
SaleswithDistinctCountsM60VC    60                Yes             Distinct Count of customers, products, and sales representatives

Note   Each cube had three partitions based on three time periods: March 1998, March 1999, and March 2000. For each partition, the corresponding Fact table had around 2 million rows, for 6 million rows overall in the Fact table for each cube. One large dimension included here was the Customer dimension, with 2.5 million members.

Browsing OLAP Cube(s) Using ProClarity®

Once the cubes were processed, the ProClarity cube-browsing tool was used to produce the report shown in Figure 5 for business question 1.

Figure 5. Sample report for business question 1 using processed cubes

Performance Study Results

This second part of the paper focuses on the performance results obtained with Distinct Count cubes.

Processing Performance

The following graphs show results for aggregation levels of 15, 30, and 60 percent. Note that the aggregation percentage represents the expected improvement in query performance compared to having no pre-calculated aggregations.

Processing times for various aggregation levels

The following results were obtained when identically structured cubes, using different data storage modes and aggregation levels, were processed.

The following figure shows processing times for various aggregation levels:

Figure 6. Processing times for cubes

The data in Figure 6 shows that:

·                     At 15 percent aggregation, it took 20.45 minutes to process the three partitions in parallel.

·                     As the aggregation level increases, it takes more time to process the cubes. As shown in the graph above, processing took almost an hour at a 30 percent aggregation level, and almost 2.5 hours at a 60 percent aggregation level.

·                     We also tested the processing time for a cube with a 90 percent aggregation level, which took approximately 13 hours; a thorough study of all the partitions at that aggregation level was not done. With Distinct Count cubes, the right approach is to keep the aggregation level low, as shown in the preceding figure.

·                     Overall, processing the three OLAP partitions in parallel on the multiprocessor ES7000 server saved time; processing the partitions sequentially takes much longer.

Disk Space Requirements

The following illustration shows the space requirements (in GB) for varying aggregation levels.

Figure 7. Disk space required for cubes

The data in Figure 7 shows that:

·                     As the aggregation level increases, more space is needed to store the data in OLAP cubes.

·                     The space required increased from 6 GB to almost 17 GB as we increased the aggregation level from 15 to 60 percent. Space is definitely a consideration when deciding on an aggregation level.

Query Performance

All queries used for this performance testing involved Distinct Counts.

Developing MDX Queries

In this step, we developed MDX queries for our set of business questions. An example for question 1 is shown here.

The following MDX query returns the number of distinct customers that bought product X in the year 2000:

SELECT { [Time].[Simple].[Year Name].&[1/1/2000] } ON COLUMNS,
  { [ProductStandardSimple].[Product Group].&[2].CHILDREN } ON ROWS
FROM [Distinct Sales Rep and Customers]
WHERE ( [Measures].[Distinct Customer Count] )

Figure 8. Results of MDX queries

MDX Queries Setup

MDX queries were written for the five business questions listed in Table 2; these business scenarios were chosen as a starting point. Several different and fairly complex MDX queries were generated for the MDX query testing. Standard MDX query templates were created to generate random, concurrent, unique queries against the virtual OLAP cubes (for example, SaleswithDistinctCountsM15VC). All queries involved at least one Distinct Count measure in addition to other dimensions and measures. The queries represented typical Distinct Count analytical queries that would be generated against analysis cubes.
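A template-driven generator along the following lines could produce such randomized queries. This is a sketch only; the template text, member keys, and value ranges are illustrative assumptions, since the study's actual test harness is not described in detail:

```python
import random

# Illustrative MDX template; {year} and {group} are substitution slots.
# Doubled braces emit the literal { } that MDX set syntax requires.
TEMPLATE = (
    "SELECT {{ [Time].[Simple].[Year Name].&[{year}] }} ON COLUMNS, "
    "{{ [ProductStandardSimple].[Product Group].&[{group}].CHILDREN }} ON ROWS "
    "FROM [SaleswithDistinctCountsM15VC] "
    "WHERE ( [Measures].[Distinct Customer Count] )"
)

def make_query(rng: random.Random) -> str:
    """Fill the template with randomly chosen member keys."""
    year = rng.choice(["1/1/1998", "1/1/1999", "1/1/2000"])  # the three partitions
    group = rng.randint(1, 5)                                # hypothetical group keys
    return TEMPLATE.format(year=year, group=group)

rng = random.Random(42)  # seeded so each simulated user gets a reproducible stream
queries = [make_query(rng) for _ in range(3)]
for q in queries:
    print(q)
```

Each simulated user would draw from such a stream, giving random, unique, yet structurally realistic Distinct Count queries.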

Testing scenario

·                     Systems were rebooted before the query test. For concurrent user testing, three simulation groups were used: 100 users, 250 users, and 500 users. All MDX queries were launched concurrently.

·                     On the ES7000, eight processors were dedicated to Analysis Services and four processors were dedicated to SQL Server. The remaining four processors were shared between SQL Server and Analysis Services.

·                     For each test, response time measures were collected for the first 100 users in each test group.
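The concurrent-launch scenario above can be sketched as a minimal harness. The `run_query` body here is a hypothetical stand-in for the real round trip to the Analysis server, and the thread-per-user design is an assumption about how such a load generator might work:

```python
import threading
import time

def run_query(query_id: int, results: dict) -> None:
    """Stand-in for executing one MDX query; records the elapsed time."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for the actual round trip to the server
    results[query_id] = time.perf_counter() - start

def launch_concurrent(n_users: int) -> dict:
    """Launch one query per simulated user, all at once, and wait for all."""
    results: dict = {}
    threads = [threading.Thread(target=run_query, args=(i, results))
               for i in range(n_users)]
    for t in threads:
        t.start()          # all MDX queries launched concurrently
    for t in threads:
        t.join()
    return results

# Collect response times for the first 100 users, as in the study.
timings = launch_concurrent(100)
first_100 = [timings[i] for i in range(100)]
print(len(first_100))
```

A real harness would open one connection per simulated user and record per-query wall-clock times for later averaging.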

Measuring MDX Query Response Times (QRT) for Cold/Warm Cache with Up to 500 Concurrent Users

The following section shows query response results. Average query response times were measured for various concurrent users for varying aggregation levels and for cold and warm cache. Times were recorded in seconds.

Average and Median QRT (Cold Cache/Up to 500 Concurrent Users)

For the cold cache measurement, no query results were in the cache; the Analysis server was restarted prior to the cold cache tests.

For this test, all queries ran against the SaleswithDistinctCountsM15VC cube. The cube was not especially optimized for performance (it had an aggregation level of 15 percent), and no special tuning was done for the queries.

The following graph shows the average MDX query response time as the number of concurrent users was changed from 100 to 250 to 500. Query response times were collected from the first 100 users within each test group.

Figure 9. ART for varying concurrent users in cold cache with 15 percent aggregations

The data in figure 9 shows that:

·                     Average response time (ART) of 2.1 seconds was observed when 100 concurrent users were querying the OLAP cube.

·                     Average response time of 9.6 seconds was observed when 250 concurrent users were querying the OLAP cube. Considering that all the queries were fairly complex and all of them involved Distinct Counts, this time is reasonable at the lower 15 percent aggregation level.

·                     Average response time of 10.9 seconds was observed when 500 concurrent users were querying the OLAP cube, an increase of only 13.5 percent from the 250 users tested.

Median vs. average query response times

In statistics, the median represents the middle value in a group of measurements and is a commonly used indicator of what is typical for a group. Looking at the median response times for various numbers of concurrent users, observe that for 500 concurrent users the median response time is only 0.194 seconds, compared with an average response time of 10.9 seconds.

Although the average is used more frequently than the median, the median is still an important measure of central tendency. The median is not affected by the presence of a number that is extremely high or extremely low relative to the other numbers in the group. In other words, the median is not skewed when, say, 20 percent of the queries take a very long time to return results while the other 80 percent take only a couple of seconds. Sometimes median response time makes more sense than average response time. Figure 10 shows the median and average times in the same graph.

Figure 10. Median and average response times for varying concurrent users in cold cache with 15 percent aggregations
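The skew effect described above is easy to reproduce: a handful of slow outlier queries pulls the mean up while the median stays near the typical case. The sample times below are made up for illustration, not taken from the study's measurements:

```python
from statistics import mean, median

# Eight fast queries and two slow outliers (seconds); values are illustrative.
response_times = [0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 40.0, 60.0]

avg = mean(response_times)    # pulled up by the two outliers
med = median(response_times)  # unaffected by the outliers

print(avg, med)
```

With 80 percent of queries under a third of a second, the mean lands above 10 seconds while the median stays at 0.3 seconds, mirroring the 10.9-second average versus 0.194-second median reported above.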

For more thorough query results, the AVERAGE, MEDIAN, MIN, and MAX query response times are listed in Appendix B.

The following sections show the improved results while using warm cache and increased aggregation levels.

Average QRT (Warm Cache/Up to 500 Concurrent Users)

For the warm cache measurement, query results were held in cache. Executing the cold cache test and then immediately executing the same test again produced the warm-cache testing scenario.

In this test, all queries were executed against the SaleswithDistinctCountsM15VC cube. The cube was not optimized for performance (it had an aggregation level of 15 percent), and no special tuning was done for the queries.

The following graph shows the average MDX query response time as the number of concurrent users was changed from 100 to 250, and then to 500. Query response times were collected from the first 100 users in each test group.

Figure 11. ART for varying concurrent users in warm cache with 15 percent aggregations

The data in figure 11 shows that:

·                     Average response time (ART) of 0.032 seconds (well under one second) was observed when 100, 250, or 500 concurrent users were querying the OLAP cube. Although good performance with a warm cache may seem obvious, this result is commendable, especially with 500 concurrent users querying a cube with only 15 percent aggregations. This performance was observed on an ES7000 and may vary with hardware.

·                     When similar warm-cache tests were tried using OLAP Services 7.0, the warm cache query execution times for Distinct Count queries were not in the sub-second range; timings were similar to those with a cold cache. Good warm-cache response times for these queries were observed in Analysis Services 2000.

Average QRT (Cold vs. Warm Cache/Up to 500 Concurrent Users)

In Figure 12, the results from the last two graphs have been combined into a single graph for analysis.

Figure 12. ART for varying concurrent users in cold/warm cache with 15 percent aggregations

The data in figure 12 shows that:

·                     For cold cache, Average Response Time increased from 2.1 seconds to 10.9 seconds when the number of concurrent users querying the OLAP cube increased from 100 to 500. For warm cache, ART of 0.032 second was observed regardless of the number of concurrent users querying the cube.

·                     The graph shows that once the cache was warm, response times dropped below one second. You can therefore execute the most often used queries as a batch job immediately after cube processing finishes; doing so improves query performance for business users in the production environment.

Average QRT for Varying Aggregation Levels (Cold Cache/250 Concurrent Users)

Because we had observed an average response time of 9.6 seconds for 250 concurrent users in cold cache, we increased the aggregation level from 15 percent to 30 percent and then to 60 percent to see the effect of aggregations on ART. Again, for the cold cache measurement, we restarted the Analysis server prior to each test.

For this test, queries ran against the SaleswithDistinctCountsM15VC cube for the 15 percent aggregation level, against SaleswithDistinctCountsM30VC for the 30 percent level, and finally against SaleswithDistinctCountsM60VC for the 60 percent level.

Figure 13 shows the average MDX query response time as the aggregation level was increased.

Figure 13. ART for varying aggregation levels with 250 concurrent users

The data in figure 13 shows that:

·                     Average response time of 9.6 seconds was observed for 15 percent aggregation level when 250 concurrent users were querying the OLAP cube using Distinct Count queries.

·                     As the aggregation level was increased to 30 percent, ART fell to 5.7 seconds.

·                     As the aggregation level was increased to 60 percent, ART improved further to 3.8 seconds in cold cache. On average, query times on the ES7000 decreased as aggregation levels increased.

·                     When analyzing this graph, compare it with the graphs shown earlier that plot processing times and disk space used as aggregation levels increase.

·                     Instead of choosing high aggregation levels up front, you should start processing the cube with a lower aggregation level, and then increase the aggregation level in smaller increments until you get the desired query performance.

·                     The ideal method is to start with a low aggregation level, allow users to query for some time and then apply the Usage-Based Optimization Tool in Analysis Services. This will design aggregations specifically for the queries that real users have been submitting.

For the same test, figure 14 shows the median MDX query response time as the aggregation level was increased.

Figure 14. Median QRT for varying aggregation levels with 250 concurrent users

As shown in figure 14, median query response times are much better than the corresponding average response times. Complete details are shown in Appendix B.

Table 4 summarizes our observations on the effect of aggregation levels on processing times, disk space, and query times.

Table 4. Aggregation levels and their effect on processing times, disk space, and query times

Aggregation level   Cube processing time   Disk space used   Average QRT (250 users)
15%                 20.45 minutes          6.38 GB           9.6 seconds
30%                 56.80 minutes          11.45 GB          5.7 seconds
60%                 144.32 minutes         16.98 GB          3.8 seconds

As the preceding table shows, a very high aggregation level such as 60 percent may not be worth the improved query response, given the very large disk space required for the cubes and the time spent processing them. In choosing aggregation levels, balance query performance against cube processing time and disk space used.
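Plugging the Table 4 numbers into a quick calculation makes the diminishing returns explicit: each step up in aggregation level buys back less query time while costing more processing time and disk space.

```python
# (aggregation %, processing minutes, disk GB, average query seconds),
# taken directly from Table 4.
levels = [
    (15, 20.45, 6.38, 9.6),
    (30, 56.80, 11.45, 5.7),
    (60, 144.32, 16.98, 3.8),
]

# Marginal cost and benefit of each step up in aggregation level.
for (p0, t0, d0, q0), (p1, t1, d1, q1) in zip(levels, levels[1:]):
    print(f"{p0}% -> {p1}%: +{t1 - t0:.2f} min processing, "
          f"+{d1 - d0:.2f} GB disk, -{q0 - q1:.1f} s per query")
```

The 15-to-30 percent step saves 3.9 seconds per query for about 36 extra minutes of processing; the 30-to-60 percent step saves only 1.9 seconds for about 88 extra minutes, which is why the paper recommends increasing aggregations in small increments rather than starting high.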

Average QRT (Cold vs. Warm Cache with 500 Concurrent Users) for 8x vs. ES7000 Server

To compare average MDX query execution times that you may expect using a distributed commodity 8-processor server farm versus using a standalone Unisys ES7000 Server, a stress test was conducted for the 500 concurrent users.

Of the 16 processors available in the ES7000 partition, eight were dedicated to Analysis Services and four to SQL Server. The remaining four were shared between SQL Server and Analysis Services using the processor affinity feature of the ES7000.

To simulate the distributed commodity environment, three 8-processor servers (Unisys 5085) were used. Two database servers were used to hold the commodity environment SQL Server data warehouse. Using the federated database server feature within Microsoft SQL Server 2000, the data warehouse databases were linked, allowing each database in the commodity farm to transparently access data from either server. Analysis Services (hence the OLAP database) was installed on the third 8-processor server. For MDX query testing, this Analysis Server was used.

The same software environment was deployed on both the ES7000 and the commodity servers, including the same disk spindle configuration. Other details of this configuration are listed in Appendix C.

Note   The commodity environment was used only for this section of the white paper and not for any other sections.

For this cold cache test, all queries ran against the SaleswithDistinctCountsM15VC cube.

The following graph shows the average MDX query response time for 500 concurrent users in cold cache on both systems and then in warm cache on both systems. Query response times were collected from the first 100 users in each test group.

Figure 15. ART for 16 processors versus 8 processors for 500 concurrent users

The data in Figure 15 shows:

·                     In cold cache, ART was 10.9 seconds on the ES7000 versus 37.45 seconds on the 8-way server. In warm cache, ART was 0.032 seconds on the ES7000 versus 13.58 seconds on the 8-way server.

·                     There was an obvious need for more processing power for Distinct Counts, and the ES7000 Server performed very well during this stress test with 500 concurrent users.

·                     Analysis Services could use a maximum of twelve processors on the ES7000 versus a maximum of eight on the commodity system. With just four more processors, and the additional memory available in the CMP system, query performance improved by more than 240 percent. This also shows that overall performance depends on the hardware used for the implementation, particularly in the complex, processor-intensive case of Distinct Counts.
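The 240 percent figure follows directly from the Figure 15 numbers; a quick check:

```python
# Values read from Figure 15 (seconds, 500 concurrent users).
es7000_cold, eightway_cold = 10.9, 37.45
es7000_warm, eightway_warm = 0.032, 13.58

cold_speedup = eightway_cold / es7000_cold
warm_speedup = eightway_warm / es7000_warm

print(f"cold cache: {cold_speedup:.2f}x faster "
      f"({(cold_speedup - 1) * 100:.0f}% improvement)")
# → cold cache: 3.44x faster (244% improvement)
print(f"warm cache: {warm_speedup:.0f}x faster")
# → warm cache: 424x faster
```

The warm-cache gap is even larger than the cold-cache one, reflecting the sub-second caching behavior of Analysis Services 2000 on the larger machine.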

Performance Optimization Tips

Distinct Count cubes scan millions of rows of data and impose a high computational load, which makes good performance a challenge. There is no single technique that solves the performance problem, but several tips can help. The optimization tips relevant to Distinct Counts fall into two categories: tips for improving the cube processing time and tips for improving query performance.

Tips for Improving DC Cubes Processing Performance

Besides improving the processing performance, some of these tips can also improve MDX query performance indirectly.

·                     Aggregations: Aggregation level is especially relevant for Distinct Count cubes. Start at a lower aggregation level such as 15 percent, or even with no aggregations at all. Let users query the cubes and then use the Usage-Based Optimization Wizard. Processing a Distinct Count cube with 90 percent aggregations is not recommended. Lower aggregation levels also save the disk space needed for the cubes.

·                     Parallel processing using multiprocessor server: If you have a multiprocessor server (similar to the Unisys ES7000), use parallel processing to reduce the processing time. Use partitioning wherever possible to improve processing and query performance. By doing parallel processing of partitions, you can reduce the processing time to a great extent.

The parallel processing tool is part of the SQL Server 2000 Resource Kit. Using this tool, for example, three partitions can be processed in parallel; once they finish, another three can be started, and so on until all partitions are processed. Because of memory constraints in Analysis Services, there is a limit on how many partitions can be processed in parallel, and it depends on the cube design (among other things). In this environment, four partitions were processed in parallel successfully, although this study itself used three. Future versions of Analysis Services (including the 64-bit edition) should be able to use more memory, so this limit on the number of partitions that can be processed in parallel may disappear.

Nonetheless, even with four partitions processed in parallel, you can reduce cube processing time by up to a factor of four, depending on the cube schema.

·                     RDBMS schema: On the relational data mart side, use a dimensional schema and create a clustered index on the Fact table column used for the Distinct Count. For example, for the SaleswithDistinctcustomersM15 cube, the Distinct Count column was cust_direct_cust_Idx in the Fact table (to calculate the distinct count of customers), so a clustered index was created on that column. Processing times were greatly reduced as a result. You may want to periodically update statistics for this index if the data changes frequently (incremental loads, and so on). When designing indexes on the relational tables, weigh the cost of populating the table and maintaining the index against the benefits during cube processing.

·                     Virtual cubes with varying aggregation levels: Create an isolated Distinct Count cube containing only the Distinct Count measure, and choose a very low aggregation level for it, or none at all. Create a second cube with all remaining measures and an appropriate aggregation level. Both cubes share the same dimensions. Then create a virtual cube based on the two. End users query only the virtual cube, which directs each query to the appropriate physical cube. This approach can improve both processing and query performance.

·                     Memory: Use the maximum memory that is available. We modified the memory conservation threshold setting (under Properties of Analysis Services) to 3 GB, which is the maximum memory that could be used by Analysis Services. The ES7000 can be configured with up to 32 GB of memory and we had 16 GB available in the ES7000 partition we used. The rest of the memory can be used by other applications like SQL Server and the operating system, thus reducing the need for memory paging to disk. The planned 64-bit version of Analysis Services will remove this limitation of 3 GB and should make better use of memory that is available in the systems.

·                     Other analysis services settings: Under the Analysis Server Properties dialog box, there are various settings that can be modified:

·                                Process buffer size and read-ahead buffer size: Increase these default sizes depending on the size of your cubes and dimensions. In our case, the best performance came from increasing the process buffer size to 400 MB and keeping the read-ahead buffer size at 10 MB. The process buffer size is the more important setting: the rule of thumb is to increase it until no temporary folder is used during processing.

·                                Keep data folder and temporary folder on different drives if possible.

·                                It is good practice to log the processing of cubes and queries into log files.

·                     Correct dimension and partition counts for aggregation levels: One very important observation was that correct member counts for the various dimension levels and fact tables must be in place before specifying aggregation levels. Verify the counts (the number of members in the relational database tables) in the Dimension Editor and similar tools. The Partition Manager tool (included with the Accelerator for BI) can help manage these counts. If the counts are incorrect, the designed aggregations will not be right.
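The batched parallel processing described above can be sketched as follows. This is a minimal illustration rather than the Resource Kit tool itself: process_partition is a hypothetical stand-in for whatever actually processes one cube partition (the parallel processing tool or a DSO Process call), and the monthly partition names are invented to match the BI Accelerator's out-of-the-box monthly partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(name):
    # Hypothetical stand-in for processing one cube partition
    # (in practice, the Resource Kit parallel processing tool or DSO).
    return f"{name}: processed"

# One partition per month for the three years of fact data in this study.
partitions = [f"Sales_{year}_{month:02d}"
              for year in (1998, 1999, 2000) for month in range(1, 13)]

# Cap concurrency at three partitions at a time, the batch size used in
# this study; Analysis Services 2000 memory limits bound this number.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_partition, partitions))

print(len(results))  # → 36
```

The key design point is the fixed worker cap: raising max_workers buys more parallelism only while Analysis Services has memory to spare, which is exactly the constraint the tip describes.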

Other Tips for Improving Query Performance

·                     Use cube partitioning to get better query performance, especially when multiple processors are available. This plays an important role when hundreds of concurrent users query the same cube. To increase selectivity when querying, set the slice information appropriately on each partition. In our case, the ES7000 made good use of multiple processors while cubes were being queried: cubes were partitioned by month, so queries spanning several months could use the multiple processors available in the ES7000. The BI Accelerator supports monthly partitioning out of the box; further query performance gains could be obtained by partitioning along additional dimensions.

·                     When working with Distinct Count queries, it is good practice to have an automatic script run the most important queries after the cube is processed, before end users access the system. This way those queries execute once against the cold cache, leaving a useful set of common queries in the warm cache. As the results showed, warm cache times were very good.

·                     Start with low aggregations and then let users query the database, and then use the Usage-Based Optimization Wizard to build additional aggregations needed to speed up the queries.

·                     To improve query performance with a high number of concurrent users, use quality multiprocessor hardware such as the ES7000.

·                     Use MOLAP storage for best query performance.
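The cache pre-warming tip above can be automated with a small batch job. The sketch below is hypothetical: run_mdx is a placeholder for whatever client call your environment uses to submit MDX (for example, ADO MD from a Windows script), and the measure name in the sample query is invented; only the cube name comes from this study.

```python
# Most frequently used business queries, to be run once after cube
# processing so they land in the warm cache before end users arrive.
WARMUP_QUERIES = [
    "SELECT {[Measures].[Customer Count]} ON COLUMNS, "
    "[Time].[Year].Members ON ROWS FROM [SaleswithDistinctCountsM15VC]",
    # ...remaining high-traffic Distinct Count queries...
]

def run_mdx(mdx):
    # Hypothetical placeholder: submit the MDX and discard the result.
    # The useful side effect is that Analysis Services caches the answer.
    return bool(mdx)

def warm_cache(queries):
    # Returns how many warm-up queries were executed successfully.
    return sum(1 for q in queries if run_mdx(q))

print(warm_cache(WARMUP_QUERIES))  # → 1
```

Scheduling this immediately after the nightly processing job means the first business user of the day gets warm-cache response times like those in Table 7 rather than the cold-cache times in Table 6.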

Conclusion

In the first half of this paper, we showed techniques to answer useful Distinct Counts business questions by using the Sales and Marketing scenario and Analysis Services 2000. We also showed how to extend BI Accelerator Sales and Marketing schema to include Distinct Counts. Distinct Counts, if used effectively, can help to uncover interesting business facts and get good insight into the business.

The second half of this white paper focused on the performance characteristics of Analysis Services 2000 when dealing with Distinct Counts using a multiprocessor ES7000 server. Finally you were given some performance tips that are especially useful in the case of Distinct Counts.

The following are some of the main observations from the performance section of the white paper and previous experiences with Distinct Counts:

·                     In large data warehouses, to reduce processing time and improve query performance, use a multiprocessor server (similar to the Unisys ES7000) with parallel processing and partitioning of cubes. This plays an especially important role when hundreds of concurrent users query the same cube. The gains from this approach are limited by the amount of memory Analysis Services can use; future versions (including the 64-bit edition) will be able to use more memory, so any limit on the number of partitions that can be processed in parallel may go away.

·                     As aggregation level increases, more time is taken to process Distinct Count cubes. Good practice would be to start at a lower aggregation level like 15 percent, or start with no aggregations. Let users query the cubes and then use the Usage-Based Optimization Wizard to increase the cube aggregation level in smaller increments.

·                     A notable result was that in warm cache, an average query response time of 0.032 seconds was observed even as the number of concurrent users querying the OLAP cubes increased from 100 to 500. In cold cache at the 15 percent aggregation level, the average query response time increased from 2.1 seconds to 10.9 seconds over the same range of users. Considering that all the queries were fairly complex and all involved Distinct Counts, this time is reasonable at the low 15 percent aggregation level. It is therefore good practice to have an automatic script run the most important queries once the cube is processed, to get those queries into the warm cache.

·                     For 500 concurrent users in cold cache at the 15 percent aggregation level, the median response time was only 0.194 seconds, compared to an average response time of 10.9 seconds. Observing the median gave new insight into query behavior, because a few long-running queries can skew the average by taking much more time than the others.

·                     Comparing these results with previous observations showed a significant improvement in query response time from OLAP Services 7.0 to Analysis Services 2000 for Distinct Count queries. This is especially notable in the case of warm cache.

·                     For 250 concurrent users in cold cache, the average query response time for Distinct Count queries fell from 9.6 seconds at the 15 percent aggregation level to 3.8 seconds at 60 percent.

·                     With 500 concurrent users, average query response times were compared between the standalone ES7000 environment and the distributed commodity server farm. The average response time was much better in the ES7000 environment. There was an obvious need for more processing power for Distinct Counts, and the ES7000 performed very well during this stress test. This again shows that overall performance depends on the hardware used for the implementation, particularly in the complex, processor-intensive case of Distinct Counts.

·                     There are many performance optimization tips that can be followed when working with cubes with Distinct Counts to improve overall performance. Modifying Analysis Server settings is one of them.

Overall, Analysis Services 2000 handled Distinct Counts very well and is much improved over OLAP Services 7.0. You can get better processing and query performance by using parallel processing and partitioning for large data sets on multiprocessor servers like the ES7000. By using the features provided by Analysis Services and following the few simple guidelines shown in this paper, you can harness the power of OLAP to gain insight into complex business analysis scenarios.

Appendix A

Table 5. Details of all RDBMS tables from the Sales cube schema

Name                         Number of rows   Size of the RDBMS tables (MB)
Fact_Sales_1998_3            2008759          349.63
Fact_Sales_1999_3            2155757          375.22
Fact_Sales_2000_3            2030632          353.44
Dim_Cust_Direct_Cust         2500001          999.49
Dim_Cust_Direct_Industry     636              0.21
Dim_CustStat_Std_Active      3                0.04
Dim_CustStat_Std_Range       5                0.05
Dim_Geog_Mktg_DMA            213              0.08
Dim_Geog_Mktg_DMR            15               0.04
Dim_Geog_Std_City            28267            4.46
Dim_Geog_Std_Country         4                0.04
Dim_Geog_Std_Region          16               0.05
Dim_Geog_Std_State           98               0.06
Dim_Geog_Std_Zip             42173            7.84
Dim_Prod_Std_ProdGrp         3                0.04
Dim_Prod_Std_ProdItem        1334             0.47
Dim_Prod_Std_ProdLine        6                0.05
Dim_SF_Std_Rep               8001             2.38
Dim_SF_Std_SFArea            3                0.04
Dim_SF_Std_SFRegion          6                0.05
Dim_Time_Day                 2203             0.80
Dim_Time_Std_Mon             84               0.11
Dim_Time_Std_Qtr             36               0.09
Dim_Time_Std_Year            18               0.05
Total size of RDBMS tables                    2094.74

Appendix B

Other query response time measures (for example, median) for various tests are shown in the tables 6, 7, and 8.

Table 6. QRT in seconds (cold cache / up to 500 concurrent users)

Number of concurrent users   Average   Median   Minimum   Maximum
100                          2.1       0.112    0.003     22.618
250                          9.64      2.016    0.003     81.803
500                          10.55     0.1945   0.003     167.985

Table 7. QRT in seconds (warm cache / up to 500 concurrent users)

Number of concurrent users   Average    Median   Minimum   Maximum
100                          0.032316   0.009    0.003     0.416
250                          0.031769   0.009    0.003     0.432
500                          0.031769   0.009    0.003     0.432

Table 8. QRT in seconds for varying aggregation levels (cold cache / 250 concurrent users)

Aggregation level   Average    Median   Minimum   Maximum
15%                 9.646698   2.016    0.003     81.803
30%                 5.705361   0.455    0.003     84.658
60%                 3.813859   0.233    0.003     58.808

Appendix C

Hardware Configuration Details

The following represents the hardware configuration and details used in the "Measuring MDX query execution times for cold and warm cache, with 500 concurrent users, for 8x vs. ES7000" section of the white paper.

Configuration Summary

Figure 16. Summary of test environment configurations (excluding file server)

Figure 17. ES7000 system configuration

Figure 18. Commodity environment system configuration

Unisys Copyright Page

NO WARRANTIES OF ANY NATURE ARE EXTENDED BY THIS DOCUMENT. Any product or related information described herein is only furnished pursuant and subject to the terms and conditions of a duly executed agreement to purchase or lease equipment or to license software. The only warranties made by Unisys, if any, with respect to the products described in this document are set forth in such agreement. Unisys cannot accept any financial or other responsibility that may be the result of your use of the information in this document or software material, including direct, special, or consequential damages. You should be very careful to ensure that the use of this information and/or software material complies with the laws, rules, and regulations of the jurisdictions with respect to which it is used.

The information contained herein is subject to change without notice. Revisions may be issued to advise of such changes and/or additions. Notice to Government End Users: The software and accompanying documentation are delivered and licensed as "commercial computer software" and "commercial computer software documentation" as those terms are used in 48 C.F.R. § 12.212 and 48 C.F.R. §§ 227.7202-1 through 227.7202-4, as applicable. The Government shall receive only those rights provided in the standard commercial software license, or where applicable, the restricted and limited rights provisions of the contract FAR or DFARS (or equivalent agency) clause.

 

posted on 2005-02-07 10:37  董晓涛