Archive for September, 2013

MDX+SSRS #31 – Query Designer in SSRS only allows the Measures dimension in the first axis

September 28, 2013

In Chapter 1 of the book MDX with SSAS 2012 Cookbook, in the first recipe “Putting data on x and y axes”, I gave a simple example to show how easy it is to use the CROSSJOIN function to “combine” more than one hierarchy on the COLUMNS and ROWS axes.

SQL Server Reporting Services is a report design and information delivery tool, and has been adopted by many companies for their Business Intelligence reporting needs. However, building reports in SSRS that access OLAP cubes in Analysis Services is not without frustration.

The graphical MDX Query Designer in Reporting Services allows you to retrieve data from any BI semantic model through drag and drop, without actually needing to understand or write MDX queries.

If you have some experience building reports in SSRS with Analysis Services data, you have already discovered that the graphical MDX Query Designer serves its purpose very well: it generates well-formed and efficient MDX queries. Very quickly, however, you will also find that you need to cross a bridge. This bridge leads you to the generic MDX query editor, where you can edit the MDX query that the graphical designer built.

The following is a screenshot of the Query Designer and the toggle button for switching between the graphical designer and the editor.

[Screenshot: the MDX Query Designer in SSRS, with the toggle button for switching between the graphical designer and the query editor]

Don’t expect the MDX query editor in SSRS to work the same way as the MDX query editor in SSMS.

This MDX query simply puts two measures on the COLUMNS, and the CROSSJOIN (all possible combinations) of the sales territory country and product category on ROWS.

SELECT   
    { [Measures].[Internet Sales Amount],
      [Measures].[Internet Gross Profit]
    } ON 0,
    { [Sales Territory].[Sales Territory Country].[Sales Territory Country] *
      [Product].[Product Categories].[Category]
    } ON 1
FROM   
    [Adventure Works]

In SSMS, the previous query will produce the following result.

[Screenshot: the query result in SSMS]

If we copy the same query to the query editor in SSRS, we do get the same number of rows back, and the measures match perfectly for every combination of sales territory country and product category. However, comparing the previous screenshot with the following one, we notice some differences.

[Screenshot: the same query result in the SSRS query editor]

1. The measures are not formatted in the query editor in SSRS.

2. We get four columns in the query editor in SSRS. In addition to the two measures that we put on the X axis, the two hierarchies from our CROSSJOIN function on the Y axis have also appeared as two separate columns.

Now, let’s change the previous MDX query slightly, by switching the measures to the ROWS, and the CROSSJOIN of the sales territory country and product category to COLUMNS.

SELECT   
    { [Sales Territory].[Sales Territory Country].[Sales Territory Country] *
      [Product].[Product Categories].[Category]
    } ON 0,
    { [Measures].[Internet Sales Amount],
      [Measures].[Internet Gross Profit]
    } ON 1
FROM   
    [Adventure Works]

 

In SSMS, we would expect to see the following results.

[Screenshot: the expected query result in SSMS]

Let’s copy the same query to the query editor in SSRS. This time we would get an error.

[Screenshot: the error message in the SSRS query editor]

The message is actually very clear. I’d translate the error message into the following two rules. The query editor (and the graphical Query Designer) in SSRS:

1. does not allow CROSSJOIN in the COLUMNS (or 0-axis)

2. only allows the Measures dimension in the COLUMNS (or 0-axis).

What we put on the COLUMNS clearly violates both of the rules.

    { [Sales Territory].[Sales Territory Country].[Sales Territory Country] *
      [Product].[Product Categories].[Category]
    } ON 0

 

In my experience, knowing what to expect from the graphical MDX Query Designer and the MDX query editor in SSRS will put you halfway through the learning curve.

MDX #30 – Get The Book: MDX with SSAS 2012 Cookbook

September 22, 2013

MDX with SSAS 2012 Cookbook

by Sherry Li and Tomislav Piasevoli

Packt Publishing 2013

The book is now available in both paperback and eBook formats. You can order copies from Amazon, Barnes & Noble, or Packt Publishing. It is also available on Safari Books Online.


Hope you will find the book useful in mastering MDX.

SQL #50 – Deleting or updating data from a table of 20+ GB

September 22, 2013

I don’t have a definition of what is considered a large SQL table. I am pretty sure, however, that a table with 20+ GB of data, plus another 20+ GB for a few indexes, is not a small table anymore.

Although the article The Data Loading Performance Guide was written for SQL Server 2008, it contains a lot of good information to help data warehouse developers design efficient ETL processes.

The scenarios in the Guide cover both partitioned and nonpartitioned tables, and loading data from text files as well as from inside the SQL Server Database Engine.

In a data warehouse environment, it’s not uncommon to design a nightly (or weekly) ETL process to refresh data going back a longer period of time. During this nightly process, a large amount of data first needs to be deleted from the target fact table, and then the fact table is reloaded with the data from the source.

Our tables are nonpartitioned tables. Here are four common deletion/update scenarios I can think of for nonpartitioned tables. I’ve used the methods below in both automated ETL processes and in ad-hoc queries.

1. Deleting All Rows from a Nonpartitioned Table – Use TRUNCATE

The fastest way to remove all data from a table is to simply execute a truncate statement.

TRUNCATE TABLE factMainTable;

The advantage of TRUNCATE TABLE over DELETE with no WHERE clause is well explained in the SQL Books Online.

TRUNCATE TABLE (Transact-SQL)

  • Less transaction log space is used. TRUNCATE TABLE removes all rows from a table without logging the individual row deletions. The DELETE statement, on the other hand, removes rows one at a time and the deletion is fully logged for each deleted row.
  • Fewer locks are typically used. When the DELETE statement is executed using a row lock, each row in the table is locked for deletion. TRUNCATE TABLE always locks the table and page but not each row.
  • Without exception, zero pages are left in the table. After a DELETE statement is executed, the table can still contain empty pages.
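Despite the minimal logging, TRUNCATE TABLE is still a transactional operation in SQL Server: if you wrap it in an explicit transaction (as we will do in scenario #2 below), it can be rolled back. Here is a quick sketch using the same factMainTable from the examples in this post.

BEGIN TRAN;

TRUNCATE TABLE factMainTable;

-- The table is now empty inside the transaction...
-- SELECT COUNT(*) FROM factMainTable;  -- returns 0

ROLLBACK TRAN;  -- ...but the truncation can still be rolled back
-- (or COMMIT TRAN; to make it permanent)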

2. Deleting a Large Amount of Rows in a Nonpartitioned Table – Switch with a Temp Table

In this scenario, I’ll assume that the rows that need to be deleted outnumber the rows you want to keep in the table. The idea is to use a temporary table to hold the rows we want to keep, and then switch the temporary table back to the main table. Because the rows we save to the temporary table are far fewer than the rows we want to delete, the logging is kept to a minimum compared to executing the DELETE command directly on the main table. In addition, the bulk-insert optimizations can greatly reduce the overall execution time.

Here are the general steps:

1) Create a copy of the main table with the data you want to keep.

You need to use one of the bulk insert methods.

To perform the bulk insert, you can use one of the following three methods: INSERT … SELECT, SELECT … INTO, or Integration Services (described below).

The INSERT … SELECT method:

INSERT INTO factMainTable_Temp WITH (TABLOCK)
SELECT *
FROM factMainTable
WHERE OrderDate >= '20130701'
  AND OrderDate < '20130901'
  AND <other Keep Criteria>;
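One thing the INSERT … SELECT form assumes is that factMainTable_Temp already exists with the same column structure as the main table. A minimal way to create that empty copy (a sketch; it copies only the columns, not the indexes, constraints, or keys) is:

SELECT TOP (0) *
INTO factMainTable_Temp
FROM factMainTable;  -- creates an empty table with the same columns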

The SELECT … INTO method:

SELECT *
INTO factMainTable_Temp
FROM factMainTable
WHERE OrderDate >= '20130701'
  AND OrderDate < '20130901'
  AND <other Keep Criteria>;

As pointed out in the Guide, another way to perform the bulk insert is to use Integration Services to achieve many concurrent streams into the factMainTable_Temp table. If you need to automate the DELETE in SSIS packages, then Integration Services is the way to go.

2) Truncate the main table to remove all of the old rows

BEGIN TRAN; -- if you want to keep the operation transactional

TRUNCATE TABLE factMainTable;

3) Bulk Insert the Data from the Temp Table to the Main Table

INSERT INTO factMainTable WITH (TABLOCK)  -- TABLOCK again enables the bulk load optimizations, as in step 1
SELECT * FROM factMainTable_Temp;

4) Drop the temporary table

DROP TABLE factMainTable_Temp;

COMMIT TRAN; -- if you used a transaction, commit it now

Optionally, in step 2) you can DROP the main table:

DROP TABLE factMainTable;

Don’t forget to script out all the constraints and indexes and keys on the main table before it is dropped.

Then in step 3), you can rename the temp table as the main table:

EXECUTE sp_rename 'TheSchemaName.factMainTable_Temp', 'factMainTable';

Finally, re-create all constraints and indexes and keys on the main table.
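Putting the optional variant together, here is a minimal sketch of the drop-and-rename sequence. The schema name dbo and the constraint and index definitions (pk_factMainTable on a FactKey column, ix_factMainTable_OrderDate) are hypothetical placeholders for whatever you scripted out from your own table:

BEGIN TRAN;  -- optional, if you want the switch to be transactional

DROP TABLE dbo.factMainTable;

EXECUTE sp_rename 'dbo.factMainTable_Temp', 'factMainTable';

-- Re-create the constraints, indexes and keys scripted out earlier (placeholders):
ALTER TABLE dbo.factMainTable
  ADD CONSTRAINT pk_factMainTable PRIMARY KEY CLUSTERED (FactKey);

CREATE NONCLUSTERED INDEX ix_factMainTable_OrderDate
  ON dbo.factMainTable (OrderDate);

COMMIT TRAN;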

3. Updating a Nonpartitioned Table – Switch with a Temp Table

When updating a fairly large table, you might also want to try the steps in scenario #2 – Switch with a Temp Table.

Again the idea is to avoid using the UPDATE command directly on the main table, and to take advantage of the bulk load optimizations in T-SQL with the INSERT … SELECT method or the SELECT … INTO method.

The only difference in this UPDATE situation is that the first step, where the temporary table is created, needs to take care of the new values for the columns you want to update.

Typically, the new values come from a third table. In this case, a JOIN will be needed.

INSERT INTO factMainTable_Temp WITH (TABLOCK)
SELECT main.cols, <new values>, main.more_cols
FROM factMainTable main
JOIN TheThirdTable third_tbl
  ON main.join_col = third_tbl.join_col
WHERE main.OrderDate >= '20130701'
  AND main.OrderDate < '20130901'
  AND <other Keep Criteria>;

4. Deleting 10%+ (but less than 50%) Rows from a Nonpartitioned Table – DELETE in Chunks

In this scenario, I’ll assume that the data you want to delete is less than the data you want to keep in the table. This is the opposite of scenario #2. When deleting more than 10% of the rows from a table with 20+ GB of data, I’ve noticed that the deletion starts to take much longer to complete.

Since the rows I want to keep far outnumber the rows I need to delete, I doubt the “Switch with a Temp Table” method will be worth it.

So I tried DELETE in chunks in T-SQL. The basic idea is to keep each delete transaction small, so that the transaction log and the locking stay manageable.

Check out this section “Using TOP to limit the number of rows deleted” in the link below.

TOP (Transact-SQL)

When a TOP (n) clause is used with DELETE, the delete operation is performed on an undefined selection of n number of rows. That is, the DELETE statement chooses any (n) number of rows that meet the criteria defined in the WHERE clause.

In my example, rows are deleted from the factMainTable in chunks of 200,000, for order dates between @START_DATE and @END_DATE. I do not use an ORDER BY clause here: it would only slow down the DELETE, and the order in which the rows are removed does not matter. What breaks the WHILE loop is the system variable @@ROWCOUNT; when no rows remain in the date range, @@ROWCOUNT is zero and the loop ends.

-- Sample date range, used here only for illustration
DECLARE @START_DATE date = '20130701';
DECLARE @END_DATE   date = '20130831';

DECLARE @i int = 1;

WHILE @i > 0
BEGIN
    DELETE TOP (200000) FROM factMainTable
    WHERE OrderDate >= @START_DATE
      AND OrderDate <= @END_DATE;

    SET @i = @@ROWCOUNT;
END;

Conclusion – Bulk Load Optimizations

The Guide has a good summary on the Bulk Load Methods in SQL Server.

To provide fast data insert operations, SQL Server ships with several standard bulk load methods.

  • Integration Services Data Destinations – The method for performing bulk load from the Integration Services ETL tool.
  • BCP – The command line utility for performing bulk load from text files.
  • BULK INSERT – The method for performing bulk load in Transact-SQL. The term “BULK INSERT” refers to the specific Transact-SQL based bulk load method (see the sketch after this list).
  • INSERT … SELECT – The method for performing bulk load in process with SQL Server from local queries or any OLE DB source. This method is only available as a minimally logged operation in SQL Server 2008.
  • SELECT INTO – The method for creating a new table containing the results of a query; it utilizes bulk load optimizations.
  • Similar bulk load techniques are supplied by programming interfaces to SQL Server, including the SQLBulkCopy class in ADO.NET, IRowsetFastload in OLE DB, and the SQL Server Native Client ODBC library.
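For illustration, here is a minimal BULK INSERT sketch; the file path, delimiters, and target table below are hypothetical placeholders:

BULK INSERT dbo.factMainTable
FROM 'C:\load\factMainTable.txt'   -- hypothetical source file
WITH (
    FIELDTERMINATOR = '|',         -- hypothetical field delimiter
    ROWTERMINATOR = '\n',
    TABLOCK                        -- table lock, needed for the bulk load optimizations
);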
BTW, I have not been able to find the Guide for SQL Server 2012.