Wednesday, July 8, 2009

Sitting on the beach in Pensacola

The SQLSaturday phenomenon is taking over the nation!  I was honored to be selected to give a presentation at the Pensacola event on June 6, 2009 on PerformancePoint Server 2007 M&A (I'm still waiting to hear more information about next version incorporated with SharePoint :))  I had a great time seeing the SQLSaturday crew, old friends, and meeting some new friends!

While I was down in beautiful Pensacola, I was also thrilled to participate in the third episode of the It's All About The Tools TV Show!  I gave a demonstration on the SSIS 2008 data profiling tools - you can check it out here: http://channel9.msdn.com/shows/toolshed/Show-Episode-3-Its-All-About-The-Tools-TV-Show/Russ Fustino and Stan Schultes put on a great show that I'm sure you're going to enjoy!

I really enjoy speaking at events and sharing information on data warehousing and the Microsoft business intelligence suite.  I’ll warn you in advance that I prioritize engagements based on the shortest distance to water, sunshine, and community -- and not necessarily in that order :)

Tuesday, May 19, 2009

SQLTeach Vancouver 2009

On June 8-12, I'll be attending and speaking for the DevTeach/SQLTeach Vancouver conference. I've written about SQLTeach before, and I have to say that it's one of my favorite conferences. It's a small group of attendees with speakers who are some of the biggest names in the field. You really get a great opportunity to learn from and hobnob with speakers such as Peter DeBetta, Bill Graziano, Kevin Kline, Joe Webb, and more! I know I'm looking forward to picking their brains on a few things.

A few highlights of the conference:

  • Monday night: Party with Vancouver IT community DevTeach Kickoff party
  • Weekend after: Alt.NET event - DevTeach attendees receive a special registration code
  • Preconferences: 3 preconferences on Silverlight, F#, and Agile Development
  • Keynote: Tim Huckaby will discuss "Your Development Happy Place"

Register soon to get a great deal - the conference is right around the corner. I look forward to seeing you there!

Wednesday, May 13, 2009

Dimension ETL from One Source Table

In an ideal world, the source system of your data warehouse has the exact information that you need to populate all of the fields in your dimensions.  In a less than ideal world (also known as the real world), we need to cobble pieces of data together.  You may come across one scenario where all of your dimension and fact information is in one large table.  How do we handle this ETL in SQL Server Integration Services?

It is possible to load new records into your dimensions while loading your fact.  One way would be to use a Lookup transformation to check for existence, and if the business key doesn't yet exist, insert that value, return the surrogate key, and go along your merry way.  On the other hand, if you need to use those dimensions for other fact tables, you may decide to load only your dimensions first.

To load all dimensions from one table, we can utilize the Aggregate transformation in SSIS, which provides the capability to perform aggregations on columns in your data flow.  The aggregations that are available to you include: Count, Count distinct, Sum, Average, Minimum, and Maximum.  You also include a column to group the data.  For the output that you're setting up, you can add as many columns to aggregate or group by that you need.  For those of you familiar with T-SQL, this concept should be very similar to the GROUP BY clause.

When we're working with aggregations in SSIS, there are a few other things we should mention.  The aggregate transformation will not pass through all input columns, only the columns that you specify in your settings.  You can set comparison flags on your grouping columns on how to group the data together.  I also use the "Ignore case" option to help get a list of case insensitive distinct values from the column.

Initially opening up the Aggregate transformation allows you to set up the columns you want aggregated and what you want to group by.  In this scenario, we will use the Group By function for each column to get a distinct list of values; however, if we put all of them on this screen, we will still end up with duplicate values.  Instead, we want to create multiple outputs that each contain a single Group By on the appropriate column.

There's a sneaky little button at the top of the designer window that toggles between Advanced and Basic modes.  It defaults to Basic, which is why you only see one list right now.  Push that button to toggle to Advanced and create a different output for each dimension you need to populate.  Once you have a different output, you can perform a lookup against that dimension and only insert records that do not exist.  We end up with a package that looks similar to this to load all of our dimensions:

 

The aggregate transformation has a little bit of a bad reputation, as well it should.  It is an asynchronous component, so it creates a new buffer set when it runs, uses a huge amount of memory, and slows down your package execution time.  See Kirk's great article about performance tuning SSIS (the aggregate transformation notes are applicable to both 2005 and 2008) for more information: http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/.

This method may not be the best for your situation.  It will load the source data twice and doesn’t take into account any slowly changing attributes.  This should only be used if the situation calls for it.  Hopefully, this will help you if you do fall into that situation!

Version used: SSIS 2008 SP1

Thursday, February 26, 2009

SSIS Insert Statement Using an OLE DB Destination

When building a SQL Server Integration Services package, many business scenarios call for inserting a record into a table if it doesn't already exist. The most commonly used database destination, the OLE DB destination, looks as though it can handle this through the SQL command Data access mode. Unfortunately, SSIS appearances can be deceiving...

In the OLE DB Destination, setting the Data access mode to SQL command causes a SQL command text window to appear. You can perform typical SSIS SQL statement actions, such as building the query through the Graphical Query Designer, importing the SQL from an external file, or parsing the query. Note that there is no option to specify query parameters on this display. You will not need to set query parameters because the OLE DB Destination uses this SQL statement to find the metadata of the desired insertion table. Even adding a WHERE clause to filter the data will not change the outcome of the result. All rows passed through the Data Flow pipeline are inserted into the destination table.

The SSIS OLE DB Destination uses the OLE DB Provider specified in the connection manager associated with that destination. The resulting SQL statements from the OLE DB Provider set to use the SQL command resemble the SQL statements from the Table or view Data access mode. I created an SSIS packages that inserts data into the AdventureWorksDW DimProductCategory table using both destination data access modes. The insert portions (taken from SQL Profiler) both match this code:

exec sp_cursor 180150003,4,0,N'[DimProductCategory]',@ProductCategoryAlternateKey=9,@EnglishProductCategoryName=N'Bikes',@SpanishProductCategoryName=N'Bicicleta',@FrenchProductCategoryName=N'Vélo'
go
exec sp_cursor 180150003,4,0,N'[DimProductCategory]',@ProductCategoryAlternateKey=10,@EnglishProductCategoryName=N'Components',@SpanishProductCategoryName=N'Componente',@FrenchProductCategoryName=N'Composant'
go


Note that sp_cursor, an internal API server cursor call that the OLE DB Provider uses, performs the insert.

The difference in these two methods is in the set up of the initial cursor. The SQL command method uses two statements:


declare @p1 int
set @p1=1073741825
declare @p5 int
set @p5=229378
declare @p6 int
set @p6=294916
exec sp_cursorprepare @p1 output,NULL,N'SELECT ProductCategoryKey
,ProductCategoryAlternateKey
,EnglishProductCategoryName
,SpanishProductCategoryName
,FrenchProductCategoryName
FROM DimProductCategory
WHERE ProductCategoryKey <> 1'
,1,@p5 output,@p6 output
select @p1, @p5, @p6
go
declare @p2 int
set @p2=180150003
declare @p3 int
set @p3=2
declare @p4 int
set @p4=4
declare @p5 int
set @p5=-1
exec sp_cursorexecute 1073741825,@p2 output,@p3 output,@p4 output,@p5 output
select @p2, @p3, @p4, @p5
go

While the Table or view method uses just one:

declare @p1 int
set @p1=180150003
declare @p3 int
set @p3=2
declare @p4 int
set @p4=4
declare @p5 int
set @p5=-1
exec sp_cursoropen @p1 output,N'select * from [dbo].[DimProductCategory]',@p3 output,@p4 output,@p5 output
select @p1, @p3, @p4, @p5


You can solve the original business problem of only inserting records if it doesn't already exist by performing a lookup against your destination table, redirecting the rows that do not match, and inserting those rows using the OLE DB Destination. Depending on your particular scenario, other methods also exist that do not use a lookup. Whichever method you use, know that it will not entail a hand-written insert statement from an OLE DB Destination.

Version used: SQL Server 2005

Thursday, February 12, 2009

SQL Down Under

I had the honor of being invited to join Greg Low, fellow Solid Quality mentor, on an episode of SQL Down Under.  On last week’s episode, we discussed the new features of SQL Server Reporting Services 2008 and best practices for all versions of Reporting Services.  You can check out the episode here.

During the episode, Greg and I talked about reports that can be used to monitor Reporting Services.  You can find these reports on CodePlex.  Also, the design book that I mentioned is called Information Dashboard Design.  I hope you enjoy the episode and these resources!

Monday, January 26, 2009

Analysis Services MDX Templates Exploration

While looking in SQL Server Management Studio for something else entirely, I stumbled upon the Template Explorer.  This window provides Analysis Services templates for querying data mining structures (DMX), querying cubes (MDX), and performing DDL (XMLA).  I took a deeper look into the “MDX Queries” templates.

To view the Template Explorer, select the View menu > Template Explorer option.  To see the Analysis Services template, select the Analysis Services cube option at the top of the window.  Double-clicking any of the templates listed will then open a new query window containing the selected query.  For example, the Basic Query will show the following:

Select    <row_axis, mdx_set,> on Columns,
        <column_axis, mdx_set,> on Rows

From    <from_clause, mdx_name,>
Where    <where_clause, mdx_set,>

You have two choices at this point in time: selecting the Query menu > Specify Values for Template Parameters option or replacing the phrases enclosed by angle brackets manually.  The former choice opens a dialog box with all parameters listed to allow you to fill in the correct value; however, you will need to type in the full hierarchical structure by hand.  If you’re anything like me, this is bound to cause a typo and a few frustrating minutes of letter-by-letter comparison.  I prefer to modify the query directly by dragging the measure or dimensional attribute/hierarchy to my query window.  Then I don’t need to worry about mistyping anything.

While these templates will not teach you how to write MDX, they are an easy way to save yourself some typing or looking up a particular keyword that you have forgotten.  Looking over the XMLA queries, they appear to be more useful, as I am forever looking up the exact syntax for a particular XMLA query.

Versions: SQL Server 2005/2008

Thursday, January 8, 2009

SSIS Designer Tip

When designing a SQL Server Integration Services package, it can seem tedious to drag over each and every task and component from the toolbox to your Control Flow and Data Flow and connect all of the precedence constraints and pipelines. You can alleviate some of this by modifying the default Business Intelligence options within Business Intelligence Development Studio (BIDS).

Under the default settings, you can double-click any toolbox item and it will show up in your package designer with no connectors or specific place. To improve this, open up the Tools > Options... menu in BIDS. Then expand Business Intelligence Designers and Integration Services Designers. You will see Control Flow Auto Connect and Data Flow Auto Connect. If you check the option to "Connect a new shape to the selected shape by default", the drop down lists for specifying connector type and location are enabled in each menu. I prefer to use a Success constraint and add the new shape to the right of the selected shape, but you have a few options based on your design predilection.

Once these options are checked, double-clicking a toolbox item will add that item to the designer, using the options specified in the drop downs. You can of course change the type of constraint or move the item once it has been generated for you. Hopefully, this will save you a little bit of time when designing!

[Hat tip] I read about this option from Donald Farmer's great SSIS Scripting book.

Versions: Visual Studio 2005, Visual Studio 2008