Wednesday, April 30, 2008

SSIS Performance Comparison

While working with a client this past week, I ran into a scenario where I needed to compare two dates in a SSIS package. One date would be the same for all rows, and one date would be different for each row. There are two possible implementations: include the "same date" in all rows and do a column comparison, or put the "same date" into a variable and do a comparison between the variable and column. My first instinct was that the column comparison would be faster because the extra trip to the server would be more expensive than extra column space. As I thought about it more, I realized that as the size of the dataset increases, the time to handle the extra column might overtake the hit from the extra trip.

So I ran a test to find out!

Scenario:

I created a temp table in the AdventureWorksDW database called FactLottaInternetSales that contains multiple copies of the FactInternetSales database, but with all orderdatekeys set to 1.

I created two packages to do the comparison.
Package 1: Column.dtsx

1 Data Flow Task containing:

  • 1 OLE DB Source:
    SELECT top x *
    FROM [AdventureWorksDW].[dbo].[FactLottaInternetSales]
    WHERE orderdatekey=1
  • 1 Conditional Split: OrderDateKey <>
  • 1 Row Count (as a terminator)

Package 2: Variable.dtsx

1 Execute SQL Task which assigns the result to the variable DateKey:
select max(orderdatekey)
from adventureworksdw.dbo.FactLottaInternetSales
where orderdatekey=1

1 Data Flow Task containing:

  • 1 OLE DB Source:
    SELECT top x [ProductKey] ,[OrderDateKey] ,[DueDateKey] ,[ShipDateKey] ,[CustomerKey] ,[PromotionKey] ,[CurrencyKey] ,[SalesTerritoryKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[RevisionNumber] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[UnitPriceDiscountPct] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] ,[Freight] ,[CarrierTrackingNumber] ,[CustomerPONumber]
    FROM [AdventureWorksDW].[dbo].[FactLottaInternetSales]
    where orderdatekey=1
  • 1 Conditional Split: @[User::DateKey] <>
  • 1 Row Count (as a terminator)

Results:




The variable package takes longer up to about 2 million rows. After that, the column package takes longer! Happy performance tuning :)


Version: SQL Server 2005 SP2

Monday, April 28, 2008

Come to SQLTeach Toronto!

I'm thrilled to be speaking at the SQLTeach Toronto conference in a few weeks. I will be presenting two topics: "SSIS Tips & Tricks" and "Migrating DTS Packages to SSIS". If you're going to be in the Toronto area from May 12 - 16, come check it out!

A quick preview of what you can expect at the SQLTeach/DevTeach conference:

  • Silverlight 2.0 post conference workshop
  • Party with Palermo
  • Scott Hanselman keynote
  • An amazing line-up of speakers on .Net, SQL, and development artchitectures
I hope to see you there!

Thursday, April 24, 2008

Women Technical Speakers

It's not surprising to anyone working in technology that we work in a male-dominated field. If you've been to a technical conference recently, it's also not surprising that the percentage of female speakers is drastically lower than male speakers. Is this due to the obvious gender imbalance in the field, the difficulty in finding women to speak, or something else? I'm not going to try to venture a guess - I'm just glad to embrace a site that is trying to "even things up a little" by providing a repository of women technical speaker bios for conference organizers to utilize.

GeekSpeakr: http://www.geekspeakr.com

Disclaimer: You'll see my bio on there too :)

Tuesday, April 22, 2008

Execute SQL Task Designer Limit

After migrating a package from DTS to SSIS, I had a problem with an Execute SQL Task. I couldn't change any characters in the SQLStatement property; I couldn't add any new characters; I could delete characters, but not retype them! After googling several variations of "integration services" "read only" and "Execute SQL Task", I deleted about half of the entry in a fit of frustration. Lo and behold, I could type again.

Apparently, there is limit on the size or number of characters that can be entered in the SQLStatement property. From my experimentation, I came up with a limit of 32767 characters.

The interesting thing is that the restriction only seems to be on the designer. If you set the SourceType to "Variable" and use a variable that contains more than 32767 characters, the task will execute. Also, if you use the "Direct Input" SourceType and modify the package XML to set the SQLStatement longer than 32767 characters, the task also works.

This is just something to keep in mind if you are putting a monster of a query together to execute in SSIS. Of course, if you have a query that long, you might want to think about breaking it up into smaller pieces or encapsulating some of the logic into stored procedures. :)

Version: SQL Server 2005 SP2

Wednesday, April 16, 2008

Show SSIS Packages Alphabetically in SSMS

Trying to find an SSIS package in SQL Server Management Studio is difficult because the packages are ordered by the date they were imported. You can make a quick adjustment to a system stored procedure, which will show the packages in alphabetical order.

Modify the [msdb].[dbo].[sp_dts_listpackages] stored procedure to include an order by at the end:

ALTER PROCEDURE [dbo].[sp_dts_listpackages]
@folderid uniqueidentifier
AS
SELECT
name,
id,
description,
createdate,
folderid,
datalength(packagedata),
vermajor,
verminor,
verbuild,
vercomments,
verid
FROM
sysdtspackages90
WHERE
[folderid] = @folderid
ORDER BY

name

and Voila! your packages will be listed in alphabetical order when you view them in SSMS.

As of SQL Server 2008 CTP6, this functionality has already been added. The new stored procedure that is executed when expanding the list of packages in SSMS is named: [msdb].[dbo].[sp_ssis_listpackages].

Versions: SQL Server 2005 SP2/SQL Server 2008 CTP6

Wednesday, April 9, 2008

Orphaned Log Records

SQL Server Integration Services contains the SSIS log provider for SQL Server, which can be used to log records to a SQL Server database in the sysdtslog90 table. All sorts of events and information can be recorded during the execution of a package. If you were to turn on logging, but turn off all possible events to be logged, you would still end up with two events being fired and recorded: PackageStart and PackageEnd. These are also the events that are normally used when reporting on how the packages performed. See http://www.microsoft.com/downloads/details.aspx?FamilyId=D81722CE-408C-4FB6-A429-2A7ECD62F674&displaylang=en (SSISEventLogReportPack.exe) for examples of reporting on execution times, success/failure counts, etc.

The bite comes when you have a package that either fails very early in its validation phase, or if you stop the package before it's really started to rev up. If you include all events in your logging, both an OnPreValidate and an OnPostValidate event will occur before the PackageStart event. Because most logging is done based on the PackageStart event, these "orphaned" records will not show up any reports.

To see these records, you can run the following query:
select * from sysdtslog90

where executionid not in
( select executionid from sysdtslog90
where event in ('PackageStart', 'PackageEnd') )
and executionid in
( select executionid from sysdtslog90
where source = 'putPackageNameHere')

You can also use a similar query to delete these records if you don't want them cluttering up your log table.

Version: SQL Server 2005 SP2