Skip to main content

SSIS Performance Comparison

While working with a client this past week, I ran into a scenario where I needed to compare two dates in a SSIS package. One date would be the same for all rows, and one date would be different for each row. There are two possible implementations: include the "same date" in all rows and do a column comparison, or put the "same date" into a variable and do a comparison between the variable and column. My first instinct was that the column comparison would be faster because the extra trip to the server would be more expensive than extra column space. As I thought about it more, I realized that as the size of the dataset increases, the time to handle the extra column might overtake the hit from the extra trip.

So I ran a test to find out!

Scenario:

I created a temp table in the AdventureWorksDW database called FactLottaInternetSales that contains multiple copies of the FactInternetSales database, but with all orderdatekeys set to 1.

I created two packages to do the comparison.
Package 1: Column.dtsx

1 Data Flow Task containing:

  • 1 OLE DB Source:
    SELECT top x *
    FROM [AdventureWorksDW].[dbo].[FactLottaInternetSales]
    WHERE orderdatekey=1
  • 1 Conditional Split: OrderDateKey <>
  • 1 Row Count (as a terminator)

Package 2: Variable.dtsx

1 Execute SQL Task which assigns the result to the variable DateKey:
select max(orderdatekey)
from adventureworksdw.dbo.FactLottaInternetSales
where orderdatekey=1

1 Data Flow Task containing:

  • 1 OLE DB Source:
    SELECT top x [ProductKey] ,[OrderDateKey] ,[DueDateKey] ,[ShipDateKey] ,[CustomerKey] ,[PromotionKey] ,[CurrencyKey] ,[SalesTerritoryKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[RevisionNumber] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[UnitPriceDiscountPct] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] ,[Freight] ,[CarrierTrackingNumber] ,[CustomerPONumber]
    FROM [AdventureWorksDW].[dbo].[FactLottaInternetSales]
    where orderdatekey=1
  • 1 Conditional Split: @[User::DateKey] <>
  • 1 Row Count (as a terminator)

Results:




The variable package takes longer up to about 2 million rows. After that, the column package takes longer! Happy performance tuning :)


Version: SQL Server 2005 SP2

Comments

Anonymous said…
Excellent analysis!

Popular posts from this blog

Upgrading your SSIS Management Framework: Part 3

At this point, you understand the options for moving an SSIS framework to the latest version of SSIS, and you've upgraded the logging portion of the framework using a hybrid approach.  The final step in the framework upgrade is handling your configurations.  Let's walk through an existing configuration implementation and how you can upgrade it by combining your existing implementation with the standard SSIS framework. Overview A typical "old-school" configuration scheme is described in the SSIS PDS book or in this blog post here: http://jessicammoss.blogspot.com/2008/05/ssis-configuration-to-configuration-to.html .  Starting in SSIS 2012, the configuration scheme uses environments and parameters when using the Project Deployment Model, as discussed here: http://msdn.microsoft.com/en-us/library/hh213290(v=sql.110).aspx . In both scenarios, the core ideas in a configuration scheme are: Provide the ability to move packages through environments without having

Manipulating Excel Spreadsheets in SSIS

Tom, an attendee at last weekend’s SQLSaturday Olympia , asked me how to refresh a spreadsheet from within SQL Server Integration Services. My first thought was to turn on the connection’s “Refresh data when opening the file” option in the spreadsheet itself and avoid the situation entirely; however, this may not always be a viable solution. Here are the steps to perform the refresh from within an SSIS package. First, ensure that Microsoft.Office.Interop.Excel is registered in the GAC. If not, install the 2007 Microsoft Office system Primary Interop Assemblies . This will need to be done on any machine where you plan on running this package. Next, create a script task in your SSIS package that contains the following code (include your spreadsheet name): Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports Microsoft.Office.Interop.Excel Public Class ScriptMain Public Sub Main() Dts.TaskResult = Dts.Results.Success Dim excel

Reporting Services 2008 Configuration Mistake

To start working with the management side of SQL Server Reporting Services 2008, I decided to set up a report server and report manager. Unfortunately, I made a mistake while setting up my configuration that left me a little perplexed. Here are the steps I took to cause, track down, and solve the issue. Problem: I began by opening the Reporting Services Configuration Manager from the Start Menu. I clicked through each of the menu options and accepted the defaults for any question with a warning symbol, since warning symbol typically designate an action item. After two minutes, all of the warning symbols had disappeared, and I was ready to begin managing my report server. Unfortunately, opening up a browser and trying to open up the report manager resulted in the dreaded " The report server has encountered a configuration error. (rsServerConfigurationError) " message. Sherlock-ing it: I put on my sleuthing hat and went to the log file directory: C:\Program Files\Microsoft