When building a SQL Server Integration Services package, many business scenarios call for inserting a record into a table if it doesn't already exist. The most commonly used database destination, the OLE DB destination, looks as though it can handle this through the SQL command Data access mode. Unfortunately, SSIS appearances can be deceiving...
In the OLE DB Destination, setting the Data access mode to SQL command causes a SQL command text window to appear. You can perform typical SSIS SQL statement actions, such as building the query through the Graphical Query Designer, importing the SQL from an external file, or parsing the query. Note that there is no option to specify query parameters on this display. You will not need to set query parameters because the OLE DB Destination uses this SQL statement to find the metadata of the desired insertion table. Even adding a WHERE clause to filter the data will not change the outcome of the result. All rows passed through the Data Flow pipeline are inserted into the destination table.
The SSIS OLE DB Destination uses the OLE DB Provider specified in the connection manager associated with that destination. The resulting SQL statements from the OLE DB Provider set to use the SQL command resemble the SQL statements from the Table or view Data access mode. I created an SSIS packages that inserts data into the AdventureWorksDW DimProductCategory table using both destination data access modes. The insert portions (taken from SQL Profiler) both match this code:
exec sp_cursor 180150003,4,0,N'[DimProductCategory]',@ProductCategoryAlternateKey=9,@EnglishProductCategoryName=N'Bikes',@SpanishProductCategoryName=N'Bicicleta',@FrenchProductCategoryName=N'Vélo'
go
exec sp_cursor 180150003,4,0,N'[DimProductCategory]',@ProductCategoryAlternateKey=10,@EnglishProductCategoryName=N'Components',@SpanishProductCategoryName=N'Componente',@FrenchProductCategoryName=N'Composant'
go
Note that sp_cursor, an internal API server cursor call that the OLE DB Provider uses, performs the insert.
The difference in these two methods is in the set up of the initial cursor. The SQL command method uses two statements:
declare @p1 int
set @p1=1073741825
declare @p5 int
set @p5=229378
declare @p6 int
set @p6=294916
exec sp_cursorprepare @p1 output,NULL,N'SELECT ProductCategoryKey
,ProductCategoryAlternateKey
,EnglishProductCategoryName
,SpanishProductCategoryName
,FrenchProductCategoryName
FROM DimProductCategory
WHERE ProductCategoryKey <> 1',1,@p5 output,@p6 output
select @p1, @p5, @p6
go
declare @p2 int
set @p2=180150003
declare @p3 int
set @p3=2
declare @p4 int
set @p4=4
declare @p5 int
set @p5=-1
exec sp_cursorexecute 1073741825,@p2 output,@p3 output,@p4 output,@p5 output
select @p2, @p3, @p4, @p5
go
While the Table or view method uses just one:
declare @p1 int
set @p1=180150003
declare @p3 int
set @p3=2
declare @p4 int
set @p4=4
declare @p5 int
set @p5=-1
exec sp_cursoropen @p1 output,N'select * from [dbo].[DimProductCategory]',@p3 output,@p4 output,@p5 output
select @p1, @p3, @p4, @p5
You can solve the original business problem of only inserting records if it doesn't already exist by performing a lookup against your destination table, redirecting the rows that do not match, and inserting those rows using the OLE DB Destination. Depending on your particular scenario, other methods also exist that do not use a lookup. Whichever method you use, know that it will not entail a hand-written insert statement from an OLE DB Destination.
Version used: SQL Server 2005
Comments
Imagine a set of data: "red", "red" "orange" getting inserted into a table that already has "orange", "blue", "green".
The unmatched set will include "red", "red". The first insert works, the second is a duplicate. Any solution there?
You'll want to de-dupe your data before inserting, using an aggregate or sort transformation.
Jessica
Thanks!
I'm not sure if SSIS reacts the same way in 2012 - I would recommend running a profile trace and try it to see!
Jessica
Is it actually documented anywhere (other than in good blogs like yours) that the only purpose of the SQL command is to find the table metadata?
Some bloggers recommend against using "table or view" and say to always create a SQL statement. But it seems VERY odd to create a "Select" statement in a component that is going to be doing data inserts! I think of the data coming out from the Select statement, and crashing into the data that is coning in from the data flow, and leaving bits all over the floor.
Very poor documentation.