Tuesday, October 21, 2008

Manipulating Excel Spreadsheets in SSIS

Tom, an attendee at last weekend’s SQLSaturday Olympia, asked me how to refresh a spreadsheet from within SQL Server Integration Services. My first thought was to turn on the connection’s “Refresh data when opening the file” option in the spreadsheet itself and avoid the situation entirely; however, this may not always be a viable solution. Here are the steps to perform the refresh from within an SSIS package.

First, ensure that Microsoft.Office.Interop.Excel is registered in the GAC. If not, install the 2007 Microsoft Office system Primary Interop Assemblies. This will need to be done on any machine where you plan on running this package.

Next, create a script task in your SSIS package that contains the following code (include your spreadsheet name):

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.Office.Interop.Excel

Public Class ScriptMain

Public Sub Main()
Dts.TaskResult = Dts.Results.Success

Dim excel As New Microsoft.Office.Interop.Excel.Application
Dim wb As Microsoft.Office.Interop.Excel.Workbook

wb = excel.Workbooks.Open("C:\\TestExcelSS.xlsx")
wb.RefreshAll()
wb.Save()
wb.Close()

excel.Quit()
Runtime.InteropServices.Marshal.ReleaseComObject(excel)

End Sub

End Class

You'll see error squiggles, but don't worry about them because they will disappear in just a minute. Save and close your package. In your Solution Explorer, right click on the package and select ‘View Code’.

In the resulting XML, change the Build Settings ReferencePath property to:
"C:\Windows\assembly\GAC_MSIL\Microsoft.SqlServer.ScriptTask\9.0.242.0__89845dcd8080cc91\;C:\Windows\assembly\GAC_MSIL\Microsoft.SqlServer.ManagedDTS\9.0.242.0__89845dcd8080cc91\;C:\Windows\assembly\GAC\Microsoft.Office.Interop.Excel\12.0.0.0__71e9bce111e9429c\"

Also change the Build References to include:
<Reference
Name = "Microsoft.Office.Interop.Excel"
AssemblyName = "Microsoft.Office.Interop.Excel"
/>

Save the XML, and reopen the package. Open the script task and select ‘Save’. This will compile the code, and now you can run your package.

When working with COM references, you can use the script task GUI to add the reference by adding the desired component to the .NET framework folder. I could not find Microsoft.Office.Interop.Excel.dll on my machine to move to the framework folder, which is why we added the reference through the XML.

As Douglas Laudenschlager notes, writing server-side code to access client-side Office is unsupported. Please take these possible problems under advisement and code as necessary. You have been warned. :)

Update (11/12/08): Added last two lines to code to stop Excel process.

Versions: Microsoft Office 2007, SQL Server 2005 SP2

52 comments:

Anonymous said...

Hi Jessica,

After I ran the code, in my Windows Task Manager Excel.exe is opened as many times as I run the code, how can I get this code to close Excel.exe.? I tried excel.quit() but it didn't help.

Regards,

Christian

Jessica M. Moss said...

Hi Christian,

If you add this snippet of code after the existing script, it removes the current process from the Task Manager:

excel.Quit()
Runtime.InteropServices.Marshal.ReleaseComObject(excel)

I'm not familiar with how the management of COM objects work, so I'm not sure if there are any aftereffects, but it appears to work.

Jessica

vivek said...

Hi Jessica,

Do you know how can I use Excel functions inside of SSIS ?

My situation is like this:
I have to import an Excel spreadsheet into SSIS.
Then I have to find the 50th and 75th percentile of the values in a certain column.
Based on the 2 percentile values, I have to create a derived column.

This has to be be done inside a package in SSIS.

In MS Excel, I could use the Percentile function to get the percentile values. But how can I do the same thing inside SSIS ?


Thanks and Regards.

Jessica M. Moss said...

Hi Vivek,

To figure out a percentile within an SSIS package, you would want to either use VB.NET code in a script task or an Execute SQL Task after loading the data into a temporary table. Keep in mind that percentiles (http://en.wikipedia.org/wiki/Percentile) are not always calculated the same way, but here is an example that someone created for SQL: http://www.sqlteam.com/article/computing-percentiles-in-sql-server

Good luck!
Jessica

Anonymous said...

Hi Jessica,

Thanks for the help. The problem is that I need to hide two sheets after I refreshed my Workbook:
wsListOfItems = CType(wb.Sheets("ListOfItems"), Microsoft.Office.Interop.Excel.Worksheet)
wsAddress = CType(wb.Sheets("Address"), Microsoft.Office.Interop.Excel.Worksheet)
wsListOfItems.Visible = XlSheetVisibility.xlSheetHidden
wsAddress.Visible = XlSheetVisibility.xlSheetHidden

This lines of code keeps Excel.exe open in memory(Windows Task Manager).

Another question that I want to ask is, how come when I use wb = excel.Workbooks.Open("C:\\TestExcelSS.xls") it enables the Macros as well?

Regards,

Christian

Jessica M. Moss said...

Hi Christian,

We're starting to get into Excel programming questions, which is not my area of expertise. Here's a good source of information for Excel programming: http://msdn.microsoft.com/en-us/library/bb687921.aspx or you can try posting your question on the associated newsgroup: http://www.microsoft.com/communities/newsgroups/en-us/?dg=microsoft.public.excel.programming

Good luck!
Jessica

ColeR said...

Followed the sample, but I get the error 'Microsoft.Office.Interop.Excel.Application' is not defined. The only difference is that my interop assembly is verion 11.0.0.0 (this shows on the GAC)

I can see "Microsoft.Office.Interop.Excel" under Reference in VS for Application, but when I run the task it gives me that error. Any idea?

Jessica M. Moss said...

Hi ColeR,

Because you're working with an earlier version of Excel, you'll need to change the reference. Replace C:\Windows\assembly\GAC\Microsoft.Office.Interop.Excel\12.0.0.0__71e9bce111e9429c\ with your version (I don't have that specific file to tell you what it should be). Make sure you modify the "71e9bce111e9429c" to match your file as well.

-Jessica

Anonymous said...

Hi Jessica,

This code does not work for us as it saves the file before the refresh is finished. When using excel 2003 we used the xlSheet.QueryTables(1).Refresh(False) method and it worked fine as long as your query was set up using excel 2003 or earlier version. By setting the false it stays on that line of code until the refresh is finished. Now if we set up a new query using Excel 2007 this line of code breaks.

Do you have any ideas?

Jessica M. Moss said...

Hello,

Here are a couple of things to try... The first is that you might want to look at setting PivotCache.BackgroundQuery = False on any pivot tables you have - those are set to run in the background, so may be where the issue lies. The other is to try inserting a Thread.Sleep() for a few milliseconds/seconds after you call the Refresh for debugging purposes.

HTH,
Jessica

Anonymous said...

Hi Jessica,

I just want to know. I'm using Excel2007 and I have an excel macro enable file, *.xlsm, which needs updating. Will this code refresh the data as well?

Regards

Christian

Jessica M. Moss said...

Hi Christian,

According to this article, http://msdn.microsoft.com/en-us/library/bb223574.aspx, pivot tables and external data ranges are the objects that are refreshed.

Jessica

VeriS Liem said...

If you use Thread.Sleep() call the package from command prompt instead of from SQL Agent. You can schedule it using Windows Schedule Tasks instead of SQL Job.

LGN Technology LLC said...

hi, i need to create a small gui app that will import different xls files into different sql server 2008 tables based on a 3 character code. do you know if anything like that exists or what's the best way to build it? thanks.
carminenatale@gmail.com

Anonymous said...

Hi Jessica,

I've got 97-2003 excel file with several tabs in it and some pivot tables also.Here I need to refresh this excel file using SSIS package. I saw you are code for 2007 Excel file it would be great help if you could guide me for 97-2003 excel file refresh through SSIS package. Thanks

Joshua C said...

I know this is an old post and this is a dumb question, but if you install PrimaryInteropAssembly.exe on your SQL Server, do you still have to install MS Office on it? My guess is yes since it invokes Excel objects that I do not believe comes with the 6 MB install (PrimaryInteropAssembly.exe), but I need a second opinion to convince my DBA. Thank you.

Joshua C said...

For the anonymous question before my last comment, there is PIA for Office 2003 as well. Please refer to this link.

http://msdn.microsoft.com/en-us/library/aa159923%28office.11%29.aspx

Jessica M. Moss said...

Hi Joshua C,

I tested my code on a machine that had Office installed, so I cannot say for sure. I would recommend testing this on a clean machine to check.

-Jessica

Christopher Nelson said...

Thank you - this was *exactly* what I needed to automatically refresh the data in a large collection of Excel files. It worked beautifully. Chris Nelson - Ascentium Corp.

Anonymous said...

Hi Jessica.

Your code is awesome. I used it to create a generic excel file refresh ssis package - just enter the path, filename, password, etc. into variables and it runs, refreshes, and sends an email with the results (stored proc). Awesome!

BUT - sometimes it hangs/exits, and the workbook remains open AND locked to me! Is there a way in code to close it down?

I have this code at the bottom of my sub, added a couple things I thought might work to yours, but doesn't seem to cut it:


On Error Resume Next

'close file without save
wb.Close(False)

'destroy excel object
wb = Nothing
excel.Workbooks(Filename).Close(False)
excel.Quit()
Runtime.InteropServices.Marshal.ReleaseComObject(excel)


Thanks,
Devo

Anonymous said...

Christian.. I faced the same problems (hanging) regarding closing excel after the dtsx execution.

You can find the solution here:
http://devcity.net/PrintArticle.aspx?ArticleID=239
specifically the third option...

Javier

Anonymous said...

Thanks Javier - just what I needed.

I took the code below and inserted it at the beginning of my code AND at the end (where errors go to).

So if you have anything open in Excel it shuts it down, before AND after trying to refresh the file.


Dim proc As System.Diagnostics.Process

'kill all instances of excel
For Each proc In System.Diagnostics.Process.GetProcessesByName("EXCEL")
proc.Kill()
Next


Note: it seems to cause issues if you run it AFTER the dim excel objects code. So, put it before this:

Dim excel As New Microsoft.Office.Interop.Excel.Application
Dim wb As Microsoft.Office.Interop.Excel.Workbook
Dim cn As Microsoft.Office.Interop.Excel.WorkbookConnection


and after this:


wb.Close(False)


-Devo

cucho said...

Hello guys again. I have to migrate my code to a 64 bits environment. My question is, do you know if there are some known problems with PIA in 64 bits?

Thanks in advance,

Javier

Jessica M. Moss said...

Hi cucho,

I haven't had to implement this on a 64 bit box yet. I know the 2010 version can work in WoW mode, so that may give you something to work with! http://msdn.microsoft.com/en-us/library/15s06t57(VS.100).aspx

Jessica

Anonymous said...

Hi Jessica,

I currently have two versions of excel installed 2003 and 2007. I am trying to run your code and it defaults to excel 2003, I would like to have it open in 2007 because the excel sheet I have has more than 256 columns. Any insight or suggestion would be helpful.

Thanks

Jessica M. Moss said...

Hi Anonymous,

Differentiating between version of Excel is getting out of my realm of expertise. I would suggest looking through Books Online or trying the MSDN forums for Excel for help!

Jessica

Anonymous said...

We used the above code when using SQL Server 2005, but now we are moving to SQL Server 2008 and it doesn't recognize the Dts. code. Any ideas?

Anonymous said...

We are also migrating to SQL Server SSIS 2008 and the Interop Assemblies will not save in the registry if we use vb 2008. Are there newer dll's that we can't find?

This doesn't happen with C# but I am not sure how to translate this code in C# since I have not used it before.

Thanks for any help.

Jessica M. Moss said...

Hi Anonymous #1,

Dts is still the object to use in 2008. Ensure you're using the script task, not the script component.

Hi Anonymous #2,

Using VB.NET shouldn't affect using the Interop Assemblies, so you may want to look more into that. Other than that, you can try converting code by using a website similar to this: http://converter.telerik.com/

HTH,
Jessica

Anonymous said...

use USING statement to help release/dispose the excel object.

ex.

USING

END USING

Anonymous said...

Thank you Jessica, it helped

Anonymous said...

Hello Jessica,

I have an Excel file with macro (Excel 2003 sp 3) that I want to execute thru SSIS. Now I have a job that calls this SSIS package that won't work. If I run the package from BIDS, it works. If I run the package from MSDB > right-click > run package, it works. If I open the Excel, the macro works as well. But not when the job calls it. The system user that runs the SQL Agent is admin in that machine (Windows Server 2003 R2 SP2 64bit).

Help?

Thanks,
Jaynne

Anonymous said...

I forgot to mention that when the SQL Job runs the package, I can see in Task Manager that the batch file is launched (cmd.exe) and Excel file is opened (EXCEL.exe). But it just stays like that, nothing happens, so I had to end task.

Thanks again,
Jaynne

Anonymous said...

This was exactly what I needed. Thank you!

Jessica M. Moss said...

Hi Jaynne,

It sounds as though there is a permission issue. Try using the advice in this KB article: http://support.microsoft.com/kb/918760 . If that doesn't work, you may want to try posting the symptoms to the MSDN forum.

Good luck!
Jessica

Darren Stanger said...

Jessica,

Regarding the instructions below, would you please post the XML from View Code? I'm having trouble finding the Build Settings ReferencePath because I seldom develop in .net.

Thanks,
Darren Stanger, Certiport, Inc.

In your Solution Explorer, right click on the package and select ‘View Code’.

In the resulting XML, change the Build Settings ReferencePath property to:
"C:\Windows\assembly\GAC_MSIL\Microsoft.SqlServer.ScriptTask\9.0.242.0__89845dcd8080cc91\;C:\Windows\assembly\GAC_MSIL\Microsoft.SqlServer.ManagedDTS\9.0.242.0__89845dcd8080cc91\;C:\Windows\assembly\GAC\Microsoft.Office.Interop.Excel\12.0.0.0__71e9bce111e9429c\"

Also change the Build References to include:

Jessica M. Moss said...

Hi Darren,

The XML will be different for every package due to the identifiers of the different elements. I'm afraid it wouldn't help at all. Try performing a file search for "ReferencePath".

Good luck!
Jessica

Niall said...

Hi Jessica,
Thanks heaps for posting your code - it was exactly what I needed.

IOU 1 * Pizza :)

cheers,
Niall

Anonymous said...

I have an Excel file with macro (Excel 2003) that I want to execute thru SSIS. Now I have a procedure that calls this SSIS package that won't work. If I run the package from BIDS, it works. If I open the Excel, the macro works as well. But not when the procedure calls it and inturn the excel process gets blocked. The system user that runs the SQL Agent is admin in that machine.
Any help in this regard would appreciate. Thanks in advance.

Anonymous said...

Jessica:

Any update regarding my yesterday's query? It would be great if you can provide us clarification. Thanks in advance.

Regard,Krishna

Jessica M. Moss said...

Hi Krishna,

I would start to tackle it from the angle of permission errors that occur when executing an SSIS package from SQL Agent. Try these troubleshooting options: http://support.microsoft.com/kb/918760

Jessica

Anonymous said...

Jessica,

I found your post to be very helpful, but it appears it is for SSIS 2005 and not 2008. Is this true?

We are running SSIS 2008.

Your instruction:
"In the resulting XML, change the Build Settings ReferencePath property to:" cannot be found in the XML code in my SSIS 2008 package after making the script task.

How can I accomplish this same thing you instructed in a 2008 package? Can it be done?

Thanks,

David

Kiran Ramaswamy said...

Old thread I know, but helped me quite a bit. Thanks!

laurens said...

Jessica M. Moss said...

Hello,

Here are a couple of things to try... The first is that you might want to look at setting PivotCache.BackgroundQuery = False on any pivot tables you have - those are set to run in the background, so may be where the issue lies. The other is to try inserting a Thread.Sleep() for a few milliseconds/seconds after you call the Refresh for debugging purposes.
----------------------

Thank you Jessica!! After hours of surfing the web... Well, I found you blog post as the first reference but did not went deep enough in the comments.
Thread.sleep did it for me cause I had troubles with a 'too early close' in Excel 2007/2010.

My code:
oApp.DisplayAlerts = False
wb.RefreshAll()
Threading.Thread.Sleep(10000)
wb.Save()
wb.Close()

Anonymous said...

Hi Jessica... I'm running a macro stored in an excel sheet via SSIS. All I'm doing is opening the excel sheet (.xlsm file) running the macro by running the code ExcelObject.Run("Macro1"). However, its returning a COM release error and unable to release excel process. If I simply open and close the the excel sheets it works fine. Only when I try to try to run the macro (which takes about 5-10 mins to complete running) it gives me the error. Please help as its a critical piece of automation for me.
P.S. - I have already installed Office 2010 Interlop assemblies

Thanks in advance,
Aanjaney

Anonymous said...

You dont need to have a thread that sleeps if you disable all of the background updates from the connections in the excel files.

Ganesh said...

Hi,

I am trying to use Excel Interop with SSIS and i am getting an error "unable to load Interop" when i run the package on a 64bit SQL Server 2008.

Do i have to install excel on the server to use excel interop?

Thanks.

Amar said...

Hi Jessica, I tried your solution and two things. I am unable to find the build settings reference section in the XML of the package and two when i try to run the script task ssis throws the following error:

at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

I am trying to the very simple task of opening an macro enabled excel file and refreshing all data and closing it. my code is as shown below:

public void Main()
{
Dts.TaskResult = (int)ScriptResults.Success;

Microsoft.Office.Interop.Excel.Application oExcel = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook oBook = new Microsoft.Office.Interop.Excel.Workbook();

String sDataSheet = (String)Dts.Variables["User::ExcelOutputFilePath"].Value;

oBook = oExcel.Workbooks.Open(@"strDataSheet");

oBook.RefreshAll();
oBook.Save();
oBook.Close();
oExcel.Quit();

System.Runtime.InteropServices.Marshal.FinalReleaseComObject(oExcel);


}

Please help.

Anonymous said...

Hi Jessica, I tried your solution and I got two errors and one warnning:
Error: 0x4 at Script Task: The binary code for the script is not found.
Error: 0xC0024107 at Script Task: There were errors during task validation.
Warning: 0x80019002 at Package_REFRESH_EXCEL_FILE: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.



I run like admin
the system is win 7/64bit with SQL 2008 r2
I had added the Microsoft.Office.Interop.Excel without problem
The file is share locally
I am using the same code


What could be am missing?

need help
thanks

Anonymous said...

IS a command line to disable alert message ??

Anonymous said...

How do you handle the refresh of the Excel file if it connects to SQL Server. Windows Authentication is not used to login to SQL Server and there is a window prompt "SQL Server Login" for the "Login ID" and "Password". Thanks, Angel

Jessica M. Moss said...

Comments are now closed on this post.