Tuesday, 29 July 2014

Create and execute SSIS packages with Powershell

Create and execute SSIS packages with Powershell

The other day I was working on a project to migrate the backend of an application to Microsoft SQL Server 2008 R2 from another RDBMS engine. I figured that this would be quite easy to do, but when I tried to migrate the database using the SQL Server Import and Export wizard I started to get errors and the data migration failed. After trying several other methods to migrate the database we finally  decided that we would create a version of the  database in SQL Server and then use SQL Server Integration Services (SSIS) to migrate the data from the other RDBMS engine to Microsoft SQL Server 2008 R2.

This sounded easy enough until I realized that we would have to create over 900 table mappings between the source and destination databases  using the SQL Server Business Intelligence Development Studio (BIDS). I was not looking forward to doing that. So what should I do?

I spent the rest of that afternoon doing some research and I kept coming across blog postings that indicated that I could create an SSIS packages dynamically from a template. This sounded like a good idea and I was intrigued by the concept of automating the creation of all of the SSIS packages that I would need to do the migration of the data from an SSIS template. The only problem with doing that was that I would now have over 900 SSIS packages that I would need to run simultaneously and there was no easy way to do that in BIDS. Bummer!


So it was back to the drawing board to figure out that part.  After a long day of googling and at the point at which I was becoming bleary eyed from reading too many blog postings I finally got the inspiration that I needed from these two posts:
  • http://stackoverflow.com/questions/4878733/assembling-ssis-packages-in-powershell
  •  http://billfellows.blogspot.com/2011/11/powershell-manipulation-of-ssis.html

Both of those postings spoke to manipulating SSIS packages from Powershell and that was when the light bulb went off. The answer to my problem was to use a Powershell runspacepool to create the SSIS package from the template, save it, and then execute it all in one go. Now that sounds like an idea that would work for me.

Creating the SSIS Template

I had to decide what I would need in the SSIS template. At a minimum I would need a Data Flow Task and the connections to the source and the destination databases. The other thing that I decided to do was to ensure that the connection to the source database would be the first connection in SSIS Connection Manager and I did this by putting a "dot" as the first character in the name for the source connection. I also decided to use ADO.NET connections for the source and the destination databases because I was going to use an  ODBC driver to connect to the source RDBMS engine. The following image is an example of the SSIS Template.



One of the "gotchas" that I came across in using the SSIS Template based approach is the fact that BIDS is a 32-bit environment and I was using a 64-bit Powershell environment.  This meant  that  whatever DSN that I setup for the ODBC driver had to be visible from both the 32-bit environment and the 64-bit environment. The best way that I found to overcome that problem was to create  the ODBC DSN as a User DSN and it would be visible in both the 32-bit and the 64-bit environments.

Create and execute the SSIS package with Powershell

I had my SSIS template and I was ready to create my first SSIS package using the SSIS template. However programming SSIS was all new to me and I need some guidance on how to do this. I decided to use the Stack Overflow post as a starting point.

The first thing that I had to do was to load the assemblies and I did this by using [Reflection.Assembly]::LoadWithPartialName() as follows:

[Void][Reflection.Assembly]::LoadWithPartialName('Microsoft.Sqlserver.DTSPipelineWrap')
[Void][Reflection.Assembly]::LoadWithPartialName('Microsoft.SQLServer.DTSRuntimeWrap')
[Void][Reflection.Assembly]::LoadWithPartialName('Microsoft.SQLServer.ManagedDTS')

I know that some persons might ask why not use Add-Type to load the assemblies, and this can be done using the following syntax:

Add-Type -AssemblyName 'Microsoft.SqlServer.DTSPipelineWrap, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'
Add-Type -AssemblyName 'Microsoft.SQLServer.DTSRuntimeWrap, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'
Add-Type -AssemblyName 'Microsoft.SQLServer.ManagedDTS, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'

I used  [Reflection.Assembly]::LoadWithPartialName() to load the required assemblies because it allowed me to use the short version of the assembly name.  If I used Add-Type I would have had to use the long version of the assembly name as shown in the example. I know that [Reflection.Assembly]::LoadWithPartialName() is depreciated and I guess I will have to start using Add-Type if I want my scripts to work in the future :).

The next part of the script loads the actual SSIS template. The important part to note is that I create an Microsoft.SqlServer.Dts.Runtime.Application object and then  I use the LoadPackage() method  of the Application Class to create a Package Class object. Once the Package Class object is created I set a few of the properties of the object.

# Load the template
if ($template -eq $null) {
    $template = '.\Template\ODBCDBImportTemplate.dtsx'
}


# Load the template
$app = New-Object Microsoft.SqlServer.Dts.Runtime.Application
$package = $app.LoadPackage($template, $null)
$package.CreatorComputerName = $env:COMPUTERNAME
$package.Name = "SSIS_" + $tablename

Before I can start to create the ADO.NET source and destination connections I have to unwrap the SSIS object model and I start to do this by accessing the  "Data Flow Task".

# Get the data flow task
$dataflowtask = $package.Executables['Data Flow Task']
$null = $dataflowtask.SetExpression('DefaultBufferMaxRows', '10000')
$null = $dataflowtask.SetExpression('DefaultBufferSize', '104857600')
$null = $dataflowtask.SetExpression('EngineThreads', '2')

Once I have access to the "Data Flow Task" I need to get access to the MainPipeline Class so that I can programmatically build a data flow layout.

# Get the Taskhost
$XferTaskTH = [Microsoft.SqlServer.Dts.Runtime.TaskHost]$dataflowtask

# Get the pipeline
$DataPipe = [System.Runtime.InteropServices.Marshal]::CreateWrapperOfType($XferTaskTH.InnerObject, [Microsoft.SqlServer.Dts.Pipeline.Wrapper.MainPipeClass])

At this point I create the ADO.NET source and destination adapters.
Let us look at the ADO.NET source adapter.  I start by adding a new data flow component.

# Set the ADO.NET Source
$adonetSource = $DataPipe.ComponentMetaDataCollection.New()

I  then set the ComponentClassID to the type of component I want to create, and in this case it is an ADO.NET source component.

$adonetSource.ComponentClassID = "Microsoft.SqlServer.Dts.Pipeline.DataReaderSourceAdapter, Microsoft.SqlServer.ADONETSrc, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"

$instanceSource = [System.Runtime.InteropServices.Marshal]::CreateWrapperOfType($adonetSource.Instantiate(), [Microsoft.SqlServer.Dts.Pipeline.Wrapper.CManagedComponentWrapperClass])

$instanceSource.ProvideComponentProperties()

I then connect the ADO.NET source adapter to the source RDMBS  connection in our SSIS template.

$adonetSource.RuntimeConnectionCollection.Item(0).ConnectionManager = [Microsoft.SqlServer.Dts.Runtime.DtsConvert]::GetExtendedInterface($package.Connections[0])
$adonetSource.RuntimeConnectionCollection.Item(0).ConnectionManagerID = $package.Connections[0].ID


I then point the ADO.NET source adapter to the source table in RDBMS, and we initialize the column mappings.

# Set Source Parameters
$sourcetablename = "`"dbo`".`"$tablename`""
$null = $instanceSource.SetComponentProperty("AccessMode", 0)
$null = $instanceSource.SetComponentProperty("TableOrViewName", "$sourcetablename")

# Refresh the source table metadata
$instanceSource.AcquireConnections($null)
$instanceSource.ReinitializeMetaData()
$instanceSource.ReleaseConnections()

After I have setup the ADO.NET source adapter  I then move on to setup the ADO.NET destination adapter. To do this I add another data flow component.

# Set the ADO.Net Destination
$adonetDestination = $DataPipe.ComponentMetaDataCollection.New()

I then set the ComponentClassID to an ADO.NET destination.

$adonetDestination.ComponentClassID = "Microsoft.SqlServer.Dts.Pipeline.ADONETDestination, Microsoft.SqlServer.ADONETDest, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"

$instanceDestination = [System.Runtime.InteropServices.Marshal]::CreateWrapperOfType($adonetDestination.Instantiate(), [Microsoft.SqlServer.Dts.Pipeline.Wrapper.CManagedComponentWrapperClass])

$instanceDestination.ProvideComponentProperties()

I then connect the ADO.NET destination adapter to the destination SQL Server 2008 R2 connection in our SSIS template.

$adonetDestination.RuntimeConnectionCollection.Item(0).ConnectionManager = [Microsoft.SqlServer.Dts.Runtime.DtsConvert]::GetExtendedInterface($package.Connections[1])
$adonetDestination.RuntimeConnectionCollection.Item(0).ConnectionManagerID = $package.Connections[1].ID

I setup the destination table parameters.

# Set the destination parameters
$destinationtablename = "`"dbo`".`"$tablename`""
$null = $instanceDestination.SetComponentProperty("BatchSize", 0)
$null = $instanceDestination.SetComponentProperty("TableOrViewName", "$destinationtablename")
$null = $instanceDestination.SetComponentProperty("UseBulkInsertWhenPossible", $true)


# Refresh the destination table metadata
$instanceDestination.AcquireConnections($null)
$instanceDestination.ReinitializeMetaData()
$instanceDestination.ReleaseConnections()

Once I have both the ADO.NET source and destination adapters in place I then go ahead and connect both adapters.

# Create the path from source to destination.
$path = $DataPipe.PathCollection.New()
$null = $path.AttachPathAndPropagateNotifications($adonetSource.OutputCollection.Item(0),$adonetDestination.InputCollection.Item(0))

The next step is to map the input and the output columns.

$destInput = $adonetDestination.InputCollection.Item(0)
$destVirInput = $destInput.GetVirtualInput()
$destInputCols = $destInput.InputColumnCollection
$destExtCols = $destInput.ExternalMetadataColumnCollection
$sourceColumns = $adonetSource.OutputCollection.Item(0).OutputColumnCollection

$MapColumnsMessage = "Mapping columns for $tablename"
Write-Verbose $MapColumnsMessage


# The map the source and destination columns
foreach ($outputCol in $sourceColumns) {
    # Get the external column id
    $extCol = $destExtCols.Item($outputCol.Name)
    if ($extCol -ne $null) {
        # Create an input column from an output col of previous component.
        $null = $destVirInput.SetUsageType($outputCol.ID, [Microsoft.SqlServer.Dts.Pipeline.Wrapper.DTSUsageType]::UT_READONLY)
        $inputCol = $destInputCols.GetInputColumnByLineageID($outputCol.ID)
        if ($inputCol -ne $null) {
            # map the input column with an external metadata column
            $null = $instanceDestination.MapInputColumn($destInput.ID, $inputCol.ID, $extCol.ID)
        }
    }
}

Once I am done I go ahead and save the package to the file system.

# Save the package with the table name
$newpackagename = "$tablename.dtsx"
$template = ".\DSTXPackages\$newpackagename"
$app.SaveToXml($template, $package, $null)

$package.Dispose()
Clear-Variable app
Clear-Variable package

This final piece of code demonstrates how to run the saved  SSIS package.

# Create a new app and execute the package
$appex = New-Object Microsoft.SqlServer.Dts.Runtime.Application
$packageex = $appex.LoadPackage($template, $null)
$packageresults = $packageex.Execute()

if ($packageresults -eq 'Success') {
    $SucessMessage = "Sucessfully completed data import for $tablename. Start Time: " + $packageex.StartTime + ". Stop Time: " + $packageex.StopTime
    Write-Verbose $SucessMessage

}
if ($packageresults -eq 'Failure') {
    foreach ($dtserror in $packageex.Errors) {
      $ErrorMessage += $dtserror.Description.ToString()
    }
   
    $FailMessage = "Failed to complete data import for $tablename"
    Write-Error "$FailMessage + [System.Environment]::NewLine + $ErrorMessage"
 
   
   
   
}
$packageex.Dispose()
Clear-Variable appex
Clear-Variable packageex


The SSIS package that was created is shown in the image below.



The following code snippet shows how we would go about creating an OLEDB source and destination adapters.

# Set the OLEDB Source
$oledbSource = $DataPipe.ComponentMetaDataCollection.New()

$oledbSource.ComponentClassID = "DTSAdapter.OLEDBSource.2"

$instanceSource = [System.Runtime.InteropServices.Marshal]::CreateWrapperOfType($oledbSource.Instantiate(), [Microsoft.SqlServer.Dts.Pipeline.Wrapper.CManagedComponentWrapperClass])

$instanceSource.ProvideComponentProperties()

$oledbSource.RuntimeConnectionCollection.Item(0).ConnectionManager = [Microsoft.SqlServer.Dts.Runtime.DtsConvert]::GetExtendedInterface($package.Connections[0])

$oledbSource.RuntimeConnectionCollection.Item(0).ConnectionManagerID = $package.Connections[0].ID



# Set Source Parameters
$sourcetablename = "`"dbo`".`"$tablename`""
$null = $instanceSource.SetComponentProperty("AccessMode", 0)
$null = $instanceSource.SetComponentProperty("OpenRowset", "$sourcetablename")


# Refresh the source table metadata
$instanceSource.AcquireConnections($null)
$instanceSource.ReinitializeMetaData()
$instanceSource.ReleaseConnections()

# Set the OLEDB Destination
$oledbDestination = $DataPipe.ComponentMetaDataCollection.New()

$oledbDestination.ComponentClassID = "DTSAdapter.OLEDBDestination.2"

$instanceDestination = [System.Runtime.InteropServices.Marshal]::CreateWrapperOfType($oledbDestination.Instantiate(), [Microsoft.SqlServer.Dts.Pipeline.Wrapper.CManagedComponentWrapperClass])

$instanceDestination.ProvideComponentProperties()

$oledbDestination.RuntimeConnectionCollection.Item(0).ConnectionManager = [Microsoft.SqlServer.Dts.Runtime.DtsConvert]::GetExtendedInterface($package.Connections[1])

$oledbDestination.RuntimeConnectionCollection.Item(0).ConnectionManagerID = $package.Connections[1].ID

# Set the destination parameters
$destinationtablename = "[dbo].[$tablename]"
$null = $instanceDestination.SetComponentProperty("AccessMode", 3)
$null = $instanceDestination.SetComponentProperty("OpenRowset", "$destinationtablename")
$null = $instanceDestination.SetComponentProperty("FastLoadKeepIdentity", $true)
$null = $instanceDestination.SetComponentProperty("FastLoadOptions", "TABLOCK,CHECK_CONSTRAINTS")


# Refresh the destination table metadata
$instanceDestination.AcquireConnections($null)
$instanceDestination.ReinitializeMetaData()
$instanceDestination.ReleaseConnections()

This blog post outlines the basics of manipulating the SSIS API from Powershell to create a SSIS package from a template. The next series of posts will show how I turned this basic script into a tool to automate the creation of SSIS packages from a template and then execute the SSIS packages from Powershell runspaces to do data migration in parallel threads.