Pages

Friday 15 May 2015

Automation Nightmare - Updating Excel Spreadsheet from Powershell

Automation Nightmare - Updating  Excel Spreadsheet from Powershell

The assignment was very simple. All I had to do was to login into SQL Server Management Studio (SSMS), run a query ,  copy and paste the results of the query into an Excel Spreadsheet Template and then save that template using an agreed naming convention. The only problem was that I had to run the query with different parameters and that would mean doing it nearly one hundred times.

So I dutifully began my task and after a few hours I had only extracted the data for about three of the scenarios. The way that I was going about doing the extract would take forever and a day and I did not have that much time.

They say that "If you have a hammer, everything looks like a nail" and so I decided to use my Powershell hammer to nail my data extraction problem, so to speak. As far as I was concerned Powershell was the ideal tool for the job. I could use Powershell to run the queries against the SQL Server database and return the data in a dataset. Once I created the dataset I would then  loop through the data and update each worksheet in the Excel Spreadsheet Template  and then save the spreadsheet in the agreed naming convention.  Life could not be easier.


Without any hesitation I fired up Powershell ISE and in a few minutes wiped up my first function that ran the query against the SQL Server and returned the data in a dataset. I clicked the run script icon in Powershell ISE, typed the required parameters, and almost immediately the results of the query was being displayed in the console. Now how easy was that?

The next function that I wrote would be the one to loop through the dataset and save the data to worksheets in the Excel Spreadsheet Template. The pertinent part of the function is as follows.

foreach ($client in $clients.Tables) {

        if ($client.Columns.Count -eq 1) {

            $strtempfn1 = $StartupLocation + "/Data/" + $client.Rows.Item(0).Item(0) + ".xls"
           
            # if the file exists continue
            # Otherwise open the template and save it as the new file name
            if (Test-Path $strtempfn1) {
                continue
            }
            else {

                New-ExcelSpreadSheet -Template $PathtoDataTemplate -Name $strtempfn1
                continue

            }
          
        }
       
        # Open the saved excel spreadsheet and update it with the client records
        $ClientXls = New-Object -Com Excel.Application
        $ClientWorkbook = $ClientXls.Workbooks.Open($strtempfn1)
        $page = 'Clients'
        $ws = $ClientWorkbook.worksheets | Where-Object { $_.Name -eq $page }

        # Set variables for the worksheet cells, and for navigation
        # Start at second row which is below the header
        $cells = $ws.Cells
        $row = 2
        $col = 1

        $rowcount = $client.Rows.Count

        for ($i = 0; $i -lt $rowcount; $i++) {
            $col = 1

            $cells.item($row,$col) = $client.Rows[$i]["Client No"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["First Name"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Middle Name"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Last Name"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Date of Birth"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Gender"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Address 1"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Address 2"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["City"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Zip"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["State/Province"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Mobile Number"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Phone"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["E-Mail Address"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["ID Type"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["ID Number"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["ID Valid Until"]
            $col++
            $cells.item($row,$col) = $client.Rows[$i]["Age"]
           
            $row++

        }

        # Set the width of the columns automatically
        $wsRng = $ws.usedRange
        $wsRng.EntireColumn.AutoFit()
       
        # Close the workbook and exit Excel
        $wsRng = $null
        $cells = $null
        $ws = $null

        [void]$ClientWorkbook.Close($true)
        $ClientWorkbook = $null

        [void]$ClientXls.Quit()
       
        [void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($ClientXls)

        $ClientXls = $null
      
 }

The function did not do anything extraordinary. All that it would do is to use COM to create an Excel application, and then loop through each row in the dataset and write the data to the correct row and column in the worksheet.

My first test was to update a single worksheet  in the Excel Spreadsheet Template and I was quite happy with how fast the process was. My next test was to update all of the worksheets  in the Excel Spreadsheet Template. I clicked the run script icon in the Powershell ISE and I waited, and waited, and waited, and waited some more. "Ok, this is taking longer than I expected, " I said. "Let's just check the SQL Server to ensure that my query was not being blocked."

A quick check of the SQL Server using  SSMS showed that there was nothing happening there. The query had already been executed. So what could be the problem? I was stumped, and then it dawned on me. Could it be that opening up a spreadsheet and updating each row and column the way that I was doing it was the cause of the problem? I decided to wait and see how long it would take to create the spreadsheet.

About thirty minutes later the data extract was done and my spreadsheet had been created. I opened the spreadsheet and confirmed that the worksheets had been updated as I had wanted them to be, but why was the process so slow? A quick google showed Excel Automation via Powershell was painfully slow. In fact so slow that I would be better off not automating the data extraction process. It looked  like the Excel Automation had driven a stake through the heart of my Powershell Automation euphoria, and I did not like it one bit.

Was there another way? Had Powershell based automation finally met its match? Not quite ready to give up I decided to spend some time trying to figure out why updating the Excel spreadsheet via Powershell was so slow, because copying and pasting the same data from the results window in SSMS into the Excel Spreadsheet did not take any time at all.

Hold on one minute! Copy and paste, that was it! I did a quick goolge and came across this link that described the exact problem that I was facing. I fired backup Powershell ISE and modified the function that created the Excel spread and the relevant part follows.
  

foreach ($client in $clients.Tables) {

        if ($client.Columns.Count -eq 1) {

            $strtempfn1 = $StartupLocation + "/Data/" + $client.Rows.Item(0).Item(0) + ".xls"
           
            # if the file exists continue
            # Otherwise open the template and save it as the new file name
            if (Test-Path $strtempfn1) {
                continue
            }
            else {

                New-ExcelSpreadSheet -Template $PathtoDataTemplate -Name $strtempfn1
                continue

            }
            #continue
        }
       
        $clientdata = ""
        $rowcount = $client.Rows.Count

        for ($i = 0; $i -lt $rowcount; $i++) {

                $clientdata += $client.Rows[$i]["Client No"].ToString() + "`t"
                $clientdata += $client.Rows[$i]["First Name"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Middle Name"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Last Name"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Date of Birth"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Gender"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Address 1"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Address 2"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["City"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Zip"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["State/Province"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Mobile Number"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Phone"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["E-Mail Address"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["ID Type"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["ID Number"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["ID Valid Until"].ToString()  + "`t"
                $clientdata += $client.Rows[$i]["Age"].ToString()
           
                $clientdata += "`r`n"

        }


        # Clear the clipboard and copy the client data to it
        [System.Windows.Forms.Clipboard]::Clear()
        [System.Windows.Forms.Clipboard]::SetText($clientdata)

        # Open the saved excel spreadsheet and update it with the client records
        CopyTo-ExcelSpreadSheet -WorkSheetName 'Clients' -Name $strtempfn1
      
}


# Copy Clipboard data to excel spreadsheet
function CopyTo-ExcelSpreadSheet {
    Param (
        $WorkSheetName,
        $Name
    )

    # Open the saved excel spreadsheet and update it with the data
    $UpdateXls = New-Object -Com Excel.Application
    $UpdateXls.Visible = 0;
    $UpdateWorkbook = $UpdateXls.Workbooks.Open($Name)

    # Open the Clienst worksheet
    $ws = $UpdateWorkbook.WorkSheets.Item($WorkSheetName)

    # Copy the data from the clipboard to the worksheet
    # Staring at the second row, because the first row has the headers
    $ws.activate() 
    $range = $ws.Range("A2")
    $ws.Paste($range

    # Set the width of the columns automatically
    $wsRng = $ws.usedRange
    $wsRng.EntireColumn.AutoFit()
       
    # Close the workbook and exit Excel
    $wsRng = $null
    $cells = $null
    $ws = $null

    [void]$UpdateWorkbook.Close($true)
    $UpdateWorkbook = $null

    [void]$UpdateXls.Quit()
       
    [void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($UpdateXls)
    $UpdateXls = $null
   

}

It was time to test once again. The first test that I did was to update a single worksheet in the Excel  Spreadsheet Template and that went through very quickly. However I had been here before and the real test would be when I tried to update all of the worksheets in the Excel Spreadsheet Template.

I fired off my next test and like magic the creation of the Excel spreadsheet happened in seconds. I opened the created spreadsheet and I was able to see that all of the worksheets  had been updated. My final test was to create all of the one hundred spreadsheets and fifty minutes later I had created them all.

I randomly opened some of the spreadsheets and checked that the worksheets had been updated in the way that I expected them to be and it was indeed so.

So what did I learn from this experience? " A worker may be the hammer's master, but the hammer still prevails. A tool knows exactly how it is meant to be handled, while the user of the tool can only have an approximate idea." - Milan Kundera.


In other words you have to how to handle the hammer so that it can be effective in every situation.

No comments:

Post a Comment