Tuesday, 27 August 2013

A PowerShell tool for deleting lines in a CSV file based on another CSV file


The other day, one of the daily processes that we run as part of our IT Operations failed spectacularly. We have to upload a CSV file to a transaction processing system so that it can maintain customer information in case our primary system goes offline. When we tried to upload the file, it failed because it contained records that were already loaded into the database. By the way, the two systems are dissimilar, so some form of replication or log shipping would not have worked in this case.



We had seen this before. The usual fix was to run a query against the database to find the duplicate records and then delete the one or two duplicate lines from the file. This time, however, the query returned a few hundred records instead of one or two. That was certainly not good news, because there was no way we could delete a few hundred duplicate lines from the file by hand in time to complete the upload. Or was there?