Month: February 2017

How To Remove Duplicate Records In SQL Server Using SSIS

Posted on Updated on

There are multiple ways to remove duplicate records in SQL Server.We can eliminate duplicate rows by using Sort and aggregate transformation. First we’ll create some sample records.

IF OBJECT_ID(N'SSISTEST','U') IS NOT NULL
DROP DATABASE SSISTEST
GO
CREATE DATABASE SSISTEST
GO
USE SSISTEST
GO
IF OBJECT_ID(N'RemovingDuplicates','U') IS NOT NULL
DROP TABLE RemovingDuplicates
GO
CREATE TABLE RemovingDuplicates( Dealer varchar(20)
                                 ,Make varchar(20))
GO
INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('MGroup','HYUNDAI')

INSERT INTO RemovingDuplicates(Dealer,Make) VALUES ('pqrGroup','HONDA')

INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('pqrGroup','HONDA')

 INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('pqrGroup','FORD')

INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('ABCgroup','FORD')
SELECT * FROM RemovingDuplicates
Dealer Make
MGroup HYUNDAI
pqrGroup HONDA
pqrGroup HONDA
pqrGroup FORD
ABCgroup FORD

The highlighted records are duplicates in our case.

Using aggregate Transformation  

  1. First we create a package named NonDuplicateData.dtsx.
  2. Next Drag and drop data-flow task on designer.
  3. Right click the Data Flow task and choose Edit.
  4. You are now inside the data flow task. Drag an OLEDB source task from the SSIS toolbox to the design screen and make a connection and select the table. To preview data, Click on preview

1_remove-duplicate_ssis

5. Now Drag aggregate transformation and configure the Aggregate transformation as shown below. Take the count of records and do a group by all the columns present in the table.

2remove-duplicate_ssis

  1. Now drag and drop Data Conversion and connect the precedence constraint between aggregate and Data Conversion.

Right click on the precedence constraint between Data Conversion and Derived column and click Enable Date Viewer. This will allow us to view the data as it passes through the  constraint.

.3remove-duplicate_ssis

Using Sort Transformation:

The Sort Transformation in SSIS is used to sort the source data in either Ascending or Descending order, which is similar to T-SQL command ORDER BY statement.

Follow same steps up to 4.

Now instead of aggregate transformation use sort transformation and connect it with Data Conversion and Enable Date Viewer.

4_remove-duplicate_ssis

Check remove rows with duplicate sort values check box to remove duplicate values.

5_remove-duplicate_ssis

 

 

 

 

 

Advertisements

Verifying updated values in SQL SERVER

Posted on Updated on

In most cases after table is updated we fire a select statement to check whether the value is updated correctly or not.

We can accomplish the same task by using OUTPUT clause in SQL server. Using Output clause we can verify the original value and updated values.

Here I’ll update column  MaritalStatus of employee table of AdventureWorks2008 database.

Before doing update we’ll check the values of MaritalStatus Employee table.

SELECT   BusinessEntityID
       , MaritalStatus

FROM  [HumanResources].[Employee]

WHERE BusinessEntityID IN ( 1,2,4)

OUTPUT:

BusinessEntityID MaritalStatus
1 S
2 S
4 S

 

In below query we are updating MaritalStatus from ‘S’ to ‘M’

UPDATE [HumanResources].[Employee]

SET MaritalStatus ='M'

OUTPUT  deleted.BusinessEntityID
       ,deleted.MaritalStatus AS BeforeUpdateMaritalStatus
       ,inserted.MaritalStatus AS AfterUpdateMaritalStatus
WHERE  MaritalStatus='S'
AND BusinessEntityID IN ( 1,2,4)

 

After executing the query we’ll get below  output which shows values before and after update.

BusinessEntityID BeforeUpdateMaritalStatus AfterUpdateMaritalStatus
1 S M
2 S M
4 S M

We again cross verify by running below query.

SELECT  BusinessEntityID
        ,MaritalStatus 
FROM  [HumanResources].[Employee]
WHERE BusinessEntityID IN ( 1,2,4)
BusinessEntityID MaritalStatus
1 M
2 M
4 M

So now we can use output clause directly instead of using select statements to verify updated values and original values.