SSIS

How To Remove Duplicate Records In SQL Server Using SSIS

Posted on Updated on

There are multiple ways to remove duplicate records in SQL Server.We can eliminate duplicate rows by using Sort and aggregate transformation. First we’ll create some sample records.

IF OBJECT_ID(N'SSISTEST','U') IS NOT NULL
DROP DATABASE SSISTEST
GO
CREATE DATABASE SSISTEST
GO
USE SSISTEST
GO
IF OBJECT_ID(N'RemovingDuplicates','U') IS NOT NULL
DROP TABLE RemovingDuplicates
GO
CREATE TABLE RemovingDuplicates( Dealer varchar(20)
                                 ,Make varchar(20))
GO
INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('MGroup','HYUNDAI')

INSERT INTO RemovingDuplicates(Dealer,Make) VALUES ('pqrGroup','HONDA')

INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('pqrGroup','HONDA')

 INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('pqrGroup','FORD')

INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('ABCgroup','FORD')
SELECT * FROM RemovingDuplicates
Dealer Make
MGroup HYUNDAI
pqrGroup HONDA
pqrGroup HONDA
pqrGroup FORD
ABCgroup FORD

The highlighted records are duplicates in our case.

Using aggregate Transformation  

  1. First we create a package named NonDuplicateData.dtsx.
  2. Next Drag and drop data-flow task on designer.
  3. Right click the Data Flow task and choose Edit.
  4. You are now inside the data flow task. Drag an OLEDB source task from the SSIS toolbox to the design screen and make a connection and select the table. To preview data, Click on preview

1_remove-duplicate_ssis

5. Now Drag aggregate transformation and configure the Aggregate transformation as shown below. Take the count of records and do a group by all the columns present in the table.

2remove-duplicate_ssis

  1. Now drag and drop Data Conversion and connect the precedence constraint between aggregate and Data Conversion.

Right click on the precedence constraint between Data Conversion and Derived column and click Enable Date Viewer. This will allow us to view the data as it passes through the  constraint.

.3remove-duplicate_ssis

Using Sort Transformation:

The Sort Transformation in SSIS is used to sort the source data in either Ascending or Descending order, which is similar to T-SQL command ORDER BY statement.

Follow same steps up to 4.

Now instead of aggregate transformation use sort transformation and connect it with Data Conversion and Enable Date Viewer.

4_remove-duplicate_ssis

Check remove rows with duplicate sort values check box to remove duplicate values.

5_remove-duplicate_ssis

 

 

 

 

 

Task- Bulk Insertion

Posted on Updated on

Bulk insertion is used to transfer data from files to SQL server tables.

In this article, I provide an overview of bulk insertion task of SSIS.

  1. Create a package named BulkInsert.dtsx
  2. Open SSMS and create Employee table in AdventureWorks2012.

USE [AdventureWorks2012]

GO

CREATE TABLE [dbo].[Employee](

[EmpId] [int] NOT NULL,

[JobTitle] [nvarchar](50) NOT NULL,

[BirthDate] [date] NOT NULL,

[MaritalStatus] [nchar](1) NOT NULL,

[Gender] [nchar](1) NOT NULL,

[HireDate] [date] NOT NULL

) ON [PRIMARY]

GO

  1. Back to your dtsx package and drag -n drop Bulk insert task on control flow design pane. The Task editor has red sign on upper right corner of task indicates it’s not configured yet.
  2. Right click on Task and choose edit option or double click on task.

In General provide name LoadEmployeeDetails AND in description load Employee details from flat file.bulk-insertion-task

1

  1. Click on Connection tab. From the Connection drop down box, select.

This will open the Configure OLE DB Connection Manager dialog. Now, you’re going to create a connection to the AdventureWorksDW12 database

2

  1. Click OK to go back to the previous screen, and click OK again to return to the Bulk Insert

Task Editor.

You’ll now see that the Connection Manager you just created has been transposed into the Connection dropdown box.

  1. To Define Destination Table – Click on destination table and select dbo.Employee
  2. Leave the remaining options set to the defaults. The RowDelimiter property option will be

{CR}{LF} (a carriage return) and the ColumnDelimiter property should be set to Comma {,}.

  1. For the File option, again select to create a new Connection Manager.

This will open the File Connection Manager Editor.

3

  1. For the Usage Type, select Existing File. Then point to employee.csv for the File option. Click

OK to return to the editor. Your final Bulk Insert Task Editor screen should look similar to

If you have column header in flat file then go to Options and change first row to 2.

This would start to import data from second row.

Finally right-click on package and execute it. After execution completed go to SSMS and check the table data.