How To Remove Duplicate Records In SQL Server Using SSIS

Posted on Updated on

There are multiple ways to remove duplicate records in SQL Server.We can eliminate duplicate rows by using Sort and aggregate transformation. First we’ll create some sample records.

IF OBJECT_ID(N'SSISTEST','U') IS NOT NULL
DROP DATABASE SSISTEST
GO
CREATE DATABASE SSISTEST
GO
USE SSISTEST
GO
IF OBJECT_ID(N'RemovingDuplicates','U') IS NOT NULL
DROP TABLE RemovingDuplicates
GO
CREATE TABLE RemovingDuplicates( Dealer varchar(20)
                                 ,Make varchar(20))
GO
INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('MGroup','HYUNDAI')

INSERT INTO RemovingDuplicates(Dealer,Make) VALUES ('pqrGroup','HONDA')

INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('pqrGroup','HONDA')

 INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('pqrGroup','FORD')

INSERT INTO RemovingDuplicates(Dealer,Make)VALUES ('ABCgroup','FORD')
SELECT * FROM RemovingDuplicates
Dealer Make
MGroup HYUNDAI
pqrGroup HONDA
pqrGroup HONDA
pqrGroup FORD
ABCgroup FORD

The highlighted records are duplicates in our case.

Using aggregate Transformation  

  1. First we create a package named NonDuplicateData.dtsx.
  2. Next Drag and drop data-flow task on designer.
  3. Right click the Data Flow task and choose Edit.
  4. You are now inside the data flow task. Drag an OLEDB source task from the SSIS toolbox to the design screen and make a connection and select the table. To preview data, Click on preview

1_remove-duplicate_ssis

5. Now Drag aggregate transformation and configure the Aggregate transformation as shown below. Take the count of records and do a group by all the columns present in the table.

2remove-duplicate_ssis

  1. Now drag and drop Data Conversion and connect the precedence constraint between aggregate and Data Conversion.

Right click on the precedence constraint between Data Conversion and Derived column and click Enable Date Viewer. This will allow us to view the data as it passes through the  constraint.

.3remove-duplicate_ssis

Using Sort Transformation:

The Sort Transformation in SSIS is used to sort the source data in either Ascending or Descending order, which is similar to T-SQL command ORDER BY statement.

Follow same steps up to 4.

Now instead of aggregate transformation use sort transformation and connect it with Data Conversion and Enable Date Viewer.

4_remove-duplicate_ssis

Check remove rows with duplicate sort values check box to remove duplicate values.

5_remove-duplicate_ssis

 

 

 

 

 

Advertisements

Verifying updated values in SQL SERVER

Posted on Updated on

In most cases after table is updated we fire a select statement to check whether the value is updated correctly or not.

We can accomplish the same task by using OUTPUT clause in SQL server. Using Output clause we can verify the original value and updated values.

Here I’ll update column  MaritalStatus of employee table of AdventureWorks2008 database.

Before doing update we’ll check the values of MaritalStatus Employee table.

SELECT   BusinessEntityID
       , MaritalStatus

FROM  [HumanResources].[Employee]

WHERE BusinessEntityID IN ( 1,2,4)

OUTPUT:

BusinessEntityID MaritalStatus
1 S
2 S
4 S

 

In below query we are updating MaritalStatus from ‘S’ to ‘M’

UPDATE [HumanResources].[Employee]

SET MaritalStatus ='M'

OUTPUT  deleted.BusinessEntityID
       ,deleted.MaritalStatus AS BeforeUpdateMaritalStatus
       ,inserted.MaritalStatus AS AfterUpdateMaritalStatus
WHERE  MaritalStatus='S'
AND BusinessEntityID IN ( 1,2,4)

 

After executing the query we’ll get below  output which shows values before and after update.

BusinessEntityID BeforeUpdateMaritalStatus AfterUpdateMaritalStatus
1 S M
2 S M
4 S M

We again cross verify by running below query.

SELECT  BusinessEntityID
        ,MaritalStatus 
FROM  [HumanResources].[Employee]
WHERE BusinessEntityID IN ( 1,2,4)
BusinessEntityID MaritalStatus
1 M
2 M
4 M

So now we can use output clause directly instead of using select statements to verify updated values and original values.

How to get working days between two dates in SQL Server

Posted on Updated on

This post is about how to find the working day between Two Dates.

Consider the working days from Monday to Friday. We know that DATEDIFF function is used to calculate the Difference between two dates.

CREATE FUNCTION [dbo].fn_CountOfWorkingDays
( @StartDate DATETIME
  ,@EndDate  DATETIME
)
RETURNS  INT
AS
BEGIN
     DECLARE @CountOfWorkingDays INT
     SELECT @CountOfWorkingDays= ((DATEDIFF(dd, @StartDate, @EndDate) + 1) 
 -(DATEDIFF(wk, @StartDate, @EndDate) * 2)
 -(CASE WHEN DATENAME(dw, @StartDate) = 'Sunday' THEN 1 ELSE 0 END)
 -(CASE WHEN DATENAME(dw, @EndDate) = 'Saturday' THEN 1 ELSE 0 END))
RETURN @CountOfWorkingDays
END

Output:

SELECT [dbo].[fn_CountOfWorkingDays] ( '2017-01-01','2017-01-31') CountOfWorkingDays 
GO
CountOfWorkingDays
22

In above function  DATEDIFF(dd,@StartDate,@EndDate ) + 1 gives us  the number of days between the two dates and  DATEDIFF(wk,@StartDate,@EndDate ) this gives the number of weeks between dates and * 2 gives us the weekend (Saturday and Sunday) count.CASE statement will excludes the day if it’s a Saturday or Sunday.

In case you want to calculate working days from Monday to Saturday then just remove last case statement and * 2 from  (DATEDIFF(wk, @StartDate, @EndDate) * 2)

ALTER FUNCTION [dbo].fn_CountOfWorkingDays
( @StartDate DATETIME, 
 @EndDate    DATETIME )
RETURNS  INT
AS
BEGIN
 DECLARE @CountOfWorkingDays INT
 SELECT @CountOfWorkingDays=
             ((DATEDIFF(dd, @StartDate, @EndDate) + 1)
 -(DATEDIFF(wk, @StartDate, @EndDate) )
 -(CASE WHEN DATENAME(dw, @StartDate) = 'Sunday' THEN 1 ELSE 0 END) )
RETURN @CountOfWorkingDays
END

Output:

SELECT [dbo].[fn_CountOfWorkingDays] ( '2017-02-01','2017-02-28') CountOfWorkingDays 
GO
CountOfWorkingDays
24

 

Whats the Difference Between Detaching And Taking Database offline?

Posted on Updated on

Once you detach your database there will not be an entry in the sysdatabases system table.
i.e you can query and confirm the same as

 Select * from master..sysdatabases

In the end result of the above query you cannot see the db you have detached whereas
if you bring the database offline, there will be an entry present in the sysdatabases system table;
You can verify the same by running the above query and you can see that the db you took OFFLINE will be present in the sysdatabases table !

Basically, detaching/attaching is used to move database files between instances(servers) where as in Offline you can not move files. After detaching, the database will be removed from SQL Server but will be intact within the data and transaction log files that compose the database. The purpose of these commands are totally different though it looks like same.

Rename Table In SQL SERVER

Posted on Updated on

We can rename table by using T-SQL or SQL server management Studio.

Before renaming any object in SQL server we have to check the dependency of that object.

We can check it by using sp_depends  Or sys.sql_expression_dependencies this DMV.

After renaming table we have to change the table name in each queries, stored procedure, views, function and all other objects where it was being referenced.

Using SQL Server Management Studio

To rename a table

  1. In Object Explorer, right-click the table you want to rename and choose Design from the shortcut menu.
  2. From the View menu, choose Properties.
  3. In the field for the Name value in the Properties window, type a new name for the table.
  4. To cancel this action, press the ESC key before leaving this field.
  5. From the File menu choose Save table name.

 or Simply select the object and press F2 key and type the new name.

Using T SQL:

USE YourDatabaseName;

GO

EXEC sp_rename ‘Schema_name.old_Table_name’, ‘new_Table_name ‘;

Concatenate Rows using FOR XML PATH()

Posted on Updated on

To concat multiple rows of data into single row we can use XML PATH(”).

SELECT ‘,’ + Column_Name … FOR XML PATH(”)

Generates a set of comma separated values, based on combining multiple rows of data from the Column_Name column. It will produce a value  ,abc,pqr,

STUFF(…,1,1,”)

Is then used to remove the leading comma that the previous trick generated, see STUFF for details about its parameters.

The STUFF function inserts a string into another string. It deletes a specified length of characters in the first string at the start position and then inserts the second string into the first string at the start position.

For example :

We will create a Temp table and insert some sample data.

DECLARE @ConcateTest TABLE(Id INT,CityName VARCHAR(20))

INSERT @ConcateTest(Id,CityName)

VALUES(1,’Los Angeles’),(1,’Austin’)

,(2,’Houston’),(2,’         Fort Worth’)

,(3,’New York’),(3,’Dallas’)

,(4,’Atlanta’),(4,’cape town’)

SELECT DISTINCT ID ,

STUFF((SELECT ‘,’+CityName

FROM @ConcateTest CT1

WHERE CT1.ID=CT2.ID

FOR XML PATH(”)),1,1,”) AS CityName FROM @ConcateTest CT2

ID CityName
1 Los Angeles,Austin
2 Houston,      Fort Worth
3 New York,Dallas
4 Atlanta,cape town

 

T-SQL Queries to find SQL Server Cluster Nodes and Shared Drives

Posted on Updated on

Here are some SQL Server cluster related T-SQL queries which helps you to find SQL Server Cluster Nodes and their shared drives.

  1. Find name of the Node on which SQL Server Instance is Currently running
    SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS[CurrentNodeName]

    If the server is not cluster, then the above query returns the Host Name of the Server.

  2. Find SQL Server Cluster Nodes
    a. Using Function

    SELECT * FROM fn_virtualservernodes()

    b. Using DMV

    SELECT * FROM sys.dm_os_cluster_nodes
  3. Find SQL Server Cluster Shared Drives
    a. Using Function

    SELECT * FROM fn_servershareddrives()

    b. Using DMV

    SELECT * FROM sys.dm_io_cluster_shared_drives

Fix: Error: 15138 – The database principal owns a schema in the database, and cannot be dropped

Posted on Updated on

Fix: Error: 15138 – The database principal owns a schema in the database, and cannot be dropped

Recently I was dropping orphan user from database SSISTEST and Got below error:

The database principal owns a schema in the database, and cannot be dropped. (Microsoft SQL Server, Error: 15138).

The error message clearly tell us that the user is associated with one of the database schema, So to delete this user we have to transfer this schema to another user.

To find which schema is associated with that user use below query:

USE SSISTEST

GO

SELECT s.Name

FROM sys.schemas s

WHERE s.principal_id = USER_ID(‘test’);

This query will give below output:

name
db_ddladmin
db_datareader
db_datawriter

 

Then use the output to assign to other user let’s say dbo.

ALTER AUTHORIZATION ON SCHEMA:: db_ddladmin  TO dbo

GO

ALTER AUTHORIZATION ON SCHEMA:: db_datareader  TO dbo

GO

ALTER AUTHORIZATION ON SCHEMA:: db_datawriter  TO dbo

 

DROP USER test

Generic Script:

SELECT s.name

FROM sys.schemas s

WHERE s.principal_id = USER_ID(‘YourUserID’);

Now replace the result name in following script:

ALTER AUTHORIZATION ON SCHEMA::YourSchemaName TO dbo;