Wednesday, May 23, 2018

How to use CTE to remove duplicates

--One use of Common table Expressions (CTEs) is to remove duplicates
--To see this lets create a table 'Employee'
CREATE table Employee  (name NVARCHAR(100), age INT, salary INT)

--insert some duplicate data into the 'Employee Table'.
INSERT  into Employee
VALUES ('Mark', 42, 120000), ('Susan', 29, 125000), ('Mark', 42, 120000)
--you could see there is a duplicate record in the table by querying the table
--select * from  Employee
--we could use CTE to remove this duplicate. 
WITH CTE_Employee(name, age, salary, rownumber) AS (
select name,   age,   salary,   row_number() OVER (
PARTITION BY name, age, salary ORDER BY name, age, salary
) AS rownumber from Employee
)

DELETE
FROM CTE_Employee
WHERE rownumber <> 1
--query the table to see if there are still duplicates
--select * from  Employee

Saturday, December 16, 2017

Normalization

In database systems, normalization is the organization of data to reduce redundancy and improve data integrity. It is the process of making the database concise by breaking down the information into two or more tables.
 The main objectives of normalization are:
1.To minimize redundancy.
2.To avoid delete, update and insert anomalies.
3.To make the relational model more informative to users
In this post I will try to show how data is normalized with an example in some very easy steps

Lets pretend we have a caffe, and here is a snap shot of a table which shows orders.

The data in the table has some redundancy. For example the unit price of 'Brewed Coffee' is written four times, but logically speaking the unit price of 'Brewed Coffee' could be written only once because it doesn't change with each order. Writing a data twice when it is possible to write it once is exactly what we call data redundancy.





To normalize the data we could create another table which holds the information of the unit price of  each item, and we could have a key which is called a primary key on the table which shows each item.
Lets consider the table on the right. It shows the price of the five items we used in the previous table, we added a new column named 'Item_id'. 'Item_id' uniquely identifies each row. That is our primary key. so Item_id 1 represents the item name 'Brewed Coffee' and its price only. If we have this table at hand then we could only mention the 'Item_id' in the place of the price in the orders table as it is shown in the next table.
The table at the right is the orders table again. But here we didn't write the items' name and their unit prices, instead we wrote only the item_id in the place of the items' name and their prices. For example order 1 is an order of 3 items of item_id 1. What is the name of the item we should refer the table above ( the items list and we know that it is "Brewed coffee" and its price is 2.00. The Item_id in this table is called the foreign reference for it references the table with the list of items above.
That is exactly what normalization is.