Tuesday, January 29, 2013

SQL Indexes

Indexes allow SQL Server to quickly find rows in a table based on key values.
There are two types of indexes in SQL Server: clustered and nonclustered.

Clustered Indexes
A clustered index is an index whose leaf nodes, that is the lowest level of the index, contain the actual data pages of the underlying table. Hence the index and the table itself are, for all practical purposes, one and the same. Each table can have only one clustered index. When a clustered index is used to resolve a query, SQL Server begins at the root node for the index and traverses the intermediate nodes until it locates the data page that contains the row it's seeking.
Many database designs make extensive use of clustered indexes. In fact, it is generally considered a best practice to define a clustered index on each table.
Let's consider an example. Suppose the Customers table has a clustered index defined on the Customer_ID column. When a query that searches by the Customer_ID column is executed, SQL Server navigates through the clustered index to locate the row in question and returns the data. This can be seen as the Clustered Index Seek operation in the query's execution plan.
SELECT Customers.Last_name,
Customers.First_name,
Customers.Email_Address
FROM
Customers
WHERE
Customers.Customer_ID = 123
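For reference, a clustered index like the one described above might be declared as follows. This is a minimal sketch; the index name ix_Customers_Customer_ID is assumed for illustration, and in practice the clustered index is often created implicitly by the table's PRIMARY KEY constraint.
CREATE CLUSTERED INDEX ix_Customers_Customer_ID
ON Customers (Customer_ID);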

Nonclustered Indexes
Nonclustered indexes use a similar methodology to store indexed data for tables within SQL Server. However, in a nonclustered index, the lowest level of the index does not contain the data page of the table. Instead, it contains the information that allows SQL Server to navigate to the data pages it needs. For tables that have a clustered index, the leaf node of the nonclustered index contains the clustered index keys. In the previous example, the leaf node of a nonclustered index on the Customers table would contain the Customer_ID key. If the underlying table does not have a clustered index (this data structure is known as a heap), the leaf node of the nonclustered index contains a row locator to the heap data pages.
For example, suppose a nonclustered composite index (ix_Customer_Name) has been created on the Customers table, encompassing the Last_name and First_name columns.
In this case, when a query that searches by customer last name is executed, the SQL Server query optimizer chooses the ix_Customer_Name index to resolve the query.
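As a sketch, the index described above could be created like this; the column order is an assumption, with Last_name leading because the query filters on last name:
CREATE NONCLUSTERED INDEX ix_Customer_Name
ON Customers (Last_name, First_name);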
Using Nonclustered Indexes
As illustrated in the preceding example, nonclustered indexes may be employed to provide SQL Server with an efficient way to retrieve data rows. However, under some circumstances, the overhead associated with nonclustered indexes may be deemed too great by the query optimizer and SQL Server will resort to a table scan to resolve the query.

Covering Indexes
When Key Lookups are detrimental to performance for large result sets, we have to consider a query that does not require a Key Lookup. We can modify the previous query so that it no longer selects the Email_Address column:
SELECT Customers.Last_name,
Customers.First_name
FROM
Customers
WHERE
Customers.Last_name = 'SMITH'
The new execution plan is streamlined and uses only the ix_Customer_Name nonclustered index, which improves the performance of the query considerably.
The observed improvement is due to the fact that the nonclustered index contained all of the required information to resolve the query. No Key Lookups were required. An index that contains all information required to resolve the query is known as a "Covering Index"; it completely covers the query.
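If the original query (including Email_Address) must be kept, the index itself can be widened into a covering index instead. One way to do this in SQL Server is the INCLUDE clause, which stores non-key columns in the leaf level of the index. This is a sketch against the assumed schema, not a definition from the original article:
-- DROP_EXISTING rebuilds the assumed ix_Customer_Name index in place.
CREATE NONCLUSTERED INDEX ix_Customer_Name
ON Customers (Last_name, First_name)
INCLUDE (Email_Address)
WITH (DROP_EXISTING = ON);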

Monday, January 28, 2013

What is 'Functional Dependency' in a SQL Server database?

A dependency occurs in a database when information stored in the same database table uniquely determines other information stored in the same table. You can also describe this as a relationship where knowing the value of one attribute (or a set of attributes) is enough to tell you the value of another attribute (or set of attributes) in the same table.
Saying that there is a dependency between attributes in a table is the same as saying that there is a functional dependency between those attributes. If there is a dependency in a database such that attribute B is dependent upon attribute A, you would write this as “A -> B”.
For example, in a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that name is dependent upon SSN (SSN -> name) because an employee's name can be uniquely determined from the SSN. However, the reverse statement (name -> SSN) is not true, because more than one employee can have the same name but different SSNs.
Trivial Functional Dependencies
A trivial functional dependency occurs when you describe a functional dependency of an attribute on a collection of attributes that includes the original attribute. For example, “{A, B} -> B” is a trivial functional dependency, as is “{name, SSN} -> SSN”. This type of functional dependency is called trivial because it can be derived from common sense. It is obvious that if you already know the value of B, then the value of B can be uniquely determined by that knowledge.
Full Functional Dependencies
A full functional dependency occurs when you already meet the requirements for a functional dependency and the set of attributes on the left side of the functional dependency statement cannot be reduced any further. For example, “{SSN, age} -> name” is a functional dependency, but it is not a full functional dependency because you can remove age from the left side of the statement without impacting the dependency relationship.
Transitive Dependencies
Transitive dependencies occur when there is an indirect relationship that causes a functional dependency. For example, ”A -> C” is a transitive dependency when it is true only because both “A -> B” and “B -> C” are true.
Multivalued Dependencies
Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table. For example, imagine a car company that manufactures many models of car, but always makes both red and blue colors of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.
Importance of Dependencies
Database dependencies are important to understand because they provide the basic building blocks used in database normalization. For example:
• For a table to be in second normal form (2NF), no non-prime attribute may be functionally dependent upon a proper subset of a candidate key.
• For a table to be in third normal form (3NF), every non-prime attribute must have a non-transitive functional dependency on every candidate key.
• For a table to be in Boyce-Codd Normal Form (BCNF), every functional dependency (other than trivial dependencies) must be on a superkey.
• For a table to be in fourth normal form (4NF), it must have no non-trivial multivalued dependencies.

What is the difference between 'WHERE' and 'HAVING' clause in sql?

Example:
SELECT state, COUNT(*)
FROM Test
WHERE state IN ('CA', 'LA')
GROUP BY state
ORDER BY state

SELECT state, COUNT(*)
FROM Test
GROUP BY state
HAVING state IN ('CA', 'LA')
ORDER BY state

Both queries produce the same result set, so what's the difference? The first query uses the WHERE clause to restrict the rows that have to be counted. The second query counts the rows for every state in the table, then uses HAVING to discard the counts it calculated for all states except CA and LA. The first query is obviously the better one, because there is no need to make the server compute aggregates and then throw them away.
Here's the rule. If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause.
Here's another rule of thumb: HAVING is almost always paired with GROUP BY. (Without GROUP BY, a HAVING clause can only filter on aggregates computed over the whole table.)
SELECT state, SUM(baldue)
FROM Test
GROUP BY state
HAVING SUM(baldue) > 0
ORDER BY state

What are cursors ?

Cursors let you move through rows one at a time and perform processing on each row.
SQL Server is very good at handling sets of data. For example, you can use a single UPDATE statement to update many rows of data. There are times, however, when you want to loop through a series of rows and perform processing for each row. In this case you can use a cursor.
Please note that cursors are the SLOWEST way to access data inside SQL Server. They should only be used when you truly need to access one row at a time. The only reason I can think of for that is to call a stored procedure on each row.
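For illustration, here is a minimal cursor sketch, assuming the Customers table from the earlier examples; the DECLARE/OPEN/FETCH/CLOSE/DEALLOCATE shape is the part that matters:
DECLARE @last_name varchar(100);
DECLARE customer_cursor CURSOR FOR
SELECT Last_name FROM Customers;
OPEN customer_cursor;
FETCH NEXT FROM customer_cursor INTO @last_name;
WHILE @@FETCH_STATUS = 0
BEGIN
-- per-row processing goes here (for example, EXEC a stored procedure)
PRINT @last_name;
FETCH NEXT FROM customer_cursor INTO @last_name;
END
CLOSE customer_cursor;
DEALLOCATE customer_cursor;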

What is ACID?

ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a set of properties that database professionals generally look for when evaluating databases and application architectures. For a database to be reliable, all four attributes should be achieved.
Atomicity is an all-or-none proposition: either every operation in a transaction takes effect, or none of them does.
Consistency guarantees that a transaction never leaves your database in a half-finished state. Isolation keeps transactions separated from each other until they’re finished. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
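Atomicity is easiest to see with an explicit transaction. A rough sketch, using an assumed Accounts table:
BEGIN TRAN;
UPDATE Accounts SET Balance = Balance - 100 WHERE Account_ID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE Account_ID = 2;
-- Either both updates persist, or (after a ROLLBACK or a failure) neither does.
COMMIT TRAN;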

Sql Server Triggers

What is a trigger?

A trigger is a special kind of stored procedure that executes in response to certain actions on a table, such as the insertion, deletion, or updating of data. It is a database object that is bound to a table and is executed automatically. You can’t invoke a trigger explicitly; the only way to fire it is to perform the corresponding action on the table it is assigned to.

What are the different types of triggers?

There are three action query types in SQL: INSERT, UPDATE, and DELETE. Accordingly, there are three trigger events, plus hybrids that come from mixing and matching the events and timings that fire them.
Basically, triggers are classified into two main types:-
(i) After Triggers (For Triggers)
(ii) Instead Of Triggers

(i) After Triggers
These triggers run after an insert, update or delete on a table. They are not supported for views.
AFTER triggers can be classified further into three types:
(a) AFTER INSERT Trigger.
(b) AFTER UPDATE Trigger.
(c) AFTER DELETE Trigger.
Example
CREATE TRIGGER trgAfterInsert ON [dbo].[Employee_Test]
FOR INSERT
AS
declare @empid int;
declare @empname varchar(100);
declare @empsal decimal(10,2);
declare @audit_action varchar(100);

-- NOTE: this pattern assumes a single-row INSERT; a multi-row INSERT
-- would populate the variables from only one row of the inserted table.
select @empid=i.Emp_ID from inserted i;
select @empname=i.Emp_Name from inserted i;
select @empsal=i.Emp_Sal from inserted i;
set @audit_action='Inserted Record -- After Insert Trigger.';

insert into Employee_Test_Audit
(Emp_ID,Emp_Name,Emp_Sal,Audit_Action,Audit_Timestamp)
values(@empid,@empname,@empsal,@audit_action,getdate());

PRINT 'AFTER INSERT trigger fired.'
GO
(ii) Instead Of Triggers
These can be used as an interceptor for anything that anyone tries to do to our table or view. If you define an INSTEAD OF trigger on a table for the DELETE operation and someone tries to delete rows, the rows will not actually be deleted (unless you issue another delete instruction from within the trigger).
INSTEAD OF triggers can be classified further into three types:

(a) INSTEAD OF INSERT Trigger.
(b) INSTEAD OF UPDATE Trigger.
(c) INSTEAD OF DELETE Trigger
Example
CREATE TRIGGER trgInsteadOfDelete ON [dbo].[Employee_Test]
INSTEAD OF DELETE
AS
declare @emp_id int;
declare @emp_name varchar(100);
declare @emp_sal int;

-- NOTE: as above, this pattern assumes a single-row DELETE.
select @emp_id=d.Emp_ID from deleted d;
select @emp_name=d.Emp_Name from deleted d;
select @emp_sal=d.Emp_Sal from deleted d;

BEGIN
if(@emp_sal>1200)
begin
RAISERROR('Cannot delete where salary > 1200',16,1);
ROLLBACK;
end
else
begin
delete from Employee_Test where Emp_ID=@emp_id;
-- No COMMIT here: the trigger runs inside the transaction of the original
-- DELETE, and issuing COMMIT inside a trigger raises an error when it ends.
insert into Employee_Test_Audit(Emp_ID,Emp_Name,Emp_Sal,Audit_Action,Audit_Timestamp)
values(@emp_id,@emp_name,@emp_sal,'Deleted -- Instead Of Delete Trigger.',getdate());
PRINT 'Record Deleted -- Instead Of Delete Trigger.'
end
END
GO

What are the advantages and disadvantages of using Triggers?

Triggers are primarily intended for enforcing integrity rules procedurally, but they can be used for several other purposes. There are no generalized "pros and cons" per se with triggers; in specific situations, however, you might come across performance problems with locking, serialization, and concurrency.

Steps for database normalization

What is database normalization?

The underlying ideas in normalization are simple enough. Through normalization we want to design for our relational database a set of tables that:
(1) contain all the data necessary for the purposes that the database is to serve,
(2) have as little redundancy as possible,
(3) accommodate multiple values for the types of data that require them,
(4) permit efficient updates of the data in the database, and
(5) avoid the danger of losing data unknowingly.

What are Database Anomalies?

Insertion anomaly, deletion anomaly, and update anomaly.

Rules or steps for Normalization

Rule 1: Eliminate repeating groups. Make a separate table for each set of related attributes, and give each table a primary key. (This yields first normal form.)
Rule 2: Eliminate redundant data. If an attribute depends on only part of a multi-valued key, move it to a separate table. (Second normal form.)
Rule 3: Eliminate columns not dependent on the key. If attributes do not contribute to a description of the key, move them to a separate table. (Third normal form.)
Rule 4: Isolate independent multiple relationships. No table may contain two or more 1:n (one-to-many) or n:m (many-to-many) relationships that are not directly related. (Fourth normal form.)

Friday, January 25, 2013

SQL Constraints

Constraints let you define the way the Database Engine automatically enforces the integrity of a database. Constraints define rules regarding the values allowed in columns and are the standard mechanism for enforcing integrity. Using constraints is preferred to using DML Triggers, rules, and defaults. The query optimizer also uses constraint definitions to build high-performance query execution plans.

Classes of Constraints

NOT NULL specifies that the column does not accept NULL values. For more information, see Allowing Null Values.
CHECK constraints enforce domain integrity by limiting the values that can be put in a column. For more information, see CHECK Constraints.
A CHECK constraint specifies a Boolean (evaluates to TRUE, FALSE, or unknown) search condition that is applied to all values that are entered for the column. All values that evaluate to FALSE are rejected. You can specify multiple CHECK constraints for each column. The following sample shows creating the constraint chk_id. This constraint additionally enforces the domain of the primary key by making sure that only numbers within a specified range are entered for the key.
CREATE TABLE cust_sample
(
cust_id int PRIMARY KEY,
cust_name char(50),
cust_address char(50),
cust_credit_limit money,
CONSTRAINT chk_id CHECK (cust_id BETWEEN 0 and 10000 )
)
UNIQUE constraints enforce the uniqueness of the values in a set of columns.
In a UNIQUE constraint, no two rows in the table can have the same value for the columns. Primary keys also enforce uniqueness, but primary keys do not allow for NULL as one of the unique values. For more information, see UNIQUE Constraints.
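For example, a UNIQUE constraint could be added to the cust_sample table defined above. The constraint name is assumed, and unique customer names are used purely for illustration:
ALTER TABLE cust_sample
ADD CONSTRAINT unq_cust_name UNIQUE (cust_name);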
PRIMARY KEY constraints identify the column or set of columns that have values that uniquely identify a row in a table. For more information, see PRIMARY KEY Constraints.
No two rows in a table can have the same primary key value. You cannot enter NULL for any column in a primary key. We recommend using a small, integer column as a primary key. Each table should have a primary key. A column or combination of columns that qualify as a primary key value is referred to as a candidate key.
The following example creates the part_sample table and specifies the part_nmbr field as the primary key.
CREATE TABLE part_sample
(part_nmbr int PRIMARY KEY,
part_name char(30),
part_weight decimal(6,2),
part_color char(15) );

FOREIGN KEY constraints identify and enforce the relationships between tables. For more information, see FOREIGN KEY Constraints.
A foreign key in one table points to a candidate key in another table. In the following example, the order_part table establishes a foreign key that references the part_sample table defined previously.
CREATE TABLE order_part
(order_nmbr int,
part_nmbr int
FOREIGN KEY REFERENCES part_sample(part_nmbr)
ON DELETE NO ACTION,
qty_ordered int);
GO

You cannot insert a row with a foreign key value (other than NULL) if there is no candidate key with that value. The ON DELETE clause controls what actions are taken when you try to delete a row to which existing foreign keys point. It has the following options:
NO ACTION specifies that the deletion fails with an error.
CASCADE specifies that all the rows with foreign keys pointing to the deleted row are also deleted.
SET NULL specifies that all rows with foreign keys pointing to the deleted row are set to NULL.
SET DEFAULT specifies that all rows with foreign keys pointing to the deleted row are set to their default value. For more information, see Defaults.
The ON UPDATE clause defines the actions that are taken if you try to update a candidate key value to which existing foreign keys point. This clause also supports the NO ACTION, CASCADE, SET NULL and SET DEFAULT options.
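As a sketch, the order_part definition above could be rewritten with explicit cascading actions; the table name order_part2 is assumed here only to avoid clashing with the earlier example:
CREATE TABLE order_part2
(order_nmbr int,
part_nmbr int
FOREIGN KEY REFERENCES part_sample(part_nmbr)
ON DELETE CASCADE
ON UPDATE CASCADE,
qty_ordered int);
GO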

Column and Table Constraints

Constraints can be column constraints or table constraints. A column constraint is specified as part of a column definition and applies only to that column. The constraints in the previous examples are column constraints. A table constraint is declared independently from a column definition and can apply to more than one column in a table. Table constraints must be used when more than one column must be included in a constraint.

For example, if a table has two or more columns in the primary key, you must use a table constraint to include both columns in the primary key. Consider a table that records events occurring in a computer in a factory. Assume that events of several types can occur at the same time, but that no two events occurring at the same time can be of the same type. This can be enforced in the table by including both the event_type and event_time columns in a two-column primary key, as shown in the following example.
CREATE TABLE factory_process
(event_type int,
event_time datetime,
event_site char(50),
event_desc char(1024),
CONSTRAINT event_key PRIMARY KEY (event_type, event_time) )

Optimizing SQL Server queries

If you’ve worked with SQL Server databases for any amount of time, it’s practically a given that you’ve run into slow running queries. Sometimes the reason for the slow response time is due to the load on the system, and other times it is because the query is not written to perform as efficiently as possible (I find this reason much more common). Learn how you may be able to improve performance by knowing what to look for in your query designs.

1. Operate on small sets

The fewer rows you are dealing with, the quicker you will find what you want. If the query optimizer sees that the number of rows in the table is sufficiently small, no indexes on that table will be used. The optimizer will realize that it is faster to scan the rows in the table to satisfy the request rather than traverse an index structure to find the rows.

2. Limit the columns returned

When returning data in your applications, the less data that is returned, the faster the information is transmitted over the network; this goes for the number of rows returned as well as the number of columns. This is why I am against using the SELECT * statement, especially in a production environment. In my experience with database administration and development, I have seen very few cases that warranted a SELECT * statement. The reason for this is twofold. First, it doesn’t make sense to bring back columns that you are not going to use. The second reason (which I feel is more important) is that using SELECT * can break existing code.
Consider the following example.
I have an INSERT statement in my production environment that uses a SELECT * statement as its data source. This isn’t a big deal, because my SourceTable has the same number of columns as the DestinationTable.
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable

A business situation arises in which I need to add a field to my SourceTable table.
ALTER TABLE SourceTable
ADD Fld4 INT
Adding this new field will break my INSERT statement, which will cause problems in my production environment.
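The safer pattern is to name the columns on both sides of the statement, so a schema change to SourceTable cannot silently change what is inserted. This sketch assumes SourceTable's original columns are also named Fld1 through Fld3:
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT Fld1, Fld2, Fld3
FROM SourceTable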

3. Searching for rows

The manner in which rows are searched for in a database table will always be one of the more vital implementations in your database environment. The SQL Server query optimizer will handle some WHERE clauses much more efficiently than others, depending on how the statement is written, even when the outcome of the statements is the same.
The following example uses the IN() statement to specify a series of values being searched for. For this example, assume the OrderID column has a nonclustered index.
SELECT * FROM ProductSales
WHERE OrderID IN(4494, 4495, 4496)

This statement is exactly the same as using an OR operator to specify the three values being searched for. Either form may cause SQL Server to skip the index on the field and cycle through the rows in the table searching for the values. Since the values used in the example are contiguous, I can use the BETWEEN operator instead. This allows the query optimizer to use the index effectively.
SELECT * FROM ProductSales
WHERE OrderID BETWEEN 4494 AND 4496
In general, most types of exclusion statements in your WHERE clause will prevent SQL Server from using an index. The following are some additional examples: <>, !=, OR, NOT IN, NOT EXISTS.
The manner in which LIKE statements are used also makes a difference. If you are able to specify the beginning character(s) of the string you are searching for, you have a better chance of your statement using an index. If you specify a wildcard (%) before any search string, the optimizer will be unable to use an index, as in the following query.
SELECT * FROM Customers WHERE LastName LIKE '%TR%'
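By contrast, anchoring the pattern at the start leaves the index usable. A sketch against the same assumed Customers table:
SELECT * FROM Customers WHERE LastName LIKE 'TR%'
Because the leading characters are known, SQL Server can seek on an index on LastName instead of scanning the table.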

4. Date searches

Date searches are sometimes a little tricky to perform in the database; for instance, I have seen numerous situations where date functions are used for date searches. In the following example, the sales records are retrieved from the SalesHistory table for August 15, 2005:
SELECT SaleID
FROM SalesHistory
WHERE
MONTH(SaleDate) = 8 AND
YEAR(SaleDate) = 2005 AND
DAY(SaleDate) = 15
The functions in the WHERE clause cause SQL Server to perform the function on every row being searched, which means the index will not be used. This is why I discourage using the return values of functions as criteria in queries. The following code shows how you can rewrite the statement so that an index is used, and the results are returned much more quickly.
SELECT SaleID
FROM SalesHistory
WHERE
SaleDate >= '8/15/2005' AND
SaleDate < '8/16/2005'

5. Use JOINs rather than subqueries

If possible (and if it makes sense), I suggest using JOIN statements rather than subqueries to improve performance. When a subquery is used as criteria in a SELECT statement, the values returned from the subquery are distinct. Returning a distinct list of values requires additional processing, which can slow down your queries.
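As a hedged sketch with assumed Orders and Customers tables, the two forms look like this; the JOIN version returns the same rows provided CustomerID is unique in Customers:
-- Subquery form:
SELECT o.OrderID
FROM Orders o
WHERE o.CustomerID IN (SELECT c.CustomerID FROM Customers c WHERE c.State = 'CA')

-- JOIN form:
SELECT o.OrderID
FROM Orders o
JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE c.State = 'CA'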

6. Use explicit transactions

When data manipulation occurs in the database, the actions are written to the transaction log. If you are executing many DML statements, it might be a good idea to place them inside an explicit transaction for performance purposes. Doing so prevents each statement from being committed to the transaction log individually. For example, the following statements take approximately seven seconds to execute on my laptop:
CREATE TABLE InsertTable
(
IDCol INT IDENTITY(1,1),
ColVal INT
)
GO
DECLARE @Counter INT
SET @Counter = 1

WHILE @Counter < 15000
BEGIN
INSERT INTO InsertTable(ColVal)
SELECT DATEPART(ms, GETDATE())

SET @Counter = @Counter + 1
END
If I wrap the INSERT statements inside a transaction, it takes a little less than two seconds to execute. This is because the individual statements are no longer committed one at a time; the pending log records are written out when the transaction commits, which reduces the number of writes to the log.
DECLARE @Counter INT
SET @Counter = 1
BEGIN TRAN
WHILE @Counter < 15000
BEGIN
INSERT INTO InsertTable(ColVal)
SELECT DATEPART(ms, GETDATE())

SET @Counter = @Counter + 1
END
COMMIT TRAN
Note: I advise you to use this approach with care. If there are too many statements inside a transaction, it will increase the duration of the transaction, which increases the amount of time locks are held on the objects inside of the transaction.

7. Use UNION ALL instead of UNION

When you use the UNION clause to concatenate the results from two or more SELECT statements, duplicate records are removed. This duplicate removal requires additional computing to accomplish. If you are not concerned that your results may include duplicate records, use the UNION ALL clause, which concatenates the full results from the SELECT statements.
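A quick sketch using the Test table from the WHERE/HAVING example above; the only difference between the two operators is whether duplicate rows survive:
-- UNION removes duplicates from the combined result (extra work).
SELECT state FROM Test WHERE state = 'CA'
UNION
SELECT state FROM Test WHERE state IN ('CA', 'LA')

-- UNION ALL simply appends one result to the other.
SELECT state FROM Test WHERE state = 'CA'
UNION ALL
SELECT state FROM Test WHERE state IN ('CA', 'LA')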

8. Use EXISTS when possible

When you need to check for the presence of certain conditions, it is usually faster to use the EXISTS predicate than COUNT(*). This is because COUNT(*) has to scan all records returned by the statement, while EXISTS can return a true value as soon as it finds a record that meets the criteria.
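A sketch of both forms, with an assumed Orders table:
-- COUNT(*) form: counts every matching row before the comparison.
IF (SELECT COUNT(*) FROM Orders WHERE Customer_ID = 123) > 0
PRINT 'Customer has orders'

-- EXISTS form: can stop at the first matching row.
IF EXISTS (SELECT 1 FROM Orders WHERE Customer_ID = 123)
PRINT 'Customer has orders'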

9. STATISTICS IO

There are different ways to determine the best way to write your queries. Two of my favorite methods are looking at the number of logical reads produced by the query and looking at graphical execution plans provided by SQL Server Management Studio. For determining the number of logical reads, you can turn the STATISTICS IO option ON. Consider this query:
SET STATISTICS IO ON
SELECT * FROM SalesHistory
The following is returned in the Messages window in SQL Server Management Studio: Table 'SalesHistory'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There are several bits of data returned by STATISTICS IO, but I am really only concerned with the logical reads portion because it will tell me the number of pages read from the data cache. This is the most helpful to me because it will stay constant when I run the same query, which is important because there are sometimes external factors that might vary the execution time of my queries, such as locking by other queries.
When I’m tuning my queries, my goal is to get the number of logical reads as low as possible. Fewer logical reads typically leads to faster execution times.

Sunday, January 20, 2013

How to detect repeated or duplicate elements in an integer array? Dictionary Implementations (C#)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DictionaryImplementation
{
class Program
{
static void Main(string[] args)
{

//How to detect repeated or duplicate elements in an integer array?

int[] Nums = { 1, 3, 6, 8, 2, 6, 13, 9, 3, 6 };

duplicates(Nums);
Console.ReadKey();


}

public static void duplicates(int[] input)
{
    // Key is the input number; value is the number of times it has been seen.
    Dictionary<int, int> Dic = new Dictionary<int, int>();

    for (int i = 0; i < input.Length; i++)
    {
        // If the key is already present, increase its count.
        if (Dic.ContainsKey(input[i]))
            Dic[input[i]]++;
        else
        {
            // If the key is not present, add it with a count of 1.
            Dic.Add(input[i], 1);
        }
    }

    foreach (var d in Dic)
    {
        // Iterate through the dictionary; a count greater than 1 marks a duplicate.
        Console.WriteLine("the number {0} occurs {1} times", d.Key, d.Value);
    }
}

}
}