Wednesday, October 27, 2010

Pivot in SQL Server 2008

Definition: PIVOT is a T-SQL operator that converts one table-valued expression into another table.
It transforms the values of one column of a table into multiple columns, producing a pivot table. Basically, it rotates rows into columns and performs aggregation on the desired column.

Use: It is used to produce meaningful information from table data easily and quickly. It converts one table format into another and is a great operator for producing summary tables. It is frequently used in reports as a means of grouping and aggregating report data.

Annotated Syntax:

SELECT <non-pivoted column>,
    [first pivoted column] AS <column name>,
    [second pivoted column] AS <column name>,
    ...
    [last pivoted column] AS <column name>
FROM
    (<SELECT query that produces the data>)
    AS <alias for the source query>
PIVOT
(
    <aggregation function>(<column being aggregated>)
    FOR
    [<column that contains the values that will become column headers>]
    IN ( [first pivoted column], [second pivoted column],
    ... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>

Examples:

Static Pivot Table Query:
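A minimal sketch of a static PIVOT query (the SalesSummary table and its Product, Quarter, and Amount columns are illustrative names, not from the original post):

SELECT Product, [Q1], [Q2], [Q3], [Q4]
FROM
(
    SELECT Product, Quarter, Amount
    FROM SalesSummary            -- illustrative source table
) AS SourceTable
PIVOT
(
    SUM(Amount)                  -- aggregate the Amount column
    FOR Quarter IN ([Q1], [Q2], [Q3], [Q4])   -- these values become column headers
) AS PivotTable
ORDER BY Product

Here each distinct value of Quarter becomes its own column, and SUM(Amount) fills the cells.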
Dynamic PIVOT Table Query:
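A sketch of the dynamic version, which builds the IN list from the data itself and executes it with sp_executesql (same illustrative SalesSummary table as above):

DECLARE @cols nvarchar(max), @sql nvarchar(max)

-- Build a comma-separated, bracketed list of the distinct quarter values
SELECT @cols = STUFF((SELECT DISTINCT ',' + QUOTENAME(Quarter)
                      FROM SalesSummary
                      FOR XML PATH('')), 1, 1, '')

SET @sql = N'SELECT Product, ' + @cols + N'
FROM (SELECT Product, Quarter, Amount FROM SalesSummary) AS s
PIVOT (SUM(Amount) FOR Quarter IN (' + @cols + N')) AS p
ORDER BY Product;'

EXEC sp_executesql @sql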

Monday, October 25, 2010

UNION vs UNION ALL

Whenever possible use UNION ALL instead of UNION. The difference is that UNION has a “side-effect” of eliminating all duplicate rows and sorting results, which UNION ALL doesn’t do. Selecting a distinct result requires building a temporary worktable, storing all rows in it and sorting before producing the output. In some cases that’s exactly what you need to do, then UNION is good to use. But if you don’t expect any duplicate rows in the result set, then use UNION ALL. It simply selects from one table or a join, and then selects from another, attaching results to the bottom of the first result set. UNION ALL requires no worktable and no sorting (unless other unrelated conditions cause that). In most cases it’s much more efficient. One more potential problem with UNION is the danger of flooding tempdb database with a huge worktable. It may happen if you expect a large result set from a UNION query.

Example

The following queries select ID for all stores in the sales table, and the ID for all stores in the big_sales table, a version of the sales table that has been populated with over 70,000 rows. The only difference between the two solutions is the use of UNION versus UNION ALL. But the addition of the ALL keyword makes a big difference in the query plan. The first solution requires stream aggregation and sorting the results before they are returned to the client. The second query is much more efficient, especially for large tables. In this example both queries return the same result set, though in a different order.
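The two queries being compared would look like this (sketched against the pubs-style sales and big_sales tables the text describes, where stor_id is the store ID column):

-- UNION: eliminates duplicates, which requires a worktable and a sort
SELECT stor_id FROM big_sales
UNION
SELECT stor_id FROM sales

-- UNION ALL: simply appends the second result set to the first
SELECT stor_id FROM big_sales
UNION ALL
SELECT stor_id FROM sales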

Although the result sets in this example are interchangeable, you can see that the UNION ALL statement consumed less than half of the resources that the UNION statement consumed (UNION performs 463 logical reads on the big_sales table, which costs SQL Server more resources; UNION ALL performs only 224 logical reads, which makes it faster). So be sure to anticipate your result sets, and for those that are already distinct, use the UNION ALL clause.

Monday, October 18, 2010

DMV 'Dynamic management views' in SQL Server

What are DMVs?

•Dynamic management views and functions return server state information that can be used to monitor the health of a server instance, diagnose problems, and tune performance

•Introduced in SQL Server 2005

•Two Types

–Server-scoped

–Database-scoped

•Dynamic Management Objects (DMO)

–Dynamic Management Views (DMV)

•Can select like a view

–Dynamic Management Functions (DMF)

•Requires input parameters like a function

When and why use?

•Provides information that was not available in previous versions of SQL

•Provides a simpler way to query the data just like any other view versus using DBCC commands or system stored procedures.

Types of DMVs

•Change Data Capture

•Common Language Runtime

•Database Mirroring

•Database

•Execution

•Full-Text Search

•I/O

•Index

•Object

•Query Notifications

•Replication

•Resource Governor

•Security

•Service Broker

•SQL Server Extended Events

•SQL Server Operating System

•Transaction

Get list of all DMOs

SELECT name, type_desc
FROM sys.all_objects
WHERE name LIKE 'dm%'
ORDER BY name

Permissions

•Server-scoped

–VIEW SERVER STATE

•Database-scoped

–VIEW DATABASE STATE

•DENY takes precedence

–DENY state or

–DENY SELECT on an object

GRANT / DENY Permissions

GRANT VIEW SERVER STATE TO loginName

GRANT VIEW DATABASE STATE TO user

DENY VIEW SERVER STATE TO loginName

DENY VIEW DATABASE STATE TO user

• Must create user in MASTER first

DENY SELECT ON sys.dm_os_sys_info TO loginName

Specific Types

•Database

sys.dm_db_partition_stats

(page and row count information)

•Execution

sys.dm_exec_sessions

(Information about all active user connections and internal tasks)

sys.dm_exec_connections

(Information about the connections established)

sys.dm_exec_requests

(Information about each request that is executing, including all system processes)

•Execution – Query Plans

sys.dm_exec_sql_text (DMF)

(Returns text of SQL batch)

sys.dm_exec_query_plan (DMF)

(Returns the showplan in XML format)

sys.dm_exec_query_stats

(Statistics for cached query plans)

sys.dm_exec_cached_plans

(Each query plan that is cached)

•IO

sys.dm_io_pending_io_requests

(Shows pending I/O requests)

sys.dm_io_virtual_file_stats

(Shows I/O stats for data and log files)

•Index

sys.dm_db_index_operational_stats (DMF)

(Shows I/O, locking and access information such as inserts, deletes, updates…)

sys.dm_db_index_physical_stats (DMF)

(Shows index storage and fragmentation information)

sys.dm_db_index_usage_stats (DMV)

(Shows how often indexes are used and for what type of operation)

•Missing Indexes

sys.dm_db_missing_index_details

sys.dm_db_missing_index_groups

sys.dm_db_missing_index_group_stats

sys.dm_db_missing_index_columns

•SQL Operating System

sys.dm_os_schedulers

(Information about processors)

sys.dm_os_sys_info

(Information about the computer and about resources available to and consumed by SQL Server)

sys.dm_os_sys_memory

(Memory information from the operating system)

sys.dm_os_wait_stats

(Information about all the waits encountered by threads that executed)

•Clear Wait Stats

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)

sys.dm_os_buffer_descriptors

(Information about all the data pages that are currently in the SQL Server buffer pool)
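As an example of combining these objects, here is a typical sketch that lists the ten most CPU-hungry cached queries by joining sys.dm_exec_query_stats to the sys.dm_exec_sql_text function:

SELECT TOP (10)
       qs.total_worker_time,    -- cumulative CPU time of the cached plan
       qs.execution_count,
       st.text                  -- the SQL batch text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC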



I hope this information will be useful for you. Please write your comments and related questions. I will be happy to answer them.

Monday, October 11, 2010

SQL Server 2008 MERGE Statement

MERGE Statement

What do you do when your application logic requires that you INSERT a record if it doesn't exist, or UPDATE the record if it does? SQL Server 2008 simplifies the amount of T-SQL code required to perform this INSERT/UPDATE logic by implementing the MERGE statement.

You can do more with the MERGE statement than just inserting and updating records as mentioned above. You can also use it to delete records. Another thing you can do with the MERGE statement is to perform a specific number of merge actions based on a TOP clause. The MERGE statement can also generate output that identifies which records were inserted, updated, or deleted.

Following is an example of using MERGE to perform UPDATE and DELETE operations on a table in a single statement.

Here I am using AdventureWorks sample database to show this example.

MERGE can be used to update the ProductInventory table in the AdventureWorks sample database on a daily basis, based on orders that are processed in the SalesOrderDetail table. The following MERGE statement updates the Quantity column of the ProductInventory table by subtracting the number of orders placed each day for each product in the SalesOrderDetail table. If the number of orders for a product drops the product's inventory to 0, the row for that product is deleted from the ProductInventory table.

USE AdventureWorks;
GO

IF OBJECT_ID (N'Production.usp_UpdateInventory', N'P') IS NOT NULL
    DROP PROCEDURE Production.usp_UpdateInventory;
GO

CREATE PROCEDURE Production.usp_UpdateInventory
    @OrderDate datetime
AS
MERGE Production.ProductInventory AS target
USING (SELECT ProductID, SUM(OrderQty)
       FROM Sales.SalesOrderDetail AS sod
       JOIN Sales.SalesOrderHeader AS soh
           ON sod.SalesOrderID = soh.SalesOrderID
           AND soh.OrderDate = @OrderDate
       GROUP BY ProductID) AS source (ProductID, OrderQty)
ON (target.ProductID = source.ProductID)
WHEN MATCHED AND target.Quantity - source.OrderQty <= 0
    THEN DELETE
WHEN MATCHED
    THEN UPDATE SET target.Quantity = target.Quantity - source.OrderQty,
                    target.ModifiedDate = GETDATE()
OUTPUT $action, Inserted.ProductID, Inserted.Quantity, Inserted.ModifiedDate,
       Deleted.ProductID, Deleted.Quantity, Deleted.ModifiedDate;
GO

EXECUTE Production.usp_UpdateInventory '20030501';

Advantages of using MERGE statement

1 - With a single statement we can easily implement insert, update, and delete logic to handle criteria for maintaining a table.

2 - The MERGE statement handles all the joining of the source and target. This minimizes the amount of code you need to write to handle merge logic.

3 – It requires fewer SQL Server resources, which makes it faster and better optimized than separate INSERT, UPDATE, and DELETE statements.

Saturday, September 25, 2010

Stored Procedure Best Practices

1. Normalize your tables

There are two common excuses for not normalizing databases: performance and pure laziness. You'll pay for the second one sooner or later; and as for performance, don't optimize what's not slow. More often than not, the denormalized design ends up slower anyway. DBMSs were designed to be used with normalized databases, and SQL Server is no exception, so design with normalization in mind.

2. Avoid using cursors

Use cursors wisely. Cursors are fundamentally evil. They force the database engine to repeatedly fetch rows, negotiate blocking, manage locks, and transmit results. They consume network bandwidth as the results are transmitted back to the client, where they consume RAM, disk space, and screen real estate. Consider the resources consumed by each cursor you build and multiply this demand by the number of simultaneous users. Smaller is better. And good DBAs, most of the time, know what they are doing. But, if you are reading this, you are not a DBA, right?

Having said this, the next question that comes up is: if I were to use cursors anyway, then what? Well, here are my two cents on cursor usage (a small sketch follows the list below). Use the appropriate cursor for the job at hand.

· Don't use scrollable cursors unless required

· Use read-only cursors if you do not intend to update; this covers 90% of the situations

· Prefer forward-only cursors when you must use a cursor

· Don't forget to close and deallocate the cursors used

· Try to reduce the number of columns and records fetched in a cursor
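If a cursor truly is unavoidable, here is a minimal sketch of the pattern the bullets above describe, against a hypothetical Customers table:

DECLARE @Name varchar(100)

-- FAST_FORWARD = forward-only, read-only: the cheapest cursor type
DECLARE cur CURSOR FAST_FORWARD FOR
    SELECT Name FROM Customers   -- fetch as few columns as possible

OPEN cur
FETCH NEXT FROM cur INTO @Name
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Name
    FETCH NEXT FROM cur INTO @Name
END

-- Always close and deallocate
CLOSE cur
DEALLOCATE cur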

3. Index Columns

Create Index on columns that are going to be highly selective. Indexes are vital to efficient data access; however, there is a cost associated with creating and maintaining an index structure. For every insert, update and delete, each index must be updated. In a data warehouse, this is acceptable, but in a transactional database, you should weigh the cost of maintaining an index on tables that incur heavy changes. The bottom line is to use effective indexes judiciously. On analytical databases, use as many indexes as necessary to read the data quickly and efficiently.

Now a classic example: DO NOT index a column like "Gender". It has a selectivity of about 50%, so if your table has 10 million records, you can be assured that using this index may still mean traversing half the rows. Maintaining such indexes can slow your performance.
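A quick sketch of the contrast (the Orders table and CustomerID column are illustrative names):

-- Highly selective column: a good index candidate
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON Orders (CustomerID)

-- A Gender column, by contrast, would be a poor key: roughly half the
-- table qualifies for either value, so the index rarely pays for its
-- maintenance cost.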

4. Use transactions

Use transactions judiciously. This will save you when things go wrong. After working with data for some time, you'll soon discover some unexpected situation that will make your stored procedure crash. See that the transaction starts as late as possible and ends as early as possible; this reduces the time that resources stay locked. In short, begin late and commit early, as in the sketch below.
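A minimal sketch of the shape this gives a procedure (the Accounts table is an illustrative name):

-- Do all reads and preparatory work before this point
BEGIN TRANSACTION   -- start as late as possible

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2

COMMIT TRANSACTION  -- end as early as possible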

5. Analyze deadlocks

Always access your tables in the same order. When working with stored procedures and transactions, you will run into this soon enough; any SQL programmer or database analyst will have come across this problem. If the order changes, there will be a cyclic wait for resources to be released, and users will experience a permanent hang in the application. Deadlocks can be tricky to find if the lock sequence is not carefully designed. To summarize: a deadlock occurs when two users have locks on separate objects and each user is trying to lock the other user's objects. SQL Server automatically detects and breaks the deadlock; the terminated transaction is rolled back automatically and error 1205 is issued.
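A sketch of the "same order" rule (TableA and TableB are illustrative names): every procedure that touches both tables should lock them in the same sequence.

-- Every such procedure updates TableA first, then TableB, so no
-- transaction can end up holding TableB while waiting for TableA
CREATE PROCEDURE usp_ProcessOne AS
BEGIN
    BEGIN TRANSACTION
    UPDATE TableA SET Col1 = Col1 + 1
    UPDATE TableB SET Col1 = Col1 + 1
    COMMIT TRANSACTION
END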

6. GOTO Usage

Avoid using the infamous GOTO. This is a time-proven means of adding disorder to program flow. There are some cases where intelligent use of GOTO is preferable to dogmatically refusing to use it. On the other hand, unintelligent use of GOTO is a quick ticket to unreadable code.

7. Increase timeouts

When querying a database, the default timeout is often low, such as 30 seconds. Remember that report queries may run longer than this, especially as your database grows, so increase this value to an acceptable one.

8. Avoid NULLable columns

When possible, normalize your table and separate the nullable columns. NULLable columns consume an extra byte per row and add overhead when querying data. Separating them makes the design more flexible and faster and reduces the number of NULLable columns. I'm not saying that NULLs are evil incarnate; I believe they can simplify coding when "missing data" is part of your business rules.

9. TEXT datatype

Avoid the TEXT datatype unless you are using it for really large data. It is not flexible to query, is slow, and wastes a lot of space if used incorrectly. Sometimes a VARCHAR will handle your data better. You can also look at the "text in row" table option in SQL Server 2000. But I would still stick to the first statement: avoid using TEXT in the first place.

10. SELECT * Usage

It's very difficult to get out of this habit, but believe me, it is essential. Please DO NOT use this syntax; always qualify the full list of columns. Selecting all columns increases network traffic, requires more buffers and processing, and can prove error-prone if the table or view definition changes.

11. Temporary tables usage

Avoid temporary tables unless strictly necessary. More often than not, a subquery can substitute for a temporary table. In SQL Server 2000, there are alternatives like the TABLE variable datatype, which can provide in-memory solutions for small tables inside stored procedures too. To recollect some of the advantages of table variables (a small sketch follows this list):

· A table variable behaves like a local variable. It has a well-defined scope, which is the function, stored procedure, or batch in which it is declared. Within its scope, a table variable may be used like a regular table.

· However, table variables may not be used in statements such as INSERT INTO table_variable EXEC stored_procedure or SELECT select_list INTO table_variable.

· Table variables are cleaned up automatically at the end of the function, stored procedure, or batch in which they are defined.

· Table variables used in stored procedures result in fewer recompilations of the stored procedures than their temporary-table counterparts.

· Transactions involving table variables last only for the duration of an update on the table variable. Thus, table variables require less locking and logging resources
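A minimal sketch of the declaration and use (reusing the AdventureWorks Sales.SalesOrderHeader table referenced in the MERGE example above):

-- Scoped to this batch/procedure; cleaned up automatically
DECLARE @TopCustomers TABLE
(
    CustomerID int PRIMARY KEY,
    TotalDue   money
)

INSERT INTO @TopCustomers (CustomerID, TotalDue)
SELECT CustomerID, SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID

-- Used like a regular table within its scope
SELECT CustomerID, TotalDue
FROM @TopCustomers
WHERE TotalDue > 10000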

12. Using UDF

UDFs can replace stored procedures, but be careful in their usage. Sometimes UDFs can take a toll on your application's performance. Also, UDFs have to be prefixed with the owner's name; this is not a drawback but a requirement. I support the usage of SPs more than UDFs.

13. Multiple User Scenario

Sometimes two users will edit the same record at the same time. While writing back, the last writer wins and some of the updates will be lost. It's easy to detect this situation: create a timestamp column and check it before you write. Code for these practical situations and test your application for these scenarios.
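A sketch of the timestamp check (the Products table, its RowVersion column, and the variables are illustrative; in practice @OriginalRowVersion holds the value read when the row was first fetched):

DECLARE @ProductID int = 42,
        @NewPrice money = 9.99,
        @OriginalRowVersion binary(8)   -- captured when the row was read

UPDATE Products
SET    Price = @NewPrice
WHERE  ProductID = @ProductID
  AND  RowVersion = @OriginalRowVersion -- matches only if nobody changed the row

IF @@ROWCOUNT = 0
    RAISERROR ('Row was changed by another user since it was read.', 16, 1)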

14. Use SCOPE_IDENTITY

Don't do SELECT MAX(ID) FROM MasterTable when inserting into a details table. This is a common mistake, and it will fail when concurrent users are inserting data at the same instant. Use SCOPE_IDENTITY or IDENT_CURRENT; my choice would be SCOPE_IDENTITY, as it gives you the identity value generated in the current scope.
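A sketch of the pattern (MasterTable and DetailsTable are illustrative names with identity ID columns):

DECLARE @NewID int

INSERT INTO MasterTable (Name) VALUES ('New master row')

-- Identity generated in this scope only; unaffected by other sessions
SET @NewID = SCOPE_IDENTITY()

INSERT INTO DetailsTable (MasterID, Detail) VALUES (@NewID, 'Child row')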

15. Analyze Query Plans

The SQL Server Query Analyzer is a powerful tool and surely your friend; through it you'll learn a lot about how the engine works and how query and index design affect performance. Study the execution plan that the execution plan window shows for potential bottlenecks.

16. Parameterized queries

Parameterize all your queries using sp_executesql, as in the sketch below. This helps the optimizer cache the execution plans and reuse them when the same query is requested a second time, saving the time required to parse, compile, and place the execution plan. Avoid ad hoc dynamic SQL as much as possible.
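A minimal sketch of parameterizing with sp_executesql (the Customers table and @City parameter are illustrative):

DECLARE @sql nvarchar(200)
SET @sql = N'SELECT CustomerID, Name FROM Customers WHERE City = @City'

-- The same parameterized text maps to one cached plan for every city value
EXEC sp_executesql @sql, N'@City nvarchar(50)', @City = N'London'
EXEC sp_executesql @sql, N'@City nvarchar(50)', @City = N'Paris'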

17. Keep Procedures Small

Keep SPs small in size and scope. Two users invoking the same stored procedure simultaneously can cause the procedure to create two query plans in cache. It is much more efficient to have a stored procedure call other ones than to have one large procedure.

18. Bulk INSERT

Use DTS or the BCP utility and you'll have both a flexible and fast solution. Avoid using INSERT statements for bulk loading; they are not efficient and were not designed for that purpose.

19. Using JOINS

Make sure that there are n-1 join criteria if there are n tables.

Make sure that ALL tables included in the statement are joined, and include only tables that:

· Have columns in the SELECT clause

· Have columns referenced in the WHERE clause

· Allow two otherwise unrelated tables to be joined together

20. Trap Errors

Make sure that the @@ERROR global variable is checked after every statement that causes an update to the database (INSERT, UPDATE, DELETE). Make sure that rollbacks (if appropriate) are performed prior to inserting rows into an exception table, as sketched below.
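A minimal sketch of the pattern (the Accounts and ExceptionLog tables are illustrative names):

BEGIN TRANSACTION

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1

IF @@ERROR <> 0
BEGIN
    -- Roll back before logging to the exception table
    ROLLBACK TRANSACTION
    INSERT INTO ExceptionLog (Message) VALUES ('Account update failed')
    RETURN
END

COMMIT TRANSACTION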

21. Small Result Set

Retrieving needlessly large result sets (for example, thousands of rows) for browsing on the client adds CPU and network I/O load, makes the application less capable of remote use, and limits multi-user scalability. It is better to design the application to prompt the user for sufficient input so that the submitted queries generate modest result sets.

22. Negative Arguments

Minimize the use of not equal operations, <> or !=. SQL Server has to scan a table or index to find all values to see if they are not equal to the value given in the expression. Try rephrasing the expression using ranges:

WHERE KeyColumn < 'TestValue' OR KeyColumn > 'TestValue'

23. Date Assumption

To prevent issues with the interpretation of centuries in dates, do not specify years using two digits. Assuming date formats is among the first places an application breaks, so avoid making this assumption.

24. SP_ Name

DO NOT start the name of a stored procedure with sp_. All system stored procedures follow this convention, so a procedure that is valid today may clash with the name of a system procedure bundled with a service pack or security patch tomorrow. Hence do not follow this convention.

25. Apply the latest Security Packs / Service Packs

Even though this point applies to the network and database administrators, it is always better to keep up to date on software. With the "slammer" worm and many more still out there, staying current is one of the best practices. Consider this strongly.

26. Using Count(*)

The only 100 percent accurate way to check the number of rows in a table is a COUNT(*) operation, but such a statement can consume significant resources if your tables are big, because scanning a large table or index incurs a lot of I/O. Avoid this type of query as much as possible, and use short-circuiting methods such as EXISTS instead (see the sketch after the query below). Here is another way to find the total number of rows in a table. SQL Server Books Online (BOL) documents the structure of sysindexes: the value of sysindexes.indid is always 0 for a heap table and 1 for a clustered index. If a table doesn't have a clustered index, its entry in sysindexes has an indid value of 0; if it does, the entry has an indid value of 1.

SELECT object_name(id) ,rowcnt
FROM sysindexes
WHERE indid IN (1,0) AND OBJECTPROPERTY(id, 'IsUserTable') = 1
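And when you only need to know whether any matching row exists, EXISTS short-circuits on the first hit instead of counting every row (sketched against the AdventureWorks Sales.SalesOrderHeader table used earlier):

IF EXISTS (SELECT 1 FROM Sales.SalesOrderHeader WHERE CustomerID = 11000)
    PRINT 'Customer has at least one order'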

27. Ownership Chaining

Try using this feature (available from SQL Server 2000 SP3) for permission management within a single database. Avoid using it to manage permissions across databases.

28. SQL Injection

Security is a prime concern for everyone, so validate all incoming parameters at all levels of the application. Limit the scope of possible damage by permitting only minimally privileged accounts to send user input to the server. In addition, run SQL Server itself with the least necessary privileges.

29. Fill-factor

The 'fill factor' option specifies how full SQL Server will make each index page. When there is no free space to insert a new row on an index page, SQL Server creates a new index page and transfers some rows from the previous page to the new one; this operation is called a page split. You can reduce the number of page splits by setting an appropriate fill factor to reserve free space on each index page.

The fill factor is a value from 1 through 100 that specifies the percentage to which each index page should be filled. The default value of 0 is treated similarly to a fill factor of 100, the difference being that with FILLFACTOR = 0 SQL Server leaves some space within the upper level of the index tree. The fill factor percentage is used only at the time the index is created.

If the table contains read-only data (or data that changes very rarely), you can set the 'fill factor' option to 100. When the table's data is modified very often, you can decrease the 'fill factor' option to, for example, 70 percent. Having explained page splits in detail, I would warn you against overdoing this, because more free space means SQL Server has to traverse more pages to read the same amount of data. Try to strike a balance and arrive at an appropriate value.
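A sketch of setting it at index creation time (SQL Server 2005+ syntax; the Orders table is an illustrative name):

-- Leave 30% free space on each leaf page of a heavily updated index
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON Orders (OrderDate)
    WITH (FILLFACTOR = 70)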

30. Start-up Procedures

Review any stored procedures that are marked to run at server start-up (set via sp_procoption) for safety reasons.

31. Analyze Blocking

More often than not, any implementer's nightmare is to see a blocking process. Blocking occurs when a process must wait for another process to complete, because the resources it needs are exclusively held by the other process. A blocked process resumes operation after the resources are released by the other process. Sometimes this can become cyclic and the system comes to a standstill. The only solution is to analyze your indexing strategy and table design. Consider these points strongly.

32. Avoid Un-necessary Indexes

Avoid creating unnecessary indexes on tables in the belief that they will improve performance. Understand that creating and maintaining indexes is an overhead you incur, and it surely reduces the throughput of the whole application. You can create a simple test on a large table and see for yourself how multiple indexes on the same column decrease performance.

33. Consider Indexed Views

Sometimes we require a view to be indexed; this feature is bundled with SQL Server 2000. The result set of the indexed view is persisted in the database and indexed for fast access. Because indexed views depend on base tables, you should create indexed views with the SCHEMABINDING option to prevent table or column modifications that would invalidate the view. Using them can take a lot of load off the base tables but increases the maintenance cost.
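A minimal sketch of the pattern (the dbo.Sales table and its columns are illustrative names):

-- SCHEMABINDING is required before the view can be indexed
CREATE VIEW dbo.v_SalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt   -- COUNT_BIG(*) is required in aggregated indexed views
FROM dbo.Sales
GROUP BY ProductID
GO

-- The unique clustered index is what materializes the view's result set
CREATE UNIQUE CLUSTERED INDEX IX_v_SalesByProduct
    ON dbo.v_SalesByProduct (ProductID)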

34. WITH SORT_IN_TEMPDB Option

Consider using this option when you create an index and when tempdb is on a different set of disks than the user database. This is more of a tuning recommendation. Using this option can reduce the time it takes to create an index, but increases the amount of disk space used to create an index. Time is precious, disk is cheaper.

35. Reduce Number of Columns

Try to reduce the number of columns in a table. The fewer columns a table has, the less space it uses, since more rows fit on a single data page and less I/O overhead is required to access the table's data. This should be considered strongly for applications that talk across different machines: the more unwanted data passed, the more network latency is observed.

Script to compare data in two tables with identical structure.

These SQL scripts compare data in two separate tables with identical structure.
The statements work not only for Microsoft SQL Server but also for most other databases.

The following script finds records which exist in the source table but not in the target table.
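A sketch of the idea (SourceTable, TargetTable, and the single-column primary key ID are illustrative names):

SELECT s.ID
FROM SourceTable AS s
WHERE NOT EXISTS (SELECT 1
                  FROM TargetTable AS t
                  WHERE t.ID = s.ID)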

The SQL statements can be modified for more than one column in the primary key.
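For example, with a two-column primary key (ID1, ID2), the join condition simply grows (again with illustrative names):

SELECT s.ID1, s.ID2
FROM SourceTable AS s
WHERE NOT EXISTS (SELECT 1
                  FROM TargetTable AS t
                  WHERE t.ID1 = s.ID1
                    AND t.ID2 = s.ID2)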

In SQL Server 2005 or later, the EXCEPT operator can be used.
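EXCEPT returns the rows from the first SELECT that do not appear in the second (the column lists are illustrative):

SELECT ID, Col1, Col2 FROM SourceTable
EXCEPT
SELECT ID, Col1, Col2 FROM TargetTable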

The following script finds records which exist in the source table but not in the target table, as well as records which exist in the target table but not in the source table.
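A sketch of the two-directional comparison combined with UNION (same illustrative names as above):

SELECT s.ID, 'Missing in target' AS Problem
FROM SourceTable AS s
WHERE NOT EXISTS (SELECT 1 FROM TargetTable AS t WHERE t.ID = s.ID)
UNION
SELECT t.ID, 'Missing in source' AS Problem
FROM TargetTable AS t
WHERE NOT EXISTS (SELECT 1 FROM SourceTable AS s WHERE s.ID = t.ID)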

But the UNION operator can slow down execution for tables with large amounts of data.