Saturday, September 25, 2010

Stored Procedure Best Practices

1. Normalize your tables

There are two common excuses for not normalizing databases: performance and pure laziness. You'll pay for the second one sooner or later; and, about performance, don't optimize what's not slow. And, more frequent than the inverse, the resulting design is slower. DBMS’s were designed to be used with normalized databases and SQL Server is no exception, so design with normalization in mind.

2. Avoid using cursors

Use cursors wisely. Cursors are fundamentally evil. They force the database engine to repeatedly fetch rows, negotiate blocking, manage locks, and transmit results. They consume network bandwidth as the results are transmitted back to the client, where they consume RAM, disk space, and screen real estate. Consider the resources consumed by each cursor you build and multiply this demand by the number of simultaneous users. Smaller is better. And good DBAs, most of the time, know what they are doing. But, if you are reading this, you are not a DBA, right?

Having said this the other question that comes is, If I were to use cursors then .... ? Well here are my 20Cents on cursor usage. Use the appropriate cursors for the job in hand.

· Don't use scrollable cursors unless required

· Use readonly cursors if you donot intend to update. This would be 90% of the situations.

· Try to use Forward Only cursor when using cursors

· Don’t forget to close and deallocate the cursors used.

· Try to reduce the number of columns and records fetched in a cursor

3. Index Columns

Create Index on columns that are going to be highly selective. Indexes are vital to efficient data access; however, there is a cost associated with creating and maintaining an index structure. For every insert, update and delete, each index must be updated. In a data warehouse, this is acceptable, but in a transactional database, you should weigh the cost of maintaining an index on tables that incur heavy changes. The bottom line is to use effective indexes judiciously. On analytical databases, use as many indexes as necessary to read the data quickly and efficiently.

Now a classic example is DONOT index an column like "Gender". This would have a selectivity of 50% and if your table is having 10 Million records, you can be least assured that using this index you may have to travel half the number of rows ... Hence maintaining such indexes can slow your performance.

4. Use transactions

Use transaction judiciously. This will save you when things get wrong. Working with data for some time you'll soon discover some unexpected situation which will make your stored procured crash. See that the transaction starts as late as possible and ends as early as possible. This would reduce the requirement to lock down the resources while accessing. In short,

5. Analyze deadlocks

Access your tables on the same order always. When working with stored procedures and transactions, you may find this soon. Any SQL programmer / database analyst would have come across this problem. If the order changes then there wold be a cyclic wait for resources to be released and the users would experience a permanent hang in the application. Deadlocks can be tricky to find if the lock sequence is not carefully designed. To summarize, Deadlock occurs when two users have locks on separate objects and each user is trying to lock the other user's objects. SQL Server automatically detects and breaks the deadlock. The terminated transaction will be automatically rolled back and an error code 1205 will be issued.

6. GOTO Usage

Avoid using the infamous GOTO. This is a time-proven means of adding disorder to program flow. There are some cases where intelligent use of GOTO is preferable to dogmatically refusing to use it. On the other hand, unintelligent use of GOTO is a quick ticket to unreadable code.

7. Increase timeouts

When querying a database, the default timeout is often low, like 30 seconds. Remember that report queries may run longer than this, specially when your database grows. Hence increase this value to an acceptable value.

8. Avoid NULLable columns

When possible, normalize your table and separate your nullable columns. They consume an extra byte on each NULLable column in each row and have more overhead associated when querying data. It will be more flexible and faster, and will reduce the NULLable columns. I'm not saying that NULLs are the evil incarnation. I believe they can simplify coding when "missing data" is part of your business rules.

9. TEXT datatype

Unless you are using it for really large data. The TEXT datatype is not flexible to query, is slow and wastes a lot of space if used incorrectly. Sometimes a VARCHAR will handle your data better. You can also look at the "text in row" feature with the table options for SQL Server 2000. But still I would stick to the first statement, Avoid using them on first place.

10. SELECT * Usage

Its very difficult to get out of this habit, but believe me this is very essential. Please DONOT use this syntax. Always qualify the full list of columns. Using all columns increases network traffic, requires more buffers and processing, and could prove error prone if the table or view definition changes.

11. Temporary tables usage

Unless strictly necessary. More often than not a subquery can substitute a temporary table. In SQL Server 2000, there are alternatives like the TABLE variable datatype which can provide in-memory solutions for small tables inside stored procedures too. If I were to recollect some of the advantages of using the same:

· A table variable behaves like a local variable. It has a well-defined scope, which is the function, stored procedure, or batch in which it is declared. Within its scope, a table variable may be used like a regular table.

· However, table may not be used in the following statements: INSERT INTO table_variable EXEC stored_procedure SELECT select_list INTO table_variable statements.

· Table variables are cleaned up automatically at the end of the function, stored procedure, or batch in which they are defined.

· Table variables used in stored procedures result in fewer recompilations of the stored procedures than their counterparts temporary tables.

· Transactions involving table variables last only for the duration of an update on the table variable. Thus, table variables require less locking and logging resources

12. Using UDF

UDF can replace stored procedures. But be careful in their usage. Sometimes UDFs can take a toll on your applications performance. And UDFs have to prefixed with the owners name. This is not a drawback but a requirement. I support usage of SPs more than UDFs.

13. Multiple User Scenario

Sometimes two users will edit the same record at the same time. While writing back, the last writer wins and some of the updates will be lost. It's easy to detect this situation: create a timestamp column and check it before you write. Code for these practical situations and test your application for these scenarios.

14. Use SCOPE_IDENTITY

Dont do SELECT max(ID) from MasterTable when inserting in a Details table. This is a common mistake, and will fail when concurrent users are inserting data at the same instance. Use one of SCOPE_IDENTITY or IDENT_CURRENT. My choice would be SCOPE_IDENTITY as this would give you the identity value from the current context in prespective.

15. Analyze Query Plans

The SQL Server query analyzer is a powerful tool. And surely is your friend, and you'll learn a lot of how it works and how the query and index design can affect performance through it. Understand the execution plan that the execution plan window shows for potential bottlenecks.

16. Parameterized queries

Parameterize all your queries using the sp_executesql. This would help the optimzer to chace the execution plans and use the same when requested teh second time. You can cache-in the time required to parse, compile and place the execution plan. Avoid using of D-SQL as much as possible.

17. Keep Procedures Small

Keep SPs small in size and scope. Two users invoking the same stored procedure simultaneously will cause the procedure to create two query plans in cache. It is much more efficient to have a stored procedure call other ones then to have one large procedure.

18. Bulk INSERT

Use DTS or the BCP utility and you'll have both a flexible and fast solution. Try avoiding use of Insert statement for the Buld loading feature, they are not efficent and are not designed for the same.

19. Using JOINS

Make sure that there are n-1 join criteria if there are n tables.

Make sure that ALL tables included in the statement are joined. Make sure that only tables that

· Have columns in the select clause

· Have columns referenced in the where clause

· Allow two unrelated tables to be joined together are included.

20. Trap Errors

Make sure that the @@ERROR global variable is checked after every statement which causes an update to the database (INSERT, UPDATE, DELETE). Make sure that rollbacks (if appropriate) are performed prior to inserting rows into an exception table

21. Small Result Set

Retrieving needlessly large result sets (for example, thousands of rows) for browsing on the client adds CPU and network I/O load, makes the application less capable of remote use, and limits multi-user scalability. It is better to design the application to prompt the user for sufficient input so queries are submitted that generates modest result sets.

22. Negative Arguments

Minimize the use of not equal operations, <> or !=. SQL Server has to scan a table or index to find all values to see if they are not equal to the value given in the expression. Try rephrasing the expression using ranges:

WHERE KeyColumn < 'TestValue' AND KeyColumn > 'TestValue'

23. Date Assumption

Prevent issues with the interpretation of centuries in dates, do not specify years using two digits. Assuming dates formats is the first place to break an application. Hence avoid making this assumption.

24. SP_ Name

DONOT start the name of a stored procedure with SP_. This is because all the system related stored procedures follow this convention. Hence a valid procedure today may clash with the naming convention of a system procedure that gets bundled with a Service pack / Security patch tomorrow. Hence do not follow this convention.

25. Apply the latest Security Packs / Service Packs

Even though this point applies to the network and the database administrators, it is always better to keep up-to date on the software’s. With the "slammer" virus and many more still outside, it is one of the best practices to be up-to date on the same. Consider this strongly.

26. Using Count(*)

The only 100 percent accurate way to check the number of rows in a table is to use a COUNT(*) operation. The statement might consume significant resources if your tables are very big because scanning a large table or index can consume a lot of I/O. Avoid these type of queries to the maximum. Use short circuting methods as EXISTS etc. Here is one other way you can find the total number of rows in a table. SQL Server Books Online (BOL) documents the structure of sysindexes; the value of sysindexes.indid will always be 0 for a table and 1 for a clustered index. If a table doesn't have a clustered index, its entry in sysindexes will always have an indid value of 0. If a table does have a clustered index, its entry in sysindexes will always have an indid value of 1.

SELECT object_name(id) ,rowcnt
FROM sysindexes
WHERE indid IN (1,0) AND OBJECTPROPERTY(id, 'IsUserTable') = 1

27. Ownership Chaining

Try using this feature (available from SQL Server 2000 SP3), for permission management within a single database. Avoid using this feature to manage permissions across database.

28. SQL Injection

Security has been a prime concern for everyone. Hence validate all the incoming parameters at all levels of the application. Limit the scope of possible damage by permitting only minimally privileged accounts to send user input to the server. Adding to it, run SQL Server itself with the least necessary privileges.

29. Fill-factor

The 'fill factor' option specifies how full SQL Server will make each index page. When there is no free space to insert new row on the index page, SQL Server will create new index page and transfer some rows from the previous page to the new one. This operation is called page splits. You can reduce the number of page splits by setting the appropriate fill factor option to reserve free space on each index page. The fill factor is a value from 1 through 100 that specifies the percentage of the index page to be left empty. The default value for fill factor is 0. It is treated similarly to a fill factor value of 100, the difference in that SQL Server leaves some space within the upper level of the index tree for FILLFACTOR = 0. The fill factor percentage is used only at the time the index is created. If the table contains read-only data (or data that very rarely changed), you can set the 'fill factor' option to 100. When the table's data modified very often, you can decrease the 'fill factor' option to 70 percent, for example. Having explained page splits in detail I would warn you in over looking at this point because more free space means that SQL Server has to traverse through more pages to get the same amount of data. Hence try to strike a balance and arrive at an appropriate value.

30. Start-up Procedures

Verify all the stored procedures for safety reasons.

31. Analyze Blocking

More often than not any implementers nightmare would be to see a blocking process. Blocking occurs when a process must wait for another process to complete. The process must wait because the resources it needs are exclusively used by another process. A blocked process will resume operation after the resources are released by the other process. Sometimes this can become cyclic and the system comes to a stand still. The only solution is to analyze your indexing strategy and table design. Consider these points strongly.

32. Avoid Un-necessary Indexes

Avoid creating un-necessary indexes on table thinking they would improve your performance. Understand that creating Indexes and maintaining them are overheads that you incur. And these surely do reduce the throughput for the whole application. You can create a simple test on a large table and find it for yourself how multiple indexes on the same column decrease performance.

33. Consider Indexed Views

Sometimes we would require an view to be indexed. This feature is bundled with SQL Server 2000. The result set of the indexed view is persist in the database and indexed for fast access. Because indexed views depend on base tables, you should create indexed views with SCHEMABINDING option to prevent the table or column modification that would invalidate the view. Hence using them can reduce a lot of load on the base tables but increases the maintainability.

34. WITH SORT_IN_TEMPDB Option

Consider using this option when you create an index and when tempdb is on a different set of disks than the user database. This is more of a tuning recommendation. Using this option can reduce the time it takes to create an index, but increases the amount of disk space used to create an index. Time is precious, disk is cheaper.

35. Reduce Number of Columns

Try to reduce the number of columns in a table. The fewer the number of columns in a table, the less space the table will use, since more rows will fit on a single data page, and less I/O overhead will be required to access the table's data. This should be considered strongly by applications that talk across different machines. More the unwanted data passed more is the network latency observed.

Script to compare data in two tables with identical structure.

These SQL scripts are to compare data in two separate tables with identical structure.
These statement not only work for Microsoft SQL as well as for any other Databases.

Following script finds records which exist in source table but not in target table.






The SQL statements can be modified for more than one column in Primary Key.






In SQL Server 2005 or later, EXCEPT operator can be used.




Following script finds records which exist in source table but not in target table as well as records which exist in target table but not in source table.

But Union operator can slow down the execution for tables with large amounts of data.