Friday, November 16, 2012

SQL Server Database Design Gotchas

This is a long over-due post about some SQL Server database design gotchas that I have run across as I’ve worked with SQL Server over the years. A couple of these are already “best practices” and the others are simply practices I’ve learned that improve the database design.

1. Don’t use UniqueIdentifier (GUID) as the clustered key of a table.

This results in massive table fragmentation, as the order of the data in the table will be based on this column. Because GUIDs are by nature random and not sequential, this means that inserted data will always be randomly inserted all over the place. It is best to choose a sequential column or set of columns as the clustered key of a table.

2. Index your foreign keys.

SQL Server does not index foreign keys and thus you must do this manually. This is a good practice because you will be joining tables on the primary/foreign key relationships and therefore having an index on the foreign keys will allow SQL Server to use the index on those joins. If you follow point #3 on your database design, you can use the following SQL to generate your foreign keys (SQL 2008+):

SELECT 'CREATE NONCLUSTERED INDEX IDXFK_'+SCHEMA_NAME([d].schema_id)+'_'+[d].[name] +'_'+ OBJECT_NAME(a.referenced_object_id) + '_' + [c].[name] + ' ON '+ SCHEMA_NAME([d].schema_id)+'.'+[d].[name] + '('+[c].[name]+');'

FROM sys.foreign_keys a

INNER JOIN sys.foreign_key_columns b ON a.parent_object_id = b.parent_object_id AND a.referenced_object_id = b.referenced_object_id AND a.object_id = b.constraint_object_id

INNER JOIN sys.columns c ON a.parent_object_id = c.object_id AND b.parent_column_id = c.column_id

INNER JOIN sys.tables d ON c.object_id = d.object_id

ORDER BY [d].[name], [c].[name]


3. Don’t create nullable BIT columns.

This is a logical error anyway. A bit (or boolean) by definition is either true or false. If there is a third option, use another data type like tinyint. And create a default constraint on the bit column to save yourself on inserts.

4. Use a single column for your primary key (avoid composite primary keys).

The benefits to this one are multiple. One column is the identifier of the row, any foreign keys back to the table are also one column, the join in simpler, the index is smaller, etc. With composite keys, you can quickly end up in a situation where the great-grandchild table ends up having 4 columns as its primary key, and who likes to type that much code for joins?

5. Use unique constraints to specify the business keys (corollary to the previous point).

While the primary key is a single column, use a unique constraint to specify the business key instead of defining the primary key as the business key.

6. Be consistent in your naming.

Self-explanatory – it is difficult to maintain a database that is inconsistent. Even when making changes to an existing database, stick with the convention already defined instead of doing your own, even if you disagree with its convention.

7. Avoid making the primary key column of a lookup table an identity column if the table is one where the primary key value will have meaning.

For system lookup tables (typically those you would generate as enums in code), I’d recommend not applying an identity specification on the key column (thus requiring an explicit set of the value on insert) and I’d change the column name to end in “Cd” instead of “Id” to denote that it is an explicit value that can be coded against.

That’s it for now. Enjoy!