Sunday, 20 October 2019

Query Optimization



Query optimization is one of the most underrated yet most important topics, and it is a must when implementing SQL queries, stored procedures, or functions. Knowing the syntax and structure of SQL operations is a good thing, but one must also know optimization. Without knowledge of optimization, any developer can create DDL and DML statements, but they will not be well-designed procedures. You know why? Because while executing those statements there are chances that they will take a long time or may create a deadlock situation. Proper joining is also considered to be part of optimization.
[Figure: actual execution plan flowchart]
The simplest query execution flow is mentioned below:
1.)  From
2.)  Joins
3.)  Where
4.)  Group by
5.)  Having clause
6.)  Column list
7.)  Distinct
8.)  Order by
9.)  Top
1.)  From: First, all the records are fetched from the table mentioned after the ‘From’ keyword.
2.)  Join: Joins are an essential part of any SQL statement. A developer must have proper knowledge of the tables involved; otherwise a join can populate wrong data or extend execution time.
3.)  Where: Another filter applied to the query after the joins. It is used to decrease the number of records according to the provided filter.
4.)  Group by: Used for grouping records with aggregate functions, or for grouping records by a particular provided column.
5.)  Having: When an aggregate function needs a filter, that filter should go in the ‘Having’ clause.
6.)  Column list: Putting ‘*’ selects all the columns and all the records from a table. Unless necessary, provide only those columns which are actually useful.
7.)  Distinct: Used to remove duplicate records while fetching details through a select statement.
8.)  Order by: Useful to sort the data ascending or descending by column.
9.)  Top: Used to limit the number of records to be displayed.
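To make the order concrete, here is a small sample query (the table and column names are hypothetical) annotated with the logical processing order described above, which differs from the order in which the clauses are written:

```sql
SELECT DISTINCT                          -- 7. Distinct
    TOP (10)                             -- 9. Top (applied last)
    c.CustomerName,                      -- 6. Column list
    SUM(o.Amount) AS TotalAmount
FROM dbo.Orders AS o                     -- 1. From
INNER JOIN dbo.Customers AS c            -- 2. Joins
    ON c.CustomerId = o.CustomerId
WHERE o.OrderDate >= '2019-01-01'        -- 3. Where
GROUP BY c.CustomerName                  -- 4. Group by
HAVING SUM(o.Amount) > 1000              -- 5. Having
ORDER BY TotalAmount DESC;               -- 8. Order by
```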

The above query execution flow is the first step of optimization: every keyword should be placed as per the above plan. The last three steps (7, 8, and 9) are the most crucial, because they process all the remaining records before performing their operations.

‘Group by’ and ‘Having’ clauses also take time during execution because they use aggregate functions. Along with that, it is necessary to know that built-in functions should be avoided when they are not needed. ‘Convert’ and ‘Cast’ are the most frequently used built-in functions which can extend execution time because of the conversion. As mentioned above, joins are the most important part of any SQL statement, because a good join increases performance while a wrong one can mislead you.
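One common case (a hypothetical example) is wrapping an indexed column in CONVERT or CAST inside the WHERE clause, which forces the conversion to run on every row and prevents SQL Server from seeking on the index; comparing the raw column against a range avoids this:

```sql
-- Slower: CONVERT runs on every row, so an index on OrderDate cannot be seeked.
SELECT OrderId FROM dbo.Orders
WHERE CONVERT(varchar(10), OrderDate, 120) = '2019-10-20';

-- Faster: compare the column directly against a range, keeping the index usable.
SELECT OrderId FROM dbo.Orders
WHERE OrderDate >= '2019-10-20' AND OrderDate < '2019-10-21';
```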

First of all, while implementing a join, please check whether the tables are properly indexed or not. Indexing is very important while creating a table; a table must have at least one clustered index.

Make less or no use of temp tables. Temp tables tend to increase the complexity of a query, because they increase the continuous use of the ‘tempdb’ database. If a temp table is necessary, create a clustered index on it, which increases performance, and don’t wait for the temp table to be dropped automatically; drop it when it is of no use.
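As a sketch of the advice above (all object names are made up), a temp table pattern would look like this:

```sql
-- Create the temp table only when genuinely needed.
CREATE TABLE #RecentOrders (
    OrderId   int      NOT NULL,
    OrderDate datetime NOT NULL
);

-- A clustered index on the temp table speeds up later joins and lookups.
CREATE CLUSTERED INDEX IX_RecentOrders_OrderId ON #RecentOrders (OrderId);

INSERT INTO #RecentOrders (OrderId, OrderDate)
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '2019-01-01';

-- ... use #RecentOrders here ...

-- Drop it explicitly instead of waiting for automatic cleanup.
DROP TABLE #RecentOrders;
```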

Go for the execution plan if a query is taking too much time; by seeing the plan we can easily find which query, or which portion of the query, is taking time. The execution plan shows which table used the maximum processing time out of the overall time.



Make your indexes unique using an integer or uniqueidentifier, which increases performance. A table must have one clustered key and can have one or more non-clustered keys. Use small data types for indexing.

For checking the existence of any record, don’t depend on a ‘Count’ statement in the query; ‘EXISTS’ is preferable.
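For example (with a hypothetical table), EXISTS can stop at the first matching row, whereas COUNT(*) must count every qualifying row just to answer a yes/no question:

```sql
-- Slower: counts every matching row only to check existence.
IF (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerId = 42) > 0
    PRINT 'Customer has orders';

-- Faster: EXISTS stops as soon as one matching row is found.
IF EXISTS (SELECT 1 FROM dbo.Orders WHERE CustomerId = 42)
    PRINT 'Customer has orders';
```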

            

           

Always use the ‘with (nolock)’ hint to avoid locks while fetching records from a table. Avoid the use of a ‘NOT IN’ statement in the where condition; instead of that you can go for a left join. In the same way there is no need to go for an ‘IN’ statement; you can simply use an inner join.
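Sketched with hypothetical tables, the join rewrites look like this:

```sql
-- Instead of NOT IN ...
SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE c.CustomerId NOT IN (SELECT o.CustomerId FROM dbo.Orders AS o);

-- ... use a LEFT JOIN and keep only the unmatched rows.
SELECT c.CustomerId
FROM dbo.Customers AS c
LEFT JOIN dbo.Orders AS o ON o.CustomerId = c.CustomerId
WHERE o.CustomerId IS NULL;

-- Instead of IN, an INNER JOIN returns only the matched rows.
SELECT DISTINCT c.CustomerId
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o ON o.CustomerId = c.CustomerId;
```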
            



Please avoid loops and cursors while creating any stored procedure, because looping also causes CPU usage by calling the same statement again and again; so avoid it when it is not needed. Use ‘UNION ALL’ instead of ‘UNION’ for combining two or more ‘select’ statements, because ‘UNION’ performs an extra duplicate-removal step.
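For example (hypothetical tables), when the two result sets are known not to overlap, UNION ALL skips the duplicate-removal work that UNION performs:

```sql
-- UNION must sort or hash the combined rows to remove duplicates.
SELECT CustomerId FROM dbo.ActiveCustomers
UNION
SELECT CustomerId FROM dbo.ArchivedCustomers;

-- UNION ALL simply concatenates the two results, which is cheaper
-- and safe when the sets cannot contain the same rows.
SELECT CustomerId FROM dbo.ActiveCustomers
UNION ALL
SELECT CustomerId FROM dbo.ArchivedCustomers;
```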


Saturday, 19 October 2019

Importance of Indexing


Indexing
While you are fetching data from a table, it is very important that the table is indexed, because indexing makes retrieval of data very easy. Without indexing, the whole table is scanned to fetch the required records, which is time consuming and most of the time creates performance issues, because it requires significantly more I/O and is much slower. This doesn’t mean that you should create many indexes on a single table; a table should have very few indexes. For that, we need to know in how many ways we can create indexes in SQL Server.

There are two types of indexes: 1) Clustered Index and 2) Non-Clustered Index, and an index can be built on two types of keys: i) Unique Key ii) Composite Key.

Keys

i)                    Unique Key:
A unique key means an index having only a single column value. Most of the time we give a table a primary key, which means we are providing a unique key on that column.
ii)                  Composite Key:
A composite key means an index built on multiple columns, i.e. a combination of column values. You can add up to 16 columns to a composite key, as long as you stay within the 900-byte-per-key size limit.
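As a small sketch (hypothetical table and column names), a single-column unique index versus a composite index:

```sql
-- Unique key: a single-column index that also enforces uniqueness.
CREATE UNIQUE INDEX UX_Customers_Email ON dbo.Customers (Email);

-- Composite key: one index over a combination of columns.
CREATE INDEX IX_Orders_Customer_Date ON dbo.Orders (CustomerId, OrderDate);
```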

Index Structure


You can create indexes on any data type available, but there are some exceptions: some large object data types cannot be indexed, such as image, text, and varchar(max). You can create indexes on XML, but they are different from normal indexes, and if you want to use XML indexes then the table must have a clustered index. While applying an index on XML, the XML should be shredded, which means the data is separated from the XML tags and organized into a relational format.

Indexes have a hierarchical nature, with a root node at the top and leaf nodes at the bottom. Between them are intermediate nodes, as you can see in Figure 1.


Figure1: B-tree structure of a SQL Server index

Types


     1)     Clustered Index
A clustered index stores the table’s data in the leaf nodes of its tree. An important characteristic of a clustered index is that the index remains sorted in either ascending or descending order. As a result, there can be only one clustered index per table or view, and the data in that table or view is sorted the same way the index is sorted.
A table which has a clustered index is called a clustered table, and a table which doesn’t have a clustered index is called a ‘heap’. With the 900-byte-per-key limit of a clustered index, about 8 index rows fit per page.
   
     2)     Non-Clustered Index
Non-clustered indexes contain only the values from the indexed columns and pointers (locators) to the actual data rows, rather than containing the data itself, which means the query engine needs an additional step to locate the actual data.
The locator structure depends on whether the index points to a clustered table or to a heap. If it points to a heap, the pointer is a row identifier that leads directly to the actual data row; if it points to a clustered table, the locator is the clustered key, which the engine uses to look up the row through the clustered index.
Non-clustered indexes do not keep the table’s data sorted the way a clustered index does. You can create many non-clustered indexes per table (up to 999 in SQL Server). With the 1,700-byte-per-key limit (increased from 900 bytes in SQL Server 2016), about 4 index rows fit per page.
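A minimal sketch (hypothetical table) creating one clustered and one non-clustered index:

```sql
-- Only one clustered index per table: the table data is stored in this order.
CREATE CLUSTERED INDEX IX_Orders_OrderId ON dbo.Orders (OrderId);

-- A non-clustered index stores key values plus locators back to the rows.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
```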

Fragmentation



Indexes are a great thing for relational database performance. Fragmentation occurs when the order of pages in an index doesn’t match the physical order in the data file. When an index is created it has no fragmentation, but as data is inserted, deleted, or updated, fragmentation builds up. There are two types of fragmentation.

Internal Fragmentation

Internal fragmentation occurs when data pages have extra empty space. If a piece of data is smaller than SQL Server’s page size, the engine allocates one page to store it; if the size exceeds the page size, the engine allocates multiple pages. Delete statements create major fragmentation by leaving empty space in data pages. For example, suppose a table has 2,500 pages which are 100 percent full, so the index is not fragmented. If the table is used frequently and grows to 3,000 pages that are only 83 percent full, the engine must process 3,000 pages instead of 2,500, roughly 20 percent more work for the same data.

Sparsely populated data pages also increase SQL Server’s cache size, as caching happens at the page level.


External Fragmentation

External fragmentation occurs when pages are totally filled. If an insert or update is fired whose data exceeds the remaining space and the leaf nodes are already full, more space is required. The engine then performs a page split operation: it creates a new page by splitting the existing page at 50 percent, which causes logical fragmentation because the new page is not contiguous with the original. External fragmentation results in increased random I/O, where the engine needs to read data from many places rather than reading it sequentially.
To avoid index fragmentation:
  1. Choose a cluster key that complements the table’s insert pattern
  2. Do not insert records with random key values
  3. Do not update records to make them longer
  4. Do not update index key columns
  5. Implement index fill factors
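To see how fragmented an index actually is, SQL Server exposes the `sys.dm_db_index_physical_stats` DMV; a query along these lines (the table name is a placeholder) reports the fragmentation percentage per index:

```sql
-- Average fragmentation per index on one table in the current database.
SELECT  i.name AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'),
                                    NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id;
```

A common rule of thumb is to reorganize an index when fragmentation is between roughly 5 and 30 percent, and rebuild it above 30 percent.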


The SQL Server engine uses the fill factor only when an index is created, rebuilt, or reorganized. The index fill factor is not used during regular inserts, updates, and deletes. In fact, that would not make any sense, because the whole point of the fill factor is to leave room so that inserts and updates can add more records without filling up the page.

Probably the easiest way to set the fill factor is to use the FILLFACTOR option when you create or rebuild an index. You can also use Object Explorer to set the fill factor. Note that you cannot set the fill factor when reorganizing an index.

You can use the below script to rebuild an index with a fill factor:
ALTER INDEX [key] ON [schema].[table name] REBUILD WITH (FILLFACTOR = 90);
OR
-          Select Database -> Tables, then right-click the table on which you want to specify the fill factor.
-          In the ‘Select a page’ pane, select Options.
-          Set the ‘Fill factor’ value and click OK.


Removing Fragmentation

You can rebuild or reorganize an index as and when required. If you rebuild an index:
-          It is faster for heavily fragmented data
-          It can reduce logging by switching to BULK_LOGGED recovery mode
-          It can also set FILLFACTOR.
-          It can use multiple CPUs
-          Must build new index before dropping old one.
-          Will rebuild indexes and update index statistics.

ALTER INDEX [key] ON [schema].[table name] REBUILD

If you reorganize an index:
-          It is interruptible, with no loss of work.
-          It is faster for a lightly fragmented index.
-          It will reorganize index pages but not update index statistics.

ALTER INDEX [key] ON [schema].[table name] REORGANIZE

There is a chance that statistics are not updated even though the underlying data has changed considerably, and the optimizer then misjudges the data. For that you need to update the column statistics. When you rebuild an index on a table, the index statistics are also updated, but for the table’s column statistics you need to use the ‘Update statistics’ statement with the ‘fullscan’ option.

UPDATE STATISTICS [schema].[table name] WITH FULLSCAN
