Pradeep Chauhan: 2014

Sunday, February 23, 2014

Working with Statistics

What are statistics?

Simply put - Statistics are a form of dynamic metadata that assists the query optimizer in making better decisions and creating the best possible execution plan. These statistics represent the distribution of the data within a column, or columns. The Query Optimizer uses them to estimate how many rows will be returned from a query plan.
Query Optimizer is an intelligent program; it does not necessarily use Statistics all the time.
Suppose there’s a table with very few records, let’s say – 50 rows. In this case it is better off performing a table scan instead of wasting time to perform multiple seeks.

When are statistics created?

a) Statistics are automatically created when there’s a clustered index on the column.

b) When there’s non-clustered index on one or more columns.

c) SQL Server can also build statistics on non-indexed columns. When the non-indexed columns are used in join or filter criteria, SQL Server automatically builds statistics on such columns if it believes such information valuable for creating a better execution plan.

d)Statistics can also be created by create statistics statement. For Ex:
CREATE STATISTICS <statistic-name> ON TableName(ColumnName(s))

Note: By default AUTO_CREATE_STATISTICS set to ON. In case if it is set to OFF on the database the statistics won’t be created by default except for the indexes.

The drawback of the having AUTO_UPDATE ON is that the statistics update will be a part of a optimization phase when the load on the Database is high i.e., lot of DMLs happening, the optimization phase will wait till the statistics get updated for those inserts and updates which results in slower queries.

To be sure whether statistics are being used or not, check out your Actual Number of Rows and Estimated Number of Rows.
If these numbers are (consistently) fairly close, then most likely your statistics are up-to-date and used by the optimizer for the query. If not, time for you to update your statistics.

How Statistics work?

Data is measured two different ways within a single set of statistics, by Density and by Data distribution.

Density:

Density describes the uniqueness of data in a particular column.

Density can be Calculated as 1/distinct values and it lies between 0 and 1
0 refers to no duplicate records
1 refers to 100% duplicate records.

Data Distribution:

The data distribution represents a statistical analysis of the kind of data that is in the first column available for statistics.

Data distribution is measured by histogram which keeps track of number of occurrences of data that fall into number of categories, and in distribution statistics these categories are chosen so as to represent the distribution of the data. It’s this information that the optimizer can use to estimate the number of rows returned by a given distribution value.

SQL collects and maintains a sampling of 200 values of the index data in RANGE_HI_KEY and uses this histogram to determine an estimated number of matches to the criteria.

To understand how exactly Statistics work, let’s take help of DBCC SHOW_STATISTICS

use AdventureWorks2008

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail','IX_SalesOrderDetail_ProductID')

Result would be something like this.

It Returns 3 Resultsets.

1) First Resultset focusses on the columns included in the index, details of number of rows, the number of steps in the tree, length and density. You can also check when the statistics were last updated. If it is too old then you need to update the statistics.

The optimizer uses this information to see if it is better to just do a scan than to waste the effort on multiple seeks due to the number of rows and steps.

2) Second Resultset guides you about the All density of frequent values. It also contains information of the columns included and width of the columns.

3) Third resultset is actually the most interesting section. This is where most people think about stats but really don't understand what's going on

SQL collects and maintains a sampling of 200 values of the index data

By using this histogram, SQL Server determines the expected no. of rows to be returned for a given query. For example: in the above if you see the highlighted section, you will noticed that in the 80th step we have

RANGE_HI_KEY=741

RANGE_ROWS=167

EQ_ROWS= 94

Firstly, it means that for range of ProductID>738 and ProductID<741 we will get 167 rows. This can be verified by executing the following query:

Select count(*) from Sales.SalesOrderDetail where ProductID>738 andProductID<741

Result is 167, which confirms the RANGE_ROWS statistics value

Secondly, it also means that for ProductID=741 we will get 94 rows. This can be verified by executing the following query:

Select count(*) from Sales.SalesOrderDetail where ProductID=741
Result is 94, which confirms the EQ_ROWS statistics value.

Conclusion:

Statistics play a very vital role in deciding the best execution plan for the query optimizer and can enhance the performance of queries to a great extent. Inefficient maintenance of statistics may result in all sorts of uninvited performance issues.

Sources:
https://www.simple-talk.com/sql/learn-sql-server/statistics-in-sql-server/

http://blog.namwarrizvi.com/?p=193

http://technet.microsoft.com/en-us/library/dd535534(v=sql.100).aspx

Tuesday, February 18, 2014

SQL Server Architecture

Alright, everyone's got their version of SQL Server Architecture. I've read a lot of stuff to understand what's the design n architecture of SQL. Below diagram puts together almost everything that there is in SQL Server.
I'll keep it short n crisp.

External Protocols
Database Engine
SQLOS API
SQL OS

1. External Protocols: Every request to SQL comes from one the four protocols.

Shared memory: Connects to SQL instance on the same machine.
TCP/IP: Connects to SQL Instance over the network - This is widely used protocol today.
VIA (Virtual Interface Adapter): Customized network and it varies from vendor to vendor. This is however deprecated.
Named Pipes: It is an IPC object more like a TCP socket, All that i understand with named pipes is that, it needs more traffic to transmit data than does TCP. Also Named Pipes is limited to local subnet.

2. Database Engine: Database Engine has two parts:

Storage Engine: As seen in the diagram, storage engine holds responsible for the integrity of the data and manages all data access, both through transaction-based commands and bulk operations such as bulk insert, and certain DBCC (Database Consistency Checker) commands.
Relational Engine: Understands the user's query, checks for syntax and symantics of the query and requests the Storage engine for data to processes the user's query. Communication between the relational engine and the storage engine is generally in terms of OLE DB row sets.

3. SQLOS API: This sits in between Database Engine and SQLOS. It provides necessary Libraries and the environment to execute SQL Statements.

4. SQLOS

The SQLOS layer handles activities that are normally considered to be operating system responsibilities, such as thread management (scheduling), synchronization primitives, deadlock detection, and memory management, including the buffer pool.

Saturday, February 15, 2014

Introduction

Hello All,

This is my first blog.
I am by Profession MS SQL DBA currently working at a reputed MNC.
I am a very curious person in most of life's aspects, many may find me odd n weird, while others find me to be a delight.😉

I have a total of 5+ yrs of experience as a MS SQL DBA till date (8th Jan'17).
I consider myself a beginner or intermediate in SSIS, SSRS, T-SQL and Powershell.

Reason why i'm onto blogger is there's so much on my mind, and i must admit i'm a lazy guy and can't maintain a physical diary to note important things. Hence Blogger.
Another important reason why i'm here is, to dump my technical knowledge so that i could refer it at some later point and hope it helps others too. I am will be posting a few important scripts and articles regarding SQL Server which i have collected over the years from various sources. I don't claim all of these scripts are my own 😝.
Any suggestions regarding any of my posts are welcome....!

Thank you.
Pradeep