Home > In-MemoryOLTP > BUCKET_COUNT for Hash Indexes for SQL Server Memory Optimized Tables

BUCKET_COUNT for Hash Indexes for SQL Server Memory Optimized Tables

 

Indexes are used as entry points for memory-optimized tables.Data in memory is quickly read by index to locate in-memory data.

[From BOL]

A hash index consists of a collection of buckets organized in an array. A hash function maps index keys to corresponding buckets in the hash index. The following figure shows three index keys that are mapped to three different buckets in the hash index. For illustration purposes the hash function name is f(x).

Index keys mapped to different buckets.

 

The hashing function used for hash indexes has the following characteristics:

  • SQL Server has one hash function that is used for all hash indexes.

  • The hash function is deterministic. The same index key is always mapped to the same bucket in the hash index.

  • Multiple index keys may be mapped to the same hash bucket.

  • The hash function is balanced, meaning that the distribution of index key values over hash buckets typically follows a Poisson distribution.

    Poisson distribution is not an even distribution. Index key values are not evenly distributed in the hash buckets. For example, a Poisson distribution of n distinct index keys over n hash buckets results in approximately one third empty buckets, one third of the buckets containing one index key, and the other third containing two index keys. A small number of buckets will contain more than two keys.

If two index keys are mapped to the same hash bucket, there is a hash collision. A large number of hash collisions can have a performance impact on read operations.

The in-memory hash index structure consists of an array of memory pointers. Each bucket maps to an offset in this array. Each bucket in the array points to the first row in that hash bucket. Each row in the bucket points to the next row, thus resulting in a chain of rows for each hash bucket, as illustrated in the following figure.

 

The in-memory hash index structure.

 

The figure has three buckets with rows. The second bucket from the top contains the three red rows. The fourth bucket contains the single blue row. The bottom bucket contains the two green rows. These could be different versions of the same row.

 

Create In-Memory OLTP table

 

USE [AdventureWorks2014]
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [Sales].[SalesOrderDetail_imoltp4]
(
	[SalesOrderID] [int] NOT NULL,
	[SalesOrderDetailID] [int] IDENTITY(1,1) NOT NULL,
	[CarrierTrackingNumber] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[OrderQty] [smallint] NOT NULL,
	[ProductID] [int] NOT NULL,
	[SpecialOfferID] [int] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[UnitPriceDiscount] [money] NOT NULL CONSTRAINT [DF_SalesOrderDetail_UnitPriceDiscount4]  DEFAULT ((0.0)),
	[ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_SalesOrderDetail_ModifiedDate4]  DEFAULT (getdate()),

INDEX [IX_SalesOrderDetail_ProductId_4] NONCLUSTERED 
(
	[ProductID] ASC
),
CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_IN4] PRIMARY KEY NONCLUSTERED HASH 
(
	[SalesOrderID],
	[SalesOrderDetailID]
)WITH ( BUCKET_COUNT = 128)
)WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA )
GO

 

Created PRIMARY KEY NONCLUSTERED HASH Index with BUCKET_COUNT=128. BUCKET_COUNT indicates the no of buckets in the hash index.In this case we have 128 buckets.The max value we can have for BUCKET_COUNT=1,073,741,824

 

CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_IN4] PRIMARY KEY NONCLUSTERED HASH 
(
	[SalesOrderID],
	[SalesOrderDetailID]
)WITH ( BUCKET_COUNT = 128)

 

using Dynamic Management view we can monitor how BUCKET_COUNT is used.sys.dm_db_xtp_hash_index_stats

image

 

select * from sys.dm_db_xtp_hash_index_stats

 

image

 

HASH Index Statistics

 

Execute the below code to see how BUCKET_COUNT is used.

SELECT 
   object_name(hs.object_id) AS 'object name', 
   i.name as 'index name', 
   i.index_id,
   hs.total_bucket_count,
   hs.empty_bucket_count,
   floor((cast(empty_bucket_count as float)/total_bucket_count) * 100) AS 'empty_bucket_percent',
   hs.avg_chain_length, 
   hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs 
   JOIN sys.indexes AS i 
   ON hs.object_id=i.object_id AND hs.index_id=i.index_id

 

 

image

 

As you can see PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_IN4 Primary Key Hash Index has a total of 128 BUCKET_COUNT, and as we insert the records  empty_bucket_count will decrease, max_chain_length column will show no of records in each bucket.

 

 

image

 

image

 

After inserting few  millions of records all the BUCKET_COUNT where used and each BUKCET has 90156 rows

 

image

Categories: In-MemoryOLTP
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: