Columnstore Index in Azure SQL Database - Demo
Consider
an Order Management application. The key table for this application is ‘orders’
that tracks customer information, purchase price and the order status. It is
expected that over time a large number of rows will have the order status as
‘Order Received’ (i.e. the order has been received by the customer. We don’t
expect any more changes to the order once it has been received by the customer
except when customer is not happy with the purchase and wants to return it. So
for all practical purposes, once the order has been received by the customer,
it can be considered that this row is now ‘cold’.
create table orders (
AccountKey int not null,
customername nvarchar (50),
OrderNumber bigint,
PurchasePrice decimal (9,2),
OrderStatus smallint not NULL,
OrderStatusDesc nvarchar (50))
— OrderStatusDesc
— 0 => ‘Order Started’
— 1 => ‘Order Closed’
— 2 => ‘Order Paid’
— 3 => ‘Order Fullfillment Wait’
— 4 => ‘Order Shipped’
— 5 => ‘Order Received’
Let us also create a clustered index on the columns ‘OrderStatus’
create clustered index orders_ci on orders(OrderStatus)
Load 3 million rows with the data
pattern that 95% of the orders have already been received by the customer.
-- insert into the
main table load 3 million rows
declare @outerloop int = 0
declare @i int = 0
declare @purchaseprice decimal (9,2)
declare @customername nvarchar (50)
declare @accountkey int
declare @orderstatus smallint
declare @orderstatusdesc nvarchar(50)
declare @ordernumber bigint
while (@outerloop < 3000000)
begin
Select @i = 0
begin tran
while (@i < 2000)
begin
set @ordernumber = @outerloop + @i
set @purchaseprice = rand() * 1000.0
set @accountkey = convert (int, RAND ()*1000)
set @orderstatus = convert (smallint, RAND()*100)
if (@orderstatus >= 5) set @orderstatus = 5
set @orderstatusdesc =
case @orderstatus
WHEN 0 THEN 'Order Started'
WHEN 1 THEN 'Order Closed'
WHEN 2 THEN 'Order Paid'
WHEN 3 THEN 'Order Fullfillment'
WHEN 4 THEN 'Order Shipped'
WHEN 5 THEN 'Order Received'
END
insert orders values (@accountkey,
(convert(varchar(6), @accountkey) + 'firstname'),@ordernumber, @purchaseprice,@orderstatus, @orderstatusdesc)
set @i += 1;
end
commit
set @outerloop = @outerloop + 2000
set @i = 0
end
go
Now
create a nonclustered columnstore index. Note, that it is just a DDL operation
and similar to any other btree index that you would create on a rowstore table.
No changes to the application needed. Unlike in earlier releases of SQL Server,
the NCCI is updateable in SQL Server 2016 and Azure SQL Database, so your
transaction workload will continue to run.
--create NCCI
CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci ON orders (accountkey, customername, purchaseprice, orderstatus, orderstatusdesc)
Load another 200k orders, but these orders are not ‘yet’ received by the customers using the below script.
--insert additional
200k rows
declare @outerloop int = 3000000
declare @i int = 0
declare @purchaseprice decimal (9,2)
declare @customername nvarchar (50)
declare @accountkey int
declare @orderstatus smallint
declare @orderstatusdesc nvarchar(50)
declare @ordernumber bigint
while (@outerloop < 3200000)
begin
Select @i = 0
begin tran
while (@i < 2000)
begin
set @ordernumber = @outerloop + @i
set @purchaseprice = rand() * 1000.0
set @accountkey = convert (int, RAND ()*1000)
set @orderstatus = convert (smallint, RAND()*5)
set @orderstatusdesc =
case @orderstatus
WHEN 0 THEN 'Order Started'
WHEN 1 THEN 'Order Closed'
WHEN 2 THEN 'Order Paid'
WHEN 3 THEN 'Order Fullfillment'
WHEN 4 THEN 'Order Shipped'
WHEN 5 THEN 'Order Received'
END
insert orders values (@accountkey,(convert(varchar(6), @accountkey) + 'firstname'),
@ordernumber, @purchaseprice, @orderstatus, @orderstatusdesc)
set @i += 1;
end
commit
set @outerloop = @outerloop + 2000
set @i = 0
end
go
If
we look at the details on the columnstore index, you will see that 3 million
rows are compressed while the new 200k rows are in the delta rowgroup
-- look at the
rowgroups
select object_name(object_id), index_id, row_group_id, delta_store_hobt_id, state_desc, total_rows, trim_reason_desc, transition_to_compressed_state_desc
from sys.dm_db_column_store_row_group_physical_stats
where object_id = object_id('orders')
We will now run couple of analytics queries and compare the performance if NCCI was not there.
These queries are simple and are not
representative of the common real-time operational scenario involving
multi-table joins but the intent here is to show you that query optimizer picks
NCCI for analytics queries and provides significant speed up over traditional
btree indexes. Also note, that the difference in performance will only widen as
you add more
data.
Example-1: --run simple query
select max (PurchasePrice)
from orders
The query plan is as follows
showing that query optimizer chose NCCI.
When running the same
query without using NCCI, the query took much longer to run (--
--run the query without using NCCI
select max (PurchasePrice)
from orders
option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
Example-2: Let us now
consider a more complex query
-- a more complex
query
select top 5 customername, sum (PurchasePrice), Avg (PurchasePrice)
from orders
where purchaseprice > 90.0 and OrderStatus=5
group by customername
Here
is the query plan for the query using NCCI. One thing to note ,the aggregate
got pushed down to the scan node
--a more complex query without NCCI
select top 5 customername, sum (PurchasePrice), Avg (PurchasePrice)
from orders
where purchaseprice > 90.0 and OrderStatus = 5
group by customername
option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
Check the Runtime information and compare the performance.
'SQL > DBA 가이드' 카테고리의 다른 글
TOP 10 queries by IO and CPU (0) | 2017.02.22 |
---|---|
모든 데이터베이스 전체 공간 및 여유 공간 조회 (0) | 2017.02.01 |
SQL Server 2005 에서 2012 로 업그레이드 시 오류 (0) | 2014.08.07 |
MSSQL DBA (0) | 2014.07.21 |
[MSSQL] 2005 이상 LOCK MONITORING (0) | 2013.05.21 |