본문 바로가기

SQL/DBA 가이드

Azure Column store demo

Columnstore Index in Azure SQL Database - Demo

Consider an Order Management application. The key table for this application is ‘orders’ that tracks customer information, purchase price and the order status. It is expected that over time a large number of rows will have the order status as ‘Order Received’ (i.e. the order has been received by the customer. We don’t expect any more changes to the order once it has been received by the customer except when customer is not happy with the purchase and wants to return it. So for all practical purposes, once the order has been received by the customer, it can be considered that this row is now ‘cold’.  

create table orders (

AccountKey int not null,

customername nvarchar (50),

OrderNumber bigint,

PurchasePrice decimal (9,2),

OrderStatus smallint not NULL,

OrderStatusDesc nvarchar (50))

 

OrderStatusDesc

0 => Order Started

1 => Order Closed

2 => Order Paid

3 => Order Fullfillment Wait

4 => Order Shipped

5 => Order Received

 

Let us also create a clustered index on the columns ‘OrderStatus’

 

create clustered index orders_ci on orders(OrderStatus)


Load 3 million rows with the data pattern that 95% of the orders have already been received by the customer.

-- insert into the main table load 3 million rows

declare @outerloop int = 0

declare @i int = 0

declare @purchaseprice decimal (9,2)

declare @customername nvarchar (50)

declare @accountkey int

declare @orderstatus smallint

declare @orderstatusdesc nvarchar(50)

declare @ordernumber bigint

while (@outerloop < 3000000)

begin

Select @i = 0

begin tran

while (@i < 2000)

begin

set @ordernumber = @outerloop + @i

set @purchaseprice = rand() * 1000.0

set @accountkey = convert (int, RAND ()*1000)

set @orderstatus = convert (smallint, RAND()*100)

if (@orderstatus >= 5) set @orderstatus = 5

set @orderstatusdesc  =

case @orderstatus

WHEN 0 THEN  'Order Started'

WHEN 1 THEN  'Order Closed'

WHEN 2 THEN  'Order Paid'

WHEN 3 THEN 'Order Fullfillment'

WHEN 4 THEN  'Order Shipped'

WHEN 5 THEN 'Order Received'

END

insert orders values (@accountkey,

(convert(varchar(6), @accountkey) + 'firstname'),@ordernumber, @purchaseprice,@orderstatus, @orderstatusdesc)

set @i += 1;

end

commit

set @outerloop = @outerloop + 2000

set @i = 0

end

go

 

 

 

Now create a nonclustered columnstore index. Note, that it is just a DDL operation and similar to any other btree index that you would create on a rowstore table. No changes to the application needed. Unlike in earlier releases of SQL Server, the NCCI is updateable in SQL Server 2016 and Azure SQL Database, so your transaction workload will continue to run.


 

--create NCCI

CREATE NONCLUSTERED COLUMNSTORE INDEX orders_ncci ON orders  (accountkey, customername, purchaseprice, orderstatus, orderstatusdesc)

 

 

 

Load another 200k orders, but these orders are not ‘yet’ received by the customers using the below script.


--insert additional 200k rows

declare @outerloop int = 3000000

declare @i int = 0

declare @purchaseprice decimal (9,2)

declare @customername nvarchar (50)

declare @accountkey int

declare @orderstatus smallint

declare @orderstatusdesc nvarchar(50)

declare @ordernumber bigint

while (@outerloop < 3200000)

begin

Select @i = 0

begin tran

while (@i < 2000)

begin

set @ordernumber = @outerloop + @i

set @purchaseprice = rand() * 1000.0

set @accountkey = convert (int, RAND ()*1000)

set @orderstatus = convert (smallint, RAND()*5)

set @orderstatusdesc =

case @orderstatus

WHEN 0 THEN 'Order Started'

WHEN 1 THEN 'Order Closed'

WHEN 2 THEN 'Order Paid'

WHEN 3 THEN 'Order Fullfillment'

WHEN 4 THEN 'Order Shipped'

WHEN 5 THEN 'Order Received'

END

insert orders values (@accountkey,(convert(varchar(6), @accountkey) + 'firstname'),

@ordernumber, @purchaseprice, @orderstatus, @orderstatusdesc)

set @i += 1;

end

commit

set @outerloop = @outerloop + 2000

set @i = 0

end

go


If we look at the details on the columnstore index, you will see that 3 million rows are compressed while the new 200k rows are in the delta rowgroup

-- look at the rowgroups

select object_name(object_id), index_id, row_group_id, delta_store_hobt_id, state_desc, total_rows, trim_reason_desc, transition_to_compressed_state_desc

from sys.dm_db_column_store_row_group_physical_stats

where object_id = object_id('orders')


We will now run couple of analytics queries and compare the performance if NCCI was not there.

 These queries are simple and are not representative of the common real-time operational scenario involving multi-table joins but the intent here is to show you that query optimizer picks NCCI for analytics queries and provides significant speed up over traditional btree indexes. Also note, that the difference in performance will only widen as you add more data.

Example-1: --run simple query

select max (PurchasePrice)

from orders

The query plan is as follows showing that query optimizer chose NCCI.



When running the same query without using NCCI, the query took much longer to run (--

--run the query without using NCCI

select max (PurchasePrice)

from orders

option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

 

 

Example-2: Let us now consider a more complex query

-- a more complex query

select top 5 customername, sum (PurchasePrice), Avg (PurchasePrice)

from orders

where purchaseprice > 90.0 and OrderStatus=5

group by customername

 


Here is the query plan for the query using NCCI. One thing to note ,the aggregate got pushed down to the scan node



--a more complex query without NCCI

select top 5 customername, sum (PurchasePrice), Avg (PurchasePrice)

from orders

where purchaseprice > 90.0 and OrderStatus = 5

group by customername

option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

 

Check the Runtime information and compare the performance.