Data Analysis Example: Understand how partitioning improves query performance using Teradata EXPLAIN

Today, we will see if adding a partition-by clause improves our SELECT queries in Teradata.

Let us go through the example below:

DATABASE RETAIL;

SELECT * FROM CUSTOMER;

ALTER TABLE CUSTOMER ADD JOINING_DT DATE FORMAT 'YYYY-MM-DD';

UPDATE CUSTOMER SET JOINING_DT = '2015-02-01' WHERE CUST_ID = 1;
UPDATE CUSTOMER SET JOINING_DT = '2015-05-01' WHERE CUST_ID = 201;

/* lET US RUN A SELECT FOR A PARTICULAR MONTH OF THE YEAR */

SELECT * FROM CUSTOMER WHERE JOINING_DT BETWEEN '2015-01-01' AND '2015-01-31';

/* " 1) First, we lock a distinct RETAIL.""pseudo table"" for read on a"
RowHash to prevent global deadlock for RETAIL.CUSTOMER.
" 2) Next, we lock RETAIL.CUSTOMER for read."
3) We do an all-AMPs RETRIEVE step from RETAIL.CUSTOMER by way of an
all-rows scan with a condition of ("(RETAIL.CUSTOMER.JOINING_DT <=
DATE '2015-01-31') AND (RETAIL.CUSTOMER.JOINING_DT >= DATE
" '2015-01-01')"") into Spool 1 (group_amps), which is built locally"
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 2 rows (102 bytes). The estimated time for this step is
0.03 seconds.
*/

As we see in STEP-3, the SELECT with range based WHERE results in a Full Table Scan. This is not a great way to query large tables.

Let us explore the alternate approach.

CREATE TABLE CUSTOMER_1
(
CUST_ID INTEGER
,CUST_NAME VARCHAR(30)
,JOINING_DT DATE FORMAT 'YYYY-MM-DD'
)
PRIMARY INDEX(JOINING_DT)
PARTITION BY (RANGE_N(
JOINING_DT BETWEEN '2014-01-01' AND '2014-12-31' EACH INTERVAL '1' YEAR
,'2015-01-01' AND '2015-12-31' EACH INTERVAL '1' MONTH
, NO RANGE OR UNKNOWN))
;

We insert the same data as was present in Customer table.

INSERT INTO CUSTOMER_1 SELECT * FROM CUSTOMER;

SELECT * FROM CUSTOMER_1 WHERE JOINING_DT BETWEEN '2015-01-01' AND '2015-01-31';

/* IF WE RERUN THE QUERY ON CUSTOMER TABLE ON THE NEW PARTIIONED TABLE THE EXPLAIN CHANGES AS BELOW

" 1) First, we lock a distinct RETAIL.""pseudo table"" for read on a"
RowHash to prevent global deadlock for RETAIL.CUSTOMER_1.
" 2) Next, we lock RETAIL.CUSTOMER_1 for read."
3) We do an all-AMPs RETRIEVE step from a single partition of
RETAIL.CUSTOMER_1 with a condition of (
"(RETAIL.CUSTOMER_1.JOINING_DT <= DATE '2015-01-31') AND
(RETAIL.CUSTOMER_1.JOINING_DT >= DATE '2015-01-01')") into Spool 1
" (group_amps), which is built locally on the AMPs. The size of"
Spool 1 is estimated with low confidence to be 4 rows (204 bytes).
The estimated time for this step is 0.03 seconds.

*/

As we see now that the partition is leveraged when the partition column is used in the WHERE clause of a SELECT query. This results in faster query execution.

Join us in the classroom for more stimulating sessions on Teradata.

Data Analysis Example

Pages

Saturday, 8 August 2015

Understand how partitioning improves query performance using Teradata EXPLAIN

No comments:

Post a Comment

Knowledge Archive