Today, we will see if adding a partition-by clause improves our SELECT queries in Teradata.
Let us go through the example below:
DATABASE RETAIL;
SELECT * FROM CUSTOMER;
ALTER TABLE CUSTOMER ADD JOINING_DT DATE FORMAT 'YYYY-MM-DD';
UPDATE CUSTOMER SET JOINING_DT = '2015-02-01' WHERE CUST_ID = 1;
UPDATE CUSTOMER SET JOINING_DT = '2015-05-01' WHERE CUST_ID = 201;
/* lET US RUN A SELECT FOR A PARTICULAR MONTH OF THE YEAR */
SELECT * FROM CUSTOMER WHERE JOINING_DT BETWEEN '2015-01-01' AND '2015-01-31';
/* " 1) First, we lock a distinct RETAIL.""pseudo table"" for read on a"
RowHash to prevent global deadlock for RETAIL.CUSTOMER.
" 2) Next, we lock RETAIL.CUSTOMER for read."
3) We do an all-AMPs RETRIEVE step from RETAIL.CUSTOMER by way of an
all-rows scan with a condition of ("(RETAIL.CUSTOMER.JOINING_DT <=
DATE '2015-01-31') AND (RETAIL.CUSTOMER.JOINING_DT >= DATE
" '2015-01-01')"") into Spool 1 (group_amps), which is built locally"
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 2 rows (102 bytes). The estimated time for this step is
0.03 seconds.
*/
As we see in STEP-3, the SELECT with range based WHERE results in a Full Table Scan. This is not a great way to query large tables.
Let us explore the alternate approach.
CREATE TABLE CUSTOMER_1
(
CUST_ID INTEGER
,CUST_NAME VARCHAR(30)
,JOINING_DT DATE FORMAT 'YYYY-MM-DD'
)
PRIMARY INDEX(JOINING_DT)
PARTITION BY (RANGE_N(
JOINING_DT BETWEEN '2014-01-01' AND '2014-12-31' EACH INTERVAL '1' YEAR
,'2015-01-01' AND '2015-12-31' EACH INTERVAL '1' MONTH
, NO RANGE OR UNKNOWN))
;
We insert the same data as was present in Customer table.
INSERT INTO CUSTOMER_1 SELECT * FROM CUSTOMER;
SELECT * FROM CUSTOMER_1 WHERE JOINING_DT BETWEEN '2015-01-01' AND '2015-01-31';
/* IF WE RERUN THE QUERY ON CUSTOMER TABLE ON THE NEW PARTIIONED TABLE THE EXPLAIN CHANGES AS BELOW
" 1) First, we lock a distinct RETAIL.""pseudo table"" for read on a"
RowHash to prevent global deadlock for RETAIL.CUSTOMER_1.
" 2) Next, we lock RETAIL.CUSTOMER_1 for read."
3) We do an all-AMPs RETRIEVE step from a single partition of
RETAIL.CUSTOMER_1 with a condition of (
"(RETAIL.CUSTOMER_1.JOINING_DT <= DATE '2015-01-31') AND
(RETAIL.CUSTOMER_1.JOINING_DT >= DATE '2015-01-01')") into Spool 1
" (group_amps), which is built locally on the AMPs. The size of"
Spool 1 is estimated with low confidence to be 4 rows (204 bytes).
The estimated time for this step is 0.03 seconds.
*/
As we see now that the partition is leveraged when the partition column is used in the WHERE clause of a SELECT query. This results in faster query execution.
Join us in the classroom for more stimulating sessions on Teradata.
Let us go through the example below:
DATABASE RETAIL;
SELECT * FROM CUSTOMER;
ALTER TABLE CUSTOMER ADD JOINING_DT DATE FORMAT 'YYYY-MM-DD';
UPDATE CUSTOMER SET JOINING_DT = '2015-02-01' WHERE CUST_ID = 1;
UPDATE CUSTOMER SET JOINING_DT = '2015-05-01' WHERE CUST_ID = 201;
/* lET US RUN A SELECT FOR A PARTICULAR MONTH OF THE YEAR */
SELECT * FROM CUSTOMER WHERE JOINING_DT BETWEEN '2015-01-01' AND '2015-01-31';
/* " 1) First, we lock a distinct RETAIL.""pseudo table"" for read on a"
RowHash to prevent global deadlock for RETAIL.CUSTOMER.
" 2) Next, we lock RETAIL.CUSTOMER for read."
3) We do an all-AMPs RETRIEVE step from RETAIL.CUSTOMER by way of an
all-rows scan with a condition of ("(RETAIL.CUSTOMER.JOINING_DT <=
DATE '2015-01-31') AND (RETAIL.CUSTOMER.JOINING_DT >= DATE
" '2015-01-01')"") into Spool 1 (group_amps), which is built locally"
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 2 rows (102 bytes). The estimated time for this step is
0.03 seconds.
*/
As we see in STEP-3, the SELECT with range based WHERE results in a Full Table Scan. This is not a great way to query large tables.
Let us explore the alternate approach.
CREATE TABLE CUSTOMER_1
(
CUST_ID INTEGER
,CUST_NAME VARCHAR(30)
,JOINING_DT DATE FORMAT 'YYYY-MM-DD'
)
PRIMARY INDEX(JOINING_DT)
PARTITION BY (RANGE_N(
JOINING_DT BETWEEN '2014-01-01' AND '2014-12-31' EACH INTERVAL '1' YEAR
,'2015-01-01' AND '2015-12-31' EACH INTERVAL '1' MONTH
, NO RANGE OR UNKNOWN))
;
We insert the same data as was present in Customer table.
INSERT INTO CUSTOMER_1 SELECT * FROM CUSTOMER;
SELECT * FROM CUSTOMER_1 WHERE JOINING_DT BETWEEN '2015-01-01' AND '2015-01-31';
/* IF WE RERUN THE QUERY ON CUSTOMER TABLE ON THE NEW PARTIIONED TABLE THE EXPLAIN CHANGES AS BELOW
" 1) First, we lock a distinct RETAIL.""pseudo table"" for read on a"
RowHash to prevent global deadlock for RETAIL.CUSTOMER_1.
" 2) Next, we lock RETAIL.CUSTOMER_1 for read."
3) We do an all-AMPs RETRIEVE step from a single partition of
RETAIL.CUSTOMER_1 with a condition of (
"(RETAIL.CUSTOMER_1.JOINING_DT <= DATE '2015-01-31') AND
(RETAIL.CUSTOMER_1.JOINING_DT >= DATE '2015-01-01')") into Spool 1
" (group_amps), which is built locally on the AMPs. The size of"
Spool 1 is estimated with low confidence to be 4 rows (204 bytes).
The estimated time for this step is 0.03 seconds.
*/
As we see now that the partition is leveraged when the partition column is used in the WHERE clause of a SELECT query. This results in faster query execution.
Join us in the classroom for more stimulating sessions on Teradata.
No comments:
Post a Comment
Please share your thoughts and let us know the topics you want covered