Data Analysis Example: CSUM and ROW_NUMBER comparison in Teradata - Example script with statistics and syntax

Friday, 4 September 2015

CSUM and ROW_NUMBER comparison in Teradata - Example script with statistics and syntax - Surrogate key generation in Teradata

We keep getting questions about which one to use for sequence number generation in Teradata.

Many people use CSUM(1,1) as its easier and other databases support that. But, in Teradata , it is not recommended.

To test this out, we ran the below queries today on a table that contains 60K rows and another with just 1000 rows.

Example script:

--- Using CSUM
SELECT CSUM(1, SEED1.ID_MAX), L_ORDERKEY
,ID_MAX
FROM ITEMPPI AS E
CROSS JOIN
(SELECT 1 AS ID_MAX)AS SEED1;

--- Using Row Number
SELECT ROW_NUMBER() OVER(ORDER BY L_ORDERKEY) + SEED1.ID_MAX, L_ORDERKEY
,ID_MAX
FROM ITEMPPI AS E
CROSS JOIN
(SELECT 0 AS ID_MAX)AS SEED1;

SELECT * FROM DBC.DBQLogTbl
WHERE CAST(COLLECTTIMESTAMP AS DATE FORMAT 'YYYY-MM-DD')= CURRENT_DATE
AND SESSIONID = 1624
ORDER BY COLLECTTIMESTAMP DESC;

When you check the QueryLog after running a CSUM(1,1) you'll notice that a single AMP (usually vproc 0) processed all the data ,resulting in high cpu/io and spool.

For OLAP functions there is distribution of the data based on PARTTITION and ORDER and for CSUM(1,1) there's only 1 partition and no order. This can be seen in the diagram below:

Statistics and query log comparison between CSUM and ROW_NUMBER in Teradata

For a smaller table, the difference was not noteworthy, but for larger datasets, we see a big difference.

Let us know if you need more information. Like us on Twitter or Facebook.

Data Analysis Example

Pages

Friday, 4 September 2015

CSUM and ROW_NUMBER comparison in Teradata - Example script with statistics and syntax - Surrogate key generation in Teradata

No comments:

Post a Comment

Knowledge Archive