Update and Insert - The Concept
In SQL, we have the concept of UPDATE else INSERT (commonly called UPSERT), which means:
If a row is already in the table and its attributes have changed, update it.
Else, if the row is new, insert it into the table.
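For comparison, Teradata's traditional atomic UPSERT expresses this in a single statement. A minimal sketch, using a hypothetical table SALES_TBL whose primary index is SALE_ID:

-- Update the row if it exists; otherwise insert it.
-- The WHERE clause must fully specify the primary index value.
UPDATE SALES_TBL
SET SALE_AMT = 250.00
WHERE SALE_ID = 1001
ELSE INSERT INTO SALES_TBL (SALE_ID, SALE_AMT)
VALUES (1001, 250.00);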
Since V12.0, Teradata offers a newer feature for this, known as the ANSI Merge.
Advantages:
1. ANSI Merge is more portable across platforms, since it follows the ANSI standard syntax.
2. ANSI Merge takes care of which rows to update and which ones to insert, based on whether the ON condition finds a matching target row (see the skeleton after this list).
3. ANSI Merge ignores columns that are not included in the SET column list, leaving them untouched on update.
4. ANSI Merge improves query performance, because it is a single-table read, whereas the traditional UPSERT is a two-table read. It uses a new step called MRM (Merge Row Multiple operations), which performs the updates and inserts in a single step.
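The general shape of an ANSI Merge, with illustrative table and column names (tgt, src, id, amt are placeholders, not from the example below):

MERGE INTO tgt
USING src
ON tgt.id = src.id                  -- decides MATCHED vs NOT MATCHED
WHEN MATCHED THEN UPDATE
SET amt = src.amt                   -- columns not listed here stay untouched
WHEN NOT MATCHED THEN INSERT
VALUES (src.id, src.amt);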
Disadvantages:
1. The MERGE source (the derived table S) and the target table (ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS in the example below), when joined, must return unique rows.
If duplicates, i.e. multiple source rows, qualify to UPDATE the same target row, quite logically, the MERGE fails. A common safeguard is to de-duplicate the source first, as sketched below.
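One way to guarantee a unique source, sketched with hypothetical names (src_tbl, id, amt, and load_ts are assumptions):

-- Keep exactly one row per join key, so no target row
-- can be updated by more than one source row.
SELECT id, amt
FROM src_tbl
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY load_ts DESC) = 1;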
Example:
MERGE INTO ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS
USING (
SELECT VEND_PACK_OPR_ID, LOCN_OPR_NBR, DAY_NBR, MDS_STS, RPL_TYP_CD, INS_TYP_CD, INS_SUB_TYP_CD,
SUM(UN_QT) AS UN_QT, SUM(CST_DLR) AS CST_DLR, SUM(SLL_DLR) AS SLL_DLR
FROM STG_VIEWS_PRD.INCR_SHC_DLY_OPR_INS WHERE DAY_NBR <> '9999-09-09'
GROUP BY VEND_PACK_OPR_ID, LOCN_OPR_NBR, DAY_NBR, MDS_STS, RPL_TYP_CD, INS_TYP_CD, INS_SUB_TYP_CD ) S
ON ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.VEND_PACK_ID=S.VEND_PACK_OPR_ID
AND ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.LOCN_NBR=S.LOCN_OPR_NBR
AND ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.INS_TYP_CD=S.INS_TYP_CD
AND ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.INS_SUB_TYP_CD=S.INS_SUB_TYP_CD
AND ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.DAY_NBR = S.DAY_NBR
WHEN MATCHED THEN UPDATE
SET
RPL_TYP_CD = S.RPL_TYP_CD
, TTL_UN_QT = S.UN_QT
, TTL_CST_DLR = S.CST_DLR
, TTL_SLL_DLR = S.SLL_DLR
, PVE_UN_QT = CASE WHEN S.UN_QT > 0 THEN S.UN_QT ELSE 0 END
, PVE_CST_DLR = CASE WHEN S.UN_QT > 0 THEN S.CST_DLR ELSE 0 END
, PVE_SLL_DLR = CASE WHEN S.UN_QT > 0 THEN S.SLL_DLR ELSE 0 END
WHEN NOT MATCHED THEN INSERT
VALUES (
S.VEND_PACK_OPR_ID,
S.LOCN_OPR_NBR,
S.DAY_NBR,
S.MDS_STS,
S.RPL_TYP_CD,
S.INS_TYP_CD,
S.INS_SUB_TYP_CD,
S.UN_QT,
S.CST_DLR,
S.SLL_DLR,
CASE WHEN S.UN_QT > 0 THEN S.UN_QT ELSE 0 END,
CASE WHEN S.UN_QT > 0 THEN S.CST_DLR ELSE 0 END,
CASE WHEN S.UN_QT > 0 THEN S.SLL_DLR ELSE 0 END
);
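To see how Teradata plans to execute the statement, prefix it with the EXPLAIN keyword. A minimal illustration using the generic tgt/src skeleton from earlier (names are placeholders):

-- EXPLAIN returns the optimizer's plan instead of executing the statement.
EXPLAIN
MERGE INTO tgt
USING src
ON tgt.id = src.id
WHEN MATCHED THEN UPDATE SET amt = src.amt
WHEN NOT MATCHED THEN INSERT VALUES (src.id, src.amt);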
Explain:
So, let's see what the explain plan says:
1) First, we lock a distinct ARP_TBLS_PRD."pseudo table" for
write on a RowHash to prevent global deadlock for
ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.
2) Next, we lock STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS for access,
and we lock ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS for write.
3) We do an all-AMPs SUM step to aggregate from
STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS by way of an all-rows scan
with a condition of (
"STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.DAY_NBR <> DATE
'9999-09-09'") , grouping by field1 (
STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.VEND_PACK_OPR_ID
,STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.LOCN_OPR_NBR
,STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.DAY_NBR
,STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.MDS_STS
,STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.RPL_TYP_CD
,STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.INS_TYP_CD
,STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.INS_SUB_TYP_CD).
Aggregate Intermediate Results are computed locally, then placed
in Spool 3. The size of Spool 3 is estimated with no confidence
to be 7,696,317 rows (692,668,530 bytes). The estimated time for
this step is 0.08 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (used to materialize view, derived
table or table function S) (all_amps) (compressed columns allowed),
which is redistributed by the rowkey of (
STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.VEND_PACK_OPR_ID,
STG_TBLS_PRD.INCR_SHC_DLY_OPR_INS.DAY_NBR) to all AMPs. Then
we do a SORT to partition Spool 1 by rowkey. The size of Spool 1
is estimated with no confidence to be 7,696,317 rows (461,779,020
bytes). The estimated time for this step is 0.04 seconds.
This is the portion mentioned in point 3 above: only the columns listed in this condition are considered for the Merge. Any other column will be ignored by the Merge.
How does it help?
If you have 50,000 rows, of which only 1,000 changed, Merge will automatically update those 1,000 rows and not touch the remaining 49,000 rows.
Also, try to make sure that the source and target tables have the same PI and PPI, which results in an even faster operation (see the DDL sketch after the explain output below).
5) We do an all-AMPs merge with matched updates and unmatched inserts
into ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS from Spool 1 (Last
Use) with a condition of (
"(ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.VEND_PACK_ID =
VEND_PACK_OPR_ID) AND
((ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.LOCN_NBR =
LOCN_OPR_NBR) AND
((ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.INS_TYP_CD =
INS_TYP_CD) AND
((ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.INS_SUB_TYP_CD =
INS_SUB_TYP_CD) AND
(ARP_TBLS_PRD.WRK_FACT_SHC_DLY_OPR_INS.DAY_NBR = DAY_NBR ))))").
The number of rows merged is estimated with no confidence to be
7,696,317 rows.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
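A sketch of what "same PI and PPI" means, with hypothetical table and column names (tgt_tbl, id, day_nbr, amt are assumptions); the source or staging table would be defined with the identical PRIMARY INDEX and PARTITION BY clauses:

-- Matching the primary index (id) and the partitioning expression
-- (day_nbr) on both tables lets the merged rows land on the same
-- AMP and partition, avoiding redistribution during the MERGE.
CREATE TABLE tgt_tbl (
  id      INTEGER NOT NULL,
  day_nbr DATE    NOT NULL,
  amt     DECIMAL(18,2)
)
PRIMARY INDEX (id)
PARTITION BY RANGE_N (day_nbr BETWEEN DATE '2020-01-01'
                              AND     DATE '2030-12-31'
                              EACH INTERVAL '1' DAY);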