Wednesday 20 February 2013

Fastload pre-processing script in UNIX

We may get files where the total number of columns in the file does not match the layout defined in the FastLoad script.

As part of optimising the batch process, we can abort the FastLoad job up front if the file is not in the defined format.

Below is an example script to achieve that:

#!/bin/ksh

# Start with an empty error file so leftovers from a previous run cannot cause a false abort
> Error_Customer_Connect.txt

while read line
do
    # Assuming that "|" is the separator in your file
    Len1=`echo "$line" | awk -F'|' '{print NF}'`
    echo "$Len1 $line"

    # If your file is supposed to have only 32 columns
    if [ "$Len1" -ne 32 ]
    then
        echo "$line" >> Error_Customer_Connect.txt
    fi
done < Extract_Customer_Connect.txt

if [ -s Error_Customer_Connect.txt ]
then
    echo "File is not as per the defined layout"
    exit 1
fi

exit 0
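
For large files, the read loop above spawns one awk process per record, which is slow. A faster alternative is to let a single awk pass do the counting; below is a minimal sketch assuming the same pipe-delimited, 32-column layout and the same file names:

#!/bin/ksh

# One awk pass: write out every record whose field count is not 32
awk -F'|' 'NF != 32' Extract_Customer_Connect.txt > Error_Customer_Connect.txt

if [ -s Error_Customer_Connect.txt ]
then
    echo "File is not as per the defined layout"
    exit 1
fi

exit 0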


Let us know if this helped. Like us or share this blog on Google+

Donate a small amount for renovation of historical structures and for feeding the needy.


Tuesday 12 February 2013

Advantages and Disadvantages of MVC - Compression in Teradata with sample script

Multi-Value Compression (MVC), also called field compression, allows compression to be applied at a column level. MVC replaces values specified by the user with a compact bit pattern. When the data is accessed, the bit pattern is used to look up the original value in the list (or dictionary).


For character data, before Teradata 13.10 MVC was restricted to fixed-length columns (CHAR data type) up to 255 characters wide. From Teradata 13.10 onwards, MVC can also be used with variable-length columns (VARCHAR data type) up to 510 characters wide. MVC supports any numeric type, all character data, and the GRAPHIC, VARGRAPHIC, BYTE and VARBYTE types.
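
For example, from 13.10 onwards a VARCHAR column can carry MVC directly. A minimal sketch (the table and value list here are illustrative, not from the sample script below):

CREATE TABLE demo_db.customer_note
(
 cust_id INTEGER NOT NULL,
 note_status VARCHAR(20) COMPRESS ('OPEN' ,'CLOSED' ,'PENDING')
)
PRIMARY INDEX (cust_id);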


Compression is effective at saving space for repeating data.

For example, in the SALE_SEQ_NBR column below, the value always ranges between 1 and 18.
So, it is better to compress those values; the compressed value list is stored in the table header.
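
Before fixing the value list, it is worth checking which values actually repeat. A simple frequency query (a sketch, run against an existing or staging copy of the data; the names are from the sample script below) makes good candidates obvious:

SELECT SALE_SEQ_NBR, COUNT(*) AS row_cnt
FROM DB_PROD1.sales_detail
GROUP BY SALE_SEQ_NBR
ORDER BY row_cnt DESC;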


Sample Script for compression:


CREATE SET TABLE DB_PROD1.sales_detail ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130048 BYTES, FREESPACE = 15 PERCENT, CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      CUS_ID_NO DECIMAL(18,0) NOT NULL,
      IAN_TYP_CD CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC COMPRESS 'KS',
      SALE_NBR DECIMAL(15,0) NOT NULL,
      SALE_SEQ_NBR SMALLINT NOT NULL DEFAULT 0  COMPRESS (1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ),
      LOCN_NBR SMALLINT NOT NULL DEFAULT 0 ,
      KSN_ID DECIMAL(9,0) NOT NULL DEFAULT 0. ,
      DAY_DT DATE FORMAT 'YYYY/MM/DD' NOT NULL,
      UPC_NBR DECIMAL(13,0) NOT NULL,
      SKU_PRC_TYPE_CD BYTEINT NOT NULL COMPRESS (1 ,3 ,6 ),
      SELL_TYPE_CD BYTEINT NOT NULL COMPRESS 0 ,
      SELL_QTY DECIMAL(9,4) NOT NULL COMPRESS (0.0000 ,1.0000 ,-1.0000 ),
      SELL_AMT DECIMAL(9,2) NOT NULL COMPRESS (12.99 ,2.79 ,7.99 ,2.99 ,3.00 ,0.50 ))
PRIMARY INDEX ( CUS_ID_NO )
PARTITION BY RANGE_N(DAY_DT  BETWEEN DATE '2006-01-01' AND DATE '2010-12-31' EACH INTERVAL '1' MONTH ,
DATE '2011-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' DAY ,
 NO RANGE, UNKNOWN)
INDEX ( KSN_ID )
INDEX ( LOCN_NBR ,DAY_DT )
INDEX ( UPC_NBR );
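
To verify the savings, compare the table's CurrentPerm before and after compression. A minimal sketch using the DBC.TableSizeV view:

SELECT DataBaseName, TableName, SUM(CurrentPerm) AS current_perm_bytes
FROM DBC.TableSizeV
WHERE DataBaseName = 'DB_PROD1'
  AND TableName = 'sales_detail'
GROUP BY 1,2;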



Real-life Scenario faced:  

Disadvantages:

  1. MVC can lead to long-running queries and extra CPU cycles.
  2. Compressed columns cannot be used in the Primary Index or in the partitioning expression of a PPI table.


Explanation:  If you have compression on a column used in a secondary index, the data can still be skewed (as data is distributed according to the PI, not the compressed column).
Now, there is a small overhead in fetching data based on an MVC value (the look-up in the table header). This overhead is magnified if the data is skewed. Hence, for long-running queries, the hot AMP may become very CPU-intensive.

Hence, we suggest using MVC for cold data: the columns in an SCD that are descriptive and not used in joins.
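
From Teradata 13.10 onwards, a value list can also be added to an existing column with ALTER TABLE, so compression can be applied only where it pays off. A minimal sketch against the sample table above (an assumption for illustration; the column must not be part of the PI):

ALTER TABLE DB_PROD1.sales_detail
ADD SELL_TYPE_CD COMPRESS (0 ,1 ,2);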

We will cover Block-level compression and Algorithmic Compression in a later post.

Please donate using the button on the right if you find this post helpful !! Let us know your issues, and we will solve them for you.

Friday 8 February 2013

Sample FastExport to export from multiple tables/SQLs - Multiple exports in 1 script

In most live projects, we will find a requirement where a single data file has to be created containing information about the customer, his behaviour on the websites, and his account details.

This can be easily solved using FastExport, which lets us run multiple SQL SELECTs in one script.

Just write these SQLs between .BEGIN EXPORT and .END EXPORT.

Example of Multi-SQL Fast-export script:


.logtable $LOADDB.LOGTABLE_ARRANGEMENT;



.run File $LOGON_FILE;
.begin export sessions 2;
.export outfile $VAR6 FORMAT TEXT MODE RECORD;
 
/* Below SELECT is for all records with product code 006_DET */
 
SELECT TRIM(CAST(HEAD1.HEADER AS CHAR(100))) FROM
(
SELECT
'¿'||
TRIM(A.ID_HEADER)
|| '|' ||TRIM(A.CLUSTER_NAME)
|| '|' || TRIM(A.BUS_DATE)
|| '|' ||TRIM(A.PRODUCT_NAME)
|| '|' ||TRIM(A.PROCESS_NAME)
|| '|' ||TRIM(A.FREQUENCY_NAME)
|| '|' ||TRIM(A.VERSION_NO)
|| '¿'
AS HEADER
FROM
(
SELECT
'01' AS ID_HEADER
,TRIM(CLUSTER_NAME) AS CLUSTER_NAME
, CAST(CAST(CAST('$BUS_DATE' AS DATE FORMAT 'YYYY-MM-DD') AS DATE FORMAT 'DDMMMYYYY') AS VARCHAR(12)) AS BUS_DATE
,TRIM(PRODUCT_NAME) AS PRODUCT_NAME
,TRIM(PROCESS_NAME) AS PROCESS_NAME
,TRIM(FREQUENCY_NAME) AS FREQUENCY_NAME
,MAX(VERSION_NO)+1 AS VERSION_NO
FROM
?LOADDB.VERSION_HEAD
WHERE CAST('$BUS_DATE' AS DATE FORMAT 'YYYY-MM-DD')= RUNDATE
AND PRODUCT_NAME='PT_006_DET'
group by 1,2,3,4,5,6
) AS A
) AS HEAD1
;

/* Below SELECT is for the details listed under the Accounts table */


 
 
SELECT TRIM(CAST(HEAD1.HEADER AS CHAR(100))) FROM
(
SELECT
'¿'||
TRIM(A.ID_HEADER)
|| '|' ||TRIM(A.CLUSTER_NAME)
|| '|' || TRIM(A.BUS_DATE)
|| '|' ||TRIM(A.PRODUCT_NAME)
|| '|' ||TRIM(A.PROCESS_NAME)
|| '|' ||TRIM(A.FREQUENCY_NAME)
|| '|' ||TRIM(A.VERSION_NO)
|| '¿'
AS HEADER
FROM
(
SELECT
'01' AS ID_HEADER
,TRIM(CLUSTER_NAME) AS CLUSTER_NAME
, CAST(CAST(CAST('$BUS_DATE' AS DATE FORMAT 'YYYY-MM-DD') AS DATE FORMAT 'DDMMMYYYY') AS VARCHAR(12)) AS BUS_DATE
,TRIM(PRODUCT_NAME) AS PRODUCT_NAME
,TRIM(PROCESS_NAME) AS PROCESS_NAME
,TRIM(FREQUENCY_NAME) AS FREQUENCY_NAME
,MAX(VERSION_NO)+1 AS VERSION_NO
FROM
?LOADDB.VERSION_HEAD
WHERE CAST('$BUS_DATE' AS DATE FORMAT 'YYYY-MM-DD')= RUNDATE
AND PRODUCT_NAME='PT_007_ACC'
group by 1,2,3,4,5,6
) AS A
) AS HEAD1
;

.end export;
.logoff;
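
To run the script, feed it to the FastExport utility from the shell. A minimal sketch, assuming the script is saved as multi_export.fx (an illustrative name) with the $-variables already resolved, and the fexp binary on the PATH:

fexp < multi_export.fx > multi_export.log 2>&1

if [ $? -ne 0 ]
then
    echo "FastExport failed - check multi_export.log"
    exit 1
fi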

Let us know if this was of help !!

Please donate for the renovation of old architectural buildings using the button on the right.


Friday 1 February 2013

Queue Table implementation - Job Control with Sample Script

Queue tables were introduced in Teradata V2R6. They help in job control by providing first-in, first-out (FIFO) ordering.

The job that has waited the longest moves to the front of the queue and will be executed first.

Since queue tables are created in Teradata with ordinary CREATE TABLE SQL, all A-C-I-D rules of an RDBMS apply.

Hence, if the job is aborted, the transaction is rolled back and the consumed entry returns to the queue.

The first column in a queue table is always the QITS column (Queue Insertion Time Stamp).

Below is a sample implementation of Queue Tables:


CREATE TABLE DB_LOAD_TBLS.LAYWY_CONFIG, QUEUE
(
 QITS_JOB_START TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
 JOB_ID INTEGER
);

INSERT INTO DB_LOAD_TBLS.LAYWY_CONFIG
VALUES
(CURRENT_TIMESTAMP(6)
, 1
);
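
Since the QITS column has a DEFAULT, the insert can also name only JOB_ID and let the timestamp fill in automatically; a small sketch:

INSERT INTO DB_LOAD_TBLS.LAYWY_CONFIG (JOB_ID) VALUES (2);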

SELECT * FROM DB_LOAD_TBLS.LAYWY_CONFIG;

SELECT AND CONSUME TOP 1 * FROM DB_LOAD_TBLS.LAYWY_CONFIG;

The SELECT AND CONSUME statement simulates a FIFO pop: it returns the oldest row (by QITS) and deletes it within the same transaction.
So, the moment the job is picked up, its entry is deleted, which also helps in preventing re-runs.
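
A minimal job-control sketch in SQL (assuming Teradata session mode, where BT and ET delimit the transaction; the consumed JOB_ID would drive the batch step):

BT;

/* Pop the oldest queue entry; if the queue is empty, the session waits for a row */
SELECT AND CONSUME TOP 1 * FROM DB_LOAD_TBLS.LAYWY_CONFIG;

/* ... run the batch step for the consumed JOB_ID ... */

ET;  /* commit: the entry stays consumed; an abort would put it back in the queue */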

Feel free to add comments. And if you feel helped, please donate !!!