Saturday 26 January 2013

Ad-hoc Queries - How to handle them in the Teradata environment

Ad-hoc queries are typically difficult, resource-consuming processes. They are generally fired by business users and are used for strategic decision making.

As the enterprise grows, handling them becomes more costly.

For the Teradata Database administrator, the following options are available:


  1. Understand the workload mix (using Workload Monitor) and balance the workload (using Teradata Active Workload Manager)
  2. Increase or decrease the priority of the query
  3. Monitor the resource consumption of the query using Viewpoint (see the sketch after this list)
  4. Grant "Privileged Users" permissions to the reporting users
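For resource monitoring (option 3), DBQL can complement Viewpoint. The sketch below is a minimal example run from the shell; it assumes DBQL query logging is enabled, that the DBC.QryLogV view exists on your release (older releases expose DBC.QryLog), and that the logon (DWTEST1/USER1,PWD1) from the Multiload examples later in this blog applies. Adjust all of these for your site.

#List today's top 10 CPU-consuming queries from the DBQL log
bteq <<EOF
.LOGON DWTEST1/USER1,PWD1;
SELECT TOP 10
       UserName
      ,QueryID
      ,AMPCPUTime
      ,TotalIOCount
FROM DBC.QryLogV
WHERE CAST(StartTime AS DATE) = CURRENT_DATE
ORDER BY AMPCPUTime DESC;
.LOGOFF;
.QUIT;
EOF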
Administrators can also refer to the following Teradata e-brochure.

As for end users, there are plenty of ad-hoc query and reporting tools.

These tools have a few drawbacks, which you can read about in the article listed below:
Ad Hoc Query Tools Shootout (this article was written in 1995, and to date some of the drawbacks mentioned by Dr. Paul Dorsey remain)


Friday 25 January 2013

Sample UNIX Shell script to Archive and Delete Data files

Almost every data-warehousing or OLTP system procures data from flat files.
Storing, archiving and purging these files is necessary to maintain free space on your file system.

Below is a sample script which will help you in these activities:


$> vi ArchiveScript.ksh

Then, in the vi editor, enter the script below:

#! /bin/ksh

#Set Local variables

Date1=`date +%Y%m%d`
echo "$Date1 is Current Date"

#User-defined function to display error messages
#Exits the script with the failing return code; on success, control returns to the caller
ErrorCapture()
{
if [ $1 -ne 0 ]
then
echo "$2"
exit $1
fi
return 0
}


#Delete users temporary files

if [ -s /tmp/users.tmp1 ]
then
echo " Delete Temporary files"
rm /tmp/users.tmp1
RC=$?
ErrorCapture $RC "Temporary file removal Failed "
else
echo "Temp file not found"
fi

# Archive users data file

if [  -s /fs/fs01/data/users ]
then
echo "Rename file with datestamp"
if [ -d /fs/fs01/data/SYWmbrVisitArch ]
then
mv /fs/fs01/data/users /fs/fs01/data/SYWmbrVisitArch/users_$Date1
else
echo "Creating directory /fs/fs01/data/SYWmbrVisitArch"
mkdir /fs/fs01/data/SYWmbrVisitArch
mv /fs/fs01/data/users /fs/fs01/data/SYWmbrVisitArch/users_$Date1
fi
RC=$?
ErrorCapture $RC "Renaming User Failed"
else
echo "File users not found in /fs/fs01/data"
fi

#Purge files older than 7 days

find /fs/fs01/data/SYWmbrVisitArch -type f -mtime +7 -exec rm {} \;
RC=$?
ErrorCapture $RC "Purge of datafile users older than 7 days Failed"

exit 0


Step 3 - Save and exit vi by pressing 'Esc' and then typing ':wq!'
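Once saved, the script can be made executable and run manually, or scheduled. A minimal sketch, assuming the script sits in /home/youruser (a hypothetical path) and should run daily at 01:00 via cron:

$> chmod +x ArchiveScript.ksh
$> ./ArchiveScript.ksh

#Sample crontab entry (add it with 'crontab -e') to run the script every day at 01:00
0 1 * * * /home/youruser/ArchiveScript.ksh >> /tmp/ArchiveScript.log 2>&1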


Let us know if this was helpful. Subscribe using the button on the right !!

Thursday 24 January 2013

Multiload UPSERT in Teradata with Example

UPSERT is a DBMS operation in which rows of the incoming delta that match existing rows are updated first, and the remaining (new) rows in the delta file are inserted.

Multiload is well-equipped to handle such requests.


The following example illustrates how you can use Multiload to perform an UPSERT operation:


Note: This is different from the SQL UPSERT (UPDATE ... ELSE INSERT...)
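For comparison, the SQL (Atomic) UPSERT is a single statement. A minimal sketch run through bteq, assuming INTERACT_TYP is the primary index of the target table and using the values 'EML'/'E-mail' purely for illustration:

bteq <<EOF
.LOGON DWTEST1/USER1,PWD1;
UPDATE DW_TBLS.USER_INTERACT_ID
SET INTERACT_DESC = 'E-mail'
WHERE INTERACT_TYP = 'EML'
ELSE INSERT INTO DW_TBLS.USER_INTERACT_ID (INTERACT_TYP, INTERACT_DESC)
VALUES ('EML', 'E-mail');
.LOGOFF;
.QUIT;
EOF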


Sample script:


.LOGTABLE WORK_TBLS.USER_INTERACT_ID_LOG;
.LOGON DWTEST1/USER1,PWD1;
.BEGIN IMPORT MLOAD
TABLES DW_TBLS.USER_INTERACT_ID
WORKTABLES WORK_TBLS.USER_INTERACT_ID_WK
ERRORTABLES WORK_TBLS.USER_INTERACT_ID_ET
WORK_TBLS.USER_INTERACT_ID_UV;

.LAYOUT DATAIN_LAYOUT;
.FIELD INTERACT_DESC 1 VARCHAR(30);
.FIELD INTERACT_TYP * varchar(10);

.DML LABEL UPDATE_DML
DO INSERT FOR MISSING UPDATE ROWS;
UPDATE DW_TBLS.USER_INTERACT_ID
SET
INTERACT_DESC = :INTERACT_DESC
WHERE
INTERACT_TYP = :INTERACT_TYP
;
INSERT INTO DW_TBLS.USER_INTERACT_ID
(
INTERACT_TYP
,INTERACT_DESC
)
values(
:INTERACT_TYP
,:INTERACT_DESC
);
.IMPORT INFILE /tmp/InteractionTypes.csv.tmp1
FORMAT vartext ','
LAYOUT DATAIN_LAYOUT
APPLY UPDATE_DML
;
.END MLOAD;
.LOGOFF;

DO INSERT FOR MISSING UPDATE ROWS - this clause tells Multiload that an UPSERT is to be performed: the UPDATE is attempted first, and the INSERT is applied only for rows the UPDATE did not find.
Otherwise, every input row qualifies for both the UPDATE and the INSERT, and the rows that already exist in the target end up in the _UV table because their INSERT violates uniqueness.
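To execute the script, save it to a file and feed it to the mload client utility, then check the return code. A minimal sketch, assuming the script above was saved as /tmp/upsert_interact.ml (a hypothetical name) and mload is on your PATH:

$> mload < /tmp/upsert_interact.ml > /tmp/upsert_interact.log 2>&1
$> echo $?
#0 = success, 4 = completed with warnings, 8 or 12 = error (standard Teradata utility return codes)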

Keep reading to understand how to run a Multiload, how to restart it, and more!!

Subscribe with us if you like our blog, and help us understand what topics we should cover.


Multiload in Teradata using a Variable-length, comma-separated file

How do you load a table from a comma-separated, variable-length file?


Multiload supports five FORMAT options for loading data:




  • FASTLOAD
  • BINARY
  • TEXT
  • UNFORMAT
  • VARTEXT
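With VARTEXT, every input record is a delimited line of variable-length fields, and every field in the layout is declared as VARCHAR. For illustration, the input file /tmp/InteractionTypes.csv.tmp1 used in the example below could be created as follows (the two columns match the layout, description first and then type; the values are made up):

$> cat > /tmp/InteractionTypes.csv.tmp1 <<EOF
Clicked on banner,CLK
Opened e-mail,EML
EOF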


The following is a Teradata Multiload example:

.LOGTABLE WORK_TBLS.USER_INTERACT_ID_LOG;
.LOGON DWTEST1/USER1,PWD;

.BEGIN IMPORT MLOAD
TABLES DW_TBLS.USER_INTERACT_ID
WORKTABLES WORK_TBLS.USER_INTERACT_ID_WK
ERRORTABLES WORK_TBLS.USER_INTERACT_ID_ET
WORK_TBLS.USER_INTERACT_ID_UV;

.LAYOUT DATAIN_LAYOUT;
.FIELD INTERACT_DESC 1 VARCHAR(30);
.FIELD INTERACT_TYP * VARCHAR(10);

.DML LABEL INSERT_DML;
INSERT INTO DW_TBLS.USER_INTERACT_ID
(
INTERACT_TYP
,INTERACT_DESC
)
VALUES (
:INTERACT_TYP
,:INTERACT_DESC
);

.IMPORT INFILE /tmp/InteractionTypes.csv.tmp1
FORMAT VARTEXT ','
LAYOUT DATAIN_LAYOUT
APPLY INSERT_DML;

.END MLOAD;

.LOGOFF;


Important Points to Note:

1. The error tables are optional. If you do not specify them, they will be created as ET_USER_INTERACT_ID and UV_USER_INTERACT_ID.

If they are specified, however, they are position-dependent: the acquisition-error (ET) table is named first and the uniqueness-violation (UV) table second, so the two can be interchanged by mistake.

Hence, if you write

ERRORTABLES
UV_USER_INTERACT_ID
ET_USER_INTERACT_ID

it will result in UV_USER_INTERACT_ID storing error rows and ET_USER_INTERACT_ID storing UPI violation rows.
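Whichever order you use, it is worth checking the error tables after a run before dropping them. A minimal sketch using bteq from the shell, reusing the table and logon names from the example above:

bteq <<EOF
.LOGON DWTEST1/USER1,PWD;
SELECT COUNT(*) FROM WORK_TBLS.USER_INTERACT_ID_ET;
SELECT COUNT(*) FROM WORK_TBLS.USER_INTERACT_ID_UV;
.LOGOFF;
.QUIT;
EOF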

In case you have any queries, just write them in the comments section. Let us know what topics you want covered in the next blogs!!

Subscribe with us if you like our blog!!