Saturday 27 April 2013

Drive high-value Sales in Retail - A Data Science Case Study


In a retail store, customers often have a bad experience because of long queues at the billing register. But even with efficient billing agents, we can only reach a certain limit. So, if "N" is the total number of transactions and "t" represents time, the formula below signifies that we can accommodate a maximum of 2000 transactions per day per billing register.

                                  R (resultant transactions) = ∫₀ᵀ (dN/dt) dt ≤ 2000, where T is one working day

We can only accommodate 2000 transactions per day per register, so it is a bottleneck.
Now, we can add more registers, but that comes at a cost.
The alternative way to increase sales is to convert each transaction into a higher-dollar purchase. So, we implement business intelligence algorithms to drive high-value sales.
Even price-sensitive customers, when exposed to better quality or service, opt for higher-value products.

For example:

A customer buys 2 items.

Average time for billing 2 items = 2 items * 2 mins/item = 4 mins (assuming an average of 2 mins per item)

Case 1:

Each item costs $5. Total value of the transaction = $10.
Time taken = 4 mins.
Hence, value generated per unit time = $10/4 = $2.50 per minute.

Case 2:

Each item costs $9. Total value of the transaction = $18.
Time taken = 4 mins.
Hence, value generated per unit time = $18/4 = $4.50 per minute.
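
The same arithmetic scales to real transaction data. As a quick sketch in SQL, assuming a hypothetical table SalesTxn with columns TxnId, TxnValue and BillingMins (all three names are illustrative, not from an actual system):

SELECT TxnId,
       TxnValue / BillingMins AS ValuePerMin  /* assumes TxnValue is DECIMAL so the division keeps cents */
FROM   db_work_tbls.SalesTxn
ORDER BY ValuePerMin DESC;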

Probability and statistics can help answer numerous business problems.

Think about it !!

Business Intelligence helps us answer many such questions with ease. We will cover more such case studies later.

Sunday 21 April 2013

Sample UNIX script to compress and uncompress files - Use of the gzip, tar and zip commands

There are different ways of creating a file archive. Once we create the archive, we also need to know how to extract the files back and use them.

The below commands work great to achieve our goal:

Archive creation and extraction:

Using the tar command:
  1. tar -cvf TEST.tar ./* - Archives all the files located in the current directory into the file TEST.tar (note that tar on its own only archives, it does not compress)
  2. tar -xvf TEST.tar - Extracts all the files from the archive TEST.tar
  3. tar -tf TEST.tar - Lists all the files in the archive TEST.tar
Using the gzip command:
  1. gzip Test.txt - Compresses the file into Test.txt.gz
  2. gunzip Test.txt.gz - Uncompresses the file back to Test.txt
Using the zip command:

1. zip Test1 Test.txt - Creates an archive Test1.zip containing Test.txt
2. unzip Test1.zip -d /tmp/Test - Extracts the files in Test1.zip to the /tmp/Test path
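
Since tar by itself only archives and does not compress, the two tools are commonly combined. With GNU tar, the -z flag pipes the archive through gzip in a single step:

1. tar -czvf TEST.tar.gz ./* - Creates a gzip-compressed archive of the current directory
2. tar -xzvf TEST.tar.gz - Uncompresses and extracts the archive in one step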

Let us know if this was helpful !!

Monday 15 April 2013

Teradata FastExport returning blank lines - Common problems of having NULL values in a table - Example and Syntax

NULL handling is one of the most important aspects of reporting queries.
So, if you face a situation where your FastExport is returning blank lines, check the columns for NULLs.

For example, in the below script, CustEmailAddress is a nullable column. A customer may or may not provide his/her email address.
For all rows where CustEmailAddress is NULL, the entire row becomes NULL on concatenation, since any operation on NULL results in NULL.

Hence, these rows appear as blank rows in the export file.

So, we have to be careful while exporting such values with FastExport.

Steps to be performed to avoid blank lines:


  1. Use the COALESCE function to substitute a default value where NULL exists
  2. If the columns are of different datatypes, the default value in COALESCE may need to differ per column
  3. For email columns (CustEmailAddress in the example below), a default of '0' may lead to misinterpretation of the data. Choose the default value carefully; we would suggest confirming the DEFAULT values with downstream applications
  4. A good practice is to cast all columns to VARCHAR and then send a blank string instead of NULL, as sketched right after this list
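
As a sketch of point 4, a non-character column can be cast to VARCHAR before applying COALESCE, so that the blank default is valid for every column (the VARCHAR(20) length here is illustrative):

|| '|' ||TRIM(COALESCE(CAST(OrderValue AS VARCHAR(20)), ''))
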
You can also use the below UNIX command to remove blank lines from the FastExport output file:

sed '/^$/d' /tmp/outfile1.txt > /out/outfile_20130405.txt


Example Script:


/* Weird behavior of NULL */
/* If any column is null and coalesce is missing, appending the null column leads to an entire null row */

SELECT TRIM(DATA1)
     FROM
     (
     SELECT
     '¿'
     || trim(coalesce(SourceId,''))
     || '|' ||trim(coalesce(Dates,''))
     || '|' ||trim(coalesce(IPAddress,''))
     || '|' ||trim(coalesce(SalesNum,''))
     || '|' ||trim(coalesce(StoreNum,''))
     || '|' ||trim(coalesce(CustFirstName,''))
     || '|' ||trim(coalesce(CustEmailAddress,''))
     || '|' ||trim(coalesce(OrderValue,''))
     || '|' ||trim(coalesce(Sellerid,''))
     AS DATA1
     FROM
     db_work_tbls.CustOrder1
     )as A
     ;
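
For completeness, here is a minimal FastExport wrapper around a query like the one above. The log table, logon string and output path are placeholders taken from this post's examples, so treat it as a sketch rather than a production script:

.LOGTABLE db_work_tbls.CustOrder1_log;
.LOGON edw/developer1,test;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE /tmp/outfile1.txt MODE RECORD FORMAT TEXT;

/* the SELECT from the example above goes here */

.END EXPORT;
.LOGOFF;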

Let us know if this information has helped. Like us on Facebook or Google+

Please donate as your donation will help someone in need.

Saturday 13 April 2013

NULL handling in NOT IN clause - Scenario to handle NULL in Teradata


Very often, we try to find the rows in a table (Employee) where the employee_ID is not present in the table Resigned.

One way of achieving that is to use a NOT IN clause. But beware, since the presence of NULL in the subquery may give you wrong results.

A NULL in the NOT IN subquery results in no rows being returned, because no comparison against NULL is ever true in Teradata.
Moreover, any operation on NULL returns NULL.

So, make sure to handle NULLs when populating such tables, for example by using COALESCE(EMP_Salary, 0) for all resigned employees, or filter the NULLs out as shown after the example below.

The below example illustrates this unusual behavior of NULL:

Sample script:



/* Unusual behavior of NULL in NOT IN */

create table lci_work_tbls.tera11
(
col1 integer
) ;

insert into lci_work_tbls.tera11 values (1);
insert into lci_work_tbls.tera11 values (2);
insert into lci_work_tbls.tera11 values (3);

create table lci_work_tbls.tera12
(
col2 integer
) ;

insert into lci_work_tbls.tera12 values (1);
insert into lci_work_tbls.tera12 values (null);
insert into lci_work_tbls.tera12 values (3);

/* Test to find the rows returned using NOT IN */

select col1 from lci_work_tbls.tera11
where col1 not in (select col2 from lci_work_tbls.tera12)

;


Result - NO ROWS RETURNED
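
The standard fix is either to exclude NULLs inside the subquery or to rewrite the query with NOT EXISTS, which treats NULLs as non-matches. Both forms below are plain Teradata SQL; with the sample data above, each returns col1 = 2:

/* Fix 1 - filter NULLs in the subquery */
select col1 from lci_work_tbls.tera11
where col1 not in (select col2 from lci_work_tbls.tera12 where col2 is not null);

/* Fix 2 - rewrite using NOT EXISTS */
select col1 from lci_work_tbls.tera11 t1
where not exists (select 1 from lci_work_tbls.tera12 t2 where t2.col2 = t1.col1);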


Like us on Facebook or Google+ if you like our posts. Share what topics you want us to cover.

Please donate if you are helped. Your donation will help those who need help !!

Saturday 6 April 2013

BTEQ Import - The key to loading delta data at near real-time frequency - Sample script included


TPump is hands-down the best utility for maintaining an active data warehouse.
But many architects feel that if the data only needs to be refreshed every 3 hours, and the Teradata system peaks are not regular, we can use BTEQ Import instead.

It's easy to use, consumes minimal system resources, and is ideal for small delta loads.

Example script:

.logon edw/developer1,test;


.IMPORT REPORT FILE = /home/dev1/TRAN.MAR262013.csv

.REPEAT * PACK 250
USING
(
         IN_REC_TYP          CHAR(2)
        ,IN_CUS_NO           CHAR(18)
        ,IN_FTY_NO           CHAR(7)
        ,IN_REG_NO           CHAR(3)
        ,IN_TRS_NO           CHAR(4)
        ,IN_TRS_DT           CHAR(10)
        ,IN_TRS_TM           CHAR(8)
        ,IN_APP_ID           CHAR(9)
        ,IN_AMT              CHAR(7)
        ,IN_ADD_DWN_PMT_AM   CHAR(7)
        ,IN_FST_PMT_AM       CHAR(7)
)
INSERT INTO Database1.Pos_register
   (
         CUS_ID_NO
        ,IAN_TYP_CD
        ,FTY_NO
        ,REG_NO
        ,TRS_NO
        ,TRS_DT
        ,TRS_TM
        ,APP_ID
        ,AMT
        ,ADD_DWN_PMT_AM
        ,FST_PMT_AM
   )
VALUES
   (
         :IN_CUS_NO
        ,'RS'
        ,TRIM(:IN_FTY_NO)
        ,TRIM(:IN_REG_NO)
        ,TRIM(:IN_TRS_NO)
        ,:IN_trs_dt (DATE,FORMAT'YYYY-MM-DD')
        ,:IN_trs_tm (TIME(0))
        ,NULLIF(:IN_APP_ID, '         ')
        ,CASE WHEN TRIM(:IN_AMT) = ''
              THEN 0
         ELSE :IN_AMT
         END
        ,CASE WHEN TRIM(:IN_ADD_DWN_PMT_AM) = ''
              THEN 0
         ELSE :IN_ADD_DWN_PMT_AM/100
         END
        ,CASE WHEN TRIM(:IN_FST_PMT_AM) = ''
              THEN 0
         ELSE :IN_FST_PMT_AM/100
         END
   );

.IF ERRORCODE <> 0 THEN .QUIT ERRORCODE;

.LOGOFF;
.QUIT 0;
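
Assuming the script above is saved as load_delta.bteq (the file name is just for illustration), it can be run and logged from the UNIX shell like this:

bteq < load_delta.bteq > load_delta.log 2>&1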

Advantages of BTEQ Import:

1. BTEQ Import is light and uses fewer system resources compared to its heavy-duty cousins.

2. BTEQ Import allows manipulation of the incoming data during the load. See the example above, where we divided :IN_ADD_DWN_PMT_AM by 100 during the insert.

3. The PACK option helps us load multiple rows per request based on the value set. In the above example, data is loaded at 250 rows per request.


Let us know if this topic helped you. Share our blog on Facebook or Google+.

Please donate using the button on the right. Your donation will help someone.