Tuesday 12 February 2013

Advantages and Disadvantages of MVC - Compression in Teradata with sample script

 Multi-Value Compression (MVC), also called field compression, allows compression to be applied at a column level. MVC replaces values specified by the user with a compact bit pattern. When the data is accessed, the bit pattern is used to look up the original value in the list (or dictionary).


For the character data type, before Teradata 13.10, MVC was restricted to fixed-length character columns (CHAR data type) and columns up to 255 characters wide. Teradata 13.10 onwards, MVC can be used with variable-length character columns (VARCHAR data type) and columns up to 510 characters wide. It can support any numeric type, all character data, GRAPHIC, VARGRAPHIC, BYTE and VARBYTE.


Compression is effectively used to save space for repeating data.

For example,  the SALE_SEQ_NBR column, the value always ranges between 1 to 18.
So, its better to compress the value and store it in the table-header.


Sample Script for compression:


CREATE SET TABLE DB_PROD1.sales_detail ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130048 BYTES, FREESPACE = 15 PERCENT, CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      CUS_ID_NO DECIMAL(18,0) NOT NULL,
      IAN_TYP_CD CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC COMPRESS 'KS',
      SALE_NBR DECIMAL(15,0) NOT NULL,
      SALE_SEQ_NBR SMALLINT NOT NULL DEFAULT 0  COMPRESS (1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15 ,16 ,17 ,18 ),
      LOCN_NBR SMALLINT NOT NULL DEFAULT 0 ,
      KSN_ID DECIMAL(9,0) NOT NULL DEFAULT 0. ,
      DAY_DT DATE FORMAT 'YYYY/MM/DD' NOT NULL,
      UPC_NBR DECIMAL(13,0) NOT NULL,
      SKU_PRC_TYPE_CD BYTEINT NOT NULL COMPRESS (1 ,3 ,6 ),
      SELL_TYPE_CD BYTEINT NOT NULL COMPRESS 0 ,
      SELL_QTY DECIMAL(9,4) NOT NULL COMPRESS (0.0000 ,1.0000 ,-1.0000 ),
      SELL_AMT DECIMAL(9,2) NOT NULL COMPRESS (12.99 ,2.79 ,7.99 ,2.99 ,3.00 ,0.50  ),
      )
PRIMARY INDEX ( CUS_ID_NO )
PARTITION BY RANGE_N(DAY_DT  BETWEEN DATE '2006-01-01' AND DATE '2010-12-31' EACH INTERVAL '1' MONTH ,
DATE '2011-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' DAY ,
 NO RANGE, UNKNOWN)
INDEX ( KSN_ID )
INDEX ( LOCN_NBR ,DAY_DT )
INDEX ( UPC_NBR );



Real-life Scenario faced:  

Disadvantages-

  1. MVC leading to long-running queries and using extra CPU cycles.
  2. Compressed columns are not used for Partition Primary Index


Explanation:   If you have a compression on a secondary index, the data will be skewed - ( as data is distributed according to PI).
Now, there is a small overhead of fetching data based on MVC value. This will be magnified if the data is skewed. Hence, for long-running queries, the hot-amp may become very CPU-intensive.

Hence, we suggest to use MVC for cold-data. The columns in SCD that are descriptive and not used in joins.

We will cover Block-level compression and Algorithmic Compression in a later post.

Please donate if you find this post helpful by using the button on right !! Let us know your issues, and we will solve them for you.

No comments:

Post a Comment

Please share your thoughts and let us know the topics you want covered