The Integration Layer marks the transition from raw data to integrated data. In this layer, the raw data is consolidated, duplicate records and values are removed, and disparate sources combined into a single version.
After the data is passed through the Integration layer, it is consistent and consumable for the DW and BI applications.
Key considerations for Integration Layer processing:
- For data with multiple sources, a Master Data Management process should be in place. See the below link for details: Master_data_management - Benefits and Issues
- To have source trace-ability, data should be stored with the corresponding source information. One of the major when an Organization grows is the duplication of Data, due to multiple sources. Data de-duplication is a tedious job. Having Source information at a row-level eases the job to an extent.
- To know the user who modified or inserted the data, we should have AlterUserID and InsertTimeStamp columns. We can use the below SQL in Teradata to populate these 2 columns.
INSERT INTO DIMENSION1
(
SourceCode
,AlterUserID
,InsertTimeStamp
)
Select
'POS Systems'
,USER
,CURRENT_TIMESTAMP(0);
Making SourceCode, AlterUserID and InsertTimeStamp as NOT NULL columns will ensure that we can always track back changes to a Soure, User and Date.
4. Updating the Logical and Physical Data Models on a regular basis is a must to keep track of the changes. Data Models are important to understand the flow of data.
This will be the first step in creating Stellar Data, which is clean and traceable.
Master Data Management, Data Models and Storing Source and User information at row-level should help us start building low-maintenance warehouses.
Source: Wikipedia, Teradata Certification
Let us know your comments and suggestions. Please Donate for a cause if you are helped !!
No comments:
Post a Comment
Please share your thoughts and let us know the topics you want covered