Friday 25 September 2015

Hortonworks Hive metastore path - find the HDP Hive warehouse path and the <database>.db directories

Since the direction is towards the Open Data Platform, we are using Hortonworks Hadoop for our project.

The metastore warehouse path on the HDP box is slightly different from the usual /user/hive/warehouse default.

Let's find the path using the commands below:

[root@sandbox /]# cd /etc/hive

[root@sandbox hive]# ls
2.3.0.0-2557  conf  conf.install


[root@sandbox hive]# cd conf.install
  
Open hive-site.xml and search for the warehouse directory: 

 <property>
   <name>hive.metastore.warehouse.dir</name>
   <value>/apps/hive/warehouse</value>
 </property>
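
If you do not want to open the file manually, you can also grep the property out of hive-site.xml, or ask Hive itself from the CLI; both are quick alternatives (the prompt and paths below assume the same sandbox setup as above):

[root@sandbox conf.install]# grep -A1 hive.metastore.warehouse.dir hive-site.xml
      <name>hive.metastore.warehouse.dir</name>
      <value>/apps/hive/warehouse</value>

hive> set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/apps/hive/warehouse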

Once we have that, the next step is to list the path with the `hadoop fs` command. Each database gets its own directory of the form <databasename>.db, while tables in the default database (such as employees below) sit directly under the warehouse root.

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/
Found 5 items
drwxrwxrwx   - root hdfs          0 2015-09-25 06:08 /apps/hive/warehouse/employees
drwxrwxrwx   - root hdfs          0 2015-09-15 07:07 /apps/hive/warehouse/financials.db
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_07
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_08
drwxrwxrwx   - hive hdfs          0 2015-08-20 08:58 /apps/hive/warehouse/xademo.db

You can read the data of the employees table using the command below:

[root@sandbox conf.install]# hadoop fs -cat /apps/hive/warehouse/employees/employees.txt
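
The file name inside the table directory depends on how the data was loaded (here a plain text file named employees.txt sits under the table path), so if you are not sure what is in there, list the directory first:

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/employees/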

[Diagram: Hive metastore in Hortonworks Hadoop, illustrating the steps above]
A quick-and-dirty way to find a table's storage location is the describe extended command:

hive> describe extended employees;
OK
name                    string
salary                  float
subordinates            array<string>
deductions              map<string,float>
address                 struct<street:string,city:string,state:string,zip:int>

Detailed Table Information      Table(tableName:employees, dbName:default, owner:root, createTime:1443161279, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)], location:hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=, mapkey.delim=, serialization.format=, line.delim=
, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1443161299, COLUMN_STATS_ACCURATE=true, totalSize=185}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.668 seconds, Fetched: 8 row(s)
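
If that single long line is hard to read, describe formatted prints the same metadata in a tabular layout, and its Location row shows the same warehouse path (output trimmed here; the full listing is much longer):

hive> describe formatted employees;
...
Location:               hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/employees
...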



Friday 11 September 2015

Informatica Training in Bangalore - Classroom and Online in Marathalli Bangalore

Informatica Online Training Course

Data Warehouse Concepts:

  • Introduction to Data warehouse

  • What is a Data Warehouse and why we need one

  • Dimensional modeling

  • Star schema/Snowflake schema/Galaxy schema

  • Dimension / Fact tables

  • Slowly Changing Dimensions and their types

  • Data Staging Area

  • Different types of Dimensions and Facts.

  • Data Mart vs Data warehouse

Informatica Power Center 9:

Software Installation:

Informatica 9 Server/Client Installation on Windows/Unix.

Power Center Architecture and Components:

  • Introduction to Informatica Power Center

  • Difference Between Power Center and Power Mart

  • PowerCenter 9 architecture

  • PowerCenter 7 architecture vs Power Center 8 and 9 architecture

  • Extraction, Transformation and loading process 

Power Center tools: Designer, Workflow Manager, Workflow Monitor, Repository Manager, Informatica Administration Console.

  • Repository Server and agent

  • Repository maintenance

  • Repository Server Administration Console

  • Security: repository privileges and folder permissions

  • Metadata extensions

Power Center Developer Topics:

Lab 1- Create a Folder.

  • How to provide Privileges

  • Source Object Definitions

        Source types

–       Relational Tables (Oracle, Teradata)
–       Flat Files (fixed width, delimited files)
–       XML Files
–       COBOL Files
–       Salesforce

Source properties

Lab 2- Analyze Source Data, Import Source.

  • Target Object Definitions

–       Target types
–       Target properties

Lab 3- Import Targets

  • Transformation Concepts

  • Transformation types and views

  • Transformation features and ports

  • Informatica functions and data types

Mappings

  • Mapping components

  • Source Qualifier transformation

  • SQL and Post SQL

  • Mapping validation

  • Data flow rules

Lab 4 – Create a Mapping, Session, and Workflow

Workflows

  • Workflow Tools

  • Workflow Structure and configuration

  • Workflow Tasks

  • Workflow Design and properties

Session Tasks

  • Session Task properties

  • Session components

  • Transformation overrides

  • Session partitions

Workflow Monitoring

  • Workflow Monitor views

  • Monitoring a Server

  • Actions initiated from the workflow Monitor

  • Gantt Chart view and Task view

Lab 6 – Start and Monitor a Workflow

Debugger

  • Debugger features

  • Debugger windows

  • Tips for using the Debugger

Lab 7 – The Debugger

Expression transformation

  • Expression, variable ports, storing previous record values.

Different type of Ports

  • Input/ output / Variable ports and Port Evaluation

  • Filter transformation

  • Filter properties

Lab 8- Expression and Filter

Aggregator transformation

  • Aggregation function and expressions

  • Aggregator properties

  • Using sorted data

  • Incremental Aggregation

Joiner transformation

  • Joiner types

  • Joiner conditions and properties

  • Joiner usage and Nested joins

Lab 9 – Aggregator, Heterogeneous join

  • Working with Flat files

  • Importing and editing flat file sources & Targets

Lab Session – Use Flat file as source.

Sorter transformation

  • Sorter properties

  • Sorter limitations

Lab 10 – Sorter

  • Propagate Attributes.

  • Shared Folder and Working with shortcuts.

  • Informatica built in functions.

Lookup transformation

  • Lookup principles

  • Lookup properties

  • Lookup techniques

  • Connected and unconnected lookup.

  • Lookup Caches

Lab 11 – Basic and Advance Lookup

 Target options

  • Row type indicators

  • Row loading operations

  • Constraint-based loading

  • Rejected row handling options

Lab 12 – Deleting Rows

  • Update Strategy transformation

  • Update strategy expressions

Lab 13 – Data Driven Inserts and Rejects 

  • Router transformation

Using a router

Router groups

Lab 14 – Router

  • Conditional Lookup

Usage and techniques

Advantage

Functionality

Lab 15 – Straight Load
Lab 16 – Conditional Lookup

 Heterogeneous Targets

  • Heterogeneous target types

  • Target type conversions and limitations

Lab 17 – Heterogeneous Targets

Mapplet

  • Functionality and Advantages

  • Mapplet types and structure

  • Mapplet limitations

Lab 18 – Mapplet

  • Reusable transformations

  • Advantages

  • Limitations

  • Promoting and copying transformations

Lab 19 – Reusable transformations

  • Sequence Generator transformation

  • Using a sequence Generator

Sequence Generator properties

  • Dynamic Lookup

  • Dynamic Lookup theory

  • Usage and functionality

  • Advantages

Lab 20 – Dynamic Lookup

  • Concurrent and Sequential Workflows; Stopping, Starting, and Suspending Tasks and Workflows

    • Concurrent Workflows
    • Sequential Workflows

Lab 21 – Sequential Workflow

  • Additional Transformations (lab sessions for each of the transformations below)

    • Union Transformation
    • Rank transformation
    • Normalizer transformation
    • Custom Transformation
    • Transaction Control transformation
    • XML Transformation
    • SQL Transformation
    • Stored Procedure Transformation
    • External procedure Transformation
  • Error Handling

  • Overview of Error Handling Topics

Lab 22 – Error handling fatal and non Fatal

  • Workflow Tasks:

    • Command
    • Email
    • Decision
    • Timer
    • Control
    • Event Raise and Event Wait
    • Sequential Batch Processing

    • Parallel Batch Processing

  • Lab Sessions – With Workflow tasks

  • Link Conditions

  • Team Based Development

Version Control

Checking out and checking in objects.

  • Performance Tuning

  • Overview of System Environment

  • Identifying Bottlenecks.

Optimizing Source, Target, Mapping, Transformation, and Session.

  • Mapping Parameters and Variables

Introduction to Mapping Variables and Parameters

  • Creating Mapping Variables and Updating Variables

  • Creating Parameter File and associating file to a Session

  • System Variables

  • Variables functions

Lab 26 – Override Mapping Variable with Parameter Files
Lab 27 – Dynamically Updating a Source Qualifier with Mapping Variable

  • Slowly Changing Dimensions Type 1, Type 2, Type 3

  • Incremental Loading

Lab 28 – SCD Types 1, 2, 3

  • Reusable Workflow Tasks

  • Worklets

  • Worklet Limitations

  • Sessions

  • Reusable Sessions

Lab 29 – Create Worklet using Tasks

  • Command Line Interface

  • Overview of pmrep and its functions

  • pmrep

  • Informatica Migrations:

Copying Objects

  • Objects export and import (XML)

  • Deployment groups

  • Workflows Scheduling:

  • Using Informatica

  • Unix crontab, third-party tools

Lab 30: Informatica Project- Case Study

  • Sales Data mart.

  • Loading Dimensions and Facts.

  • ETL Best Practices and methodologies

  • Review the Industry best practices in ETL Development

  • Review Real time project experiences of trainer

  • Discuss how the techniques learned are useful in the real world

  • How to design an effective ETL process

  • Important considerations in designing an ETL process

  • Discuss real world production issues and support

  • Discuss various roles in ETL world

  • Business Analyst, System Analyst

  • System Architect

  • Technical Architect, ETL Lead

  • Stakeholders, Business users

  • Effective ways of using Data warehouse

  • Review various BI Reporting methods

    • Q&A / Interview preparation / Placements

    • Answer students' questions

    • Tips for interview preparation

    • How we can assist in placement and future growth

    • Discuss other related technologies like Business Intelligence (BI)

    • Advancing career options