
Friday, 25 September 2015

Hortonworks HIVE metastore path - find the HDP Hive path to check for database.db files

Since the direction is towards the Open Data Platform, we are using the Hortonworks Hadoop distribution (HDP) for our project.

The metastore warehouse path on an HDP box is slightly different from other distributions.

Let's find the path using the commands below:

[root@sandbox /]# cd /etc/hive

[root@sandbox hive]# ls
2.3.0.0-2557  conf  conf.install


[root@sandbox hive]# cd conf.install
  
Open hive-site.xml and search for the warehouse directory: 

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
</property>
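
If you would rather not dig through the XML, you can also ask Hive itself; the set command echoes the current value of any configuration property:

hive> set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/apps/hive/warehouse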

Once we have that, the next step is to list the path using the `hadoop fs` command. Every database other than the default one gets its own directory of the form <databasename>.db; tables in the default database (like employees below) sit directly under the warehouse directory.

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/
Found 5 items
drwxrwxrwx   - root hdfs          0 2015-09-25 06:08 /apps/hive/warehouse/employees
drwxrwxrwx   - root hdfs          0 2015-09-15 07:07 /apps/hive/warehouse/financials.db
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_07
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_08
drwxrwxrwx   - hive hdfs          0 2015-08-20 08:58 /apps/hive/warehouse/xademo.db
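
As a quick sanity check, creating a fresh database (the name testdb below is just an example) should immediately produce a matching .db directory under the warehouse path:

hive> create database testdb;
OK

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/ | grep testdb

The listing should now include /apps/hive/warehouse/testdb.db.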

You can read the data files of the employees table using the command below:

[root@sandbox conf.install]# hadoop fs -cat /apps/hive/warehouse/employees/employees.txt
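
Note that the file name (employees.txt here) is simply whatever file was loaded into the table; if you are not sure of the name, list the table directory first:

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/employees/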

[Diagram: the steps above illustrated - Hive metastore in Hortonworks Hadoop]
A quick-and-dirty way to find the location of a table is the describe extended command:

hive> describe extended employees;
OK
name                    string
salary                  float
subordinates            array<string>
deductions              map<string,float>
address                 struct<street:string,city:string,state:string,zip:int>

Detailed Table Information      Table(tableName:employees, dbName:default, owner:root, createTime:1443161279, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)], location:hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=, mapkey.delim=, serialization.format=, line.delim=
, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1443161299, COLUMN_STATS_ACCURATE=true, totalSize=185}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.668 seconds, Fetched: 8 row(s)
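
A tidier alternative is describe formatted, which prints the same metadata as a readable list, with the HDFS path on its own Location: line:

hive> describe formatted employees;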



Tuesday, 25 June 2013

Handling large data volumes - Pig basics and advantages

Hadoop Pig is very efficient at handling large volumes of data. We could join flat files directly in Pig, offloading a lot of CPU cycles from the Teradata server.
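
As a flavour of what that looks like, here is a minimal sketch of joining two delimited files in Pig (the paths, schemas and pipe delimiter are hypothetical; adjust them to match your own extracts):

grunt> orders = LOAD '/data/orders.txt' USING PigStorage('|') AS (order_id:int, cust_id:int, amount:float);
grunt> customers = LOAD '/data/customers.txt' USING PigStorage('|') AS (cust_id:int, name:chararray);
grunt> joined = JOIN orders BY cust_id, customers BY cust_id;
grunt> STORE joined INTO '/data/joined_out' USING PigStorage('|');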

It operates in two modes: local and MapReduce.

To invoke local mode, type:

bash$> pig -x local

The default mode is MapReduce mode.

bash$> pig 

will invoke Pig in MapReduce mode.

Running a Pig script can be accomplished as below:

bash$> pig TestScript.pig

or 

After starting the Grunt shell:

grunt> exec TestScript.pig
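
Grunt also has a run command alongside exec. exec runs the script in a separate context, so aliases defined inside it are not visible in your session afterwards; run executes the script as if you had typed each line at the prompt:

grunt> run TestScript.pig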


We will follow up with more Pig commands and look at how Pig can be combined with Teradata for significant performance improvements.