
Friday, 25 September 2015

Hortonworks HIVE metastore path - find the HDP Hive path to check for database.db files

Since the direction is towards the Open Data Platform, we are using the Hortonworks Hadoop distribution (HDP) for our project.

The metastore warehouse path on an HDP box is slightly different from other distributions.

Let's find the path using the commands below:

[root@sandbox /]# cd /etc/hive

[root@sandbox hive]# ls
2.3.0.0-2557  conf  conf.install


[root@sandbox hive]# cd conf.install
  
Open hive-site.xml and search for the warehouse directory: 

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
</property>
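
If you would rather not dig through the XML, you can also ask Hive itself; the set command echoes the current value of any configuration property:

hive> set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/apps/hive/warehouse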

Once we have that, the next step is to list the path using the `hadoop fs` command. Every database other than the default one gets its own directory of the form <databasename>.db; tables in the default database (like employees below) sit directly under the warehouse directory.

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/
Found 5 items
drwxrwxrwx   - root hdfs          0 2015-09-25 06:08 /apps/hive/warehouse/employees
drwxrwxrwx   - root hdfs          0 2015-09-15 07:07 /apps/hive/warehouse/financials.db
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_07
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_08
drwxrwxrwx   - hive hdfs          0 2015-08-20 08:58 /apps/hive/warehouse/xademo.db
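
As a quick sanity check, creating a fresh database (the name testdb below is just an example) should immediately produce a matching .db directory under the warehouse path:

hive> create database testdb;
OK

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/ | grep testdb

The listing should now include /apps/hive/warehouse/testdb.db.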

You can read the data files of the employees table using the command below:

[root@sandbox conf.install]# hadoop fs -cat /apps/hive/warehouse/employees/employees.txt
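
Note that the file name (employees.txt here) is simply whatever file was loaded into the table; if you are not sure of the name, list the table directory first:

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/employees/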

[Diagram: the steps above illustrated - Hive metastore in Hortonworks Hadoop]
A quick-and-dirty way to find the location of a table is the describe extended command:

hive> describe extended employees;
OK
name                    string
salary                  float
subordinates            array<string>
deductions              map<string,float>
address                 struct<street:string,city:string,state:string,zip:int>

Detailed Table Information      Table(tableName:employees, dbName:default, owner:root, createTime:1443161279, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)], location:hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=, mapkey.delim=, serialization.format=, line.delim=
, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1443161299, COLUMN_STATS_ACCURATE=true, totalSize=185}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.668 seconds, Fetched: 8 row(s)
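
A tidier alternative is describe formatted, which prints the same metadata as a readable list, with the HDFS path on its own Location: line:

hive> describe formatted employees;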



Tuesday, 25 June 2013

Handling large data volumes - Pig basics and advantages

Hadoop Pig is very efficient at handling large volumes of data. We could join flat files directly in Pig, offloading a lot of CPU cycles from the Teradata server.
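
As a flavour of what that looks like, here is a minimal sketch of joining two delimited files in Pig (the paths, schemas and pipe delimiter are hypothetical; adjust them to match your own extracts):

grunt> orders = LOAD '/data/orders.txt' USING PigStorage('|') AS (order_id:int, cust_id:int, amount:float);
grunt> customers = LOAD '/data/customers.txt' USING PigStorage('|') AS (cust_id:int, name:chararray);
grunt> joined = JOIN orders BY cust_id, customers BY cust_id;
grunt> STORE joined INTO '/data/joined_out' USING PigStorage('|');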

It operates in two modes: local and MapReduce.

To invoke local mode, type:

bash$> pig -x local

The default mode is MapReduce mode.

bash$> pig 

will invoke Pig in MapReduce mode.

Running a Pig script can be accomplished as below:

bash$> pig TestScript.pig

or 

After starting the Grunt shell:

grunt> exec TestScript.pig
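
Grunt also has a run command alongside exec. exec runs the script in a separate context, so aliases defined inside it are not visible in your session afterwards; run executes the script as if you had typed each line at the prompt:

grunt> run TestScript.pig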


We will follow up with more Pig commands and look at how Pig can be combined with Teradata for significant performance improvements.