
Sunday, 15 May 2016

HortonWorks Sandbox Hive error - How to resolve the issue

If you have recently downloaded the HDP VM and tried to launch Hive, you have probably encountered the error below:

[root@sandbox ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/2.3.2.0-2950/0/hive-log4j.properties

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:300)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)

The reason is the permission of the HDFS directory /user, which is drwxr-xr-x and owned by hdfs:hdfs, so root has no write access beneath it.


You cannot change it as root either: in HDFS only the owner of a directory (or the hdfs superuser) can chmod it, as the attempts below show.



Directory permission in HDFS
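
You can confirm those permissions with a directory listing; a sketch of what you would see (the timestamp and other entries will differ on your sandbox):

[root@sandbox ~]# hadoop fs -ls / | grep user
drwxr-xr-x   - hdfs hdfs          0 2015-08-20 08:30 /user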



[root@sandbox ~]# hadoop fs -mkdir /user/root
mkdir: Permission denied: user=root, access=WRITE, inode="/user/root":hdfs:hdfs:drwxr-xr-x
[root@sandbox ~]# who am i
root     pts/0        2016-05-15 07:50 (192.168.10.1)
[root@sandbox ~]# hadoop fs -chmod 777 /user
chmod: changing permissions of '/user': Permission denied. user=root is not the owner of inode=user

The solution is to change the user you run Hive as:

Change User and login to Hive - Successful login
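
For reference, a minimal sketch of that user switch, assuming the sandbox's stock hive account (the exact user shown in the screenshot above may differ):

[root@sandbox ~]# su - hive
[hive@sandbox ~]$ hive
...
hive>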

Now, you are connected to Hive.

hive> show databases;
OK
default
xademo
Time taken: 1.32 seconds, Fetched: 2 row(s)
hive>

Like us on Facebook and Google if you liked the post. We provide live support for your project; just fill in the inquiry form on the home page.

Or click on the link and fill in your details: https://www.surveymonkey.com/r/zz962jr


Wednesday, 21 October 2015

Minimum or min() function in Hive - use of min() over (partition by) in HiveQL

Say we have this table, products, in Hive:
+----------+-------+-------+
|   name   | price | notes |
+----------+-------+-------+
| product1 |   100 |       |
| product1 |   200 | note1 |
| product2 |    10 | note2 |
| product2 |     5 | note2 |
+----------+-------+-------+
and we expect to get this result (one distinct row per product, at its minimum price):
+----------+-------+-------+
|   name   | price | notes |
+----------+-------+-------+
| product1 |   100 |       |
| product2 |     5 | note2 |
+----------+-------+-------+
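If you want to follow along, here is a hypothetical setup for the sample table (the column types are assumptions for illustration, and insert ... values needs Hive 0.14 or later):

create table products (name string, price int, notes string);
insert into table products values
  ('product1', 100, ''),
  ('product1', 200, 'note1'),
  ('product2', 10, 'note2'),
  ('product2', 5, 'note2');
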
How do we go about it:

1. The subquery approach:

select a.name, a.price, b.notes
from (select name, min(price) as price from products group by name) as a
inner join (select name, price, notes from products) as b
  on a.name = b.name and a.price = b.price;

2. The alternative and better approach is to use window partitioning:

select name, price, notes
from (select *, min(price) over (partition by name) as min_price from products) as a
where a.price = a.min_price;


The second approach is better because it reads the table in a single map phase, whereas the first approach reads it twice (once per subquery), even on a single-node system.
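
To see the difference yourself, prefix either query with explain, which prints the execution plan without running the job (the exact plan output varies by Hive version and is omitted here):

hive> explain select name, price, notes
    > from (select *, min(price) over (partition by name) as min_price from products) as a
    > where a.price = a.min_price;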



Friday, 25 September 2015

Hortonworks HIVE metastore path - find the HDP Hive path to check for database.db files

Since the industry direction is towards the Open Data Platform, we are using Hortonworks Hadoop for our project.

The Hive warehouse path (where the <databasename>.db directories live) is slightly different in the HDP box than in other distributions.

Let's find the path using the commands below:

[root@sandbox /]# cd /etc/hive

[root@sandbox hive]# ls
2.3.0.0-2557  conf  conf.install


[root@sandbox hive]# cd conf.install
  
Open hive-site.xml and search for the warehouse directory: 

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
</property>
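
You can also read the same value from a running Hive session without opening the XML; issuing set with a property name echoes its current value:

hive> set hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/apps/hive/warehouse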

Once we have that, the next step is to list that path with the hadoop fs command. Every database gets a separate directory of the form <databasename>.db:

[root@sandbox conf.install]# hadoop fs -ls /apps/hive/warehouse/
Found 5 items
drwxrwxrwx   - root hdfs          0 2015-09-25 06:08 /apps/hive/warehouse/employees
drwxrwxrwx   - root hdfs          0 2015-09-15 07:07 /apps/hive/warehouse/financials.db
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_07
drwxrwxrwx   - hive hdfs          0 2015-08-20 09:05 /apps/hive/warehouse/sample_08
drwxrwxrwx   - hive hdfs          0 2015-08-20 08:58 /apps/hive/warehouse/xademo.db

You can read the data for the employees table using the command below:

[root@sandbox conf.install]# hadoop fs -cat /apps/hive/warehouse/employees/employees.txt

Execution of the steps above, shown via an example diagram:
[diagram: hive metastore in hortonworks hadoop]
A quick-and-dirty method to find a table's location in the metastore is the describe extended command:

hive> describe extended employees;
OK
name                    string
salary                  float
subordinates            array<string>
deductions              map<string,float>
address                 struct<street:string,city:string,state:string,zip:int>

Detailed Table Information      Table(tableName:employees, dbName:default, owner:root, createTime:1443161279, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null), FieldSchema(name:salary, type:float, comment:null), FieldSchema(name:subordinates, type:array<string>, comment:null), FieldSchema(name:deductions, type:map<string,float>, comment:null), FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)], location:hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=, mapkey.delim=, serialization.format=, line.delim=
, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, transient_lastDdlTime=1443161299, COLUMN_STATS_ACCURATE=true, totalSize=185}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.668 seconds, Fetched: 8 row(s)
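
A tidier alternative to the wall of text above is describe formatted, which prints the same metadata as an aligned listing with a dedicated Location: row (output abridged):

hive> describe formatted employees;
...
Location:               hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/employees
...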