Wednesday, 21 October 2015

Minimum or Min function in Hive - Use of min() over(partition by ) in Hiveql

 Say , we have this table products in Hive.
+----------+-------+-------+
|   name   | price | notes |
+----------+-------+-------+
| product1 |   100 |       |
| product1 |   200 | note1 |
| product2 |    10 | note2 |
| product2 |     5 | note2 |
+----------+-------+-------+
and I expect to get this result (distinct of products with minimum price)
+----------+-------+-------+
|   name   | price | notes |
+----------+-------+-------+
| product1 |   100 |       |
| product2 |     5 | note2 |
+----------+-------+-------+
How do we go about it:

1. The subquery approach:

select a.name,a.price,b.notes from (select name,min(price) as price from products group by name) as a inner join (select name,price,notes from products) as b on a.name = b.name and a.price = b.price;

2. The alternate and better approach is to use window partitioning:

select name,price,notes from (select *, min(price)over(partition by name) as min_price from products) as 
where a.price = a.min_price;


The 2nd approach is better as it invokes a single-map task to read the table while the first approach will invoke 2 map tasks even on a single node system.



1 comment:

  1. Jammin' Jars - Casino - Missouri - JSHub
    Jammin' Jars is a 3D slot 계룡 출장샵 machine with 5 reels and 창원 출장안마 3 충주 출장안마 rows with 20 fixed paylines. Hit 김해 출장안마 and miss on the 원주 출장안마 top symbols, and pay from anywhere. The symbols are

    ReplyDelete

Please share your thoughts and let us know the topics you want covered