Say , we have this table products in Hive.
+----------+-------+-------+
| name | price | notes |
+----------+-------+-------+
| product1 | 100 | |
| product1 | 200 | note1 |
| product2 | 10 | note2 |
| product2 | 5 | note2 |
+----------+-------+-------+
and I expect to get this result (distinct of products with minimum price)
+----------+-------+-------+
| name | price | notes |
+----------+-------+-------+
| product1 | 100 | |
| product2 | 5 | note2 |
+----------+-------+-------+
How do we go about it:
1. The subquery approach:
2. The alternate and better approach is to use window partitioning:
select name,price,notes from (select *, min(price)over(partition by name) as min_price from products) as
where a.price = a.min_price;
The 2nd approach is better as it invokes a single-map task to read the table while the first approach will invoke 2 map tasks even on a single node system.