Tuesday, 25 June 2013

Handling Large Data Volumes - Pig Basics and Advantages

Apache Pig, which runs on Hadoop, handles large volumes of data very efficiently. We can join files in Pig instead of in the database, which saves a lot of CPU cycles on the Teradata server.

It operates in two modes: local and MapReduce.

To start Pig in local mode, run:

bash$> pig -x local

MapReduce is the default mode, so running

bash$> pig

with no options starts Pig in MapReduce mode.

A Pig script can be run directly from the shell:

bash$> pig TestScript.pig

or from the Grunt shell, after starting Pig:

grunt> exec TestScript.pig
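
To make this concrete, here is a minimal sketch of a script that joins two files and runs in local mode. The file names (customers.csv, orders.csv, join_demo.pig), the column names and the sample rows are all made up for illustration; they are not from a real Teradata extract.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Two tiny comma-separated input files (made-up sample data).
printf '1,alice\n2,bob\n' > customers.csv
printf '1,250\n2,100\n1,75\n' > orders.csv

# A hypothetical Pig Latin script that joins the two files on the id column.
cat > join_demo.pig <<'EOF'
customers = LOAD 'customers.csv' USING PigStorage(',') AS (id:int, name:chararray);
orders    = LOAD 'orders.csv'    USING PigStorage(',') AS (cust_id:int, amount:int);
joined    = JOIN customers BY id, orders BY cust_id;
DUMP joined;
EOF

# Run the script in local mode, but only if Pig is actually installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local join_demo.pig
else
    echo "pig not found on PATH; script written to $workdir/join_demo.pig"
fi
```

Local mode reads the input files from the local file system, which makes it handy for trying out a script on a small sample before running it in MapReduce mode against the full data.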


We will follow up with Pig commands and with how Pig can be combined with Teradata to give significant performance improvements.
