Tuesday 8 September 2015

Big Data Lake Implementation - Moving Data from OLTP (MySQL) to HDFS using Apache Sqoop - Example scripts

To persist the entire data history in the Big Data Lake, we started by ingesting all the records from the OLTP system (based on MySQL) into the HDFS cluster.

Below is a sample sqoop import call that allows us to do this with ease.

sqoop import --connect jdbc:mysql://localhost/test_nifi --username root --table apache_nifi_test -m 1


We can also persist the data directly into a Hive table:

./sqoop import --connect jdbc:mysql://w.x.y.z:3306/db_name --username user_name --password **** --hive-import --hive-database db_name --table oltp_table1 -m 1
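
Once that finishes, a quick sanity check from the Hive CLI is possible; this is a minimal sketch, assuming the hive client is on the path and the table landed in the db_name database:

# Confirm the imported table exists and count its rows
hive -e "SHOW TABLES IN db_name;"
hive -e "SELECT COUNT(*) FROM db_name.oltp_table1;"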

The -m 1 option runs the import with a single mapper, so Sqoop writes only one output file for the table. It is also what you use when the table has no primary key defined, because Sqoop otherwise relies on the primary key (or a --split-by column) to split the data across mappers, as in the sketch below.
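
If the table does have a primary key or another evenly distributed numeric column, the import can be parallelised instead. A minimal sketch, assuming a column named id (an illustrative name, not taken from the actual table):

# Hypothetical parallel import: 4 mappers, splitting the rows on the id column
sqoop import --connect jdbc:mysql://localhost/test_nifi --username root --table apache_nifi_test --split-by id -m 4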

The screenshots below illustrate the steps we ran to copy the data.

Step 1: Create a table in MySQL and load it with data


[Screenshot: copying MySQL data onto HDFS using Sqoop for the Big Data Lake implementation]
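
The screenshot itself is not reproduced here, so the following is a minimal sketch of what Step 1 might look like from the command line; the id and name columns are illustrative assumptions, not the actual schema:

# Hypothetical table creation and sample data load in the test_nifi database
mysql -u root -e "CREATE TABLE test_nifi.apache_nifi_test (id INT PRIMARY KEY, name VARCHAR(50));"
mysql -u root -e "INSERT INTO test_nifi.apache_nifi_test VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');"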

Step 2: Run sqoop to extract data from the MySQL table and write it to HDFS

[Screenshot: MySQL to HDFS data transfer via Sqoop]

As you can see in the screenshot, by default the data is copied into:
/user/<login-user>/<tablename>
The directory from which you run the sqoop import command also holds the generated .java class file that Sqoop used to import the data.
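
A quick way to verify that the import landed where expected is sketched below, assuming the default target directory and the part-m-00000 file name that a single-mapper import typically produces:

# List the imported files and peek at the contents (replace <login-user> with your HDFS user)
hdfs dfs -ls /user/<login-user>/apache_nifi_test
hdfs dfs -cat /user/<login-user>/apache_nifi_test/part-m-00000 | head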

Let us know if you like our blog, and please share your thoughts and the topics you want covered. Thanks!!
