Data Analysis Example: MySql implementation

Showing posts with label MySql implementation. Show all posts

Tuesday, 8 September 2015

Big Data Lake Implementation - Moving Data from OLTP (MySQL) to HDFS using Apache Sqoop - Example scripts

To persist the entire history data in the Big Data Lake, we started with the ingestion and storage of all records in the OLTP system (based on MySQL) to HDFS cluster.

Below is a sample sqoop import call that allows us to do this with ease.

sqoop import --connect jdbc:mysql://localhost/test_nifi --username root --table apache_nifi_test -m 1

We can also persist the data directly onto a Hive table :

./sqoop import –connect jdbc:mysql://w.x.y.z:3306/db_name –username user_name –password **** –hive-import –hive-database -table oltp_table1 -m 1

The m 1 creates only one file for the table. This is used if you don't have a primary key defined. Sqoop uses the primary key for partitioning the data.

The below diagram will explain the steps we ran to achieve the data copy.

Step 1: Creation of a table on MySQL with data

Copy MySQL data onto HDFS using Sqoop for Big Data Lake implementation

Step 2: Running sqoop to extract data from the MySQL table and dump on HDFS

MySQL to HDFS data transer via Sqoop

As you can see in the diagram, by default the data is copied into :
/user/<login-user>/<tablename>
The path from where you execute the sqoop import command also stores the .java file that was used to import the data.

Let us know if you like our Blog. Thanks!!

Monday, 20 April 2015

Find the combination of strings in all orders and delete the duplicate records - Amazing Interview Question

This was a question posted to me in the in-the premises round of a leading E-commerce company.

Problem; Find the count of strings and their combinations in a table, and count all the combination of letters as duplicates.

If AB is a string, then BA is a duplicate string. And count(AB) should be 2.

The Solution attempted by me is as follows:

create table table1
(
value1 varchar(30));

Now , in the list below, AB and BA are treated as duplicates.
Similarly, 'CD' and 'DC' are duplicates.
We need to find the count of all combinations possible for 'AB' and 'CD' .

insert into table1 values ('AB');
insert into table1 values ('BA');
insert into table1 values ('CD');
insert into table1 values ('DC');

select * from table1;

select A.value1,count(*) from table1 as t1
inner join
(
select
case when value1 > value_ref then value_ref else value1 end as value1
, value_ref
, value1 as val1

from
(
select value1, concat(substr(value1,2,1),substr(value1,1,1)) as value_ref from table1
group by value_ref
) as A) as A
on t1.value1 = A.value_ref
group by A.value1;

The trick is to create a lookup with all the values in the table.
The inner query does this trick as shown below:

Lookup table of values and their combination

Result set will be:

Count of all combinations possible

Write your solution in the comment section.

Tuesday, 10 February 2015

Cursor code example in MySQL to update from multiple rows in source

Example script to update multiple rows from source, where each row has the changed primary key.

How the keys look like? The highlighted columns form a composite primary key.
For
Update_Code = 1, New PK value updates Record_Key
Update_Code = 2, New PK value updates Date of Birth
Update_Code = 3, New PK value updates First Name

Cursor Update in MySQL dataset

Example script to solve using cursor:

drop procedure if exists segment_update_mbr;
DELIMITER $$
CREATE PROCEDURE segment_update_mbr(v_file_id bigint)
proc_label: begin

#-----------------------------------------------------------#
# Usage: Update segment_data using each row #
#-----------------------------------------------------------#

declare done int default 0;
declare a, b int default 0;
declare v_mbr_id, v_mbr_rec_no, v_mbr_id_old varchar(500);
declare v_error, v_file_type varchar(500);
declare v_error_count,v_threshold , v_threshold_breach, v_staging_count, v_ref_id, v_update_count bigint;
DECLARE cur1 CURSOR FOR SELECT reference_id, member_id, member_record_number FROM temp_multiple_mc order by cast(member_record_number as unsigned integer);

# Ensure to cast your order by column to integer, else varchar columns will give incorrect order of rows

declare continue handler for not found set done=1;

set v_error = 'start';
set v_threshold = 0;
SET v_threshold_breach = 0;
SET v_staging_count=0;
SET v_error_count=0;
set v_ref_id = 0;
set v_update_count= 0;

SELECT segment_file_type_description
INTO v_file_type
FROM segment_file_type
WHERE segment_file_type_id = (SELECT segment_file_type_id FROM segment_file WHERE file_id = v_file_id);

select account_id into @v_account
from segment_file
where file_id=v_file_id;

if lower(v_file_type) = 'complete' then

set sql_safe_updates = 0;

open cur1;
strt1 : loop
fetch cur1 into v_ref_id, v_mbr_id, v_mbr_rec_no;
#set v_error = 'line15';
if done = 1 then leave strt1;
end if;
select count(1) into a from segment_data where member_id = v_mbr_id;
select fn_check_key_present(v_file_id, v_ref_id) into @exist_chk;
select fn_check_duplicate_primary_mbr_present(v_file_id, v_ref_id) into @primary_chk;
select v_mbr_id, v_mbr_rec_no,v_ref_id, @exist_chk, @primary_chk, a;

if a > 0 and @exist_chk = 0 and @primary_chk = 0 then
set sql_safe_updates = 0;
set foreign_key_checks = 0;

# Update the segment_data table using the 1 row present in the cursor

update segment_data ed

inner join
segment_file_staging_complete as efsc
on ed.member_id = efsc.member_id
and ed.member_first_name = efsc.member_first_name
and cast(date_format(ed.date_of_birth, "%m%d%Y") as char(10))= cast(efsc.date_of_birth as char(10))
and ed.group_id = efsc.group_id
and ed.client_id = efsc.client_id
and ed.account_id = @v_account
inner join
temp_multiple_mc as tmm
on cast(tmm.reference_id as unsigned integer) = cast(efsc.segment_file_staging_complete_id as unsigned integer)
set
ed.member_id = case when efsc.maintenance_code = '01' and efsc.transition_id is not null and efsc.transition_id <> '' then efsc.transition_id else efsc.member_id end,
ed.date_of_birth = case when efsc.maintenance_code = '02' and efsc.transition_id is not null and efsc.transition_id <> '' then str_to_date(efsc.transition_id, "%m%d%Y") else str_to_date(efsc.date_of_birth, "%m%d%Y") end,
ed.member_first_name = case when efsc.maintenance_code = '03' and efsc.transition_id is not null and efsc.transition_id <> '' then efsc.transition_id else efsc.member_first_name end,
ed.modified_by = efsc.modified_by,
ed.modified_date = efsc.modified_date,
ed.file_id = efsc.file_id,
#ed.segment_file_type_id = efsc.segment_file_type_id,
ed.client_id = efsc.client_id,
#ed.account_id = efsc.account_id,
ed.initial_member_id = efsc.initial_member_id,
ed.member_middle_initial = efsc.member_middle_initial,
ed.member_last_name = efsc.member_last_name,
ed.group_id = efsc.group_id,
ed.segment_effective_date = str_to_date(efsc.segment_effective_date, "%m%d%Y"),
ed.segment_end_date = CASE WHEN efsc.segment_end_date = '' THEN '2099-01-01' WHEN efsc.segment_end_date IS NULL THEN '2099-01-01' ELSE str_to_date(efsc.segment_end_date, "%m%d%Y") END,
ed.gender_desc = (select lookup_value from lookup where lookup_group='GENDER' and lookup_key=efsc.gender)
where tmm.reference_id = v_ref_id
and efsc.is_invalid_row = 0
and efsc.is_duplicate_row = 0
;
set v_update_count = v_update_count + row_count();

set sql_safe_updates = 1;

# Mark records that on update will create duplicate primary key
elseif @exist_chk = 129 then
set sql_safe_updates = 0;
update temp_multiple_mc
set is_invalid_row = 1
where reference_id = v_ref_id;

set sql_safe_updates = 1;

elseif @primary_chk = 165 then
set sql_safe_updates = 0;
update temp_multiple_mc
set is_duplicate_row = 1
where reference_id = v_ref_id;

set sql_safe_updates = 1;
end loop;
end if;

end if;

# Updated rows count is inserted into segment_file_status table
set sql_safe_updates = 0;

select max(segment_file_status_id) into @status_id from segment_file_status where file_id = v_file_id;

update segment_file_status
set
records_updated = v_update_count
,modified_by = current_user()
where file_id = v_file_id
and segment_file_status_id>0
and segment_file_status_id = @status_id
;
SET SQL_SAFE_UPDATES=1;
end$$
DELIMITER ;

Like us on Google+ or Facebook and share your valuable feedback.

Thursday, 22 January 2015

Find_in_set feature of MySQL to find elements in an array or list of Values - Example and Diagrams

In an earlier post , we have seen how using group_concat() function we can group all the values of a column into comma separated array/list of values.

Please see the picture below:

Array elements distributed over 5 different rows and needs to be merged

On doing a group_concat(eligibility_file_staging_complete_id), we get the values in all the rows into a single row.

Example query:

select group_concat(eligibility_file_staging_complete_id)into @v_staging_id_list from temp_multiple_mc_complete;

select @v_staging_id_list;

Merge all columns into a single array using group_concat() function

Now, our requirement is to find each ID from another table and do some operation if the match is found.
We can use the FIND_IN_SET() function for this.

Example query:

select efsc.eligibility_file_staging_complete_id as reference_id, efsc.* from
eligibility_file_staging_complete as efsc
where find_in_set(efsc.eligibility_file_staging_complete_id , @v_staging_id_list)
and file_id = v_file_id;

The above query will pick only those rows that are in the array @v_staging_id_list.

Please like us on Facebook or LinkedIn if you find our blog helpful.

Wednesday, 21 January 2015

Example script to extract one item from a list of values in MySQL - Use Prepare and Execute - Dynamic SQL

If you have a list of ids in a string and then you want to extract each element from the array of elements, then you can use the below sample script to extract each item.

We are using a while loop and the substring_index function to extract each item.
We have passed the table name as the parameter to the procedure. The rows will be concatenated and then each element from the array is extracted.

The data in my table looks as follows: I need each ID field stored in the last column. The below stored will accomplish this.

The values in column are to be merged using group_concat() to form an array

Example script:

drop procedure if exists eligibilityfile_str_to_column;
DELIMITER $$
CREATE PROCEDURE eligibilityfile_str_to_column(v_tbl_name varchar(200))
begin

DECLARE v_data_extract, v_str_len, v_counter bigint;
DECLARE v_file_type, v_staging_id_list,v_stmnt varchar(500);

set v_data_extract = 0;
set v_str_len = 0;
set v_counter = 1;

select v_tbl_name into @a;
set @v_stmnt = concat('select group_concat(eligibility_file_staging_complete_id) into @v_staging_id_list from ', v_tbl_name);

PREPARE stmt2 from @v_stmnt;
select concat('select group_concat(eligibility_file_staging_complete_id) into @v_staging_id_list from ', v_tbl_name);

EXECUTE stmt2;

# find the total no of delimiters
select char_length(@v_staging_id_list) - char_length(replace(@v_staging_id_list,',','')) into v_str_len;

select concat('length is',v_str_len);

drop temporary table if exists temp_staging_list;
create temporary table temp_staging_list
(staging_id bigint(20) unsigned
);

select 'table created';

# loop to read till all the elements are not read
while v_counter <= v_str_len do
#select 'loop started';
select substring_index(substring_index(@v_staging_id_list,',',v_counter),',',-1) into v_data_extract;
#insert into temp_staging_list select v_data_extract;
select v_data_extract;
set v_counter = v_counter + 1;
end while;

end$$
DELIMITER ;

Insert only unqiue rows in MySQL-reject duplicate rows silently without error

If you use INSERT IGNORE, then the row won't actually be inserted if it results in a duplicate key. But the statement won't generate an error. It generates a warning instead. These cases include:

Inserting a duplicate key in columns with PRIMARY KEY or UNIQUE constraints.
Inserting a NULL into a column with a NOT NULL constraint.
Inserting a row to a partitioned table, but the values you insert don't map to a partition.
If you use REPLACE, MySQL actually does a DELETE followed by an INSERT internally, which has some unexpected side effects:
A new auto-increment ID is allocated.
Dependent rows with foreign keys may be deleted (if you use cascading foreign keys) or else prevent the REPLACE.
Triggers that fire on DELETE are executed unnecessarily.
Side effects are propagated to replication slaves too.

Both REPLACE and INSERT...ON DUPLICATE KEY UPDATE are non-standard, proprietary inventions specific to MySQL. ANSI SQL 2003 defines a MERGE statement that can solve the same need (and more), but MySQL does not support the MERGE statement.

Thursday, 15 January 2015

Date comparison confusion in MySQL - Read the example below

MySQL is pretty straight forward in the documentation. All valid dates should be in 'YYYY-MM-DD' format or 'yy-mm-dd' format.

But developers like us will continue making mistakes and hence learn the hard way.

As we see in the below example, how the format matters in date comparisons.

Example code:

SELECT

fn_str_date_chk(efer.eligibility_end_date)

,date_format(str_to_date(efer.eligibility_end_date,'%m%d%Y'),'%Y-%m-%d') as end_Date

,date_format(str_to_date('11012014','%m%d%Y'),'%Y-%m-%d') as file_date

, case when date_format(str_to_date(efer.eligibility_end_date,'%m%d%Y'),'%Y-%m-%d') < date_format(str_to_date('11012014','%m%d%Y'),'%Y-%m-%d') then 1 else 0 end as newresult

,date_format(str_to_date(efer.eligibility_end_date,'%m%d%Y'),'%m%d%Y') as end_Date

,date_format(str_to_date('11012014','%m%d%Y'),'%m%d%Y') as file_date

, case when date_format(str_to_date(efer.eligibility_end_date,'%m%d%Y'),'%m%d%Y') < date_format(str_to_date('11012014','%m%d%Y'),'%m%d%Y') then 1 else 0 end as oldresult

,efer.*

FROM eligibility_file_error_row as efer

WHERE (fn_str_date_chk(eligibility_end_date) = 1 AND fn_str_date_chk(eligibility_effective_date) = 1)

AND (date_format(str_to_date(eligibility_end_date,'%m%d%Y'),'%m%d%Y') < (select date_format(str_to_date(fn_header_date_retrieve(122228),'%Y-%m-%d'),'%m%d%Y'))

OR str_to_date(eligibility_end_date,'%m%d%Y') < str_to_date(eligibility_effective_date,'%m%d%Y'))

AND file_id = 122228;

The code marked in bold will give you incorrect results as the format is '%m%d%Y'.

Please see the results from the query below in newresult and oldresult column:

Invalid dates or dates with space conversion in MySQL

Monday, 12 January 2015

Detecting and validating email address in weblogs/tweets - MySQL example function

We regularly dump data onto Mysql from different sources. And the hardest part is to cleanse and filter out the desirable inforamation.

One of the key uses of the below function in MySql is to extract valid email addresses and reject the invalid ones.

MySql Example function:

DELIMITER $$

CREATE FUNCTION fn_email_address_test(column_value Varchar(200)) RETURNS tinyint(1)

READS SQL DATA

DETERMINISTIC

BEGIN

DECLARE is_valid_email BOOLEAN;

SET is_valid_email = 0;

IF column_value NOT REGEXP '^[A-Za-z0-9._%-]+@[A-Za-z0-9]+\.[A-Za-z]+[A-Za-z]{2}$'

then

SET is_valid_email = 0;

else

set is_valid_email = 1;

END IF;

RETURN is_valid_email;

END$$

DELIMITER ;

The above function will successfully identify 98% email addresses.

Please let us know if the above post helped.

Give us your valuable feedback in the comment section.

Tuesday, 23 December 2014

MySQL function to reject invalid email address from websites

Most of us data analysts are facing the challenge of accepting valid data and rejecting invalid data. Please see the below email address validation function in MySql.

The function returns 0 for invalid and 1 for valid emails.

Example:

DELIMITER $$
CREATE FUNCTION fn_email_test(column_value Varchar(200)) RETURNS tinyint(1)
READS SQL DATA
DETERMINISTIC
BEGIN
DECLARE is_valid_email BOOLEAN;

SET is_valid_email = 0;

IF column_value NOT REGEXP '^[\x41-\x5a\x61-\x7a\x30-\x39\x2e\x2d\x5f\x25]+\x40[\x41-\x5a\x61-\x7a\x30-\x39\x2e\x2d]+\x2e[\x41-\x5a\x61-\x7a]{2,4}$';
then
SET is_valid_email = 0;
else
set is_valid_email = 1;
END IF;
RETURN is_valid_email;
END$$
DELIMITER ;

Please write your comments to help us improve. Like us on Google+ or Facebook.

Sunday, 21 December 2014

Understanding Implicit Commits in MySql on using Alter table Create Index on temporary table

As we understand from the MySql documents, implicit commits happen when we do a Alter Table or Add Index query within a stored procedure.

So, this means that any DDL changes will be committed to the Database. So, if you do a Alter Table Add Column, the column will be added implicitly. A Rollback will not change the table structure back to its old DDL. The new column or index will be committed.

Any DML not committed since the last successful commit will be committed into the Database as well.

As per the MySQL documentation:

CREATE TABLE and DROP TABLE statements do not commit a transaction if the TEMPORARY keyword is used. (This does not apply to other operations on temporary tables such as ALTER TABLE and CREATE INDEX, which do cause a commit.) However, although no implicit commit occurs, neither can the statement be rolled back, which means that the use of such statements causes transactional atomicity to be violated. For example, if you useCREATE TEMPORARY TABLE and then roll back the transaction, the table remains in existence.

Example Procedure:

drop procedure if exists testproc_2;
DELIMITER $$
CREATE PROCEDURE testproc_2(user_enter varchar(500))
BEGIN

DECLARE v_error bigint;
DECLARE v_file_type, error_msg char(100);

DECLARE EXIT HANDLER FOR sqlexception
BEGIN
SELECT error_msg;
ROLLBACK;
end;

set v_error = 0;

START TRANSACTION;

select user_enter;
insert into int_marks
values (user_enter);

select * from int_marks;

drop temporary table if exists temp_int_marks;
CREATE temporary table temp_int_marks select * from int_marks where 1=2;
ALTER TABLE temp_int_marks ADD INDEX tim_index (member_first_name(20));

select * from int_marks;
select 'Executing';

set error_msg = 'sql failure';
create temporary table temp_int_marks select * from int_marks where 1=2;

END$$
DELIMITER ;

Verifying the DDL:

show create table temp_int_marks;

CREATE TEMPORARY TABLE `temp_int_marks` ( `member_first_name` varchar(50) COLLATE utf8_unicode_ci NOT NULL COMMENT 'Member First Name', KEY `tim_index` (`member_first_name`(20)) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Before the procedure the data in the int_marks table:

Call the procedure:

call testproc_2('none');

From within the procedure:

The 'none' value passed to the procedure for insertion is shown. On, any sql failure, this should be rolled back and data not committed into the table.

Outside the procedure:

So, the data has been committed to the DB though there was no explicit commit.
The index addition, being an implicit commit is applied to the Database. This also commits any DML that preceded the commit.

We validated the same by opening a new session. The new row has been added to the Database.

Please let us know if this helps. Like us on Google Plus or Facebook.

Wednesday, 12 November 2014

Problems in transforming date using str_to_date function - null returned for valid date input

The str_to_date() fails by returning a null value in MySql for some specific valid date values. Unlike Teradata, where we can use the cast function to format dates also, we were using the function to format our dates in MySql.
To overcome the bug with str_to_date() function, we can use the date_format() function.

Did a comparison of the values returned by both the functions when the input is a valid date and the results are below:

Query:
select str_to_date(trim('2013-09-10 02:30:01 '), '%Y-%m-%d %H:%m:%s'), date_format(trim('2013-09-10 02:30:01 '), '%Y-%m-%d %H:%m:%s');

str_to_date() returns a null, where date_format() returns a value.

Please share with us your comments or suggestion.

Wednesday, 15 October 2014

Accept or reject International Accent marks in MySql and comparing them using regex

After fighting over this for the last couple of hours, I have nailed it at last. Storage and filtering international accent marks in MySql , one of the most sought after problems for any portal's underlying DB.

So, my approach was to first pass the column value to the function and store as a BLOB. Please note that if you store it as Varchar, your regular expressions will not work.

Accent marks span over multiple bytes, refer the collation and charset pages on mysql manuals.

Example:

select char_length('À'); # 1 as the total number of characters
select length('À'); #2 as the total number of bytes.

Now, the example function to reject any special characters and number and to accept all alphabets and international accent marks.

Example function:

drop function if exists fn_alphabetic_with_acc_hypen;
DELIMITER $$
CREATE FUNCTION fn_alphabetic_with_acc_hypen(column_value Varchar(200)) RETURNS tinyint(1)
BEGIN
DECLARE is_alpha_acc_hyphen BOOLEAN;

SET is_alpha_acc_hyphen = 0;

# If input string contains only numbers, then return 1 else 0
#using concat function to preserve the hexadecimal codes

IF column_value not REGEXP concat('[',x'21','-',x'2C',x'2E','-',x'40',x'5B','-',x'60',x'7B','-',x'7E',x'A1','-',x'BF',']') then
SET is_alpha_acc_hyphen = 1;
# 1 means true
else
set is_alpha_acc_hyphen = 0;
# 0 means false
END IF;
RETURN is_alpha_acc_hyphen;
END$$
DELIMITER ;

If we run the below select, it will return 1 for international accent mark containing strings.

select fn_alphabetic_with_acc_hyphen('A Àcb'); # results in 1 (TRUE) - that it accepts the international accent marks

Please let us know your suggestion/advice. Like us if you were helped.

Friday, 26 September 2014

User defined functions in MySql - check date format and leap year in MySql UDF

The below function can be used to check the date format (MMDDYYYY) in any input column. Also, it checks for a valid date. So, if for example, there is a date 02302014 (30th Feb, 2014), it will return 0.

0 = False (Not a valid date)
1 = True (Is a valid date)

While compiling, please change the DEFINER=`dataanalysis` to a valid user in your system. Else, you will get a user not present definer error.

The error message will be:
0 row(s) affected, 1 warning(s): 1449 The user specified as a definer ('dataanalysis'@'%') does not exist

Example code for Date validation:

DELIMITER $$
CREATE FUNCTION fn_str_date_chk(column_value Varchar(200)) RETURNS tinyint(1)
BEGIN
DECLARE is_valid BOOLEAN default 0;
#declare continue handler for sqlwarning

if char_length(column_value) <> 8 then
set is_valid = 0;
elseif char_length(column_value) = 8 then

case when date(str_to_date(column_value,'%m%d%Y')) is null then
set is_valid = 0;
else set is_valid = 1;
end case;
else
set is_valid = 1;
end if;

return is_valid;
end$$
DELIMITER ;

Usage:

select fn_is_date(date_of_birth) from employee;

Result:

0 = for all invalid date of birth
1 = for all valid date of birth

Please like us if you appreciate our blog. And make sure to chip in with your suggestions.

Saturday, 30 August 2014

Sample procedure in MySql with the use of a out parameter

Starting with the development of new operational data store, with integration of the portal.

The most handy feature is MySql procedure alongwith out parameters to pass the values on to a function or the front-end.

Example:

drop procedure rec_cnt;

delimiter //

create procedure rec_cnt(out param1 int)

begin

select count(*) into param1 from employee;

end//

delimiter ;

call rec_cnt(@param1);

select @param1;

Results:

20097 rows returned

Please like us on Google if you like our posts!! And feel free to leave your comments/suggestion.

Thursday, 24 July 2014

Using Group_Concat() function in MySQL - Handy function to extract all values of a column

The Group_Concat() function in MySQL saved the day for me this week. So, a post thanking the authors. :)

I was loading some basic geographical data of Users who logged in to a website. The data looked as follows:

Row_id	Language	Name
42914	ES	Reader1
42915	ES	Reader1
44623	EN	Reader1
44624	EN	Reader1
44625	EN	Reader1

The dataset is actually pretty heavy with lots of other columns. So, my objective was to have a single row for 'NAME' with all the languages he uses in a delimited format.

For example, the result output should become:

Name	Languages
Reader1	ES \| EN
Reader2	EN \| AR

And the group_concat() function in MySQL was just what I was looking for.

Example query:

select

Name

, group_concat(distinct actor_languages separator '|')

from tweet_data

group by Name

;

The query will group your dataset by Name, and create a pipe separated string of all the languages in the language column.

Please remember to use distinct. Else, we will have one entry for each row of Name, and this will become a really long string.

Suggestions or comments are welcome as always !!

Monday, 7 July 2014

Full outer join sample in MySQL - Replace a string in the dataset

Below is a sample query for doing a full outer join in MySQL:

The data joined from output_1 , output_2, output_3 and output_4 tables based on actor_link column.

We are also using the Replace() function of MySQL to replace a string in the resultant dataset.

Example Query:

select Replace(coalesce(out1.actor_link, out2.actor_link, out3.actor_link, out4.actor_link), 'http://www.twitter.com/',''), coalesce(cnt1,0)+ coalesce(cnt2,0) + coalesce(cnt3,0) + coalesce(cnt4, 0)

from

(select actor_link, count(gen_key) as cnt1 from output_1 group by actor_link) out1

left outer join

(select actor_link, count(gen_key) as cnt2 from output_2 group by actor_link) out2

on out1.actor_link = out2.actor_link

left outer join

(select actor_link , count(gen_key) as cnt3 from output_3 group by actor_link) out3

on out1.actor_link = out3.actor_link

left outer join

(select actor_link , count(gen_key) as cnt4 from output_4 group by actor_link) out4

on out1.actor_link = out4.actor_link

UNION

from

(select actor_link, count(gen_key) as cnt1 from output_1 group by actor_link) out1

right outer join

(select actor_link, count(gen_key) as cnt2 from output_2 group by actor_link) out2

on out1.actor_link = out2.actor_link

right outer join

(select actor_link , count(gen_key) as cnt3 from output_3 group by actor_link) out3

on out1.actor_link = out3.actor_link

right outer join

(select actor_link , count(gen_key) as cnt4 from output_4 group by actor_link) out4

on out1.actor_link = out4.actor_link

where out1.actor_link <> 'actor__link'

;

Please let us know if you need any help in your Analytics exploits.

Thursday, 26 June 2014

Loading a text file into MySQL and joining it with a table - Line delimiter may cause an issue

A post after a long time :)

Today, we will learn how to load a data file or text file directly into a MySQL table.

Step1: We create a table in using MySQL workbench

drop table response_values;

create table response_values(
respondent_serial varchar(10),
response_ID decimal(16,0),
D13new varchar(255),
Load_Date varchar(1),
primary key (respondent_serial)
);

Step 2: We run the below command to load the file into the table

load data local infile 'C:\\Data files\\Raw_Data_2013.txt' into table response_values
fields terminated by '|'
lines terminated by '\n'
(respondent_serial, response_ID,D13new, Load_date);

Step 3 : We now join the data of this file with another table using a Left Outer Join. MySQL follows the standard format of SQL.

Tips: As a best practice, don't keep the joining columns at the very end of the text file. If that is unavoidable, append an additional technical column as delimiter.

Example :

select RV.*, TD.*
from response_values as Rv left outer join TotFeed_Data as Td
on upper(trim(RV.D13new)) = upper(trim(td.actor_link) )
where d13new is not null
;

The Upper() function in MySQL converts the string into Upper-case.

Feel free to send any questions. Like us if you are helped.

Data Analysis Example

Pages

Tuesday, 8 September 2015

Big Data Lake Implementation - Moving Data from OLTP (MySQL) to HDFS using Apache Sqoop - Example scripts

Monday, 20 April 2015

Find the combination of strings in all orders and delete the duplicate records - Amazing Interview Question

Tuesday, 10 February 2015

Cursor code example in MySQL to update from multiple rows in source

Thursday, 22 January 2015

Find_in_set feature of MySQL to find elements in an array or list of Values - Example and Diagrams

Wednesday, 21 January 2015

Example script to extract one item from a list of values in MySQL - Use Prepare and Execute - Dynamic SQL

Insert only unqiue rows in MySQL-reject duplicate rows silently without error

Thursday, 15 January 2015

Date comparison confusion in MySQL - Read the example below

Monday, 12 January 2015

Detecting and validating email address in weblogs/tweets - MySQL example function

Tuesday, 23 December 2014

MySQL function to reject invalid email address from websites

Sunday, 21 December 2014

Understanding Implicit Commits in MySql on using Alter table Create Index on temporary table

Wednesday, 12 November 2014

Problems in transforming date using str_to_date function - null returned for valid date input

Wednesday, 15 October 2014

Accept or reject International Accent marks in MySql and comparing them using regex

Friday, 26 September 2014

User defined functions in MySql - check date format and leap year in MySql UDF

Saturday, 30 August 2014

Sample procedure in MySql with the use of a out parameter

Thursday, 24 July 2014

Using Group_Concat() function in MySQL - Handy function to extract all values of a column

Monday, 7 July 2014

Full outer join sample in MySQL - Replace a string in the dataset

Thursday, 26 June 2014

Loading a text file into MySQL and joining it with a table - Line delimiter may cause an issue

Knowledge Archive