Wednesday 15 October 2014

Accept or reject International Accent marks in MySql and comparing them using regex

After fighting over this for the last couple of hours, I have nailed it at last. Storage and filtering international accent marks in MySql , one of the most sought after problems for any portal's underlying DB.

So, my approach was to first pass the column value to the function and store as a BLOB. Please note that if you store it as Varchar, your regular expressions will not work.

Accent marks span over multiple bytes, refer the collation and charset pages on mysql manuals.

Example:

select char_length('À'); # 1 as the total number of characters
select length('À');  #2 as the total number of bytes.

Now, the example function to reject any special characters and number and to accept all alphabets and international accent marks.

Example function:

drop function if exists fn_alphabetic_with_acc_hypen;
DELIMITER $$
CREATE FUNCTION fn_alphabetic_with_acc_hypen(column_value Varchar(200)) RETURNS tinyint(1)
BEGIN
 DECLARE is_alpha_acc_hyphen BOOLEAN;

 SET is_alpha_acc_hyphen = 0;

 # If input string contains only numbers, then return 1 else 0
 #using concat function to preserve the hexadecimal codes

IF column_value not REGEXP concat('[',x'21','-',x'2C',x'2E','-',x'40',x'5B','-',x'60',x'7B','-',x'7E',x'A1','-',x'BF',']') then
 SET is_alpha_acc_hyphen = 1;
# 1 means true
 else
 set is_alpha_acc_hyphen = 0;
# 0 means false
 END IF;
 RETURN is_alpha_acc_hyphen;
 END$$
DELIMITER ;

If we run the below select, it will return 1 for international accent mark containing strings.

select fn_alphabetic_with_acc_hyphen('A Àcb'); # results in 1 (TRUE) - that it accepts the international accent marks

Please let us know your suggestion/advice. Like us if you were helped.