Security: Data Masking in MySQL 8.0 server

Mydbops
Mar 30, 2020
Mins to Read
All

Data security plays a major role in current age. Privacy matters a lot. Data masking is one of the Key Features when comes to security.

  • In MySQL community versions if you want to mask your data, You can go with a Maxscale load balancer.
  • They introduced a new masking filter on the Maxscale 2.1 version.
  • My colleague Prasanth written about column-level data masking. For reference click here Column Level Data Masking.
  • In MySQL 8.0 Enterprise they have introduced a new security feature called data masking.
  • This feature was introduced in MySQL 8.0.13 Enterprise Edition and it provides data masking and de-identification capabilities.

What is Masking ?

  • Transformation of existing data to mask it and remove identifying characteristics, for example, changing all digits of a credit card number but the last four to ‘X’ characters.
  • Generation of random data, such as email addresses and payment card numbers.

In MySQL 8.0 they have introduced 14 functions,

 
1) gen_blacklist

2) gen_dictionary

3) gen_dictionary_drop

4) gen_dictionary_load

5) gen_range

6) gen_rnd_email

7) gen_rnd_pan

8) gen_rnd_ssn

9) gen_rnd_us_phone

10) mask_inner

11) mask_outer

12) mask_pan

13) mask_pan_relaxed

14) mask_ssn
	

Plugin Installation :

 
root@localhost:(none)>install plugin data_masking SONAME 'data_masking.so';
Query OK, 0 rows affected (0.10 sec)
	
  • In MySQL 8.0 there were different types of masking functions are available.
  • I have categorised those functions based on their usage.

Architecture Diagram :

General-Purpose Functions:

  • mask_inner
  • mask_outer

mask_inner :

  • It will mask the interior part of a string, We can specify the number of strings that needs to be masked.

Example :

 
root@localhost:mydbops_test>select mask_inner(aadhar_number,1,2) as aadhar_number,mask_inner(status,2,0) as status from aadhar_details;
+-----------------+--------+
| aadhar_number   | status |
+-----------------+--------+
| 1XXXXXXXXXXXX78 | ACXXXX |
| 9XXXXXXXXXXX78  | ACXXXX |
+-----------------+--------+
2 rows in set (0.01 sec)
	
  • In the above example, I have masked the (aadhar_number & status) column using mask_inner function. I have passed the 2 arguments.
  • The first argument indicates margin1. It will mask the inner part of the data.
  • The second argument indicates margin2. It will mask part of the data.

Note :

  • For using this mask_inner function you need minimum 3 characters.

mask_outer :

  • It will mask the left and right ends of a string. We can specify the number of strings that needs to be masked.

Example :

 
root@localhost:mydbops_test>select mask_outer(aadhar_number,3,2) as aadhar_number,mask_outer(status,2,2) as status from aadhar_details;
+-----------------+--------+
| aadhar_number   | status |
+-----------------+--------+
| XXX2334455667XX | XXTIXX |
| XXX883445566XX  | XXTIXX |
+-----------------+--------+
2 rows in set (0.00 sec)
	

In the above example, I have masked the (aadhar_number & status) column using mask_outer function.

I have passed the 2 arguments.

  • The first argument indicates margin1. It will mask the data from the beginning.
  • The second argument indicates margin2. It will mask the data from the end of the row.

Special-Purpose :

  • mask_pan
  • mask_pan_relaxed
  • mask_ssn

mask_pan :

  • The mask span function it will mask all the values from passed argument except the last four-digit.
Example :
 
root@localhost:mydbops_test>select mask_pan(aadhar_number),mask_pan(user_identity) from aadhar_details;
+-------------------------+-------------------------+
| mask_pan(aadhar_number) | mask_pan(user_identity) |
+-------------------------+-------------------------+
| XXXXXXXXXXX6778         | XXXXXXwxrin             |
| XXXXXXXXXX66201         | XXXXXXXrstg             |
+-------------------------+-------------------------+
2 rows in set (0.00 sec)
	
Note :
  • If the argument is shorter than required, it is returned unchanged value.
  • For using this (mask_pan) function, We need a minimum of 5 characters.
Example :
 
root@localhost:mydbops_test>select mask_pan('data');
+------------------+
| mask_pan('data') |
+------------------+
| data             |
+------------------+
1 row in set (0.00 sec)
	

mask_pan_relaxed :

  • The mask_pan_relaxed function will mask other characters except for first six digits and last 4 digits.
Example :
 
root@localhost:mydbops_test>select mask_pan_relaxed(photo) from aadhar_details;
+-------------------------------+
| mask_pan_relaxed(photo)       |
+-------------------------------+
| https:XXXXXXX.jpg             |
| https:XXXXXXXXXXXXXXXXXXX.jpg |
+-------------------------------+
2 rows in set (0.00 sec)
	
Note :
  • The string must be a suitable length,(ie.. string must be 11 characters long). Otherwise, this mask_pan_relaxed function did not check the string.

mask_ssn :

  • The mask_ssn is a Social Security number that returns the number except for last 4 digits. Other numbers are replaced by “X” characters.
Example :
 
root@localhost:mydbops_test>select mask_ssn('90963692211');
+-------------------------+
| mask_ssn('90963692211') |
+-------------------------+
| XXX-XX-2211             |
+-------------------------+
1 row in set (0.00 sec)

	

Generating Random Data with Specific Characteristics :

  • One of the nice functions of MySQL Data Masking is the ability to generate business-relevant datasets.
  • Because it is not always possible to test/simulate your application on your real dataset (indeed playing with the customer credit card or security social numbers is a very bad practice) this feature is very convenient.

Several functions are available,

gen_range() : returns a random integer selected from a given range.

gen_rnd_email() : returns a random email address in the example.com domain.

gen_rnd_pan() : returns a random payment card Primary Account Number.

gen_rnd_us_phone() : returns a random U.S. phone number.

Masking Data for Customer Identification :

  • This type of masking function is generating random data from the dictionary file.
  • Here the main advantage is we can load the dictionary files manually.
  • When a dictionary is loaded, it becomes part of the dictionary registry and is assigned a name to be used by other dictionary functions.
  • Dictionaries are loaded from plain text files containing one term per line. Empty lines are ignored. To be valid, a dictionary file must contain at least one non-empty line.

gen_dictionary :

  • It returns some random data from a dictionary.
Example :
 
mysql> select gen_dictionary('name') from aadhar_details;
+------------------------+
| gen_dictionary('name') |
+------------------------+
| La Trinité             |
| Fonds-Saint-Denis      |
+------------------------+
2 rows in set (0.00 sec)
	

gen_dictionary_load :

  • Loads a file into the dictionary registry and assigns the dictionary a name to be used with other functions that require a dictionary name argument.
  • In order to use data from a dictionary, we must first load the data.
Sample data :
 
[root@mydbopslabs26 mysql-files]# cat random_dictionary.txt
Basse-Pointe
Bellefontaine
Case-Pilote
Ducos
Fonds-Saint-Denis
Fort-de-France
Grand'Rivière
Gros-Morne
L'Ajoupa-Bouillon
La Trinité
	
  • We can load the dictionary files using (gen_dictionary_load) function.

Example :

 
mysql> SELECT gen_dictionary_load('/var/lib/mysql-files/random_dictionary','name');
+-----------------------------------------------------------------------+
| gen_dictionary_load('/var/lib/mysql-files/random_dictionary','name')  |
+-----------------------------------------------------------------------+
| Dictionary load success                                               |
+-----------------------------------------------------------------------+
1 row in set (0.00 sec)
	
Note :
  • To loading the dictionary file to MySQL you need to put the file under (secure_file_priv) directory.Otherwise, MySQL will not allow loading the dictionary file.
 
mysql> SELECT gen_dictionary_load('/home/mydbops/dictionary','user_identity');
+-----------------------------------------------------------------+
| gen_dictionary_load('/home/mydbops/dictionary','user_identity') |
+-----------------------------------------------------------------+
| Dictionary load error: dictionary file not readable             |
+-----------------------------------------------------------------+
1 row in set (0.00 sec)
	

How to use these functions in a production use case?

  • We can achieve this by creating views. While creating the views we can specify what are the columns needs to mask.
Sample Table:
 
mysql> desc aadhar_details;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| aadhar_number | bigint(20)   | NO   |     | NULL    |                |
| name          | varchar(16)  | NO   |     | NULL    |                |
| mobile        | int(11)      | YES  |     | NULL    |                |
| address       | varchar(150) | NO   |     | NULL    |                |
| email         | varchar(64)  | YES  |     | NULL    |                |
| photo         | blob         | YES  |     | NULL    |                |
| user_identity | blob         | YES  |     | NULL    |                |
| status        | varchar(16)  | YES  |     | NULL    |                |
| create_time   | timestamp    | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
10 rows in set (0.03 sec)
	

For the above table, I am masking a few columns, ie…

  • aadhar_number – The gen_range function will generate some random numbers.
  • mobile – The gen_rnd_us_phone function will generate random numbers in U.S standard format.
  • name & address – The gen_dictionary function will load random dictionary data.
  • email – The gen_rnd_email function will generate random email id’s.
  • photo – The mask_inner function will mask the inner part of the selected column.

Creating View

 
mysql> create view mydbops_masking
as
select
  gen_range(900000000, 999999999) as aadhar_number,
  gen_rnd_us_phone() as mobile,
  gen_dictionary('name') as name,
  gen_dictionary('address') as address,
  gen_rnd_email() as email,
  mask_inner(photo,1,5) as photo
from aadhar_details;
Query OK, 0 rows affected (0.03 sec)
	

Now the data looks like below,

 
mysql> select aadhar_number,mobile,name,email,photo from mydbops_masking;
+---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+
| aadhar_number | mobile        | name                                                                   | email                | photo                         |
+---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+
| 976896841     | 1-555379-4677 | ˙∆√∆ç©√˙∫∫∆\∫∆\©¨∆∫˙                                                   | lQGeD5ZA@example.com | hXXXXXXXXXXXl.jpg             |
| 990742763     | 1-555330-5061 | ∆˙©ç†˙∆˚˙©©¨˙©ƒ∂†®ƒ©˙∆˜µ∆∫˙¨©ƒç                                        | dSODymDW@example.com | hXXXXXXXXXXXXXXXXXXXXXXXe.jpg |
+---------------+---------------+------------------------------------------------------------------------+----------------------+-------------------------------+
2 rows in set (0.00 sec)
	
  • This is a really awesome feature in MySQL 8.0. But this feature is only available for Enterprise MySQL.
  • In Percona MySQL 8.0 they have implemented this same feature in open source. And they are expanding the Data-Masking plugin to implementing new features.

Don't compromise on data security. Partner with Mydbops, your MySQL security experts! Contact us today for a free consultation and unlock a new level of data protection for your MySQL environment.

{{cta}}

No items found.

About the Author

Mydbops

Subscribe Now!

Subscribe here to get exclusive updates on upcoming webinars, meetups, and to receive instant updates on new database technologies.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.