Generating Flexible Random Data in MySQL

Mydbops
Jan 7, 2021
12 Mins to Read

Benchmarking MySQL compression methods often requires a realistic dataset that mimics real-world production workloads. I was about to benchmark the various compression methods in MySQL (via storage engines), and I wanted to simulate my own dataset resembling a production workload. So I searched for a tool flexible enough to handle a customised table structure and more.

mysql_random_data_load is a tool from Percona Lab that generates random data based on a flexible table structure. It seemed like the right fit for our benchmarking needs.

Let’s explore this tool and how to work with it efficiently.

MySQL Random Data Load:

mysql_random_data_load loads (inserts) ‘n’ records into the target table, populating it with random data according to each column’s data type. Unlike sysbench, this tool does not impose a predefined table structure or data types; it inspects the columns and generates matching values, so we can produce random data tailored to our custom needs. The table can have any number of columns with varied data types, and the tool will generate and insert data accordingly.
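
Based on the examples later in this post, the general invocation looks like this (a sketch; the placeholders are mine, and credentials should be replaced with your own):

# positional arguments: schema, table, number of rows to insert
./mysql_random_data_load <schema> <table> <rows> -u<user> -p<password> [flags]

# e.g. insert 1000 random rows into test.test
./mysql_random_data_load test test 1000 -uroot -pMydbops@11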

How it works?

mysql_random_data_load generates data based on the given table structure and supports most common data types, such as INT, CHAR, TEXT, BLOB, etc.

  1. For strings (see the sketch after this list):
    • If the field size is < 10, the program generates a random “first name”.
    • If the field size is > 10 and < 30, the program generates a random “full name”.
    • If the field size is > 30, the program generates a “lorem ipsum” paragraph of up to 100 characters.
  2. The program can detect whether a field accepts NULLs and, if it does, it will generate NULLs randomly (~10% of the values).
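
For instance, a hypothetical table like the one below (the table and column names are mine) would exercise each of these string rules:

-- each column falls into a different size bucket from the list above
CREATE TABLE names_demo (
  short_name char(8),      -- size < 10            -> random "first name"
  full_name  char(25),     -- size > 10 and < 30   -> random "full name"
  bio        varchar(200)  -- size > 30            -> "lorem ipsum" text, up to 100 chars
) ENGINE=InnoDB;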

Download:

https://github.com/Percona-Lab/mysql_random_data_load/releases
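
For example, on Linux you can fetch and unpack a release binary roughly as follows (the version number and asset name here are assumptions; check the releases page above for the current ones):

# illustrative only -- substitute the actual release asset for your platform
wget https://github.com/Percona-Lab/mysql_random_data_load/releases/download/v0.1.12/mysql_random_data_load_0.1.12_linux_amd64.tar.gz
tar -xzf mysql_random_data_load_0.1.12_linux_amd64.tar.gz
chmod +x mysql_random_data_load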

Sample Table structure:

I have created a simple table with INT, CHAR, and BLOB data types for a sample use case.

 
mysql> show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `id` int(11) DEFAULT NULL,
  `name` char(11) DEFAULT NULL,
  `desg` blob
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
	

Options:

Let us take a look at a few important options before diving in.

--debug:

The --debug option gives more details about the data loading. The default bulk size is 1,000 rows, as the example below shows. Note that when --debug mode is enabled, the progress bar is disabled.

Example :

 
[root@mydbops11 vagrant]# ./mysql_random_data_load test test 1000000 -u -p --debug
DEBU[2019-10-09T07:15:14Z] &tableparser.Table{
    Schema: "test",
    Name:   "test",
    Fields: {
        {
            TableCatalog:           "def",
            TableSchema:            "test",
            TableName:              "test",
            ColumnName:             "id",
            OrdinalPosition:        1,
            ColumnDefault:          sql.NullString{},
            IsNullable:             false,
            DataType:               "int",
            CharacterMaximumLength: sql.NullInt64{},
            CharacterOctetLength:   sql.NullInt64{},
            NumericPrecision:       sql.NullInt64{Int64:10, Valid:true},
            NumericScale:           sql.NullInt64{Int64:0, Valid:true},
            DatetimePrecision:      sql.NullInt64{},
            CharacterSetName:       sql.NullString{},
            CollationName:          sql.NullString{},
            ColumnType:             "int(11)",
            ColumnKey:              "",
            Extra:                  "",
            Privileges:             "select,insert,update,references",
            ColumnComment:          "",
            GenerationExpression:   "",
            SetEnumVals:            {},
            Constraint:             (*tableparser.Constraint)(nil),
            SrsID:                  sql.NullString{},
        },
        {
            TableCatalog:           "def",
            TableSchema:            "test",
            TableName:              "test",
            ColumnName:             "name",
            OrdinalPosition:        2,
            ColumnDefault:          sql.NullString{},
            IsNullable:             false,
            DataType:               "char",
            CharacterMaximumLength: sql.NullInt64{Int64:1, Valid:true},
            CharacterOctetLength:   sql.NullInt64{Int64:4, Valid:true},
            NumericPrecision:       sql.NullInt64{},
            NumericScale:           sql.NullInt64{},
            DatetimePrecision:      sql.NullInt64{},
            CharacterSetName:       sql.NullString{String:"utf8mb4", Valid:true},
            CollationName:          sql.NullString{String:"utf8mb4_0900_ai_ci", Valid:true},
            ColumnType:             "char(11)",
            ColumnKey:              "",
            Extra:                  "",
            Privileges:             "select,insert,update,references",
            ColumnComment:          "",
            GenerationExpression:   "",
            SetEnumVals:            {},
            Constraint:             (*tableparser.Constraint)(nil),
            SrsID:                  sql.NullString{},
        },
        {
            TableCatalog:           "def",
            TableSchema:            "test",
            TableName:              "test",
            ColumnName:             "desg",
            OrdinalPosition:        3,
            ColumnDefault:          sql.NullString{},
            IsNullable:             false,
            DataType:               "blob",
            CharacterMaximumLength: sql.NullInt64{Int64:65535, Valid:true},
            CharacterOctetLength:   sql.NullInt64{Int64:65535, Valid:true},
            NumericPrecision:       sql.NullInt64{},
            NumericScale:           sql.NullInt64{},
            DatetimePrecision:      sql.NullInt64{},
            CharacterSetName:       sql.NullString{},
            CollationName:          sql.NullString{},
            ColumnType:             "blob",
            ColumnKey:              "",
            Extra:                  "",
            Privileges:             "select,insert,update,references",
            ColumnComment:          "",
            GenerationExpression:   "",
            SetEnumVals:            {},
            Constraint:             (*tableparser.Constraint)(nil),
            SrsID:                  sql.NullString{},
        },
    },
    Indexes: {
    },
    Constraints: {
    },
    Triggers: {
    },
    conn: (*sql.DB)(nil),
}
INFO[2019-10-09T07:15:14Z] Starting
DEBU[2019-10-09T07:15:14Z] Must run 1000 bulk inserts having 1000 rows each
INFO[2019-10-09T07:16:04Z] 1000000 rows inserted
	

--bulk-size:

The bulk size determines the number of rows inserted per chunk during bulk loading. In the example below I tried a chunk size of 400: 100,000 rows at 400 rows per chunk works out to 250 bulk inserts, as the output confirms.

Example:

 
[root@mydbops11 vagrant]# ./mysql_random_data_load test test 100000 -u -p --bulk-size 400 --debug
.......
INFO[2019-10-09T07:12:18Z] Starting
DEBU[2019-10-09T07:12:18Z] Must run 250 bulk inserts having 400 rows each
INFO[2019-10-09T07:12:19Z] 100000 rows inserted
	

--no-progressbar:

By default, it is off. When enabled, it hides the progress bar.

Disabled (default):

 
[root@mydbops11 vagrant]# ./mysql_random_data_load test test 1000000 -uroot -pMydbops@11
INFO[2019-10-17T16:28:51Z] Starting
  47s [====================================================================] 100%
INFO[2019-10-17T16:29:39Z] 1000000 rows inserted
[root@mydbops11 vagrant]#
	

Enabled:

 
[root@mydbops11 vagrant]# ./mysql_random_data_load test test 1000000 --no-progress -uroot -pMydbops@11
INFO[2019-10-17T16:39:30Z] Starting
INFO[2019-10-17T16:40:19Z] 1000000 rows inserted
[root@mydbops11 vagrant]#
	

--print:

The --print option outputs the INSERT statements instead of inserting the data into the table.

 
[vagrant@mydbops11 ~]$ ./mysql_random_data_load test test 10 -uroot -pMydbops@11 --print
INSERT IGNORE INTO `test`.`test` (`id`,`name`,`desg`) VALUES
 (583532949, "B", "neque cum voluptas iste maiores est reprehenderit."),
 (1342458479, "M", "atque sint ratione nisi natus ad assumenda dolor."),
 (280366509, "M", "voluptates voluptatum quia rerum minus quis."),
 (1801160058, "S", "modi mollitia eligendi odio accusantium."),
 (914091476, "T", "aliquam et nam nihil error."),
 (1022430181, "J", "qui voluptatem sed impedit nesciunt molestias!"),
 (165910161, "V", "necessitatibus consequuntur id cupiditate."),
 (1255569388, "B", "voluptate atque consequatur in et."),
 (1375471152, "P", "sunt odit aperiam quibusdam in iusto."),
 (1705409249, "K", "aperiam voluptas sapiente culpa a sint dolore.")
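
Because --print writes the statements to stdout, one convenient pattern (my own usage, not from the tool’s documentation) is to capture them into a file for later replay:

# generate 10 rows' worth of INSERTs without touching the table
./mysql_random_data_load test test 10 -uroot -pMydbops@11 --print > sample_inserts.sql

# replay them later with the mysql client
mysql -uroot -p test < sample_inserts.sql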
	

Now it is time to build the benchmarking test case. I have created two tables with the same structure but different storage engines, InnoDB and RocksDB respectively, using LONGTEXT and BLOB data types.

 
mysql>CREATE TABLE test1 (
ID int DEFAULT NULL,
Name longtext,
Image blob
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.16 sec)

mysql> CREATE TABLE test2 (
ID int DEFAULT NULL,
Name longtext,
Image blob
) ENGINE=ROCKSDB;
Query OK, 0 rows affected (0.02 sec)
	

I loaded around 100M records into each of these tables using mysql_random_data_load, as shown below.
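
The load itself was just one invocation of the tool per table, in the same form as the earlier examples (a sketch; credentials as before):

./mysql_random_data_load test test1 100000000 -uroot -pMydbops@11
./mysql_random_data_load test test2 100000000 -uroot -pMydbops@11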

 
mysql> select count(*) from test1;
+-----------+
| count(*)  |
+-----------+
| 100000000 |
+-----------+
1 row in set (1 min 4.13 sec)

mysql> select count(*) from test2; 
+-----------+ 
| count(*)  |
+-----------+
| 100000000 |
+-----------+
1 row in set (8 min 7.25 sec)
	

Now that the data is loaded into both tables, let us see the size of each table on disk.

Size of test1 (InnoDB):

 
[root@mydbops11 db]# du -sh .
12G .
	

Size of test2 (RocksDB):

 
[root@centos13 .rocksdb]# du -sh .
7.5G .
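
As a cross-check against du, table sizes can also be estimated from information_schema (a sketch; note that for RocksDB these counters are estimates rather than exact on-disk sizes):

SELECT table_name, engine,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
  FROM information_schema.tables
 WHERE table_schema = 'test'
   AND table_name IN ('test1', 'test2');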
	

I used the default settings for both InnoDB and RocksDB and did no variable tuning for maximum compression. Even with the defaults, RocksDB stores the same data in roughly 37% less space than InnoDB (7.5G vs. 12G). Now I am ready to test further with various combinations, using the customised dataset generated by MySQL Random Data Load.

This is a very simple example; you can build far more complex table structures with this tool. It suits anyone who wants to simulate a realistic dataset or do performance-related data loading.

Fine-tuning your MySQL database for optimal performance requires expert guidance. Mydbops, a leading provider of open-source database management services, offers a comprehensive suite of solutions to empower you. Get in touch with our MySQL specialists today to unlock the full potential of your databases!
