MySQL is the most widely used Relational Database Management System in the open source world. MySQL stepped into the NoSQL world by introducing the JSON Data Type in MySQL 5.7 release. In this blog post I am going to explain one of the major advantage of optimisation made in JSON Replication .
Before MySQL 8.0.3, Whatever changes (Updates) made in JSON document, it will log as a full document in binary log & replicate the same into slave. The JSON data is stored as a blob internally. Below is an example of how it is logged as full document in binary log ?
Example –
Server version - 5.7.22-log MySQL Community Server (GPL)
My Binlog settings,
log-bin
Binlog_format = ROW
Binlog_row_image = minimal
Creating table with JSON data type,
mysql> create table json_repl (id int primary key, emp_de json);
Query OK, 0 rows affected (0.01 sec)
Inserting the values,
mysql> insert into json_repl values(1,'{"tag": "desk1 addr1 history1 grade1", "book": "table1 and emp_historicaldata", "emp_id": "aaaa1", "designation": "MySQL DBA", "position_grade": "A1 grade1 and MASTER in MySQL"}');
Query OK, 1 row affected (0.01 sec)
Updating the record with JSON function JSON_REPLACE.
mysql> update json_repl set emp_de = json_replace(emp_de,'$.emp_id','a1');
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Binlog view ,
mysqlbinlog -vvv --base64-output=DECODE-ROWS mydbopslabs13-bin.000011 > log57
cat log57
### UPDATE `json_store`.`json_repl`
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### SET
### @2='\x00\x05\x00\x00\x27\x00\x03\x00*\x00\x04\x00.\x00\x06\x004\x00\x0b\x00?\x00\x0e\x00\x0cM\x00\x0ci\x00\x0c<87>\x00\x0c<8A>\x00\x0c<94>\x00tagbookemp_iddesignationposition_grade\x1bdesk1 addr1 history1 grade1\x1dtable1 and emp_historicaldata\x02a1\x09MySQL DBA\x1dA1 grade1 and MASTER in MySQL' /* JSON meta=4 nullable=1 is_null=0 */
# at 666596
#180421 23:00:58 server id 1 end_log_pos 666627 CRC32 0xbf0a14ea Xid = 138659
COMMIT/*!*/;
Here, the JSON_REPLACE function is used to update the value in emp_id field . But, it is logging the full document in binary log file .
What happened after 8.0.3 ?
From MySQL 8.0.3, MySQL community released the new variable to avoid the full document logging into the binary log . The new variable “binlog_row_value_options”. The valid options are PARTIAL_JSON or empty(‘’) . More information in the worklog (WL 2955)
Below, I have explained how the replication is differed from older versions .
Example –
MySQL Version - MySQL 8.0.3
My Binlog settings ( my.cnf )
log-bin
binlog_format=row
binlog_row_image=minimal
binlog_row_value_options=partial_json
Creating table with JSON data type,
mysql> create table json_repl (id int primary key, emp_de json);
Query OK, 0 rows affected (0.01 sec)
Inserting the values,
mysql> insert into json_repl values(1,'{"tag": "desk1 addr1 history1 grade1", "book": "table1 and emp_historicaldata", "emp_id": "aaaa1", "designation": "MySQL DBA", "position_grade": "A1 grade1 and MASTER in MySQL"}');
Query OK, 1 row affected (0.01 sec)
Updating the record with JSON function JSON_REPLACE.
mysql> update json_repl set emp_de = json_replace(emp_de,'$.emp_id','a1');
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Binlog view ,
mysqlbinlog -vvv --base64-output=DECODE-ROWS mydbopslabs15-bin.000008 > log803
cat log803
### UPDATE `json_store`.`json_repl`
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### SET
### @2=JSON_REPLACE(@2, '$.emp_id', 'a1') /* JSON meta=4 nullable=1 is_null=0 */
# at 516165
#180421 23:02:27 server id 3 end_log_pos 516196 CRC32 0x43c90650 Xid = 127163
COMMIT/*!*/;
Now, we are can observe only the updated field in binlog . Hope, it clearly explains the optimisation in JSON .
How it Benefits in Production Environment ?
Of course, there are many benefits. One of the major advantage is less disk usage .
Lower disk usage (IO) –
By avoiding to logging the full document, we can save the disk usage (IO). I am just curious to know how much disk we can save by this approach, so i have made an test with 1 core records in both MySQL 5.7 and MySQL 8.0.3 . Below i am sharing the findings and my test results as well .
I have initiated the below script for insert the record in both MySQL 5.7 & MySQL 8.0.3.
For INSERT,
[sakthi@mydbopslabs13 mysql]# cat json_insert.sh
#bin/bash
for i in `seq 2 10000000`;
do
mysql -e "insert into json_store.json_repl values($i,'{\"emp_id\": \"aaaa$i\", \"tag\": \"desk$i addr$i history$i grade$i\", \"book\": \"table$i and emp_historicaldata\", \"position_grade\": \"A$i grade$i and MASTER in MySQL\", \"designation\": \"MySQL DBA\"}');"
done
After inserting the records, i have decided to update every values based on primary key with the below script in both MySQL 5.7 and MySQL 8.0.3. Then we can finally analyze the binlog growth
[sakthi@mydbopslabs13 mysql]# cat json_update.sh
#/bin/bash
mysql -e "flush logs;"
for i in `seq 1 10000000`;
do
mysql -e "update json_store.json_repl set emp_de = json_replace(emp_de,'$.emp_id','a$i') where id=$i;"
done
I have analysed the binlog growth in both MySQL 5.7 & MySQL 8.0.3 post JOSN Update. It is really good .
Binlog Growth for update in MySQL 5.7,
rw-r-----. 1 mysql mysql 1.1G Apr 21 20:05 mydbopslabs13-bin.0000017
rw-r-----. 1 mysql mysql 1.1G Apr 21 20:34 mydbopslabs13-bin.0000018
rw-r-----. 1 mysql mysql 1.1G Apr 21 21:02 mydbopslabs13-bin.0000019
rw-r-----. 1 mysql mysql 1.1G Apr 21 21:29 mydbopslabs13-bin.0000020
rw-r-----. 1 mysql mysql 1.1G Apr 21 21:57 mydbopslabs13-bin.0000021
rw-r-----. 1 mysql mysql 1.1G Apr 21 22:31 mydbopslabs13-bin.0000022
rw-r-----. 1 mysql mysql 652M Apr 21 23:00 mydbopslabs13-bin.0000023
Total – 6 GB 652 MB
Binlog Growth for update in MySQL 8.0,
-rw-r-----. 1 mysql mysql 1.1G Apr 21 20:07 mydbopslabs15-bin.000012
-rw-r-----. 1 mysql mysql 1.1G Apr 21 20:38 mydbopslabs15-bin.000013
-rw-r-----. 1 mysql mysql 1.1G Apr 21 21:09 mydbopslabs15-bin.000014
-rw-r-----. 1 mysql mysql 1.1G Apr 21 21:35 mydbopslabs15-bin.000015
-rw-r-----. 1 mysql mysql 1.1G Apr 21 22:11 mydbopslabs15-bin.000016
-rw-r-----. 1 mysql mysql 74M Apr 21 22:17 mydbopslabs15-bin.000017
Total – 5 GB 74 MB
Binlog Growth Graph –
From the analysis we have saved approximately 1.5 GB of disk space. It will be vary based on yours JSON data .
Less Network Traffic –
We can transfer the small amount of data over network instead of the full document . By this approach We can save the unwanted network usage in replication too.
Can avoid Replication Lag –
Replicating the big document (blob) might cause slower writes to the slave. So, there is a chance for replication lag. We can avoid the replication lag by this approach .
Limitations –
This only take effect with the below JSON functions. Without the below functions, it will log the full document even the partial_json enabled .
JSON_SET()
JSON_REPLACE()
JSON_REMOVE()
Input column and target column should be the same .
It only affect the ROW based replication Environment . Statement based replication will always works as its nature .
Conclusion –
JSON is very good for store the NoSQL data . It is very good for read purpose . MySQL has introduced the JSON and keep improving it . Definitely, JSON will play an important role for making the MySQL better . At Mydbops, We are keep testing the new things on MySQL and related tools, will be coming back with new blog soon.