Schema change is one of the crucial tasks in MySQL with huge tables. Schema change can cause locks.
What is gh-ost?
gh-ost is a triggerless online schema change for MySQL by Github Engineering .It produces light workload on the master during the schema changes . We need online schema change to alter a table without downtime (locking) in production.pt-online schema change is the most widely used tool for making changes in the tables.gh-ost is just an alternative to pt-online schema change.
Why we have to use gh-ost?
pt-online-schema uses triggers for migrating data from existing table to new table (shadow table ).It uses triggers like After insert,After update,After delete. It uses triggers to propagate live changes in the original table to the shadow table.pt-online schema uses synchronous approach ( All changes are applied on the temporary table immediately).
fb-online schema change uses asynchronous approach(All changes are added to change log table later applied on shadow table)
Major difference between these online schema changes and gh-ost is triggers less. Before understanding about gh-ost we need to know something about triggers and what are the disadvantages over triggers based online schema change.
Triggers
Trigger is a stored routine that get activated when a particular event occur on the table. For example a row can be inserted by using insert statement and insert trigger activates for the newly inserted row. Triggers generally contains set of queries.They use same transaction space as the query manipulates the tables. This ensures atomicity of both original table and shadow table ( new table ).
Query and trigger run concurrently and competes for resources.It may cause lock in production.Generally, Triggers are used to record ongoing changes to the original table. pt-osc pauses the execution when server has high currency (–critical-load) or replication lag (–max-lag ) is very high. But it never cancel the triggers in order to maintain synchronization with the original table and the duplicate table.
Triggerless
gh-ost does not use triggers. It is a triggerless.
How ghost track live changes in the original table ?
gh-ost uses asynchronous approach similar to fb-osc but it does not use triggers and change log table. Instead of using new change table, it uses binary logs.In order to use gh-ost online schema migration, we must enable a few variables in MySQL server.
1) log-bin=mysql-bin (gh-ost acts as fake replica and pulls the events from the binary logs)
2) binlog-format=row (gh-ost will not work if this variable is set to statement.we need to change it to the row based or let allow the ghost to change by using the option –switch-to-rbr )
3) log-slave-updates=on (if you are using gh-ost in slave, we must enable this variable in order to generate binary logs)
These variables must be enabled because gh-ost acts as fake replica and it needs binlogs to apply live changes in the table.
How to install gh-ost ?
Steps:
1)Download the release from its official site
2) Extract the file
How it works ?
- Connecting to replica/master In gh-ost online schema change we can avoid the –execute option by using no-op migration.It is similar to dry-run in pt-osc.
- validate alter statement and checking privileges and existence of tables.
- creation of ghost table which is similar to original table
- Apply alter on ghost table
- copy data from original table to ghost table
- copy live changes in the original table and copied to the ghost table by reading DML events from the binlog
- swapping the tables ghost → original and original → old
- Dropping the older table.
Ensure proper grants for MySQL user
After setting these things we need to check binlog_format. we need to set it to ROW.
2018-05-27 11:51:50 FATAL You must be using ROW binlog format. I can switch it for you, provided –switch-to-rbr and that localhost:3306 doesn’t have replicas
In the above line , we have given user,password,databases,tables to alter.we also add chunk size for the background table copying process
After setting binlog_format=row or use –switch-to-rbr option we can proceed to the next step.In verbose mode,it gives detailed output.without –execute flag we can give dry-run.It checks for all the preliminary validations like user privileges,binary logs,connections.
From the above output,we can understand that which table are we going to alter(users) and which table is used as temporary table(_users_gho).It also produces another table with suffix _ghc.It is used to store migration logs and status.And maximum acceptance lag is 1.5s.
From these we can understand,how many rows have been copied and how many events have been copied from the binary logs and also the estimated time to complete the copy of the tables.
Read binary logs from slave → Analyze tables on slave → apply changes in the master
lock the original table → after sync,rename the original table to _old → rename the ghost table to original table → Drop the old table
Advantages:
- No triggers and no additional tables are created
- configuration changes during run time.
Limitations:
- No support for foreign keys
- No support for tables having triggers
Image Courtesy: Photo by David Menidrey on Unsplash