MusicBrainz Database in MySQL
Manual Version 2.1
Introduction
File Description
Installation
Requirements
Modifications from the Original Database
Globals
Setting up the database
Extra Scripts
DownloadCover.pl
LiveStats.pl
Replication
How replication works
Running replications
Update options
Database
Tuning MySQL
Engines
Table descriptions
This manual describes how to set up the MusicBrainz database into a MySQL database, and also how to run replications to update the data hourly from the MusicBrainz site.
| MB_MySQL | |
| doc | Documentation |
| manual.html | Manual |
| DownloadCover.pl | For downloading cover art from amazon. See DownloadCover |
| Globals.pl | Variables for things like MySQL user and password. Must be configured before any other script is executed. |
| Init.pl | First run. Contains functions for creating and loading initial database in, also dropping database. |
| LiveStats.pl | See LiveStats |
| MB_Funcs.pl | This contains the original functions from the MusicBrainz server code, that has been adapted for MySQL. |
| mbdump | For the full export, explained in the Installation. |
| README | ReadMe file. |
| replication | For the replication dumps. For Update.pl |
| SQL | Contains all the SQL code for creating and dropping the database and tables. |
| create_tables.sql | Table definitions. Only run once to set up the MySQL database and all tables. Used by Init.pl |
| drop_db.sql | Drop MySQL database. |
| dropmc_db.sql | For dropping MyCollection tables. |
| index.sql | Contains SQL statements for indexing the tables. |
| Update.pl | Update MySQL database from MusicBrainz replications. |
| update.sh | An example script, there is an explanation in the file. |
| VERSION | The version of the package. |
| versionlog-MB_MySQL.txt.txt | Version log. |
Modifications from the Original Database
This describes the what each variable is for in Globals.pl
| Variable | Default | Description |
| MySQL Options | ||
| $g_prog | 'mysql' | The command to run MySQL. |
| $g_use_login | 1 | Use a user and pass for MySQL. |
| $g_db_name | 'musicbrainz_db' | The database name. |
| $g_db_user | 'root' | User for MySQL. You may need to change this if you are not the administrator or sharing MySQL |
| $g_db_pass | Password for MySQL | |
| Replication Options | ||
| $g_rep_chkevery | 15 | This is how many minutes the Update script will wait before checking for the next replication |
| Schema Options | ||
| $g_schema_url | url | When a new database schema is released from MusicBrainz, this is where it will look for the update script. |
| Errors | ||
| $g_die_on_dupid | 0 | Die if a replication entry returns a duplicate ID error. |
| $g_die_on_error | 1 | If this is set to 1, the Update.pl script will die if a SQL error occurs. Set to 0 if you want it to ignore any errors and proceed. |
| DownloadCover | ||
| $g_download_new_covers | 0 | If this is set to '1' it will download new covers from amazon as they become available through replications. |
| $g_cover_path | path | The path to where the covers are to be stored. |
| LiveStats | ||
| $g_livestats | 1 | Use LiveStats.pl during replications. NOTE: You must run 'LiveStats.pl -i -f' first time. |
| Misc (these variables do not need to be changed) | ||
It is a good idea to read Tuning MySQL before setting up the database.
Once you have checked that you have all the requirements you are ready to proceed to creating the database. To do this you will first need to install the MusicBrainzMySQL package. Download the latest version (see requirements), then extract the contents into a local directory, the easiest place is the users home directory. After extraction you should have a directory called MB_MySQL, this is where all the scripts, SQL, and replication dumps will go. For more information, see File Description
Now you must configure the Globals.pl. See Globals
Next, to create the MySQL database ($g_db_name). Change directory to inside MB_MySQL
> cd MB_MySQL
MB_MySQL> perl Init.pl Choose option: 0 Continue (y/n): y
MB_MySQL> perl Init.pl Choose option: 1
Continue (y/n): y
It will very quickly create the almost 100 tables in the database.
Now you have to download the latest full export from ftp://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/. Select the latest folder and get mbdump.tar.bz2, mbdump-derived.tar.bz2 and mbdump-stats.tar.bz2. If want you can grab the others later, but it's other stuff like old edits that are not needed to get up and running.
This is quite large but only needs to be done once. When it has finished downloading, open the archive. Inside there will be a folder called mbdump, inside that will be the data for each table where the name of the file is the name of the table. Uncompress these files into MB_MySQL/mbdump it is important that the files go into the correct directory and the file names remain unchanged.
Run the Init.pl script again, except this time select the second option,
MB_MySQL> perl Init.pl Choose option: 2 Continue (y/n): y
This is fairly quick and takes about 15 mins. Now to index the tables:
MB_MySQL> perl Init.pl Choose option: 3 Continue (y/n): y
This may take a very long time, so you may need to leave it running all night or more.
This script can download cover art from amazon.com. Use the $g_download_new_covers option in Globals.pl to turn on/off.
| Option | Descripion |
| -h or --help | Show help. |
| -i=x or --id=x | Download the cover for the album with the album.id=x. |
| -s=x or --stopat=x | Stop when album.id=x. |
One of the well known issues with InnoDB tables is that it does not keep a record of how many rows there are in a table at any given time because of version issues. Therefore if you want to know how many records are in a table InnoDB will have to count the records one by one everytime you request a count(*), making it very slow.
LiveStats.pl is a simple add-on for the replications that gives you exact statistics on tables in realtime (that is; the moment a record is inserted or deleted the LiveStats value changes). To use LiveStats.pl you must first run 'LiveStats.pl -i -f' once AFTER the database has been fully loaded.
| Option | Descripion |
| -a or --action | The action to perform. "add" Add the number onto the current number. "subtract" Add the number from the current number. "set" Sets the value. |
| -c or --count | Refresh count.* values. |
| -h or --help | Help. |
| -i or --init | Initialise LiveStats. You must use -r to refresh the stats. |
| -r or --refresh | Completely reload all the statistics. |
| -s or --silent | Don't report any messages. |
Every hour MusicBrainz creates a compressed archive with all the changes to the database for that hour. Each archive is named replication-x.tar.bz2 where x is the number for that replication, that increments by 1 each hour. In the archive is 2 important files, they are the data that need to be loaded into the Pending and PendingData tables. The Pending table contains the transactions that need to occur and the PendingData table contains to data for the new changes to the database.
When the Update.pl script is run, it will try to download the next replication from the MusicBrainz site, if it exists it will download and uncompress the files, load the Pending and PendingData tables, then run each transaction deleting the transaction when the modification has been to the database.
If something goes wrong while the Update.pl script is running it is fine because when it is run again it will see that there is pending transactions. And will pick up where it left off.
To update the database with the latest data from the MusicBrainz database, all you need to do is run the Update.pl script. There are other options for the Update.pl script (see Update options)
MB_MySQL> perl Update.pl
It will automatically download the latest replication archive (if it's available), uncompress it and update the whole MySQL database.
These are the flags that be appended to the Update.pl script. Examples
Update.pl -r |
This is handy for leaving it running in the background, and just leaving it to run the replications automatically. |
Update.pl --quiet -p |
Only run the pending transactions, without printing what is being done (quiet). |
Update.pl -p -n |
Only run the pending transactions and display the progress each time a new XID is being processed. |
Update.pl -c |
Update the database to the current replication then exit |
| Option | Descripion |
| -c or --onlytocurrent | Keep updating as many replications as possible until no more replications can be found on the server, then quit |
| -g=x or --skiptorep=x | Change the replication number to x. |
| -h or --help | Show help. |
| -i or --info | Only shows the information about the current replication and pending transactions |
| -n or --nonverbose | Only print the progress every time a new XID is started. This is to minimise the printing to the console |
| -p or --onlypending | Only process pending transactions then quit |
| -q or --quiet | Non-verbose. The status each statement is not printed |
| -r or --keeprunning | Keep the script running contantly, automatically checking for new replications every 30 minutes |
| -s or --showall | This will show what type of statement (INSERT/UPDATE/DELETE) is being run and how long it takes for the statement to process. |
| -t or --truncate | Force TRUNCATE on Pending and PendingData tables. |
There are a few tuning options to make sure your MusicBrainz MySQL database runs as fast as possible. The main focus is to make InnoDB run as efficiently as possible to keep the tables fast for searching. This can be done through configuring InnoDB. This can be done through the MySQL Administrator or via the my.cnf file.
An engine is how MySQL stores the data. Different engines have different strong points. All tables use the InnoDB engine, this is because INSERT/UPDATE/DELETE statements are really fast, much faster than MyISAM tables for UPDATEs and DELETEs because of indexing.