MusicBrainz Database in MySQL
Manual Version 2.1

Introduction
   File Description
Installation
   Requirements
   Modifications from the Original Database
   Globals
   Setting up the database
Extra Scripts
   DownloadCover.pl
   LiveStats.pl
Replication
   How replication works
   Running replications
   Update options
Database
   Tuning MySQL
   Engines
   Table descriptions

 


Introduction

This manual describes how to set up the MusicBrainz database into a MySQL database, and also how to run replications to update the data hourly from the MusicBrainz site.

File Description

MB_MySQL  
   doc Documentation
      manual.html Manual
   DownloadCover.pl For downloading cover art from amazon. See DownloadCover
   Globals.pl Variables for things like MySQL user and password. Must be configured before any other script is executed.
   Init.pl First run. Contains functions for creating and loading initial database in, also dropping database.
   LiveStats.pl See LiveStats
   MB_Funcs.pl This contains the original functions from the MusicBrainz server code, that has been adapted for MySQL.
   mbdump For the full export, explained in the Installation.
   README ReadMe file.
   replication For the replication dumps. For Update.pl
   SQL Contains all the SQL code for creating and dropping the database and tables.
      create_tables.sql Table definitions. Only run once to set up the MySQL database and all tables. Used by Init.pl
      drop_db.sql Drop MySQL database.
      dropmc_db.sql For dropping MyCollection tables.
      index.sql Contains SQL statements for indexing the tables.
   Update.pl Update MySQL database from MusicBrainz replications.
   update.sh An example script, there is an explanation in the file.
   VERSION The version of the package.
   versionlog-MB_MySQL.txt.txt Version log.

 


Installation

Requirements

Modifications from the Original Database

Globals

This describes the what each variable is for in Globals.pl

Variable Default Description
 
MySQL Options
$g_prog 'mysql' The command to run MySQL.
$g_use_login 1 Use a user and pass for MySQL.
$g_db_name 'musicbrainz_db' The database name.
$g_db_user 'root' User for MySQL. You may need to change this if you are not the administrator or sharing MySQL
$g_db_pass   Password for MySQL
     
Replication Options
$g_rep_chkevery 15 This is how many minutes the Update script will wait before checking for the next replication
     
Schema Options
$g_schema_url url When a new database schema is released from MusicBrainz, this is where it will look for the update script.
     
Errors
$g_die_on_dupid 0 Die if a replication entry returns a duplicate ID error.
$g_die_on_error 1 If this is set to 1, the Update.pl script will die if a SQL error occurs. Set to 0 if you want it to ignore any errors and proceed.
     
DownloadCover
$g_download_new_covers 0 If this is set to '1' it will download new covers from amazon as they become available through replications.
$g_cover_path path The path to where the covers are to be stored.
     
LiveStats
$g_livestats 1 Use LiveStats.pl during replications. NOTE: You must run 'LiveStats.pl -i -f' first time.
     
Misc (these variables do not need to be changed)

Setting up the database

It is a good idea to read Tuning MySQL before setting up the database.

Once you have checked that you have all the requirements you are ready to proceed to creating the database. To do this you will first need to install the MusicBrainzMySQL package. Download the latest version (see requirements), then extract the contents into a local directory, the easiest place is the users home directory. After extraction you should have a directory called MB_MySQL, this is where all the scripts, SQL, and replication dumps will go. For more information, see File Description

Now you must configure the Globals.pl. See Globals

Next, to create the MySQL database ($g_db_name). Change directory to inside MB_MySQL

> cd MB_MySQL
MB_MySQL> perl Init.pl Choose option: 0 Continue (y/n): y
MB_MySQL> perl Init.pl Choose option: 1
Continue (y/n): y

It will very quickly create the almost 100 tables in the database.

Now you have to download the latest full export from ftp://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/. Select the latest folder and get mbdump.tar.bz2, mbdump-derived.tar.bz2 and mbdump-stats.tar.bz2. If want you can grab the others later, but it's other stuff like old edits that are not needed to get up and running.

This is quite large but only needs to be done once. When it has finished downloading, open the archive. Inside there will be a folder called mbdump, inside that will be the data for each table where the name of the file is the name of the table. Uncompress these files into MB_MySQL/mbdump it is important that the files go into the correct directory and the file names remain unchanged.

Run the Init.pl script again, except this time select the second option,

 MB_MySQL> perl Init.pl
 Choose option: 2
 Continue (y/n): y

This is fairly quick and takes about 15 mins. Now to index the tables:

 MB_MySQL> perl Init.pl
 Choose option: 3
 Continue (y/n): y

This may take a very long time, so you may need to leave it running all night or more.

 


Extra Scripts

DownloadCover.pl

This script can download cover art from amazon.com. Use the $g_download_new_covers option in Globals.pl to turn on/off.

Option Descripion
-h or --help Show help.
-i=x or --id=x Download the cover for the album with the album.id=x.
-s=x or --stopat=x Stop when album.id=x.

LiveStats.pl

One of the well known issues with InnoDB tables is that it does not keep a record of how many rows there are in a table at any given time because of version issues. Therefore if you want to know how many records are in a table InnoDB will have to count the records one by one everytime you request a count(*), making it very slow.

LiveStats.pl is a simple add-on for the replications that gives you exact statistics on tables in realtime (that is; the moment a record is inserted or deleted the LiveStats value changes). To use LiveStats.pl you must first run 'LiveStats.pl -i -f' once AFTER the database has been fully loaded.

Option Descripion
-a or --action The action to perform.
"add" Add the number onto the current number.
"subtract" Add the number from the current number.
"set" Sets the value.
-c or --count Refresh count.* values.
-h or --help Help.
-i or --init Initialise LiveStats. You must use -r to refresh the stats.
-r or --refresh Completely reload all the statistics.
-s or --silent Don't report any messages.

 

 


Replication

How replication works

Every hour MusicBrainz creates a compressed archive with all the changes to the database for that hour. Each archive is named replication-x.tar.bz2 where x is the number for that replication, that increments by 1 each hour. In the archive is 2 important files, they are the data that need to be loaded into the Pending and PendingData tables. The Pending table contains the transactions that need to occur and the PendingData table contains to data for the new changes to the database.

When the Update.pl script is run, it will try to download the next replication from the MusicBrainz site, if it exists it will download and uncompress the files, load the Pending and PendingData tables, then run each transaction deleting the transaction when the modification has been to the database.

If something goes wrong while the Update.pl script is running it is fine because when it is run again it will see that there is pending transactions. And will pick up where it left off.

Running replications

To update the database with the latest data from the MusicBrainz database, all you need to do is run the Update.pl script. There are other options for the Update.pl script (see Update options)

 MB_MySQL> perl Update.pl

It will automatically download the latest replication archive (if it's available), uncompress it and update the whole MySQL database.

Update options

These are the flags that be appended to the Update.pl script. Examples

 Update.pl -r 
This is handy for leaving it running in the background, and just leaving it to run the replications automatically.
 Update.pl --quiet -p 
Only run the pending transactions, without printing what is being done (quiet).
 Update.pl -p -n
Only run the pending transactions and display the progress each time a new XID is being processed.
 Update.pl -c
Update the database to the current replication then exit

Option Descripion
-c or --onlytocurrent Keep updating as many replications as possible until no more replications can be found on the server, then quit
-g=x or --skiptorep=x Change the replication number to x.
-h or --help Show help.
-i or --info Only shows the information about the current replication and pending transactions
-n or --nonverbose Only print the progress every time a new XID is started. This is to minimise the printing to the console
-p or --onlypending Only process pending transactions then quit
-q or --quiet Non-verbose. The status each statement is not printed
-r or --keeprunning Keep the script running contantly, automatically checking for new replications every 30 minutes
-s or --showall This will show what type of statement (INSERT/UPDATE/DELETE) is being run and how long it takes for the statement to process.
-t or --truncate Force TRUNCATE on Pending and PendingData tables.

 


Database

Tuning MySQL

There are a few tuning options to make sure your MusicBrainz MySQL database runs as fast as possible. The main focus is to make InnoDB run as efficiently as possible to keep the tables fast for searching. This can be done through configuring InnoDB. This can be done through the MySQL Administrator or via the my.cnf file.

MySQL Administrator
  1. Login. If MySQL is running on this machine you can use the default values
    Server Hostname: localhost
    Port: 3306

  2. Select Startup Parameters from the side bar. And change to the InnoDB Parameters tab:
    You can active each option by clicking on the notepad icon next to the item.

    1. Buffer Pool Size: This is how much RAM will be allocated for InnoDB. Making this larger will decrease the number of disk read/writes and therefore improve the speed of the tables.  It is not uncommon to set this to 256MB on a machine that has 512MB of RAM.
    2. Add. mem Pool Size: This is to hold the table definitions so that InnoDB can read and write to the tables without having to read the table definintions each time it does so. The recomended value is 16MB for musicbrainz_db. A larger value won't give anymore performance because all the table definitions can fit in the 16MB

Engines

An engine is how MySQL stores the data. Different engines have different strong points. All tables use the InnoDB engine, this is because INSERT/UPDATE/DELETE statements are really fast, much faster than MyISAM tables for UPDATEs and DELETEs because of indexing.

Table descriptions

Pending
PendingData

Pending (MyISAM)
Holds pending transactions, connects with PendingData
 
SeqId Sequence ID. Every change to the database has a unique transaction ID
TableName The name of the table that the transaction belongs to
Op The operation to be performed. i INSERT; u UPDATE; d DELETE
XID There is 1 or more SeqIds connected to an XID
 

 
PendingData (MyISAM)
Holds the data for the Pending table
   
SeqId Sequence ID
IsKey If that record is a key
Data Data for the SQL statement