Following system color scheme Selected dark color scheme Selected light color scheme

Liang-Bo Wang's Blog

About | Talks | Archives |

Build EnsDb from a local Ensembl MySQL database

In some occasions, I need to access the older version of Ensembl human transcripts. For example, the mutation calls generated by the NCI’s Genomic Data Common pipeline are annotated by Ensembl v84. To programmatically query the Ensembl annotations, I use the EnsDb SQLite database created by ensembldb, which is a R package I enjoy using (see my previous post for its usage).

The EnsDbs of the recent versions of Ensembl (v87+) are available on AnnotationHub. However, the older versions are not available, and they don’t get updated when ensembldb introduces a new feature. For example, now newer EnsDbs include the transcript and gene ID version (github issue).

In my case, I need to build a EnsDb of Ensembl v84. The ensembldb’s documentation describes how to build one from the public Ensembl MySQL server. However, this method will take more than a day to complete. I started to look for other methods. After some trial and error, I managed to create my EnsDb fast by connecting to a local Ensembl database that I built. Surprisingly the setup wasn’t difficult at all, and it only took about an hour to build the EnsDb.

Here are my notes of how to create the EnsDB from a local Ensembl MySQL database. I use macOS but the steps can be easily modified to work on other OSes.

Ensembl VM

To create a EnsDB from a Ensembl MySQL database, we need to the Ensembl Perl APIs. And the easiest setup is by a Ensembl virtual machine. We just need to import the VM image using VirtualBox and install the ensembldb R package inside the VM, then it is ready to build the EnsDb. I recommend the VM to have more memory than the default 1GB since a larger memory helps build the R packages and EnsDb.

Build a local Ensembl v84 MySQL database

Ensembl provides the MySQL database dump to allow easy import of their data of any version. Assuming the working directory is ~/Documents/Ensembl_MySQL_mirror/, we first copy the database dump by:

cd ~/Documents/Ensembl_MySQL_mirror

# Download the db dump
rsync -a rsync://ftp.ensembl.org/ensembl/pub/release-84/mysql/homo_sapiens_core_84_38 .

# MySQL doesn't accept compressed db dump files so we decompress them
gunzip *.txt.gz

While downloading the data, we also need to install the MySQL server. I install the same or similar version of MySQL Ensembl is currently using, which is 5.6 at the time of writing. On macOS, Homebrew can specify the version of MySQL to be installed:

brew install mysql@5.6
# And launch the MySQL server
/usr/local/opt/mysql@5.6/bin/mysql.server start

First we create a database whose name matches the Ensembl version (v84):

CREATE DATABASE homo_sapiens_core_84_38;

Then we load the table schema and Ensembl data:

/usr/local/opt/mysql@5.6/bin/mysql -u root \
    homo_sapiens_core_84_38 < homo_sapiens_core_84_38.sql

/usr/local/opt/mysql@5.6/bin/mysqlimport \
    -u root \
    --fields-terminated-by='\t' --fields_escaped_by=\\  \
    homo_sapiens_core_84_38 -L *.txt

Finally, we modify the MySQL config at /usr/local/etc/my.cnf to accept remote database connection, so our VM can access the database on the host machine. I don’t use MySQL for anything else, so I simply let MySQL binds to all the possible IP addresses my machine has:

[mysqld]
bind-address = *

Note that this is not a secure configuration. To be secure, there should be a designated MySQL user with limited permission and a stricter connection setting. Restart MySQL to load the new config:

/usr/local/opt/mysql@5.6/bin/mysql.server restart

Write down an (local) IP address of our host machine.

Build EnsDB from the local MySQL database

Now we can come back to the vm and build the EnsDb v84. Run the following R script to build the EnsDb:

library(ensembldb)
fetchTablesFromEnsembl(
    84, species = "human",
    user = 'root', host = '<our host IP>', port = 3306
)
DBFile <- makeEnsemblSQLiteFromTables()

The EnsDb SQLite database will be availabe under the working directory. We can test the new EnsDb by:

edb <- EnsDb(DBFile)

Remove MySQL

If there is no other need of MySQL, we can uninstall it and remove all its databases by:

brew remove mysql@5.6
rm -rf /usr/local/var/mysql