In some occasions, I need to access the older version of Ensembl human transcripts. For example, the mutation calls generated by the NCI’s Genomic Data Common pipeline are annotated by Ensembl v84. To programmatically query the Ensembl annotations, I use the EnsDb SQLite database created by ensembldb, which is a R package I enjoy using (see my previous post for its usage).
The EnsDbs of the recent versions of Ensembl (v87+) are available on AnnotationHub. However, the older versions are not available, and they don’t get updated when ensembldb introduces a new feature. For example, now newer EnsDbs include the transcript and gene ID version (github issue).
In my case, I need to build a EnsDb of Ensembl v84. The ensembldb’s documentation describes how to build one from the public Ensembl MySQL server. However, this method will take more than a day to complete. I started to look for other methods. After some trial and error, I managed to create my EnsDb fast by connecting to a local Ensembl database that I built. Surprisingly the setup wasn’t difficult at all, and it only took about an hour to build the EnsDb.
Here are my notes of how to create the EnsDB from a local Ensembl MySQL database. I use macOS but the steps can be easily modified to work on other OSes.
To create a EnsDB from a Ensembl MySQL database, we need to the Ensembl Perl APIs. And the easiest setup is by a Ensembl virtual machine. We just need to import the VM image using VirtualBox and install the ensembldb R package inside the VM, then it is ready to build the EnsDb. I recommend the VM to have more memory than the default 1GB since a larger memory helps build the R packages and EnsDb.
Build a local Ensembl v84 MySQL database
Ensembl provides the MySQL database dump to allow easy import of their data of any version. Assuming the working directory is
~/Documents/Ensembl_MySQL_mirror/, we first copy the database dump by:
cd ~/Documents/Ensembl_MySQL_mirror # Download the db dump rsync -a rsync://ftp.ensembl.org/ensembl/pub/release-84/mysql/homo_sapiens_core_84_38 . # MySQL doesn't accept compressed db dump files so we decompress them gunzip *.txt.gz
While downloading the data, we also need to install the MySQL server. I install the same or similar version of MySQL Ensembl is currently using, which is 5.6 at the time of writing. On macOS, Homebrew can specify the version of MySQL to be installed:
brew install email@example.com # And launch the MySQL server /firstname.lastname@example.org/bin/mysql.server start
First we create a database whose name matches the Ensembl version (v84):
CREATE DATABASE homo_sapiens_core_84_38;
Then we load the table schema and Ensembl data:
/email@example.com/bin/mysql -u root \ homo_sapiens_core_84_38 < homo_sapiens_core_84_38.sql /firstname.lastname@example.org/bin/mysqlimport \ -u root \ --fields-terminated-by='\t' --fields_escaped_by=\\ \ homo_sapiens_core_84_38 -L *.txt
Finally, we modify the MySQL config at
/usr/local/etc/my.cnf to accept remote database connection, so our VM can access the database on the host machine. I don’t use MySQL for anything else, so I simply let MySQL binds to all the possible IP addresses my machine has:
[mysqld] bind-address = *
Note that this is not a secure configuration. To be secure, there should be a designated MySQL user with limited permission and a stricter connection setting. Restart MySQL to load the new config:
Write down an (local) IP address of our host machine.
Build EnsDB from the local MySQL database
Now we can come back to the vm and build the EnsDb v84. Run the following R script to build the EnsDb:
library(ensembldb) fetchTablesFromEnsembl( 84, species = "human", user = 'root', host = '<our host IP>', port = 3306 ) DBFile <- makeEnsemblSQLiteFromTables()
The EnsDb SQLite database will be availabe under the working directory. We can test the new EnsDb by:
edb <- EnsDb(DBFile)
If there is no other need of MySQL, we can uninstall it and remove all its databases by:
brew remove email@example.com rm -rf /usr/local/var/mysql