Use DuckDB in ensembldb to query Ensembl's genome annotations

I have been using ensembldb to query genome annotations locally; it stores the Ensembl annotations in an offline SQLite database. Replacing the database engine with DuckDB makes genome-wide queries faster, with a small impact on gene-specific queries (depending on the usage). The DuckDB database file is also smaller, and it can be made even smaller by offloading the tables to external Parquet files.

Access gene annotation using gffutils

Recently, I had to access multiple versions of gene annotations from multiple sources such as Ensembl, GENCODE, and UCSC. I used to rely on the R/Bioconductor ecosystem to query the coordinates of a gene annotation. There are existing Bioconductor packages ready for Ensembl and UCSC annotations (more info in …

Read UniProtKB in XML format

UniProt Knowledge Base (UniProtKB) provides various methods to access its data. I settled on the XML format because it requires no additional parsing code and is well defined by a schema. Plus, it turns out that databases such as PDB also provide their data export in …
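A minimal sketch of streaming a UniProtKB XML file with Python's standard-library `xml.etree.ElementTree`; this parser is an assumption here, not necessarily what the post uses. It relies on UniProt's XML namespace `http://uniprot.org/uniprot` and the `entry`, `accession`, `name`, `protein/recommendedName/fullName`, and `sequence` elements. The embedded sample record is a made-up placeholder, not real UniProt data.

```python
import io
import xml.etree.ElementTree as ET

# UniProtKB XML elements live in this namespace.
NS = {"up": "http://uniprot.org/uniprot"}
ENTRY_TAG = "{http://uniprot.org/uniprot}entry"

# Hand-written stand-in for a UniProtKB XML export (placeholder values, not real data).
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>Q00000</accession>
    <name>EXAMPLE_HUMAN</name>
    <protein>
      <recommendedName>
        <fullName>Example protein</fullName>
      </recommendedName>
    </protein>
    <sequence length="5">MKTAY</sequence>
  </entry>
</uniprot>
"""


def iter_entries(source):
    """Stream (accession, name, full_name, sequence) tuples, one per <entry>.

    `source` is a filename or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != ENTRY_TAG:
            continue
        accession = elem.findtext("up:accession", namespaces=NS)
        name = elem.findtext("up:name", namespaces=NS)
        full_name = elem.findtext(
            "up:protein/up:recommendedName/up:fullName", namespaces=NS
        )
        sequence = (elem.findtext("up:sequence", namespaces=NS) or "").strip()
        yield accession, name, full_name, sequence
        # Drop the finished entry's children and text to keep memory bounded.
        elem.clear()


if __name__ == "__main__":
    for row in iter_entries(io.BytesIO(SAMPLE_XML.encode("utf-8"))):
        print(row)
```

Using `iterparse` with `elem.clear()` avoids building the whole tree in memory, which matters for a full UniProtKB download; for a gzipped export you could pass `gzip.open(path, "rb")` as `source`.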

Ad hoc bioinformatic analysis in database

Recently I’ve found that bioinformatic analysis in a database is not hard at all, and setting up the database isn’t as daunting as it sounds, especially when the data are tabular. I used to start my analysis by loading everything into R or Python, and then figuring out …
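As a toy illustration of the idea, the sketch below loads a small tab-separated table into an in-memory SQLite database with Python's built-in `sqlite3` module and answers a question in SQL rather than in R or Python data structures. SQLite, the `counts` table, and its columns are assumptions for illustration (the post may use a different database), and the values are made up.

```python
import csv
import io
import sqlite3

# Toy tab-separated table (made-up values, for illustration only).
TSV = """gene\tsample\tcount
GENE_A\ts1\t10
GENE_A\ts2\t30
GENE_B\ts1\t5
GENE_B\ts2\t7
"""


def build_db(text):
    """Load the TSV text into an in-memory SQLite table named `counts`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (gene TEXT, sample TEXT, count INTEGER)")
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    conn.executemany(
        "INSERT INTO counts VALUES (:gene, :sample, :count)",
        # Convert the count column from text to int before inserting.
        ({**row, "count": int(row["count"])} for row in reader),
    )
    conn.commit()
    return conn


def mean_count_per_gene(conn):
    """Aggregate in SQL: average count per gene, sorted by gene name."""
    return conn.execute(
        "SELECT gene, AVG(count) FROM counts GROUP BY gene ORDER BY gene"
    ).fetchall()


if __name__ == "__main__":
    print(mean_count_per_gene(build_db(TSV)))
```

The same pattern scales to larger tables: the data are loaded once, and each new question becomes another SQL query instead of another round of in-memory reshaping.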

Using EnsDb's annotation database in Python

How to find and download an EnsDb, the SQLite database of Ensembl genome annotations built by the R package ensembldb, and use it in a Python application.
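A minimal sketch of querying an EnsDb file from Python with the standard-library `sqlite3` module. It assumes the EnsDb `gene` table has the columns `gene_id`, `gene_name`, `seq_name`, `gene_seq_start`, `gene_seq_end`, and `seq_strand`; check this against your file (for example with `.schema gene` in the `sqlite3` shell) before relying on it. To keep the example self-contained, the hypothetical helper `make_demo_db` builds a tiny stand-in table with a fictional gene instead of opening a real EnsDb.

```python
import sqlite3


def gene_coordinates(conn, gene_name):
    """Return (gene_id, seq_name, start, end, strand) rows for a gene symbol.

    Assumes an EnsDb-style `gene` table (column names are an assumption).
    """
    sql = """
        SELECT gene_id, seq_name, gene_seq_start, gene_seq_end, seq_strand
        FROM gene
        WHERE gene_name = ?
    """
    return conn.execute(sql, (gene_name,)).fetchall()


def make_demo_db():
    """Hypothetical stand-in for an EnsDb file: one fictional gene row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE gene (
               gene_id TEXT, gene_name TEXT, seq_name TEXT,
               gene_seq_start INTEGER, gene_seq_end INTEGER, seq_strand INTEGER)"""
    )
    conn.execute("INSERT INTO gene VALUES ('ENSG_DEMO', 'DEMO1', '1', 1000, 2000, 1)")
    return conn


if __name__ == "__main__":
    print(gene_coordinates(make_demo_db(), "DEMO1"))
```

With a downloaded EnsDb you would instead open the file read-only, e.g. `sqlite3.connect("file:path/to/EnsDb.sqlite?mode=ro", uri=True)` (placeholder path), and pass that connection to `gene_coordinates`.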

Use Snakemake on Google Cloud

TL;DR Run an RNA-seq pipeline using Snakemake locally and later port it to Google Cloud. Snakemake can parallelize a pipeline's jobs, even across machines.

Snakemake has been my favorite workflow management system for a while. I came across it while writing my master's thesis and from the …

Deploying Django with a conda env

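Just a few days ago I deployed Django and documented it in "Deploying Django with uWSGI, nginx, and systemd". Today I deployed another project. The deployment setup is the same as last time: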

nginx -- unix socket -- uWSGI -- Django

As before, I wrote …

Deploying Django with uWSGI, nginx, and systemd

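My last serious Django deployment is documented in the post "Setting Up a Server that Auto-Updates the Chinese Translation of the Official Python Docs". As it happens, my graduation project also needs a Django website, so …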

Setting Up a Server that Auto-Updates the Chinese Translation of the Official Python Docs

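Set up a server that automatically updates the Chinese translation of the Python documentation and hosts the translated docs as a website. It uses Django for the web application and Django-Q as the task queue, with nginx and uWSGI as the deploy stack, hosted on Amazon EC2 (Debian Jessie), PostgreSQL as the database, and systemd to manage the related processes.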

Beginner's Guide to Coding, Appendix - Bioinfo Practices using Python

A walkthrough of the practice problems created by the Rosalind team.