Use DuckDB in ensembldb to query Ensembl's genome annotations

I have been using ensembldb to query genome annotations locally; it stores the Ensembl annotations in an offline SQLite database. Replacing the database engine with DuckDB makes genome-wide queries faster, with only a small impact on gene-specific queries (depending on the usage). The DuckDB database file is also smaller, and it can be made smaller still by offloading the tables to external Parquet files.

Build EnsDb from a local Ensembl MySQL database

On some occasions, I need to access an older version of the Ensembl human transcripts. For example, the mutation calls generated by the NCI’s Genomic Data Commons pipeline are annotated with Ensembl v84. To programmatically query the Ensembl annotations, I use the EnsDb SQLite database created by ensembldb, which is …

Access gene annotation using gffutils

Recently, I had to access gene annotations in multiple versions from multiple sources such as Ensembl, GENCODE, and UCSC. I used to rely on the R/Bioconductor ecosystem to query the coordinates of a gene annotation. There are existing Bioconductor packages ready for Ensembl and UCSC annotations (more info in …

Ad hoc bioinformatic analysis in database

Recently I’ve found that bioinformatic analysis in a database is not hard at all, and that setting up the database isn’t as daunting as it sounds, especially when the data are tabular. I used to start my analysis by loading everything into R or Python, and then figuring out …
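As a rough illustration of what in-database analysis of tabular data can look like, here is a minimal sketch using Python's built-in sqlite3 module. The `mutation` table, its columns, and the example rows are hypothetical and are not taken from the post; the aggregation is pushed into SQL instead of being done after loading everything into memory.

```python
import sqlite3

# Hypothetical tabular data: (gene, sample, effect). Not from the original post.
rows = [
    ("TP53", "sample1", "missense"),
    ("TP53", "sample2", "nonsense"),
    ("KRAS", "sample1", "missense"),
]

# An in-memory database keeps the sketch self-contained; a file path works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mutation (gene TEXT, sample TEXT, effect TEXT)")
conn.executemany("INSERT INTO mutation VALUES (?, ?, ?)", rows)

# Count how many distinct samples carry a mutation in each gene.
counts = conn.execute(
    """
    SELECT gene, COUNT(DISTINCT sample) AS n_samples
    FROM mutation
    GROUP BY gene
    ORDER BY n_samples DESC, gene
    """
).fetchall()

for gene, n_samples in counts:
    print(gene, n_samples)
```

The same query would work unchanged on a much larger table stored on disk, which is where letting the database do the grouping starts to pay off.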

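Build a lottery website with Django and SQLite

I reimplemented the lottery website I had previously built with Flask in Django, and took the opportunity to compare the design concepts of the two frameworks.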

Datetime in SQLite and Python

Notes on handling time zones in Python, and on how to store and retrieve time zone-aware datetimes in SQLite.
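One common way to keep time zone-aware datetimes in SQLite, which has no dedicated datetime type, is to normalize them to UTC and store them as ISO 8601 text. The sketch below assumes that convention; the `event` table, the helper names `to_utc_text` and `from_utc_text`, and the fixed UTC+8 offset are illustrative choices and not necessarily what the post does.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def to_utc_text(dt):
    """Convert an aware datetime to an ISO 8601 string in UTC."""
    if dt.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a tzinfo first")
    return dt.astimezone(timezone.utc).isoformat()


def from_utc_text(text, tz):
    """Parse an ISO 8601 string with an offset and convert it to the given time zone."""
    return datetime.fromisoformat(text).astimezone(tz)


# A fixed UTC+8 offset stands in for a real time zone database entry.
taipei = timezone(timedelta(hours=8))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (name TEXT, created_at TEXT)")

local_time = datetime(2016, 1, 1, 8, 30, tzinfo=taipei)
conn.execute("INSERT INTO event VALUES (?, ?)", ("draw", to_utc_text(local_time)))

# The stored value is plain text in UTC; convert back to local time on read.
(stored,) = conn.execute("SELECT created_at FROM event").fetchone()
restored = from_utc_text(stored, taipei)
print(stored, restored)
```

Storing everything in UTC keeps the text values comparable and sortable, and leaves the choice of display time zone to the reader of the data.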

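Build a lottery website with Flask and SQLite

Written for the undergraduate project students in our lab.

The real goal is Django + Django ORM + PostgreSQL, but taking on too much at once is counterproductive, and it is easier to get started with something simpler first. So here I cover …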