About
I develop foundation models for transcription regulation and apply them to understand cancer biology. My research combines deep learning, genomics, and epigenomics to predict gene expression across human cell types and identify disease-associated regulatory mechanisms.
My PhD thesis introduced GET (General Expression Transformer), a foundation model achieving experimental-level accuracy in predicting gene expression across 213 human fetal and adult cell types. This work enables discovery of distal regulatory regions and transcription factor interactions linked to disease risk.
Current Position
December 2025 - Present
Irving Cancer Early Scholar / Associate Research Scientist
Herbert Irving Comprehensive Cancer Center, Columbia University
Education
September 2020 - November 2025
PhD in Biomedical Informatics
Columbia University
Advised by Dr. Raul Rabadan
Thesis: A foundation model of transcription regulation and application to cancer
Defense: November 12, 2025
September 2016 - January 2019
MPhil in Computer Science and Engineering
The Chinese University of Hong Kong
Thesis: Systematic Identification and Prioritization of Noncoding Variants in Hirschsprung's Disease
September 2012 - August 2016
BSc in Cell and Molecular Biology
The Chinese University of Hong Kong
Selected Publications
A foundation model of transcription across human cell types
Nature
Whole-genome analysis of noncoding genetic variations identifies multigranular regulatory element perturbations associated with Hirschsprung disease
Genome Research
Working Papers
Computational structure prediction and analysis of cancer hotspot mutations
Preprint
Other Publications
Illuminating the noncoding genome in cancer using artificial intelligence
Cancer Research
Understanding variants of unknown significance: The computational frontier
The Oncologist
Smoother: A unified and modular framework for incorporating structural dependency in spatial omics data
Genome Biology
Genome-wide association analyses identified novel susceptibility loci for pulmonary embolism among Han Chinese population
BMC Medicine
Hypomorphic and dominant-negative impact of truncated SOX9 dysregulates Hedgehog-Wnt signaling, causing campomelia
PNAS
Multi-modal self-supervised pre-training for large-scale genome data
NeurIPS AI for Science Workshop
A unified framework for integrative study of heterogeneous gene regulatory mechanisms
Nature Machine Intelligence
Identification of genes associated with Hirschsprung disease, based on whole-genome sequence analysis
Gastroenterology
Dual roles of an Arabidopsis ESCRT component FREE1 in regulating vacuolar protein transport and autophagic degradation
PNAS
Invited Talks
Nov 2025
Columbia University: AI at VP&S Workshop - Foundation Models Across Scales
Jul 2025
Google Genomics
Jun 2025
CCEN / University of Chicago
Mar 2025
New York University
Feb 2025
Stanford University
Feb 2025
Genentech
Feb 2025
Tsinghua University
Jan 2025
EMBL Heidelberg
Jan 2025
Sanford Burnham Prebys Medical Discovery Institute
Nov 2024
CCEN / Spanish National Cancer Research Centre
May 2024
CCEN / University of Rome
Dec 2023
CCEN / Universitat Politecnica de Catalunya
Oct 2023
CCEN / Keio University
Honors & Awards
2021
Champion, DeeCamp Bootcamp 2021
Foundation model and deep learning bootcamp/hackathon hosted by Sinovation Ventures (Kai-Fu Lee)
Mentoring
- Mentored two PhD students and one research assistant on research projects derived from thesis work (one mentee is now a PhD student at Stanford University)
- Tutored collaborators from biological and medical backgrounds on AI/ML techniques, including pretraining and finetuning of biological foundation models
Teaching & Academic Service
- Ad hoc writer for grant proposals, including R01 grant applications
- Ad hoc reviewer and co-reviewer for papers in Science
- Teaching assistant for 'Introduction to Database Systems'; received departmental award for excellence in teaching assistance
Technical Skills
Since 2025
Claude Code Usage (Jan 9-17, 2026)
| Date | Input | Output | Cache Read | Cost |
|---|---|---|---|---|
| 01-09 | 12K | 210K | 335M | $180 |
| 01-12 | 34K | 171K | 336M | $183 |
| 01-13 | 13K | 122K | 230M | $135 |
| 01-14 | 38K | 22K | 74M | $71 |
| 01-16 | 94K | 18K | 101M | $98 |
| 01-17 | 87K | 7K | 93M | $74 |
| Total | 339K | 570K | 1.2B | $775 |
Models: claude-opus-4-5, claude-haiku-4-5
Pre-2025