About

I develop foundation models for transcription regulation and apply them to understand cancer biology. My research combines deep learning, genomics, and epigenomics to predict gene expression across human cell types and identify disease-associated regulatory mechanisms.

My PhD thesis introduced GET (General Expression Transformer), a foundation model achieving experimental-level accuracy in predicting gene expression across 213 human fetal and adult cell types. This work enables discovery of distal regulatory regions and transcription factor interactions linked to disease risk.

Current Position

December 2025 - Present
Irving Cancer Early Scholar / Associate Research Scientist
Herbert Irving Comprehensive Cancer Center, Columbia University

Education

September 2020 - November 2025
PhD in Biomedical Informatics
Columbia University
Advised by Dr. Raul Rabadan
Thesis: A foundation model of transcription regulation and application to cancer
Defense: November 12, 2025
September 2016 - January 2019
MPhil in Computer Science and Engineering
The Chinese University of Hong Kong
Thesis: Systematic Identification and Prioritization of Noncoding Variants in Hirschsprung's Disease
September 2012 - August 2016
BSc in Cell and Molecular Biology
The Chinese University of Hong Kong

Selected Publications

A foundation model of transcription across human cell types
Fu, X., et al. (2025)
Nature
Whole-genome analysis of noncoding genetic variations identifies multigranular regulatory element perturbations associated with Hirschsprung disease
Fu, X., Lui, K.N.C., Tang, C.S.M., et al. (2020)
Genome Research

Working Papers

Computational structure prediction and analysis of cancer hotspot mutations
Fu, X., et al. (2022)
Preprint

Other Publications

Illuminating the noncoding genome in cancer using artificial intelligence
Alvarez-Torres, M., Fu, X., Rabadan, R. (2025)
Cancer Research
Understanding variants of unknown significance: The computational frontier
Fu, X., Rabadan, R. (2024)
The Oncologist
Smoother: A unified and modular framework for incorporating structural dependency in spatial omics data
Su, J., Reynier, J.B., Fu, X., et al. (2023)
Genome Biology
Genome-wide association analyses identified novel susceptibility loci for pulmonary embolism among Han Chinese population
Zhang, Z., Li, H., ..., Fu, X., et al. (2023)
BMC Medicine
Hypomorphic and dominant-negative impact of truncated SOX9 dysregulates Hedgehog-Wnt signaling, causing campomelia
Au, T.Y.K., Yip, R.K.H., ..., Fu, X., et al. (2023)
PNAS
Multi-modal self-supervised pre-training for large-scale genome data
Mo, S., Fu, X., Hong, C., et al. (2021)
NeurIPS AI for Science Workshop
A unified framework for integrative study of heterogeneous gene regulatory mechanisms
Cao, Q., Zhang, Z., Fu, X., et al. (2020)
Nature Machine Intelligence
Identification of genes associated with Hirschsprung disease, based on whole-genome sequence analysis
Tang, C.S.M., Li, P., ..., Fu, X., et al. (2018)
Gastroenterology
Dual roles of an Arabidopsis ESCRT component FREE1 in regulating vacuolar protein transport and autophagic degradation
Gao, C., Zhuang, X., Cui, Y., Fu, X., et al. (2015)
PNAS

Invited Talks

Nov 2025 Columbia University: AI at VP&S Workshop - Foundation Models Across Scales
Jul 2025 Google Genomics
Jun 2025 CCEN / University of Chicago
Mar 2025 New York University
Feb 2025 Stanford University
Feb 2025 Genentech
Feb 2025 Tsinghua University
Jan 2025 EMBL Heidelberg
Jan 2025 Sanford Burnham Prebys Medical Discovery Institute
Nov 2024 CCEN / Spanish National Cancer Research Centre
May 2024 CCEN / University of Rome
Dec 2023 CCEN / Universitat Politècnica de Catalunya
Oct 2023 CCEN / Keio University

Honors & Awards

2021
Champion, DeeCamp Bootcamp 2021
Foundation model and deep learning bootcamp/hackathon hosted by Sinovation Ventures (Kai-Fu Lee)

Mentoring

  • Mentored two PhD students and one research assistant on research projects derived from thesis work (one mentee is now a PhD student at Stanford University)
  • Tutored collaborators from biological and medical backgrounds on AI/ML techniques, including pretraining and finetuning of biological foundation models

Teaching & Academic Service

  • Ad hoc writer for grant proposals, including R01 grant applications
  • Ad hoc/Co-reviewer for papers in Science
  • Teaching assistant for 'Introduction to Database Systems' - Received departmental award for excellence in teaching assistance

Technical Skills

Since 2025
Claude Code, Cursor
Claude Code Usage (Jan 9-17, 2026)
Date   Input  Output  Cache Read  Cost
01-09  12K    210K    335M        $180
01-12  34K    171K    336M        $183
01-13  13K    122K    230M        $135
01-14  38K    22K     74M         $71
01-16  94K    18K     101M        $98
01-17  87K    7K      93M         $74
Total  339K   570K    1.2B        $775
Models: claude-opus-4-5, claude-haiku-4-5
Pre-2025
PyTorch, Hydra, PyTorch Lightning, Hugging Face, scikit-learn, statsmodels, seaborn, tidyverse, ggplot2, Nextflow, Bash, Linux