ML Bioinformatics Scientist — Deep Learning for Genomics
PhD Computational Biology, Carnegie Mellon (2020)
Technical Stack
Domain Expertise
Communication Verified
Passed mandatory 15-min technical explanation interview. Candidate can articulate code logic clearly.
Summary
5 years at the intersection of deep learning and genomics. Trained sequence-to-function transformer models predicting tissue-specific gene expression from DNA sequence (Enformer/Borzoi-class). Built production MLOps infrastructure at Insitro for 12 research scientists. Strong software engineering instincts — treats ML pipelines as software products with versioning, testing, and monitoring.
Experience
ML Scientist II — Insitro (2022–Present)
- Fine-tuned Borzoi-class sequence model on 400 GB of in-house functional genomics data (ATAC-seq, RNA-seq, Hi-C); model now standard for variant effect prediction across 3 disease programmes.
- Built end-to-end MLOps platform on AWS SageMaker + MLflow: automated training, evaluation, and model registration for 12 research scientists.
- Implemented JAX-based in-silico mutagenesis pipeline; identified 3 causal noncoding variants validated experimentally.
Graduate Researcher — Pfenning Lab, CMU (2015–2020)
- Developed HALGAN, a conditional GAN for generating cell-type-specific regulatory sequences; code open-sourced with 500+ GitHub stars.
- Benchmarked 8 sequence-to-function architectures across 218 ENCODE cell types; results published in Bioinformatics (2020).
Selected Publications
- Park Y. et al. “Cell-type-aware regulatory sequence generation with conditional adversarial networks.” Genome Research, 2021.
- Park Y. & Pfenning AR. “Benchmarking deep learning models for regulatory genomics across 218 ENCODE cell types.” Bioinformatics, 2020.
Code Quality Notes
PyTorch and JAX modules ship with typed interfaces (mypy strict), property-based tests via hypothesis, and deterministic seed fixtures. Model training configs managed via Hydra; experiments fully reproducible from a single YAML. GPU-optimised dataloaders benchmarked with torch.profiler before merge.
Interested in this Candidate?
Reference ID: #023
To protect candidate privacy, all introductions are facilitated by Biointal. Requesting an introduction carries no commitment.