#023 OPT / H-1B eligible

ML Bioinformatics Scientist — Deep Learning for Genomics

PhD Computational Biology, Carnegie Mellon (2020)

GitHub Audited by Biointal MVC Compliant
GitHub profiles are anonymized to protect candidates until an introduction is made. Request an interview below to receive full code samples and contact details.
Experience: 5+ Years
Availability: Immediate
Degree: PhD Computational Biology, Carnegie Mellon (2020)
Visa Status: OPT / H-1B eligible

Technical Stack

Python, PyTorch, JAX, Hugging Face, Enformer, Borzoi, scVI, Lightning, CUDA, AWS SageMaker, MLflow, WandB, Polars

Domain Expertise

Deep Learning, Regulatory Sequence Modelling, Gene Expression Prediction, Foundation Models, Single-Cell, MLOps

Communication Verified

Passed mandatory 15-min technical explanation interview. Candidate can articulate code logic clearly.

Summary

Five years at the intersection of deep learning and genomics. Trained sequence-to-function transformer models (Enformer/Borzoi-class) that predict tissue-specific gene expression from DNA sequence. Built production MLOps infrastructure at Insitro serving 12 research scientists. Strong software engineering instincts: treats ML pipelines as software products with versioning, testing, and monitoring.

Experience

ML Scientist II — Insitro (2022–Present)

  • Fine-tuned Borzoi-class sequence model on 400 GB of in-house functional genomics data (ATAC-seq, RNA-seq, Hi-C); model now standard for variant effect prediction across 3 disease programmes.
  • Built end-to-end MLOps platform on AWS SageMaker + MLflow: automated training, evaluation, and model registration for 12 research scientists.
  • Implemented JAX-based in-silico mutagenesis pipeline; identified 3 causal noncoding variants validated experimentally.
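
For readers unfamiliar with in-silico mutagenesis, the idea behind the pipeline mentioned above can be sketched in a few lines. This is an illustrative, standard-library-only sketch and not the candidate's code: `in_silico_mutagenesis` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a trained sequence model (it simply counts a "TATA" motif).

```python
from typing import Callable, Dict, Tuple

ALPHABET = "ACGT"


def in_silico_mutagenesis(
    seq: str,
    score_fn: Callable[[str], float],
) -> Dict[Tuple[int, str], float]:
    """Score every single-base substitution relative to the reference.

    Returns a mapping (position, alternate base) -> score(mutant) - score(reference).
    """
    seq = seq.upper()
    ref_score = score_fn(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in ALPHABET:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score_fn(mutant) - ref_score
    return effects


def toy_score(seq: str) -> float:
    # Hypothetical stand-in for a trained model: counts "TATA" occurrences.
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


effects = in_silico_mutagenesis("GGTATAGG", toy_score)
# Substitutions that lower the score fall inside the motif (positions 2 to 5).
disruptive = sorted(key for key, delta in effects.items() if delta < 0)
```

In a real pipeline the scoring function would be a batched model forward pass, and the per-position effects would be compared against known variants; both of those steps are outside this sketch.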

Graduate Researcher — Pfenning Lab, CMU (2015–2020)

  • Developed HALGAN, a conditional GAN for generating cell-type-specific regulatory sequences; code open-sourced with 500+ GitHub stars.
  • Benchmarked 8 sequence-to-function architectures across 218 ENCODE cell types; results published in Bioinformatics (2020).

Selected Publications

  • Park Y. et al. “Cell-type-aware regulatory sequence generation with conditional adversarial networks.” Genome Research, 2021.
  • Park Y. & Pfenning AR. “Benchmarking deep learning models for regulatory genomics across 218 ENCODE cell types.” Bioinformatics, 2020.

Code Quality Notes

PyTorch and JAX modules ship with typed interfaces (mypy strict), property-based tests via hypothesis, and deterministic seed fixtures. Model training configs managed via Hydra; experiments fully reproducible from a single YAML. GPU-optimised dataloaders benchmarked with torch.profiler before merge.
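
As a rough illustration of what a deterministic seed fixture does, the sketch below pins and then restores the state of Python's built-in `random` module. It is not the candidate's code: `fixed_seed` and `sample_batch_indices` are hypothetical names, and a PyTorch or JAX project would additionally need to seed those libraries (for example via `torch.manual_seed` or an explicit `jax.random.PRNGKey`).

```python
import random
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def fixed_seed(seed: int) -> Iterator[None]:
    """Seed the global RNG inside the block, then restore the previous state."""
    saved_state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved_state)


def sample_batch_indices(n_items: int, batch_size: int) -> List[int]:
    # Hypothetical helper: draws a batch of distinct indices from the global RNG.
    return random.sample(range(n_items), batch_size)


with fixed_seed(1234):
    first = sample_batch_indices(100, 8)
with fixed_seed(1234):
    second = sample_batch_indices(100, 8)

# Same seed, same draw: the two batches are identical.
reproducible = first == second
```

Restoring the prior state in the `finally` clause keeps one test's seed from leaking into the next, which is the usual reason to wrap seeding in a fixture rather than calling `random.seed` directly.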

Interested in this Candidate?

Reference ID: #023

To protect candidate privacy, all introductions are facilitated by Biointal. Click below to request an introduction; no commitment is required.

Request Interview