Open to relevant roles globally

Agentic AI & ML Engineer building reasoning systems for science.

PhD with 8+ years shipping production ML and cloud infrastructure across clinical diagnostics and genomics. I build multi-agent systems that diagnose real pipeline failures, ML-powered workflows that process 6,000+ samples, and open-source tooling for agentic AI in biomedicine. Open to relevant opportunities globally — across industry, research, and startups.

8+ yrs: Production ML & bioinformatics
6,000+: Genomic samples processed in production
5 agents: Multi-agent ops system shipped (GenomicsOps AI)
40% ↓: Compute cost savings on production pipelines
// Selected work

Agents, pipelines, and research projects

Five projects that cover the range: open-source agentic AI for outbreak surveillance, agent-callable genomics tools, production multi-agent systems, self-optimising ML pipelines, and applied research. Each links to code, a demo, or a writeup where possible.

Open source · Agentic AI
LangGraph · Python · No API key

outbreak-agent -- Infectious Disease Triage Pipeline

4-node LangGraph state machine that triages infectious disease cases -- built around the April 2026 MV Hondius / Andes virus event, the first confirmed human-to-human hantavirus transmission on a cruise ship.

A self-correcting agentic pipeline: genomic_node identifies clade and mutations, linkage_node resolves contact clusters and transmission mode, risk_node scores 0-100 with tier (LOW / MEDIUM / HIGH / CRITICAL), and critic_node enforces four consistency rules -- looping back up to 3 times if flags are raised. Generates a 3-panel matplotlib risk dashboard (PNG) and a structured A4 PDF triage report automatically on every run. No API key. No cost. 33 tests, fully deterministic. Part of the same ecosystem as gwas_nf: gwas_bridge.py feeds REGENIE tophits directly into this triage pipeline for population-to-clinical interpretation.
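The critic loop described above can be sketched in plain Python, without LangGraph, so it runs with no dependencies. This is an illustration under stated assumptions, not the repository's code: the tier cut-offs, the toy scoring rule, and the single consistency rule are all hypothetical (the description names four rules but does not list them).

```python
from typing import Callable, Dict, List

State = Dict[str, object]

MAX_CRITIC_LOOPS = 3  # from the description: the critic loops back up to 3 times

# Assumed cut-offs: the description names the tiers but not their boundaries.
TIER_CUTOFFS = [(75, "CRITICAL"), (50, "HIGH"), (25, "MEDIUM"), (0, "LOW")]


def tier_for(score: int) -> str:
    """Map a 0-100 score to a tier using the assumed cut-offs."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return next(tier for cutoff, tier in TIER_CUTOFFS if score >= cutoff)


def risk_node(state: State) -> State:
    """Toy scorer (assumption): 50 base points plus 25 per mutation, capped at 100."""
    score = min(100, 50 + 25 * len(state["mutations"]))
    # On the first pass a stale caller-supplied tier is kept on purpose,
    # so the critic has an inconsistency to flag.
    stale = None if "critic_feedback" in state else state.get("tier")
    return {**state, "score": score, "tier": stale or tier_for(score)}


def critic_node(state: State) -> List[str]:
    """One illustrative consistency rule: the tier must agree with the score."""
    if state["tier"] != tier_for(state["score"]):
        return ["tier does not match score"]
    return []


def run_pipeline(
    state: State,
    nodes: List[Callable[[State], State]],
    critic: Callable[[State], List[str]],
    max_loops: int = MAX_CRITIC_LOOPS,
) -> State:
    """Run the nodes in order, then the critic; loop back while flags remain."""
    loops = 0
    while True:
        for node in nodes:
            state = node(state)
        flags = critic(state)
        if not flags or loops >= max_loops:
            return {**state, "flags": flags, "critic_loops": loops}
        loops += 1
        state = {**state, "critic_feedback": flags}


result = run_pipeline({"mutations": ["m1", "m2"], "tier": "LOW"}, [risk_node], critic_node)
print(result["score"], result["tier"], result["critic_loops"])  # 100 CRITICAL 1
```

Capping the loop means a critic that never clears its flags still terminates, and the unresolved flags stay in the returned state for the report.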

4 nodes: genomic / linkage / risk / critic
33 tests: 23 unit + 10 integration, free to run
<2 sec: CRITICAL triage, MV Hondius case
LangGraph 0.6 · LangChain · matplotlib · ReportLab · pytest · Apache 2.0
Open source · Agent skills
Python · Claude Haiku · REST APIs

genomics-skills — Agent-Callable Skill Library

8 pure-Python genomics skills that downstream agents can call: expression profiling, survival analysis, protein mapping, pathway enrichment, literature search, and more.

The downstream skill layer for agentic-genomics. Each skill is a standalone, agent-discoverable Python module with a SKILL.md contract, CLI entrypoint, and deterministic output (TSV + PNG/SVG). Pan-cancer expression uses real TCGA data (9,479 samples across 31 cancer types via cBioPortal). Kaplan-Meier survival runs Cox PH regression on actual patient data. LLM-powered routing via Claude Haiku maps natural-language queries to the right skill. Parquet caching makes repeat queries instant.
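For readers unfamiliar with the survival skill's core statistic, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator. It is not the library's implementation and does not cover the Cox PH regression step; the function name and the toy follow-up data are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data (hypothetical): five subjects, two of them censored.
for time, prob in kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]):
    print(f"t={time}: S(t)={prob:.2f}")
```

Censored subjects reduce the risk set without triggering a drop in the curve, which is why the subject censored at t=8 changes the denominator at t=12 but not the estimate at t=8.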

8 skills: Agent-callable, SKILL.md contract each
9,479: Real TCGA patient samples
MIT: Open source, Python 3.9+
Python · Claude Haiku · cBioPortal API · MyVariant.info · NCBI E-utils · PDB / AlphaFold · Pandas · Matplotlib

GenomicsOps AI

Five specialised agents that triage and resolve DRAGEN, ICA, and SGE/HPC pipeline failures end-to-end.

Personal project built on weekends: Trigger → Log Fetcher → RAG → Classifier → JIRA Writer. Tested on real failure scenarios — BED-file overlaps, samplesheet index mismatches, stuck SGE jobs — with end-to-end triage and ticket creation.
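The five-stage hand-off above can be sketched as a chain of plain functions passing a shared context. Everything here is a stand-in and an assumption: keyword rules replace the LLM classifier, a dictionary replaces the RAG store, the ticket is a dict instead of a JIRA API call, and the log lines are invented examples.

```python
from typing import Dict

Context = Dict[str, object]

# Hypothetical keyword rules standing in for the LLM classifier.
FAILURE_RULES = {
    "bed_overlap": ("bed", "overlap"),
    "samplesheet_index_mismatch": ("samplesheet", "index"),
    "stuck_sge_job": ("sge", "stuck"),
}

# Placeholder notes standing in for documents retrieved from a vector store.
RUNBOOK = {name: f"runbook note for {name}" for name in FAILURE_RULES}


def classify(log_text: str) -> str:
    """Return the first category whose keywords all appear in the log."""
    text = log_text.lower()
    for category, keywords in FAILURE_RULES.items():
        if all(word in text for word in keywords):
            return category
    return "unknown"


def trigger(ctx: Context) -> Context:
    return {**ctx, "status": "triggered"}


def log_fetcher(ctx: Context) -> Context:
    # A real fetcher would pull logs from the scheduler; here the log is passed in.
    tail = str(ctx["raw_log"]).splitlines()[-20:]
    return {**ctx, "log": "\n".join(tail)}


def rag(ctx: Context) -> Context:
    # Retrieve every note that shares at least one keyword with the log.
    text = str(ctx["log"]).lower()
    hits = [RUNBOOK[c] for c, words in FAILURE_RULES.items() if any(w in text for w in words)]
    return {**ctx, "retrieved": hits}


def classifier(ctx: Context) -> Context:
    return {**ctx, "category": classify(str(ctx["log"]))}


def jira_writer(ctx: Context) -> Context:
    # Builds the ticket payload only; no network call is made in this sketch.
    ticket = {
        "summary": f"[{ctx['category']}] pipeline failure",
        "description": f"{ctx['log']}\n\nContext: {ctx['retrieved']}",
    }
    return {**ctx, "ticket": ticket}


def run_chain(raw_log: str) -> Context:
    ctx: Context = {"raw_log": raw_log}
    for stage in (trigger, log_fetcher, rag, classifier, jira_writer):
        ctx = stage(ctx)
    return ctx


outcome = run_chain("run 42\nERROR: overlapping regions found in target BED file")
print(outcome["ticket"]["summary"])  # [bed_overlap] pipeline failure
```

Keeping each stage a pure function over the context makes it straightforward to replay a failed run through one stage at a time.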

5 specialised agents
Multi-agent · Claude API · RAG · Python · JIRA & Confluence APIs
Open source · Production · Cloud
Nextflow · REGENIE · AWS

gwas_nf — Multi-Ethnic GWAS Pipeline

REGENIE-based Nextflow GWAS pipeline for the TEMUS multi-ethnic cohort (4 groups, 10,000 samples, 100,000 variants, 13 phenotypes) -- with downstream agentic interpretation via GenomicsCopilot.

End-to-end population genomics pipeline: genotype QC and LD pruning, REGENIE whole-genome regression (Step 1 + Step 2), multi-ethnic stratified association, interactive Manhattan plots per phenotype per group, and automated HTML reports. Top hits feed into GenomicsCopilot via gwas_bridge.py: variant IDs extracted from .regenie.filtered.gz tophit files and passed to the 7-node LangGraph interpretation pipeline for ACMG classification and LLM-powered clinical synthesis. Population discovery to clinical interpretation in a single command.
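As an illustration of the hand-off step, the sketch below pulls variant IDs out of a gzipped top-hits table. It is not the code in gwas_bridge.py. It assumes a whitespace-delimited file with a header containing `ID` and `LOG10P` columns (the usual REGENIE Step 2 layout), a cut-off of 7.3 (roughly -log10 of 5e-8), and made-up variant names.

```python
import gzip
from typing import Iterable, List

GENOME_WIDE_LOG10P = 7.3  # assumed threshold, about -log10(5e-8)


def variant_ids(lines: Iterable[str], min_log10p: float = GENOME_WIDE_LOG10P) -> List[str]:
    """Return IDs of rows whose LOG10P is at or above the threshold."""
    rows = (line.split() for line in lines if line.strip())
    header = next(rows, None)
    if header is None:
        return []
    id_col, p_col = header.index("ID"), header.index("LOG10P")
    hits = []
    for row in rows:
        try:
            log10p = float(row[p_col])
        except ValueError:
            continue  # skip rows where the statistic is not numeric, e.g. "NA"
        if log10p >= min_log10p:
            hits.append(row[id_col])
    return hits


def variant_ids_from_gz(path: str) -> List[str]:
    """Read a gzipped top-hits file as text and extract the variant IDs."""
    with gzip.open(path, "rt") as handle:
        return variant_ids(handle)


# Hypothetical rows in the assumed column layout.
SAMPLE = [
    "CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA",
    "1 1000 rs_example_1 A G 0.2 1000 ADD 0.5 0.05 100 22.1 NA",
    "2 2000 rs_example_2 C T 0.3 1000 ADD 0.01 0.05 0.04 0.3 NA",
    "3 3000 rs_example_3 C T 0.3 1000 ADD 0.01 0.05 0.04 NA NA",
]
print(variant_ids(SAMPLE))  # ['rs_example_1']
```

Looking columns up by header name rather than by position keeps the parser working if the pipeline adds or reorders columns in the filtered output.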

4 groups: TEMUS multi-ethnic cohort
13 phenotypes: Parallel GWAS runs
100k variants: REGENIE whole-genome regression
Nextflow DSL2 · REGENIE · AWS Batch · Python · R / ggplot2 · Docker
Research · PhD

Age-dependent hepatocyte epigenomics

Integrative RNA-seq + ChIP-seq + Hi-C analysis pipeline revealing age-driven chromatin reorganisation in mouse liver.

PhD work at NTU: built end-to-end NGS analysis pipelines for transcriptome, histone modifications, and 3D chromatin. Identified H3K27me3 as a key age-dependent regulator. The technical stack — reproducible pipelines, multi-omic integration, careful statistics — is the same foundation I now use for agentic ML systems.

RNA-seq · ChIP-seq · Hi-C (3C-seq) · R / Bioconductor · Python · k-means / GSEA
// About

Scientist-turned-engineer with production miles.

PhD bioinformatician with 8+ years spanning clinical diagnostics, genomic data analysis, and cloud infrastructure. Proven track record in regulatory validation (DVT, clinical concordance, LOD), production-grade pipelines, and commercial product improvement at leading genomics companies. Now focused on agentic AI systems that apply the same rigour — traceability, reproducibility, measurable outcomes — to autonomous reasoning over real biomedical data.

// Experience

From benchtop to agentic pipelines

A decade-plus moving from wet-lab genomics and pharmaceutical QC to production ML infrastructure and, now, agentic AI systems.

Jul 2025 — Present
Senior Scientist, Bioinformatics
Illumina · Singapore
  • Lead Design Verification Testing (DVT) and regulatory validation for commercial diagnostic products (TSO500, NIPT16, VeriSeq) — FDA compliance, clinical-deployment quality.
  • TSO500: optimised TruSight Oncology 500 assay workflows for comprehensive genomic profiling in oncology.
  • NIPT16: contributed to non-invasive prenatal testing product development and validation.
  • VeriSeq: performed software testing and workflow optimisation, improving reporting efficiency and data-analysis reliability.
  • Designed clinical concordance studies — sensitivity, specificity, accuracy, LOD, reproducibility — for regulatory submissions.
Jan 2022 — Jul 2025
Senior Scientist — Bioinformatics & Cloud
Mirxes · Singapore
  • Built and validated production-grade cloud data-analysis pipelines: 40%↓ compute cost, 50%↓ storage cost, 30%↓ turnaround time.
  • Analysed 6,000+ genomic samples with standardised workflows and rigorous QC.
  • Architected AWS infrastructure integrating 15 services to manage and process 400 TB of genomic data.
  • Led a team of 5 scientists delivering clinical diagnostic assay workflows across cancer types — 33%↓ analysis time.
  • Technical lead for the Singapore National Precision Medicine project; identified business opportunities worth SGD 0.5M.
Jul 2021 — Dec 2021
Scientist, Assay Development
Vela Diagnostics · Singapore
Implemented automated verification procedures and maintained regulatory documentation (NCR, CAPA, DR). Shipped a testing framework that cut non-conformance reports by 30% and lifted product quality by 15%.
2016 — 2021
PhD Research Fellow
Nanyang Technological University · Singapore
Multi-omic study of age-dependent transcriptional and epigenetic changes in mouse hepatocytes. Built reproducible NGS analysis pipelines covering RNA-seq, ChIP-seq, and Hi-C (3C-seq). Head of NTU 3MT team (23 members, Nanyang Awards); led operations for TEDxNTU (78-person team, 1,500+ audience). Thesis DOI: 10.32657/10356/155390.
2011 — 2013
Technical Supervisor
Zydus Cadila Healthcare Ltd. · Ahmedabad, India
Built process-validation and QC pipelines for 13 pharmaceutical products under cGMP guidelines — 30%↑ efficiency, 10%↓ processing time. First exposure to regulated, high-stakes data infrastructure.
Education
PhD · M.Tech · B.Tech — Biological Sciences & Biotechnology
NTU Singapore · UPTU India
  • PhD — Nanyang Technological University, Singapore (2016 – 2021).
  • M.Tech Biotechnology (2013 – 2015).
  • B.Tech in Biotechnology — Uttar Pradesh Technical University, India (2007 – 2011).
// Publications & writing

Research output & technical writing

Peer-reviewed work, open-source documentation, and engineering writeups.

Doctoral thesis · NTU · 2021
Age-dependent transcriptional and epigenetic alterations in mouse hepatocytes
Sharma, A. (2021). Nanyang Technological University. · doi:10.32657/10356/155390 · hdl.handle.net
Technical writeup · Open source
Why agentic AI for genomics? Designing reasoning-traceable variant interpretation
Design philosophy & architecture doc shipped with agentic-genomics. · Read on GitHub
Conference poster · Cell Symposia, Chicago · 2019
Significance of hepatocyte polyploidization in liver physiology and pathology
Sharma A, Ong A, Wuestefeld T, Sanyal A. Transcriptional regulation in evolution, development and disease.
Peer-reviewed · Frontiers in Microbiology · 2018
Antiproliferative and antioxidative bioactive compounds in extracts of marine-derived endophytic fungus
Kumari M, Taritla S, Sharma A, Jayabaskaran C. · Frontiers in Microbiology
// Skills

What I use, how often

Weighted by how regularly I use each tool in production, not by how much I've ever touched it.

Agentic AI & LLMs (primary)

LangGraph / LangChain: core
Claude / Anthropic API: core
Multi-agent orchestration: core
RAG & vector stores: frequent
Tool-calling & function agents: frequent
OpenAI API / GPT-4: frequent
Local LLMs (Ollama): familiar

Machine learning & MLOps (primary)

Python · scikit-learn: core
Pandas / NumPy: core
Statistical modelling: core
PyTorch: frequent
Feature engineering: frequent
MLflow / experiment tracking: frequent

Cloud & infra (primary)

AWS (Batch, Lambda, S3, Step Fns): core
Nextflow (DSL2): core
Docker: core
CI/CD (GitHub Actions): frequent
IaC (CloudFormation / CDK): frequent
Kubernetes: familiar

Bioinformatics (domain expert)

NGS (WGS / WES / RNA-seq / ChIP-seq): core
WGMS · single-cell · spatial: frequent
DRAGEN · GATK · samtools · bcftools: core
Variant calling & interpretation: core
R · Bioconductor · DESeq2: frequent
Snakemake: frequent

Regulatory & clinical (expert)

Design Verification Testing (DVT): core
Clinical validation (LOD, sensitivity, specificity): core
FDA compliance · SOPs: core
cGMP documentation: frequent
NCR / CAPA / DR: frequent
HIPAA · security compliance: frequent
// Contact

Let's build something useful.

Open to relevant roles globally — across industry, research, and startups — where agentic AI, ML, or computational biology intersects with real-world impact. Based in Singapore (PR) — open to remote, hybrid, or relocation anywhere in the world.

Frequently Asked Questions

Who is Ankur Sharma and what does he do?

Ankur Sharma is an Agentic AI & ML Engineer with a PhD from NTU Singapore and 8+ years of experience. He builds production multi-agent AI systems and reasoning-traceable AI for genomics and biomedicine. He is based in Singapore and open to relevant roles globally across industry, research, and startups. Contact: ankurs103@gmail.com or linkedin.com/in/ankurit.

What is agentic-genomics?

agentic-genomics (GenomicsCopilot) is an open-source LangGraph agent for reasoning-traceable variant interpretation. It takes VCF files and HPO phenotype terms and returns ranked, explainable reports of candidate genetic variants. It runs a 7-node pipeline that includes ACMG-lite classification, Phrank HPO scoring, and a critic LLM for fact-checking. Available at github.com/ankurgenomics/agentic-genomics under the MIT license.

What is genomics-skills?

genomics-skills is a library of 8 agent-callable Python genomics tools: TCGA pan-cancer expression analysis (9,479 real patient samples), Kaplan-Meier survival analysis, GO/KEGG enrichment, PubMed search, protein variant mapping, 3D structure viewing, and volcano plots. Each skill has a SKILL.md contract and CLI entrypoint. LLM routing via Claude Haiku. Available at github.com/ankurgenomics/genomics-skills.

What technologies does Ankur Sharma work with?

Core technologies: LangGraph, Claude/Anthropic API, multi-agent orchestration, RAG, Python, PyTorch, scikit-learn, AWS (Batch, Lambda, S3, Step Functions), Nextflow, Docker, GitHub Actions. Domain expertise: clinical genomics (WGS/WES/RNA-seq/ChIP-seq), DRAGEN, GATK, variant interpretation, ACMG classification, regulatory validation (DVT, FDA compliance, HIPAA).

Is Ankur Sharma available for hire?

Yes. Ankur Sharma is open to relevant roles globally — across industry, research, and startups — where agentic AI, ML, or computational biology intersects with real-world impact. Based in Singapore (PR), open to remote, hybrid, or relocation anywhere in the world. Contact: ankurs103@gmail.com or linkedin.com/in/ankurit.

What results has Ankur Sharma achieved in production?

Key results: built a 5-agent AI system (GenomicsOps AI) as a personal project that reduced pipeline failure resolution time from 3 days to 2 hours. Architected cloud genomics pipelines processing 6,000+ samples with 40% lower compute cost and 50% lower storage cost, managing 400 TB of data. Led clinical validation for oncology and prenatal diagnostic products with FDA compliance.