Open to ML · Agentic AI · Research Engineer roles

Senior ML & Agentic AI Engineer building reasoning systems for science.

PhD with 8+ years shipping production ML and cloud infrastructure across clinical diagnostics and genomics. I build multi-agent systems that diagnose real pipeline failures, ML-powered workflows that process 6,000+ samples, and open-source tooling for agentic AI in biomedicine. Currently at Illumina Singapore; open to senior IC roles in ML / Agentic AI / Research Engineering.


8+ yrs

Production ML & bioinformatics

6,000+

Genomic samples processed in production

5 agents

Multi-agent ops system shipped (GenomicsOps AI)

40% ↓

Compute cost savings on production pipelines

// Selected work

Agents, pipelines, and research projects

Five projects that cover the range: open-source agentic AI, agent-callable genomics tools, production multi-agent systems, self-optimising ML pipelines, and applied research. Each links to code, a demo, or a writeup where possible.

Flagship · Open source LangGraph · Claude · Python

agentic-genomics · GenomicsCopilot

An open-source LangGraph agent for explainable variant interpretation — every call leaves a full reasoning trace a human can audit. Research demonstration, not clinical.

Takes a VCF + HPO phenotype terms and returns a ranked, explainable report of candidate variants. Deterministic nodes handle ingest, gnomAD / ClinVar / SpliceAI lookups via MyVariant.info, a transparent ACMG-lite rule engine (7 criteria, a proper PVS1 check, and Richards-et-al-2015 combining rules), and a Phrank-style HPO semantic-similarity score. An LLM synthesiser ranks candidates and writes the narrative; a second LLM critic fact-checks those claims against the evidence JSON and flags anything unsupported. Every run emits a machine-readable reasoning trace. See LIMITATIONS.md for an honest accounting of what this system does not do.
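
The combining step of a rule engine like this can be sketched in a few lines. This is an illustration of the Richards et al. 2015 combining rules, not the code in the repository: the name `combine_acmg` and the convention of reading evidence strength from the criterion prefix are assumptions, and the sketch accepts any standard criterion code rather than the project's specific seven.

```python
from collections import Counter

PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def combine_acmg(criteria):
    """Map met ACMG/AMP criteria codes (e.g. "PVS1", "PM2") to a five-tier class.

    Strength is taken from the code prefix, so criteria whose strength has
    been up- or down-graded are not handled in this sketch.
    """
    n = Counter()
    for code in criteria:
        for prefix in PREFIXES:
            if code.startswith(prefix):
                n[prefix] += 1
                break
        else:
            raise ValueError(f"unrecognised criterion: {code}")

    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence falls back to uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(combine_acmg(["PVS1", "PM2"]))  # Likely pathogenic
```

Keeping this step as plain boolean rules means a classification can be traced back to the exact criteria that triggered it.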

7 nodes

LangGraph + critic review

4 tools

MyVariant · Phrank HPO · ACMG-lite · critic

MIT

Open source, Python 3.11+

LangGraph Claude / Anthropic Pydantic v2 pysam Streamlit Typer CLI GitHub Actions

→ GitHub repo → Why agentic? → Architecture → Limitations & prior art

GenomicsCopilot pipeline: VCF + HPO → ingest_variants → annotate_evidence → frequency_filter → phenotype_score → acmg_classify → synthesize_report → critic_review → ranked report
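
A rough sketch of how a linear pipeline like this can emit a machine-readable trace as it runs. It uses plain Python rather than LangGraph, the node names are taken from the pipeline above, and everything else (`run_pipeline`, the placeholder node bodies, the trace fields, the example inputs) is hypothetical.

```python
import json

NODE_ORDER = [
    "ingest_variants", "annotate_evidence", "frequency_filter",
    "phenotype_score", "acmg_classify", "synthesize_report", "critic_review",
]


def run_pipeline(nodes, state):
    """Run each node in order; a node takes the state dict and returns updates."""
    trace = []
    for name in NODE_ORDER:
        updates = nodes[name](state)
        state = {**state, **updates}
        # Record which node ran and which state keys it wrote.
        trace.append({"node": name, "wrote": sorted(updates)})
    return state, trace


def _stub(key):
    """Hypothetical placeholder node that writes a single key."""
    return lambda state: {key: f"<{key} placeholder>"}


nodes = {name: _stub(name + "_out") for name in NODE_ORDER}
final_state, trace = run_pipeline(
    nodes, {"vcf": "sample.vcf", "hpo": ["HP:0001250"]}
)
print(json.dumps(trace, indent=2))
```

Because each node returns only its updates, the trace shows exactly what every step contributed, which is what makes a run auditable after the fact.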

Open source · Agent skills Python · Claude Haiku · REST APIs

genomics-skills — Agent-Callable Skill Library

8 pure-Python genomics skills that downstream agents can call: expression profiling, survival analysis, protein mapping, pathway enrichment, literature search, and more.

The downstream skill layer for agentic-genomics. Each skill is a standalone, agent-discoverable Python module with a SKILL.md contract, CLI entrypoint, and deterministic output (TSV + PNG/SVG). Pan-cancer expression uses real TCGA data (9,479 samples across 31 cancer types via cBioPortal). The survival skill pairs Kaplan-Meier curves with Cox PH regression on actual patient data. LLM-powered routing via Claude Haiku maps natural-language queries to the right skill. Parquet caching makes repeat queries near-instant.
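
For readers unfamiliar with the survival step, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator. It is not the skill's implementation: the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject.
    events: 1 if the event was observed, 0 if the subject was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = censored = 0
        # Group all subjects that share this follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Subjects censored at t still counted as at risk at t itself.
        at_risk -= deaths + censored
    return curve


# Toy data, not patient data.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```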

8 skills

Agent-callable, SKILL.md contract each

9,479

Real TCGA patient samples

MIT

Open source, Python 3.9+

Python Claude Haiku cBioPortal API MyVariant.info NCBI E-utils PDB / AlphaFold Pandas Matplotlib

→ GitHub repo

GenomicsOps AI

Five specialised agents that triage and resolve DRAGEN, ICA, and SGE/HPC pipeline failures end-to-end.

Personal project built on weekends: Trigger → Log Fetcher → RAG → Classifier → JIRA Writer. Tested on real failure scenarios — BED-file overlaps, samplesheet index mismatches, stuck SGE jobs — with end-to-end triage and ticket creation.
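
The shape of that chain can be sketched with toy stand-ins for each stage. Nothing here is the project's code: the knowledge-base entries, the token-overlap retrieval (a stand-in for RAG), the overlap threshold, and the ticket fields are all hypothetical, and the JIRA step only builds a payload instead of calling an API.

```python
import re

# Hypothetical knowledge-base entries, one per failure scenario.
KNOWLEDGE_BASE = [
    {"label": "bed_overlap", "text": "bed file overlap overlapping intervals regions"},
    {"label": "samplesheet_index", "text": "samplesheet index mismatch barcode"},
    {"label": "stuck_sge_job", "text": "sge job stuck queue qstat waiting"},
]


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(log_text):
    """Rank knowledge-base entries by token overlap with the log (toy retrieval)."""
    log_tokens = tokens(log_text)
    scored = [(len(log_tokens & tokens(doc["text"])), doc) for doc in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def classify(scored):
    """Take the best-matching entry's label, or 'unknown' on weak overlap."""
    best_score, best_doc = scored[0]
    return best_doc["label"] if best_score >= 2 else "unknown"


def triage(run_id, fetch_logs):
    """Trigger -> Log Fetcher -> retrieval -> Classifier -> ticket payload."""
    log_text = fetch_logs(run_id)
    label = classify(retrieve(log_text))
    return {
        "summary": f"[{label}] pipeline failure in run {run_id}",
        "description": log_text,
        "labels": ["pipeline-triage", label],
    }


# Hypothetical log line, for illustration only.
ticket = triage("run-1", lambda run_id: "ERROR: samplesheet index mismatch for sample S1")
print(ticket["summary"])
```

Passing `fetch_logs` in as a function keeps the chain testable without access to a real scheduler or log store.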

5 specialised agents

Multi-agent Claude API RAG Python JIRA & Confluence APIs

Side project · happy to walk through architecture in interviews

Production · Cloud

Autonomous Genomic Pipelines (Mirxes)

Self-optimising WGS/RNA-seq workflows on AWS with adaptive resource allocation and automated QC gating.

Designed and shipped the AWS infrastructure for the Singapore National Precision Medicine project. Nextflow on AWS Batch, with Lambda + Step Functions orchestrating sample intake, QC decisions, and output delivery. Processed 6,000+ samples with minimal human intervention.
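
As a sketch of what an automated QC gate in such a workflow might look like: a Lambda-style handler returns a decision that a Step Functions Choice state could branch on. The metric names, thresholds, and event fields (`sample_id`, `qc_metrics`) are hypothetical, not the production values.

```python
# Hypothetical minimum thresholds; real gates depend on assay and library type.
QC_THRESHOLDS = {"mean_coverage": 30.0, "pct_q30": 80.0, "pct_mapped": 95.0}


def lambda_handler(event, context):
    """Return a PASS/FAIL decision plus the metrics that fell below threshold."""
    metrics = event["qc_metrics"]
    failed = sorted(
        name
        for name, minimum in QC_THRESHOLDS.items()
        # A missing metric is treated as a failure rather than silently passed.
        if metrics.get(name, float("-inf")) < minimum
    )
    return {
        "sample_id": event["sample_id"],
        "decision": "FAIL" if failed else "PASS",
        "failed_metrics": failed,
    }


example = lambda_handler(
    {"sample_id": "S1", "qc_metrics": {"mean_coverage": 32.5, "pct_q30": 76.0, "pct_mapped": 98.1}},
    None,
)
print(example)  # {'sample_id': 'S1', 'decision': 'FAIL', 'failed_metrics': ['pct_q30']}
```

Returning the failed metric names alongside the decision lets the downstream state route a sample to re-sequencing or manual review without re-reading the QC files.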

40% ↓

Compute cost

50% ↓

Storage footprint

400 TB

Genomic data managed

Nextflow AWS Batch Lambda Step Functions Docker IaC

→ Related Nextflow (GWAS)

Research · PhD

Age-dependent hepatocyte epigenomics

Integrative RNA-seq + ChIP-seq + Hi-C analysis pipeline revealing age-driven chromatin reorganisation in mouse liver.

PhD work at NTU: built end-to-end NGS analysis pipelines for transcriptome, histone modifications, and 3D chromatin. Identified H3K27me3 as a key age-dependent regulator. The technical stack — reproducible pipelines, multi-omic integration, careful statistics — is the same foundation I now use for agentic ML systems.

RNA-seq ChIP-seq Hi-C (3C-seq) R · Bioconductor Python k-means / GSEA

→ Thesis (DOI)

// About

Scientist-turned-engineer with production miles.

PhD bioinformatician with 8+ years spanning clinical diagnostics, genomic data analysis, and cloud infrastructure. Proven track record in regulatory validation (DVT, clinical concordance, LOD), production-grade pipelines, and commercial product improvement at leading genomics companies (Illumina, Mirxes). Now focused on agentic AI systems that apply the same rigour — traceability, reproducibility, measurable outcomes — to autonomous reasoning over real biomedical data.

// Experience

From benchtop to agentic pipelines

A decade-plus moving from wet-lab genomics and pharmaceutical QC to production ML infrastructure and, now, agentic AI systems.

Jul 2025 — Present

Senior Scientist, Bioinformatics

Illumina · Singapore

Lead Design Verification Testing (DVT) and regulatory validation for commercial diagnostic products (TSO500, NIPT16, VeriSeq) — FDA compliance, clinical-deployment quality.
TSO500: optimised TruSight Oncology 500 assay workflows for comprehensive genomic profiling in oncology.
NIPT16: contributed to non-invasive prenatal testing product development and validation.
VeriSeq: performed software testing and workflow optimisation, improving reporting efficiency and data-analysis reliability.
Designed clinical concordance studies — sensitivity, specificity, accuracy, LOD, reproducibility — for regulatory submissions.
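
For reference, the core concordance metrics reduce to simple ratios over a 2x2 table of assay calls against a reference method. A minimal sketch, with an assumed function name and illustrative counts that are not study data:

```python
def concordance_metrics(tp, fp, tn, fn):
    """Compute agreement metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # positive percent agreement
        "specificity": ratio(tn, tn + fp),  # negative percent agreement
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Illustrative counts only.
metrics = concordance_metrics(tp=90, fp=5, tn=95, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 0.9, 'specificity': 0.95, 'accuracy': 0.925}
```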

Jan 2022 — Jul 2025

Senior Scientist — Bioinformatics & Cloud

Mirxes · Singapore

Built and validated production-grade cloud data-analysis pipelines: 40%↓ compute cost, 50%↓ storage cost, 30%↓ turnaround time.
Analysed 6,000+ genomic samples with standardised workflows and rigorous QC.
Architected AWS infrastructure integrating 15 services to manage and process 400 TB of genomic data.
Led a team of 5 scientists delivering clinical diagnostic assay workflows across cancer types — 33%↓ analysis time.
Technical lead for the Singapore National Precision Medicine project; identified business opportunities worth SGD 0.5M.

Jul 2021 — Dec 2021

Scientist, Assay Development

Vela Diagnostics · Singapore

Implemented automated verification procedures and maintained regulatory documentation (NCR, CAPA, DR). Shipped a testing framework that cut non-conformance reports by 30% and lifted product quality by 15%.

2016 — 2021

PhD Research Fellow, Biological Sciences

Nanyang Technological University · Singapore

Multi-omic study of age-dependent transcriptional and epigenetic changes in mouse hepatocytes. Built reproducible NGS analysis pipelines covering RNA-seq, ChIP-seq, and Hi-C (3C-seq). Head of NTU 3MT team (23 members, Nanyang Awards); led operations for TEDxNTU (78-person team, 1,500+ audience). Thesis DOI: 10.32657/10356/155390.

2011 — 2013

Technical Supervisor

Zydus Cadila Healthcare Ltd. · Ahmedabad, India

Built process-validation and QC pipelines for 13 pharmaceutical products under cGMP guidelines — 30%↑ efficiency, 10%↓ processing time. First exposure to regulated, high-stakes data infrastructure.

Education

PhD · M.Tech · B.Tech — Biological Sciences & Biotechnology

NTU Singapore · Anna University · UPTU India

PhD in Biological Sciences — Nanyang Technological University, Singapore (2016 – 2021).
M.Tech in Biotechnology — Anna University, India (2013 – 2015).
B.Tech in Biotechnology — Uttar Pradesh Technical University, India (2007 – 2011).

// Publications & writing

Research output & technical writing

Peer-reviewed work, open-source documentation, and engineering writeups.

Doctoral thesis · NTU · 2021

Age-dependent transcriptional and epigenetic alterations in mouse hepatocytes

Sharma, A. (2021). Nanyang Technological University. · doi:10.32657/10356/155390 · hdl.handle.net

Technical writeup · Open source

Why agentic AI for genomics? Designing reasoning-traceable variant interpretation

Design philosophy & architecture doc shipped with agentic-genomics. · Read on GitHub

Conference poster · Cell Symposia, Chicago · 2019

Significance of hepatocyte polyploidization in liver physiology and pathology

Sharma A, Ong A, Wuestefeld T, Sanyal A. Transcriptional regulation in evolution, development and disease.

Peer-reviewed · Frontiers in Microbiology · 2018

Antiproliferative and antioxidative bioactive compounds in extracts of marine-derived endophytic fungus

Kumari M, Taritla S, Sharma A, Jayabaskaran C. · Frontiers in Microbiology

// Skills

What I use, how often

Weighted by how regularly I use each tool in production, not by how much I've ever touched it.

Agentic AI & LLMs: primary

LangGraph / LangChain: core

Claude / Anthropic API: core

Multi-agent orchestration: core

RAG & vector stores: frequent

Tool-calling & function agents: frequent

OpenAI API / GPT-4: frequent

Local LLMs (Ollama): familiar

Machine learning & MLOps: primary

Python · scikit-learn: core

Pandas / NumPy: core

Statistical modelling: core

PyTorch: frequent

Feature engineering: frequent

MLflow / experiment tracking: frequent

Cloud & infra: primary

AWS (Batch, Lambda, S3, Step Fns): core

Nextflow (DSL2): core

Docker: core

CI/CD (GitHub Actions): frequent

IaC (CloudFormation / CDK): frequent

Kubernetes: familiar

Bioinformatics: domain expert

NGS (WGS / WES / RNA-seq / ChIP-seq): core

WGMS · single-cell · spatial: frequent

DRAGEN · GATK · samtools · bcftools: core

Variant calling & interpretation: core

R · Bioconductor · DESeq2: frequent

Snakemake: frequent

Regulatory & clinical: expert

Design Verification Testing (DVT): core

Clinical validation (LOD, sensitivity, specificity): core

FDA compliance · SOPs: core

cGMP documentation: frequent

NCR / CAPA / DR: frequent

HIPAA · security compliance: frequent

// Contact

Let's build something useful.

Open to Senior / Staff ML Engineer, Agentic AI Engineer, Research Engineer, and Applied Scientist roles. Based in Singapore (PR) — open to remote or relocation for the right team.

ankurs103@gmail.com +65 8402 6093 github.com/ankurgenomics linkedin.com/in/ankurit Download CV (PDF)


Frequently Asked Questions

Who is Ankur Sharma and what does he do?

Ankur Sharma is a Senior ML and Agentic AI Engineer with a PhD from NTU Singapore and 8+ years of experience. He builds production multi-agent AI systems and reasoning-traceable AI for genomics and biomedicine. He is based in Singapore and open to senior ML, agentic AI, and research engineer roles worldwide.

What is agentic-genomics?

agentic-genomics (GenomicsCopilot) is an open-source LangGraph agent for reasoning-traceable variant interpretation. It takes VCF files and HPO phenotype terms and returns ranked, explainable reports of candidate genetic variants. Its 7-node graph runs deterministic steps (variant ingest, evidence annotation, frequency filtering, Phrank-style HPO scoring, and ACMG-lite classification), then an LLM synthesiser and a critic LLM that fact-checks the report. Available at github.com/ankurgenomics/agentic-genomics under the MIT license.

What is genomics-skills?

genomics-skills is a library of 8 agent-callable Python genomics tools, including TCGA pan-cancer expression analysis (9,479 real patient samples), Kaplan-Meier survival analysis, GO/KEGG enrichment, PubMed search, protein variant mapping, 3D structure viewing, and volcano plots. Each skill has a SKILL.md contract and CLI entrypoint. Claude Haiku routes natural-language queries to the right skill. Available at github.com/ankurgenomics/genomics-skills.

What technologies does Ankur Sharma work with?

Core technologies: LangGraph, Claude/Anthropic API, multi-agent orchestration, RAG, Python, PyTorch, scikit-learn, AWS (Batch, Lambda, S3, Step Functions), Nextflow, Docker, GitHub Actions. Domain expertise: clinical genomics (WGS/WES/RNA-seq/ChIP-seq), DRAGEN, GATK, variant interpretation, ACMG classification, regulatory validation (DVT, FDA compliance, HIPAA).

Is Ankur Sharma available for hire?

Yes. Ankur Sharma is open to Senior/Staff ML Engineer, Agentic AI Engineer, Research Engineer, and Applied Scientist roles. He is based in Singapore as a Permanent Resident and is open to remote work or relocation for the right team. Contact: ankurs103@gmail.com or linkedin.com/in/ankurit.

What results has Ankur Sharma achieved in production?

Key results: built a 5-agent AI system (GenomicsOps AI) as a personal project that cut pipeline failure resolution time from 3 days to 2 hours. Architected cloud genomics pipelines processing 6,000+ samples with 40% lower compute costs, 50% less storage, and 400 TB of data managed. Led clinical validation for oncology and prenatal diagnostic products with FDA compliance.