Open to ML · Agentic AI · Research Engineer roles

Senior ML & Agentic AI Engineer building reasoning systems for science.

PhD with 8+ years shipping production ML and cloud infrastructure across clinical diagnostics and genomics. I build multi-agent systems that diagnose real pipeline failures, ML-powered workflows that process 6,000+ samples, and open-source tooling for agentic AI in biomedicine. Currently at Illumina Singapore; open to senior IC roles in ML / Agentic AI / Research Engineering.

8+ yrs: Production ML & bioinformatics
6,000+: Genomic samples processed in production
5 agents: Multi-agent ops system shipped (GenomicsOps AI)
40% ↓: Compute cost savings on production pipelines
// Selected work

Agents, pipelines, and research projects

Five projects that cover the range: open-source agentic AI, agent-callable genomics tools, production multi-agent systems, self-optimising ML pipelines, and applied research. Each links to code, a demo, or a writeup where possible.

Open source · Agent skills
Python · Claude Haiku · REST APIs

genomics-skills — Agent-Callable Skill Library

8 pure-Python genomics skills that downstream agents can call: expression profiling, survival analysis, protein mapping, pathway enrichment, literature search, and more.

The downstream skill layer for agentic-genomics. Each skill is a standalone, agent-discoverable Python module with a SKILL.md contract, a CLI entrypoint, and deterministic output (TSV + PNG/SVG). The pan-cancer expression skill uses real TCGA data (9,479 samples across 31 cancer types via cBioPortal). The Kaplan-Meier survival skill runs Cox PH regression on actual patient data. LLM-powered routing via Claude Haiku maps natural-language queries to the right skill, and Parquet caching makes repeat queries near-instant.
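To make the idea of a deterministic skill concrete, here is a minimal sketch of what a survival skill's core could look like: a pure-Python Kaplan-Meier estimator whose result is serialised as TSV. The function names (`kaplan_meier`, `to_tsv`) and the column layout are illustrative assumptions, not the actual genomics-skills code, and the Cox PH step is left out.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event was observed, 0 if the patient was censored.
    Returns a list of (time, at_risk, events, survival) tuples.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time

    at_risk = len(durations)
    survival = 1.0
    rows = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1.0 - d / at_risk
            rows.append((t, at_risk, d, survival))
        at_risk -= leaving[t]
    return rows


def to_tsv(rows):
    """Serialise the estimate as TSV; same input always gives the same text."""
    lines = ["time\tat_risk\tevents\tsurvival"]
    lines += [f"{t}\t{n}\t{d}\t{s:.4f}" for t, n, d, s in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data, not TCGA: six patients, two of them censored.
    print(to_tsv(kaplan_meier([5, 6, 6, 2, 4, 4], [1, 0, 1, 1, 1, 0])), end="")
```

Keeping the estimator free of plotting and network calls is what makes the output reproducible; in a setup like the one described, fetching data and rendering PNG/SVG would sit around this core rather than inside it.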

8 skills: Agent-callable, SKILL.md contract each
9,479: Real TCGA patient samples
MIT: Open source, Python 3.9+
Python · Claude Haiku · cBioPortal API · MyVariant.info · NCBI E-utils · PDB / AlphaFold · Pandas · Matplotlib

GenomicsOps AI

Five specialised agents that triage and resolve DRAGEN, ICA, and SGE/HPC pipeline failures end-to-end.

Personal project built on weekends: Trigger → Log Fetcher → RAG → Classifier → JIRA Writer. Tested on real failure scenarios — BED-file overlaps, samplesheet index mismatches, stuck SGE jobs — with end-to-end triage and ticket creation.
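The stage chain above can be sketched as a list of functions that each enrich a shared incident record. This is only an illustration of the hand-off pattern: the log store, the runbook, the keyword matching (standing in for RAG retrieval and the LLM classifier), and the ticket dictionary (standing in for a JIRA API call) are all hypothetical and are not taken from the GenomicsOps AI code.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the real log source and knowledge base.
FAKE_LOGS = {
    "run-001": "ERROR: samplesheet index mismatch for sample S12",
    "run-002": "qstat: job 4411 in state Eqw for 9h (stuck SGE job)",
}
RUNBOOK = [
    ("bed_overlap", ["bed", "overlap"], "Merge overlapping BED intervals and resubmit."),
    ("index_mismatch", ["samplesheet", "index"], "Correct the samplesheet indexes and requeue."),
    ("stuck_sge_job", ["sge", "qstat"], "Clear the job error state or resubmit the job."),
]


@dataclass
class Incident:
    run_id: str
    log: str = ""
    context: list = field(default_factory=list)
    label: str = "unknown"
    ticket: dict = field(default_factory=dict)


def fetch_log(inc):
    """Log Fetcher: attach the raw failure log."""
    inc.log = FAKE_LOGS.get(inc.run_id, "")
    return inc


def retrieve(inc):
    """Retrieval: keyword match against the runbook (stand-in for RAG)."""
    text = inc.log.lower()
    inc.context = [
        (label, advice)
        for label, words, advice in RUNBOOK
        if all(w in text for w in words)
    ]
    return inc


def classify(inc):
    """Classifier: take the top retrieved label (stand-in for an LLM call)."""
    if inc.context:
        inc.label = inc.context[0][0]
    return inc


def write_ticket(inc):
    """Ticket writer: build the payload; a real system would post it to JIRA."""
    advice = inc.context[0][1] if inc.context else "Needs manual review."
    inc.ticket = {
        "summary": f"[{inc.label}] pipeline failure in {inc.run_id}",
        "description": f"{inc.log}\n\nSuggested fix: {advice}",
    }
    return inc


STAGES = [fetch_log, retrieve, classify, write_ticket]


def triage(run_id):
    """Trigger: run every stage in order on a new incident."""
    inc = Incident(run_id)
    for stage in STAGES:
        inc = stage(inc)
    return inc


if __name__ == "__main__":
    print(triage("run-001").ticket["summary"])
```

Passing one record through a fixed list of stages keeps each agent independently testable, and an unknown failure still produces a ticket flagged for manual review instead of failing silently.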

5 specialised agents
Multi-agent · Claude API · RAG · Python · JIRA & Confluence APIs
Production · Cloud

Autonomous Genomic Pipelines (Mirxes)

Self-optimising WGS/RNA-seq workflows on AWS with adaptive resource allocation and automated QC gating.

Designed and shipped the AWS infrastructure for the Singapore National Precision Medicine project. Nextflow on AWS Batch, with Lambda + Step Functions orchestrating sample intake, QC decisions, and output delivery. Processed 6,000+ samples with minimal human intervention.
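As a rough illustration of automated QC gating, a Lambda function could return a pass/fail decision that a Step Functions Choice state then branches on. The sketch below assumes hypothetical metric names, thresholds, and event shape; none of them come from the Mirxes pipeline. Only the `lambda_handler(event, context)` signature is the standard AWS Lambda convention for Python.

```python
# Hypothetical minimum values; real thresholds depend on the assay.
QC_THRESHOLDS = {"mean_coverage": 30.0, "pct_q30": 85.0}


def qc_gate(metrics, thresholds=QC_THRESHOLDS):
    """Return a decision plus the metrics that fell below their minimum.

    A metric missing from the input counts as a failure, so a sample
    cannot pass just because an upstream step did not report a value.
    """
    failed = sorted(
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    )
    return {"decision": "FAIL" if failed else "PASS", "failed_metrics": failed}


def lambda_handler(event, context):
    """Lambda entry point; the returned dict becomes the state output."""
    result = qc_gate(event["metrics"])
    return {"sample_id": event["sample_id"], **result}


if __name__ == "__main__":
    event = {"sample_id": "S1", "metrics": {"mean_coverage": 35.2, "pct_q30": 80.0}}
    print(lambda_handler(event, None))
```

Returning the list of failed metrics alongside the decision lets the downstream branch route a sample to re-sequencing or manual review without re-reading the QC report.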

40% ↓: Compute cost
50% ↓: Storage footprint
400 TB: Genomic data managed
Nextflow · AWS Batch · Lambda · Step Functions · Docker · IaC
Research · PhD

Age-dependent hepatocyte epigenomics

Integrative RNA-seq + ChIP-seq + Hi-C analysis pipeline revealing age-driven chromatin reorganisation in mouse liver.

PhD work at NTU: built end-to-end NGS analysis pipelines for transcriptome, histone modifications, and 3D chromatin. Identified H3K27me3 as a key age-dependent regulator. The technical stack — reproducible pipelines, multi-omic integration, careful statistics — is the same foundation I now use for agentic ML systems.

RNA-seq · ChIP-seq · Hi-C (3C-seq) · R / Bioconductor · Python · k-means / GSEA
// About

Scientist-turned-engineer with production miles.

PhD bioinformatician with 8+ years spanning clinical diagnostics, genomic data analysis, and cloud infrastructure. Proven track record in regulatory validation (DVT, clinical concordance, LOD), production-grade pipelines, and commercial product improvement at leading genomics companies (Illumina, Mirxes). Now focused on agentic AI systems that apply the same rigour — traceability, reproducibility, measurable outcomes — to autonomous reasoning over real biomedical data.

// Experience

From benchtop to agentic pipelines

A decade-plus moving from wet-lab genomics and pharmaceutical QC to production ML infrastructure and, now, agentic AI systems.

Jul 2025 — Present
Senior Scientist, Bioinformatics
Illumina · Singapore
  • Lead Design Verification Testing (DVT) and regulatory validation for commercial diagnostic products (TSO500, NIPT16, VeriSeq) — FDA compliance, clinical-deployment quality.
  • TSO500: optimised TruSight Oncology 500 assay workflows for comprehensive genomic profiling in oncology.
  • NIPT16: contributed to non-invasive prenatal testing product development and validation.
  • VeriSeq: performed software testing and workflow optimisation, improving reporting efficiency and data-analysis reliability.
  • Designed clinical concordance studies — sensitivity, specificity, accuracy, LOD, reproducibility — for regulatory submissions.
Jan 2022 — Jul 2025
Senior Scientist — Bioinformatics & Cloud
Mirxes · Singapore
  • Built and validated production-grade cloud data-analysis pipelines: 40%↓ compute cost, 50%↓ storage cost, 30%↓ turnaround time.
  • Analysed 6,000+ genomic samples with standardised workflows and rigorous QC.
  • Architected AWS infrastructure integrating 15 services to manage and process 400 TB of genomic data.
  • Led a team of 5 scientists delivering clinical diagnostic assay workflows across cancer types — 33%↓ analysis time.
  • Technical lead for the Singapore National Precision Medicine project; identified business opportunities worth SGD 0.5M.
Jul 2021 — Dec 2021
Scientist, Assay Development
Vela Diagnostics · Singapore
Implemented automated verification procedures and maintained regulatory documentation (NCR, CAPA, DR). Shipped a testing framework that cut non-conformance reports by 30% and lifted product quality by 15%.
2016 — 2021
PhD Research Fellow, Biological Sciences
Nanyang Technological University · Singapore
Multi-omic study of age-dependent transcriptional and epigenetic changes in mouse hepatocytes. Built reproducible NGS analysis pipelines covering RNA-seq, ChIP-seq, and Hi-C (3C-seq). Head of NTU 3MT team (23 members, Nanyang Awards); led operations for TEDxNTU (78-person team, 1,500+ audience). Thesis DOI: 10.32657/10356/155390.
2011 — 2013
Technical Supervisor
Zydus Cadila Healthcare Ltd. · Ahmedabad, India
Built process-validation and QC pipelines for 13 pharmaceutical products under cGMP guidelines — 30%↑ efficiency, 10%↓ processing time. First exposure to regulated, high-stakes data infrastructure.
Education
PhD · M.Tech · B.Tech — Biological Sciences & Biotechnology
NTU Singapore · Anna University · UPTU India
  • PhD in Biological Sciences — Nanyang Technological University, Singapore (2016 – 2021).
  • M.Tech in Biotechnology — Anna University, India (2013 – 2015).
  • B.Tech in Biotechnology — Uttar Pradesh Technical University, India (2007 – 2011).
// Publications & writing

Research output & technical writing

Peer-reviewed work, open-source documentation, and engineering writeups.

Doctoral thesis · NTU · 2021
Age-dependent transcriptional and epigenetic alterations in mouse hepatocytes
Sharma, A. (2021). Nanyang Technological University. · doi:10.32657/10356/155390 · hdl.handle.net
Technical writeup · Open source
Why agentic AI for genomics? Designing reasoning-traceable variant interpretation
Design philosophy & architecture doc shipped with agentic-genomics. · Read on GitHub
Conference poster · Cell Symposia, Chicago · 2019
Significance of hepatocyte polyploidization in liver physiology and pathology
Sharma A, Ong A, Wuestefeld T, Sanyal A. Transcriptional regulation in evolution, development and disease.
Peer-reviewed · Frontiers in Microbiology · 2018
Antiproliferative and antioxidative bioactive compounds in extracts of marine-derived endophytic fungus
Kumari M, Taritla S, Sharma A, Jayabaskaran C. · Frontiers in Microbiology
// Skills

What I use, how often

Weighted by how regularly I use each tool in production, not by how much I've ever touched it.

Agentic AI & LLMs (primary)

LangGraph / LangChain: core
Claude / Anthropic API: core
Multi-agent orchestration: core
RAG & vector stores: frequent
Tool-calling & function agents: frequent
OpenAI API / GPT-4: frequent
Local LLMs (Ollama): familiar

Machine learning & MLOps (primary)

Python · scikit-learn: core
Pandas / NumPy: core
Statistical modelling: core
PyTorch: frequent
Feature engineering: frequent
MLflow / experiment tracking: frequent

Cloud & infra (primary)

AWS (Batch, Lambda, S3, Step Fns): core
Nextflow (DSL2): core
Docker: core
CI/CD (GitHub Actions): frequent
IaC (CloudFormation / CDK): frequent
Kubernetes: familiar

Bioinformatics (domain expert)

NGS (WGS / WES / RNA-seq / ChIP-seq): core
WGMS · single-cell · spatial: frequent
DRAGEN · GATK · samtools · bcftools: core
Variant calling & interpretation: core
R · Bioconductor · DESeq2: frequent
Snakemake: frequent

Regulatory & clinical (expert)

Design Verification Testing (DVT): core
Clinical validation (LOD, sensitivity, specificity): core
FDA compliance · SOPs: core
cGMP documentation: frequent
NCR / CAPA / DR: frequent
HIPAA · security compliance: frequent
// Contact

Let's build something useful.

Open to Senior / Staff ML Engineer, Agentic AI Engineer, Research Engineer, and Applied Scientist roles. Based in Singapore (PR) — open to remote or relocation for the right team.

Frequently Asked Questions

Who is Ankur Sharma and what does he do?

Ankur Sharma is a Senior ML and Agentic AI Engineer with a PhD from NTU Singapore and 8+ years of experience. He builds production multi-agent AI systems and reasoning-traceable AI for genomics and biomedicine. He is based in Singapore and open to senior ML, agentic AI, and research engineer roles worldwide.

What is agentic-genomics?

agentic-genomics (GenomicsCopilot) is an open-source LangGraph agent for reasoning-traceable variant interpretation. It takes VCF files and HPO phenotype terms and returns ranked, explainable reports of candidate genetic variants. It uses 7 deterministic nodes, including ACMG-lite classification and Phrank HPO scoring, along with a critic LLM for fact-checking. It is available at github.com/ankurgenomics/agentic-genomics under the MIT license.

What is genomics-skills?

genomics-skills is a library of 8 agent-callable Python genomics tools, including TCGA pan-cancer expression analysis (9,479 real patient samples), Kaplan-Meier survival analysis, GO/KEGG enrichment, PubMed search, protein variant mapping, 3D structure viewing, and volcano plots. Each skill has a SKILL.md contract and a CLI entrypoint, and Claude Haiku routes natural-language queries to the right skill. It is available at github.com/ankurgenomics/genomics-skills.

What technologies does Ankur Sharma work with?

Core technologies: LangGraph, Claude/Anthropic API, multi-agent orchestration, RAG, Python, PyTorch, scikit-learn, AWS (Batch, Lambda, S3, Step Functions), Nextflow, Docker, GitHub Actions. Domain expertise: clinical genomics (WGS/WES/RNA-seq/ChIP-seq), DRAGEN, GATK, variant interpretation, ACMG classification, regulatory validation (DVT, FDA compliance, HIPAA).

Is Ankur Sharma available for hire?

Yes. Ankur Sharma is open to Senior/Staff ML Engineer, Agentic AI Engineer, Research Engineer, and Applied Scientist roles. He is based in Singapore as a Permanent Resident and is open to remote work or relocation for the right team. Contact: ankurs103@gmail.com or linkedin.com/in/ankurit.

What results has Ankur Sharma achieved in production?

Key results: built a 5-agent AI system (GenomicsOps AI) as a personal project that cut pipeline failure resolution time from 3 days to 2 hours. Architected cloud genomics pipelines processing 6,000+ samples with 40% lower compute costs, 50% less storage, and 400 TB of data managed. Led clinical validation for oncology and prenatal diagnostic products with FDA compliance.