Open to Senior Roles

Hi, I'm Pradyumna

Data Scientist

Impact-driven Data Scientist with close to 5 years of experience building production-grade AI/ML systems across the medical and finance domains for Computer Vision, Audio, and Generative AI use cases. Proven track record in architecting real-time audio chatbot systems, building knowledge-graph-based GraphRAG pipelines, fine-tuning large language models, developing multimodal agents, and leading cross-functional teams. Seeking a Senior Data Scientist role to drive AI innovation at scale.

5
Years Experience
16+
Projects Built
2
IIT Collaborations
Pradyumna Kumar Sahoo
🕸️ GraphRAG
🧠 LLM Fine-tuning
🏥 Medical AI
GenAI
Expertise

Skills & Technologies

A battle-tested toolkit spanning GenAI, Knowledge Graphs, Computer Vision, and production MLOps, built over close to 5 years.

🧠

LLM Training & Inference

Full Fine-tuning · PEFT / LoRA / QLoRA · Instruction Tuning · RLHF / DPO · Mixed-precision (bf16/fp16) · Gradient Checkpointing · vLLM · Quantisation (GPTQ / AWQ / bitsandbytes) · HuggingFace Transformers · DSPy · Unsloth
🤖

Agentic & RAG Systems

LangChain · LangGraph · LiveKit · Tool-use / Function Calling · Multi-agent Orchestration · Multimodal RAG · Model Context Protocol (MCP) · Google Agent Development Kit · Prompt Engineering
🕸️

Knowledge Graphs & Vector Search

Neo4j · AWS Neptune · Qdrant · LanceDB · Knowledge Graph Construction · Agentic KG · RxNorm API · PubMed API · Drug Interaction Systems
👁️

Computer Vision & NLP

PyTorch · Detectron2 · YOLO · OpenCV · GradCAM++ · GAN / Synthetic Data · Instance Segmentation · SpaCy · NLTK · ASR Fine-tuning · SWIN2SR · NAFNet
⚙️

MLOps & Infrastructure

Weights & Biases · MLflow · Docker · FastAPI · Apache Airflow · Apache Spark (Databricks) · ETL Pipeline Design · Process Mining · CI/CD for ML · Git
☁️

Cloud & Databases

AWS Bedrock · AWS Neptune · AWS S3 / Lambda / EC2 · AWS OpenSearch · GCP Vertex AI · Azure ML · PostgreSQL · MongoDB · SQLite · Python · SQL · Bash
Career

Work Experience

5 years building production AI/ML at scale across GenAI, Medical AI, and Data Science.

Data Scientist

Mondee Pvt. Ltd.

Hyderabad, India

August 2025 – Present
  • Architected a medical-grade GraphRAG audio chatbot for our flagship clinical decision support system (CDSS), deployed across Surekha Hospital Chain and BhaktiVedant Hospital, by constructing structured knowledge graphs from medical textbooks using NER and Neo4j, enabling doctors to query alternative possible diagnoses and treatment protocols with traceable, hallucination-resistant responses via a LangChain-powered retrieval layer.
  • Engineered a drug–drug interaction checker and dosage scheduler agent served through a Model Context Protocol (MCP) interface, integrating real-time data from RxNorm and PubMed APIs to surface conflict alerts and patient-specific dosage recommendations within the chat session.
  • Led end-to-end fine-tuning and serving of Medgemma-27b-text-it for domain-specific clinical NLP tasks using Unsloth and vLLM, coordinating data curation, training, and evaluation pipelines with research scholars from IIT Madras and IIT Hyderabad.
  • Directed large-scale Speech-to-Text (STT/ASR) data preparation and multimodal fine-tuning of gemma-3n-e2b-it via LoRA and TGI-based serving, overseeing live clinical audio collection, Subject Matter Expert annotation, and quality control to build a medical voice agent capable of real-time clinical transcription and Doctor's Note generation.
Medical GraphRAG · Clinical NLP · LLM Fine-tuning · Voice AI · Medgemma-27b · MCP

Senior Member Technical (AI/ML)

ADP India Pvt. Ltd.

Hyderabad, India

December 2023 – July 2025
  • Designed and deployed a Knowledge-Graph based RAG pipeline on AWS Neptune & Bedrock orchestrated via LangGraph, enhancing financial data processing accuracy and minimising AI hallucinations — recognised as runner-up in the ADP Global Hackathon (2024).
  • Developed an intelligent Process Mining Chatbot for Global Payroll Services using Microsoft Power Automate and Databricks to analyse transaction patterns across millions of client records, optimising payment workflows across 73 payroll cycles per client on average.
  • Engineered an agentic assistant built on Google Agent Development Kit and AWS OpenSearch that drafts context-aware emails and schedules meetings in real time by checking live calendars, saving the equivalent of 24 hours per user per month; currently rolled out across all ADP employees.
  • Built a scalable QR code detection, decoding, and masking pipeline with a fine-tuned YOLOv8 for multi-orientation detection and OpenCV for automated region masking, sanitising millions of payroll documents prior to downstream processing.
  • Developed an Indic PII detection and redaction system for financial regulatory compliance using fine-tuned NER models via HuggingFace Transformers and SpaCy with IndicNLP, identifying sensitive entities — Aadhaar, PAN, account numbers, names across 10+ Indic scripts — from payroll and HR documents.
AWS Neptune · LangGraph · Process Mining · Agentic AI · Finance AI · YOLOv8 · Indic NLP

Junior Data Scientist

Claim Genius Pvt. Ltd.

Remote, India

June 2021 – December 2023
  • Built a high-performance Instance Segmentation pipeline using Detectron2 served via FastAPI for vehicle parts identification, improving segmentation accuracy to 95% mAP and accelerating assessment throughput by 26%; integrated GradCAM++ visualisations for regulatory model interpretability.
  • Engineered an automated ML pipeline failure tracing system using MLflow to monitor inference-time distributions and surface root-cause analysis on mispredictions, reducing manual diagnosis time by 30%.
  • Deployed an image super-resolution and denoising ensemble combining SWIN2SR Transformer and NAFNet pre-trained models to upscale and denoise compressed input images, reducing model failures by 16% and improving downstream prediction accuracy by 12%.
  • Built an automatic labelling error-detection service using Scikit-learn confidence scoring that flagged curation errors, saving 6 hours per person per sprint, and trained PyTorch GAN models for synthetic image generation to resolve class imbalance in rare damage categories.
  • Designed a geometric flat-tyre detection approach via OpenCV polygon analysis enabling reliable detection with zero curated data, and boosted vehicle damage severity classification by 3% per class through a fusion ensemble combining PyTorch CNN features with XGBoost structured metadata.
Detectron2 · FastAPI · GradCAM++ · GAN · Instance Segmentation · MLflow · XGBoost

Education

🎓

M.Sc. Computer Science (Big Data Analytics)

Central University of Rajasthan

Kishangarh, India

⚛️

Integrated B.Sc. B.Ed. (Physical Sciences and Education)

Regional Institute of Education (NCERT), Bhubaneswar

Bhubaneswar, India

Open Source

Featured Projects

Research and personal projects spanning multi-label learning, medical AI, quantum computing, and recommender systems.

🔬
12

LLSF: Learning Label-Specific Features

Research paper implementation from scratch for improving multi-label classification on imbalanced datasets using the Label-Specific Feature learning (LLSF) algorithm.

Multi-label · Feature Learning · Class Imbalance · Research
Python
🧬
9

LLSF-DL MLSMOTE Hybrid for Tail Labels

Master's thesis: a hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE to address the tail-labels problem in multi-label classification.

Multi-label · Deep Learning · SMOTE · Thesis
Python
🏋️
6

LIFT: Multi-Label Learning with Label-Specific Features

Research paper implementation of the LIFT algorithm — learns label-specific feature transformations to improve multi-label classification performance.

Multi-label · Feature Learning · Classification
Python
📚
4

NCERT AI Teacher Assistant

AI-powered assistant for teachers that automatically creates lesson plans, drafts question sets, and supports classroom preparation workflows.

GenAI · Education AI · LLM · Agentic
Python
📊
1

Multi-label Datasets

Curated collection of multi-label classification benchmarks used across research experiments on label imbalance, feature learning, and SMOTE-based augmentation.

Dataset · Multi-label · Research
Python
🔀
1

Fuzzy Computing Programs

Implementations of fuzzy sets, fuzzy logic inference systems, and fuzzy control applications from the Fuzzy Computing course at CURAJ.

Fuzzy Logic · Control Systems · Academic
Jupyter Notebook
🌱
1

Early Days of Machine Learning

A collection of ML algorithm implementations from 2018 — supervised learning, unsupervised learning, and regression techniques built from first principles.

Machine Learning · Scikit-learn · Foundations
Jupyter Notebook
🧩
0

DSA 101

Ongoing journey through Data Structures and Algorithms — curated problem sets, solutions, and notes in Python.

DSA · Python · Problem Solving
Python
Writing

Latest Articles

Deep-dives into ML research, audio source separation, and multilingual NLP.

Let's Connect

Get In Touch

Open to senior AI/ML roles, GenAI research collaborations, and consulting opportunities.

Pradyumna Kumar Sahoo

Senior Data Scientist

📍 Hyderabad, India

pradyumna.sahoo@outlook.in

✉️ Send Me an Email

Built with Next.js · Tailwind CSS · Framer Motion — © 2026 Pradyumna Kumar Sahoo