Hi, I'm Pradyumna
Data Scientist
Impact-driven Data Scientist with close to 5 years of experience building production-grade AI/ML systems across the medical and finance domains for Computer Vision, Audio, and Generative AI use cases. Proven track record in architecting real-time audio chatbot systems, building knowledge-graph-based GraphRAG pipelines, fine-tuning large language models, developing multimodal agents, and leading cross-functional teams. Seeking a Senior Data Scientist role to drive AI innovation at scale.
Skills & Technologies
A battle-tested toolkit spanning GenAI, Knowledge Graphs, Computer Vision, and production MLOps, built over nearly 5 years.
LLM Training & Inference
Agentic & RAG Systems
Knowledge Graphs & Vector Search
Computer Vision & NLP
MLOps & Infrastructure
Cloud & Databases
Work Experience
Nearly 5 years building production AI/ML at scale across GenAI, Medical AI, and Data Science.
Data Scientist
Mondee Pvt. Ltd.
Hyderabad, India
- Architected a medical-grade GraphRAG audio chatbot for our flagship clinical decision support system (CDSS), deployed across Surekha Hospital Chain and BhaktiVedant Hospital, by constructing structured knowledge graphs from medical textbooks using NER and Neo4j, enabling doctors to query alternative possible diagnoses and treatment protocols with traceable, hallucination-resistant responses via a LangChain-powered retrieval layer.
- Engineered a drug–drug interaction checker and dosage scheduler agent served through a Model Context Protocol (MCP) interface, integrating real-time data from RxNorm and PubMed APIs to surface conflict alerts and patient-specific dosage recommendations within the chat session.
- Led end-to-end fine-tuning and serving of Medgemma-27b-text-it for domain-specific clinical NLP tasks using Unsloth and vLLM, coordinating data curation, training, and evaluation pipelines with research scholars from IIT Madras and IIT Hyderabad.
- Directed large-scale Speech-to-Text (STT/ASR) data preparation and multimodal fine-tuning of gemma-3n-e2b-it via LoRA and TGI-based serving, overseeing live clinical audio collection, Subject Matter Expert annotation, and quality control to build a medical voice agent capable of real-time clinical transcription and Doctor's Note generation.
Senior Member Technical (AI/ML)
ADP India Pvt. Ltd.
Hyderabad, India
- Designed and deployed a knowledge-graph-based RAG pipeline on AWS Neptune and Bedrock, orchestrated via LangGraph, enhancing financial data processing accuracy and minimising AI hallucinations; the project was recognised as runner-up in the ADP Global Hackathon (2024).
- Developed an intelligent Process Mining Chatbot for Global Payroll Services using Microsoft Power Automate and Databricks to analyse transaction patterns across millions of client records, optimising payment workflows across 73 payroll cycles per client on average.
- Engineered an agentic assistant built on Google Agent Development Kit and AWS OpenSearch that drafts context-aware emails and schedules meetings in real time by checking live calendars, saving the equivalent of 24 hours per user per month; currently rolled out to all ADP employees.
- Built a scalable QR code detection, decoding, and masking pipeline with a fine-tuned YOLOv8 for multi-orientation detection and OpenCV for automated region masking, sanitising millions of payroll documents prior to downstream processing.
- Developed an Indic PII detection and redaction system for financial regulatory compliance using fine-tuned NER models via HuggingFace Transformers and spaCy with IndicNLP, identifying sensitive entities (Aadhaar, PAN, account numbers, and names) across 10+ Indic scripts in payroll and HR documents.
Junior Data Scientist
Claim Genius Pvt. Ltd.
Remote, India
- Built a high-performance Instance Segmentation pipeline using Detectron2 served via FastAPI for vehicle parts identification, improving segmentation accuracy to 95% mAP and accelerating assessment throughput by 26%; integrated Grad-CAM++ visualisations for regulatory model interpretability.
- Engineered an automated ML pipeline failure tracing system using MLflow to monitor inference-time distributions and surface root-cause analysis on mispredictions, reducing manual diagnosis time by 30%.
- Deployed an image super-resolution and denoising ensemble combining pre-trained Swin2SR Transformer and NAFNet models to upscale and denoise compressed input images, reducing model failures by 16% and improving downstream prediction accuracy by 12%.
- Built an automatic labelling error-detection service using Scikit-learn confidence scoring that flagged curation errors, saving 6 hours per person per sprint, and trained PyTorch GAN models for synthetic image generation to resolve class imbalance in rare damage categories.
- Designed a geometric flat-tyre detection approach via OpenCV polygon analysis, enabling reliable detection with zero curated data, and boosted vehicle damage severity classification by 3% per class through a fusion ensemble combining PyTorch CNN features with XGBoost structured metadata.
Certifications
Agentic Knowledge Graph Construction
DeepLearning.AI × Neo4j
Aug 2025
Building AI Voice Agents for Production
DeepLearning.AI × LiveKit
Jul 2025
Neo4j Fundamentals
Neo4j GraphAcademy
Jul 2025
Pretraining LLMs
DeepLearning.AI × Upstage
Feb 2025
TensorFlow Developer Certificate
Coursera
2023
Deep Learning Specialization
Coursera
2022
An Introduction To Practical Deep Learning
Intel - Coursera
2022
Technical Support Fundamentals
Google - Coursera
2021
Education
M.Sc. Computer Science (Big Data Analytics)
Central University of Rajasthan
Kishangarh, India
Integrated B.Sc. B.Ed. (Physical Sciences and Education)
Regional Institute of Education (NCERT), Bhubaneswar
Bhubaneswar, India
Featured Projects
Research and personal projects spanning multi-label learning, medical AI, quantum computing, and recommender systems.
LLSF: Learning Label-Specific Features
Research paper implementation from scratch for improving multi-label classification on imbalanced datasets using the Label-Specific Feature learning (LLSF) algorithm.
LLSF-DL MLSMOTE Hybrid for Tail Labels
Master's thesis: a hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE to address the tail-labels problem in multi-label classification.
LIFT: Multi-Label Learning with Label-Specific Features
Research paper implementation of the LIFT algorithm — learns label-specific feature transformations to improve multi-label classification performance.
NCERT AI Teacher Assistant
AI-powered assistant for teachers that automatically creates lesson plans, drafts question sets, and supports classroom preparation workflows.
Multi-label Datasets
Curated collection of multi-label classification benchmarks used across research experiments on label imbalance, feature learning, and SMOTE-based augmentation.
Fuzzy Computing Programs
Implementations of fuzzy sets, fuzzy logic inference systems, and fuzzy control applications from the Fuzzy Computing course at CURAJ.
Early Days of Machine Learning
A collection of ML algorithm implementations from 2018 — supervised learning, unsupervised learning, and regression techniques built from first principles.
DSA 101
Ongoing journey through Data Structures and Algorithms — curated problem sets, solutions, and notes in Python.
Latest Articles
Deep-dives into ML research, audio source separation, and multilingual NLP.

Salesforce Uses AWS Textract For Intelligent Document Automation
The healthcare domain has received all-time higher attention because of the current pandemic... Read the full article →

Extracting Vocals And Instrumentals From Music The Deep Learning Way
Whenever people get exposed to good music, the tune gets stuck in their heads for hours. And at some point, they google up the lyrics, vocals, and instrumental... Read the full article →

Microsoft Speller100: A Spell-Checker For Over 100 Languages
People do not care enough to use their queries’ correct spelling while searching for anything online... Read the full article →

A Deep Dive Into IBM Quantum Roadmap
“It took us 60 years from the first logic gates to modern cloud services. But IBM has set itself on a mission to fast forward the same journey for Quantum Computation (QC) to 3 years,” Jay G.. Read the full article →

IIT Kanpur Offers Free 8-Weeks Computational Science Course, Enrollments Ends 15th Feb
IIT Kanpur has opened up the enrollment for an eight-week online course on computational science on the SWAYAM platform. An AvHumboldt Fellow with over 50 publications in his name, Dr... Read the full article →

Dealing With Racially-Biased Hate-Speech Detection Models
Hate-speech detection models are the most glaring example of biased models, as shown by researchers from Allen Institute for Artificial Intelligence in their linguistic study... Read the full article →
Get In Touch
Open to senior AI/ML roles, GenAI research collaborations, and consulting opportunities.
Pradyumna Kumar Sahoo
Data Scientist
📍 Hyderabad, India
pradyumna.sahoo@outlook.in
✉️ Send Me an Email
Built with Next.js · Tailwind CSS · Framer Motion · © 2026 Pradyumna Kumar Sahoo