Hi, I'm
Pradyumna
Senior Data Scientist
Impact-driven Data Scientist with close to 5 years of experience building production-grade AI/ML systems across the medical and finance domains for Computer Vision, Audio, and Generative AI use cases. Proven track record in architecting real-time audio chatbot systems and knowledge-graph-based GraphRAG pipelines, fine-tuning large language models, developing multimodal agents, and leading cross-functional teams. Seeking a Senior Data Scientist role to drive AI innovation at scale.
Skills & Technologies
A battle-tested toolkit spanning GenAI, Knowledge Graphs, Computer Vision, and production MLOps, built over close to 5 years.
LLM Training & Inference
Agentic & RAG Systems
Knowledge Graphs & Vector Search
Computer Vision & NLP
MLOps & Infrastructure
Cloud & Databases
Work Experience
Close to 5 years building production AI/ML at scale across GenAI, Medical AI, and Data Science.
Senior Data Scientist
Mondee Pvt. Ltd.
Hyderabad, India
- Architected a medical-grade GraphRAG audio chatbot for our flagship clinical decision support system (CDSS), deployed across Surekha Hospital Chain and BhaktiVedant Hospital, by constructing structured knowledge graphs from medical textbooks using NER and Neo4j, enabling doctors to query alternative possible diagnoses and treatment protocols with traceable, hallucination-resistant responses via a LangChain-powered retrieval layer.
- Engineered a drug–drug interaction checker and dosage scheduler agent served through a Model Context Protocol (MCP) interface, integrating real-time data from RxNorm and PubMed APIs to surface conflict alerts and patient-specific dosage recommendations within the chat session.
- Led end-to-end fine-tuning and serving of Medgemma-27b-text-it for domain-specific clinical NLP tasks using Unsloth and vLLM, coordinating data curation, training, and evaluation pipelines with research scholars from IIT Madras and IIT Hyderabad.
- Directed large-scale Speech-to-Text (STT/ASR) data preparation and multimodal fine-tuning of gemma-3n-e2b-it via LoRA and TGI-based serving, overseeing live clinical audio collection, subject-matter-expert annotation, and quality control to build a medical voice agent capable of real-time clinical transcription and doctor's note generation.
Senior Member Technical (AI/ML)
ADP India Pvt. Ltd.
Hyderabad, India
- Designed and deployed a knowledge-graph-based RAG pipeline on AWS Neptune and Bedrock, orchestrated via LangGraph, enhancing financial data processing accuracy and minimising AI hallucinations; recognised as runner-up in the ADP Global Hackathon (2024).
- Developed an intelligent Process Mining Chatbot for Global Payroll Services using Microsoft Power Automate and Databricks to analyse transaction patterns across millions of client records, optimising payment workflows across 73 payroll cycles per client on average.
- Engineered an agentic assistant built on Google Agent Development Kit and AWS OpenSearch that drafts context-aware emails and schedules meetings in real time by checking live calendars, saving the equivalent of 24 hours per user per month; currently rolled out to all ADP employees.
- Built a scalable QR code detection, decoding, and masking pipeline with a fine-tuned YOLOv8 for multi-orientation detection and OpenCV for automated region masking, sanitising millions of payroll documents prior to downstream processing.
- Developed an Indic PII detection and redaction system for financial regulatory compliance using fine-tuned NER models via HuggingFace Transformers and spaCy with IndicNLP, identifying sensitive entities (Aadhaar, PAN, account numbers, and names across 10+ Indic scripts) in payroll and HR documents.
Junior Data Scientist
Claim Genius Pvt. Ltd.
Remote, India
- Built a high-performance instance segmentation pipeline using Detectron2 served via FastAPI for vehicle parts identification, improving segmentation accuracy to 95% mAP and accelerating assessment throughput by 26%; integrated Grad-CAM++ visualisations for regulatory model interpretability.
- Engineered an automated ML pipeline failure tracing system using MLflow to monitor inference-time distributions and surface root-cause analysis on mispredictions, reducing manual diagnosis time by 30%.
- Deployed an image super-resolution and denoising ensemble combining pre-trained Swin2SR Transformer and NAFNet models to upscale and denoise compressed input images, reducing model failures by 16% and improving downstream prediction accuracy by 12%.
- Built an automatic labelling error-detection service using Scikit-learn confidence scoring that flagged curation errors, saving 6 hours per person per sprint, and trained PyTorch GAN models for synthetic image generation to resolve class imbalance in rare damage categories.
- Designed a geometric flat-tyre detection approach via OpenCV polygon analysis, enabling reliable detection with zero curated data, and boosted vehicle damage severity classification by 3% per class through a fusion ensemble combining PyTorch CNN features with XGBoost structured metadata.
Certifications
Agentic Knowledge Graph Construction
DeepLearning.AI × Neo4j
Aug 2025
Neo4j Fundamentals
Neo4j GraphAcademy
Jul 2025
Pretraining LLMs
DeepLearning.AI × Upstage
Feb 2025
TensorFlow Developer Certificate
Coursera
2023
Deep Learning Specialization
Coursera
2022
An Introduction To Practical Deep Learning
Intel - Coursera
2022
Technical Support Fundamentals
Google - Coursera
2021
Education
M.Sc. Computer Science (Big Data Analytics)
Central University of Rajasthan
Kishangarh, India
Integrated B.Sc. B.Ed. (Physical Sciences and Education)
Regional Institute of Education (NCERT), Bhubaneswar
Bhubaneswar, India
Featured Projects
Research and personal projects spanning multi-label learning, medical AI, quantum computing, and recommender systems.
Algorithm Development: Multi-label Classification Enhancement
Improving multi-label classification by generating synthetic data for rare labels using the MLSMOTE technique at TCS Big Data Lab, Rajasthan. Addressed the tail-labels problem, where classifiers struggle with underrepresented labels.
LLSF_DL-MLSMOTE-Hybrid
Hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE for multi-label classification. Implements the LLSF-DL algorithm for improved classification performance on imbalanced datasets.
LLSF-Learning-Label-Specific-Features
Implementation of the Learning Label-Specific Features (LLSF) algorithm for multi-label classification. Enables feature selection by ranking features according to their relevance to each label.
Session-based Recommendation with Graph Neural Networks
Graph Neural Network-based recommendation system for session-based learning. Captures essential features from graph structures to recommend items during ongoing sessions.
Electricity Price Prediction using ELM-PSO-ARIMA
Hybrid model combining Extreme Learning Machine, Particle Swarm Optimization, and ARIMA to capture frequent changes in electricity prices with improved accuracy.
SVM-kNN-PSO Ensemble for Intrusion Detection
Novel ensemble method combining Support Vector Machines, k-Nearest Neighbors, and Particle Swarm Optimization for a robust intrusion detection system.
Brain-Tumor-Segmentation
Deep learning brain tumor segmentation from MRI scans using a U-Net architecture with attention mechanisms for improved medical imaging analysis.
MLSMOTE
Multi-Label Synthetic Minority Over-sampling Technique for handling class imbalance in multi-label datasets. Generates synthetic samples for minority labels.
Rule-based-Recommender-system
Rule-based recommendation engine using association rules and collaborative filtering techniques with NLP for personalized recommendations.
My-first-quantum-code
Quantum computing experiments using Qiskit, exploring quantum circuit simulations and entanglement phenomena.
Latest Articles
Deep-dives into ML research, audio source separation, and multilingual NLP.

Salesforce Uses AWS Textract For Intelligent Document Automation
The healthcare domain has received all-time higher attention because of the current pandemic... Read the full article →

Extracting Vocals And Instrumentals From Music The Deep Learning Way
Whenever people get exposed to good music, the tune gets stuck in their heads for hours. And at some point, they google up the lyrics, vocals, and instrumental... Read the full article →

Microsoft Speller100: A Spell-Checker For Over 100 Languages
People do not care enough to use their queries’ correct spelling while searching for anything online... Read the full article →

A Deep Dive Into IBM Quantum Roadmap
“It took us 60 years from the first logic gates to modern cloud services. But IBM has set itself on a mission to fast forward the same journey for Quantum Computation (QC) to 3 years,” Jay G.. Read the full article →

IIT Kanpur Offers Free 8-Weeks Computational Science Course, Enrollments Ends 15th Feb
IIT Kanpur has opened up the enrollment for an eight-week online course on computational science on the SWAYAM platform. An AvHumboldt Fellow with over 50 publications in his name, Dr... Read the full article →

Dealing With Racially-Biased Hate-Speech Detection Models
Hate-speech detection models are the most glaring example of biased models, as shown by researchers from Allen Institute for Artificial Intelligence in their linguistic study... Read the full article →
Get In Touch
Open to senior AI/ML roles, GenAI research collaborations, and consulting opportunities.
Pradyumna Kumar Sahoo
Senior Data Scientist
📍 Hyderabad, India
pradyumna.sahoo@outlook.in
✉️ Send Me an Email
Built with Next.js · Tailwind CSS · Framer Motion · © 2026 Pradyumna Kumar Sahoo