Open to Senior Roles

Hi, I'm Pradyumna

Senior Data Scientist

Impact-driven Data Scientist with 5 years of experience building production-grade AI/ML systems across Medical AI, NLP, Computer Vision, and Generative AI. Proven track record of architecting GraphRAG pipelines, fine-tuning large medical language models, developing voice agents for clinical applications, and leading cross-functional teams.

5+ Years of Experience
15+ Projects Built
2 IIT Collaborations
3 Companies
Pradyumna Kumar Sahoo
🕸️GraphRAG
🧠LLM Fine-tuning
🏥Medical AI
GenAI
Expertise

Skills & Technologies

A battle-tested toolkit spanning GenAI, Knowledge Graphs, Computer Vision, and production MLOps — built over 5+ years.

🧠

GenAI & LLMs

MedGemma-27B · Gemma-3n (ASR) · LLaMA · GPT-4 · PEFT / LoRA · LangChain · LangGraph · Prompt Engineering · Hugging Face Transformers · RAG Pipelines · Agentic Systems
🕸️

Knowledge Graphs & RAG

GraphRAG · AWS Neptune · Neo4j · RxNorm API · PubMed API · Vector Databases · Knowledge Graph Construction · Agentic KG · Drug Interaction Systems
👁️

Computer Vision & NLP

PyTorch · Detectron2 · YOLO · OpenCV · GradCAM++ · GAN / Synthetic Data · Instance Segmentation · spaCy · NLTK · ASR Fine-tuning
⚙️

MLOps & Infrastructure

AWS (Neptune, S3, Lambda, Bedrock) · Docker · FastAPI · Apache Spark · Airflow · ETL Pipelines · Process Mining · Real-time Analytics
📊

Core Data Science

Scikit-learn · TensorFlow · PostgreSQL · Machine Learning · Statistical Modeling · Big Data Analytics · Python · MATLAB
Career

Work Experience

5 years building production AI/ML at scale across GenAI, Medical AI, and Data Science.

Senior Data Scientist

Mondee Pvt. Ltd.

Hyderabad, India

August 2025 – Present
  • Architected a medical-grade GraphRAG chatbot for clinical decision support, grounding answers to drug queries in structured knowledge graphs built from medical textbooks so that responses are traceable and resistant to hallucination.
  • Engineered a drug–drug interaction checker and dosage scheduler integrating real-time RxNorm and PubMed APIs for conflict alerts and patient-specific recommendations.
  • Led end-to-end fine-tuning of Medgemma-27b-text-it for clinical NLP, coordinating data curation, training, and evaluation with teams from IIT Madras and IIT Hyderabad.
  • Directed large-scale ASR data preparation for gemma-3n-e2b-it to build a medical voice agent capable of real-time clinical transcription and query resolution.
GraphRAG · Medical AI · LLM Fine-tuning · ASR · Knowledge Graphs

Data Scientist

ADP India Pvt. Ltd.

Hyderabad, India

December 2023 – July 2025
  • Designed a Knowledge-Graph RAG pipeline on AWS Neptune & Bedrock that reduced AI hallucinations on financial data; the project was runner-up in the ADP Global Hackathon.
  • Built a process mining solution that analyses millions of client records to optimise payment workflows, covering an average of 73 payroll cycles per client.
  • Engineered an agentic assistant that drafts emails and schedules meetings in real time, saving 24 hours per user per month; now rolled out to all ADP employees.
AWS Neptune · LangGraph · Process Mining · Agentic AI · GraphRAG

Junior Data Scientist

Claim Genius Pvt. Ltd.

Remote, India

June 2021 – December 2023
  • Built a high-performance Instance Segmentation pipeline with Detectron2 + FastAPI, improving mAP by 12% and accelerating vehicle damage assessment by 26%.
  • Enhanced model interpretability for regulatory compliance by integrating GradCAM++ visualisations.
  • Trained GAN models for synthetic image generation to resolve class imbalance in insurance damage detection datasets.
Detectron2 · FastAPI · GradCAM++ · GAN · Computer Vision

Certifications

🕸️

Agentic Knowledge Graph Construction

DeepLearning.AI × Neo4j

Aug 2025

🗃️

Neo4j Fundamentals

Neo4j GraphAcademy

Jul 2025

🧠

Pretraining LLMs

DeepLearning.AI × Upstage

Feb 2025

🔧

TensorFlow Developer Certificate

Coursera

2023

🧠

Deep Learning Specialization

Coursera

2022

An Introduction to Practical Deep Learning

Intel - Coursera

2022

💬

Technical Support Fundamentals

Google - Coursera

2021

Education

🎓

M.Sc. Computer Science (Big Data Analytics)

Central University of Rajasthan

Kishangarh, India

⚛️

Integrated B.Sc. B.Ed. (Physical Sciences and Education)

Regional Institute of Education (NCERT), Bhubaneswar

Bhubaneswar, India

Open Source

Featured Projects

Research and personal projects spanning multi-label learning, medical AI, quantum computing, and recommender systems.

🏷️
8

Algorithm Development: Multi-label Classification Enhancement

Work at TCS Big Data Lab, Rajasthan, on improving multi-label classification by generating synthetic data for rare labels with the MLSMOTE technique. Addresses the tail-label problem, in which classifiers struggle with underrepresented labels.

Multi-label · Data Augmentation · TCS Big Data Lab · Deep Learning
Python
🧬
6

LLSF_DL-MLSMOTE-Hybrid

Hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE for multi-label classification. Implements the LLSF-DL algorithm for improved classification performance on imbalanced datasets.

Multi-label · Deep Learning · SMOTE
MATLAB
🔬
4

LLSF-Learning-Label-Specific-Features

Implementation of the Learning Label-Specific Features (LLSF) algorithm for multi-label classification. Enables feature selection by ranking features according to their relevance to each label.

Multi-label · Feature Learning · Classification
Jupyter Notebook
🕸️
5

Session-based Recommendation with Graph Neural Networks

Graph neural network-based recommender for session-based recommendation. Learns item features from session graph structure to recommend items while a session is still in progress.

GNN · Recommender System · Graph Learning
Python
4

Electricity Price Prediction using ELM-PSO-ARIMA

Hybrid model combining Extreme Learning Machine, Particle Swarm Optimization, and ARIMA to capture frequent changes in electricity prices with improved accuracy.

Time Series · Optimization · ELM · Energy
Python
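A minimal sketch of just the Extreme Learning Machine (ELM) part of this hybrid, for readers unfamiliar with it: the input weights are drawn at random and frozen, and only the output weights are fitted, in closed form, with a pseudoinverse. The PSO and ARIMA stages are not shown; in hybrids of this kind PSO is commonly used to tune the random input weights, but how this project combines the three is not described here. The class name, the sigmoid activation, the hidden-layer size, and the `lagged` helper that turns a price series into lag features are all illustrative assumptions.

```python
import numpy as np


def lagged(series, n_lags):
    """Hypothetical helper: the previous n_lags values are used to predict the next one."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y


class ELMRegressor:
    """Minimal single-hidden-layer Extreme Learning Machine for regression."""

    def __init__(self, n_hidden=50, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of a fixed random projection of the inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are sampled once and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because the only trained parameters come from a single least-squares solve, fitting is fast, which is what makes it practical to wrap an ELM in an outer search loop such as PSO.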
🛡️
5

SVM-kNN-PSO Ensemble for Intrusion Detection

Novel ensemble method combining Support Vector Machines, k-Nearest Neighbors, and Particle Swarm Optimization for a robust intrusion detection system (IDS).

Security · Ensemble Learning · PSO · IDS
Python
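The card does not say what PSO optimizes inside the ensemble; searching SVM hyperparameters (such as C and gamma) or ensemble weights are common choices, but that is an assumption here. The sketch below shows only a standard global-best particle swarm optimizer minimizing a function over a box; the function name, the inertia weight `w`, and the acceleration coefficients `c1`, `c2` are illustrative values, not taken from the project.

```python
import numpy as np


def pso_minimize(f, bounds, n_particles=30, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=None):
    """Minimal global-best PSO: minimize f over the box given by bounds [(lo, hi), ...]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    pos = rng.uniform(lo, hi, size=(n_particles, d))
    vel = np.zeros_like(pos)
    # Personal best position and score of every particle.
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        # Velocity = inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)  # keep particles inside the search box
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()
```

To tune a classifier with it, `f` would return a validation error (for example, 1 minus cross-validated accuracy) for the parameter vector a particle proposes.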
🏥
3

Brain-Tumor-Segmentation

Deep learning-based brain tumor segmentation of MRI scans, using a U-Net architecture with attention mechanisms to support medical image analysis.

Medical AI · Segmentation · U-Net
Python
⚖️
3

MLSMOTE

Multi-Label Synthetic Minority Over-sampling Technique for handling class imbalance in multi-label datasets. Generates synthetic samples for minority labels.

Data Augmentation · Class Imbalance
Python
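A minimal NumPy sketch of the MLSMOTE idea described on this card, not the repository's code. It assumes purely numeric features (Euclidean distance), a binary label matrix `Y`, and one synthetic sample per instance of each minority label. A label counts as a minority label when its imbalance ratio IRLbl (count of the most frequent label divided by this label's count) exceeds the mean ratio MeanIR. Each synthetic sample interpolates between a seed and a random neighbor among the instances sharing that label, and its labelset is a ranking vote: a label is kept if it appears in more than half of the seed plus its neighbors. The function names are hypothetical.

```python
import numpy as np


def imbalance_ratios(Y):
    """IRLbl per label: max label count / this label's count (NaN for labels never present)."""
    counts = Y.sum(axis=0).astype(float)
    counts[counts == 0] = np.nan
    return np.nanmax(counts) / counts


def mlsmote(X, Y, k=5, seed=None):
    """Return X, Y with synthetic samples appended for every minority label."""
    rng = np.random.default_rng(seed)
    irlbl = imbalance_ratios(Y)
    mean_ir = np.nanmean(irlbl)
    minority = [j for j in range(Y.shape[1]) if irlbl[j] > mean_ir]
    new_X, new_Y = [], []
    for j in minority:
        bag = np.flatnonzero(Y[:, j] == 1)  # instances carrying minority label j
        if len(bag) < 2:
            continue
        for i in bag:
            # k nearest neighbors of instance i within the bag, excluding i itself.
            dist = np.linalg.norm(X[bag] - X[i], axis=1)
            neighbors = [b for b in bag[np.argsort(dist)] if b != i][:k]
            ref = rng.choice(neighbors)
            # Synthetic features: random point on the segment from the seed to ref.
            new_X.append(X[i] + rng.random() * (X[ref] - X[i]))
            # Ranking vote over the seed and its neighbors decides the labelset.
            votes = Y[[i, *neighbors]].sum(axis=0)
            new_Y.append((votes > (len(neighbors) + 1) / 2).astype(Y.dtype))
    if not new_X:
        return X, Y
    return np.vstack([X, np.array(new_X)]), np.vstack([Y, np.array(new_Y)])
```

Because every neighbor is drawn from the bag of instances that carry label `j`, each synthetic sample always keeps that minority label, while the vote decides which co-occurring labels come along with it.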
🎯
2

Rule-based-Recommender-system

Recommendation engine that combines association rules and collaborative filtering with NLP to generate personalized recommendations.

Recommender System · NLP · Association Rules
Python
⚛️
1

My-first-quantum-code

Quantum computing experiments using Qiskit — exploring quantum circuit simulations and entanglement phenomena.

Quantum Computing · Qiskit
Python
Writing

Latest Articles

Deep dives into ML research, audio source separation, and multilingual NLP.

Let's Connect

Get In Touch

Open to senior AI/ML roles, GenAI research collaborations, and consulting opportunities.

Pradyumna Kumar Sahoo

Senior Data Scientist

📍 Hyderabad, India

pradyumna.sahoo@outlook.in

✉️ Send Me an Email

Built with Next.js · Tailwind CSS · Framer Motion — © 2026 Pradyumna Kumar Sahoo