Open to Senior Roles

Hi, I'm Pradyumna

Senior Data Scientist

Impact-driven Data Scientist with close to 5 years of experience building production-grade AI/ML systems across the medical and finance domains for computer vision, audio, and generative AI use cases. Proven track record in architecting real-time audio chatbot systems and knowledge-graph-based GraphRAG pipelines, fine-tuning large language models, developing multimodal agents, and leading cross-functional teams. Seeking a Senior Data Scientist role to drive AI innovation at scale.

5+
Years Experience
15+
Projects Built
2
IIT Collaborations
3
Companies
Pradyumna Kumar Sahoo
🕸️GraphRAG
🧠LLM Fine-tuning
🏥Medical AI
GenAI
Expertise

Skills & Technologies

A battle-tested toolkit spanning GenAI, Knowledge Graphs, Computer Vision, and production MLOps — built over 5+ years.

🧠

LLM Training & Inference

Full Fine-tuning · PEFT / LoRA / QLoRA · Instruction Tuning · RLHF / DPO · Mixed-precision (bf16/fp16) · Gradient Checkpointing · vLLM · Quantisation (GPTQ / AWQ / bitsandbytes) · HuggingFace Transformers · DSPy · Unsloth
🤖

Agentic & RAG Systems

LangChain · LangGraph · LiveKit · Tool-use / Function Calling · Multi-agent Orchestration · Multimodal RAG · Model Context Protocol (MCP) · Google Agent Development Kit · Prompt Engineering
🕸️

Knowledge Graphs & Vector Search

Neo4j · AWS Neptune · Qdrant · LanceDB · Knowledge Graph Construction · Agentic KG · RxNorm API · PubMed API · Drug Interaction Systems
👁️

Computer Vision & NLP

PyTorch · Detectron2 · YOLO · OpenCV · GradCAM++ · GAN / Synthetic Data · Instance Segmentation · SpaCy · NLTK · ASR Fine-tuning · SWIN2SR · NAFNet
⚙️

MLOps & Infrastructure

Weights & Biases · MLflow · Docker · FastAPI · Apache Airflow · Apache Spark (Databricks) · ETL Pipeline Design · Process Mining · CI/CD for ML · Git
☁️

Cloud & Databases

AWS Bedrock · AWS Neptune · AWS S3 / Lambda / EC2 · AWS OpenSearch · GCP Vertex AI · Azure ML · PostgreSQL · MongoDB · SQLite · Python · SQL · Bash
Career

Work Experience

5 years building production AI/ML at scale across GenAI, Medical AI, and Data Science.

Senior Data Scientist

Mondee Pvt. Ltd.

Hyderabad, India

August 2025 – Present
  • Architected a medical-grade GraphRAG audio chatbot for our flagship clinical decision support system (CDSS), deployed across Surekha Hospital Chain and BhaktiVedant Hospital. Built structured knowledge graphs from medical textbooks using NER and Neo4j, enabling doctors to query alternative diagnoses and treatment protocols and receive traceable, hallucination-resistant responses via a LangChain-powered retrieval layer.
  • Engineered a drug–drug interaction checker and dosage scheduler agent served through a Model Context Protocol (MCP) interface, integrating real-time data from RxNorm and PubMed APIs to surface conflict alerts and patient-specific dosage recommendations within the chat session.
  • Led end-to-end fine-tuning and serving of Medgemma-27b-text-it for domain-specific clinical NLP tasks using Unsloth and vLLM, coordinating data curation, training, and evaluation pipelines with research scholars from IIT Madras and IIT Hyderabad.
  • Directed large-scale Speech-to-Text (STT/ASR) data preparation and multimodal fine-tuning of gemma-3n-e2b-it via LoRA and TGI-based serving, overseeing live clinical audio collection, Subject Matter Expert annotation, and quality control to build a medical voice agent capable of real-time clinical transcription and Doctor's Note generation.
Medical GraphRAG · Clinical NLP · LLM Fine-tuning · Voice AI · Medgemma-27b · MCP

Senior Member Technical (AI/ML)

ADP India Pvt. Ltd.

Hyderabad, India

December 2023 – July 2025
  • Designed and deployed a knowledge-graph-based RAG pipeline on AWS Neptune and Bedrock, orchestrated via LangGraph, enhancing financial data processing accuracy and minimising AI hallucinations; recognised as runner-up in the ADP Global Hackathon (2024).
  • Developed an intelligent Process Mining Chatbot for Global Payroll Services using Microsoft Power Automate and Databricks to analyse transaction patterns across millions of client records, optimising payment workflows across 73 payroll cycles per client on average.
  • Engineered an agentic assistant built on Google Agent Development Kit and AWS OpenSearch that drafts context-aware emails and schedules meetings in real time by checking live calendars, saving the equivalent of 24 hours per user per month; currently rolled out to all ADP employees.
  • Built a scalable QR code detection, decoding, and masking pipeline with a fine-tuned YOLOv8 for multi-orientation detection and OpenCV for automated region masking, sanitising financial documents prior to downstream processing at scale across millions of payroll documents.
  • Developed an Indic PII detection and redaction system for financial regulatory compliance using fine-tuned NER models via HuggingFace Transformers and SpaCy with IndicNLP, identifying sensitive entities (Aadhaar, PAN, account numbers, and names) across 10+ Indic scripts in payroll and HR documents.
AWS Neptune · LangGraph · Process Mining · Agentic AI · Finance AI · YOLOv8 · Indic NLP

Junior Data Scientist

Claim Genius Pvt. Ltd.

Remote, India

June 2021 – December 2023
  • Built a high-performance Instance Segmentation pipeline using Detectron2 served via FastAPI for vehicle parts identification, improving segmentation accuracy to 95% mAP and accelerating assessment throughput by 26%; integrated GradCAM++ visualisations for regulatory model interpretability.
  • Engineered an automated ML pipeline failure tracing system using MLflow to monitor inference-time distributions and surface root-cause analysis on mispredictions, reducing manual diagnosis time by 30%.
  • Deployed an image super-resolution and denoising ensemble combining SWIN2SR Transformer and NAFNet pre-trained models to upscale and denoise compressed input images, reducing model failures by 16% and improving downstream prediction accuracy by 12%.
  • Built an automatic labelling error-detection service using Scikit-learn confidence scoring that flagged curation errors, saving 6 person-hours per team member per sprint, and trained PyTorch GAN models for synthetic image generation to resolve class imbalance in rare damage categories.
  • Designed a geometric flat-tyre detection approach via OpenCV polygon analysis enabling reliable detection with zero curated data, and boosted vehicle damage severity classification by 3% per class through a fusion ensemble combining PyTorch CNN features with XGBoost structured metadata.
Detectron2 · FastAPI · GradCAM++ · GAN · Instance Segmentation · MLflow · XGBoost

Certifications

🕸️

Agentic Knowledge Graph Construction

DeepLearning.AI × Neo4j

Aug 2025

🗃️

Neo4j Fundamentals

Neo4j GraphAcademy

Jul 2025

🧠

Pretraining LLMs

DeepLearning.AI × Upstage

Feb 2025

🔧

TensorFlow Developer Certificate

Coursera

2023

🧠

Deep Learning Specialization

Coursera

2022

An Introduction To Practical Deep Learning

Intel - Coursera

2022

💬

Technical Support Fundamentals

Google - Coursera

2021

Education

🎓

M.Sc. Computer Science (Big Data Analytics)

Central University of Rajasthan

Kishangarh, India

⚛️

Integrated B.Sc. B.Ed. (Physical Sciences and Education)

Regional Institute of Education (NCERT), Bhubaneswar

Bhubaneswar, India

Open Source

Featured Projects

Research and personal projects spanning multi-label learning, medical AI, quantum computing, and recommender systems.

🏷️
8

Algorithm Development: Multi-label Classification Enhancement

Improved multi-label classification by generating synthetic data for rare labels with the MLSMOTE technique at TCS Big Data Lab, Rajasthan. Addressed the tail-label problem, where classifiers struggle with underrepresented labels.

Multi-label · Data Augmentation · TCS Big Data Lab · Deep Learning
Python
🧬
6

LLSF_DL-MLSMOTE-Hybrid

Hybrid deep learning approach combining Label-Specific Feature learning with MLSMOTE for multi-label classification. Implements the LLSF-DL algorithm for improved classification performance on imbalanced datasets.

Multi-label · Deep Learning · SMOTE
MATLAB
🔬
4

LLSF-Learning-Label-Specific-Features

Implementation of the Learning Label-Specific Features (LLSF) algorithm for multi-label classification. Enables feature selection by ranking features according to their relevance to each label.

Multi-label · Feature Learning · Classification
Jupyter Notebook
🕸️
5

Session-based Recommendation with Graph Neural Networks

Graph Neural Network-based recommendation system for session-based learning. Captures essential features from graph structures to recommend items during ongoing sessions.

GNN · Recommender System · Graph Learning
Python
4

Electricity Price Prediction using ELM-PSO-ARIMA

Hybrid model combining Extreme Learning Machine, Particle Swarm Optimization, and ARIMA to capture frequent changes in electricity prices with improved accuracy.

Time Series · Optimization · ELM · Energy
Python
🛡️
5

SVM-kNN-PSO Ensemble for Intrusion Detection

Novel ensemble method combining Support Vector Machines, k-Nearest Neighbors, and Particle Swarm Optimization for a robust intrusion detection system.

Security · Ensemble Learning · PSO · IDS
Python
🏥
3

Brain-Tumor-Segmentation

Deep learning brain tumor segmentation from MRI scans using U-Net architecture with attention mechanisms for improved medical imaging analysis.

Medical AI · Segmentation · U-Net
Python
⚖️
3

MLSMOTE

Multi-Label Synthetic Minority Over-sampling Technique for handling class imbalance in multi-label datasets. Generates synthetic samples for minority labels.

Data Augmentation · Class Imbalance
Python
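
For readers unfamiliar with the technique, here is a minimal, dependency-free sketch of the general MLSMOTE idea: find tail labels by imbalance ratio, then interpolate new samples between minority instances. It is an illustration under stated assumptions, not the code in the repository; the function names, the default `k`, and the "present in at least half of the neighbourhood" label rule are choices made for this example.

```python
import math
import random


def imbalance_ratios(Y):
    """Per-label ratio: count of the most frequent label / count of this label."""
    counts = [sum(row[j] for row in Y) for j in range(len(Y[0]))]
    top = max(counts)
    return [top / c if c else math.inf for c in counts]


def tail_labels(Y):
    """Labels whose imbalance ratio exceeds the mean ratio over observed labels."""
    ratios = imbalance_ratios(Y)
    finite = [r for r in ratios if math.isfinite(r)]
    if not finite:
        return []
    mean_ir = sum(finite) / len(finite)
    return [j for j, r in enumerate(ratios) if math.isfinite(r) and r > mean_ir]


def mlsmote(X, Y, k=3, seed=0):
    """Return synthetic (features, labels) lists for the tail labels of Y."""
    rng = random.Random(seed)
    new_X, new_Y = [], []
    for label in tail_labels(Y):
        # All samples that carry this minority label.
        bag = [i for i, row in enumerate(Y) if row[label] == 1]
        if len(bag) < 2:
            continue  # need at least one neighbour to interpolate with
        for i in bag:
            # Up to k nearest neighbours of sample i inside the minority bag.
            others = sorted((j for j in bag if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
            ref = rng.choice(others)
            gap = rng.random()
            # New point lies on the segment between the seed and the neighbour.
            feats = [a + gap * (b - a) for a, b in zip(X[i], X[ref])]
            # Keep a label if at least half of seed + neighbours carry it.
            group = [i] + others
            votes = [sum(Y[g][j] for g in group) for j in range(len(Y[0]))]
            new_X.append(feats)
            new_Y.append([1 if 2 * v >= len(group) else 0 for v in votes])
    return new_X, new_Y


# Toy data (made up for this sketch): label 0 is common, label 1 is rare.
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0],
     [1.0, 1.2], [0.9, 1.1], [5.0, 5.0], [5.2, 5.0]]
Y = [[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 1], [0, 1]]
synthetic_X, synthetic_Y = mlsmote(X, Y)
```

On the toy data only label 1 is treated as a tail label, so the two synthetic samples are interpolated between the two instances that carry it.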
🎯
2

Rule-based-Recommender-system

Rule-based recommendation engine using association rules and collaborative filtering techniques with NLP for personalized recommendations.

Recommender System · NLP · Association Rules
Python
⚛️
1

My-first-quantum-code

Quantum computing experiments using Qiskit — exploring quantum circuit simulations and entanglement phenomena.

Quantum Computing · Qiskit
Python
Writing

Latest Articles

Deep-dives into ML research, audio source separation, and multilingual NLP.

Let's Connect

Get In Touch

Open to senior AI/ML roles, GenAI research collaborations, and consulting opportunities.

Pradyumna Kumar Sahoo

Senior Data Scientist

📍 Hyderabad, India

pradyumna.sahoo@outlook.in


Built with Next.js · Tailwind CSS · Framer Motion — © 2026 Pradyumna Kumar Sahoo