Nafis Neehal
PhD Candidate (CS) | NLP/LLM (Dataset, Benchmarking, Fine-Tuning, Deploy, Evaluate, RAG) | Applied ML
About Me
As a PhD candidate in Computer Science at Rensselaer Polytechnic Institute, I apply advanced AI and machine learning techniques to complex challenges in healthcare, one of the most data-intensive industries. With over 7 years of experience in applied machine learning, deep learning, and data science, I specialize in developing, fine-tuning, and deploying large-scale models for real-world use.
Through industrial research collaborations, I bridge cutting-edge AI research and practical healthcare solutions. With IBM Research, I lead the development of next-generation LLM frameworks for clinical trial automation; my work with CDPHP focused on deploying ML systems that have processed more than 22M patient records to improve healthcare delivery and risk prediction. My expertise in large-scale data processing, ML system optimization, and trustworthy AI architectures is domain-agnostic and transfers directly across industries.
My research has been recognized at leading venues including ACM RecSys, AMIA, and the Society for Clinical Trials, reflecting the translation of academic work into industrial applications. Previously, as a lecturer at Daffodil International University, I established the institution’s first AI Research Lab and mentored future AI researchers and engineers.
Research Focus
My research lies at the intersection of AI and healthcare, with a focus on building trustworthy and efficient systems. My work spans:
- LLM Development & Evaluation: Led the development of specialized clinical trial LLMs through quantized fine-tuning of Llama models on 65k+ trials, and engineered novel evaluation frameworks with hallucination-adjusted metrics for GPT-4 and LLaMA-70B. Created a comprehensive benchmarking infrastructure (CTBench) and implemented RAG architectures with few-shot learning, achieving a 48.5% accuracy improvement in clinical trial feature generation tasks.
- ML Systems for Healthcare: Engineered large-scale healthcare ML systems that process 22.5M+ patient records with 87-dimensional features. Implemented deep autoencoders for efficient patient matching (35% faster, 40% less memory) and hybrid clustering algorithms for treatment effect analysis in 350K+ patient cohorts, and achieved a 200x efficiency gain through PCA-based optimization for imbalanced healthcare data.
- Health Recommender Systems: Developed fairness-aware patient matching frameworks that improve treatment effect estimation accuracy by 75-80%, incorporating dual-adjustment pipelines for demographic alignment (96-99% improvement) and multi-stage survival analysis for outcome tracking. Engineered cost-efficient trial recruitment strategies that reduced expenses by 25% while maintaining equity.
- MLOps & System Architecture: Architected end-to-end ML pipelines using AWS SageMaker, MLflow, and Docker, and optimized large-scale data processing with PySpark, achieving 60% faster processing. Developed distributed computing solutions with vector database integrations for retrieval systems, with a focus on scalability and production-ready deployment.
Technical Expertise
- Languages & DB: Python, SQL, R, C++, Neo4j, Google Firestore, MySQL
- ML/DL/Causal: PyTorch, DDP, TensorFlow, Scikit-learn, AutoML, OpenCV, EconML, DoWhy
- MLOps Stack: MLflow, Docker, CI/CD, ChromaDB, Hopsworks, PySpark, AWS (SageMaker, Lambda, EC2)
- LLM Frameworks: LangChain, LlamaIndex, Hugging Face, Axolotl, Unsloth, Autotrain, LangGraph, Opik, Comet
- Specialization in LLMs:
  - Prompt Engineering (Zero-/Few-Shot)
  - Fine-tuning (PEFT)
  - End-to-end RAG Pipeline (Embedding, Ingestion, Indexing, Storing, Query Engines)
  - Quantization
  - Benchmarking
  - GraphRAG
  - Trustworthiness Evaluation
  - Deployment (WebUI + Cloud Serving)
Beyond Research
Beyond research, I am passionate about developing novel AI applications that solve real-world problems. I read widely across genres, with particular interest in human history and international politics and in how past events and current global dynamics shape our world. I’m an avid follower of science fiction movies and series, fascinated by their visions of technological futures and their exploration of human-AI interaction. In my leisure time, I enjoy poker, whose strategic complexity and psychological elements mirror many of the analytical challenges I tackle in my research.
News
| Date | News |
|---|---|
| Nov 13, 2024 | RecSys’24 (@HealthRecSys) paper out now. [Link] |
| Oct 05, 2024 | Joining BanglaLLM to develop LLMs that improve reasoning in the Bengali language. [HuggingFace] |
| Jun 25, 2024 | New paper released on LLMs in clinical trial design. [Link] |
| Dec 14, 2023 | Passed PhD Candidacy Exam. 🎉 |
| Dec 11, 2023 | Happy to serve as a reviewer for AMIA CIC 2024. 📚 |