About

Welcome to my blog, a space where data science, data engineering, and AI meet the complex and fascinating world of healthcare data. Here, I explore how technologies like R, Python, Apache Arrow, Spark, and various SQL engines (DuckDB, PostgreSQL, Azure SQL Server, and more) can be applied to real-world clinical data challenges. Whether the topic is extracting structured data from messy EHRs or mapping clinical records into Common Data Models (CDMs) like OMOP, this blog is for practitioners and enthusiasts who care about building scalable, intelligent, and ethical solutions in healthcare.

I also dive into the evolving space of AI and machine learning in healthcare, especially Retrieval-Augmented Generation (RAG), fine-tuning small language models on domain-specific corpora, and deploying models that protect data privacy while remaining operationally efficient. From deep dives into Spark-based data pipelines to hands-on DuckDB tutorials and explorations of the nuances of CDM transformations, the content here is designed to balance technical rigor with real-world relevance. Whether you’re a data scientist, engineer, researcher, or just curious about where AI and healthcare data are headed, you’ll find something here to learn, build, and share.