Paper Title
Enhancing Text-to-SQL for Health Records with Retrieval-Augmented Generation and Instruction Tuning
Abstract
Accessing complex electronic health record (EHR) data requires knowledge-intensive SQL, which puts it out of reach of many healthcare professionals and researchers who depend on fast, accurate data to inform patient care and research. This paper introduces an improved Text-to-SQL framework designed for the specific needs of healthcare, since EHR databases have their own distinctive schemas and terminology. Our approach integrates Retrieval-Augmented Generation (RAG), which brings schema information and relevant medical terminology into SQL generation, with Instruction Tuning, which lets the model adapt to varying query structures and schema complexity. Experimental results on healthcare-specific datasets show that our approach significantly outperforms standard Text-to-SQL models in accuracy and robustness, generating consistently high-quality queries for health record applications. The framework opens the way to more intuitive and effective interaction with EHR data, giving healthcare professionals direct, easy access to the information they need.
Keywords- Schema Linking, Chain-of-Thought Prompting, Masked Language Modeling (MLM), Electronic Medical Records (EMRs), Retrieval-Augmented Generation (RAG), Abstract Syntax Trees (ASTs), Dual Instruction Tuning, Stepwise SQL Parsing, Complex Query Decomposition, Spider Benchmark, Named Entity Recognition (NER), Schema Pruning, MIMICSQL Dataset