VectoRAG: A Document-Grounded Retrieval-Augmented Generation System with Page-Level Traceability for Academic CSE Resources
DOI: https://doi.org/10.7492/xe738308
Abstract
Retrieval-Augmented Generation (RAG) systems combine large language models with external knowledge sources to improve factual grounding. In academic settings, however, conventional RAG implementations often lack fine-grained traceability, making it difficult for learners to verify the origin of generated responses. This challenge is particularly relevant in computer science education, where contextual precision and source citation are essential.
This paper presents VectoRAG, a hybrid retrieval-augmented generation system designed to provide document-grounded answers with page-level traceability for academic CSE resources. The architecture integrates dense vector retrieval and lexical search while preserving metadata across document chunks to enable precise citation mapping. Retrieved evidence is assembled into constrained prompts to support reliable answer generation and reduce unsupported outputs. In addition to question answering, the system enables evidence-based quiz generation derived directly from uploaded documents.
A baseline comparison conducted on curated academic queries suggests improvements in citation accuracy and grounding behaviour relative to a vector-only retrieval pipeline. The findings highlight the importance of metadata-aware hybrid retrieval in building transparent and educationally reliable RAG systems.
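The pipeline summarised above can be illustrated with a minimal sketch. The abstract does not specify the chunking scheme, the fusion method, or the prompt format, so everything here is an assumption made for illustration: the names `Chunk`, `chunk_pages`, `reciprocal_rank_fusion`, and `build_prompt` are hypothetical, chunks are cut per page so that each one maps to a single page number, and reciprocal rank fusion stands in for whatever strategy VectoRAG actually uses to combine dense and lexical rankings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Chunk:
    """A text chunk that keeps the metadata needed for a page-level citation."""
    chunk_id: str
    doc_name: str
    page: int
    text: str


def chunk_pages(doc_name: str, pages: List[str], max_words: int = 50) -> List[Chunk]:
    """Split each page separately so that no chunk spans two pages (assumed scheme)."""
    chunks = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), max_words):
            chunk_id = f"{doc_name}:p{page_no}:w{start}"
            text = " ".join(words[start:start + max_words])
            chunks.append(Chunk(chunk_id, doc_name, page_no, text))
    return chunks


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk ids (e.g. one dense, one lexical) into one ranking."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the output deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def build_prompt(question: str, evidence: List[Chunk]) -> str:
    """Assemble a constrained prompt in which every passage carries its citation."""
    lines = [
        "Answer using only the passages below and cite them as [document, p. N].",
        "If the passages do not contain the answer, say so.",
        "",
    ]
    for chunk in evidence:
        lines.append(f"[{chunk.doc_name}, p. {chunk.page}] {chunk.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

In this sketch the dense and lexical retrievers are left out; each is assumed to return a ranked list of chunk ids, which `reciprocal_rank_fusion` merges before the top chunks are passed to `build_prompt`. Because the page number travels with every chunk, any citation in the generated answer can be traced back to a specific page of the uploaded document.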
