Next-Generation Retrieval-Augmented Generation System: A Privacy-First Approach to Intelligent Document Analysis

Vigneshkumar S, Ramaranganathan S, Ridhun Krishnan R, Vishal Sarvani S, Pradeep G

doi:10.7492/0nf04922


Abstract

One way to make large language models more trustworthy is to have them pull in outside facts when they answer. Such setups usually rely on remote servers, which is a problem if you care about who sees your information. Even with retrieval, some systems still get details wrong or miss parts of longer texts. Handling mixed content, such as images interleaved with text, does not always work well either.

This work introduces a privacy-focused, browser-based tool that analyzes documents and answers questions about them. All processing happens inside the browser; nothing is sent elsewhere. Rather than relying on a single retrieval method, it combines semantic matching with graph-style exploration of the data. Keeping everything local and blending these two techniques makes results more precise and context-aware.
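
The combination of semantic matching and graph-style exploration can be illustrated with a minimal sketch. The abstract does not specify the scoring scheme, so everything here is an assumption made for illustration: the function names `cosine` and `hybrid_retrieve`, the toy two-dimensional embeddings, the one-hop neighbour boost, and the `graph_weight` parameter are all hypothetical. Python is used only for brevity; the system described runs in the browser.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query_vec, chunk_vecs, graph, k=2, top_n=3, graph_weight=0.5):
    """Rank chunks by semantic score plus a boost from graph neighbours.

    Assumed scheme (not taken from the paper):
      1. score every chunk by cosine similarity to the query;
      2. take the k best chunks as seeds;
      3. boost each one-hop neighbour of a seed by graph_weight * seed score.
    """
    semantic = {cid: cosine(query_vec, vec) for cid, vec in chunk_vecs.items()}
    seeds = sorted(semantic, key=semantic.get, reverse=True)[:k]

    scores = dict(semantic)
    for seed in seeds:
        for neighbour in graph.get(seed, ()):
            if neighbour in scores:
                scores[neighbour] += graph_weight * semantic[seed]

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data: "table" is semantically far from the query but linked to "intro".
chunk_vecs = {
    "intro": [1.0, 0.0],
    "method": [0.9, 0.1],
    "table": [0.0, 1.0],
    "appendix": [0.1, 0.9],
}
graph = {"intro": ["table"]}

ranked = hybrid_retrieve([1.0, 0.0], chunk_vecs, graph)
print(ranked)
```

In this toy setup the graph link lets a chunk that is related to a strong match, but not semantically similar to the query, surface in the results. That is the kind of context-awareness the paragraph above attributes to blending the two techniques.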

Starting from mixed-format documents, it extracts text, charts, formulas, and pictures directly from PDFs, Word files, spreadsheets, and scanned images. Rather than taking outputs on faith, it establishes trust by triple-checking sources, matching facts across references, and then confirming meaning with small language models that run locally.
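
One possible reading of the verification step is a pipeline of three checks: source resolution, cross-reference agreement, and a semantic confirmation. The sketch below follows that reading under stated assumptions. The abstract does not say how any check is implemented, so the names `token_overlap` and `verify_answer`, the thresholds, and the token-overlap heuristic are hypothetical. The `semantic_check` callable is a placeholder for whatever locally running small language model the system uses.

```python
def token_overlap(claim, passage):
    """Fraction of the claim's lowercase tokens that also appear in the passage."""
    claim_tokens = set(claim.lower().split())
    passage_tokens = set(passage.lower().split())
    return len(claim_tokens & passage_tokens) / len(claim_tokens) if claim_tokens else 0.0


def verify_answer(claim, cited_ids, sources, semantic_check,
                  min_sources=2, threshold=0.6):
    """Run three illustrative checks on a claim and its cited passages.

    sources        -- dict mapping passage id to passage text
    semantic_check -- callable(claim, passage) -> bool; stands in for a
                      local small language model (assumption)
    """
    # Check 1: every cited id must resolve to a stored passage.
    known = [cid for cid in cited_ids if cid in sources]
    sources_ok = bool(cited_ids) and len(known) == len(cited_ids)

    # Check 2: enough independent passages must lexically support the claim.
    supporting = [cid for cid in known
                  if token_overlap(claim, sources[cid]) >= threshold]
    cross_ok = len(supporting) >= min_sources

    # Check 3: the pluggable semantic check must accept at least one passage.
    semantic_ok = any(semantic_check(claim, sources[cid]) for cid in known)

    return {
        "sources_ok": sources_ok,
        "cross_ok": cross_ok,
        "semantic_ok": semantic_ok,
        "trusted": sources_ok and cross_ok and semantic_ok,
    }


# Toy passages and a trivial stand-in for the local model.
sources = {
    "s1": "the model runs entirely in the browser",
    "s2": "all processing runs in the browser locally",
    "s3": "spreadsheets are parsed into tables",
}


def stub_check(claim, passage):
    return "browser" in passage


result = verify_answer("processing runs in the browser", ["s1", "s2"], sources, stub_check)
print(result)
```

Keeping the three results separate, rather than returning a single boolean, lets a caller report which check failed when an answer is not trusted.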