
document-intelligence-service

An intelligent document processing platform that classifies uploaded documents, extracts entities, summarizes them, and answers natural-language questions about them using Retrieval-Augmented Generation (RAG).

Java 21 · Spring Boot · Spring AI · React · OpenAI · pgvector
// Impact: AI-powered RAG implementation for citation-backed answers
{ FEATURES }
  • Engineered a RAG pipeline using Spring AI and pgvector: documents are parsed via Apache Tika, split into token-aware chunks, embedded as 1536-dim vectors, and queried via cosine similarity to ground LLM answers in actual document content
  • Built structured AI output for document classification and named-entity extraction using Spring AI's BeanOutputConverter, delivering type-safe JSON responses with confidence scores; results are cached in Redis to eliminate redundant LLM calls
  • Designed async document ingestion on Java 21 virtual threads (upload returns HTTP 202), with idempotent uploads via unique request keys, dual authentication (session tokens + BCrypt-hashed API keys), and a React + shadcn/ui frontend for document management and chat-based Q&A
  • Added full observability: distributed tracing (Micrometer + Zipkin) and metrics (Prometheus + Grafana)
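
The retrieval step in the first feature can be illustrated with a minimal, in-memory sketch. In the project itself the similarity search is described as running through pgvector; the version below only shows the underlying idea (rank chunks by cosine similarity to the query embedding and keep the top k) in plain Java. The class name `RetrievalSketch`, the `Chunk` record, and the toy 3-dimensional vectors are assumptions made for illustration, not code from the repository, and real embeddings would be 1536-dimensional.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: names and data are not taken from the project.
public class RetrievalSketch {

    /** A chunk of document text paired with its embedding vector. */
    record Chunk(String text, double[] embedding) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k chunks most similar to the query embedding, best match first. */
    static List<Chunk> topK(double[] query, List<Chunk> chunks, int k) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator
                .comparingDouble((Chunk c) -> cosine(query, c.embedding()))
                .reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional embeddings standing in for real model output.
        List<Chunk> chunks = List.of(
                new Chunk("invoice total", new double[] {1.0, 0.0, 0.0}),
                new Chunk("payment terms", new double[] {0.8, 0.6, 0.0}),
                new Chunk("office address", new double[] {0.0, 0.0, 1.0}));
        double[] query = {0.9, 0.1, 0.0};

        // The selected chunks would be passed to the LLM as grounding context.
        for (Chunk c : topK(query, chunks, 2)) {
            System.out.println(c.text());
        }
    }
}
```

With the toy data above, the two chunks closest to the query direction ("invoice total", then "payment terms") are selected, and the unrelated chunk is left out. A vector database performs the same ranking with an index instead of a full sort.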