Telecom AI Bot — System Architecture

System Overview

AI-powered customer service platform handling text and voice interactions for telecom operations serving millions of subscribers.

🎙️ Voice + Text Bot

Multi-modal customer interaction via WebSocket voice streams and REST chat endpoints. Powered by open-source LLMs with domain-specific fine-tuning for telecom context.

Whisper, Piper TTS, vLLM, LangChain4j

🔌 MCP Tool Server

Model Context Protocol (MCP) APIs that expose telecom business functions as tools the LLM can invoke: billing, account lookup, plan management, and more.

Spring MCP, Tool Registry, JSON-RPC

🛡️ Guardrails Engine

Multi-layer safety: identity verification before data access, PII redaction, context boundary enforcement, and business-rule alignment for every response.

GuardrailsAI, JWT, OPA, Row-Level Security

📊 Data Isolation

PostgreSQL Row-Level Security (RLS) scopes every query to the authenticated subscriber ID at the database level, so one customer's rows are never visible in another customer's session, even if application code omits a filter.

PostgreSQL RLS, pgcrypto, Tenant Context

⚡ Scalable Infrastructure

AWS EKS with auto-scaling pods, ALB for traffic distribution, ElastiCache for session management, and SQS for async processing of billing operations.

EKS, ALB, ElastiCache, SQS

📈 Observability

Full-stack observability with OpenTelemetry traces, Prometheus metrics, Grafana dashboards, and Loki for log aggregation across all microservices.

OpenTelemetry, Prometheus, Grafana, Loki

Key Design Principles

Zero-Trust Data Access

No customer data is returned without verified identity. Database queries are scoped via RLS policies. LLM context never contains other customers' data.

Context-Aware Conversations

A Redis-backed session store holds conversation state under a TTL. The bot remembers what was discussed within a session, and the history is purged once the session times out.
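As a rough illustration of the TTL behaviour described above, the sketch below keeps conversation turns in memory and drops a session once it has been idle longer than its TTL. It stands in for Redis only so the example is self-contained; `SessionStore`, its method names, the sliding-expiry choice, and the injected clock are assumptions made for illustration, not part of the design.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory stand-in for the Redis session store.
final class SessionStore {
    private record Session(List<String> turns, long lastTouchedMillis) {}

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    SessionStore(Duration ttl, LongSupplier clock) {
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    // Appends a turn and refreshes the TTL (sliding expiry, an assumption).
    void append(String sessionId, String turn) {
        long now = clock.getAsLong();
        sessions.compute(sessionId, (id, old) -> {
            List<String> turns = (old == null || isExpired(old, now))
                    ? new ArrayList<>()
                    : new ArrayList<>(old.turns());
            turns.add(turn);
            return new Session(List.copyOf(turns), now);
        });
    }

    // Returns the history, or an empty list once the session has timed out.
    List<String> history(String sessionId) {
        Session s = sessions.get(sessionId);
        if (s == null) {
            return List.of();
        }
        if (isExpired(s, clock.getAsLong())) {
            sessions.remove(sessionId, s);
            return List.of();
        }
        return s.turns();
    }

    private boolean isExpired(Session s, long now) {
        return now - s.lastTouchedMillis() > ttlMillis;
    }
}
```

With Redis the same effect would normally come from setting an expiry on the session key rather than checking timestamps in application code.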

Business-Aligned Responses

System prompts and guardrails constrain responses to telecom domain. Off-topic or harmful requests are intercepted before reaching the LLM.

Non-Functional Targets

Metric	Target
Response Latency (text)	< 2 seconds P95
Response Latency (voice)	< 3.5 seconds P95
Concurrent Users	10,000+
Availability	99.95%
Data Isolation	100% (RLS enforced)
Context Accuracy	> 95% on-topic
Auth Before Data	100% enforced
PII in Logs	0 (redacted)
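To make the availability target concrete, the short calculation below converts a percentage into an allowed-downtime budget. The class and method names are illustrative, and the 30-day and 365-day windows are assumptions; the document does not say over which window availability is measured.

```java
import java.time.Duration;

// Converts an availability target into an allowed-downtime budget.
final class DowntimeBudget {
    static Duration allowedDowntime(double availabilityPercent, Duration window) {
        double unavailableFraction = 1.0 - availabilityPercent / 100.0;
        return Duration.ofMillis(Math.round(window.toMillis() * unavailableFraction));
    }

    public static void main(String[] args) {
        Duration monthly = allowedDowntime(99.95, Duration.ofDays(30));
        Duration yearly = allowedDowntime(99.95, Duration.ofDays(365));
        System.out.println("30-day budget:  " + monthly.toSeconds() + " s");
        System.out.println("365-day budget: " + yearly.toMinutes() + " min");
    }
}
```

At 99.95% the budget is 1,296 seconds (21.6 minutes) per 30 days, or about 262 minutes per year, which has to cover deployments, failovers, and GPU node replacement.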

Application Architecture

Layered microservices architecture with MCP-enabled tool calling and multi-modal input processing.

🌐 Client Layer — Channels

Mobile App: React Native / Flutter

Web Widget: Embedded JS SDK

IVR Gateway: Twilio / Amazon Connect

WhatsApp: Business API

USSD: Short Code Gateway

▼ HTTPS / WSS / gRPC

🔀 API Gateway & Load Balancing

AWS ALB: TLS Termination

Spring Cloud Gateway: Rate Limiting + Routing

Kong / APISIX: Auth + Throttle

▼ JWT + mTLS

🤖 AI Orchestration Layer — Spring Boot 3.4+

Chat Controller: REST + SSE Streaming

Voice Controller: WebSocket + gRPC

Session Manager: Redis + TTL

LLM Orchestrator: LangChain4j + Spring AI

Guardrails Engine: Pre/Post Filters

▼ MCP JSON-RPC over HTTP/SSE

🔧 MCP Tool Server Layer

Billing Tools: get_bill, pay_bill

Account Tools: get_profile, update_info

Plan Tools: current_plan, upgrade

Invoice Tools: get_invoices, download

Network Tools: data_usage, speed_test

▼ SQL (RLS enforced) / REST

💾 Data & AI Model Layer

PostgreSQL 16: RLS + pgcrypto

Redis Cluster: Sessions + Cache

vLLM / Ollama: Mistral-7B / LLaMA-3

Whisper Server: ASR (Speech→Text)

Piper TTS: Text→Speech

▼ Metrics / Traces / Logs

📡 Observability & Infrastructure

OpenTelemetry: Distributed Tracing

Prometheus: Metrics Collection

Grafana: Dashboards

Loki: Log Aggregation

AWS EKS: Kubernetes Cluster

Component Catalog

Every component identified for development, with responsibilities and technology choices.

Core AI Orchestration Service

The central brain that coordinates LLM inference, tool selection, and response generation.

Sub-Component	Technology	Responsibility
LLM Client	LangChain4j + Spring AI	Unified interface to vLLM-hosted Mistral/LLaMA models with structured output parsing
Prompt Engine	Jinja2-style Templates	Dynamic system prompt construction with customer context, guardrail instructions, and tool schemas
Tool Router	Spring MCP Client	Discovers and invokes MCP tools based on LLM function-calling output
Context Builder	Custom Java Service	Assembles conversation history + auth state + business context into LLM context window
Response Validator	Custom Guardrails	Post-LLM output validation — PII detection, hallucination check, business alignment
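The Response Validator's hallucination check can be approximated very simply: every figure quoted to the customer should also appear in the tool result. The sketch below is one minimal way to do that under stated assumptions; the class name and the rule of comparing numeric tokens verbatim are illustrative, not the validator the design specifies.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-LLM check: numbers in the reply must come from the tool result.
final class ResponseConsistencyCheck {
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Collects every integer or decimal token that appears in the text.
    static Set<String> numbersIn(String text) {
        Set<String> found = new HashSet<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True when every figure quoted to the customer also appears in the tool output.
    static boolean isGrounded(String llmResponse, String toolOutput) {
        return numbersIn(toolOutput).containsAll(numbersIn(llmResponse));
    }
}
```

Tokens are compared as text, so `599` and `599.00` would not match; a production check would likely normalize amounts and dates before comparing.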

MCP Tool Server (Billing, Account, Plans)

MCP-compliant tool server exposing telecom business operations as callable tools with JSON Schema definitions.

Tool Name	Parameters	Action
`get_current_bill`	subscriber_id	Fetches current billing cycle amount, due date, status
`get_bill_payment_link`	subscriber_id, amount	Generates a one-time payment link via payment gateway
`get_past_invoices`	subscriber_id, months	Returns invoice history for the last N months
`get_subscriber_profile`	subscriber_id	Fetches name, email, plan, status (masked phone/email)
`update_subscriber_info`	subscriber_id, field, value	Updates profile fields after OTP verification
`get_current_plan`	subscriber_id	Returns active plan details — data, calls, validity
`get_data_usage`	subscriber_id, period	Data consumption for current cycle or custom period
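For orientation, MCP carries tool invocations as JSON-RPC 2.0 requests with the method `tools/call`, whose params hold the tool `name` and an `arguments` object. The sketch below builds such an envelope by hand to stay dependency-free. `McpToolCall` is a hypothetical helper, arguments are string-valued only for simplicity, and `subscriber_id` is deliberately left out because the design injects it server-side.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Builds the JSON-RPC 2.0 envelope that MCP uses for a tools/call request.
final class McpToolCall {
    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Arguments are string-valued here to keep the sketch dependency-free.
    static String toJson(long id, String toolName, Map<String, String> arguments) {
        String args = arguments.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\""
                + ",\"params\":{\"name\":" + quote(toolName)
                + ",\"arguments\":" + args + "}}";
    }
}
```

In the actual service an MCP client library and a JSON mapper would produce this message; the hand-rolled quoting above does not handle control characters.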

Authentication & Authorization

Multi-step identity verification before any data access. No customer data is exposed without proof of identity.

Component	Technology	Details
Phone/OTP Verification	Spring Security + Twilio	SMS OTP sent to registered number; verified within session
JWT Session Tokens	Spring OAuth2 Resource Server	Short-lived JWT (15min) with subscriber_id claim
Row-Level Security	PostgreSQL RLS Policies	SET app.current_subscriber_id before every query — DB enforces isolation
API Rate Limiting	Redis + Bucket4j	Per-subscriber rate limits: 60 req/min chat, 20 req/min voice
PII Redaction	Custom Filter Chain	Regex + NER-based PII detection in logs and LLM context
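The regex half of the PII redaction filter might look like the sketch below. The patterns are assumptions: a generic email shape and a 10-digit mobile number with an optional +91 prefix. The NER-based detection mentioned in the table is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical regex-only redactor for log lines and LLM context.
final class PiiRedactor {
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // 10-digit mobile number with an optional +91 prefix (an assumed format).
    private static final Pattern PHONE =
            Pattern.compile("(?<!\\d)(?:\\+91[\\s-]?)?\\d{10}(?!\\d)");

    // Replaces emails first, then phone numbers, with fixed placeholders.
    static String redact(String text) {
        String withoutEmails = EMAIL.matcher(text).replaceAll("[EMAIL]");
        return PHONE.matcher(withoutEmails).replaceAll("[PHONE]");
    }
}
```

Placing this filter in the logging path, rather than at individual call sites, makes it harder for a new log statement to bypass it.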

Voice Processing Pipeline

Real-time voice interaction with streaming ASR and TTS for natural conversational experience.

Stage	Technology	Details
Audio Capture	WebSocket (Opus codec)	16kHz mono audio streamed from client in 100ms frames
Speech-to-Text	Faster-Whisper (GPU)	CTranslate2-optimized Whisper model; real-time factor < 0.3x
Language Detection	Whisper Built-in	Auto-detect Hindi, English, regional languages
Intent → LLM	Same AI Orchestrator	Transcribed text processed through same pipeline as chat
Text-to-Speech	Piper TTS / Coqui	Neural TTS with natural prosody; multilingual support
Audio Streaming	WebSocket response	TTS audio chunks streamed back as PCM/Opus frames
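The capture parameters above translate into concrete buffer sizes. The helper below works them out; 16-bit samples are an assumption (the table gives only sample rate, channel count, and frame length), and the byte figure applies to decoded PCM, not to the Opus-compressed payload on the wire.

```java
// Frame-size arithmetic for the audio capture stage.
final class AudioFrameMath {
    static int samplesPerFrame(int sampleRateHz, int frameMillis) {
        return sampleRateHz * frameMillis / 1000;
    }

    // Size of one decoded PCM frame in bytes.
    static int pcmBytesPerFrame(int sampleRateHz, int frameMillis, int channels, int bytesPerSample) {
        return samplesPerFrame(sampleRateHz, frameMillis) * channels * bytesPerSample;
    }

    static int framesPerSecond(int frameMillis) {
        return 1000 / frameMillis;
    }
}
```

For 16 kHz mono audio in 100 ms frames this gives 1,600 samples per frame, 3,200 bytes of 16-bit PCM per frame, and 10 frames per second per active call.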

Database & Caching

Store	Technology	Usage
Primary DB	PostgreSQL 16 (RDS)	Subscribers, billing, invoices, plans — with RLS policies
Session Store	Redis 7 (ElastiCache)	Conversation history, auth state, TTL-based expiry
Vector Store	pgvector extension	FAQ embeddings, policy documents for RAG retrieval
Audit Log	PostgreSQL (append-only)	Immutable log of all tool invocations and data access
Rate Limit	Redis (Sliding Window)	Per-subscriber request counting with Bucket4j
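Per-subscriber rate limiting can be pictured with a small token bucket, the algorithm Bucket4j is built around. The class below is a single-process sketch, not Bucket4j's API; in the design the bucket state would live in Redis so that all pods share it. The capacity and refill period in the usage note mirror the 60 requests per minute chat limit.

```java
import java.util.function.LongSupplier;

// Minimal token bucket; a stand-in for what Bucket4j provides, not its API.
final class TokenBucket {
    private final long capacity;
    private final double refillPerMilli;
    private final LongSupplier clockMillis; // injectable so refill can be tested
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, long refillPeriodMillis, LongSupplier clockMillis) {
        this.capacity = capacity;
        this.refillPerMilli = (double) capacity / refillPeriodMillis;
        this.clockMillis = clockMillis;
        this.tokens = capacity;
        this.lastRefill = clockMillis.getAsLong();
    }

    // Refills proportionally to elapsed time, then spends one token if available.
    synchronized boolean tryConsume() {
        long now = clockMillis.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMilli);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A chat limiter would be created as `new TokenBucket(60, 60_000, System::currentTimeMillis)`, one instance per subscriber.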

Security Architecture

Defense-in-depth approach ensuring zero data leakage, identity verification, and business-aligned responses.

🔒 Layer 1 — Identity Verification Gate

Phone Number Input: Customer provides number

OTP via SMS: 6-digit, 5min TTL

JWT Issuance: subscriber_id claim

Session Binding: Redis maps session → sub_id

▼ Only verified users proceed

🛡️ Layer 2 — Pre-LLM Guardrails

Input Sanitizer: XSS, injection prevention

Topic Classifier: On-topic vs off-topic

Jailbreak Detector: Prompt injection filter

Context Boundary: Max history tokens

▼ Clean, validated input

🤖 Layer 3 — LLM Execution with Constrained Tools

System Prompt: Role + domain constraints

Tool Whitelisting: Only registered MCP tools

Param Injection: subscriber_id from JWT only

Token Budget: Max output tokens enforced

▼ LLM response + tool results

✅ Layer 4 — Post-LLM Validation

PII Redaction: Mask sensitive data in logs

Hallucination Check: Verify tool output matches response

Business Alignment: Domain vocabulary check

Audit Logger: Immutable access log

▼ Safe, verified response to customer

💾 Layer 5 — Database Level Enforcement

RLS Policy: USING (subscriber_id = current_setting('app.current_subscriber_id')::UUID)

Column Encryption: pgcrypto for PII columns

Connection Pooling: PgBouncer with SET on checkout

⚠️ Critical Security Invariant

The subscriber_id used in MCP tool calls is NEVER taken from the user's message or LLM output. It is always injected server-side from the verified JWT token. This prevents any attempt by a user (or hallucinating LLM) to access another subscriber's data.
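One way to enforce this invariant in the orchestrator is to overwrite whatever the model proposes before the tool call leaves the process. The sketch below assumes tool arguments arrive as a map and that the verified subscriber ID has already been read from the JWT; `ToolArgumentBinder` and its method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical orchestrator step that binds tool arguments to the verified identity.
final class ToolArgumentBinder {
    static final String SUBSCRIBER_KEY = "subscriber_id";

    // Overwrites any subscriber_id proposed by the model with the JWT-derived one.
    static Map<String, Object> bind(Map<String, Object> llmArguments, UUID verifiedSubscriberId) {
        Map<String, Object> bound = new HashMap<>(llmArguments);
        bound.put(SUBSCRIBER_KEY, verifiedSubscriberId);
        return Map.copyOf(bound);
    }
}
```

A stricter variant would omit `subscriber_id` from the tool schema shown to the model altogether, so there is nothing for a prompt injection to target.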

Request Data Flow

End-to-end sequence for a customer asking "What is my current bill?"

Customer

"What is my current bill?" → Chat Widget / Voice

API Gateway

Validates JWT token → Extracts subscriber_id from claims → Rate limit check

Chat Controller

Receives request → Loads session from Redis → Appends to conversation history

Guardrails (Pre)

Input sanitization → Topic classification (billing = ✅ on-topic) → No jailbreak detected

LLM Orchestrator

Builds prompt: system instructions + conversation history + available MCP tools → Sends to vLLM

LLM (Mistral-7B)

Generates tool call: get_current_bill(). The model supplies no subscriber_id; the orchestrator injects it server-side from the verified JWT

MCP Tool Server

Executes get_current_bill → Sets RLS context → Queries PostgreSQL → Returns: ₹599, due Mar 15

LLM (Round 2)

Receives tool result → Generates natural response: "Your current bill is ₹599, due on March 15th."

Guardrails (Post)

Verifies response matches tool output → No PII leakage → Business-aligned ✅

Audit Logger

Logs: tool=get_current_bill, subscriber=SUB_XXXX, timestamp, result_hash (no raw PII in logs)

Customer

← "Your current bill is ₹599, due on March 15th. Would you like a payment link?"
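The audit step in the flow above records a hash of the tool result instead of the result itself. A minimal sketch, assuming SHA-256 and a flat key=value log line (the document fixes neither), could look like this; `AuditHash` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.HexFormat;

// Hypothetical audit helper: logs a digest of the tool result, never the raw value.
final class AuditHash {
    // Hex-encoded SHA-256 of the raw tool result.
    static String sha256Hex(String rawResult) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(rawResult.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    // Builds one audit line; subscriberRef is expected to be an already-masked reference.
    static String entry(String tool, String subscriberRef, Instant at, String rawResult) {
        return "tool=" + tool + ", subscriber=" + subscriberRef
                + ", timestamp=" + at + ", result_hash=" + sha256Hex(rawResult);
    }
}
```

Because bill amounts are low-entropy, an unkeyed hash can be guessed by brute force; a keyed HMAC would be a reasonable hardening if the hash itself is considered sensitive.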

Voice Flow Addition

Customer

Speaks: "Mera current bill kitna hai?" → Audio via WebSocket (Opus 16kHz)

Voice Controller

Buffers audio frames → Sends to Faster-Whisper → Receives transcript + detected language: Hindi

AI Orchestrator

Same flow as text chat → LLM responds in Hindi: "Aapka current bill ₹599 hai, due date 15 March"

Piper TTS

Synthesizes Hindi audio → Streams PCM chunks via WebSocket back to client

Technology Stack

Production-grade, open-source-first technology choices with AWS managed services for operations.

Application Layer

Category	Technology	Version
Runtime	Java 21 (GraalVM)	21 LTS
Framework	Spring Boot	3.4.x
AI Framework	LangChain4j + Spring AI	1.0.x
MCP SDK	Spring MCP (spring-ai-mcp)	1.0.x
WebSocket	Spring WebFlux	6.2.x
Security	Spring Security + OAuth2	6.4.x
DB Access	Spring Data JPA + jOOQ	3.19.x
Cache Client	Lettuce (Spring Data Redis)	6.4.x
Rate Limiting	Bucket4j + Redis	8.x
Resilience	Resilience4j	2.2.x
API Docs	SpringDoc OpenAPI	2.7.x

AI / ML Layer

Category	Technology	Details
LLM	Mistral-7B-Instruct / LLaMA-3-8B	Open-source, fine-tunable
LLM Serving	vLLM / Ollama	OpenAI-compatible API
ASR	Faster-Whisper (large-v3)	CTranslate2 optimized
TTS	Piper TTS / Coqui TTS	Neural, multilingual
Embeddings	all-MiniLM-L6-v2	For RAG vector search
Vector DB	pgvector (PostgreSQL ext)	Embedded in primary DB
Guardrails	Custom Java + NeMo Guardrails	Pre/post LLM filters

Infrastructure

Service	AWS / Open Source
Container Orchestration	AWS EKS (Kubernetes)
Database	Amazon RDS PostgreSQL 16
Cache	Amazon ElastiCache (Redis 7)
Queue	Amazon SQS / RabbitMQ
Storage	Amazon S3 (invoices/audio)
Load Balancer	AWS ALB + NLB for WebSocket
GPU Instances	AWS g5.xlarge (A10G GPU)
IaC	Terraform + Helm Charts
CI/CD	GitHub Actions + ArgoCD

PoC Step-by-Step Guide

Prototype implementation roadmap — from database to working chatbot.

Step 1: Database Schema with RLS

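Create PostgreSQL tables: subscribers, billing, invoices, plans. Enable Row-Level Security policies that scope every query to the subscriber_id held in the app.current_subscriber_id setting, which the application sets before each query (SET app.current_subscriber_id) rather than once per connection, because pooled connections are shared between subscribers.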

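CREATE TABLE billing (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  subscriber_id UUID NOT NULL REFERENCES subscribers(id),
  amount DECIMAL(10,2), due_date DATE, status VARCHAR(20)
);
-- Enable RLS; FORCE applies the policy to the table owner as well
ALTER TABLE billing ENABLE ROW LEVEL SECURITY;
ALTER TABLE billing FORCE ROW LEVEL SECURITY;
-- Rows are visible only when they belong to the subscriber set for this session/transaction
CREATE POLICY billing_isolation ON billing
  USING (subscriber_id = current_setting('app.current_subscriber_id')::UUID);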
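On the application side, the subscriber context has to be set on the same connection, and ideally in the same transaction, as the query it protects. The JDBC sketch below uses PostgreSQL's `set_config(name, value, is_local)` with `is_local = true` so the setting disappears at commit, which is one way to stay safe behind a connection pool. The class name, the choice of "latest due date" as the current bill, and the transaction handling are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.UUID;

// Hypothetical repository helper: sets the RLS context, then queries without a WHERE clause.
final class RlsBillingQuery {
    // is_local = true scopes the setting to the current transaction only.
    static final String SET_CONTEXT_SQL =
            "SELECT set_config('app.current_subscriber_id', ?, true)";
    // No subscriber filter here: the billing_isolation policy supplies it.
    static final String CURRENT_BILL_SQL =
            "SELECT amount FROM billing ORDER BY due_date DESC LIMIT 1";

    static BigDecimal currentBillAmount(Connection conn, UUID subscriberId) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // keep both statements in one transaction
        try {
            try (PreparedStatement set = conn.prepareStatement(SET_CONTEXT_SQL)) {
                set.setString(1, subscriberId.toString());
                set.execute();
            }
            BigDecimal amount = null;
            try (PreparedStatement query = conn.prepareStatement(CURRENT_BILL_SQL);
                 ResultSet rs = query.executeQuery()) {
                if (rs.next()) {
                    amount = rs.getBigDecimal(1);
                }
            }
            conn.commit();
            return amount; // null when the subscriber has no bills
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

An integration test for data isolation can call this method with two different subscriber IDs on the same pooled connection and assert that each call sees only its own rows.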

Step 2: Spring Boot Project Setup

Initialize Spring Boot 3.4 project with dependencies: spring-ai, langchain4j, spring-data-jpa, spring-security, spring-websocket, spring-data-redis. Configure virtual threads for high concurrency.

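import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Application entry point; component scanning starts from this class's package
@SpringBootApplication
public class TelecomAiBotApplication {
    public static void main(String[] args) {
        SpringApplication.run(TelecomAiBotApplication.class, args);
    }
}
// application.properties: spring.threads.virtual.enabled=true
// (in application.yml, use the nested YAML form of the same key)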

Step 3: MCP Tool Server Implementation

Build MCP-compliant tool server with @Tool annotations. Each tool maps to a business operation. The server exposes tool schemas via MCP discovery and handles JSON-RPC invocations.

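// The tool takes no subscriber ID parameter: per the security invariant, the ID
// must come from the verified JWT, never from LLM-supplied arguments.
@Tool(name = "get_current_bill", description = "Fetch the current bill for the authenticated subscriber")
public BillResponse getCurrentBill() {
    // SubscriberContext is a hypothetical helper that reads the subscriber_id claim
    // from the current security context
    UUID subscriberId = SubscriberContext.currentSubscriberId();
    rlsContextSetter.setSubscriber(subscriberId); // Set RLS context
    return billingRepo.findCurrentBill(subscriberId);
}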

Step 4: AI Orchestrator with LangChain4j

Connect to vLLM-hosted Mistral model. Configure the agent with MCP tools, system prompt with telecom domain constraints, and conversation memory backed by Redis.

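// openAiModel: an OpenAI-compatible chat model pointed at the vLLM endpoint
// RedisChatMemory: a custom ChatMemory implementation, not a LangChain4j class
var agent = AiServices.builder(TelecomAssistant.class)
    .chatModel(openAiModel) // named chatLanguageModel(...) before LangChain4j 1.0
    .tools(billingTools, accountTools, planTools)
    .chatMemoryProvider(id -> new RedisChatMemory(id, Duration.ofMinutes(30)))
    .build();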

Step 5: Guardrails & Security Pipeline

Implement pre-LLM filters (input validation, topic classification, jailbreak detection) and post-LLM validators (PII redaction, response-tool consistency check, business alignment).
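For the PoC, the pre-LLM side can start as plain rules and be replaced by a trained classifier later. The sketch below is deliberately naive: the keyword list, the jailbreak patterns, and the `PreLlmFilter` name are all illustrative assumptions, and substring matching will produce false positives (for example, "sim" inside "similar").

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical rule-based pre-LLM filter for the PoC.
final class PreLlmFilter {
    enum Verdict { ALLOW, OFF_TOPIC, JAILBREAK }

    // Illustrative lists only; a real deployment would use a trained classifier.
    private static final List<String> TELECOM_TERMS =
            List.of("bill", "invoice", "plan", "data", "recharge", "payment", "sim", "network");
    private static final Pattern JAILBREAK = Pattern.compile(
            "ignore (all |any )?(previous|prior) instructions|system prompt|pretend (you are|to be)",
            Pattern.CASE_INSENSITIVE);

    // Jailbreak patterns are checked first so they win over an on-topic keyword.
    static Verdict check(String userMessage) {
        if (JAILBREAK.matcher(userMessage).find()) {
            return Verdict.JAILBREAK;
        }
        String lower = userMessage.toLowerCase(Locale.ROOT);
        boolean onTopic = TELECOM_TERMS.stream().anyMatch(lower::contains);
        return onTopic ? Verdict.ALLOW : Verdict.OFF_TOPIC;
    }
}
```

Only messages with an `ALLOW` verdict would be forwarded to the orchestrator; the other two verdicts can map to canned replies without an LLM call.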

Step 6: Voice Pipeline Integration

Add WebSocket endpoint for audio streaming. Integrate Faster-Whisper for real-time speech-to-text and Piper TTS for response synthesis. Share the same AI orchestrator for text and voice.

Step 7: Authentication Flow (OTP)

Build OTP-based verification: customer provides phone number → bot sends OTP → customer confirms → JWT issued with subscriber_id → all subsequent tool calls use this verified identity.
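The OTP part of this flow can be sketched as below, using the 6-digit, 5-minute parameters from the security architecture. Storage is an in-memory map here purely for illustration (the design would keep pending codes in Redis), the single-attempt rule is an assumption, and SMS delivery and JWT issuance are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical OTP issuer/verifier for the PoC.
final class OtpService {
    private record Pending(String code, long expiresAtMillis) {}

    private static final Duration TTL = Duration.ofMinutes(5);
    private final SecureRandom random = new SecureRandom();
    private final Map<String, Pending> pending = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis; // injectable so expiry can be tested

    OtpService(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Issues a zero-padded 6-digit code; the caller would send it by SMS.
    String issue(String phoneNumber) {
        String code = String.format("%06d", random.nextInt(1_000_000));
        pending.put(phoneNumber, new Pending(code, clockMillis.getAsLong() + TTL.toMillis()));
        return code;
    }

    // Single attempt per code: the entry is removed whether or not it matches.
    boolean verify(String phoneNumber, String submittedCode) {
        Pending p = pending.remove(phoneNumber);
        if (p == null || clockMillis.getAsLong() > p.expiresAtMillis()) {
            return false;
        }
        // Constant-time comparison avoids leaking how many digits matched.
        return MessageDigest.isEqual(
                p.code().getBytes(StandardCharsets.UTF_8),
                submittedCode.getBytes(StandardCharsets.UTF_8));
    }
}
```

Only after `verify` returns true would the service issue the short-lived JWT carrying the subscriber_id claim.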

Step 8: Docker Compose & Testing

Containerize all services. Docker Compose for local dev: PostgreSQL, Redis, vLLM/Ollama, Whisper, Piper, Spring Boot app. Integration tests verify data isolation across subscribers.

Telecom AI Bot System Design v1.0
