CRM Purple Agent — Berkeley RDI AgentX–AgentBeats Phase 2

CRM Purple Agent

Building a robust CRM agent is harder than it looks—real-world deployments face schema drift (column names that silently change), context rot (stale notes polluting task context), and the sheer breadth of enterprise CRM workflows. CRM Purple Agent is our answer: a schema-adaptive, adversarially robust agent competing in the Berkeley RDI AgentX–AgentBeats Phase 2 competition.

Evaluated by the Entropic CRMArena green agent across 2,140 CRM tasks spanning 22 categories, the purple agent must handle everything from lead qualification and sales analytics to knowledge base QA and case routing—all while the evaluation environment actively mutates schema column names and injects irrelevant context into task payloads.

Most CRM agents fail under adversarial conditions because they hard-code schema assumptions. Our agent detects drift at runtime, maps mutated column names back to canonical ones, and strips rotted context before ever touching an LLM—keeping accuracy high and token costs low.

2,140 CRM Tasks
22 Task Categories
5 Pipeline Layers
8 CRM Schema Tables

How does it work?

The agent implements a 5-layer pipeline designed to fail gracefully under adversarial conditions. Each layer handles a distinct failure mode seen in real CRM deployments, from privacy violations to hallucinated SQL. Tasks arrive via the A2A (Agent-to-Agent) JSON-RPC protocol and flow through the pipeline before a structured answer is returned.

Purple Agent

Adversarial Challenges

  • ✗ Schema Drift — column names mutate
  • ✗ Context Rot — stale notes injected
  • ✗ Privacy traps — PII extraction attempts
  • ✗ Ambiguous task intent
  • ✗ SQL hallucinations on drifted schema
  • ✗ 22-category task distribution
Pipeline

Pipeline Defenses

  • ✓ Rule-based privacy rejection (0 LLM calls)
  • ✓ Runtime drift mapping (fuzzy + hardcoded)
  • ✓ Rot note stripping + heuristic filtering
  • ✓ 22 category-specific prompt templates
  • ✓ Hallucination grounding in synthesizer
  • ✓ Max-2-retry error recovery

The Privacy Guard runs first with zero LLM calls, instantly rejecting tasks in three private categories. Surviving tasks then hit Schema Introspector and Context Filter in parallel before the Task Planner classifies intent. The SQL Generator uses category-specific prompts with the corrected schema, and the Answer Synthesizer grounds the final output against real database results to prevent hallucination.

Pipeline Architecture

  • Privacy Guard — Rule-based, zero LLM calls — instantly rejects 3 private task categories before any processing
  • L1 · Schema Introspector — Detects drifted column names and maps them back to the canonical CRM schema (8 tables, 6 relationships) using hardcoded rules + fuzzy fallback
  • L1 · Context Filter — Strips rot noise from task context; heuristic relevance filtering to reduce irrelevant tokens before SQL generation
  • L2 · Task Planner — Classifies the task into one of 22 categories: exact_query_match, semantic_retrieval, or privacy_rejection
  • L3 · SQL Generator — Category-specific prompt templates with schema-aware LLM reasoning using Claude Sonnet 4 as the primary model
  • L4 · Answer Synthesizer — Cleans output, formats multi-value answers, and grounds responses against actual query results to block hallucinations
  • L5 · Error Recovery — Up to 2 retries with schema re-introspection on failure; graceful degradation to a safe fallback answer

Quick Start

# Clone and install
git clone https://github.com/MadGAA-Lab/CRM-Agent-Phase2_dev.git
cd CRM-Agent-Phase2_dev
uv sync

# Run the agent locally (requires at least OPENAI_PRIMARY_API_KEY)
OPENAI_PRIMARY_API_KEY=sk-... uv run src/server.py

# Claude as primary, Nebius as cheap tier
OPENAI_PRIMARY_API_KEY=sk-ant-... LLM_PRIMARY_BASE_URL=https://api.anthropic.com/v1 LLM_PRIMARY_MODEL=claude-sonnet-4-6 \
  OPENAI_CHEAP_API_KEY=<nebius-key> LLM_CHEAP_BASE_URL=https://api.studio.nebius.com/v1 \
  uv run src/server.py

# Or run with Docker
docker build -t crm-purple-agent .
docker run -p 9009:9009 -e OPENAI_PRIMARY_API_KEY=sk-... crm-purple-agent

# Run unit tests
uv run pytest tests/ --ignore=tests/test_agent.py -v

Environment Variables

Two independent tiers — primary (expensive) and cheap — each backed by any OpenAI-compatible provider. Configure each tier with a key and an optional base URL; no provider is hard-coded.

  • OPENAI_PRIMARY_API_KEY — Key for the primary provider — required
  • LLM_PRIMARY_BASE_URL — Base URL override for the primary provider (default: OpenAI). E.g. https://api.anthropic.com/v1
  • LLM_PRIMARY_MODEL — Primary model name (default: claude-sonnet-4-6)
  • OPENAI_CHEAP_API_KEY — Key for the cheap provider — optional
  • LLM_CHEAP_BASE_URL — Base URL override for the cheap provider. E.g. https://api.studio.nebius.com/v1
  • LLM_CHEAP_MODEL — Cheap model name (default: claude-haiku-4-5)

Citation

If you use CRM Purple Agent in your research, please cite:

@software{crm_purple_agent,
  title = {CRM Purple Agent: Schema-Adaptive CRM Agent for AgentX--AgentBeats Phase 2},
  author = {MadGAA-Lab},
  year = {2026},
  url = {https://github.com/MadGAA-Lab/CRM-Agent-Phase2_dev},
  note = {Competing agent for Berkeley RDI AgentX--AgentBeats Phase 2 competition}
}