I built VerdictMail as a homelab project to explore whether combining classical email authentication signals with LLM reasoning produces better threat classification than either approach alone.
It runs as a daemon on Ubuntu, monitors a Gmail inbox via IMAP IDLE, and processes every incoming message through a multi-stage pipeline:
RFC 822 parse → whitelist check → enrich → AI analyze → decide → act → audit
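A minimal sketch of how those stages might chain together. It assumes a shared context object passed from stage to stage and a whitelist that short-circuits enrichment and the LLM. Every name here (`Context`, `run_pipeline`, `WHITELIST`, the stage functions) is hypothetical and not taken from VerdictMail. Enrichment implements only one illustrative signal, a simple display-name spoofing heuristic, and the LLM call is stubbed out.

```python
from dataclasses import dataclass, field
from email import message_from_bytes
from email.utils import parseaddr

# Hypothetical whitelist; the real tool manages its list through the web UI.
WHITELIST = {"alerts@example.com"}


@dataclass
class Context:
    """State handed from stage to stage (illustrative fields only)."""
    raw: bytes
    msg: object = None
    display_name: str = ""
    sender: str = ""
    whitelisted: bool = False
    signals: dict = field(default_factory=dict)
    assessment: dict = field(default_factory=dict)
    action: str = "pass"
    audit_log: list = field(default_factory=list)


def parse(ctx):
    # RFC 822/5322 parse with the stdlib email package (default compat32 policy).
    ctx.msg = message_from_bytes(ctx.raw)
    ctx.display_name, addr = parseaddr(str(ctx.msg.get("From", "")))
    ctx.sender = addr.lower()
    return ctx


def whitelist_check(ctx):
    ctx.whitelisted = ctx.sender in WHITELIST
    return ctx


def enrich(ctx):
    # Stand-in for the SPF/DKIM/DMARC, DNSBL, WHOIS and URL expansion lookups.
    # Illustrative signal: a display name that embeds an address whose domain
    # differs from the real sender domain, e.g. "billing@bank.com" <x@attacker.net>.
    sender_domain = ctx.sender.rpartition("@")[2]
    name = ctx.display_name.lower()
    embedded_domain = name.rpartition("@")[2] if "@" in name else ""
    ctx.signals["display_name_spoof"] = bool(embedded_domain) and embedded_domain != sender_domain
    ctx.signals["auth_results"] = str(ctx.msg.get("Authentication-Results", ""))
    return ctx


def analyze(ctx):
    # Stub for the LLM call; a real stage would send ctx.signals in the prompt.
    spoof = ctx.signals["display_name_spoof"]
    ctx.assessment = {"is_threat": spoof, "confidence": 0.9 if spoof else 0.1}
    return ctx


def decide(ctx):
    # Simplified single-threshold mapping; the post describes three outcomes.
    a = ctx.assessment
    ctx.action = "junk" if a["is_threat"] and a["confidence"] >= 0.85 else "pass"
    return ctx


def act(ctx):
    # A real stage would issue IMAP commands; this sketch only keeps ctx.action.
    return ctx


def audit(ctx):
    ctx.audit_log.append({"sender": ctx.sender, "action": ctx.action})
    return ctx


def run_pipeline(raw: bytes) -> Context:
    ctx = whitelist_check(parse(Context(raw=raw)))
    if not ctx.whitelisted:
        # Assumed: whitelisted senders skip enrichment and the LLM entirely.
        for stage in (enrich, analyze, decide):
            ctx = stage(ctx)
    for stage in (act, audit):
        ctx = stage(ctx)
    return ctx


if __name__ == "__main__":
    spoofed = b'From: "billing@bank.com" <x@attacker.net>\r\nSubject: Invoice\r\n\r\nPay now.'
    print(run_pipeline(spoofed).action)  # junk
```

Keeping every stage as a plain function over one context object makes it easy to dry-run the whole chain on a pasted message, which is the kind of thing a manual test page needs.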
The enrichment stage collects SPF/DKIM/DMARC results, DNSBL reputation, WHOIS domain age, display-name spoofing checks, and expanded URLs, then packages all of that as structured context in a prompt to the LLM. The model returns a JSON threat assessment with a confidence score and a reasoning chain, which the decision engine maps to one of three actions: pass (leave the message untouched), flag (add the $VerdictMail-Suspect IMAP keyword), or move to Junk.
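The decision step can be sketched as JSON validation followed by two confidence cutoffs. This assumes a response schema with `is_threat`, `confidence` (0 to 1) and `reasoning` fields, and two thresholds, `FLAG_THRESHOLD` and `JUNK_THRESHOLD`, whose values are placeholders; none of these names come from VerdictMail itself. The last helper returns arguments intended for Python's `imaplib.IMAP4.uid()`. The folder name `Junk` is an assumption, and `MOVE` only works on servers that support the RFC 6851 MOVE extension.

```python
import json
from enum import Enum


class Action(Enum):
    PASS = "pass"   # leave the message alone
    FLAG = "flag"   # add the $VerdictMail-Suspect keyword
    JUNK = "junk"   # move to the Junk folder


# Hypothetical thresholds: the post mentions tuning them but gives no values.
JUNK_THRESHOLD = 0.85
FLAG_THRESHOLD = 0.50


def parse_assessment(text: str) -> dict:
    """Validate the model's JSON (assumed schema: is_threat, confidence, reasoning)."""
    data = json.loads(text)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data["is_threat"], bool):
        raise ValueError("is_threat must be a JSON boolean")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:  # also rejects NaN
        raise ValueError(f"confidence out of range: {confidence}")
    return {
        "is_threat": data["is_threat"],
        "confidence": confidence,
        "reasoning": data.get("reasoning", []),
    }


def decide(assessment: dict) -> Action:
    if not assessment["is_threat"]:
        return Action.PASS
    if assessment["confidence"] >= JUNK_THRESHOLD:
        return Action.JUNK
    if assessment["confidence"] >= FLAG_THRESHOLD:
        return Action.FLAG
    return Action.PASS


def decide_from_model_output(text: str) -> Action:
    try:
        return decide(parse_assessment(text))
    except (ValueError, KeyError, TypeError):
        # Malformed output: flag for a human rather than silently pass (assumed policy).
        return Action.FLAG


def imap_uid_command(action: Action, uid: str, junk_folder: str = "Junk"):
    """Arguments for imaplib.IMAP4.uid(*args); None means no IMAP change."""
    if action is Action.FLAG:
        return ("STORE", uid, "+FLAGS", "($VerdictMail-Suspect)")
    if action is Action.JUNK:
        return ("MOVE", uid, junk_folder)
    return None
```

Treating unparseable model output as FLAG rather than PASS is one way to fail safe: a small local model that occasionally emits malformed JSON then surfaces the message for review instead of letting it through unexamined.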
Supports OpenAI, Anthropic, or a local Ollama instance as the LLM backend. I run it locally with qwen2.5-coder:7b for privacy. With a cloud provider, the full pipeline typically completes in under 15 seconds.
Includes a Flask web UI with a dashboard, a paginated audit log, whitelist management, an in-browser YAML config editor, and a manual test page where you can paste a raw email and dry-run the full pipeline.
Runs as two systemd services. Tested on Proxmox LXC (unprivileged). MIT licensed.
Happy to discuss the enrichment → LLM prompt design, the confidence threshold tuning, or the LXC-specific stop/pause behavior if anyone is curious.