I run aiagents.directory and got tired of manually curating agents. Built a pipeline to automate it:
1. Sourcing: Firecrawl Search API + LLM extraction to find agents mentioned in blog posts and list articles, not just homepage links. Filters junk via blocklists and deduplication.
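The blocklist/dedup filtering in step 1 can be sketched roughly like this (the real blocklist and normalization rules aren't in the post, so the hosts and keying here are assumptions):

```python
from urllib.parse import urlparse

# Hypothetical blocklist; the actual pipeline's lists aren't shown in the post.
BLOCKED_HOSTS = {"twitter.com", "medium.com", "github.com"}

def filter_candidates(urls):
    """Drop blocklisted hosts and deduplicate by normalized host + path."""
    seen = set()
    kept = []
    for url in urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower().removeprefix("www.")
        if host in BLOCKED_HOSTS:
            continue
        key = (host, parsed.path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept

candidates = [
    "https://www.agentsite.com/",
    "https://agentsite.com",             # duplicate after normalization
    "https://medium.com/some-listicle",  # blocklisted source, not a product
    "https://otheragent.ai/product",
]
print(filter_candidates(candidates))
# -> ['https://www.agentsite.com/', 'https://otheragent.ai/product']
```

Keying on host + path (rather than the raw string) catches the common `www.`/trailing-slash duplicates that LLM extraction tends to produce.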
2. Enrichment: Scrapes each site for features, pricing, screenshots, logos. Handles aggregator pages (ProductHunt, YC) by extracting the actual product URL.
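For the aggregator handling in step 2, one simple heuristic is "first outbound link that doesn't point back at the aggregator." This is a stdlib-only sketch under that assumption, not the post's actual extraction logic:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed aggregator hosts; the real pipeline may cover more.
AGGREGATOR_HOSTS = {"www.producthunt.com", "www.ycombinator.com"}

class LinkCollector(HTMLParser):
    """Collect every href from <a> tags on the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_product_url(page_html, page_host):
    """Return the first absolute link whose host isn't the aggregator itself."""
    parser = LinkCollector()
    parser.feed(page_html)
    for href in parser.links:
        host = urlparse(href).netloc
        if host and host != page_host and host not in AGGREGATOR_HOSTS:
            return href
    return None

sample_html = """
<a href="/posts/cool-agent">details</a>
<a href="https://www.producthunt.com/topics/ai">topic</a>
<a href="https://coolagent.ai?ref=producthunt">Visit website</a>
"""
print(extract_product_url(sample_html, "www.producthunt.com"))
# -> https://coolagent.ai?ref=producthunt
```

Relative links (empty netloc) and internal aggregator links are skipped, so the "Visit website" style outbound link wins.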
3. Review: Pydantic AI agent (GPT-powered) classifies submissions and returns confidence scores. High-confidence results auto-apply; low-confidence ones are flagged for manual review.
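The confidence-threshold routing in step 3 is simple to sketch. A dataclass stands in for the structured output the Pydantic AI agent would return; the field names and the 0.85 cutoff are guesses, not the post's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ReviewResult:
    # Field names are illustrative; the real schema would be a Pydantic
    # model the review agent fills in from the submission.
    is_agent: bool
    category: str
    confidence: float  # 0.0 - 1.0
    reasoning: str

AUTO_APPLY_THRESHOLD = 0.85  # hypothetical cutoff

def route(result: ReviewResult) -> str:
    """High confidence auto-applies; everything else goes to manual review."""
    return "auto-apply" if result.confidence >= AUTO_APPLY_THRESHOLD else "manual-review"

high = ReviewResult(True, "coding-assistant", 0.92, "Clear agent product with pricing page.")
low = ReviewResult(True, "unknown", 0.41, "Thin landing page, no feature detail.")
print(route(high))  # -> auto-apply
print(route(low))   # -> manual-review
```

Keeping the threshold as a single constant makes it easy to tighten or loosen as you build trust in the classifier.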
I still trigger it manually and review before publishing — want to make sure output quality is solid before I flip it to full automation. Code's up if the approach is useful to anyone building directories or curation pipelines. Questions/feedback welcome.