Spec27: Agent Validation Without Building Test Infrastructure
I've watched too many AI agent demos work perfectly on stage and fall apart the next day. All it takes is a model update, a prompt tweak, or a vendor API change, and suddenly the agent that reliably booked meetings is double-booking or sending calendar invites to the wrong person. Spec27 attacks this problem at the right level: you write a spec for what the agent should do, and Spec27 checks whether it actually does it.
The core insight is sound. Instead of instrumenting the agent's code (which might be a vendor black box you can't touch), Spec27 interacts with the agent from the outside and compares its behavior against a specification. If your agent is supposed to respond to "reschedule my 3pm to Thursday" with a confirmation and a calendar update, Spec27 sends that input and verifies that the output matches. There is no test infrastructure to build, no mocking framework to maintain, and no brittle integration test that breaks on every UI change.
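To make the outside-in idea concrete, here is a minimal Python sketch of a black-box spec check. This is not Spec27's API or spec format, which I haven't seen documented in enough detail to reproduce. Every name here (`Spec`, `AgentResult`, `check`, the action dictionaries) is hypothetical, and the two agents are stand-in functions rather than real services.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentResult:
    """What the agent did, as observed from the outside (hypothetical shape)."""
    reply: str
    actions: list[dict] = field(default_factory=list)


@dataclass
class Spec:
    """A hypothetical spec: one input and what must be true of the outcome."""
    name: str
    prompt: str
    reply_must_contain: list[str]
    expected_action: dict


def check(agent: Callable[[str], AgentResult], spec: Spec) -> list[str]:
    """Send the spec's prompt to the agent and return a list of failures."""
    result = agent(spec.prompt)
    failures = []
    # The reply must mention every required phrase (case-insensitive).
    for phrase in spec.reply_must_contain:
        if phrase.lower() not in result.reply.lower():
            failures.append(f"reply missing {phrase!r}")
    # At least one observed action must include all expected key/value pairs.
    wanted = spec.expected_action.items()
    if not any(wanted <= action.items() for action in result.actions):
        failures.append(f"no action matching {spec.expected_action}")
    return failures


# Stand-in agents for illustration only; a real agent would be a remote call.
def good_agent(prompt: str) -> AgentResult:
    return AgentResult(
        reply="Done, your 3pm is confirmed for Thursday.",
        actions=[{"type": "calendar_update", "event": "3pm", "new_day": "Thursday"}],
    )


def broken_agent(prompt: str) -> AgentResult:
    return AgentResult(reply="I moved it.")


reschedule = Spec(
    name="reschedule meeting",
    prompt="reschedule my 3pm to Thursday",
    reply_must_contain=["confirmed", "Thursday"],
    expected_action={"type": "calendar_update", "new_day": "Thursday"},
)

for agent in (good_agent, broken_agent):
    failures = check(agent, reschedule)
    print(agent.__name__, "PASS" if not failures else f"FAIL: {failures}")
```

The point of the sketch is that `check` never looks inside the agent. It only sees the input it sent and the reply and actions that came back, which is why the same approach would work against a vendor agent you can't modify.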
Where it falls short: the product is new and the documentation reflects that. I'd want to see more example specs, more agent types covered (right now it's strongest for text-based agents), and public pricing before I'd commit a team budget. The concept is dead-on, because agent testing is a real need that almost no tooling addresses yet, but the execution is still early. I'd recommend it for teams already shipping agents who feel the pain of manual verification, not for people still prototyping their first agent.
If you're shipping agent-based features and manually spot-checking behavior, Spec27 replaces a tedious process. If you're still figuring out whether agents are right for your product, wait six months for the docs and examples to catch up.

