I pointed one CLI at the official MCP reference servers and gave each a grade. This is not a security audit: it asks the two questions nobody scores. Do the tools actually work, and can an agent use them?
Every grade is the weighted blend of the layers that ran: L1 static (schema quality), L2 behavioral (does it run without crashing), L3 agent-usability (can an LLM pick the right tool). Read-only, run locally.
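One way to read "weighted blend of the layers that ran" is a weighted average that renormalizes over whichever layers produced a score. The sketch below assumes that reading; the weights, the 0 to 100 score range, and the names `LAYER_WEIGHTS` and `blend` are illustrative and not taken from mcp-vitals.

```python
# Hypothetical weights: the actual weighting is not stated in this write-up.
LAYER_WEIGHTS = {"static": 4, "behavioral": 3, "agent": 3}


def blend(scores: dict[str, float]) -> float:
    """Weighted average over the layers that actually ran (scores assumed 0..100)."""
    ran = {name: w for name, w in LAYER_WEIGHTS.items() if name in scores}
    if not ran:
        raise ValueError("no layers ran")
    total_weight = sum(ran.values())
    # Dividing by the weight of the layers that ran means a skipped
    # layer neither helps nor hurts the final number.
    return sum(scores[name] * w for name, w in ran.items()) / total_weight
```

With only the static layer present, `blend({"static": 80})` returns the static score unchanged; adding a behavioral score pulls the result toward it in proportion to its weight.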
L3 hands an LLM the tool catalog and a realistic task, then checks which tool it picks, without ever calling the server. A wrong pick is a description-clarity signal: the schema is technically valid but reads ambiguously to an agent.
Misses from the run:
read_graph when asked to delete_relations
echo
read_file, a tool that doesn't exist. The real one is read_text_file.
Three tools tripped the model this way.
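The selection check described here can be sketched as a comparison between the tool the model named and the tool the task was written for, with a separate bucket for names that are not in the catalog at all. The `Pick` type, the `score_picks` function, and the task wording are hypothetical; only the tool names come from the results above.

```python
from dataclasses import dataclass


@dataclass
class Pick:
    task: str       # natural-language task given to the model
    expected: str   # tool the task was written for
    chosen: str     # tool name the model returned


def score_picks(catalog: set[str], picks: list[Pick]) -> dict:
    """Score tool selection without calling the server."""
    wrong = [p for p in picks if p.chosen != p.expected]
    # A chosen name missing from the catalog is an invented tool,
    # which is a different failure from picking the wrong real tool.
    invented = [p for p in wrong if p.chosen not in catalog]
    accuracy = (len(picks) - len(wrong)) / len(picks) if picks else 0.0
    return {"accuracy": accuracy, "wrong": wrong, "invented": invented}


# Illustrative input only: task text is made up, tool names are from the run.
catalog = {"read_graph", "delete_relations", "read_text_file"}
picks = [
    Pick("remove a relation from the graph", "delete_relations", "read_graph"),
    Pick("show the contents of a text file", "read_text_file", "read_file"),
]
result = score_picks(catalog, picks)
```

Separating invented names from wrong-but-real picks matters for the fix: the first usually points at a misleading tool name, the second at overlapping descriptions.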
mcp-vitals grades a server the way you'd grade a dependency you're about to trust — from static hygiene up to how it behaves under an agent and an attacker.
L1 static. Schema quality: descriptions, typed and documented params, naming, required fields.
L2 behavioral. Generates inputs from the schema, calls read-only tools, and watches for crashes versus graceful errors, plus latency.
L3 agent-usability. An LLM picks a tool for a task. Measures selection accuracy and argument validity. Never calls the server.
Scans descriptions & prompts for injection, concealment and over-permission patterns.
Transport security, type coverage, graceful-error rate under load.
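To make "generates inputs from the schema" concrete, here is a minimal sketch of deriving one call argument set from a tool's JSON Schema. It assumes the simplest possible strategy (one fixed value per type, required fields only) and handles only a small subset of JSON Schema; `sample_value` is a hypothetical helper, not the generator mcp-vitals uses.

```python
def sample_value(schema: dict):
    """Return one plausible value for a simplified JSON Schema node."""
    if "enum" in schema:
        return schema["enum"][0]
    kind = schema.get("type")
    if kind == "string":
        return "example"
    if kind == "integer":
        return schema.get("minimum", 0)
    if kind == "number":
        return float(schema.get("minimum", 0))
    if kind == "boolean":
        return False
    if kind == "array":
        return [sample_value(schema.get("items", {}))]
    if kind == "object":
        props = schema.get("properties", {})
        # Fill required fields only, so the call is valid but minimal.
        return {name: sample_value(props.get(name, {}))
                for name in schema.get("required", [])}
    return None  # unknown or missing type


# Illustrative schema in the shape of an MCP tool's inputSchema.
input_schema = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "limit": {"type": "integer", "minimum": 1},
    },
    "required": ["path"],
}
args = sample_value(input_schema)
```

A real generator would likely also probe edge cases (empty strings, missing required fields) to tell a graceful error apart from a crash; this sketch only covers the valid-input path.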
# install from the public repo; works with any stdio or http server
pipx install git+https://github.com/enached134-ctrl/mcp-vitals
mcpvitals grade "npx -y @modelcontextprotocol/server-memory" \
  --behavioral --agent --min-grade B
# → report.html + score.json, exits non-zero below the gate
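The gate behavior ("exits non-zero below the gate") amounts to comparing two letter grades and mapping the result to a process exit code. The sketch below assumes an A to F scale and an exit code of 1 on failure; both are assumptions, since the write-up only shows `--min-grade B`, and `passes_gate` and `exit_code` are hypothetical names.

```python
# Assumed grade scale, best to worst; the actual scale is not stated here.
GRADE_ORDER = ["A", "B", "C", "D", "F"]


def passes_gate(grade: str, min_grade: str) -> bool:
    """True when `grade` is at least as good as `min_grade`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(min_grade)


def exit_code(grade: str, min_grade: str) -> int:
    """0 when the gate passes, 1 otherwise, so CI can fail the build."""
    return 0 if passes_gate(grade, min_grade) else 1
```

Under these assumptions a server graded A or B passes a B gate, and anything lower returns a non-zero code.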
Drop the composite Action into a workflow and fail the build when a server regresses below a grade:
- uses: enached134-ctrl/mcp-vitals@v1
  with:
    target: "npx -y your-server"
    min-grade: B