---
name: skill-quality-audit
version: 1.0.0
last_validated: 2026-05-27
model_compatibility: [fast, balanced, smart]
code_first: true
display_name: Skill Quality Audit
description: "Audit a skill against publishing quality rubrics and produce a gap-to-golden scorecard. Use when asked to 'audit my skill', 'review skill quality', 'is this skill golden', 'check skill against rubrics', 'skill quality gate', 'publishing readiness check', or 'IOTHS check on this skill'."
icon: "🔍"
trigger: audit my skill
inputs:
  - name: skill_name
    description: "Name of the skill to audit (must be a saved skill in ~/.quickwork/profiles/*/skills/)"
    type: string
    required: true
tools: [file_read, run_python, open_in_session_tab, file_write]
---

## Overview

Reads a saved SKILL.md, evaluates it against 5 self-contained quality rubrics, and produces a structured scorecard showing pass/fail per criterion with an actionable fix list. Designed to work on any vanilla Quick Desktop install with zero external dependencies — all rubric criteria are inlined below.

This is the quality gate for Insist on the Highest Standards (IOTHS). A skill is "golden" only when it scores 100% across all applicable rubrics.

## Workflow

### Step 1: Load the skill
- **Mode**: `deterministic`
- **Tool**: `file_read`
- **Input**: Read `~/.quickwork/profiles/federate-prod/skills/{{skill_name}}/SKILL.md`
- **Output**: Full SKILL.md content (frontmatter + body)
- **Validate**: File exists and contains `---` frontmatter delimiters
- **On failure**: If the file is not found, try the other profile directories under `~/.quickwork/profiles/*/skills/`, or ask the user for the correct skill name.
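
The Validate check above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation; the function name `split_frontmatter` is hypothetical, and it assumes the SKILL.md content has already been read into a string.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter, body); raise ValueError if a delimiter is missing."""
    lines = text.splitlines()
    # The first line must be the opening '---' delimiter.
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter delimiter")
    # The next '---' line closes the frontmatter block.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("SKILL.md frontmatter is missing its closing '---' delimiter")
```

Raising a `ValueError` with a specific message lets the step's On failure branch tell the user which delimiter is missing.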

### Step 2: Parse and audit against all rubrics
- **Mode**: `deterministic`
- **Tool**: `run_python`
- **Input**: SKILL.md content from Step 1
- **Output**: Structured audit results — per-rubric, per-criterion pass/fail with justification

Apply all 5 rubrics defined in the Rubrics section below. For each criterion:
- ✅ = fully met
- ⚠️ = partially met (explain what's missing)
- ❌ = not met (explain + provide specific fix)
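
One way to hold these results so the score stays reproducible is sketched below. The names `Status`, `Finding`, and `score` are hypothetical, and the sketch assumes that only ✅ counts toward the score (a ⚠️ is recorded but not counted as a pass).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "✅"
    PARTIAL = "⚠️"
    FAIL = "❌"


@dataclass
class Finding:
    criterion: str   # criterion number from the rubric tables, e.g. "2.1"
    status: Status
    note: str = ""   # what is missing; for FAIL, also the specific fix


def score(findings: list[Finding]) -> tuple[int, int]:
    """Return (passed, total). Assumption: only PASS counts as passed."""
    passed = sum(1 for f in findings if f.status is Status.PASS)
    return passed, len(findings)
```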

### Step 3: Second-order analysis
- **Mode**: `agentic`
- **Input**: SKILL.md content + audit results from Step 2
- **Output**: Critical analysis section covering:
  - What's NOT in the skill that should be?
  - What assumptions are fragile?
  - What failure modes aren't covered?
  - What would a devil's advocate say about real-world reliability?
  - What unintended consequences could the workflow create?

This is where the "between the lines" analysis lives — not just checking boxes, but thinking about whether the skill would actually work reliably across different contexts.

### Step 4: Produce scorecard and fix list
- **Mode**: `agentic`
- **Input**: All audit results + second-order analysis
- **Output**: Formatted report with scorecard table + numbered fix list

Present as:
1. Scorecard table with columns `| Rubric | Score (X/Y) | % |`
2. Overall verdict: "🏆 GOLDEN" or "❌ NOT YET GOLDEN"
3. Numbered fix list (ordered by impact, highest first)
4. Second-order findings (what's between the lines)
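
The table and verdict in items 1 and 2 could be rendered from per-rubric counts as in this sketch. The function name `render_scorecard` and the input shape (rubric name mapped to a `(passed, total)` pair) are assumptions for illustration.

```python
def render_scorecard(scores: dict[str, tuple[int, int]]) -> str:
    """Render the scorecard table followed by the overall verdict."""
    rows = ["| Rubric | Score (X/Y) | % |", "|---|---|---|"]
    for rubric, (passed, total) in scores.items():
        pct = round(100 * passed / total) if total else 0
        rows.append(f"| {rubric} | {passed}/{total} | {pct}% |")
    # Golden requires 100% on every rubric that was scored.
    golden = bool(scores) and all(p == t for p, t in scores.values())
    rows.append("")
    rows.append("🏆 GOLDEN" if golden else "❌ NOT YET GOLDEN")
    return "\n".join(rows)
```

Rubrics that do not apply (for example Portability on a personal-use skill) would simply be left out of `scores` before rendering.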

### Step 5: Offer to patch
- **Mode**: `agentic`
- **Input**: Fix list from Step 4
- **Output**: Decision card offering to apply fixes automatically

If gaps exist, offer to patch them (add missing frontmatter fields, generate evals, etc.). Apply changes only after the user confirms.

## Integration: Auto-chaining with skill-authoring

This skill is designed to run AFTER the built-in `skill-authoring` or `skill-improvement` skills complete. To make the quality gate automatic:

**For the skill author (self-enforcing):**
After saving any skill, say: "audit my skill [name]" — or ask Quick to "always audit after saving a skill" and it will learn this as a preference.

**For team-wide adoption (marketplace publish):**
1. Publish this skill to the Quick Suite Skills Marketplace
2. Others install it and get the quality gate on-demand
3. Propose to the Quick team (#quick-desktop-feedback) that `skill-authoring` Step 5 should auto-load this skill post-save

**For full CI automation (scheduled agent):**
Create a scheduled agent that watches `~/.quickwork/profiles/*/skills/` for file changes, runs this audit whenever a SKILL.md is modified, and posts the results to the activity feed. See the agent_management skill for setup.

**Dependency resolution**: This skill has ZERO external dependencies. It works on any vanilla Quick Desktop install. No memory, no custom MCP, no local files beyond the skill being audited. Install and go.

## Rubrics

All criteria are self-contained here. No external files needed.

### Rubric 1: Structure (Quick Desktop native format)

| # | Criterion | How to check |
|---|---|---|
| 1.1 | Frontmatter has `name:` | Present in YAML header |
| 1.2 | Frontmatter has `display_name:` | Present |
| 1.3 | Frontmatter has `description:` (pushy, trigger-rich) | Present, >50 chars, includes trigger phrases |
| 1.4 | Frontmatter has `icon:` | Present |
| 1.5 | Frontmatter has `trigger:` | Present |
| 1.6 | Frontmatter has `inputs:` with descriptions | Each input has a `description` field |
| 1.7 | Frontmatter has `tools:` | Present, non-empty list |
| 1.8 | `## Overview` section exists | 2-4 sentences explaining what + when |
| 1.9 | `## Workflow` section with structured steps | Each step has Mode, Tool (if deterministic), Input, Output, Validate, On failure |
| 1.10 | `## Output` section exists | Describes the deliverable |
| 1.11 | `## Lessons Learned` with 4 subsections | Do, Don't, Common Failures, When to Ask the User |
| 1.12 | Under 500 lines | Count lines |
| 1.13 | `{{input_name}}` placeholders match inputs list | Every input is referenced at least once |
| 1.14 | Tools in workflow steps match `tools:` list | No phantom tools |
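
Criterion 1.13 lends itself to a deterministic check. The sketch below assumes the input names have already been extracted from the frontmatter; `check_placeholders` is a hypothetical helper, not part of Quick Desktop.

```python
import re

# Matches {{name}} and tolerates spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_placeholders(input_names: list[str], body: str) -> dict[str, list[str]]:
    """Compare declared inputs with the {{placeholders}} used in the body."""
    used = set(PLACEHOLDER.findall(body))
    declared = set(input_names)
    return {
        "unreferenced_inputs": sorted(declared - used),
        "undeclared_placeholders": sorted(used - declared),
    }
```

Both lists being empty corresponds to a ✅ on 1.13. Note that prose which merely mentions the placeholder syntax would also be picked up, so borderline hits may need a manual look.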

### Rubric 2: Publishing Quality Bar

| # | Criterion | How to check |
|---|---|---|
| 2.1 | `version:` field in frontmatter | SemVer format (X.Y.Z) |
| 2.2 | `last_validated:` field | ISO date (YYYY-MM-DD) |
| 2.3 | `model_compatibility:` field | Array of model tiers |
| 2.4 | `code_first: true` declaration | Present if skill uses run_python for deterministic logic |
| 2.5 | Evals exist | `evals/evals.json` file in skill directory with ≥3 happy-path + ≥1 edge case |
| 2.6 | Security: no hardcoded credentials | Scan for API keys, tokens, passwords |
| 2.7 | Security: write actions gated | All destructive ops require user confirmation |
| 2.8 | Security: scoped permissions | Tools are minimum necessary, no broad access |
| 2.9 | No hardcoded personal names | Scan for specific person names in examples |
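
Criteria 2.1 and 2.2 are format checks, so they can run as code. The following is a minimal sketch with hypothetical helper names; it checks the simplified X.Y.Z form named in the table rather than the full SemVer grammar (pre-release and build suffixes are rejected).

```python
import re
from datetime import date

SEMVER_XYZ = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_semver(value: str) -> bool:
    """True for plain X.Y.Z versions such as 1.0.0."""
    return SEMVER_XYZ.fullmatch(value) is not None


def is_iso_date(value: str) -> bool:
    """True for a real calendar date written as YYYY-MM-DD."""
    if ISO_DATE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates like month 13
    except ValueError:
        return False
    return True
```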

### Rubric 3: Design Principles

| # | Criterion | How to check |
|---|---|---|
| 3.1 | One skill, one job | Skill has a single atomic purpose |
| 3.2 | Empty state safe | Every step handles "no data" gracefully (On failure exists) |
| 3.3 | Dependencies declared | All tools listed, no phantom references |
| 3.4 | Code > AI | Deterministic logic in run_python, LLM only for UX/judgment |
| 3.5 | Error messages human + actionable | On failure text describes what the user can do |
| 3.6 | No invented terminology | Uses standard industry/Amazon terms |
| 3.7 | Audience-first (Working Backwards) | Description and Overview written for the intended user |

### Rubric 4: Writing Quality

| # | Criterion | How to check |
|---|---|---|
| 4.1 | No AI tells | Scan for: "delve", "leverage", "tapestry", "it's important to note", "in conclusion", "comprehensive", "robust", "seamless", excessive hedging |
| 4.2 | Direct imperative voice | Instructions use "Do X" not "You should consider doing X" |
| 4.3 | Rationale over rules | Steps explain *why*, not just ALWAYS/NEVER directives |
| 4.4 | Concise | No filler sentences, no repetition between sections |
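
The word scan in 4.1 could look like the sketch below. The phrase list is copied from the table; `find_ai_tells` and the `ignore_quoted` option are assumptions. Skipping double-quoted text is one possible way to avoid flagging a banned-word list that a skill quotes on purpose.

```python
import re

AI_TELLS = [
    "delve", "leverage", "tapestry", "it's important to note",
    "in conclusion", "comprehensive", "robust", "seamless",
]

# Whole-word, case-insensitive match; inflected forms such as
# "leveraging" are NOT matched by this simple pattern.
TELL_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in AI_TELLS) + r")\b",
    re.IGNORECASE,
)
QUOTED_RE = re.compile(r'"[^"\n]*"')


def find_ai_tells(text: str, ignore_quoted: bool = True) -> list[tuple[int, str]]:
    """Return (line_number, phrase) for each banned phrase found."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if ignore_quoted:
            line = QUOTED_RE.sub("", line)  # drop quoted examples before scanning
        for match in TELL_RE.finditer(line):
            hits.append((line_no, match.group(1).lower()))
    return hits
```

"Excessive hedging" from the same criterion is a judgment call and is left to the agentic steps.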

### Rubric 5: Portability

| # | Criterion | How to check |
|---|---|---|
| 5.1 | No references to local file paths | Scan for `~/*`, `/Users/*`, and other absolute paths |
| 5.2 | No references to specific user aliases | Scan for @username patterns in logic |
| 5.3 | Works without user memory | Skill doesn't depend on learned context for core logic |
| 5.4 | Tools are standard Quick Desktop tools | No custom MCP or user-specific connectors required |
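
A rough sketch of the path scan in 5.1 follows. The helper name `find_local_paths` is hypothetical, and including `/home/` as an extra absolute-path prefix is an assumption beyond the two patterns named in the table.

```python
import re

# A path prefix counts only when it is not glued to a preceding word,
# slash, or dot, so URL segments like example.com/Users/... are skipped.
LOCAL_PATH_RE = re.compile(r"(?<![\w/.])(?:~/|/Users/|/home/)[^\s`]*")


def find_local_paths(text: str) -> list[tuple[int, str]]:
    """Return (line_number, path) for each local-looking path."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in LOCAL_PATH_RE.finditer(line):
            # Trim trailing sentence punctuation picked up by the match.
            hits.append((line_no, match.group(0).rstrip(".,;:)\"'")))
    return hits
```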

## Output

A structured audit report with:
1. Per-rubric scorecard (criterion-level pass/fail)
2. Overall percentage and golden/not-golden verdict
3. Numbered fix list (highest impact first)
4. Second-order analysis (what's between the lines)
5. Offer to auto-patch gaps

## Lessons Learned

### Do
- Read the actual SKILL.md content — don't guess from the name
- Count criteria precisely — the score should be reproducible
- Be specific in fixes: "Add `version: 1.0.0` to line 2 of frontmatter" not "add version field"
- Apply second-order thinking after the mechanical audit — that's where the real value is
- Check evals directory existence, not just frontmatter

### Don't
- Don't invent criteria not listed above — the rubric is the rubric
- Don't pass a skill that has ❌ on any security criterion (2.6, 2.7, 2.8) — these are hard blockers
- Don't flag personal names in Lessons Learned "Don't" examples if they illustrate a real failure case
- Don't require criteria from Rubric 5 (Portability) if the skill is explicitly personal-use-only

### Common Failures
- Skill uses run_python but doesn't declare `code_first: true`
- Steps say "Mode: deterministic" but don't specify which tool
- Evals directory exists but has fewer than 4 test cases
- Description field is generic ("Does X") instead of pushy ("Use when Y, triggers: Z")
- AI tells scanner flags the rubric's own banned-word list as violations (false positive)
- Credential scanner flags pagination tokens like "startToken" — fix: use word-boundary matching
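
The word-boundary fix in the last item can be illustrated with a small comparison. The keyword list and function name here are hypothetical; a real scanner would likely use a broader set of patterns.

```python
import re

KEYWORDS = r"(?:token|password|secret|api[_-]?key)"

# Naive: substring match, so "startToken" is flagged.
NAIVE_RE = re.compile(KEYWORDS, re.IGNORECASE)
# Bounded: the keyword must stand alone as a word.
# Limitation: "_" is a word character, so names like "access_token"
# are not matched by this pattern either.
BOUNDED_RE = re.compile(r"\b" + KEYWORDS + r"\b", re.IGNORECASE)


def flagged_lines(text: str, pattern: re.Pattern) -> list[int]:
    """Return the 1-based line numbers where the pattern matches."""
    return [
        line_no
        for line_no, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

With the bounded pattern, camelCase pagination fields such as `startToken` or `nextToken` no longer trigger the credential check, while a bare `token = ...` assignment still does.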

### When to Ask the User
- When it is unclear whether the skill is intended for personal use or external publishing (Rubric 5 applicability)
- When a criterion is borderline (e.g., "is this name hardcoded or illustrative?")
- Before auto-patching: confirm user wants changes applied