Skills Pattern

A filesystem-based approach to tool management that can cut tool-definition token overhead by 98% or more while keeping tool selection accurate. Pioneered by Anthropic for Claude Code.

The Problem: Context Bloat from Tools

Traditional function calling requires sending all tool definitions with every request. This creates a critical scaling problem:

50 tools × 3,000 tokens/tool = 150,000 tokens/request

The Skills Pattern solves this by treating tools as files on disk that are loaded on-demand, rather than static definitions passed with every API call.

Key Insight

Instead of stuffing all tool definitions into context, give the agent access to a skills/ directory. The agent reads skill files as needed, just like a developer reads documentation.
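In practice this requires only two filesystem tools exposed to the agent: one to list skills and one to read a skill file. A minimal sketch (the tool names `list_skills` and `read_skill` are illustrative, not part of any specific framework):

```python
from pathlib import Path

def list_skills(skills_dir: str = "skills") -> list[str]:
    """Tool the agent calls to see which skills exist (directory names only)."""
    return sorted(p.name for p in Path(skills_dir).iterdir() if p.is_dir())

def read_skill(name: str, skills_dir: str = "skills") -> str:
    """Tool the agent calls to load one skill's instructions on demand."""
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```

The agent's context only ever contains the skill names plus whichever SKILL.md files it chose to open, rather than every tool definition up front.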

Three Pillars of the Skills Pattern

Skills Pattern Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Skills Pattern                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. FILESYSTEM AS TOOL STORAGE                              │
│     skills/                                                 │
│     ├── web-search/SKILL.md                                 │
│     ├── code-review/SKILL.md                                │
│     └── data-analysis/SKILL.md                              │
│                                                             │
│  2. PROGRESSIVE DISCLOSURE                                  │
│     ┌──────────┐   ┌───────────────┐   ┌──────────────┐    │
│     │ Metadata │ → │ Instructions  │ → │  Examples    │    │
│     │ ~50 tok  │   │ ~1000 tok     │   │ ~2000 tok    │    │
│     └──────────┘   └───────────────┘   └──────────────┘    │
│         ↑                  ↑                  ↑             │
│      Always            On select          If complex        │
│                                                             │
│  3. DATABASE-BACKED DISCOVERY (Optional)                    │
│     ┌─────────────┐                                         │
│     │ Vector DB   │  ← Embed skill descriptions             │
│     │ (Chroma,    │  ← Semantic search for relevance        │
│     │  Qdrant)    │  ← Skip metadata scanning               │
│     └─────────────┘                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

1. Filesystem as Tool Storage

Skills are organized as directories, each containing a SKILL.md file with metadata and instructions. The agent can list, read, and navigate these files using standard filesystem tools.

Skill Directory Structure
skills/
├── web-search/
│   ├── SKILL.md           # Metadata + instructions
│   ├── examples/          # Few-shot examples
│   └── resources/         # Additional context
├── code-review/
│   ├── SKILL.md
│   ├── examples/
│   └── templates/
├── data-analysis/
│   ├── SKILL.md
│   ├── examples/
│   └── schemas/
└── email-composer/
    ├── SKILL.md
    ├── examples/
    └── templates/
from pathlib import Path
from dataclasses import dataclass
import yaml

@dataclass
class SkillMetadata:
    name: str
    description: str
    triggers: list[str]
    tools_required: list[str]

def discover_skills(skills_dir: Path) -> dict[str, SkillMetadata]:
    """Scan filesystem to discover available skills."""
    skills = {}

    for skill_path in skills_dir.iterdir():
        if not skill_path.is_dir():
            continue

        skill_file = skill_path / "SKILL.md"
        if not skill_file.exists():
            continue

        # Parse SKILL.md frontmatter
        content = skill_file.read_text()
        metadata = parse_skill_frontmatter(content)

        skills[skill_path.name] = SkillMetadata(
            name=metadata.get("name", skill_path.name),
            description=metadata.get("description", ""),
            triggers=metadata.get("triggers", []),
            tools_required=metadata.get("tools", [])
        )

    return skills

def parse_skill_frontmatter(content: str) -> dict:
    """Extract YAML frontmatter from SKILL.md."""
    if not content.startswith("---"):
        return {}

    end_idx = content.find("---", 3)
    if end_idx == -1:
        return {}

    frontmatter = content[3:end_idx].strip()
    # safe_load returns None for empty frontmatter; normalize to a dict
    return yaml.safe_load(frontmatter) or {}
using System.Collections.Generic;
using System.IO;
using YamlDotNet.Serialization;

public record SkillMetadata(
    string Name,
    string Description,
    List<string> Triggers,
    List<string> ToolsRequired
);

public class SkillDiscoveryService
{
    private readonly string _skillsDirectory;
    private readonly IDeserializer _yamlDeserializer;

    public SkillDiscoveryService(string skillsDirectory)
    {
        _skillsDirectory = skillsDirectory;
        _yamlDeserializer = new DeserializerBuilder().Build();
    }

    public Dictionary<string, SkillMetadata> DiscoverSkills()
    {
        var skills = new Dictionary<string, SkillMetadata>();

        foreach (var skillDir in Directory.GetDirectories(_skillsDirectory))
        {
            var skillFile = Path.Combine(skillDir, "SKILL.md");
            if (!File.Exists(skillFile))
                continue;

            var content = File.ReadAllText(skillFile);
            var metadata = ParseSkillFrontmatter(content);
            var skillName = Path.GetFileName(skillDir);

            skills[skillName] = new SkillMetadata(
                Name: metadata.GetValueOrDefault("name", skillName)?.ToString() ?? skillName,
                Description: metadata.GetValueOrDefault("description", "")?.ToString() ?? "",
                Triggers: ToStringList(metadata.GetValueOrDefault("triggers")),
                ToolsRequired: ToStringList(metadata.GetValueOrDefault("tools"))
            );
        }

        return skills;
    }

    private Dictionary<string, object> ParseSkillFrontmatter(string content)
    {
        if (!content.StartsWith("---"))
            return new Dictionary<string, object>();

        var endIdx = content.IndexOf("---", 3);
        if (endIdx == -1)
            return new Dictionary<string, object>();

        var frontmatter = content.Substring(3, endIdx - 3).Trim();
        return _yamlDeserializer.Deserialize<Dictionary<string, object>>(frontmatter)
               ?? new Dictionary<string, object>();
    }

    // YamlDotNet deserializes YAML sequences as List<object>; normalize to List<string>
    private static List<string> ToStringList(object value)
    {
        var result = new List<string>();
        if (value is IEnumerable<object> items)
            foreach (var item in items)
                result.Add(item?.ToString() ?? "");
        return result;
    }
}

SKILL.md Format

Each skill's SKILL.md file contains YAML frontmatter (metadata) and markdown body (instructions):

Example SKILL.md Files
---
name: web-search
description: Search the web for current information and facts
version: 1.0.0
author: agent-team

# Triggers help the agent decide when to use this skill
triggers:
  - "search for"
  - "find information about"
  - "what is the latest"
  - "current news on"
  - "look up"

# Tools this skill requires
tools:
  - web_search
  - url_fetch
  - extract_content

# Token cost estimate (for prioritization)
token_estimate: 1200

# Categories for organization
categories:
  - research
  - information-retrieval
---

# Web Search Skill

## Purpose
Use this skill when the user needs current, real-time information
that may not be in your training data.

## When to Use
- Questions about recent events (after your knowledge cutoff)
- Fact-checking claims that need verification
- Finding specific data points (prices, statistics, etc.)
- Research tasks requiring multiple sources

## When NOT to Use
- Questions you can answer from training data
- Hypothetical or opinion-based questions
- Creative writing tasks

## Instructions
1. Formulate a clear search query
2. Execute the web search tool
3. Evaluate result relevance (discard low-quality sources)
4. Fetch full content from top 2-3 results
5. Synthesize information with citations

## Example Interaction
User: "What's the current price of Bitcoin?"
Agent: [Uses web_search with query "bitcoin price USD current"]
Agent: [Fetches top result, extracts price data]
Response: "As of [timestamp], Bitcoin is trading at $XX,XXX USD."
---
name: code-review
description: Review code for bugs, security issues, and improvements
version: 2.1.0

triggers:
  - "review this code"
  - "check for bugs"
  - "security audit"
  - "improve this function"

tools:
  - read_file
  - ast_parse
  - static_analysis

token_estimate: 2500

categories:
  - development
  - security
---

# Code Review Skill

## Purpose
Provide thorough code reviews focusing on correctness,
security, performance, and maintainability.

## Review Checklist
1. **Correctness**: Does the code do what it's supposed to?
2. **Security**: Are there injection, XSS, or auth issues?
3. **Performance**: Any obvious bottlenecks or N+1 queries?
4. **Readability**: Is the code self-documenting?
5. **Testing**: Are edge cases covered?

## Output Format
```
## Summary
[One-line summary of code quality]

## Issues Found
- [SEVERITY] file:line - Description

## Recommendations
- [Priority] Suggestion for improvement
```

## Language-Specific Checks
### Python
- Type hints present and correct
- No mutable default arguments
- Context managers for resources

### JavaScript
- No `var` (use `const`/`let`)
- Proper async/await error handling
- No prototype pollution risks

Best Practice

Include triggers in your skill metadata. These are phrases that help the agent quickly match user requests to relevant skills without reading the full instructions.
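Trigger matching can be as simple as a case-insensitive substring scan over the query. A rough sketch that works with the SkillMetadata records produced by discover_skills (the ranking policy, counting trigger hits, is illustrative):

```python
def match_by_triggers(query: str, skills: dict) -> list[str]:
    """Rank skill names by how many trigger phrases appear in the query.

    `skills` maps skill name -> object with a `.triggers` list
    (e.g. the SkillMetadata dataclass above). Skills with zero hits
    are dropped entirely.
    """
    q = query.lower()
    scored = sorted(
        ((sum(t.lower() in q for t in meta.triggers), name)
         for name, meta in skills.items()),
        reverse=True,
    )
    return [name for hits, name in scored if hits > 0]
```

This is deliberately naive — it misses synonyms and paraphrases — which is exactly why the LLM-selection and vector-search approaches later in this section exist.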

2. Progressive Disclosure

Not all skill information is needed for every request. Progressive disclosure loads context in stages:

| Stage | Content | Tokens | When Loaded |
|-------|---------|--------|-------------|
| 1. Metadata | Name, description, triggers | ~50/skill | Always (for selection) |
| 2. Instructions | Full SKILL.md body | ~500-1000 | After skill selected |
| 3. Resources | Examples, templates, schemas | Variable | Only for complex tasks |

Three stages of progressive skill loading

Progressive Disclosure Implementation
# Three-stage progressive disclosure

# Stage 1: Metadata only (minimal tokens)
function getSkillList():
    skills = []
    for skillDir in listDirectory("skills/"):
        metadata = parseYamlFrontmatter(skillDir + "/SKILL.md")
        skills.append({
            name: metadata.name,
            description: metadata.description,  # Short summary
            triggers: metadata.triggers
        })
    return skills  # ~50-100 tokens per skill

# Stage 2: Full instructions (on-demand)
function loadSkillInstructions(skillName):
    skillFile = "skills/" + skillName + "/SKILL.md"
    return parseMarkdownBody(skillFile)  # ~500-1000 tokens

# Stage 3: Examples and resources (if needed)
function loadSkillResources(skillName):
    examplesDir = "skills/" + skillName + "/examples/"
    resources = []
    for file in listDirectory(examplesDir):
        resources.append(readFile(examplesDir + file))
    return resources  # Variable, loaded only when needed

# Agent workflow
function handleRequest(userQuery):
    # Stage 1: Quick scan with metadata only
    skillList = getSkillList()
    selectedSkill = llm.selectBestSkill(userQuery, skillList)

    if selectedSkill:
        # Stage 2: Load full instructions
        instructions = loadSkillInstructions(selectedSkill)

        # Stage 3: Load examples only if complex task
        if taskIsComplex(userQuery):
            examples = loadSkillResources(selectedSkill)
            context = instructions + examples
        else:
            context = instructions

        return executeWithContext(userQuery, context)

    # No matching skill: answer without loading any skill context
    return respondWithoutSkill(userQuery)
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

class DisclosureLevel(Enum):
    METADATA = 1    # Name, description, triggers (~50 tokens)
    INSTRUCTIONS = 2 # Full SKILL.md body (~500-1000 tokens)
    RESOURCES = 3    # Examples, templates (~variable)

@dataclass
class SkillContext:
    name: str
    level: DisclosureLevel
    content: str
    token_count: int

class ProgressiveSkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills_dir = skills_dir
        self._metadata_cache: dict[str, dict] = {}

    def get_skill_list(self) -> list[dict]:
        """Stage 1: Return minimal metadata for all skills."""
        skills = []
        for skill_path in self.skills_dir.iterdir():
            if not skill_path.is_dir():
                continue

            metadata = self._load_metadata(skill_path.name)
            skills.append({
                "name": metadata["name"],
                "description": metadata["description"][:100],  # Truncate
                "triggers": metadata.get("triggers", [])[:5]   # Limit
            })
        return skills  # Minimal token footprint

    def load_instructions(self, skill_name: str) -> SkillContext:
        """Stage 2: Load full instructions on demand."""
        skill_file = self.skills_dir / skill_name / "SKILL.md"
        content = skill_file.read_text()

        # Extract body after frontmatter
        body = self._extract_body(content)

        return SkillContext(
            name=skill_name,
            level=DisclosureLevel.INSTRUCTIONS,
            content=body,
            token_count=self._estimate_tokens(body)
        )

    def load_resources(self, skill_name: str) -> SkillContext:
        """Stage 3: Load examples and additional resources."""
        examples_dir = self.skills_dir / skill_name / "examples"

        resources = []
        if examples_dir.exists():
            for example_file in examples_dir.iterdir():
                resources.append(example_file.read_text())

        combined = "\n---\n".join(resources)

        return SkillContext(
            name=skill_name,
            level=DisclosureLevel.RESOURCES,
            content=combined,
            token_count=self._estimate_tokens(combined)
        )

    def _load_metadata(self, skill_name: str) -> dict:
        if skill_name not in self._metadata_cache:
            skill_file = self.skills_dir / skill_name / "SKILL.md"
            content = skill_file.read_text()
            self._metadata_cache[skill_name] = parse_skill_frontmatter(content)
        return self._metadata_cache[skill_name]

    def _extract_body(self, content: str) -> str:
        """Return the markdown body after the YAML frontmatter block."""
        if content.startswith("---"):
            end_idx = content.find("---", 3)
            if end_idx != -1:
                return content[end_idx + 3:].strip()
        return content

    def _estimate_tokens(self, text: str) -> int:
        """Rough estimate: ~4 characters per token for English text."""
        return len(text) // 4

# Usage in agent
class SkillAwareAgent:
    def __init__(self, loader: ProgressiveSkillLoader, llm):
        self.loader = loader
        self.llm = llm

    def process(self, query: str) -> str:
        # Stage 1: Select skill from metadata
        skill_list = self.loader.get_skill_list()
        selected = self.llm.select_skill(query, skill_list)

        if not selected:
            return self.llm.respond_without_skill(query)

        # Stage 2: Load instructions
        context = self.loader.load_instructions(selected)

        # Stage 3: Load examples for complex tasks
        if self.is_complex_task(query):
            resources = self.loader.load_resources(selected)
            context.content += "\n\n" + resources.content

        return self.llm.respond_with_context(query, context.content)
public enum DisclosureLevel
{
    Metadata = 1,     // ~50 tokens per skill
    Instructions = 2, // ~500-1000 tokens
    Resources = 3     // Variable
}

public record SkillContext(
    string Name,
    DisclosureLevel Level,
    string Content,
    int TokenCount
);

public class ProgressiveSkillLoader
{
    private readonly string _skillsDir;
    private readonly Dictionary<string, Dictionary<string, object>> _metadataCache = new();

    public ProgressiveSkillLoader(string skillsDir)
    {
        _skillsDir = skillsDir;
    }

    /// <summary>
    /// Stage 1: Get minimal metadata for skill selection.
    /// </summary>
    public List<SkillSummary> GetSkillList()
    {
        var skills = new List<SkillSummary>();

        foreach (var skillDir in Directory.GetDirectories(_skillsDir))
        {
            var skillName = Path.GetFileName(skillDir);
            var metadata = LoadMetadata(skillName);

            skills.Add(new SkillSummary(
                Name: metadata.GetValueOrDefault("name", skillName)?.ToString() ?? skillName,
                Description: TruncateString(
                    metadata.GetValueOrDefault("description", "")?.ToString() ?? "",
                    100
                ),
                Triggers: GetTriggers(metadata).Take(5).ToList()
            ));
        }

        return skills;
    }

    /// <summary>
    /// Stage 2: Load full instructions for selected skill.
    /// </summary>
    public SkillContext LoadInstructions(string skillName)
    {
        var skillFile = Path.Combine(_skillsDir, skillName, "SKILL.md");
        var content = File.ReadAllText(skillFile);
        var body = ExtractBody(content);

        return new SkillContext(
            Name: skillName,
            Level: DisclosureLevel.Instructions,
            Content: body,
            TokenCount: EstimateTokens(body)
        );
    }

    /// <summary>
    /// Stage 3: Load examples and additional resources.
    /// </summary>
    public SkillContext LoadResources(string skillName)
    {
        var examplesDir = Path.Combine(_skillsDir, skillName, "examples");
        var resources = new List<string>();

        if (Directory.Exists(examplesDir))
        {
            foreach (var file in Directory.GetFiles(examplesDir))
            {
                resources.Add(File.ReadAllText(file));
            }
        }

        var combined = string.Join("\n---\n", resources);

        return new SkillContext(
            Name: skillName,
            Level: DisclosureLevel.Resources,
            Content: combined,
            TokenCount: EstimateTokens(combined)
        );
    }
}

// Agent usage
public class SkillAwareAgent
{
    private readonly ProgressiveSkillLoader _loader;
    private readonly ILanguageModel _llm;

    public async Task<string> ProcessAsync(string query)
    {
        // Stage 1: Quick skill selection
        var skillList = _loader.GetSkillList();
        var selected = await _llm.SelectSkillAsync(query, skillList);

        if (selected == null)
            return await _llm.RespondAsync(query);

        // Stage 2: Load instructions
        var context = _loader.LoadInstructions(selected);

        // Stage 3: Load examples if needed
        if (IsComplexTask(query))
        {
            var resources = _loader.LoadResources(selected);
            context = context with {
                Content = context.Content + "\n\n" + resources.Content
            };
        }

        return await _llm.RespondWithContextAsync(query, context.Content);
    }
}

Token Savings Analysis

Calculating Token Savings
# Traditional approach: All tools in context
# 50 tools × 3000 tokens/tool = 150,000 tokens per request

# Skills Pattern: Progressive disclosure
# Stage 1: Metadata only
#   50 skills × 50 tokens = 2,500 tokens
#
# Stage 2: Selected skill instructions
#   1 skill × 1,000 tokens = 1,000 tokens
#
# Stage 3: Examples (if needed)
#   1 skill × 2,000 tokens = 2,000 tokens
#
# Total: 2,500 + 1,000 + 2,000 = 5,500 tokens
#
# Savings: 150,000 - 5,500 = 144,500 tokens (96% reduction)

# Even better with vector search:
# Skip Stage 1 entirely - query embeddings directly
# Stage 2 + 3 only: 3,000 tokens
#
# Savings: 150,000 - 3,000 = 147,000 tokens (98% reduction)
def calculate_token_savings(
    num_skills: int,
    avg_tool_tokens: int = 3000,
    metadata_tokens: int = 50,
    instruction_tokens: int = 1000,
    example_tokens: int = 2000
) -> dict:
    """Calculate token savings from skills pattern."""

    # Traditional approach
    traditional = num_skills * avg_tool_tokens

    # Skills pattern with progressive disclosure
    stage1_metadata = num_skills * metadata_tokens
    stage2_instructions = instruction_tokens  # Only selected skill
    stage3_examples = example_tokens  # Only if needed

    # Most requests only need stages 1 + 2
    typical_request = stage1_metadata + stage2_instructions
    complex_request = typical_request + stage3_examples

    # Vector search approach (skip metadata scan)
    vector_approach = stage2_instructions + stage3_examples

    return {
        "traditional_tokens": traditional,
        "progressive_typical": typical_request,
        "progressive_complex": complex_request,
        "vector_search": vector_approach,
        "savings_typical": f"{(1 - typical_request/traditional) * 100:.1f}%",
        "savings_vector": f"{(1 - vector_approach/traditional) * 100:.1f}%"
    }

# Example with 50 skills
result = calculate_token_savings(num_skills=50)
print(result)
# {
#     "traditional_tokens": 150000,
#     "progressive_typical": 3500,    # Metadata + instructions
#     "progressive_complex": 5500,    # + examples
#     "vector_search": 3000,          # Instructions + examples
#     "savings_typical": "97.7%",
#     "savings_vector": "98.0%"
# }
public record TokenSavingsReport(
    int TraditionalTokens,
    int ProgressiveTypical,
    int ProgressiveComplex,
    int VectorSearch,
    string SavingsTypical,
    string SavingsVector
);

public static class TokenCalculator
{
    public static TokenSavingsReport CalculateSavings(
        int numSkills,
        int avgToolTokens = 3000,
        int metadataTokens = 50,
        int instructionTokens = 1000,
        int exampleTokens = 2000)
    {
        // Traditional: all tools in context
        var traditional = numSkills * avgToolTokens;

        // Progressive disclosure
        var stage1 = numSkills * metadataTokens;
        var stage2 = instructionTokens;
        var stage3 = exampleTokens;

        var typical = stage1 + stage2;
        var complex = typical + stage3;

        // Vector search (skip metadata scan)
        var vector = stage2 + stage3;

        return new TokenSavingsReport(
            TraditionalTokens: traditional,
            ProgressiveTypical: typical,
            ProgressiveComplex: complex,
            VectorSearch: vector,
            SavingsTypical: $"{(1 - (double)typical / traditional) * 100:F1}%",
            SavingsVector: $"{(1 - (double)vector / traditional) * 100:F1}%"
        );
    }
}

// Usage
var report = TokenCalculator.CalculateSavings(numSkills: 50);
Console.WriteLine($"Traditional: {report.TraditionalTokens:N0} tokens");
Console.WriteLine($"Progressive: {report.ProgressiveTypical:N0} tokens");
Console.WriteLine($"Savings: {report.SavingsTypical}");

Trade-off

Progressive disclosure adds latency (extra LLM calls for skill selection). For time-critical applications, consider pre-loading frequently used skills or using vector search for instant matching.
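One way to pre-load hot skills is a small in-memory cache that pins the instructions of the most frequently selected skills, so repeat requests skip the disk read. A sketch assuming a loader with the load_instructions method shown above; the pinning policy (top-N by usage count) is illustrative:

```python
from collections import Counter

class HotSkillCache:
    """Pin the N most frequently selected skills' instructions in memory."""

    def __init__(self, loader, top_n: int = 5):
        self.loader = loader          # e.g. a ProgressiveSkillLoader
        self.top_n = top_n
        self.usage = Counter()        # selection counts per skill
        self.cache: dict[str, str] = {}

    def instructions_for(self, skill_name: str) -> str:
        self.usage[skill_name] += 1
        if skill_name in self.cache:
            return self.cache[skill_name]
        content = self.loader.load_instructions(skill_name).content
        # Pin only if this skill is currently among the hottest
        if skill_name in dict(self.usage.most_common(self.top_n)):
            self.cache[skill_name] = content
        return content
```

In a long-running agent process this removes the filesystem round-trip for the common case while leaving cold skills on disk.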

3. Database-Backed Tool Discovery

For large skill libraries (50+), scanning metadata files becomes slow. Vector databases enable instant semantic search:

Vector-Based Skill Discovery
User Query: "help me analyze this spreadsheet"
                    │
                    ▼
            ┌───────────────┐
            │ Embed Query   │
            │ (384-dim vec) │
            └───────────────┘
                    │
                    ▼
      ┌─────────────────────────┐
      │     Vector Database     │
      │  ┌─────────────────┐   │
      │  │ data-analysis   │●──┼── 0.92 similarity
      │  │ visualization   │●──┼── 0.78 similarity
      │  │ web-search      │●──┼── 0.31 similarity
      │  │ code-review     │●──┼── 0.22 similarity
      │  └─────────────────┘   │
      └─────────────────────────┘
                    │
                    ▼
        Top match: data-analysis
        Load: skills/data-analysis/SKILL.md
Vector Database Implementation
# Vector database approach for skill discovery

function indexSkills(skills):
    for skill in skills:
        # Create embedding from skill description + triggers
        text = skill.name + ": " + skill.description
        text += " Triggers: " + join(skill.triggers, ", ")

        embedding = embedModel.encode(text)

        vectorDb.upsert(
            id: skill.name,
            vector: embedding,
            metadata: {
                name: skill.name,
                description: skill.description,
                tools: skill.tools,
                token_estimate: skill.tokenEstimate
            }
        )

function findRelevantSkills(query, topK = 3):
    queryEmbedding = embedModel.encode(query)

    results = vectorDb.search(
        vector: queryEmbedding,
        topK: topK,
        threshold: 0.7  # Minimum similarity
    )

    return results.map(r => r.metadata)

# Hybrid approach: Vector + keyword fallback
function findSkillsHybrid(query):
    # Try vector search first
    vectorResults = findRelevantSkills(query, topK = 5)

    if vectorResults.isEmpty() or vectorResults[0].score < 0.75:
        # Fall back to keyword matching
        keywordResults = keywordSearch(query)
        return mergeResults(vectorResults, keywordResults)

    return vectorResults
import chromadb
from chromadb.utils import embedding_functions
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SkillMatch:
    name: str
    description: str
    score: float
    tools: list[str]

class VectorSkillDiscovery:
    def __init__(self, persist_dir: str = "./skill_vectors"):
        self.client = chromadb.PersistentClient(path=persist_dir)

        # Use sentence-transformers for embeddings
        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="all-MiniLM-L6-v2"
        )

        self.collection = self.client.get_or_create_collection(
            name="skills",
            embedding_function=self.embedding_fn,
            metadata={"hnsw:space": "cosine"}
        )

    def index_skill(self, skill: dict) -> None:
        """Add or update a skill in the vector database."""
        # Create rich text representation for embedding
        text = f"{skill['name']}: {skill['description']}"
        if skill.get('triggers'):
            text += f" Triggers: {', '.join(skill['triggers'])}"

        self.collection.upsert(
            ids=[skill['name']],
            documents=[text],
            metadatas=[{
                "name": skill['name'],
                "description": skill['description'],
                "tools": ",".join(skill.get('tools', [])),
                "token_estimate": skill.get('token_estimate', 0)
            }]
        )

    def find_skills(
        self,
        query: str,
        top_k: int = 3,
        min_score: float = 0.5
    ) -> list[SkillMatch]:
        """Find most relevant skills for a query."""
        results = self.collection.query(
            query_texts=[query],
            n_results=top_k,
            include=["documents", "metadatas", "distances"]
        )

        matches = []
        for i, distance in enumerate(results['distances'][0]):
            # Convert distance to similarity score
            score = 1 - distance

            if score < min_score:
                continue

            metadata = results['metadatas'][0][i]
            matches.append(SkillMatch(
                name=metadata['name'],
                description=metadata['description'],
                score=score,
                tools=metadata['tools'].split(',') if metadata['tools'] else []
            ))

        return matches

    def reindex_all(self, skills_dir: Path) -> int:
        """Reindex all skills from filesystem."""
        count = 0
        for skill_path in skills_dir.iterdir():
            if not skill_path.is_dir():
                continue

            skill_file = skill_path / "SKILL.md"
            if not skill_file.exists():
                continue

            metadata = parse_skill_frontmatter(skill_file.read_text())
            metadata['name'] = skill_path.name
            self.index_skill(metadata)
            count += 1

        return count

# Usage
discovery = VectorSkillDiscovery()
discovery.reindex_all(Path("./skills"))

# At runtime
matches = discovery.find_skills("help me analyze this CSV data")
# Returns: [SkillMatch(name='data-analysis', score=0.89, ...)]
using Qdrant.Client;
using Qdrant.Client.Grpc;
using Microsoft.ML.OnnxRuntime;

public record SkillMatch(
    string Name,
    string Description,
    float Score,
    List<string> Tools
);

public class VectorSkillDiscovery
{
    private readonly QdrantClient _qdrant;
    private readonly EmbeddingModel _embedder;
    private const string CollectionName = "skills";

    public VectorSkillDiscovery(string qdrantHost = "localhost", int qdrantPort = 6334)
    {
        // QdrantClient takes a host and gRPC port, not a URL string
        _qdrant = new QdrantClient(qdrantHost, qdrantPort);
        _embedder = new EmbeddingModel("all-MiniLM-L6-v2.onnx");

        EnsureCollectionExists().Wait();
    }

    private async Task EnsureCollectionExists()
    {
        var collections = await _qdrant.ListCollectionsAsync();
        if (!collections.Contains(CollectionName))
        {
            await _qdrant.CreateCollectionAsync(
                CollectionName,
                new VectorParams { Size = 384, Distance = Distance.Cosine }
            );
        }
    }

    public async Task IndexSkillAsync(Dictionary<string, object> skill)
    {
        var name = skill["name"].ToString()!;
        var description = skill["description"].ToString()!;
        var triggers = skill.GetValueOrDefault("triggers") as List<string> ?? new();

        // Create text for embedding
        var text = $"{name}: {description} Triggers: {string.Join(", ", triggers)}";
        var embedding = _embedder.Encode(text);

        // Derive a stable id from the skill name so re-indexing upserts in place
        // instead of creating a duplicate point per run
        var stableId = new Guid(System.Security.Cryptography.MD5.HashData(
            System.Text.Encoding.UTF8.GetBytes(name)));

        var point = new PointStruct
        {
            Id = new PointId { Uuid = stableId.ToString() },
            Vectors = embedding,
            Payload = {
                ["name"] = name,
                ["description"] = description,
                ["tools"] = string.Join(",", skill.GetValueOrDefault("tools") as List<string> ?? new()),
                ["token_estimate"] = (long)(skill.GetValueOrDefault("token_estimate") ?? 0)
            }
        };

        await _qdrant.UpsertAsync(CollectionName, new[] { point });
    }

    public async Task<List<SkillMatch>> FindSkillsAsync(
        string query,
        int topK = 3,
        float minScore = 0.5f)
    {
        var queryEmbedding = _embedder.Encode(query);

        var results = await _qdrant.SearchAsync(
            CollectionName,
            queryEmbedding,
            limit: (ulong)topK,
            scoreThreshold: minScore
        );

        return results.Select(r => new SkillMatch(
            Name: r.Payload["name"].StringValue,
            Description: r.Payload["description"].StringValue,
            Score: r.Score,
            Tools: r.Payload["tools"].StringValue
                .Split(',', StringSplitOptions.RemoveEmptyEntries)
                .ToList()
        )).ToList();
    }
}

// Hybrid discovery combining vector + keyword
public class HybridSkillDiscovery
{
    private readonly VectorSkillDiscovery _vectorSearch;
    private readonly KeywordSkillDiscovery _keywordSearch;

    public async Task<List<SkillMatch>> FindSkillsAsync(string query)
    {
        // Try vector search first
        var vectorResults = await _vectorSearch.FindSkillsAsync(query, topK: 5);

        // If low confidence, supplement with keyword search
        if (!vectorResults.Any() || vectorResults[0].Score < 0.75f)
        {
            var keywordResults = _keywordSearch.Search(query);
            return MergeResults(vectorResults, keywordResults);
        }

        return vectorResults;
    }

    private List<SkillMatch> MergeResults(
        List<SkillMatch> vector,
        List<SkillMatch> keyword)
    {
        // Reciprocal rank fusion
        var scores = new Dictionary<string, float>();

        for (int i = 0; i < vector.Count; i++)
            scores[vector[i].Name] = scores.GetValueOrDefault(vector[i].Name) + 1f / (i + 1);

        for (int i = 0; i < keyword.Count; i++)
            scores[keyword[i].Name] = scores.GetValueOrDefault(keyword[i].Name) + 1f / (i + 1);

        return scores
            .OrderByDescending(kv => kv.Value)
            .Take(5)
            .Select(kv => vector.Concat(keyword).First(m => m.Name == kv.Key))
            .ToList();
    }
}

Discovery Approaches Compared

| Approach        | Pros                          | Cons                     | Best For           |
|-----------------|-------------------------------|--------------------------|--------------------|
| Keyword/Trigger | Simple, fast, no dependencies | Misses synonyms, brittle | <20 skills         |
| LLM Selection   | Understands intent            | Extra API call, latency  | 20-50 skills       |
| Vector Search   | Semantic matching, fast       | Requires embedding model | 50+ skills         |
| Hybrid          | Best accuracy                 | Most complex             | Production systems |
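For small skill sets, the keyword/trigger approach can be only a few lines. A minimal sketch, where `SkillMeta` and its `triggers` field are illustrative stand-ins for metadata parsed from SKILL.md front matter:

```python
from dataclasses import dataclass

@dataclass
class SkillMeta:
    name: str
    triggers: list[str]  # e.g. parsed from SKILL.md front matter

def keyword_find_skills(
    query: str, skills: list[SkillMeta], top_k: int = 3
) -> list[SkillMeta]:
    """Rank skills by how many of their trigger words appear in the query."""
    words = set(query.lower().split())
    scored = [
        (sum(1 for t in s.triggers if t.lower() in words), s)
        for s in skills
    ]
    # Drop non-matches, then sort by overlap count (highest first)
    matches = [(score, s) for score, s in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in matches[:top_k]]
```

The brittleness noted in the table is visible here: "find articles online" would miss a `web-search` skill whose triggers are `search`, `web`, and `news`.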

Evaluation Approach

Measuring skill discovery quality requires testing both accuracy and efficiency:

| Metric               | What it Measures                      | Target |
|----------------------|---------------------------------------|--------|
| Precision@1          | Is the top result the right skill?    | >90%   |
| Precision@3          | How many of the top 3 are relevant?   | >80%   |
| Mean Reciprocal Rank | How high is the correct skill ranked? | >0.85  |
| Latency              | Time to find relevant skill(s)        | <50ms  |
| False Positive Rate  | Skills selected but not relevant      | <5%    |

Key metrics for skill discovery evaluation

Skill Discovery Evaluation
# Skill discovery evaluation metrics

function evaluateSkillDiscovery(testCases, discoverySystem):
    metrics = {
        precision_at_1: [],
        precision_at_3: [],
        recall_at_3: [],
        latency: [],
        false_positives: 0,
        false_negatives: 0
    }

    for testCase in testCases:
        query = testCase.query
        expectedSkills = testCase.relevantSkills

        startTime = now()
        results = discoverySystem.findSkills(query, topK = 3)
        metrics.latency.append(now() - startTime)

        # Precision@1: Is the top result correct?
        if results[0].name in expectedSkills:
            metrics.precision_at_1.append(1)
        else:
            metrics.precision_at_1.append(0)
            metrics.false_positives += 1

        # Precision@3: How many of top 3 are relevant?
        relevant_in_top3 = count(r for r in results[:3] if r.name in expectedSkills)
        metrics.precision_at_3.append(relevant_in_top3 / 3)

        # Recall@3: Did we find all relevant skills?
        metrics.recall_at_3.append(relevant_in_top3 / len(expectedSkills))

        # False negatives: relevant skills not in top 3
        metrics.false_negatives += len(expectedSkills) - relevant_in_top3

    return {
        avg_precision_at_1: mean(metrics.precision_at_1),
        avg_precision_at_3: mean(metrics.precision_at_3),
        avg_recall_at_3: mean(metrics.recall_at_3),
        avg_latency_ms: mean(metrics.latency) * 1000,
        total_false_positives: metrics.false_positives,
        total_false_negatives: metrics.false_negatives
    }
A typed Python implementation of the same metrics:

from dataclasses import dataclass
from typing import Protocol
import time

@dataclass
class SkillMatch:
    name: str
    score: float

@dataclass
class TestCase:
    query: str
    relevant_skills: set[str]

@dataclass
class EvaluationResult:
    precision_at_1: float
    precision_at_3: float
    recall_at_3: float
    avg_latency_ms: float
    mrr: float  # Mean Reciprocal Rank

class SkillDiscovery(Protocol):
    def find_skills(self, query: str, top_k: int) -> list[SkillMatch]: ...

def evaluate_skill_discovery(
    test_cases: list[TestCase],
    discovery: SkillDiscovery
) -> EvaluationResult:
    """Evaluate skill discovery accuracy and performance."""

    p1_scores, p3_scores, recall_scores = [], [], []
    latencies, reciprocal_ranks = [], []

    for case in test_cases:
        # Measure latency
        start = time.perf_counter()
        results = discovery.find_skills(case.query, top_k=3)
        latencies.append((time.perf_counter() - start) * 1000)

        result_names = [r.name for r in results]

        # Precision@1
        p1_scores.append(1.0 if result_names and result_names[0] in case.relevant_skills else 0.0)

        # Precision@3
        hits = sum(1 for r in result_names[:3] if r in case.relevant_skills)
        p3_scores.append(hits / 3)

        # Recall@3
        recall_scores.append(hits / len(case.relevant_skills))

        # Mean Reciprocal Rank
        for i, name in enumerate(result_names):
            if name in case.relevant_skills:
                reciprocal_ranks.append(1.0 / (i + 1))
                break
        else:
            reciprocal_ranks.append(0.0)

    return EvaluationResult(
        precision_at_1=sum(p1_scores) / len(p1_scores),
        precision_at_3=sum(p3_scores) / len(p3_scores),
        recall_at_3=sum(recall_scores) / len(recall_scores),
        avg_latency_ms=sum(latencies) / len(latencies),
        mrr=sum(reciprocal_ranks) / len(reciprocal_ranks)
    )

# Example test suite
test_cases = [
    TestCase(
        query="search the web for recent news about AI",
        relevant_skills={"web-search", "news-aggregator"}
    ),
    TestCase(
        query="review this Python code for bugs",
        relevant_skills={"code-review"}
    ),
    TestCase(
        query="analyze this CSV and create a chart",
        relevant_skills={"data-analysis", "visualization"}
    ),
]

# Run evaluation
results = evaluate_skill_discovery(test_cases, vector_discovery)
print(f"P@1: {results.precision_at_1:.2%}")
print(f"MRR: {results.mrr:.3f}")
print(f"Latency: {results.avg_latency_ms:.1f}ms")

Real-World Example: Claude Code

Anthropic's Claude Code uses a variant of the Skills Pattern to manage its extensive toolset:

How Claude Code Uses Skills

  1. Skills are stored as markdown files in a .claude/skills/ directory
  2. The agent can list, read, and search skill files
  3. Skills include triggers, instructions, and examples
  4. New skills can be added by users without redeploying

User-Extensible Skills

A key benefit of filesystem-based skills: users can add custom skills by creating new SKILL.md files. No code changes or redeployment required.
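This extensibility falls out of the directory layout: discovering a new skill is just a filesystem scan. A minimal sketch, assuming each SKILL.md opens with a YAML front-matter block between `---` lines; the flat `key: value` parser is illustrative, and a real implementation would use a YAML library:

```python
from pathlib import Path

def load_skill_metadata(skill_file: Path) -> dict:
    """Parse the front-matter block (between '---' lines) of a SKILL.md."""
    meta = {}
    lines = skill_file.read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def discover_skills(skills_dir: Path) -> dict[str, dict]:
    """Scan skills/ for */SKILL.md; new skills appear with no redeploy."""
    return {
        f.parent.name: load_skill_metadata(f)
        for f in skills_dir.glob("*/SKILL.md")
    }
```

Dropping a new `my-skill/SKILL.md` into the directory makes it visible on the next scan, with no code change.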

Common Pitfalls

Overly Generic Triggers

Triggers like "help me" or "do this" match everything. Use specific action verbs and domain terms.

Missing Negative Examples

Skills should document when NOT to use them. This prevents false matches on similar-sounding queries.
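One way to document negative examples is a dedicated field in the skill's front matter. The `not_for` field name and the skill names referenced here are illustrative, not a fixed schema:

```yaml
name: code-review
description: Review existing code for bugs, style, and security issues
triggers: [review, audit, critique]
not_for:
  - "Writing new code from scratch (use a code-generation skill)"
  - "Explaining what code does (use a code-explanation skill)"
```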

Stale Embeddings

When using vector search, re-embed skills whenever their files change; otherwise the index keeps matching against outdated descriptions. Hash-based change detection makes this cheap.
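Hash-based change detection can be sketched as follows: store a content hash alongside each embedding, and re-embed only files whose hash has drifted. The storage of `stored_hashes` (e.g. in the vector DB payload) is left abstract here:

```python
import hashlib
from pathlib import Path

def skill_content_hash(skill_file: Path) -> str:
    """Stable hash of a skill file's content, stored beside its embedding."""
    return hashlib.sha256(skill_file.read_bytes()).hexdigest()

def skills_needing_reembed(
    skills_dir: Path, stored_hashes: dict[str, str]
) -> list[Path]:
    """Return SKILL.md files changed (or added) since their last embedding."""
    stale = []
    for f in sorted(skills_dir.glob("*/SKILL.md")):
        if stored_hashes.get(f.parent.name) != skill_content_hash(f):
            stale.append(f)
    return stale
```

Running this on every startup (or on a filesystem watcher event) keeps embeddings in sync without re-embedding the whole skill set.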

Too Many Small Skills

Prefer fewer, well-documented skills over many tiny ones. Each skill selection adds cognitive load for the agent.

Implementation Checklist

  1. Create a skills/ directory structure with SKILL.md files
  2. Define a YAML schema for skill metadata (name, description, triggers, tools)
  3. Implement skill discovery (start with keywords; upgrade to vector search as needed)
  4. Add progressive loading: metadata → instructions → examples
  5. Create test cases with ground-truth skill mappings
  6. Measure token savings and discovery accuracy
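The progressive-loading step of the checklist can be sketched as a tiered loader. This assumes SKILL.md files use YAML front matter followed by `## Instructions` and `## Examples` headings, which is an illustrative layout rather than a fixed format:

```python
from pathlib import Path

def load_skill(skill_file: Path, level: str = "metadata") -> str:
    """Return progressively more of a skill file.

    metadata (~50 tokens) -> instructions (~1000) -> examples (~2000),
    matching the three tiers of progressive disclosure.
    """
    text = skill_file.read_text()
    if level == "metadata":
        # Front matter only: enough to decide whether the skill is relevant
        end = text.find("---", 3)
        return text[:end + 3] if end != -1 else text
    if level == "instructions":
        # Everything up to the examples section
        cut = text.find("## Examples")
        return text[:cut] if cut != -1 else text
    return text  # full file, examples included
```

The agent starts every skill at the `metadata` tier and escalates only for the skill it actually selects, which is where the bulk of the token savings comes from.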

Related Topics