Catalog · Workflow & Primitives

Volume 02

The Claude Skills Catalog

Volume 02 of the Agentic AI Series

33 patterns draft-v0.1 2026-05 Workflow & Primitives

A Source Catalog

Draft v0.1

May 2026

Table of Contents

About This Catalog

Anthropic published the Skills feature in October 2025 and opened the SKILL.md format as an open standard at agentskills.io in December 2025. Within six months the GitHub ecosystem contained tens of thousands of community skills, several major repositories with hundreds of curated entries, and a small number of corporate skill libraries that have become reference works in their own right. This catalog organizes that landscape.

The form is consciously borrowed from the Patterns of AI Agent Workflows catalog: each entry has a stable name, a one-line intent, a sketch (for the foundational narrative chapters; individual skill entries are text-only), a description of the motivating problem, a How It Works section, a When to Use It section, Sources naming the canonical origin, and Example artifacts --- invocation prompts, SKILL.md sketches, or agent code as appropriate. The goal is the same: a vocabulary for talking about how skills work, where they come from, and when to reach for which.

This is a catalog of sources --- a curated tour through the seven most consequential skill repositories as of May 2026 --- not an attempt at exhaustive coverage. Anyone who wants 1,000+ skills can use the marketplaces and registries (SkillsMP, agentskill.sh, the awesome-claude-skills lists, the n-skills curated marketplace). This catalog covers the repositories that define how skills are designed, packaged, and composed, with three to seven representative skills from each.

Scope

Coverage:

  • Anthropic’s official skills (anthropics/skills) and developer cookbooks (anthropics/claude-cookbooks)

  • Posit’s data-science skills (posit-dev/skills), the largest corporate community contribution outside Anthropic

  • Seth Hobson’s wshobson/agents repository, the largest engineering-focused plugin ecosystem (153 skills, 80 plugins as of May 2026)

  • Numman Ali’s openskills CLI (numman-ali/openskills), the cross-agent universal installer

  • Junghan’s Denote-Org skills (junghan0611/org-mode-skills), a PKM-focused repository extending Anthropic’s Life Sciences paradigm to personal knowledge management

  • Matt Pocock’s personal skill collection (mattpocock/skills), the most-trended community repository in early 2026

  • Grid Dynamics’ Rosetta (griddynamics/rosetta), an enterprise-grade context engineering and meta-prompting platform that delivers structured context via MCP

Out of scope:

  • Slash commands, subagents, plugins, and hooks except where they intersect with skills (Chapter 3 disambiguates).

  • MCP servers as standalone artifacts --- covered only where a repository uses MCP as the delivery mechanism for skills (Rosetta).

  • Domain-specific niche repositories (legal, scientific, bioinformatics, marketing) --- these exist in great numbers but the structural lessons in this catalog apply equally to them.

How to read this catalog

Part 1 (“The Narratives”) is conceptual orientation: what a skill is, how progressive disclosure works, how skills differ from the neighboring concepts (MCP, subagents, slash commands), and what the larger ecosystem looks like. Five diagrams sit in Part 1; everything in Part 2 is text and code.

Part 2 (“The Sources”) is reference material organized by repository. Each section opens with a short essay on what the repository is and why it matters, then presents representative skills using a consistent template. The entries are not meant to be read front-to-back; jump in via the table of contents to whatever matches the task at hand.

Part 1 — The Narratives

The five short essays below frame the design space for skills. The repository entries in Part 2 assume the vocabulary established here.

Chapter 1. What a Skill Is

A skill is a folder. Inside the folder there is exactly one required file --- SKILL.md --- and zero or more optional sub-directories. The SKILL.md file has YAML frontmatter (a name and a description, both required) followed by a Markdown body. The body is plain English instructions written to be read by Claude. The optional sub-directories carry executable code (scripts/), reference documents loaded into context on demand (references/), and templates or other files used in output (assets/).

Skill folder anatomy
A skill is a folder with a SKILL.md and optional bundled resources.

That is the entire specification. There is nothing else. A skill could be 50 words in a SKILL.md and that is enough. The complete Anthropic docx skill that powers Claude’s native Word document generation is a SKILL.md plus a handful of Python scripts and reference files --- roughly the same shape, scaled up.

The frontmatter’s description field carries unusual weight: it is the only signal Claude has at session start for deciding whether the skill is relevant to a given task. Conventional README-style descriptions (“Does X for Y”) underperform. Anthropic’s own skill-creator skill recommends writing descriptions that are explicitly “pushy” --- enumerating the trigger phrases and task shapes that should activate the skill, and naming the cases where the skill should NOT be invoked. The description is a classifier prompt, not a tagline.

Chapter 2. Progressive Disclosure

The single design idea that makes skills work at scale is progressive disclosure. Skill content loads in three tiers, each with a different cost:

Progressive disclosure across the three tiers
Three load tiers — metadata, body, resources — keep many skills available without burning context.
  • Stage 1: metadata. Name + description, roughly 100 tokens per skill. Always loaded at session start. A catalog of 100 skills costs 5,000—10,000 tokens of context to expose.

  • Stage 2: SKILL.md body. Typically under 5,000 tokens. Loaded only when Claude decides, based on the description, that the skill is relevant to the current task.

  • Stage 3: bundled resources. scripts/, references/, assets/ --- any size. Loaded on demand when the SKILL.md body directs Claude to a specific file.

Without progressive disclosure, exposing 100 skills to a model would require putting all of their content into the system prompt, which would cost half a million tokens or more. With progressive disclosure, the same catalog costs a few thousand tokens of overhead, and the marginal cost of adding a new skill is roughly 100 tokens. This is what makes skills scale to a marketplace.

It also imposes design constraints. SKILL.md bodies should stay under ~500 lines (Anthropic’s skill-creator skill enforces this). Domain-specific detail belongs in references/ files that the body points to with explicit instructions about when to read them. A skill that grows past the 500-line ceiling should be split or refactored to load detail lazily.

Chapter 3. Skills vs. MCP, Subagents, and Slash Commands

Skills sit in a crowded conceptual neighborhood. The four nearest neighbors are MCP servers, subagents, slash commands, and “prompts” in the broad sense. The distinctions matter because the four mechanisms have different cost profiles, different failure modes, and different operational characteristics.

Skills vs. MCP, Subagents, Slash Commands
Two axes — invocation (model vs. user) and artifact (static vs. dynamic) — cleanly separate the four.
  • Skills are static and model-invoked. The model decides when to activate one, based on the description. The skill’s content is files; nothing runs server-side.

  • MCP servers are dynamic and model-invoked. The model calls a server’s tools at runtime; the server can carry state and reach external systems.

  • Slash commands are static and user-invoked. The user types /command in chat; the command’s full instructions are injected into the next turn.

  • Subagents are user- (or parent-agent-) invoked and run autonomously with their own context window once spawned.

Skills and MCP are complementary, not substitutes. A skill might instruct Claude on how to use a particular MCP server’s tools; an MCP server might surface skills as resources. In practice production agent stacks use all four mechanisms together --- slash commands as user-facing entry points, skills for repeatable workflows, MCP for live data and actions, subagents for parallel investigation.

Chapter 4. The Skills Ecosystem

The SKILL.md format is a published open standard. The reference implementation is anthropics/skills. The installers are heterogeneous: Claude Code ships its own /plugin marketplace command; the numman-ali/openskills CLI installs the same SKILL.md folders into any environment that reads AGENTS.md (Cursor, Windsurf, Aider, Codex, Gemini CLI); and manual installation --- copy a folder into ~/.claude/skills/ --- is always available.

The Skills Ecosystem
Six producer repositories feed the SKILL.md spec; three installer paths feed six (and counting) consumer agents.

This catalog covers the six producers shown on the left. The producers are heterogeneous in shape. anthropics/skills is the reference (137k stars, ~15 skills, intentionally narrow). wshobson/agents is industrial-scale (153 skills, 80 plugins, 100 commands). posit-dev/skills is a corporate community contribution rooted in R and data science. junghan0611/org-mode-skills is a domain-specialist contribution for personal knowledge management. mattpocock/skills is an opinionated personal collection that went viral on engineering methodology. griddynamics/rosetta is an enterprise platform that delivers skills as part of a larger context-engineering system over MCP.

Chapter 5. Choosing a Skill Source

A rough decision procedure:

  1. Anthropic-produced output formats (docx, pdf, pptx, xlsx) --- use anthropics/skills. These are the reference implementations and are what powers Claude’s native document generation.

  2. R, data science, package lifecycle, Quarto, Shiny --- use posit-dev/skills. Created by the people who maintain RStudio.

  3. Engineering and infrastructure (Temporal, SQL optimization, Stripe, threat modeling, FastAPI, Kubernetes) --- use wshobson/agents. The largest open-source plugin marketplace and the most opinionated about enterprise guardrails.

  4. Cross-agent install of any of the above into Cursor, Windsurf, or Codex --- use numman-ali/openskills. The universal installer; same SKILL.md folders, different target.

  5. Personal Knowledge Management over an Emacs/Org-mode/Markdown vault --- use junghan0611/org-mode-skills. The most rigorous community PKM skills based on the Denote naming convention.

  6. Engineering methodology (TDD, debugging, codebase architecture, alignment-via-Q&A) --- use mattpocock/skills. Small, sharp, opinionated.

  7. Enterprise deployment where multiple repositories and agents need shared conventions, guardrails, and onboarding --- use griddynamics/rosetta. MCP-based delivery; agent-agnostic; source code never leaves the agent.

The repositories are not mutually exclusive. A typical engineering setup installs a few skills from anthropics/skills (skill-creator, docx, frontend-design), several from mattpocock/skills (tdd, diagnose, grill-me), a handful from wshobson/agents (sql-optimization-patterns, workflow-orchestration-patterns), and --- if the org has standardized on it --- attaches Rosetta as an MCP server for project-specific context.

Part 2 — The Sources

Seven repository sections follow. Each opens with an essay on what the repository is, who maintains it, and why it matters. Representative skills follow as individual entries in the same format. The selection in each section is representative, not exhaustive; the goal is to teach the shape of the repository’s thinking, not to mirror its full contents.

Sections at a glance

  • Section A --- Anthropic Official (anthropics/skills + anthropics/claude-cookbooks)

  • Section B --- Posit Data Science (posit-dev/skills)

  • Section C --- Engineering & Architecture (wshobson/agents)

  • Section D --- Cross-Agent Infrastructure (numman-ali/openskills + numman-ali/n-skills)

  • Section E --- Knowledge Management (junghan0611/org-mode-skills)

  • Section F --- Engineering Methodology (mattpocock/skills)

  • Section G --- Enterprise Context Engineering (griddynamics/rosetta)

Section A — Anthropic Official

anthropics/skills (~137k stars) and anthropics/claude-cookbooks (~38k stars)

Anthropic’s own skill repository (anthropics/skills) is the reference implementation of the SKILL.md format. It is intentionally narrow --- fifteen or so skills --- and the contents are chosen to demonstrate the range of what skills can do rather than to cover every domain. The four production document skills (docx, pdf, pptx, xlsx) actually power Claude’s native document-generation capabilities; the rest (algorithmic-art, canvas-design, slack-gif-creator, artifacts-builder, mcp-builder, webapp-testing, brand-guidelines, internal-comms, frontend-design, claude-api, skill-creator) are educational examples and useful production tools in their own right.

anthropics/claude-cookbooks is a companion repository of Jupyter notebooks demonstrating how to call the API directly. It is not a skill repository in the strict SKILL.md sense, but it contains the canonical reference implementations of agent workflow patterns (patterns/agents/basic_workflows.ipynb), the skills cookbook (skills/notebooks/01—03), and the memory cookbook (tool_use/memory_cookbook.ipynb). It is included here because it is the canonical Anthropic-authored material on how to compose skills into agentic systems.

Use Anthropic-official skills when (a) you need a production output format Anthropic has invested in (docx, pdf, pptx, xlsx), (b) you want the reference SKILL.md to study before writing your own, or (c) you want to learn the SDK by working through the cookbook notebooks.

skill-creator

Repository: github.com/anthropics/skills/tree/main/skills/skill-creator

Classification Meta-skill — a skill that produces other skills

Intent

Guide Claude through creating, validating, and optimizing new SKILL.md files for any domain.

Motivating Problem

Writing a good SKILL.md is harder than it looks. The description field is a classifier prompt that determines whether Claude reaches for the skill; getting it wrong means the skill never fires or fires constantly on the wrong tasks. The body should stay under 500 lines, organize references by domain when it covers multiple frameworks, and avoid burning context with information that should live in lazily-loaded reference files. There are enough sharp edges that a skill that codifies the conventions earns its place in the catalog.

How It Works

The skill walks Claude through the lifecycle of producing a skill: gather requirements, draft the SKILL.md, write evaluation queries that distinguish should-trigger from should-not-trigger cases, run the queries against the draft, refine the description until activation accuracy crosses a threshold, then test the body on representative tasks. It enforces structural conventions --- the 500-line ceiling, the references/ subdirectory pattern for multi-domain skills, the explicit reference instructions in the body, the prohibition on bundling executable code that would surprise the user.

The skill is unusually self-aware about its own role: it includes an explicit instruction that descriptions should be “pushy” --- enumerating the specific phrases and contexts that should trigger the skill, even when those triggers are obvious from the skill name. The implicit theory is that the model’s relevance classifier underweights skill names and overweights description content.

When to Use It

Whenever creating a new skill, especially the first time. The skill is fast to use and the lessons it encodes apply across domains. Many of the high-quality community skill repositories (mattpocock/skills, posit-dev/skills) explicitly recommend skill-creator as the starting point for new contributions.

Sources

  • github.com/anthropics/skills/tree/main/skills/skill-creator

  • Anthropic, Equipping agents for the real world with Agent Skills (October 2025)

Example

You want a skill that handles invoice-parsing tasks. You invoke skill-creator with that goal; it asks clarifying questions about the input formats, the desired output schema, and which related tasks should and shouldn’t trigger the skill. It produces a draft SKILL.md plus a JSON file of twenty test queries (“parse this invoice PDF” should trigger; “write a marketing email about invoices” should not). You run the tests, refine the description, iterate.

Example artifacts

Invocation.

# Activate skill-creator in any agent that has it installed.

"I want to create a skill for parsing scanned invoices into
structured JSON.

The skill should trigger on uploads of invoice-like PDFs/images and
ignore

general document-processing requests. Walk me through it."

SKILL.md sketch.

---

name: invoice-parser

description: "Parse scanned invoices, receipts, and bills into a
normalized

JSON schema. TRIGGER when: user uploads a PDF or image of an invoice/

receipt/bill; user asks to 'extract line items' or 'parse this
invoice';

request mentions vendor, amount, line items, tax, total. SKIP:
general

document OCR not specifically asking for invoice structure;
data-entry

for forms that aren't invoices."

---

# invoice-parser

## When to use this skill

[generated by skill-creator from clarifying questions]

## Recipe

1. ...

## References

- references/schema.md --- the canonical output JSON schema

- references/vendors.md --- known per-vendor quirks (loaded only on
demand)

claude-api

Repository: github.com/anthropics/skills/tree/main/skills/claude-api

Classification Domain skill — SDK and API usage

Intent

Build, debug, and optimize Claude API and Anthropic SDK applications, including migration between model versions and adoption of features like prompt caching, tool use, thinking, and Managed Agents.

Motivating Problem

SDK signatures, beta headers, and model strings change over the life of any Claude project. A model trained six months ago will hallucinate import paths, miss prompt-caching opportunities, mix Anthropic-flavored SDK calls with OpenAI-flavored shims, and recommend deprecated patterns. The claude-api skill exists to bind the model’s code generation to the current SDK surface, with explicit per-language references and a directive to web-fetch the SDK documentation when in doubt.

How It Works

The skill organizes its content by Anthropic surface (Claude API, Managed Agents, Skills API, Files API) and by language (Python, TypeScript, Java, .NET, Go). The body has a decision matrix mapping use cases to surfaces: stateful long-running agent with file mounts → Managed Agents; bespoke tool runtime → Claude API with tool use; quick single-shot completion → Messages API.

It enforces two non-negotiables: use the official SDK for the project’s language (no falling back to raw requests/fetch when an SDK exists), and never use OpenAI-compatible shims to call Claude. The skill includes a shared/live-sources.md file listing canonical URLs for each SDK; when the binding the model needs is not in the skill’s reference files, the skill instructs Claude to fetch the SDK documentation rather than guess.

Migration is first-class: the skill explicitly handles 4.5 → 4.6, 4.6 → 4.7, and retired-model replacements. It triggers on file imports of the anthropic / @anthropic-ai/sdk packages.

When to Use It

Any time code that calls the Claude API or Managed Agents is being authored, debugged, or migrated. The skill is essentially a guard against the model’s training-cutoff staleness in this specific narrow domain. Skip it for general programming, for code that calls other providers, and for provider-neutral abstractions.

Sources

  • github.com/anthropics/skills/tree/main/skills/claude-api

Example

Migrating a Python codebase from claude-opus-4-6 to claude-opus-4-7 with prompt caching turned on. Without the skill, the model is likely to retain old beta-header names and miss the cache-control breakpoint syntax. With the skill, the migration follows the documented path and the resulting code passes the caching-correctness check the skill prescribes.

Example artifacts

Invocation.

# Trigger the skill by referencing the SDK in your file or prompt.

"Upgrade this Python module from claude-opus-4-6 to claude-opus-4-7
and

add prompt caching for the long system prompt."

# Claude reads the claude-api SKILL.md (already in context), follows
the

# migration recipe, and applies the cache-control pattern from
references/python.md.

Document creation skills — docx / pdf / pptx / xlsx

Repository: github.com/anthropics/skills (skills/docx, skills/pdf, skills/pptx, skills/xlsx)

Classification Production output skills

Intent

Create, edit, and analyze Microsoft Office and PDF documents from Claude, with full fidelity for tracked changes, comments, formulas, layouts, and binary attachments.

Motivating Problem

Generating Word, Excel, PowerPoint, and PDF documents correctly is a deep problem that the model cannot solve from training alone. The file formats are XML-in-ZIP archives with idiosyncratic namespacing; preserving comments, tracked changes, and complex formatting requires specific library calls that the model gets wrong without prompting. These four skills exist because they are what powers Claude’s own native document-generation in claude.ai and the Claude API --- they are not demos but production code with reference implementations.

How It Works

Each skill follows the same shape: a SKILL.md describing what the skill does and when, plus a scripts/office/ directory with the Python scripts that handle the heavy lifting (unpacking, soffice headless conversion, validation). The body tells Claude when to reach for a script vs. when to write inline code; for docx for example, text extraction with tracked-changes preservation goes through pandoc, while raw XML access goes through the unpack.py utility.

The skills explicitly delineate scope: docx triggers on requests for Word documents and explicitly says NOT to use it for PDFs, spreadsheets, or Google Docs. Same negative-trigger discipline for the other three. This prevents cross-activation that would degrade output quality.

Critical: the document skills carry their own validate.py and soffice.py utilities under scripts/office/, which the skill instructs Claude to use for round-trip validation before declaring the document complete. This catches a class of formatting errors that the model alone would miss.

When to Use It

Any request that produces or consumes a .docx, .pdf, .pptx, or .xlsx file. For Word: tracked-changes documents, formatted letters, multi-section reports, templates. For PDF: form filling, table/image extraction, merging or splitting. For PowerPoint: pitch decks, structured slide content. For Excel: spreadsheets with formulas, charts, or pivot tables. Not for general text content where Markdown or plain text is appropriate.

Sources

  • github.com/anthropics/skills/tree/main/skills/docx

  • github.com/anthropics/skills/tree/main/skills/pdf

  • github.com/anthropics/skills/tree/main/skills/pptx

  • github.com/anthropics/skills/tree/main/skills/xlsx

  • License: source-available (proprietary). The skills are released as a snapshot reference; they are not actively maintained as open source.

Example

Generating a 50-page docx with embedded images, a table of contents that auto-populates on Update Fields, and a footer with page numbers. The skill’s recipe walks the model through using docx-js, the soffice.py round-trip validation, and the structural conventions for headings that the TOC field expects.

Example artifacts

Invocation.

# Skill activates automatically when the user asks for a Word, PDF,
PPT,

# or Excel deliverable.

"Produce a polished pitch deck as a .pptx with a title slide,
problem,

solution, market sizing, and team. Use a clean minimal style."

# Claude reads pptx/SKILL.md, consults pptx/scripts/office/ helpers,

# and writes the file with validate.py round-tripping it before
delivery.

frontend-design

Repository: github.com/anthropics/skills/tree/main/skills/frontend-design

Classification Production output skill — UI/UX

Intent

Instruct Claude to avoid generic “AI slop” aesthetics and make distinctive, considered visual decisions when producing frontend code, especially with React and Tailwind.

Motivating Problem

A bare model asked to “make a beautiful landing page” produces a recognizable default --- gradient buttons, the same hero layout, the same overused stock icon set. The aesthetic is competent and indistinguishable from every other model’s default. frontend-design is the explicit countermeasure: a skill that loads design vocabulary, encourages bolder typographic decisions, and constrains the model toward distinctive rather than safe choices.

How It Works

The skill loads design principles around typography, color, and layout that nudge the model away from the lowest-common-denominator template. It works in concert with the artifacts-builder skill (which knows the technical particulars of claude.ai artifacts: Tailwind, shadcn/ui, lucide-react, the available libraries and their import paths). frontend-design owns aesthetic judgment; artifacts-builder owns the technical surface.

The skill is one of the more frequently cited in community discussions because the contrast with the default is so visible. Public artifacts produced with frontend-design loaded tend to be identifiable on inspection.

When to Use It

Any artifact, component, or page where visual quality is an explicit goal. Particularly useful for landing pages, marketing pages, and design-sensitive React components. Less useful for purely functional internal tooling where defaults are fine.

Sources

  • github.com/anthropics/skills/tree/main/skills/frontend-design

Example

Asked to design a pricing page, a vanilla Claude produces a three-column card layout with gradients and check-mark bullet lists. With frontend-design loaded, the same prompt is more likely to produce a typographic page with deliberate asymmetry, a curated palette, and varied content emphasis. The aesthetic difference is large enough to be visible without a side-by-side comparison.

Example artifacts

Invocation.

# frontend-design activates on UI/page-design requests.

"Design a pricing page for a developer tools product. Three tiers.
Make

it distinctive --- do not give me the default SaaS template."

mcp-builder

Repository: github.com/anthropics/skills/tree/main/skills/mcp-builder

Classification Domain skill — MCP server scaffolding

Intent

Guide Claude through producing high-quality MCP servers --- the integration layer between Claude (or any MCP-compatible agent) and external APIs or data sources.

Motivating Problem

Writing an MCP server end-to-end --- transport choice (stdio vs SSE vs HTTP), tool schemas, capability negotiation, error handling, packaging --- has enough surface area that the model produces incomplete or subtly-wrong servers from a bare prompt. mcp-builder is the codification of the contract.

How It Works

The skill walks Claude through the MCP server lifecycle: pick the transport, define tools with input schemas, write resources if applicable, handle errors per the spec, package for distribution (npm or PyPI), document for the user. The skill includes references for each transport and a checklist that catches the common mistakes (missing tool descriptions, schemas that don’t match handler signatures, capability flags omitted).

It pairs naturally with the claude-api skill when the user wants to both consume Claude and expose tools to it; the two skills can be active concurrently.

When to Use It

Producing a new MCP server, especially the first time, or when integrating an unfamiliar third-party API as an MCP tool. Skip it for using existing MCP servers (which is a Claude API + tool use task, covered by claude-api).

Sources

  • github.com/anthropics/skills/tree/main/skills/mcp-builder

  • Model Context Protocol specification (modelcontextprotocol.io)

Example

Building an MCP server that exposes Linear issue search and creation as tools. The skill guides Claude through choosing stdio transport (since Linear is per-developer), defining search_issues and create_issue tools with strict input schemas, and packaging the result as a npm-installable binary.

Example artifacts

Invocation.

"Build an MCP server that exposes our internal time-tracking API as
tools.

Endpoints: list_entries, create_entry, summarize_week. Use stdio
transport."

webapp-testing

Repository: github.com/anthropics/skills/tree/main/skills/webapp-testing

Classification Domain skill — browser-driven verification

Intent

Test local web applications using Playwright for UI verification, debugging, and visual diffing.

Motivating Problem

When a coding agent produces a web application change, the loop “write code → see if it works in the browser → fix bugs” needs a reliable closing step. Without it the agent ships code that compiles and reads correctly but breaks in the browser. webapp-testing is the skill that closes the loop by giving Claude a Playwright-driven way to actually load the page, click through flows, and observe results.

How It Works

The skill scaffolds Playwright tests, manages the browser session, captures screenshots and console output, and feeds the observations back into Claude’s context. It pairs with the autonomous-agent pattern (Claude tries something, observes the browser result, revises) and with HITL checkpoints (“does this look right?” before merge).

When to Use It

Any frontend or full-stack change where the agent needs to verify end-user behavior rather than just compile-time correctness. Particularly valuable on UI bug-fix work and on accessibility/visual regression catches.

Sources

  • github.com/anthropics/skills/tree/main/skills/webapp-testing

Example

Fixing a login flow bug where a redirect lands on the wrong page after authentication. The skill scripts Playwright to log in with test credentials, observe the resulting URL, capture a screenshot, and report back to Claude with the actual vs. expected URL. Claude revises, reruns the script, ships when the redirect is correct.

Example artifacts

Invocation.

"There's a bug where users land on /dashboard instead of /home after
login.

Reproduce it with the testing skill, fix it, and verify the fix with
a screenshot."

anthropics/claude-cookbooks — reference notebooks

Repository: github.com/anthropics/claude-cookbooks

Classification Companion repository — educational notebooks (not strict SKILL.md format)

Intent

Provide canonical reference implementations of how to call the Claude API and compose agent workflows.

Motivating Problem

The Claude API surface is large enough that the SDK documentation, while accurate, doesn’t convey how the pieces are meant to fit together. Notebooks fill the gap: they show a complete working pipeline end-to-end, with the necessary setup, the actual call sequence, and the post-processing.

How It Works

The repository organizes notebooks by topic: misc/ for general API usage, multimodal/ for image and PDF, patterns/agents/ for the canonical reference implementations of the agent workflow patterns (Prompt Chaining, Routing, Parallelization, Orchestrator-Workers, Evaluator-Optimizer), skills/ for the three-notebook walkthrough of how to use and create skills, and tool_use/ for the memory cookbook and other tool-use patterns.

These notebooks are referenced by name throughout the agent-workflows literature; patterns/agents/basic_workflows.ipynb is essentially the canonical citation for the foundational three workflows.

When to Use It

Learning the SDK, learning the agent workflow patterns from canonical sources, building reference implementations for internal training. Not a place to install skills from --- these are notebooks, not SKILL.md folders --- but the right place to read first when designing a new agent system.

Sources

  • github.com/anthropics/claude-cookbooks

  • Notable notebooks: patterns/agents/basic_workflows.ipynb, tool_use/memory_cookbook.ipynb, skills/notebooks/01_skills_introduction.ipynb

Example

Before writing an evaluator-optimizer loop for your own system, read patterns/agents/basic_workflows.ipynb to see the canonical 30-line implementation. Borrow its structure; replace its prompts with yours.

Example artifacts

Invocation.

# Clone and run locally:

git clone https://github.com/anthropics/claude-cookbooks.git

cd claude-cookbooks/patterns/agents

pip install anthropic

jupyter notebook basic_workflows.ipynb

Section B — Posit Data Science

posit-dev/skills --- the corporate community contribution from the maintainers of RStudio

Posit is the company formerly known as RStudio. They make the IDE most R programmers use, the Quarto publishing system, the Shiny web framework, the tidyverse ecosystem of R packages, and a number of supporting tools (cli, testthat, lifecycle). Their skill repository (posit-dev/skills) is the most substantial corporate community contribution to the skills ecosystem outside Anthropic itself, organized into seven plugin categories: posit-dev (Posit-internal workflows), github (PR workflows), open-source (release management for R and Python packages), r-lib (R package development with the r-lib ecosystem), shiny (dashboards), quarto (publishing), and ggsql (data visualization queries).

What makes the Posit collection distinctive is that several of the skills are not R-specific. critical-code-reviewer and describe-design are general-purpose engineering skills that work across Python, TypeScript, SQL, and R. Anyone who does code review or codebase exploration in any language can install them. The R-specific skills are deeply rooted in actual R package conventions and read as if written by someone who has shipped a CRAN package recently --- which they were.

Installation uses npx skills add posit-dev/skills with —list / —all / —skill options; the repository is also available as a Claude Code marketplace via /plugin marketplace add posit-dev/skills.

critical-code-reviewer

Repository: github.com/posit-dev/skills (general/critical-code-reviewer)

Classification Code-review skill, adversarial mode

Intent

Conduct rigorous, adversarial code reviews that name security holes, lazy patterns, edge-case failures, and bad practices across Python, R, JavaScript/TypeScript, SQL, and front-end code.

Motivating Problem

Bare Claude on a code review tends toward agreeable, surface-level approval. It will note style issues and obvious bugs but will not push hard on architectural decisions, will not assume malicious inputs by default, and will accept lazy patterns (“this works”) without questioning whether they are correct. The critical-code-reviewer skill is an explicit instruction to take the opposing role: assume the code is wrong and look for proof.

How It Works

The skill loads an adversarial reviewer persona with explicit checklists per language: SQL injection and parameterization for SQL; XSS, CSRF, and input validation for front-end; race conditions and resource leaks for backend; correctness of error handling and idempotency for distributed code. It instructs Claude to list specific failure cases with line references rather than general comments. The output format is a list of findings ranked by severity, each with a concrete bad-input example that would trigger the failure.

Critically, the skill includes a “lazy pattern” category: code that works but should not have been written that way. Catch-all error handlers, silent swallowing of exceptions, hardcoded secrets, ignored return values. These rarely produce visible bugs in development but cause incidents in production.

When to Use It

Pre-merge code review on any change that touches data handling, security boundaries, or production code. Particularly valuable as the second reviewer after the author’s self-review and before a human review --- it catches the categories a tired human reviewer is likely to miss.

Sources

  • github.com/posit-dev/skills

Example

A SQL query string built with string concatenation passes a developer’s eyes; critical-code-reviewer flags it as parameterization-missing with a concrete injection payload that would exfiltrate other rows. A try/except: pass block in a Python data pipeline passes review; critical-code-reviewer flags it as silent-failure with the specific case where it would mask a corrupted batch.

Example artifacts

Invocation.

# In an agent with critical-code-reviewer installed:

"Run a critical review on the changes in this branch. Be
adversarial.

Assume malicious inputs and a tired human reviewer."

describe-design

Repository: github.com/posit-dev/skills (general/describe-design)

Classification Documentation generation skill

Intent

Research a codebase and create architectural documentation describing how features or systems work, with Mermaid diagrams and stable code references.

Motivating Problem

Documenting an undocumented codebase is the canonical task no one wants to do. The model is more patient than a human and can read every file, but without structure it produces either bland prose summaries or low-value comments on individual functions. describe-design exists to give that work shape: it produces design documents at the right level of abstraction, with diagrams that survive refactors and references that survive renames.

How It Works

The skill instructs Claude to first survey the codebase structure (build files, entry points, module organization), then identify the feature or system to be documented, then trace its execution paths from inputs to outputs. The output is a Markdown document with sections for context, data flow, key abstractions, and known limitations.

Mermaid diagrams are first-class: the skill knows which Mermaid styles render well for sequence flows, component relationships, and state machines, and produces them inline in the document. References to code use stable identifiers (function names, module paths) rather than line numbers, so the document doesn’t rot after a refactor.

The output is explicitly written for two audiences: humans reading the document and AI agents loading it as context for future work. The latter is non-trivial: the document is shaped to be useful as a reference loaded into the next session’s prompt.

When to Use It

Onboarding to a new codebase, planning a refactor, writing the missing architecture document before a major change, or producing handoff documentation when transferring ownership.

Sources

  • github.com/posit-dev/skills

Example

Inherited a Django monolith with no design docs and a handful of senior engineers leaving. Invoke describe-design on the order-fulfillment subsystem. The output is a markdown document with a Mermaid sequence diagram of the order → fulfillment → shipping flow, a component diagram of the modules involved, and named pointers into the code (“See fulfillment.workers.dispatcher”). The document becomes the canonical reference for the team.

Example artifacts

Invocation.

"Run describe-design on the order-fulfillment subsystem. The entry
point is

the dispatch_order task in workers/fulfillment.py. Document with
sequence

diagrams and component relationships."

testing-r-packages

Repository: github.com/posit-dev/skills (r-lib/testing-r-packages)

Classification Domain skill — R package testing

Intent

Write idiomatic R package tests using testthat 3+, with the appropriate use of fixtures, snapshots, mocking, and BDD-style describe/it blocks.

Motivating Problem

R’s package-testing conventions have evolved meaningfully across testthat versions, and Claude’s training data is biased toward older patterns. The skill encodes the current best practice as of testthat 3+: edition-3 semantics, snapshot tests for output-heavy functions, mocking via mockery or testthat’s built-in shims, and the BDD-style describe/it blocks that read as well in R as they do in JavaScript.

How It Works

The skill loads patterns and references for each testing scenario: testing a function with side effects (use mocking), testing console output (use snapshot tests), testing across a parameter grid (use parameterized expect_equal), testing internal helpers (use testthat::local_mocked_bindings). It pairs naturally with the cli skill (for testing functions that produce CLI output) and the lifecycle skill (for testing deprecation warnings).

When to Use It

Writing or updating tests in any R package, especially when adopting testthat 3 edition or migrating from older patterns. Not useful for non-R projects.

Sources

  • github.com/posit-dev/skills (r-lib category)

Example

Adding tests to a CRAN package that previously had only basic expect_equal coverage. The skill restructures the tests into describe/it blocks, adds snapshot tests for the print methods, and converts file-system side-effect tests to use withr::local_tempdir().

Example artifacts

Invocation.

"Add testthat 3 edition tests to this R package. Use snapshots for
print

methods, withr for any tempdir setup, and BDD-style describe/it."

release-post + create-release-checklist

Repository: github.com/posit-dev/skills (open-source/release-post, open-source/create-release-checklist)

Classification Workflow skills — open-source release management

Intent

Streamline the release of an R or Python package: produce the changelog-driven release blog post and the per-release operational checklist as a GitHub issue.

Motivating Problem

Releasing a non-trivial open-source package is a multi-step ritual: write release notes, generate the changelog, write the announcement blog post, draft the social posts, run the release checks, tag the release, build artifacts, push to the registry, publish the post. Most of these steps are easy to forget; experienced maintainers maintain their own informal checklists. The skills codify the Tidyverse and Shiny blog conventions and Posit’s internal release checklists for community use.

How It Works

create-release-checklist generates a GitHub issue from a template, with version numbers calculated from the current version and the type of release (patch, minor, major). The checklist is a literal todo list that maintainers tick through during the release.

release-post takes the changelog as input and produces a blog post in the conventional structure: short opening, headline features with code examples, deprecations, breaking changes, contributors. It supports both R (Tidyverse style) and Python (Posit Python style).

When to Use It

Releasing any R or Python package, particularly if you want consistent post structure across releases. Even outside Posit’s ecosystem, the checklist generation works for any tagged-release workflow.

Sources

  • github.com/posit-dev/skills

Example

Releasing dplyr 1.2.0. Invoke create-release-checklist with the version bump type; it opens an issue with the 23 items to tick through. After the release, invoke release-post with the changelog; it produces a draft blog post that goes through standard editorial review and gets published.

Example artifacts

Invocation.

"Generate the release checklist for the next minor version of
{package}."

"Write a release post for {package} {version} following the
Tidyverse blog

format. The notable items are: {features}, {fixes}, {breaking
changes}."

quarto-alt-text

Repository: github.com/posit-dev/skills (quarto/quarto-alt-text)

Classification Accessibility / publishing skill

Intent

Generate accessible alt text for figures in Quarto documents using Amy Cesal’s three-part formula: chart type, data description, key insight.

Motivating Problem

Most chart alt text in published documents is either missing or inadequate (“figure 1”, “bar chart”). Writing good alt text requires naming the chart type, summarizing the data, and naming the insight --- a three-part discipline that takes a few sentences and benefits from a consistent template. The skill operationalizes that template.

How It Works

The skill triggers on Quarto chunks that produce figures (#| fig-alt: chunks) and on static image inclusions. It applies the Cesal three-part formula and produces alt text in the conventional structure: “A [chart type] showing [data]. [Key insight].” It handles both code-generated plots (where it can read the data and the plot specification) and static images (where it works from the image and surrounding context).

When to Use It

Authoring Quarto documents intended for publication, especially in accessibility-conscious contexts (academic publishing, government reports, large-organization internal documentation). Useful even outside Quarto for any chart-heavy markdown.

Sources

  • github.com/posit-dev/skills (quarto category)

  • Amy Cesal’s alt text guidelines (cited in the SKILL.md)

Example

A Quarto document has 23 figures and no alt text. Invoke quarto-alt-text on the document. It reads each figure, the surrounding context, and the chart specification, and writes alt text in the Cesal format for each one. Run accessibility checks; they pass.

Example artifacts

Invocation.

"Add Cesal-format alt text to all figures in this Quarto document.
Be

specific about chart type and data, and end each with one insight
sentence."

cli (r-lib)

Repository: github.com/posit-dev/skills (r-lib/cli)

Classification Domain skill — R command-line UI

Intent

Use the cli R package well: semantic messaging, inline markup, progress indicators, theming.

Motivating Problem

The cli package has a rich API with idiomatic patterns that take time to learn: cli_inform vs cli_alert vs cli_warn, the {.field}, {.val}, {.cls} inline markup, progress bars and spinners, theme customization. A model writing R code reaches for message() and warning() by default --- functional but missing the experience the cli package was built to provide.

How It Works

The skill loads the cli idioms and instructs Claude to use the semantic functions (cli_alert_success, cli_inform, cli_warn, cli_abort) with inline markup for the relevant nouns. For long-running operations it prescribes the cli_progress_* functions with a clean teardown.

When to Use It

Writing R packages with command-line interfaces, or improving the user experience of existing R scripts. Particularly valuable for packages that produce user-facing messages --- which is most CLI-shaped R packages.

Sources

  • github.com/posit-dev/skills (r-lib category)

Example

An R package was using message() and warning() everywhere. Run the cli skill on the package; it converts the messages to cli_inform with appropriate inline markup, adds a progress bar to the longest function, and the package immediately feels polished without changing its behavior.

Example artifacts

Invocation.

"Convert all the message()/warning() calls in this package to use
the cli

package idioms, with inline markup for filenames and values."

Section C — Engineering & Architecture

wshobson/agents --- 80 plugins, 185 agents, 153 skills, 16 workflow orchestrators (as of May 2026)

Seth Hobson’s wshobson/agents is the largest engineering-focused plugin ecosystem for Claude Code. As of May 2026 the repository contains 80 focused plugins, each plugin combining a small number of specialized agents, commands, and skills. The plugins are small --- average 3.6 components each, following Anthropic’s recommended 2—8 pattern --- and they are designed to load in isolation so that installing python-development does not also load the 100+ unrelated skills in other plugins.

Installation is via the Claude Code plugin marketplace: /plugin marketplace add wshobson/agents, then /plugin install <plugin-name>. The granular plugin design is the repository’s defining feature; it is a deliberate departure from the “one giant skills folder” approach.

The skills below are representative of what the repository emphasizes: structured patterns for areas where ad-hoc agent behavior produces problems --- distributed workflow orchestration, database query optimization, payment integration, threat modeling, API design. Many of these skills exist because they encode the things a senior engineer would say in a code review, which a less senior engineer (or a model without the skill) is likely to miss.

workflow-orchestration-patterns

Repository: github.com/wshobson/agents (plugins/workflow-orchestration)

Classification Domain skill — distributed workflows

Intent

Apply the right state-management and reliability patterns when writing or modifying long-running workflow code, specifically targeting Temporal and Saga-style patterns.

Motivating Problem

Long-running distributed workflows are a category where idiomatic mistakes are expensive: missing retry policies, side effects outside activities, blocking calls inside workflow code, signal handling that races with the workflow’s own state, Saga compensations that don’t actually compensate. A model writing Temporal code from base training gets the API right and the patterns wrong. This skill exists to load the patterns.

How It Works

The skill loads Temporal-specific guidance: the workflow/activity split, deterministic workflow code, the difference between signal handlers and activity-triggered behavior, the retry policy taxonomy, how to write a Saga with proper compensation. It also covers the general distributed-workflow patterns that apply outside Temporal: idempotency keys, at-least-once vs exactly-once semantics, the relationship between sagas and event sourcing.

For Saga in particular, the skill makes the model think through compensations explicitly: for each forward step, what is the compensating action; what state must be preserved to be able to compensate; what happens if the compensation itself fails.

When to Use It

Writing or modifying Temporal workflows or any orchestration code with similar shape (Cadence, Step Functions). Designing a distributed transaction across multiple services. Adding retry/compensation logic to existing workflow code.

Sources

  • github.com/wshobson/agents (plugins/workflow-orchestration)

Example

Building an e-commerce order-processing workflow as a Saga: reserve inventory → charge payment → create shipment. With the skill loaded, the implementation includes explicit compensations (release inventory, void payment) and idempotency keys so retries don’t double-charge.

Example artifacts

Invocation.

"Implement an order-processing Saga in Temporal: reserve inventory,
charge,

ship. Include compensations and idempotency. Follow the patterns from
the

workflow-orchestration skill."

sql-optimization-patterns

Repository: github.com/wshobson/agents (plugins/database)

Classification Domain skill — query optimization

Intent

Optimize SQL by reading the query plan first --- parse EXPLAIN output, identify the actual bottleneck, then make targeted changes, rather than reaching for indexes or rewrites on instinct.

Motivating Problem

Bare Claude given a slow query tends to suggest “add an index on column X” as the universal answer, often without justification. Real query optimization is plan-driven: read the EXPLAIN output, find the dominant cost node, address that. The skill enforces this discipline.

How It Works

The skill instructs Claude to (a) run EXPLAIN (or EXPLAIN ANALYZE if a representative dataset is available) before proposing any change, (b) identify the dominant cost in the plan, and (c) make changes targeted at that cost. It covers the major database engines’ EXPLAIN output conventions (PostgreSQL, MySQL, SQL Server, SQLite) and the optimization patterns that work for each (covering indexes, partial indexes, query rewrites that change the join order, materialized views).

It also enforces a “prove it” step: any proposed change should come with an updated EXPLAIN that shows the planned improvement. Speculation without an updated plan is rejected.

When to Use It

Any database query slower than the team’s tolerance, particularly when the slowness is recent. Less useful when the schema itself is wrong (the skill optimizes within a schema; it doesn’t redesign one).

Sources

  • github.com/wshobson/agents (plugins/database)

Example

A report query has degraded from 80ms to 4 seconds over six months. With the skill loaded, the agent runs EXPLAIN ANALYZE, identifies a sequential scan over a join that used to use an index, finds that statistics are stale, runs ANALYZE on the relevant table, re-runs the query plan to confirm the index is now used, and reports back with the before/after plans.

Example artifacts

Invocation.

"This report query is slow. Use the sql-optimization-patterns skill:
run

EXPLAIN, identify the cost driver, propose a targeted change, and
show me

the updated plan."

stripe-integration + pci-compliance

Repository: github.com/wshobson/agents (plugins/fintech)

Classification Domain skill — payment integration with compliance constraints

Intent

Integrate Stripe (or similar payment processors) with correct webhook validation, idempotent payment routing, and PCI-compliance discipline.

Motivating Problem

Payment integration is a category where common bare-model mistakes --- logging card numbers, missing webhook signature verification, naive retry without idempotency, storing tokens in the wrong place --- are not merely bugs but compliance violations. The skills enforce the patterns that keep an integration on the safe side of PCI scope.

How It Works

stripe-integration covers the Stripe-specific API: Customer/PaymentIntent creation, webhook signature verification via the Stripe library (not hand-rolled HMAC), idempotency keys on every mutating call, error handling that surfaces user-friendly messages without leaking secrets. pci-compliance loads the broader PCI patterns: never log primary account numbers (PANs), never store them outside the processor’s scope, ensure cardholder data flows only through narrow controlled paths, use processor-tokenization for any persistence.

The two skills are designed to load together for payment-touching work. They impose a positive obligation: every webhook handler the agent writes includes signature verification; every payment mutation includes an idempotency key; every logging call near a payment path is checked for PAN exposure.

When to Use It

Implementing a payment flow --- either greenfield or adding a new payment method to an existing system. Reviewing existing payment code for compliance gaps. NOT for general invoicing or accounting code that doesn’t touch raw card data.

Sources

  • github.com/wshobson/agents (plugins/fintech)

  • Stripe documentation (referenced from the SKILL.md)

  • PCI DSS v4.0

Example

Adding ACH payments to an existing Stripe-Card integration. With both skills loaded, the implementation adds a new PaymentIntent flow with the correct payment_method_types, registers a webhook handler with Stripe signature verification using the official library, and adds idempotency keys to every mutating call. A compliance review finds no PAN exposure in logs.

Example artifacts

Invocation.

"Add ACH support to our Stripe integration. Use the
stripe-integration

and pci-compliance skills --- verify all webhooks, key all mutations

idempotently, and don't put any PII in logs."

stride-analysis-patterns

Repository: github.com/wshobson/agents (plugins/security)

Classification Domain skill — threat modeling

Intent

Apply STRIDE threat modeling (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) to an architecture or design.

Motivating Problem

Security review is a category where bare Claude defaults to a generic checklist (“did you use HTTPS? did you validate inputs?”) that misses architecture-level threats. STRIDE is the standard threat modeling discipline at Microsoft and is broadly applicable, but applying it by hand is enough work that it tends to be skipped. The skill makes it cheap.

How It Works

The skill reads an architecture description (or builds one from a codebase via describe-design) and walks each component through STRIDE: for this component, what spoofing threats exist, what tampering threats, what repudiation threats, and so on. Each threat is logged with a severity, an exposure description, and a candidate mitigation. The output is a threat model document organized by component and category.

When to Use It

Pre-launch security review of a new system; periodic threat-model refresh; vendor onboarding where you need to assess third-party exposure. Useful in regulated industries (healthcare, finance) where threat modeling is expected as evidence of due diligence.

Sources

  • github.com/wshobson/agents (plugins/security)

  • Microsoft STRIDE methodology

Example

New SaaS product going to launch. Run stride-analysis-patterns on the architecture (API gateway → application → database → third-party payments). The output is a threat model document listing 47 specific threats across the six categories, ranked by severity, with proposed mitigations. The team works through them before launch.

Example artifacts

Invocation.

"Run a STRIDE threat model on our payment processing service. The
components

are: API gateway, payments-svc, transactions DB, and the Stripe
webhook handler.

Output the threat model document."

attack-tree-construction

Repository: github.com/wshobson/agents (plugins/security)

Classification Domain skill — adversarial modeling

Intent

Construct an attack tree for a target asset: the attacker’s goal as root, the strategies that achieve it as branches, the sub-steps as leaves, with cost/feasibility estimates per path.

Motivating Problem

STRIDE catalogs threats by component; attack trees catalog them by attacker objective. The two are complementary: one is bottom-up (“what can go wrong with this thing?”), the other top-down (“if I were trying to steal customer data, how would I do it?”). Doing attack trees by hand is even more tedious than STRIDE; the skill makes the model do the enumeration.

How It Works

The skill takes a target asset (“customer payment data”, “admin panel access”) and produces a tree: the goal at the root, the strategies (gain database access, intercept in transit, social-engineer an admin) as branches, and the concrete steps to achieve each strategy as leaves. Each leaf is annotated with attacker cost, required skill level, and likelihood of detection. The result is a structured Markdown document with the tree as a Mermaid diagram and the leaf details below.

When to Use It

When defensive resources are limited and you need to prioritize --- attack trees surface which paths are cheapest for an attacker, which is where defenses pay off. Also useful in red-team planning and in regulatory contexts (some compliance frameworks require attack-tree-style analysis).

Sources

  • github.com/wshobson/agents (plugins/security)

  • Schneier, Attack Trees (1999)

Example

Build an attack tree for “unauthorized access to customer credit card data.” The skill enumerates 14 distinct attacker paths, from SQL injection through the public API to phishing a customer-support agent to exploiting the third-party payment processor. Each path has a cost estimate. The lowest-cost paths get the bulk of the defensive investment.

Example artifacts

Invocation.

"Build an attack tree for unauthorized access to our customer PII.
Annotate

each leaf with attacker cost and detection likelihood."

context-driven-development

Repository: github.com/wshobson/agents (plugins/methodology)

Classification Methodology skill — specification-first agentic development

Intent

Apply Context-Driven Development methodology: maintain product context, write specifications, and proceed in phased planning rather than ad-hoc code production.

Motivating Problem

Agents move fast. Without a methodology, that speed produces software that no human can maintain and that the agent itself can’t reason about across sessions. Context-Driven Development is one of several attempts --- alongside Spec-Kit, GSD, BMAD --- to give agentic coding a process that scales. wshobson’s implementation is opinionated and complete enough to use as a default.

How It Works

The skill imposes a three-tier structure: a product-context document (the long-lived state describing what the system is and who it serves), a specification document (the focused description of the current change), and a phased plan (the concrete steps to execute). The agent moves between the three documents: spec drives plan, plan drives execution, execution updates spec and product context.

It pairs naturally with the workflow-orchestration plugin’s track-management skills (features, bugs, chores, refactors) and with HITL checkpoints between phases.

When to Use It

Long-running engineering work where multiple agent sessions need consistent context. Onboarding a new agent to an existing project. Codebases where the absence of methodology has already produced confusion.

Sources

  • github.com/wshobson/agents (plugins/methodology)

Example

Starting a six-week migration project. The first session produces the product-context document; the second produces a per-phase plan; subsequent sessions consume both as context and proceed with focused changes. The output across the six weeks reads as if produced by one engineer following a single plan, even though it spans many sessions.

Example artifacts

Invocation.

"Apply context-driven-development to the database migration project.
Produce

the product context first, then the phased plan, then start phase
1."

Section D — Cross-Agent Infrastructure

numman-ali/openskills --- the universal SKILL.md installer; numman-ali/n-skills --- the curated marketplace

Anthropic’s native skill management is tightly coupled to Claude Code: the /plugin marketplace, the ~/.claude/skills directory, the in-product UI. openskills is the explicit decoupling: a single npm-installable CLI that installs any SKILL.md folder from any public or private GitHub repo into any agent that reads AGENTS.md. That covers Cursor, Windsurf, Aider, Codex, Gemini CLI, GitHub Copilot, and Factory Droid, in addition to Claude Code itself.

The package implements the Agent Skills specification (the same spec as Claude Code’s plugin system), generates the same <available_skills> XML block in AGENTS.md, and uses the same SKILL.md folder structure. The differentiator is that the install target is configurable: project-local .claude/skills/ (default), project-local .agent/skills/ (with —universal for AGENTS.md-only agents), or global ~/.claude/skills/ (with —global). Skills can be installed from a GitHub repo (npx openskills install anthropics/skills), from a local path (npx openskills install ~/my-skills/foo), or from a private git repository (npx openskills install git@github.com:org/private-skills.git).

n-skills is the companion curated marketplace from the same author --- a small, hand-picked collection of high-quality skills (orchestration, open-source-maintainer, dev-browser, gastown, zai-cli) installable via openskills. It is positioned explicitly as curation, not breadth: anyone can request inclusion but only meaningfully different skills make it in.

Use openskills when: (a) the team standardizes on a non-Claude-Code agent (Cursor, Codex), or (b) the team uses multiple agents and wants to maintain a single SKILL.md source of truth, or (c) skills need to be versioned in-project rather than installed globally.

openskills CLI

Repository: github.com/numman-ali/openskills (npm: openskills)

Classification Infrastructure skill — cross-agent installer

Intent

Install, sync, and read SKILL.md folders across any AI coding agent that reads AGENTS.md, with progressive disclosure preserved.

Motivating Problem

Many teams use more than one agent: Claude Code for one developer, Cursor for another, Codex in CI. Each has its own plugin or skill system. Maintaining the same skill in three places --- with the same name, the same description, the same updates --- is friction the team is unlikely to sustain. openskills solves the friction by installing the same SKILL.md folder into whichever target the agent expects.

How It Works

The CLI has four core commands: install (clone or copy a SKILL.md source into the target directory), read (load a skill’s body into the current session’s output), sync (regenerate the AGENTS.md skills table from the currently-installed skills), and list (show what is currently installed and from where). Sources can be a GitHub org/repo (auto-clones), a local path, or a private git URL.

The CLI writes an <available_skills> XML block into AGENTS.md that AGENTS.md-reading agents pick up. Each entry has the skill’s name, description, and a location pointer; the body of the skill is loaded lazily when the agent runs openskills read <name>. This preserves the progressive-disclosure design of the original Anthropic skills system across agents that don’t implement progressive disclosure natively.

Recent versions (1.2.0+) default to project-local installs (.claude/skills/), with —global as opt-in for cross-project sharing. Symlink support enables a develop-locally pattern: clone a skill repo once, symlink the skills you want into the project, edit in place.

When to Use It

Whenever a team needs skills to work in more than one agent, or wants the install location to be the project rather than the user’s home directory (for version control of which skills are in use). Also for installing private or experimental skills before they’re ready for a public marketplace.

Sources

  • github.com/numman-ali/openskills

  • npmjs.com/package/openskills

Example

A team uses Cursor for development and Codex in CI. They install posit-dev/skills, anthropics/skills, and a private internal-skills repo via openskills into the project’s .agent/skills/. The AGENTS.md gets regenerated; both Cursor and Codex see the skills automatically. New team members get the same skills by cloning the repo.

Example artifacts

Invocation.

# Install Anthropic's skills, project-local, to .claude/skills/

npx openskills install anthropics/skills

# Install Posit's skills, universal target for non-Claude agents

npx openskills install posit-dev/skills --universal

# Install from a private repository

npx openskills install git@github.com:our-org/internal-skills.git

# Regenerate AGENTS.md skill table after manual changes

npx openskills sync

# Load a specific skill's body into the current session

npx openskills read pdf

n-skills marketplace

Repository: github.com/numman-ali/n-skills

Classification Curated skill marketplace

Intent

A small, hand-curated collection of high-quality skills, installable via openskills.

Motivating Problem

The broad skill ecosystem has scale problems: tens of thousands of community SKILL.md folders, of which many are AI-generated, duplicated, or untested. Curated marketplaces fill the discovery gap by selecting skills that demonstrate real value-add. n-skills is the small, opinionated curation from the openskills author.

How It Works

The repository hosts skills directly or syncs them from external repos via a daily cron. Inclusion is by request and review, not automated. The repository organizes skills into categories (workflow, tools, automation) and exposes each via the same SKILL.md format.

Notable inclusions as of May 2026: orchestration (multi-agent task decomposition), open-source-maintainer (GitHub triage, PR review, repo maintenance), dev-browser (browser automation with persistent state), gastown (multi-agent orchestrator), zai-cli (Z.AI integration via MCP).

When to Use It

Discovery of opinionated, vetted skills for cross-cutting tasks (multi-agent orchestration, open-source maintenance, browser automation) where the larger marketplaces have noise. The catalog is small enough to browse manually.

Sources

  • github.com/numman-ali/n-skills

Example

A maintainer of an open-source library wants to triage two months of accumulated issues. Install the open-source-maintainer skill via openskills install numman-ali/n-skills; openskills sync; the agent now has structured guidance for issue triage, PR review, and maintenance reporting.

Example artifacts

Invocation.

npm i -g openskills

openskills install numman-ali/n-skills

openskills sync

# Now in your agent:

"Run open-source-maintainer on the redux-toolkit repo. Triage open
issues

and produce a maintenance report."

Section E — Knowledge Management

junghan0611/org-mode-skills --- “Denote-Org Skills for Claude”, validated on a 3,000+ file PKM system

Denote is a file-naming convention for note-taking, invented by Protesilaos Stavrou for Emacs. The convention encodes a timestamp, a title, and tags directly in the filename: 20251021T105353—literate-programming__org_emacs.org. The semantics is rigid by design: the timestamp is the canonical identifier; titles can change without breaking links; tags live in the filename rather than in a database. Over a multi-year vault this convention produces a navigable graph of notes that survives software changes --- a note from 2017 is still findable by its timestamp regardless of what tool you use to read it.

junghan0611/org-mode-skills is the codification of this PKM approach for Claude. The author explicitly positions it as extending Anthropic’s Life Sciences paradigm --- the idea that giving Claude domain context (PubMed for biology, Benchling for lab work) elevates it to expert-level collaboration --- from biology to personal knowledge management. The skills give Claude the file-naming rules, the link resolution mechanics, the knowledge-graph navigation primitives, and the literate-programming patterns needed to operate over a Denote-organized vault.

The repository validates the skills on a real 3,000+ file vault. It is the most rigorous PKM-focused community skill set in the ecosystem; if PKM is a serious workflow rather than a hobby, this is the relevant source.

denote-core

Repository: github.com/junghan0611/org-mode-skills (denote-core.md)

Classification PKM skill — file-naming convention

Intent

Teach Claude the Denote file-naming convention: timestamp-prefixed filenames with double-dash title separators and double-underscore tag separators, and the frontmatter conventions that go with them.

Motivating Problem

Bare Claude given a vault of Denote-named files doesn’t know what the filenames encode. It reads filenames as opaque strings, misses the timestamp identity, can’t parse the tags, and breaks links because it can’t resolve denote:20251021T105353 references back to actual files. The skill loads the convention so all of that becomes legible.

How It Works

The skill encodes the Denote naming spec: timestamp YYYYMMDDTHHMMSS is the canonical identifier; — separates title; __ separates tags from title; tags are underscore-separated. The frontmatter mirrors this: title, date, filetags, identifier (the timestamp).

It provides a denote_finder.py script that searches the vault by timestamp, by tag, by date range, by title fuzzy-match. It provides a denote_links.py that resolves denote:TIMESTAMP references to actual file paths, accounting for title changes. The combination makes Claude’s navigation through a Denote vault deterministic and fast.

When to Use It

Any vault that uses Denote naming --- typically Emacs Org-mode users, but the convention is increasingly adopted by Obsidian and other Markdown tools. Useful regardless of whether the underlying file format is Org-mode, Markdown, or plain text.

Sources

  • github.com/junghan0611/org-mode-skills

  • Protesilaos Stavrou, denote.el (the Emacs package that defined the convention)

Example

Asking Claude “find all llmlog files from October 2025.” Without the skill, this requires reading the entire vault to interpret the filenames. With the skill, denote_finder.py is invoked with —tags llmlog —date 202510* and returns the matching files in a single tool call.

Example artifacts

Invocation.

"Find all my notes about literate programming from the last year."

# Skill triggers denote_finder.py:

python scripts/denote_finder.py --tags literate --date
202504*-202604*

denote-knowledge-graph

Repository: github.com/junghan0611/org-mode-skills (denote-knowledge-graph.md)

Classification PKM skill — graph navigation

Intent

Build and query a knowledge graph of inter-note links across a Denote vault, with hop-distance queries and centrality reporting.

Motivating Problem

A 3,000-file vault has a non-trivial graph of cross-references. Without a precomputed graph, each “what connects to this note?” query requires re-reading many files. The skill maintains an indexed graph that makes the queries cheap.

How It Works

denote_graph.py builds an adjacency list from the vault by parsing the [[denote:TIMESTAMP]] references in each file. The graph is cached and rebuilt incrementally. Queries supported include get_connected_nodes(timestamp, hops=N) for graph traversal, find_orphans() for unconnected notes, find_clusters() for topic groups.

The skill instructs Claude to use the graph for navigation rather than reading file bodies when the question is about connectivity (“what are all my notes about X?”, “which notes link to this one?”). The bodies load only when actually needed.

When to Use It

Discovery work in a large vault: finding all notes connected to a topic, finding orphaned notes that should probably link somewhere, finding the closest notes to a given starting point. Particularly useful when a user has lost track of where they wrote about a thing.

Sources

  • github.com/junghan0611/org-mode-skills

Example

“What are all the notes within two hops of my literate-programming entry?” With the skill, the graph script returns 23 directly-linked notes and 47 two-hop neighbors in under a second. Without it, Claude would have to read several thousand files.

Example artifacts

Invocation.

"Show me everything within 2 hops of 20251021T105353."

# Skill invokes:

python scripts/denote_graph.py --start 20251021T105353 --hops 2

literate

Repository: github.com/junghan0611/org-mode-skills (literate.md)

Classification PKM skill — literate programming over Org-mode

Intent

Operate org-babel literate-programming features correctly: :tangle for code extraction, :results for inline execution, :session for stateful evaluation.

Motivating Problem

Org-mode’s literate-programming features are powerful and idiosyncratic. An agent that doesn’t know the :tangle / :results / :session semantics will produce “literate” notes that don’t actually compose code from the source blocks, or that lose execution state between blocks, or that overwrite tangled files unexpectedly. The skill loads the semantics so the agent treats org-mode literate notes as actual literate programming, not as Markdown with code fences.

How It Works

The skill loads the org-babel options and the conventions for naming code blocks (#+name:), specifying tangle targets (:tangle), result handling (:results value vs :results output), and session continuation (:session name). It instructs Claude to write blocks that tangle cleanly to the target files and to use sessions for sequences of related computations.

When to Use It

Authoring or modifying Org-mode literate-programming notes. Maintaining a literate codebase where the source of truth is the .org file and the compiled code lives in tangled outputs.

Sources

  • github.com/junghan0611/org-mode-skills

  • GNU Emacs org-babel documentation

Example

Building a literate analysis of a dataset: org-mode buffer contains the narrative, with code blocks tangled to analysis.py and result-blocks executed inline showing the outputs. The skill ensures the tangle headers are consistent across blocks and that the analysis runs end-to-end on M-x org-babel-execute-buffer.

Example artifacts

Invocation.

"Convert this analysis note from inline-code-blocks to an org-babel

literate program. Tangle to analysis.py; use a Python session for
state."

export

Repository: github.com/junghan0611/org-mode-skills (export.md)

Classification PKM skill — publishing

Intent

Export Org-mode notes to Markdown, HTML, PDF, or other formats via pypandoc with the correct flags for the conversion.

Motivating Problem

Pandoc has many options and many of them matter for round-trip fidelity (footnotes, citations, math, code blocks, internal links). The skill encodes which options to use for which target format.

How It Works

The skill loads the pypandoc wrapper conventions and the per-target flags. It triggers on export requests (“make this a PDF”, “publish to my blog”) and invokes pypandoc with the correct configuration for each target.

When to Use It

Publishing Org-mode notes to any non-Org format --- blog publishing, document sharing, archival.

Sources

  • github.com/junghan0611/org-mode-skills

  • pypandoc / pandoc documentation

Example

Convert a 20-note Org vault into a static HTML site with cross-references preserved. The skill runs the export with the right pandoc flags so internal denote: links become HTML anchors and the table of contents reflects the original document structure.

Example artifacts

Invocation.

"Export the literate-programming series of notes to HTML. Preserve
internal

denote: links as HTML anchors."

Section F — Engineering Methodology

mattpocock/skills --- 21 opinionated skills from a TypeScript educator’s personal .claude directory

In April 2026 Matt Pocock --- a TypeScript educator who runs Total TypeScript and a 60,000-subscriber newsletter --- pushed his personal .claude/skills/ directory to GitHub. The repository hit 22,000 stars in the first 24 hours and was GitHub trending #1 globally for several days. As of May 2026 it has 90,000+ stars. What is striking about the repository is not its size (21 skills, all relatively small) but its theory: agents move fast, but speed doesn’t make them reliable. The skills are explicit countermeasures to the four most common failure modes Pocock identified in agent-driven coding: misaligned requirements, redundant output, code that doesn’t actually work, and accelerated architectural decay.

The skills are small, sharp, single-purpose primitives, designed to be installed individually (npx skills@latest add mattpocock/skills/<name>). Pocock’s explicit recommendation: don’t install all 21; install /tdd and /diagnose first, add others as you encounter the failure modes they address.

The methodological influence is visible: Domain-Driven Design (ubiquitous language), Ousterhout’s A Philosophy of Software Design (deep modules, shallow modules), test-driven development with hard rules about vertical slicing, and the British engineering-education tradition that Pocock embodies.

tdd

Repository: github.com/mattpocock/skills (skills/tdd)

Classification Methodology skill — test-driven development

Intent

Enforce strict red-green-refactor TDD with vertical slicing only --- one test, one slice of behavior, one piece of implementation at a time. No horizontal slicing.

Motivating Problem

The most common failure mode of agents doing TDD is horizontal slicing: write all the tests first, then write all the code. This produces tests that verify imagined behavior rather than real behavior --- the agent has already imagined the implementation, so its tests pass because they test what it imagined. Vertical slicing (one test, one implementation, one refactor) prevents this. The tdd skill enforces it.

How It Works

The skill instructs the agent to: (1) write exactly one test that describes the first behavior; (2) verify it fails (red); (3) write the minimum implementation to make it pass (green); (4) optionally refactor; (5) repeat for the next behavior. Horizontal moves --- writing the next test before the current one passes --- are explicitly prohibited.

The skill is short (the SKILL.md is under 100 lines) but the discipline is load-bearing. With the skill loaded, the agent’s tests are visibly different: each test covers one behavior; each passes for a non-trivial reason; the implementation grows behavior-by-behavior rather than appearing fully-formed.

When to Use It

Any feature implementation where TDD is the right approach. Pocock’s explicit recommendation is to install /tdd first among the 21 skills and only branch out as needed.

Sources

  • github.com/mattpocock/skills

Example

Implementing a price-formatting function. Without the skill: agent writes 8 tests covering edge cases, then a 30-line function that passes them. With the skill: agent writes one test (“formats 100 cents as $1.00”), writes 3 lines to make it pass, then one test for negative amounts, adjusts the function by one line, then one test for currency rounding, adjusts again. The final function is the same size but every line of it was driven by a real test.

Example artifacts

Invocation.

# Install

npx skills@latest add mattpocock/skills/tdd

# Invoke

"Use the tdd skill to implement formatPriceCents. Start with a
single

vertical slice and grow behaviour by behaviour."

grill-me

Repository: github.com/mattpocock/skills (skills/grill-me)

Classification Methodology skill — requirements alignment via interrogation

Intent

Force the agent to ask detailed clarifying questions before starting work, surfacing the misalignment between what the user said and what they meant.

Motivating Problem

The most expensive failure mode in software development is misalignment: the developer thinks they understand; they build something; the stakeholder sees it and realizes the developer didn’t understand at all. Pocock’s observation is that this failure mode is just as common with AI agents as with humans --- perhaps more so, because the agent is fast enough to deliver a wrong implementation before anyone has caught the misunderstanding. grill-me is the explicit fix: a structured interrogation phase before any code is written.

How It Works

The skill triggers when the user describes a feature or change. Instead of jumping to implementation, the agent runs a structured question session: who is the user of this feature, what does success look like, what are the failure modes, what existing things are similar to it, what should it explicitly NOT do. Questions are concrete and follow from each other rather than being checklist-style.

The output of the session is a written summary of the agreed scope that the user has explicitly approved. Implementation starts only after approval.

When to Use It

Any non-trivial new feature or behavior change, especially when the user’s initial description is short or abstract. Skip for clear bug fixes (“this throws an exception, here’s the stack trace”) where the scope is unambiguous.

Sources

  • github.com/mattpocock/skills

Example

User says “add a way for admins to see all users.” With grill-me: agent asks whether “see all users” means search, list with filters, or export; whether sensitive fields should be visible; whether the page should support bulk actions; what should happen with deleted users. The user clarifies; the implementation lands once on the right spec instead of three times on different misreadings.

Example artifacts

Invocation.

npx skills@latest add mattpocock/skills/grill-me

"I want to add a feature to let admins see all users. Use grill-me
to make

sure we're aligned before you start writing code."

diagnose

Repository: github.com/mattpocock/skills (skills/diagnose)

Classification Methodology skill — structured debugging

Intent

Run a structured debugging loop: form a hypothesis about why the bug occurs, design an experiment that would prove or disprove it, run the experiment, iterate.

Motivating Problem

Agents debugging by patching tend to apply fixes that suppress symptoms rather than addressing causes. The diagnose skill imposes the discipline of hypothesis-driven debugging: don’t change code without first proposing what is wrong and what would prove it.

How It Works

The skill instructs the agent to articulate a hypothesis (“the bug is X because of Y”), design an experiment (“if I add logging here and run the failing case, I should see Z”), execute the experiment, observe the result, and either confirm the hypothesis or refine it. Only after the hypothesis is confirmed does the agent change code to fix the underlying cause.

The structure also catches a failure mode where the agent gets distracted by adjacent issues: each hypothesis names a specific cause; tangential changes are deferred.

When to Use It

Any bug investigation that isn’t trivially obvious from the error message. Pairs naturally with /tdd: when a test reveals a bug, /diagnose finds its cause; then the fix lands behind a new test.

Sources

  • github.com/mattpocock/skills

Example

A flaky test fails one time in ten. Without /diagnose: agent adds a retry and moves on. With /diagnose: agent hypothesizes that the flake is due to a race condition between two async setup calls, designs an experiment to log timing, observes the race, and fixes by serializing the setup. The flake stops; the fix doesn’t mask future related issues.

Example artifacts

Invocation.

npx skills@latest add mattpocock/skills/diagnose

"This test fails about 10% of the time. Use diagnose: hypothesis,
experiment,

then fix the root cause."

improve-codebase-architecture

Repository: github.com/mattpocock/skills (skills/improve-codebase-architecture)

Classification Methodology skill — codebase health

Intent

Audit a codebase for architectural problems --- shallow modules, exposed complexity, drift from the ubiquitous language --- and recommend targeted refactors.

Motivating Problem

Agent-driven coding accelerates software entropy. Without explicit attention to architecture, a codebase rapidly accumulates shallow modules (lots of surface area, little capability), inconsistent vocabulary, and complexity exposed at boundaries that should hide it. Periodic architectural cleanup is part of the discipline; this skill makes it cheap.

How It Works

The skill loads Ousterhout-style architecture criteria: deep vs. shallow modules, interface design (information hiding), the cost of complexity at module boundaries. It runs an audit pass over the codebase identifying violators and proposing specific refactors with before/after sketches. The audit is non-destructive --- it produces a report, not commits.

Pocock’s recommendation: run it every few days during active development. The compounding value is significant: caught early, architectural decay is a small refactor; left for months, it becomes a rewrite.

When to Use It

Periodic codebase health check, particularly for projects with high agent-write volume. Before major new features, to clean up the foundation. After fixing a complex bug, to ensure the bug’s underlying cause was structural rather than incidental.

Sources

  • github.com/mattpocock/skills

  • Ousterhout, A Philosophy of Software Design (2018)

Example

After three months of feature work, run improve-codebase-architecture. It identifies six modules with exposed implementation details (private constants leaking into public types), three shallow-module candidates, and two places where the domain vocabulary has drifted (the same concept under three different names). Each is logged with a specific refactor proposal.

Example artifacts

Invocation.

npx skills@latest add mattpocock/skills/improve-codebase-architecture

"Run improve-codebase-architecture over the src/ directory. Produce
a list of

shallow modules and vocabulary drift, with proposed refactors."

git-guardrails-claude-code

Repository: github.com/mattpocock/skills (skills/git-guardrails-claude-code)

Classification Methodology skill — destructive-action guard

Intent

Configure Claude Code hooks that block dangerous git commands (push, reset —hard, clean -f, push -f) before they execute.

Motivating Problem

Coding agents have access to git. They will execute git push when they think a change is done. Most of the time this is fine; occasionally they will execute git reset —hard on a branch with uncommitted work, or git push -f to a shared branch, or git clean -f -d in a directory with unsaved files. Even rare incidents are bad enough to justify a hook-based blocker.

How It Works

The skill registers Claude Code PreToolUse hooks that intercept bash commands matching dangerous git patterns. The hook blocks the command and asks the user to confirm in chat. Commands that are blocked: push (any flag), reset —hard, clean -f, checkout to discard changes, branch -D on shared branches.

The configuration is conservative: many of the blocked commands are legitimate in many contexts, so the hook is annoying enough to nudge the user toward explicit user-runs-it patterns rather than agent-runs-it patterns for destructive ops.

When to Use It

Always, in any Claude Code setup where the agent has access to git. The cost of the hook is occasional friction; the benefit is preventing the rare but expensive accident.

Sources

  • github.com/mattpocock/skills

  • Anthropic, Claude Code hooks documentation

Example

Agent confidently runs git reset —hard HEAD on what it thinks is a stale branch. Hook intercepts. User reads the dialog, realizes there were uncommitted changes, denies the command. Disaster averted; agent uses git stash instead.

Example artifacts

Invocation.

npx skills@latest add mattpocock/skills/git-guardrails-claude-code

# After install, the hook is registered in .claude/settings.json:

# {

# "hooks": {

# "PreToolUse": [{

# "matcher": "Bash",

# "hooks": [{ "type": "command", "command":
".claude/hooks/git-guard.sh" }]

# }]

# }

# }

Section G — Enterprise Context Engineering

griddynamics/rosetta --- meta-prompting, context engineering, and centralized knowledge management for AI coding agents

Rosetta is the architectural departure in this catalog. Where the other repositories ship SKILL.md folders that get copied into a developer’s .claude/skills/ directory, Rosetta ships an MCP server that exposes instructions, skills, and workflows on demand to whichever agent is asking. The agent never sees the full content of the knowledge base; it asks the Rosetta MCP for what it needs for the current task, and Rosetta serves just that.

This matters at enterprise scale for two reasons. First, organizations with hundreds of repositories don’t want to install the same skills folder into every project; they want centralized governance over what conventions apply where. Rosetta delivers the same conventions across every project via the same MCP endpoint. Second, organizations with proprietary engineering standards don’t want those standards copy-pasted across public agent installations; the MCP delivery means the standards stay on the server, with audit trails for what an agent loaded when.

The repository is maintained by Grid Dynamics (a global engineering services firm). Components: rosetta-cli (a Python CLI installed via pip from PyPI), ims-mcp-server (the MCP server, also a PyPI package: ims-mcp), instructions/r2/core (the canonical instruction bundles), plugins for Claude Code and Cursor that wire the MCP server into the agent. Apache 2.0 license, agent-agnostic (Cursor, Claude Code, VS Code/Copilot, JetBrains, Windsurf, Codex, Antigravity, OpenCode).

The defining concept is the four-phase workflow: every AI interaction follows Prepare → Research → Plan → Act. This is not novel as a list (it echoes Plan-and-Execute and similar workflow patterns) but it is novel as a contractual structure imposed across an organization’s entire AI coding surface.

Use Rosetta when: (a) you operate more than ~10 repositories with shared engineering conventions, (b) you want a single source of truth for AI coding standards that survives team changes, (c) you require audit trails for what an agent was instructed to do, or (d) you want guardrails (approval gates, risk assessment) imposed consistently across teams without per-repo configuration.

Rosetta MCP server (ims-mcp)

Repository: github.com/griddynamics/rosetta (ims-mcp-server)

Classification Infrastructure — MCP-delivered skill server

Intent

Serve instructions, skills, and workflows to any MCP-compatible agent on demand, with progressive disclosure and source-code isolation.

Sketch

Diagram for Rosetta MCP server (ims-mcp)
The agent calls Rosetta over MCP; Rosetta serves instructions from the knowledge base; source code never leaves the agent.

Motivating Problem

Distributing SKILL.md folders by checking them into every project works for small organizations but breaks down at enterprise scale. Conventions drift across projects; updates require coordinating dozens or hundreds of pull requests; private standards leak into open repositories; new repositories get inconsistent starting conditions. Centralized delivery via MCP solves all four problems at once.

How It Works

Each developer’s IDE configures the Rosetta MCP server as a tool source. When the agent starts work on a task, it consults Rosetta for the relevant guardrails and workflows. Rosetta returns just the instructions the agent asked for --- not the entire knowledge base.

The four-phase workflow is the contract: Prepare (the agent loads guardrails and project context), Research (the agent searches the Rosetta knowledge base for relevant instructions), Plan (the agent produces a reviewable plan that the user or another agent can approve), Act (the agent executes with the full loaded context).

Source code stays local. Rosetta never receives the project’s source; it only serves instructions back to the agent. This is the architectural property that makes Rosetta usable in environments where source-code exfiltration would be a problem.

Cross-project intelligence is opt-in. Organizations that want it can publish technical and business context from each project into a shared knowledge base; agents working in one project can see references to related work in others. Without opt-in, the per-project context stays local.

When to Use It

Enterprise deployment where AI coding standards must be consistent across many repositories and teams; environments where source code cannot leave the agent’s machine; organizations that need audit trails for what an agent was instructed to do; teams that want a single onboarding path for new repositories that bakes in best practices.

Sources

  • github.com/griddynamics/rosetta

  • pypi.org/project/ims-mcp/ (the MCP server package)

  • pypi.org/project/rosetta-cli/ (the CLI)

Example

A 200-person engineering organization across 40 repositories deploys Rosetta. Each developer’s Cursor, Claude Code, or Codex points at the same Rosetta MCP endpoint. When any developer asks their agent to scaffold a new microservice, the agent consults Rosetta for the org’s microservice template, the security guardrails, the observability conventions, and the deployment pattern. The same agent on a different developer’s machine in a different repository gets exactly the same conventions.

Example artifacts

Invocation.

# Cursor: ~/.cursor/mcp.json or .cursor/mcp.json

{

"mcpServers": {

"Rosetta": {

"url": "<rosetta MCP production server URL>"

}

}

}

# Claude Code

claude mcp add --transport http Rosetta <rosetta MCP production
URL>

# Codex

codex mcp add Rosetta --url <rosetta MCP production URL>

codex mcp login Rosetta

# Then in any agent session:

"Initialize this repository using Rosetta."

rosetta-cli

Repository: github.com/griddynamics/rosetta (rosetta-cli)

Classification Infrastructure — project initialization and management CLI

Intent

Initialize new repositories with Rosetta’s conventions, manage instruction sets locally, and bridge between local-only and centralized deployments.

Motivating Problem

MCP-only delivery has a chicken-and-egg problem: an agent needs to know to consult Rosetta before it consults Rosetta. The CLI handles the bootstrap --- setting up a new repository with the AGENTS.md and .claude-plugin/.cursor-plugin files that tell whichever agent the developer is using to reach Rosetta on first contact.

How It Works

The CLI is a thin Python tool that installs the plugin configurations for the major agents, configures the MCP connection, and runs the project-bootstrap workflow. It can also operate offline against local instruction bundles, useful for air-gapped environments where the MCP server can’t be reached.

For organizations not using the centralized server, the CLI is the way to vendor the instruction bundles into the repository directly.

When to Use It

Initializing a new repository for Rosetta-aware agents; setting up an air-gapped Rosetta deployment that doesn’t use the centralized MCP; debugging the agent’s consultation of Rosetta when behavior diverges from expectation.

Sources

  • pypi.org/project/rosetta-cli/

Example

Bootstrap a new microservice: rosetta-cli init my-service. The CLI creates AGENTS.md, .claude/, .cursor-plugin/, and a stub CLAUDE.md. The developer’s next session in Claude Code automatically sees the Rosetta MCP and follows the four-phase workflow on the first feature request.

Example artifacts

Invocation.

pip install rosetta-cli

rosetta init my-new-service

cd my-new-service

# AGENTS.md, .claude-plugin/, .cursor-plugin/ are now configured

# Open in Claude Code or Cursor and the agent picks up Rosetta
automatically.

The four-phase workflow (Prepare → Research → Plan → Act)

Repository: github.com/griddynamics/rosetta (workflow defined in USAGE_GUIDE.md)

Classification Workflow recipe — the contractual shape of every Rosetta-mediated agent interaction

Intent

Impose a consistent four-phase structure on agent work: load context, search for guidance, produce a reviewable plan, execute.

Motivating Problem

Agents left to their own structure produce inconsistent process. Sometimes they jump straight to code; sometimes they over-research and produce no code; sometimes they plan in private and the user has no opportunity to intervene. The four-phase workflow is the contract that prevents each of these failure modes.

How It Works

Prepare loads guardrails and project context: the relevant conventions, the security and approval policies, the project’s prior decisions. The agent reads these before any user request is processed.

Research uses the Rosetta MCP to search the knowledge base for instructions specifically relevant to the current task. This is progressive disclosure: only the instructions actually needed are loaded.

Plan produces a written plan with the steps the agent intends to take, the files it expects to modify, and the risks it has identified. The plan is reviewable; depending on the risk policy, it may require human approval before Act begins.

Act executes the plan with the full loaded context, with the guardrails enforcing any approval gates or audit hooks defined in the Prepare phase.

The phases are not advisory: they are enforced by the Rosetta instructions the agent loads. An agent attempting to write code in the Research phase will be redirected to produce a plan first.

When to Use It

On any non-trivial change in a Rosetta-managed project. Trivial changes (typo fixes, formatting) may skip the Plan phase if the project policy allows. Higher-risk changes (production deployments, database migrations, security-sensitive code) typically require human approval at the Plan phase before Act.

Sources

  • github.com/griddynamics/rosetta/blob/main/USAGE_GUIDE.md

Example

Developer says “add OAuth2 to the user service.” Prepare loads the org’s auth conventions and the security guardrails. Research consults Rosetta for the org’s OAuth implementation pattern; Rosetta returns the relevant guidance. Plan produces a 12-step plan with the specific files that will change, the libraries that will be added, and the migrations that will be required. The plan goes for human approval (the security guardrail requires it). After approval, Act executes the plan; each step is logged for audit.

Example artifacts

Invocation.

# The workflow is invoked implicitly by any Rosetta-mediated agent
session.

# To make it explicit:

"Use the four-phase workflow on this task. Show me the plan before
acting."

# Or to follow each phase explicitly:

"Prepare: load the relevant guardrails for the user-service repo."

"Research: find the org pattern for OAuth2 integration."

"Plan: produce the change plan and stop for review."

# (user approves)

"Act: execute the plan."

Appendix A --- Integration Recipes

A few common ways to compose the repositories in this catalog:

ScenarioRecipe
Anthropic-official + communityInstall anthropics/skills first as a baseline (skill-creator, claude-api, document skills). Add posit-dev/skills if R or data science is in scope, mattpocock/skills for engineering methodology, wshobson/agents for infrastructure/security domains.
Cross-agent teamUse openskills as the installer for everything. Install the same SKILL.md folders into every agent the team uses; AGENTS.md stays in sync.
Enterprise centralizedUse Rosetta as the MCP-delivered backbone for organization-wide conventions. Augment with project-local SKILL.md folders for project-specific knowledge that doesn’t belong in the central knowledge base.
PKM-heavy individualInstall junghan0611/org-mode-skills for vault navigation, anthropics/skills for output formats (docx, pdf), mattpocock/skills for any code work.
Solo engineer, opinionatedInstall mattpocock/skills selectively (tdd, diagnose, grill-me, improve-codebase-architecture, git-guardrails-claude-code). Add posit-dev/skills critical-code-reviewer and describe-design for review and onboarding. Skip the larger marketplaces.

Appendix B --- The Catalog’s Omissions

This catalog covers seven repositories. The skills ecosystem contains many thousands more. A non-exhaustive list of categories not covered:

  • obra/superpowers --- the other large engineering-methodology repository, broader than mattpocock/skills and complementary to it.

  • Scientific and biomedical skills (claude-scientific-skills, materials-simulation-skills, bioinformatics collections).

  • WordPress, Rails, and other framework-specific repositories that bundle agent-friendly conventions for a single stack.

  • Marketing and content-strategy skills (wondelai/skills, devmarketing-skills, the Composio marketing collections).

  • Skill-discovery and find-skills tooling that operates as a meta-layer above the marketplaces.

  • OpenClaw and similar agent-bundle distributions that package skills with broader runtime configurations.

The selection criteria for this catalog was reference-quality and architectural distinctiveness, not popularity. A v0.2 may broaden coverage; this draft holds to the seven anchor repositories that define the ecosystem’s shape.

Appendix C --- A Note on the Moving Target

Anthropic published the Skills feature in October 2025. The Agent Skills standard was opened in December 2025. As of May 2026 the ecosystem is six months old. Star counts, plugin counts, and skill counts in this catalog reflect snapshots taken in early May 2026; they will be wrong by the time this is read. The repository URLs and the architectural patterns are stable; the specific skill counts and the marketplaces’ contents are not. Treat this catalog as a map of an ecosystem still under rapid construction --- the shape of the landscape will hold even as individual landmarks shift.

--- End of The Claude Skills Catalog v0.1 ---