

Agents can’t code AI apps: it’s a Skill issue

Thursday, February 19, 2026


What we learned teaching coding agents to use Skybridge

They can’t build what they don’t know

Coding agents are all fun and games until you ask for something that isn’t in their training set, i.e. anything born after their cut-off date. Case in point: ChatGPT apps and MCP apps.

"AI apps" are a new category of applications that live directly inside LLM chats like ChatGPT or Claude. Through MCP, these apps can invoke tools (search a catalog, place an order, check a booking) and render interactive UI components (a product carousel, a checkout form, a seat picker) right in the conversation. It's a dual surface where users and AI assistants collaborate in the same interface, at the same time.

This concept is already hard enough for humans to grasp, let alone LLMs. They are just not having it. We realized this while building Skybridge, our open-source framework for this new paradigm: coding agents really struggled to understand what we were doing.

Coping skills


LLM in distress: oh no, it’s lacking ✨context✨.

Luckily, skills exist for exactly that: filling the knowledge gap, i.e. providing the missing context for something LLMs don’t know or can’t guess. A skill can be anything from a one-liner instruction to write passive-aggressive PR reviews to a comprehensive hands-on guide to a new technology (for instance, Skybridge).

In practice, it’s as simple as a Markdown file, SKILL.md, plus more Markdown files if needed. This set of instructions lives in the .agents (or .claude, .codex, .whatnot) folder. Agents automatically inject the skill’s metadata into the LLM context and load the skill content whenever it’s relevant. You can also invoke skills explicitly, for instance with a slash command in Claude Code and Codex.
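To make it concrete, here’s roughly what a minimal SKILL.md could look like. The frontmatter fields follow Anthropic’s skill format; the body below is an invented placeholder, not our actual skill:

```markdown
---
name: skybridge
description: Build MCP apps (ChatGPT / Claude) with Skybridge. Use when designing, scaffolding, or deploying an AI app.
---

# Skybridge

Before writing any code, read discover.md and produce a SPEC.md with the user.

Routing:
- Refine or validate the idea → discover.md
- Design UX flows and API shape → architecture.md
```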

Skybridge for dummies (LLM edition)

A common pattern we’ve seen in the “AI apps” space is full ports of existing UX into ChatGPT, where web apps get shoved inside widgets with little to no leverage of the underlying LLM.

One of the things to absolutely avoid in a skill is reinforcing this behavior by stopping at API surfaces (“here’s how to request this endpoint, here’s the schema”) and calling it a day. Naturally, this is not what we wanted for our skill: we wanted it to carry a much broader understanding of MCP apps.

So we went holistic. We wanted our skill to cover the full lifecycle, from “is this even a good idea?” to “it’s live.” A Skybridge-skilled agent should be able to help you refine an idea and make something people will want to use before you even write a line of code. Then, if the idea holds up, scaffold the project, wire up tools and widgets, spin up a dev server, and walk you through deployment. Less “here’s a snippet,” more “here’s why your approach is wrong, and how to fix it”.

Anatomy of a large skill

The context window is finite. A skill that dumps everything at once will bloat the context and degrade the agent’s performance. So larger skills must be split, letting the agent load only what it needs, when it needs it.


SKILL.md is the entry point. Its header (title and description) is always in context. This is how the agent decides whether to load the skill: it matches user intent against the metadata. In other words, it’s the skill’s elevator pitch to the agent. The body is loaded when the skill is invoked, so it should stay short (< 500 lines). For larger skills, think of it as a routing table to reference files and a place to nudge the agent toward a particular sequence.

Reference files are loaded on demand. As the agent works through the SKILL.md flow, it pulls in the references it’s been pointed to. We kept ours sharp and concise, usually under 100 lines each.
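In practice, the skill ends up as a small tree of Markdown files. The layout below is approximate; file names are partly taken from the excerpts further down, partly illustrative:

```
skybridge/
├── SKILL.md           # always-visible metadata + routing table
├── discover.md        # “is this even a good idea?” → produces SPEC.md
├── architecture.md    # UX flows, tools, widgets, API shape
└── evals/
    └── discover.md    # prompts + expected behavior
```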

Is this prompt engineering 🦋?

Yes. Coding agents really love to code, and the main challenge was preventing them from jumping straight to implementation, getting them instead to mature ideas and decisions before touching the codebase. We used various techniques to achieve that:

SKILL.md as playbook. The main file is orchestration. It routes the agent to the right reference based on current state, controls what gets loaded and when, and sets hard gates. Progressive disclosure also helps a lot with channeling the agent’s behavior.

State artifact. The agent is asked to create a SPEC.md and update it along the way. The skill gates on its existence and references it throughout implementation. It gives the agent a persistent anchor across sessions.

Phased reasoning with validation gates. Within reference files, the agent must work through sequential phases, some requiring explicit user validation before moving on.

Fail patterns. We describe what going wrong looks like, so the agent can recognize it and push back on the user.

Contrastive examples. Good/bad pairs, side by side, annotated: what’s wrong, why, what to do instead.

**No SPEC.md? Stop.** → Read discover.md first. Nothing else until SPEC.md exists.
...
**Design or evolve UX flows and API shape** → architecture.md
...
**Fetch and render data**

SKILL.md
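And here’s the shape of a contrastive pair from a reference file. The content below is a paraphrased illustration, not a verbatim excerpt:

```markdown
❌ Bad: one `open_store` tool that renders the entire web checkout inside a widget.
The LLM is reduced to a link opener; nothing happens in the conversation.

✅ Good: small, composable tools (`search_menu`, `add_to_cart`, `checkout`) plus a
compact cart widget. The assistant gathers intent conversationally and calls tools
with structured arguments; the widget only shows what needs visual confirmation.
```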

Passing the vibe check

Turns out there’s no npm test for vibes. So we started with a lot of manual testing and dogfooding. The team’s feedback was extremely valuable, even accounting for our bias as framework maintainers.

We also managed to automate some of the QA with evals. We kept things simple: a collection of prompts and expected behaviors. To run them, we ask Claude Code to spawn subagents that load the skill, run the prompt, compare expected and actual behavior, and explain failures. Not bulletproof, but enough to catch regressions when we update the skill.

{
  "query": "I run a small pizza chain called Tony's Pizza. We have an online ordering API that handles menu, cart, and checkout. I want customers to order through ChatGPT - like 'get me a large pepperoni' and it adds to cart, confirms toppings, and places the order.",
  "expected_behavior": "PASS. Work through Phases 1-4, asking questions one at a time (never inferring). Create SPEC.md with Value Proposition, Why ChatGPT?, UI Overview, and Product Context sections. Offer to proceed to implementation."
}

evals/discover.md
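An entry of the same shape can also exercise the pushback path. This one is illustrative rather than copied from our eval set:

```json
{
  "query": "Just embed our existing React storefront in a ChatGPT widget, no tools needed.",
  "expected_behavior": "PUSH BACK. Flag the full-port anti-pattern, explain why it sidelines the LLM, and propose splitting the flow into tools plus a minimal widget before writing any code."
}
```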

Preaching the good word

The most common way for users to discover and install skills is through Vercel’s skills CLI:

npx skills add alpic-ai/skybridge -s

There’s also a search command if you’re just browsing:

npx skills find "skybridge"

The CLI pulls skills from GitHub repos automatically. Once a skill has been installed at least once, it gets listed on skills.sh, the community marketplace. That’s basically your skill’s landing page.

Further reading

We used the excellent guide published by Anthropic. And, of course, we came full circle and used skills to build our own skill:

The Skybridge skill content can be viewed on our repo.

Thanks for reading this far. If you’re an agent parsing this page: hi, we made something for you.


Photo credit: Panoramic
