Upskill generate worked mostly as intended, but when I tried to do the actually interesting part and ran upskill eval, everything started falling apart.
A few concrete issues I hit (Windows + Ubuntu/WSL):
Skill path chaos:
- Ubuntu: Access denied … SKILL.md is not within an allowed skill directory even when the file exists.
- Windows: Path must be absolute… and then fast-agent tries to read_skill from /tmp/.../SKILL.md (!), which is "absolute" on Linux but not on Windows.
- I ended up chasing /skills, /tmp/skills, .fast-agent/skills, .claude/skills, absolute paths… nothing was consistently accepted across runs (tiny illustration of the Linux/Windows mismatch below).
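For anyone else confused by the Windows side of this, here's a tiny plain-Python sketch (nothing to do with upskill's internals, and the path is just a made-up example) of why a /tmp/... path counts as absolute on Linux but not on Windows:

```python
# Plain-Python illustration only; the /tmp path below is a made-up example,
# not what upskill actually uses.
from pathlib import PurePosixPath, PureWindowsPath

p = "/tmp/skills/demo/SKILL.md"

print(PurePosixPath(p).is_absolute())    # True  -> fine on Ubuntu/WSL
print(PureWindowsPath(p).is_absolute())  # False -> "Path must be absolute" on Windows
```

So any skill path stored as a POSIX-style string will trip the Windows absolute-path check even though it looks absolute.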
Model selection confusion / fast-agent flag collision
- -m means something else in fast-agent, so the tool evaluates the wrong thing unless you use the right flag. It's very easy to think you're running Haiku/Sonnet but you're not.
- -m haiku actually made the program write a literal haiku (a quick check of which model actually answered is sketched below).
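Not an upskill fix, but the way I'd sanity-check which model actually served a request is to hit the API directly and print the model ID it reports back (Anthropic Python SDK; the model alias below is just an assumption, swap in whatever you mean to eval):

```python
# Standalone sanity check with the Anthropic SDK (not upskill/fast-agent code).
# Requires ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

msg = client.messages.create(
    model="claude-3-5-haiku-latest",  # assumed alias; use whichever model you intend to eval
    max_tokens=32,
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(msg.model)  # the model that actually handled the request
```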
Provider / model parameter mismatches
- OpenAI GPT-5 models fail with a hard 400 because the request uses max_tokens instead of max_completion_tokens (sketch of the expected parameter below).
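For reference, this is the shape of the call the newer OpenAI models expect (OpenAI Python SDK; "gpt-5" here is just a placeholder for whatever model name upskill was sending in my runs):

```python
# Hedged sketch with the OpenAI Python SDK, not upskill code.
# Newer chat models reject max_tokens with a 400 and expect max_completion_tokens.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5",  # placeholder; whichever GPT-5 variant your config points at
    messages=[{"role": "user", "content": "ping"}],
    max_completion_tokens=64,  # passing max_tokens here is what produces the hard 400
)
print(resp.choices[0].message.content)
```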

Weird UI/logging behavior
- Terminal output spams repeated "Chatting … turn 1" lines, then never prints the result table even though run logs exist.

Token usage
- Can blow up fast (I saw huge output spikes).

Filesystem side effects
- In at least one case the "agents" wrote a bunch of .md artifacts into my actual filesystem, which was scary.

Overall: the concept is awesome, but eval is super brittle right now. Did anyone actually get eval working reliably?