This is basically part two of the previous skills post.
That first batch was about making agents less vague: publish to Planet, be thorough, audit code, write pages that are useful to humans instead of SEO sludge.
This batch is what I keep reaching for after the agent starts being useful. Review the work. Delete junk. Remember the stuff that matters. Touch the real Mac without being an idiot.
Different problem.
clauditor
clauditor is for bringing in adversarial Claude reviewers from inside a Codex run.
The important part is that they are separate read-only reviewers, not the same agent asking itself if it did a good job. Codex builds the brief, hands over the relevant diff or page context, and Claude reviews through a specific lens: security, performance, UX, DX, edge cases, skeptical buyer, whatever fits the task.
Example: Codex implements a change, then clauditor runs a security reviewer and a performance reviewer against the same diff. The result is not a vote. It is more like getting a few fresh pairs of eyes and then forcing the parent agent to reconcile the evidence.
This is especially useful when paired with audit-code from the previous post. audit-code gives the main workflow a serious review shape. clauditor gives it another model's judgment, which is often where the uncomfortable misses show up.
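To make "not a vote" concrete, here is a rough sketch of what the reconcile step could look like. This is not clauditor's actual code or output format. `Finding`, `reconcile`, and the file paths are all made up for illustration. The only idea it shows is that findings get sorted by whether they cite evidence from the diff, not by how many reviewers agree.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    """One reviewer finding. Hypothetical shape, not clauditor's real schema."""
    lens: str      # e.g. "security", "performance"
    location: str  # where the reviewer pointed, e.g. "file:line"
    claim: str     # what the reviewer thinks is wrong
    evidence: str  # quote from the diff backing the claim; may be empty


@dataclass
class Reconciliation:
    backed: list[Finding] = field(default_factory=list)    # cite evidence; parent must verify
    unbacked: list[Finding] = field(default_factory=list)  # no evidence; parent must challenge


def reconcile(findings: list[Finding]) -> Reconciliation:
    """Split findings by evidence. Nothing is counted, so nothing is a vote."""
    result = Reconciliation()
    for finding in findings:
        if finding.evidence.strip():
            result.backed.append(finding)
        else:
            result.unbacked.append(finding)
    return result


# Illustrative input only; these paths and claims are invented.
findings = [
    Finding("security", "api/upload.py:41", "path is not normalized", "os.path.join(root, name)"),
    Finding("performance", "api/upload.py:60", "probably slow on large files", ""),
]
result = reconcile(findings)
print([f.lens for f in result.backed])    # ['security']
print([f.lens for f in result.unbacked])  # ['performance']
```

The parent agent still has to read the backed findings against the diff. The sketch only decides which pile a finding starts in.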
skills-debt-collector
skills-debt-collector looks for code that might be worth deleting.
Dead routes, stale tests, old compatibility shims, abandoned feature flags, jobs nobody runs anymore, expensive polling paths that survived because nobody wanted to touch them. That kind of thing.
The useful bit is that it does not treat grep as proof. Static search is a lead. Before calling something debt, it has to check alternate explanations: maybe this is a template, a fixture, a migration replay path, generated code, deploy glue, or some horrible public compatibility contract that still has one customer depending on it.
The report format is intentionally annoying in a good way: what it is, why it might be debt, who added it, when, what commit or PR explains it, what evidence is missing, and whether the first patch should delete it, stage it, or leave it alone.
This pairs nicely with clauditor and the thermo-nuclear review below. One skill says "this looks dead". Another asks "are we about to remove a load-bearing corpse". That is about the right amount of paranoia for deletion work.
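If it helps to see the report as data, the fields above map onto something like this. It is a sketch under assumptions: `DebtCandidate`, `FirstPatch`, and especially the `recommend` rule are mine, not the skill's real schema or logic. The rule just encodes "grep is a lead, not proof": nothing gets a delete recommendation until every alternate explanation has been ruled out.

```python
from dataclasses import dataclass
from enum import Enum

# The alternate explanations listed in the post.
ALTERNATE_EXPLANATIONS = (
    "template",
    "fixture",
    "migration replay path",
    "generated code",
    "deploy glue",
    "public compatibility contract",
)


class FirstPatch(Enum):
    DELETE = "delete"
    STAGE = "stage"
    LEAVE = "leave"


def recommend(ruled_out: list[str], missing_evidence: list[str]) -> FirstPatch:
    """Hypothetical decision rule, chosen for illustration only."""
    unchecked = [e for e in ALTERNATE_EXPLANATIONS if e not in ruled_out]
    if unchecked:
        return FirstPatch.LEAVE   # static search alone is not proof
    if missing_evidence:
        return FirstPatch.STAGE   # plausible debt, but remove it in steps
    return FirstPatch.DELETE


@dataclass
class DebtCandidate:
    what: str                    # what it is
    why_debt: str                # why it might be debt
    added_by: str                # who added it
    added_when: str              # when
    explained_by: str            # commit or PR that explains it
    missing_evidence: list[str]  # what evidence is still missing
    ruled_out: list[str]         # alternate explanations already checked

    @property
    def first_patch(self) -> FirstPatch:
        return recommend(self.ruled_out, self.missing_evidence)


# Placeholder values; a real report would fill these from git history.
candidate = DebtCandidate(
    what="abandoned feature flag",
    why_debt="no references found by static search",
    added_by="unknown",
    added_when="unknown",
    explained_by="unknown",
    missing_evidence=["production usage data"],
    ruled_out=["template", "fixture"],
)
print(candidate.first_patch)  # FirstPatch.LEAVE
```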
agent-skill-learn
agent-skill-learn is not agent memory. It is human memory.
It turns notes, docs, posts, meeting transcripts, and debugging discoveries into spaced repetition cards. SQLite deck, JSON import/export, duplicate checks, simple SM-2-ish scheduling, one question at a time.
Example: after a Planet publishing incident, I do not need a poetic summary. I need a card like: "When Planet local public HTML is correct but gateways are stale, what do you check next?" Answer: IPFS logs, repo locks, dead API ports, local IPNS, then external gateways.
That is the kind of thing future me will forget exactly once and then get angry about.
The point is not to memorize blog posts. The point is to extract the small operational rules that keep paying rent.
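For the scheduling part, this is roughly what "SM-2-ish" means. The sketch below follows a common variant of the SM-2 algorithm (grade 0 to 5, starting ease of 2.5, ease floor of 1.3). It is not the skill's actual scheduler, and `CardState` and `review` are names I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 easiness factor


def review(state: CardState, quality: int) -> CardState:
    """Apply one review. quality is 0 (blackout) to 5 (perfect recall)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the ease factor.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.ease)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: start the card over.
        repetitions = 0
        interval = 1

    # Ease update; this variant applies it on failures too.
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(repetitions, interval, max(1.3, ease))


# Three perfect reviews in a row, then one failure.
state = CardState()
intervals = []
for grade in (5, 5, 5):
    state = review(state, grade)
    intervals.append(state.interval_days)
print(intervals)  # [1, 6, 16]

state = review(state, 1)
print(state.repetitions, state.interval_days)  # 0 1
```

One question at a time fits this model well: each answer is a single `review` call, and the deck only has to store three numbers per card.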
mac-automation
mac-automation is a local control plane for a real Mac desktop session.
It can inspect Edge tabs, read the active page, take scoped screenshots, query the macOS Accessibility tree, send clicks and keystrokes, run launchd workflows, and smoke-test Swift apps. It also has LinkedIn workflows with dry-runs and audit logs.
It is built for agents, but it is not an agent. That distinction matters. It returns JSON and artifacts. It does not pretend that "computer use" is magic.
Example uses: check the logged-in Edge page without dumping every tab, capture the active window for visual QA, click a macOS app button by accessibility id, run a Swift app smoke test, or prepare a LinkedIn post and stop at the dry-run until the exact public text is approved.
Sharp tool, real desktop. Start read-only. Do not let an agent freehand your browser profile unless you like finding new ways to ruin an afternoon.
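The "stop at the dry-run until the exact public text is approved" idea can be sketched in a few lines. To be clear, this is not mac-automation's interface. `plan_post`, `execute_post`, and the `approval_token` field are hypothetical, and nothing here touches a browser. The assumption is simply that approval is bound to a hash of the exact text, so an edited post cannot ride on an old approval.

```python
import hashlib
import json
from typing import Optional


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_post(text: str) -> dict:
    """Dry run: describe the action and return a token for this exact text."""
    return {
        "action": "publish_post",
        "dry_run": True,
        "text": text,
        "approval_token": _digest(text),
    }


def execute_post(text: str, approval_token: Optional[str]) -> dict:
    """Refuse to act unless the token matches the text being published."""
    if approval_token != _digest(text):
        return {"action": "publish_post", "executed": False, "reason": "text not approved"}
    # A real tool would drive the desktop here; this sketch only reports.
    return {"action": "publish_post", "executed": True}


plan = plan_post("New skills post is up.")
print(json.dumps(plan, indent=2))  # the human reviews this JSON

# Approving the plan works only for the text that was shown.
print(execute_post("New skills post is up.", plan["approval_token"])["executed"])   # True
print(execute_post("New skills post is up!!", plan["approval_token"])["executed"])  # False
```

Returning plain JSON in both paths keeps the tool boring, which is the point: the agent decides, the tool reports.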
thermo-nuclear-code-quality-review
thermo-nuclear-code-quality-review is from Cursor's team kit and it is exactly what it sounds like.
It is an unusually strict maintainability review. Not "does this pass tests", but "did this make the codebase worse".
It pushes on file sprawl, spaghetti conditionals, pointless wrappers, casts that hide bad boundaries, feature logic leaking into shared layers, and missed chances to delete whole chunks of complexity with a cleaner model.
I like it because agents are very good at producing code that works locally and quietly makes the surrounding system worse. This skill is a counterweight to that. It asks for the code-judo move: can we keep the behavior and make the structure much simpler?
Use it after the happy path passes, not before. First make the thing work. Then ask whether the implementation deserves to live.
the pattern
The first post was mostly about giving agents better task workflows.
This set is more about consequences.
Once agents can ship useful work, you need ways to review it, remember the lesson, delete old mistakes, and safely operate real tools. Otherwise the workflow becomes one big pile of confident output and future cleanup.
Skills are useful because they make these standards reusable. Not perfect. Not automatic. But better than retyping "be careful, check evidence, do not leak secrets, do not make a mess" into every prompt like a cursed little prayer.
The agent still needs judgment. The skill just makes it harder for that judgment to start from zero every time.