After my last article on parallels between working with AI agents and managing teams, new power-user patterns for using agentic tools like Claude Code, Codex and Amp have emerged. I like Andrej Karpathy's phrase "some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it" and agree that a helpful framing is the recognition of a higher "layer of abstraction to master".

In particular, users of AI agents (most commonly software engineers, but increasingly any digital knowledge worker) are most effective when they think beyond situation-specific prompting and consider the bigger project-level picture, asking how the 'harness' (i.e. the inputs, outputs, feedback loops and control mechanisms) which wraps around the LLM can be optimised over time to deliver the biggest productivity boost. Over the last couple of months, the most powerful AI agents have been the ones equipped to control downstream services and to operate within a feedback loop which compounds their knowledge over time.

Note: I expect this article will rapidly go stale (and it's already a bit behind thanks to the Christmas break) so what follows is my perspective on interesting themes. I'll focus on the aspects of agentic workflows which seem most relevant for knowledge workers (but naturally will use software engineering examples).

Not everyone knows how to write a good prompt, but that's ok

As I previously wrote, we should treat AI agents roughly as we would treat a member of our team. Since communication is the main interface between us and them, developing intuition for how to effectively prompt is still essentially a prerequisite. When training people on how to prompt, I encourage 'metaprompting' with tricks like "give me 5 potential answers" or "summarise our chat so far and suggest 3 next steps". But while thinking creatively about prompts can elicit valuable responses, the IQ Bell Curve meme (and the xkcd over-engineering comic) applies here: the most important thing is to set a clear direction and execute without getting lost in extensive prompt-writing.
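The 'metaprompting' tricks above amount to reusable templates rather than bespoke prose. A minimal sketch (the template names and wording here are illustrative, not from any particular tool):

```python
# Minimal sketch of metaprompting: wrap a short task description in a
# reusable template instead of hand-crafting every prompt from scratch.
# The template names and wording are illustrative.
METAPROMPTS = {
    "options": "Give me 5 potential answers to the following, then recommend one:\n{task}",
    "recap": "Summarise our chat so far and suggest 3 next steps.",
}

def metaprompt(kind: str, task: str = "") -> str:
    """Expand a short task description into a fuller prompt."""
    return METAPROMPTS[kind].format(task=task)

print(metaprompt("options", "How should we structure the data pipeline?"))
```

The point is not the templates themselves but that the effort stays bounded: set direction, execute, and resist the urge to polish the wrapper.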

This is especially true if you're bullish on LLMs (or other architectures) and buy into the idea that models will get smarter and better able to deduce user intentions. In that scenario, it becomes even less important to meticulously craft a domain-specific prompt (or to prompt at all), and more important to control what actions the agent can take to complete the task in question.

Skilled agents allow users to focus on higher-level tasks, but limitations exist

When it comes to equipping agents with tools to make them more effective at their work, Skills seem to be winning as the stickiest concept. As 'lazy-loaded prompt engineering', they are much cheaper (in terms of tokens) than MCP servers, and easier to create.
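The token economics of 'lazy-loaded prompt engineering' can be sketched as follows: only a one-line description of each skill sits permanently in the agent's context, and the full instructions are read from disk when the agent decides a skill is relevant. The directory layout and file names below are hypothetical, not any vendor's exact spec:

```python
from pathlib import Path
import tempfile

# Illustrative sketch of lazy-loaded skills. Only one line per skill lives
# in the system prompt; full instructions load on demand. The directory
# layout and file names are hypothetical.
SKILLS_DIR = Path(tempfile.mkdtemp()) / "skills"

# Set up a demo skill on disk.
demo = SKILLS_DIR / "fine-tune"
demo.mkdir(parents=True)
(demo / "DESCRIPTION.txt").write_text("Fine-tune small open models on a dataset")
(demo / "SKILL.md").write_text("# Fine-tune\n1. Pick a base model...\n")

def skill_index() -> str:
    """Cheap: one line per skill, included in every system prompt."""
    return "\n".join(
        f"- {s.name}: {(s / 'DESCRIPTION.txt').read_text().strip()}"
        for s in sorted(SKILLS_DIR.iterdir())
    )

def load_skill(name: str) -> str:
    """Expensive: the full instructions enter the context only on demand."""
    return (SKILLS_DIR / name / "SKILL.md").read_text()

print(skill_index())
```

Compared with an MCP server, there is no long-running process and no per-tool schema sitting in the context - just files, which is also why Skills are so easy to create.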

Hugging Face Skills are a really cool example. They give LLM agents the ability to perform AI engineering tasks (e.g. "Fine-tune Qwen3-0.6B on the open-r1/codeforces-cots dataset for instruction following"). This is a clear example of using agents to climb the ladder of abstraction and focus on direction rather than details. This specific example also hints at Recursive Self-Improvement (discussion of which is beyond the scope of this article).

There are interesting commercial implications when we consider agents that are becoming ever more capable at consuming existing technology solutions, or developing new ones. Hugging Face is built as an ecosystem, and Hugging Face Skills are complementary to driving engagement within it. But making AI agents generally capable at software development points to a collapse in demand for SaaS tools in favour of ephemeral software, if "many things I'd think to find a freemium or paid service for I can get an agent to often solve in a few minutes". It has been argued that a lot of commercial SaaS tools are basically CRUD apps with a sprinkling of "simple domain logic"; could standardisation of this category into a "compact universal representation" enable not just the on-demand generation of software tools, but also their integrations, migrations, documentation and marketing? The trend towards ephemeral software correlates with the rise of uv (and its ephemeral environments) in the Python world - which has itself been packaged up into an agentic Skill like the HF example.
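The ephemeral-software idea is easiest to see with uv's support for inline script metadata (PEP 723): an agent can emit a single self-contained file and `uv run` it in a throwaway environment. The script below is a hypothetical example of such a one-off tool; it happens to need only the standard library, but any dependencies listed in the header would be installed on the fly:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []  # e.g. add "httpx" here and `uv run` installs it on the fly
# ///
"""A throwaway, agent-generated CSV tool: `uv run summarise.py data.csv price`."""
import csv
import statistics
import sys

def summarise(rows: list[dict], column: str) -> dict:
    """Count and average the numeric values in one column."""
    values = [float(r[column]) for r in rows if r.get(column)]
    return {"count": len(values), "mean": statistics.mean(values)}

if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1], newline="") as f:
        rows = list(csv.DictReader(f))
    print(summarise(rows, sys.argv[2]))
```

No project scaffolding, no virtualenv to manage, nothing to maintain afterwards - exactly the category of job one might previously have reached for a freemium SaaS tool to do.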

If empowering an agent with a skillset/toolset is the way to unlock higher-level focus for the user, then we find ourselves in a scenario where the size and complexity of the toolset becomes a bottleneck. In other words, toolset management - how to select, install, review and optimise the choice of tools - is an area ripe for improvement in agentic software.
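One plausible shape for toolset management is a registry that scores each installed tool against the current task and exposes only the top few, keeping the context window small. Everything here - the tool names and the keyword-overlap heuristic - is illustrative; a real harness would more likely use embeddings or an LLM call:

```python
# Illustrative sketch of toolset management: expose only the tools most
# relevant to the current task, instead of all of them. The tools and the
# keyword-overlap scoring are stand-ins for a real relevance model.
TOOLS = {
    "run_tests": "execute the project's test suite and report failures",
    "search_docs": "search internal documentation for a query",
    "deploy": "deploy the current branch to the staging environment",
}

def select_tools(task: str, limit: int = 2) -> list[str]:
    """Rank tools by word overlap between the task and each tool's name/description."""
    task_words = set(task.lower().split())

    def score(name: str) -> int:
        tool_words = set(TOOLS[name].split()) | set(name.split("_"))
        return len(task_words & tool_words)

    return sorted(TOOLS, key=score, reverse=True)[:limit]

print(select_tools("run the failing tests"))
```

The selection step is itself part of the harness, which is exactly why toolset management feels like the next layer to optimise.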

Harnesses are more valuable when customised in terms of skills and UX

The first wave of LLM products (ChatGPT etc) had no real features on which to differentiate beyond the raw capability of their underlying models. But with this growing focus on the skill/tool layer, product offerings are becoming more curated. "A great way to win today is to take that broad stack and narrow it with your opinions". It is clear that understanding how to operate a tailored "AI IDE" (see Stanford CS146S Week 3) is a crucial skill for future users of AI agents.

For example, the coding harness Amp likely picks which LLM to use based on the user's intent and toolset, relieving "decision paralysis" in an "Apple-like" (i.e. curated) experience. It also includes new UX for developers who are rebalancing how much code they write vs. how much code they review. Similarly, LangChain's LangSmith adds UX for debugging the work of complex agents. Oh My OpenCode is another harness customisation layer that I've seen recently.

Of course, the same bitter lesson of prompt engineering applies to harness engineering: it's almost inevitable that users will go down rabbit holes of customisation without getting real value from their systems. But it's also true that there is still lots of potential value to be gained from UX optimisation of tools applied to knowledge- and project-management domains. If companies can curate a slick UX, they will be able to drive adoption of traditionally developer-targeted products in non-traditional (non-technical) user segments.

Harnesses don't need to feel like linear conversations with LLMs

As agentic AI gets adopted, a range of possible workflows has appeared. The workflow can be a multi-phase, iterative conversation in which an agent is directed to 'do the work' (i.e. write the code); but it can also be more useful to treat the AI as a design partner, with the user 'holding the pen'. The AI might even be embedded more tightly into the same UI as the user, triggered on demand (like a canvas copilot).

Other deployment patterns break even further from the user-triggers-LLM-via-computer-screen paradigm.

With great power comes great responsibility

It's worth the reminder: making AI agents more capable, connected and available necessitates controlling how much damage they are able to do. The simplest requirement is managing tool access, but there is also a lot of discussion about managing (or simulating) the filesystem that an agent is operating on.

The LlamaIndex docs put it well: "One way around this problem is to frequently use human-in-the-loop: while this is a high-success strategy (most people can recognize dangerous actions and block them before they happen), it breaks the autonomy that a coding agent should provide. [...] The second way around this is, counterintuitively, to ban the agent from accessing your actual file system, and make it work in a virtualized copy."

Not everyone is convinced that filesystem virtualisation is the right approach: for example, is it better for an agent to make 'working copies' at the service level?
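A minimal version of the virtualized-copy idea can be sketched as: copy the project into a temporary directory, let the agent loose there, then report what changed so a human (or a policy) can review before anything merges back. The `agent` callback below is a placeholder for whatever actually edits the files:

```python
import shutil
import tempfile
from pathlib import Path

# Sketch of running an agent on a virtualized copy of the project rather
# than the real filesystem. `agent` is a placeholder for the real agent;
# here we only demonstrate the copy-and-diff plumbing.
def run_in_sandbox(project: Path, agent) -> list[str]:
    sandbox = Path(tempfile.mkdtemp()) / project.name
    shutil.copytree(project, sandbox)
    agent(sandbox)  # the agent edits only the copy
    # Report which files were changed or added, relative to the original.
    changed = []
    for path in sandbox.rglob("*"):
        if path.is_file():
            original = project / path.relative_to(sandbox)
            if not original.exists() or original.read_bytes() != path.read_bytes():
                changed.append(str(path.relative_to(sandbox)))
    return changed

# Demo with a toy project and a toy "agent" that rewrites one file.
project = Path(tempfile.mkdtemp()) / "proj"
project.mkdir()
(project / "app.py").write_text("print('v1')\n")
changed = run_in_sandbox(project, lambda p: (p / "app.py").write_text("print('v2')\n"))
print(changed)
```

The original tree is untouched no matter what the agent does, which is the property the LlamaIndex quote is after - autonomy without handing over the real filesystem.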

What is clear, however, is that agents enable long-term success when they are able to read and write plans and documentation.

Note: I also really liked Simon Willison's "mise en place" metaphor for ensuring that an agent is sufficiently prepared for its task (e.g. plan developed, success metrics set, guardrails in place).

The best systems compound over time

Much of the recent attention paid to optimising agentic harnesses has been on making them "become more ideal for you and the task you are pursuing" over time. The Every blog (somewhat pretentiously) claims that delivery difficulty consistently increases during a traditional software engineering project because of insufficient focus on "compound engineering", i.e. "helping the whole system learn from successes and failures". The same dynamic regularly occurs in and after consulting projects, where there is never enough time for rigorous retrospectives and the codification of reusable assets (templates, processes, automation, etc.). So it makes a lot of sense to want our systems to "learn" automatically; in other words, "the future of (coding) agents is memory". The aforementioned Hugging Face Skills demonstrate an example of this, where "everything gets captured. Everything compounds".

Basic implementations of a memory system include an ever-growing folder of tutorials/notes about a given project (which can be viewed as a more project-specific flavour of generic Skills) or the regular distillation of user preferences from conversation logs (similar to what LLM chatbot providers do with their Memory functionalities).
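The 'ever-growing folder of notes' variant is simple enough to sketch: after each session, append a distilled lesson; before each session, replay the accumulated notes into the context. The file layout is illustrative, and in practice an LLM - not a hard-coded string - would distil the lesson from the session transcript:

```python
from pathlib import Path
import tempfile

# Sketch of a basic compounding-memory loop: lessons are appended after
# each session and replayed into the next session's context. The file
# layout is illustrative; a real system would have an LLM write the lesson.
MEMORY = Path(tempfile.mkdtemp()) / "memory.md"

def remember(lesson: str) -> None:
    """Append a distilled lesson so future sessions start smarter."""
    with MEMORY.open("a") as f:
        f.write(f"- {lesson}\n")

def recall() -> str:
    """Prepend accumulated lessons to the next session's system prompt."""
    notes = MEMORY.read_text() if MEMORY.exists() else ""
    return f"Project notes learned so far:\n{notes}"

remember("The staging deploy needs VPN access; run tests locally first.")
print(recall())
```

Even this naive loop exhibits the compounding property: every session starts from the sum of everything the previous sessions learned, rather than from zero.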

But two other implementations of memory systems (in the sense of writing status updates and plans) have gone viral recently: beads and Agent Mail.

beads

Agent Mail

So what?

AI agents are coming for all knowledge work. Skills and integrations are enabling the application of ostensibly coding agents (Claude Code etc.) to business admin, communications, content creation, home automation tinkering and much more; and deep research agents are being adopted as "thinking partners". Certain aspects of knowledge work - design decisions, stakeholder alignment, etc. - may be a harder unlock than code generation, but they will not remain the exclusive domain of humans for long. Indeed, "managers have been vibe coding forever": as long as task execution happens, the human/managerial domains do not need to be perfect for a project to progress.

If we get the balance wrong, then knowledge workers may become the "reverse centaur" consigned to permanent human-in-the-loop review of AI-generated outputs. But in the meantime, understanding how to manage these alien abstractions is both useful and enjoyable.