Butterswords | Where Words >= Swords | Starting to evaluate agent usefulness

I recently talked with a friend and collaborator about agents and the usefulness of products such as Anthropic’s Claude (including Cowork and Claude Code). We did not agree on much during the conversation: I came across as rigid and curmudgeonly, while he took an optimistic view I did not expect. I respect this person a great deal, so as he continued to argue for the benefits and usefulness of these systems in his life, I wanted to better understand where he was coming from. Towards the end of the conversation he gave me a blanket challenge:

Use agents for tasks you’d never consider using them for. See how they perform. It may not change your mind and, if it doesn’t, at least then you’d be able to say you tested your biases.

Implied in his challenge was something I find very hard to do: set aside my biases and give something a proper chance when I’ve already made up my mind. Still, I took up the challenge, and this article is the first of a handful I’ll be writing on my observations from using third-party agents (mostly Claude), Ollama, and LangGraph to bring agents into my day-to-day. I don’t know whether my findings will stay consistent, so please understand that I expect my thinking to change. I just don’t know whether it will become more optimistic or whether I will want to pull the rip cord and eject from tech altogether.

Overview of what I plan to do

I recognize I will have to use AI when it’s incorporated into products I use or when the people and companies I interact with leverage it. Does that mean I have to use it for myself? What changes if I choose to use it? What changes if I don’t? These questions rattle around my head all the time and take up more space than I would like. Every time I see a new “report” or “finding” about how AI is helping people, or how it’s hurting them, I find myself questioning the motivations of the authors. I try to think through who benefits from the perspective shared, who is funding the research, and how it fits with larger narratives around the technology. Many people are thinking deeply about these questions, with more time and resources to devote to them than I have, so I will not address them directly. Instead, I will focus on what I see as a risk management professional who has to exist in a world permeated with AI.

I feel a lot of friction about using agents because I don’t know if they provide enough value to warrant integrating them into my life and workflows. I am working on a project to explore this more directly, using this great paper on the uses of LLMs for assurance arguments, but for our current purposes it’s enough to note that its authors set out 14 questions someone should have a plan to answer BEFORE integrating LLMs into their work. I agree with their findings, so I tend to spend a lot of time thinking through my goals and expectations before touching the technology. My friend told me I needed to let go and focus more on discovering the usefulness of agents by using them. He has a point. Trying a different methodology might help me put my biases to the side.

Tasks I’ve identified for exploration

Potential Use Case	Technology	What I hope to do	Effort or skill required	Value to me	Outcome	My current judgment
Design a custom logo/wordmark	Claude Cowork	Take a simple sketch I have of a wordmark and generate a logo and wordmark from it	Low	Medium	It eventually created a usable placeholder logo and wordmark	Design does not appear to be a strength of these systems, especially when making small modifications based on abstract concepts like balance
Resume rewrites for specific jobs	Claude Cowork	Assess my resume’s fit for a role and then use it, and my LinkedIn profile, to generate a role-specific resume	Low	High	It has generated a resume for each role I’ve used it for	The generated text feels largely made up and not reflective of my experience; no positive responses yet to the modified resumes
Draft a business plan	Claude Cowork	Take a set of goals and a website and turn them into a business plan	Low	High	It provided a structured document with some rough estimates for revenue and expenses	The structure seemed fine, but the numbers did not match across the document
Create a Mod for a video game	Claude Cowork	Create a mod for Bannerlord 2 that adds a new story and more depth	Medium	High	Unknown	Not Started
Oxalate Estimator	VLM + Agent	Estimate the amount of oxalates in food based on images, recipes, or labels	Medium	High	Initial POC showed promise but repeatedly broke when given a recipe	Just Started
Irrigation System	Agent	Using an agent to gather upcoming weather information from the internet and determine if an automated watering system should run.	Medium	Medium	Unknown	Not Started
Triage bugs in live code	Claude Cowork	Assisting me by identifying issues with dependencies and configuration.	Low	High	It helped me identify and resolve a bug that came about after an update to a dependency	The system appears to handle this task well
Demo Creation	Claude Cowork and Claude Code		Medium	High	The demo came out great	This tech is useful for mocking a front end, with custom functionality, quickly
Creating an Agent	Ollama + Copilot		Medium	Low	I finished the agent in the time allotted	It could be very useful, but it’s also dangerous to rely on too much

Of the use cases above, three (Triage bugs in live code, Demo creation, Creating an Agent) fall directly within what I believe agents are useful for. I will still test them so I do not end up focusing only on peripheral uses of these systems. The creators of this technology want people to believe it can support or replace human work across many domains. I consider this exploration a more personal extension of the years of risk management work I’ve been doing in this space.

Update Log

Potential Use Case	Last Activity Date	Current Status	Notes
Design a custom logo/wordmark	June 17, 2026	✅	This took the better part of a week’s worth of Claude credits on the Free tier.
Resume rewrites for specific jobs	July 2, 2026	✅	I am unsure whether the changes made are useful.
Draft a business plan	May 30, 2026	✅	The framework was helpful; the financials… not so much.
Create a Mod for a video game	June 22, 2026	🆕
Oxalate Estimator	May 26, 2026	🔜
Irrigation System	June 1, 2026	🔜
Triage bugs in live code	June 3, 2026	✅	Definitely a place worth exploring more.
Demo Creation	June 24, 2026	✅	This helped me build quickly once I figured out how best to shape my own experience.
Creating an Agent	June 2, 2026	✅	It’s tough to code in front of people in 30 minutes, especially when they expect you to prompt Claude and then just chat…
