I hate how he called out terminal user interfaces as shit… then proved web interfaces to be superior. Damn him. I love working from my terminal, but having ai prove itself through html reports including video, images, metrics, charts, and text is goated. Rethinking yourself as the bottleneck, not the orchestrator, feels real. Validating the work is hard. There’s a shift right now and everyone is trying to figure it out. Lucas’s technique is a little bit of be lazy and tell it to prove itself to you, so as you juggle your 15 agents you have a nice report to read.
Posts tagged: ai
All posts with the tag "ai"
This is a really good guide, with quite a few good nuggets. I need to try deleting my AGENTS.md and rebuilding it from scratch more often. I liked how he talked about having agents prove their work and telling them up front how they will be judged. What I didn’t care for so much was the feeling that a lot of the rules go in markdown. That’s not a rule, that’s a suggestion. Rules should be deterministic. They should be tests and linters that ensure they are followed. Suggestions are good, but don’t trust the agents to always follow them. And don’t trust that they won’t change your rules. Keep them honest.
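To make the rule-versus-suggestion split concrete, here is a minimal sketch of a rule that lives in code instead of markdown. The rule itself ("no bare `except:`") and the names `find_bare_excepts` and `SAMPLE` are my own illustration, not something from the guide. The point is only that the check gives the same answer every time, no matter how the agent feels about it.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of every bare `except:` in the given source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # A handler with no exception type is a bare `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical sample input: one bare except (violation), one typed except (fine).
SAMPLE = '''
try:
    risky()
except:
    pass

try:
    risky()
except ValueError:
    pass
'''

violations = find_bare_excepts(SAMPLE)
print(f"bare except on lines: {violations}")
```

Wire something like this into a test and the agent either passes it or it doesn’t.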
Interesting take by Kenneth Reitz. Not quite sure how I feel about it anymore. It kinda hurts, but I’m not sure if code aesthetics matter as much as the product anymore. I cared when I was the one editing, but at this point I’m not doing a lot of edits by hand. Do these aesthetics affect the final products that users use? Not sure. AI makes me sad.
If agents make Prime a bit faster, what does that mean for the rest of us mortals?
I’ve gotta agree with Bob on this one. The first thing I did to my biggest brownfield project, BEFORE letting agents do any work on it, was add a hardened pre-commit.yaml, CI, and hardened type checking and linting. SECOND, get rid of bad inconsistent patterns: let them replicate consistency, force them to pass checks. Agents will follow all of your markdown suggestions most of the time, enough for you to become complacent if you let it. They are goal seeking. If you put them to a task you thought was possible but is not, given your constraints, they will try to find a way given enough tokens. I don’t see this ever changing. It’s one thing that makes them great, it just needs to be kept in check.
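A rough sketch of what "force them to pass checks" can look like as a single gate, assuming nothing about the actual project. `run_checks` and `EXAMPLE_CHECKS` are hypothetical names, and the two stand-in commands just exit 0 and 1. In a real repo you would swap in whatever linter, type checker, and test commands you already use.

```python
import subprocess
import sys


def run_checks(checks: list[tuple[str, list[str]]]) -> list[str]:
    """Run each (name, argv) check and return the names of the ones that failed."""
    failed = []
    for name, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        # A non-zero exit code is the only signal we trust here.
        if result.returncode != 0:
            failed.append(name)
    return failed


# Stand-in checks for illustration only: one that passes, one that fails.
EXAMPLE_CHECKS = [
    ("always-passes", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("always-fails", [sys.executable, "-c", "raise SystemExit(1)"]),
]

failed_checks = run_checks(EXAMPLE_CHECKS)
print("failed:", failed_checks)
```

The gate has no opinion and no memory. The work is done when the list of failures is empty, not when the agent says it is.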
behind, yet positioned to completely dominate this race by hitting it with some sense. They are trending toward what looks like longevity in the race, subsidising not simply to get users, but to get by until they figure out how to reduce the cost 100x to a reasonable level. They feel like the guy sitting in the back with nothing big or flashy to say who is going to drop the hammer on the competition that overstretched itself, taking on too much debt because it was necessary to change the game. There might be something to having a mix of hipsters, boomers, and luddites all trying to balance each other out.
An ai model created by Anthropic was announced as a closed preview on April 7, 2026 for critical security research and evaluation with its close partners who maintain critical software such as operating systems and browsers. Anthropic claims that Mythos is able to reason through so much more context than any model ever before. This enables it to find bugs that are 25 years old in BSD, considered one of the most secure operating systems we have. Once it finds these zero day bugs never discovered before, it’s able to use them together in malicious ways never expected, in ways the world is not ready for. At the time of writing these are claims without proof. It remains scary to know the potential this has, and that there are only a few companies with this potential that will gatekeep who gets access.
5 star video. If you are going to watch one video to understand how harnesses and agents work, this is it. This really had my gears spinning on what tools do for agents and how big of a difference they make in their ability to manage context efficiently and accurately create changes. It’s crazy how well bash works, and that gives the agents the ability to do just about everything, but it could be better.
Agents Are Here
Late last year I started writing I'm Out On Agents. Agents sucked. The models were good, but there was still something missing between the harnesses and the models. They could write good code, they could do some debugging and exploring, but they were too good at fucking up the whole project to be useful. They could crank out Green Field POCs like nobody’s business, but they created so much mess in brown field projects that it was easier to chat and edit yourself.
The 7 min read
A really interesting long form interview with @simonwillison. If you follow him closely most of it is probably not new, but I found some interesting nuggets.
Simon is writing most of his code from his phone these days using Anthropic’s hosted platform. He mentioned that a lot of security risks go away when you don’t put secrets on the platform and you let them take the risk of running ai-written code with an ai-chosen supply chain.
He talked about the Pelican Riding a Bike benchmark for quite a while. He was surprised at how good a proxy it is for how capable a model is at just about everything. He also said that when he runs the benchmark he also runs half a dozen others that he’s never talked about, so that if they were to train a model specifically for his benchmark he could catch them, but it seems they had caught on, and if they were, it seems they would already...
...
Is Glasswing the next inflection point
I’ve been thinking about this for a while and Daniel makes some great arguments here. Interestingly, keeping inference cheap removes the incentives to make our tools better, to help us choose the right model, or to lean on local and open weight models. The frontier models are so affordable through subsidized subscription models, why would you deal with anything less intelligent at this point? The tooling we use is not optimized for it, and why should it be?
Everyone look away, nothing to see here.
Anthropic safewords are the talk of the town today.
The Claude Code source code leaked today and the tweets are great, maybe Twitter is back.
Did you know you can replace the spinning verbs in Claude Code? I’m having fun with it.
I’m about to be pi pilled.
Well f&#ing said @pype, well f&#ing said. I think a lot of us are feeling this. We’ve pitched our brain into a bucket and we are no longer stretching it in the same way. We still work in ways similar to the old ones, with new ways of turning off and saying yes a bunch of times. The best thing I can hope for is that as things get better we have fewer yes loops, and more architectural design debates and deep thoughts. But I fear deep thoughts have gone the way of “research the leading 10 frameworks and pick the best one for this project,” with the clankers doing the deep thinking. It’s signing us up for a weird dystopia.
I think a lot of us wish we could undo what has happened and go back to actually understanding what we are doing, but...
I’m in step with @pype here. I really want beads to work for me, but my systems for infra/platform work are all over the place, not one repo. I’m considering trying the BEADS_DIR env var but idk if it fits my workflow. For now, similar to @pype, I am rocking my own home-vibed solution that I’ve intentionally put little effort in, and it’s working great. I expect it to be broken and not working with the latest harnesses and models within a few months anyways, cause there is no predicting this train.
Vibe coding is going so far into the news sphere now that Adam Savage even weighs in with perspectives from someone who has built a life around building things with his hands, keeping up with new making techniques, discovering old techniques as they combine with new. He talks about 3d printing reviving his love of the pantograph as one automation technique eases the most difficult part of another.
Does anyone think fast-code will continue to pay the same salary? The answer isn’t to switch your brain off during your McCode shift and write a poem after work. Your job will be replaced by a Bangladeshi slop-shop if AI improves (which is inevitable, apparently). Possibly the same sweatshop that loomed my £3 T-shirt. The Luddites didn’t accept their fate so easily.
David has some good points here, but I’m feeling the opposite direction a bit. Execs have always liked keeping the PMs and the people steering the ship close by and were willing to farm out more and more grunt work. It feels like we are in a weird phase where there used to be a big group of people paid to write code. A few of them are exceptionally good at it and will remain. There will be a need for these people everywhere. Somehow we still need people hand editing assembly code optimizations, Fortran, and COBOL today. Those industries largely moved on, but a few great ones remain. I think this fast-code slop factory is going to be a short forgotten time in history, but no one yet knows what’s next. We are all waiting to find out. Just as with anything, there is still value in doing it by hand and...