News | drihu.com

By PunchyHamster, 13 hours ago

I'd argue there are 2 types of users:

* People using it as a tool, aware of its limitations and treating it basically as an intern/boring task executor (whether it's some code boilerplate, or pooping out/shortening some corporate email), or as a tool to give themselves a summary of a topic they can then bite into deeper.

* People outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results, and are not interested in knowing more about the topic or honing their skills in the topic

The second group is the one that thinks talking to a chatbot will replace a senior developer

By sevenzero, 12 hours ago

I started to outsource thinking at my job as my company made it very clear that they do not want/can't afford thinking engineers. Thinking requires time and they want to deliver quickly. So they cater towards the very realistic deadlines our PMs set for features (/s). Funnily enough the features have to be implemented ASAP according to the customers, but the customer feedback takes like 6 months due to them using the new feature for the first time 6 months after delivery. I just don't care anymore. Gonna leave the learning part up to my time off, but getting generally tired of the industry as a whole, so just putting in minimal effort to pay my bills until things explode or get better. So for me it's definitely outsourcing thinking at work.

By gloomyday, 12 hours ago

This is a fatalistic attitude, but one I can totally get behind. It has become harder to associate my job with contributing to society.

By sevenzero, 11 hours ago

I agree, and I am far from being a senior engineer. I've only been in the market for a few years and started out just before the whole LLM stuff started to pick up. So I have been grinding a lot (2nd job I've learned, in tech since ~2020) only to be confronted with the permanent existential fear of possibly having to learn a 3rd job (which takes 3 years of full-time work for negligible pay in my country). I don't want to start from zero again and I am tired of corporations that are shitting out money yet are cheap with their employees. Starting from zero is never fun, going back into debt is never fun and having to leave a job/career you like is never fun either. I'm 30 now and have only been making (noteworthy, still below median) money for 1.5 years. I can't afford starting anew and there is little I can do about it, which is extremely frustrating.

By Schiendelman, 6 hours ago

As a product manager, this makes me think the features you're building are not the things your customers need or want most. I'm curious if you were to ask your product manager about that six-month timeframe, and just ask the open-ended question of is there anything on the backlog that we can build that the product manager thinks users would pick up within days instead of months?

By idopmstuff, 6 hours ago

As a product manager, this feels like they're in the pretty typical B2B SaaS trap of building the stuff that the people who pay for the product insist they need but the people using the product don't necessarily want, so they've gotta invest a bunch of time and effort getting usage/feedback.

Could be for good reasons (e.g. they're security features that are important to the business but add friction for the user) or just because management is disconnected from the reality of their employees. Either way, not necessarily the wrong decision by the PM - sometimes you've gotta build features fast because the buyer demands them in a certain timeframe in order to get the contract signed. Even if they never get used, the revenue still pays the bills.

By ddsfsdf, 8 hours ago

What do you really care? It's a job.

By eitally, 7 hours ago

Historically (I'm 48), professionals have cared about their jobs, generally speaking, and often do make serious attempts to logically derive sociological benefits from their personal efforts. There's been a seismic shift over the past 5-6 years, though, and this sense of care has massively eroded.

By Schiendelman, 6 hours ago

I'm 44. It's been more than five or six years. I would say 15 or 20, if not more.

By strange_quark, 6 hours ago

It feels like covid turbocharged it though. The amount of grift and outright corruption is unrecognizable compared to even 2019. Maybe it was always there but it feels like companies have gone full mask off now.

By NDizzle, 6 hours ago

I feel you. I’m 46 and now on the hunt for the right company to work for, and hopefully finish out my career there. While the company values haven’t technically changed, the actions taken in the past 5 years have eroded my trust so much I barely recognize the place. When you no longer have a sense of pride working somewhere, it’s time to move on. At least that is what I believe to be true.

By palmotea, 6 hours ago

> While the company values haven’t technically changed, the actions taken in the past 5 years have eroded my trust so much I barely recognize the place. When you no longer have a sense of pride working somewhere, it’s time to move on. At least that is what I believe to be true.

The problem, as I see it, is the changes that bug me [1] seem systemic throughout the economy, "best practices" promulgated by consultants and other influencers. I'm actually under the impression my workplace was a bit behind the curve, and a lot of other places are worse.

[1] Not sure if they're the "actions" you're talking about. I'm talking about offshoring & AI (IMHO part of the same thrust), and a general increase in pressure/decrease in autonomy.

By antisthenes, 2 hours ago

> There's been a seismic shift over the past 5-6 years

Nah. It's been at least since 2009 (GFC), if not longer.

It started happening with the advent of applicant tracking systems (making hiring a nightmare, which it still is) and the fact that most companies stopped investing into training of juniors and started focusing more on the short-term bottom line.

If the company is going to make it annoying to get hired and won't invest anything in you as a professional, there's 0 reason for loyalty besides giving your time for the paycheck. And 0 reason to go 120% so you burn out.

By ForHackernews, 6 hours ago

Software developers have never been professionals. Doctors, lawyers, accountants, chartered engineers are professionals. They have autonomy and obligations to a professional code of ethics that supersedes their responsibility to their employers.

Devs are hired goons at worst and skilled craftspeople at best, but never professionals.

By palmotea, 6 hours ago

> What do you really care? It's a job.

Because having a job that's somewhat satisfying and not just a grind is great for one's own well-being. It's also not a bad deal for the employer, because an engaged employee delivers better results than one who doesn't give a shit.

By 3D30497420, 12 hours ago

> People outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results, and are not interested in knowing more about the topic or honing their skills in the topic

And this may be fine in certain cases.

I'm learning German and my listening comprehension is marginal. I took a practice test and one of the exercises was listening to 15-30 seconds of audio followed by questions. I did terribly, but it seemed like a good way to practice. I used Claude Code to create a small app to generate short audio (via ElevenLabs) dialogs and set of questions. I ran the results by my German teacher and he was impressed.

I'm aware of the limitations: Sometimes the audio isn't great (it tends to mess up phone numbers), it can only be a small part of my work learning German, etc.

The key part: I could have coded it, but I have other more important projects. I don't care that I didn't learn about the code. What I care about is I'm improving my German.

By isqueiros, 12 hours ago

Seems like you are part of the first group then, not the second. The fact that you are interested in learning and are using it as a tool disqualifies you from being someone who has little clue and just wants to get something out (i.e. just spit out code)

By 3D30497420, 12 hours ago

As I reread the original post, I'm actually not sure which group I fall into. I think there's a bunch of overlap depending on perspective/how you read it:

> Group 1: intern/boring task executor

Yup, that makes sense I'm in group 1.

> Group 2: "outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results"

Also me (in this case), as I'm outsourcing the software development part and just want the final app.

Soo... I probably have thought too much about the original proposed groups. I'm not sure they are as clear as the original suggests.

By aljaz823, 11 hours ago

I'd say you're still in group 1. Your main goal is not the app but learning German. Therefore creating the app using AI is only a means to an end, a tool, and spending time coding it yourself is not important in this context.

By charcircuit, 10 hours ago

The AI usage was not about learning German, but for creating an app. This would be group 2. He may use the tool he made to learn German, but using that tool isn't using AI

By netdevphoenix, 9 hours ago

>using that tool isn't using AI

It is though. App is using AI underneath to generate audio snippets. That's literally its purpose

By charcircuit, 3 hours ago

Creating those snippets doesn't require knowing how to make a proper recording, how to edit it down, or how to direct the voice actor for the line.

By 0xEF, 11 hours ago

They could admittedly be more defined, but I think the original commenter missed a key word. It really boils down to whether or not you are offloading your critical thinking.

The word "thinking" can be a bit nebulous in these conversations, and critical thinking perhaps even more ambiguously defined, so before we discuss that, we need to define it. I go with the Merriam-Webster definition: the act or practice of thinking critically (as by applying reason and questioning assumptions) in order to solve problems, evaluate information, discern biases, etc.

LLMs seem to be able to mimic this, particularly to those who have no clue what it means when we call an LLM a "stochastic parrot" or some equally esoteric term. At first I was baffled that anyone really thought that LLMs could somehow apply reason or discern its own biases but I had to take a step back and look at how that public perception was shaped to see what these people were seeing. LLMs, generative AI, ML, etc are all extremely complex things. Couple that with the pervasive notion that thinking is hard and you have a massive pool of consumers who are only too happy to offload some of that thinking on to something they may not fully understand but were promised that it would do what they wanted, which is make their daily lives a bit easier.

We always get snagged by things that promise us convenience or offer to help us do less work. It's pretty human to desire both of those things, but it is proving to be an Achilles heel for many. How we characterize AI determines our expectations of it; so do you think of it as a bag of tools you can use to complete tasks? Or is it the whole factory assembly line where you can push a few buttons and a pseudo-finished product comes out the other side?

By fragmede, 11 hours ago

False dichotomy is one of the original sins. The two groups as advertised aren't all that's out there. Most people are interested in results. How we get those results is part of the journey of getting results, and sometimes it's about the journey not the destination. I care very much about the results of my biopsy or my flight, I don't know much about how we get there, I want to know if I have cancer, and that my plane didn't crash. I hope that doesn't put me on the B ark that gets sent into the sun.

By idopmstuff, 6 hours ago

This is me, but for writing code. I own a business, and I use Claude Code to build internal tools for myself.

Don't care about code quality; never seen the code. I care if the tools do the things I want them to do, and they verifiably do.

By bwestergard, 4 hours ago

I'd love to hear about what your tools do.

By 3D30497420, 2 hours ago

You're in luck: https://theautomatedoperator.substack.com/

By cik, 9 hours ago

To me this misses a third group, those using these tools as a series of virtual teammates, a mock team member with which to ping pong possibilities.

This is actually the greatest use case I see, and interact with.

By mettamage, 7 hours ago

Yea I am an ENFP. While I don’t think MBTI is scientific, it captures perfectly that I have the tendency to think out loud.

LLMs make me think out loud way better.

Best rubber duck ever.

By madeofpalk, 8 hours ago

To me, this is the first use case, depending on whether you're aware of its shortcomings or not.

By GrinningFool, 6 hours ago

I can buy that if we stipulate that one person can belong to both groups, depending on the task and goals of the user.

Sometimes I just want the thing and really don't care about any details. Sometimes I want a very specific thing built in a very specific way. Sometimes I care about some details and not others.

How I use the tools at my disposal depends on what I want to get out of the effort.

By Aardwolf, 13 hours ago

The same person might be both kinds of users, depending on the topic or just the time of the day

By mathgeek, 8 hours ago

It's almost as if categorization is often an oversimplification.

By safety1st, 12 hours ago

Well the second group in your taxonomy are very unserious, I mean that's fine, it's OK to use an AI tool for vibing and self-amusement, there will be an entire multi-billion dollar entertainment industry which will grow up around that. In my personal experience, decisionmakers who fell into this camp and were frothing at the mouth about making serious business decisions this way are already starting to get a reality check.

From my perspective the distinction is more on the supply side and we have two generations of AI tools. The first generation was simply talking to a chatbot in a web UI and it's still got its uses, you chat and build up a context with it, it's relying heavily on its training data, maybe it's reading one file.

The second generation leans into RAG and agentic capabilities (if you can glob and grep or otherwise run a search, congrats you have v1 of your RAG strategy). This is where Gemini actually scans all the docs in our Google Workspace and produces a proposal similar to ones we've written before. (Do we even need document templates anymore?) Or where you start a new programming project and Claude can write all the boilerplate, deploy and set up a barebones test suite within a couple of minutes. There's no doubt that these types of tools give us new capabilities and in some cases save a lot more time than just babbling into chatgpt.com.
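To make the "glob and grep" remark concrete, here is a minimal sketch of that kind of v1 retrieval in Python. The helper names `grep_retrieve` and `build_prompt`, the file pattern and the plain substring match are all assumptions for illustration, not how Gemini, Claude or any specific tool actually does it.

```python
from pathlib import Path


def grep_retrieve(root, pattern, query, max_hits=3):
    """Return up to max_hits (path, line) pairs whose line mentions the query.

    Hypothetical helper: the glob picks candidate files, the substring
    check plays the role of grep.
    """
    hits = []
    needle = query.lower()
    for path in sorted(Path(root).rglob(pattern)):  # the "glob" step
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            if needle in line.lower():  # the "grep" step
                hits.append((str(path), line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits


def build_prompt(question, hits):
    """Paste the retrieved lines ahead of the question as context."""
    context = "\n".join(f"[{path}] {line}" for path, line in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Something like `build_prompt("Draft a proposal", grep_retrieve("docs", "*.md", "proposal"))` would then produce the text that gets sent to the model; real tools presumably rank and chunk far more carefully than this.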

I think this accounts for a lot of differences in terms of reported productivity by the sane users. I was way less enthusiastic about AI productivity gains before I discovered the "gen 2" applications.

By absynth, 11 hours ago

Other alternatives that aren't exactly "just as a tool":

* people who use it instead of search engines.

* people who use it as a doctor/therapist/confidant. Not to research. But as a practitioner.

There are others:

* people who use it instead of man pages or documentation.

* people who use it for short scripts in a language they don't quite understand but "sorta kinda".

By notarobot123, 12 hours ago

> The second group is the one that thinks talking to a chatbot will replace a senior developer

And the first group thinks that these tools will enable them to replace a whole team of developers.

By cheevly, 7 hours ago

A company with 5 developers could potentially downsize to 3 developers using AI, while improving overall velocity. Would you agree?

By the__alchemist, an hour ago

No.

By cheevly, 8 hours ago

What about the type of user that uses thinking/reasoning to produce more advanced tooling in order to outsource more and more of their thinking and skillsets to it? Because I myself and many others that I know fall into that category.

By RHSeeger, 9 hours ago

I think the specific examples of the first group there undersell it. They make it sound like the group isn't getting a lot of power out of the AI. The things I use it as a tool for include

- Peer reviews. Not the only peer review of code, but a "first pass" to point out anything that I might have missed

- Implementing relatively simple changes; ones where the "how" doesn't require a lot of insight into long term planning

- Smart auto-complete (and this one is huge)

- Searching custom knowledge bases (I use Obsidian and have an AI tied into it to search through my decade+ of notes)

- Smart search of the internet; describing the problem I'm trying to solve and then asking it to find places that discuss that type of thing

- I rarely use it to clean up emails, but it does happen sometimes. My emails tend to be very technical, and "cleaning them up" usually requires I spend time figuring out what information not to include

By whynotmaybe, 5 hours ago

> The second group is the one that thinks talking to a chatbot will replace a senior developer

Once they realize that it doesn't replace seniors but can easily replace juniors, junior devs will have a bigger problem and the industry at large will have a huge problem in 8 years because the concept of "senior" will have vanished.

By hxugufjfjf, 3 hours ago

Why would the concept of senior disappear in 8 years? I am a senior, and work with seniors that have been seniors for 20 years. In 8 years we will still be seniors.

By whynotmaybe, 2 hours ago

My perspective is that each year a good chunk of senior devs leave pure dev work for management roles or something else, and as they do, they are replaced by juniors morphing into seniors.

I consider 8 years to be the real experience needed to be considered a senior dev.

If, from now on, the number of juniors is drastically reduced, this will lead to a lack of seniors in 8 years, because seniors will keep leaving at the same rate.

In a situation where they replace juniors with agents, yes, we'll still be senior, but just like people capable of setting a VHS recorder, our number will dwindle.
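A back-of-the-envelope version of that pipeline argument, as a rough Python sketch. Every number here (100 seniors, 10% leaving per year, cohorts of 10 juniors, 8 years to senior) is a made-up assumption purely to show the shape of the effect, and `senior_headcount` is a hypothetical name.

```python
def senior_headcount(years, seniors, junior_cohorts, hires_per_year,
                     leave_rate=0.1):
    """Simulate senior headcount year by year.

    junior_cohorts[i] is the number of juniors with i years of experience;
    its length is the number of years it takes to become senior.
    """
    cohorts = list(junior_cohorts)
    history = []
    for _ in range(years):
        seniors -= seniors * leave_rate  # seniors moving to management etc.
        seniors += cohorts[-1]           # most experienced juniors get promoted
        cohorts = [hires_per_year] + cohorts[:-1]  # everyone ages one year
        history.append(seniors)
    return history


# Assumed steady state: 8 cohorts of 10 juniors feeding 100 seniors.
with_hiring = senior_headcount(12, 100, [10] * 8, hires_per_year=10)
no_hiring = senior_headcount(12, 100, [10] * 8, hires_per_year=0)
```

Under these assumptions, `no_hiring` stays level while the existing juniors are still working their way up and only starts shrinking once that pipeline is empty, which is the delayed effect I'm describing.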

By hxugufjfjf, an hour ago

I understand. At my work nobody senior becomes a manager so I never had that perspective. Thanks for clarifying your thoughts for me.

By everdrive, 9 hours ago

And what people don't understand is that these two modes, much like those who can successfully restrict their calories and stay in shape, are dispositional more than anything. Most people will fail to "upgrade" to the better path, and people in the better path will fail to understand why most people are complaining about LLMs.

By jmathai, 6 hours ago

You can split the second group into two sub-buckets.

Junior devs: who have limited experience or depth in knowledge. They are unable to analyze the output of AI coding agents sufficiently to determine long term viability of the code. I think this is the entirety of who you're speaking of.

Senior devs: who are using it for more than a basic task executor. They have a decade+ of experience and can quickly understand if what the AI coding agent suggests is viable long term or not. When it's not, they understand how to steer it into a more appropriate direction.

By netdevphoenix, 9 hours ago

> The second group is the one that thinks talking to a chatbot will replace a senior developer

No one is going to replace senior developers. But senior developer pay WILL decrease relative to its historical values.

By sharperguy, 9 hours ago

Surely making use of a new tool that makes you more productive would increase your value rather than decreasing it? Especially when knowing the kinds of mistakes AI could make that would affect your codebase negatively in terms of maintainability, security etc. requires significant experience.

By netdevphoenix, 4 hours ago

> Surely making use of a new tool that makes you more productive would increase your value rather than decreasing it?

Think wider. You, sharperguy, are not and will not be the only person with access to these tools. Therefore, your productivity increase will likely be the same as everyone else's. If you are as good as everyone else, why would YOU get paid more? Have you ever seen a significant number of companies outside FAANG permanently boost everyone's salary just because they did well in a given year?

A company's obligation is to the shareholders, not to you. Your value exists relative to that of others.

By RicDan, 8 hours ago

Not really. If pay decreases it's because you're not required anymore or less, which is contrary to what has been shown. IF educating and enabling juniors etc. is not handled correctly, then senior pay will explode, because whilst they are much more efficient, their inherent knowledge is required to produce sustainable results.

By netdevphoenix, 4 hours ago

> If pay decreases it's because you're not required anymore or less

Not necessarily, there are many factors at play here which are downplayed. The first one is education: LLMs are going to significantly improve skill training. Arguably, it is already happening. So the gap between you and a middev will get narrower. At the same time, candidates who can be as good as you will increase.

While you can argue that you possess specialised skills that not many do, you are unlikely to prove that under pressure within a couple of hours, and certainly not to the level where you can have late-2010s levels of negotiating power imo.

At the end of the day, the market can stay irrational longer than you can continue to refuse a lower offer imo. I believe there will be winners. But pure technical skill isn't the moat you think it is. Not anymore.

By bwat49, 6 hours ago

I think there's some middle ground possible between those two black and white groupings

By CrzyLngPwd, 9 hours ago

I agree, but there is a creeping issue of where the first group may delve deeper into a topic if all they/we have is an increasingly polluted internet.

By delaminator, 10 hours ago

What about me? I'm in group 3 and I can't be alone.

I'm a subject matter expert, 45 years in programming and data, aware of the tool's limitations but still using it all day every day to implement non-trivial code, all the while using other tools to do voice transcription, internal blog posting about new tools, agents gathering information while I sleep, various classifiers, automated OCR, email scanning, recipe creation, electronics designing, and many, many other daily tasks.

By holoduke, 11 hours ago

I think you're missing a third user: a developer generating entire systems while still having an understanding of the output. That dev is in control of the architecture, code quality, functional quality and more. These people are still rare. But I have seen them already. They are the new 10x developers.

By actsasbuffoon, 7 hours ago

Ironically, I find LLMs far better at helping me dive into unfamiliar code than at writing it.

A few weeks ago a critical bug came in on a part of the app I’d never touched. I had Claude research the relevant code while I reproduced the bug locally, then had it check the logs. That confirmed where the error was, but not why. This was code that ran constantly without incident.

So I had Claude look at the Excel doc the support person provided. Turns out there was a hidden worksheet throwing off the indices. You couldn’t even see the sheet inside Excel. I had Claude move it to the end where our indices wouldn’t be affected, ran it locally, and it worked. I handed the fixed document back to the support person and she confirmed it worked on her end too.
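For anyone who wants to check for that kind of thing by hand, here is a minimal stdlib-only sketch. It assumes the document is an `.xlsx` file (a zip archive) whose sheet list lives at the conventional `xl/workbook.xml` path; `list_sheets` is a hypothetical helper name, not what Claude actually ran.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML workbook parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheets(xlsx_file):
    """Return (position, name, state) for each sheet, in workbook order.

    state is "visible" unless the workbook marks the sheet otherwise,
    e.g. "hidden" or "veryHidden".
    """
    with zipfile.ZipFile(xlsx_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    sheets = root.findall("m:sheets/m:sheet", NS)
    return [(i, s.get("name"), s.get("state", "visible"))
            for i, s in enumerate(sheets)]
```

Any entry whose state is not "visible" still occupies a position in the workbook order, which is how code that addresses sheets by index can end up reading the wrong one.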

Total time to resolution: 15 minutes, on a tricky bug in code I’d never seen before. That hidden sheet would have been maddening to find normally. I think we might be strongly overestimating the benefits of knowing a codebase these days.

I’ve been programming professionally for about 20 years. I know this is a period of rapid change and we’re all adjusting. But I think getting overly precious about code in the age of coding agents is a coping mechanism, not a forward-looking stance. Code is cheap now. Write it and delete it.

Make high leverage decisions and let the agent handle the rest. Make sure you’ve got decent tests. Review for security. Make peace with the fact that it’s cheaper to cut three times and measure once than it used to be to measure twice and cut once.

By ontouchstart, 8 hours ago

I have been diving deeply into the Rust community and ecosystem and really enjoyed reading the decade of real engineering poured into it, from RFCs to std, critical crates such as serde, and testing practices. What a refreshing world.

Compared to the mess created by Node.js npm amateur engineers, it really shows who is 10x or 100x.

Outsourcing critical thinking to pattern matching and statistical prediction will make the haystacks even more unmanageable.

By nisegami, 9 hours ago

I find myself in both groups depending on the project/task. I wonder what to make of that.

By kakacik, 11 hours ago

The second group are often the management decision makers, holding budgets, setting up 5-year plans etc. Don't underestimate them or mock them; in the end it's a disservice to all of us.

By ontouchstart, 5 hours ago

That is a great point. People in this group are programming at a different abstraction level, i.e., allocating computing resources, both human and machine resources.

Now AI agents are cheap but they generate a lot of slop, and potential minefields that might be costly to clean. The ROI will show up eventually and people in the second group will find out their jobs might be in danger. Hopefully a third group will come to save them.

By danpalmer, 19 hours ago

I've noticed a huge gap between AI use on greenfield projects and brownfield projects. The first day of working on a greenfield project I can accomplish a week of work. But the second day I can accomplish a few days of work. By the end of the first week I'm getting a 20% productivity gain.

I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move quickly as before.

The interesting bit is going to be whether we see AI being used in maturing those small systems into big complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, and particularly while still moving. I've not seen any of this from AI yet outside of either a) very directed small changes to large complex systems, or b) plugins/extensions/etc along a well-defined set of rails.

By stego-tech, 17 hours ago

Enterprise IT dinosaur here, seconding this perspective and the author’s.

When I needed to bash out a quick Hashicorp Packer buildfile without prior experience beyond a bit of Vault and Terraform, local AI was a godsend at getting me 80% of the way there in seconds. I could read it, edit it, test it, and move much faster than Packer’s own thin “getting started” guide offered. The net result was zero prior knowledge to a hardened OS image and repeatable pipeline in under a week.

On the flip side, asking a chatbot about my GPOs? Or trusting it to change network firewalls and segmentation rules? Letting it run wild in the existing house of cards at the core of most enterprises? Absolutely hell no the fuck not. The longer something exists, the more likely a chatbot is to fuck it up by simple virtue of how they’re trained (pattern matching and prediction) versus how infrastructure ages (the older it is or the more often it changes, the less likely it is to be predictable), and I don’t see that changing with LLMs.

LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.

By PunchyHamster, 13 hours ago

Yeah, I use it to get some basic info about a topic I know little of (as google search is getting worse by the day..). Which I then check.

Honestly the absolute revolution for me would be if someone managed to make an LLM say "sorry I don't know enough about the topic". One time I made a typo in a project name I wanted some info on and it outright invented commands and usages (that also were different than the project I was looking for, so it didn't "correct the typo") out of thin air...

By xmcqdpt2, 9 hours ago

> Honestly the absolute revolution for me would be if someone managed to make an LLM say "sorry I don't know enough about the topic"

https://arxiv.org/abs/2509.04664

According to that OpenAI paper, models hallucinate in part because they are optimized on benchmarks that involve guessing. If you make a model that refuses to answer when unsure, you will not get SOTA performance on existing benchmarks and everyone will discount your work. If you create a new benchmark that penalizes guessing, everyone will think you are just creating benchmarks that advantage yourself.
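The incentive is easy to see with a toy scoring rule. This is only a sketch under assumptions: 1 point for a correct answer, 0 for "I don't know", and a configurable penalty for a wrong answer. The function names are hypothetical and the numbers are made up, not taken from any particular benchmark.

```python
def expected_score(p_correct, wrong_penalty):
    """Expected points from answering when the model is right with
    probability p_correct: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_guess(p_correct, wrong_penalty):
    """Guessing pays off only if it beats the 0 points for abstaining."""
    return expected_score(p_correct, wrong_penalty) > 0.0
```

With no penalty for wrong answers, even a 20% shot is worth taking, so a model tuned for that benchmark learns to guess. Once a wrong answer costs a full point, the same 20% guess has negative expected value and abstaining wins until the model is more than 50% sure.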

By snovv_crash, 17 minutes ago

That is such a cop-out, if there was a really good benchmark for getting rid of hallucinations then it would be included in every eval comparison graph.

The real reason is that every bench I've seen has Anthropic with lower hallucinations.

By KellyCriterion, 3 hours ago

...or they hallucinate because of floating point issues in parallel execution environments:

https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

By cbdevidal, 9 hours ago

Holy perverse incentives, Batman

By Incipient, 16 hours ago

>LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.

This is essentially what I'm doing too but I expect in a different country. I'm finding it incredibly difficult to successfully speak to people. How are you making headway? I'm very curious how you're leveraging AI messaging to clients/prospective clients that doesn't just come across as "I farm out work to an AI and yolo".

Edit - if you don't mind sharing, of course.

By judahmeek, 15 hours ago

I interpreted his statement as LLMs being valuable for the actual marketing itself.

By vages, 14 hours ago

Which local AI do you use? I am local-curious, but don’t know which models to try, as people mention them by model name much less than their cloud counterparts.

By charcircuit, 10 hours ago

I've had good luck having Claude Code use ssh with root to deploy code, change configurations, and debug bad configs / selinux policy / etc. Debugging servers is not that different from debugging code. You just need to give it a way to test.

By holoduke, 15 hours ago

I let Claude configure and set up entire systems now. Requires some manual auditing and steering once in a while. But managing barebone servers without any management software has become pretty feasible and cheap. I managed to configure a 50+ Debian server cluster simultaneously with just ssh and Claude. Yes it's cowboy 3.0. But so are our products/sites.

By GrinningFool, 6 hours ago

When you use phrases like "managed to configure" to describe your production systems, it does not inspire confidence in long-term sustainability of those systems.

By somat, 15 hours ago

Isn't this true of any greenfield project, with or without generative models? The first few days are amazingly productive, and then features and fixes get slower and slower. And you get to see how good an engineer you really are, as your initial architecture starts straining under the demands of changing real world requirements and you hope it holds together long enough to ship something.

"I could make that in a weekend"

"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"

By jjav, 12 hours ago

> Isn't this true of any greenfield project?

That is a good point and true to some extent. But IME with AI, both the initial speedup and the eventual slowdown are accelerated vs. a human.

I've been thinking that one reason is that while AI coding generates code far faster (on a greenfield project I estimate about 50x), it also generates tech-debt at a hyperastonishing rate.

It used to be that tech debt started to catch up with teams in a few years, but with AI coded software it's only a few months into it that tech debt is so massive that it is slowing progress down.

I also find that I can keep the tech debt in check by using the bot only as a junior engineer, where I specify precisely the architecture and the design down to object and function definitions and I only let the bot write individual functions at a time.

That is much slower, but also much more sustainable. I'd estimate my productivity gains are "only" 2x to 3x (instead of ~50x) but tech debt accumulates no faster than a purely human-coded project.

This is based on various projects only about one year into it, so time will tell how it evolves longer term.

By unlikelymordant, 12 hours ago

In your experience, can you take the tech-debt-riddled code and ask Claude to come up with an entirely new version that fixes the tech debt/design issues you've identified? Presumably there's a set of tests that you'd keep the same, but you could leverage the power of AI in greenfield scenarios to just do a rewrite (while letting it see the old code). I don't know how well this would work; I haven't gotten to the heavy tech debt stage in any of my projects, as I do mostly prototyping. I'd be interested in others' thoughts.

By leoedin, 9 hours ago

I built an inventory tracking system as an exercise in "vibe coding" recently. I built a decent spec in conversation with Claude, then asked it to build it. It was kind of amazing - in 2 hours Claude churned out a credible looking app.

It looked really good, but as I got into the details the weirdness really started coming out. There's huge functions which interleave many concepts, and there's database queries everywhere. Huge amounts of duplication. It makes it very hard to change anything without breaking something else.

You can of course focus on getting the AI to simplify and condense. But that requires a good understanding of the codebase. Definitely no longer vibe-coded.

My enthusiasm for the technology has really gone in a wave. From "WOW" when it churned out 10k lines of credible looking code, to "Ohhhh" when I started getting into the weeds of the implementation and realising just how much of a mess it was. It's clearly very powerful for quick and dirty prototypes (and it seems to be particularly good at building decent CRUD frontends), but in software and user interaction the devil is in the details. And the details are a mess.

By jerf, 6 hours ago

At the moment, good code structure for humans is good code structure for AIs and bad code structure for humans is still bad code structure for AIs too. At least to a first approximation.

I qualify that because hey, someone comes back and reads this 5 years later, I have no idea what you will be facing then. But at the moment this is still true.

The problem is, people see the AIs coding, I dunno, what, a 100 times faster minimum in terms of churning out lines? And it just blows out their mental estimation models and they substitute an "infinity" for the capability of the models, either today or in the future. But they are not infinitely capable. They are finitely capable. As such they will still face many of the same challenges humans do... no matter how good they get in the future. Getting better will move the threshold but it can never remove it.

There is no model coming that will be able to consume an arbitrarily large amount of code goop and integrate with it instantly. That's not a limitation of Artificial Intelligences, that's a limitation of finite intelligences. A model that makes what we humans would call subjectively better code is going to produce a code base that can do more and go farther than a model that just hyper-focuses on the short-term and slops something out that works today. That's a continuum, not a binary, so there will always be room for a better model that makes better code. We will never overwhelm bad code with infinite intelligence because we can't have the latter.

Today, in 2026, providing the guidance for better code is a human role. I'm not promising it will be forever, but it is today. If you're not doing that, you will pay the price of a bad code base. I say that without emotion, just as "tech debt" is not always necessarily bad. It's just a tradeoff you need to decide about, but I guarantee a lot of people are making poor ones today without realizing it, and will be paying for it for years to come no matter how good the future AIs may be. (If the rumors and guesses are true that Windows is nearly in collapse from AI code... how much larger an object lesson do you need? If that is their problem they're probably in even bigger trouble than they realize.)

I also don't guarantee that "good code for humans" and "good code for AIs" will remain as aligned as they are now, though it is my opinion we ought to strive for that to be the case. It hasn't been talked about as much lately, but it's still good for us to be able to figure out why a system did what it did and even if it costs us some percentage of efficiency, having the AIs write human-legible code into the indefinite future is probably still a valuable thing to do so we can examine things if necessary. (Personally I suspect that while there will be some efficiency gain for letting the AIs make their own programming languages that I doubt it'll ever be more than some more-or-less fixed percentage gain rather than some step-change in capability that we're missing out on... and if it is, maybe we should miss out on that step-change. As the moltbots prove that whatever fiction we may have told ourselves about keeping AIs in boxes is total garbage in a world where people will proactively let AIs out of the box for entertainment purposes.)

By fauigerzigerk, 11 hours ago

Perhaps it depends on the nature of the tech-debt. A lot of the software we create has consequences beyond a particular codebase.

Published APIs cannot be changed without causing friction on the client's end, which may not be under our control. Even if the API is properly versioned, users will be unhappy if they are asked to adopt a completely changed version of the API on a regular basis.

Data that was created according to a previous version of the data model continues to exist in various places and may not be easy to migrate.

User interfaces cannot be radically changed too frequently without confusing the hell out of human users.

By jjav, 11 hours ago

> ask claude to come up with an entirely new version that fixes the tech debt/design issues you've identified?

I haven't tried that yet, so not sure.

Once upon a time I was at a company where the PRD specified that the product needs to have a toggle to enable a certain feature temporarily. Engineering implemented it literally, it worked perfectly. But it was vital to be able to disable the feature, which should've been obvious to anyone. Since the PRD didn't mention that, it was not implemented.

In that case, it was done as a protest. But AI is kind of like that, although out of sheer dumbness.

The story is meant to say that with AI it is imperative to be extremely prescriptive about everything, or things will go haywire. So doing a full rewrite will probably work well, only if you manage to have very tight test case coverage for absolutely everything. Which is pretty hard.

By Draiken, 9 hours ago

Take Claude Code itself. It's got access to an endless amount of tokens and many (hopefully smart) engineers working on it and they can't build a fucking TUI with it.

So, my answer would be no. Tech debt shows up even if every single change made the right decisions and this type of holistic view of projects is something AIs absolutely suck at. They can't keep all that context in their heads so they are forever stuck in the local maxima. That has been my experience at least. Maybe it'll get better... any day now!

By michaelt, 12 hours ago

> Isn't this true of any greenfield project?

Sometimes the start of a greenfield project has a lot of questions along the lines of "what graph plotting library are we going to use? we don't want two competing libraries in the same codebase so we should check it meets all our future needs"

LLMs can select a library and produce a basic implementation while a human is still reading reddit posts arguing about the distinction between 'graphs' and 'charts'.

By dust42, 14 hours ago

From personal experience I'd like to add the last 5% take 95% of the time - at least if you are working on a make over of an old legacy system.

By Fr0styMatt88, 17 hours ago

I find AI great for just greasing the wheels, like if I’m overthinking on a problem or just feel too tired to start on something I know needs doing.

The solutions also help me combat my natural tendency to over-engineer.

It’s also fun getting ChatGPT to quiz me on topics.

By orwin, 18 hours ago

Yeah, my observation is that for my usual work, I can maybe get a 20% productivity boost, probably closer to 10% tbh, and for the whole team's overall productivity it feels like it has done nothing, as seniors use their small productivity gains to fix the tons of issues in PRs (or in prod when we miss something).

But last week I had two days where I had no real work to do, so I created cli tools to help with organisation, and cleaning up, I think AI boosted my productivity at least 200%, if not 500.

By K0balt, 16 hours ago

It seems to be fantastic up to about 5k loc and then it starts to need a lot more guidance, careful supervision, skepticism, and aggressive context management. If you’re careful, it only goes completely off the rails once in a while and the damage is only a lost hour or two.

Overall, still a 4x production gain overall though, so I’m not complaining for $20 a month. It’s especially good at managing complicated aspects of c so I can focus on the bigger picture rather than the symbol contortions.

By gingersnap, 13 hours ago

Yes, I see the same thing. My working thesis is that if I can keep the codebase modular with clear separations, so that I keep the entire context while Claude Code only needs to focus on one module at a time, I can keep up the speed and quality. But if I try to give it tasks that cover the entire codebase it will have issues, no matter how you manage context and give directions. And again, this is not surprising: humans do the same, they need to break the task apart into smaller pieces. Have you found the same?

By K0balt, 9 hours ago

Yes. Spot on. The good thing is that it makes better code if modularity is strict as well.

I’m finding that I am breaking projects down into clear separations of concerns and designing inviolate API walls between modules, where before I might have reached into the code with less clearly defined internal vs external functions.

Exercising solid boundaries and being maniacal about the API surface is also really liberating personally, less cognitive load, less stress, easier tests, easier debugging.

Of course none of this is new, but now we can do it and get -more- done in a day than if we don’t. Building in technical debt no longer raises productivity, it lowers it.

If you are a competent engineer, ai can drastically improve both code quality and productivity, but you have to be capable of cognitively framing the project in advance (which can also be accelerated with ai). You need to work as an architect more than a coder.

By Gigachad, 17 hours ago

Similar experience. I love using Gemini to set up my home server, it can debug issues and generate simple docker compose files faster than I could have done myself. But at work on the 10 year old Rails app, I find it so much easier to just write all the code myself than to work out what prompt would work and then review/modify the results.

By galaxyLogic, 14 hours ago

This makes me think about how AI turns SW development upside down. In traditional development we write code which is the answer to our problems. With AI we write questions and get the answers. Neither is easy: finding the correct questions can be a lot of work, whereas if you have some existing code you already have the answers, but you may not have the questions (= "specs") written down anywhere, at least not very well, typically.

By data-ottawa, 19 hours ago

It’s fantastic to be able to prototype small to medium complexity projects, figure out what architectures work and don’t, then build on a stable foundation.

That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.

By johnrob, 19 hours ago

I’ve done this in pure Python for a long time. Single file prototype that can mostly function from the command line. The process helps me understand all the sub problems and how they relate to each other. Best example is when you realize behaviors X, Y, and Z have so much in common that it makes sense to have a single component that takes a parameter to specify which behavior to perform. It’s possible that already practicing this is why I feel slightly “meh” compared to others regarding GenAI.
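A minimal sketch of that pattern, with made-up names (`summarize`, the three modes, and `main` are illustrative assumptions, not code from any real prototype): three behaviors that share almost everything collapse into one function that takes a parameter, and the whole thing stays drivable from the command line.

```python
import argparse

MODES = ("total", "mean", "max")  # hypothetical stand-ins for behaviors X, Y, Z


def summarize(values, mode):
    """One component covering three near-identical behaviors via a parameter."""
    if not values:
        raise ValueError("values must not be empty")
    if mode == "total":
        return sum(values)
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"unknown mode: {mode}")


def main(argv):
    """Parse command-line style arguments, print and return the result."""
    parser = argparse.ArgumentParser(description="single-file prototype sketch")
    parser.add_argument("mode", choices=MODES)
    parser.add_argument("values", nargs="+", type=float)
    args = parser.parse_args(argv)
    result = summarize(args.values, args.mode)
    print(result)
    return result
```

Calling `main(["mean", "1", "2", "3"])` exercises the same path a shell invocation would if the file were wired up to `sys.argv[1:]`.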

By veunes, 11 hours ago

All of this speedrun hits a wall at the context window. As long as the project fits into 200k tokens, you’re flying. The moment it outgrows that, productivity doesn’t drop by 20% - it drops to zero. You start spending hours explaining to the agent what you changed in another file that it has already forgotten. Large organizations win in the long run precisely because they rely on processes that don’t depend on the memory of a single brain - even an electronic one

By TaupeRanger, 7 hours ago

This reads as if written by someone who has never used these tools before. No one ever tries to "fit" the entire project into a single context window. Successfully using coding LLMs involves context management (some of which is now done by the models themselves) so that you can isolate the issues you're currently working on, and get enough context to work effectively. Working on enormous codebases over the past two months, I have never had to remind the model what it changed in another file, because 1) it has access to git and can easily see what has changed, and 2) I work with the model to break down projects into pieces that can be worked on sequentially. And keep in mind, this is the worst this technology will ever be - it will only get larger context windows and better memory from here.

By kasey_junk, 9 hours ago

Everyone I know who is using AI effectively has solved for the context window problem in their process. You use design, planning and task documents to bootstrap fresh contexts as the agents move through the task. Using these approaches you can have the agents address bigger and bigger problems. And you can get them to split the work into easily reviewable chunks, which is where the bottleneck is these days.

Plus the highest end models now don’t go so brain dead at compaction. I suspect that passing context well through compaction will be part of the next wave of model improvements.

By kristopolous, 11 hours ago

I call it the day50 problem, coined that about a year ago. I've been building tools to address it since then. Quit the dayjob 7 months ago and have been doing it full time since

https://github.com/day50-dev/

I have been meaning to put up a blog ...

Essentially there's a delta between what the human does and the computer produces. In a classic compiler setting this is a known, stable quantity throughout the life-cycle of development.

However, in the world of AI coding this distance increases.

There's various barriers that have labels like "code debt" where the line can cross. There's three mitigations now. Start the lines closer together (PRD is the current en vogue method), push out the frontier of how many shits someone gives (this is the TDD agent method), try to bend the curve so it doesn't fly out so much (this is the coworker/colleague method).

Unfortunately I'm just a one-man show so the fact that I was ahead and have working models to explain this has no rewards because you know, good software is hard...

I've explained this in person at SF events (probably about 40-50 times) so much though that someone reading this might have actually heard it from me...

If that's the case, hi, here it is again.

By daliusd, 10 hours ago

This is where engineering practices help. Based on 1.5 years data from my team I can say that I see about 30% performance increase on mature system (about 9 years old code base), maybe more. The interesting stuff - LLMs is leverage, the better engineer you are the more you benefit from LLM.

By mark_l_watson, 11 hours ago

A friendly counterpoint: my test of new models and agentic frameworks is to copy one of my half a zillion old open source git repos for some usually very old project or experiment - I then see how effective the new infra is for refactoring and cleaning up my ancient code. After testing I usually update the old projects and I get the same warm fuzzy feeling as I do from spring cleaning my home.

I also like to generate greenfield codebases from scratch.

By Aeolun, 15 hours ago

I find that setting up proper structure while everything still fits in a single context window of Claude Code, as well as splitting as much as possible into libraries, works pretty well for staving off that moment.

By EnPissant, 18 hours ago

I have experienced much of the opposite. With an established code base to copy patterns from, AI can generate code that needs a lot less iteration to clean up than on green fields projects.

By cortesoft, 17 hours ago

I solve this problem by pointing Claude at existing code bases when I start a project, and tell it to use that approach.

By danpalmer, 17 hours ago

That's a fair observation, there's probably a sweet spot. The difference I've found is that I can reliably keep the model on track with patterns through prompting and documentation if the code doesn't have existing examples, whereas I can't document every single nuance of a big codebase and why it matters.

By sevenzero, 12 hours ago

> Anyone can create a small version of anything

Yup. My biggest issue with designing software is usually designing the system architecture/infra. I am very opposed to just shove everything to AWS and call it a day, you dont learn anything from that, cloud performance stinks for many things and I dont want to get random 30k bills because I let some instance of something run accidentally.

AI sucks at determining what kind of infrastructure would be great for scenario x, due to the cloud being the go-to solution for the lazy dev. Tried to get it to recommend a way to self-host stuff, but that's just a general security hazard.

By tonfreed, 17 hours ago

My observations match this. I can get fresh things done very quickly, but when I start getting into the weeds I eventually get too frustrated with babysitting the LLM to keep using it.

By defrost, 20 hours ago

The "upside" description:

  On the other you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.

  I helped one recently almost one-shot converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.

  Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull external data sources as inputs, build web dashboards and have Claude Code work with you to really integrate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.

almost makes me physically sick.

I've a reasonably intense math background corrupted by application to geophysics and implementing real world numerical applications.

To be fair, this statement alone:

* 30 sheet mind numbingly complicated Excel financial model

makes my skin crawl and invokes a flight reflex.

Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.

By majormajor, 19 hours ago

One of the dirty secrets of a lot of these "code adjacent" areas is that they have very little testing.

If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.

If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant excel doc? I'm assuming no, maybe I'm wrong.

It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness like that sort of financial modeling.

By benjijay, 10 hours ago

> If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output?

I recently watched a demo from a data science guy about the impending proliferation of AI in just about all related fields, his position was highly sceptical but with a "let's make the most of it while we can"

The part that stood out to me which I have repeated to colleagues since, was a demo where the guy fed his tame robot a .csv of price trends for apples and bananas, and asked it to visualise this. Sure enough, out comes a nice looking graph with two jagged lines. Pack it ship it move on..

But then he reveals that, as he wrote the data himself, he knows that both lines should just be an upward trend. Expands the axis labels - the LLM has alphabetized the months but said nothing of it in any of the outputs.
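A small sketch of how that failure mode arises, using invented data rather than anything from the demo: month labels are strings, so a plain sort orders them alphabetically, and a steadily rising series then looks jagged.

```python
# Hypothetical data: a price that rises by the same amount every month.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
prices = {month: 100 + 5 * i for i, month in enumerate(MONTHS)}


def is_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# A plain string sort puts the months in alphabetical order (Apr, Aug, Dec, ...).
alphabetical = sorted(prices)
# Sorting by calendar position restores chronological order.
chronological = sorted(prices, key=MONTHS.index)

alpha_series = [prices[m] for m in alphabetical]
chrono_series = [prices[m] for m in chronological]

print("alphabetical axis rising?", is_increasing(alpha_series))
print("chronological axis rising?", is_increasing(chrono_series))
```

The check that catches it is the one described above: expand the axis labels and confirm they run in calendar order before trusting the shape of the line.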

By jihadjihad, 7 hours ago

Always a good idea to spot check the labels and make sure you've got JFMAMJ..JASON Derulo

By senordevnyc, 7 hours ago

Like every anecdote out there where an LLM makes a basic mistake, this one is worthless without knowing the model and prompt.

By gipp, 5 hours ago

If choosing the "wrong" model, or not wording your prompt in just the right way, is sufficient to not just degrade your output but make it actively misleading and worse than useless, then what does that say about the narrative that all this sort of work is about to be replaced?

By benjijay, 5 hours ago

I don't recall the bot he was using, it was a rushed portion of the presentation to make the point that "yes these tools exist, but be mindful of the output - they're not a magic wand"

By Hammershaft, 15 hours ago

This has had tremendous real world consequences. The European austerity wave of the early 2010s was largely downstream of an Excel spreadsheet error that changed the result of a major study on the impact of debt/GDP.

https://www.newscientist.com/article/dn23448-how-to-stop-exc...

By tharkun__, 19 hours ago

This is a pet peeve of mine at work.

Any and I mean any statistic someone throws at me I will try and dig in. And if I'm able to, I will usually find that something is very wrong somewhere. As in, the underlying data is usually just wrong, invalidating the whole thing or the data is reasonably sound but the person doing the analysis is making incorrect assumptions about parts of the data and then drawing incorrect conclusions.

By defrost, 18 hours ago

I've frequently found, over a few decades, that numerical systems are cyclically 'corrected' until results and performance match prior expectations.

There are often more errors. Sometimes the actual results are wildly different in reality to what a model expects .. but the data treatment has been bug hunted until it does what was expected .. and then attention fades away.

By pprotas, 14 hours ago

Or the company just changes the definition of success, so that the metrics (that used to be bad last quarter) are suddenly good

By aschla, 19 hours ago

It seems to be an ever-present trait of modern business. There is no rigor, probably partly because most business professionals have never learned how to properly approach and analyze data.

Can't tell you how many times I've seen product managers making decisions based on a few hundred analytics events, trying to glean insight where there is none.
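To put a rough number on why a few hundred events say so little, here is a sketch using the Wilson score interval for a proportion. The counts (30 of 300 versus 39 of 300 conversions) are invented for illustration, and overlapping intervals are only a quick heuristic, not a formal significance test.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical A/B result: 10% vs 13% conversion on 300 events each.
low_a, high_a = wilson_interval(30, 300)
low_b, high_b = wilson_interval(39, 300)

# If the ranges overlap, the apparent "30% relative lift" may just be noise.
intervals_overlap = high_a > low_b and high_b > low_a
print(f"A: {low_a:.3f} to {high_a:.3f}")
print(f"B: {low_b:.3f} to {high_b:.3f}")
print("overlap:", intervals_overlap)
```

With samples this small each interval spans several percentage points, so the two variants are hard to tell apart.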

By p_v_doom, 13 hours ago

Also rigor is slow. Looks like a waste of time.

What are you optimizing all that code for, it works doesnt it? Dont let perfect be the enemy of good. If it works 80% thats enough, just push it. What is technical debt?

By gyomu, 16 hours ago

If what you're saying 1) is true and 2) does matter in the success of a business, then wouldn't anyone be able to displace an incumbent trivially by applying a bit of rigor?

I think 1) holds (as my experience matches your cynicism :), but I have a feeling that data minded people tend to overestimate the importance of 2)...

By laserlight, 14 hours ago

> does matter in the success of a business

In my experience, many of the statistics these people use don't matter in the success of a business: they are vanity metrics. But people use statistics, and especially the wrong statistics, to pass their agenda. Regardless, it's important to fix the statistics.

By mettamage, 14 hours ago

Rigor helps for better insights about data. That can help for entrepreneurship.

What can also help for entrepreneurship is having a bias for action. So even if your insights are wrong, if you act and keep acting, you will partially shape reality to your will and partially bend to its will.

So there are certain forces where you can compensate for your lack of rigor.

The best companies have both of those things by their side.

By riskable, 5 hours ago

> Any and I mean any statistic someone throws at me I will try and dig in.

I bet you do this only 75% of the time.

By skywhopper, 9 hours ago

This is, unfortunately, a feature of a lot of these systems. The sponsors don’t want truth, they want validation. Generative AI means there don’t even have to be data engineers in the mix to create fake numbers.

By obscurette, 16 hours ago

> If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output?

The local statistics office here recently presented salary statistics claiming that teachers' salaries had unexpectedly increased by 50%. All the press releases went out, and it was only questions raised by the public that forced the statistics office to review and correct the data.

By p_v_doom, 13 hours ago

> If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late.

Back in my data scientist days I used to push for testing and verification of models. Got told off for reducing the team's speed. If the model works well enough to get money in, and the managers who make the final calls do not understand the implications of being wrong, this would be the majority of cases.

By AdamN, 8 hours ago

I would say that although Claude may hallucinate at least it can be told to test the scripts. Many data scientists will just balk at the idea of testing a crazy excel workbook with lots of formulas that they themselves inherited.

By singingbard, 17 hours ago

I did a fair amount of data analysis, and deciding when or if my report was correct was a huge adrenaline rush.

A huge test for me was to have people review my analyses and poke holes. You feel good when your last 50 reports didn’t have a single thing anyone could point out.

I’ve been seeing a lot of people try to build analyses with AI who haven’t been burned by the “just because it sounds correct doesn’t mean it’s right” dilemma and who haven’t realized what it takes before you can stamp your name on an analysis.

By decimalenough, 19 hours ago

I'm almost certain it will be significantly worse.

The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.

The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of the edge cases wrong, and, when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.

By defrost, 19 hours ago

I won't disagree - I suffered from insufficient damning praise in my last sentence above.

IMHO, earned through years of bleeding eyeballs, the first will be riddled with subtle edge cases curiously patched and fettled such that it'll limp through to the desired goal .. mostly.

The automated AI assisted transcoding will be ... interesting.

By holoduke, 14 hours ago

My assumption is that with the right approach you can create a much much better and reliable program using only Claude code. You are referring to yolo coding results

By xmcqdpt2, 9 hours ago

I work in finance and we have prod excel spreadsheets. Those spreadsheets are versioned like code artifacts, with automated testing and everything. Converting them to real applications is a major part of the work the technology division does.

They usually happen because some new and exciting line of business is started by a small team as a POC. Those teams don't get full technology backing, it would slow down the early iteration and cost a lot of money for an idea that may not be lucrative. Eventually they make a lot of money and by then risk controls are basically requiring them to document every single change they make in excel. This eventually sucks enough that they complain and get a tech team to convert the spreadsheet.

By defrost, 8 hours ago

I too have seen such things.

My experience being they are an exception rather than the rule and many more businesses have sheets that tend further toward Heath Robinson than would be admitted in public.

* https://en.wikipedia.org/wiki/W._Heath_Robinson

By PunchyHamster, 12 hours ago

we're going from "bad excel sheet caused recession" to "bad vibe-coded financial thing caused recession"

By ruszki, 2 hours ago

I've already heard a story like that. My home country’s government has already made some decisions based on some vibe code verified by nobody. It was made by one of my friends. Nobody cares.

By simonebrunozzi, 10 hours ago

> Still, I'll concede that a Claude Code conversion to Python of a 30 sheet Excel financial model is unlikely to be significantly worse than the original.

You made me laugh hard. :)

By biophysboy, 5 hours ago

Ha! I also have a physics background and had the same gag reflex.

By bitwize, 15 hours ago

The thing is, when you use AI, you're not really doing things, you're having things done. AI isn't a tool, it's a service.

Now, back in the day, IBM designed and built an "executive data terminal". It wasn't really a computer terminal in the sense that you and I understand it. Rather, it was a video and two-way-audio feed to a room with a team of underlings, which an executive could ask for business data and analyses, which could be called up on a computer display (also routed to the executive's office). This allowed the executive to ask questions so he (it was the 1960s, it was almost invariably a he) could make informed decisions, and the team of underlings to call up data or crunch numbers on the computer and show the results on the display.

So because executives are used to having things done for them, I can totally see AI being used by executives to replace the "team of underlings" in this setup—in principle. The fact is that were I in that CEO's chair, I'd be thinking twice before trusting anything an LLM tells me, and double-checking those results—perhaps with my team of underlings.

Discussed on Hacker News: https://news.ycombinator.com/item?id=42405462

IEEE article: https://spectrum.ieee.org/ibm-demo

By chrisjj, 11 hours ago

> be thinking twice before trusting anything an LLM tells me

You're too modest. You'd be thinking once.

However when the parrot is hidden in a shiny box made up to look like a regular, relatively trustworthy program...

By ChrisMarshallNY, 19 hours ago

Obligatory xkcd: https://xkcd.com/1667/

By t0mk, 22 minutes ago

I really like that this article explained the title thesis in the first 4 paragraphs. At that point you can decide if you agree or if you want to read on. No unnecessary fluff baiting the reader to read the whole thing in order to find out what the two kinds of users are. More writing like this.

By decimalenough, 19 hours ago

> I helped one recently almost one-shot[3] converting a 30 sheet mind numbingly complicated Excel financial model to Python with Claude Code.

I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.

By linsomniac, 19 hours ago

It depends on how easily testable the Excel is. If Claude has the ability to run both the Excel and the Python with different inputs, and check the outputs, it's stunningly likely to be able to one-shot it.

By AlotOfReading, 19 hours ago

Something being simultaneously described as a "30 sheet, mind-numbingly complex Excel model" and "testable" seems somewhat unlikely, even before we get into whether Claude will be able to test such a thing before it runs into context length issues. I've seen Claude hallucinate running test suites before.

By djeastm, 9 hours ago

>I've seen Claude hallucinate running test suites before.

This reminded me of something that happened to me last year. Not Claude (I think it was GPT 4.0 maybe?), but I had it running in VS Code's Copilot and asked it to fix a bug then add a test for the case.

Well, it kept failing to pass its own test, so on the third try, it sat there "thinking" for a moment, then finally spit out the command `echo "Test Passed!"`, executed it, read it from the terminal, and said it was done.

I was almost impressed by the gumption more than anything.

By Merad, 3 hours ago

I've been using Claude Code with Opus 4.5 a lot the last several months and while it's amazingly capable it has a huge tendency to give up on tests. It will just decide that it can commit a failing test because "fixing it has been deferred" or "it's a pre-existing problem." It also knows that it can use `HUSKY=0 git commit ...` to bypass tests that are run in commit hooks. This is all with CLAUDE.md being very specific that every commit must have passing tests, lint, etc. I eventually had to add a Claude Code pre-command hook (which it can't bypass) to block it from running git commit if it isn't following the rules.
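A rough sketch of what the decision logic in such a blocking hook might look like, in Python. Everything here is an assumption for illustration, not the commenter's actual hook: the payload shape (a JSON object with the shell command under `tool_input.command`), the convention that exit code 2 blocks the call, and the crude regexes should all be checked against the Claude Code hooks documentation before use.

```python
import json
import re

# Hypothetical rules: treat any "git ... commit" invocation as a commit,
# and any HUSKY=0 or --no-verify as an attempt to skip commit hooks.
GIT_COMMIT = re.compile(r"\bgit\b[^|;&]*\bcommit\b")
BYPASS = re.compile(r"HUSKY=0|--no-verify")

def should_block(command: str) -> bool:
    """Return True for git commits that try to bypass commit hooks."""
    return bool(GIT_COMMIT.search(command)) and bool(BYPASS.search(command))

def exit_code_for(payload_text: str) -> int:
    """Map a hook payload (JSON text) to an exit code.

    Assumed convention: 2 means "block this tool call", 0 means "allow".
    """
    payload = json.loads(payload_text)
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if should_block(command) else 0

# A real hook script would end with something like:
#   sys.exit(exit_code_for(sys.stdin.read()))
```

The point of putting the rule in a hook rather than in CLAUDE.md is that the check runs outside the model, so it does not depend on the model choosing to follow instructions.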

By martinald, 19 hours ago

It compacted at least twice but continued with no real issues.

Anyway, please try it if you find it unbelievable. I didn't expect it to work FWIW like it did. Opus 4.5 is pretty amazing at long running tasks like this.

By moregrist, 19 hours ago

I think the skepticism here is that without tests or a _lot_ of manual QA how would you know that it did it correctly?

Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that.

Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.

By skybrian, 15 hours ago

If you’re porting some formulas from one language to another, “correct” can be defined as “gets the same answers as before.” Assuming you can run both easily, this is easy to write a property test for.

Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.
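A minimal sketch of that kind of "same answers as before" check, using only the Python standard library. The two model functions are hypothetical stand-ins: in a real port the reference values would come from the recalculated spreadsheet, and getting those values out of Excel is outside this sketch.

```python
import math
import random

def reference_model(principal: float, rate: float, years: int) -> float:
    """Stand-in for the original spreadsheet formula (hypothetical)."""
    return principal * (1.0 + rate) ** years

def ported_model(principal: float, rate: float, years: int) -> float:
    """Stand-in for the Python port (hypothetical)."""
    value = principal
    for _ in range(years):
        value *= 1.0 + rate
    return value

def find_mismatches(reference, candidate, trials=1000, seed=0, rel_tol=1e-9):
    """Feed both models the same random inputs and collect disagreements."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(trials):
        args = (rng.uniform(0.0, 1e6), rng.uniform(-0.5, 0.5), rng.randint(0, 40))
        expected = reference(*args)
        actual = candidate(*args)
        # Compare with a tolerance: two floating-point implementations of the
        # same formula rarely agree to the last bit.
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=1e-9):
            mismatches.append((args, expected, actual))
    return mismatches
```

An empty list from `find_mismatches(reference_model, ported_model)` only says the two agree on the sampled inputs. It says nothing about inputs the generator never produces, which is where the edge-case worries raised elsewhere in this thread come in.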

By gregoryl, 13 hours ago

For starters, Python uses IEEE 754, and Excel uses IEEE 754 (with caveats). I wonder if that's being emulated.
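One commonly cited caveat is that Excel works with about 15 significant decimal digits, while Python exposes the full double. A small sketch of how a comparison might account for that by rounding both sides to 15 significant digits first. Whether this reproduces Excel's behaviour in every case is an assumption, and `round_sig` is a hypothetical helper name.

```python
def round_sig(value: float, digits: int = 15) -> float:
    """Round a float to the given number of significant decimal digits."""
    return float(f"{value:.{digits}g}")

raw_sum = 0.1 + 0.2            # not exactly 0.3 as an IEEE 754 double
rounded_sum = round_sig(raw_sum)

print(raw_sum == 0.3)          # False
print(rounded_sum == 0.3)      # True
```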

By stavros, 19 hours ago

I generally agree with you, but I tried to get it to modernize a fairly old SaaS codebase, and it couldn't. It had all the code right there, all it had to do was change a few lines, upgrade a few libraries, etc, but it kept getting lots of things wrong. The HTML was wrong, the CSS was completely missing, basic views wouldn't work, things like that.

I have no idea why it had so much trouble with this generally easy task. Bizarre.

By rk06, 16 hours ago

Where exactly have you seen Excel formulas that have tests?

Early in my career, I went knee-deep into Excel macros and worked on C# automation that would create an Excel sheet, run Excel macros on it, and then save it without the macros.

In the entire process, I saw dozens of date-time mistakes in VBA code, but no tests that would catch them...

By martinald, 19 hours ago

That's exactly what it did (author here).

By majormajor, 19 hours ago

I'm having trouble reconciling "30 sheet mind numbingly complicated Excel financial model" and "Two or three prompts got it there, using plan mode to figure out the structure of the Excel sheet, then prompting to implement it. It even added unit tests to the Python model itself, which I was impressed with!"

"1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus initial plans on existing codebases give me or a less-than-expected level of Excel craziness.

And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or surprising?

By martinald, 19 hours ago

No, it didn't make a giant plan of every detail. It made a plan of the core concepts and then when it was in implementation mode it kept checking the Excel file to get more info. It took around 30 mins in implementation mode to build it.

I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.

By majormajor, 19 hours ago

Ah, I see.

Did it build a test suite for the Excel side? A fuzzer or such?

It's the cross-concern interactions that still get me.

80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).

By datsci_est_2015, 18 hours ago

And also - who understands the system now? Does anyone know Python at this shop? Is it someone’s implicit duty to now learn Python, or is the LLM now the de facto interface for modifying the system?

When shit hits the fan and execs need answers yesterday, will they jump to using the LLM to probabilistically make modifications to the system, or will they admit it was a mistake and pull Excel back up to deterministically make modifications the way they know how?

By chrisjj, 11 hours ago

Different inputs? With no understanding of the program, how is Claude going to determine what input set is sufficient?

Tell me if I am wrong, but surely Claude cannot even access execution coverage.

By catlifeonmars, 16 hours ago

You touched on Kolmogorov complexity there :)

By chrisjj, 11 hours ago

To be fair, he said almost one-shot.

It's like a CPU that's almost 100% reliable... in that it fails only once every 1 million clock cycles.

By Yossarrian22, 8 hours ago

For a 4 GHz CPU that’s 4 failures a second..

By fcantournet, 7 hours ago

4 thousand :)
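The arithmetic behind the correction, as a quick check. The figures are the ones used in this exchange: a 4 GHz clock and one failure per million cycles.

```python
clock_hz = 4_000_000_000          # 4 GHz = 4 billion cycles per second
cycles_per_failure = 1_000_000    # "fails only once every 1 million clock cycles"

failures_per_second = clock_hz // cycles_per_failure
per_cycle_reliability = 1 - 1 / cycles_per_failure

print(failures_per_second)             # 4000
print(f"{per_cycle_reliability:.4%}")  # 99.9999%
```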

By chrisjj, 7 hours ago

My point entirely.

By djeastm, 9 hours ago

I doubt that 30-sheet Excel model itself was validated by the most rigorous of standards. Unless it had its own test suite before the one-shot (which could be leveraged by the AI), people have probably been taking its outputs for granted for years.

By Spivak, 19 hours ago

Doesn't it help you sleep at night that your 401k might be managed by analysts #yoloing their financial modeling tools with an LLM?

By DaedalusII, 19 hours ago

having worked in large financial institutions, this would be a step improvement

the largest independent derivatives broker in australia collapsed after it was discovered the board were using astrology and magicians to gamble with all the clients' money

https://www.abc.net.au/news/2016-09-16/stockbroker-used-psyc...

By Nevermark, 15 hours ago

Well that would do it. Astrology and magic stop working once they are scrutinized. That is their only weakness.

By leptons, 14 hours ago

It sounds like a step sideways, not a step up. LLMs are akin to a Ouija board.

By the__alchemist, an hour ago

I've made a similar observation. I'm clearly in camp #2: No agents, and use ChatGPT or Gemini to ask specific questions, and feed it only the context I want it to have.

I have a parallel observation: Many people use code editors that have weak introspection and refactoring ability compared to IDEs like JetBrains'. This includes VSCode, Zed, Emacs etc. I have a suspicion there is a big overlap between this and Group 1. It is wild to me that people are generating AI code while skipping in-IDE error checking, deterministic autocomplete, and refactoring.

By simmerup, 20 hours ago

Terrifying that people are creating financial models with AI when they don’t have the skills to verify the model does what they expect

By nebula8804, 19 hours ago

All we need is one major crash caused by AI to scare the capital owners. Then maybe us white collar workers can breathe a bit for at least another few more years (maybe a decade+).

By onion2k, 16 hours ago

> All we need is one major crash caused by AI to scare the capital owners.

All the previous human-driven crashes didn't change anything about capital owners' approach to money, so why would an AI-driven crash change things?

By ktzar, 13 hours ago

because we have an alternative that we humans can fix. The problem with AI is that it creates without leaving a trace of understanding.

By leptons, 14 hours ago

The scapegoating is different. Using an LLM makes them more culpable for the failure, because they should have known better than to use a tech that is well known to systematically lie.

By pizzafeelsright, 5 hours ago

Most of the large outages at Meta in the past 10 years were related to early AI automation.

By danielbln, 17 hours ago

A decade+ is wishful copium.

By martinald, 20 hours ago

They have an excel sheet next to it - they can test it against that. Plus they can ask questions if something seems off and have it explain the code.

By AlotOfReading, 19 hours ago

I'm not sure being able to verify that it's vaguely correct really solves the issue. Consider how many edge cases inhabit a "30 sheet, mind-numbingly complicated" Excel document. Verifying equivalence sounds nontrivial, to put it mildly.

By Draiken, 9 hours ago

They don't care. This is clearly someone looking to score points and impress with the AI magic trick.

The best part is that they can say the AI will get some stuff wrong, they knew that, and it's not their fault when it breaks. Or more likely, it'll break in subtle ways, nobody will ever notice and the consequences won't be traced back to this. YOLO!

By Dylan16807, 16 hours ago

Consider how many edge cases it misses. Equivalence probably shouldn't be the top priority here.

By Nevermark, 15 hours ago

Equivalence here would definitely be the worst test, except for all the alternatives.

By lmm, 19 hours ago

> They have an excel sheet next to it - they can test it against that.

It used to be that we'd fix the copy-paste bugs in the excel sheet when we converted it to a proper model, good to know that we'll now preserve them forever.

By karlgkk, 20 hours ago

[flagged]

By yomismoaqui, 20 hours ago

You would be surprised at the volume of money made by businesses supported by Excel.

By martinald, 20 hours ago

Yes. I suspect there are thousands of Excel files that "process" >$1bn/yr out there.

By irishcoffee, 16 hours ago

Allow me to introduce to you: ACH. It is truly fascinating.

By chrisjj, 11 hours ago

That will get less terrifying when they move on to medical.

By riskable, 5 hours ago

I work for a huge bank: The actual terrifying thing is that these financial models semi-technical management are creating in Excel are actually relied upon in the first place.

If you convert bullshit from Excel to Python it's still bullshit. There's a reason why Claude can one-shot it and no one questions the result :D

By myfakebadcode, 19 hours ago

I’m trying to learn rust coming from python (for fun). I use various LLM for python and see it stumble.

It is a beautiful experience to realize wtf you don’t know and how far over their skis so many will get trusting AI. The idea of deploying a rust project at my level of ability with an AI at the helm is terrifying.

By pizzafeelsright, 5 hours ago

Someone recently had AI create a trading bot and it returns 131% on every transaction over a 30 day period - do you really think they care about code quality or ability to verify the math?

By simmerup, 2 hours ago

And I have a bridge to sell you in London.

By taneq, 18 hours ago

If they have the skills to verify the Excel model then they can apply the same approach to the numbers produced by the AI-generated model, even if they can’t inspect it directly.

In my experience a lot of Excel models aren’t really tested, just checked a bit and then deemed correct.

By mkoubaa, 19 hours ago

It's not terrifying at all, some shops will fail and some will succeed and in the aggregate it'll be no different for the rest of us

By derrida, 19 hours ago

Business as usual.

By wrs, 19 hours ago

Some minor editing to how this would have been written in the mid-1980s:

“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 123] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”

The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.

By SubiculumCode, 15 hours ago

The power is in the tails

By veunes, 11 hours ago

This is the birth of Shadow AI, and it’s going to be bigger than Shadow IT ever was in the 2000s

Back then, employees were secretly installing Excel macros and Dropbox just to get work done faster. Now they’re quietly running Claude Code in the terminal because the official Copilot can’t even format a CSV properly.

CISOs are terrified right now and that’s understandable. Non-technical people with root access and agents that write code are a security nightmare. But trying to ban this outright will only push your most effective employees to places where they’re allowed to "fly"

By AdamN, 8 hours ago

"they’re quietly running Claude Code" ... with their tokens or even worse full on usernames and passwords that have write/execute privileges.

By mrbluecoat, 7 hours ago

> What I've come to realise is that the power of having a bash sandbox with a programming language and API access to systems, combined with an agentic harness, results in outrageously good results for non technical users.

I would argue if they're using all that tooling, they _are_ technical users.

By smuhakg, 19 hours ago

> On one hand, you have Microsoft's (awful) Copilot integration for Excel (in fairness, the Gemini integration in Google Sheets is also bad). So you can imagine financial directors trying to use it and it making a complete mess of the most simple tasks and never touching it again.

Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.

Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.

This is karmic justice imo.

By QuantumGood, 19 hours ago

Tim Berners-Lee thought pages would become machine-readable long ago, with "obvious" benefits, and that idea partly drove XML, RDF and HTML 5. Now the benefits of doing so seem even bigger (but are they?), and the time spent making existing documents AI-readable seems to keep growing.

By martinald, 19 hours ago

Totally agree, though ironically Claude code works way better with Excel than I expected.

I even tried telling Copilot to convert each sheet to a CSV on one attempt THEN do calculations. It just ignored it and failed miserably, ironically outputting me a list of files that it should have made, along with the broken python script. I found this very amusing.

By mattmanser, 13 hours ago

Think why an LLM might really struggle with csvs. You're asking a chat bot to make a weird sentence with tons of commas in it.

I've read that they're supposed to be great with XML as it's so structured, better than JSON, but haven't actually found that to be the case.

By irishcoffee, 16 hours ago

> Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.

I had interns use C++ to unzip, parse, and repackage to JSON a standardized Visio doc. I had no say in the standard, but specific blocks meant specific things, etc. The project was successful. The XML was parseable... at least for our needs. The overall project died a swift death and this tidbit will probably be forgotten forever in the depths of repo hierarchy.
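For a sense of what "unzip and parse" involves: modern Office and Visio files are zip packages of XML parts, so the first step can be done with a standard library. A sketch in Python rather than the C++ used in that project. The part names in the demo package are made up for illustration and a real document contains many more parts.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

def list_xml_parts(package_bytes: bytes) -> dict[str, str]:
    """Map each .xml part in a zip package to the tag of its root element."""
    roots = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if name.endswith(".xml"):
                roots[name] = ET.fromstring(package.read(name)).tag
    return roots

def make_demo_package() -> bytes:
    """Build a tiny in-memory zip with hypothetical part names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("visio/pages/page1.xml",
                         "<PageContents><Shape ID='1'/></PageContents>")
        package.writestr("docProps/app.xml", "<Properties/>")
    return buffer.getvalue()

print(list_xml_parts(make_demo_package()))
```

Getting at the XML is the easy part. Knowing what each element means, as with the "specific blocks meant specific things" convention described above, is where the real work sits.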

By fragmede, 14 hours ago

what would you have used?

By s-lambert, 20 hours ago

I don't see a divergence, from what I can tell a lot of people have only just started using agents in the past 3-4 months when they got good enough that it was hard to say otherwise. Then there's stuff like MCP, which never seemed good and was entirely driven by people who talked more about it than used it. There also used to be stuff like langchain or vector databases that nobody talks about anymore, maybe they're still used but they're not trendy anymore.

It seems way too soon to really narrow down any kind of trends after a few months. Most people aren't breathlessly following the next twitter trend, give it at least a year. Nobody is really going to be left behind if they pick up agents now instead of 3 months ago.

By NitpickLawyer, 14 hours ago

While I agree that the MCP craze was a bit off-putting, I think that came mostly from people thinking they can sell stuff in that space. If you view it as a protocol and not much else, things change.

I've seen great improvements with just two MCP servers: context7 and playwright. The first is great on planning sessions and leads to better usage of new-ish libraries, and the second is giving the model a feedback loop. The advantage is that they work with pretty much any coding agent harness you use. So whatever worked with cursor will work with cc or opencode or whatever else.

By Gigachad, 17 hours ago

The only people I see talking about MCP are managers who don't do anything but read linked in posts and haven't touched a text editor in years if ever.

By neom, 19 hours ago

Not sure how much falling behind there is even going to be. I'm an old school linux type with D- programming skills, yet getting going building things has been ridiculously easy. The swarms thing makes it so fast. I've churned 2 small but tested apps out in 2 weekends just chatting with claude code, the only thing I had to do was configure the servers.

By _1tan, 15 hours ago

What's used instead of MCP in reality? Just REST or other existing API things?

By senordevnyc, 7 hours ago

Yeah, I suspect the people railing against MCP don’t actually use agents much at all. MCP is super useful for giving your agent access to tools. The main alternative is CLI tools, if they exist, but they don’t always, or it’s just more awkward than a well-designed MCP. I let my agent use the GitHub CLI, but I also have MCPs for remote database access and bugsnag access, so they can debug issues on prod more easily.

By jsattler, 14 hours ago

Some years ago, I was at a conference and attended a very interesting talk. I don't remember the title of the talk, but what stuck with me was: "It's no longer the big beating the small, but the fast beating the slow". This talk was before all the AI hype. Working at a big company myself, I think this has never been more true. I think the question is, how to stay fast.

By josters, 14 hours ago

And, to add to that, how to know when to slow down. Also, having worked at a big company myself, I think the question shifts towards "how to get fast" without compromising security, compliance etc.

By chrisjj, 8 hours ago

> the fast beating the slow

This includes to the bottom of a cliff, note.

By swyx, 13 hours ago

this is generic startup advice (doesn't mean it's not true). you level up a bit when you find instances where slow beat fast (see: Teams vs Slack)

By crystal_revenge, 13 hours ago

One of the most reliable BS detectors I've found is when you have to try to convince other people of your edge.

If you have found a model that accurately predicts the stock market, you don't write a blog post about how brilliant you are, you keep it quiet and hope no one finds out while you rake in profits.

I still can't figure out quite what motivates these "AI evangelist" types (unlike crypto evangelists who clearly create value for themselves when they create credibility), but if you really have a dramatically better way to solve problems, you don't need to waste your breath trying to convince people. The validity of your method will be obvious over time.

I was just interviewing with a company building a foundation model for supposedly world changing coding assistants... but they still can't ship their product and find enough devs willing to relocate to SF. You would think if you actually had a game changing coding assistant, your number one advantage would be that you don't need to spend anything on devs and can ship 10x as fast as your competition.

> First, you have the "power users", who are all in on adopting new AI technology - Claude Code, MCPs, skills, etc. Surprisingly, these people are often not very technical.

It's not surprising to me at all that these people aren't very technical. For technical people code has never been the bottleneck. AI does reduce my time writing code but as a senior dev, writing code is a very small part of the problems I'm solving.

I've never had to argue with anyone that using a calculator is a superior method of solving simple computational math problems than doing it by hand, or that using a stand mixer is more efficient than using a wooden spoon. If there was a competing bakery arguing that the wooden spoon was better, I wouldn't waste my time arguing about the stand mixer, I would just sell more pastry than them and worry about counting my money.

By Mikhail_K, 10 hours ago

> I still can't figure out quite what motivates these "AI evangelist" types

I'd hazard a guess and say "money"

By daliusd, 11 hours ago

I guess I am kind of an "AI evangelist" in my circles (team, ecosystem, etc.). I personally see benefits in "AI" both for side projects and main work. However, according to my last measurements the improvement is not dramatic: it is large (about 30%), but not dramatic. I share my insights purely to have less on my shoulders (if my team members can do more, there is less for me to do).

By swordsith, 11 hours ago

Agreed. I think, even though the term is stupid, calling them cognitive improvement tools makes sense. The models will get better but most people will never learn how to effectively prompt or plan with an agentic model.

By riskable, 5 hours ago

> devs willing to relocate to SF

It baffled me 10 years ago why a company would be willing to pay SF salaries for people who can work from anywhere and it still holds true to this day.

Unless your engineer needs to literally be next to the hardware AND "the hardware" isn't something that can be shipped to/run at their home, why TF would you want to pay Silicon Valley salaries for engineers?

I know a guy that does electrical engineering work that works from home. He makes medical devices! When he orders PCBs they get shipped to his house. He works on a team that has other people doing the same thing (the PCB testing person also gets the boards at home; but that guy's a consultant). For like $1000 (one time) you can setup a "home lab" for doing (plenty sufficient) electronics work. Why would you want to pay ~$100,000/year premium to hire someone local for the same thing?

By darepublic, 5 hours ago

On mobile Firefox the subscribe modal extends past the width of the viewport. I assume the close button is hanging out there outside of view. For all the peddlers of the astonishing power of agents: why is your software subpar? This sounds like snark but I'm actually serious. From anyone crowing about the productivity gains, I expect to see fast, high-quality software.

By nnevatie, 14 hours ago

I'd be very interested in seeing some statistics on what could be considered confidential material pasted on ChatGPT's chat interface.

I think the results would be pretty shocking and I think mostly because the integrations to source services are abject messes.

By Antibabelic, 14 hours ago

https://www.theregister.com/2025/10/07/gen_ai_shadow_it_secr...

"With 45 percent of enterprise employees now using generative AI tools, 77 percent of these AI users have been copying and pasting data into their chatbot queries, the LayerX study says. A bit more than a fifth (22 percent) of these copy and paste operations include PII/PCI."

By chrisjj, 11 hours ago

No worse than MS Office on web, then?

By ed_mercer, 20 hours ago

> Microsoft itself is rolling out Claude Code to internal teams

Seems like Nadella is having his Baller moment

By sebastiennight, 6 hours ago

I think you meant *Ballmer, but the typo is hilarious and works just as well

By NookDavoos, 11 hours ago

Even Copilot in Excel is actually "Claude Code for Excel" in disguise.

By running101, 19 hours ago

Code red moment

By fdsf2, 19 hours ago

Nothing but ego frankly. Apple had no problem settling for a small market share back in the day... look where they are now. It didn't come from make-believe and fantasy scenarios of the future based on an unpredictable technology.

By leptons, 14 hours ago

>look where they are now.

Still with a small market share. They only figured out how to extort the maximum amount of money from a smaller user base, and app developers, really anyone they can.

By notepad0x90, 7 hours ago

I suspect there are many more than two kinds. There is a varying degree of understanding of what these tools are capable of, and that, multiplied by what people need and how much they care about outcomes and consequences, gives the number of "kinds" of AI users.

Let's take the group of developers (to keep it simple) that have a deep understanding of LLMs and how they work. Even then, some don't care if it generates entire codebases for them, some know there will be bugs in it, they just don't care. Some care, but they know their job is to make their project managers happy. Others don't have apathy or pressure like that, but they'll still use it in the same way, because for one reason or the other it saves them time. I'm probably missing more examples, but it is the same usage, but different motivations, people, and environments.

By waffletower, 4 hours ago

When confined to a Github Copilot Business license, you do have options, as the license does provide access to frontier Anthropic and Google models in addition to its other offerings. Unfortunately it will not work directly with Claude Code without proxy server hacks, but Opencode is an option, and I am interested in learning about others (save for Aider, which I have tried and discarded).

By wjholden, 12 hours ago

I remember a colleague jumping through hoops trying to get Python installed on an enterprise computer. We never did get to a yes and resorted to using PowerShell instead. The policy constraints at enterprises that this author describes are very real and very harmful.

Perhaps the wildest thing to me is how you'll have senior leaders in a company talking about innovation, but their middle managers actively undermine change out of fear of liability. So many enterprise IT employees are really just trying to avoid punishment that their organization cannot try new things without substantial top-down efforts to accept risk.

By chrisjj, 11 hours ago

> The policy constraints at enterprises that this author describes are very real and very harmful.

This is like saying prison bars are harmful. It depends which side you are on.

By with, 19 hours ago

> The bifurcation is real and seems to be, if anything, speeding up dramatically. I don't think there's ever been a time in history where a tiny team can outcompete a company one thousand times its size so easily.

Slightly overstated. Tiny teams aren't outcompeting because of AI, they're outcompeting because they aren't bogged down by decades of technical debt and bureaucracy. At Amazon, it will take you months of design, approvals, and implementation to ship a small feature. A one-man startup can just ship it. There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

By mhink, 15 hours ago

> how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

Ultimately, it's the same way you ship human-generated code at scale without causing catastrophic failure: by only investing trust in critical systems to people who are trustworthy and have skin in the game.

There are two possibilities right now: either AI continues to get better, to the point where AI tools become so capable that completely non-technical stakeholders can trust them with truly business-critical decision making, or the industry develops a full understanding of their capabilities and is able to dial in a correct amount of responsibility to engineers (accounting for whatever additional capability AI can provide). Personally, I think (hope?) we're going to land in the latter situation, where individual engineers can comfortably ship and maintain about as much as an entire team could in years past.

As you said, part of the difficulty is years of technical debt and bureaucracy. At larger companies, there is a *lot* of knowledge about how and why things work that doesn't get explicitly encoded anywhere. There could be a service processing batch jobs against a database whose URL is only accessible via service discovery, and the service's runtime config lives in a database somewhere, and the only person who knows about it left the company five years ago, and their former manager knows about it but transferred to a different team in the meantime, but if it falls over, it's going to cause a high-severity issue affecting seven teams, and the new manager barely knows it exists. This is a contrived example, but it goes to what you're saying: just being able to write code faster doesn't solve these kinds of problems.

By PunchyHamster, 12 hours ago

> There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.

It's very simple. You treat AI as junior and review its code.

But that awesomely complex method has one disadvantage: having to do so means you can't brag about the 300% performance improvement your team got from just committing AI code to the master branch without looking.

By Gigachad, 17 hours ago

I swear in a month at a startup I used to build what takes a year at my current large corp job. AI agents don't seem to have sped up the corporate process at all.

By NitpickLawyer, 14 hours ago

> AI agents don't seem to have sped up the corporate process at all.

I think there's a parallel here between people finding great success with coding agents vs. people swearing it's shit. But when prodded it turns out that some are working on good code bases while others work on shit code bases. It's probably the same with large corpos. Depending on the culture, you might get such convoluted processes and so much "assumed" internal knowledge that agents simply won't work ootb.

By camgunz, 13 hours ago

I think this article is generally insightful, but I don't think the author really knows if they one shotted the excel to python transformation or not. Maybe they elided an extensive testing phase, but otherwise big bugs could be lurking.

Maybe it's not a big deal, or maybe it's a compliance model with severe financial penalties for non-compliance. I just personally don't like these tradeoffs being left implicit.

By BolsunBacset, 5 hours ago

The author also said "almost" one shotted. Almost can mean a lot of things. I'm almost rich and almost handsome. But just not there...

By rob, 9 hours ago

The number of people I've run into with a non-technical background that think ChatGPT is the definitive end-all for AI is very high. Most just don't know anything else even exists.

I do wonder how long they'll be able to use this to their advantage before something "else" comes along. Like how IE had the largest market share before Chrome and other alternatives started catching up.

Then again, some markets like YouTube still haven't had any real serious alternatives. Maybe ChatGPT will always be number one in the consumer eyes.

By fauigerzigerk, 11 hours ago

>What I've come to realise is that the power of having a bash sandbox with a programming language and API access to systems, combined with an agentic harness, results in outrageously good results for non technical users. It can effectively replace nearly every standard productivity app out there - both classic Microsoft Office style ones - and also web apps.

I very much doubt that tinkering with a non-repeatable, probabilistic process is how most non-technical users will routinely use software.

I can imagine power users taking this approach to _create_ or extend productivity tools for themselves and others, just like they have been doing with Excel for decades. It will not _replace_ productivity tools for most non-technical users.

By fny, 6 hours ago

Software engineers don't understand how user hostile all these AI gizmos are. Terminals are scary. AI running local code is scary. Random Github software is scary. And in my experience, normies are far more security paranoid than developers when it comes to AI.

By riskable, 5 hours ago

Normies have a much more realistic take on AI than technical people or semi-technical "power users":

    * They LOVE image-generating AI and AI that messes with their own photos/videos.
    * They will ask ChatGPT, Gemini, etc and just believe the result.
    * They will ask Copilot to help them make a formula in Excel and be happy to be done.

The common theme here is they don't care. To them, AI is just a neat thing. It's not a huge difference in their lives. They don't think about the environmental impact much unless someone tells them it's bad, via a high-quality video stream that itself was vastly worse for the environment than any AI conversation or image generation ever could be.

They will play a game 100% made by AI because their friend said it was fun. They don't care that some AAA publisher lost a sale on their "human made for sure, just trust us :nod:" identical game because the bored person was able to pull off something good enough with little effort (and better design decisions).

They also don't care if some article or book or whatever was written partially or entirely by AI as long as it's good. The AI part just isn't important to them. Not even a little bit!

By hxugufjfjf, an hour ago

It’s kind of funny how it’s the exact same discussion as we used to have about privacy at the advent of social media. "I’m not worried, I got nothing to hide!" The convenience benefits of Facebook (in the beginning, likely less nowadays) massively outweighed the privacy concerns of the layman or woman.

By iqandjoke, 7 hours ago

It is like saying Apple is using Claude Code internally while selling you to use Apple Intelligence https://x.com/tbpn/status/2016911797656367199

By maffyoo, 10 hours ago

This seems reasonable, but isn't the conclusion a statement of what we already know? These tools are really powerful but able to cause significant pain, and organisations need to adapt to make the best use of them, which is fraught with security problems. AI looks a lot like a technology problem but ultimately, to most small businesses, it's a procurement and change management problem.

Also (I appreciate the author's message here but..)

"Excel on the finance side is remarkably limiting when you start getting used to the power of a full programming ecosystem like Python"

With the addition of lambdas, Excel formulae are Turing complete. No more need for VBA in a (mostly) functional environment.

Also on this, Claude for Excel needs a lot of work (as does any tool working with financial models). If you have ever used them in anger, I don't think you'll be relying on them with your non-technical finance manager for a while...

By datsci_est_2015, 17 hours ago

Thought this was going to be more about programmers, but it was actually about non technical users and Microsoft’s product development failure.

One tidbit I’d disagree with is that only those using the bleeding edge AI tools are reaping the benefits. There seem to be a lot of highly specialized tools and a lot of specific configurations (and mystical incantations) to get them to work, and those are constantly changing and being updated. The bleeding edge is a dangerous place to be if you value your time (and sanity).

Personally, as someone working on moderate-to-highly complex software (live inference of industrial IoT data), I can’t really open a merge / pull request for my colleagues to review unless I 100% understand what I’ve pushed, and can explain to them as well.

My killer app for AI would just be a CLI that gets me to a commit based on moderately technical input:

“Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

But, all the hype of the bleeding edge is around abstracting away the entire coding process until you don’t even understand what code is being generated? Hard to see it as anything but a pipe dream. AI is useful, but it’s not a panacea - you can’t fire it and replace it when it fucks up.

By georgeburdell, 17 hours ago

> “Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

Granted I'm way behind the curve, but is this not how actual engineers (and not influencers) are using it? I heavily micro-manage the implementation because my manager still expects me to know the code

By datsci_est_2015, 9 hours ago

You could hardly believe that this is how actual engineers are using it if you only browsed HN. Maybe HN is only influencers?

By copilot_king_2, 8 hours ago

Maybe Anthropic has a social media team and internet forums like HN are easily manipulable?

By Leynos, 11 hours ago

> “Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”

That's the type of input I give to Claude / Codex. Works for me.

By datsci_est_2015, 9 hours ago

Yes, but then you’re not at the bleeding edge of agentic coding as described in this article. The bleeding edge is “Hey agent, turn this 30 sheet excel workbook into a Python script” and all of the AGENTS.md required to make that happen.

By chrisjj, 10 hours ago

> all the hype of the bleeding edge is around abstracting away the entire coding process until you don’t even understand what code is being generated?

The less you understood about code to start with, the quicker you achieve this goal... and the less prepared you are for the consequences.

By srinath693, 10 hours ago

The real divide isn't technical vs. non-technical: it's people with new problems vs. people maintaining old solutions. AI is incredible at generating first drafts of anything. It's terrible at understanding why the existing thing is the way it is.

By tiangewu, 17 hours ago

Microsoft's failure around Copilot in Excel gave my partner a very poor impression of AI's ability to help with financial tasks.

It took a lot of convincing, but I finally got her to start using ChatGPT to help her write SQL and walk her through setting up some SaaS accounting software formulas.

It worked so well that now she's trying to find more applications at work. Claude Code is too scary for her though. That will need to be in some Web UI before she feels comfortable giving it a try.

By NookDavoos, 11 hours ago

Install the "Claude for Excel" add-on directly in Excel itself. Works well.

By chrisjj, 11 hours ago

> Thirdly, this all needs to be wrapped up in some sort of secure mechanism

Putting that first would have saved the bother of putting the second and third.

By chrisjj, 11 hours ago

> It can effectively replace nearly every standard productivity app out there

May we see the "agentic" replacement for Word, please?

By riazrizvi, 6 hours ago

I think the main divisions are 1) people who are using it in fields they have little knowledge of to get basic competence, 2) the same people using it for advanced competence who are kidding themselves, 3) experts who are battling it in their own fields to finally get better answers than they could without it.

The first group are like Improved-Generalists. The third are Improved-Specialists. The second are delusional hype jockeys that drive the dumb talking points that extrapolate up the whazoo what AI is going to do and whatnot.

By anonymousDan, 11 hours ago

'some sort of security' - oh great, security as an afterthought.

By deafpolygon, 12 hours ago

There’s also an emerging group of users (such as myself) who essentially use it primarily as an “on-demand” teacher and not as a productivity tool.

I am learning software development without having it generate code for me—preferring to have it explain each thing line-by-line. But… it’s not only for learning development, but I can query it for historical information and have it point me to the source of the information (so I can read the primary sources as much as possible).

It allows me to customize the things I want to learn at my own pace, while also allowing me to diverge for a moment from the learning material. I have found it invaluable… and so far, Gemini has been pretty good at this (probably owing to the integration of Google search into Gemini).

It lets me cut through the SEO crap that has plagued search engines in recent years.

By hereme888, 14 hours ago

I'm still trying to wrap my head around the past decade: useful AI, self operating vehicles, real AI robots, immersive VR, catching reusable rockets with chopsticks, and of course the flying cars.

What will be the expected work output for the average future worker?

By doom2, 18 hours ago

I guess this is as good a thread as any to ask what the current meta is for agentic programming (in my case, as applied to data engineering). There are all these posts that make it to the front page talking about productivity gains but very few of them actually detail the setup that's working for the author, just which model is best.

I guess it's like asking for people's vim configs, but hey, there are at least a few popular posts mainly around git/vim/terminal configs.

By swordsith, 11 hours ago

In my opinion no frontier model is the best at everything, especially if you have to catch it up on pre-existing information about your project or an esoteric scripting language. That being said, with Cursor you can try out all of the popular available models and get a feel for which do better with which tasks. In my experience, Codex is an okay model, but use light thinking if you value your time. Gemini 3 Flash is where it's been at for me recently; if I need to do big changes I go to that. And Cursor's own model, Composer, is good for making plans, doing refactors, or making rules. Cursor gives you tools to make prompting feel like less of a repetition game, so you worry more about the task at hand, and it's been super efficient for me. I don't use proper version control, so the fact that it saves a history of every file's diffs, and that you can easily jump back in chats and revert the code base, is the game changer.

By energy123, 17 hours ago

I push most work into the chat interface (attach the full codebase as a single file, paste in specs, describe what I want), then copy the tasklist from chat into Codex. This is to reduce Codex token usage to avoid breaching weekly limits. I'd use a more agent-heavy process if I didn't care about cost.
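For anyone wondering what "attach full codebase as a single file" might look like in practice, here is a minimal sketch, not the commenter's actual script. The function name `bundle_codebase`, the `=====` header format, the default `.py` filter, and the output name `codebase.txt` are all assumptions made up for illustration.

```python
from pathlib import Path


def bundle_codebase(root, suffixes=(".py",), out_name="codebase.txt"):
    """Concatenate matching source files under `root` into one text file.

    Hypothetical helper: the header format and defaults are illustrative only.
    """
    root = Path(root)
    out_path = root / out_name
    parts = []
    # Sort so the bundle is stable between runs.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes and path != out_path:
            rel = path.relative_to(root).as_posix()
            body = path.read_text(encoding="utf-8")
            # A header per file lets the model (and you) see where each file starts.
            parts.append(f"===== {rel} =====\n{body}\n")
    out_path.write_text("\n".join(parts), encoding="utf-8")
    return out_path
```

Calling `bundle_codebase(".")` would write `codebase.txt` in the current directory. A real setup would probably also want to skip virtualenvs, build output, and anything secret before pasting the result into a chat window.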

By fragmede, 14 hours ago

There's more stuff in mine, but at the top of my ~/.claude/CLAUDE.md file, I have:

    ## Important Instructions
    
    - update todo.md as items are completed
    
    **Commit to git after making code changes.** Check `git status` first - only commit if there are actual changes:
    ```bash
    # If not in a git repository, initialize it first:
    git init
    
    # Then commit changes:
    git add <FILES_UPDATED>
    # Be surgical - add only the changes you just made.
    git commit -m "Description of changes"
    ```

This lets me have bite-sized git commits that I can marshal later, rather than having to wrangle git myself.

By drsalt, 19 hours ago

what is the source data? the author says they've seen "far more non-technical people than I'd expect using Claude Code in terminal" so like, 3 people? who are these people?

By athrowaway3z, 15 hours ago

> sandboxing agents is difficult

I use this amazingly niche and hipster approach of giving the agent its own account, which through inconceivably highly complex arcane tweaking and configurations can lock down what they can and can't do.
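A minimal sketch of what launching an agent under its own account could look like, assuming a hypothetical unprivileged user named `agent` already exists and that `sudo` is configured to allow switching to it. The helper name `build_agent_command` is made up for illustration; the actual lockdown comes from ordinary file permissions on that account, not from this wrapper.

```python
import shlex


def build_agent_command(command, user="agent"):
    """Wrap `command` so it runs as a separate account via `sudo -u`.

    Assumption: `user` is a hypothetical low-privilege account that can only
    read and write the directories you have explicitly granted it.
    """
    # `--` stops sudo from interpreting anything in `command` as its own option.
    return ["sudo", "-u", user, "--", *command]


def describe(command, user="agent"):
    """Return the wrapped command as a shell-quoted string for logging."""
    return shlex.join(build_agent_command(command, user))
```

The resulting list could be handed to `subprocess.run` as-is; whatever the agent then tries to touch is limited by what that account is allowed to touch.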

---

Can somebody for the love of god tell me why articles keep bringing up why this is so difficult?

By NitpickLawyer, 14 hours ago

I have antigravity in its own account and that has worked pretty well so far. I also use devcontainers for the cli agents and that has also worked out well. It's one click away in my normal dev flow (I was using this anyway before for python projects).

By chrisjj, 10 hours ago

Why? Because the purported benefit is to make everything easier and faster. Not safe.

By fragmede, 14 hours ago

It's a bunch of work, that takes a bunch of time, and I want it nowwwww-owwwww!

...is how I imagine that conversation goes.

By Havoc, 20 hours ago

The copilot button in excel at my work can’t access the excel file of the window it’s in. As in “what’s in cell A1” and it says I can’t read this file. Not even sure what the point is then frankly.

I’m happily vibe coding at work but yeah article is right. MS has enterprise market share by default not by merit. Stunning contrast between what’s possible and what’s happening in big corp

By Havoc, 7 hours ago

Follow up on this - seems it’s connected to the type of Excel file. It can't read xlsb, i.e. binary Excel files.

So it is connected… the user just needs to somehow know/intuit (?!?!) that they need to convert the workbook

By cmrdporcupine, 20 hours ago

Meanwhile the people I know who work at Microsoft say there's a constant whip-cracking to connect everything they're doing to "AI" and prove that's what they're doing.

By PunchyHamster, 12 hours ago

so whole company decided to collectively half-ass it so managers fuck off ? :D

By bwat49, 19 hours ago

yeah I actually use AI a lot, but copilot is... useless. When microsoft adds copilot to their various apps they don't seem to put any thought/effort behind it beyond sticking a copilot button somewhere.

And if the copilot button does nothing but open a chat window without any real integration with the app, what the hell is the point of that when there's already a copilot button in the windows taskbar?

By chrisjj, 10 hours ago

You have overestimated the intelligence of the target audience :)

By DavidPiper, 19 hours ago

> To really underline this, Microsoft itself is rolling out Claude Code to internal teams, despite (obviously) having access to Copilot at near zero cost, and significant ownership of OpenAI. I think this sums up quite how far behind they are

I think it sums up how thoroughly they've been disrupted, at least for coding AIs (independent of like-for-like quality concerns rightly mentioned elsewhere in this thread re: Excel/Python).

I understand ChatGPT can do like a million other things, but so can Claude. Microsoft deliberately using competitors internally is the thing that their customers should pay attention to. Time to transform "Nobody gets fired for buying Microsoft" into "Nobody gets fired for buying what Microsoft buy", for those inclined.

By FilosofumRex, 16 hours ago

Generally speaking, if you're using your coding agent as your assistant inside your IDE, you're missing out on 80% of its benefits... If anything you should ask it how to do something and then act as its assistant on implementing it

By PunchyHamster, 12 hours ago

also missing out on 80% of bugs

By viccis, 15 hours ago

>You can easily run Monte Carlo simulations

Ah yes, Monte Carlo simulations, regular part of a finance team's objectives.

By Leynos, 6 hours ago

https://en.wikipedia.org/wiki/Monte_Carlo_methods_in_finance
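For a sense of scale, a bare-bones Monte Carlo run of the kind a finance team might do fits in a few lines of standard-library Python. This is a sketch under stated assumptions, not anything from the article: monthly returns are assumed to be independent and normally distributed, and the names `simulate_final_values` and `summarize` are made up for illustration.

```python
import random
import statistics


def simulate_final_values(start, monthly_mean, monthly_std, months, trials, seed=None):
    """Simulate `trials` possible end values of an amount compounding monthly.

    Assumption: each monthly return is an independent normal draw.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        value = start
        for _ in range(months):
            value *= 1.0 + rng.gauss(monthly_mean, monthly_std)
        results.append(value)
    return results


def summarize(values):
    """Return the 5th percentile, median, and 95th percentile of the outcomes."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20)
    return {"p5": cuts[0], "median": statistics.median(values), "p95": cuts[-1]}
```

Something like `summarize(simulate_final_values(1_000_000, 0.005, 0.03, 12, 10_000, seed=42))` would give a rough range for where a balance might land after a year. The spread between `p5` and `p95` is the part that is awkward to get out of a single spreadsheet formula.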

By anal_reactor, 12 hours ago

> This effectively leads to a situation where smaller company employees are able to be so much more productive than the equivalent at an enterprise. It often used to be that people at small companies really envied the resources & teams that their larger competitors had access to - but increasingly I think the pendulum is swinging the other way.

Small companies are more agile and innovative while corporations often just shuffle papers around. Wow, what a bold claim, never seen before in the entire history of economics.

By nickphx, 16 hours ago

Three kinds, those who do not use it.

By fortran77, 16 hours ago

I know it's fun to bash Microsoft, but--while Claude is better, Microsoft's Copilot is far from "awful". I've used it productively with the VS Code integration for some esoteric projects: PIC PIO programming and Verilog.

By mike_hearn, 11 hours ago

He's talking about the Copilot in other apps for non-programmers.

By superkuh, 20 hours ago

The argument seems to be that having a corporation restrict your ability to present arbitrary text directly to the model and only being able to go through their abstract interface which will integrate your text into theirs (hopefully) is more productive than fully controlling the input text to a model. I don't think that's true generally. I think it can be true when you're talking about non-technical users like the article is.

By majormajor, 20 hours ago

The use of specialization of interfaces is apparent if you compare Photoshop with Gemini Pro/Nano Banana for targeted image editing.

I can select exactly where I want changes and have targeted element removal in Photoshop. If I submit the image and try to describe my desired changes textually, I get less easily-controllable output. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)

I think this sort of task-specific specialization will have a long future; it's hard to imagine pure text once again being the dominant information transfer method for 90% of the things we do with computers after 40 years of building specialized non-text interfaces.

By duskwuff, 19 hours ago

One reasonable niche application I've seen of image models is in real estate, as a way to produce "staged" photos of houses without shipping in a bunch of furniture for a photo shoot (and/or removing a current tenant's furniture for a clean photo). It has to be used carefully to avoid misrepresenting the property, of course, but it's a decent way of avoiding what is otherwise a fairly toilsome and wasteful process.

By majormajor, 19 hours ago

This sort of thing (not for real estate, but for "what would this furniture actually look like in this room") is definitely somewhere the open-ended interface is fantastic vs targeted-remove in Photoshop (but could also easily be integrated into a Photoshop-like tool to let me be more specific about placement and such).

I was a bit surprised by how it still resulted in gibberish text on posters in the background in an unaffected part of the image that at first glance didn't change at all. So even just the "masking" ability of like "anything outside of this range should not be touched" of a GUI would be a godsend.

By fdsf2, 20 hours ago

It baffles me that Gemini et al. don't have these standard video editing tools. Do the engineers seriously think prompting by text is the way people want videos to be generated? Nope. People want to customise. E.g. check out CapCut in the context of social media.

I've been trying to create a quick and dirty marketing promo via an LLM to visualise how a product will fit into people's world - it is incredibly painful to 'hope and pray' that by refining the prompt via text you can make slight adjustments come through.

The models are good enough if you are half-decent at prompting and have some patience. But given the amount invested, I would argue they are pretty disappointing. I've had to chunk the marketing promo into almost a frame-by-frame play to make it somewhat work.

By suprstarrd, 19 hours ago

I'm speaking as someone who doesn't like the idea of AI art, so take my words with a grain of salt, but my theory is that this input method exclusivity is intentional on their part, for exactly the reason you want the change. If you only let people making AI art communicate what they want through text or reference attachments (the latter of which they usually won't have), then they have to spend time figuring out how to put it into words. It IS painful to ask for those refinements, because any human would clearly understand them. In the end, those people get to say that they spent hours, days, or weeks refining "their prompt" to get a consistent and somewhat-okay looking image; the engineers get to train their AI to better understand the context of what someone is saying; and all the while the company gets to further legitimize a false art form.

By okokwhatever, 8 hours ago

You can see the fear all around this thread. And, tbh, it makes total sense. There is nothing we can do to stop this dropping ball; we can accept it or leave the room, but the industry has changed for all of us. I mean, you can use it one way or another, but critical thinking is our only survival tool if you're relying on an IT job these days. How long will it last? Who cares, we're fucked anyways...

By enemyz0r, 7 hours ago

Yep

By protocolture, 17 hours ago

tl;dr: If you are trying to protect your IP from AI you probably use Copilot or nothing. If you have no IP to protect you are free to mess about.

Two kinds of AI users are emerging