Sometimes I accept this, and I vibe-code, when I don't care about the result. When I do care about the result, I have to read every line myself. Since reading code is harder than writing it, this takes longer, but LLMs have made me too lazy to write code now, so that's probably the only alternative that works.
I have to say, though, the best thing I've tried is Cursor's autocomplete, which writes 3-4 lines for you. That way, I can easily verify that the code does what I want, while still reaping the benefit of not having to look up all the APIs and function signatures.
Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.
> the last few years of my life has been a repetition of frontend components and api endpoints, which to me has become too monotonous
It’s a surprise that so many people have this problem/complaint. Why don’t you use a snippet manager?! It’s lightweight, simple, fast, predictable, offline, and includes the best version of what you learned. We’ve had the technology for many many years.
You can locally run pretty decent coding models such as Qwen3 Coder on an RTX 4090 GPU through LM Studio, or Ollama with Cline.
It's a good idea even if they give slightly worse results on average, as you can avoid spending expensive tokens on trivial grunt work and save them for the really hard questions where Claude or ChatGPT 5 will excel.
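For what it's worth, a minimal sketch of calling a local Ollama server from Python (assuming you've already pulled a coding model locally; the model tag below is just whatever your install uses):

```python
import json, urllib.request

# Ask a locally pulled model via Ollama's HTTP API. The server listens on
# localhost:11434 by default; the model tag must match what `ollama pull`
# put on your machine (shown here as a Qwen coder variant).
def ask_local(prompt: str, model: str = "qwen3-coder") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local("Write a Python function that deduplicates a list while keeping order."))
```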
Devs shouldn't be blindly accepting the output of an LLM. They should always be reviewing it, and only committing the code that they're happy to be accountable for. Consequently your coding and syntax knowledge can't really atrophy like that.
Algorithms and data structures on the other hand...
I never remembered those keywords to begin with.
Checkmate!
Realistically, that's probably never going to happen. Expecting it is just like the prepper mindset.
I think that now that we have identified this problem (programmers need more abstract metaprogramming tools) and a sort of practical engineering solution (train LLMs on code), it's time for researchers (in the nascent field of metaprogramming, aka applied logic) to recognize this and develop useful theories to guide it.
In my opinion, it should lead to adoption of richer (more modal and more fuzzy) logics in metaprogramming (aside from just typed lambda calculus on which our current programming languages are based). That way, we will be able to express and handle uncertainty (e.g. have a model of what constitutes a CRUD endpoint in an application) in a controlled and consistent way.
This is similar to how programming is evolving from imperative with crude types into something more declarative with richer types. (Roughly, the types are the specification and the code is the solution.) With a good set of fuzzy type primitives, it would be possible to define a type of "CRUD endpoint", and then answer the question of whether a given program has that type.
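As a toy illustration of what such a fuzzy "has the CRUD endpoint type" judgment might look like (the method names and weights below are my own invention, not an established formalism), a check could return a degree of membership instead of a yes/no:

```python
# Toy "fuzzy type" check: score how strongly an object resembles a CRUD
# endpoint instead of returning a hard yes/no. Names and weights are
# illustrative only.
CRUD_EVIDENCE = {"create": 0.25, "read": 0.25, "update": 0.25, "delete": 0.25}

def crud_endpoint_degree(obj) -> float:
    return sum(
        weight
        for name, weight in CRUD_EVIDENCE.items()
        if callable(getattr(obj, name, None))
    )

class UserApi:
    def create(self, data): ...
    def read(self, key): ...
    def update(self, key, data): ...
    # no delete()

print(crud_endpoint_degree(UserApi()))  # 0.75: "mostly" has the CRUD type
```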
If you imagine an evolutionary function oscillating over time between no abstraction and total abstraction, the current batch of frameworks like Django and others are roughly the local maximum that was settled on. Enough to do what you need, but it doesn't do too much, so it's easy to customize to your use case.
I do think that in a few years' time, next-generation coding LLMs will read current-generation LLM-generated code to improve on it. The question is whether they're smart enough to ignore the implicit requirements in the code if they aren't necessary for the explicit ones.
(this comment makes sense in my head)
Most if not all of my professional projects have been replacing existing software. In theory, they're like-for-like, feature-for-feature rewrites. In practice, there's an MVP of must-have features which usually is only a fraction of the features (implicit or explicit) of the application it replaced, with the rewrite being used as an opportunity to re-assess what is actually needed, what is bloat over time, and of course to do a redesign and re-architecture of the application.
That is, rewriting software was an exercise in extracting explicit features from an application.
Reading bad code is harder than writing bad code. Reading good code is easier than writing good code.
At this point in my career, 35 years in, I find that whether I'm reading or writing code, and whether I or others wrote it, is irrelevant. Bad or good code, it's all the same. By far the most effective work I do involves reading a lot of complex code written by many people over many years and seeing the exact one line to change or improve.
I find LLM-assisted coding very similar, frankly. I've finished maybe 20 projects or more in the last seven months on my own time that I never would have been able to do in my lifetime, for want of free time to learn minutiae in stuff I am not familiar with. The parts it gets hung up on, I'm able to recognize and unwedge with quick inspection, just like with any junior engineer. The junior engineers also are often much better versed in XYZ library than I am.
LLM assisted coding ("vibe coding") is just project management.
You ask it to do things, then you check the work to a sufficient degree.
The better the specifications and documentation you give it, the better the result will be. Keeping tasks short and verifiable also helps a lot.
I've written SO many small tools for myself during the last year it's not even funny. Upgraded some shitty late night Python scripts to proper Go applications with unit tests and all, while catching up on my TV shows.
Converted my whole rat's nest of Docker compose files to a single Opentofu declarative setup.
None of this would've gotten done without an LLM assistant.
Just want to add one more point: code is not fiction or even nonfiction. "Good or bad" style can be subjective, but correct or incorrect is not, regardless of the reviewer's mental model.
The difficulty of reading code is understanding its logic and logical consequences. The more complex the codebase (not just the line we are reading), the riskier it is to modify.
That is why I use an LLM to write a lot of tests and let it review the logs to help me understand the logic. Even the tests can be disposable.
The way I describe it is almost gambling with your time. Every time I want to reach for the Cline extension in VSCode, I ask myself "is this gamble worth it?" and "what are my odds for this gamble?".
For some things like simple refactoring I'm usually getting great odds so I use AI, but I would say at least 5-6 times last week I've thought about it and ended up doing it by hand as the odds were not in my favor.
One thing I've picked up using AI over the past few months is this sense of what it can and can't do. For some things I'm like "yeah it can do this no problem" but for other tasks I find myself going "better do this by hand, AI will just fuck it up"
Isn't that the same as delegating a task to a jr developer, where you as the sr still have to check their work?
> Can you speculate on why this problem was only happening with these embedded buttons? Is this a flaw in Qt, or is this application trying to do something unusual?
Why this problem occurred: This is a Qt design issue, not a flaw per se. Qt's item delegate editing model assumes editors are simple widgets (line edits, spin boxes, combo boxes) that don't spawn modal dialogs. When a modal dialog opens:
1. The modal dialog steals focus from the editor button
2. Qt's QAbstractItemView sees the FocusOut event and assumes editing is complete
3. It calls QStyledItemDelegate::destroyEditor() to clean up the editor
4. The button gets deleted while the color dialog is still running
5. When the color dialog closes, it tries to interact with the deleted button
This application is doing something unusual - using a button that spawns a modal dialog as a table cell editor. Most Qt delegates use immediate editing widgets (text fields, dropdowns) that don't require modal interactions.
The eventFilter solution works by intercepting the FocusOut event that triggers editor destruction, essentially telling Qt "I'm still editing, don't destroy me yet."
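The transcript doesn't include the actual code, but the described fix would look roughly like this in PySide6 (class name and details invented here): override the delegate's eventFilter and swallow FocusOut so the view never gets its cue to tear the editor down.

```python
from PySide6.QtCore import QEvent
from PySide6.QtWidgets import QStyledItemDelegate

class ColorButtonDelegate(QStyledItemDelegate):
    """Delegate whose editor is a button that opens a modal QColorDialog."""

    def eventFilter(self, editor, event):
        # The base-class filter commits and closes the editor when it loses
        # focus; swallowing FocusOut keeps the editor alive while the modal
        # color dialog has focus.
        if event.type() == QEvent.Type.FocusOut:
            return True
        return super().eventFilter(editor, event)
```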
> Is there a danger of leaked Qt resources now that we are blocking destruction?
Yes, that's a valid concern. By blocking FocusOut events, we're preventing Qt's normal editor cleanup mechanism. However, the risk is likely minimal because:
1. Qt still cleans up eventually - The editor will be destroyed when the table view itself is destroyed or when editing truly ends (e.g., clicking elsewhere, pressing Enter/Escape)
2. Short-lived editors - These editor widgets are typically very short-lived (created when editing starts, destroyed when editing ends)
3. Parent widget cleanup - Since the button has a parent widget, Qt's parent-child ownership will clean it up when the parent is destroyed
To be completely safe, we could add a more targeted approach - only block FocusOut events while a color dialog is actively open, rather than blocking all FocusOut events. We could track when the modal dialog is shown/hidden and only filter during that window.
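A sketch of that narrower variant (again with invented names): a flag on the editor marks when the modal dialog is up, and only then is FocusOut swallowed.

```python
from PySide6.QtCore import QEvent
from PySide6.QtWidgets import QStyledItemDelegate

class ColorButtonDelegate(QStyledItemDelegate):
    def eventFilter(self, editor, event):
        # `dialog_open` is a flag the (hypothetical) button editor sets just
        # before calling QColorDialog.getColor() and clears right after it
        # returns, so normal FocusOut handling applies the rest of the time.
        if event.type() == QEvent.Type.FocusOut and getattr(editor, "dialog_open", False):
            return True
        return super().eventFilter(editor, event)
```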
But for this use case, the current solution should be fine since:
- Color picking is infrequent
- The editor lifetime is still bounded by the table view's normal lifecycle
- Qt's parent-child cleanup provides a safety net
The memory usage impact should be negligible compared to the complexity of implementing more sophisticated tracking.
-----
I've definitely had less-satisfying discussions over bugs with actual human senior software engineers than this.
I’d love to see how this compares when either the problem space is different or the language/ecosystem is different.
It was a great read regardless!
LLMs are nothing more than rubber ducking in game dev. The code they generate is often useful as a starting point or to lighten the mood because it's so bad you get a laugh. Beyond that it's broadly useless.
I put this down to the relatively small number of people who work in game dev, resulting in a relatively small number of blogs from which to "learn" game dev.
Game Dev is a conservative industry with a lot of magic sauce hidden inside companies for VERY good reasons.
Multiplying two 24-bit posits in 8-bit AVR, for instance. No models have succeeded yet, usually because they try to put more than 8 bits into a register. Algorithmically they seem to be on the right track, but they don't seem to be able to hold the idea that registers are only 8 bits through the entirety of their response.
Something along the lines of: "Can you generate 8-bit AVR assembly code to multiply two 24-bit posit numbers?"
You get some pretty funny results from the models that have no idea what a posit is. It's usually pretty easy to tell whether they know what they are supposed to be doing. I haven't had a success yet (haven't tried for a while, though). Some of them have come pretty close, but usually it's trying to squeeze more than 8 bits of data into a register that brings them down.
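To be clear about what trips them up: it isn't the posit decoding itself, it's keeping every stored value within a byte. A rough Python model of just that discipline (no posit handling, no AVR assembly, just the 24x24 -> 48-bit multiply done in 8-bit chunks with explicit carries):

```python
def mul24x24_bytewise(a_bytes, b_bytes):
    """Schoolbook multiply of two 24-bit values, each given as three
    little-endian bytes, storing every result byte and carry in 8 bits."""
    out = [0] * 6                        # 48-bit result, little-endian bytes
    for i, a in enumerate(a_bytes):
        carry = 0
        for j, b in enumerate(b_bytes):
            t = out[i + j] + a * b + carry
            out[i + j] = t & 0xFF        # each stored byte keeps only 8 bits
            carry = t >> 8               # the high byte propagates as carry
        out[i + 3] = carry               # carry out of this row's last column
    return out

a, b = 0x123456, 0x00ABCD
prod = mul24x24_bytewise([a & 0xFF, (a >> 8) & 0xFF, a >> 16],
                         [b & 0xFF, (b >> 8) & 0xFF, b >> 16])
assert sum(byte << (8 * k) for k, byte in enumerate(prod)) == a * b
```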
One thing you learn quickly about working with LLMs is that they have these kinds of baked-in biases, some of which are very fixed and tied to their very limited ability to engage in novel reasoning (cf. François Chollet), while others are far more loosely held and correctable. If a model sticks with the errant pattern even when provided the proper context, it probably isn't something an off-the-shelf model can handle.
Although in fairness this was a year ago on GPT 3.5 IIRC
GPT-3.5 was impressive at the time, but today's SOTA models (like GPT-5 Pro) are almost a night-and-day difference, both in terms of producing better code for a wider range of languages (I mostly do Rust and Clojure; it handles those fine now but was awful with 3.5) and, more importantly, in terms of following your instructions in user/system prompts, so it's easier to get higher-quality code from it now, as long as you can put into words what "higher quality code" means for you.
A side note: as it's been painfully pointed out to me, "vibe coding" means not reading the code (ever!). We need a term for coding with LLMs exclusively, but also reviewing the code they output at each step.
BASE: Brain And Silicon Engineering
CLASS: Computer/Llm-Assisted Software Specification
STRUCT: Scripting Through Recurrent User/Computer Teamup
ELSE: Electronically Leveraged Software Engineering
VOID: Very Obvious Intelligence Deficit
Okay maybe not that last one
Prediction: arguments over the definition will ensue
It doesn't imply AI, but I don't distinguish between AI-assisted and pre-AI coding, just vibe coding, as I think that's the important demarcation now.
"Lets prompt up a new microservice for this"
"What have you been prompting lately?"
"Looking at commits, prompt coding is now 50% of your output. Have a raise"
Probably it has been said many times already, but the competition will be between programmers with AI and programmers without it, rather than AI with no programmers at all.
In particular, I love this part:
"I had serious doubts about the feasibility and efficiency of using inherently ambiguous natural languages as (indirect) programming tools, with a machine in between doing all the interpretation and translation toward artificial languages endowed with strict formal semantics. No more doubts: LLM-based AI coding assistants are extremely useful, incredibly powerful, and genuinely energising.
But they are fully useful and safe only if you know what you are doing and are able to check and (re)direct what they might be doing — or have been doing unbeknownst to you. You can trust them if you can trust yourself."
Which isn’t really “vibe coding” as it’s been promoted, i.e. a way for non-programmers to just copy and paste their way to fully working software systems.
It’s a very powerful tool but needs to be used by someone with the expertise to find the flaws.
What it does is pretty simple. You give it a problem and set up the environment with libraries and all.
It continuously makes changes to the program, then checks its output.
And iteratively improves it.
For example, we used it to build a new method to apply diffs generated by LLMs to files.
As different models are good at different things, we were able to run it against several models to figure out which method performs best.
Can a human do it? I doubt it.
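In case it helps set expectations, the loop being described is conceptually something like this bare sketch; `propose_patch` and `evaluate` are placeholders for whatever LLM call and benchmark harness the real tool uses:

```python
def improve(initial_code, evaluate, propose_patch, iterations=50):
    """Toy version of the iterate-and-check loop: ask a model for a revised
    program, keep it only if the measured score improves."""
    best_code = initial_code
    best_score = evaluate(best_code)          # e.g. run tests or a benchmark
    for _ in range(iterations):
        candidate = propose_patch(best_code, best_score)  # one LLM call
        try:
            score = evaluate(candidate)
        except Exception:
            continue                          # broken candidate, discard it
        if score > best_score:
            best_code, best_score = candidate, score
    return best_code, best_score
```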
> Also, these assistants (for now) appear to exhibit no common sense about what is “much”, “little”, “exceptional”, “average”, etc. For example, after measuring a consumption of 3.5GB of memory (!!) for solving a 3-disk problem (due to a bug), the assistant declared all was well...
That describes a good portion of my coworkers.
This is not someone who just said "build me X", left it to run for a while, and then accepted whatever it wrote without reading it.
(I'm not criticizing the article's author here. It was an excellent, thoughtful read, and I think an article that was actually about something vibe-coded would be boring and not really teach me anything useful.)
`wrote a non-optimal algorithm and claimed it is optimal (in terms of guaranteed shortest solution) until (sometimes later) I noticed the bug;`
That's my general concern: that the AI generation would make mistakes I would otherwise catch, but getting into the vibe, I might start to trust the AI a bit too much, and all those lovely subtle bugs might pop up.
First time encountering the phrase.
Evolution went from Machine Code to Assembly to Low-level programming languages to High-level programming languages (with frameworks), to... plain English.
Better results if you… tip the AI, offer it physical touch, you need to say the words “go slow and take a deep breath first”…
It’s a subjective system without control testing. Humans are definitely going to apply religion, dogma, and ritual to it.
- Threatening or tipping a model generally has no significant effect on benchmark performance.
- Prompt variations can significantly affect performance on a per-question level. However, it is hard to know in advance whether a particular prompting approach will help or harm the LLM's ability to answer any particular question.
Now… for fun, look up "best prompting" or "the perfect prompt" on YouTube. Thousands of video "tips" and "expert recommendations" that border on the arcane.
"You are a world-class developer in <platform>..." type of crap.
I'm not saying I've proven it or anything, but it doesn't sound far-fetched that a thing that generates new text based on previous text, would be affected by the previous text, even minor details like using ALL CAPS or just lowercase, since those are different tokens for the LLM.
I've noticed the same thing with what exact words you use. State a problem as a lay/random person, using none of the domain words for things, and you get a worse response compared to if you used industry jargon. It kind of makes sense to me considering how they work internally, but happy to be proven otherwise if you're sitting on evidence either way :)
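You can at least see the token-level difference directly; a quick check with OpenAI's tiktoken tokenizer (the specific IDs depend on which encoding you pick):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
print(enc.encode("please fix this bug"))         # lowercase wording
print(enc.encode("PLEASE FIX THIS BUG"))         # all caps -> different token IDs
print(enc.encode("kindly resolve this defect"))  # different vocabulary again
```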
The issue is that you can't know whether you are positively or negatively affecting the output, because there is no real control.
And the effect could switch between prompts.
I'm not saying introducing noise isn't a valid option, just that doing it via method 'X' or 'Y' as dogma is straight bullshit.
"Implement xxx"
?
I don't think we can offend these things (yet).
I'm hoping that sharing my experience, amongst all others, can: A) help someone understand more / set their expectations B) get someone to point out how to do it better
On one hand, I managed, in 10 days, to get an amount of functionality that would take ~2 months of coding "by hand". If I started the same project now, after learning, realising what works and what doesn't, and adapting, it would probably be possible in 5. The amount done was incredible, and it's working.
On the other hand:
- you need to be already very experienced in knowing how things should be built well, how they need to work together, and what is a good way to organize the user interface for the functionality
- you then need to have some practical experience with LLMs to know the limitations, and guide it through the above gradually, with proper level of detail provided and iteration. Which takes attention and process and time - it won't be a couple of sentences and hitting enter a couple of times, no matter how smart your prompts are
- otherwise, if you don't think it through and plan it first, with the LLM itself in mind, and you just give it high-level requirements for an app with multiple functionalities, you'll just get a mess. You can try to improve your prompts over and over, and you'll get a different kind of mess every time, but a mess nevertheless
- even with doing all of the above, you'll get a very very mediocre result in terms of "feeling of quality" - thoughtfulness of design, how information is laid out and things are organised - UX and polish. It might be more than fine for a lot of use-cases, but if you're building something that people need to use productively every day, it's not passable...
- the problem is that, at least in my experience, you can't get it to a high level with an LLM in an automated way - you still need to craft it meticulously. And doing that will require manually tearing down a lot of what the LLM generated. And that way you'll still end up with something at least a bit compromised, and messy when it comes to code
In summary, it's amazing how far it's come and how much you can do quickly - but if you need quality, there's no going around it: you still need to invest most of the effort and time. Considering both together, I think it's still a great position to be in currently for people who can provide that needed level of quality - sometimes you can do things very easily and quickly, and sometimes you do the work you're proud of with a bit of assistance along the way.
I'm not sure until when that will keep working, or what happens later, or what the current state already bodes for less experienced people...
It looks like the methodology this chap used could become a boilerplate.
I still wonder, if (as the author mentions and I've seen in my experience) companies are pivoting to hiring more senior devs and fewer or no junior devs...
... where will the new generations of senior devs come from? If, as the author argues, the role of the knowledgeable senior is still needed to guide the AI and review the occasional subtle errors it produces, where will new generations of seniors be trained? Surely one cannot go from junior-to-senior (in the sense described in TFA) just by talking to the AI? Where will the intuition that something is off come from?
Another thing that worries me, though I'm willing to believe it'll get better: the reckless abandon with which AI solutions consume resources while remaining completely oblivious to it, like TFA describes (3.5 GB of RAM for the easiest, 3-disk Hanoi configuration). Every veteran computer user (not just programmers but also gamers) has been decrying for ages how software becomes more and more bloated, how hardware doesn't scale with the (mis)use of resources, etc. And I worry this kind of vibe coding will only make it horribly worse. I'm hoping some sense of resource consciousness can be included in new training datasets...
Right now we're comparing seniors who learned the old way to juniors who learned the old way. Soon we'll start having juniors who started out with this stuff.
It also takes time to learn how to teach people to use tools. We're all still figuring out how to use these, and I think again, more experience is a big help here. But at some point we'll start having people who not only start out with this stuff, but they get to learn from people who've figured out how to use it already.
But who will hire them? Businesses are ramping down from hiring juniors, since apparently a few good seniors with AI can replace them (in the minds of the people doing the hiring).
Or is it that when all of the previous batch of seniors have retired or died of old age, businesses will have no option but to hire juniors trained "the new way", without a solid background to help them understand when AI solutions are flawed or misguided, and pray it all works out?
Anyone who wants a competitive advantage?
My claim is that the gap between junior and senior has temporarily widened, which is why someone who previously would want to hire juniors might not right now. But I expect it will narrow as a generation that learned on this stuff comes into the fold, probably to a smaller gap than existed pre-LLM.
I think it will also narrow if the tools continue to get better.
Do you mean long-term vision? Short-term the advantage is in hiring only seniors, but do you mean companies will foresee trouble looming ahead and "waste" money on juniors just to avert this disaster?
My own feeling is that this could become like a sort of... well, I recently heard of the term "population time bomb", and it was eye-opening for me. How once it starts rolling, it's incredibly hard/impossible to revert, etc.
So what if we have some sort of "experience time bomb" here? Businesses stop hiring juniors. Seniors are needed to make AI work, but their experience isn't passed on because... who to pass it to? And then juniors won't have this wealth of "on the job experience" to be able to smell AI disaster and course-correct. The kind of experience you learn from actual work, not books.
I could certainly see a wave of oversupply of juniors followed by a wave of undersupply. Say we stop hiring many juniors - a lot of people trying to get into industry right now are in for a rude time. Then maybe fewer people trying to learn it over the next few years, but those who do end up quite valuable.
If you are 100% vibe coding then you do not own the code at all. You might have some limited protections in the UK, but in the EU/US any AI-generated code can't be copyrighted.
So someone can steal your vibe-coded app and resell it without fear.
The other major issue that I have seen using LLMs is that they are useless if you "don't know what you don't know". The sample code they give is often incorrect, or not the best approach.
A few times when I discuss my issues with the code generated, it has offered better code to do the same thing.
It takes 10 to 20 times as long to debug because it's impossible to change the code or understand how it works.
This is the crux of the A.I. issue.
What a spectacular article.