15 May 2023, 21:30 (Published)

Giddy up, bad horsie! Let's ride out to the AI wilderness

Handsome models for everybody

Wild LLMs

Like everyone and their cousin-uncle in a family portrait straight out of Stable Diffusion, I've been bitten by the generative AI bug.

Not unlike the last couple of years in the field of generative AI itself, it happened in two ways: first gradually, then suddenly.

From last autumn on, you would have had to try really hard to miss the disturbing and contest-winning art, or the lawsuits around training AI models on a diet of watermarked photos and license-constrained code on GitHub.

None of that made me lift a pinkie to spring into action, yet. I suppose my attention was diverted by the negative temperature of all the headlines. 🥁

I did, however, have incentives, like people on Slack going on about AI bots writing code for them (good enough for a decent starting point, they said), and our YouTube-inspired 12-year-old attempting to play chess with ChatGPT, only to "lose" because the sneaky automaton made up its own, new rules for moving pieces on the chessboard.

Properly trying it out for myself was a must, so venturing down the llama ravine I went.


At first, it may not be obvious what the fuss is about, beyond it being impressive, in and of itself, that a computer will now answer your rambling inquiries in intelligible English, if at a leisurely conversational pace.

But heck, these chatty machines sure are wrong, too, a lot of the time. With absolute confidence at that, without blinking an eye.

Both the speed and the challenges of sticking to the truth, and nothing but the truth, are due to the same underlying reason: how it works under the hood. Grossly simplified, a Large Language Model (LLM) is a big, massive, humongous, but not quite infinite probability drive, taking your input and going red hot in a loop on it, through billions of weighted connections, computing what the next word should be, and then the next one after that, and the next.
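That loop is simple enough to sketch. Below is a hypothetical toy version in Python: the little weighted table stands in for the billions of learned connections a real LLM actually computes with, and the generation loop picks each next word at random, in proportion to its weight. Every name and number here is made up purely for illustration.

```python
import random

# Toy "model": for each word, the possible next words and their weights.
# A real LLM derives these probabilities from billions of parameters;
# this hand-written table is a stand-in for all of that machinery.
NEXT_WORD_WEIGHTS = {
    "<start>": {"the": 5, "a": 3},
    "the": {"cat": 4, "dog": 2},
    "a": {"cat": 1, "dog": 3},
    "cat": {"sat": 5, "<end>": 1},
    "dog": {"ran": 4, "<end>": 2},
    "sat": {"<end>": 1},
    "ran": {"<end>": 1},
}

def generate(seed=None, max_words=10):
    """Produce text one word at a time: look up the current word's
    candidates, sample one in proportion to its weight, repeat."""
    rng = random.Random(seed)
    word, output = "<start>", []
    for _ in range(max_words):
        candidates = NEXT_WORD_WEIGHTS[word]
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "<end>":
            break
        output.append(word)
    return " ".join(output)

print(generate(seed=42))
```

The sampling step is also where the confident wrongness comes from: the loop always picks *some* plausible-looking next word, whether or not the resulting sentence is true.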

The chance of any given detail being just right, then, will be somewhere between zero and 100%. The more data there is in the model's training, relevant to your particular question, the closer to spot-on the results will be. The further you stray away from the well-trained path, the spottier it gets.


Doesn't this uncertainty negate the whole thing, then, making you spend even more time double-checking all the details against other sources? In all honesty, while you are learning how to make the best use of it, it can be like that sometimes.

Still, the results you get out depend highly on what you put in: what exactly are you trying to do, with which particular LLM (there are so many), and how well you know how to interact with it. To a degree, it's a good, old, boring learning process.

How about then, say, the prospect of ChatGPT learning to obey the real rules of chess, so as to no longer cross 12-year-old chess club goers? Intuitively, this shouldn't emerge from a mere word prediction scheme, but intuition is ultimately not so great at grasping the gargantuan amounts of data at play. This specific question has been studied, and the answer looks to be that it can be done to a high degree, if that's what the bot's mind is put to, again, through appropriate training.


Coming into it all with software engineering as a primary interest is both a natural fit, and then not so much. The ongoing hype is partially fed by the models having been trained to produce programming code, which we techy folks find exciting enough to help spread the word, and accelerate a scientific research field's transition into broader use by the general public.

At the same time, LLMs do struggle with the exacting nature of computer code. It has to be correct, consistent and coherent down to the very last detail; otherwise it will not build, run and do the right thing in all the conditions it is subjected to. As it now stands, even with all the bugs we mere mortals are known to create, we are still needed to put together a final, polished product.

💡 By way of a specific example, take my team's line of work: solving software development problems on Apple platforms, using the latest Swift APIs. Here, everyone's current favourite, ChatGPT, fares relatively poorly compared to what you get if your toolchain of choice is JavaScript or Python. This is unsurprising: the model will have been trained on a lot more examples of the two most popular programming languages out there (used by 65% and close to 50% of surveyed developers, respectively) than of a more obscure one hovering at 5%.

Yet large language models are great, and arguably better bang for the buck, in many other contexts, where the absolute truth either doesn't matter as much, or you know enough about the topic yourself to fill in the gaps: generating ideas or rough drafts for blog posts, creative writing, or a gym program. Providing "what's this thing again" explanations. Answering tricky questions from kids that you sort of should know the answer to but, oops, actually no longer do, or never really did, with any clarity.


In any case, the current state of things will not last for long. LLMs will improve to make fewer mistakes, and we will get better at using them. These two processes interact, with the models being fine-tuned by feedback from millions of people.

This has already been possible to witness in concrete terms. Earlier this spring, everyone's favourite general-purpose LLM, ChatGPT, took an obvious leap forward with the availability of GPT-4, over the previous version, GPT-3.5, the one that started the frenzy in the first place just six months earlier. The new version knows more stuff, understands you better, and most importantly, makes fewer things up out of thin air. If so inclined, you can still compare the two models head to head, too.

This year will see people and companies fine-tuning LLMs for their own specific uses. This is what I am most looking forward to making use of next, and hopefully even doing myself with my own private text and code, on my own hardware. But that's another topic for another day.

The proof of the pudding for LLMs as a way to speed up us knowledge workers' day jobs and nightly side hustles is real, tasty, and getting better and better every week.