It’s been some time since I wrote here. My last post, two months ago, was about the then-accelerating excitement around ChatGPT. In the meantime, a lot has changed. To restart my regular writing while continuing the theme, let me do another variation on ChatGPT, but with a different spin on what is already out there.
During the past two months, thousands of people have signed up for OpenAI’s ChatGPT service and tested the model’s abilities on a wide variety of tasks. One question was still open back when I wrote my first article: Where could ChatGPT actually be … useful?
Rephrasing Text and Helping Users
This question is constantly being explored, and two use-cases are crystallizing at the moment. The first is to use ChatGPT as a rephrasing tool, in similar ways as, e.g., Grammarly. Because ChatGPT was trained on such a large corpus, it knows better than any other model whether a specific word makes sense at a specific place, or whether another word might fit better.
You should most definitely not suggest circumventing ChatGPT detectors, as one Twitter user did (which amounts to deceitful behavior). But if you write an article yourself, ask ChatGPT for suggestions, and check the model output before accepting it, then it can likely be of help. The big open question here remains what this means for plagiarism: How much help is fine, and when are you guilty of using work without attribution?
The other use-case that has been explored is using ChatGPT as a guide. ChatGPT’s training data stops somewhere in late 2021, so it cannot give you the latest info, but if you’re, say, building a computer, it can actually help. The YouTube channel LinusTechTips put this to the test: Founder Linus Sebastian built a PC during a video, asking ChatGPT for every step and pretending to have no idea how to build a computer.
What is particularly interesting here is his commentary on the quality of the model output. In summary, ChatGPT is actually very good at helping you with the general steps and recommends reading the manual for the specifics. However, it is sometimes sloppy and forgets instructions. As long as you are willing to draw on a few additional sources of information and don’t trust it 100 %, ChatGPT can definitely hold your hand.
Computer scientists and programmers report similar experiences. When they have a problem that is reasonably well-researched, ChatGPT is capable of pulling together the info from its dataset and providing a tailored answer. However, they likewise noticed that the model’s performance degrades as soon as you begin querying esoteric problems, because it simply has not seen answers to them during training.
Why Didn’t We See This Earlier?
So what do these initial performance reports tell us? Well, ChatGPT is great at a few tasks that require a large dataset (suggesting new sentence structures and providing info), but not so great at others. People are noticing that if you just chat with it, it provides you with reasonable-sounding garbage.
This got me thinking.
Over the past two months I’ve been intensively revisiting some work I have done on LSTM networks. LSTM – short for “Long Short-Term Memory” – networks are a kind of “grandfather” of ChatGPT. Developed in 1997 by a team of German researchers, they provided the first quantum leap in natural language processing. While previously everyone was restricted to frequency counts such as tf-idf scores if they wanted to work with text, LSTMs began to open up new ways of working with text computationally. LSTMs are one of the developments after the last “AI winter” ended.
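To give you an idea of what sets LSTMs apart from plain frequency counts: the core of an LSTM is a recurrent cell that carries a hidden state and a separate cell state through a sequence, using learned “gates” to decide what to remember and what to forget. Here is a minimal, illustrative forward step in pure Python – the weights are arbitrary placeholders, not a trained model, and real implementations use vectors and matrices instead of scalars:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step for scalar input/state (illustration only).

    W holds weights for the four gates: input (i), forget (f),
    output (o), and the candidate cell value (g).
    """
    i = sigmoid(W["i_x"] * x + W["i_h"] * h_prev + W["i_b"])   # input gate
    f = sigmoid(W["f_x"] * x + W["f_h"] * h_prev + W["f_b"])   # forget gate
    o = sigmoid(W["o_x"] * x + W["o_h"] * h_prev + W["o_b"])   # output gate
    g = math.tanh(W["g_x"] * x + W["g_h"] * h_prev + W["g_b"]) # candidate value
    c = f * c_prev + i * g   # new cell state: keep some old, admit some new
    h = o * math.tanh(c)     # new hidden state, exposed to the next layer
    return h, c

# Arbitrary placeholder weights -- a real model learns these during training.
W = {k: 0.5 for k in
     ("i_x", "i_h", "i_b", "f_x", "f_h", "f_b",
      "o_x", "o_h", "o_b", "g_x", "g_h", "g_b")}

h, c = 0.0, 0.0
for x in (0.2, -0.4, 0.9):   # a toy input "sequence"
    h, c = lstm_step(x, h, c, W)
```

The gating is the whole trick: it lets the network carry information across long stretches of a sequence without it being washed out, which is exactly what frequency counts like tf-idf cannot do.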
In 2017, LSTMs were superseded by the transformer model, invented by a team of Google researchers. Transformers are more precise, but also bigger, than LSTMs. ChatGPT, in turn, is a follow-up to OpenAI’s Generative Pre-trained Transformer, or GPT, model, which debuted in 2018.
So, to be clear, LSTMs are no longer state of the art. But if you want to perform some Natural Language Processing, you probably do not have the amount of computing power available that you need to run transformer models. Training a transformer on my own computer (which is already quite powerful), for example, takes roughly five hours, give or take. For my own research, I’m utilizing an HPC cluster that can train a transformer in merely five minutes – but that HPC uses graphics cards costing somewhere between $4,500 and $10,000.
This is the reason that, when it comes to some “AI enhanced” functionality, companies will generally only provide an API. This way, the computationally expensive large models can run on their servers and don’t have to run locally on your computer.
But this also stifles innovation. The thing is, when Silicon Valley began embracing the large models of the 2010s, your personal computer had just become powerful enough to run an LSTM model. Even so, many companies skipped shipping LSTM networks with their software and kept everything on their own servers, which had the added benefit that they control who can use their models.
Your Computer Can Run Neural Networks
Yet, there is software that actually ships with an LSTM under the hood. Most notably, tesseract. Tesseract is an Open Source Optical Character Recognition (OCR) tool that can look at an image and extract any text on it. And to actually perform the OCR, tesseract uses an LSTM network. Any software that uses tesseract under the hood – which includes ocrmypdf and likely PDF readers such as Adobe Acrobat – runs an LSTM on your computer. More specific to research, dependency parsers (which extract the grammatical structure of text) also run on LSTMs, most notably Stanza, developed in the NLP lab at Stanford University.
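If you want to see tesseract’s LSTM at work yourself, invoking it is essentially a one-liner. The sketch below just builds the standard `tesseract <image> <output base>` command line (the file name `scan.png` is a hypothetical example) and only runs it if both the binary and the image are actually present:

```python
import os
import shutil
import subprocess

def ocr_command(image_path: str, out_base: str, lang: str = "eng") -> list:
    # tesseract writes its result to <out_base>.txt;
    # -l selects which language model to use.
    return ["tesseract", image_path, out_base, "-l", lang]

cmd = ocr_command("scan.png", "scan_text")

# Only attempt the actual OCR when tesseract and the image exist.
if shutil.which("tesseract") and os.path.exists("scan.png"):
    subprocess.run(cmd, check=True)  # produces scan_text.txt
```

The point is not the wrapper itself, but that this single command runs a full LSTM-based recognition pipeline locally, with no server round-trip involved.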
LSTMs are no longer state of the art. But they have two benefits that neither transformers nor GPT models have: First, they run on everyday computers. And second, because they have been around for so long, they are well tested, and we have a good understanding of where to utilize them.
So I think we should start developing more LSTM networks. Of course, they won’t be as precise as transformers. But they also won’t take a minute to process a single input on a regular customer’s computer. With computing power constantly evolving, we should feel entitled to actually use the many resources that nowadays come with a computer.
And implementing LSTMs isn’t even that hard. As I showed roughly a year ago, one can train and deploy an LSTM network within a single day. On a regular laptop. And there are plenty of use-cases where we can improve the current state of software with a few wisely placed LSTMs. Especially when it comes to writing, LSTMs could suggest synonyms, improve grammar, or – in the context of Zettelkasten systems – suggest tags and related files. And all of that in a privacy-first, offline manner.
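To make the Zettelkasten idea concrete, the pipeline is: encode the new note, compare it to existing tagged notes, and suggest the closest tags. The sketch below uses plain token overlap as a stand-in scoring function; in a real system, an LSTM encoder would replace that score with learned text representations. All note texts and tags here are made up purely for illustration:

```python
def tokenize(text: str) -> set:
    return set(text.lower().split())

def suggest_tags(note: str, tagged_notes: dict, top_n: int = 2) -> list:
    """Rank known tags by how much their notes overlap with the new note.

    tagged_notes maps tag -> example note text. An LSTM-based system
    would replace this token-overlap score with a learned similarity.
    """
    note_tokens = tokenize(note)
    scores = {
        tag: len(note_tokens & tokenize(text))
        for tag, text in tagged_notes.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]

# Made-up example notes and tags, purely for illustration.
notes = {
    "nlp": "training language models on text corpora",
    "hardware": "graphics cards and compute clusters",
    "writing": "drafting and revising blog articles",
}
tags = suggest_tags("revising a draft article about language models", notes)
```

Everything here runs offline on the user’s own machine, which is exactly the privacy-first property that server-side models give up.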
I am certainly no fan of the toxic Silicon Valley culture. But I do have to acknowledge that the stream of innovations from players such as Google or OpenAI pushes neural networks into the public mind. In doing so, they can show even everyday users who otherwise have no connection to computer science what benefits neural networks can bring.
Society at large is currently being trained in what neural networks are, and if we can tap into this general interest, we can bring a ton of improvements to a variety of use-cases. And all of that without the $20 price tag that ChatGPT now comes with.