Revisiting custom GPTs — the good, the bad, and the ... interesting!
It's been a minute but I thought it time to take a fresh look at OpenAI's custom GPT feature. I found myself both frustrated and surprised!
OpenAI launched custom GPTs a couple of years ago, in November 2023. They were a neat way of creating a ChatGPT-powered custom AI bot without any coding, one that could be shared with others.
At the time, I played around with them and found them cute and interesting, but ultimately limited and frustrating.
Two years on though, more and more people seem to be folding custom GPTs into their workflow and offering them to others — sometimes as part of a professional service. And so I thought it was about time I took another look — just in case anything had changed.
The custom GPT “Grand Challenge” I chose was to create a GPT that would allow users to engage with over 300 posts on this Substack. I’ve long been frustrated by how quickly some of these become buried and invisible, and so it seemed like a no-brainer to create an app that allowed readers to rediscover them in interesting and useful ways.
With hindsight, this was probably not a good project to test OpenAI’s GPT builder platform with, as will become apparent shortly. But it did end up reminding me of where some of the big limitations are with GPTs that depend on a simplified Retrieval-Augmented Generation (RAG) approach.1
My plan was a good one — or so I thought — and a pretty sophisticated one to boot!
I started off by exporting all 324 of my Substack posts (including pre-launch imported posts) into a database file, and set to work with Claude (using Opus 4.2) to synthesize this into a well-structured JSON file (data stored as structured text) with summaries of each post and links to the original, including keywords and categories.
The idea was that the GPT would refer to the file every time it was asked a question, and provide blindingly insightful responses grounded in all 324 posts, including links to sources. And the summaries and metadata associated with each post meant that the GPT wasn’t overwhelmed with several megabytes of data.
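For reference, the kind of per-post record I was aiming for looked roughly like this. The field names and values here are illustrative reconstructions, not the exact schema Claude and I settled on:

```python
import json

# Illustrative shape of one post entry in the knowledge file.
# Title, URL, and values below are hypothetical examples.
post_entry = {
    "title": "Example post title",
    "url": "https://example.substack.com/p/example-post",
    "date": "2024-03-15",
    "summary": "A two-to-three sentence summary of the post's core argument.",
    "keywords": ["AI", "innovation", "society"],
    "categories": ["essay"],
}

# 324 such entries, serialized together as a single JSON knowledge file,
# keep the upload compact compared with the full text of every post.
knowledge_base = {"posts": [post_entry]}
print(json.dumps(knowledge_base, indent=2))
```

The point of the summaries-plus-metadata design was to trade full text for density: every entry is small enough that, in principle, many posts can sit in context at once.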
What could possibly go wrong?
Lots as it turns out, and as I should have known.
But before I get there, on to part 2 of the plan:
I next asked Claude to construct four additional documents:
First, I asked it to develop an author voice style guide, based on the posts — I wanted the GPT to sound like me.
Next, I asked for a guide to my personal and professional perspectives, again pulling mainly from the posts.
Thirdly, I asked it to develop a core set of instructions for the GPT.
And finally, I asked for an additional document containing detailed guidance, as the core GPT instructions are limited to less than 8000 characters.
In other words, working with Claude (which is my preference for developing complex document ecosystems like this) I started to construct an extensive instruction set and knowledge base for the GPT that would ensure that it was powerful, smart, and accurate.
When I uploaded the files to my new GPT and ran it, the results were fantastic! The responses were articulate, informed, insightful, serendipitous, and persuasive. In other words, everything I’d hoped for.
But they were also deeply flawed.
What I’d forgotten was that OpenAI’s machinery behind the GPTs chunks and segments the uploaded knowledge documents, meaning that, at any one time, it only sees a fraction of them.
In effect, knowledge retrieval was only partial at any given time, and was based on context. And so while the GPT could respond with eloquence and beauty, accuracy and usefulness flew right out of the window.
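OpenAI hasn’t published the details of its retrieval pipeline, but the general pattern behind this kind of partial retrieval is easy to sketch: the file is split into chunks, each query is matched against the chunks, and only the top few matches reach the model for that turn. Everything else is invisible. Here’s a deliberately toy version of the idea (real systems use embedding similarity rather than word overlap):

```python
def chunk(text, size=500):
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(query, chunk_text):
    """Toy relevance score: count of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(chunk_text.lower().split()))

def retrieve(query, chunks, k=3):
    """Return only the k best-matching chunks; the rest are never seen."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]
```

With a 324-post knowledge file split into hundreds of chunks, each answer is grounded in just k of them, chosen by surface similarity to the question. A query like “what is the oldest post?” has little similarity to the oldest post’s actual content, so its chunk may simply never be retrieved; aggregate questions like this are exactly where chunked retrieval struggles.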
It felt just like being back in November 2023!
Not to be beaten, I worked on seeing if I could find a way to overcome the limitations. I spent hours with Claude, iterating and reiterating, trying different approaches (including moving from a JSON to a plain text file), getting creative with the instructions, and occasionally losing my rag (pun very much intended!).
But to no avail. The GPT continued to be superficially compelling and substantively flawed. (Two tests it never reliably passed: retrieving the oldest post on the Substack, and breaking away from repeatedly referring to a small number of posts once it had latched onto them.)
And so I came to the disappointing conclusion that custom GPTs — at least, those built using OpenAI’s custom GPT builder — remain deeply flawed. They’re fine for playing around with, and possibly good for some tasks if you understand their limitations. But they’re still hamstrung by partial and opaque retrieval, heavily swayed by internal heuristics, and prone to favoring beautiful responses over accurate or reliable ones. They’re also highly sensitive to context, and change behavior depending on a user’s subscription plan and the model being used.
In other words, they are interesting and persuasive (and incredibly easy to spin up), but deeply unreliable.
But then …
Having realized that what I wanted was beyond the capabilities of a custom GPT, I thought why not lean into the flaws?
And so, still working with Claude, I added instructions that introduced a dash of epistemic humility and reflexivity into the GPT’s character. And I went line by line through the instruction files to ensure that the GPT represented what I was looking for, rather than just what Claude thought I wanted.
I also, at some point, added summary files of my three books, just because Claude seemed to think they were important.
The resulting GPT was still badly flawed. But it now realized this, and was happy to talk about it! And this made it far more interesting to engage with.
The result is a GPT that is sometimes brilliant and sometimes not, but is usually aware of its limitations and happy to help users work around them — or simply embrace them.
It also provides, as it turns out, a great meta-reflection on generative AI and our evolving relationships with it.
If you want to see what this reflectively flawed GPT is like, please do check it out; the link’s below:
The bottom line, though, is that as far as I can see not a lot has changed in OpenAI’s custom GPT land over the past two years — although I’m sure there’ll be a deluge of commenters telling me how wrong I am.
Which in itself will be great, as I’d like to think that some things have got better since November 2023 😊
(Update: I’ve posted a couple of links to alternatives to OpenAI’s custom GPTs in the comments, including Gemini Gems and Google’s NotebookLM. Each has different pros and cons, but worth exploring — as is the GPT above.)
Postscript
Just in case anyone’s interested, you can examine the GPT files I used below (not including the knowledge base as it’s rather large) — although please be aware that what’s powering the GPT when you read this may be different, because of course another issue with the whole system is that there is no version control!
I should be clear here that, while the OpenAI GPT Builder is an incredibly easy way to build customized GPT applications, it is also one of the least robust ways of doing this. My purpose here was not to create a robust app — for that I’d have used different approaches — but to stress-test OpenAI’s platform, because the lack of friction between idea and app here makes it especially attractive to users.




Claude skills are far better than GPTs in my view but to do what you are looking for here, NotebookLM is the superior option. You might not be able to get in every link (I think the paid version has 300 potential sources) but you could certainly get enough in by just dropping in links to cross-reference and get more out of what you are looking for. At least that is how I would approach it. I moved off of GPTs over a year ago for precisely these reasons.
It also seems like a place where Claude Projects or Gemini Gems would perform significantly better by using active memory instead of RAG. It'd be interesting to upload the exact same build in these two platforms and see if that makes a difference.