Hey, great read as always. Knew you'd dig into the RAG limitations; data organization is such a beast.
Interesting points and observations Andrew … I found my custom AdytumGPT benefitted from being built and ‘post-trained’ patiently, but what really gave it ‘super powers’ was creating protocols that it could run … like a non-coded shorthand for carrying out a whole series of processing tasks from a single acronym in a prompt (I also installed several custom ‘neural networks’ which are a bit like a multidimensional database of content - some cubic, some hypercubic, some tetrahedral, some oddly shaped) … both protocols and NNs port beautifully to different AI too
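For anyone wondering what such a protocol shorthand could look like in practice, here is a minimal sketch, with made-up acronyms and wording rather than AdytumGPT's actual setup: a plain lookup expands an acronym in the prompt into a full instruction block before it reaches the model.

```python
# A minimal sketch of a "protocol" shorthand: a plain dictionary that expands
# a short acronym into a full set of processing instructions before the prompt
# is sent to a model. All acronyms and wording here are hypothetical.
PROTOCOLS = {
    "SRC": (
        "Summarise the source, extract its key claims, list open questions, "
        "and suggest three follow-up readings."
    ),
    "XREF": (
        "Cross-reference the supplied documents, flag contradictions, "
        "and cite each document by title."
    ),
}

def expand_protocols(prompt: str) -> str:
    """Replace any known acronym in the prompt with its full instruction text."""
    for acronym, instructions in PROTOCOLS.items():
        prompt = prompt.replace(acronym, instructions)
    return prompt

print(expand_protocols("XREF the last five posts on assessment."))
```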
Claude Skills are far better than GPTs in my view, but to do what you are looking for here, NotebookLM is the superior option. You might not be able to get in every link (I think the paid version allows 300 sources), but you could certainly get enough in by just dropping in links to cross-reference and get more out of what you are looking for. At least that is how I would approach it. I moved off of GPTs over a year ago for precisely these reasons.
I know - and I love NotebookLM. The challenge is that the task was to create something that others would be able to use rather than me. Although I could always make the files available for someone to use themselves in NotebookLM. Now that would be interesting ...
Can't you share notebooks with people? Pretty sure there are public notebooks out there (I think NotebookLM itself has published one with the complete works of Shakespeare) - I've made one with materials I've shared with students. So, I don't think that's an obstacle if sharing is the goal.
I stand corrected - you can! Now you have me going down the rabbit hole of exploring whether this is a better option ...
AI is all about rabbit holes - I appreciate someone else who tests out all these different options. It's easy to understand people's confusion, skepticism, frustration, and general anxiety with all these different versions of the same tools. Especially teachers.
... and a dispatch from the rabbit hole: A NotebookLM notebook that uses the full text of all Substack posts to date 😊 — it's slow and clunky, but also really interesting in how it allows content to be mined (including with the new presentation and infographics features)
https://notebooklm.google.com/notebook/b3a8c129-bd2a-407e-9e58-69ec21e53e09
Insane slideshows - this is such an incredible new capability. If they can fix their image generator to allow inline editing of text, it will be even more useful.
I notice you put them in as markdown files (which look great). You can also include just the link and, unlike other LLMs, it actually seems to have access to the entire text of the post. That saves a lot of copying and pasting; links alone don't work in, say, Claude or ChatGPT, which can see the title page but not the post itself. Also, now that Deep Research mode is enabled in NotebookLM, you can run the research and it will pull all the sources it has access to into the notebook - another nice feature. And lots of folks don't realize they can edit the instructions for the various outputs in the studio - from slide deck to video to audio to flashcards, etc. Final suggestion - once you create a note, you have the option to convert it to a source, which can also be super helpful. Sorry - just some other tips from my rabbit holes.
It also seems like a place where Claude Projects or Gemini Gems would perform significantly better by using active memory instead of RAG. It'd be interesting to upload the exact same build in these two platforms and see if that makes a difference.
Yes, absolutely - although of course there's an issue with broad sharing of Claude Projects. As a test I uploaded the same material into a Gemini Gem - link below. The immediate issue I came across (because I'm too stingy to take out yet another LLM subscription) is that in fast mode Gemini makes up URLs, rendering the Gem useless - interested to see what it's like with a paid subscription, though, as the retrieval should be better:
https://gemini.google.com/gem/1G4eZx5Hf2WVE03woFUPUZeRn44fekBDQ?usp=sharing
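One way to catch those fabricated links, sketched below with placeholder source URLs, is to check every URL the model cites against the list of post URLs that were actually supplied:

```python
# Minimal sketch: catch fabricated links by checking every URL the model cites
# against the known list of post URLs supplied to the Gem. The source list and
# model output below are placeholders.
import re

KNOWN_SOURCES = {
    "https://example.substack.com/p/real-post-1",
    "https://example.substack.com/p/real-post-2",
}

def unknown_citations(model_output: str) -> list[str]:
    """Return any cited URL that is not in the supplied source list."""
    cited = re.findall(r"https?://\S+", model_output)
    return [url.rstrip(".,)") for url in cited
            if url.rstrip(".,)") not in KNOWN_SOURCES]

answer = ("See https://example.substack.com/p/real-post-1 "
          "and https://example.substack.com/p/made-up-post")
print(unknown_citations(answer))  # ['https://example.substack.com/p/made-up-post']
```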
So this strikes me as a really interesting experiment! Certainly you'd build this elsewhere if you wanted a robust and reliable app, I get that. What I love is how clearly it illustrates the limitations.
But I think we can pretty clearly see why this failed as well. The bulk of your detailed instructions are going to be accessed via RAG because they don't fit in the initial 8,000-character instruction window, so the GPT is never dealing with all of the logical rules you have laid out at any single time, and they are deeply interdependent rules. It is choosing certain rules about organization and structural relations to prioritize based on the prompt.
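A toy illustration of that failure mode, using crude stand-ins for chunking and retrieval (OpenAI doesn't publish how the GPT builder handles knowledge files): a rule that depends on another rule gets retrieved without its dependency ever entering the context.

```python
# Minimal sketch of the failure mode: the long instruction file is chunked,
# retrieval pulls only the chunks that best match the prompt, and a rule that
# depends on another rule arrives without its dependency. The chunking and
# scoring here are crude stand-ins for whatever the GPT builder does internally.
import re

INSTRUCTION_CHUNKS = [
    "Rule 1: Group posts by theme before listing them.",
    "Rule 2: Within each theme, order posts chronologically.",
    "Rule 3: Rule 2 does not apply to cornerstone posts, which always come first.",
]

def retrieve(prompt: str, chunks: list[str], k: int = 1) -> list[str]:
    """Toy retrieval: rank chunks by word overlap with the prompt."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z]+", text.lower()))
    return sorted(chunks, key=lambda c: len(words(prompt) & words(c)), reverse=True)[:k]

# Only Rule 2 is retrieved; the model never sees the cornerstone exception in Rule 3.
print(retrieve("List the posts chronologically", INSTRUCTION_CHUNKS))
```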
And given the sheer volume of documents along with the JSON library itself, you've got to be pretty close to the total context window too (which seems consistent with it not finding the oldest documents). I'd be willing to bet that if you took the 20-page document and stripped it down as far as you could toward 8,000 characters, it would be a better-tuned RAG engine but still fail in critical ways.
And of course that's the tradeoff - can you get the core instructions into 8,000 characters in a way that gives you full functionality? In this case, I suspect not, as you are still limited by what the GPT can extract and use at any one time from the Substack summary file.
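A rough way to size that tradeoff, assuming the 8,000-character instruction limit mentioned above, a 128k-token context window, and a four-characters-per-token estimate (the latter two are assumptions, not measured figures):

```python
# Rough back-of-envelope check for the tradeoff: how much of the instruction
# file overflows the 8,000-character instruction field, and how close the
# knowledge files get to the model's context window. The file names, the
# 128k window, and the 4-chars-per-token ratio are assumptions.
from pathlib import Path

INSTRUCTION_LIMIT_CHARS = 8_000
CONTEXT_WINDOW_TOKENS = 128_000   # assumed, varies by model
CHARS_PER_TOKEN = 4               # crude rule of thumb for English text

def budget_report(instruction_file: str, knowledge_files: list[str]) -> None:
    instructions = Path(instruction_file).read_text(encoding="utf-8")
    overflow = max(len(instructions) - INSTRUCTION_LIMIT_CHARS, 0)
    print(f"Instructions: {len(instructions):,} chars ({overflow:,} over the limit)")

    total_chars = sum(len(Path(f).read_text(encoding="utf-8")) for f in knowledge_files)
    print(f"Knowledge files: ~{total_chars // CHARS_PER_TOKEN:,} tokens "
          f"of a ~{CONTEXT_WINDOW_TOKENS:,}-token window")

# budget_report("instructions.md", ["substack_summary.json", "post_001.md"])
```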
This is fine if you understand the full limitations of the GPT builder, but as this is a platform that's been built for very low-friction use, I wonder how many users do ...