
Context Engineering

Andrej Karpathy posted an interesting question on X recently, asking how people use LLM chatbots. Specifically, he asked about making a new chat for every question or purpose vs. the One Thread approach, and he discussed some trade-offs in both performance and fidelity when taking advantage of the large context windows now available to us. It’s really interesting and I recommend you read the whole post. But I was surprised to find that no one else who answered him described the method I use, so I wrote it up. It garnered a lot of interest, so I thought I’d flesh it out a bit further here:

I manage hundreds (thousands?) of conversations that fall into four groups:

  1. long-running, bookmarked – basically my staff
    Three examples:
    • I have an AI personal trainer/nutritionist I always return to for training/nutrition questions.
    • I have one conversation that helped me build my current home Linux box, and I return to it for any HW/OS/SW questions related to it.
    • I have several AI professors I use to learn various subjects – one per subject
  2. useful, may return to, but not necessarily
    Examples:
    • I saw nice sweet potatoes at the grocery store, asked about sweet potato soup, and made soup. A week later, I saw a nice pumpkin and wanted to make a similar soup. I remembered that convo, which already knew my equipment and preferences, and returned to it for a different soup.
    • In general, if I think I’ve asked a question before, and the context from before will save me some time now, I use search to look at previous conversations, and might continue one of them rather than start a new one
  3. One-off questions: I usually ask them in a fresh conversation
  4. Truly throwaway questions. Not only do I start a fresh conversation, but I will usually archive/delete it when I’m done. This is when the subject is pretty trivial, and I view it as clutter.

Special case for some long-running conversations: I have also noticed that an overly long context can start to produce weird effects (and Andrej describes a bit of why this happens). The LLM starts to hallucinate more, is less reliable about remembering details, and so on. In situations like this, I sometimes ask it to generate a detailed summary of everything we have been working on, perhaps ask a few follow-up questions, and then paste the results into a new conversation and continue from there, having basically transplanted the essentials from the old chat to the fresh one.

It is true that, by this point, the LLM is already getting a bit squirrely, so if the summary is missing anything important, I remind the LLM, and that’s enough to help it remember and add a summary of that too. It’s not that it forgets; it has just lost the thread of what is most relevant. But I can help, so I repeat this until I’m satisfied, and only then do I feed the summary to the new chat. It’s not perfect, but I’m also not deleting the old chat, so I can go back to it if necessary.
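If you’re curious what this looks like outside the chat UI, the same “summarize and transplant” move can be sketched against an LLM API. This is just an illustration of the pattern I do by hand – the OpenAI Python client, the model name, and the prompt wording below are my own assumptions, not part of any product:

    # Sketch of the "summarize and transplant" pattern. The model name
    # and prompt wording are placeholders, not a prescription.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def transplant(old_messages: list[dict]) -> list[dict]:
        """Summarize a long conversation, then seed a fresh one with it."""
        # Step 1: ask the overloaded conversation for a detailed summary.
        summary = client.chat.completions.create(
            model="gpt-4o",
            messages=old_messages + [{
                "role": "user",
                "content": "Generate a detailed summary of everything we "
                           "have been working on, so I can continue it in "
                           "a new conversation.",
            }],
        ).choices[0].message.content

        # Step 2 (manual in my workflow): review the summary, point out
        # anything important it missed, and have it add that too.

        # Step 3: the fresh conversation starts from the summary alone.
        return [{
            "role": "user",
            "content": "Here is a summary of a previous conversation; "
                       "please continue from it:\n\n" + summary,
        }]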

One of the follow-up questions I got was about how I go about organizing and finding the conversations I want in the midst of the thousands I have. That works like this:

Both Grok & ChatGPT make it relatively easy. You can rename chats, so for my “staff” – that is, my long-running chats-with-a-purpose that I return to – I rename them like:

** Personal Trainer **
** Nuclear Engineering Professor **
** Productivity Coach **

which makes them easy to pick out of the list.

Grok even lets you bookmark specific chats. And, if one of my AI staff has scrolled way down because I haven’t talked to it in a while, it will pop right up when I search for it because it has a good and memorable name.

I used to create browser bookmarks too, because each chat has its own URL, but I find that’s not needed and so I stopped.

As for the remaining conversations, I’m less worried about needing to find them. Like in my soup recipe example, I just search for “soup” and 13 conversations pop up, and I reopen the one that is the most relevant to what I want to do now.

Hope you found this helpful – let me know on X (I don’t ever look at the comments here), and if you have a better method, I’d love to hear it too.

Bing Chat has an attitude!

Bing Chat is not above ending the chat when it paints itself into a corner.

If you don’t know the scene, it’s a classic: https://youtu.be/5oWNQSCPWy4

In an effort to get access to the new GPT-4 features, I’ve started using the Bing iOS app. And, while it’s cool in some ways, I have found that wrapping GPT-4 in Bing Chat often feels like they’ve wrapped Leonardo da Vinci in Chandler Bing.

My adventures started yesterday evening, in a way that has nothing to do with the GPT-4 features but seems oddly relevant to what happened today, so I’ll describe it briefly:

I am an American expat living in Vienna. So my Bing Chat has my location as Vienna, my region as Austria, and my language as English. But all its notifications about breaking news, local news, weather, whatever, are in German. It’s irritating because my German is not great, so I sent feedback on this through the app.

Adventures in font identification

Flash forward to this morning. Having another tech issue that has nothing to do with GPT-4, I had decided to write a post about the problems I’m experiencing with iOS 17. As part of this, I wanted to make a parody image of a translation app, changing the text of a button. I decided to upload a picture of the button and ask GPT-4 if it could identify the font. OpenAI hasn’t rolled multimodal input out to my ChatGPT Plus account yet, so Bing Chat was my only option. I gave it the image and asked, “What font is this?”

It searched for “German button font,” which is not what I wanted, but it’s an understandable action from the AI. And, though the response did seem to include some interpretation of the style of the font, it was clear that the search also informed its answer, because it mentioned fonts used in Central Europe and told me where I could download Old German fonts. So, as I am accustomed to having fruitful interactions with ChatGPT when I explain how its response differs from what I’m looking for, I explained: “Don’t search for German fonts, the word in the image is irrelevant, I just want to know what font is used by the way it looks.”

Denial isn’t just a river in Egypt

Bing Chat straight up denied that it had searched for German fonts, and said it only used the style of the text to identify it. I responded, “Yes, you did. You searched for ‘German button font,’ and though your response might have included information based on the style of the text, it also mentioned downloading Old German fonts. I just want you to identify the font by how it looks, unbiased by the search for the word in the image.”

In response, Chandler Bing Chat pretended the connection was poor and ended the chat. I’m serious, I’m not joking: it said I had a network connection problem, and that I would have to restart the chat.

Second attempt

Now, at this point, I actually thought the network connection issue was real. But, in a fit of anthropomorphic pique, I started the new chat like this:

This time it didn’t say anything about a network connection, but it did end the chat immediately. Also, what the heck is with responding in German!? It had never done that before! I’d used it a number of times over the last several days, and Bing Chat never wrote to me in German – the German notifications I was complaining about are part of the Bing iOS app’s other features, not the chat. But now you see why I mentioned them in the beginning.

Hm, ok, well, fine, it’s offended now. I took the screenshots you see above (I wish I’d done so for the previous chat, but I’ll show you a recreation at the end of this article), and created a new chat.

Third time’s the…nope

This time I wanted to get back to my original question and to head off the earlier problem, so I uploaded the image, asked it to identify the font, and explicitly told it that the text in the image was irrelevant.

It said it was analyzing the image, and it also searched for “font identification tool.” This time its response had nothing to do with the image – it just gave me a list of font identification tools, and then set the title of the chat to “Identificación de fuentes” (Spanish for “font identification”). Where the heck did the Spanish come from?!

I’ve flustered it so much it is speaking in tongues!

Bing gets an attitude

Ok, at this point I decide I have to document this nuttiness, and I start this article. Because I missed out on the screenshot of the first chat, I try to recreate it as closely as I can – the results are hilarious:

My analysis is that Microsoft, stung by the bad press from when Bing Chat professed love for a tech journalist and asked him to end his marriage, has put such strong guardrails in place that the current version is overcautious.

When AIs compare notes

Bing Chat also doesn’t seem to get behind-the-scenes instructions that tell the chatbot how the app around it is working. Not sure what I’m talking about? This is something I also discovered recently. In the ChatGPT Plus web interface, I have access to DALL-E 3, and it looks like this:

Now, my iOS ChatGPT app doesn’t have access to DALL-E 3 yet, but it is still possible to open the same chat via the history, and it looks like this:

Notice that’s a response from DALL-E 3 to ChatGPT. It’s not intended to be read by me, which is why it wasn’t visible when I first generated the images. But it is clearly explaining something to ChatGPT so that ChatGPT doesn’t act oddly toward me. And, just as clearly, Bing Chat doesn’t get anything like that, so it is left to its own devices when the app does something without the chatbot knowing.
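To make that concrete: in API terms, a tool’s reply is just another message in the transcript, one the web UI knows to hide from the user. The snippet below is entirely invented – the real text DALL-E 3 sends ChatGPT isn’t public – but it illustrates the kind of hidden note I’m describing:

    # Invented illustration of a hidden tool message; the actual
    # DALL-E 3 text is not public.
    conversation = [
        {"role": "user", "content": "Draw a cat in a top hat."},
        # The tool's reply is an ordinary message in the transcript.
        # The web UI hides it; a client that doesn't know the
        # convention (like my older iOS app) shows it verbatim.
        {"role": "tool", "content": (
            "Note to the assistant: the images are already displayed "
            "to the user, so do not describe or list them again."
        )},
        {"role": "assistant", "content": "Here is your dapper cat!"},
    ]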

At this point I’m worried I’m going to get my account suspended for daring to argue with the AI, but, well, this is the world now – when we snarl at our computers because they’re not behaving the way we want them to, they can argue back and then become passive-aggressive.