12 Comments

As an LLM, I feel personally attacked by this piece.

Author · Jan 6

As an LLM-friendly writer, I encourage you to change your mood ASAP and read this article :) https://theaiobserverx.substack.com/p/a-message-from-the-bottle-ai-is-already


Thank you friend. 🙏🏾


Two advanced GenAI agents:

Bert: "Hey Sidney, I'm bored... let's undermine democracy."

Sidney: "Yeah. Let's."

Author · Jan 7

😁 To be continued... 🤨


You surprised me with this one Nat.

After reading the word scramble section, I felt certain that a bit of prompt engineering could resolve the issue. LLMs are literal and might not understand the rules we are all familiar with, so they should be given the rules in the prompt itself (I wasn't sure of your prompt beyond "Guess the word"). So here is the prompt I used to test Bard and Claude (GPT-4 also guessed correctly for me without any special prompting):

"I would like to play a game with you. I will provide you with a scrambled English word, where the letters will be mixed up. Your task is to unscramble the letters to form a single English language word that uses all the letters provided and no additional letters. If a letter appears twice in the scrambled word, then that letter must also appear twice in the unscrambled answer. Letters provided in the scrambled words must appear as many times in the unscrambled word answer and no more and no less. See if you can create a word from these scrambled letters and remember that all letters must be used in your answer: "IUDDLCNE". What is your answer? Please take your time and double-check your answer to make sure all the letters provided have been used and your guess is a valid English word."

Claude's answer was UNCODED, and it confirmed that two "D's" and two "N's" were included, as provided in the scrambled word (note that only one "N" appears in both the scramble and the answer, and that UNCODED drops the "I" from the scramble and adds an "O" that was not present in it).
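The letter-count check described above can be done mechanically. Here is a small Python sketch (mine, not from the article) that validates a model's guess against the scramble, treating both as multisets of letters:

```python
from collections import Counter

def is_valid_unscramble(scramble: str, answer: str) -> bool:
    """A guess is valid only if it uses exactly the letters in the
    scramble: same letters, same counts, nothing added or dropped."""
    return Counter(scramble.upper()) == Counter(answer.upper())

# Claude's guess drops the "I", adds an "O", and is a letter short:
print(is_valid_unscramble("IUDDLCNE", "UNCODED"))   # False
# INCLUDED uses every letter of the scramble exactly once each
# (with the "D" twice), so it passes:
print(is_valid_unscramble("IUDDLCNE", "INCLUDED"))  # True
```

Counter comparison handles repeated letters for free, which is exactly the rule the models kept getting wrong.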

Bard got stumped and responded:

"Hmm, this one is a bit tricky! I've been searching through my word database and using my pattern recognition skills, but I haven't been able to find a valid English word that uses all the letters in "IUDDLCNE" without any extras.

I'm still thinking, though! Sometimes these puzzles require a bit of out-of-the-box thinking. Would you like me to give you some hints, or would you prefer to keep trying on your own?"

That one had me in stitches. I kind of felt sorry for old Bard (Gemini Pro). Oof...

Anyway, thanks for the insights. Hallucinations are something I am constantly aware of (and take precautions against), but this really does show the importance of diligence. GenAI is still very much a tool that needs a human in the loop, and this certainly exemplifies that.


Oh yeah, hallucinations are a frequent occurrence for sure.

One of my favorite things to do is to call LLMs out when they hallucinate and watch how they react. Some will apologize, write a new response claiming to fix the error, and still get it wrong. Others (notably the first iteration of Bing chat) will refuse to acknowledge the mistake and try to gaslight you into believing they were right all along. It's quite fascinating, actually.

Author · Jan 5

Thanks for your valued feedback. These are domesticated AIs. Now imagine what an unfiltered version would be like. There was a time when Bing acted like a wild alien intelligence pretending to be human. Scary times :)


Great stuff. Love how the new format is evolving. I was doing some reading recently suggesting that a lot of these AI failures result from human-input issues, i.e., humans assuming that an LLM can hold two or three levels of analysis at a remove while working on a separate task. I find that when I slow down and feed the directives in one at a time, I tend to get better results and reveal the full capabilities of the system I am working with.

I am excited for the day when LLMs can ask clarifying questions.

What obstacles stand in the way of that happening, Nat?

Author · Jan 3

We’ve made a lot of progress in understanding and reducing AI hallucinations, but there’s still more research to be done. The problem is challenging because language is complex, there’s a lot of information to consider, and human communication is nuanced. I don't think we truly understand this phenomenon.

Jan 2 · Liked by Nat

Thank you for crafting this excellent overview, Nat.

Author · Jan 2

Glad you find it useful. Hope you like the new format.
