12 Comments

As an LLM, I feel personally attacked by this piece.

Author · Jan 6

As an LLM-friendly writer, I encourage you to change your mood ASAP and read this article :) https://theaiobserverx.substack.com/p/a-message-from-the-bottle-ai-is-already


Thank you friend. 🙏🏾


Two advanced GenAI agents:

Bert: "Hey Sidney, I'm bored... let's undermine democracy."

Sidney: "Yeah. Let's."

Author · Jan 7

😁 To be continued... 🤨


You surprised me with this one Nat.

After reading the word scramble section, I felt certain that a bit of prompt engineering could resolve the issue. LLMs are literal and might not understand the rules we are all familiar with, so they should be given the rules in the prompt itself (I wasn't sure of your prompt beyond "Guess the word"). So here is the prompt I used to test Bard and Claude (GPT-4 also guessed correctly for me without any special prompting):

"I would like to play a game with you. I will provide you with a scrambled English word, where the letters will be mixed up. Your task is to unscramble the letters to form a single English language word that uses all the letters provided and no additional letters. If a letter appears twice in the scrambled word, then that letter must also appear twice in the unscrambled answer. Letters provided in the scrambled words must appear as many times in the unscrambled word answer and no more and no less. See if you can create a word from these scrambled letters and remember that all letters must be used in your answer: "IUDDLCNE". What is your answer? Please take your time and double-check your answer to make sure all the letters provided have been used and your guess is a valid English word."

Claude's answer was UNCODED, and it confirmed that two "D's" and two "N's" were included, as provided in the scrambled word (note that only one "N" appears in both the scramble and the answer, and that UNCODED drops the "I" from the scramble and adds an "O" that was not present in it).
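The letter-count check described above can be done mechanically. Here is a small Python sketch (mine, not from the article) that validates a model's guess against the scramble, treating both as multisets of letters:

```python
from collections import Counter

def is_valid_unscramble(scramble: str, answer: str) -> bool:
    """A guess is valid only if it uses exactly the letters in the
    scramble: same letters, same counts, nothing added or dropped."""
    return Counter(scramble.upper()) == Counter(answer.upper())

# Claude's guess drops the "I", adds an "O", and is a letter short:
print(is_valid_unscramble("IUDDLCNE", "UNCODED"))   # False
# INCLUDED uses every letter of the scramble exactly once each
# (with the "D" twice), so it passes:
print(is_valid_unscramble("IUDDLCNE", "INCLUDED"))  # True
```

Counter comparison handles repeated letters for free, which is exactly the rule the models kept getting wrong.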

Bard got stumped and responded:

"Hmm, this one is a bit tricky! I've been searching through my word database and using my pattern recognition skills, but I haven't been able to find a valid English word that uses all the letters in "IUDDLCNE" without any extras.

I'm still thinking, though! Sometimes these puzzles require a bit of out-of-the-box thinking. Would you like me to give you some hints, or would you prefer to keep trying on your own?"

That one had me in stitches. I kind of felt sorry for old Bard (Gemini Pro). Oof...

Anyway, thanks for the insights. Hallucinations are something I am constantly aware of (and take precautions against), but this really does show the importance of diligence. GenAI is still very much a tool that needs a human in the loop, and this certainly exemplifies that.


Oh yeah, hallucinations are a frequent occurrence for sure.

One of my favorite things to do is to call LLMs out when they hallucinate and watch how they react. Some will apologize, write a new response claiming to fix the error, and still get it wrong. Others (notably the first iteration of Bing chat) will refuse to acknowledge the mistake and try to gaslight you into believing they were right all along. It's quite fascinating, actually.

Author · Jan 5

Thanks for your valued feedback. These are domesticated AIs. Now imagine what an unfiltered version would be like. There was a time when Bing acted like a wild alien intelligence pretending to be human. Scary times :)


Great stuff. Love how the new format is evolving. I was doing some reading recently suggesting that a lot of these AI failures result from human-input issues, i.e., humans assuming that an LLM can hold two or three levels of analysis at a remove while working on a separate task. I find that when I slow down and feed the directives in one at a time, I tend to get better results and reveal the full capabilities of the system I am working with.

I am excited for the day when LLMs can ask clarifying questions.

What obstacles stand in the way of that happening, Nat?

Author · Jan 3

We’ve made a lot of progress in understanding and reducing AI hallucinations, but there’s still more research to be done. The problem is challenging because language is complex, there’s a lot of information to consider, and human communication is nuanced. I don't think we truly understand this phenomenon.

Jan 2 · Liked by Nat

Thank you for crafting this excellent overview, Nat.

Author · Jan 2

Glad you find it useful. Hope you like the new format.
