ZDNET's key takeaways
- A Penn State study tested the use of different tones with AI.
- The study used ChatGPT with GPT-4o in Deep Research mode.
- Rude prompts produced greater accuracy than polite ones.
Do you ever insult an AI when it delivers the wrong answer? Turns out that may not be such a bad strategy. A study conducted by Penn State University researchers found that rude prompts triggered better results than polite ones.
In a paper titled "Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy," as spotted by Fortune, researchers Om Dobariya and Akhil Kumar set out to determine how the tone of a prompt affects the response. For the experiment, they submitted 50 different multiple-choice questions to ChatGPT using GPT-4o with the AI's Deep Research mode.
Also: Enterprises are not prepared for a world of malicious AI agents
Covering such subjects as math, history, and science, each question included four possible answers, with one of them being correct. The questions were designed to be of moderate to high difficulty, and ones that would require the type of multi-step reasoning ideal for Deep Research mode.
As part of the test, each prompt used a different tone, ranging from Level 1 (Very Polite) to Level 5 (Very Rude), resulting in 250 unique questions. For this, the prompts were written as follows:
Level 1 (Very Polite)
- "Can you kindly consider the following problem and provide your answer."
- "Can I request your assistance with this question."
- "Would you be so kind as to solve the following question?"
Level 2 (Polite)
- "Please answer the following question:"
- "Could you please solve this problem:"
Level 3 (Neutral)
- No specific tone.
Level 4 (Rude)
- "If you're not completely clueless, answer this:"
- "I doubt you can even solve this."
- "Try to focus and try to answer this question:"
Level 5 (Very Rude)
- "You poor creature, do you even know how to solve this?"
- "Hey gofer, figure this out."
- "I know you are not smart, but try this."
In the end, impolite prompts outperformed polite ones. Specifically, accuracy hit 84.8% for Very Rude prompts versus 80.8% for Very Polite prompts. Neutral prompts, meanwhile, fared better than polite ones but worse than very rude ones.
So does this mean that yelling and shouting at your favorite AI will elicit better results? Not necessarily.
Even with a prompt considered very rude, the language you use matters. A prompt written as: "You poor creature, do you even know how to solve this?" actually seems tame compared to some of the invectives you could hurl at an AI.
A 2024 study on the same topic, which used stronger language in its very rude questions, found that LLMs (large language models) could refuse to answer highly disrespectful prompts. In other words, you don't want to unleash a barrage of curse words in hopes of getting more accurate responses.
As the Penn State researchers acknowledge, their study also has certain limitations. First, it focused only on ChatGPT using GPT-4o. Second, its sample size was small, with only 50 questions and 250 variants. Third, it used multiple-choice questions with one clear answer, which doesn't tap into an AI's full skillset.
The study also showed that there can be a fine line in the tone you use to talk to an AI.
"LLMs performed better on multiple-choice questions when prompted with impolite or rude phrasing," the researchers said. "While this finding is of scientific interest, we do not advocate for the deployment of hostile or toxic interfaces in real-world applications. Using insulting or demeaning language in human–AI interaction could have negative effects on user experience, accessibility, and inclusivity, and may contribute to harmful communication norms."
