
©2025 Poal.co


Looks like their plan not to hire a single new programmer/engineer this year is going completely out the window, because they adopted a technology they don't understand. Early adopters often get fucked with tech like this. Otherwise, those creating it would have no reason to sell it to you, since keeping it to themselves would give them the largest market advantage in history.

Archive: https://archive.today/hoGM0

From the post:

>A new benchmark developed by academics shows that LLM-based AI agents perform below par on standard CRM tests and fail to understand the need for customer confidentiality. A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed that using a new benchmark relying on synthetic data, LLM agents achieve around a 58 percent success rate on tasks that can be completed in a single step without needing follow-up actions or more information. Using the benchmark tool CRMArena-Pro, the team also showed performance of LLM agents drops to 35 percent when a task requires multiple steps. Another cause for concern is highlighted in the LLM agents' handling of confidential information. "Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance," a paper published at the end of last month said.


(post is archived)


That doesn't necessarily make the AI CSAs worse than the human ones, in my experience.