
©2025 Poal.co


Looks like their plan not to hire a single new programmer/engineer this year is going completely out the window, because they adopted a technology they don't understand. Early adopters often get fucked with tech like this. Otherwise, those creating it would have no reason to sell it to you, since keeping it to themselves would give them the largest market advantage in history.

Archive: https://archive.today/hoGM0

From the post:

>A new benchmark developed by academics shows that LLM-based AI agents perform below par on standard CRM tests and fail to understand the need for customer confidentiality. A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed that using a new benchmark relying on synthetic data, LLM agents achieve around a 58 percent success rate on tasks that can be completed in a single step without needing follow-up actions or more information. Using the benchmark tool CRMArena-Pro, the team also showed performance of LLM agents drops to 35 percent when a task requires multiple steps. Another cause for concern is highlighted in the LLM agents' handling of confidential information. "Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance," a paper published at the end of last month said.


(post is archived)


That doesn't necessarily make the AI CSAs worse than the human ones, in my experience.