“A recent Gartner poll shows that while 55% of organizations are experimenting with generative AI, only 10% have put generative AI into production. One of the biggest barriers to productionizing LLMs is dealing with their tendency to produce bogus outputs known as hallucinations, which precludes their use in applications where correct outputs are necessary (i.e., most applications)! Despite this, some organizations have deployed these unreliable LLMs, sometimes with catastrophic results. Air Canada’s chatbot hallucinated refund policies, eventually resulting in the airline being held responsible for the misinformation and being ordered by a tribunal to refund a customer; the chatbot has since been taken down. A federal judge fined a law firm after their lawyers used ChatGPT to draft a brief full of fabricated citations. New York City’s “MyCity” chatbot has been hallucinating wrong answers to business owners’ questions about local laws.”
https://cleanlab.ai/blog/trustworthy-language-model/
Do we have any hope of overcoming this? One startup claims it has a good answer.
"LLMs will always have some hallucinations, but by providing a trustworthiness score with every output, Cleanlab TLM lets you identify when the LLM is hallucinating."
https://cleanlab.ai/blog/trustworthy-language-model/
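Concretely, the idea is that every LLM call returns not just a response but a numeric trustworthiness score, which downstream code can use to decide whether an answer is safe to show or should be escalated. The sketch below illustrates that pattern under stated assumptions: it assumes a Python client whose prompt() call returns a response plus a trustworthiness_score between 0 and 1 (the names mirror Cleanlab's documented interface), while the import, threshold, and escalation logic are illustrative choices of ours, not Cleanlab's recommendations.

```python
# Illustrative sketch: gate an LLM answer on its trustworthiness score.
# Assumes a client in the spirit of Cleanlab TLM whose prompt() returns a
# dict with "response" and "trustworthiness_score" (0.0-1.0). The import,
# threshold, and escalation logic are assumptions for illustration only.
from cleanlab_tlm import TLM  # assumed client; an API key is presumed configured via environment

TRUST_THRESHOLD = 0.8  # illustrative cutoff; tune per application


def answer_or_escalate(question: str) -> str:
    tlm = TLM()
    result = tlm.prompt(question)
    response = result["response"]
    score = result["trustworthiness_score"]

    if score >= TRUST_THRESHOLD:
        # High-confidence output: safe to return directly.
        return response
    # Low-confidence output: flag it for human review instead of
    # presenting a possible hallucination as fact.
    return f"[Needs human review, trust={score:.2f}] {response}"


if __name__ == "__main__":
    print(answer_or_escalate("What is your refund policy for bereavement fares?"))
```

In a deployment like Air Canada's chatbot, the low-score branch is the one that matters: rather than asserting a hallucinated refund policy, the system would route the question to a human agent or answer with an explicit caveat.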