In artificial intelligence (AI), a hallucination or artificial hallucination (also occasionally called confabulation or delusion) is a confident response by an AI that does not seem to be justified by its training data.
Why do AI hallucinations happen?
There are many possible technical reasons for hallucinations in LLMs. Although the inner workings of LLMs and the exact mechanisms that produce a given output are opaque, researchers point to several general causes, including the following:
- Data quality. Hallucinations from data occur when the source content contains bad information. LLMs rely on a large body of training data that can contain noise, errors, biases or inconsistencies. ChatGPT, for example, included Reddit in its training data.
- Generation method. Hallucinations can also arise from the training and generation methods, even when the data set is consistent and reliable. For example, bias introduced by the model’s previous generations can cause a hallucination, and so can faulty decoding by the transformer. Models might also be biased toward generic or specific words, which influences what they generate and fabricate (see the decoding sketch after this list).
- Input context. If the input prompt is unclear, inconsistent or contradictory, hallucinations can arise. While data quality and training are out of the user’s hands, input context is not. Users can hone their inputs to improve results (see the prompt example after this list).
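To make the decoding point concrete, here is a minimal sketch, not any specific model's implementation, of how decoding settings reshape a next-token probability distribution. The toy vocabulary and logit values are invented for illustration: the point is that the sampling strategy, not only the training data, influences which token the model asserts.

```python
# Toy illustration of how decoding choices reshape a next-token distribution.
# The vocabulary and logits below are invented for demonstration purposes.
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, scaled by a sampling temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# A generic filler token vs. more specific candidate tokens.
tokens = ["thing", "Paris", "Lyon", "Marseille"]
logits = [2.0, 1.8, 0.5, 0.3]       # the generic token is slightly favored

for t in (0.2, 1.0, 1.5):
    probs = softmax(logits, temperature=t)
    print(f"temperature={t}: " +
          ", ".join(f"{tok}={p:.2f}" for tok, p in zip(tokens, probs)))

# Low temperature sharpens the distribution around the top-scoring token
# (here the generic one); high temperature flattens it, so sampling is more
# likely to pick low-probability tokens. Either way, the decoding strategy
# shapes the output independently of what the training data supports.
```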
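And to illustrate the input-context point, the sketch below contrasts a vague prompt with a more constrained one. It assumes the OpenAI Python SDK (`pip install openai`) with an API key in the `OPENAI_API_KEY` environment variable; the model name and both prompts are placeholders chosen for illustration, not a recommended recipe.

```python
# A minimal prompt-refinement sketch, assuming the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vague_prompt = "Tell me about the bank."  # ambiguous: riverbank or financial institution?
clear_prompt = (
    "Summarize the main responsibilities of a central bank in three bullet "
    "points. If you are unsure about a detail, say so instead of guessing."
)

for prompt in (vague_prompt, clear_prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",                               # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,                                   # conservative decoding
    )
    print(response.choices[0].message.content)
```

The clearer prompt narrows the task, states the expected format and explicitly invites the model to admit uncertainty, all of which reduce the room for confident fabrication.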