What are text embeddings?

Text embeddings are a key reason why applications built on LLMs like ChatGPT (GPT-4) can contextualize information quickly and effectively. They are the standard way to incorporate external data and play a pivotal role in enhancing a model’s performance and capabilities.

Running open-source LLMs gives you full control over your data and costs, and the same goes for using open-source text embedding engines.

Some even claim open source can be better and cheaper (see the reference article below).


Why Consider Open Source Embeddings?

Open-source text embeddings offer several advantages over commercial solutions:

  1. Cost-Effectiveness: Open-source models are generally free to use, reducing operational costs.
  2. Customization: They can be fine-tuned to meet specific project requirements.
  3. Community Support: A large community of developers often supports open-source models, providing a wealth of resources and updates.
  4. Transparency: Open-source models offer more transparency, allowing you to understand and modify the model’s internals.

Importance of Text Embedding in LLMs

  • Text embedding models are crucial for LLMs in various applications like chatbots.
  • They convert text into vector representations that capture the text’s meaning, useful for tasks like classification, clustering, and information retrieval.
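To make the second point concrete, here is a minimal sketch of similarity-based retrieval using plain NumPy. The 4-dimensional vectors are made up for illustration; real embedding models emit hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: close to 1.0 = similar meaning, near 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy document embeddings (hypothetical values, for illustration only).
docs = {
    "refund policy": np.array([0.9, 0.1, 0.0, 0.2]),
    "shipping times": np.array([0.1, 0.8, 0.3, 0.0]),
}
# Hypothetical embedding of a query like "how do I get my money back?"
query = np.array([0.8, 0.2, 0.1, 0.1])

# Rank documents by similarity to the query; the top hit is the retrieval result.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])
```

The same ranking step powers classification (nearest labeled example), clustering (grouping nearby vectors), and retrieval for chatbots.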

Evaluating Text Embedding Models

  • Different embedding models yield different results, affecting the performance of AI applications.
  • Generic benchmark suites like BEIR and MTEB provide a common ground for evaluating embedding models.
  • Hugging Face hosts an MTEB leaderboard.
  • OpenAI’s text-embedding-ada-002 ranks 7th overall on the MTEB benchmark; it performs best in clustering but is unimpressive in other tasks.

Hugging Face’s Text Embeddings Inference (TEI)
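TEI is Hugging Face’s open-source toolkit for serving embedding models behind a fast HTTP API, which makes self-hosting a model like E5 straightforward. A minimal deployment sketch (the image tag and model id are assumptions; check the TEI documentation for current versions and hardware-specific images):

```shell
# Run TEI in Docker, serving an E5 checkpoint (model id is an assumption;
# swap in whichever embedding model you want to host).
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id intfloat/e5-small-v2

# Embed a sentence via the HTTP API; the response is a JSON array of float vectors.
curl 127.0.0.1:8080/embed \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What are text embeddings?"}'
```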

Introducing E5 Model by Microsoft

  • E5 (EmbEddings from bidirEctional Encoder rEpresentations; the five E’s give the model its name) is a text embedding model by Microsoft.
  • It surpasses the BM25 baseline on the BEIR retrieval benchmark in a zero-shot setting.
  • E5 is trained on a large corpus of text and code, capturing more nuanced semantic relationships.
  • It’s a small model, easy to host even on local machines, and performs faster than OpenAI’s model in certain benchmarks.
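One practical detail when hosting E5 yourself: the model family is trained with `query:` and `passage:` input prefixes, and skipping them degrades retrieval quality. A sketch using the `sentence-transformers` library (the `intfloat/e5-small-v2` checkpoint is an assumption; pick whichever E5 size fits your hardware):

```python
def with_prefix(texts: list[str], kind: str) -> list[str]:
    """Prepend the 'query: ' or 'passage: ' prefix that E5 models expect."""
    assert kind in ("query", "passage")
    return [f"{kind}: {t}" for t in texts]

if __name__ == "__main__":
    # Heavy part: downloads the model weights on first run.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/e5-small-v2")
    passages = with_prefix(["E5 is a text embedding model by Microsoft."], "passage")
    queries = with_prefix(["what is E5?"], "query")
    p_emb = model.encode(passages, normalize_embeddings=True)
    q_emb = model.encode(queries, normalize_embeddings=True)
    # With normalized vectors, the dot product is the cosine similarity.
    print(q_emb @ p_emb.T)
```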

Cost and Customization

  • OpenAI’s model is not fine-tunable, limiting customization.
  • E5 offers better control and can be fine-tuned to specific project needs.
  • Hosting E5 is cheaper than using OpenAI’s API. For example, processing 100 million tokens would cost $2.47 for E5 compared to $10 for OpenAI.
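The figures above work out to a roughly 4x price gap. A quick back-of-the-envelope check (both prices are taken from the quote above and will drift over time):

```python
TOKENS = 100_000_000      # 100 million tokens, as in the example above
COST_E5 = 2.47            # self-hosted E5, compute cost quoted above (USD)
COST_OPENAI = 10.00       # OpenAI embeddings API at the quoted rate (USD)

per_million_e5 = COST_E5 / (TOKENS / 1_000_000)          # cost per 1M tokens
per_million_openai = COST_OPENAI / (TOKENS / 1_000_000)  # cost per 1M tokens
ratio = COST_OPENAI / COST_E5

print(f"${per_million_e5:.4f}/M vs ${per_million_openai:.2f}/M "
      f"-> roughly {ratio:.1f}x cheaper")
```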

Conclusions

  • Choosing the right embedding model is crucial for the success of LLM applications.
  • Open-source solutions, such as an E5 model served with TEI, can outperform commercial models in both speed and cost.
  • Benchmark systems are useful but should be used as a guide, not an absolute measure.
  • E5’s versatility and performance make it a strong contender for various NLP tasks, offering both cost-effectiveness and the ability for fine-tuning.

Reference article: