The world of software development is being revolutionized by Large Language Models (LLMs), presenting a new primitive that differs significantly from traditional computing resources. Given their novelty and distinct behavior, integrating LLMs into application development requires a unique approach. Let’s delve into the emerging architecture for LLM applications, based on insights from AI startups and tech companies.
The LLM Application Stack
The LLM application stack comprises various systems, tools, and design patterns. While this architecture is still evolving, it currently includes:
- Data Pipelines: Tools like Databricks and Airflow are used for ETL processes.
- Embedding Models: OpenAI, Cohere, and Hugging Face are key players in this space.
- Vector Databases: Pinecone, Weaviate, and ChromaDB stand out for efficient storage and retrieval of embeddings.
- Orchestration Frameworks: LangChain and LlamaIndex help manage the complexities of prompt chaining and interfacing with external APIs.
- APIs/Plugins: OpenAI, Serp, and Wolfram provide the interfaces that connect LLMs to external tools and data.
- LLM Caches: Redis and SQLite are used to cache model responses, improving latency and reducing cost (a minimal caching sketch follows this list).
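To make the caching layer concrete, here is a minimal sketch of an exact-match LLM cache backed by Redis. The key scheme, the TTL, and the `call_llm` placeholder are illustrative assumptions, not details from the original post:

```python
import hashlib

import redis  # pip install redis; assumes a Redis server on localhost

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
TTL_SECONDS = 3600  # illustrative expiry; tune to how stable your prompts are

def cache_key(model: str, prompt: str) -> str:
    # Hash model + prompt so identical requests map to the same Redis key.
    return "llmcache:" + hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()

def cached_completion(model: str, prompt: str, call_llm) -> str:
    """Return a cached response when available; otherwise call the LLM and store the result."""
    key = cache_key(model, prompt)
    hit = cache.get(key)
    if hit is not None:
        return hit  # cache hit: no LLM call, no token cost
    response = call_llm(model, prompt)  # call_llm stands in for your LLM client
    cache.setex(key, TTL_SECONDS, response)
    return response
```

Note that exact-match caching only pays off when prompts repeat verbatim; semantic caches relax this by keying on embeddings of the prompt instead.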
The Design Pattern: In-context Learning
In-context learning is a pivotal design pattern in this stack, allowing developers to use LLMs off-the-shelf and control their behavior through strategic prompting. This approach involves embedding private data, constructing relevant prompts, and executing these prompts via pre-trained LLMs for inference.
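As a rough illustration of the pattern, the sketch below assembles a prompt from retrieved documents before sending it to an off-the-shelf model; the template wording and function name are illustrative, not taken from the original post:

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Assemble an in-context-learning prompt: retrieved private data plus the user's question."""
    context = "\n\n".join(f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The model itself stays frozen; all of the application-specific behavior lives in the prompt.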
The Workflow: Data Preprocessing to Inference
- Data Preprocessing/Embedding: Private data is chunked, passed through an embedding model, and stored in a vector database.
- Prompt Construction/Retrieval: When a user submits a query, the most relevant documents are retrieved from the vector database and assembled into a prompt.
- Prompt Execution/Inference: The compiled prompt is submitted to a pre-trained LLM for inference, with operational systems such as logging and caching integrated at this stage (a minimal end-to-end sketch of all three steps follows this list).
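The sketch below ties the three steps together end to end, using the OpenAI Python client for embeddings and inference, and a NumPy cosine-similarity search as an in-memory stand-in for a real vector database; the model names and the retrieval setup are illustrative assumptions:

```python
import numpy as np
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"  # illustrative embedding model
CHAT_MODEL = "gpt-4o-mini"              # illustrative chat model

def embed(texts: list[str]) -> np.ndarray:
    # Step 1: preprocessing/embedding -- turn documents into vectors.
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 2) -> list[str]:
    # Step 2: retrieval -- cosine similarity over an in-memory index
    # (a real system would query Pinecone, Weaviate, ChromaDB, etc.).
    sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def answer(question: str, docs: list[str]) -> str:
    doc_vecs = embed(docs)
    query_vec = embed([question])[0]
    context = "\n\n".join(retrieve(query_vec, doc_vecs, docs))
    # Step 3: prompt execution/inference -- logging and caching would wrap this call.
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```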
The Future Outlook
As the underlying technology of LLMs advances, the architecture for LLM applications is expected to undergo substantial changes. This evolving landscape represents a shift from AI being a specialized domain to becoming a more accessible, data engineering-centric field.
Conclusion
The emergence of LLMs as a new foundation in software development is reshaping how applications are built. The current architecture, centered around in-context learning, offers a versatile framework for integrating LLMs into various applications. As this technology continues to evolve, it promises to bring more profound changes and innovations in the field of AI and software development.
For a more detailed exploration of this topic, see the original post by Matt Bornstein and Rajko Radovanovic at Andreessen Horowitz.
Flow/Diagram Summary:
Data Pipelines:
- Tools: Databricks, Airflow
- Function: ETL Processes
Embedding Models:
- Providers: OpenAI, Cohere, Hugging Face
- Role: Processing and Embedding Data
Vector Databases:
- Examples: Pinecone, Weaviate, ChromaDB
- Purpose: Storing and Retrieving Embeddings
Orchestration Frameworks:
- Key Frameworks: LangChain, LlamaIndex
- Task: Managing Prompt Chaining and API Interfacing
APIs/Plugins:
- Examples: OpenAI, Serp, Wolfram
- Function: Connecting LLMs to External Tools and Data
LLM Caches:
- Tools: Redis, SQLite
- Purpose: Caching Responses to Improve Latency and Cost
In-context Learning Pattern:
- Approach: Off-the-shelf LLMs Steered via Strategic Prompting
- Steps: Embed Data → Retrieve Context → Construct Prompt → Infer