Meta Platforms quietly unveiled Llama 2 Long in a research paper on arXiv.org. The new AI model is an extension of Meta’s open-source Llama 2, trained on longer sequences and on a dataset in which long texts are upsampled. As a result, Llama 2 Long outperforms some leading AI models, including OpenAI’s GPT-3.5 Turbo and Claude 2, at generating responses to long user prompts.
Takeaways:
- Extended Capabilities: Llama 2 Long is an advanced version of Meta’s open-source Llama 2, designed to handle longer training sequences.
- Impressive Performance: The model outperforms GPT-3.5 Turbo (16,000-token context window) and Claude 2 (100,000-token context window) in generating responses to long user prompts.
- Technical Innovations: The architecture remains largely the same as the original Llama 2 but includes a crucial modification to the Rotary Positional Embedding (RoPE) encoding, allowing the model to attend to longer sequences effectively (a minimal sketch of the idea follows this list).
- Training Techniques: Meta researchers used Reinforcement Learning from Human Feedback (RLHF) and synthetic data generated by Llama 2 Chat to improve its performance across various tasks.
- Open-Source Validation: The AI community has shown admiration and excitement for Llama 2 Long, validating Meta’s open-source approach in generative AI.
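To make the RoPE modification mentioned above concrete, here is a minimal sketch of rotary positional embeddings with an adjustable base frequency: raising the base slows the per-dimension rotation so distant positions keep distinguishable angles over long contexts. The specific values (head dimension, context lengths, and the raised base) are illustrative assumptions, not the paper’s exact recipe.

```python
import torch

def rope_angles(head_dim: int, max_len: int, base: float = 10000.0) -> torch.Tensor:
    """Rotation angles for Rotary Positional Embedding (RoPE).

    A larger `base` reduces the rotation speed of each dimension pair,
    which is the kind of adjustment used to stretch RoPE to longer contexts.
    """
    # Per-pair inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_len).float()
    # One angle per (position, dimension pair).
    return torch.outer(positions, inv_freq)  # shape: (max_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate query/key vectors `x` of shape (seq_len, head_dim) by `angles`."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Illustrative comparison: a standard short-context setup vs. a long-context
# variant with a raised base frequency (hypothetical numbers).
short_ctx_angles = rope_angles(head_dim=128, max_len=4096, base=10000.0)
long_ctx_angles = rope_angles(head_dim=128, max_len=32768, base=500000.0)
```

The key design point is that only the positional encoding changes; the attention mechanism and the rest of the Llama 2 architecture stay intact, which is why the extension can be trained by continuing from existing checkpoints rather than starting from scratch.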