After months of speculation and delays, Meta officially unveiled Llama 4 on April 5, 2025: the next generation of its open-source language model lineup. But this isn't just another update. The release introduces two standout models, Llama 4 Scout and Llama 4 Maverick, marking the debut of Meta's fully redesigned multimodal architecture. The big question: is this a breakthrough moment in AI, or a calculated step in Meta's long-term strategy as competition in the AI space heats up?
Here are the key takeaways you'll walk away with from this article:
Meta officially announces the release of Llama 4
A closer look at the specs and roles of Llama 4 Scout and Maverick, plus a sneak peek at Behemoth
Is Llama 4 just an upgrade—or a strategic leap in Meta’s broader AI roadmap?
Meta has officially launched Llama 4, its latest large language model (LLM) family, built on a state-of-the-art mixture-of-experts (MoE) architecture. Unlike previous generations, Llama 4 is natively multimodal: it can process and integrate text, images, and video, allowing content to flow between formats. For the first time, a Llama model can handle text and pictures in the same prompt, offering much stronger visual language understanding.
The mixture-of-experts (MoE) architecture Meta used to train Llama 4. Source: Meta
The new models have already been integrated into Meta's AI-powered products like WhatsApp, Messenger, Instagram Direct, and the Meta AI website. They’re also available on cloud platforms such as Azure AI Foundry and Azure Databricks, enabling easy access for developers.
Despite the excitement, development wasn't without challenges. Meta faced performance issues during internal tests, especially around reasoning and mathematical tasks. To address this, Meta reworked its training approach, including the move to the MoE architecture, before finally launching Llama 4, whose release had been pushed back after earlier versions fell short of the company's technical expectations.
In addition to Llama 4, Meta introduced two new models—Llama 4 Scout and Llama 4 Maverick—as part of the release. These new models bring more specialized capabilities for different use cases, further pushing the boundaries of what’s possible with AI. Meta also unveiled a preview of Behemoth, an upcoming model that promises to revolutionize the AI space even further.
The Llama 4 hub featuring the latest models: Scout, Maverick, and a preview of Behemoth. Source: Meta
Llama 4 Scout is Meta's highest-performing small model, with 17 billion active parameters and 16 experts. It offers impressive speed and multimodal capabilities, handling both text and images. With a 10 million token context window (roughly 7.5 million words of English text), it can also run efficiently on a single GPU, such as the NVIDIA H100.
Despite its impressive specs, Meta has not explained how Scout handles complex queries beyond simple text retrieval. The model's use of the dated "Needle in the Haystack" benchmark to validate the context window raises some concerns, especially with more demanding long-context benchmarks now available. Additionally, while Scout supports a 10M token context, it was trained with only a 256K-token context, meaning the advertised window rests on length generalization rather than direct training results.
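For readers unfamiliar with the benchmark: a "Needle in the Haystack" test buries one known fact inside long filler text and checks whether the model can retrieve it. Here is a minimal sketch of such a harness; the query_model callable, the filler text, and the depth sweep are all illustrative stand-ins, not Meta's actual evaluation code.

```python
def needle_test(query_model, needle, question, expected, filler, depth=0.5):
    """Bury `needle` at a relative `depth` in filler text, then quiz the model."""
    pos = int(len(filler) * depth)
    prompt = " ".join(filler[:pos] + [needle] + filler[pos:])
    prompt += f"\n\nQuestion: {question}\nAnswer:"
    return expected.lower() in query_model(prompt).lower()

# Sweep the needle across depths to probe different parts of the context window.
filler = ["The sky was grey and the office was quiet."] * 10_000
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    found = needle_test(
        lambda prompt: "...",                  # replace with a real inference call
        needle="The secret code is 7421.",
        question="What is the secret code?",
        expected="7421",
        filler=filler,
        depth=depth,
    )
    print(f"depth {depth:.2f}: {'found' if found else 'missed'}")
```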
According to Meta, Llama 4 Scout is the strongest in coding tasks. Source: Meta
Scout uses Meta's MoE architecture, activating only a subset of parameters for each token, which makes it more compute-efficient and scalable than dense models, where every parameter is used for every token.
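To make that concrete, here is a minimal sketch of top-k expert routing in PyTorch. The dimensions, expert count, and top_k value are illustrative defaults, not Llama 4's actual configuration (which also includes details such as a shared expert that this sketch omits).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token runs through only its top-k experts."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (n_tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)    # routing probabilities
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_i[:, k] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k, None] * expert(x[mask])
        return out  # only top_k of n_experts did any work for each token

# Usage: 8 tokens of width 512 flow through the layer; most experts stay idle.
tokens = torch.randn(8, 512)
print(MoELayer()(tokens).shape)  # torch.Size([8, 512])
```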
Meta also highlights Scout’s multimodal abilities. It was pre-trained on text, image, and video data, enabling it to process text and visual inputs together. In tasks like visual grounding and VQA (visual question answering), Scout outperforms previous Llama models and competes well against larger systems.
Scout's benchmark scores are impressive compared to other Llama models and competing AI models. Source: Meta
Meta presents Llama 4 Maverick as the top multimodal model in its class, outperforming GPT-4o and Gemini 2.0 Flash across various benchmarks. It also achieves performance comparable to DeepSeek V3 on reasoning and coding tasks, but with fewer active parameters. Maverick offers a strong performance-to-cost ratio, scoring an Elo of 1417 on LMArena, and can run on a single host.
Like Llama 4 Scout, Maverick uses 17 billion active parameters, but it draws on a total of 400 billion parameters distributed across 128 experts. Its MoE architecture keeps computation efficient by activating only a subset of those experts for each input token. Even so, Maverick's scale means deployment requires a full NVIDIA DGX H100 server with eight GPUs. It supports a context window of up to 1 million tokens.
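The gap between active and total parameters is the whole point of the design, and the arithmetic is easy to check. A quick back-of-envelope using the headline figures above (approximate figures only; Meta hasn't published an exact per-expert breakdown):

```python
# Headline figures from the announcement (approximate).
total_params  = 400e9   # across all 128 experts
active_params = 17e9    # actually used for each token
n_experts     = 128

print(f"active fraction per token: {active_params / total_params:.1%}")  # about 4%
print(f"rough size per expert:     {total_params / n_experts / 1e9:.1f}B params")
```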
In short, Meta claims benchmark wins over two of the strongest proprietary models while using significantly fewer active parameters. With its impressive Elo score and powerful capabilities, Maverick is set to be a key player in the AI field.
While Scout pushes the boundaries of context length, Maverick is designed to provide consistent, high-quality performance across a diverse range of tasks.
Additionally, Maverick benefited from co-distillation with Llama 4 Behemoth, Meta's larger internal model. This process enhanced Maverick’s performance, particularly in reasoning and chat quality, without incurring additional training costs.
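Meta hasn't published the exact loss it used for this codistillation, but the classic form of the idea is easy to sketch: the student (Maverick) trains on a blend of hard labels and the teacher's (Behemoth's) softened output distribution. The temperature and blend weight below are common illustrative defaults, not Meta's values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the teacher's softened distribution."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # standard temperature rescaling
    return alpha * hard + (1 - alpha) * soft

# Usage with a toy batch: 4 positions over a 100-token vocabulary.
s, t = torch.randn(4, 100), torch.randn(4, 100)
labels = torch.randint(0, 100, (4,))
print(distillation_loss(s, t, labels))
```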
Benchmark results show Llama 4 Maverick performing strongly on reasoning and chat-quality tasks. Source: Meta
Meta has also given a sneak peek at Llama 4 Behemoth, its most powerful model to date and, by Meta's account, one of the smartest LLMs in the world. Behemoth reportedly outperforms models like GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro across several STEM benchmarks, showcasing its advanced capabilities.
Scout and Maverick are both distilled from Llama 4 Behemoth, which is still in training. Meta has revealed that Behemoth is built with 288 billion active parameters, 16 experts, and nearly two trillion total parameters, though it is still being fine-tuned.
While Llama 4 Behemoth is not yet fully ready, Meta is already sharing details about its progress. Pre-training runs in FP8 precision on 32K GPUs, sustaining 390 TFLOPs per GPU. The training dataset comprises over 30 trillion tokens, more than double what was used for Llama 3, spanning diverse text, image, and video content.
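Those numbers imply a staggering compute budget. A back-of-envelope estimate using the common "about 6 FLOPs per parameter per token" rule of thumb for transformer training (for an MoE model the active, not total, parameter count is what matters; everything here is approximate):

```python
# Publicly stated figures: 288B active parameters, 30T+ tokens, 32K GPUs at 390 TFLOPs.
active_params = 288e9
tokens        = 30e12
train_flops   = 6 * active_params * tokens          # ~6*N*D rule of thumb

cluster_rate = 32_000 * 390e12                      # sustained FLOP/s across the cluster
days = train_flops / cluster_rate / 86_400
print(f"~{train_flops:.1e} FLOPs, roughly {days:.0f} days on the stated cluster")  # ~48 days
```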
Initial specifications of Llama 4 Behemoth. Source: Meta
In post-training, Meta refined the model by discarding 95% of the SFT examples, focusing instead on difficult prompts and reinforcing complex reasoning, coding, and multilingual skills. Dynamic filtering was also used to drop low-value prompts during reinforcement learning, pushing Behemoth toward high-quality, efficient outputs.
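Meta hasn't published its filtering pipeline, but the underlying idea of keeping only the hard examples can be sketched in a few lines: score each example by how easily a reference model already solves it, then keep the hardest slice. The pass_rate scoring function and the 5% cutoff below are illustrative assumptions.

```python
def prune_easy_examples(examples, pass_rate, keep_fraction=0.05):
    """Keep only the hardest `keep_fraction` of SFT examples.

    `pass_rate(example)` should return how often a reference model already
    answers the example correctly; a high pass rate means the example is easy.
    """
    ranked = sorted(examples, key=pass_rate)         # hardest (lowest pass rate) first
    cutoff = max(1, int(len(ranked) * keep_fraction))
    return ranked[:cutoff]

# Usage with toy data: examples tagged with a precomputed pass rate.
data = [{"prompt": f"q{i}", "pass_rate": i / 100} for i in range(100)]
hard = prune_easy_examples(data, pass_rate=lambda ex: ex["pass_rate"])
print(len(hard), hard[0])  # 5 {'prompt': 'q0', 'pass_rate': 0.0}
```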
As OpenAI, DeepSeek, and xAI continue to push the boundaries of AI in early 2025 with GPT-4.5, DeepSeek R1, and Grok 3, Meta's release of Llama 4, alongside Scout and Maverick, feels less like a routine update and more like a bold shift in direction. With a sharper focus on efficiency, cost, and global adaptability, this launch positions Meta not just to compete, but to redefine how AI is built and deployed.
Here are key insights from Rabiloo’s experts on what makes this release stand out:
Smaller, faster, smarter
One of the standout features of Llama 4 Scout is its compact design—built with 17 billion active parameters, making it relatively small by today’s standards. But what it lacks in size, it makes up for in performance. Thanks to an impressively large context window, Scout delivers strong results on long, complex inputs.
Its architecture is based on MoE, a system that activates only the most relevant experts for each task rather than the entire model. This not only improves speed but also significantly reduces compute costs.
As AI researcher Ivan Badeev points out, “With enough context, Llama 4 Scout’s performance on specific applied tasks could be significantly better than many state-of-the-art models.” It’s clear that Meta is aiming to build models that are not just powerful—but practical and scalable for real-world use.
A tactical focus on cost
Meta is also addressing one of the biggest blockers to AI deployment: cost. According to the company, Llama 4 Maverick can run at just 19 to 49 cents per million tokens, compared to $4.38 for GPT-4o. Google's budget-oriented Gemini 2.0 Flash comes in at 17 cents, while DeepSeek v3.1 is priced at 48 cents, putting Maverick's range right alongside the cheapest alternatives.
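To put those per-token prices in perspective, here is what a hypothetical workload of one billion tokens a month would cost at each quoted rate (the rates are the ones cited above; the volume is just an example):

```python
# Quoted prices per million tokens (USD), as cited above.
prices = {
    "Llama 4 Maverick (low)":  0.19,
    "Llama 4 Maverick (high)": 0.49,
    "Gemini 2.0 Flash":        0.17,
    "DeepSeek v3.1":           0.48,
    "GPT-4o":                  4.38,
}

monthly_tokens = 1_000_000_000        # illustrative workload: 1B tokens/month
for model, per_million in prices.items():
    cost = per_million * monthly_tokens / 1e6
    print(f"{model:>24}: ${cost:,.0f}/month")
```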
Though Meta isn’t selling access to these models commercially, cost-efficiency remains a core part of its internal strategy. As Wipro’s Chintan Mota points out, “The infrastructure, the inference, the lock-in—it all adds up.” Meta is aiming to solve exactly that.
More than just architecture
Beyond performance and pricing, Meta has also introduced a new training technique called MetaP, which helps set hyperparameters such as per-layer learning rates and initialization scales with greater reliability. This method reflects Meta's ambition not just to keep up, but to innovate at the foundational level of model development.
A global-first vision
Meta’s strategy isn’t only about efficiency—it’s also about reach. Llama 4 was pre-trained on 200 languages, over 100 of which had more than 1 billion tokens each. That’s 10 times more multilingual data than Llama 3 used, highlighting a clear move toward building models that perform well across global use cases—not just English.
There are several ways to try out and work with Llama 4, whether you're a developer, researcher, or casual user.
You can download the models directly from Meta's official Llama website, where you'll also find documentation and licensing details.
For developers looking to deploy at scale, Llama 4 is available on cloud platforms like Azure AI Foundry and Azure Databricks, making it easy to integrate into existing infrastructures.
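For a quick experiment outside those platforms, the checkpoints are also distributed through Hugging Face. Here is a minimal sketch using the transformers pipeline API; the model ID shown is the Scout instruct checkpoint name as listed at launch, so verify it (and the license gating) on the hub, and note the hardware requirements discussed earlier:

```python
from transformers import pipeline

# Model ID assumed from the launch-day Hugging Face listing; verify before use.
chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    device_map="auto",       # shard the weights across whatever GPUs are available
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Summarize the Llama 4 release in two sentences."}]
print(chat(messages, max_new_tokens=128)[0]["generated_text"])
```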
If you're interested in API access, you can visit the Meta AI developer platform to explore tools, resources, and integration options. While there’s currently no standalone public API for Meta AI chat, developer access is expected to expand over time.
For everyday users, Meta AI is already live across several Meta platforms, including WhatsApp, Messenger, Instagram, and Facebook. To use it, just log in with a Meta account and start chatting—no setup required. Note that availability may vary by region and account type.
The launch of Llama 4, along with the Scout and Maverick models, marks a solid step forward for Meta in the AI race. Each model brings something different to the table—whether it’s speed, context handling, or balanced performance. With growing support across platforms and early signs of what’s coming next with Behemoth, Llama 4 isn’t just a technical release—it’s a sign of where Meta is headed with its AI strategy.