
LLM reference

The following sections provide, for reference, brief descriptions of each available LLM. See the LLM availability page for additional details, including each model's maximum context window, maximum completion tokens, and chat model ID. Many descriptions come from the provider websites.

Amazon Bedrock-provided LLMs

Meta Llama is licensed under the Meta Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

LLM Description
Anthropic Claude 2.1 A generative LLM for conversations, question answering, and workflow automation. This is an earlier Claude model that has been superseded by the Claude 3 family.
Anthropic Claude 3 Haiku A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models are the most efficient in the family and have the lowest cost.
Anthropic Claude 3 Opus A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Anthropic Claude 3 Sonnet A generative LLM for conversations, question answering, and workflow automation. Sonnet balances speed, intelligence, and price.
Anthropic Claude 3.5 Haiku v1 A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Anthropic Claude 3.5 Sonnet v1 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development.
Anthropic Claude 3.5 Sonnet v2 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development. The version 2 upgrade can generate computer actions—for example, keystrokes and mouse clicks—accomplishing tasks that require hundreds of steps.
Anthropic Claude 3.7 Sonnet v1 A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model and was the most advanced model before the Claude 4 family launched.
Anthropic Claude Opus 4 A generative LLM that excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, and tasks that require complex problem solving.
Anthropic Claude Sonnet 4 A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. An upgrade to Sonnet 3.7, it offers high performance that is practical for most AI use cases, including user-facing AI assistants and high-volume tasks.
Cohere Command R A large language model optimized for conversational interaction and long context tasks. It targets the "scalable" category of models.
Cohere Command R Plus Cohere's most advanced model for complex RAG and multi-step agent workflows, with 128K context and strong multilingual support.
DeepSeek R1 v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. Use DeepSeek R1 for work that involves mathematical computation, code development, or complex logical reasoning.
Meta Llama 3 family: These are transformer-based models with 8 and 70 billion parameters, respectively, optimized for standard NLP tasks.
Meta Llama 3 8B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3 70B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 family: These expand the context length over the Llama 3 family and are stronger at math, logic, and reasoning problems. They support several advanced use cases, including long-form text summarization, multilingual conversational agents, and coding assistants.
Meta Llama 3.1 8B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 70B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 405B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the largest model in the series, and provides the highest flexibility and state-of-the-art capabilities that rival even the best closed-source models.
Meta Llama 3.2 family: Introduces small and medium-sized vision LLMs, and lightweight, text-only models that fit onto edge and mobile devices.
Meta Llama 3.2 1B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 3B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 11B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 90B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.3 70B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. It matches the performance of Llama 3.1 (405B) on key benchmarks while being significantly smaller, which allows developers to run the model on standard workstations, reducing operational costs.
Meta Llama 4 Maverick 17B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. It can work with multiple input types like images, audio, and video along with text and documents.
Meta Llama 4 Scout 17B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. It enables a variety of use cases, including multi-document summarization and parsing extensive documents.
Mistral 7B Instruct v0 A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the smallest and most efficient of the Mistral family, best for lightweight applications, edge deployment, and basic chatbots.
Mistral Large 2402 v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the largest and most capable model in the family, but also the most resource-intensive. It is good for complex reasoning tasks, advanced applications, and research.
Mistral Small 2402 v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. This model is optimized for cost-efficiency, balancing performance and cost for production applications.
Mixtral 8x7B Instruct v0 A generative LLM for tasks like code generation, data analysis, and natural language processing. The 8x7B Instruct model provides high performance with better efficiency than equivalent dense models.
Amazon Nova Lite A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos. It excels at real-time customer interactions, document analysis, and visual question answering.
Amazon Nova Micro A text-to-text understanding foundation model that is multilingual and can reason over text. It delivers the lowest latency responses at very low cost.
Amazon Nova Premier A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos. It is best suited to the most complex reasoning tasks and to serving as a teacher model for distillation.
Amazon Nova Pro A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos and provides the family's best combination of accuracy, speed, and cost. It is best for complex multimodal tasks requiring high accuracy.
Amazon Titan A generative LLM for tasks such as summarization, text generation, classification, open-ended Q&A, and information extraction.
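All of the Bedrock-provided models above are invoked through the same runtime interface. The sketch below shows the shape of a Converse API request; the model ID and the inference parameters are illustrative assumptions, so look up the exact chat model ID on the LLM availability page before use.

```python
# Sketch of a Bedrock Converse API request for one of the models listed
# above. The model ID below is an illustrative assumption; check the LLM
# availability page for the exact chat model ID.

def build_converse_request(model_id: str, prompt: str) -> dict:
    """Build the keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",  # assumed ID; verify before use
    "Summarize this ticket in one sentence.",
)

# With boto3 installed and AWS credentials configured, this request would
# be sent as:
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**request)
#   text = response["output"]["message"]["content"][0]["text"]
```

Because every model in the table shares this request shape, switching models is usually just a matter of swapping the `modelId` string.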
Anthropic-provided LLMs
LLM Description
Anthropic Claude 3 Haiku A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models are the most efficient in the family and have the lowest cost.
Anthropic Claude 3 Opus A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Anthropic Claude 3.5 Haiku A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Anthropic Claude 3.5 Sonnet v1 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development.
Anthropic Claude 3.5 Sonnet v2 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development. The version 2 upgrade can generate computer actions—for example, keystrokes and mouse clicks—accomplishing tasks that require hundreds of steps.
Anthropic Claude 3.7 Sonnet A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. Claude 3.7 Sonnet is Anthropic's first hybrid reasoning model and was the most advanced model before the Claude 4 family launched.
Anthropic Claude Opus 4 A generative LLM that excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, and tasks that require complex problem solving.
Anthropic Claude Sonnet 4 A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. An upgrade to Sonnet 3.7, it offers high performance that is practical for most AI use cases, including user-facing AI assistants and high-volume tasks.
Azure OpenAI-provided LLMs
LLM Description
OpenAI GPT-3.5 Turbo The most capable and cost-effective model in the GPT-3.5 family. It has been optimized for chat and works well for traditional completion tasks as well.
OpenAI GPT-4 Turbo The most capable and cost-effective model in the GPT-4 family. It has been optimized for chat and works well for traditional completion tasks as well.
OpenAI GPT-4o mini The most advanced model in the small models category, enabling a broader range of AI applications with its low cost and low latency. In the appropriate use cases, it should be considered as a replacement for GPT-3.5 Turbo series models.
OpenAI GPT-4o The flagship multimodal model that is more cost-effective and faster than GPT-4 Turbo. It can process text and images, and can generate text outputs with superior performance across multiple languages.
OpenAI o3-mini A next-generation reasoning model designed for complex problem-solving with improved reasoning capabilities and efficiency.
OpenAI o4-mini An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. It offers solid performance across math, code, and multimodal tasks, while cutting inference costs by an order of magnitude compared to o3.
OpenAI o1 An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. It excels in complex reasoning tasks, and remains OpenAI's broader general knowledge reasoning model, but at higher resource consumption and cost.
OpenAI o3 An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. This is the full-size model and, alongside o4-mini, the most intelligent in the family.
OpenAI o1-mini A fast, cost-efficient reasoning model particularly effective at coding, math, and science. It offers strong performance on many reasoning tasks while being significantly more cost-effective than o1-preview.
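The Azure OpenAI models above all share the chat-completions request shape, sketched below. Note that in Azure you address a model by the deployment name you created for it, not the raw model name; the deployment name here is a placeholder assumption.

```python
# Sketch of an Azure OpenAI chat-completions payload for the models listed
# above. The deployment name is a placeholder assumption.

def build_chat_request(deployment: str, system: str, user: str) -> dict:
    """Build keyword arguments for chat.completions.create()."""
    return {
        "model": deployment,  # your Azure deployment name, e.g. "gpt-4o-mini"
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": 256,
    }

request = build_chat_request(
    "gpt-4o-mini",  # assumed deployment name; substitute your own
    "You are a concise assistant.",
    "List three use cases for a reasoning model.",
)

# With the openai package installed, this request would be sent as:
#   client = AzureOpenAI(azure_endpoint=..., api_key=..., api_version=...)
#   response = client.chat.completions.create(**request)
#   text = response.choices[0].message.content
```

One caveat: the o-series reasoning models expect `max_completion_tokens` in place of `max_tokens`, so adjust the payload accordingly when targeting o1, o3, or their mini variants.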
Google VertexAI-provided LLMs

Meta Llama is licensed under the Meta Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

LLM Description
Claude 3 Haiku A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models are the most efficient in the family and have the lowest cost.
Claude 3 Opus A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Claude 3.5 Haiku A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Claude 3.5 Sonnet Outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet.
Claude 3.5 Sonnet v2 A state-of-the-art model for real-world software engineering tasks and agentic capabilities.
Claude 3.7 Sonnet Anthropic's most intelligent model to date and the first Claude model to offer extended thinking—the ability to solve complex problems with careful, step-by-step reasoning.
Claude Opus 4 Anthropic's most powerful model yet and the state-of-the-art coding model. Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Claude Sonnet 4 Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents.
Google Gemini 1.5 Flash A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports grounding with Google Search, function calling, and system instructions for various tasks including classification, sentiment analysis, entity extraction, and summarization. Flash is optimized for speed and efficiency, making it well-suited for applications that require quick response times and low latency.
Google Gemini 1.5 Pro A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports grounding with Google Search, function calling, and system instructions for various tasks including classification, sentiment analysis, entity extraction, and summarization. Pro is optimized for quality and accuracy in complex tasks.
Google Gemini 2.0 Flash A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions.
Google Gemini 2.0 Flash Lite The fastest and most cost-efficient Flash model. It's a multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It's an upgrade path for 1.5 Flash users who want better quality for the same price and speed.
Llama 3.1 8B Instruct MAAS An 8B multilingual LLM optimized for multilingual dialogue use cases. This is the smallest and most efficient model in the family, optimized for speed and cost-effectiveness. Best used for chat applications, simple content generation, edge deployment, and high-throughput scenarios where cost matters.
Llama 3.1 70B Instruct MAAS A 70B multilingual LLM optimized for multilingual dialogue use cases. It is a mid-tier model balancing capability and efficiency; it is significantly more capable than the 8B while remaining practical for most use cases, such as production applications requiring good reasoning, content creation, coding assistance, and most enterprise use cases.
Llama 3.1 405B Instruct MAAS A 405B multilingual LLM optimized for multilingual dialogue use cases. It is the largest and most capable model in the Llama 3.1 family and offers state-of-the-art performance for research, complex analysis, advanced reasoning tasks, and applications requiring the highest quality outputs.
Llama 3.2 90B Vision Instruct A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis, as well as image captioning.
Llama 3.3 70B Instruct A text-only 70B instruction-tuned model that provides enhanced performance relative to previous Llama models when used for text-only applications. It is an advanced LLM for reasoning, math, general knowledge, and function calling, often used for multilingual chat, coding assistance, and synthetic data generation.
Llama 4 Maverick 17B 128E Instruct MAAS Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities. It is best for complex multimodal tasks requiring both text and image understanding.
Llama 4 Scout 17B 16E Instruct MAAS A multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion, delivering state-of-the-art results for its size class. It is best for multimodal tasks where efficiency is prioritized over maximum capability—applications needing balanced performance.
Mistral Codestral 2501 A cutting-edge model that's designed for code generation, including fill-in-the-middle and code completion. It is lightweight, fast, and proficient in over 80 programming languages.
Mistral Large 2411 The next version of the Mistral Large (24.07) model with improved reasoning and function calling capabilities. It is for enterprise applications needing the highest quality outputs, advanced function calling and API integration, and multilingual applications with top-tier performance.
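The Gemini models in the Vertex AI section above take a `contents` request shape rather than a `messages` list; a minimal sketch follows. The model ID and parameter values are illustrative assumptions, and the exact configuration keys vary slightly between SDK versions, so treat this as the general shape rather than a definitive request.

```python
# Sketch of a Vertex AI Gemini request body. The model ID below is an
# illustrative assumption; check the LLM availability page for exact IDs.

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the general shape of a Gemini generate-content request."""
    return {
        "model": model,
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]},
        ],
        "generation_config": {"max_output_tokens": 256, "temperature": 0.3},
    }

request = build_generate_request(
    "gemini-2.0-flash",  # assumed model ID; verify before use
    "Classify the sentiment of this review.",
)

# With the Google Gen AI SDK configured for Vertex AI, the equivalent call
# is roughly:
#   client = genai.Client(vertexai=True, project=..., location=...)
#   response = client.models.generate_content(model=..., contents=...)
#   text = response.text
```

Multimodal inputs (images, audio, video) for the Gemini and Llama 4 models slot into the same structure as additional entries in the `parts` list.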