Reference documentation > GenAI reference > LLM reference

LLM reference¶

The following sections provide, for reference, brief descriptions of each available LLM. See the LLM availability page for additional details of max context window, max completion tokens, and the chat model ID. Many descriptions came from the provider websites.

Amazon Bedrock-provided LLMs

LLM	Description
Amazon Nova Lite	A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos. It excels are real-time customer interactions, document analysis, and visual question-answering.
Amazon Nova Micro	A text-to-text understanding foundation model that is multilingual and can reason over text. It delivers the lowest latency responses at very low cost.
Amazon Nova Premier	A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos. It is best for most complex reasoning tasks, model distillation.
Amazon Nova Pro	A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos and provides the family's best combination of accuracy, speed, and cost. It is best for complex multimodal tasks requiring high accuracy.
Amazon Titan	A generative LLM for tasks such as summarization, text generation, classification, open-ended Q&A, and information extraction.
Anthropic Claude Sonnet 4.5	Claude Sonnet 4.5 is Anthropic’s most powerful model for powering real-world agents, with industry-leading capabilities around coding, and computer use. It is the ideal balance of performance and practicality for most internal and external use cases.
Anthropic Claude Sonnet 4	A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. An upgrade to Sonnet 3.7, it offers high performance that is practical for most AI use cases, including user-facing AI assistants and high-volume tasks.
Anthropic Claude Sonnet 3.7 v1	A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. Claude Sonnet 3.7 is the first hybrid reasoning model and the most advanced model before the Claude 4 family launch.
Anthropic Claude Sonnet 3.5 v1	A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development.
Anthropic Claude Sonnet 3.5 v2	A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development. The version 2 upgrade can generate computer actions—for example, keystrokes and mouse clicks—accomplishing tasks that require hundreds of steps.
Anthropic Claude Sonnet 3	A generative LLM for conversations, question answering, and workflow automation. Sonnet balances speed, intelligence, and price.
Anthropic Claude Opus 4.6	Claude Opus 4.6 is an Anthropic most intelligent model for building agents and coding.
Anthropic Claude Opus 4.5	Claude Opus 4.5 is an Anthropic premium generative model with maximum intelligence across coding, agents, computer use, and enterprise workflows.
Anthropic Claude Opus 4.1	The next generation of Anthropic’s most powerful model yet, Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4.1 is ideal for powering frontier agent products and features.
Anthropic Claude Opus 4	A generative LLM that excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, and tasks that require complex problem solving.
Anthropic Claude Opus 3	A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Anthropic Claude Haiku 3.5 v1	A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Anthropic Claude Haiku 3	A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models have the most efficiency with the lowest cost.
Anthropic Claude 2.1	A generative LLM for conversations, question answering, and workflow automation. This was one of the earlier Claude models that has been superseded by the Claude 3 family.
Cohere Command R	A large language model optimized for conversational interaction and long context tasks. It targets the "scalable" category of models.
Cohere Command R Plus	Cohere's most advanced model for complex RAG and multi-step agent workflows, with 128K context and strong multilingual support.
DeepSeek R1 v1	A generative LLM for tasks like code generation, data analysis, and natural language processing. Use DeepSeek R1 for work that involves mathematical computation, code development, or complex logical reasoning.
Meta Llama 3 family These are transformer-based architecture supporting 8 and 70 billion parameters, respectively, and are optimized for standard NLP tasks.
Meta Llama 3 8B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3 70B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 family: By expanding the context length over the Llama 3 family, these are stronger in math, logical, and reasoning problems. They supports several advanced use cases, including long-form text summarization, multilingual conversational agents, and coding assistants.
Meta Llama 3.1 8B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 70B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 family: Introduces small and medium-sized vision LLMs, and lightweight, text-only models that fit onto edge and mobile devices.
Meta Llama 3.2 1B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 3B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 11B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 90B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.3 70B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing. It matches the performance of Llama 3.1 (405B) on key benchmarks while being significantly smaller, which allows developers to run the model on standard workstations, reducing operational costs.
Meta Llama 4 Maverick 17B Instruct v1	A generative LLM for tasks like code generation, data analysis, and natural language processing. It can work with multiple input types like images, audio, and video along with text and documents.
Mistral Large 2402 v1	A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the largest model in the family and most capable but also most resource-intensive. It is good for Complex reasoning tasks, advanced applications, and research.
Mistral Small 2402 v1	A generative LLM for tasks like code generation, data analysis, and natural language processing. This model is optimized for cost-efficiency, balancing performance and cost for production applications.
Mistral 8x7B Instruct v0	A generative LLM for tasks like code generation, data analysis, and natural language processing. The 8x7B Instruct provides high performance with better efficiency than equivalent dense models.
NVIDIA Nemotron Nano 2 12B	NVIDIA Nemotron Nano 2 12B brings cost efficiency with high accuracy, excelling in multi-image reasoning, video understanding, document intelligence, visual Q&A, and summarization tasks.
NVIDIA Nemotron Nano 2 9B	NVIDIA Nemotron Nano 2 9B is a high-efficiency LLM designed to excel in reasoning and agentic tasks.
OpenAI gpt-oss-20b	A 20B open-weight language model released under the Apache 2.0 license. It is well-suited for reasoning and function calling use cases. The 20B model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with 16GB of memory.
OpenAI gpt-oss-120b	A 120B open-weight language model released under the Apache 2.0 license. It is well-suited for reasoning and function calling use cases and runs on a single 80GB GPU with near-parity to OpenAI o4-mini on core reasoning benchmarks.

Azure OpenAI-provided LLMs

LLM	Description
OpenAI GPT-5.2	GPT-5.2 is a flagship model for coding and agentic tasks across industries.
OpenAI GPT-5.1	GPT-5.1 is ideal for tasks that require long-form reasoning and synthesis, such as analyzing complex documents or generating detailed reports, and for interactive workflows like spreadsheet editing, structured content creation, and coding assistance.
OpenAI GPT-5 Codex	GPT-5-Codex is a version of GPT-5 optimized for agentic coding tasks in Codex or similar environments.
OpenAI GPT-5	A coding-strong, advanced multimodal model with improved reasoning and agent support. GPT-5 features particular strengths in code generation, bug fixing, and refactoring; instruction following; and long context and tool calling.
OpenAI GPT-5 mini	A fast, cost-efficient variant of GPT-5, optimized for precise tasks and prompts.
OpenAI GPT-5 nano	The fastest, least expensive of the GPT-5 family, the nano is optimized for ultra low-latency, cost-sensitive tasks. It provides solid quality, especially for summarization and classification tasks.
OpenAI GPT-4o	The flagship multimodal model that is more cost-effective and faster than GPT-4 Turbo. It can process text and images, and can generate text outputs with superior performance across multiple languages
OpenAI o4-mini	An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. It offers solid performance across math, code, and multimodal tasks, while cutting inference costs by an order of magnitude compared to the o3.
OpenAI o3	An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. This is the full model and is the most intelligent model in the family alongside the o4-mini.
OpenAI o3-mini	A next-generation reasoning model designed for complex problem-solving with improved reasoning capabilities and efficiency.
OpenAI o1	An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. It excels in complex reasoning tasks, and remains OpenAI's broader general knowledge reasoning model, but at higher resource consumption and cost.

Google VertexAI-provided LLMs

LLM	Description
Google Gemini 3 Pro Preview	Google Gemini 3 Pro Preview is designed to tackle the most challenging agentic problems with strong coding and state-of-the-art reasoning capabilities. It is the best model for complex multimodal understanding. Compared to Gemini 2.5 Pro, it improves significantly on complex instruction following and delivers outcomes with better output efficiency.
Google Gemini 2.5 Flash	A multimodal generative LLM designed for speed, low latency, and cost-efficiency. It can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions. It improves on Gemini 1.5 Flash with the addition of its code execution capability.
Google Gemini 2.5 Pro	A multimodal generative LLM designed for complex reasoning tasks. It can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions. It improves on Gemini 1.5 Pro with the addition of its code execution capability.
Google Gemini 2.0 Flash	A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions.
Google Gemini 2.0 Flash Lite	The fastest and most cost efficient Flash model. It's a multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It's an upgrade path for 1.5 Flash users who want better quality for the same price and speed.
Claude Sonnet 4.5	Anthropic's mid-sized model for powering real-world agents, with capabilities in coding, computer use, cybersecurity, and working with office files like spreadsheets.
Claude Sonnet 4	Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents.
Claude 3.7 Sonnet	Anthropic's most intelligent model to date and the first Claude model to offer extended thinking—the ability to solve complex problems with careful, step-by-step reasoning.
Claude 3.5 Sonnet	Outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet.
Claude 3.5 Sonnet v2	A state-of-the-art model for real-world software engineering tasks and agentic capabilities.
Claude Opus 4.1	Anthropic's most powerful model yet and the state-of-the-art coding model. Claude Opus 4.1 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Claude Opus 4	Anthropic's Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Claude 3 Opus	A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Claude 3.5 Haiku	A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Claude 3 Haiku	A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models have the most efficiency with the lowest cost.
Llama 3.1 8B Instruct MAAS	An 8b multilingual LLM optimized for multilingual dialogue use cases. This is the smallest and most efficient model in the family, optimized for speed and cost-effectiveness. Best used for chat applications, simple content generation, edge deployment, and high-throughput scenarios where cost matters.
Llama 3.1 70B Instruct MAAS	A 70b multilingual LLM optimized for multilingual dialogue use cases. It is a mid-tier model balancing capability and efficiency; it is significantly more capable than the 8B while remaining practical for most use cases, such as production applications requiring good reasoning, content creation, coding assistance, and most enterprise use cases.
Llama 3.1 405B Instruct MAAS	A 405b multilingual LLM optimized for multilingual dialogue use cases. It is the largest and most capable model in the Llama 3.1 family and offers state-of-the-art performance for research, complex analysis, advanced reasoning tasks, and applications requiring the highest quality outputs.
Llama 3.2 90B Vision Instruct	A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis, as well as image captioning.
Llama 3.3 70B Instruct	A text-only 70B instruction-tuned model that provides enhanced performance relative to previous Llama models when used for text-only applications. It is an advanced LLM for reasoning, math, general knowledge, and function calling, often used for multilingual chat, coding assistance, and synthetic data generation.
Llama 4 Maverick 17B 128E Instruct MAAS	Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities. It is best for complex multimodal tasks requiring both text and image understanding.
Llama 4 Scout 17B 16E Instruct MAAS	A multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion, delivering state-of-the-art results for its size class. It is best for multimodal tasks where efficiency is prioritized over maximum capability—applications needing balanced performance.

Anthropic LLM availability

LLM	Description
Anthropic Claude Sonnet 4.5	Claude Sonnet 4.5 is Anthropic's best Sonnet model to date for agents, coding, and computer use. Anthropic's most accurate and detailed model for long-running tasks, with enhanced domain knowledge in coding, finance, and cybersecurity.
Anthropic Claude Sonnet 4	A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. An upgrade to Sonnet 3.7, it offers high performance that is practical for most AI use cases, including user-facing AI assistants and high-volume tasks.
Anthropic Claude Sonnet 3.7	A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. Claude Sonnet 3.7 is the first hybrid reasoning model and the most advanced model before the Claude 4 family launch.
Anthropic Claude Sonnet 3.5 v1	A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development.
Anthropic Claude Sonnet 3.5 v2	A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development. The version 2 upgrade can generate computer actions—for example, keystrokes and mouse clicks—accomplishing tasks that require hundreds of steps.
Anthropic Claude Opus 4.6	Claude Opus 4.6 is an Anthropic most intelligent model for building agents and coding.
Anthropic Claude Opus 4.5	Claude Opus 4.5 is an Anthropic premium generative model with maximum intelligence across coding, agents, computer use, and enterprise workflows.
Anthropic Claude Opus 4.1	Claude Opus 4.1 is a drop-in replacement for Opus 4. It delivers superior performance and precision for real-world coding and agentic tasks. It handles complex, multistep problems with more rigor and attention to detail.
Anthropic Claude Opus 4	A generative LLM that excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, tasks that require complex problem solving.
Anthropic Claude Haiku 3.5	A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Anthropic Claude Haiku 3	A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models have the most efficiency with the lowest cost.

Cerebras-provided LLMs

LLM	Description
Cerebras Llama 3.1 8B	Llama 3.1 8B is an 8B multilingual LLM optimized for chat and instruction-following on Cerebras Inference.
Cerebras Llama 4 Scout 17B 16E Instruct	A multimodal-ready instruction model offering excellent performance for chat, reasoning, and instruction-following tasks.
Cerebras Qwen 3 235B Instruct	A large instruction-tuned model hosted on Cerebras suitable for evaluation and advanced text tasks.

TogetherAI-provided LLMs

LLM	Description
Arcee AI Virtuoso Large	Arcee AI's most powerful and versatile general-purpose model, designed to excel at handling complex and varied tasks across domains. With state-of-the-art performance, it offers unparalleled capability for nuanced understanding, contextual adaptability, and high accuracy.
Arcee AI Coder-Large	A high-performance model tailored for intricate programming tasks, Coder-Large thrives in software development environments. With its focus on efficiency, reliability, and adaptability, it supports developers in crafting, debugging, and refining code for complex systems.
Arcee AI Maestro Reasoning	An advanced reasoning model optimized for high-performance enterprise applications. Building on the innovative training techniques first deployed in maestro-7b-preview, Maestro Reasoning offers significantly enhanced reasoning capabilities at scale, rivaling or surpassing leading models like OpenAI's O1 and DeepSeek's R1, but at substantially reduced computational costs.
Google Gemma 3N E4B Instruct	An LLM optimized for efficient execution on mobile and low-resource devices (such as phones, laptops, and tablets), Gemma 3N E4B Instruct supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis.
Meta Llama 3 8B Instruct Lite	A lightweight optimized version of Llama 3 8B optimized for dialogue use cases and efficient inference and instruction-following tasks.
Meta Llama 3.1 8B Instruct Turbo	An optimized high-performance version of Llama 3.1 8B designed for fast inference and instruction-following tasks.
Meta Llama 3.2 3B Instruct Turbo	A compact and efficient model optimized for fast inference while maintaining strong instruction-following capabilities.
Meta Llama 3.3 70B Instruct Turbo	An advanced large-scale model with enhanced capabilities for complex reasoning and instruction-following tasks, and is optimized for multilingual dialogue use cases.
Meta Llama 4 Maverick Instruct (17Bx128E)	A mixture-of-experts model with 17B parameters and 128 experts, designed for high-performance and efficient inference across diverse tasks.
Mistral Small 3 Instruct (24B)	A generative LLM that is pre-trained and instructed for generative AI tasks that require robust language and instruction-following performance with very low latency.
Mistral Mixtral-8x7B Instruct v0.1	A pretrained generative Sparse Mixture of Experts for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.