LLM reference

The following sections provide, for reference, brief descriptions of each available LLM. See the LLM availability page for additional details, including each model's maximum context window, maximum completion tokens, and chat model ID. Many descriptions come from the provider websites.

Amazon Bedrock-provided LLMs

Meta Llama is licensed under the Meta Llama 4 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

LLM Description
Amazon Nova Lite A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos. It excels at real-time customer interactions, document analysis, and visual question answering.
Amazon Nova Micro A text-to-text understanding foundation model that is multilingual and can reason over text. It delivers the lowest latency responses at very low cost.
Amazon Nova Premier A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos. It is best for the most complex reasoning tasks and for model distillation.
Amazon Nova Pro A multimodal understanding foundation model. It is multilingual and can reason over text, images, and videos and provides the family's best combination of accuracy, speed, and cost. It is best for complex multimodal tasks requiring high accuracy.
Amazon Titan A generative LLM for tasks such as summarization, text generation, classification, open-ended Q&A, and information extraction.
Anthropic Claude 2.1 A generative LLM for conversations, question answering, and workflow automation. This was one of the earlier Claude models that has been superseded by the Claude 3 family.
Anthropic Claude 3 Haiku A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models are the most efficient and lowest-cost in the Claude family.
Anthropic Claude 3 Opus A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Anthropic Claude 3 Sonnet A generative LLM for conversations, question answering, and workflow automation. Sonnet balances speed, intelligence, and price.
Anthropic Claude 3.5 Haiku v1 A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Anthropic Claude 3.5 Sonnet v1 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development.
Anthropic Claude 3.5 Sonnet v2 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development. The version 2 upgrade can generate computer actions—for example, keystrokes and mouse clicks—accomplishing tasks that require hundreds of steps.
Anthropic Claude 3.7 Sonnet v1 A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. Claude 3.7 Sonnet is the first hybrid reasoning model and the most advanced model before the Claude 4 family launch.
Anthropic Claude Opus 4 A generative LLM that excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, and tasks that require complex problem solving.
Anthropic Claude Sonnet 4 A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. An upgrade to Sonnet 3.7, it offers high performance that is practical for most AI use cases, including user-facing AI assistants and high-volume tasks.
Anthropic Claude Opus 4.1 The next generation of Anthropic’s most powerful model yet, Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4.1 is ideal for powering frontier agent products and features.
Anthropic Claude Sonnet 4.5 Claude Sonnet 4.5 is Anthropic's most powerful model for powering real-world agents, with industry-leading capabilities in coding and computer use. It is the ideal balance of performance and practicality for most internal and external use cases.
Cohere Command R A large language model optimized for conversational interaction and long context tasks. It targets the "scalable" category of models.
Cohere Command R Plus Cohere's most advanced model for complex RAG and multi-step agent workflows, with 128K context and strong multilingual support.
DeepSeek R1 v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. Use DeepSeek R1 for work that involves mathematical computation, code development, or complex logical reasoning.
Meta Llama 3 family These are transformer-based models with 8 billion and 70 billion parameters, respectively, optimized for standard NLP tasks.
Meta Llama 3 8B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3 70B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 family: By expanding the context length over the Llama 3 family, these models are stronger at math, logic, and reasoning problems. They support several advanced use cases, including long-form text summarization, multilingual conversational agents, and coding assistants.
Meta Llama 3.1 8B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 70B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.1 405B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the largest model in the series, and provides the highest flexibility and state-of-the-art capabilities that rival even the best closed-source models.
Meta Llama 3.2 family: Introduces small and medium-sized vision LLMs, and lightweight, text-only models that fit onto edge and mobile devices.
Meta Llama 3.2 1B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 3B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 11B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.2 90B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing.
Meta Llama 3.3 70B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. It matches the performance of Llama 3.1 (405B) on key benchmarks while being significantly smaller, which allows developers to run the model on standard workstations, reducing operational costs.
Meta Llama 4 Maverick 17B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. It can work with multiple input types like images, audio, and video along with text and documents.
Meta Llama 4 Scout 17B Instruct v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. It enables a variety of use cases, including multi-document summarization and parsing extensive documents.
Mistral 7B Instruct v0 A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the smallest and most efficient of the Mistral family, best for lightweight applications, edge deployment, and basic chatbots.
Mistral Large 2402 v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. This is the largest and most capable model in the family, but also the most resource-intensive. It is good for complex reasoning tasks, advanced applications, and research.
Mistral Small 2402 v1 A generative LLM for tasks like code generation, data analysis, and natural language processing. This model is optimized for cost-efficiency, balancing performance and cost for production applications.
Mistral 8x7B Instruct v0 A generative LLM for tasks like code generation, data analysis, and natural language processing. The 8x7B Instruct provides high performance with better efficiency than equivalent dense models.
OpenAI gpt-oss-20b A 20B open-weight language model released under the Apache 2.0 license. It is well-suited for reasoning and function calling use cases. The 20B model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with 16GB of memory.
OpenAI gpt-oss-120b A 120B open-weight language model released under the Apache 2.0 license. It is well-suited for reasoning and function calling use cases and runs on a single 80GB GPU with near-parity to OpenAI o4-mini on core reasoning benchmarks.
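The chat model ID (listed on the LLM availability page) is what you pass when invoking any of the Bedrock-hosted models above. As a minimal sketch, assuming the boto3 package, valid AWS credentials, and an illustrative Claude model ID and region (these values are assumptions, not taken from this page), a call through Bedrock's Converse API might look like:

```python
# Hedged sketch of a Bedrock Converse-API call; model ID and region are
# illustrative assumptions -- check the LLM availability page for real IDs.

def build_messages(prompt: str) -> list:
    """Build a Converse-API message list for a single user turn."""
    return [{"role": "user", "content": [{"text": prompt}]}]

def ask_bedrock(model_id: str, prompt: str, region: str = "us-east-1") -> str:
    import boto3  # requires the boto3 package and AWS credentials
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.converse(
        modelId=model_id,
        messages=build_messages(prompt),
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    # The assistant reply is nested under output.message.content
    return resp["output"]["message"]["content"][0]["text"]

# Example (requires credentials; model ID is hypothetical):
# print(ask_bedrock("anthropic.claude-3-5-sonnet-20240620-v1:0", "Say hello."))
```

Because the Converse API presents one request shape across Bedrock chat models, switching between the models in the list above is usually just a change of model ID.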
Azure OpenAI-provided LLMs
LLM Description
OpenAI GPT-3.5 Turbo A cost-effective chat model in the GPT-3.5 family, optimized for chat and suitable for traditional completion tasks. While GPT-3.5 Turbo remains available in the API, the provider recommends using GPT-4o mini instead.
OpenAI GPT-4 Turbo A GPT-4 family model optimized for chat and traditional completion tasks with a long context window and tool calling. While still available, the provider recommends using the newer GPT-4o.
OpenAI GPT-4o The flagship multimodal model that is more cost-effective and faster than GPT-4 Turbo. It can process text and images, and can generate text outputs with superior performance across multiple languages.
OpenAI GPT-4o mini The most advanced model in the small models category, enabling a broader range of AI applications with its low cost and low latency. In the appropriate use cases, it should be considered as a replacement for GPT-3.5 Turbo series models.
OpenAI GPT-5 A coding-strong, advanced multimodal model with improved reasoning and agent support. GPT-5 features particular strengths in code generation, bug fixing, and refactoring; instruction following; and long context and tool calling.
OpenAI GPT-5 mini A fast, cost-efficient variant of GPT-5, optimized for precise tasks and prompts.
OpenAI GPT-5 nano The fastest, least expensive of the GPT-5 family, the nano is optimized for ultra low-latency, cost-sensitive tasks. It provides solid quality, especially for summarization and classification tasks.
OpenAI o1 An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. It excels in complex reasoning tasks, and remains OpenAI's broader general knowledge reasoning model, but at higher resource consumption and cost.
OpenAI o1-mini A fast, cost-efficient reasoning model particularly effective at coding, math, and science. It offers strong performance for many reasoning tasks while being significantly more cost-effective than o1-preview. While still available, the provider recommends using the newer o3-mini model, which features higher intelligence at the same latency.
OpenAI o3 An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. This is the full model and is the most intelligent model in the family alongside the o4-mini.
OpenAI o3-mini A next-generation reasoning model designed for complex problem-solving with improved reasoning capabilities and efficiency.
OpenAI o4-mini An advanced reasoning model optimized for complex problem-solving tasks with enhanced efficiency and cost-effectiveness. It offers solid performance across math, code, and multimodal tasks, while cutting inference costs by an order of magnitude compared to o3.
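The Azure OpenAI models above are reached through a deployment name rather than a raw model name. A minimal sketch, assuming the `openai` Python package and illustrative endpoint, key, deployment, and API-version values (all assumptions, not values from this page):

```python
# Hedged sketch of an Azure OpenAI chat call; endpoint, deployment name,
# and API version below are illustrative assumptions.

def build_chat(prompt: str, system: str = "You are a concise assistant.") -> list:
    """Chat Completions message list: one system turn, one user turn."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]

def ask_azure(endpoint: str, api_key: str, deployment: str, prompt: str) -> str:
    from openai import AzureOpenAI  # requires the openai package
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=api_key,
        api_version="2024-06-01",  # assumed version; use the one your resource supports
    )
    resp = client.chat.completions.create(
        model=deployment,  # Azure routes by deployment name, not model ID
        messages=build_chat(prompt),
        max_tokens=256,
    )
    return resp.choices[0].message.content
```

The same call works for any of the chat models listed above once a deployment for that model exists in your Azure resource.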
Google VertexAI-provided LLMs
LLM Description
Claude 3 Haiku A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models are the most efficient and lowest-cost in the Claude family.
Claude 3 Opus A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Claude 3.5 Haiku A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Claude 3.5 Sonnet Outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet.
Claude 3.5 Sonnet v2 A state-of-the-art model for real-world software engineering tasks and agentic capabilities.
Claude 3.7 Sonnet Anthropic's most intelligent model to date and the first Claude model to offer extended thinking—the ability to solve complex problems with careful, step-by-step reasoning.
Claude Opus 4 Anthropic's Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Claude Sonnet 4 Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents.
Claude Opus 4.1 Anthropic's most powerful model yet and the state-of-the-art coding model. Claude Opus 4.1 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.
Claude Sonnet 4.5 Anthropic's mid-sized model for powering real-world agents, with capabilities in coding, computer use, cybersecurity, and working with office files like spreadsheets.
Google Gemini 1.5 Flash A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports grounding with Google Search, function calling, and system instructions for various tasks including classification, sentiment analysis, entity extraction, and summarization. Flash is optimized for speed and efficiency, making it well-suited for applications that require quick response times and low latency.
Google Gemini 1.5 Pro A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports grounding with Google Search, function calling, and system instructions for various tasks including classification, sentiment analysis, entity extraction, and summarization. Pro is optimized for quality and accuracy in complex tasks.
Google Gemini 2.0 Flash A multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions.
Google Gemini 2.0 Flash Lite The fastest and most cost-efficient Flash model. It's a multimodal generative LLM that can process text, code, images, audio, and video inputs to generate text outputs. It's an upgrade path for 1.5 Flash users who want better quality for the same price and speed.
Google Gemini 2.5 Flash A multimodal generative LLM designed for speed, low latency, and cost-efficiency. It can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions. It improves on Gemini 1.5 Flash with the addition of its code execution capability.
Google Gemini 2.5 Pro A multimodal generative LLM designed for complex reasoning tasks. It can process text, code, images, audio, and video inputs to generate text outputs. It supports advanced capabilities like grounding with Google Search, code execution, function calling, and system instructions. It improves on Gemini 1.5 Pro with the addition of its code execution capability.
Llama 3.1 8B Instruct MAAS An 8B multilingual LLM optimized for multilingual dialogue use cases. This is the smallest and most efficient model in the family, optimized for speed and cost-effectiveness. Best used for chat applications, simple content generation, edge deployment, and high-throughput scenarios where cost matters.
Llama 3.1 70B Instruct MAAS A 70B multilingual LLM optimized for multilingual dialogue use cases. It is a mid-tier model balancing capability and efficiency; it is significantly more capable than the 8B while remaining practical for most use cases, such as production applications requiring good reasoning, content creation, coding assistance, and most enterprise use cases.
Llama 3.1 405B Instruct MAAS A 405B multilingual LLM optimized for multilingual dialogue use cases. It is the largest and most capable model in the Llama 3.1 family and offers state-of-the-art performance for research, complex analysis, advanced reasoning tasks, and applications requiring the highest quality outputs.
Llama 3.2 90B Vision Instruct A medium-sized 90B multimodal model that can support image reasoning, such as chart and graph analysis, as well as image captioning.
Llama 3.3 70B Instruct A text-only 70B instruction-tuned model that provides enhanced performance relative to previous Llama models when used for text-only applications. It is an advanced LLM for reasoning, math, general knowledge, and function calling, often used for multilingual chat, coding assistance, and synthetic data generation.
Llama 4 Maverick 17B 128E Instruct MAAS Llama 4's largest and most capable model. It uses the Mixture-of-Experts (MoE) architecture and early fusion to provide coding, reasoning, and image capabilities. It is best for complex multimodal tasks requiring both text and image understanding.
Llama 4 Scout 17B 16E Instruct MAAS A multimodal model that uses the Mixture-of-Experts (MoE) architecture and early fusion, delivering state-of-the-art results for its size class. It is best for multimodal tasks where efficiency is prioritized over maximum capability—applications needing balanced performance.
Mistral CodeStral 2501 A cutting-edge model that's designed for code generation, including fill-in-the-middle and code completion. It is lightweight, fast, and proficient in over 80 programming languages.
Mistral Large 2411 The next version of the Mistral Large (24.07) model with improved reasoning and function calling capabilities. It is for enterprise applications needing the highest quality outputs, advanced function calling and API integration, and multilingual applications with top-tier performance.
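As with the other providers, the Vertex AI models above are addressed by model name. A minimal sketch, assuming the `vertexai` SDK and an illustrative project ID, region, and Gemini model name (all assumptions, not values from this page):

```python
# Hedged sketch of a Vertex AI Gemini call; project, region, and model
# name below are illustrative assumptions.

def generation_config(max_output_tokens: int = 256, temperature: float = 0.2) -> dict:
    """Plain-dict generation settings accepted by generate_content."""
    return {"max_output_tokens": max_output_tokens, "temperature": temperature}

def ask_gemini(project: str, prompt: str, model_name: str = "gemini-2.0-flash") -> str:
    import vertexai  # requires the google-cloud-aiplatform package and GCP credentials
    from vertexai.generative_models import GenerativeModel
    vertexai.init(project=project, location="us-central1")  # assumed region
    model = GenerativeModel(model_name)
    resp = model.generate_content(prompt, generation_config=generation_config())
    return resp.text
```

Partner models (the Claude, Llama MAAS, and Mistral entries above) go through their own Vertex endpoints rather than `GenerativeModel`; this sketch covers only the Gemini family.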
Anthropic-provided LLMs
LLM Description
Anthropic Claude 3 Haiku A generative LLM for conversations, question answering, and workflow automation. Known for speed and efficiency, Haiku models are the most efficient and lowest-cost in the Claude family.
Anthropic Claude 3 Opus A generative LLM for conversations, question answering, and workflow automation. Opus is the most intelligent of the Claude 3 family models, but comes with a higher cost.
Anthropic Claude 3.5 Haiku A generative LLM excelling in tasks like creative writing, conversational AI, and workflow automation, with enhanced reasoning capabilities.
Anthropic Claude 3.5 Sonnet v1 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development.
Anthropic Claude 3.5 Sonnet v2 A generative LLM excelling in tasks like complex reasoning, coding, and understanding visual information like charts and graphs, making it particularly useful for data analysis and software development. The version 2 upgrade can generate computer actions—for example, keystrokes and mouse clicks—accomplishing tasks that require hundreds of steps.
Anthropic Claude 3.7 Sonnet A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. Claude 3.7 Sonnet is the first hybrid reasoning model and the most advanced model before the Claude 4 family launch.
Anthropic Claude Opus 4 A generative LLM that excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, and tasks that require complex problem solving.
Anthropic Claude Sonnet 4 A generative LLM for tasks such as classification, sentiment analysis, entity extraction, and summarization. An upgrade to Sonnet 3.7, it offers high performance that is practical for most AI use cases, including user-facing AI assistants and high-volume tasks.
Anthropic Claude Opus 4.1 Claude Opus 4.1 is a drop-in replacement for Opus 4. It delivers superior performance and precision for real-world coding and agentic tasks. It handles complex, multistep problems with more rigor and attention to detail.
Anthropic Claude Sonnet 4.5 Claude Sonnet 4.5 is Anthropic's best Sonnet model to date for agents, coding, and computer use. It is Anthropic's most accurate and detailed model for long-running tasks, with enhanced domain knowledge in coding, finance, and cybersecurity.
Cerebras-provided LLMs
LLM Description
Cerebras Llama 3.1 8B Llama 3.1 8B is an 8B multilingual LLM optimized for chat and instruction-following on Cerebras Inference.
Cerebras Llama 3.3 70B Llama 3.3 70B is a 70B instruction-tuned model for text tasks with strong multilingual performance.
Cerebras Llama 4 Scout 17B 16E Instruct A multimodal-ready instruction model offering excellent performance for chat, reasoning, and instruction-following tasks.
Cerebras Qwen 3 32B Qwen 3 32B is optimized for fast, accurate text tasks on Cerebras Inference.
Cerebras Qwen 3 235B Instruct A large instruction-tuned model hosted on Cerebras suitable for evaluation and advanced text tasks.
TogetherAI-provided LLMs
LLM Description
Arcee AI Virtuoso Large Arcee AI's most powerful and versatile general-purpose model, designed to excel at handling complex and varied tasks across domains. With state-of-the-art performance, it offers unparalleled capability for nuanced understanding, contextual adaptability, and high accuracy.
Arcee AI Coder-Large A high-performance model tailored for intricate programming tasks, Coder-Large thrives in software development environments. With its focus on efficiency, reliability, and adaptability, it supports developers in crafting, debugging, and refining code for complex systems.
Arcee AI Maestro Reasoning An advanced reasoning model optimized for high-performance enterprise applications. Building on the innovative training techniques first deployed in maestro-7b-preview, Maestro Reasoning offers significantly enhanced reasoning capabilities at scale, rivaling or surpassing leading models like OpenAI's o1 and DeepSeek's R1, but at substantially reduced computational costs.
Google Gemma 3N E4B Instruct An LLM optimized for efficient execution on mobile and low-resource devices (such as phones, laptops, and tablets), Gemma 3N E4B Instruct supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis.
Marin Community Marin 8B Instruct An instruction-tuned language model developed by the Marin Community (a collaborative effort to develop open-source foundation models) for conversational and task-oriented applications.
Meta Llama 3 8B Instruct Lite A lightweight version of Llama 3 8B optimized for dialogue use cases, efficient inference, and instruction-following tasks.
Meta Llama 3 70B Instruct Reference The standard reference implementation of Llama 3 70B for advanced instruction-following and conversational tasks.
Meta Llama 3.1 8B Instruct Turbo An optimized high-performance version of Llama 3.1 8B designed for fast inference and instruction-following tasks.
Meta Llama 3.1 405B Instruct Turbo The largest and most capable model in the Llama 3.1 series, optimized for complex reasoning, advanced instruction-following, and high-performance applications.
Meta Llama 3.2 3B Instruct Turbo A compact and efficient model optimized for fast inference while maintaining strong instruction-following capabilities.
Meta Llama 3.3 70B Instruct Turbo An advanced large-scale model with enhanced capabilities for complex reasoning and instruction-following tasks, optimized for multilingual dialogue use cases.
Meta Llama 4 Maverick Instruct (17Bx128E) A mixture-of-experts model with 17B parameters and 128 experts, designed for high-performance and efficient inference across diverse tasks.
Meta Llama 4 Scout Instruct (17Bx16E) A mixture-of-experts model with 17B parameters and 16 experts, optimized for balanced performance and efficient resource utilization.
Mistral Small 3 Instruct (24B) A generative LLM that is pre-trained and instructed for generative AI tasks that require robust language and instruction-following performance with very low latency.
Mistral (7B) Instruct A high-performing, industry-standard 7.3B parameter model fine-tuned for instruction-following, with optimizations for speed and context length.
Mistral (7B) Instruct v0.2 An improved version of the 7B instruction-tuned model with enhanced performance and capabilities.
Mistral (7B) Instruct v0.3 The latest version of the 7B instruction-tuned model with further improvements in performance and reliability.
Mistral Mixtral-8x7B Instruct v0.1 A pretrained generative Sparse Mixture of Experts for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.