Latest AI Model Updates (February 2026)
Verified highlights through Feb 13, 2026: OpenAI's GPT-5.3-Codex, Anthropic's Claude Opus 4.6, Mistral Large 3, Google's Gemini 3 Deep Think, Zhipu's GLM-5, Moonshot's Kimi K2.5, MiniMax M2.5, plus DeepSeek V3.2 and Qwen3-Coder-Next.
As of February 13, 2026, frontier AI progress is increasingly about agentic reliability: models that can plan, write, test, refactor, and stay grounded across long tasks. This page has been updated with the latest official releases and rollouts through today.
Note: Despite constant speculation, there are no official GPT-6 / Llama 5 / Grok 5 launch announcements as of Feb 13, 2026. The biggest recent jumps are Claude Opus 4.6, GPT-5.3-Codex, MiniMax M2.5, GLM-5, and Mistral Large 3.
What's New Since November 2025
1) Anthropic Claude Opus 4.6 (Feb 2026)
Latest release: Claude Opus 4.6 shipped in Feb 2026, followed by a Fast mode research preview announced on Feb 7, 2026.
What changed
- Improved performance for coding, reasoning, and agentic tasks
- Available via Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI
- Fast mode introduces a speed/intelligence trade-off for lower-latency agent loops (research preview)
2) OpenAI GPT-5.2 → GPT-5.3-Codex (Dec 2025 – Feb 2026)
Latest releases: GPT-5.2 (Dec 11, 2025), GPT-5.2-Codex (Dec 18, 2025), and GPT-5.3-Codex (Feb 5, 2026).
Highlights
- GPT-5.2: August 2025 knowledge cutoff; up to a 3.2M-token context window and up to 256K output tokens
- GPT-5.2-Codex: Codex-tuned GPT-5.2 variant focused on software engineering workflows
- GPT-5.3-Codex: 25% faster than GPT-5.2-Codex; OpenAI reports new highs on SWE-Bench Pro and Terminal-Bench
3) Google Gemini 3 Deep Think is now available (Feb 2026)
Update: Google expanded Gemini 3 Deep Think availability in Feb 2026, widening access beyond the initial Gemini 3 Pro launch (Nov 18, 2025).
What this means
- Deep Think mode becomes available for Ultra subscribers and via API (per Google)
- Best fit for tasks where extra thinking time pays off: planning, difficult debugging, and research synthesis
4) Mistral AI: European Open-Weight Leadership
Latest Releases: Mistral AI shipped Mistral Large 3 and the Ministral 3 family (Dec 2, 2025), Devstral 2 (Dec 10, 2025), and Voxtral Transcribe 2 (Feb 4, 2026).
Mistral Large 3 (Dec 2025)
- 41B active parameters, 675B total parameters (sparse MoE architecture)
- Ranks #2 among open-source non-reasoning models on the LMArena leaderboard
- 256K context window for extended document processing
- Released under Apache 2.0 license for maximum openness
- Trained on NVIDIA H200 GPUs; delivers frontier-class capabilities
- Available on Mistral AI Studio, Amazon Bedrock, Azure Foundry, Hugging Face
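The "41B active / 675B total" split is the signature of sparse mixture-of-experts routing: a router picks only a few experts per token, so most parameters sit idle on any single forward pass. Here is a minimal illustrative sketch of generic top-k gating; the dimensions and the gating scheme are toy assumptions, not Mistral's actual architecture:

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Route one token through only its top-k experts (sparse MoE).

    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) matrices standing in for expert FFNs.
    """
    logits = x @ gate_w
    topk = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()              # softmax over the selected experts only
    # Only k expert matrices are touched: these are the "active" parameters
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n = 16, 8
x = rng.standard_normal(d)
gate = rng.standard_normal((d, n))
experts = [rng.standard_normal((d, d)) for _ in range(n)]
y = topk_moe_forward(x, gate, experts, k=2)
print(y.shape)
```

With k=2 of 8 experts, only a quarter of the expert weights participate per token, which is why "active" parameter counts are so much smaller than totals.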
Ministral 3 Family (Dec 2025)
- Three compact models: 3B, 8B, and 14B parameters
- Best performance-to-cost ratio in their class
- Designed for edge devices, robotics, and on-device AI
- 14B reasoning variant achieves 85% on AIME 2025
- Optimized for NVIDIA Spark, RTX PCs, and Jetson devices
- Multimodal and multilingual capabilities from the start
Devstral 2 & Voxtral (2025-2026)
- Devstral 2: 24B coding model outperforming Qwen 3 Coder (30B)
- Devstral Small 2: Compact variant for local deployment
- Voxtral Transcribe 2: On-device speech-to-text with real-time diarization
- Voxtral: Privacy-first design that runs entirely on a smartphone or laptop
- Competitive with OpenAI Whisper and Google Cloud Speech on FLEURS benchmark
Ecosystem: Mistral positions itself as Europe's answer to US AI dominance, with $2B+ raised and ~€14B valuation. The company emphasizes efficiency, transparency, and open-source values while maintaining competitive performance.
5) Zhipu AI (Z.ai): China's GLM Family Evolution
Latest Releases: GLM-4.7 (Dec 22, 2025), GLM-5 (Feb 11, 2026), GLM-Image (Jan 2026), and GLM-4.7-Flash (Jan 20, 2026).
GLM-5 (Feb 11, 2026)
- 744B total parameters (more than double GLM-4.7's 355B)
- Trained on 28.5 trillion tokens
- Shifts from "vibe coding" to "agentic engineering"
- Achieves industry-leading scores for open models in coding and agentic tasks
- Surpasses Gemini 3 Pro on some benchmarks according to internal tests
- Still lags Claude Opus 4.6 on coding benchmarks overall
- Available via a $3/month subscription, or free for local deployment
GLM-4.7 (Dec 2025)
- 84.9% on LiveCodeBench (ahead of Claude Sonnet 4.5)
- 73.8% on SWE-bench Verified (highest among open-source models at release)
- 95.7% on AIME 2025 mathematics benchmark
- "Preserved Thinking" maintains reasoning chains across multiple turns
- 42.8% on Humanity's Last Exam (41% improvement over predecessor)
- 87.4% on τ²-Bench for multi-step tool usage
GLM-4.7-Flash (Jan 2026)
- 30B-A3B MoE model (30B total, ~3B active parameters) optimized for local deployment
- Strongest in 30B class for coding and reasoning
- 128K context window for large codebases
- Free tier option via chat.z.ai
- Designed for lightweight deployment with strong performance
GLM Ecosystem Updates
- GLM-Image: Trained entirely on Chinese Huawei Ascend hardware
- AutoGLM-Phone: Multilingual mobile automation framework
- GLM-OCR: High-performance document understanding
- IPO Success: Listed on Hong Kong Stock Exchange (Jan 8, 2026) at HK$116.20
- Stock surged 173% in one month post-IPO to HK$317.80
- US Entity List (Jan 2025) accelerated sovereign AI strategy
Strategic Context: Zhipu (Z.ai) represents China's push for AI sovereignty after US export restrictions. All GLM models from 4.6 onward are trained on domestic Chinese chips (Huawei Ascend, Cambricon, Moore Threads), demonstrating independence from NVIDIA hardware.
6) Moonshot AI: Kimi K2.5 Agent Swarm
Latest Release: Kimi K2.5 released January 27, 2026, introducing groundbreaking Agent Swarm technology.
Kimi K2.5 Key Features
- 1 trillion total parameters, 32B active (MoE architecture)
- Trained on 15 trillion mixed visual and text tokens
- Native multimodal: Vision and language trained together from scratch
- Agent Swarm: Coordinates up to 100 specialized AI agents in parallel
- 4.5x faster task completion compared to sequential processing
- 50.2% on Humanity's Last Exam at 76% lower cost than Claude Opus 4.5
- 78.4% on BrowseComp (vs 60.6% for standard single-agent approach)
Four Operational Modes
- K2.5 Instant: Fast responses for standard queries
- K2.5 Thinking: Extended reasoning for complex problems
- K2.5 Agent: Single autonomous agent for task execution
- K2.5 Agent Swarm: Parallel multi-agent coordination
Technical Innovations
- Visual Coding: Generates code from UI designs and video workflows
- Parallel-Agent Reinforcement Learning: Prevents serial collapse in task execution
- Critical Path Optimization: Measures slowest sub-agent at each stage
- Autonomous image search and layout iteration
- Works best with Kimi Code CLI framework
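The parallel speedup and critical-path framing above can be sketched with plain Python concurrency: wall-clock time collapses to the slowest sub-agent instead of the sum of all of them. The task names and durations below are made up for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def agent_task(name: str, seconds: float) -> str:
    """Stand-in for one specialized sub-agent (e.g. a search or code worker)."""
    time.sleep(seconds)
    return f"{name}: done in {seconds:.1f}s"

tasks = [("search", 0.3), ("browse", 0.5), ("summarize", 0.2), ("verify", 0.4)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(lambda t: agent_task(*t), tasks))
elapsed = time.perf_counter() - start

sequential = sum(s for _, s in tasks)   # 1.4s if the agents ran one after another
print(f"parallel wall-clock ~{elapsed:.1f}s vs sequential {sequential:.1f}s")
```

The parallel run finishes in roughly the time of the slowest task (0.5s here), which is exactly what "critical path optimization" targets: shortening the longest sub-agent, not the total work.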
Availability & Pricing
- API: $0.60/M input tokens, $2.50/M output tokens
- Access via kimi.com (browser), Kimi App (mobile), moonshot.ai (API)
- Open-source under Modified MIT License
- Weights available on Hugging Face and GitHub
- Compatible with the OpenAI SDK for easy migration
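Since K2.5 advertises OpenAI-SDK compatibility, migration should mostly mean pointing the standard client at Moonshot's endpoint and swapping the model id. The `"kimi-k2.5"` model id and the endpoint URL below are assumptions; check Moonshot's docs for the real values:

```python
import json

def k2_5_chat_payload(prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completion body; "kimi-k2.5" id is an assumption."""
    return {
        "model": "kimi-k2.5",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

body = json.dumps(k2_5_chat_payload("Review this function for edge cases."))
print(body)
# With the official openai SDK the same fields pass straight through, e.g. (untested):
#   client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key=KEY)
#   client.chat.completions.create(**k2_5_chat_payload("..."))
```

Because the request shape is the standard chat-completions format, existing OpenAI-based tooling needs no structural changes, only a new `base_url`, key, and model id.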
Use Cases: K2.5 excels at multimodal AI agents, visual analysis, web development with autonomous iteration, and tool-augmented agentic workflows. Agent Swarm particularly shines on wide information-gathering tasks that benefit from parallel execution.
7) MiniMax M2.5: The $1/Hour Frontier Model
Latest Release: MiniMax M2.5 released February 12, 2026, positioning itself as "intelligence too cheap to meter."
M2.5 Performance Benchmarks
- 80.2% on SWE-Bench Verified (within 0.6pp of Claude Opus 4.6)
- 51.3% on Multi-SWE-Bench (reported as the industry-best multilingual result)
- 55.4% on SWE-Bench Pro
- 76.3% on BrowseComp (with context management)
- 37% faster than M2.1, matching Claude Opus 4.6 speed
Cost Revolution
- $1 per hour of continuous generation at 100 tokens/second
- 1/10 to 1/20 the cost of GPT-5.3-Codex and Claude Opus 4.6
- M2.5: $0.30/M input, $2.40/M output
- M2.5-Lightning: 100 TPS variant with same pricing
- Makes 24/7 agentic operation economically feasible
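The headline "$1 per hour" follows from simple token arithmetic at the listed prices. A quick check, counting output tokens only (input tokens add the remainder toward ~$1):

```python
def generation_cost_per_hour(tokens_per_sec: float, price_per_million: float) -> float:
    """Dollar cost of one hour of continuous output at a given token rate."""
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour * price_per_million / 1_000_000

# M2.5 output pricing ($2.40/M tokens) at the quoted 100 tokens/sec:
cost = generation_cost_per_hour(100, 2.40)
print(f"${cost:.2f}/hour")  # $0.86/hour of pure output; input tokens make up the rest
```

360,000 output tokens per hour at $2.40/M comes to about $0.86, consistent with the ~$1/hour figure once input tokens are included.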
Technical Architecture
- 230B total parameters, 10B active (MoE architecture)
- Trained on 200,000+ real-world environments
- Supports 13 programming languages (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby)
- Forge RL Framework: Agent-native reinforcement learning with 40x training speedup
- Spec-Writing Tendency: Plans like an architect before coding
Full-Stack Capabilities
- 0-to-1 system design and environment setup
- 1-to-10 system development
- 10-to-90 feature iteration
- 90-to-100 comprehensive code review and testing
- Web, Android, iOS, and Windows platform support
- Server-side APIs, business logic, databases, and more
Office & Productivity
- Word documents and LaTeX-enabled PDFs
- PowerPoint presentations with professional layouts
- Excel spreadsheets with formulas, pivot tables, and charts
- Evaluated on MEWC (Microsoft Excel World Championship) problems
Availability
- Open-sourced on Hugging Face and GitHub
- API access via minimax.io
- Free tier available through OpenHands Cloud (limited time)
- Supports private cluster deployment and fine-tuning
- Compatible with vLLM and SGLang for optimal performance
Market Impact: M2.5 is the first frontier model whose cost is effectively negligible for continuous use. At $1/hour, a developer can run M2.5 around the clock for a full month (720 hours) for roughly $720, a step change for agentic applications and automated workflows.
8) Other Open-Weight Coding Models
DeepSeek V3.2 (Dec 1, 2025): Released with open-source weights alongside a major API price reduction. It continues as a cost-effective alternative, with 88% on AIME 2025 and 82% on GPQA Diamond.
Qwen3-Coder-Next (Feb 2026): Open-weight MoE coder model with up to 256K context (per model card). Designed for large-scale code generation and refactoring tasks.
Industry Landscape & Competitive Position
| Model | Best For | Key Strength | Released |
|---|---|---|---|
| Claude Opus 4.6 | Coding & Agents | Improved coding/reasoning/agentic tasks (plus Fast mode preview) | Feb 2026 |
| GPT-5.3-Codex | Software engineering | Faster agentic coding; new highs on SWE-Bench Pro & Terminal-Bench | Feb 5, 2026 |
| MiniMax M2.5 | Cost-efficient coding | 80.2% SWE-Bench at 1/10 the cost; $1/hour frontier model | Feb 12, 2026 |
| GLM-5 | Agentic engineering | 744B params, sovereign Chinese AI, competitive with Gemini 3 Pro | Feb 11, 2026 |
| Kimi K2.5 | Agent Swarm | 100 parallel agents, 4.5x faster execution, visual agentic intelligence | Jan 27, 2026 |
| Mistral Large 3 | Open-weight flagship | 675B MoE, #2 open-source on LMArena, Apache 2.0 license | Dec 2, 2025 |
| GPT-5.2 | General purpose | Aug 2025 cutoff; very long context and large outputs | Dec 11, 2025 |
| Gemini 3 Deep Think | Deep reasoning | Deep Think mode availability expands (Ultra + API) | Feb 2026 |
| Ministral 3 | Edge AI | 3B/8B/14B compact models, 85% AIME with 14B reasoning variant | Dec 2, 2025 |
| GLM-4.7 | Coding agents | 84.9% LiveCodeBench, Preserved Thinking, $3/month | Dec 22, 2025 |
| DeepSeek V3.2 | Open-weight coding | Open-source weights + major API price reduction | Dec 1, 2025 |
| Qwen3-Coder-Next | Open-weight coding | MoE coder model with up to 256K context | Feb 2026 |
Key Trends in Early 2026
- Cost disruption: MiniMax M2.5 at $1/hour and GLM-4.7 at $3/month force repricing across the industry.
- Agent-first coding: Frontier releases now optimize for long-running software engineering tasks, not just chat.
- Multi-agent systems: Kimi K2.5's Agent Swarm and MiniMax's parallel execution show the future of AI workflows.
- European open-source push: Mistral Large 3 and Ministral 3 establish Europe as a credible alternative to US/China dominance.
- Chinese AI sovereignty: GLM models trained entirely on domestic chips prove independence from US export restrictions.
- Speed/quality knobs: "Fast modes" and tiered inference are becoming standard for agent loops.
- Open-weight momentum: MiniMax M2.5, GLM-5, Mistral Large 3 keep improving and are easier to deploy across platforms.
- Long-context becomes practical: Bigger contexts matter when agents must stay grounded in repo history and docs.
- Benchmark evolution: SWE-bench variants + terminal-style tasks are now mainstream evaluation targets.
- Platform integration: Models ship alongside app/IDE tooling for end-to-end workflows.
- On-device AI: Ministral 3 and Voxtral show frontier capabilities can run on edge devices.
Sources & Official Links
Anthropic
- Anthropic: Claude model release notes (Opus 4.6)
- Anthropic: API release notes (Opus 4.6 + Fast mode)
Mistral AI
- Mistral AI: Mistral 3 announcement
- Mistral AI: Model documentation
- Mistral AI: Latest news (Voxtral Transcribe 2)
Frequently Asked Questions (FAQ)
Is GPT-6 released yet?
As of February 13, 2026, there is no official GPT-6 launch announcement in the sources linked on this page.
Is Llama 5 released?
As of February 13, 2026, Meta's latest official open-weight family is Llama 4; there is no Llama 5 announcement in the sources linked on this page.
Is Grok 5 released?
As of February 13, 2026, xAI's latest public releases include Grok 4.1 and fast variants; there is no Grok 5 launch announcement in the sources linked on this page.
What are the newest coding-focused models?
Recent coding-focused updates include OpenAI's GPT-5.3-Codex, Anthropic's Claude Opus 4.6, MiniMax M2.5, Zhipu's GLM-5, Mistral's Devstral 2, plus open-weight options like DeepSeek V3.2 and Qwen3-Coder-Next.
What should I use for cost-efficient coding?
MiniMax M2.5 at $1/hour and GLM-4.7 at $3/month are the most cost-efficient frontier coding options. For self-hosting, consider Mistral Large 3, Kimi K2.5, or GLM-5, all available as open weights.
What about European AI models?
Mistral AI leads Europe's open-source AI efforts with Mistral Large 3 (675B MoE), Ministral 3 (3B/8B/14B edge models), and Devstral 2 for coding. All released under Apache 2.0 license with strong performance and privacy focus.
What's the best model for multi-agent workflows?
Kimi K2.5's Agent Swarm technology can coordinate up to 100 specialized AI agents in parallel, achieving 4.5x faster execution on complex tasks compared to sequential approaches.
Bottom Line (Feb 2026)
Early 2026 is defined by three major shifts: cost disruption (MiniMax M2.5, GLM-4.7), multi-agent systems (Kimi K2.5's Agent Swarm), and geopolitical AI sovereignty (Chinese models on domestic chips, European open-source push).
The gap between open-weight and proprietary models continues to narrow. MiniMax M2.5 matches Claude Opus 4.6 on SWE-Bench at 1/10 the cost. GLM-5 challenges Gemini 3 Pro. Mistral Large 3 ranks #2 among open-source models. The era of expensive, closed AI may be ending faster than expected.
Next frontier: workflow integration, tool-use robustness, multi-modal agent coordination, and continued price/performance improvements from open-weight models.