AI Research · 67% · 1 min read · Jul 3, 2026, 10:48 PM

Deepseek V4 Flash running on RTX 5090 MoE

30-second summary

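A user has benchmarked a tuned Deepseek V4 Flash configuration on an RTX 5090 setup. The reported figures go from 22.7 to 21.3 T/S for token generation (TG) and from 1105 to 927 T/S for prompt processing (PP). Both metrics measure throughput, so the lower values indicate slower processing, not an improvement; the summary does not say which benchmark settings produced each figure.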

Key takeaways
  • Deepseek V4 Flash has been optimized for an RTX 5090 setup
  • Reported throughput went from 22.7 to 21.3 T/S for token generation (TG) and from 1105 to 927 T/S for prompt processing (PP); lower values mean slower processing
  • The user's setup includes an AMD Ryzen 9 9900X3D processor and DDR5 RAM
  • The MoE model was run with unified KV and memory mapping both disabled
Full story

A user has successfully optimized Deepseek V4 Flash for their setup, which includes an NVIDIA GeForce RTX 5090 and an AMD Ryzen 9 9900X3D processor.

The optimization process involved running benchmarks with several settings, including prompt-processing runs with token counts from 8192 to 65536, and running the MoE (Mixture of Experts) model with unified KV and memory mapping disabled. The reported figures go from 22.7 to 21.3 T/S for token generation and from 1105 to 927 T/S for prompt processing. Higher tokens per second is better, so these reductions represent lower throughput, not a speedup.
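To put the reported figures in proportion, the relative change in each metric can be worked out directly. The sketch below uses only the four numbers quoted in this summary; the function name is illustrative, and nothing here says which benchmark setting produced which figure.

```python
def relative_change(before: float, after: float) -> float:
    """Return the percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Figures quoted in the summary (tokens per second).
tg_change = relative_change(22.7, 21.3)   # token generation
pp_change = relative_change(1105, 927)    # prompt processing

print(f"TG: {tg_change:.1f}%")  # TG: -6.2%
print(f"PP: {pp_change:.1f}%")  # PP: -16.1%
```

A negative result means throughput went down, which is why these numbers read as a slowdown of roughly 6% for generation and 16% for prompt processing.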

The specific hardware configuration includes an X870 AORUS ELITE WIFI7 motherboard, an AMD Ryzen 9 9900X3D processor (12 cores, 24 threads), and DDR5 RAM. The user also specified n-cpu-moe 37. The name matches llama.cpp's --n-cpu-moe option, which keeps the MoE expert weights of the first N layers in CPU memory so that the rest of the model fits on the GPU.
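As a rough illustration of what a setting like n-cpu-moe 37 implies, the sketch below splits a model's layers into those whose expert weights stay on the CPU and those whose expert weights go to the GPU. It assumes llama.cpp-style semantics (the first N layers keep their experts on the CPU); the source does not name the runtime. The layer count is a hypothetical placeholder, since the summary does not give one for Deepseek V4 Flash, and the function name is made up for this example.

```python
def split_expert_placement(n_layers: int, n_cpu_moe: int) -> dict[str, list[int]]:
    """Return layer indices whose expert weights live on the CPU vs. the GPU.

    Assumes the first `n_cpu_moe` layers keep their expert weights on the CPU.
    Non-expert weights (for example attention) are not modelled here.
    """
    if n_layers < 0 or n_cpu_moe < 0:
        raise ValueError("layer counts must be non-negative")
    cut = min(n_cpu_moe, n_layers)  # cannot offload more layers than exist
    return {"cpu": list(range(cut)), "gpu": list(range(cut, n_layers))}


# Hypothetical layer count for illustration only; not taken from the source.
HYPOTHETICAL_LAYER_COUNT = 48

placement = split_expert_placement(HYPOTHETICAL_LAYER_COUNT, 37)
print(len(placement["cpu"]), "layers with experts on CPU")
print(len(placement["gpu"]), "layers with experts on GPU")
```

Raising the value moves more expert weights to system RAM, which frees GPU memory at the cost of more work on the slower CPU side; lowering it does the opposite.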

These results show how settings such as n-cpu-moe can be tuned to fit Deepseek V4 Flash to a specific hardware configuration, and how much throughput varies across the benchmark runs.

Source: Deepseek V4 Flash running on RTX 5090 MoE. Read the full piece at the source.

Why this matters
Developers

Runtime settings such as CPU offload of MoE experts and memory mapping affect how fast a large model runs on a given machine

Everyone

Improved performance can lead to more efficient processing of large datasets

Glossary
MoE
Mixture of Experts, a model architecture that routes each token to a small subset of expert sub-networks, so only part of the model's parameters is active for any one token
TG T/S
Token generation tokens per second, a measure of how quickly the model produces output tokens
PP T/S
Prompt processing tokens per second, a measure of how quickly the model reads and processes the input prompt
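Both T/S metrics convert to wall-clock time by dividing a token count by the rate. The minimal sketch below applies that to an 8192-token prompt at the two prompt-processing rates quoted in this summary. The pairing of prompt size and rate is for illustration only and is not claimed to match any specific benchmark run.

```python
def seconds_for_tokens(n_tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds needed to process `n_tokens` at a given rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


# An 8192-token prompt at the two PP rates quoted in the summary.
fast = seconds_for_tokens(8192, 1105)
slow = seconds_for_tokens(8192, 927)

print(f"{fast:.2f} s at 1105 T/S")  # 7.41 s at 1105 T/S
print(f"{slow:.2f} s at 927 T/S")   # 8.84 s at 927 T/S
```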
TickrWire

AI news intelligence. We aggregate, verify, summarise and explain the latest artificial intelligence news from open, legal sources.

© 2026 TickrWire. Summaries and analysis are AI-generated and may contain errors.