AI Research 67% 1 min read Jul 3, 2026, 10:48 PM

Deepseek V4 Flash running on RTX 5090 MoE

30-second summary

Deepseek V4 Flash has been tuned and benchmarked on an RTX 5090 setup. The reported figures range from 22.7 down to 21.3 TG T/S and from 1105 down to 927 PP T/S; since both are throughput measures, the lower values are the slower results.

Key takeaways

Deepseek V4 Flash has been optimized for an RTX 5090 setup
The reported throughput ranges from 22.7 to 21.3 TG T/S and from 1105 to 927 PP T/S, where higher is faster
The user's setup includes an AMD Ryzen 9 9900X3D processor and DDR5 RAM
The MoE model was run without unified KV or memory mapping

Full story

A user has successfully optimized Deepseek V4 Flash for their setup, which includes an NVIDIA GeForce RTX 5090 and an AMD Ryzen 9 9900X3D processor.

The benchmark runs covered prompt processing at sizes from 8192 to 65536 tokens, with the MoE (Mixture of Experts) model run without unified KV or memory mapping. Across the reported results, both TG T/S and PP T/S decrease. Because these are tokens-per-second figures, the lower values mean slower generation and prompt processing, not a speedup.
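As a quick check on the quoted figures, the relative change within each pair can be computed directly. The sketch below is illustrative only: the helper name percent_change is hypothetical, and it assumes the first number in each pair is the baseline, which the summary does not state.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative means a drop)."""
    return (after - before) / before * 100.0


# Figures quoted in the summary. Treating the first value of each pair
# as the baseline is an assumption, not something the source confirms.
tg_change = percent_change(22.7, 21.3)
pp_change = percent_change(1105, 927)

print(f"TG T/S change: {tg_change:.1f}%")
print(f"PP T/S change: {pp_change:.1f}%")
```

Under that assumption, TG T/S drops by roughly 6% and PP T/S by roughly 16%.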

The hardware configuration includes an X870 AORUS ELITE WIFI7 motherboard, an AMD Ryzen 9 9900X3D processor (12 cores, 24 threads), and DDR5 RAM. The user also set n-cpu-moe 37, which controls how the MoE model is split between CPU and GPU. In llama.cpp, an option of this name keeps the expert weights of the first N layers on the CPU, though the summary does not name the inference software used.

These results show how Deepseek V4 Flash performs when tuned for one specific consumer hardware configuration, and how settings such as n-cpu-moe shape throughput on a single GPU.

Source: Deepseek V4 Flash running on RTX 5090 MoE. Read the full piece at the source.

Why this matters

Developers

Offloading settings such as n-cpu-moe affect how fast a large MoE model runs on a single consumer GPU

Everyone

Higher throughput makes it more practical to run large models locally on consumer hardware

Glossary

MoE: Mixture of Experts, a model architecture that routes each token to a small subset of expert sub-networks instead of running every parameter
TG T/S: Text Generation tokens per second, a measure of how fast the model produces output tokens
PP T/S: Prompt Processing tokens per second, a measure of how fast the model reads the input prompt
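To make the two throughput terms concrete, a tokens-per-second figure converts to wall-clock time by dividing a token count by the rate. The sketch below is a rough estimate, not a measurement: the helper name seconds_for is hypothetical, the 500-token reply length is made up for illustration, and pairing the 65536-token prompt with the 927 PP T/S figure is an assumption.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time needed to process `tokens` at a steady `tokens_per_second` rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Assumption: the largest prompt size (65536) goes with the lower PP figure (927).
prompt_seconds = seconds_for(65536, 927)

# Hypothetical 500-token reply at the lower TG figure (21.3).
reply_seconds = seconds_for(500, 21.3)

print(f"Prompt processing: {prompt_seconds:.1f} s")
print(f"Text generation:   {reply_seconds:.1f} s")
```

With these inputs, reading the prompt takes about 71 seconds and generating the reply about 23 seconds, which shows why PP T/S dominates the wait at long context lengths.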

Sources · 1

Deepseek V4 Flash running on RTX 5090 MoE
