Nous Research has officially unveiled Hermes 4, a powerful new family of open-weight AI models available in 14B, 70B, and 405B parameter sizes. The 70B and 405B models are built on Llama 3.1 checkpoints, while the 14B model is built on Qwen3 14B. What makes Hermes 4 special is that it reaches frontier-level performance using only post-training techniques: no secret data, no closed methods.
One of its standout features is hybrid reasoning. These models can switch between standard responses and explicit reasoning using special tags whenever a problem requires deeper deliberation. This gives Hermes 4 the ability to handle complex tasks more intelligently while still providing concise answers when needed.
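As a minimal sketch of what consuming a hybrid-reasoning response could look like, the snippet below separates the reasoning trace from the final answer. It assumes the reasoning is wrapped in `<think>...</think>` tags; the tag name and the `split_reasoning` helper are assumptions for illustration, since this post only says "special tags".

```python
import re

# Assumed tag name; the post itself only mentions "special tags".
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model response.

    If the response has no reasoning block, reasoning is an empty string.
    """
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first reasoning block is treated as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
```

A response without a reasoning block passes through unchanged, so the same helper would work for both concise and deliberate answers.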
What's even more impressive is that Hermes 4 sets a new benchmark among open-weight models, delivering state-of-the-art performance while staying completely transparent and following a neutral alignment philosophy. In other words, it proves that cutting-edge reasoning capabilities can be developed entirely through open-source methods, a huge step forward for the AI community.
DataForge: Graph-Based Synthetic Data Generation
At the heart of Hermes 4 lies DataForge, a groundbreaking system that's changing the way training data is created. Unlike traditional methods where datasets are manually curated, DataForge takes a graph-based approach to synthetic data generation, making the entire process far more powerful and flexible.
Here's how it works: DataForge is built on a directed acyclic graph (DAG), where each node represents a specific action defined through PDDL (Planning Domain Definition Language). Every node comes with its own preconditions, postconditions, and transformations, which together enable the automatic creation of complex data pipelines.
By leveraging pre-training seed data from sources like DCLM and FineWeb, DataForge can do some truly impressive things. For example, it can take a Wikipedia article, transform it into a rap song, and then generate instruction-answer pairs based on that transformation.
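The node contract and the article-to-rap example can be sketched roughly as follows. This is not DataForge's actual code: the `Node` class, `run_graph` function, state keys, and string edits are all hypothetical stand-ins (a real node would be a PDDL action and would call a language model). Only the idea of checking preconditions and postconditions around each transformation, in DAG order, comes from the description above.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable

State = dict  # one sample's fields, e.g. {"format": "article", "text": "..."}


@dataclass
class Node:
    name: str
    precondition: Callable[[State], bool]   # must hold before the node runs
    transform: Callable[[State], State]     # produces the new sample state
    postcondition: Callable[[State], bool]  # must hold after the node runs


def run_graph(nodes: dict[str, Node], deps: dict[str, set[str]], state: State) -> State:
    """Run nodes in a topological order of the DAG, enforcing each contract."""
    for name in TopologicalSorter(deps).static_order():
        node = nodes[name]
        if not node.precondition(state):
            raise ValueError(f"{name}: precondition failed")
        state = node.transform(state)
        if not node.postcondition(state):
            raise ValueError(f"{name}: postcondition failed")
    return state


# Hypothetical nodes; plain string edits stand in for model calls.
nodes = {
    "article_to_rap": Node(
        name="article_to_rap",
        precondition=lambda s: s["format"] == "article",
        transform=lambda s: {**s, "format": "rap", "text": "Yo, " + s["text"]},
        postcondition=lambda s: s["format"] == "rap",
    ),
    "rap_to_instruction": Node(
        name="rap_to_instruction",
        precondition=lambda s: s["format"] == "rap",
        transform=lambda s: {
            **s,
            "format": "qa",
            "instruction": "Summarize this rap.",
            "answer": s["text"],
        },
        postcondition=lambda s: "instruction" in s and "answer" in s,
    ),
}

# Each key maps a node to the set of nodes that must run before it.
deps = {"article_to_rap": set(), "rap_to_instruction": {"article_to_rap"}}

sample = run_graph(nodes, deps, {"format": "article", "text": "Paris is in France."})
```

Because each node declares what it needs and what it guarantees, new pipelines can in principle be assembled by chaining any nodes whose postconditions satisfy the next node's preconditions.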
This innovative system produces around 5 million samples containing roughly 19 billion tokens, or about 3,800 tokens per sample on average. Notably, reasoning-focused samples are intentionally token-heavy, averaging five times more tokens than regular samples, to handle detailed thinking traces of up to 16,000 tokens.
In short, DataForge is what gives Hermes 4 its edge, enabling smarter, richer, and more context-aware reasoning by revolutionizing how training data is synthesized.

Rejection Sampling at Unprecedented Scale
Hermes 4 relies on Atropos, Nous Research's open-source reinforcement learning environment, to deliver its advanced reasoning capabilities. Atropos plays a crucial role by implementing rejection sampling across nearly 1,000 different task-specific verifiers, ensuring that the model learns only from high-quality reasoning trajectories spread across a variety of domains.
Some of the key verification environments used in this process include:
Answer Format Training: rewards the model for maintaining correct formatting across 150+ output formats.
Instruction Following: leverages RLVR-IFEval tasks to handle complex instructions and constraints more effectively.
Schema Adherence: ensures accurate JSON generation by validating outputs against Pydantic models.
Tool Use Training: teaches the model how to demonstrate agent-like behavior when using external tools.
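To make the schema-adherence idea concrete, here is a small sketch of a verifier that scores a model output by whether it parses as JSON and matches an expected shape. The source describes validation against Pydantic models; to stay dependency-free, this example swaps in a hand-written type check using only the standard library. The `SCHEMA` dict and the `schema_reward` function are hypothetical and are not part of Atropos.

```python
import json

# Hypothetical target schema: field name -> required Python type.
SCHEMA = {"name": str, "age": int, "tags": list}


def schema_reward(raw_output: str, schema: dict = SCHEMA) -> float:
    """Return 1.0 if raw_output is a JSON object matching schema, else 0.0."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    # Require a JSON object with exactly the expected field names.
    if not isinstance(data, dict) or set(data) != set(schema):
        return 0.0
    for field, expected_type in schema.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected_type is int and isinstance(value, bool):
            return 0.0
        if not isinstance(value, expected_type):
            return 0.0
    return 1.0


good = schema_reward('{"name": "Ada", "age": 36, "tags": ["math"]}')
bad = schema_reward('{"name": "Ada", "age": "36", "tags": []}')
```

A binary reward like this is easy to plug into rejection sampling: outputs that score 0.0 are simply discarded.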
Through this extensive rejection sampling process, Atropos builds a large corpus of verified reasoning trajectories, often creating multiple unique solution paths that lead to the same correct result. This approach ensures that Hermes 4 learns robust reasoning strategies rather than simply memorizing fixed templates.
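The rejection sampling loop itself can be illustrated with a toy example. This is a sketch under stated assumptions, not Atropos code: `rejection_sample`, `toy_generate`, and `toy_verify` are hypothetical names, and a canned list of solutions stands in for sampling from a model.

```python
import itertools
from typing import Callable


def rejection_sample(
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    prompt: str,
    num_candidates: int = 8,
) -> list[str]:
    """Draw candidates for a prompt; keep the unique ones the verifier accepts."""
    accepted: list[str] = []
    for _ in range(num_candidates):
        candidate = generate(prompt)
        if verify(prompt, candidate) and candidate not in accepted:
            accepted.append(candidate)
    return accepted


# Toy stand-ins: two correct solution paths and one incorrect one.
CANNED = [
    "7 * 6 = 42. Answer: 42",
    "7 * 6 = 7 * 5 + 7 = 35 + 7 = 42. Answer: 42",
    "7 * 6 = 48. Answer: 48",
]
_canned_cycle = itertools.cycle(CANNED)


def toy_generate(prompt: str) -> str:
    # A real generator would sample from the model; this cycles canned text.
    return next(_canned_cycle)


def toy_verify(prompt: str, candidate: str) -> bool:
    # Task-specific check: only the final answer is verified.
    return candidate.endswith("Answer: 42")


kept = rejection_sample(toy_generate, toy_verify, "What is 7 * 6?", num_candidates=6)
```

Here `kept` ends up holding both correct derivations and drops the wrong one, which mirrors the point above: several distinct solution paths to the same verified result can all enter the training corpus.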
In essence, Atropos acts as the quality gatekeeper for Hermes 4, shaping it into a more reliable, adaptive, and intelligent reasoning system.