Blog - AI Ascend

Jul 10, 2026

The Lethal Trifecta: Why Prompt Injection Is Structural in Tool-Using Agents

The protocols that let agents read your email, query your database, and post to Slack were the most-adopted infrastructure in AI over the past year. They also handed attackers a way in that no input filter can close, because the model cannot tell an instruction from the data it is reading.

23 min read

Jul 06, 2026

Agentic Reinforcement Learning: Training Models to Act, Not Just Answer

RLHF taught a model to answer one question well. Agentic RL asks a harder thing: take fifty actions in a live environment, most of them invisible in the final reward, and still learn which ones mattered. That single change, from one step to fifty, rewrites the whole training problem.

24 min read

Jul 05, 2026

Context Engineering for Long-Horizon Agents: Managing the Only Resource That Runs Out

An agent that can act for eight hours still thinks inside a window that holds a few hundred thousand tokens. The gap between those two numbers is where most agent failures now live, and closing it has become its own engineering discipline.

26 min read

Jul 04, 2026

Titans: The Sequence Architecture That Learns to Remember While It Runs

Most recurrent long-context models compress the past into a fixed-size state and hope nothing important got squeezed out. Titans instead gives the model a small neural network as its memory, and lets that network keep training on the data it is currently reading.

27 min read

Jul 04, 2026

Multi-Head Latent Attention: How DeepSeek Compressed the KV Cache Without Losing Quality

Every token an LLM generates forces it to reload the keys and values of every token that came before. Multi-Head Latent Attention rewrites that trade by caching one compressed latent vector per token instead of separate keys and values for dozens of heads, cutting KV-cache memory by more than 90 percent without paying the usual quality tax.

24 min read

Jul 03, 2026

Context Rot: Why Bigger Context Windows Don't Mean Better Retrieval

A million-token window promises perfect recall of everything you feed it. Controlled tests on 18 frontier models show recall degrading steadily, unevenly, and well before the window fills, a pattern researchers now call context rot.

23 min read

Jul 01, 2026

Orthogonalizing Memory Reads: A Muon Trick for Noisy Recurrent Recall

A short experimental note claims that orthogonalizing an mLSTM's matrix memory at read time, borrowing Muon's Newton-Schulz step, sharpens noisy associative recall exactly where the baseline is failing. We walk through the mechanism and check every claim against the primary literature and the code.

15 min read

Jul 01, 2026

Patches Over Tokens: How the Byte Latent Transformer Kills the Tokenizer

A tokenizer decides in advance how much compute every piece of text deserves. The Byte Latent Transformer throws that decision out and lets the entropy of the raw bytes allocate compute instead, matching Llama 3 at 8B parameters while using up to 50 percent fewer inference FLOPs.

23 min read

Jun 30, 2026

Muon and MuonClip: The Optimizer That Broke Adam's Monopoly on LLM Pretraining

For eight years Adam owned every serious pretraining run. Then a competitive-speedrun optimizer that orthogonalizes its own gradient updates scaled from a 124M-parameter toy to a trillion-parameter model with zero loss spikes. Here is how Muon works, and why it took a clipping trick to survive at scale.

23 min read

The AI Ascend blog.
