TIES and DARE Merging

Merging the weights of two fine-tuned language models with a simple average often degrades performance on both tasks relative to the individual fine-tuned models. The culprit is not averaging itself; it is interference between the delta parameters each model has accumulated. TIES and DARE are two algorithmic responses to that interference, and together they underpin much of the model-merging work of 2023-2025.

The Task Vector Picture

To reason about merging, you need the concept of a task vector (introduced in Ilharco et al., ICLR 2023). Given a pre-trained model with weights \(\theta_0\) and a fine-tuned model with weights \(\theta_{ft}\), the task vector is simply:

\[\tau = \theta_{ft} - \theta_0\]
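Here is a minimal sketch of computing a task vector. It assumes each model's weights are available as a mapping from parameter name to a flat list of floats. That is a simplification, since real checkpoints store tensors. The helper name `task_vector` is hypothetical; the operation itself is just the element-wise difference in the formula above.

```python
# Hypothetical helper: task vectors over "state dicts" modeled as
# {parameter_name: flat list of floats}. Real checkpoints hold tensors.


def task_vector(theta_0, theta_ft):
    """Return tau = theta_ft - theta_0, computed parameter by parameter."""
    if theta_0.keys() != theta_ft.keys():
        raise ValueError("models must share the same parameter names")
    tau = {}
    for name, base in theta_0.items():
        ft = theta_ft[name]
        if len(ft) != len(base):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        tau[name] = [w_ft - w_0 for w_ft, w_0 in zip(ft, base)]
    return tau


theta_0 = {"layer.weight": [0.5, -0.25, 0.125]}
theta_ft = {"layer.weight": [0.75, -0.25, 0.0]}
tau = task_vector(theta_0, theta_ft)
print(tau)  # {'layer.weight': [0.25, 0.0, -0.125]}
```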

A merged model is then assembled as:

\[\theta_{merged} = \theta_0 + \lambda \cdot \sum_{i=1}^{n} \tau_i\]

where \(\lambda\) is a scaling coefficient and the sum runs over \(n\) fine-tuned models you want to combine. The arithmetic is seductive in its simplicity. The problem is that \(\tau_i\) and \(\tau_j\) will often disagree on the same weight: one pushes it positive, the other pushes it negative. Summing them cancels signal from both models. That is parameter interference.
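The sketch below makes the cancellation concrete. It applies the merge formula to two toy task vectors that disagree on one weight. Weights are flat lists, the helper `task_arithmetic_merge` is hypothetical, and \(\lambda = 1\) is an arbitrary choice for illustration.

```python
# Hypothetical helper: task-arithmetic merge over flat weight lists,
# theta_merged[j] = theta_0[j] + lam * sum_i tau_i[j].


def task_arithmetic_merge(theta_0, task_vectors, lam):
    return [
        base + lam * sum(tau[j] for tau in task_vectors)
        for j, base in enumerate(theta_0)
    ]


theta_0 = [1.0, 1.0]
tau_a = [0.5, 0.5]     # task A pushes both weights up
tau_b = [-0.5, 0.25]   # task B opposes A on weight 0, agrees on weight 1
merged = task_arithmetic_merge(theta_0, [tau_a, tau_b], lam=1.0)
print(merged)  # [1.0, 1.75]: weight 0 is back at theta_0, both updates cancelled
```

On weight 0 neither task's update survives. On weight 1 the updates reinforce each other.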

TIES: Trim, Elect Sign, Merge

TIES-Merging (Yadav et al., NeurIPS 2023) attacks interference with three sequential steps. The trim step operates on each task vector as a whole. Sign election and merging then operate on each weight position independently.

Step 1 - Trim. Most fine-tuned weights move very little from \(\theta_0\). Those small perturbations carry little task knowledge, yet they still add interference once they are summed with other models' deltas. TIES therefore trims each task vector: only its \(k\%\) largest-magnitude deltas are kept, and all others are set to zero. Typical values of \(k\) are 20 to 50 percent.
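A minimal sketch of the trim step on a single task vector stored as a flat list. The function name `trim_top_k` is hypothetical. Two details are choices made for this sketch, not taken from the TIES paper: the keep count is rounded up with `math.ceil`, and ties at the cut-off are broken by position. The whole task vector is treated as one list, so the threshold is computed across all of its entries at once.

```python
import math


def trim_top_k(tau, k_percent):
    """Keep the k% largest-magnitude entries of one task vector; zero the rest.

    Ties at the cut-off are broken by position (sorted() is stable even with
    reverse=True), an arbitrary choice made for this sketch.
    """
    n_keep = math.ceil(len(tau) * k_percent / 100)
    by_magnitude = sorted(range(len(tau)), key=lambda j: abs(tau[j]), reverse=True)
    keep = set(by_magnitude[:n_keep])
    return [value if j in keep else 0.0 for j, value in enumerate(tau)]


tau = [0.01, -0.9, 0.05, 0.4, -0.02, 0.003, 0.6, -0.04, 0.02, 0.1]
print(trim_top_k(tau, k_percent=30))
# [0.0, -0.9, 0.0, 0.4, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0]
```

Surviving entries keep their original values, signs included. Only the small ones are zeroed.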

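Step 2 - Elect sign. After trimming, each weight position \(j\) holds up to \(n\) non-zero deltas, one per model, and these may disagree in sign. TIES resolves this by electing a consensus sign for each position: whichever sign carries the greater total absolute magnitude wins. Writing \(\tau_i\) from here on for the trimmed task vectors, this is simply the sign of the summed deltas, because that sum equals the positive mass minus the negative mass: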

\[\hat{s}_j = \text{sign}\!\left(\sum_{i=1}^{n} \tau_i^{(j)}\right)\]
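Here is sign election sketched under the same flat-list assumption. The helper `elect_signs` is hypothetical. It returns 0 on an exact tie, following the usual convention \(\text{sign}(0) = 0\). This covers positions where every model's delta was trimmed away.

```python
def elect_signs(trimmed_vectors):
    """Consensus sign per weight position: the sign of the summed trimmed deltas.

    Because sum = (total positive mass) - (total negative mass), this picks
    whichever sign carries more total |delta|. Returns +1, -1, or 0; 0 occurs
    only on an exact tie, including positions where every delta was trimmed.
    """
    n_positions = len(trimmed_vectors[0])
    signs = []
    for j in range(n_positions):
        total = sum(tau[j] for tau in trimmed_vectors)
        signs.append((total > 0) - (total < 0))  # bool arithmetic gives +1, 0, or -1
    return signs


trimmed = [
    [-0.75, 0.0, 0.5, 0.0],    # model 1
    [0.25, -0.25, 0.0, 0.0],   # model 2
    [0.625, -0.5, 0.0, 0.0],   # model 3
]
print(elect_signs(trimmed))  # [1, -1, 1, 0]
```

At position 0, models 2 and 3 together carry 0.875 of positive mass against model 1's 0.75 of negative mass, so \(+1\) wins.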

Step 3 - Disjoint merge. Only the models whose delta at position \(j\) agrees with the elected sign \(\hat{s}_j\) contribute to the final value. The rest are masked out. The survivors are averaged:

\[\theta_{merged}^{(j)} = \theta_0^{(j)} + \lambda \cdot \frac{1}{|\mathcal{A}_j|} \sum_{i \in \mathcal{A}_j} \tau_i^{(j)}\]

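where \(\mathcal{A}_j\) is the set of models whose trimmed delta at position \(j\) is non-zero and has sign \(\hat{s}_j\). Trimmed (zero) entries are excluded, so they do not dilute the average.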
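A minimal sketch of the disjoint merge, again on flat lists, with a hypothetical helper `disjoint_merge`. The elected signs are hard-coded in the example so the block runs on its own. One behavior is an assumption of this sketch: a position whose elected sign is 0, for example one where every delta was trimmed, stays at \(\theta_0^{(j)}\).

```python
def disjoint_merge(theta_0, trimmed_vectors, signs, lam):
    """theta_0[j] + lam * mean of the non-zero deltas whose sign matches signs[j]."""
    merged = []
    for j, base in enumerate(theta_0):
        # d * s > 0 keeps only non-zero deltas on the elected side (the set A_j)
        aligned = [tau[j] for tau in trimmed_vectors if tau[j] * signs[j] > 0]
        mean_delta = sum(aligned) / len(aligned) if aligned else 0.0
        merged.append(base + lam * mean_delta)
    return merged


theta_0 = [1.0, 1.0, 1.0, 1.0]
trimmed = [
    [-0.75, 0.0, 0.5, 0.0],
    [0.25, -0.25, 0.0, 0.0],
    [0.625, -0.5, 0.0, 0.0],
]
signs = [1, -1, 1, 0]  # elected signs for these deltas, hard-coded for the example
print(disjoint_merge(theta_0, trimmed, signs, lam=1.0))
# [1.4375, 0.625, 1.5, 1.0]
```

At position 2 only model 1 contributes, so its delta is used at full strength rather than divided by 3.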

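The intuition: a weight that strongly wants to go up across several tasks should go up. A weight pulled in opposite directions by different tasks is a conflict. TIES resolves it in favor of the side with the larger total magnitude, which is not necessarily the side with more models, and averages only that side's deltas instead of letting the two sides cancel.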

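# Trace for 3 models at a single weight position j (lambda omitted)
delta = [-0.8, 0.3, 0.6]                              # trimmed deltas of model_1, model_2, model_3
total = sum(delta)                                    # about 0.1 (up to floating-point rounding)
elected_sign = (total > 0) - (total < 0)              # +1
aligned = [d for d in delta if d * elected_sign > 0]  # [0.3, 0.6]; model_1 is negative, excluded
merged_delta = sum(aligned) / len(aligned)            # about 0.45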
