Implement mHC (Manifold-Constrained Hyper-Connections) for stabilizing deep network training. Use when implementing residual connection improvements with doubly stochastic matrices via Sinkhorn-Knopp algorithm. Based on DeepSeek's 2025 paper (arXiv:2512.24880).
## Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
```bash
skills list
```

## Skill Instructions
---
name: mhc-algorithm
description: Implement mHC (Manifold-Constrained Hyper-Connections) for stabilizing deep network training. Use when implementing residual connection improvements with doubly stochastic matrices via Sinkhorn-Knopp algorithm. Based on DeepSeek's 2025 paper (arXiv:2512.24880).
---
# mHC: Manifold-Constrained Hyper-Connections
## Overview
mHC (Manifold-Constrained Hyper-Connections) stabilizes deep network training by constraining residual mixing matrices to be doubly stochastic. It provides:
- Stable Training: Lower gradient norm variance via doubly stochastic constraints
- Multiple Streams: Hyper-Connections with learnable mixing across residual streams
- Sinkhorn Projection: Log-space Sinkhorn-Knopp algorithm for doubly stochastic projection
- GPT Integration: Pattern for wrapping attention and MLP layers
Two components:
- HyperConnections Module: Core PyTorch module with H_res, H_pre, H_post matrices
- Sinkhorn-Knopp: Log-space projection to doubly stochastic manifold
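Put together, one mHC layer applies the update below (a sketch matching the minimal example that follows, up to transposition conventions, with the $n$ stream states stacked as rows of $x^{(l)}$):

$$
x^{(l+1)} = H_{\text{res}}\, x^{(l)} + H_{\text{post}}^{\top} f\!\left(H_{\text{pre}}\, x^{(l)}\right),
\qquad H_{\text{res}} = \operatorname{Sinkhorn}\!\left(\Theta_{\text{res}} / \tau\right)
$$

Here $H_{\text{res}} \in \mathbb{R}^{n \times n}$ is doubly stochastic (every row and column sums to 1), $H_{\text{pre}}, H_{\text{post}} \in \mathbb{R}^{1 \times n}$ are softmax-normalized, $\Theta_{\text{res}}$ denotes the learnable logits with temperature $\tau$, and $f$ is the wrapped branch (attention or MLP).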
## Quick Reference
| Topic | Reference |
|---|---|
| Core Concepts & Math | Core Concepts |
| Sinkhorn Algorithm | Sinkhorn-Knopp |
| HyperConnections Module | Module Implementation |
| GPT Integration | GPT Integration |
| Common Pitfalls | Pitfalls |
## Installation
```bash
# Required packages
pip install torch einops numpy
```
## Minimal Example
```python
import torch
import torch.nn as nn
from einops import rearrange, einsum


def sinkhorn_knopp(logits, num_iters=20, tau=0.05):
    """Project temperature-scaled logits onto the doubly stochastic manifold.

    Runs Sinkhorn-Knopp in log space for numerical stability: alternately
    normalize rows and columns until both (approximately) sum to 1.
    """
    log_alpha = logits / tau
    for _ in range(num_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return torch.exp(log_alpha)


class HyperConnections(nn.Module):
    def __init__(self, num_streams, dim, branch=None, layer_idx=0):
        super().__init__()
        # `dim` is unused in this minimal example (kept for a fuller API).
        self.num_streams = num_streams
        self.branch = branch  # wrapped layer (attention, MLP, ...); identity if None
        # Initialize H_res near the identity: logits of -8.0 off-diagonal and
        # 0.0 on the diagonal, so the Sinkhorn projection starts as near-identity
        # mixing and the module behaves like a plain residual at init.
        init_h_res = torch.full((num_streams, num_streams), -8.0)
        init_h_res.fill_diagonal_(0.0)
        self.H_res_logits = nn.Parameter(init_h_res)
        # H_pre initially selects one stream per layer (rotating with depth) to
        # feed the branch; H_post starts uniform, spreading the branch output
        # evenly across streams.
        init_h_pre = torch.full((1, num_streams), -8.0)
        init_h_pre[0, layer_idx % num_streams] = 0.0
        self.H_pre_logits = nn.Parameter(init_h_pre)
        self.H_post_logits = nn.Parameter(torch.zeros(1, num_streams))

    def forward(self, x):
        # x: (batch * num_streams, seq, dim) -- streams are folded into batch
        s = self.num_streams
        x = rearrange(x, "(b s) t d -> b t s d", s=s)
        # Residual mixing across streams with a doubly stochastic matrix
        h_res = sinkhorn_knopp(self.H_res_logits)
        x_mixed = einsum(h_res, x, "s t, b n s d -> b n t d")
        # Read a convex combination of the streams into the branch
        h_pre = self.H_pre_logits.softmax(dim=-1)
        branch_in = einsum(h_pre, x, "v s, b n s d -> b n v d").squeeze(-2)
        branch_out = self.branch(branch_in) if self.branch is not None else branch_in
        # Write the branch output back to every stream, weighted by H_post
        h_post = self.H_post_logits.softmax(dim=-1)
        depth_out = einsum(branch_out, h_post, "b t d, v s -> b t s d")
        output = x_mixed + depth_out
        return rearrange(output, "b t s d -> (b s) t d")
```
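A quick smoke test of the module above; the shapes and the MLP branch are arbitrary choices for illustration:

```python
# Hypothetical smoke test (shapes chosen arbitrarily).
batch, num_streams, seq, dim = 2, 4, 16, 64
mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
hc = HyperConnections(num_streams, dim, branch=mlp, layer_idx=0)

# Streams are folded into the batch dimension: (batch * num_streams, seq, dim)
x = torch.randn(batch * num_streams, seq, dim)
print(hc(x).shape)  # torch.Size([8, 16, 64])

# The projected H_res is (approximately) doubly stochastic:
h_res = sinkhorn_knopp(hc.H_res_logits)
print(h_res.sum(dim=-1))  # rows    ~[1., 1., 1., 1.]
print(h_res.sum(dim=-2))  # columns ~[1., 1., 1., 1.]
```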
## Common Imports

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange, einsum, repeat, reduce
```
## When to Use What
| Scenario | Approach |
|---|---|
| Standard residual connection | No mHC needed |
| Deep networks (>12 layers) with stability issues | Use mHC with num_streams=4 |
| GPT/Transformer training | Wrap both attention and MLP with HyperConnections (see the sketch after this table) |
| Custom Sinkhorn iterations | Adjust num_iters (20 default) and tau (0.05 default) |
| Memory-constrained training | Reduce num_streams or batch size |
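A sketch of the GPT-integration pattern referenced in the table. The `AttentionBranch` and `Block` names, the pre-norm placement, the staggered `layer_idx`, and the stream fold/unfold at the model boundary are illustrative assumptions, not fixed by the paper; causal masking is omitted for brevity:

```python
class AttentionBranch(nn.Module):
    """Pre-norm self-attention packaged as a single-input branch f(x) -> x.

    A real GPT would also pass a causal attn_mask to nn.MultiheadAttention.
    """

    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return out


class Block(nn.Module):
    """Transformer block with attention and MLP each wrapped in HyperConnections."""

    def __init__(self, dim, num_heads, num_streams=4, layer_idx=0):
        super().__init__()
        mlp = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )
        self.hc_attn = HyperConnections(
            num_streams, dim, branch=AttentionBranch(dim, num_heads), layer_idx=layer_idx
        )
        # layer_idx + 1 staggers the initially selected stream between the two
        # sublayers (a design choice, not mandated by the paper).
        self.hc_mlp = HyperConnections(num_streams, dim, branch=mlp, layer_idx=layer_idx + 1)

    def forward(self, x):
        # x keeps streams folded into batch: (batch * num_streams, seq, dim)
        x = self.hc_attn(x)
        return self.hc_mlp(x)


# At the model boundary, expand token embeddings into streams before the first
# block and average the streams back after the last one:
#   x = repeat(emb, "b t d -> (b s) t d", s=num_streams)        # entry
#   y = reduce(x, "(b s) t d -> b t d", "mean", s=num_streams)  # exit
```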
## External Resources
- mHC Paper: https://arxiv.org/abs/2512.24880
- Hyper-Connections: https://arxiv.org/abs/2409.19606
- Sinkhorn's Theorem: https://en.wikipedia.org/wiki/Sinkhorn%27s_theorem