$ stat ./projects/gssr.md
Title: Goal-Conditioned State Space Reasoning (GSSR)
Date: 2/1/2026
Description: This project explores test-time adaptation of hybrid State Space Models (SSMs) by injecting goal-conditioned perturbations directly into the recurrent latent states during inference. The goal is to steer model behavior toward desired outcomes without any weight updates and create a lightweight form of test-time fine-tuning that leverages the natural recurrence of SSMs.
(( Open on GitHub ))
AGI is hiding in latent space.
I hypothesize that test-time adaptation of hybrid State Space Models (SSMs), such as those in the IBM Granite-4.0 series, can be achieved through goal-conditioned perturbations applied to recurrent latent states during inference, through merging cached states prior to inference, or through some combination of the two. This approach could enable dynamic steering of model outputs toward desired behaviors without updating model parameters, demonstrating a lightweight form of test-time fine-tuning that preserves training stability while improving adaptability to specific tasks or domains.
Early results show clear steering effects but also inherent instability. Random perturbations already change behavior noticeably, suggesting that latent state manipulation is a viable control channel.
As a first step toward goal-conditioned latent steering (part of my research into GSSR), I hacked on IBM's Granite-4.0 hybrid model (Mamba-2 + attention), located the recurrent SSM states in the late layers, and perturbed them mid-generation with a simple random nudge. The output changed immediately: from clean answers to repetition lock-in, or foreign-code gibberish at higher strengths.
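For reference, the mid-generation nudge amounts to something like the following. This is a minimal sketch: it only assumes a cache object exposing .ssm_states as a list of per-layer tensors (as HybridMambaAttentionDynamicCache does); the layer selection and strength are illustrative, not tuned values.

```python
import torch

def perturb_ssm_states(cache, layer_indices, strength=0.1):
    """Add Gaussian noise to selected recurrent SSM states in place.

    `cache` is assumed to expose `.ssm_states`, a list of per-layer
    tensors. Attention layers hold empty placeholder tensors and are
    skipped.
    """
    for i in layer_indices:
        state = cache.ssm_states[i]
        if state.numel() == 0:  # attention layer placeholder, no SSM state
            continue
        cache.ssm_states[i] = state + strength * torch.randn_like(state)
```

Called between decode steps (e.g. from a custom generation loop), this is enough to reproduce the clean-to-gibberish transition as strength increases.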
Building a Bayesian GSSR Goal Network
Continuing the work from Hacking Granite-4.0-Hybrid (above), this write-up walks through creating and conditioning a goal state that is applied as perturbations to selected states in a hybrid transformer-SSM model, steering model outputs in pursuit of a viable test-time fine-tuning strategy.
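To make the shape of the goal network concrete: a minimal sketch of a Goal Encoder feeding a GRU whose rollout is projected to an SSM-state-sized perturbation. The architecture, dimensions, and names below are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn

class GoalPerturbationNet(nn.Module):
    """Encode a goal vector, roll it through a GRU, and emit one
    flattened SSM-state-sized perturbation per rollout step."""

    def __init__(self, goal_dim, hidden_dim, state_numel):
        super().__init__()
        self.goal_encoder = nn.Sequential(
            nn.Linear(goal_dim, hidden_dim), nn.Tanh())
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_numel)

    def forward(self, goal_vec, steps=1):
        # goal_vec: (batch, goal_dim) -> (batch, steps, state_numel)
        h = self.goal_encoder(goal_vec)            # (batch, hidden)
        seq = h.unsqueeze(1).repeat(1, steps, 1)   # same goal at each step
        out, _ = self.gru(seq)
        return self.head(out)
```

At inference, each step's output is reshaped to the cached state shape (e.g. [1, 48, 32, 128]) and added, scaled, to the recurrent state; a KL penalty against the unsteered output distribution can regularize how hard the steering pushes.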
Goal-Conditioned State Caching
...
Inspecting HybridMambaAttentionDynamicCache attributes...
Cache attributes:
.conv_states: type=<class 'list'>
length=32, first item type=<class 'torch.Tensor'>
example shape=torch.Size([1, 1792, 4]), mean=-0.0420
.get_mask_sizes: type=<class 'method'>
.get_seq_length: type=<class 'method'>
.has_previous_state: type=<class 'bool'>
.is_compileable: type=<class 'bool'>
.key_cache: type=<class 'list'>
length=32, first item type=<class 'torch.Tensor'>
example shape=torch.Size([1, 0]), mean=nan
.layers_block_type: type=<class 'list'>
length=32, first item type=<class 'str'>
.reorder_cache: type=<class 'method'>
.ssm_states: type=<class 'list'>
length=32, first item type=<class 'torch.Tensor'>
example shape=torch.Size([1, 48, 32, 128]), mean=0.0000
.transformer_layers: type=<class 'list'>
length=4, first item type=<class 'int'>
.update: type=<class 'method'>
.value_cache: type=<class 'list'>
length=32, first item type=<class 'torch.Tensor'>
example shape=torch.Size([1, 0]), mean=nan
Found 'ssm_states': type=<class 'list'>, length=32
Mamba state 0: shape=torch.Size([1, 48, 32, 128]), mean=0.0000
Mamba state 1: shape=torch.Size([1, 48, 32, 128]), mean=0.0000
Mamba state 2: shape=torch.Size([1, 48, 32, 128]), mean=0.0000
Mamba state 3: shape=torch.Size([1, 48, 32, 128]), mean=-0.0000
Mamba state 4: shape=torch.Size([1, 48, 32, 128]), mean=0.0001
Mamba state 5: shape=torch.Size([1, 48, 32, 128]), mean=0.0011
Mamba state 6: shape=torch.Size([1, 48, 32, 128]), mean=0.0009
Mamba state 7: shape=torch.Size([1, 48, 32, 128]), mean=0.0012
Mamba state 8: shape=torch.Size([1, 48, 32, 128]), mean=0.0006
Mamba state 9: shape=torch.Size([1, 48, 32, 128]), mean=0.0001
Mamba state 10: shape=torch.Size([1, 0]), mean=nan
...
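The dump above comes from a small reflection helper along these lines. This is a sketch: it assumes only that the tensor-list attributes live directly on the cache object, and the print format approximates the log.

```python
import torch

def inspect_cache(cache):
    """Print each public cache attribute; for lists of tensors, also
    show length, first-item type, example shape, and mean (nan when
    the tensor is empty, as for attention-layer placeholders)."""
    print("Cache attributes:")
    for name in dir(cache):
        if name.startswith("_"):
            continue
        attr = getattr(cache, name)
        print(f".{name}: type={type(attr)}")
        if isinstance(attr, list) and attr and isinstance(attr[0], torch.Tensor):
            t = attr[0]
            mean = t.float().mean().item() if t.numel() else float("nan")
            print(f"    length={len(attr)}, first item type={type(t)}")
            print(f"    example shape={t.shape}, mean={mean:.4f}")
```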
Unsteered Output and Steered Output
Encoding Goal: 'Achieve maximum energy efficiency and system stability.'
Starting Training Step...
DEBUG tensor inputs_embeds: shape=torch.Size([1, 30, 768]), device=cuda:0, dtype=torch.bfloat16, mean=0.000595
DEBUG tensor inputs_embeds after concat: shape=torch.Size([1, 38, 768]), device=cuda:0, dtype=torch.bfloat16, mean=0.000496
Step 1 | Loss: 6.2888 | KL: 1072.0000
DEBUG tensor inputs_embeds: shape=torch.Size([1, 31, 768]), device=cuda:0, dtype=torch.bfloat16, mean=0.000273
DEBUG tensor inputs_embeds after concat: shape=torch.Size([1, 39, 768]), device=cuda:0, dtype=torch.bfloat16, mean=0.000313
Step 2 | Loss: 19.7368 | KL: 1328.0000
DEBUG tensor inputs_embeds: shape=torch.Size([1, 30, 768]), device=cuda:0, dtype=torch.bfloat16, mean=0.000362
DEBUG tensor inputs_embeds after concat: shape=torch.Size([1, 38, 768]), device=cuda:0, dtype=torch.bfloat16, mean=0.000370
Step 3 | Loss: 56.1229 | KL: 1528.0000
Computing Gradients...
Success: Gradients successfully propagated to Goal Encoder and GRU.
Goal Encoder Grad Norm: 0.000000
GRU Grad Norm: 1.828125
Testing Plan Generation (Latent Rollout)...
Generated Plan Sample:
------------------------------
Plan to achieve goal:
Step 1: 2
>7
) and their value for 3. 2
)
Step 2: during that this and the process, you are using.
The pipeline operates on past_key_values (the cached states) through four functions: _prepare_inputs, summarize_and_capture_state, generate, and generate_using_cache. All operations are external; no model weights are modified.
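For the cache-merging path, the core operation can be sketched as a per-layer interpolation between a previously captured goal-conditioned cache and the current prompt's cache. This is a sketch under the same assumption as before (a cache exposing .ssm_states), and alpha is an illustrative mixing weight, not a tuned value.

```python
import torch

def merge_ssm_states(current_cache, goal_cache, alpha=0.3):
    """Interpolate each recurrent state toward a goal-conditioned state.

    alpha=0 leaves the current cache unchanged; alpha=1 replaces it with
    the goal state. Attention-layer placeholders and shape mismatches
    (e.g. from different prompt lengths) are skipped.
    """
    for i, (cur, goal) in enumerate(zip(current_cache.ssm_states,
                                        goal_cache.ssm_states)):
        if cur.numel() == 0 or cur.shape != goal.shape:
            continue
        current_cache.ssm_states[i] = (1 - alpha) * cur + alpha * goal
```

Merging before generation starts, rather than perturbing mid-generation, avoids disturbing a rollout that is already underway.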
If you're working on SSMs, test-time adaptation, latent steering, Bayesian filtering at inference, or goal-conditioned generation, I'd love to collaborate or hear your thoughts.
Open to PRs, ideas, and/or discussions.
# Install deps
uv add torch transformers
# Run test/training script
uv run main.py
uv run gen.py
uv run test_bench.py
uv run train.py
LLM-Integrated Bayesian State Space Models for Multimodal Time-Series Forecasting
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
LaGarNet: Goal-Conditioned Recurrent State-Space Models for Pick-and-Place Garment Flattening
Act2Goal: From World Model To General Goal-conditioned Policy
Feel free to open issues, email me at wallscreet@proton.me or DM @wallscreet on X with questions/ideas.
Finding related projects...
$ cd .. && ./projects.sh
← Back to all projects