- Steering LLM Reasoning Through Bias-Only Adaptation and
Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors
RL-trained steering vectors match full fine-tuning while remaining interpretable: they suppress languages, appear as first-token substitution, avoid attention-mediated effects, transfer across models, compose across layers, and more.
- In-Context Reinforcement Learning for Variable Action Spaces. Headless-AD: a generalization of Algorithm Distillation to variable action spaces.
- Understanding the Effectiveness of Cross-Domain Contrastive Unsupervised Domain Adaptation. What matters when applying contrastive loss in unsupervised domain adaptation.
Code: steering-reasoning
Code: headless-ad
Code: cross_domain_contrastive_uda