Motivation

L-SR1 is a learned quasi-Newton optimizer that uses compact modules to build a rank-one curvature preconditioner from limited memory. SR1-style updates can be indefinite and yield unstable directions; naive PSD constructions via outer products are stable but violate the secant relation.

PGSM (Projection-Guided Secant Mechanism) addresses this by keeping a PSD rank-one preconditioner while guiding it with a secant penalty during meta-training. We ask whether learned optimizers benefit from curvature information—and show improved convergence, stability, and generalization across dimensions and on human mesh recovery.

Learned SR1 (L-SR1)

L-SR1 extends limited-memory SR1 with three lightweight, element-wise MLP modules—an input encoder ℰ, a vector generator 𝒫, and a learning-rate generator 𝒢—that read the current optimization state and produce a new rank-one vector for a fixed buffer ℬL. The inverse-Hessian estimate is reconstructed implicitly as a sum of outer products, yielding a PSD preconditioner and a descent direction applied with coordinate-wise step sizes. All components are dimension-invariant, so the same learned optimizer transfers across problem sizes without retraining.

L-SR1 block diagram (Fig. 2).
Fig. 2 — L-SR1 iteration: ℰ, 𝒫, 𝒢, and buffer ℬL.

PGSM — Projection-Guided Secant Mechanism

We construct preconditioners that remain PSD while approximately satisfying the quasi-Newton secant equation. The structured form B̃k = I + Σi vivi uses vectors from 𝒫; during meta-training, a secant penalty ‖pk − B̃kqk‖² is added to the trajectory loss. This encourages curvature-consistent updates without explicit projection at inference time.

PGSM geometry figure.
PGSM geometry — PSD projection with secant guidance.

Human mesh recovery (3DPW)

We integrate L-SR1 into the learned-optimization HMR framework of Song et al., replacing their gradient-descent update module while keeping the trainable initializer. The model is meta-trained on AMASS with self-supervision and evaluated on 3DPW without fine-tuning. L-SR1 improves PA-MPJPE and runtime relative to LGD at matched inner-step budgets.

2D reprojection error vs. iteration (3DPW). PA-MPJPE vs. iteration (3DPW).
Method PA-MPJPE Time (ms)
LGD (4 steps)55.90664
L-SR1 (4 steps)51.74364
L-SR1 w/o secant (4 steps)52.80364
L-SR1 (best horizon)51.581183

BibTeX

Official ICML proceedings @inproceedings citation will be added when available.

@misc{lifshitz2025lsr1,
  title         = {{L-SR1}: Learned Symmetric-Rank-One Preconditioning},
  author        = {Lifshitz, Gal and Zuler, Shahar and Fouks, Ori and Raviv, Dan},
  year          = {2025},
  eprint        = {2508.12270},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2508.12270},
}