Zero-Shot Off-Policy Learning
Preprint
The paper addresses the problem of zero-shot adaptation of Behavioral Foundation Models (BFMs), in which the policy extracted at test time for a downstream task can be far from optimal. We propose a training-free method that finds a near-optimal policy at test time by leveraging a distribution-correction coefficient extracted from the pretrained BFM.
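To make the idea concrete, here is a minimal sketch of one way such a correction could be applied at test time. It assumes the pretrained BFM exposes per-action zero-shot value estimates and an occupancy-ratio (distribution-correction) estimate; the names `BFM`, `q_values`, and `occupancy_ratio` are hypothetical placeholders, not the paper's actual interface.

```python
# Hedged sketch of training-free, test-time policy correction; all names
# (BFM, q_values, occupancy_ratio) are hypothetical placeholders, not the
# paper's actual API.
import numpy as np

class BFM:
    """Stand-in for a pretrained behavioral foundation model."""
    def __init__(self, n_actions: int, rng: np.random.Generator):
        self.n_actions = n_actions
        self.rng = rng

    def q_values(self, state: np.ndarray, z: np.ndarray) -> np.ndarray:
        """Zero-shot Q-estimates for each action under task latent z
        (random placeholder standing in for the real model head)."""
        return self.rng.normal(size=self.n_actions)

    def occupancy_ratio(self, state: np.ndarray, z: np.ndarray) -> np.ndarray:
        """Per-action distribution-correction coefficient w(s, a), read off
        the pretrained model; placeholder returns positive random weights."""
        return np.exp(self.rng.normal(scale=0.5, size=self.n_actions))

def corrected_policy(bfm: BFM, state: np.ndarray, z: np.ndarray,
                     temperature: float = 1.0) -> np.ndarray:
    """Training-free extraction: reweight the naive zero-shot softmax
    policy by the correction coefficient, so actions whose occupancy the
    pretraining data misrepresents are re-weighted accordingly."""
    q = bfm.q_values(state, z)
    w = bfm.occupancy_ratio(state, z)
    logits = q / temperature
    p = np.exp(logits - logits.max())   # naive zero-shot policy (softmax)
    p /= p.sum()
    p_corr = w * p                      # apply the correction coefficient
    return p_corr / p_corr.sum()        # renormalize to a distribution

rng = np.random.default_rng(0)
bfm = BFM(n_actions=4, rng=rng)
state, z = rng.normal(size=8), rng.normal(size=16)
print("corrected action distribution:", corrected_policy(bfm, state, z))
```

Note that nothing here is trained: the correction is computed purely from quantities already available in the pretrained model, consistent with the training-free setting described above.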