A transformer-based framework for 6D object pose estimation, extending RT-DETR with parallel pose heads to predict 3D rotation and translation of known objects from RGB / RGB-D images. RACE6D is a 6D ...
2026-07-02, Teaching Vision-Language-Action Models What to See and Where to Look, Yuguang Yang et al., 2607.01658
2026-07-02, VLAFlow: A Unified Training Framework for Vision-Language-Action Models ...