Abstract
Technical Summary
Architecture
Overview of the proposed NILS framework, which labels long-horizon robot play sequences in a zero-shot manner using an ensemble of pretrained expert models. NILS consists of three stages:
- Stage 1: all relevant objects in the video are detected.
- Stage 2: object-centric changes are detected and collected.
- Stage 3: the object change information is used to detect keystates, and an LLM is prompted to generate a language label for the task.
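The three stages can be illustrated with a small toy sketch. It is not the NILS implementation: the pretrained detectors are replaced by precomputed bounding boxes, object changes are reduced to a simple center-displacement threshold, and the LLM call is replaced by building a prompt string. All function names, the `min_shift` threshold, and the example frames are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: each frame is a dict mapping an object name to its bounding box
# (x_min, y_min, x_max, y_max) in pixels. In NILS these would come from
# pretrained detection models; here they are given directly.
Box = Tuple[float, float, float, float]
Frame = Dict[str, Box]


@dataclass
class ObjectChange:
    frame: int           # index of the frame in which the change was observed
    name: str            # object that changed
    displacement: float  # center shift in pixels since the previous frame


def box_center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def detect_objects(frames: List[Frame]) -> List[str]:
    """Stage 1 stand-in: collect every object name seen in the video."""
    names = set()
    for frame in frames:
        names.update(frame)
    return sorted(names)


def collect_changes(frames: List[Frame], objects: List[str],
                    min_shift: float = 5.0) -> List[ObjectChange]:
    """Stage 2 stand-in: record objects whose center moved at least min_shift pixels."""
    changes = []
    for t in range(1, len(frames)):
        for name in objects:
            if name in frames[t - 1] and name in frames[t]:
                px, py = box_center(frames[t - 1][name])
                cx, cy = box_center(frames[t][name])
                shift = math.hypot(cx - px, cy - py)
                if shift >= min_shift:
                    changes.append(ObjectChange(t, name, shift))
    return changes


def detect_keystates(changes: List[ObjectChange]) -> List[int]:
    """Stage 3 stand-in: a keystate is the last frame of each consecutive run of motion."""
    moving = sorted({c.frame for c in changes})
    keystates = []
    for i, frame in enumerate(moving):
        run_ends = i + 1 == len(moving) or moving[i + 1] != frame + 1
        if run_ends:
            keystates.append(frame)
    return keystates


def build_prompt(changes: List[ObjectChange], start: int, end: int) -> str:
    """Stage 3 stand-in: summarize the changes of one segment as an LLM prompt."""
    moved = sorted({c.name for c in changes if start < c.frame <= end})
    return (f"Between frames {start} and {end}, these objects moved: "
            f"{', '.join(moved)}. Describe the task in one instruction.")


# Toy trajectory: the spoon moves during frames 1 and 2, then everything is still.
rag = (50.0, 50.0, 70.0, 70.0)
frames = [
    {"spoon": (0.0, 0.0, 10.0, 10.0), "rag": rag},
    {"spoon": (20.0, 0.0, 30.0, 10.0), "rag": rag},
    {"spoon": (40.0, 40.0, 50.0, 50.0), "rag": rag},
    {"spoon": (40.0, 40.0, 50.0, 50.0), "rag": rag},
    {"spoon": (40.0, 40.0, 50.0, 50.0), "rag": rag},
]

objects = detect_objects(frames)
changes = collect_changes(frames, objects)
keystates = detect_keystates(changes)
prompt = build_prompt(changes, 0, keystates[0])
```

In this toy setup the prompt for the first segment would then be sent to an LLM to obtain a language label; which model is used and how the prompt is phrased are left open here.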
Examples
These examples showcase annotations generated by our framework together with the respective scene annotations. Each example shows the video of the long-horizon trajectory, the last keystate, the scene annotations, and the generated labels.
Example Labeling Videos
These videos showcase the annotations generated by NILS on BridgeV2 and Fractal. The bounding boxes shown are those obtained after Stage 2 and NILS' filtering steps.
Annotations for Bridge V2
Annotations for Fractal 2022
Policy Rollouts
These examples showcase tasks performed by a policy trained on our real-kitchen dataset annotated by NILS. The policy is evaluated in the same toy kitchen.
The following examples are rollouts of an Octo policy trained on the BridgeV2 dataset using the labels generated by NILS. Rollouts were performed both in the real world and in simulation (using SimplerEnv).
Place the green spoon on top of the rag
Place the green spoon on top of the rag
Place the sushi inside the wooden bowl
Place the sushi inside the green bowl
Place the yellow spoon on top the blue cloth
Relocate the yellow spoon from the table to inside the blue cloth
Failure Cases
Move the fork to the left
Move the fork away from the round object
Move the fork to the left
Clean the pan with the kitchen towel
Place the pan on top of the kitchen towel, next to the chicken wing and spoon
Move the pan 29.5 pixels to the left and 79.5 pixels forward
Dust the lamp
Polish the silverware
Wipe up the spill
Place the toy corn in the center of the table, next to the blue cup
Shift the toy corn 130.5 pixels to the right
Relocate the toy corn from the left side of the table to the center
Pick up the soda can and place it to the left of the toy mouse
Relocate the soda can from its initial position to the left of the toy mouse
Place the soda can next to the toy mouse on its left side
Lift the pot lid off the sausage toy
Take the pot lid off the sausage toy
Uncover the sausage toy by removing the pot lid
Citation
@inproceedings{
blank2024scaling,
title={Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models},
author={Nils Blank and Moritz Reuss and Marcel R{\"u}hle and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Fabian Wenzel and Oier Mees and Rudolf Lioutikov},
booktitle={8th Annual Conference on Robot Learning},
year={2024},
url={https://openreview.net/forum?id=EdVNB2kHv1}
}