Grasping With Common Sense
By Nikolaus Correll, March 2024


How to leverage large language models for robotic grasping and code generation

Grasping and manipulation remain hard, unsolved problems in robotics. Grasping is not just about identifying points where to place your fingers on an object to create sufficient constraints. It is also about applying just enough force to pick up the object without breaking it, while making sure it can be put to its intended use. At the same time, grasping provides critical sensor input for detecting what an object is and what its properties are. With mobility essentially solved, grasping and manipulation remain the final frontier in unlocking truly autonomous labor replacements.

Imagine sending your humanoid robot companion to the supermarket and telling it: “Check the avocados for ripeness and grab an avocado for guacamole today”. There is a lot going on in this request:

  1. The quality “ripeness” is not apparent from the avocado’s color, as it would be for a strawberry or a tomato, but requires tactile information.
  2. “Grab an avocado”, in particular a ripe one, implies a certain gentleness in handling. You might be less careful picking up an avocado than a raspberry, but you still need to avoid damaging the fruit.
  3. “Guacamole today” implies a ripeness level different from “guacamole three days from now”.

With the advent of large language models such as ChatGPT, a lot of this information has already been extracted from the internet. Indeed, you might ask ChatGPT how to determine an avocado’s ripeness, what to pay attention to when grasping it, how long it will take to ripen, and how it will feel then. But how can we leverage this information for robotic manipulation? In our paper “DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies” [1], we prompt ChatGPT to generate code for a reactive grasping controller that is parameterized with estimated physical properties. The approach is illustrated in the figure below and has been validated on twelve previously unseen, delicate, and deformable items including food, produce, toys, and other everyday items, spanning two orders of magnitude in mass and required pick-up force.
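To make the idea concrete, here is a minimal sketch of such a pipeline in Python. This is not the paper’s actual prompt or generated code: `query_llm` and the `gripper` interface are hypothetical placeholders, and the force threshold uses the standard two-finger friction model 2·mu·F >= m·g.

```python
import json


def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a chat-model API."""
    raise NotImplementedError("wire this up to your LLM provider")


def estimate_properties(object_name: str) -> dict:
    """Ask the LLM for rough physical properties of an object."""
    prompt = (
        f"Estimate the physical properties of a {object_name}. "
        "Answer only with JSON containing the keys mass_kg, "
        "friction_coefficient, and spring_constant_n_per_m."
    )
    return json.loads(query_llm(prompt))


def minimum_pickup_force(mass_kg: float, mu: float, g: float = 9.81) -> float:
    """Per-finger force for a two-fingered antipodal grasp: friction
    must support the object's weight, i.e. 2 * mu * F >= m * g."""
    return mass_kg * g / (2.0 * mu)


def adaptive_grasp(gripper, props: dict, step_m: float = 0.0005):
    """Close the gripper in small increments until the measured contact
    force reaches the estimated minimum pick-up force: just enough
    force to lift the object, not enough to crush it."""
    target = minimum_pickup_force(props["mass_kg"], props["friction_coefficient"])
    while gripper.measured_force() < target:
        gripper.close_by(step_m)  # hypothetical gripper interface
    return target
```

In DeliGrasp itself, the LLM generates the controller code directly rather than only filling in parameters; the sketch above merely illustrates how language-derived estimates flow into grasp forces.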

Figure: Large language models (LLMs) have rich physical knowledge about worldly objects, but cannot directly reason about robot grasps for them. Paired with open-world localization and pose estimation (left), our method (middle) queries LLMs for the salient physical characteristics of mass, friction, and compliance as the basis for an adaptive grasp controller. DeliGrasp policies successfully grasp delicate and deformable objects (right). These policies also produce compliance feedback in the form of measured spring constants, which we leverage for downstream tasks like picking ripe produce (middle). Image from https://arxiv.org/pdf/2403.07832v1.pdf
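The compliance feedback mentioned in the caption can be pictured as fitting a spring constant to the force and gripper-width readings collected while squeezing. The following is an illustrative sketch with made-up readings, not the paper’s estimator:

```python
def estimated_spring_constant(forces_n, widths_m):
    """Fit k in F = k * (x0 - x) from readings taken as the gripper
    closes: a least-squares slope of force versus compression."""
    x0 = widths_m[0]  # gripper width at first contact
    compressions = [x0 - x for x in widths_m]
    num = sum(c * f for c, f in zip(compressions, forces_n))
    den = sum(c * c for c in compressions)
    return num / den  # N/m


# Illustrative readings: a ripe avocado yields more per unit force than
# a firm one, so a lower measured k can be read as "riper".
forces = [0.5, 1.0, 1.5, 2.0]             # contact force in N
widths = [0.070, 0.0695, 0.0690, 0.0685]  # gripper width in m
print(estimated_spring_constant(forces, widths))  # ~1430 N/m
```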


