Safe Robot Handover Using YOLO and Voice Commands or Visual Triggers

Document Type

Conference Proceeding

Publication Date

12-19-2025

Department

Department of Applied Computing; Department of Manufacturing and Mechanical Engineering Technology

Abstract

This paper presents a comprehensive system for safe object handover from a robot to a human. It utilizes a 5-degree-of-freedom Trossen RX-200 robotic manipulator, a RealSense D415 depth-sensing camera, and a custom-trained YOLOv11n machine-vision model with segmentation. In this system, either the user issues a voice trigger to request a specific tool, or the system determines the desired tool from visual perception. Visual perception is derived from YOLOv11n’s object detection and segmentation pipeline, enabling the recognition and classification of tools, the human hand, and possible visual triggers used to indicate which tool is desired. During handover, the proposed system segments each object to be manipulated (such as a knife or screwdriver) into three zones: safe (the region to be grasped by the human), pick (the region to be grasped by the robot), and sharp (the dangerous region). It then plans robot actions to grasp the object’s pick zone and pass the object’s safe zone to the human, with the sharp zone facing away from the human. The system is evaluated through handover scenarios involving the passing of knives and screwdrivers, with results demonstrating safe and reliable operation triggered by either visual cues or voice commands.
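The zone-based planning described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the mask arrays, the hand position, and the function names are hypothetical, assuming per-zone boolean segmentation masks (e.g., thresholded YOLOv11n instance masks) in image coordinates.

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of a boolean segmentation mask."""
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def plan_handover(pick_mask, safe_mask, sharp_mask, hand_xy):
    """Choose a grasp point in the pick zone and check that the
    safe zone is presented to the hand with the sharp zone
    pointing away from it (hypothetical sketch)."""
    grasp = centroid(pick_mask)      # where the robot grasps
    present = centroid(safe_mask)    # where the human receives
    sharp = centroid(sharp_mask)
    # The handover pose is acceptable only if the vector from the
    # safe zone to the sharp zone points away from the hand.
    to_hand = np.asarray(hand_xy, dtype=float) - present
    sharp_dir = sharp - present
    safe_ok = float(np.dot(to_hand, sharp_dir)) < 0.0
    return grasp, present, safe_ok
```

In a full system the image-space check above would be replaced by a 3D test using the D415 depth data, and the grasp point would be mapped to manipulator joint targets; the sketch only shows the zone-reasoning step.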

Publication Title

2025 3rd International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings)
