TAST-STEM | GROUP 013 | IRVL

Dr. Yu Xiang | Jikai Wang | Qifan Zhang

Eugene Wang, Kevin Cui, Lucas Wu, Edgar Poon, Pai Zheng, Ellie Tang, Ted Ma, Jeremy Chang

Intelligent Robotics and Vision Lab (IRVL) at The University of Texas at Dallas

PRESENTATION

View our full project presentation and supporting materials.

POSTER

Dive into the qualitative and quantitative analysis conducted throughout the project.

Research Poster

Perception Driven Robot Grasping | Simulation Data | Testing Results

PROJECT SYNOPSIS

Our project's overarching focus was to estimate hand and object poses from data collected with a multi-camera system. We configured and tested an end-to-end pipeline that simultaneously tracks hand and object poses from the collected recordings, a capability commonly used in AR/VR (XR), robotics, computer vision, and security applications.

CORE SKILLS LEARNED

Before diving into the main system, we built a strong technical foundation. We learned Python, PyTorch for machine learning, Docker for containerized environments, and Ubuntu for a Linux-based workflow. We also studied essential linear algebra concepts like coordinate transformations and matrix operations, which are crucial for pose estimation and 3D geometry.
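
As a small illustration of the kind of linear algebra we practiced, the sketch below applies a rigid-body transform (a rotation and a translation packed into a 4x4 homogeneous matrix) to a few 3D points using PyTorch. The specific rotation angle, translation, and point values are arbitrary examples.

```python
# Minimal sketch: applying a rigid-body transform to 3D points with PyTorch.
# The 90-degree rotation and the point values are made up for demonstration.
import math
import torch

theta = math.pi / 2  # rotate 90 degrees about the z-axis
R = torch.tensor([[math.cos(theta), -math.sin(theta), 0.0],
                  [math.sin(theta),  math.cos(theta), 0.0],
                  [0.0,              0.0,             1.0]])
t = torch.tensor([0.5, 0.0, 0.0])  # translate 0.5 m along x

# Build the 4x4 homogeneous transformation matrix T = [[R, t], [0, 1]].
T = torch.eye(4)
T[:3, :3] = R
T[:3, 3] = t

# Transform points expressed in homogeneous coordinates (x, y, z, 1).
points = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0, 1.0]])
transformed = (T @ points.T).T
print(transformed[:, :3])  # the same points expressed in the new frame
```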

6 Degrees of Freedom (6 DoF)

To accurately determine the position and orientation of any object in 3D space, we studied the six degrees of freedom: translation along the x, y, and z axes (forward/backward, left/right, up/down), and rotation around those same axes (roll, pitch, and yaw). Mastering these concepts allowed us to understand how robots interpret object locations and orientations for precise manipulation.
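
The snippet below is a minimal sketch of how those six numbers (x, y, z, roll, pitch, yaw) can be packed into, and recovered from, a single 4x4 pose matrix. It assumes NumPy and SciPy are available; the pose values are made up for illustration.

```python
# A 6-DoF pose as a 4x4 homogeneous matrix: rotation + translation.
import numpy as np
from scipy.spatial.transform import Rotation as R

# Translation along x, y, z (meters) and rotation as roll, pitch, yaw (radians).
translation = np.array([0.4, -0.1, 0.75])
roll, pitch, yaw = 0.0, np.pi / 6, np.pi / 2

# Rotation goes in the upper-left 3x3 block, translation in the last column.
pose = np.eye(4)
pose[:3, :3] = R.from_euler("xyz", [roll, pitch, yaw]).as_matrix()
pose[:3, 3] = translation

# Recover the six degrees of freedom from the matrix.
recovered_rpy = R.from_matrix(pose[:3, :3]).as_euler("xyz")
recovered_xyz = pose[:3, 3]
print(recovered_xyz, recovered_rpy)
```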

Robot Operating System (ROS)

We integrated ROS (Robot Operating System) to bridge our perception pipeline with the physical robot. ROS manages communication between various hardware and software components using a modular system of nodes, topics, messages, and services. For example, one node might process camera images, while another controls the robot gripper, all coordinated through ROS’s communication channels.
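
A minimal rospy sketch of this node/topic pattern is shown below. The camera topic name and the status topic are assumptions chosen for illustration, not necessarily the exact topics used in our system.

```python
# Minimal ROS 1 (rospy) node: subscribe to camera images, publish a status message.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String

def image_callback(msg):
    # In the real pipeline, segmentation and pose estimation would run here.
    rospy.loginfo("Received image: %dx%d", msg.width, msg.height)
    status_pub.publish(String(data="image processed"))

rospy.init_node("perception_node")
status_pub = rospy.Publisher("/perception/status", String, queue_size=10)
rospy.Subscriber("/head_camera/rgb/image_raw", Image, image_callback)
rospy.spin()  # keep the node alive so callbacks continue to run
```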

Perception Pipeline: NIDS-Net + FoundationPose

Our pipeline begins with NIDS-Net, a neural network used to segment objects in RGB-D images. These segmented outputs are passed into FoundationPose, a model that estimates the full 6D pose of objects. Together, they provide the spatial awareness needed for the robot to plan grasping actions based on visual input.
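
The sketch below shows the structure of this two-stage pipeline. The functions standing in for NIDS-Net and FoundationPose are hypothetical placeholders with dummy bodies, included only to make the data flow concrete; they are not the real interfaces of either model.

```python
# Structural sketch of the segmentation -> pose estimation pipeline.
import numpy as np

def segment_with_nids_net(rgb, depth):
    # Placeholder: a real call would return one binary mask per detected object.
    return [np.zeros(rgb.shape[:2], dtype=bool)]

def estimate_pose_with_foundationpose(rgb, depth, mask, camera_K):
    # Placeholder: a real call would return the object's 4x4 pose in the camera frame.
    return np.eye(4)

def run_perception(rgb, depth, camera_K):
    """Segment the scene, then estimate a 6D pose for each segmented object."""
    masks = segment_with_nids_net(rgb, depth)
    return [estimate_pose_with_foundationpose(rgb, depth, m, camera_K) for m in masks]

# Example with synthetic inputs: a 480x640 RGB-D frame and camera intrinsics.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.zeros((480, 640), dtype=np.float32)
K = np.array([[540.0, 0.0, 320.0], [0.0, 540.0, 240.0], [0.0, 0.0, 1.0]])
print(run_perception(rgb, depth, K))
```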

Simulating with Gazebo

Before deploying on the physical robot, we tested the system inside Gazebo, a powerful robotics simulator. It allowed us to validate the full grasping pipeline in a safe, virtual environment. This helped us fine-tune object segmentation and pose estimation outputs before performing real-world grasps on the Fetch robot.
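
One way simulation helped was by providing ground-truth object poses to compare against the pipeline's estimates. The sketch below queries Gazebo's model-state service for that ground truth; it assumes the standard gazebo_ros plugin is running, and the model name is an arbitrary example.

```python
# Query Gazebo for a simulated object's ground-truth pose via the gazebo_ros service.
import rospy
from gazebo_msgs.srv import GetModelState

rospy.init_node("gazebo_ground_truth_check")
rospy.wait_for_service("/gazebo/get_model_state")
get_state = rospy.ServiceProxy("/gazebo/get_model_state", GetModelState)

# Ground-truth pose of the object, expressed relative to Gazebo's world frame.
state = get_state(model_name="cracker_box", relative_entity_name="world")
print("position:", state.pose.position)
print("orientation (quaternion):", state.pose.orientation)
```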

Grasp Execution on the Fetch Robot

We deployed our pipeline on a real Fetch robot. The robot received RGB-D data, segmented the scene, estimated object poses, and calculated grasp positions. Grasping was successful in most cases. However, challenges arose with cluttered scenes and curved or reflective objects, where pose estimation accuracy could drop due to sensor or model limitations.
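
As a rough illustration of the execution step, the sketch below sends a single target gripper pose to the arm through MoveIt's Python interface. The planning group name, the pose values, and the omission of gripper and collision handling are simplifications for illustration, not our exact deployment code.

```python
# Send one grasp-approach pose to the arm using moveit_commander.
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import Pose

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("grasp_executor")
arm = moveit_commander.MoveGroupCommander("arm")  # assumed planning group name

# Target gripper pose derived from the estimated object pose (values are examples).
target = Pose()
target.position.x, target.position.y, target.position.z = 0.6, 0.0, 0.8
target.orientation.w = 1.0  # identity orientation

arm.set_pose_target(target)
success = arm.go(wait=True)   # plan and execute the motion
arm.stop()
arm.clear_pose_targets()
rospy.loginfo("Grasp approach %s", "succeeded" if success else "failed")
```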

OVERVIEW

Hand-Object Interaction

Developing models to understand and predict how hands manipulate objects in 3D space

3D Pose Estimation

Creating algorithms to estimate the 3D position and orientation of objects from images

Robotic Manipulation

Translating computer vision research into robotic control systems

PROCESS

PROJECT VIDEO

Group 13 Presentation Video by Ellie Tang

RESOURCES