Project R-13992

Title

Deep learning-based methods for transparent object grasping (Research)

Abstract

Because of their reflective and refractive properties, transparent objects are difficult to perceive with RGB-D cameras such as the Intel RealSense and Microsoft Kinect. Existing studies on transparent object detection can be divided into two types: traditional methods and deep learning-based methods. Traditional methods, e.g., IR stereo, light-field photography, structured-light sensing, and cross-modal stereo, rely mainly on physical sensing, which imposes a high cost on the design of an elaborate vision module and generalizes poorly to new scenes. In contrast, deep learning-based methods reduce the dependence on specific physical equipment and adapt more readily to different scenarios. Although transparent objects can be detected with deep learning in many different ways, most existing approaches are restricted in various respects. For example, Sajjan et al. estimated the 3D geometry of transparent objects from a single RGB-D image using predicted depth representations and Cholesky optimization, which incurs a heavy computational cost. Kalra et al. proposed a deep learning framework for transparent object segmentation that processes data collected from polarization cameras; however, it has not achieved promising results when grasping transparent objects with a real robot.

To address these problems, in this research we will leverage multiple deep learning methods, organized into the following four parts:

1. Preprocessing of point cloud data. Since training is conducted on synthetic objects while testing is conducted on real-world objects, it is necessary to extract common characteristics from the point cloud data, such as RGB information, surface normals, transparent object masks, and occlusion boundaries (see the normal-estimation sketch after this list).

2. Feature extraction. After obtaining these common characteristics in part one, different characteristics call for different treatments. For the 2D features, such as RGB information, transparent object masks, and occlusion boundaries, we adopt ResNet-101 and DenseNet to extract dense features; for the 3D features, such as surface normals and depth maps, transformer-based methods would be a better choice (a sketch of this two-branch design also follows this list).

3. Introduction of a Generative Adversarial Network (GAN). Previous work shows that the boundaries of the generated point cloud are blurred, which leads to suboptimal overall predictions. We will therefore introduce a GAN-based method combined with the Sobel operator to sharpen these boundaries and improve the overall results (see the edge-loss sketch below).

4. Robot execution. As the final part, we will deploy the proposed method on a real-world robot to verify the approach.
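As one illustration of the preprocessing in part 1, surface normals can be estimated directly from a depth map by back-projecting pixels to 3D and taking the cross product of local tangent vectors. The sketch below is not the project's actual pipeline; the camera intrinsics (fx, fy, cx, cy) are hypothetical placeholder values.

import numpy as np

def depth_to_normals(depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0):
    """Estimate per-pixel surface normals from a depth map of shape (H, W).

    Back-projects pixels to camera coordinates, then crosses the local
    tangent vectors along image rows and columns. Invalid (zero) depth,
    common on transparent regions, will yield unreliable normals.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to 3D camera coordinates via the pinhole model.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)   # (H, W, 3) point map
    # Tangent vectors along image columns (du) and rows (dv).
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    normals = np.cross(du, dv)                  # (H, W, 3)
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.clip(norm, 1e-8, None)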
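For part 2, a minimal sketch of the two-branch extractor is given below, assuming PyTorch and torchvision. It is not the project's actual network: the channel sizes, the 16x16 patch embedding, and the transformer depth are illustrative assumptions; the 2D branch uses ResNet-101 (DenseNet could be substituted the same way).

import torch
import torch.nn as nn
from torchvision.models import resnet101

class TwoBranchExtractor(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        # 2D branch: RGB (3) + object mask (1) + occlusion boundary (1) = 5 ch.
        backbone = resnet101(weights=None)
        backbone.conv1 = nn.Conv2d(5, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        # Drop avgpool and fc to keep a dense (B, 2048, H/32, W/32) feature map.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.cnn_proj = nn.Conv2d(2048, d_model, kernel_size=1)
        # 3D branch: surface normals (3) + depth (1) = 4 channels, embedded
        # as non-overlapping 16x16 patches and fed to a transformer encoder.
        self.patch_embed = nn.Conv2d(4, d_model, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, rgb, mask, boundary, normals, depth):
        feat2d = self.cnn_proj(self.cnn(torch.cat([rgb, mask, boundary], dim=1)))
        tokens = self.patch_embed(torch.cat([normals, depth], dim=1))
        feat3d = self.transformer(tokens.flatten(2).transpose(1, 2))  # (B, HW, C)
        return feat2d, feat3d

How the two feature streams are fused (e.g., concatenation or cross-attention) is left open here, as the abstract does not specify it.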
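For part 3, the sketch below shows one hedged reading of "GAN combined with the Sobel operator": an L1 penalty on the Sobel edge maps of real versus generated depth, added to the usual adversarial generator loss. The loss form and the weight are assumptions for illustration, not the project's final formulation.

import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)  # transposed kernel = vertical gradient

def sobel_edges(img):
    """Edge magnitude of a (B, 1, H, W) map via Sobel filtering."""
    gx = F.conv2d(img, SOBEL_X.to(img), padding=1)
    gy = F.conv2d(img, SOBEL_Y.to(img), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_aware_loss(fake_depth, real_depth, weight=10.0):
    """Edge-consistency term to be added to the GAN generator objective,
    penalizing blurred boundaries in the generated depth."""
    return weight * F.l1_loss(sobel_edges(fake_depth), sobel_edges(real_depth))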

Period of project

16 June 2023 - 31 March 2024