Chain of Thought Reasoning for Robotic Arm Grasping and Embodied Spatial Perception.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title / Author:
Chain of Thought Reasoning for Robotic Arm Grasping and Embodied Spatial Perception. /
Author:
Yang, Fan.
Physical description:
1 online resource (66 pages)
Notes:
Source: Masters Abstracts International, Volume: 85-11.
Contained by:
Masters Abstracts International, 85-11.
Subject:
Robotics.
Electronic resource:
click for full text (PQDT)
ISBN:
9798382715230
LDR
03272ntm a22003977 4500
001
1152676
005
20241209114629.5
006
m o d
007
cr mn ---uuuuu
008
250605s2024 xx obm 000 0 eng d
020
$a
9798382715230
035
$a
(MiAaPQ)AAI31297337
035
$a
AAI31297337
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Yang, Fan.
$3
891977
245
1 0
$a
Chain of Thought Reasoning for Robotic Arm Grasping and Embodied Spatial Perception.
264
0
$c
2024
300
$a
1 online resource (66 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Masters Abstracts International, Volume: 85-11.
500
$a
Advisor: Fang, Yi.
502
$a
Thesis (M.S.)--New York University Tandon School of Engineering, 2024.
504
$a
Includes bibliographical references
520
$a
The rapid development of language models such as BERT, GPT-3, and GPT-4 in recent years has promoted the emergence of visual language models and multi-modal models, further enhancing models' scene perception and interaction capabilities. At the same time, with the development of robots and embodied artificial intelligence (AI), research on embodied AI has grown steadily. This thesis introduces how to apply large language models (LLMs), visual models, and multi-modal models to robot tasks to enhance their scene perception and interaction capabilities. Our research is divided into three experiments. The first experiment focuses on environmental perception, improving the quality of the visual language model's text output for the current scene through carefully designed prompt engineering and auxiliary prompts. The second experiment further explores the interaction between robotic agents and the scene: we design an end-to-end system based on a large language model and a Thought-to-Action Reasoning (TAR) module to enhance the robotic arm's understanding of target grasping tasks. The third experiment focuses on spatial information understanding: we propose the Embodied Spatial Reasoning (EMBOSR) module to enhance the robotic agent's understanding of 3D point cloud scenes and to answer questions about those scenes. In sum, we propose a human-instruction analysis system for robotic arm grasping and an LLM-based 3D scene perception and question-answering system. The comprehensive reasoning ability of these systems is demonstrated through simulated and real-world experiments, which indicate the important role of prompt engineering and chain-of-thought reasoning in completing robotic tasks, as well as the importance and potential value of applying large language models to human-robot interaction tasks.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2024
538
$a
Mode of access: World Wide Web
650
4
$a
Robotics.
$3
561941
650
4
$a
Computer engineering.
$3
569006
653
$a
Computer vision
653
$a
Vision Language Model
653
$a
Large language models
653
$a
Embodied Spatial Reasoning
653
$a
Human-robot interaction
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0800
690
$a
0464
690
$a
0771
710
2
$a
New York University Tandon School of Engineering.
$b
Electrical & Computer Engineering.
$3
1437750
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
773
0
$t
Masters Abstracts International
$g
85-11.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31297337
$z
click for full text (PQDT)
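For readers working with this record programmatically, the tagged fields above follow the MARC 21 variable-field layout: a three-digit tag, two indicator positions, and subfields coded with `$` letters. The sketch below renders a few fields from this record in that tagged-text layout. It is a minimal, illustrative model only; the `Field` class and `render()` helper are hypothetical names for this example, not part of any catalog system's real API, and the field data is copied from the record above.

```python
# Minimal sketch of the MARC 21 variable-field text layout seen above.
# Illustrative only: Field and render() are hypothetical helpers,
# not a real catalog API. Field data is copied from this record.
from dataclasses import dataclass


@dataclass
class Field:
    tag: str                  # three-digit MARC tag, e.g. "245"
    indicators: str           # two indicator characters (space = undefined)
    subfields: list           # (code, value) pairs, e.g. ("a", "Yang, Fan.")


def render(field: Field) -> str:
    """Format one variable field as 'TAG IND $a value $b value ...'."""
    body = " ".join(f"${code} {value}" for code, value in field.subfields)
    return f"{field.tag} {field.indicators} {body}"


record = [
    Field("100", "1 ", [("a", "Yang, Fan.")]),
    Field("245", "10", [("a", "Chain of Thought Reasoning for Robotic Arm "
                              "Grasping and Embodied Spatial Perception.")]),
    Field("856", "40", [("u", "http://pqdd.sinica.edu.tw/twdaoapp/servlet/"
                              "advanced?query=31297337"),
                        ("z", "click for full text (PQDT)")]),
]

for f in record:
    print(render(f))
```

For production use, a real MARC library (for example, pymarc in Python) handles the leader, fixed fields, and the ISO 2709 binary encoding that this plain-text sketch ignores.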