Human and AI Interpretations of Photogrammetrically Captured Scenes.
Record type:
Bibliographic - Language material, manuscript : Monograph/item
Title/Author:
Human and AI Interpretations of Photogrammetrically Captured Scenes.
Author:
Rubinstein, Jacob.
Physical description:
1 online resource (78 pages)
Notes:
Source: Masters Abstracts International, Volume: 85-12.
Contained By:
Masters Abstracts International 85-12.
Subject:
Computer engineering.
Electronic resource:
click for full text (PQDT)
ISBN:
9798383162606
Human and AI Interpretations of Photogrammetrically Captured Scenes.
Rubinstein, Jacob.
Human and AI Interpretations of Photogrammetrically Captured Scenes.
- 1 online resource (78 pages)
Source: Masters Abstracts International, Volume: 85-12.
Thesis (M.S.)--University of Maryland, Baltimore County, 2024.
Includes bibliographical references
3D technologies are increasingly prevalent and powerful, fundamentally reshaping how we interpret and comprehend information. This additional modality changes the way both humans and AI perceive scenes they are shown and interact with. This work aims to explore this shift from multiple angles. Chapter 2 delves into the ramifications of three-dimensional space on AI agents, while Chapter 3 explores how humans can harness 3D techniques to enhance collaboration for the preservation of cultural heritage.

Virtual reality is increasingly utilized to support embodied AI agents, such as robots, engaged in 'sim-to-real' based learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks. In order to understand how such agents can learn from simulated environments, Chapter 2 explores a language model's ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.

Affordable drones and geotagged photos have created many new opportunities for geospatial analysis, with divergent application domains such as historical preservation, national defense, and disaster response. In Chapter 3, we analyze a series of group work tasks comprising a project to index a cemetery with incomplete records of its older sections, while noting that many of these group work tasks are agnostic to the application domain in question. To prepare for the group work, hundreds of images are captured by a pre-programmed flight of a consumer-grade quadcopter at low altitude. These images are then orthorectified to create a web-based map layer of sufficiently high resolution for group members to visually identify and annotate individual gravestones. Group members then visit the site in person and capture close-up and contextual geotagged photos using mobile phones. Contextual photos are framed such that their positions can be determined using the web-based map layer and visual landmarks. As on-site photos are captured, group members can work off-site to annotate the web-based map and link these annotations to a third-party website, findagrave.com, where they upload photos and type metadata (e.g., names, dates, notes). Gravestones and other positions of interest which require other on-site actions are marked as such on the map, and group members return to the site to take these actions. Notably, group members can participate in any number of tasks within the workflow, and different phases of work can happen in parallel for different parts of the cemetery.

Throughout this work, the focus is on understanding how a 2D image from a single perspective enables an agent (human or AI) to understand the 3D context of that image. The presence of key visual indicators - whether the stem of an apple or a tree behind a grave - is important for both humans and AI to comprehend the meaning afforded to them from their visual vantage point.
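The Chapter 2 experiment summarized above can be illustrated with a short sketch. The following is a minimal example, not taken from the thesis, of zero-shot CLIP classification of one rendered viewpoint using the open-source Hugging Face `transformers` library; the object classes, prompt template, and image file name are illustrative assumptions.

```python
# Illustrative sketch only: zero-shot CLIP classification of a single rendered
# view, in the spirit of the viewpoint study described in the abstract.
# The class list, prompt template, and file name are assumptions for this example.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["apple", "mug", "shoe", "book"]          # hypothetical object classes
prompts = [f"a photo of a {c}" for c in classes]    # simple prompt template

image = Image.open("render_perspective_042.png")    # one rendered camera perspective
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Higher image-text similarity -> more likely class label for this viewpoint.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
for c, p in zip(classes, probs.tolist()):
    print(f"{c}: {p:.3f}")
```

Repeating such a query across the full set of 420 virtual camera perspectives and multiple object instances would yield the per-viewpoint accuracy comparison the abstract describes.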
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024
Mode of access: World Wide Web
ISBN: 9798383162606
Subjects--Topical Terms:
Computer engineering.
Subjects--Index Terms:
Photogrammetry
Index Terms--Genre/Form:
Electronic books.
Human and AI Interpretations of Photogrammetrically Captured Scenes.
LDR
04812ntm a22003977 4500
001
1150187
005
20241022111610.5
006
m o d
007
cr bn ---uuuuu
008
250605s2024 xx obm 000 0 eng d
020
$a
9798383162606
035
$a
(MiAaPQ)AAI31242135
035
$a
AAI31242135
040
$a
MiAaPQ
$b
eng
$c
MiAaPQ
$d
NTU
100
1
$a
Rubinstein, Jacob.
$3
1476627
245
1 0
$a
Human and AI Interpretations of Photogrammetrically Captured Scenes.
264
0
$c
2024
300
$a
1 online resource (78 pages)
336
$a
text
$b
txt
$2
rdacontent
337
$a
computer
$b
c
$2
rdamedia
338
$a
online resource
$b
cr
$2
rdacarrier
500
$a
Source: Masters Abstracts International, Volume: 85-12.
500
$a
Advisor: Engel, Don.
502
$a
Thesis (M.S.)--University of Maryland, Baltimore County, 2024.
504
$a
Includes bibliographical references
520
$a
3D technologies are increasingly prevalent and powerful, fundamentally reshaping how we interpret and comprehend information. This additional modality changes the way both humans and AI perceive scenes they are shown and interact with. This work aims to explore this shift from multiple angles. Chapter 2 delves into the ramifications of three-dimensional space on AI agents, while Chapter 3 explores how humans can harness 3D techniques to enhance collaboration for the preservation of cultural heritage. Virtual reality is increasingly utilized to support embodied AI agents, such as robots, engaged in 'sim-to-real' based learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks. In order to understand how such agents can learn from simulated environments, Chapter 2 explores a language model's ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object. Affordable drones and geotagged photos have created many new opportunities for geospatial analysis, with divergent application domains such as historical preservation, national defense, and disaster response. In Chapter 3, we analyze a series of group work tasks comprising a project to index a cemetery with incomplete records of its older sections, while noting that many of these group work tasks are agnostic to the application domain in question. To prepare for the group work, hundreds of images are captured by a pre-programmed flight of a consumer-grade quadcopter at low altitude. These images are then orthorectified to create a web-based map layer of sufficiently high resolution for group members to visually identify and annotate individual gravestones. Group members then visit the site in person and capture close-up and contextual geotagged photos using mobile phones. Contextual photos are framed such that their positions can be determined using the web-based map layer and visual landmarks. As on-site photos are captured, group members can work off-site to annotate the web-based map and link these annotations to a third-party website, findagrave.com, where they upload photos and type metadata (e.g., names, dates, notes). Gravestones and other positions of interest which require other on-site actions are marked as such on the map, and group members return to the site to take these actions. Notably, group members can participate in any number of tasks within the workflow, and different phases of work can happen in parallel for different parts of the cemetery. Throughout this work, the focus is on understanding how a 2D image from a single perspective enables an agent (human or AI) to understand the 3D context of that image. The presence of key visual indicators - whether the stem of an apple or a tree behind a grave - is important for both humans and AI to comprehend the meaning afforded to them from their visual vantage point.
533
$a
Electronic reproduction.
$b
Ann Arbor, Mich. :
$c
ProQuest,
$d
2024
538
$a
Mode of access: World Wide Web
650
4
$a
Computer engineering.
$3
569006
650
4
$a
Computer science.
$3
573171
653
$a
Photogrammetry
653
$a
3D technologies
653
$a
Cultural heritage
653
$a
Geospatial analysis
653
$a
Mobile phones
655
7
$a
Electronic books.
$2
local
$3
554714
690
$a
0984
690
$a
0800
690
$a
0464
710
2
$a
University of Maryland, Baltimore County.
$b
Computer Science.
$3
1179407
710
2
$a
ProQuest Information and Learning Co.
$3
1178819
773
0
$t
Masters Abstracts International
$g
85-12.
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31242135
$z
click for full text (PQDT)