National Formosa University

General Purpose and Interactive Video Analytics.

Record type:	Bibliographic - language material, manuscript : Monograph/item
Title proper/Author:	General Purpose and Interactive Video Analytics./
Author:	Llamas, Francisco Alejandro Romero.
Extent:	1 online resource (111 pages)
Note:	Source: Dissertations Abstracts International, Volume: 85-07, Section: B.
Contained By:	Dissertations Abstracts International 85-07B.
Subject:	Emergency communications systems.
Electronic resource:	click for full text (PQDT)
ISBN:	9798381020724

Thesis (Ph.D.)--Stanford University, 2023.

Includes bibliographical references

The proliferation of video collections and the increased capabilities of machine learning models have led to a growing desire for video analytics, the process of extracting insights from video. These two trends have made automatic and meaningful analysis of video increasingly feasible, allowing users to answer queries such as "how many birds of a particular species visit a feeder per day" or "do any cars that passed an intersection match an AMBER alert." Despite these advances, video today cannot be explored as practically or with the same performance as structured data. Exploring video requires significant time and expertise to optimize queries to meet performance, cost, and accuracy goals. This thesis focuses on the design of a general-purpose video analytics database management system that allows users to query videos as easily, interactively, and cost-efficiently as they query structured data with scale-out systems like Spark SQL and PrestoDB. To reach this vision, we need to address challenges across three areas: systems (performance and cost), databases (automated optimization), and artificial intelligence (ease of use).

We first focus on the systems challenge of improving the latency and resource efficiency of executing directed acyclic graphs of machine learning models for video analysis. The latency and resource efficiency of these directed acyclic graphs can be optimized using configurable knobs for each operation (e.g., batch size or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, (b) the optimal configuration depends on users' desired latency and cost targets, and (c) input video contents may exercise different paths in the directed acyclic graph and produce a variable amount of intermediate results. We present Llama: a heterogeneous and serverless framework for video processing. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. Compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average.

Given the high cost of processing frames using expensive models, we then focus on query optimization. While researchers have proposed optimizations such as selectively using faster but less accurate models to replace or filter frames for expensive models, users today must manually explore how and when these optimizations should be applied. This is especially difficult for complex queries with multiple predicates and models. We propose Relational Hints, a declarative interface that allows users to suggest ML model relationships based on domain knowledge. Users can express two key relationships: when a model can replace another (CAN REPLACE) and when a model can be used to filter frames for another (CAN FILTER). We then present the VIVA video analytics system, which uses relational hints to optimize SQL queries on video datasets. VIVA automatically selects and validates the hints applicable to the query, generates possible query plans using a formal set of transformations, and finds the best-performing plan that meets a user's accuracy requirements. Using VIVA, we show that hints improve performance by up to 16.6x without sacrificing accuracy.
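The two steps the abstract attributes to Llama, (a) deriving a latency target per operation and (b) choosing a configuration by cost, can be illustrated with a small Python sketch. This is not the thesis's algorithm: the proportional budget split, the `Config` fields, the `OPS` table, and every number in it are hypothetical and chosen only to make the idea concrete. The pipeline is also simplified to a linear chain rather than a general directed acyclic graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hypothetical knob setting for an operation (all values are made up)."""
    hardware: str
    batch_size: int
    latency_s: float  # assumed latency of one invocation
    cost_usd: float   # assumed cost of one invocation


def split_latency_target(total_target_s, ops):
    """Step (a): share the end-to-end target across operations in proportion
    to each operation's fastest available latency (an assumed heuristic)."""
    fastest = {name: min(c.latency_s for c in cfgs) for name, cfgs in ops.items()}
    total_fastest = sum(fastest.values())
    return {name: total_target_s * f / total_fastest for name, f in fastest.items()}


def pick_config(configs, target_s):
    """Step (b): cheapest configuration that meets the target; if none does,
    fall back to the fastest configuration."""
    feasible = [c for c in configs if c.latency_s <= target_s]
    if feasible:
        return min(feasible, key=lambda c: c.cost_usd)
    return min(configs, key=lambda c: c.latency_s)


def plan(total_target_s, ops):
    """Assign one configuration to every operation in the chain."""
    targets = split_latency_target(total_target_s, ops)
    return {name: pick_config(cfgs, targets[name]) for name, cfgs in ops.items()}


# Hypothetical two-stage pipeline: decode frames, then run an object detector.
OPS = {
    "decode": [Config("cpu", 1, 0.4, 0.001), Config("gpu", 8, 0.1, 0.004)],
    "detect": [Config("cpu", 1, 2.0, 0.002), Config("gpu", 8, 0.3, 0.010)],
}

if __name__ == "__main__":
    for target in (4.0, 0.8):
        chosen = plan(target, OPS)
        print(target, {name: c.hardware for name, c in chosen.items()})
```

With a loose target the sketch settles on the cheaper CPU settings, and with a tight one it moves both stages to the faster, more expensive GPU settings. That is the latency-versus-cost trade-off the abstract describes, reduced here to a static choice instead of the dynamic, per-invocation one the thesis reports.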

Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2024

Mode of access: World Wide Web

Subjects--Topical Terms: Emergency communications systems. (1413644)
Index Terms--Genre/Form: Electronic books. (554714)

LDR 04793ntm a22003377 4500
001 1148753
005 20240930100137.5
006 m o d
007 cr bn ---uuuuu
008 250605s2023 xx obm 000 0 eng d
020 $a 9798381020724
035 $a (MiAaPQ)AAI30726920
035 $a (MiAaPQ)STANFORDzm860jk5533
035 $a AAI30726920
040 $a MiAaPQ $b eng $c MiAaPQ $d NTU
100 1 $a Llamas, Francisco Alejandro Romero. $3 1474797
245 1 0 $a General Purpose and Interactive Video Analytics.
264 0 $c 2023
300 $a 1 online resource (111 pages)
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
500 $a Source: Dissertations Abstracts International, Volume: 85-07, Section: B.
500 $a Advisor: Kozyrakis, Christos;Rosenblum, Mendel;Trippel, Caroline.
502 $a Thesis (Ph.D.)--Stanford University, 2023.
504 $a Includes bibliographical references
520 $a The proliferation of video collections and the increased capabilities of machine learning models have led to a growing desire for video analytics, the process of extracting insights from video. These two trends have made automatic and meaningful analysis of video increasingly feasible, allowing users to answer queries such as "how many birds of a particular species visit a feeder per day" or "do any cars that passed an intersection match an AMBER alert." Despite these advances, video today cannot be explored as practically or with the same performance as structured data. Exploring video requires significant time and expertise to optimize queries to meet performance, cost, and accuracy goals. This thesis focuses on the design of a general-purpose video analytics database management system that allows users to query videos as easily, interactively, and cost-efficiently as they query structured data with scale-out systems like Spark SQL and PrestoDB. To reach this vision, we need to address challenges across three areas: systems (performance and cost), databases (automated optimization), and artificial intelligence (ease of use). We first focus on the systems challenge of improving the latency and resource efficiency of executing directed acyclic graphs of machine learning models for video analysis. The latency and resource efficiency of these directed acyclic graphs can be optimized using configurable knobs for each operation (e.g., batch size or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, (b) the optimal configuration depends on users' desired latency and cost targets, and (c) input video contents may exercise different paths in the directed acyclic graph and produce a variable amount of intermediate results. We present Llama: a heterogeneous and serverless framework for video processing. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. Compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average. Given the high cost of processing frames using expensive models, we then focus on query optimization. While researchers have proposed optimizations such as selectively using faster but less accurate models to replace or filter frames for expensive models, users today must manually explore how and when these optimizations should be applied. This is especially difficult for complex queries with multiple predicates and models. We propose Relational Hints, a declarative interface that allows users to suggest ML model relationships based on domain knowledge. Users can express two key relationships: when a model can replace another (CAN REPLACE) and when a model can be used to filter frames for another (CAN FILTER). We then present the VIVA video analytics system, which uses relational hints to optimize SQL queries on video datasets. VIVA automatically selects and validates the hints applicable to the query, generates possible query plans using a formal set of transformations, and finds the best-performing plan that meets a user's accuracy requirements. Using VIVA, we show that hints improve performance by up to 16.6x without sacrificing accuracy.
533 $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2024
538 $a Mode of access: World Wide Web
650 4 $a Emergency communications systems. $3 1413644
650 4 $a Computer science. $3 573171
655 7 $a Electronic books. $2 local $3 554714
690 $a 0984
690 $a 0800
710 2 $a ProQuest Information and Learning Co. $3 1178819
710 2 $a Stanford University. $3 1184533
773 0 $t Dissertations Abstracts International $g 85-07B.
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30726920 $z click for full text (PQDT)