National Formosa University

語義導向的自然語言和視覺串聯基於小物件偵測的跨模態互動 = Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Small Object Detection

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	語義導向的自然語言和視覺串聯基於小物件偵測的跨模態互動 = Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Small Object Detection / 黃獻弘.
Other title:	Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Small Object Detection.
Author:	黃獻弘
Publisher:	Yunlin County : National Formosa University, July 2024 (ROC year 113).
Extent:	[10], 61 pages : illustrations, tables ; 30 cm.
Note:	Advisors: 蘇暉凱, 宋啟嘉.
Subject:	Small object detection.
Electronic resource:	https://handle.ncl.edu.tw/11296/u7bhvz

語義導向的自然語言和視覺串聯基於小物件偵測的跨模態互動 = Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Small Object Detection / 黃獻弘. - 1st ed. - Yunlin County : National Formosa University, July 2024 (ROC year 113). - [10], 61 pages : illustrations, tables ; 30 cm.

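Advisors: 蘇暉凱, 宋啟嘉.

Master's thesis--National Formosa University, Department of Electrical Engineering, Master's Program.

Includes bibliographical references.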

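Since Google introduced the Transformer architecture in 2017, the field of artificial intelligence has made significant progress. This study uses fine-tuning to combine BERT, a pre-trained model based on the Transformer architecture, with PRB-FPN-Net, a CNN-based image recognition model. Both models were trained by fine-tuning, and the WordNetLemmatizer algorithm was applied to lemmatize the extracted words, reducing them to forms without affixes. This step effectively improves the model's ability to understand natural language data. To validate the method, the study uses COCO and Objects365, two training and validation datasets of different specifications, pre-trains the PRB-FPN-Net and BERT models, and finally combines the two to develop a model that accurately detects the specified objects according to an input description. Experimental results show that the model surpasses other methods in both object detection accuracy and efficiency, offering a new approach for multimodal interaction. Compared with YOLO-World, the model's precision metric on the COCO2017 validation set is significantly higher (52.6 vs. 45.8), and it shows a clear accuracy advantage in particular without fine-tuning on other large datasets. Although its performance metrics are slightly below those of GLIP, which uses a Transformer + Transformer architecture, it halves the parameter count and can still maintain a reasonable inference speed in resource-limited scenarios. Overall, this study proposes a cross-modal interaction method based on Transformer and CNN architectures and achieves excellent results in natural language understanding and visual perception.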
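
The lemmatize-then-detect idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: `toy_lemma` is a small hand-written stand-in for NLTK's WordNetLemmatizer (which needs the WordNet corpus), the `detections` list stands in for PRB-FPN-Net output, simple whitespace splitting stands in for BERT-based word extraction, and the exact-label matching rule is hypothetical, since the abstract does not say how words are matched to detected classes.

```python
# Toy stand-in for lemmatization. The thesis uses WordNetLemmatizer;
# this lookup table plus suffix rules is only an illustrative substitute.
IRREGULAR = {"people": "person", "men": "man", "women": "woman", "mice": "mouse"}


def toy_lemma(word: str) -> str:
    """Reduce a (noun) token to an approximate base form."""
    w = word.lower().strip(".,!?")
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"            # puppies -> puppy
    if w.endswith(("ches", "shes", "sses", "xes")):
        return w[:-2]                  # benches -> bench
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                  # dogs -> dog
    return w


def select_detections(description: str, detections: list) -> list:
    """Keep detections whose class label equals a lemma found in the description.

    Hypothetical matching rule: exact match on single-word labels only.
    """
    lemmas = {toy_lemma(tok) for tok in description.split()}
    return [d for d in detections if d["label"] in lemmas]


# Hypothetical detector output: label, confidence score, bounding box (x1, y1, x2, y2).
detections = [
    {"label": "dog", "score": 0.91, "box": (12, 30, 88, 120)},
    {"label": "person", "score": 0.87, "box": (100, 20, 180, 200)},
    {"label": "bench", "score": 0.65, "box": (5, 150, 300, 220)},
]

selected = select_detections("Find the dogs and people near benches.", detections)
print([d["label"] for d in selected])  # ['dog', 'person', 'bench']
```

Without the lemmatization step, "dogs", "people", and "benches" would not match the singular class labels, which is the kind of mismatch the abstract's lemmatization step is meant to remove. Multi-word labels such as "traffic light" are not handled by this sketch.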

(Paperback)

Subjects--Topical Terms:

1451890
Small object detection.

LDR:03701cam a2200241 i 4500
001 1129893
008 241015s2024 ch ak erm 000 0 chi d
035 $a (THES)112NYPI0441018
040 $a NFU $b chi $c NFU $e CCR
041 0 # $a chi $b chi $b eng
084 $a 008.165M $b 4421:3 113 $2 ncsclt
100 1 $a 黃獻弘 $3 1448935
245 1 0 $a 語義導向的自然語言和視覺串聯基於小物件偵測的跨模態互動 = $b Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Small Object Detection / $c 黃獻弘.
246 1 1 $a Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Small Object Detection.
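250 $a 1st ed.
260 # $a Yunlin County : $b National Formosa University, $c July 2024 (ROC year 113).
300 $a [10], 61 pages : $b illustrations, tables ; $c 30 cm.
500 $a Advisors: 蘇暉凱, 宋啟嘉.
500 $a Academic year: 112.
502 $a Master's thesis--National Formosa University, Department of Electrical Engineering, Master's Program.
504 $a Includes bibliographical references.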
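520 3 $a Since Google introduced the Transformer architecture in 2017, the field of artificial intelligence has made significant progress. This study uses fine-tuning to combine BERT, a pre-trained model based on the Transformer architecture, with PRB-FPN-Net, a CNN-based image recognition model. Both models were trained by fine-tuning, and the WordNetLemmatizer algorithm was applied to lemmatize the extracted words, reducing them to forms without affixes. This step effectively improves the model's ability to understand natural language data. To validate the method, the study uses COCO and Objects365, two training and validation datasets of different specifications, pre-trains the PRB-FPN-Net and BERT models, and finally combines the two to develop a model that accurately detects the specified objects according to an input description. Experimental results show that the model surpasses other methods in both object detection accuracy and efficiency, offering a new approach for multimodal interaction. Compared with YOLO-World, the model's precision metric on the COCO2017 validation set is significantly higher (52.6 vs. 45.8), and it shows a clear accuracy advantage in particular without fine-tuning on other large datasets. Although its performance metrics are slightly below those of GLIP, which uses a Transformer + Transformer architecture, it halves the parameter count and can still maintain a reasonable inference speed in resource-limited scenarios. Overall, this study proposes a cross-modal interaction method based on Transformer and CNN architectures and achieves excellent results in natural language understanding and visual perception.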
520 3 $a Since its introduction in 2017, the Transformer model architecture has revolutionized artificial intelligence. This study leverages the BERT pre-trained model, based on the Transformer architecture, alongside the CNN-based PRB-FPN-Net image recognition model. Through fine-tuning both models and applying WordNetLemmatizer for lemmatization, the study transforms extracted words into their root forms, enhancing the model's grasp of natural language information. The cross-modal interaction method is validated using the COCO and Objects365 datasets for training and validation. The PRB-FPN-Net and BERT models were pre-trained and combined, resulting in a model that accurately detects specified objects based on input descriptions. The model outperforms other methods in object detection accuracy and efficiency, particularly improving precision on the COCO2017 validation set (52.6 vs. 45.8) compared to YOLO-World. While slightly less accurate than the Transformer-based GLIP, it halves the parameter count while maintaining inference speed in resource-limited scenarios. Overall, this study presents a novel approach that excels in natural language understanding and visual perception.
563 $a (Paperback)
650 # 4 $a Small object detection. $3 1451890
650 # 4 $a WordNetLemmatizer algorithm. $3 1451889
650 # 4 $a Multimodal interaction. $3 1451888
650 # 4 $a PRB-FPN-Net. $3 1451887
650 # 4 $a BERT. $3 1420167
650 # 4 $a 小物件偵測. $3 1451886
650 # 4 $a WordNetLemmatizer演算法. $3 1451885
650 # 4 $a 多模態互動. $3 1451884
856 7 # $u https://handle.ncl.edu.tw/11296/u7bhvz $z Electronic resource $2 http