National Formosa University

Statistical Methods for Annotation Analysis

Record type:	Bibliographic - Language material, printed : Monograph/item
Title/Author:	Statistical Methods for Annotation Analysis / by Silviu Paun, Ron Artstein, Massimo Poesio.
Author:	Paun, Silviu.
Other authors:	Artstein, Ron.
Physical description:	XIX, 197 p. : online resource.
Contained By:	Springer Nature eBook
Subject:	Artificial intelligence.
Electronic resource:	https://doi.org/10.1007/978-3-031-03763-4
ISBN:	9783031037634

Statistical Methods for Annotation Analysis
Paun, Silviu.

Statistical Methods for Annotation Analysis [electronic resource] / by Silviu Paun, Ron Artstein, Massimo Poesio. - 1st ed. 2022. - XIX, 197 p. : online resource. - Synthesis Lectures on Human Language Technologies, 1947-4059.

Preface -- Acknowledgements -- Introduction -- Coefficients of Agreement -- Using Agreement Measures for CL Annotation Tasks -- Probabilistic Models of Agreement -- Probabilistic Models of Annotation -- Learning from Multi-Annotated Corpora -- Bibliography -- Authors' Biographies.

Labelling data is one of the most fundamental activities in science, and has underpinned practice, particularly in medicine, for decades, as well as research in corpus linguistics since at least the development of the Brown corpus. With the shift towards Machine Learning in Artificial Intelligence (AI), the creation of datasets to be used for training and evaluating AI systems, also known in AI as corpora, has become a central activity in the field as well. Early AI datasets were created on an ad-hoc basis to tackle specific problems. As larger and more reusable datasets were created, requiring greater investment, the need for a more systematic approach to dataset creation arose to ensure increased quality. A range of statistical methods were adopted, often but not exclusively from the medical sciences, to ensure that the labels used were not subjective, or to choose among different labels provided by the coders. A wide variety of such methods is now in regular use. This book is meant to provide a survey of the most widely used among these statistical methods supporting annotation practice. As far as the authors know, this is the first book attempting to cover the two families of methods in wider use. The first family of methods is concerned with the development of labelling schemes and, in particular, ensuring that such schemes are such that sufficient agreement can be observed among the coders. The second family includes methods developed to analyze the output of coders once the scheme has been agreed upon, particularly although not exclusively to identify the most likely label for an item among those provided by the coders. The focus of this book is primarily on Natural Language Processing, the area of AI devoted to the development of models of language interpretation and production, but many if not most of the methods discussed here are also applicable to other areas of AI, or indeed, to other areas of Data Science.
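The two families of methods named in the summary above can be illustrated with a small sketch. It is not taken from the book: it assumes two coders and nominal labels, uses Cohen's kappa as one example of an agreement coefficient, and uses a plain majority vote as a baseline for choosing among coders' labels. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on nominal labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: share of items that received the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: chance of a match if each coder labelled at random
    # according to their own label distribution.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (observed - expected) / (1 - expected)


def majority_label(labels):
    """Most frequent label for one item; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Toy example: two coders tagging four tokens as noun (N) or verb (V).
coder_a = ["N", "N", "V", "V"]
coder_b = ["N", "V", "V", "V"]
kappa = cohen_kappa(coder_a, coder_b)   # observed 0.75, expected 0.5
winner = majority_label(["N", "V", "V"])
```

Majority voting treats all coders as equally reliable; the probabilistic models of annotation listed in the contents are the kind of method that relaxes that assumption.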

ISBN: 9783031037634

Standard No.: 10.1007/978-3-031-03763-4 (doi)

Subjects--Topical Terms:

559380
Artificial intelligence.

LC Class. No.: Q334-342

Dewey Class. No.: 006.3

Statistical Methods for Annotation Analysis
LDR:03600nam a22003975i 4500
001 1086958
003 DE-He213
005 20220712000838.0
007 cr nn 008mamaa
008 221228s2022 sz | s |||| 0|eng d
020 $a 9783031037634 $9 978-3-031-03763-4
024 7 $a 10.1007/978-3-031-03763-4 $2 doi
035 $a 978-3-031-03763-4
050 4 $a Q334-342
050 4 $a TA347.A78
072 7 $a UYQ $2 bicssc
072 7 $a COM004000 $2 bisacsh
072 7 $a UYQ $2 thema
082 0 4 $a 006.3 $2 23
100 1 $a Paun, Silviu. $e author. $4 aut $4 http://id.loc.gov/vocabulary/relators/aut $3 1393854
245 1 0 $a Statistical Methods for Annotation Analysis $h [electronic resource] / $c by Silviu Paun, Ron Artstein, Massimo Poesio.
250 $a 1st ed. 2022.
264 1 $a Cham : $b Springer International Publishing : $b Imprint: Springer, $c 2022.
300 $a XIX, 197 p. $b online resource.
336 $a text $b txt $2 rdacontent
337 $a computer $b c $2 rdamedia
338 $a online resource $b cr $2 rdacarrier
347 $a text file $b PDF $2 rda
490 1 $a Synthesis Lectures on Human Language Technologies, $x 1947-4059
505 0 $a Preface -- Acknowledgements -- Introduction -- Coefficients of Agreement -- Using Agreement Measures for CL Annotation Tasks -- Probabilistic Models of Agreement -- Probabilistic Models of Annotation -- Learning from Multi-Annotated Corpora -- Bibliography -- Authors' Biographies.
520 $a Labelling data is one of the most fundamental activities in science, and has underpinned practice, particularly in medicine, for decades, as well as research in corpus linguistics since at least the development of the Brown corpus. With the shift towards Machine Learning in Artificial Intelligence (AI), the creation of datasets to be used for training and evaluating AI systems, also known in AI as corpora, has become a central activity in the field as well. Early AI datasets were created on an ad-hoc basis to tackle specific problems. As larger and more reusable datasets were created, requiring greater investment, the need for a more systematic approach to dataset creation arose to ensure increased quality. A range of statistical methods were adopted, often but not exclusively from the medical sciences, to ensure that the labels used were not subjective, or to choose among different labels provided by the coders. A wide variety of such methods is now in regular use. This book is meant to provide a survey of the most widely used among these statistical methods supporting annotation practice. As far as the authors know, this is the first book attempting to cover the two families of methods in wider use. The first family of methods is concerned with the development of labelling schemes and, in particular, ensuring that such schemes are such that sufficient agreement can be observed among the coders. The second family includes methods developed to analyze the output of coders once the scheme has been agreed upon, particularly although not exclusively to identify the most likely label for an item among those provided by the coders. The focus of this book is primarily on Natural Language Processing, the area of AI devoted to the development of models of language interpretation and production, but many if not most of the methods discussed here are also applicable to other areas of AI, or indeed, to other areas of Data Science.
650 0 $a Artificial intelligence. $3 559380
650 0 $a Natural language processing (Computer science). $3 802180
650 0 $a Computational linguistics. $3 555811
650 1 4 $a Artificial Intelligence. $3 646849
650 2 4 $a Natural Language Processing (NLP). $3 1254293
650 2 4 $a Computational Linguistics. $3 670080
700 1 $a Artstein, Ron. $e author. $4 aut $4 http://id.loc.gov/vocabulary/relators/aut $3 1393855
700 1 $a Poesio, Massimo. $4 aut $4 http://id.loc.gov/vocabulary/relators/aut $3 1112357
710 2 $a SpringerLink (Online service) $3 593884
773 0 $t Springer Nature eBook
776 0 8 $i Printed edition: $z 9783031037733
776 0 8 $i Printed edition: $z 9783031037535
776 0 8 $i Printed edition: $z 9783031037832
830 0 $a Synthesis Lectures on Human Language Technologies, $x 1947-4059 $3 1389817
856 4 0 $u https://doi.org/10.1007/978-3-031-03763-4
912 $a ZDB-2-SXSC
950 $a Synthesis Collection of Technology (R0) (SpringerNature-85007)
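The variable fields in the tag view above share one layout: a three-digit tag, optional indicators, then subfields introduced by `$` and a one-character code. A minimal sketch of splitting one such displayed line follows. It assumes this catalog's text rendering (not the binary MARC exchange format), assumes `$` never occurs inside a subfield value, and does not handle the leader or the control fields 001 to 008, which carry no subfields. The function name is hypothetical.

```python
def parse_field(line):
    """Split a displayed MARC line into (tag, indicators, subfields).

    subfields is a list of (code, value) pairs in their original order.
    """
    tag, _, rest = line.strip().partition(" ")
    # Everything before the first "$" holds the indicators, if any.
    head, *chunks = rest.split("$")
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in chunks if chunk]
    return tag, head.split(), subfields


tag, indicators, subfields = parse_field(
    "020 $a 9783031037634 $9 978-3-031-03763-4"
)
# Repeated codes (such as the two $4 subfields in field 100) are kept as
# separate pairs, so the result is a list rather than a dict.
```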