

Recognition and Classification of Chinese Internet-Derived Entities Based on Deep Learning

Published: 2019-04-16 08:40
[Abstract]: With the explosive growth of Internet content, the web is full of non-standard Chinese expressions such as near-homophones, abbreviations, and synonyms. Because Chinese is flexible in how words are formed and used, many of the subject terms in a text are written as derived words of these kinds; such subject terms are called Internet-derived entities. Chinese Internet-derived entities are complex, changeable, and hard to recognize, and they are often substituted for the original words to evade government monitoring of online public opinion, so they create many difficulties for natural language processing and public-opinion monitoring. Researchers in China and abroad have studied the recognition of specific categories of derived entities extensively, but the overall data distribution of Internet-derived entities has not yet been studied. In addition, new derived entities keep appearing in large numbers, which places new demands on recognition techniques.

The main work of this thesis is as follows:

1) For each category of derived entity, existing solutions from China and abroad are surveyed and compared, and recent trends in mainstream recognition models and techniques are analyzed. The strengths and weaknesses of each method in practical use are identified. Based on the characteristics of the problem studied here, the thesis proposes a deep learning approach to recognizing Chinese Internet-derived entities.

2) Two neural network architectures for recognizing Chinese Internet-derived entities are proposed: the sliding-window method and the sentence-convolution method. Both address the problem that sentences vary in length and therefore cannot be fed directly into a neural network. The model's input vectors are obtained with word2vec. A stacked autoencoder encodes hand-crafted feature vectors, which are combined with the word vectors into a composite input to further improve recognition. A specially chosen activation function and training algorithm accelerate training and further optimize the model structure.

3) Extensive comparative experiments are carried out on a purpose-built corpus. Because no open corpus was available, the Scrapy crawler framework was used to collect the text (252.3 MB), which was then annotated manually to build the corpus. Many derived-entity recognition tests were run on this corpus, and the results across entity categories were compared. The experiments show that both proposed architectures handle Internet-derived entity recognition effectively, with F1 scores of 78.6% and 76.9% respectively. Each has its own strengths on particular entity categories, and both outperform traditional models on this corpus. By studying how different parameters and methods affect the results, the thesis also derives more general tuning guidance for the model as a reference for other researchers.

In practice, the deep learning entity recognition model proposed in this thesis applies well to the task of recognizing Chinese Internet-derived entities. It achieves good recognition performance on all categories of derived entities at once and can meet the new requirements of Chinese Internet-derived entity recognition in the big data setting.
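The abstract does not spell out how the sliding-window method produces fixed-size inputs, so the following is only a minimal sketch of the general idea, under stated assumptions: each token gets one window of constant width centered on it, and sentence boundaries are padded. The `PAD` token, the `half_width` parameter, and the function name `sliding_windows` are hypothetical and are not taken from the thesis.

```python
from typing import List

# Hypothetical padding token; the thesis does not say how boundaries are handled.
PAD = "<PAD>"


def sliding_windows(tokens: List[str], half_width: int = 2) -> List[List[str]]:
    """Return one fixed-size context window per token.

    Every window has exactly 2 * half_width + 1 entries, regardless of
    sentence length, so each window can be mapped to a fixed-size input
    (for example by concatenating the word vectors of its tokens).
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    # Pad both ends so that the first and last tokens also get full windows.
    padded = [PAD] * half_width + list(tokens) + [PAD] * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(tokens))]


if __name__ == "__main__":
    # Two sentences of different lengths yield windows of the same width.
    for sentence in (["a", "b", "c"], ["a", "b", "c", "d", "e", "f"]):
        for window in sliding_windows(sentence, half_width=1):
            print(window)
```

With this framing, a sentence of any length becomes a list of equally sized windows, and a classifier can label the center token of each window. Whether the thesis labels tokens or characters, and what window width it uses, is not stated in the abstract.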
[Degree-granting institution]: Wuhan University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1

