Research on Algorithms for Mining Associated Information of Specific Entities Based on Weibo
[Abstract]: With the rise of Web 2.0 technology and Internet social applications, Weibo has gradually become an indispensable part of people's daily lives. Its rapid growth has produced an explosive increase in the volume of Weibo data. How to make use of this data, how to retrieve the required information from it, and how to mine and identify the information associated with entities have become a focus of current academic research. By analyzing the characteristics of Weibo, this thesis proposes a Weibo-based information mining framework for specific entity objects and studies, from shallow to deep, information retrieval in the Weibo environment, information mining for specific entities, and a recommendation system based on inter-entity association. The main innovations and contributions are as follows. First, a query expansion method based on a resistance network model is proposed. It borrows the resistance network model from circuit theory to simulate the network of relations between words in the text space, and uses effective resistance to characterize the correlation between words. This method effectively simplifies computation over complex word-to-word relation networks. Results on the TREC Microblog Track evaluation show that the method obtains expansion words that match the user's original query intent and improves the retrieval metrics. Second, building on query expansion, an algorithm for mining associations between expansion words based on the word activation model is proposed. Using the affinity between words in the word activation model, the relevance between expansion words is computed, expansion word pairs are obtained, and these pairs are used for query reconstruction.
The experimental data show that expansion word pairs effectively reduce the information drift introduced by individual expansion words and perform well in information mining for entity objects. Finally, a personalized recommendation system based on the word activation model is designed and implemented; it takes into account both the user's interests and contextual (environmental) information. The system achieved excellent results in the TREC Contextual Suggestion Track evaluation, which demonstrates the validity of the word activation model for mining associations between entities.
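The resistance-network idea in the abstract can be illustrated with a small sketch. The abstract does not say how the thesis builds the word network or assigns edge weights, so everything below is an assumption made for illustration: words are nodes, the conductance of an edge is the number of posts in which two words co-occur, and candidate expansion words are ranked by their effective resistance to the query word (lower resistance is read as stronger correlation). The function names `cooccurrence_conductance`, `effective_resistance`, and `rank_expansion_terms` are hypothetical, and the network is assumed to be connected.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_conductance(docs):
    """Assumed weighting: conductance of edge (u, v) = number of posts containing both."""
    weights = defaultdict(float)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            weights[(u, v)] += 1.0
    return dict(weights)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("network is not connected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def effective_resistance(weights, source, sink):
    """Effective resistance between two words in a connected conductance network."""
    if source == sink:
        return 0.0
    nodes = sorted({w for edge in weights for w in edge})
    # Ground the sink (potential 0) by dropping its row and column from the Laplacian.
    free = [w for w in nodes if w != sink]
    index = {w: i for i, w in enumerate(free)}
    n = len(free)
    lap = [[0.0] * n for _ in range(n)]
    for (u, v), g in weights.items():
        for p, q in ((u, v), (v, u)):
            if p in index:
                lap[index[p]][index[p]] += g
                if q in index:
                    lap[index[p]][index[q]] -= g
    # Inject one unit of current at the source; its potential equals the resistance.
    rhs = [0.0] * n
    rhs[index[source]] = 1.0
    return solve(lap, rhs)[index[source]]


def rank_expansion_terms(weights, query_term, k=3):
    """Return the k words with the lowest effective resistance to the query word."""
    nodes = {w for edge in weights for w in edge}
    scored = [(effective_resistance(weights, query_term, w), w)
              for w in nodes if w != query_term]
    return [w for _, w in sorted(scored)[:k]]


if __name__ == "__main__":
    # Toy tokenized posts (hypothetical data, not taken from the thesis).
    posts = [["weibo", "entity", "mining"],
             ["weibo", "entity"],
             ["entity", "recommendation"]]
    network = cooccurrence_conductance(posts)
    print(rank_expansion_terms(network, "weibo"))
    # → ['entity', 'mining', 'recommendation']
```

Unlike a raw co-occurrence count, effective resistance also reflects indirect paths: two words that rarely co-occur directly can still be close if many short paths connect them, which is one plausible reason to use a single circuit quantity to summarize a complex word-relation network.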
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.1