

Research on Hot Topic Detection and Tracking Technology for Weibo

Published: 2018-10-23 20:31
[Abstract]: Topic detection and tracking (TDT) means discovering the most-discussed topics in massive amounts of data and following how they develop and change in subsequent information, which helps people cope with the increasingly serious problem of information overload. TDT saves users time, keeps them informed about developing events, and supplies data for public-opinion monitoring, so it has significant practical and security value. As more and more users publish information and discuss topics on Weibo, displaying hot topics has become an important feature of the platform. Because Weibo is highly real-time, breaking news spreads very quickly there, and high-impact events attract large numbers of users who report, repost, and comment on them, so Weibo often reacts before traditional news media. Building on these characteristics, this thesis filters out invalid posts and then designs and implements a method for detecting and tracking hot topics on Weibo. The main work is as follows.

1) Analyzing Weibo characteristics and filtering invalid posts. Weibo's user base is large, diverse, and heterogeneous, and so is the content it produces. Advertising accounts and zombie accounts are filtered out by analyzing user features, including follower count and the number of posts per day. Posts that contribute nothing to any topic, such as merchant promotions, user-shared content, and campaigns that users take part in, are filtered out by analyzing post content. Finally, posts that contain too many or too few words after word segmentation are removed, which discards meaningless short texts and highly repetitive long texts. Together these filters remove invalid posts and reduce computational complexity.
2) Designing and implementing a hot-topic detection algorithm for Weibo based on temporal characteristics. Posts are processed in increasing time order. Preliminary topics are detected with an improved Single-Pass (SP) clustering algorithm, which modifies both the similarity calculation and the topic-vector update so that the update takes user influence into account. The FP-Growth frequent-itemset algorithm then mines frequent feature-word sets to correct errors made by the SP step, and an improved K-medoids algorithm clusters these word sets to extract the final topics. This improves both computational efficiency and topic-detection accuracy.

3) Designing and implementing a multi-query-vector adaptive topic-tracking algorithm based on temporal characteristics. Based on how the number of posts is distributed over time, posts are grouped into time periods and processed in increasing time order. Each topic in a time period is compared for similarity with every topic in every existing topic group; depending on a threshold, it is either assigned to an existing group or used to start a new one, and the topic vectors of the group it joins are adapted accordingly. This tracks topic development effectively, improves accuracy, and reduces topic drift.
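The three filters in 1) can be sketched as a single predicate over word-segmented posts. This is a minimal illustration, not the thesis's code: the `Post` fields, the promotion markers, and every threshold are hypothetical placeholders, because the abstract names the signals (follower count, daily posting rate, promotional content, word count) but not their values.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical promotion markers ("lucky draw", "coupon", "shared from");
# the thesis does not publish its actual content rules.
PROMO_MARKERS = ("抽奖", "优惠券", "分享自")


@dataclass
class Post:
    text: str
    tokens: List[str]      # words after Chinese word segmentation
    followers: int         # author's follower count
    posts_per_day: float   # author's average number of posts per day


def is_valid(post: Post, min_followers: int = 10, max_daily_posts: float = 50.0,
             min_tokens: int = 3, max_tokens: int = 100,
             promo_markers: Sequence[str] = PROMO_MARKERS) -> bool:
    """Return True if the post survives the (hypothetical) user, content and length filters."""
    # User filter: very few followers suggests a zombie account,
    # an extreme posting rate suggests an advertising account.
    if post.followers < min_followers or post.posts_per_day > max_daily_posts:
        return False
    # Content filter: promotions, share boilerplate, campaign posts.
    if any(marker in post.text for marker in promo_markers):
        return False
    # Length filter on the segmented text: drop too-short and too-long posts.
    return min_tokens <= len(post.tokens) <= max_tokens


def filter_posts(posts: Sequence[Post]) -> List[Post]:
    return [p for p in posts if is_valid(p)]
```

Keeping each filter as a cheap check on a single post means the whole pass is linear in the number of posts, which fits the abstract's stated goal of reducing computational cost before clustering.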
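The preliminary detection step in 2) can be sketched as a Single-Pass clusterer that weights each post by its author's influence when updating a topic vector. This is plain Single-Pass with cosine similarity over term-frequency vectors, not the thesis's improved similarity measure, which the abstract does not specify; the `influence` formula (1 + ln(1 + followers)) and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def influence(followers: int) -> float:
    # Hypothetical influence weight; the thesis's own formula is not given.
    return 1.0 + math.log1p(followers)


class Topic:
    def __init__(self, vec: Vector, weight: float, post_id: int):
        self.centroid: Vector = dict(vec)
        self.weight = weight          # total influence of member posts
        self.members = [post_id]

    def add(self, vec: Vector, weight: float, post_id: int) -> None:
        # Influence-weighted running mean: posts from influential authors
        # pull the topic vector further towards their wording.
        total = self.weight + weight
        for t in set(self.centroid) | set(vec):
            self.centroid[t] = (self.centroid.get(t, 0.0) * self.weight
                                + vec.get(t, 0.0) * weight) / total
        self.weight = total
        self.members.append(post_id)


def single_pass(posts: List[Tuple[int, List[str], int]],
                threshold: float = 0.3) -> List[Topic]:
    """posts: (post_id, tokens, author_followers) tuples, already in time order."""
    topics: List[Topic] = []
    for post_id, tokens, followers in posts:
        vec: Vector = {t: float(c) for t, c in Counter(tokens).items()}
        best: Optional[Topic] = None
        best_sim = 0.0
        for topic in topics:
            sim = cosine(vec, topic.centroid)
            if sim > best_sim:
                best, best_sim = topic, sim
        # Join the most similar topic if it is close enough, else start a new one.
        if best is not None and best_sim >= threshold:
            best.add(vec, influence(followers), post_id)
        else:
            topics.append(Topic(vec, influence(followers), post_id))
    return topics
```

Because each post is seen once and only compared with current topic vectors, the result depends on input order, which is why the abstract stresses processing posts in increasing time order.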
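For the FP-Growth step in 2), the sketch below mines frequent feature-word sets from segmented posts using a compact FP-tree implementation written for illustration (no library is assumed). `frequent_word_sets` and its `min_support` parameter (an absolute post count) are hypothetical names; how the thesis then uses these sets to correct Single-Pass errors is not described in the abstract and is not shown here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple


class _Node:
    """One FP-tree node: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def _build_tree(transactions: Sequence[Tuple[Sequence[str], int]], min_support: int):
    # Count item supports and keep only frequent items.
    support: Dict[str, int] = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    support = {i: c for i, c in support.items() if c >= min_support}
    root = _Node(None, None)
    header: Dict[str, List[_Node]] = defaultdict(list)  # item -> nodes holding it
    for items, count in transactions:
        # Insert frequent items in descending support order (ties broken by name).
        ordered = sorted((i for i in set(items) if i in support),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = _Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return support, header


def _mine(transactions, min_support: int, suffix: FrozenSet[str]) -> Dict[FrozenSet[str], int]:
    support, header = _build_tree(transactions, min_support)
    patterns: Dict[FrozenSet[str], int] = {}
    for item, count in support.items():
        itemset = suffix | {item}
        patterns[itemset] = count
        # Conditional pattern base: the prefix path above every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(_mine(base, min_support, itemset))
    return patterns


def frequent_word_sets(posts: Sequence[Sequence[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return {frozenset(words): support} for every word set in >= min_support posts."""
    return _mine([(tokens, 1) for tokens in posts], min_support, frozenset())
```

FP-Growth avoids Apriori's repeated candidate generation by compressing the posts into a prefix tree once and then mining conditional trees recursively, which is the usual reason for choosing it on large post collections.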
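The final clustering step in 2) groups frequent word sets into topics. The sketch uses standard alternating k-medoids with Jaccard distance between word sets and farthest-first initialization; the thesis describes an improved K-medoids whose changes are not given in the abstract, so the distance, the initialization, and the `k_medoids` interface are all assumptions.

```python
from typing import FrozenSet, List, Sequence, Tuple


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def k_medoids(sets: Sequence[FrozenSet[str]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids; returns (cluster label per set, medoid indices)."""
    n = len(sets)
    d = [[jaccard_distance(sets[i], sets[j]) for j in range(n)] for i in range(n)]
    # Farthest-first initialization: start from set 0, then repeatedly add
    # the set farthest from its nearest already-chosen medoid.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each word set joins its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster;
        # an empty cluster keeps its old medoid.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][j] for j in members))
                if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids
```

Medoids are always actual word sets, so each resulting topic can be reported directly by its medoid's words, which is one practical reason to prefer k-medoids over k-means for set-valued data.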
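The tracking step in 3) can be sketched as follows: each topic group keeps several query vectors (one per topic it has absorbed), a new topic is compared with every query vector of every group that existed before its time period, and it either joins the most similar group or starts a new one. The 0.4 threshold, the cap of three query vectors, and the rule of dropping the oldest vector when the cap is exceeded are hypothetical stand-ins for the thesis's adaptive update, which the abstract does not detail.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TopicGroup:
    """A tracked topic represented by several query vectors rather than one centroid."""
    def __init__(self, vec: Vector, period: int, max_queries: int):
        self.queries: List[Vector] = [vec]
        self.periods: List[int] = [period]
        self.max_queries = max_queries

    def similarity(self, vec: Vector) -> float:
        # Compare against every topic (query vector) kept in the group.
        return max(cosine(vec, q) for q in self.queries)

    def add(self, vec: Vector, period: int) -> None:
        self.queries.append(vec)
        self.periods.append(period)
        # Hypothetical adaptation rule: keep only the newest query vectors.
        if len(self.queries) > self.max_queries:
            self.queries.pop(0)


def track(periods: List[List[Vector]], threshold: float = 0.4,
          max_queries: int = 3) -> List[TopicGroup]:
    """periods[i] holds the topic vectors detected in time period i."""
    groups: List[TopicGroup] = []
    for period, topics in enumerate(periods):
        existing = list(groups)  # only groups that existed before this period
        for vec in topics:
            scored = [(g.similarity(vec), g) for g in existing]
            best_sim, best = max(scored, key=lambda s: s[0], default=(0.0, None))
            if best is not None and best_sim >= threshold:
                best.add(vec, period)
            else:
                groups.append(TopicGroup(vec, period, max_queries))
    return groups
```

With several query vectors, a group can still recognize a topic whose vocabulary has shifted (matching through a newer vector even if it shares nothing with the first one), while the cap is intended to stop old vocabulary from dominating, which is one way to limit topic drift.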
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year awarded]: 2016
[Classification number]: TP391.1; TP393.092
