

Web Spam Fraud Detection Based on Cost-Sensitive Methods

Published: 2018-05-30 23:20

  Topics: web spam detection + cost-sensitive learning; Source: Southwest Jiaotong University, master's thesis, 2017


[Abstract]: Over the past 20 years, the rapid development of Internet technology has produced an endless stream of websites and Web applications that make daily life more convenient. As a by-product of that growth, the Web also contains a large number of spam pages carrying fraudulent or harmful content; spread by spammers, these pages seriously harm the interests of Internet users. Accurately identifying and detecting spam pages is therefore an active research topic. This thesis first addresses binary spam page detection. It studies the different costs incurred when spam pages and normal pages are misclassified, and adopts a detection method based on a cost-sensitive support vector machine (SVM). Because many cost-sensitive schemes require the costs to be specified by hand, the thesis builds a spam page detection framework that integrates cost computation using particle swarm optimization (PSO). Specifically, the cost-sensitive SVM is wrapped as the PSO fitness function: the cost parameters of the cost-sensitive classifier are the variables that PSO optimizes, and the classifier's AUC is the output of the fitness function. This preserves detection performance while reducing the influence of human choices on the algorithm. Second, the thesis studies multi-level spam page detection, which is finer-grained than binary detection and requires spam pages to be detected according to their degree of harm. Multi-level detection is implemented by combining cost-sensitive SVMs in a "one-versus-one" multi-class scheme, which preserves detection performance and avoids the problem of merging costs in the cost matrix. PSO is again used to compute the multiple misclassification costs. Starting from the original class labels of the UK2007 web spam dataset, the thesis constructs a new three-class dataset, MC-UK2007. UK2007 and MC-UK2007 are then used for binary and multi-class detection experiments with integrated cost computation, and several groups of comparison experiments are run with other algorithms. The results show that both proposed methods achieve better AUC values, indicating that they detect spam pages more effectively.
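The wrapping described in the abstract (misclassification cost in, AUC out, with PSO searching over the cost) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the data are synthetic two-feature points rather than UK2007, the classifier is a linear SVM trained by stochastic subgradient descent with a class-scaled hinge loss, the normal-page cost is fixed at 1, the spam cost is searched in the range [1, 20], and the PSO coefficients (0.7 inertia, 1.5 attraction) are common textbook defaults. The function names `train_cs_svm`, `make_fitness`, `pso_maximize` and `tune_cost` are hypothetical.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random spam page outscores a random normal page."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def decision(w, b, x):
    """Signed score of a linear classifier; larger means more spam-like."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_cs_svm(X, y, cost_spam, cost_normal=1.0, epochs=20, lam=0.01, seed=0):
    """Linear SVM by stochastic subgradient descent; hinge loss is scaled per class."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 0.1 / (1.0 + t / 100.0)  # decaying step size (assumed schedule)
            cost = cost_spam if y[i] == 1 else cost_normal
            violated = y[i] * decision(w, b, X[i]) < 1.0
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage
            if violated:  # costlier classes pull the boundary harder
                w = [wj + eta * cost * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * cost * y[i]
    return w, b

def make_data(n_normal, n_spam, seed):
    """Synthetic, imbalanced stand-in for a spam dataset (label 1 = spam, -1 = normal)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_normal)]
    X += [[rng.gauss(1.5, 1.0), rng.gauss(1.0, 1.0)] for _ in range(n_spam)]
    return X, [-1] * n_normal + [1] * n_spam

def make_fitness(train, valid):
    """Wrap the classifier as a PSO fitness function: cost in, validation AUC out."""
    (X_tr, y_tr), (X_va, y_va) = train, valid
    def fitness(cost_spam):
        w, b = train_cs_svm(X_tr, y_tr, cost_spam)
        return auc([decision(w, b, x) for x in X_va], y_va)
    return fitness

def pso_maximize(fitness, lo, hi, n_particles=6, iters=8, seed=0):
    """One-dimensional particle swarm search for the position with the highest fitness."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_fit = pos[:], [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = 0.7 * vel[i] + 1.5 * r1 * (pbest[i] - pos[i]) + 1.5 * r2 * (gbest - pos[i])
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to the search range
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i], f
    return gbest, gbest_fit

def tune_cost(lo=1.0, hi=20.0):
    """Search the spam misclassification cost that maximizes validation AUC."""
    fitness = make_fitness(make_data(300, 40, seed=1), make_data(200, 30, seed=2))
    return pso_maximize(fitness, lo, hi)

if __name__ == "__main__":
    best_cost, best_auc = tune_cost()
    print(f"best spam cost: {best_cost:.2f}, validation AUC: {best_auc:.3f}")
```

The sketch searches a single cost because the binary case, with the normal-page cost fixed, has only one free ratio. For the multi-level setting, the same pattern would presumably extend to a particle with one coordinate per pairwise classifier, though the abstract does not give that level of detail.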
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP393.092;TP18



