An Insight of Machine Learning in Web Network Analysis

2019 ◽  
Vol 11 (2) ◽  
pp. 20-34
Author(s):  
Meenakshi Sharma ◽  
Anshul Garg

The World Wide Web is immensely rich in knowledge. That knowledge comes both from its content and from distinctive characteristics of the web such as its hyperlink structure. The difficulty lies in extracting the relevant data from the web and reaching the most appropriate decision for a given problem, which can be used to improve any business organisation. An effective solution depends on how efficiently and effectively the web data is analysed. In analysing data on the web, analysis of the web structure is as important as analysis of the relevant content. This article gives a brief introduction to the terminology and measures used in web network analysis, such as centrality, PageRank, and density. It also briefly introduces supervised machine learning techniques such as classification and regression, and unsupervised techniques such as clustering, which are useful in analysing the web network so that users can make quick and effective decisions.
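The three measures named in this abstract can be illustrated with a minimal sketch. The toy link graph below is hypothetical and not taken from the article, and the function names are chosen only for this example; centrality is shown in its simplest in-degree form, and PageRank as a plain power iteration with the conventional damping factor of 0.85.

```python
# Hypothetical toy web graph: page -> pages it links to (not from the article).
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}

def density(graph):
    """Directed density: number of edges divided by n * (n - 1)."""
    n = len(graph)
    m = sum(len(targets) for targets in graph.values())
    return m / (n * (n - 1)) if n > 1 else 0.0

def in_degree_centrality(graph):
    """In-degree of each node, normalised by n - 1 (assumes n > 1)."""
    n = len(graph)
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return {node: count / (n - 1) for node, count in counts.items()}

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; pages without out-links spread their rank uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        dangling = sum(rank[u] for u, outs in graph.items() if not outs)
        new = {node: (1 - damping) / n + damping * dangling / n for node in graph}
        for u, outs in graph.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        rank = new
    return rank
```

In this toy graph every other page links to "home", so it receives both the highest in-degree centrality and the highest PageRank score.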

2021 ◽  
pp. 155005942110608
Author(s):  
Jakša Vukojević ◽  
Damir Mulc ◽  
Ivana Kinder ◽  
Eda Jovičić ◽  
Krešimir Friganović ◽  
...  

In everyday clinical practice, there is an ongoing debate about the nature of major depressive disorder (MDD) in patients with borderline personality disorder (BPD). The underlying research does not give a clear distinction between those 2 entities, although depression is among the most frequent comorbid diagnoses in borderline personality patients. The notion that depression can be a distinct disorder but also a symptom in other psychopathologies led our team to try to delineate those 2 entities using 146 EEG recordings and machine learning. The utilized algorithms, developed solely for this purpose, could not differentiate those 2 entities, meaning that, given the data and methods used, the EEG of patients suffering from MDD did not differ significantly from that of patients diagnosed with both MDD and BPD. By increasing the data set and the spatiotemporal specificity, one could have a more sensitive diagnostic approach when using EEG recordings. To our knowledge, this is the first study that used EEG recordings and advanced machine learning techniques and further confirmed the close interrelationship between those 2 entities.


2020 ◽  
Vol 10 (18) ◽  
pp. 6527 ◽  
Author(s):  
Omar Sharif ◽  
Mohammed Moshiul Hoque ◽  
A. S. M. Kayes ◽  
Raza Nowrozy ◽  
Iqbal H. Sarker

Due to the substantial growth in the number of internet users and their spontaneous access via electronic devices, the amount of electronic content has been growing enormously in recent years through instant messaging, social networking posts, blogs, online portals and other digital platforms. Unfortunately, the misapplication of technologies has increased with this rapid growth of online content, which has led to a rise in suspicious activities. People misuse web media to disseminate malicious activity, carry out illegal acts, abuse other people, and publicize suspicious content on the web. Suspicious content is usually available in the form of text, audio, or video, with text being used in most cases to carry out suspicious activities. Thus, one of the most challenging issues for NLP researchers is to develop a system that can efficiently identify suspicious text from the given content. In this paper, a Machine Learning (ML)-based classification model is proposed (hereafter called STD) to classify Bengali text into non-suspicious and suspicious categories based on its original content. A set of ML classifiers with various features has been used on our developed corpus, consisting of 7000 Bengali text documents, of which 5600 documents were used for training and 1400 documents for testing. The performance of the proposed system is compared with the human baseline and existing ML techniques. The SGD classifier with tf-idf features combining unigrams and bigrams achieves the highest accuracy of 84.57%.
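The best-performing configuration described above (tf-idf over unigrams and bigrams, fed to a classifier trained by stochastic gradient descent) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' STD system: the English toy corpus is hypothetical (the paper uses Bengali documents), the loss is assumed to be logistic, and all function names and hyperparameters are chosen for this example.

```python
import math
import random
from collections import Counter

def ngrams(text):
    """Unigrams plus bigrams of a whitespace-tokenised text."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    df = Counter(term for doc in docs for term in set(ngrams(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf):
    """L2-normalised tf-idf vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in ngrams(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_sgd(vectors, labels, epochs=50, lr=0.5, seed=0):
    """Logistic regression fitted by plain stochastic gradient descent."""
    rng = random.Random(seed)
    weights, bias = Counter(), 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = bias + sum(weights[t] * v for t, v in vectors[i].items())
            error = 1 / (1 + math.exp(-z)) - labels[i]  # gradient of log-loss w.r.t. z
            for t, v in vectors[i].items():
                weights[t] -= lr * error * v
            bias -= lr * error
    return weights, bias

def predict(text, idf, weights, bias):
    """Return 1 (suspicious) or 0 (non-suspicious)."""
    z = bias + sum(weights[t] * v for t, v in tfidf(text, idf).items())
    return int(z > 0)

# Hypothetical toy corpus; 1 = suspicious, 0 = non-suspicious.
train_docs = ["we will attack the city tonight", "join us to attack them",
              "the weather is nice today", "we will play football tonight"]
train_labels = [1, 1, 0, 0]
idf = fit_idf(train_docs)
weights, bias = train_sgd([tfidf(d, idf) for d in train_docs], train_labels)
```

Calling `predict(text, idf, weights, bias)` then labels a new document. On a real corpus, a held-out test split such as the 5600/1400 division described above would be used to measure accuracy.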


2017 ◽  
pp. 71-93 ◽  
Author(s):  
I. Goloshchapova ◽  
M. Andreev

The paper proposes a new approach to measuring the inflation expectations of the Russian population, based on text mining of information on the Internet with the help of machine learning techniques. Two indicators were constructed from readers' comments on inflation news in major Russian economic media available on the web for the period from 2014 through 2016: one using word frequency and the other using sentiment analysis of the comments. Over the whole period considered, the dynamics of both indicators were consistent with the development of the macroeconomic situation, and both were able to forecast the dynamics of the official Bank of Russia indicators of population inflation expectations approximately one month in advance.


2020 ◽  
Vol 8 (6) ◽  
pp. 3117-3120

Prediction here means identifying a person's behavior towards online shopping by analyzing the reviews publicly available on the web. In the present study, machine learning approaches are used to extract reviews from the web and to classify them into five categories, namely strongly positive, positive, neutral, negative, and strongly negative, for the prediction of human behavior. Several pre-processing methods (including stop-word removal) are applied, and a web crawler is used to gather the data. This is followed by the application of the Stanford POS tagger for tagging the reviews, which is done after stemming with the Porter stemmer algorithm. A person's behavior is then analyzed, and the experimental results are compared with machine learning approaches.
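Two of the steps mentioned above, stop-word removal and assignment to one of the five categories, can be sketched as follows. The stop-word list, the polarity lexicon, and the score thresholds are all hypothetical and chosen only for illustration; the abstract does not say how reviews are scored, and the stemming and POS-tagging stages are omitted here.

```python
# Hypothetical stop-word list and polarity lexicon, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "i", "to", "of"}
LEXICON = {"great": 2, "love": 2, "good": 1, "fine": 1,
           "bad": -1, "slow": -1, "terrible": -2, "awful": -2}

def preprocess(review):
    """Lower-case, strip surrounding punctuation, and drop stop words."""
    tokens = (word.strip(".,!?;:\"'") for word in review.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def classify(review):
    """Map the mean lexicon score of a review to one of five categories."""
    scores = [LEXICON[t] for t in preprocess(review) if t in LEXICON]
    mean = sum(scores) / len(scores) if scores else 0.0
    if mean >= 1.5:
        return "strongly positive"
    if mean >= 0.5:
        return "positive"
    if mean > -0.5:
        return "neutral"
    if mean > -1.5:
        return "negative"
    return "strongly negative"
```

A review containing no lexicon words scores 0.0 and therefore falls into the neutral category.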


Author(s):  
Pasquale De Luca

The violation of privacy, whether one's own or other people's, is a very current problem that concerns not only the web but also private life. In the 1990s it was expected that, by now, any routine operation then carried out "manually" would be performed through mobile phones or personal computers. The problem pertains to the distribution network that allows information to be shared and brought together; as a result, the network becomes unsafe if subjected to attacks. Nowadays we put personal information on the web because otherwise we are seen as “weak”. This work aims to measure and analyze how much information is shared by users of a pre-established social network, using a set of machine learning algorithms and techniques.


2012 ◽  
pp. 50-65 ◽  
Author(s):  
K. Selvakuberan ◽  
M. Indra Devi ◽  
R. Rajaram

The explosive growth of the Web makes it a very useful information resource for all types of users. Today, everyone accesses the Internet for various purposes, and retrieving the required information within a stipulated time is the major demand from users. The Internet also returns millions of Web pages for each search term, so getting interesting and relevant results from the Web becomes very difficult, and classifying Web pages into relevant categories is a current research topic. Web page classification focuses on assigning documents to different categories, which search engines use to produce their results. In this chapter we focus on different machine learning techniques and how Web pages can be classified using them. The automatic classification of Web pages using machine learning techniques is the most efficient way for search engines to provide accurate results to users. Machine learning classifiers may also be trained to protect personal details from unauthenticated users and for privacy-preserving data mining.


2017 ◽  
Vol 865 ◽  
pp. 650-656
Author(s):  
Yun Jae Choung ◽  
Myung Hee Jo

Surface material classification is an important task for the preservation of land properties and the management of land development plans. Remotely sensed images allow surface materials to be classified efficiently without human access to the site. This research aims to select the most appropriate machine learning technique for surface material classification using remotely sensed images. Three different machine learning techniques (MD (Minimum Distance), MLC (Maximum Likelihood Classification), and SVM (Support Vector Machine)) were applied for surface material classification using the Landsat-8 OLI (Operational Land Imager) image acquired in Ulsan, South Korea, in the following steps. First, the training samples for each land cover in the given Landsat images were selected manually. Next, each of the three techniques (MD, MLC, and SVM) was applied to the given Landsat images to carry out the surface material classification. The accuracies of the three land cover classification maps generated by the different techniques were assessed using the ground truths. Finally, the accuracies were compared to select the most suitable approach for classifying the various surface materials in Ulsan. The statistical results show that the SVM classifier is superior to the MD and MLC classifiers for surface material classification using the given Landsat-8 OLI image.
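Of the three techniques compared above, Minimum Distance (MD) is the simplest to illustrate: each class is represented by the mean spectral vector of its training pixels, and a pixel is assigned to the class with the nearest mean. The sketch below assumes Euclidean distance and uses hypothetical two-band reflectance values; the function names and data are illustrative and do not come from the study.

```python
import math

def fit_class_means(samples, labels):
    """Mean spectral vector of the training pixels of each class."""
    sums, counts = {}, {}
    for pixel, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(pixel))
        for i, value in enumerate(pixel):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify_pixel(pixel, means):
    """Assign the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

# Hypothetical two-band reflectance values; real Landsat-8 OLI pixels have more bands.
train_pixels = [(0.05, 0.02), (0.07, 0.04), (0.30, 0.45), (0.34, 0.55)]
train_labels = ["water", "water", "vegetation", "vegetation"]
means = fit_class_means(train_pixels, train_labels)
```

Because MD uses only the class means, it ignores the spread of each class, which is one reason a classifier such as MLC or SVM may separate overlapping land covers better.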


2018 ◽  
Vol 7 (3.20) ◽  
pp. 1
Author(s):  
Mahmoud Sammour ◽  
Burairah Hussin ◽  
Mohd Fairuz Iskandar Othman ◽  
Mohamed Doheir ◽  
Basel AlShaikhdeeb ◽  
...  

One of the significant threats facing the web nowadays is DNS tunneling, an attack that exploits the domain name protocol in order to bypass security gateways. This can lead to the loss of critical information, which is a disastrous situation for many organizations. Recently, researchers have paid more attention to machine learning techniques for dealing with DNS tunneling. The performance of machine learning depends significantly on the features used. However, owing to the lack of a standard benchmark dataset for DNS tunneling, researchers have captured the features of DNS tunneling using different techniques. This paper aims to present a review of the features used for DNS tunneling.
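The abstract does not list the reviewed features, so the following is only a sketch of the kind of lexical features that could be computed from a DNS query name. The feature set (name length, label count, longest label, character entropy of the subdomain, digit ratio) and the function names are assumptions made for this example, not the paper's taxonomy.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    return -sum(c / len(text) * math.log2(c / len(text)) for c in counts.values())

def query_features(qname):
    """Lexical features of one DNS query name (the feature set is illustrative)."""
    name = qname.rstrip(".").lower()
    labels = name.split(".")
    # Assumes the last two labels form the registered domain; real code would
    # consult the Public Suffix List instead.
    subdomain = ".".join(labels[:-2])
    return {
        "length": len(name),
        "label_count": len(labels),
        "longest_label": max(len(label) for label in labels),
        "subdomain_entropy": shannon_entropy(subdomain.replace(".", "")),
        "digit_ratio": sum(ch.isdigit() for ch in name) / len(name) if name else 0.0,
    }
```

A name carrying encoded payload data in its subdomain would typically be longer and have higher character entropy than an ordinary host name such as `www.example.com`, which is why features of this kind are candidates as classifier inputs.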


