Fingerprinting Keywords in Search Queries over Tor

2017 ◽  
Vol 2017 (4) ◽  
pp. 251-270 ◽  
Author(s):  
Se Eun Oh ◽  
Shuai Li ◽  
Nicholas Hopper

Abstract
Search engine queries contain a great deal of private and potentially compromising information about users. One technique to prevent search engines from identifying the source of a query, and Internet service providers (ISPs) from identifying the contents of queries, is to query the search engine over an anonymous network such as Tor. In this paper, we study the extent to which Website Fingerprinting can be extended to fingerprint individual queries or keywords to web applications, a task we call Keyword Fingerprinting (KF). We show that by augmenting traffic analysis using a two-stage approach with new task-specific feature sets, a passive network adversary can in many cases defeat the use of Tor to protect search engine queries. We explore three popular search engines, Google, Bing, and DuckDuckGo, and several machine learning techniques under various experimental scenarios. Our experimental results show that KF can identify Google queries containing one of 300 targeted keywords with 80% recall and 91% precision, while identifying the specific monitored keyword among 300 search keywords with 48% accuracy. We further investigate the factors that contribute to keyword fingerprintability to understand how search engines and users might protect against KF.
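The two-stage approach the abstract describes can be sketched in miniature: a first stage separates monitored traces from background traffic, and a second stage guesses which monitored keyword produced the trace. This is only an illustrative sketch; the feature vectors (toy packet-length statistics) and the nearest-centroid rule are stand-ins for the paper's actual feature sets and classifiers.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class TwoStageKF:
    """Toy two-stage keyword-fingerprinting classifier (nearest centroid)."""

    def fit(self, monitored, background):
        # monitored: {keyword: [feature vectors]}; background: [feature vectors]
        self.keyword_centroids = {k: centroid(v) for k, v in monitored.items()}
        self.background_centroid = centroid(background)
        self.monitored_centroid = centroid(
            [v for vs in monitored.values() for v in vs])

    def predict(self, trace):
        # Stage 1: is this trace closer to background or to monitored traffic?
        if dist(trace, self.background_centroid) < dist(trace, self.monitored_centroid):
            return None  # background traffic, not a monitored query
        # Stage 2: which monitored keyword is the nearest match?
        return min(self.keyword_centroids,
                   key=lambda k: dist(trace, self.keyword_centroids[k]))
```

For example, after fitting on two hypothetical keywords ("flu", "visa") and some background traces, `predict` returns the keyword for a monitored-looking trace and `None` for a background-looking one.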

Author(s):  
RajKishore Sahni

The upsurge in the volume of unwanted emails, called spam, has created an intense need for the development of more dependable and robust antispam filters. Machine learning methods have recently been used to successfully detect and filter spam emails. We present a systematic review of some of the popular machine learning based email spam filtering approaches. Our review covers a survey of the important concepts, attempts, efficiency, and research trends in spam filtering. The preliminary discussion in the study background examines the application of machine learning techniques to the email spam filtering processes of leading internet service providers (ISPs) such as Gmail, Yahoo, and Outlook. We discuss the general email spam filtering process and the various efforts by different researchers to combat spam through machine learning techniques. Our review compares the strengths and drawbacks of existing machine learning approaches and identifies open research problems in spam filtering. We recommend deep learning and deep adversarial learning as future techniques that can effectively handle the menace of spam emails.
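One of the classic machine learning approaches such reviews cover is the multinomial Naive Bayes filter. The following is a minimal from-scratch sketch with Laplace smoothing, using an invented toy training set rather than any real spam corpus:

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Minimal multinomial Naive Bayes spam filter with Laplace smoothing."""

    def fit(self, emails, labels):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(emails, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])

    def predict(self, text):
        scores = {}
        for label in ("spam", "ham"):
            total = sum(self.word_counts[label].values())
            # log prior from class frequencies
            score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
            # add-one-smoothed log likelihood of each word
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) /
                                  (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)
```

Trained on a handful of labelled messages, the filter assigns unseen text to whichever class gives it the higher smoothed log probability.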


10.2196/20995 ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. e20995
Author(s):  
Debbie Rankin ◽  
Michaela Black ◽  
Bronac Flanagan ◽  
Catherine F Hughes ◽  
Adrian Moore ◽  
...  

Background Machine learning techniques, specifically classification algorithms, may be effective to help understand key health, nutritional, and environmental factors associated with cognitive function in aging populations. Objective This study aims to use classification techniques to identify the key patient predictors that are considered most important in the classification of poorer cognitive performance, which is an early risk factor for dementia. Methods Data were used from the Trinity-Ulster and Department of Agriculture study, which included detailed information on sociodemographic, clinical, biochemical, nutritional, and lifestyle factors in 5186 older adults recruited from the Republic of Ireland and Northern Ireland, a proportion of whom (987/5186, 19.03%) were followed up 5-7 years later for reassessment. Cognitive function at both time points was assessed using a battery of tests, including the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), with a score <70 classed as poorer cognitive performance. This study trained 3 classifiers—decision trees, Naïve Bayes, and random forests—to classify the RBANS score and to identify key health, nutritional, and environmental predictors of cognitive performance and cognitive decline over the follow-up period. It assessed their performance, noting the variables deemed most important by each optimized classifier. Results In the classification of a low RBANS score (<70), our models performed well (F1 score range 0.73-0.93), all highlighting the individual’s score from the Timed Up and Go (TUG) test, the age at which the participant stopped education, and whether or not the participant’s family reported memory concerns to be of key importance. 
The classification models performed well in classifying a greater rate of decline in the RBANS score (F1 score range 0.66-0.85), also indicating the TUG score to be of key importance, followed by blood indicators: plasma homocysteine, vitamin B6 biomarker (plasma pyridoxal-5-phosphate), and glycated hemoglobin. Conclusions The results suggest that it may be possible for a health care professional to make an initial evaluation, with a high level of confidence, of the potential for cognitive dysfunction using only a few short, noninvasive questions, thus providing a quick, efficient, and noninvasive way to help them decide whether or not a patient requires a full cognitive evaluation. This approach has the potential benefits of making time and cost savings for health service providers and avoiding stress created through unnecessary cognitive assessments in low-risk patients.
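The study's "few short, noninvasive questions" idea can be illustrated with a one-feature decision stump on a hypothetical TUG time, evaluated with the F1 score the paper reports. The 12-second threshold and the toy data below are invented for illustration, not taken from the study:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

def stump_predict(tug_seconds, threshold=12.0):
    """1 = flag as at risk of poorer cognitive performance (RBANS < 70)."""
    return 1 if tug_seconds > threshold else 0
```

A stump like this is far cruder than the trained decision trees and random forests in the study, but it shows how a single quick measurement can feed a screening decision that is then scored against true outcomes.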


Author(s):  
Rizwan Ur Rahman ◽  
Rishu Verma ◽  
Himani Bansal ◽  
Deepak Singh Tomar

With the explosive expansion of information on the world wide web, search engines are becoming more significant in the day-to-day lives of humans. Even though a search engine generally returns a huge number of results for a given query, the majority of search engine users simply view the first few web pages in result lists. Consequently, ranking position has become a major concern of internet service providers. This article addresses the vulnerabilities, spamming attacks, and countermeasures in blogging sites. In the first part, the article explores the types of spamming and provides a detailed discussion of vulnerabilities. In the next part, an attack scenario for form spamming is presented, along with a defense approach. The aim of this article is thus to provide a review of the vulnerabilities and spamming threats associated with blogging websites, and of effective measures to counter them.
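The article's specific defense approach is not detailed here, but a common, widely used defense against automated form spam is a hidden "honeypot" field, combined with a submission-time check: humans never see the hidden field, while bots fill it in and submit almost instantly. A generic sketch (field names and thresholds are hypothetical):

```python
def is_form_spam(form_data, honeypot_field="website_url", min_seconds=2.0):
    """Flag a form submission as likely bot-generated spam.

    form_data is a dict of submitted fields, assumed to include
    'elapsed_seconds' (time between page load and submit).
    """
    # A hidden field that humans never see was filled in: almost certainly a bot.
    if form_data.get(honeypot_field):
        return True
    # Submitted faster than any human could plausibly fill the form.
    if form_data.get("elapsed_seconds", 0) < min_seconds:
        return True
    return False
```

Server-side checks like this are cheap and invisible to legitimate users, which is why they are often layered with CAPTCHAs and rate limiting rather than used alone.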


Author(s):  
Jaani Riordan

Internet intermediaries are essential features of modern commerce, social and political life, and the dissemination of ideas. Some act as conduits through which our transmissions pass; others are custodians of our personal data and gatekeepers of the world’s knowledge. They supply the infrastructure and tools which make electronic communications possible. These services encompass a vast ecosystem of different entities: internet service providers, website operators, hosts, data centres, social networks, media platforms, search engines, app developers, marketplaces, app stores, and others—many of which are household names.


Author(s):  
Lucas Logan

Intermediary liability is at the center of the debate over free expression, free speech, and an open Internet. The underlying policies form network regulation that governs the extent to which websites, search engines, and Internet service providers that host user content are legally responsible for what their users post or upload. Levels of intermediary liability are commonly categorized as providing broad immunity, limited liability, or strict liability. In the United States, intermediaries are given broad immunity through Section 230 of the Communications Decency Act. In practice, this means that search engines cannot be held liable for the speech of individuals appearing in search results, and a news site is not responsible for what people type in its comment section. Immunity is important to the existence of free expression because it ensures that intermediaries do not have incentives to censor content out of fear of the law. The millions of users continuously generating content through Facebook and YouTube, for instance, would not be able to do so if those intermediaries were fearful of legal consequences due to the actions of any given user. Privacy policy online is most evidently showcased by the European Union’s Right to Be Forgotten policy, which forces search engines to delist an individual’s information that is deemed harmful to reputation. Hateful and harmful speech is also regulated online through intermediary liability, although social media services often decide when and how to remove this type of content based on company policy.


Author(s):  
Leena N ◽  
K. K. Saju

Detection of nutritional deficiencies in plants is vital for improving crop productivity. Timely identification of nutrient deficiency through visual symptoms in the plants can help farmers take quick corrective action by appropriate nutrient management strategies. The application of computer vision and machine learning techniques offers new prospects in non-destructive field-based analysis for nutrient deficiency. Color and shape are important parameters in feature extraction. In this work, two different techniques are used for image segmentation and feature extraction to generate two different feature sets from the same image sets. These are then used for classification using different machine learning techniques. The experimental results are analyzed and compared in terms of classification accuracy to find the best algorithm for the two feature sets.
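The color-based half of such a pipeline can be sketched as a coarse RGB histogram feature followed by nearest-centroid classification. This is a hedged illustration only: pixels are toy RGB tuples standing in for segmented leaf regions, and the class names are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Coarse per-channel RGB histogram as a flat, normalised feature vector."""
    hist = [0.0] * (3 * bins)
    step = 256 / bins
    for r, g, b in pixels:
        hist[int(r // step)] += 1              # red channel bins
        hist[bins + int(g // step)] += 1       # green channel bins
        hist[2 * bins + int(b // step)] += 1   # blue channel bins
    total = len(pixels)
    return [h / total for h in hist]

def classify(feature, centroids):
    """Return the label of the nearest class centroid (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(feature, centroids[label]))
```

With class centroids built from example "healthy" (mostly green) and "nitrogen-deficient" (yellowish) pixel sets, a new leaf region is labelled by whichever histogram it sits closest to; shape features would be appended to the same vector in a fuller pipeline.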

