Data mining methods of healthy indoor climate coefficients for comfortable well-being

Abstract This article describes ‘Smart Monitoring’, a measurement and analysis system currently under development that is used in a scientific project on healthy indoor air coefficients, and explains how the collected data are processed for machine learning algorithms. The goal is to reduce CO2 emissions caused by poor ventilation habits in older buildings after renovation.

Data Mining applied on Web Robots Detection: A Systematic Mapping

10.21528/cbic2021-60 ◽

2021 ◽

Author(s):

Ramon Abilio ◽

Cristiano Garcia ◽

Victor Fernandes

Keyword(s):

Machine Learning ◽

Data Mining ◽

Learning Algorithms ◽

Web Server ◽

Machine Learning Algorithms ◽

Web Pages ◽

Systematic Mapping ◽

Mining Methods ◽

Server Logs ◽

Web Server Logs

Browsing the Internet is part of the world population’s daily routine. The number of web pages is increasing, and so is the amount of content they publish (news, tutorials, images, videos). Search engines use web robots to index web content and offer better results to their users. However, web robots have also been used to exploit vulnerabilities in web pages. Monitoring and detecting web robot accesses is therefore important for keeping a web server as safe as possible. Data mining methods have been applied to web server logs, used as a data source, to detect web robots. The main objective of this work was to find evidence of the definition or use of web robot detection based on analyzing server-side web logs with data mining methods. To that end, we conducted a systematic literature mapping of papers published between 2013 and 2020. We analyzed 34 studies, which allowed us to better understand the area of web robot detection by mapping what is being done, the data used to perform the detection, and the tools and algorithms used in the literature. From those studies, we extracted 33 machine learning algorithms, 64 features, and 13 tools. This study helps researchers find machine learning algorithms, features, and tools for detecting web robots by analyzing web server logs.
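
As an illustration of how web server logs can serve as a data source, the following is a minimal sketch that assumes log lines in Common Log Format. The function name `session_features` and the four features it computes are hypothetical examples chosen for illustration; they are not taken from the 64 features catalogued in the mapping.

```python
import re

# Assumed input: Common Log Format, e.g.
# 203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def session_features(log_lines):
    """Summarise the requests of one client session as a feature dictionary.

    The features are illustrative only; lines that do not match the
    assumed log format are skipped.
    """
    requests = [m.groupdict() for m in map(LOG_PATTERN.match, log_lines) if m]
    total = len(requests)
    if total == 0:
        raise ValueError("no parsable log lines in this session")
    return {
        "requests": total,
        # Well-behaved crawlers usually fetch robots.txt; browsers rarely do.
        "robots_txt_requested": any(r["path"] == "/robots.txt" for r in requests),
        # Share of HEAD requests in the session.
        "head_fraction": sum(r["method"] == "HEAD" for r in requests) / total,
        # Share of requests answered with a 4xx status code.
        "error_4xx_fraction": sum(r["status"].startswith("4") for r in requests) / total,
    }
```

A feature dictionary like this, computed per session, could then be passed to any of the classifiers surveyed in the literature.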

Quality of life control by selected methods of air exchange in a typical apartment building

10.20944/preprints202103.0362.v1 ◽

2021 ◽

Author(s):

Iveta Bullová ◽

Peter Kapalo ◽

Dušan Katunský

Keyword(s):

Air Quality ◽

Indoor Air Quality ◽

Indoor Air ◽

Allergic Diseases ◽

Well Being ◽

Indoor Climate ◽

Apartment Building ◽

Change Rate ◽

Air Change Rate ◽

Negative Effect

Air change rate is an important parameter for quantifying ventilation heat losses, and it also affects the indoor climate of buildings. Indoor air quality is significantly associated with ventilation. If air change is insufficient, trapped allergens, pollutants and irritants can degrade indoor air quality and affect the well-being of a building's occupants. Many studies on ventilation and health have concluded that lower air change rates can have a negative effect on people’s health and that low ventilation may result in an increase in allergic diseases. Quantifying the air change rate is complicated because it is affected by a number of parameters, one of the most variable of which is wind-driven air flow. This study aims to determine and compare values of the air change rate obtained by two methods: quantification of the aerodynamic coefficient Cp = Cpe - Cpi (the so-called aerodynamic quantification of the building), and a methodology based on experimental measurements of carbon dioxide in a selected reference room of an apartment building.
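
The abstract does not say how the carbon dioxide measurements are converted into an air change rate. One common approach is the tracer-gas concentration decay method, which may or may not be the one the authors used. The sketch below assumes a well-mixed room, no CO2 sources during the decay (an unoccupied room) and a constant outdoor concentration; the function names are hypothetical.

```python
import math


def air_change_rate_from_co2_decay(c_start_ppm, c_end_ppm, c_outdoor_ppm, hours):
    """Estimate the air change rate n (1/h) from a CO2 decay.

    Uses n = ln((C_start - C_out) / (C_end - C_out)) / dt, which holds under
    the assumptions of a well-mixed room with no indoor CO2 source.
    """
    excess_start = c_start_ppm - c_outdoor_ppm
    excess_end = c_end_ppm - c_outdoor_ppm
    if hours <= 0:
        raise ValueError("the decay period must be positive")
    if excess_end <= 0 or excess_start <= excess_end:
        raise ValueError("indoor CO2 must stay above outdoor level and decrease")
    return math.log(excess_start / excess_end) / hours


def pressure_coefficient_difference(cpe, cpi):
    """Return Cp = Cpe - Cpi, the difference named in the abstract."""
    return cpe - cpi
```

For example, a drop from 1400 ppm to 900 ppm in one hour with 400 ppm outdoors halves the excess concentration, so the estimate is ln 2, roughly 0.69 air changes per hour.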

Predicting Student Failure in University Examination using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2643.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 956-959

Keyword(s):

Machine Learning ◽

Data Mining ◽

Performance Management ◽

Student Performance ◽

Learning Algorithms ◽

Educational Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Social Characteristics ◽

Student Failure

Student performance management is one of the key pillars of higher education institutions, since it directly affects students' career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques to student data in order to identify students who are more likely to fail university examinations, so that the interventions needed to improve their performance can be provided. The paper uses a data mining approach with 10-fold cross-validation to classify students based on predictors that are demographic and social characteristics of the students. It compares five popular machine learning algorithms (REPTree, JRip, Random Forest, Random Tree, and Naive Bayes) on overall classifier accuracy as well as class-specific indicators, i.e. precision, recall, and F-measure. The results showed that the REPTree algorithm outperformed the other machine learning algorithms in classifying students who are more likely to fail the examinations.
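
The evaluation protocol named here, k-fold cross-validation with class-specific precision, recall and F-measure, can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' implementation; the helper names `k_fold_splits` and `precision_recall_f1` are hypothetical, and the folds are contiguous, so real data would normally be shuffled or stratified first.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold. Folds are contiguous
    blocks; shuffle the data beforehand if its order is meaningful.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F-measure for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Reporting the metrics for the "fail" class rather than overall accuracy alone matters when failing students are a minority, because a classifier can be accurate overall while missing most of them.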

Dr. Phish: Phishing Website Detector

E3S Web of Conferences ◽

10.1051/e3sconf/202129701032 ◽

2021 ◽

Vol 297 ◽

pp. 01032

Author(s):

Harish Kumar ◽

Anshal Prasad ◽

Ninad Rane ◽

Nilay Tamane ◽

Anjali Yeole

Keyword(s):

Machine Learning ◽

Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Cyber Crime ◽

Data Mining Algorithms ◽

Learning Techniques ◽

Mining Algorithms ◽

Host Properties ◽

New Strategies

Phishing is a common attack that tricks credulous people into disclosing their private information. It is a type of cybercrime in which fake sites lure victims into giving up sensitive data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs using machine learning techniques. It discusses detection methods based on lexical features, host properties and page importance properties. We consider various data mining algorithms for evaluating the features in order to better understand the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify phishing URLs by analyzing their lexical and host-based features. A particular challenge in this domain is that criminals are constantly devising new strategies to counter our defense measures. To succeed in this contest, we need machine learning algorithms that continually adapt to new examples and features of phishing URLs.
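
To make "lexical features" concrete, here is a minimal sketch that extracts a few of them from a URL string using only the Python standard library. The feature set and the function name `lexical_url_features` are assumptions made for illustration; the abstract does not list the features the authors actually used, and host-based properties such as domain age would require external lookups that are not shown.

```python
import ipaddress
from urllib.parse import urlsplit


def lexical_url_features(url):
    """Return simple lexical features of a URL (illustrative selection)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        # A raw IP address in place of a domain name is a classic warning sign.
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "host_length": len(host),
        "host_dots": host.count("."),
        "host_hyphens": host.count("-"),
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,
        # Number of non-empty path segments.
        "path_depth": len([segment for segment in parts.path.split("/") if segment]),
        "uses_https": parts.scheme == "https",
    }
```

Because these features depend only on the URL string, they can be computed before the page is ever fetched, which is what makes them attractive for protecting users at click time.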

Lead-based virtual screening and prediction of EGFR inhibitors using PubChem’s database with data mining and machine learning algorithms

10.1021/scimeetings.0c03836 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kedan He

Keyword(s):

Machine Learning ◽

Data Mining ◽

Virtual Screening ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Egfr Inhibitors

Sensor Validation for Indoor Air Quality using Machine Learning

10.5753/eniac.2020.12174 ◽

2020 ◽

Author(s):

Vagner Seibert ◽

Ricardo Araújo ◽

Richard McElligott

Keyword(s):

Machine Learning ◽

Air Quality ◽

Indoor Air Quality ◽

Indoor Air ◽

Nearest Neighbor ◽

Contextual Information ◽

Machine Learning Algorithms ◽

K Nearest Neighbor ◽

Sensor Validation ◽

Single Reading

Guaranteeing high indoor air quality is an increasingly important task. Sensors measure pollutants in the air and allow air quality to be monitored and controlled. However, all sensors are susceptible to failures, either permanent or transitory, that can yield incorrect readings. Automatically detecting such faulty readings is therefore crucial to guaranteeing sensor reliability. In this paper we evaluate three machine learning algorithms applied to the task of classifying a single reading from a sensor as faulty or not, comparing them to standard statistical approaches. We show that all tested machine learning methods (Multi-layer Perceptron, K-Nearest Neighbor and Random Forest) outperform their statistical counterparts, both by allowing better separation boundaries and by allowing the use of contextual information. We further show that this result does not depend on the amount of data, but that ML methods are able to continue to improve as more data is made available.
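
As a rough illustration of classifying a single reading with contextual information, the sketch below applies a plain K-Nearest Neighbor majority vote to feature vectors made of the reading itself plus one context value, for instance a neighbouring sensor's reading. The function name `knn_is_faulty`, the two-feature layout and the assumption that features are already scaled to comparable ranges are all hypothetical; the abstract does not describe the authors' features or settings.

```python
import math
from collections import Counter


def knn_is_faulty(train_points, train_labels, query, k=3):
    """Classify one reading as faulty (True) or not (False) by k-NN vote.

    train_points: feature tuples, e.g. (reading, context value), assumed to
                  be scaled to comparable ranges beforehand.
    train_labels: True for faulty readings, False for valid ones.
    query:        feature tuple of the reading to classify.
    """
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort the labelled readings by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

An odd `k` avoids ties in this two-class setting. The context feature is what lets the vote separate a genuinely high reading, which a neighbouring sensor would confirm, from a faulty one.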

Enhanced Machine Learning and Data Mining Methods for Analysing Large Hybrid Electric Vehicle Fleets based on Load Spectrum Data

10.1007/978-3-658-20367-2 ◽

2018 ◽

Cited By ~ 2

Author(s):

Philipp Bergmeir

Keyword(s):

Machine Learning ◽

Data Mining ◽

Electric Vehicle ◽

Hybrid Electric Vehicle ◽

Load Spectrum ◽

Mining Methods ◽

Hybrid Electric ◽

Spectrum Data

A new classification system for autism based on machine learning of artificial intelligence

Technology and Health Care ◽

10.3233/thc-213032 ◽

2021 ◽

pp. 1-18

Author(s):

Seyed Reza Shahamiri ◽

Fadi Thabtah ◽

Neda Abdelhamid

Keyword(s):

Machine Learning ◽

Scoring Function ◽

Autistic Traits ◽

Well Being ◽

Learning Technologies ◽

Machine Learning Algorithms ◽

The Social ◽

Hidden Patterns ◽

The Individual ◽

Fold Cross Validation

BACKGROUND: Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition that is normally linked with substantial healthcare costs. Typical ASD screening techniques are time-consuming, so the early detection of ASD could reduce such costs and help limit the development of the condition. OBJECTIVE: We propose an automated approach to detect autistic traits that replaces the scoring function used in current ASD screening with a more intelligent and less subjective approach. METHODS: The proposed approach employs deep neural networks (DNNs) to detect hidden patterns from previously labelled cases and controls, then applies the knowledge derived to classify the individual being screened. Specificity, sensitivity, and accuracy of the proposed approach are evaluated using ten-fold cross-validation. A comparative analysis has also been conducted to compare the DNNs’ performance with other prominent machine learning algorithms. RESULTS: Results indicate that deep learning technologies can be embedded within existing ASD screening to assist the stakeholders in the early identification of ASD traits. CONCLUSION: The proposed system will facilitate access to needed support for the social, physical, and educational well-being of the patient and family by making ASD screening more intelligent and accurate.

Machine Learning for Business Analytics

Advances in Data Mining and Database Management - Challenges and Applications of Data Analytics in Social Perspectives ◽

10.4018/978-1-7998-2566-1.ch013 ◽

2021 ◽

pp. 232-256

Author(s):

Kağan Okatan

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Media ◽

Big Data ◽

Machine Learning Algorithms ◽

Decision Makers ◽

Business Analytics ◽

Business Intelligence Systems ◽

Long Time ◽

Rules Of The Game

All these types of analytics have long been answering business questions as the principal methods of investigating data warehouses. Data mining and business intelligence systems in particular help decision makers reach the information they want. Many existing systems are trying to keep up with a phenomenon that has changed the rules of the game in recent years: the undeniable attraction of 'big data'. Evaluating big data, especially the data generated by social media, is among the most current issues in business analytics, and it demonstrates the importance of integrating machine learning into business analytics. This section introduces the prominent machine learning algorithms that are increasingly used for business analytics and emphasizes their application areas.

Prediction of Skin Diseases Using Machine Learning

10.4018/978-1-7998-7888-9.ch008 ◽

2022 ◽

pp. 154-178

Author(s):

Siddhartha Kumar Arjaria ◽

Vikas Raj ◽

Sunil Kumar ◽

Priyanshu Shrivastava ◽

Monu Kumar ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Skin Disease ◽

Skin Diseases ◽

Information Gain ◽

Machine Learning Algorithms ◽

Ensemble Method ◽

Chi Square ◽

Data Mining Techniques ◽

Disease Rates

Skin disease rates have been increasing over the past few decades. Skin diseases have led to both fatal and non-fatal disabilities all around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of a cure. This work therefore compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for the prediction of skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. The work comprises an introduction, a literature review, and the proposed methodology. It proposes a new method of analyzing skin disease in which the six data mining techniques are integrated into a single ensemble method. Applied to the dermatology dataset, the ensemble method achieves 94% accuracy, an improvement over the other classifier algorithms, and is hence more effective in this area.
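
Of the feature-ranking criteria listed, information gain is the simplest to show in a few lines: it is the entropy of the class labels minus the expected entropy after splitting on a feature. The following is a minimal sketch for categorical features. The function names are hypothetical, and the abstract does not say which tooling the authors used for ranking.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        raise ValueError("cannot compute the entropy of an empty sequence")
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Information gain of a categorical feature with respect to the labels."""
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have the same length")
    total = len(labels)
    # Group the class labels by the feature value they occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels once the feature value is known.
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional
```

Ranking features then amounts to sorting them by this score in descending order; a feature that perfectly separates two balanced classes scores 1 bit, and one that is independent of the class scores 0.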
