Retail Site Selection using Machine Learning Algorithms

Selecting a new site for retail business expansion has always been a challenge for decision-makers. It requires not only sales data but also geographic data to identify a suitable location. Proper use of these data can lead to better decision-making. To date, common techniques such as geographic information systems (GIS) and multi-criteria decision making (MCDM) have been applied to site selection. These methods, however, not only require extensive human effort but, more importantly, make it difficult to validate the importance of the identified variables. In this work, sales performance is modeled as a function of geospatial features to determine the suitability of a retail location. The main aim of this study was to identify the features that drive optimal site selection, which in turn facilitate sales prediction for a telecommunication company in Malaysia. Various feature selection techniques and machine learning models were deployed for sales prediction in order to determine the suitability of a new location. The findings show that the top three feature selection methods are the prediction step in VSURF, random search, and fuse learner with search strategy; the top three model families are boosting, random forest, and bagging; and the top three classifiers are C5.0, rf, and parRF. Combinations of the top feature selection methods and classifiers can produce an AUC of more than 0.75. The highest AUC, 0.8354, was obtained with random search and parRF.
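As a rough illustration of how a random-search feature selection can be scored by AUC, the sketch below samples random feature subsets and keeps the one with the highest AUC. It is not the study's pipeline: the toy data, the function names `roc_auc` and `random_search_features`, and the scoring rule (ranking samples by the mean of the selected features instead of training a classifier such as parRF) are all assumptions made for brevity.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_search_features(rows, labels, n_trials=50, seed=0):
    """Sample random feature subsets and return the best (subset, auc) found.

    Hypothetical scoring rule: each sample is ranked by the mean of its
    selected feature values. A real study would fit a classifier instead.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    best_subset, best_auc = None, -1.0
    for _ in range(n_trials):
        k = rng.randint(1, n_features)  # subset size, inclusive bounds
        subset = tuple(sorted(rng.sample(range(n_features), k)))
        scores = [sum(row[j] for j in subset) / len(subset) for row in rows]
        auc = roc_auc(labels, scores)
        if auc > best_auc:
            best_subset, best_auc = subset, auc
    return best_subset, best_auc


# Made-up data: feature 0 separates the classes, features 1 and 2 do not.
rows = [
    [0.9, 0.2, 0.5],
    [0.8, 0.7, 0.1],
    [0.7, 0.4, 0.9],
    [0.3, 0.6, 0.4],
    [0.2, 0.1, 0.8],
    [0.1, 0.9, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]
best_subset, best_auc = random_search_features(rows, labels)
```

On this toy data every subset that reaches the maximum AUC includes feature 0, which is the kind of signal a feature selection step is meant to surface.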

Feature-Selection and Mutual-Clustering Approaches to Improve DoS Detection and Maintain WSNs’ Lifetime

Sensors ◽ 10.3390/s21144821 ◽ 2021 ◽ Vol 21 (14) ◽ pp. 4821

Author(s): Rami Ahmad ◽ Raniyah Wazirali ◽ Qusay Bsoul ◽ Tarik Abu-Ain ◽ Waleed Abu-Ain

Keyword(s): Machine Learning ◽ Feature Selection ◽ Open Field ◽ Network Lifetime ◽ Detection Efficiency ◽ Denial Of Service ◽ Harmony Search ◽ Machine Learning Algorithms ◽ Transport Layer ◽ Feature Selection Techniques

Wireless Sensor Networks (WSNs) continue to face two major challenges: energy and security. As a consequence, one of the WSN-related security tasks is to protect them from Denial of Service (DoS) and Distributed DoS (DDoS) attacks. Machine learning-based systems are the only viable option for these types of attacks, because traditional deep packet inspection systems depend on inspecting open fields in transport layer security packets, and the trend is toward encrypting those open fields. Moreover, network data traffic will become more complex as growing usage increases the amount of data transmitted between WSN nodes. Therefore, there is a need to use feature selection techniques with machine learning in order to determine which data are most important in the DoS detection process. This paper examined techniques for improving DoS anomaly detection while balancing it against power reservation in WSNs. A new clustering technique, called the CH_Rotations algorithm, was introduced to improve anomaly detection efficiency over a WSN's lifetime. Furthermore, the use of feature selection techniques with machine learning algorithms in examining WSN node traffic, and the effect of these techniques on the lifetime of WSNs, was evaluated. The evaluation results showed that Water Cycle (WC) feature selection achieved the best average accuracy, exceeding Particle Swarm Optimization (PSO), Simulated Annealing (SA), Harmony Search (HS), and Genetic Algorithm (GA) by 2%, 5%, 3%, and 3%, respectively. Moreover, WC with a Decision Tree (DT) classifier showed 100% accuracy with only one feature. In addition, the CH_Rotations algorithm improved network lifetime by 30% compared to the standard LEACH protocol. Network lifetime using the WC + DT technique was reduced by 5% compared to scenarios that did not use WC + DT.
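For context on the baseline mentioned above, the following sketch shows the cluster-head election rule of the standard LEACH protocol, in which a node that has not yet served in the current epoch becomes cluster head in round r with probability T = p / (1 - p (r mod 1/p)). It does not reproduce the CH_Rotations algorithm, whose details the abstract does not give; the function names, the node IDs, and the choice p = 0.25 are illustrative assumptions.

```python
import random


def leach_threshold(p, round_no):
    """Standard LEACH threshold for nodes that have not yet served this epoch.

    p is the desired fraction of cluster heads; an epoch lasts 1/p rounds.
    """
    period = round(1 / p)
    return p / (1 - p * (round_no % period))


def run_epoch(node_ids, p, seed=0):
    """Simulate one LEACH epoch and return the cluster heads chosen per round."""
    rng = random.Random(seed)
    period = round(1 / p)
    served = set()  # nodes that were already cluster head in this epoch
    schedule = []
    for r in range(period):
        t = leach_threshold(p, r)
        # Only nodes that have not served yet draw against the threshold.
        heads = [n for n in node_ids if n not in served and rng.random() < t]
        served.update(heads)
        schedule.append(heads)
    return schedule


schedule = run_epoch(list(range(8)), p=0.25)
```

Because the threshold rises to 1 in the last round of an epoch, every node serves as cluster head exactly once per epoch, which spreads the energy cost of the role across the network.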

Network Intrusion Detection with Feature Selection Techniques using Machine-Learning Algorithms

International Journal of Computer Applications ◽ 10.5120/ijca2016910764 ◽ 2016 ◽ Vol 150 (12) ◽ pp. 1-13 ◽ Cited By ~ 4

Author(s): Koushal Kumar ◽ Jaspreet Singh

Keyword(s): Machine Learning ◽ Feature Selection ◽ Intrusion Detection ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Network Intrusion ◽ Feature Selection Techniques

Supervised Machine Learning Algorithms for Bioelectromagnetics: Prediction Models and Feature Selection Techniques Using Data from Weak Radiofrequency Radiation Effect on Human and Animals Cells

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph17124595 ◽ 2020 ◽ Vol 17 (12) ◽ pp. 4595

Author(s): Malka N. Halgamuge

Keyword(s): Machine Learning ◽ Experimental Data ◽ Feature Selection ◽ Exposure Time ◽ New Technologies ◽ Prediction Models ◽ Machine Learning Algorithms ◽ Supervised Machine Learning ◽ Animal Cells ◽ Feature Selection Techniques

The emergence of new technologies to incorporate and analyze data with high-performance computing has expanded our capability to accurately predict any incident. Supervised machine learning (ML) can be utilized for fast and consistent prediction and to better capture the underlying pattern of the data. We develop a prediction strategy, for the first time, using supervised ML to observe the possible impact of weak radiofrequency electromagnetic fields (RF-EMF) on human and animal cells without performing in-vitro laboratory experiments. We extracted laboratory experimental data from 300 peer-reviewed scientific publications (1990–2015) describing 1127 experimental case studies of human and animal cell responses to RF-EMF. We used domain knowledge, Principal Component Analysis (PCA), and the Chi-squared feature selection technique to select six optimal features for computation and cost-efficiency. We then developed grouping or clustering strategies to allocate these selected features into five different laboratory experiment scenarios. The dataset was tested with ten different classifiers, and the outputs were estimated using the k-fold cross-validation method. Assessing a classifier's prediction performance is critical for judging its suitability. Hence, a detailed comparison of the percentage of correct classifications (PCC), Root Mean Squared Error (RMSE), precision, sensitivity (recall), 1 − specificity, Area under the ROC Curve (AUC), and precision-recall (PRC Area) was made for each classification method. Our findings suggest that the Random Forest algorithm excels in all groups on all performance measures and shows AUC = 0.903 where k-fold = 60. A robust correlation was observed between the specific absorption rate (SAR) and frequency, and between the cumulative effect or exposure time and SAR×time (the impact of accumulated SAR within the exposure time) of RF-EMF. In contrast, the relationship between frequency and exposure time was not significant. In the future, with more experimental data, the sample size can be increased, leading to more accurate work.
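Several of the measures listed above follow directly from a binary confusion matrix. The sketch below computes them in plain Python, together with RMSE and a simple k-fold index split. The function names, the interleaved fold assignment, and the example labels are illustrative assumptions and are not taken from the paper.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, and 1 - specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),          # recall
        "one_minus_specificity": _ratio(fp, fp + tn),  # false positive rate
    }


def rmse(y_true, y_prob):
    """Root mean squared error between labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_samples, k):
    """Assign sample indices to k folds in an interleaved fashion."""
    return [list(range(i, n_samples, k)) for i in range(k)]


# Made-up predictions for eight samples.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

In a k-fold evaluation, each fold returned by `k_fold_indices` would serve once as the test set while the remaining folds are used for training, and the metrics would be averaged over folds.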

Benchmark of feature selection techniques with machine learning algorithms for cancer datasets

Proceedings of the International Conference on Artificial Intelligence and Robotics and the International Conference on Automation, Control and Robotics Engineering - ICAIR-CACRE '16 ◽ 10.1145/2952744.2952753 ◽ 2016 ◽ Cited By ~ 1

Author(s): Munirah Mohd Yusof ◽ Rozlini Mohamed ◽ Noorhaniza Wahid

Keyword(s): Machine Learning ◽ Feature Selection ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Feature Selection Techniques

Detecting Network Anomalies using Multilayer Feature Selection Techniques and Machine Learning Algorithms

10.1109/gcat52182.2021.9587542 ◽ 2021

Author(s): Vikrant Singh ◽ Shavik Balyan ◽ Mayank Mathur

Keyword(s): Machine Learning ◽ Feature Selection ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Feature Selection Techniques ◽ Network Anomalies

Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms With Relief and LASSO Feature Selection Techniques

IEEE Access ◽ 10.1109/access.2021.3053759 ◽ 2021 ◽ Vol 9 ◽ pp. 19304-19326

Author(s): Pronab Ghosh ◽ Sami Azam ◽ Mirjam Jonkman ◽ Asif Karim ◽ F. M. Javed Mehedi Shamrat ◽ ...

Keyword(s): Machine Learning ◽ Cardiovascular Disease ◽ Feature Selection ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Efficient Prediction ◽ Feature Selection Techniques

Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v5i9.113121 ◽ 2017 ◽ Vol 5 (9) ◽ Cited By ~ 1

Author(s): Rajwinder Kaur

Keyword(s): Machine Learning ◽ Feature Selection ◽ Sentiment Analysis ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Selection Methods

Text Classification in Clinical Practice Guidelines Using Machine-Learning Assisted Pattern-Based Approach

Applied Sciences ◽ 10.3390/app11083296 ◽ 2021 ◽ Vol 11 (8) ◽ pp. 3296

Author(s): Musarrat Hussain ◽ Jamil Hussain ◽ Taqdir Ali ◽ Syed Imran Ali ◽ Hafiz Syed Muhammad Bilal ◽ ...

Keyword(s): Machine Learning ◽ Decision Making ◽ Clinical Practice ◽ Clinical Practice Guidelines ◽ Practice Guidelines ◽ Machine Learning Algorithms ◽ Nominal Group ◽ Specific Information ◽ Matching Techniques ◽ Disease Specific

Clinical Practice Guidelines (CPGs) aim to optimize patient care by assisting physicians during the decision-making process. However, guideline adherence is strongly affected by the guidelines' unstructured format and by the mixing of background information with disease-specific information. The objective of our study is to extract disease-specific information from CPGs to enhance their adherence ratio. In this research, we propose a semi-automatic mechanism for extracting disease-specific information from CPGs using pattern-matching techniques. We apply supervised and unsupervised machine-learning algorithms to a CPG to extract a list of salient terms that help distinguish recommendation sentences (RS) from non-recommendation sentences (NRS). Simultaneously, a group of experts analyzes the same CPG and extracts the initial patterns, called "Heuristic Patterns", using a group decision-making method, the nominal group technique (NGT). We provide the list of salient terms to the experts and ask them to refine their extracted patterns in light of these terms. The extracted heuristic patterns depend on specific terms and suffer from the specialization problem due to synonymy and polysemy. Therefore, we generalize the heuristic patterns to part-of-speech (POS) patterns and unified medical language system (UMLS) patterns, which makes the proposed method generalize to all types of CPGs. We evaluated the initial extracted patterns on asthma, rhinosinusitis, and hypertension guidelines, obtaining accuracies of 76.92%, 84.63%, and 89.16%, respectively. With the refined, machine-learning-assisted patterns, the accuracy increased to 78.89%, 85.32%, and 92.07%, respectively. Our system assists physicians by locating disease-specific information in the CPGs, which enhances the physicians' performance and reduces CPG processing time. Additionally, it is beneficial for CPG content annotation.
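To make the pattern-matching idea concrete, the sketch below classifies sentences as recommendation or non-recommendation with a few regular expressions and measures accuracy against hand-assigned labels. The patterns, the example sentences, and the names `HEURISTIC_PATTERNS`, `is_recommendation`, and `accuracy` are all hypothetical; the paper's actual heuristic, POS, and UMLS patterns are not given in the abstract.

```python
import re

# Hypothetical surface patterns; real heuristic patterns would come from experts.
HEURISTIC_PATTERNS = [
    re.compile(r"\bshould(?: not)? be\b", re.IGNORECASE),
    re.compile(r"\b(?:is|are) (?:not )?recommended\b", re.IGNORECASE),
    re.compile(r"\bwe (?:recommend|suggest)\b", re.IGNORECASE),
]


def is_recommendation(sentence):
    """Return True if any pattern matches, i.e. the sentence is tagged as RS."""
    return any(pattern.search(sentence) for pattern in HEURISTIC_PATTERNS)


def accuracy(sentences, gold_labels):
    """Fraction of sentences whose predicted tag equals the gold label."""
    correct = sum(
        is_recommendation(sentence) == gold
        for sentence, gold in zip(sentences, gold_labels)
    )
    return correct / len(sentences)


# Made-up sentences for illustration only; not taken from any guideline.
sentences = [
    "Treatment A should be offered to patients in group X.",
    "Condition Y is a chronic disorder affecting many adults.",
    "Routine use of test B is not recommended in low-risk cases.",
    "This guideline was developed by a multidisciplinary panel.",
]
gold_labels = [True, False, True, False]
score = accuracy(sentences, gold_labels)
```

Patterns tied to exact wording like these would miss paraphrases, which is the specialization problem that motivates generalizing to POS and UMLS patterns.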

Effective combining of feature selection techniques for machine learning-enabled IoT intrusion detection

Multimedia Tools and Applications ◽ 10.1007/s11042-021-10567-y ◽ 2021

Author(s): Md Arafatur Rahman ◽ A. Taufiq Asyhari ◽ Ong Wei Wen ◽ Husnul Ajra ◽ Yussuf Ahmed ◽ ...

Keyword(s): Machine Learning ◽ Feature Selection ◽ Intrusion Detection ◽ Feature Selection Techniques

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making ◽ 10.1186/s12911-021-01403-2 ◽ 2021 ◽ Vol 21 (1)

Author(s): Alan Brnabic ◽ Lisa M. Hess

Keyword(s): Machine Learning ◽ Decision Making ◽ Literature Review ◽ Systematic Literature Review ◽ Real World ◽ Learning Algorithms ◽ External Validation ◽ Machine Learning Algorithms ◽ Learning Methods ◽ Machine Learning Methods

Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
