Site Selection of Digital Signage in Beijing: A Combination of Machine Learning and an Empirical Approach

With the extensive use of digital signage, precise site selection is an urgent issue for digital signage enterprises and management agencies. This research aims to provide an accurate digital signage site-selection model that integrates the spatial characteristics of geographical location and multisource factor data and combines empirical location models with machine learning methods to recommend locations for digital signage. The outdoor commercial digital signage within the Sixth Ring Road area in Beijing was selected as an example and was combined with population census, average house prices, social network check-in data, the centrality of traffic networks, and point of interest (POI) facilities data as research data. The data were divided into 100–1000 m grids for digital signage site-selection modelling. The empirical approach of the improved Huff model was used to calculate the spatial accessibility of digital signage, and machine learning approaches such as back propagation neural network (BP neural networks) were used to calculate the potential location of digital signage. The site of digital signage to be deployed was obtained by overlay analysis. The result shows that the proposed method has a higher true positive rate and a lower false positive rate than the other three site selection models, which indicates that this method has higher accuracy for site selection. The site results show that areas suitable for digital signage are mainly distributed in Sanlitun, Wangfujing, Financial Street, Beijing West Railway Station, and along the main road network within the Sixth Ring Road. The research provides a reference for integrating geographical features and content data into the site-selection algorithm. It can effectively improve the accuracy and scientific nature of digital signage layouts and the efficiency of digital signage to a certain extent.
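
As an illustration of the empirical component, the following is a minimal sketch of the classic Huff model, not the improved variant used in the paper, whose details the abstract does not give. The function name, the distance-decay exponent `beta`, and the sample values are assumptions chosen for illustration only.

```python
def huff_probabilities(attractiveness, distances, beta=2.0):
    """Classic Huff model for a single origin.

    P_j = (S_j / d_j**beta) / sum_k (S_k / d_k**beta)
    where S_j is the attractiveness of site j and d_j its distance
    from the origin. `beta` is an assumed distance-decay exponent.
    """
    utilities = [s / (d ** beta) for s, d in zip(attractiveness, distances)]
    total = sum(utilities)
    return [u / total for u in utilities]


# Hypothetical example: three candidate sites seen from one grid cell.
probs = huff_probabilities(
    attractiveness=[100.0, 50.0, 200.0],  # e.g. footfall proxy (assumed)
    distances=[1.0, 0.5, 2.0],            # km from the grid cell (assumed)
)
print([round(p, 3) for p in probs])
```

With these made-up inputs, the nearest site receives the largest share even though it is the least attractive, which is the distance-decay behaviour the model is meant to capture.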

Genome-wide discovery of pre-miRNAs: comparison of recent approaches based on machine learning

Briefings in Bioinformatics ◽

10.1093/bib/bbaa184 ◽

2020 ◽

Cited By ~ 1

Author(s):

Leandro A Bugnon ◽

Cristian Yones ◽

Diego H Milone ◽

Georgina Stegmayer

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Homo Sapiens ◽

False Positive Rate ◽

Class Imbalance ◽

Mirna Precursor ◽

Learning Approaches ◽

Novel Mirna ◽

Genome Wide ◽

Positive Rate

Abstract. Motivation: The genome-wide discovery of microRNAs (miRNAs) involves identifying sequences having the highest chance of being a novel miRNA precursor (pre-miRNA), within all the possible sequences in a complete genome. The known pre-miRNAs are usually just a few in comparison to the millions of candidates that have to be analyzed. This is of particular interest in non-model species and recently sequenced genomes, where the challenge is to find potential pre-miRNAs only from the sequenced genome. The task is unfeasible without the help of computational methods, such as deep learning. However, it is still very difficult to find an accurate predictor, with a low false positive rate in this genome-wide context. Although there are many available tools, these have not been tested in realistic conditions, with sequences from whole genomes and the high class imbalance inherent to such data. Results: In this work, we review six recent methods for tackling this problem with machine learning. We compare the models in five genome-wide datasets: Arabidopsis thaliana, Caenorhabditis elegans, Anopheles gambiae, Drosophila melanogaster, Homo sapiens. The models have been designed for the pre-miRNAs prediction task, where there is a class of interest that is significantly underrepresented (the known pre-miRNAs) with respect to a very large number of unlabeled samples. It was found that for the smaller genomes and smaller imbalances, all methods perform in a similar way. However, for larger datasets such as the H. sapiens genome, it was found that deep learning approaches using raw information from the sequences reached the best scores, achieving low numbers of false positives. Availability: The source code to reproduce these results is in: http://sourceforge.net/projects/sourcesinc/files/gwmirna Additionally, the datasets are freely available in: https://sourceforge.net/projects/sourcesinc/files/mirdata
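
To make the class-imbalance point concrete, here is a small sketch of how precision depends on the fraction of true positives in the data when sensitivity and false positive rate are held fixed. The function name and all numeric values are hypothetical and are not taken from the paper.

```python
def precision_from_rates(sensitivity, false_positive_rate, prevalence):
    """Precision (PPV) implied by fixed rates at a given class prevalence.

    Works on expected fractions of the whole dataset:
    true positives  = sensitivity * prevalence
    false positives = false_positive_rate * (1 - prevalence)
    """
    tp = sensitivity * prevalence
    fp = false_positive_rate * (1.0 - prevalence)
    return tp / (tp + fp)


# Same hypothetical classifier, two hypothetical datasets.
balanced = precision_from_rates(0.90, 0.01, prevalence=0.5)
imbalanced = precision_from_rates(0.90, 0.01, prevalence=0.0001)
print(round(balanced, 3), round(imbalanced, 4))
```

Under these assumed numbers, a 1% false positive rate looks excellent on a balanced benchmark, yet when only one candidate in ten thousand is a true positive, fewer than 1% of the predicted positives are real. This is why evaluation on whole-genome data can differ so much from evaluation on curated sets.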

IoT Botnet Anomaly Detection Using Unsupervised Deep Learning

Electronics ◽

10.3390/electronics10161876 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1876

Author(s):

Ioana Apostol ◽

Marius Preda ◽

Constantin Nila ◽

Ion Bica

Keyword(s):

Deep Learning ◽

Detection System ◽

False Positive Rate ◽

Empirical Evaluation ◽

Learning Approaches ◽

Promising Alternative ◽

Positive Rate ◽

Unsupervised Deep Learning ◽

Attack Surface ◽

Iot Devices

The Internet of Things has become a cutting-edge technology that is continuously evolving in size, connectivity, and applicability. This ecosystem makes its presence felt in every aspect of our lives, along with all other emerging technologies. Unfortunately, despite the significant benefits brought by the IoT, the increased attack surface built upon it has become more critical than ever. Devices have limited resources and are not typically created with security features. Lately, a trend of botnet threats transitioning to the IoT environment has been observed, and an army of infected IoT devices can expand quickly and be used for effective attacks. Therefore, identifying proper solutions for securing IoT systems is currently an important and challenging research topic. Machine learning-based approaches are a promising alternative, allowing the identification of abnormal behaviors and the detection of attacks. This paper proposes an anomaly-based detection solution that uses unsupervised deep learning techniques to identify IoT botnet activities. An empirical evaluation of the proposed method is conducted on both balanced and unbalanced datasets to assess its threat detection capability. False-positive rate reduction and its impact on the detection system are also analyzed. Furthermore, a comparison with other unsupervised learning approaches is included. The experimental results reveal the performance of the proposed detection method.

Identification of newborns at risk for autism using electronic medical records and machine learning

10.1101/19008367 ◽

2019 ◽

Author(s):

Rayees Rahman ◽

Arad Kodesh ◽

Stephen Z Levine ◽

Sven Sandin ◽

Abraham Reichenberg ◽

...

Keyword(s):

Machine Learning ◽

Autism Spectrum Disorder ◽

Positive Predictive Value ◽

Electronic Medical Records ◽

Predictive Value ◽

False Positive ◽

Medical Records ◽

False Positive Rate ◽

Autism Spectrum ◽

Positive Rate

Abstract. Importance: Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, where most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improves developmental course and outcome. Objective: Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth. Design: Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10), and 94,741 non-ASD children born between January 1st, 1997 through December 31st, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD. Main outcomes and measures: Routinely available parental sociodemographic information, medical histories and prescribed medications data until offspring’s birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross validation, by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV). Results: All ML models tested had similar performance, achieving an average C statistics of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset. Conclusion and relevance: ML algorithms combined with EMR capture early life ASD risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children. Key points. Question: Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents? Findings: In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%. Meaning: The results presented serve as a proof-of-principle of the potential utility of EMR for the identification of a large proportion of future children at a high-risk of ASD.

IoT Dataset Validation Using Machine Learning Techniques for Traffic Anomaly Detection

Electronics ◽

10.3390/electronics10222857 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2857

Author(s):

Laura Vigoya ◽

Diego Fernandez ◽

Victor Carneiro ◽

Francisco Nóvoa

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

False Positive Rate ◽

Machine Learning Techniques ◽

Support Vector ◽

High Detection Rate ◽

Security Vulnerabilities ◽

Smart Systems ◽

Learning Techniques ◽

Positive Rate

With advancements in engineering and science, the application of smart systems is increasing, driving faster growth in IoT network traffic. The limited power and computing resources of IoT devices also raise concerns about security vulnerabilities. Machine learning-based techniques have recently gained credibility through their successful application to the detection of network anomalies, including in IoT networks. However, machine learning techniques cannot work without representative data. Given the scarcity of IoT datasets, the DAD emerged as an instrument for understanding the behavior of dedicated IoT-MQTT networks. This paper aims to validate the DAD dataset by applying Logistic Regression, Naive Bayes, Random Forest, AdaBoost, and Support Vector Machine to detect traffic anomalies in IoT. To obtain the best results, techniques for handling unbalanced data, feature selection, and grid search for hyperparameter optimization have been used. The experimental results show that the proposed dataset can achieve a high detection rate in all the experiments, providing the best mean accuracy of 0.99 for the tree-based models, with a low false-positive rate, ensuring effective anomaly detection.

Machine Learning for Automated Polyp Detection in Computed Tomography Colonography

Machine Learning ◽

10.4018/978-1-60960-818-7.ch407 ◽

2012 ◽

pp. 830-850

Author(s):

Abhilash Alexander Miranda ◽

Olivier Caelen ◽

Gianluca Bontempi

Keyword(s):

Machine Learning ◽

Computed Tomography ◽

False Positive ◽

False Positive Rate ◽

Learning Algorithms ◽

Colorectal Polyps ◽

Machine Learning Algorithms ◽

Computed Tomography Colonography ◽

Positive Rate ◽

Independent Features

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC) with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors’ automated CTC scheme introduces two orientation independent features which encode the shape characteristics that aid in classification of polyps and non-polyps with high accuracy, low false positive rate, and low computations making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms viz., lazy learning, support vector machines, and naïve Bayes classifiers reveal the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm while attaining total low false positive rates, respectively, of 3.05, 3.47 and 0.71 per CTC dataset at specificities above 99% when tested on 58 CTC datasets. The results were validated using colonoscopy reports provided by expert radiologists.

Identification of newborns at risk for autism using electronic medical records and machine learning

European Psychiatry ◽

10.1192/j.eurpsy.2020.17 ◽

2020 ◽

Vol 63 (1) ◽

Author(s):

Rayees Rahman ◽

Arad Kodesh ◽

Stephen Z. Levine ◽

Sven Sandin ◽

Abraham Reichenberg ◽

...

Keyword(s):

Machine Learning ◽

General Population ◽

Electronic Medical Records ◽

False Positive ◽

Medical Records ◽

False Positive Rate ◽

Autism Spectrum ◽

Health Maintenance ◽

C Statistic ◽

Positive Rate

Abstract Background. Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improves developmental course and outcome. The aim of the current study was to test the ability of machine learning (ML) models applied to electronic medical records (EMRs) to predict ASD early in life, in a general population sample. Methods. We used EMR data from a single Israeli Health Maintenance Organization, including EMR information for parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. Routinely available parental sociodemographic information, parental medical histories, and prescribed medications data were used to generate features to train various ML algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the area under the receiver operating characteristic curve (AUC; C-statistic), sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value [PPV]). Results. All ML models tested had similar performance. The average performance across all models had C-statistic of 0.709, sensitivity of 29.93%, specificity of 98.18%, accuracy of 95.62%, false positive rate of 1.81%, and PPV of 43.35% for predicting ASD in this dataset. Conclusions. We conclude that ML algorithms combined with EMR capture early life ASD risk as well as reveal previously unknown features to be associated with ASD-risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.

Data Mining Validation of Fluconazole Breakpoints Established by the European Committee on Antimicrobial Susceptibility Testing

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00081-09 ◽

2009 ◽

Vol 53 (7) ◽

pp. 2949-2954 ◽

Cited By ~ 18

Author(s):

Isabel Cuesta ◽

Concha Bielza ◽

Pedro Larrañaga ◽

Manuel Cuenca-Estrella ◽

Fernando Laguna ◽

...

Keyword(s):

Machine Learning ◽

Antimicrobial Susceptibility ◽

Roc Curve ◽

False Positive ◽

Statistical Power ◽

Susceptibility Testing ◽

False Positive Rate ◽

Antimicrobial Susceptibility Testing ◽

European Committee ◽

Positive Rate

ABSTRACT European Committee on Antimicrobial Susceptibility Testing (EUCAST) breakpoints classify Candida strains with a fluconazole MIC ≤ 2 mg/liter as susceptible, those with a fluconazole MIC of 4 mg/liter as representing intermediate susceptibility, and those with a fluconazole MIC > 4 mg/liter as resistant. Machine learning models are supported by complex statistical analyses assessing whether the results have statistical relevance. The aim of this work was to use supervised classification algorithms to analyze the clinical data used to produce EUCAST fluconazole breakpoints. Five supervised classifiers (J48, Classification and Regression Trees [CART], OneR, Naïve Bayes, and Simple Logistic) were used to analyze two cohorts of patients with oropharyngeal candidosis and candidemia. The target variable was the outcome of the infections, and the predictor variables consisted of values for the MIC or the proportion between the dose administered and the MIC of the isolate (dose/MIC). Statistical power was assessed by determining values for sensitivity and specificity, the false-positive rate, the area under the receiver operating characteristic (ROC) curve, and the Matthews correlation coefficient (MCC). CART obtained the best statistical power for a MIC > 4 mg/liter for detecting failures (sensitivity, 87%; false-positive rate, 8%; area under the ROC curve, 0.89; MCC index, 0.80). For dose/MIC determinations, the target was >75, with a sensitivity of 91%, a false-positive rate of 10%, an area under the ROC curve of 0.90, and an MCC index of 0.80. Other classifiers gave similar breakpoints with lower statistical power. EUCAST fluconazole breakpoints have been validated by means of machine learning methods. These computer tools must be incorporated in the process for developing breakpoints to avoid researcher bias, thus enhancing the statistical power of the model.

Dear Watch, Should I get a COVID Test? Designing deployable machine learning for wearables

10.21203/rs.3.rs-505984/v1 ◽

2021 ◽

Author(s):

Anna Goldenberg ◽

Bret Nestor ◽

Jaryd Hunter ◽

Raghu Kainkaryam ◽

Erik Drysdale ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Real World ◽

False Positive Rate ◽

Wearable Devices ◽

Classification Performance ◽

Wearable Device ◽

Screening Tools ◽

Machine Learning Model ◽

Positive Rate

Abstract. Commercial wearable devices are surfacing as an appealing mechanism to detect COVID-19 and potentially other public health threats, due to their widespread use. To assess the validity of wearable devices as population health screening tools, it is essential to evaluate predictive methodologies based on wearable devices by mimicking their real-world deployment. Several points must be addressed to transition from statistically significant differences between infected and uninfected cohorts to COVID-19 inferences on individuals. We demonstrate the strengths and shortcomings of existing approaches on a cohort of 32,198 individuals who experience influenza-like illness (ILI), 204 of whom report testing positive for COVID-19. We show that, despite commonly made design mistakes resulting in overestimation of performance, when properly designed, wearables can be effectively used as a part of the detection pipeline. For example, knowing the week of year, combined with naive randomised test set generation, leads to substantial overestimation of COVID-19 classification performance at 0.73 AUROC. However, an average AUROC of only 0.55 +/- 0.02 would be attainable in a simulation of real-world deployment, due to the shifting prevalence of COVID-19 and non-COVID-19 ILI to trigger further testing. In this work we show how to train a machine learning model to differentiate ILI days from healthy days, followed by a survey to differentiate COVID-19 from influenza and unspecified ILI based on symptoms. In a forthcoming week, models can expect a sensitivity of 0.50 (0-0.74, 95% CI), while utilising the wearable device to reduce the burden of surveys by 35%. The corresponding false positive rate is 0.22 (0.02-0.47, 95% CI). In the future, serious consideration must be given to the design, evaluation, and reporting of wearable device interventions if they are to be relied upon as part of frequent COVID-19 or other public health threat testing infrastructures.

CRISPRIdentify: Identification of CRISPR arrays using machine learning approach

10.1101/2020.11.05.369512 ◽

2020 ◽

Cited By ~ 1

Author(s):

Alexander Mitrofanov ◽

Omer S. Alkhnbashi ◽

Sergey A. Shmakov ◽

Kira S. Makarova ◽

Eugene V. Koonin ◽

...

Keyword(s):

Machine Learning ◽

Dna Sequences ◽

False Positive Rate ◽

Scoring Function ◽

Protein S ◽

Genomic Region ◽

Immune Functions ◽

Crispr Array ◽

Positive Rate ◽

Data Driven Approach

CRISPR-Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR-Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRIdentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false-positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.

Machine Learning Based Technique for Detection of Rank Attack in RPL based Internet of Things Networks

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i3044.0789s319 ◽

2019 ◽

Vol 8 (9S3) ◽

pp. 244-248

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Nearest Neighbor ◽

False Positive Rate ◽

Security And Privacy ◽

Weak Links ◽

K Nearest Neighbor ◽

Detection Mechanism ◽

Wormhole Attack ◽

Positive Rate

The Internet of Things (IoT) is a new paradigm in network technology. It has applications in almost every field, including retail, industry, and healthcare. Its challenges include security and privacy, robustness, weak links, and limited power. The major challenge among these is security. Because of their weak connectivity links, Internet of Things networks are vulnerable to many attacks at the network layer. RPL is a routing protocol that establishes paths specifically for the constrained nodes in Internet of Things based networks. RPL-based networks are exposed to many attacks, such as the black hole attack, wormhole attack, sinkhole attack, and rank attack. This paper proposes a machine learning detection technique for the rank attack, called MLTKNN, which is based on the K-nearest neighbor algorithm. The proposed technique was simulated in the Cooja simulator with 30 motes, and the true positive rate and false positive rate of the detection mechanism were calculated. The results show that the proposed technique performs efficiently in terms of delay, packet delivery rate, and detection of the rank attack.
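
For readers unfamiliar with the underlying classifier, the following is a minimal sketch of plain K-nearest-neighbor classification followed by a true/false positive rate calculation. It is not the MLTKNN technique itself, whose details the abstract does not give; the feature choice, the sample values, and all names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical features per node: (rank changes per minute, control-message rate).
train = [
    ((0.1, 1.0), "normal"), ((0.2, 1.2), "normal"), ((0.0, 0.9), "normal"),
    ((2.5, 4.0), "attack"), ((3.0, 3.5), "attack"), ((2.8, 4.2), "attack"),
]
test = [((0.15, 1.1), "normal"), ((2.7, 3.9), "attack")]

predictions = [knn_predict(train, x) for x, _ in test]

# True positive rate and false positive rate with "attack" as the positive class.
tp = sum(p == "attack" and y == "attack" for p, (_, y) in zip(predictions, test))
fp = sum(p == "attack" and y == "normal" for p, (_, y) in zip(predictions, test))
tpr = tp / sum(y == "attack" for _, y in test)
fpr = fp / sum(y == "normal" for _, y in test)
print(predictions, tpr, fpr)
```

An odd `k` avoids tied votes in the two-class case, which is why 3 is used as the assumed default here.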
