Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach

Today, cloud systems provide many key services to development and production environments; reliable storage services are crucial for a multitude of applications ranging from commercial manufacturing, distribution and sales up to scientific research, which is often at the forefront of computing resource demands. In large-scale computer centers, the storage system requires particular attention and investment; usually, a large number of diverse storage devices need to be deployed in order to match the varying performance and volume requirements of changing user applications. As of today, magnetic drives still play a dominant role in terms of deployed storage volume and of service outages due to device failure. In this paper, we study methods to facilitate automated proactive disk replacement. We propose a method to identify disks with media failures in a production environment and describe an application of supervised machine learning to predict disk failures. In particular, we present a stage that automatically labels disks as healthy or at-risk during training and validation, along with a tuning strategy to optimize the hyperparameters of the associated machine learning classifier. The approach is trained and validated against a large set of 65,000 hard drives in the CERN computer center, and the achieved results are discussed.
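The abstract does not state the rule used to label disks, so the following is only a minimal sketch of what an automated healthy/at-risk labeling step could look like. It assumes a disk observation counts as at-risk when the disk is known to have failed within a fixed horizon after that observation; the function name `label_observation` and the 30-day default horizon are hypothetical and not taken from the paper.

```python
from datetime import date


def label_observation(observed_on, failed_on, horizon_days=30):
    """Label one disk observation as "healthy" or "at-risk".

    Assumption (not from the paper): an observation is at-risk when the
    disk failed no more than `horizon_days` days after it was taken.
    `failed_on` is None for disks that never failed.
    """
    if failed_on is None:
        return "healthy"
    if observed_on > failed_on:
        raise ValueError("observation is dated after the recorded failure")
    days_to_failure = (failed_on - observed_on).days
    return "at-risk" if days_to_failure <= horizon_days else "healthy"


# Hypothetical observations for three disks.
never_failed = label_observation(date(2020, 1, 1), None)
fails_soon = label_observation(date(2020, 1, 1), date(2020, 1, 15))
fails_later = label_observation(date(2020, 1, 1), date(2020, 6, 1))
```

Labels produced this way could then feed any supervised classifier; the horizon would itself be a parameter to tune.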

Adsorption Isotherm Predictions for Multiple Molecules in MOFs Using the Same Deep Learning Model

10.26434/chemrxiv.9894224.v1 ◽

2019 ◽

Author(s):

Ryther Anderson ◽

Achay Biong ◽

Diego Gómez-Gualdrón

Keyword(s):

Neural Network ◽

Machine Learning ◽

Molecular Simulation ◽

Large Scale ◽

Learning Model ◽

Operating Conditions ◽

Small Subset ◽

Screening Methods ◽

Large Set ◽

Metal Organic

Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations, namely argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF descriptors (e.g. geometric properties and chemical moieties) and adsorbate descriptors (e.g. force field parameters and geometry). Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.

Mobile Agent-Based Computing Resource and Usage Monitoring at Large Scale Computer Centers

Volume 3: 2011 ASME/IEEE International Conference on Mechatronic and Embedded Systems and Applications, Parts A and B ◽

10.1115/detc2011-48699 ◽

2011 ◽

Cited By ~ 1

Author(s):

Zhixin Tie ◽

David Ko ◽

Harry H. Cheng

Keyword(s):

Monitoring System ◽

Mobile Agent ◽

Computer Games ◽

Data Exchange ◽

Large Scale ◽

Agent Based ◽

Computer Center ◽

Control Command ◽

Computer Centers ◽

Computer Resources

Mobile agent technology has become an important approach for the design and development of distributed systems. However, there is little research regarding the monitoring of computer resources and usage at large-scale distributed computer centers. This paper presents a mobile agent-based system called the Mobile Agent Based Computer Monitoring System (MABCMS) that supports the dynamic sending and executing of control commands, dynamic data exchange, and dynamic deployment of mobile code in C/C++. Based on the Mobile-C library, agents can call low-level functions in binary dynamic or static libraries, and thus can monitor computer resources and usage conveniently and efficiently. Two experimental applications have been designed using the MABCMS. The experiments were conducted in a university computer center with hundreds of computer workstations and 15 server machines. The first experiment uses the MABCMS to detect improper usage of the computer workstations, such as playing computer games. The second experimental application uses the MABCMS to detect system resources such as available hard disk space. The experiments show that the mobile agent-based monitoring system is an effective method for detecting and interacting with students playing computer games and a practical way to monitor computer resources in large-scale distributed computer centers.

Nonprofit Role Classification Using Mission Descriptions and Supervised Machine Learning

Nonprofit and Voluntary Sector Quarterly ◽

10.1177/08997640211057393 ◽

2021 ◽

pp. 089976402110573

Author(s):

Megan LePere-Schloop

Keyword(s):

Machine Learning ◽

Geographic Variation ◽

Mission Statements ◽

Research Note ◽

Supervised Machine Learning ◽

Future Research ◽

Large Set ◽

Large Sample ◽

Qualitative Approaches

Scholars have used both quantitative and qualitative approaches to empirically study nonprofit roles. Mission statements and program descriptions often reflect such roles; until recently, however, collecting and classifying a large sample was labor-intensive. This research note uses data on United Ways that e-filed their 990 forms and supervised machine learning to illustrate an approach for classifying a large set of mission descriptions by roles. Temporal and geographic variation in roles detected in mission statements suggests that such an approach may be fruitful in future research.

A Machine Learning Pipeline for Demand Response Capacity Scheduling

Energies ◽

10.3390/en13071848 ◽

2020 ◽

Vol 13 (7) ◽

pp. 1848 ◽

Cited By ~ 1

Author(s):

Gautham Krishnadas ◽

Aristides Kiprakis

Keyword(s):

Machine Learning ◽

Energy Balance ◽

Smart Grid ◽

Demand Response ◽

Large Scale ◽

Performance Metrics ◽

Supervised Machine Learning ◽

Algorithm Selection ◽

Load Forecast ◽

Forecast Models

Demand response (DR) is an integral component of smart grid operations that offers the necessary flexibility to support its decarbonisation. In incentive-based DR programs, deviations from the scheduled DR capacity affect the grid’s energy balance and result in revenue losses for the DR participants. This issue worsens with increasing DR delivery from participants such as large consumer buildings, which have few standard methods to follow for DR capacity scheduling. The DR capacity available from such consumers through load curtailment can be forecast reliably with the help of supervised machine learning (ML) models. This study demonstrates the development of data-driven ML-based total and flexible load forecast models for a retail building. The ML model development tasks such as data pre-processing, training-testing dataset preparation, cross-validation, algorithm selection, hyperparameter optimisation, feature ranking, model selection and model evaluation are guided by deployment-centric design criteria such as reliability, computational efficiency and scalability. Based on the selected performance metrics, the day-ahead and week-ahead ML-based load forecast models developed for the retail building are shown to outperform the timeseries persistence models used for benchmarking. Furthermore, the deployment of these models for DR capacity scheduling is proposed as an ML pipeline that can be realised with the help of ML workflows, computational resources as well as systems for monitoring and visualisation. The ML pipeline ensures faster, cost-effective and large-scale deployment of forecast models that support reliable DR capacity scheduling without affecting the grid’s energy balance. Minimisation of revenue losses encourages increased DR participation from large consumer buildings, ensuring further flexibility in the smart grid.
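The persistence benchmark mentioned in this abstract can be illustrated with a short sketch. It assumes the common definition of a persistence model (the forecast for a time step is the value observed one full period earlier) and uses mean absolute error as the metric; the abstract does not say which metrics or sampling interval the study used, so the data, the lag of 4 steps per "day", and the function names are hypothetical.

```python
def persistence_forecast(load, lag):
    """Forecast each value of load[lag:] as the observation `lag` steps earlier."""
    if lag <= 0 or lag >= len(load):
        raise ValueError("lag must be between 1 and len(load) - 1")
    return load[:-lag]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical load readings in kW, four readings per "day" for brevity.
load = [10.0, 20.0, 30.0, 20.0, 12.0, 22.0, 27.0, 20.0]
lag = 4  # one "day" of readings

baseline = persistence_forecast(load, lag)  # predictions for the second day
actual = load[lag:]                         # what was actually observed
baseline_mae = mean_absolute_error(actual, baseline)
print(baseline_mae)  # 1.75
```

An ML forecast model would be judged against this number: it adds value only if its error on the same period is lower than the baseline's.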

A deep learning and novelty detection framework for rapid phenotyping in high-content screening

10.1101/134627 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christoph Sommer ◽

Rudolf Hoefler ◽

Matthias Samwer ◽

Daniel W. Gerlich

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Novelty Detection ◽

A Priori ◽

Mitotic Cell ◽

Supervised Machine Learning ◽

High Content Screening ◽

Data Sets ◽

User Training

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.

Learning from the 2018 Western Japan Heavy Rains to Detect Floods during the 2019 Hagibis Typhoon

Remote Sensing ◽

10.3390/rs12142244 ◽

2020 ◽

Vol 12 (14) ◽

pp. 2244

Author(s):

Luis Moya ◽

Erick Mas ◽

Shunichi Koshimura

Keyword(s):

Machine Learning ◽

Real Time ◽

Local Governments ◽

Large Scale ◽

Damage Identification ◽

Remote Sensing Data ◽

Early Response ◽

Training Data ◽

Supervised Machine Learning ◽

A Current

Applications of machine learning to remote sensing data appear to be endless. Their use in damage identification for early response after a large-scale disaster, however, faces a specific problem: collecting training data right after a disaster is costly, time-consuming, and often impossible. This study analyzes a possible solution: collecting training data from past disaster events to calibrate a discriminant function, so that affected areas in a current disaster can be identified in near real time. This paper reports the performance of a supervised machine learning classifier trained on data collected from the 2018 heavy rainfall in Okayama Prefecture, Japan, and used to identify floods caused by Typhoon Hagibis on 12 October 2019 in eastern Japan. The results show a moderate agreement with flood maps provided by local governments and public institutions, and support the assumption that previous disaster information can be used to identify a current disaster in near real time.

Supervised machine learning for diagnostic classification from large-scale neuroimaging datasets

Brain Imaging and Behavior ◽

10.1007/s11682-019-00191-8 ◽

2019 ◽

Vol 14 (6) ◽

pp. 2378-2416 ◽

Cited By ~ 5

Author(s):

Pradyumna Lanka ◽

D Rangaprakash ◽

Michael N. Dretsch ◽

Jeffrey S. Katz ◽

Thomas S. Denney ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Supervised Machine Learning ◽

Diagnostic Classification

Utilizing supervised machine learning to identify microglia and astrocytes in situ: implications for large-scale image analysis and quantification

Journal of Neuroscience Methods ◽

10.1016/j.jneumeth.2019.108424 ◽

2019 ◽

Vol 328 ◽

pp. 108424

Author(s):

M Liu ◽

J Ylanko ◽

E Weekman ◽

T Beckett ◽

D Andrews ◽

...

Keyword(s):

Machine Learning ◽

Image Analysis ◽

Large Scale ◽

Supervised Machine Learning

Characterizing and Identifying the Prevalence of Web-Based Misinformation Relating to Medication for Opioid Use Disorder: Machine Learning Approach

Journal of Medical Internet Research ◽

10.2196/30753 ◽

2021 ◽

Vol 23 (12) ◽

pp. e30753

Author(s):

Mai ElSherief ◽

Steven A Sumner ◽

Christopher M Jones ◽

Royal K Law ◽

Akadia Kacha-Ochana ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Large Scale ◽

Addiction Treatment ◽

Opioid Use Disorder ◽

Supervised Machine Learning ◽

Computational Techniques ◽

Opioid Use ◽

Web Based ◽

Health Communities

Background: Expanding access to and use of medication for opioid use disorder (MOUD) is a key component of overdose prevention. An important barrier to the uptake of MOUD is exposure to inaccurate and potentially harmful health misinformation on social media or web-based forums where individuals commonly seek information. There is a significant need to devise computational techniques to describe the prevalence of web-based health misinformation related to MOUD to facilitate mitigation efforts. Objective: By adopting a multidisciplinary, mixed methods strategy, this paper aims to present machine learning and natural language analysis approaches to identify the characteristics and prevalence of web-based misinformation related to MOUD to inform future prevention, treatment, and response efforts. Methods: The team harnessed public social media posts and comments in the English language from Twitter (6,365,245 posts), YouTube (99,386 posts), Reddit (13,483,419 posts), and Drugs-Forum (5549 posts). Leveraging public health expert annotations on a sample of 2400 of these social media posts that were found to be semantically most similar to a variety of prevailing opioid use disorder–related myths based on representational learning, the team developed a supervised machine learning classifier. This classifier identified whether a post’s language promoted one of the leading myths challenging addiction treatment: that the use of agonist therapy for MOUD is simply replacing one drug with another. Platform-level prevalence was calculated thereafter by machine labeling all unannotated posts with the classifier and noting the proportion of myth-indicative posts over all posts. Results: Our results demonstrate promise in identifying social media postings that center on treatment myths about opioid use disorder with an accuracy of 91% and an area under the curve of 0.9, including how these discussions vary across platforms in terms of prevalence and linguistic characteristics, with the lowest prevalence on web-based health communities such as Reddit and Drugs-Forum and the highest on Twitter. Specifically, the prevalence of the stated MOUD myth ranged from 0.4% on web-based health communities to 0.9% on Twitter. Conclusions: This work provides one of the first large-scale assessments of a key MOUD-related myth across multiple social media platforms and highlights the feasibility and importance of ongoing assessment of health misinformation related to addiction treatment.
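The platform-level prevalence described in the methods is a simple proportion: myth-indicative posts divided by all posts. A minimal sketch of that arithmetic follows; the function name `prevalence` and the example labels are hypothetical and stand in for classifier output, they are not data from the study.

```python
def prevalence(labels):
    """Return the share of posts that the classifier flagged as myth-indicative."""
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one labeled post")
    flagged = sum(1 for is_myth in labels if is_myth)
    return flagged / len(labels)


# Hypothetical classifier output for 1,000 posts on one platform: 9 flagged.
example_labels = [True] * 9 + [False] * 991
platform_prevalence = prevalence(example_labels)
print(f"{platform_prevalence:.1%}")  # 0.9%
```

Because the classifier itself is imperfect, a proportion computed this way inherits its false positives and false negatives, which matters when the true prevalence is below 1%.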

Machine-learning for cluster analysis of localization microscopy data.

10.1101/505719 ◽

2018 ◽

Author(s):

David J Williamson ◽

Garth L Burn ◽

Juliette Griffie ◽

Daniel M Davis ◽

Dylan M Owen

Keyword(s):

Machine Learning ◽

Cluster Analysis ◽

Single Molecule ◽

Large Scale ◽

Supervised Machine Learning ◽

Point Pattern ◽

Data Set ◽

Localization Microscopy ◽

Human T Cell ◽

Microscopy Data

Quantifying the clustering of points within single-molecule localization microscopy data is useful for understanding the spatial relationships of the molecules in the underlying sample. The conversion of point pattern data into a meaningful description of clustering is difficult, especially for biologically derived data, as the definitions of clustering are often subjective or simplistic. Many existing computational approaches are also limited in their ability to process large-scale data-sets or to deal effectively with inhomogeneities in clustering. Here we have developed a supervised machine-learning approach to cluster analysis which is fast and accurate. Trained on a variety of simulated clustered data, the network can then classify all points from a typical localization microscopy data-set (several million points from the entire field of view) as being either clustered or not-clustered, with the potential to include additional classifiers to describe different types of clusters. Clustered points can then be further refined into like-clusters for the measurement of cluster area, shape, and point-density. We demonstrate the performance on simulated data and experimental data of the kinase Csk and the adaptor PAG in both naive and pre-stimulated primary human T cell synapses.
