Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach

2021 ◽  
Vol 11 (18) ◽  
pp. 8293
Author(s):  
Federico Gargiulo ◽  
Dirk Duellmann ◽  
Pasquale Arpaia ◽  
Rosario Schiano Lo Moriello

Today, cloud systems provide many key services to development and production environments; reliable storage services are crucial for a multitude of applications ranging from commercial manufacturing, distribution and sales up to scientific research, which is often at the forefront of computing resource demands. In large-scale computer centers, the storage system requires particular attention and investment; usually, a large number of diverse storage devices need to be deployed to match the varying performance and volume requirements of changing user applications. As of today, magnetic drives still play a dominant role in terms of deployed storage volume and of service outages due to device failure. In this paper, we study methods to facilitate automated proactive disk replacement. We propose a method to identify disks with media failures in a production environment and describe an application of supervised machine learning to predict disk failures. In particular, we present a procedure that automatically labels disks as healthy or at-risk during the training and validation stages, along with a tuning strategy to optimize the hyperparameters of the associated machine learning classifier. The approach is trained and validated on a set of 65,000 hard drives in the CERN computer center, and the achieved results are discussed.

2019 ◽  
Author(s):  
Ryther Anderson ◽  
Achay Biong ◽  
Diego Gómez-Gualdrón

Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations, namely argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF descriptors (e.g., geometric properties and chemical moieties) and adsorbate descriptors (e.g., force field parameters and geometry). Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.


Author(s):  
Zhixin Tie ◽  
David Ko ◽  
Harry H. Cheng

Mobile agent technology has become an important approach for the design and development of distributed systems. However, there is little research regarding the monitoring of computer resources and usage at large-scale distributed computer centers. This paper presents a mobile agent-based system called the Mobile Agent Based Computer Monitoring System (MABCMS) that supports the dynamic sending and executing of control commands, dynamic data exchange, and dynamic deployment of mobile code in C/C++. Based on the Mobile-C library, agents can call low-level functions in binary dynamic or static libraries, and thus can monitor computer resources and usage conveniently and efficiently. Two experimental applications have been designed using the MABCMS. The experiments were conducted in a university computer center with hundreds of computer workstations and 15 server machines. The first experiment uses the MABCMS to detect improper usage of the computer workstations, such as playing computer games. The second experimental application uses the MABCMS to monitor system resources such as available hard disk space. The experiments show that the mobile agent-based monitoring system is an effective method for detecting and interacting with students playing computer games and a practical way to monitor computer resources in large-scale distributed computer centers.


2021 ◽  
pp. 089976402110573
Author(s):  
Megan LePere-Schloop

Scholars have used both quantitative and qualitative approaches to empirically study nonprofit roles. Mission statements and program descriptions often reflect such roles; however, until recently, collecting and classifying a large sample has been labor-intensive. This research note uses data on United Ways that e-filed their 990 forms and supervised machine learning to illustrate an approach for classifying a large set of mission descriptions by roles. Temporal and geographic variation in roles detected in mission statements suggests that such an approach may be fruitful in future research.


Energies ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 1848 ◽  
Author(s):  
Gautham Krishnadas ◽  
Aristides Kiprakis

Demand response (DR) is an integral component of smart grid operations that offers the necessary flexibility to support its decarbonisation. In incentive-based DR programs, deviations from the scheduled DR capacity affect the grid’s energy balance and result in revenue losses for the DR participants. This issue worsens with increasing DR delivery from participants such as large consumer buildings that have limited standard methods to follow for DR capacity scheduling. Load-curtailment-based DR capacity availability from such consumers can be forecasted reliably with the help of supervised machine learning (ML) models. This study demonstrates the development of data-driven, ML-based total and flexible load forecast models for a retail building. The ML model development tasks such as data pre-processing, training-testing dataset preparation, cross-validation, algorithm selection, hyperparameter optimisation, feature ranking, model selection and model evaluation are guided by deployment-centric design criteria such as reliability, computational efficiency and scalability. Based on the selected performance metrics, the day-ahead and week-ahead ML-based load forecast models developed for the retail building are shown to outperform the time-series persistence models used for benchmarking. Furthermore, the deployment of these models for DR capacity scheduling is proposed as an ML pipeline that can be realised with the help of ML workflows, computational resources as well as systems for monitoring and visualisation. The ML pipeline ensures faster, cost-effective and large-scale deployment of forecast models that support reliable DR capacity scheduling without affecting the grid’s energy balance. Minimisation of revenue losses encourages increased DR participation from large consumer buildings, ensuring further flexibility in the smart grid.
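The persistence benchmark mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the lag, the error metric, or any data, so the function names, the toy readings, and the use of mean absolute error below are all assumptions made for illustration only.

```python
def persistence_forecast(series, horizon):
    """Forecast each point as the value observed `horizon` steps earlier."""
    if horizon <= 0 or horizon >= len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    # The forecast for series[t] is series[t - horizon], for every t >= horizon.
    return list(series[:-horizon])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical load readings in kW (not from the paper); a short horizon
# keeps the example small. A day-ahead baseline on hourly data would use 24.
load_kw = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
horizon = 2

baseline = persistence_forecast(load_kw, horizon)
baseline_mae = mean_absolute_error(load_kw[horizon:], baseline)
```

An ML forecast would be scored against the same `load_kw[horizon:]` targets, and it is useful only if its error falls below `baseline_mae`.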


2017 ◽  
Author(s):  
Christoph Sommer ◽  
Rudolf Hoefler ◽  
Matthias Samwer ◽  
Daniel W. Gerlich

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.


2020 ◽  
Vol 12 (14) ◽  
pp. 2244
Author(s):  
Luis Moya ◽  
Erick Mas ◽  
Shunichi Koshimura

Applications of machine learning to remote sensing data appear to be endless. Its use in damage identification for early response in the aftermath of a large-scale disaster, however, faces a specific issue: the collection of training data right after a disaster is costly, time-consuming, and often impossible. This study analyzes a possible solution to this issue: the collection of training data from past disaster events to calibrate a discriminant function. The identification of affected areas in a current disaster can then be performed in near real-time. This paper reports the performance of a supervised machine learning classifier that learns from training data collected from the 2018 heavy rainfall in Okayama Prefecture, Japan, and identifies floods due to Typhoon Hagibis on 12 October 2019 in eastern Japan. The results show a moderate agreement with flood maps provided by local governments and public institutions, and support the assumption that previous disaster information can be used to identify a current disaster in near real-time.


2019 ◽  
Vol 14 (6) ◽  
pp. 2378-2416 ◽  
Author(s):  
Pradyumna Lanka ◽  
D Rangaprakash ◽  
Michael N. Dretsch ◽  
Jeffrey S. Katz ◽  
Thomas S. Denney ◽  
...  

10.2196/30753 ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. e30753
Author(s):  
Mai ElSherief ◽  
Steven A Sumner ◽  
Christopher M Jones ◽  
Royal K Law ◽  
Akadia Kacha-Ochana ◽  
...  

Background: Expanding access to and use of medication for opioid use disorder (MOUD) is a key component of overdose prevention. An important barrier to the uptake of MOUD is exposure to inaccurate and potentially harmful health misinformation on social media or web-based forums where individuals commonly seek information. There is a significant need to devise computational techniques to describe the prevalence of web-based health misinformation related to MOUD to facilitate mitigation efforts. Objective: By adopting a multidisciplinary, mixed methods strategy, this paper aims to present machine learning and natural language analysis approaches to identify the characteristics and prevalence of web-based misinformation related to MOUD to inform future prevention, treatment, and response efforts. Methods: The team harnessed public social media posts and comments in the English language from Twitter (6,365,245 posts), YouTube (99,386 posts), Reddit (13,483,419 posts), and Drugs-Forum (5549 posts). Leveraging public health expert annotations on a sample of 2400 of these social media posts that were found to be semantically most similar to a variety of prevailing opioid use disorder–related myths based on representational learning, the team developed a supervised machine learning classifier. This classifier identified whether a post’s language promoted one of the leading myths challenging addiction treatment: that the use of agonist therapy for MOUD is simply replacing one drug with another. Platform-level prevalence was calculated thereafter by machine labeling all unannotated posts with the classifier and noting the proportion of myth-indicative posts over all posts.
Results: Our results demonstrate promise in identifying social media postings that center on treatment myths about opioid use disorder with an accuracy of 91% and an area under the curve of 0.9, including how these discussions vary across platforms in terms of prevalence and linguistic characteristics, with the lowest prevalence on web-based health communities such as Reddit and Drugs-Forum and the highest on Twitter. Specifically, the prevalence of the stated MOUD myth ranged from 0.4% on web-based health communities to 0.9% on Twitter. Conclusions: This work provides one of the first large-scale assessments of a key MOUD-related myth across multiple social media platforms and highlights the feasibility and importance of ongoing assessment of health misinformation related to addiction treatment.
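The platform-level prevalence described in the Methods (the proportion of myth-indicative posts over all machine-labeled posts) reduces to a simple ratio. The sketch below assumes the classifier output is already available as one boolean per post; the function name, platform names, and toy labels are hypothetical and are not data from the study.

```python
def prevalence(predicted_labels):
    """Share of posts the classifier flagged as myth-indicative (True)."""
    labels = list(predicted_labels)
    if not labels:
        raise ValueError("cannot compute prevalence for zero posts")
    return sum(1 for label in labels if label) / len(labels)


# Hypothetical classifier output per platform (True = myth-indicative).
toy_labels = {
    "platform_a": [True, False, False, False],
    "platform_b": [False, False, True, True, False],
}

# One prevalence value per platform, as a fraction between 0 and 1.
by_platform = {name: prevalence(labels) for name, labels in toy_labels.items()}
```

Multiplying each value by 100 gives the percentage form used in the Results.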


2018 ◽  
Author(s):  
David J Williamson ◽  
Garth L Burn ◽  
Juliette Griffie ◽  
Daniel M Davis ◽  
Dylan M Owen

Quantifying the clustering of points within single-molecule localization microscopy data is useful for understanding the spatial relationships of the molecules in the underlying sample. The conversion of point pattern data into a meaningful description of clustering is difficult, especially for biologically derived data, as the definitions of clustering are often subjective or simplistic. Many existing computational approaches are also limited in their ability to process large-scale data-sets or to deal effectively with inhomogeneities in clustering. Here we have developed a supervised machine-learning approach to cluster analysis which is fast and accurate. Trained on a variety of simulated clustered data, the network can then classify all points from a typical localization microscopy data-set (several million points from the entire field of view) as being either clustered or not-clustered, with the potential to include additional classifiers to describe different types of clusters. Clustered points can then be further grouped into like-clusters for the measurement of cluster area, shape, and point-density. We demonstrate the performance on simulated data and on experimental data of the kinase Csk and the adaptor PAG in both naive and pre-stimulated primary human T cell synapses.

