Collaborative Multi-Expert Active Learning for Mobile Health Monitoring: Architecture, Algorithms, and Evaluation

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1932
Author(s):  
Ramyar Saeedi ◽  
Keyvan Sasani ◽  
Assefaw H. Gebremedhin

Mobile health monitoring plays a central role in the future of cyber-physical systems (CPS) for healthcare applications. Such monitoring systems need to process user data accurately. Unlike in other human-centered CPS, in healthcare CPS the user functions in multiple roles at the same time: as an operator, an actuator, the physical environment and, most importantly, the target that needs to be monitored in the process. Therefore, mobile health CPS devices generally face highly dynamic settings, and the accuracy of the machine learning models the devices employ may drop dramatically every time the setting changes. Novel learning architectures that specifically address the challenges associated with dynamic environments are therefore needed. Using active learning and transfer learning as organizing principles, we propose a collaborative multiple-expert architecture and accompanying algorithms for the design of machine learning models that autonomously adapt to a new configuration, context, or user need. Specifically, our architecture and its constituent algorithms are designed to manage heterogeneous knowledge sources, or experts, with varying levels of confidence and type while minimizing adaptation cost. Additionally, our framework incorporates a mechanism for collaboration among experts to enrich their knowledge, which in turn decreases both the cost and the uncertainty of data labeling in future steps. We evaluate the efficacy of the architecture using two publicly available human activity datasets. We attain activity recognition accuracy of over 85% (for the first dataset) and 92% (for the second dataset) by labeling only 15% of the unlabeled data.
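The idea of querying heterogeneous experts under a labeling budget can be illustrated with a minimal sketch. It is not the authors' algorithm: the entropy-based ranking, the `cost` and `confidence` fields, and the rule "pick the cheapest expert that is confident enough" are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(pool_probs, budget_fraction=0.15):
    """Return indices of the most uncertain samples, limited to a labeling budget."""
    budget = max(1, int(len(pool_probs) * budget_fraction))
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]

def pick_expert(experts, min_confidence):
    """Pick the cheapest expert that meets the confidence threshold.

    Falls back to the most confident expert when none qualifies.
    """
    eligible = [e for e in experts if e["confidence"] >= min_confidence]
    if eligible:
        return min(eligible, key=lambda e: e["cost"])
    return max(experts, key=lambda e: e["confidence"])

# Hypothetical experts: the user is reliable but costly, a peer model is cheap but less reliable.
experts = [
    {"name": "user", "cost": 5.0, "confidence": 0.95},
    {"name": "peer_model", "cost": 1.0, "confidence": 0.70},
]

# Predicted class probabilities for four unlabeled samples (assumed values).
pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [0.99, 0.01]]
to_label = select_queries(pool, budget_fraction=0.5)
chosen = pick_expert(experts, min_confidence=0.6)
```

With the toy values above, the two least certain samples are sent for labeling, and the cheaper expert is preferred whenever its confidence is acceptable.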

2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Ye Sheng ◽  
Yasong Wu ◽  
Jiong Yang ◽  
Wencong Lu ◽  
Pierre Villars ◽  
...  

The Materials Genome Initiative requires the integration of material calculations, machine learning, and experiments to accelerate the material development process. In recent years, data-based methods have been applied to the thermoelectric field, mostly to transport properties. In this work, we combined data-driven machine learning and automated first-principles calculations into an active learning loop in order to predict the p-type power factors (PFs) of diamond-like pnictides and chalcogenides. Our active learning loop contains two procedures: (1) based on a high-throughput theoretical database, machine learning methods are employed to select potential candidates, and (2) the transport properties of these candidates are verified computationally. The verification data are then added to the database to improve the extrapolation abilities of the machine learning models. Different strategies for selecting candidates were tested; the Gradient Boosting Regression model with the Query by Committee strategy had the highest extrapolation accuracy (Pearson R = 0.95 on untrained systems). Based on the predictions from the machine learning models, binary pnictides and vacancy- and small-atom-containing chalcogenides are predicted to have large PFs. The bonding analysis reveals that the alterations of anionic bonding networks due to small atoms are beneficial to the PFs in these compounds.
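A common way to realize Query by Committee for regression is to rank candidates by how much the committee members disagree. The sketch below assumes that criterion (variance of predictions); the abstract does not state the exact selection rule, and the toy committee of three linear functions is purely hypothetical.

```python
from statistics import pvariance

def committee_disagreement(committee, x):
    """Variance of the committee members' predictions for one candidate."""
    return pvariance([model(x) for model in committee])

def query_by_committee(committee, candidates, k):
    """Return the k candidates on which the committee disagrees most."""
    ranked = sorted(candidates,
                    key=lambda x: committee_disagreement(committee, x),
                    reverse=True)
    return ranked[:k]

# Toy committee: three regressors that agree near x = 0 and diverge as |x| grows.
committee = [lambda x: 1.0 * x, lambda x: 1.2 * x, lambda x: 0.8 * x]
candidates = [0.0, 0.5, -3.0, 2.0]

# Candidates chosen for verification; in a loop like the one described above,
# their verified values would be added to the training database before refitting.
selected = query_by_committee(committee, candidates, k=2)
```

Here the committee disagrees most on the candidates farthest from zero, so those are selected first.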


2019 ◽  
Vol 10 (35) ◽  
pp. 8154-8163 ◽  
Author(s):  
Yao Zhang ◽  
Alpha A. Lee

We report a statistically principled method to quantify the uncertainty of machine learning models for molecular property prediction. We show that this uncertainty estimate can be used to judiciously design experiments.


2020 ◽  
Vol 2 (1) ◽  
pp. 21
Author(s):  
Jaiber Camacho-Olarte ◽  
Julián Esteban Salomón Torres ◽  
Daniel A. Garavito Jimenez ◽  
Jersson X. Leon Medina ◽  
Ricardo C. Gomez Vargas ◽  
...  

Within a model of scientific and technical cooperation between the smelting company Cerro Matoso S.A. (CMSA) and the Universidad Nacional de Colombia (UNAL), a project was developed to take advantage of the data obtained from a sensor network in a ferronickel electric arc furnace at CMSA in order to improve the structural health monitoring process. Through this sensor network, online data are obtained on the temperature along the refractory lining of the electric furnace, as well as on heat fluxes and the chemical characterization of the minerals at each stage of the process. These data are stored in a local database, which holds several years of historical data with valuable information for control and analysis purposes. These data reflect the behavior of the industrial process and can be used in the development of machine learning models to predict some of the electric arc furnace operation parameters, and thus improve the decision-making process. Currently, most of the data are analyzed by the experts of the structural control department, but, due to the large amount of data, analytical tools are needed to support their work. This paper proposes a data cleaning approach for improving data quality by creating a set of rules and filters based on both expert judgment and best practices in data quality. A statistical analysis was also carried out to detect variables with anomalies and outliers, which do not reflect real operation parameters and should not be considered for modelling. With the proposed process, the quality of the data was improved and abnormal data were eliminated in order to consolidate a clean data set for later use in the development of machine learning models. This work contributes to understanding the data cleansing rules that must be considered in order to reflect the real behavior of the electric furnace operation in further analysis and modeling tasks.
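A cleaning pipeline of this kind, combining an expert-defined rule with a statistical outlier filter, might look like the following sketch. The plausible range, the use of Tukey's interquartile-range fences, and the sample temperatures are assumptions for illustration only; the abstract does not specify the actual rules or statistics used.

```python
from statistics import quantiles

def apply_range_rule(readings, low, high):
    """Keep readings inside a physically plausible range (expert-defined rule)."""
    return [r for r in readings if low <= r <= high]

def iqr_filter(readings, k=1.5):
    """Drop readings outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(readings, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [r for r in readings if lower <= r <= upper]

# Hypothetical lining temperatures in degrees Celsius; -999.0 mimics a sensor error code.
raw = [412.0, 415.5, 409.8, -999.0, 418.2, 411.1, 950.0, 414.3, 413.7]

plausible = apply_range_rule(raw, low=0.0, high=2000.0)  # removes the error code
clean = iqr_filter(plausible)                            # removes the statistical outlier
```

Applying the rule first matters: an error code such as -999.0 would otherwise distort the quartiles that the statistical filter relies on.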


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Wang-Chi Cheung ◽  
Weiwen Zhang ◽  
Yong Liu ◽  
Feng Yang ◽  
Rick-Siow-Mong Goh

Recent studies have revealed the success of data-driven machine health monitoring, which motivates the use of machine learning models in machine health prognostic tasks. While the machine learning approach to health monitoring is gaining importance, the construction of machine learning models is often impeded by the difficulty of choosing the underlying hyper-parameter configuration (HP-config), which governs the construction of the machine learning model. While an effective choice of HP-config can be achieved with human effort, such an effort is often time-consuming and requires domain knowledge. In this paper, we consider the use of Bayesian optimization algorithms, which automate an effective choice of HP-config by solving the associated hyperparameter optimization problem. Numerical experiments on the data from the PHM 2016 Data Challenge demonstrate the salience of the proposed automatic framework, and exhibit improvement over both the default HP-configs in standard machine learning packages and those chosen by a human agent.


10.2196/17984 ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. e17984 ◽  
Author(s):  
Irena Spasic ◽  
Goran Nenadic

Background: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its ability to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data.
Objective: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.
Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics.
Results: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of the data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance.
Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of reducing the annotation effort. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
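Distant supervision, as mentioned in the results, can be illustrated with a small sketch that labels raw text by matching it against a knowledge base. The two-entry dictionary, the label names, and the example sentence are hypothetical, and simple case-insensitive term matching is an assumption; the reviewed studies may use considerably richer matching.

```python
import re

def distant_labels(text, knowledge_base):
    """Return (start, end, surface form, label) for every knowledge-base term found in the text."""
    annotations = []
    for term, label in knowledge_base.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0), label))
    return sorted(annotations)  # ordered by position in the text

# Hypothetical knowledge base mapping terms to entity labels.
kb = {"metformin": "DRUG", "type 2 diabetes": "DISEASE"}
note = "Patient with type 2 diabetes was started on Metformin."

annotations = distant_labels(note, kb)
labels = [(surface, label) for _, _, surface, label in annotations]
```

The resulting annotations are noisy by construction, since any string match is accepted as a label, which is the usual trade-off for avoiding manual annotation.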

