scholarly journals Analytic continuation via domain knowledge free machine learning

2018 ◽  
Vol 98 (24) ◽  
Author(s):  
Hongkee Yoon ◽  
Jae-Hoon Sim ◽  
Myung Joon Han
2021 ◽  
pp. 000370282110345
Author(s):  
Tatu Rojalin ◽  
Dexter Antonio ◽  
Ambarish Kulkarni ◽  
Randy P. Carney

Surface-enhanced Raman scattering (SERS) is a powerful technique for sensitive label-free analysis of chemical and biological samples. While much recent work has established sophisticated automation routines using machine learning and related artificial intelligence methods, these efforts have largely focused on downstream processing (e.g., classification tasks) of previously collected data. While fully automated analysis pipelines are desirable, current progress is limited by cumbersome and manually intensive sample preparation and data collection steps. Specifically, a typical lab-scale SERS experiment requires the user to evaluate the quality and reliability of the measurement (i.e., the spectra) as the data are being collected. This need for expert user-intuition is a major bottleneck that limits applicability of SERS-based diagnostics for point-of-care clinical applications, where trained spectroscopists are likely unavailable. While application-agnostic numerical approaches (e.g., signal-to-noise thresholding) are useful, there is an urgent need to develop algorithms that leverage expert user intuition and domain knowledge to simplify and accelerate data collection steps. To address this challenge, in this work, we introduce a machine learning-assisted method at the acquisition stage. We tested six common algorithms to measure best performance in the context of spectral quality judgment. For adoption into future automation platforms, we developed an open-source python package tailored for rapid expert user annotation to train machine learning algorithms. We expect that this new approach to use machine learning to assist in data acquisition can serve as a useful building block for point-of-care SERS diagnostic platforms.


2021 ◽  
Author(s):  
Richard Büssow ◽  
Bruno Hain ◽  
Ismael Al Nuaimi

Abstract Objective and Scope Analysis of operational plant data needs experts in order to interpret detected anomalies which are defined as unusual operation points. The next step on the digital transformation journey is to provide actionable insights into the data. Prescriptive Maintenance defines in advance which kind of detailed maintenance and spare parts will be required. This paper details requirements to improve these predictions for rotating equipment and show potential to integrate the outcome into an operational workflow. Methods, Procedures, Process First principle or physics-based modelling provides additional insights into the data, since the results are directly interpretable. However, such approaches are typically assumed to be expensive to build and not scalable. Identification of and focus on the relevant equipment to be modeled in a hybrid model using a combination of first principle physics and machine learning is a successful strategy. The model is trained using a machine learning approach with historic or current real plant data, to predict conditions which have not occurred before. The better the Artificial Intelligence is trained, the better the prediction will be. Results, Observations, Conclusions The general aim when operating a plant is the actual usage of operational data for process and maintenance optimization by advanced analytics. Typically a data-driven central oversight function supports operations and maintenance staff. A major lesson-learned is that the results of a rather simple statistical approach to detect anomalies fall behind the expectations and are too labor intensive. It is a widely spread misinterpretation that being able to deal with big data is sufficient to come up with good prediction quality for Prescriptive Maintenance. What big data companies are normally missing is domain knowledge, especially on plant critical rotating equipment. Without having domain knowledge the relevant input into the model will have shortcomings and hence the same will apply to its predictions. This paper gives an example of a refinery where the described hybrid model has been used. Novel and Additive Information First principle models are typically expensive to build and not scalable. This hybrid model approach, combining first principle physics based models with artificial intelligence and integration into an operational workflow shows a new way forward.


2021 ◽  
Author(s):  
Thitaree Lertliangchai ◽  
Birol Dindoruk ◽  
Ligang Lu ◽  
Xi Yang

Abstract Dew point pressure (DPP) is a key variable that may be needed to predict the condensate to gas ratio behavior of a reservoir along with some production/completion related issues and calibrate/constrain the EOS models for integrated modeling. However, DPP is a challenging property in terms of its predictability. Recognizing the complexities, we present a state-of-the-art method for DPP prediction using advanced machine learning (ML) techniques. We compare the outcomes of our methodology with that of published empirical correlation-based approaches on two datasets with small sizes and different inputs. Our ML method noticeably outperforms the correlation-based predictors while also showing its flexibility and robustness even with small training datasets provided various classes of fluids are represented within the datasets. We have collected the condensate PVT data from public domain resources and GeoMark RFDBASE containing dew point pressure (the target variable), and the compositional data (mole percentage of each component), temperature, molecular weight (MW), MW and specific gravity (SG) of heptane plus as input variables. Using domain knowledge, before embarking the study, we have extensively checked the measurement quality and the outcomes using statistical techniques. We then apply advanced ML techniques to train predictive models with cross-validation to avoid overfitting the models to the small datasets. We compare our models against the best published DDP predictors with empirical correlation-based techniques. For fair comparisons, the correlation-based predictors are also trained using the underlying datasets. In order to improve the outcomes and using the generalized input data, pseudo-critical properties and artificial proxy features are also employed.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Ivan Kruglov ◽  
Oleg Sergeev ◽  
Alexey Yanilkin ◽  
Artem R. Oganov

2021 ◽  
Author(s):  
◽  
Hazel Darney

<p>With the rapid uptake of machine learning artificial intelligence in our daily lives, we are beginning to realise the risks involved in implementing this technology in high-stakes decision making. This risk is due to machine learning decisions being based in human-curated datasets, meaning these decisions are not bias-free. Machine learning datasets put women at a disadvantage due to factors including (but not limited to) historical exclusion of women in data collection, research, and design; as well as the low participation of women in artificial intelligence fields. These factors mean that applications of machine learning may fail to treat the needs and experiences of women as equal to those of men.    Research into understanding gender biases in machine learning frequently occurs within the computer science field. This has frequently resulted in research where bias is inconsistently defined, and proposed techniques do not engage with relevant literature outside of the artificial intelligence field. This research proposes a novel, interdisciplinary approach to the measurement and validation of gender biases in machine learning. This approach translates methods of human-based gender bias measurement in psychology, forming a gender bias questionnaire for use on a machine rather than a human.   The final output system of this research as a proof of concept demonstrates the potential for a new approach to gender bias investigation. This system takes advantage of the qualitative nature of language to provide a new way of understanding gender data biases by outputting both quantitative and qualitative results. These results can then be meaningfully translated into their real-world implications.</p>


2021 ◽  
Author(s):  
◽  
Hazel Darney

<p>With the rapid uptake of machine learning artificial intelligence in our daily lives, we are beginning to realise the risks involved in implementing this technology in high-stakes decision making. This risk is due to machine learning decisions being based in human-curated datasets, meaning these decisions are not bias-free. Machine learning datasets put women at a disadvantage due to factors including (but not limited to) historical exclusion of women in data collection, research, and design; as well as the low participation of women in artificial intelligence fields. These factors mean that applications of machine learning may fail to treat the needs and experiences of women as equal to those of men.    Research into understanding gender biases in machine learning frequently occurs within the computer science field. This has frequently resulted in research where bias is inconsistently defined, and proposed techniques do not engage with relevant literature outside of the artificial intelligence field. This research proposes a novel, interdisciplinary approach to the measurement and validation of gender biases in machine learning. This approach translates methods of human-based gender bias measurement in psychology, forming a gender bias questionnaire for use on a machine rather than a human.   The final output system of this research as a proof of concept demonstrates the potential for a new approach to gender bias investigation. This system takes advantage of the qualitative nature of language to provide a new way of understanding gender data biases by outputting both quantitative and qualitative results. These results can then be meaningfully translated into their real-world implications.</p>


Author(s):  
Dolores García ◽  
Jesus O. Lacruz ◽  
Damiano Badini ◽  
Danilo De Donno ◽  
Joerg Widmer

2021 ◽  
Author(s):  
Meng Ji ◽  
Yanmeng Liu ◽  
Tianyong Hao

BACKGROUND Much of current health information understandability research uses medical readability formula (MRF) to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge represented by uncommon words or jargons form the sole barriers to health information access among the public. Our study challenged this by showing that for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts rather than medical jargons can explain the lack of cognitive access of health materials among readers with better understanding of health terms, yet limited exposure to English health education materials. OBJECTIVE Our study explored combined MRF and multidimensional semantic features (MSF) for developing machine learning algorithms to predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compare algorithms to evaluate the cognitive accessibility of specialised health information for non-native English speaker with advanced education levels yet very limited exposure to English health education environments. METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites, rated by international tertiary students, we compared machine learning (decision tree, SVM, discriminant analysis, ensemble tree and logistic regression) after automatic hyperparameter optimization (grid search for the best combination of hyperparameters of minimal classification errors). We applied 10-fold cross-validation on the whole dataset for the model training and testing, calculated the AUC, sensitivity, specificity, and accuracy as the measured of the model performance. RESULTS Using two sets of predictor features: widely tested MRF and MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first set of algorithms used MRF as predictors only, the second set of algorithms used MSF as predictors only, and the last set of algorithms used both MRF and MSF as integrated models. The results showed that the integrated models outperformed in terms of AUC, sensitivity, accuracy, and specificity. CONCLUSIONS Our study showed that cognitive accessibility of English health texts is not limited to word length and sentence length conventionally measured by MRF. We compared machine learning algorithms combing MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The results showed the strength of integrated models in terms of statistically increased AUC, sensitivity, and accuracy to predict health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information, and that for readers with advanced education, semantic features outweigh syntax and domain knowledge.


2021 ◽  
Vol 5 (12) ◽  
pp. 73
Author(s):  
Daniel Kerrigan ◽  
Jessica Hullman ◽  
Enrico Bertini

Eliciting knowledge from domain experts can play an important role throughout the machine learning process, from correctly specifying the task to evaluating model results. However, knowledge elicitation is also fraught with challenges. In this work, we consider why and how machine learning researchers elicit knowledge from experts in the model development process. We develop a taxonomy to characterize elicitation approaches according to the elicitation goal, elicitation target, elicitation process, and use of elicited knowledge. We analyze the elicitation trends observed in 28 papers with this taxonomy and identify opportunities for adding rigor to these elicitation approaches. We suggest future directions for research in elicitation for machine learning by highlighting avenues for further exploration and drawing on what we can learn from elicitation research in other fields.


Sign in / Sign up

Export Citation Format

Share Document