Machine Learning-Assisted Sampling of SERS Substrates Improves Data Collection Efficiency

2021 ◽  
pp. 000370282110345
Author(s):  
Tatu Rojalin ◽  
Dexter Antonio ◽  
Ambarish Kulkarni ◽  
Randy P. Carney

Surface-enhanced Raman scattering (SERS) is a powerful technique for sensitive, label-free analysis of chemical and biological samples. While much recent work has established sophisticated automation routines using machine learning and related artificial intelligence methods, these efforts have largely focused on downstream processing (e.g., classification tasks) of previously collected data. While fully automated analysis pipelines are desirable, current progress is limited by cumbersome and manually intensive sample preparation and data collection steps. Specifically, a typical lab-scale SERS experiment requires the user to evaluate the quality and reliability of the measurement (i.e., the spectra) as the data are being collected. This need for expert user intuition is a major bottleneck that limits the applicability of SERS-based diagnostics for point-of-care clinical applications, where trained spectroscopists are likely unavailable. While application-agnostic numerical approaches (e.g., signal-to-noise thresholding) are useful, there is an urgent need for algorithms that leverage expert user intuition and domain knowledge to simplify and accelerate data collection. To address this challenge, we introduce a machine learning-assisted method at the acquisition stage. We evaluated six common algorithms to identify the best performer for spectral quality judgment. For adoption into future automation platforms, we developed an open-source Python package tailored for rapid expert annotation to train machine learning algorithms. We expect that this new approach of using machine learning to assist data acquisition can serve as a useful building block for point-of-care SERS diagnostic platforms.
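The acquisition-stage quality judgment this abstract describes amounts to a supervised classification problem: expert-annotated spectra train a model that flags good versus bad measurements. The snippet below is a minimal sketch, not the authors' pipeline; the synthetic spectra, the Gaussian peak model, and the choice of a linear SVM (one of several candidate algorithms) are all illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 spectra, 500 wavenumber bins each.
# "Good" spectra carry a synthetic Raman-like peak; "bad" spectra are noise.
n_spectra, n_bins = 200, 500
labels = rng.integers(0, 2, size=n_spectra)          # 1 = expert judged "good"
peak = np.exp(-0.5 * ((np.arange(n_bins) - 250) / 10.0) ** 2)
spectra = rng.normal(0.0, 1.0, (n_spectra, n_bins)) + 5.0 * peak * labels[:, None]

# One candidate classifier for spectral quality judgment, scored by
# cross-validation as a proxy for expert evaluation during acquisition.
clf = SVC(kernel="linear")
scores = cross_val_score(clf, spectra, labels, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

In a real workflow, annotations collected through the paper's expert-labeling package would replace the synthetic labels here.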

2019 ◽  
Vol 14 (5) ◽  
pp. 406-421 ◽  
Author(s):  
Ting-He Zhang ◽  
Shao-Wu Zhang

Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering genome databases has called for the development of automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, and v) web servers. Result & Conclusion: Concomitant with the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. One is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from protein sequences and structures. Another is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.


mSphere ◽  
2019 ◽  
Vol 4 (3) ◽  
Author(s):  
Artur Yakimovich

ABSTRACT Artur Yakimovich works in the field of computational virology and applies machine learning algorithms to study host-pathogen interactions. In this mSphere of Influence article, he reflects on two papers “Holographic Deep Learning for Rapid Optical Screening of Anthrax Spores” by Jo et al. (Y. Jo, S. Park, J. Jung, J. Yoon, et al., Sci Adv 3:e1700606, 2017, https://doi.org/10.1126/sciadv.1700606) and “Bacterial Colony Counting with Convolutional Neural Networks in Digital Microbiology Imaging” by Ferrari and colleagues (A. Ferrari, S. Lombardi, and A. Signoroni, Pattern Recognition 61:629–640, 2017, https://doi.org/10.1016/j.patcog.2016.07.016). Here he discusses how these papers made an impact on him by showcasing that artificial intelligence algorithms can be equally applicable to both classical infection biology techniques and cutting-edge label-free imaging of pathogens.


Author(s):  
A. Khanwalkar ◽  
R. Soni

Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenditure, since people with diabetes require continuous medical care. Several complications will occur if the polygenic disorder goes untreated or unrecognized, and diagnosis typically requires a visit to a diagnostic center and a consulting physician. An essential real-world task is therefore to detect the polygenic disorder at its earliest phase. This work presents a survey analyzed across several parameters of polygenic disorder diagnosis, showing that classification algorithms play an important role in automating polygenic disorder analysis, alongside other machine learning algorithms. Design/methodology/approach: This paper provides an extensive survey of the different approaches that have been used for the analysis of medical data for the purpose of early detection of polygenic disorder. It takes into consideration methods such as J48, CART, SVM, and kNN, conducts a formal survey of all the studies, and provides a conclusion at the end. Findings: The survey was analyzed across several parameters of polygenic disorder diagnosis. It shows that classification algorithms play an important role in automating polygenic disorder analysis, alongside other machine learning algorithms. Practical implications: This paper will help future researchers in the field of healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by reviewing results and selecting the appropriate approach based on requirements.
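The classifier families this survey compares can be benchmarked in a few lines of scikit-learn. The sketch below uses synthetic data (the surveyed studies' diabetes datasets are not reproduced here), and J48 is approximated by an entropy-criterion decision tree, since the C4.5 algorithm itself is not available in scikit-learn; both substitutions are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a diabetes dataset: 8 clinical-style features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "J48-like": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
}
# Five-fold cross-validated accuracy for each surveyed algorithm family.
results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

On real clinical data, the ranking would of course depend on the dataset; this is the comparison scaffold, not the survey's findings.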


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Pratik Doshi ◽  
John Tanaka ◽  
Jedrek Wosik ◽  
Natalia M Gil ◽  
Martin Bertran ◽  
...  

Introduction: There is a need for innovative solutions to better screen and diagnose the 7 million patients with chronic heart failure. A key component of assessing these patients is monitoring fluid status by evaluating the presence and height of jugular venous distension (JVD). We hypothesize that video analysis of a patient’s neck using machine learning algorithms and image recognition can identify the amount of JVD. We propose the use of high-fidelity video recordings taken with a mobile device camera to determine the presence or absence of JVD, which we will use to develop a point-of-care testing tool for early detection of acute exacerbation of heart failure. Methods: In this feasibility study, patients in the Duke cardiac catheterization lab undergoing right heart catheterization (RHC) were enrolled. RGB and infrared videos of the patient’s neck were captured to detect JVD and correlated with right atrial (RA) pressure from the heart catheterization. We designed an adaptive filter based on biological priors that enhances spatially consistent frequency anomalies and detects jugular vein distension, implemented in Python. Results: We captured and analyzed footage for six patients using our model. Four of these six patients shared a similar strong signal outlier within the frequency band of 95–200 bpm when using a conservative threshold, indicating the presence of JVD. We did not perform statistical analysis given the small size of our cohort, but in the patients with a detected positive JVD signal, the mean RA pressure was 20.25 mmHg and the mean pulmonary capillary wedge pressure (PCWP) was 24.3 mmHg. Conclusions: We have demonstrated the ability to evaluate for JVD via infrared video and found a relationship with RHC values. Our project is innovative because it uses video recognition and allows for novel patient interactions using a non-invasive screening technique for heart failure. This tool can become a non-invasive standard both to screen for and to help manage heart failure.
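The in-band frequency anomaly described in the Results can be illustrated with a simple spectral peak search. The sketch below is a toy under stated assumptions, not the study's adaptive filter: it generates a synthetic pixel-intensity trace with a 120 bpm pulsation, assumes a 30 fps camera, and scans the 95-200 bpm band the authors report.

```python
import numpy as np

fs = 30.0                        # assumed video frame rate, frames/s
t = np.arange(0, 10, 1 / fs)     # a 10 s clip
rng = np.random.default_rng(1)

# Hypothetical pixel-intensity trace over the neck: a 2 Hz (120 bpm)
# venous pulsation buried in sensor noise.
trace = 0.5 * np.sin(2 * np.pi * 2.0 * t) + rng.normal(0.0, 0.3, t.size)

# Magnitude spectrum, restricted to the study's band of interest:
# 95-200 bpm, i.e. roughly 1.58-3.33 Hz.
spectrum = np.abs(np.fft.rfft(trace))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
band = (freqs >= 95 / 60) & (freqs <= 200 / 60)

peak_bpm = 60 * freqs[band][np.argmax(spectrum[band])]
print(f"dominant in-band frequency: {peak_bpm:.0f} bpm")
```

A spatially consistent version of this test, applied per pixel region, is closer in spirit to the adaptive filter the authors describe.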


2021 ◽  
Author(s):  
Thitaree Lertliangchai ◽  
Birol Dindoruk ◽  
Ligang Lu ◽  
Xi Yang

Abstract Dew point pressure (DPP) is a key variable that may be needed to predict the condensate-to-gas ratio behavior of a reservoir, to address some production/completion-related issues, and to calibrate/constrain EOS models for integrated modeling. However, DPP is a challenging property in terms of its predictability. Recognizing these complexities, we present a state-of-the-art method for DPP prediction using advanced machine learning (ML) techniques. We compare the outcomes of our methodology with those of published empirical correlation-based approaches on two datasets of small size and different inputs. Our ML method noticeably outperforms the correlation-based predictors while also showing its flexibility and robustness even with small training datasets, provided various classes of fluids are represented within the datasets. We collected condensate PVT data from public-domain resources and the GeoMark RFDBASE, containing dew point pressure (the target variable) along with the compositional data (mole percentage of each component), temperature, molecular weight (MW), and the MW and specific gravity (SG) of the heptane-plus fraction as input variables. Using domain knowledge, before embarking on the study, we extensively checked the measurement quality and the outcomes using statistical techniques. We then applied advanced ML techniques to train predictive models with cross-validation to avoid overfitting the models to the small datasets. We compared our models against the best published DPP predictors based on empirical correlations. For fair comparison, the correlation-based predictors were also trained using the underlying datasets. To improve the outcomes and generalize the input data, pseudo-critical properties and artificial proxy features were also employed.
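The cross-validated training the abstract outlines can be sketched briefly. Everything concrete below is an illustrative assumption: the random features, the pseudo-DPP target, and the choice of gradient boosting stand in for the paper's actual inputs (compositions, temperature, heptane-plus MW/SG) and its undisclosed ML method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Small synthetic dataset standing in for condensate PVT samples:
# six generic input features and a pseudo dew point pressure target (psia).
n_samples = 120
X = rng.uniform(size=(n_samples, 6))
y = 2000 + 1500 * X[:, 0] - 800 * X[:, 1] + 100 * rng.normal(size=n_samples)

# Cross-validation guards against overfitting on the small dataset,
# as emphasized in the abstract.
model = GradientBoostingRegressor(random_state=0)
r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
print(f"mean CV R^2: {r2:.2f}")
```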


2021 ◽  
Author(s):  
Meng Ji ◽  
Yanmeng Liu ◽  
Tianyong Hao

BACKGROUND Much current research on health information understandability uses medical readability formulas (MRF) to assess the cognitive difficulty of health education resources. This rests on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that for readers from non-English-speaking backgrounds with higher education attainment, semantic features of English health texts, rather than medical jargon, can explain the lack of cognitive access to health materials among readers with a good understanding of health terms yet limited exposure to English health education materials. OBJECTIVE Our study explored combining MRF and multidimensional semantic features (MSF) to develop machine learning algorithms that predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compared algorithms to evaluate the cognitive accessibility of specialised health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments. METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites and rated by international tertiary students, we compared machine learning classifiers (decision tree, SVM, discriminant analysis, ensemble tree, and logistic regression) after automatic hyperparameter optimization (grid search for the combination of hyperparameters with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity, and accuracy as measures of model performance.
RESULTS Using two sets of predictor features, the widely tested MRF and the MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first used MRF as predictors only, the second used MSF as predictors only, and the last used both MRF and MSF as integrated models. The results showed that the integrated models outperformed the others in terms of AUC, sensitivity, accuracy, and specificity. CONCLUSIONS Our study showed that the cognitive accessibility of English health texts is not determined solely by the word length and sentence length conventionally measured by MRF. We compared machine learning algorithms combining MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The results showed the strength of integrated models in terms of statistically increased AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information, and that for readers with advanced education, semantic features outweigh syntax and domain knowledge.
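The training protocol in the Methods (hyperparameter grid search scored inside 10-fold cross-validation, evaluated by AUC) corresponds to a nested cross-validation scheme, sketched below. The synthetic feature matrix, the SVM, and the specific grid are all assumptions for illustration; the study compared five classifier families on 108 semantic features plus readability scores.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in: pretend the columns mix readability-formula (MRF)
# scores with multidimensional semantic features (MSF).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           random_state=0)

# Inner loop: grid search for the hyperparameter combination with
# minimal classification error, as in the study's automatic optimization.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)

# Outer loop: 10-fold cross-validation scored by AUC.
auc = cross_val_score(grid, X, y, cv=10, scoring="roc_auc").mean()
print(f"mean AUC: {auc:.2f}")
```

Nesting the grid search inside the outer folds keeps the AUC estimate honest: hyperparameters are never tuned on the data used to score them.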


Author(s):  
Anitha Elavarasi S. ◽  
Jayanthi J.

Machine learning enables systems to learn automatically, without human intervention, and to improve their performance with the help of previous experience; a system can access data and use it to learn by itself. Even though many algorithms have been developed to solve machine learning problems, it is difficult to handle all kinds of input data in order to arrive at accurate decisions. Domain knowledge of statistics, probability, logic, mathematical optimization, reinforcement learning, and control theory plays a major role in developing machine learning-based algorithms. The key considerations in selecting a suitable programming language for implementing a machine learning algorithm include performance, concurrency, application development support, and learning curve. This chapter deals with a few of the top programming languages used for developing machine learning applications: Python, R, and Java. The top three programming languages preferred by data scientists are (1) Python, used by more than 57%, (2) R, used by more than 31%, and (3) Java, used by 17% of data scientists.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Thomas Kurmann ◽  
Siqing Yu ◽  
Pablo Márquez-Neila ◽  
Andreas Ebneter ◽  
Martin Zinkernagel ◽  
...  

Abstract In ophthalmology, retinal biological markers, or biomarkers, play a critical role in the management of chronic eye conditions and in the development of new therapeutics. While many imaging technologies used today can visualize these biomarkers, Optical Coherence Tomography (OCT) is often the tool of choice due to its ability to image retinal structures in three dimensions at micrometer resolution. But with widespread use in clinical routine, and the growing prevalence of chronic retinal conditions, the quantity of scans acquired worldwide is surpassing the capacity of retinal specialists to inspect them in meaningful ways. Instead, automated analysis of scans using machine learning algorithms provides a cost-effective and reliable alternative to assist ophthalmologists in clinical routine and research. We present a machine learning method capable of consistently identifying a wide range of common retinal biomarkers from OCT scans. Our approach avoids the need for costly segmentation annotations and allows scans to be characterized by biomarker distributions, which can then be used to classify scans based on their underlying pathology in a device-independent way.


2021 ◽  
Vol 12 (1) ◽  
pp. 297
Author(s):  
Tamás Orosz ◽  
Renátó Vági ◽  
Gergely Márk Csányi ◽  
Dániel Nagy ◽  
István Üveges ◽  
...  

Many machine learning-based document processing applications have been published in recent years. Applying these methodologies can reduce the cost of labor-intensive tasks and induce changes in a company’s structure. An artificial intelligence-based application can take over the work of trainees and free up the time of experts, which can increase innovation inside the company by letting them be involved in tasks with greater added value. However, the development cost of these methodologies can be high, and development is usually not a straightforward task. This paper presents the results of a survey in which a machine learning-based legal text labeler competed with multiple people with different levels of legal domain knowledge. The machine learning-based application used binary SVM-based classifiers to resolve the multi-label classification problem. The methods were encapsulated and deployed as a digital twin in a production environment. The results show that machine learning algorithms can be effectively utilized for monotonous but domain knowledge- and attention-demanding tasks. The results also suggest that embracing the machine learning-based solution can increase discoverability and enrich the value of data. The test confirmed that the accuracy of a machine learning-based system matches the long-term accuracy of legal experts, which makes it suitable for automating the working process.
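Resolving a multi-label problem with one binary SVM per label, as this paper describes, corresponds to a one-vs-rest scheme. The sketch below uses synthetic features in place of the legal corpus; the feature generator, the train/test split, and the micro-averaged F1 metric are illustrative assumptions, not the paper's evaluation.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for vectorized legal documents with 5 candidate labels.
X, Y = make_multilabel_classification(n_samples=400, n_features=50,
                                      n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# One binary SVM classifier per label resolves the multi-label problem.
clf = OneVsRestClassifier(LinearSVC(max_iter=5000))
clf.fit(X_tr, Y_tr)
micro_f1 = f1_score(Y_te, clf.predict(X_te), average="micro")
print(f"micro-averaged F1: {micro_f1:.2f}")
```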


2018 ◽  
Vol 37 (6) ◽  
pp. 451-461 ◽  
Author(s):  
Zhen Wang ◽  
Haibin Di ◽  
Muhammad Amir Shafiq ◽  
Yazeed Alaudah ◽  
Ghassan AlRegib

As a process that identifies geologic structures of interest, such as faults, salt domes, or elements of petroleum systems in general, seismic structural interpretation depends heavily on the domain knowledge and experience of interpreters, as well as on visual cues of geologic structures such as texture and geometry. With the dramatic increase in the size of seismic data acquired for hydrocarbon exploration, structural interpretation has become more time-consuming and labor-intensive. By treating seismic data as images rather than signal traces, researchers have been able to utilize advanced image-processing and machine-learning techniques to assist interpretation directly. In this paper, we focus mainly on the interpretation of two important geologic structures, faults and salt domes, and summarize interpretation workflows based on typical or advanced image-processing and machine-learning algorithms. In recent years, increasing computational power and the massive amount of available data have led to the rise of deep learning. Deep-learning models that simulate the biological neural networks of the human brain can achieve state-of-the-art accuracy and even exceed human-level performance in numerous applications. The convolutional neural network, a form of deep-learning model that is effective in analyzing visual imagery, has been applied in fault and salt dome interpretation. At the end of this review, we provide insight and discussion on the future of structural interpretation.

