Curating Archival Anatomic Pathology Material For Machine Learning Algorithm Development

2020 ◽  
Vol 154 (Supplement_1) ◽  
pp. S124-S125
Author(s):  
A Collins ◽  
A Norgan ◽  
J J Garcia

Abstract Introduction/Objective: Advances in whole slide imaging have enabled the application of machine learning algorithms to anatomic pathology. Currently, the development of accurate algorithms requires robust training data with correctly assigned diagnostic and classification labels. Increasingly, institutions have looked to their archival slides as a source of “ground truth” for algorithm development. However, the curation and use of archival data pose several challenges. Here, we share lessons learned from reviewing head and neck pathology consult cases spanning a 10-year period at Mayo Clinic Rochester. Methods: Archived surgical pathology slides from 2,590 consult cases were reviewed. Clinical and demographic information was recorded for each case, including surgical date, surgical procedure, anatomic site, age, gender, and diagnosis. Cases were excluded from the curated archive if there was insufficient volume or quality of tissue to render a specific diagnosis (141 cases, 5.4%). Slides with a range of tissue size and quality, from numerous laboratories, were included in the curated archive. Selected cases were collated by anatomic site: ear, gnathic, larynx, nasopharynx, neck, oral cavity, oropharynx, salivary gland, and sinonasal tract. Results: Common diagnostic reconciliations (115 cases, 4.4%) fell within the following categories: (1) novel entities (59 cases, 2.3%), including biphenotypic sinonasal sarcoma and clear cell carcinoma; (2) novel classifications (21 cases, 0.8%), as seen in HPV-related oropharyngeal squamous cell carcinoma and polymorphous adenocarcinoma; and (3) novel grading schema (35 cases, 1.4%), as seen in keratinizing dysplasia and oropharyngeal malignancies. Conclusion: Several nuances emerged in the process of reviewing slides, highlighting the need for continual amendment of any machine learning dataset over time. Curating anatomic pathology cases for machine learning algorithm development requires the recognition of emerging entities, with re-classification and re-grading as needed.

Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 827
Author(s):  
Satvik Venkatesh ◽  
David Moffat ◽  
Eduardo Reck Miranda

Music and speech detection provides valuable information about the nature of content in broadcast audio. It helps detect acoustic regions that contain speech, voice over music, only music, or silence. In recent years, there have been developments in machine learning algorithms to accomplish this task. However, broadcast audio is generally well-mixed and copyrighted, which makes it challenging to share across research groups. In this study, we address the challenges encountered in automatically synthesising data that resembles a radio broadcast. Firstly, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Secondly, we investigate how audio ducking of background music impacts the precision and recall of the machine learning algorithm. Thirdly, we examine how the quantity of synthetic training data impacts the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined approaches for training models, to understand whether the synthetic data adds any value. Amongst the network architectures, CRNN performed best. Results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that of human listeners. After testing our model on in-house and public datasets, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.
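
To make the ducking step concrete, the following is a minimal sketch of how a speech-over-music example might be synthesised. It is not the authors' pipeline: the function names `db_to_gain` and `duck_and_mix`, the -12 dB default, the constant (non-ramped) gain, and the sine tones standing in for speech and music are all assumptions made for illustration.

```python
import math


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def duck_and_mix(speech, music, duck_db=-12.0):
    """Mix speech over music, attenuating the music by duck_db decibels.

    Both inputs are equal-length lists of float samples in [-1, 1].
    A real ducker would ramp the gain in and out around the speech;
    this sketch applies one constant gain (an assumed simplification).
    """
    if len(speech) != len(music):
        raise ValueError("speech and music must have the same length")
    gain = db_to_gain(duck_db)
    mixed = [s + gain * m for s, m in zip(speech, music)]
    # Clip to the valid sample range so the mix never exceeds full scale.
    return [max(-1.0, min(1.0, x)) for x in mixed]


# Toy signals (hypothetical): one second of tones at an 8 kHz sample rate.
sample_rate = 8000
speech = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
          for n in range(sample_rate)]
music = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
         for n in range(sample_rate)]
mixed = duck_and_mix(speech, music, duck_db=-12.0)
```

Varying `duck_db` across synthetic examples is one way such a study could probe how much attenuation a detector needs, in the same spirit as the comparison with human listeners described above.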


Author(s):  
A. Khanwalkar ◽  
R. Soni

Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenses, because people with diabetes need continuous medical care. Several complications occur if diabetes goes unrecognized and untreated. Diagnosis typically requires a visit to a diagnostic center and a doctor's attention. An essential real-world problem is therefore detecting diabetes at an early stage. This work is a survey that analyzes several parameters used in the diagnosis of diabetes. It shows that classification algorithms play an important role in automating diabetes analysis, alongside other machine learning algorithms. Design/methodology/approach: This paper provides an extensive survey of the different approaches that have been used to analyze medical data for the early detection of diabetes. It considers methods such as J48, CART, SVMs, and KNN, formally surveys the relevant studies, and ends with a conclusion. Findings: The survey analyzes several parameters used in the diagnosis of diabetes and shows that classification algorithms play an important role in automating diabetes analysis, alongside other machine learning algorithms. Practical implications: This paper will help future researchers in healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by reviewing their results and selecting the appropriate approach based on requirements.


The aim of this research is to model risk by analyzing Twitter posts using sentiment analysis. We analyze the posts of a particular user or of several users to check whether they may be a cause of concern to society. Each sentiment, such as happiness, sadness, anger, and other emotions, contributes a severity score to the final table, to which a machine learning algorithm is applied. The data given to the machine learning algorithm is monitored over a period of time and relates to a particular topic in an area.


2020 ◽  
Vol 7 (10) ◽  
pp. 380-389
Author(s):  
Asogwa D.C ◽  
Anigbogu S.O ◽  
Anigbogu G.N ◽  
Efozia F.N

Author's age prediction is the task of determining an author's age by studying the texts they have written. Predicting an author's age can be informative about the trends, opinions, and social and political views of an age group. Marketers use this to promote a product or service to an age group according to its expressed interests and opinions. Methodologies in natural language processing have made it possible to predict an author's age from text by examining the variation of linguistic characteristics. Many machine learning algorithms have also been used in author's age prediction. However, in social networks, computational linguists are challenged by numerous issues, and machine learning techniques, being performance driven, face their own challenges in realistic scenarios. This work developed a model that can predict an author's age from text with a machine learning algorithm (Naïve Bayes) using three types of features, namely content based, style based, and topic based. The trained model gave a prediction accuracy of 80%.
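
As a rough illustration of the classifier family named above, here is a minimal multinomial Naïve Bayes sketch with add-one smoothing. It is not the authors' model: it uses only bag-of-words (content-based) tokens, and the function names `train_nb` and `predict_nb`, the age-group labels, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document and word counts from tokenised documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the most probable class using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior plus the sum of smoothed log likelihoods.
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # skip words never seen during training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for illustration only).
docs = [
    ["homework", "exam", "lol"],
    ["lol", "school", "party"],
    ["mortgage", "pension", "grandkids"],
    ["pension", "retirement", "garden"],
]
labels = ["younger", "younger", "older", "older"]
model = train_nb(docs, labels)
print(predict_nb(model, ["exam", "party"]))
print(predict_nb(model, ["pension"]))
```

Style-based and topic-based features, as described in the abstract, would need additional feature extraction that this sketch does not attempt.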


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

Machine learning algorithms are capable of managing multi-dimensional data in dynamic environments. Despite these vital features, some challenges remain. Machine learning algorithms still require additional mechanisms or procedures to predict a large number of new classes while preserving privacy. These deficiencies show that the reliable use of a machine learning algorithm depends on human experts, because raw data may complicate the learning process and generate inaccurate results. Interpreting outcomes therefore requires expertise in machine learning mechanisms, which is a significant challenge. Machine learning techniques suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplication. The main issue of a machine learning algorithm is its vulnerability when managing errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by finding how to make predictions using an improved algorithm.


Author(s):  
Ladly Patel ◽  
Kumar Abhishek Gaurav

In today's world, a huge amount of data is available. This data is analyzed to extract information and is later used to train machine learning algorithms. Machine learning is a subfield of artificial intelligence in which machines are trained with data and then predict results. Machine learning is used in healthcare, image processing, marketing, etc. The aim of machine learning is to reduce the programmer's work on complex coding and to decrease human interaction with systems. The machine learns from past data and then predicts the desired output. This chapter briefly describes machine learning, presents different machine learning algorithms with examples, and covers machine learning frameworks such as TensorFlow and Keras. The limitations of machine learning and various applications of machine learning are discussed. This chapter also describes how to identify features in machine learning data.


2020 ◽  
Vol 48 (7) ◽  
pp. 030006052093688
Author(s):  
Daehyuk Yim ◽  
Tae Young Yeo ◽  
Moon Ho Park

Objective: To develop a machine learning algorithm to identify cognitive dysfunction based on neuropsychological screening test results. Methods: This retrospective study included 955 participants: 341 participants with dementia (dementia), 333 participants with mild cognitive impairment (MCI), and 281 participants who were cognitively healthy. All participants underwent evaluations including the Mini-Mental State Examination and the Montreal Cognitive Assessment. Each participant's caregiver or informant was surveyed using the Korean Dementia Screening Questionnaire at the same visit. Different machine learning algorithms were applied, and their overall accuracies, Cohen's kappa, receiver operating characteristic curves, and areas under the curve (AUCs) were calculated. Results: The overall screening accuracies for MCI, dementia, and cognitive dysfunction (MCI or dementia) using a machine learning algorithm were approximately 67.8% to 93.5%, 96.8% to 99.9%, and 75.8% to 99.9%, respectively. Their kappa statistics ranged from 0.351 to 1.000. The AUCs of the machine learning models were statistically superior to those of the competing screening model. Conclusion: This study suggests that a machine learning algorithm can be used as a supportive tool in the screening of MCI, dementia, and cognitive dysfunction.
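
For readers unfamiliar with the two summary metrics reported above, the following sketch computes overall accuracy and Cohen's kappa from a pair of label lists. The function names and the six-case toy example are assumptions for illustration and do not reproduce the study's data or software.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if both label sequences were independent.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (invented): six cases across three diagnostic groups.
y_true = ["MCI", "MCI", "dementia", "healthy", "healthy", "dementia"]
y_pred = ["MCI", "healthy", "dementia", "healthy", "healthy", "dementia"]
print(accuracy(y_true, y_pred))
print(cohens_kappa(y_true, y_pred))
```

In this toy case five of six labels agree, the chance-expected agreement is one third, and kappa works out to 0.75, which shows how kappa can sit well below raw accuracy.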


Author(s):  
Petr Berka ◽  
Ivan Bruha

Genuine symbolic machine learning (ML) algorithms are capable of processing symbolic, categorical data only. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML. Quite a few discretization procedures exist in the ML field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first one is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization corresponds to the KEX knowledge acquisition algorithm. Since the categorization for KEX is done "off-line" before using the KEX machine learning algorithm, it can be used as a preprocessing step for other machine learning algorithms, too. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. The range of numerical attributes is divided into intervals that may form a complex generated by the algorithm as a part of the class description. Experimental results compare the performance of KEX and CN4 on some well-known ML databases. To make the comparison more illustrative, we also used the discretization procedure of the MLC++ library. Other ML algorithms such as ID3 and C4.5 were also run in our experiments. The results are then compared and discussed.
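
The abstract does not spell out the KEX or CN4 procedures, so the sketch below shows only the general idea of "off-line" discretization using plain equal-width binning, which is a different and much simpler technique. The function names `equal_width_cuts` and `discretize` and the sample ages are hypothetical.

```python
import bisect


def equal_width_cuts(values, n_bins):
    """Return the n_bins - 1 cut points that split the range evenly."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(value, cuts):
    """Map a number to the index of its interval.

    Intervals are closed on the left, so a value equal to a cut point
    falls into the higher interval.
    """
    return bisect.bisect_right(cuts, value)


# Toy numerical attribute (invented): patient ages.
ages = [18, 22, 35, 47, 51, 64, 78]
cuts = equal_width_cuts(ages, n_bins=3)
categories = [discretize(a, cuts) for a in ages]
print(cuts)
print(categories)
```

Because the cut points are computed once, before any learner runs, the resulting categories can be handed to any symbolic algorithm, which mirrors the preprocessing role described for the KEX discretization.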

