English Language Accent Classification and Conversion using Machine Learning

2020 ◽  
Author(s):  
Pratik Parikh ◽  
Ketaki Velhal ◽  
Sanika Potdar ◽  
Aayushi Sikligar ◽  
Ruhina Karani


2018 ◽  
Vol 46 (1) ◽  
Author(s):  
Damian Trilling ◽  
Jelle Boumans

Automated analysis of Dutch language-based texts: an overview and research agenda. While automated methods of content analysis are increasingly popular in today's communication research, these methods have hardly been adopted by communication scholars studying texts in Dutch. This essay offers an overview of the possibilities and current limitations of automated text analysis approaches in the context of the Dutch language. Particularly for dictionary-based approaches, research is far less prolific than research on the English language. We divide the most common types of content-analytical research questions into three categories: 1) research problems for which automated methods ought to be used, 2) research problems for which automated methods could be used, and 3) research problems for which automated methods (currently) cannot be used. Finally, we give suggestions for the advancement of automated text analysis approaches for Dutch texts. Keywords: automated content analysis, Dutch, dictionaries, supervised machine learning, unsupervised machine learning
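To make the term concrete: a dictionary-based approach simply counts how often entries from a predefined word list occur in a text. The sketch below is a minimal illustration with a made-up Dutch word list; it is not one of the validated dictionaries the essay refers to, and the words and categories are placeholders.

```python
import re

# Toy dictionary entries; real studies would use validated Dutch lexicons.
positief = {"goed", "mooi", "sterk"}
negatief = {"slecht", "zwak", "fout"}

def dictionary_scores(text):
    """Count dictionary hits per category in a text."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        "positief": sum(t in positief for t in tokens),
        "negatief": sum(t in negatief for t in tokens),
        "n_tokens": len(tokens),
    }

print(dictionary_scores("De resultaten zijn goed, maar de methode is zwak."))
```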


2020 ◽  
Vol 10 (15) ◽  
pp. 5135
Author(s):  
Nuria Caballé-Cervigón ◽  
José L. Castillo-Sequera ◽  
Juan A. Gómez-Pulido ◽  
José M. Gómez-Pulido ◽  
María L. Polo-Luque

Human healthcare is one of the most important topics for society. The goal is correct, effective and robust disease detection as early as possible, so that patients receive appropriate care. Because this detection is often a difficult task, the medical field needs support from other fields such as statistics and computer science. These disciplines are facing the challenge of exploring new techniques, going beyond the traditional ones. The large number of emerging techniques makes it necessary to provide a comprehensive overview that avoids overly particular aspects. To this end, we propose a systematic review of machine learning applied to the diagnosis of human diseases. This review focuses on modern techniques related to the development of machine learning applied to the diagnosis of human diseases in the medical field, in order to discover interesting patterns and make non-trivial predictions that are useful in decision-making. In this way, this work can help researchers to discover and, if necessary, determine the applicability of machine learning techniques in their particular specialties. We provide some examples of the algorithms used in medicine, analysing trends focused on the goal pursued, the algorithm used, and the area of application. We detail the advantages and disadvantages of each technique to help choose the most appropriate one in each real-life situation, as several authors have reported. The authors searched the Scopus, Journal Citation Reports (JCR), Google Scholar, and MedLine databases from the last decades (from approximately the 1980s) up to the present, with English language restrictions, for studies matching the objectives mentioned above. Based on a protocol for data extraction defined and evaluated by all authors using the PRISMA methodology, 141 papers were included in this advanced review.


Sci ◽  
2020 ◽  
Vol 2 (4) ◽  
pp. 92
Author(s):  
Ovidiu Calin

This paper presents a quantitative approach to poetry, based on the use of several statistical measures (entropy, informational energy, N-gram, etc.) applied to a few characteristic English writings. We found that the entropy of the English language changes over time, and that entropy depends on the language used and on the author. In order to compare two similar texts, we introduce a statistical method to assess the informational entropy between two texts. We also introduce a method of computing the average information conveyed by a group of letters about the next letter in the text. We found a formula for computing the Shannon language entropy and we introduce the concept of the N-gram informational energy of a poem. We also constructed a neural network, which is able to generate Byron-type poetry and to analyze its informational proximity to genuine Byron poetry.
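The abstract does not spell out its exact formulas. As a rough illustration only, the sketch below assumes the standard Shannon entropy H = -Σ p_i log2 p_i and Onicescu's informational energy E = Σ p_i², both computed over character N-gram frequencies; the excerpts and function names are placeholders, not the paper's actual definitions or data.

```python
from collections import Counter
from math import log2

def ngram_distribution(text, n=1):
    """Relative frequencies of character N-grams in a text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def shannon_entropy(text, n=1):
    """H = -sum(p * log2 p) over the N-gram distribution."""
    probs = ngram_distribution(text, n)
    return -sum(p * log2(p) for p in probs.values())

def informational_energy(text, n=1):
    """Onicescu informational energy: E = sum(p^2)."""
    probs = ngram_distribution(text, n)
    return sum(p * p for p in probs.values())

# Compare two short placeholder excerpts at the bigram level.
byron = "she walks in beauty like the night"
other = "the quick brown fox jumps over the lazy dog"
print(shannon_entropy(byron, n=2), informational_energy(byron, n=2))
print(shannon_entropy(other, n=2), informational_energy(other, n=2))
```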


BMJ Open ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. e055525
Author(s):  
Yik-Ki Jacob Wan ◽  
Guilherme Del Fiol ◽  
Mary M McFarland ◽  
Melanie C Wright

Introduction: Early identification of patients who may suffer from unexpected adverse events (eg, sepsis, sudden cardiac arrest) gives bedside staff valuable lead time to care for these patients appropriately. Consequently, many machine learning algorithms have been developed to predict adverse events. However, little research focuses on how these systems are implemented and how system design impacts clinicians' decisions or patient outcomes. This protocol outlines the steps to review the designs of these tools. Methods and analysis: We will use scoping review methods to explore how tools that leverage machine learning algorithms to predict adverse events are designed to integrate into clinical practice. We will explore the types of user interfaces deployed, what information is displayed, and how clinical workflows are supported. Electronic sources include Medline, Embase, CINAHL Complete, Cochrane Library (including CENTRAL), and IEEE Xplore from 1 January 2009 to present. We will only review primary research articles that report findings from the implementation of patient deterioration surveillance tools for hospital clinicians. The articles must also include a description of the tool's user interface. Since our primary focus is on how the user interacts with automated tools driven by machine learning algorithms, electronic tools that do not extract data from clinical documentation or recording systems such as an EHR or patient monitor, or that otherwise require manual entry, will be excluded. Similarly, tools that do not synthesise information from more than one data variable will also be excluded. This review will be limited to English-language articles. Two reviewers will review the articles and extract the data. Findings from both researchers will be compared to minimise bias. The results will be quantified, synthesised and presented using appropriate formats. Ethics and dissemination: Ethics review is not required for this scoping review. Findings will be disseminated through peer-reviewed publications.


Author(s):  
Kareema G. Milad ◽  
Yasser F. Hassan ◽  
Ashraf S. El Sayed

Machine learning techniques usually require a large number of training samples to achieve maximum benefit. When only limited training samples are available, they are not enough to learn good models; recently, there has been growing interest in machine learning methods that can exploit knowledge from other tasks to improve performance. Multi-task learning was proposed to solve this problem. Multi-task learning is a machine learning paradigm for learning a number of tasks simultaneously, exploiting commonalities between them. When the tasks to be learned are related, it can be advantageous to learn them all simultaneously instead of learning each task independently. In this paper, we propose translating text from a source language to a target language using multi-task learning. To build a relation extraction system between the words in the texts, we apply related tasks (part-of-speech tagging, chunking and named entity recognition) and train them in parallel on annotated data using a hidden Markov model. Experiments on the text translation task show that our proposed approach can improve the performance of a translation task with the help of other related tasks.
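As a rough sketch of the setup described above (not the authors' implementation), the snippet below trains one supervised HMM tagger per related task on the same toy sentences, using NLTK's HMM trainer as a stand-in; the corpus, tag sets and task names are illustrative assumptions.

```python
# Train the three related sequence-labelling tasks "in parallel" on the same
# sentences, using NLTK's supervised HMM trainer (toy data only).
from nltk.tag import hmm

sentence = ["John", "lives", "in", "Cairo"]
annotations = {
    "pos":      [[("John", "NNP"), ("lives", "VBZ"), ("in", "IN"), ("Cairo", "NNP")]],
    "chunking": [[("John", "B-NP"), ("lives", "B-VP"), ("in", "B-PP"), ("Cairo", "B-NP")]],
    "ner":      [[("John", "B-PER"), ("lives", "O"), ("in", "O"), ("Cairo", "B-LOC")]],
}

# One HMM tagger per task, each trained on the shared sentences.
taggers = {
    task: hmm.HiddenMarkovModelTrainer().train_supervised(data)
    for task, data in annotations.items()
}

# The per-task labels could then feed a downstream relation-extraction or
# translation component; here we simply tag the sentence with each model.
for task, tagger in taggers.items():
    print(task, tagger.tag(sentence))
```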


2020 ◽  
Vol 17 (9) ◽  
pp. 4258-4261
Author(s):  
Jagadish S. Kallimani ◽  
C. P. Chandrika ◽  
Aniket Singh ◽  
Zaifa Khan

Authorship identification pertains to establishing the author of a particular document, currently unknown, based on documents previously available. The field has so far been explored primarily for the English language, using several supervised and unsupervised machine learning models along with NLP techniques, but work on regional languages is highly limited. This may be due to the lack of proper datasets and of preprocessing techniques suited to the rich morphological and stylistic features of these languages. In this paper we apply supervised machine learning models, namely SVM and Naïve Bayes, to Hindi literature to perform authorship analysis on four Hindi authors. We compare and analyze the accuracy obtained using the different models with a bag-of-words approach.
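A minimal bag-of-words sketch of this kind of experiment with scikit-learn is shown below; the corpus, labels and classifier settings are placeholders, not the authors' dataset or exact configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus; real experiments would use Hindi literary passages.
train_texts = ["passage one by author A", "passage two by author B",
               "passage three by author A", "passage four by author C"]
train_authors = ["A", "B", "A", "C"]
test_texts = ["an unseen passage resembling author A"]

for name, clf in [("Naive Bayes", MultinomialNB()), ("SVM", LinearSVC())]:
    model = make_pipeline(CountVectorizer(), clf)  # bag-of-words features
    model.fit(train_texts, train_authors)
    print(name, "predicts:", model.predict(test_texts))
```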


Pharmacy ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 130 ◽  
Author(s):  
Alan Hanna ◽  
Lezley-Anne Hanna

Background: Fitness to practise (FtP) impairment (failure of a healthcare professional to demonstrate the skills, knowledge, character and/or health required for their job) can compromise patient safety, the profession's reputation, and an individual's career. In the United Kingdom (UK), various healthcare professionals' FtP cases (documents about the panel hearing(s) and outcome(s) relating to the alleged FtP impairment) are publicly available, yet reviewing these to learn lessons may be time-consuming given the number of cases across the professions and the amount of text in each. We aimed to demonstrate how machine learning facilitated the examination of such cases (at uni- and multi-professional level), involving UK dental, medical, nursing and pharmacy professionals. Methods: Cases dating from August 2017 to June 2019 were downloaded (577 dental, 481 medical, 2199 nursing and 63 pharmacy) and converted to text files. A topic analysis method (non-negative matrix factorization; machine learning) was employed for data analysis. Results: The identified topics were criminal offences; dishonesty (fraud and theft); drug possession/supply; English language; indemnity insurance; patient care (including incompetence) and personal behaviour (aggression, sexual conduct and substance misuse). The most frequently identified topic for the dental, medical and nursing professions was patient care, whereas for pharmacy it was criminal offences. Conclusions: While commonalities exist, each profession has different priorities, which professional and educational organizations should strive to address.
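For illustration, a minimal non-negative matrix factorization topic-extraction sketch with scikit-learn follows; the toy corpus, the number of topics and the preprocessing choices are assumptions for the example, not the authors' exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Placeholder snippets; the real input would be the full FtP case documents.
documents = [
    "patient care record keeping prescribing error",
    "theft fraud dishonesty conviction",
    "english language competence assessment",
    "aggression towards colleague misconduct",
    "drug possession supply conviction",
    "patient care clinical incompetence",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)

nmf = NMF(n_components=3, init="nndsvda", random_state=0)
W = nmf.fit_transform(X)   # document-topic weights
H = nmf.components_        # topic-term weights

terms = tfidf.get_feature_names_out()
for k, topic in enumerate(H):
    top = topic.argsort()[-5:][::-1]
    print(f"Topic {k}:", ", ".join(terms[i] for i in top))
```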


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Christina M. Ramirez ◽  
Marisa A. Abrajano ◽  
R. Michael Alvarez

Abstract Survey responses in public health surveys are heterogeneous. The quality of a respondent's answers depends on many factors, including cognitive abilities, interview context, and whether the interview is in person or self-administered. A largely unexplored issue is how the language used for public health survey interviews is associated with the survey responses. We introduce a machine learning approach, Fuzzy Forests, which we use for model selection. We use the 2013 California Health Interview Survey (CHIS) as our training sample and the 2014 CHIS as the test sample. We found that non-English-language survey responses differ substantially from English responses in reported health outcomes. We also found heterogeneity among the Asian languages, suggesting that caution should be used when interpreting results that compare across these languages. The 2013 Fuzzy Forests model also correctly predicted 86% of good health outcomes using 2014 data as the test set. We show that the Fuzzy Forests methodology is potentially useful for screening for and understanding other types of survey response heterogeneity. This is especially true in high-dimensional and complex surveys.
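Fuzzy Forests screens features within correlated modules using random forests and then fits a final forest on the survivors. The sketch below only illustrates that two-step idea with scikit-learn on synthetic data; it is not the authors' implementation, and the module assignments, keep fraction and forest sizes are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Assume features are pre-grouped into correlated modules (e.g. via a
# correlation-based clustering step); groupings here are arbitrary.
modules = {"module_a": list(range(0, 10)),
           "module_b": list(range(10, 20)),
           "module_c": list(range(20, 30))}

survivors = []
for name, cols in modules.items():
    # Screening step: rank features within each module by forest importance.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, cols], y)
    ranked = np.argsort(rf.feature_importances_)[::-1]
    keep = [cols[i] for i in ranked[: max(1, len(cols) // 4)]]  # keep top 25%
    survivors.extend(keep)

# Selection step: a final random forest on the surviving features.
final_rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X[:, survivors], y)
top = np.argsort(final_rf.feature_importances_)[::-1][:5]
print("selected features:", [survivors[i] for i in top])
```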


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
A Banerjee ◽  
S Chen ◽  
G Fatemifar ◽  
H Hemingway ◽  
T Lumbers ◽  
...  

Abstract Introduction: Heart failure (HF), acute coronary syndromes (ACS) and atrial fibrillation (AF) are among the commonest cardiovascular diseases (CVD), frequently co-exist and share pathophysiology. Definitions of diagnosis and prognosis are suboptimal. Machine learning (ML) is increasingly used in subtype definition and risk prediction, but the design, methods and results of studies have not been appraised. Purpose: To conduct a systematic review of ML for discovery of new subtypes and risk prediction in HF, ACS and AF. Methods: PubMed, MEDLINE, and Web of Science databases were searched (January 2000-August 2018) for English language publications with agreed search terms pertaining to machine learning, clustering, CVD, subtype and risk prediction. The baseline characteristics of the study population, the method of ML, covariates and results were extracted for each study. Results: Of 5012 identified studies, 43 met inclusion criteria. Of the 33 studies of unsupervised ML for disease clustering (mean n=2354; min 117, max 44886), 22 were in HF, 9 in ACS and 2 in AF. 22/33 studies involved <1000 individuals and 24 were based in North America. Across diseases, 27 studies were in outpatients and 5 used trial data. The mean number of covariates used was 26, most commonly demographic and symptom variables. The ML methods used were partitional (n=12), hierarchical (n=4), self-organising map (n=1) and hidden Markov model (n=1). Most studies used only one ML method (n=25). Only 15 studies validated or replicated findings. Most studies found 2–3 disease clusters (20/33), and most clusters were based on physical or physiological characteristics (30/33). Of the 10 studies of supervised ML for risk prediction (mean n=43003; min 228, max 378256), 4 were in HF, 5 in ACS and 1 in AF. 2/10 studies involved <1000 individuals and most were from North America (n=6). All studies had an observational design, used at least 2 ML methods and validated or replicated findings. The setting was varied: primary care (n=2), emergency department (n=2), inpatient (n=4) and mixed (n=2). The mean number of covariates was 102. The commonest ML methods were neural networks (n=5), random forest (n=4) and support vector machine (n=4). All studies showed positive findings, i.e. ML approaches improved risk prediction. Conclusions: Studies to date of ML in HF, ACS and AF have focused on North America (68.2%), and 50% included fewer than 1000 individuals. Moreover, there is heterogeneity in clinical setting, study designs for data collection and ML methods used. Comparisons between ML methods and validation are common in studies of risk prediction but not in studies of disease clustering. There is likely to be a publication bias of ML studies in HF, AF and ACS. ML may improve data-driven characterisation of CVD, but consensus guidelines for reporting of research using ML are urgently needed to ensure the internal and external validity and applicability of study findings. Acknowledgement/Funding: Innovative Medicines Initiative (European Union)

