A clinical specific BERT developed using a huge Japanese clinical text corpus

PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259763
Author(s):  
Yoshimasa Kawazoe ◽  
Daisaku Shibata ◽  
Emiko Shinohara ◽  
Eiji Aramaki ◽  
Kazuhiko Ohe

Generalized language models that are pre-trained with a large corpus have achieved great performance on natural language tasks. While many pre-trained transformers for English are published, few models are available for Japanese text, especially in clinical medicine. In this work, we demonstrate the development of a clinical specific BERT model with a huge amount of Japanese clinical text and evaluate it on the NTCIR-13 MedWeb that has fake Twitter messages regarding medical concerns with eight labels. Approximately 120 million clinical texts stored at the University of Tokyo Hospital were used as our dataset. The BERT-base was pre-trained using the entire dataset and a vocabulary including 25,000 tokens. The pre-training was almost saturated at about 4 epochs, and the accuracies of Masked-LM and Next Sentence Prediction were 0.773 and 0.975, respectively. The developed BERT did not show significantly higher performance on the MedWeb task than the other BERT models that were pre-trained with Japanese Wikipedia text. The advantage of pre-training on clinical text may become apparent in more complex tasks on actual clinical text, and such an evaluation set needs to be developed.
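The two pre-training accuracies quoted above can be read as simple ratios. The following is a minimal sketch of how such scores might be computed, not the authors' evaluation code; the function names `masked_lm_accuracy` and `nsp_accuracy` and the toy token ids are hypothetical.

```python
def masked_lm_accuracy(predicted_ids, true_ids, masked_positions):
    """Fraction of masked positions where the predicted token id equals the original id.

    Only positions listed in masked_positions are scored; unmasked tokens are ignored.
    """
    if not masked_positions:
        raise ValueError("no masked positions to score")
    correct = sum(1 for i in masked_positions if predicted_ids[i] == true_ids[i])
    return correct / len(masked_positions)


def nsp_accuracy(predicted_labels, true_labels):
    """Fraction of sentence pairs whose is-next / not-next label was predicted correctly."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


# Toy example with made-up token ids: positions 1 and 2 were masked.
mlm = masked_lm_accuracy([5, 7, 9, 2], [5, 8, 9, 3], masked_positions=[1, 2])
nsp = nsp_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(mlm, nsp)
```

Under this reading, a Masked-LM accuracy of 0.773 would mean that about 77% of masked tokens were recovered exactly.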

2020 ◽  
Author(s):  
Yoshimasa Kawazoe ◽  
Daisaku Shibata ◽  
Emiko Shinohara ◽  
Eiji Aramaki ◽  
Kazuhiko Ohe

Generalized language models that are pre-trained with a large corpus have achieved great performance on natural language tasks. While many pre-trained transformers for English are published, few models are available for Japanese text, especially in clinical medicine. In this work, we demonstrate the development of a clinical specific BERT model with a large volume of Japanese clinical narrative and evaluate it on the NTCIR-13 MedWeb, which has pseudo-Twitter messages about medical concerns with eight labels. Approximately 120 million clinical texts stored at the University of Tokyo Hospital were used as the dataset. The BERT-base was pre-trained with the entire dataset and a vocabulary including 25,000 tokens. The pre-training was almost saturated at about 4 epochs, and the accuracies of Masked LM and Next Sentence Prediction were 0.773 and 0.975, respectively. The developed BERT tended to show higher performance on the MedWeb task than the other nonspecific BERTs; however, no significant differences were found. The advantage of training on domain-specific texts may become apparent in more complex tasks on actual clinical text, and such a corpus for evaluation needs to be developed.


2021 ◽  
Vol 9 ◽  
pp. 226-242
Author(s):  
Zhaofeng Wu ◽  
Hao Peng ◽  
Noah A. Smith

For natural language processing systems, two kinds of evidence support the use of text representations from neural language models “pretrained” on large unannotated corpora: performance on application-inspired benchmarks (Peters et al., 2018, inter alia), and the emergence of syntactic abstractions in those representations (Tenney et al., 2019, inter alia). On the other hand, the lack of grounded supervision calls into question how well these representations can ever capture meaning (Bender and Koller, 2020). We apply novel probes to recent language models, specifically focusing on predicate-argument structure as operationalized by semantic dependencies (Ivanova et al., 2012), and find that, unlike syntax, semantics is not brought to the surface by today’s pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits to natural language understanding (NLU) tasks in the GLUE benchmark. This approach demonstrates the potential for general-purpose (rather than task-specific) linguistic supervision, above and beyond conventional pretraining and finetuning. Several diagnostics help to localize the benefits of our approach.


2019 ◽  
Vol 5 (3) ◽  
pp. 189
Author(s):  
Amado C Gequinto ◽  
Do Mads

Skills and competencies are highly regarded in today's global market. Different agencies, specifically those seeking technologists, technicians, and engineers, have stressed that skills and competencies are major components for individual workers. This study aimed to determine the relevance and appropriateness of the skills and competencies acquired by industrial technology graduates, and to determine the extent to which those skills and competencies are used in their current employment. Related literature and studies were reviewed to support the understanding, analysis, and interpretation of this research. A descriptive research method was used with 78 graduates from 2015-2016 and 117 graduates from 2016-2017, who participated in the survey. The BatStateU Standardized Questionnaire was used to gather data. Brief interviews and conversations during alumni visits to the university were also considered, as well as other channels such as email, Facebook, Messenger, and text messaging. Results show that the skills and competencies acquired by industrial technology graduates are all relevant and appropriate. The study also found that acquired skills and competencies are used to some or a great extent in the graduates' current employment. The study implies that the skills and competencies acquired at the university significantly provided the graduates with opportunities in national and global markets and industries.


Maximise your exam success with this essential revision guide. The third edition of Oxford Assess and Progress: Clinical Medicine features over 550 Single Best Answer questions. Packed with questions written by practicing clinicians and educators, this revision tool is an authoritative guide on core clinical topics and professional themes. Each question is accompanied by extensive feedback which explains not only the rationale of the correct answer, but why the other options are incorrect. Further reading resources and cross-references to the Oxford Handbook of Clinical Medicine have been fully updated to expand your revision further. Progress to exam success with the third edition of Oxford Assess and Progress: Clinical Medicine.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Lisa Grossman Liu ◽  
Raymond H. Grossman ◽  
Elliot G. Mitchell ◽  
Chunhua Weng ◽  
Karthik Natarajan ◽  
...  

The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness or coverage of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
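The coverage figures above can be illustrated with a small sketch. It assumes coverage means the share of distinct abbreviations seen in new text that also appear in an inventory; the abstract does not spell out the exact definition, and the function name `coverage` and the sample abbreviations are hypothetical.

```python
def coverage(observed, inventory):
    """Share of distinct observed abbreviations that are present in the inventory.

    Matching is exact and case-sensitive, which is an assumption of this sketch.
    """
    observed_set = set(observed)
    if not observed_set:
        raise ValueError("no observed abbreviations to score")
    found = observed_set & set(inventory)
    return len(found) / len(observed_set)


# Toy data: abbreviations seen in new notes versus two made-up inventories.
seen_in_text = ["MI", "CHF", "PT", "XYZ"]
small_inventory = {"MI", "CHF"}
large_inventory = {"MI", "CHF", "PT"}

print(coverage(seen_in_text, small_inventory))
print(coverage(seen_in_text, large_inventory))
```

Comparing the two printed values mirrors the kind of inventory-to-inventory comparison the abstract reports, although the real evaluation was run on clinical text rather than a toy list.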


2020 ◽  
Vol 39 (3) ◽  
pp. 182-188
Author(s):  
Samuel M. Cohen

To begin, I wish to thank the Academy of Toxicological Sciences for bestowing this honor on me. I have had a rewarding career in basic research and clinical medicine, beginning with research in high school and always planning on becoming a physician. I have had the good fortune of having outstanding mentors, wonderful parents, and a supportive and intuitive wife and family. This article provides a brief overview of some of the events of my career and individuals who have played a major role, beginning with the M.D./Ph.D. program at the University of Wisconsin, pathology residency and faculty at St. Vincent Hospital, Worcester, Massachusetts, a year as visiting professor at Nagoya City University, and my career at the University of Nebraska Medical Center since 1981. This could not have happened without the strong input and support from these individuals, the numerous students, residents and fellows with whom I have learned so much, and the more than 500 terrific collaborators.


2021 ◽  
Vol 13 (4) ◽  
pp. 1828
Author(s):  
Elisa Chaleta ◽  
Margarida Saraiva ◽  
Fátima Leal ◽  
Isabel Fialho ◽  
António Borralho

In this work we analyzed the mapping of Sustainable Development Goals in the curricular units of the undergraduate courses of the School of Social Sciences at the University of Évora. Of a total of 449 curricular units, only 374 had students enrolled in 2020/2021. The data presented refer to the 187 course units that had Sustainable Development Goals in addition to SDG4 (Quality Education) assigned to all the course units. Considering the set of curricular units, the results showed that the most mentioned objectives were those related to Gender Equality (SDG 5), Reduced Inequalities (SDG 10), Decent Work and Economic Growth (SDG 8) and Peace, Justice and Strong Institutions (SDG 16). Regarding the differences between the departments, which are also distinct scientific areas, we have observed that the Departments of Economics and Management had more objectives related to labor and economic growth, while the other departments mentioned more objectives related to inequalities, gender or other.


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 634
Author(s):  
Alakbar Valizada ◽  
Natavan Akhundova ◽  
Samir Rustamov

In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition of spoken dialogues in emergency call centers, were investigated and comparatively analyzed. Because dialogue speech in call centers has a specific context and is recorded in noisy, emotional environments, available speech recognition systems show poor performance. Therefore, in order to accurately recognize dialogue speech, the main modules of speech recognition systems (language models and acoustic training methodologies) as well as symmetric data labeling approaches have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable percentage. Based on the results of the experiments, we determined that DNN/HMM for the acoustic model, a trigram with Kneser–Ney discounting for the language model, and spelling correction applied to the training data before training as the labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly connected to specific language features. Hence, it is anticipated that the suggested approaches can be applied to the other languages of the same group.


Author(s):  
Santiago Zanella-Béguelin ◽  
Lukas Wutschitz ◽  
Shruti Tople ◽  
Victor Rühle ◽  
Andrew Paverd ◽  
...  

Science ◽  
2021 ◽  
Vol 371 (6526) ◽  
pp. 284-288 ◽  
Author(s):  
Brian Hie ◽  
Ellen D. Zhong ◽  
Bonnie Berger ◽  
Bryan Bryson

The ability for viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.

