Relation extraction between bacteria and biotopes from biomedical texts with attention mechanisms and domain-specific contextual representations

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Amarin Jettakul ◽  
Duangdao Wichadakul ◽  
Peerapon Vateekul

Abstract Background: The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) task that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations applied feature-based models; others presented deep-learning-based models such as convolutional and recurrent neural networks used with the shortest dependency paths (SDPs). Although SDPs contain valuable and concise information, some crucial information required to define bacterial location relationships is often neglected. Moreover, the traditional word embeddings used in previous studies may suffer from word ambiguity across linguistic contexts. Results: Here, we present a deep learning model for biomedical RE. The model incorporates feature combinations of SDPs and full sentences with various attention mechanisms. We also used pre-trained contextual representations based on domain-specific vocabularies. To assess the model’s robustness, we introduced a mean F1 score computed over many models that use different random seeds. The experiments were conducted on the standard BB corpus in BioNLP-ST’16. Our experimental results revealed that the model performed better (in terms of both maximum and average F1 scores; 60.77% and 57.63%, respectively) than other existing models. Conclusions: We demonstrated that our proposed contributions to this task can be used to extract rich lexical, syntactic, and semantic features that effectively boost the model’s performance. Moreover, we analyzed the trade-off between precision and recall to choose the proper cut-off to use in real-world applications.
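The seed-averaged evaluation and the precision/recall cut-off analysis mentioned in this abstract can be illustrated with a minimal Python sketch. The function names, the toy probabilities, and the per-seed scores below are hypothetical and chosen only for illustration; they are not the authors' code or results.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at(threshold: float, probs: list[float], labels: list[int]) -> tuple[float, float]:
    """Precision and recall when predicting positive for probs >= threshold."""
    predicted = [p >= threshold for p in probs]
    tp = sum(1 for pr, y in zip(predicted, labels) if pr and y == 1)
    fp = sum(1 for pr, y in zip(predicted, labels) if pr and y == 0)
    fn = sum(1 for pr, y in zip(predicted, labels) if not pr and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def summarize_runs(f1_by_seed: dict[int, float]) -> tuple[float, float]:
    """Return (maximum F1, mean F1) over runs that differ only in random seed."""
    scores = list(f1_by_seed.values())
    return max(scores), mean(scores)


# Hypothetical per-seed F1 scores (illustration only, not the paper's numbers).
runs = {0: 0.50, 1: 0.60, 2: 0.55}
best, average = summarize_runs(runs)

# Hypothetical model outputs: lowering the cut-off trades precision for recall.
probs = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
strict = precision_recall_at(0.5, probs, labels)
lenient = precision_recall_at(0.3, probs, labels)
```

Reporting the mean alongside the maximum makes it harder for a single lucky seed to dominate a comparison, which is the robustness concern the abstract raises.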

2019 ◽  
Author(s):  
Amarin Jettakul ◽  
Duangdao Wichadakul ◽  
Peerapon Vateekul

Abstract The Bacteria Biotope (BB) task is a biomedical relation extraction (RE) task that aims to study the interaction between bacteria and their locations. This task is considered to pertain to fundamental knowledge in applied microbiology. Some previous investigations have used feature-based models; others have presented deep-learning-based models such as convolutional and recurrent neural networks used with the shortest dependency paths (SDPs). Although SDPs contain valuable and concise information, sections of significant information necessary to define bacterial location relationships are often neglected. In addition, the traditional word embedding used in previous studies may suffer from word ambiguity across linguistic contexts. Here, we present a deep learning model for biomedical RE. The model incorporates feature combinations of SDPs and full sentences with various attention mechanisms. We also used pre-trained contextual representations based on domain-specific vocabularies. In order to assess the model’s robustness, we introduced a mean F1 score on many models using different random seeds. The experiments were conducted on the standard BB corpus in BioNLP-ST’16. Our experimental results revealed that the model performed better (in terms of both maximum and average F1 scores; 60.77% and 57.63%, respectively) compared with other existing models. We demonstrated that our proposed contributions to this task can be used to extract rich lexical, syntactic, and semantic features that effectively boost the model’s performance. Moreover, we analyzed the trade-off between precision and recall in order to choose the proper cut-off to use in real-world applications.


2020 ◽  
Author(s):  
Ting Sun ◽  
Yufei He ◽  
Wendong Li ◽  
Guang Liu ◽  
Lin Li ◽  
...  

Abstract Background: IDH wild-type glioblastoma (GBM) is the most aggressive tumor in the central nervous system in spite of extensive therapies. Neoantigen-based personalized immune therapies achieve promising results in melanoma and lung cancer, but few neoantigen-based models perform well in IDH wild-type GBM. Unlike neoantigen load and occurrence, which are well studied and often found useless, the association between neoantigen intrinsic features and prognosis remains unclear in IDH wild-type GBM. Results: We presented a novel neoantigen intrinsic feature-based deep learning model (neoDL) to stratify IDH wild-type GBMs into subgroups with different survivals. We first calculated a total of 2928 intrinsic features for each neoantigen and filtered out those not associated with survival, followed by applying neoDL in the TCGA data cohort. Leave-one-out cross-validation (LOOCV) in the TCGA cohort demonstrated that neoDL successfully classified IDH wild-type GBMs into different prognostic subgroups, which was further validated in an independent data cohort from an Asian population. Long-term survival IDH wild-type GBMs identified by neoDL were found to be characterized by 12 protective neoantigen intrinsic features and enriched in development and cell cycle. Conclusions: Our results provide a novel model, neoDL, that can be therapeutically exploited to identify IDH wild-type GBM patients with good prognosis who will most likely benefit from neoantigen-based personalized immunotherapy.
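For readers unfamiliar with LOOCV, the following Python sketch shows the general procedure: each sample is held out once, a model is fitted on the rest, and the held-out sample is predicted. The one-dimensional midpoint classifier and the toy data are assumptions made purely for illustration; they stand in for neoDL and have no connection to the TCGA cohort.

```python
from statistics import mean
from typing import Callable


def leave_one_out(samples: list[float], labels: list[int],
                  fit: Callable, predict: Callable) -> list[int]:
    """Predict every sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


def fit_midpoint(xs: list[float], ys: list[int]) -> float:
    """Toy stand-in model: the midpoint between the two class means."""
    mean0 = mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2


def predict_midpoint(midpoint: float, x: float) -> int:
    """Assign class 1 to values above the learned midpoint."""
    return 1 if x > midpoint else 0


# Hypothetical one-feature data set (illustration only).
samples = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]
loocv_predictions = leave_one_out(samples, labels, fit_midpoint, predict_midpoint)
accuracy = mean(int(p == y) for p, y in zip(loocv_predictions, labels))
```

LOOCV is a common choice for small cohorts because every sample is used for testing exactly once while the training set stays as large as possible.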


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2893
Author(s):  
Daniel Bravo-Candel ◽  
Jésica López-Hernández ◽  
José Antonio García-Díaz ◽  
Fernando Molina-Molina ◽  
Francisco García-Sánchez

Real-word errors are misspellings that are themselves valid dictionary words, so they can only be detected from context. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus and then computing the probability that a word is a real-word error. State-of-the-art approaches, on the other hand, use deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model was implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to their correct counterparts. For that purpose, different types of errors were generated in correct sentences using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing with patient information. Moreover, GloVe and Word2Vec pretrained word embeddings were used to study their performance. Despite the medicine corpus being much smaller than the Wikicorpus, Seq2seq models trained on the medicine corpus performed better than those trained on the Wikicorpus. Nevertheless, a larger amount of clinical text is required to improve the results.
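The rule-based generation of training pairs described in this abstract can be sketched in Python as follows. The confusion sets and the function name are hypothetical; the abstract does not list the actual rules, so this only shows the general idea of replacing a word with another valid word to obtain an (erroneous, correct) sentence pair.

```python
import random

# Hypothetical confusion sets: each key is a valid word that can be
# replaced by another valid word, producing a real-word error.
CONFUSION_SETS = {
    "their": ["there"],
    "there": ["their"],
    "dose": ["does"],
    "does": ["dose"],
}


def inject_real_word_error(sentence: str, rng: random.Random) -> str:
    """Replace one confusable token with a valid but wrong word.

    Returns the sentence unchanged when no token is confusable.
    """
    tokens = sentence.split()
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in CONFUSION_SETS]
    if not candidates:
        return sentence
    position = rng.choice(candidates)
    tokens[position] = rng.choice(CONFUSION_SETS[tokens[position].lower()])
    return " ".join(tokens)


# Build one (source, target) pair of the kind a Seq2seq model could train on.
clean = "the patient received a dose of insulin"
noisy = inject_real_word_error(clean, random.Random(0))
training_pair = (noisy, clean)
```

Because the injected word is always a dictionary word, a plain spell checker would not flag it, which is why the correction model needs sentence context.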


2021 ◽  
Vol 11 ◽  
Author(s):  
Mehdi Astaraki ◽  
Guang Yang ◽  
Yousuf Zakko ◽  
Iuliana Toma-Dasu ◽  
Örjan Smedby ◽  
...  

Objectives: Both radiomics and deep learning methods have shown great promise in predicting lesion malignancy in various image-based oncology studies. However, it is still unclear which method to choose for a specific clinical problem given access to the same amount of training data. In this study, we compare the performance of a series of carefully selected conventional radiomics methods, end-to-end deep learning models, and deep-feature based radiomics pipelines for pulmonary nodule malignancy prediction on an open database that consists of 1297 manually delineated lung nodules. Methods: Conventional radiomics analysis was conducted by extracting standard handcrafted features from target nodule images. Several end-to-end deep classifier networks, including VGG, ResNet, DenseNet, and EfficientNet, were employed to identify lung nodule malignancy as well. In addition to the baseline implementations, we also investigated the importance of feature selection and class balancing, as well as separating the features learned in the nodule target region and the background/context region. By pooling the radiomics and deep features together in a hybrid feature set, we investigated the compatibility of these two sets with respect to malignancy prediction. Results: The best baseline conventional radiomics model, deep learning model, and deep-feature based radiomics model achieved AUROC values (mean ± standard deviation) of 0.792 ± 0.025, 0.801 ± 0.018, and 0.817 ± 0.032, respectively, through 5-fold cross-validation analyses. However, after trying out several optimization techniques, such as feature selection and data balancing, as well as adding context features, the corresponding best radiomics, end-to-end deep learning, and deep-feature based models achieved AUROC values of 0.921 ± 0.010, 0.824 ± 0.021, and 0.936 ± 0.011, respectively. We achieved the best prediction accuracy from the hybrid feature set (AUROC: 0.938 ± 0.010). Conclusion: The end-to-end deep-learning model outperforms conventional radiomics out of the box without much fine-tuning. On the other hand, fine-tuning the models leads to significant improvements in the prediction performance, where the conventional and deep-feature based radiomics models achieved comparable results. The hybrid radiomics method seems to be the most promising model for lung nodule malignancy prediction in this comparative study.
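As a reading aid for the "mean ± standard deviation" AUROC figures, here is a minimal Python sketch of how AUROC can be computed from pairwise comparisons and then summarized across folds. The scores and per-fold values are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def summarize_folds(fold_aurocs: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation over cross-validation folds."""
    return mean(fold_aurocs), stdev(fold_aurocs)


# Hypothetical predictions for one fold (illustration only).
fold_auroc = auroc([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])

# Hypothetical per-fold AUROC values, not taken from the study.
fold_mean, fold_std = summarize_folds([0.7, 0.8, 0.9])
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; library implementations typically use a rank-based computation instead.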


2021 ◽  
Vol 4 (6) ◽  
pp. e202000951
Author(s):  
Ethan Schonfeld ◽  
Edward Vendrow ◽  
Joshua Vendrow ◽  
Elan Schonfeld

Essential genes have been studied by copy number variants and deletions, both associated with introns. The premise of our work is that introns of essential genes have distinct characteristic properties. We provide support for this by training a deep learning model and demonstrating that introns alone can be used to classify essentiality. The model, limited to first introns, performs at an increased level, implicating first introns in essentiality. We identify unique properties of introns of essential genes, finding that their structure protects against deletion and intron-loss events, especially centered on the first intron. We show that GC density is increased in the first introns of essential genes, allowing for increased enhancer activity, protection against deletions, and improved splice site recognition. We find that first introns of essential genes are of remarkably smaller size than their nonessential counterparts, and to protect against common 3′ end deletion events, essential genes carry an increased number of (smaller) introns. To demonstrate the importance of the seven features we identified, we train a feature-based model using only these features and achieve high performance.
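The GC density feature mentioned in this abstract is the fraction of G and C bases in a sequence. A minimal Python sketch follows; the function name and the example sequences are hypothetical, and the authors' exact definition (for example, any windowing) is not given in the abstract.

```python
def gc_density(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C (case-insensitive)."""
    bases = sequence.upper()
    if not bases:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in bases if base in "GC") / len(bases)


# Hypothetical intron fragments (illustration only).
balanced = gc_density("GGCCAATT")
gc_rich = gc_density("gcgc")
```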


Author(s):  
SeonWoo Lee ◽  
HyeonTak Yu ◽  
HoJun Yang ◽  
InSeo Song ◽  
JaeHeung Yang ◽  
...  

Hypergravity accelerators are a type of large machinery used for gravity training or medical research. A failure of such large equipment can be a serious problem in terms of safety or costs. This paper proposes a prediction model that can proactively prevent failures that may occur in a hypergravity accelerator. The proposed method converts vibration signals to spectrograms and performs classification training using a deep learning model. An experiment was conducted to evaluate the performance of the proposed method. A 4-channel accelerometer was attached to the bearing housing, which is a rotor, and time-amplitude data were obtained from the measured values by sampling. The data were converted to a two-dimensional spectrogram, and classification training was performed using a deep learning model for four conditions of the equipment: Unbalance, Misalignment, Shaft Rubbing, and Normal. The experimental results showed that the proposed method had a 99.5% F1-Score, which was up to 23% higher than the 76.25% for existing feature-based learning models.
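The conversion of a time-amplitude signal into a two-dimensional spectrogram can be sketched with a short-time Fourier transform. The Python example below uses only the standard library; the frame size, hop length, Hann window, and synthetic sine signal are assumptions for illustration, since the abstract does not state the parameters that were used.

```python
import cmath
import math


def spectrogram(signal: list[float], frame_size: int, hop: int) -> list[list[float]]:
    """Magnitude spectrogram: one row per frame, one column per frequency bin.

    Each frame is tapered with a Hann window and transformed with a direct
    DFT, keeping only the non-negative frequency bins.
    """
    window = [0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        tapered = [x * w for x, w in zip(signal[start:start + frame_size], window)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            coefficient = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(tapered)
            )
            magnitudes.append(abs(coefficient))
        frames.append(magnitudes)
    return frames


# Synthetic vibration signal: a sine with exactly 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = spectrogram(signal, frame_size=32, hop=16)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

The resulting matrix can be treated as an image, which is what makes image-style deep learning classifiers applicable to vibration data. A production pipeline would normally use an FFT routine from a numerical library rather than a direct DFT.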


2021 ◽  
Vol 22 (S1) ◽  
Author(s):  
Cong Sun ◽  
Zhihao Yang ◽  
Lei Wang ◽  
Yin Zhang ◽  
Hongfei Lin ◽  
...  

Abstract Background: The recognition of pharmacological substances, compounds and proteins is essential for biomedical relation extraction, knowledge graph construction, drug discovery, and medical question answering. Although considerable efforts have been made to recognize biomedical entities in English texts, to date only a few limited attempts have been made to recognize them in biomedical texts in other languages. PharmaCoNER is a named entity recognition challenge to recognize pharmacological entities from Spanish texts. Because there are currently abundant resources in the field of natural language processing, how to leverage these resources for the PharmaCoNER challenge is a meaningful study. Methods: Inspired by the success of deep learning with language models, we compare and explore various representative BERT models to promote the development of the PharmaCoNER task. Results: The experimental results show that deep learning with language models can effectively improve model performance on the PharmaCoNER dataset. Our method achieves state-of-the-art performance on the PharmaCoNER dataset, with a max F1-score of 92.01%. Conclusion: For the BERT models on the PharmaCoNER dataset, biomedical domain knowledge has a greater impact on model performance than the native language (i.e., Spanish). The BERT models can obtain competitive performance by using WordPiece to alleviate the out-of-vocabulary limitation. The performance of the BERT model can be further improved by constructing a specific vocabulary based on domain knowledge. Moreover, the character case also has a certain impact on model performance.
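To show how WordPiece alleviates the out-of-vocabulary problem, the Python sketch below implements the greedy longest-match-first splitting that BERT-style tokenizers apply to a single word. The tiny vocabulary and the example word are hypothetical and far smaller than any real model vocabulary.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Split a word into the longest matching vocabulary pieces, left to right.

    Pieces that do not start the word carry the "##" continuation prefix.
    The whole word maps to the unknown token if any position cannot be matched.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical domain vocabulary (illustration only).
vocab = {"para", "##ce", "##tam", "##ol"}
known = wordpiece("paracetamol", vocab)
unknown = wordpiece("xyz", vocab)
```

A word missing from the vocabulary is still represented by meaningful subword units, and a vocabulary built from domain text tends to yield fewer and longer pieces for domain terms, which is consistent with the abstract's observation about domain-specific vocabularies.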

