Validation of a Semiautomated Natural Language Processing–Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance

2019 ◽  
pp. 1-9 ◽  
Author(s):  
Zhengyi Deng ◽  
Kanhua Yin ◽  
Yujia Bao ◽  
Victor Diego Armengol ◽  
Cathy Wang ◽  
...  

PURPOSE Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier that classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure. MATERIALS AND METHODS We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining followed by human review of identified studies, with the traditional procedure, which requires human review of all studies. Ten high-quality gene–cancer penetrance meta-analyses spanning 16 gene–cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage). RESULTS Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts v 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (132 of 142 studies identified) before reviewing references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies). CONCLUSION We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human effort in the literature review process.
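
The workload and coverage figures in this abstract follow from simple ratios. The sketch below recomputes them from the counts reported above. The function names are illustrative and are not taken from the paper, and it assumes that workload reduction is defined as one minus the ratio of abstracts reviewed under the two procedures.

```python
def workload_reduction(reviewed_semiauto: int, reviewed_traditional: int) -> float:
    """Fraction of human review avoided (assumed definition: 1 - ratio of abstracts reviewed)."""
    return 1 - reviewed_semiauto / reviewed_traditional


def coverage(identified: int, total: int) -> float:
    """Fraction of gold-standard studies that the procedure identified."""
    return identified / total


# Counts as reported in the abstract.
reduction = workload_reduction(2774, 16941)
cov_before = coverage(132, 142)  # before reviewing references
cov_after = coverage(141, 142)   # after reviewing references

print(f"workload reduction: {reduction:.0%}")
print(f"coverage before reference review: {cov_before:.0%}")
print(f"coverage after reference review: {cov_after:.0%}")
```

Rounded to whole percentages, these ratios match the 84%, 93%, and 99% stated in the abstract.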

2019 ◽  
Vol 56 (11) ◽  
pp. 718-726 ◽  
Author(s):  
Sabrina Talukdar ◽  
Lara Hawkes ◽  
Helen Hanson ◽  
Anjana Kulkarni ◽  
Angela F Brady ◽  
...  

Clinical testing with chromosomal microarray (CMA) is most commonly undertaken for clinical indications such as intellectual disability, dysmorphic features and/or congenital abnormalities. Identification of a structural aberration (SA) involving a cancer susceptibility gene (CSG) constitutes a type of incidental or secondary finding. Laboratory reporting, risk communication and clinical management of these structural aberrations with secondary implications (SASIs) are currently inconsistent. We undertook a meta-analysis of 18 622 instances of CMA performed for unrelated indications, in which 106 SASIs were identified involving a total of 40 different CSGs. Here we present the recommendations of a joint UK working group representing the British Society of Genomic Medicine, UK Cancer Genetics Group and UK Association for Clinical Genomic Science. SASIs are categorised into four groups, defined by the type of SA and the cancer risk. For each group, recommendations are provided regarding reflex parental testing and cancer risk management.


2019 ◽  
Author(s):  
Xin-yuan Zhang ◽  
Xiao-han Wei ◽  
Bao-jie Wang ◽  
Jun Yao

Abstract Background A growing number of studies report that genes participating in the repair of DNA double-strand breaks may be cancer-susceptibility genes. Rs1805377 (A>G) is a functional single nucleotide polymorphism (SNP) in the x-ray cross-complementing group 4 (XRCC4) gene that may be involved in the etiology of cancer. However, individual published studies have not yielded conclusive results. We therefore performed a meta-analysis to examine the association between the XRCC4 rs1805377 polymorphism and cancer risk. Methods Candidate studies were searched for in three online electronic databases (PubMed, Embase, and Web of Science), and eligible studies were selected according to the inclusion criteria. A pooled analysis was performed to explore the association between the XRCC4 rs1805377 locus and the risk of cancer. We also performed subgroup and sensitivity analyses. Results Twenty-three studies, comprising 9,433 cancer patients and 10,337 healthy controls, were included in our meta-analysis. The pooled results showed no association between rs1805377 and the risk of cancer. Under the dominant model, the pooled odds ratio (OR) was 1.115 (95% confidence interval: 0.956-1.301; P = 0.165) in a random effects model, which was not statistically significant. Subgroup analyses by ethnicity and source of controls likewise found no association between the rs1805377 polymorphism and cancer occurrence. In the subgroup analysis by cancer type, a significant association was found only in gastric antrum adenocarcinoma. Conclusions Our meta-analysis suggests that there is no association between the rs1805377 polymorphism and cancer occurrence. It may provide useful information for future studies on the etiology of cancer.
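
As a rough consistency check on the pooled estimate reported above, a two-sided P value can be back-calculated from an odds ratio and its 95% confidence interval. The sketch below assumes a normal approximation on the log-odds scale (a Wald-type interval). The abstract does not state that the authors derived their P value this way, and the function name is hypothetical.

```python
import math


def p_value_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                       z_crit: float = 1.959964) -> tuple[float, float]:
    """Back-calculate the z statistic and two-sided P value from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values reported in the abstract for the dominant model.
z, p = p_value_from_or_ci(1.115, 0.956, 1.301)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

Under these assumptions the result lands close to the reported P = 0.165. A small discrepancy is expected because the published odds ratio and interval limits are rounded to three decimal places.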


2020 ◽  
Vol 48 (6) ◽  
pp. 030006052092636
Author(s):  
Xin-yuan Zhang ◽  
Xiao-han Wei ◽  
Bao-jie Wang ◽  
Jun Yao

Objectives A growing number of studies have reported that genes involved in the repair of DNA double-strand breaks might be cancer-susceptibility genes. The x-ray cross-complementing group 4 gene (XRCC4) encodes a protein that functions in the repair of DNA double-strand breaks, and this meta-analysis aimed to investigate the relationship between the XRCC4 rs1805377 polymorphism and cancer occurrence. Methods We retrieved case–control studies that met the inclusion criteria from PubMed, Web of Science, Embase, and China National Knowledge Infrastructure databases. Associations between rs1805377 and cancer risk were evaluated by odds ratios (ORs) using a random effects model and 95% confidence intervals (CIs) as well as sensitivity and subgroup analyses. Results After inclusion criteria were met, the meta-analysis involved 24 studies that included 9,633 cancer patients and 10,544 healthy controls. No significant association was found between rs1805377 and the risk of cancer (pooled OR = 1.107; 95% CI = 0.955–1.284) in the dominant genetic model. Similarly, no significant association was observed in the subgroup analysis. Conclusions Through this meta-analysis, we found no association between the rs1805377 polymorphism and cancer occurrence. This may provide useful information for relevant future studies into the etiology of cancer.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.


Diabetes ◽  
2019 ◽  
Vol 68 (Supplement 1) ◽  
pp. 1243-P
Author(s):  
JIANMIN WU ◽  
FRITHA J. MORRISON ◽  
ZHENXIANG ZHAO ◽  
XUANYAO HE ◽  
MARIA SHUBINA ◽  
...  

Author(s):  
Pamela Rogalski ◽  
Eric Mikulin ◽  
Deborah Tihanyi

In 2018, we overheard many CEEA-ACEG members stating that they have "found their people"; this led us to wonder what makes this evolving community unique. Using cultural historical activity theory to view the proceedings of CEEA-ACEG 2004-2018 in comparison with the geographically and intellectually adjacent ASEE, we used both machine-driven (Natural Language Processing, NLP) and human-driven (literature review of the proceedings) methods. Here, we hoped to build on surveys, most recently by Nelson and Brennan (2018), to understand, beyond what members say about themselves, what makes the CEEA-ACEG community distinct, where it has come from, and where it is going. Engaging in the two methods of data collection quickly diverted our focus from an analysis of the data themselves to the characteristics of the data in terms of cultural historical activity theory. Our preliminary findings point to some unique characteristics of machine- and human-driven results, with the former, as might be expected, focusing on the micro-level (words and language patterns) and the latter on the macro-level (ideas and concepts). NLP generated data within the realms of "community" and "division of labour" while the review of proceedings centred on "subject" and "object"; both found "instruments," although NLP did so with greater granularity. With this new understanding of the relative strengths of each method, we have a revised framework for addressing our original question.


2020 ◽  
Author(s):  
Vadim V. Korolev ◽  
Artem Mitrofanov ◽  
Kirill Karpov ◽  
Valery Tkachenko

The main advantage of modern natural language processing methods is the ability to turn an amorphous human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent for COVID-19 by analyzing the PubMed archive.

