ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed

In recent years, the growing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at the functional and structural levels. Nevertheless, collecting manually curated data from the literature is a highly time-consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on thermodynamic data extracted from the literature. In the past, such data were deposited in the ProTherm database, which, however, has not been maintained since 2013. To facilitate the collection of protein thermodynamic data from the literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empirical score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database and tested on a new curated set of articles, manually selected for the presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts.

Availability: The ThermoScan server is freely accessible online at https://folding.biofold.org/thermoscan. The ThermoScan Python code and the Google Chrome extension for submitting visualized PMC web pages to the ThermoScan server are available at https://github.com/biofold/ThermoScan.
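The regular-expression scoring idea described in this abstract can be illustrated with a minimal sketch. This is not the ThermoScan implementation: the term lists, the weights, the threshold, and the function names `score_text` and `is_candidate` are all hypothetical, chosen only to show how matches for conceptual words, thermodynamic variables, and units might be combined into one empirical score.

```python
import re

# Hypothetical term groups; the actual ThermoScan patterns are not given
# in the abstract and are likely much richer than these.
CONCEPT_TERMS = re.compile(
    r"\b(?:unfolding|denaturation|thermal stability|melting temperature)\b",
    re.IGNORECASE,
)
# Longer alternatives come first so that "ΔΔG" is not counted as "ΔG".
VARIABLES = re.compile(r"(?:ΔΔG|ΔG|ΔH|\bTm\b|\bddG\b)")
UNITS = re.compile(r"(?:kcal\s*/\s*mol|kJ\s*/\s*mol|°C)")


def score_text(text, weights=(1.0, 2.0, 2.0)):
    """Return a weighted count of matches over the three term groups.

    The weights are assumptions for illustration, not published values.
    """
    counts = (
        len(CONCEPT_TERMS.findall(text)),
        len(VARIABLES.findall(text)),
        len(UNITS.findall(text)),
    )
    return sum(w * c for w, c in zip(weights, counts))


def is_candidate(text, threshold=5.0):
    """Flag a text as likely to report thermodynamic data (assumed cutoff)."""
    return score_text(text) >= threshold


if __name__ == "__main__":
    sample = (
        "Thermal denaturation gave a Tm of 65 °C and ΔG of 5.2 kcal/mol; "
        "the ΔΔG was -1.3 kcal/mol."
    )
    print(score_text(sample), is_candidate(sample))
```

In a setting like the one described, the weights and the threshold would be tuned on a labeled set of articles, as the abstract reports was done using publications from ProTherm.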


Investigating the impact of weakly supervised data on text mining models of publication transparency: a case study on randomized controlled trials

10.1101/2021.09.14.21263586 ◽

2021 ◽

Author(s):

Linh Hoang ◽

Lan Jiang ◽

Halil Kilicoglu

Keyword(s):

Text Mining ◽

Text Classification ◽

Controlled Trial ◽

Biomedical Literature ◽

Classification Model ◽

Major Barrier ◽

Weak Supervision ◽

Randomized Controlled ◽

The Impact ◽

Supervision Strategies

Lack of large quantities of annotated data is a major barrier in developing effective text mining models of biomedical literature. In this study, we explored weak supervision strategies to improve the accuracy of text classification models developed for assessing methodological transparency of randomized controlled trial (RCT) publications. Specifically, we used Snorkel, a framework to programmatically build training sets, and UMLS-EDA, a data augmentation method that leverages a small number of existing examples to generate new training instances, for weak supervision and assessed their effect on a BioBERT-based text classification model proposed for the task in previous work. Performance improvements due to weak supervision were limited and were surpassed by gains from hyperparameter tuning. Our analysis suggests that refinements to the weak supervision strategies to better handle the multi-label case could be beneficial.
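The idea of building training labels programmatically can be sketched in plain Python. This sketch does not use the Snorkel API, and it replaces Snorkel's learned label model with a simple majority vote; the labeling functions, their keyword cues, and the name `weak_label` are hypothetical and serve only to show how several noisy heuristics can be combined into one weak label per sentence.

```python
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


# Hypothetical labeling functions for a single transparency item
# ("does the sentence report how randomization was done?").
def lf_mentions_randomization(sentence):
    return POSITIVE if "randomi" in sentence.lower() else ABSTAIN


def lf_mentions_allocation(sentence):
    return POSITIVE if "allocation" in sentence.lower() else ABSTAIN


def lf_background_cue(sentence):
    # Sentences that summarize prior work rarely describe the trial itself.
    return NEGATIVE if sentence.lower().startswith("previous studies") else ABSTAIN


LABELING_FUNCTIONS = [
    lf_mentions_randomization,
    lf_mentions_allocation,
    lf_background_cue,
]


def weak_label(sentence):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence with equal support
    return ranked[0][0]


if __name__ == "__main__":
    print(weak_label("Participants were randomized using concealed allocation."))
```

Sentences that receive a non-abstain label could then be added to the training set of a downstream classifier. A multi-label task, which the abstract identifies as a difficulty, would need one such vote per label.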


Re-curation and Rational Enrichment of Knowledge Graphs in Biological Expression Language

10.1101/536409 ◽

2019 ◽

Author(s):

Charles Tapley Hoyt ◽

Daniel Domingo-Fernández ◽

Rana Aldisi ◽

Lingling Xu ◽

Kristian Kolpeja ◽

...

Keyword(s):

Text Mining ◽

Full Text ◽

Biomedical Literature ◽

Knowledge Graph ◽

Pubmed Central ◽

Link Type ◽

Information Density ◽

Manual Curation ◽

Rapid Accumulation ◽

Knowledge Graphs

The rapid accumulation of new biomedical literature not only causes curated knowledge graphs to become outdated and incomplete, but also makes manual curation an impractical and unsustainable solution. Automated or semi-automated workflows are necessary to assist in prioritizing and curating the literature to update and enrich knowledge graphs. We have developed two workflows: one for re-curating a given knowledge graph to assure its syntactic and semantic quality and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the knowledge graphs encoded in Biological Expression Language from the NeuroMMSig database using content that was pre-extracted from MEDLINE abstracts and PubMed Central full-text articles using text mining output integrated by INDRA. We have made this workflow freely available at https://github.com/bel-enrichment/bel-enrichment. Database URL: https://github.com/bel-enrichment/results


Text-mining clinically relevant cancer biomarkers for curation into the CIViC database

Genome Medicine ◽

10.1186/s13073-019-0686-y ◽

2019 ◽

Vol 11 (1) ◽

Cited By ~ 3

Author(s):

Jake Lever ◽

Martin R. Jones ◽

Arpad M. Danos ◽

Kilannin Krysiak ◽

Melika Bonakdar ◽

...

Keyword(s):

Open Access ◽

Text Mining ◽

Full Text ◽

Cancer Genomics ◽

Improve Patient Care ◽

Cancer Biomarkers ◽

Biomedical Literature ◽

Precision Oncology ◽

Creative Commons ◽

Manual Curation

Background: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation from skilled experts who read and interpret the relevant biomedical literature. Methods: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase. Results: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications. Conclusions: Through integration with CIViC, we provide a prioritized list of curatable clinically relevant cancer biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general. All data is publicly available and distributed with a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/.


Text-mining clinically relevant cancer biomarkers for curation into the CIViC database

10.1101/500686 ◽

2018 ◽

Author(s):

Jake Lever ◽

Martin R Jones ◽

Arpad M Danos ◽

Kilannin Krysiak ◽

Melika Bonakdar ◽

...

Keyword(s):

Open Access ◽

Text Mining ◽

Full Text ◽

Cancer Genomics ◽

Improve Patient Care ◽

Biomedical Literature ◽

Precision Oncology ◽

Manual Curation ◽

Cancer Types ◽

Improve Patient

Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation from skilled experts who read and interpret the relevant biomedical literature. To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated biomarkers and their clinical associations discussed in 800 sentences and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase (http://bionlp.bcgsc.ca/civicmine/), extracting 128,857 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 90,992 biomarkers associated with 7,866 genes, 402 drugs and 557 cancer types, representing 29,153 abstracts and 40,551 full-text publications. Through integration with CIViC, we provide a prioritised list of curatable biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general.


Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles

Knowledge Exploration in Life Science Informatics - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30478-4_9 ◽

2004 ◽

pp. 96-108 ◽

Cited By ~ 2

Author(s):

Eric P. G. Martin ◽

Eric G. Bremer ◽

Marie-Claude Guerin ◽

Catherine DeSesa ◽

Olivier Jouve

Keyword(s):

Text Mining ◽

Protein Interactions ◽

Full Text ◽

Biomedical Literature ◽

Protein Protein Interactions


The Effect of an Ultrasonic Field on the Dynamic Viscosity of Water

Vodosnabzhenie i sanitarnaia tehnika ◽

10.35776/mnp.2019.09.04 ◽

2019 ◽

pp. 27-29

Author(s):

P. Vikulin ◽

K. Khlopov ◽

M. Cherkashin

Keyword(s):

Dynamic Viscosity ◽

Experimental Studies ◽

Ultrasonic Field ◽

Maximum Effect ◽

Ultrasonic Frequency ◽

Ultrasonic Vibrations ◽

Wastewater Disposal ◽

Laboratory Setup ◽

Sonic Treatment ◽

The Impact

Water purification processes can be enhanced by various methods, including physical ones, in particular exposure to ultrasonic vibrations. A change in the dynamic viscosity of water affects the rate at which particles settle in an aqueous medium, which can be used in the treatment of natural water and wastewater. At the Department of Water Supply and Wastewater Disposal of the National Research Moscow State University of Civil Engineering, experimental studies were conducted under laboratory conditions to examine the effect of ultrasound on the dynamic viscosity of water. A laboratory setup was designed consisting of an ultrasonic frequency generator of the corresponding intensity, a transducer (concentrator) that transmits ultrasonic vibrations to the source water, and sonic treatment tanks. Experimental studies on the impact of the ultrasonic field in the cavitation mode on the dynamic viscosity of the aqueous medium were carried out, and the exposure time needed to achieve the maximum effect was determined.


Infoveillance based on Social Sensors to Analyze the impact of Covid19 in South American Population (Preprint)

10.2196/preprints.19337 ◽

2020 ◽

Cited By ~ 2

Author(s):

Josimar E. Chire Saire

Keyword(s):

Text Mining ◽

Electronic Communication ◽

Spanish Speakers ◽

South American ◽

South American Population ◽

The People ◽

Data Source ◽

The Impact ◽

General Concern ◽

Social Sensors

BACKGROUND: Infoveillance is an application of the field of infodemiology that aims to monitor public health and support the creation of public policies. Social sensors are people who share thoughts and ideas through electronic communication channels (i.e., the Internet). The current scenario involves tackling the impact of COVID-19 across the world: many countries have the infrastructure and scientists to help, and countries have taken actions to decrease the impact. South American countries have a different context in terms of economy, health and research, so infoveillance can be a useful tool for monitoring the situation and making better, more strategic decisions. The motivation of this work is to analyze the capitals of Spanish-speaking countries in South America using a text mining approach with Twitter as the data source. The preliminary results help to understand what happened two weeks ago and open the analysis to different perspectives, i.e., economic and social. OBJECTIVE: To analyze the behaviour of South American capitals in the face of the COVID-19 pandemic and to show the usefulness of a text mining approach for infoveillance tasks. METHODS: Text mining process. RESULTS: The capitals of Argentina and Venezuela have the largest number of posts during this period, in contrast to those of Bolivia, Ecuador and Uruguay. The most relevant users are related to mass media such as radio, television or newspapers. There is a general concern about COVID-19, but each country talks about different areas: economics, health, environmental impact. CONCLUSIONS: Infoveillance based on social sensors with data coming from Twitter can help to understand trends in the population of the capitals. In addition, the posts need to be filtered before the text is processed to obtain insights about frequency, top users and the most important terms. These data are useful for analysing the population from different perspectives. INTERNATIONAL REGISTERED REPORT: RR2-https://doi.org/10.1101/2020.04.06.20055749


Family Influences on Youth Offending

The Oxford Handbook of Developmental and Life-Course Criminology ◽

10.1093/oxfordhb/9780190201371.013.18 ◽

2018 ◽

pp. 377-403

Author(s):

Abigail A. Fagan ◽

Kristen M. Benedini

Keyword(s):

Life Course ◽

Family Environment ◽

Parenting Practices ◽

Experimental Studies ◽

The United States ◽

Parental Violence ◽

Youth Delinquency ◽

The Impact ◽

Children’s Exposure

This chapter reviews the degree to which empirical evidence demonstrates that families influence youth delinquency. Because they are most likely to be emphasized in life-course theories, this chapter focuses on parenting practices such as parental warmth and involvement, supervision and discipline of children, and child maltreatment. It also summarizes literature examining the role of children's exposure to parental violence, family criminality, and young (teenage) parents in affecting delinquency. Because life-course theories are ideally tested using longitudinal data, which allow examination of, in this case, the impact of parenting practices on children's subsequent behaviors, this chapter focuses on evidence generated from prospective studies conducted in the United States and other countries. It also discusses findings from experimental studies designed to reduce youth substance use and delinquency by improving the family environment.


Anthropomorphic Strategies Promote Wildlife Conservation through Empathy: The Moderation Role of the Public Epidemic Situation

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18073565 ◽

2021 ◽

Vol 18 (7) ◽

pp. 3565

Author(s):

Dan Yue ◽

Zepeng Tong ◽

Jianchi Tian ◽

Yang Li ◽

Linxiu Zhang ◽

...

Keyword(s):

College Students ◽

Behavioral Intention ◽

Wildlife Conservation ◽

Negative Emotion ◽

Public Awareness ◽

Negative Emotions ◽

Experimental Studies ◽

Wildlife Trade ◽

Illegal Wildlife Trade ◽

The Impact

The global illegal wildlife trade directly threatens biodiversity and leads to disease outbreaks and epidemics. In order to avoid the loss of endangered species and ensure public health security, it is necessary to intervene in illegal wildlife trade and promote public awareness of the need for wildlife conservation. Anthropomorphism is a basic and common psychological process in humans that plays a crucial role in determining how a person interacts with non-human agents. Previous research indicates that anthropomorphizing natural entities through metaphors could increase an individual's behavioral intention toward wildlife conservation. However, relatively little is known about the mechanism by which anthropomorphism influences behavioral intention and whether social context alters the effect of anthropomorphism. This research investigated the impact of negative emotions associated with a pandemic situation on the effectiveness of anthropomorphic strategies for wildlife conservation across two experimental studies. Experiment 1 recruited 245 college students online and asked them to read a combination of texts and pictures as anthropomorphic materials. The results indicated that anthropomorphic materials could increase participants’ empathy and decrease their wildlife product consumption intention. Experiment 2 recruited 140 college students online, who were required to read the same materials as in Experiment 1 after watching a video related to epidemics. The results showed that the effect of wildlife anthropomorphization vanished if participants’ negative emotion was aroused by the video. The present research provides experimental evidence that anthropomorphic strategies would be useful for boosting public support for wildlife conservation. However, policymakers and conservation organizations must be careful about the negative effects of the pandemic context, as the negative emotions produced by it seem to weaken the effectiveness of anthropomorphic strategies.


Image Statistics Preserving Encrypt-then-Compress Scheme Dedicated for JPEG Compression Standard

Entropy ◽

10.3390/e23040421 ◽

2021 ◽

Vol 23 (4) ◽

pp. 421

Author(s):

Dariusz Puchala ◽

Kamil Stokfiszewski ◽

Mykhaylo Yatsymirskyy

Keyword(s):

Statistical Analysis ◽

Image Encryption ◽

Experimental Studies ◽

Quality Measures ◽

Input Image ◽

Jpeg Compression ◽

Image Statistics ◽

Compression Stage ◽

Wide Range ◽

The Impact

In this paper, the authors analyze in more detail an image encryption scheme, proposed in their earlier work, which preserves input image statistics and can be used in connection with the JPEG compression standard. The image encryption process takes advantage of fast linear transforms parametrized with private keys and is carried out prior to the compression stage in a way that does not alter those statistical characteristics of the input image that are crucial for the subsequent compression. This feature makes the encryption process transparent to the compression stage and enables the JPEG algorithm to maintain its full compression capabilities even though it operates on the encrypted image data. The main advantage of the considered approach is that the JPEG algorithm can be used without any modifications as a part of the encrypt-then-compress image processing framework. The paper includes a detailed mathematical model of the examined scheme, allowing for theoretical analysis of the impact of the image encryption step on the effectiveness of the compression process. A combinatorial and statistical analysis of the encryption process is also included, which allows its cryptographic strength to be evaluated. In addition, the paper considers several practical use-case scenarios with different characteristics of the compression and encryption stages. The final part of the paper contains additional results of experimental studies regarding the general effectiveness of the presented scheme. The results show that for a wide range of compression ratios the considered scheme performs comparably to the JPEG algorithm alone, that is, without the encryption stage, in terms of the quality measures of reconstructed images. Moreover, the results of the statistical analysis, as well as those obtained with generally accepted quality measures for image cryptographic systems, demonstrate the high strength and efficiency of the scheme’s encryption stage.
