Words to Matter: De novo Architected Materials Design Using Transformer Neural Networks

Transformer neural networks have become widely used in a variety of AI applications, enabling significant advances in Natural Language Processing (NLP) and computer vision. Here we demonstrate the use of transformer neural networks in the de novo design of architected materials, using a unique approach based on text input that enables the design to be directed by descriptive text, such as “a regular lattice of steel”. Since transformer neural nets can convert data between distinct forms, including text into images, such methods have the potential to serve as a natural-language-driven tool for developing complex materials designs. In this study we use the Contrastive Language-Image Pre-Training (CLIP) and VQGAN neural networks in an iterative process to generate images that reflect text-prompt-driven materials designs. We then use the resulting images to generate three-dimensional models that can be realized using additive manufacturing, yielding physical samples of these text-based materials. We present several such word-to-matter examples and analyze the 3D-printed material specimens through additional finite element analysis, focusing especially on mechanical properties, including mechanism design. As an emerging field, such language-based design approaches can have a profound impact, including the use of transformer neural nets to generate machine code for 3D printing, optimization of processing conditions, and other end-to-end design environments that intersect directly with human language.
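
The abstract does not say how the generated images are turned into printable 3D models. One simple route, assumed here for illustration and not necessarily the authors' pipeline, is to threshold a grayscale image into solid and void pixels and extrude the solid pixels to a fixed height, giving a voxel model that could later be meshed for printing. The helper names `extrude_image` and `solid_fraction` and their parameters are hypothetical.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def extrude_image(gray: List[List[float]], threshold: float = 0.5,
                  height: int = 3) -> Set[Voxel]:
    """Hypothetical helper: treat pixels >= threshold (values in [0, 1]) as
    material and extrude each one `height` layers along z."""
    voxels: Set[Voxel] = set()
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if value >= threshold:  # pixel counts as solid material
                for z in range(height):
                    voxels.add((x, y, z))
    return voxels


def solid_fraction(gray: List[List[float]], threshold: float = 0.5) -> float:
    """Relative density of the extruded pattern = share of solid pixels."""
    total = sum(len(row) for row in gray)
    solid = sum(1 for row in gray for v in row if v >= threshold)
    return solid / total


if __name__ == "__main__":
    # A tiny 3x3 "lattice" image: a plus-shaped strut pattern (toy data).
    image = [
        [0.1, 0.9, 0.1],
        [0.9, 0.9, 0.9],
        [0.1, 0.9, 0.1],
    ]
    model = extrude_image(image, threshold=0.5, height=2)
    print(len(model))             # 10 (5 solid pixels x 2 layers)
    print(solid_fraction(image))  # 5 / 9
```

A real workflow would also need a meshing step (for example, marching cubes) and an export format such as STL before slicing; those are outside this sketch.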

Natural language processing for cognitive therapy: Extracting schemas from thought records

PLoS ONE ◽

10.1371/journal.pone.0257832 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0257832

Author(s):

Franziska Burger ◽

Mark A. Neerincx ◽

Willem-Paul Brinkman

Keyword(s):

Mental Health ◽

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Recurrent Neural Networks ◽

Support Vector ◽

Free Text ◽

Cognitive Approach ◽

Text Input

The cognitive approach to psychotherapy aims to change patients’ maladaptive schemas, that is, overly negative views of themselves, the world, or the future. To become aware of these views, patients record their thought processes in situations that caused pathogenic emotional responses. The schemas underlying such thought records have, thus far, been identified largely by hand. Using recent advances in natural language processing, we take this one step further by automatically extracting schemas from thought records. To this end, we asked 320 healthy participants on Amazon Mechanical Turk to each complete five thought records consisting of several utterances reflecting cognitive processes. Agreement between two raters on manually scoring the utterances with respect to how much they reflect each schema was substantial (Cohen’s κ = 0.79). Natural language processing software pretrained on all English Wikipedia articles from 2014 (GloVe embeddings) was used to represent words and utterances, which were then mapped to schemas using k-nearest neighbors algorithms, support vector machines, and recurrent neural networks. For the more frequently occurring schemas, all algorithms were able to leverage linguistic patterns. For example, the scores assigned to the Competence schema by the algorithms correlated with the manually assigned scores with Spearman correlations ranging between 0.64 and 0.76. For six of the nine schemas, a set of recurrent neural networks trained separately for each schema outperformed the other algorithms. We present our results here as a benchmark solution, since we conducted this research to explore the possibility of automatically processing qualitative mental health data and did not aim to achieve optimal performance with any of the explored models. The dataset of 1600 thought records comprising 5747 utterances is published together with this article for researchers and machine learning enthusiasts to improve upon our outcomes.
Based on our promising results, we see further opportunities for using free-text input and subsequent natural language processing in other common therapeutic tools, such as ecological momentary assessments, automated case conceptualizations, and, more generally, as an alternative to mental health scales.
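
As a rough illustration of the k-nearest-neighbors variant described above, the sketch below represents each utterance as the mean of its word vectors, predicts a schema score as the average rating of the k most cosine-similar training utterances, and evaluates predictions with a Spearman correlation. The three-dimensional toy vectors stand in for real GloVe embeddings and, like the helper names and the choice of cosine similarity and k, are assumptions; the paper's exact preprocessing is not given here.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def embed(utterance: str, vectors: Dict[str, Vector]) -> Vector:
    """Mean of the word vectors of known tokens (unknown words are skipped)."""
    known = [vectors[w] for w in utterance.lower().split() if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_score(query: Vector, train_x: List[Vector], train_y: List[float],
              k: int = 2) -> float:
    """Average schema score of the k most similar training utterances."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: cosine(query, train_x[i]), reverse=True)
    top = ranked[:k]
    return sum(train_y[i] for i in top) / len(top)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = avg
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for GloVe embeddings (hypothetical values).
    vectors = {
        "fail": [1.0, 0.0, 0.0], "stupid": [0.9, 0.1, 0.0],
        "capable": [-1.0, 0.0, 0.1], "good": [-0.8, 0.2, 0.0],
        "day": [0.0, 1.0, 0.0],
    }
    train = ["I always fail", "I am stupid", "I am capable", "good day"]
    scores = [3.0, 3.0, 0.0, 0.0]  # toy Competence-schema ratings
    train_x = [embed(u, vectors) for u in train]
    print(knn_score(embed("stupid fail", vectors), train_x, scores, k=2))  # 3.0
```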

Neural Language Modeling for Molecule Generation

10.26434/chemrxiv.14700831 ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Drug Design ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Language Modeling ◽

Machine Learning Methods

Generative neural networks have shown promising results in de novo drug design. Recent studies suggest that one efficient way to produce novel molecules matching target properties is to model SMILES sequences using deep learning, in a way similar to language modeling in natural language processing. In this paper, we present a survey of various machine learning methods for SMILES-based language modeling and present our benchmarking results on a standardized subset of the ChEMBL database.
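
SMILES-based language modeling treats a molecule string as a sequence of tokens and scores it token by token. A minimal stand-in for the neural models surveyed is sketched below; it is an assumption in two ways: it uses a count-based bigram model with add-one smoothing instead of an RNN or transformer, and a simplified tokenizer regex that keeps bracket atoms, the two-letter halogens Br/Cl and two-digit ring closures (`%NN`) as single tokens. All names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List

# Simplified SMILES tokenizer (assumption): bracket atoms, Br/Cl,
# two-digit ring-closure labels, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

BOS, EOS = "<s>", "</s>"


def tokenize(smiles: str) -> List[str]:
    return TOKEN_RE.findall(smiles)


class BigramSmilesLM:
    """Count-based bigram language model over SMILES tokens, add-one smoothed."""

    def __init__(self, corpus: List[str]):
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        vocab = {EOS}
        for smiles in corpus:
            tokens = [BOS] + tokenize(smiles) + [EOS]
            vocab.update(tokens[1:])
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1
        self.vocab = vocab  # tokens that may follow another token

    def prob(self, prev: str, nxt: str) -> float:
        """Smoothed P(nxt | prev)."""
        row = self.counts[prev]
        return (row[nxt] + 1) / (sum(row.values()) + len(self.vocab))

    def perplexity(self, smiles: str) -> float:
        """exp of the mean negative log-probability per predicted token."""
        tokens = [BOS] + tokenize(smiles) + [EOS]
        log_p = sum(math.log(self.prob(a, b)) for a, b in zip(tokens, tokens[1:]))
        return math.exp(-log_p / (len(tokens) - 1))


if __name__ == "__main__":
    lm = BigramSmilesLM(["CCO", "CCN", "CCC", "c1ccccc1"])
    print(tokenize("c1ccccc1Br"))
    print(lm.perplexity("CCO"), lm.perplexity("OOBr"))  # seen pattern scores lower
```

Benchmarks of generative models usually also report validity, uniqueness and novelty of sampled molecules, which require a cheminformatics toolkit to parse SMILES and are not shown here.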

Neural Language Modeling for Molecule Generation

10.26434/chemrxiv.14700831.v1 ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Natural Language Processing ◽

Drug Design ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Language Modeling ◽

Machine Learning Methods

Optimization of Recurrent Neural Networks on Natural Language Processing

Proceedings of the 2019 8th International Conference on Computing and Pattern Recognition ◽

10.1145/3373509.3373573 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jingyu Huang ◽

Yunfei Feng

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Recurrent Neural Networks

A Survey on Bias in Deep NLP

Applied Sciences ◽

10.3390/app11073184 ◽

2021 ◽

Vol 11 (7) ◽

pp. 3184

Author(s):

Ismael Garrido-Muñoz ◽

Arturo Montejo-Ráez ◽

Fernando Martínez-Santiago ◽

L. Alfonso Ureña-López

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Natural Language Processing ◽

Probability Distribution ◽

Natural Language ◽

Network Design ◽

Language Processing ◽

Deep Neural Networks ◽

Learning Processes ◽

Relevant Issue

Deep neural networks are the dominant approach in many areas of machine learning, including natural language processing (NLP). Thanks to the availability of large corpora and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as “pre-training”), versatile, high-performing models are released continuously for every new network design. These networks somehow learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in that collection. As pre-trained models have proven very useful for transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias formally and explore how it has been treated in several networks, in terms of detection and correction. In addition, available resources are identified and a strategy to deal with bias in deep NLP is proposed.
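
One widely used way to detect bias in word embeddings, given here as an illustrative example rather than the survey's own proposed strategy, is the Word Embedding Association Test (WEAT). A word w is scored by s(w, A, B), its mean cosine similarity to attribute set A minus its mean cosine similarity to attribute set B, and an effect size compares two target sets X and Y. The two-dimensional vectors below are invented toy data, and the use of the sample standard deviation is an assumption about the normalization.

```python
import math
import statistics
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return (statistics.mean(cosine(w, a) for a in A)
            - statistics.mean(cosine(w, b) for b in B))


def effect_size(X: Sequence[Vector], Y: Sequence[Vector],
                A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """Difference of mean associations of X and Y, divided by the sample
    standard deviation of the associations over X and Y combined."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (statistics.mean(sx) - statistics.mean(sy)) / statistics.stdev(sx + sy)


if __name__ == "__main__":
    # Invented 2-d "embeddings" with a built-in gender/occupation skew.
    emb = {
        "engineer": [1.0, 0.2], "scientist": [0.9, 0.3],
        "nurse": [0.2, 1.0], "teacher": [0.3, 0.9],
        "he": [1.0, 0.0], "man": [0.95, 0.1],
        "she": [0.0, 1.0], "woman": [0.1, 0.95],
    }
    X = [emb["engineer"], emb["scientist"]]
    Y = [emb["nurse"], emb["teacher"]]
    A = [emb["he"], emb["man"]]
    B = [emb["she"], emb["woman"]]
    print(effect_size(X, Y, A, B))  # positive: X leans toward A
```

For contextual models such as BERT, association tests are typically applied to sentence or token representations instead, which the survey discusses among its detection methods.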

Finite element analysis of Palmaz-Schatz stent and express stent using three dimensional models

2017 Third International Conference on Biosignals, Images and Instrumentation (ICBSII) ◽

10.1109/icbsii.2017.8082294 ◽

2017 ◽

Author(s):

S. Keerthana ◽

Visalakshi. Cho

Keyword(s):

Finite Element Analysis ◽

Finite Element ◽

Three Dimensional ◽

Element Analysis ◽

Dimensional Models ◽

Three Dimensional Models

Abstract PO-050: Identifying de novo stage IV breast cancer (DNIV) cases in Electronic Health Records (EHR) using natural language processing

10.1158/1557-3265.adi21-po-050 ◽

2021 ◽

Author(s):

Liwei Wang ◽

Karthik Giridhar ◽

Kimberly Corbin ◽

Brenda Ernst ◽

Sadia Choudhery ◽

...

Keyword(s):

Breast Cancer ◽

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

De Novo ◽

Stage IV ◽

Health Records ◽

Stage IV Breast Cancer ◽

Electronic Health

Natural Language Processing with Subsymbolic Neural Networks

Neural Network Perspectives on Cognition and Adaptive Robotics ◽

10.1201/9780367813239-8 ◽

2019 ◽

pp. 120-139

Author(s):

Risto Miikkulainen

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks

Methods of Information in Medicine ◽

10.3414/me17-01-0024 ◽

2017 ◽

Vol 56 (05) ◽

pp. 377-389 ◽

Cited By ~ 21

Author(s):

Xingyu Zhang ◽

Joyce Kim ◽

Rachel E. Patzer ◽

Stephen R. Pitts ◽

Aaron Patzer ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Emergency Department ◽

Logistic Regression ◽

Natural Language Processing ◽

Natural Language ◽

Hospital Admission ◽

Language Processing ◽

Predictive Accuracy ◽

Free Text

Summary

Objective: To describe and compare logistic regression and neural network modeling strategies for predicting hospital admission or transfer following initial presentation to Emergency Department (ED) triage, with and without the addition of natural language processing elements.

Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from the 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network (MLNN) models, which included natural language processing (NLP) and principal component analysis of the patient’s reason for visit. Ten-fold cross-validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model.

Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason-for-visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated an AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free-text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.

Conclusions: The predictive accuracy for hospital admission or transfer of patients who presented to ED triage was good overall, and it improved with the inclusion of free-text data from a patient’s reason for visit, regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
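
The evaluation protocol in this abstract, ten-fold cross-validation with discrimination summarized by the AUC, can be sketched without an ML library. This sketch assumes plain, unstratified folds and computes the AUC as the probability that a randomly chosen admitted patient receives a higher score than a randomly chosen discharged one (ties count one half), which equals the area under the ROC curve. The function names are hypothetical and no model fitting is shown.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, then return (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, sorted(test)))
    return splits


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    for train, test in k_fold_indices(25, k=10)[:2]:
        print(len(train), len(test))
```

In practice each fold's model would be fitted on `train`, scored on `test`, and the per-fold (or pooled) AUCs reported. The double loop is O(n_pos × n_neg); at the scale of 47,200 visits a sort-based rank computation would be the practical choice.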

Combining Natural Language Processing and Metabarcoding to Reveal Pathogen-Environment Associations

10.1101/2020.09.02.280578 ◽

2020 ◽

Author(s):

David C. Molik ◽

DeAndre Tomlinson ◽

Shane Davitt ◽

Eric L. Morgan ◽

Benjamin Roche ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Cryptococcus Neoformans ◽

Language Processing ◽

Ribosomal RNA ◽

Internal Transcribed Spacer ◽

De Novo ◽

Ecological Niches ◽

Sequence Read Archive ◽

NCBI Sequence Read Archive

Abstract: Cryptococcus neoformans is responsible for life-threatening infections that primarily affect immunocompromised individuals and has an estimated worldwide burden of 220,000 new cases each year (with 180,000 resulting deaths), mostly in sub-Saharan Africa. Surprisingly, little is known about the ecological niches occupied by C. neoformans in nature. To expand our understanding of the distribution and ecological associations of this pathogen, we implement a Natural Language Processing approach to better describe the niche of C. neoformans. We use a Latent Dirichlet Allocation model to perform de novo topic modeling on sets of metagenetic research articles written about varied subjects, which either explicitly mention, inadvertently find, or fail to find C. neoformans. These articles are all linked to NCBI Sequence Read Archive datasets of 18S ribosomal RNA and/or Internal Transcribed Spacer gene regions. The number of topics was determined based on the model coherence score, and articles were assigned to the created topics via a Machine Learning approach with a Random Forest algorithm. Our analysis provides support for a previously suggested linkage between C. neoformans and soils associated with decomposing wood. Our approach (searching single-locus metagenetic data, gathering papers connected to the datasets, determining the topics and the number of topics de novo, and assigning articles to the topics) illustrates how such an analysis pipeline can harness large-scale datasets that are published/available but not necessarily fully analyzed, or whose metadata is not harmonized with other studies. Our approach can be applied to a variety of systems to assert potential evidence of environmental associations.

Author Summary: Our finding that C. neoformans is associated with decomposing wood is reinforced by the general literature on C. neoformans and its close congeneric relatives and warrants further investigation. This work demonstrates the potential utility of pairing Natural Language Processing (NLP) with single-locus metagenetic data for the study of Neglected Tropical Diseases. We present a novel method to study the ecological niches of rare pathogens that leverages the immense amount of data available to researchers in the NCBI Sequence Read Archive (SRA), combined with a text-mining analysis based on Natural Language Processing. We demonstrate that text processing, noun identification, and verb identification can play an important role in analyzing a large corpus of documents together with metagenetic data. Forging this connection requires access to all of the available ecological 18S ribosomal RNA and Internal Transcribed Spacer NCBI SRA datasets. These datasets use metabarcoding to query taxonomic diversity in eukaryotic organisms, and in the case of the Internal Transcribed Spacer, they specifically target Fungi. The presence of specific species is inferred when diagnostic 18S or ITS gene-region sequences are found in the SRA data. We searched for C. neoformans in all available 18S and ITS datasets and gathered all associated journal articles that either cite the SRA data accessions or are cited in them.

Published metagenetic data often have associated metadata, including latitude and longitude, temperature, and other physical characteristics describing the conditions in which the metagenetic sample was collected. These metadata are not always presented in consistent formats, so harmonizing study methods may be needed to compare metagenetic data appropriately, as is commonly required in meta-analysis studies. We present an analysis that takes as input articles associated with SRA datasets found to contain evidence of C. neoformans. We apply NLP methods to this corpus of articles to describe the niche of C. neoformans. Our results reinforce the current understanding of C. neoformans’s niche, indicating the pertinence of employing an NLP analysis to identify the niche of an organism. This approach could further the description of virtually any other organism that routinely appears in metagenetic surveys, especially pathogens whose ecological niches are unknown or poorly understood.

Optional Striking Image: Cryptococcus neoformans cells budding. Image provided courtesy of Felipe H. Santiago-Tirado, colored by Kristina Davis, CC-BY 4.0.
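
Latent Dirichlet Allocation is usually fitted with a library implementation; the compact collapsed Gibbs sampler below only illustrates how each word token's topic is resampled from p(z = t | rest) ∝ (n_dt + α)(n_tw + β)/(n_t + Vβ), where n_dt counts tokens of topic t in document d, n_tw counts word w in topic t, and V is the vocabulary size. The toy corpus, the hyperparameters α and β, and the function name are assumptions, and the sketch does not cover the coherence-based choice of topic count or the Random Forest assignment step described above.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> Tuple[Matrix, Matrix, List[str]]:
    """Collapsed Gibbs sampling for LDA; returns (theta, phi, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dt = [[0] * n_topics for _ in docs]        # document-topic counts
    n_tw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    n_t = [0] * n_topics                         # tokens per topic
    z: List[List[int]] = []                      # topic of every token
    for d, doc in enumerate(docs):               # random initialization
        z_d = []
        for w in doc:
            t = rng.randrange(n_topics)
            z_d.append(t)
            n_dt[d][t] += 1
            n_tw[t][w_id[w]] += 1
            n_t[t] += 1
        z.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = w_id[w], z[d][i]
                # remove the token's current assignment from the counts
                n_dt[d][t] -= 1
                n_tw[t][v] -= 1
                n_t[t] -= 1
                weights = [(n_dt[d][k] + alpha) * (n_tw[k][v] + beta)
                           / (n_t[k] + V * beta) for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][v] += 1
                n_t[t] += 1
    theta = [[(row[k] + alpha) / (sum(row) + n_topics * alpha)
              for k in range(n_topics)] for row in n_dt]
    phi = [[(n_tw[k][v] + beta) / (n_t[k] + V * beta) for v in range(V)]
           for k in range(n_topics)]
    return theta, phi, vocab


if __name__ == "__main__":
    docs = [["wood", "soil", "decay", "wood"], ["soil", "wood", "decay"],
            ["river", "water", "sediment"], ["water", "river", "water"]]
    theta, phi, vocab = lda_gibbs(docs, n_topics=2, iters=100)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(k, [vocab[v] for v in top])
```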