scholarly journals Automated Ontology Building for News Text using Associative Word Properties

It is known in the literature that ontology had been used extensively in machine learning for performance enhancement for text retrieval. It is also shown that robust ontology with detailed description of the domain knowledge will contribute to the accuracy in the retrieval. Nevertheless, we argue in some domain such as news text retrieval, building an ontology manually can be costly for a large-scale news repository and especially with the changes in content due to the dynamic events. In addition, maintenance can be a dauting task to keep up with new words that are associated with new events. This paper demonstrates the attempt to fully automate the development of an ontology for identifying the news domain and its subdomain. The ontology specification is defined based on the needs of the accuracy in retrieval. The mechanism of generating the ontology specification is defined and the results of the retrieval performance is discussed.

2020 ◽  
Vol 52 (1) ◽  
pp. 477-508 ◽  
Author(s):  
Steven L. Brunton ◽  
Bernd R. Noack ◽  
Petros Koumoutsakos

The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from experiments, field measurements, and large-scale simulations at multiple spatiotemporal scales. Machine learning (ML) offers a wealth of techniques to extract information from data that can be translated into knowledge about the underlying fluid mechanics. Moreover, ML algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of ML for fluid mechanics. We outline fundamental ML methodologies and discuss their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experiments, and simulations. ML provides a powerful information-processing framework that can augment, and possibly even transform, current lines of fluid mechanics research and industrial applications.


BMC Genomics ◽  
2019 ◽  
Vol 20 (S11) ◽  
Author(s):  
Tianle Ma ◽  
Aidong Zhang

Abstract Background Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promises in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. Results We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. Conclusions To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.


2021 ◽  
Vol 11 (5) ◽  
pp. 2424
Author(s):  
Lin Pey Fan ◽  
Tzu How Chu

The curation design of cultural heritage sites, such as museums, influence the level of visitor satisfaction and the possibility of revisitation; therefore, an efficient exhibit layout is critical. The difficulty of determining the behavior of visitors and the layout of galleries means that exhibition layout is a knowledge-intensive, time-consuming process. The progressive development of machine learning provides a low-cost and highly flexible workflow in the management of museums, compared to traditional curation design. For example, the facility’s optimal layout, floor, and furniture arrangement can be obtained through the repeated adjustment of artificial intelligence algorithms within a relatively short time. In particular, an optimal planning method is indispensable for the immense and heavy trains in the railway museum. In this study, we created an innovative strategy to integrate the domain knowledge of exhibit displaying, spatial planning, and machine learning to establish a customized recommendation scheme. Guided by an interactive experience model and the morphology of point–line–plane–stereo, we obtained three aspects (visitors, objects, and space), 12 dimensions (orientation, visiting time, visual distance, centrality, main path, district, capacity, etc.), 30 physical principles, 24 suggestions, and five main procedures to implement layout patterns and templates to create an exhibit layout guide for the National Railway Museum of Taiwan, which is currently being transferred from the railway workshop for the sake of preserving the rail culture heritage. Our results are suitable and extendible to different museums by adjusting the criteria used to establish a new recommendation scheme.


2020 ◽  
Author(s):  
Jin Soo Lim ◽  
Jonathan Vandermause ◽  
Matthijs A. van Spronsen ◽  
Albert Musaelian ◽  
Christopher R. O’Connor ◽  
...  

Restructuring of interface plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt very different composition and morphology at surfaces compared to the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast and large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

<p>Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as freely accessible and easy-to-use web application.</p>


2019 ◽  
Author(s):  
Ryther Anderson ◽  
Achay Biong ◽  
Diego Gómez-Gualdrón

<div>Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations. Namely, argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF (e.g. geometric properties and chemical moieties), and adsorbate (e.g. forcefield parameters and geometry) descriptors. Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.</div>


2019 ◽  
Vol 19 (1) ◽  
pp. 4-16 ◽  
Author(s):  
Qihui Wu ◽  
Hanzhong Ke ◽  
Dongli Li ◽  
Qi Wang ◽  
Jiansong Fang ◽  
...  

Over the past decades, peptide as a therapeutic candidate has received increasing attention in drug discovery, especially for antimicrobial peptides (AMPs), anticancer peptides (ACPs) and antiinflammatory peptides (AIPs). It is considered that the peptides can regulate various complex diseases which are previously untouchable. In recent years, the critical problem of antimicrobial resistance drives the pharmaceutical industry to look for new therapeutic agents. Compared to organic small drugs, peptide- based therapy exhibits high specificity and minimal toxicity. Thus, peptides are widely recruited in the design and discovery of new potent drugs. Currently, large-scale screening of peptide activity with traditional approaches is costly, time-consuming and labor-intensive. Hence, in silico methods, mainly machine learning approaches, for their accuracy and effectiveness, have been introduced to predict the peptide activity. In this review, we document the recent progress in machine learning-based prediction of peptides which will be of great benefit to the discovery of potential active AMPs, ACPs and AIPs.


Sign in / Sign up

Export Citation Format

Share Document