Toward Automated Large-Scale Information Integration and Discovery

Author(s):  
Paul Brown ◽  
Peter Haas ◽  
Jussi Myllymaki ◽  
Hamid Pirahesh ◽  
Berthold Reinwald ◽  
...


2020 ◽
Vol 18 (1) ◽  
Author(s):  
Bo-Ya Ji ◽  
Zhu-Hong You ◽  
Han-Jing Jiang ◽  
Zhen-Hao Guo ◽  
Kai Zheng

Abstract Background The prediction of potential drug-target interactions (DTIs) not only provides a better understanding of biological processes but is also critical for identifying new drugs. However, because traditional experiments are expensive and time-consuming, only a small fraction of the drug-target interactions recorded in databases have been verified experimentally. It is therefore important to develop new computational methods that predict DTIs with good performance. At present, many existing computational methods use only a single type of interaction between drugs and proteins, without attending to associations with, and influences of, other types of molecules. Methods In this work, we developed a novel network embedding-based heterogeneous information integration model to predict potential drug-target interactions. First, a heterogeneous multi-molecular information network is built by combining the known associations among proteins, drugs, lncRNAs, diseases, and miRNAs. Second, the Large-scale Information Network Embedding (LINE) model is used to learn the behavior information (associations with other nodes) of drugs and proteins in the network. Each known drug-protein interaction pair can then be represented as a combination of attribute information (e.g., protein sequence information and drug molecular fingerprints) and the behavior information of the two molecules. Third, a Random Forest classifier is used for training and prediction. Results Under five-fold cross-validation, our method obtained 85.83% prediction accuracy with 80.47% sensitivity at an AUC of 92.33%. Moreover, in case studies of three common drugs, 8 (Caffeine), 7 (Clozapine), and 6 (Pioglitazone) of the top 10 candidate targets were verified to be associated with the corresponding drug. Conclusions These results indicate that our method can be a powerful tool for predicting potential drug-target interactions, finding unknown targets for certain drugs, and finding unknown drugs for certain targets.
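As a rough, hedged sketch of the pipeline this abstract describes, the snippet below concatenates attribute features with behavior embeddings and scores drug-protein pairs with a Random Forest under five-fold cross-validation. All arrays are synthetic stand-ins: in the actual method, the behavior vectors come from a LINE model trained on the heterogeneous network, and the attribute vectors from protein sequence descriptors and molecular fingerprints.

```python
# Minimal sketch: attribute features + behavior embeddings -> Random Forest.
# Synthetic data only; feature dimensions are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_pairs = 1000
drug_attr = rng.normal(size=(n_pairs, 167))   # e.g. molecular fingerprints
prot_attr = rng.normal(size=(n_pairs, 188))   # e.g. sequence descriptors
drug_beh = rng.normal(size=(n_pairs, 64))     # LINE embedding of the drug node
prot_beh = rng.normal(size=(n_pairs, 64))     # LINE embedding of the protein node
X = np.hstack([drug_attr, prot_attr, drug_beh, prot_beh])
y = rng.integers(0, 2, size=n_pairs)          # 1 = known interaction

aucs = []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train], y[train])
    aucs.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))
print(f"mean AUC over 5 folds: {np.mean(aucs):.3f}")
```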


Author(s):  
Paulien Hogeweg

Biological evolution is a multilevel process and should be studied as such. A first, important step in studying evolution in this way has been the work of Peter Schuster and co-workers on RNA evolution. For RNA, the genotype-phenotype mapping can be calculated explicitly. The resulting evolutionary dynamics is dominated by neutral paths and by the potential for major change through a single point mutation. Examining whole genomes, of which about 60 are now available, we see that the gene content of genomes changes relatively rapidly: gene duplication, gene loss, and gene generation are ubiquitous. In fact, it seems that point mutations play a relatively minor role in adaptive evolution compared with changes in gene regulation and gene content. Large-scale microarray studies, in which the expression of every gene can be measured simultaneously, give a first glimpse of the 'division of labor' between duplicated genes. A preliminary analysis suggests that differential expression is often the primary event that allows duplicated genes to be maintained in a genome, but alternative routes also exist, most notably the mere need for a large amount of product on the one hand, and differentiation within multi-protein complexes consisting of homologous genes on the other. I will discuss these results in terms of multilevel evolution, in particular in terms of information integration and the alternatives of 'individual based'


2008 ◽  
Vol 14 (2) ◽  
pp. 253-281
Author(s):  
XABIER ARTOLA ◽  
AITOR SOROA

Abstract The design and construction of lexical resources is a critical issue in Natural Language Processing (NLP). Real-world NLP systems need large-scale lexica, which provide rich information about words and word senses at all levels: morphologic, syntactic, lexical semantics, etc., but the construction of lexical resources is a difficult and costly task. The last decade has been highly influenced by the notion of reusability, that is, the use of the information in existing lexical resources when constructing new ones. It is unrealistic, however, to expect that the great variety of available lexical information resources could be converted into a single, standard representation schema in the near future. The purpose of this article is to present the ELHISA system, a software architecture for the integration of heterogeneous lexical information. We address, from the point of view of the information integration area, the problem of querying very different existing lexical information sources using a unique, common query language. The integration in ELHISA is performed in a logical way, so that the lexical resources do not undergo any modification when they are integrated into the system. ELHISA is primarily defined as a consultation system for accessing structured lexical information, and therefore it does not have the capability to modify or update the underlying information. For this purpose, a General Conceptual Model (GCM) for describing diverse lexical data has been conceived. The GCM establishes a fixed vocabulary describing objects in the lexical information domain, their attributes, and the relationships among them. To integrate the lexical resources into the federation, a Source Conceptual Model (SCM) is built on top of each one, representing the lexical objects occurring in that particular source. To answer user queries, ELHISA must access the integrated resources and hence translate the query expressed in GCM terms into queries formulated in terms of the SCM of each source. The relation between the GCM and the SCMs is explicitly described by means of mapping rules called Content Description Rules. Data integration at the extensional level is achieved by means of a data cleansing process, which is needed to compare data arriving from different sources; in this process, the object identification step is carried out. Based on this architecture, a prototype of ELHISA has been built, and five resources covering a broad scope have so far been integrated into it for testing purposes. The fact that such heterogeneous resources have been integrated with ease into the system shows, in the opinion of the authors, the suitability of the approach taken.
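To make the mediation idea concrete, here is a minimal sketch, with wholly hypothetical source and attribute names, of how Content Description Rules could rewrite a query over the General Conceptual Model into per-source queries over each Source Conceptual Model; the real ELHISA rules are far richer than this simple attribute renaming.

```python
# Hedged sketch of logical mediation: a GCM query is rewritten into one
# query per source using per-source mapping rules. All names are invented.

# A GCM query: find lexical entries with a given lemma and part of speech.
gcm_query = {"object": "LexicalEntry", "lemma": "bank", "pos": "noun"}

# Stand-ins for Content Description Rules: how each source's schema (SCM)
# renders GCM objects and attributes.
mapping_rules = {
    "wordnet_like": {"object": "synset_word", "lemma": "word_form", "pos": "ss_type"},
    "mrd_like":     {"object": "headword",    "lemma": "entry",     "pos": "category"},
}

def translate(gcm_q, rules):
    """Rewrite a GCM query into per-source SCM queries."""
    per_source = {}
    for source, rule in rules.items():
        scm_q = {rule.get(k, k): v for k, v in gcm_q.items() if k != "object"}
        scm_q["object"] = rule["object"]
        per_source[source] = scm_q
    return per_source

for src, q in translate(gcm_query, mapping_rules).items():
    print(src, "->", q)
```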


1997 ◽  
Vol 06 (03n04) ◽  
pp. 241-268 ◽  
Author(s):  
Anisoara Nica ◽  
Elke A. Rundensteiner

Challenging issues for processing queries specified over large-scale information spaces (for example, Digital Libraries or the World Wide Web) include the diversity of the information sources in terms of their structures, query interfaces and search capabilities, as well as the dynamics of sources continuously being added, removed or upgraded. In this paper, we give an innovative solution for query planning in such environments. The foundation of our solution is the Dynamic Information Integration Model (DIIM), which supports the specification not only of the content but also of the capabilities of resources, without requiring the establishment of a uniform integration schema. Besides the development of the DIIM model, contributions of this paper include: (1) the introduction of the notion of fully specified queries that are semantically equivalent to a loosely specified query; (2) a translation algorithm from a loosely specified query into a set of semantically equivalent feasible query plans that are consistent with the binding patterns of the query templates of the individual sources (capability descriptions in DIIM) and with the interrelationships between information sources (expressed as join constraints in DIIM); and (3) a search restriction algorithm for optimizing query processing by pruning the search space to the subspace relevant to a query. The plans obtained by the proposed query planning process, which is composed of the search restriction and translation algorithms, can be shown to be semantically equivalent to the initial loosely specified input query.
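The following toy sketch, with invented sources and attributes, illustrates the binding-pattern feasibility check that such a translation algorithm must perform: a source is callable only once every attribute its query template requires as bound is available, either from the initial query or from the outputs of earlier sources in the plan.

```python
# Illustrative binding-pattern check (all names hypothetical): enumerate
# source orderings in which each source's required inputs are bound before
# it is called, and which ultimately bind every attribute the query wants.
from itertools import permutations

# Each source: (inputs that must be bound, outputs it can produce).
sources = {
    "S1": ({"title"}, {"author"}),
    "S2": ({"author"}, {"affiliation"}),
    "S3": ({"affiliation"}, {"address"}),
}

def feasible_plans(initial_bound, wanted):
    """Enumerate source orderings whose binding patterns are satisfied."""
    plans = []
    for order in permutations(sources):
        bound = set(initial_bound)
        ok = True
        for s in order:
            needs, gives = sources[s]
            if not needs <= bound:   # some required input is still unbound
                ok = False
                break
            bound |= gives           # outputs become available downstream
        if ok and wanted <= bound:
            plans.append(order)
    return plans

# Loosely specified query: title is given, address is wanted.
print(feasible_plans({"title"}, {"address"}))   # -> [('S1', 'S2', 'S3')]
```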


2020 ◽  
Author(s):  
D. Lu ◽  
I. Pappas ◽  
D. K. Menon ◽  
E. A. Stamatakis

Abstract Human brains interpret external stimuli based on internal representations. One untested hypothesis is that the default-mode network (DMN), while responsible for internally oriented cognition, can also encode externally oriented information. The unique neuroanatomical and functional fingerprint of the posterior part of the DMN supports a prominent role for the precuneus in this process. Utilising imaging data from 100 participants during two tasks, we found that the precuneus is functionally divided into dorsal and ventral subdivisions, each differentially connecting to internally and externally oriented networks. The strength and direction of their connectivity are modulated by task difficulty in a manner dictated by the balance of internal versus external cognitive demands. Our study provides evidence that the medial posterior part of the DMN may drive interactions between large-scale networks, potentially allowing access to stored representations for moment-to-moment interpretation of an ever-changing environment.


2018 ◽  
Vol 14 (05) ◽  
pp. 93
Author(s):  
Jin Wang ◽  
Hua Shao

When a wireless sensor network is used to perform real-time security monitoring inside a building, it faces drawbacks such as multi-path signal fading and difficulty in spectrum sensing. To address these problems, this paper proposes an improved signal spectrum sensing algorithm based on the support vector machine (SVM), which mitigates the impact of low signal-to-noise-ratio (SNR) environments on the transmission of wireless sensor signals through embedded cyclostationary characteristic parameters. On this basis, and considering the low efficiency and poor fault tolerance of multi-task monitoring and scheduling inside a building, the paper also proposes a multi-task coordination and scheduling algorithm based on physical information integration, which achieves multi-task scheduling and execution through intelligent decomposition and prioritization of general tasks. Simulation tests show that, compared with the artificial neural network (ANN) algorithm and the maximum-minimum eigenvalue (MME) algorithm, the proposed algorithm has a much better spectrum sensing effect under low SNR, takes less computation time, and achieves higher accuracy in large-scale multi-task coordination and scheduling. These conclusions can provide new ideas for the application of wireless sensor networks in intelligent building security monitoring.
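As a hedged illustration of the classification step only (not the paper's actual feature extraction), the sketch below trains an SVM to separate "channel occupied" from "noise only" samples using synthetic stand-ins for cyclostationary feature parameters; a real system would compute spectral-correlation features from IQ samples.

```python
# Minimal SVM-based spectrum sensing sketch. The 4-dimensional feature
# vectors are synthetic placeholders for cyclostationary parameters.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 2000
noise = rng.normal(0.0, 1.0, size=(n // 2, 4))    # noise-only channel
signal = rng.normal(0.8, 1.0, size=(n // 2, 4))   # occupied channel: features shifted
X = np.vstack([noise, signal])
y = np.array([0] * (n // 2) + [1] * (n // 2))     # 1 = signal present

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("sensing accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```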


2011 ◽  
Vol 327 ◽  
pp. 203-209
Author(s):  
Xia Jie Jin ◽  
Cai Xing Lin

As an important part of engineering project management, an information integration management system is a public service platform for integrated management of the whole system; in some ways, engineering project management amounts to project information management. Drawing on information integration theory, this paper analyzes the important role of the owner in information integration management and proposes that the information integration management of large-scale chemical engineering projects should take the owner as the main integrator and controller. The paper then investigates the basic conditions for chemical engineering project information integration and, on this basis, builds an owner-centered, three-layer information integration management system for large-scale chemical engineering projects, analyzing the construction and implementation of the data storage layer, the data management and sharing layer, and the application layer. This provides constructive guidance for the information integration management of chemical engineering projects.


2021 ◽  
Vol 13 (14) ◽  
pp. 7937
Author(s):  
Tingchen Fang ◽  
Yiming Zhao ◽  
Jian Gong ◽  
Feiliang Wang ◽  
Jian Yang

Recently, the digital operation and maintenance of large-scale public venues have received increasing attention. The traditional building automation system (BAS), which can only provide information in a non-visualized way, is incapable of meeting the complex requirements of modern operation and maintenance. Therefore, 3D-based building information modeling (BIM) technology is needed to improve operation and maintenance efficiency. In this paper, a combined BAS-to-BIM strategy is introduced, and the BIM-based maintenance object framework for large-scale public venues is rebuilt. Conversion and lightweighting methods for the BIM maintenance model are introduced, and a new public protocol is proposed that provides a unified protocol layer serving the BIM model. In addition, this article presents the application of technologies such as virtual/mixed reality to improve the convenience of operation and maintenance. Finally, a practical project, a snow-sports stadium, is given as an example to elaborate on the benefits of the proposed method. The example indicates that functions introduced by BIM technology, such as information integration, visualization, and positioning, can effectively improve the quality and efficiency of project operation and maintenance.
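To illustrate what a unified protocol layer might look like, the sketch below, with invented protocol payloads and field names, normalizes readings from heterogeneous BAS protocols into one record shape keyed by a BIM element identifier; the paper's actual protocol is not reproduced here.

```python
# Hedged sketch: heterogeneous BAS readings normalized into one record
# keyed by an IFC GlobalId, so live values can annotate the BIM model.
# Payload shapes, registries, and GUIDs are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class UnifiedReading:
    bim_guid: str      # IFC GlobalId of the element the sensor serves
    quantity: str      # e.g. "temperature"
    value: float
    unit: str

def from_bacnet(msg: dict) -> UnifiedReading:
    # Map a hypothetical BACnet-style payload to the unified record.
    return UnifiedReading(msg["mapped_guid"], msg["object_name"],
                          msg["present_value"], msg["units"])

def from_modbus(msg: dict, registry: dict) -> UnifiedReading:
    # Modbus exposes only register addresses; a registry adds semantics.
    meta = registry[msg["register"]]
    return UnifiedReading(meta["guid"], meta["quantity"],
                          msg["raw"] * meta["scale"], meta["unit"])

readings = [
    from_bacnet({"mapped_guid": "2O2Fr$t4X7Zf8NOew3FLOH", "object_name": "temperature",
                 "present_value": 21.5, "units": "degC"}),
    from_modbus({"register": 40001, "raw": 215},
                {40001: {"guid": "1xS3BCk291UvhgP2dvNsgp", "quantity": "temperature",
                         "scale": 0.1, "unit": "degC"}}),
]
for r in readings:
    print(r)
```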


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3583
Author(s):  
Zhansheng Liu ◽  
Xintong Meng ◽  
Zezhong Xing ◽  
Antong Jiang

Safety management in hoisting is a key issue determining the development of prefabricated building construction. However, safety management in the hoisting stage lacks a truly effective method of information-physical fusion, and hoisting safety risk analysis does not consider the interaction of risk factors. In this paper, a hoisting safety risk management framework based on digital twin (DT) is presented, and a digital-twin hoisting safety risk coupling model is built. The proposed model integrates the Internet of Things (IoT), Building Information Modeling (BIM), and a safety risk analysis method combining the Apriori algorithm with complex network analysis. Real-time perception and virtual-real interaction of multi-source information in the hoisting process are realized, the association rules and coupling relationships among hoisting safety risk factors are mined, and the time-varying data are visualized. A demonstration in the construction of a large-scale prefabricated building shows that, with the proposed framework, it is possible to fuse information between the hoisting site and the virtual model and to realize visual management. The correlations among hoisting construction safety risk factors are analyzed and the key control factors identified. Moreover, the efficiency of information integration and sharing is improved, the gap in coupling analysis of safety risk factors is filled, and effective safety management and decision-making are achieved with the proposed approach.
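As a small, self-contained illustration of the Apriori step (with invented incident records, not the paper's data), the sketch below mines frequent co-occurrences among hoisting risk factors and reports association rules by support and confidence; the paper couples such rules with a complex-network model of factor interactions.

```python
# Toy Apriori-style rule mining over hoisting risk factors, restricted to
# pairs for brevity. Records and thresholds are illustrative assumptions.
from itertools import combinations
from collections import Counter

records = [
    {"high_wind", "sling_wear", "overload"},
    {"high_wind", "overload"},
    {"sling_wear", "operator_fatigue"},
    {"high_wind", "overload", "operator_fatigue"},
]
min_support, min_conf = 0.5, 0.7
n = len(records)

item_count = Counter(i for r in records for i in r)
pair_count = Counter(p for r in records for p in combinations(sorted(r), 2))

for (a, b), c in pair_count.items():
    support = c / n
    if support < min_support:        # prune infrequent pairs
        continue
    for lhs, rhs in ((a, b), (b, a)):
        conf = c / item_count[lhs]   # P(rhs | lhs)
        if conf >= min_conf:
            print(f"{lhs} -> {rhs}  support={support:.2f} confidence={conf:.2f}")
```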

