sorting algorithm
Recently Published Documents

Mir Ragib Ishraq ◽  
Nitesh Khadka ◽  
Asif Mohammed Samir ◽  
M. Shahidur Rahman

Three different Indic/Indo-Aryan languages, Bengali, Hindi, and Nepali, are explored here at the character level to find similarities and dissimilarities. Sharing the same root, Sanskrit, the Indic languages bear common characteristics, which gives computer and language scientists the opportunity to develop common Natural Language Processing (NLP) techniques and algorithms. With this concept in mind, we compare and analyze the three languages character by character. As an application of the hypothesis, we also developed a uniform sorting algorithm in two steps: first for the Bengali and Nepali languages only, and then extended to Hindi in the second step. Our thorough investigation with more than 30,000 words from each language suggests that the algorithm maintains full accuracy, as defined by the local language authorities of the respective languages, with good efficiency.
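The abstract does not give the algorithm itself, but the core idea of a uniform sort across Indic scripts can be sketched as a sort over a shared collation table. The character ordering below is an illustrative subset for Bengali, not the official collation set by the language authorities:

```python
# Hypothetical sketch: sorting by a custom collation table instead of raw
# Unicode code points. The table below is an illustrative subset of Bengali
# vowels and consonants, not the official ordering from the paper.
CHAR_ORDER = {ch: i for i, ch in enumerate("অআইঈউঊকখগঘ")}

def collation_key(word):
    """Map each character to its rank in the collation table.
    Characters missing from the table sort after all known ones."""
    return [CHAR_ORDER.get(ch, len(CHAR_ORDER)) for ch in word]

words = ["খগ", "অক", "আই"]
print(sorted(words, key=collation_key))  # → ['অক', 'আই', 'খগ']
```

A language-specific table per script, backed by one shared keying function, is one plausible way a single algorithm could serve Bengali, Nepali, and Hindi.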

Robinson Jiménez-Moreno ◽  
Paula Useche ◽  
Javier O. Pinzón-Arenas

2021 ◽  
Vol 13 (1) ◽  
Xuhan Liu ◽  
Kai Ye ◽  
Herman W. T. van Vlijmen ◽  
Michael T. M. Emmerich ◽  
Adriaan P. IJzerman ◽  

In polypharmacology, drugs are required to bind to multiple specific targets, for example to enhance efficacy or to reduce resistance formation. Although deep learning has achieved a breakthrough in de novo design in drug discovery, most of its applications focus only on a single drug target to generate drug-like active molecules. In reality, however, drug molecules often interact with more than one target, which can have desired (polypharmacology) or undesired (toxicity) effects. In a previous study we proposed a method named DrugEx that integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules. Here, we extended our DrugEx algorithm with multi-objective optimization to generate drug-like molecules towards multiple targets or one specific target while avoiding off-targets (the two adenosine receptors, A1AR and A2AAR, and the potassium ion channel hERG in this study). In our model, we applied an RNN as the agent and machine learning predictors as the environment. Both the agent and the environment were pre-trained in advance and then interacted under a reinforcement learning framework. The concept of evolutionary algorithms was merged into our method such that crossover and mutation operations were implemented by the same deep learning model as the agent. During the training loop, the agent generates a batch of SMILES-based molecules. Subsequently, scores for all objectives provided by the environment are used to construct Pareto ranks of the generated molecules. For this ranking, a non-dominated sorting algorithm and a Tanimoto-based crowding distance algorithm using chemical fingerprints are applied. We adopted GPU acceleration to speed up the process of Pareto optimization. The final reward of each molecule is calculated from its Pareto rank with the ranking selection algorithm. The agent is trained under the guidance of the reward to ensure it can generate the desired molecules after convergence of the training process. All in all, we demonstrate the generation of compounds with a diverse predicted selectivity profile towards multiple targets, offering the potential of high efficacy and low toxicity.
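The non-dominated sorting step mentioned above can be sketched in a few lines. This is a minimal Pareto-ranking routine assuming higher scores are better on every objective; the Tanimoto-based crowding distance and GPU acceleration used in the actual pipeline are omitted:

```python
# Minimal sketch of the non-dominated (Pareto) ranking step.
# Assumes "higher is better" on every objective; crowding distance omitted.

def dominates(a, b):
    """a dominates b if a >= b on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_ranks(scores):
    """Assign each solution a Pareto rank (0 = non-dominated front)."""
    remaining = set(range(len(scores)))
    ranks = {}
    front = 0
    while remaining:
        # Members of the current front are dominated by no remaining solution.
        current = {i for i in remaining
                   if not any(dominates(scores[j], scores[i])
                              for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return [ranks[i] for i in range(len(scores))]

# Three hypothetical molecules scored against two targets:
print(pareto_ranks([(0.9, 0.2), (0.5, 0.8), (0.4, 0.1)]))  # → [0, 0, 1]
```

The first two molecules trade off against each other, so neither dominates and both land on front 0; the third is dominated by both and falls to front 1. The per-molecule reward would then be derived from these ranks.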

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Bochun Yin ◽  
Lei Fu

Aiming at the problems of poor data quality and low application rates in existing media corpora, this paper proposes the construction and application of a media corpus based on big data. Media corpus data are collected and divided into four categories; a heuristic data-item column sorting algorithm is introduced to order all collection processes, the minimum data-item collection rate is determined, and on this basis the maximum size of the media corpus is determined, with data collected into the corpus through a sliding window. Then, the state characteristics and probability distribution of the feature data are determined with a dynamic Bayesian network, the relationship between the state variables and the dimensions of the corpus data is established, and the corpus data state is processed component by component to complete preprocessing. Finally, the storage and encryption of the designed database are implemented with big data technology: the storage structure and encryption key are designed to realize the construction and application of the media corpus. The experimental results show that the media corpus constructed by the proposed method has high data quality, and that its application rate is improved to a certain extent.
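The abstract mentions collecting data into the corpus through a sliding window but gives no parameters, so the following is only a minimal sketch of sliding-window batching over a record stream; the window size and step are hypothetical:

```python
# Minimal sketch of sliding-window collection over a stream of corpus
# records. `size` and `step` are hypothetical parameters, not values from
# the paper.

def sliding_windows(records, size, step):
    """Yield successive windows of `size` records, advancing by `step`."""
    for start in range(0, len(records) - size + 1, step):
        yield records[start:start + size]

stream = list(range(10))
print(list(sliding_windows(stream, size=4, step=2)))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

An overlapping step (step < size) means each record is seen in several windows, which suits streaming collection where record boundaries are uncertain.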

2021 ◽  
pp. 203-210
Swarna Saha ◽  
Soumyadip Sarkar ◽  
Rituparna Patra ◽  
Subhasree Bhattacharjee

Information ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 413
Andry Alamsyah ◽  
Nidya Dudija ◽  
Sri Widiyanesti

Human online activities leave digital traces that provide a perfect opportunity to understand human behavior better. Social media is an excellent place to spark conversations and state opinions, and thus generates large-scale textual data. In this paper, we harness those data to support personality measurement. Our first contribution is a Big Five personality trait-based model that detects human personality from textual data in the Indonesian language. The model uses an ontology approach instead of the more common machine learning models; the former better captures the meaning and intention of phrases and words in the domain of human personality. The legacy, more thorough ways to assess personality are interviews and questionnaires. Still, many real-life applications need an alternative method, cheaper and faster than the legacy methodology, to select individuals based on their personality. The second contribution is a personality measurement platform that implements the model. The model relies on two distinct features: an n-gram sorting algorithm to parse the textual data, and a crowdsourcing mechanism that lets the public contribute to extending and filtering the ontology corpus.
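The n-gram sorting step is not specified further in the abstract; a minimal, hypothetical version would extract word n-grams from the text and sort them by frequency before matching against the ontology:

```python
# Hedged sketch of an n-gram extraction-and-sort step. The platform's actual
# parsing algorithm is not described in the abstract; this simply extracts
# word n-grams and ranks them by frequency.
from collections import Counter

def top_ngrams(text, n=2, k=3):
    """Extract word n-grams and return the k most frequent, sorted."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams).most_common(k)

# Toy Indonesian sentence ("I like coffee, I like tea"):
print(top_ngrams("saya suka kopi saya suka teh", n=2, k=2))
# → [('saya suka', 2), ('suka kopi', 1)]
```

Frequency-ranked n-grams would give the ontology lookup a short list of the most salient phrases per user, rather than the raw token stream.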
