divide and conquer
Recently Published Documents


Total documents: 2191 (five years: 526)

H-index: 59 (five years: 7)

Author(s):  
Xianwen Liao ◽  
Yongzhong Huang ◽  
Peng Yang ◽  
Lei Chen

By defining a computable word segmentation unit and studying its probability characteristics, we establish an unsupervised statistical language model (SLM) for a new pre-trained sequence labeling framework. The proposed SLM is an optimization model whose objective is to maximize the total binding force of all candidate word segmentation units in a sentence, without annotated datasets or vocabularies. To solve the SLM, we design a recursive divide-and-conquer dynamic programming algorithm. We integrate the SLM with popular sequence labeling models and run experiments on Vietnamese word segmentation, part-of-speech tagging, and named entity recognition. The experimental results show that our SLM improves the performance of sequence labeling tasks. Using less than 10% of the training data and no dictionary, our sequence labeling framework outperforms the state-of-the-art Vietnamese word segmentation toolkit VnCoreNLP on the cross-dataset test. The SLM has no hyper-parameters to tune, is completely unsupervised, and is applicable to any other analytic language. Thus, it has good domain adaptability.
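To make the optimization in the abstract above concrete, here is a minimal sketch of a memoized recursive segmenter that picks the split with the highest total score. It is not the authors' algorithm: the function `segment`, the `score` callback (standing in for the "binding force" of a unit), the `max_len` cap, and the toy scores are all assumptions made for illustration.

```python
from functools import lru_cache

def segment(syllables, score, max_len=4):
    """Split a syllable sequence into units maximizing the summed score.

    `score` is a hypothetical stand-in for the binding force of a unit;
    `max_len` is an assumed cap on the number of syllables per unit.
    """
    tokens = tuple(syllables)

    @lru_cache(maxsize=None)
    def best(i):
        # Best (total, units) for the suffix tokens[i:].
        if i == len(tokens):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            unit = tokens[i:j]
            tail_total, tail_units = best(j)  # solve the remaining suffix
            candidates.append((score(unit) + tail_total, (unit,) + tail_units))
        return max(candidates, key=lambda c: c[0])

    return best(0)

# Toy scores, invented for this example only.
TOY_SCORES = {("hoc", "sinh"): 2.0, ("hoc",): 0.5, ("sinh",): 0.5, ("gioi",): 1.0}

def toy_score(unit):
    return TOY_SCORES.get(unit, 0.0)

total, units = segment(["hoc", "sinh", "gioi"], toy_score)
print(total, units)
```

Each call to `best(i)` fixes the first unit and recurses on the rest of the sentence; memoization keeps the number of distinct subproblems linear in the sentence length.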


2022 ◽  
Vol 13 (2) ◽  
pp. 1-23
Author(s):  
Liang Wang ◽  
Zhiwen Yu ◽  
Bin Guo ◽  
Dingqi Yang ◽  
Lianbo Ma ◽  
...  

In this article, we propose and study a novel data-driven framework for Targeted Outdoor Advertising Recommendation (TOAR) with special consideration of user profiles and advertisement topics. Given an advertisement query and a set of outdoor billboards with different spatial locations and rental prices, our goal is to find a subset of billboards such that the total targeted influence is maximized under a limited budget constraint. To achieve this goal, we face two challenges: (1) it is difficult to estimate targeted advertising influence in the physical world; (2) because the problem is NP-hard, many common search techniques fail to provide a satisfactory solution within an acceptable time, especially for large-scale problem settings. Taking into account the exposure strength, advertisement matching degree, and advertising repetition effect, we first build a targeted influence model that characterizes how advertising influence spreads with users' mobility. Subsequently, based on a divide-and-conquer strategy, we develop two effective approaches to solve the problem: a master–slave-based sequential optimization method, TOAR-MSS, and a cooperative co-evolution-based optimization method, TOAR-CC. Extensive experiments on two real-world datasets validate the effectiveness and efficiency of our proposed approaches.


Author(s):  
Feng Xiong ◽  
Hongzhi Wang

Data mining remains an enduringly attractive subject for research. Knowledge graphs have risen in recent years and show strong vitality and development potential, and acyclic knowledge graphs have been observed to enhance usability. Although the development of knowledge graphs offers ample scope for applying data mining, related research is still insufficient. In this paper, we introduce path traversal pattern mining to knowledge graphs. We design a novel simple path traversal pattern mining framework to improve the representativeness of the results. A divide-and-conquer approach that combines paths is proposed to discover the most frequent traversal patterns in a knowledge graph. To support the algorithm, we design a linked list structure indexed by sequence length, with convenient operations. The correctness of the algorithm is proven. Experiments show that our algorithm achieves high coverage with a small output size compared to existing frequent sequence mining algorithms.


Author(s):  
Paul Zaharias ◽  
Tandy Warnow

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements.


Mathematics ◽  
2022 ◽  
Vol 10 (2) ◽  
pp. 208
Author(s):  
Jun Wu ◽  
Yuanyuan Li ◽  
Li Shi ◽  
Liping Yang ◽  
Xiaxia Niu ◽  
...  

Existing studies have put great effort into predicting users’ potential interests in items by modeling user preferences and item characteristics. As an important indicator of users’ satisfaction and loyalty, repeat purchase behavior is a promising perspective from which to extract insightful information for community e-commerce. However, the repeat purchase behaviors of users have not yet been thoroughly studied. To fill this research gap and improve the generation of candidate recommended items, this research proposes a novel approach called ReRec (Repeat purchase Recommender) for real-life applications. Specifically, the proposed ReRec approach comprises two components: the first models the repeat purchase behaviors of different types of users, and the second recommends items to users based on their repeat purchase behaviors. Extensive experiments are conducted on a real dataset collected from a community e-commerce platform, and our model improves on state-of-the-art techniques for recommending online items by at least 13.6% (measured by F-measure). Specifically, for active users, with w = 1 and N(UA)∈[5, 25], the results of ReRec show a significant improvement (at least 50%) in recommendation. With α and σ set to 0.75 and 0.2284, respectively, the proposed ReRec for inactive users also outperforms traditional Item CF on the evaluation indicators (by at least 13.6%) when N(UB)∈[6, 25]. To the best of our knowledge, this paper is the first to study recommendations in community e-commerce.


2022 ◽  
Vol 183 (1-2) ◽  
pp. 1-31
Author(s):  
Raymond Devillers

To speed up the synthesis of Petri nets from labelled transition systems, a divide-and-conquer strategy consists of defining decompositions of labelled transition systems such that each component is synthesisable iff the original system is. Corresponding Petri net composition operators are then sought to combine the solutions of the various components into a solution of the original system. The paper presents two such techniques, which may be combined: products and articulations. They may also be used to structure transition systems, and to analyse the performance of synthesis techniques when applied to such structures.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Xiong Yuan ◽  
Zirong Li ◽  
Liwen Xiong ◽  
Sufeng Song ◽  
Xingfei Zheng ◽  
...  

Background: Plant variety identification is one of the most important aspects of agricultural systems. It requires developing DNA marker profiles of released varieties for comparison with candidate or future varieties. However, strictly speaking, most existing variety identification techniques have been used not for “identification” but for “distinction of a limited number of cultivars,” and their generalization ability is often not well estimated. Many varieties have similar genetic backgrounds, and some are even essentially derived varieties (EDVs), which creates difficulties for identification and breeding progress. A fast, accurate variety identification method that also performs well on EDV determination needs to be developed. Results: In this study, following a “Divide and Conquer” strategy, a variety identification method called Conditional Random Selection (CRS), based on whole-genome SNPs of 3024 rice varieties, was developed and applied to essentially derived variety (EDV) identification in rice. CRS is a fast, efficient, and automated variety identification method. In practice, with the optimal identity score threshold found in this study, the SNP set (comprising 390 SNPs) showed optimal performance on EDV and non-EDV identification in two independent testing datasets. Conclusion: This approach first selected a minimal set of SNPs to discriminate non-EDVs in the 3000 Rice Genome Project, then united several simplified SNP sets to improve its generalization ability for EDV and non-EDV identification in testing datasets. The results suggest that the CRS method outperforms traditional feature selection methods. Furthermore, it provides a new way to screen core SNP loci from the whole genome for DNA fingerprinting of crop varieties and is useful for crop breeding.


2022 ◽  
Vol 156 (1) ◽  
pp. 014105
Author(s):  
Xuecheng Shao ◽  
Jian Lv ◽  
Peng Liu ◽  
Sen Shao ◽  
Pengyue Gao ◽  
...  

Author(s):  
Anand Sunder ◽  

One of the most challenging problems in computational geometry is finding the closest pair among n given points. Brute-force [1] and divide-and-conquer [1] algorithms have been verified, with the lowest complexity, O(n log n), attributed to the latter class of algorithms and the worst case for the former being O(n^2). We propose a method of partitioning the set of n points based on the least-area rectangle that can circumscribe these points.
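The two baseline approaches named in the abstract above can be sketched as follows. This is the textbook divide-and-conquer scheme, not the rectangle-based partitioning the author proposes, and the function names `brute_force` and `closest_pair` are chosen here for illustration. For brevity this variant re-sorts the strip at every level, which gives O(n log^2 n) rather than the O(n log n) achievable by merging on y.

```python
import math
from itertools import combinations

def brute_force(points):
    """O(n^2) baseline: smallest distance over every pair of points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))

def closest_pair(points):
    """Smallest pairwise distance via divide and conquer on x-sorted points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return _closest(sorted(points))

def _closest(pts):
    n = len(pts)
    if n <= 3:
        return brute_force(pts)
    mid = n // 2
    mid_x = pts[mid][0]
    # Conquer: best distance found entirely within the left or right half.
    d = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # Combine: only points within d of the dividing line can form a closer pair.
    strip = sorted((p for p in pts if abs(p[0] - mid_x) < d), key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:]:
            if q[1] - p[1] >= d:
                break  # remaining strip points are too far away in y
            d = min(d, math.dist(p, q))
    return d

print(closest_pair([(0, 0), (3, 4), (1, 1), (5, 5), (1, 2)]))  # 1.0
```

The early `break` in the strip scan is what keeps the combine step cheap: each point is compared with only a bounded number of neighbours in y order.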


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Monique Aouad ◽  
Jean-Pierre Flandrois ◽  
Frédéric Jauffrit ◽  
Manolo Gouy ◽  
Simonetta Gribaldo ◽  
...  

Background: The recent rise in cultivation-independent genome sequencing has provided key material to explore uncharted branches of the Tree of Life. This has been particularly spectacular for the Archaea, placing them at center stage as highly relevant to understanding early stages in evolution and the emergence of fundamental metabolisms, as well as the origin of eukaryotes. Yet resolving deep divergences remains a challenging task due to well-known tree-reconstruction artefacts and biases in extracting robust ancient phylogenetic signal, notably when analyzing data sets that include the three Domains of Life. Among the various strategies aimed at mitigating these problems, divide-and-conquer approaches remain poorly explored and have been primarily based on reconciliation among single gene trees, which, however, notoriously lack ancient phylogenetic signal. Results: We analyzed subsets of full supermatrices covering the whole Tree of Life with specific taxonomic sampling to robustly resolve different parts of the archaeal phylogeny in light of their current diversity. Our results strongly support the existence and early emergence of two main clades, Cluster I and Cluster II, which we name Ouranosarchaea and Gaiarchaea, and we clarify the placement of important novel archaeal lineages within these two clades. However, the monophyly and branching of the fast-evolving nanosized DPANN members remain unclear and worthy of further study. Conclusions: We inferred a well-resolved rooted phylogeny of the Archaea that includes all recently described phyla of high taxonomic rank. This phylogeny represents a valuable reference for studying the evolutionary events associated with the early steps of the diversification of the archaeal domain.
Beyond the specifics of archaeal phylogeny, our results demonstrate the power of divide-and-conquer approaches to resolve deep phylogenetic relationships, which should be applied to progressively resolve the entire Tree of Life.

