discovery algorithms
Recently Published Documents


TOTAL DOCUMENTS: 154 (FIVE YEARS: 54)
H-INDEX: 14 (FIVE YEARS: 2)

Author(s):  
Edward Yuhang He ◽  
Natashia Boland ◽  
George Nemhauser ◽  
Martin Savelsbergh

Finding a shortest path in a network is a fundamental optimization problem. We focus on settings in which the travel time on an arc depends on the time at which traversal of the arc begins. In such settings, reaching the destination as early as possible is not the only objective of interest: minimizing the duration of the path (the difference between the arrival time at the destination and the departure time from the origin) and minimizing the travel time along the path from origin to destination are also of interest. We introduce dynamic discretization discovery algorithms to efficiently solve such time-dependent shortest path problems with piecewise linear arc travel time functions. The algorithms operate on partially time-expanded networks in which arc costs represent lower bounds on the arc travel time over the subsequent time interval. A shortest path in this partially time-expanded network yields a lower bound on the value of an optimal path, and upper bounds are easily obtained as by-products of the lower bound calculations. The algorithms iteratively refine the discretization by exploiting breakpoints of the arc travel time functions. In addition to refining the time discretization, the algorithms permit time intervals to be eliminated, improving the lower and upper bounds until optimality is proved in a finite number of iterations. Computational experiments show that only a small fraction of breakpoints must be explored and that this fraction decreases as the length of the time horizon and the size of the network increase, making the algorithms highly efficient and scalable.
Summary of Contribution: New data collection techniques have increased the availability and fidelity of time-dependent travel time information, making the time-dependent variant of the classic shortest path problem highly relevant to operations research. This paper provides novel algorithms for the time-dependent shortest path problem with both the minimum duration and minimum travel time objectives, which address the computational challenges faced by existing algorithms. A computational study shows that our new algorithm is significantly more efficient than existing approaches.
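The abstract gives no pseudocode, so the sketch below is not the authors' dynamic discretization discovery algorithm. It only illustrates two ingredients the abstract names, under stated assumptions: (1) a lower bound on a piecewise linear arc travel time over a time interval, which is attained at an interval endpoint or at a breakpoint inside it, and (2) the classic time-dependent Dijkstra for the earliest-arrival objective, assuming FIFO arcs (piecewise linear slopes of at least -1). The helper names `pwl_eval`, `pwl_min_over`, and `earliest_arrival` and the constant extrapolation outside the breakpoint range are hypothetical choices for illustration.

```python
import bisect
import heapq


def pwl_eval(bps, t):
    """Evaluate a piecewise linear function given sorted breakpoints [(t0, v0), ...].

    Assumption: the function is constant before the first and after the last breakpoint.
    """
    times = [p[0] for p in bps]
    if t <= times[0]:
        return bps[0][1]
    if t >= times[-1]:
        return bps[-1][1]
    i = bisect.bisect_right(times, t) - 1
    (t0, v0), (t1, v1) = bps[i], bps[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def pwl_min_over(bps, a, b):
    """Lower bound of the travel time on [a, b]: the minimum of a piecewise linear
    function is attained at an endpoint or at an interior breakpoint."""
    candidates = [a, b] + [t for t, _ in bps if a < t < b]
    return min(pwl_eval(bps, t) for t in candidates)


def earliest_arrival(arcs, source, target, depart):
    """Label-setting time-dependent Dijkstra; exact for earliest arrival under FIFO.

    arcs: {u: [(v, breakpoints), ...]} where breakpoints define travel time vs. entry time.
    """
    best = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, bps in arcs.get(u, []):
            arrival = t + pwl_eval(bps, t)
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return None
```

For example, with `arcs = {"o": [("a", [(0, 5), (10, 1)]), ("d", [(0, 20)])], "a": [("d", [(0, 3)])]}`, departing at time 0 reaches `d` at 8 via `a`. The minimum-duration and minimum-travel-time objectives highlighted in the abstract also let the departure time vary, so in general they do not reduce to a single run of this routine.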


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Salvatore Citraro ◽  
Giulio Rossetti

Grouping well-connected nodes into clusters that are also label-homogeneous is a task often known as attribute-aware community discovery. Evaluating the quality of the partitions produced by node-attributed graph clustering methods requires rigorous tools. In this work, we present X-Mark, a model that generates synthetic node-attributed graphs with planted communities. Its novelty lies in forming communities and node labels jointly while handling either categorical or continuous attribute information. Moreover, we compare attribute-aware algorithms by testing them against our benchmark. Following the classification schemes of recent state-of-the-art surveys, our results suggest that X-Mark can shed light on the differences between several families of algorithms.
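The abstract does not describe how X-Mark builds its graphs. As a point of comparison only, the sketch below is a plain planted-partition generator with one categorical label per node, plus a simple label-purity score for a partition. It is not X-Mark: the function names, the `label_noise` parameter, and the rule that community `c` prefers label `c % n_labels` are hypothetical choices for illustration.

```python
import random


def planted_attributed_graph(sizes, p_in, p_out, label_noise, n_labels, seed=0):
    """Planted-partition graph with categorical node labels (illustrative, not X-Mark).

    sizes: community sizes; p_in / p_out: edge probabilities within / across communities.
    Each node takes its community's dominant label c % n_labels, except that with
    probability label_noise it draws a uniformly random label (which may coincide).
    """
    rng = random.Random(seed)
    community = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(community)
    labels = []
    for c in community:
        if rng.random() < label_noise:
            labels.append(rng.randrange(n_labels))
        else:
            labels.append(c % n_labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return community, labels, edges


def label_purity(partition, labels):
    """Fraction of nodes carrying the majority label of their assigned cluster."""
    groups = {}
    for c, lab in zip(partition, labels):
        groups.setdefault(c, []).append(lab)
    majority_total = sum(max(g.count(lab) for lab in set(g)) for g in groups.values())
    return majority_total / len(labels)
```

With `label_noise=0.0` every planted community is perfectly label-homogeneous, so its purity is 1.0; raising `label_noise` or `p_out` makes the benchmark harder for attribute-aware methods.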


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Xianyong Li ◽  
Ying Tang ◽  
Yajun Du ◽  
Yanjie Li

Key nodes play important roles in information propagation and opinion evolution in social networks. Previous work has rarely incorporated multiple relationships and features into key node discovery algorithms simultaneously. Based on the relational networks in a social network, including the forwarding, replying, and mentioning networks, this paper first proposes an overlapping user relational network algorithm that extracts the different relational networks over the same set of nodes. Integrating these relational networks, a multirelationship network is established. Subsequently, a key node discovery (KND) algorithm is presented that uses shortest path, degree centrality, and random walk features in the multirelationship network. The advantages of the proposed KND algorithm are demonstrated with the SIR propagation model and the normalized discounted cumulative gain on both multirelationship and single-relation networks. The experimental results show that the proposed KND method for finding key nodes is superior to other baseline methods on different networks.
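The pipeline described above can be sketched as follows, with heavy caveats: the abstract does not give the KND scoring formula, so the equal-weight sum of degree centrality, closeness (a shortest-path feature), and PageRank (a random-walk feature) is a hypothetical stand-in, and merging the relations as an undirected union of edges restricted to the nodes common to every relation is likewise an assumption. All function names are illustrative.

```python
from collections import deque


def overlap_networks(networks):
    """Keep only nodes present in every relation, then union the relations' edges."""
    common = set.intersection(*[{n for e in net for n in e} for net in networks])
    merged = set()
    for net in networks:
        for u, v in net:
            if u in common and v in common and u != v:
                merged.add((min(u, v), max(u, v)))  # undirected edge
    return common, merged


def adjacency(nodes, edges):
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def closeness(adj, s):
    """Closeness from BFS hop distances over the nodes reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def pagerank(adj, d=0.85, iters=50):
    """Power-iteration PageRank; dangling nodes spread their mass uniformly."""
    n = len(adj)
    pr = {u: 1 / n for u in adj}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in adj}
        for u in adj:
            targets = adj[u] if adj[u] else adj.keys()
            share = d * pr[u] / len(targets)
            for v in targets:
                new[v] += share
        pr = new
    return pr


def key_node_scores(adj, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical weighted combination of the three features named in the abstract."""
    n = len(adj)
    pr = pagerank(adj)
    scores = {}
    for u in adj:
        degree = len(adj[u]) / (n - 1) if n > 1 else 0.0
        scores[u] = weights[0] * degree + weights[1] * closeness(adj, u) + weights[2] * pr[u]
    return scores
```

On a star graph the hub gets the highest score on all three features, so it ranks first; ranking nodes by `scores` and taking the top k would give the candidate key nodes.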


2021 ◽  
Author(s):  
Jarmo Mäkelä ◽  
Laila Melkas ◽  
Ivan Mammarella ◽  
Tuomo Nieminen ◽  
Suyog Chandramouli ◽  
...  

Abstract. This is a comment on "Estimating causal networks in biosphere–atmosphere interaction with the PCMCI approach" by Krich et al., Biogeosciences, 17, 1033–1061, 2020, which gives a good introduction to causal discovery but limits its scope to the outcome of a single algorithm. In this comment, we argue that the outputs of causal discovery algorithms should usually be considered not as end results but as starting points and hypotheses for further study. We illustrate how not only different algorithms, but also different initial states and prior information about possible causal model structures, affect the outcome. We demonstrate how to incorporate expert domain knowledge into causal structure discovery and how to detect and account for overfitting and concept drift.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Riyanarto Sarno ◽  
Kelly Rossa Sungkono ◽  
Muhammad Taufiqulsa’di ◽  
Hendra Darmawan ◽  
Achmad Fahmi ◽  
...  

Process discovery helps companies automatically discover their existing business processes from large stored event logs. Process discovery algorithms have developed rapidly to discover several types of relations, such as choice relations and non-free choice relations with invisible tasks. Invisible tasks in non-free choice, introduced by the α$ method, are a type of relationship that combines non-free choice and invisible tasks. α$ proposes rules over the ordering relations of pairs of activities to determine invisible tasks in non-free choice. Because the event log records sequences of activities, the rules of α$ check combinations of invisible tasks within non-free choice. These checks are time-consuming and result in high computing times for α$. This research proposes the Graph-based Invisible Task (GIT) method to efficiently discover invisible tasks in non-free choice. The GIT method represents sequences of business activities as graphs and defines rules to discover invisible tasks in non-free choice based on relationships in the graphs. Analyzing the graph relationships with the GIT rules is more efficient than the iterative process of checking combined activities used by α$. This research evaluates the GIT algorithm by measuring the time efficiency of storing the event log and of discovering a process model. The graph database has the highest storage computing time for batch event logs; however, it obtains a low storage computing time for streaming event logs. Furthermore, on an event log with 99 traces, the GIT algorithm discovers a process model 42 times faster than α++ and 43 times faster than α$. The GIT algorithm can also handle 981 traces, whereas α++ and α$ can handle at most 99 traces.
Discovering a process model with the GIT algorithm has lower time complexity than with α$: GIT runs in O(n^3), whereas α$ runs in O(n^4). These evaluation results show a significant improvement of the GIT method in terms of time efficiency.
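The abstract does not list the GIT rules, so the sketch below is not GIT. It shows the common graph representation of activity sequences that graph-based discovery starts from, a directly-follows graph, plus a hypothetical skip rule for spotting invisible-task candidates: if a→b, b→c, and a→c are all observed, then b may be skippable, which a discovered model would express with an invisible task. The rule and the function names are illustrative assumptions.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b across all traces."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def skip_candidates(dfg):
    """Hypothetical heuristic: (a, b, c) is reported when a->b, b->c and a->c are all
    observed, suggesting b can be skipped (an invisible-task candidate)."""
    successors = {}
    for a, b in dfg:
        successors.setdefault(a, set()).add(b)
    found = set()
    for a, bs in successors.items():
        for b in bs:
            if b == a:
                continue
            for c in successors.get(b, ()):
                if c != b and c in bs:
                    found.add((a, b, c))
    return found
```

For the log `[["A", "B", "C"], ["A", "C"]]`, the only candidate is `("A", "B", "C")`: activity B is sometimes skipped between A and C.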


2021 ◽  
Author(s):  
Sebastian Schmidl ◽  
Thorsten Papenbrock

Bidirectional order dependencies (bODs) capture order relationships between lists of attributes in a relational table. They can express, for example, that sorting books by publication date in ascending order also sorts them by age in descending order. Knowledge about order relationships is useful for many data management tasks, such as query optimization, data cleaning, and consistency checking. Because the bODs of a specific dataset are usually not given explicitly, they need to be discovered. However, discovering all minimal bODs (in set-based canonical form) has exponential complexity in the number of attributes, which is why existing bOD discovery algorithms cannot process datasets of practically relevant size in a reasonable time. In this paper, we propose DISTOD, a distributed bOD discovery algorithm whose execution time scales with the available hardware. DISTOD is a scalable, robust, and elastic bOD discovery approach that combines efficient pruning techniques for bOD candidates in set-based canonical form with a novel, reactive, and distributed search strategy. Our evaluation on various datasets shows that DISTOD outperforms both single-threaded and distributed state-of-the-art bOD discovery algorithms by up to orders of magnitude; in particular, it can process much larger datasets.
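To make the book example concrete, the sketch below validates a single-attribute order dependency "lhs ascending orders rhs ascending (or descending)" on a small table. It is a naive validation check, not DISTOD's pruning or distributed search, and it handles only one attribute on each side rather than attribute lists. The function name and the row-as-dict representation are assumptions.

```python
def holds_bod(rows, lhs, rhs, rhs_descending=False):
    """Check whether sorting rows by lhs (ascending) also sorts them by rhs.

    Rows with equal lhs values must also have equal rhs values, because an order
    dependency implies the corresponding functional dependency.
    """
    ordered = sorted(rows, key=lambda r: r[lhs])
    for s, t in zip(ordered, ordered[1:]):
        if s[lhs] == t[lhs]:
            if s[rhs] != t[rhs]:
                return False  # tie on lhs but not on rhs
        elif rhs_descending and s[rhs] < t[rhs]:
            return False  # rhs should not increase
        elif not rhs_descending and s[rhs] > t[rhs]:
            return False  # rhs should not decrease
    return True
```

Checking adjacent pairs suffices: ties on `lhs` force equal `rhs` values, so the `rhs` column read in `lhs` order must be monotone overall. A discovery algorithm enumerates many such candidates, which is where the exponential search space mentioned above comes from.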


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255718
Author(s):  
Ehsan Pournoor ◽  
Zaynab Mousavian ◽  
Abbas Nowzari-Dalini ◽  
Ali Masoudi-Nejad

Despite extensive work on community discovery algorithms, community discovery is still an open and challenging subject in network science. Recognizing communities in a multilayer network, where there are several layers (types) of connections, is even more complicated. Here, we concentrated on a specific type of community, called seed-centric local communities, in the multilayer setting and developed a novel method based on the information cascade concept, called PLCDM. Our simulations on three datasets (real and artificial) indicate that the proposed method outperforms two existing seed-centric local methods. Additionally, we compared it with other global multilayer and single-layer methods. Finally, we applied our method to a biological two-layer network of Colon Adenocarcinoma (COAD), reconstructed from transcriptomic and post-transcriptomic datasets, and assessed the output modules. The functional enrichment results suggest that the modules of interest contain biomolecules involved in pathways associated with carcinogenesis.
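The abstract does not specify PLCDM's cascade model, so the following is only a generic illustration of the idea of growing a seed-centric local community from information cascades in a multilayer network. It runs independent-cascade simulations from the seed, lets every layer's edges attempt activation, and keeps the nodes activated in at least a `threshold` fraction of runs. The parameters `p`, `runs`, and `threshold` and the dict-of-layers representation are hypothetical.

```python
import random


def cascade_community(layers, seed, p=0.3, runs=200, threshold=0.5, rng_seed=0):
    """Local community of `seed` from repeated independent-cascade simulations.

    layers: {layer_name: {node: set_of_neighbors}}, assumed symmetric (undirected).
    When a node becomes active, each of its edges in every layer gets one chance,
    with probability p, to activate the neighbor.
    """
    rng = random.Random(rng_seed)
    counts = {}
    for _ in range(runs):
        active = {seed}
        frontier = [seed]
        while frontier:
            next_frontier = []
            for u in frontier:
                for adj in layers.values():
                    for v in adj.get(u, ()):
                        if v not in active and rng.random() < p:
                            active.add(v)
                            next_frontier.append(v)
            frontier = next_frontier
        for v in active:
            counts[v] = counts.get(v, 0) + 1
    # Nodes reached in a large enough share of cascades form the local community.
    return {v for v, c in counts.items() if c / runs >= threshold}
```

With `p=1.0` every cascade reaches the seed's whole connected component across layers, so the result is that component; smaller `p` keeps only nodes that are tightly linked to the seed.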


Author(s):  
Maxime Peyrard ◽  
Robert West

Causal discovery, the task of automatically constructing a causal model from data, is of major significance across the sciences. Evaluating the performance of causal discovery algorithms should ideally involve comparing the inferred models to ground-truth models available for benchmark datasets, which in turn requires a notion of distance between causal models. Although such distances have been proposed previously, they are limited by their focus on the graphical properties of the causal models being compared. Here, we overcome this limitation by defining distances derived from the causal distributions induced by the models, rather than exclusively from their graphical structure. Pearl and Mackenzie [2018] arranged the properties of causal models in a hierarchy called the "ladder of causation", spanning three rungs: observational, interventional, and counterfactual. Following this organization, we introduce a hierarchy of three distances, one for each rung of the ladder. Our definitions are intuitively appealing as well as efficient to compute approximately. We put our causal distances to use by benchmarking standard causal discovery systems on both synthetic and real-world datasets for which ground-truth causal models are available.
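A small example shows why distances built on the rungs of the ladder can separate models that look identical at the bottom rung. Below, two binary causal models, X→Y and Y→X, are parameterized to induce exactly the same observational distribution, yet they disagree about P(Y | do(X = x)). Total variation distance is used here as an assumed stand-in; the paper's own distance definitions are not reproduced, and all parameter values and names are hypothetical.

```python
from itertools import product


def tv(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Model A: X -> Y, with hypothetical parameters.
PX_A = {0: 0.5, 1: 0.5}
PY_GIVEN_X_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}


def joint_a():
    return {(x, y): PX_A[x] * PY_GIVEN_X_A[x][y] for x, y in product((0, 1), repeat=2)}


# Model B: Y -> X, parameterized from A's joint so both models agree observationally.
_JA = joint_a()
PY_B = {y: _JA[(0, y)] + _JA[(1, y)] for y in (0, 1)}
PX_GIVEN_Y_B = {y: {x: _JA[(x, y)] / PY_B[y] for x in (0, 1)} for y in (0, 1)}


def joint_b():
    return {(x, y): PY_B[y] * PX_GIVEN_Y_B[y][x] for x, y in product((0, 1), repeat=2)}


def do_x_a(x):
    """P(Y | do(X = x)) in A: Y's mechanism still depends on X."""
    return dict(PY_GIVEN_X_A[x])


def do_x_b(x):
    """P(Y | do(X = x)) in B: X does not cause Y, so Y keeps its marginal."""
    return dict(PY_B)


def observational_distance():
    return tv(joint_a(), joint_b())


def interventional_distance():
    """Average TV distance between the interventional distributions over x in {0, 1}."""
    return sum(tv(do_x_a(x), do_x_b(x)) for x in (0, 1)) / 2
```

Here the observational distance is zero (up to floating-point rounding) while the interventional distance is 0.35, so a purely observational comparison would wrongly treat the two models as equivalent.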


2021 ◽  
Vol 16 ◽  
pp. 1-14
Author(s):  
Zineb Lamghari

Process discovery techniques aim to automatically generate a process model that accurately describes a Business Process (BP) based on event data. Existing discovery algorithms assume that recorded events result only from an operational BP type. However, the management community defines three BP types: Management, Support, and Operational, and distinguishes each type by different properties, such as its main business process objective, which serves as domain knowledge. This reveals a gap: current process discovery techniques cannot obtain process models according to the other business process types (Management and Support). In this paper, we demonstrate that business process types can guide process discovery techniques in generating process models. Special attention is given to the use of process mining to address this challenge.


2021 ◽  
Author(s):  
Eduardo Henrique Monteiro Pena ◽  
Eduardo Cunha De Almeida

This work addresses central problems related to data dependencies. The first problem concerns the discovery of dependencies with high expressive power. We introduce an efficient algorithm for discovering denial constraints: a type of dependency expressive enough to generalize other important types of dependencies and to express complex business rules. The second problem concerns the application of dependencies to improve data consistency. We present a modification to traditional dependency discovery approaches that enables dependency discovery algorithms to return reliable results even when they run on data containing some inconsistent records. We also present a system for efficiently detecting violations of dependencies. Our extensive experimental evaluation shows that our system is up to three orders of magnitude faster than state-of-the-art solutions, especially for larger datasets and massive numbers of dependency violations. The last contribution of this work concerns the application of dependencies in query optimization. We present a system for the automatic discovery and selection of functional dependencies. Our experimental evaluation shows that our system selects relevant functional dependencies that help reduce the overall query response time for various types of query workloads.
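For readers new to denial constraints, the sketch below shows the two notions the abstract relies on, in their simplest form. The first function is a naive O(n²) pairwise violation check for a denial constraint ∀t, s: ¬(P1 ∧ … ∧ Pk). The second is the g3 error of a functional dependency (the fraction of rows that must be removed for it to hold), one common way to tolerate a few inconsistent records. Neither is the authors' system or discovery algorithm; the predicate encoding `(attr_of_t, op, attr_of_s)` and the function names are assumptions.

```python
import operator
from collections import Counter, defaultdict
from itertools import permutations

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def dc_violations(rows, predicates):
    """Naive check of a denial constraint: every ordered pair of distinct rows (t, s)
    for which all predicates hold at once is a violation.

    Each predicate is (attr_of_t, op, attr_of_s), e.g. ("zip", "=", "zip").
    """
    violations = []
    for i, j in permutations(range(len(rows)), 2):
        t, s = rows[i], rows[j]
        if all(OPS[op](t[a], s[b]) for a, op, b in predicates):
            violations.append((i, j))
    return violations


def fd_g3_error(rows, lhs, rhs):
    """Minimum fraction of rows to delete so that the FD lhs -> rhs holds."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
    keep = sum(c.most_common(1)[0][1] for c in groups.values())
    return 1 - keep / len(rows)
```

The denial constraint ¬(t.zip = s.zip ∧ t.city ≠ s.city) expresses the functional dependency zip → city. On a table where one of three rows with the same zip has a different city, `fd_g3_error` reports a small nonzero error instead of simply rejecting the dependency, which is the kind of tolerance to inconsistent records described above.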

