Space Complexity
Recently Published Documents

Total documents: 363 (five years: 65)
H-index: 21 (five years: 3)

2022 ◽  
Author(s):  
Jordan M Eizenga ◽  
Benedict Paten

Modern genomic sequencing data is trending toward longer sequences with higher accuracy. Many analyses using these data will center on alignments, but classical exact alignment algorithms are infeasible for long sequences. The recently proposed WFA algorithm demonstrated how to perform exact alignment for long, similar sequences in O(sN) time and O(s^2) memory, where s is a score that is low for similar sequences (Marco-Sola et al., 2021). However, this algorithm still has infeasible memory requirements for longer sequences. Also, it uses an alternate scoring system that is unfamiliar to many bioinformaticians. We describe variants of WFA that improve its asymptotic memory use from O(s^2) to O(s^(3/2)) and its asymptotic run time from O(sN) to O(s^2 + N). We expect the reduction in memory use to be particularly impactful, as it makes it practical to perform highly multithreaded megabase-scale exact alignments in common compute environments. In addition, we show how to fold WFA's alternate scoring into the broader literature on alignment scores.
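To make the wavefront idea concrete, the sketch below applies it to plain unit-cost edit distance (Levenshtein). This is only an illustration of the furthest-reaching-point recurrence behind WFA-style algorithms, not the gap-affine WFA of Marco-Sola et al. or the variants described in the abstract; the function name and details are assumptions.

```python
def wavefront_edit_distance(a: str, b: str) -> int:
    """Wavefront-style unit-cost edit distance (illustrative sketch only).

    The score s acts as the wavefront index: each wavefront stores, per
    diagonal k = i - j, the furthest row i reachable with exactly s edits
    after sliding along free matches, so time is O(s*N) and each wavefront
    uses O(s) memory.
    """
    n, m = len(a), len(b)

    def extend(i: int, k: int) -> int:
        # Follow the diagonal while characters match (matches cost nothing).
        j = i - k
        while i < n and j < m and a[i] == b[j]:
            i, j = i + 1, j + 1
        return i

    wavefront = {0: extend(0, 0)}  # diagonal 0 after free matches
    s = 0
    while True:
        if wavefront.get(n - m, -1) >= n:   # reached the end of both strings
            return s
        s += 1
        nxt = {}
        for k in range(-s, s + 1):
            i = max(
                wavefront.get(k, -1) + 1,      # mismatch (substitution)
                wavefront.get(k - 1, -1) + 1,  # deletion from a
                wavefront.get(k + 1, -1),      # insertion into a
            )
            if 0 <= i <= n and 0 <= i - k <= m:
                nxt[k] = extend(i, k)
        wavefront = nxt
```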


Author(s):  
Kalyana Saravanan ◽  
Angamuthu Tamilarasi

Big data refers to collections of very large volumes of data from which similar data points must be extracted. Clustering is an essential data mining technique for examining such data, and several techniques have been developed for handling big datasets. However, existing methods compromise accuracy while incurring high time consumption and space complexity. In order to improve clustering accuracy with less complexity, the Sørensen-Dice Indexing based Weighted Iterative X-means Clustering (SDI-WIXC) technique is introduced. SDI-WIXC groups similar data points with higher clustering accuracy in minimal time. First, data points are collected from the big dataset. Then, along with a weight value, the dataset is partitioned into 'X' clusters. Next, Weighted Iterative X-means Clustering (WIXC) is applied to cluster the data points based on a similarity measure: the Sørensen-Dice indexing process measures the similarity between each cluster's weight value and the data points, and each data point is grouped into the cluster whose weight value it is most similar to. In addition, WIXC improves the cluster assignments through repeated subdivision using a Bayesian probability criterion. This in turn helps to group all data points and hence improves clustering accuracy. Experimental evaluation is carried out on factors such as clustering accuracy, clustering time and space complexity with respect to the number of data points. The experimental results report that the proposed SDI-WIXC technique obtains high clustering accuracy with minimum time as well as space complexity.
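As a point of reference for the similarity step, the sketch below shows one common continuous form of the Sørensen-Dice coefficient and how it could be used to assign points to the most similar cluster weight vector. The exact formulation used by SDI-WIXC may differ, and all names here are illustrative.

```python
import numpy as np

def dice_similarity(x: np.ndarray, w: np.ndarray) -> float:
    # Continuous Sorensen-Dice coefficient between a data point x and a
    # cluster weight vector w: 2<x,w> / (<x,x> + <w,w>).
    return 2.0 * float(x @ w) / (float(x @ x) + float(w @ w) + 1e-12)

def assign_to_clusters(points: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # Group each point into the cluster whose weight vector it is most
    # similar to, as in the assignment step described above.
    sims = np.array([[dice_similarity(p, w) for w in weights] for p in points])
    return sims.argmax(axis=1)
```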


Webology ◽  
2021 ◽  
Vol 18 (2) ◽  
pp. 166-182
Author(s):  
M. Anoop ◽  
P. Sripriya

Clustering is a general data mining task in which a large dataset is partitioned into dissimilar groups. The enormous growth of Geo-Social Networks (GeoSNs) involves users who create millions of heterogeneous data items with a variety of information, and analyzing such a volume of data is a challenging task. Clustering of this data is used to identify the frequently visited locations of users in Geo-Social Networks. In order to improve the clustering of large volumes of data, a novel technique called Extended Jaccard Indexive Buffalo Optimized Data Clustering (EJIBODC) is introduced for grouping data with high accuracy and less time consumption. The main aim of the EJIBODC technique is to partition a big dataset into different groups. In this technique, many clusters with centroids are initialized to group the data. After that, the Extended Jaccard Indexive Buffalo Optimization technique is applied to find the fittest cluster for grouping the data: the Extended Jaccard Index is applied within the Buffalo Optimization to measure the fitness between each data item and each centroid. Based on this similarity value, and using a gradient ascent function, each data item is assigned to the fittest cluster centroid. After that, the fitness value of each cluster is updated and all data are grouped into suitable clusters with high accuracy and a minimum error rate. An experimental procedure is carried out on a big geo-social dataset, testing different clustering algorithms. The discussion covers factors such as clustering accuracy, error rate, clustering time and space complexity with respect to the number of data items. Experimental outcomes demonstrate that the proposed EJIBODC technique achieves higher clustering accuracy and lower error rate, time consumption and space complexity when compared to previous related clustering techniques.
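For comparison with the previous sketch, the extended Jaccard (Tanimoto) similarity commonly used for real-valued vectors can be computed as below. This only illustrates the fitness measure named in the abstract, not the full Buffalo Optimization procedure, and the function names are assumptions.

```python
import numpy as np

def extended_jaccard(x: np.ndarray, c: np.ndarray) -> float:
    # Extended Jaccard (Tanimoto) similarity between a data item x and a
    # centroid c: <x,c> / (<x,x> + <c,c> - <x,c>).
    dot = float(x @ c)
    return dot / (float(x @ x) + float(c @ c) - dot + 1e-12)

def fittest_centroid(x: np.ndarray, centroids: np.ndarray) -> int:
    # Index of the centroid with the highest extended-Jaccard fitness for x.
    return int(np.argmax([extended_jaccard(x, c) for c in centroids]))
```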


Author(s):  
Izabella Stach

This paper proposes a new representation for the Public Help Index θ (briefly, PHI θ). The PHI θ index, based on winning coalitions, was introduced by Bertini et al. (2008). The goal of this article is to reformulate the PHI θ index using null player free winning coalitions. The set of these coalitions unequivocally defines a simple game. Expressing the PHI θ index by the winning coalitions that do not contain null players allows us to show, in a transparent way, the parts of the power assigned to null and non-null players in a simple game. Moreover, this new representation may imply a reduction of computational cost (in the sense of space complexity) in algorithms that compute the PHI θ index when at least one of the players is a null player. We also discuss some relationships among the Holler index, the PHI θ index, and the gnp index (based on null player free winning coalitions) proposed by Álvarez-Mozos et al. (2015).
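For readers unfamiliar with the index, the PHI θ index is usually written in terms of winning coalitions roughly as follows (standard form as recalled here, with assumed notation; the paper's null-player-free reformulation is not reproduced):

```latex
\theta_i(v) \;=\; \frac{|\mathcal{W}_i(v)|}{\sum_{j \in N} |\mathcal{W}_j(v)|},
\qquad
\mathcal{W}_i(v) \;=\; \{\, W \subseteq N : v(W) = 1,\ i \in W \,\},
```

where N is the set of players and W_i(v) denotes the winning coalitions of the simple game v that contain player i.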


2021 ◽  
Vol 28 (4) ◽  
pp. 225-240
Author(s):  
Mehmet Hakan Karaata

In this paper, we first coin a new graph-theoretic problem with numerous applications, called the diameter cycle problem. A longest cycle in a graph G = (V, E) is referred to as a diameter cycle of G iff the distance in G of every vertex on the cycle to the rest of the on-cycle vertices is maximal. We then present two algorithms for finding a diameter cycle of a biconnected graph. The first is an abstract, intuitive algorithm that uses a brute-force mechanism to expand an initial cycle by repeatedly replacing paths on the cycle with longer paths. The second is a concrete algorithm that uses fundamental cycles in the expansion process and has time and space complexity of O(n^6) and O(n^2), respectively. To the best of our knowledge, this problem has neither been defined nor addressed in the literature. The diameter cycle problem distinguishes itself from other cycle-finding problems by identifying cycles that are maximally long while maximizing the distances between vertices on the cycle. Existing cycle-finding algorithms, such as fundamental-cycle and longest-cycle algorithms, do not discover cycles in which the distances between vertices are maximized while the length of the cycle is also maximized.


Author(s):  
Oscar H. Ibarra ◽  
Jozef Jirásek ◽  
Ian McQuillan ◽  
Luca Prigioniero

This paper examines several measures of space complexity of variants of stack automata: non-erasing stack automata and checking stack automata. These measures capture the minimum stack size required to accept every word in the language of the automaton (weak measure), the maximum stack size used in any accepting computation on any accepted word (accept measure), and the maximum stack size used in any computation (strong measure). We give a detailed characterization of the accept and strong space complexity measures for checking stack automata. Exactly one of three cases can occur: the complexity is either bounded by a constant, behaves like a linear function, or cannot be bounded by any function of the length of the input word (and it is decidable which case occurs). However, this result does not hold for non-erasing stack automata; we provide an example where the space complexity grows proportionally to the square root of the length of the input. Furthermore, we study the complexity bounds of machines accepting a given language, and the decidability of space complexity properties.


2021 ◽  
pp. 1-23
Author(s):  
Jinfeng Wang ◽  
Shuaihui Huang ◽  
Fajian Jiang ◽  
Zhishen Zheng ◽  
Jianbin Ou ◽  
...  

The fuzzy integral is an excellent information fusion tool in data mining. It has clear advantages in handling combinations of features and has been applied successfully to classification problems. However, as the number of features grows, the time and space complexity of the fuzzy integral increase exponentially, which limits its wider adoption. This article proposes a high-efficiency fuzzy integral, the Parallel and Sparse Frame Based Fuzzy Integral (PSFI), which reduces the time and space complexity of computing fuzzy integrals by combining the distributed parallel computing framework Spark with the concept of sparse storage. To address the efficiency limitations of the Python language, Cython programming technology is also introduced, and the algorithm is packaged into a library to realize a more efficient PSFI. The experiments verify the impact of the number of parallel nodes on the performance of the algorithm, test the performance of PSFI in classification, and apply PSFI to regression problems and imbalanced big data classification. The results show that, as computing resources increase, PSFI reduces the variable storage requirements of datasets with many features by thousands of times. Furthermore, PSFI is shown to achieve higher prediction accuracy than the classic fuzzy integral running on a single processor.
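As background on why the cost grows exponentially, a discrete Choquet integral (one common fuzzy integral) needs a fuzzy measure defined on all 2^n feature subsets. The minimal sketch below illustrates that computation only; it is not the PSFI implementation, which distributes and sparsifies this over Spark, and the toy measure and values are made up.

```python
def choquet_integral(values, mu):
    """Discrete Choquet integral, one common form of fuzzy integral (sketch).

    values: dict mapping feature name -> score in [0, 1]
    mu: dict mapping frozenset of feature names -> fuzzy measure in [0, 1];
        a full measure has an entry for every subset (2^n entries), which is
        exactly the exponential blow-up that motivates sparse storage.
    """
    items = sorted(values.items(), key=lambda kv: kv[1])  # ascending scores
    remaining = set(values)
    total, prev = 0.0, 0.0
    for feature, score in items:
        total += (score - prev) * mu[frozenset(remaining)]
        prev = score
        remaining.remove(feature)
    return total

# Tiny usage example with three features (all numbers are made up):
scores = {"f1": 0.2, "f2": 0.5, "f3": 0.9}
measure = {frozenset(s): min(1.0, 0.4 * len(s)) for s in
           [(), ("f1",), ("f2",), ("f3",), ("f1", "f2"), ("f1", "f3"),
            ("f2", "f3"), ("f1", "f2", "f3")]}
print(choquet_integral(scores, measure))
```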


Author(s):  
Anshita Garg

This is a research-based project motivated by learning and implementing algorithms that reduce time and space complexity. In the first part of the project, we reduce the time taken to search a given record by using a B/B+ tree rather than indexing and traditional sequential access. Disk access is much slower than main memory access: typical seek times and rotational delays are on the order of 5 to 6 milliseconds, and typical data transfer rates are in the range of 5 to 10 million bytes per second, so main memory access is likely to be at least 4 or 5 orders of magnitude faster than disk access on any given system. The objective is therefore to minimize the number of disk accesses, and this project is concerned with techniques for achieving that objective, i.e. techniques for arranging the data on disk so that any required piece of data, say some specific record, can be located in as few I/Os as possible. In the second part of the project, dynamic programming problems were solved in four variations: recursion, recursion with storage, iteration with storage, and iteration with smaller storage. The problems solved in these four variations are Fibonacci, Count Maze Path, Count Board Path, and Longest Common Subsequence. Each variation improves on the previous one, so time and space complexity are reduced significantly as we go from plain recursion to iteration with smaller storage.
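To illustrate the four variations on the Fibonacci example named in the abstract, the sketch below shows plain recursion, recursion with storage (memoization), iteration with storage (tabulation), and iteration with smaller storage (two variables); the function names are illustrative.

```python
from functools import lru_cache

def fib_recursion(n: int) -> int:
    # Plain recursion: exponential time, O(n) call-stack space.
    return n if n < 2 else fib_recursion(n - 1) + fib_recursion(n - 2)

@lru_cache(maxsize=None)
def fib_recursion_with_storage(n: int) -> int:
    # Recursion with storage (memoization): O(n) time, O(n) space.
    return n if n < 2 else (fib_recursion_with_storage(n - 1)
                            + fib_recursion_with_storage(n - 2))

def fib_iteration_with_storage(n: int) -> int:
    # Iteration with storage (tabulation): O(n) time, O(n) space.
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

def fib_iteration_with_smaller_storage(n: int) -> int:
    # Iteration with smaller storage: O(n) time, O(1) space.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert (fib_recursion(10) == fib_recursion_with_storage(10)
        == fib_iteration_with_storage(10)
        == fib_iteration_with_smaller_storage(10) == 55)
```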


2021 ◽  
pp. 109821402095849
Author(s):  
Kirsten Kainz ◽  
Allison Metz ◽  
Noreen Yazejian

Large-scale education interventions aimed at diminishing disparities and generating equitable learning outcomes are often complex, involving multiple components and intended impacts. Evaluating implementation of complex interventions is challenging because of the interactive and emergent nature of intervention components. Methods that build from systems science have proven useful for addressing evaluation challenges in the complex intervention space. Complexity science shares some terminology with systems science, but the primary aims and methods of complexity science are different from those of systems science. In this paper we describe some of the language and ideas used in complexity science. We offer a set of priorities for evaluation of complex interventions based on language and ideas used in complexity science and methodologies aligned with the priorities.

