Ht-index for empirical evaluation of the sampled graph-based Discrete Pulse Transform

2020 ◽  
Vol 32 (2) ◽  
Author(s):  
Mark De Lancey ◽  
Inger Fabris-Rotelli

The Discrete Pulse Transform (DPT) decomposes a signal into pulses, with the most recent and effective implementation being a graph-based algorithm called the Roadmaker’s Pavage. Although this implementation is efficient, its theoretical structure results in a slow, deterministic algorithm. This paper examines the use of the spectral domain of graphs and designs graph filter banks to downsample the signal within the algorithm, investigating the extent to which this speeds it up. Because converting graph signals to the spectral domain is costly, estimation for filter banks is examined, as well as the design of a reusable filter bank. The sampled version requires hyperparameters to reconstruct the same image textures as the original algorithm, preventing a large-scale study. Here we provide an objective and efficient way of assessing how closely the results of the original algorithm and our proposed Filtered Roadmaker’s Pavage agree. The method makes use of the Ht-index, which separates the distribution of information into scale intervals. Empirical evaluation on benchmark datasets shows that the proposed algorithm consistently runs faster and uses fewer computational resources, while achieving a positive SSIM with low variance. This provides an informative and faster approximation to the nonlinear DPT, a property not standardly achievable.
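As a rough illustration of the Ht-index referenced above, the sketch below follows the standard head/tail-breaks definition: a distribution is split at its mean repeatedly, for as long as the "head" (values above the mean) stays a minority, and the number of splits gives the index. The `head_fraction_limit` cutoff and the Pareto test data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ht_index(values, head_fraction_limit=0.5):
    """Head/tail-breaks Ht-index of a value distribution (illustrative sketch).

    Counts how many times the values can be split at their mean while the
    head (values above the mean) remains a minority of the data.
    """
    values = np.asarray(values, dtype=float)
    ht = 1
    while values.size > 1:
        head = values[values > values.mean()]
        if head.size == 0 or head.size / values.size >= head_fraction_limit:
            break
        ht += 1
        values = head
    return ht

# Pulse-size distributions from a DPT-style decomposition are typically
# heavy-tailed; a heavy-tailed sample therefore yields a larger Ht-index.
rng = np.random.default_rng(0)
print(ht_index(rng.pareto(1.5, 10_000) + 1))
```

Applied to the pulse sizes of the original and the filtered decompositions, comparable Ht-index values would indicate that information is distributed over scale intervals in a similar way.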

Author(s):  
Tomer Lange ◽  
Joseph (Seffi) Naor ◽  
Gala Yadgar

Flash-based solid state drives (SSDs) have gained a central role in the infrastructure of large-scale datacenters, as well as in commodity servers and personal devices. The main limitation of flash media is its inability to support update-in-place: after data has been written to a physical location, it has to be erased before new data can be written to it. Moreover, SSDs support read and write operations at the granularity of pages, while erasures are performed on entire blocks, which often contain hundreds of pages. When erasing a block, any valid data it stores must be rewritten to a clean location. Since an SSD eventually wears out as the number of erasures grows, the efficiency of the management algorithm has a significant impact on its endurance. In this paper we first formally define the SSD management problem. We then explore this problem from an algorithmic perspective, considering it in both offline and online settings. In the offline setting, we present a near-optimal algorithm that, given any input, performs a negligible number of rewrites (relative to the input length). We also discuss the hardness of the offline problem. In the online setting, we first consider algorithms that have no prior knowledge about the input. We prove that no deterministic algorithm outperforms the greedy algorithm in this setting, and discuss the possible benefit of randomization. We then augment our model, assuming that each request for a page arrives with a prediction of the next time the page is updated. We design an online algorithm that uses such predictions, and show that its performance improves as the prediction error decreases. We also show that the performance of our algorithm is never worse than that guaranteed by the greedy algorithm, even when the prediction error is large. We complement our theoretical findings with an empirical evaluation of our algorithms, comparing them with the state-of-the-art scheme. The results confirm that our algorithms exhibit improved performance for a wide range of input traces.
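For readers unfamiliar with the greedy baseline mentioned above, the toy simulation below captures its core rule: when no clean page remains, erase the block holding the fewest valid pages and rewrite that valid data elsewhere. The block geometry, over-provisioning ratio, and random workload are illustrative assumptions, not the authors' model or implementation.

```python
import random

PAGES_PER_BLOCK = 64
NUM_BLOCKS = 32                   # physical blocks
LOGICAL_PAGES = 1800              # < 32 * 64 physical pages, i.e. some over-provisioning

class GreedyFTL:
    """Toy greedy flash translation layer (illustrative sketch)."""

    def __init__(self):
        self.valid = [set() for _ in range(NUM_BLOCKS)]   # valid logical pages per block
        self.used = [0] * NUM_BLOCKS                      # written slots, valid or stale
        self.where = {}                                   # logical page -> block index
        self.erasures = 0

    def write(self, page):
        old = self.where.pop(page, None)
        if old is not None:
            self.valid[old].discard(page)                 # previous copy becomes stale
        blk = self._free_block()
        self.valid[blk].add(page)
        self.used[blk] += 1
        self.where[page] = blk

    def _free_block(self):
        for b in range(NUM_BLOCKS):
            if self.used[b] < PAGES_PER_BLOCK:
                return b
        # No clean page anywhere: greedily reclaim the block with the fewest
        # valid pages and rewrite the still-valid data it holds.
        victim = min(range(NUM_BLOCKS), key=lambda b: len(self.valid[b]))
        survivors = list(self.valid[victim])
        self.valid[victim].clear()
        self.used[victim] = 0
        self.erasures += 1
        for p in survivors:
            del self.where[p]
            self.write(p)
        return self._free_block()

random.seed(0)
ftl = GreedyFTL()
for _ in range(50_000):
    ftl.write(random.randrange(LOGICAL_PAGES))
print("block erasures:", ftl.erasures)
```

The erasure and rewrite counts produced by such a simulation are the kind of cost measures the paper's offline and online analyses reason about.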


2021 ◽  
Vol 14 (11) ◽  
pp. 2327-2340
Author(s):  
Side Li ◽  
Arun Kumar

Many applications that use large-scale machine learning (ML) increasingly prefer different models for subgroups (e.g., countries) to improve accuracy, fairness, or other desiderata. We call this emerging popular practice learning over groups, analogizing to GROUP BY in SQL, albeit for ML training instead of SQL aggregates. From the systems standpoint, this practice compounds the already data-intensive workload of ML model selection (e.g., hyperparameter tuning). Often, thousands of models may need to be trained, necessitating high-throughput parallel execution. Alas, most ML systems today focus on training one model at a time or, at best, parallelizing hyperparameter tuning. This status quo leads to resource wastage, low throughput, and high runtimes. In this work, we take the first step towards enabling and optimizing learning over groups from the data systems standpoint for three popular classes of ML: linear models, neural networks, and gradient-boosted decision trees. Analytically and empirically, we compare standard approaches to execute this workload today: task parallelism and data parallelism. We find neither is universally dominant. We put forth a novel hybrid approach we call grouped learning that avoids redundancy in communications and I/O using a novel form of parallel gradient descent we call Gradient Accumulation Parallelism (GAP). We prototype our ideas in a system we call Kingpin, built on top of existing ML tools and the flexible massively parallel runtime Ray. An extensive empirical evaluation on large ML benchmark datasets shows that Kingpin matches or is 4x to 14x faster than state-of-the-art ML systems, including Ray's native execution and PyTorch DDP.
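The workload itself, independent of the paper's grouped-learning and GAP optimizations, can be illustrated with the task-parallel baseline the authors compare against: partition rows by a group key and train one model per group as an independent task. The column names, the scikit-learn estimator, and the joblib process pool below are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from sklearn.linear_model import LogisticRegression

def fit_group(key, frame, feature_cols, label_col):
    # Each group's model is trained independently of all other groups.
    model = LogisticRegression(max_iter=1000)
    model.fit(frame[feature_cols].to_numpy(), frame[label_col].to_numpy())
    return key, model

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "country": rng.choice(["US", "IN", "BR"], size=3000),
    "x1": rng.normal(size=3000),
    "x2": rng.normal(size=3000),
})
df["y"] = (df["x1"] + 0.5 * rng.normal(size=3000) > 0).astype(int)

# Task-parallel execution: one training job per GROUP BY key.
models = dict(
    Parallel(n_jobs=3)(
        delayed(fit_group)(key, group, ["x1", "x2"], "y")
        for key, group in df.groupby("country")
    )
)
print(sorted(models))
```

In this baseline each task moves its own slice of the data independently; the paper's grouped learning targets exactly the kind of communication and I/O redundancy that such per-group tasks incur.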


2021 ◽  
pp. 095679762097751
Author(s):  
Li Zhao ◽  
Jiaxin Zheng ◽  
Haiying Mao ◽  
Xinyi Yu ◽  
Jiacheng Ye ◽  
...  

Morality-based interventions designed to promote academic integrity are being used by educational institutions around the world. Although many such approaches have a strong theoretical foundation and are supported by laboratory-based evidence, they often have not been subjected to rigorous empirical evaluation in real-world contexts. In a naturalistic field study (N = 296), we evaluated a recent research-inspired classroom innovation in which students are told, just prior to taking an unproctored exam, that they are trusted to act with integrity. Four university classes were assigned to a proctored exam or one of three types of unproctored exam. Students who took unproctored exams cheated significantly more, which suggests that it may be premature to implement this approach in college classrooms. These findings point to the importance of conducting ecologically valid and well-controlled field studies that translate psychological theory into practice when introducing large-scale educational reforms.


2021 ◽  
Author(s):  
Parsoa Khorsand ◽  
Fereydoun Hormozdiari

Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to only specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, utilizes changes in the counts of k-mers to predict the genotypes of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
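A toy sketch of the general idea behind k-mer-count genotyping is given below; it is not Nebula's actual statistical model, k-mer selection, or thresholds. Signature k-mers of the alternate allele are counted in the reads, normalized by sequencing depth, and the resulting allele-fraction estimate is mapped to a genotype call.

```python
from collections import Counter

K = 21

def kmers(seq, k=K):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def genotype(alt_kmers, reads, depth):
    """Illustrative genotype call from counts of ALT-specific k-mers.

    alt_kmers: set of k-mers present only on the alternate allele
    reads:     iterable of read sequences
    depth:     expected per-base coverage of the sample
    """
    counts = Counter()
    for read in reads:
        for km in kmers(read):
            if km in alt_kmers:
                counts[km] += 1
    mean_count = sum(counts.values()) / max(len(alt_kmers), 1)
    fraction = mean_count / depth        # ~0 for 0/0, ~0.5 for 0/1, ~1 for 1/1
    if fraction < 0.25:
        return "0/0"
    if fraction < 0.75:
        return "0/1"
    return "1/1"
```

Because only k-mer counting is needed, no read mapping is performed, which is where the order-of-magnitude speed advantage over mapping-based genotypers comes from.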


2021 ◽  
Vol 28 ◽  
pp. 469-473
Author(s):  
Amir Miraki ◽  
Hamid Saeedi-Sourck ◽  
Nicola Marchetti ◽  
Arman Farhang

2019 ◽  
Vol 17 (06) ◽  
pp. 947-975 ◽  
Author(s):  
Lei Shi

We investigate distributed learning with a coefficient-based regularization scheme under the framework of kernel regression methods. Compared with classical kernel ridge regression (KRR), the algorithm under consideration does not require the kernel function to be positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods. The distributed learning approach partitions a massive data set into several disjoint data subsets, and then produces a global estimator by averaging the local estimators obtained on each data subset. The ease of constructing partitions, together with running the algorithm on each subset in parallel, leads to a substantial reduction in computation time compared with the standard approach of running the original algorithm on the entire sample. We establish the first minimax optimal rates of convergence for the distributed coefficient-based regularization scheme with indefinite kernels. We thus demonstrate that, compared with distributed KRR, the concerned algorithm is more flexible and effective in regression problems for large-scale data sets.
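A minimal numpy sketch of the scheme is given below: each disjoint subset is fitted with a coefficient-based (l2-penalized in the coefficients) kernel estimator, which remains well posed even when the kernel is indefinite, and the global estimator averages the local predictions. The sigmoid kernel, the regularization parameter, and the number of subsets are illustrative assumptions.

```python
import numpy as np

def indefinite_kernel(X, Z, sigma=1.0):
    # Sigmoid (tanh) kernel, a classical example that is not positive semi-definite.
    return np.tanh(X @ Z.T / sigma)

def fit_local(X, y, lam):
    # Coefficient-based regularization: min_a ||K a - y||^2 / n + lam * ||a||^2,
    # a ridge problem in the coefficients that does not require K to be PSD.
    K = indefinite_kernel(X, X)
    n = len(y)
    alpha = np.linalg.solve(K.T @ K / n + lam * np.eye(n), K.T @ y / n)
    return X, alpha

def predict(local_fits, Xnew):
    # Global estimator: average of the local estimators' predictions.
    preds = [indefinite_kernel(Xnew, Xtr) @ alpha for Xtr, alpha in local_fits]
    return np.mean(preds, axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=3000)
subsets = np.array_split(rng.permutation(3000), 10)     # disjoint partition of the data
fits = [fit_local(X[idx], y[idx], lam=1e-3) for idx in subsets]
print(predict(fits, np.linspace(-3, 3, 5).reshape(-1, 1)))
```

Each local solve involves only a subset-sized linear system rather than the full n-by-n problem, which is where the computational saving over the undistributed algorithm comes from.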


Smart Cities ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 662-685
Author(s):  
Stephan Olariu

Under present-day practices, the vehicles on our roadways and city streets are mere spectators that witness traffic-related events without being able to participate in mitigating their effects. This paper lays the theoretical foundations of a framework for harnessing the on-board computational resources in vehicles stuck in urban congestion in order to assist transportation agencies with preventing or dissipating congestion through large-scale signal re-timing. Our framework is called VACCS: Vehicular Crowdsourcing for Congestion Support in Smart Cities. What makes this framework unique is that we suggest that in such situations the vehicles have the potential to cooperate with various transportation authorities to solve problems that would otherwise either take an inordinate amount of time to solve or go unsolved for lack of adequate municipal resources. VACCS offers direct benefits to both the driving public and the Smart City. By developing timing plans that respond to current traffic conditions, overall traffic flow will improve, carbon emissions will be reduced, and the economic impacts of congestion on citizens and businesses will be lessened. It is expected that drivers will be willing to donate under-utilized on-board computing resources in their vehicles to develop improved signal timing plans in return for the direct benefits of time savings and reduced fuel consumption costs. VACCS allows the Smart City to dynamically respond to traffic conditions while simultaneously reducing investments in the computational resources that would be required for traditional adaptive traffic signal control systems.


Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


Author(s):  
Ashoka Jayawardena ◽  
Paul Kwan

In this paper, we focus on the design of oversampled filter banks and the resulting framelets. The framelets obtained exhibit improved shift-invariance properties over the decimated wavelet transform. Shift invariance has applications in many areas, particularly denoising, coding, and compression. Our contribution here is on filter bank completion. In addition, we propose novel factorization methods to design wavelet filters from given scaling filters.
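The shift-invariance benefit of an oversampled (undecimated) bank can be seen with a trivial two-channel Haar pair: with no decimation, shifting the input shifts every subband by the same amount, and perfect reconstruction is immediate. The filters below are purely illustrative and are not the framelets designed in the paper.

```python
import numpy as np

def analysis(x):
    lo = 0.5 * (x + np.roll(x, 1))   # lowpass channel, no downsampling
    hi = 0.5 * (x - np.roll(x, 1))   # highpass channel, no downsampling
    return lo, hi

def synthesis(lo, hi):
    return lo + hi                   # perfect reconstruction for this pair

rng = np.random.default_rng(0)
x = rng.normal(size=64)
lo, hi = analysis(x)
assert np.allclose(synthesis(lo, hi), x)

# Shift the input by one sample: every subband shifts by exactly one sample,
# which a critically decimated wavelet transform does not guarantee.
lo_s, hi_s = analysis(np.roll(x, 1))
assert np.allclose(lo_s, np.roll(lo, 1)) and np.allclose(hi_s, np.roll(hi, 1))
```

The redundancy (here a factor of two) is the price paid for this invariance, which is why applications such as denoising and coding accept oversampled designs.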


2021 ◽  
Vol 15 (6) ◽  
pp. 1-20
Author(s):  
Zhe Chen ◽  
Aixin Sun ◽  
Xiaokui Xiao

Community detection on network data is a fundamental task, and has many applications in industry. Network data in industry can be very large, with incomplete and complex attributes, and, more importantly, growing. This calls for a community detection technique that is able to handle both attribute and topological information on large-scale networks, and that is also incremental. In this article, we propose inc-AGGMMR, an incremental community detection framework that is able to effectively address the challenges arising from scalability, mixed attributes, incomplete values, and the evolution of the network. Through the construction of an augmented graph, we map attributes into the network by introducing attribute centers and belongingness edges. The communities are then detected by modularity maximization. During this process, we adjust the weights of belongingness edges to balance the contributions of attribute and topological information to the detection of communities. The weight adjustment mechanism enables incremental updates of the community membership of all vertices. We evaluate inc-AGGMMR on five benchmark datasets against eight strong baselines. We also provide a case study that incrementally detects communities on a PayPal payment network containing users with transactions. The results demonstrate inc-AGGMMR’s effectiveness and practicability.
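The augmented-graph construction can be illustrated with a small networkx sketch; this is not the authors' inc-AGGMMR implementation, and the dataset, the fixed belongingness weight w, and the greedy modularity routine are illustrative assumptions. Each attribute value becomes an attribute-center node, vertices receive weighted belongingness edges to the centers of their attribute values, and communities are then found by modularity maximization on the augmented graph.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def augmented_graph(G, attributes, w=0.5):
    """Add one attribute-center node per attribute value and connect each
    vertex to the centers of its values with belongingness edges of weight w."""
    A = G.copy()
    for node, values in attributes.items():
        for value in values:
            center = ("attr", value)
            A.add_edge(node, center, weight=w)
    return A

G = nx.karate_club_graph()
nx.set_edge_attributes(G, 1.0, "weight")                 # topological edges
attrs = {n: [G.nodes[n]["club"]] for n in G}             # one categorical attribute per vertex
A = augmented_graph(G, attrs, w=0.5)

# Modularity maximization on the augmented graph; report only original vertices.
communities = greedy_modularity_communities(A, weight="weight")
print([sorted(n for n in c if not isinstance(n, tuple)) for c in communities])
```

Raising or lowering w shifts the balance between attribute agreement and topological structure in the detected communities, which is the role the weight adjustment mechanism plays in the framework described above.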

