Debug Utility for Validation of OASIS Parser

In the EDA industry, the GDSII format is the de facto standard for IC layout data exchange. Modern designs may require up to a terabyte (1 TB) of disk data. This huge volume of data not only slows down runtimes from design through physical verification but also increases the time to get a design to market. The OASIS stream format, a replacement for GDSII, is relatively new and is emerging in the industry. OASIS significantly reduces the data volume, so tools that use it are likely to run faster. However, there has not been significant work on improving the robustness of its usage, owing to a lack of in-house test cases. This paper presents an approach to developing a debug utility for OASIS parser validation to increase its robustness. The debug utility is implemented using a Singleton design pattern. It enables us to compare the data associated with both stream formats and highlights the differences. The effective memory overhead of the proposed design is negligible, since all structures are created dynamically and destroyed after use. Iterative and unit testing were performed on the utility, and the proposed design was tested with real-world test cases to verify its robustness.
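
The abstract does not disclose the utility's actual interfaces, so the following is only a minimal sketch of the Singleton idea it describes: a single, lazily created comparison object that collects records from both streams, reports mismatches, and is released after use. All class, method, and record names are hypothetical.

```java
// Hypothetical sketch of a Singleton-style debug utility that compares
// records parsed from two layout streams (e.g., GDSII vs. OASIS) and
// reports the differences. Not the actual utility from the paper.
import java.util.HashMap;
import java.util.Map;

public final class StreamCompareDebugger {

    // Lazily created single instance, released after use, mirroring the
    // "create on demand, destroy after use" idea from the abstract.
    private static StreamCompareDebugger instance;

    private final Map<String, String> gdsiiRecords = new HashMap<>();
    private final Map<String, String> oasisRecords = new HashMap<>();

    private StreamCompareDebugger() { }

    public static synchronized StreamCompareDebugger getInstance() {
        if (instance == null) {
            instance = new StreamCompareDebugger();
        }
        return instance;
    }

    public void addGdsiiRecord(String key, String value) {
        gdsiiRecords.put(key, value);
    }

    public void addOasisRecord(String key, String value) {
        oasisRecords.put(key, value);
    }

    // Print every key whose value differs between the two streams.
    public void reportDifferences() {
        for (Map.Entry<String, String> e : gdsiiRecords.entrySet()) {
            String other = oasisRecords.get(e.getKey());
            if (!e.getValue().equals(other)) {
                System.out.printf("MISMATCH %s: gdsii=%s oasis=%s%n",
                        e.getKey(), e.getValue(), other);
            }
        }
    }

    // Release the instance so its structures can be garbage collected.
    public static synchronized void release() {
        instance = null;
    }
}
```

In such a design, callers would obtain the instance via getInstance(), feed it records from each parser, call reportDifferences(), and then release() it, so no state persists between runs.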

The value of unit testing lies in catching bugs at the lowest level of software development, where the cost of fixing them is low; detecting a bug at a later stage, in a large software module, is difficult and costly. Since a software program contains many individual units (functions), manual unit testing demands a great deal of effort and time, and much of that effort can be saved if unit testing is automated. Unit testing has three basic needs: identifying the inputs to a function, detecting runtime errors, and detecting logical errors. Concolic testing is used to identify the inputs to a function and to detect runtime errors, while mutation testing is used to detect logical errors. The identified inputs to a function are called test cases, and mutation testing can verify whether more test cases are needed for a function. Our software combines concolic and mutation testing and can automate the unit testing process for Java code to a great extent. The output of the work is a set of JUnit test cases in which the user can add his or her own assertions to every auto-generated test case.
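
The concrete output format of the tool is not given in the abstract; the snippet below merely illustrates the general shape of an auto-generated JUnit test whose inputs were found by concolic execution and whose assertion is supplied by the user. The Calculator class and the chosen inputs are hypothetical.

```java
// Illustrative shape of an auto-generated JUnit test case; the method under
// test (Calculator.divide) and the generated input values are hypothetical.
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class CalculatorDivideTest {

    @Test
    public void divide_inputsFoundByConcolicExecution() {
        // Inputs such as these would be discovered by concolic testing
        // to cover distinct execution paths of the function under test.
        int result = new Calculator().divide(10, 2);

        // The user completes the generated skeleton by adding the assertion.
        assertEquals(5, result);
    }
}

// Minimal class under test so the example is self-contained.
class Calculator {
    int divide(int a, int b) {
        return a / b;
    }
}
```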


Author(s):  
Manbir Sandhu ◽  
Purnima ◽  
Anuradha Saini

Big Data is a fast-growing technology with the scope to mine huge amounts of data for use in various analytic applications. With large amounts of data streaming in from a myriad of sources: social media, online transactions, and the ubiquity of smart devices, Big Data is garnering attention across stakeholders in academia, banking, government, health care, manufacturing, and retail. Big Data refers to an enormous amount of data generated from disparate sources, along with the data analytic techniques used to examine this voluminous data for predictive trends and patterns, to exploit new growth opportunities, to gain insight, to make informed decisions, and to optimize processes. Data-driven decision making is the essence of business establishments, and the explosive growth of data is steering business units to tap the potential of Big Data to fuel growth and gain a cutting edge over their competitors. The overwhelming generation of data, however, brings with it its share of concerns. This paper discusses the concept of Big Data, its characteristics, the tools and techniques deployed by organizations to harness the power of Big Data, and the daunting issues that hinder the adoption of Business Intelligence in Big Data strategies in organizations.


Author(s):  
Muhammad Waqar Khan ◽  
Muhammad Asghar Khan ◽  
Muhammad Alam ◽  
Wajahat Ali

During the past few years data has been growing exponentially, attracting researchers to work on a popular term, Big Data. Big Data is observed in various fields, such as information technology, telecommunication, theoretical computing, mathematics, data mining and data warehousing. Data science is frequently mentioned alongside Big Data, as it provides methods to scale down Big Data. Currently more than 3.2 billion of the world population is connected to the internet, of whom 46% connect via smartphones, and over 5.5 billion people use cell phones. As technology rapidly shifts from ordinary cell phones towards smartphones, the proportion of people using the internet is also growing. Forecasts suggest that by 2020 around 7 billion people worldwide will be using the internet, 52% of them via smartphones, and that by 2050 the figure will approach 95% of the world population. Every device connected to the internet generates data. Since the majority of this data is generated on smartphones through applications such as Instagram, WhatsApp, Apple, Google, Google+, Twitter, Flickr etc., this huge amount of data is becoming a serious threat for the telecom sector. This paper compares the amount of Big Data generated by the telecom industry. Based on the collected data, we use forecasting tools to predict the amount of Big Data that will be generated in the future and identify the threats the telecom industry will face from that huge amount of Big Data.
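
The abstract does not name the forecasting tools it uses; as an illustration only, a simple least-squares linear trend of the kind often used as a baseline forecast can be sketched as follows, with purely placeholder traffic figures.

```java
// Minimal least-squares linear-trend forecast. The yearly data volumes are
// illustrative placeholders, not the paper's measurements.
public class TrafficForecast {

    public static void main(String[] args) {
        double[] years = {2014, 2015, 2016, 2017, 2018};
        double[] exabytesPerMonth = {30, 45, 65, 95, 130};   // illustrative only

        // Fit y = a + b*x by ordinary least squares.
        int n = years.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += years[i];
            sy += exabytesPerMonth[i];
            sxx += years[i] * years[i];
            sxy += years[i] * exabytesPerMonth[i];
        }
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double a = (sy - b * sx) / n;

        // Extrapolate the fitted trend to a future year.
        double forecast2020 = a + b * 2020;
        System.out.printf("Linear-trend forecast for 2020: %.1f EB/month%n", forecast2020);
    }
}
```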


2021 ◽  
Vol 18 (4) ◽  
pp. 1-22
Author(s):  
Jerzy Proficz

Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PATs) are presented. The first, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and exploits an auxiliary background thread for early data exchange from faster processes to accelerate the all-gather operation. The second, Background Sorted Linear synchronized tree with Broadcast (BSLB), is built upon an existing PAP-aware gather algorithm, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast that distributes the gathered data to all participating processes. The background of the imbalanced-PAP problem is described, along with PAP monitoring and evaluation. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was run over 2,000 times on a typical HPC cluster with homogeneous compute nodes. The results are analyzed for different PATs, data sizes, and process counts, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce the all-gather elapsed time, in our case by up to a factor of 1.9, or 47%, in comparison with the best state-of-the-art solution.
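
For readers unfamiliar with the regular ring algorithm that BDR builds upon, the following single-process simulation sketches its communication schedule: in each of the P-1 steps, every process forwards the block it most recently obtained to its right neighbour. This is a schematic illustration only, not the authors' MPI implementation.

```java
// Single-process simulation of the regular ring all-gather schedule.
import java.util.Arrays;

public class RingAllGatherDemo {
    public static void main(String[] args) {
        int P = 4;                       // number of simulated processes
        int[][] buffers = new int[P][P]; // buffers[p][i] = block i held by process p

        // Initially process p holds only its own block (value 100 + p).
        for (int p = 0; p < P; p++) {
            Arrays.fill(buffers[p], -1);
            buffers[p][p] = 100 + p;
        }

        // In step s, process p forwards the block it obtained in the previous
        // step, i.e. block (p - s) mod P, to its right neighbour (p + 1) mod P.
        for (int s = 0; s < P - 1; s++) {
            int[][] next = new int[P][];
            for (int p = 0; p < P; p++) next[p] = buffers[p].clone();
            for (int p = 0; p < P; p++) {
                int blockToSend = Math.floorMod(p - s, P);
                int dest = (p + 1) % P;
                next[dest][blockToSend] = buffers[p][blockToSend];
            }
            buffers = next;
        }

        // After P - 1 steps, every process holds all P blocks.
        for (int p = 0; p < P; p++) {
            System.out.println("process " + p + ": " + Arrays.toString(buffers[p]));
        }
    }
}
```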


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Data variety is one of the most important features of Big Data. It is the result of aggregating data from multiple sources and of the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU usage. This issue has been overlooked in previous work. To overcome it, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation, considering two types of deadlines as constraints. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of datasets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets; based on these results, DV-DVFS can achieve up to a 15% improvement in energy consumption.
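
The paper's own time and frequency models are not given in the abstract; the sketch below only illustrates the general idea of the frequency-selection step, picking the lowest available frequency that still meets the deadline under a simple inverse-scaling assumption. All frequency levels and numbers are placeholders.

```java
// Illustrative frequency selection for DVFS: choose the lowest available
// frequency whose estimated runtime still meets the deadline, assuming
// runtime scales inversely with frequency (a simplifying assumption, not
// the paper's actual model).
public class DvfsFrequencyPicker {

    public static double pickFrequency(double estSecondsAtMaxFreq,
                                       double deadlineSeconds,
                                       double maxFreqGHz,
                                       double[] availableFreqsGHz) {
        // Total work expressed in GHz-seconds (cycles, up to a constant).
        double work = estSecondsAtMaxFreq * maxFreqGHz;
        double best = maxFreqGHz;
        for (double f : availableFreqsGHz) {
            double runtime = work / f;            // runtime under the inverse model
            if (runtime <= deadlineSeconds && f < best) {
                best = f;                         // lowest frequency meeting the deadline
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] freqs = {1.2, 1.6, 2.0, 2.4, 2.8};   // GHz, illustrative levels
        double chosen = pickFrequency(120.0, 200.0, 2.8, freqs);
        System.out.println("Chosen frequency: " + chosen + " GHz");
    }
}
```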


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Sara Migliorini ◽  
Alberto Belussi ◽  
Elisa Quintarelli ◽  
Damiano Carra

The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel to independent chunks of data. Consequently, overall performance greatly depends on how the data are partitioned among the various computation nodes. The default partitioning technique provided by systems such as Hadoop or Spark basically performs a random subdivision of the input records, without considering their nature or the correlations between them. While such an approach can be appropriate in the simplest case, where all the input records always have to be analyzed, it becomes a limitation for more sophisticated analyses, in which correlations between records can be exploited to prune unnecessary computations in advance. In this paper we design a context-based multi-dimensional partitioning technique, called CoPart, which takes data correlation into account in order to determine how records are subdivided between splits (i.e., the units of work assigned to a computation node). More specifically, it considers not only the correlation of the data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
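
To make the partitioning hook concrete: in Hadoop, a technique such as CoPart would be plugged in where the default hash-based partitioner normally sits, by providing a custom Partitioner. The grouping rule below (treating the first key field as a contextual dimension) is a placeholder and is not the CoPart algorithm itself.

```java
// Sketch of a context-aware Hadoop Partitioner: correlated records (same
// contextual dimension) are routed to the same partition instead of being
// spread randomly. The "contextDimension:recordId" key convention is a
// hypothetical placeholder.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ContextAwarePartitioner extends Partitioner<Text, Text> {

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Group by the contextual dimension encoded in the key prefix so
        // that correlated records land in the same split.
        String context = key.toString().split(":", 2)[0];
        return (context.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Such a class would be registered on the job with job.setPartitionerClass(ContextAwarePartitioner.class); the CoPart contribution lies in how the context-aware subdivision itself is computed, which this sketch does not reproduce.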


Author(s):  
Preeti Arora ◽  
Deepali Virmani ◽  
P.S. Kulkarni

Sentiment analysis is the pre-eminent technology for extracting relevant information from a data domain. In this paper, the cross-domain sentiment classification approach Cross_BOMEST is proposed. The approach extracts positive (+ve) words using the existing BOMEST technique; with the help of MS Word Interop, Cross_BOMEST determines +ve words and replaces their synonyms to escalate the polarity, blends two different domains, and detects all the self-sufficient words. The proposed algorithm is executed on Amazon datasets, where two domains are used for training in order to analyze the sentiment of reviews from the remaining domain. The approach yields promising results in cross-domain analysis, obtaining an accuracy of 92%, and Cross_BOMEST improves the precision and recall of BOMEST by 16% and 7%, respectively.
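
As a rough illustration of the "replace synonyms to escalate polarity" step, and under the assumption of a toy synonym map (the paper relies on the BOMEST lexicon and MS Word Interop, which are not reproduced here), the idea can be sketched as follows.

```java
// Toy illustration: map every synonym of a known positive (+ve) word back to
// that word, so different paraphrases of the same sentiment reinforce a
// single escalated polarity count. Word lists are illustrative only.
import java.util.HashMap;
import java.util.Map;

public class PolarityEscalationDemo {
    public static void main(String[] args) {
        Map<String, String> synonymToPositive = new HashMap<>();
        synonymToPositive.put("great", "good");
        synonymToPositive.put("excellent", "good");
        synonymToPositive.put("fine", "good");

        String review = "excellent battery and great screen , fine price";

        Map<String, Integer> polarityCount = new HashMap<>();
        for (String token : review.split("\\s+")) {
            String canonical = synonymToPositive.getOrDefault(token, token);
            if (canonical.equals("good")) {
                polarityCount.merge("good", 1, Integer::sum);
            }
        }
        // Three different surface words contribute to one escalated +ve score.
        System.out.println("+ve polarity score: " + polarityCount.get("good"));
    }
}
```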


First Monday ◽  
2019 ◽  
Author(s):  
James Brusseau

Compartmentalizing our distinct personal identities is increasingly difficult in the reality of big data. Pictures of the person we were on past vacations resurface in employers' Google searches; LinkedIn, which exhibits our income level, is increasingly used as a dating site. Whether on vacation, at work, or seeking romance, our digital selves stream together. One result is that a perennial ethical question about personal identity has spilled out of philosophy departments and into the real world. Ought we possess one unified identity that coherently integrates the various aspects of our lives, or incarnate deeply distinct selves suited to different occasions and contexts? At bottom, are we one, or many? The question is not only palpable today but also urgent, because if a decision is not made by us, the forces of big data and surveillance capitalism will make it for us by compelling unity. Speaking in favor of the big data tendency, Facebook's Mark Zuckerberg promotes the ethics of an integrated identity, a single version of selfhood maintained across diverse contexts and human relationships. This essay goes in the other direction by sketching two ethical frameworks arranged to defend our compartmentalized identities, which amounts to promoting the dis-integration of our selves. One framework connects with natural law, the other with language, and both aim to create a sense of selfhood that breaks away from its own past and from the unifying powers of big data technology.


Author(s):  
Guilherme Cavalcante Silva

Over the last few years, data studies within the social sciences have seen a growth in the number of works highlighting the need for more fruitful participation from the Global South in the debates of the field. The lack of Southern voices in the academic scholarship, on the one hand, and of recognition of the importance and autonomy of local data practices, such as those of indigenous data movements, on the other, has been decisive in establishing a Big Data in the South agenda. This paper presents an analytical mapping of 131 articles published from 2014 to 2016 in Big Data & Society (BD&S), a leading journal acknowledged for its pioneering promotion of Big Data research among social scientists. Its goal is to provide an overview of how data practices are approached in BD&S papers with respect to their geopolitical setting. It argues that there is a tendency to generalise data practices, overlooking the specific consequences of Big Data in Southern contexts, because of an almost exclusive presence of Euroamerican perspectives in the journal, and that this is the result of an epistemological asymmetry that pervades the social sciences.

