WIKI STREAMS: Wikipedia Article Recent Edit Retrieval System using Hierarchical Stream Clustering

Author(s):  
Arun Manicka Raja M ◽  
Swamynathan Sankaranarayanan

Abstract: Stream analytics, a new paradigm in data analytics, has gained momentum due to the voluminous generation of stream data. With the huge increase in the edits performed on Wikipedia topics, it is tedious for digital knowledge discovery users to find their domain updates immediately. Users need to go through large amounts of information and spend considerable time to find the potentially relevant data. There is a need to retrieve Wikipedia edits based on the metadata of the article edits for later retrieval. Hence, a clustering technique may be employed to group the Wikipedia article edits domain-wise. In this paper, hierarchical stream clustering is applied in order to retrieve the edits based on user interest. Data collected from Wikipedia over a period of one month is used as the dataset. Our method is compared with the state-of-the-art clustering system WikiAutoCat, and it is observed that the accuracy is improved by 10% and the clustering time is reduced by 20%.
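
To make the clustering step concrete, the following is a minimal sketch of domain-wise hierarchical clustering of edit metadata. It is an illustrative assumption, not the WIKI STREAMS implementation: the metadata fields, the TF-IDF representation, and the number of clusters are all placeholders.

```python
# Minimal sketch: grouping Wikipedia edit records by domain using
# agglomerative (hierarchical) clustering of their metadata text.
# The field names ("title", "comment") and the TF-IDF + Ward pipeline
# are illustrative assumptions, not the paper's actual system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

edits = [
    {"title": "Apache Spark", "comment": "updated streaming API section"},
    {"title": "Apache Storm", "comment": "fixed topology example"},
    {"title": "Photosynthesis", "comment": "added chlorophyll details"},
    {"title": "Cell biology", "comment": "expanded mitochondria paragraph"},
]

# Represent each edit by the text of its metadata.
docs = [e["title"] + " " + e["comment"] for e in edits]
X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

# Hierarchical clustering into domain-wise groups (2 domains assumed here).
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

for edit, label in zip(edits, labels):
    print(label, edit["title"])
```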

2019 ◽  
Vol 8 (4) ◽  
pp. 8593-8596

The deployment of Internet of Things (IoT) technologies in real life has scaled up data generation to huge volumes produced at high velocity, and thus a new issue has emerged: the management and analytics of this big IoT stream data. In order to optimize the performance of IoT machines and of the services provided by vendors, industry gives high priority to analysing this big IoT stream data to survive in the competitive global environment. These analyses are carried out by a number of applications using various data analytics frameworks, which need to obtain valuable information intelligently from a large amount of data produced in real time. This paper discusses the challenges and issues faced by distributed stream analytics frameworks at the data processing level and recommends a possible scalable framework that can adapt to the volume and velocity of big IoT stream data. Experiments focus on evaluating the performance of three distributed stream analytics frameworks, namely Apache Spark, Splunk, and Apache Storm, over large IoT stream data, with latency and throughput as parameters with respect to concurrency. The outcome of the paper is to identify the best existing framework and to recommend a possible scalable framework.
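
As a concrete illustration of the kind of measurement involved, the sketch below uses Spark Structured Streaming's built-in rate source as a stand-in for an IoT stream and reads throughput figures from the query progress. It is a minimal skeleton under stated assumptions, not the paper's actual benchmark harness; the row rate and trigger interval are arbitrary.

```python
# Minimal sketch of a throughput observation with Spark Structured Streaming,
# using the built-in "rate" source as a stand-in for an IoT stream.
# Illustrative benchmark skeleton only; rows-per-second and trigger interval
# are arbitrary assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-stream-throughput").getOrCreate()

# Synthetic stream: "value" increments at roughly 10,000 rows per second.
stream = (spark.readStream.format("rate")
          .option("rowsPerSecond", 10000)
          .load())

# A trivial windowed aggregation standing in for the analytics workload.
counts = stream.groupBy(F.window("timestamp", "5 seconds")).count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="5 seconds")
         .start())

# query.lastProgress exposes processedRowsPerSecond and batch durations,
# which correspond to the throughput and latency parameters studied here.
query.awaitTermination(30)
query.stop()
```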


Author(s):  
Pijush Kanti Dutta Pramanik ◽  
Saurabh Pal ◽  
Moutan Mukhopadhyay

Like other fields, the healthcare sector has also been greatly impacted by big data. A huge volume of healthcare data and other related data is being continually generated from diverse sources. Tapping and analysing these data suitably would open up new avenues and opportunities for healthcare services. In view of that, this paper aims to present a systematic overview of big data and big data analytics applicable to modern-day healthcare. Acknowledging the massive upsurge in healthcare data generation, various 'V's specific to healthcare big data are identified. Different types of data analytics applicable to healthcare are discussed. Along with presenting the technological backbone of healthcare big data and analytics, the advantages and challenges of healthcare big data are meticulously explained. A brief report on the present and future market for healthcare big data and analytics is also presented. In addition, several applications and use cases are discussed in detail.


2019 ◽  
Vol 5 (2) ◽  
pp. 76-82
Author(s):  
Cornelius Mellino Sarungu ◽  
Liliana Liliana

Project management practice uses many tools to support the process of recording and tracking the data generated along the whole project. Project analytics provides deeper insights to be used in decision making. To conduct project analytics, one should explore the tools and techniques required. The most common tool is Microsoft Excel. Its simplicity and flexibility allow the project manager or project team members to use it for almost any kind of activity. We combine MS Excel with R Studio to bring data analytics into the project management process. While the data input process still follows the old way that the project manager is already familiar with, the analytics engine can extract data from it and create visualizations of the needed parameters in a single output report file. This kind of approach delivers a low-cost project analytics solution for the organization. We can implement it with relatively low-cost technology on one side, some of it free, while maintaining a simple data generation process. This solution can also be proposed to improve project management process maturity to the next level, such as CMMI level 4, which promotes project analytics. Index Terms: project management, project analytics, data analytics.
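
For readers who want to see the shape of such a pipeline, here is an analogous sketch in Python (the paper itself pairs Excel with R Studio): task data maintained in a familiar Excel sheet is read, simple project metrics are computed, and a single report artifact is produced. The file name, sheet name, and column names are hypothetical.

```python
# Analogous sketch (the paper uses R Studio): read task data from an Excel
# workbook, compute simple project metrics, and emit one report file.
# "project_tracking.xlsx", the "Tasks" sheet, and the column names are
# hypothetical placeholders. Reading .xlsx requires the openpyxl engine.
import pandas as pd
import matplotlib.pyplot as plt

tasks = pd.read_excel("project_tracking.xlsx", sheet_name="Tasks")

# Basic project analytics: completion rate and effort variance per task.
completion_rate = (tasks["Status"] == "Done").mean()
tasks["Effort Variance"] = tasks["Actual Hours"] - tasks["Planned Hours"]

fig, ax = plt.subplots(figsize=(8, 4))
tasks.plot.bar(x="Task", y="Effort Variance", ax=ax, legend=False)
ax.set_title(f"Effort variance per task (completion rate: {completion_rate:.0%})")
fig.tight_layout()
fig.savefig("project_report.png")   # single output report artifact
```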


Author(s):  
Nurshazwani Muhamad Mahfuz ◽  
Marina Yusoff ◽  
Zakiah Ahmad

Clustering plays a prime role as an unsupervised learning method in data analytics, assisting many real-world problems such as image segmentation, object recognition, and information retrieval. Traditional clustering techniques often struggle because non-optimal results arise from the presence of outliers and noisy data. This paper provides a review of single clustering methods that have been applied in various domains. The aim is to identify potentially suitable applications and aspects for improvement of the methods. Three categories of single clustering methods are suggested, and it would be beneficial for researchers to see the clustering aspects as well as to determine the requirements for employing a clustering method, based on the state of the art of previous research findings.
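
The sensitivity of traditional clustering to outliers, which motivates much of the review, can be seen in a few lines: in the sketch below (synthetic data and illustrative parameters, not drawn from the paper) k-means forces the injected outliers into clusters, whereas a density-based method flags them as noise.

```python
# Small illustration of the review's point that outliers degrade a
# traditional clustering technique: k-means assigns every point, including
# outliers, to a cluster and shifts the centroids, while DBSCAN marks the
# outliers as noise (label -1). The synthetic data are an assumption.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)
outliers = np.array([[10.0, 10.0], [-10.0, 8.0], [12.0, -9.0]])
X_noisy = np.vstack([X, outliers])

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_noisy)
dbscan_labels = DBSCAN(eps=0.9, min_samples=5).fit_predict(X_noisy)

print("k-means labels of outliers:", kmeans_labels[-3:])   # forced into clusters
print("DBSCAN labels of outliers: ", dbscan_labels[-3:])   # -1 means noise
```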


2011 ◽  
Vol 29 (6) ◽  
pp. 817-825 ◽  
Author(s):  
Muhammad Khurram Zahoor

Reservoir surveillance always requires fast, unproblematic access to, and solution of, the different relative permeability models that have been developed from time to time. In addition, complex models sometimes require in-depth mathematical knowledge before they can be used for data generation. For this purpose, in-house software has been designed to generate rigorous relative permeability curves, with a provision for users to include their own relative permeability models, apart from various built-in relative permeability correlations. The developed software, with state-of-the-art algorithms, has been used to analyse the effect of variations in residual and maximum wetting phase saturation on relative permeability curves for a porous medium having very high non-uniformity in pore size distribution. To further widen the scope of the study, two relative permeability models, i.e., Pirson's correlation and the Brooks and Corey model, have been used, and the obtained results show that the latter model is more sensitive to such variations.
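
For reference, the sketch below implements the standard textbook Brooks-Corey relative permeability correlation and varies the residual wetting phase saturation, which is the kind of sensitivity study the abstract describes. It is not the in-house software; the pore size distribution index and saturation values are assumed.

```python
# Minimal sketch of the standard Brooks-Corey relative permeability
# correlation, showing how the curves shift when the residual wetting phase
# saturation (Swr) is varied. Generic textbook form, not the paper's
# in-house software; lambda (pore size distribution index) is assumed.
import numpy as np

def brooks_corey(sw, swr, lam):
    """Return (krw, krnw) for an array of wetting phase saturations sw."""
    se = np.clip((sw - swr) / (1.0 - swr), 0.0, 1.0)            # effective saturation
    krw = se ** ((2.0 + 3.0 * lam) / lam)                        # wetting phase
    krnw = (1.0 - se) ** 2 * (1.0 - se ** ((2.0 + lam) / lam))   # non-wetting phase
    return krw, krnw

sw = np.linspace(0.0, 1.0, 11)
for swr in (0.10, 0.20, 0.30):        # sensitivity to residual saturation
    krw, krnw = brooks_corey(sw, swr, lam=2.0)
    print(f"Swr={swr:.2f}  krw(Sw=0.6)={krw[6]:.3f}  krnw(Sw=0.6)={krnw[6]:.3f}")
```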


Author(s):  
Pethuru Raj

The implications of the digitization process, among a bevy of trends, are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space: how the tremendous amount of data produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly being refactored, remedied, and readied for the impending big data-induced challenges, how the big data analytics discipline is likely to move towards fulfilling the digital universe's requirement of extracting and extrapolating actionable insights for the knowledge-parched, and finally, how the envisioned smarter planet can be established and sustained.


Web Services ◽  
2019 ◽  
pp. 2161-2171
Author(s):  
Miltiadis D. Lytras ◽  
Vijay Raghavan ◽  
Ernesto Damiani

Big Data and Data Analytics form a brand-new paradigm for the integration of Internet technology in the human and machine context. For the first time in the history of humankind, we are able to transform raw data that are massively produced by humans and machines into knowledge and wisdom capable of supporting smart decision making, innovative services, new business models, innovation, and entrepreneurship. For Web Science research, this is a new methodological and technological spectrum of advanced methods, frameworks, and functionalities never experienced before. At the same time, communities outside Web Science need to realize the potential of this new paradigm with the support of sound new business models and a critical shift in the perception of decision making. In this short visioning article, the authors analyze the main aspects of Big Data and Data Analytics research and provide their own metaphor for the coming years. A number of research directions are outlined, as well as a new roadmap towards the evolution of Big Data to Smart Decisions and Cognitive Computing. The authors hope that readers will react and propose their own value propositions for the domain, initiating a scientific dialogue beyond self-fulfilled expectations.


Biotechnology ◽  
2019 ◽  
pp. 1967-1984
Author(s):  
Dharmendra Trikamlal Patel

Voluminous data are being generated by various means. The Internet of Things (IoT) has emerged recently to connect all the man-made things around us. Owing to intelligent devices, the annual growth of data generation has increased rapidly, and it is expected that by 2020 it will reach more than 40 trillion GB. Data generated by devices are in unstructured form, and traditional techniques of descriptive and predictive analysis are not sufficient for them. Big Data Analytics has emerged to perform descriptive and predictive analysis on such voluminous data. This chapter first introduces Big Data Analytics, which is essential in the bioinformatics field, as the size of the human genome data sometimes reaches 200 GB. The chapter then deals with the different types of big data in bioinformatics, describes several problems and challenges related to big data in bioinformatics, and finally covers techniques of Big Data Analytics in the bioinformatics field.


2020 ◽  
Vol 36 (11) ◽  
pp. 3516-3521 ◽  
Author(s):  
Lixiang Zhang ◽  
Lin Lin ◽  
Jia Li

Abstract
Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community.
Results: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters compared to state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.
Availability and implementation: The method is implemented in an R package called OTclust, available on CRAN.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
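
As a conceptual companion (the actual CPS method is distributed as the R package OTclust, whose API is not reproduced here), the following Python sketch shows one common way to quantify cluster uncertainty: re-cluster bootstrap resamples and record how often pairs of points land in the same cluster. The clustering algorithm, data, and number of resamples are illustrative assumptions, not the CPS algorithm itself.

```python
# Conceptual sketch of cluster uncertainty via bootstrap co-membership.
# Not the OTclust/CPS implementation; k-means, the synthetic data, and the
# number of resamples are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=1.0, random_state=1)
n, k, n_boot = len(X), 3, 50
co_cluster = np.zeros((n, n))

rng = np.random.default_rng(0)
for b in range(n_boot):
    idx = rng.choice(n, size=n, replace=True)                  # bootstrap resample
    km = KMeans(n_clusters=k, n_init=10, random_state=b).fit(X[idx])
    labels = km.predict(X)                                      # assign all original points
    co_cluster += (labels[:, None] == labels[None, :])

stability = co_cluster / n_boot   # pairwise co-membership frequency in [0, 1]
print("mean pairwise co-membership:", round(float(stability.mean()), 3))
```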

