The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, New York, 2001. No. of pages: xvi+533. ISBN 0-387-95284-5

2004 ◽  
Vol 23 (3) ◽  
pp. 528-529 ◽  
Author(s):  
Hans C. Van Houwelingen

2018 ◽  
Vol 02 (02) ◽  
pp. 1850015 ◽  
Author(s):  
Joseph R. Barr ◽  
Joseph Cavanaugh

It is not unusual for the effort spent validating a statistical model to exceed the effort spent building it. Multiple techniques are used to validate, compare, and contrast competing statistical models: some are concerned with a model's ability to predict new data, while others are concerned with how well the model describes the data at hand. Without claiming to provide a comprehensive view of the landscape, in this paper we touch on both aspects of model validation. There is much more to the subject, and the reader is referred to any of the many classical statistical texts, including the revised two volumes of Bickel and Doksum (2016), the one by Hastie, Tibshirani, and Friedman [The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. (Springer, 2009)], and several others listed in the bibliography.
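The two validation perspectives mentioned in the abstract can be made concrete with a minimal holdout sketch: fit a model on part of the data, then compare its in-sample error (descriptiveness) against its error on held-out points (predictive ability). The toy data, the split, and the simple least-squares fit below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: descriptiveness (in-sample fit) vs. predictive
# validation (holdout error) for a simple least-squares line.
# Data and split are illustrative, not from the paper.

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def mse(xs, ys, a, b):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data: roughly y = 2x + 1 with small perturbations.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 13.2, 14.9]

# Holdout split: fit on the first six points, validate on the last two.
a, b = fit_line(xs[:6], ys[:6])
in_sample = mse(xs[:6], ys[:6], a, b)    # how well the model describes the data
out_sample = mse(xs[6:], ys[6:], a, b)   # how well it predicts new data
```

In practice the same contrast is drawn with cross-validation or information criteria rather than a single holdout split, but the distinction being measured is the one above.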


Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has various applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (often called k-means) suffer from convergence to a local optimum and carry no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering on stream data. In each step, a reduction determines a coreset of the inputs and bounds the error, where P denotes the number of input points, according to the nested property of coresets; hence a small reduction in the error of the final coreset makes it n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets, where it outperforms competitive algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
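The merge-and-reduce scheme that underlies coreset-based streaming k-means (the common backbone of StreamKM++ and BICO) can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: a greedy farthest-point summary with aggregated weights stands in for a proper coreset construction, and the bucket size m is an arbitrary parameter.

```python
# Illustrative merge-and-reduce sketch for streaming k-means coresets.
# Points arrive in buckets of size m; whenever two buckets of the same
# level exist, they are merged and reduced back to m weighted points.
# The "reduce" step here is a simplified farthest-point summary, NOT a
# theoretically guaranteed coreset construction.

import math

def reduce_coreset(points, weights, m):
    """Summarize weighted points down to at most m weighted representatives."""
    # Greedy farthest-point selection of representatives.
    reps = [points[0]]
    while len(reps) < min(m, len(points)):
        far = max(points, key=lambda p: min(math.dist(p, r) for r in reps))
        reps.append(far)
    # Fold each point's weight into its nearest representative,
    # so the total weight of the summary is preserved.
    new_w = [0.0] * len(reps)
    for p, w in zip(points, weights):
        i = min(range(len(reps)), key=lambda j: math.dist(p, reps[j]))
        new_w[i] += w
    return reps, new_w

def stream_coresets(stream, m):
    """Merge-and-reduce: keep at most one bucket per level."""
    levels = {}   # level -> (points, weights)
    buf = []
    for p in stream:
        buf.append(p)
        if len(buf) == m:                 # a full level-0 bucket
            pts, wts = buf, [1.0] * m
            lvl = 0
            while lvl in levels:          # merge equal-level buckets upward
                qp, qw = levels.pop(lvl)
                pts, wts = reduce_coreset(pts + qp, wts + qw, m)
                lvl += 1
            levels[lvl] = (pts, wts)
            buf = []
    # Flatten the remaining buckets (plus any leftover buffer).
    pts, wts = buf[:], [1.0] * len(buf)
    for qp, qw in levels.values():
        pts += qp
        wts += qw
    return reduce_coreset(pts, wts, m)

# Example: summarize a 100-point 2-D stream with coresets of size 8.
pts2d = [(float(i % 10), float(i // 10)) for i in range(100)]
reps, w = stream_coresets(pts2d, 8)   # total weight equals the 100 inputs
```

Because every bucket covers twice as many original points as the one below it, the structure holds only O(log n) buckets at a time, which is what makes the coreset approach practical for stream data.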


Web Services ◽  
2019 ◽  
pp. 105-126
Author(s):  
N. Nawin Sona

This chapter aims to give an overview of the wide range of Big Data approaches and technologies available today. The data features of Volume, Velocity, and Variety are examined against new database technologies. It explores the complexity of data types; methodologies of storage, access, and computation; current and emerging trends in data analysis; and methods of extracting value from data. It aims to address the need for clarity regarding the future of RDBMS and the newer systems, and it highlights ways in which actionable insights can be built into public sector domains using techniques such as machine learning, data mining, and predictive analytics.

