k-Anonymization with Minimal Loss of Information

Author(s):  
Aristides Gionis ◽  
Tamir Tassa
Author(s):  
Sanah Nashir Sayyed ◽  
Namrata Mahender C.

Summarization is the process of selecting representative data to produce a reduced version of the given data with minimal loss of information; it can be applied to text, images, videos, and speech. The chapter covers not only the concepts of text summarization (types, stages, issues, and criteria) but also its applications. It discusses the two main categories of approaches generally used in text summarization, abstractive and extractive. Abstractive techniques use linguistic methods to interpret the text and produce understandable, semantically equivalent sentences of shorter length. Extractive techniques mostly rely on statistical methods to extract the essential sentences from the given text. In addition, the authors explore the SACAS model to exemplify the summarization process. The SACAS system analyzed 50 stories, and its evaluation is presented in terms of a new measure based on question-answering MOS, which is also introduced in this chapter.
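To make the extractive idea concrete, here is a minimal sketch of a generic word-frequency extractive summarizer. It is not the SACAS model and not the chapter's method; the stopword list, the naive sentence splitter, and the function names (split_sentences, content_words, extractive_summary) are illustrative assumptions. Each sentence is scored by the average document frequency of its content words, and the top-k sentences are returned in their original order.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "its", "of", "on", "or", "that", "the", "to",
             "was", "were", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Stable sort: ties keep their original sentence order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, extractive_summary(text, k=2) on a short story keeps the two sentences built from its most frequent content words and drops sentences about incidental details. Statistical scoring of this kind never rewrites a sentence, which is exactly what separates extractive from abstractive techniques.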


PIERS Online ◽  
2005 ◽  
Vol 1 (2) ◽  
pp. 231-235
Author(s):  
Konrad Skowronek ◽  
Jerzy Frackowiak ◽  
Piotr Szymkowiak ◽  
Grzegorz Trzmiel ◽  
Piotr Walczak ◽  
...  

2021 ◽  
pp. 000370282098784
Author(s):  
James Renwick Beattie ◽  
Francis Esmonde-White

Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal Components Analysis (PCA) is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning PCA is not well understood by many of the applied analytical scientists and spectroscopists who use PCA, and the meaning of the features it identifies is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind PCA, illustrating each step with simulated spectra. PCA relies solely on the information within the spectra; consequently, the mathematical model depends on the nature of the data itself. The direct links between model and spectra allow a concrete spectroscopic explanation of PCA, such as the scores representing ‘concentration’ or ‘weights’. The principal components (loadings) are, by definition, hidden, repeated, and uncorrelated spectral shapes that combine linearly to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data. Understanding the data-led development of a PCA model shows how to interpret the application-specific chemical meaning of the PCA loadings and how to analyze the scores. A critical benefit of PCA is the simplicity and succinctness of its description of a dataset, which makes it powerful and flexible.
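The claim that each PC successively refines the reconstruction can be checked on simulated spectra. The sketch below is an illustration under stated assumptions, not the manuscript's code. It uses NIPALS, a common chemometrics algorithm that extracts one PC at a time and then deflates the data, written in pure Python to stay dependency-free. The two-band simulated dataset, the mean-centering step, and the function names (simulate_spectra, mean_center, nipals) are hypothetical choices made for this example.

```python
import math
import random


def simulate_spectra(n_samples=20, n_points=50, seed=0):
    """Hypothetical data: each spectrum mixes two Gaussian bands with random weights."""
    rng = random.Random(seed)
    axis = [i / (n_points - 1) for i in range(n_points)]
    band_a = [math.exp(-((x - 0.3) ** 2) / 0.005) for x in axis]
    band_b = [math.exp(-((x - 0.6) ** 2) / 0.01) for x in axis]
    spectra = []
    for _ in range(n_samples):
        ca, cb = rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)  # "concentrations"
        spectra.append([ca * a + cb * b for a, b in zip(band_a, band_b)])
    return spectra


def mean_center(X):
    """Subtract the mean spectrum from every spectrum."""
    mean = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, mean)] for row in X], mean


def nipals(X, n_components, max_iter=500, tol=1e-12):
    """Extract PCs one at a time.

    Returns scores, loadings, and the residual sum of squares left after each PC.
    """
    X = [row[:] for row in X]  # work on a copy; it is deflated below
    n, m = len(X), len(X[0])
    scores, loadings, residual_ss = [], [], []
    for _ in range(n_components):
        # Start from the column (wavelength) with the largest variance.
        j = max(range(m), key=lambda k: sum(row[k] ** 2 for row in X))
        t = [row[j] for row in X]
        if not any(t):
            break  # nothing left to explain
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: regress each column on the current scores, then normalize.
            p = [sum(X[i][k] * t[i] for i in range(n)) / tt for k in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Scores: project each spectrum onto the loading.
            t_new = [sum(x * pk for x, pk in zip(row, p)) for row in X]
            converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
            t = t_new
            if converged:
                break
        # Deflate: subtract this PC's reconstruction t p^T from the data.
        X = [[x - ti * pk for x, pk in zip(row, p)] for row, ti in zip(X, t)]
        scores.append(t)
        loadings.append(p)
        residual_ss.append(sum(x * x for row in X for x in row))
    return scores, loadings, residual_ss
```

Usage: Xc, _ = mean_center(simulate_spectra()) followed by nipals(Xc, 2). The residual sum of squares drops after PC1 and falls to round-off level after PC2, because these simulated spectra are built from only two underlying bands. The loadings are unit-length and mutually orthogonal, and their sign is arbitrary.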


2021 ◽  
Vol 11 (14) ◽  
pp. 6405
Author(s):  
Pere Marti-Puig ◽  
Alejandro Bennásar-Sevillá ◽  
Alejandro Blanco-M. ◽  
Jordi Solé-Casals

Today, the use of SCADA data for predictive maintenance and forecasting of wind turbines in wind farms is gaining popularity because of its low cost compared with solutions that require the installation of additional equipment. SCADA data provide four statistical measures (mean, standard deviation, maximum value, and minimum value) of hundreds of wind turbine magnitudes, usually over a 5-min or 10-min interval. Several studies have analysed the information lost when using a 5-min instead of a 4-s sampling interval, or when compressing a time series recorded at 5 min to 10 min, concluding that some, but not all, of these magnitudes are seriously affected. However, to our knowledge, no studies have examined extending the interval over which these four statistical values are computed beyond 10 min, or how this aggregation affects prognosis models. Our work shows that, despite the irreversible loss of information that occurs in the first 5 min, lengthening the interval over which the four representative statistical values are computed improves the performance of normality models in predicting their targets.
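The compression from 5-min to 10-min (or longer) records can be illustrated with a small sketch. It is not the authors' pipeline. It assumes a hypothetical record layout holding the four statistics, that every record summarizes the same number of raw samples, and that the standard deviation is the population standard deviation; real SCADA systems may follow other conventions. Under these assumptions the mean, maximum, minimum, and standard deviation of a longer window can be recovered exactly from the shorter records, whereas the raw samples behind them cannot, which is the irreversible loss the abstract refers to.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical SCADA record: the four statistics of one time window."""
    mean: float
    std: float  # population standard deviation (assumption)
    maximum: float
    minimum: float


def summarize(samples):
    """Compute the four statistics from raw samples."""
    return Stats(statistics.fmean(samples), statistics.pstdev(samples),
                 max(samples), min(samples))


def merge(records):
    """Merge records that each summarize the same number of raw samples."""
    n = len(records)
    mean = sum(r.mean for r in records) / n
    # Pooled second moment E[x^2] = std^2 + mean^2, averaged over equal-size windows.
    second_moment = sum(r.std ** 2 + r.mean ** 2 for r in records) / n
    std = math.sqrt(max(second_moment - mean ** 2, 0.0))  # clamp round-off below zero
    return Stats(mean, std, max(r.maximum for r in records),
                 min(r.minimum for r in records))


def aggregate(records, factor):
    """Merge every `factor` consecutive records (factor=2 turns 5-min into 10-min).

    A trailing group with fewer than `factor` records is dropped.
    """
    return [merge(records[i:i + factor])
            for i in range(0, len(records) - factor + 1, factor)]
```

For instance, with raw data sampled every 4 s, each 5-min record summarizes 75 samples; aggregate(records, 2) yields 10-min records and aggregate(records, 4) yields 20-min records, matching what summarize would give on the corresponding raw windows.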


2021 ◽  
Vol 20 (2) ◽  
Author(s):  
Xing-Bo Pan ◽  
Gang Xu ◽  
Zong-Peng Li ◽  
Xiu-Bo Chen ◽  
Yi-Xian Yang
