Spectral Methods for Data Clustering

Author(s):  
Wenyuan Li

With the rapid growth of the World Wide Web and of digital data storage capacity, a tremendous amount of data is generated daily, from business and engineering to the Internet and science. The Internet, financial real-time data, hyperspectral imagery, and DNA microarrays are just a few of the common sources that feed torrential streams of data into scientific and business databases worldwide. Unlike statistical data sets of small size and low dimensionality, such unprecedented high-volume, high-dimensionality, complex data challenge traditional clustering techniques. To meet these challenges, many new clustering algorithms have been proposed in the area of data mining (Han & Kamber, 2001).

2018 ◽  
Vol 6 (3) ◽  
pp. 359-363
Author(s):  
A. Saxena ◽  
S. Sharma ◽  
S. Dangi ◽  
A. Sharma ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5204
Author(s):  
Anastasija Nikiforova

Nowadays, governments launch open government data (OGD) portals that provide data that can be accessed and used by everyone for their own needs. Although the potential economic value of open (government) data is assessed in millions and billions, not all open data are reused. Moreover, the open (government) data initiative and users’ intent for open (government) data are changing continuously, and today, in line with IoT and smart city trends, real-time data and sensor-generated data are of greater interest to users. These “smarter” open (government) data are also considered to be one of the crucial drivers of a sustainable economy, and might have an impact on information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem in Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries in order to understand how well their content corresponds to Society 5.0 expectations. The paper reports on the extent to which countries provide these data, focusing on some factors that facilitate the success of open (government) data, both for the portal in general and for the data sets of interest in particular. The presence of “smarter” data, their level of accessibility, availability, currency and timeliness, as well as support for users, are analyzed. A list of the most competitive countries by data category is provided. This makes it possible to understand which OGD portals respond to users’ needs and to Industry 4.0 and Society 5.0 demands by opening and updating data for further potential reuse, which is essential in the digital data-driven world.


2020 ◽  
Vol 8 (6) ◽  
pp. 5643-5646

Over the last decade, the number of internet users and the volume of data on the internet have grown exponentially, which has increased the complexity of systems that must implement policies and security measures to avoid attacks on systems and networks. It is therefore very important to understand and analyze the real-time data traffic of communication systems. The purpose of this paper is to design a customized Java-based application that enables analysts to capture traffic at the bottleneck in a mean-field communication environment, where a large number of devices communicate with each other. The captured data are sent on for further processing and trend analysis, in order to overcome vulnerabilities or to manage the effectiveness of the communication systems. The proposed application can capture 8 different types of protocol traffic, such as HTTP, HTTPS, SMTP, UDP, TCP, ICMP and POP3. It supports visual analysis of incoming and outgoing traffic to help understand the nature of communication networks, which can improve network performance with respect to hardware, software, data storage, security and reliability.


Repositor ◽  
2020 ◽  
Vol 2 (5) ◽  
pp. 541
Author(s):  
Denni Septian Hermawan ◽  
Syaifuddin Syaifuddin ◽  
Diah Risqiwati

Internet networks used for data storage or for serving information pages on websites are vulnerable to attack. To improve the security of a website and its network, a honeypot is needed that can capture attacks carried out on the local network and on the internet. To make it easier for administrators to handle attacks, the attacks are grouped with the K-Means method in order to obtain the attackers' IP addresses. Grouping around the cluster points yields the attacker IPs as output. Attacks are collected in real time from the honeypot's logs by means of MHN.
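
As a rough illustration of the grouping step, the following Python sketch runs plain K-Means (Lloyd's algorithm) on a handful of made-up per-IP features. The feature choice (attack attempts and distinct ports probed), the example addresses, and the `kmeans` helper are assumptions for illustration only; the abstract does not say which log fields the authors cluster on or how they are read from MHN. NumPy is assumed to be available.

```python
import numpy as np

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with a deterministic start (the first k points)."""
    X = np.asarray(points, dtype=float)
    centroids = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its members; leave it alone if empty.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical data: one row per source IP, (attack attempts, distinct ports probed).
ips = ["192.0.2.10", "198.51.100.7", "192.0.2.11", "198.51.100.8"]
features = [[2, 1], [250, 40], [3, 1], [240, 35]]

labels, centroids = kmeans(features, k=2)
heavy_cluster = int(centroids[:, 0].argmax())  # cluster with the most attempts on average
suspects = [ip for ip, label in zip(ips, labels) if label == heavy_cluster]
```

With two clusters, the high-volume sources fall into one group, and their IP addresses can be read off from the labels.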


2008 ◽  
pp. 3611-3620
Author(s):  
Janusz Swierzowicz

The development of information technology is particularly noticeable in the methods and techniques of data acquisition, high-performance computing, and bandwidth frequency. According to a newly observed phenomenon, called a storage law (Fayyad & Uthurusamy, 2002), the capacity of digital data storage is doubled every 9 months with respect to the price. Data can be stored in many forms of digital media, for example, still images taken by a digital camera, MP3 songs, or MPEG videos from desktops, cell phones, or video cameras. Such data exceed the total cumulative handwriting and printing during all of recorded human history (Fayyad, 2001). According to current analysis carried out by IBM Almaden Research (Swierzowicz, 2002), data volumes are growing at different speeds. The fastest one is Internet-resource growth: it will reach the digital online threshold of exabytes within a few years (Liautaud, 2001). In these environments of fast-growing data volumes, the limitations stem from humans' low capacity to analyze data complexity and dimensionality. Investigations into combining different media data (multimedia) into one application began as early as the 1960s, when text and images were combined in a document. During the research and development process, audio, video, and animation were synchronized using a time line to specify when they should be played (Rowe & Jain, 2004). Since the mid-1990s, the problems of multimedia data capture, storage, transmission, and presentation have been extensively investigated. Over the past few years, research on multimedia standards (e.g., MPEG-4, X3D, MPEG-7) has continued to grow. These standards are adapted to represent very complex multimedia data sets; can transparently handle sound, images, videos, and 3-D (three-dimensional) objects combined with events, synchronization, and scripting languages; and can describe the content of any multimedia object.
Different algorithms need to be used in multimedia distribution and multimedia database applications. An example is an image database that stores pictures of birds and a sound database that stores recordings of birds (Kossmann, 2000). The distributed query that asks for “top ten different kinds of birds that have black feathers and a high voice” is described there by Kossmann (2000, p. 436).


1997 ◽  
Vol 3 (S2) ◽  
pp. 1109-1110
Author(s):  
D.C. McCord ◽  
S.K. Kennedy ◽  
D.G. Kritikos

Manual scanning electron microscope (SEM) analysis is historically considered to be slow and tedious, resulting in a low volume of data. This is due in large part to the mechanics of moving stage locations and recording image and spectral data. Conversely, high-volume data acquired using automated SEM analysis has been associated with the need for complex systems for data management and analysis. In addition, the proliferation of high-volume digital microscopy and its attendant “tonnage” of paper images has led to the desire for a “green” (filmless and hardcopy-reduced) operation. Some classes of projects are amenable to automated feature analysis: those with discrete features that are distinct from a background material. However, many projects require operator intervention in order to identify the region or points of interest. Yet these projects may also require that large data sets be acquired and analyzed for statistical rigor.


2021 ◽  
Vol 11 (13) ◽  
pp. 6070
Author(s):  
Veronika Szücs ◽  
Gábor Arányi ◽  
Ákos Dávid

We live in a world of digital information communication and digital data storage. Following the development of technology, demands from the user side also pose serious challenges for developers, both in the field of hardware and software development. However, the increasing penetration of the Internet, IoT and digital solutions that have become available in almost every segment of life, carries risks as well as benefits. In this study, the authors present the phenomenon of ransomware attacks that appear on a daily basis, which endangers the operation and security of the digital sphere of both small and large enterprises and individuals. An overview of ransomware attacks, the tendency and characteristics of the attacks, which have caused serious financial loss and other damages to the victims, are presented. This manuscript also provides a brief overview of protection against ransomware attacks and the software and hardware options that enhance general user security and their effectiveness as standalone applications. The authors present the results of the study, which aimed to explore how the available software and hardware devices can implement digital user security. Based on the results of the research, the authors propose a complex system that can be used to increase the efficiency of network protection and OS protection tools already available to improve network security, and to detect ransomware attacks early. As a result, the model of the proposed protection system is presented, and it can be stated that the complex system should be able to detect ransomware attacks from either the Internet or the internal network at an early stage, mitigate malicious processes and maintain data in recoverable state.


Author(s):  
Tarun Goyal ◽  
Rakesh Rathi ◽  
Vinesh Kumar Jain ◽  
Emmanuel Shubhakar Pilli ◽  
Arka Prokash Mazumdar

In this article, the authors discuss the connection between the Internet of Things (IoT) and the growth of big data. They also give a short account of the evolution, features, lifecycle, and implementation of Big Data from IoT over the cloud. The Internet of Things represents a platform or environment that consists of an enormous number of sensors and mediators interconnecting heterogeneous physical devices over the internet. IoT applications are available in many real-world areas such as smart cities, smart workplaces, smart homes, smart transportation and various other ubiquitous computing areas. Using IoT applications generates a tremendous amount of data to be stored and managed on the internet. As time and research progressed, integrations of IoT platforms with the cloud came to market, and for many real-world application areas the storage and management of IoT platform data started shifting from internet-connected physical systems to the cloud. When this data becomes huge, it is termed Big Data. Handling Big Data over the cloud opens up many new areas of research and attention.


2021 ◽  
Vol 11 (2) ◽  
pp. 1-16
Author(s):  
Shyla ◽  
Vishal Bhatnagar ◽  
Raju Ranjan ◽  
Arushi Jain

Big data is high-volume, high-variety data that involves data storage, data management, and data analysis, and that presents a wide view of business possibilities for real-time data, sensor data, and streaming data over the web. Big data relies on technology, analysis, and mythology: technology deals with computation power, accuracy, linking, and large datasets; analysis finds patterns in large datasets to discover hidden information; and mythology is the mistaken belief that large datasets yield insights that cannot be obtained from small datasets. In this paper, the authors analyze the major benefits that organizations see from employing contract workers, using the MapReduce programming framework.


Author(s):  
Wenyuan Li (Nanyang Technological University, Singapore) ◽  
Wee Keong Ng (Nanyang Technological University, Singapore)

With the rapid growth of the World Wide Web and of digital data storage capacity, a tremendous amount of data is generated daily, from business and engineering to the Internet and science. The Internet, financial real-time data, hyperspectral imagery, and DNA microarrays are just a few of the common sources that feed torrential streams of data into scientific and business databases worldwide. Unlike statistical data sets of small size and low dimensionality, such unprecedented high-volume, high-dimensionality, complex data challenge traditional clustering techniques. To meet these challenges, many new clustering algorithms have been proposed in the area of data mining (Han & Kamber, 2001). Spectral techniques have proven useful and effective in a variety of data mining and information retrieval applications where massive amounts of real-life data are available (Deerwester et al., 1990; Kleinberg, 1998; Lawrence et al., 1999; Azar et al., 2001). In recent years, a class of promising and increasingly popular approaches, spectral methods, has been proposed for the clustering task (Shi & Malik, 2000; Kannan et al., 2000; Meila & Shi, 2001; Ng et al., 2001). Spectral methods are an attractive approach to the clustering problem for the following reasons:
• Spectral approaches to the clustering problem offer the potential for dramatic improvements in efficiency and accuracy relative to traditional iterative or greedy algorithms. They do not intrinsically suffer from the problem of local optima.
• Numerical methods for spectral computations are extremely mature and well understood, allowing clustering algorithms to benefit from a long history of implementation efficiencies in other fields (Golub & Van Loan, 1996).
• Components in spectral methods have a naturally close relationship with graphs (Chung, 1997). This characteristic provides an intuitive and semantic understanding of elements in spectral methods. It is important when the data is graph-based, such as the links of the WWW, or can be converted to graphs.
In this paper, we systematically discuss applications of spectral methods to data clustering.
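
The abstract does not fix a particular algorithm, so the following is only a minimal sketch of one common spectral approach: a two-way split by the sign of the second eigenvector of the normalized graph Laplacian, in the spirit of the normalized-cut work cited above (Shi & Malik, 2000). The function name `spectral_bipartition` and the toy graph are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split a connected undirected graph in two by the sign of the Fiedler vector."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)              # assumes every node has at least one edge
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # Second-smallest eigenvector, rescaled by D^{-1/2} as in the normalized-cut relaxation.
    fiedler = d_inv_sqrt * eigenvectors[:, 1]
    return (fiedler > 0).astype(int), eigenvalues

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels, eigenvalues = spectral_bipartition(A)
```

On this graph the two triangles receive different labels; which triangle gets label 1 depends on the arbitrary sign of the computed eigenvector. The eigenvector computation is a single call to a standard symmetric eigensolver, which is the kind of mature numerical routine the second bullet refers to.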

