A survey of Big Data dimensions vs Social Networks analysis

Author(s):  
Michele Ianni ◽  
Elio Masciari ◽  
Giancarlo Sperlí

Abstract The pervasive diffusion of Social Networks (SN) has produced an unprecedented amount of heterogeneous data. Thus, traditional approaches quickly became impractical for real-life applications due to the data's intrinsic properties: a large amount of user-generated content (text, video, image and audio), data heterogeneity and a high generation rate. More in detail, the analysis of user-generated data from popular social networks (e.g., Facebook (https://www.facebook.com/), Twitter (https://www.twitter.com/), Instagram (https://www.instagram.com/), LinkedIn (https://www.linkedin.com/)) poses quite intriguing challenges for both the research and industry communities in the task of analyzing user behavior, user interactions, link evolution, opinion spreading and several other important aspects. This survey focuses on the analyses performed in the last two decades on these kinds of data w.r.t. the dimensions defined for the Big Data paradigm (the so-called Big Data 6 V's).
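As a rough illustration of the paradigm the abstract refers to (not taken from the survey itself), a minimal Python sketch can enumerate the six V's as a checklist for classifying a social-network dataset; the example values are hypothetical.

```python
# Minimal sketch: the six Big Data dimensions as a simple profile one
# might fill in when characterizing a social-network dataset.
from dataclasses import dataclass

@dataclass
class BigDataProfile:
    volume: str       # scale of the data, e.g. "terabytes/day"
    velocity: str     # generation rate, e.g. "streaming"
    variety: str      # data types, e.g. "text, image, video, audio"
    veracity: str     # trustworthiness, e.g. "noisy user-generated"
    value: str        # analytic worth, e.g. "opinion-spreading signals"
    variability: str  # change over time, e.g. "bursty around events"

# Hypothetical characterization of a Twitter-like feed:
twitter_like = BigDataProfile(
    volume="terabytes/day",
    velocity="streaming",
    variety="text, image, video",
    veracity="noisy user-generated",
    value="trend and opinion analysis",
    variability="bursty around events",
)
print(twitter_like)
```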

Author(s):  
Muhammad Mazhar Ullah Rathore ◽  
Awais Ahmad ◽  
Anand Paul

Geosocial network data provides the full information on current trends in human, their behaviors, their living style, the incidents and events, the disasters, current medical infection, and much more with respect to locations. Hence, the current geosocial media can work as a data asset for facilitating the national and the government itself by analyzing the geosocial data at real-time. However, there are millions of geosocial network users, who generates terabytes of heterogeneous data with a variety of information every day with high-speed, termed as Big Data. Analyzing such big amount of data and making real-time decisions is an inspiring task. Therefore, this book chapter discusses the exploration of geosocial networks. A system architecture is discussed and implemented in a real-time environment in order to process the abundant amount of various social network data to monitor the earth events, incidents, medical diseases, user trends and thoughts to make future real-time decisions as well as future planning.
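A minimal sketch of the kind of geosocial monitoring the chapter describes, assuming posts arrive as already-parsed dicts with "lat", "lon" and "text" keys; the chapter's actual architecture is not reproduced here.

```python
# Count keyword hits per coarse geographic grid cell, a simple
# building block for spotting localized events in a geosocial stream.
from collections import Counter

def monitor_events(post_stream, keywords, cell_size=0.5):
    """Tally keyword-matching posts into cell_size-degree grid cells."""
    hits = Counter()
    for post in post_stream:
        text = post["text"].lower()
        if any(kw in text for kw in keywords):
            cell = (round(post["lat"] / cell_size),
                    round(post["lon"] / cell_size))
            hits[cell] += 1
    return hits

# Hypothetical usage with two toy posts:
stream = [
    {"lat": 40.7, "lon": -74.0, "text": "Flood warning downtown"},
    {"lat": 40.8, "lon": -74.1, "text": "Nice weather today"},
]
print(monitor_events(stream, keywords={"flood", "earthquake"}))
```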


Author(s):  
Martin Atzmueller

For designing and modeling Artificial Intelligence (AI) systems in the area of human-machine interaction, suitable approaches for user modeling are important in order to capture user characteristics. Using multimodal data, this can be performed from various perspectives. Specifically, for modeling user interactions in human interaction networks, appropriate approaches both for capturing those interactions and for analyzing them in order to extract meaningful patterns are important. For modeling user behavior for the respective AI systems, we can make use of diverse heterogeneous data sources. This paper investigates face-to-face as well as socio-spatial interaction networks for modeling user interactions from three perspectives: we analyze preferences and perceptions of human social interactions in relation to the interactions observed using wearable sensors, i.e., the face-to-face as well as socio-spatial interactions of the respective actors. For that, we investigate the correspondence of the respective networks in order to identify conformance, exceptions, and anomalies. The analysis is performed on a real-world dataset capturing networks of proximity interactions coupled with self-report questionnaires about preferences and perceptions of those interactions. The different networks and their respective perspectives then provide different options for user modeling and integration into AI systems modeling such user behavior.
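As a minimal sketch of the correspondence idea, assuming both networks are given as sets of undirected edges over the same actors; the paper's own statistical analysis is not reproduced here, and the actor names are hypothetical.

```python
# Compare a sensor-observed interaction network against a
# self-reported one via edge-set overlap.
def edge_jaccard(observed, reported):
    """Jaccard overlap between observed and self-reported ties."""
    a = {frozenset(e) for e in observed}
    b = {frozenset(e) for e in reported}
    return len(a & b) / len(a | b) if a | b else 1.0

observed_f2f = {("anna", "ben"), ("ben", "cara"), ("anna", "cara")}
reported_pref = {("anna", "ben"), ("anna", "cara"), ("ben", "dan")}
print(f"conformance: {edge_jaccard(observed_f2f, reported_pref):.2f}")

# Ties observed by sensors but never self-reported flag exceptions:
only_observed = ({frozenset(e) for e in observed_f2f}
                 - {frozenset(e) for e in reported_pref})
print(sorted(tuple(sorted(e)) for e in only_observed))
```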


2021 ◽  
Vol 27 (11) ◽  
pp. 1203-1221
Author(s):  
Amal Rekik ◽  
Salma Jamoussi

Clustering data streams in order to detect trending topics on social networks is a challenging task that interests researchers in the big data field. In fact, analyzing such data requires several requirements to be addressed due to its large volume and evolving nature. For this purpose, we propose, in this paper, a new evolving clustering method which takes into account the incremental nature of the data and meets its principal requirements. Our method explores a deep learning technique to learn incrementally from unlabelled examples generated at high speed, which need to be clustered instantly. To evaluate the performance of our method, we have conducted several experiments using the Sanders, HCR and Terr-Attacks datasets.
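To make the incremental-clustering requirement concrete, here is a minimal sketch using scikit-learn's MiniBatchKMeans as a stand-in for the paper's deep evolving method; the batches of random vectors stand in for already-vectorized posts.

```python
# Incremental stream clustering: the model is updated batch by batch
# (partial_fit) and can assign fresh items at any point (predict).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

model = MiniBatchKMeans(n_clusters=3, random_state=0)

rng = np.random.default_rng(0)
for _ in range(10):                       # batches arriving over time
    batch = rng.normal(size=(50, 20))     # stand-in for post embeddings
    model.partial_fit(batch)              # update clusters incrementally

new_posts = rng.normal(size=(5, 20))
print(model.predict(new_posts))           # cluster fresh posts instantly
```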


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Anyou Wang ◽  
Rong Hai

Abstract Objectives Numerous software tools have been developed to infer gene regulatory networks, a long-standing key topic in biology and computational biology. Yet the slowness and inaccuracy inherent in current software hamper their application to increasingly massive data. Here, we develop a software tool, FINET (Fast Inferring NETwork), to infer a network with high accuracy and rapidity from big data. Results The high accuracy results from integrating algorithms with stability selection, elastic net, and parameter optimization. Tested against a known biological network, FINET infers interactions with over 94% precision. The high speed comes from parallel computations implemented in Julia, a new compiled language that runs much faster than the languages used in existing software, such as R, Python, and MATLAB. Despite FINET's implementation in Julia, users with no background in the language or in computer science can easily operate it with a single user-friendly command line. In addition, FINET can infer other networks, such as chemical networks and social networks. Overall, FINET provides a confident way to efficiently and accurately infer any type of network at any scale of data.
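The following is a minimal Python sketch of the general stability-selection-plus-elastic-net idea the abstract names; FINET itself is a Julia tool, and its exact algorithm, parameters, and command line are not reproduced here.

```python
# Stability selection: repeatedly fit an elastic net on row
# subsamples and keep predictors that are consistently selected.
import numpy as np
from sklearn.linear_model import ElasticNet

def stable_regulators(X, y, n_rounds=50, threshold=0.8, seed=0):
    """Return predictor indices whose elastic-net coefficient is
    nonzero in at least `threshold` of subsampled fits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample rows
        coef = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X[idx], y[idx]).coef_
        counts += coef != 0
    return np.where(counts / n_rounds >= threshold)[0]

# Toy data: target "gene" driven by regulators 0 and 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
print(stable_regulators(X, y))   # expected: indices 0 and 3
```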


Proceedings ◽  
2018 ◽  
Vol 2 (23) ◽  
pp. 1409 ◽  
Author(s):  
Miguel Ángel Fernández Fernández ◽  
Juan Luis Carús Candás ◽  
Pablo Barredo Gil ◽  
Antonio Miranda de la Torre ◽  
Gabriel Díaz Orueta

The exploitation of photovoltaic energy has experienced great growth worldwide in recent years. Photovoltaic installations are characterized by the presence of a large number of devices and elements. This situation makes the operation and performance of a photovoltaic installation dependent on a large number of parameters and variables. Moreover, due to the size of some photovoltaic installations, a high volume of heterogeneous data is produced. Traditional approaches cannot cope with such a huge amount of generated data. Through the adoption of a software architecture based on Industry 4.0 Key Enabling Technologies (such as the Internet of Things and Big Data, among others), it is possible to improve the monitoring and operation procedures of photovoltaic plants.
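A minimal sketch of the windowed aggregation such a monitoring architecture enables, assuming inverter readings arrive as dicts; the paper's actual Industry 4.0 stack (IoT brokers, Big Data stores) is not reproduced, and the field names are illustrative.

```python
# Average AC power per inverter over fixed time windows, a basic
# monitoring primitive for plants with many devices.
from collections import defaultdict

def aggregate_power(readings, window_s=300):
    """Mean ac_power_w per (inverter, window) bucket."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in readings:
        key = (r["inverter_id"], r["timestamp"] // window_s)
        sums[key][0] += r["ac_power_w"]
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

readings = [
    {"inverter_id": "inv-01", "timestamp": 10, "ac_power_w": 4200.0},
    {"inverter_id": "inv-01", "timestamp": 80, "ac_power_w": 4350.0},
]
print(aggregate_power(readings))
```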


In the current-day scenario, a huge amount of data is being generated at high speed from various heterogeneous sources such as social networks, business apps, the government sector, marketing, health care systems, sensors, machine log data, and other sources. Big Data has been chosen as one of the upcoming areas of research by several industries. In this paper, the author presents a wide collection of literature that has been reviewed and analyzed. The paper emphasizes Big Data technologies, applications and challenges, and presents a comparative study of architectures, methodologies, tools, and survey results proposed by various researchers.


2016 ◽  
Author(s):  
Marco Moscatelli ◽  
Matteo Gnocchi ◽  
Andrea Manconi ◽  
Luciano Milanesi

Motivation Nowadays, advances in technology have given rise to a huge amount of data in both biomedical research and healthcare systems. This growing amount of data gives rise to the need for new research methods and analysis techniques. Analysis of these data offers new opportunities to define novel diagnostic processes. Therefore, a greater integration between healthcare and biomedical data is essential to devise novel predictive models in the field of biomedical diagnosis. In this context, the digitalization of clinical exams and medical records is becoming essential to collect heterogeneous information. Analysis of these data by means of big data technologies will allow a more in-depth understanding of the mechanisms leading to diseases, and contextually it will facilitate the development of novel diagnostics and personalized therapeutics. The recent application of big data technologies in the medical field will offer new opportunities to integrate enormous amounts of medical and clinical information from population studies. Therefore, it is essential to devise new strategies aimed at storing and accessing the data in a standardized way. Moreover, it is important to provide suitable methods to manage these heterogeneous data. Methods In this work, we present a new information technology infrastructure devised to efficiently manage huge amounts of heterogeneous data for disease prevention and precision medicine. A test set based on data produced by a clinical and diagnostic laboratory has been built to set up the infrastructure. When working with clinical data, it is essential to ensure the confidentiality of sensitive patient data. Therefore, the set-up phase has been carried out using "anonymous data". To this end, specific techniques have been adopted with the aim of ensuring a high level of privacy in the correlation of the medical records with important secondary information (e.g., date of birth, place of residence). It should be noted that the rigidity of relational databases does not lend itself to the nature of these data. In our opinion, better results can be obtained using non-relational (NoSQL) databases. Starting from these considerations, the infrastructure has been developed on a NoSQL database with the aim of combining scalability and flexibility. In particular, MongoDB [1] has been used as it is better suited to managing different types of data at large scale. In doing so, the infrastructure is able to provide optimized management of huge amounts of heterogeneous data, while ensuring high speed of analysis. Results The presented infrastructure exploits big data technologies in order to overcome the limitations of relational databases when working with large and heterogeneous data. The infrastructure implements a set of interface procedures aimed at preparing the metadata for importing data into a NoSQL DB. Abstract truncated at 3,000 characters; the full version is available in the PDF file.
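A minimal sketch of the anonymized-ingestion idea on MongoDB, assuming a local instance via pymongo; the field names, salting scheme, and collection layout are illustrative, not the paper's actual procedure.

```python
# Insert a clinical exam record keyed by a one-way pseudonym, keeping
# only coarse secondary info to limit re-identification risk.
import hashlib
from pymongo import MongoClient

SALT = b"replace-with-secret-salt"   # hypothetical secret

def pseudonymize(patient_id: str) -> str:
    """One-way pseudonym so records can be linked without identity."""
    return hashlib.sha256(SALT + patient_id.encode()).hexdigest()

client = MongoClient("mongodb://localhost:27017")
records = client["clinic"]["exam_records"]

records.insert_one({
    "patient": pseudonymize("P-000123"),
    "exam": "glucose",
    "value": 5.4,
    "unit": "mmol/L",
    "birth_year": 1980,                 # coarse, not full date of birth
    "residence_area": "region-level code",
})
```

Schemaless documents like the one above are what makes a NoSQL store a natural fit here: exams with different fields can live in the same collection without schema migrations.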


Author(s):  
Carson K.S. Leung ◽  
Yibin Zhang

In the current era of big data, high volumes of a wide variety of valuable data—which may be of different veracities—can be easily generated or collected at a high speed in various real-life applications related to art, culture, design, engineering, mathematics, science, and technology. A data science solution helps manage, analyze, and mine these big data—such as musical data—for the discovery of interesting information and useful knowledge. As “a picture is worth a thousand words,” a visual representation provided by the data science solution helps visualize the big data and comprehend the mined information and discovered knowledge. This journal article presents a visual analytic system—which uses a hue-saturation-value (HSV) color model to represent big data—for data science on musical data and beyond (e.g., other types of big data).
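A minimal sketch of the HSV encoding idea, mapping three data attributes to hue, saturation, and value; the system's actual encoding for musical data is not reproduced, and the attribute-to-channel mapping below is a hypothetical example.

```python
# Map a musical note's attributes to an HSV color, then to RGB.
import colorsys

def encode_hsv(pitch_class: int, loudness: float, duration: float):
    """Hypothetical mapping: pitch class -> hue (12 bins),
    loudness in [0,1] -> saturation, duration in [0,1] -> value."""
    h = (pitch_class % 12) / 12.0
    r, g, b = colorsys.hsv_to_rgb(h, loudness, duration)
    return tuple(round(c * 255) for c in (r, g, b))

# A note of pitch class 7 (G), fairly loud, medium length:
print(encode_hsv(7, loudness=0.9, duration=0.6))
```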



