Erratum to: ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics

2017 ◽  
Vol 9 (1) ◽  
Author(s):  
Jiangming Sun ◽  
Nina Jeliazkova ◽  
Vladimir Chupakhin ◽  
Jose-Felipe Golib-Dzib ◽  
Ola Engkvist ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the continuing advance of information and electronic technology, particularly in physical information systems, cloud computing systems, and social services, big data is becoming ubiquitous, bringing benefits to people while also posing huge challenges. Moreover, with the advent of the big data era, data sets keep growing in scale, and traditional data analysis methods can no longer cope with them; mining the information hidden behind big data, especially in the field of e-commerce, has become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets through the SOM self-organizing neural network classification method; the subsets are then trained in parallel, and finally the training results of the working subsets are merged, so that massive data prediction and analysis problems can be handled quickly. The paper further argues that big data brings flexible scalability to quality assessment systems, so it is meaningful to address the two-sidedness of quality assessment with big data. Finally, considering the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to big data analysis in e-commerce. The research results show that parallel support vector machines can solve the problem of processing large-scale data sets, and that handling dirty-data problems increases the effective rate by at least 70%.
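
As an illustration of the partition-train-merge idea described above (not the authors' code), here is a minimal Python sketch. It substitutes scikit-learn's KMeans for the SOM clustering step and a majority vote over per-subset SVMs for the merge step; both choices are assumptions about details the abstract leaves open.

```python
# Minimal sketch of the partition-train-merge idea: cluster the training
# data into working subsets, fit one SVM per subset in parallel, and
# merge predictions by majority vote. KMeans stands in for the SOM
# partitioner and voting for the merge rule -- both are assumptions.
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# 1) Partition the training samples into working subsets.
n_subsets = 4
labels = KMeans(n_clusters=n_subsets, n_init=10, random_state=0).fit_predict(X)

# 2) Train one SVM per subset, in parallel across CPU cores.
def fit_subset(k):
    mask = labels == k
    return SVC(kernel="rbf").fit(X[mask], y[mask])

models = Parallel(n_jobs=-1)(delayed(fit_subset)(k) for k in range(n_subsets))

# 3) Merge: majority vote over the per-subset models.
def predict(X_new):
    votes = np.stack([m.predict(X_new) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

print(predict(X[:5]), y[:5])
```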


2021 ◽  
Author(s):  
Mingchuan Yang ◽  
Xinye Shao ◽  
Guanchang Xue ◽  
Bingyu Xie

In order to deal with the difficulty of spectrum sensing in cognitive satellite wireless networks, a large-scale cognitive network spectrum sensing algorithm based on big data analysis theory is studied, and a new algorithm using the mean exponential eigenvalue is proposed. The new approach makes full use of all the eigenvalues of the sample covariance matrix of the sensing results to reach a decision, which effectively improves detection performance without requiring prior information from licensed users. Through simulation, the performance of various large-scale cognitive radio spectrum sensing algorithms based on big data analysis theory is compared, and the influence of satellite-to-ground channel conditions and the number of sensing nodes on the performance of the algorithm is discussed.
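
The abstract does not give the exact test statistic, so the following numpy sketch is only a plausible reading of the method: it forms the sample covariance matrix of the sensing snapshots, takes all of its eigenvalues, and compares the mean of their exponentials against a threshold. The precise statistic, the threshold value, and the toy signal model are assumptions, not the paper's definitions.

```python
# Sketch of an eigenvalue-based detector in the spirit of a
# "mean exponential eigenvalue" statistic. The exact statistic,
# threshold, and signal model are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mean_exp_eigenvalue(Y):
    """Y: (n_sensors, n_samples) matrix of sensing snapshots."""
    R = Y @ Y.conj().T / Y.shape[1]   # sample covariance matrix
    eigvals = np.linalg.eigvalsh(R)   # all eigenvalues (real, R is Hermitian)
    return np.mean(np.exp(eigvals))   # test statistic (assumed form)

n_sensors, n_samples = 8, 1000
noise = rng.normal(size=(n_sensors, n_samples))     # H0: noise only
signal = 0.5 * rng.normal(size=(1, n_samples))      # H1: common licensed-user signal
occupied = noise + np.ones((n_sensors, 1)) @ signal

threshold = 3.5  # chosen for this toy example; in practice set from a
                 # target false-alarm probability
for name, Y in [("idle", noise), ("occupied", occupied)]:
    t = mean_exp_eigenvalue(Y)
    print(f"{name}: statistic={t:.2f}, decide {'H1' if t > threshold else 'H0'}")
```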


2020 ◽  
Author(s):  
Katharina Höflich ◽  
Martin Claus ◽  
Willi Rath ◽  
Dorian Krause ◽  
Benedikt von St. Vieth ◽  
...  

Demand on high-end high performance computing (HPC) systems by the Earth system science community today encompasses not only the handling of complex simulations but also machine and deep learning as well as interactive data analysis workloads on large volumes of data. This poster addresses the infrastructure needs of large-scale interactive data analysis workloads on supercomputers. It lays out how to enable optimizations of existing infrastructure with respect to accessibility, usability and interactivity, and aims at informing decision making about future systems. To enhance accessibility, options for distributed access, e.g. through JupyterHub, will be evaluated. To increase usability, the unification of working environments via the operation and joint maintenance of containers will be explored. Containers serve as a portable base software setting for data analysis application stacks and allow for long-term usability of individual working environments and repeatability of scientific analyses. Aiming for interactive big-data analysis on HPC will also help the scientific community utilize increasingly heterogeneous supercomputers, since the modular data-analysis stack already contains solutions for the seamless use of various architectures such as accelerators. However, to enable day-to-day interactive work on supercomputers, the inter-operation of workloads with quick turn-around times and highly variable resource demands needs to be understood and evaluated. To this end, scheduling policies on selected HPC systems are reviewed with respect to existing technical solutions such as job preemption, utilizing the resiliency features of parallel computing toolkits like Dask. Presented are preliminary results focusing on usability and the interactive use of HPC systems, based on typical use cases from the ocean science community.
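
As a concrete illustration of the kind of elastic, preemption-tolerant setup the poster describes, here is a short Python sketch using Dask with dask_jobqueue on a Slurm-managed HPC system. The queue name, resource sizes, walltime, and the toy computation are placeholders, not values from the poster.

```python
# Sketch: interactive, elastically scaled data analysis on an HPC system
# with Dask. Queue name, resources, and walltime are placeholders.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import dask.array as da

# Each Dask worker is submitted as a batch job to the HPC scheduler.
cluster = SLURMCluster(
    queue="batch",        # placeholder partition name
    cores=8,
    memory="32GB",
    walltime="01:00:00",
)

# Adaptive scaling: grow and shrink with the interactive workload.
# This also tolerates workers lost to preemption, since Dask re-runs
# the affected tasks elsewhere.
cluster.adapt(minimum=1, maximum=20)

client = Client(cluster)

# A toy large-array computation standing in for an ocean-data analysis.
x = da.random.random((100_000, 10_000), chunks=(10_000, 10_000))
print(x.mean(axis=0).std().compute())
```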


Author(s):  
Rasmus Helles ◽  
Jacob Ørmen ◽  
Klaus Bruhn Jensen ◽  
Signe Sophus Lai ◽  
Ericka Menchen-Trevino ◽  
...  

In recent years, large-scale analysis of log data from digital devices, often termed "big data analysis" (Lazer, Kennedy, King, & Vespignani, 2014), has taken hold in the field of internet research. Through Application Programming Interfaces (APIs) and commercial measurement, scholars have been able to analyze social media users (Freelon, 2014) and web audiences (Taneja, 2016) on an unprecedented scale. And by developing digital research tools, scholars have been able to track individuals across websites (Menchen-Trevino, 2013) and mobile applications (Ørmen & Thorhauge, 2015) in greater detail than ever before. Big data analysis holds unique potential for studying communication in depth and across many individuals (see e.g. Boase & Ling, 2013; Prior, 2013). At the same time, this approach introduces new methodological challenges concerning the transparency of data collection (Webster, 2014), the sampling of participants, and the validity of conclusions (Rieder, Abdulla, Poell, Woltering, & Zack, 2015). Firstly, data aggregation is typically designed for commercial rather than academic purposes; the type of data included, as well as how it is presented, depends in large part on the business interests of measurement and advertisement companies (Webster, 2014). Secondly, when relying on this kind of secondary data, it can be difficult to validate the output or the techniques used to generate the data (Rieder, Abdulla, Poell, Woltering, & Zack, 2015). Thirdly, the unit of analysis is often media-centric, taking specific websites or social network pages as the empirical basis instead of individual users (Taneja, 2016), which makes it hard to untangle the behavior of real-world users from aggregate trends. Lastly, variations in what users do may be so large that it becomes necessary to move from the aggregate to smaller groups of users to make meaningful inferences (Welles, 2014). Internet research is thus faced with a new research approach whose potentials and perils need to be discussed in combination with traditional approaches.

This panel explores the role of big data analysis in relation to the wider repertoire of methods in internet research. It comprises four presentations that each shed light on the complementarity of big data analysis with more traditional qualitative and quantitative methods. The first presentation opens the discussion with an overview of strategies for combining digital traces and commercial audience data with qualitative interviews and quantitative survey methods. The second explores the potential of trace data to improve upon the experimental method: researcher-collected data enable scholars to operate in a real-world setting, in contrast to a research lab, while obtaining informed consent from participants. The third argues that large-scale audience data provide a unique perspective on internet use; by integrating census-level information about users with detailed traces of their behavior across websites, commercial audience data combine the strengths of surveys and of digital trace data. Lastly, the fourth shows how multi-institutional collaboration makes it possible to document social media activity (on Twitter) for a whole country (Australia) in a comprehensive manner, a feat not possible at a similar scale through other methods. Through these four presentations, the panel aims to situate big data analysis in the broader repertoire of internet research methods.


2016 ◽  
Vol 4 (3) ◽  
pp. 1-21 ◽  
Author(s):  
Sungchul Lee ◽  
Eunmin Hwang ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Due to the advancement of Information Technology (IT), the hospitality industry sees great value in gathering large amounts of diverse customer data. However, many hotels face a challenge in analyzing customer data and using it as an effective tool to understand hospitality customers better and, ultimately, to increase revenue. The authors' research attempts to resolve these challenges by utilizing big data analysis tools, especially Hadoop and R. Hadoop is a framework for processing large-scale data. Using this approach, their study demonstrates ways of aggregating and analyzing hospitality customer data to find meaningful customer information. Multiple decision trees are constructed from the customer data sets with the intention of classifying customers' needs and customer clusters. By analyzing the customer data, the study suggests three strategies to increase customers' total expenditure within the limited time of their stay.
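
The study builds its decision trees with Hadoop and R; as a rough stand-in for that step, the scikit-learn sketch below trains a decision tree on made-up guest data. All column names, the spending-segment labels, and the values themselves are hypothetical.

```python
# Illustrative stand-in for the study's decision-tree step (the authors
# used Hadoop and R; this uses scikit-learn on made-up data). All column
# names and labels are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical per-guest features and a spending-segment label.
df = pd.DataFrame({
    "nights_stayed":   [1, 3, 2, 7, 4, 2, 5, 1, 6, 3],
    "room_rate":       [120, 200, 90, 310, 180, 110, 260, 95, 300, 150],
    "spa_visits":      [0, 1, 0, 3, 1, 0, 2, 0, 2, 1],
    "restaurant_bill": [30, 120, 25, 400, 150, 40, 220, 20, 350, 90],
    "segment":         [0, 1, 0, 2, 1, 0, 2, 0, 2, 1],  # low/mid/high spender
})

X, y = df.drop(columns="segment"), df["segment"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules
print("holdout accuracy:", tree.score(X_test, y_test))
```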


2018 ◽  
Vol 30 (5) ◽  
pp. 554-571 ◽  
Author(s):  
Maria Vincenza Ciasullo ◽  
Orlando Troisi ◽  
Francesca Loia ◽  
Gennaro Maione

Purpose
The purpose of this paper is to provide a better understanding of the reasons why people use or do not use carpooling. A further aim is to collect and analyze empirical evidence concerning the advantages and disadvantages of carpooling.

Design/methodology/approach
A large-scale text analytics study has been conducted: people's opinions were collected on Twitter by means of a dedicated web crawler, "Twitter4J." After mining, the collected data were put through a sentiment analysis realized by means of "SentiWordNet."

Findings
The big data analysis identified the 12 concepts about carpooling most frequently used by Twitter users: seven advantages (economic efficiency, environmental efficiency, comfort, traffic, socialization, reliability, curiosity) and five disadvantages (lack of effectiveness, lack of flexibility, lack of privacy, danger, lack of trust).

Research limitations/implications
Although the sample is particularly large (10 percent of the data flow published on Twitter from all over the world over about one year), the automated collection of people's comments prevented a more in-depth analysis of users' thoughts and opinions.

Practical implications
The research findings may help entrepreneurs, managers and policy makers understand the variables to be leveraged, and the actions to be taken, to exploit the potential benefits that carpooling offers.

Originality/value
The work draws on skills from three different areas, i.e., business management, computing science and statistics, which were synergistically integrated to customize, implement and use two IT tools capable of automatically identifying, selecting, collecting, categorizing and analyzing people's tweets about carpooling.
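
The authors worked with Twitter4J (a Java library) for collection and SentiWordNet for scoring. As a rough illustration of SentiWordNet-style scoring only, here is a Python sketch using NLTK's sentiwordnet corpus reader; the naive whitespace tokenization, first-sense lookup, and averaging rule are simplifying assumptions, and the example tweets are invented.

```python
# Rough illustration of SentiWordNet-style scoring (the paper used
# Twitter4J to collect tweets; this sketch only mimics the scoring step
# via NLTK's SentiWordNet reader; aggregation is an assumption).
import nltk
nltk.download("sentiwordnet", quiet=True)
nltk.download("wordnet", quiet=True)
from nltk.corpus import sentiwordnet as swn

def score_text(text):
    """Average (positive - negative) score over words found in SentiWordNet."""
    scores = []
    for word in text.lower().split():
        synsets = list(swn.senti_synsets(word))
        if synsets:
            s = synsets[0]  # naive: first sense only
            scores.append(s.pos_score() - s.neg_score())
    return sum(scores) / len(scores) if scores else 0.0

tweets = [  # invented examples, not data from the study
    "carpooling is cheap and good for the environment",
    "carpooling feels unsafe and inflexible",
]
for t in tweets:
    print(f"{score_text(t):+.2f}  {t}")
```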

