Comparative Study of Classification Techniques For Large Scale Data - Case Study

Massive datasets generated in many applications bring both opportunities and challenges. In particular, scalable mining of such large-scale datasets is a challenging problem that has attracted recent research. This study analyses classification techniques using the WEKA machine learning workbench on a large-scale dataset from the protein structure prediction field. The dataset had already been partitioned into training and test sets using the ten-fold cross-validation methodology. Nine different methods were tested in the experiment. The results indicate that it is not feasible to test more than one classifier from the tree family in the same experiment. In addition, using the NaiveBayes classifier with the default properties of the attribute selection filter is very time-consuming. Finally, varying the parameters of the attribute selection should be prioritized to obtain more accurate results.
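The ten-fold partitioning mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain contiguous folds without shuffling or stratification; the abstract does not say how the original dataset was actually split, and the function name `k_fold_indices` is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Assumption: samples are used in their given order (no shuffling,
    no stratification by class label).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop
```

Each sample lands in exactly one test fold, so every classifier is evaluated on all of the data while never being tested on samples it was trained on within the same fold.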

Exploration of Tourist Activities in Urban Destination Using Venue Check-In Data

Journal of Hospitality & Tourism Research ◽

10.1177/1096348019889121 ◽

2019 ◽

Vol 44 (3) ◽

pp. 472-498

Author(s):

Huy Quan Vu ◽

Jian Ming Luo ◽

Gang Li ◽

Rob Law

Keyword(s):

Data Collection ◽

Large Scale ◽

Traditional Approach ◽

Urban Tourism ◽

Data Set ◽

Tourism Marketing ◽

Large Scale Data ◽

New Type ◽

Scale Data

Understanding the differences and similarities in the activities of tourists from various cultures is important for tourism managers to develop appropriate plans and strategies that could support urban tourism marketing and management. However, tourism managers still face challenges in obtaining such understanding because the traditional approach to data collection, which relies on surveys and questionnaires, is incapable of capturing tourist activities at a large scale. In this article, we present a method for the study of tourist activities based on a new type of data, venue check-ins. The effectiveness of the presented approach is demonstrated through a case study of a major tourism country, France. Analysis based on a large-scale data set from 19 tourism cities in France reveals interesting differences and similarities in the activities of tourists from 14 markets (countries). Valuable insights are provided for various urban tourism applications.

Developing a ‘Semi-Systematic’ Approach to Using Large-Scale Data-Sets for Small-Scale Interventions: The ‘Baby Matterz’ Initiative as a Case Study

The Urban Review ◽

10.1007/s11256-009-0144-z ◽

2010 ◽

Vol 43 (2) ◽

pp. 235-254

Author(s):

Mark O’Brien

Keyword(s):

Large Scale ◽

Systematic Approach ◽

Small Scale ◽

Data Sets ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets

Thinking in multitudes: Questionnaires and composite cases in early American psychology

History of the Human Sciences ◽

10.1177/0952695120903909 ◽

2020 ◽

Vol 33 (3-4) ◽

pp. 160-174 ◽

Cited By ~ 1

Author(s):

Jacy L. Young

Keyword(s):

Scientific Reasoning ◽

Large Scale ◽

Individual Case ◽

Early American ◽

Large Scale Data ◽

American Psychology ◽

Questionnaire Research ◽

The Individual ◽

Scale Data

In the late 19th century, the questionnaire was one means of taking the case study into the multitudes. This article engages with Forrester’s idea of thinking in cases as a means of interrogating questionnaire-based research in early American psychology. Questionnaire research was explicitly framed by psychologists as a practice involving both natural historical and statistical forms of scientific reasoning. At the same time, questionnaire projects failed to successfully enact the latter aspiration in terms of synthesizing masses of collected data into a coherent whole. Difficulties in managing the scores of descriptive information questionnaires generated ensured the continuing presence of individuals in the results of this research, as the individual case was excerpted and discussed alongside a cast of others. As a consequence, questionnaire research embodied an amalgam of case, natural historical, and statistical thinking. Ultimately, large-scale data collection undertaken with questionnaires failed in its aim to construct composite exemplars or ‘types’ of particular kinds of individuals; to produce the singular from the multitudes.

Collecting and Classifying Large Scale Data to Build an Adaptive and Collective Memory: a Case Study in e-Health for a Pro-active Management

Multi-Agent Systems - Modeling, Interactions, Simulations and Case Studies ◽

10.5772/14988 ◽

2011 ◽

Author(s):

Singer Nicolas ◽

Trouilhet Sylvie ◽

Rammal Ali ◽

Pecatte Jean-Marie

Keyword(s):

Collective Memory ◽

Large Scale ◽

Active Management ◽

Large Scale Data ◽

Scale Data

DNA Transnational Data Journeys and the Construction of Categories of Suspicion

Canadian Journal of Communication ◽

10.22230/cjc.2020v45n1a3441 ◽

2020 ◽

Vol 45 (1) ◽

Author(s):

Helena Machado ◽

Rafaela Granja

Keyword(s):

Large Scale ◽

National Level ◽

Societal Implications ◽

Large Scale Data ◽

We Show ◽

We Examine ◽

Scale Data ◽

Shape Data ◽

The EU

Background: Systems for large-scale data exchanges are playing a pivotal role in the governance, surveillance, and social control of criminality in different parts of the world. Analysis: This article explores the case study of the Prüm system, which is a technological system for the exchange of DNA data among several European Union (EU) countries. Making use of the concept of data journeys, it addresses how the transnational exchange of DNA data in the EU implicates the construction of categories of suspicion. Conclusion and implications: The article shows how supranational- and national-level notions and attitudes over the ownership of data shape data journeys, and it discusses the societal implications of datafication and emerging data justice issues.

Deep Learning Method for RNA Secondary Structure Prediction with Pseudoknots Based on Large-Scale Data

Journal of Healthcare Engineering ◽

10.1155/2021/6699996 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Bowen Shen ◽

Hao Zhang ◽

Cong Li ◽

Tianheng Zhao ◽

Yuanning Liu

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Large Scale ◽

Secondary Structure Prediction ◽

Learning Methods ◽

RNA Secondary Structure Prediction ◽

Large Scale Data ◽

Scale Data

Traditional machine learning methods are widely used in the field of RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods have more advantages than traditional machine learning methods. As the number of network layers increases in deep learning, problems such as a growing number of parameters and overfitting often arise. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary structures. Improvements to the neural network model are made from the perspective of network depth and width, which can effectively improve computational efficiency while extracting more feature information. In our experiments, we process existing real RNA data, use deep learning models to extract useful features from large amounts of RNA sequence and structure data, and then use the extracted features to predict each base's pairing probability. The base-level predictions are then processed using the characteristics of RNA secondary structure and dynamic programming to obtain the structure with the largest summed base-pairing probability, which is taken as the optimal RNA secondary structure. We evaluated the GoogLeNet and TCN models on 5sRNA, tRNA, and tmRNA data and compared them with other standard prediction algorithms. The sensitivity and specificity of the GoogLeNet model on the 5sRNA and tRNA data sets are about 16% higher than the best prediction results of the other algorithms. The sensitivity and specificity of the GoogLeNet model on the tmRNA data set are about 9% higher than the best prediction results of the other algorithms. Because the performance of deep learning algorithms is related to the size of the data set, the prediction accuracy of deep learning methods for RNA secondary structure should continue to improve as the scale of RNA data expands.
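The dynamic programming step can be illustrated with a Nussinov-style sketch that picks the set of nested base pairs with the largest summed pairing probability. This is an assumption-laden illustration, not the authors' method: it handles only nested (pseudoknot-free) structures, whereas the paper targets pseudoknots, and the function name, the `prob` dictionary layout, and the `min_loop` parameter are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def best_nested_structure(
    prob: Dict[Pair, float], n: int, min_loop: int = 3
) -> Tuple[float, List[Pair]]:
    """Maximize the summed pairing probability over nested base pairs.

    prob maps (i, j) with i < j to a predicted pairing probability
    (hypothetical input format). min_loop is the minimum number of
    unpaired bases enclosed by a pair. Pseudoknots are NOT supported.
    """
    if n < 2:
        return 0.0, []
    score = [[0.0] * n for _ in range(n)]
    back: List[List[Optional[Tuple[str, int]]]] = [[None] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best, choice = score[i + 1][j], ("skip_i", -1)  # base i unpaired
            if score[i][j - 1] > best:  # base j unpaired
                best, choice = score[i][j - 1], ("skip_j", -1)
            p = prob.get((i, j), 0.0)
            if p > 0.0 and score[i + 1][j - 1] + p > best:  # i pairs with j
                best, choice = score[i + 1][j - 1] + p, ("pair", -1)
            for k in range(i + 1, j):  # two independent substructures
                cand = score[i][k] + score[k + 1][j]
                if cand > best:
                    best, choice = cand, ("split", k)
            score[i][j], back[i][j] = best, choice
    # Trace back the stored choices to recover the chosen pairs.
    pairs: List[Pair] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or back[i][j] is None:
            continue
        kind, k = back[i][j]
        if kind == "pair":
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        elif kind == "skip_i":
            stack.append((i + 1, j))
        elif kind == "skip_j":
            stack.append((i, j - 1))
        else:
            stack.append((i, k))
            stack.append((k + 1, j))
    return score[0][n - 1], sorted(pairs)
```

The table is filled by increasing subsequence length, so each cell only depends on shorter spans that are already final. The cubic running time comes from the split loop over `k`.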

USER FRIENDLY OPEN GIS TOOL FOR LARGE SCALE DATA ASSIMILATION – A CASE STUDY OF HYDROLOGICAL MODELLING

ISPRS - International Archives of the Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xxxix-b4-427-2012 ◽

2012 ◽

Vol XXXIX-B4 ◽

pp. 427-430 ◽

Cited By ~ 2

Author(s):

P. K. Gupta

Keyword(s):

Data Assimilation ◽

Large Scale ◽

Hydrological Modelling ◽

Large Scale Data ◽

User Friendly ◽

Scale Data ◽

Open GIS

Immersive Media, Scientific Visualization, and Global Umwelt

Advances in Media, Entertainment, and the Arts - Handbook of Research on the Global Impacts and Roles of Immersive Media ◽

10.4018/978-1-7998-2433-6.ch020 ◽

2020 ◽

pp. 416-429

Author(s):

Julieta Cristina Aguilera

Keyword(s):

Climate Change ◽

Human Body ◽

Large Scale ◽

Scientific Visualization ◽

Human Interactions ◽

Large Scale Data ◽

Sensory Motor ◽

Scale Data ◽

Immersive Media

This chapter deals with the global implications of immersive media: First, it considers how the concept of the umwelt can be used to address the extension of sensory motor capabilities of the human body. Next, it discusses what the implications are when the concept of the human umwelt is applied to scientific visualization in astronomy, which scales space and time to present data. Then, these scientific visualizations are discussed in the context of planetarium domes and what it means to collectively experience an immersive environment based on large scale data. As a case study, the final section articulates what this entails for the understanding of the effects of collective human interactions with our planetary environment at this stage of climate change.

Requirements and Data Content Evaluation of Industry In-House Data Management System

Volume 3: 30th Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2010-28548 ◽

2010 ◽

Cited By ~ 4

Author(s):

Beshoy Morkos ◽

Shraddha Joshi ◽

Joshua D. Summers ◽

Gregory G. Mocko

Keyword(s):

Data Management ◽

Management System ◽

Large Scale ◽

Data Management System ◽

Industrial Case Study ◽

Large Scale Data ◽

Data Content ◽

Scale Data ◽

Future Growth

This paper presents an industrial case study performed on an in-house developed data management system for an automation firm. This data management system has been in use and evolving over a span of fifteen years. To ensure the system is robust enough to withstand the future growth of the corporation, a study was conducted to identify deficiencies that may prohibit efficient large-scale data management. Specifically, this case study focused on the means by which project requirements are managed and explored the issues of perceived utility in the system. Two major findings are presented: completion metrics are not consistent or expressive of the actual needs, and there is no linking between the activities and the original client requirements. Thus, the results of the study were used to depict the potential vulnerability of such deficiencies.

Applying Collocation Analysis to Chinese Discourse: A Case Study of Causal Connectives

Lingua Sinica ◽

10.2478/linguasinica-2020-0004 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1-24

Author(s):

Yipu Wei ◽

Dirk Speelman ◽

Jacqueline Evers-Vermeul

Keyword(s):

Large Scale ◽

Methodological Issues ◽

Large Scale Data ◽

Manual Analysis ◽

Discourse Relations ◽

Discourse Studies ◽

Definition Of ◽

Scale Data ◽

Selection Of

Collocation analysis can be used to extract meaningful linguistic information from large-scale corpus data. This paper reviews the methodological issues one may encounter when performing collocation analysis for discourse studies on Chinese. We propose four crucial aspects to consider in such analyses: (i) the definition of collocates according to various parameters; (ii) the choice of analysis and association measures; (iii) the definition of the search span; and (iv) the selection of corpora for analysis. To illustrate how these aspects can be addressed when applying a Chinese collocation analysis, we conducted a case study of two Chinese causal connectives: yushi ‘that is why’ and yin’er ‘as a result’. The distinctive collocation analysis shows how these two connectives differ in volitionality, an important dimension of discourse relations. The study also demonstrates that collocation analysis, as an explorative approach based on large-scale data, can provide valuable converging evidence for corpus-based studies that have been conducted with laborious manual analysis on limited datasets.
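Two of the aspects named above, the search span and the association measure, can be made concrete with a small sketch. It assumes pointwise mutual information (PMI) with simple relative-frequency estimates, which is only one common association measure; the abstract does not say which measures the paper uses, and the function name and toy input are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def collocate_pmi(tokens: List[str], node: str, span: int = 2) -> Dict[str, float]:
    """Score each word found within +/- span tokens of node by PMI.

    Assumption: PMI = log2(P(node, word) / (P(node) * P(word))), with each
    probability estimated as a count divided by the corpus size. Other
    normalizations (for example, by window size) are also in use.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    co_freq: Counter = Counter()
    for idx, tok in enumerate(tokens):
        if tok != node:
            continue
        # Collect every token inside the search span around this occurrence.
        lo, hi = max(0, idx - span), min(n, idx + span + 1)
        for pos in range(lo, hi):
            if pos != idx:
                co_freq[tokens[pos]] += 1
    return {
        word: math.log2((joint * n) / (word_freq[node] * word_freq[word]))
        for word, joint in co_freq.items()
    }
```

Running the function once per connective over the same segmented corpus and comparing the resulting scores gives a rough view of which collocates are distinctive for each connective. Widening `span` trades precision for recall, which is why the abstract treats the search span as a decision to make explicitly.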
