Comparative Study of Classification Techniques For Large Scale Data - Case Study

2017 ◽  
Vol 2 (3) ◽  
pp. 56-61
Author(s):  
Nigar M.Shafiq Surameery ◽  
Dana Lattef Hussein

The existence of massive datasets generated in many applications provides various opportunities and challenges. In particular, scalable mining of such large-scale datasets is a challenging issue that has attracted recent research. In the present study, the main focus is to analyse classification techniques using the WEKA machine learning workbench on a large-scale dataset. This dataset comes from the protein structure prediction field and has already been partitioned into training and test sets using ten-fold cross-validation. In this experiment, nine different methods were tested. The results showed that it is not practical to test more than one classifier from the tree family in the same experiment. Moreover, using the NaiveBayes classifier with the default properties of the attribute selection filter is very time-consuming. Finally, varying the parameters of attribute selection should be prioritized to obtain more accurate results.
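The ten-fold cross-validation partitioning mentioned above can be sketched as follows. This is a minimal plain-Python illustration, not WEKA's implementation; the function name `k_fold_indices` and the shuffling `seed` are hypothetical choices. Each sample lands in exactly one test fold, and the other nine folds form the training set for that round. A stratified variant, which balances class proportions across folds, is common in practice and is not shown here.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k cross-validation rounds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so folds are not biased by the original ordering of the data.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of (nearly) equal size.
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        # Every sample outside the held-out fold is used for training.
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test
```

In use, one would loop `for train, test in k_fold_indices(len(data)):`, fit a classifier on the `train` rows, score it on the `test` rows, and average the ten scores.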

2019 ◽  
Vol 44 (3) ◽  
pp. 472-498
Author(s):  
Huy Quan Vu ◽  
Jian Ming Luo ◽  
Gang Li ◽  
Rob Law

Understanding the differences and similarities in the activities of tourists from various cultures is important for tourism managers developing plans and strategies to support urban tourism marketing and management. However, tourism managers still face challenges in obtaining such understanding, because the traditional approach to data collection, which relies on surveys and questionnaires, cannot capture tourist activities at a large scale. In this article, we present a method for studying tourist activities based on a new type of data: venue check-ins. The effectiveness of the approach is demonstrated through a case study of a major tourist destination, France. Analysis of a large-scale data set from 19 tourist cities in France reveals notable differences and similarities in the activities of tourists from 14 markets (countries). The findings provide valuable insights for various urban tourism applications.


2020 ◽  
Vol 33 (3-4) ◽  
pp. 160-174 ◽  
Author(s):  
Jacy L. Young

In the late 19th century, the questionnaire was one means of taking the case study into the multitudes. This article engages with Forrester’s idea of thinking in cases as a means of interrogating questionnaire-based research in early American psychology. Psychologists explicitly framed questionnaire research as a practice involving both natural historical and statistical forms of scientific reasoning. At the same time, questionnaire projects failed to realize the latter aspiration, that of synthesizing masses of collected data into a coherent whole. Difficulties in managing the mass of descriptive information that questionnaires generated ensured the continuing presence of individuals in the results of this research, as the individual case was excerpted and discussed alongside a cast of others. As a consequence, questionnaire research embodied an amalgam of case, natural historical, and statistical thinking. Ultimately, large-scale data collection undertaken with questionnaires failed in its aim to construct composite exemplars or ‘types’ of particular kinds of individuals; to produce the singular from the multitudes.


2020 ◽  
Vol 45 (1) ◽  
Author(s):  
Helena Machado ◽  
Rafaela Granja

Background: Systems for large-scale data exchange are playing a pivotal role in the governance, surveillance, and social control of criminality in different parts of the world.

Analysis: This article explores the case study of the Prüm system, a technological system for the exchange of DNA data among several European Union (EU) countries. Drawing on the concept of data journeys, it examines how the transnational exchange of DNA data in the EU involves the construction of categories of suspicion.

Conclusion and implications: The article shows how supranational- and national-level notions of and attitudes toward data ownership shape data journeys, and it discusses the societal implications of datafication and emerging data justice issues.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Bowen Shen ◽  
Hao Zhang ◽  
Cong Li ◽  
Tianheng Zhao ◽  
Yuanning Liu

Traditional machine learning methods are widely used in the field of RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods offer advantages over traditional machine learning methods. As the number of layers in a deep network increases, problems such as a growing number of parameters and overfitting often arise. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary structures. Considering both the depth and the width of the network, we improved the neural network models so that they extract more feature information while effectively improving computational efficiency. We processed existing real RNA data, used the deep learning models to extract useful features from large amounts of RNA sequence and structure data, and then used the extracted features to predict each base’s pairing probability. The characteristics of RNA secondary structure and dynamic programming are then used to process these per-base predictions: the structure with the largest sum of base-pairing probabilities is selected as the optimal RNA secondary structure. We evaluated the GoogLeNet and TCN models on 5sRNA, tRNA, and tmRNA data and compared them with other standard prediction algorithms. On the 5sRNA and tRNA data sets, the sensitivity and specificity of the GoogLeNet model are about 16% higher than the best results of the other algorithms; on the tmRNA data set, they are about 9% higher. Because the performance of deep learning algorithms depends on the size of the data set, the prediction accuracy of deep learning methods for RNA secondary structure will continue to improve as the scale of RNA data expands.
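The final step described above, choosing the structure whose base-pairing probabilities have the largest sum, can be illustrated with a Nussinov-style dynamic program. This is a sketch under assumptions: the abstract does not give the authors' exact recurrence, so the probability matrix `p`, the minimum hairpin loop length `min_loop=3`, and the function name `max_prob_structure` are hypothetical, and constraints such as allowing only canonical base pairs are omitted.

```python
from typing import List, Tuple


def max_prob_structure(p: List[List[float]],
                       min_loop: int = 3) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total probability, base pairs) of the nested structure maximising summed pair probability.

    p[k][j] (k < j) is the predicted probability that base k pairs with base j.
    """
    n = len(p)
    if n == 0:
        return 0.0, []
    # best[i][j]: highest summed pairing probability achievable within bases i..j.
    best = [[0.0] * n for _ in range(n)]

    def pair_score(i: int, k: int, j: int) -> float:
        # Score if base j pairs with base k: left part i..k-1 plus enclosed part k+1..j-1.
        left = best[i][k - 1] if k > i else 0.0
        return left + best[k + 1][j - 1] + p[k][j]

    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            score = best[i][j - 1]  # option 1: base j left unpaired
            # Option 2: base j paired with k, enclosing at least min_loop bases.
            for k in range(i, j - min_loop):
                score = max(score, pair_score(i, k, j))
            best[i][j] = score

    # Traceback: recover the pairs that produced best[0][n-1].
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # span too short to contain any pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if best[i][j] == pair_score(i, k, j):
                pairs.append((k, j))
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return best[0][n - 1], sorted(pairs)
```

Given a matrix of per-pair probabilities from a model such as GoogLeNet or TCN, the function returns the maximal total probability and the paired positions. The O(n³) recurrence keeps the predicted structure nested, so pseudoknots cannot be represented.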


Author(s):  
Julieta Cristina Aguilera

This chapter deals with the global implications of immersive media. First, it considers how the concept of the umwelt can be used to address the extension of the sensorimotor capabilities of the human body. Next, it discusses the implications of applying the concept of the human umwelt to scientific visualization in astronomy, which scales space and time to present data. These scientific visualizations are then discussed in the context of planetarium domes, along with what it means to collectively experience an immersive environment based on large-scale data. As a case study, the final section articulates what this entails for understanding the effects of collective human interactions with our planetary environment at this stage of climate change.


Author(s):  
Beshoy Morkos ◽  
Shraddha Joshi ◽  
Joshua D. Summers ◽  
Gregory G. Mocko

This paper presents an industrial case study of an in-house data management system at an automation firm. The system has been in use and evolving for fifteen years. To ensure that it is robust enough to withstand the future growth of the corporation, a study was conducted to identify deficiencies that may prevent efficient large-scale data management. Specifically, the case study focused on the means by which project requirements are managed and explored issues of perceived utility in the system. Two major findings are presented: completion metrics are neither consistent nor expressive of actual needs, and activities are not linked to the original client requirements. The results of the study were then used to illustrate the potential vulnerabilities arising from such deficiencies.


Lingua Sinica ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1-24
Author(s):  
Yipu Wei ◽  
Dirk Speelman ◽  
Jacqueline Evers-Vermeul

Collocation analysis can be used to extract meaningful linguistic information from large-scale corpus data. This paper reviews the methodological issues one may encounter when performing collocation analysis for discourse studies on Chinese. We propose four crucial aspects to consider in such analyses: (i) the definition of collocates according to various parameters; (ii) the choice of analysis and association measures; (iii) the definition of the search span; and (iv) the selection of corpora for analysis. To illustrate how these aspects can be addressed when applying a Chinese collocation analysis, we conducted a case study of two Chinese causal connectives: yushi ‘that is why’ and yin’er ‘as a result’. The distinctive collocation analysis shows how these two connectives differ in volitionality, an important dimension of discourse relations. The study also demonstrates that collocation analysis, as an explorative approach based on large-scale data, can provide valuable converging evidence for corpus-based studies that have been conducted with laborious manual analysis on limited datasets.
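Two of the aspects listed above, the search span and the association measure, can be sketched for a distinctive collocation analysis of two connectives. Assumptions: the corpus is already word-segmented into a token list (Chinese text needs a segmenter first), the function names are hypothetical, and log-likelihood (G²) stands in as the association measure. The abstract does not say which measure the authors used; Fisher's exact test is another common choice for distinctive collocation analysis.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def collocates(tokens: List[str], node: str, span: int = 3) -> Counter:
    """Count every token within `span` positions left and right of each occurrence of `node`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - span), min(len(tokens), i + span + 1)):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G^2 statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    g2 = 0.0
    for observed, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        if observed > 0:  # cells with zero observations contribute nothing
            expected = rows[r] * cols[k] / n
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def distinctive_collocates(tokens: List[str], node_a: str, node_b: str,
                           span: int = 3) -> Dict[str, Tuple[float, str]]:
    """Map each collocate to (G^2, node it occurs with relatively more often)."""
    ca, cb = collocates(tokens, node_a, span), collocates(tokens, node_b, span)
    total_a, total_b = sum(ca.values()), sum(cb.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both nodes need at least one collocate in the corpus")
    results = {}
    for word in set(ca) | set(cb):
        a, b = ca[word], cb[word]
        # Table rows: this word vs. all other collocates; columns: node_a vs. node_b.
        g2 = log_likelihood(a, b, total_a - a, total_b - b)
        # Compare relative frequencies a/total_a and b/total_b (ties go to node_a).
        preferred = node_a if a * total_b >= b * total_a else node_b
        results[word] = (g2, preferred)
    return results
```

Changing `span` directly changes which tokens count as collocates, which is why the search span has to be fixed and reported before association scores are compared.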