Census Data Analysis and Visualization Using R Tool

Author(s):  
Veena Gadad ◽  
Sowmyarani C. N.

As a result of the increased usage of the internet, a huge amount of data is collected from a variety of sources such as surveys, censuses, and sensors in the Internet of Things. This resultant data is termed big data, and its analysis drives major decision making. Since the collected data is in raw form, its inherent properties are difficult to understand, and it becomes a mere liability if not analyzed, summarized, and visualized. Although text can be used to articulate the relation between facts and to explain findings, presenting data in the form of tables and graphs conveys information more effectively. The use of tools to create visual images of data in order to gain more insight into it is called data visualization. Data analysis is the processing and interpretation of data to discover useful information and to deduce inferences based on the values. This chapter concerns the usage of the R tool and examines its effectiveness for data analysis and intelligent data visualization by experimenting on a data set obtained from the University of California Irvine Machine Learning Repository.
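The load–summarize–visualize workflow the chapter describes can be sketched as follows (in Python rather than R, with a small hypothetical sample standing in for the UCI census data):

```python
from collections import Counter

# Hypothetical sample standing in for the UCI census (Adult) data set:
# each record is (age, education, income bracket).
records = [
    (39, "Bachelors", "<=50K"),
    (50, "Bachelors", ">50K"),
    (38, "HS-grad", "<=50K"),
    (53, "Masters", ">50K"),
    (28, "HS-grad", "<=50K"),
]

# Summarize: count income brackets per education level.
counts = Counter((edu, income) for _, edu, income in records)

# Visualize as a simple text bar chart (a stand-in for an R plot).
for (edu, income), n in sorted(counts.items()):
    print(f"{edu:>10} {income:>6} | {'#' * n}")
```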

2021 ◽  
Author(s):  
Fabian Schlebusch ◽  
Frederic Kehrein ◽  
Rainer Röhrig ◽  
Barbara Namer ◽  
Ekaterina Kutafina

openMNGlab is an open-source software framework for data analysis, tailored to the specific needs of microneurography – a type of electrophysiological technique particularly important for research on the coding of peripheral nerve fibers. Currently, openMNGlab loads data from Spike2 and Dapsys, which are two major data acquisition solutions. By building on top of the Neo software, openMNGlab can be easily extended to handle the most common electrophysiological data formats. Furthermore, it provides methods for data visualization, fiber tracking, and a modular feature database to extract features for data analysis and machine learning.
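The kind of feature extraction such a pipeline performs can be illustrated with a minimal threshold-crossing detector for candidate action potentials; this is not the openMNGlab API, just a sketch of the underlying idea on an invented voltage trace:

```python
# Detect rising threshold crossings (candidate action potentials)
# in a voltage trace. Illustrative only; NOT the openMNGlab API.
def detect_spikes(trace, threshold):
    """Return indices where the signal rises above the threshold."""
    spikes = []
    for i in range(1, len(trace)):
        if trace[i - 1] < threshold <= trace[i]:
            spikes.append(i)
    return spikes

trace = [0.1, 0.2, 1.5, 0.3, 0.1, 2.0, 1.9, 0.2]
print(detect_spikes(trace, threshold=1.0))  # rising crossings at indices 2 and 5
```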


2021 ◽  
Vol 2 (1) ◽  
pp. 77-88
Author(s):  
Rakhmat Purnomo ◽  
Wowon Priatna ◽  
Tri Dharma Putra

The dynamics of higher education are changing and emphasize the need to adapt quickly. Higher education is under the supervision of accreditation agencies, governments, and other stakeholders, and must seek new ways to improve and monitor student success and other institutional policies. Many institutions fail to make efficient use of the large amounts of available data. With the use of big data analytics in higher education, more insight can be obtained into students, academics, and processes in higher education, supporting predictive analysis and improving decision making. The purpose of this research is to implement big data analytics to improve decision making by the competent parties. This research begins with the identification of process data based on analytical learning, academics, and processes in the campus environment. The data used in this study is a public data set from the UCI Machine Learning Repository; of the 33 available variables, 4 are used to measure student performance. Big data analysis in this study uses Apache Spark as the library underlying PySpark, so that Python can perform the big data analysis. The data, distributed in a master/slave configuration, is grouped using k-means clustering to identify the best-performing student group. The results of this study succeeded in grouping students into 5 clusters, with cluster 1 containing the best-performing students and cluster 5 the lowest-performing ones.
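The clustering step at the heart of this study can be sketched in pure Python (standing in for the PySpark k-means pipeline); the 1-D "performance scores" and k=2 below are invented for illustration, not the paper's 4 variables and 5 clusters:

```python
import random

# A self-contained, pure-Python stand-in for the paper's PySpark
# k-means step, run on invented 1-D student performance scores.
def kmeans_1d(values, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(values, k)  # random initial centroids
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

scores = [35, 38, 40, 42, 85, 88, 90, 92]
centroids, clusters = kmeans_1d(scores, k=2)
print(sorted(centroids))  # one low-performing and one high-performing group
```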


Author(s):  
Vanshika Lamba

Data analysis is the core task that must be performed on data in order to capture its characteristics for further specification and estimation. But extracting the maximum number of useful characteristics is the main barrier in the path of any organization. Data analysis plays an important role in the success of an organization, as it supports proper decision making. And the best decisions come from analyzing past information, the present scenario, and future impacts. But most of the information extracted from a collected data set is in numerical form. Therefore, we need to select appropriate summary statistics for the purpose of data exploration, for example the mean, median, and mode. This paper focuses on the classes of summary statistics to be used for data analysis and on how important their use is in data exploration. The paper concentrates mainly on measures of location, gives a brief idea of measures of dispersion, and discusses how measures of location relate to measures of spread.
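The measures of location and spread the paper discusses can be computed directly with Python's standard library; the sample below is illustrative:

```python
import statistics

# An invented sample used to illustrate the paper's summary statistics.
data = [2, 3, 3, 5, 7, 7, 7, 9, 11]

print("mean  :", statistics.mean(data))    # measure of location
print("median:", statistics.median(data))  # location, robust to outliers
print("mode  :", statistics.mode(data))    # most frequent value
print("stdev :", statistics.stdev(data))   # measure of dispersion
print("range :", max(data) - min(data))    # simplest measure of spread
```

The mean is pulled toward extreme values, while the median is not; comparing the two on the same data is a quick first check for skewness, which is one way the measures of location relate to the measures of spread.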


2021 ◽  
Vol 1 (2) ◽  
pp. 16-24
Author(s):  
V Mareeswari ◽  
Sunita Chalageri ◽  
Kavita K Patil

Chronic kidney disease (CKD) is a worldwide health issue in which the kidneys are damaged and cannot filter blood the way they should. Since the early stages of CKD are hard to detect, patients often fail to recognise the disease. Early detection of CKD allows patients to receive timely treatment to slow the progression of the disease. Machine learning models can effectively aid clinicians in pursuing this goal because of their early and accurate recognition performance. The CKD data set is collected from the University of California Irvine (UCI) Machine Learning Repository. Multiple machine and deep learning algorithms are used to predict chronic kidney disease.
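A minimal nearest-neighbour classifier illustrates the kind of model such a study evaluates; the feature names (blood pressure, serum creatinine) and all values below are invented for illustration, not taken from the UCI CKD data set:

```python
# 1-nearest-neighbour sketch: predict the label of the closest
# training record. Features and values are hypothetical.
def predict(train, query):
    def dist(a, b):
        # Squared Euclidean distance between feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda rec: dist(rec[0], query))[1]

train = [
    ((80, 1.2), "notckd"),   # (blood pressure, serum creatinine) -> label
    ((70, 0.9), "notckd"),
    ((100, 3.8), "ckd"),
    ((110, 4.2), "ckd"),
]
print(predict(train, (95, 3.5)))
```

In practice features would be scaled before computing distances, since blood pressure and creatinine live on very different numeric ranges.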


Author(s):  
Abigail Christina Fernandez

Data is just data if it is not put to proper, comprehensive use. Information is knowledge, and knowledge is upgraded to wisdom through insight in the relevant field of analysis. Data Science has become the key that unlocks many areas of interest in diversified fields of inquiry. It is of the utmost importance that the solutions Artificial Intelligence algorithms provide do justice to the intent for which they were built. But at times, inadvertently, bias creeps in, becoming an implicit or explicit part of the algorithms and of the data collection methodologies incorporated. IT companies deploying this technology need to treat this hushed underplay in prediction and decision making with top-notch priority to do justice to this imminent episode of Machine Learning in data analysis.


Author(s):  
Soma Roychowdhury ◽  
Debasis Bhattacharya

Every field of study generates a huge amount of data. The volume of data generated leads to information overload, and the ability to make sense of all these data is becoming increasingly important. This requires a good understanding of the data to be analyzed and of the different statistical techniques to be used in that context. On the basis of the issues important to the data set, as well as other practical considerations, it is necessary to select appropriate methods to apply to the problem under study. This work focuses on different issues arising in the context of data analysis that need attention, such as understanding classifications of data, the magnitude of measurement errors, missing observations in the data set, outlier observations and their influence on the conclusions derived from the data, non-normal data, meta-analysis, etc. In the course of the discussion, some examples have been included to illustrate how critical a data analysis procedure can be for making a meaningful decision from a data set.
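One of the issues raised here, outlier observations, is commonly screened with the 1.5 × IQR rule; a sketch on invented data (note that quartile conventions vary between textbooks and software):

```python
import statistics

# Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers,
# using the median-of-halves quartile convention.
def iqr_outliers(values):
    s = sorted(values)
    n = len(s)
    q1 = statistics.median(s[: n // 2])
    q3 = statistics.median(s[(n + 1) // 2 :])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(data))  # the 102 stands well outside the fences
```

Whether a flagged value is dropped, corrected, or kept is a substantive decision, which is exactly the point the chapter makes about outliers influencing conclusions.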


2020 ◽  
Vol 9 (11) ◽  
pp. 671
Author(s):  
Alexander Bustamante ◽  
Laura Sebastia ◽  
Eva Onaindia

Integrating collaborative data into a data-driven Business Intelligence (BI) system brings an opportunity to foster the decision-making process and improve tourism competitiveness. This article presents BITOUR, a BI platform that integrates four collaborative data sources (Twitter, Openstreetmap, Tripadvisor and Airbnb). BITOUR follows a classical BI architecture and provides functionalities for data transformation, data processing, data analysis and data visualization. At the core of the data processing, BITOUR offers mechanisms to identify tourists on Twitter, assign tweets to attractions and accommodation sites from Tripadvisor and Airbnb, and analyze sentiment in the opinions issued by tourists, all using geolocation objects from Openstreetmap. With all these ingredients, BITOUR enables data analysis and visualization to answer questions such as the places most frequented by tourists, the average length of stay, or visitors' view of a particular destination.
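One core processing step, assigning a geolocated tweet to the nearest attraction, can be sketched with a haversine distance; the attraction names and all coordinates below are illustrative, not taken from the platform's data:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical attractions with (lat, lon) coordinates.
attractions = {
    "City of Arts and Sciences": (39.4548, -0.3505),
    "Valencia Cathedral": (39.4754, -0.3753),
}
tweet = (39.4560, -0.3520)  # hypothetical tweet geolocation
nearest = min(attractions, key=lambda name: haversine_km(*tweet, *attractions[name]))
print(nearest)
```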


2020 ◽  
Vol 27 (10) ◽  
pp. 2721-2757
Author(s):  
Rajat Kumar Behera ◽  
Pradip Kumar Bala ◽  
Rashmi Jain

Purpose
Any business that opts to adopt a recommender engine (RE) for its various potential benefits must choose from the candidate solutions by matching them to the task of interest and the domain. The purpose of this paper is to choose the RE that fits best from a set of candidate solutions using a rule-based automated machine learning (ML) approach. The objective is to draw a trustworthy conclusion, which results in brand building, establishing a reliable relationship with customers and, undeniably, growing the business.

Design/methodology/approach
An experimental quantitative research method was conducted in which the ML model was evaluated with diversified performance metrics and five RE algorithms, combining offline evaluation on a historical and simulated movie data set with online evaluation on a business-like near-real-time data set to uncover the best-fitting RE.

Findings
The rule-based automated evaluation of REs has changed the testing landscape by removing lengthy manual testing that was not comprehensive. It leads to minimal manual effort with high-quality results, and could bring a new revolution in testing practice by starting a service line "Machine Learning Testing as a Service" (MLTaaS), with the possibility of integrating with DevOps to help agile teams ship a fail-safe RE evaluation product targeting SaaS (software as a service) or cloud deployment.

Research limitations/implications
A small data set was considered: the A/B phase study captured ten movies from three theaters operating in a single location in India, and the simulation phase study captured two movies from three theaters operating in the same location. The research was limited to Bollywood and Ollywood movies for the A/B phase, and to Ollywood movies for the simulation phase.

Practical implications
The best-fitting RE enables the business to make personalized recommendations, forecast long-term customer loyalty, predict the company's future performance, introduce customers to new products/services and shape customers' future preferences and behaviors.

Originality/value
The proposed rule-based ML approach, named "2-stage locking evaluation", is self-learning, automated by design, and largely produces a time-bound conclusive result and an improved decision-making process. It is the first of its kind to examine the business domain and task of interest. In each stage of the evaluation, low-performing REs are excluded, which leads to a time- and cost-optimized solution. Additionally, the combination of offline and online evaluation methods offers benefits such as improved quality with a self-learning algorithm, faster decision-making by significantly reducing manual effort with end-to-end test coverage, cognitive aiding for early feedback, unattended evaluation, and traceability by identifying missing test metric coverage.
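The staged-exclusion idea described above can be sketched in a few lines; the engine names, scores, and threshold below are invented for illustration, and this is not the paper's actual "2-stage locking evaluation" implementation:

```python
# Hedged sketch of staged RE selection: a cheap offline stage filters
# out low performers, and the costlier online stage ranks survivors.
def two_stage_select(engines, offline_score, online_score, offline_cut=0.5):
    # Stage 1: offline evaluation excludes weak candidates early,
    # saving the cost of online-evaluating them.
    survivors = [e for e in engines if offline_score[e] >= offline_cut]
    # Stage 2: online (near-real-time) evaluation picks the winner.
    return max(survivors, key=lambda e: online_score[e])

engines = ["popularity", "item-knn", "user-knn", "svd", "content"]
offline = {"popularity": 0.31, "item-knn": 0.62, "user-knn": 0.58,
           "svd": 0.71, "content": 0.44}
online = {"item-knn": 0.55, "user-knn": 0.49, "svd": 0.66}
print(two_stage_select(engines, offline, online))
```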

