Machine Learning for Large-scale/Panel Data and Learning Analytics Data Analysis

2019 ◽  
Vol 35 (2) ◽  
pp. 313-338 ◽  
Author(s):  
Jin Eun Yoo
Author(s):  
Napoliana Souza ◽  
Gabriela Perry

Massive Open Online Courses (MOOCs) are a type of online course where students have little interaction, no instructor, and in some cases, no deadlines to finish assignments. For this reason, a better understanding of student affection in MOOCs is important and could open new perspectives for this type of course. The recent popularization of tools, code libraries and algorithms for intensive data analysis has made it possible to collect data from text and from interaction with the platforms, which can be used to infer correlations between affection and learning. In this context, a bibliographical review was carried out, covering the period between 2012 and 2018, with the goal of identifying which methods are being used to identify affective states. Three databases were used: ACM Digital Library, IEEE Xplore and Scopus, and 46 papers were found. The articles revealed that the most common methods are related to data-intensive techniques (i.e. machine learning, sentiment analysis and, more broadly, learning analytics). Methods such as physiological signal recognition and self-report were less frequent.


Energies ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 2775
Author(s):  
Florian Marcel Nuţă ◽  
Alina Cristina Nuţă ◽  
Cristina Gabriela Zamfir ◽  
Stefan-Mihai Petrea ◽  
Dan Munteanu ◽  
...  

The work at hand assesses several driving factors of carbon emissions in terms of urbanization and energy-related parameters on a panel of emerging European economies between 1990 and 2015. Machine learning algorithms and panel data analysis made it possible to determine the importance of the input variables by applying three algorithms (Random forest, XGBoost, and AdaBoost) and then to model the impact of urbanization and energy intensity on carbon emissions. The empirical results confirm the effect of urbanization and energy intensity on CO2 emissions. The findings emphasize that separate components of energy consumption affect carbon emissions and, therefore, a transition toward renewable sources for energy needs is desirable. The models from the current study confirm previous studies’ observations made for other countries and regions. Urbanization, as a process, influences carbon emissions more than the actual urban regions do, confirming that the activities carried out as urbanization efforts are more harmful than the resulting urban area. Urban areas tend to embrace modern, greener technologies, but the road to environmentally friendly urban areas is accompanied by less environmentally friendly industries (such as the cement industry) and a high consumption of nonrenewable energy.


2021 ◽  
Vol 6 (2) ◽  
pp. 1515-1537
Author(s):  
Wei Xue ◽  
Pengcheng Wan ◽  
Qiao Li ◽  
Ping Zhong ◽  
...  

2020 ◽  
Author(s):  
Qiang Gu ◽  
Anup Kumar ◽  
Simon Bray ◽  
Allison Creason ◽  
Alireza Khanteymoori ◽  
...  

Supervised machine learning, where the goal is to predict labels of new instances by training on labeled data, has become an essential tool in biomedical data analysis. To make supervised machine learning more accessible to biomedical scientists, we have developed Galaxy-ML, a platform that enables scientists to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy, a biomedical computational workbench used by tens of thousands of scientists across the world, with a machine learning tool suite that supports end-to-end analysis.


2020 ◽  
Author(s):  
Mihaela E. Sardiu ◽  
Box C. Andrew ◽  
Jeff Haug ◽  
Michael P. Washburn

Machine learning and topological analysis methods are increasingly used on various large-scale omics datasets. Modern high-dimensional flow cytometry datasets share many features with other omics datasets such as genomics and proteomics. For example, genomics or proteomics datasets can be sparse and have high dimensionality, and flow cytometry datasets can also share these features. This makes flow cytometry data a potentially suitable candidate for machine learning and topological scoring strategies, for example, to gain novel insights into patterns within the data. We have previously developed the Topological Score (TopS) and implemented it for the analysis of quantitative protein interaction network datasets. Here we show that the TopS approach for large-scale data analysis is applicable to the analysis of a previously described flow cytometry sorted human hematopoietic stem cell dataset. We demonstrate that TopS is capable of effectively sorting this dataset into cell populations and identifying rare cell populations. We demonstrate the utility of TopS when coupled with multiple approaches including topological data analysis, X-shift clustering, and t-Distributed Stochastic Neighbor Embedding (t-SNE). Our results suggest that TopS could be effectively used to analyze large-scale flow cytometry datasets to find rare cell populations.


2021 ◽  
Vol 17 (6) ◽  
pp. e1009014
Author(s):  
Qiang Gu ◽  
Anup Kumar ◽  
Simon Bray ◽  
Allison Creason ◽  
Alireza Khanteymoori ◽  
...  

Supervised machine learning is an essential but difficult-to-use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.

