A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments

2019 ◽  
Author(s):  
William E. Fondrie ◽  
William S. Noble

Abstract: Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semi-supervised algorithms to learn models directly from the datasets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large dataset and use the learned model to evaluate the small-scale experiment. We call this a “static modeling” approach, in contrast to Percolator’s usual “dynamic model” that is trained anew for each dataset. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semi-supervised algorithms to small-scale experiments.
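The static-versus-dynamic distinction in this abstract can be illustrated with a minimal Python sketch. It is not Percolator's algorithm: a difference-of-centroids linear scorer stands in for Percolator's semi-supervised learner, the labels are treated as known, and the simulated features and all function names (`train_linear_model`, `score`, `simulate`) are hypothetical.

```python
import random

def train_linear_model(features, labels):
    """Fit a difference-of-centroids linear scorer (a stand-in, not Percolator)."""
    dim = len(features[0])
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    mean_pos = [sum(x[j] for x in pos) / len(pos) for j in range(dim)]
    mean_neg = [sum(x[j] for x in neg) / len(neg) for j in range(dim)]
    weights = [p - n for p, n in zip(mean_pos, mean_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mean_pos, mean_neg)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias

def score(model, x):
    """Linear score; positive values favour the target-like class."""
    weights, bias = model
    return sum(w * v for w, v in zip(weights, x)) + bias

def simulate(n, rng):
    """Hypothetical two-feature data: label 1 is target-like, label 0 decoy-like."""
    features, labels = [], []
    for i in range(n):
        label = i % 2
        centre = 2.0 if label else -2.0
        features.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        labels.append(label)
    return features, labels

rng = random.Random(0)
large_x, large_y = simulate(2000, rng)  # large reference dataset
small_x, small_y = simulate(20, rng)    # small-scale experiment

# Static: learn once on the large dataset, then reuse the frozen model.
static_model = train_linear_model(large_x, large_y)
# Dynamic: learn anew from the small experiment itself.
dynamic_model = train_linear_model(small_x, small_y)

static_scores = [score(static_model, x) for x in small_x]
static_accuracy = sum(
    (s > 0) == (y == 1) for s, y in zip(static_scores, small_y)
) / len(small_y)
```

In this sketch the static model's weights do not depend on the small dataset at all, whereas the dynamic model's weights change every time the 20 small-experiment points are redrawn.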

2020 ◽  
pp. 140-148
Author(s):  
Md. Kumail Naqvi ◽  
Mrinal Anthwal ◽  
Ravindra Kumar

Biogas is the product of anaerobic digestion of biodegradable matter. This paper focuses on the need for alternative, green sources of energy at the household level and on how biogas produced from everyday organic waste has the potential to replace LPG cylinders in houses, shops, etc. and help us step towards an eco-friendly future. The purpose of this small-scale experiment was to find the ideal input matter: one that is easy to acquire and produces the maximum amount of gas from minimum input within a short waste retention period. Four types of input waste material containing different quantities of cow dung and kitchen food waste were studied in individual experimental setups. The waste was mixed and kept at room temperature, and the pH and total solid concentration of the samples were recorded at regular intervals. The experiment found that, based on parameters such as retention period, pH and total solid concentration, the optimum yield of biogas at a small scale can be obtained using food waste from households and kitchens. The exact composition is discussed in this paper. The energy generated by the small-scale generator is also compared with that of an LPG cylinder, and an LPG replacement model is presented.


2020 ◽  
Vol 1 (1) ◽  
pp. 1-10
Author(s):  
Evi Rahmawati ◽  
Irnin Agustina Dwi Astuti ◽  
N Nurhayati

Integrated natural science (IPA) gives students a setting in which to study themselves and their surrounding environment and to apply what they learn in daily life. Integrated IPA learning gives students direct experience through the use and development of scientific skills and attitudes. Because integrated IPA is important, its lessons need to be packaged well; combining integrated IPA modules with a learning strategy can maximize the learning process in school. At SMP 209 Jakarta, of 34 students assessed in integrated IPA, 10 reached mastery and 24 did not, because they scored below the minimum mastery criterion (KKM) of 68. This research is a development study that follows the ADDIE model (Analysis, Design, Development, Implementation, and Evaluation). The integrated IPA module based on KPS (Science Process Skills), on the theme of the rainbow phenomenon, received average validation scores of 84.38% from media experts, 82.18% from material experts, and 75.37% from linguists. With an overall average of 80.55% across all aspects, the module was judged fit for use and for testing with students. The teacher response scored 88.69%, which meets the "excellent" criterion. Student responses averaged 85.19% in the small-scale trial and 86.44% in the large-scale trial, both meeting the "strongly agree" criterion. It can therefore be concluded that the module was well received by the teacher and the students.


2020 ◽  
Vol 11 (1) ◽  
pp. 1-21
Author(s):  
Bastiaan Bruinsma

Abstract: While the design of voting advice applications (VAAs) is receiving increasing attention, one aspect has until now been overlooked: their visualisations. This is remarkable, as it is those visualisations that communicate the advice of the VAA to the user. Therefore, this article aims to provide a first look at which visualisations VAAs adopt, why they adopt them, and how users comprehend them. First, I will look at how design choices, specifically those on matching, influence the type of visualisation VAAs not only do use but also have to use. Second, I will report the results of a small-scale experiment that examined whether all users comprehend similar visualisations in the same way. Here, I find that this is often not the case and that users' interpretations often differ. These first results suggest that VAA visualisations are wrongly underappreciated and demand closer attention from VAA designers.


2021 ◽  
Vol 209 ◽  
pp. 104493
Author(s):  
Haili Liao ◽  
Hanyu Mei ◽  
Gang Hu ◽  
Bo Wu ◽  
Qi Wang

2021 ◽  
Vol 61 (9) ◽  
pp. 4266-4279 ◽  
Author(s):  
Kuo Hao Lee ◽  
Andrew D. Fant ◽  
Jiqing Guo ◽  
Andy Guan ◽  
Joslyn Jung ◽  
...  

2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Since a key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, few practical ways to measure data quality are available today, especially for large-scale high-dimensional data such as images and videos. This paper proposes two data quality measures that compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
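To make the idea of estimating class separability with random projections and bootstrapping concrete, here is a rough Python sketch. It is not the measure proposed in the paper: it simply averages a Fisher-style ratio over random one-dimensional projections and bootstrap resamples, and every function name and parameter below is an assumption chosen for illustration.

```python
import math
import random

def fisher_ratio(a, b):
    """Squared gap between means divided by the summed variances (1-D samples)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)

def random_direction(dim, rng):
    """Draw a random unit vector by normalising a Gaussian sample."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def project(points, direction):
    """Project each point onto the given direction."""
    return [sum(p * d for p, d in zip(point, direction)) for point in points]

def separability(class_a, class_b, n_projections=20, n_bootstrap=20, seed=0):
    """Average Fisher ratio over random projections and bootstrap resamples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        direction = random_direction(len(class_a[0]), rng)
        proj_a = project(class_a, direction)
        proj_b = project(class_b, direction)
        for _ in range(n_bootstrap):
            boot_a = rng.choices(proj_a, k=len(proj_a))  # resample with replacement
            boot_b = rng.choices(proj_b, k=len(proj_b))
            total += fisher_ratio(boot_a, boot_b)
    return total / (n_projections * n_bootstrap)

def cloud(centre, n, rng):
    """Hypothetical Gaussian point cloud around a centre."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n)]

data_rng = random.Random(1)
far = separability(cloud([5.0] * 5, 50, data_rng), cloud([-5.0] * 5, 50, data_rng))
near = separability(cloud([0.0] * 5, 50, data_rng), cloud([0.0] * 5, 50, data_rng))
```

Here `far` scores two well-separated classes and `near` scores two classes drawn from the same distribution, so `far` should come out much larger. Each projection reduces the data to one dimension, which is what keeps this kind of estimate cheap on high-dimensional inputs.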


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract: In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and that is able to deal with small samples. We evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner, through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We also illustrated the application of these methods by using them to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was an effective method for drawing causal inferences, even from small sample sizes.
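The G-computation step mentioned in this abstract (predict each individual's outcome probability in the two counterfactual worlds, then average) can be sketched in a few lines of Python. This is an illustration under strong assumptions: there is a single binary confounder, the outcome model is a stratified empirical mean rather than the super learner or any other machine learning method from the paper, and the toy counts and function names are invented.

```python
from collections import defaultdict

def fit_outcome_model(records):
    """Estimate P(Y=1 | A=a, L=l) as the empirical mean in each (a, l) stratum."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for l, a, y in records:
        sums[(a, l)] += y
        counts[(a, l)] += 1
    return {key: sums[key] / counts[key] for key in counts}

def g_computation(records):
    """Average predicted outcomes with everyone exposed, then everyone unexposed."""
    q = fit_outcome_model(records)
    n = len(records)
    p1 = sum(q[(1, l)] for l, _, _ in records) / n  # counterfactual world A=1
    p0 = sum(q[(0, l)] for l, _, _ in records) / n  # counterfactual world A=0
    return p1, p0, p1 - p0

def make_records(l, a, n, events):
    """Build n hypothetical records (l, a, y) of which `events` have y = 1."""
    return [(l, a, 1 if i < events else 0) for i in range(n)]

# Invented toy data: confounder L, exposure A, outcome Y.
records = (
    make_records(l=0, a=1, n=4, events=2)
    + make_records(l=0, a=0, n=4, events=1)
    + make_records(l=1, a=1, n=2, events=2)
    + make_records(l=1, a=0, n=10, events=5)
)

p1, p0, risk_difference = g_computation(records)

# Crude comparison that ignores the confounder, for contrast.
exposed = [y for _, a, y in records if a == 1]
unexposed = [y for _, a, y in records if a == 0]
crude_difference = sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)
```

On this toy data the standardised risk difference is 0.4, while the crude difference is about 0.24, because exposure is rarer in the higher-risk stratum. Swapping `fit_outcome_model` for a fitted classifier is where a machine learning method would plug in.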


2021 ◽  
Author(s):  
Tom Young ◽  
Tristan Johnston-Wood ◽  
Volker L. Deringer ◽  
Fernanda Duarte

Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but...

