Foundations of Data Science
Latest Publications


TOTAL DOCUMENTS: 78 (FIVE YEARS: 78)

H-INDEX: 3 (FIVE YEARS: 3)

Published by the American Institute of Mathematical Sciences
ISSN: 2639-8001

2022 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Sarai Hedges ◽  
Kim Given

<p style='text-indent:20px;'>More research is needed on middle school students' engagement in the statistical problem-solving process, particularly the beginning steps: formulating a question and making a plan to collect data/consider the data. Further, the increased availability of large-scale, electronically accessible data sets is an untapped area of study. This interpretive study examined middle school students' understanding of the statistical concepts involved in making a plan to collect data to answer a statistical question within a social-issue context, using data available on the internet. Student artifacts, researcher notes, and audio and video recordings from nine groups of 20 seventh-grade students in two gifted-education pull-out classes at a suburban middle school were used to answer the research questions. Data were analyzed using a priori codes from previously developed frameworks and an inductive approach to find themes.</p><p style='text-indent:20px;'>Three themes related to confirmation bias emerged from the data. Some middle school students held preconceptions about the social issues they chose to study that biased their statistical questions. This in turn influenced the sources of data the students used to answer their questions. Confirmation bias is a serious issue, exacerbated by the endless sources of data available electronically. We argue that this type of bias should be addressed early in students' educational experiences. Based on the findings from this study, we offer recommendations for future research and implications for statistics and data science education.</p>


2022 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Esmail Abdul Fattah ◽  
Janet Van Niekerk ◽  
Håvard Rue

<p style='text-indent:20px;'>Computing the gradient of a function provides fundamental information about its behavior. This information is essential for many applications and algorithms across various fields. Common applications that require gradients are optimization techniques such as stochastic gradient descent, Newton's method, and trust-region methods. However, these methods usually require a numerical computation of the gradient at every iteration, which is prone to numerical error. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient in this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically by extensive experimentation on both test functions and real data applications. The proposed method is implemented in the <inline-formula><tex-math id="M1">\begin{document}$\texttt{R} $\end{document}</tex-math></inline-formula> package <inline-formula><tex-math id="M2">\begin{document}$ \texttt{smartGrad}$\end{document}</tex-math></inline-formula> and in C<inline-formula><tex-math id="M3">\begin{document}$ \texttt{++} $\end{document}</tex-math></inline-formula>.</p>
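To ground the setting the abstract describes, here is a minimal, hypothetical sketch (not the authors' smartGrad algorithm) of gradient-based optimization driven by a central-difference numerical gradient — the kind of per-iteration approximation whose accuracy the proposed technique aims to improve:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def gradient_descent(f, x0, lr=0.1, steps=100):
    """Plain gradient descent driven by the numerical gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * numerical_gradient(f, x)
    return x

# Minimize a toy quadratic; the exact minimizer is (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x_min = gradient_descent(f, [0.0, 0.0])
```

Each iteration costs 2n function evaluations and carries O(h²) truncation error; per the abstract, the paper's contribution is to sharpen exactly this kind of estimate by exploiting a coordinate transformation and the history of descent directions.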


2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Govinda Anantha Padmanabha ◽  
Nicholas Zabaras

2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Hengrui Luo ◽  
Alice Patania ◽  
Jisu Kim ◽  
Mikael Vejdemo-Johansson

<p style='text-indent:20px;'>Topological Data Analysis (TDA) provides novel approaches for analyzing the geometric shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which uses persistent cohomology to perform dimension reduction and visualization of high-dimensional datasets on a torus. In this paper, we propose a method that adapts the circular coordinate framework to account for the roughness of circular coordinates in change-point and high-dimensional applications. To do so, we replace the <inline-formula><tex-math id="M1">\begin{document}$ L_{2} $\end{document}</tex-math></inline-formula> penalty in the traditional circular coordinate algorithm with a generalized penalty function. We provide simulation experiments and real data analyses to support our claim that circular coordinates with a generalized penalty detect changes in high-dimensional datasets under different sampling schemes while preserving the topological structures.</p>
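To illustrate what swapping an L2 penalty for a generalized (here, L1-type) penalty changes, the following sketch compares the two proximal operators — uniform shrinkage versus soft-thresholding. It is an illustrative analogy only, not the paper's circular coordinate algorithm, but it shows why an L1-type penalty can preserve a sharp change while suppressing small fluctuations:

```python
import numpy as np

def prox_l2(v, lam):
    """Proximal operator of (lam/2)*||x||_2^2: uniform shrinkage."""
    return v / (1.0 + lam)

def prox_l1(v, lam):
    """Proximal operator of lam*||x||_1: soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# A vector with one large jump and small noise-like entries.
v = np.array([5.0, 0.05, -0.03, 0.04])
l2 = prox_l2(v, 1.0)   # shrinks everything proportionally: jump halved
l1 = prox_l1(v, 0.1)   # zeroes small entries, nearly preserves the jump
```

The L2 operator scales the jump down along with the noise, while soft-thresholding leaves the jump at 4.9 and sets the small entries to exactly zero — the behavior that lets a generalized penalty retain change-points.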


2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
John Maclean ◽  
Elaine T. Spiller

<p style='text-indent:20px;'>Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.</p>
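As a rough illustration of the idea (a toy sketch, not the authors' Emu-PF), the following replaces an "expensive" forward model with a cheap emulator fitted to a handful of model runs, then uses it inside one bootstrap particle-filter step (propagate, weight, resample). The forward model, noise levels, and observation are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x):
    """Stand-in for an expensive forward model."""
    return 0.9 * x + 0.1 * np.sin(x)

# Cheap emulator: a linear fit to a modest number of forward-model runs.
design = np.linspace(-3.0, 3.0, 7)
responses = forward(design)
a, b = np.polyfit(design, responses, 1)
emulator = lambda x: a * x + b

def particle_filter_step(particles, y_obs, obs_sd=0.5, model=emulator):
    """One bootstrap-filter step: propagate, weight by the likelihood,
    then resample proportionally to the weights."""
    prop = model(particles) + rng.normal(0.0, 0.1, size=particles.shape)
    w = np.exp(-0.5 * ((y_obs - prop) / obs_sd) ** 2)
    w /= w.sum()
    idx = rng.choice(len(prop), size=len(prop), p=w)
    return prop[idx]

particles = rng.normal(0.0, 1.0, size=500)
particles = particle_filter_step(particles, y_obs=0.8)
```

The point of the combination is in the budget: the expensive model is evaluated only at the seven design points, while the filter queries the emulator once per particle per step.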


2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Theodore Papamarkou ◽  
Alexey Lindo ◽  
Eric B. Ford

2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Moon Duchin ◽  
Tom Needham ◽  
Thomas Weighill

2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Rui Wang ◽  
Rundong Zhao ◽  
Emily Ribando-Gros ◽  
Jiahui Chen ◽  
Yiying Tong ◽  
...  
2021 ◽ Vol 0 (0) ◽ pp. 0
Author(s):  
Christopher Oballe ◽  
David Boothe ◽  
Piotr J. Franaszczuk ◽  
Vasileios Maroulas

<p style='text-indent:20px;'>We propose ToFU, a new trainable neural network unit with a persistence diagram dissimilarity function as its activation. Since persistence diagrams are topological summaries of structures, this new activation measures and learns the topology of data to leverage it in machine learning tasks. We showcase the utility of ToFU in two experiments: one involving the classification of discrete-time autoregressive signals, and another involving a variational autoencoder. In the former, ToFU yields competitive results with networks that use spectral features while outperforming CNN architectures. In the latter, ToFU produces topologically-interpretable latent space representations of inputs without sacrificing reconstruction fidelity.</p>
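ToFU's activation is a persistence-diagram dissimilarity. As a loose, hypothetical illustration of comparing diagrams (not the dissimilarity actually used in ToFU), one can compare sorted lifetimes, matching any unpaired points to the diagonal (lifetime zero):

```python
import numpy as np

def persistence(diagram):
    """Lifetimes (death - birth) of each diagram point."""
    d = np.asarray(diagram, dtype=float)
    return d[:, 1] - d[:, 0]

def diagram_dissimilarity(d1, d2):
    """Toy dissimilarity: compare lifetimes sorted in decreasing order,
    padding the shorter list with zeros (points matched to the diagonal)."""
    p1 = np.sort(persistence(d1))[::-1]
    p2 = np.sort(persistence(d2))[::-1]
    n = max(len(p1), len(p2))
    p1 = np.pad(p1, (0, n - len(p1)))
    p2 = np.pad(p2, (0, n - len(p2)))
    return float(np.abs(p1 - p2).sum())

# Two small diagrams: one prominent feature vs. two weaker ones.
d_a = [(0.0, 1.0)]
d_b = [(0.0, 0.6), (0.2, 0.5)]
score = diagram_dissimilarity(d_a, d_b)  # |1.0-0.6| + |0.0-0.3| = 0.7
```

A function of this form is differentiable almost everywhere in the birth/death coordinates, which is what makes a diagram-based quantity usable as a trainable activation.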

