A detection metric designed for O’Connell effect eclipsing binaries

Author(s):  
Kyle B. Johnston ◽  
Rana Haber ◽  
Saida M. Caballero-Nieves ◽  
Adrian M. Peter ◽  
Véronique Petit ◽  
...  

Abstract We present the construction of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern detection algorithm. We focus on the targeted identification of eclipsing binaries that demonstrate a feature known as the O’Connell effect. Our proposed methodology maps stellar variable observations to a new representation known as distribution fields (DFs). Given this novel representation, we develop a metric learning technique directly on the DF space that is capable of specifically identifying our stars of interest. The metric is tuned on a set of labeled eclipsing binary data from the Kepler survey, targeting particular systems exhibiting the O’Connell effect. The result is a conservative selection of 124 potential targets of interest out of the Villanova Eclipsing Binary Catalog. Our framework demonstrates favorable performance on Kepler eclipsing binary data, taking a crucial step in preparing the way for large-scale data volumes from next-generation telescopes such as LSST and SKA.
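As a rough illustration of the distribution-field representation described above, the Python sketch below folds a light curve in phase, bins it into a 2D (phase, magnitude) histogram with per-phase-column normalization, and compares two DFs under a weighted distance in which the weight matrix stands in for the learned metric. The bin counts, normalization scheme, and weight matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def distribution_field(phase, mag, n_phase=32, n_mag=32):
    """Map a phase-folded light curve (phase in [0, 1]) to a distribution
    field: a 2D histogram over (phase bin, magnitude bin), with each phase
    column normalized to sum to one."""
    mag = (mag - mag.min()) / (np.ptp(mag) + 1e-12)   # rescale magnitudes to [0, 1]
    H, _, _ = np.histogram2d(phase, mag, bins=[n_phase, n_mag],
                             range=[[0.0, 1.0], [0.0, 1.0]])
    col = H.sum(axis=1, keepdims=True)
    return H / np.where(col == 0, 1.0, col)

def weighted_df_distance(df_a, df_b, W):
    """Distance between two DFs under a per-cell weight matrix W, which
    plays the role of the learned metric."""
    diff = df_a - df_b
    return float(np.sqrt(np.sum(W * diff * diff)))
```

In the paper the metric is tuned on labeled Kepler eclipsing binaries; in this sketch the weight matrix W would have to be learned or supplied separately before nearest-neighbour style detection could be run.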

Author(s):  
Denali Molitor ◽  
Deanna Needell

Abstract In today’s data-driven world, storing, processing and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis.
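The fragment below is only a toy Python rendition of the iterative idea, under assumed details: data are one-bit compressed as the signs of Gaussian random projections, each class is summarized by its average sign pattern, and the per-class scores from one round are appended as input features for the next round. The paper's actual framework, its use as a preprocessing step for SVMs, and its theoretical guarantees are not reproduced here.

```python
import numpy as np

def binarize(X, A):
    """One-bit compression: keep only the signs of random projections."""
    return np.sign(X @ A.T)

def train_simple(Xb, y):
    """Summarize each class by the average sign pattern of its binarized points."""
    classes = np.unique(y)
    prototypes = np.vstack([Xb[y == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict_simple(Xb, classes, prototypes):
    """Score each point against every class prototype; higher sign agreement wins."""
    scores = Xb @ prototypes.T
    return classes[np.argmax(scores, axis=1)], scores

def iterative_classify(X_train, y_train, X_test, n_iter=3, m=256, seed=0):
    """Iterate the simple binary-data classifier, feeding each round's
    per-class scores back in as extra input features."""
    rng = np.random.default_rng(seed)
    Xtr, Xte = X_train, X_test
    for _ in range(n_iter):
        A = rng.standard_normal((m, Xtr.shape[1]))
        Xb_tr, Xb_te = binarize(Xtr, A), binarize(Xte, A)
        classes, protos = train_simple(Xb_tr, y_train)
        _, s_tr = predict_simple(Xb_tr, classes, protos)
        pred, s_te = predict_simple(Xb_te, classes, protos)
        Xtr = np.hstack([Xtr, s_tr])   # this round's output becomes the next round's input
        Xte = np.hstack([Xte, s_te])
    return pred
```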


Author(s):  
Zachary B Abrams ◽  
Caitlin E Coombes ◽  
Suli Li ◽  
Kevin R Coombes

Abstract Summary Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with its own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. Availability and implementation Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
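Mercator itself is used from R; the fragment below is only a rough Python analogue of the same workflow, namely choosing a distance metric appropriate to the data type and feeding the precomputed distance matrix into a visualization. It does not use Mercator's API, and the particular combination shown (Jaccard distance plus MDS on toy binary data) is just one of many that the package supports.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 40))        # toy binary feature matrix

# Jaccard distance is one of the metrics suited to binary data.
D = squareform(pdist(X, metric="jaccard"))

# Embed the precomputed distance matrix in 2D for plotting;
# points could then be coloured by cluster assignments derived from D.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
```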


2019 ◽  
Vol 48 (4) ◽  
pp. 673-681
Author(s):  
Shufen Zhang ◽  
Zhiyu Liu ◽  
Xuebin Chen ◽  
Changyin Luo

To address the difficulties the traditional K-Means clustering algorithm faces on large-scale data sets, a Hadoop K-Means (HKM) clustering algorithm is proposed. First, the algorithm eliminates the influence of noise points in the data set based on sample density. Second, it optimizes the selection of the initial center points using the max-min distance principle. Finally, it uses the MapReduce programming model to parallelize the computation. Experimental results show that the proposed algorithm not only achieves high accuracy and stability in its clustering results, but also solves the scalability problems that traditional clustering algorithms encounter on large-scale data.
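As a single-machine sketch of the first two steps, the Python fragment below drops low-density points as noise and seeds the initial centres with the max-min distance rule; the radius, neighbour threshold, and choice of the first centre are illustrative assumptions. In HKM the subsequent K-Means iterations are parallelized with MapReduce, typically by having mappers assign points to their nearest centre and reducers recompute the centres.

```python
import numpy as np

def density_filter(X, eps, min_pts):
    """Keep only points with at least `min_pts` neighbours within radius `eps`
    (the point itself counts) -- a simple sample-density criterion for noise."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return X[(d < eps).sum(axis=1) >= min_pts]

def max_min_init(X, k):
    """Max-min distance seeding: each new centre is the point whose minimum
    distance to the centres chosen so far is largest."""
    centers = [X[0]]                      # illustrative: start from the first point
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None, :] - np.asarray(centers)[None, :, :], axis=-1)
        centers.append(X[np.argmax(d.min(axis=1))])
    return np.asarray(centers)
```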


Lingua Sinica ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1-24
Author(s):  
Yipu Wei ◽  
Dirk Speelman ◽  
Jacqueline Evers-Vermeul

Abstract Collocation analysis can be used to extract meaningful linguistic information from large-scale corpus data. This paper reviews the methodological issues one may encounter when performing collocation analysis for discourse studies on Chinese. We propose four crucial aspects to consider in such analyses: (i) the definition of collocates according to various parameters; (ii) the choice of analysis and association measures; (iii) the definition of the search span; and (iv) the selection of corpora for analysis. To illustrate how these aspects can be addressed in a collocation analysis of Chinese, we conducted a case study of two Chinese causal connectives: yushi ‘that is why’ and yin’er ‘as a result’. The distinctive collocation analysis shows how these two connectives differ in volitionality, an important dimension of discourse relations. The study also demonstrates that collocation analysis, as an exploratory approach based on large-scale data, can provide valuable converging evidence for corpus-based studies that have previously relied on laborious manual analysis of limited datasets.
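To make the moving parts concrete, the toy Python sketch below counts the collocates of a node word within a symmetric search span and scores them with pointwise mutual information, one of several possible association measures. Tokenization, corpus selection, and the distinctive-collocation statistics used in the paper are not modelled here, and all names are illustrative.

```python
import math
from collections import Counter

def collocates(tokens, node, span=4):
    """Count words co-occurring with `node` within a window of +/- `span` tokens."""
    counts = Counter()
    for i, w in enumerate(tokens):
        if w == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts

def pmi_scores(tokens, node, span=4):
    """Pointwise mutual information of each collocate with `node`."""
    N = len(tokens)
    freq = Counter(tokens)
    return {w: math.log2((c / N) / ((freq[node] / N) * (freq[w] / N)))
            for w, c in collocates(tokens, node, span).items()}

# Comparing the scored collocates of the two connectives (yushi vs. yin'er)
# on a word-segmented corpus is the kind of contrast the paper draws.
```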


2013 ◽  
Vol 416-417 ◽  
pp. 1076-1079
Author(s):  
Peng Chen ◽  
Ping Hu

With the increasing investment in the construction of large-scale data centers and operation centers, users of these data centers have begun to encounter all sorts of difficulties with safety, energy consumption, and automated management. The computer room is the core of the entire automated facility, and the rationality of its layout, its floor area, and its extensibility are closely tied to the room's maintenance and flexibility requirements. Given these user demands, selecting the most appropriate intelligent power distribution cabinet for the computer room of a video monitoring data center has become very important. In this paper, the functions of the intelligent power distribution cabinet in the computer room are analyzed from the perspective of user demand, and the selection of the most appropriate intelligent power distribution cabinet products is then discussed.


Author(s):  
Lore Veelaert ◽  
Ingrid Moons ◽  
Sarah Rohaert ◽  
Els Du Bois

Abstract Materials experience in design involves the meanings that materials convey to users through their expressive characteristics. Such meaning-evoking patterns are influenced by parameters such as context, product (e.g. shape) and user. Consequently, there is a need to standardise experiential material characterisation and large-scale data collection by means of a meaning-less or ‘neutral’ demonstrator that allows materials to be compared objectively. This paper explores the conception of this neutrality and proposes two opposing strategies: neutrality through complexity or through simplicity. In a pre-study with 20 designers, six associative pairs are selected as neutrality criteria; these are then shaped into 240 forms by 20 (non-)designers in a main workshop. Following the simplicity strategy, these forms are averaged out in three steps by a team of five designers, based on a consensus on delicate-rugged, aggressive-calm, futuristic-calm, masculine-feminine, traditional-modern, and toylike-professional, resulting in a selection of four averaged neutral forms. Finally, future research will focus on complexity to increase interactivity, so that consumers might be triggered into extensive material exploration.


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN
