An Empirical Approach to Temporal Reference Resolution

Scheduling dialogs, during which people negotiate the times of appointments, are common in everyday life. This paper reports the results of an in-depth empirical investigation of resolving explicit temporal references in scheduling dialogs. There are four phases of this work: data annotation and evaluation, model development, system implementation and evaluation, and model evaluation and analysis. The system and model were developed primarily on one set of data, and then applied later to a much more complex data set, to assess the generalizability of the model for the task being performed. Many different types of empirical methods are applied to pinpoint the strengths and weaknesses of the approach. Detailed annotation instructions were developed and an intercoder reliability study was performed, showing that naive annotators can reliably perform the targeted annotations. A fully automatic system has been developed and evaluated on unseen test data, with good results on both data sets. We adopt a pure realization of a recency-based focus model to identify precisely when it is and is not adequate for the task being addressed. In addition to system results, an in-depth evaluation of the model itself is presented, based on detailed manual annotations. The results are that few errors occur specifically due to the model of focus being used, and the set of anaphoric relations defined in the model are low in ambiguity for both data sets.

Download Full-text

A Visual and VAE Based Hierarchical Indoor Localization Method

Sensors ◽

10.3390/s21103406 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3406

Author(s):

Jie Jiang ◽

Yin Zou ◽

Lidong Chen ◽

Yujie Fang

Keyword(s):

Image Retrieval ◽

Indoor Localization ◽

Data Sets ◽

Indoor Environments ◽

Global Features ◽

Data Set ◽

Data Annotation ◽

Wide Range ◽

Annotation Costs ◽

Global And Local

Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in searching space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increase data annotation costs and complicate the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network variational autoencoder (VAE) with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for the image retrieval at the level of the scene map in order to obtain candidate images, and are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. RGB images only are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° in the 7-Scenes data sets and 32.8% within 5 m and 20° in the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.

Download Full-text

Classification of Clinically Significant Prostate Cancer on Multi-Parametric MRI: A Validation Study Comparing Deep Learning and Radiomics

Cancers ◽

10.3390/cancers14010012 ◽

2021 ◽

Vol 14 (1) ◽

pp. 12

Author(s):

Jose M. Castillo T. ◽

Muhammad Arif ◽

Martijn P. A. Starmans ◽

Wiro J. Niessen ◽

Chris H. Bangma ◽

...

Keyword(s):

Prostate Cancer ◽

Deep Learning ◽

Characteristic Curve ◽

Model Development ◽

Learning Model ◽

Multiparametric Mri ◽

Data Sets ◽

Data Set ◽

Test Sets ◽

Deep Learning Model

The computer-aided analysis of prostate multiparametric MRI (mpMRI) could improve significant-prostate-cancer (PCa) detection. Various deep-learning- and radiomics-based methods for significant-PCa segmentation or classification have been reported in the literature. To be able to assess the generalizability of the performance of these methods, using various external data sets is crucial. While both deep-learning and radiomics approaches have been compared based on the same data set of one center, the comparison of the performances of both approaches on various data sets from different centers and different scanners is lacking. The goal of this study was to compare the performance of a deep-learning model with the performance of a radiomics model for the significant-PCa diagnosis of the cohorts of various patients. We included the data from two consecutive patient cohorts from our own center (n = 371 patients), and two external sets of which one was a publicly available patient cohort (n = 195 patients) and the other contained data from patients from two hospitals (n = 79 patients). Using multiparametric MRI (mpMRI), the radiologist tumor delineations and pathology reports were collected for all patients. During training, one of our patient cohorts (n = 271 patients) was used for both the deep-learning- and radiomics-model development, and the three remaining cohorts (n = 374 patients) were kept as unseen test sets. The performances of the models were assessed in terms of their area under the receiver-operating-characteristic curve (AUC). Whereas the internal cross-validation showed a higher AUC for the deep-learning approach, the radiomics model obtained AUCs of 0.88, 0.91 and 0.65 on the independent test sets compared to AUCs of 0.70, 0.73 and 0.44 for the deep-learning model. Our radiomics model that was based on delineated regions resulted in a more accurate tool for significant-PCa classification in the three unseen test sets when compared to a fully automated deep-learning model.

Download Full-text

Alternative Clustering

Advances in Business Information Systems and Analytics - Applying Predictive Analytics Within the Service Sector ◽

10.4018/978-1-5225-2148-8.ch001 ◽

2017 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Avinash Navlani ◽

V. B. Gupta

Keyword(s):

Research Problem ◽

Research Community ◽

Data Sets ◽

Complex Data ◽

High Quality ◽

Data Set ◽

Alternative Clustering ◽

Complex Data Sets ◽

Data Objects ◽

Community Clustering

In the last couple of decades, clustering has become a very crucial research problem in the data mining research community. Clustering refers to the partitioning of data objects such as records and documents into groups or clusters of similar characteristics. Clustering is unsupervised learning, because of unsupervised nature there is no unique solution for all problems. Most of the time complex data sets require explanation in multiple clustering sets. All the Traditional clustering approaches generate single clustering. There is more than one pattern in a dataset; each of patterns can be interesting in from different perspectives. Alternative clustering intends to find all unlike groupings of the data set such that each grouping has high quality and distinct from each other. This chapter gives you an overall view of alternative clustering; it's various approaches, related work, comparing with various confusing related terms like subspace, multi-view, and ensemble clustering, applications, issues, and challenges.

Download Full-text

Improving Quality and Safety Through Use of Secondary Data: Methods Case Study

Western Journal of Nursing Research ◽

10.1177/0193945916672449 ◽

2016 ◽

Vol 39 (11) ◽

pp. 1477-1501 ◽

Cited By ~ 1

Author(s):

Victoria Goode ◽

Nancy Crego ◽

Michael P. Cary ◽

Deirdre Thornlow ◽

Elizabeth Merwin

Keyword(s):

Research Question ◽

Secondary Data ◽

Data Access ◽

Data Sets ◽

Complex Data ◽

Management Skills ◽

Data Set ◽

Large Numbers ◽

Need To Evaluate

Researchers need to evaluate the strengths and weaknesses of data sets to choose a secondary data set to use for a health care study. This research method review informs the reader of the major issues necessary for investigators to consider while incorporating secondary data into their repertoire of potential research designs and shows the range of approaches the investigators may take to answer nursing research questions in a variety of context areas. The researcher requires expertise in locating and judging data sets and in the development of complex data management skills for managing large numbers of records. There are important considerations such as firm knowledge of the research question supported by the conceptual framework and the selection of appropriate databases, which guide the researcher in delineating the unit of analysis. Other more complex issues for researchers to consider when conducting secondary data research methods include data access, management and security, and complex variable construction.

Download Full-text

Seasonal and intra-diurnal variability of small-scale gravity waves in OH airglow at two Alpine stations

Atmospheric Measurement Techniques ◽

10.5194/amt-12-457-2019 ◽

2019 ◽

Vol 12 (1) ◽

pp. 457-469 ◽

Cited By ~ 2

Author(s):

Patrick Hannawald ◽

Carsten Schmidt ◽

René Sedlak ◽

Sabine Wüst ◽

Michael Bittner

Keyword(s):

Gravity Waves ◽

High Frequency ◽

Phase Speed ◽

Propagation Direction ◽

Field Of View ◽

Small Scale ◽

Data Sets ◽

Data Set ◽

Diurnal Variability ◽

Fully Automatic

Abstract. Between December 2013 and August 2017 the instrument FAIM (Fast Airglow IMager) observed the OH airglow emission at two Alpine stations. A year of measurements was performed at Oberpfaffenhofen, Germany (48.09∘ N, 11.28∘ E) and 2 years at Sonnblick, Austria (47.05∘ N, 12.96∘ E). Both stations are part of the network for the detection of mesospheric change (NDMC). The temporal resolution is two frames per second and the field-of-view is 55 km × 60 km and 75 km × 90 km at the OH layer altitude of 87 km with a spatial resolution of 200 and 280 m per pixel, respectively. This resulted in two dense data sets allowing precise derivation of horizontal gravity wave parameters. The analysis is based on a two-dimensional fast Fourier transform with fully automatic peak extraction. By combining the information of consecutive images, time-dependent parameters such as the horizontal phase speed are extracted. The instrument is mainly sensitive to high-frequency small- and medium-scale gravity waves. A clear seasonal dependency concerning the meridional propagation direction is found for these waves in summer in the direction to the summer pole. The zonal direction of propagation is eastwards in summer and westwards in winter. Investigations of the data set revealed an intra-diurnal variability, which may be related to tides. The observed horizontal phase speed and the number of wave events per observation hour are higher in summer than in winter.

Download Full-text

THE INFLUENCE OF MAGNITUDES TYPES IN THE NONEXTENSIVITY APPLIED AT THE CIRCUM-PACIFIC SUBDUCTION ZONES

Brazilian Journal of Geophysics ◽

10.22564/rbgf.v36i4.1982 ◽

2018 ◽

Vol 36 (4) ◽

pp. 1

Author(s):

Thaís Machado Scherrer ◽

George Sand França ◽

Raimundo Silva ◽

Daniel Brito de Freitas ◽

Carlos da Silva Vilar

Keyword(s):

Solid Earth ◽

Subduction Zone ◽

Subduction Zones ◽

Time Interval ◽

Data Sets ◽

Data Set ◽

Different Types ◽

Data Source ◽

The Impact

ABSTRACT. Following our own previous work, we reanalyze the nonextensive behavior over the circum-Pacific subduction zones evaluating the impact of using different types of magnitudes in the results. We used the same data source and time interval of our previous work, the NEIC catalog in the years between 2001 and 2010. Even considering different data sets, the correlation between q and the subduction zone asperity is perceptible, but the values found for the nonextensive parameter in the considered data sets presents an expressive variation. The data set with surface magnitude exhibits the best adjustments.Keywords: Nonextensivity, Seismicity, Solid Earth, Earthquake.RESUMO. No mesmo caminho do nosso trabalho anterior, reanalisamos o comportamento não extensivo sobre as zonas de subducção do círcuo de fogo do Pacífico, avaliando o impacto do uso de diferentes tipos de magnitude nos resultados. Utilizamos o mesmo intervalo de dados e fonte de nosso trabalho anterior, do catálogo NEIC entre os anos 2001 e 2010. Mesmo considerando diferentes conjuntos de dados, a correlação entre q e a aspereza das zonas de subducção é perceptível, mas os valores encontrados para o parâmetro não extensivo no conjuntos de dados considerados apresentam uma variação expressiva. O conjunto de dados com magnitude de superfície exibe os melhores ajustes.Palavras-chave: Não extensividade, Sismicidade, Terra Sólida, Terremotos.

Download Full-text

Clusterdv, a simple density-based clustering method that is robust, general and automatic

10.1101/224840 ◽

2017 ◽

Author(s):

João C. Marques ◽

Michael B. Orger

Keyword(s):

Clustering Algorithm ◽

Underlying Structure ◽

Data Sets ◽

Natural Phenomena ◽

Cluster Number ◽

Data Set ◽

Density Peaks ◽

Wide Range ◽

Cluster Shape ◽

Fully Automatic

AbstractHow to partition a data set into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data varies widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap, makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on search of density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic, and fails on some simple data distributions. We propose an alternative approach, clusterdv, which estimates density dips between points, and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental data sets, where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data.Author summarIt is common that natural phenomena produce groupings, or clusters, in data, that can reveal the underlying processes. However, the form of these clusters can vary arbitrarily, making it challenging to find a single algorithm that identifies their structure correctly, without prior knowledge of the number of groupings or their distribution. We describe a simple clustering algorithm that is fully automatic and is able to correctly identify the number and shape of groupings in data of many types. We expect this algorithm to be useful in finding unknown natural phenomena present in data from a wide range of scientific fields.

Download Full-text

Statistics, Data Mining, and Machine Learning in Astronomy

10.23943/princeton/9780691151687.001.0001 ◽

2014 ◽

Cited By ~ 191

Author(s):

Željko Ivezic ◽

Andrew J. Connolly ◽

Jacob T VanderPlas ◽

Alexander Gray

Keyword(s):

Sloan Digital Sky Survey ◽

Data Sets ◽

Complex Data ◽

Dark Energy Survey ◽

Astronomical Surveys ◽

Sky Survey ◽

Different Types ◽

Complex Data Sets ◽

Coding Standards ◽

Energy Survey

As telescopes, detectors, and computers grow ever more powerful, the volume of data at the disposal of astronomers and astrophysicists will enter the petabyte domain, providing accurate measurements for billions of celestial objects. This book provides a comprehensive and accessible introduction to the cutting-edge statistical methods needed to efficiently analyze complex data sets from astronomical surveys such as the Panoramic Survey Telescope and Rapid Response System, the Dark Energy Survey, and the upcoming Large Synoptic Survey Telescope. It serves as a practical handbook for graduate students and advanced undergraduates in physics and astronomy, and as an indispensable reference for researchers. The book presents a wealth of practical analysis problems, evaluates techniques for solving them, and explains how to use various approaches for different types and sizes of data sets. For all applications described in the book, Python code and example data sets are provided. The supporting data sets have been carefully selected from contemporary astronomical surveys (for example, the Sloan Digital Sky Survey) and are easy to download and use. The accompanying Python code is publicly available, well documented, and follows uniform coding standards. Together, the data sets and code enable readers to reproduce all the figures and examples, evaluate the methods, and adapt them to their own fields of interest.

Download Full-text

A Review of Regression Models in Machine Learning

10.51682/jiscom.00202005.2021 ◽

2021 ◽

Vol 2 (2) ◽

pp. 40-47

Author(s):

Sunil Kumar ◽

Vaibhav Bhatnagar

Keyword(s):

Machine Learning ◽

Regression Analysis ◽

Regression Model ◽

Regression Models ◽

Machine Learning Algorithms ◽

Data Sets ◽

Analysis Model ◽

Data Set ◽

Data Regression ◽

Different Types

Machine learning is one of the active fields and technologies to realize artificial intelligence (AI). The complexity of machine learning algorithms creates problems to predict the best algorithm. There are many complex algorithms in machine learning (ML) to determine the appropriate method for finding regression trends, thereby establishing the correlation association in the middle of variables is very difficult, we are going to review different types of regressions used in Machine Learning. There are mainly six types of regression model Linear, Logistic, Polynomial, Ridge, Bayesian Linear and Lasso. This paper overview the above-mentioned regression model and will try to find the comparison and suitability for Machine Learning. A data analysis prerequisite to launch an association amongst the innumerable considerations in a data set, association is essential for forecast and exploration of data. Regression Analysis is such a procedure to establish association among the datasets. The effort on this paper predominantly emphases on the diverse regression analysis model, how they binning to custom in context of different data sets in machine learning. Selection the accurate model for exploration is the most challenging assignment and hence, these models considered thoroughly in this study. In machine learning by these models in the perfect way and thru accurate data set, data exploration and forecast can provide the maximum exact outcomes.

Download Full-text

The Impact of Normalization Methods on RNA-Seq Data Analysis

BioMed Research International ◽

10.1155/2015/621690 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 44

Author(s):

J. Zyprych-Walczak ◽

A. Szabelska ◽

L. Handschuh ◽

K. Górczak ◽

K. Klamecka ◽

...

Keyword(s):

High Throughput Sequencing ◽

Data Sets ◽

Complex Data ◽

Rna Seq ◽

Medical Problems ◽

Data Set ◽

Normalization Methods ◽

Wide Range ◽

The Impact ◽

Selection Of

High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.

Download Full-text