imperfect data
Recently Published Documents


Total documents: 178 (five years: 51)
H-index: 14 (five years: 1)

Author(s):  
Kishor Raut

Abstract: Competition for good jobs is intense, and many applicants fail at the first step, resume shortlisting, because of imperfect data in the resume or an imperfect or wrong resume format. A recruiter takes barely 10-15 seconds to judge a candidate from a resume. In this survey paper, we present a comparative study of different methods used for resume building and the technologies used to build them. Some of the methods use Android applications; others use desktop applications. The paper analyzes in detail the merits and demerits of the various resume-building methods. Keywords: Android applications, Desktop applications, Recruiter, Vigorous, Vast.


2022 ◽  
Vol 18 (1) ◽  
Author(s):  
Michael A. Stoto ◽  
Abbey Woolverton ◽  
John Kraemer ◽  
Pepita Barlow ◽  
Michael Clarke

Abstract. Background: The COVID-19 pandemic has led to an avalanche of scientific studies, drawing on many different types of data. However, studies addressing the effectiveness of government actions against COVID-19, especially non-pharmaceutical interventions, often exhibit data problems that threaten the validity of their results. This review is thus intended to help epidemiologists and other researchers identify a set of data issues that, in our view, must be addressed in order for their work to be credible. We further intend to help journal editors and peer reviewers when evaluating studies, to apprise policy-makers, journalists, and other research consumers of the strengths and weaknesses of published studies, and to inform the wider debate about the scientific quality of COVID-19 research. Results: To this end, we describe common challenges in the collection, reporting, and use of epidemiologic, policy, and other data, including completeness and representativeness of outcomes data; their comparability over time and among jurisdictions; the adequacy of policy variables and data on intermediate outcomes such as mobility and mask use; and a mismatch between level of intervention and outcome variables. We urge researchers to think critically about potential problems with the COVID-19 data sources over the specific time periods and particular locations they have chosen to analyze, and not only to choose appropriate study designs but also to conduct appropriate checks and sensitivity analyses to investigate the impact(s) of potential threats on study findings. Conclusions: In an effort to encourage high-quality research, we provide recommendations on how to address the issues we identify. Our first recommendation is for researchers to choose an appropriate design (and the data it requires). This review describes considerations and issues in order to identify the strongest analytical designs and demonstrates how interrupted time-series and comparative longitudinal studies can be particularly useful. Furthermore, we recommend that researchers conduct checks or sensitivity analyses of the results to data source and design choices, which we illustrate. Regardless of the approaches taken, researchers should be explicit about the kind of data problems or other biases that the design choice and sensitivity analyses are addressing.


2021 ◽  
Author(s):  
Christina Bohk-Ewald ◽  
Enrique Acosta ◽  
Tim Riffe ◽  
Christian Dudel ◽  
Mikko Myrskyla

How deadly is an infection with SARS-CoV-2 worldwide over time? This information is critical for developing and assessing public health responses at the country and global levels. However, imperfect data have been the most limiting factor for estimating the COVID-19 infection fatality burden during the first year of the pandemic. Here we leverage recently emerged, compelling data sources and broadly applicable modeling strategies to estimate the crude infection fatality rate (cIFR) in 77 countries from 28 March 2020 to 31 March 2021, using 2.4 million reported deaths and an estimated 435 million infections by age, sex, country, and date. The global average of all cIFR estimates is 1.2% (10th to 90th percentile: 0.2% to 2.4%). The cIFR varies strongly across countries, but little within countries over time, and it is often lower for women than men. Cross-country differences in cIFR are largely driven by the age structures of both the general and the truly infected population. While the broad trends and patterns of the cIFR estimates are more robust, we show that their levels are uncertain and sensitive to input data and modeling choices. In consequence, increased efforts at collecting high-quality data are essential for accurately estimating the cIFR, which is a key indicator for better understanding the health and mortality consequences of this pandemic.
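
As a rough illustration of the quantity discussed above, a crude infection fatality rate can be read as deaths divided by infections. The sketch below is not the authors' model; the function name `crude_ifr` and the country counts are hypothetical, chosen only to show why an average of per-country rates generally differs from the rate obtained by pooling all deaths and infections.

```python
def crude_ifr(deaths: float, infections: float) -> float:
    """Crude infection fatality rate: deaths per infection."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return deaths / infections


# Hypothetical counts for illustration only (not taken from the study).
countries = {
    "A": {"deaths": 1_000, "infections": 100_000},
    "B": {"deaths": 300, "infections": 150_000},
}

# One cIFR per country.
per_country = {
    name: crude_ifr(c["deaths"], c["infections"]) for name, c in countries.items()
}

# Unweighted average of the per-country rates.
mean_of_rates = sum(per_country.values()) / len(per_country)

# Rate obtained by pooling deaths and infections across countries.
pooled_rate = crude_ifr(
    sum(c["deaths"] for c in countries.values()),
    sum(c["infections"] for c in countries.values()),
)

print(f"mean of rates: {mean_of_rates:.4%}, pooled rate: {pooled_rate:.4%}")
```

The two summaries differ because the unweighted mean gives every country equal weight, while the pooled rate weights each country by its number of infections.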


2021 ◽  
Vol 8 (1) ◽  
pp. 67-92
Author(s):  
Tado Jurić

Understanding how people react to the COVID-19 crisis and what the consequences of the pandemic are is key to enabling public health and other agencies to develop optimal intervention strategies. Because the timely identification of new cases of infection has proven key to responding promptly to the spread of infection within a particular region, we have developed a method that can detect and predict the emergence of new cases of COVID-19 at an early stage. Further, this method can give useful insights into family life during the pandemic and predict birth rates. The basic methodological concept of our approach is to monitor the digital trace of language searches with the Google Trends analytical tool (GT). We divided the keyword frequency for selected words, giving us a search frequency index, and then compared searches with official statistics to establish the significance of the results. 1) Google Trends tools are suitable for predicting the emergence of new COVID-19 cases in Croatia. The data collected by this method correlate with official data. In Croatia, search activities using GT for terms such as “PCR +COVID” and for symptoms such as “cough + corona”, “pneumonia + corona”, and “muscle pain + corona” correlate strongly with officially reported cases of the disease. 2) The method also shows effects on family life, an increase in stress, and domestic violence. 3) The birth rate in 2021 will be just 87% of what it would be in “a normal year” in Croatia. 4) This tool can give useful insights into domestic violence. Unquestionably, there are still significant open methodological issues, and the integrity of the data obtained from this source is questionable. A further problem is that GT does not provide data on which population was sampled or how it was structured. 
Although these open issues pose serious challenges for making clear estimates, statistics offers a range of tools to deal with imperfect data and to develop controls that take data quality into account. All these insights show that GT has the potential to capture attitudes across a broad spectrum of family life themes. The benefit of this method is reliable estimates that can enable public health officials to prepare for and better respond to the possible return of a pandemic in certain parts of the country and to the need for responses to protect family well-being. Keywords: Google trends, COVID-19, birth rates, domestic violence, Croatia, predicting demographic trends, family
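
The comparison of search activity with official statistics described above could be sketched as a simple correlation check. The abstract does not specify how the search frequency index was computed or which statistic was used, so the sketch below assumes a Pearson correlation; `pearson_r` and both weekly series are hypothetical and for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly values (not data from the study).
search_index = [10, 20, 30, 40, 50]          # assumed search frequency index
reported_cases = [120, 210, 330, 390, 520]   # assumed official case counts

r = pearson_r(search_index, reported_cases)
print(f"Pearson r = {r:.3f}")
```

A high coefficient on its own does not address the sampling concerns raised in the abstract, since GT does not report which population was sampled.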


2021 ◽  
Vol 11 (23) ◽  
pp. 11326
Author(s):  
Nesrine Rahmouni ◽  
Domitile Lourdeaux ◽  
Azzeddine Benabbou ◽  
Tahar Bensebaa

This work concerns the diagnosis process in intelligent tutoring systems (ITS). This process is usually a complex task that relies on imperfect data. Indeed, learning data may suffer from imprecision, uncertainty, and sometimes contradictions. In this paper, we propose Diag-Skills, a diagnosis model that uses the theory of belief functions to capture these imperfections. The objective of this work is twofold: first, a dynamic diagnosis of the evaluated skills; then, the prediction of the state of the non-evaluated ones. We conducted two studies to evaluate the prediction precision of Diag-Skills. The evaluations showed good precision in predictions and almost perfect agreement with the instructor when the model failed to predict the effective state of the skill. Our main premise is that these results will support the remediation and the feedback given to learners by providing them with proper personalization.
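
To make the idea of belief functions concrete, the sketch below combines two uncertain pieces of evidence about a single skill using Dempster's rule of combination, a standard operator in belief function theory. Whether Diag-Skills uses this particular rule is an assumption; the frame of discernment, the mass values, and the function name `dempster_combine` are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; combination is undefined")
    # Renormalize by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical frame of discernment for one skill.
MASTERED = frozenset({"mastered"})
NOT_MASTERED = frozenset({"not_mastered"})
UNKNOWN = MASTERED | NOT_MASTERED  # ignorance: either state is possible

# Two hypothetical evidence sources, e.g. two exercises on the same skill.
exercise_1 = {MASTERED: 0.6, UNKNOWN: 0.4}
exercise_2 = {MASTERED: 0.5, NOT_MASTERED: 0.3, UNKNOWN: 0.2}

fused = dempster_combine(exercise_1, exercise_2)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

Mass left on the whole frame (`UNKNOWN`) represents ignorance rather than evidence for either state, which is how this formalism separates uncertainty from contradiction.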


2021 ◽  
Author(s):  
Madalina Ciortan ◽  
Matthieu Defrance

Subspace clustering identifies multiple feature subspaces embedded in a dataset together with the underlying sample clusters. When applied to omic data, subspace clustering is a challenging task, as additional problems have to be addressed: the curse of dimensionality, imperfect data quality and cluster separation, the presence of multiple subspaces representative of divergent views of the dataset, and the lack of consensus on the best clustering method. First, we propose a computational method, discover, to perform subspace clustering on tabular high-dimensional data by maximizing the internal clustering score (i.e. cluster compactness) of feature subspaces. Our algorithm can be used in both unsupervised and semi-supervised settings. Secondly, by applying our method to a large set of omic datasets (i.e. microarray, bulk RNA-seq, scRNA-seq), we show that the subspace corresponding to the provided ground truth annotations is rarely the most compact one, as assumed by the methods maximizing the internal quality of clusters. Our results highlight the difficulty of fully validating subspace clusters (justified by the lack of feature annotations). Tested on identifying the ground-truth subspace, our method compared favorably with competing techniques on all datasets. Finally, we propose a suite of techniques to interpret the clustering results biologically in the absence of annotations. We demonstrate that subspace clustering can provide biologically meaningful sample-wise and feature-wise information, typically missed by traditional methods.


2021 ◽  
Author(s):  
Damien Delforge ◽  
Olivier de Viron ◽  
Marnik Vanclooster ◽  
Michel Van Camp ◽  
Arnaud Watlet

Abstract. We investigate the potential of causal inference methods (CIMs) to reveal hydrological connections from time series. Four CIMs are selected according to two criteria: linear or nonlinear, and bivariate or multivariate. A priori, multivariate and nonlinear CIMs are best suited for revealing hydrological connections because they suit nonlinear processes and deal with confounding factors such as rainfall, evapotranspiration, or seasonality. The four methods are applied to a synthetic case and a real karstic case study. The synthetic experiment indicates that, unlike the other methods, the multivariate nonlinear framework has a low false-positive rate and allows for ruling out a connection between two disconnected reservoirs forced with similar effective precipitation. However, the multivariate nonlinear method appears unstable when it comes to real cases, making the overall meaning of the causal links uncertain. Nevertheless, all CIMs bring valuable insights into the system’s dynamics, making them a cost-effective and recommendable tool for exploring data. Still, causal inference remains attached to subjective choices and operational constraints while building the dataset or constraining the analysis. As a result, the robustness of the conclusions that the CIMs can draw deserves to be questioned, especially with real and imperfect data. Therefore, alongside research perspectives, we encourage a flexible, informed, and limit-aware use of CIMs, without omitting any other approach that aims at the causal understanding of a system.

