training sets
Recently Published Documents


TOTAL DOCUMENTS: 361 (five years: 130)
H-INDEX: 27 (five years: 5)

2022 ◽  
Author(s):  
Sang He ◽  
Hongyan Liu ◽  
Junhui Zhan ◽  
Yun Meng ◽  
Yamei Wang ◽  
...  

2021 ◽  
Author(s):  
Shuwen Yue ◽  
Marc Riera ◽  
Raja Ghosh ◽  
Athanassios Panagiotopoulos ◽  
Francesco Paesani

Extending previous work by Riera et al. [J. Chem. Theory Comput. 16, 2246 (2020)], we introduce a second-generation family of data-driven many-body MB-nrg models for CO2 and systematically assess how the strength and anisotropy of the CO2-CO2 interactions affect the models' ability to predict vapor, liquid, and vapor-liquid equilibrium properties. Building upon the many-body expansion formalism, we construct a series of MB-nrg models by fitting 1-body and 2-body reference energies calculated at the coupled-cluster level of theory for large monomer and dimer training sets. Advancing from the first-generation models, we employ the Charge Model 5 scheme to determine the atomic charges and systematically scale the 2-body energies to obtain more accurate descriptions of vapor, liquid, and vapor-liquid equilibrium properties. Comparisons with the polarizable TTM-nrg model, which is constructed from the same training sets as the MB-nrg models but uses a simpler representation of short-range interactions based on conventional Born-Mayer functions, showcase the necessity of high-dimensional functional forms for an accurate description of the multidimensional energy landscape of liquid CO2. These findings emphasize the key role played by training set quality and the flexibility of the fitting functions in the development of transferable, data-driven models which, by accurately representing high-dimensional many-body effects, can enable predictive computer simulations of molecular fluids across the entire phase diagram.
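The many-body expansion invoked above can be written explicitly. For a system of N monomers it decomposes the total energy into additive n-body contributions; the MB-nrg models described here fit the 1-body and 2-body terms to coupled-cluster reference data:

```latex
E(1, \dots, N) = \sum_{i} \varepsilon^{(1)}_{i}
  + \sum_{i<j} \varepsilon^{(2)}_{ij}
  + \sum_{i<j<k} \varepsilon^{(3)}_{ijk} + \cdots
```

where each 1-body term is the distortion energy of an isolated monomer and each higher-order term is defined recursively as the energy of the n-mer minus all lower-order contributions.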


2021 ◽  
Author(s):  
Jennifer Jin ◽  
Myeong Ho Song ◽  
Soo Dong Kim ◽  
Daniel Jin
Keyword(s):  

2021 ◽  
Vol 13 (21) ◽  
pp. 4454
Author(s):  
Yanlong Gao ◽  
Yan Feng ◽  
Xumin Yu

In recent years, deep neural networks (DNNs) have shown a strong presence in classification tasks, and their effectiveness has been well proven. However, DNN frameworks usually require a large number of samples. Compared to the training sets in classification tasks, the training sets for target detection in hyperspectral images may include only a few target spectra, which are limited and precious. The insufficiency of labeled samples makes DNN-based hyperspectral target detection a challenging problem. To address this problem, we propose a hyperspectral target detection approach with an auxiliary generative adversarial network (GAN). Specifically, the training set is first expanded by generating simulated target spectra and background spectra with the GAN. Then, a classifier that is closely tied to the discriminator of the GAN is trained on the real and generated spectra. Finally, to further suppress the background, guided filters are used to improve the smoothness and robustness of the detection results. Experiments conducted on real hyperspectral images show that the proposed approach performs more efficiently and accurately than other target detection approaches.
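The guided filtering step at the end of the pipeline is a standard, self-contained operation, so it can be illustrated independently of the GAN. Below is a minimal NumPy sketch of a (self-)guided filter; the window radius `r` and regularizer `eps` are illustrative choices, not the values used by the authors.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, computed with a 2-D cumulative sum."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    c = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # leading zero row/column for window sums
    h, w = img.shape
    return (c[k:k + h, k:k + w] - c[:h, k:k + w]
            - c[k:k + h, :w] + c[:h, :w]) / (k * k)

def guided_filter(guide, src, r=2, eps=1e-2):
    """Edge-preserving smoothing: fit q = a * guide + b in each local window."""
    mean_I, mean_p = box_mean(guide, r), box_mean(src, r)
    var_I = box_mean(guide * guide, r) - mean_I * mean_I
    cov_Ip = box_mean(guide * src, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)   # ~0 in flat regions, ~1 at strong edges
    b = mean_p - a * mean_I
    return box_mean(a, r) * guide + box_mean(b, r)
```

Using a detection score map as both `guide` and `src` smooths flat background regions while largely preserving strong target responses.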


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 28-28
Author(s):  
Jorge Hidalgo ◽  
Daniela Lourenco ◽  
Shogo Tsuruta ◽  
Yutaka Masuda ◽  
Vivian Breen ◽  
...  

Abstract The objectives of this research were to investigate trends in the accuracy of genomic predictions over time in a broiler population accumulating data, and to test whether data from distant generations are useful in maintaining the accuracy of genomic predictions in selection candidates. The data contained 820k phenotypes for a growth trait (GROW), 200k for two feed efficiency traits (FE1 and FE2), and 42k for a dissection trait (DT). The pedigree included 1.2M animals across 7 years, of which over 100k from the last 4 years were genotyped. Accuracy was calculated by the linear regression method. Before genotypes became available for training populations, accuracy was nearly stable despite the accumulation of phenotypes and pedigrees. When the first year of genomic data was included in the training population, accuracy increased by 56, 77, 39, and 111% for GROW, FE1, FE2, and DT, respectively. With genomic information, the accuracies increased every year except the last one, when they declined for GROW and FE2. The decay of accuracy over time was evaluated in progeny, grand-progeny, and great-grand-progeny of the training populations. Without genotypes, the average decline in accuracy across traits was 41% from progeny to grand-progeny, and 19% from grand-progeny to great-grand-progeny. With genotypes, the average decline across traits was 14% from progeny to grand-progeny, and 2% from grand-progeny to great-grand-progeny. The accuracies in the last 3 generations were the same whether the training population included 5 or 2 years of data, and a marginal decrease was observed when the training population included only 1 year of data. Training sets including genomic information provided increased accuracy and persistence of genomic predictions compared to training sets without genomic data. The two most recent years of data were enough to maintain the accuracy of predictions in selection candidates.
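The "linear regression method" for accuracy mentioned above compares breeding values estimated from partial data (early records only) with those from whole data (all records) for the same validation animals. A minimal sketch of the core LR statistics, assuming plain NumPy vectors of estimated breeding values (the function and variable names are illustrative):

```python
import numpy as np

def lr_statistics(ebv_partial, ebv_whole):
    """Legarra-Reverter (LR) comparison of EBVs from partial vs. whole data.

    Returns (bias, dispersion slope, correlation); the correlation between
    partial and whole EBVs estimates the ratio of their accuracies.
    """
    ebv_partial = np.asarray(ebv_partial, dtype=float)
    ebv_whole = np.asarray(ebv_whole, dtype=float)
    bias = ebv_partial.mean() - ebv_whole.mean()
    # regression of whole-data EBVs on partial-data EBVs (1.0 = no inflation)
    slope = np.cov(ebv_whole, ebv_partial)[0, 1] / np.var(ebv_partial, ddof=1)
    rho = np.corrcoef(ebv_partial, ebv_whole)[0, 1]
    return bias, slope, rho
```

When partial and whole evaluations agree perfectly, the statistics are (0, 1, 1); deviations quantify bias, over/under-dispersion, and loss of accuracy.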


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Ulf Norinder ◽  
Ola Spjuth ◽  
Fredrik Svensson

Abstract Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity prediction. In this work we investigate a recently introduced variant of conformal prediction, synergy conformal prediction, focusing on its predictive performance when applied to bioactivity data. We compare its performance to other variants of conformal predictors on multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning, where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction gives promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
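At the core of any conformal predictor is a p-value computed from calibration nonconformity scores; synergy conformal prediction aggregates scores from models trained on separate data partitions. A minimal sketch, assuming the synergy combination is a plain average of per-model nonconformity scores (an assumption for illustration; the paper's exact aggregation may differ):

```python
import numpy as np

def conformal_p_value(cal_scores, test_score):
    """Inductive conformal p-value: fraction of calibration nonconformity
    scores at least as large as the test score, counting the test point."""
    cal_scores = np.asarray(cal_scores, dtype=float)
    return (np.sum(cal_scores >= test_score) + 1) / (cal_scores.size + 1)

def synergy_p_value(per_model_cal_scores, per_model_test_scores):
    """Assumed synergy combination: average nonconformity scores across
    models, then compute a single conformal p-value from the averages."""
    cal = np.mean([np.asarray(s, dtype=float) for s in per_model_cal_scores],
                  axis=0)
    return conformal_p_value(cal, float(np.mean(per_model_test_scores)))
```

A prediction set at significance level ε then contains every candidate label whose p-value exceeds ε, which is what gives conformal predictors their validity guarantee.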


2021 ◽  
Vol 11 ◽  
Author(s):  
Wannian Sui ◽  
Zhangming Chen ◽  
Chuanhong Li ◽  
Peifeng Chen ◽  
Kai Song ◽  
...  

Background: Lymph node metastasis (LNM) has a significant impact on the prognosis of patients with early gastric cancer (EGC). Our aim was to identify the independent risk factors for LNM and construct nomograms for male and female EGC patients, respectively.
Methods: Clinicopathological data of 1,742 EGC patients who underwent radical gastrectomy and lymphadenectomy in the First, Second, and Fourth Affiliated Hospitals of Anhui Medical University between November 2011 and April 2021 were collected and analyzed retrospectively. Male and female patients from the First Affiliated Hospital were assigned to the training sets, and those from the Second and Fourth Affiliated Hospitals were enrolled in the validation sets. Nomograms were established based on the independent risk factors for LNM identified in the training sets, and were verified by internal validation on the training sets and external validation on the validation sets.
Results: Tumor size (odds ratio (OR): 1.386, p = 0.030), depth of invasion (OR: 0.306, p = 0.001), Lauren type (OR: 2.816, p < 0.001), lymphovascular invasion (LVI) (OR: 0.160, p < 0.001), and menopause (OR: 0.296, p = 0.009) were independent risk factors for female EGC patients. For male EGC patients, tumor size (OR: 1.298, p = 0.007), depth of invasion (OR: 0.257, p < 0.001), tumor location (OR: 0.659, p = 0.002), WHO type (OR: 1.419, p = 0.001), Lauren type (OR: 3.099, p < 0.001), and LVI (OR: 0.131, p < 0.001) were independent risk factors. Nomograms were then established to predict the risk of LNM for female and male EGC patients, respectively. The areas under the ROC curve of the nomograms for the female and male training sets were 0.877 (95% confidence interval (CI): 0.8397-0.914) and 0.948 (95% CI: 0.9273-0.9695), respectively; for the validation sets, they were 0.924 (95% CI: 0.7979-1) and 0.934 (95% CI: 0.8928-0.9755), respectively. The calibration curves showed good agreement between the bias-corrected predictions and the ideal reference line for both the training and validation sets in female and male EGC patients.
Conclusions: Nomograms based on risk factors for LNM in male and female EGC patients may provide new insights into the selection of appropriate treatment methods.
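The discrimination metric reported above, the area under the ROC curve, has a simple rank-based definition: it is the probability that a randomly chosen patient with LNM receives a higher nomogram score than a randomly chosen patient without. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: P(score_pos > score_neg),
    with ties counted as one half."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])  # concordant pairs
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

An AUC of 0.5 means the score is no better than chance; values such as the 0.877 and 0.948 reported here indicate strong discrimination.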


Author(s):  
Andrey Parasich ◽  
Victor Parasich ◽  
Irina Parasich

Introduction: Proper training set formation is a key factor in machine learning. In real training sets, problems and errors commonly occur and can have a critical impact on the training result. A training set must be formed in every machine learning problem; therefore, knowledge of the possible difficulties is helpful. Purpose: To give an overview of possible problems in the formation of a training set, in order to facilitate their detection and elimination when working with real training sets, and to analyze the impact of these problems on training results. Results: The article gives an overview of possible errors in training set formation, such as lack of data, imbalance, false patterns, sampling from a limited set of sources, change in the general population over time, and others. We discuss the influence of these errors on the training result, on test set formation, and on the measurement of training algorithm quality. Pseudo-labeling, data augmentation, and hard-sample mining are considered the most effective ways to expand a training set. We offer practical recommendations for forming a training or test set, with examples from the practice of Kaggle competitions. For the problem of cross-dataset generalization in neural network training, we propose an algorithm called Cross-Dataset Machine, which is simple to implement and improves cross-dataset generalization. Practical relevance: The material can be used as a practical guide for solving machine learning problems.
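Of the expansion techniques listed (pseudo-labeling, data augmentation, hard-sample mining), pseudo-labeling is the easiest to sketch. Below is a minimal, hypothetical one-round implementation using a nearest-centroid classifier and a distance-margin confidence threshold; both the classifier and the threshold rule are illustrative choices, not the article's prescription:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Fit one centroid per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def pseudo_label(X_lab, y_lab, X_unlab, threshold=2.0):
    """One round of pseudo-labeling: adopt unlabeled points into the
    training set only when the margin between the two closest centroids
    exceeds `threshold` (a hypothetical confidence criterion)."""
    classes, centroids = nearest_centroid_fit(X_lab, y_lab)
    d = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
    order = np.sort(d, axis=1)
    confident = (order[:, 1] - order[:, 0]) > threshold
    y_new = classes[np.argmin(d, axis=1)]
    X_aug = np.vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, y_new[confident]])
    return X_aug, y_aug
```

In practice the round is repeated with a retrained model, and the confidence criterion matters: adopting low-confidence pseudo-labels reinforces exactly the false patterns the article warns about.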


Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2036
Author(s):  
Andreas Wichert

Probability theory is built around Kolmogorov's axioms. Each event is assigned a numerical degree of belief between 0 and 1, which provides a way of summarizing uncertainty. Kolmogorov probabilities of events are additive, and the probabilities of all possible events sum to one. The numerical degrees of belief can be estimated from a sample: the frequency of an event in the sample is counted and normalized, resulting in a linear relation. We introduce quantum-like sampling, in which the resulting Kolmogorov probabilities follow a sigmoid relation. The sigmoid relation offers better interpretability, since it induces a bell-shaped distribution, and it also leads to less uncertainty when computing the Shannon entropy. Additionally, we conducted 100 empirical experiments, quantum-like sampling 100 random training and validation sets from the Titanic data set and applying the Naïve Bayes classifier. On average, the accuracy increased from 78.84% to 79.46%.
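The evaluation protocol described here, 100 random train/validation resamplings of one data set scored with a Naive Bayes classifier, can be sketched independently of the quantum-like sampling itself. A minimal NumPy version with a Gaussian Naive Bayes on synthetic data (the quantum-like modification of the sampled frequencies is not reproduced; this shows only the resampling protocol):

```python
import numpy as np

def gnb_fit(X, y):
    """Gaussian Naive Bayes: per-class feature means, variances, and priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, vars_, priors

def gnb_predict(model, X):
    classes, means, vars_, priors = model
    # log p(x|c) + log p(c), assuming independent Gaussian features
    ll = -0.5 * (((X[:, None, :] - means) ** 2) / vars_
                 + np.log(2 * np.pi * vars_)).sum(axis=2)
    return classes[np.argmax(ll + np.log(priors), axis=1)]

def repeated_split_accuracy(X, y, n_repeats=100, train_frac=0.7, seed=0):
    """Mean validation accuracy over repeated random train/validation splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        tr, va = idx[:cut], idx[cut:]
        model = gnb_fit(X[tr], y[tr])
        accs.append((gnb_predict(model, X[va]) == y[va]).mean())
    return float(np.mean(accs))
```

Averaging over many random splits, as the 100-experiment protocol does, reduces the variance of the accuracy estimate relative to a single split.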

