RootProf: software for multivariate analysis of unidimensional profiles

2014 ◽  
Vol 47 (3) ◽  
pp. 1087-1096 ◽  
Author(s):  
Rocco Caliandro ◽  
Danilo Benny Belviso

RootProf is a multi-purpose program which implements multivariate analysis of unidimensional profiles. Series of measurements, performed on related samples or on the same sample by varying some external stimulus, are analysed to find trends in data, classify them and extract quantitative information. Qualitative analysis is performed by using principal component analysis or correlation analysis. In both cases the data set is projected into a latent variable space, where a clustering algorithm classifies data points. Group separation is quantified by statistical tools. Quantitative phase analysis of a series of profiles is implemented by whole-profile fitting or by an unfolding procedure, and relies on a variety of pre-processing methods. Supervised quantitative analysis can be applied, provided a priori information on some samples is available. RootProf can be applied to measurements from different techniques, which can be combined by means of a covariance analysis. A specific analysis for powder diffraction data allows estimation of the average size of crystal domains. RootProf borrows its graphics and data analysis capabilities from the Root framework, developed for high-energy physics experiments.
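As a concrete illustration of the qualitative-analysis step described above (projection of a set of profiles into a latent-variable space, followed by clustering), here is a minimal Python sketch using PCA and k-means. It is not RootProf itself (which is built on ROOT); the data shapes and parameter values are illustrative assumptions.

```python
# Minimal sketch of PCA projection + clustering of unidimensional profiles.
# Not RootProf: a generic stand-in using scikit-learn, with illustrative values.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def classify_profiles(profiles, n_components=2, n_clusters=3):
    """profiles: array of shape (n_samples, n_points), one row per measured profile."""
    # Project the profiles onto the leading principal components (latent-variable space).
    scores = PCA(n_components=n_components).fit_transform(profiles)
    # Group the data points in that space with a clustering algorithm.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(scores)
    return scores, labels

# Example with synthetic data: 20 profiles of 500 points each.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(20, 500))
scores, labels = classify_profiles(profiles)
```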

2020 ◽  
Vol 3 ◽  
Author(s):  
Marco Rovere ◽  
Ziheng Chen ◽  
Antonio Di Pilato ◽  
Felice Pantaleo ◽  
Chris Seez

One of the challenges of high granularity calorimeters, such as that to be built to cover the endcap region in the CMS Phase-2 Upgrade for HL-LHC, is that the large number of channels causes a surge in the computing load when clustering numerous digitized energy deposits (hits) in the reconstruction stage. In this article, we propose a fast and fully parallelizable density-based clustering algorithm, optimized for high-occupancy scenarios, where the number of clusters is much larger than the average number of hits in a cluster. The algorithm uses a grid spatial index for fast querying of neighbors and its timing scales linearly with the number of hits within the range considered. We also show a comparison of the performance on CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing in high-energy physics.
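The core idea (a grid spatial index for fast neighbour queries, local densities, and linking each hit to its nearest higher-density neighbour) can be sketched in a few lines. This is a hedged, serial Python illustration, not the CMS reconstruction code; the parameters radius and rho_c are assumptions made for the example.

```python
# Serial sketch of grid-indexed, density-based clustering of 2D hits.
# Illustrative only: not the production algorithm, parameters are assumptions.
import math
from collections import defaultdict

def cluster_hits(xs, ys, weights, radius=1.0, rho_c=2.0):
    n = len(xs)
    grid = defaultdict(list)                 # grid cell -> indices of hits in that cell

    def cell(x, y):
        return (int(math.floor(x / radius)), int(math.floor(y / radius)))

    for i in range(n):
        grid[cell(xs[i], ys[i])].append(i)

    def neighbours(i):
        # Only the 3x3 block of adjacent cells can contain hits within `radius`.
        cx, cy = cell(xs[i], ys[i])
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j != i and math.hypot(xs[i] - xs[j], ys[i] - ys[j]) <= radius:
                        yield j

    # Local density: weighted count of hits within `radius` of each hit.
    rho = [weights[i] + sum(weights[j] for j in neighbours(i)) for i in range(n)]

    # Link each hit to its nearest higher-density neighbour; local density maxima
    # above the threshold become seeds, isolated low-density hits stay outliers (-1).
    parent = [-1] * n
    for i in range(n):
        best, best_d = -1, float("inf")
        for j in neighbours(i):
            d = math.hypot(xs[i] - xs[j], ys[i] - ys[j])
            if rho[j] > rho[i] and d < best_d:
                best, best_d = j, d
        if best >= 0:
            parent[i] = best
        elif rho[i] >= rho_c:
            parent[i] = i

    def label(i):
        # Follow parent links (density strictly increases, so no cycles) up to a seed.
        while parent[i] not in (i, -1):
            i = parent[i]
        return i if parent[i] == i else -1

    return [label(i) for i in range(n)]
```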


2008 ◽  
Vol 2 (6) ◽  
pp. 597 ◽  
Author(s):  
Jun Kawakami ◽  
Wilma M. Hopman ◽  
Rachael Smith-Tryon ◽  
D. Robert Siemens

Introduction: Reported increases in surgical wait times for cancer have intensified the focus on this quality of health care indicator and have created a very public, concerted effort by providers to decrease wait times for cancer surgery in Ontario. Delays in access to health care are multifactorial and their measurement from existing administrative databases can lack pertinent detail. The purpose of our study was to use a real-time surgery-booking software program to examine surgical wait times at a single centre. Methods: The real-time wait list management system Axcess.Rx has been used exclusively by the department of urology at the Kingston General Hospital to book all nonemergency surgery for 4 years. We reviewed the length of time from the decision to perform surgery to the actual date of surgery for patients in our group urological practice. Variables thought to be potentially important in predicting wait time were also collected, including the surgeon’s assessment of urgency, the type of procedure (i.e., diagnostic, minor cancer, major cancer, minor benign, major benign), age and sex of the patient, inpatient versus outpatient status and year of surgery. Analysis was planned a priori to determine factors that affected wait time by using multivariate analysis to analyze variables that were significant in univariate analysis. Results: There were 960 operations for cancer and 1654 for benign conditions performed during the evaluation period. The overall mean wait time was 36 days for cancer and 47 days for benign conditions. The mean wait time for cancer surgery reached a nadir of 29.9 days in 2004 and subsequently increased every year, reaching 56 days in 2007. In comparison, benign surgery reached a nadir wait time of 33.7 days in 2004, rising to 74 days in 2007 at our institution. Multivariate analysis revealed that the year of surgery was still a significant predictor of wait time. Urgency score, type of procedure and inpatient versus outpatient status were also predictive of wait time. Conclusion: The application of a prospectively collected data set is an effective and important tool to measure and subsequently examine surgical wait times. This tool has been essential to the accurate assessment of the effect of resource allocation on wait times for priority and nonpriority surgical programs within a discipline. Such tools are necessary to more fully assess and follow wait times at an institution or across a region.


2010 ◽  
Vol 47 (9) ◽  
pp. 1227-1251 ◽  
Author(s):  
Lisa G. Buckley ◽  
Derek W. Larson ◽  
Miriam Reichel ◽  
Tanya Samman

Documenting variation in theropod dinosaurs is usually hindered by the lack of a large sample size and specimens representing several ontogenetic stages. Here, variation within 140 disassociated and seven in situ tyrannosaur teeth from the Upper Cretaceous (lower Maastrichtian) monodominant Albertosaurus sarcophagus (Theropoda: Tyrannosauridae) bonebed is documented. This sample represents the largest data set of teeth from one population of A. sarcophagus containing both adult and juvenile specimens. Tooth variation was assessed using multivariate analyses (principal component, discriminant, and canonical variate analyses). Heterodonty in the teeth of A. sarcophagus contributes to the large amount of variation in the data set. Premaxillary teeth are significantly different from maxillary and dentary teeth, but there is no quantifiable difference between a priori identified maxillary and dentary teeth. Juvenile and adult teeth of A. sarcophagus show apparent quantitative differences that, on closer investigation, are size dependent, suggesting a cautious approach when interpreting multivariate analyses to identify novel tooth morphologies. Multivariate analyses of teeth of A. sarcophagus and published tooth data from other North American tyrannosaurid species reveal species-level clusters with little separation. The degree of separation among tooth clusters may reveal a phylogenetic signal in tyrannosaurid teeth.
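For readers unfamiliar with the ordination methods mentioned, the sketch below shows the general shape of such an analysis: standardized measurements, principal components for overall variation, and a discriminant analysis for a priori groups. It is an illustrative stand-in, not the authors' scripts, and the measurement variables named in the docstring are hypothetical placeholders.

```python
# Illustrative sketch of PCA + discriminant analysis on tooth measurements.
# Not the authors' workflow; variable choices and preprocessing are assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def ordinate_teeth(measurements, positions):
    """measurements: (n_teeth, n_variables) array of positive measurements,
    e.g. crown height, basal length, basal width, denticle density (placeholders);
    positions: a priori tooth-position labels (premaxillary/maxillary/dentary)."""
    X = StandardScaler().fit_transform(np.log(measurements))  # log + z-score to reduce pure size effects
    pc_scores = PCA(n_components=2).fit_transform(X)          # overall variation
    lda = LinearDiscriminantAnalysis().fit(X, positions)      # separation among a priori groups
    return pc_scores, lda.transform(X), lda.score(X, positions)
```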


2017 ◽  
Vol 2017 ◽  
pp. 1-15 ◽  
Author(s):  
Xiaochen Zhang ◽  
Dongxiang Jiang ◽  
Te Han ◽  
Nanfei Wang ◽  
Wenguang Yang ◽  
...  

To diagnose rotating machinery faults from imbalanced data, a method based on a fast clustering algorithm (FCA) and a support vector machine (SVM) was proposed. Combined with variational mode decomposition (VMD) and principal component analysis (PCA), sensitive features of the rotating machinery fault were obtained and constituted the imbalanced fault sample set. Next, the fast clustering algorithm was adopted to reduce the number of majority-class samples in the imbalanced fault sample set. Consequently, the balanced fault sample set consisted of the clustered data together with the minority-class data from the imbalanced fault sample set. After that, the SVM was trained with the balanced fault sample set and tested with the imbalanced fault sample set, yielding the fault diagnosis model of the rotating machinery. Finally, a gearbox fault data set and a rolling bearing fault data set were used to test the fault diagnosis model. The experimental results showed that the model could effectively diagnose rotating machinery faults from imbalanced data.
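The balancing step can be illustrated with a short sketch: the majority-class samples are condensed by a clustering algorithm and the cluster centres replace them before an SVM is trained. Here k-means stands in for the paper's fast clustering algorithm, and all parameter values are illustrative assumptions, not the authors' settings.

```python
# Sketch of cluster-based undersampling of the majority class followed by SVM training.
# k-means is a stand-in for the paper's fast clustering algorithm; values are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def train_balanced_svm(X_major, X_minor, y_major_label=0, y_minor_label=1):
    # Reduce the majority class to as many representatives as there are minority samples.
    k = len(X_minor)
    centres = KMeans(n_clusters=k, n_init=10).fit(X_major).cluster_centers_
    X_bal = np.vstack([centres, X_minor])
    y_bal = np.array([y_major_label] * k + [y_minor_label] * len(X_minor))
    # Train the SVM on the rebalanced sample set.
    return SVC(kernel="rbf", gamma="scale").fit(X_bal, y_bal)
```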


2008 ◽  
Vol 57 (10) ◽  
pp. 1659-1666 ◽  
Author(s):  
Kris Villez ◽  
Magda Ruiz ◽  
Gürkan Sin ◽  
Joan Colomer ◽  
Christian Rosén ◽  
...  

A methodology based on Principal Component Analysis (PCA) and clustering is evaluated for process monitoring and process analysis of a pilot-scale sequencing batch reactor (SBR) removing nitrogen and phosphorus. The first step of this method is to build a multi-way PCA (MPCA) model using the historical process data. In the second step, the principal scores and the Q-statistics resulting from the MPCA model are fed to the LAMDA clustering algorithm. This procedure is iterated twice. The first iteration provides an efficient and effective discrimination between normal and abnormal operational conditions. The second iteration allows a clear-cut discrimination of applied operational changes in the SBR history. Importantly, this procedure helped identify changes in the process behaviour that could not have been found by relying only on visual inspection of the online SBR data set (which is traditionally the case in practice). Hence, the PCA-based clustering methodology is a promising tool for efficiently interpreting and analysing SBR process behaviour using large historical online data sets.
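A minimal sketch of the two quantities fed to the clustering step (MPCA scores and Q-statistics from batch-wise unfolded data) is given below. LAMDA is not part of common Python libraries, so k-means is used purely as a placeholder for the clustering stage; the array shapes and component counts are assumptions for the example.

```python
# Sketch of batch-wise unfolding, MPCA scores and Q-statistics, then clustering.
# Illustrative stand-in: k-means replaces LAMDA, shapes and counts are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def mpca_scores_and_q(batch_data, n_components=3):
    """batch_data: array (n_batches, n_variables, n_time) of historical SBR cycles."""
    X = batch_data.reshape(batch_data.shape[0], -1)   # batch-wise unfolding
    X = X - X.mean(axis=0)                            # mean-centre across batches
    pca = PCA(n_components=n_components).fit(X)
    scores = pca.transform(X)
    residual = X - pca.inverse_transform(scores)
    q_stat = (residual ** 2).sum(axis=1)              # Q-statistic (squared residual) per batch
    return scores, q_stat

def cluster_batches(scores, q_stat, n_clusters=2):
    # Feed scores and Q-statistics jointly to the clustering stage.
    features = np.column_stack([scores, q_stat])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
```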


Water ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 2214 ◽  
Author(s):  
Giuseppina Ioele ◽  
Michele De Luca ◽  
Fedora Grande ◽  
Giacomina Durante ◽  
Raffaella Trozzo ◽  
...  

The water vulnerability of the Crati river (Calabria, Italy) was assessed by applying chemometric methods to a large number of analytical parameters. The study used a data set collected in the years 2015–2016, recording 30 physical–chemical and geological parameters at 25 sampling points, measured both in water and in sediments. Processing the data by principal component analysis (PCA) highlighted the components most responsible for pollution. Accumulation of heavy metals in the water was detected only in two samples near the source of the river. On the contrary, their concentration values in the sediments exceeded the legal limit at several sites, probably because of their proximity to urban areas. In particular, high concentrations of chromium, mercury and nickel were detected both at the mouth of the river and along the valley, while lead was detected in only one sediment sample. The multivariate analysis techniques proved very useful for fully characterizing the areas surrounding a river course and for developing a risk map to monitor health risks to the local population.
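As a small illustration of how PCA can point to the parameters most responsible for the observed variation, the sketch below inspects the loadings of standardized water-quality data. It is a generic example, not the authors' workflow, and the column names are placeholders for the 30 measured parameters.

```python
# Generic sketch: PCA loadings identify which measured parameters drive the variation.
# Not the authors' workflow; column names are placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def dominant_parameters(df, n_components=2):
    """df: DataFrame with one row per sampling point and one column per parameter."""
    X = StandardScaler().fit_transform(df.values)     # parameters measured on very different scales
    pca = PCA(n_components=n_components).fit(X)
    loadings = pd.DataFrame(pca.components_.T, index=df.columns,
                            columns=[f"PC{i + 1}" for i in range(n_components)])
    # Parameters with the largest |loading| on PC1 contribute most to the main trend.
    return loadings.reindex(loadings["PC1"].abs().sort_values(ascending=False).index)
```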


1994 ◽  
Vol 348 ◽  
Author(s):  
S. E. Derenzo ◽  
W. W. Moses ◽  
M. J. Weber ◽  
A. C. West

ABSTRACT Over the years a number of scintillator materials have been developed for a wide variety of nuclear detection applications in industry, high energy physics, and medical instrumentation. To expand the list of useful scintillators, we are pursuing the following systematic, comprehensive search: (1) select materials with good gamma-ray interaction properties from the 200,000-entry NIST crystal diffraction file, (2) synthesize samples (doped and undoped) in powdered or single crystal form, (3) test the samples using sub-nanosecond pulsed x-rays to measure important scintillation properties such as rise times, decay times, emission wavelengths, and light output, (4) prepare large, high quality crystals of the most promising candidates, and (5) test the crystals as gamma-ray detectors in representative configurations. An important parallel effort is the computation of electronic energy levels of activators and the band structure of intrinsic and host crystals to aid in the materials selection process.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1134
Author(s):  
Torben Möller ◽  
Tim W. Nattkemper

In recent years, an increasing number of cabled Fixed Underwater Observatories (FUOs) have been deployed, many of them equipped with digital cameras recording high-resolution digital image time series for a given period. The manual extraction of quantitative information from these data regarding resident species is necessary to link the image time series information to data from other sensors, but requires computational support to overcome the bottleneck problem in manual analysis. As a priori knowledge about the objects of interest in the images is almost never available, computational methods are required that do not depend on the posterior availability of a large training data set of annotated images. In this paper, we propose a new strategy for collecting and using training data for machine-learning-based observatory image interpretation much more efficiently. The method combines the training efficiency of a special active learning procedure with the advantages of deep learning feature representations, and is tested on two highly disparate data sets. In our experiments, we show that the proposed method, ALMI, achieves a classification accuracy A > 90% with fewer than N = 258 training samples on one data set and A > 80% after N = 150 iterations (i.e., training samples) on the other, outperforming the reference method in terms of both accuracy and the amount of training data required.
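The general structure of such an active-learning loop on precomputed deep feature vectors can be sketched as follows. This is a conceptual illustration, not the ALMI implementation; the classifier choice, the least-confidence query rule, and the oracle_label function (standing in for the human annotator) are assumptions made for the example.

```python
# Conceptual sketch of active learning on precomputed deep feature vectors.
# Not the ALMI code: classifier, query rule and oracle_label are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(features, oracle_label, seed_idx, n_iterations=150):
    """features: (n_images, n_dims) deep feature vectors;
    seed_idx: initially labelled indices (should cover at least two classes);
    oracle_label(i): returns the annotator's label for image i."""
    labelled = list(seed_idx)
    labels = {i: oracle_label(i) for i in labelled}
    for _ in range(n_iterations):
        # Retrain on everything labelled so far.
        clf = LogisticRegression(max_iter=1000).fit(
            features[labelled], [labels[i] for i in labelled])
        unlabelled = [i for i in range(len(features)) if i not in labels]
        if not unlabelled:
            break
        proba = clf.predict_proba(features[unlabelled])
        # Query the sample whose top-class probability is lowest (least confident).
        query = unlabelled[int(np.argmin(proba.max(axis=1)))]
        labels[query] = oracle_label(query)
        labelled.append(query)
    return clf, labelled
```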


2021 ◽  
Vol 2021 (3) ◽  
Author(s):  
Konstantin T. Matchev ◽  
Prasanth Shyamsundar

Abstract We provide a prescription called ThickBrick to train optimal machine-learning-based event selectors and categorizers that maximize the statistical significance of a potential signal excess in high energy physics (HEP) experiments, as quantified by any of six different performance measures. For analyses where the signal search is performed in the distribution of some event variables, our prescription ensures that only the information complementary to those event variables is used in event selection and categorization. This eliminates a major misalignment with the physics goals of the analysis (maximizing the significance of an excess) that exists in the training of typical ML-based event selectors and categorizers. In addition, this decorrelation of event selectors from the relevant event variables prevents the background distribution from becoming peaked in the signal region as a result of event selection, thereby ameliorating the challenges imposed on signal searches by systematic uncertainties. Our event selectors (categorizers) use the output of machine-learning-based classifiers as input and apply optimal selection cutoffs (categorization thresholds) that are functions of the event variables being analyzed, as opposed to flat cutoffs (thresholds). These optimal cutoffs and thresholds are learned iteratively, using a novel approach with connections to Lloyd’s k-means clustering algorithm. We provide a public Python implementation of our prescription, also called ThickBrick, along with usage examples.
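As a deliberately simplified toy of the idea (not the ThickBrick package or its API), the sketch below scans classifier-output cutoffs separately in each bin of the event variable and keeps the cutoff that maximizes a per-bin significance estimate, so the selection cutoff becomes a function of the event variable rather than a flat value. The measure s/sqrt(b) used here is only one of several possible choices, and the iterative, Lloyd-like optimization of the actual prescription is not reproduced.

```python
# Simplified toy of event-variable-dependent selection cutoffs.
# Not the ThickBrick package: a brute-force, per-bin cutoff scan for illustration only.
import numpy as np

def binwise_cutoffs(event_var, clf_output, is_signal, bin_edges, candidate_cuts):
    """event_var, clf_output, is_signal: per-event arrays (is_signal is boolean)."""
    cuts = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (event_var >= lo) & (event_var < hi)
        best_cut, best_sig = candidate_cuts[0], -np.inf
        for c in candidate_cuts:
            sel = in_bin & (clf_output >= c)
            s = np.sum(sel & is_signal)
            b = np.sum(sel & ~is_signal)
            sig = s / np.sqrt(b) if b > 0 else 0.0     # one possible significance measure
            if sig > best_sig:
                best_cut, best_sig = c, sig
        cuts.append(best_cut)
    return np.array(cuts)   # one optimal cutoff per event-variable bin
```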


Nukleonika ◽  
2016 ◽  
Vol 61 (3) ◽  
pp. 357-360 ◽  
Author(s):  
Jelena Filipović ◽  
Dimitrije Maletić ◽  
Vladimir Udovičić ◽  
Radomir Banjanac ◽  
Dejan Joković ◽  
...  

Abstract The paper presents results of multivariate analysis of variations of radon concentration in a shallow underground laboratory and in a family house, depending on meteorological variables only. All available multivariate classification and regression methods, developed for data analysis in high-energy physics and implemented in the Toolkit for Multivariate Analysis (TMVA) software package in ROOT, are used in the analysis. The result of the multivariate regression analysis is a mapped functional behaviour of the variation of radon concentration as a function of meteorological variables only, which can be used to evaluate radon concentration as well as to help with modelling its variation. The analysis of radon concentration variations in the underground laboratory and in a real indoor environment using multivariate methods demonstrated the potential usefulness of these methods. Multivariate analysis showed that there is considerable potential prediction power for variations of indoor radon concentration based on knowledge of meteorological variables only. In addition, an online system using the resulting mapped functional behaviour has been implemented for the underground laboratory at the Institute of Physics Belgrade, and the resulting evaluations of radon concentration are presented in this paper.
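The regression idea (mapping meteorological variables alone to the measured radon concentration, then using the fitted mapping for evaluation) can be sketched with a generic Python regressor. This stand-in uses scikit-learn rather than the ROOT/TMVA methods applied in the paper, and the variable list and hyperparameters are illustrative assumptions.

```python
# Generic stand-in for the regression mapping: meteorological variables -> radon concentration.
# Not TMVA; a gradient-boosting regressor is used purely for illustration.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def fit_radon_model(meteo, radon):
    """meteo: (n_samples, n_variables) array, e.g. pressure, temperature, humidity (placeholders);
    radon: measured radon concentration per sample."""
    X_train, X_test, y_train, y_test = train_test_split(
        meteo, radon, test_size=0.3, random_state=0)
    model = GradientBoostingRegressor(n_estimators=300, max_depth=3).fit(X_train, y_train)
    # The fitted model can then evaluate radon levels from new meteorological readings.
    return model, model.score(X_test, y_test)   # R^2 on held-out data
```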

