large data
Recently Published Documents

2022 ◽  
Vol 11 (3) ◽  
pp. 0-0

The emergence of big data poses new challenges for the sorting strategies used to analyze it. In most analysis techniques, sorting is an implicit step. The availability of huge volumes of data has changed the way data is analyzed across industries. Healthcare is one of the notable areas where data analytics is driving major changes. Efficient analysis has the potential to reduce treatment costs and improve quality of life in general. Healthcare organizations are collecting massive amounts of data and looking for the best strategies to use it. This research proposes a novel non-comparison-based approach to sorting large data sets, which can then be used by any big data analytical technique for various analyses.
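
The abstract does not describe the proposed algorithm, so the following is only a generic illustration of what "non-comparison-based" sorting means: a least-significant-digit radix sort, which orders keys by distributing them into buckets rather than comparing pairs of elements. The function name, the `base` parameter and the restriction to non-negative integers are assumptions made for this sketch, not details from the paper.

```python
def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers (illustrative sketch only)."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    out = list(values)
    if not out:
        return out
    # max() is used only to decide how many digit passes are needed.
    largest = max(out)
    place = 1
    while place <= largest:
        # Distribute by the current digit; appending keeps each pass stable.
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v // place) % base].append(v)
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out
```

Each pass is linear in the number of keys, so the total cost grows with the number of digits in the largest key rather than with n log n comparisons.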

2022 ◽  
Vol 55 (1) ◽  
Nie Zhao ◽  
Chunming Yang ◽  
Fenggang Bian ◽  
Daoyou Guo ◽  
Xiaoping Ouyang

In situ synchrotron small-angle X-ray scattering (SAXS) is a powerful tool for studying dynamic processes during material preparation and application. The processing and analysis of large data sets generated from in situ X-ray scattering experiments are often tedious and time consuming. However, data processing software for in situ experiments is relatively rare, especially for grazing-incidence small-angle X-ray scattering (GISAXS). This article presents an open-source software suite (SGTools) to perform data processing and analysis for SAXS and GISAXS experiments. The processing modules in this software include (i) raw data calibration and background correction; (ii) data reduction by multiple methods; (iii) animation generation and intensity mapping for in situ X-ray scattering experiments; and (iv) further data analysis for the sample with an order degree and interface correlation. This article provides the main features and framework of SGTools. The workflow of the software is also elucidated to allow users to develop new features. Three examples are demonstrated to illustrate the use of SGTools for dealing with SAXS and GISAXS data. Finally, the limitations and future features of the software are also discussed.

2022 ◽  
Vol 53 (1) ◽  
pp. 31-44
Y. E. A. RAJ ◽  

Several sea breeze parameters, such as time of onset, withdrawal, duration, depth, variation with height and direction, have been derived and studied for the Chennai city and Chennai AP observatories. The study is based on a large database covering March-October of 1969-83. Monthly and sub-monthly values of several sea breeze parameters have been derived. By invoking the concept of superposed epoch analysis, the important role played by the sea breeze in modulating the diurnal variation of surface temperature and relative humidity has been established. The sea breeze at Chennai has been shown to be shallow, with a depth of under 1 km. Modal directions of the sea breeze and its normal speed have been derived.
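
Superposed epoch analysis aligns a time series on a set of key events (here, plausibly the sea breeze onset times) and averages across events at each lag. A minimal sketch follows, assuming an evenly sampled series and integer event positions; the function name, arguments and any sample values are hypothetical and are not taken from the study.

```python
def superposed_epoch(series, event_indices, before, after):
    """Average `series` over windows aligned on each event index.

    Returns (lags, composite), where composite[k] is the mean value
    observed lags[k] steps away from an event.
    """
    lags = list(range(-before, after + 1))
    sums = [0.0] * len(lags)
    used = 0
    for t0 in event_indices:
        # Skip events whose window would run off either end of the record.
        if t0 - before < 0 or t0 + after >= len(series):
            continue
        for k, lag in enumerate(lags):
            sums[k] += series[t0 + lag]
        used += 1
    if used == 0:
        raise ValueError("no event has a complete window")
    return lags, [s / used for s in sums]
```

Applied to hourly surface temperature with onset times as the events, the composite would show the average temperature change around onset, with unrelated day-to-day variability averaged out.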

2022 ◽  
Kevin Muriithi Mirera

Data mining is a way to extract knowledge from generally large data sets; in other words, it is an approach to discovering hidden relationships among data by using artificial intelligence methods. This has made it an important field of research. Law is one of the most important fields for applying data mining, given the plethora of data ranging from law-case stenographer data to lawsuit data. Text summarization in natural language processing (NLP) is the process of summarizing the information in large texts for quicker consumption, and it is an extremely useful NLP technique. Identifying law case characteristics is the first step toward further analysis. This paper discusses an approach based on data mining techniques for extracting important entities from law cases written in plain text. The process involves different artificial intelligence techniques, including clustering and other unsupervised or supervised learning techniques.

2022 ◽  
pp. 1-47
Mohammad Mohammadi ◽  
Peter Tino ◽  
Kerstin Bunte

The presence of manifolds is a common assumption in many applications, including astronomy and computer vision. For instance, in astronomy, low-dimensional stellar structures, such as streams, shells, and globular clusters, can be found in the neighborhood of big galaxies such as the Milky Way. Since these structures are often buried in very large data sets, an algorithm that can not only recover the manifold but also remove the background noise (or outliers) is highly desirable. While other works try to recover manifolds either by pushing all points toward manifolds or by downsampling from dense regions, aiming to solve one of the problems, they generally fail to suppress the noise on manifolds and remove background noise simultaneously. Inspired by the collective behavior of biological ants in the food-seeking process, we propose a new algorithm that employs several random walkers equipped with a local alignment measure to detect and denoise manifolds. During the walking process, the agents release pheromone on data points, which reinforces future movements. Over time the pheromone concentrates on the manifolds, while it fades in the background noise due to an evaporation procedure. We use the Markov chain (MC) framework to provide a theoretical analysis of the convergence of the algorithm and its performance. Moreover, an empirical analysis, based on synthetic and real-world data sets, is provided to demonstrate its applicability in different areas, such as improving the performance of t-distributed stochastic neighbor embedding (t-SNE) and spectral clustering using the underlying MC formulas, recovering astronomical low-dimensional structures, and improving the performance of the fast Parzen window density estimator.

Duong Vu ◽  
Henrik Nilsson ◽  
Gerard Verkley

The accuracy and precision of fungal molecular identification and classification are challenging, particularly in environmental metabarcoding approaches as these often trade accuracy for efficiency given the large data volumes at hand. In most ecological studies, only a single similarity cut-off value is used for sequence identification. This is not sufficient since the most commonly used DNA markers are known to vary widely in terms of inter- and intra-specific variability. We address this problem by presenting a new tool, dnabarcoder, to analyze and predict different local similarity cut-offs for sequence identification for different clades of fungi. For each similarity cut-off in a clade, a confidence measure is computed to evaluate the resolving power of the genetic marker in that clade. Experimental results showed that when analyzing a recently released filamentous fungal ITS DNA barcode dataset of CBS strains from the Westerdijk Fungal Biodiversity Institute, the predicted local similarity cut-offs varied immensely between the clades of the dataset. In addition, most of them had a higher confidence measure than the global similarity cut-off predicted for the whole dataset. When classifying a large public fungal ITS dataset, the UNITE database, against the barcode dataset, the local similarity cut-offs assigned fewer sequences than the traditional cut-offs used in metabarcoding studies. However, the obtained accuracy and precision were significantly improved.
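
The difference between a single global cut-off and clade-specific local cut-offs can be illustrated with a small sketch. This is not dnabarcoder's interface: the function, its arguments and any clade names or threshold values used with it are hypothetical, and the sketch assumes the best-matching reference sequence and its similarity score have already been computed.

```python
def assign_sequence(best_hit_clade, similarity, local_cutoffs, global_cutoff):
    """Accept the best hit's clade only if similarity meets that clade's cut-off.

    Clades with no predicted local cut-off fall back to the global one.
    Returns the clade name, or None if the sequence is left unassigned.
    """
    cutoff = local_cutoffs.get(best_hit_clade, global_cutoff)
    return best_hit_clade if similarity >= cutoff else None
```

With a stricter local cut-off for a clade whose marker varies little between species, a match that would pass the global threshold is left unassigned, which is consistent with the abstract's observation that local cut-offs assign fewer sequences.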

Genes ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 121
Ewelina Pośpiech ◽  
Paweł Teisseyre ◽  
Jan Mielniczuk ◽  
Wojciech Branicki

The idea of forensic DNA intelligence is to extract from genomic data any information that can help guide the investigation. The clues to the externally visible phenotype are of particular practical importance. The high heritability of the physical phenotype suggests that it can be easily predicted from genetic data, but so far this has become possible only for less polygenic traits. The forensic community has developed DNA-based predictive tools by employing a limited number of the most important markers analysed with targeted massive parallel sequencing. The complexity of the genetics of many other appearance phenotypes requires big data coupled with sophisticated machine learning methods to develop accurate genomic predictors. A significant challenge in developing universal genomic predictive methods will be the collection of sufficiently large data sets. These should be created using whole-genome sequencing technology to enable the identification of rare DNA variants implicated in phenotype determination. It is worth noting that the correctness of the forensic sketch generated from the DNA data depends on the inclusion of an age factor. This, however, can be predicted by analysing epigenetic data. An important limitation preventing whole-genome approaches from being commonly used in forensics is the slow progress in the development and implementation of high-throughput, low DNA input sequencing technologies. The example of palaeoanthropology suggests that such methods may be developed in forensics.

2022 ◽  
Alexandre Perez-Lebel ◽  
Gaël Varoquaux ◽  
Marine Le Morvan ◽  
Julie Josse ◽  
Jean-Baptiste Poline

BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative (rather than generative) modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS: Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain imaging one, a health survey and two intensive care ones. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values (with missing incorporated attribute) leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS: Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
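
The recommendation to add indicator columns when imputing can be sketched in a few lines. This minimal example uses mean imputation on a list-of-rows table with `None` marking a missing value; the function name and data layout are assumptions for illustration, and the benchmark itself evaluates more elaborate imputers and gradient-boosted trees that are not reproduced here.

```python
def impute_with_indicators(rows):
    """Mean-impute each column and append one 0/1 'was missing' flag per column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # A column with no observed values falls back to 0.0 in this sketch.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    out = []
    for r in rows:
        filled = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1 if r[j] is None else 0 for j in range(n_cols)]
        out.append(filled + flags)
    return out
```

The flag columns let a downstream model learn from the fact that a value was absent, which matters when missingness itself carries information about the outcome.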

Cancers ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 292
Antonella Ciabattoni ◽  
Fabiana Gregucci ◽  
Karen Llange ◽  
Marina Alessandro ◽  
Francesca Corazzi ◽  

In breast cancer, the use of a boost to the tumor bed can improve local control. The aim of this research is to evaluate the safety and efficacy of the boost with intra-operative electron radiotherapy (IOERT) in patients with early-stage breast cancer undergoing conservative surgery and postoperative whole breast irradiation (WBI). Data for this large retrospective multicenter study were collected between January 2011 and March 2018 in 8 Italian Radiation Oncology Departments. Acute and late toxicity, objective (obj) and subjective (subj) cosmetic outcomes, in-field local control (LC), out-field LC, disease-free survival (DFS) and overall survival (OS) were evaluated. Overall, 797 patients were enrolled. IOERT-boost was performed in all patients during surgery, followed by WBI. Acute toxicity (≥G2) occurred in 179 patients (22.46%); one patient developed surgical wound infection (G3). No patients reported late toxicity ≥ G2. Obj-cosmetic result was excellent in 45%, good in 35%, fair in 20% and poor in 0% of cases. Subj-cosmetic result was excellent in 10%, good in 20%, fair in 69% and poor in 0.3% of cases. Median follow-up was 57 months (range 12–109 months). At 5 years, in-field LC was 99.2% (95% CI: 98–99.7); out-field LC 98.9% (95% CI: 97.4–99.6); DFS 96.2% (95% CI: 94.2–97.6); OS 98.6% (95% CI: 97.2–99.3). In conclusion, IOERT-boost appears to be safe, providing excellent local control for early-stage breast cancer. The safety and long-term efficacy should encourage use of this treatment, with the potential to reduce local recurrence.
