large data
Recently Published Documents

Total documents: 4819 (five years: 1748)
H-index: 81 (five years: 14)

2022 ◽  
Vol 11 (3) ◽  
pp. 0-0

The emergence of big data brings new challenges for the sorting strategies used to analyze the data, since most analysis techniques rely on sorting as an implicit step. The availability of huge data sets has changed the way data are analyzed across industries, and healthcare is one notable area where such analytics is making big changes: efficient analysis has the potential to reduce treatment costs and improve quality of life in general, and healthcare providers collecting massive amounts of data are looking for the best strategies to use them. This research proposes a novel non-comparison-based approach to sorting large data sets that can then be utilized by any big-data analytical technique.
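The abstract does not spell out the proposed algorithm, but non-comparison sorting is a well-defined family; as a minimal, hedged illustration of that family (not the authors' method), here is a least-significant-digit radix sort in Python:

```python
# Minimal sketch of a non-comparison sort (LSD radix sort on non-negative
# integers). A classic member of the family the paper targets, not the
# authors' algorithm, which the abstract does not specify.

def radix_sort(values, base=10):
    """Sort non-negative integers without element-to-element comparisons."""
    if not values:
        return []
    result = list(values)
    digit = 1
    max_value = max(result)
    while digit <= max_value:
        # Stable counting pass on the current digit.
        buckets = [[] for _ in range(base)]
        for v in result:
            buckets[(v // digit) % base].append(v)
        result = [v for bucket in buckets for v in bucket]
        digit *= base
    return result

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Because it never compares two elements directly, its running time scales with the number of items and digits rather than the n log n bound of comparison sorts, which is the property that makes the family attractive for large data.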


2022 ◽  
Vol 55 (1) ◽  
Author(s):  
Nie Zhao ◽  
Chunming Yang ◽  
Fenggang Bian ◽  
Daoyou Guo ◽  
Xiaoping Ouyang

In situ synchrotron small-angle X-ray scattering (SAXS) is a powerful tool for studying dynamic processes during material preparation and application. The processing and analysis of the large data sets generated by in situ X-ray scattering experiments are often tedious and time consuming, yet data processing software for in situ experiments is relatively rare, especially for grazing-incidence small-angle X-ray scattering (GISAXS). This article presents an open-source software suite (SGTools) to perform data processing and analysis for SAXS and GISAXS experiments. The processing modules in this software include (i) raw data calibration and background correction; (ii) data reduction by multiple methods; (iii) animation generation and intensity mapping for in situ X-ray scattering experiments; and (iv) further data analysis for samples with a degree of order and interface correlation. This article provides the main features and framework of SGTools, and the workflow of the software is elucidated to allow users to develop new features. Three examples illustrate the use of SGTools for dealing with SAXS and GISAXS data. Finally, the limitations and planned future features of the software are discussed.
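SGTools itself is not reproduced here; as a hedged sketch of the kind of reduction step module (ii) performs, the following Python function azimuthally averages a 2D detector image into a 1D I(q) profile. The geometry values (pixel size, sample-detector distance, wavelength) are illustrative assumptions, not SGTools defaults:

```python
# Hedged sketch of a SAXS data-reduction step: azimuthal (radial) averaging
# of a 2D detector image into a 1D intensity-vs-q curve. All geometry
# parameters below are assumed example values.
import numpy as np

def radial_average(image, center, pixel_size=172e-6, distance=2.0,
                   wavelength=1.24e-10, n_bins=200):
    ny, nx = image.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(x - center[0], y - center[1]) * pixel_size  # metres
    theta = 0.5 * np.arctan(r / distance)                    # half scattering angle
    q = 4 * np.pi * np.sin(theta) / wavelength               # momentum transfer, m^-1
    bins = np.linspace(0, q.max(), n_bins + 1)
    which = np.digitize(q.ravel(), bins) - 1
    counts = np.bincount(which, minlength=n_bins)[:n_bins]
    sums = np.bincount(which, weights=image.ravel(), minlength=n_bins)[:n_bins]
    intensity = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    q_centers = 0.5 * (bins[:-1] + bins[1:])
    return q_centers, intensity
```

A real reduction pipeline would also apply the calibration and background correction of module (i) before this step.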


MAUSAM ◽  
2022 ◽  
Vol 53 (1) ◽  
pp. 31-44
Author(s):  
Y. E. A. RAJ ◽  
P. V. SANKARAN ◽  
B. RAMAKRISHNAN ◽  
P. L. PADMAKUMAR

Several sea-breeze parameters, such as time of onset, withdrawal, duration, depth, variation with height and direction, have been derived and studied for the Chennai city and Chennai AP observatories in this study, based on a large database for the period March-October, 1969-83. Monthly and sub-monthly values of several sea-breeze parameters have been derived. By invoking the concept of superposed epoch analysis, the important role played by the sea breeze in modulating the diurnal variation of surface temperature and relative humidity has been established. The sea breeze at Chennai has been shown to be shallow, with a depth of under 1 km. Modal directions of the sea breeze and its normal speed have also been derived.
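Superposed epoch analysis amounts to aligning each day's record on a key event, here the sea-breeze onset, and averaging across days so the onset-related signal emerges from the noise. A minimal Python sketch, with synthetic hourly temperatures and hypothetical onset hours, could look like this:

```python
# Minimal sketch of superposed epoch analysis: composite daily hourly
# temperature series around each day's sea-breeze onset hour. All data
# below are synthetic stand-ins for the station records used in the paper.
import numpy as np

def superposed_epoch(series_by_day, onset_index_by_day, window=6):
    """Average each day's series over onset_index +/- window hours."""
    composites = []
    for series, onset in zip(series_by_day, onset_index_by_day):
        lo, hi = onset - window, onset + window + 1
        if lo >= 0 and hi <= len(series):
            composites.append(series[lo:hi])
    return np.mean(composites, axis=0)  # mean curve; epoch 0 = onset

# Synthetic example: 30 days of hourly temperature with onset near noon.
rng = np.random.default_rng(0)
days = [25 + 5 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 0.5, 24)
        for _ in range(30)]
onsets = rng.integers(11, 14, size=30)
print(superposed_epoch(days, onsets))
```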


2022 ◽  
Author(s):  
Kevin Muriithi Mirera

Data mining is a way to extract knowledge from generally large data sets; in other words, it is an approach to discovering hidden relationships among data using artificial intelligence methods, which has made it an important field of research. Law is one of the most important fields for applying data mining, given the plethora of data available, from stenographer records of law cases to lawsuit data. Text summarization in natural language processing (NLP) condenses the information in large texts for quicker consumption and is an extremely useful technique. Identifying law-case characteristics is the first step toward further analysis. This paper discusses an approach based on data mining techniques to extract important entities from law cases written in plain text. The process involves artificial intelligence techniques including clustering and other unsupervised or supervised learning methods, as sketched below.
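As a hedged sketch of the unsupervised step only, the following groups a few hypothetical plain-text cases using TF-IDF features and k-means; a real pipeline of the kind the paper describes would add named-entity recognition for the entity-extraction stage:

```python
# Hedged sketch: cluster plain-text law cases with TF-IDF + k-means.
# The sample documents are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

cases = [
    "The plaintiff alleges breach of contract over the supply agreement.",
    "Defendant convicted of theft; sentence reduced on appeal.",
    "Contract dispute concerning late delivery and liquidated damages.",
    "Appeal against conviction for burglary dismissed by the court.",
]

# Turn each case into a sparse TF-IDF vector, ignoring English stop words.
features = TfidfVectorizer(stop_words="english").fit_transform(cases)

# Group the cases into two clusters (e.g. contract vs criminal matters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(kmeans.labels_)  # e.g. [0 1 0 1]
```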


2022 ◽  
pp. 1-47
Author(s):  
Mohammad Mohammadi ◽  
Peter Tino ◽  
Kerstin Bunte

Abstract The presence of manifolds is a common assumption in many applications, including astronomy and computer vision. For instance, in astronomy, low-dimensional stellar structures, such as streams, shells and globular clusters, can be found in the neighborhood of big galaxies such as the Milky Way. Since these structures are often buried in very large data sets, an algorithm that can not only recover the manifold but also remove the background noise (or outliers) is highly desirable. Other works try to recover manifolds either by pushing all points toward the manifolds or by downsampling from dense regions; each targets only one of these problems and generally fails to suppress noise on the manifolds and remove background noise simultaneously. Inspired by the collective behavior of biological ants in the food-seeking process, we propose a new algorithm that employs several random walkers equipped with a local alignment measure to detect and denoise manifolds. During the walking process, the agents release pheromone on data points, which reinforces future movements. Over time the pheromone concentrates on the manifolds, while it fades in the background noise due to an evaporation procedure. We use the Markov chain (MC) framework to provide a theoretical analysis of the convergence of the algorithm and its performance. Moreover, an empirical analysis, based on synthetic and real-world data sets, demonstrates its applicability in different areas, such as improving the performance of t-distributed stochastic neighbor embedding (t-SNE) and spectral clustering using the underlying MC formulas, recovering astronomical low-dimensional structures, and improving the performance of the fast Parzen window density estimator.
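The full algorithm, including the local alignment measure, is not reproduced in the abstract; a minimal sketch of just the pheromone mechanism (random walkers on a k-nearest-neighbour graph, with deposition and evaporation) might look like this in Python:

```python
# Hedged sketch of the pheromone mechanism only: walkers hop between
# k-nearest neighbours, depositing pheromone that evaporates each step,
# so pheromone accumulates on densely connected (manifold) points and
# fades on isolated background noise. The authors' local alignment
# measure, which biases walks along the manifold, is omitted here.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pheromone_scores(X, k=8, n_walkers=50, n_steps=200,
                     deposit=1.0, evaporation=0.02, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self-neighbour
    pheromone = np.zeros(len(X))
    positions = rng.integers(0, len(X), size=n_walkers)
    for _ in range(n_steps):
        pheromone *= (1.0 - evaporation)                  # evaporation
        np.add.at(pheromone, positions, deposit)          # deposition
        # each walker hops to a uniformly random neighbour of its point
        positions = idx[positions, rng.integers(0, k, size=n_walkers)]
    return pheromone  # high score: likely on-manifold; low score: noise
```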


Author(s):  
Duong Vu ◽  
Henrik Nilsson ◽  
Gerard Verkley

Achieving accuracy and precision in fungal molecular identification and classification is challenging, particularly in environmental metabarcoding approaches, as these often trade accuracy for efficiency given the large data volumes at hand. In most ecological studies, only a single similarity cut-off value is used for sequence identification. This is not sufficient, since the most commonly used DNA markers are known to vary widely in inter- and intra-specific variability. We address this problem by presenting a new tool, dnabarcoder, to analyze and predict different local similarity cut-offs for sequence identification for different clades of fungi. For each similarity cut-off in a clade, a confidence measure is computed to evaluate the resolving power of the genetic marker in that clade. Experimental results showed that, when analyzing a recently released filamentous fungal ITS DNA barcode dataset of CBS strains from the Westerdijk Fungal Biodiversity Institute, the predicted local similarity cut-offs varied immensely between the clades of the dataset. In addition, most of them had a higher confidence measure than the global similarity cut-off predicted for the whole dataset. When classifying a large public fungal ITS dataset (the UNITE database) against the barcode dataset, the local similarity cut-offs assigned fewer sequences than the traditional cut-offs used in metabarcoding studies, but the obtained accuracy and precision were significantly improved.
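dnabarcoder's actual procedure and confidence measure are not shown here; as a hedged illustration of the underlying idea, the following picks, for one clade, the pairwise-similarity threshold that best separates same-species from different-species sequence pairs (scored by F1, on hypothetical data):

```python
# Hedged sketch: grid-search the similarity cut-off that best separates
# same-species from different-species pairs within one clade. Scoring by
# F1 is an illustrative choice, not dnabarcoder's method.
import numpy as np

def best_cutoff(similarities, same_species, grid=None):
    similarities = np.asarray(similarities)
    same_species = np.asarray(same_species, dtype=bool)
    grid = np.linspace(0.80, 1.00, 201) if grid is None else grid
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        predicted = similarities >= t          # predict "same species"
        tp = np.sum(predicted & same_species)
        fp = np.sum(predicted & ~same_species)
        fn = np.sum(~predicted & same_species)
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical pairwise identities within one clade:
sims = [0.99, 0.985, 0.97, 0.93, 0.91, 0.88]
labels = [1, 1, 1, 0, 0, 0]
print(best_cutoff(sims, labels))
```

Running this per clade, rather than once globally, mirrors why the predicted local cut-offs in the paper differ so much between clades.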


Genes ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 121
Author(s):  
Ewelina Pośpiech ◽  
Paweł Teisseyre ◽  
Jan Mielniczuk ◽  
Wojciech Branicki

The idea of forensic DNA intelligence is to extract from genomic data any information that can help guide an investigation. Clues to the externally visible phenotype are of particular practical importance. The high heritability of physical phenotypes suggests that they could be predicted from genetic data, but so far this has only been possible for less polygenic traits. The forensic community has developed DNA-based predictive tools employing a limited number of the most important markers analysed with targeted massively parallel sequencing. The complexity of the genetics of many other appearance phenotypes requires big data coupled with sophisticated machine-learning methods to develop accurate genomic predictors. A significant challenge in developing universal genomic predictive methods will be the collection of sufficiently large data sets. These should be created using whole-genome sequencing technology to enable the identification of rare DNA variants implicated in phenotype determination. It is worth noting that the correctness of a forensic sketch generated from DNA data depends on the inclusion of an age factor, which can, however, be predicted by analysing epigenetic data. An important limitation preventing whole-genome approaches from being commonly used in forensics is the slow progress in the development and implementation of high-throughput, low-DNA-input sequencing technologies. The example of palaeoanthropology suggests that such methods may well be developed in forensics.
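As a hedged toy illustration of the kind of marker-panel predictor discussed above (entirely synthetic data, not a validated forensic tool such as HIrisPlex-S), a logistic model over a small panel of SNP genotype dosages might look like this:

```python
# Toy sketch of a DNA-based phenotype predictor: logistic regression over
# SNP genotype dosages (0/1/2 copies of the effect allele) for a binary
# appearance trait. Genotypes, weights and the trait are all simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_samples, n_snps = 500, 20
genotypes = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
true_weights = rng.normal(0, 1, n_snps)
trait = (genotypes @ true_weights + rng.normal(0, 1, n_samples)) > 0

model = LogisticRegression(max_iter=1000).fit(genotypes, trait)
print("training accuracy:", model.score(genotypes, trait))
```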


2022 ◽  
Author(s):  
Alexandre Perez-Lebel ◽  
Gaël Varoquaux ◽  
Marine Le Morvan ◽  
Julie Josse ◽  
Jean-Baptiste Poline

BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to training machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative (rather than generative) modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics.

RESULTS: Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain-imaging dataset, a health survey and two intensive-care datasets. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning, investigating both prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values (via the missing-incorporated-in-attribute criterion) leads to robust, fast and well-performing predictive modeling.

CONCLUSIONS: Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
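As a minimal sketch of the two strategies compared (using scikit-learn, not the paper's benchmark code), the following contrasts gradient-boosted trees with native NaN handling against mean imputation with added indicator columns, on synthetic data:

```python
# Hedged sketch of the two missing-values strategies the benchmark compares.
# HistGradientBoostingClassifier routes NaN natively during split finding;
# the pipeline variant imputes first and appends indicator columns marking
# which values were imputed. Data below are synthetic.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan          # knock out 20% of the values

# Strategy 1: native missing-value support in the trees.
native = HistGradientBoostingClassifier().fit(X, y)

# Strategy 2: mean imputation plus indicator columns, then the same trees.
imputed = make_pipeline(
    SimpleImputer(strategy="mean", add_indicator=True),
    HistGradientBoostingClassifier(),
).fit(X, y)

print(native.score(X, y), imputed.score(X, y))
```

The `add_indicator=True` flag is what supplies the "which values were imputed" columns the paper finds important under data missing not at random.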


Cancers ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 292
Author(s):  
Antonella Ciabattoni ◽  
Fabiana Gregucci ◽  
Karen Llange ◽  
Marina Alessandro ◽  
Francesca Corazzi ◽  
...  

In breast cancer, the use of a boost to the tumor bed can improve local control. The aim of this research is to evaluate the safety and efficacy of a boost delivered with intra-operative electron radiotherapy (IOERT) in patients with early-stage breast cancer undergoing conservative surgery and postoperative whole-breast irradiation (WBI). The present retrospective multicenter study collected a large data set between January 2011 and March 2018 in eight Italian radiation oncology departments. Acute and late toxicity, objective (obj) and subjective (subj) cosmetic outcomes, in-field local control (LC), out-field LC, disease-free survival (DFS) and overall survival (OS) were evaluated. Overall, 797 patients were enrolled. The IOERT boost was performed in all patients during surgery, followed by WBI. Acute toxicity (≥G2) occurred in 179 patients (22.46%); one patient developed a surgical wound infection (G3). No patient reported late toxicity ≥ G2. The obj-cosmetic result was excellent in 45%, good in 35%, fair in 20% and poor in 0% of cases; the subj-cosmetic result was excellent in 10%, good in 20%, fair in 69% and poor in 0.3% of cases. Median follow-up was 57 months (range 12-109 months). At 5 years, in-field LC was 99.2% (95% CI: 98-99.7), out-field LC 98.9% (95% CI: 97.4-99.6), DFS 96.2% (95% CI: 94.2-97.6) and OS 98.6% (95% CI: 97.2-99.3). In conclusion, the IOERT boost appears to be safe and provides excellent local control in early-stage breast cancer; these safety and long-term efficacy results should encourage use of this treatment, which has the potential to reduce local recurrence.

