large data sets Latest Research Papers

SGTools: a suite of tools for processing and analyzing large data sets from in situ X-ray scattering experiments

Journal of Applied Crystallography ◽

10.1107/s1600576721012267 ◽

2022 ◽

Vol 55 (1) ◽

Author(s):

Nie Zhao ◽

Chunming Yang ◽

Fenggang Bian ◽

Daoyou Guo ◽

Xiaoping Ouyang

Keyword(s):

Data Processing ◽

Small Angle ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

X Ray ◽

Intensity Mapping ◽

X Ray Scattering ◽

Ray Scattering

In situ synchrotron small-angle X-ray scattering (SAXS) is a powerful tool for studying dynamic processes during material preparation and application. The processing and analysis of large data sets generated from in situ X-ray scattering experiments are often tedious and time consuming. However, data processing software for in situ experiments is relatively rare, especially for grazing-incidence small-angle X-ray scattering (GISAXS). This article presents an open-source software suite (SGTools) to perform data processing and analysis for SAXS and GISAXS experiments. The processing modules in this software include (i) raw data calibration and background correction; (ii) data reduction by multiple methods; (iii) animation generation and intensity mapping for in situ X-ray scattering experiments; and (iv) further data analysis for the sample with an order degree and interface correlation. This article provides the main features and framework of SGTools. The workflow of the software is also elucidated to allow users to develop new features. Three examples are demonstrated to illustrate the use of SGTools for dealing with SAXS and GISAXS data. Finally, the limitations and future features of the software are also discussed.

COMPARING DIFFERENT TECHNIQUES USED BY COMPANIES INTERGRATING AI IN LITIGATION Kevin Muriithi Mirera

10.31234/osf.io/nzy86 ◽

2022 ◽

Author(s):

Kevin Muriithi Mirera

Keyword(s):

Artificial Intelligence ◽

Data Mining ◽

Language Processing ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Plain Text ◽

Learning Techniques ◽

Important Field ◽

Case Characteristics

Data mining is a way to extract knowledge from generally large data sets; in other words, it is an approach to discovering hidden relationships among data by using artificial intelligence methods. This has made it an important field in research. Law is one of the most important fields for applying data mining, given the plethora of data available, from law-case stenographer data to lawsuit data. Text summarization in natural language processing (NLP) is the process of condensing the information in large texts for quicker consumption, and it is an extremely useful technique. Identifying law case characteristics is the first step for developing further analysis. This paper discusses an approach based on data mining techniques to extract important entities from law cases that are written in plain text. The process involves different artificial intelligence techniques, including clustering and other unsupervised or supervised learning techniques.

Predicting Physical Appearance from DNA Data—Towards Genomic Solutions

Genes ◽

10.3390/genes13010121 ◽

2022 ◽

Vol 13 (1) ◽

pp. 121

Author(s):

Ewelina Pośpiech ◽

Paweł Teisseyre ◽

Jan Mielniczuk ◽

Wojciech Branicki

Keyword(s):

Practical Importance ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Whole Genome ◽

Forensic Dna ◽

Sequencing Technologies ◽

Age Factor ◽

High Heritability ◽

Slow Progress

The idea of forensic DNA intelligence is to extract from genomic data any information that can help guide the investigation. The clues to the externally visible phenotype are of particular practical importance. The high heritability of the physical phenotype suggests that it can be easily predicted from genetic data, but this has only become possible for less polygenic traits. The forensic community has developed DNA-based predictive tools by employing a limited number of the most important markers analysed with targeted massively parallel sequencing. The complexity of the genetics of many other appearance phenotypes requires big data coupled with sophisticated machine learning methods to develop accurate genomic predictors. A significant challenge in developing universal genomic predictive methods will be the collection of sufficiently large data sets. These should be created using whole-genome sequencing technology to enable the identification of rare DNA variants implicated in phenotype determination. It is worth noting that the correctness of the forensic sketch generated from the DNA data depends on the inclusion of an age factor. This, however, can be predicted by analysing epigenetic data. An important limitation preventing whole-genome approaches from being commonly used in forensics is the slow progress in the development and implementation of high-throughput, low DNA input sequencing technologies. The example of palaeoanthropology suggests that such methods may be developed in forensics.

Deep Learning Approaches for Sentiment Analysis Challenges and Future Issues

10.4018/978-1-7998-8161-2.ch003 ◽

2022 ◽

pp. 27-50

Author(s):

Rajalaxmi Prabhu B. ◽

Seema S.

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Model Building ◽

Large Data ◽

Machine Learning Algorithms ◽

Large Data Sets ◽

Data Sets ◽

Learning Approaches ◽

Learning Techniques ◽

Important Challenge

A lot of user-generated data is available these days from huge platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms have been implemented to check the opinions in large data sets, and a lot of research has been undertaken to understand machine learning approaches to sentiment analysis. Machine learning depends mainly on the data required for model building; hence, suitable feature extraction techniques also need to be carried out. This chapter addresses several deep learning approaches, their challenges, and future issues. Deep learning techniques are considered important in predicting the sentiments of users. The chapter aims to analyze deep learning techniques for predicting sentiments and to explain the importance of several approaches for mining opinions and determining sentiment polarity.

K-Prototype Algorithm for Clustering Large Data Sets with Categorical Values to Established Product Segmentation

Proceedings of Data Analytics and Management - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-981-16-6289-8_29 ◽

2022 ◽

pp. 343-353

Author(s):

Ritu Punhani ◽

V. P. S. Arora ◽

A. Sai Sabitha

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Product Segmentation

Big Data Analytics and IoT in Smart City Applications

10.4018/978-1-6684-3662-2.ch017 ◽

2022 ◽

pp. 364-380

Author(s):

Mamata Rath

Keyword(s):

Big Data ◽

Smart City ◽

Data Analytics ◽

Big Data Analytics ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Business Information ◽

Data Querying ◽

Data Elements

Big data analytics is a sophisticated approach for the fusion of large data sets that include a collection of data elements, in order to expose hidden patterns, undetected associations, business logic, client inclinations, and other helpful business information. Big data analytics involves challenging techniques to mine and extract relevant data, which include accessing a database, effectively mining the data, and querying and inspecting data, all committed to enhancing the technical execution of various task segments. The capacity to synthesize large amounts of data can enable an organization to manage data that can influence the business.

Max stable set problem to found the initial centroids in clustering problem

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v25.i1.pp569-579 ◽

2022 ◽

Vol 25 (1) ◽

pp. 569

Author(s):

Awatif Karim ◽

Chakir Loqman ◽

Youssef Hami ◽

Jaouad Boumhidi

Keyword(s):

Document Clustering ◽

Large Data ◽

Hopfield Network ◽

Large Data Sets ◽

Stable Set ◽

Data Sets ◽

Clustering Problem ◽

Text Document ◽

Stable Set Problem

In this paper, we propose a new approach to solving the document-clustering problem using the K-Means algorithm. K-Means is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and to use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and in clustering, all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much faster, and provides better clustering quality than other methods.
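The idea of seeding K-Means from a stable set can be illustrated with a minimal sketch. This is not the authors' KM_MSSP method: it assumes 2-D points instead of text document vectors, and it replaces the continuous Hopfield network with a simple greedy minimum-degree heuristic that only approximates a maximum stable set. The function names and the `radius` threshold are hypothetical and chosen for illustration.

```python
import math


def greedy_stable_set(points, radius):
    """Greedy approximation of a maximum stable set (assumption: this stands
    in for the Hopfield-network solver described in the abstract).

    Two points are linked when they lie closer than `radius`, so the
    returned indices refer to points that are pairwise at least `radius` apart.
    """
    n = len(points)
    neighbours = {
        i: {j for j in range(n)
            if j != i and math.dist(points[i], points[j]) < radius}
        for i in range(n)
    }
    remaining = set(range(n))
    chosen = []
    while remaining:
        # Pick the node with the fewest remaining neighbours (ties by index).
        node = min(remaining, key=lambda i: (len(neighbours[i] & remaining), i))
        chosen.append(node)
        # Remove the node and everything adjacent to it.
        remaining -= neighbours[node] | {node}
    return chosen


def k_means(points, centroids, iterations=100):
    """Plain Lloyd iteration started from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old
        # centroid if a cluster happens to be empty.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p) for p in points]
    return centroids, labels


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
seeds = greedy_stable_set(points, radius=1.0)
centroids, labels = k_means(points, [points[i] for i in seeds[:2]])
```

Because the seeds are mutually non-adjacent in the similarity graph, they start in different dense regions, which is the property the abstract relies on to avoid poor random initializations. Taking the first k members of the stable set is a simplification made for this sketch.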

The Mother’s Role in Mother-Child Relationship Among Adolescents with Unwanted Pregnancies

‘Abqari Journal ◽

10.33102/abqari.vol25no2.462 ◽

2021 ◽

Vol 25 (2) ◽

pp. 1-20

Author(s):

Noradila Mohamed Faudzi ◽

Melati Sumari ◽

Azmawaty Mohamad Nor ◽

Norhafisah Abd Rahman

Keyword(s):

Family Structure ◽

Thematic Analysis ◽

Large Data ◽

Phenomenological Approach ◽

Large Data Sets ◽

Data Sets ◽

Pregnant Adolescents ◽

Child Relationship ◽

Unwanted Pregnancies ◽

Mother Child Relationship

The mother’s role is essential in an adolescent’s development due to the challenges of life and exposure to the outside world, which affect and constantly change the mother’s role. This study explores the experiences of the mother’s role in the mother-child relationship among adolescents with unwanted pregnancies. A phenomenological approach was employed to obtain the essence of the experiences. A total of 10 participants, comprising five pregnant adolescents and their mothers, were interviewed to understand the role played by the adolescents’ mothers during the pregnancy. A diary was distributed among the adolescents to allow them to externalise and express the experiences that they had with their mothers while pregnant. This study used thematic analysis because it is flexible in interpreting the data and makes large data sets easier to approach by sorting them into broad themes. Five themes emerged: (a) supervising and monitoring, (b) rules and regulations, (c) showing affection, (d) educating adolescents, and (e) giving encouragement and support. This study provided insights into the mothers’ struggles in raising their adolescents, highlighted from two perspectives: adolescents and mothers. The findings revealed the challenges faced by mothers across various types of family structure.

Detecting Unbiased Associations in Large Data Sets

Big Data ◽

10.1089/big.2021.0193 ◽

2021 ◽

Author(s):

Chuanlu Liu ◽

Shuliang Wang ◽

Hanning Yuan ◽

Xiaojia Liu

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets

A system for analyzing large data sets using machine learning algorithms

Bulletin of Kharkov National Automobile and Highway University ◽

10.30977/bul.2219-5548.2021.94.0.142 ◽

2021 ◽

pp. 142

Author(s):

Sergey Pronin ◽

Mykhailo Miroshnichenko

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Large Data ◽

Machine Learning Algorithms ◽

Large Data Sets ◽

Data Sets

