structured data
Recently Published Documents


TOTAL DOCUMENTS

1671
(FIVE YEARS 475)

H-INDEX

46
(FIVE YEARS 6)

2022 ◽  
Author(s):  
Shaofei Qin ◽  
Xuan Zhang ◽  
Hongteng Xu ◽  
Yi Xu

Real-world 3D structured data like point clouds and skeletons can often be represented as data in a 3D rotation group (denoted as $\mathbb{SO}(3)$). However, most existing neural networks are tailored for data in Euclidean space, so 3D rotation data are not closed under their algebraic operations, which leads to sub-optimal performance in 3D-related learning tasks. To resolve this mismatch between data and model, we propose a novel non-real neuron model called the "quaternion product unit" (QPU) to represent data on 3D rotation groups. The proposed QPU leverages quaternion algebra and the law of the 3D rotation group, representing 3D rotation data as quaternions and merging them via a weighted chain of Hamilton products. We demonstrate that the QPU mathematically maintains the $\mathbb{SO}(3)$ structure of the 3D rotation data during the inference process and disentangles the 3D representations into "rotation-invariant" and "rotation-equivariant" features. Moreover, we design a fast QPU to accelerate the computation of the QPU. The fast QPU applies a tree-structured data indexing process and accordingly leverages the power of parallel computing, which reduces the computational complexity of the QPU in a single thread from $\mathcal{O}(N)$ to $\mathcal{O}(\log N)$. Taking the fast QPU as a basic module, we develop a series of quaternion neural networks (QNNs), including a quaternion multi-layer perceptron (QMLP), quaternion message passing (QMP), and so on. In addition, we make the QNNs compatible with conventional real-valued neural networks and applicable to both skeletons and point clouds. Experiments on synthetic and real-world 3D tasks show that the QNNs based on our fast QPUs are superior to state-of-the-art real-valued models, especially in scenarios requiring robustness to random rotations.
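As a rough illustration of the abstract's core idea, the weighted chain of Hamilton products and its tree-structured reduction can be sketched in plain Python. The angle-scaling quaternion power used for the weights is an assumption about the weighting scheme, and all function names are illustrative, not the authors'; because quaternion multiplication is associative, the tree reduction gives the same result as the sequential chain while having only logarithmic depth.

```python
import math

def hamilton(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    )

def qpower(q, w):
    """Weight a unit quaternion by scaling its rotation angle (q^w)."""
    qw, qx, qy, qz = q
    theta = math.acos(max(-1.0, min(1.0, qw)))
    norm = math.sqrt(qx*qx + qy*qy + qz*qz)
    if norm < 1e-12:                      # identity rotation: nothing to scale
        return (1.0, 0.0, 0.0, 0.0)
    ax, ay, az = qx/norm, qy/norm, qz/norm
    s = math.sin(w * theta)
    return (math.cos(w * theta), ax*s, ay*s, az*s)

def qpu_chain(qs, weights):
    """Naive O(N) weighted chain of Hamilton products."""
    out = (1.0, 0.0, 0.0, 0.0)
    for q, w in zip(qs, weights):
        out = hamilton(out, qpower(q, w))
    return out

def qpu_tree(qs, weights):
    """Tree-structured reduction: O(log N) depth when pairs run in parallel."""
    level = [qpower(q, w) for q, w in zip(qs, weights)]
    while len(level) > 1:
        nxt = [hamilton(level[i], level[i+1]) for i in range(0, len(level)-1, 2)]
        if len(level) % 2:                # carry an unpaired element upward
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Because every intermediate value is again a unit quaternion, the output stays on $\mathbb{SO}(3)$, which is the closure property the abstract emphasizes.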


2022 ◽  
Author(s):  
Zhaohui Pan ◽  
Zhibin Niu ◽  
Zumin Xian ◽  
Min Zhu

Abstract. Antiarch placoderms, the most basal jawed vertebrates, have the potential to shed light on the last common ancestor of jawed vertebrates. Quantitative study based on credible data is more convincing than qualitative study. To reveal the distribution of antiarchs in space and time, we created a comprehensive structured dataset of antiarchs comprising 64 genera and 6025 records. This dataset, which includes the associated chronological and geographic information, was digitized manually from academic publications into the DeepBone database. We implemented a paleogeographic map marker to visualize the biogeography of antiarchs. The comprehensive data on Antiarcha allow us to trace its biodiversity and rates of change throughout its duration. Structured data on antiarchs have tremendous research potential, including testing hypotheses about biodiversity changes, distribution, differentiation, and population and community composition. The data will also be easily accessible to other tools for generating new understanding of the evolution of early vertebrates. The data file described in this paper is available at https://doi.org/10.5281/zenodo.5639529 (Pan and Zhu, 2021).
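The kind of biodiversity curve the abstract mentions reduces to counting distinct genera per time bin over the structured records. A minimal sketch, assuming a hypothetical (genus, stage) record layout that is ours, not the DeepBone schema:

```python
from collections import defaultdict

def genus_diversity(records):
    """Count distinct genera per geological stage.

    Each record is a (genus, stage) pair; the layout is hypothetical,
    sketching the biodiversity tallies a structured dataset supports.
    """
    by_stage = defaultdict(set)
    for genus, stage in records:
        by_stage[stage].add(genus)      # sets deduplicate repeat occurrences
    return {stage: len(genera) for stage, genera in by_stage.items()}
```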


2022 ◽  
Vol 2022 ◽  
pp. 1-16
Author(s):  
Shu Chen ◽  
Junbo Xi ◽  
Yun Chen ◽  
Jinfan Zhao

Accidents of various types occur frequently in the construction of hydropower engineering projects, leading to significant numbers of casualties and economic losses. Identifying and eliminating near misses is a significant means of preventing accidents, and mining near-miss data can provide valuable information on how to mitigate and control hazards. However, most of the data generated in the construction of hydropower engineering projects are semi-structured text data without a unified standard of expression, so data association analysis is time-consuming and labor-intensive. Thus, an artificial intelligence (AI) automatic classification method based on a convolutional neural network (CNN) is adopted to obtain structured data on near-miss locations and near-miss types from safety records. The Apriori algorithm is then used to mine the associations between “locations” and “types” by scanning the structured data. The association results are visualized using a network diagram, and a Sankey diagram is used to reveal the information flow of specific near-miss objects under the “location ⟶ type” strong association rule. The proposed method combines text classification, association rules, and Sankey diagrams, providing a novel approach for mining semi-structured text. Moreover, the method proves useful and efficient for exploring near-miss distribution laws in hydropower engineering construction, reducing the possibility of accidents and improving the safety level of hydropower engineering construction sites.
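The “location ⟶ type” rule mining described above can be sketched with a one-pass support/confidence computation. Since each near-miss record here holds only a location item and a type item, a single counting pass over pairs stands in for full Apriori candidate generation; the `loc:`/`type:` item prefixes and thresholds are illustrative assumptions, not the paper's encoding.

```python
from collections import Counter
from itertools import combinations

def mine_rules(transactions, min_support=0.2, min_conf=0.6):
    """Mine 'location -> type' association rules from near-miss records.

    Each transaction is a set of items, e.g. {"loc:tunnel", "type:falling-object"}.
    Returns rules as (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue                     # Apriori pruning: infrequent itemsets yield no rules
        for ante, cons in ((a, b), (b, a)):
            conf = c / item_count[ante]  # confidence of ante -> cons
            if conf >= min_conf and ante.startswith("loc:") and cons.startswith("type:"):
                rules.append((ante, cons, support, conf))
    return rules
```

Strong rules found this way are exactly the edges one would feed into the network and Sankey visualizations.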


2022 ◽  
Vol 12 (1) ◽  
pp. 1-18
Author(s):  
Umamageswari Kumaresan ◽  
Kalpana Ramanujam

The intent of this research is to develop an automated web scraping system capable of extracting structured data records embedded in semi-structured web pages. Most automated extraction techniques in the literature capture a repeated pattern among a set of similarly structured web pages, deduce the template used to generate those pages, and then extract the data records. These techniques rely on computationally intensive operations such as string pattern matching or DOM tree matching, followed by manual labeling of the extracted data records. The technique discussed in this paper departs from the state-of-the-art approaches by locating informative sections in a web page through the repetition of informative content rather than syntactic structure. The experiments show that the system identifies the data-rich region with 100% precision for web sites belonging to different domains, and experiments conducted on real-world web sites demonstrate the effectiveness and versatility of the proposed approach.
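The idea of scoring page regions by content repetition rather than tag structure can be sketched as follows. This is a simplified stand-in for the paper's method: word-shingle overlap is our assumed repetition measure, and the segmentation into candidate regions is taken as given.

```python
from collections import Counter

def shingles(text, k=3):
    """Return the set of k-word shingles of a text block."""
    words = text.lower().split()
    return {" ".join(words[i:i+k]) for i in range(len(words) - k + 1)} or {text.lower()}

def repetition_score(region):
    """Score a candidate region by how often content patterns repeat
    across its blocks: listing records share phrasing (e.g. 'Price: ... USD'),
    while navigation chrome rarely repeats within one region."""
    counts = Counter()
    for block in region:
        counts.update(shingles(block))
    repeated = sum(c for c in counts.values() if c > 1)
    total = sum(counts.values())
    return repeated / total if total else 0.0

def data_rich_region(regions):
    """Pick the region (list of text blocks) with the highest repetition score."""
    return max(regions, key=repetition_score)
```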


Author(s):  
Anna E. Schorer ◽  
Richard Moldwin ◽  
Jacob Koskimaki ◽  
Elmer V. Bernstam ◽  
Neeta K. Venepalli ◽  
...  

PURPOSE The Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) requires eligible clinicians to report clinical quality measures (CQMs) in the Merit-Based Incentive Payment System (MIPS) to maximize reimbursement. To determine whether structured data in electronic health records (EHRs) were adequate to report MIPS CQMs, EHR data aggregated by ASCO's CancerLinQ platform were analyzed. MATERIALS AND METHODS Using the CancerLinQ health technology platform, 19 Oncology MIPS (oMIPS) CQMs were evaluated to determine the presence of the data elements (DEs) necessary to satisfy each CQM and the percentage of DEs populated with patient data (fill rates). At the time of this analysis, the CancerLinQ network comprised 63 active practices, representing eight different EHR vendors and containing records for more than 1.63 million unique patients with one or more malignant neoplasms (1.73 million cancer cases). RESULTS Fill rates for the 63 oMIPS-associated DEs varied widely among the practices. The average site had at least one filled DE for 52% of the DEs. Only 35% of the DEs were populated for at least one patient record in 95% of the practices. However, the average DE fill rate across all practices was 23%. No data were found at any practice for 22% of the DEs. Because any oMIPS CQM with an unpopulated DE component cannot be computed, only two (10.5%) of the 19 oMIPS CQMs were computable for more than 1% of the patients. CONCLUSION Although EHR systems had relatively high fill rates for some DEs, underfilling and inconsistency of DEs in EHRs render automated oncology MIPS CQM calculations impractical.
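The two quantities driving the study's results, per-DE fill rates and per-CQM computability under the "every component DE must be populated" rule, can be sketched directly. The record and measure layouts below are illustrative assumptions, not CancerLinQ's schema.

```python
def fill_rates(records, data_elements):
    """Fraction of patient records with a non-empty value for each data element."""
    n = len(records)
    return {de: sum(1 for r in records if r.get(de) not in (None, "")) / n
            for de in data_elements}

def computable_cqms(records, cqm_components):
    """A CQM is computable for a record only if every component DE is populated;
    return the fraction of records for which each measure can be computed."""
    n = len(records)
    return {cqm: sum(1 for r in records
                     if all(r.get(de) not in (None, "") for de in des)) / n
            for cqm, des in cqm_components.items()}
```

Note how a measure's computability can be far below the fill rate of any single component DE, which is the effect the abstract reports.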


2022 ◽  
Vol 28 (1) ◽  
pp. 146045822110580
Author(s):  
Mathias Kaspar ◽  
Georg Fette ◽  
Monika Hanke ◽  
Maximilian Ertl ◽  
Frank Puppe ◽  
...  

A deep integration of routine care and research remains challenging in many respects. We aimed to show the feasibility of an automated transformation and transfer process feeding deeply structured data with a high level of granularity, collected for a clinical prospective cohort study, from our hospital information system to the study’s electronic data capture system, while accounting for study-specific data and visits. We developed a system integrating all necessary software and organizational processes, which was then used in the study. The process and key system components are described together with descriptive statistics to show its feasibility in general and to identify individual challenges in particular. Data from 2051 patients enrolled between 2014 and 2020 were transferred. We were able to automate the transfer of approximately 11 million individual data values, representing 95% of all entered study data. These were recorded in n = 314 variables (28% of all variables), with some variables being used multiple times for follow-up visits. Our validation approach allowed for consistently good data quality over the course of the study. In conclusion, the automated transfer of multi-dimensional routine medical data from HIS to study databases using specific study data and visit structures is complex, yet viable.
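The core transformation step, mapping routine HIS values onto study variables within visit windows, can be sketched as below. The item codes, visit windows, and reject handling are hypothetical; they only illustrate the shape of such a pipeline, including the validation-style reporting of rows that fit no mapping or visit.

```python
def transform(his_rows, mapping, visits):
    """Map routine HIS values to study variables per visit window.

    mapping: HIS item code -> study variable name.
    visits:  visit name -> (start_day, end_day) relative to enrollment.
    Each HIS row: {"item": code, "value": v, "day": days_since_enrollment}.
    Rows without a mapped variable or matching visit are returned as rejects
    so they can be inspected, mimicking a validation step.
    """
    out, rejects = {}, []
    for row in his_rows:
        var = mapping.get(row["item"])
        visit = next((name for name, (lo, hi) in visits.items()
                      if lo <= row["day"] <= hi), None)
        if var is None or visit is None:
            rejects.append(row)
            continue
        out.setdefault(visit, {})[var] = row["value"]
    return out, rejects
```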


2022 ◽  
pp. 678-707
Author(s):  
Valliammal Narayan ◽  
Shanmugapriya D.

Information is vital for any organization that communicates through a network. The growth of internet utilization and the number of web users has increased cyber threats. Cyber-attacks change the traffic flow of each system in the network. Anomaly detection techniques have been developed for different types of cyber-attack and anomaly strategies. Conventional anomaly detection systems (ADS) protect information transferred through the network from cyber attackers. Machine- and deep-learning algorithms are applied in cyber-security for the stable prevention of anomalies. Big data solutions handle voluminous data in a short span of time. Big data management is the organization and manipulation of huge volumes of structured, semi-structured, and unstructured data, but it does not handle the data imbalance problem during the training process. Big data-based machine- and deep-learning algorithms for anomaly detection classify the decision boundary between normal traffic flow and anomalous traffic flow. Different algorithms can efficiently increase the performance of anomaly detection.
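As a minimal sketch of a decision boundary between normal and anomalous traffic, a statistical threshold detector can be fit on normal flows only. This z-score rule is a deliberately simple stand-in for the machine- and deep-learning models the chapter surveys; the feature (packets per second) and threshold are assumptions.

```python
import math

def fit(normal_flows):
    """Estimate mean and standard deviation of a traffic feature
    (e.g. packets per second) from normal flows only."""
    n = len(normal_flows)
    mu = sum(normal_flows) / n
    var = sum((x - mu) ** 2 for x in normal_flows) / n
    return mu, math.sqrt(var)

def is_anomaly(x, mu, sigma, threshold=3.0):
    """Decision boundary: flag a flow whose feature deviates more than
    `threshold` standard deviations from the normal-traffic mean."""
    return abs(x - mu) > threshold * sigma
```

Fitting on normal traffic only also sidesteps the class-imbalance problem the passage mentions, since no anomalous examples are needed for training.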

