Precise clustering analysis of Internet financial credit reporting dependent on multidimensional attribute sparse large data

Because traditional methods cannot cluster Internet financial credit reporting data directly and effectively, this paper proposes a precise clustering analysis method based on sparse large data with multidimensional attributes. The overall distance between credit reporting records is measured over the sparse multidimensional-attribute data, and clustering analysis is then performed separately on the overall distance matrix and on the component approximate distance matrix between the data. The correlations between credit reporting records under these two perspectives are considered together: a comprehensive relationship matrix that reflects the original Internet financial credit reporting data is used to achieve relatively high-quality clustering. Numerical experiments show that, compared with traditional clustering methods, the proposed method not only reflects the overall features of the data effectively but also improves the clustering of the original credit reporting data by analyzing the correlations between the important component attribute sequences.
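
The idea of clustering on a combination of an overall distance matrix and a component distance matrix can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `records`, the choice of attributes 0 and 2 as the "important components", the equal-weight blend in `combine`, and the use of average-linkage agglomerative clustering.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(rows, dims=None):
    """Pairwise Euclidean distances, optionally restricted to the attribute indices in dims."""
    if dims is not None:
        rows = [[r[d] for d in dims] for r in rows]
    n = len(rows)
    return [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def normalise(matrix):
    """Scale a distance matrix so its largest entry is 1 (left unchanged if all entries are 0)."""
    peak = max(max(row) for row in matrix) or 1.0
    return [[v / peak for v in row] for row in matrix]

def combine(overall, component, weight=0.5):
    """Weighted blend of the two views; the weight is a hypothetical tuning parameter."""
    return [[weight * o + (1 - weight) * c for o, c in zip(ro, rc)]
            for ro, rc in zip(overall, component)]

def average_linkage(dist, k):
    """Merge the two closest clusters (by mean pairwise distance) until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy records with mostly-zero attributes (hypothetical data).
records = [
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 1.1, 0.0],
    [0.0, 0.2, 0.9, 0.0],
    [5.0, 0.0, 4.0, 0.0],
    [5.1, 0.0, 4.2, 0.0],
]
overall = normalise(distance_matrix(records))                 # all attributes
component = normalise(distance_matrix(records, dims=[0, 2]))  # assumed key attributes
clusters = average_linkage(combine(overall, component), k=2)
print(clusters)
```

Normalising both matrices before blending keeps either view from dominating purely because of its scale; a different weighting or linkage rule would be an equally valid choice for this sketch.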

Prevention of Internet Financial Risks in the Era of Digital Economy

Proceedings of Business and Economic Studies ◽

10.26689/pbes.v4i5.2649 ◽

2021 ◽

Vol 4 (5) ◽

pp. 38-44

Author(s):

Yan Wang

Keyword(s):

Big Data ◽

Digital Economy ◽

The Internet ◽

Financial Risks ◽

Financial Industry ◽

Widespread Application ◽

Credit Reporting ◽

Financial Credit ◽

Internet Finance ◽

The Times

As a pillar of China’s economic development, the financial industry plays a key role in the production and daily life of residents. With the widespread application of the internet, internet finance has emerged, and as the collection and extraction of big data have become possible, related analysis and exploration technologies have received growing emphasis. However, in the context of big data technology, internet finance still faces risks from unsound laws, inadequate business publicity, user information security, and capital liquidity. This article discusses these risks in the digital economy era and argues that they should be prevented by establishing a sound internet financial system, strengthening inter-industry exchange and cooperation, building a unified internet financial information supervision platform, and optimizing the internet financial credit reporting system, so as to promote the healthy and sound development of the whole financial industry.

clusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences

10.1101/2021.02.22.432291 ◽

2021 ◽

Author(s):

Sebastiaan Valkiers ◽

Max Van Houcke ◽

Kris Laukens ◽

Pieter Meysman

Keyword(s):

T Cell ◽

Large Data ◽

Cell Receptor ◽

Amino Acid Sequences ◽

Large Data Sets ◽

Data Sets ◽

Clustering Methods ◽

Link Type ◽

Large Sets ◽

Similar Accuracy

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. The rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To address this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).
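
To illustrate how hashing can avoid all-against-all comparison when grouping similar sequences, here is a minimal sketch. It is not the clusTCR API and is not claimed to be the authors' algorithm: the function names, the example CDR3 strings, and the rule "link sequences that differ at no more than one position" are all assumptions made for illustration.

```python
from collections import defaultdict

def hamming1_pairs(seqs):
    """Find index pairs of equal-length sequences differing at <= 1 position.

    Each sequence is hashed once per position with that position removed;
    two sequences that share a (position, remainder) key can only differ there.
    """
    buckets = defaultdict(list)
    for idx, s in enumerate(seqs):
        for pos in range(len(s)):
            buckets[(pos, s[:pos] + s[pos + 1:])].append(idx)
    pairs = set()
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                pairs.add((members[a], members[b]))
    return pairs

def cluster(seqs):
    """Connected components of the 'differs at <= 1 position' graph, via union-find."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in hamming1_pairs(seqs):
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i, s in enumerate(seqs):
        groups[find(i)].append(s)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical CDR3 amino acid sequences.
cdr3 = ["CASSLGQYEQYF", "CASSLGQFEQYF", "CASSLGAYEQYF", "CASSPGTGELFF"]
clusters = cluster(cdr3)
print(clusters)
```

The cost of building the buckets grows with the total number of residues rather than with the square of the number of sequences, which is the property that makes hashing attractive for large repertoires.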

Applying Security to a Big Stream Cloud Architecture for the Internet of Things

Securing the Internet of Things ◽

10.4018/978-1-5225-9866-4.ch057 ◽

2020 ◽

pp. 1260-1284

Author(s):

Laura Belli ◽

Simone Cirani ◽

Luca Davoli ◽

Gianluigi Ferrari ◽

Lorenzo Melegari ◽

...

Keyword(s):

Internet Of Things ◽

Real Time ◽

Large Data ◽

The Internet ◽

Low Latency ◽

Stream Data ◽

Memory Processing ◽

Cloud Architecture ◽

Iot Devices ◽

The Internet Of Things

The Internet of Things (IoT) is expected to interconnect billions (around 50 billion by 2020) of heterogeneous sensor/actuator-equipped devices denoted as “Smart Objects” (SOs), characterized by constrained resources in terms of memory, processing, and communication reliability. Several IoT applications have real-time and low-latency requirements and must rely on architectures specifically designed to manage gigantic streams of information (in terms of number of data sources and transmission data rate). We refer to “Big Stream” as the paradigm which best fits the selected IoT scenario, in contrast to the traditional “Big Data” concept, which does not consider real-time constraints. Moreover, there are many security concerns related to IoT devices and to the Cloud. In this paper, we analyze security aspects in a novel Cloud architecture for Big Stream applications, which efficiently handles Big Stream data through a Graph-based platform and delivers processed data to consumers with low latency. The authors detail each module defined in the system architecture, describing all refinements required to make the platform able to secure large data streams. An experiment is also conducted to evaluate the performance of the proposed architecture when integrating security mechanisms.

Information Compression

Web-Based Supply Chain Management and Digital Signal Processing ◽

10.4018/978-1-60566-888-8.ch008 ◽

2010 ◽

pp. 97-108

Author(s):

Manjunath Ramachandra

Keyword(s):

Supply Chain ◽

Data Compression ◽

Human Perception ◽

Large Data ◽

Multimedia Data ◽

The Internet ◽

Information Compression ◽

Separate Section ◽

The Web ◽

Available Resources

If large data transactions are to happen in the supply chain over the web, resources would be strained, choking the network and increasing transfer costs. To use the available resources over the internet effectively, the data is often compressed before transfer. This chapter presents the different methods and levels of data compression. A separate section is devoted to multimedia data compression, where certain losses in the data are tolerable during compression due to the limitations of human perception.
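
As a small illustration of compressing data before transfer, the sketch below uses Python's standard `zlib` module at two compression levels. The payload is a made-up, highly repetitive order feed chosen for the example; it does not come from the chapter, and real supply chain data may compress far less well. This covers only the lossless case, not the lossy multimedia compression the chapter also discusses.

```python
import zlib

# Hypothetical supply chain feed: a header plus many near-identical order lines.
payload = b"order_id,sku,quantity,warehouse\n" + b"100045,SKU-7781,12,WH-EAST\n" * 2000

fast = zlib.compress(payload, 1)   # level 1: favours speed
small = zlib.compress(payload, 9)  # level 9: favours size

# Lossless compression must reproduce the original bytes exactly.
restored = zlib.decompress(small)

print(len(payload), len(fast), len(small))
```

The level is a trade-off between processing time at the sender and bytes sent over the network, so the right setting depends on which resource is scarcer.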

A Study on the Behavior of Clustering Techniques for Modeling Travel Time in Road-Based Mass Transit Systems

Proceedings ◽

10.3390/proceedings2019031018 ◽

2019 ◽

Vol 31 (1) ◽

pp. 18

Author(s):

Cristóbal ◽

Padrón ◽

Quesada-Arencibia ◽

Alayón ◽

Blasio ◽

...

Keyword(s):

Travel Time ◽

Behavior Pattern ◽

Large Data ◽

Mass Transit ◽

Clustering Methods ◽

Knowledge Modeling ◽

Activity Data ◽

Transit Systems ◽

Clustering Techniques ◽

Key Factor

In road-based mass transit systems, travel time is a key factor affecting quality of service. For this reason, understanding how travel time behaves is a relevant challenge. Clustering methods are useful tools for knowledge modeling because they are unsupervised techniques that allow hidden behavior patterns in large data sets to be found. This contribution presents a study on the utility of different clustering techniques for obtaining behavior patterns of travel time. The study analyzed three clustering techniques (K-medoid, Diana, and Hclust), examining how two key factors of these techniques (distance metric and number of clusters) affect the results obtained. The study was conducted using transport activity data provided by a public transport operator.
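
The effect of the two factors named above (distance metric and number of clusters) can be explored with a minimal K-medoid sketch. This is a simple alternating variant with a deterministic start, written for illustration only; it is not the implementation used in the study, Diana and Hclust are not covered, and the `trips` data (departure hour, travel time in minutes) is invented.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_medoids(points, k, metric, max_iter=50):
    """Alternate between assigning points to the nearest medoid and re-picking medoids."""
    medoids = list(range(k))  # assumption: start from the first k points

    def assign(meds):
        return [min(range(k), key=lambda c: metric(p, points[meds[c]])) for p in points]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members,
                               key=lambda m: sum(metric(points[m], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    labels = assign(medoids)
    cost = sum(metric(p, points[medoids[lab]]) for p, lab in zip(points, labels))
    return medoids, labels, cost

# Hypothetical trips: (departure hour, travel time in minutes).
trips = [(7, 32), (8, 35), (8, 34), (13, 21), (14, 20), (13, 22), (18, 40), (19, 41), (18, 39)]

for metric in (manhattan, euclidean):
    for k in (2, 3):
        medoids, labels, cost = k_medoids(trips, k, metric)
        print(metric.__name__, k, labels, round(cost, 2))
```

Comparing the total cost across values of `k` and across metrics is one simple way to see how sensitive the grouping is to each factor; on real data a validity index would normally be used as well.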

Deep Learning, Feature Learning, and Clustering Analysis for SEM Image Classification

Data Intelligence ◽

10.1162/dint_a_00062 ◽

2020 ◽

Vol 2 (4) ◽

pp. 513-528

Author(s):

Rossella Aversa ◽

Piero Coronica ◽

Cristiano De Nobili ◽

Stefano Cozzini

Keyword(s):

Unsupervised Learning ◽

Clustering Analysis ◽

Feature Learning ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Clustering Methods ◽

Data Set ◽

Sem Image ◽

Supervised And Unsupervised Learning ◽

Scanning Electron

In this paper, we report upon our recent work aimed at improving and adapting machine learning algorithms to automatically classify nanoscience images acquired by the Scanning Electron Microscope (SEM). This is done by coupling supervised and unsupervised learning approaches. We first investigate supervised learning on a ten-category data set of images and compare the performance of the different models in terms of training accuracy. Then, we reduce the dimensionality of the features through autoencoders to perform unsupervised learning on a subset of images in a selected range of scales (from 1 μm to 2 μm). Finally, we compare different clustering methods to uncover intrinsic structures in the images.

Large Data Exchange Based on the Internet of Things

International Journal of Simulation Systems Science & Technology ◽

10.5013/ijssst.a.16.4b.09 ◽

2020 ◽

Author(s):

Guest Editor Liutian Ye

Keyword(s):

Internet Of Things ◽

Data Exchange ◽

Large Data ◽

The Internet ◽

The Internet Of Things

Improved K-MEANS Algorithm Based on Samples

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.734.472 ◽

2015 ◽

Vol 734 ◽

pp. 472-475

Author(s):

Wei Jin ◽

Xiao Rong Zhao

Keyword(s):

Execution Time ◽

Clustering Analysis ◽

Scientific Research ◽

Large Data ◽

Commercial Application ◽

Massive Data ◽

Number Of Clusters ◽

Initial Identification ◽

Partition Method ◽

Reduce Execution Time

Clustering analysis plays an important role in scientific research and commercial application. The K-means algorithm is a widely used partition method in clustering. In this method, the number of clusters is predefined, and the technique is highly dependent on the initial identification of elements that represent the clusters well. As the scale of datasets increases rapidly, it becomes difficult to use K-means on massive data. To address this problem, an algorithm for refining the initial points is provided. It can reduce execution time and improve solutions for large data by refining the initial conditions. The experiments demonstrate that sample-based K-means is more stable and more accurate.
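
The abstract does not spell out the refinement procedure, so the following is only a sketch of one common sample-based scheme, offered as an assumption: run K-means on several small subsamples, pool the resulting centroids, cluster the pool, and use those centroids to start K-means on the full data. The strided subsampling, the function names, and the toy data are all hypothetical.

```python
def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def refined_start(points, k, n_samples=4):
    """Cluster each strided subsample, then cluster the pooled centroids."""
    pool = []
    for j in range(n_samples):
        sample = points[j::n_samples]            # deterministic subsample
        pool.extend(kmeans(sample, sample[:k]))  # cheap: the subsample is small
    return kmeans(pool, pool[:k])

def sample_based_kmeans(points, k, n_samples=4):
    return kmeans(points, refined_start(points, k, n_samples))

# Two well-separated toy groups (hypothetical data).
data = [(0.1 * i, 0.05 * i) for i in range(10)]
data += [(10 + 0.1 * i, 10 - 0.05 * i) for i in range(10)]

centroids = sorted(sample_based_kmeans(data, k=2))
print(centroids)
```

Because the expensive full-data pass starts close to a good solution, it typically needs fewer iterations, which is the intuition behind the reduced execution time claimed above.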

Characterising sediments of a tropical sediment-starved shelf using cluster analysis of physical and geochemical variables

Environmental Chemistry ◽

10.1071/en14126 ◽

2015 ◽

Vol 12 (2) ◽

pp. 204 ◽

Cited By ~ 7

Author(s):

Lynda C. Radke ◽

Jin Li ◽

Grant Douglas ◽

Rachel Przeslawski ◽

Scott Nichol ◽

...

Keyword(s):

Large Data ◽

High Energy ◽

Rate Coefficients ◽

Geochemical Data ◽

Clustering Methods ◽

Data Set ◽

Β Diversity ◽

Published Evidence ◽

Baseline Information ◽

Geochemical Variables

Environmental context: Australia's tropical marine estate is a biodiversity hotspot that is threatened by human activities. Analysis and interpretation of large physical and geochemistry data sets provides important information on processes occurring at the seafloor in this poorly known area. These processes help us to understand how the seafloor functions to support biodiversity in the region.

Abstract: Baseline information on habitats is required to manage Australia's northern tropical marine estate. This study aims to develop an improved understanding of seafloor environments of the Timor Sea. Clustering methods were applied to a large data set comprising physical and geochemical variables that describe organic matter (OM) reactivity, quantity and source, and geochemical processes. Arthropoda (infauna) were used to assess different groupings. Clusters based on physical and geochemical data discriminated arthropods better than geomorphic features. Major variations among clusters included grain size and a cross-shelf transition from authigenic-Mn–As enrichments (inner shelf) to authigenic-P enrichment (outer shelf). Groups comprising raised features had the highest reactive OM concentrations (e.g. low chlorin indices and C:N ratios, and high reaction rate coefficients) and benthic algal δ13C signatures. Surface area-normalised OM concentrations higher than continental shelf norms were observed in association with: (i) low δ15N, inferring Trichodesmium input; and (ii) pockmarks, which impart bottom–up controls on seabed chemistry and cause inconsistencies between bulk and pigment OM pools. Low Shannon–Wiener diversity occurred in association with low redox and porewater pH and published evidence for high energy. Highest β-diversity was observed at euphotic depths. Geochemical data and clustering methods used here provide insight into ecosystem processes that likely influence biodiversity patterns in the region.

The Study on the Methodology of TransactionEvent in the ALE of RFID Networks

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.5321 ◽

2014 ◽

Vol 556-562 ◽

pp. 5321-5327

Author(s):

Hui Qun Zhao ◽

Hai Gang Yang

Keyword(s):

Internet Of Things ◽

Real Time ◽

Large Data ◽

The Internet ◽

Prototype System ◽

Processing Efficiency ◽

Long Period ◽

Rfid Networks ◽

Integrated Program ◽

The Internet Of Things

TransactionEvent is one of the five events defined in the EPCGlobal standard. Because a TransactionEvent lasts for a long period and processes large data, it places higher demands on real-time performance. The processing of the TransactionEvent in the Internet of Things is complex. To overcome these disadvantages, this paper proposes a non-integrated program, which ensures the processing efficiency, reliability, and real-time performance of the TransactionEvent. At the end of the paper, a prototype system of a commercial IoT is implemented to verify this method.
