Precise clustering analysis of Internet financial credit reporting dependent on multidimensional attribute sparse large data

Author(s):  
Lingling Chen ◽  
Yuanyuan Zhang ◽  
Min Zeng

Because traditional methods cannot cluster Internet financial credit reporting data directly and effectively, a precise clustering analysis method for Internet financial credit reporting based on multidimensional attribute sparse large data is proposed. The overall distance between Internet financial credit reports is measured through the sparse large data with multidimensional attributes, and clustering analysis is then performed separately on the overall distance matrix and on the component approximate distance matrix between the data. The correlations between the Internet financial credit reports under these two perspectives are considered together. Multidimensional attribute sparse large data pairs are used to reflect the comprehensive relationship matrix of the original Internet financial credit reporting, which yields relatively high-quality clustering. Numerical experiments show that, compared with traditional clustering methods, the proposed method not only reflects the overall data features effectively but also improves the clustering of the original Internet financial credit reporting data by analyzing the correlations between the important component attribute sequences.

2021 ◽  
Vol 4 (5) ◽  
pp. 38-44
Author(s):  
Yan Wang

As a pillar in the development of China’s economy, the financial industry plays a key role in the production and life of residents. With the widespread application of the internet, internet finance has gradually emerged, and as the collection and extraction of big data have become achievable, the related analysis and exploration technologies have received more emphasis. However, in the context of big data technology, internet finance still faces risks related to unsound laws, inadequate business publicity, user information security, and capital liquidity. In the digital economy era, this article discusses these risks and how to prevent them by establishing a good internet financial system, strengthening interindustry exchanges and cooperation, building a unified internet financial information supervision platform, and optimizing the internet financial credit reporting system, so as to promote the healthy and sound development of the whole financial industry.


2021 ◽  
Author(s):  
Sebastiaan Valkiers ◽  
Max Van Houcke ◽  
Kris Laukens ◽  
Pieter Meysman

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To account for this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).


2020 ◽  
pp. 1260-1284
Author(s):  
Laura Belli ◽  
Simone Cirani ◽  
Luca Davoli ◽  
Gianluigi Ferrari ◽  
Lorenzo Melegari ◽  
...  

The Internet of Things (IoT) is expected to interconnect billions (around 50 by 2020) of heterogeneous sensor/actuator-equipped devices denoted as “Smart Objects” (SOs), characterized by constrained resources in terms of memory, processing, and communication reliability. Several IoT applications have real-time and low-latency requirements and must rely on architectures specifically designed to manage gigantic streams of information (in terms of number of data sources and transmission data rate). We refer to “Big Stream” as the paradigm which best fits the selected IoT scenario, in contrast to the traditional “Big Data” concept, which does not consider real-time constraints. Moreover, there are many security concerns related to IoT devices and to the Cloud. In this paper, we analyze security aspects in a novel Cloud architecture for Big Stream applications, which efficiently handles Big Stream data through a Graph-based platform and delivers processed data to consumers with low latency. The authors detail each module defined in the system architecture, describing all refinements required to enable the platform to secure large data streams. An experiment is also conducted to evaluate the performance of the proposed architecture when integrating security mechanisms.


Author(s):  
Manjunath Ramachandra

If large data transactions are to happen in the supply chain over the web, the resources would be strained, leading to choking of the network as well as increased transfer costs. To use the available resources over the internet effectively, the data is often compressed before transfer. This chapter presents the different methods and levels of data compression. A separate section is devoted to multimedia data compression, where certain losses in the data are tolerable during compression due to the limitations of human perception.
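
The idea of compressing data before transfer, and of choosing among compression levels, can be illustrated with a minimal sketch. It uses Python's standard `zlib` module as a stand-in for lossless compression; the record format and the `compressed_sizes` helper are hypothetical and are not taken from the chapter.

```python
import zlib

# Hypothetical payload: one repetitive supply-chain record, repeated to
# mimic a bulk transfer. The field names are illustrative assumptions.
record = b'{"sku": "A-1001", "qty": 25, "warehouse": "W7", "status": "shipped"}\n'
payload = record * 2000

def compressed_sizes(data, levels=(1, 6, 9)):
    """Return {level: compressed byte count} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}

sizes = compressed_sizes(payload)
for level, size in sizes.items():
    print(f"level {level}: {len(payload)} -> {size} bytes")

# Lossless compression: decompressing restores the original bytes exactly.
restored = zlib.decompress(zlib.compress(payload, 9))
```

Higher levels generally trade more CPU time for smaller output. Lossy multimedia codecs, which the chapter treats separately, are outside the scope of this sketch.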


Proceedings ◽  
2019 ◽  
Vol 31 (1) ◽  
pp. 18
Author(s):  
Cristóbal ◽  
Padrón ◽  
Quesada-Arencibia ◽  
Alayón ◽  
Blasio ◽  
...  

In road-based mass transit systems, travel time is a key factor affecting quality of service. For this reason, understanding how travel time behaves is a relevant challenge. Clustering methods are interesting tools for knowledge modeling because they are unsupervised techniques that allow hidden behavior patterns in large data sets to be found. In this contribution, a study on the utility of different clustering techniques for obtaining behavior patterns of travel time is presented. The study analyzed three clustering techniques (K-medoid, Diana, and Hclust) and examined how two key factors of these techniques, the distance metric and the number of clusters, affect the results obtained. The study was conducted using transport activity data provided by a public transport operator.
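
To make the two factors named in this abstract concrete (distance metric and number of clusters), here is a minimal pure-Python sketch of an alternating K-medoids procedure with a pluggable metric. It is an illustration only and not the study's implementation: the trip data, the feature choice (travel minutes, dwell minutes), and all function names are assumptions, and the sketch assumes all points are distinct.

```python
import random

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def assign(points, medoids, metric):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: metric(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, metric, seed=0, max_iter=100):
    """Alternate between assignment and medoid update until stable."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        clusters = assign(points, medoids, metric)
        # New medoid: the member with the smallest total distance to its cluster.
        updated = [
            min(members or [m],
                key=lambda c: sum(metric(points[c], points[j]) for j in members))
            for m, members in clusters.items()
        ]
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return assign(points, medoids, metric)

# Hypothetical trips: (travel minutes, dwell minutes). Made-up values.
trips = [(20, 2), (21, 3), (19, 2), (22, 4), (20, 3),
         (45, 8), (47, 9), (44, 7), (46, 10), (48, 9)]

results = {name: k_medoids(trips, 2, metric)
           for name, metric in [("manhattan", manhattan), ("euclidean", euclidean)]}
for name, clusters in results.items():
    print(name, sorted(sorted(members) for members in clusters.values()))
```

On data this cleanly separated, both metrics recover the same two groups. On real travel-time data the choice of metric and of `k` can change the grouping, which is the effect the study investigates.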


2020 ◽  
Vol 2 (4) ◽  
pp. 513-528
Author(s):  
Rossella Aversa ◽  
Piero Coronica ◽  
Cristiano De Nobili ◽  
Stefano Cozzini

In this paper, we report upon our recent work aimed at improving and adapting machine learning algorithms to automatically classify nanoscience images acquired by the Scanning Electron Microscope (SEM). This is done by coupling supervised and unsupervised learning approaches. We first investigate supervised learning on a ten-category data set of images and compare the performance of the different models in terms of training accuracy. Then, we reduce the dimensionality of the features through autoencoders to perform unsupervised learning on a subset of images in a selected range of scales (from 1 μm to 2 μm). Finally, we compare different clustering methods to uncover intrinsic structures in the images.


2015 ◽  
Vol 734 ◽  
pp. 472-475
Author(s):  
Wei Jin ◽  
Xiao Rong Zhao

Clustering analysis plays an important role in scientific research and commercial applications. The K-means algorithm is a widely used partitioning method in clustering. In this method, the number of clusters is predefined, and the technique is highly dependent on the initial identification of elements that represent the clusters well. As the scale of datasets increases rapidly, it becomes difficult to use K-means to deal with massive data. To address this problem, an algorithm for refining the initial points is provided. It can reduce execution time and improve solutions for large data by refining the initial conditions. The experiments demonstrate that sample-based K-means is more stable and more accurate.
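
The abstract does not spell out the refinement procedure, so the following is only one plausible reading, sketched under stated assumptions: run K-means on a small random sample to obtain rough centers, then use those centers as the starting points for K-means on the full data. The function names, the sample size, and the synthetic data are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centers, max_iter=50):
    """Plain K-means (Lloyd) iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def refined_kmeans(points, k, sample_size, seed=0):
    """Assumed refinement: cluster a sample first, then the full data."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    rough = lloyd(sample, rng.sample(sample, k))  # cheap pass on the sample
    return lloyd(points, rough)                   # full pass from refined start

# Hypothetical data: two well-separated 2-D blobs.
data_rng = random.Random(42)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(100)]
blob_b = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(100)]
centers = sorted(refined_kmeans(blob_a + blob_b, k=2, sample_size=40))
print(centers)
```

The sample pass touches only `sample_size` points per iteration, so the full-data pass can start near a good solution and typically needs fewer iterations. Whether this matches the paper's algorithm is not confirmed by the abstract.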


2015 ◽  
Vol 12 (2) ◽  
pp. 204 ◽  
Author(s):  
Lynda C. Radke ◽  
Jin Li ◽  
Grant Douglas ◽  
Rachel Przeslawski ◽  
Scott Nichol ◽  
...  

Environmental context. Australia's tropical marine estate is a biodiversity hotspot that is threatened by human activities. Analysis and interpretation of large physical and geochemistry data sets provides important information on processes occurring at the seafloor in this poorly known area. These processes help us to understand how the seafloor functions to support biodiversity in the region. Abstract. Baseline information on habitats is required to manage Australia's northern tropical marine estate. This study aims to develop an improved understanding of seafloor environments of the Timor Sea. Clustering methods were applied to a large data set comprising physical and geochemical variables that describe organic matter (OM) reactivity, quantity and source, and geochemical processes. Arthropoda (infauna) were used to assess different groupings. Clusters based on physical and geochemical data discriminated arthropods better than geomorphic features. Major variations among clusters included grain size and a cross-shelf transition from authigenic-Mn–As enrichments (inner shelf) to authigenic-P enrichment (outer shelf). Groups comprising raised features had the highest reactive OM concentrations (e.g. low chlorin indices and C:N ratios, and high reaction rate coefficients) and benthic algal δ13C signatures. Surface area-normalised OM concentrations higher than continental shelf norms were observed in association with: (i) low δ15N, inferring Trichodesmium input; and (ii) pockmarks, which impart bottom–up controls on seabed chemistry and cause inconsistencies between bulk and pigment OM pools. Low Shannon–Wiener diversity occurred in association with low redox and porewater pH and published evidence for high energy. Highest β-diversity was observed at euphotic depths. Geochemical data and clustering methods used here provide insight into ecosystem processes that likely influence biodiversity patterns in the region.


2014 ◽  
Vol 556-562 ◽  
pp. 5321-5327
Author(s):  
Hui Qun Zhao ◽  
Hai Gang Yang

TransactionEvent is one of the five events defined in the EPCGlobal standard. Because a TransactionEvent lasts for a long period and processes large volumes of data, it places high demands on real-time performance. Processing a TransactionEvent in the Internet of Things is also complex. To overcome these disadvantages, this paper proposes a non-integrated program that ensures the efficiency, reliability, and real-time performance of TransactionEvent processing. Finally, a prototype system for a commercial IoT is implemented to verify the method.

