massive datasets
Recently Published Documents

TOTAL DOCUMENTS: 179 (FIVE YEARS 82)
H-INDEX: 14 (FIVE YEARS 4)

2021, Vol 38 (6), pp. 1829-1835
Author(s): Ji Zou, Chao Zhang, Zhongjing Ma, Lei Yu, Kaiwen Sun, ...

Footprint recognition and parameter measurement are widely used in fields such as medicine, sports, and criminal investigation. Image-processing-based analysis of plantar pressure image features has already yielded some results, but common image feature extraction algorithms often depend on heavy computing power and massive datasets. Focusing on the auxiliary diagnosis and treatment of foot rehabilitation for patients with foot lacerations, this paper explores image feature analysis and dynamic measurement of plantar pressure based on fusion feature extraction. First, the authors detail a fusion feature extraction algorithm that integrates the wavelet transform with the histogram of oriented gradients (HOG) descriptor. Next, plantar parameters are calculated from plantar pressure images, and the measurement steps are given. Finally, the feature extraction performance of the proposed algorithm is verified, and measured plantar parameters are obtained through experiments.
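The fusion idea, wavelet-band statistics concatenated with a HOG orientation histogram, can be sketched in plain NumPy. The `haar_dwt2` and `hog_features` helpers below are illustrative stand-ins for the paper's actual wavelet/HOG pipeline, and the random array stands in for a real plantar pressure image:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar wavelet transform (approximation + 3 detail bands)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row-pair averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row-pair differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def hog_features(img, n_bins=9):
    """Coarse HOG-style descriptor: one orientation histogram over the image."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def fused_features(img):
    """Concatenate wavelet-band statistics with the HOG histogram."""
    bands = haar_dwt2(img)
    wavelet_stats = np.array([band.std() for band in bands])
    return np.concatenate([wavelet_stats, hog_features(img)])

pressure = np.random.default_rng(0).random((64, 64))  # stand-in pressure map
print(fused_features(pressure).shape)  # (13,): 4 wavelet stats + 9 HOG bins
```

A real pipeline would compute HOG per cell over a grid and use a proper wavelet library, but the fused descriptor has the same concatenated structure.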


2021, Vol 4
Author(s): Fan Zhang, Melissa Petersen, Leigh Johnson, James Hall, Sid E. O’Bryant

Driven by massive datasets comprising biomarkers from both blood and magnetic resonance imaging (MRI), the need for advanced learning algorithms and accelerator architectures, such as GPUs and FPGAs, has increased. Machine learning (ML) methods have delivered remarkable predictions for the early diagnosis of Alzheimer’s disease (AD). Although ML has improved the accuracy of AD prediction, the growing complexity of its algorithms, for example in hyperparameter tuning, increases its computational cost. Thus, accelerating high-performance ML for AD is an important research challenge. This work reports a multicore, high-performance support vector machine (SVM) hyperparameter tuning workflow with 100-times-repeated 5-fold cross-validation for speeding up ML for AD. For demonstration and evaluation, the high-performance hyperparameter tuning model was applied to public MRI data for AD, including demographic factors such as age, sex, and education. Results showed that computational efficiency increased by 96%, which helps shed light on future diagnostic AD biomarker applications. The high-performance hyperparameter tuning model can also be applied to other ML algorithms such as random forest, logistic regression, and XGBoost.
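A minimal sketch of such a tuning workflow using scikit-learn (not the authors' code): the dataset is synthetic, and `n_repeats` is reduced from the paper's 100 to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for MRI-derived features plus age/sex/education.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The paper repeats 5-fold CV 100 times; n_repeats is reduced here for speed.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# n_jobs=-1 spreads the (params x folds) evaluations across all cores,
# which is where the multicore speedup comes from.
search = GridSearchCV(SVC(), grid, cv=cv, n_jobs=-1, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping `SVC()` for `RandomForestClassifier()` or `LogisticRegression()` (with a matching grid) gives the same workflow for the other algorithms mentioned.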


Author(s): Dr. C. K. Gomathy

Abstract: Analyzing cyber incident data sets is an essential approach for deepening our understanding of the evolution of the threat landscape. This is a relatively new research topic, and much work remains to be done. In this paper, we report a statistical analysis of a breach incident data set covering 12 years (2005–2017) of cyber hacking activities, including malware attacks. We show that, in contrast to findings reported in the literature, both hacking breach incident inter-arrival times and breach sizes should be modeled by stochastic processes rather than by distributions, because they exhibit autocorrelations. We then propose particular stochastic process models to fit, respectively, the inter-arrival times and the breach sizes. We also note that, by analyzing their actions, malware can be classified into a small number of behavioral classes, each of which performs a limited set of misbehaviors that characterize it. These misbehaviors can be described by monitoring features belonging to different platforms. We present a novel host-based malware detection system for online social networks (OSN) which simultaneously analyzes and correlates features at four levels: kernel, application, user, and package, to detect and block malicious behaviors. It has been designed to consider behaviors characteristic of almost every real malware that can be found in the wild. The prototype detects and effectively blocks more than 96% of malicious apps, drawn from three massive datasets totaling approximately 2,800 apps, by exploiting the cooperation of parallel classifiers and a behavioral signature-based detector. Keywords: Cyber security, Malware, Emerging technology trends, Emerging cyber threats, Cyber attacks and countermeasures
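The key statistical argument, that autocorrelated inter-arrival times rule out a plain distribution fit, can be illustrated with a quick check on simulated series. These are stand-ins for the breach data, not the paper's data set; the AR(1) series merely mimics the kind of temporal dependence the authors report:

```python
import numpy as np

def lag_autocorr(x, lag=1):
    """Sample autocorrelation of a series at a given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(1)

# i.i.d. exponential inter-arrival times: no memory between incidents.
iid = rng.exponential(1.0, 5000)

# AR(1)-driven series: a simple stand-in for autocorrelated incident data.
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.6 * ar[t - 1] + rng.normal()

print(round(lag_autocorr(iid), 2))  # near 0: a plain distribution suffices
print(round(lag_autocorr(ar), 2))   # clearly positive: model as a process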


Author(s): Dr. K. B. V. Brahma Rao, Dr. R Krishnam Raju Indukuri, Dr. Suresh Varma Penumatsa, Dr. M. V. Rama Sundari, ...

The objective of comparing various dimensionality reduction techniques is to shrink feature sets so that attributes can be grouped effectively with less computational processing time and memory utilization. These reduction algorithms can decrease the dimensionality of a dataset consisting of a huge number of interrelated variables while retaining as much of the variation present in the dataset as possible. In this paper we apply the Standard Deviation, Variance, Principal Component Analysis, Linear Discriminant Analysis, Factor Analysis, Positive Region, Information Entropy, and Independent Component Analysis reduction algorithms, using the Hadoop Distributed File System, to massive patient datasets to achieve lossless data reduction and to acquire the required knowledge. The experimental results demonstrate that the ICA technique can operate efficiently on massive datasets, eliminating irrelevant data without loss of accuracy and reducing both storage space and computation time compared to the other techniques.
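A hedged sketch of this kind of comparison, using scikit-learn's PCA and FastICA on a synthetic stand-in for the patient data (the paper's Hadoop-based pipeline is not reproduced here):

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Stand-in for a patient dataset: 500 records, 20 highly interrelated
# attributes generated from only 5 latent factors plus small noise.
latent = rng.normal(size=(500, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.01 * rng.normal(size=(500, 20))

# Both methods project the 20 correlated attributes down to 5 components.
pca = PCA(n_components=5).fit(X)
X_pca = pca.transform(X)
X_ica = FastICA(n_components=5, random_state=0).fit_transform(X)

print(X.shape, "->", X_pca.shape)                     # (500, 20) -> (500, 5)
print(round(pca.explained_variance_ratio_.sum(), 4))  # ~1.0: near-lossless here
```

When the retained components explain essentially all the variance, as in this rank-deficient example, the reduction is effectively lossless, which is the property the paper evaluates at scale.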


Author(s): João B Costa, Joana Silva-Correia, Rui L Reis, Joaquim M Oliveira

Bioengineering has been revolutionizing the production of biofunctional tissues for tackling unmet clinical needs. Bioengineers have been focusing their research on biofabrication, especially 3D bioprinting, providing cutting-edge approaches and biomimetic solutions with more reliability and cost-effectiveness. However, these emerging technologies are still far from the clinical setting, and deep learning, as a subset of artificial intelligence, can be widely explored to close this gap. Deep-learning technology is capable of autonomously handling massive datasets and producing valuable outputs. The application of deep learning in bioengineering, and how the synergy of this technology with biofabrication can more efficiently bring 3D bioprinting to the clinic, are overviewed herein.


2021
Author(s): Farah Jemili, Hajer Bouras

In today’s world, the Intrusion Detection System (IDS) is one of the significant tools used to improve network security by detecting attacks or abnormal data accesses. Most existing IDSs have disadvantages such as high false alarm rates and low detection rates. Dealing with distributed and massive data constitutes one challenge for an IDS; dealing with imprecise data is another. This paper proposes an Intrusion Detection System based on big data fuzzy analytics; the Fuzzy C-Means (FCM) method is used to cluster and classify the pre-processed training dataset. The CTU-13 and UNSW-NB15 datasets are used as distributed and massive datasets to prove the feasibility of the method. The proposed system shows high performance in terms of accuracy, precision, detection rate, and false alarms.
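A minimal NumPy implementation of the FCM clustering step, run on synthetic two-cluster traffic features rather than CTU-13/UNSW-NB15; the "benign" and "attack" arrays are hypothetical stand-ins for pre-processed flow features:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-Means: returns membership matrix U (n x c) and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                               # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))       # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

rng = np.random.default_rng(1)
benign = rng.normal(0.0, 0.5, size=(100, 2))     # stand-in benign flow features
attack = rng.normal(4.0, 0.5, size=(100, 2))     # stand-in anomalous features
U, centers = fuzzy_c_means(np.vstack([benign, attack]))
labels = U.argmax(axis=1)                        # hard labels from soft memberships
```

Unlike hard k-means, the membership matrix U grades how strongly each flow belongs to each cluster, which is what lets FCM handle the imprecise data the paper targets.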


Mathematics, 2021, Vol 9 (21), pp. 2694
Author(s): Amira Mouakher, Axel Ragobert, Sébastien Gerin, Andrea Ko

Formal concept analysis (FCA) is a mathematical theory that is typically used as a knowledge representation method. The approach starts with an input binary relation specifying a set of objects and attributes, finds the natural groupings (formal concepts) described in the data, and then organizes the concepts in a partial order structure or concept (Galois) lattice. Unfortunately, the total number of concepts in this structure tends to grow exponentially as the size of the data increases. Therefore, there are numerous approaches for selecting a subset of concepts to provide full or partial coverage. In this paper, we rely on the battery of mathematical models offered by FCA to introduce a new greedy algorithm, called Concise, to compute minimal and meaningful subsets of concepts. Thanks to its theoretical properties, the Concise algorithm is shown to avoid the sluggishness of its competitors while offering the ability to mine both partial and full conceptual coverage of formal contexts. Furthermore, experiments on massive datasets also underscore the preservation of the quality of the mined formal concepts through interestingness measures agreed upon by the community.
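The core FCA machinery (derivation operators and concept enumeration) and a Concise-style greedy cover can be illustrated on a toy context. This is a sketch of the general greedy set-cover idea on formal concepts, not the published algorithm:

```python
from itertools import combinations

# Toy formal context: which objects have which attributes.
objects = ["o1", "o2", "o3", "o4"]
attrs = ["a", "b", "c"]
I = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o3", "b"), ("o3", "c"), ("o4", "c")}

def intent(objs):   # attributes shared by all objects in objs
    return {a for a in attrs if all((o, a) in I for o in objs)}

def extent(ats):    # objects having all attributes in ats
    return {o for o in objects if all((o, a) in I for a in ats)}

# Enumerate all formal concepts (extent, intent) by closing attribute sets.
concepts = set()
for r in range(len(attrs) + 1):
    for ats in combinations(attrs, r):
        e = extent(set(ats))
        concepts.add((frozenset(e), frozenset(intent(e))))

# Greedy coverage in the spirit of Concise: repeatedly pick the concept
# whose extent covers the most still-uncovered objects (top concept excluded).
candidates = [c for c in concepts if c[1]]
uncovered, cover = set(objects), []
while uncovered:
    best = max(candidates, key=lambda c: len(c[0] & uncovered))
    if not best[0] & uncovered:
        break
    cover.append(best)
    uncovered -= best[0]

print(len(concepts), len(cover))  # 6 concepts; 2 suffice for full coverage
```

The exponential blow-up the paper addresses shows up in `concepts`; the greedy loop returns a small subset that still covers every object, which is the coverage notion Concise optimizes with stronger guarantees.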


Author(s): Tomer Gueta, Rahul Chauhan, Thiloshon Nagarajah, Vijay Barve, Povilas Gibas, ...

The bdverse is a collection of packages that form a general framework for facilitating biodiversity science in R (programming language). Exploratory and diagnostic visualization can unveil hidden patterns and anomalies in data and allow quick and efficient exploration of massive datasets. The development of an interactive yet flexible dashboard that can be easily deployed locally or remotely is a highly valuable biodiversity informatics tool. To this end, we have developed 'bddashboard', which serves as an agile framework for biodiversity dashboard development. This project is built in R, using the Shiny package (RStudio, Inc 2021) that helps build interactive web apps in R. The following key components were developed:

Core Interactive Components
The basic building blocks of every dashboard are interactive plots, maps, and tables. We have explored all major visualization libraries in R and have concluded that 'plotly' (Sievert 2020) is the most mature and showcases the best value for effort. Additionally, we have concluded that 'leaflet' (Graul 2016) shows the most diverse and high-quality mapping features, and DT (DataTables library) (Xie et al. 2021) is best for rendering tabular data. Each component was modularized to better adjust it for biodiversity data and to enhance its flexibility.

Field Selector
The field selector is a unique module that makes each interactive component much more versatile. Users have different data and needs; thus, every combination or selection of fields can tell a different story. The field selector allows users to change the X and Y axes on plots, to choose the columns that are visible in a table, and to easily control map settings, all in real time, without reloading the page or disturbing the reactivity. The field selector automatically detects how many columns a plot needs and what type of columns can be passed to the X-axis or Y-axis. It also displays the completeness of each field.

Plot Navigation
We developed the plot navigation module to prevent unwanted extreme cases. Technically, drawing 1,000 bars on a single bar plot is possible, but this visualization is not human-friendly. Navigation allows users to decide how many values they want to see on a single plot. This technique allows for fast drawing of extensive datasets without affecting page reactivity, dramatically improving performance and functioning as a fail-safe mechanism.

Reactivity
Reactivity creates the connection between different components. Changes in input values automatically flow to the plots, text, maps, and tables that use the input, and cause them to update. Reactivity facilitates drill-down functionality, which enhances the user's ability to explore and investigate the data. We developed a novel and robust reactivity technique that allows us to add a new component and effectively connect it with all existing components within a dashboard tab, using only one line of code.

Generic Biodiversity Tabs
We developed five useful dashboard tabs (Fig. 1): (i) the Data Summary tab gives a quick overview of a dataset; (ii) the Data Completeness tab helps users get valuable information about missing records and missing Darwin Core fields; (iii) the Spatial tab is dedicated to spatial visualizations; (iv) the Taxonomic tab is designed to visualize taxonomy; and (v) the Temporal tab is designed to visualize time-related aspects.

Performance and Agility
To make a dashboard work smoothly and react quickly, hundreds of small and large modules, functions, and techniques must work together. Our goal was to minimize dashboard latency and maximize its data capacity. We used asynchronous modules to write non-blocking code, clusters in map components, and preprocessing and filtering of data before passing it to plots to reduce the load. The modularized architecture of the 'bddashboard' package allows us to develop completely different interactive and reactive dashboards within mere minutes.

