Rough Set Approach toward Data Modelling and User Knowledge for Extracting Insights

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xiaoqun Liao ◽  
Shah Nazir ◽  
Junxin Shen ◽  
Bingliang Shen ◽  
Sulaiman Khan

Information is a core asset of any organization. As technology advances, the available knowledge keeps growing in volume, velocity, and variety, and extracting meaningful insights from such information has become a pressing need. Visualization is a key tool and has become one of the most significant platforms for interpreting, extracting, and communicating information. The current study addresses data modelling and user knowledge with a rough set approach for extracting meaningful insights. The experimental setup uses several rough-set-based algorithms, namely K-nearest neighbours (KNN), decision rules (DR), decomposition tree (DT), and the local transfer function classifier (LTF-C), and measures their accuracy for modelling user knowledge. The proposed method is validated on a dataset available in the UCI web repository. The results show that the model is effective and efficient, with accuracies of 96% for KNN, 87% for decision rules, 91% for decision trees, 85.04% for the cross-validation architecture, and 94.3% for the local transfer function classifier. The validity of the classification algorithms is further assessed with performance metrics such as F-score, precision, accuracy, recall, specificity, and misclassification rate. The KNN classifier outperformed the others on all of these metrics, which indicates its suitability for the proposed problem.
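
As an illustration of the evaluation style reported above, here is a minimal sketch (not the authors' code) that trains a KNN classifier and reports the same family of metrics with scikit-learn; the CSV file name and column layout for the UCI User Knowledge Modeling data are assumptions.

```python
# Minimal sketch: KNN classification with the metrics cited above.
# Assumes the UCI "User Knowledge Modeling" data has been exported to a CSV
# whose last column is the knowledge-level label; the file name is hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

data = pd.read_csv("user_knowledge.csv")          # hypothetical file name
X, y = data.iloc[:, :-1], data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
pred = knn.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred, average="macro"))
print("recall   :", recall_score(y_test, pred, average="macro"))
print("F-score  :", f1_score(y_test, pred, average="macro"))
print("misclassification rate:", 1 - accuracy_score(y_test, pred))
# 10-fold cross-validation, analogous to the "cross-validation architecture" above
print("10-fold CV accuracy:", cross_val_score(knn, X, y, cv=10).mean())
```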

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Xiaoqun Liao ◽  
Shah Nazir ◽  
Yangbin Zhou ◽  
Muhammad Shafiq ◽  
Xuelin Qi

With modern technology, the level of available knowledge increases day by day in terms of volume, velocity, and variety, and understanding such knowledge is essential for extracting meaningful insight from it. With advances in computer and image-based technologies, visualization has become one of the most significant platforms for extracting, interpreting, and communicating information. In data modelling, visualization is the process of extracting knowledge to reveal the detailed structure and behaviour of the data. The proposed study addresses user knowledge, data modelling, and visualization through a fuzzy-logic-based approach. The experimental setup is validated on the user knowledge modelling dataset available in the UCI web repository. The results show that the model is effective and efficient in situations where uncertainty and complexity arise.
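
The abstract does not spell out a rule base, so the following is only a small self-contained sketch of the general fuzzy-logic idea: triangular membership functions grade two study-related features and a pair of rules combines them into a knowledge level; all membership ranges, rules, and defuzzification centres are illustrative assumptions.

```python
# Illustrative fuzzy-logic sketch (not the paper's model): triangular membership
# functions and two Mamdani-style rules mapping study features to a knowledge level.
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical fuzzy sets on a 0..1 scale for the two input features.
def low(x):  return trimf(x, -0.5, 0.0, 0.5)
def high(x): return trimf(x, 0.5, 1.0, 1.5)

def knowledge_level(study_time, exam_score):
    # Rule 1: IF study_time is high AND exam_score is high THEN knowledge is high.
    r_high = min(high(study_time), high(exam_score))
    # Rule 2: IF study_time is low OR exam_score is low THEN knowledge is low.
    r_low = max(low(study_time), low(exam_score))
    # Weighted-average defuzzification with centres 0.25 (low) and 0.85 (high).
    return (0.25 * r_low + 0.85 * r_high) / (r_low + r_high + 1e-12)

print(knowledge_level(0.8, 0.9))   # strong student -> close to 0.85
print(knowledge_level(0.2, 0.3))   # weak student   -> close to 0.25
```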


2021 ◽  
Vol 14 (5) ◽  
pp. 785-798
Author(s):  
Daokun Hu ◽  
Zhiwen Chen ◽  
Jianbing Wu ◽  
Jianhua Sun ◽  
Hao Chen

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of them are evaluated on DRAM-based emulators under unrealistic assumptions, or focus on specific metrics while sidestepping important properties. It is therefore essential to understand how well the proposed hash indexes perform on real PM and how they differ from each other when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we focus on the evaluation of six state-of-the-art hash tables, Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and good paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.
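
The paper's benchmarking framework is not reproduced here; the sketch below only illustrates the general shape of such a throughput measurement for a mixed insert/lookup workload, with an ordinary in-memory dict standing in for a PM-resident hash table running on Optane hardware.

```python
# Sketch of a throughput benchmark for a mixed insert/lookup workload.
# A Python dict stands in for a persistent hash table (Level hashing, CCEH, Dash, ...);
# the real evaluation runs the native structures on Optane DC PMM hardware.
import random
import time

def run_workload(table, n_ops=1_000_000, insert_ratio=0.5, key_space=1 << 20, seed=42):
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(n_ops):
        key = rng.randrange(key_space)
        if rng.random() < insert_ratio:
            table[key] = key          # insert/update path
        else:
            table.get(key)            # lookup path (may miss)
    elapsed = time.perf_counter() - start
    return n_ops / elapsed            # operations per second

if __name__ == "__main__":
    for ratio in (1.0, 0.5, 0.0):     # insert-only, mixed, lookup-only workloads
        print(f"insert ratio {ratio:.1f}: {run_workload({}, insert_ratio=ratio):,.0f} ops/s")
```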


Nanophotonics ◽  
2017 ◽  
Vol 6 (4) ◽  
pp. 663-679 ◽  
Author(s):  
Francesco Chiavaioli ◽  
Francesco Baldini ◽  
Sara Tombelli ◽  
Cosimo Trono ◽  
Ambra Giannetti

Optical fiber gratings (OFGs), especially long-period gratings (LPGs) and etched or tilted fiber Bragg gratings (FBGs), are playing an increasing role in chemical and biochemical sensing based on the measurement of a surface refractive index (RI) change in a label-free configuration. In these devices, the evanescent field at the fiber/surrounding-medium interface makes the optical response (i.e. intensity and wavelength) sensitive to the RI variation caused by the interaction between a biological recognition layer deposited on the fiber and the analyte under investigation. OFG-based technology platforms take advantage of the peculiarities of optical fibers, which other sensing systems can hardly offer, such as compactness, lightness, high compatibility with optoelectronic devices (both sources and detectors), and multiplexing and remote measurement capability, since the signal is spectrally modulated. During the last decade, the growing demand from practical applications has pushed the technology behind OFG-based sensors beyond its limits by means of the deposition of thin-film overlays, nanocoatings, and nanostructures in general. Here, we review efforts toward utilizing these nanomaterials as coatings for high-performance and low-detection-limit devices. Moreover, we review recent developments in OFG-based biosensing and identify some of the key challenges for practical applications. While high performance metrics are starting to be achieved experimentally, there are still open questions pertaining to the effective and reliable detection of small molecules, possibly down to the single molecule, in vivo sensing, and multi-target detection using OFG-based technology platforms.
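
For context, the wavelength modulation mentioned above is usually described by the textbook phase-matching relations for these gratings (standard results, not quoted from this review): a change of the surrounding RI shifts the effective cladding-mode index and hence the resonance wavelength.

```latex
% Textbook resonance conditions (not from the review):
% FBG Bragg condition and LPG phase-matching for the m-th cladding mode,
% with grating period \Lambda.
\lambda_{B} = 2\, n_{\mathrm{eff}}\, \Lambda
\qquad
\lambda_{\mathrm{res}}^{(m)} = \bigl( n_{\mathrm{eff,core}} - n_{\mathrm{eff,clad}}^{(m)} \bigr)\, \Lambda
```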


2021 ◽  
Author(s):  
Komuravelli Prashanth ◽  
Kalidas Yeturu

There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed, and preparing that text layer requires recognizing character and sub-region patterns and associating them with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is very difficult, if not impossible. There is a strong need for systems that build on top of existing OCR technologies by learning from them and unifying a disparate multitude of systems. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from diverse domains and language settings. We observe that the text regions have consistent bounding-box sizes, and any very large or very small font scenarios can be handled in preprocessing or postprocessing phases. The image subregions in scanned text documents are smaller than the subregions formed by common objects in general-purpose images. We propose and validate the hypothesis that a much simpler convolutional neural network (CNN), with very few layers and few filters, can be used for detecting individual subregion classes. For detection of several hundred classes, multiple such simple models can be pooled to operate simultaneously on a document. The advantage of pools of subregion-specific models is the ability to deal with the incremental addition of hundreds of new classes over time without disturbing the previous models in a continual learning scenario. Such an approach has a distinct advantage over a single monolithic model, where subregion classes share and interfere through a bulky common neural network. We report an efficient algorithm for building subregion-specific lightweight CNN models. The training data for the proposed CNN requires engineering synthetic data points that include both the pattern of interest and non-patterns. We propose and validate the hypothesis that an image canvas containing an optimal amount of pattern and non-pattern, trained with a mean squared error loss function, steers the filters learned from the data. The CNN trained in this way can identify the character object in the presence of several other objects on a generalized test image of a scanned document. In this setting, a key observation is that learning a filter in a CNN depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments led to the following observations: (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-expressed either, (iii) a non-pattern can be salt-and-pepper noise, and (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to obtain strong individual subregion class models. We carried out studies and report mean average precision scores on various data sets, including (1) MNIST digits (95.77), (2) EMNIST capital letters (81.26), (3) EMNIST small letters (73.32), (4) Kannada digits (95.77), (5) Kannada letters (90.34), (6) Devanagari letters (100), (7) Telugu words (93.20), and (8) Devanagari words (93.20), and also on medical prescriptions, observing high performance with mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios, such as the annotation of ancient manuscripts and handwritten health records.
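
The exact architecture is not given in the text; the snippet below is only a sketch of the kind of lightweight, few-filter, per-class CNN described, written in PyTorch, with all layer sizes and the 32x32 patch size being assumptions.

```python
# Sketch of a lightweight per-class CNN of the kind described above
# (layer sizes are assumptions; one such model is trained per subregion class).
import torch
import torch.nn as nn

class SubregionDetector(nn.Module):
    """Tiny CNN that scores whether a 32x32 patch contains one target pattern."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # very few filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 16x16 -> 8x8
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(16 * 8 * 8, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x))

# Training uses synthetic canvases mixing the pattern of interest with non-pattern
# context, optimised with an MSE loss as suggested in the abstract.
model = SubregionDetector()
loss_fn = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

canvas = torch.rand(4, 1, 32, 32)          # stand-in for synthetic training canvases
target = torch.tensor([[1.], [0.], [1.], [0.]])
optimiser.zero_grad()
loss = loss_fn(model(canvas), target)
loss.backward()
optimiser.step()
```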


Author(s):  
Pravin Jagtap ◽  
Rupesh Nasre ◽  
V. S. Sanapala ◽  
B. S. V. Patnaik

Smoothed Particle Hydrodynamics (SPH) is fast emerging as a practically useful computational simulation tool for a wide variety of engineering problems. SPH is also gaining popularity as the backbone for fast and realistic animations in graphics and video games. The Lagrangian and mesh-free nature of the method facilitates fast and accurate simulation of material deformation, interface capture, etc. Typically, particle-based methods require particle search-and-locate algorithms to be implemented efficiently, as the continuous creation of neighbor particle lists is a computationally expensive step. Hence, it is advantageous to implement SPH on modern multi-core platforms with the help of High-Performance Computing (HPC) tools. In this work, the computational performance of an SPH algorithm is assessed on multi-core Central Processing Units (CPUs) as well as massively parallel General Purpose Graphics Processing Units (GP-GPUs). Parallelizing SPH faces several challenges, such as scalability of the neighbor search process, force calculations, minimizing thread divergence, achieving coalesced memory access patterns, balancing workload, and ensuring optimum use of computational resources. While addressing some of these challenges, performance metrics such as speedup, global load efficiency, global store efficiency, warp execution efficiency, and occupancy are analysed in detail. The OpenMP and Compute Unified Device Architecture[Formula: see text] parallel programming models have been used for parallel computing on Intel Xeon[Formula: see text] E5-[Formula: see text] multi-core CPUs and NVIDIA Quadro M[Formula: see text] and NVIDIA Tesla p[Formula: see text] massively parallel GPU architectures. Standard benchmark problems from the Computational Fluid Dynamics (CFD) literature are chosen for validation. The key concern addressed is how to identify a suitable architecture for mesh-free methods, which inherently involve a heavy workload of neighbor search and the evaluation of local force fields from neighbor interactions.
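
Neighbor-list construction is the step singled out above as expensive; the following is a small NumPy sketch of the common uniform-grid (cell list) approach, not the authors' implementation, with the particle count and smoothing length h chosen arbitrarily.

```python
# Sketch of uniform-grid (cell list) neighbor search for SPH particles.
# Not the authors' code; the cell size equals the smoothing length h, so each
# particle only needs to test the 27 surrounding cells for neighbors.
import numpy as np
from collections import defaultdict

def build_cell_list(positions, h):
    cells = defaultdict(list)
    for i, cell in enumerate(np.floor(positions / h).astype(int)):
        cells[tuple(cell)].append(i)
    return cells

def neighbors(i, positions, cells, h):
    cx, cy, cz = np.floor(positions[i] / h).astype(int)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                    if j != i and np.linalg.norm(positions[i] - positions[j]) < h:
                        result.append(j)
    return result

pos = np.random.rand(1000, 3)          # 1000 particles in a unit cube
h = 0.1                                # assumed smoothing length
cells = build_cell_list(pos, h)
print(len(neighbors(0, pos, cells, h)), "neighbors of particle 0")
```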


2021 ◽  
Author(s):  
Neha Gupta ◽  
Aditya Jain ◽  
Ajay Kumar

This work investigates the suppressed-distortion performance metrics of a gate-all-around (GAA) Gallium Nitride (GaN)/Al2O3 nanowire (NW) n-channel MOSFET (GaNNW/Al2O3 MOSFET) based on quantum numerical simulations at room temperature (300 K). The simulation results show a high switching ratio (≈10⁹), a low subthreshold swing (67 mV/decade), and a high QF value (4.1 mS·decade/mV) for the GaNNW/Al2O3 MOSFET in comparison to GaNNW/SiO2 and SiNW MOSFETs at Vds = 0.4 V, owing to the lower permittivity of GaN and the larger effective mass of the electron. Furthermore, linearity and distortion performance is examined by numerically calculating the transconductance and its higher derivatives (gm2 and gm3), the voltage and current intercept points (VIP2, VIP3, and IIP3), the 1-dB compression point, the harmonic distortions (HD2 and HD3), and IMD3. All these parameters show high linearity and low distortion at the zero-crossover point (where gm3 = 0) in the GaNNW/Al2O3 MOSFET. Thus, the GaNNW MOSFET can be considered a promising candidate for low-power, high-performance applications. In addition, the effect of ambient temperature (250 K-450 K) on the performance of the GaNNW/Al2O3 device is studied and discussed in terms of the above-mentioned metrics. SS, Ion, Vth, and QF improve as the temperature is lowered, which makes the device suitable for low-temperature environments, although linearity degrades as the temperature decreases.
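
The intercept points named above are commonly computed from the transconductance derivatives; the sketch below uses definitions often adopted in device-linearity studies (VIP2 = 4·gm1/gm2, VIP3 = sqrt(24·gm1/gm3), IIP3 = 2·gm1/(3·gm3·Rs)), which are assumptions here rather than quotes from the paper, together with a toy Id-Vgs curve standing in for the quantum-simulation output.

```python
# Sketch: linearity figures of merit from a simulated Id-Vgs characteristic.
# The VIP2/VIP3/IIP3 expressions are common literature definitions assumed here;
# they are not quoted from the paper.
import numpy as np

def linearity_metrics(vgs, ids, rs=50.0):
    gm1 = np.gradient(ids, vgs)            # first derivative of Id w.r.t. Vgs
    gm2 = np.gradient(gm1, vgs)            # second derivative
    gm3 = np.gradient(gm2, vgs)            # third derivative
    eps = 1e-30                            # avoid division by zero
    vip2 = 4.0 * gm1 / (np.abs(gm2) + eps)
    vip3 = np.sqrt(24.0 * np.abs(gm1) / (np.abs(gm3) + eps))
    iip3 = 2.0 * np.abs(gm1) / (3.0 * (np.abs(gm3) + eps) * rs)
    return gm1, gm2, gm3, vip2, vip3, iip3

# Toy Id-Vgs curve (placeholder for the quantum-simulation output).
vgs = np.linspace(0.0, 1.0, 201)
ids = 1e-4 * np.log1p(np.exp((vgs - 0.4) / 0.05))   # smooth turn-on around Vth = 0.4 V
gm1, gm2, gm3, vip2, vip3, iip3 = linearity_metrics(vgs, ids)
print("peak gm1 [S]:", gm1.max())
```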


2013 ◽  
pp. 1225-1251
Author(s):  
Chun-Che Huang ◽  
Tzu-Liang (Bill) Tseng ◽  
Hao-Syuan Lin

Patent infringement risk is a significant issue for corporations due to the increased appreciation of intellectual property rights. If a corporation gives insufficient protection to its patents, it may lose both product profits and industry competitiveness. Many studies on patent infringement have focused on measuring patent trend indicators and the monetary value of patents. However, very few studies have attempted to develop a categorization mechanism for measuring and evaluating patent infringement risk, for example, by categorizing patent infringement cases, determining the significant attributes, and deriving infringement decision rules. This study applies Rough Set Theory (RST), which is well suited to processing qualitative information, to induce rules and derive the significant attributes for categorizing patent infringement risk. Moreover, a concept hierarchy and a credibility index are integrated with RST to enhance the application of the finalized decision rules.
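
RST rule induction starts from indiscernibility classes over the condition attributes; the toy sketch below (attributes and cases invented for illustration, not taken from the study) shows how consistent classes yield certain 'if .. then ..' decision rules.

```python
# Toy sketch of RST-style rule induction: indiscernibility classes that agree on
# one decision value yield certain 'if .. then ..' rules.
# The condition attributes and cases are invented for illustration only.
from collections import defaultdict

# Each case: condition attribute values and a decision (infringement risk class).
conditions = {
    "c1": (("claim_overlap", "high"), ("same_market", "yes")),
    "c2": (("claim_overlap", "high"), ("same_market", "yes")),
    "c3": (("claim_overlap", "low"),  ("same_market", "yes")),
    "c4": (("claim_overlap", "low"),  ("same_market", "no")),
}
decision = {"c1": "high_risk", "c2": "high_risk", "c3": "medium_risk", "c4": "low_risk"}

classes = defaultdict(set)
for case, attrs in conditions.items():
    classes[attrs].add(case)

for attrs, members in classes.items():
    outcomes = {decision[c] for c in members}
    if len(outcomes) == 1:                       # consistent class -> certain rule
        premise = " AND ".join(f"{a} = {v}" for a, v in attrs)
        print(f"IF {premise} THEN risk = {outcomes.pop()}  (support {len(members)})")
```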


Author(s):  
Benjamin Griffiths

Rough Set Theory (RST), since its introduction in Pawlak (1982), continues to develop as an effective tool in data mining. Within a set-theoretical structure, its remit is closely concerned with the classification of objects to decision attribute values, based on their description by a number of condition attributes. With regard to RST, this classification is through the construction of 'if .. then ..' decision rules. The development of RST has been in many directions; among the earliest was the allowance for misclassification in the constructed decision rules, namely the Variable Precision Rough Sets model (VPRS) (Ziarko, 1993), for which recent references include Beynon (2001), Mi et al. (2004), and Slezak and Ziarko (2005). Further developments of RST have included its operation within a fuzzy environment (Greco et al., 2006) and a dominance-relation-based approach (Greco et al., 2004). The regular major international conferences 'International Conference on Rough Sets and Current Trends in Computing' (RSCTC, 2004) and 'International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing' (RSFDGrC, 2005) continue to include RST research covering the varying directions of its development. This is true also for the associated book series entitled 'Transactions on Rough Sets' (Peters and Skowron, 2005), which further includes doctoral theses on this subject. RST is still evolving, and the eclectic attitude to its development means that the definitive concomitant RST data mining techniques are still to be realised. Grzymala-Busse and Ziarko (2000), in a defence of RST, discussed a number of points relevant to data mining and also made comparisons between RST and other techniques. Within the area of data mining and the desire to identify relationships between condition attributes, the effectiveness of RST is particularly pertinent due to the inherent intent within RST-type methodologies for data reduction and feature selection (Jensen and Shen, 2005). That is, subsets of condition attributes are identified that perform the same role as all the condition attributes in a considered data set (termed β-reducts in VPRS, see later). Chen (2001) addresses this when discussing the original RST, stating that it follows a reductionist approach and is lenient to inconsistent data (contradicting condition attributes, one aspect of underlying uncertainty).

This encyclopaedia article describes and demonstrates the practical application of an RST-type methodology in data mining, namely VPRS, using nascent software initially described in Griffiths and Beynon (2005). The use of VPRS, through its relatively simple structure, outlines many of the rudiments of RST-based methodologies. The software utilised is oriented towards 'hands-on' data mining, with graphs presented that clearly elucidate 'veins' of possible information identified from β-reducts over different allowed levels of misclassification associated with the constructed decision rules (Beynon and Griffiths, 2004). Further findings are briefly reported from undertaking VPRS in a resampling environment, with leave-one-out and bootstrapping approaches adopted (Wisnowski et al., 2003). The importance of these results lies in the identification of the more influential condition attributes, pertinent to accruing the most effective data mining results.
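
The VPRS relaxation mentioned above can be stated compactly: in one common formulation an indiscernibility class E is included in the β-lower approximation of a concept X only when |E ∩ X| / |E| ≥ β, so that β = 1.0 recovers the classical lower approximation. The toy sketch below (data invented for illustration) contrasts the strict and tolerant settings.

```python
# Toy VPRS sketch: beta-lower approximation with a precision threshold.
# One common formulation: an indiscernibility class E is kept when
# |E & X| / |E| >= beta; beta = 1.0 recovers the classical lower approximation.
from collections import defaultdict

objects = {                       # object -> condition attribute values
    "o1": (("a", 1), ("b", 0)), "o2": (("a", 1), ("b", 0)),
    "o3": (("a", 1), ("b", 0)), "o4": (("a", 0), ("b", 1)),
}
decision = {"o1": "yes", "o2": "yes", "o3": "no", "o4": "no"}

ind = defaultdict(set)
for obj, attrs in objects.items():
    ind[attrs].add(obj)

def beta_lower(target_label, beta):
    target = {o for o, d in decision.items() if d == target_label}
    kept = [E for E in ind.values() if len(E & target) / len(E) >= beta]
    return set().union(*kept) if kept else set()

print(beta_lower("yes", beta=1.0))    # classical RST: empty, the class is inconsistent
print(beta_lower("yes", beta=0.6))    # VPRS: tolerates the one misclassified object o3
```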

