Weighted-Attribute Triplet Hashing for Large-Scale Similar Judicial Case Matching

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jiamin Li ◽  
Xingbo Liu ◽  
Xiushan Nie ◽  
Lele Ma ◽  
Peng Li ◽  
...  

Similar judicial case matching aims to accurately select, from multiple candidates, the judicial document most similar to a target document. Its core task is to calculate the similarity between two case fact documents. With similar judicial case matching techniques, legal professionals can promptly find and assess similar cases in a candidate set; these techniques can also benefit the development of judicial systems. However, judicial case documents are not only long but also structurally complex, and the variety and volume of judicial cases are increasing rapidly, making it difficult to find the document most similar to the target in a large corpus. In this study, we present a novel similar judicial case matching model, which obtains the weights of judicial feature attributes through hash learning and achieves fast matching using binary codes. The proposed model extracts the judicial feature attribute vector using the bidirectional encoder representations from transformers (BERT) model and subsequently obtains the weighted judicial feature attributes by learning the hash function. We further impose triplet constraints to ensure that the similarity of judicial case data is well preserved when projected into the Hamming space. Comprehensive experimental results on public datasets show that the proposed method is superior in the task of similar judicial case matching and is suitable for large-scale matching.
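To make the triplet idea concrete, the sketch below shows a minimal triplet hashing objective of the kind the abstract describes: document embeddings (e.g., from BERT) are projected to relaxed codes, a triplet margin loss preserves similarity, and sign thresholding yields binary codes for Hamming-space matching. The layer sizes, margin value, and class names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletHasher(nn.Module):
    """Illustrative hashing head: projects (e.g., BERT) embeddings to K-bit codes."""
    def __init__(self, embed_dim=768, n_bits=64):
        super().__init__()
        self.proj = nn.Linear(embed_dim, n_bits)  # learned hash function

    def forward(self, x):
        return torch.tanh(self.proj(x))  # relaxed codes in (-1, 1)

    def binarize(self, x):
        return (self.forward(x) > 0).to(torch.uint8)  # binary codes for matching

def triplet_hash_loss(anchor, positive, negative, margin=4.0):
    """Keep similar cases close and dissimilar ones apart in the relaxed code space."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

# Usage with random stand-ins for BERT embeddings of (anchor, similar, dissimilar) cases.
hasher = TripletHasher()
a, p, n = (torch.randn(8, 768) for _ in range(3))
loss = triplet_hash_loss(hasher(a), hasher(p), hasher(n))
loss.backward()
```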

Author(s):  
Mitsuhiro Kimura ◽  
Shigeru Yamada

It is of great importance for software engineers and managers to evaluate software testing-progress in a large-scale software production process, since tremendous software development resources must be consumed to achieve high quality and reliability of a software product. By focusing on the behavior of the digested test-case data observed in the testing process, we construct a stochastic model and derive several quantitative measures for software testing-progress evaluation. Actual data observed in the testing process are analyzed with the proposed model, and we discuss its applicability.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ming Cheng ◽  
Shufeng Xiong ◽  
Fei Li ◽  
Pan Liang ◽  
Jianbo Gao

Abstract Background: Named entity recognition (NER) on Chinese electronic medical/healthcare records has attracted significant attention, as it can be applied to build applications that understand these records. Most previous methods have been purely data-driven, requiring high-quality, large-scale labeled medical data. However, labeled data are expensive to obtain, and such data-driven methods struggle to handle rare and unseen entities. Methods: To tackle these problems, this study presents a novel multi-task deep neural network model for Chinese NER in the medical domain. We incorporate dictionary features into the neural network, and a general secondary named entity segmentation task is used as an auxiliary task to improve the performance of the primary named entity recognition task. Results: To evaluate the proposed method, we compare it with other currently popular methods on three benchmark datasets. Two of the datasets are publicly available, and the third was constructed by us. Experimental results show that the proposed model achieves a 91.07% average f-measure on the two public datasets and an 87.05% f-measure on the private dataset. Conclusions: The comparison with different models demonstrates the effectiveness of our approach: the proposed model outperforms traditional statistical models.
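One way to read the multi-task setup is sketched below: a shared encoder consumes token embeddings concatenated with dictionary features and feeds both a primary NER head and an auxiliary segmentation head, trained with a weighted joint loss. All sizes, the BiLSTM encoder choice, and the auxiliary loss weight are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultiTaskNER(nn.Module):
    """Illustrative multi-task tagger: a shared encoder feeds a primary NER head
    and an auxiliary entity-segmentation head (hypothetical sizes throughout)."""
    def __init__(self, vocab_size=5000, dict_feat_dim=16, hidden=128,
                 n_ner_tags=13, n_seg_tags=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 100)
        # Dictionary features are concatenated to token embeddings.
        self.encoder = nn.LSTM(100 + dict_feat_dim, hidden // 2,
                               batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(hidden, n_ner_tags)   # primary task
        self.seg_head = nn.Linear(hidden, n_seg_tags)   # auxiliary task

    def forward(self, tokens, dict_feats):
        x = torch.cat([self.embed(tokens), dict_feats], dim=-1)
        h, _ = self.encoder(x)
        return self.ner_head(h), self.seg_head(h)

# Joint training step on random stand-in data (batch of 2, sequence length 10).
model = MultiTaskNER()
tokens = torch.randint(0, 5000, (2, 10))
dict_feats = torch.rand(2, 10, 16)
ner_logits, seg_logits = model(tokens, dict_feats)
ner_gold = torch.randint(0, 13, (2, 10))
seg_gold = torch.randint(0, 4, (2, 10))
ce = nn.CrossEntropyLoss()
loss = ce(ner_logits.flatten(0, 1), ner_gold.flatten()) \
     + 0.5 * ce(seg_logits.flatten(0, 1), seg_gold.flatten())  # weighted auxiliary loss
loss.backward()
```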


2020 ◽  
Vol 48 (W1) ◽  
pp. W463-W471
Author(s):  
Xiaoqing Guan ◽  
Meng Cai ◽  
Yang Du ◽  
Ence Yang ◽  
Jiafu Ji ◽  
...  

Abstract Recent large-scale multi-omics studies have quickly accumulated an overwhelming amount of cancer-related data, providing an unprecedented resource for interrogating diverse questions. While certain existing web servers are valuable and widely used, analysis and visualization functions for re-investigating these data at the cohort level are not adequately addressed. Here, we present CVCDAP, a web-based platform delivering an interactive and customizable off-the-shelf toolbox for cohort-level analysis of TCGA and CPTAC public datasets, as well as user-uploaded datasets. CVCDAP allows flexible selection of patients sharing common molecular and/or clinical characteristics across multiple studies as a virtual cohort, and provides dozens of built-in customizable tools for seamless genomic, transcriptomic, proteomic, and clinical analysis of a single virtual cohort, as well as tools to compare two related virtual cohorts. The flexibility and analytic competence of CVCDAP empower experimental and clinical researchers to identify new molecular mechanisms and develop potential therapeutic approaches by building and analyzing virtual cohorts for their subjects of interest. We demonstrate through two applications that CVCDAP can conveniently reproduce published findings and reveal novel insights. The CVCDAP web server is freely available at https://omics.bjcancer.org/cvcdap/.


2020 ◽  
Vol 39 (3) ◽  
pp. 4453-4462
Author(s):  
Yuanyuan Wang ◽  
Xiang Li ◽  
Mingxin Jiang ◽  
Haiyan Zhang ◽  
E Tang

At present, supervised person re-identification methods achieve high identification performance. However, real application scenarios involve many cross-camera views with unlabeled data, and the high cost of labeling data greatly reduces the transferability of supervised models to other scene domains. Therefore, unsupervised learning of person re-identification is more attractive in the real world. In addition, owing to changes in camera angle, illumination, and posture, the image representation of the same person generally differs across camera views, but existing algorithms ignore the differences among cross-camera images caused by camera parameters and environments. To overcome these problems, we propose an unsupervised metric learning method for person re-identification. The model learns a shared space to reduce the discrepancy among different cameras, and a graph convolutional network is further employed to cluster the cross-view image features extracted from the shared space. Our model improves the scalability of person re-identification in practical application scenarios. Extensive experiments on four large-scale person re-identification public datasets demonstrate the effectiveness of the proposed model.
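To illustrate the clustering stage, the sketch below builds a k-nearest-neighbour affinity graph over stand-in shared-space features and groups them into pseudo-identities with spectral clustering. This deliberately substitutes off-the-shelf spectral clustering for the paper's graph convolutional network; the feature dimension, neighbourhood size, and cluster count are all invented for illustration.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 128))  # stand-in shared-space features (300 images)

# Symmetric k-NN affinity graph over the shared space.
knn = kneighbors_graph(feats, n_neighbors=10, mode="connectivity")
affinity = 0.5 * (knn + knn.T).toarray()

# Cluster graph nodes into pseudo-identities, usable as pseudo-labels.
labels = SpectralClustering(n_clusters=10, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels[:20])  # pseudo-identity assignments for the first 20 images
```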


1996 ◽  
Vol 76 (06) ◽  
pp. 0939-0943 ◽  
Author(s):  
B Boneu ◽  
G Destelle ◽  

Summary: The anti-aggregating activity of five rising doses of clopidogrel has been compared to that of ticlopidine in atherosclerotic patients. The aim of this study was to determine the dose of clopidogrel to be tested in a large-scale clinical trial of secondary prevention of ischemic events in patients suffering from vascular manifestations of atherosclerosis [the CAPRIE (Clopidogrel vs Aspirin in Patients at Risk of Ischemic Events) trial]. A multicenter study involving 9 haematological laboratories and 29 clinical centers was set up. One hundred and fifty ambulatory patients were randomized into one of seven groups: clopidogrel at doses of 10, 25, 50, 75 or 100 mg OD, ticlopidine 250 mg BID, or placebo. ADP- and collagen-induced platelet aggregation tests were performed before starting treatment and after 7 and 28 days. Bleeding time was measured on days 0 and 28. Patients were seen on days 0, 7 and 28 to check the clinical and biological tolerability of the treatment. Clopidogrel exerted a dose-related inhibition of ADP-induced platelet aggregation and prolongation of bleeding time. In the presence of ADP (5 μM), this inhibition ranged between 29% and 44% in comparison to pretreatment values. Bleeding times were prolonged by 1.5 to 1.7 times. These effects were not significantly different from those produced by ticlopidine. Clinical tolerability was good or fair in 97.5% of the patients. No haematological adverse events were recorded. These results allowed the selection of 75 mg once a day to evaluate and compare the antithrombotic activity of clopidogrel to that of aspirin in the CAPRIE trial.


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems involving people of various skills and motivation in the information processing process are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate ones and to promote diligent ones. Purpose: To develop a method of assessing the expected contributor's quality in community tagging systems, using only the generally unreliable and incomplete information provided by contributors (with ground truth tags unknown). Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing the expected contributor's quality. The method is based on comparing the tag sets provided by different contributors for the same images; it is a modification of the pairwise comparison method with the preference relation replaced by a special domination characteristic. Expected contributors' quality is evaluated as a positive eigenvector of the pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on coordinated community efforts (primarily, community tagging systems).
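The eigenvector step can be illustrated in a few lines: given a hypothetical pairwise domination matrix D, where D[i, j] reflects how strongly contributor i dominates contributor j, the expected qualities are taken as the principal positive eigenvector, computed here by power iteration. The matrix values below are invented for illustration and are not the paper's domination characteristic.

```python
import numpy as np

# Hypothetical pairwise domination characteristics for 4 contributors:
# D[i, j] > D[j, i] means contributor i tends to dominate contributor j.
D = np.array([[0.0, 0.7, 0.8, 0.6],
              [0.3, 0.0, 0.6, 0.5],
              [0.2, 0.4, 0.0, 0.4],
              [0.4, 0.5, 0.6, 0.0]])

def principal_eigenvector(M, iters=100):
    """Power iteration: converges to the positive eigenvector of a
    primitive non-negative matrix, per the Perron-Frobenius theorem."""
    v = np.ones(M.shape[0]) / M.shape[0]
    for _ in range(iters):
        v = M @ v
        v /= v.sum()  # normalize so the quality scores sum to 1
    return v

quality = principal_eigenvector(D)
print(quality)  # higher value = higher expected contributor quality
```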


2020 ◽  
Vol 15 ◽  
Author(s):  
Shulin Zhao ◽  
Ying Ju ◽  
Xiucai Ye ◽  
Jun Zhang ◽  
Shuguang Han

Background: Bioluminescence is a unique and significant phenomenon in nature. It is important for the lifecycle of some organisms and is valuable in biomedical research, including gene expression analysis and bioluminescence imaging technology. In recent years, researchers have proposed a number of methods for predicting bioluminescent proteins (BLPs); their accuracy has increased but can be further improved. Method: In this paper, we propose a new bioluminescent protein prediction method based on a voting algorithm. We used four feature extraction methods based on the amino acid sequence, extracting 314-dimensional features in total from amino acid composition, physicochemical properties, and k-spaced amino acid pair composition. To obtain the highest MCC value and establish the optimal prediction model, we then used a voting algorithm to build the model; to create the best-performing model, we discuss the selection of base classifiers and vote-counting rules. Results: Our proposed model achieved 93.4% accuracy, 93.4% sensitivity, and 91.7% specificity on the test set, which was better than any other method. We also improved a previous prediction of bioluminescent proteins in three lineages using our model-building method, resulting in greatly improved accuracy.
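A minimal sketch of such a voting ensemble follows, using scikit-learn with random stand-in data in place of the 314-dimensional sequence features. The particular base classifiers and the hard-voting rule are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 314))    # stand-in for 314-dim sequence features
y = rng.integers(0, 2, size=600)   # 1 = bioluminescent protein, 0 = not
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Majority (hard) voting over three dissimilar base classifiers.
ensemble = VotingClassifier(
    estimators=[("svm", SVC()),
                ("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard")
ensemble.fit(X_tr, y_tr)
print("MCC:", matthews_corrcoef(y_te, ensemble.predict(X_te)))
```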


2020 ◽  
Author(s):  
Anusha Ampavathi ◽  
Vijaya Saradhi T

Big data and its approaches are generally helpful in the healthcare and biomedical sectors for predicting disease. For trivial symptoms, it is difficult to see a doctor in the hospital at any time; big data thus provides essential information about diseases on the basis of a patient's symptoms. For many medical organizations, disease prediction is important for making the best feasible health care decisions. Conversely, the conventional medical care model offers structured input, which requires more accurate and consistent prediction. This paper develops multi-disease prediction using an improved deep learning concept. Different datasets pertaining to diabetes, hepatitis, lung cancer, liver tumor, heart disease, Parkinson's disease, and Alzheimer's disease are gathered from the benchmark UCI repository for the experiments. The proposed model involves three phases: (a) data normalization, (b) weighted normalized feature extraction, and (c) prediction. Initially, each dataset is normalized to bring its attributes into a common range. Next, weighted feature extraction is performed, in which a weight function is multiplied with each attribute value to emphasize large-scale deviations. The weight function is optimized using a combination of two meta-heuristic algorithms, termed the Jaya Algorithm-based Multi-Verse Optimization (JA-MVO) algorithm. The optimally extracted features are fed to hybrid deep learning algorithms, namely a Deep Belief Network (DBN) and a Recurrent Neural Network (RNN). As a modification to the hybrid deep learning architecture, the weights of both the DBN and the RNN are optimized using the same hybrid optimization algorithm. A comparative evaluation of the proposed prediction approach against existing models certifies its effectiveness through various performance measures.
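The first two phases can be sketched in a few lines: min-max normalization brings each attribute into [0, 1], and a per-attribute weight vector scales each column. In the paper this weight vector is optimized by JA-MVO; random values stand in for it here, so this is an illustrative reading of the pipeline rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=12, size=(200, 8))  # stand-in patient records

# Phase (a): min-max normalization so every attribute lies in [0, 1].
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min)

# Phase (b): weighted normalized features; in the paper the weight vector
# is optimized by the hybrid JA-MVO algorithm, here it is random.
w = rng.uniform(0.5, 1.5, size=X.shape[1])
X_weighted = X_norm * w  # each attribute value multiplied by its weight

print(X_weighted[:3].round(3))  # features passed on to phase (c), prediction
```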


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1670
Author(s):  
Waheeb Abu-Ulbeh ◽  
Maryam Altalhi ◽  
Laith Abualigah ◽  
Abdulwahab Ali Almazroi ◽  
Putra Sumari ◽  
...  

Cyberstalking is a growing anti-social problem that is spreading on a large scale and in various forms. Cyberstalking detection has become increasingly popular in recent years and has been investigated by many researchers from a technical perspective. However, cyberstalking victimization, an essential part of cyberstalking, has received less empirical attention from the research community. This paper attempts to address this gap and develops a model to understand and estimate the prevalence of cyberstalking victimization. The model is built on routine activities and lifestyle exposure theories and includes eight hypotheses. The data were collected from 757 respondents at Jordanian universities. The paper utilizes a quantitative approach and uses structural equation modeling for data analysis. The results revealed a modest prevalence range that depends largely on the type of cyberstalking. The results also indicated that proximity to motivated offenders, suitable targets, and digital guardians significantly influences cyberstalking victimization. Moderation hypothesis testing demonstrated that age and residence have a significant effect on cyberstalking victimization. The proposed model is an essential element for assessing cyberstalking victimization in societies and provides a valuable understanding of its prevalence. It can assist researchers and practitioners in future research on cyberstalking victimization.


2021 ◽  
Vol 13 (6) ◽  
pp. 3571
Author(s):  
Bogusz Wiśnicki ◽  
Dorota Dybkowska-Stefek ◽  
Justyna Relisko-Rybak ◽  
Łukasz Kolanda

The paper responds to research problems related to the implementation of large-scale investment projects in waterways in Europe. As part of design and construction works, it is necessary to identify river ports that play a major role within the European transport network as intermodal nodes. This entails a number of challenges, the cardinal one being the optimal selection of port locations, taking into account the new transport, economic, and geopolitical situation that will be brought about by modernized waterways. The aim of the paper was to present an original methodology for determining port locations along modernized waterways based on non-cost criteria, as an extended multicriteria decision-making (MCDM) method employing GIS (Geographic Information System)-based tools for spatial analysis. The methodology was designed to be applicable to the varying conditions of a river's hydroengineering structures (free-flowing river, canalized river, and canals) and adjustable to the requirements posed by intermodal supply chains. The method was applied to study the Odra River Waterway, which allowed the formulation of recommendations regarding its application to different river sections at every stage of the research process.
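As a toy illustration of multicriteria scoring of candidate port locations, the sketch below normalizes several non-cost criteria and ranks candidates by a weighted sum. The criteria, weights, and values are invented; the paper's actual method is an extended MCDM approach combined with GIS-based spatial analysis.

```python
import numpy as np

# Hypothetical candidate river ports scored on three non-cost criteria
# (network connectivity, hinterland demand, intermodal readiness), 0-10 scale.
candidates = ["Port A", "Port B", "Port C", "Port D"]
scores = np.array([[7.0, 5.5, 8.0],
                   [6.0, 8.0, 6.5],
                   [8.5, 6.0, 5.0],
                   [5.0, 7.5, 7.0]])
weights = np.array([0.4, 0.35, 0.25])  # illustrative criterion weights, sum to 1

# Normalize each criterion column to [0, 1], then take the weighted sum.
norm = (scores - scores.min(axis=0)) / (scores.max(axis=0) - scores.min(axis=0))
ranking = norm @ weights
for name, s in sorted(zip(candidates, ranking), key=lambda t: -t[1]):
    print(f"{name}: {s:.3f}")
```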

