A Cell Counting Framework Based on Random Forest and Density Map

Cell counting is a fundamental part of biomedical and pathological research. Predicting a density map is the mainstream method for counting cells. As an easily trained model that generalizes well, the random forest is often used to learn from cell images and predict density maps. However, it cannot predict values beyond the range of the training data, which may result in underestimation. To overcome this problem, we propose a cell counting framework that predicts the density map by detecting cells. The framework contains two parts: training data preparation and the detection framework. The former ensures that cells can be detected even when they overlap, and the latter ensures that the count is accurate and robust. The proposed method uses multiple random forests to predict various probability maps, in which cells can be detected using the Hessian matrix. All detection results are then taken into consideration to obtain the density map and achieve better performance. We conducted experiments on three public cell datasets. Experimental results showed that the proposed model performs better than the traditional random forest (RF) in terms of accuracy and robustness, and is even superior to some state-of-the-art deep learning models. In particular, when the training data are small, which is the usual case in cell counting, the count errors on VGG cells and MBM cells decreased from 3.4 to 2.9 and from 11.3 to 9.3, respectively. The proposed model obtains the lowest count error and achieves state-of-the-art performance.
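The density-map idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the Gaussian kernel, the image size, and the cell centers below are assumptions chosen only to show that a density map built from one unit-mass blob per cell sums to the cell count, even when blobs overlap.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def density_map(height, width, centers, radius=3, sigma=1.0):
    """Place one unit-mass Gaussian at each annotated cell center (row, col)."""
    kernel = gaussian_kernel(radius, sigma)
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in centers:
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                # Mass falling outside the image is dropped (no renormalization).
                if 0 <= y < height and 0 <= x < width:
                    dmap[y][x] += kernel[dy + radius][dx + radius]
    return dmap

def count_from_density(dmap):
    """The estimated count is the integral (sum) of the density map."""
    return sum(sum(row) for row in dmap)

# Hypothetical annotations: two overlapping cells and one isolated cell.
centers = [(5, 5), (6, 7), (12, 14)]
dmap = density_map(20, 20, centers)
estimated_count = count_from_density(dmap)  # approximately 3, up to rounding
```

A standard regression forest predicts an average of training targets, so it cannot output a density higher than the largest one it saw during training. That is one way to read the underestimation the abstract describes for crowded regions.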


Estimating probability of banking crises using random forest

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i2.pp407-413 ◽

2021 ◽

Vol 10 (2) ◽

pp. 407-413

Author(s):

Sri Hartini ◽

Zuherman Rustam ◽

Glori Stephani Saragih ◽

María Jesús Segovia Vargas

Keyword(s):

Random Forest ◽

State Of The Art ◽

Banking Crises ◽

Training Data ◽

Annual Data ◽

Systemic Crisis ◽

Classification And Regression ◽

Systemic Crises ◽

The Impact ◽

Better Than

Banks play a crucial role in the financial system. When many banks suffer a crisis, financial instability can follow. According to their impact, banking crises can be divided into two categories, namely systemic and non-systemic crises. When a systemic crisis happens, it may cause even stable banks to go bankrupt. Hence, this paper proposes a random forest for estimating the probability of banking crises as a preventive measure. Random forest is well known as a robust technique for both classification and regression that is resistant to outliers and overfitting. The experiments were conducted using a financial crisis database containing a sample of 79 countries in the period 1981-1999 (annual data). This dataset has 521 samples, consisting of 164 crisis samples and 357 non-crisis samples. The experiments showed that using 90 percent of the data as training data delivered 0.98 accuracy, 0.92 sensitivity, 1.00 precision, and a 0.96 F1-score, the highest scores among the training percentages tested. These results are also better than those of state-of-the-art methods applied to the same dataset. Therefore, the proposed method shows promising results for predicting the probability of banking crises.
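The four scores quoted in this abstract are related by standard formulas, and the reported F1-score can be checked from the reported precision and sensitivity. The sketch below uses only the textbook definitions. The function names are illustrative, and the abstract does not report the underlying confusion-matrix counts, so none are assumed here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity (recall), precision, and F1 from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, precision, f1

def f1_score(precision, sensitivity):
    """F1 is the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Consistency check using the rounded values quoted in the abstract.
reported_f1 = round(f1_score(1.00, 0.92), 2)  # 0.96
```

With precision 1.00 and sensitivity 0.92, the harmonic mean is about 0.958, which rounds to the reported 0.96.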


A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0058 ◽

2019 ◽

Vol 2019 (4) ◽

pp. 54-71

Author(s):

Asad Mahmood ◽

Faizan Ahmad ◽

Zubair Shafiq ◽

Padmini Srinivasan ◽

Fareed Zaffar

Keyword(s):

State Of The Art ◽

Fitness Function ◽

Random Search ◽

Training Data ◽

Authorship Attribution ◽

Future Research ◽

Original Text ◽

Text Corpora ◽

Semantic Relevance ◽

Better Than

Stylometric authorship attribution aims to identify an anonymous or disputed document’s author by examining its writing style. The development of powerful machine learning based stylometric authorship attribution methods presents a serious privacy threat for individuals such as journalists and activists who wish to publish anonymously. Researchers have proposed several authorship obfuscation approaches that try to make appropriate changes (e.g. word/phrase replacements) to evade attribution while preserving semantics. Unfortunately, existing authorship obfuscation approaches are lacking because they either require some manual effort, require significant training data, or do not work for long documents. To address these limitations, we propose a genetic algorithm based random search framework called Mutant-X which can automatically obfuscate text to successfully evade attribution while keeping the semantics of the obfuscated text similar to the original text. Specifically, Mutant-X sequentially makes changes in the text using mutation and crossover techniques while being guided by a fitness function that takes into account both attribution probability and semantic relevance. While Mutant-X requires black-box knowledge of the adversary’s classifier, it does not require any additional training data and also works on documents of any length. We evaluate Mutant-X against a variety of authorship attribution methods on two different text corpora. Our results show that Mutant-X can decrease the accuracy of state-of-the-art authorship attribution methods by as much as 64% while preserving the semantics much better than existing automated authorship obfuscation approaches. While Mutant-X advances the state-of-the-art in automated authorship obfuscation, we find that it does not generalize to a stronger threat model where the adversary uses a different attribution classifier than what Mutant-X assumes. Our findings warrant the need for future research to improve the generalizability (or transferability) of automated authorship obfuscation approaches.
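The search loop described in this abstract (mutation, crossover, and a fitness function that balances attribution probability against semantic relevance) can be sketched as a toy genetic algorithm. Everything below is a stand-in rather than Mutant-X itself: the synonym table, the "signature word" classifier, the overlap-based relevance score, and the fitness weighting are all hypothetical, since the abstract does not specify them.

```python
import random

# Hypothetical synonym table; a real system would use a proper lexical resource.
SYNONYMS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "said": ["stated", "noted"],
    "use": ["employ", "apply"],
}
# Toy stand-in for the black-box classifier: these words mark the author's style.
SIGNATURE = {"big", "quick", "said", "use"}

def attribution_probability(words):
    """Toy score: fraction of words that match the author's signature set."""
    return sum(1 for w in words if w in SIGNATURE) / len(words)

def semantic_relevance(words, original):
    """Toy score: fraction of positions left unchanged from the original."""
    return sum(1 for a, b in zip(words, original) if a == b) / len(original)

def fitness(words, original, weight=0.25):
    """Higher is better: low attribution probability, high semantic relevance."""
    return (1.0 - attribution_probability(words)) + weight * semantic_relevance(words, original)

def mutate(words, rng):
    """Replace one replaceable word with a randomly chosen synonym."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    child = list(words)
    if candidates:
        i = rng.choice(candidates)
        child[i] = rng.choice(SYNONYMS[child[i]])
    return child

def crossover(a, b, rng):
    """Single-point crossover between two equal-length documents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def obfuscate(original, generations=30, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [mutate(original, rng) for _ in range(population_size)]
    for _ in range(generations):
        children = [
            mutate(crossover(rng.choice(population), rng.choice(population), rng), rng)
            for _ in range(population_size)
        ]
        # Elitist selection: keep the fittest documents among parents and children.
        ranked = sorted(population + children, key=lambda d: fitness(d, original), reverse=True)
        population = ranked[:population_size]
    return population[0]

original = "he said the big dog was quick to use the door".split()
best = obfuscate(original)
```

Because selection keeps parents alongside children, the best fitness never decreases between generations. The `weight` parameter is the assumed knob for trading evasion against meaning preservation.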


Latent Semantic Analysis using a Dennis Coefficient for English Sentiment Classification in a Parallel System

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.3.3044 ◽

2018 ◽

Vol 13 (3) ◽

pp. 408-428 ◽

Cited By ~ 4

Author(s):

Phu Vo Ngoc

Keyword(s):

Latent Semantic Analysis ◽

Semantic Analysis ◽

Sentiment Classification ◽

Training Data ◽

The Novel ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Novel Model ◽

Better Than

We have surveyed many significant approaches over many years because sentiment classification makes many crucial contributions that can be applied in everyday life, such as in political activities, commodity production, and commercial activities. We propose a novel model using Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big data sentiment classification in English. Many LSA vectors (LSAVs) have been successfully reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set based on the 5,000,000 documents of our training data set in English. This novel model uses many sentiment lexicons from our basis English sentiment dictionary (bESD). We have tested the proposed model in both a sequential environment and a distributed network system. The results of the sequential system are not as good as those of the parallel environment. We achieved 88.76% accuracy on the testing data set, which is better than the accuracies of many previous semantic analysis models. In addition, we compared the novel model with previous models, and the experimental results of our proposed model are better than those of the previous models. The results of the novel model can be widely used in many different fields, including commercial applications and surveys of sentiment classification.


Combining Variational Autoencoders & Generative Adversarial Networks to Improve Image Quality

10.31219/osf.io/8bmdu ◽

2019 ◽

Author(s):

Atin Sakkeer Hussain

Keyword(s):

Image Quality ◽

Random Noise ◽

Training Data ◽

Generative Adversarial Networks ◽

Improve Image Quality ◽

Adversarial Networks ◽

Proposed Model ◽

Variational Autoencoder ◽

Proper Training ◽

Better Than

Generative Adversarial Networks (GANs) are trained to generate images from random noise vectors, but these images often turn out poorly for any of several reasons, such as mode collapse, lack of proper training data, or lack of training. To combat this issue, this paper makes use of a Variational Autoencoder (VAE). The VAE is trained on a combination of the training data and the generated data. After this, the VAE can be used to map images generated by the GAN to better versions of them. (This is similar to denoising, but with a few variations in the image.) In addition to improving quality, the proposed model is shown to work better than normal WGANs on sparse datasets with higher variety, in an equal number of training epochs.


A Topic-Aware Reinforced Model for Weakly Supervised Stance Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017249 ◽

2019 ◽

Vol 33 ◽

pp. 7249-7256

Author(s):

Penghui Wei ◽

Wenji Mao ◽

Guandan Chen

Keyword(s):

Reinforcement Learning ◽

Opinion Mining ◽

State Of The Art ◽

Public Attitudes ◽

Representation Learning ◽

Experimental Results ◽

Training Data ◽

Policy Network ◽

Proposed Model ◽

Weakly Supervised

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, but existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address these two issues, we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning for identifying stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other’s performance. Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches.


Prediction of Enzyme Mutant Activity Using Computational Mutagenesis and Incremental Transduction

Advances in Bioinformatics ◽

10.1155/2011/958129 ◽

2011 ◽

Vol 2011 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Nada Basit ◽

Harry Wechsler

Keyword(s):

Enzyme Activity ◽

Random Forest ◽

Incremental Learning ◽

Cross Validation ◽

State Of The Art ◽

Delaunay Tessellation ◽

Computational Mutagenesis ◽

HIV-1 ◽

Over Time ◽

Better Than

Wet laboratory mutagenesis to determine enzyme activity changes is expensive and time-consuming. This paper expands on standard one-shot learning by proposing an incremental transductive method (T2bRF) for the prediction of enzyme mutant activity during mutagenesis using Delaunay tessellation and 4-body statistical potentials for representation. Incremental learning is in tune with both eScience and actual experimentation, as it accounts for cumulative annotation effects of enzyme mutant activity over time. The experimental results reported, using cross-validation, show that overall the proposed incremental transductive method, using random forest as the base classifier, yields better results than one-shot learning methods. T2bRF is shown to yield 90% on T4 and LAC (and 86% on HIV-1). This is significantly better than state-of-the-art competing methods, whose performance is at 80% or less on the same datasets.


Distant Supervision for Relation Extraction with Sentence Selection and Interaction Representation

Wireless Communications and Mobile Computing ◽

10.1155/2021/8889075 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Tiantian Chen ◽

Nianbin Wang ◽

Hongbin Wang ◽

Haomin Zhan

Keyword(s):

Large Scale ◽

Semantic Information ◽

State Of The Art ◽

Relation Extraction ◽

Semantic Features ◽

Distant Supervision ◽

Word Level ◽

Proposed Model ◽

Relation Prediction ◽

Better Than

Distant supervision (DS), which automatically generates large-scale labeled data, has been widely used for relation extraction (RE). However, it suffers from a wrong labeling problem, which affects the performance of RE. In addition, existing methods lack useful semantic features for some positive training instances. To address these problems, we propose a novel RE model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern method based on relation trigger words as a sentence selector that filters out noisy sentences to alleviate the wrong labeling problem. After clean instances are obtained, we propose an interaction representation that uses a word-level attention mechanism based on entity pairs to dynamically increase the weights of the words related to the entity pairs, which can provide more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 in F1-score on a widely used dataset, which shows that our model performs significantly better than state-of-the-art RE systems.


Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/525 ◽

2021 ◽

Author(s):

Zilu Guo ◽

Zhongqiang Huang ◽

Kenny Q. Zhu ◽

Guandan Chen ◽

Kaibo Zhang ◽

...

Keyword(s):

Machine Translation ◽

Question Answering ◽

Domain Adaptation ◽

State Of The Art ◽

Training Data ◽

Round Trip ◽

Previous State ◽

Supervised Methods ◽

Paraphrase Generation ◽

Better Than

Paraphrase generation plays a key role in NLP tasks such as question answering, machine translation, and information retrieval. In this paper, we propose a novel framework for paraphrase generation. It simultaneously decodes the output sentence using a pretrained wordset-to-sequence model and a round-trip translation model. We evaluate this framework on Quora, WikiAnswers, MSCOCO and Twitter, and show its advantage over previous state-of-the-art unsupervised and distantly supervised methods by significant margins on all datasets. For Quora and WikiAnswers, our framework even performs better than some strongly supervised methods with domain adaptation. Further, we show that the generated paraphrases can be used to augment the training data for machine translation to achieve substantial improvements.


Scalable and Generalizable Social Bot Detection through Data Selection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5460 ◽

2020 ◽

Vol 34 (01) ◽

pp. 1096-1103 ◽

Cited By ~ 16

Author(s):

Kai-Cheng Yang ◽

Onur Varol ◽

Pik-Mai Hui ◽

Filippo Menczer

Keyword(s):

Cross Validation ◽

State Of The Art ◽

Rapid Development ◽

Model Performance ◽

Training Data ◽

Model Accuracy ◽

Proposed Model ◽

Information Manipulation ◽

Bot Detection ◽

Development State

Efficient and reliable social bot classification is crucial for detecting information manipulation on social media. Despite rapid development, state-of-the-art bot detection models still face generalization and scalability challenges, which greatly limit their applications. In this paper, we propose a framework that uses minimal account metadata, enabling efficient analysis that scales up to handle the full stream of public tweets on Twitter in real time. To ensure model accuracy, we build a rich collection of labeled datasets for training and validation. We deploy a strict validation system so that model performance on unseen datasets is also optimized, in addition to traditional cross-validation. We find that strategically selecting a subset of training data yields better model accuracy and generalization than exhaustively training on all available data. Thanks to the simplicity of the proposed model, its logic can be interpreted to provide insights into social bot characteristics.


Model Simplification of Deep Random Forest for Real-Time Applications of Various Sensor Data

Sensors ◽

10.3390/s21093004 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3004

Author(s):

Sangwon Kim ◽

Byoung-Chul Ko ◽

Jaeyeal Nam

Keyword(s):

Random Forest ◽

High Performance ◽

State Of The Art ◽

Black Box ◽

Sensor Data ◽

Model Simplification ◽

Robust Performance ◽

Memory Efficiency ◽

Proposed Model ◽

Rule Set

The deep random forest (DRF) has recently gained new attention in deep learning because its performance is similar to that of a deep neural network (DNN) and it does not rely on backpropagation. However, it connects a large number of decision trees across multiple layers, thereby making analysis difficult. This paper proposes a new method for simplifying the black-box model of a DRF using rule elimination. For this, we quantify the feature contributions and frequency of the fully trained DRF in the form of a decision rule set. The feature contributions provide a basis for determining how features affect the decision process in a rule set. Model simplification is achieved by measuring the feature contributions and eliminating unnecessary rules. Consequently, the simplified and transparent DRF has fewer parameters and rules than before. The proposed method was successfully applied to various DRF models and benchmark sensor datasets, maintaining robust performance despite the elimination of a large number of rules. A comparison with state-of-the-art compressed DNNs also showed that the proposed model simplification achieves higher parameter compression and memory efficiency with similar classification accuracy.
