A Novel Evolutionary Biclustering Approach using MapReduce (EBC-MR)

Author(s):  
Rathipriya R.

This paper proposes a novel biclustering approach that uses the MapReduce framework to cluster data (such as web data or gene expression data) into local patterns. The approach extracts highly coherent biclusters using a correlation measure called the Average Correlation Value (ACV). Furthermore, a MapReduce-based genetic algorithm is applied to the biclustering of web data for the first time. This method largely avoids the local convergence common in optimization algorithms. The MSWeb and MSNBC web usage data sets are used to test the performance of the new MapReduce-based evolutionary biclustering algorithm. An experimental study compares the proposed algorithm with a traditional genetic algorithm for biclustering. The results reveal that the proposed approach performs better than the existing evolutionary biclustering approach.
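As a rough illustration of the coherence measure named in this abstract, the sketch below computes an Average Correlation Value for a small bicluster. It assumes the commonly used definition (the larger of the mean absolute pairwise Pearson correlation between rows and between columns, diagonal excluded); the abstract does not give the formula, so the function names and the handling of constant vectors are assumptions, and the MapReduce and genetic algorithm parts are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a constant vector contributes zero correlation.
        return 0.0
    return sxy / sqrt(sxx * syy)


def mean_abs_correlation(vectors):
    """Mean |r| over all ordered pairs of distinct vectors."""
    m = len(vectors)
    if m < 2:
        return 0.0
    total = sum(
        abs(pearson(vectors[i], vectors[j]))
        for i in range(m)
        for j in range(m)
        if i != j
    )
    return total / (m * m - m)


def average_correlation_value(bicluster):
    """ACV of a bicluster given as a list of equal-length rows."""
    rows = [list(r) for r in bicluster]
    cols = [list(c) for c in zip(*rows)]
    return max(mean_abs_correlation(rows), mean_abs_correlation(cols))
```

Under this definition a value near 1 indicates a highly coherent bicluster; for example, rows that are scaled copies of one another score 1.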

2012 ◽  
Vol 182-183 ◽  
pp. 2100-2104 ◽  
Author(s):  
Yao Tang Lin ◽  
Jia Li Hou

This paper proposes a specialized genetic algorithm (GA) based on an expended relational representation, named weight-based encoding, for solving the one-dimensional bin packing problem (BPP-1). The encoding provides a complete constraint-handling scheme that addresses general and specific constraints while naturally eliminating the redundancy and infeasibility of previous representations for BPP-1. Experiments are performed on problem instances from a benchmark data set using the specially coded genetic algorithm with one-point, two-point, and grouping crossovers. Experimental results show that the proposed methodology performs well on the tested benchmark instances. In addition, the results show that two-point and grouping crossovers work better than one-point crossover in these experiments.


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. This paper considers the estimation of its probability density function and cumulative distribution function using five estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation on the basis of mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS, and PC estimators. Finally, a real data set is analyzed.
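For readers unfamiliar with the distribution, the sketch below evaluates the two functions being estimated. It assumes the common two-parameter form with shape `alpha` and scale `lam`, where F(x) = 1 - (1 - exp(-lam/x))^alpha for x > 0; the abstract does not state its parameterization, so the function names and parameter names are illustrative, and none of the five estimators is implemented here.

```python
from math import exp


def gied_cdf(x, alpha, lam):
    """CDF of the generalized inverted exponential distribution (assumed form)."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 - exp(-lam / x)) ** alpha


def gied_pdf(x, alpha, lam):
    """PDF obtained by differentiating gied_cdf with respect to x."""
    if x <= 0:
        return 0.0
    e = exp(-lam / x)
    return alpha * lam / (x * x) * e * (1.0 - e) ** (alpha - 1.0)
```

With `alpha = 1` this form reduces to the inverted exponential distribution, F(x) = exp(-lam/x), which gives a quick sanity check on the implementation.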


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Compared with antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections. With the use of antibiotics, the problem of drug resistance becomes more serious. Therefore, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: This article aims to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful. However, detecting the type of a cell lytic enzyme by experimental methods is time consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of cell lytic enzyme. Results: The experimental results show that the overall accuracy of the proposed method reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06-92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%; using the tripeptide optimal feature set thus improves the overall accuracy by nearly 18 percentage points. Conclusion: This paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
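To make the feature representation and the reported arithmetic concrete, the sketch below builds a tripeptide composition vector and reproduces the relative improvement (97.06-92.9)/92.9. It assumes the usual 20-letter amino acid alphabet (8000 possible tripeptides) and overlapping windows; the function names are hypothetical, and the confidence-based feature selection and the support vector machine are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed standard 20-letter alphabet
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 20**3 = 8000
TRIPEPTIDE_INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Relative frequency of each tripeptide over overlapping windows."""
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 2):
        idx = TRIPEPTIDE_INDEX.get(sequence[i:i + 3])
        if idx is not None:  # skip windows containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


def relative_improvement(new_accuracy, old_accuracy):
    """Relative gain in percent, i.e. (new - old) / old * 100."""
    return (new_accuracy - old_accuracy) / old_accuracy * 100.0
```

Calling `relative_improvement(97.06, 92.9)` gives about 4.48, matching the "nearly 4.5%" figure, whereas the gap between 94.12% and 76.2% is 17.92 percentage points in absolute terms.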


Atmosphere ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 687
Author(s):  
Salman Sakib ◽  
Dawit Ghebreyesus ◽  
Hatim O. Sharif

Tropical Storm Imelda struck the southeast coastal regions of Texas from 17 to 19 September 2019 and delivered precipitation above 500 mm over about 6000 km². The performance of the three IMERG (Early-, Late-, and Final-run) GPM satellite-based precipitation products was evaluated against Stage-IV radar precipitation estimates. Basic and probabilistic statistical metrics, such as CC, RMSE, RBIAS, POD, FAR, CSI, and PSS, were employed to assess the performance of the IMERG products. The products captured the event adequately, with a fairly high POD value of 0.9. The best product (Early-run) showed an average correlation coefficient of 0.60. The algorithm used to produce the Final-run improved the quality of the data by removing systematic errors that occurred in the near-real-time products. All three IMERG products had an RMSE below 5 mm over roughly three-quarters (73% to 76%) of the area in estimating Tropical Storm Imelda precipitation. The Early-run product showed a much better RBIAS than the Final-run product. The overall performance was poor, as areas with an acceptable RBIAS (i.e., between −10% and 10%) made up only 16% to 17% of the total area in all three IMERG products. Overall, the Early-run product was found to be better than the Late- and Final-run products.
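Several of the metrics listed in this abstract have standard contingency-table definitions, sketched below for paired satellite and reference values. The rain/no-rain threshold, the function names, and the input layout are assumptions made for illustration; the paper's exact thresholds and aggregation are not given in the abstract, and CC and PSS are omitted.

```python
from math import sqrt


def categorical_scores(satellite, reference, threshold):
    """POD, FAR and CSI from paired values, using `threshold` to define rain."""
    hits = misses = false_alarms = 0
    for s, r in zip(satellite, reference):
        sat_rain = s >= threshold
        ref_rain = r >= threshold
        if sat_rain and ref_rain:
            hits += 1
        elif ref_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
    nan = float("nan")
    detected = hits + misses
    alarms = hits + false_alarms
    events = hits + misses + false_alarms
    return {
        "POD": hits / detected if detected else nan,    # probability of detection
        "FAR": false_alarms / alarms if alarms else nan,  # false alarm ratio
        "CSI": hits / events if events else nan,        # critical success index
    }


def rbias_percent(satellite, reference):
    """Relative bias in percent; requires a non-zero reference total."""
    return 100.0 * (sum(satellite) - sum(reference)) / sum(reference)


def rmse(satellite, reference):
    """Root mean squared error between paired values."""
    n = len(reference)
    return sqrt(sum((s - r) ** 2 for s, r in zip(satellite, reference)) / n)
```

Under these definitions, an RBIAS between −10 and 10 corresponds to the "acceptable range" mentioned in the abstract, and a POD of 0.9 means 90% of reference rain events were also detected by the satellite product.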


Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 924
Author(s):  
Zhenzhen Huang ◽  
Qiang Niu ◽  
Ilsun You ◽  
Giovanni Pau

Wearable devices for human body monitoring have broad applications in smart homes, sports, security, and other fields. They provide an extremely convenient way to collect large amounts of human motion data. This paper studies a method for extracting human body acceleration features from wearable-device data. First, a Butterworth filter is used to filter the data. Then, to ensure that feature values are extracted more accurately, abnormal data are removed from the source. The paper combines the Kalman filter algorithm with a genetic algorithm, using the genetic algorithm to encode the parameters of the Kalman filter. Standard Deviation (SD), Interval of Peaks (IoP), and Difference between Adjacent Peaks and Troughs (DAPT) are used to analyze seven kinds of acceleration. Finally, the SisFall data set, a globally available data set for study and experiments, is used to verify the effectiveness of the method. Based on the simulation results, we conclude that the method can clearly distinguish different activities.


2019 ◽  
Vol 2019 ◽  
pp. 1-15
Author(s):  
Jingtian Zhang ◽  
Fuxing Yang ◽  
Xun Weng

A robotic mobile fulfilment system (RMFS) is an efficient and flexible order picking system in which robots carry movable shelves holding items to the picking stations. This innovative parts-to-picker system, known as the Kiva system, is especially suited to e-commerce fulfilment centres and has been widely used in practice. However, RMFS involves many resource allocation problems. The robot allocation problem, which decides which robot is assigned to a delivery task, has a significant impact on the productivity of the whole system. We model this problem as a resource-constrained project scheduling problem with transfer times (RCPSPTT) based on an accurate analysis of the driving and delivering behaviour of robots. A dedicated serial schedule generation scheme and a genetic algorithm using a building-blocks-based crossover (BBX) operator are proposed to solve this problem. The designed algorithm can be integrated into a dynamic scheduling structure or used as the basis of calculation for other allocation problems. Experimental instances are generated based on the characteristics of RMFS, and the computational results show that the proposed algorithm outperforms the traditional rule-based scheduling method. The BBX operator is fast and efficient, and it performs better than several classic and competitive crossover operators.


1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. The performance of both networks was compared with the multiple linear regression (MLR) method of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra, each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra, each having 840 spectral data points. The fast Fourier transform was applied to data set B to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave the best results, followed by the back-propagation network and then the linear network. The true performance of the linear regression model was better than that of the back-propagation and linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result, followed by MLR and the linear network. The linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than that of the MLR and linear network by 35.14%.
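The compression step for data set B can be pictured with the sketch below, which reduces a spectrum to its first few discrete Fourier coefficients. It assumes that the 13 retained coefficients are the lowest-frequency ones and uses a direct transform from the standard library instead of a fast algorithm; the abstract does not say which coefficients were kept or how complex values were passed to the networks, so the function names and these choices are hypothetical.

```python
import cmath


def dft_coefficients(signal, n_coeffs):
    """First `n_coeffs` discrete Fourier coefficients of a real sequence."""
    n = len(signal)
    coeffs = []
    for k in range(n_coeffs):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        coeffs.append(acc)
    return coeffs


def compress_spectrum(spectrum, n_coeffs=13):
    """Reduce a spectrum (e.g. 840 points) to `n_coeffs` low-frequency terms."""
    return dft_coefficients(spectrum, n_coeffs)
```

A direct transform costs on the order of 840 × 13 multiplications per spectrum here, which is small enough that a fast algorithm is a convenience and not a necessity for a sketch like this.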


2010 ◽  
Vol 26-28 ◽  
pp. 620-624 ◽  
Author(s):  
Zhan Wei Du ◽  
Yong Jian Yang ◽  
Yong Xiong Sun ◽  
Chi Jun Zhang ◽  
Tuan Liang Li

This paper presents a modified ant colony algorithm (ACA) called the route-update ant colony algorithm (RUACA). The research focuses on improving computational efficiency for the travelling salesman problem (TSP). A new impact factor is introduced and shown to be effective in reducing the convergence time of RUACA. To assess RUACA's performance, a simply supported data set of cities, which was used as the source data in previous research with the traditional ACA and a genetic algorithm (GA), is chosen as a benchmark case study. Compared with the ACA and GA results, the presented RUACA is shown to solve the TSP successfully. The results of the proposed algorithm are found to be satisfactory.

