SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with unparalleled generalization performance

2019 ◽  
Author(s):  
Hui Kwon Kim ◽  
Younggwang Kim ◽  
Sungtae Lee ◽  
Seonwoo Min ◽  
Jung Yoon Bae ◽  
...  

We evaluated SpCas9 activities at 12,832 target sequences using a high-throughput approach based on a human cell library containing sgRNA-encoding and target sequence pairs. Deep learning-based training on this large data set of SpCas9-induced indel frequencies led to the development of a SpCas9-activity predicting model named DeepSpCas9. When tested against independently generated data sets (our own and those published by other groups), DeepSpCas9 showed unprecedentedly high generalization performance. DeepSpCas9 is available at http://deepcrispr.info/DeepCas9.

2019 ◽  
Vol 5 (11) ◽  
pp. eaax9249 ◽  
Author(s):  
Hui Kwon Kim ◽  
Younggwang Kim ◽  
Sungtae Lee ◽  
Seonwoo Min ◽  
Jung Yoon Bae ◽  
...  

We evaluated SpCas9 activities at 12,832 target sequences using a high-throughput approach based on a human cell library containing single-guide RNA–encoding and target sequence pairs. Deep learning–based training on this large dataset of SpCas9-induced indel frequencies led to the development of a SpCas9 activity–predicting model named DeepSpCas9. When tested against independently generated datasets (our own and those published by other groups), DeepSpCas9 showed high generalization performance. DeepSpCas9 is available at http://deepcrispr.info/DeepSpCas9.


Sensors ◽  
2020 ◽  
Vol 20 (1) ◽  
pp. 322 ◽  
Author(s):  
Faraz Malik Awan ◽  
Yasir Saleem ◽  
Roberto Minerva ◽  
Noel Crespi

Machine/Deep Learning (ML/DL) techniques have been applied to large data sets to extract relevant information and to make predictions. The performance and the outcomes of different ML/DL algorithms may vary depending upon the data sets being used, as well as on the suitability of the algorithms to the data and the application domain under consideration. Hence, determining which ML/DL algorithm is most suitable for a specific application domain and its related data sets would be a key advantage. To respond to this need, a comparative analysis of well-known ML/DL techniques, including Multilayer Perceptron, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and Voting Classifier (or the Ensemble Learning Approach), has been conducted for the prediction of parking space availability. This comparison utilized Santander’s parking data set, initiated while working on the H2020 WISE-IoT project. The data set was used to evaluate the considered algorithms and to determine the one offering the best prediction. The results of this analysis show that, regardless of the data set size, less complex algorithms such as Decision Tree, Random Forest, and KNN outperform complex algorithms such as Multilayer Perceptron in terms of prediction accuracy, while providing comparable information for the prediction of parking space availability. In addition, this paper provides Top-K parking space recommendations based on the distance between the current position of a vehicle and free parking spots.
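The Top-K recommendation step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the distance metric or the data layout, so the planar Euclidean distance, the `top_k_free_spots` function name, and the dictionary fields (`id`, `position`, `free`) are all assumptions made for illustration, not the authors' implementation.

```python
import heapq
import math


def top_k_free_spots(vehicle, spots, k=3):
    """Return the k free spots closest to the vehicle, nearest first.

    vehicle: (x, y) position of the vehicle (assumed planar coordinates).
    spots:   list of dicts with hypothetical keys "id", "position", "free".
    """
    # Only spots currently reported as free are candidates.
    free = [s for s in spots if s["free"]]
    # nsmallest avoids sorting the whole list when k is small.
    return heapq.nsmallest(
        k, free, key=lambda s: math.dist(vehicle, s["position"])
    )


spots = [
    {"id": "A", "position": (0, 6), "free": True},
    {"id": "B", "position": (1, 1), "free": False},
    {"id": "C", "position": (3, 4), "free": True},
    {"id": "D", "position": (0, 1), "free": True},
]
nearest = top_k_free_spots((0, 0), spots, k=2)
```

For real street networks, a road-distance or great-circle metric would likely be more appropriate than the straight-line distance assumed here.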


2009 ◽  
Vol 14 (10) ◽  
pp. 1236-1244 ◽  
Author(s):  
Swapan Chakrabarti ◽  
Stan R. Svojanovsky ◽  
Romana Slavik ◽  
Gunda I. Georg ◽  
George S. Wilson ◽  
...  

Artificial neural networks (ANNs) are trained using high-throughput screening (HTS) data to recover active compounds from a large data set. Improved classification performance was obtained on combining predictions made by multiple ANNs. The HTS data, acquired from a methionine aminopeptidases inhibition study, consisted of a library of 43,347 compounds, and the ratio of active to nonactive compounds, RA/N, was 0.0321. Back-propagation ANNs were trained and validated using principal components derived from the physicochemical features of the compounds. On selecting the training parameters carefully, an ANN recovers one-third of all active compounds from the validation set with a 3-fold gain in the RA/N value. Further gains in RA/N values were obtained upon combining the predictions made by a number of ANNs. The generalization property of the back-propagation ANNs was used to train those ANNs with the same training samples, after being initialized with different sets of random weights. As a result, only 10% of all available compounds were needed for training and validation, and the rest of the data set was screened with more than a 10-fold gain of the original RA/N value. Thus, ANNs trained with limited HTS data might become useful in recovering active compounds from large data sets.
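The enrichment arithmetic behind the reported fold gains can be sketched as follows. Only the library size (43,347) and the baseline active-to-nonactive ratio (0.0321) come from the abstract; the function names and the example counts for a selected subset are hypothetical and chosen purely for illustration.

```python
def active_ratio(n_active, n_nonactive):
    """Ratio of active to nonactive compounds in a set."""
    return n_active / n_nonactive


def actives_from_ratio(total, ratio):
    """Number of actives implied by a set size and an active/nonactive ratio.

    With A actives and N nonactives, A / N = ratio and A + N = total,
    so A = total * ratio / (1 + ratio).
    """
    return total * ratio / (1 + ratio)


def fold_gain(selected_active, selected_nonactive, baseline_ratio):
    """How many times richer in actives a selected subset is than the baseline."""
    return active_ratio(selected_active, selected_nonactive) / baseline_ratio


BASELINE = 0.0321          # ratio reported in the abstract
LIBRARY_SIZE = 43_347      # library size reported in the abstract

implied_actives = actives_from_ratio(LIBRARY_SIZE, BASELINE)

# Hypothetical subset picked by a classifier: 321 actives, 1000 nonactives.
example_gain = fold_gain(321, 1000, BASELINE)
```

Under these definitions, a subset whose active-to-nonactive ratio is ten times the baseline corresponds to the "10-fold gain" wording used in the abstract.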


2009 ◽  
Vol 15 (4) ◽  
pp. 670-681 ◽  
Author(s):  
David Mayerich ◽  
John Keyser

We present a framework for segmenting and storing filament networks from scalar volume data. Filament networks are encountered more and more commonly in biomedical imaging due to advances in high-throughput microscopy. These data sets are characterized by a complex volumetric network of thin filaments embedded in a scalar volume field. High-throughput microscopy volumes are also difficult to manage since they can require several terabytes of storage, even though the total volume of the embedded structure is much smaller. Filaments in microscopy data sets are difficult to segment because their diameter is often near the sampling resolution of the microscope, yet these networks can span large regions of the data set. We describe a novel method to trace filaments through scalar volume data sets that is robust to both noisy and undersampled data. We use graphics hardware to accelerate the tracing algorithm, making it more useful for large data sets. After the initial network is traced, we use an efficient encoding scheme to store volumetric data pertaining to the network.


Author(s):  
Shaoping Zhu ◽  
Yongliang Xiao ◽  
Weimin Ma

To improve the accuracy of human action recognition in video and the computational efficiency on large data sets, an action recognition algorithm based on multiple features and a modified deep learning model is proposed. First, the deep network pre-training process is used to learn and optimize the restricted Boltzmann machine (RBM) parameters, and the deep belief net (DBN) model is constructed through deep learning. The DBN model then automatically extracts 13 human joint points and the critical points of optical flow. Second, these more abstract and more effective human motion features are combined to represent human actions. Finally, the entire DBN network structure is fine-tuned by a support vector machine (SVM) algorithm to classify human actions. We demonstrate that the 13 human joint points and the critical points of optical flow are two very effective characterizations of human action, and that the proposed approach greatly reduces the number of required samples, shortens training time, efficiently processes large data sets, and effectively recognizes novel actions. We performed experiments on the KTH, Weizmann, ballet, and UCF101 data sets to evaluate the proposed method. The results show an average recognition accuracy of over 98%, which validates its effectiveness, and show that our results are stable, reliable, and significantly better than those of two state-of-the-art approaches on the four data sets. The approach thus lays a good theoretical foundation for practical applications.


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier transform-inspired method to classify human activities from time-series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, a step motivated by the Fourier transform. The decomposition is performed with the help of a Long Short-Term Memory (LSTM) network, which captures the temporal dependencies in the signal and produces encoded sequences. Once arranged into a 2D array, the sequences can represent the fingerprints of the signals. The benefit of this transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is therefore a combination of LSTM and CNN. We evaluate the model on two data sets. On the first data set, which is more standardized than the other, our model matches or outperforms previous work. For the second data set, we devise schemes to generate training and testing data by varying the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of the parameters on the performance.
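The three parameters named in this abstract (window size, sliding size, labeling scheme) can be illustrated with a minimal segmentation sketch. The abstract does not state which labeling schemes were used, so the majority-vote labeling below, along with the `sliding_windows` function name and its arguments, is an assumption for illustration only.

```python
from collections import Counter


def sliding_windows(signal, labels, window_size, step):
    """Cut a labeled 1D signal into fixed-size windows.

    signal, labels: equal-length sequences (one label per sample).
    window_size:    number of samples per window.
    step:           number of samples the window slides each time.
    Returns a list of (window, label) pairs, where the label is the most
    frequent per-sample label inside the window (assumed labeling scheme).
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if len(signal) != len(labels):
        raise ValueError("signal and labels must have the same length")

    samples = []
    # Stop once a full window no longer fits; trailing samples are dropped.
    for start in range(0, len(signal) - window_size + 1, step):
        end = start + window_size
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        samples.append((signal[start:end], majority))
    return samples


signal = list(range(6))
labels = ["walk"] * 3 + ["run"] * 3
windows = sliding_windows(signal, labels, window_size=4, step=2)
```

A smaller step produces more, heavily overlapping windows from the same recording, which is one way the amount of training data can change with these parameters.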


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.


Author(s):  
Lior Shamir

Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.


2006 ◽  
Vol 39 (2) ◽  
pp. 262-266 ◽  
Author(s):  
R. J. Davies

Synchrotron sources offer high-brilliance X-ray beams which are ideal for spatially and time-resolved studies. Large amounts of wide- and small-angle X-ray scattering data can now be generated rapidly, for example, during routine scanning experiments. Consequently, the analysis of the large data sets produced has become a complex and pressing issue. Even relatively simple analyses become difficult when a single data set can contain many thousands of individual diffraction patterns. This article reports on a new software application for the automated analysis of scattering intensity profiles. It is capable of batch-processing thousands of individual data files without user intervention. Diffraction data can be fitted using a combination of background functions and non-linear peak functions. To complement the batch-wise operation mode, the software includes several specialist algorithms to ensure that the results obtained are reliable. These include peak-tracking, artefact removal, function elimination and spread-estimate fitting. Furthermore, as well as non-linear fitting, the software can calculate integrated intensities and selected orientation parameters.

