DeepPlnc: Discovering plant lncRNAs through multimodal deep learning on sequential data
Noncoding elements of the genome have gained attention for their regulatory roles, and among them long noncoding RNAs (lncRNAs) are a recent and particularly intriguing class because of their possible functions. Because information about lncRNAs is limited, their characterization remains a major challenge, especially in plants. Plant lncRNAs differ considerably from those of other organisms, even in their mode of transcription, and display poor sequence conservation. Few resources annotate lncRNAs with satisfactory reliability. Here, we present DeepPlnc, a deep learning-based software tool that accurately identifies lncRNAs across plant genomes. Unlike most existing software, DeepPlnc can also accurately annotate incomplete transcripts, which are very common in de novo assembled transcriptomes. It uses a bi-modal architecture of convolutional neural networks that extracts information from both the nucleotide sequences and the secondary structure representations of plant lncRNAs. DeepPlnc scored high on all the considered performance metrics, achieving an average accuracy above 95% when tested across different experimentally validated datasets. The software was comprehensively benchmarked against several recently published tools for identifying plant lncRNAs, and it consistently outperformed all of them on every performance metric and every benchmarking dataset considered. DeepPlnc will be an important resource for reference-free identification and annotation of lncRNAs in plant transcriptomes and genomes. DeepPlnc is freely available as a web server at https://scbb.ihbt.res.in/DeepPlnc/. A stand-alone version is also available on GitHub at https://github.com/SCBB-LAB/DeepPlnc/.
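To make the two input modalities concrete, the following minimal sketch shows one way a transcript's nucleotide sequence and its secondary structure could each be turned into a matrix suitable for a convolutional branch. It is an illustration under stated assumptions, not the DeepPlnc implementation: the one-hot scheme, the use of dot-bracket notation for the structure, and the names `one_hot` and `encode_transcript` are all hypothetical, and the example structure string is hand-written rather than predicted.

```python
# Illustrative sketch only: the encodings below are assumptions,
# not the encoding actually used by DeepPlnc.
NUCLEOTIDES = "ACGU"          # assumed alphabet for the sequence modality
STRUCTURE_SYMBOLS = "(.)"     # assumed dot-bracket alphabet for the structure modality


def one_hot(text, alphabet):
    """Return one row per character; unknown characters become all-zero rows."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


def encode_transcript(sequence, dot_bracket):
    """Encode both modalities of one transcript as a pair of matrices."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input
    return one_hot(rna, NUCLEOTIDES), one_hot(dot_bracket, STRUCTURE_SYMBOLS)


# Hand-written toy hairpin; a real structure would come from a folding tool.
seq_matrix, struct_matrix = encode_transcript("GGGAAACCC", "(((...)))")
```

In a bi-modal design of this general kind, each matrix would feed its own convolutional branch and the branch outputs would be combined before classification; how DeepPlnc does this in detail is not specified here.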