Background:
Biomolecular-level event extraction is one of the most important branches
of information extraction. With the rapid growth of the biomedical literature, it is difficult for researchers
to manually obtain information of interest, e.g. previously unknown information about diseases that
threaten human health or about particular biological processes. Researchers are therefore interested in
acquiring information about biomolecular-level events automatically. However, annotated biomolecular-level
event corpora are limited and highly imbalanced, which degrades the performance of classification
algorithms and can even lead to over-fitting.
Method:
In this paper, a new approach to biomolecular-level event extraction that combines the Pairwise model
with a convolutional neural network is introduced. The method identifies positive instances in
unlabeled data more accurately and uses them to enlarge the labeled data. First, unlabeled samples are
categorized using the Pairwise model. Then, the shortest dependency path with additional information is
generated. Furthermore, a new representation with two input forms for the convolutional neural
network model, the dependency word sequence and the dependency relation sequence, is presented.
Finally, using the sample selection strategy, labeled samples expanded from the unlabeled
domain corpus incrementally enlarge the training data to improve the performance of the classifier.
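
As a minimal sketch of the shortest-dependency-path step and the two input forms derived from it, the Python below finds the path between two tokens and returns a dependency word sequence and a dependency relation sequence. It is illustrative only and is not the implementation used in this work: the function name `shortest_dependency_path`, the hand-written toy parse, and the use of plain breadth-first search are all assumptions, and the additional information attached to the path is omitted.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return (word_sequence, relation_sequence) for the shortest path.

    `edges` is a list of (head, relation, dependent) triples. Tokens are
    used directly as node identifiers, so this sketch assumes every token
    in the sentence is unique (a real system would use token indices).
    """
    # Treat the dependency tree as an undirected graph.
    graph = {}
    for head, relation, dependent in edges:
        graph.setdefault(head, []).append((dependent, relation))
        graph.setdefault(dependent, []).append((head, relation))

    # Breadth-first search, remembering how each node was first reached.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour, relation in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, relation)
                queue.append(neighbour)

    if target not in previous:
        return [], []  # the two tokens are not connected

    # Walk back from the target to the source, then reverse.
    words, relations = [target], []
    node = target
    while previous[node] is not None:
        parent, relation = previous[node]
        words.append(parent)
        relations.append(relation)
        node = parent
    words.reverse()
    relations.reverse()
    return words, relations


# Hypothetical, hand-written parse of "IL-2 induces phosphorylation of STAT5".
toy_edges = [
    ("induces", "nsubj", "IL-2"),
    ("induces", "dobj", "phosphorylation"),
    ("phosphorylation", "nmod", "STAT5"),
    ("STAT5", "case", "of"),
]

word_seq, relation_seq = shortest_dependency_path(
    toy_edges, "phosphorylation", "IL-2"
)
print(word_seq)      # ['phosphorylation', 'induces', 'IL-2']
print(relation_seq)  # ['dobj', 'nsubj']
```

In this sketch the word sequence and the relation sequence would each be mapped to embeddings and passed to the network as separate inputs; how the two are combined is a modelling choice that the sketch does not cover.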
</P><P>
Result & Conclusion:
Our proposed method achieved better performance than other strong systems.
This is due to the new representation of the generated short sentences and the proposed sample selection
strategy, which greatly improved classification accuracy. The extensive experimental
results indicate that the new method can effectively incorporate unlabeled data to improve the performance
of the classifier for biomolecular-level event extraction.</P>