A Sub Sequence Based Approach to Protein Function Prediction via Multi Attention Based Multi Aspect

Abstract:

Inferring protein function(s) through protein sub-sequence classification is often obstructed by a lack of knowledge about the function(s) of the sub-sequences within a protein sequence. To address this, we develop a novel “multi-aspect” paradigm that performs sub-sequence classification efficiently by utilizing the information of the parent sequence. The aspects are: (i) Multi-label: independent labelling of sub-sequences with more than one function of the parent sequence, and (ii) Label-relevance: scoring the parent functions to highlight how relevant each function is to the sub-sequence. The multi-aspect paradigm is used to propose the “Multi-Attention Based Multi-Aspect Network” for classifying protein sub-sequences, where multi-attention is a novel approach to processing sub-sequences at the word level. Next, the proposed Global-ProtEnc method is a sub-sequence based approach to encoding protein sequences for the protein function prediction task, which is finally used to develop an ensemble method, Global-ProtEnc-Plus. Evaluations of both Global-ProtEnc and Global-ProtEnc-Plus on the benchmark CAFA3 dataset delivered outstanding performance. Compared to the state-of-the-art DeepGOPlus, the improvements in Fmax with Global-ProtEnc-Plus are +6.50 percent for biological process and +1.90 percent for cellular component.
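
For readers unfamiliar with the Fmax metric quoted above, the following is a minimal sketch of the protein-centric Fmax as it is commonly defined in the CAFA challenges; we assume, but the abstract does not state, that this is the definition used here. The function name `fmax`, the dictionary-based input layout, and the threshold grid are illustrative assumptions and are not taken from the paper.

```python
def fmax(true_terms, pred_scores, thresholds=None):
    """Protein-centric Fmax (illustrative sketch, assumed CAFA-style definition).

    true_terms:  dict mapping protein id -> set of true function terms
    pred_scores: dict mapping protein id -> dict of term -> score in [0, 1]
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = [t / 100 for t in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions = []   # one entry per protein with at least one prediction at t
        recall_sum = 0.0  # accumulated over all proteins
        for protein, truth in true_terms.items():
            scores = pred_scores.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:
                precisions.append(hits / len(predicted))
            if truth:
                recall_sum += hits / len(truth)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(true_terms)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy usage with made-up term identifiers and scores.
toy_truth = {"P1": {"GO:1", "GO:2"}, "P2": {"GO:3"}}
toy_preds = {"P1": {"GO:1": 0.9, "GO:4": 0.5}, "P2": {"GO:3": 0.8}}
toy_fmax = fmax(toy_truth, toy_preds)
```

Precision is averaged only over proteins that receive at least one prediction at a given threshold, while recall is averaged over all proteins, so a method cannot raise its score simply by abstaining on difficult proteins.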