LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Document Type

Article

Publication Date

4-17-2023

Department

Department of Computer Science; Department of Chemistry

Abstract

Protein N-linked glycosylation is an important post-translational mechanism in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; the N-X-[S/T] sequon is thus a necessary but not sufficient determinant for protein glycosylation. In that regard, the computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem that has not been extensively addressed by existing methods, especially with regard to the creation of negative sets and to leveraging the distilled information from protein language models. Here, we developed LMNglyPred, a deep learning-based approach to predict N-linked glycosylation sites in human proteins using embeddings from a pre-trained protein language model (pLM). On a benchmark independent test set, LMNglyPred achieves a sensitivity, specificity, Matthews Correlation Coefficient (MCC), precision, and accuracy of 76.50 percent, 75.36 percent, 0.49, 60.99 percent, and 75.74 percent, respectively. These results demonstrate that LMNglyPred is a robust computational tool for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon.
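The sequon rule stated in the abstract (N, then any residue except proline, then S or T) can be checked mechanically. The sketch below is illustrative only and is not part of LMNglyPred. It simply enumerates candidate sites, which is the set of positions a sequon-confined predictor would then classify. The function name `find_sequons` is hypothetical, and positions are reported 1-based by assumption.

```python
import re

# Zero-width lookahead so overlapping sequons (e.g. "NNST") are all reported.
# [^P] encodes "X can be any amino acid except proline".
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-[S/T] sequon.

    Hypothetical helper for illustration; it lists candidate sites only and
    says nothing about whether a site is actually glycosylated.
    """
    seq = seq.upper()
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # N2 (N-G-T) qualifies; N6 (N-P-S) is excluded by the proline rule;
    # N10 (N-N-S) and N11 (N-S-T) overlap and are both reported.
    print(find_sequons("MNGTANPSLNNST"))
```

Using a lookahead rather than a plain match keeps overlapping candidates. This matters because a residue can be the X of one sequon and the N of the next.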
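The five figures reported in the abstract are standard binary-classification statistics derived from a confusion matrix. For reference, here is a minimal sketch of their usual definitions. The function name and the counts in the example are hypothetical and are not LMNglyPred's actual test-set results. Returning 0.0 when the MCC denominator is zero is a common convention and is an assumption here.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, specificity, precision, accuracy and MCC."""
    sensitivity = tp / (tp + fn)          # recall on glycosylated (positive) sites
    specificity = tn / (tn + fp)          # recall on non-glycosylated (negative) sites
    precision = tp / (tp + fp)            # fraction of predicted positives that are real
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # convention: 0 if undefined
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=80, fp=50, tn=150, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Precision and accuracy depend on the ratio of negative to positive sites in the test set, while sensitivity and specificity do not. This is why a precision well below sensitivity, as in the reported results, is expected when negatives outnumber positives.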

Publication Title

Glycobiology

Recommended Citation

Pakhrin, S. C., Pokharel, S., Aoki-Kinoshita, K. F., Beck, M. R., Dam, T. K., Caragea, D., & KC, D. (2023). LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model. Glycobiology. http://doi.org/10.1093/glycob/cwad033
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p/17054

Michigan Tech Publications, Part 1
