Bioinformatic Analyses of Peroxiredoxins and RF-Prx: A Random Forest-Based Predictor and Classifier for Prxs

Department of Computer Science


Peroxiredoxins (Prxs) are a protein superfamily, present in all organisms, that play a critical role in protecting cellular macromolecules from oxidative damage and also regulate intracellular and intercellular signaling processes involving redox-regulated proteins and pathways. Bioinformatic approaches using computational tools that focus on active site-proximal sequence fragments (known as active site signatures) and iterative clustering and searching methods (referred to as TuLIP and MISST) have recently enabled the recognition of over 38,000 peroxiredoxins and their classification into six functionally relevant groups. Because these data provide so many examples of Prxs in each class, machine learning approaches offer an opportunity to extract additional information about the features characteristic of these protein groups. In this study, we developed a novel computational method named “RF-Prx”, based on a random forest (RF) approach integrated with K-space amino acid pairs (KSAAP), to identify peroxiredoxins and classify them into one of the six subgroups. RF-Prx outperformed the other machine learning classifiers against which it was compared. In addition, the RF approach integrated with KSAAP enabled the detection of class-specific conserved sequences that lie outside the known functional centers and may be of functional importance. For example, drugs targeted to the conserved active sites of Prx proteins would likely suffer from cross-reactivity among distinct Prxs, but this might be avoided by targeting remote, class-specific regions instead.
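To make the KSAAP idea concrete, the following is a minimal sketch of a generic k-spaced amino acid pair encoder, not the RF-Prx implementation. For each gap k from 0 to a maximum K, it counts every ordered residue pair separated by k intervening residues and normalizes the counts into frequencies, yielding 400 features per gap. The function name `ksaap_features`, the default `k_max=3`, the normalization by the number of valid pairs, and the skipping of non-standard residues are all illustrative assumptions; the abstract does not state the value of K or how RF-Prx handles these details.

```python
from itertools import product

# The 20 standard amino acids. Pairs containing any other symbol (e.g. X, B)
# are skipped; this is an assumption, not necessarily what RF-Prx does.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def ksaap_features(sequence: str, k_max: int = 3) -> list[float]:
    """Return a (k_max + 1) * 400 vector of k-spaced amino acid pair frequencies.

    For each gap k in 0..k_max, every ordered pair (seq[i], seq[i + k + 1]) is
    counted, and counts are divided by the number of valid pairs at that gap.
    The hypothetical default k_max=3 is for illustration only.
    """
    seq = sequence.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Residue i is paired with residue i + k + 1 (k residues in between).
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        # Normalize so that proteins of different lengths are comparable.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = ksaap_features("ACDA", k_max=1)
    print(len(vec))  # 800 features: 400 pairs for each of k = 0 and k = 1
```

In a pipeline of this kind, one such vector per protein would then be passed, with its Prx class label, to a random forest classifier (for example scikit-learn's `RandomForestClassifier`). K and the number of trees are hyperparameters one would expect to tune, and the forest's feature importances point to which spaced pairs, and hence which sequence regions, discriminate between classes.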

Publication Title

Methods in Molecular Biology