An updated nomenclature for plant ribosomal protein genes

Ban et al. (2014) proposed a nomenclature for ribosomal proteins (r-proteins) that reflects the current understanding of ribosomal protein evolution. In the past few years, this nomenclature has been widely adopted among biomedical researchers and microbiologists. This homology-based r-protein nomenclature has not been as widely adopted among plant biologists, however, presumably because r-protein nomenclature is much more complicated in plants due to gene duplication. Here, we propose compatible upgrades to the homology-guided nomenclature proposed by Ban et al. (2014) so that this naming system can be adopted for widespread use in the plant biology community. We note that Lan et al. (2022) recently proposed updated nomenclature for plant cytosolic ribosomal proteins, focused on Arabidopsis and rice. The nomenclature outlined here is an extension of that proposed by Lan et al. (2022), expanding to include organellar ribosomes and additional species, with the intent that this nomenclature can serve as a template to guide future plant genome annotations. A more detailed comparison highlighting how this naming system builds on the Ban et al. (2014) and Lan et al. (2022) nomenclatures is offered below. At this time, we request community feedback on this proposed nomenclature so that the naming system ultimately chosen represents a broad consensus. Feedback can be communicated to this working group at plantribosome@gmail.com before July 25th, 2022. Coauthors of this letter and anyone in the scientific community expressing significant interest will then discuss this feedback as a group, reach a consensus agreement, and communicate the updated nomenclature rules through a letter to the editor (expected to be published at The Plant Cell) and the databases at TAIR and MaizeGDB.

Dear Editor,
Across all living organisms, ribosomes are large macromolecular complexes that synthesize proteins by translating messenger RNA codes into amino acid sequences. Structurally, ribosomes are composed of ∼50-80 ribosomal proteins (r-proteins) and 3 or 4 ribosomal RNAs (rRNAs). Over the past 4 billion years, ribosomes have evolved some differences in rRNA and r-protein composition, with certain subunits specific to bacteria, archaea and eukaryotes, plastids, or mitochondria, although many subunits are universally conserved with clear homology across all of life. Historically, the nomenclature of r-proteins was different in each species investigated, based on certain biochemical properties; that is, they were numbered in the order that they were separated by electrophoresis and/or chromatography (e.g., see Wittmann et al., 1971), rather than named for structural homology or function. The different naming systems fostered confusion for researchers, especially scientists not directly investigating ribosome biology, and hindered computational efforts to collate information on homologous r-proteins. Ban et al. (2014) proposed to rectify these issues with a nomenclature for ribosomal proteins (r-proteins) that reflects the current understanding of ribosomal protein evolution.
In the past few years, this nomenclature has been widely adopted among biomedical researchers and microbiologists. This homology-based r-protein nomenclature has not been as widely adopted among plant biologists, however, presumably because r-protein nomenclature is much more complicated in plants due to gene duplication. Here, we propose compatible upgrades to the homology-guided nomenclature proposed by Ban et al. (2014) so that this naming system can be adopted for widespread use in the plant biology community. We note that Lan et al. (2022) recently proposed updated nomenclature for plant cytosolic ribosomal proteins, focused on Arabidopsis and rice. The nomenclature outlined here is an extension of that proposed by Lan et al. (2022), expanding to include organellar ribosomes and additional species, with the intent that this nomenclature can serve as a template to guide future plant genome annotations. A more detailed comparison highlighting how this naming system builds on the Ban et al. (2014) and Lan et al. (2022) nomenclatures is offered below. Moreover, although we intend that this nomenclature can be universally adopted by plant biologists and curators, we also recognize that databases should maintain complete lists of alternative aliases for genes based on past nomenclatures, and we encourage authors to at least parenthetically mention past gene symbol aliases in their manuscripts. Alongside the new gene symbols, we urge authors and editors to clearly list the stable unique gene ID assigned by community databases and associated genome version numbers, such as the Arabidopsis Genome Initiative (AGI) locus code available at The Arabidopsis Information Resource (TAIR) and genome version (e.g., TAIR10).
In most lineages other than plants, r-proteins are encoded by single-copy genes (Steel and Jacobson, 1986; Uechi et al., 2001). There are some small exceptions, of course; for example, bacterial genomes often include a couple of duplicated r-protein genes (Yutin et al., 2012), including E. coli, which has two copies of bL31 and two copies of bL36 (Makarova et al., 2001). S. cerevisiae, a descendant of a recent whole-genome duplication event, has two homoeologous copies of many r-protein genes (Mager et al., 1997). Plant genomes, in contrast, almost always encode multiple paralogous copies of r-protein genes. For example, in Arabidopsis thaliana, every cytosolic r-protein is encoded by at least two paralogs, and several are encoded by five or six paralogs (Barakat et al., 2001; Salih et al., 2020; Lan et al., 2022). Moreover, plants also encode an additional two sets of r-proteins that localize in mitochondria or plastids to translate the organellar genomes. In sum, the Arabidopsis genome includes nearly 400 genes that encode r-proteins, about four times more than the ∼100 genes that encode r-proteins in mammals.
In consultation with The Arabidopsis Information Resource (TAIR), Maize Genetics and Genomics Database (MaizeGDB), and colleagues in the plant ribosome biology field, we propose new names and symbols for all of the r-proteins encoded by the Arabidopsis, tomato, maize, and rice genomes, which we intend will serve as a template to guide future plant genome annotations (Figure 1; Supplemental Dataset S1). We expect that this new nomenclature will enable greater communication with the wider audience of molecular biologists studying ribosomes and translation beyond plant biology.
The r-protein nomenclature established by Ban et al. (2014) begins with a lowercase letter indicating whether the r-protein is specific to bacteria (with the letter "b"), archaea and eukaryotes (with the letter "e"), or all domains of life (with the letter "u" for "universal"). This is followed by either L or S to indicate whether the protein is a subunit of the large or small ribosomal subunit, respectively, and then by a number to specify the r-protein identity (Figure 1A). Cytosolic r-proteins have no suffix, whereas organelle-targeted r-protein symbols conclude with a suffix to indicate that they are targeted to mitochondria (with the letter "m") or plastids (with the letter "c", for "chloroplast") (Bieri et al., 2017; Waltz et al., 2020, 2021). Organellar ribosomes have evolved unique r-protein subunits with no homology to cytosolic r-proteins; in these cases, the lowercase prefix indicates that the r-protein is targeted to mitochondria (with the letter "m") or plastids (with the letter "c", for "chloroplast"), and no suffix is added to show their subcellular localization (Bieri et al., 2017; Waltz et al., 2019, 2020, 2021). Where feasible, the new r-protein symbols retain their traditional numbers; for example, archaeal/eukaryotic RPS6 is now eS6. Bacterial RPS6 is not homologous to eukaryotic RPS6, however, which previously caused some confusion; now, bacterial RPS6 is bS6, to indicate that it is not related to any archaeal/eukaryotic r-protein. Conversely, uS8 is now the universal symbol for bacterial r-protein S8, yeast r-protein S22, and human r-protein S15A, which all had different names despite their homology. Plant r-proteins occasionally have their own names, as well; for example, uL3, which was previously called L3 in bacteria, humans, and yeast, is called RIBOSOMAL PROTEIN1 (RP1) in Arabidopsis.
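For readers who annotate genomes computationally, the symbol grammar described in this paragraph can be sketched as a small parser. This is only an illustrative sketch, not part of the proposed nomenclature: the names `RProtein` and `parse_symbol` are hypothetical, the parser covers only the prefix, subunit letter, number, and lowercase organelle suffix, and it assumes every r-protein number is a plain integer. The organelle-specific symbol in the usage note is an illustrative input only.

```python
import re
from typing import NamedTuple

# Meaning of each part of a symbol, as described in the text.
SCOPE = {
    "b": "bacterial",
    "e": "archaeal/eukaryotic",
    "u": "universal",
    "m": "mitochondrion-specific",
    "c": "plastid-specific",
}
SUBUNIT = {"L": "large", "S": "small"}
TARGET = {"m": "mitochondrion", "c": "plastid"}

# prefix, subunit letter, number, optional organelle-targeting suffix
SYMBOL_RE = re.compile(r"([beumc])([LS])(\d+)([mc]?)")


class RProtein(NamedTuple):
    scope: str
    subunit: str
    number: int
    location: str


def parse_symbol(symbol: str) -> RProtein:
    """Parse a base r-protein symbol such as 'eS6' or 'uL2m'."""
    match = SYMBOL_RE.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a homology-based r-protein symbol: {symbol!r}")
    prefix, subunit, number, suffix = match.groups()
    if prefix in TARGET:
        # Organelle-specific r-proteins carry their location in the prefix
        # and take no targeting suffix.
        if suffix:
            raise ValueError(f"organelle-specific symbol with suffix: {symbol!r}")
        location = TARGET[prefix]
    else:
        # No suffix means a cytosolic r-protein.
        location = TARGET.get(suffix, "cytosol")
    return RProtein(SCOPE[prefix], SUBUNIT[subunit], int(number), location)
```

Under these assumptions, `parse_symbol("eS6")` reports an archaeal/eukaryotic small-subunit protein in the cytosol, `parse_symbol("uL2m")` reports a universal large-subunit protein targeted to mitochondria, and a legacy symbol such as `RPS6a` is rejected with a `ValueError`.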
Many Arabidopsis cytosolic r-proteins were first characterized from genetic screens for developmental defects, and the genes encoding these proteins were first named according to their mutant phenotypes, such as apiculata, embryo defective, evershed, hapless, oligocellula, piggyback, pointed first leaves, short valve, and suppressor of acaulis. Bifunctional r-proteins, such as eL40, which is proteolytically cleaved during ribosome assembly to separate the mature eL40 protein and its fused ubiquitin domain, are occasionally named not for the r-protein subunit, but for ubiquitin (in Arabidopsis, eL40 is called UBIQUITIN EXTENSION PROTEIN or UBQ, for example). These examples clearly illustrate the need for the new, unifying nomenclature for r-proteins in plant genomes so that our community can engage with other biologists.
Nonetheless, for continuity, past r-protein names and symbols should be maintained in databases as aliases. Moreover, we recommend that aliases should also be mentioned parenthetically as alternative gene names and symbols in future publications to ensure clarity for readers, e.g., "We detected that phosphorylation of r-protein eS6z (RPS6a) was reduced by rapamycin…". This way, readers more familiar with the acronym "RP" to indicate "ribosomal protein" will not be confused by the new names, but the updated nomenclature will reconcile with the established nomenclature in other fields.
Animal r-proteins are encoded exclusively by the nuclear genome, so biomedical researchers have not emphasized the genomic location of r-protein genes in recent nomenclatures. Plant r-proteins, however, can be encoded by the nuclear, mitochondrial, or plastid genomes, with some variation in the location of these genes across species. There is even a special case, mitochondrial uL2, which has split into two genes in plants: the nucleus encodes a polypeptide homologous to the C-terminus of uL2 and the mitochondrion encodes a polypeptide homologous to the N-terminal portion of uL2. To indicate cases when an r-protein is encoded by the organellar genome, we recommend using uppercase letters for the suffix (i.e., "M" and "C") in publications.
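The case convention for the organelle suffix can be illustrated with a short helper. This is a minimal sketch under stated assumptions: the function name `with_location` and its parameters are hypothetical, and the rule that a cytosolic r-protein cannot be flagged as organelle-encoded is a guard added for this sketch.

```python
# Lowercase suffix letters for organelle-targeted r-proteins.
SUFFIX = {"mitochondrion": "m", "plastid": "c"}


def with_location(base: str, targeted_to: str = "cytosol",
                  organelle_encoded: bool = False) -> str:
    """Append the organelle suffix to a base symbol such as 'uL2'.

    The suffix is lowercase for nuclear-encoded, organelle-targeted
    r-proteins and uppercase when the gene lies in the organellar genome.
    """
    if targeted_to == "cytosol":
        if organelle_encoded:
            # Guard added for this sketch: cytosolic r-proteins take no suffix.
            raise ValueError("a cytosolic r-protein cannot be organelle-encoded")
        return base
    if targeted_to not in SUFFIX:
        raise ValueError(f"unknown compartment: {targeted_to!r}")
    suffix = SUFFIX[targeted_to]
    return base + (suffix.upper() if organelle_encoded else suffix)
```

For example, `with_location("uL2", "mitochondrion")` gives `uL2m` for a nuclear-encoded polypeptide, whereas `with_location("uL2", "mitochondrion", organelle_encoded=True)` gives `uL2M` for a polypeptide encoded by the organellar genome.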
The greatest challenge in adopting this new nomenclature for plant biology is how to best indicate paralogy of r-proteins (Figure 1B). In the simplest cases, there are only two paralogs, which could be designated with a single letter in alphabetical order, e.g., eS6a and eS6b. But in many cases, there are at least three paralogs, which is problematic because the plastid-targeted proteins are designated with a "c" (Bieri et al., 2017). In Arabidopsis, about 20 cytosolic r-proteins would end with a "c" and thus would be confused with the homologous plastid-targeted r-proteins that would also end with a "c". There are many possible solutions to this problem, including several proposals advanced by members of the plant biology community; the most straightforward options are (1) to switch from a "c" designating chloroplast-targeted to a "p" designating plastid-targeted, (2) to add a hyphen separating the paralog designation from the protein symbol, (3) to distinguish between majuscule (uppercase) and minuscule (lowercase) lettering, such that "C" indicates a third paralog but "c" indicates plastid localization, (4) to use an alternative alphabet, such as Greek letters, to indicate paralogs, (5) to move the organelle indicator before the r-protein symbol, or (6) to start from the end of the alphabet, naming paralogs, e.g., uL15z, uL15y, uL15x.
After soliciting community feedback through a preprint version of this letter, social media, e-mails to additional community members, and the Plant Biology 2022 conference, we came to prefer the last option for several reasons. First, there is already literature on chloroplast ribosomes using the "c" to indicate plastid-targeted r-proteins, and there is considerable literature placing "m" or "c" at the end of the r-protein symbol to indicate organelle-targeting, so changing these would not serve the larger purpose of reaching a consensus nomenclature with r-protein biologists in other fields. Second, "p" is used as a suffix in many nomenclatures to distinguish proteins from nucleic acids (e.g., Tor1p is the protein encoded by the gene tor1 in fission yeast) or to designate protein phosphorylation (e.g., rpS6P is phosphorylated eS6). Third, hyphens are typically used in plant nomenclatures to indicate alleles, so naming genes eS6-a and eS6-b could give the false impression that these are two alleles of a single gene, rather than paralogs. Fourth, relying on uppercase versus lowercase letters or on non-standard alphabets would require that database curators, computational biologists annotating new genomes, journal editors, and ribosome biologists working outside plant biology all pay strict attention to a slight typographical difference or expand the standard alphabet to accommodate this one set of genes, whereas starting from the end of the alphabet avoids any potential confusion.
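The preferred option, lettering paralogs from the end of the alphabet, can be sketched as follows. This is an illustration only: the function name `paralog_symbols` is hypothetical, the order in which paralogs receive letters is left to the caller, and the limits of 2 to 13 paralogs are assumptions of this sketch (single-copy genes are not handled here, and the upper limit stops at "n" so that a paralog letter never reaches the organelle letter "m").

```python
import string

# z, y, x, ... n: letters taken from the end of the alphabet, stopping
# before "m" so that paralog letters never match an organelle suffix.
PARALOG_LETTERS = string.ascii_lowercase[::-1][:13]


def paralog_symbols(base: str, n_paralogs: int) -> list[str]:
    """Return symbols for the paralogs of one r-protein, e.g. uL15z, uL15y."""
    if not 2 <= n_paralogs <= len(PARALOG_LETTERS):
        raise ValueError("this sketch handles between 2 and 13 paralogs")
    return [base + letter for letter in PARALOG_LETTERS[:n_paralogs]]
```

With this sketch, `paralog_symbols("uL15", 3)` returns `["uL15z", "uL15y", "uL15x"]`, so the third paralog never ends in "c" and cannot be mistaken for a plastid-targeted homolog.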
Figure 1. The proposed r-protein nomenclature follows standard rules across all domains of life to indicate homology of ribosomal subunits. A, The first letter indicates whether the r-protein is specific to bacterial genomes (b), archaeal/eukaryotic genomes (e), or universal across genomes (u). In cases when the organellar r-protein has no cytosolic r-protein orthologues, the first letter instead indicates that the r-protein is specific to mitochondria (m) or plastids (c). The second letter indicates whether the r-protein is associated with the large 60S (L) or small 40S (S) subunit. The subunit number is based on consensus convention across model species as previously established (Ban et al., 2014). r-proteins that localize to plastids (c) or mitochondria (m) are indicated with a suffix, and this suffix is uppercase when the r-protein is encoded by the organellar genome. The final suffix is used to distinguish paralogs that encode homologous r-proteins within a genome. B, Representative example of r-protein paralogy in the Arabidopsis thaliana genome. eL6x is a homoeolog of two tandemly duplicated paralogs, eL6z and eL6y. Neighboring homoeologous genes and chromosomal locations are indicated to demonstrate synteny among these r-protein genes.
We have provided a provisional table of r-protein names and symbols for Arabidopsis, tomato, maize, and rice for the plant biology community to consider, alongside their historical symbols in Arabidopsis and their symbols as recently proposed by Lan et al. (2022) (Supplemental Dataset S1). Note that the Lan et al. (2022) nomenclature differs primarily in how paralogs are indicated, which is a result of the exclusive focus of that nomenclature on cytosolic ribosomes. The new nomenclature will be added to public databases, including TAIR, MaizeGDB, and the Plant Cytoplasmic Ribosomal Proteins database (PlantCRP.cn).
Previous names and symbols will be retained at these databases as a reference, and, as stated above, in publications, systematic identifiers (e.g., the AGI locus ID) should always be used alongside the updated r-protein symbols. We strongly encourage researchers to adopt the revised nomenclature to facilitate communication with researchers outside the plant community and increase the impact of our community's work on ribosome biology.

Supplemental data
The following materials are available in the online version of this article.
Supplemental Dataset S1. The updated ribosomal protein nomenclature for select model species.