Naail Kashif-Khan: Mining metagenomics data for novel bacterial nanocompartments

14 Mar 2024

Encapsulin nanocompartments are icosahedral protein-based organelles found in bacteria and archaea (1).

Encapsulin organelles serve a wide range of physiological functions, including mineral storage (2), oxidative stress response (3), enzyme catalysis (4) and secondary metabolism (5,6). These protein nanostructures have many potential applications in synthetic biology and biomedicine: metal ion loading for use as imaging agents (7,8); antigen display for protein-based vaccines (9), as recently demonstrated with the surface display of SARS-CoV-2 antigens in animal models (10); and packaging of proteins and RNA for drug delivery (11,12). Encapsulins may also be a promising platform for metabolic engineering via loading of heterologous enzymes; this approach may protect unstable proteins from degradation, increase reaction rates, and enable the use of reaction pathways with toxic intermediates (13).

Encapsulin monomers spontaneously self-assemble into full-sized capsids and can encapsulate specific cargo proteins (Figure 1A). Encapsulin cargo proteins contain C-terminal cargo loading peptides (CLPs) (14) or longer N-terminal domains (NTDs) (4) that target them to the capsid interior. Encapsulins display icosahedral symmetry similar to that of virus capsids and are therefore assigned triangulation numbers (T-numbers) based on the number of subunits and size of the capsid assembly (Figure 1B). Encapsulin proteins share a common ancestor with HK97-fold phage major capsid proteins, and so show sequence and structural similarity to this family of viral proteins (15). This shared evolutionary history makes it difficult to discover encapsulins in protein sequence databases: encapsulin sequences are often misannotated as phage capsid proteins, bacteriocins or linocins (16), and search hits can be ‘contaminated’ with phage capsid proteins (5).
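The relationship between T-number and subunit count can be made concrete with a short sketch. It uses the standard Caspar-Klug relation for icosahedral capsids, T = h² + hk + k², with 60T subunits per capsid; this relation is general background rather than something stated in the article, and the function names are illustrative only.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number for lattice indices (h, k)."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def subunit_count(t_number: int) -> int:
    """An icosahedral capsid with triangulation number T has 60 * T subunits."""
    return 60 * t_number


if __name__ == "__main__":
    # Example lattice indices chosen for illustration.
    for h, k in [(1, 0), (1, 1), (2, 0)]:
        t = triangulation_number(h, k)
        print(f"h={h}, k={k}: T={t}, subunits={subunit_count(t)}")
```

Under this relation, a larger T-number means both more subunits and a larger capsid, which is why the two properties are described together.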

Encapsulins are currently grouped into four families based on their cargo type and Pfam annotation (Figure 1C) (5). Family 1 includes encapsulins from Pfam family PF04454 (Encapsulating Protein for Peroxidase). As the Pfam name suggests, these encapsulins are associated with cargo proteins from the dye-decolourizing peroxidase (DyP) family, or with iron-binding cargo proteins such as ferritins, rubrerythrin, hemerythrin, or manganese catalase-like proteins (5). Almost all experimentally solved encapsulin protein structures are derived from family 1. Family 2 is the largest encapsulin family, and its members are most often associated with four types of cargo enzyme: cysteine desulfurase, polyprenyl transferase, xylulose kinase, and terpene cyclase. Family 2 encapsulins are not typically associated with any single Pfam family, and their capsid proteins can also be found fused to cyclic NMP-binding domains (5). Family 3 includes encapsulins from the Pfam family PF05065 (Phage capsid family), which are found within biosynthetic gene clusters (BGCs), sets of genes encoding enzymes responsible for the synthesis of a variety of natural products. Finally, family 4 encapsulins are part of Pfam family PF08967 (DUF1884 domain-containing protein). These proteins display a truncated form of the HK97-fold containing only the A-domain. It is currently unknown whether these proteins can self-assemble into an icosahedral particle like known encapsulins, or whether they encapsulate any cargo proteins. Despite this, previous work (5) has considered this small family of proteins as encapsulins and classified them based on the presence of putative ‘cargo’ proteins: hydrogenase, osmotic shock-associated proteins, deoxyribose phosphate aldolase, or glyceraldehyde-3-phosphate dehydrogenase.
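As a rough illustration of the Pfam side of this classification, the sketch below maps the three accessions named above to their families. It is a minimal lookup under stated assumptions, not the method used in the project: the names `PFAM_TO_FAMILY` and `assign_family` are hypothetical, family 2 has no single Pfam family and so cannot be assigned this way, and a PF05065 hit may equally be a genuine phage capsid protein, so cargo and genomic context would still be needed.

```python
from typing import Iterable, Optional

# Pfam accessions named in the text, mapped to encapsulin family numbers.
# Family 2 is deliberately absent: it is not tied to any single Pfam family.
PFAM_TO_FAMILY = {
    "PF04454": 1,  # Encapsulating Protein for Peroxidase
    "PF05065": 3,  # Phage capsid family (may also match true phage capsids)
    "PF08967": 4,  # DUF1884 domain-containing protein
}


def assign_family(pfam_hits: Iterable[str]) -> Optional[int]:
    """Return a candidate family for a protein's Pfam hits, or None if unresolved.

    Accessions may carry a version suffix (e.g. "PF04454.15"), which is ignored.
    None is returned when no listed accession matches or when hits conflict.
    """
    families = set()
    for accession in pfam_hits:
        base = accession.split(".")[0]
        if base in PFAM_TO_FAMILY:
            families.add(PFAM_TO_FAMILY[base])
    if len(families) == 1:
        return families.pop()
    return None


if __name__ == "__main__":
    print(assign_family(["PF04454.15"]))          # single family 1 hit
    print(assign_family(["PF00027"]))             # no listed accession
    print(assign_family(["PF04454", "PF05065"]))  # conflicting hits
```

A `None` result here only means the Pfam annotation alone is not decisive; it does not rule a sequence out as an encapsulin.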

