The radical SAM superfamily (RSS) is arguably the largest and most functionally diverse enzyme superfamily. Many functions (and intriguing reaction mechanisms) have been discovered; many more remain to be discovered!

RadicalSAM.org is designed to leverage “top-down” discovery of function using the EFI’s genomic enzymology web tools. The sequence similarity network (SSN) for the RSS is too large to be analyzed with Cytoscape using the RAM available on most computers and so has been inaccessible to the RSS community.

We generated the SSN for the entire RSS using a computer with 768GB RAM and segregated it into clusters for 1) the 20 subgroups curated by the Structure-Function Linkage Database (SFLD) and 2) many additional subgroups not curated by the SFLD.

For each subgroup, RadicalSAM.org provides:

  1. The SSN, multiple sequence alignment (MSA), WebLogo, hidden Markov model (HMM), length histogram, phylogenetic distribution, SwissProt annotations, and number and locations of conserved Cys residues.
  2. Genome neighborhood diagrams (GNDs) that provide metabolic pathway context for inference of functions.
  3. UniProt accession IDs and FASTA sequences that can be used with EFI-EST, EFI-GNT, and EFI-CGFP for user-specific applications.
  4. For four large and functionally diverse subgroups, the ability to “walk” through a series of SSNs generated at increasing alignment scores. The progeny (walking forward) and progenitors of a cluster (walking backward) can be identified, allowing the discovery of related functions and/or substrate specificities.

We encourage users to submit experimentally characterized functional annotations for sequences that have not yet been curated by SwissProt so that these can be made available to the RSS community.

A Perspective has been accepted for publication in ACS Bio & Med Chem Au describing RadicalSAM.org. If you use RadicalSAM.org, please cite us:

Nils Oberg, Timothy W. Precord, Douglas A. Mitchell, and John A. Gerlt, RadicalSAM.org: A Resource to Interpret Sequence-Function Space and Discover New Radical SAM Enzyme Chemistry, ACS Bio & Med Chem Au 2021 https://doi.org/10.1021/acsbiomedchemau.1c00048

The radical SAM superfamily (RSS) is arguably the largest and most functionally diverse enzyme superfamily. Its members contain a Fe4S4 cluster near the N-terminus of a (β/α)6-barrel domain that binds S-adenosyl methionine (SAM); one-electron reduction of the bound SAM yields Met and the 5′-deoxyadenosyl radical (5′-dAdo•). By hydrogen abstraction, the 5′-dAdo• generates a substrate radical (R•) (and 5′-deoxyadenosine) that undergoes intriguing and often complex chemistry to yield the product.

Illustration of the chemical reaction that generates a substrate radical R• and 5′-deoxyadenosine.

The SFLD (http://sfld.rbvi.ucsf.edu/archive/django/index.html) used an SSN to segregate the RSS into 20 subgroups with characterized functions and 22 without characterized functions. Their analysis was described in Methods in Enzymology Volume 606 in 2018: Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a "Plug and Play" Domain, G.L. Holliday, E. Akiva, E.C. Meng, S.D. Brown, S. Calhoun, U. Pieper, A. Sali, S.J. Booker and P.C. Babbitt (doi: 10.1016/bs.mie.2018.06.004).

At the time of the SFLD's analyses (2017), the RSS included 113,776 sequences that were collected at 50% sequence identity into 10,741 representative nodes so that the SSN could be visualized and analyzed using Cytoscape (https://cytoscape.org/). The SSN is reproduced below (Figure 5 in the Atlas; minimum e-value threshold to draw edges between nodes is 1e-20), with the characterized subgroups numbered and colored as described by the SFLD.

The SSN used to define the subgroups in the Atlas of the Radical SAM Superfamily, from the Atlas publication.

We provide a web resource with three "democratized" genomic enzymology tools (https://efi.igb.illinois.edu/; doi: 10.1021/acs.biochem.9b00735). The tools are used to 1) explore sequence-function space in protein families using SSNs (generated with EFI-EST), 2) collect and explore genome neighborhoods for clues about functions of uncharacterized enzymes (collected with EFI-GNT), and 3) prioritize uncharacterized SSN clusters for functional assignment based on human microbiome metagenome abundance (chemically guided functional profiling with EFI-CGFP).

Unfortunately, most experimentalists interested in the RSS cannot take full advantage of the tools:

  1. The SSN for the RSS is too large to be visualized/analyzed with Cytoscape using the RAM installed on most computers.
  2. The SSN contains many clusters/subgroups that were not curated by the SFLD.
  3. Regularly updated lists of the members of the subgroups are not available.

We developed RadicalSAM.org to provide lists of accession IDs (UniProt, UniRef90, and UniRef50) for the SFLD-curated subgroups and uncurated subgroups so that their SSNs can be generated with EFI-EST. For most of the SFLD subgroups and additional clusters, the number of UniRef50 IDs is sufficiently small that useful SSNs can be visualized and analyzed with Cytoscape using typically available computers.

RadicalSAM.org also provides information about each subgroup to aid target selection and inference of function, including the multiple sequence alignment (MSA), WebLogo, hidden Markov model (HMM), length histogram, phylogenetic distribution, SwissProt annotations, and number and locations of conserved Cys residues.

Importantly, RadicalSAM.org provides genome context (genome neighborhood networks, GNNs, and genome neighborhood diagrams, GNDs) for the bacterial and archaeal members of the subgroups, thereby providing metabolic pathway context for inferring novel enzymatic activities and physiological functions.

This release of RadicalSAM.org includes RSS sequences in UniProt Release 2020_05/InterPro Release 82 (October 7, 2020). Option B of EFI-EST was used to collect sequences from the UniRef50 database using as query a list of 172 InterPro families/domains and 1 Pfam family; the list of families is provided in the Sequence Families tab. UniRef50 clusters were used because these provide a manageable number of SSN nodes and edges (using a Mac Pro desktop with 768 GB RAM). The UniRef database is described at: https://www.uniprot.org/help/uniref.

We used a dataset containing only "Complete" UniRef50 cluster IDs filtered to include sequences with ≥140 residues (50,232 of 52,886 UniRef50 clusters, which include 616,009 of 620,386 UniProt IDs). The sequences in the UniRef50/UniRef90 clusters WERE NOT length-filtered.

Future releases of RadicalSAM.org may use the "Complete" plus "Fragment" UniRef50 cluster dataset filtered to include sequences with ≥140 residues, with the benefit of providing additional genome context information. However, in this initial release, we adopted a "conservative" approach for including sequences.

Family     Short Name
IPR000385  MoaA_NifB_PqqE_Fe-S-bd_CS
IPR001989  Radical_activat_CS
IPR002684  Biotin_synth/BioAB
IPR003698  Lipoyl_synth
IPR003739  Lys_aminomutase/Glu_NH3_mut
IPR004383  rRNA_lsu_MTrfase_RlmN/Cfr
IPR004558  Coprogen_oxidase_HemN
IPR004559  HemW-like
IPR005839  Methylthiotransferase
IPR005840  Ribosomal_S12_MeSTrfase_RimO
IPR005909  RaSEA
IPR005911  YhcC-like
IPR005980  Nase_CF_NifB
IPR006463  MiaB_methiolase
IPR006466  MiaB-like_B
IPR006467  MiaB-like_C
IPR006638  Elp3/MiaB/NifB
IPR007197  rSAM
IPR010505  Mob_synth_C
IPR010722  BATS_dom
IPR010723  HemN_C
IPR011101  DUF5131
IPR011843  PQQ_synth_PqqE_bac
IPR012726  ThiH
IPR012837  NrdG
IPR012838  PFL1_activating
IPR012839  Organic_radical_activase
IPR013483  MoaA
IPR013704  UPF0313_N
IPR013848  Methylthiotransferase_N
IPR013917  tRNA_wybutosine-synth
IPR016431  Pyrv-formate_lyase-activ_prd
IPR016771  Fe-S_OxRdtase_rSAM_TM0948_prd
IPR016779  rSAM_MSMEG0568
IPR016863  DesII
IPR017200  PqqE-like
IPR017742  Deazaguanine_synth
IPR017833  Hopanoid_synth-assoc_rSAM_HpnH
IPR017834  Hopanoid_synth-assoc_rSAM_HpnJ
IPR019939  CofG_family
IPR019940  CofH_family
IPR020050  FO_synthase_su2
IPR020612  Methylthiotransferase_CS
IPR022431  Cyclic_DHFL_synthase_mqnC
IPR022432  Aminodeoxyfutalosine_synthase
IPR022447  Lys_aminomutase-rel
IPR022459  Lysine_aminomutase
IPR022462  EpmB
IPR022881  rRNA_lsu_MeTfrase_Cfr
IPR022946  UPF0313
IPR023404  rSAM_horseshoe
IPR023819  Pep-mod_rSAM_AF0577
IPR023821  rSAM_TatD-assoc
IPR023822  rSAM_TatD-assoc_bac
IPR023868  7-CO-7-deazaGua_synth_put_Clo
IPR023880  Benzylsucc_Synthase_activating
IPR023885  4Fe4S-binding_SPASM_dom
IPR023886  QH-AmDH_gsu_maturation
IPR023891  Pyrrolys_PylB
IPR023897  Spore_PP_lysase
IPR023904  Pep_rSAM_mat_YydG
IPR023912  YjjW_bact
IPR023913  Mycofactocin_rSAM_pep_mat
IPR023930  NirJ1
IPR023969  CHP04072_B12-bd/rSAM
IPR023979  CHP04014_B12-bd/rSAM
IPR023980  CHP04013_B12-bd/rSAM
IPR023992  HemeD1_Synth_rSAM_NirJ
IPR023993  TYW1_archaea
IPR023995  HemZ
IPR024001  Cys-rich_pep_rSAM_mat_CcpM
IPR024007  FeFe-hyd_mat_HydG
IPR024016  CHP04064_rSAM
IPR024017  Pep_cycl_rSAM
IPR024018  CHP04083_rSAM
IPR024023  rSAM_paired_HxsB
IPR024025  SCIFF_rSAM_maturase
IPR024025  SCIFF_rSAM_maturase
IPR024177  Biotin_synthase
IPR024560  UPF0313_C
IPR024924  7-CO-7-deazaguanine_synth-like
IPR025895  LAM_C_dom
IPR026322  Geopep_mat_rSAM
IPR026332  HutW
IPR026335  SAM_SPASM_FxsB
IPR026344  SCM_rSAM_ScmE
IPR026346  SCM_rSAM_ScmF
IPR026357  rSAM/SPASM_prot_GRRM_system
IPR026404  rSAM_w_lipo
IPR026407  SAM_GG-Bacter
IPR026412  rSAM_Cxxx_rpt
IPR026423  rSAM_cobopep
IPR026423  rSAM_cobopep
IPR026426  rSAM_FibroRumin
IPR026429  MIA_synthase
IPR026447  B12_SAM_Ta0216
IPR026482  rSAM_nif11_3
IPR027492  RNA_MTrfase_RlmN
IPR027526  Lipoyl_synth_chlpt
IPR027527  Lipoyl_synth_mt
IPR027564  HpnR_B12_rSAM
IPR027570  GeoRSP_rSAM
IPR027583  rSAM_ACGX
IPR027586  rSAM_metal_mat
IPR027596  AmmeMemoSam_rS
IPR027604  W_rSAM_matur
IPR027609  rSAM_QueE_Proteobac
IPR027621  rSAM_QueE_gams
IPR027633  rSAM_NirJ2
IPR030801  Glu_2_3_NH3_mut
IPR030837  B12_rSAM_cofa1
IPR030896  rSAM_AhbD_hemeb
IPR030905  CutC_activ_rSAM
IPR030915  rSAM_SkfB
IPR030933  Non_iron_rSAM
IPR030969  B12_rSAM_trp_MT
IPR030977  QueE_Cx14CxxC
IPR030989  rSAM_XyeB
IPR031003  BcpD_PhpK_rSAM
IPR031012  rSAM_mob_pairB
IPR031015  Arg_2_3_am_muta
IPR031691  LIAS_N
IPR031691  LIAS_N
IPR032432  Radical_SAM_C
IPR033971  Avilamycin_epimerase
IPR033974  Glycerol_dehydratase_activase
IPR033975  ThnP-like
IPR033976  GntE-like
IPR034165  NifB_C
IPR034386  BtrN-like
IPR034391  Cmo-like_SPASM_containing
IPR034405  F420
IPR034422  HydE/PylB-like
IPR034428  ThiH/NoCL/HydG-like
IPR034436  NocN/NosN-like
IPR034438  4-hPhe_decarboxylase_activase
IPR034457  Organic_radical-activating
IPR034462  Benzylsuc_synthase_activase
IPR034465  Pyruvate_for-lyase_activase
IPR034466  Methyltransferase_Class_B
IPR034471  7_8-dihydro-6-hydroxymethylpte
IPR034474  Methyltransferase_Class_D
IPR034479  AhbC-like
IPR034480  Heme_carboxy_lyase-like
IPR034485  Anaerobic_Cys-type_sulfatase-m
IPR034491  Anaerob_Ser_sulfatase-maturase
IPR034497  Bacteriochlorophyll_C12_MT
IPR034498  Bacteriochlorophyll_C8_MT
IPR034505  Coproporphyrinogen-III_oxidase
IPR034508  Spectinomycin_biosynthesis
IPR034514  ThnK-like
IPR034515  ThnL-like
IPR034529  Fom3-like
IPR034530  HpnP-like
IPR034531  Methylation_of_yatakemycin
IPR034534  Pyrimidine_methyltransferase
IPR034547  Tte1186a_maturase
IPR034556  tRNA_wybutosine-synthase
IPR034557  ThrcA_tRNA_MEthiotransferase
IPR034559  Spore_PP_lysase_Clostridia
IPR034560  Spore_PP_lysase_Bacilli
IPR034687  ELP3-like
IPR038135  Methylthiotransferase_N_sf
IPR039661  ELP3
IPR039661  ELP3
IPR040063  QhpD-like/Tte1186a
IPR040072  Methyltransferase_A
IPR040074  BssD/PflA/YjjW
IPR040081  CndI-like
IPR040082  GenK-like
IPR040085  MJ0674-like
IPR040086  MJ0683-like
PF13186    SPASM
Sequence Status: "Complete" and "Fragment"

UniProt designates the "Sequence Status" for each sequence: "Complete" if the encoding DNA sequence includes both a start and stop codon; "Fragment" if one or both of these codons is/are absent. A "Fragment" may result if the coding DNA sequence is at the end of a contig. A "Complete" sequence need not be a "full length" sequence, e.g., it may be truncated as the result of sequencing errors that produce incorrect start and/or stop codons.

Option B identified 664,196 "Complete" and "Fragment" UniProt sequences in 66,428 UniRef50 clusters; these represent 579,102 unique sequences (100% sequence identity over 100% of the length). Option B identified 620,386 "Complete" sequences in 52,886 UniRef50 clusters; these represent 535,892 unique sequences.

Minimum Sequence Length

We remove "short" (truncated) sequences from our datasets to 1) improve the quality of the MSAs used to generate WebLogos and HMMs and 2) minimize the number of singletons in SSNs generated with alignment scores that collect sequences into "isofunctional" clusters. We designate sequences as "short" if they cannot encode a functional RSS enzyme. By inspecting the UniProt ID length histograms for the subgroups in RadicalSAM.org (generated using the Cluster Analysis utility of EFI-EST), we identified the anaerobic ribonucleotide-triphosphate reductase activating enzyme family with members that contain ≥140 residues (Megacluster-3-3-1) as the "shortest" functional RSS family.

Therefore, we used UniRef50 IDs that contain ≥140 residues to construct the SSN for the RSS.
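The length filter can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to build RadicalSAM.org: the function names, the FASTA input, and the toy records are assumptions made for the example; only the 140-residue threshold comes from the text above.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 140  # residues; the threshold stated in the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Keep only records whose sequence has at least min_length residues."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Hypothetical toy input: one truncated and one plausible-length sequence.
toy_fasta = [">short_example", "MKV", ">long_example", "A" * 150]
kept = filter_by_length(read_fasta(toy_fasta))
print([header for header, _ in kept])
```

A sequence of exactly 140 residues passes the filter, matching the "≥140" wording.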

Length Histograms

It is instructive for the user to be familiar with the length distribution of the UniRef50 IDs (i.e., nodes) in our datasets and resulting SSNs. The length histograms (UniProt IDs, UniRef90 cluster IDs, and UniRef50 cluster IDs) for the "Complete" and "Complete" plus "Fragment" datasets are shown below.

UniProt IDs
  1. The "Complete" dataset contains "short" sequences (<140 residues; too short to be functional) although they are designated "Complete" by UniProt. These result from sequencing errors but, since the encoding DNA has start and stop codons, they cannot be distinguished from "full length" sequences when deposited in UniProt.
  2. The "Complete" plus "Fragment" dataset contains a larger fraction of "short" sequences; the additional "short" sequences are those encoded by DNA lacking a start and/or stop codon.
UniRef90 cluster IDs
  1. The histograms are similar to those for the UniProt IDs, albeit lower resolution because sequences that share ≥90% sequence identity are conflated in the same UniRef90 cluster.
  2. The fraction of "short" sequences is larger than in the UniProt ID histograms. These result from "random" unique sequences instead of homologous "full length" sequences of similar length. As a result, the "short" clusters are less likely to include multiple sequences, so they constitute a larger fraction of the cluster IDs.
UniRef50 cluster IDs
  1. The histograms are even lower resolution because sequences that share ≥50% sequence identity are conflated in the same UniRef50 cluster.
  2. The fraction of "short" clusters is further increased. Although the absolute number of sequences in the "short" UniRef50 clusters is the same as in the UniRef90 clusters and the UniProt IDs, the fraction of "short" clusters increases, again because these result from "random" unique sequences instead of homologous "full length" sequences of similar length.
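A toy calculation may help show why the fraction of "short" entries grows from UniProt IDs to UniRef90 to UniRef50 clusters. All numbers below are hypothetical and chosen only for illustration: 100 homologous "full length" sequences are assumed to collapse into 10 UniRef90 clusters and 2 UniRef50 clusters, whereas 5 unrelated "short" sequences each remain in their own cluster at every level.

```python
def short_fraction(n_short: int, n_full: int) -> float:
    """Fraction of entries (sequences or clusters) that are 'short'."""
    return n_short / (n_short + n_full)


# Hypothetical counts of "full length" entries at each level of clustering.
full_length_entries = {
    "UniProt IDs": 100,
    "UniRef90 clusters": 10,
    "UniRef50 clusters": 2,
}
n_short = 5  # unrelated short sequences never merge, so the count is constant

fractions = {
    level: short_fraction(n_short, n_full)
    for level, n_full in full_length_entries.items()
}
for level, fraction in fractions.items():
    print(f"{level}: {fraction:.1%} short")
```

With these assumed counts the "short" fraction rises from about 5% of UniProt IDs to about a third of UniRef90 clusters and more than two thirds of UniRef50 clusters, even though the number of "short" sequences never changes.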

With its large number of sequences, the RSS requires a large-scale approach for identifying subgroups with related functions. Following the strategy used by the SFLD, we generated the SSN for the RSS using UniRef50 cluster IDs and then segregated it into clusters, some defining the 20 SFLD-curated subgroups and others defining uncharacterized sequences. In this section, we describe our procedure for identifying subgroups.

Given the large numbers of sequences and SSN clusters, we cannot guarantee that the subgroup segregation is "perfect". Indeed, we expect that it is not. However, the identified subgroups provide manageable starting points for the discovery of novel functions using EFI-EST, EFI-GNT, and EFI-CGFP.

Subgroup Identification

Using an alignment score of 11 that groups UniRef50 clusters into identifiable SFLD subgroups and minimizes the number of singletons (visual inspection), the SSN contains 50,232 nodes and 41,476,118 edges. We used a Mac Pro desktop computer with 768 GB RAM to visualize and manipulate the SSN using Cytoscape 3.8.2.

The large cluster (50,084 nodes and 41,476,016 edges) was selected for identification of the subgroups.
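The node and edge counts quoted above imply how much of the SSN lies outside the large cluster and how densely the network is connected. The short calculation below uses only the four numbers given in the text; the variable names and the choice to report edge density are illustrative assumptions.

```python
# Counts quoted in the text for the alignment score 11 SSN.
nodes_total = 50_232
edges_total = 41_476_118
nodes_large = 50_084
edges_large = 41_476_016

# Nodes and edges that are not part of the large cluster.
nodes_outside = nodes_total - nodes_large  # 148
edges_outside = edges_total - edges_large  # 102

# Edge density: observed edges divided by all possible node pairs.
possible_edges = nodes_total * (nodes_total - 1) // 2
density = edges_total / possible_edges

print(f"nodes outside the large cluster: {nodes_outside}")
print(f"edges outside the large cluster: {edges_outside}")
print(f"fraction of nodes in the large cluster: {nodes_large / nodes_total:.1%}")
print(f"edge density: {density:.2%}")
```

Only 148 nodes and 102 edges fall outside the large cluster, so nearly all of the memory needed to open the network in Cytoscape goes to the tens of millions of edges within that single cluster.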

The nodes associated with SFLD subgroups were identified by coloring the nodes according to the InterPro family (F)/domain (D) that includes the SFLD subgroup (Table 1); the node colors for the SFLD-curated subgroups are those used by the SFLD. Clusters associated with four additional families/domains described by InterPro were also colored.

Subgroup  Subgroup Name                                IPR #      Color
1         7-carboxy-7-deazaguanine synthase-like (F)   IPR024924  Teal
2         Coproporphyrinogen III oxidase-like (F)      IPR034505  Red
3         Antiviral proteins (viperin) (F)             --         --
4         Avilamycin synthase (F)                      IPR033971  Pink
5         B12-binding domain containing (D)            IPR006158  Blue
6         BATS domain containing (D)                   IPR010722  Orange
7         DesII-like (F)                               IPR016863  Mauve
8         ELP3/YhcC (F)                                IPR039661  Black
9         F420, menaquinone cofactor biosynthesis (F)  IPR034405  Purple
10        FeMo-cofactor biosynthesis protein (F)       IPR005980  Mint green
11        Lipoyl synthase like (F)                     IPR003698  Yellow
12        Methylthiotransferase (D)                    IPR013848  Verdun green
13        Methyltransferase Class A (F)                IPR040072  Dark brown
14        Methyltransferase Class D (F)                IPR034474  Light pink
15        Organic radical activating enzymes (F)       IPR034457  Cyan
16        PLP-dependent (F)                            IPR003739  Dark green
17        SPASM/twitch domain containing (D)           IPR023885  Magenta
18        Spectinomycin biosynthesis (F)               IPR034508  White
19        Spore photoproduct lyase (F)                 IPR023897  Green
20        tRNA wybutosine-synthesizing (F)             IPR034556  Brown
--        Protein MJ0683-like (F)                      IPR040086  Electric lime
--        Uncharacterized protein family UPF0313 (F)   IPR022946  Olive
--        DUF5131 (F)                                  IPR011101  Light purple
--        3',8-Cyclase/Mo cofactor synthesis (D)       IPR010505  Dodger blue

In contrast to the SFLD's SSN (Background tab), the clusters containing the subgroups are not separated in this SSN (a single cluster!). This results from both the larger number of nodes and the choice of a smaller alignment score, which both prevents splitting SFLD subgroups into multiple clusters and reduces the number of singletons (the SFLD used an edge threshold of 1e-20).

The SFLD subgroups were separated by selection/deletion of "long" SSN edges ("remote" sequence relationships). This editing is subjective, but we know of no other practical strategy to separate the subgroups. That 1) the nodes associated with the various subgroups colocalize in the starting SSN and 2) their colocalization is maintained in the editing supports the validity of this approach.

In several of the SFLD subgroups, some nodes are grey because they are not recognized by the HMMs used by InterPro (many of the HMMs were generated when the RSS was significantly smaller/less diverse). As a result, the InterPro families (and their HMMs) cannot be used to provide the members of the subgroups.

The nodes associated with either SFLD Subgroup 5, B12-binding domain (blue nodes), or SFLD Subgroup 17, SPASM/Twitch-domain (magenta nodes), are not colocalized in a "spherical", well-organized cluster because the sequence-function space is more diverse than that for the other subgroups. As described in the Functionally Diverse Subgroups tab, RadicalSAM.org provides a strategy for identifying isofunctional clusters in these subgroups.

The resulting "edge-edited" SSN contained 10 clusters: five "megaclusters" containing multiple SFLD subgroups [(mega)clusters are numbered in order of decreasing number of UniRef50 IDs/nodes, 1 through 5] and five clusters containing a single SFLD subgroup or InterPro family (numbered in order of decreasing UniRef50 IDs/nodes, 6 through 10).

As described in the tabs, Megaclusters-1, -2, -3, -4, and -5 were segregated into component subgroups.

Megacluster-1 contains five SFLD subgroups: Subgroup 17 (SPASM/Twitch domain, Megacluster-1-1, magenta nodes); Subgroup 14 (methyltransferase D, Megacluster-1-3, pink nodes); Subgroup 10 (FeMo-cofactor, Megacluster-1-4, pale green nodes); Subgroup 3 (viperin, Megacluster-1-5; grey nodes because InterPro does not curate this subgroup); and Subgroup 7 (DesII-like, Megacluster-1-8; pale magenta nodes).

Megacluster-1 also contains the large GTP 3’,8-cyclase family (Megacluster-1-2; "Dodger blue" nodes). The sequences in Megaclusters-1-2, -1-3, -1-4, and -1-5 were identified as discrete clusters in the UniRef50 SSN for Megacluster-1 generated with an alignment score of 30; these were removed from Megacluster-1 to generate the SSN for Subgroup 17 (SPASM/Twitch domain; Megacluster-1-1; magenta nodes).

Two uncharacterized clusters "loosely" connected to Megacluster-1 were segregated by manual edge deletion (Megacluster-1-6 and Megacluster-1-7; grey nodes). Megacluster-1-6 contains the Swiss-Prot curated [methyl-coenzyme M reductase subunit alpha]-arginine C-methyltransferase function.

The UniRef50 SSN for the resulting Megacluster-1-1 is displayed below.

The UniRef50 SSN for the additional four SFLD subgroups, GTP 3’,8-cyclases, and loosely connected clusters is displayed below (generated with an alignment score of 18 and edge-edited).

Megacluster-2 contains four SFLD subgroups: Subgroup 5 (B12-binding domain, Megacluster-2-1, blue nodes); Subgroup 2 (Coproporphyrinogen III oxidase-like, Megacluster-2-2, red nodes); Subgroup 12 (Methylthiotransferase, Megacluster-2-3, Verdun green nodes); and Subgroup 8 (Elongator protein 3, Megaclusters-2-4 and -2-5, black nodes).

The UniRef50 SSN for Megacluster-2 was generated with an alignment score of 14 (increased from the initial alignment score of 11); the "long" edges were deleted to segregate the clusters/subgroups. The clusters are numbered in order of decreasing number of UniRef50 IDs/nodes.

Megacluster-2-4 was segregated into subclusters as indicated on the Explore page for the cluster.

Megacluster-3 contains two SFLD subgroups: Subgroup 1 (7-carboxy-7-deazaguanine synthase-like, Megacluster-3-1, teal nodes) and Subgroup 15 (Organic radical activating enzymes, Megacluster-3-2, cyan nodes).

The UniRef90 IDs in Megacluster-3 were used to generate the SSN with an alignment score of 20 (increased from the initial alignment score of 11); the "long" edges were deleted to segregate the clusters/subgroups. The increase in node resolution to UniRef90 ensures that the subclusters obtained from "edge-editing" will be isofunctional. The clusters are numbered in order of decreasing number of UniRef90 IDs/nodes.

Where indicated on their Explore pages, several of the Megacluster-3-N clusters were further segregated so that different SwissProt functions are located in distinct subclusters.

Megacluster-4 contains two SFLD subgroups: Subgroup 9 (F420, menaquinone cofactor biosynthesis, Megacluster-4-1, purple nodes) and Subgroup 6 (BATS domain containing, Megaclusters-4-2, -4-3, -4-5, and -4-10, orange nodes).

The UniRef90 IDs in Megacluster-4 were used to generate the SSN with an alignment score of 22 (increased from the initial alignment score of 11); the "long" edges were deleted to segregate the clusters/subgroups. The increase in node resolution to UniRef90 ensures that the subclusters obtained from "edge-editing" will be isofunctional. The clusters are numbered in order of decreasing number of UniRef90 IDs/nodes.

Where indicated on their Explore pages, several of the Megacluster-4-N clusters were further segregated so that different SwissProt functions are located in distinct subclusters.

Megacluster-5 contains one SFLD subgroup and two additional InterPro families: Protein MJ0683-like (Megacluster-5-1, electric lime nodes); DUF5131 (Megacluster-5-2, light purple nodes); and Subgroup 19 (Spore photoproduct lyase, Megacluster-5-3, green nodes).

The UniRef50 SSN for Megacluster-5 was generated with an alignment score of 14 (increased from the initial alignment score of 11); the "long" edges were deleted to segregate the clusters/subgroups. The clusters are numbered in order of decreasing number of UniRef50 IDs/nodes.

Four clusters each contained one SFLD subgroup and the fifth contained one InterPro family: Subgroup 13 (Methyltransferase Class A, Cluster-6, dark brown nodes); Subgroup 16 (PLP-dependent, Cluster-7, dark green nodes); Subgroup 11 (Lipoyl synthase like, Cluster-8, yellow nodes); Subgroup 20 (tRNA wybutosine-synthesizing, Cluster-9, brown nodes); and UPF0313 (Cluster-10, olive nodes).

These clusters were used without further editing. The clusters are numbered in order of decreasing number of UniRef50 IDs/nodes.

SFLD subgroup 2 (Megacluster-2-2; oxygen-independent coproporphyrinogen III oxidase like; red nodes in the SSN), SFLD subgroup 5 (Megacluster-2-1; B12-binding domain; blue nodes), SFLD subgroup 16 (Cluster-7; PLP-dependent; dark green nodes) and SFLD subgroup 17 (Megacluster-1-1; SPASM/Twitch domain-containing; magenta nodes) are large and functionally diverse. Segregation of the SSNs for these subgroups into isofunctional families/clusters is not easy, e.g., a single alignment score threshold cannot be used to segregate the SSN into isofunctional clusters.

Analysis Strategy

The SSN for the RSS was generated with UniRef50 clusters/IDs. This coarse granularity is sufficient for identifying subgroups (SFLD-curated and uncharacterized clusters). However, because UniRef50 nodes conflate sequences sharing ≥50% sequence identity, the nodes and SSN clusters can remain heterofunctional even as the alignment score is increased, thereby confusing interpretation of MSAs and genome context. Therefore, UniRef90 clusters/IDs were used to generate the SSNs for these subgroups to maximize the likelihood that the SSN clusters will become isofunctional as the alignment score is increased.

Segregation of the SSNs for these subgroups into isofunctional families/clusters is not easy, e.g., a single alignment score threshold cannot be used to segregate the SSN into isofunctional clusters.

To solve this problem, for each subgroup a series of SSNs was generated with increasing alignment scores. As the alignment score increases, both the sizes of the clusters and their functional complexity decrease, with isofunctional clusters segregating at alignment scores that are characteristic of the function. We refer to this strategy as "dicing".
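The "dicing" idea can be illustrated with a toy network. The sketch below is not the EFI-EST implementation: the node names, pairwise scores, and thresholds are invented, and clusters are simply taken to be connected components of the graph that keeps edges with scores at or above the threshold. Only clusters with at least three nodes are counted, following the cluster definition used on the Diced SSNs pages.

```python
from collections import Counter
from itertools import combinations


def cluster_labels(nodes, scored_pairs, min_score):
    """Label each node with its connected component at a score threshold."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= min_score:  # an edge is drawn only at or above the threshold
            parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def count_clusters(labels, min_size=3):
    """Count clusters containing at least min_size nodes."""
    sizes = Counter(labels.values())
    return sum(1 for size in sizes.values() if size >= min_size)


# Hypothetical data: three tight families joined by weaker "long" edges.
family_scores = {"a": 60, "b": 40, "c": 80}
nodes = [f"{fam}{i}" for fam in family_scores for i in (1, 2, 3)]
pairs = [
    (x, y, score)
    for fam, score in family_scores.items()
    for x, y in combinations([f"{fam}{i}" for i in (1, 2, 3)], 2)
]
pairs += [("a1", "b1", 20), ("b1", "c1", 30)]

thresholds = (15, 25, 35, 50, 70, 90)
counts = {t: count_clusters(cluster_labels(nodes, pairs, t)) for t in thresholds}
print(counts)
```

In this toy case the number of clusters first rises as the families separate and then falls as each family dissociates into individual nodes at its own characteristic score.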

The "AS Walk-Through" function is provided so that the user can 1) "walk" forward from any cluster to its progeny clusters in the SSN with the next alignment score or 2) "walk" backward to its progenitor cluster in the SSN with the previous alignment score. These connections may allow the discovery of divergent functions that share mechanistic attributes.

The "AS Walk-Through" window provides SwissProt and user-provided annotations, if available, for the progenitor cluster and progeny clusters. These provide "landmarks" for exploring sequence-function space.
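The walk-through relationship between successive SSNs can be sketched as a lookup of shared members. This is an illustrative assumption about the bookkeeping, not the site's code: the node names, cluster IDs, and alignment scores are hypothetical, and each SSN is represented as a mapping from node to cluster ID.

```python
from collections import defaultdict


def cluster_members(labels, cluster_id):
    """Return the set of nodes assigned to cluster_id in one SSN."""
    return {node for node, cid in labels.items() if cid == cluster_id}


def walk(cluster_nodes, other_labels):
    """Map each cluster in another SSN to the number of nodes it shares."""
    shared = defaultdict(int)
    for node in cluster_nodes:
        cluster_id = other_labels.get(node)
        if cluster_id is not None:
            shared[cluster_id] += 1
    return dict(shared)


# Hypothetical cluster assignments at three successive alignment scores.
labels_by_score = {
    25: {"a": "25-1", "b": "25-1", "c": "25-1", "d": "25-1", "e": "25-1", "f": "25-1"},
    30: {"a": "30-1", "b": "30-1", "c": "30-1", "d": "30-2", "e": "30-2", "f": "30-2"},
    35: {"a": "35-1", "b": "35-1", "c": "35-2", "d": "35-3", "e": "35-3", "f": "35-3"},
}

current = cluster_members(labels_by_score[30], "30-1")
progenitor = walk(current, labels_by_score[25])  # walking backward
progeny = walk(current, labels_by_score[35])     # walking forward
print("progenitor:", progenitor)
print("progeny:", progeny)
```

Because edges are only removed as the alignment score increases, a cluster has a single progenitor when walking backward but may split into several progeny when walking forward.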

For each cluster in each SSN, the Explore page provides information to assess whether the cluster is isofunctional, e.g., convergence ratio, number of conserved Cys residues, and, most importantly, genome neighborhood diagrams (GNDs) for the sequences in the clusters.

As described for the Search function, the clusters in the SSNs can be searched with a UniProt ID or sequence.

Megacluster-1-1, Subgroup 17 (SPASM/Twitch domain)

The UniRef50 nodes/IDs in Megacluster-1-1 (obtained as described in the Megacluster-1 subtab of the Subgroup tab) were expanded to UniRef90 nodes/IDs. The SSN was generated with Option D of EFI-EST followed by analyses using a series of 33 alignment scores (from 25 to 70 in increments of 5, and from 80 to 300 in increments of 10).
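The alignment score series can be written out explicitly to check the stated counts. The snippet below also lists the series given in this tab for Megacluster-2-1 and Megacluster-2-2; the dictionary layout is only an illustrative convenience.

```python
# Alignment score series as described for each functionally diverse subgroup.
alignment_score_series = {
    "Megacluster-1-1": list(range(25, 71, 5)) + list(range(80, 301, 10)),
    "Megacluster-2-1": list(range(35, 326, 10)),
    "Megacluster-2-2": list(range(60, 251, 10)),
}

for name, scores in alignment_score_series.items():
    print(f"{name}: {len(scores)} scores, from {scores[0]} to {scores[-1]}")
```

The lengths of the three lists are 33, 30, and 20, in agreement with the numbers of "diced" SSNs quoted for the respective subgroups.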

The Explore page for Megacluster-1-1 displays the SSN generated with an alignment score of 11 and provides access to various types of bioinformatic information about Megacluster-1-1. The Explore page provides a link to the Diced SSNs page. The clusters in each of the 33 "diced" SSNs can be viewed by selecting the alignment score (a cluster contains ≥3 UniRef90 IDs/nodes). As the alignment score increases, the clusters decrease in size and complexity. Also, as the alignment score increases, the number of clusters initially increases as the large clusters segregate and then decreases as the small clusters "dissociate" into individual nodes.

The Click here link on the Diced SSNs page accesses the Explore page for Megacluster-1-1-1 in the SSN generated with an alignment score of 25. On that (and any) Explore page, any cluster in the current SSN can be selected; also, SSNs generated with other alignment scores can be selected.

Each Explore page includes the "AS Walk-Through" button above the image for the cluster. The "AS Walk-Through" function allows the user to "walk through" the series of "diced" SSNs, allowing identification of the progeny of a cluster (walking forward) or the progenitor of a cluster (walking backward). This function allows, for example, analyses of 1) speciation of orthologues (with the taxonomic distribution of the cluster available via the TAXONOMY button) and 2) divergent evolution of functions from a common progenitor.

Clicking the "AS Walk-Through" button opens a window that identifies 1) the cluster in the previous SSN in the series that contained the sequences in the cluster and 2) the cluster(s) in the next SSN that contain(s) the sequences in the cluster. For each cluster in the window, the number of nodes and the convergence ratio (CR) are provided as well as SwissProt functions and user-contributed annotations, if these are available. The cluster name is a link; clicking it opens the Explore page for the cluster.

Several "diced" clusters that lack conserved Cys motifs in a C-terminal domain can be identified (Megacluster-1-1 was identified using pairwise sequence similarity using all UniRef50 IDs in the RSS, so some "outliers" can be expected). Also, although Subgroup 17 has been designated as "SPASM/Twitch domain-containing" (PF13186 and IPR023885), inspection of the Explore pages reveals that the clusters contain a wide variety of C-terminal (and N-terminal) domain Cys-rich motifs, ranging in number from 1 to 27 Cys residues (the paradigm SPASM domain contains 8 Cys residues for two Fe4S4 clusters; the paradigm Twitch domain contains 4 Cys residues for one Fe4S4 cluster [1]).

[1] doi: 10.1128/JB.00040-11
doi: 10.1074/jbc.R114.581249
doi: 10.1016/j.bbamcr.2015.01.002
doi: 10.1074/jbc.RA118.005369

Subgroup 5, Megacluster-2-1 (B12-binding domain)

The UniRef50 nodes/IDs in Megacluster-2-1 (obtained as described in the Megacluster-2 subtab of the Subgroup tab) were expanded to UniRef90 nodes/IDs. The SSN was generated with Option D of EFI-EST followed by analyses using a series of 30 alignment scores (from 35 to 325 in increments of 10).

The Explore page for Megacluster-2-1 displays the SSN generated with an alignment score of 11 and provides access to various types of bioinformatic information about Megacluster-2-1. The Explore page provides a link to the Diced SSNs page. The clusters in each of the 30 "diced" SSNs can be viewed by selecting the alignment score (a cluster contains ≥3 UniRef90 IDs/nodes). As the alignment score increases, the clusters decrease in size and complexity. Also, as the alignment score increases, the number of clusters initially increases as the large clusters segregate and then decreases as the small clusters "dissociate" into individual nodes.

The Click here link on the Diced SSNs page accesses the Explore page for Megacluster-2-1-1 in the SSN generated with an alignment score of 35. On that (and any) Explore page, any cluster in the current SSN can be selected; also, SSNs generated with other alignment scores can be selected.

Each Explore page includes the "AS Walk-Through" button above the image for the cluster. The "AS Walk-Through" function allows the user to "walk through" the series of "diced" SSNs, allowing identification of the progeny of a cluster (walking forward) or the progenitor of a cluster (walking backward). This function allows, for example, analyses of 1) speciation of orthologues (with the taxonomic distribution of the cluster available via the TAXONOMY button) and 2) divergent evolution of functions from a common progenitor.

Clicking the "AS Walk-Through" button opens a window that identifies 1) the cluster in the previous SSN in the series that contained the sequences in the cluster and 2) the cluster(s) in the next SSN that contain(s) the sequences in the cluster. For each cluster in the window, the number of nodes and the convergence ratio (CR) are provided as well as SwissProt functions and user-contributed annotations, if these are available. The cluster name is a link; clicking it opens the Explore page for the cluster.

Several "diced" clusters that lack the B12-binding domain (PF02310) can be identified (Megacluster-2-1 was identified using pairwise sequence similarity using all UniRef50 IDs in the RSS, so some "outliers" can be expected).

Subgroup 2, Megacluster-2-2 (Oxygen-independent coproporphyrinogen III oxidase-like)

The UniRef50 nodes/IDs in Megacluster-2-2 (obtained as described in the Megacluster-2 subtab of the Subgroup tab) were expanded to UniRef90 nodes/IDs. The SSN was generated with Option D of EFI-EST followed by analyses using a series of 20 alignment scores (60 to 250 in increments of 10).

The Explore page for Megacluster-2-2 displays the SSN generated with an alignment score of 11 and provides access to various types of bioinformatic information about Megacluster-2-2. The Explore page provides a link to the Diced SSNs page. The clusters in each of the 20 "diced" SSNs can be viewed by selecting the alignment score (a cluster contains ≥3 UniRef90 IDs/nodes). As the alignment score increases, the clusters decrease in size and complexity. Also, as the alignment score increases, the number of clusters initially increases as the large clusters segregate and then decreases as the small clusters "dissociate" into individual nodes.

The Click here link on the Diced SSNs page accesses the Explore page for Megacluster-2-2-1 in the SSN generated with an alignment score of 60. On that (and any) Explore page, any cluster in the current SSN can be selected; also, SSNs generated with other alignment scores can be selected.

Each Explore page includes the "AS Walk-Through" button above the image for the cluster. The "AS Walk-Through" function allows the user to "walk through" the series of "diced" SSNs, allowing identification of the progeny of a cluster (walking forward) or the progenitor of a cluster (walking backward). This function allows, for example, analyses of 1) speciation of orthologues (with the taxonomic distribution of the cluster available via the TAXONOMY button) and 2) divergent evolution of functions from a common progenitor.

Clicking the "AS Walk-Through" button opens a window that identifies 1) the cluster in the previous SSN in the series that contained the sequences in the cluster and 2) the cluster(s) in the next SSN that contain(s) the sequences in the cluster. For each cluster in the window, the number of nodes and the convergence ratio (CR) are provided as well as SwissProt functions and user-contributed annotation, if these are available. Each cluster identifier is a link; clicking it opens the Explore page for that cluster.

Subgroup 16, Cluster-7 (PLP-dependent)

The UniRef50 IDs in Cluster-7 (obtained as described in the Clusters subtab of the Subgroup tab) were expanded to UniRef90 nodes/IDs. The SSN was generated with Option D of EFI-EST followed by analyses using a series of 20 alignment scores (60 to 250 in increments of 10).

The Explore page for Cluster-7 displays an image of the SSN generated with an alignment score of 11 and provides access to various types of bioinformatic information about Cluster-7. The Explore page provides a link to the Diced SSNs page. The clusters in each of the 20 "diced" SSNs can be viewed by selecting the alignment score (a cluster contains ≥3 UniRef90 IDs/nodes). As the alignment score increases, the clusters decrease in size and complexity. Also, as the alignment score increases, the number of clusters initially increases as the large clusters segregate and then decreases as the small clusters "dissociate" into individual nodes.

The Click here link on the Diced SSNs page accesses the Explore page for Cluster-7-1 in the SSN generated with an alignment score of 60. On that (and any) Explore page, any cluster in the current SSN can be selected; also, SSNs generated with other alignment scores can be selected.

Each Explore page includes the "AS Walk-Through" button above the image for the cluster. The "AS Walk-Through" function allows the user to "walk through" the series of "diced" SSNs, allowing identification of the progeny of a cluster (walking forward) or the progenitor of a cluster (walking backward). This function allows, for example, analyses of 1) speciation of orthologues (with the taxonomic distribution of the cluster available via the TAXONOMY button) and 2) divergent evolution of functions from a common progenitor.

Clicking the "AS Walk-Through" button opens a window that identifies 1) the cluster in the previous SSN in the series that contained the sequences in the cluster and 2) the cluster(s) in the next SSN that contain(s) the sequences in the cluster. For each cluster in the window, the number of nodes and the convergence ratio (CR) are provided as well as SwissProt functions and user-contributed annotation, if these are available. Each cluster identifier is a link; clicking it opens the Explore page for that cluster.

Exploring subgroups

This section describes the Search function results when the input UniProt ID or sequence matches a sequence located in a cluster (≥3 UniRef90 nodes) in the functionally diverse subgroups 2, 5, 16, and 17. Refer to the description of the Search tab for results when the input matches one of the other subgroups.

With the Find by UniProt ID function, the user provides a UniProt ID for searching all of the clusters in RadicalSAM.org. When the ID is located in the "diced" SSNs of a functionally diverse subgroup, the Results page provides a list of clusters in the "diced" SSNs that contain the UniProt ID along with the number of UniProt IDs, number of cluster nodes, and UniProt ID convergence ratio (CR; described on the Explore Pages tab). The clusters are links to the Explore page for that cluster (see Explore Pages tab for a description of the information provided on the Explore page).

As the alignment score used to generate the diced clusters increases, a UniProt ID may be located in a cluster with ≤2 UniRef90 ID nodes. When this occurs, the Search will report "ID not found". Generation of the MSA, WebLogo, HMM, Length Histograms, tables of Conserved Cys Residues, and files with IDs and FASTA sequences for a cluster requires that the cluster contain ≥3 UniRef90 nodes.

With the Find by Sequence function, the user provides a sequence (with/without a FASTA header) that is first searched against the HMMs of the subgroups to identify the subgroup with the smallest e-value. If the smallest e-value is for a functionally diverse subgroup (SFLD subgroup 2, 5, 16, or 17; Megaclusters-1-1, -2-1, or -2-2 or Cluster-7), the HMMs for all of the clusters in the series of "diced" SSNs in that subgroup are searched.

The Results page provides a list of the three clusters at each alignment score with the smallest e-values along with the number of UniProt IDs, number of cluster nodes, and UniProt ID convergence ratio (CR; described on the Explore Pages tab). The cluster identifiers are links to the Explore page for that cluster (see Explore Pages tab for a description of the information provided on the Explore page).
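The selection of the three best clusters at each alignment score can be sketched as a simple group-and-sort step. The example assumes the HMM search has already produced a list of hypothetical (alignment score, cluster identifier, e-value) records; it does not run an HMM search, and the record layout, cluster names, and function name are assumptions made for illustration.

```python
from collections import defaultdict


def best_clusters(hits, top_n=3):
    """Group hits by alignment score and keep the top_n smallest e-values."""
    by_score = defaultdict(list)
    for alignment_score, cluster_id, evalue in hits:
        by_score[alignment_score].append((evalue, cluster_id))
    # Sorting the (e-value, cluster) tuples puts the smallest e-value first.
    return {score: [cid for _, cid in sorted(records)[:top_n]]
            for score, records in sorted(by_score.items())}


# Hypothetical hits for one query sequence.
hits = [
    (60, "cluster-1", 1e-40), (60, "cluster-2", 1e-12),
    (60, "cluster-3", 1e-5), (60, "cluster-4", 1e-3),
    (70, "cluster-1", 1e-55), (70, "cluster-2", 1e-9),
]
results = best_clusters(hits)
```

Here the fourth cluster at an alignment score of 60 is dropped because its e-value is the largest, and only two clusters are listed at 70 because only two were hit.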

As the alignment score used to generate the "diced" SSN increases, the e-value typically decreases because the cluster becomes orthologous/isofunctional as nonorthologous sequences are removed. However, if/when the user-provided sequence segregates into a cluster with ≤2 UniRef90 ID nodes, for which an HMM is not generated, the Search results will continue to identify the three best clusters, but the e-values likely will be larger than those identified for clusters containing the sequence.

An image of the cluster is presented on its Explore page: an isofunctional cluster likely will be "spherical", with each node/sequence connected to all other nodes with an edge, so the value of CR will approach 1.0. Functional homogeneity within each cluster also can be assessed by inspection of the Conserved Cys Residue table as well as the genome neighborhood diagrams (GNDs) for the UniRef90 node IDs and the UniProt IDs.

An Explore page is provided for each SSN cluster in RadicalSAM.org.

The Explore page provides information about the sequences in the SSN cluster (image displayed) that can be viewed or downloaded:
  1. SwissProt-annotated functions (button)
  2. KEGG annotated sequences (button)
  3. PDB files (button)
  4. TIGRFAM families (button)
  5. Community annotations (button)
  6. Taxonomy sunburst (button)
  7. Genome neighborhood diagrams (GNDs; button)
  8. Cluster sizes (numbers of UniProt, UniRef90, and UniRef50 IDs)
  9. Convergence Ratio (CR) for the UniProt IDs and UniRef node IDs
  10. Summary of the number of Conserved Cys Residues as a function of sequence conservation (from 90% to 10%, decreasing in steps of 10%)
  11. WebLogo and multiple sequence alignment (MSA; generated with MUSCLE; can be viewed with Jalview, which is available for download from https://www.jalview.org/)
  12. HMM (viewed interactively using Skylign at https://skylign.org/; a text file is also available for download)
  13. Length histograms for UniProt, UniRef90, and UniRef50 IDs
  14. SSN for the displayed cluster (xgmml file for Cytoscape; download)
  15. Lists of UniProt, UniRef90, and UniRef50 IDs (download)
  16. UniProt, UniRef90, and UniRef50 FASTA files (download)
  17. A table with the number and residue positions of conserved Cys residues in the MSA (download).

Convergence ratio (CR): The CR is the ratio of the number of sequence pairs with edge alignment score values (derived from BLAST e-values/bit scores) ≥ the minimum alignment score threshold used to generate the SSN to the total number of sequence pairs. The value of CR ranges from 1.0 for sequences that are very similar ("identical") to 0.0 for sequences that are unrelated at the specified alignment score.
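As a worked illustration of this definition, the sketch below computes CR for a toy cluster. It assumes that the "total number of sequence pairs" is n(n-1)/2 for n sequences and that pairwise alignment scores are supplied in a hypothetical dictionary keyed by alphabetically ordered ID pairs; pairs absent from the dictionary are treated as falling below the threshold. None of the names or scores come from RadicalSAM.org.

```python
from itertools import combinations


def convergence_ratio(ids, pair_scores, threshold):
    """Fraction of all sequence pairs whose alignment score is >= threshold."""
    pairs = list(combinations(sorted(ids), 2))
    if not pairs:
        return 0.0
    connected = sum(1 for pair in pairs
                    if pair_scores.get(pair, 0) >= threshold)
    return connected / len(pairs)


# Hypothetical pairwise alignment scores.
pair_scores = {
    ("A", "B"): 100, ("A", "C"): 90, ("B", "C"): 95,
    ("C", "D"): 65,
}

cr_tight = convergence_ratio({"A", "B", "C"}, pair_scores, 90)       # 3 of 3 pairs
cr_loose = convergence_ratio({"A", "B", "C", "D"}, pair_scores, 60)  # 4 of 6 pairs
```

The fully connected three-member cluster gives a CR of 1.0, while adding a sequence that is linked to only one other member lowers the CR to about 0.67.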

At small values of the alignment score, the value of CR for a cluster can be ~ 1.0 even if the cluster is heterofunctional (e-values are large; pairwise sequence similarity is small). However, at larger values of the alignment score, clusters with values approaching 1.0 are likely to be isofunctional (e-values are small; pairwise sequence identity is large).

Isofunctional clusters, as judged by shared genome context in the GNDs, often have CR values that approach 1.0. However, the values of CR for isofunctional/orthologous clusters that contain sequences from phylogenetically diverse species can decrease with increasing alignment score because the sequence divergence between orthologues in different phylogenetic groups causes the CR to decrease. In such situations, as the alignment score increases, an isofunctional cluster with a low value for CR can segregate into smaller clusters for different phylogenetic groups with CR values that approach 1.0. Thus, values of CR that approach 1.0 are not required for isofunctionality; inspection of the GNDs allows that assessment.

Conserved Cys Residues: A list is provided of the number of Conserved Cys Residues as a function of percent conservation in the MSA. Recall that, by definition, members of the RSS share a Cx3Cx2C motif for SAM-binding. Conserved Cys residues in excess of 3 may be associated with additional FeS clusters, e.g., members of the SFLD subgroup 17, SPASM/Twitch domain.
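For readers working with the downloaded FASTA files, the Cx3Cx2C motif can be located in an unaligned sequence with a regular expression. This is a minimal sketch that reads the motif literally as Cys, any three residues, Cys, any two residues, Cys; the example sequence is invented and the function name is hypothetical.

```python
import re

# C, any 3 residues, C, any 2 residues, C
CX3CX2C = re.compile(r"C.{3}C.{2}C")


def find_motif(sequence):
    """Return 1-based start positions of non-overlapping Cx3Cx2C matches."""
    return [match.start() + 1 for match in CX3CX2C.finditer(sequence)]


# Invented sequence with Cys at positions 5, 9, and 12.
positions = find_motif("MKTACGHYCRFCAAA")
```

A sequence with no match returns an empty list, which may indicate a truncated sequence rather than the absence of the motif.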

The sequences in the cluster are not edited prior to construction of the MSA, so some will be truncated (even if their Sequence Status is "Complete"). Also, if the alignment score threshold is less than that required for isofunctionality, the sequences in the MSA will be heterofunctional and, therefore, heterogeneous in length. Therefore, the Number of Conserved Cys Residues can be expected to be a function of percent conservation, with the most abundant conserved Cys motifs represented at large values of percent conservation and conserved Cys motifs in less abundant sequences/functions represented at lower values of percent conservation. Therefore, with the caveat that the sequences in the cluster cannot be expected to be uniform in length, this summary can be used to evaluate sequence and function heterogeneity.

The Conserved Cys Residues can/should be used together with the MSA, CR, length histograms, and GNDs in identifying isofunctional clusters.
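The dependence of the number of conserved Cys residues on percent conservation can be sketched for a toy alignment. The example assumes that percent conservation for an MSA column is the percentage of all aligned sequences (gaps included in the denominator) that have Cys in that column; this may differ from the calculation used by RadicalSAM.org. The alignment and function name are invented.

```python
def conserved_cys_summary(msa, thresholds=range(90, 0, -10)):
    """Map each percent threshold to the 1-based MSA columns where Cys meets it."""
    n_seqs = len(msa)
    length = len(msa[0])
    # Percentage of sequences with Cys in each column (gaps count as non-Cys).
    cys_pct = [100.0 * sum(seq[i] == "C" for seq in msa) / n_seqs
               for i in range(length)]
    return {t: [i + 1 for i, pct in enumerate(cys_pct) if pct >= t]
            for t in thresholds}


# Invented alignment of five sequences; "-" marks a gap.
msa = [
    "ACGHYCRC",
    "ACGHYCRC",
    "ACAHYCKC",
    "ACGHFCRA",
    "A-GHYCRC",
]
summary = conserved_cys_summary(msa)
counts = {t: len(cols) for t, cols in summary.items()}
```

In this toy alignment one Cys column is conserved at 90%, and three are conserved at 80% and below, because one sequence has a gap at column 2 and another lacks the Cys at column 8.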

Consensus Cys Residues: A text file ("Consensus residue percentage summary table") is available for download. In the MSA (SSN cluster number in column 1), the positions of Cys residues identified at 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, and 10% conservation (column 2) are listed in columns 6 and greater. The number in column 3 is the number of Cys residues conserved at that percent conservation; columns 4 and 5 provide the number of IDs in the cluster. The canonical Cx3Cx2C motif is easily identified; additional conserved Cys residues may provide ligands to auxiliary Fe-S centers.

Taxonomy: The Taxonomy button opens a sunburst display similar to that provided by Pfam for its families (https://pfam.xfam.org/). Each node in the SSN is displayed as an arc, arranged radially with the superkingdom at the center and species in the outermost ring. Clicking on a taxonomic group expands that part of the taxonomic hierarchy. Clicking on the center circle will revert the display to the next higher level. Buttons are provided to download the IDs (UniProt and UniRefNN) and FASTA files (UniProt and UniRefNN) at the displayed taxonomic level (depending on the number of sequences, a delay may be encountered in downloading the FASTA files).

Genome Neighborhood Diagrams (GNDs): The Genome Neighborhood Diagrams button provides genome neighborhood diagrams (GNDs) for the node IDs in each cluster. These are displayed using the GND Explorer used by the EFI-GNT tool. The GNDs provide information about both functional heterogeneity (one or several genome neighborhoods; one or more functions) and possible metabolic pathways (Pfam/InterPro families of proximal genes).

For UniRef50 SSN clusters, the default GND display is the UniRef50 node IDs in the cluster; for UniRef90 SSN clusters, the default display is the UniRef90 node IDs in the cluster.

The GNDs for the UniRef90 IDs in each UniRef50 node are available by clicking the "+" link adjacent to each UniRef50 GND.

The GNDs for the UniProt IDs in each UniRef90 node are available by clicking the "+" link adjacent to each UniRef90 GND.

We would like RadicalSAM.org to be a community resource, with users providing current annotation information that will assist the community with selection of proteins for study and inform the sequence-function space that is used to infer functions.

SwissProt annotations are incomplete and sometimes vague or incorrect; experimentally verified annotations provided by the community are expected to be both more reliable and precise. On the Submit page, we provide a "Community Annotation Submission" form for users to submit experimentally determined functions as well as publications that document/describe these functions. After review, these will be included on the cluster pages using the "Pubs" and "Anno" buttons.

We have identified some proteins in the RadicalSAM superfamily that have already been annotated by the community. These are listed below.

UniProt ID DOI
Q9X5S2 https://doi.org/10.1016/S1074-5521(99)80040-4
A0A060PWX2 https://doi.org/10.1002/anie.201402623
A0A062UZ78 https://doi.org/10.1126/science.aao6595
A0A0E1M5B1 http://dx.doi.org/10.1002/pro.3529
A0A0E3RYH2 https://doi.org/10.1002/anie.201803183
A0A0H3KB22 https://doi.org/10.1093/nar/29.5.1097
https://doi.org/10.1007/BF00290574
https://doi.org/10.1002/chem.201604719
A0A0H4NV78 https://doi.org/10.1021/jacs.6b10901
A0A0K0K526 https://doi.org/10.1021/ja312641f
A0A0M3N271 https://doi.org/10.1021/acs.biochem.8b00693
A0A0Z8EWX1 http://dx.doi.org/10.1073/pnas.1703663114
A0A144V459 https://doi.org/10.1111/1462-2920.12509
A0A1C7D1B6 http://dx.doi.org/10.1038/nsmb.3265
A0A1C7D1B7 http://dx.doi.org/10.1038/nsmb.3265
A0A1H7F4G6 https://doi.org/10.1002/anie.202107192
A0A1I5E523 https://doi.org/10.1111/febs.14307
https://doi.org/10.1016/j.febslet.2010.04.053
A0A1M5XT65 https://doi.org/10.1111/1462-2920.15346
A0A1W6VU85 https://doi.org/10.1021/jacs.0c08585
A0A2W5DYM3 https://doi.org/10.1073/pnas.1508440112
http://dx.doi.org/10.1126/science.1241859
http://dx.doi.org/10.1021/acs.biochem.5b01216
A0A2Z6BLR4 https://doi.org/10.1021/acs.biochem.9b00197
A0A376LGD9 http://www.jbc.org/content/285/8/5240
A0A378Y5A1 https://doi.org/10.1021/jacs.9b01520
A0A5C8IMA7 https://doi.org/10.1021/acs.biochem.9b00197
A0A656FYS1 https://doi.org/10.1021/acschembio.6b01069
A0A6B9HEI0 https://doi.org/10.1039/C9CC07197K
A0A6F8Z1X2 https://doi.org/10.3390/biom10050775
A0A7C9KC68 https://doi.org/10.1038/s41586-019-1791-1
A0A7G7KS84 https://doi.org/10.1002/anie.202015177
A0B690 https://pubs.acs.org/doi/10.1021/jacs.0c02243
A0PM49 https://doi.org/10.1074/jbc.M117.795682
A1B2Q7 https://doi.org/10.1074/jbc.M115.638320
A3DDW1 https://doi.org/10.1021/jacs.7b01283
A4IQU1 https://doi.org/10.1093/nar/gks603
https://doi.org/10.3389/fchem.2017.00014
A4J6G2 https://doi.org/10.1016/j.bbapap.2006.11.008
A4VXX3 https://doi.org/10.1021/jacs.9b05151
A5HBL2 https://doi.org/10.1261/rna.1371409
A6L094 not yet published. Only the structure reported.
A6M1R8 https://doi.org/10.1021/ja504618y
A8R0J3 https://doi.org/10.1038/ja.2007.63
A8R0J7 https://doi.org/10.1038/ja.2007.63
A8R0J8 https://doi.org/10.1038/ja.2007.63
B1C2R2 http://dx.doi.org/10.1073/pnas.1909604116
B1KQP7 https://doi.org/10.1038/s41467-018-05217-1
B3QHD1 https://doi.org/10.1073/pnas.0912949107
B8J367 https://doi.org/10.1016/j.bbapap.2014.03.016
B9ZUJ4 https://doi.org/10.1016/j.chembiol.2008.11.005
C0JRZ9 https://doi.org/10.1016/j.chembiol.2009.01.007
https://doi.org/10.1021/jacs.5b12592
http://doi.org/10.1038/ncomms9377
C2TQ82 https://doi.org/10.1371/journal.pone.0020852
C3HC15 https://doi.org/10.1021/jacs.9b01519
C3HC16 https://doi.org/10.1021/jacs.9b01519
C6FX51 https://www.nature.com/articles/nchembio.512
https://doi.org/10.1002/anie.201407320
http://dx.doi.org/10.1126/science.aad8995
C6FX53 https://doi.org/10.1002/anie.201712224
https://doi.org/10.1021/cb900133x
https://doi.org/10.1021/jacs.8b13157
C6XW09 https://doi.org/10.1021/acs.biochem.0c00070
C7P8S7 https://doi.org/10.1111/j.1365-2958.2005.04989.x
D0QZJ5 https://doi.org/10.1002/anie.201102527
https://doi.org/10.1016/j.febslet.2015.05.032
D1C4T7 not yet published. Only the structure reported
D2KTX6 https://doi.org/10.1021/ja902261a
D2KTX8 https://doi.org/10.1021/ja902261a
D2SNF5 https://doi.org/10.1021/acs.biochem.8b00264
https://doi.org/10.1021/acs.biochem.8b00693
D3EIG6 https://doi.org/10.1021/acs.biochem.9b00197
D3T7F1 http://dx.doi.org/10.1073/pnas.1417252112
D5SLH5 https://doi.org/10.1021/bi901018q
D5VRB9 http://dx.doi.org/10.1038/s41467-019-08579-2
D5VRM1 https://doi.org/10.3389/fpls.2017.01567
https://doi.org/10.1021/jacs.6b03329
D6Y4Z7 https://doi.org/10.1021/jacs.7b00693
D8G6F8 https://doi.org/10.1002/anie.201400478
https://doi.org/10.1002/anie.201609469
E2D677 https://doi.org/10.1111/mmi.14430
E5DUI3 http://dx.doi.org/10.1074/jbc.M111.224832
E5KJ95 https://doi.org/10.1016/j.bbapap.2011.11.006
F2R7B0 https://doi.org/10.1039/C6SC03533G
F2R7B1 https://doi.org/10.1039/C6SC03533G 
F5AT08 https://doi.org/10.1073/pnas.0913554107
F5AT09 https://doi.org/10.1073/pnas.0913554107
F8JND9 https://doi.org/10.1073/pnas.1508615112
F8JNE0 https://doi.org/10.1128/AAC.01366-10
F8JNE4 https://doi.org/10.1128/AAC.01366-10
G0LD12 https://doi.org/10.1074/jbc.RA120.015371
G0LD27 https://doi.org/10.1074/jbc.RA120.015371
G1WJU6 https://doi.org/10.1111/1462-2920.15345
G1WKZ6 https://doi.org/10.1111/1462-2920.15344
G9MQB8 http://dx.doi.org/10.1074/jbc.RA118.003998
I3NN68 https://doi.org/10.1021/ja211098r
I7G6W7 https://doi.org/10.1016/bs.mie.2018.04.015
J9ZW10 https://science.sciencemag.org/content/338/6105/387.long
J9ZW29 https://doi.org/10.1021/jacs.7b08402
J9ZXD6 http://doi.org/10.1021/jacs.6b06697
K4MHG1 https://doi.org/10.1016/j.chembiol.2012.08.013
https://doi.org/10.1039/C2SC21183A
K4MHV5 https://doi.org/10.1039/C2SC21183A
https://doi.org/10.1016/j.chembiol.2012.08.013
K4MJZ5 https://doi.org/10.1016/j.chembiol.2012.08.013
https://doi.org/10.1039/C2SC21183A
N0DKX5 https://doi.org/10.1074/jbc.M116.767665
O24770 http://dx.doi.org/10.1038/nature21689
https://doi.org/10.1271/bbb.63.563
O27899 https://doi.org/10.1073/pnas.1510409112
O31423 https://doi.org/10.1021/ja310542g
http://dx.doi.org/10.2210/pdb6EFN/pdb
O31677 https://doi.org/10.1021/bi900400e
https://doi.org/10.1021/acs.biochem.5b00210
O54060 https://doi.org/10.1093/nar/29.5.1097
O58832 https://doi.org/10.1007/s00775-019-01702-0
https://doi.org/10.1038/nature09138
O58832 https://doi.org/10.1038/nature09138
https://doi.org/10.1021/acs.biochem.8b00287
O59412 http://dx.doi.org/10.1107/S0907444907040668
O70600 https://doi.org/10.1096/fasebj.14.3.523
O87941 https://doi.org/10.1046/j.1365-2958.1998.00826.x
https://doi.org/10.1111/j.1574-6976.1998.tb00381.x
P00459 https://doi.org/10.1021/acs.accounts.7b00417
P0A9N4 https://doi.org/10.1073/pnas.0806640105
http://dx.doi.org/10.1126/science.aaf5327
https://doi.org/10.1021/bi00214a005
P0A9N8 https://doi.org/10.1021/bi002936q
P11067 https://doi.org/10.1126/science.1224603
https://doi.org/10.1074/jbc.270.45.26890
https://doi.org/10.1074/jbc.R114.578161
P12996 https://doi.org/10.1021/bi00178a020
https://doi.org/10.1021/jacs.8b07613
https://doi.org/10.1074/jbc.R114.599308
P26168 https://doi.org/10.1073/pnas.97.12.6908
https://doi.org/10.1146/annurev.genet.31.1.61
P27507 http://doi.org/10.1074/jbc.C115.699918
P30140 https://doi.org/10.1002/anie.200702554
P30745 https://doi.org/10.1046/j.1432-1327.1998.2550024.x
P32131 http://doi.org/10.1074/jbc.M512628200
https://doi.org/10.1111/mmi.12951
P32461 https://doi.org/10.1126/science.aao6595
P36979 http://dx.doi.org/10.1126/science.1205358
https://doi.org/10.1021/ja410560p
http://dx.doi.org/10.1126/science.aad5367
P37956 https://doi.org/10.1128/JB.180.18.4879-4885.1998
P39280 https://doi.org/10.1021/bi061328t
P39409 https://doi.org/10.1371/journal.pcbi.1002228
P40487 https://doi.org/10.1126/science.aao6595
P52062 https://doi.org/10.1073/pnas.1416285112
https://doi.org/10.1074/jbc.RA117.000229
P54462 http://dx.doi.org/10.1074/jbc.m110.106831
P60716 http://dx.doi.org/10.1021/bi049528x
https://doi.org/10.1128/jb.175.5.1325-1336.1993
P65388 http://dx.doi.org/10.1073/pnas.0404624101
P65389 http://dx.doi.org/10.1073/pnas.0404624101
P69848 http://dx.doi.org/10.1073/pnas.0510711103
https://doi.org/10.1021/acs.biochem.9b00741
https://doi.org/10.1021/jacs.0c01200
P71011 https://doi.org/10.1128/JB.181.23.7346-7355.1999
https://doi.org/10.1039/c6cc01317a
P71517 https://doi.org/10.1186/1471-2091-9-8
http://dx.doi.org/10.1021/acs.biochem.7b01097
https://doi.org/10.1074/jbc.C115.699918
P9WJ78 http://doi.org/10.1002/1873-3468.12249
https://pubs.acs.org/doi/pdf/10.1021/acs.biochem.6b00355
P9WJ79 https://doi.org/10.1186/1471-2164-12-21
https://doi.org/10.1002/1873-3468.12249
P9WK91 http://dx.doi.org/10.1073/pnas.1602486113
http://dx.doi.org/10.1021/acs.biochem.5b01216
P9WP73 https://doi.org/10.1073/pnas.1416285112
P9WP77 https://pubs.acs.org/doi/10.1021/ja307762b
Q02908 https://doi.org/10.1261/rna.7247705
https://doi.org/10.1016/j.molcel.2005.02.018
https://doi.org/10.1074/jbc.M403361200
Q08960 https://doi.org/10.1038/sj.emboj.7601105
Q0TTH1 https://doi.org/10.1021/ja067175e
Q12Q22 https://doi.org/10.1016/j.bbapap.2014.09.009
Q185C5 https://doi.org/10.1016/j.bbapap.2006.11.008
Q1Q0N1 https://doi.org/10.1186/1745-6150-4-8
Q2MEW6 https://doi.org/10.1038/ja.2016.110
Q2MFI7 https://doi.org/10.1002/anie.201510635
https://doi.org/10.1021/jacs.7b10501
https://doi.org/10.1021/jacs.6b02221
Q2MG55 https://doi.org/10.1016/j.chembiol.2014.12.012
Q38HX2 https://doi.org/10.1007/s00775-013-1008-2
Q38HX4 https://doi.org/10.1159/000440882
Q3ABV3 http://dx.doi.org/10.1002/cbic.201402661
https://doi.org/10.1074/jbc.M801161200
Q3ME29 https://doi.org/10.1186/1471-2164-12-21
Q3ME30 https://doi.org/10.1002/anie.201400478
https://doi.org/10.1002/anie.201609469
Q45595 http://doi.org/10.1038/nchem.2714
Q46CH7 https://doi.org/10.1155/2014/327637
https://doi.org/10.1039/C6SC01140C
Q46E78 https://doi.org/10.1038/nature09918
Q4J8I0 https://doi.org/10.1073/pnas.1909306116
Q4J8R4 https://doi.org/10.1073/pnas.1814048115
Q4JC22 https://doi.org/10.1073/pnas.1909306116
Q50258 https://doi.org/10.7164/antibiotics.48.1191
Q51741 https://doi.org/10.1128/jb.177.16.4817-4819.1995
Q53U14 https://doi.org/10.1021/ja507759f
Q54273 https://doi.org/10.1016/0378-1119(95)00101-b
Q56184 https://doi.org/10.1039/b614678c
https://doi.org/10.1021/acs.biochem.7b00472
https://doi.org/10.1021/acs.biochem.8b00616
Q57705 https://doi.org/10.1016/j.jmb.2007.07.024
https://doi.org/10.1021/jacs.8b01493
Q57888 https://pubs.acs.org/doi/10.1021/ja513287k
https://doi.org/10.1007/s00203-003-0614-8
Q58036 https://doi.org/10.1128/JB.01903-14
Q58195 https://doi.org/10.1128/JB.01125-09
Q58826 https://pubs.acs.org/doi/10.1021/ja307762b
https://doi.org/10.1021/ja513287k
https://doi.org/10.1007/s00203-003-0614-8
Q5IW50 https://doi.org/10.1021/bi201220r
Q5JE80 https://doi.org/10.1038/s41589-019-0390-7
Q5SK48 https://doi.org/10.1021/ja408594p
https://doi.org/10.1021/jacs.7b04209
Q5VV42 https://dx.doi.org/10.7150%2Fijbs.49302
Q60AV6 https://doi.org/10.1073/pnas.0912949107
Q6E3K8 https://doi.org/10.1128/AEM.70.12.7303-7310.2004
Q6PSL4 https://doi.org/10.1074/jbc.M403206200
Q6QVU0 https://doi.org/10.1016/j.bbrc.2008.05.133
Q70KE5 https://doi.org/10.1021/ja312641f
Q712I6 https://doi.org/10.1111/j.1365-2958.1995.tb02269.x
https://doi.org/10.1038/nchembio.2187
Q796V8 https://www.ncbi.nlm.nih.gov/pubmed/?term=10498703
Q841K7 https://doi.org/10.1002/cbic.200300583
Q841K9 https://doi.org/10.1371/journal.pone.0068545
Q84F14 https://doi.org/10.1021/bi060840b
Q8A2W0 https://doi.org/10.1021/jacs.9b11093
Q8CBB9 http://dx.doi.org/10.1073/pnas.1705402114
https://doi.org/10.1021/acs.biochem.9b00741
Q8DLC2 http://dx.doi.org/10.1042/BJ20140895
https://doi.org/10.1021/jacs.5b04387
Q8DT69 https://doi.org/10.1128/mBio.02688-20
Q8G907 https://doi.org/10.1073/pnas.1312228110
https://doi.org/10.1021/ja072481t
https://doi.org/10.1021/jacs.5b03384
Q8GEZ7 https://doi.org/10.1073/pnas.0734105100
https://doi.org/10.1021/bi035930k
Q8GHB6 https://doi.org/10.1002/cbic.200300609
Q8KBK9 https://doi.org/10.1016/j.cbpa.2009.02.036
Q8KCU0 https://doi.org/10.1016/j.cbpa.2009.02.036
Q8R6P9 https://doi.org/10.1021/acs.biochem.6b00145
Q8RAM6 https://doi.org/10.1021/acs.biochem.6b00145
Q8RJJ1 https://doi.org/10.1038/s41598-017-17321-1
Q8THG6 https://doi.org/10.1074/jbc.RA119.007609
Q8THG6 https://doi.org/10.1038/s41598-018-25716-x
Q8THK1 https://doi.org/10.1038/s41589-019-0390-7
Q8TIF7 https://doi.org/10.1073/pnas.1510409112
Q8WXG1 https://doi.org/10.1016/j.febslet.2010.02.041
Q8X6Z3 https://doi.org/10.1039/C8OB02906G
Q93KV6 https://doi.org/10.1074/jbc.M601508200
Q96SZ6 https://doi.org/10.1093/nar/gks240
https://doi.org/10.1016/S0378-1119(99)00499-0
https://dx.doi.org/10.7150%2Fijbs.49302
Q97IK9 https://doi.org/10.1021/bi501205e
https://doi.org/10.1016/j.febslet.2008.04.063
Q97L63 https://doi.org/10.1021/ja807375c
Q9A6Q5 http://doi.org/10.1038/nchembio.121
Q9FB10 https://doi.org/10.1039/b615284h
Q9FBG4 https://doi.org/10.1261/rna.1371409
Q9K864 http://doi.org/10.1021/bi400498d
Q9KZZ7 https://doi.org/10.1039/C8OB02906G
Q9NZB8 https://doi.org/10.1021/ja512997j
Q9RCI2 https://doi.org/10.1128/JB.181.23.7256-7265.1999
Q9S1L5 https://doi.org/10.1002/anie.200801204
Q9WY26 http://dx.doi.org/10.1038/nchembio.1579
Q9WZC1 http://dx.doi.org/10.1074/jbc.m408562200
https://doi.org/10.1074/jbc.M301518200
https://doi.org/10.1002/anie.201713188
Q9X0Z6 https://doi.org/10.1074/jbc.M801161200
Q9X2H6 http://dx.doi.org/10.1074/jbc.M109.065516
http://dx.doi.org/10.1021/ja4048448
http://dx.doi.org/10.1038/nchembio.1229
Q9X5R8 https://doi.org/10.1016/S1074-5521(99)80040-4
Q9X758 https://doi.org/10.1074/jbc.M209435200
https://doi.org/10.1074/jbc.M313855200
Q9X7W1 https://doi.org/10.1002/anie.201911584
Q9XAP2 https://doi.org/10.1021/bi400498d
https://doi.org/10.1126/science.1160446
Q9XBQ8 https://doi.org/10.1073/pnas.0505726102
https://doi.org/10.1128/JB.182.2.469-476.2000
Q9ZGH1 https://doi.org/10.1021/ja909451a
https://doi.org/10.1039/A804431G
https://doi.org/10.1021/jacs.5b02545
U1XSN3 https://doi.org/10.1111/1462-2920.12509
V0VQG0 https://doi.org/10.1073/pnas.1603209113
V8LSZ7 http://doi.org/10.1038/nchem.2237