Detailed product description
304 stainless steel welded coiled tube/tubing
1. Specification: Stainless steel coil tube / tubing
2. Type: welded or seamless
3. Standard: ASTM A269, ASTM A249
4. Stainless steel coil tube OD: 6mm to 25.4mm
5. Length: 600-3500mm or as per customer’s requirement.
6. Wall thickness: 0.2mm to 2.0mm.
7. Tolerance: OD: +/-0.01mm; Thickness: +/-0.01%.
8. Coil inner hole size: 500mm-1500mm (can be adjusted according to customer requirements)
9. Coil height: 200mm-400mm (can be adjusted according to customer requirements)
10. Surface: Bright or annealed
11. Material: 304, 304L, 316L, 321, 301, 201, 202, 409, 430, 410, alloy 625, 825, 2205, 2507, etc.
12. Packing: woven bags in wooden case, wooden pallet, wooden shaft, or as per customer’s requirement
13. Test: chemical composition, yield strength, tensile strength, hardness measurement
14. Guarantee: third-party inspection (for example, SGS, TÜV), etc.
15. Application: Decoration, furniture, oil transportation, heat exchanger, railing making, paper making, automobile, food processing, medical, etc.
The chemical composition and physical properties of the stainless steel grades are as below:
ASTM A269 chemical composition, % max
Material | C | Mn | P | S | Si | Cr | Ni | Mo | N | Nb | Ti |
TP304 | 0.08 | 2.00 | 0.045 | 0.030 | 1.00 | 18.0-20.0 | 8.0-11.0 | ^ | ^ | ^ | ^ |
TP304L | 0.035 | 2.00 | 0.045 | 0.030 | 1.00 | 18.0-20.0 | 8.0-12.0 | ^ | ^ | ^ | ^ |
TP316 | 0.08 | 2.00 | 0.045 | 0.030 | 1.00 | 16.0-18.0 | 10.0-14.0 | 2.00-3.00 | ^ | ^ | ^ |
TP316L | 0.035 | 2.00 | 0.045 | 0.030 | 1.00 | 16.0-18.0 | 10.0-15.0 | 2.00-3.00 | ^ | ^ | ^ |
TP321 | 0.08 | 2.00 | 0.045 | 0.030 | 1.00 | 17.0-19.0 | 9.0-12.0 | ^ | ^ | ^ | 5C-0.70 |
TP347 | 0.08 | 2.00 | 0.045 | 0.030 | 1.00 | 17.0-19.0 | 9.0-12.0 | ^ | ^ | 10C-1.10 | ^ |
Material | Heat treatment | Temperature °F (°C), min. | Hardness, Brinell | Hardness, Rockwell |
TP304 | Solution | 1900 (1040) | 192HBW/200HV | 90HRB |
TP304L | Solution | 1900 (1040) | 192HBW/200HV | 90HRB |
TP316 | Solution | 1900 (1040) | 192HBW/200HV | 90HRB |
TP316L | Solution | 1900 (1040) | 192HBW/200HV | 90HRB |
TP321 | Solution | 1900 (1040) | 192HBW/200HV | 90HRB |
TP347 | Solution | 1900 (1040) | 192HBW/200HV | 90HRB |
OD, inch | OD tolerance, inch (mm) | WT tolerance, % | Length tolerance +, inch (mm) | Length tolerance - |
≤ 1/2 | ± 0.005 (0.13) | ± 15 | 1/8 (3.2) | 0 |
> 1/2 ~ 1 1/2 | ± 0.005 (0.13) | ± 10 | 1/8 (3.2) | 0 |
> 1 1/2 ~ < 3 1/2 | ± 0.010 (0.25) | ± 10 | 3/16 (4.8) | 0 |
> 3 1/2 ~ < 5 1/2 | ± 0.015 (0.38) | ± 10 | 3/16 (4.8) | 0 |
> 5 1/2 ~ < 8 | ± 0.030 (0.76) | ± 10 | 3/16 (4.8) | 0 |
8 ~ < 12 | ± 0.040 (1.01) | ± 10 | 3/16 (4.8) | 0 |
12 ~ < 14 | ± 0.050 (1.26) | ± 10 | 3/16 (4.8) | 0 |
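The tolerance table above can be read as a simple lookup keyed on outside diameter. The following is a minimal sketch of such a lookup, not a supplier tool: the names `Band`, `BANDS` and `lookup_tolerance` are hypothetical, and the band boundaries are transcribed literally from the table, so diameters of exactly 3 1/2 or 5 1/2 inches (which the table leaves unassigned) return `None`.

```python
from typing import NamedTuple, Optional


class Band(NamedTuple):
    low: float             # lower OD bound, inches
    low_inclusive: bool    # True when the table includes the lower bound
    high: float            # upper OD bound, inches
    high_inclusive: bool   # True when the table includes the upper bound
    od_tol_in: float       # +/- OD tolerance, inches
    wt_tol_pct: float      # +/- wall-thickness tolerance, percent
    length_plus_in: float  # permitted over-length, inches (under-length is 0)


# Rows transcribed from the table; inclusivity follows the "<", ">" and "≤" signs.
BANDS = [
    Band(0.0, False, 0.5, True, 0.005, 15.0, 1 / 8),
    Band(0.5, False, 1.5, True, 0.005, 10.0, 1 / 8),
    Band(1.5, False, 3.5, False, 0.010, 10.0, 3 / 16),
    Band(3.5, False, 5.5, False, 0.015, 10.0, 3 / 16),
    Band(5.5, False, 8.0, False, 0.030, 10.0, 3 / 16),
    Band(8.0, True, 12.0, False, 0.040, 10.0, 3 / 16),
    Band(12.0, True, 14.0, False, 0.050, 10.0, 3 / 16),
]


def lookup_tolerance(od_in: float) -> Optional[Band]:
    """Return the tolerance band for an OD in inches, or None if the table has no row."""
    for band in BANDS:
        above = od_in > band.low or (band.low_inclusive and od_in == band.low)
        below = od_in < band.high or (band.high_inclusive and od_in == band.high)
        if above and below:
            return band
    return None
```

For example, `lookup_tolerance(1.0)` returns the second row (± 0.005 inch on OD, ± 10% on wall thickness). Returning `None` for uncovered diameters is a deliberate choice here, so that gaps in the table are surfaced rather than silently assigned to a neighbouring band.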
Natural microbial communities are phylogenetically and metabolically diverse. In addition to understudied groups of organisms1, this diversity holds rich potential for the discovery of ecologically and biotechnologically significant enzymes and biochemical compounds2,3. However, studying this diversity to identify the genomic pathways that synthesize such compounds and to assign them to their respective hosts remains a challenge. The biosynthetic potential of microorganisms in the open ocean remains largely unknown owing to limitations in the analysis of genome-resolved data on a global scale. Here, we explore the diversity and novelty of biosynthetic gene clusters in the ocean by integrating about 10,000 microbial genomes from cultured cells and single cells with more than 25,000 newly reconstructed draft genomes from over 1,000 seawater samples. These efforts identified about 40,000 putative, mostly new biosynthetic gene clusters, some of which were found in previously unsuspected phylogenetic groups. Among these groups, we identified a lineage enriched in biosynthetic gene clusters (‘Candidatus Eudoremicrobiaceae’) that belongs to an uncultivated bacterial phylum and includes some of the most biosynthetically diverse microorganisms in this environment. From these, we characterized the phospeptin and pythonamide pathways, identifying instances of unusual bioactive compound structure and enzymology, respectively. In conclusion, this study demonstrates how microbiomics-based strategies can enable the exploration of previously undescribed enzymes and natural products in poorly understood microbiota and environments.
Microbes drive global biogeochemical cycles, maintain food webs, and keep plants and animals healthy5. Their enormous phylogenetic, metabolic and functional diversity represents a rich potential for the discovery of new taxa1, enzymes and biochemical compounds, including natural products6. In ecological communities, these molecules provide microorganisms with a variety of physiological and ecological functions, from communication to competition2,7. Beyond their original functions, these natural products and their genetically encoded production pathways provide examples for biotechnological and therapeutic applications2,3. The identification of such pathways and compounds has been greatly facilitated by the study of cultured microbes. However, taxonomic surveys of natural environments have shown that the vast majority of microorganisms have not been cultivated8. This cultivation bias limits our ability to exploit the functional diversity encoded by many microbes4,9.
To overcome these limitations, technological advances over the past decade have allowed researchers to sequence microbial DNA fragments directly (that is, without prior culture) from entire communities (metagenomics) or from single cells. The ability to assemble these fragments into larger genome fragments and to reconstruct multiple metagenome-assembled genomes (MAGs) or single amplified genomes (SAGs), respectively, opens up an important opportunity for organism-centric studies of microbiomes (that is, microbial communities and their own genetic material in a given environment)10,11,12. Indeed, recent studies have greatly expanded the phylogenetic representation of microbial diversity on Earth1,13 and have revealed much of the functional diversity in individual microbial communities that was not previously covered by reference genome sequences of cultured microorganisms (REFs)14. The ability to place undiscovered functional diversity in the context of the host genome (that is, genome resolution) is critical for predicting as yet uncharacterized microbial lineages that presumably encode new natural products15,16 or for tracing such compounds back to their original producer17. For example, a combined metagenomic and single-cell genomic analysis approach led to the identification of Candidatus Entotheonella, a group of metabolically rich sponge-associated bacteria, as producers of a variety of drug candidates18. However, despite recent attempts at genome-resolved exploration of diverse microbial communities16,19, more than two-thirds of the global metagenomic data for the ocean, Earth’s largest ecosystem16,20, still lack genome-level representation. Thus, in general, the biosynthetic potential of the marine microbiome and its potential as a repository of novel enzymes and natural products remain largely understudied.
To explore the biosynthetic potential of marine microbiomes on a global scale, we first pooled marine microbial genomes obtained using culture-dependent and culture-independent methods to create an extensive database of phylogenetics and gene function. Examination of this database revealed a wide variety of biosynthetic gene clusters (BGCs), most of which belong to as yet uncharacterized gene cluster families (GCFs). In addition, we identified an unknown bacterial family that exhibits the highest known diversity of BGCs in the open ocean to date. We selected two ribosomally synthesized and post-translationally modified peptide (RiPP) pathways for experimental validation based on their genetic differences from currently known pathways. The functional characterization of these pathways revealed unexpected examples of enzymology as well as structurally unusual compounds with protease inhibitory activity.
We first aimed to create a global, genome-resolved data resource, focusing on the bacterial and archaeal components of the marine microbiome. To this end, we pooled metagenomic data from 1038 seawater samples collected at 215 globally distributed sampling sites (latitude range = 141.6°) and several depth layers (from 1 to 5600 m in depth, covering the epipelagic, mesopelagic and bathypelagic zones)21,22,23 (Fig. 1a, Extended Data Fig. 1a and Supplementary Table 1). In addition to providing wide geographic coverage, these selectively filtered samples allowed us to compare various components of the marine microbiome, including virus-enriched (<0.2 µm), prokaryote-enriched (0.2–3 µm), particle-enriched (0.8–20 µm) and virus-depleted (>0.2 µm) communities.
a, A total of 1038 publicly available metagenomes of marine microbial communities collected from 215 globally distributed locations (62°S to 79°N and 179°W to 179°E). Map tiles © Esri. Sources: GEBCO, NOAA, CHS, OSU, UNH, CSUMB, National Geographic, DeLorme, NAVTEQ, and Esri. b, These metagenomes were used to reconstruct MAGs (Methods and Supplementary Information), which differ in quantity and quality (Methods) across the datasets (marked in colour). The reconstructed MAGs were supplemented with publicly available (external) genomes, including manually curated MAGs26, SAGs27 and REFs27, to compile the OMD. c, Compared with previous reports based only on SAGs (GORG)20 or MAGs (GEM)16, the OMD improves the genomic characterization of marine microbial communities (metagenomic read mapping rate; Methods) by two to three times, with a more consistent representation across depth and latitude. <0.2, n = 151; 0.2–0.8, n = 67; 0.2–3, n = 180; 0.8–20, n = 30; >0.2, n = 610; <30°, n = 132; 30–60°, n = 73; >60°, n = 42; EPI, n = 174; MES, n = 45; BAT, n = 28. d, Grouping the OMD into species-level clusters (95% average nucleotide identity) identifies a total of approximately 8300 species, more than half of which have not previously been characterized according to taxonomic annotations using the GTDB (version 89). e, Classification of species by genome type shows that MAGs, SAGs and REFs complement each other well in reflecting the phylogenetic diversity of the marine microbiome. In particular, 55%, 26% and 11% of the species were specific to MAGs, SAGs and REFs, respectively. BATS, Bermuda Atlantic Time Series; GEM, Genomes from Earth’s Microbiomes; GORG, Global Ocean Reference Genomes; HOT, Hawaiian Ocean Time-series.
Using this dataset, we reconstructed a total of 26,293 MAGs, mostly bacterial and archaeal (Fig. 1b and Extended Data Fig. 1b). We created these MAGs from assemblies of separate rather than pooled metagenomic samples to prevent the collapse of natural sequence variation between samples from different locations or time points (Methods). In addition, we grouped genomic fragments based on their abundance correlations across a large number of samples (from 58 to 610 samples, depending on the survey; Methods). We found that this is a time-consuming but important step24 that was skipped in several large-scale MAG reconstruction efforts16,19,25 and that significantly improves the quantity (2.7-fold on average) and quality (+20% on average) of the genomes reconstructed from the marine metagenomes studied here (Extended Data Fig. 2a and Supplementary Information). Overall, these efforts resulted in a 4.5-fold increase in marine microbial MAGs (6-fold if only high-quality MAGs are considered) compared with the most comprehensive MAG resource available today16 (Methods). This newly created MAG set was then combined with 830 manually curated MAGs26, 5969 SAGs27 and 1707 REFs27 of marine bacteria and archaea, yielding a combined collection of 34,799 genomes (Fig. 1b).
We then evaluated the newly created resource for its improved ability to represent marine microbial communities and assessed the impact of integrating different genome types. On average, we found that it covers approximately 40–60% of marine metagenomic data (Fig. 1c), two to three times the coverage of previous reports based only on MAGs16 or SAGs20, with a more consistent representation across depth and latitude. In addition, to systematically measure taxonomic diversity in the established collection, we annotated all genomes using the Genome Taxonomy Database (GTDB) toolkit (Methods) and used a whole-genome average nucleotide identity threshold of 95%28 to identify 8,304 species-level clusters (species). Two-thirds of these species (including new clades) had not previously appeared in the GTDB, of which 2790 were discovered using the MAGs reconstructed in this study (Fig. 1d). We also found that different types of genomes are highly complementary: 55%, 26% and 11% of species are composed entirely of MAGs, SAGs and REFs, respectively (Fig. 1e). Moreover, MAGs covered all 49 phyla found in the water column, whereas SAGs and REFs represented only 18 and 11 of them, respectively. However, SAGs better represent the diversity of the most common clades (Extended Data Fig. 3a), such as Pelagibacterales (SAR11), with SAGs covering almost 1300 species and MAGs only 390 species. Notably, REFs rarely overlapped with MAGs or SAGs at the species level and represented >95% of the approximately 1000 genomes not found in the open ocean metagenomic sets studied here, mainly because they represent isolates from other types of marine samples (for example, sediments or host-associated). To make it widely available to the scientific community, this marine genome resource, which also includes unbinned fragments (for example, from predicted phages, genomic islands and genome fragments for which there are insufficient data for MAG reconstruction), can be accessed along with taxonomic and gene functional annotations and contextual parameters in the Ocean Microbiomics Database (OMD; https://microbiomics.io/ocean/).
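To make the 95% average nucleotide identity criterion concrete, the sketch below groups genomes into species-level clusters from a table of pairwise ANI values. It is an illustration under stated assumptions rather than the procedure used in the study: the function name `cluster_by_ani`, the single-linkage (union-find) grouping and the toy ANI values are all hypothetical.

```python
from itertools import combinations


def cluster_by_ani(genomes, ani, threshold=95.0):
    """Group genomes whose pairwise ANI (percent) meets the threshold.

    `ani` maps frozenset({a, b}) to an ANI value in percent; pairs that are
    missing from the mapping are treated as unrelated. Single linkage via
    union-find is an illustrative assumption, not the study's stated method.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Walk to the cluster representative, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if ani.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy input: one genome of each type with made-up ANI values.
toy_ani = {
    frozenset(("MAG_1", "SAG_1")): 97.2,
    frozenset(("MAG_1", "REF_1")): 82.0,
    frozenset(("SAG_1", "REF_1")): 81.5,
}
species = cluster_by_ani(["MAG_1", "SAG_1", "REF_1"], toy_ani)
print(species)
```

In this toy case the MAG and the SAG fall into one species cluster and the reference genome forms its own, which mirrors how a cluster can be classified as mixed or as specific to one genome type.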
We then set out to explore the richness and novelty of the biosynthetic potential in open ocean microbiomes. To this end, we first ran antiSMASH on all MAGs, SAGs and REFs found in the 1038 marine metagenomes (Methods) to predict a total of 39,055 BGCs. We then grouped these into 6907 non-redundant GCFs and 151 gene cluster clans (GCCs; Supplementary Table 2 and Methods) to account for inherent redundancy (that is, the same BGC can be encoded in multiple genomes) and for the fragmentation of BGCs in metagenomic data. Incomplete BGCs inflated the number of GCFs and GCCs little, if at all (Supplementary Information): 44% of GCFs and 86% of GCCs contained at least one complete BGC member.
At the GCC level, we found a wide variety of predicted RiPPs and other natural products (Fig. 2a). Among them, for example, arylpolyenes, carotenoids, ectoines and siderophores belong to GCCs with a wide phylogenetic distribution and a high abundance in oceanic metagenomes, which may indicate a broad adaptation of microorganisms to the marine environment, including resistance to reactive oxygen species, oxidative and osmotic stress, or iron uptake (Supplementary Information). This functional diversity contrasts with a recent analysis of approximately 1.2 million BGCs among approximately 190,000 genomes stored in the NCBI RefSeq database (BiG-FAM/RefSeq, hereinafter referred to as RefSeq)29, in which nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) BGCs predominate (Supplementary Information). We also found 44 (29%) GCCs only distantly related to any RefSeq BGC (\(\bar{d}\)RefSeq > 0.4; Fig. 2a and Methods) and 53 (35%) GCCs present only in MAGs, highlighting the potential to detect previously undescribed chemistry in the OMD. Given that each of these GCCs probably represents highly diverse biosynthetic functions, we further analysed the data at the GCF level to provide a more detailed grouping of BGCs predicted to encode similar natural products29. A total of 3861 (56%) identified GCFs did not overlap with RefSeq, and >97% of GCFs were not present in MIBiG, one of the largest databases of experimentally validated BGCs (Fig. 2b). Although it is not surprising to discover many potentially novel pathways in settings that are not well represented by reference genomes, our method of dereplicating BGCs into GCFs before benchmarking differs from previous reports16 and allows us to provide an unbiased assessment of novelty. Most of the new diversity (3012 GCFs or 78%) corresponds to predicted terpenes, RiPPs or other natural products, and much of it (1815 GCFs or 47%) is encoded in phyla not known for their biosynthetic potential. Unlike PKS and NRPS clusters, these compact BGCs are less likely to be fragmented during metagenomic assembly31, which makes the time- and resource-intensive functional characterization of their products more tractable.
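The novelty assessment above rests on cosine distances between BGC representations and on fixed cut-offs (0.2 and 0.4 appear in the text and legends). The sketch below shows the distance calculation, a threshold test and a check of the quoted percentages from the reported counts. The function names and the feature vectors in the usage line are hypothetical; how BGCs are converted to feature vectors is not described in this section.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_distant(distance, threshold=0.2):
    """True when a distance exceeds the chosen novelty cut-off."""
    return distance > threshold


def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


# Counts quoted in the text.
TOTAL_GCFS, NEW_GCFS = 6907, 3861
TOTAL_GCCS = 151

print(percent(NEW_GCFS, TOTAL_GCFS))   # GCFs without overlap with RefSeq
print(percent(3012, NEW_GCFS))         # new GCFs predicted as terpenes, RiPPs or other
print(percent(1815, NEW_GCFS))         # new GCFs in phyla not known for biosynthesis
print(percent(44, TOTAL_GCCS))         # GCCs only distantly related to RefSeq
print(percent(53, TOTAL_GCCS))         # GCCs found only in MAGs
```

As a usage example, `is_distant(cosine_distance([1.0, 0.0], [0.0, 1.0]))` is `True` because orthogonal vectors have a cosine distance of 1, whereas identical vectors have a distance of 0.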
A total of 39,055 BGCs were grouped into 6,907 GCFs and 151 GCCs. a, Data representation (from inside to outside). Hierarchical clustering of GCCs based on BGC distances; 53 GCCs are captured by MAGs only. The GCCs contain BGCs from different taxa (ln-transformed phylum frequency) and different BGC classes (circle size corresponds to frequency). For each GCC, the outer layers represent the number of BGCs, the prevalence (percentage of samples) and the distance (minimum BGC cosine distance, min(dMIBiG)) to BGCs from BiG-FAM. GCCs with BGCs closely related to experimentally verified BGCs (MIBiG) are highlighted with arrows. b, Comparing GCFs with predicted (BiG-FAM) and experimentally validated (MIBiG) BGCs revealed 3861 new (\(\bar{d}\) > 0.2) GCFs. Most (78%) of these encode RiPPs, terpenes and other putative natural products. c, All genomes in the OMD found in the 1038 marine metagenomes were placed in the GTDB backbone tree to show the phylogenetic coverage of the OMD. Clades without any genomes in the OMD are shown in grey. The number of BGCs corresponds to the largest number of predicted BGCs per genome in a given clade. For clarity, the last 15% of the nodes are collapsed. Arrows indicate clades rich in BGCs (>15 BGCs), with the exception of Mycobacterium, Gordonia (second only to Rhodococcus) and Crocosphaera (second only to Synechococcus). d, An unknown ‘Ca. Eremiobacterota’ species showed the highest biosynthetic diversity (Shannon index based on natural product type). Each bar represents the genome with the most BGCs in the species. T1PKS, PKS type I; T2/3PKS, PKS type II and type III.
In addition to richness and novelty, we explored the biogeographic structure of the biosynthetic potential of the marine microbiome. Grouping samples by their average metagenomic GCF copy number distribution (Methods) showed that low-latitude, prokaryote-enriched and virus-depleted communities, mostly from surface or deeper sunlit waters, were rich in RiPP and terpene BGCs. In contrast, polar, deep-sea, virus- and particle-enriched communities were associated with higher abundances of NRPS and PKS BGCs (Extended Data Fig. 4 and Supplementary Information). Finally, we found that well-studied tropical and epipelagic communities are the most promising sources of new terpenes (Extended Data Fig.). The highest potential for PKS, RiPP and other natural products is shown in Extended Data Fig. 5a.
To complement our study of the biosynthetic potential of marine microbiomes, we aimed to map their phylogenetic distribution and identify new BGC-enriched clades. To this end, we placed the genomes of marine microbes into a normalized GTDB13 bacterial and archaeal phylogenetic tree and overlaid the putative biosynthetic pathways they encode (Fig. 2c). We readily detected several BGC-enriched clades (represented by over 15 BGCs) in seawater samples (Methods) that are either known for their biosynthetic potential, such as cyanobacteria (Synechococcus) and Proteobacteria such as Tistrella32,33, or have recently attracted attention for their natural products, such as Myxococcota (Sandaracinaceae), Rhodococcus and Planctomycetota34,35,36. Interestingly, we found several previously unexplored lineages in these clades. For example, the species with the richest biosynthetic potential in the phyla Planctomycetota and Myxococcota belonged to uncharacterized candidate orders and genera, respectively (Supplementary Table 3). Taken together, this suggests that the OMD provides access to previously unknown phylogenetic diversity, including microorganisms that may represent new targets for enzyme and natural product discovery.
Next, we characterized the BGC-enriched clades not only by counting the maximum number of BGCs encoded by their members, but also by assessing the diversity of these BGCs, which accounts for the frequency of different types of candidate natural products (Fig. 2c and Methods). We found that the most biosynthetically diverse species were represented by bacterial MAGs specifically reconstructed in this study. These bacteria belong to the uncultivated phylum Candidatus Eremiobacterota, which remains largely unexplored apart from a few genomic studies37,38. Notably, ‘Ca. Eremiobacterota’ has been analysed only in terrestrial environments39 and is not known to include any members enriched in BGCs. Here we reconstructed eight MAGs of the same species (nucleotide identity > 99%)23. We therefore propose the species name ‘Candidatus Eudoremicrobium malaspinii’, named after the nereid (sea nymph) of fine gifts in Greek mythology and after the expedition. According to phylogenetic annotation13, ‘Ca. E. malaspinii’ has no previously known relatives below the order level and thus belongs to a new bacterial family, for which we propose ‘Ca. E. malaspinii’ as the type species and ‘Ca. Eudoremicrobiaceae’ as the official name (Supplementary Information). The short-read metagenomic reconstruction of the ‘Ca. E. malaspinii’ genome was validated by very-low-input, long-read metagenomic sequencing and targeted assembly of a single sample (Methods) as a single 9.63 Mb linear chromosome, with a 75 kb duplication as the only remaining ambiguity.
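Biosynthetic diversity that accounts for the frequency of natural product types can be summarized with a Shannon index over the BGC types of a genome. The following is a minimal sketch: the function name `shannon_index` and the example type labels are hypothetical, and the use of the natural logarithm is an assumption, since the logarithm base is not stated in this section.

```python
import math
from collections import Counter


def shannon_index(product_types):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over natural product types.

    `product_types` holds one label per predicted BGC in a genome.
    """
    counts = Counter(product_types)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical genomes: same number of BGCs, different evenness of types.
specialist = ["terpene"] * 8
generalist = ["terpene", "RiPP", "NRPS", "T1PKS"] * 2

print(shannon_index(specialist))
print(shannon_index(generalist))
```

A genome whose BGCs all belong to one type scores 0, whereas one with BGCs spread evenly over four types scores ln 4, so two genomes with the same BGC count can differ markedly in diversity.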
To establish the phylogenetic context of this species, we searched for closely related species in additional eukaryote-enriched metagenomic samples from the Tara Oceans expedition40 through targeted genome reconstruction. Briefly, we mapped metagenomic reads to genomic fragments associated with ‘Ca. E. malaspinii’ and hypothesized that an increased recruitment rate in a sample indicates the presence of other relatives (Methods). As a result, we found 10 additional MAGs, giving a combined 19 MAGs that represent five species in three genera within the newly defined family (that is, ‘Ca. Eudoremicrobiaceae’). After manual inspection and quality control (Extended Data Fig. 6 and Supplementary Information), we found that ‘Ca. Eudoremicrobiaceae’ species have larger genomes (8 Mb) and a richer biosynthetic potential (14 to 22 BGCs per species) than other members of the ‘Ca. Eremiobacterota’ clade (up to 7 BGCs) (Fig. 3a–c).
a, The phylogenetic position of the five ‘Ca. Eudoremicrobiaceae’ species shows BGC richness specific to the marine lineages identified in this study. The phylogenetic tree includes all ‘Ca. Eremiobacterota’ MAGs; members of other phyla (genome numbers in brackets) provided in the GTDB (version 89) were used for evolutionary context (Methods). The outermost layers represent classifications at the family level (‘Ca. Eudoremicrobiaceae’ and ‘Ca. Xenobiaceae’) and at the class level (‘Ca. Eremiobacteria’). The five species described in this study are represented by alphanumeric codes and proposed binomial names (Supplementary Information). b, ‘Ca. Eudoremicrobiaceae’ species share a core of seven BGCs. The absence of a BGC in the A2 clade was due to the incompleteness of the representative MAG (Supplementary Table 3). BGCs specific to ‘Ca. Amphithomicrobium’ (clades A and B) are not shown. c, All BGCs encoded by ‘Ca. Eudoremicrobium taraoceanii’ were found to be expressed across the 623 metatranscriptomes from Tara Oceans. Solid circles indicate active transcription. Orange circles denote log2-transformed fold changes below and above the housekeeping gene expression rate (Methods). d, Relative abundance profiles (Methods) showing that ‘Ca. Eudoremicrobiaceae’ species are widespread in most ocean basins and throughout the water column (from the surface to a depth of at least 4000 m). Based on these estimates, we found that ‘Ca. E. malaspinii’ accounts for up to 6% of prokaryotic cells in deep-sea pelagic particle-associated communities. We considered a species to be present at a site if it was found in any size fraction of a given depth layer. IO, Indian Ocean; NAO, North Atlantic; NPO, North Pacific; RS, Red Sea; SAO, South Atlantic; SO, Southern Ocean; SPO, South Pacific.
We then studied the abundance and distribution of ‘Ca. Eudoremicrobiaceae’, which we found to be prevalent in most ocean basins and throughout the water column (Fig. 3d). Locally, they make up as much as 6% of the marine microbial community, making them an important part of the global marine microbiome. In addition, we found that the relative abundance of ‘Ca. Eudoremicrobiaceae’ species and the expression levels of their BGCs were highest in the eukaryote-enriched fraction (Fig. 3c and Extended Data Fig. 7), indicating a possible interaction with particulate matter, including plankton. This observation, together with the similarity of some ‘Ca. Eudoremicrobium’ BGCs to known pathways that produce cytotoxic natural products, suggests possible predatory behaviour (Supplementary Information and Extended Data Fig. 8), similar to other predators that produce specialized metabolites, such as Myxococcus41. The discovery of ‘Ca. Eudoremicrobiaceae’ in less accessible (deep ocean) or eukaryote-enriched rather than prokaryote-enriched samples may explain why these bacteria and their unexpected BGC diversity have remained unnoticed in the context of natural product research.
Ultimately, we sought to experimentally validate the promise of our microbiomics-based work in discovering new pathways, enzymes and natural products. Among the different classes of BGCs, RiPP pathways are known to encode a rich chemical and functional diversity owing to various post-translational modifications of the core peptide by maturase enzymes42. We therefore selected two ‘Ca. Eudoremicrobium’ RiPP BGCs (Figs. 3b and 4a–e) on the basis of their low similarity to any known BGC (\(\bar{d}\)MIBiG and \(\bar{d}\)RefSeq above 0.2).
a–c, In vivo heterologous expression and in vitro enzymatic assays of a novel (\(\bar{d}\)RefSeq = 0.29) RiPP biosynthetic cluster specific to the deep-sea species ‘Ca. E. malaspinii’ led to the production of diphosphorylated products. c, Modifications identified using high-resolution (HR) MS/MS (fragmentation indicated by b and y ions in the chemical structure) and NMR (Extended Data Fig. 9). d, This phosphorylated peptide exhibits low-micromolar inhibition of mammalian neutrophil elastase, which is not observed for the control peptide or the dehydrated peptide (dehydration induced by chemical elimination). The experiment was repeated three times with similar results. e–g, Heterologous expression of a second novel (\(\bar{d}\)RefSeq = 0.33) proteusin biosynthetic cluster elucidates the function of four maturase enzymes that modify the 46-amino-acid core peptide. Residues are coloured according to the site of modification predicted by HR-MS/MS, isotope labelling and NMR analysis (Supplementary Information). Dashed colouring indicates that the modification occurs at either of two residues. The figure is a compilation of numerous heterologous constructs to show the activity of all maturase enzymes on the same core. h, Illustration of NMR data for backbone amide N-methylation. Full results are shown in Extended Data Fig. 10. i, The phylogenetic position of the FkbM maturase enzyme of the proteusin cluster among all FkbM domains found in the MIBiG 2.0 database reveals an enzyme of this family with N-methyltransferase activity (Supplementary Information). Schematic diagrams of BGCs (a, e), precursor peptide structures (b, f) and putative chemical structures of natural products (c, g) are shown.
The first RiPP pathway (\(\bar{d}\)MIBiG = 0.41, \(\bar{d}\)RefSeq = 0.29) was found only in the deep-sea species ‘Ca. E. malaspinii’ and encodes a precursor peptide (Fig. 4a, b). In its maturase enzyme, we identified a single functional domain homologous to the dehydration domain of lanthipeptide synthetases, which normally catalyses phosphorylation and subsequent elimination43 (Supplementary Information). We therefore predicted that the modification of the precursor peptide would involve such a two-step dehydration. However, using tandem mass spectrometry (MS/MS) and nuclear magnetic resonance (NMR) spectroscopy, we identified a polyphosphorylated linear peptide (Fig. 4c). Although unexpected, several lines of evidence support its being the end product: the absence of dehydration in two different heterologous hosts and in in vitro assays; the identification of mutated key residues in the catalytic elimination site of the maturase enzyme in all reconstructed ‘Ca. E. malaspinii’ genomes (Extended Data Fig. 9 and Supplementary Information); and, finally, the biological activity of the phosphorylated product but not of the chemically synthesized dehydrated form (Fig. 4d). In fact, we found that it exhibits low-micromolar protease inhibitory activity against neutrophil elastase (IC50 = 14.3 μM), in a concentration range comparable to that of other related natural products44, although its ecological role remains to be elucidated. Based on these results, we propose to name the pathway ‘phospeptin’.
The second case is a complex RiPP pathway specific to the genus ‘Ca. Eudoremicrobium’ (\(\bar{d}\)MIBiG = 0.46, \(\bar{d}\)RefSeq = 0.33) that was predicted to encode a proteusin natural product (Fig. 4e). These pathways are of particular biotechnological interest because of the expected density and variety of unusual chemical modifications installed by the enzymes encoded in relatively short BGCs45. We found that this proteusin differs from previously characterized proteusins in that it lacks both the main NX5N motif of polytheonamides and the lanthionine loop of landornamides46. To overcome the limitations of common heterologous expression systems, we used them along with a custom Microvirgula aerodenitrificans system to characterize the four maturase enzymes of the pathway (Methods). Using a combination of MS/MS, isotope labelling and NMR, we determined the modifications that these maturase enzymes introduce in the 46-amino-acid core peptide (Fig. 4f, g, Extended Data Figs. 10–12 and Supplementary Information). Among the maturase enzymes, we characterized the first occurrence of a member of the FkbM O-methyltransferase family47 in a RiPP pathway and unexpectedly found that this enzyme introduces backbone N-methylation (Fig. 4h, i and Supplementary Information). Although this modification is known in NRP natural products48, enzymatic N-methylation of amide bonds is a complex but biotechnologically significant reaction49 that has so far been specific to the borosin family of RiPPs50,51. The identification of this activity in other families of enzymes and RiPPs may open up new applications and expand the functional diversity of proteusins52 and their chemical diversity. Based on the identified modifications and the unusual length of the proposed product structure, we propose the pathway name ‘pythonamide’.
The discovery of unexpected enzymology in a functionally characterized family of enzymes illustrates the promise of environmental genomics for new discoveries, and also illustrates the limited capacity for functional inference based on sequence homology alone. Thus, together with the report of a non-canonical bioactive polyphosphorylated RiPP, our results demonstrate the resource-intensive but critical value of synthetic biology efforts for fully uncovering the functional richness, diversity and unusual structures of biochemical compounds.
Here we demonstrate the range of biosynthetic potential encoded by microbes, and its genomic context, in the global marine microbiome, and we facilitate future research by making the resulting resource available to the scientific community (https://microbiomics.io/ocean/). We found that much of its phylogenetic and functional novelty can only be obtained by reconstructing MAGs and SAGs, especially from underexplored microbial communities, which could guide future bioprospecting efforts. Although we focus here on ‘Ca. Eudoremicrobiaceae’ as a lineage that is especially biosynthetically ‘talented’, many of the BGCs predicted in the undiscovered microbiota likely encode previously undescribed enzymologies that yield compounds with environmentally and/or biotechnologically significant activities.
Metagenomic datasets from major oceanographic surveys and time-series studies with sufficient sequencing depth were included to maximize coverage of global marine microbial communities across ocean basins, depth layers and time. These datasets (Supplementary Table 1 and Fig. 1) include metagenomes from samples collected by Tara Oceans (virus-enriched, n = 190; prokaryote-enriched, n = 180)12,22, the BioGEOTRACES expedition (n = 480), the Hawaiian Ocean Time-series (HOT, n = 68), the Bermuda-Atlantic Time-series Study (BATS, n = 62)21 and the Malaspina expedition (n = 58)23. Sequencing reads from all metagenomes were quality filtered using BBMap (v.38.71) by removing sequencing adapters from the reads, removing reads that mapped to quality control sequences (PhiX genome) and discarding low-quality reads with the parameters trimq = 14, maq = 20, maxns = 0 and minlength = 45. Subsequent analyses were run on quality-controlled (QC) reads or, if specified, on merged QC reads (bbmerge.sh minoverlap = 16). QC reads were normalized (bbnorm.sh target = 40, mindepth = 0) before assembly with metaSPAdes (v.3.11.1 or v.3.12 if needed)53. The resulting scaffolded contigs (hereinafter referred to as scaffolds) were finally filtered by length (≥1 kb).
The 1,038 metagenomic samples were divided into groups and, for each group of samples, the QC metagenomic reads of all samples were mapped separately against the scaffolds of each sample, resulting in the following numbers of pairwise read-set to scaffold mappings: Tara Oceans virus-enriched (190 × 190), prokaryote-enriched (180 × 180), BioGEOTRACES, HOT and BATS (610 × 610) and Malaspina (58 × 58). Mapping was done using the Burrows-Wheeler Aligner (BWA) (v.0.7.17-r1188)54, allowing reads to map at secondary sites (using the -a flag). Alignments were filtered to be at least 45 bases long, to have ≥97% identity and to span ≥80% of the read. The resulting BAM files were processed using the jgi_summarize_bam_contig_depths script of MetaBAT2 (v.2.12.1)55 to provide within- and between-sample coverage for each group. Finally, the scaffolds were binned by running MetaBAT2 on all samples individually with --minContig 2000 and --maxEdges 500 to increase sensitivity. We used MetaBAT2 instead of an ensemble binner because it has been shown in independent tests to be the most effective single binner and to be 10 to 50 times faster than other commonly used binners57. To test for the effect of abundance correlations, a randomly selected subsample of metagenomes (10 for each of the two Tara Oceans datasets, 10 for BioGEOTRACES, 5 for each time series and 5 for Malaspina) was additionally binned using only within-sample coverage information (Supplementary Information).
Additional (external) genomes were included in the subsequent analysis, namely 830 manually curated MAGs from a subset of the Tara Oceans dataset26, 5,287 SAGs from the GORG dataset20 and data from the MAR databases (MarDB v.4; 1,707 isolate REFs and 682 SAGs)27. For the MarDB dataset, genomes were selected based on the available metadata if the sample type matched the following regular expression: ‘[S|s]ingle.?[C|c]ell|[C|c]ulture|[I|i]solat’.
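The metadata filter can be illustrated with a minimal sketch. It assumes that the last alternative of the pattern reads `[I|i]solat`, and the sample-type strings and the helper name `keep_genome` are hypothetical, not taken from MarDB.

```python
import re

# Pattern as quoted in the text; the final alternative "[I|i]solat" is an assumption.
# Inside a character class "|" is a literal character, so "[S|s]" also matches "|",
# which is harmless for this kind of metadata.
SAMPLE_TYPE_PATTERN = re.compile(r"[S|s]ingle.?[C|c]ell|[C|c]ulture|[I|i]solat")


def keep_genome(sample_type: str) -> bool:
    """Return True if the sample-type metadata matches the selection pattern."""
    return SAMPLE_TYPE_PATTERN.search(sample_type) is not None


# Hypothetical metadata values, for illustration only.
examples = ["Single cell", "single-cell amplified genome", "Culture", "Isolate", "Metagenome"]
selected = [s for s in examples if keep_genome(s)]
```

Because `search` is used, the pattern may match anywhere in the metadata string, so "Isolate" and "isolated strain" would both be kept.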
The quality of each metagenomic bin and external genome was assessed using CheckM (v.1.0.13, lineage workflow) and Anvi’o (v.5.5.0)58,59. Metagenomic bins and external genomes were retained for later analysis if either CheckM or Anvi’o reported ≥50% completeness/completion and ≤10% contamination/redundancy. These scores were then combined into mean completeness (mcpl) and mean contamination (mctn) to classify genome quality according to community criteria60 as follows: high quality: mcpl ≥ 90% and mctn ≤ 5%; good quality: mcpl ≥ 70% and mctn ≤ 10%; medium quality: mcpl ≥ 50% and mctn ≤ 10%; fair quality: mcpl ≤ 90% or mctn ≥ 10%. The filtered genomes were then associated with quality scores (Q and Q′) as follows: Q = mcpl − 5 × mctn and Q′ = mcpl − 5 × mctn + mctn × (strain heterogeneity)/100 + 0.5 × log[N50] (as implemented in dRep61).
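As a rough illustration of the quality tiers and scores, the sketch below encodes the thresholds and formulas given above. Two choices are assumptions: the tiers are tested from strictest to loosest so that each genome receives a single label, and the logarithm of N50 is taken as base 10. The function names are hypothetical.

```python
import math


def quality_class(mcpl: float, mctn: float) -> str:
    """Classify a genome from mean completeness and mean contamination (both in %)."""
    # Assumption: tiers are checked from strictest to loosest.
    if mcpl >= 90 and mctn <= 5:
        return "high"
    if mcpl >= 70 and mctn <= 10:
        return "good"
    if mcpl >= 50 and mctn <= 10:
        return "medium"
    return "fair"


def q_score(mcpl: float, mctn: float) -> float:
    """Q = mcpl - 5 x mctn."""
    return mcpl - 5 * mctn


def q_prime(mcpl: float, mctn: float, strain_het: float, n50: int) -> float:
    """Q' = mcpl - 5 x mctn + mctn x strain_het / 100 + 0.5 x log10(N50)."""
    # Assumption: base-10 logarithm.
    return mcpl - 5 * mctn + mctn * strain_het / 100 + 0.5 * math.log10(n50)
```

For example, a genome with mcpl = 90, mctn = 2, strain heterogeneity = 50 and N50 = 10,000 bp has Q = 80 and, under these assumptions, Q′ = 83.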
To allow comparative analysis between different data sources and genome types (MAG, SAG and REF), the 34,799 genomes were dereplicated based on genome-wide average nucleotide identity (ANI) using dRep (v.2.5.4)61 with a 95% ANI threshold28,62 (-comp 0 -con 1000 -sa 0.95 -nc 0.2) and on single-copy marker genes using SpecI63, providing genome clustering at the species level. A representative genome was selected for each dRep cluster according to the maximum quality score (Q′) defined above and was considered representative of the species.
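Selecting one representative per cluster by maximum Q′ could look like the following sketch. It does not call dRep; the input tuples and the function name are hypothetical, and ties are resolved by keeping the first genome seen, which is an assumption.

```python
def pick_representatives(genomes):
    """Pick the genome with the highest Q' in each species-level cluster.

    genomes: iterable of (genome_id, cluster_id, q_prime) tuples.
    Returns a dict mapping cluster_id to the representative genome_id.
    """
    best = {}
    for genome_id, cluster_id, score in genomes:
        # Keep the first genome seen when scores tie (assumption).
        if cluster_id not in best or score > best[cluster_id][1]:
            best[cluster_id] = (genome_id, score)
    return {cluster: genome_id for cluster, (genome_id, _) in best.items()}


# Hypothetical genomes and scores, for illustration only.
reps = pick_representatives([
    ("MAG_1", "c1", 81.5),
    ("SAG_7", "c1", 62.0),
    ("REF_3", "c2", 97.2),
    ("MAG_9", "c2", 88.0),
])
```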
To evaluate the mapping rate, BWA (v.0.7.17-r1188, -a) was used to map all 1,038 sets of metagenomic reads against the 34,799 genomes contained in the OMD. QC reads were mapped in single-end mode and the resulting alignments were filtered to retain only those ≥45 bp in length and with ≥95% identity. The mapping rate for each sample is the number of reads remaining after filtering divided by the total number of QC reads, expressed as a percentage. Using the same approach, each of the 1,038 metagenomes was downsampled to 5 million inserts (Extended Data Fig. 1c) and mapped to the GORG SAGs in the OMD and to all of the GEM catalogue16. The number of MAGs recovered from seawater in the GEM catalogue16 was determined by keyword queries of the metagenome sources, selecting seawater samples (for example, as opposed to marine sediments). Specifically, we selected “aquatic” as “ecosystem_category”, “marine” as “ecosystem_type”, and filtered “habitat” as “deep ocean”, “marine”, “maritime oceanic”, “pelagic marine”, “marine water”, “Ocean”, “Sea Water”, “Surface Sea Water”, “Surface Sea Water”. This resulted in 5,903 MAGs (734 high quality) distributed over 1,823 OTUs (here, species).
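The mapping-rate calculation can be sketched as below. The alignment records are hypothetical tuples rather than parsed BAM entries, and counting each read at most once, even when it has several passing alignments from the -a flag, is an assumption.

```python
def passes_filter(aligned_length: int, identity: float) -> bool:
    """Keep alignments of at least 45 bp with at least 95% identity."""
    return aligned_length >= 45 and identity >= 0.95


def mapping_rate(alignments, total_qc_reads: int) -> float:
    """Percentage of QC reads with at least one alignment passing the filter.

    alignments: iterable of (read_id, aligned_length, identity) tuples.
    """
    # A set ensures that reads with secondary alignments are counted once (assumption).
    mapped_reads = {read_id for read_id, length, ident in alignments
                    if passes_filter(length, ident)}
    return 100.0 * len(mapped_reads) / total_qc_reads


# Hypothetical alignments: r1 has two passing hits, r2 is too short, r3 too divergent.
rate = mapping_rate(
    [("r1", 100, 0.99), ("r1", 80, 0.96), ("r2", 40, 0.99),
     ("r3", 60, 0.90), ("r4", 45, 0.95)],
    total_qc_reads=8,
)
```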
Prokaryotic genomes were taxonomically annotated using GTDB-Tk (v.1.0.2)64 with default parameters against the GTDB r89 release13. Anvi’o was used to identify eukaryotic genomes based on domain prediction, with completion ≥50% and redundancy ≤10%. The taxonomic annotation of a species is defined as that of its representative genome. With the exception of eukaryotes (148 MAGs), each genome was first functionally annotated using prokka (v.1.14.5)65, calling complete genes and setting the ‘archaea’ or ‘bacteria’ parameter as needed; prokka also reports non-coding genes and CRISPR regions, among other genomic features. Predicted genes were annotated by identifying universal single-copy marker genes (uscMGs) using fetchMG (v.1.2)66, by assigning orthologous groups using emapper (v.2.0.1)67 based on eggNOG (v.5.0)68 and by querying the KEGG database (published 10 February 2020)69. The last step was performed by aligning proteins to the KEGG database using DIAMOND (v.0.9.30)70 with a query and subject coverage of ≥70%. Results were further filtered according to the NCBI Prokaryotic Genome Annotation Pipeline71, requiring a bitscore ≥50% of the maximum expected bitscore (that of the sequence aligned to itself). Gene sequences were also used as input to identify BGCs in the genomes using antiSMASH (v.5.1.0)72 with default parameters and the different cluster blasts. All genomes and annotations have been compiled into the OMD along with contextual metadata, available on the web (https://microbiomics.io/ocean/).
Similar to previously described methods12,22, we used CD-HIT (v.4.8.1) to cluster the >56.6 million protein-coding genes from the bacterial and archaeal genomes of the OMD at 95% identity and 90% coverage of the shorter gene73 into >17.7 million gene clusters. The longest sequence was chosen as the representative gene for each gene cluster. The 1,038 metagenomes were then mapped to the >17.7 million cluster representatives with BWA (-a) and the resulting BAM files were filtered to retain only alignments with ≥95% identity and ≥45 aligned bases. Length-normalized gene abundance was calculated by first counting inserts from the best unique alignments and then, for ambiguously mapped inserts, adding fractional counts to the corresponding target genes in proportion to their unique insert counts.
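The distribution of ambiguously mapped inserts described above can be sketched as follows. The equal split used when none of the candidate genes has unique inserts, and the division by gene length in base pairs, are assumptions; the function name and inputs are hypothetical.

```python
def gene_abundance(unique_counts, ambiguous_inserts, gene_lengths):
    """Length-normalized gene abundance with fractional counts for ambiguous inserts.

    unique_counts: dict gene -> number of uniquely mapped inserts.
    ambiguous_inserts: list of tuples, each holding the candidate genes of one insert.
    gene_lengths: dict gene -> gene length in bp.
    """
    counts = {gene: float(n) for gene, n in unique_counts.items()}
    for targets in ambiguous_inserts:
        # Weights come from the unique counts only, not from the running totals.
        weights = [unique_counts.get(gene, 0) for gene in targets]
        total = sum(weights)
        for gene, weight in zip(targets, weights):
            # Assumption: split equally when no candidate has unique inserts.
            share = weight / total if total > 0 else 1.0 / len(targets)
            counts[gene] = counts.get(gene, 0.0) + share
    return {gene: counts[gene] / gene_lengths[gene] for gene in counts}


# Gene A has 3 unique inserts and gene B has 1; one insert maps to both.
abundance = gene_abundance({"A": 3, "B": 1}, [("A", "B")], {"A": 1000, "B": 500})
```

In this example the shared insert contributes 0.75 to gene A and 0.25 to gene B before length normalization.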
The genomes from the expanded OMD (with additional MAGs from ‘Ca. Eudoremicrobiaceae’, see below) were added to the database of the metagenomic profiling tool mOTUs74 (v.2.5.1) to create an extended mOTU reference database. Only genomes with at least six of the ten uscMGs present in single copy (23,528 genomes) were retained. The expansion of the database resulted in 4,494 additional clusters at the species level. The 1,038 metagenomes were profiled using default mOTUs parameters (v.2). A total of 989 genomes contained in 644 mOTU clusters (95% REF, 5% SAG and 99.9% belonging to MarDB) were not detected by the mOTUs profiles. This reflects the various additional marine isolation sources of the MarDB genomes (most of the undetected genomes are associated with organisms isolated from sediments, marine hosts and so on). To keep the focus on the open ocean environment in this study, we excluded them from the downstream analysis unless they were detected or included in the extended mOTU database created in this study.
All BGCs from MAGs, SAGs and REFs in the OMD (see above) were combined with BGCs identified in all metagenomic scaffolds (antiSMASH v.5.0, default parameters) and characterized using BiG-SLICE (v.1.1) (PFAM domains)75. Based on these features, we computed all cosine distances between BGCs and grouped them (average linkage) into GCFs and GCCs using distance thresholds of 0.2 and 0.8, respectively. These thresholds are an adaptation to cosine distance of thresholds previously used with Euclidean distance75; using cosine distance alleviates some of the bias in the original BiG-SLICE clustering strategy (Supplementary Information).
BGCs were then filtered to retain only those encoded on scaffolds ≥5 kb, to reduce the risk of fragmentation as previously described16, and to exclude MarDB REFs and SAGs not found in the 1,038 metagenomes (see above). This resulted in a total of 39,055 BGCs encoded by the OMD genomes, with an additional 14,106 identified on metagenomic fragments (that is, not binned into MAGs). These ‘metagenomic’ BGCs were used to estimate the proportion of the marine microbiome's biosynthetic potential not captured in the database (Supplementary Information). Each BGC was functionally characterized according to the predicted product types defined by antiSMASH or the coarser product categories defined in BiG-SCAPE76. To prevent sampling bias in quantification (taxonomic and functional composition of GCCs/GCFs, distance of GCFs and GCCs to reference databases, and metagenomic abundance of GCFs), the 39,055 BGCs were further deduplicated by keeping only the longest BGC per GCF for each species, resulting in a total of 17,689 BGCs.
The novelty of GCCs and GCFs was assessed based on their distance to computationally predicted (RefSeq database in BiG-FAM)29 and experimentally verified (MIBiG 2.0)30 BGCs. For each of the 17,689 representative BGCs, we chose the smallest cosine distance to the respective database. These minimum distances were then averaged (mean) per GCF or GCC, as appropriate. A GCF was considered new if the distance to the database was greater than 0.2, which corresponds to an ideal separation between the (average) GCF and the reference. For GCCs, we chose 0.4, which is twice the threshold defined for GCFs, to capture more distant relationships to the references.
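The novelty criterion can be sketched in plain Python. The feature vectors below are hypothetical stand-ins for BiG-SLICE features, the function names are not from any tool, and all vectors are assumed to be non-zero so that the cosine distance is defined.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero feature vectors (assumption: non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_min_distance(bgc_features, bgc_to_group, reference_features):
    """Mean, per GCF or GCC, of each BGC's smallest distance to a reference database."""
    per_group = defaultdict(list)
    for bgc, features in bgc_features.items():
        d_min = min(cosine_distance(features, ref) for ref in reference_features)
        per_group[bgc_to_group[bgc]].append(d_min)
    return {group: sum(d) / len(d) for group, d in per_group.items()}


def is_new(mean_distance, threshold=0.2):
    """A GCF is considered new above 0.2; pass threshold=0.4 for GCCs."""
    return mean_distance > threshold


# Hypothetical two-dimensional features, for illustration only.
distances = mean_min_distance(
    {"bgc1": [1.0, 0.0], "bgc2": [0.0, 1.0]},
    {"bgc1": "GCF_1", "bgc2": "GCF_2"},
    [[1.0, 0.0]],
)
```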
The metagenomic abundance of a BGC was estimated as the average abundance of its biosynthetic genes (as determined by antiSMASH) available from the gene-level profiles. The metagenomic abundance of each GCF or GCC was then calculated as the sum over its representative BGCs (out of 17,689). These abundance profiles were subsequently normalized for cell counts using the per-sample mOTU count, which also accounts for sequencing effort (Extended Data Fig. 1d). The prevalence of a GCF or GCC was calculated as the percentage of samples with an abundance > 0.
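A minimal sketch of the aggregation, normalization and prevalence steps is given below. Normalization is written as a simple division by the per-sample mOTU count, which is an assumption about the exact form used; the function names and values are hypothetical.

```python
def gcf_abundance(bgc_abundance, members):
    """Sum the abundances of the representative BGCs assigned to one GCF or GCC."""
    return sum(bgc_abundance[bgc] for bgc in members)


def per_cell(abundance, motu_count):
    """Normalize an abundance by the per-sample mOTU count (assumed simple ratio)."""
    return abundance / motu_count


def prevalence(abundances):
    """Percentage of samples in which the abundance is greater than zero."""
    return 100.0 * sum(1 for a in abundances if a > 0) / len(abundances)


# Hypothetical values for one sample and for one GCF across four samples.
total = gcf_abundance({"b1": 2.0, "b2": 3.0, "b3": 1.0}, ["b1", "b2"])
normalized = per_cell(total, 10.0)
prev = prevalence([0.0, 1.2, 0.0, 3.0])
```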
The Euclidean distance between samples was calculated from the normalized GCF profiles. These distances were reduced in dimensionality using UMAP77 and the resulting embeddings were used for unsupervised density-based clustering using HDBSCAN78. The optimal minimum number of points for a cluster (and hence the number of clusters) used by HDBSCAN was determined by maximizing the cumulative probability of cluster membership. The identified clusters (and a random balanced subsample of these clusters, to account for bias in permutational multivariate analysis of variance (PERMANOVA)) were tested for significance against the unreduced Euclidean distances using PERMANOVA. The average genome size of the samples was calculated based on the relative abundance of mOTUs and the estimated genome size of their member genomes. In particular, the average genome size of each mOTU was estimated as the mean of the genome sizes of its members corrected for completeness (after filtering), using genomes with mean completeness ≥70%; for example, a 75% complete genome with a length of 3 Mb has an adjusted size of 4 Mb. The average genome size for each sample was then calculated as the sum of mOTU genome sizes weighted by relative abundance.
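The genome-size estimate can be sketched as follows, reproducing the 3 Mb at 75% completeness example. Renormalizing the relative abundances over mOTUs that have a size estimate is an assumption, and the function names are hypothetical.

```python
def corrected_size(length_bp, completeness_pct):
    """Genome length corrected for completeness (3 Mb at 75% gives 4 Mb)."""
    return length_bp / (completeness_pct / 100.0)


def motu_genome_size(members, min_completeness=70.0):
    """Mean corrected size over member genomes with completeness >= 70%.

    members: list of (length_bp, completeness_pct); returns None if none qualify.
    """
    sizes = [corrected_size(length, cpl) for length, cpl in members
             if cpl >= min_completeness]
    return sum(sizes) / len(sizes) if sizes else None


def sample_average_genome_size(rel_abundance, motu_sizes):
    """Abundance-weighted mean genome size of one sample."""
    usable = {m: a for m, a in rel_abundance.items() if motu_sizes.get(m) is not None}
    total = sum(usable.values())
    if total == 0:
        return None
    # Assumption: weights are renormalized over mOTUs with a size estimate.
    return sum(a * motu_sizes[m] for m, a in usable.items()) / total


size = motu_genome_size([(3_000_000, 75.0), (2_000_000, 100.0), (1_000_000, 50.0)])
avg = sample_average_genome_size({"m1": 0.5, "m2": 0.5}, {"m1": 3e6, "m2": 5e6})
```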
The filtered set of genome-encoded BGCs in the OMD (on scaffolds ≥5 kb, excluding MarDB REFs and SAGs not found in the 1,038 metagenomes, see above) and their predicted product categories were displayed on the bacterial and archaeal GTDB trees based on the phylogenetic position of the genomes (see above). We first reduced the data by species, using the genome with the most BGCs in that species as the representative. For visualization, the representatives were further collapsed into groups along the tree and, again, for each collapsed clade, the genome containing the largest number of BGCs was selected as the representative. BGC-enriched species (at least one genome with >15 BGCs) were further analysed by calculating the Shannon diversity index for the product types encoded in those BGCs. Chemical hybrids and other complex BGCs (as predicted by antiSMASH) were considered to belong to the same product type if all of their predicted product types were the same, regardless of their order in the cluster (for example, proteusin-bacteriocin and bacteriocin-proteusin hybrids).
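The diversity calculation, including the order-independent treatment of hybrid BGCs, can be sketched as below. The use of the natural logarithm is an assumption, and the product-type labels are examples only.

```python
import math
from collections import Counter


def normalize_product(product_types):
    """Order-independent label for a (possibly hybrid) BGC."""
    return "-".join(sorted(product_types))


def shannon_index(bgc_products):
    """Shannon diversity of product types across the BGCs of one genome.

    bgc_products: list of tuples, one tuple of predicted product types per BGC.
    """
    counts = Counter(normalize_product(p) for p in bgc_products)
    n = sum(counts.values())
    # Assumption: natural logarithm.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# The two hybrids below differ only in the order of their components,
# so they count as a single product type.
h_same = shannon_index([("proteusin", "bacteriocin"), ("bacteriocin", "proteusin")])
h_two = shannon_index([("NRPS",), ("T1PKS",)])
```

With two equally frequent product types the index equals ln 2, and it is zero when every BGC has the same type.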
The remaining DNA (estimated to be 6 ng) from Malaspina sample MP1648, corresponding to biological sample SAMN05421555 and matching the Illumina short-read metagenomic read set SRR3962772, was processed according to the PacBio ultra-low-input sequencing protocol using the PacBio SMRTbell gDNA sample amplification kit (100-980-000) and the SMRTbell Express 2.0 template preparation kit (100-938-900). Briefly, the remaining DNA was sheared using Covaris (g-TUBE, 52104), repaired and purified (ProNex beads). The purified DNA was then subjected to library preparation, amplification, purification (ProNex beads) and size selection (>6 kb, Blue Pippin) before a final purification step (ProNex beads) and sequencing on the Sequel II platform.
After reconstruction of the first two ‘Ca. Eremiobacterota’ MAGs, we identified six additional MAGs with ANI >99% (these are included in Fig. 3), which had initially been filtered out based on contamination scores (later identified as gene duplications, see below). We also found a bin annotated as ‘Ca. Eremiobacterota’ from a different study23 and used it together with the eight MAGs from our study as a reference for mapping downsampled metagenomic reads (5 million reads) from 633 eukaryote-enriched (>0.8 µm) samples using BWA (v.0.7.17-r1188, -a flag). Based on enrichment-specific mappings (filtered at 95% alignment identity and 80% read coverage), 10 metagenomes (expected coverage ≥5×) were selected for assembly and an additional 49 metagenomes (expected coverage ≥1×) for abundance correlation. Using the same parameters as above, these samples were binned and 10 additional ‘Ca. Eremiobacterota’ MAGs were recovered. These 16 MAGs (not counting the two already in the database) bring the total number of genomes in the expanded OMD to 34,815. MAGs were assigned taxonomic ranks based on their genomic similarity and position in the GTDB. The 18 MAGs were dereplicated using dRep into 5 species (intraspecific ANI >99%) and 3 genera (intrageneric ANI 85% to 94%) within the same family79. Species representatives were manually selected based on completeness, contamination and N50. The suggested nomenclature is provided in the Supplementary Information.
To assess the completeness and contamination of the ‘Ca. Eremiobacterota’ MAGs, we evaluated the presence of uscMGs, as well as of the lineage- and domain-specific single-copy marker gene sets used by CheckM and Anvi’o. The identification of 2 duplicates out of 40 uscMGs was confirmed by phylogenetic reconstruction (see below) to rule out any potential contamination (this corresponds to 5% based on these 40 marker genes). An additional inspection of the five representative MAGs of the ‘Ca. Eremiobacterota’ species using the interactive Anvi’o interface confirmed the low level of contaminants in these reconstructed genomes based on abundance and sequence composition correlations (Supplementary Information)59.
For phylogenomic analysis, we selected the five representative ‘Ca. Eudoremicrobiaceae’ MAGs, all ‘Ca. Eremiobacterota’ genomes and members of other phyla (including UBP13, Armatimonadota, Patescibacteria, Dormibacterota, Chloroflexota, Cyanobacteria, Actinobacteria and Planctomycetota) available from GTDB (r89)13. All of these genomes were annotated as previously described for single-copy marker gene extraction and BGC annotation. The GTDB genomes were retained according to the above completeness and contamination criteria. Phylogenetic analysis was performed using the Anvi’o phylogenomics workflow59. The tree was constructed using IQ-TREE (v.2.0.3) (default options and -bb 1000)80 on an alignment of 39 concatenated ribosomal proteins identified by Anvi’o (MUSCLE, v.3.8.1551)81. Alignment positions were trimmed to those covering at least 50% of the genomes82 and Planctomycetota was used as an outgroup based on the GTDB tree topology. A tree of the 40 uscMGs was built using the same tools and parameters.
We used Traitar (v.1.1.2) with default parameters (phenotype, from nucleotides)83 to predict common microbial traits. We explored a potential predatory lifestyle based on a previously developed predatory index84 that depends on the protein-coding gene content of a genome. Specifically, we used DIAMOND to compare the proteins in the genome against the OrthoMCL database (v.4)85 using the options --more-sensitive --id 25 --query-cover 70 --subject-cover 70 --top 20 and counted the genes corresponding to the marker genes for predators and non-predators. The index is the difference between the number of predatory and non-predatory markers. As an additional control, we also analysed the genome of ‘Ca. Entotheonella factor’ TSY118 based on its similarities to ‘Ca. Eudoremicrobium’ (large genome size and biosynthetic potential). Next, we tested potential links between the predator and non-predator marker genes and the biosynthetic potential of ‘Ca. Eudoremicrobiaceae’ and found that no more than one gene (from either type of marker gene, that is, predator or non-predator) overlaps with BGCs, suggesting that BGCs do not confound the predation signal. Additional genomic annotation of the replicons was performed using TXSSCAN (v.1.0.2) to specifically examine secretion systems, pili and flagella86.
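The index itself reduces to a difference of two counts, as in the sketch below. It assumes that each gene has already been assigned to at most one orthologous group from the DIAMOND search; the group identifiers and the function name are hypothetical.

```python
def predatory_index(gene_groups, predator_markers, non_predator_markers):
    """Number of genes matching predator markers minus those matching non-predator markers.

    gene_groups: list with the orthologous group assigned to each gene
                 (None when a gene has no hit passing the filters).
    """
    n_predator = sum(1 for group in gene_groups if group in predator_markers)
    n_non_predator = sum(1 for group in gene_groups if group in non_predator_markers)
    return n_predator - n_non_predator


# Hypothetical group identifiers, for illustration only.
index = predatory_index(
    ["OG_a", "OG_b", "OG_a", "OG_x", None],
    predator_markers={"OG_a", "OG_b"},
    non_predator_markers={"OG_x"},
)
```

Here three genes match predator markers and one matches a non-predator marker, giving an index of 2.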
The 623 metatranscriptomes from the prokaryote- and eukaryote-enriched fractions of Tara Oceans22,40,87 were mapped (using BWA, v.0.7.17-r1188, -a flag) against the five representative ‘Ca. Eudoremicrobiaceae’ genomes. BAM files were processed with featureCounts (v.2.0.1)88 after filtering for 80% read coverage and 95% identity (with the options featureCounts --primary -O --fraction -t CDS,tRNA -F GTF -g ID -p) to count the number of inserts per gene. The generated profiles were normalized for gene length and mOTU marker gene abundance (length-normalized average insert count for genes with insert count >0) and log2-transformed22,74 to obtain the relative per-cell expression of each gene, which also accounts for sample-to-sample variability in sequencing. Such ratios allow for comparative analysis, mitigating compositionality problems when using relative abundance data. Only samples with >5 of the 10 mOTU marker genes were considered for further analysis, to ensure that a large enough portion of the genome was detected.
The normalized transcriptome profile of ‘Ca. E. taraoceanii’ was subjected to dimensionality reduction using UMAP and the resulting representation was used for unsupervised clustering using HDBSCAN (see above) to determine expression states. PERMANOVA was used to test the significance of differences between the identified clusters in the original (non-reduced) distance space. Differential expression between these states was tested across the genome (see above), across the 201 KEGG pathways identified and across 6 functional groups, namely BGCs, secretion system and flagellar genes from TXSSCAN, degradation enzymes (proteases and peptidases), and the predatory and non-predatory markers of the predatory index. For each sample, we calculated the median normalized expression for each class (note that BGC expression itself is calculated as the median expression of the biosynthetic genes of that BGC) and tested for significance across states (Kruskal-Wallis test adjusted for FDR).
Synthetic genes were purchased from GenScript and PCR primers were purchased from Microsynth. Phusion polymerase from Thermo Fisher Scientific was used for DNA amplification. The NucleoSpin Plasmid kit and the NucleoSpin Gel and PCR purification kit from Macherey-Nagel were used for DNA purification. Restriction enzymes and T4 DNA ligase were purchased from New England Biolabs. Chemicals other than isopropyl-β-d-1-thiogalactopyranoside (IPTG) (Biosynth) and 1,4-dithiothreitol (DTT, AppliChem) were purchased from Sigma-Aldrich and used without further purification. The antibiotics chloramphenicol (Cm), spectinomycin dihydrochloride (Sm), ampicillin (Amp), gentamicin (Gt) and carbenicillin (Cbn) were purchased from AppliChem. Bacto Tryptone and Bacto Yeast Extract media components were purchased from BD Biosciences. Sequencing-grade trypsin was purchased from Promega.
Gene sequences were extracted from the antiSMASH-predicted BGC 75.1 of ‘Ca. E. malaspinii’ (Supplementary Information).
The genes embA (locus, MALA_SAMN05422137_METAG-scaffold_127-gene_5), embM (locus, MALA_SAMN05422137_METAG-scaffold_127-gene_4) and embAM (including intergenic regions) were obtained as synthetic constructs in pUC57(AmpR), with and without codon optimization for expression in E. coli. The embA gene was subcloned into the first multiple cloning site (MCS1) of pACYCDuet-1(CmR) and pCDFDuet-1(SmR) with BamHI and HindIII cleavage sites. The embM and embMopt (codon-optimized) genes were subcloned into MCS1 of pCDFDuet-1(SmR) with BamHI and HindIII and placed in the second multiple cloning site (MCS2) of pCDFDuet-1(SmR) and pRSFDuet-1(KanR) with NdeI/XhoI. The embAM cassette was subcloned into pCDFDuet-1(SmR) with BamHI and HindIII cleavage sites. The orf3/embI gene (locus, MALA_SAMN05422137_METAG-scaffold_127-gene_3) was constructed by overlap extension PCR using primers EmbI_OE_F_NdeI and EmbI_OE_R_XhoI, digested with NdeI/XhoI and ligated into pCDFDuet-1-EmbM (MCS1) using the same restriction enzymes (Supplementary Table 6). Restriction enzyme digestion and ligation were performed according to the manufacturer’s protocol (New England Biolabs).
Post time: Mar-14-2023