In our recent white paper we describe how to investigate the functional potential of a microbial community in a polar desert in Antarctica using metagenomic shotgun sequencing data. In the original paper (1), the authors supplemented their microbiome data with qPCR analyses to investigate the expression of the most interesting genes discovered in the functional profiles to support their hypothesis that the microbial community survive by scavenging atmospheric trace gases. However, what if they had instead included RNA-seq transcriptomic data to evaluate gene activity in their samples? In this post, we show you how to add transcriptomics data to a microbiome survey using the tools of CLC Genomics Workbench.
The example below presents a de novo assembly based approach to metatranscriptomic analysis using CLC Genomics Workbench and the Microbial Genomics Module. There are, in fact, multiple approaches to performing metatranscriptomics data analysis, depending on the specific questions you may have. For a deeper review on best-practices in metatranscriptomics analysis we recommend you review Bashiardes et. al. (2), or read published examples where CLC Genomics Workbench was used for metatranscriptomics research; some recent interesting examples include a study of thehoney bee (3) and termite (4) microbiomes and their associated metatranscriptomes.
The example metatranscriptomic pipeline presented below consists of two parts (shown in Figure 1). Part 1 includes: assembling the metagenome; grouping contigs into bins to reconstruct the microbial genomes; and finding and annotating genes. It is also described in further detail in our recent white paper on Antarctic microbiome profiling. A common approach and caveat of comparing metatranscriptomes from multiple samples is often to create a “co-assembly” across your samples that serves as a single reference list of contigs and genes for the downstream RNAseq analysis. A good example of this approach can be found in Marynowska et. al. (4).
Part 2 of the analysis pipeline involves adding the transcriptomic data to supplement the metagenomic survey with information on gene activity. Part 2 is the focus of this post and will be described below.
CLC Genomics Workbench include a suite a of tools designed for analyzing gene expression data. For this blog post, we will use just only a few of them. The RNA-Seq Analysis tool will start with mapping reads to the genome and the coding sequences. The tool requires a file with the reference genome and a file with annotations for protein coding sequences (CDS) or genes. If these are not already available from Part 1 of the pipeline (Figure 1), they can be generated using Track Tools -> Track Conversion -> Convert to Tracks. This will take an annotated genome or list of contigs as input and generate individual track files. Additional details on this conversion step can be found in our manual. In this case we need to generate a track for the genome and one for the annotated coding regions. From the read mappings, reads are categorized and assigned, and expression values are calculated. The output from the RNA-Seq Analysis tool is a table describing for each gene the number of reads mapped, the number of reads per kilobase gene, and the expression value. The results can be visualized in a track list along with the genes and the read mappings (Figure 2). The track list is interactively linked to the results table, and marking a CDS of interest in the table view, will shift the focus of the track list to that particular region.
From the track view read mappings can be manually inspected by zooming in on individual genes (Figure 3). In the case of the desert soil microbiome in Antarctica, genes supporting the use of atmospheric trace gases as carbon and energy sources could be searched out from the table, and the expression values inspected.
If your microbiome investigation involves comparing microbial communities at different times or under different conditions, transcriptomes can be compared across multiple states. This analysis can be performed with the tool Differential Expression for RNA-Seq. The tool performs a statistical test of the differential expression of two or more samples. The output is a table displaying for each gene, the fold change and the p-value for the statistical comparison. From this list, genes significantly changing expression levels under different biological conditions can be found.
CLC Genomics Workbench contain several additional tools for analyzing RNA-Seq data for more sophisticated comparisons and visualizations than what have been shown here. If you are interested in learning more or trying out the functionalities, you can always download a free trial.