DESeq2 Analysis of Montipora capitata 2023 RNAseq data
This post details DESeq2 analysis to identify differentially expressed genes for the Montipora capitata 2023 larval thermal tolerance project RNAseq. See my notebook posts and my GitHub repo for information on this project.
RNAseq data was obtained through polyA selection and NGS sequencing at Azenta Life Sciences. See my notebook posts for previous bioinformatics pipeline and QC analysis of these data.
Files for this project include RNAseq files for 54 samples. There are three parental phenotypes of these larvae: Nonbleached (from parents that were unbleached during a previous heat wave), bleached (from parents that bleached and recovered during a previous heat wave), and wildtype (collected as a sample from the natural reef).
Each of these parental phenotypes was exposed to 27, 30, or 33°C for 4 hour incubations at 9 days post fertilization. We have n=6 biological replicates per parental phenotype x temperature treatment. There are no random effects, temperature exposures were conducted in water baths in sealed glass jars with n=1 per temperature. Environmental data for these incubations is in my GitHub repo.
Overview
In this post, I have added my R script and output images of figures to go through the results of identifying DEGs.
Information included for context on the DESeq2 package in this post is from the DESeq2 vingette here.
The script for the analysis is on GitHub here.
Hypotheses
This analysis (and following functional enrichment analyses) is addressing three hypotheses:
- Gene expression will be different under elevated temperature (27°C vs 30-33°C)with increased expression of stress response genes.
- Gene expression will be different by parental phenotype.
- There will be differentially expressed genes between bleached and nonbleached offspring at high temperature, due to the parental thermal tolerance history. We expect Nonbleached offspring to show transcriptomic indicators of tolerance to stress.
Set up
Load required libraries.
if ("tidyverse" %in% rownames(installed.packages()) == 'FALSE') install.packages('tidyverse')
if ("genefilter" %in% rownames(installed.packages()) == 'FALSE') install.packages('genefilter')
if ("DESeq2" %in% rownames(installed.packages()) == 'FALSE') install.packages('DESeq2')
if ("RColorBrewer" %in% rownames(installed.packages()) == 'FALSE') install.packages('RColorBrewer')
if ("WGCNA" %in% rownames(installed.packages()) == 'FALSE') install.packages('WGCNA')
if ("flashClust" %in% rownames(installed.packages()) == 'FALSE') install.packages('flashClust')
if ("gridExtra" %in% rownames(installed.packages()) == 'FALSE') install.packages('gridExtra')
if ("ComplexHeatmap" %in% rownames(installed.packages()) == 'FALSE') install.packages('ComplexHeatmap')
if ("goseq" %in% rownames(installed.packages()) == 'FALSE') install.packages('goseq')
if ("dplyr" %in% rownames(installed.packages()) == 'FALSE') install.packages('dplyr')
if ("clusterProfiler" %in% rownames(installed.packages()) == 'FALSE') install.packages('clusterProfiler')
if ("pheatmap" %in% rownames(installed.packages()) == 'FALSE') install.packages('pheatmap')
if ("magrittr" %in% rownames(installed.packages()) == 'FALSE') install.packages('magrittr')
if ("vegan" %in% rownames(installed.packages()) == 'FALSE') install.packages('vegan')
#BiocManager::install('apeglm')
#BiocManager::install('EnhancedVolcano')
#BiocManager::install("VennDetail")
library("tidyverse")
library("genefilter")
library("WGCNA")
library("DESeq2")
library("RColorBrewer")
library("flashClust")
library("gridExtra")
library("goseq")
library("dplyr")
library("clusterProfiler")
library("pheatmap")
library("magrittr")
library("vegan")
library("apeglm")
library("EnhancedVolcano")
library("cowplot")
library("viridis")
library("pheatmap")
library("VennDetail")
Data input and filtering
Import the data files. Load metadata for each sample and the gene count matrix generated from this script.
#load metadata sheet with sample name and developmental stage information
metadata <- read.csv("data/rna_seq/sample_rnaseq_metadata.csv", header = TRUE, sep = ",")%>%select(sample, temperature, parent, symbiont, phenotype)
metadata$code<-paste0(metadata$temperature, "_", metadata$parent)
head(metadata)
#load gene count matrix generated from cluster computation
gcount <- as.data.frame(read.csv("data/rna_seq/Mcapitata2023_gene_count_matrix.csv", row.names="gene_id"), colClasses = double)
head(gcount)
Remove any genes that were not detected (sum of 0 counts across all samples).
dim(gcount)
gcount<-gcount %>%
mutate(Total = rowSums(.[, 1:54]))%>%
filter(!Total==0)%>%
select(!Total)
dim(gcount)
We started with 54,384 genes. which is the total number in the annotation. There were about 10,000 genes that had 0’s across all samples (not detected) with 44,690 remaining as detected in our dataset.
Conduct data filtering using the pOverA
function.
pOverA: Specifying the minimum count for a proportion of samples for each gene. Here, we are using a pOverA of 0.05. This is because we have 54 samples with a minimum of n=6 samples per group. Therefore, we will accept genes that are present in 6/54 = 0.11 of the samples because we expect different expression by treatment group (parent x temperature). We are further setting the minimum count of genes to 10, such that 11% of the samples must have a gene count of >10 in order for the gene to remain in the data set.
Filter in the package genefilter
. Pre-filtering our dataset improves quality of statistical analysis by removing low-coverage counts and will also reduce the memory size dataframe and increase the speed of the transformation and testing functions.
filt <- filterfun(pOverA(0.1,11))
#create filter for the counts data
gfilt <- genefilter(gcount, filt)
#identify genes to keep by count filter
gkeep <- gcount[gfilt,]
#identify gene lists
gn.keep <- rownames(gkeep)
#gene count data filtered in PoverA, P percent of the samples have counts over A
gcount_filt <- as.data.frame(gcount[which(rownames(gcount) %in% gn.keep),])
#How many rows do we have before and after filtering?
nrow(gcount) #Before
nrow(gcount_filt) #After
Before filtering, we had approximately 44,690 genes. After filtering with pOverA
, we have 24,005 genes. This indicates that there were 24,005 genes present in at least 10% of samples with at least 10 counts of each gene.
Prepare sample names. Keep “R” and any numbers following “R” in the colnames of gcount_filt. Our samples are identified as “R##”.
# Function to extract 'R' followed by numbers from column names
extract_R_numbers <- function(col_names) {
sub("R(\\d+).*", "\\1", col_names)
}
# Apply the function to all column names
new_col_names <- sapply(names(gcount_filt), extract_R_numbers)
# Rename the columns
names(gcount_filt) <- paste0("R", new_col_names)
# Output the modified dataframe
names(gcount_filt)
length(names(gcount_filt))
Column names are now correct and there are 54 samples as expected.
In order for the DESeq2 algorithms to work, the SampleIDs on the metadata file and count matrices have to match exactly and be in the same order. They are verified in the same order.
Format metadata and count data columns
Convert temperature and metadata categories to factors.
metadata$temperature<-as.factor(metadata$temperature)
metadata$parent<-as.factor(metadata$parent)
metadata$symbiont<-as.factor(metadata$symbiont)
metadata$code<-as.factor(metadata$code)
Set levels of factors with the reference level first. We are setting 27°C as the reference from which to compare gene expression at 30°C and 33°C. We are also setting Wildtype as the reference to compare Bleached and Nonbleached offspring to. This is because wildtype is a mixture of the natural population from Kaneohe Bay. Bleached and Nonbleached represent selected populations from the natural population of the bay. Therefore, wildtype is our reference group.
metadata$temperature<-factor(metadata$temperature, levels=c("27", "30", "33"))
metadata$parent<-factor(metadata$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
Make sure the metadata and the columns in the gene count matrix are all the same.
metadata$sample
colnames(gcount_filt)
Reorder them automatically if needed.
list<-colnames(gcount_filt)
list<-as.factor(list)
metadata$sample<-as.factor(metadata$sample)
# Re-order the levels
metadata$sample <- factor(as.character(metadata$sample), levels=list)
# Re-order the data.frame
metadata_ordered <- metadata[order(metadata$sample),]
metadata_ordered$sample
head(metadata_ordered)
metadata_ordered$sample
colnames(gcount_filt)
head(gcount_filt)
These match, so we are good to continue to the next step!
Construct DESeq2 data set by temperature x parent
Create a DESeqDataSet design from gene count matrix and labels. Here we set the design to test main effects of parent and temperature as well as the interaction of parent and temperature on gene expression to address our hypotheses.
We will use a formula with temperature and parent as main effects as well as their interaction on gene expression.
#Set DESeq2 design
gdds <- DESeqDataSetFromMatrix(countData = gcount_filt,
colData = metadata_ordered,
design = ~temperature + parent + temperature:parent)
First we are going to log-transform the data using a variance stabilizing transforamtion (VST). This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors), yielding a matrix of values which are now approximately homoskedastic (having constant variance along the range of mean values).
To do this we first need to calculate the size factors of our samples. This is a rough estimate of how many reads each sample contains compared to the others. In order to use VST (the faster log2 transforming process) to log-transform our data, the size factors need to be less than 4.
SF.gdds <- estimateSizeFactors(gdds) #estimate size factors to determine if we can use vst to transform our data. Size factors should be less than 4 for us to use vst
print(sizeFactors(SF.gdds)) #View size factors
all(sizeFactors(SF.gdds)) < 4
All size factors are less than 4, so we can use a VST transformation. blind=FALSE should be used for transforming data for downstream analysis, where the full use of the design information should be made. blind=FALSE will skip re-estimation of the dispersion trend, if this has already been calculated. If many of genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis.
gvst <- vst(gdds, blind=FALSE) #apply a variance stabilizing transforamtion to minimize effects of small counts and normalize wrt library size
head(assay(gvst), 3) #view transformed gene count data for the first three genes in the dataset.
dim(gvst)
Plot a heatmap of sample to sample distances
Plot a heatmap of sample to sample distances by calculating a distance matrix of our filtered and transformed gene count matrix. We should see 100% similarity of each sample compared to itself.
gsampleDists <- dist(t(assay(gvst))) #calculate distance matix
gsampleDistMatrix <- as.matrix(gsampleDists) #distance matrix
rownames(gsampleDistMatrix) <- colnames(gvst) #assign row names
colnames(gsampleDistMatrix) <- NULL #assign col names
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255) #assign colors
save_pheatmap_pdf <- function(x, filename, width=7, height=7) {
stopifnot(!missing(x))
stopifnot(!missing(filename))
pdf(filename, width=width, height=height)
grid::grid.newpage()
grid::grid.draw(x$gtable)
dev.off()
}
pht<-pheatmap(gsampleDistMatrix, #plot matrix
clustering_distance_rows=gsampleDists, #cluster rows
clustering_distance_cols=gsampleDists, #cluster columns
col=colors) #set colors
save_pheatmap_pdf(pht, "figures/rna_seq/sample_distances.pdf")
dev.off()
As expected, 1:1 mapping of samples show 100% similarity.
Conduct PERMANOVA and PCA for all genes
Prepare data for PERMANOVA test of the effect of temperature, parent, and the interaction on gene expression of all genes in our dataset after filtering (24,0005 genes).
test<-t(assay(gvst)) #export as matrix
test<-as.data.frame(test)
#add category columns
test$sample<-rownames(test)
test$temperature<-metadata$temperature[match(test$sample, metadata$sample)]
test$parent<-metadata$parent[match(test$sample, metadata$sample)]
Build PERMANOVA model for all genes in the dataset.
library(vegan)
library(factoextra)
scaled_test <-prcomp(test[c(1:24005)], scale=TRUE, center=TRUE)
fviz_eig(scaled_test)
# scale data
vegan <- scale(test[c(1:24005)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = test, method='eu')
permanova
Here are the results:
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = vegan ~ parent * temperature, data = test, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 154649 0.12156 4.4308 0.001 ***
temperature 2 253981 0.19965 7.2767 0.001 ***
parent:temperature 4 78208 0.06148 1.1204 0.182
Residual 45 785321 0.61731
Total 53 1272159 1.00000
Gene expression is significantly different between temperature and parent, no interactive effect detected. This is interesting because these are effects detected on all genes. Therefore, we can interpret that temperature and parent have large scale effects on larval gene expression.
Plot a PCA of samples with shape for temperature and and color for parental phenotype for all genes.
gPCAdata <- plotPCA(gvst, intgroup = c("temperature", "parent"), returnData=TRUE)
percentVar <- round(100*attr(gPCAdata, "percentVar")) #plot PCA of samples with all data
allgenesfilt_PCA <- ggplot(gPCAdata, aes(PC1, PC2, color=parent, shape=temperature)) +
geom_point(size=3) +
ggtitle(expression(bold("All genes (24,005 genes)")))+
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(x=0, y=10, label="p[Parent]=0.001", color="black")+
geom_text(x=0, y=8.5, label="p[Temperature]=0.001", color="black")+
geom_text(x=0, y=7, label="p[Interaction]=0.182", color="darkgray")+
theme_classic() + #Set background color
theme(panel.border = element_blank(), # Set border
#panel.grid.major = element_blank(), #Set major gridlines
#panel.grid.minor = element_blank(), #Set minor gridlines
axis.line = element_line(colour = "black"), #Set axes color
plot.background=element_blank()); allgenesfilt_PCA
ggsave("figures/rna_seq/pca.pdf", allgenesfilt_PCA, width=8, height=4)
ggsave("figures/rna_seq/pca.jpeg", allgenesfilt_PCA, width=8, height=4)
Add a plot with elipses.
allgenes_ellipse<-allgenesfilt_PCA + stat_ellipse();allgenes_ellipse
ggsave("figures/rna_seq/pca_ellipse.pdf", allgenes_ellipse, width=9, height=5)
ggsave("figures/rna_seq/pca_ellipse.jpeg", allgenes_ellipse, width=9, height=5)
These PCA’s show that there are effects of temperature and parental phenotype on global gene expression. As expected, the samples from the highest temperature (33°C) are the most separated from other groups. At each temperature, there is separation between each parental phenotype. Next, we will identify DEG’s between these groups to address our hypotheses.
Identify, visualize, and list DEGs
We will identify DEGs to address our hypotheses by doing the following:
- DEGs between temperatures, using pairwise comparisons between each temperature treatment (Hypothesis 1).
- DEGs between parents, using the same pairwise approach between each parental phenotype (Hypothesis 2).
- DEGs between parents at elevated temperature (Hypothesis 3). More information on how I did this in the sections below.
There are two statistical tests we can use in DESeq2: Wald test and likelihood ratio test (LRT). Here is what I have learned about each and why I chose the method that I use below.
Here is the information from the DESeq2 vingette: “DESeq2 offers two kinds of hypothesis tests: the Wald test, where we use the estimated standard error of a log2 fold change to test if it is equal to zero, and the likelihood ratio test (LRT). The LRT examines two models for the counts, a full model with a certain number of terms and a reduced model, in which some of the terms of the full model are removed. The test determines if the increased likelihood of the data using the extra terms in the full model is more than expected if those extra terms are truly zero. The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model.”
Given this information, the Wald test is most appropriate for cases where there are comparisons between 2 groups or when you are interested in conducting pairwise comparisons. The LRT test is appropriate for cases where there are multiple levels of a factor where you are interested in the whole effect of that factor (e.g., a time series where you want to detect any effects across time).
Because we have 3 levels of each factor that are categorical and I want to conduct pairwise comparisons to address my hypotheses, I will use the Wald test.
We will select DEG’s that have a p-value < 0.01 (we have a lot of DEG’s, so I am chosing a more stringent cut off) and log2foldchange>1.5. This will keep genes that are 2.83 times higher or lower expressed in the contrast. A common cut off is >1, but because we have a lot of DEG’s, I increased this to a more stringent cut off. These cut offs were used for temperature and parent effects, which had a high number of DEGs and therefore we need a more stringent cut off to take only the strongest responses to functional enrichment.
For detecting genes in the interactive effects (#3 below), I increased these cut offs to p-value < 0.05 and fold change >1. This selects genes at p<0.05 and those that are expressed 2 times higher or lower in the selected contrast. I used less stringent cut offs because the number of DEGs was low for this interaction and therefore I want to include all statistically significant DEGs and do not have the same need to reduce gene lists for functional enrichment.
1. View DEG’s by Temperature (Hypothesis #1)
Run Wald test.
DEG_T <- DESeq(gdds, test="Wald")
View results by temperature effects. Create dataframe for each comparison and add a column for the specific contrast. These are the results terms that are generated by the DESeq2 model:
[1] "Intercept" "temperature_30_vs_27" "temperature_33_vs_27"
[4] "parent_Bleached_vs_Wildtype" "parent_Nonbleached_vs_Wildtype" "temperature30.parentBleached"
[7] "temperature33.parentBleached" "temperature30.parentNonbleached" "temperature33.parentNonbleached"
These results are the possible contrasts that we can use for identifying DEGs through specific comparisons. For identifying DEGs between levels within one factor, we will use the following notation:
results(DEG, contrast=c("temperature","30","27"))
Here, we will identify DEGs between 27°C and 30°C with the main effect of temperature. The reference level is listed second, with our test level entered first. Therefore, log2fold change values from this comparison are positive to indicate a gene that is upregulated at 30°C compared to 27°C. log2fold change values that are negative indicate a gene that is downregulated at 30°C compared to 27°C.
I will conduct these comparisons for 27 vs 30, 27 vs 33, and 30 vs 33.
resultsNames(DEG_T)
#this shows the comparisons made (shown similar to lm output in terms of interactions)
temp30_deg<-results(DEG_T, contrast=c("temperature","30","27"))
temp30_deg <- as.data.frame(subset(temp30_deg, padj<0.01))
temp30_deg <- as.data.frame(subset(temp30_deg, abs(log2FoldChange)>1.5))
temp30_deg$contrast <- c("27vs30")
temp30_deg$gene <- rownames(temp30_deg)
rownames(temp30_deg) <- NULL
dim(temp30_deg)
temp33_deg<-results(DEG_T, contrast=c("temperature","33","27"))
temp33_deg <- as.data.frame(subset(temp33_deg, padj<0.01))
temp33_deg <- as.data.frame(subset(temp33_deg, abs(log2FoldChange)>1.5))
temp33_deg$contrast <- c("27vs33")
temp33_deg$gene <- rownames(temp33_deg)
rownames(temp33_deg) <- NULL
dim(temp33_deg)
temp30vs33_deg<-results(DEG_T, contrast=c("temperature","33","30"))
temp30vs33_deg <- as.data.frame(subset(temp30vs33_deg, padj<0.01))
temp30vs33_deg <- as.data.frame(subset(temp30vs33_deg, abs(log2FoldChange)>1.5))
temp30vs33_deg$contrast <- c("30vs33")
temp30vs33_deg$gene <- rownames(temp30vs33_deg)
rownames(temp30vs33_deg) <- NULL
dim(temp30vs33_deg)
There are 24 genes differentially expressed between 27 and 30°C. There are 914 genes differentially expressed between 27 and 33°C. There are 244 genes differentially expressed between 30 and 33°C. There are 1,182 genes total that are DEG’s from these pairwise comparisons. In these contrasts, positive fold changes indicate higher expression in the first listed contrast. I always had the higher temperature listed first.
Combine this list of DEGs into one dataframe and display the number of unique genes in this list.
DEG_temperature<-rbind(temp30_deg, temp33_deg, temp30vs33_deg)
dim(DEG_temperature)
length(unique(DEG_temperature$gene))
There are 1,182 total genes and 936 unique genes, indicating that some are shared in the contrasts.
Now that we have a list of DEG’s, obtain a VST normalized gene count matrix for these DEGs.
DEG_temperature_vst <- gdds[unique(DEG_temperature$gene)]
dim(DEG_temperature_vst)
DEG_temperature_vst <- varianceStabilizingTransformation(DEG_temperature_vst)
Upset plot of genes between tempeature contrasts
Create an upset diagram showing genes unique and shared in each temperature comparison.
list1<-DEG_temperature%>%
filter(contrast=="27vs30")
list2<-DEG_temperature%>%
filter(contrast=="27vs33")
list3<-DEG_temperature%>%
filter(contrast=="30vs33")
ven <- venndetail(list("27vs30" = list1$gene, "27vs33" = list2$gene,
"30vs33" = list3$gene))
upset_temperature<-plot(ven, type = "upset");upset_temperature
pdf("figures/rna_seq/upset_temperature.pdf", width=8, height=6)
par(mar = c(5, 5, 5, 5))
plot(ven, type = "upset")
dev.off()
The highest number of unique genes are in the 27 vs 33°C comparison, which is expected because of the strong temperature effects at this high temperature. There are also many genes shared between 30 vs 33°C and 27 vs 33°C comparisons, demonstrating that the largest changes in gene expression occur at 33°C with far less response between 27°C and 30°C. This makes sense - 33°C is stressful!
Plot a volcano plot for these DEGs
Next, display a volcano plot for each of these comparisons.
27 vs 30°C
res <- results(DEG, contrast=c("temperature","30","27"))
volcano_plot_27vs30<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = '27 vs 30°C',
pCutoff = 0.01,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_27vs30
27 vs 33°C
res <- results(DEG, contrast=c("temperature","33","27"))
volcano_plot_27vs33<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = '27 vs 33°C',
pCutoff = 0.01,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_27vs33
30 vs 33°C
res <- results(DEG, contrast=c("temperature","33","30"))
volcano_plot_30vs33<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = '30 vs 33°C',
pCutoff = 0.01,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_30vs33
Plot all volcano plots together.
volcano_temp_grid<-plot_grid(volcano_plot_27vs30, volcano_plot_27vs33, volcano_plot_30vs33, align="vh", nrow=1, ncol=3)
ggsave("figures/rna_seq/volcano_temperature.pdf", plot=volcano_temp_grid, width=18, height=6)
ggsave("figures/rna_seq/volcano_temperature.png", plot=volcano_temp_grid, width=18, height=6)
Plot a heatmap for genes that are differentially expressed between temperatures
Next, plot a heatmap of expression of genes that are in the list of DEG’s that are differentially expressed between temperatures.
Set themes and metadata colors for heatmap.
legend_names_col = colnames(assay(DEG_temperature_vst))
df <- as.data.frame(colData(gdds)[,c("temperature","parent")])
df$temperature <- factor(as.character(df$temperature), levels=c("27", "30", "33"))
df$parent <- factor(as.character(df$parent), levels=c("Wildtype", "Bleached", "Nonbleached"))
df<-df%>%
arrange(temperature, parent)
ann_colors = list(
temperature = c("27" = "lightblue", "30" = "pink2","33"= "red2"),
parent = c("Bleached" = "orange", "Nonbleached" = "brown4", "Wildtype" = "gray")
)
Plot heatmap.
pdf("figures/rna_seq/DEG_heatmap_temperature.pdf", width=15, height=20)
pheatmap(assay(DEG_temperature_vst), cluster_rows=TRUE, show_rownames=FALSE, color=inferno(10), show_colnames=TRUE, fontsize_row=3, scale="row", cluster_cols=TRUE, annotation_col=df, annotation_colors = ann_colors, labels_col=legend_names_col)
dev.off()
We can clearly see sets of genes that are very different in 33°C vs the other temperatures, which makes sense due to the high number of unique DEGS at this temperature. There are sets of genes up regulated and down regulated at 33°C that we will investigate with functional enrichment analyses. There are also sets of genes that vary between 27°C and 30°C and throughout these comparisons there are sets of genes that vary by parent as well. We will dig into this further below by looking at DEGs between parents at high temperatures.
Sample R93 looks like it has different expression from the others in the same group. It was not resequenced and has very high read depth (31+ million reads). Does not appear to be an outlier due to technical differences. No record of anything weird happening with this sample during collection in online or physical notebook. No recorded problems with extraction, concentrations of RNA were in mid to upper range. We will keep this sample and examine further in functional enrichment.
Run a PERMANOVA and plot a PCA for temperature comparisons
Conduct PERMANOVA by parent and temperature
Export data for PERMANOVA test.
test_DEG_temp<-t(assay(DEG_temperature_vst)) #export as matrix
test_DEG_temp<-as.data.frame(test_DEG_temp)
#add category columns
test_DEG_temp$sample<-rownames(test_DEG_temp)
test_DEG_temp$temperature<-metadata$temperature[match(test_DEG_temp$sample, metadata$sample)]
test_DEG_temp$parent<-metadata$parent[match(test_DEG_temp$sample, metadata$sample)]
Build PERMANOVA model for DEG’s that are different between temperatures
dim(test_DEG_temp)
scaled_test_DEG_temp <-prcomp(test_DEG_temp[c(1:936)], scale=TRUE, center=TRUE)
fviz_eig(scaled_test_DEG_temp)
# scale data
vegan_DEG_temp <- scale(test_DEG_temp[c(1:936)])
# PerMANOVA
permanova_DEG<-adonis2(vegan_DEG_temp ~ parent*temperature, data = test_DEG_temp, method='eu')
permanova_DEG
About 60% of variance is explained by first PC followed by <10% on following PCs.
Here are the results:
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = vegan_DEG_temp ~ parent * temperature, data = test_DEG_temp, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 3587 0.07231 5.4969 0.002 **
temperature 2 29516 0.59498 45.2292 0.001 ***
parent:temperature 4 1822 0.03672 1.3958 0.183
Residual 45 14683 0.29598
Total 53 49608 1.00000
Temperature is significant as expected because we are analyzing temperature DEGs. Parent is also significant. This makes sense because in our global gene expression PERMANOVA and PCA above we see that global expression is different between parental phenotypes. This shows that parental separation is present in differentially expressed genes between temperatures. We will look into this further below.
Plot a PCA of DEG’s for temperature.
gPCAdata_DEG_temp <- plotPCA(DEG_temperature_vst, intgroup = c("temperature", "parent"), returnData=TRUE)
percentVar_DEG_temp <- round(100*attr(gPCAdata_DEG_temp, "percentVar")) #plot PCA of samples with all data
DEG_PCA_temp <- ggplot(gPCAdata_DEG_temp, aes(PC1, PC2, color=parent, shape=temperature)) +
geom_point(size=3) +
ggtitle(expression(bold("DEGs: temperature (936 genes)")))+
xlab(paste0("PC1: ",percentVar_DEG_temp[1],"% variance")) +
ylab(paste0("PC2: ",percentVar_DEG_temp[2],"% variance")) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(x=0, y=-8, label="p[Parent]=0.002", color="black")+
geom_text(x=0, y=-9.5, label="p[Temperature]=0.001", color="black")+
geom_text(x=0, y=-11, label="p[Interaction]=0.183", color="darkgray")+
theme_classic() + #Set background color
theme(panel.border = element_blank(), # Set border
#panel.grid.major = element_blank(), #Set major gridlines
#panel.grid.minor = element_blank(), #Set minor gridlines
axis.line = element_line(colour = "black"), #Set axes color
plot.background=element_blank()); DEG_PCA_temp
ggsave("figures/rna_seq/pca_DEG_temperature.pdf", DEG_PCA_temp, width=8, height=4)
ggsave("figures/rna_seq/pca_DEG_temperature.png", DEG_PCA_temp, width=8, height=4)
From this we can see very clear separation by temperature with high variance explained by PC1. The difference in parent exists in these DEG’s, explained by PC2 with lower variance explained. This shows that DEGs between temperatures are also influenced by parental history.
2. View DEG’s by Parent (Hypothesis #2)
Identify DEGs using the pairwise approach as done above for temperature. Compare Nonbleached (NB) larvae to wildtype (WT), NB to Bleached (B), and B to WT.
DEG_P <- DESeq(gdds, test="Wald")
View results by parental effects. Create dataframe for each comparison and add a column for the specific contrast.
resultsNames(DEG_P)
#this shows the comparisons made (shown similar to lm output in terms of interactions)
BvsWT_deg<-results(DEG_P, contrast=c("parent","Bleached","Wildtype"))
BvsWT_deg <- as.data.frame(subset(BvsWT_deg, padj<0.01))
BvsWT_deg <- as.data.frame(subset(BvsWT_deg, abs(log2FoldChange)>1.5))
BvsWT_deg$contrast <- c("BvsWT")
BvsWT_deg$gene <- rownames(BvsWT_deg)
rownames(BvsWT_deg) <- NULL
dim(BvsWT_deg)
NBvsWT_deg<-results(DEG_P, contrast=c("parent","Nonbleached","Wildtype"))
NBvsWT_deg <- as.data.frame(subset(NBvsWT_deg, padj<0.01))
NBvsWT_deg <- as.data.frame(subset(NBvsWT_deg, abs(log2FoldChange)>1.5))
NBvsWT_deg$contrast <- c("NBvsWT")
NBvsWT_deg$gene <- rownames(NBvsWT_deg)
rownames(NBvsWT_deg) <- NULL
dim(NBvsWT_deg)
NBvsB_deg<-results(DEG_P, contrast=c("parent","Nonbleached","Bleached"))
NBvsB_deg <- as.data.frame(subset(NBvsB_deg, padj<0.01))
NBvsB_deg <- as.data.frame(subset(NBvsB_deg, abs(log2FoldChange)>1.5))
NBvsB_deg$contrast <- c("NBvsB")
NBvsB_deg$gene <- rownames(NBvsB_deg)
rownames(NBvsB_deg) <- NULL
dim(NBvsB_deg)
There are 207 genes differentially expressed between bleached and wildtype. There are 112 genes differentially expressed between nonbleached and wildtype. There are 282 genes differentially expressed between nonbleached and bleached. In these contrasts, positive fold changes indicate higher expression in the first listed contrast. Nonbleached was listed first when compared to wildtype and bleached. Bleached was listed first when compared to wildtype.
Combine this list of DEGs into one dataframe.
DEG_parent<-rbind(BvsWT_deg, NBvsWT_deg, NBvsB_deg)
dim(DEG_parent)
length(unique(DEG_parent$gene))
There are 601 total genes and 432 unique genes, indicating that 169 are shared between at least two of the contrasts.
Get a VST normalized gene count matrix for these DEGs.
DEG_parent_vst <- gdds[unique(DEG_parent$gene)]
dim(DEG_parent_vst)
DEG_parent_vst <- varianceStabilizingTransformation(DEG_parent_vst)
Upset plot of genes between tempeature contrasts
Create an upset diagram showing genes unique and shared in each parent comparison.
list1_parent<-DEG_parent%>%
filter(contrast=="NBvsWT")
list2_parent<-DEG_parent%>%
filter(contrast=="BvsWT")
list3_parent<-DEG_parent%>%
filter(contrast=="NBvsB")
ven_parent <- venndetail(list("NBvsWT" = list1_parent$gene, "BvsWT" = list2_parent$gene,
"NBvsB" = list3_parent$gene))
upset_parent<-plot(ven_parent, type = "upset");upset_parent
pdf("figures/rna_seq/upset_parent.pdf", width=8, height=6)
par(mar = c(5, 5, 5, 5))
plot(ven_parent, type = "upset")
dev.off()
There are genes that are unique to each comparisons and genes shared between each comparison. The highest number of shared genes is between comparisons of B to NB or WT larvae, suggesting particularly different gene activity in the B group.
Plot a volcano plot for these comparisons.
Generate a volcano plot for DEGs from each pairwise comparison.
B vs WT
res <- results(DEG, contrast=c("parent","Bleached","Wildtype"))
volcano_plot_BvsWT<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'Bleached vs Wildtype',
pCutoff = 0.01,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_BvsWT
NB vs WT
res <- results(DEG, contrast=c("parent","Nonbleached","Wildtype"))
volcano_plot_NBvsWT<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'Nonbleached vs Wildtype',
pCutoff = 0.01,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_NBvsWT
NB vs B
res <- results(DEG, contrast=c("parent","Nonbleached","Bleached"))
volcano_plot_NBvsB<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'Nonbleached vs Bleached',
pCutoff = 0.01,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_NBvsB
Plot all volcano plots together.
volcano_parent_grid<-plot_grid(volcano_plot_BvsWT, volcano_plot_NBvsWT, volcano_plot_NBvsB, align="vh", nrow=1, ncol=3)
ggsave("figures/rna_seq/volcano_parent.pdf", plot=volcano_parent_grid, width=18, height=6)
ggsave("figures/rna_seq/volcano_parent.png", plot=volcano_parent_grid, width=18, height=6)
Plot a heatmap for genes that are differentially expressed between parent phenotypes
Generate a heatmap as done for temperature with DEG’s between parent groups.
Set themes and metadata.
legend_names_col = colnames(assay(DEG_parent_vst))
df <- as.data.frame(colData(gdds)[,c("temperature","parent")])
df$temperature <- factor(as.character(df$temperature), levels=c("27", "30", "33"))
df$parent <- factor(as.character(df$parent), levels=c("Wildtype", "Bleached", "Nonbleached"))
df<-df%>%
arrange(temperature, parent)
ann_colors = list(
temperature = c("27" = "lightblue", "30" = "pink2","33"= "red2"),
parent = c("Bleached" = "orange", "Nonbleached" = "brown4", "Wildtype" = "gray")
)
Plot heatmap.
pdf("figures/rna_seq/DEG_heatmap_parent.pdf", width=15, height=20)
pheatmap(assay(DEG_parent_vst), cluster_rows=TRUE, show_rownames=FALSE, color=inferno(10), show_colnames=TRUE, fontsize_row=3, scale="row", cluster_cols=TRUE, annotation_col=df, annotation_colors = ann_colors, labels_col=legend_names_col)
dev.off()
There are clear sets of genes that are either up or down regulated depending on the parent group. We can see temperature effects within each parent group, as expected.
Run a PERMANOVA and plot a PCA for parent comparisons.
Conduct PERMANOVA by parent and temperature of parental DEGs.
Export data for PERMANOVA test.
test_DEG_parent<-t(assay(DEG_parent_vst)) #export as matrix
test_DEG_parent<-as.data.frame(test_DEG_parent)
#add category columns
test_DEG_parent$sample<-rownames(test_DEG_parent)
test_DEG_parent$temperature<-metadata$temperature[match(test_DEG_parent$sample, metadata$sample)]
test_DEG_parent$parent<-metadata$parent[match(test_DEG_parent$sample, metadata$sample)]
Build PERMANOVA model for DEG’s that are different between parent phenotypes.
dim(test_DEG_parent)
scaled_test_DEG_parent <-prcomp(test_DEG_parent[c(1:432)], scale=TRUE, center=TRUE)
fviz_eig(scaled_test_DEG_parent)
# scale data
vegan_DEG_parent <- scale(test_DEG_parent[c(1:432)])
# PerMANOVA
permanova_DEG_parent<-adonis2(vegan_DEG_parent ~ parent*temperature, data = test_DEG_parent, method='eu')
permanova_DEG_parent
About 35% of variance is explained by first PC followed by 15% on the second PC and <10% on the third PC.
The results are:
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = vegan_DEG_parent ~ parent * temperature, data = test_DEG_parent, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 11344.8 0.49549 35.8707 0.001 ***
temperature 2 3117.4 0.13615 9.8567 0.001 ***
parent:temperature 4 1317.8 0.05756 2.0834 0.010 **
Residual 45 7116.0 0.31080
Total 53 22896.0 1.00000
Parent is significant as expected because we are analyzing parental DEGs. There are still temperature effects, which also makes sense because there was such a strong temperature effect on global expression. Interestingly, there is a parent x temperature interaction, indicating response to temperature is mediated by parental phenotype. We will look into this more next.
Plot a PCA of DEG’s for parent
gPCAdata_DEG_parent <- plotPCA(DEG_parent_vst, intgroup = c("temperature", "parent"), returnData=TRUE)
percentVar_DEG_parent <- round(100*attr(gPCAdata_DEG_parent, "percentVar")) #plot PCA of samples with all data
DEG_PCA_parent <- ggplot(gPCAdata_DEG_parent, aes(PC1, PC2, color=parent, shape=temperature)) +
geom_point(size=3) +
ggtitle(expression(bold("DEGs: parent (432 genes)")))+
xlab(paste0("PC1: ",percentVar_DEG_parent[1],"% variance")) +
ylab(paste0("PC2: ",percentVar_DEG_parent[2],"% variance")) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(x=0, y=10, label="p[Parent]=0.001", color="black")+
geom_text(x=0, y=8, label="p[Temperature]=0.001", color="black")+
geom_text(x=0, y=6, label="p[Interaction]=0.010", color="black")+
theme_classic() + #Set background color
theme(panel.border = element_blank(), # Set border
#panel.grid.major = element_blank(), #Set major gridlines
#panel.grid.minor = element_blank(), #Set minor gridlines
axis.line = element_line(colour = "black"), #Set axes color
plot.background=element_blank()); DEG_PCA_parent
ggsave("figures/rna_seq/pca_DEG_parent.pdf", DEG_PCA_parent, width=8, height=4)
ggsave("figures/rna_seq/pca_DEG_parent.png", DEG_PCA_parent, width=8, height=4)
From this we can see very clear separation by parent with high variance explained by PC1 separating parent. There is significant separation between temperatures within parent with 33°C being especially different, as expected from temperature DEGs. Interestingly, gene expression of wildtype parents at 27-30°C is separated from the other parental phenotypes.
3. View DEG’s differential between parents at elevated temperatures (Hypothesis #3)
Next, view genes that are uniquely differentially expressed between parental types at 30°C and 33°C. In other words, view genes in which gene expression of parental phenotype is mediated by temperature.
For this approach, the DESeq2 guidance states: “The key point to remember about designs with interaction terms is that, unlike for a design ~genotype + condition, where the condition effect represents the overall effect controlling for differences due to genotype, by adding genotype:condition, the main condition effect only represents the effect of condition for the reference level of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms genotypeII.conditionB and genotypeIII.conditionB give the difference between the condition effect for a given genotype and the condition effect for the reference genotype.” In the examples for this effect, here is how it is recommended to be used:
# the interaction term for condition effect in genotype III vs genotype II.
# this tests if the condition effect is different in III compared to II
results(dds, contrast=list("genotypeIII.conditionB", "genotypeII.conditionB"))
Here, we want to know if the temperature effect is different in Nonbleached vs Bleached larvae.
Therefore, I will use contrast DEGs between temperature30.parentNonbleached
and temperature30.parentBleached
or temperature33.parentNonbleached
and temperature33.parentBleached
in the DESeq2 results. This will examine the difference between bleached and nonbleached larvae at 30°C and then at 33°C. Each of the NB and B parental groups have been compared to the wildtype group as a reference. This will provide us with DEG’s between parent groups at temperature.
Create dataframe for each comparison and add a column for the specific contrast.
I am changing the p-value to 0.05 and the fold change to >1 for this test. This is because we have far fewer genes detected in the interaction as discussed above. To test this hypothesis, we want to find all genes that are significant in this interaction.
DEG_I <- DESeq(gdds, test="Wald")
resultsNames(DEG_I)
#this shows the comparisons made (shown similar to lm output in terms of interactions)
NBvsB_30_deg<-results(DEG_I, contrast=list("temperature30.parentNonbleached", "temperature30.parentBleached"))
NBvsB_30_deg <- as.data.frame(subset(NBvsB_30_deg, padj<0.05))
NBvsB_30_deg <- as.data.frame(subset(NBvsB_30_deg, abs(log2FoldChange)>1))
NBvsB_30_deg$contrast <- c("NBvsB_30C")
NBvsB_30_deg$gene <- rownames(NBvsB_30_deg)
rownames(NBvsB_30_deg) <- NULL
dim(NBvsB_30_deg)
NBvsB_33_deg<-results(DEG_I, contrast=list("temperature33.parentNonbleached", "temperature33.parentBleached"))
NBvsB_33_deg <- as.data.frame(subset(NBvsB_33_deg, padj<0.05))
NBvsB_33_deg <- as.data.frame(subset(NBvsB_33_deg, abs(log2FoldChange)>1))
NBvsB_33_deg$contrast <- c("NBvsB_33C")
NBvsB_33_deg$gene <- rownames(NBvsB_33_deg)
rownames(NBvsB_33_deg) <- NULL
dim(NBvsB_33_deg)
There are 7 DEGs between NB and B at 30°C. There are 49 at 33°C. In these contrasts, positive fold changes indicate higher expression in the first listed contrast - in this case, Nonbleached samples.
Combine this list of DEGs into one dataframe.
DEG_interaction<-rbind(NBvsB_30_deg, NBvsB_33_deg)
dim(DEG_interaction)
length(unique(DEG_interaction$gene))
There are 56 total genes and 55 unique genes, indicating that only one is shared between 30°C and 33°C.
DEG_interaction_unique <- DEG_interaction
dim(DEG_interaction_unique)
str(DEG_interaction_unique)
length(unique(DEG_interaction_unique$gene))
There are 55 genes that are differentially expressed between NB and B at 30 or 33°C.
Get a VST normalized gene count matrix for these DEGs.
DEG_interaction_vst <- gdds[unique(DEG_interaction_unique$gene)]
dim(DEG_interaction_vst)
DEG_interaction_vst <- varianceStabilizingTransformation(DEG_interaction_vst)
Upset plot of genes between tempeature contrasts
Create an upset diagram showing genes unique and shared in each temperature comparison.
list1_interaction<-DEG_interaction_unique%>%
filter(contrast=="NBvsB_30C")
dim(list1_interaction)
list2_interaction<-DEG_interaction_unique%>%
filter(contrast=="NBvsB_33C")
dim(list2_interaction)
ven_interaction <- venndetail(list("NBvsB_30C" = list1_interaction$gene, "NBvsB_33C" = list2_interaction$gene))
upset_interaction<-plot(ven_interaction, type = "upset");upset_interaction
pdf("figures/rna_seq/upset_interaction.pdf", width=8, height=6)
plot(upset_interaction, type = "upset")
dev.off()
There are 48 DEGs at 33°C and 6 at 30°C with 1 shared between the contrasts.
Plot a volcano plot for these comparisons
Generate a volcano plot for each of the two contrasts.
NB vs B at 30°C
res <- results(DEG, contrast=list("temperature30.parentNonbleached", "temperature30.parentBleached"))
volcano_plot_NBvsB_30C<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'Nonbleached vs Bleached at 30°C',
pCutoff = 0.05,
FCcutoff = 1,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_NBvsB_30C
NB vs B at 33°C
res <- results(DEG, contrast=list("temperature33.parentNonbleached", "temperature33.parentBleached"))
volcano_plot_NBvsB_33C<-EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'Nonbleached vs Bleached at 33°C',
pCutoff = 0.05,
FCcutoff = 1,
pointSize = 3.0,
labSize = 0.0,
legendPosition = 'none');volcano_plot_NBvsB_33C
Plot all volcano plots together.
volcano_interaction_grid<-plot_grid(volcano_plot_NBvsB_30C, volcano_plot_NBvsB_33C, align="vh", nrow=1, ncol=2)
ggsave("figures/rna_seq/volcano_interaction.pdf", plot=volcano_interaction_grid, width=12, height=6)
ggsave("figures/rna_seq/volcano_interaction.png", plot=volcano_interaction_grid, width=12, height=6)
There are more red dots in these plots than there should be at 30°C - I expect there is some discrepancy between the DEG analysis and this EnhancedVolcano
package. I’ll look into this more.
Plot a heatmap for genes that are differentially expressed between parent phenotypes at elevated temperature
Generate a heatmap of these interaction DEGs.
Set themes and metadata.
legend_names_col = colnames(assay(DEG_interaction_vst))
df <- as.data.frame(colData(gdds)[,c("temperature","parent")])
df$temperature <- factor(as.character(df$temperature), levels=c("27", "30", "33"))
df$parent <- factor(as.character(df$parent), levels=c("Wildtype", "Bleached", "Nonbleached"))
df<-df%>%
arrange(temperature, parent)
ann_colors = list(
temperature = c("27" = "lightblue", "30" = "pink2","33"= "red2"),
parent = c("Bleached" = "orange", "Nonbleached" = "brown4", "Wildtype" = "gray")
)
Plot heatmap.
pdf("figures/rna_seq/DEG_heatmap_interaction.pdf", width=15, height=20)
pheatmap(assay(DEG_interaction_vst), cluster_rows=TRUE, show_rownames=FALSE, color=inferno(10), show_colnames=TRUE, fontsize_row=3, scale="row", cluster_cols=TRUE, annotation_col=df, annotation_colors = ann_colors, labels_col=legend_names_col)
dev.off()
This shows some clear genes that are either up or down regulated in the bleached larvae compared to other parents. I’m excited to see what these genes are doing in functional enrichment analyses.
Run a PERMANOVA and plot a PCA for interaction comparisons at elevated temperatures
Conduct PERMANOVA by parent and temperature of interaction DEGs. Here we are testing the expression of the DEG’s identified at 30°C and 33°C.
Export data for PERMANOVA test.
test_DEG_interaction<-t(assay(DEG_interaction_vst)) #export as matrix
test_DEG_interaction<-as.data.frame(test_DEG_interaction)
#add category columns
test_DEG_interaction$sample<-rownames(test_DEG_interaction)
test_DEG_interaction$temperature<-metadata$temperature[match(test_DEG_interaction$sample, metadata$sample)]
test_DEG_interaction$parent<-metadata$parent[match(test_DEG_interaction$sample, metadata$sample)]
Build PERMANOVA model for DEG’s that are different between NB and B at elevated temperatures.
dim(test_DEG_interaction)
scaled_test_DEG_interaction <-prcomp(test_DEG_interaction[c(1:55)], scale=TRUE, center=TRUE)
fviz_eig(scaled_test_DEG_interaction)
# scale data
vegan_DEG_interaction <- scale(test_DEG_interaction[c(1:55)])
# PerMANOVA
permanova_DEG_interaction<-adonis2(vegan_DEG_interaction ~ parent*temperature, data = test_DEG_interaction, method='eu')
permanova_DEG_interaction
About 30% of variance is explained by first PC followed by 15% on the second PC and <10% on the third PC.
The results are:
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = vegan_DEG_interaction ~ parent * temperature, data = test_DEG_interaction, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 1609.1 0.13143 6.0983 0.001 ***
temperature 2 3496.3 0.28558 13.2504 0.001 ***
parent:temperature 4 1200.7 0.09807 2.2752 0.001 ***
Residual 45 5936.9 0.48492
Total 53 12243.0 1.00000
Interaction is significant as expected. Strong parent and temperature effects are also present as expected.
Plot a PCA of DEG’s for the DEGs from this interaction.
gPCAdata_DEG_interaction <- plotPCA(DEG_interaction_vst, intgroup = c("temperature", "parent"), returnData=TRUE)
percentVar_DEG_interaction <- round(100*attr(gPCAdata_DEG_interaction, "percentVar")) #plot PCA of samples with all data
DEG_PCA_interaction <- ggplot(gPCAdata_DEG_interaction, aes(PC1, PC2, color=parent, shape=temperature)) +
geom_point(size=3) +
ggtitle(expression(bold("DEGs: NB vs B at 30-33°C (55 genes)")))+
xlab(paste0("PC1: ",percentVar_DEG_parent[1],"% variance")) +
ylab(paste0("PC2: ",percentVar_DEG_parent[2],"% variance")) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(x=1, y=-2.5, label="p[Parent]=0.001", color="black")+
geom_text(x=1, y=-3, label="p[Temperature]=0.001", color="black")+
geom_text(x=1, y=-3.5, label="p[Interaction]=0.001", color="black")+
theme_classic() + #Set background color
theme(panel.border = element_blank(), # Set border
#panel.grid.major = element_blank(), #Set major gridlines
#panel.grid.minor = element_blank(), #Set minor gridlines
axis.line = element_line(colour = "black"), #Set axes color
plot.background=element_blank()); DEG_PCA_interaction
ggsave("figures/rna_seq/pca_DEG_interaction.pdf", DEG_PCA_interaction, width=8, height=4)
ggsave("figures/rna_seq/pca_DEG_interaction.png", DEG_PCA_interaction, width=8, height=4)
From this we can see very clear separation between parental groups at 33°C with moderate separation between 30°C, but far less than seen at 33°C. There is not much if any separation at 27°C. This shows that gene expression is more differential between parents at 33°C, but not at lower temperatures. This is exactly what we are looking for in the interaction.
Save lists of gene counts and DEG information
Next, export the list of DEGs with DESeq2 results (p-values and fold change data for each contrast) and a normalized gene count matrix for that list of DEGs for temperature, parent, and interactive effects.
Temperature
DEG_temperature
: List of DEG’s and fold change/pvalues
DEG_temperature_vst
: Count matrix of genes from DEG list
deg_temperature_list<-DEG_temperature%>%
select(gene, everything())
write_csv(deg_temperature_list, "output/rna_seq/temperature_deg_list.csv")
dim(DEG_temperature_vst)
deg_temperature_counts<-t(assay(DEG_temperature_vst))
deg_temperature_counts<-as.data.frame(deg_temperature_counts)
dim(deg_temperature_counts)
deg_temperature_counts$sample<-rownames(deg_temperature_counts)
deg_temperature_counts<-deg_temperature_counts%>%
select(sample, everything())
write_csv(deg_temperature_counts, "output/rna_seq/temperature_deg_counts.csv")
Parent
DEG_parent
: List of DEG’s and fold change/pvalues
DEG_parent_vst
: Count matrix of genes from DEG list
deg_parent_list<-DEG_parent%>%
select(gene, everything())
write_csv(deg_parent_list, "output/rna_seq/parent_deg_list.csv")
dim(DEG_parent_vst)
deg_parent_counts<-t(assay(DEG_parent_vst))
deg_parent_counts<-as.data.frame(deg_parent_counts)
dim(deg_parent_counts)
deg_parent_counts$sample<-rownames(deg_parent_counts)
deg_parent_counts<-deg_parent_counts%>%
select(sample, everything())
write_csv(deg_parent_counts, "output/rna_seq/parent_deg_counts.csv")
NB vs B at elevated temperature
DEG_interaction_unique
: List of DEG’s and fold change/pvalues
DEG_interaction_vst
: Count matrix of genes from DEG list
dim(DEG_interaction_unique)
deg_interaction_list<-DEG_interaction_unique%>%
select(gene, everything())
write_csv(deg_interaction_list, "output/rna_seq/NBvsB_30-33_deg_list.csv")
dim(DEG_interaction_vst)
deg_interaction_counts<-t(assay(DEG_interaction_vst))
deg_interaction_counts<-as.data.frame(deg_interaction_counts)
dim(deg_interaction_counts)
deg_interaction_counts$sample<-rownames(deg_interaction_counts)
deg_interaction_counts<-deg_interaction_counts%>%
select(sample, everything())
write_csv(deg_interaction_counts, "output/rna_seq/NBvsB_30-33_deg_counts.csv")
All genes detected
List of all genes in the dataset (gvst) = 24,005 genes which will serve as a reference for functional enrichment.
dim(gvst)
all_genes<-t(assay(gvst))
all_genes<-as.data.frame(all_genes)
dim(all_genes)
all_genes$sample<-rownames(all_genes)
all_genes<-all_genes%>%
select(sample, everything())
write_csv(file="output/rna_seq/all_genes.csv", all_genes)
Next steps
Next, I am going to use the exported DEG lists and gene count matrices to conduct functional enrichment analyses.
Returning to our hypotheses, here are a summary of the DEG analyses so far:
- Gene expression will be different under elevated temperature (27°C vs 30-33°C) with increased expression of stress response genes. We found differentially expressed genes at elevated temperatures with strong temperature effects on gene expression.
- Gene expression will be different by parental phenotype. We found differentially expressed genes between each parental phenotype with effects of parent on gene expression.
- There will be differentially expressed genes between bleached and nonbleached offspring at high temperature, due to the parental thermal tolerance history. We expect Nonbleached offspring to show transcriptomic indicators of tolerance to stress. We found differentially expressed genes between nonbleached and bleached parental phenotypes at elevated temperatures.
Now I will conduct functional enrichment to look at functions of these DEGs.