Preliminary lipidomic and metabolomics analysis of Hawaii 2023 larval data
This post details preliminary analysis of lipidomic and metabolomic data for the Hawaii 2023 larval project.
Overview
This post details lipidomic and metabolomic analyses for the Montipora capitata 2023 larval thermal tolerance project in collaboration with Jennifer Matthews at University of Technology Sydney. See my notebook posts and my GitHub repo for information on this project.
Lipidomics and metabolomic data were obtained through a tri-extraction process to obtain lipid, metabolite, and protein pellet fractions from each sample. Lipids were analyzed using LCMS at University of Technology Sydney and metabolomics were analyzed via GCMS at Metabolomics Australia. See my notebook posts for more information. Protein was quantified using a Bradford assay and is used for normalization.
Files for this project include lipidomic and metabolomic files for 54 samples. There are three parental phenotypes of these larvae: Nonbleached (from parents that were unbleached during a previous heat wave), bleached (from parents that bleached and recovered during a previous heat wave), and wildtype (collected as a sample from the natural reef). Those from bleached parents are dominated by Cladocopium symbionts and those from wildtype and non-bleached colonies have a mixture of Cladocopium and Durusdinium symbionts.
Each of these parental phenotypes was exposed to 27, 30, or 33°C for 4 hour incubations at 9 days post fertilization. We have n=6 biological replicates per parental phenotype x temperature treatment. There are no random effects, temperature exposures were conducted in water baths in sealed glass jars with n=1 per temperature. Environmental data for these incubations is in my GitHub repo.
At this point in analysis, metabolomics data has undergone initial QC checks and curation (J. Matthews) and lipidomics data underwent peak identification and selection in MS-DIAL (A. Huffmyer). All data are available on GitHub here.
In this analysis, I analyzed peak area as well as peak height for lipidomics. The entry below contains full scripts for peak area as well as metabolomic analyses. I have added a section for analysis of peak height that just contains the output figures so that we can view any differences between results for height and area.
Lipidomics (Peak area)
library(tidyverse)
library(stringr)
library(readxl)
library(purrr)
library(lubridate)
library(ggplot2)
library(broom)
library(cowplot)
library(mixOmics)
library(dplyr)
library(conflicted)
library(vegan)
library(factoextra)
library(ggfortify)
library(RVAideMemoire)
library(ComplexHeatmap)
library(circlize)
library(tibble)
library(viridis)
library(genefilter)
library(UpSetR)
conflict_prefer("select", winner = "dplyr", quiet = FALSE)
conflict_prefer("filter", winner = "dplyr", quiet = FALSE)
conflict_prefer("rename", winner = "dplyr", quiet = FALSE)
Data manipulation and preparations
Read in all files
metadata<-read_csv("data/lipids_metabolites/lipidomics/lipid_metadata_huffmyer_matthews.csv")%>%
rename(sample=`Vial ID`, tube=`Sample ID`, temperature=Temperature, parent=Parent, type=Type, code=Code)
neg_values<-read_csv("data/lipids_metabolites/lipidomics/processed-data/negative/PeakValues_2_2024_09_19_13_01_19.csv")
neg_peaks<-read_csv("data/lipids_metabolites/lipidomics/processed-data/negative/PeakMaster_2_2024_09_19_13_01_19.csv")
pos_values<-read_csv("data/lipids_metabolites/lipidomics/processed-data/positive/PeakValues_3_2024_09_19_14_24_22.csv")
pos_peaks<-read_csv("data/lipids_metabolites/lipidomics/processed-data/positive/PeakMaster_3_2024_09_19_14_24_22.csv")
protein<-read_csv("data/lipids_metabolites/protein/protein_calculated.csv")
Add protein data to the metadata.
metadata$protein.ug<-protein$`Total Protein ug`[match(metadata$tube, protein$Sample)]
Negative polarity
How many peaks do we have?
length(neg_peaks$`Metabolite name`)
162 peaks.
Reformat data set to be useful.
neg_peaks<-neg_peaks%>%
rename(peak=`Alignment ID`, lipid=`Metabolite name`, formula=Formula, ontology=Ontology)%>%
select(peak, lipid, formula, ontology)%>%
mutate(polarity=c("negative"))
neg_values<-neg_values%>%
rename(peak=ID, file=File, group=Class, area=Area)%>%
select(peak, file, group, area)%>%
mutate(polarity=c("negative"))
The data frames now contain peak
for a unique ID for each lipid detected. This number links the two files. Area represents the normalized area after internal standard normalization. We will further normalize this value below.
Format the file names to correspond to sample IDs in the metadata sheet.
#first remove the test PBQCs ran at 5 and 10 uL loadings and keep the standard 20 uL loading
neg_values<-neg_values%>%
filter(!file=="20240423_Mcap_neg_PBQC1_5")%>%
filter(!file=="20240423_Mcap_neg_PBQC1_10")%>%
mutate(file = str_replace(file, "20240423_Mcap_neg_PBQC1_20", "20240423_Mcap_neg_PBQC1"))%>%
mutate(sample = str_extract(file, "[^_]+$"))
Merge together all data from the metadata, values, and peak information to one data frame.
negative<-left_join(metadata, neg_values)
negative<-negative%>%
select(sample, temperature, parent, type, peak, area, polarity, protein.ug)
negative<-left_join(negative, neg_peaks)
Next, add together the values for all peaks identified as the same metabolite for each sample.
How many lipids have multiple peaks detected separately?
length(unique(neg_peaks$lipid))
There are 162 unique lipids and 162 peaks.
Show an example of entries that have the same metabolite with multiple peak IDs.
filtered_values <- negative %>%
group_by(lipid) %>%
filter(n_distinct(peak) > 1) %>%
ungroup()%>%
arrange(lipid, sample, peak)
print(filtered_values)
No duplicates.
Remove PBQC’s and Blanks. These were used for peak identification in MSDIAL and are not used downstream at this point.
negative<-negative%>%
filter(!type=="PBQC")%>%
filter(!type=="Blank")
Conduct internal standard normalization.
Peak 16879 (PG) will be used as our internal standard for negative polarity. It was consistent across samples (low fold change), high signal:noise, and is mid-retention.
#pull out the name of the standard being used
standard_neg<-neg_peaks%>%
filter(peak=="16879")%>%
pull(lipid)
#make a dataframe of standard values for each sample
standard_neg_value<-negative%>%
filter(lipid %in% standard_neg)%>%
group_by(sample)%>%
summarise(area_standard=mean(area, na.rm=TRUE))
#add this value back into the master data frame for corrections.
negative<-negative%>%
left_join(.,standard_neg_value)
Remove sample 1, it did not have a signal for the internal standard and was an outlier in MS-DIAL analysis.
negative<-negative%>%
filter(!sample=="1")
Normalize to the standard.
negative<-negative%>%
mutate(area_corrected=area/area_standard)
Finally, normalize corrected area to sample protein. This will generate units of area / ug protein.
negative<-negative%>%
mutate(area_normalized=area_corrected/protein.ug)
Repeat for positive polarity.
Positive polarity
How many peaks do we have?
length(pos_peaks$`Metabolite name`)
896 peaks
Reformat data set to be useful.
pos_peaks<-pos_peaks%>%
rename(peak=`Alignment ID`, lipid=`Metabolite name`, formula=Formula, ontology=Ontology)%>%
select(peak, lipid, formula, ontology)%>%
mutate(polarity=c("positive"))
pos_values<-pos_values%>%
rename(peak=ID, file=File, group=Class, area=Area)%>%
select(peak, file, group, area)%>%
mutate(polarity=c("positive"))
The data frames now contain peak
for a unique ID for each lipid detected. This number links the two files. Area represents the normalized area after internal standard normalization. We will further normalize this value below.
Format the file names to correspond to sample IDs in the metadata sheet.
pos_values<-pos_values%>%
mutate(sample = str_extract(file, "[^_]+$"))
Merge together all data from the metadata, values, and peak information to one data frame.
positive<-left_join(metadata, pos_values)
positive<-positive%>%
select(sample, temperature, parent, type, peak, area, polarity, protein.ug)
positive<-left_join(positive, pos_peaks)
Next, add together the values for all peaks identified as the same metabolite for each sample.
How many lipids have multiple peaks detected separately?
length(unique(pos_peaks$lipid))
There are 896 unique lipids and 896 total peaks.
Show an example of entries that have the same metabolite with multiple peak IDs.
filtered_values <- positive %>%
group_by(lipid) %>%
filter(n_distinct(peak) > 1) %>%
ungroup()%>%
arrange(lipid, sample, peak)
print(filtered_values)
levels(factor(filtered_values$lipid))
No duplicates
Remove PBQC’s and blanks. These were used for peak identification in MSDIAL and are not used downstream at this point.
positive<-positive%>%
filter(!type=="PBQC")%>%
filter(!type=="Blank")
Conduct internal standard normalization.
Peak 19029 (LPC) will be used as our internal standard for positive polarity. It was consistent across samples (low fold change), high signal:noise, and was the best representative from standards.
#pull out the name of the standard being used
standard_pos<-pos_peaks%>%
filter(peak=="19029")%>%
pull(lipid)
#make a dataframe of standard values for each sample
standard_pos_value<-positive%>%
filter(lipid %in% standard_pos)%>%
group_by(sample)%>%
summarise(area_standard=mean(area, na.rm=TRUE))
#add this value back into the master data frame for corrections.
positive<-positive%>%
left_join(.,standard_pos_value)
Remove sample 1, it did not have a signal for the internal standard for negative and therefore is being removed from the data analysis.
positive<-positive%>%
filter(!sample=="1")
Normalize to the standard.
positive<-positive%>%
mutate(area_corrected=area/area_standard)
Finally, normalize corrected area to sample protein. This will generate units of area / ug protein.
positive<-positive%>%
mutate(area_normalized=area_corrected/protein.ug)
Join polarities together
Join positive and negative datasets.
length(negative$lipid)
length(positive$lipid)
lipids<-rbind(negative, positive)
length(lipids$lipid)
Select polarity for peaks in both polarity files. How many lipids are detected at both polarities?
filtered_values <- lipids %>%
group_by(lipid) %>%
filter(n_distinct(polarity) > 1) %>%
ungroup()%>%
arrange(lipid, sample, polarity)
print(filtered_values)
length(unique(filtered_values$lipid))
There are 40 lipids detected at both polarities. Select the polarity with the highest Area value for each lipid.
# Identify lipids with both polarities
lipids_with_both_polarities <- lipids %>%
group_by(lipid) %>%
filter(n_distinct(polarity) == 2)
# Filter lipids with both polarities to keep only the one with the highest area
filtered_lipids <- lipids_with_both_polarities %>%
group_by(lipid) %>%
filter(area == max(area)) %>%
ungroup()
# Identify lipids that only have one entry (i.e., do not have both polarities)
single_entry_lipids <- lipids %>%
group_by(lipid) %>%
filter(n_distinct(polarity) == 1) %>%
ungroup()
# Combine filtered lipids and single entry lipids
lipids_filtered <- bind_rows(filtered_lipids, single_entry_lipids)
How many lipids are detected at both polarities?
filtered_values <- lipids_filtered %>%
group_by(lipid) %>%
filter(n_distinct(polarity) > 1) %>%
ungroup()%>%
arrange(lipid, sample, polarity)
print(filtered_values)
length(unique(filtered_values$lipid))
Duplicates are now taken care of.
View ontology categories available.
levels(as.factor(lipids_filtered$ontology))
ontology<-lipids_filtered%>%ungroup()%>%select(lipid, formula, ontology)%>%unique()
[1] “AHexCAS” “AHexCer” “AHexCS” “AHexSIS” “ASM” “BileAcid” “BMP”
[8] “BRSE” “CAR” “CASE” “CE” “Cer_ADS” “Cer_AP” “Cer_AS”
[15] “Cer_EOS” “Cer_HDS” “Cer_HS” “Cer_NDS” “Cer_NP” “Cer_NS” “CL”
[22] “CoQ” “DCAE” “DEGSE” “DG” “DGDG” “DMPE” “EtherDG”
[29] “EtherDGDG” “EtherLPC” “EtherLPE” “EtherMGDG” “EtherOxPC” “EtherPC” “EtherPE”
[36] “EtherPI” “EtherPS” “EtherTG” “FA” “FAHFA” “HBMP” “HexCer_AP”
[43] “HexCer_HS” “HexCer_NDS” “KDCAE” “KLCAE” “LCAE” “LPC” “LPE”
[50] “MG” “MGDG” “NAE” “NAGly” “NAGlySer” “NAOrn” “NATau”
[57] “Others” “OxFA” “OxPC” “OxPE” “OxPI” “OxTG” “PA”
[64] “PC” “PE” “PEtOH” “PG” “PI” “PMeOH” “PS”
[71] “SHexCer” “SISE” “SL” “SM” “SSulfate” “ST” “TG”
[78] “TG_EST” “Unknown” “VAE”
lipids_filtered<-lipids_filtered%>%
select(sample, lipid, temperature, parent, type, polarity, formula, ontology, area_normalized)%>%
group_by(sample, lipid, temperature, parent, type, formula, ontology)%>%
summarize(area_normalized=sum(area_normalized, na.rm=TRUE))
Remove standard lipids - those with (d7) or (d9) in the lipid name.
lipid_d7 <- lipids_filtered %>%
filter(str_detect(lipid, "\\(d7\\)"))%>%
pull(lipid)%>%
unique()
lipid_d9 <- lipids_filtered %>%
filter(str_detect(lipid, "\\(d9\\)"))%>%
pull(lipid)%>%
unique()
lipids_filtered<-lipids_filtered%>%
filter(!lipid %in% lipid_d7)%>%
filter(!lipid %in% lipid_d9)
length(unique(lipids_filtered$lipid))
We have 1009 lipids in our dataset.
hist(lipids_filtered$area_normalized)
Plot and remove outliers.
wide_data<-lipids_filtered%>%
ungroup()%>%
select(!formula)%>%
select(!ontology)%>%
select(!type)%>%
pivot_wider(names_from=lipid, values_from=area_normalized)
wide_data%>%
select_if(~ any(is.na(.)))
Replace NA’s with 0’s.
wide_data[is.na(wide_data)] <- 0
wide_data%>%
select_if(~ any(is.na(.)))
str(wide_data)
wide_data$temperature<-factor(wide_data$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
wide_data$parent<-factor(wide_data$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
All characteristics are renamed.
scaled.pca<-prcomp(wide_data[c(4:1012)], scale=TRUE, center=TRUE)
pca1<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=FALSE, loadings.label.size=5, loadings.label.vjust=-1, size=5) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(aes(x = PC1, y = PC2, label = sample), vjust = -0.5)+
theme_classic()+
theme(legend.text = element_text(size=18),
legend.position="none",
plot.background = element_blank(),
legend.title = element_text(size=18, face="bold"),
axis.text = element_text(size=18),
axis.title = element_text(size=18, face="bold"));pca1
Sample 23 and 48 are outliers, remove.
wide_data<-wide_data%>%
filter(!sample=="23")%>%
filter(!sample=="48")
lipids_filtered<-lipids_filtered%>%
filter(!sample=="23")%>%
filter(!sample=="48")
Conduct filtering for relative standard deviation
Make a dataframe that calculates the standard deviation of the lipid area within treatment relative to the mean (i.e., how “large” is the SD).
The formula will be:
(sd*100)/mean
Calculate for each lipid in each treatment group.
test<-lipids_filtered%>%
group_by(lipid, parent, temperature)%>%
summarise(RSD=(sd(area_normalized)*100)/mean(area_normalized))
length(unique(test$lipid))
hist(test$RSD)
1009 lipids before
Keep lipids with RSD <40 in all treatments.
test <- test %>%
group_by(lipid, parent, temperature) %>%
filter(all(RSD < 40))
length(unique(test$lipid))
keep_list<-c(unique(test$lipid))
hist(test$RSD)
843 metabolites after
Filter the filtered dataset to include the selected metabolites after RSD filtering.
lipids_filtered<-lipids_filtered%>%
filter(lipid %in% keep_list)
We are now ready to conduct analysis!
Export the processed dataset.
write_csv(lipids_filtered, file="output/lipids_metabolites/lipids/processed_lipids.csv")
Unsupervised analysis
Conduct a PCA and PERMANOVA to examine the effects of parent and temperature on the lipidome of coral larvae.
Convert to wide format.
wide_data<-lipids_filtered%>%
ungroup()%>%
select(!formula)%>%
select(!ontology)%>%
select(!type)%>%
pivot_wider(names_from=lipid, values_from=area_normalized)
wide_data%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
str(wide_data)
wide_data$temperature<-factor(wide_data$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
wide_data$parent<-factor(wide_data$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
All characteristics are renamed.
View scree plot.
scaled.pca<-prcomp(wide_data[c(4:846)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(wide_data[c(4:846)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = wide_data, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = wide_data, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 3925 0.09312 2.5252 0.024 *
temperature 2 3346 0.07938 2.1525 0.050 *
parent:temperature 4 2238 0.05309 0.7198 0.703
Residual 42 32642 0.77441
Total 50 42150 1.00000
Because the unsupervised analysis shows an effect of parent and temperature, we will proceed with PLSDA analyses to look at VIPS that separate parent groups and those that separate temperature groups.
pca1<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=FALSE, loadings.label.size=5, loadings.label.vjust=-1, size=5) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(aes(x = PC1, y = PC2, label = sample), vjust = -0.5)+
theme_classic()+
theme(legend.text = element_text(size=18),
legend.position="none",
plot.background = element_blank(),
legend.title = element_text(size=18, face="bold"),
axis.text = element_text(size=18),
axis.title = element_text(size=18, face="bold"));pca1
pca2<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca2
pca2b<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca2b
pca_plots<-plot_grid(pca2, pca2b, ncol=2, nrow=1)
ggsave(pca_plots, filename="figures/lipids/pca_plots.jpeg", width=12, height=4)
Supervised PLS-DA analysis
Script modified from the following tutorial: https://mixomicsteam.github.io/Bookdown/plsda.html
PLS-DA for the effect of parent
Generate a PLS-DA and plot.
#assigning datasets
X <- wide_data
levels(as.factor(X$parent))
levels(as.factor(X$temperature))
Y <- as.factor(X$parent) #select treatment names
Y
X<-X[4:846] #pull only data columns
# run PLSDA
MyResult.plsda <- plsda(X,Y, ncomp=2) # 1 Run the method with ncomp = groups -1
plotIndiv(MyResult.plsda) # 2 Plot the samples
cols<-c("gray", "orange", "brown4")
pdf("figures/lipids/PLSDA_Parent.pdf", width=9, height=6)
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
dev.off()
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Set model to determine VIPs.
MyResult.plsda_parent <- plsda(X,Y, ncomp=2)
#view plsda model again
plotIndiv(MyResult.plsda_parent, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent", ellipse = TRUE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Extract VIP scores to view the lipids that are most highly differentiating lipids.
#extract
parent_VIP <- PLSDA.VIP(MyResult.plsda_parent, graph=TRUE)
parent_VIP_DF <- as.data.frame(parent_VIP[["tab"]])
parent_VIP_DF
# Converting row names to column
parent_VIP_table <- rownames_to_column(parent_VIP_DF, var = "Lipid")
#filter for VIP > 1
parent_VIP_1 <- parent_VIP_table %>%
filter(VIP >= 1)
write_csv(parent_VIP_1, "output/lipids_metabolites/lipids/Overall_Parent_VIPs.csv")
#plot
parent_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Lipid,VIP,sum))) +
geom_point() +
ylab("Lipid") +
xlab("VIP Score") +
ggtitle("Parent VIP") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
pdf("figures/lipids/Parent_VIP.pdf", width=10, height=35)
parent_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Lipid,VIP,sum))) +
geom_point() +
ylab("Lipid") +
xlab("VIP Score") +
ggtitle("Parent VIP > 1") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
dev.off()
These lipids represent VIPs that drive separation between parent groups.
Make a list of vips.
parent_vip_list<-parent_VIP_1%>%
pull(Lipid)
There are 237 VIPs.
Plot heatmap of z score of lipids across life stages.
# Filter data for VIP metabolites
parent_vip_data <- lipids_filtered %>% filter(lipid %in% parent_vip_list)
# Convert counts to Z-scores
parent_vip_data <- parent_vip_data %>%
group_by(lipid) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(parent, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- parent_vip_data %>%
dplyr::select(lipid, parent, z_score) %>%
spread(key = parent, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = FALSE,
show_column_names = TRUE,
#row_split=6,
row_title = "Lipid",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/VIP_heatmap_parent.pdf", width=4, height=6)
draw(heatmap)
dev.off()
There are sets of lipids that are elevated in bleached parent larvae and another set that are elevated in nonbleached and wildtype. This is interesting because larvae from bleached parents are dominated by Cladocopium and larvae from nonbleached and wildtype parents have a mixture of Cladocopium and Durusdinium symbionts.
View a PCA of these VIPs.
vip_data_parent<-lipids_filtered%>%
ungroup()%>%
select(!formula)%>%
select(!ontology)%>%
select(!type)%>%
filter(lipid %in% parent_vip_list)%>%
pivot_wider(names_from=lipid, values_from=area_normalized)
vip_data_parent%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
vip_data_parent$temperature<-factor(vip_data_parent$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
vip_data_parent$parent<-factor(vip_data_parent$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
All characteristics are renamed.
scaled.pca<-prcomp(vip_data_parent[c(4:240)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(vip_data_parent[c(4:240)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = vip_data_parent, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = vip_data_parent, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 2631.8 0.22209 7.0674 0.001 ***
temperature 2 874.1 0.07376 2.3472 0.031 *
parent:temperature 4 524.0 0.04422 0.7035 0.753
Residual 42 7820.1 0.65993
Total 50 11850.0 1.00000
pca_vip_parent1<-ggplot2::autoplot(scaled.pca, data=vip_data_parent, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_parent1
pca_vip_parent2<-ggplot2::autoplot(scaled.pca, data=vip_data_parent, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_parent2
pca_plots_parent<-plot_grid(pca_vip_parent1, pca_vip_parent2, ncol=2, nrow=1)
ggsave(pca_plots_parent, filename="figures/lipids/pca_plots_parentVIP.jpeg", width=12, height=4)
PLS-DA for the effect of temperature
Generate a PLS-DA and plot.
#assigning datasets
X <- wide_data
levels(as.factor(X$parent))
levels(as.factor(X$temperature))
Y <- as.factor(X$temperature) #select treatment names
Y
levels(Y)
X<-X[4:846] #pull only data columns
# run PLSDA
MyResult.plsda <- plsda(X,Y, ncomp=2) # 1 Run the method with ncomp = groups -1
plotIndiv(MyResult.plsda) # 2 Plot the samples
cols<-c("blue", "orange", "red")
pdf("figures/lipids/PLSDA_Temperature.pdf", width=9, height=6)
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
dev.off()
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Set model to determine VIPs.
MyResult.plsda_temperature <- plsda(X,Y, ncomp=2)
#view plsda model again
plotIndiv(MyResult.plsda_temperature, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = TRUE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Extract VIP scores for the most highly differentiating lipids.
#extract
temperature_VIP <- PLSDA.VIP(MyResult.plsda_temperature, graph=TRUE)
temperature_VIP_DF <- as.data.frame(temperature_VIP[["tab"]])
temperature_VIP_DF
# Converting row names to column
temperature_VIP_table <- rownames_to_column(temperature_VIP_DF, var = "Lipid")
#filter for VIP > 1
temperature_VIP_1 <- temperature_VIP_table %>%
filter(VIP >= 1)
write_csv(temperature_VIP_1, "output/lipids_metabolites/lipids/Overall_Temperature_VIPs.csv")
#plot
temperature_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Lipid,VIP,sum))) +
geom_point() +
ylab("Lipid") +
xlab("VIP Score") +
ggtitle("Temperature VIP") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
pdf("figures/lipids/Temperature_VIP.pdf", width=10, height=35)
temperature_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Lipid,VIP,sum))) +
geom_point() +
ylab("Lipid") +
xlab("VIP Score") +
ggtitle("Temperature VIP > 1") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
dev.off()
These lists represent VIPs that drive separation between temperature groups.
Make a list of vips.
temperature_vip_list<-temperature_VIP_1%>%
pull(Lipid)
There are 251 VIPs.
Plot heatmap of z score of lipids across temperature.
# Filter data for VIP metabolites
temperature_vip_data <- lipids_filtered %>% filter(lipid %in% temperature_vip_list)
# Convert counts to Z-scores
temperature_vip_data <- temperature_vip_data %>%
group_by(lipid) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(temperature, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
temperature_vip_data$temperature<-factor(temperature_vip_data$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- temperature_vip_data %>%
dplyr::select(lipid, temperature, z_score) %>%
spread(key = temperature, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = FALSE,
show_column_names = TRUE,
#row_split=6,
row_title = "Lipid",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/VIP_heatmap_temperature.pdf", width=4, height=6)
draw(heatmap)
dev.off()
View a PCA of these VIPs.
vip_data_temperature<-lipids_filtered%>%
ungroup()%>%
select(!formula)%>%
select(!ontology)%>%
select(!type)%>%
filter(lipid %in% temperature_vip_list)%>%
pivot_wider(names_from=lipid, values_from=area_normalized)
vip_data_temperature%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
vip_data_temperature$temperature<-factor(vip_data_temperature$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
vip_data_temperature$parent<-factor(vip_data_temperature$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
All characteristics are renamed.
scaled.pca<-prcomp(vip_data_temperature[c(4:254)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(vip_data_temperature[c(4:254)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = vip_data_temperature, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = vip_data_temperature, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 1228.1 0.09786 2.7436 0.018 *
temperature 2 1219.2 0.09715 2.7237 0.021 *
parent:temperature 4 702.6 0.05598 0.7848 0.681
Residual 42 9400.1 0.74901
Total 50 12550.0 1.00000
pca_vip_temperature1<-ggplot2::autoplot(scaled.pca, data=vip_data_temperature, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_temperature1
pca_vip_temperature2<-ggplot2::autoplot(scaled.pca, data=vip_data_temperature, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_temperature2
pca_plots_temperature<-plot_grid(pca_vip_temperature1, pca_vip_temperature2, ncol=2, nrow=1)
ggsave(pca_plots_temperature, filename="figures/lipids/pca_plots_temperatureVIP.jpeg", width=12, height=4)
PLS-DA for the effect of parent x temperature
Next, run a PLSDA to look for VIPs between groups (combination of parent x temperature treatment).
Generate a PLS-DA and plot.
#assigning datasets
X <- wide_data
levels(as.factor(X$parent))
levels(as.factor(X$temperature))
X$code<-paste(X$parent, X$temperature)
levels(as.factor(X$code))
Y <- as.factor(X$code) #select treatment names
Y
levels(Y)
X<-X[4:846] #pull only data columns
# run PLSDA
MyResult.plsda <- plsda(X,Y, ncomp=5) # 1 Run the method with ncomp = groups -1
plotIndiv(MyResult.plsda) # 2 Plot the samples
cols<-c("orange", "orange3", "orange4", "brown", "brown2", "brown4", "gray", "lightgray", "darkgray")
pdf("figures/lipids/PLSDA_interaction.pdf", width=9, height=6)
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
dev.off()
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Set model to determine VIPs.
MyResult.plsda_interaction <- plsda(X,Y, ncomp=5)
#view plsda model again
plotIndiv(MyResult.plsda_interaction, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent x Temperature", ellipse = TRUE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Extract VIP scores of the most highly differentiating lipids.
#extract
interaction_VIP <- PLSDA.VIP(MyResult.plsda_interaction, graph=TRUE)
interaction_VIP_DF <- as.data.frame(interaction_VIP[["tab"]])
interaction_VIP_DF
# Converting row names to column
interaction_VIP_table <- rownames_to_column(interaction_VIP_DF, var = "Lipid")
#filter for VIP > 1
interaction_VIP_1 <- interaction_VIP_table %>%
filter(VIP >= 1)
write_csv(interaction_VIP_1, "output/lipids_metabolites/lipids/Overall_Interaction_VIPs.csv")
#plot
interaction_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Lipid,VIP,sum))) +
geom_point() +
ylab("Lipid") +
xlab("VIP Score") +
ggtitle("Parent x Temperature VIP") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
pdf("figures/lipids/Interaction_VIP.pdf", width=10, height=35)
interaction_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Lipid,VIP,sum))) +
geom_point() +
ylab("Lipid") +
xlab("VIP Score") +
ggtitle("Interaction VIP > 1") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
dev.off()
These lists represent VIPs that drive separation between parent x temperature groups.
Make a list of vips.
interaction_vip_list<-interaction_VIP_1%>%
pull(Lipid)
There are 302 VIPs.
Plot heatmap of z score of lipids across groups.
# Filter data for VIP metabolites
interaction_vip_data <- lipids_filtered %>% filter(lipid %in% interaction_vip_list)
# Convert counts to Z-scores
interaction_vip_data <- interaction_vip_data %>%
mutate(code=paste(parent, temperature))%>%
group_by(lipid) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(code, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
interaction_vip_data$code<-factor(interaction_vip_data$code, levels=c("Bleached Ambient", "Bleached Moderate (+3)", "Bleached High (+6)", "Nonbleached Ambient", "Nonbleached Moderate (+3)", "Nonbleached High (+6)", "Wildtype Ambient", "Wildtype Moderate (+3)", "Wildtype High (+6)"))
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- interaction_vip_data %>%
dplyr::select(lipid, code, z_score) %>%
spread(key = code, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = FALSE,
show_column_names = TRUE,
#row_split=4,
row_title = "Lipid",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/VIP_heatmap_parentxtemperature.pdf", width=6, height=8)
draw(heatmap)
dev.off()
View a PCA of these VIPs.
vip_data_interaction<-lipids_filtered%>%
ungroup()%>%
select(!formula)%>%
select(!ontology)%>%
select(!type)%>%
filter(lipid %in% interaction_vip_list)%>%
pivot_wider(names_from=lipid, values_from=area_normalized)
vip_data_interaction%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
vip_data_interaction$temperature<-factor(vip_data_interaction$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
vip_data_interaction$parent<-factor(vip_data_interaction$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
All characteristics are renamed.
scaled.pca<-prcomp(vip_data_interaction[c(4:305)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(vip_data_interaction[c(4:305)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = vip_data_interaction, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = vip_data_interaction, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 2779.5 0.18407 5.8640 0.001 ***
temperature 2 1432.4 0.09486 3.0220 0.002 **
parent:temperature 4 934.5 0.06189 0.9858 0.450
Residual 42 9953.7 0.65918
Total 50 15100.0 1.00000
No interactive effect detected.
View a PCA with treatment code information.
test_wide_data<-vip_data_interaction%>%
mutate(code=paste(parent, temperature))
levels(as.factor(test_wide_data$code))
pca_interaction3<-ggplot2::autoplot(scaled.pca, data=test_wide_data, frame.colour="code", loadings=FALSE, colour="code", shape="code", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Group", values=c("orange", "orange", "orange", "brown4", "brown4", "brown4", "gray", "gray", "gray"))+
scale_fill_manual(name="Group", values=c("orange", "orange", "orange", "brown4", "brown4", "brown4", "gray", "gray", "gray"))+
scale_shape_manual(name="Group", values=c(19,17,15,19,17,15,19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_interaction3
ggsave(pca_interaction3, filename="figures/lipids/pca_all_groups_interactionVIP.jpeg", width=8, height=6)
Output lists for LION
Conduct analysis in LION for lipid enrichment. This site is similar to metaboanalyst for enrichment analysis of lipids.
List of parental VIPS
parent_vip_list%>%
as.data.frame()%>%
write_csv(., file="output/lipids_metabolites/lipids/parent_VIP_list.csv")
No significant enrichment of any terms on LION.
List of temperature VIPS
temperature_vip_list%>%
as.data.frame()%>%
write_csv(., file="output/lipids_metabolites/lipids/temperature_VIP_list.csv")
No significant enrichment of any terms on LION.
List of interaction VIPS
interaction_vip_list%>%
as.data.frame()%>%
write_csv(., file="output/lipids_metabolites/lipids/interaction_VIP_list.csv")
Full list of lipids as the background.
full_lipid_list<-lipids_filtered%>%
pull(lipid)%>%
unique()%>%
as.data.frame()%>%
write_csv(., file="output/lipids_metabolites/lipids/full_lipid_list.csv")
What is the overlap in these lists? Create an upset plot.
# Combine lists into a data frame indicating membership (1 = present, 0 = absent)
upset_data <- data.frame(
Lipid = unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)),
List1 = as.numeric(unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)) %in% parent_vip_list),
List2 = as.numeric(unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)) %in% temperature_vip_list),
List3 = as.numeric(unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)) %in% interaction_vip_list)
)
# Convert data to the format needed for the UpSetR plot
plot_data <- upset_data[, -1]
row.names(plot_data) <- upset_data$Lipid
names(plot_data) <- c("Parent", "Temperature", "PxT Group")
# Generate the UpSet plot
pdf("figures/lipids/upset_VIP_list_overlap.pdf", width=6, height=6)
upset(plot_data, sets = c("Parent", "Temperature", "PxT Group"), keep.order = TRUE)
dev.off()
Plot enriched lipids in the group VIP data set
Results from LION:
- No enrichment in parent list
- No enrichment in temperature list
- fatty acids [FA], and fatty acids and conjugates [FA01] enriched in group specific list
Ontology FA
Add in lipid ontology values.
lipids$ontology<-ontology$ontology[match(lipids$lipid, ontology$lipid)]
Plot a heatmap of the target ontologies in the group VIP list (parent x temperature).
Fatty acids
# Filter data for VIP metabolites
FA_data <- lipids %>%
filter(lipid %in% interaction_vip_list) %>%
filter(ontology=="FA") %>%
droplevels()
length(unique(FA_data$lipid))
#16 fatty acids in this data set
# Convert counts to Z-scores
FA_data <- FA_data %>%
group_by(lipid) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(parent, temperature, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- FA_data %>%
mutate(code=paste(parent, temperature))%>%
mutate(code=factor(code, levels=c("Bleached Ambient", "Bleached Moderate (+3)", "Bleached High (+6)", "Nonbleached Ambient", "Nonbleached Moderate (+3)", "Nonbleached High (+6)", "Wildtype Ambient", "Wildtype Moderate (+3)", "Wildtype High (+6)")))%>%
dplyr::select(lipid, code, z_score) %>%
spread(key = code, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=split,
#column_split=heatmap_data$code,
row_title = "Fatty Acids",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/fatty_acids_VIP.pdf", width=6, height=8)
draw(heatmap)
dev.off()
This is interesting! There are several FAs that are elevated in moderate temperature in nonbleached and wildtype larvae, but not in bleached parent larvae. There is one FA, 28:7 that is depleted in bleached larvae and another, FA22:6 that is depleted in bleached larave at high temperature. THere are also several fatty acids that are strongly elevated in nonbleached larvae at moderate temperatures.
No other categories were enriched in the VIP lists.
Lipidomics (Peak height)
The analysis followed the same steps as done for area above. Here are the figures for analysis of height. Results are largely the same. I also produced an upset plot below to look at VIP overlap between the two analyses.
I will have to decide which data set to use.
Parental effects
PCA of parent and treatment effects:
PLSDA of parent effects:
VIPs of parent effects:
Heatmap of parent VIPs:
PCA of parent VIPs:
Temperature effects
PLSDA of temperature effects:
VIPs of temperature effects:
Heatmap of temperature VIPs:
PCA of temperature VIPs:
Treatment group effects
PLSDA of group effects:
VIPs of group effects:
Heatmap of group VIPs:
PCA of group VIPs:
LION enrichment
Enrichment results from LION:
- No enrichment in parent list
- polyunsaturated fatty acid enriched in temperature list (LION:0002967)
- diacylglycerols [GL0201], diradylglycerols [GL02], and lipid-mediated signalling enriched in group-specific list
View lipids related to the enriched categories in the interaction/group specific VIP data set from the LION results.
Monounsaturated fatty acids, fatty acids, fatty acids and conjugates, Fatty acid with less than 2 double bonds = ontology FA
Diacylglycerols = ontology DG
Lipid-mediated signalling = could include ontology PC, PS, PG, PA, PE (glycerophospholipids), contains Cer (ceramides)
Add in lipid ontology values.
lipids$ontology<-ontology$ontology[match(lipids$lipid, ontology$lipid)]
Plot a heatmap of the target ontologies in the group VIP list (parent x temperature).
Fatty acids
# Filter data for VIP metabolites
FA_data <- lipids %>%
filter(lipid %in% interaction_vip_list) %>%
filter(ontology=="FA") %>%
droplevels()
length(unique(FA_data$lipid))
#14 fatty acids in this data set
# Convert counts to Z-scores
FA_data <- FA_data %>%
group_by(lipid) %>%
mutate(z_score = scale(height_normalized)) %>%
group_by(parent, temperature, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- FA_data %>%
mutate(code=paste(parent, temperature))%>%
mutate(code=factor(code, levels=c("Bleached Ambient", "Bleached Moderate (+3)", "Bleached High (+6)", "Nonbleached Ambient", "Nonbleached Moderate (+3)", "Nonbleached High (+6)", "Wildtype Ambient", "Wildtype Moderate (+3)", "Wildtype High (+6)")))%>%
dplyr::select(lipid, code, z_score) %>%
spread(key = code, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=split,
#column_split=heatmap_data$code,
row_title = "Fatty Acids",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/fatty_acids_VIP_height.pdf", width=6, height=8)
draw(heatmap)
dev.off()
Diacylglycerols
# Filter data for VIP metabolites
DG_data <- lipids %>%
filter(lipid %in% interaction_vip_list) %>%
filter(ontology=="DG") %>%
droplevels()
length(unique(DG_data$lipid))
#61 DGs in this data set
# Convert counts to Z-scores
DG_data <- DG_data %>%
group_by(lipid) %>%
mutate(z_score = scale(height_normalized)) %>%
group_by(parent, temperature, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- DG_data %>%
mutate(code=paste(parent, temperature))%>%
mutate(code=factor(code, levels=c("Bleached Ambient", "Bleached Moderate (+3)", "Bleached High (+6)", "Nonbleached Ambient", "Nonbleached Moderate (+3)", "Nonbleached High (+6)", "Wildtype Ambient", "Wildtype Moderate (+3)", "Wildtype High (+6)")))%>%
dplyr::select(lipid, code, z_score) %>%
spread(key = code, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=7,
row_title = "DGs",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/diacylglycerols_VIP_height.pdf", width=6, height=14)
draw(heatmap)
dev.off()
Lipid mediated signaling
# Filter data for VIP metabolites
signal_data <- lipids %>%
filter(lipid %in% interaction_vip_list) %>%
filter(ontology %in% c("PC", "PS", "PG", "PA", "PE") | grepl("Cer", ontology))%>%
droplevels()
length(unique(signal_data$lipid))
#64 lipids in this data set
# Convert counts to Z-scores
signal_data <- signal_data %>%
group_by(lipid) %>%
mutate(z_score = scale(height_normalized)) %>%
group_by(parent, temperature, lipid)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- signal_data %>%
mutate(code=paste(parent, temperature))%>%
mutate(code=factor(code, levels=c("Bleached Ambient", "Bleached Moderate (+3)", "Bleached High (+6)", "Nonbleached Ambient", "Nonbleached Moderate (+3)", "Nonbleached High (+6)", "Wildtype Ambient", "Wildtype Moderate (+3)", "Wildtype High (+6)")))%>%
dplyr::select(lipid, code, z_score) %>%
spread(key = code, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "lipid"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=7,
row_title = "Lipid Signaling",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/lipids/lipid_signaling_VIP_height.pdf", width=6, height=14)
draw(heatmap)
dev.off()
There were more functions enriched in the height data set.
Compare height and area VIP lists
Plot upset plot for each factor tested between height and area to see overlap in VIP sets.
Load in parent VIP lists.
list1<-read_csv("output/lipids_metabolites/lipids/parent_VIP_list.csv")%>%pull(.)
list2<-read_csv("output/lipids_metabolites/lipids/parent_VIP_list_height.csv")%>%pull(.)
# Combine lists into a data frame indicating membership (1 = present, 0 = absent)
upset_data <- data.frame(
Lipid = unique(c(list1, list2)),
List1 = as.numeric(unique(c(list1, list2)) %in% list1),
List2 = as.numeric(unique(c(list1, list2)) %in% list2))
# Convert data to the format needed for the UpSetR plot
plot_data <- upset_data[, -1]
row.names(plot_data) <- upset_data$Lipid
names(plot_data) <- c("Area", "Height")
# Generate the UpSet plot
upset(plot_data, sets = c("Area", "Height"), keep.order = TRUE)
There were 218 shared, 31 unique to height, 19 unique lipids to area data sets for parent VIPs.
Load in temperature VIP lists.
list1<-read_csv("output/lipids_metabolites/lipids/temperature_VIP_list.csv")%>%pull(.)
list2<-read_csv("output/lipids_metabolites/lipids/temperature_VIP_list_height.csv")%>%pull(.)
# Combine lists into a data frame indicating membership (1 = present, 0 = absent)
upset_data <- data.frame(
Lipid = unique(c(list1, list2)),
List1 = as.numeric(unique(c(list1, list2)) %in% list1),
List2 = as.numeric(unique(c(list1, list2)) %in% list2))
# Convert data to the format needed for the UpSetR plot
plot_data <- upset_data[, -1]
row.names(plot_data) <- upset_data$Lipid
names(plot_data) <- c("Area", "Height")
# Generate the UpSet plot
upset(plot_data, sets = c("Area", "Height"), keep.order = TRUE)
There were 213 shared, 38 unique to height, 38 unique to area data sets for temperature VIPs.
Group VIP lists.
list1<-read_csv("output/lipids_metabolites/lipids/interaction_VIP_list.csv")%>%pull(.)
list2<-read_csv("output/lipids_metabolites/lipids/interaction_VIP_list_height.csv")%>%pull(.)
# Combine lists into a data frame indicating membership (1 = present, 0 = absent)
upset_data <- data.frame(
Lipid = unique(c(list1, list2)),
List1 = as.numeric(unique(c(list1, list2)) %in% list1),
List2 = as.numeric(unique(c(list1, list2)) %in% list2))
# Convert data to the format needed for the UpSetR plot
plot_data <- upset_data[, -1]
row.names(plot_data) <- upset_data$Lipid
names(plot_data) <- c("Area", "Height")
# Generate the UpSet plot
upset(plot_data, sets = c("Area", "Height"), keep.order = TRUE)
There were 266 shared, 39 unique to height, 36 unique to area data sets for group VIPs.
In each set, most lipids are shared by area and height data sets. There are some small differences.
Metabolomics
library(tidyverse)
library(stringr)
library(readxl)
library(purrr)
library(lubridate)
library(ggplot2)
library(broom)
library(cowplot)
library(mixOmics)
library(dplyr)
library(conflicted)
library(vegan)
library(factoextra)
library(ggfortify)
library(RVAideMemoire)
library(ComplexHeatmap)
library(circlize)
library(tibble)
library(viridis)
library(genefilter)
library(UpSetR)
conflict_prefer("select", winner = "dplyr", quiet = FALSE)
conflict_prefer("filter", winner = "dplyr", quiet = FALSE)
conflict_prefer("rename", winner = "dplyr", quiet = FALSE)
Data manipulation and preparations
Read in all files
metadata<-read_csv("data/lipids_metabolites/metabolomics/Metabolite_Metadata_Huffmyer_Matthews.csv")%>%rename(tube=`Sample ID`, type=Type, temperature=Temperature, parent=Symbiont)
metabolites<-read_xlsx("data/lipids_metabolites/metabolomics/metabolite_data.xlsx")
protein<-read_csv("data/lipids_metabolites/protein/protein_calculated.csv")
Add protein data to the metadata.
metadata$protein.ug<-protein$`Total Protein ug`[match(metadata$tube, protein$Sample)]
Format data and conduct protein normalization
Format the metabolite data frame in long format to obtain sample names and normalize to total protein.
metabolites<-metabolites%>%
pivot_longer(names_to = "file", values_to = "area", cols=c(2:74))%>%
mutate(tube = str_extract(file, "(?<=_)[^_]+(?=_)"))%>%
mutate(tube = str_remove_all(tube, "M"))
levels(as.factor(metabolites$tube))
levels(as.factor(metadata$tube))
Merge in metadata.
metadata<-metadata%>%
select(tube, type, temperature, parent, protein.ug)
metabolites<-left_join(metabolites, metadata)%>%select(!file)
Replace NA’s with 0’s in the metabolite data frame.
metabolites %>%
filter(is.na(area)) %>%
nrow()
# there are many NA's, which need to be replaced with 0
metabolites <- metabolites %>%
mutate(area = replace_na(area, 0))
metabolites %>%
filter(is.na(area)) %>%
nrow()
Filter out PBQC samples and blank samples, these were used for peak identification and are not needed now.
metabolites<-metabolites%>%
filter(!type=="PBQC")%>%
filter(!type=="Blank")
Now perform internal standard normalization to C13 sorbitol average for each sample corrections.
#make a dataframe of sorbitol values for each sample
sorbitol<-metabolites%>%
filter(Name=="13C6-Sorbitol")%>%
group_by(tube)%>%
summarise(area_sorbitol=mean(area, na.rm=TRUE))
#add this value back into the master data frame for corrections.
metabolites<-metabolites%>%
left_join(.,sorbitol)
Normalize area by dividing area by sorbitol internal standard value for each sample’s peak value for each metabolite.
metabolites<-metabolites%>%
mutate(area_corrected=area/area_sorbitol)
Filter to only keep metabolites that have a detectable value in every sample.
Check for outliers at this point.
wide_data<-metabolites%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area)%>%
pivot_wider(names_from=Name, values_from=area_corrected)
wide_data%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
#remove any column with all 0's
wide_data <- wide_data[, colSums(wide_data != 0) > 0]
wide_data<-wide_data%>%
select(!`13C6-Sorbitol`)
scaled.pca<-prcomp(wide_data[c(4:374)], scale=TRUE, center=TRUE)
pca1<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=FALSE, loadings.label.size=5, loadings.label.vjust=-1, size=5) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(aes(x = PC1, y = PC2, label = tube), vjust = -0.5)+
theme_classic()+
theme(legend.text = element_text(size=18),
legend.position="none",
plot.background = element_blank(),
legend.title = element_text(size=18, face="bold"),
axis.text = element_text(size=18),
axis.title = element_text(size=18, face="bold"));pca1
No outliers before removing metabolites for presence/absence filtering.
Keep only metabolites detected in all samples.
# Find metabolites that have a value > 0 for every tube
filtered_data <- metabolites %>%
group_by(Name) %>%
filter(all(area_corrected > 0))
# do this instead by keeping those present at >0 value in at least 90% of samples
#threshold <- 0.9 # 90% threshold
#filtered_data <- metabolites %>%
# group_by(Name) %>%
# filter(mean(area_corrected > 0) >= threshold)
# Find metabolites that were removed (i.e., have any value <= 0)
removed_metabolites <- metabolites %>%
filter(!(Name %in% filtered_data$Name)) %>%
distinct(Name)
kept_metabolites <- metabolites %>%
filter((Name %in% filtered_data$Name)) %>%
distinct(Name)
57 metabolites are kept after all samples >0 filtering, 569 were removed. This makes sense, the data frame kept all metabolites in the library so we expect far fewer to be retained.
66 are kept with filtering for >0 value in 90% of samples. Keeping more stringent filtering for now to keep metabolites that have detectable value in every sample.
Normalize corrected area to sample protein. This will generate units of normalized peak area / ug protein.
filtered_data<-filtered_data%>%
mutate(area_normalized=area_corrected/protein.ug)
hist(filtered_data$area_normalized)
quantile(filtered_data$area_normalized)
length(unique(filtered_data$Name))
Filter out internal standards if present still.
filtered_data<-filtered_data%>%
filter(!Name=="13C5,15N1-Valine")%>%
filter(!Name=="13C6-Sorbitol")
We are now ready for anaylsis!
Plot to detect outliers.
wide_data<-filtered_data%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area_corrected)%>%
select(!area)%>%
pivot_wider(names_from=Name, values_from=area_normalized)
wide_data%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
wide_data$temperature<-factor(wide_data$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
wide_data$parent<-factor(wide_data$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
scaled.pca<-prcomp(wide_data[c(4:58)], scale=TRUE, center=TRUE)
pca1<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=FALSE, loadings.label.size=5, loadings.label.vjust=-1, size=5) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
geom_text(aes(x = PC1, y = PC2, label = tube), vjust = -0.5)+
theme_classic()+
theme(legend.text = element_text(size=18),
legend.position="none",
plot.background = element_blank(),
legend.title = element_text(size=18, face="bold"),
axis.text = element_text(size=18),
axis.title = element_text(size=18, face="bold"));pca1
Sample L107 is a clear outlier. Remove.
filtered_data<-filtered_data%>%
filter(!tube=="L107")
wide_data<-wide_data%>%
filter(!tube=="L107")
Conduct filtering for relative standard deviation
Make a dataframe that calculates the standard deviation of the metabolite area within treatment relative to the mean (i.e., how “large” is the SD).
The formula will be:
(sd*100)/mean
Calculate for each metabolite in each treatment group.
test<-filtered_data%>%
group_by(Name, parent, temperature)%>%
summarise(RSD=(sd(area_normalized)*100)/mean(area_normalized))
length(unique(test$Name))
hist(test$RSD)
55 metabolites before.
Remove metabolites that have RSD < threshold in all treatments (RSD values are high, come back to this). I am using a threshold of 150. RSD values are high for this data set.
test <- test %>%
group_by(Name) %>%
filter(all(RSD < 150))
length(unique(test$Name))
keep_list<-c(unique(test$Name))
48 metabolites after filtering.
Filter the filtered dataset to include the selected metabolites after RSD filtering.
filtered_data<-filtered_data%>%
filter(Name %in% keep_list)
Unsupervised analysis
Conduct a PCA and PERMANOVA to examine the effects of parent and temperature on the metabolome of coral larvae.
Convert to wide format.
wide_data<-filtered_data%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area_corrected)%>%
select(!area)%>%
pivot_wider(names_from=Name, values_from=area_normalized)
wide_data%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
wide_data$temperature<-factor(wide_data$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
wide_data$parent<-factor(wide_data$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
scaled.pca<-prcomp(wide_data[c(4:51)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(wide_data[c(4:51)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = wide_data, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = wide_data, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 79.95 0.03203 0.8623 0.483
temperature 2 74.27 0.02975 0.8009 0.542
parent:temperature 4 301.81 0.12092 1.6274 0.087 .
Residual 44 2039.97 0.81730
Total 52 2496.00 1.00000
No effect of parent or temperature.
pca2<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca2
pca2b<-ggplot2::autoplot(scaled.pca, data=wide_data, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca2b
pca_plots<-plot_grid(pca2, pca2b, ncol=2, nrow=1)
ggsave(pca_plots, filename="figures/metabolites/pca_plots.jpeg", width=12, height=4)
No clear differences by parent or treatment.
Supervised PLS-DA analysis
Script modified from the following tutorial: https://mixomicsteam.github.io/Bookdown/plsda.html
PLS-DA for the effect of parent
Generate a PLS-DA and plot.
#assigning datasets
X <- wide_data
levels(as.factor(X$parent))
levels(as.factor(X$temperature))
Y <- as.factor(X$parent) #select treatment names
Y
X<-X[4:51] #pull only data columns
# run PLSDA
MyResult.plsda <- plsda(X,Y, ncomp=2) # 1 Run the method with ncomp = groups -1
plotIndiv(MyResult.plsda) # 2 Plot the samples
cols<-c("gray", "orange", "brown4")
pdf("figures/metabolites/PLSDA_Parent.pdf", width=9, height=6)
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
dev.off()
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Set model to determine VIPs.
MyResult.plsda_parent <- plsda(X,Y, ncomp=2)
#view plsda model again
plotIndiv(MyResult.plsda_parent, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent", ellipse = TRUE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Extract VIP scores for metabolites that are most highly differentiating.
#extract
parent_VIP <- PLSDA.VIP(MyResult.plsda_parent, graph=TRUE)
parent_VIP_DF <- as.data.frame(parent_VIP[["tab"]])
parent_VIP_DF
# Converting row names to column
parent_VIP_table <- rownames_to_column(parent_VIP_DF, var = "Metabolite")
#filter for VIP > 1
parent_VIP_1 <- parent_VIP_table %>%
filter(VIP >= 1)
write_csv(parent_VIP_1, "output/lipids_metabolites/metabolites/Overall_Parent_VIPs.csv")
#plot
parent_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Metabolite,VIP,sum))) +
geom_point() +
ylab("Metabolite") +
xlab("VIP Score") +
ggtitle("Parent VIP") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
pdf("figures/metabolites/Parent_VIP.pdf", width=4, height=4)
parent_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Metabolite,VIP,sum))) +
geom_point() +
ylab("Metabolite") +
xlab("VIP Score") +
ggtitle("Parent VIP > 1") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
dev.off()
These lists represent VIPs that drive separation between parent groups. We can extract VIPs, but given that there is no significant separation in unsupervised analyses we will likely not move forward with more detailed analyses.
Make a list of vips.
parent_vip_list<-parent_VIP_1%>%
pull(Metabolite)
There are 15 VIPs.
Plot heatmap of z score of lipids across life stages.
# Filter data for VIP metabolites
parent_vip_data <- filtered_data %>% filter(Name %in% parent_vip_list)
# Convert counts to Z-scores
parent_vip_data <- parent_vip_data %>%
group_by(Name) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(parent, Name)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- parent_vip_data %>%
dplyr::select(Name, parent, z_score) %>%
spread(key = parent, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "Name"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=3,
row_title = "Metabolite",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/metabolites/VIP_heatmap_parent.pdf", width=5, height=6)
draw(heatmap)
dev.off()
The variation appears to be mainly due to differences between wildtype and the other two phenotypes. This may have to do with genotypic diversity or the method of collection/location of population.
View PCA for VIPs.
vip_data_parent<-filtered_data%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area_corrected)%>%
select(!area)%>%
filter(Name %in% parent_vip_list)%>%
pivot_wider(names_from=Name, values_from=area_normalized)
vip_data_parent%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
vip_data_parent$temperature<-factor(vip_data_parent$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
vip_data_parent$parent<-factor(vip_data_parent$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
scaled.pca<-prcomp(vip_data_parent[c(4:18)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(vip_data_parent[c(4:18)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = vip_data_parent, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = vip_data_parent, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 44.50 0.05705 1.5591 0.182
temperature 2 22.15 0.02839 0.7760 0.548
parent:temperature 4 85.45 0.10955 1.4969 0.159
Residual 44 627.91 0.80501
Total 52 780.00 1.00000
No difference in VIPs by parent or temperature.
pca_vip_parent1<-ggplot2::autoplot(scaled.pca, data=vip_data_parent, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_parent1
pca_vip_parent2<-ggplot2::autoplot(scaled.pca, data=vip_data_parent, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_parent2
pca_plots_parent<-plot_grid(pca_vip_parent1, pca_vip_parent2, ncol=2, nrow=1)
ggsave(pca_plots_parent, filename="figures/metabolites/pca_plots_parentVIP.jpeg", width=12, height=4)
PLS-DA for the effect of temperature
Generate a PLS-DA and plot.
#assigning datasets
X <- wide_data
levels(as.factor(X$parent))
levels(as.factor(X$temperature))
Y <- as.factor(X$temperature) #select treatment names
Y
levels(Y)
X<-X[4:51] #pull only data columns
# run PLSDA
MyResult.plsda <- plsda(X,Y, ncomp=2) # 1 Run the method with ncomp = groups -1
plotIndiv(MyResult.plsda) # 2 Plot the samples
cols<-c("blue", "orange", "red")
pdf("figures/metabolites/PLSDA_Temperature.pdf", width=9, height=6)
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
dev.off()
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Set model to determine VIPs.
MyResult.plsda_temperature <- plsda(X,Y, ncomp=2)
#view plsda model again
plotIndiv(MyResult.plsda_temperature, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = TRUE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Extract VIP scores of differentiating metabolites.
#extract
temperature_VIP <- PLSDA.VIP(MyResult.plsda_temperature, graph=TRUE)
temperature_VIP_DF <- as.data.frame(temperature_VIP[["tab"]])
temperature_VIP_DF
# Converting row names to column
temperature_VIP_table <- rownames_to_column(temperature_VIP_DF, var = "Metabolite")
#filter for VIP > 1
temperature_VIP_1 <- temperature_VIP_table %>%
filter(VIP >= 1)
write_csv(temperature_VIP_1, "output/lipids_metabolites/metabolites/Overall_Temperature_VIPs.csv")
#plot
temperature_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Metabolite,VIP,sum))) +
geom_point() +
ylab("Metabolite") +
xlab("VIP Score") +
ggtitle("Temperature VIP") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
pdf("figures/metabolites/Temperature_VIP.pdf", width=4, height=4)
temperature_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Metabolite,VIP,sum))) +
geom_point() +
ylab("Metabolite") +
xlab("VIP Score") +
ggtitle("Temperature VIP > 1") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
dev.off()
These lists represent VIPs that drive separation between temperature groups.
Make a list of vips.
temperature_vip_list<-temperature_VIP_1%>%
pull(Metabolite)
There are 10 VIPs.
Plot heatmap of z score of metabolites across temperature.
# Filter data for VIP metabolites
temperature_vip_data <- filtered_data %>% filter(Name %in% temperature_vip_list)
# Convert counts to Z-scores
temperature_vip_data <- temperature_vip_data %>%
group_by(Name) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(temperature, Name)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
temperature_vip_data$temperature<-factor(temperature_vip_data$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- temperature_vip_data %>%
dplyr::select(Name, temperature, z_score) %>%
spread(key = temperature, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "Name"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=3,
row_title = "Metabolite",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/metabolites/VIP_heatmap_temperature.pdf", width=6, height=7)
draw(heatmap)
dev.off()
View PCA for VIPs.
vip_data_temperature<-filtered_data%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area_corrected)%>%
select(!area)%>%
filter(Name %in% temperature_vip_list)%>%
pivot_wider(names_from=Name, values_from=area_normalized)
vip_data_temperature%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
vip_data_temperature$temperature<-factor(vip_data_temperature$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
vip_data_temperature$parent<-factor(vip_data_temperature$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
scaled.pca<-prcomp(vip_data_temperature[c(4:13)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(vip_data_temperature[c(4:13)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = vip_data_temperature, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = vip_data_temperature, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 28.82 0.05543 1.5314 0.208
temperature 2 23.68 0.04553 1.2579 0.263
parent:temperature 4 53.42 0.10273 1.4192 0.203
Residual 44 414.08 0.79630
Total 52 520.00 1.00000
No significant effects.
pca_vip_temperature1<-ggplot2::autoplot(scaled.pca, data=vip_data_temperature, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_temperature1
pca_vip_temperature2<-ggplot2::autoplot(scaled.pca, data=vip_data_temperature, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_temperature2
pca_plots_temperature<-plot_grid(pca_vip_temperature1, pca_vip_temperature2, ncol=2, nrow=1)
ggsave(pca_plots_temperature, filename="figures/metabolites/pca_plots_temperatureVIP.jpeg", width=12, height=4)
PLS-DA for the effect of parent x temperature groups
Generate a PLS-DA and plot.
#assigning datasets
X <- wide_data
levels(as.factor(X$parent))
levels(as.factor(X$temperature))
X$code<-paste(X$parent, X$temperature)
levels(as.factor(X$code))
Y <- as.factor(X$code) #select treatment names
Y
levels(Y)
X<-X[4:51] #pull only data columns
# run PLSDA
MyResult.plsda <- plsda(X,Y, ncomp=5) # 1 Run the method with ncomp = groups -1
plotIndiv(MyResult.plsda) # 2 Plot the samples
cols<-c("orange", "orange3", "orange4", "brown", "brown2", "brown4", "gray", "lightgray", "darkgray")
pdf("figures/metabolites/PLSDA_interaction.pdf", width=9, height=6)
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
dev.off()
plotIndiv(MyResult.plsda, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Temperature", ellipse = FALSE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Set model to determine VIPs.
MyResult.plsda_interaction <- plsda(X,Y, ncomp=5)
#view plsda model again
plotIndiv(MyResult.plsda_interaction, col=cols, ind.names = FALSE, legend=TRUE, legend.title = "Parent x Temperature", ellipse = TRUE, title="", style = "graphics", centroid=FALSE, point.lwd = 2, cex=2)
Extract VIP scores.
#extract
interaction_VIP <- PLSDA.VIP(MyResult.plsda_interaction, graph=TRUE)
interaction_VIP_DF <- as.data.frame(interaction_VIP[["tab"]])
interaction_VIP_DF
# Converting row names to column
interaction_VIP_table <- rownames_to_column(interaction_VIP_DF, var = "Metabolite")
#filter for VIP > 1
interaction_VIP_1 <- interaction_VIP_table %>%
filter(VIP >= 1)
write_csv(interaction_VIP_1, "output/lipids_metabolites/metabolites/Overall_Interaction_VIPs.csv")
#plot
interaction_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Metabolite,VIP,sum))) +
geom_point() +
ylab("Metabolite") +
xlab("VIP Score") +
ggtitle("Parent x Temperature VIP") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
pdf("figures/metabolites/Interaction_VIP.pdf", width=4, height=4)
interaction_VIP_1 %>%
arrange(VIP) %>%
ggplot( aes(x = VIP, y = reorder(Metabolite,VIP,sum))) +
geom_point() +
ylab("Metabolite") +
xlab("VIP Score") +
ggtitle("Interaction VIP > 1") +
theme_bw() + theme(panel.border = element_rect(linetype = "solid", color = "black"), panel.grid.major = element_blank(), #Makes background theme white
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
dev.off()
These lists represent VIPs that drive separation between parent x temperature groups.
Make a list of vips.
interaction_vip_list<-interaction_VIP_1%>%
pull(Metabolite)
There are 13 VIPs.
Plot heatmap of z score of lipids across groups.
# Filter data for VIP metabolites
interaction_vip_data <- filtered_data %>% filter(Name %in% interaction_vip_list)
# Convert counts to Z-scores
interaction_vip_data <- interaction_vip_data %>%
mutate(code=paste(parent, temperature))%>%
group_by(Name) %>%
mutate(z_score = scale(area_normalized)) %>%
group_by(code, Name)%>%
summarise(z_score = mean(z_score, na.rm=TRUE))%>%
ungroup()
interaction_vip_data$code<-factor(interaction_vip_data$code, levels=c("Bleached Ambient", "Bleached Moderate (+3)", "Bleached High (+6)", "Nonbleached Ambient", "Nonbleached Moderate (+3)", "Nonbleached High (+6)", "Wildtype Ambient", "Wildtype Moderate (+3)", "Wildtype High (+6)"))
# Pivot data to have metabolites as rows and lifestage as columns
heatmap_data <- interaction_vip_data %>%
dplyr::select(Name, code, z_score) %>%
spread(key = code, value = z_score)
# Convert to matrix and set rownames
heatmap_matrix <- as.matrix(column_to_rownames(heatmap_data, var = "Name"))
# Create the heatmap
heatmap <- Heatmap(
heatmap_matrix,
name = "Z-score",
col = inferno(10),
cluster_rows = TRUE,
cluster_columns = FALSE,
show_row_names = TRUE,
show_column_names = TRUE,
#row_split=4,
row_title = "Metabolite",
column_names_gp = gpar(fontsize = 12, border=FALSE),
column_names_rot = 45,
row_gap = unit(1, "mm"),
border = TRUE,
row_names_gp = gpar(fontsize = 12, alpha = 0.75, border = FALSE)
)
# Draw the heatmap
draw(heatmap)
dev.off()
pdf("figures/metabolites/VIP_heatmap_parentxtemperature.pdf", width=8, height=8)
draw(heatmap)
dev.off()
View PCA for VIPs.
vip_data_interaction<-filtered_data%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area_corrected)%>%
select(!area)%>%
filter(Name %in% interaction_vip_list)%>%
pivot_wider(names_from=Name, values_from=area_normalized)
vip_data_interaction%>%
select_if(~ any(is.na(.)))
No columns have NA’s.
vip_data_interaction$temperature<-factor(vip_data_interaction$temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)"))
vip_data_interaction$parent<-factor(vip_data_interaction$parent, levels=c("Wildtype", "Bleached", "Nonbleached"))
scaled.pca<-prcomp(vip_data_interaction[c(4:16)], scale=TRUE, center=TRUE)
fviz_eig(scaled.pca)
Prepare a PCA plot
# scale data
vegan <- scale(vip_data_interaction[c(4:16)])
# PerMANOVA
permanova<-adonis2(vegan ~ parent*temperature, data = vip_data_interaction, method='eu')
permanova
adonis2(formula = vegan ~ parent * temperature, data = vip_data_interaction, method = "eu")
Df SumOfSqs R2 F Pr(>F)
parent 2 28.06 0.04151 1.1445 0.332
temperature 2 27.70 0.04098 1.1299 0.315
parent:temperature 4 80.85 0.11960 1.6489 0.099 .
Residual 44 539.39 0.79791
Total 52 676.00 1.00000
No effects are significant.
pca_vip_interaction1<-ggplot2::autoplot(scaled.pca, data=vip_data_interaction, frame.colour="parent", loadings=FALSE, colour="parent", shape="temperature", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_fill_manual(name="Parental Phenotype", values=c("Wildtype"="gray", "Bleached"="orange", "Nonbleached"="brown4"))+
scale_shape_manual(name="Temperature", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_interaction1
pca_vip_interaction2<-ggplot2::autoplot(scaled.pca, data=vip_data_interaction, frame.colour="temperature", loadings=FALSE, colour="temperature", shape="parent", loadings.label.colour="black", loadings.colour="black", loadings.label=FALSE, frame=TRUE, loadings.label.size=5, loadings.label.vjust=-1, size=4) +
scale_color_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_fill_manual(name="Temperature", values=c("blue", "orange", "red"))+
scale_shape_manual(name="Parental Phenotype", values=c(19,17,15))+
theme_classic()+
theme(legend.text = element_text(size=14),
legend.position="right",
plot.background = element_blank(),
legend.title = element_text(size=14, face="bold"),
axis.text = element_text(size=14),
axis.title = element_text(size=14, face="bold"));pca_vip_interaction2
pca_plots_interaction<-plot_grid(pca_vip_interaction1, pca_vip_interaction2, ncol=2, nrow=1)
ggsave(pca_plots_interaction, filename="figures/metabolites/pca_plots_interactionVIP.jpeg", width=12, height=4)
Plot mean values of VIPs
Plot parent x temperature group VIPs as a mean to see if differences are present.
interaction_means<-filtered_data%>%
ungroup()%>%
select(!type)%>%
select(!protein.ug)%>%
select(!area_sorbitol)%>%
select(!area_corrected)%>%
select(!area)%>%
filter(Name %in% interaction_vip_list)%>%
group_by(Name, parent, temperature)%>%
summarise(mean=mean(area_normalized, na.rm=TRUE), se=sd(area_normalized, na.rm=TRUE)/sqrt(length(area_normalized)))%>%
mutate(temperature=factor(temperature, levels=c("Ambient", "Moderate (+3)", "High (+6)")))%>%
ggplot(aes(x = temperature, y = mean, color = parent)) +
facet_wrap(~Name, scales="free")+
geom_line(aes(group=parent), position=position_dodge(0.3))+
geom_point(position=position_dodge(0.3))+
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=0, position=position_dodge(0.3))+
scale_color_manual(values=c("orange", "brown4", "gray"))+
labs(x = "",y = "Mean Normalized Area") +
theme_classic()+
theme(
axis.text.x=element_text(angle=45, vjust=1, hjust=1)
);interaction_means
ggsave(interaction_means, filename="figures/metabolites/interaction_VIP_means.jpeg", width=12, height=12)
What is the overlap in these lists? Create an upset plot.
# Combine lists into a data frame indicating membership (1 = present, 0 = absent)
upset_data <- data.frame(
Name = unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)),
List1 = as.numeric(unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)) %in% parent_vip_list),
List2 = as.numeric(unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)) %in% temperature_vip_list),
List3 = as.numeric(unique(c(parent_vip_list, temperature_vip_list, interaction_vip_list)) %in% interaction_vip_list)
)
# Convert data to the format needed for the UpSetR plot
plot_data <- upset_data[, -1]
row.names(plot_data) <- upset_data$Name
names(plot_data) <- c("Parent", "Temperature", "PxT Group")
# Generate the UpSet plot
pdf("figures/metabolites/upset_VIP_list_overlap.pdf", width=6, height=6)
upset(plot_data, sets = c("Parent", "Temperature", "PxT Group"), keep.order = TRUE)
dev.off()
No signature of changes in metabolome by parent or temperature treatment.
Next steps
- Determine whether to use area or height data sets for analysis.
- Dig into lipid results!
- Decide on use for metabolomics results given low metabolites kept after filtering.