WGCNA with physiology correlations of ortholog expression for the E5 timeseries molecular project

Today I updated a previously generated WGCNA to include correlations with physiology data.

Overview

Please see my last post for all the details on the WGCNA. This post only contains the updated figure with physiology correlations. The code can be found here.

Updated analysis to include physiology correlations

Study Design

Species and Samples

The analysis includes RNA-seq data from three coral species representing different ecological strategies:

  • Acropora pulchra (ACR): Fast-growing, branching coral (40 samples)
  • Porites evermanni (POR): Slow-growing, massive coral (38 samples)
  • Pocillopora tuahiniensis (POC): Intermediate growth, branching coral (39 samples)

Experimental Design

Samples were collected from Moorea, French Polynesia across four time points (TP1-TP4) representing different environmental conditions. The experiment included:

  • Total samples: 117 RNA-seq libraries
  • Time points: 4 developmental/environmental time points per colony
  • Sites: Manava and Mahana collection sites
  • Biological replicates: Multiple colonies per species per time point

Input Data

The analysis used variance-stabilized transformation (VST) normalized counts from the DESeq2 pipeline:

  • Input file: vst_counts_matrix.csv from analysis 14-pca-orthologs
  • Genes: 9,800 ortholog groups (OG_XXXXX identifiers)
  • Samples: 117 columns representing individual RNA-seq libraries
  • Format: VST-normalized expression values across all samples
  • Physiology data: Data frame with physiology for 14 metrics.

VST normalization was chosen because it stabilizes variance across the expression range, making the data more suitable for correlation-based analyses like WGCNA.

Results

Module Identification

WGCNA identified 14 distinct co-expression modules plus one unassigned module (grey or 0):

module n_orthologs
1 2646
2 1198
3 1144
4 967
5 884
6 876
0 757
7 706
8 250
9 84
10 63
11 63
12 62
13 50
14 46

Total: 9,796 genes assigned across 15 groups

Module Size Distribution

The modules show a hierarchical distribution typical of biological networks:

  • Large modules (>1,000 genes): representing core biological processes
  • Medium modules (500-1,000 genes): specialized but common functions
  • Small modules (30-250 genes): highly specialized or condition-specific responses
  • Grey module (757 genes): Unassigned genes with low network connectivity

Module-Trait Relationships

The analysis generated a comprehensive heatmap showing correlations between module eigengenes (ME) and experimental traits.

This heatmap shows each module on the y axis grouped in a dendrogram. The x axis shows the categorical variables. Each cell shows the correlation of ortholog expression in the module and the variable with non-significant associations shown without text. Red indicates positive correlation and blue indicates negative correlation.

Key findings include:

Temporal Patterns:

  • Several modules showed strong time-dependent expression patterns
  • Identified warm season (TP1/TP2, e.g., ME8) vs cool season (TP3/TP4, e.g., ME11) modules

Species-Specific Patterns:

  • Some modules exhibited species-specific expression
  • Species-divergent modules suggest lineage-specific adaptations with some showing positive correlations with 1 species in particular

Site Effects:

  • Modules correlated with collection site (Manava vs Maharepa)
  • Site-specific modules may reflect local environmental adaptation

Physiology Correlations:

  • There are lots of correlations with physiological metrics!
  • The correlation patterns show that modules correlated with Porites evermanni are also correlated with physiological metrics including biomass and photosynthetic parameters. We know these parameters are strongly different between species, so this is expected.
  • We can look at the specific genes and relationships and determine how these relate to species differences.

Statistical Significance:

  • Only correlations with p < 0.05 are displayed in the heatmap
  • Benjamini-Hochberg (BH) multiple testing correction applied per trait
  • Module eigengene clustering revealed related modules with similar trait associations

Source Code

The complete analysis is documented in the R Markdown notebook:

Written on October 28, 2025