Mcapitata Development 16S Analysis Part 2

This post details QC and QIIME2 analysis of the 16S data, adapted from the pipeline developed by Emma Strand.

Loading 16S Data in QIIME2

Steps 1-5 are covered in the Part 1 post.

Log into Andromeda and navigate to data folder. If off campus, use VPN connection via AnyConnect.

ssh -l ashuffmyer ssh3.hac.uri.edu
cd /data/putnamlab/ashuffmyer/AH_MCAP_16S

6. Create metadata files

Before proceeding, read QIIME2 webpage and tutorials.

We need to create two metadata files: a sample manifest file and a sample metadata file.

Create a directory for metadata in the AH_MCAP_16S directory.

mkdir metadata

First, create a list of all samples and file paths to help create metadata files.

cd raw_data
find "$PWD" -type f ! -name filepath.csv > filepath.csv

find "$PWD" -type f ! -name filenames.csv ! -name filepath.csv -printf "%P\n" > filenames.csv

mv filenames.csv /data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata

mv filepath.csv /data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata

Outside of Andromeda, move these files to your local computer.

scp ashuffmyer@bluewaves.uri.edu:/data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata/filenames.csv ~/MyProjects/EarlyLifeHistory_Energetics/Mcap2020/Data/16S

scp ashuffmyer@bluewaves.uri.edu:/data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata/filepath.csv ~/MyProjects/EarlyLifeHistory_Energetics/Mcap2020/Data/16S

Sample manifest file

Create a sample manifest file with the following columns: sample-id, forward-absolute-path, and reverse-absolute-path. Note that this differs from previous pipeline versions. I originally created a file with a column for direction (e.g., forward or reverse) and got errors loading the data because the manifest had two lines per sample.

Sample manifest file should have:
sample-id (e.g., WSH201)
forward-absolute-path (file path of R1 read)
reverse-absolute-path (file path of R2 read)

Save as a tab-delimited .txt file.

This file is named sample_manifest.txt.
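As an alternative to assembling the manifest by hand, it could be generated directly on Andromeda from the raw_data directory. This is a minimal sketch, not the method used in this post. It assumes (hypothetically) that forward and reverse reads end in _R1_001.fastq.gz and _R2_001.fastq.gz and that the sample ID is everything before the first underscore in the file name; make_manifest is an illustrative name.

```shell
# Sketch: build a tab-delimited paired-end manifest from a directory of FASTQ files.
# ASSUMPTION: reads are named <sample-id>_..._R1_001.fastq.gz / ..._R2_001.fastq.gz.
make_manifest() {
  # Resolve the directory to an absolute path, as the manifest requires.
  dir=$(cd "$1" && pwd) || return 1
  printf 'sample-id\tforward-absolute-path\treverse-absolute-path\n'
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue                      # no matching files at all
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"   # expected reverse read
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: no matching reverse read" >&2
      continue
    fi
    base=$(basename "$r1")
    # One line per sample: ID, forward path, reverse path.
    printf '%s\t%s\t%s\n' "${base%%_*}" "$r1" "$r2"
  done
}

# Example (run from the AH_MCAP_16S directory):
#   make_manifest raw_data > metadata/sample_manifest.txt
```

Generating the file on the cluster keeps the absolute paths consistent with where the reads actually live, so the manifest does not need to be copied back and forth.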

Copy back into Andromeda.

scp ~/MyProjects/EarlyLifeHistory_Energetics/Mcap2020/Data/16S/sample_manifest.txt ashuffmyer@bluewaves.uri.edu:/data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata/ 

The file looks like this:
[Image: sample manifest file]

Sample metadata file

Create a sample metadata file based on QIIME2 requirements (see links above). The first row is the header, the second row gives the data type of each column (#q2:types), and the sample metadata start in the third row. As with the manifest, save this file as a tab-delimited .txt file.
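Because this file is built by hand, a quick structural check before copying it to Andromeda can catch layout mistakes early. The sketch below only checks the layout described here (header row, #q2:types row, consistent column count). It assumes the ID column is named sample-id, as in the manifest, and check_metadata is an illustrative name rather than a QIIME2 command.

```shell
# Sketch: sanity-check the layout of a tab-delimited metadata file.
# ASSUMPTION: the first column header is "sample-id".
check_metadata() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    NR == 1 {
      ncol = NF
      if ($1 != "sample-id") { print "row 1: first header should be sample-id"; bad = 1 }
    }
    NR == 2 && $1 != "#q2:types" {
      print "row 2: should start with #q2:types"; bad = 1
    }
    NF != ncol {
      printf "row %d: %d fields, expected %d\n", NR, NF, ncol; bad = 1
    }
    END {
      if (NR < 3) { print "no sample rows found"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# Example:
#   check_metadata sample_metadata.txt && echo "layout looks OK"
```

This does not replace QIIME2's own validation; it only flags the most common spreadsheet export problems, such as a missing type row or ragged columns.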

Platemaps with sample names spreadsheet here.

I created this file manually on my computer and then copied into Andromeda.

scp ~/MyProjects/EarlyLifeHistory_Energetics/Mcap2020/Data/16S/sample_metadata.txt ashuffmyer@bluewaves.uri.edu:/data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata/ 

The file looks like this:
[Image: sample metadata file]

7. Input data into QIIME2

Next, input sample data into QIIME2. The settings were copied from E. Strand’s pipeline for this preliminary analysis.

More information on importing data here.

  • Sequence Data with Sequence Quality Information: because we have fastq files, not fasta files.
  • FASTQ data in paired-end demultiplexed format: because our samples are already demultiplexed and we have 1 file per F and R.
  • Input path directs to the sample manifest file created above.
  • The PairedEndFastqManifestPhred33V2 format requires a forward and reverse read for each sample. It assumes that the PHRED offset for positional quality scores is 33 - more info here.
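The Phred33 assumption can be spot-checked on a raw file. In Phred+33 encoding quality characters start at ASCII 33, while the older +64 encodings do not use characters below ASCII 59, so a minimum value under 59 points to offset 33. This is a rough sketch that assumes standard four-line FASTQ records; min_quality_ascii is an illustrative helper name.

```shell
# Sketch: report the lowest ASCII code found in the quality lines of a FASTQ stream.
# ASSUMPTION: records are exactly four lines each (no wrapped sequences).
min_quality_ascii() {
  awk 'NR % 4 == 0' |          # keep only the quality line of each record
    tr -d '\n' |               # join into one stream of quality characters
    od -An -v -tu1 |           # print each byte as an unsigned decimal
    tr -s ' ' '\n' |           # one value per line
    awk 'NF && (min == "" || $1 + 0 < min) { min = $1 + 0 } END { print min }'
}

# Example (first 1000 reads of one file; the file name is a placeholder):
#   zcat raw_data/SAMPLE_R1_001.fastq.gz | head -n 4000 | min_quality_ascii
```

A result below 59 is consistent with the Phred33 import format used here.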

Enter interactive mode and load modules.

interactive 

module load Miniconda3/4.9.2
module load Python/3.8.2-GCCcore-9.3.0
module load FastQC/0.11.9-Java-11
module load MultiQC/1.9-intel-2020a-Python-3.8.2
module load cutadapt/2.10-GCCcore-9.3.0-Python-3.8.2
module load QIIME2/2021.8

Move the manifest file in with the data files.

mv /data/putnamlab/ashuffmyer/AH_MCAP_16S/metadata/sample_manifest.txt /data/putnamlab/ashuffmyer/AH_MCAP_16S/raw_data/ 
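Before importing, it may be worth confirming that each sample appears on only one line of the manifest and that every listed path exists on the cluster. The following is a minimal sketch; check_manifest is an illustrative name, and it assumes the three-column tab-delimited layout described above.

```shell
# Sketch: check a paired-end manifest for duplicate IDs and missing files.
check_manifest() {
  manifest=$1
  status=0
  tab=$(printf '\t')

  # Each sample-id should appear on exactly one line.
  dups=$(tail -n +2 "$manifest" | cut -f1 | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "sample-id listed more than once: $dups" >&2
    status=1
  fi

  # Every forward and reverse path should point to an existing file.
  {
    read -r _header                      # skip the header row
    while IFS=$tab read -r id fwd rev || [ -n "$id" ]; do
      for f in "$fwd" "$rev"; do
        if [ ! -f "$f" ]; then
          echo "$id: file not found: $f" >&2
          status=1
        fi
      done
    done
  } < "$manifest"

  return "$status"
}

# Example:
#   check_manifest raw_data/sample_manifest.txt && echo "manifest OK"
```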

Run in the AH_MCAP_16S directory.

Note that I had to create a new conda environment because the previous environment was not present. This step is not needed if conda activate works.

conda create -n AH_MCAP_16S
conda activate AH_MCAP_16S
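To decide whether the environment needs to be created at all, the output of conda env list can be checked first. This is a small sketch; env_exists is an illustrative helper that reads that listing on standard input and matches the environment name in the first column.

```shell
# Sketch: succeed if the named environment appears in `conda env list` output.
env_exists() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example: create the environment only when it is missing.
#   conda env list | env_exists AH_MCAP_16S || conda create -n AH_MCAP_16S
#   conda activate AH_MCAP_16S
```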

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path raw_data/sample_manifest.txt \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path AH-MCAP-16S-paired-end-sequences1.qza

Running this command, I got the following error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

It seems that the conda/numpy installation on the cluster needs to be updated.

conda upgrade numpy may be the correct solution. I contacted Kevin Bryan to ask about upgrading Conda and numpy.
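When reporting this kind of error, it can help to note which Python and which numpy the session is actually using, since the message suggests a compiled package and the installed numpy do not match. This is only a diagnostic sketch and does not fix anything; report_numpy is an illustrative name.

```shell
# Sketch: print the Python interpreter on PATH and the numpy it imports, if any.
report_numpy() {
  py=$(command -v python3 || command -v python) || {
    echo "no python on PATH"
    return 1
  }
  echo "python: $py"
  "$py" -c 'import numpy; print("numpy:", numpy.__version__, numpy.__file__)' 2>/dev/null ||
    echo "numpy: not importable from $py"
}

# Example (after loading modules and activating the environment):
#   report_numpy
```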

K. Bryan updated QIIME2 to version 2021.8 on 20211229; this includes an upgrade to numpy.

Try to import data again. You must reactivate the conda environment and reload modules each time you exit interactive mode.

interactive 
module load Miniconda3/4.9.2
conda activate AH_MCAP_16S 
module load QIIME2/2021.8

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path raw_data/sample_manifest.txt \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path AH-MCAP-16S-paired-end-sequences1.qza

Success! Data imported.

Output reads: Imported raw_data/sample_manifest.txt as PairedEndFastqManifestPhred33V2 to AH-MCAP-16S-paired-end-sequences1.qza

Now the QIIME artifact named AH-MCAP-16S-paired-end-sequences1.qza lives in the AH_MCAP_16S directory.
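The imported artifact can be checked before moving on. The sketch below wraps qiime tools validate and qiime tools peek (both standard QIIME2 commands) in a small helper that first confirms the file exists and that qiime is available; check_artifact is an illustrative name.

```shell
# Sketch: confirm an artifact exists, then validate it and print its type and format.
check_artifact() {
  qza=$1
  if [ ! -s "$qza" ]; then
    echo "artifact not found or empty: $qza" >&2
    return 1
  fi
  if ! command -v qiime >/dev/null 2>&1; then
    echo "qiime not on PATH; load the QIIME2 module first" >&2
    return 2
  fi
  qiime tools validate "$qza" && qiime tools peek "$qza"
}

# Example:
#   check_artifact AH-MCAP-16S-paired-end-sequences1.qza
```

For this import, peek would be expected to report the type as SampleData[PairedEndSequencesWithQuality].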

Next, we can proceed with data cleaning in QIIME2.
Written on December 28, 2021