Raw sequence downloads and NCBI SRA uploads for M. capitata larval thermal tolerance project top off sequencing second upload

This post details the NCBI Sequence Read Archive upload for my Montipora capitata 2023 larval thermal tolerance project. See my notebook posts and my GitHub repo for information on this project.

I previously uploaded and QC’d data from our first round of sequencing in this project with downloads detailed here and QC information here.

Due to low read depth in some samples (13 samples at <20M reads, 7 samples at <15M reads), Azenta performed top off sequencing to meet deliverables for this project. Azenta added additional sequencing for 13 total samples. These samples were distributed across treatments.

In the first round of sequencing (Feb 2024), samples were sequenced on one lane of a NovaX-25B instrument. The top off sequencing was also performed on one lane of the same instrument on 4/11/2024.

This post details downloads and SRA uploads for the new sequencing files.

1. Download top off data from Azenta

View result statistics and metadata

I used Azenta’s sFTP instructions to view the project results. In their online storage, they provided an .html with an overview of statistics and the R1 and R2 .fastq.gz files for each sample. There are also .md5 files for each .fastq.gz file.

sftp ashuff_uw@sftp.genewiz.com
#entered password from email

lcd MyProjects/larval_symbiont_TPC/data/rna_seq
cd 30-943303755
cd second_delivery
mget Azenta_30-943303755_Data_Report.html

I renamed this file to have “_second” on the title and retained the original file for the first round of sequencing with the file name Azenta_30-943303755_Data_Report.html.

The file for the second sequencing report is on GitHub here.

Overall statistics

Number of samples = 13
Number of reads = 145,090,062
Yield (Mbases) = 43,525
Mean quality scores = 38.00
% bases over 30 quality score = 90.00

More detailed statistics

I downloaded the report statistics available on GitHub here as well as the full report and added this data to the sample metadata on GitHub here.

All samples had mean quality scores >37
Yield (Mbases) ranged from ~400 to ~10,100 per sample, with an additional 1.4-33.7 million reads per sample.
At least 87.8% of bases had quality scores of 30 or higher in every sample

Here is the summary of the second round of sequencing:

Project	Sample ID	Barcode Sequence	# Reads	Yield (Mbases)	Mean Quality Score	% Bases >= 30
30-943303755	R107	AAGCGACT+CTTCGCCT	17533977	5260	38	90.31
30-943303755	R55	TTCCTCCT+AGGCTATA	2876549	863	38.01	90.43
30-943303755	R56	TTCCTCCT+GCCTCTAT	1661094	498	37.94	90.06
30-943303755	R57	TTCCTCCT+AGGATAGG	3287827	986	38.08	90.73
30-943303755	R58	TTCCTCCT+TCAGAGCC	1373160	412	38.09	90.78
30-943303755	R59	TTCCTCCT+CTTCGCCT	33684113	10105	37.96	90.15
30-943303755	R60	TTCCTCCT+TAAGATTA	2530567	759	38	90.36
30-943303755	R62	TTCCTCCT+GTCAGTAC	2313143	694	38.01	90.39
30-943303755	R67	TGCTTGCT+CTTCGCCT	17430173	5229	37.9	89.84
30-943303755	R75	GGTGATGA+CTTCGCCT	11054008	3316	37.99	90.27
30-943303755	R83	AACCTACG+CTTCGCCT	17612372	5284	37.76	89.18
30-943303755	R91	GGATCTGA+CTTCGCCT	17511498	5253	38.03	90.44
30-943303755	R99	TGATCACG+CTTCGCCT	16221581	4866	37.47	87.83
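
The overall statistics can be re-derived from the per-sample table as a sanity check. Below is a minimal sketch, assuming the table above has been saved as a tab-separated file with its header row. The function name summarize_run and the file name second_run_stats.tsv are hypothetical and are not part of Azenta's report.

```shell
# summarize_run: hypothetical helper, not part of Azenta's deliverables.
# Input: tab-separated table with a header row, where column 4 is "# Reads"
# and column 5 is "Yield (Mbases)".
# Output: number of samples plus summed reads and yield.
summarize_run() {
    awk -F'\t' 'NR > 1 && NF >= 5 {
        n++            # one data row per sample
        reads += $4    # total reads
        yield += $5    # total yield in Mbases
    }
    END { printf "samples=%d reads=%d yield_mbases=%d\n", n, reads, yield }' "$1"
}
```

Running summarize_run second_run_stats.tsv on the 13 rows above should reproduce the reported totals of 13 samples, 145,090,062 reads, and 43,525 Mbases.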

I then added this data to the sample metadata and generated a column for total reads combined from first and second sequencing.

date	sample	larvae	temperature	symbiont	parent	phenotype	code	Barcode Sequence	# Reads	Yield (Mbases)	Mean Quality Score	% Bases >= 30	ncbi_bioproject	ncbi_accession	resequenced	second_barcode	second_reads	second_yield	second_mean_quality	second_bases_30	total_reads
20230627	R100	50	33	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_33	TGATCACG+TAAGATTA	34566278	10370	37.79	89.32	PRJNA1078313	SAMN40082680							34566278
20230627	R101	50	33	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_33	TGATCACG+ACGTCCTG	38457421	11538	37.79	89.29	PRJNA1078313	SAMN40082681							38457421
20230627	R102	50	33	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_33	TGATCACG+GTCAGTAC	33729404	10119	37.79	89.27	PRJNA1078313	SAMN40082682							33729404
20230627	R103	50	33	Wildtype	Wildtype	Wildtype	Wildtype_33	AAGCGACT+AGGCTATA	21980396	6594	38.01	90.36	PRJNA1078313	SAMN40082683							21980396
20230627	R104	50	33	Wildtype	Wildtype	Wildtype	Wildtype_33	AAGCGACT+GCCTCTAT	30634968	9191	37.91	89.86	PRJNA1078313	SAMN40082684							30634968
20230627	R105	50	33	Wildtype	Wildtype	Wildtype	Wildtype_33	AAGCGACT+AGGATAGG	29434310	8830	37.84	89.51	PRJNA1078313	SAMN40082685							29434310
20230627	R106	50	33	Wildtype	Wildtype	Wildtype	Wildtype_33	AAGCGACT+TCAGAGCC	34929945	10479	37.94	89.97	PRJNA1078313	SAMN40082686							34929945
20230627	R107	50	33	Wildtype	Wildtype	Wildtype	Wildtype_33	AAGCGACT+CTTCGCCT	6719567	2016	38.15	90.98	PRJNA1078313	SAMN40082687	yes	AAGCGACT+CTTCGCCT	17533977	5260	38	90.31	24253544
20230627	R108	50	33	Wildtype	Wildtype	Wildtype	Wildtype_33	AAGCGACT+TAAGATTA	29304668	8791	37.95	90.04	PRJNA1078313	SAMN40082688							29304668
20230627	R55	50	27	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_27	TTCCTCCT+AGGCTATA	15818670	4746	37.8	89.36	PRJNA1078313	SAMN40082635	yes	TTCCTCCT+AGGCTATA	2876549	863	38.01	90.43	18695219
20230627	R56	50	27	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_27	TTCCTCCT+GCCTCTAT	17457755	5238	37.58	88.32	PRJNA1078313	SAMN40082636	yes	TTCCTCCT+GCCTCTAT	1661094	498	37.94	90.06	19118849
20230627	R57	50	27	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_27	TTCCTCCT+AGGATAGG	15232862	4570	37.7	88.88	PRJNA1078313	SAMN40082637	yes	TTCCTCCT+AGGATAGG	3287827	986	38.08	90.73	18520689
20230627	R58	50	27	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_27	TTCCTCCT+TCAGAGCC	17987445	5396	37.73	89.03	PRJNA1078313	SAMN40082638	yes	TTCCTCCT+TCAGAGCC	1373160	412	38.09	90.78	19360605
20230627	R59	50	27	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_27	TTCCTCCT+CTTCGCCT	6002991	1801	37.96	90.11	PRJNA1078313	SAMN40082639	yes	TTCCTCCT+CTTCGCCT	33684113	10105	37.96	90.15	39687104
20230627	R60	50	27	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_27	TTCCTCCT+TAAGATTA	16284649	4885	37.63	88.54	PRJNA1078313	SAMN40082640	yes	TTCCTCCT+TAAGATTA	2530567	759	38	90.36	18815216
20230627	R61	50	27	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_27	TTCCTCCT+ACGTCCTG	19891057	5967	37.73	89.05	PRJNA1078313	SAMN40082641							19891057
20230627	R62	50	27	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_27	TTCCTCCT+GTCAGTAC	16675688	5003	37.64	88.59	PRJNA1078313	SAMN40082642	yes	TTCCTCCT+GTCAGTAC	2313143	694	38.01	90.39	18988831
20230627	R63	50	27	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_27	TGCTTGCT+AGGCTATA	23828671	7149	38.01	90.33	PRJNA1078313	SAMN40082643							23828671
20230627	R64	50	27	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_27	TGCTTGCT+GCCTCTAT	32277087	9683	37.93	89.96	PRJNA1078313	SAMN40082644							32277087
20230627	R65	50	27	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_27	TGCTTGCT+AGGATAGG	32099169	9630	37.78	89.27	PRJNA1078313	SAMN40082645							32099169
20230627	R66	50	27	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_27	TGCTTGCT+TCAGAGCC	32272569	9682	37.96	90.08	PRJNA1078313	SAMN40082646							32272569
20230627	R67	50	27	Wildtype	Wildtype	Wildtype	Wildtype_27	TGCTTGCT+CTTCGCCT	7665773	2300	38.07	90.62	PRJNA1078313	SAMN40082647	yes	TGCTTGCT+CTTCGCCT	17430173	5229	37.9	89.84	25095946
20230627	R68	50	27	Wildtype	Wildtype	Wildtype	Wildtype_27	TGCTTGCT+TAAGATTA	30934325	9280	37.92	89.89	PRJNA1078313	SAMN40082648							30934325
20230627	R69	50	27	Wildtype	Wildtype	Wildtype	Wildtype_27	TGCTTGCT+ACGTCCTG	39517309	11856	37.9	89.77	PRJNA1078313	SAMN40082649							39517309
20230627	R70	50	27	Wildtype	Wildtype	Wildtype	Wildtype_27	TGCTTGCT+GTCAGTAC	34280767	10284	37.88	89.7	PRJNA1078313	SAMN40082650							34280767
20230627	R71	50	27	Wildtype	Wildtype	Wildtype	Wildtype_27	GGTGATGA+AGGCTATA	27838736	8352	37.99	90.27	PRJNA1078313	SAMN40082651							27838736
20230627	R72	50	27	Wildtype	Wildtype	Wildtype	Wildtype_27	GGTGATGA+GCCTCTAT	33950055	10185	37.81	89.34	PRJNA1078313	SAMN40082652							33950055
20230627	R73	50	30	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_30	GGTGATGA+AGGATAGG	35173243	10552	37.88	89.71	PRJNA1078313	SAMN40082653							35173243
20230627	R74	50	30	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_30	GGTGATGA+TCAGAGCC	34922263	10476	37.96	90.07	PRJNA1078313	SAMN40082654							34922263
20230627	R75	50	30	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_30	GGTGATGA+CTTCGCCT	9871112	2961	38.12	90.85	PRJNA1078313	SAMN40082655	yes	GGTGATGA+CTTCGCCT	11054008	3316	37.99	90.27	20925120
20230627	R76	50	30	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_30	GGTGATGA+TAAGATTA	33036284	9911	37.95	90.02	PRJNA1078313	SAMN40082656							33036284
20230627	R77	50	30	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_30	GGTGATGA+ACGTCCTG	39746840	11924	37.98	90.17	PRJNA1078313	SAMN40082657							39746840
20230627	R78	50	30	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_30	GGTGATGA+GTCAGTAC	35398354	10619	37.86	89.61	PRJNA1078313	SAMN40082658							35398354
20230627	R79	50	30	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_30	AACCTACG+AGGCTATA	25009225	7502	37.93	89.95	PRJNA1078313	SAMN40082659							25009225
20230627	R80	50	30	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_30	AACCTACG+GCCTCTAT	36133089	10840	37.85	89.53	PRJNA1078313	SAMN40082660							36133089
20230627	R81	50	30	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_30	AACCTACG+AGGATAGG	33493354	10048	37.78	89.2	PRJNA1078313	SAMN40082661							33493354
20230627	R82	50	30	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_30	AACCTACG+TCAGAGCC	35452563	10636	37.92	89.85	PRJNA1078313	SAMN40082662							35452563
20230627	R83	50	30	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_30	AACCTACG+CTTCGCCT	7497048	2249	38.06	90.57	PRJNA1078313	SAMN40082663	yes	AACCTACG+CTTCGCCT	17612372	5284	37.76	89.18	25109420
20230627	R84	50	30	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_30	AACCTACG+TAAGATTA	32220467	9666	37.92	89.86	PRJNA1078313	SAMN40082664							32220467
20230627	R85	50	30	Wildtype	Wildtype	Wildtype	Wildtype_30	AACCTACG+ACGTCCTG	36878406	11063	37.84	89.46	PRJNA1078313	SAMN40082665							36878406
20230627	R86	50	30	Wildtype	Wildtype	Wildtype	Wildtype_30	AACCTACG+GTCAGTAC	32691940	9808	37.76	89.14	PRJNA1078313	SAMN40082666							32691940
20230627	R87	50	30	Wildtype	Wildtype	Wildtype	Wildtype_30	GGATCTGA+AGGCTATA	24408673	7323	38.09	90.72	PRJNA1078313	SAMN40082667							24408673
20230627	R88	50	30	Wildtype	Wildtype	Wildtype	Wildtype_30	GGATCTGA+GCCTCTAT	30584889	9175	37.85	89.56	PRJNA1078313	SAMN40082668							30584889
20230627	R89	50	30	Wildtype	Wildtype	Wildtype	Wildtype_30	GGATCTGA+AGGATAGG	33121528	9936	37.87	89.65	PRJNA1078313	SAMN40082669							33121528
20230627	R90	50	30	Wildtype	Wildtype	Wildtype	Wildtype_30	GGATCTGA+TCAGAGCC	31879658	9564	38.03	90.37	PRJNA1078313	SAMN40082670							31879658
20230627	R91	50	33	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_33	GGATCTGA+CTTCGCCT	6934253	2080	38.01	90.36	PRJNA1078313	SAMN40082671	yes	GGATCTGA+CTTCGCCT	17511498	5253	38.03	90.44	24445751
20230627	R92	50	33	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_33	GGATCTGA+TAAGATTA	28565956	8570	37.93	89.93	PRJNA1078313	SAMN40082672							28565956
20230627	R93	50	33	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_33	GGATCTGA+ACGTCCTG	31620062	9486	38.02	90.37	PRJNA1078313	SAMN40082673							31620062
20230627	R94	50	33	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_33	GGATCTGA+GTCAGTAC	32531274	9759	37.87	89.62	PRJNA1078313	SAMN40082674							32531274
20230627	R95	50	33	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_33	TGATCACG+AGGCTATA	27375934	8213	37.92	89.92	PRJNA1078313	SAMN40082675							27375934
20230627	R96	50	33	Cladocopium	Bleached	Bleached_Cladocopium	Bleached_Cladocopium_33	TGATCACG+GCCTCTAT	35882529	10765	37.78	89.22	PRJNA1078313	SAMN40082676							35882529
20230627	R97	50	33	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_33	TGATCACG+AGGATAGG	33376891	10013	37.77	89.17	PRJNA1078313	SAMN40082677							33376891
20230627	R98	50	33	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_33	TGATCACG+TCAGAGCC	35899983	10770	37.86	89.63	PRJNA1078313	SAMN40082678							35899983
20230627	R99	50	33	Mixed	Nonbleached	Nonbleached_Mixed	Nonbleached_Mixed_33	TGATCACG+CTTCGCCT	8420607	2526	37.83	89.57	PRJNA1078313	SAMN40082679	yes	TGATCACG+CTTCGCCT	16221581	4866	37.47	87.83	24642188

All samples now have >18M reads, ranging from ~18.5M to ~39.7M. This is great! The top off increased read depth for the 13 resequenced samples, which previously had <20M reads each.
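
The total_reads column is the first-round read count plus the top off read count, with an empty top off value treated as zero. Below is a minimal sketch of that arithmetic, assuming a simplified three-column tab-separated file (sample, first-round reads, second-round reads) rather than the full metadata sheet. The function names add_total_reads and low_depth are hypothetical.

```shell
# add_total_reads: hypothetical helper.
# Input: tab-separated file with columns sample, first_reads, second_reads
# (second_reads may be empty for samples that were not resequenced).
# Output: the same columns plus a fourth column holding the sum.
add_total_reads() {
    awk -F'\t' -v OFS='\t' '{ print $1, $2, $3, $2 + $3 }' "$1"
}

# low_depth: hypothetical helper.
# Reads the output of add_total_reads on stdin and prints the samples
# whose total (column 4) falls below the threshold given as $1.
low_depth() {
    awk -F'\t' -v min="$1" '$4 + 0 < min + 0 { print $1 }'
}
```

For example, add_total_reads reads.tsv | low_depth 18000000 would list any sample still under 18M total reads. For R107, the sum is 6,719,567 + 17,533,977 = 24,253,544, matching the table.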

Download sequences to URI server

Prepare folders in Andromeda directory.

#logged into Andromeda 
cd /data/putnamlab/ashuffmyer
cd mcap-2023-rnaseq
cd raw-sequences
mkdir second_sequencing

Now the directory that I want sequences in is /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing.

# in andromeda second_sequencing folder 

pwd 
/data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing

# log into Azenta sftp while logged into Andromeda as directed by Azenta and navigate to sequence folder 00_fastq

#set directory for download
lcd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing

#download all files in sequence folder 
mget *

Downloaded to URI Andromeda on April 12 2024.

I then checked for data integrity using md5 checksums.

md5sum *.fastq.gz > checkmd5_20240412.md5
md5sum -c checkmd5_20240412.md5

Output was as follows:

R107_R1_001.fastq.gz: OK
R107_R2_001.fastq.gz: OK
R55_R1_001.fastq.gz: OK
R55_R2_001.fastq.gz: OK
R56_R1_001.fastq.gz: OK
R56_R2_001.fastq.gz: OK
R57_R1_001.fastq.gz: OK
R57_R2_001.fastq.gz: OK
R58_R1_001.fastq.gz: OK
R58_R2_001.fastq.gz: OK
R59_R1_001.fastq.gz: OK
R59_R2_001.fastq.gz: OK
R60_R1_001.fastq.gz: OK
R60_R2_001.fastq.gz: OK
R62_R1_001.fastq.gz: OK
R62_R2_001.fastq.gz: OK
R67_R1_001.fastq.gz: OK
R67_R2_001.fastq.gz: OK
R75_R1_001.fastq.gz: OK
R75_R2_001.fastq.gz: OK
R83_R1_001.fastq.gz: OK
R83_R2_001.fastq.gz: OK
R91_R1_001.fastq.gz: OK
R91_R2_001.fastq.gz: OK
R99_R1_001.fastq.gz: OK
R99_R2_001.fastq.gz: OK

Azenta provided a .md5 file for each sequence file. I then compared the generated checksums to these originals to confirm that the file contents are unchanged after the transfer from Azenta.

#bind together all .md5 files provided by Azenta 

cat *.gz.md5 > azenta_original_checksums_second.md5

This provided the following list of the original checksums:

less azenta_original_checksums_second.md5

6ee514bdc1bacfb591629a3edf82bcd4  ./R107_R1_001.fastq.gz
3d6b00afc053f2eb770f83c5830f699e  ./R107_R2_001.fastq.gz
16aba2b715ceeb2c2f3a79dbc855e768  ./R55_R1_001.fastq.gz
14e32d9fccd55fd81a3468a4fd5766d1  ./R55_R2_001.fastq.gz
7fbb16f6f920e4a58431cd91e14b912b  ./R56_R1_001.fastq.gz
80bd51fd98400da3fb91b1182252f577  ./R56_R2_001.fastq.gz
aa6b3bd9617c10ca9b504309cc1ff047  ./R57_R1_001.fastq.gz
c52e1d5d9c7caa2189eaedf81944dfc7  ./R57_R2_001.fastq.gz
027601dabbfc6e48829efe5b931e6e05  ./R58_R1_001.fastq.gz
b1e2605ec7c35ad3c25ad7ab03cb4d2f  ./R58_R2_001.fastq.gz
a62cfd57273efefcfc4076e66cb5da1c  ./R59_R1_001.fastq.gz
dfa3021694fd5ddc01c387eafe34a0d9  ./R59_R2_001.fastq.gz
ba4c0c4c8dede34d546ee7ec5293553b  ./R60_R1_001.fastq.gz
dbf68528a9ee3428d378275c83458b24  ./R60_R2_001.fastq.gz
9ae8dc138d0df94197388b9da117f658  ./R62_R1_001.fastq.gz
96e1b11cb6e60415dc6f69fb4849651e  ./R62_R2_001.fastq.gz
0d2b1896bf85c0800a3a3482f34c90ca  ./R67_R1_001.fastq.gz
cbb14c976024dfe72a7561415284c41a  ./R67_R2_001.fastq.gz
f16d7330659ba6e310ade50cdae10b8c  ./R75_R1_001.fastq.gz
9b9ab52837950fdd8108d22e29f9bc31  ./R75_R2_001.fastq.gz
429bf4b34de618d97ec2213e658477c2  ./R83_R1_001.fastq.gz
77cf8a4c7d2db0bf83280df1254b335a  ./R83_R2_001.fastq.gz
86064e95efa1d802371fbda68d2be0ca  ./R91_R1_001.fastq.gz
3e24c0fdad691020cd1140e3a90bea3f  ./R91_R2_001.fastq.gz
c117e340bd9bef1bc295b6e86f6bfcbb  ./R99_R1_001.fastq.gz
0ebb0438dde06d989cfd79a5616ef9a3  ./R99_R2_001.fastq.gz

Then, here is the md5 checksum of the downloaded data on Andromeda:

less checkmd5_20240412.md5

6ee514bdc1bacfb591629a3edf82bcd4  R107_R1_001.fastq.gz
3d6b00afc053f2eb770f83c5830f699e  R107_R2_001.fastq.gz
16aba2b715ceeb2c2f3a79dbc855e768  R55_R1_001.fastq.gz
14e32d9fccd55fd81a3468a4fd5766d1  R55_R2_001.fastq.gz
7fbb16f6f920e4a58431cd91e14b912b  R56_R1_001.fastq.gz
80bd51fd98400da3fb91b1182252f577  R56_R2_001.fastq.gz
aa6b3bd9617c10ca9b504309cc1ff047  R57_R1_001.fastq.gz
c52e1d5d9c7caa2189eaedf81944dfc7  R57_R2_001.fastq.gz
027601dabbfc6e48829efe5b931e6e05  R58_R1_001.fastq.gz
b1e2605ec7c35ad3c25ad7ab03cb4d2f  R58_R2_001.fastq.gz
a62cfd57273efefcfc4076e66cb5da1c  R59_R1_001.fastq.gz
dfa3021694fd5ddc01c387eafe34a0d9  R59_R2_001.fastq.gz
ba4c0c4c8dede34d546ee7ec5293553b  R60_R1_001.fastq.gz
dbf68528a9ee3428d378275c83458b24  R60_R2_001.fastq.gz
9ae8dc138d0df94197388b9da117f658  R62_R1_001.fastq.gz
96e1b11cb6e60415dc6f69fb4849651e  R62_R2_001.fastq.gz
0d2b1896bf85c0800a3a3482f34c90ca  R67_R1_001.fastq.gz
cbb14c976024dfe72a7561415284c41a  R67_R2_001.fastq.gz
f16d7330659ba6e310ade50cdae10b8c  R75_R1_001.fastq.gz
9b9ab52837950fdd8108d22e29f9bc31  R75_R2_001.fastq.gz
429bf4b34de618d97ec2213e658477c2  R83_R1_001.fastq.gz
77cf8a4c7d2db0bf83280df1254b335a  R83_R2_001.fastq.gz
86064e95efa1d802371fbda68d2be0ca  R91_R1_001.fastq.gz
3e24c0fdad691020cd1140e3a90bea3f  R91_R2_001.fastq.gz
c117e340bd9bef1bc295b6e86f6bfcbb  R99_R1_001.fastq.gz
0ebb0438dde06d989cfd79a5616ef9a3  R99_R2_001.fastq.gz

I then added these lists to a spreadsheet and checked that the cells matched for original and downloaded files. The file is on GitHub here.

I also added in the checksums for data that I downloaded to Mox (detailed below). All files are confirmed and everything looks good!

file	original_azenta	downloaded_andromeda	downloaded_mox	match_andromeda	match_mox
R107_R1_001.fastq.gz	6ee514bdc1bacfb591629a3edf82bcd4	6ee514bdc1bacfb591629a3edf82bcd4	6ee514bdc1bacfb591629a3edf82bcd4	TRUE	TRUE
R107_R2_001.fastq.gz	3d6b00afc053f2eb770f83c5830f699e	3d6b00afc053f2eb770f83c5830f699e	3d6b00afc053f2eb770f83c5830f699e	TRUE	TRUE
R55_R1_001.fastq.gz	16aba2b715ceeb2c2f3a79dbc855e768	16aba2b715ceeb2c2f3a79dbc855e768	16aba2b715ceeb2c2f3a79dbc855e768	TRUE	TRUE
R55_R2_001.fastq.gz	14e32d9fccd55fd81a3468a4fd5766d1	14e32d9fccd55fd81a3468a4fd5766d1	14e32d9fccd55fd81a3468a4fd5766d1	TRUE	TRUE
R56_R1_001.fastq.gz	7fbb16f6f920e4a58431cd91e14b912b	7fbb16f6f920e4a58431cd91e14b912b	7fbb16f6f920e4a58431cd91e14b912b	TRUE	TRUE
R56_R2_001.fastq.gz	80bd51fd98400da3fb91b1182252f577	80bd51fd98400da3fb91b1182252f577	80bd51fd98400da3fb91b1182252f577	TRUE	TRUE
R57_R1_001.fastq.gz	aa6b3bd9617c10ca9b504309cc1ff047	aa6b3bd9617c10ca9b504309cc1ff047	aa6b3bd9617c10ca9b504309cc1ff047	TRUE	TRUE
R57_R2_001.fastq.gz	c52e1d5d9c7caa2189eaedf81944dfc7	c52e1d5d9c7caa2189eaedf81944dfc7	c52e1d5d9c7caa2189eaedf81944dfc7	TRUE	TRUE
R58_R1_001.fastq.gz	027601dabbfc6e48829efe5b931e6e05	027601dabbfc6e48829efe5b931e6e05	027601dabbfc6e48829efe5b931e6e05	TRUE	TRUE
R58_R2_001.fastq.gz	b1e2605ec7c35ad3c25ad7ab03cb4d2f	b1e2605ec7c35ad3c25ad7ab03cb4d2f	b1e2605ec7c35ad3c25ad7ab03cb4d2f	TRUE	TRUE
R59_R1_001.fastq.gz	a62cfd57273efefcfc4076e66cb5da1c	a62cfd57273efefcfc4076e66cb5da1c	a62cfd57273efefcfc4076e66cb5da1c	TRUE	TRUE
R59_R2_001.fastq.gz	dfa3021694fd5ddc01c387eafe34a0d9	dfa3021694fd5ddc01c387eafe34a0d9	dfa3021694fd5ddc01c387eafe34a0d9	TRUE	TRUE
R60_R1_001.fastq.gz	ba4c0c4c8dede34d546ee7ec5293553b	ba4c0c4c8dede34d546ee7ec5293553b	ba4c0c4c8dede34d546ee7ec5293553b	TRUE	TRUE
R60_R2_001.fastq.gz	dbf68528a9ee3428d378275c83458b24	dbf68528a9ee3428d378275c83458b24	dbf68528a9ee3428d378275c83458b24	TRUE	TRUE
R62_R1_001.fastq.gz	9ae8dc138d0df94197388b9da117f658	9ae8dc138d0df94197388b9da117f658	9ae8dc138d0df94197388b9da117f658	TRUE	TRUE
R62_R2_001.fastq.gz	96e1b11cb6e60415dc6f69fb4849651e	96e1b11cb6e60415dc6f69fb4849651e	96e1b11cb6e60415dc6f69fb4849651e	TRUE	TRUE
R67_R1_001.fastq.gz	0d2b1896bf85c0800a3a3482f34c90ca	0d2b1896bf85c0800a3a3482f34c90ca	0d2b1896bf85c0800a3a3482f34c90ca	TRUE	TRUE
R67_R2_001.fastq.gz	cbb14c976024dfe72a7561415284c41a	cbb14c976024dfe72a7561415284c41a	cbb14c976024dfe72a7561415284c41a	TRUE	TRUE
R75_R1_001.fastq.gz	f16d7330659ba6e310ade50cdae10b8c	f16d7330659ba6e310ade50cdae10b8c	f16d7330659ba6e310ade50cdae10b8c	TRUE	TRUE
R75_R2_001.fastq.gz	9b9ab52837950fdd8108d22e29f9bc31	9b9ab52837950fdd8108d22e29f9bc31	9b9ab52837950fdd8108d22e29f9bc31	TRUE	TRUE
R83_R1_001.fastq.gz	429bf4b34de618d97ec2213e658477c2	429bf4b34de618d97ec2213e658477c2	429bf4b34de618d97ec2213e658477c2	TRUE	TRUE
R83_R2_001.fastq.gz	77cf8a4c7d2db0bf83280df1254b335a	77cf8a4c7d2db0bf83280df1254b335a	77cf8a4c7d2db0bf83280df1254b335a	TRUE	TRUE
R91_R1_001.fastq.gz	86064e95efa1d802371fbda68d2be0ca	86064e95efa1d802371fbda68d2be0ca	86064e95efa1d802371fbda68d2be0ca	TRUE	TRUE
R91_R2_001.fastq.gz	3e24c0fdad691020cd1140e3a90bea3f	3e24c0fdad691020cd1140e3a90bea3f	3e24c0fdad691020cd1140e3a90bea3f	TRUE	TRUE
R99_R1_001.fastq.gz	c117e340bd9bef1bc295b6e86f6bfcbb	c117e340bd9bef1bc295b6e86f6bfcbb	c117e340bd9bef1bc295b6e86f6bfcbb	TRUE	TRUE
R99_R2_001.fastq.gz	0ebb0438dde06d989cfd79a5616ef9a3	0ebb0438dde06d989cfd79a5616ef9a3	0ebb0438dde06d989cfd79a5616ef9a3	TRUE	TRUE

Data are now downloaded and integrity confirmed on URI Andromeda.
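
Note that running md5sum -c against a checksum file generated moments earlier from the same local files only confirms that the files are readable. The comparison against Azenta's values is the real integrity test. That comparison can also be done on the command line rather than in a spreadsheet. Below is a minimal sketch, assuming GNU md5sum and that the combined Azenta checksum file sits in the same directory as the sequence files, so that its ./ relative paths resolve. The function name verify_against_vendor is hypothetical.

```shell
# verify_against_vendor: hypothetical helper name.
# $1 = directory holding the downloaded sequence files
# $2 = vendor-supplied checksum file, given relative to that directory
# Prints only failures (--quiet) and returns non-zero if any file is
# missing or its checksum differs from the vendor's value.
verify_against_vendor() {
    ( cd "$1" && md5sum -c --quiet "$2" )
}
```

A call such as verify_against_vendor /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing azenta_original_checksums_second.md5 prints nothing and exits with status 0 when every file matches.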

Download data to UW Hyak/Mox

#logged into UW Hyak/Mox
cd /gscratch/srlab/ashuff/mcap-2023-rnaseq
mkdir second_sequencing
cd second_sequencing

The full directory where I want raw sequences to go is /gscratch/srlab/ashuff/mcap-2023-rnaseq/second_sequencing .

# in Hyak ashuffm folder 

# log into Azenta sftp as directed by Azenta 
# cd into my project folder

#set directory for download
lcd /gscratch/srlab/ashuff/mcap-2023-rnaseq/second_sequencing

#download all files into project folder 

mget *

Downloaded on 13 April 2024.

I then checked for data integrity using md5 checksums. Azenta provided a .md5 file for each sequence file. See the table above for confirmation of md5 checksums.

srun -p srlab -A srlab --time=1:00:00 --mem=100G --pty /bin/bash

md5sum *.fastq.gz > checkmd5_20240413.md5

md5sum -c checkmd5_20240413.md5  

Checksums from data downloaded on Mox are here:

less checkmd5_20240413.md5

6ee514bdc1bacfb591629a3edf82bcd4  R107_R1_001.fastq.gz
3d6b00afc053f2eb770f83c5830f699e  R107_R2_001.fastq.gz
16aba2b715ceeb2c2f3a79dbc855e768  R55_R1_001.fastq.gz
14e32d9fccd55fd81a3468a4fd5766d1  R55_R2_001.fastq.gz
7fbb16f6f920e4a58431cd91e14b912b  R56_R1_001.fastq.gz
80bd51fd98400da3fb91b1182252f577  R56_R2_001.fastq.gz
aa6b3bd9617c10ca9b504309cc1ff047  R57_R1_001.fastq.gz
c52e1d5d9c7caa2189eaedf81944dfc7  R57_R2_001.fastq.gz
027601dabbfc6e48829efe5b931e6e05  R58_R1_001.fastq.gz
b1e2605ec7c35ad3c25ad7ab03cb4d2f  R58_R2_001.fastq.gz
a62cfd57273efefcfc4076e66cb5da1c  R59_R1_001.fastq.gz
dfa3021694fd5ddc01c387eafe34a0d9  R59_R2_001.fastq.gz
ba4c0c4c8dede34d546ee7ec5293553b  R60_R1_001.fastq.gz
dbf68528a9ee3428d378275c83458b24  R60_R2_001.fastq.gz
9ae8dc138d0df94197388b9da117f658  R62_R1_001.fastq.gz
96e1b11cb6e60415dc6f69fb4849651e  R62_R2_001.fastq.gz
0d2b1896bf85c0800a3a3482f34c90ca  R67_R1_001.fastq.gz
cbb14c976024dfe72a7561415284c41a  R67_R2_001.fastq.gz
f16d7330659ba6e310ade50cdae10b8c  R75_R1_001.fastq.gz
9b9ab52837950fdd8108d22e29f9bc31  R75_R2_001.fastq.gz
429bf4b34de618d97ec2213e658477c2  R83_R1_001.fastq.gz
77cf8a4c7d2db0bf83280df1254b335a  R83_R2_001.fastq.gz
86064e95efa1d802371fbda68d2be0ca  R91_R1_001.fastq.gz
3e24c0fdad691020cd1140e3a90bea3f  R91_R2_001.fastq.gz
c117e340bd9bef1bc295b6e86f6bfcbb  R99_R1_001.fastq.gz
0ebb0438dde06d989cfd79a5616ef9a3  R99_R2_001.fastq.gz

Output from checksums is here:

R107_R1_001.fastq.gz: OK
R107_R2_001.fastq.gz: OK
R55_R1_001.fastq.gz: OK
R55_R2_001.fastq.gz: OK
R56_R1_001.fastq.gz: OK
R56_R2_001.fastq.gz: OK
R57_R1_001.fastq.gz: OK
R57_R2_001.fastq.gz: OK
R58_R1_001.fastq.gz: OK
R58_R2_001.fastq.gz: OK
R59_R1_001.fastq.gz: OK
R59_R2_001.fastq.gz: OK
R60_R1_001.fastq.gz: OK
R60_R2_001.fastq.gz: OK
R62_R1_001.fastq.gz: OK
R62_R2_001.fastq.gz: OK
R67_R1_001.fastq.gz: OK
R67_R2_001.fastq.gz: OK
R75_R1_001.fastq.gz: OK
R75_R2_001.fastq.gz: OK
R83_R1_001.fastq.gz: OK
R83_R2_001.fastq.gz: OK
R91_R1_001.fastq.gz: OK
R91_R2_001.fastq.gz: OK
R99_R1_001.fastq.gz: OK
R99_R2_001.fastq.gz: OK

Everything looks good and all data files were transferred correctly.

Files are now stored on both URI and UW servers.

2. Rename files to indicate they are from the second sequencing

Add an “s” after the sample ID to indicate that samples are from the second round of sequencing, so that they can be uploaded to NCBI and will remain distinct from the first round of sequencing.

I am performing this in Andromeda, because this is where I am going to conduct analyses and upload to NCBI.

cd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing

for file in *; do
    # Check if the file name contains ".gz" and an underscore
    if [[ $file == *".gz"* && $file == *_* ]]; then
        # Extract the part of the file name before the first underscore
        prefix="${file%%_*}"
        # Extract the part of the file name after the first underscore
        suffix="${file#*_}"
        # Rename the file by adding "s" after the prefix
        mv "$file" "${prefix}s_$suffix"
        echo "Renamed: $file to ${prefix}s_$suffix"
    fi
done

This changed the files to the following names:

azenta_original_checksums_second.md5  R60s_R1_001.fastq.gz.md5
checkmd5_20240412.md5                 R60s_R2_001.fastq.gz
md5_files                             R60s_R2_001.fastq.gz.md5
R107s_R1_001.fastq.gz                 R62s_R1_001.fastq.gz
R107s_R1_001.fastq.gz.md5             R62s_R1_001.fastq.gz.md5
R107s_R2_001.fastq.gz                 R62s_R2_001.fastq.gz
R107s_R2_001.fastq.gz.md5             R62s_R2_001.fastq.gz.md5
R55s_R1_001.fastq.gz                  R67s_R1_001.fastq.gz
R55s_R1_001.fastq.gz.md5              R67s_R1_001.fastq.gz.md5
R55s_R2_001.fastq.gz                  R67s_R2_001.fastq.gz
R55s_R2_001.fastq.gz.md5              R67s_R2_001.fastq.gz.md5
R56s_R1_001.fastq.gz                  R75s_R1_001.fastq.gz
R56s_R1_001.fastq.gz.md5              R75s_R1_001.fastq.gz.md5
R56s_R2_001.fastq.gz                  R75s_R2_001.fastq.gz
R56s_R2_001.fastq.gz.md5              R75s_R2_001.fastq.gz.md5
R57s_R1_001.fastq.gz                  R83s_R1_001.fastq.gz
R57s_R1_001.fastq.gz.md5              R83s_R1_001.fastq.gz.md5
R57s_R2_001.fastq.gz                  R83s_R2_001.fastq.gz
R57s_R2_001.fastq.gz.md5              R83s_R2_001.fastq.gz.md5
R58s_R1_001.fastq.gz                  R91s_R1_001.fastq.gz
R58s_R1_001.fastq.gz.md5              R91s_R1_001.fastq.gz.md5
R58s_R2_001.fastq.gz                  R91s_R2_001.fastq.gz
R58s_R2_001.fastq.gz.md5              R91s_R2_001.fastq.gz.md5
R59s_R1_001.fastq.gz                  R99s_R1_001.fastq.gz
R59s_R1_001.fastq.gz.md5              R99s_R1_001.fastq.gz.md5
R59s_R2_001.fastq.gz                  R99s_R2_001.fastq.gz
R59s_R2_001.fastq.gz.md5              R99s_R2_001.fastq.gz.md5
R60s_R1_001.fastq.gz                  raw_fastqc

The files that were sequenced in the second round now contain “s” after the sample name.
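
The rename loop changes file names but not file contents, so the names listed inside the checksum files (for example checkmd5_20240412.md5) still refer to the original names, and md5sum -c would report those files as missing. If the checksum files are to be reused after renaming, the names inside them need the same “s”. Below is a minimal sketch, assuming sample IDs of the form R followed by digits. The function name add_s_to_checksum_names and the output file name are hypothetical.

```shell
# add_s_to_checksum_names: hypothetical helper.
# Rewrites the file-name column of an md5 checksum file so that
# "R107_R1_001.fastq.gz" becomes "R107s_R1_001.fastq.gz".
# Handles names with or without a leading "./" and leaves hashes untouched.
# Writes the result to stdout; the input file is not modified.
add_s_to_checksum_names() {
    sed -E 's#(  (\./)?R[0-9]+)_#\1s_#' "$1"
}
```

For example, add_s_to_checksum_names checkmd5_20240412.md5 > checkmd5_renamed.md5 produces a checksum file that can be verified against the renamed sequences.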

3. Upload to NCBI project

Project ID: PRJNA1078313

I used the same information on uploading for this project detailed in my previous post and the Putnam Lab SRA protocol.

The sample metadata can be found on GitHub here, using the MIMS.me.host-associated template here. Sequencing information and metadata are located on GitHub here.

I will upload these files as a new SRA submission and link them to the existing bioproject and biosamples.

I started a new SRA submission process and entered the BioProject number PRJNA1078313. I added a release date of March 1 2025.

I created a new SRA metadata sheet for the samples included in this upload. I added the description “Standard mRNA-seq from libraries prepared using Zymo Quick MiniPrep Plus DNA/RNA extraction kit at the Putnam Lab (University of Rhode Island) with sequencing at Azenta Life Sciences. Additional top off sequencing performed by Azenta Life Sciences with additional sequencing files denoted with ‘s’ following sample name prefix.”

The metadata sheet for these files can be found on GitHub here.

Transfer data from URI server to NCBI SRA

I next followed FTP instructions to upload top off sequencing files.

First, log onto Andromeda, make a folder for this upload, and symlink all files.

cd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload
mkdir 20240416_upload
cd 20240416_upload

ln -s /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing/*gz /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload/20240416_upload
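
Before uploading, it can be worth confirming that the symlinks were created and that none are broken. With 13 samples and paired reads, 26 links are expected. Below is a minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find both do). The function name check_links is hypothetical.

```shell
# check_links: hypothetical helper.
# $1 = directory containing the symlinks
# $2 = expected number of .fastq.gz symlinks
# Prints the counts and returns non-zero if the number of links is
# unexpected or if any symlink points at a missing target.
check_links() {
    n=$(find "$1" -maxdepth 1 -type l -name '*.fastq.gz' | wc -l)
    # "test -e" follows the link, so it fails for broken symlinks
    broken=$(find "$1" -maxdepth 1 -type l ! -exec test -e {} \; -print | wc -l)
    echo "links=$((n)) broken=$((broken))"
    [ "$((n))" -eq "$2" ] && [ "$((broken))" -eq 0 ]
}
```

For this upload the call would be check_links /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload/20240416_upload 26.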

Next, upload files using FTP instructions.

cd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload/20240416_upload

ftp -i 

open ftp-private.ncbi.nlm.nih.gov

#enter name and password given on SRA webpage

cd uploads/ashuffmyer_gmail.com_bsKvx0RY

mkdir mcap-2023-rnaseq-upload-20240416

cd mcap-2023-rnaseq-upload-20240416

mput *

The upload to SRA will proceed file by file, with a “transfer complete” message as each one finishes. Keep the computer active until all uploads are finished.

Files successfully uploaded.

Continue with the submission by selecting the preload folder on SRA once all 26 files are registered.
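
The count of 26 comes from 13 samples with one R1 and one R2 file each. A quick local check that every R1 file has its R2 mate can catch an incomplete set before the SRA metadata sheet is submitted. Below is a minimal sketch, assuming the _R1_001.fastq.gz and _R2_001.fastq.gz naming used in this post. The function name missing_mates is hypothetical.

```shell
# missing_mates: hypothetical helper.
# $1 = directory holding the paired-end files (or symlinks to them)
# Prints the name of every R1 file that has no matching R2 file;
# prints nothing when all pairs are complete.
missing_mates() {
    for r1 in "$1"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue    # skip the literal pattern if nothing matched
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        [ -e "$r2" ] || basename "$r1"
    done
}
```

Empty output from missing_mates on the upload folder indicates that every sample has both reads present.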

RNA-Seq sequence files were submitted under SRA SUB14381642.

All information added to the Putnam Lab sequence inventory here.

Written on April 12, 2024