Raw sequence downloads and NCBI SRA uploads for M. capitata larval thermal tolerance project top off sequencing second upload
This post details the NCBI Sequence Read Archive upload for my Montipora capitata 2023 larval thermal tolerance project. See my notebook posts and my GitHub repo for information on this project.
I previously uploaded and QC’d data from our first round of sequencing in this project with downloads detailed here and QC information here.
Due to low read depth in some samples (13 samples at <20M reads, 7 samples at <15M reads), Azenta performed top off sequencing to meet deliverables for this project. Azenta added additional sequencing for 13 total samples. These samples were distributed across treatments.
In the first round of sequencing (Feb 2024), samples were sequenced on one lane of a NovaX-25B instrument. The top off samples were also performed on one lane of the same instrument on 4/11/2024.
This post details downloads and SRA uploads for the new sequencing files.
1. Download top off data from Azenta
View result statistics and metadata
I used Azenta’s sFTP instructions to view the project results. In their online storage, they provided an .html with an overview of statistics and the R1 and R2 .fastq.gz files for each sample. There are also .md5 files for each .fastq.gz file.
sftp ashuff_uw@sftp.genewiz.com
#entered password from email
lcd MyProjects/larval_symbiont_TPC/data/rna_seq
cd 30-943303755
cd second_delivery
mget Azenta_30-943303755_Data_Report.html
I renamed this file to have “_second” on the title and retained the original file for the first round of sequencing with the file name Azenta_30-943303755_Data_Report.html
.
The file for the second sequencing report is on GitHub here.
Overall statistics
- Number of samples = 13
- number reads = 145,090,062
- Yield (Mbases) = 43,525
- Mean quality scores = 38.00
- % bases over 30 quality score = 90.00
More detailed statistics
I downloaded the report statistics available on GitHub here as well as the full report and added this data to the sample metadata on GitHub here.
- All samples had >37 mean quality scores
- Yeild (Mbases) ranged from ~500-10,000 per sample with an additional 1.6-33 million reads per sample.
- 89+% bases had quality scores over 30
Here is the summary of the second round of sequencing:
Project | Sample ID | Barcode Sequence | # Reads | Yield (Mbases) | Mean Quality Score | % Bases >= 30 |
---|---|---|---|---|---|---|
30-943303755 | R107 | AAGCGACT+CTTCGCCT | 17533977 | 5260 | 38 | 90.31 |
30-943303755 | R55 | TTCCTCCT+AGGCTATA | 2876549 | 863 | 38.01 | 90.43 |
30-943303755 | R56 | TTCCTCCT+GCCTCTAT | 1661094 | 498 | 37.94 | 90.06 |
30-943303755 | R57 | TTCCTCCT+AGGATAGG | 3287827 | 986 | 38.08 | 90.73 |
30-943303755 | R58 | TTCCTCCT+TCAGAGCC | 1373160 | 412 | 38.09 | 90.78 |
30-943303755 | R59 | TTCCTCCT+CTTCGCCT | 33684113 | 10105 | 37.96 | 90.15 |
30-943303755 | R60 | TTCCTCCT+TAAGATTA | 2530567 | 759 | 38 | 90.36 |
30-943303755 | R62 | TTCCTCCT+GTCAGTAC | 2313143 | 694 | 38.01 | 90.39 |
30-943303755 | R67 | TGCTTGCT+CTTCGCCT | 17430173 | 5229 | 37.9 | 89.84 |
30-943303755 | R75 | GGTGATGA+CTTCGCCT | 11054008 | 3316 | 37.99 | 90.27 |
30-943303755 | R83 | AACCTACG+CTTCGCCT | 17612372 | 5284 | 37.76 | 89.18 |
30-943303755 | R91 | GGATCTGA+CTTCGCCT | 17511498 | 5253 | 38.03 | 90.44 |
30-943303755 | R99 | TGATCACG+CTTCGCCT | 16221581 | 4866 | 37.47 | 87.83 |
I then added this data to the sample metadata and generated a column for total reads combined from first and second sequencing.
date | sample | larvae | temperature | symbiont | parent | phenotype | code | Barcode Sequence | # Reads | Yield (Mbases) | Mean Quality Score | % Bases >= 30 | ncbi_bioproject | ncbi_accession | resequenced | second_barcode | second_reads | second_yield | second_mean_quality | second_bases_30 | second_ncbi_bioproject | second_ncbi_accession | total_reads |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20230627 | R100 | 50 | 33 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_33 | TGATCACG+TAAGATTA | 34566278 | 10370 | 37.79 | 89.32 | PRJNA1078313 | SAMN40082680 | 34566278 | ||||||||
20230627 | R101 | 50 | 33 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_33 | TGATCACG+ACGTCCTG | 38457421 | 11538 | 37.79 | 89.29 | PRJNA1078313 | SAMN40082681 | 38457421 | ||||||||
20230627 | R102 | 50 | 33 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_33 | TGATCACG+GTCAGTAC | 33729404 | 10119 | 37.79 | 89.27 | PRJNA1078313 | SAMN40082682 | 33729404 | ||||||||
20230627 | R103 | 50 | 33 | Wildtype | Wildtype | Wildtype | Wildtype_33 | AAGCGACT+AGGCTATA | 21980396 | 6594 | 38.01 | 90.36 | PRJNA1078313 | SAMN40082683 | 21980396 | ||||||||
20230627 | R104 | 50 | 33 | Wildtype | Wildtype | Wildtype | Wildtype_33 | AAGCGACT+GCCTCTAT | 30634968 | 9191 | 37.91 | 89.86 | PRJNA1078313 | SAMN40082684 | 30634968 | ||||||||
20230627 | R105 | 50 | 33 | Wildtype | Wildtype | Wildtype | Wildtype_33 | AAGCGACT+AGGATAGG | 29434310 | 8830 | 37.84 | 89.51 | PRJNA1078313 | SAMN40082685 | 29434310 | ||||||||
20230627 | R106 | 50 | 33 | Wildtype | Wildtype | Wildtype | Wildtype_33 | AAGCGACT+TCAGAGCC | 34929945 | 10479 | 37.94 | 89.97 | PRJNA1078313 | SAMN40082686 | 34929945 | ||||||||
20230627 | R107 | 50 | 33 | Wildtype | Wildtype | Wildtype | Wildtype_33 | AAGCGACT+CTTCGCCT | 6719567 | 2016 | 38.15 | 90.98 | PRJNA1078313 | SAMN40082687 | yes | AAGCGACT+CTTCGCCT | 17533977 | 5260 | 38 | 90.31 | 24253544 | ||
20230627 | R108 | 50 | 33 | Wildtype | Wildtype | Wildtype | Wildtype_33 | AAGCGACT+TAAGATTA | 29304668 | 8791 | 37.95 | 90.04 | PRJNA1078313 | SAMN40082688 | 29304668 | ||||||||
20230627 | R55 | 50 | 27 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_27 | TTCCTCCT+AGGCTATA | 15818670 | 4746 | 37.8 | 89.36 | PRJNA1078313 | SAMN40082635 | yes | TTCCTCCT+AGGCTATA | 2876549 | 863 | 38.01 | 90.43 | 18695219 | ||
20230627 | R56 | 50 | 27 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_27 | TTCCTCCT+GCCTCTAT | 17457755 | 5238 | 37.58 | 88.32 | PRJNA1078313 | SAMN40082636 | yes | TTCCTCCT+GCCTCTAT | 1661094 | 498 | 37.94 | 90.06 | 19118849 | ||
20230627 | R57 | 50 | 27 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_27 | TTCCTCCT+AGGATAGG | 15232862 | 4570 | 37.7 | 88.88 | PRJNA1078313 | SAMN40082637 | yes | TTCCTCCT+AGGATAGG | 3287827 | 986 | 38.08 | 90.73 | 18520689 | ||
20230627 | R58 | 50 | 27 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_27 | TTCCTCCT+TCAGAGCC | 17987445 | 5396 | 37.73 | 89.03 | PRJNA1078313 | SAMN40082638 | yes | TTCCTCCT+TCAGAGCC | 1373160 | 412 | 38.09 | 90.78 | 19360605 | ||
20230627 | R59 | 50 | 27 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_27 | TTCCTCCT+CTTCGCCT | 6002991 | 1801 | 37.96 | 90.11 | PRJNA1078313 | SAMN40082639 | yes | TTCCTCCT+CTTCGCCT | 33684113 | 10105 | 37.96 | 90.15 | 39687104 | ||
20230627 | R60 | 50 | 27 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_27 | TTCCTCCT+TAAGATTA | 16284649 | 4885 | 37.63 | 88.54 | PRJNA1078313 | SAMN40082640 | yes | TTCCTCCT+TAAGATTA | 2530567 | 759 | 38 | 90.36 | 18815216 | ||
20230627 | R61 | 50 | 27 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_27 | TTCCTCCT+ACGTCCTG | 19891057 | 5967 | 37.73 | 89.05 | PRJNA1078313 | SAMN40082641 | 19891057 | ||||||||
20230627 | R62 | 50 | 27 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_27 | TTCCTCCT+GTCAGTAC | 16675688 | 5003 | 37.64 | 88.59 | PRJNA1078313 | SAMN40082642 | yes | TTCCTCCT+GTCAGTAC | 2313143 | 694 | 38.01 | 90.39 | 18988831 | ||
20230627 | R63 | 50 | 27 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_27 | TGCTTGCT+AGGCTATA | 23828671 | 7149 | 38.01 | 90.33 | PRJNA1078313 | SAMN40082643 | 23828671 | ||||||||
20230627 | R64 | 50 | 27 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_27 | TGCTTGCT+GCCTCTAT | 32277087 | 9683 | 37.93 | 89.96 | PRJNA1078313 | SAMN40082644 | 32277087 | ||||||||
20230627 | R65 | 50 | 27 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_27 | TGCTTGCT+AGGATAGG | 32099169 | 9630 | 37.78 | 89.27 | PRJNA1078313 | SAMN40082645 | 32099169 | ||||||||
20230627 | R66 | 50 | 27 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_27 | TGCTTGCT+TCAGAGCC | 32272569 | 9682 | 37.96 | 90.08 | PRJNA1078313 | SAMN40082646 | 32272569 | ||||||||
20230627 | R67 | 50 | 27 | Wildtype | Wildtype | Wildtype | Wildtype_27 | TGCTTGCT+CTTCGCCT | 7665773 | 2300 | 38.07 | 90.62 | PRJNA1078313 | SAMN40082647 | yes | TGCTTGCT+CTTCGCCT | 17430173 | 5229 | 37.9 | 89.84 | 25095946 | ||
20230627 | R68 | 50 | 27 | Wildtype | Wildtype | Wildtype | Wildtype_27 | TGCTTGCT+TAAGATTA | 30934325 | 9280 | 37.92 | 89.89 | PRJNA1078313 | SAMN40082648 | 30934325 | ||||||||
20230627 | R69 | 50 | 27 | Wildtype | Wildtype | Wildtype | Wildtype_27 | TGCTTGCT+ACGTCCTG | 39517309 | 11856 | 37.9 | 89.77 | PRJNA1078313 | SAMN40082649 | 39517309 | ||||||||
20230627 | R70 | 50 | 27 | Wildtype | Wildtype | Wildtype | Wildtype_27 | TGCTTGCT+GTCAGTAC | 34280767 | 10284 | 37.88 | 89.7 | PRJNA1078313 | SAMN40082650 | 34280767 | ||||||||
20230627 | R71 | 50 | 27 | Wildtype | Wildtype | Wildtype | Wildtype_27 | GGTGATGA+AGGCTATA | 27838736 | 8352 | 37.99 | 90.27 | PRJNA1078313 | SAMN40082651 | 27838736 | ||||||||
20230627 | R72 | 50 | 27 | Wildtype | Wildtype | Wildtype | Wildtype_27 | GGTGATGA+GCCTCTAT | 33950055 | 10185 | 37.81 | 89.34 | PRJNA1078313 | SAMN40082652 | 33950055 | ||||||||
20230627 | R73 | 50 | 30 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_30 | GGTGATGA+AGGATAGG | 35173243 | 10552 | 37.88 | 89.71 | PRJNA1078313 | SAMN40082653 | 35173243 | ||||||||
20230627 | R74 | 50 | 30 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_30 | GGTGATGA+TCAGAGCC | 34922263 | 10476 | 37.96 | 90.07 | PRJNA1078313 | SAMN40082654 | 34922263 | ||||||||
20230627 | R75 | 50 | 30 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_30 | GGTGATGA+CTTCGCCT | 9871112 | 2961 | 38.12 | 90.85 | PRJNA1078313 | SAMN40082655 | yes | GGTGATGA+CTTCGCCT | 11054008 | 3316 | 37.99 | 90.27 | 20925120 | ||
20230627 | R76 | 50 | 30 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_30 | GGTGATGA+TAAGATTA | 33036284 | 9911 | 37.95 | 90.02 | PRJNA1078313 | SAMN40082656 | 33036284 | ||||||||
20230627 | R77 | 50 | 30 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_30 | GGTGATGA+ACGTCCTG | 39746840 | 11924 | 37.98 | 90.17 | PRJNA1078313 | SAMN40082657 | 39746840 | ||||||||
20230627 | R78 | 50 | 30 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_30 | GGTGATGA+GTCAGTAC | 35398354 | 10619 | 37.86 | 89.61 | PRJNA1078313 | SAMN40082658 | 35398354 | ||||||||
20230627 | R79 | 50 | 30 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_30 | AACCTACG+AGGCTATA | 25009225 | 7502 | 37.93 | 89.95 | PRJNA1078313 | SAMN40082659 | 25009225 | ||||||||
20230627 | R80 | 50 | 30 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_30 | AACCTACG+GCCTCTAT | 36133089 | 10840 | 37.85 | 89.53 | PRJNA1078313 | SAMN40082660 | 36133089 | ||||||||
20230627 | R81 | 50 | 30 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_30 | AACCTACG+AGGATAGG | 33493354 | 10048 | 37.78 | 89.2 | PRJNA1078313 | SAMN40082661 | 33493354 | ||||||||
20230627 | R82 | 50 | 30 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_30 | AACCTACG+TCAGAGCC | 35452563 | 10636 | 37.92 | 89.85 | PRJNA1078313 | SAMN40082662 | 35452563 | ||||||||
20230627 | R83 | 50 | 30 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_30 | AACCTACG+CTTCGCCT | 7497048 | 2249 | 38.06 | 90.57 | PRJNA1078313 | SAMN40082663 | yes | AACCTACG+CTTCGCCT | 17612372 | 5284 | 37.76 | 89.18 | 25109420 | ||
20230627 | R84 | 50 | 30 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_30 | AACCTACG+TAAGATTA | 32220467 | 9666 | 37.92 | 89.86 | PRJNA1078313 | SAMN40082664 | 32220467 | ||||||||
20230627 | R85 | 50 | 30 | Wildtype | Wildtype | Wildtype | Wildtype_30 | AACCTACG+ACGTCCTG | 36878406 | 11063 | 37.84 | 89.46 | PRJNA1078313 | SAMN40082665 | 36878406 | ||||||||
20230627 | R86 | 50 | 30 | Wildtype | Wildtype | Wildtype | Wildtype_30 | AACCTACG+GTCAGTAC | 32691940 | 9808 | 37.76 | 89.14 | PRJNA1078313 | SAMN40082666 | 32691940 | ||||||||
20230627 | R87 | 50 | 30 | Wildtype | Wildtype | Wildtype | Wildtype_30 | GGATCTGA+AGGCTATA | 24408673 | 7323 | 38.09 | 90.72 | PRJNA1078313 | SAMN40082667 | 24408673 | ||||||||
20230627 | R88 | 50 | 30 | Wildtype | Wildtype | Wildtype | Wildtype_30 | GGATCTGA+GCCTCTAT | 30584889 | 9175 | 37.85 | 89.56 | PRJNA1078313 | SAMN40082668 | 30584889 | ||||||||
20230627 | R89 | 50 | 30 | Wildtype | Wildtype | Wildtype | Wildtype_30 | GGATCTGA+AGGATAGG | 33121528 | 9936 | 37.87 | 89.65 | PRJNA1078313 | SAMN40082669 | 33121528 | ||||||||
20230627 | R90 | 50 | 30 | Wildtype | Wildtype | Wildtype | Wildtype_30 | GGATCTGA+TCAGAGCC | 31879658 | 9564 | 38.03 | 90.37 | PRJNA1078313 | SAMN40082670 | 31879658 | ||||||||
20230627 | R91 | 50 | 33 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_33 | GGATCTGA+CTTCGCCT | 6934253 | 2080 | 38.01 | 90.36 | PRJNA1078313 | SAMN40082671 | yes | GGATCTGA+CTTCGCCT | 17511498 | 5253 | 38.03 | 90.44 | 24445751 | ||
20230627 | R92 | 50 | 33 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_33 | GGATCTGA+TAAGATTA | 28565956 | 8570 | 37.93 | 89.93 | PRJNA1078313 | SAMN40082672 | 28565956 | ||||||||
20230627 | R93 | 50 | 33 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_33 | GGATCTGA+ACGTCCTG | 31620062 | 9486 | 38.02 | 90.37 | PRJNA1078313 | SAMN40082673 | 31620062 | ||||||||
20230627 | R94 | 50 | 33 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_33 | GGATCTGA+GTCAGTAC | 32531274 | 9759 | 37.87 | 89.62 | PRJNA1078313 | SAMN40082674 | 32531274 | ||||||||
20230627 | R95 | 50 | 33 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_33 | TGATCACG+AGGCTATA | 27375934 | 8213 | 37.92 | 89.92 | PRJNA1078313 | SAMN40082675 | 27375934 | ||||||||
20230627 | R96 | 50 | 33 | Cladocopium | Bleached | Bleached_Cladocopium | Bleached_Cladocopium_33 | TGATCACG+GCCTCTAT | 35882529 | 10765 | 37.78 | 89.22 | PRJNA1078313 | SAMN40082676 | 35882529 | ||||||||
20230627 | R97 | 50 | 33 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_33 | TGATCACG+AGGATAGG | 33376891 | 10013 | 37.77 | 89.17 | PRJNA1078313 | SAMN40082677 | 33376891 | ||||||||
20230627 | R98 | 50 | 33 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_33 | TGATCACG+TCAGAGCC | 35899983 | 10770 | 37.86 | 89.63 | PRJNA1078313 | SAMN40082678 | 35899983 | ||||||||
20230627 | R99 | 50 | 33 | Mixed | Nonbleached | Nonbleached_Mixed | Nonbleached_Mixed_33 | TGATCACG+CTTCGCCT | 8420607 | 2526 | 37.83 | 89.57 | PRJNA1078313 | SAMN40082679 | yes | TGATCACG+CTTCGCCT | 16221581 | 4866 | 37.47 | 87.83 | 24642188 |
All samples now have >18 M reads ranging from ~18M to ~35M. This is great! It increased read depth for the samples that previously had <12M reads.
Download sequences to URI server
Prepare folders in Andromeda directory.
#logged into Andromeda
cd /data/putnamlab/ashuffmyer
cd mcap-2023-rnaseq
cd raw-sequences
mkdir second_sequencing
Now the directory that I want sequences in is /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing
.
# in andromeda second_sequencing folder
pwd
/data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing
# log into Azenta sftp while logged into Andromeda as directed by Azenta and navigate to sequence folder 00_fastq
#set directory for download
lcd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing
#download all files in sequence folder
mget *
Downloaded to URI Andromeda on April 12 2024.
I then checked for data integrity using md5 checksums.
md5sum *.fastq.gz > checkmd5_20240412.md5
md5sum -c checkmd5_20240412.md5
Output was as follows:
R107_R1_001.fastq.gz: OK
R107_R2_001.fastq.gz: OK
R55_R1_001.fastq.gz: OK
R55_R2_001.fastq.gz: OK
R56_R1_001.fastq.gz: OK
R56_R2_001.fastq.gz: OK
R57_R1_001.fastq.gz: OK
R57_R2_001.fastq.gz: OK
R58_R1_001.fastq.gz: OK
R58_R2_001.fastq.gz: OK
R59_R1_001.fastq.gz: OK
R59_R2_001.fastq.gz: OK
R60_R1_001.fastq.gz: OK
R60_R2_001.fastq.gz: OK
R62_R1_001.fastq.gz: OK
R62_R2_001.fastq.gz: OK
R67_R1_001.fastq.gz: OK
R67_R2_001.fastq.gz: OK
R75_R1_001.fastq.gz: OK
R75_R2_001.fastq.gz: OK
R83_R1_001.fastq.gz: OK
R83_R2_001.fastq.gz: OK
R91_R1_001.fastq.gz: OK
R91_R2_001.fastq.gz: OK
R99_R1_001.fastq.gz: OK
R99_R2_001.fastq.gz: OK
Azenta provided a .md5 file for each sequence file. I then compared generated checksums to these original files to confirm data integrity and content is the same after the transfer from Azenta.
#bind together all .md5 files provided by Azenta
cat *.gz.md5 > azenta_original_checksums_second.md5
This provided the following list of the original checksums:
less azenta_original_checksums_second.md5
6ee514bdc1bacfb591629a3edf82bcd4 ./R107_R1_001.fastq.gz
3d6b00afc053f2eb770f83c5830f699e ./R107_R2_001.fastq.gz
16aba2b715ceeb2c2f3a79dbc855e768 ./R55_R1_001.fastq.gz
14e32d9fccd55fd81a3468a4fd5766d1 ./R55_R2_001.fastq.gz
7fbb16f6f920e4a58431cd91e14b912b ./R56_R1_001.fastq.gz
80bd51fd98400da3fb91b1182252f577 ./R56_R2_001.fastq.gz
aa6b3bd9617c10ca9b504309cc1ff047 ./R57_R1_001.fastq.gz
c52e1d5d9c7caa2189eaedf81944dfc7 ./R57_R2_001.fastq.gz
027601dabbfc6e48829efe5b931e6e05 ./R58_R1_001.fastq.gz
b1e2605ec7c35ad3c25ad7ab03cb4d2f ./R58_R2_001.fastq.gz
a62cfd57273efefcfc4076e66cb5da1c ./R59_R1_001.fastq.gz
dfa3021694fd5ddc01c387eafe34a0d9 ./R59_R2_001.fastq.gz
ba4c0c4c8dede34d546ee7ec5293553b ./R60_R1_001.fastq.gz
dbf68528a9ee3428d378275c83458b24 ./R60_R2_001.fastq.gz
9ae8dc138d0df94197388b9da117f658 ./R62_R1_001.fastq.gz
96e1b11cb6e60415dc6f69fb4849651e ./R62_R2_001.fastq.gz
0d2b1896bf85c0800a3a3482f34c90ca ./R67_R1_001.fastq.gz
cbb14c976024dfe72a7561415284c41a ./R67_R2_001.fastq.gz
f16d7330659ba6e310ade50cdae10b8c ./R75_R1_001.fastq.gz
9b9ab52837950fdd8108d22e29f9bc31 ./R75_R2_001.fastq.gz
429bf4b34de618d97ec2213e658477c2 ./R83_R1_001.fastq.gz
77cf8a4c7d2db0bf83280df1254b335a ./R83_R2_001.fastq.gz
86064e95efa1d802371fbda68d2be0ca ./R91_R1_001.fastq.gz
3e24c0fdad691020cd1140e3a90bea3f ./R91_R2_001.fastq.gz
c117e340bd9bef1bc295b6e86f6bfcbb ./R99_R1_001.fastq.gz
0ebb0438dde06d989cfd79a5616ef9a3 ./R99_R2_001.fastq.gz
Then, here is the md5 checksum of the downloaded data on Andromeda:
less checkmd5_20240412.md5
6ee514bdc1bacfb591629a3edf82bcd4 R107_R1_001.fastq.gz
3d6b00afc053f2eb770f83c5830f699e R107_R2_001.fastq.gz
16aba2b715ceeb2c2f3a79dbc855e768 R55_R1_001.fastq.gz
14e32d9fccd55fd81a3468a4fd5766d1 R55_R2_001.fastq.gz
7fbb16f6f920e4a58431cd91e14b912b R56_R1_001.fastq.gz
80bd51fd98400da3fb91b1182252f577 R56_R2_001.fastq.gz
aa6b3bd9617c10ca9b504309cc1ff047 R57_R1_001.fastq.gz
c52e1d5d9c7caa2189eaedf81944dfc7 R57_R2_001.fastq.gz
027601dabbfc6e48829efe5b931e6e05 R58_R1_001.fastq.gz
b1e2605ec7c35ad3c25ad7ab03cb4d2f R58_R2_001.fastq.gz
a62cfd57273efefcfc4076e66cb5da1c R59_R1_001.fastq.gz
dfa3021694fd5ddc01c387eafe34a0d9 R59_R2_001.fastq.gz
ba4c0c4c8dede34d546ee7ec5293553b R60_R1_001.fastq.gz
dbf68528a9ee3428d378275c83458b24 R60_R2_001.fastq.gz
9ae8dc138d0df94197388b9da117f658 R62_R1_001.fastq.gz
96e1b11cb6e60415dc6f69fb4849651e R62_R2_001.fastq.gz
0d2b1896bf85c0800a3a3482f34c90ca R67_R1_001.fastq.gz
cbb14c976024dfe72a7561415284c41a R67_R2_001.fastq.gz
f16d7330659ba6e310ade50cdae10b8c R75_R1_001.fastq.gz
9b9ab52837950fdd8108d22e29f9bc31 R75_R2_001.fastq.gz
429bf4b34de618d97ec2213e658477c2 R83_R1_001.fastq.gz
77cf8a4c7d2db0bf83280df1254b335a R83_R2_001.fastq.gz
86064e95efa1d802371fbda68d2be0ca R91_R1_001.fastq.gz
3e24c0fdad691020cd1140e3a90bea3f R91_R2_001.fastq.gz
c117e340bd9bef1bc295b6e86f6bfcbb R99_R1_001.fastq.gz
0ebb0438dde06d989cfd79a5616ef9a3 R99_R2_001.fastq.gz
I then added these lists to a spreadsheet and checked that the cells matched for original and downloaded files. The file is on GitHub here.
I also added in the checksums for data that I downloaded to Mox (detailed below). All files are confirmed and everything looks good!
file | original_azenta | downloaded_andromeda | downloaded_mox | match_andromeda | match_mox |
---|---|---|---|---|---|
R107_R1_001.fastq.gz | 6ee514bdc1bacfb591629a3edf82bcd4 | 6ee514bdc1bacfb591629a3edf82bcd4 | 6ee514bdc1bacfb591629a3edf82bcd4 | TRUE | TRUE |
R107_R2_001.fastq.gz | 3d6b00afc053f2eb770f83c5830f699e | 3d6b00afc053f2eb770f83c5830f699e | 3d6b00afc053f2eb770f83c5830f699e | TRUE | TRUE |
R55_R1_001.fastq.gz | 16aba2b715ceeb2c2f3a79dbc855e768 | 16aba2b715ceeb2c2f3a79dbc855e768 | 16aba2b715ceeb2c2f3a79dbc855e768 | TRUE | TRUE |
R55_R2_001.fastq.gz | 14e32d9fccd55fd81a3468a4fd5766d1 | 14e32d9fccd55fd81a3468a4fd5766d1 | 14e32d9fccd55fd81a3468a4fd5766d1 | TRUE | TRUE |
R56_R1_001.fastq.gz | 7fbb16f6f920e4a58431cd91e14b912b | 7fbb16f6f920e4a58431cd91e14b912b | 7fbb16f6f920e4a58431cd91e14b912b | TRUE | TRUE |
R56_R2_001.fastq.gz | 80bd51fd98400da3fb91b1182252f577 | 80bd51fd98400da3fb91b1182252f577 | 80bd51fd98400da3fb91b1182252f577 | TRUE | TRUE |
R57_R1_001.fastq.gz | aa6b3bd9617c10ca9b504309cc1ff047 | aa6b3bd9617c10ca9b504309cc1ff047 | aa6b3bd9617c10ca9b504309cc1ff047 | TRUE | TRUE |
R57_R2_001.fastq.gz | c52e1d5d9c7caa2189eaedf81944dfc7 | c52e1d5d9c7caa2189eaedf81944dfc7 | c52e1d5d9c7caa2189eaedf81944dfc7 | TRUE | TRUE |
R58_R1_001.fastq.gz | 027601dabbfc6e48829efe5b931e6e05 | 027601dabbfc6e48829efe5b931e6e05 | 027601dabbfc6e48829efe5b931e6e05 | TRUE | TRUE |
R58_R2_001.fastq.gz | b1e2605ec7c35ad3c25ad7ab03cb4d2f | b1e2605ec7c35ad3c25ad7ab03cb4d2f | b1e2605ec7c35ad3c25ad7ab03cb4d2f | TRUE | TRUE |
R59_R1_001.fastq.gz | a62cfd57273efefcfc4076e66cb5da1c | a62cfd57273efefcfc4076e66cb5da1c | a62cfd57273efefcfc4076e66cb5da1c | TRUE | TRUE |
R59_R2_001.fastq.gz | dfa3021694fd5ddc01c387eafe34a0d9 | dfa3021694fd5ddc01c387eafe34a0d9 | dfa3021694fd5ddc01c387eafe34a0d9 | TRUE | TRUE |
R60_R1_001.fastq.gz | ba4c0c4c8dede34d546ee7ec5293553b | ba4c0c4c8dede34d546ee7ec5293553b | ba4c0c4c8dede34d546ee7ec5293553b | TRUE | TRUE |
R60_R2_001.fastq.gz | dbf68528a9ee3428d378275c83458b24 | dbf68528a9ee3428d378275c83458b24 | dbf68528a9ee3428d378275c83458b24 | TRUE | TRUE |
R62_R1_001.fastq.gz | 9ae8dc138d0df94197388b9da117f658 | 9ae8dc138d0df94197388b9da117f658 | 9ae8dc138d0df94197388b9da117f658 | TRUE | TRUE |
R62_R2_001.fastq.gz | 96e1b11cb6e60415dc6f69fb4849651e | 96e1b11cb6e60415dc6f69fb4849651e | 96e1b11cb6e60415dc6f69fb4849651e | TRUE | TRUE |
R67_R1_001.fastq.gz | 0d2b1896bf85c0800a3a3482f34c90ca | 0d2b1896bf85c0800a3a3482f34c90ca | 0d2b1896bf85c0800a3a3482f34c90ca | TRUE | TRUE |
R67_R2_001.fastq.gz | cbb14c976024dfe72a7561415284c41a | cbb14c976024dfe72a7561415284c41a | cbb14c976024dfe72a7561415284c41a | TRUE | TRUE |
R75_R1_001.fastq.gz | f16d7330659ba6e310ade50cdae10b8c | f16d7330659ba6e310ade50cdae10b8c | f16d7330659ba6e310ade50cdae10b8c | TRUE | TRUE |
R75_R2_001.fastq.gz | 9b9ab52837950fdd8108d22e29f9bc31 | 9b9ab52837950fdd8108d22e29f9bc31 | 9b9ab52837950fdd8108d22e29f9bc31 | TRUE | TRUE |
R83_R1_001.fastq.gz | 429bf4b34de618d97ec2213e658477c2 | 429bf4b34de618d97ec2213e658477c2 | 429bf4b34de618d97ec2213e658477c2 | TRUE | TRUE |
R83_R2_001.fastq.gz | 77cf8a4c7d2db0bf83280df1254b335a | 77cf8a4c7d2db0bf83280df1254b335a | 77cf8a4c7d2db0bf83280df1254b335a | TRUE | TRUE |
R91_R1_001.fastq.gz | 86064e95efa1d802371fbda68d2be0ca | 86064e95efa1d802371fbda68d2be0ca | 86064e95efa1d802371fbda68d2be0ca | TRUE | TRUE |
R91_R2_001.fastq.gz | 3e24c0fdad691020cd1140e3a90bea3f | 3e24c0fdad691020cd1140e3a90bea3f | 3e24c0fdad691020cd1140e3a90bea3f | TRUE | TRUE |
R99_R1_001.fastq.gz | c117e340bd9bef1bc295b6e86f6bfcbb | c117e340bd9bef1bc295b6e86f6bfcbb | c117e340bd9bef1bc295b6e86f6bfcbb | TRUE | TRUE |
R99_R2_001.fastq.gz | 0ebb0438dde06d989cfd79a5616ef9a3 | 0ebb0438dde06d989cfd79a5616ef9a3 | 0ebb0438dde06d989cfd79a5616ef9a3 | TRUE | TRUE |
Data are now downloaded and integrity confirmed on URI Andromeda.
Download data to UW Hyak/Mox
#logged into UW Hyak/Mox
cd /gscratch/srlab/ashuff/mcap-2023-rnaseq
mkdir second_sequencing
cd second_sequencing
The full directory where I want raw sequences to go is /gscratch/srlab/ashuff/mcap-2023-rnaseq/second_sequencing
.
# in Hyak ashuffm folder
# log into Azenta sftp as directed by Azenta
# cd into my project folder
#set directory for download
lcd /gscratch/srlab/ashuff/mcap-2023-rnaseq/second_sequencing
#download all files into project folder
mget *
Downloaded on 13 April 2024.
I then checked for data integrity using md5 checksums. Azenta provided a .md5 file for each sequence file. See the table above for confirmation of md5 checksums.
srun -p srlab -A srlab --time=1:00:00 --mem=100G --pty /bin/bash
md5sum *.fastq.gz > checkmd5_20240413.md5
md5sum -c checkmd5_20240413.md5
Check sums from data downloaded on Mox is here:
less checkmd5_20240413.md5
6ee514bdc1bacfb591629a3edf82bcd4 R107_R1_001.fastq.gz
3d6b00afc053f2eb770f83c5830f699e R107_R2_001.fastq.gz
16aba2b715ceeb2c2f3a79dbc855e768 R55_R1_001.fastq.gz
14e32d9fccd55fd81a3468a4fd5766d1 R55_R2_001.fastq.gz
7fbb16f6f920e4a58431cd91e14b912b R56_R1_001.fastq.gz
80bd51fd98400da3fb91b1182252f577 R56_R2_001.fastq.gz
aa6b3bd9617c10ca9b504309cc1ff047 R57_R1_001.fastq.gz
c52e1d5d9c7caa2189eaedf81944dfc7 R57_R2_001.fastq.gz
027601dabbfc6e48829efe5b931e6e05 R58_R1_001.fastq.gz
b1e2605ec7c35ad3c25ad7ab03cb4d2f R58_R2_001.fastq.gz
a62cfd57273efefcfc4076e66cb5da1c R59_R1_001.fastq.gz
dfa3021694fd5ddc01c387eafe34a0d9 R59_R2_001.fastq.gz
ba4c0c4c8dede34d546ee7ec5293553b R60_R1_001.fastq.gz
dbf68528a9ee3428d378275c83458b24 R60_R2_001.fastq.gz
9ae8dc138d0df94197388b9da117f658 R62_R1_001.fastq.gz
96e1b11cb6e60415dc6f69fb4849651e R62_R2_001.fastq.gz
0d2b1896bf85c0800a3a3482f34c90ca R67_R1_001.fastq.gz
cbb14c976024dfe72a7561415284c41a R67_R2_001.fastq.gz
f16d7330659ba6e310ade50cdae10b8c R75_R1_001.fastq.gz
9b9ab52837950fdd8108d22e29f9bc31 R75_R2_001.fastq.gz
429bf4b34de618d97ec2213e658477c2 R83_R1_001.fastq.gz
77cf8a4c7d2db0bf83280df1254b335a R83_R2_001.fastq.gz
86064e95efa1d802371fbda68d2be0ca R91_R1_001.fastq.gz
3e24c0fdad691020cd1140e3a90bea3f R91_R2_001.fastq.gz
c117e340bd9bef1bc295b6e86f6bfcbb R99_R1_001.fastq.gz
0ebb0438dde06d989cfd79a5616ef9a3 R99_R2_001.fastq.gz
Output from checksums is here:
R107_R1_001.fastq.gz: OK
R107_R2_001.fastq.gz: OK
R55_R1_001.fastq.gz: OK
R55_R2_001.fastq.gz: OK
R56_R1_001.fastq.gz: OK
R56_R2_001.fastq.gz: OK
R57_R1_001.fastq.gz: OK
R57_R2_001.fastq.gz: OK
R58_R1_001.fastq.gz: OK
R58_R2_001.fastq.gz: OK
R59_R1_001.fastq.gz: OK
R59_R2_001.fastq.gz: OK
R60_R1_001.fastq.gz: OK
R60_R2_001.fastq.gz: OK
R62_R1_001.fastq.gz: OK
R62_R2_001.fastq.gz: OK
R67_R1_001.fastq.gz: OK
R67_R2_001.fastq.gz: OK
R75_R1_001.fastq.gz: OK
R75_R2_001.fastq.gz: OK
R83_R1_001.fastq.gz: OK
R83_R2_001.fastq.gz: OK
R91_R1_001.fastq.gz: OK
R91_R2_001.fastq.gz: OK
R99_R1_001.fastq.gz: OK
R99_R2_001.fastq.gz: OK
Everything looks good and all data files were transferred correctly.
Files are now stored on both URI and UW servers.
2. Rename files to indicate they are from the second sequencing
Add an “s” after the sample ID to indicate that samples are from the second round of sequencing, so that they can be uploaded to NCBI and will remain distinct from first round of sequencing.
I am performing this in Andromeda, becuase this is where I am going to conduct analyses and upload to NCBI.
/data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing
for file in *; do
# Check if the file name contains ".gz" and an underscore
if [[ $file == *".gz"* && $file == *_* ]]; then
# Extract the part of the file name before the first underscore
prefix="${file%%_*}"
# Extract the part of the file name after the first underscore
suffix="${file#*_}"
# Rename the file by adding "s" after the prefix
mv "$file" "${prefix}s_$suffix"
echo "Renamed: $file to ${prefix}s_$suffix"
fi
done
This changed the files to the following names:
azenta_original_checksums_second.md5 R60s_R1_001.fastq.gz.md5
checkmd5_20240412.md5 R60s_R2_001.fastq.gz
md5_files R60s_R2_001.fastq.gz.md5
R107s_R1_001.fastq.gz R62s_R1_001.fastq.gz
R107s_R1_001.fastq.gz.md5 R62s_R1_001.fastq.gz.md5
R107s_R2_001.fastq.gz R62s_R2_001.fastq.gz
R107s_R2_001.fastq.gz.md5 R62s_R2_001.fastq.gz.md5
R55s_R1_001.fastq.gz R67s_R1_001.fastq.gz
R55s_R1_001.fastq.gz.md5 R67s_R1_001.fastq.gz.md5
R55s_R2_001.fastq.gz R67s_R2_001.fastq.gz
R55s_R2_001.fastq.gz.md5 R67s_R2_001.fastq.gz.md5
R56s_R1_001.fastq.gz R75s_R1_001.fastq.gz
R56s_R1_001.fastq.gz.md5 R75s_R1_001.fastq.gz.md5
R56s_R2_001.fastq.gz R75s_R2_001.fastq.gz
R56s_R2_001.fastq.gz.md5 R75s_R2_001.fastq.gz.md5
R57s_R1_001.fastq.gz R83s_R1_001.fastq.gz
R57s_R1_001.fastq.gz.md5 R83s_R1_001.fastq.gz.md5
R57s_R2_001.fastq.gz R83s_R2_001.fastq.gz
R57s_R2_001.fastq.gz.md5 R83s_R2_001.fastq.gz.md5
R58s_R1_001.fastq.gz R91s_R1_001.fastq.gz
R58s_R1_001.fastq.gz.md5 R91s_R1_001.fastq.gz.md5
R58s_R2_001.fastq.gz R91s_R2_001.fastq.gz
R58s_R2_001.fastq.gz.md5 R91s_R2_001.fastq.gz.md5
R59s_R1_001.fastq.gz R99s_R1_001.fastq.gz
R59s_R1_001.fastq.gz.md5 R99s_R1_001.fastq.gz.md5
R59s_R2_001.fastq.gz R99s_R2_001.fastq.gz
R59s_R2_001.fastq.gz.md5 R99s_R2_001.fastq.gz.md5
R60s_R1_001.fastq.gz raw_fastqc
The files that were sequenced in the second round now contain “s” after the sample name.
3. Upload to NCBI project
Project ID: PRJNA1078313
I used the same information on uploading for this project detailed in my previous post and the Putnam Lab SRA protocol..
The sample metadata can be found on GitHub here using the MIMS.me.host-associated template here and sequencing information and metadata is located on GitHub here.
I will upload these files as a new SRA submission and link them to the existing bioproject and biosamples.
I started a new SRA submission process and entered the BioProject number PRJNA1078313. I added a release date of March 1 2025.
I created a new SRA metadata sheet for the samples that are being uploaded in this upload. I added the description “Standard mRNA-seq from libraries prepared using Zymo Quick MiniPrep Plus DNA/RNA extraction kit at the Putnam Lab (University of Rhode Island) with sequencing at Azenta Life Sciences. Additional top off sequencing performed by Azenta Life Sciences with additional sequencing files denoted with “s” following sample name prefix.”.
The metadata sheet for these files can be found on GitHub here.
Transfer data from URI server to NCBI SRA
I next followed FTP instructions to upload top off sequencing files.
First, log onto Andromeda, make a folder for this upload, and symlink all files.
cd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload
mkdir 20240416_upload
cd 20240416_upload
ln -s /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/raw-sequences/second_sequencing/*gz /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload/20240416_upload
Next, upload files using FTP instructions.
cd /data/putnamlab/ashuffmyer/mcap-2023-rnaseq/ncbi_upload/20240416_upload
ftp -i
open ftp-private.ncbi.nlm.nih.gov
#enter name and password given on SRA webpage
cd uploads/ashuffmyer_gmail.com_bsKvx0RY
mkdir mcap-2023-rnaseq-upload-20240416
cd mcap-2023-rnaseq-upload-20240416
mput *
The upload to SRA will proceed for each file with messages “transfer complete” when each is uploaded. Keep computer active until all uploads are finished.
Files successfully uploaded.
Continue with the submission by selecting the preload folder on SRA once all 26 files registered.
RNA-Seq sequence files were submitted under SRA SUB14381642.
All information added to the Putnam Lab sequence inventory here.