Stacks Step by Step
(To be used in association with the Stacks manual available at
http://creskolab.uoregon.edu/stacks/manual/)
0- Input, Output and Software
Input: -File containing the radtag sequences in fastq format (raw_seq.fq)
-File(s) containing the list of barcode sequences, all of the same length within a file (barcode8nt.txt)
-File containing the genome sequence (genome.fa; for ref_map analysis only)
Output: Create a directory, for example “stacks_analysis”, which will contain all future
output directories and files.
Software: Running Stacks can require other programs such as FastQC, MySQL,
Bowtie and Samtools.
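The output layout can be prepared in one go. The following is a minimal Python sketch, assuming the subdirectory names “samples” and “stacks_denovo” that this guide creates in later steps; the function name create_analysis_layout and its parameters are illustrative, not part of Stacks.

```python
from pathlib import Path


def create_analysis_layout(base_dir, subdirs=("samples", "stacks_denovo")):
    """Create stacks_analysis/ and its subdirectories under base_dir.

    The subdirectory names are the ones used later in this guide;
    adjust them if your analysis uses different names.
    """
    root = Path(base_dir) / "stacks_analysis"
    created = []
    for name in subdirs:
        path = root / name
        # parents=True also creates stacks_analysis itself;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return root, created


if __name__ == "__main__":
    root, created = create_analysis_layout(".")
    for path in created:
        print(path)
```

Running the script from the working directory creates the folders if they are missing and leaves existing ones untouched.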
1- FastQC: to check sequence quality (optional)
FastQC lets you visualize the read quality and determine the cutoff (= the length at
which the reads should be trimmed).
1.1- Run FastQC from the Command Line
> /home/bmb/Analysis_Tools/FastQC/fastqc
1.2- Open the raw_seq.fq file within the FastQC interface to start the analysis.
>>>> Example with MungBean:
The statistic “Per base sequence content” shows that the base composition (A, T, C
or G) at each position appears random, as expected, up to position 70.
The statistic “Per base sequence quality” shows that the average quality score across
all bases of the reads starts to decrease after position 40. Within the first 70 bases, the score
stays high (> 28, within the green area) for 90% of the reads (lower whisker).
Consequently, for maximum data quality, the sequences should be trimmed at about
70 nt (including the barcode).
2- Stacks - process_radtags - to clean the raw data (single-end read)
process_radtags filters the reads (according to their quality, barcode and restriction
site), groups them by barcode (one file per barcode), trims the barcode and cuts the reads to
the desired length.
2.1- Create a directory “samples” in the directory “stacks_analysis” to hold
the sequence files of all samples (parents + progeny)
2.2- Run process_radtags using the Command Line
> cd /path/to/stacks
> process_radtags -f /path/to/raw_seq.fq -o /path/to/stacks_analysis/samples -b
/path/to/barcode8nt.txt -e apeKI -s 20 -t 62 -c -q -r -D
• f — path to the input file (single-end sequences). Use “-p” instead if there are multiple files in one
directory.
• o — path to output the processed files.
• b — path to a file containing barcodes for this run. One file for each barcode length, one run
for each file.
• e — name of the enzyme used to make the radtags library.
• s — set the average score limit (Phred value) within a sliding window whose length is
set by the flag “-w” (defaults: w=0.15, s=10).
• t — truncate final read length to this value (according to FastQC data).
• c — remove any read with an uncalled base (“N”).
• q — discard reads with low quality scores (according to Phred's value).
• r — rescue barcodes and RAD-Tags (correct barcodes and restriction enzyme sites).
• D — capture discarded reads to a file (useful if processing multiple barcode files).
When using multiple lists containing different barcode lengths, use the reads discarded by
the process_radtags run on the first barcode list (raw_seq.fq.discards) as input for the
second run, and so on. Start process_radtags with the longest barcodes and finish with the
shortest.
Ex: > process_radtags
-f /path/to/raw_seq.fq
-o /path/to/stacks_analysis/samples
-b /path/to/barcode8nt.txt
-e apeKI -s 20 -t 62 -c -q -D
> process_radtags
-f /path/to/stacks_analysis/samples/raw_seq.fq.discards.fq
-o /path/to/stacks_analysis/samples
-b /path/to/barcode6nt.txt
-e apeKI -s 20 -t 62 -c -q -D
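The chaining described above (longest barcodes first, each run reading the reads discarded by the previous one) can be planned programmatically. The following is a minimal Python sketch that only builds the argument lists and does not run anything. The function name build_process_radtags_runs is hypothetical, and the discard file name is an assumption based on the “raw_seq.fq.discards” pattern mentioned in the text; check the name actually written to your output directory before running the second command.

```python
from pathlib import PurePosixPath


def build_process_radtags_runs(raw_reads, out_dir, barcode_files,
                               enzyme="apeKI", min_score=20, truncate=62,
                               discard_suffix=".discards"):
    """Return one process_radtags argument list per barcode file.

    barcode_files maps a barcode file path to its barcode length.
    Runs are ordered from the longest to the shortest barcodes, and each
    run after the first reads the discards of the previous run.
    discard_suffix is an ASSUMPTION about how the discard file is named.
    """
    runs = []
    current_input = PurePosixPath(raw_reads)
    ordered = sorted(barcode_files.items(), key=lambda item: item[1],
                     reverse=True)
    for barcode_path, _length in ordered:
        runs.append([
            "process_radtags",
            "-f", str(current_input),
            "-o", str(out_dir),
            "-b", barcode_path,
            "-e", enzyme, "-s", str(min_score), "-t", str(truncate),
            "-c", "-q", "-D",
        ])
        # The next run takes the reads this run could not assign.
        current_input = PurePosixPath(out_dir) / (current_input.name
                                                  + discard_suffix)
    return runs


if __name__ == "__main__":
    runs = build_process_radtags_runs(
        "/path/to/raw_seq.fq",
        "/path/to/stacks_analysis/samples",
        {"/path/to/barcode6nt.txt": 6, "/path/to/barcode8nt.txt": 8},
    )
    for args in runs:
        print(" ".join(args))
```

Printing the commands first makes it easy to review the input of each run before launching them by hand or through a wrapper script.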
>>>> Example using MungBean:
The option “-r” helped to rescue an additional 10% of the total sequences by
allowing correction of mismatches in the barcode.
When using multiple barcode files, roughly the same sequences are retained whether
the raw sequences or the discarded sequences are used as input. This is less true for shorter
barcodes, for which using the discarded sequences as input seems to improve data specificity
(e.g. the “GATT” group retains 50% fewer sequences when the discarded sequences are used
as input than when the raw sequences are). Stacks efficiently distinguishes between
the different barcodes according to their length and sequence. However, using the discarded
sequences as input has the advantages of faster processing and of avoiding
excess sequences in the “Sequences not recorded” group.
To reduce the incorporation of sequencing errors, decrease the sliding window size (w) or the
read length (t), or increase the average score limit within the window (s). In the example, the
average score limit is 20 and the window size is [0.15 x t (93)] = 14 nucleotides. This means that if
the average score within a 14-nt window drops below 20 (a 99% probability of a base call being
correct), the read is discarded.
Phred Quality Score    Probability of incorrect base call    Base call accuracy
10                     1 in 10                               90%
20                     1 in 100                              99%
30                     1 in 1000                             99.9%
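The values in this table follow from the standard Phred definition, P(error) = 10^(-Q/10), and the window size quoted above is the window fraction multiplied by the read length. The following is a minimal Python sketch of both calculations. The function names are illustrative, and rounding to the nearest integer is an assumption made here to reproduce the 14 nucleotides given in the text, not a statement about how Stacks computes it internally.

```python
def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def window_length(read_length, window_fraction=0.15):
    """Sliding window length in nucleotides (rounding is an assumption)."""
    return round(read_length * window_fraction)


if __name__ == "__main__":
    for q in (10, 20, 30):
        p = phred_to_error_probability(q)
        print(f"Q{q}: error probability {p:g}, accuracy {(1 - p) * 100:g}%")
    print("window for a 93-nt read:", window_length(93), "nt")
```

With these definitions, raising -s from 10 to 20 lowers the tolerated average error rate within a window from 1 in 10 to 1 in 100.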
3- mySQL database: to visualize data in web interface
A MySQL database has to be created before running denovo_map or ref_map, in order to
collect the data.
3.1- create a mySQL database
> sudo su
> [sudo] password for bmb: # enter the sudo password for this account
> mysql -p
> Enter password: # enter the MySQL root password
> mysql> create database db_name_radtags; #Always use the same suffix “_radtags”
3.2- create a table and field content in the new mySQL database
> mysql> use db_name_radtags;
> mysql> create table batches (
………
………
);
The “create table” script is copied from the “stacks.sql” file and pasted into the terminal window.
> mysql> \q or exit
4- Create a catalog of markers with Stacks
Using denovo_map or ref_map, Stacks will generate a catalog of potential alleles and loci
that can be found in the parents and some of the progeny. The program will generate a
series of files: for each sample: sample1.alleles|matches|snps|tags.tsv; and for the whole
population: batch.catalog.alleles|snps|tags.tsv, batch.genotypes.txt, batch.haplotypes.tsv,
batch.markers.tsv. These files can be used as such or uploaded into a MySQL database for
easier analysis using the web interface.
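The shorthand file names above can be expanded into an explicit checklist. The following is a minimal Python sketch that does so; the function name expected_output_files is hypothetical, and the names are taken literally from the shorthand in this guide. The files written by a given Stacks version may be named differently (for instance by including the batch ID), so treat the list as an assumption to compare against your output directory.

```python
def expected_output_files(sample_names, batch_prefix="batch"):
    """Expand the file-name shorthand used in this guide.

    Names follow the guide's notation literally; real Stacks output
    may use a different batch prefix (ASSUMPTION).
    """
    per_sample_kinds = ("alleles", "matches", "snps", "tags")
    catalog_kinds = ("alleles", "snps", "tags")
    # One file of each kind per sample, e.g. sample1.tags.tsv
    files = [f"{sample}.{kind}.tsv"
             for sample in sample_names
             for kind in per_sample_kinds]
    # Population-wide catalog files
    files += [f"{batch_prefix}.catalog.{kind}.tsv" for kind in catalog_kinds]
    # Population-wide summary files
    files += [f"{batch_prefix}.genotypes.txt",
              f"{batch_prefix}.haplotypes.tsv",
              f"{batch_prefix}.markers.tsv"]
    return files


if __name__ == "__main__":
    for name in expected_output_files(["parent1", "parent2", "progeny1"]):
        print(name)
```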
4a- Stacks - denovo_map (without genome sequence)
The program runs a series of Stacks components (Ustacks > Cstacks > Sstacks >
genotypes > load_radtags.pl > index_radtags.pl > genotypes) in order to generate a
catalog of loci and alleles (SNPs). In the absence of a genome sequence to align the radtag
sequences to, stacks of reads are built based on their depth of coverage (number of
identical reads) and then grouped into loci based on their sequence similarity. Data can
be viewed through the MySQL web interface or exported in tsv or xls format.
4a.1- Create a directory “stacks_denovo” in the directory
“stacks_analysis” to hold the catalog files generated by the denovo analysis.
4a.2a- Run denovo_map.pl using the Command Line
> cd /usr/local/share/stacks/scripts
> denovo_map.pl -m 3 -M 1 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 -t
-D "Denovo Map"
-o /path/to/stacks_analysis/stacks_denovo
-p /path/to/stacks_analysis/samples/parent1.fq
-p /path/to/stacks_analysis/samples/parent2.fq
-r /path/to/stacks_analysis/samples/progeny1.fq
-r /path/to/stacks_analysis/samples/progeny2.fq
-r /path/to/stacks_analysis/samples/progeny3.fq
…… enter all samples......
• m — minimum number of raw reads needed to create a stack within an individual and
generate indiv_alleles (default value = 3). Described as the minimum depth of coverage, this
option is essential for setting the stringency level. A higher value of m ensures that fewer
sequencing errors are treated as polymorphisms, but also reduces the total number of
identified markers.
• M — maximum number of mismatches between stacks to form a haplotype within an
individual and build indiv_loci (default value = 2). It can be seen as the number of SNPs
allowed per locus within the same individual. Increasing [M] will increase the number of alleles
and heterozygous loci.
• n — maximum number of mismatches between any two haplotypes (loci) within the
population to build catalog-loci (default value = 0). If [n] > 0, the consensus sequence from
each locus will be used to attempt to merge loci across samples. Therefore, if
locus A from parent 1 is homozygous and locus B from parent 2 is also homozygous, but
they are X nucleotides apart, [n] will govern whether they are merged when building the
catalog. To get more AAxBB markers, increase [n]. As a side effect, when [n]
increases, more physically separate loci will be merged erroneously.
• N — number of mismatches allowed when aligning the secondary reads to
primary stacks (default value = M+2). This is the second round of alignment. The mismatches
here do not count as polymorphisms and are simply ignored, so in effect this only
increases the stack depth. Note that this rescue of reads can have the negative effect of
introducing variation in the locus and uncertainty about whether the locus is homozygous or
heterozygous.
• T — number of threads or cores to run Stacks on.
• t — remove, or break up, highly repetitive RAD-Tags.
• B — name of the MySQL database to load data into.
• b — batch ID representing this dataset (must be a number). Stacks can be run multiple times
on the same dataset and the results will be added to the same database by specifying
different batch IDs. If an already existing batch ID is used, the new data will not erase the
previous data present in this batch but will be added to them.
• D — batch description.
• H — disable mapping of secondary reads.
• A — if processing a genetic map, specify the cross type: 'CP' (Cross Pollinated = F1 cross),
'F2' (F2 cross, with F0 submitted as the parents), 'BC1' (backcross F1 x parent), 'DH'
(Doubled Haploids), or 'GEN' (Generic, to get a list of all possible markers independently of
the cross type). The program will throw out alleles that could not occur in the specified cross
type situation (Ex.1: in a F2 cross we can have AA/BB markers fr