from two homozygous
parents, but these are not trackable in an F1 setup and would be thrown away). Note that the
web interface shows all possible genotypes, regardless of the mapping type.
• o — directory path to write output files.
• p — path to the file containing the parent sequences.
• r — path to the file containing the progeny sequences.
>>>> Example using MungBean:
The best conditions to detect 1-nt SNPs should be: m=5 (to reduce the risk of a single
mismatch due to a sequencing error), M=1 (allows only 1 mismatch between the alleles of a
locus, so no sequencing error is tolerated), n=1 (allows loci to match between parents with
1 mismatch, which makes the aa/bb genotype possible).
>>>> Increasing [m] makes the conditions more stringent and avoids misinterpreting
sequencing errors as SNPs. However, [m] shouldn’t exceed 10, as it may reduce the total
number of loci too much.
4a.2b- Run denovo_map.pl using the Shell Script
Instead of entering a command line in the terminal, a shell script can be used to run Stacks.
It offers the advantage of being able to run multiple samples and tasks at once.
Create the file “stacks.sh” and write the following script:
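#!/bin/bash
cd /usr/local/share/stacks/scripts
# Sample names (file names without the .fq extension)
parent="parent1 parent2 "
progeny="progeny1 progeny2 progeny3 progeny4 progeny5….. "
pathparent=""
pathprogeny=""
# Build one "-p" option per parent file
for i in $parent
do
pathparent+="-p /path/to/samples/${i}.fq "
done
# Build one "-r" option per progeny file
for i in $progeny
do
pathprogeny+="-r /path/to/samples/${i}.fq "
done
# Trailing backslashes continue the command on the next line.
# $pathparent and $pathprogeny stay unquoted so that they split into separate options.
denovo_map.pl -m 3 -M 1 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 -t -H \
-D "Denovo Map" \
-o /path/to/stacks_denovo \
$pathparent \
$pathprogeny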
Then run the script by typing the following command line in the terminal:
> bash /path/to/stacks.sh
4b- Stacks – ref_map (with genome sequence)
The program will run a series of Stacks components (Pstacks > Cstacks > Sstacks >
genotypes > load_radtags.pl > index_radtags.pl > genotypes) in order to generate a
catalog of loci and alleles (SNPs). The main difference from denovo_map is that ref_map
uses sequences that were first aligned to a reference genome using Bowtie (or another
alignment program), which generates SAM files. The SAM files are used as input for ref_map.
The [m] flag in ref_map refers to the number of reads that align to a single position in the
reference genome; these reads can differ from each other, which creates alleles (for this
process denovo_map needs 2 flags: [m], the number of identical reads, and [M], the number of
differences between alleles). So [m] corresponds to the locus depth in ref_map, whereas it
corresponds to the allele depth in denovo_map. Output data can be viewed through the mySQL
web interface or exported in tsv or xls format.
4b.1- Create directories “samples_ref” and “stacks_ref” in the
directory “stacks_analysis” to output the catalog files generated from the ref_map analysis.
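As a minimal sketch of step 4b.1, both directories can be created with a single call. The function name make_ref_dirs is a hypothetical helper introduced here for illustration, and /path/to/stacks_analysis is the placeholder path used throughout this protocol.

```shell
#!/bin/bash
# Hypothetical helper: create the two ref_map working directories
# inside the stacks_analysis directory given as first argument.
make_ref_dirs() {
    local base="$1"   # e.g. /path/to/stacks_analysis
    # -p creates missing parent directories and does not fail
    # if the directories already exist
    mkdir -p "$base/samples_ref" "$base/stacks_ref"
}

# Usage (replace the placeholder with the real path):
# make_ref_dirs /path/to/stacks_analysis
```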
4b.2- Run Bowtie2-build from the command line
The program will index the genome sequence file into 6 subfiles (bt2 files).
> cd /path/to/stacks_analysis/samples_ref
> /path/to/bowtie2-2.2.3/bowtie2-build -f [input] [bt2_base]
[-f] specifies that the input file is in FASTA format
[input] path to the genome sequence in FASTA format (./genome.fa)
[bt2_base] base name of the future subfiles (Ex: Gen)
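Before moving on to the alignment, it can be worth confirming that all six subfiles were written. The sketch below assumes the standard small-index naming of Bowtie 2 (base.1.bt2 to base.4.bt2, base.rev.1.bt2 and base.rev.2.bt2); very large genomes get a ".bt2l" suffix instead, which this check does not cover. The function name check_bt2_index is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: verify that the six Bowtie 2 index files exist
# for a given base name (small-index ".bt2" suffix assumed).
check_bt2_index() {
    local base="$1" suffix missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "${base}.${suffix}.bt2" ]; then
            echo "missing: ${base}.${suffix}.bt2" >&2
            missing=1
        fi
    done
    # Return 0 only when every index file was found
    return "$missing"
}

# Usage, from the directory holding the index (base name "Gen" as in the example above):
# check_bt2_index Gen && echo "index complete"
```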
4b.3- Run Bowtie2 from the command line
The program will align the radtag reads to the reference genome. By default, Bowtie 2
performs end-to-end read alignment (as opposed to local alignment mode). That is, it
searches for alignments involving all of the nucleotides in the read and validates the
alignment if its score is at least equal to the minimum score threshold. Alignment score
calculation: mismatched base (at a high-quality position) = -6, gap of length 2 = -11 (Ex: for
a read with one mismatch + one length-2 gap, alignment score = -6-11 = -17). Max alignment
score = 0 when the match is perfect. The default minimum score threshold is
[-0.6+(-0.6*L)], where L is the read length (Ex: 64bp read, minimum score =
-0.6-(0.6*64) = -39). This can be configured with the option [--score-min L,-0.6,-0.6].
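The threshold arithmetic can be checked with a small calculation. This is only a sketch: the function name min_score is a hypothetical helper, and it simply evaluates the linear function intercept + slope * L described above, without calling Bowtie 2.

```shell
#!/bin/bash
# Hypothetical helper: minimum alignment score for a linear
# --score-min function "L,<intercept>,<slope>",
# i.e. intercept + slope * read_length.
min_score() {
    local read_len="$1" intercept="$2" slope="$3"
    awk -v L="$read_len" -v a="$intercept" -v b="$slope" \
        'BEGIN { printf "%.1f\n", a + b * L }'
}

min_score 64 -0.6 -0.6   # default function, 64 bp read: -39.0
min_score 64 -0.6 -0.1   # slope of -0.1, 64 bp read: -7.0
```

With a slope of -0.1, a 64 bp read must score at least -7, which leaves room for a single high-quality mismatch (-6) but not for two (-12), so the setting is considerably stricter than the default.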
Execute the program from the directory “samples_ref” containing the bt2 files. Run the
program for each sample (parents and progeny files) independently.
> cd /path/to/stacks_analysis/samples_ref
> /path/to/bowtie2-2.2.3/bowtie2 [-x bt2_base] [-U input] [-S output] [--score-min L,-0.6,-0.1]
[-x] base name of the genome subfiles (Ex: Gen)
[-U] path to the file containing the sample sequences (ex: parent1.fq)
[-S] path to the file in which the result will be stored (ex: parent1.sam)
[--score-min] defines the variables used to calculate the minimum score threshold
[-0.6+(-0.1*L)]; the default is [-0.6+(-0.6*L)]. L is the read length.
4b.4a- Run ref_map.pl using the command line
> cd /usr/local/share/stacks/scripts
> ref_map.pl -m 3 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 \
-D "Ref Map" \
-o /path/to/stacks_analysis/stacks_ref \
-p /path/to/stacks_analysis/samples_ref/parent1.sam \
-p /path/to/stacks_analysis/samples_ref/parent2.sam \
-r /path/to/stacks_analysis/samples_ref/progeny1.sam \
-r /path/to/stacks_analysis/samples_ref/progeny2.sam \
-r /path/to/stacks_analysis/samples_ref/progeny3.sam \
…… enter all samples......
• n — number of mismatches allowed between loci when building the catalog (default 0). See
previous chapter for more details.
• m — minimum depth of coverage (number of reads) to create a stack (default 1). See previous
chapter for more details.
• T — specify the number of threads to execute.
• A — if processing a genetic map, specify the cross type, 'CP', 'F2', 'BC1', 'DH', or 'GEN'.
See previous chapter for more details.
• B — specify a database to load data into.
• b — batch ID representing this dataset in the database.
• D — batch description
• o — directory path to write output files.
• p — path to the file containing the parent sequences.
• r — path to the file containing the progeny sequences.
ATTENTION: ref_map can be run directly using SAM or BAM files. If using BAM files,
make sure that the files are sorted and indexed before being used by ref_map.
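One common way to sort and index BAM files is with samtools, which is not otherwise part of this protocol; the sketch below is therefore an assumption, not a step prescribed by Stacks. It only prints the commands so that they can be reviewed first, and it assumes samtools 1.3 or later (the "sort -o" syntax). The function name bam_prep_commands and the ".sorted.bam" naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the samtools commands that sort and index
# one BAM file. Nothing is executed here; pipe the output to "bash"
# to actually run the commands.
# Assumes samtools 1.3 or later ("sort -o" syntax).
bam_prep_commands() {
    local sample="$1"   # file name without the .bam extension
    printf 'samtools sort -o %s.sorted.bam %s.bam\n' "$sample" "$sample"
    printf 'samtools index %s.sorted.bam\n' "$sample"
}

bam_prep_commands parent1
```

The sorted file (here parent1.sorted.bam) would then be the one passed to ref_map with [-p] or [-r].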
4b.4b- Run ref_map.pl using the Shell Script
Instead of entering a command line in the terminal, a shell script can be used to run Stacks.
It offers the advantage of being able to run multiple samples and tasks at once.
Create the file “stacks.sh” and write the following script:
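#!/bin/bash
cd /path/to/stacks_analysis/samples_ref
# Index the reference genome (creates the Gen*.bt2 files)
/path/to/bowtie2-2.2.3/bowtie2-build -f genome.fa Gen
samples="parent1 parent2 progeny1 progeny2 progeny3...."
# Align each sample to the indexed genome
for file in $samples
do
/path/to/bowtie2-2.2.3/bowtie2 -x Gen -U ../samples/${file}.fq -S ./${file}.sam
done
cd /usr/local/share/stacks/scripts
parent="parent1 parent2 "
progeny="progeny1 progeny2 progeny3 progeny4 progeny5….. "
pathparent=""
pathprogeny=""
# Build one "-p" option per parent SAM file
for i in $parent
do
pathparent+="-p /path/to/stacks_analysis/samples_ref/${i}.sam "
done
# Build one "-r" option per progeny SAM file
for i in $progeny
do
pathprogeny+="-r /path/to/stacks_analysis/samples_ref/${i}.sam "
done
# Trailing backslashes continue the command on the next line
ref_map.pl -m 3 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 \
-D "Ref Map" \
-o /path/to/stacks_analysis/stacks_ref \
$pathparent \
$pathprogeny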
Then run the script by typing the following command line in the terminal:
> bash /path/to/stacks.sh
5- Data Analysis on mySQL database
5.1- Go to http://localhost/stacks
5.2- Select a database_radtags → Catalog
5.3- Define Filters:
Alleles: Number of alleles per locus. Since both parents (F0) are expected to be
highly homozygous (>90%), we can expect a maximum of 2 alleles per locus, one from
each parent (aa/bb).
Recommended set up: 1-2
SNPs: Range of the number of SNPs per locus. Since recombination is unlikely to occur
between SNPs of the same locus, filtering on the number of SNPs per allele shouldn't matter.
Recommended set up: 1
Parental Matches: Number of parents matching a locus. If we are working with a
simple cross from closely related parents, then both should match each locus. However, if
a polymorphism between the parents affects the restriction site used to generate the
radtags library, then only one parent may show an allele for the locus (aa/--). Note that if
there is too much polymorphism between the parents and/or if the flag [n] was set too low,
different alleles from each parent that should normally belong to the same locus may be
classified as two different loci.
Recommended set up: 1-2
Progeny Matches: Minimum number of progeny matching the locus. High-quality
markers are expected to be found in a large number of progeny. Since the parents' loci should
be mostly homozygous and markers should segregate 1:2:1 in an F2 population, each allele
should be present in ¾ (75%) of the progeny (the aa and ab classes for allele a).
Recommended set up: 60% to 75% of the total number of progeny.
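The recommended range translates into a number of progeny as follows. This is a small arithmetic sketch; the function name progeny_threshold is a hypothetical helper and the population size of 96 is only an example value.

```shell
#!/bin/bash
# Hypothetical helper: smallest whole number of progeny that reaches
# a given percentage of the total (integer ceiling, no floating point).
progeny_threshold() {
    local total="$1" percent="$2"
    echo $(( (total * percent + 99) / 100 ))
}

# Example with a hypothetical population of 96 progeny
progeny_threshold 96 60   # 58
progeny_threshold 96 75   # 72
```

For 96 progeny, the filter would therefore be set between 58 and 72.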
Segregation Distortion: Chi-square test for segregation of the marker within the
progeny. A p-value < 0.05 means that the observed segregation deviates significantly from
the expected ratio, i.e. the marker is distorted; markers that fit the expected ratio have
larger p-values. However, do not use this filter when dealing with a small number of progeny
samples (adequate sample size > 10 progeny).
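To make the test concrete, the sketch below computes a chi-square goodness-of-fit statistic for F2 genotype counts against the expected 1:2:1 ratio. It illustrates the general test and is not necessarily how Stacks computes its own value. The function name f2_chisq and the example counts are hypothetical. With three genotype classes there are 2 degrees of freedom, for which the p-value has the closed form exp(-chi2/2).

```shell
#!/bin/bash
# Hypothetical helper: chi-square goodness-of-fit test of observed F2
# genotype counts (aa, ab, bb) against the expected 1:2:1 ratio.
# With 2 degrees of freedom the p-value equals exp(-chi2 / 2).
f2_chisq() {
    awk -v aa="$1" -v ab="$2" -v bb="$3" 'BEGIN {
        n = aa + ab + bb
        e_hom = n / 4          # expected count for aa and for bb
        e_het = n / 2          # expected count for ab
        chi2 = (aa - e_hom)^2 / e_hom + (ab - e_het)^2 / e_het + (bb - e_hom)^2 / e_hom
        printf "chi2=%.3f p=%.3f\n", chi2, exp(-chi2 / 2)
    }'
}

f2_chisq 24 48 24   # perfect 1:2:1 fit: chi2=0.000 p=1.000
f2_chisq 40 40 16   # excess of aa genotypes: large chi2, small p-value
```

A marker like the second example, with a p-value well below 0.05, shows segregation distortion.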
Mappable Progeny: Number of progeny for which a genotype can be inferred. (1) A
locus is mappable in a progeny if the progeny’s alleles are consistent with the alleles present
in the parents. (2) A progeny locus is mappable based on the diversity of its alleles and their
depth of coverage. This is based on criteria that are determined by the genotypes program,
which is included in denovo_map and ref_map (see chapter 6b- Stacks – Genotype for
more info).
Mappable Marker: Displays only one type of mappable marker, based on the
genotypes found in the progeny. Mappable markers are based on the mappable progeny.
Genotypes are not called for many loci in the progeny samples because of the presence of
minor alleles. Minor alleles come from the alignment of secondary reads, which creates doubt
about whether a locus can be called homozygous or heterozygous and therefore limits the
number of mappable progeny. Using the flag [-H] could help to correct this problem by
disabling the use of secondary reads to call haplotypes.
Recommended set up: aa/bb
Genotypes: Similar to mappable progeny.
Note: the program define in the following order: Polymorphisms


từ hai đồng hợp tử
cha mẹ, nhưng đây không phải là dễ theo dõi trong một thiết lập F1 và sẽ được bỏ đi). Lưu ý rằng
giao diện web được hiển thị tất cả các kiểu gen có thể, bất kể các loại bản đồ.
• o -. đường dẫn thư mục để viết các tập tin đầu ra
• p -. đường dẫn đến tập tin có chứa các chuỗi mẹ
• r - đường dẫn đến tập tin có chứa các trình tự thế hệ con cháu .
>>>> Ví dụ sử dụng đậu xanh:
điều kiện tốt nhất để phát hiện SNPs 1nt nên là: m = 5 (để giảm nguy cơ không phù hợp duy nhất do
lỗi sequencing), M = 1 (locus sẽ cho phép với chỉ 1 không phù hợp giữa gen tương ứng, vì vậy không có
lỗi trình tự cho phép), n = 1 (sẽ cho phép locus trận đấu giữa cha mẹ với 1 không phù hợp,
và như vậy có thể aa / bb kiểu gen).
>>>> Tăng "m" cho phép điều kiện nghiêm ngặt hơn và tránh hiểu sai
lỗi trình tự như SNPs. Tuy nhiên, [m] không nên vượt quá 10 vì nó có thể làm giảm quá nhiều
tổng số loci.
4a.2b- Run denovo_map.pl sử dụng Shell Script
Thay vào đó ofentering một dòng lệnh trong các thiết bị đầu cuối, một kịch bản có thể được sử dụng để stacks chạy. Nó
cung cấp các lợi thế của việc có thể chạy nhiều mẫu và nhiệm vụ cùng một lúc.
Tạo file "stacks.sh" và viết kịch bản sau đây:
#! / bin / bash
cd / usr / local / share / stacks / scripts
mẹ = " parent1 parent2 "
con cháu = "progeny1 progeny2 progeny3 progeny4 progeny5 ... .."
pathparent = ""
pathprogeny = ""
cho tôi trong $ mẹ
làm
pathparent + = "- p /path/to/samples/${i}.fq";
done
cho tôi trong $ con cháu
làm
pathprogeny + = "- r /path/to/samples/${i}.fq";
thực hiện
denovo_map.pl -m 3 -M 1 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 -t -H
-D "Denovo Map"
o / path / to / stacks_denovo
$ pathparent
$ pathprogeny
Sau đó chạy script bằng cách vỗ nhẹ trong dòng lệnh sau trong terminal:
> bash / path / to / ngăn xếp sh
4b- Stacks - ref_map (with genome sequence)
The program runs a series of Stacks components (Pstacks > Cstacks > Sstacks >
genotypes > load_radtags.pl > index_radtags.pl > genotypes) in order to generate a
catalog of loci and alleles (SNPs). The main difference from denovo_map is that ref_map
uses sequences that have first been aligned to a reference genome using Bowtie (or another
alignment program), which produces SAM files. The SAM files are used as input for ref_map. The [m]
flag in ref_map refers to the number of reads that align to a single position in the reference
genome, but the reads may differ, which creates alleles (for this process denovo_map
needs 2 flags: [m], the number of identical reads, and [M], the difference between alleles). Thus [m]
corresponds to the locus depth in ref_map, whereas it corresponds to the allele depth in
denovo_map. The output data can be viewed through the MySQL web interface and exported in tsv or
xls format.
4b.1- Create the directories "samples_ref" and "stacks_ref" in the
"stacks_analysis" directory to hold the catalog files generated by the ref_map analysis.
4b.2- Run bowtie2-build from the command line
The program indexes the genome sequence file into 6 subset files (bt2 files).
> cd /path/to/stacks_analysis/samples_ref
> /path/to/bowtie2-2.2.3/bowtie2-build -f [input] [bt2_base]
[-f] specifies that the input file is in FASTA format
[input] path to the genome sequence in FASTA format (./genome.fa)
[bt2_base] base name of the future subset files (Ex: Gen)
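A quick way to confirm that indexing finished is to check for the six index files. The sketch
below assumes the usual Bowtie 2 small-index naming ([bt2_base].1.bt2 to .4.bt2, plus
.rev.1.bt2 and .rev.2.bt2); very large genomes get a .bt2l extension instead and are not
handled here. The function name check_bt2_index is hypothetical.

```shell
# check_bt2_index BASE: succeed only if all six Bowtie 2 index files exist.
# Missing files are reported on standard error.
check_bt2_index() {
    base="$1"
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "${base}.${suffix}.bt2" ]; then
            echo "missing: ${base}.${suffix}.bt2" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call, run from the samples_ref directory:
#   check_bt2_index Gen && echo "index complete"
```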
4b.3- Run Bowtie2 from the command line
The program aligns the radtag reads to the reference genome. By
default, Bowtie 2 performs end-to-end read alignment (as opposed to local alignment
mode). That is, it searches for alignments involving all of the nucleotides in the read and
validates an alignment if its score is above the threshold. Alignment score calculation:
mismatched base = -6, length-2 gap = -11 (Ex: for a sequence with one mismatch + one length-2 gap,
alignment score = -6-11 = -17). The maximum alignment score is 0, when the match is perfect. The default
minimum score threshold is [-0.6 + (-0.6 * L)], where L is the read length (Ex: for a 64 bp read,
minimum score = -0.6 - (0.6 * 64) = -39). This can be configured with the option
[--score-min L,-0.6,-0.6].
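The threshold arithmetic above can be reproduced with a small helper. This is a sketch, not part
of Bowtie 2: min_score and passes_threshold are hypothetical names, and their optional trailing
arguments stand for the two numbers given after L in the score-min option.

```shell
# min_score LENGTH [INTERCEPT] [SLOPE]
# Linear threshold INTERCEPT + SLOPE * LENGTH; both default to -0.6.
min_score() {
    awk -v len="$1" -v a="${2:--0.6}" -v b="${3:--0.6}" \
        'BEGIN { printf "%.1f\n", a + b * len }'
}

# passes_threshold SCORE LENGTH [INTERCEPT] [SLOPE]
# Exit status 0 if SCORE is at or above the threshold, 1 otherwise.
passes_threshold() {
    awk -v s="$1" -v len="$2" -v a="${3:--0.6}" -v b="${4:--0.6}" \
        'BEGIN { exit !(s >= a + b * len) }'
}

min_score 64              # prints -39.0 (default setting)
min_score 64 -0.6 -0.1    # prints -7.0 (stricter setting)
passes_threshold -17 64 && echo "score -17 is kept for a 64 bp read"
```

With the stricter setting L,-0.6,-0.1 the same alignment (score -17) falls below the -7.0
threshold and would be rejected.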
Run the program from the "samples_ref" directory containing the bt2 files. Run
the program for each sample (parent and progeny files) independently.
> cd /path/to/stacks_analysis/samples_ref
> /path/to/bowtie2-2.2.3/bowtie2 [-x bt2_base] [-U input] [-S output] [--score-min L,-0.6,-0.1]
[-x] base name of the genome index files (Ex: Gen)
[-U] path to the file containing the sample sequences (Ex: parent1.fq)
[-S] path to the file in which the results will be stored (Ex: parent1.sam)
[--score-min] defines the variables used to calculate the minimum score threshold
[-0.6 + (-0.1 * L)]; the default is [-0.6 + (-0.6 * L)]. L is the read length.
4b.4a- Run ref_map.pl using the command line
> cd /usr/local/share/stacks/scripts
> ref_map.pl -m 3 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 \
-D "Ref Map" \
-o /path/to/stacks_analysis/stacks_ref \
-p /path/to/stacks_analysis/samples_ref/parent1.sam \
-p /path/to/stacks_analysis/samples_ref/parent2.sam \
-r /path/to/stacks_analysis/samples_ref/progeny1.sam \
-r /path/to/stacks_analysis/samples_ref/progeny2.sam \
-r /path/to/stacks_analysis/samples_ref/progeny3.sam
...... enter all samples ......
• n - number of mismatches allowed between loci when building the catalog (default 0). See the
previous chapter for more details.
• m - minimum depth of coverage (number of reads) required to create a stack (default 1). See the
previous chapter for more details.
• T - the number of threads to execute.
• A - if processing a genetic map, specify the cross type: 'CP', 'F2', 'BC1', 'DH', or 'GEN'.
See the previous chapter for more details.
• B - specify a database to load data into.
• b - batch ID representing this dataset in the database.
• D - batch description.
• o - directory path to write output files.
• p - path to the file containing the parent sequences.
• r - path to the file containing the progeny sequences.
ATTENTION: ref_map can run directly on SAM or BAM files. If using BAM files,
make sure that the files are sorted and indexed before they are used by ref_map.
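The text does not name a tool for sorting and indexing BAM files. One common choice is samtools;
the sketch below assumes samtools 1.x syntax and hypothetical file names (parent1.sam and so on),
and it only prints the commands so they can be reviewed before being run. The function name
sort_index_cmds is illustrative.

```shell
# Assumption: samtools 1.x is installed; it is not part of the original protocol.
# sort_index_cmds NAME: print the commands that turn NAME.sam into a
# coordinate-sorted, indexed NAME.sorted.bam.
sort_index_cmds() {
    name="$1"
    echo "samtools sort -o ${name}.sorted.bam ${name}.sam"
    echo "samtools index ${name}.sorted.bam"
}

# Placeholder sample names; list your own parents and progeny here.
for s in parent1 parent2 progeny1; do
    sort_index_cmds "$s"
done
```

Piping the printed lines to sh from inside the directory holding the SAM files would execute
them.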
4b.4b- Run ref_map.pl using a shell script
Instead of entering a command line in the terminal, a shell script can be used to run Stacks. It
offers the advantage of being able to run multiple samples and tasks at once.
Create the file "stacks.sh" and write the following script:
#!/bin/bash
# Index the genome, then align every sample with Bowtie 2.
cd /path/to/stacks_analysis/samples_ref
/path/to/bowtie2-2.2.3/bowtie2-build -f genome.fa Gen
samples="parent1 parent2 progeny1 progeny2 progeny3"   # .... list all samples
for file in $samples
do
    /path/to/bowtie2-2.2.3/bowtie2 -x Gen -U ../samples/${file}.fq -S ./${file}.sam
done
# Run ref_map.pl on the resulting SAM files.
cd /usr/local/share/stacks/scripts
parent="parent1 parent2"
progeny="progeny1 progeny2 progeny3 progeny4 progeny5"   # ..... list all progeny
pathparent=""
pathprogeny=""
for i in $parent
do
    pathparent+="-p /path/to/samples_ref/${i}.sam "
done
for i in $progeny
do
    pathprogeny+="-r /path/to/samples_ref/${i}.sam "
done
ref_map.pl -m 3 -n 1 -T 15 -B db_name_radtags -b 1 -A F2 \
-D "Ref Map" \
-o /path/to/stacks_analysis/stacks_ref \
$pathparent \
$pathprogeny
Then run the script by typing the following command line in the terminal:
> bash /path/to/stacks.sh
5- Data analysis on the MySQL database
5.1- Go to http://localhost/stacks
5.2- Select a database_radtags → Catalog
5.3- Define the filters:
Alleles: number of alleles per locus. Since both parents (F0) are expected to be
highly homozygous (> 90%), we can expect to have at most 2 alleles per locus, one from
each parent (aa/bb).
Recommended setting: 1-2
SNPs: range of the number of SNPs per locus. Since recombination is unlikely to occur
between SNPs of the same locus, filtering by the number of SNPs per allele does not matter.
Recommended setting: 1
Parental matches: number of parents that match a locus. If we are working with a
simple cross from closely related parents, then both should match each locus. However, if
there is a polymorphism between the parents that affects the restriction site used to generate the
radtag library, then only one parent may show an allele for that locus (aa/-). Note that if
there is too much polymorphism between the parents and/or if the [n] flag is set too low,
different alleles from each parent, which should normally belong to the same locus, may be
classified as two different loci.
Recommended setting: 1-2
Progeny matches: minimum number of progeny that match the locus. Good-quality
markers are expected to be found in a large number of progeny. Since the parental loci should
be mostly homozygous and the markers should segregate 1:2:1 in the F2 population, each allele
should be present in ¾ (75%) of the progeny.
Recommended setting: 60% to 75% of the total number of progeny.
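The recommended range can be converted into progeny counts for the filter. A minimal sketch,
assuming a hypothetical cross of 96 progeny and rounding down to whole individuals;
progeny_range is an illustrative name.

```shell
# progeny_range TOTAL: print the 60% and 75% cut-offs as whole progeny counts
# (integer division, so values are rounded down).
progeny_range() {
    total="$1"
    echo "$(( total * 60 / 100 )) $(( total * 75 / 100 ))"
}

progeny_range 96    # prints: 57 72
```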
Segregation distortion: chi-square test for the segregation of the markers in the
progeny. The smaller the p-value, the stronger the deviation from the expected ratio; statistical
significance is reached when the p-value is < 0.05. However, do not use this filter when dealing
with a small number of progeny samples (adequate sample size > 10 progeny).
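To illustrate what this filter measures, the following sketch computes the chi-square statistic
for an F2 marker against the 1:2:1 expectation (aa:ab:bb). It is an illustration with made-up
counts, not the code Stacks uses. With three genotype classes there are 2 degrees of freedom,
for which the p-value equals exp(-x2/2) and the 0.05 critical value is 5.991.

```shell
# chisq_f2 AA AB BB: print the chi-square statistic and its p-value (2 d.f.)
# for observed genotype counts tested against a 1:2:1 ratio.
chisq_f2() {
    awk -v aa="$1" -v ab="$2" -v bb="$3" 'BEGIN {
        n = aa + ab + bb
        e_hom = n / 4          # expected count for aa and for bb
        e_het = n / 2          # expected count for ab
        d1 = aa - e_hom; d2 = ab - e_het; d3 = bb - e_hom
        x2 = d1 * d1 / e_hom + d2 * d2 / e_het + d3 * d3 / e_hom
        printf "%.3f %.4f\n", x2, exp(-x2 / 2)
    }'
}

chisq_f2 25 50 25    # prints: 0.000 1.0000  (perfect 1:2:1)
chisq_f2 40 40 20    # prints: 12.000 0.0025 (significant distortion)
```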
Mappable progeny: number of progeny for which a genotype could be inferred. (1) A
locus is mappable in a progeny if the alleles of the progeny are consistent with the alleles present
in the parents. (2) A progeny locus is mappable based on the diversity of its alleles and their
depth of coverage. This is based on the criteria defined by the genotypes program
included in denovo_map and ref_map (see chapter 6b- Stacks - genotypes for
more details).
Mappable markers: shows only one type of mappable marker, based on the
genotypes found in the progeny. Mappable markers are based on the mappable progeny.
Genotypes are not called for many loci in the progeny samples because of the presence of
minor alleles. Minor alleles come from the alignment of secondary reads, which creates doubt
about whether a locus should be called homozygous or heterozygous, thus limiting the number of
mappable progeny. Using the [-H] flag can help correct this problem by disabling
the use of secondary reads for calling haplotypes.
Recommended setting: aa/bb
Same for mappable progeny: genotypes.
Note: the program determines in the following order: the polymorphisms
