Perl - Extracting FASTA sequences based on position
I am a newbie to Perl and still learning.
I have a file in FASTA format and want to extract the sequences spanning a particular range of positions, for example, positions 200 to 300.
>contig[0001]
tgcatcaaaagctgaaaatatgtagtcgagaagtcatttcgagaaattgacgttttaagt
ttcggtttccaaattcaaccggatgtatcttcgccaataattgtcagcagttagaatttc
tttcaacattatgaagccctttttatatattttgattctgcatcaaaagctgaaaatatg
tagtcttgaagtcatttcgagaaatcgacgttttaagtttctgtttccaaattcaaacgg
atgtatcttcgccaataattgtcagaagttagaatttctttcaacattatgaagcccttt
ttatatattttgattctgcatcaaaagctgaaaatgtgtagtctcgaagtcatttcgaga
aattgacgttttaagtttctgtttccaaattcaaacggatgtatcttcgccaataattgt
cagaagttagaatttctttcaacattatgaagccctttttacatattttgaccctgcatc
aaaagctgaaaatatgtagtctcgaagtcattttgagaagttagaatttctttcaacatt
atgaagccctttttatatattttgattctgcatcaaaagctgaaaatatgtagtctcgaa
gtcwtttcragaaattgacgttttaagtttctgtttccaaattcaaacggatgtatcttc
gccaataattgtcagaagttagaatttctttcaacattatgaagccctttttatatattt
tgactctgcatcaaaagctgaaaatatgtagtctcgaagtcatttcgagaaattgacgtt
I want to extract positions 200-300 from the sequence contig[0001]. The output should be:
>contig[0001]_200-300
agaaatcgacgttttaagtttctgtttccaaattcaaacggatgtatcttcgccaataatt
gtcagaagttagaatttctttcaacattatgaagcccttt
I have 500 sequences in the FASTA file, and I have the required positions in a tab-delimited file containing the id, start and end.
It would be great if someone could help me with this.
Thanks for the help. I am not sure how I can share the file containing the position info.
newbie
Here is one way. Content of script.pl:
#!/usr/bin/env perl

use warnings;
use strict;

my ($adn, $l, $header);

while ( <> ) {
    chomp;

    ## The first line is known to be a header, so print it and
    ## process the next one.
    if ( $. == 1 ) {
        printf qq|%s_%s\n|, $_, q|200-300|;
        next;
    }

    ## Concatenate the sequence (adn) while no header is found.
    if ( '>' ne substr $_, 0, 1 ) {
        if ( ! $l ) { $l = length }
        $adn .= $_;
        if ( ! eof ) { next }
    }
    else {
        $header = sprintf qq|%s_%s\n|, $_, q|200-300|;
    }

    ## Extract the range 200-300 and insert newlines to keep the same
    ## line length as before.
    my $s = substr $adn, 199, 100;
    $s =~ s/(.{$l})/$1\n/g;
    printf qq|%s\n|, $s;
    undef $adn;

    ## If not at end of file, print the header of the following adn.
    if ( ! eof ) { print $header }
}
Run it like:
perl script.pl infile
That yields:
>contig[0001]_200-300
agaaatcgacgttttaagtttctgtttccaaattcaaacggatgtatcttcgccaataat
tgtcagaagttagaatttctttcaacattatgaagccctt
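The script above hard-codes the 200-300 range, and substr $adn, 199, 100 returns 100 bases (positions 200 to 299), one fewer than the 101 bases in the output the question asks for. Since the question mentions a tab-delimited file with id, start and end, a variation could read the ranges from that file instead. The following is only a rough sketch under some assumptions: the id in the positions file matches the first word after '>' in the FASTA header, start and end are 1-based and inclusive, and the output is wrapped at 60 characters per line. The sub names (read_fasta, extract_range, wrap_seq, print_ranges) and the file names in the usage message are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

## Hypothetical helper: load a FASTA file into a hash of id => sequence.
## Assumes the id is the first word after '>' (e.g. contig[0001]).
sub read_fasta {
    my ($path) = @_;
    my ( %seq, $id );
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    while ( my $line = <$fh> ) {
        $line =~ s/\s+$//;    # drop the newline and trailing whitespace
        if ( $line =~ /^>(\S+)/ ) {
            $id = $1;
            $seq{$id} = '';
        }
        elsif ( defined $id ) {
            $seq{$id} .= $line;
        }
    }
    close $fh;
    return \%seq;
}

## Bases from $start to $end, both 1-based and inclusive.
sub extract_range {
    my ( $seq, $start, $end ) = @_;
    return substr $seq, $start - 1, $end - $start + 1;
}

## Break a sequence into lines of at most $width characters.
sub wrap_seq {
    my ( $s, $width ) = @_;
    return join "\n", $s =~ /.{1,$width}/g;
}

## Print one FASTA record per valid line of the positions file.
sub print_ranges {
    my ( $fasta, $positions ) = @_;
    my $seq = read_fasta($fasta);
    open my $fh, '<', $positions or die "Cannot open $positions: $!\n";
    while ( my $line = <$fh> ) {
        $line =~ s/\s+$//;
        next if $line eq '';
        my ( $id, $start, $end ) = split /\t/, $line;

        ## Skip column titles or malformed lines.
        next unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;

        my $len = exists $seq->{$id} ? length( $seq->{$id} ) : 0;
        if ( $start < 1 || $end < $start || $start > $len ) {
            warn "Skipping $id $start-$end: unknown id or range out of bounds\n";
            next;
        }
        my $s = extract_range( $seq->{$id}, $start, $end );
        printf "%s\n%s\n", ">${id}_$start-$end", wrap_seq( $s, 60 );
    }
    close $fh;
}

## Run only when both file names are given on the command line.
if ( @ARGV == 2 ) { print_ranges(@ARGV) }
else { warn "Usage: perl extract_ranges.pl sequences.fasta positions.tsv\n" }
```

With a positions file line such as contig[0001], a tab, 200, a tab, 300, this would print the header >contig[0001]_200-300 followed by the 101 bases of that range. Keeping all sequences in a hash is fine for 500 contigs; for very large files a streaming approach like the script above may be preferable.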