Perl - Extracting FASTA sequences based on position


I am a newbie to Perl and still learning.

I have a file in FASTA format and want to extract the sequences spanning a particular position range, for example, positions 200 to 300.

>contig[0001]
tgcatcaaaagctgaaaatatgtagtcgagaagtcatttcgagaaattgacgttttaagt
ttcggtttccaaattcaaccggatgtatcttcgccaataattgtcagcagttagaatttc
tttcaacattatgaagccctttttatatattttgattctgcatcaaaagctgaaaatatg
tagtcttgaagtcatttcgagaaatcgacgttttaagtttctgtttccaaattcaaacgg
atgtatcttcgccaataattgtcagaagttagaatttctttcaacattatgaagcccttt
ttatatattttgattctgcatcaaaagctgaaaatgtgtagtctcgaagtcatttcgaga
aattgacgttttaagtttctgtttccaaattcaaacggatgtatcttcgccaataattgt
cagaagttagaatttctttcaacattatgaagccctttttacatattttgaccctgcatc
aaaagctgaaaatatgtagtctcgaagtcattttgagaagttagaatttctttcaacatt
atgaagccctttttatatattttgattctgcatcaaaagctgaaaatatgtagtctcgaa
gtcwtttcragaaattgacgttttaagtttctgtttccaaattcaaacggatgtatcttc
gccaataattgtcagaagttagaatttctttcaacattatgaagccctttttatatattt
tgactctgcatcaaaagctgaaaatatgtagtctcgaagtcatttcgagaaattgacgtt

I want to extract the sequence at positions 200–300 from contig[0001]. The output should be:

>contig[0001]_200-300
agaaatcgacgttttaagtttctgtttccaaattcaaacggatgtatcttcgccaataatt
gtcagaagttagaatttctttcaacattatgaagcccttt

I have 500 sequences in the FASTA file, and I have the required positions in a tab-delimited file containing ID, start, and end.

It would be great if someone could help me with this.

Thank you for the help. I am not sure how I can give the file containing the info regarding positions.

newbie

Here is one way. Contents of script.pl:

#!/usr/bin/env perl

use warnings;
use strict;

my ($adn, $l, $header);

while ( <> ) {
    chomp;

    ## The first line is known to be a header: print it and process the next one.
    if ( $. == 1 ) {
        printf qq|%s_%s\n|, $_, q|200-300|;
        next;
    }

    ## Concatenate the sequence lines until the next header is found.
    if ( '>' ne substr $_, 0, 1 ) {
        if ( ! $l ) { $l = length }
        $adn .= $_;
        if ( ! eof ) { next }
    }
    else {
        $header = sprintf qq|%s_%s\n|, $_, q|200-300|;
    }

    ## Extract 100 characters starting at position 200 (offset 199) and insert
    ## newlines so the lines have the same length as in the input.
    ## Note: a length of 100 covers positions 200-299; use 101 to include position 300.
    my $s = substr $adn, 199, 100;
    $s =~ s/(.{$l})/$1\n/g;
    printf qq|%s\n|, $s;
    undef $adn;

    ## If not at the end of the file, print the header of the following sequence.
    if ( ! eof ) { print $header }
}

Run it like:

perl script.pl infile 

That yields:

>contig[0001]_200-300
agaaatcgacgttttaagtttctgtttccaaattcaaacggatgtatcttcgccaataat
tgtcagaagttagaatttctttcaacattatgaagccctt
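
The script above hardcodes the range 200-300, while the question mentions a tab-delimited file with id, start and end for each of the 500 sequences. A minimal sketch of how that file could drive the extraction follows. It is not part of the original answer and rests on several assumptions: the id in the positions file matches the FASTA header up to the first whitespace (for example contig[0001]), positions are 1-based and inclusive, and output is wrapped at 60 characters per line. The names extract_range, read_fasta, wrap and main are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the 1-based, inclusive range [$start, $end] of $seq.
sub extract_range {
    my ($seq, $start, $end) = @_;
    return substr $seq, $start - 1, $end - $start + 1;
}

# Read a FASTA file into a hash reference: id => sequence without line breaks.
# Assumption: the id is the header text after '>' up to the first whitespace.
sub read_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%seq, $id);
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;
        if ( $line =~ /^>(\S+)/ ) {
            $id = $1;
            $seq{$id} = '';
        }
        elsif ( defined $id ) {
            $line =~ s/\s+//g;
            $seq{$id} .= $line;
        }
    }
    close $fh;
    return \%seq;
}

# Break a sequence into lines of at most $width characters.
sub wrap {
    my ($seq, $width) = @_;
    return join "\n", unpack "(a$width)*", $seq;
}

# Print one FASTA record per line of the positions file (id<TAB>start<TAB>end).
sub main {
    my ($fasta, $positions) = @_;
    my $seq = read_fasta($fasta);
    open my $fh, '<', $positions or die "Cannot open $positions: $!";
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;
        next unless $line =~ /\S/;
        my ($id, $start, $end) = split /\t/, $line;
        if ( ! exists $seq->{$id} ) {
            warn "No sequence found for id '$id'\n";
            next;
        }
        # Skip ranges that do not fit the sequence instead of guessing.
        if ( $start < 1 || $end < $start || $start > length $seq->{$id} ) {
            warn "Skipping invalid range $start-$end for '$id'\n";
            next;
        }
        my $sub = extract_range($seq->{$id}, $start, $end);
        printf ">%s_%d-%d\n%s\n", $id, $start, $end, wrap($sub, 60);
    }
    close $fh;
}

# Run only when called as a script with both file arguments.
main(@ARGV) if @ARGV == 2 && !caller;
```

It could be run as perl extract.pl infile positions.txt, where extract.pl and positions.txt are placeholder names. Because the range is treated as inclusive, a line such as contig[0001], 200, 300 (tab-separated) returns 101 bases, as in the output requested in the question, rather than the 100 produced by substr $adn, 199, 100. Reading all sequences into memory first keeps the lookup simple and should be fine for 500 contigs; very large files might call for an indexed approach instead.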
