First one is free ...

#!/usr/bin/perl
use strict;

my ($directory,$directory_handle,$file,@files,$sequence);
my ($file_handle,$poly_a_tail,$rseq);

$directory = "./";   # directory to open
if (!(opendir $directory_handle,$directory))
   {
     die "FATAL ERROR: Unable to open directory = ".$directory."\n";
   }

# select only the .seq files
@files = grep { /\.seq$/ } readdir($directory_handle);
closedir($directory_handle);

# loop over these selected files
foreach $file (@files)
   {
     # try to open the file
     if (!(open($file_handle,"<",$file)))
        {
          # if we cannot open it, warn the user, and skip to the next file
          warn "Warning: unable to open file = ".$file.". Skipping.\n";
          next;
        }
       else
        {
          # assume one line per file, or we will have to modify this
          chomp($sequence=<$file_handle>);

          # now time to bring out the heavy artillery
          $rseq = reverse $sequence;   # poly-a is now at the head

          # match A's and/or N's at the front; if there are none, use an
          # empty string so the reported length is 0
          $poly_a_tail = ($rseq =~ /^([AN]+)/) ? $1 : "";

          printf "%i %s\n",length($poly_a_tail),$file;   # tell the world ...
          close($file_handle);
        }
   }

On Tue, 2003-09-09 at 17:00, Tristan Fiedler wrote:
> Thanks for the scripting tips! I have a 'counting' issue which I need to
> quickly resolve. A typical sequence input file (5 - 700 bases) looks like
> :
>
> AGTAGTCGATCATNATANCTANTACNACTACTAACTATGCTAGNNAATATAAAAAAAAANAAA
>
> I have over 500 files, named *.seq. I would like to create a script which :
>
> a. runs through all the files,
> b. counts the length of the 'poly A' tail (defined as the longest stretch
> of A or N)
> c. sends the output to a file, eg.
>
> 25 1.seq
> 87 2.seq
> 13 3.seq
>
> Example valid poly A tails :
>
> AAAANANANANAAANNAAAAAA
>
> AAAAAAAAAAAAAA
>
> NNNNNNNNNNNNN
>
> AAANNNNNNNNNNNAAAAAAAAA
>
> Thank you so much for your expertise!
>
> Tristan
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615
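
A possible follow-up to the script above, offered only as a sketch. The script assumes one sequence line per file and measures the run of A's and/or N's at the very end of the sequence. If a file wraps its sequence over several lines, or if the longest run anywhere in the sequence is wanted (the question's wording allows that reading), something like the following might serve. The helper names tail_run_length, longest_run_length and read_sequence are made up for illustration and do not appear in the original script; it also assumes upper-case sequences, as in the example input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): length of the run of
# A's and/or N's at the very end of the sequence, 0 if there is none.
sub tail_run_length {
    my ($sequence) = @_;
    return $sequence =~ /([AN]+)$/ ? length($1) : 0;
}

# Hypothetical helper: length of the longest run of A's and/or N's
# anywhere in the sequence, 0 if there is none.
sub longest_run_length {
    my ($sequence) = @_;
    my $longest = 0;
    while ($sequence =~ /([AN]+)/g) {
        $longest = length($1) if length($1) > $longest;
    }
    return $longest;
}

# Hypothetical helper: read the whole file and join wrapped lines into a
# single sequence, dropping all whitespace. Returns undef if the file
# cannot be opened.
sub read_sequence {
    my ($file) = @_;
    open(my $file_handle, "<", $file) or return;
    my $sequence = join("", <$file_handle>);
    close($file_handle);
    $sequence =~ s/\s+//g;
    return $sequence;
}

# Report both measures for every .seq file in the current directory.
foreach my $file (glob("*.seq")) {
    my $sequence = read_sequence($file);
    if (!defined $sequence) {
        warn "Warning: unable to open file = $file. Skipping.\n";
        next;
    }
    printf "%i %i %s\n", tail_run_length($sequence),
                         longest_run_length($sequence), $file;
}
```

Run it from the directory holding the .seq files and redirect standard output to a file to keep the results. The first column is the end-of-sequence run, the second the longest run anywhere; drop whichever one does not match the definition of the tail you need.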