On Wed, 2003-08-27 at 17:30, Dan Bolser wrote: > Joseph Landman said: > > Hi Dan: > > > > I am assuming that your arrays contain not HSP's, but some sort of > > object representing the HSP ala BioPerl. Is this correct? > > > > Nope, just a simple hash, key => value, > I.E. > > { > hsp_hit-from => 5, > hsp_hit-to => 10, > score => 20, > } > Ok. Comments interspersed: > >> __SKIP__ > >> > >> preamble > >> > >> @hsps = array of HSP hashes, for a particular protein > >> each HSP can be from several SCOP sequences. > >> > >> __RESUME__ > >> > >> my @result; # Final HSP's > >> > >> TOP:while (@hsps){ # NB: Ordered by Hsp_query-from > >> # (for optimzation). > >> > >> my $p = 0; # Current HSP pointer. > >> > >> MID:for (my $j=$p+1; $j<@hsps; $j++){ # Overlap slider. These two loops give you something comparable to an O(N^2) algorithm. That is, I can rewrite the while and the for as foreach my $hsp (@hsps) # Outer O(N): 1 iteration per entry foreach my $index (0 .. $#hsps) # Inner O(N): 1 iteration per entry > >> > >> # Family overlap only! > >> > >> next MID if > >> $hsps[$p]->{SCCS} != $hsps[$j]->{SCCS}; Short circuits are good. > >> > >> # Optimization. > >> > >> if ( $THRESH > > >> $hsps[$p]->{P_END} - $hsps[$j]->{P_START} ){ > >> > >> shift @hsps; > >> next TOP; > >> } A shift or a similar operation (splice, push, etc) will likely trigger the garbage collector in Perl, as well as the memory allocator > >> > >> # Pick best of pair (removing the other from the list). > >> > >> if ( $hsps[$p]->{E_VALUE} > $hsps[$j]->{E_VALUE} ){ > >> splice (@hsps, $p, 1); > >> $j--; > >> $p = $j; > >> } > >> else { > >> splice (@hsps, $j, 1); > >> $j--; > >> } > >> } > >> push @result, splice(@hsps, $p, 1); > >> } > >> print "OK\n\n"; I might suggest that instead of moving data around (splice, push, shift) and causing allocation/GC events, that you might maintain several lists, indicating which class the HSP's belong to. -- Joseph Landman, Ph.D Scalable Informatics LLC email: landman at scalableinformatics.com web: http://scalableinformatics.com phone: +1 734 612 4615