Cheers... On 27 Aug 2003, Joseph Landman wrote: > Comments interspersed: > > > > >> > > >> my @result; # Final HSP's > > >> > > >> TOP:while (@hsps){ # NB: Ordered by Hsp_query-from > > >> # (for optimzation). > > >> > > >> my $p = 0; # Current HSP pointer. > > >> > > >> MID:for (my $j=$p+1; $j<@hsps; $j++){ # Overlap slider. > > These two loops give you something comparable to an O(N^2) algorithm. > That is, I can rewrite the while and the for as Yup, but for each inner N, N gets smaller by one. is this still O(N^2) ? i.e. N*N + N*(N-1) + N*(N-2) + ... + N*(N-(N-2)) + N*(N-(N-1)) + 0; > > >> > > >> # Family overlap only! > > >> > > >> next MID if > > >> $hsps[$p]->{SCCS} != $hsps[$j]->{SCCS}; > > Short circuits are good. Yup, but this one is fundamental for the way the algorithm works, I.e. The outer loop is called once per unique SCOP family in the HSP's for this protein... Hmmm, I am not sure how this relates to the above now I think about it. > > > >> > > >> # Optimization. > > >> > > >> if ( $THRESH > > > >> $hsps[$p]->{P_END} - $hsps[$j]->{P_START} ){ > > >> > > >> shift @hsps; > > >> next TOP; > > >> } > > A shift or a similar operation (splice, push, etc) will likely trigger > the garbage collector in Perl, as well as the memory allocator I see > I might suggest that instead of moving data around (splice, push, shift) > and causing allocation/GC events, that you might maintain several lists, > indicating which class the HSP's belong to. Using a hash? I still think that I will have to loop through more than once... Maby if I just sort by Hsp_query-from and Hsp_e-value (using a sort vector of the form @sort_vector_e_value = sort{ $hsps[$a]->{E_VALUE} <=> $hsps[$b]->{E_VALUE} } (0..$#hsps); I think sort is O(N*log(N)), so two sorts is still NlogN, then maby I can do the whole thing in one pass... Thanks very much for your help, Cheers, Dan.