[Biodevelopers] splice advice...

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Wed Aug 27 14:56:33 EDT 2003


Hello, splice to see you etc.

I am trying to write a *simple* "best HSP in family"
algorithm in Perl.

My raw materials are SCOP queries against target sequences.

I get each set of hits for each protein in turn, sorted
by P_START (Hsp_query-from).

I then go through the list and remove any pair of sequences
with more than $THRESH AA overlap (if they come from the same
scop family).
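
A minimal sketch of the removal rule described above, assuming each HSP is a hash ref with SCCS, P_START and P_END keys and that the coordinates are inclusive. The names overlap_aa and clash are hypothetical, made up for illustration. Note that it measures the true shared span, which is slightly different from the simpler P_END - P_START difference used in the code further down.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: number of query residues shared by two HSPs,
# assuming inclusive P_START/P_END coordinates. Returns 0 if disjoint.
sub overlap_aa {
    my ($x, $y) = @_;
    my $len = min($x->{P_END}, $y->{P_END})
            - max($x->{P_START}, $y->{P_START}) + 1;
    return $len > 0 ? $len : 0;
}

# Hypothetical predicate: true when two HSPs come from the same SCOP
# family and share more than $thresh residues, i.e. one of them must go.
sub clash {
    my ($x, $y, $thresh) = @_;
    return $x->{SCCS} eq $y->{SCCS} && overlap_aa($x, $y) > $thresh;
}
```

The family test uses eq because an SCCS string such as 'a.1.1.1' is not a number.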

This list removal involves lots of splicing, and each splice is
O(N) in the list size.

I figure I could avoid all that splicing if I just used pointers
to array positions, but I can't work out how to do this...
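
One way the "pointers to array positions" idea might look, as a rough sketch rather than a tested replacement: keep a flag per array index and remember, per family, the index of the HSP currently being kept. Nothing is ever spliced. The function name best_hsps_per_family is hypothetical; the hash keys and the P_END - P_START overlap measure are taken from the code below, and the list is assumed to be sorted by P_START already.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array ref of the kept HSPs without
# ever splicing @$hsps. Assumes HSP hash refs with SCCS, P_START,
# P_END and E_VALUE keys, sorted by P_START.
sub best_hsps_per_family {
    my ($hsps, $thresh) = @_;
    my @keep = (1) x @$hsps;   # one keep/drop flag per array position
    my %current;               # SCCS => index of the HSP currently kept

    for my $j (0 .. $#$hsps) {
        my $fam = $hsps->[$j]{SCCS};
        my $p   = $current{$fam};

        if (defined $p
            && $hsps->[$p]{P_END} - $hsps->[$j]{P_START} > $thresh) {
            # Same family and overlapping: drop the worse E-value.
            if ($hsps->[$p]{E_VALUE} > $hsps->[$j]{E_VALUE}) {
                $keep[$p]      = 0;
                $current{$fam} = $j;
            }
            else {
                $keep[$j] = 0;
            }
        }
        else {
            # Nothing kept in this family overlaps $j: it starts a new region.
            $current{$fam} = $j;
        }
    }
    # Collect survivors in their original (P_START) order.
    return [ @$hsps[ grep { $keep[$_] } 0 .. $#$hsps ] ];
}

my @example = (
    { SCCS => 'a.1.1.1', P_START => 1,   P_END => 100, E_VALUE => 1e-5  },
    { SCCS => 'a.1.1.1', P_START => 20,  P_END => 120, E_VALUE => 1e-20 },
    { SCCS => 'b.2.1.1', P_START => 30,  P_END => 90,  E_VALUE => 1e-3  },
    { SCCS => 'a.1.1.1', P_START => 115, P_END => 200, E_VALUE => 1e-8  },
);
my $best = best_hsps_per_family(\@example, 10);
print join(", ", map { "$_->{SCCS}:$_->{P_START}" } @$best), "\n";
```

Each HSP is visited once and each family lookup is a hash access, so the pass is linear in the number of HSPs. The cost is one extra flag array and a final grep.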

Maybe splicing is the least of my optimization problems....

__SKIP__

preamble

@hsps = array of HSP hashes, for a particular protein
each HSP can be from several SCOP sequences.

__RESUME__

  my @result;						# Final HSPs
  
  TOP:while (@hsps){			  # NB: Ordered by Hsp_query-from
                                          # (for optimization).

    my $p = 0;	                          # Current HSP pointer.
    
    MID:for (my $j=$p+1; $j<@hsps; $j++){ # Overlap slider.
      
     # Family overlap only!

      next MID if                         # String compare: SCCS is not numeric.
        $hsps[$p]->{SCCS} ne $hsps[$j]->{SCCS};	
      
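      # Optimization: the list is sorted by P_START, so once $j overlaps
      # $p by less than $THRESH, no later HSP can overlap it by more.
      # Leave the loop so that $p is pushed onto @result below instead
      # of being discarded.
      
      if ( $THRESH >
             $hsps[$p]->{P_END} - $hsps[$j]->{P_START} ){
        
        last MID;
      }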

      # Pick best of pair (removing the other from the list).
      
      if ( $hsps[$p]->{E_VALUE} > $hsps[$j]->{E_VALUE} ){
        splice (@hsps, $p, 1);
        $j--;
        $p = $j;
      }
      else {
        splice (@hsps, $j, 1);
        $j--;
      }
    }
    push @result, splice(@hsps, $p, 1);
  }
  print "OK\n\n";

__END_ISH__

Whaddya think?
Any better way?

Cheers, 



On Wed, 27 Aug 2003, sekhar kavuru wrote:

> Dear Joseph,
>  
> I am a Perl Developer with BioInformatics Certification.
>  
> Recently I developed a software package using BioPerl/ EnsEmbl to create a Perl/Html based database interface to access Genome data from EnsEMBL and SwissProt.
> The Browser I developed enables users to query ENSEMBL database  based on either CloneId or Chromosome Number.
>  
> If you need any assistance or help please feel free to write to me.
>  
> Regards
>  
> Sekhar
> 
> biodevelopers-request at bioinformatics.org wrote:
> 
> Today's Topics:
> 
> 1. Re: [BiO BB] perl scripting assistance (Joseph Landman)
> 
> --__--__--
> 
> Message: 1
> From: Joseph Landman 
> To: BiO BB 
> Cc: biodevelopers 
> Date: 26 Aug 2003 21:06:11 -0400
> Subject: [Biodevelopers] Re: [BiO BB] perl scripting assistance
> Reply-To: biodevelopers at bioinformatics.org
> 
> Try the biodevelopers group on bioinformatics.org ...
> 
> On Tue, 2003-08-26 at 13:06, Tristan J. Fiedler wrote:
> > Are any bulletin boards / discussion groups available for obtaining tips
> > in scripting with perl?
> > 
> > Thank you.
> 



