[Biodevelopers] Re: splice advice...

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Wed Aug 27 17:30:00 EDT 2003


Joseph Landman said:
> Hi Dan:
>
>   I am assuming that your arrays contain not HSP's, but some sort of
> object representing the HSP ala BioPerl.  Is this correct?
>

Nope, just a simple hash, key => value,
I.E.

{
  'hsp_hit-from' => 5,
  'hsp_hit-to'   => 10,
  score          => 20,
}

>   See
> http://doc.bioperl.org/releases/bioperl-1.2/Bio/Tools/BPlite/HSP.html

Ta.

The problem is I find these implementations a bit confusing.

I know I am reinventing the wheel, but sometimes it pays off.

E.g. I have my HSP's in an RDB, so why load them into objects?

Does BioPerl have that functionality? (I guess it does. I also like
the htmlReport stuff in BioPerl, but again I wish it were a bit
more general, less self-specific.)


>   for a way to handle some of the HSP processing.  This might help you
> simplify the expression of what you are doing...


Yup, or at least make it a bit more readable by people who don't know
my specifics. But I seriously doubt that it will help my optimization problem.

Any suggestions on the complexity of the code below, or on how to improve
it, would be greatly appreciated.

Cheers,
Dan.
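
A possible splice-free variant of the quoted loop below, offered as a
sketch rather than a drop-in replacement. Instead of removing elements, it
marks the losers in a parallel flag array and skips them, so nothing in
@hsps is ever shifted. The sub name filter_hsps, the @dead array, the
sample HSPs and the threshold value are made up for illustration; the
field names (SCCS, P_START, P_END, E_VALUE) and the greedy "best of pair"
rule are taken from the posted code. It assumes SCCS is a string (hence
ne) and that the list is already sorted by P_START.

```perl
use strict;
use warnings;

# Hypothetical threshold and sample data, for illustration only.
my $THRESH = 10;

my @hsps = (
    { SCCS => 'a.1.1.1', P_START =>   1, P_END => 100, E_VALUE => 1e-20 },
    { SCCS => 'a.1.1.1', P_START =>  50, P_END => 150, E_VALUE => 1e-30 },
    { SCCS => 'b.2.1.1', P_START =>  60, P_END => 120, E_VALUE => 1e-5  },
    { SCCS => 'a.1.1.1', P_START => 145, P_END => 200, E_VALUE => 1e-10 },
);

# Return a reference to the surviving HSPs, in input order.
# The input array is left untouched.
sub filter_hsps {
    my ($hsps, $thresh) = @_;
    my @dead = (0) x @$hsps;    # 1 = discarded (stands in for splice)

    for my $i (0 .. $#{$hsps}) {
        next if $dead[$i];
        my $p = $hsps->[$i];

        for my $j ($i + 1 .. $#{$hsps}) {
            next if $dead[$j];
            my $q = $hsps->[$j];

            # Family overlap only.
            next if $p->{SCCS} ne $q->{SCCS};

            # Sorted by start: once the overlap is below the threshold,
            # no later HSP can overlap $p by more.
            last if $p->{P_END} - $q->{P_START} < $thresh;

            # Keep the better (lower) E-value of the pair.
            if ($p->{E_VALUE} > $q->{E_VALUE}) {
                $dead[$i] = 1;  # $p loses; $q gets its turn later
                last;
            }
            $dead[$j] = 1;      # $q loses; keep sliding
        }
    }
    return [ map { $hsps->[$_] } grep { !$dead[$_] } 0 .. $#{$hsps} ];
}

my $kept = filter_hsps(\@hsps, $THRESH);
print join(', ', map { "$_->{SCCS}:$_->{P_START}-$_->{P_END}" } @$kept), "\n";
```

With this sample data the first HSP loses to the second (same family,
50 residues of overlap, worse E-value) and the other three survive. The
pairwise scan is still quadratic in the worst case, but each discard is a
single flag write instead of an O(N) splice, and the survivors come back
in their original start order.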

>
> Joe
>
> On Wed, 2003-08-27 at 14:56, Dan Bolser wrote:
>> Hello, splice to see you etc.
>>
>> I am trying to write a *simple* "best HSP in family"
>> algorithm in perl.
>>
>> My raw materials are SCOP queries against target sequences.
>>
>> I get each set of hits for each protein in turn, sorted
>> by P_START (Hsp_query-from).
>>
>> I then go through the list and, for any pair of sequences
>> with more than $THRESH AA overlap (if they come from the same
>> SCOP family), remove the worse of the two.
>>
>> This list removal involves lots of splicing, which is O(N) with
>> list size.
>>
>> I figure I could avoid all that splice if I just use pointers
>> to array positions, but I can't work out how to do this...
>>
>> Maybe splicing is the least of my optimization problems....
>>
>> __SKIP__
>>
>> preamble
>>
>> @hsps = array of HSP hashes, for a particular protein
>> each HSP can be from several SCOP sequences.
>>
>> __RESUME__
>>
>>   my @result;                              # Final HSP's
>>
>>   TOP:while (@hsps){                       # NB: Ordered by Hsp_query-from
>>                                            # (for optimization).
>>
>>     my $p = 0;                             # Current HSP pointer.
>>
>>     MID:for (my $j=$p+1; $j<@hsps; $j++){ # Overlap slider.
>>
>>      # Family overlap only!
>>
>>       next MID if                         # SCCS is a string: use ne
>>         $hsps[$p]->{SCCS} ne $hsps[$j]->{SCCS};
>>
>>       # Optimization.
>>
>>       # Sorted by start, so no later HSP can overlap $p by more.
>>       # Leave the slider; $p is pushed onto @result after the loop.
>>
>>       last MID if
>>         $THRESH > $hsps[$p]->{P_END} - $hsps[$j]->{P_START};
>>
>>       # Pick best of pair (removing the other from the list).
>>
>>       if ( $hsps[$p]->{E_VALUE} > $hsps[$j]->{E_VALUE} ){
>>         splice (@hsps, $p, 1);
>>         $j--;
>>         $p = $j;
>>       }
>>       else {
>>         splice (@hsps, $j, 1);
>>         $j--;
>>       }
>>     }
>>     push @result, splice(@hsps, $p, 1);
>>   }
>>   print "OK\n\n";
>>
>> __END_ISH__
>>
>> Whaddya think?
>> Any better way?
>>
>> Cheers,
>>
>>
>>
>> On Wed, 27 Aug 2003, sekhar kavuru wrote:
>>
>> > Dear Joseph,
>> >
>> > I am a Perl Developer with BioInformatics Certification.
>> >
>> > Recently I developed a software package using BioPerl/ EnsEmbl to create
>> > a Perl/Html based database interface to access Genome data from EnsEMBL
>> > and SwissProt. The Browser I developed enables users to query ENSEMBL
>> > database based on either CloneId or Chromosome Number.
>> >
>> > If you need any assistance or help please feel free to write to me.
>> >
>> > Regards
>> >
>> > Sekhar
>> >
>> > biodevelopers-request at bioinformatics.org wrote:
>> >
>> > Today's Topics:
>> >
>> > 1. Re: [BiO BB] perl scripting assistance (Joseph Landman)
>> >
>> > --__--__--
>> >
>> > Message: 1
>> > From: Joseph Landman
>> > To: BiO BB
>> > Cc: biodevelopers
>> > Date: 26 Aug 2003 21:06:11 -0400
>> > Subject: [Biodevelopers] Re: [BiO BB] perl scripting assistance
>> > Reply-To: biodevelopers at bioinformatics.org
>> >
>> > Try the biodevelopers group on bioinformatics.org ...
>> >
>> > On Tue, 2003-08-26 at 13:06, Tristan J. Fiedler wrote:
>> > > Are any bulletin boards / discussion groups available for obtaining
>> > > tips in scripting with perl?
>> > >
>> > > Thank you.
>> >
> --
> Joseph Landman, Ph.D
> Scalable Informatics LLC
> email: landman at scalableinformatics.com
>   web: http://scalableinformatics.com
> phone: +1 734 612 4615





