[Biococoa-dev] more ramblings

Mon Nov 29 20:41:24 EST 2004

On Nov 29, 2004, at 10:59 AM, John Timmer wrote:

> Okay, I'm cutting out a ton of quotations, because I was beginning to
> lose track of the discussion (I blame my cold for lack of focus ;).

Eat some more turkey to cure your cold ;-)

> The first is the issue of how to handle untyped sequence files. Koen
> suggests that the method for each untyped file goes through a factory
> object that handles its lack of clarity, an idea which I like.

Actually, what I suggested is to have a factory that handles the 
creation of *every* sequence. You feed the factory with a string, 
array, etc, and a BCSequenceType and/or BCSymbolSet, and the factory 
returns the right BCSequence. If the type is not specified, then the 
guess code comes into action. The point I am trying to make is that we 
should always use BCSequence as a *return type* for the factory as well 
as within BCSequenceReader. Otherwise we need a factory class for each 
BCSequence subclass. Internally the factory creates the right subclass,
and even though the declared return type is BCSequence, the actual object
will be an instance of that subclass. That's the nice thing about
inheritance!

So maybe:

BCSequenceFactory	*myFactory = [[BCSequenceFactory alloc] init];

BCSequence	*newSequence = [myFactory createSequenceUsingString: @"AACCTTGG"
                                                 usingType: BCDNASequence];

-(BCSequence *) createSequenceUsingString: (NSString *) string
                               usingType: (BCSequenceType) type
{
	switch (type)
	{
		case BCDNASequence:
			// Declared return type is BCSequence; the object is a BCSequenceDNA.
			return [BCSequenceDNA DNASequenceWithString: string];

		// ..... and so on for the other sequence types
	}

	return nil;	// no matching case
}

Note that in the snippet I am actually using BCSequenceDNA ;-). If you
guys really want it, it's fine with me if we keep those subclasses around
for convenience. But I still think that we should put most code in
BCSequence, except maybe for the init methods. Because we are using a
sequence type or symbol set, we know that the sequence is using the right
type of symbols, so there is also no need for type checking, such as in
setSequenceArray and other methods.

> The question then becomes how to determine which type of sequence to
> return. The way I would imagine it is to have a flag to determine
> whether to ask for user input - this could put up a standard dialog
> box. If the flag is false, the factory method could create each type of
> possible sequence, then use the sequence counted set to look for
> undefined symbols. Compare the results, and take the one with the
> fewest undefined symbols. In case of a tie, default to
> DNA > RNA > protein.

Sounds good; this code can also go in the factory class. However, I
don't think the framework itself should put up a dialog box. That is the
sole responsibility of the developer who uses BioCocoa.
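A minimal sketch of the fewest-undefined-symbols guess, in plain C. The enum, the function names and the simplified alphabets (no ambiguity codes such as N, no gaps, no stop symbols) are assumptions for illustration, not BioCocoa's BCSequenceType or BCSymbolSet. Rather than building each candidate sequence and asking its counted set, it simply counts, per candidate type, the symbols that fall outside that type's alphabet.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BCSequenceType. The enum order encodes the
   tie-break priority DNA > RNA > protein. */
typedef enum { SEQ_DNA = 0, SEQ_RNA = 1, SEQ_PROTEIN = 2, SEQ_TYPE_COUNT = 3 } SeqType;

/* Simplified alphabets (assumption): no ambiguity codes, gaps or stops. */
static const char *const kAlphabets[SEQ_TYPE_COUNT] = {
    "ACGT",                  /* DNA */
    "ACGU",                  /* RNA */
    "ACDEFGHIKLMNPQRSTVWY"   /* the 20 standard amino acids */
};

/* Count characters of seq that are not in alphabet (case-insensitive). */
static size_t count_undefined(const char *seq, const char *alphabet)
{
    size_t undefined = 0;
    for (const char *p = seq; *p != '\0'; p++) {
        int c = toupper((unsigned char)*p);
        if (strchr(alphabet, c) == NULL)
            undefined++;
    }
    return undefined;
}

/* Try every type and keep the one with the fewest undefined symbols.
   Scanning in enum order with a strict '<' keeps the earlier type on
   ties, i.e. DNA before RNA before protein. */
SeqType guess_sequence_type(const char *seq)
{
    SeqType best = SEQ_DNA;
    size_t bestCount = count_undefined(seq, kAlphabets[SEQ_DNA]);
    for (int t = SEQ_RNA; t < SEQ_TYPE_COUNT; t++) {
        size_t n = count_undefined(seq, kAlphabets[t]);
        if (n < bestCount) {
            best = (SeqType)t;
            bestCount = n;
        }
    }
    return best;
}
```

Note that "AACCTTGG" has no undefined symbols as either DNA or protein (A, C, G and T are also amino-acid letters), so it is the DNA > RNA > protein tie-break that makes it DNA.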

>
> Ramble #2 is about the sequence wrapper/bundle, and how to implement
> that to handle the multiple sequences in an alignment file. I had
> envisioned the wrapper as holding features, and a bundle as linking
> related sequences. If this is the way we go, we'd have to implement
> both in order to handle this circumstance.
>
> A short summary of how I expected a bundle to work:
> Each wrapper would have a unique bundle ID, and a reference to its
> bundle. Features within the wrapper could include a bundle ID.
> Basically, if code wanted to look at a feature, it would check to make
> sure that the bundle reference was not nil - if it wasn't, it would
> take the feature's bundle ID and ask the bundle for the sequence
> corresponding to that ID. Given that a feature should have an NSRange,
> this would allow the two sequences to be aligned.

Could you show a more concrete interface? It's still kinda vague to me :(
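One possible reading of the bundle description, as a plain C sketch. All struct and function names are hypothetical, and Range stands in for NSRange. A feature is resolved by checking the owning wrapper's bundle reference, asking the bundle for the member with the feature's bundle ID, and returning the region of that related sequence which the feature's range covers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C stand-ins for the proposed wrapper/bundle classes. */
typedef struct { size_t location; size_t length; } Range;   /* like NSRange */

typedef struct Bundle Bundle;

typedef struct {
    int bundleID;            /* unique within the bundle */
    const char *sequence;
    const Bundle *bundle;    /* NULL when the wrapper is not in a bundle */
} Wrapper;

struct Bundle {
    const Wrapper *members;
    size_t count;
};

typedef struct {
    int bundleID;   /* bundle ID of the related sequence */
    Range range;    /* region of that sequence the feature covers */
} Feature;

/* Ask the bundle for the member with the given ID; NULL if absent. */
static const Wrapper *bundle_member(const Bundle *bundle, int bundleID)
{
    for (size_t i = 0; i < bundle->count; i++)
        if (bundle->members[i].bundleID == bundleID)
            return &bundle->members[i];
    return NULL;
}

/* Resolve a feature of `owner` to the aligned region of the related
   sequence. Returns 0 and fills *out / *outLength on success, or -1 if
   the owner has no bundle, the ID is unknown, or the range is too long. */
int feature_aligned_region(const Wrapper *owner, const Feature *feature,
                           const char **out, size_t *outLength)
{
    if (owner->bundle == NULL)
        return -1;

    const Wrapper *partner = bundle_member(owner->bundle, feature->bundleID);
    if (partner == NULL)
        return -1;

    size_t len = strlen(partner->sequence);
    if (feature->range.location > len ||
        feature->range.length > len - feature->range.location)
        return -1;

    *out = partner->sequence + feature->range.location;
    *outLength = feature->range.length;
    return 0;
}
```

In this reading the bundle only maps IDs to sequences; all alignment information lives in the feature's range, so adding a sequence to an alignment never touches the other wrappers.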

>
> The last issue seems to be around the quote from Koen:
>> I agree, but let's then focus on having these one-liners in BCSequence
>> only, not in the subclasses.
> I remember this quote bothering me when I first read it, because there
> are some one-liners that clearly belong in a specific sequence subclass
> (e.g., finding the longest open reading frame should not be available
> to a protein sequence, and finding the hydrophobicity should not be
> available to nucleotides or codons). I seem to remember that reading
> further alleviated my concerns on this, but I can't remember how. Since
> Alex and I share this concern, could you clarify what you meant here,
> Koen?

If we add code to the wrapper that checks the sequence type, then I
don't see any problem. If the sequence type is by accident the wrong one
(which I really don't think is going to happen), the wrapper should
return nil or an error, or post an NSNotification. Hope that's clearer.

cheers,

- Koen.