[Biococoa-dev] BCSequence class cluster

Fri Jan 7 17:20:21 EST 2005

>Hi Charles,
>
>First of all, happy New Year to you too!
>
>Thanks a lot for all the work, both the coding and the research you did about the future directions of BioCocoa. I'm a big fan of the Class cluster approach as this keeps the interface very simple. The biggest problem with this approach - as I see it now - is that some BCTools will only work with/on some sequence types. In that respect, I'd prefer your proposal to provide an additional set of headers defining some public classes as placeholders over the protocol approach. The placeholder approach will make/keep code much more readable indeed.

OK, let's try to discuss the strong/weak typing issue in more detail. I will repeat a lot of what we all discussed before... but hopefully put it in a broader context. I don't know yet where this whole discussion will get me (and us), as so many ideas/problems/designs are coming to mind.

First, if we go back again to the BCSequence, we have 2 options, regardless of class cluster:

* we only provide the user with one public class, either a class cluster or a general class that handles the different cases; how to handle the irrelevant cases is debatable: bluntly (returning nil), silently (returning self, empty objects, or even [NSNull null]), with a BioCocoa-specific error reporting system (which could be useful anyway at some point in the future), or with runtime errors (this would not be very nice!)

* we provide public headers for all the different types of sequence, and we don't let the user get an untyped BCSequence object, like the one returned by [BCSequence sequenceWithString:aString]; instead, we provide the user with the classes BCSequenceDNA, BCSequenceRNA, BCSequenceCodon and BCSequenceProtein, and the user chooses which one to use; this gives more work to the user, but also more control and thus potentially more flexibility (and BCSequenceNucleotide could fit in the picture too)
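
To make the first option concrete, here is a minimal sketch of a single public class that hands back a private typed subclass. Only BCSequence, BCSequenceDNA, BCSequenceProtein and +sequenceWithString: come from the discussion; -initWithSequenceString:, -typeName and the rule used to guess the type are placeholders I made up for illustration, and the sketch assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Public face of the cluster: the only class the user would see.
@interface BCSequence : NSObject
+ (BCSequence *)sequenceWithString:(NSString *)aString;
- (NSString *)sequenceString;
- (NSString *)typeName;   // hypothetical, only here to show which subclass we got
@end

// Private part (would live in non-public headers).
@interface BCSequence ()
- (instancetype)initWithSequenceString:(NSString *)aString;   // hypothetical
@end
@interface BCSequenceDNA : BCSequence
@end
@interface BCSequenceProtein : BCSequence
@end

@implementation BCSequence {
    NSString *_string;
}
+ (BCSequence *)sequenceWithString:(NSString *)aString {
    // Placeholder guess (an assumption): only A, C, G, T means DNA,
    // anything else is treated as protein.
    NSCharacterSet *nonDNA =
        [[NSCharacterSet characterSetWithCharactersInString:@"ACGTacgt"] invertedSet];
    BOOL looksLikeDNA = [aString length] > 0 &&
        [aString rangeOfCharacterFromSet:nonDNA].location == NSNotFound;
    if (looksLikeDNA) {
        return [[BCSequenceDNA alloc] initWithSequenceString:aString];
    }
    return [[BCSequenceProtein alloc] initWithSequenceString:aString];
}
- (instancetype)initWithSequenceString:(NSString *)aString {
    if ((self = [super init])) {
        _string = [aString copy];
    }
    return self;
}
- (NSString *)sequenceString { return _string; }
- (NSString *)typeName { return @"unknown"; }
@end

@implementation BCSequenceDNA
- (NSString *)typeName { return @"DNA"; }
@end

@implementation BCSequenceProtein
- (NSString *)typeName { return @"protein"; }
@end
```

The user only ever writes [BCSequence sequenceWithString:aString] and never names the subclasses, which is exactly why the compiler cannot warn when a DNA-only message is sent to a protein.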

The choice depends on 2 things:

* what the user wants; this is difficult; we are all potential users, I suppose; maybe one hint is that the user of BioCocoa would also be a user of Cocoa, and would be used to simple interfaces where she does not have to handle the details and can get results in 2 lines of code

* what the developers can do and how much work it requires; my feeling is that in both designs, there will have to be some compromises, and not everything will be perfect; but it seems both options are feasible, the class cluster potentially requiring more careful planning (but once established, not more complicated than other approaches, or maybe even simpler); the amount of code would be roughly equivalent in both cases; the simple interface would make code maintenance easier (at least in terms of testing) and would improve backward compatibility; on the other hand, the more complex public interface would make our lives easier when we want to provide more radical extensions

Again, the reason why I came up with the idea of public headers for placeholder classes for typed sequences was to offer the user BOTH OPTIONS! (but maybe we should not). Regardless of the design, some kind of trick is necessary, because you cannot let BCSequence respond to all the messages, then prevent some of the subclasses from responding to some of them, and still get compiler warnings. If you already have a class cluster design in place, one possibility is to add public headers for placeholder classes; another is to add formal protocols (Alex, Peter and I find that a bit hard on the user). But actually, if you think more about it, maybe it could also be the other way around. You would have an abstract superclass (BCSequenceAbstract) and some public sequence-specific subclasses (BCSequenceDNA,...), and then provide an additional generic subclass BCSequence that would respond to all the messages in its header. One possible implementation for that subclass is to have just one ivar, an instance of one of the typed subclasses, and only implement -forwardInvocation: to handle the messages. So, depending on which initial option you favor, adding the other option is always possible, but the final result is different...
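
Here is a rough sketch of that forwarding idea, just to check that it holds together. It assumes ARC, and it simplifies in two ways: the generic BCSequence derives directly from NSObject rather than from BCSequenceAbstract (forwarding only kicks in for messages the class does not already implement), and -initWithTypedSequence: is a hypothetical initializer. Note that in practice -forwardInvocation: needs -methodSignatureForSelector: alongside it, so two methods have to be overridden, not one.

```objc
#import <Foundation/Foundation.h>

// A typed class with a DNA-only operation (reduced to the bare minimum).
@interface BCSequenceDNA : NSObject
- (instancetype)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
- (BCSequenceDNA *)complement;
@end

@implementation BCSequenceDNA {
    NSString *_string;
}
- (instancetype)initWithString:(NSString *)aString {
    if ((self = [super init])) {
        _string = [aString copy];
    }
    return self;
}
- (NSString *)sequenceString { return _string; }
- (BCSequenceDNA *)complement {
    NSMutableString *result = [NSMutableString stringWithCapacity:[_string length]];
    for (NSUInteger i = 0; i < [_string length]; i++) {
        unichar c = [_string characterAtIndex:i];
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;   // leave unknown symbols untouched
        }
        [result appendFormat:@"%C", c];
    }
    return [[BCSequenceDNA alloc] initWithString:result];
}
@end

// The generic class: one ivar, no sequence logic of its own.
@interface BCSequence : NSObject
- (instancetype)initWithTypedSequence:(id)typedSequence;   // hypothetical
@end

// Declared in a category with no implementation, so the compiler accepts
// these messages without expecting BCSequence to implement them.
@interface BCSequence (ForwardedMethods)
- (NSString *)sequenceString;
- (id)complement;
@end

@implementation BCSequence {
    id _typedSequence;
}
- (instancetype)initWithTypedSequence:(id)typedSequence {
    if ((self = [super init])) {
        _typedSequence = typedSequence;
    }
    return self;
}
- (NSMethodSignature *)methodSignatureForSelector:(SEL)aSelector {
    NSMethodSignature *signature = [super methodSignatureForSelector:aSelector];
    return signature ? signature : [_typedSequence methodSignatureForSelector:aSelector];
}
- (void)forwardInvocation:(NSInvocation *)anInvocation {
    if ([_typedSequence respondsToSelector:[anInvocation selector]]) {
        [anInvocation invokeWithTarget:_typedSequence];
    } else {
        [super forwardInvocation:anInvocation];   // ends in doesNotRecognizeSelector:
    }
}
@end
```

In this sketch -complement returns the typed object rather than a new generic wrapper; a real implementation would have to decide whether to re-wrap the results. A message the wrapped object does not understand still fails at runtime, so this design gives up the compiler warnings, as discussed above.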

OK, so we have two choices for the interface. There might be ways to provide both choices to the user, but we probably would not want to do that as a first version and we have to choose anyway.

Now, to go beyond the BCSequence implementation, you raise the issue of the BCTools implementation. And it strikes me now that they are very much interrelated and that there are other design issues with BCTools and that they closely relate to BCSequence. We have now another choice to make:
* to perform an operation on a BCSequence, the user has to use one of the BCTools; the simple BCTools might provide some convenience class methods like
	+ (BCSequence *)complementForSequence:(BCSequence *)aSequence
but more complex tools will be used through alloc/init, then setting some parameters with accessor methods, and then calling a 'result' method on the tool; the interface to the BCTools is public
* to perform an operation on a BCSequence, the user has to use a BCSequence method, such as 'complement', or 'cutWithEnzyme:' or 'weigh' or ...; and the user does not even have to know that BCTools exist!

The latter gives a very simple interface, all within BCSequence. However, while the concept of a sequence is abstract enough to be put in one class, I don't think the concept of tools or operations on a sequence is simple enough for the whole interface to fit within BCSequence:
- the tools have very different levels of complexity, from 'reverse' to 'align'
- some tools might use objects other than BCSequence
- some operations might require setting many parameters, some optional, some required (like alignments); that could be hard to fit in just one method of BCSequence (though, obviously, an NSDictionary would help to pass a bunch of arguments at once)
- batch processing on an array of sequences will not be possible in such a design (and we would lose potential optimizations); with BCTool, you can just set all the parameters and then run it on several sequences at once
- another point is that the BCSequence interface might look bloated, though that is not necessarily an issue and there are precedents: just look at NSWindow!
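
As an illustration of the points above, here is a minimal sketch of a public tool that is configured once and then run on several sequences, with a convenience method in BCSequence that simply hands the job to the tool. BCToolMotifSearch, its parameters and -countOfMotif: are all hypothetical names chosen for the example (a simple count of non-overlapping matches), and the sketch assumes ARC.

```objc
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject
@property (nonatomic, copy, readonly) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)aString;
// Convenience method: the user never has to see the tool.
- (NSUInteger)countOfMotif:(NSString *)motif;
@end

// Hypothetical tool: parameters are set through accessors, then the tool
// is run on one sequence or on a whole array of sequences.
@interface BCToolMotifSearch : NSObject
@property (nonatomic, copy) NSString *motif;
@property (nonatomic) BOOL caseInsensitive;
- (NSUInteger)countInSequence:(BCSequence *)aSequence;
- (NSArray *)countsInSequences:(NSArray *)sequences;   // array of NSNumber
@end

@implementation BCToolMotifSearch
- (NSUInteger)countInSequence:(BCSequence *)aSequence {
    NSString *string = aSequence.sequenceString;
    if ([self.motif length] == 0) return 0;
    NSStringCompareOptions options = self.caseInsensitive ? NSCaseInsensitiveSearch : 0;
    NSUInteger count = 0;
    NSRange searchRange = NSMakeRange(0, [string length]);
    // Count non-overlapping occurrences, moving past each match.
    while (searchRange.length > 0) {
        NSRange found = [string rangeOfString:self.motif options:options range:searchRange];
        if (found.location == NSNotFound) break;
        count++;
        NSUInteger next = NSMaxRange(found);
        searchRange = NSMakeRange(next, [string length] - next);
    }
    return count;
}
- (NSArray *)countsInSequences:(NSArray *)sequences {
    // Batch processing: same parameters, many sequences.
    NSMutableArray *counts = [NSMutableArray arrayWithCapacity:[sequences count]];
    for (BCSequence *sequence in sequences) {
        [counts addObject:@([self countInSequence:sequence])];
    }
    return counts;
}
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)aString {
    if ((self = [super init])) {
        _sequenceString = [aString copy];
    }
    return self;
}
- (NSUInteger)countOfMotif:(NSString *)motif {
    // The real work is done by the tool.
    BCToolMotifSearch *tool = [[BCToolMotifSearch alloc] init];
    tool.motif = motif;
    return [tool countInSequence:self];
}
@end
```

The simple case stays a one-liner on BCSequence, while the optional parameters and the batch entry point only exist on the tool.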

So my impression is that the BCTools interface will have to be public to a certain extent (and for simple, obvious tools like translation, some convenience methods will be included in the BCSequence interface, the job being done really by a BCTool). Now, the user will have to provide the BCTool with a BCSequence. Depending on the design chosen for BCSequence (one public class or several typed subclasses), the interface for the BCTools will be quite different, and the implementation as well:

* with just one BCSequence, there is only the need for one 'init' method, namely 'initWithSequence:'; the code may have to decide what to do depending on the 'sequenceType' (the good thing is BCSequence does not have to decide, so if a convenience method is provided in BCSequence, it can be at the level of the superclass). So there might be some 'if' and 'case' statements involved here. In certain cases, it might get very difficult to stick to a simple general tool able to handle all sequence types with just one class. One such example is alignment. A BCToolAlignement would be very different for a protein and for a DNA. We may then have to provide two tools in two separate classes and have more stringent rules (the user will have to be more careful then), but we don't have to give up the simplicity of BCSequence, I think. Anyway, alignments are really a very elaborate thing that may fall outside the BCTool paradigm.

* with several BCSequence public subclasses, we would have to enforce typing for the 'init' methods of the tools, with some 'initWithDNASequence:' and equivalents. Tools able to handle several sequence types could need several init methods, one for each sequence type. The output is also a problem: its type will depend on what was entered. For instance, a BCToolComplement could be fed a BCSequenceDNA or a BCSequenceRNA with the 'initWithDNASequence:' or 'initWithRNASequence:' method, but then we would need a 'complementDNASequence' and a 'complementRNASequence' method to retrieve the result and keep the strong typing in the user code (sorry, the BCToolComplement is a dull example, as it can be and is handled by BCSequence; also, a simple possibility here is to use a BCSequenceNucleotide, as suggested by John; there might not be so many cases where the problem would arise, I am not sure).

* what you are saying, Peter (finally I comment on your comment!), is that to solve these dilemmas, we implement a class cluster together with some typed classes that will only be used in certain tools.
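
To compare the two tool interfaces side by side, here is a minimal sketch using the (dull) complement example. It assumes ARC; the BCSequenceType enum, -initWithString:type:, the helper function and the class name BCToolTypedComplement are hypothetical, and returning nil for a protein is just one of the options for irrelevant cases listed earlier.

```objc
#import <Foundation/Foundation.h>

typedef NS_ENUM(NSInteger, BCSequenceType) {   // hypothetical enum
    BCSequenceTypeDNA, BCSequenceTypeRNA, BCSequenceTypeProtein
};

@interface BCSequence : NSObject
@property (nonatomic, readonly) BCSequenceType sequenceType;
@property (nonatomic, copy, readonly) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)aString type:(BCSequenceType)aType;
@end
@interface BCSequenceDNA : BCSequence
- (instancetype)initWithString:(NSString *)aString;
@end
@interface BCSequenceRNA : BCSequence
- (instancetype)initWithString:(NSString *)aString;
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)aString type:(BCSequenceType)aType {
    if ((self = [super init])) {
        _sequenceString = [aString copy];
        _sequenceType = aType;
    }
    return self;
}
@end
@implementation BCSequenceDNA
- (instancetype)initWithString:(NSString *)aString {
    return [super initWithString:aString type:BCSequenceTypeDNA];
}
@end
@implementation BCSequenceRNA
- (instancetype)initWithString:(NSString *)aString {
    return [super initWithString:aString type:BCSequenceTypeRNA];
}
@end

// Shared helper: partnerOfA is 'T' for DNA and 'U' for RNA.
static NSString *BCComplementString(NSString *string, unichar partnerOfA) {
    NSMutableString *result = [NSMutableString stringWithCapacity:[string length]];
    for (NSUInteger i = 0; i < [string length]; i++) {
        unichar c = [string characterAtIndex:i];
        if (c == 'A') c = partnerOfA;
        else if (c == partnerOfA) c = 'A';
        else if (c == 'C') c = 'G';
        else if (c == 'G') c = 'C';
        [result appendFormat:@"%C", c];
    }
    return result;
}

// Design 1: one init method, the tool branches on sequenceType.
@interface BCToolComplement : NSObject
- (instancetype)initWithSequence:(BCSequence *)aSequence;
- (BCSequence *)result;   // nil when the operation makes no sense
@end
@implementation BCToolComplement {
    BCSequence *_sequence;
}
- (instancetype)initWithSequence:(BCSequence *)aSequence {
    if ((self = [super init])) { _sequence = aSequence; }
    return self;
}
- (BCSequence *)result {
    NSString *string = _sequence.sequenceString;
    switch (_sequence.sequenceType) {
        case BCSequenceTypeDNA:
            return [[BCSequenceDNA alloc] initWithString:BCComplementString(string, 'T')];
        case BCSequenceTypeRNA:
            return [[BCSequenceRNA alloc] initWithString:BCComplementString(string, 'U')];
        case BCSequenceTypeProtein:
            return nil;
    }
    return nil;
}
@end

// Design 2: one init method and one result method per sequence type.
@interface BCToolTypedComplement : NSObject
- (instancetype)initWithDNASequence:(BCSequenceDNA *)aSequence;
- (instancetype)initWithRNASequence:(BCSequenceRNA *)aSequence;
- (BCSequenceDNA *)complementDNASequence;   // nil unless created with DNA
- (BCSequenceRNA *)complementRNASequence;   // nil unless created with RNA
@end
@implementation BCToolTypedComplement {
    BCSequenceDNA *_dna;
    BCSequenceRNA *_rna;
}
- (instancetype)initWithDNASequence:(BCSequenceDNA *)aSequence {
    if ((self = [super init])) { _dna = aSequence; }
    return self;
}
- (instancetype)initWithRNASequence:(BCSequenceRNA *)aSequence {
    if ((self = [super init])) { _rna = aSequence; }
    return self;
}
- (BCSequenceDNA *)complementDNASequence {
    if (_dna == nil) return nil;
    return [[BCSequenceDNA alloc] initWithString:BCComplementString(_dna.sequenceString, 'T')];
}
- (BCSequenceRNA *)complementRNASequence {
    if (_rna == nil) return nil;
    return [[BCSequenceRNA alloc] initWithString:BCComplementString(_rna.sequenceString, 'U')];
}
@end
```

With the first tool, passing a protein compiles fine and only shows up as a nil result at runtime; with the second, the compiler can warn about a mismatched argument, at the price of a wider interface that grows with every new sequence type.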

I did not mean to go that far in the discussion when I started this email, but these issues will have to be debated, will have some impact on the design of BCSequence, and some design decisions will have to be made for the BCTools as well.

>It seems that the mutability problem can be solved by either the subclasses or the mutable variant. While the mutable variant will reduce the number of classes, it will make the code in these classes less readable (depending on the number of optimizations we decide to implement). I think something could be said for either solution, I don't really have an opinion about this one.

OK, it seems the mutability/immutability issue could and should be put on the back burner for now. The code may still have to provide an interface now to allow future implementations later, as I explained in my recent email in response to Alex.

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021