[Biococoa-dev] Misc

Sun Jan 9 01:29:10 EST 2005

Sorry, I just wanted to discuss a few points further here. They are not critical, and a little random, but still worth a few words...

At 7:48 PM -0500 1/7/05, John Timmer wrote:
>Yeah, this analogy doesn't really work in terms of taking into account why
>Alex and I worry.  You can add a .tiff to a non-path, and the result is
>still a string.  You might get unexpected behavior when you used it, but
>you'd have to do something convoluted to get your app to crash as a result.
Actually, like you say, the result is still a string.
In a system that only uses BCSequence objects, as long as you get BCSequence objects back, you will never crash (except if you intentionally add runtime errors... well, and if we make errors, but it is our task to squash these bugs). If you get NSArray, NSNumber, etc., you should be fine too. What gets more dangerous is receiving nil back. Still OK, but borderline (like -[NSDictionary objectForKey:]).
I am sorry, the analogy with NSString is really stretched; I could not find a better one, and I still agree the risk is higher.

Note that, as you will see in my other email of the day, I do see your point, and think we could provide some strong typing for those who want it. But I still think that, due to the nature of Cocoa, crashes would be rare as long as you return friendly objects. If such an object is meaningless, that lack of meaning has a chance of reaching the real user of the final app, and hopefully it will make sense to him if he did something stupid.

At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote:
>Yes, I agree, if we can get rid of BCSymbollist and only have one class (BCSequence) that would have my preference too. Just my stupidity, adding many methods to a class (which we need to keep annotations in sync and add/remove them etc) doesn't make it more costly to use  memory/speed wise?
At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote:
>Exactly my point, I'm not sure though, the first (and foremost) question is whether putting everything in one class doesn't matter memory/speed wise (I'm not talking about the small unused ivars). Second, how much will the sequence class increase code-wise to a point where things become unpractical. For example, I have experienced opensource projects where classes have become so extended that you can't say extract one class for use in a custom project without spending hours unwinding everything unnecessary, where with a class you basically have to incorporate the whole project to make it work. Maybe not so much of importance, but usually the more simple the code and easier to overview, the more useful and versatile it turns out to be.
I am not sure about performance, but it should not be an issue.
About subclasses versus one large class, my wild guess is that more subclasses will make the runtime walk the class hierarchy more, while more methods will make the runtime search the method list more (this is pure science, here). With the caching and all, well, uh...
About code bloat, yes, this could be an issue on the developer side.

>At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote:
>>If the purpose is to separate the code for annotations, an alternative is to use a category, and not a subclass. And I think I like that better, actually.
>As said before, I'm not really in favor of categories as they don't really belong in a framework, BUT as they are private things maybe different.
What is the problem with categories?
I think I may understand what you mean: categories on an Apple class like NSDictionary are not welcome? No, they are not. But indeed, we are talking about private classes, so we can do what we want (and Apple does it a lot; look at the headers of NSString or NSArray, for example). You just have to put all the @interface declarations in the same header file.

At 7:58 AM -0500 1/8/05, Koen van der Drift wrote:
>Now I think of it, is there a good reason why we should have immutable sequences? The only I can think of right now is that we have to be careful when annotations are present. If we can solve that, then IMO there is no need for both immutable and mutable variants. Charles, you know perl, any idea how this is solved in BioPerl?
I know Perl, but I do not know BioPerl very well. I actually tried to use it once to read sequences, but it was so slow... It was all FASTA format, so a few lines of simple Perl did much better. My interpretation is that they had added many layers of abstraction for complicated sequence formats, which takes a big performance hit in Perl. Note that in ObjC, we would not see such a hit. Anyway, I will have another look at BioPerl...
Like Alex and you, I actually don't think sequences should be made immutable just because of annotations. This is a lame excuse (I hope nobody from BioJava ever reads that!!).

Another reason for immutable sequences is copying: an immutable object cannot change, so a copy can share the original's storage instead of duplicating it. Taking a subarray may be optimized in the same way, simply because NSArray is (or will be) optimized for it. Maybe Apple is smart enough to create a subarray by pointing at a piece of the already existing parent array, so no new memory is used and no copy is done. More generally, any optimization put in NSArray will benefit BCSequence (if we use the native NSArray methods as much as possible). And we don't get those optimizations with NSMutableArray.
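
To make the "point at a piece of the parent array" idea concrete, here is a minimal sketch. It is written in Python rather than ObjC only to keep it short and self-contained, and it is not a claim about how NSArray is actually implemented. The class name ImmutableSequence and the methods subsequence and copy are hypothetical, invented for this illustration.

```python
class ImmutableSequence:
    """Read-only sequence whose subsequences share the parent's storage.

    Hypothetical illustration only: not BCSequence, not NSArray.
    """

    def __init__(self, symbols, start=0, stop=None):
        # The backing store is a tuple, which can never be modified.
        self._storage = symbols if isinstance(symbols, tuple) else tuple(symbols)
        self._start = start
        self._stop = len(self._storage) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._storage[self._start + index]

    def subsequence(self, location, length):
        """Return a view on [location, location + length) without copying."""
        if location < 0 or length < 0 or location + length > len(self):
            raise ValueError("range out of bounds")
        begin = self._start + location
        # Same tuple object, different bounds: no symbols are duplicated.
        return ImmutableSequence(self._storage, begin, begin + length)

    def copy(self):
        # Nothing can ever change, so the object can stand in for its own copy.
        return self

    def __str__(self):
        return "".join(self._storage[self._start:self._stop])


parent = ImmutableSequence("ATGCCGTA")
child = parent.subsequence(2, 4)
print(child)                              # GCCG
print(child._storage is parent._storage)  # True
```

This sharing is only safe because nobody can modify the parent behind the child's back; a mutable sequence would have to copy the symbols instead.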

At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote:
>But I don't see why a method for which we provide a convenience method in BCSequence should be hidden in the BCTool, I don't mind that the BCTool methods we use internally in the BCSequence convenience methods are public as well, if a user wants to replicate the convenience method by hand for some reason, he's free to do so...
I did not mean to hide the simple BCTool altogether. They can still be public, yes. Sorry for the confusion!

At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote:
>Hey, we can have the same approach we're now taking for bcsequence (are we? ;-) for such tools as well can't we? Have a public tool with private subclasses depending on the fed sequence subclass...
I did not dare to bring that up... Let's just keep it to ourselves for now.
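
For the record, the pattern Alex describes (one public tool, with a private subclass picked according to the sequence subclass) could look roughly like the sketch below. It is in Python only for brevity, and every name in it (Sequence, DNASequence, ProteinSequence, LengthTool and its private subclasses) is hypothetical; none of this is existing BioCocoa code.

```python
class Sequence:
    def __init__(self, symbols):
        self.symbols = symbols


class DNASequence(Sequence):
    pass


class ProteinSequence(Sequence):
    pass


class LengthTool:
    """Public tool; callers only ever name this class."""

    def __new__(cls, sequence):
        # Pick the private subclass that matches the sequence type.
        target = _TOOL_FOR_SEQUENCE[type(sequence)] if cls is LengthTool else cls
        return object.__new__(target)

    def __init__(self, sequence):
        self.sequence = sequence

    def run(self):
        raise NotImplementedError


class _DNALengthTool(LengthTool):
    def run(self):
        return f"{len(self.sequence.symbols)} bp"


class _ProteinLengthTool(LengthTool):
    def run(self):
        return f"{len(self.sequence.symbols)} aa"


# Private lookup table, consulted each time LengthTool(...) is called.
_TOOL_FOR_SEQUENCE = {
    DNASequence: _DNALengthTool,
    ProteinSequence: _ProteinLengthTool,
}

print(LengthTool(DNASequence("ATGC")).run())     # 4 bp
print(LengthTool(ProteinSequence("MKV")).run())  # 3 aa
```

The caller never sees the private subclasses, so they can be added, merged or removed later without changing the public interface.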

At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote:
>Hmm, that might not be the case actually, I've seen quite some alignment code recently and almost never people use different methods for aligning DNA and protein, there's really no big difference (take ClustalW for instance), only the scoring matrix might be a bit more complicated. I still think alignments are very nice in fact to be offered as a tool.

OK, good to know. Makes our future brighter...

--------

Sorry guys, your reading is not done. I am planning on sending another email with some further ideas following John, Peter and Alex's concerns about not having strongly typed classes. I hope what you will read will please you...

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021