From charles.parnot at stanford.edu Wed Jan 5 02:56:13 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 4 Jan 2005 23:56:13 -0800
Subject: [Biococoa-dev] BCSequence class cluster
Message-ID:

It seems the class cluster possibility has raised some interest. So I took some time to think it through and write some code. I got carried away and wrote a lot of it, and I also wrote this long email, but by now you are used to those long emails :-)

Note that I am just proposing an implementation of a class cluster, and some solutions to potential pitfalls, but I am not saying that you should absolutely go with the class cluster design. I am a little biased in favor of it, but you should really decide if (1) you want to discuss it further and (2) discuss it further! Note that I mostly say 'you' when I talk about the developers, but maybe at some point, I should really start saying 'we' ;-)

Anyway, for every sentence you read below, mentally add at the beginning "I may very well be wrong or missing something but it seems to me that maybe...".

Like I said before, several of the issues raised here apply to the existing code and you will have to deal with them at some point. The main point boils down to the question of using a weakly typed object BCSequence vs using strongly typed objects belonging to one of the subclasses BCSequenceDNA/RNA/etc... Some of the code is a bit schizophrenic right now and tries to deal with both cases... The class cluster would favor the weakly typed route, and would make the design more consistent and simpler.

To follow the discussion, you can download a zipped Xcode project with some real code here:
http://cmgm.stanford.edu/~cparnot/temp/BCSequenceClassCluster.zip
Don't try to compile, it probably won't succeed. It is just easier to navigate the code in this familiar format.

OK, so what would a class cluster look like?


1. The user point of view
----------------------

For the user, there is only one class, called BCSequence.
Instances are immutable and can be obtained with a number of factory methods, or using alloc followed by init methods. These are defined in the only header file accessible to the user, BCSequence.h (see attached project). From the user point of view, the usage is very simple: just create a sequence with one of the numerous factory or init methods, including reading from files. The instance you get back is immutable, but you can create new instances from it by removing/adding pieces, or transforming it to another type. You can always check the type and length, get the sequence back into a string or array of symbols. You can feed tools with that BCSequence instance and get the results, potentially getting back other instances of BCSequence. There are 2 things the user could complain about: a- Some of the methods are only relevant for certain sequence types b- Sequence objects are immutable About complaint (a) In the header file BCSequence.h of the attached project, there are 2 methods that are only relevant to a subset of the BCSequence type: -complement and -reverseComplement. This is not a really big concern at this point, because this is just 2 methods and it is quite easy to return something for all cases (for a protein, probably just return itself). But more methods in BCSequence or in the BCTools could give the same issues. For instance, BCToolDigest. That would only have sense on a DNA sequence when using restriction enzymes. The class BCSequence would always return something, empty sequences in the worst case, leaving the troubles to the runtime. This is the only appropriate way to handle it with the class cluster design, maybe together with some error codes/handling mechanism. But the user may want to be more specific about the BCSequence type and get some compiler warnings when appropriate, instead of leaving it to the runtime. The user might be ready to give up the simplicity of a unique class and use more specific types. 
This is the issue of weak vs strong typing, which relates to the issue of compiler vs runtime errors/warnings. One possible answer is to say to the user: this is the way it is, just accept it!! And I believe as a first version, it is really OK. But there are also some ways to give the user the possibility to choose between strong and weak typing and keep the class cluster design, that I will explain later, below. About complaint (b) I thought of enforcing immutability as a starting point, as this is easier on the developer side to deal with immutable objects. Giving the option of immutability to the user is anyway a good thing, as it allows a number of optimizations, that could really pay off in a real application with lots of copying, ref passing,... Of course, it is nice to also have mutable objects. I will address that on the developer point of view (see below). Note that ultimately, one thing would probably always be immutable: the sequence type. 2. Implementing the class cluster ------------------------------ The class cluster that I implement in the attached project looks very much like what you have already done. There is a superclass BCSequence, and then subclasses, BCSequenceDNA, BCSequenceRNA,...etc... plus a new special subclass BCSequenceFactory. Now the purpose of a class cluster is that the user just does everything using the public interface for BCSequence, and as far as the user is concerned, every object is an instance of BCSequence. But inside the hood, you actually return instances of one of the subclasses so that some operations can be optimized for the particular type of sequence you are dealing with. The problem for the developer of a class cluster is that you know which subclass to use only once you call one of the init methods, but you still have to do the 'alloc' before the init. There is no way BCSequence will know what subclass it should use at the time 'alloc' is called. 
So the trick is to alloc a temporary instance of a particular subclass, a 'placeholder' class. Look at the implementation of 'alloc' in BCSequence.m. What this method returns is actually an instance of BCSequenceFactory when called on the superclass (when called on one of the subclasses, though, it just passes the message up to NSObject). The bottom line is: you never create an instance of BCSequence, but an instance of BCSequenceFactory (you still alloc instances of BCSequence subclasses, of course). In fact, that BCSequenceFactory instance could be a singleton and never deallocated if we changed the code a little bit.

Then one of the init methods is called on that new BCSequenceFactory instance. This method actually allocs and inits a new object, an instance of the appropriate subclass. It then releases self and returns a pointer to the new object created. Because the user should always use the value returned by init to set her pointers, she will get the right object in the end.

To summarize, when the user runs the following command:

    BCSequence *mySeq = [[BCSequence alloc] initWithDNAString:aString];

the following happens:
* [BCSequence alloc] returns an instance of BCSequenceFactory
* the message initWithDNAString:aString is sent to the BCSequenceFactory instance
* in the method, a second object is created by calling finalObject=[[BCSequenceDNA alloc] initWithString:aString]
* then the method calls [self release] to destroy the original BCSequenceFactory instance
* then the method returns the finalObject
* so now mySeq == finalObject and is an instance of BCSequenceDNA

You get the same process when the user calls:

    BCSequence *mySeq = [[BCSequence alloc] initWithString:aString];

except BCSequenceFactory first figures out to what subclass it should send the 'initWithString' message (using the same code as the original BCFactorySequence). Then all the other methods are just convenience methods calling these building blocks.
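
The alloc/init steps listed above can be sketched roughly as follows. This is only a minimal illustration, not the code from the attached project: the class and method names come from this email, but the instance variable 'sequenceString', the nil-returning default in the superclass and the main() function are assumptions made up for the example. It uses manual retain/release, so it should be compiled without ARC.

```objc
// Sketch only; build without ARC, e.g.: clang -framework Foundation cluster.m
#import <Foundation/Foundation.h>

// Public abstract class: the only interface the user sees.
@interface BCSequence : NSObject
- (id)initWithDNAString:(NSString *)aString;
@end

// Private concrete subclass (its header would not be public).
@interface BCSequenceDNA : BCSequence {
    NSString *sequenceString;   // hypothetical storage, for illustration only
}
- (id)initWithString:(NSString *)aString;
@end

// Private placeholder class handed out by +[BCSequence alloc].
@interface BCSequenceFactory : BCSequence
@end

@implementation BCSequence
+ (id)alloc {
    // Only the abstract superclass returns a placeholder;
    // subclasses pass the message up to NSObject.
    if (self == [BCSequence class])
        return [BCSequenceFactory alloc];
    return [super alloc];
}
- (id)initWithDNAString:(NSString *)aString {
    // Assumed default: concrete subclasses do not support this initializer.
    [self release];
    return nil;
}
@end

@implementation BCSequenceFactory
- (id)initWithDNAString:(NSString *)aString {
    // Create the real object, destroy the placeholder, return the real object.
    BCSequence *finalObject = [[BCSequenceDNA alloc] initWithString:aString];
    [self release];
    return finalObject;
}
@end

@implementation BCSequenceDNA
- (id)initWithString:(NSString *)aString {
    if ((self = [super init])) {
        sequenceString = [aString copy];
    }
    return self;
}
- (void)dealloc {
    [sequenceString release];
    [super dealloc];
}
@end

int main(void) {
    @autoreleasepool {
        BCSequence *mySeq = [[BCSequence alloc] initWithDNAString:@"ATGC"];
        // Logs the class of the object actually returned by init.
        NSLog(@"mySeq is an instance of %@", [mySeq class]);
        [mySeq release];
    }
    return 0;
}
```

With this arrangement the user only ever writes 'BCSequence', and the log line should show the private subclass rather than the placeholder. A generic initWithString: would follow the same shape, with the factory first deciding which subclass to instantiate.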
Like for any superclass/subclass pattern, it is important to define what methods the subclasses should, may or should not override, and I have a summary of that in the attached project. It is very similar to what you have already done. 3. Pros and cons --------------- What are the potential pitfalls and limitations: (a) how to still provide the user with some more static typing when she wants more control over it? This is complaint (a) of part (1) above. (b) how to provide mutable/immutable versions? This is complaint (b) of part (1) above. (c) the class cluster assumes all the methods can be called on all the subclasses. Will that always be relevant? The case of 'complement' is already a bit troublesome, and how about even worse cases, like 'digestWithRestrictionEnzyme:'. It does not make any sense for a protein, does it? The question is really: how does that fit with the BCTools? Could problem arise as we define more and more tools? Will it be that easy to add more private subclasses without breaking the existing code? (d) What about the recent developments: does BCSymbolList fit in the picture? how do you add the annotation stuff to that? I have answers to all of these, and I will come back to these different points below, in other parts of my email. And there might be other pitfalls I don't see yet. But first, while writing the code and thinking about the whole concept, I also realized the potential benefits of a class cluster, and there are more than what I anticipated. 
Some of these benefits are really the benefits you get from OO, but are even more apparent with such a simple interface where things are even more encapsulated because it is almost like you have just one class: * super simple interface for the user; she also gets the benefit of polymorphism without the need to know the existence of all the subclasses; * because the public interface is reduced, the developer can make plenty of changes without breaking existing code developed by the user * in particular, it allows the addition of new types of sequences or optimized subclasses for particular uses, that may in most cases already work with the code developed by the user; so the user can get new functionality for free * the same is true for code developed by the developers of the framework: - developers can work on other parts of the framework without knowing too much about the guts of BCSequence - by relying on just one class for interactions between the different pieces of BioCocoa, it simplifies the development and minimize disruptions as modifications are made to BCSequence I remember in the discussions, there was some disagreement about having subclasses (Alex's choice) or just one class which would decide what to do depending on the symbolSet used (Koen's choice); maybe a class cluster is a way to have many of the benefits of the 2 systems without too many of the problems. More about pros and cons of class cluster on the Apple web site: http://developer.apple.com/documentation/Cocoa/Conceptual/CocoaObjects/Articles/ClassClusters.html For me, the bottom line is still unclear. At present, I feel that a class cluster would work really well. But we have to anticipate now all the potential problems, and we should decide if it is worth it. 4. Compile vs runtime errors -------------------------- This is a discussion about complaint (a) of part (1) and pitfall (a) of part (3). 
What if the user wants more control over the type of sequence she is using and wants some compiler warnings when trying to cut a protein with EcoRI, or get its complementary sequence?

At this point, the class cluster does not allow that. All the methods are valid for all the sequence types. In this context, an invalid call will only be revealed at runtime, and a BCProtein object would have to decide at runtime to return something when sent an irrelevant message. What should it send back? This issue is actually slightly different from the discussion here and is discussed in part 6 (sorry this whole email is quite large and complicated; I am trying to keep it readable!). The question here is really: can we prevent that from even happening when the user knows what type of sequence she is dealing with and could get compiler warnings?

One way to help with that is to provide an additional set of headers defining some public classes named BCSequenceDNA, BCSequenceRNA,.... These classes would just be placeholders, and would be completely distinct from the subclasses of BCSequence (I will come back to the name conflict). They would have some init methods, but when the user uses these classes and allocs/inits an instance, she would in fact get one of the BCSequence subclasses. The compiler would not know and would trust the headers to generate warnings. For instance, the header for the BCSequenceProtein placeholder class would not define the methods 'complement' or 'cutWithRestrictionEnzyme:', and you would get a compiler warning even though the object would in fact respond to the methods at runtime (but would have to return some dummy values). So these headers would really define completely virtual classes. One problem is that the names of these placeholder classes conflict with the names of the BCSequence private subclasses that are defined in the project I sent. We could rename the latter to BCSeqDNA/RNA/... for example, and keep the nice full names 'BCSequenceDNA/RNA/...'
for the placeholder public classes. An alternative is to define protocols, and so the user would have to use (id ) in the code. The BCSequence would provide methods to return objects typed this way. It is a bit of a pain to type id all the time and reduces readability, though. So there are ways to solve the problem. Note that the problem is not really tied to the class cluster implementation and is already partly a problem that the current code is facing, as I talked about at the very beginning of the email (OK, now is a good time to reread everything!!). Of course, the interface then becomes a bit schizophrenic, so it may not be such a good idea to allow all of that. At least in the beginning, there may be not such a high need for stronger typing, and this goes a bit against the whole idea of a simple interface and a class cluster. 5. Mutable and immutable instances -------------------------------- This is a discussion about complaint (b) of part (1) and pitfall (b) of part (3). Why impose immutable objects? Not sure. This is not something I had thought of at first, but it is anyway an important issue that goes beyond the idea of class cluster. Immutable objects allows very important and basic optimizations, particularly when copying objects, and are sufficient for most uses. A smart user will use immutable objects whenever it can and will only go to mutable objects if really necessary. This is something we may have to think about for the BioCocoa project anyway. I am not saying it is absolutely necessary but it should be discussed (and maybe it has been??). To implement mutable objects in the class cluster could be a bit tricky, because there are two conflicting subclass organizations here: mutable/immutable and dna/rna/protein/codon. To get all the combinations, it seems that we need 8 subclasses!! I am not completely sure how to deal with it, or if we should deal with it or just give up and stick to mutable only. 
One possibility is to not have distinct subclasses for mutable/immutable. Instead, there could be simply a BOOL flag 'isMutable' as one of the instance variables. The object would then return different results in key methods such as 'copy' depending on the value of the flag. Also, at creation, it would create mutable or immutable instance variables (NSArray or NSMutableArray) depending on the value of that flag. It is OK to declare a mutable object as the instance variable and then actually use it to allocate an immutable object, as long as we are consistent in the methods called to avoid runtime errors (and we should use some casts to avoid compiler warnings). 6. Potential clashes in the future -------------------------------- This is a discussion about pitfall (c) of part (3). The problem is: will the class cluster ever become a problem in the future and force us to rewrite everything and lose our sleep? The short answer is: I don't know! I guess any pattern can get in the way in some unpredicted way at some unpredicted point in the future. We can try to anticipate those issues. In the case of the class cluster, some of the questions to answer are obviously: how do we deal with irrelevant messages sent to inappropriate subclasses, such as sending 'complement' to a BCSequenceProtein? how frequent these messages will be? how do we deal with new sequence types that could be introduced later? how frequently will new sequence types be needed? The answer to that is to list as much as we can all the methods that would have to go in the final implementation of BCSequence and see how the current sequence types could deal with it. Also, we would have to think about what other types of sequences could be added in the future (which could be inspired by other BioX projects) and hope that a future BCSequenceExtraterrestrial won't break everything. This may have already been discussed earlier on the mailing list? 
Some examples of how to deal with irrelevant methods: * complement of a protein: return the same sequence; return an empty sequence; return nil?? * cut a protein with EcoRI: OK, this is easy, you just get the same protein!! Or do you get the sequence of the EcoRI protein!!! * etc... The existing code will have to deal with this anyway. When I look at the present code, I see you can return BCSequence objects without knowing the type, as returned by 'sequenceWithString:' in the BCSequenceFactory class. And then, this is allowed to get in the BCToolComplement with the method 'complementToolWithSequence:'. What if the BCSequence created is a protein? The abstraction that you did encode in BCSymbol already allows you to deal with it, you did a great job! 7. Full incorporation of the present implementation ---------------------------------------------- This is a discussion about pitfall (d) of part (3). The implementation I attached to the email is quite basic and could be further refined to incorporate the features and organization of the current implementation and the short-term planned additions. The current class tree can probably be used as is. One problem is the name BCSequence would be taken for the superclass; this is probably the name that should be public. Then we could have the following: * BCSymbolList = subclass of BCSequence * BCSeq = subclass of BCSymbolList with annotations * BCSeqDNA, BCSeqRNA, etc... = subclasses of BCSeq with optimized methods for the different types of sequences The additional benefit is that the instance variables would not even be in the public header anymore, but in the subclass BCSymbolList (and BCSequenceFactory would then be even lighter, with no instance variable at all). An alternative is to decide that BCSymbolList would actually be BCSequence, and the annotated BCSequence would become BCSeq. 
It is thus mostly a problem of naming, which is somewhat secondary, but is still quite important because it would be here to stay and has to be easy to remember and logical...

An additional problem is that if you instantiate BCSymbolList (in the case of non-annotated sequences), you want to make sure that it can handle ALL the messages declared in the header. It is not clear to me yet that it can do it.


8. Happy new year!
------------------

... and thanks for reading this up to that point!

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From mek at mekentosj.com Wed Jan 5 06:46:47 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 5 Jan 2005 12:46:47 +0100
Subject: Fwd: [Biococoa-dev] BCSequence class cluster
Message-ID: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com>

Wow, you have been quite busy Charles, brilliant!
Happy new year everybody!

Just one thing I found recently by pure coincidence but very related:
http://developer.apple.com/documentation/Cocoa/Conceptual/CocoaObjects/Articles/ClassClusters.html

On 5 Jan 2005, at 8:56, Charles PARNOT wrote:

> It seems the class cluster possibility has raised some interest. So I
> took some time to think it through and write some code. I got carried
> away and wrote a lot of it, and also I wrote this long email, but now
> you are used to those long emails:-)

I like those ;-)
*** ADDED: just got my email back that it's too big to post on the list, so I'll cut some things away..

>
> Note that I am just proposing an implementation of a class cluster,
> and some solutions to potential pitfalls, but I am not saying that you
> should absolutely go with the class cluster design.
I am a little > biased in favor of it, but you should really decide if (1) you want to > discuss it further and (2) discuss it further! Note that I mostly say > 'you' when I talk about the developers, but maybe at some point, I > should really start saying 'we' ;-) Yep, you definitely got stuck in here, haha. > Like I said before, several of the issues raised here apply to the > existing code and you will have to deal with it at some point. The > main point boils downs to the question of using a weakly typed object > BCSequence vs using strongly objects belonging to one of the > subclasses BCSequenceDNA/RNA/etc... Some of the code is a bit > schizophrenic right now and tries to deal with both cases... The class > cluster would favor the weakly typed route, and would make the design > more consistent and simpler. Which certainly is a good thing. > OK, so how would a class cluster look like? > > > 1. The user point of view > ---------------------- > > There are 2 things the user could complain about: > a- Some of the methods are only relevant for certain sequence types > b- Sequence objects are immutable > > About complaint (a) > In the header file BCSequence.h of the attached project, there are 2 > methods that are only relevant to a subset of the BCSequence type: > -complement and -reverseComplement. This is not a really big concern > at this point, because this is just 2 methods and it is quite easy to > return something for all cases (for a protein, probably just return > itself). But more methods in BCSequence or in the BCTools could give > the same issues. For instance, BCToolDigest. That would only have > sense on a DNA sequence when using restriction enzymes. Exactly. > The class BCSequence would always return something, empty sequences in > the worst case, leaving the troubles to the runtime. This is the only > appropriate way to handle it with the class cluster design, maybe > together with some error codes/handling mechanism. 
> But the user may want to be more specific about the BCSequence type > and get some compiler warnings when appropriate, instead of leaving it > to the runtime. The user might be ready to give up the simplicity of a > unique class and use more specific types. This is the issue of weak vs > strong typing, which relates to the issue of compiler vs runtime > errors/warnings. True, that's the big issue. > One possible answer is to say to the user: this is the way it is, just > accept it!! And I believe as a first version, it is really OK. I agree, some rules simply come with the system. > But there are also some ways to give the user the possibility to > choose between strong and weak typing and keep the class cluster > design, that I will explain later, below. > > About complaint (b) > I thought of enforcing immutability as a starting point, as this is > easier on the developer side to deal with immutable objects. Giving > the option of immutability to the user is anyway a good thing, as it > allows a number of optimizations, that could really pay off in a real > application with lots of copying, ref passing,... Yes, this is exactly how the mutable variants of NSData, NSString etc are setup as I discovered in the devnote I mentioned above. Indeed, it would be very nice to have a mutable and immutable variant of BCSequence objects. > Of course, it is nice to also have mutable objects. Definitely! With large sequences you certainly don't want to copy them all the time to new objects. > I will address that on the developer point of view (see below). Note > that ultimately, one thing would probably always be immutable: the > sequence type. > > > 2. Implementing the class cluster > ------------------------------ > > The class cluster that I implement in the attached project looks very > much like what you have already done. There is a superclass > BCSequence, and then subclasses, BCSequenceDNA, > BCSequenceRNA,...etc... plus a new special subclass BCSequenceFactory. 
> Now the purpose of a class cluster is that the user just does
> everything using the public interface for BCSequence, and as far as
> the user is concerned, every object is an instance of BCSequence. But
> inside the hood, you actually return instances of one of the
> subclasses so that some operations can be optimized for the particular
> type of sequence you are dealing with.

In other words the subclasses are private, only BCSequence.h is public, right?

>
> The problem for the developer of a class cluster is that you know
> which subclass to use only once you call one of the init methods, but
> you still have to do the 'alloc' before the init. There is no way
> BCSequence will know what subclass it should use at the time 'alloc'
> is called. So the trick is to alloc a temporary instance of a
> particular subclass, a 'placeholder' class. Look at the implementation
> of 'alloc' in BCSequence.m.

+ (id)alloc {
    if (self == [BCKSequence class])  // Should this be [BCSequence class]?
        return [BCKSequencePlaceholder alloc];  // So this would be [BCSequenceFactory alloc]?
    else
        return [super alloc];
}

> What this method returns is actually an instance of BCSequenceFactory
> when called on the superclass (when called on one of the subclass,
> though, it just passes the message up to NSObject). The bottom line
> is: you never create an instance of BCSequence, but an instance of
> BCSequenceFactory (you still alloc instances of BCSequence subclasses,
> of course). In fact, that BCSequenceFactory instance could be a
> singleton and never deallocated if we changed the code a little bit.
>
> Then when one of the init method is called on that new
> BCSequenceFactory instance. This method actually allocs and inits a
> new object, an instance of the appropriate subclass. It then releases
> self and returns a pointer to the new object created. Because she
> should always use the value returned by init to set your pointers, the
> user will get the right object in the end.
OK, I get it, looks very nice! > . > Like for any superclass/subclass pattern, it is important to define > what methods the subclasses should, may or should not override, and I > have a summary of that in the attached project. It is very similar to > what you have already done. Yep, guess that's easy to headerdoc along with every method > > > 3. Pros and cons > --------------- > > But first, while writing the code and thinking about the whole > concept, I also realized the potential benefits of a class cluster, > and there are more than what I anticipated. Some of these benefits are > really the benefits you get from OO, but are even more apparent with > such a simple interface where things are even more encapsulated > because it is almost like you have just one class: > * super simple interface for the user; she also gets the benefit of > polymorphism without the need to know the existence of all the > subclasses; That's even a big advantage for us ;-) Think in terms of tutorials and documentation. > * because the public interface is reduced, the developer can make > plenty of changes without breaking existing code developed by the user Yep > * in particular, it allows the addition of new types of sequences or > optimized subclasses for particular uses, that may in most cases > already work with the code developed by the user; so the user can get > new functionality for free Exactly, like adding the mutable variants > > I remember in the discussions, there was some disagreement about > having subclasses (Alex's choice) or just one class which would decide > what to do depending on the symbolSet used (Koen's choice); maybe a > class cluster is a way to have many of the benefits of the 2 systems > without too many of the problems. > More about pros and cons of class cluster on the Apple web site: > http://developer.apple.com/documentation/Cocoa/Conceptual/ > CocoaObjects/Articles/ClassClusters.html Aha, maybe should have read the whole thing first. 
I like to approach these long emails more as conversations, commenting along the way so everyone can follow my (sometimes twisted) thoughts ;-) > > For me, the bottom line is still unclear. At present, I feel that a > class cluster would work really well. But we have to anticipate now > all the potential problems, and we should decide if it is worth it. That's exactly my thought at the moment, indeed it fits nicely in between the two opposite choices in the subclassing debate and satisfies most arguments. The only problem is that I don't have a real oversight to see potential problems coming, but that's simply because of my inexperience with programming. Perhaps we just have to take the jump and see where it ends, at least it has proven very effective in the cocoa framework (wow, that's a biased opinion ;-). > > 4. Compile vs runtime errors > -------------------------- > This is a discussion about complaint (a) of part (1) and pitfall (a) > of part (3). What if the user wants more control over the type of > sequence it is using and want some compiler warnings when trying to > cut a protein with EcoRI, or get its complementary sequence? > > At this point, the class cluster does not allow that. All the methods > are valid for all the sequence types. In this context, an invalid call > will only be revealed at runtime, and a BCProtein object would have to > decide at runtime to return something when sent an irrelevant message. > What should it send back? This issue is actually slightly different > from the discussion here and is discussed in part 6 (sorry this whole > email is quite large and complicated; I am trying to keep it > readable!). Hanging it here.. Still around.. ;-) > The question here is really: can we prevent that from even happening > when the user knows what type of sequence she is dealing with and > could get compiler warnings? 
> > One way to help with that is to provide an additional set of headers > defining some public classes named BCSequenceDNA, BCSequenceRNA,.... > These classes would just be placeholders, and would be completely > disctint from the subclasses of BCSequence (I will come back to the > name conflict). Good idea. > They would have some init methods, but when the user uses these > classes and alloc/init an instance, she would get in fact one of the > BCSequence subclasses. The compiler would not know and would trust the > headers to generate warning. For instance, the header for the > BCSequenceProtein placeholder class would not define the methods > 'complement' or 'cutWithRestrictionEnzyme:', and you would get a > compiler warning even though the object would in fact respond to the > methods at runtime (but would have to return some dummy values). So > these headers would really define completely virtual classes. One of > the problem is the names of these placeholder classes conflict with > the names of the BCSequence private subclasses that are defined in the > project I sent. We could rename the latter to BCSeqDNA/RNA/... for > example, and keep the nice full names 'BCSequenceDNA/RNA/...' for the > placeholder public classes. Seems feasible, although having separate names for internal vs public representations might be troublesome. > > An alternative is to define protocols, and so the user would have to > use (id ) in the code. The BCSequence would provide > methods to return objects typed this way. It is a bit of a pain to > type id all the time and reduces readability, though. Yes, that's painful. > > So there are ways to solve the problem. Note that the problem is not > really tied to the class cluster implementation and is already partly > a problem that the current code is facing, as I talked about at the > very beginning of the email (OK, now is a good time to reread > everything!!). 
>
> Of course, the interface then becomes a bit schizophrenic, so it may not be such a good idea to allow all of that. At least in the beginning, there may not be such a high need for stronger typing, and this goes a bit against the whole idea of a simple interface and a class cluster.

Perhaps you're right, but what I was thinking is to implement a way to better return the reason why something doesn't work instead of a simple nil. For instance, calling cutInPiecesWithThisRestrictionEnzyme on a DNA would return the pieces, while it would also work on proteins, but return nil right. Of course you could also let the method raise an exception, it will then become the developer's responsibility to call methods on the right object. The downside is that this might lead too easily to program halts/crashes if the developer doesn't pay attention. But think in terms of NSArray's objectAtIndex: method: if you ask for an object out of bounds, it does not quietly return nil, it raises an exception (NSRangeException).
I'm still wondering a bit how we're going to implement these kinds of methods, as we now have to start ALL methods with a test of what the sequence type is.

> 5. Mutable and immutable instances
> --------------------------------
> This is a discussion about complaint (b) of part (1) and pitfall (b) of part (3).
>
> Why impose immutable objects?

Not sure.

> This is not something I had thought of at first, but it is anyway an important issue that goes beyond the idea of class cluster. Immutable objects allow very important and basic optimizations, particularly when copying objects, and are sufficient for most uses. A smart user will use immutable objects whenever she can and will only go to mutable objects if really necessary. This is something we may have to think about for the BioCocoa project anyway. I am not saying it is absolutely necessary but it should be discussed (and maybe it has been??).
I've been in favour of both mutable and immutable BCSequences from the beginning, I just didn't know how to implement it in a simple way ;-)

> To implement mutable objects in the class cluster could be a bit tricky, because there are two conflicting subclass organizations here: mutable/immutable and dna/rna/protein/codon. To get all the combinations, it seems that we need 8 subclasses!!

Oops, Koen won't like this, LOL ;-) On the other hand, look at the number of NSNumber subclasses...

> I am not completely sure how to deal with it, or if we should deal with it or just give up and stick to mutable only. One possibility is to not have distinct subclasses for mutable/immutable. Instead, there could be simply a BOOL flag 'isMutable' as one of the instance variables. The object would then return different results in key methods such as 'copy' depending on the value of the flag.

But then we could just as well do the subclasses right?

> Also, at creation, it would create mutable or immutable instance variables (NSArray or NSMutableArray) depending on the value of that flag. It is OK to declare a mutable object as the instance variable and then actually use it to allocate an immutable object, as long as we are consistent in the methods called to avoid runtime errors (and we should use some casts to avoid compiler warnings).

I think the choice in this system is simple, either the subclass or a mutable variant only.

> 6. Potential clashes in the future
> --------------------------------
> This is a discussion about pitfall (c) of part (3).
> The problem is: will the class cluster ever become a problem in the future and force us to rewrite everything and lose our sleep?
> The short answer is: I don't know!

Me neither.

> I guess any pattern can get in the way in some unpredicted way at some unpredicted point in the future.

Or now already, look at the discussion about subclassing.

> 7. Full incorporation of the present implementation
> ----------------------------------------------
> It is thus mostly a problem of naming, which is somewhat secondary, but is still quite important because it would be here to stay and has to be easy to remember and logical...
>
> An additional problem is that if you instantiate BCSymbolList (in the case of non-annotated sequences), you want to make sure that it can handle ALL the messages declared in the header. It is not clear to me yet that it can do it.

Let's first decide if we all like the idea of the class cluster, and then see how to implement it and the naming. Just one thing you might have thought about as well Charles, how do you see the annotations stuff fitting in this scheme? The nice thing is that it applies to all subclasses, but can it still be implemented in the superclass? Perhaps not, as the mutable vs immutable implementation will be quite different. And that's where my major doubt is, as you mentioned you have both a divergency in the direction of mutable vs immutable, as well as in DNA/RNA/Protein. This automatically leads to duplication of the code in one of the two directions I'm afraid...
There's plenty to discuss ;-)

> 8. Happy new year!
> ------------------
>
> ... and thanks for reading this up to that point!

It was a pleasure!
Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows vs Mac
65 million years ago, there were more dinosaurs than humans.
Where are the dinosaurs now?
*********************************************************

From kvddrift at earthlink.net Wed Jan 5 07:21:24 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 5 Jan 2005 07:21:24 -0500
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References:
Message-ID: <50DAA6B1-5F14-11D9-B0D0-003065A5FDCC@earthlink.net>

On Jan 5, 2005, at 2:56 AM, Charles PARNOT wrote:

> It seems the class cluster possibility has raised some interest. So I took some time to think it through and write some code. I got carried away and wrote a lot of it, and also I wrote this long email, but now you are used to those long emails:-)

Hi Charles,

Your email looks very impressive - I need some time to read it through, though, and I will reply later ;-)

Thanks for jumping into BioCocoa and joining in our discussions.

cheers,

- Koen.

From charles.parnot at stanford.edu Wed Jan 5 15:12:40 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Wed, 5 Jan 2005 12:12:40 -0800
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com>
References: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com>
Message-ID:

>>About complaint (b)
>>I thought of enforcing immutability as a starting point, as it is easier on the developer side to deal with immutable objects. Giving the option of immutability to the user is anyway a good thing, as it allows a number of optimizations, that could really pay off in a real application with lots of copying, ref passing,...
>Yes, this is exactly how the mutable variants of NSData, NSString etc are set up as I discovered in the devnote I mentioned above. Indeed, it would be very nice to have a mutable and immutable variant of BCSequence objects.

You will notice that NSNumber does not have a mutable version. Why? One reason is that creating a new instance is not too costly, the data is small.
Another reason is maybe that the implementation is a bit more tricky as the NSNumber resembles our BCSequence, with a large number of potential subclasses, and then the question of how to implement mutability and immutability.

>>2. Implementing the class cluster
>>------------------------------
>>
>>The class cluster that I implement in the attached project looks very much like what you have already done. There is a superclass BCSequence, and then subclasses, BCSequenceDNA, BCSequenceRNA,...etc... plus a new special subclass BCSequenceFactory. Now the purpose of a class cluster is that the user just does everything using the public interface for BCSequence, and as far as the user is concerned, every object is an instance of BCSequence. But under the hood, you actually return instances of one of the subclasses so that some operations can be optimized for the particular type of sequence you are dealing with.
>In other words the subclasses are private, only BCSequence.h is public right?

Yes.

>>The problem for the developer of a class cluster is that you know which subclass to use only once you call one of the init methods, but you still have to do the 'alloc' before the init. There is no way BCSequence will know what subclass it should use at the time 'alloc' is called. So the trick is to alloc a temporary instance of a particular subclass, a 'placeholder' class. Look at the implementation of 'alloc' in BCSequence.m.
>+ (id)alloc
>{
>    if (self==[BCKSequence class]) // Should this be [BCSequence class]?
>        return [BCKSequencePlaceholder alloc]; // So this would be [BCSequenceFactory alloc]?
>    else
>        return [super alloc];
>}

Arghh, a stupid typo in the most important piece of code!! OK, I corrected the code in the link. Download it again...

>That's exactly my thought at the moment, indeed it fits nicely in between the two opposite choices in the subclassing debate and satisfies most arguments.
>The only problem is that I don't have a real oversight to see potential problems coming, but that's simply because of my inexperience with programming. Perhaps we just have to take the jump and see where it ends, at least it has proven very effective in the Cocoa framework (wow, that's a biased opinion ;-).

Yes, there is a real good foundation in the framework and plenty of good ideas of implementation. You/we are probably at the point where we foresee all the potential developments and have a better sense of what the design can be.

>>They would have some init methods, but when the user uses these classes and alloc/init an instance, she would get in fact one of the BCSequence subclasses. The compiler would not know and would trust the headers to generate warnings. For instance, the header for the BCSequenceProtein placeholder class would not define the methods 'complement' or 'cutWithRestrictionEnzyme:', and you would get a compiler warning even though the object would in fact respond to the methods at runtime (but would have to return some dummy values). So these headers would really define completely virtual classes. One problem is that the names of these placeholder classes conflict with the names of the BCSequence private subclasses that are defined in the project I sent. We could rename the latter to BCSeqDNA/RNA/... for example, and keep the nice full names 'BCSequenceDNA/RNA/...' for the placeholder public classes.
>Seems feasible, although having separate names for internal vs public representations might be troublesome.

In case it was not clear, and because I am not sure what you understood, I want to say again that they have to have different names. We cannot keep the same names for the private and public classes. It is true that it could be a little confusing for the developer, but we would probably almost never use the public classes internally; so confusion will be not too bad.
Also, using an abbreviation like BCSeq for something internal is a good mnemonic to remember that these names are really private.

>Perhaps you're right, but what I was thinking is to implement a way to better return the reason why something doesn't work instead of a simple nil. For instance, calling cutInPiecesWithThisRestrictionEnzyme on a DNA would return the pieces, while it would also work on proteins, but return nil right. Of course you could also let the method raise an exception, it will then become the developer's responsibility to call methods on the right object. The downside is that this might lead too easily to program halts/crashes if the developer doesn't pay attention. But think in terms of NSArray's objectAtIndex: method: if you ask for an object out of bounds, it does not quietly return nil, it raises an exception (NSRangeException).
>I'm still wondering a bit how we're going to implement these kinds of methods, as we now have to start ALL methods with a test of what the sequence type is.

No, there is no test at the beginning of a method. It is simply coded in the subclass. For example, BCSequenceProtein could override the 'complement' method to return an empty sequence. Actually, this is not such a great example as 'complement' can easily be taken care of by the superclass which would call 'complement' on the BCSymbol objects of the symbolArray.

Now I have an additional comment on what to do with strongly typed instances, when the user is purposely using a BCSequenceProtein, has a call to 'complement' and ignores the compiler warning and runs the program. It would then be nice to have a runtime error (yeah, this is nice!) when calling a method on a strongly typed instance. For this we could have an additional flag 'isTyped' and have the private BCSeqProtein check the value of the flag in the critical methods, and raise an exception if isTyped=YES or call super if =NO.
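
To make the 'isTyped' idea concrete, here is a minimal sketch and not the code from the zipped project. It rests on several assumptions: the simplified BCSequence below stores a plain NSString instead of BCSymbol objects, the initializer initWithString:typed: and the default "return an empty sequence" behaviour in the superclass are hypothetical, and it is written for a modern compiler with ARC. The point is only that the behaviour lives in each private subclass, with no type test at the top of the superclass methods.

```objective-c
#import <Foundation/Foundation.h>

// Simplified, hypothetical stand-in for the cluster's superclass.
@interface BCSequence : NSObject
@property (nonatomic, copy, readonly) NSString *sequenceString;
@property (nonatomic, readonly) BOOL isTyped;
- (instancetype)initWithString:(NSString *)string typed:(BOOL)typed;
- (BCSequence *)complement;
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)string typed:(BOOL)typed {
    self = [super init];
    if (self) {
        _sequenceString = [string copy];
        _isTyped = typed;
    }
    return self;
}
// Assumed default for an irrelevant message: an empty sequence of the same class.
- (BCSequence *)complement {
    return [[[self class] alloc] initWithString:@"" typed:self.isTyped];
}
@end

// Private DNA subclass: the only place that knows how to complement bases.
@interface BCSeqDNA : BCSequence
@end

@implementation BCSeqDNA
- (BCSequence *)complement {
    NSDictionary *pairs = @{@"A": @"T", @"T": @"A", @"G": @"C", @"C": @"G"};
    NSString *source = self.sequenceString;
    NSMutableString *result = [NSMutableString stringWithCapacity:source.length];
    for (NSUInteger i = 0; i < source.length; i++) {
        NSString *base = [source substringWithRange:NSMakeRange(i, 1)];
        NSString *partner = pairs[base];
        // Unknown symbols are mapped to N in this sketch.
        [result appendString:(partner != nil ? partner : @"N")];
    }
    return [[BCSeqDNA alloc] initWithString:result typed:self.isTyped];
}
@end

// Private protein subclass: raises only when the user asked for strong typing.
@interface BCSeqProtein : BCSequence
@end

@implementation BCSeqProtein
- (BCSequence *)complement {
    if (self.isTyped) {
        [NSException raise:NSInvalidArgumentException
                    format:@"-complement is not defined for a typed protein sequence"];
    }
    return [super complement];
}
@end
```

With these assumptions, an untyped protein quietly returns an empty sequence, while the same call on a typed protein raises, so the mistake shows up during development.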
>>To implement mutable objects in the class cluster could be a bit tricky, because there are two conflicting subclass organizations here: mutable/immutable and dna/rna/protein/codon. To get all the combinations, it seems that we need 8 subclasses!!
>Oops, Koen won't like this, LOL ;-) On the other hand, look at the number of NSNumber subclasses...

See my comment above about NSNumber. They did not bother to implement mutability, probably not worth it in the case of NSNumber. Now, they would be in trouble if they decided to implement a wrapper for C arrays of different types, i.e. vectors or matrices. You would have all the combinations mutable/double, immutable/double, mutable/float, immutable/float, mutable/int,...

>>I am not completely sure how to deal with it, or if we should deal with it or just give up and stick to mutable only. One possibility is to not have distinct subclasses for mutable/immutable. Instead, there could be simply a BOOL flag 'isMutable' as one of the instance variables. The object would then return different results in key methods such as 'copy' depending on the value of the flag.
>But then we could just as well do the subclasses right?

Yes, that may be true. On the other hand, most of the code could be in the superclass and use that flag. In fact, we should start thinking about where mutability makes a difference, and which methods should be implemented. There are not so many: insertSequenceAtRange, removeSequenceAtRange, setSequence, appendSequence, addAnnotation(s), removeAnnotation(s). These would be defined in another placeholder class 'BCMutableSequence' (which would in fact return subclasses of the class cluster), which would give compiler warnings if called on BCSequence objects. If they are called at runtime on a sequence for which isMutable=NO, they would generate a runtime error (so a test would be needed at the beginning of each of these methods). It seems they might be coded in the superclass.
The same is true for 'copy', which may not even have to know if the copied instance is mutable. For example, it would do [symbolArray copy] which would return the same pointer if symbolArray is immutable, or a real copy if symbolArray is mutable. Note that 'copy' always returns an immutable instance by convention. Then 'mutableCopy' would apply the same tricks. The subclasses may have to deal with their own instance variables (they don't have any so far), and may have to check [self isMutable].

>Let's first decide if we all like the idea of the class cluster, and then see how to implement it and the naming. Just one thing you might have thought about as well Charles, how do you see the annotations stuff fitting in this scheme? The nice thing is that it applies to all subclasses, but can it still be implemented in the superclass? Perhaps not, as the mutable vs immutable implementation will be quite different. And that's where my major doubt is, as you mentioned you have both a divergency in the direction of mutable vs immutable, as well as in DNA/RNA/Protein. This automatically leads to duplication of the code in one of the two directions I'm afraid...
>There's plenty to discuss ;-)

About annotations, I do not have a good grasp of the whole concept, but it certainly seems that if the concept of an annotation is sufficiently abstract, it could easily go in the superclass, the same way many methods can be handled in the superclass thanks to the BCSymbol abstraction you guys have designed.
In fact, if annotations are just one NSArray, it is not too costly in terms of memory, adding just one instance variable = a few bytes to the size of the object, and can be kept to nil if no annotations are present.
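
As a rough illustration of the flag-based approach described above, the sketch below keeps everything in one class. It is a simplification under stated assumptions: the initializer initWithSymbols:mutable:, the accessor 'symbols' and the single mutating method appendSymbol: are hypothetical names, the symbols are plain objects rather than BCSymbol instances, and the code assumes ARC. It shows the runtime test at the start of a mutating method, 'copy' returning the very same object when the receiver is immutable, and 'mutableCopy' always producing an independent mutable instance.

```objective-c
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject <NSCopying, NSMutableCopying>
- (instancetype)initWithSymbols:(NSArray *)symbols mutable:(BOOL)flag; // hypothetical
- (BOOL)isMutable;
- (NSArray *)symbols;
- (void)appendSymbol:(id)symbol; // stands in for the BCMutableSequence methods
@end

@implementation BCSequence {
    NSArray *_symbolArray; // really an NSMutableArray when _isMutable is YES
    BOOL _isMutable;
}

- (instancetype)initWithSymbols:(NSArray *)symbols mutable:(BOOL)flag {
    self = [super init];
    if (self) {
        _isMutable = flag;
        // The storage class follows the flag, as suggested in the discussion.
        _symbolArray = flag ? [symbols mutableCopy] : [symbols copy];
    }
    return self;
}

- (BOOL)isMutable { return _isMutable; }

- (NSArray *)symbols { return [_symbolArray copy]; }

- (void)appendSymbol:(id)symbol {
    // Runtime test needed at the beginning of every mutating method.
    if (!_isMutable) {
        [NSException raise:NSInternalInconsistencyException
                    format:@"-appendSymbol: sent to an immutable sequence"];
    }
    [(NSMutableArray *)_symbolArray addObject:symbol];
}

// 'copy' returns an immutable instance; an immutable receiver can share itself.
- (id)copyWithZone:(NSZone *)zone {
    if (!_isMutable) {
        return self;
    }
    return [[BCSequence allocWithZone:zone] initWithSymbols:_symbolArray mutable:NO];
}

// 'mutableCopy' always builds a new, independent mutable instance.
- (id)mutableCopyWithZone:(NSZone *)zone {
    return [[BCSequence allocWithZone:zone] initWithSymbols:_symbolArray mutable:YES];
}
@end
```

The trade-off is the one raised in the thread: a single class and little duplicated code, at the price of a flag test in each mutating method instead of a compile-time distinction.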
In conclusion, to discuss the class cluster possibility, it is maybe time to come up with a list of:
* methods that could be in the public BCSequence.h header; we should not be afraid to have many; they could be dispatched in categories for convenience; the doc for BCSequence would be big, but that would be quite normal!
* methods that could be in the public BCMutableSequence.h header
* sequence types that could be added in the future

And then see how well that fits with class cluster, and if mutable/immutable implementation is feasible.

OK, I'll stop there, hoping the Europeans will get that before going to bed.

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From mek at mekentosj.com Wed Jan 5 15:35:05 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Wed, 5 Jan 2005 21:35:05 +0100
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com>
Message-ID: <48A29064-5F59-11D9-9A51-000D93AE89A4@mekentosj.com>

> OK, I'll stop there, hoping the Europeans will get that before going to bed.

Yep, still got it. Don't have the time to reply to all of it yet, but one comment that came to mind immediately:

> You will notice that NSNumber does not have a mutable version. Why? One reason is that creating a new instance is not too costly, the data is small. Another reason is maybe that the implementation is a bit more tricky as the NSNumber resembles our BCSequence, with a large number of potential subclasses, and then the question of how to implement mutability and immutability.
You're right, but in our case we're not talking about small subclasses with only one variable (int, float, bool etc); the BCSequences are way too big to return a new instance every time you call a method on them. Therefore, I would propose to start with the mutable version; later we can always add the immutable versions for optimization purposes.
About the warnings, I'm not that much of a fan of adding flags like isMutable or isTyped. In the first case I would rather have a real immutable subclass, and in the second, can't we just generate the runtime errors? In general those will surface in 99% of the cases in the development cycle, and the developer can take countermeasures to prevent the end-user from doing stupid things.
Just my 2 cents,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2-bit company without 1 bit of sense.
*********************************************************

From kvddrift at earthlink.net Wed Jan 5 17:38:52 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 5 Jan 2005 17:38:52 -0500
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com>
Message-ID: <931424C0-5F6A-11D9-BC51-003065A5FDCC@earthlink.net>

Alright,

Some time now to reply :)

For more info on class clusters, also check out the page at CocoaDev:

http://www.cocoadev.com/index.pl?ClassClusters

BTW, CocoaDev has a lot of very interesting pages.

> They would have some init methods, but when the user uses these classes and alloc/init an instance, she would get in fact one of the BCSequence subclasses. The compiler would not know and would trust the headers to generate warnings. For instance, the header for the BCSequenceProtein placeholder class would not define the methods 'complement' or 'cutWithRestrictionEnzyme:',

Proteins can be digested as well, so I suggest we just use a general digest tool class. Just feed it a site where to cut, and the tool returns an array of locations, or subsequences.
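
A general digest tool along those lines might look like the following sketch. Everything about its interface is an assumption made for illustration: the initializer initWithSequenceString:site:cutOffset:, the use of plain NSString for both the sequence and the site, and the cutOffset convention (how many symbols into the matched site the cut falls). It only handles a literal site, not degenerate recognition patterns, and it assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical interface for a type-agnostic digest tool.
@interface BCToolDigest : NSObject
- (instancetype)initWithSequenceString:(NSString *)sequence
                                  site:(NSString *)site
                             cutOffset:(NSUInteger)offset;
- (NSArray *)cutLocations; // NSNumber positions, in increasing order
- (NSArray *)fragments;    // NSString pieces between the cuts
@end

@implementation BCToolDigest {
    NSString *_sequence;
    NSString *_site;
    NSUInteger _cutOffset;
}

- (instancetype)initWithSequenceString:(NSString *)sequence
                                  site:(NSString *)site
                             cutOffset:(NSUInteger)offset {
    self = [super init];
    if (self) {
        _sequence = [sequence copy];
        _site = [site copy];
        // Keep the cut inside (or right at the end of) the matched site.
        _cutOffset = MIN(offset, site.length);
    }
    return self;
}

- (NSArray *)cutLocations {
    NSMutableArray *locations = [NSMutableArray array];
    NSRange searchRange = NSMakeRange(0, _sequence.length);
    while (_site.length > 0 && searchRange.length >= _site.length) {
        NSRange found = [_sequence rangeOfString:_site
                                         options:NSLiteralSearch
                                           range:searchRange];
        if (found.location == NSNotFound) {
            break;
        }
        [locations addObject:@(found.location + _cutOffset)];
        // Advance by one symbol so overlapping sites are also reported.
        NSUInteger next = found.location + 1;
        searchRange = NSMakeRange(next, _sequence.length - next);
    }
    return locations;
}

- (NSArray *)fragments {
    NSMutableArray *pieces = [NSMutableArray array];
    NSUInteger start = 0;
    for (NSNumber *cut in [self cutLocations]) {
        NSUInteger position = cut.unsignedIntegerValue;
        if (position > start) {
            NSRange range = NSMakeRange(start, position - start);
            [pieces addObject:[_sequence substringWithRange:range]];
            start = position;
        }
    }
    if (start < _sequence.length) {
        [pieces addObject:[_sequence substringFromIndex:start]];
    }
    return pieces;
}
@end
```

Under these assumptions the same tool serves both cases: the DNA string AAGAATTCGGGAATTCTT with site GAATTC and offset 1 is cut at positions 3 and 11, and the protein string MAKLKT with site K and offset 1 (cut after each K) yields the fragments MAK, LK and T.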
> I'm still wondering a bit how we're going to implement these kind of > methods, as we now have to start ALL methods with a test what the > sequence type is. You only need to put the test in the init of the BCTools subclasses. No need to keep them in BCSequence, et al. Another possible solution could be to create a DNATools/ProteinTools class, which has convenience methods to various manipulations only for that type of sequence (this is the BioJava approach - I like the first one better, though). > > No, there is no test at the beginning of a method. It is simply coded > in the subclass. For example, BCSequenceProtein could override the > 'complement' method to return an empty sequence. Actually, this is not > such a great example as 'complement' can easily be taken care of by > the superclass which would call 'complement' on the BCSymbol objects > of the symbolArray. So, if I understand this correctly, when using class clusters, we still need to subclass each method? Not sure if I like that idea now :). As you remember, the reason I was trying to convince you guys to avoid subclassing, was to prevent code duplication in all the subclasses. This easily leads to errors and is difficult to maintain. I could of course be wrong, and not understand the class clusters completely yet. > Now I have an additional comment on what to do with strongly typed > instances, when the user is purposedly using a BCSequenceProtein, has > a call to 'complement' and ignores the compiler warning and runs the > program. It would then be nice to have run time error (yeah, this is > nice!) when calling a method on a strongly typed instance. For this we > could have an additional flag 'isTyped' and have the private > BCSeqProtein check the value of the flag in the critical methods, and > raise an exception if isTyped=YES or call super if =NO. Again, maybe I don't understand the class clusters yet, but isn't the idea to have users only use BCSequence, not BCSequenceXXX? 
I agree with Alex about not using flags, though. We have introduced the BCSequenceType and BCSymbolSet (my preference) to identify the type. > > To implement mutable objects in the class cluster could be a bit > tricky, because there are two conflicting subclass organizations here: > mutable/immutable and dna/rna/protein/codon. To get all the > combinations, it seems that we need 8 subclasses!! > Oops, Koen won't like this, LOL ;-) On the other hand, look at the > number of NSNumber subclasses... No, I don't like that :D, see my comment above. Another possibility (also stolen from BioJava) is to make a BCToolsEdit class that takes care of editing a immutable class. > > OK, I'll stop there, hoping teh europeans will get that before going > to bed. > Although European, I'm in North Carolina, so I have a few more hours. I will respond some more maybe later tonight. cheers, - Koen. From charles.parnot at stanford.edu Wed Jan 5 20:06:24 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Wed, 5 Jan 2005 17:06:24 -0800 Subject: [Biococoa-dev] (no subject) Message-ID: >Alright, > >Some time now to reply :) > >For more info on class clusters, also checkout the page at CocoaDev: > >http://www.cocoadev.com/index.pl?ClassClusters > >BTW, CocoaDev has a lot of very interesting pages. This is how I get started a few months ago with my fitting program where I decided to implement class clusters. It turned out to be quite easy. Adding subclasses is easy too. > >>They would have some init methods, but when the user uses these >>classes and alloc/init an instance, she would get in fact one of >>the BCSequence subclasses. The compiler would not know and would >>trust the headers to generate warning. For instance, the header for >>the BCSequenceProtein placeholder class would not define the >>methods 'complement' or 'cutWithRestrictionEnzyme:', > >Proteins can be digested as well, so I suggest we just use a general >digest tool class. 
Just feed it a site where to cut, and the tool >returns an array of locations, or subsequences. This is EXCELLENT. The more general the approach, the more encapsulated, the easier it will be to write and maintain and keep as much backward-compatibility as possible. >> >>No, there is no test at the beginning of a method. It is simply >>coded in the subclass. For example, BCSequenceProtein could >>override the 'complement' method to return an empty sequence. >>Actually, this is not such a great example as 'complement' can >>easily be taken care of by the superclass which would call >>'complement' on the BCSymbol objects of the symbolArray. > >So, if I understand this correctly, when using class clusters, we >still need to subclass each method? Not sure if I like that idea now >:). As you remember, the reason I was trying to convince you guys to >avoid subclassing, was to prevent code duplication in all the >subclasses. This easily leads to errors and is difficult to >maintain. I could of course be wrong, and not understand the class >clusters completely yet. OK, I should have been clearer. As much as possible, we should keep the implementation in the superclass. And actually, 'complement' is an excellent example of it and how the abstraction you have set up for BCSymbol works well. When the case can be handled by the superclass, we should do it. Most of the BCSequence methods in my mini-project can be handled at the superclass level, I now appreciate more There might still be cases where specific methods have to be written for a specific type of sequence. It is then easy to add the method to the subclass, instead of having some case of if statements in the superclass. This is more elegant and easier to maintain. I know you already had the discussion with Alex. No matter what you do, if a type of sequence has to be treated separately, you have to write two different versions of a particular piece of code. 
It is actually easier to separate the two cases in two separate methods for two different classes in two different files. It still think I see your point. The only problem with subclass is that it is quite easy not to realize that you are duplicating code. It is more apparent when you have a series of if statement in front of your eyes, all with the same contents. You can also more easily spot the common stuff. Having subclasses does not force you to duplicate code, it just tends to happen more frequently. > >>Now I have an additional comment on what to do with strongly typed >>instances, when the user is purposedly using a BCSequenceProtein, >>has a call to 'complement' and ignores the compiler warning and >>runs the program. It would then be nice to have run time error >>(yeah, this is nice!) when calling a method on a strongly typed >>instance. For this we could have an additional flag 'isTyped' and >>have the private BCSeqProtein check the value of the flag in the >>critical methods, and raise an exception if isTyped=YES or call >>super if =NO. > >Again, maybe I don't understand the class clusters yet, but isn't >the idea to have users only use BCSequence, not BCSequenceXXX? I >agree with Alex about not using flags, though. We have introduced >the BCSequenceType and BCSymbolSet (my preference) to identify the >type. Yes, sorry about the confusion. I was just proposing a possibility in case the user complains she wants more strongly typed classes. I agree it goes against the concept of class cluster and its simplicity. I was just trying to show that even in the context of a class cluster, there are ways to still propose some more strongly typed headers without too much efforts, and give the option to the user. It seems Alex and you don't really ask for that, and I like the simplicity of just one public interface too. Let's just forget about that or bring the issue back later. 
>>
>>To implement mutable objects in the class cluster could be a bit
>>tricky, because there are two conflicting subclass organizations
>>here: mutable/immutable and dna/rna/protein/codon. To get all the
>>combinations, it seems that we need 8 subclasses!!
>>Oops, Koen won't like this, LOL ;-) On the other hand, look at the
>>number of NSNumber subclasses...
>
>No, I don't like that :D, see my comment above. Another possibility
>(also stolen from BioJava) is to make a BCToolsEdit class that takes
>care of editing an immutable class.

I can see that: BCToolEdit would then have to return new instances (otherwise the sequence is not immutable anymore), but it could deal with optimization better than a dull 'copy'.

>>
>>OK, I'll stop there, hoping the Europeans will get that before going to bed.
>>
>
>Although European, I'm in North Carolina, so I have a few more
>hours. I will respond some more maybe later tonight.
>
>
>cheers,
>
>- Koen.

Can't wait for more!!

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)
Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Wed Jan 5 21:15:10 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Wed, 5 Jan 2005 18:15:10 -0800
Subject: [Biococoa-dev] BCSequence class cluster
Message-ID:

>>You will notice that NSNumber does not have a mutable version.
>>Why? One reason is that creating a new instance is not too costly,
>>the data is small. Another reason is maybe that the implementation
>>is a bit more tricky as the NSNumber resembles our BCSequence, with
>>a large number of potential subclasses, and then the question of
>>how to implement mutability and immutability.
>You're right, but in our case we're not talking about small
>subclasses with only one variable (int, float, bool etc), the
>BCSequences are way too big to return a new instance every time you
>call a method on it.

That is also my point further down in the previous email. For NSNumber, Apple can get away with saying 'you don't need a mutable version, the immutable does not cost much'. But the user of BioCocoa would probably not be very happy with only an immutable version (and it would not be any easier for us)... If we have to stick to just one version, we have to stick to the mutable.

>Therefore, I would propose to start with the mutable version, then
>later we can always generate the immutable versions in addition for
>optimization purposes.

OK, I agree with that. It seems easier to implement the mutable version first. The only problem is that adding the immutable class later could break existing programs. More specifically, if a program uses BCSequence instances as mutable objects and the framework then decides that they are immutable and that the user should use BCMutableSequence instead, a lot of rewriting will be necessary.

An alternative is to call the public class BCMutableSequence right now, and not have a BCSequence class (at least no public class). Then the future introduction of BCSequence will not break old code. I know there are not so many applications relying on BioCocoa yet, but even if there are just one or two (like the examples in the CVS project), it might still be worth thinking about it now.

Another alternative is to have both public headers for what is in fact the same implementation. The implementation would be very easy:
* superclass BCSequence: header without the mutable-specific methods
* subclass BCMutableSequence: header with the mutable-specific methods
* the rest of the subclasses = below that

Then, at least, the user gets some compiler warnings even if the mutable-specific methods actually will run fine on the immutable instances.
Of course, there could still be cases that are not detected by the compiler and that should be illegal at runtime (for example if the user ignores the compiler warnings!!), and those would still cause programs to break in the future. But that would be less frequent thanks to the compiler warnings.

Whichever way we implement it in the future, we have to make sure that it is feasible. It looks like we already have several options and opinions, so that's a good sign! We can decide how to do it later. So the roadmap is:
* do we want mutable and immutable objects? --> well, yeah, sure! (who would say no?)
* can we do it in the future? --> apparently yes, we have several ideas (it does not mean it will be easy to maintain code, we have to make the best decision at some point, and Koen raised that important issue)
* can we do it in the future with the class cluster design? --> same answer, same ideas
* should we have something temporary already in place to ensure backward-compatibility? --> my vote: yes; what do you guys think?
* if yes, how? --> I proposed 2 ways

>About the warnings, I'm not that much of a fan to add flags like
>isMutable or isTyped, in the first case I would rather have a real
>immutable subclass and in the second can't we just generate the
>runtime errors, in general those will surface in 99% of the cases in
>the development cycle and the developer can take countermeasures to
>prevent the end-user from doing stupid things.
>
>Just my 2 cents,
>Alex

The problem with an immutable subclass is that it would inherit the methods of the mutable superclass, and thus the user would not get compiler warnings when using a mutable-specific method on an immutable class.

About the isTyped flag, see my response to Koen (I am planning on talking about it there too). It seems we all agree that the user can deal with just one class, BCSequence, and live with it.
Providing additional headers for stronger typing was just a possible future addition, and I was suggesting some ways it could be done, anticipating some needs, and seeing if these could be fulfilled in a class cluster design. But my feeling (and it seems, yours too) is that such a need will probably never come and should not be fulfilled. It will tend to duplicate code.

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)
Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net Wed Jan 5 21:26:30 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 5 Jan 2005 21:26:30 -0500
Subject: [Biococoa-dev] (no subject)
In-Reply-To:
References:
Message-ID: <5FCC29DE-5F8A-11D9-8F75-000A95685F72@earthlink.net>

On Jan 5, 2005, at 8:06 PM, Charles PARNOT wrote:

>>>
>>> No, there is no test at the beginning of a method. It is simply
>>> coded in the subclass. For example, BCSequenceProtein could override
>>> the 'complement' method to return an empty sequence. Actually, this
>>> is not such a great example as 'complement' can easily be taken care
>>> of by the superclass which would call 'complement' on the BCSymbol
>>> objects of the symbolArray.
>>
>> So, if I understand this correctly, when using class clusters, we
>> still need to subclass each method? Not sure if I like that idea now
>> :). As you remember, the reason I was trying to convince you guys to
>> avoid subclassing, was to prevent code duplication in all the
>> subclasses. This easily leads to errors and is difficult to maintain.
>> I could of course be wrong, and not understand the class clusters
>> completely yet.
>
> OK, I should have been clearer. As much as possible, we should keep
> the implementation in the superclass. And actually, 'complement' is an
> excellent example of it and how the abstraction you have set up for
> BCSymbol works well. When the case can be handled by the superclass,
> we should do it. Most of the BCSequence methods in my mini-project can
> be handled at the superclass level,

I had a better look at your code, and see that you actually have most methods in BCSequence. I think most of them can actually go in the superclass of BCSequence, BCSymbolList, though.

So the class cluster should look like:

BCSymbolList -> BCSequence -> BCSequenceDNA
                           -> BCSequenceRNA
                           -> BCSequenceProtein
                           -> BCSequenceCodon

BCSymbolList is supposed to be a 'barebone' sequence class; BCSequence has additional annotations.

> I know you already had the discussion with Alex. No matter what you
> do, if a type of sequence has to be treated separately, you have to
> write two different versions of a particular piece of code. It is
> actually easier to separate the two cases in two separate methods for
> two different classes in two different files. I still think I see
> your point. The only problem with subclass is that it is quite easy
> not to realize that you are duplicating code. It is more apparent when
> you have a series of if statement in front of your eyes, all with the
> same contents. You can also more easily spot the common stuff. Having
> subclasses does not force you to duplicate code, it just tends to
> happen more frequently.

If the two methods are very different, then indeed it makes sense to have them in two different subclasses. My point before was that the methods in the various subclasses were almost identical, so there was no use in duplicating them.
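The point about keeping 'complement' in the superclass can be made concrete with a small sketch. It is written in Python purely for illustration; the classes `Symbol`, `Nucleotide`, `Sequence`, `DNASequence` and `ProteinSequence` are hypothetical stand-ins for BCSymbol and the BCSequence subclasses, not the framework's actual API. The sequence never tests its own type: it asks each symbol for its complement, and a symbol without a complement returns itself.

```python
class Symbol:
    """Hypothetical stand-in for BCSymbol."""

    def __init__(self, letter):
        self.letter = letter

    def complement(self):
        # Default: a symbol with no complement (e.g. an amino acid)
        # simply returns itself.
        return self


class Nucleotide(Symbol):
    """DNA base that knows its own complement."""

    PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def complement(self):
        return Nucleotide(self.PAIRS[self.letter])


class Sequence:
    """Superclass holding the one shared implementation."""

    def __init__(self, symbols):
        self.symbols = list(symbols)

    def complement(self):
        # No type test here: the symbols do the work.
        return type(self)(s.complement() for s in self.symbols)

    def reverse_complement(self):
        return type(self)(s.complement() for s in reversed(self.symbols))

    def __str__(self):
        return "".join(s.letter for s in self.symbols)


class DNASequence(Sequence):
    pass  # nothing to override


class ProteinSequence(Sequence):
    pass  # nothing to override either
```

With this layout the subclasses stay empty unless a method really differs between sequence types, which is the situation both sides of the discussion seem to want.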
> >> >>> Now I have an additional comment on what to do with strongly typed >>> instances, when the user is purposedly using a BCSequenceProtein, >>> has a call to 'complement' and ignores the compiler warning and runs >>> the program. It would then be nice to have run time error (yeah, >>> this is nice!) when calling a method on a strongly typed instance. >>> For this we could have an additional flag 'isTyped' and have the >>> private BCSeqProtein check the value of the flag in the critical >>> methods, and raise an exception if isTyped=YES or call super if =NO. >> >> Again, maybe I don't understand the class clusters yet, but isn't the >> idea to have users only use BCSequence, not BCSequenceXXX? I agree >> with Alex about not using flags, though. We have introduced the >> BCSequenceType and BCSymbolSet (my preference) to identify the type. > > Yes, sorry about the confusion. I was just proposing a possibility in > case the user complains she wants more strongly typed classes. I agree > it goes against the concept of class cluster and its simplicity. I was > just trying to show that even in the context of a class cluster, there > are ways to still propose some more strongly typed headers without too > much efforts, and give the option to the user. It seems Alex and you > don't really ask for that, and I like the simplicity of just one > public interface too. Let's just forget about that or bring the issue > back later. OK. > > >>> >>> To implement mutable objects in the class cluster could be a bit >>> tricky, because there are two conflicting subclass organizations >>> here: mutable/immutable and dna/rna/protein/codon. To get all the >>> combinations, it seems that we need 8 subclasses!! >>> Oops, Koen won't like this, LOL ;-) On the other hand, look at the >>> number of NSNumber subclasses... >> >> No, I don't like that :D, see my comment above. 
Another possibility >> (also stolen from BioJava) is to make a BCToolsEdit class that takes >> care of editing a immutable class. > > I can see that: then BCToolEdit would havbe to return new instances > (otherwise it is not immutable anymore), but would deal with > optimization better than a dull 'copy'. > BTW, the reason that BioJava uses immutable sequences is: " It is worth noting that many BioJava implementations of Sequence and SymbolList do not allow edit operations as this may invalidate underlying Features or Annotations." That's indeed something to keep in the back of our minds. - Koen. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 5290 bytes Desc: not available URL: From charles.parnot at stanford.edu Thu Jan 6 01:35:48 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Wed, 5 Jan 2005 22:35:48 -0800 Subject: [Biococoa-dev] (no subject) Message-ID: >I had a better look at your code, and see that you have actually >most methods in BCSequence. Although I think most of them can >actually go in the superclass of BCSequence, BCSymbolList. > >so the class cluster should like: > >BCSymbolList -> BCSequence -> BCSequenceDNA > -> BCSequenceRNA > -> BCSequenceProtein > -> BCSequenceCodon > >BCSymbolList is supposed to a 'barebone' sequence class, BCSequence >has additional annotations. Yes, I understand that. I discussed it in the initial super long email (part 7, I believe?!? Even I have some trouble remembering it all...). The reason why I picked BCSequence for the superclass name is because this would become the only public class, and I thought the name BCSequence would be a better public name than BCSymbolList. The problem of course is that it is different from the current implementation, which had also good reason to use these names. 
So I am actually proposing the names:

old name          new name
--------          ---------
                  BCSequence (no instance variable, only public header)
BCSymbolList      BCSymbolList
BCSequence        BCSeq
BCSequenceDNA     BCSeqDNA

with an inheritance tree parallel to the existing one. An alternative is to not have an 'empty' class at the top and go with the following:

old name          new name
--------          ---------
BCSymbolList      BCSequence
BCSequence        BCSeq
BCSequenceDNA     BCSeqDNA

We have to talk more about the role of BCSymbolList in the context of a class cluster, though. When creating an instance of a subclass of the class cluster, when should we choose BCSymbolList? In the case when there is no annotation? In this case, we have to make sure that BCSymbolList will respond to all the messages. Once an instance is created, we cannot decide to suddenly make it one of the typed subclasses just to be able to handle a message not implemented by BCSymbolList.

We could also decide to have BCSymbolList and BCSequence both public, as in the existing code. There might be some good reason for that... speaking of which ... in fact, I have a question on how the current implementation is supposed to work. Would it be up to the user to decide whether to use BCSymbolList or BCSequence? And would this choice depend on what the user wants to do later with that object/sequence? I know I already asked some questions about it, but could you elaborate a bit more on the reasons and the usage? Thanks:-)

>
>> I know you already had the discussion with Alex. No matter what
>>you do, if a type of sequence has to be treated separately, you
>>have to write two different versions of a particular piece of code.
>>It is actually easier to separate the two cases in two separate
>>methods for two different classes in two different files. I still
>>think I see your point. The only problem with subclass is that it
>>is quite easy not to realize that you are duplicating code.
It is >>more apparent when you have a series of if statement in front of >>your eyes, all with the same contents. You can also more easily >>spot the common stuff. Having subclasses does not force you to >>duplicate code, it just tends to happen more frequently. > >If the two methods are very different, then indeed it makes sense to >have them in two different subclasses. My point before was that the >methods in the various subclasses were almost identical, so there >was no use in duplicating them. Sorry for being so 'dogmatic' in the previous email. It is now clear we have a very similar view of the whole thing! We all want to have a minimal number of methods in the subclasses. In a few cases, optimized versions may still be possible with this design. >BTW, the reason that BioJava uses immutable sequences is: " It is >worth noting that many BioJava implementations of Sequence and >SymbolList do not allow edit operations as this may invalidate >underlying Features or Annotations." That's indeed something to keep >in the back of our minds. Clearly, managing annotations on a mutable sequence is a problem. Which also occurs when you cut a piece of a sequence and return a new instance. I can clearly see that a separate class would be in charge of the job. This would be the job of one the BCTool, right? So, now the 1 million dollars question: you did not tell yet what you think of the whole class cluster design and if you like the idea...;-) (I have to say the more I discuss it, the more I like it, just for the simplicity and the level of abstraction it would give to the users of the framework...) 
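For readers less familiar with the pattern, the single-public-class idea discussed throughout this thread can be sketched as follows. This is Python used as a neutral illustration, not the Objective-C of the attached project: the names `Sequence`, `_DNASequence`, `_RNASequence`, `_ProteinSequence` and `_subclass_for` are hypothetical, and guessing the type from the letters is a deliberately naive assumption made only to keep the example short.

```python
class Sequence:
    """Public class, a hypothetical stand-in for BCSequence."""

    def __new__(cls, text):
        # Asking for the public class hands back a private subclass.
        if cls is Sequence:
            cls = _subclass_for(text)
        return super().__new__(cls)

    def __init__(self, text):
        self.text = text.upper()

    def __len__(self):
        return len(self.text)

    def sequence_type(self):
        raise NotImplementedError


class _DNASequence(Sequence):
    def sequence_type(self):
        return "DNA"


class _RNASequence(Sequence):
    def sequence_type(self):
        return "RNA"


class _ProteinSequence(Sequence):
    def sequence_type(self):
        return "Protein"


def _subclass_for(text):
    """Naive type guess from the alphabet (an assumption for this sketch)."""
    letters = set(text.upper())
    if letters <= set("ACGT"):
        return _DNASequence
    if letters <= set("ACGU"):
        return _RNASequence
    return _ProteinSequence
```

The caller only ever writes `Sequence("atgc")` and can still ask for the type and length afterwards. Which private subclass was actually instantiated stays an implementation detail, so subclasses can be added or renamed later without touching user code.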
Charles

-- 
Charles Parnot
charles.parnot at stanford.edu
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)
Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net Thu Jan 6 20:29:29 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 6 Jan 2005 20:29:29 -0500
Subject: [Biococoa-dev] (no subject)
In-Reply-To:
References:
Message-ID: <93494E56-604B-11D9-8F75-000A95685F72@earthlink.net>

On Jan 6, 2005, at 1:35 AM, Charles PARNOT wrote:

> Yes, I understand that. I discussed it in the initial super long email
> (part 7, I believe?!? Even I have some trouble remembering it all...).
> The reason why I picked BCSequence for the superclass name is because
> this would become the only public class, and I thought the name
> BCSequence would be a better public name than BCSymbolList.

Yes, I think that would be the best approach. Can we make BCSymbolList a super of the class cluster (just as NSValue is the super of NSNumber)?

> The problem of course is that it is different from the current
> implementation, which had also good reason to use these names. So I am
> actually proposing the names:
>
> old name          new name
> --------          ---------
>                   BCSequence (no instance variable, only public header)
> BCSymbolList      BCSymbolList
> BCSequence        BCSeq
> BCSequenceDNA     BCSeqDNA
>
> with an inheritance tree parallel to the existing one.
An alternative
> is to not have an 'empty' class at the top and go with the following:
>
> old name          new name
> --------          ---------
> BCSymbolList      BCSequence
> BCSequence        BCSeq
> BCSequenceDNA     BCSeqDNA

What about using _BCSequence instead of BCSeq? With the underscore we know exactly what type of class we're dealing with; BCSeq <-> BCSequence seems more confusing to me.

>
> We have to talk more about the role of BCSymbolList in the context of
> a class cluster, though. When creating an instance of a subclass of
> the class cluster, when should we choose BCSymbolList? In the case
> when there is no annotation? In this case, we have to make sure that
> BCSymbolList will respond to all the messages. Once an instance is
> created, we cannot decide to suddenly make it one of the typed
> subclass just to be able to handle a message not implemented by
> BCSymbolList.
>
> We could also decide to have BCSymbolList and BCSequence both publics
> in the existing code. There might be some good reason for that...
> speaking of which ... in fact, I have a question on how the current
> implementation is supposed to work. It would be up to the user to
> decide whether to use BCSymbolList or BCSequence? And this choice would
> be made depending on what the user wants to later do with that
> object/sequence? I know I already asked some questions about it, but
> could you develop a bit more the reasons and the usage. Thanks:-)

Not sure yet how the user should approach this :). In general there should be one class that is always used throughout. Now that I think of it, BCSequence is really the best choice for it. Annotations are a nice extension, but not always needed. Maybe we should dump BCSymbolList altogether, and only use BCSequence. It can have an empty annotation ivar, which is not very costly.
One of the reasons to introduce the BCSymbolList was to avoid the introduction of even more subclasses with long, clumsy names such as BCAnnotatedDNASequence.

> Sorry for being so 'dogmatic' in the previous email. It is now clear
> we have a very similar view of the whole thing! We all want to have a
> minimal number of methods in the subclasses. In a few cases, optimized
> versions may still be possible with this design.

Sounds good, just as long as this is hidden from the user. In other words, in those special cases, the user should still only use BCSequence, not the specialized subclass.

>
> BTW, the reason that BioJava uses immutable sequences is: " It is
> worth noting that many BioJava implementations of Sequence and
> SymbolList do not allow edit operations as this may invalidate
> underlying Features or Annotations." That's indeed something to keep
> in the back of our minds.
>
> Clearly, managing annotations on a mutable
> sequence is a problem. Which also occurs when you cut a piece of a
> sequence and return a new instance. I can clearly see that a separate
> class would be in charge of the job. This would be the job of one the
> BCTool, right?

I suppose so, I haven't looked into that yet.

>
>
> So, now the 1 million dollars question: you did not tell yet what you
> think of the whole class cluster design and if you like the idea...;-)
> (I have to say the more I discuss it, the more I like it, just for the
> simplicity and the level of abstraction it would give to the users of
> the framework...)
>

I don't know yet. One thing I didn't like that came out of the discussion is the possibility for the user to use strong typing. If we can make the class cluster in such a way that the user only sees BCSequence, then I am all for it. In that case I could even be persuaded to have both mutable and immutable versions of each subclass.

cheers,

- Koen.
From peter.schols at bio.kuleuven.ac.be Fri Jan 7 05:27:48 2005
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Fri, 7 Jan 2005 11:27:48 +0100
Subject: [Biococoa-dev] BCSequence class cluster

Hi Charles,

First of all, happy New Year to you too!

Thanks a lot for all the work, both the coding and the research you did about the future directions of BioCocoa. I'm a big fan of the Class cluster approach as this keeps the interface very simple. The biggest problem with this approach - as I see it now - is that some BCTools will only work with/on some sequence types. In that respect, I'd prefer your proposal to provide an additional set of headers defining some public classes as placeholders over the protocol approach. The placeholder approach will make/keep code much more readable indeed.

It seems that the mutability problem can be solved by either the subclasses or the mutable variant. While the mutable variant will reduce the number of classes, it will make the code in these classes less readable (depending on the number of optimizations we decide to implement). I think something could be said for either solution, I don't really have an opinion about this one.

Thanks again for your valuable contribution to BioCocoa!

Best wishes,

Peter

From jtimmer at bellatlantic.net Fri Jan 7 11:09:03 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 07 Jan 2005 11:09:03 -0500
Subject: [Biococoa-dev] BCSequence class cluster

Hi all -

Let me add my happy new year wishes to everyone as well. I've followed this discussion as carefully as I could, given a ton of scientific writing I've got to do (plus a side programming project related to the digital camera I got for christmas).
I think it might be useful for me to try to summarize what I think I understand and add a comment or two, and then let you guys tell me where I'm wrong -

For the purposes of discussion, I'm going to use "users" to mean "developers using the framework, but not working on its implementation" and "developers" to mean us.

A class cluster is a potentially good thing, in that it would hide some of the complexity of the implementation from users.

It doesn't necessarily make Koen happier, since we may well have all the current classes used internally. As far as I can tell, though, nothing short of me getting around to implementing BCSequenceNucleotide would solve Koen's biggest gripe, the duplication of three methods in the DNA and RNA subclasses.

We could still make Alex and me happy by defining protocols for sequence sub-types, and using them in the methods that act on specific sequence types, like complementation and hydrophobicity. This way, errors can still be flagged at compile time, instead of taking the app down at runtime.

None of us has ever done this, so we'd kind of be making it up as we go along.

Is that about right?

Now, for some comments:

My wife's out of town this weekend and I'm sick of reading my own writing, so I may actually implement BCSequenceNucleotide on Sunday.

Since only the headers are going to be clearly visible to users, could we make sure that there are comments in the header that indicate if a method is a convenience call through to a different class and, if so, what class to find the critical method in?

Now, for the important comment: I think for major design issues like this, we could do worse than look at how Apple implements things. I went to a Tiger Tech Talk recently, and it's clear that the folks working there put a tremendous amount of thought into design, and in some cases plan designs to account for technology that's over a year away from being implemented. Given that, it may be useful to look at where Apple uses class clusters.
As far as I can tell, they use them only in situations where there are multiple Core Foundation objects for which they want to provide a single, simple object-oriented wrapper that hides the implementations from ObjC users. The primary advantage of this seems to be that it allows the class to swap the CF objects that hold the actual data as needed without the user being bothered with the implementation. I don't think that we have an analogous situation here, though I'm not positive about that. Given that, I think we should proceed with caution, and perhaps ask Apple's Cocoa Dev list for their opinions on when to use a class cluster.

Okay, better go check my plates -

Cheers,

John

_______________________________________________
This mind intentionally left blank

From charles.parnot at stanford.edu Fri Jan 7 13:50:27 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Fri, 7 Jan 2005 10:50:27 -0800
Subject: [Biococoa-dev] BCSequence class cluster was:(no subject)
In-Reply-To: <93494E56-604B-11D9-8F75-000A95685F72@earthlink.net>
References: <93494E56-604B-11D9-8F75-000A95685F72@earthlink.net>

>What about using _BCSequence instead of BCSeq? With the underscore we know exactly what type of class we're dealing with, BCSeq <-> BCSequence seems more confusing to me.

That indeed may be a better naming scheme, though I have never seen that applied anywhere (a bit unorthodox?). It seems that ivars starting with an underscore are supposedly reserved for Apple now, but these would be class names, not ivars, so technically, we would not be breaking the rules!

>>Yes, I understand that. I discussed it in the initial super long email (part 7, I believe?!? Even I have some trouble remembering it all...). The reason why I picked BCSequence for the superclass name is because this would become the only public class, and I thought the name BCSequence would be a better public name than BCSymbolList.
>
>Yes, I think that would be the best approach. Can we make BCSymbolList a super of the class cluster (just as NSValue is the super of NSNumber)?

Yes, we can certainly do that, and it seems it would be the only way to keep BCSymbolList around. It may not be used very much... actually just as NSValue is not used very much. The purpose of NSValue is mostly to allow further subclassing in future implementations (this is just a guess, Apple is not very clear on that). I don't know if there would be such uses, but it is worth the discussion. I also agree with the discussion below and we may end up dumping BCSymbolList anyway! (at least as a concrete subclass)

(about the purpose of BCSymbolList)
>Not sure yet how the user should approach this :). In general there should be one class that is always used throughout. Now that I think of it, BCSequence is really the best choice for it. Annotations are a nice extension, but not always needed. Maybe we should dump BCSymbolList altogether, and only use BCSequence. It can have an empty annotation ivar, which is not very costly. One of the reasons to introduce the BCSymbolList was to avoid the introduction of even more subclasses with long, clumsy names such as BCAnnotatedDNASequence.

OK, so it seems the main purpose was to provide a light-weight version for non-annotated sequences. It seems at some point, when adding annotations, the team decided to subclass BCSequence into BCAnnotatedSequence. The renaming then was a way to avoid long class names (that I remember now from reading the archives). I totally agree that an ivar for annotation is not going to use much memory if nil (4 bytes) or empty (4 bytes + sizeof(ivars of NSArray)), particularly when you already have an NSArray for the sequenceArray that should contain in 99% of cases at least 10 items (peptides), and most of the time many more (think plasmids!). So the "light-weight" class argument is actually probably wrong.
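As a rough illustration of why an unused annotation ivar costs so little, here is a minimal sketch of a sequence object whose annotation storage stays nil until the first annotation is added. Everything in it is hypothetical: the class name SketchSequence, its ivars and its methods are invented for this example and are not BioCocoa's actual BCSequence API. It also assumes a modern compiler with ARC enabled; under manual reference counting, a -dealloc releasing both ivars would be needed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class for illustration only; not BioCocoa's BCSequence.
@interface SketchSequence : NSObject {
    NSArray *symbols;                // the sequence itself, always allocated
    NSMutableArray *annotationList;  // nil (a single pointer) until first used
}
- (id)initWithSymbols:(NSArray *)someSymbols;
- (NSUInteger)length;
- (void)addAnnotation:(id)annotation;
- (NSArray *)annotations;
- (BOOL)hasAnnotations;
@end

@implementation SketchSequence

- (id)initWithSymbols:(NSArray *)someSymbols {
    self = [super init];
    if (self) {
        symbols = [someSymbols copy];
        // annotationList is deliberately left nil here.
    }
    return self;
}

- (NSUInteger)length {
    return [symbols count];
}

- (void)addAnnotation:(id)annotation {
    if (annotationList == nil) {
        // Pay for the array only when an annotation is actually added.
        annotationList = [[NSMutableArray alloc] init];
    }
    [annotationList addObject:annotation];
}

- (NSArray *)annotations {
    // Return an empty array rather than nil for unannotated sequences.
    return annotationList ? [annotationList copy] : [NSArray array];
}

- (BOOL)hasAnnotations {
    // Messaging nil returns 0, so no special case is needed.
    return [annotationList count] > 0;
}

@end
```

With this layout an unannotated sequence carries only one extra nil pointer, which is negligible next to the symbols array itself.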
However, I could see the case for code separation, the superclass without annotation taking care of the non-annotation methods, and then the annotated subclass taking care of the annotation-related methods. In this case, the current BCSymbolList could be an abstract class and remain hidden in the class cluster as a private subclass, at the top of the class hierarchy, just below the public class (or it could BE the public class; this goes back to my previous emails with the different propositions for class hierarchies). We end up with one public class being able to do everything, and all the concrete private subclasses capable of handling annotations.

If the purpose is to separate the code for annotations, an alternative is to use a category, and not a subclass. And I think I like that better, actually. My feeling after all these discussions about annotations is that we should be able to make annotations a general and abstract concept that encompasses all the different types of sequences, which means most of the code could go in the superclass. Just like you already did with BCSymbol being able to take most of the details away from BCSequence.... OK, I already said all of that before in previous emails, I know;-)

>>Sorry for being so 'dogmatic' in the previous email. It is now clear we have a very similar view of the whole thing! We all want to have a minimal number of methods in the subclasses. In a few cases, optimized versions may still be possible with this design.
>
>Sounds good, just as long as this is hidden for the user, in other words, in those special cases, the user should still only use BCSequence, not the specialized subclass.

Then we agree totally! :-)

>>So, now the 1 million dollars question: you did not tell yet what you think of the whole class cluster design and if you like the idea...;-)
>>(I have to say the more I discuss it, the more I like it, just for the simplicity and the level of abstraction it would give to the users of the framework...)
>>
>
>I don't know yet. One thing I didn't like that came out of the discussion is the possibility for the user to use strong typing. If we can make the class cluster in such a way that the user only sees BCSequence, then I am all for it. In that case I could even be persuaded to have both mutable and immutable versions of each subclass.

I don't like strong typing either. As I said before, Alex does not seem to like it either. It also seems you don't, and it may have driven your argument against subclasses: you seem to prefer general methods that do something (including run-time errors?) on all types of sequences, even if not always relevant. The class cluster design is then a good fit.

Peter still thinks there is a case in favor of stronger typing to restrict the irrelevant tasks as much as possible. I will discuss it in a separate email. This one is long enough!

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Fri Jan 7 17:20:21 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Fri, 7 Jan 2005 14:20:21 -0800
Subject: [Biococoa-dev] BCSequence class cluster

>Hi Charles,
>
>First of all, happy New Year to you too!
>
>Thanks a lot for all the work, both the coding and the research you did about the future directions of BioCocoa. I'm a big fan of the Class cluster approach as this keeps the interface very simple. The biggest problem with this approach - as I see it now - is that some BCTools will only work with/on some sequence types. In that respect, I'd prefer your proposal to provide an additional set of headers defining some public classes as placeholders over the protocol approach. The placeholder approach will make/keep code much more readable indeed.
OK, let's try to discuss the strong/weak typing issue in more details . I will repeat a lot of what we all discussed before... but hopefully put it in a broader context. I don't know yet where this whole discussion will get me (and us), so many ideas/problems/designs are all coming to mind. First, if we go back again to the BCSequence, we have 2 options, regardless of class cluster: * we only provide the user with one public class, either with a class cluster or a general class that handle the different cases; how to handle the irrelevant cases is debatable: bluntly (returning nil), silently (returning self or empty objects or even [NSNull null]), with a Biococoa-specific error reporting system (which could be useful anyway at some point in the future), or with runtime errors (this would not be very nice!) * you provide public headers for all the different types of sequence; and you don't let the user get a sequence object BCSequence, without a strong type, like you get with the method [BCSequence sequenceWithString:aString]; so you provide the user with all the objects BCSequenceDNA, BCSequenceRNA, BCSequenceCodon, BCSequenceProtein, and the user chooses which one to use; this gives more work to the user, but also more control and thus potentially more flexibility (and BCSequenceNucleotide could fit in the picture too) The choice depends on 2 things: * what the user wants; this is difficult; we are all potential users, I suppose; maybe one hint is that the user of BioCocoa would also be a user of Cocoa, and would be used to simple interfaces where she does not have to handle the details and can get results in 2 lines of code * what the developers can do and how much work it requires; my feeling is in both designs, there will have to be some compromises, and not everything will be perfect; but it seems both options are feasible, the class cluster potentially requiring more careful planning (but once established, not more complicated than other approaches, or maybe 
even simpler); the amount of code would be roughly equivalent in both cases; the simple interface would make code maintenance easier (at least in terms of testing) and would improve backward compatibility; on the other hand, the more complex public interface would make our life easier when we want to provide more radical extensions

Again, the reason why I came up with the idea of some public headers for placeholder classes for typed sequences was to offer the user BOTH OPTIONS! (but maybe we should not). Regardless of the design, some kind of trick is necessary, because you cannot allow BCSequence to respond to all the messages, then prevent some of the subclasses from responding to some of the messages, and still have some compiler warnings. If you already have a class cluster design in place, having some public headers for placeholder classes is one possibility, or adding some formal protocols (Alex, Peter and I find that a bit hard for the user).

But actually, if you think more about it, maybe it could also be the other way around. You would have an abstract superclass (BCSequenceAbstract) and some public sequence-specific subclasses (BCSequenceDNA,...) and then provide an additional generic subclass BCSequence that would respond to all messages in its header. One possible implementation for that subclass is to have just one ivar = an instance of one of the subclasses, and only implement -forwardInvocation: to handle the messages. So, depending on which initial option you favor, the addition of the other option is always possible, but the final result is different...

OK, so we have two choices for the interface. There might be ways to provide both choices to the user, but we probably would not want to do that as a first version and we have to choose anyway.

Now, to go beyond the BCSequence implementation, you raise the issue of the BCTools implementation.
And it strikes me now that they are very much interrelated and that there are other design issues with BCTools and that they closely relate to BCSequence. We have now another choice to make:

* to perform an operation on a BCSequence, the user has to use one of the BCTools; the simple BCTools might provide some convenience class methods like
  + (BCSequence *)complementForSequence:(BCSequence *)aSequence
but more complex tools will be used by alloc/init, then setting some parameters with some accessor methods, and then calling a 'result' method on the tool; the interface to the BCTools is public

* to perform an operation on a BCSequence, the user has to use a BCSequence method, such as 'complement', or 'cutWithEnzyme:' or 'weigh' or ...; and the user does not even have to know that BCTools exist!

The latter gives a very simple interface, all within BCSequence. However, while the concept of sequence is abstract enough to be put in one class, I don't think the concept of tools or operations on a sequence is simple enough to have the whole interface fit in the context of BCSequence:

- the tools have very different levels of complexity, from 'reverse' to 'align'
- some tools might use objects other than BCSequence
- some operations might require setting many parameters, some optional, some required (like alignments); that could be hard to fit in just one method of BCSequence (though, obviously, NSDictionaries would help to pass a bunch of arguments at once)
- batch processing on an array of sequences will not be possible in such a design (and we would lose potential optimizations); with BCTool, you can just set all the parameters and then run it on several sequences at once
- another point is that the BCSequence interface might look bloated, though it is not necessarily an issue and there are precedents: just look at NSWindow!
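To make the two calling styles above concrete, here is a minimal sketch, not the actual BioCocoa code. It assumes a toy BCSequence that only wraps a string, and a hypothetical BCToolComplement with a 'reverse' parameter; the property syntax and ARC are those of a current compiler, newer than the code under discussion. The tool is used through alloc/init, an accessor and a 'result' method, and BCSequence offers a convenience method that simply calls through to the tool.

```objective-c
#import <Foundation/Foundation.h>

// Toy stand-in for BCSequence: it only wraps a string of symbols.
@interface BCSequence : NSObject
@property (nonatomic, copy, readonly) NSString *sequenceString;
+ (instancetype)sequenceWithString:(NSString *)aString;
- (BCSequence *)complement;   // convenience method; the work is done by the tool
@end

// Hypothetical tool: alloc/init, set parameters through accessors, then ask for the result.
@interface BCToolComplement : NSObject
@property (nonatomic, assign) BOOL reverse;   // optional parameter
- (instancetype)initWithSequence:(BCSequence *)aSequence;
- (BCSequence *)result;
+ (BCSequence *)complementForSequence:(BCSequence *)aSequence;
@end

@implementation BCSequence
+ (instancetype)sequenceWithString:(NSString *)aString {
    BCSequence *seq = [[self alloc] init];
    seq->_sequenceString = [aString copy];
    return seq;
}
- (BCSequence *)complement {
    return [BCToolComplement complementForSequence:self];
}
@end

@implementation BCToolComplement {
    BCSequence *_sequence;
}
- (instancetype)initWithSequence:(BCSequence *)aSequence {
    if ((self = [super init])) {
        _sequence = aSequence;
    }
    return self;
}
- (BCSequence *)result {
    NSDictionary *pairs = @{ @"A": @"T", @"T": @"A", @"G": @"C", @"C": @"G" };
    NSString *source = _sequence.sequenceString;
    NSMutableString *out = [NSMutableString stringWithCapacity:source.length];
    for (NSUInteger i = 0; i < source.length; i++) {
        NSString *symbol = [source substringWithRange:NSMakeRange(i, 1)];
        NSString *partner = pairs[symbol];
        if (partner == nil) partner = symbol;   // unknown symbols pass through unchanged
        if (self.reverse) {
            [out insertString:partner atIndex:0];
        } else {
            [out appendString:partner];
        }
    }
    return [BCSequence sequenceWithString:out];
}
+ (BCSequence *)complementForSequence:(BCSequence *)aSequence {
    return [[[self alloc] initWithSequence:aSequence] result];
}
@end

int main(void) {
    @autoreleasepool {
        BCSequence *seq = [BCSequence sequenceWithString:@"ATGC"];

        // Style 1: the tool is public and configured explicitly.
        BCToolComplement *tool = [[BCToolComplement alloc] initWithSequence:seq];
        tool.reverse = YES;
        NSLog(@"%@", [tool result].sequenceString);       // GCAT

        // Style 2: the user only ever sees BCSequence.
        NSLog(@"%@", [seq complement].sequenceString);    // TACG
    }
    return 0;
}
```

In this sketch the second style costs one extra line per operation in BCSequence, while anything that needs parameters (here, 'reverse') still goes through the tool.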
So my impression is that the BCTools interface will have to be public to a certain extent (and for simple, obvious tools like translation, some convenience methods will be included in the BCSequence interface, the job really being done by a BCTool).

Now, the user will have to provide the BCTool with a BCSequence. Depending on the design chosen for BCSequence (one public class or several typed subclasses), the interface for the BCTools will be quite different, and the implementation as well:

* with just one BCSequence, there is only the need for one 'init' method, namely 'initWithSequence:'; the code may have to decide what to do depending on the 'sequenceType' (the good thing is BCSequence does not have to decide, so if a convenience method is provided in BCSequence, it can be at the level of the superclass). So there might be some 'if' and 'case' statements involved here. In certain cases, it might get very difficult to stick to a simple general tool able to handle all sequence types with just one class. One such example is alignment. A BCToolAlignement would be very different for a protein and a DNA. We may then have to provide two tools in two separate classes and have more stringent rules (the user will have to be more careful then), but we don't have to give up the simplicity of BCSequence, I think. Anyway, alignments are really a very elaborate thing that may fall out of the BCTool paradigm.

* with several BCSequence public subclasses, we would have to enforce typing for the 'init' methods of the tools, with some 'initWithDNASequence' and equivalent. With tools able to handle several sequence types, we could need several init methods, one for each sequence type. Another problem is the output: the type will depend on what was entered.
For instance, a BCToolComplement could be fed a BCSequenceDNA or a BCSequenceRNA with the 'initWithDNASequence' or 'initWithRNASequence' method, but then we would need a 'complementDNASequence' and a 'complementRNASequence' method to retrieve the result to keep the strong typing in the user code (sorry, the BCToolComplement is a dull example, as it can be and is handled by BCSequence; also, a simple possibility here is to use a BCSequenceNucleotide, as suggested by John; there might not be so many cases where the problem would arise, I am not sure).

* what you are saying, Peter (finally I comment on your comment!), is that to solve these dilemmas, we implement a class cluster together with some typed classes that will only be used in certain tools; I did not mean to go that far in the discussion when I started that email, but these issues will have to be debated, will have some impact on the design of BCSequence, and some design decisions will have to be taken for the BCTools as well.

>It seems that the mutability problem can be solved by either the subclasses or the mutable variant. While the mutable variant will reduce the number of classes, it will make the code in these classes less readable (depending on the number of optimizations we decide to implement). I think something could be said for either solution, I don't really have an opinion about this one.

OK, it seems the mutability/immutability issue could and should be put on the back burner for now. The code may still have to provide an interface now to allow future implementations to take place later, like I explained in my recent email in response to Alex.
Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Fri Jan 7 19:23:35 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Fri, 7 Jan 2005 16:23:35 -0800
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References:
Message-ID:

>Hi all -
>
>Let me add my happy new year wishes to everyone as well. I've followed this
>discussion as carefully as I could, given a ton of scientific writing I've
>got to do (plus a side programming project related to the digital camera I
>got for christmas). I think it might be useful for me to try to summarize
>what I think I understand and add a comment or two, and then let you guys
>tell me where I'm wrong -
>
>For the purposes of discussion, I'm going to use "users" to mean "developers
>using the framework, but not working on its implementation" and "developers"
>to mean us.

Yes, this is a very simple and useful convention!

Sorry I will answer the email in a different order and answer the final comment first (mostly to justify the length of my emails today)

>Now, for the important comment: I think for major design issues like this,
>we could do worse than look at how Apple implements things. I went to a
>Tiger Tech Talk recently, and it's clear that the folks working there put a
>tremendous amount of thought into design, and in some cases plan designs to
>account for technology that's over a year away from being implemented.
>
>Given that, it may be useful to look at where Apple uses class clusters. As
>far as I can tell, they use them only in situations where there are multiple
>Core Foundation objects that they want to provide a single, simple object
>oriented wrapper that hides the implementations from ObjC users.
>The primary advantage of this seems to be that it allows the class to swap the
>CF objects that hold the actual data as needed without the user being
>bothered with the implementation.
>
>I don't think that we have an analogous situation here, though I'm not
>positive about that. Given that, I think we should proceed with caution,
>and perhaps ask Apple's Cocoa Dev list for their opinions on when to use a
>class cluster.

I totally agree that such a big design decision should be carefully questioned and then carefully planned, be it class clusters or not. It seems to me that with the core of BCSequence in place, and the arrival of annotations and of BCTools, some design decisions have to be taken anyway. I did raise an additional set of questions and pointed to some issues, coming after many of the discussions you had before. It seems you already had to take some decisions in the past, and more are coming!

All these discussions are useful (!), even though right now, it seems to be all theoretical and no decision has been taken. In fact, one could get the feeling that such discussions could go on forever, and stall the project. I agree with you that the current design decisions could be critical for the future, so one or two weeks of discussions are no big deal. This way, we can see where the questions are, and get a sense of the priorities before coming up with a roadmap, and also a set of yes/no design questions to take a decision on. And then all vote, and then Peter makes the decision... I am just guessing at the current process for decision making ;-)

Like you say, we have to foresee all the reasonable possibilities. We have one great tool, which is to look at other implementations of the various BioX efforts. It does not mean we have to copy the design (though part of it can probably be ripped off). The language is different, of course, and the possibilities it offers, and the habits that come with it. But also the usage is different.
Cocoa users are used to some designs and have different expectations than perl users. The latter write quick, flexible small scripts, that they may very well dump after 2 weeks. The Cocoa developer wants to write a long-lasting application, starting with a simple layout, where more details can be added later. What the other BioX efforts can show us is what kind of tools can exist, what type of sequence can exist, what kind of annotations, and how they all play together. I personally don't have much knowledge of the other BioX efforts. How about you guys?

Regarding class cluster. To quote Apple, 'The grouping of classes in this way simplifies the publicly visible architecture of an object-oriented framework without reducing its functional richness'. I did not know so much about the CF types, I had just read about class clusters as a way to provide optimized code adapted to the size of the object (or at least have the possibility to do so at some point). This is true of NSData, NSString, NSArray and NSDictionary, which may vary a lot in size. NSNumber is quite a different story and is probably closer in spirit to what a BCSequence class cluster would be. So that could be a better reference. Asking the cocoadev mailing list (or a discussion on cocoadev.com) is a very good idea. I thought about it at some point. Ultimately, because WE are the biologists, only we can decide whether DNA and protein sequences are as close to each other as, e.g., int and float (cf. NSNumber).

To go further, let's go very far (or not so far) into the future. What could BioCocoa be? For me, it seems it could do to sequences what the WebView does with web pages. Thanks to the simplicity and power of WebView, with two lines of code (or even just a few links in a nib), you get a web browser. Imagine the same with BioCocoa. A nib with a BCSequenceView in a window, and a few menu items like 'complement', 'reverse', etc...
Then a few lines of code in a controller would allow the user to load a sequence from file, choose complement in the menu and get a new window with the complement. The developer of that app (I can't call him the user anymore, sorry!) would not have to know which type of sequence the view is dealing with. So it would just forward the 'complement' calls to the sequence in the view and pop up a new view with the returned sequence, no questions asked. What if the user of the app opens a protein sequence, and chooses complement in the menu? What should the user of the app expect? Well, the user should not be surprised to get the same sequence back, or some empty window, or nothing. The developers of BioCocoa are not to blame, the developer of the app is not to blame, the user of the app can only blame himself for that and if he does not understand what is happening, he should probably not use that app!! In the meantime, BioCocoa has made the life of the developer of the app very easy; it took less than an hour to build a good-looking app; and should more types of sequence be added in the framework, no need for any change in the code. I love that story :-)

******* now the other stuff...

>A class cluster is a potentially good thing, in that it would hide some of
>the complexity of the implementation from users. It doesn't necessarily
>make Koen happier, since we may well have all the current classes used
>internally. As far as I can tell, though, nothing short of me getting
>around to implementing BCSequenceNucleotide would solve Koen's biggest
>gripe, the duplication of three methods in the DNA and RNA subclasses.

I agree that BCSequenceNucleotide could help at some point, no matter what design is chosen.

>Since only the headers are going to be clearly visible to users, could we
>make sure that there are comments in the header that indicate if a method is
>a convenience call through to a different class and, if so, what class to
>find the critical method in.
Yes, this is true for any class hierarchy you are building, but probably even more critical for a class cluster.

>None of us have ever done this, so we'd kindof be making it up as we go
>along.
>
>Is that about right?

Yep, looks like it! Actually, I have implemented a class cluster in some of my code. In my (limited) experience, it is a nice design and very simple to use once you get past the conceptual alloc/init trick. It is also a great example of how flexible you can get with typing. I knew that the compiler warnings were just syntactic sugar (I hope this is the appropriate expression), but the possible manipulations at runtime are infinite. For instance, in the class cluster design, there is absolutely nothing that prevents you from returning instances of objects that are NOT subclasses of the public class. The object returned could be from a completely unrelated class; at runtime, they are all ids. Of course, you can easily shoot yourself in the foot too...

>We could still make Alex and I happy by defining protocols for sequence
>sub-types, and use them in the methods that act on specific sequence types,
>like complementation and hydrophobicity. This way, errors can still be
>thrown at compile time, instead of taking the app down at runtime.

Protocols are definitely a possibility, as long as there are not too many cases where they are needed.
What I had in mind when proposing protocols to handle strong typing without giving up the single interface was something like:

//public superclass of the class cluster
@interface BCSequence : NSObject {}
+(BCSequence *)sequenceWithString:(NSString *)aString;
-(BCSequence *)complement;
-(NSNumber *)hydrophobicity;
@end

//these methods should be implemented but not put in a public header
//these methods should be implemented ONLY in the relevant private subclasses
//so that a runtime error is generated when called on the wrong type
@interface BCSequence (Private)
-(BCSequence *)dnaComplement;
-(NSNumber *)proteinHydrophobicity;
@end

//the method will actually return a BCSequence at runtime
//but the compiler does not know
@protocol BCSequenceDNA
-(id <BCSequenceDNA>)complement;
@end

@protocol BCSequenceProtein
-(NSNumber *)hydrophobicity;
@end

//a category to provide strong typing
//the instance returned will be BCSequence at runtime
//but the compiler does not know
@interface BCSequence (BCSequenceStrongTyping)
+(id <BCSequenceDNA>)dnaSequenceWithString:(NSString *)aString;
+(id <BCSequenceProtein>)proteinSequenceWithString:(NSString *)aString;
@end

...
NSNumber *h;
BCSequence *seq;
id <BCSequenceDNA> dna;

//when it is a BCSequence, a DNA sequence can use the generic methods
//and always get something back
seq=[BCSequence sequenceWithString:@"ATGCTAGACGAAT"];
seq=[seq complement];
//h is now nil, or [NSNull null], or NSNumber=0 (?)... or runtime error?
//and no compilation warning
h=[seq hydrophobicity];

//now strong typing
dna=[BCSequence dnaSequenceWithString:@"ATGCTAGACGAAT"];
dna=[dna complement];
//compilation warning
h=[dna hydrophobicity];
...

I tried several alternatives and it was not easy to choose the right way of doing things. Is that what you were thinking of? It seems you suggest even removing 'complement' and 'hydrophobicity' from the header of BCSequence. I just realized that then there is no need to hide the subclasses and build the artificial protocol thing. So I suppose you want to keep the BCSequence header with all the methods.
In that case taking the app down is not a good answer, and the results should always be something not too stupid.

I just thought of an analogy to throw in the discussion. NSString has the path methods like 'stringByAppendingPathExtension'. These will work on ANY string, even if it is not a path, but the contents of this email. However, it does not make sense. Do we get a compiler warning? no. Do we get a runtime error? no. Are we in trouble? yes. What the f... is this string doing here when I should have a path??? This is clearly the fault of the user, here, not of the guy who designed NSString. OK, this is really a much simpler situation than ours, but still.

Ok, enough blabla for today. Have a good night, guys!

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From jtimmer at bellatlantic.net Fri Jan 7 19:48:15 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Fri, 07 Jan 2005 19:48:15 -0500
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
Message-ID:

Just a quick comment:

> I just thought of an analogy to throw in the discussion. NSString has the path
> methods like 'stringByAppendingPathExtension'. These will work on ANY string,
> even if they are not path, but the contents of this email. However, it does
> not make sense. Do we get a compiler warning? no. Do we get a runtime error?
> no. Are we in trouble? yes. What the f... this string is doing here when I
> should have a path??? This is clearly the fault of the user, here, not of the
> guy who designed NSString. OK, this is really a much simpler situation than
> ours, but still.

Yeah, this analogy doesn't really work in terms of taking into account why Alex and I worry. You can add a .tiff to a non-path, and the result is still a string.
You might get unexpected behavior when you used it, but you'd have to do something convoluted to get your app to crash as a result. If you ask for a complement from a protein, you'll either get nil or something with a sequenceArray count of 0. Either of these makes it very easy for a user to crash the app.

That's why I feel (and as of last check, Alex did as well) that things should be structured so that asking for the complement of a sequence object should generate a compiler warning if it's not a nucleotide sequence (and likewise for other sequence type specific methods). It's the user-friendly thing to do. If we're doing that, then we should definitely have a header that informs users of what types of sequence respond to what messages.

I find it personally more satisfying to explicitly type my sequence variables so that the code is easier to follow and the right methods pop up in code sense, but I recognize that that's probably a personal taste.

Cheers,

John

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Fri Jan 7 19:54:24 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 7 Jan 2005 19:54:24 -0500
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References:
Message-ID:

Hi John,

Happy new year, and good to see you're still alive :)

> A class cluster is a potentially good thing, in that it would hide some of
> the complexity of the implementation from users. It doesn't necessarily
> make Koen happier, since we may well have all the current classes used
> internally.

If they are all hidden inside a class cluster, it looks like a good compromise to me.

> As far as I can tell, though, nothing short of me getting
> around to implementing BCSequenceNucleotide would solve Koen's biggest
> gripe, the duplication of three methods in the DNA and RNA subclasses.
Actually the code in BCSequenceDNA and BCSequenceRNA has been cleaned up a lot by putting many methods in BCTools, so it is not an issue for me anymore :) So I am not sure at this point what would be the benefit of adding an additional BCSequenceNucleotide class.

> I don't think that we have an analogous situation here, though I'm not
> positive about that. Given that, I think we should proceed with
> caution,

Those are also my thoughts on a class cluster. It looks like a very nice idea, but it's still unclear to me if our situation is a good example for the introduction of class clusters.

- Koen.

From mek at mekentosj.com Sat Jan 8 07:37:43 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 8 Jan 2005 13:37:43 +0100
Subject: [Biococoa-dev] BCSequence class cluster was:(no subject)
In-Reply-To:
References: <93494E56-604B-11D9-8F75-000A95685F72@earthlink.net>
Message-ID: <17E9C158-6172-11D9-9A51-000D93AE89A4@mekentosj.com>

>> What about using _BCSequence instead of BCSeq? With the underscore
>> we know exactly what type of class we're dealing with, BCSeq <->
>> BCSequence seems more confusing to me.
>
> That indeed may be a better naming scheme, though I have never seen
> that applied anywhere (a bit unorthodox?). It seems that ivars starting
> with underscore are supposedly reserved for Apple now, but these would
> be class names, not ivars, so technically, we would not be breaking the
> rules!

I like the underscore variant as well. Charles, I heard that too, in principle ivars with underscores are reserved to Apple, although I've seen many opensource projects already where people use underscores for private ivars...

> (about the purpose of BCSymbolList)
>> Not sure yet how the user should approach this :). In general there
>> should be one class that is always used throughout. Now I think of
>> it, BCSequence is really the best choice for it. Annotations are a
>> nice extension, but not always needed.
>> Maybe we should dump BCSymbolList at all, and only use BCSequence.
>> It can have an empty annotation ivar, which is not very costly.
>> One of the reasons to introduce the BCSymbolList was to avoid the
>> introduction of even more subclasses with long, clumsy names such as
>> BCAnnotatedDNASequence.

Yes, I agree, if we can get rid of BCSymbolList and only have one class (BCSequence) that would have my preference too. Just my stupidity, but adding many methods to a class (which we need to keep annotations in sync and add/remove them etc) doesn't make it more costly to use, memory/speed wise?

> OK, so it seems the main purpose was to provide a light-weight version
> for non-annotated sequence. It seems at some point, when adding
> annotations, the team decided to subclass BCSequence into
> BCAnnotatedSequence. The renaming then was a way to avoid long class
> names (that I remember now from reading the archives) .

Correct.

>
> I totally agree that an ivar for annotation is not going to use much
> memory if nil (4 bytes) or empty (4 bytes + sizeof(ivars of NSArray)),
> particularly when you already have an NSArray for the sequenceArray
> that should contain in 99% cases at least 10 items (peptides), and
> most of the time many more (think plasmids!). So "light-weight" class
> is actually probably wrong. ;-)
>
> However, I could see the case for code separation, the superclass
> without annotation taking care of the non-annotation methods, and then
> the annotated subclass taking care of the annotation-related methods.

Exactly my point, I'm not sure though. The first (and foremost) question is whether putting everything in one class matters memory/speed wise (I'm not talking about the small unused ivars). Second, how much will the sequence class grow code-wise, to a point where things become impractical?
For example, I have experienced opensource projects where classes have become so extended that you can't, say, extract one class for use in a custom project without spending hours unwinding everything unnecessary, where with one class you basically have to incorporate the whole project to make it work. Maybe not so much of importance, but usually the simpler the code and the easier to overview, the more useful and versatile it turns out to be.

> In this case, the current BCSymbolList could be an abstract class and
> remain hidden in the class cluster as a private subclass, at the top
> of the class hierarchy, just below the public class (or it could BE the
> public class; this goes back to my previous emails with the different
> propositions for class hierarchies). We end up with one public class
> being able to do everything, and all the concrete private subclasses
> capable of handling annotations.
>
> If the purpose is to separate the code for annotations, an alternative
> is to use a category, and not a subclass. And I think I like that
> better, actually.

As said before, I'm not really in favor of categories as they don't really belong in a framework, BUT as they are private, things may be different.

> My feeling after all these discussions about annotations is that we
> should be able to make annotations a general and abstract concept that
> encompass all the different types of sequences, which means most of
> the code could go in the superclass.

Yes!
Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

4Peaks - For Peaks, Four Peaks.
2004 Winner of the Apple Design Awards
Best Mac OS X Student Product
http://www.mekentosj.com/4peaks
*********************************************************

From mek at mekentosj.com Sat Jan 8 07:45:50 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 8 Jan 2005 13:45:50 +0100
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To: <931424C0-5F6A-11D9-BC51-003065A5FDCC@earthlink.net>
References: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com> <931424C0-5F6A-11D9-BC51-003065A5FDCC@earthlink.net>
Message-ID: <39E56ABA-6173-11D9-9A51-000D93AE89A4@mekentosj.com>

Man, I'm running behind, this one goes really back, but I still had a few comments....

>> I'm still wondering a bit how we're going to implement these kind of
>> methods, as we now have to start ALL methods with a test what the
>> sequence type is.

As Charles pointed out, I'm wrong here as all subclasses can/will handle things differently.

>
> You only need to put the test in the init of the BCTools subclasses.
> No need to keep them in BCSequence, et al. Another possible solution
> could be to create a DNATools/ProteinTools class, which has
> convenience methods to various manipulations only for that type of
> sequence (this is the BioJava approach - I like the first one better,
> though).

The comment I had was one I've expressed before: I would strongly argue against DNA/ProteinTools for every thinkable manipulation of your sequence objects. Although I have to admit I haven't practically used BioJava much, one thing that annoyed me in their examples was that for even the simplest manipulation of your sequences you have to instantiate and use tools. I think tools are the way to go for complex things, but for things like reverse and complement I don't wanna think of needing tools for that, so un-Cocoa-like..
>> To implement mutable objects in the class cluster could be a bit
>> tricky, because there are two conflicting subclass organizations
>> here: mutable/immutable and dna/rna/protein/codon. To get all the
>> combinations, it seems that we need 8 subclasses!!
>> Oops, Koen won't like this, LOL ;-) On the other hand, look at the
>> number of NSNumber subclasses...
>
> No, I don't like that :D, see my comment above. Another possibility
> (also stolen from BioJava) is to make a BCToolsEdit class that takes
> care of editing a immutable class.

Although practically a valid option, again, this sounds weird, I think we've clearly made a bad design decision the moment you start editing immutable classes ?!

Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds
*********************************************************

From mek at mekentosj.com Sat Jan 8 07:54:27 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 8 Jan 2005 13:54:27 +0100
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com> <48A29064-5F59-11D9-9A51-000D93AE89A4@mekentosj.com>
Message-ID: <6DEA4191-6174-11D9-9A51-000D93AE89A4@mekentosj.com>

>> Therefore, I would propose to start with the mutable version, then
>> later we can always generate the immutable versions in addition for
>> optimization purposes.
>
> OK, I agree with that. It seems easier to implement the mutable
> version first.
> The only problem is that the addition of the immutable
> class could break existing programs in the future. More specifically,
> if a program keeps using BCSequence as mutable objects and then
> suddenly the framework decides they are now immutable and the user
> should now use BCMutableSequence, there will be a lot of rewrite
> necessary.

I see, and that we certainly don't want, I agree.

>
> An alternative is to call the public class BCMutableSequence instead
> now.

Actually, I like that idea very much, because the class simply is mutable and it allows a gap to be filled later.

> And not have a BCSequence class (at least no public class). So future
> introduction of BCSequence will not break old code. I know there is
> not so many applications relying on BioCocoa yet, but even if there
> are just one or two (like the examples in the CVS project), it might
> still be worth thinking about it now.
>
> Another alternative is to have both public headers for in fact the
> same implementation. The implementation would be very easy:
> * superclass BCSequence: header without the mutable-specific methods
> * subclass BCMutableSequence: header with the mutable-specific methods
> * the rest of the subclasses = below that
> Then, at least, the user gets some compiler warnings even if the
> mutable-specific methods actually will run fine on the immutable
> instances. Of course, there could still be cases not detected by the
> compiler and and that should be illegal at runtime (like if the user
> ignores the compiler warnings!!) and would still cause programs to
> break in the future. But that would be less frequent thanks to the
> compiler warnings.

Perhaps this is the best way then, really nice in fact (except that it might again require many subclasses but I never found that a problem).
Also, and maybe this is again too simply thought, remember that our "users" are developers who should do a lot of testing and debugging (ouch, that's bait for a hot discussion). One golden rule is never to ship a product with compiler warnings (I know that at Apple this is the case). My point of view (admittedly a simple one): if users start to ignore our warnings, then they are on their own.

> Whichever way we implement it in the future, we have to make sure that it is feasible. It looks like we already have several options and opinions, so that's a good sign! We can decide how to do it later. So the roadmap is:
> * do we want mutable and immutable objects? --> well, yeah, sure! (who would say no?)

YES!

> * can we do it in the future? --> apparently yes, we have several ideas (it does not mean it will be easy to maintain code, we have to make the best decision at some point, and Koen raised that important issue)

YES

> * can we do it in the future with the class cluster design? --> same answer, same ideas

YES

> * should we have something temporary already in place to ensure backward-compatibility? --> my vote: yes; what do you guys think?

YES

> * if yes, how? --> I proposed 2 ways

I would go for the 2nd one, with a mutable subclass

>> About the warnings, I'm not that much of a fan of adding flags like isMutable or isTyped. In the first case I would rather have a real immutable subclass, and in the second can't we just generate the runtime errors? In general those will surface in 99% of the cases in the development cycle and the developer can take countermeasures to prevent the end-user from doing stupid things.
>
> The problem with an immutable subclass is that it would inherit the methods in the mutable superclass, and thus the user would not get compiler warnings when using a mutable-specific method on an immutable class.
You're right, basically the mutable variant should always be a subclass of the immutable one and not the other way around. I think the 2nd option might work out the nicest. Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From mek at mekentosj.com Sat Jan 8 07:57:26 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 8 Jan 2005 13:57:26 +0100 Subject: [Biococoa-dev] (no subject) In-Reply-To: <5FCC29DE-5F8A-11D9-8F75-000A95685F72@earthlink.net> References: <5FCC29DE-5F8A-11D9-8F75-000A95685F72@earthlink.net> Message-ID: > BTW, the reason that BioJava uses immutable sequences is: " It is > worth noting that many BioJava implementations of Sequence and > SymbolList do not allow edit operations as this may invalidate > underlying Features or Annotations." That's indeed something to keep > in the back of our minds. And that scares me actually, because we all know it should be possible to keep annotations in sync: the attributed string. 
Apple managed to solve exactly this problem, so we should be able to come up with a way to have annotations in a mutable sequence class, instead of this lousy excuse ;-) Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 1365 bytes Desc: not available URL: From kvddrift at earthlink.net Sat Jan 8 07:58:40 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 8 Jan 2005 07:58:40 -0500 Subject: [Biococoa-dev] BCSequence class cluster In-Reply-To: <39E56ABA-6173-11D9-9A51-000D93AE89A4@mekentosj.com> References: <7AA9BE1B-5F0F-11D9-A29A-000D93AE89A4@mekentosj.com> <931424C0-5F6A-11D9-BC51-003065A5FDCC@earthlink.net> <39E56ABA-6173-11D9-9A51-000D93AE89A4@mekentosj.com> Message-ID: <0522E90C-6175-11D9-8F75-000A95685F72@earthlink.net> On Jan 8, 2005, at 7:45 AM, Alexander Griekspoor wrote: >> You only need to put the test in the init of the BCTools subclasses. >> No need to keep them in BCSequence, et al. Another possible solution >> could be to create a DNATools/ProteinTools class, which has >> convenience methods to various manipulations only for that type of >> sequence (this is the BioJava approach - I like the first one better, >> though). > > The comment I had was one I've expressed before that I would strongly > argue against DNA/ProteinTools for every thinkable manipulation of > your sequence objects. 
> Although I have to admit I haven't practically used BioJava much, one thing that annoyed me in their examples was that for even the simplest manipulation of your sequences you have to instantiate and use tools. I think tools are the way to go for complex things, but for things like reverse and complement I don't wanna think of needing tools for that, so un-Cocoa-like..

I agree, BioJava has a lot of good things, but they use way too many steps to get things accomplished. But maybe that's inherent to Java, which I hardly know (I can only read it). Or maybe when the framework gets bigger, these things turn out to be useful.

>>> To implement mutable objects in the class cluster could be a bit tricky, because there are two conflicting subclass organizations here: mutable/immutable and dna/rna/protein/codon. To get all the combinations, it seems that we need 8 subclasses!!
>>> Oops, Koen won't like this, LOL ;-) On the other hand, look at the number of NSNumber subclasses...
>>
>> No, I don't like that :D, see my comment above. Another possibility (also stolen from BioJava) is to make a BCToolsEdit class that takes care of editing an immutable class.
>
> Although practically a valid option, again, this sounds weird, I think we've clearly made a bad design decision the moment you start editing immutable classes ?!

Now I think of it, is there a good reason why we should have immutable sequences? The only one I can think of right now is that we have to be careful when annotations are present. If we can solve that, then IMO there is no need for both immutable and mutable variants. Charles, you know Perl, any idea how this is solved in BioPerl?

- Koen.
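[Editorial sketch] For readers following the mutable/immutable part of the thread, the "two public headers, one implementation" idea proposed earlier might look roughly like the following. This is only an illustration, not code from the BioCocoa project: initWithString:, sequenceString, appendString:, the _storage ivar and the BCDemoAppend helper are all hypothetical names, and the sketch assumes a modern Clang with ARC rather than the 2005 toolchain.

```objc
#import <Foundation/Foundation.h>

// Immutable public interface: only read-only methods are declared here.
// The storage ivar is protected so the mutable subclass can reach it.
@interface BCSequence : NSObject {
@protected
    NSMutableString *_storage;  // hypothetical backing store
}
- (instancetype)initWithString:(NSString *)aString;  // hypothetical initializer
- (NSString *)sequenceString;                        // hypothetical accessor
- (NSUInteger)length;
@end

// Mutable public interface: same storage, plus the editing methods.
@interface BCMutableSequence : BCSequence
- (void)appendString:(NSString *)aString;            // hypothetical mutator
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)aString {
    if ((self = [super init])) {
        _storage = [aString mutableCopy];
    }
    return self;
}
- (NSString *)sequenceString { return [_storage copy]; }
- (NSUInteger)length { return [_storage length]; }
@end

@implementation BCMutableSequence
- (void)appendString:(NSString *)aString {
    [_storage appendString:aString];
}
@end

// Small usage helper (hypothetical): edits a mutable sequence and
// returns the resulting string.
static NSString *BCDemoAppend(void) {
    BCMutableSequence *editable = [[BCMutableSequence alloc] initWithString:@"ATGC"];
    [editable appendString:@"TT"];

    BCSequence *fixed = [[BCSequence alloc] initWithString:@"ATGC"];
    // [fixed appendString:@"TT"];
    // ^ rejected at compile time: BCSequence declares no -appendString:
    //   (a warning with older compilers, an error under ARC).
    (void)fixed;

    return [editable sequenceString];
}
```

With this layout, a variable typed BCSequence * cannot be edited without a cast, which is where the compiler diagnostics mentioned in the thread would come from. Nothing stops a determined caller from casting, so this is a compile-time aid rather than a runtime guarantee.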
From mek at mekentosj.com Sat Jan 8 08:20:23 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 8 Jan 2005 14:20:23 +0100
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References:
Message-ID: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com>

On 7 Jan 2005, at 23:20, Charles PARNOT wrote:

> First, if we go back again to the BCSequence, we have 2 options, regardless of class cluster:
>
> * we only provide the user with one public class, either with a class cluster or a general class that handles the different cases; how to handle the irrelevant cases is debatable: bluntly (returning nil), silently (returning self or empty objects or even [NSNull null]), with a Biococoa-specific error reporting system (which could be useful anyway at some point in the future), or with runtime errors (this would not be very nice!)

I largely agree, but remember the latter isn't so bad at all, as we're dealing with a different target audience here. Yes, runtime errors are bad if you ship programs to end users, because nothing can be done about it anymore when the users experience the error (well, usually they won't even see it). Here users are developers who WILL notice the errors while testing their app, and they will take them into account and change their code. But yes, it's a last resort.
> > * you provide public headers for all the different types of sequence; > and you don't let the user get a sequence object BCSequence, without a > strong type, like you get with the method [BCSequence > sequenceWithString:aString]; so you provide the user with all the > objects BCSequenceDNA, BCSequenceRNA, BCSequenceCodon, > BCSequenceProtein, and the user chooses which one to use; this gives > more work to the user, but also more control and thus potentially more > flexibility (and BCSequenceNucleotide could fit in the picture too) Hmm, this is not a good thing either, if you hand the user a bag of different sequence object and let him choose, than he (assuming that he knows what to choose) could have just as well send the proper method from the beginning right? I think such a method is supposed to figure out what the most likely type the handed sequence is of, and then return that one. Otherwise, the user should pick the right method. > > The choice depends on 2 things: > > * what the user wants; this is difficult; we are all potential users, > I suppose; maybe one hint is that the user of BioCocoa would also be a > user of Cocoa, and would be used to simple interfaces where she does > not have to handle the details and can get results in 2 lines of code Yes please! 
>
> * what the developers can do and how much work it requires; my feeling is in both designs, there will have to be some compromises, and not everything will be perfect; but it seems both options are feasible, the class cluster potentially requiring more careful planning (but once established, not more complicated than other approaches, or maybe even simpler); the amount of code would be roughly equivalent in both cases; the simple interface would make code maintenance easier (at least in terms of testing) and would improve backward compatibility; on the other hand, the more complex public interface would make our life easier when we want to provide more radical extensions

Yes, I agree.

> Again, the reason why I came up with the idea of some public headers for placeholder classes for typed sequences was to propose the user BOTH OPTIONS! (but maybe we should not).

That's perhaps the most important question we have to answer first indeed.

> OK, so we have two choices for the interface. There might be ways to provide both choices to the user, but we probably would not want to do that as a first version and we have to choose anyway.

Right.

> Now, to go beyond the BCSequence implementation, you raise the issue of the BCTools implementation. And it strikes me now that they are very much interrelated and that there are other design issues with BCTools and that they closely relate to BCSequence.
We have now > another choice to make: > * to perform an operation on a BCSequence, the user has to use one of > the BCTools; the simple BCTools might provide some convenience class > methods like > + (BCSequence *)complementForSequence:(BCSequence *)aSequence > but more complex tools will be used by alloc/init and then settings > some parameters with some accessors methods and then calling a > 'result' method on the tool; the interface to the BCTools is public > * to perform an operation on a BCSequence, the user has to use a > BCSequence method, such as 'complement', or 'cutWithEnzyme:' or > 'weigh' or ...; and the user does not even have to know that BCTools > exist! Yes, that was my proposal, as expressed in the previous email, I would not like to force the user to continuously use tools. I don't care to use tools, but then the simple things should be hidden by convenience methods where we do the work. > > The latter gives a very simple interface, all within BCSequence. > However, while the concept of sequence is abstract enough to be put in > one class, I don't think the concept of tools or operation on a > sequence is simple enough to have all the interface all fit in the > context of BCSequence: I think we all agree that somewhere there's a border we have to set on a per situation basis where to go for simple tools hidden by convenience methods and more complex situations where it would make sense not to hide things for the various reasons you have listed here. > > So my impression is that the BCTools interface will have to be public > to a certain extent (and for simple, obvious tools like translation, > some convenience methods will be included in the BCSequence interface, > the job being done really by a BCTool). 
But I don't see why a method for which we provide a convenience method in BCSequence should be hidden in the BCTool, I don't mind that the BCTool methods we use internally in the BCSequence convenience methods are public as well, if a user wants to replicate the convenience method by hand for some reason, he's free to do so... > Now, the user will have to provide the BCTool with a BCSequence. > Depending on the design chosen for BCSequence (one public class or > several typed subclasses), the interface for the BCTools will be quite > different, and the implementation as well: > > * with just one BCSequence, there is only the need for one 'init' > method, namely 'initWithSequence:'; the code may have to decide what > to do depending on the 'sequenceType' (the good thing is BCSequence > does not have to decide, so if a convenience method is provided in > BCSequence, it can be at the level of the superclass). So there might > be some 'if' and 'case' statement involved here. In certain cases, it > might get very difficult to stick to a simple general tool able to > handle all sequence types with just one class. Such an example is > alignement. Hey, we can have the same approach we're now taking for bcsequence (are we? ;-) for such tools as well can't we? Have a public tool with private subclasses depending on the fed sequence subclass... > A BCToolAlignement would be very different for a protein and a DNA. We > then may have to provide two tools in two separate classes and have > more stringent rules (the user will have to be more careful then), but > we don't have to give up the simplicity of the BCSequence, I think. > Anyway, alignements are really a very elaborate thing that may fall > out of the BCTool paradigm. 
Hmm, that might not be the case actually. I've seen quite some alignment code recently and people almost never use different methods for aligning DNA and protein; there's really no big difference (take ClustalW for instance), only the scoring matrix might be a bit more complicated. I still think alignments are in fact very nice to offer as a tool.

> * what you are saying, Peter (finally I comment on your comment!), is that to solve these dilemmas, we implement a class cluster together with some typed classes that will be only used in certain tools;
>
> I did not mean to go that far in the discussion when I started that email, but these issues will have to be debated, will have some impact on the design of BCSequence, and some design decisions will have to be taken for the BCTools as well.

Yep.

Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2-bit company without 1 bit of sense.

*********************************************************

From kvddrift at earthlink.net Sat Jan 8 08:31:44 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 8 Jan 2005 08:31:44 -0500
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com>
References: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com>
Message-ID:

On Jan 8, 2005, at 8:20 AM, Alexander Griekspoor wrote:

>> Again, the reason why I came up with the idea of some public headers for placeholder classes for typed sequences was to propose the user BOTH OPTIONS! (but maybe we should not).
> That's perhaps the most important question we have to answer first > indeed. > I still don't understand the construction with these placeholder classes. Why do we need them, next to the subclasses we already have? (this is an educational question, not a critical one ;-) - Koen. From mek at mekentosj.com Sat Jan 8 08:36:40 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 8 Jan 2005 14:36:40 +0100 Subject: [Biococoa-dev] BCSequence class cluster In-Reply-To: References: Message-ID: <53C78A64-617A-11D9-9A51-000D93AE89A4@mekentosj.com> >> Given that, I think we should proceed with caution, >> and perhaps ask Apple's Cocoa Dev list for their opinions on when to >> use a >> class cluster. > > I totally agree that such a big design decision should be carefully > questioned and then carefully planned, be it class clusters or not. It > seems to me that with the core of BCSequence in place, the arrival of > annotations and of BCTools, some design decisions have to be taken > anyway. Yes, we obviously have hit this point a few times now, and I start to believe in Charles' class cluster approach, it seems to have some basic properties that will make especially annotations and tools in a direction that I have more faith in than the public subclassing approach. And these are even the only things we've thought about now. In the end the real question is what approach is the most flexible and provides us with the strongest foundation to build upon. > I did raise an additional sets of questions and pointed to some > issues, coming after many of the discussions you had before. It seems > you already had to take some decisions in the past, and more are > coming! All these discussions are useful (!), even though right now, > it seems to be all theoretical and no decision has been taken. In > fact, one could get the feeeling that such discussions could go > forever, and stall the project. 
Yes, and result in longer and longer emails LOL, luckily there's the 40Kb limit of the biococoa list server after which Peter will have the final word ;-) > I agree with you that the current design decisions could be critical > for the future, so one or two weeks of discussions are no big deal. > This way, we can see where the questions are, and get a sense of the > priorities before coming up with a roadmap, and also a set of yes/no > design questions to take a decision on. And then all vote, and then > Peter makes the decision.. I am just guessing at the current process > for decision taking ;-) Like I said ;-) > > What the other BioX efforts can show us, is what kind of tools can > exist, what type of sequence can exist, what kind of annotations, and > how they all play together. I personnaly don't have much knowledge of > the other BioX efforts. How about you guys? Koen seems to have the most, he can often tell us how things are done in BioPerl and BioJava, which is really informative. > > Regarding class cluster. > To quote Apple, 'The grouping of classes in this way simplifies the > publicly visible architecture of an object-oriented framework without > reducing its functional richness'. And that's exactly what we're looking for as well IMHO. > Asking the cocoadev mailing list (or a discussion on cocoadev.com) is > a very good idea. I thought about it at some point. Ultimately, > because WE are the biologists, only us can decide wether dna and > protein sequences are as close to each other as, eg int and float (cf > NSNumber). Yes, and I'm afraid that although it would be informative to have others mix in the situation, I'm afraid it will lead to more discussion but less decisions. I think we should be careful with dropping the ball somewhere else with a simple "what do you think?" I would propose to invite people to have a look that we already know and ideally are both in Biology and Cocoa/BioX projects. We all know a few people like that I guess. 
Look at the impact that the involvement of Charles has! For example, I could invite Serge Cohen (Serge if you're reading this, I'll let you know why I send you this email ;-) to share his opinion (he's somewhat more from the informatics than from the bio side in contrast to most of us), or I could ask a guy from Apple I met at the WWDC if he has suggestions. If we all do that, I think it would result in more constructive discussions than simply asking on cocoa-dev. > > To go further, let 's go very far (or not so far) in the future. What > could BioCocoa be? For me, it seems it could do to sequences what the > WebView does with web pages. Yep, my idea as well. > Thanks to the simplicity and power of WebView, with two lines of code > (or even just a few links in a nib), you get a web browser. Imagine > the same with BioCocoa. A nib with a BCSequenceView in a window, and a > few menu items like 'complement', 'reverse', etc... Then a few lines > of code in a controller would allow the user to load a sequence from > file, choose complement in the menu and get a new window with the > complement. The developer of that app (I can't call him the user > anymore, sorry!) would not have to know which type of sequence the > view is dealing with. So it would just forward the 'complement' calls > to the sequence in the view and pop up a new view with the returned > sequence, no question asked. What if the user of the app opens a > protein sequence, and chooses complement in the menu. What should the > user of the app expect? Well, the user should not be surprised to get > the same sequence back, or some empty window, or nothing. The > developer of BioCocoa are not to blame, the developer of the app is > not to blame, the user of the app can only blame himself for that and > if he does not understand what is happening, he should probably not > use that app!! 
In the meantime, BioCocoa has made the life of the > developer of the app very easy; it took less than an hour to build a > good-looking app; and should more types of sequence be added in the > framework, no need for any change in the code. I love that story :-) Amen! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From mek at mekentosj.com Sat Jan 8 08:41:35 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 8 Jan 2005 14:41:35 +0100 Subject: [Biococoa-dev] BCSequence class cluster In-Reply-To: References: Message-ID: <03EDC685-617B-11D9-9A51-000D93AE89A4@mekentosj.com> Op 8-jan-05 om 1:48 heeft John Timmer het volgende geschreven: > Just a quick comment: > >> I just thought of an analogy to throw in the discussion. NSString has >> the path >> methods like 'stringByAppendingPathExtension'. These will work on ANY >> string, >> even if they are not path, but the contents of this email. However, >> it does >> not make sense. Do we get a compiler warning? no. Do we get a runtime >> error? >> no. Are we in trouble? yes. What the f... this string is doing here >> when I >> should have a path??? This is clearly the fault of the user, here, >> not of the >> guy who designed NSString. OK, this is really a much simpler >> situation than >> ours, but still. > > Yeah, this analogy doesn't really work in terms of taking into account > why > Alex and I worry. You can add a .tiff to a non-path, and the result is > still a string. 
You might get unexpected behavior when you used it, > but > you'd have to do something convoluted to get your app to crash as a > result. > > If you ask for a complement from a protein, you'll either get nil or > something with a sequenceArray count of 0. Either of these make it > very > easy for a user to crash the app. That's why I feel (and as of last > check, > Alex did as well) Yes sir! > that things should be structured so that asking for the > complement of a sequence object should generate a compiler warning if > it's > not a nucleotide sequence (and likewise for other sequence type > specific > methods). It's the user-friendly thing to do. If we're doing that, > then we > should definitely have a header that informs users of what types of > sequence > respond to what messages. > > I find it personally more satisfying to explicitly type my sequence > variables so that the code is easier to follow and the right methods > pop up > in code sense, but I recognize that that's probably a personal taste. I like to type my variables as well like: BCSequenceDNA* theDNA; instead of a simple BCSequence *theDNA; Perhaps that should be the way a developer can choose to do weak or strong typing. But I would not like to do a test before every method I call whether the method will respond properly to my sequence object. Or provide a protocol along with each method... Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? 
http://www.mekentosj.com/irnai ********************************************************* From kvddrift at earthlink.net Sat Jan 8 10:42:17 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 8 Jan 2005 10:42:17 -0500 Subject: [Biococoa-dev] BCSequence class cluster In-Reply-To: <53C78A64-617A-11D9-9A51-000D93AE89A4@mekentosj.com> References: <53C78A64-617A-11D9-9A51-000D93AE89A4@mekentosj.com> Message-ID: On Jan 8, 2005, at 8:36 AM, Alexander Griekspoor wrote: > Yes, and I'm afraid that although it would be informative to have > others mix in the situation, I'm afraid it will lead to more > discussion but less decisions. I think we should be careful with > dropping the ball somewhere else with a simple "what do you think?" I > would propose to invite people to have a look that we already know and > ideally are both in Biology and Cocoa/BioX projects. We all know a few > people like that I guess. > Look at the impact that the involvement of Charles has! > For example, I could invite Serge Cohen (Serge if you're reading this, > I'll let you know why I send you this email ;-) to share his opinion > (he's somewhat more from the informatics than from the bio side in > contrast to most of us), or I could ask a guy from Apple I met at the > WWDC if he has suggestions. If we all do that, I think it would result > in more constructive discussions than simply asking on cocoa-dev. > I think that is a really good idea, Alex. Since all of us have a science background, and not an programming background, it could well be that we completely miss an important issue, or make things too complicated. We all know how much we are struggling with finding the most usable design for BCSequence et al. Although it is good to think of the user's approach, I also think we need to have a solid base, that we can build all our ideas on top of. As soon as we have the base, adding biology and chemistry extensions shouldn't be that difficult anymore. 
One other thing would be alignment coding. This has been done by many other programmers, and we shouldn't waste our time and efforts by trying to reinvent the wheel. Of course we have to do things the ObjC way, not just blindly copy some cool approach from one of the other BioX projects.

Regarding new developers, a quick search for biology/molecule on the CocoaDev users list shows a few names: Charles (we all know him by now :), Julian Blow (http://www.cocoadev.com/index.pl?JulianBlow), and Todd Harris (http://www.cocoadev.com/index.pl?ToddHarris). We could see if they are interested in joining. On the other hand, more developers -> more opinions -> longer discussions -> less coding :)

cheers,

- Koen.

From mek at mekentosj.com Sat Jan 8 11:06:46 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 8 Jan 2005 17:06:46 +0100
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References: <53C78A64-617A-11D9-9A51-000D93AE89A4@mekentosj.com>
Message-ID: <4BA7654A-618F-11D9-9A51-000D93AE89A4@mekentosj.com>

> We could see they are interested to join. On the other hand, more developers -> more opinions -> longer discussions -> less coding :)

And that's exactly my point, and why I'd rather add a few extra people to the project who can really contribute than a lot who don't know what they're talking about... At some point we had better just try and see if the class cluster approach works (the project isn't so big yet anyway) instead of having purely theoretical discussions for weeks... Let's just see if it works.
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds

*********************************************************

From mek at mekentosj.com Sat Jan 8 11:09:03 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 8 Jan 2005 17:09:03 +0100
Subject: [Biococoa-dev] BCSequence class cluster
In-Reply-To:
References: <53C78A64-617A-11D9-9A51-000D93AE89A4@mekentosj.com>
Message-ID: <9D672F72-618F-11D9-9A51-000D93AE89A4@mekentosj.com>

Koen, I forgot, do you want to send an invitation to these two people at CocoaDev to see if they might be interested in helping with the framework, or at least in the discussions?

Alex

On 8 Jan 2005, at 16:42, Koen van der Drift wrote:

> On Jan 8, 2005, at 8:36 AM, Alexander Griekspoor wrote:
>
>> Yes, and I'm afraid that although it would be informative to have others mix in the situation, I'm afraid it will lead to more discussion but less decisions. I think we should be careful with dropping the ball somewhere else with a simple "what do you think?" I would propose to invite people to have a look that we already know and ideally are both in Biology and Cocoa/BioX projects. We all know a few people like that I guess.
>> Look at the impact that the involvement of Charles has!
>> For example, I could invite Serge Cohen (Serge if you're reading >> this, I'll let you know why I send you this email ;-) to share his >> opinion (he's somewhat more from the informatics than from the bio >> side in contrast to most of us), or I could ask a guy from Apple I >> met at the WWDC if he has suggestions. If we all do that, I think it >> would result in more constructive discussions than simply asking on >> cocoa-dev. >> > > I think that is a really good idea, Alex. Since all of us have a > science background, and not an programming background, it could well > be that we completely miss an important issue, or make things too > complicated. We all know how much we are struggling with finding the > most usable design for BCSequence et al. Although it is good to think > of the user's approach, I also think we need to have a solid base, > that we can build all our ideas on top of. As soon as we have the > base, adding biology and chemistry extensions shouldn't be that > difficult anymore. One other thing would be alignment coding. This has > been done by many other programmers, and we shouldn't waste our time > and efforts by trying to reinvent the wheel. Of course we have to do > thing the ObjC way, not just blindly copying some cool approach from > one of the other BioX projects. > > Regarding new developers, a quick search for biology/molecule on the > CocoaDev users list shows a few names: Charles (we all know him by now > :), Julan Blow (http://www.cocoadev.com/index.pl?JulianBlow), and > Todd Harris (http://www.cocoadev.com/index.pl?ToddHarris). We could > see they are interested to join. On the other hand, more developers -> > more opinions -> longer discussions -> less coding :) > > cheers, > > - Koen. 
> > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From kvddrift at earthlink.net Sat Jan 8 12:32:57 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 8 Jan 2005 12:32:57 -0500 Subject: [Biococoa-dev] BCSequence class cluster In-Reply-To: <9D672F72-618F-11D9-9A51-000D93AE89A4@mekentosj.com> References: <53C78A64-617A-11D9-9A51-000D93AE89A4@mekentosj.com> <9D672F72-618F-11D9-9A51-000D93AE89A4@mekentosj.com> Message-ID: <56265A60-619B-11D9-A324-003065A5FDCC@earthlink.net> On Jan 8, 2005, at 11:09 AM, Alexander Griekspoor wrote: > Koen, I forgot, do you want to send an invitation to these two people > at CocoaDev to see if they might be interested in helping with the > framework, or at least in the discussions? > I'll try to do that this weekend. - Koen. From kvddrift at earthlink.net Sat Jan 8 16:20:00 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 8 Jan 2005 16:20:00 -0500 Subject: [Biococoa-dev] BioCocoa invitation Message-ID: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Hi guys, I wrote this 'invitation' that we can send out to possible new developers. Let me know if you think this is ok, and if I missed anything. cheers, - Koen. ------------------------- Dear FooBar, I found your name on the CocoaDev mailing list. I am one of the core developers of the BioCocoa project, an open source framework for reading, writing, and manipulating various biological sequences. 
Originally the framework was aimed also at GNUStep and only focused on reading and writing sequences, which is still the official version that you can download from the website at http://bioinformatics.org/biococoa/. A few months ago a couple of mac loving bioscientists joined the project, and it was transformed in a Cocoa-only framework, and extended to be able to do other things with sequences than reading and writing. You could say that it is trying to be the Cocoa/Obj-C sibling of BioJava, BioPerl, etc. So far we have created a set of new classes, that we believe should be a solid base for extending the whole framework. However, we still have many discussions what would be the best design. For instance, currently we have a core class BCSequence, with various subclasses for DNA, RNA, and proteins. We have had many discussions whether we actually should subclass BCSequence (eg BioCocoa doesn't subclass it's main sequence class). Recently we have been discussing if maybe we should use the class cluster design for the various sequence classes, At the moment we are only a small group of developers, mainly with a scientific background. Based on the description on the CocoaDev webpage, it seems you could be a valuable addition to the BioCocoa project. So we were wondering if you would be interested in joining the development team, and the discussions on our mailing list (see http://bioinformatics.org/project/?group_id=318 for more info). Feel free to have a look at the code and the current discussions, and let me or one of the other developers know if you are interested. 
We look forward to your reply, On behalf of the BioCocoa Team, best regards, Koen van der Drift From peter.schols at bio.kuleuven.ac.be Sat Jan 8 17:15:51 2005 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Sat, 8 Jan 2005 23:15:51 +0100 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Message-ID: Sounds good to me. peter On 08 Jan 2005, at 22:20, Koen van der Drift wrote: > Hi guys, > > I wrote this 'invitation' that we can send out to possible new > developers. Let me know if you think this is ok, and if I missed > anything. > > cheers, > > - Koen. > > ------------------------- > > > Dear FooBar, > > I found your name on the CocoaDev mailing list. I am one of the core > developers of the BioCocoa project, an open source framework for > reading, writing, and manipulating various biological sequences. > Originally the framework was aimed also at GNUStep and only focused on > reading and writing sequences, which is still the official version > that you can download from the website at > http://bioinformatics.org/biococoa/. A few months ago a couple of mac > loving bioscientists joined the project, and it was transformed in a > Cocoa-only framework, and extended to be able to do other things with > sequences than reading and writing. You could say that it is trying to > be the Cocoa/Obj-C sibling of BioJava, BioPerl, etc. > > So far we have created a set of new classes, that we believe should be > a solid base for extending the whole framework. However, we still have > many discussions what would be the best design. For instance, > currently we have a core class BCSequence, with various subclasses for > DNA, RNA, and proteins. We have had many discussions whether we > actually should subclass BCSequence (eg BioCocoa doesn't subclass it's > main sequence class). 
Recently we have been discussing if maybe we > should use the class cluster design for the various sequence classes, > > At the moment we are only a small group of developers, mainly with a > scientific background. Based on the description on the CocoaDev > webpage, it seems you could be a valuable addition to the BioCocoa > project. So we were wondering if you would be interested in joining > the development team, and the discussions on our mailing list (see > http://bioinformatics.org/project/?group_id=318 for more info). > > Feel free to have a look at the code and the current discussions, and > let me or one of the other developers know if you are interested. We > look forward to your reply, > > On behalf of the BioCocoa Team, best regards, > > > Koen van der Drift > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From kvddrift at earthlink.net Sat Jan 8 17:30:50 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 8 Jan 2005 17:30:50 -0500 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Message-ID: On Jan 8, 2005, at 5:15 PM, Peter Schols wrote: > Sounds good to me. > Thanks. I already found a couple of errors, though :-) Will they be able to download the source code from CVS, or do need to have a BioCocoa account? Just pointing them to the CVS website is not a good idea, because it contains quite some deprecated classes, and it doesn't allow them to use the Xcode project. cheers, - Koen. From mek at mekentosj.com Sat Jan 8 17:52:18 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 8 Jan 2005 23:52:18 +0100 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Message-ID: Very nice Koen! The only thing I had some trouble with was the Foobar part. 
Now I might be the joke of the day if I'm wrong here, but I thought Foobar is derived from Fubar, which according to what I remember from "Saving Private Ryan", stands for "F*cked up beyond any recognition", not really a compliment to any new person ;-)

About the CVS, this is what I found on the bioinformatics webpage to access the CVS anonymously:

How to check out source anonymously through pserver

* First, log in by typing the following. When prompted for the password, just press Enter.

    cvs -d:pserver:anonymous@bioinformatics.org:/cvsroot login

* Then, type the following, making the necessary obvious substitution for the modulename. As mentioned above, the -d flag need not be included for subsequent commands, such as update.

    cvs -d:pserver:anonymous@bioinformatics.org:/cvsroot checkout -P BioCocoa

I already added the BioCocoa module name and the pruning option (-P), checked it and it works. Perhaps you can add this to your email. Also note that to join the mailing list, they don't have to register at bioinformatics yet, only if they want to join the project.

Cheers, alex

On 8 Jan 2005, at 23:30, Koen van der Drift wrote: > > On Jan 8, 2005, at 5:15 PM, Peter Schols wrote: > >> Sounds good to me. >> > > Thanks. I already found a couple of errors, though :-) > > Will they be able to download the source code from CVS, or do need to > have a BioCocoa account? Just pointing them to the CVS website is not > a good idea, because it contains quite some deprecated classes, and it > doesn't allow them to use the Xcode project. > > > cheers, > > - Koen.
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From charles.parnot at stanford.edu Sun Jan 9 00:45:17 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 8 Jan 2005 21:45:17 -0800 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Message-ID: Sounds good to me too! Maybe you could also simply put up a small announcement with about the same contents on the BioCocoa page of the wiki (there is one, right?), dated and stating that BioCocoa is looking for developers who know their biology, at least for discussion (and not just any developer). Charles NB: wow, a short email! Don't sigh with relief yet, more is coming... >Hi guys, > >I wrote this 'invitation' that we can send out to possible new developers.
Let me know if you think this is ok, and if I missed anything. > >cheers, > >- Koen. > >------------------------- > > >Dear FooBar, > >I found your name on the CocoaDev mailing list. I am one of the core developers of the BioCocoa project, an open source framework for reading, writing, and manipulating various biological sequences. Originally the framework was aimed also at GNUStep and only focused on reading and writing sequences, which is still the official version that you can download from the website at http://bioinformatics.org/biococoa/. A few months ago a couple of mac loving bioscientists joined the project, and it was transformed in a Cocoa-only framework, and extended to be able to do other things with sequences than reading and writing. You could say that it is trying to be the Cocoa/Obj-C sibling of BioJava, BioPerl, etc. > >So far we have created a set of new classes, that we believe should be a solid base for extending the whole framework. However, we still have many discussions what would be the best design. For instance, currently we have a core class BCSequence, with various subclasses for DNA, RNA, and proteins. We have had many discussions whether we actually should subclass BCSequence (eg BioCocoa doesn't subclass it's main sequence class). Recently we have been discussing if maybe we should use the class cluster design for the various sequence classes, > >At the moment we are only a small group of developers, mainly with a scientific background. Based on the description on the CocoaDev webpage, it seems you could be a valuable addition to the BioCocoa project. So we were wondering if you would be interested in joining the development team, and the discussions on our mailing list (see http://bioinformatics.org/project/?group_id=318 for more info). > >Feel free to have a look at the code and the current discussions, and let me or one of the other developers know if you are interested. 
We look forward to your reply, > >On behalf of the BioCocoa Team, best regards, > > >Koen van der Drift > >_______________________________________________ >Biococoa-dev mailing list >Biococoa-dev at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/biococoa-dev -- Charles Parnot charles.parnot at stanford.edu Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Sun Jan 9 00:51:48 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 8 Jan 2005 21:51:48 -0800 Subject: [Biococoa-dev] why placeholder classes was: BCSequence class cluster In-Reply-To: References: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com> Message-ID: >On Jan 8, 2005, at 8:20 AM, Alexander Griekspoor wrote: > >>>Again, the reason why I came up with the idea of some public headers for placeholder classes for typed sequences was to propose the user BOTH OPTIONS! (but maybe we should not). >>That's perhaps the most important question we have to answer first indeed. >> > >I still don't understand the construction with these placeholder classes. Why do we need them, next to the subclasses we already have? (this is an educational question, not a critical one ;-) > >- Koen. This only applies to the class cluster design. Because the public superclass declares all the methods as valid, then all the subclasses are known by the compiler to also respond to all the methods. So we can't just use the headers of the subclass. We have to provide separate headers with a restricted set of methods. But alloc-ing these objects will actually return objects of the class cluster (hence 'placeholder' classes). The compiler is fooled. The runtime gets what is needed. I hope the shortness makes it clearer. 
I can make a longer version if you want ;-) Note that there might be other possibilities to achieve the same effect. I was thinking this morning in the car that maybe some alternate headers might still fool the compiler without the need for extra classes. Charles -- Charles Parnot charles.parnot at stanford.edu Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sun Jan 9 00:54:07 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 9 Jan 2005 00:54:07 -0500 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Message-ID: On Jan 8, 2005, at 5:52 PM, Alexander Griekspoor wrote: > Very nice Koen! The only thing I had some trouble with was the Foobar > part. Now I might be the joke of the day if I'm wrong here, but I > thought Foobar is derived from Fubar, which according to what I > remember from "Saving Private Ryan", stands for "F*cked up beyond any > recognition", not really a compliment to any new person ;-) LOL - no that was just a temp name, never intended to use that for real. And I never knew about the acronym... > About the CVS, this is what I found on the bioinformatics webpage to > access the CVS anonymously: > > How to check out source anonymously through pserver > * First, log in by typing the following. When prompted for the > password, just press Enter. > > cvs -d:pserver:anonymous@bioinformatics.org:/cvsroot login > > * Then, type the following, making the necessary obvious > substitution for the modulename. As mentioned above, the -d flag need > not be included for subsequent commands, such as update.
> > cvs -d:pserver:anonymous@bioinformatics.org:/cvsroot checkout -P > BioCocoa > > I already added the BioCocoa module name and the pruning option (-P), > checked it and it works. Perhaps you can add this to your email. Also > note that to join the mailinglist, they don't have to register at > bioinformatics yet, only if they want to join the project. Thanks, I will add that to the email. There's another person on CocoaDev who could be useful. His nick is theMZA. Unfortunately there is no website or email, so I don't know how to reach him. If anyone knows his email, please let me know. - Koen. From kvddrift at earthlink.net Sun Jan 9 01:03:18 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 9 Jan 2005 01:03:18 -0500 Subject: [Biococoa-dev] why placeholder classes was: BCSequence class cluster In-Reply-To: References: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com> Message-ID: <287B12E5-6204-11D9-AE6D-000A95685F72@earthlink.net> On Jan 9, 2005, at 12:51 AM, Charles PARNOT wrote: >> On Jan 8, 2005, at 8:20 AM, Alexander Griekspoor wrote: >> >>>> Again, the reason why I came up with the idea of some public >>>> headers for placeholder classes for typed sequences was to propose >>>> the user BOTH OPTIONS! (but maybe we should not). >>> That's perhaps the most important question we have to answer first >>> indeed. >>> >> >> I still don't understand the construction with these placeholder >> classes. Why do we need them, next to the subclasses we already have? >> (this is an educational question, not a critical one ;-) >> >> - Koen. > > This only applies to the class cluster design. Because the public > superclass declares all the methods as valid, then all the subclasses > are known by the compiler to also respond to all the methods. > So we can't just use the headers of the subclass. We have to provide > separate headers with a restricted set of methods. Why a restricted set of headers?
> But alloc-ing these objects will actually return objects of the class > cluster (hence 'placeholder' classes). The compiler is fooled. The > runtime gets what is needed. > I hope the shortness makes it clearer. I can make a longer version if > you want ;-) Please do, because it is still very confusing to me. - Koen. From charles.parnot at stanford.edu Sun Jan 9 01:29:10 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 8 Jan 2005 22:29:10 -0800 Subject: [Biococoa-dev] Misc Message-ID: Sorry I just wanted to discuss further a few points here, not critical points, and a little random, but still worth a few words... At 7:48 PM -0500 1/7/05, John Timmer wrote: >Yeah, this analogy doesn't really work in terms of taking into account why >Alex and I worry. You can add a .tiff to a non-path, and the result is >still a string. You might get unexpected behavior when you used it, but >you'd have to do something convoluted to get your app to crash as a result. Actually, like you say, the result is still a string. In a system that only uses BCSequence object, as long as you get BCSequence objects, you will never crash (except if you intentionally add runtime errors... well and if we make errors, but this is our task to squash these bugs). If you get NSArray, NSNumber, etc..., you should be fine too. What gets more dangerous is to receive nil back. Still OK, but borderline (like NSDictionary objectForKey). I am sorry the analogy with NSString is really really stretched, I could not find better, and I still agree the risk is higher. Note that as you will see in my other email of the day, I do see your point, and thinks we could provide some strong typing for those who wants it. But I still think that due to the nature of Cocoa, crashes would be rare as long as you return friendly objects. Their lack of meaning has a chance to get back to the real user of the final app, and this lack of meaning should make sense to him, hopefully, if he did something stupid. 
At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote: >Yes, I agree, if we can get rid of BCSymbollist and only have one class (BCSequence) that would have my preference too. Just my stupidity, adding many methods to a class (which we need to keep annotations in sync and add/remove them etc) doesn't make it more costly to use memory/speed wise? At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote: >Exactly my point, I'm not sure though, the first (and foremost) question is whether putting everything in one class doesn't matter memory/speed wise (I'm not talking about the small unused ivars). Second, how much will the sequence class increase code-wise to a point where things become unpractical. For example, I have experienced opensource projects where classes have become so extended that you can't say extract one class for use in a custom project without spending hours unwinding everything unnecessary, where with a class you basically have to incorporate the whole project to make it work. Maybe not so much of importance, but usually the more simple the code and easier to overview, the more useful and versatile it turns out to be. I am not sure about performance issue. But they should not be an issue. About subclass versus large class, my wild guess is more subclasses will have the runtime lookup the class hierarchy more, while more methods will have the runtime lookup the method list more (this is some pure science, here). With the caching and all, well, euh... About code bloating, yes this could be a issue on the developer-side. >At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote: >>If the purpose is to separate the code for annotations, an alternative is to use a category, and not a subclass. And I think I like that better, actually. >As said before, I'm not really in favor of categories as they don't really belong in a framework, BUT as they are private things maybe different. What is the problem with categories? 
I think I may understand what you mean: categories on an Apple object like NSDictionary are not welcome? No, they are not. But indeed, we are talking about private classes, so we do what we want (and Apple does it a lot, look at the headers of NSString or NSArray, for example). You just have to put all the @interface declarations in the same header file.

At 7:58 AM -0500 1/8/05, Koen van der Drift wrote: >Now I think of it, is there a good reason why we should have immutable sequences? The only I can think of right now is that we have to be careful when annotations are present. If we can solve that, then IMO there is no need for both immutable and mutable variants. Charles, you know perl, any idea how this is solved in BioPerl?

I know Perl, but not BioPerl very much. I actually tried to use it once to read sequences, but it was so slow...; it was all FASTA format so a few lines of simple Perl did much better. My interpretation is they had added many layers of abstraction for complicated sequence formats, which takes a big performance hit in Perl. Note that in ObjC, we would not see such a hit. Anyway, I will have another look at BioPerl...

Like Alex and you, I actually don't think sequences should be made immutable just because of annotations. This is a lame excuse (I hope nobody from BioJava ever reads that!!). Immutable sequences are also useful for copying, since an immutable object can be shared instead of duplicated. It is possible that taking a subarray is also optimized just because NSArray is optimized for that (or will be). Maybe Apple is smart enough to create a subarray by pointing at a piece of the already existing parent array, so no new memory is used and no copy is done. More generally, any optimization put in NSArray will benefit BCSequence (if we use the native NSArray methods as much as possible). And we don't get them with NSMutableArray.
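To make the copying argument concrete, here is a minimal sketch. It assumes a hypothetical class BCImmutableSequence (not an actual BioCocoa class) that wraps an NSArray of symbols, and it assumes the code is compiled with ARC. Because an instance can never change, -copyWithZone: can simply return the receiver, which is the usual Cocoa idiom for immutable classes. Whether NSArray shares storage in -subarrayWithRange: is an implementation detail that the sketch does not rely on.

```objc
#import <Foundation/Foundation.h>

// Hypothetical immutable sequence, for illustration only.
@interface BCImmutableSequence : NSObject <NSCopying>
- (instancetype)initWithSymbols:(NSArray *)symbols;
- (NSArray *)symbols;
- (BCImmutableSequence *)subsequenceWithRange:(NSRange)range;
@end

@implementation BCImmutableSequence {
    NSArray *_symbols;   // never modified after init
}

- (instancetype)initWithSymbols:(NSArray *)symbols {
    if ((self = [super init])) {
        // -copy yields an immutable snapshot even if the caller
        // passed an NSMutableArray.
        _symbols = [symbols copy];
    }
    return self;
}

- (NSArray *)symbols {
    return _symbols;
}

// Immutable object: a "copy" is the same instance.
// (Assumes ARC; with manual reference counting this would be
// `return [self retain];`.)
- (id)copyWithZone:(NSZone *)zone {
    return self;
}

// New sequences are derived from old ones; the native NSArray method
// does the work, so any optimization in NSArray is inherited.
- (BCImmutableSequence *)subsequenceWithRange:(NSRange)range {
    NSArray *piece = [_symbols subarrayWithRange:range];
    return [[BCImmutableSequence alloc] initWithSymbols:piece];
}

@end
```

A mutable variant could not take the shortcut in -copyWithZone:, because the caller of -copy expects a snapshot that later edits to the original do not affect.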
At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote: >But I don't see why a method for which we provide a convenience method in BCSequence should be hidden in the BCTool, I don't mind that the BCTool methods we use internally in the BCSequence convenience methods are public as well, if a user wants to replicate the convenience method by hand for some reason, he's free to do so... I did not mean to hide the simple BCTool altogether. They can still be public, yes. Sorry for the confusion! At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote: >Hey, we can have the same approach we're now taking for bcsequence (are we? ;-) for such tools as well can't we? Have a public tool with private subclasses depending on the fed sequence subclass... I did not dare to bring that up... Let's just keep it for ourselves for now. At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote: >Hmm, that might no be the case actually, I've seen quite some alignment code recently and almost never people use different methods for aligning DNA and protein, there's really no big difference (take ClustalW for instance), only the scoring matrix might be a bit more complicated. A still think alignments are very nice in fact to be offered as a tool. OK, good to know. Makes our future brighter... -------- Sorry guys, your reading is not done. I am planning on sending another email with some further ideas following John, Peter and Alex's concerns with not having strongly typed classes. I hope what you will read will please you... 
Charles -- Charles Parnot charles.parnot at stanford.edu Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021

From charles.parnot at stanford.edu Sun Jan 9 02:15:20 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 8 Jan 2005 23:15:20 -0800 Subject: [Biococoa-dev] why placeholder classes was: BCSequence class cluster In-Reply-To: <287B12E5-6204-11D9-AE6D-000A95685F72@earthlink.net> References: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com> <287B12E5-6204-11D9-AE6D-000A95685F72@earthlink.net> Message-ID:

>>>I still don't understand the construction with these placeholder classes. Why do we need them, next to the subclasses we already have? (this is an educational question, not a critical one ;-) >>> >>>- Koen. >> >>... I hope the shortness makes it clearer. I can make a longer version if you want ;-) > >Please do, because it is still very confusing to me. > >- Koen.

OK, I will try with questions and answers so I will force you to follow my twisted path of reasoning;-)

1. I have a class cluster BCSequence. BCSequence responds to all kinds of methods, including -complement. I let the user know by having that method in the header. But, pressured by John and Alex, I decide to make the subclass BCSequenceProtein public. How do I do it? Well, I make the header file 'BCSequenceProtein.h' public, so the user can #import it and use BCSequenceProtein.

2. Oops, BCSequenceProtein.h also imports BCSequence.h, so the compiler thinks that BCSequenceProtein can respond to '-complement'. Well, and to all the other methods. So much for a strongly-typed sequence class!! What do I do?? OK, I remove -complement from the BCSequence.h header, and only put it in BCSequenceDNA.h, but not in BCSequenceProtein.h.

3. Ouch, now BCSequence gets some compiler warnings when trying to use -complement. How do I prevent that??
Well, I go back to what I had before...

4. OK, but should I go back to step 1 and spend the rest of my life on it? No.

5. Then what do I do? You can't make BCSequenceProtein.h public, you have to create a fake placeholder class, say BCSeqProtein, that will be public, with the right set of methods in the header. The user only knows about this public header, which does not inherit from BCSequence. So when using that header, you get a compiler warning if you call -complement. However, under the hood, when you alloc such an object, you really get a BCSequenceProtein object, which hooks your object back into the class cluster at runtime.

I am kind of hoping that you were mostly missing points 1-4. So step 5 might still need some explanation... And step 5 might certainly have other answers.

Shouldn't you be sleeping??? (I hope this email helps;-)

Charles -- Charles Parnot charles.parnot at stanford.edu Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021

From charles.parnot at stanford.edu Sun Jan 9 02:47:27 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 8 Jan 2005 23:47:27 -0800 Subject: [Biococoa-dev] a new design to please everybody Message-ID:

>At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote: >>Again, the reason why I came up with the idea of some public headers for placeholder classes for typed sequences was to propose the user BOTH OPTIONS! (but maybe we should not). >That's perhaps the most important question we have to answer first indeed.

This seems indeed the problem raised by Alex, Peter and John. There is a sense that strongly typed sequences are wanted, and maybe even needed. I stated somewhere that a design decision should also be guided by what the user of the framework wants.
And I also realize that several of these users will be you (and maybe the only ones for a while), so we should try to please ourselves, as potential users, too. As both Alex and John clearly would not feel comfortable working without strongly typed sequence classes, it seems it would be bad to prevent their use.

On the other hand, a good-for-all BCSequence object, which will blindly respond to almost any request (to some variable extent), is wanted or at least seen as a good thing by several of us too (think WebView). Probably Alex, Koen, Peter and me. There is also some concern, mine included, that there may be some limit there. And then, there is a debate over what kind of response would be appropriate for irrelevant messages sent to such a generic object: runtime error, return nil, BCError, empty object (or self if appropriate)?

Now, a little bit of history (already!) on the recent 'class cluster' discussion, viewed from my point of view:

* It was triggered on my side by the feeling that the current code was getting a bit schizophrenic. It currently allows you to instantiate, via BCSequenceFactory, a weakly-typed object that will respond to the methods in BCSequence.h (at least from the compiler's point of view). If this list of methods is very restrictive (only methods relevant to all types, no -complement, no -hydrophobicity,...), then you get a quite useless object.
If this list of methods is large, then you have a problem: due to inheritance, the compiler will assume all the subclasses can respond to the messages; now the subclasses are useless, since the compiler thinks they can respond to anything (hence no compiler warnings).
* then I (somewhat stupidly) assumed that the latter case was the one favored by the current design, i.e. a one-for-all class; this is when I thought of the class cluster being a better design in this context; I still think it is for such a one-for-all class, because the sequence types are still different enough that they deserve their own class;
* at the same time, to please some (yet virtual) users willing to stick to strong typing, I came up with the idea of an additional set of placeholder classes with restricted sets of methods in their headers; in the context of a class cluster, other ideas are possible; and actually, these ideas may apply to the current design too; I was thinking this could be added later anyway to please these yet non-existing users;
* then I realized yesterday that such users actually existed, and I even could see their point; so now my opinion is that we should indeed give BOTH options to the user, which will please all of us (see above why)

Going back one step, forgetting the class cluster idea for a minute, looking at the current design, and trying to think of what could be done to achieve this, a new idea came up, which looks very simple, easy to grasp for the developer and the user, and with minimal code duplication. Well, OK, maybe I exaggerate a bit, I should let you decide by yourselves how good it is, and what pitfalls I am missing or subconsciously hiding.

Here it is.

We keep mostly the same implementation as now. The superclass BCSequence is public (but abstract, see why below), and all the current subclasses are concrete and public (BCSequenceDNA, BCSequenceRNA,...).
The superclass handles as much as possible of the code that can be factored out (including annotations, though an intermediate subclass is also possible, see earlier discussions). The subclasses step in when necessary to replace the superclass methods (for optimizations, specific handling,...). Note that ALL methods should return something, regardless of the relevance (e.g. BCSequenceProtein should return something in response to -complement, which can be done by the superclass, anyway); you will see why below; this can be achieved by having the superclass implement ALL the methods, always returning something not too stupid (-complement actually is already quite smart and in the superclass).

Now about the headers. They are all public, because the classes are all public. We only keep in the superclass BCSequence the methods that apply to all subclasses, i.e. the restrictive set of methods (no -complement, no -hydrophobicity,...). We add the appropriate methods in the appropriate subclasses (-complement in BCSequenceDNA, -hydrophobicity in BCSequenceProtein,...)

And THEN, we add another subclass, for example called BCSequenceGeneric. In the header of this subclass, we put all the methods. This will be for the user a concrete subclass with this one-for-all feel and look (hence 'generic'). And under the hood, this class is like a class cluster (ah! ah! the minute is elapsed; see above). At runtime, you don't get a BCSequenceGeneric instance, but an instance of one of the other subclasses, BCSequenceDNA, ... So no additional code is needed, it is already provided by the other classes. This new generic sequence can be used by the lover of the one-for-all class, and will automatically benefit from the implementation of the other subclasses.

As a result, if you use the generic one-for-all class, you can call any method you want and always get something back, without the need to know what is going on (it is in the hands of the user of the final app).
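
To make the proposal a bit more concrete, here is a rough sketch of how it could look, and only a sketch: the method names (-initWithString:, -sequenceString, +sequenceWithString:), the category used to declare 'all the methods' on BCSequenceGeneric, the naive "only ACGT means DNA" guess and the toy -hydrophobicity are all assumptions made for illustration, not the existing BioCocoa code.

```objc
#import <Foundation/Foundation.h>

// Superclass header: only the methods relevant to every sequence type.
@interface BCSequence : NSObject
- (instancetype)initWithString:(NSString *)string;
- (NSString *)sequenceString;
@end

// Typed subclasses: each header adds only what makes biological sense.
@interface BCSequenceDNA : BCSequence
- (BCSequence *)complement;
@end

@interface BCSequenceProtein : BCSequence
- (double)hydrophobicity;
@end

// Generic "one-for-all" class (hypothetical layout).
@interface BCSequenceGeneric : BCSequence
+ (BCSequenceGeneric *)sequenceWithString:(NSString *)string;
@end

// All the methods are declared here without any implementation:
// the code lives in BCSequence and in the typed subclasses.
@interface BCSequenceGeneric (AllMethods)
- (BCSequence *)complement;
- (double)hydrophobicity;
@end

@implementation BCSequence {
    NSString *_string;
}
- (instancetype)initWithString:(NSString *)string {
    self = [super init];
    if (self) {
        _string = [string copy];
    }
    return self;
}
- (NSString *)sequenceString { return _string; }
// Fallbacks, implemented but not declared in the superclass header,
// so that ALL methods return something not too stupid.
- (BCSequence *)complement { return self; }
- (double)hydrophobicity { return 0.0; }
@end

@implementation BCSequenceDNA
- (BCSequence *)complement {
    NSString *source = [self sequenceString];
    NSMutableString *result = [NSMutableString stringWithCapacity:[source length]];
    for (NSUInteger i = 0; i < [source length]; i++) {
        unichar base = [source characterAtIndex:i];
        switch (base) {
            case 'A': base = 'T'; break;
            case 'T': base = 'A'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            default: break; // leave anything else untouched
        }
        [result appendFormat:@"%C", base];
    }
    return [[BCSequenceDNA alloc] initWithString:result];
}
@end

@implementation BCSequenceProtein
// Toy measure for the sketch: fraction of residues in an assumed hydrophobic set.
- (double)hydrophobicity {
    NSString *source = [self sequenceString];
    if ([source length] == 0) return 0.0;
    NSCharacterSet *hydrophobic =
        [NSCharacterSet characterSetWithCharactersInString:@"AILMFVW"];
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < [source length]; i++) {
        if ([hydrophobic characterIsMember:[source characterAtIndex:i]]) count++;
    }
    return (double)count / (double)[source length];
}
@end

@implementation BCSequenceGeneric
// Never returns a BCSequenceGeneric instance: it hands back one of the typed
// subclasses, guessed from the string contents (naive guess, for the sketch).
+ (BCSequenceGeneric *)sequenceWithString:(NSString *)string {
    NSCharacterSet *nonDNA =
        [[NSCharacterSet characterSetWithCharactersInString:@"ACGT"] invertedSet];
    id sequence;
    if ([string rangeOfCharacterFromSet:nonDNA].location == NSNotFound) {
        sequence = [[BCSequenceDNA alloc] initWithString:string];
    } else {
        sequence = [[BCSequenceProtein alloc] initWithString:string];
    }
    return sequence;
}
@end
```

With something like this, a variable typed BCSequenceGeneric * accepts both -complement and -hydrophobicity without complaint, while sending -hydrophobicity to a variable typed BCSequenceDNA * gets flagged by the compiler (a warning, or an error when compiling with ARC).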
However, if you use a typed class, you get appropriate compiler warnings (no runtime error, though). Note that you should never use BCSequence (in theory you could, but you would not benefit from potential optimizations in the subclasses).

I am still not sure how to fit the mutable/immutable design in this, but it seems you can't avoid NSMutableSequenceDNA et al. if you are going to have some strong typing.

What do you think?

Charles

NB: that's all for today, it is bedtime

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From mek at mekentosj.com Sun Jan 9 05:32:08 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 9 Jan 2005 11:32:08 +0100
Subject: [Biococoa-dev] Misc
In-Reply-To:
References:
Message-ID:

> At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote:
>> Yes, I agree, if we can get rid of BCSymbollist and only have one
>> class (BCSequence) that would have my preference too. Just my
>> stupidity, adding many methods to a class (which we need to keep
>> annotations in sync and add/remove them etc) doesn't make it more
>> costly to use memory/speed wise?
> At 1:37 PM +0100 1/8/05, Alexander Griekspoor wrote:
>> Exactly my point, I'm not sure though, the first (and foremost)
>> question is whether putting everything in one class doesn't matter
>> memory/speed wise (I'm not talking about the small unused ivars).
>> Second, how much will the sequence class increase code-wise to a
>> point where things become unpractical. For example, I have
>> experienced opensource projects where classes have become so extended
>> that you can't say extract one class for use in a custom project
>> without spending hours unwinding everything unnecessary, where with a
>> class you basically have to incorporate the whole project to make it
>> work.
>> Maybe not so much of importance, but usually the more simple
>> the code and easier to overview, the more useful and versatile it
>> turns out to be.
> I am not sure about performance issue. But they should not be an issue.
> About subclass versus large class, my wild guess is more subclasses
> will have the runtime lookup the class hierarchy more, while more
> methods will have the runtime lookup the method list more (this is
> some pure science, here). With the caching and all, well, euh...

Some time ago we had a discussion about this as well, and from what I've read, the caching indeed reduces this issue to nil.

> About code bloating, yes this could be an issue on the developer-side.
>> > What is the problem with categories?
> I think I may understand what you mean: categories on an Apple object
> like NSDictionary are not welcome? No, they are not. But indeed, we
> are talking about private classes, so we do what we want (and Apple
> does it a lot, look at the headers of NSString or NSArray, for
> example). You just have to put all the @interface in the same header
> file.

Yep, that was what I meant, we shouldn't do public categories on our own classes, but internally they are actually very nice. If you look at the Apple headers, they're often all in the same header, but somehow they nicely separate different parts of the code (almost like #pragma mark). And as the BCSequence class grows it might be nice to do:

@interface BCSequence : NSObject
.... very general methods ....
@end

@interface BCSequence (annotations)
... annotations methods ...
@end

@interface BCSequence (tools)
... convenience methods ...
@end

etc

> At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote:
>> Hey, we can have the same approach we're now taking for bcsequence
>> (are we? ;-) for such tools as well can't we? Have a public tool with
>> private subclasses depending on the fed sequence subclass...
> I did not dare to bring that up... Let's just keep it for ourselves
> for now.
ha ha ;-) > -------- > > Sorry guys, your reading is not done. I am planning on sending > another email with some further ideas following John, Peter and Alex's > concerns with not having strongly typed classes. I hope what you will > read will please you... Can't wait! Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 4242 bytes Desc: not available URL: From mek at mekentosj.com Sun Jan 9 05:53:39 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 9 Jan 2005 11:53:39 +0100 Subject: [Biococoa-dev] a new design to please everybody In-Reply-To: References: Message-ID: Op 9-jan-05 om 8:47 heeft Charles PARNOT het volgende geschreven: >> At 2:20 PM +0100 1/8/05, Alexander Griekspoor wrote: >>> Again, the reason why I came up with the idea of some public headers >>> for placeholder classes for typed sequences was to propose the user >>> BOTH OPTIONS! (but maybe we should not). >> That's perhaps the most important question we have to answer first >> indeed. > > This seems indeed the problem raised by Alex, Peter and John. There is > a sense that strongly typed sequences are wanted, and maybe even > needed. Yes, and no, it's more a gut feeling (from someone without that much of experience, the question is what that's worth). > I stated somewhere that a design decision should also be guided by > what the user of the framework wants. 
And I also realize that several > of these users will be you (and maybe the only ones for a while), an important point indeed. > so we should try to please us, potential users, too. As both Alex and > John clearly would not feel confortable working without strongly typed > sequence classes, it seems it would be bad to prevent their use. > > On the other hand, a good-for-all BCSequence object, that will blindly > respond to almost any requests (to some variable extent), is wanted or > at least seen as a good thing by several of us too (think WebView). > Probably Alex, Koen, Peter and me. There is also some concern, > including me, that there may be some limit there. And then, there is a > debate over what kind of response would be appropriate for irrelevant > messages sent to such a generic objec: runtime error, return nil, > BCError, empty object (or self if appropriate)? That sums it up nicely, thanks! > > Now, a little bit of history (already!) on the recent 'class cluster' > discussion, viewed from my point of view: > * It was triggered on my side by the feeling that the current code was > getting a bit schizophrenic. It currently allows to instantiate viw > BCSequenceFactory a weakly-typed object that will respond to the > methods in BCSequence.h (at least from the compiler point of view). If > this list of methods is very restrictive (only methods relevant to all > types, no -complement, no -hydrophobicity,...), then you get a quite > useless object. If this list of methods is large, then you have a > problem: due to inheritance, the compiler will assume all the > suclasses can respond to the messages; now the subclasses are useless, > the compiler think they can respond to anything (hence no compiler > warnings) > * then I (somewhat stupidely) assumed that the latter case was the one > favored by the current design, ie a one-for-all class; In principle, a one-for-all-class would certainly be nicer than a dozen subclasses, from a user perspective. 
> this is when I thought of the class cluster being a better design in > this context; I still think it is for such a one-for-all class, > because the sequence tyes are still different enough that they deserve > their own class; Yes, I agree. > * at the same time, to please some (yet virtual) users willing to > stick to strong typing, I came up with the idea of an additional set > of placeholder classes with resticted sets of methods in their > headers; in the context of a class cluster, other ideas are possible; > and actually, these ideas may apply to the current design too; i was > thinking this could be added later anyway to please these yet > non-existing users; > * then I realized yesterday that such users actually existed, and I > even could see their point; so now my opinion is that we should indeed > give BOTH options to the user, which will please all of us (see above > why) Then my answer would be YES! > > > Going back one step, forgetting the class cluster idea for a minute, > looking at the current design, and trying to think of what could be > done to achieve this, a new idea came up, > Here it is. > > We keep mostly the same implementation as now. The superclass > BCSequence is public (but abstract, see why below), and all the > current subclasses are concrete and public (BCSequenceDNA, > BCSequenceRNA,...). The superclass handles as much as possible the > code that can be factored out (including annotations, though an > intermediate subclass is also possible, see earlier discussions). The > subclasses step in when necessary to replace the superclass methods > (for optimizations, specific handling,...). 
Note that ALL methods > should return something, regardless of the relevance (eg > BCSequenceProtein should return something in response to -complement, > which can done by the superclass, anyway); you will see why below; > this can be achieved by having the superclass implement ALL the > methods, always returning something not too stupid (-complement > actually is already quite smart and in the superclass). > > Now about the headers. They are all public, because the classes are > all public. We only keep in the superclass BCSequence the methods that > apply to all subclasses, i.e. the restrictive set of methods (no > -complement, no -hydrophobicity,...). We add the appropriate methods > in the appropriate subclasses (-complement in BCSequenceDNA, > -hydrophobicity in BCSequenceProtein,...) > > And THEN, we add another subclass, for example called > BCSequenceGeneric. In the header of this subclass, we put all the > methods. This will be for the user a concreate subclass with this > one-for-all feel and look (hence 'generic'). And under the hood, this > class is like a class cluster (ah! ah! the minute is elapsed; see > above). At runtime, you don't get a BCSequenceGeneric instance, but an > instance of one of the other subclasses, BCSequenceDNA, ... So no > additional code is needed, it is already provided by the other > classes. This new generic sequence can be used by the lover of the > one-for-all class, and will automatically benefit from the > implementation of the other subclass. > > As a result, if you use the generic one-for-all class, you can call > any method you want and always get something back, without the need to > know what is going on (it is in the hands of the user of the final > app). However, if you use a typed class, you get appropriate compiler > warnings (no runtime error, though). Note that should never use > BCSequence (in theory you could, but you would not benefit from > potential optimizations in the subclasses). 
> > I am still not sure how to fit the mutable/immutable design in this, > but it seems you can't avoid NSMutableSequenceDNA et al. if you are > going to have some strong typing. The latter is perfect, and I like the idea in principle. The only concern is if this in the end proves to be too complex. It requires very good documentation and tutorials (which we need anyway, but ok) to explain all this to a potential new user. Also, it somehow still feels schizophrenic, like "we couldn't choose so did both" (which is the case ;-). Again, I'm not sure. Sometimes we just have to do what I call the Apple approach, instead of giving him dozens of options to tweak and adjust, make that choice for the user, and make it so good that the user never even wants to change those settings. Reading all this starts me thinking, "well perhaps we should just go for the single sequence setup class cluster thing that reacts to all things" and find out if that works, it sounds very oop like and loose typing is part of that. Perhaps instead of wanting to strongly type things, we just have to check if we get something relevant (non-nil or whatever) back from the method (we often do that anyway). What do you think John, what's your feeling after reading Charles' proposal? Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! 
http://www.mekentosj.com/labassistant ********************************************************* From peter.schols at bio.kuleuven.ac.be Sun Jan 9 07:07:32 2005 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Sun, 9 Jan 2005 13:07:32 +0100 Subject: [Biococoa-dev] a new design to please everybody In-Reply-To: References: Message-ID: <0A6A3EC7-6237-11D9-8F3D-00039345483C@bio.kuleuven.ac.be> Charles, I like the approach you are proposing. This seems to make a lot of sense to me. I have one question though: if I get it right, the only reason we need the strongly typed approach as an extra option to the good-for-all BCSequence object is that people will get compiler warnings? Or are there additional reasons? To paraphrase Koen, this is an educational question, not a critical one ;-) Best wishes, Peter > * at the same time, to please some (yet virtual) users willing to > stick to strong typing, I came up with the idea of an additional set > of placeholder classes with resticted sets of methods in their > headers; in the context of a class cluster, other ideas are possible; > and actually, these ideas may apply to the current design too; i was > thinking this could be added later anyway to please these yet > non-existing users; > * then I realized yesterday that such users actually existed, and I > even could see their point; so now my opinion is that we should indeed > give BOTH options to the user, which will please all of us (see above > why) > > What do you think? > > Charles > > NB: that's all for today, it is bedtime > > -- > Charles Parnot > charles.parnot at stanford.edu > From jtimmer at bellatlantic.net Sun Jan 9 09:13:04 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 09 Jan 2005 09:13:04 -0500 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: Message-ID: > > On Jan 8, 2005, at 5:52 PM, Alexander Griekspoor wrote: > >> Very nice Koen! The only thing I had some trouble with was the Foobar >> part. 
Now I might be the joke of the day if I'm wrong here, but I >> thought Foobar is derived from Fubar, which according to what I >> remember from "Saving Private Ryan", stands for "F*cked up beyond any >> recognition", not really a compliment to any new person ;-) > > LOL - no that was just a temp name, never intended to use that for > real. And I never knew about the acronym... > Alex remembers correctly - it's a US military acronym dating from WWII, with an alternate derivation being Beyond All Repair. In the states at least, though, it's been so long since it's been in common usage that most people would look at you in confusion, rather than being insulted. It was still in use when my father was in the navy during Korea, though, so it was only said around the house when he was very angry about something breaking. JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Sun Jan 9 09:49:22 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 09 Jan 2005 09:49:22 -0500 Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?) In-Reply-To: <0A6A3EC7-6237-11D9-8F3D-00039345483C@bio.kuleuven.ac.be> Message-ID: > I have one question though: if I get it right, the only reason we need > the strongly typed approach as an extra option to the good-for-all > BCSequence object is that people will get compiler warnings? Or are > there additional reasons? To paraphrase Koen, this is an educational > question, not a critical one ;-) > As probably the strongest advocate for it, let me give a few reasons: The compiler warnings are definitely one thing - in practice, they're the biggest way I identify when I've been sloppy with my coding before resorting to debugging. Non-typed methods mean that the sequence type has to be checked every time the method is called, slowing the code down. 
Uncertain return values mean that careful developers will have to surround every method call with tests (did it return nil? Was the returned sequence length 0?) that slow the code down and are very tedious to constantly implement. How are we going to define a sensible return value for a method call that makes no sense in the first place? Is nil appropriate? Throwing an exception? With typed classes, methods could actually be grouped with the data they could operate on, instead of in with data they may or may not operate on. At 4 non-abstract classes to represent all sequences, I hadn't thought the structure was that bad. What is the advantage of allowing something that makes no biological sense (ie - complementing a protein)? I'm sure I could think of more were I not a bit foggy headed from cold medication. It just feels like we're twisting the biology in order to achieve code elegance. And I'm not even certain we're achieving that - it feels like the equivalent of getting rid of all of the UI control classes in order to achieve the code purity of only interacting with NSViews. A button responds to different things and conveys different information than a slider does - they're different classes. The same could be said for a protein and a DNA sequence - why treat them like they're the same class? Although the class cluster idea is very appealing on an intellectual level, it's going to take some extra work to implement it in such a way that users (again, meaning non-contributing developers) will be able to grasp it easily. And the internal structure is probably going to be extremely confusing to anyone downloading the source for the first time. Given that, I'm just wondering whether it's going to be more effort than it's worth. As someone noted a few mails ago, it's going to require extensive documentation, and the assumption that developers are any better about reading the documentation than regular users are. 
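
A small sketch of the kind of checking described in the list of reasons above, under assumptions: a single weakly typed class whose type is stored as data, and whose -complement returns nil when the message makes no sense. The enum, the initializer and the helper function are hypothetical names made up for the example; none of this is the actual BioCocoa API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical weakly typed design: one class, the sequence type is just data.
typedef NS_ENUM(NSInteger, BCSequenceType) {
    BCSequenceTypeDNA,
    BCSequenceTypeProtein
};

@interface BCSequence : NSObject
@property (nonatomic, readonly) BCSequenceType sequenceType;
@property (nonatomic, readonly, copy) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)string type:(BCSequenceType)type;
- (BCSequence *)complement; // returns nil when the sequence is not DNA
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)string type:(BCSequenceType)type {
    self = [super init];
    if (self) {
        _sequenceString = [string copy];
        _sequenceType = type;
    }
    return self;
}

- (BCSequence *)complement {
    // The non-typed method has to check the sequence type on every call.
    if (_sequenceType != BCSequenceTypeDNA) {
        return nil;
    }
    NSMutableString *result =
        [NSMutableString stringWithCapacity:[_sequenceString length]];
    for (NSUInteger i = 0; i < [_sequenceString length]; i++) {
        unichar base = [_sequenceString characterAtIndex:i];
        switch (base) {
            case 'A': base = 'T'; break;
            case 'T': base = 'A'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            default: break; // leave anything else untouched
        }
        [result appendFormat:@"%C", base];
    }
    return [[BCSequence alloc] initWithString:result type:BCSequenceTypeDNA];
}
@end

// What a careful caller ends up writing around each call:
// did it return nil? Was the returned sequence length 0?
static NSString *BCComplementString(BCSequence *sequence) {
    BCSequence *complement = [sequence complement];
    if (complement == nil || [[complement sequenceString] length] == 0) {
        return nil;
    }
    return [complement sequenceString];
}
```

With typed classes, the -complement declaration would simply not be visible on a protein, so the mistake would presumably show up when compiling rather than as a nil at runtime.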
As an aside, I was struck by the following quote from Charles:
> 2. Oups, BCSequenceProtein.h does also import BCSequence.h, so the compiler
> thinks that BCSequenceProtein can respond to '-complement'. Well, and all the
> methods. So much for a strongly-typed sequence class!! What do I do??
>
> OK, I remove -complement from the BCSequence.h header, and only put it in
> BCSequenceDNA.h, but not in BCSequenceProtein.h
>
> 3. Aïe, now BCSequence gets some compiler warnings when trying to use
> -complement. How do I prevent that??

Item 3 is the whole point - it's a good thing, not something that needs to be prevented ;).

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Sun Jan 9 09:51:07 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 9 Jan 2005 09:51:07 -0500
Subject: [Biococoa-dev] why placeholder classes was: BCSequence class cluster
In-Reply-To:
References: <0DADF809-6178-11D9-9A51-000D93AE89A4@mekentosj.com> <287B12E5-6204-11D9-AE6D-000A95685F72@earthlink.net>
Message-ID:

On Jan 9, 2005, at 2:15 AM, Charles PARNOT wrote:

> 5. Then what do I do?
>
> You can't make BCSequenceProtein.h public, you have to create a fake
> placeholder class, say BCSeqProtein, that will be public, with the
> right set of methods in the header. The user only knows about this
> public header that does not inherit from BCSequence. So when using
> that header, you get a compiler warning if you call -complement.
> However, under the hood, when you alloc such an object, you really get
> a BCSequenceProtein object, which hooks your object back into the
> class cluster at runtime.
>
> I am kind of hoping that you were mostly missing points 1-4. So step 5
> might still need some explanations... And step 5 might certainly have
> other answers.
>
> should't you be sleeping??? (I hope this email helps;-)
>
>

Thanks Charles, I understand now why you need a placeholder.
However, I am not certain why BCSequenceProtein.h should be public. I still don't like the idea of having the possibility of the user to use the subclasses. And yes, I got some sleep. - Koen. From kvddrift at earthlink.net Sun Jan 9 12:09:24 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 9 Jan 2005 12:09:24 -0500 Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?) In-Reply-To: References: Message-ID: <367005B6-6261-11D9-B4BF-003065A5FDCC@earthlink.net> On Jan 9, 2005, at 9:49 AM, John Timmer wrote: > > Non-typed methods mean that the sequence type has to be checked every > time > the method is called, slowing the code down. How much slow down will that be compared to the rest of the method? Most methods iterate over the sequence, so that will probably take much more time compared to one call to check the type. > > Although the class cluster idea is very appealing on an intellectual > level, > it's going to take some extra work to implement it in such a way that > users > (again, meaning non-contributing developers) will be able to grasp it > easily. And the internal structure is probably going to be extremely > confusing to anyone downloading the source for the first time. Given > that, > I'm just wondering whether it's going to be more effort than it's > worth. As > someone noted a few mails ago, it's going to require extensive > documentation, and the assumption that developers are any better about > reading the documentation than regular users are. I agree with that. The class cluster design still has the same issues as the current code (type check yes or no, return values, etc), and there are no real big advantages (at least I don't see them :). It's good that we keep talking on this subject, though, because it is the core of BioCocoa, and we all should feel comfortable with whatever we eventually decide on. BTW, how about a class cluster for BCSymbol???? ;-) cheers, - Koen. 
From peter.schols at bio.kuleuven.ac.be Sun Jan 9 13:05:06 2005
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Sun, 9 Jan 2005 19:05:06 +0100
Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?)
In-Reply-To:
References:
Message-ID:

Thanks John, this clears some things up for me.

One of the good things about Cocoa - at least for me - is that you need very few "checks" when compared to other languages / frameworks. Some Java source code I have looked at (I'm not a Java expert) had many more checks. I agree that we have to make sure that the framework itself takes care of most of the checking (by offering the possibility to have errors caught at compile time), so that the code written by users of the framework can be as simple as possible.

I was - and I still am - a big proponent of the class cluster approach because it would offer the user of the framework a very easy interface. I think we would all prefer that the framework would handle all the details for us behind the scenes with one, powerful BCSequence class as the only interface.

However, now that John convinced me that strong typing is necessary as well - or should at least be available for the users who want it - I'm not so sure what value the class cluster approach would add. Let me explain this: the reason I love the NSString cluster, for example, is that I don't have to bother choosing between NSSmallString and NSLargeMutableString or whatever class I'm actually creating by creating an NSString. And that I don't have to do the type checking to see whether the particular NSXXXString instance responds to a given method. But what would be the point of it if I was forced to check the type of NSString the framework created for me (which would expose me to the different classes in the cluster anyway)?

In other words, I'd love the class cluster approach with one, good-for-all BCSequence class if this were the only interface, just like with NSString or NSArray.
But what would be the point of having the cluster around if we will need to deal with the subclasses (or related, more specific classes) anyway? Wouldn't that be an unnecessary duplication of the interface users will have to learn, or at least be confusing? The point I want to make is: shouldn't we choose one approach, instead of offering two options to the users?

Again, this is just a question I'm asking myself, maybe I'm completely wrong. I have never designed a framework before, I just want to help out by thinking about this and by asking - possibly stupid - questions (and by writing some code as soon as we have taken a decision, of course ;-)).

Peter

On 09 Jan 2005, at 15:49, John Timmer wrote:

>
>> I have one question though: if I get it right, the only reason we need
>> the strongly typed approach as an extra option to the good-for-all
>> BCSequence object is that people will get compiler warnings? Or are
>> there additional reasons? To paraphrase Koen, this is an educational
>> question, not a critical one ;-)
>>
>
> As probably the strongest advocate for it, let me give a few reasons:
>
> The compiler warnings are definitely one thing - in practice, they're the
> biggest way I identify when I've been sloppy with my coding before resorting
> to debugging.
>
> Non-typed methods mean that the sequence type has to be checked every time
> the method is called, slowing the code down.
>
> Uncertain return values mean that careful developers will have to surround
> every method call with tests (did it return nil? Was the returned sequence
> length 0?) that slow the code down and are very tedious to constantly
> implement.
>
> How are we going to define a sensible return value for a method call that
> makes no sense in the first place? Is nil appropriate? Throwing an
> exception?
>
> With typed classes, methods could actually be grouped with the data they
> could operate on, instead of in with data they may or may not operate on.
> > At 4 non-abstract classes to represent all sequences, I hadn't thought > the > structure was that bad. > > What is the advantage of allowing something that makes no biological > sense > (ie - complementing a protein)? > > > I'm sure I could think of more were I not a bit foggy headed from cold > medication. It just feels like we're twisting the biology in order to > achieve code elegance. > > And I'm not even certain we're achieving that - it feels like the > equivalent > of getting rid of all of the UI control classes in order to achieve > the code > purity of only interacting with NSViews. A button responds to > different > things and conveys different information than a slider does - they're > different classes. The same could be said for a protein and a DNA > sequence > - why treat them like they're the same class? > > Although the class cluster idea is very appealing on an intellectual > level, > it's going to take some extra work to implement it in such a way that > users > (again, meaning non-contributing developers) will be able to grasp it > easily. And the internal structure is probably going to be extremely > confusing to anyone downloading the source for the first time. Given > that, > I'm just wondering whether it's going to be more effort than it's > worth. As > someone noted a few mails ago, it's going to require extensive > documentation, and the assumption that developers are any better about > reading the documentation than regular users are. > > > > As an aside, I was struck by the following quote from Charles: >> 2. Oups, BCSequenceProtein.h does also import BCSequence.h, so the >> compiler >> thinks that BCSequenceProtein can respond to '-complement'. Well, and >> all the >> methods. So much for a strongly-typed sequence class!! What do I do?? >> >> OK, I remove -complement from the BCSequence.h header, and only put >> it in >> BCSequenceDNA.h, but not in BCSequenceProtein.h >> >> 3. 
A?e, now BCSequence gets some compiler warnings when trying to use >> -complement. How do I prevent that?? > Item 3 is the whole point - it's a good thing, not something that > needs to > be prevented ;). > > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From kvddrift at earthlink.net Sun Jan 9 13:19:33 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 9 Jan 2005 13:19:33 -0500 Subject: [Biococoa-dev] BioCocoa invitation In-Reply-To: References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> Message-ID: <0316E8AA-626B-11D9-A15A-000A95685F72@earthlink.net> On Jan 9, 2005, at 12:45 AM, Charles PARNOT wrote: > Maybe you could also simply put up a small annoucement with about the > same contents in the BioCocoa page of the wiki (there is one, right?), > dated and stating that Biococoa is looking for developers that know > their biology, at least for discussion (and not just any developer). > See here: http://www.cocoadev.com/index.pl?BioCocoa - Koen. From mek at mekentosj.com Sun Jan 9 16:34:59 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 9 Jan 2005 22:34:59 +0100 Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?) In-Reply-To: References: Message-ID: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com> Peter sums up my feelings perfectly: > In other words, I'd love the class cluster approach with one, > good-for-all BCSequence class if this were the only interface, just > like with NSString or NSArray. But what would be the point of having > the cluster around if we will need to deal with the subclasses (or > related, more specific classes) anyway. Wouldn't that be an > unnecessary duplication of the interface users will have to learn, or > at least be confusing? 
> The point I want to make is: shouldn't we choose one approach, instead of offering two options to the users?

Absolutely! I'm definitely in favor of this brilliant one-does-it-all BCSequence class, but the moment we're moving into the situation where you're using the subclasses in the end, there's no use to go in that direction IMHO. I think we should choose either way.

> Again, this is just a question I'm asking myself, maybe I'm completely wrong. I have never designed a framework before, I just want to help out by thinking about this and by asking - possibly stupid - questions (and by writing some code as soon as we have taken a decision, of course ;-)).

Same here!
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2-bit company without 1 bit of sense.
*********************************************************

From mek at mekentosj.com Sun Jan 9 16:35:33 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 9 Jan 2005 22:35:33 +0100
Subject: [Biococoa-dev] BioCocoa invitation
In-Reply-To: <0316E8AA-626B-11D9-A15A-000A95685F72@earthlink.net>
References: <0E076119-61BB-11D9-8F75-000A95685F72@earthlink.net> <0316E8AA-626B-11D9-A15A-000A95685F72@earthlink.net>
Message-ID: <6450DAF2-6286-11D9-9A51-000D93AE89A4@mekentosj.com>

Nice!
On 9 Jan 2005, at 19:19, Koen van der Drift wrote:

> On Jan 9, 2005, at 12:45 AM, Charles PARNOT wrote:
>
>> Maybe you could also simply put up a small announcement with about the same contents in the BioCocoa page of the wiki (there is one, right?), dated and stating that Biococoa is looking for developers that know their biology, at least for discussion (and not just any developer).
>
> See here: http://www.cocoadev.com/index.pl?BioCocoa
>
> - Koen.

From charles.parnot at stanford.edu Tue Jan 11 02:35:10 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 10 Jan 2005 23:35:10 -0800
Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?)
In-Reply-To:
References:
Message-ID:

Hi all,

I have a cold too, what a coincidence! I am writing under the influence of Actifed (I feel like I am floating, very nice). In responding to John here, I just want to show (again) that both approaches have advantages and pitfalls (well, because John is already defending typed sequences, I can only show their pitfalls and I can only say good things about a generic BCSequenceGeneric class...).

At 9:49 AM -0500 1/9/05, John Timmer wrote:
>Non-typed methods mean that the sequence type has to be checked every time the method is called, slowing the code down.

Actually, this is exactly the problem I could have with typed classes... and that you don't have with BCSequenceGeneric. The good thing with a generic sequence is that you don't have to check, you know it responds (how to respond to irrelevant methods is another topic, see below!). Conversely, I can see where I could have some checking to do with typed classes. Let's say I am writing the new killer app, say... DNAStrander. It can load proteins, plasmids,... and export/import all sorts of formats (wow!). Of course, my document has a BCSequence ivar to hold the sequence (I have to use the superclass type because it could be any sequence). Imagine the user chooses complement in the menu. My code can't just send the message complement to my BCSequence ivar. I have to check the sequence type (to avoid irrelevant behavior that I don't want to handle) and cast my ivar to a BCSequenceDNA (to prevent compiler warnings) before sending the message. As soon as you use a mix of different types of sequences in an app, you get into trouble, because you have to use the superclass to refer to them as a whole, and then identify their type and possibly add some casts to get the messages right. Plus, if more sequence types are added in the future, you will have to go through all your code to add the corresponding case tests.

>Uncertain return values mean that careful developers will have to surround every method call with tests (did it return nil? Was the returned sequence length 0?) that slow the code down and are very tedious to constantly implement.
>
>How are we going to define a sensible return value for a method call that makes no sense in the first place? Is nil appropriate? Throwing an exception?

If a header says a method is handled, it should not crash the app. So, at least, I don't think throwing an exception is appropriate in the case of a generic sequence. I would also ban nil as much as possible. Here are examples of possible behaviors:
* complement of a protein --> self or empty sequence
* cutting a protein with an enzyme --> return an empty array or an array with just the protein
* hydrophobicity of DNA --> return 0
* align a DNA and a protein --> align next to each other

I don't think it will crash the app as long as you get objects of the expected types. It may result in weird behavior in the final app, but only in cases where the final user does equally weird things.

>With typed classes, methods could actually be grouped with the data they could operate on, instead of in with data they may or may not operate on.

This is what would happen anyway in the design with a placeholder class BCSequenceGeneric. Methods specific to one type will be written in the corresponding subclass. To handle all other cases, the superclass would step in and return something consistent with the expected return type. I think it could work mostly with one-liners, like 'return self' or 'return [NSArray array];',... About alignment, more tests may have to be done (which would anyway have to be done by the user otherwise). Alignment could involve passing an NSArray of sequences, so some sort of type checking will probably be needed no matter what.

>At 4 non-abstract classes to represent all sequences, I hadn't thought the structure was that bad.

I think it is actually good, and I am proposing to take advantage of that great structure to add another potentially useful class =)

>What is the advantage of allowing something that makes no biological sense (ie - complementing a protein)?

I gave some examples, and for me it is also a general sense that this could work for some types of apps (a gut feeling).

>I'm sure I could think of more were I not a bit foggy headed from cold medication. It just feels like we're twisting the biology in order to achieve code elegance.

Yes, elegance, robustness, conciseness, backward-compatibility. Sorry, I am just throwing words here and I am being a bit pedantic (am I?). I don't want to be too long and explain all over again why I think it could be true in a number of situations, and that summarizes my feelings.

>And I'm not even certain we're achieving that - it feels like the equivalent of getting rid of all of the UI control classes in order to achieve the code purity of only interacting with NSViews. A button responds to different things and conveys different information than a slider does - they're different classes. The same could be said for a protein and a DNA sequence - why treat them like they're the same class?

>Although the class cluster idea is very appealing on an intellectual level, it's going to take some extra work to implement it in such a way that users (again, meaning non-contributing developers) will be able to grasp it easily. And the internal structure is probably going to be extremely confusing to anyone downloading the source for the first time. Given that, I'm just wondering whether it's going to be more effort than it's worth. As someone noted a few mails ago, it's going to require extensive documentation, and the assumption that developers are any better about reading the documentation than regular users are.

I am not fighting for the class cluster per se. I am defending the existence of a good-for-all generic sequence class that will allow simple apps to be designed by the user with very little work. The pseudo class cluster I proposed in my email was a way to minimize code writing (it should not be called a class cluster anymore, really, maybe simply a placeholder design). The idea is to have just two methods '-initWithString' and '-sequenceWithString' in a placeholder class BCSequenceGeneric.
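A minimal sketch of what such a placeholder could look like is given here. It is not code from the BioCocoa project: the class bodies, the helper +typedClassForString: and its naive "only A, C, G and T means DNA" guess are all assumptions made for illustration, and it is written for a current compiler with ARC rather than for the 2005 toolchain.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical, simplified stand-ins for the BioCocoa classes; not the real headers.
@interface BCSequence : NSObject {
    NSString *sequenceString;
}
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
@end

@interface BCSequenceDNA : BCSequence
@end

@interface BCSequenceProtein : BCSequence
@end

// Placeholder class: only used as an entry point, never kept as an instance.
@interface BCSequenceGeneric : BCSequence
+ (id)sequenceWithString:(NSString *)aString;
@end

@implementation BCSequence
- (id)initWithString:(NSString *)aString {
    self = [super init];
    if (self) {
        sequenceString = [aString copy];
    }
    return self;
}
- (NSString *)sequenceString {
    return sequenceString;
}
@end

@implementation BCSequenceDNA
@end

@implementation BCSequenceProtein
@end

@implementation BCSequenceGeneric
// Assumption of this sketch: a string made only of A, C, G and T is DNA,
// anything else is treated as a protein. A real framework would need more.
+ (Class)typedClassForString:(NSString *)aString {
    NSCharacterSet *nonDNA =
        [[NSCharacterSet characterSetWithCharactersInString:@"ACGT"] invertedSet];
    NSRange found = [aString rangeOfCharacterFromSet:nonDNA];
    return (found.location == NSNotFound) ? [BCSequenceDNA class]
                                          : [BCSequenceProtein class];
}

// Hands back an instance of a typed subclass instead of the placeholder itself.
- (id)initWithString:(NSString *)aString {
    Class typedClass = [BCSequenceGeneric typedClassForString:aString];
    return [[typedClass alloc] initWithString:aString];
}

+ (id)sequenceWithString:(NSString *)aString {
    return [[self alloc] initWithString:aString];
}
@end

int main(void) {
    @autoreleasepool {
        BCSequence *first = [BCSequenceGeneric sequenceWithString:@"ATGC"];
        BCSequence *second = [[BCSequenceGeneric alloc] initWithString:@"MKVLA"];
        // Log the concrete class that each call actually returned.
        NSLog(@"%@ -> %@", [first sequenceString], NSStringFromClass([first class]));
        NSLog(@"%@ -> %@", [second sequenceString], NSStringFromClass([second class]));
    }
    return 0;
}
```

With these assumptions the first string comes back as a BCSequenceDNA and the second as a BCSequenceProtein, so the caller never holds a BCSequenceGeneric instance.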
Once init-ed, the instances will rely completely on the code in the typed sequence subclasses. Any change in those will be automatically used without doing anything special. So, the additional code is very small: every time a new public method is added to one of the subclasses, we have to have code in the superclass that handles it no matter what (like I said above, at least something along the lines of 'return self'). In many cases, hopefully, it would be handled by the superclass anyway (like complement).

Regarding documentation and how the whole thing is received by the user and by a potential new BioCocoa developer, I agree some extra work will be needed depending on the option we choose. However, I don't think it is 'extensive' (e.g., getting the headerdoc for BCSequenceGeneric is just copy and paste of the headers from the typed BCSequence subclasses) and I don't think it would be so confusing for a potential developer. Because it is a general concern (including for me), I will try to write another email about this issue.

>As an aside, I was struck by the following quote from Charles:
>> 2. Oups, BCSequenceProtein.h does also import BCSequence.h, so the compiler thinks that BCSequenceProtein can respond to '-complement'. Well, and all the methods. So much for a strongly-typed sequence class!! What do I do??
>>
>> OK, I remove -complement from the BCSequence.h header, and only put it in BCSequenceDNA.h, but not in BCSequenceProtein.h
>>
>> 3. Aïe, now BCSequence gets some compiler warnings when trying to use -complement. How do I prevent that??
>Item 3 is the whole point - it's a good thing, not something that needs to be prevented ;).

Yes, this would be the most logical behavior in the current design if we were to have only typed sequences (and I would want it to be the behavior). Interestingly, even in my last proposal, it would also be the behavior ;-) Only the additional subclass BCSequenceGeneric would be granted all the methods. The abstract BCSequence would only have the methods common to all subclasses.

In conclusion, I still think there is a case for a BCSequenceGeneric. I am starting to think that there might not be a need to choose now (or ever) one design over the other. I hope I can get to that in another email!

And of course, I wish you a quick recovery, John!

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Tue Jan 11 02:38:49 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 10 Jan 2005 23:38:49 -0800
Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?)
In-Reply-To: <367005B6-6261-11D9-B4BF-003065A5FDCC@earthlink.net>
References: <367005B6-6261-11D9-B4BF-003065A5FDCC@earthlink.net>
Message-ID:

>BTW, how about a class cluster for BCSymbol???? ;-)
>
>cheers,
>
>- Koen.

mmmhh.. Well, let's keep that to ourselves for now, Koen, don't tell anybody ;-)

Charles

From charles.parnot at stanford.edu Tue Jan 11 03:59:55 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 11 Jan 2005 00:59:55 -0800
Subject: [Biococoa-dev] Should we choose?
In-Reply-To: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
Message-ID:

At 10:34 PM +0100 1/9/05, Alexander Griekspoor wrote:
>Peter sums up my feelings perfectly:
>
>>In other words, I'd love the class cluster approach with one, good-for-all BCSequence class if this were the only interface, just like with NSString or NSArray. But what would be the point of having the cluster around if we will need to deal with the subclasses (or related, more specific classes) anyway. Wouldn't that be an unnecessary duplication of the interface users will have to learn, or at least be confusing? The point I want to make is: shouldn't we choose one approach, instead of offering two options to the users?
>
>Absolutely! I'm definitely in favor of this brilliant one-does-it-all BCSequence class, but the moment we're moving into the situation where you're using the subclasses in the end, there's no use to go in that direction IMHO. I think we should choose either way.

OK, here are my latest thoughts. We may not have to choose now, and maybe we should not, for several reasons:

* the way I proposed to have BCSequenceGeneric implemented in my Saturday email, it just requires a few lines of code in this new subclass, and will automagically take advantage of the code in the other subclasses (see my other email today); so both designs can be developed at the same time at very little cost for the developers (the BCSequenceGeneric just living like a small harmless parasite; OK, better, living in perfect symbiosis with the others ;-); if ultimately we want to choose, there will be almost no refactoring needed; we either ditch BCSequenceGeneric, or make the other subclasses private (and probably promote BCSequenceGeneric to the superclass); what I am saying here is we can start coding now and think later (that should please Alex!!)

* because it can wait, proponents of both sides can live happily together (in symbiosis) within the BioCocoa team without the feeling that this is not right, which could hurt their motivation; again, we developers will also be the main users for a while, and giving each of us the possibility to do it the way we like will be a good motivation to keep contributing...

* as applications using the framework mature (mostly developed by the developers/users of the BioCocoa team), things might get clearer too; ultimately, it might make sense to have both BCSequenceGeneric and the other typed subclasses around at the same time; they have very distinct roles and distinct potential uses; the BCSequenceGeneric could be used in general purpose programs, while the other subclasses could be used in more specialized programs (this is just an example); users may like one approach vs the other for various other reasons; probably the user would not want to mix both approaches, though

This is the first part of my thoughts.

Following the third point, I just want to consider for a minute that we keep both designs around the way I proposed. I already talked about the coding effort, which I see as really small. What about the documentation issue, for both the user and a new developer? This is a general concern about having two designs at the same time, and I agree it might ultimately prove confusing. But let's just imagine it is there for now. How could we still do it right?

* documenting BCSequence; the point here is mostly the way you introduce it; I will try something...

----
BCSequence is an abstract superclass for the different types of sequences handled by the BioCocoa framework. The concrete subclasses include:
- BCSequenceDNA that handles ....
- BCSequenceRNA that handles ....
- BCSequenceProtein that handles ....
- BCSequenceCodon that handles ....

In addition, BCSequence has another concrete subclass called BCSequenceGeneric. This subclass encompasses all the different types of sequences and can respond to all of the messages normally handled by only one or a subset of the other subclasses BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... Thus, BCSequenceGeneric is a general purpose class that can handle all of the messages that any type of sequence could have to respond to.

It is the user's choice to use the weakly typed BCSequenceGeneric class or to use the set of typed subclasses. This choice will depend on the type of applications developed, and will also depend on the user's personal taste. The framework has been designed to function with both approaches, though mixing the two might prove confusing and is not recommended. BCSequenceGeneric will appear more powerful and flexible for developing general purpose applications. The use of the typed sequence classes BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... will allow more control over the details of the app behavior, and might be more appropriate for more specialized applications. Finally, it might also simply be a matter of taste.

Note that BCSequenceGeneric is designed to automatically use the implementation of the typed sequence classes. Because of this, the behavior and the performance of the general purpose class are strictly equivalent to those of the corresponding typed sequence class, in any given situation.
-----

* documenting BCSequenceGeneric - introduction...

----
BCSequenceGeneric is a concrete subclass of BCSequence. As suggested by its name, BCSequenceGeneric provides a generic interface to all the sequence types (DNA, RNA, protein,...). In reality, BCSequenceGeneric is just a placeholder class. After initialization, it will actually return an instance of one of the typed subclasses BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... Its functioning is very similar to the class cluster design. Importantly, this is all transparent, so the user of the BioCocoa framework does not have to know about the details (and is better off ignoring them, actually). This design also results in behavior indistinguishable from the underlying typed sequence classes BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,..., and has no cost in performance over using those subclasses explicitly.

When a method is appropriately called on the right sequence type (like calling hydrophobicity for a protein), it automatically uses the appropriate implementation of the subclass. When the method is irrelevant for the sequence type (like calling hydrophobicity for a DNA sequence), the method still returns a value of the expected type, such as an empty sequence, an empty array, or a zero value. This way, the developer should be able to use BCSequenceGeneric in all situations without having to check the sequence type or fear runtime errors. By leaving the details for the framework to handle, the application requires less code and its behavior will be more general.

If more control is needed over the application behavior, or if different types of sequences are handled by separate parts of the application, the developer might consider explicitly using the other subclasses of BCSequence, namely BCSequenceDNA, BCSequenceRNA and BCSequenceProtein.
----

* documenting the methods of BCSequenceGeneric: copy and paste of the headers from BCSequenceDNA/RNA/...

* explaining the design to a new developer. Reading the user docs will introduce the concept just as well. The class hierarchy itself makes sense. Once the purpose of BCSequenceGeneric is understood, the implementation is trivial. The concept of a placeholder class is either already known, or new, in which case the new developer will learn something. They can then forget about the details.

I may be missing some other details (or huge problems?), but it does not seem so difficult to explain, does it?

OK, I will stop here!
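The fallback behavior described in the draft documentation above (an irrelevant method still returns a value of the expected type) can be sketched as follows. This is only an illustration under stated assumptions, not BioCocoa code: the classes are reduced to a bare string wrapper, the catch-all methods are declared directly on BCSequence for brevity (the thread would expose them through BCSequenceGeneric only), the hydrophobicity value is a toy fraction rather than a real scale, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical, simplified classes; names follow the thread, bodies are assumptions.
@interface BCSequence : NSObject {
    NSString *sequenceString;
}
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
- (BCSequence *)complement;   // fallback for irrelevant types: returns self
- (float)hydrophobicity;      // fallback for irrelevant types: returns 0
@end

@interface BCSequenceDNA : BCSequence
@end

@interface BCSequenceProtein : BCSequence
@end

@implementation BCSequence
- (id)initWithString:(NSString *)aString {
    self = [super init];
    if (self) {
        sequenceString = [aString copy];
    }
    return self;
}
- (NSString *)sequenceString { return sequenceString; }
// One-line defaults: same return type as the typed versions, never nil.
- (BCSequence *)complement { return self; }
- (float)hydrophobicity { return 0.0f; }
@end

@implementation BCSequenceDNA
// Typed implementation: swaps A/T and C/G, leaves other symbols untouched.
- (BCSequence *)complement {
    NSString *source = [self sequenceString];
    NSMutableString *result = [NSMutableString stringWithCapacity:[source length]];
    for (NSUInteger i = 0; i < [source length]; i++) {
        unichar base = [source characterAtIndex:i];
        unichar paired;
        switch (base) {
            case 'A': paired = 'T'; break;
            case 'T': paired = 'A'; break;
            case 'C': paired = 'G'; break;
            case 'G': paired = 'C'; break;
            default:  paired = base; break;
        }
        [result appendFormat:@"%C", paired];
    }
    return [[BCSequenceDNA alloc] initWithString:result];
}
@end

@implementation BCSequenceProtein
// Toy value (assumption): fraction of residues that are A, I, L or V.
- (float)hydrophobicity {
    NSString *source = [self sequenceString];
    if ([source length] == 0) return 0.0f;
    NSCharacterSet *hydrophobic =
        [NSCharacterSet characterSetWithCharactersInString:@"AILV"];
    NSUInteger count = 0;
    for (NSUInteger i = 0; i < [source length]; i++) {
        if ([hydrophobic characterIsMember:[source characterAtIndex:i]]) count++;
    }
    return (float)count / (float)[source length];
}
@end

int main(void) {
    @autoreleasepool {
        NSArray *sequences = [NSArray arrayWithObjects:
            [[BCSequenceDNA alloc] initWithString:@"ATGC"],
            [[BCSequenceProtein alloc] initWithString:@"MKVL"], nil];
        // The same two messages go to every sequence, with no type check or cast.
        for (BCSequence *seq in sequences) {
            NSLog(@"%@ (%@): complement=%@ hydrophobicity=%.2f",
                  [seq sequenceString], NSStringFromClass([seq class]),
                  [[seq complement] sequenceString], [seq hydrophobicity]);
        }
    }
    return 0;
}
```

In this sketch the DNA sequence gets a real complement and a hydrophobicity of 0, while the protein gets itself back as its "complement", which is the kind of harmless default the draft documentation describes.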
In conlusion, I believe we could keep the existing code, start coding again now, keep the two designs around, and choose later the best design. Or not even choose. In which case there might be ways to present it to the user, the easiest path here being plain honest about the schizophrenic aspect of the framework. good night, Charles -- Charles Parnot charles.parnot at stanford.edu Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From peter.schols at bio.kuleuven.ac.be Tue Jan 11 05:21:36 2005 From: peter.schols at bio.kuleuven.ac.be (Peter Schols) Date: Tue, 11 Jan 2005 11:21:36 +0100 Subject: [Biococoa-dev] Should we choose? In-Reply-To: References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com> Message-ID: <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be> Hi Charles, Thanks for your reassuring mail. Your description takes care of my major concerns about the schizophrenic nature of the framework and about the difficulties we could face explaining the interface to our users (developers). It seems that BCSequenceGeneric would require very little effort to create and even less effort to maintain while offering users easy-access to the entire framework. I also like your API documentation proposals. It's my opinion too that - if everybody agrees with this structure - we can start implementing it. Peter On 11 Jan 2005, at 09:59, Charles PARNOT wrote: > At 10:34 PM +0100 1/9/05, Alexander Griekspoor wrote: >> Peter sums up my feelings perfectly: >> >>> In other words, I'd love the class cluster approach with one, >>> good-for-all BCSequence class if this were the only interface, just >>> like with NSString or NSArray. But what would be the point of having >>> the cluster around if we will need to deal with the subclasses (or >>> related, more specific classes) anyway. 
Wouldn't that be an >>> unnecessary duplication of the interface users will have to learn, >>> or at least be confusing? The point I want to make is: shouldn't we >>> choose one approach, in stead of offering two options to the users? >> >> Absolutely! I'm definitely in favor of this brilliant one-does-it-all >> BCSequence class, but the moment we're moving into the situation >> where you're using the subclasses in the end, there's no use to go in >> that direction IMHO. I think we should choose either way. > > OK, here is my latest thoughts. We may not have to choose now, and > maybe we should not, for several reasons: > > * the way I proposed to have BCSequenceGeneric implemented in my > Saturday's email, it just requires a few lines of code in this new > subclass, and will automagically take advantage of the code in the > other subclasses (see my other email today); so both designs can be > developed at the same time at very little cost for the developers (the > BCSequenceGeneric just living like a small harmless parasite; OK, > better, living in perfect symbiosis with the others;-); if ultimately > we want to choose, there will be almost no refactoring needed; we > either ditch BCSequenceGeneric, or make the other subclasses private > (and probably promote BCSequenceGeneric to the superclass); what I am > saying here is we can start coding now and think later (that should > please Alex!!) > > * because it can wait, proponents of both sides can live happily > together (in symbiosis) within the BioCocoa team without the feeling > that this is not right, which could hurt their motivation; again, us > developers will be also the main users for a while, and giving the > possibility for each of us to do it the way we like it will be a good > motivation to keep contributing... 
> > * as applications using the frameworks get mature (mostly developed by > the developer/user of the Biococoa team), things might get clearer > too; ultimately, it might make sense to have both BCSequenceGeneric as > well as the other typed subclasses around at the same time; they have > very distinct roles and distinct potential uses; the BCSequenceGeneric > could be used in general purpose program, while the other subclasses > could be used in more specialized programs (this is just an example); > users may like one approach vs the other for various other reasons; > probably the user would not want to mix both approaches, though; > > This is the first part of my thoughts. > > > Following the third point, I just want to consider for a minute that > we keep both designs around the way I proposed it. I already talked > about the coding effort, which I see as really small. What about the > documentation issue, for both the user and a new developer? This is a > general concern about having two designs at the same time, and I agree > it might ultimately prove confusing. But let's just imagine it is > there for now. How could we still do it right? > > * documenting BCSequence; the purpose here is mostly the way you > introduce it; I will try something... > > ---- > BCSequence is an abstract superclass for the different type of > sequences handled by the BioCocoa framework. The concrete subclasses > include: > - BCSequenceDNA that handles .... > - BCSequenceRNA that handles .... > - BCSequenceProtein that handles .... > - BCSequenceCodon that handles .... > > In addition, BCSequence has an other concrete subclass called > BCSequenceGeneric. This subclass encompasses all the different types > of sequences and can respond to all of the messages normally > specifically handled by only one or a subset of the other subclasses > BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... 
Thus,
> BCSequenceGeneric is a general-purpose class that can handle all of the
> messages that any type of sequence could have to respond to.
>
> It is the user's choice to use the weakly typed BCSequenceGeneric class
> or to use the set of typed subclasses. This choice will depend on the
> type of applications developed, and will also depend on the user's
> personal taste. The framework has been designed to function with both
> approaches, though mixing the two might prove confusing and is not
> recommended. BCSequenceGeneric will appear more powerful and flexible
> for developing general-purpose applications. The use of the typed sequence
> classes BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... will
> allow more control over the details of the app behavior, and might be
> more appropriate for more specialized applications. Finally, it might
> also simply be a matter of taste.
>
> Note that BCSequenceGeneric is designed to automatically use the
> implementation of the typed sequence classes. Because of this, the
> behavior and the performance of the general-purpose class are strictly
> equivalent to that of the corresponding typed sequence class, in any
> given situation.
> -----
>
>
> * documenting BCSequenceGeneric - introduction...
> ----
> BCSequenceGeneric is a concrete subclass of BCSequence. As suggested
> by its name, BCSequenceGeneric provides a generic interface to all the
> sequence types (DNA, RNA, protein,...). In reality, BCSequenceGeneric
> is just a placeholder class. After initialization, it will actually
> return an instance of one of the typed subclasses BCSequenceDNA,
> BCSequenceRNA, BCSequenceProtein,... Its functioning is very similar
> to the class cluster design. Importantly, this is all transparent, so
> the user of the BioCocoa framework does not have to know about the
> details (and is better off ignoring them, actually).
Importantly, this
> design results in behavior indistinguishable from the underlying typed
> sequence classes BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,...,
> and has no cost in performance over using those subclasses
> explicitly.
>
> When a method is appropriately called on the right sequence type (like
> calling hydrophobicity for a protein), it automatically uses the
> appropriate implementation of the subclass. When the method is
> irrelevant for the sequence type (like calling hydrophobicity for a
> DNA sequence), the method still returns a value of the expected type,
> such as an empty sequence, an empty array, or a zero value. This way,
> the developer should be able to use BCSequenceGeneric in all
> situations without having to check the sequence type or fear runtime
> errors. By leaving the details for the framework to handle, the
> application requires less code and its behavior will be more general.
>
> If more control is needed over the application behavior, or if
> different types of sequences are handled by separate parts of the
> application, the developer might consider explicitly using the other
> subclasses of BCSequence, namely BCSequenceDNA, BCSequenceRNA and
> BCSequenceProtein.
> ----
>
> * documenting the methods of BCSequenceGeneric: copy and paste of the
> headers from BCSequenceDNA/RNA/...
>
> * explaining the design to a new developer. Reading the user docs will
> introduce the concept just as well. The class hierarchy itself makes
> sense. Once the purpose of BCSequenceGeneric is understood, the
> implementation is trivial. The concept of a placeholder class is
> either already known, or new, in which case the new developer will
> learn something. He can then forget about the details.
>
>
> I may be missing some other details (or huge problems?), but it seems
> not so difficult to explain, is it?
>
>
> OK, I will stop here!
> In conclusion, I believe we could keep the existing code, start coding
> again now, keep the two designs around, and choose the best design
> later. Or not even choose. In which case there might be ways to
> present it to the user, the easiest path here being plain honest about
> the schizophrenic aspect of the framework.
>
> good night,
>
> Charles
>
> --
> Charles Parnot
> charles.parnot at stanford.edu
>
> Help science go fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
>
> Room B157 in Beckman Center
> 279, Campus Drive
> Stanford University
> Stanford, CA 94305 (USA)
>
> Tel +1 650 725 7754
> Fax +1 650 725 8021
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>

From mek at mekentosj.com Tue Jan 11 05:37:09 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Tue, 11 Jan 2005 11:37:09 +0100
Subject: [Biococoa-dev] Should we choose?
In-Reply-To: <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
Message-ID:

On 11 Jan 2005, at 11:21, Peter Schols wrote:

> Hi Charles,
>
> Thanks for your reassuring mail. Your description takes care of my
> major concerns about the schizophrenic nature of the framework and
> about the difficulties we could face explaining the interface to our
> users (developers).
> It seems that BCSequenceGeneric would require very little effort to
> create and even less effort to maintain while offering users
> easy access to the entire framework.
> I also like your API documentation proposals.
>
> It's my opinion too that - if everybody agrees with this structure -
> we can start implementing it.

Hooray! Same here!
Nicely done Charles, and yes indeed it makes me very happy to see some implementations of all those ideas again, although I'm still looking forward to long emails ;-)
Cheers,
Alex

Ps. I'll take a look at the documentation proposals soon and see how we can organize that a bit...

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam

Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

4Peaks - For Peaks, Four Peaks.
2004 Winner of the Apple Design Awards
Best Mac OS X Student Product
http://www.mekentosj.com/4peaks
*********************************************************

From jtimmer at bellatlantic.net Tue Jan 11 08:22:09 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 11 Jan 2005 08:22:09 -0500
Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?)
In-Reply-To:
Message-ID:

Just to clarify one thing regarding the section below. I wasn't suggesting
our methods themselves were going to crash the app, but rather they make the
following situation more likely:

bob = [genericSequence complement];
bobArray = [bob sequenceArray];
id aSymbol = [bobArray objectAtIndex: 5]; <- app crashes here

Part of it may be just how I do things: when designing my own objects, I
put a lot of error checking into the initialization and transformation
routines, and then assume for all other purposes that I have an object with
valid internals. Faced with using a generic sequence, I wouldn't be able to
do that, and I'd be checking its length all the time to make sure it's
valid.

Anyway, I do agree that your plan for moving forward's a good one, and sorry
you've picked up a cold as well.
Cheers,

JT

>
>> Uncertain return values mean that careful developers will have to surround
>> every method call with tests (did it return nil? Was the returned sequence
>> length 0?) that slow the code down and are very tedious to constantly
>> implement.
>>
>> How are we going to define a sensible return value for a method call that
>> makes no sense in the first place? Is nil appropriate? Throwing an
>> exception?
>
> If a header says a method is handled, it should not crash the app. So, at
> least, I don't think throwing an exception is appropriate in the case of a
> generic sequence. I would also ban nil as much as possible.
>
> Here are examples of possible behaviors:
> * complement of a protein --> self or empty sequence
> * cutting a prot with enzyme --> return empty array or array with just the
> prot
> * hydrophobicity of DNA --> return 0
> * align a DNA and prot --> align next to each other
>
> I don't think it will crash the app as long as you get objects of the expected
> types. It may result in weird behavior on the final app, but only in cases
> where the final user does equally weird things.

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Tue Jan 11 20:37:29 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 11 Jan 2005 20:37:29 -0500
Subject: [Biococoa-dev] Should we choose?
In-Reply-To:
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
Message-ID: <85BD6180-643A-11D9-AEC9-000A95685F72@earthlink.net>

On Jan 11, 2005, at 5:37 AM, Alexander Griekspoor wrote:

> On 11 Jan 2005, at 11:21, Peter Schols wrote:
>> Hi Charles,
>>
>> Thanks for your reassuring mail. Your description takes care of my
>> major concerns about the schizophrenic nature of the framework and
>> about the difficulties we could face explaining the interface to our
>> users (developers).
>> It seems that BCSequenceGeneric would require very little effort to
>> create and even less effort to maintain while offering users
>> easy access to the entire framework.
>> I also like your API documentation proposals.
>>
>> It's my opinion too that - if everybody agrees with this structure -
>> we can start implementing it.
>
> Hooray! Same here! Nicely done Charles, and yes indeed it makes me
> very happy to see some implementations of all those ideas again,
> although I'm still looking forward to long emails ;-)
> Cheers,
> Alex
>
> Ps. I'll take a look at the documentation proposals soon and see how
> we can organize that a bit...
>
>

It seems like a good compromise to me, so let's move forward with this structure. I would like to add though that I think it is important that we have a symbolset as one of the ivars of the sequence class. This has several advantages:

1. If the user by accident creates a DNA or protein sequence with a nonsense character (e.g. a number), then the framework will not create the sequence, and can inform the user that there was an error in the input.

2. As soon as we have a valid sequence, there is no need to test on every iteration whether each BCSymbol is of the right type, because we already know the sequence was created using the correct subclass of BCSymbol. This will also speed things up.

3. The symbol set can be used to test if the right sequence is passed to a method. As said in an earlier email, the cost of one test is minimal compared to a MW calculation, translation, etc.


cheers,

- Koen (wants to buy a Mac mini and iPod Shuffle ;-)

From charles.parnot at stanford.edu Wed Jan 12 01:05:37 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 11 Jan 2005 22:05:37 -0800
Subject: [Biococoa-dev] Should we choose?
In-Reply-To: <85BD6180-643A-11D9-AEC9-000A95685F72@earthlink.net>
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
 <85BD6180-643A-11D9-AEC9-000A95685F72@earthlink.net>
Message-ID:

>It seems like a good compromise to me, so let's move forward with this structure. I would like to add though that I think it is important that we have a symbolset as one of the ivars of the sequence class. This has several advantages:
>
>1. If the user by accident creates a DNA or protein sequence with a nonsense character (e.g. a number), then the framework will not create the sequence, and can inform the user that there was an error in the input.
>
>2. As soon as we have a valid sequence, there is no need to test on every iteration whether each BCSymbol is of the right type, because we already know the sequence was created using the correct subclass of BCSymbol. This will also speed things up.
>
>3. The symbol set can be used to test if the right sequence is passed to a method. As said in an earlier email, the cost of one test is minimal compared to a MW calculation, translation, etc.
>
>
>cheers,
>
>- Koen (wants to buy a Mac mini and iPod Shuffle ;-)

Like I said, I think you guys have already set up a nice foundation! I see no reason to ditch the symbol set, and many to keep it!

Charles

(get both)

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Wed Jan 12 01:06:13 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 11 Jan 2005 22:06:13 -0800
Subject: [Biococoa-dev] Re: a new design to please everybody (am I pleased?)
Message-ID:

>Just to clarify one thing regarding the section below.
I wasn't suggesting
>our methods themselves were going to crash the app, but rather they make the
>following situation more likely:
>
>bob = [genericSequence complement];
>bobArray = [bob sequenceArray];
>id aSymbol = [bobArray objectAtIndex: 5]; <- app crashes here
>
>Part of it may be just how I do things: when designing my own objects, I
>put a lot of error checking into the initialization and transformation
>routines, and then assume for all other purposes that I have an object with
>valid internals. Faced with using a generic sequence, I wouldn't be able to
>do that, and I'd be checking its length all the time to make sure it's
>valid.
>
>Anyway, I do agree that your plan for moving forward's a good one, and sorry
>you've picked up a cold as well.
>
>Cheers, JT
>

I agree empty NSArrays can be very annoying (maybe even more than nil, sometimes!), and we should avoid them. For instance, we could return self instead of an empty sequence, or an array with only one sequence as the result of a digest.

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Wed Jan 12 01:41:27 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 11 Jan 2005 22:41:27 -0800
Subject: [Biococoa-dev] Should we choose?
In-Reply-To: <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
Message-ID:

>Hi Charles,
>
>Thanks for your reassuring mail. Your description takes care of my major concerns about the schizophrenic nature of the framework and about the difficulties we could face explaining the interface to our users (developers).
>It seems that BCSequenceGeneric would require very little effort to create and even less effort to maintain while offering users easy-access to the entire framework.
>I also like your API documentation proposals.
>
>It's my opinion too that - if everybody agrees with this structure - we can start implementing it.
>
>Peter
>

Wow, it looks like we get a nice unanimous agreement! I guess I will have to write the (very little) code needed for BCSequenceGeneric, now that I bragged so much about it. BTW, is the naming OK? I was tempted to propose simply BCSequence for the generic subclass, but I think in the end, BCSequenceGeneric (or something equivalent) could carry more sense and be clearer for everybody, particularly if living with other subclasses. And keeping the superclass name to BCSequence is also more consistent with the class hierarchy. OK, I know, I am just arguing with myself, here.

Next issues are:
* annotations (a new ivar in the superclass, and a category of the superclass for the implementation??)
* mutable/immutable: another hot debate coming soon... I might fall for the subclassing option...

Charles

NB: I will send some code to the list asap. I am not sure I want to deal with CVS immediately. Well, I am not a 'developer' yet, anyway ;-). The compiling of the most recent BioCocoa anonymously CVS-ed was troublesome with my Xcode settings for build paths, etc... and I need to figure that out before I mess up your project.
--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net Wed Jan 12 18:31:40 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 12 Jan 2005 18:31:40 -0500
Subject: [Biococoa-dev] Should we choose?
In-Reply-To:
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
Message-ID: <1CA148DA-64F2-11D9-9DC1-003065A5FDCC@earthlink.net>

On Jan 12, 2005, at 1:41 AM, Charles PARNOT wrote:

> Wow, it looks like we get a nice unanimous agreement! I guess I will
> have to write the (very little) code needed for BCSequenceGeneric, now
> that I bragged so much about it. BTW, is the naming OK? I was tempted
> to propose simply BCSequence for the generic subclass, but I think in
> the end, BCSequenceGeneric (or something equivalent) could carry more
> sense and be clearer for everybody, particularly if living with other
> subclasses. And keeping the superclass name to BCSequence is also more
> consistent with the class hierarchy. OK, I know, I am just arguing
> with myself, here.

Actually, I would prefer the name BCSequence for what the user will use, which is BCSequenceGeneric in your proposal, right? We can then rename BCSequence to something like BCAbstractSequence, or even promote the current BCSymbolList to that role.

>
> Next issues are:
> * annotations (a new ivar in the superclass, and a category of the
> superclass for the implementation??)

IIRC, I think we already agreed to make a BCAnnotation class, and put an ivar to it in the superclass.

cheers,

- Koen.

From charles.parnot at stanford.edu Wed Jan 12 19:32:35 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Wed, 12 Jan 2005 16:32:35 -0800
Subject: [Biococoa-dev] Should we choose?
Message-ID:

>On Jan 12, 2005, at 1:41 AM, Charles PARNOT wrote:
>
>>Wow, it looks like we get a nice unanimous agreement! I guess I
>>will have to write the (very little) code needed for
>>BCSequenceGeneric, now that I bragged so much about it. BTW, is the
>>naming OK? I was tempted to propose simply BCSequence for the
>>generic subclass, but I think in the end, BCSequenceGeneric (or
>>something equivalent) could carry more sense and be clearer for
>>everybody, particularly if living with other subclasses. And
>>keeping the superclass name to BCSequence is also more consistent
>>with the class hierarchy. OK, I know, I am just arguing with
>>myself, here.
>
>Actually, I would prefer the name BCSequence for what the user will
>use, which is BCSequenceGeneric in your proposal, right? We can then
>rename BCSequence to something like BCAbstractSequence, or even
>promote the current BCSymbolList to that role.

I don't really have a preference on this (I have contradictory thoughts that cancel each other), so I will let you guys decide.

>
>>
>>Next issues are:
>>* annotations (a new ivar in the superclass, and a category of the
>>superclass for the implementation??)
>
>IIRC, I think we already agreed to make a BCAnnotation class, and
>put an ivar to it in the superclass.
>

OK, I was not sure, hence the question marks. I am a complete newbie in the whole annotation discussion anyway, so I will just try to catch up as it goes...
Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net Thu Jan 13 06:34:51 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 13 Jan 2005 06:34:51 -0500
Subject: [Biococoa-dev] Should we choose?
In-Reply-To:
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
Message-ID: <235BCE88-6557-11D9-A650-003065A5FDCC@earthlink.net>

On Jan 12, 2005, at 1:41 AM, Charles PARNOT wrote:

> NB: I will send some code to the list asap. I am not sure I want to
> deal with CVS immediately. Well, I am not a 'developer' yet, anyway
> ;-). The compiling of the most recent BioCocoa anonymously CVS-ed was
> troublesome with my Xcode settings for build paths, etc... and I need
> to figure that out before I mess up your project.
>

See this older message on the list for setting up XCode and CVS:

http://bioinformatics.org/pipermail/biococoa-dev/2004-July/000017.html

Also make sure that you use the -P flag when checking out the code:

cvs checkout -P BioCocoa

This prevents you from downloading empty, deprecated folders to your project.

- Koen.

From charles.parnot at stanford.edu Thu Jan 13 23:18:39 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 13 Jan 2005 20:18:39 -0800
Subject: [Biococoa-dev] BioCocoa.framework path
Message-ID:

I identified one very little issue with the project settings.
This issue prevented me from compiling the demo apps 'out of the box' from the CVS-ed project (and could prevent others for the same reason).

The project includes a reference to the file 'BioCocoa.framework' (I am talking about the one in the root group, not the one in the 'Products' group). This reference is the one being used for building the demo apps (inside the corresponding targets, an alias of it is found in the 'Frameworks & Libraries' group).

The problem is that the path for this file is set to "build/BioCocoa.framework" and "relative to enclosing group". It should be set to "BioCocoa.framework" and "relative to build products". People can have different settings for the build products, and this ensures the right path will be used no matter what the settings are. It should not change the final path for any of you (apparently, you all have your build products and intermediates inside the project folder).

It would then work better for me as I have set my build products to go in a special folder (namely ~/Xcode/Products), and not inside the project (the reason is that I don't like to have this big 'build' folder inside my project; it is annoying when I copy the project from one computer to another, which I do quite often). And I may not be the only one with these settings.

I hope my explanations are clear!

Charles

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Thu Jan 13 23:47:26 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 13 Jan 2005 20:47:26 -0800
Subject: [Biococoa-dev] may I be a developer?
Message-ID:

Now that I think I have solved this issue I had with the compiling, maybe I could be added to the developer list.
Is that needed if I want to be able to commit, or can I already commit now (I have an account)?

thanks for the info! (BTW, thanks, Koen for your link for the ssh setup)

Charles

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From peter.schols at bio.kuleuven.ac.be Fri Jan 14 04:21:12 2005
From: peter.schols at bio.kuleuven.ac.be (Peter Schols)
Date: Fri, 14 Jan 2005 10:21:12 +0100
Subject: [Biococoa-dev] may I be a developer?
In-Reply-To:
References:
Message-ID:

Hi Charles,

To join the BC developer community, just create a bioinformatics.org account if you don't already have one and let me know your member name so I can add you to the list.

Best wishes,

Peter

On 14 Jan 2005, at 05:47, Charles PARNOT wrote:

> Now that I think I have solved this issue I had with the compiling,
> maybe I could be added to the developer list. Is that needed if I want
> to be able to commit, or can I already commit now (I have an account)?
>
> thanks for the info! (BTW, thanks, Koen for your link for the ssh
> setup)
>
> Charles
>
> --
> Help science go fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford/
>
> Charles Parnot
> charles.parnot at stanford.edu
>
> Room B157 in Beckman Center
> 279, Campus Drive
> Stanford University
> Stanford, CA 94305 (USA)
>
> Tel +1 650 725 7754
> Fax +1 650 725 8021
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>

From kvddrift at earthlink.net Fri Jan 14 19:44:57 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 14 Jan 2005 19:44:57 -0500
Subject: [Biococoa-dev] Should we choose?
In-Reply-To:
References: <5042FE2D-6286-11D9-9A51-000D93AE89A4@mekentosj.com>
 <92D7543E-63BA-11D9-A990-00039345483C@bio.kuleuven.ac.be>
Message-ID:

On Jan 12, 2005, at 1:41 AM, Charles PARNOT wrote:

> * mutable/immutable: another hot debate coming soon... I might fall
> for the subclassing option...
>

There's a good discussion going on about this subject on the cocoa-dev mailing list. I need to read more of the postings, but they have some interesting ideas about subclassing.

- Koen.

From charles.parnot at stanford.edu Mon Jan 17 13:24:49 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 17 Jan 2005 10:24:49 -0800
Subject: [Biococoa-dev] BioCocoa.framework path
In-Reply-To:
References:
Message-ID:

OK, I am officially a developer now :-)

I have set up ssh and all, and I am quite impressed by Xcode support for CVS. I had not tried in a while, and will probably give up the command-line and the CVS-GUI tool I had been using on my other projects!

My first commit was to fix that path issue documented below, let me know if it causes any trouble to you.

I will be working on the BCSequenceGeneric, or whatever we call it in the end.

Charles



At 20:18 -0800 1/13/05, Charles PARNOT wrote:
>I identified one very little issue with the project settings. This issue prevented me from compiling the demo apps 'out of the box' from the CVS-ed project (and could prevent others for the same reason).
>
>The project includes a reference to the file 'BioCocoa.framework' (I am talking about the one in the root group, not the one in the 'Products' group). This reference is the one being used for building the demo apps (inside the corresponding targets, an alias of it is found in the 'Frameworks & Libraries' group).
>
>The problem is that the path for this file is set to "build/BioCocoa.framework" and "relative to enclosing group". It should be set to "BioCocoa.framework" and "relative to build products".
People can have different settings for the build products, and this ensures the right path will be used no matter what the settings are. It should not change the final path for any of you (apparently, you all have your build products and intermediates inside the project folder).
>
>It would then work better for me as I have set my build products to go in a special folder (namely ~/Xcode/Products), and not inside the project (the reason is that I don't like to have this big 'build' folder inside my project; it is annoying when I copy the project from one computer to another, which I do quite often). And I may not be the only one with these settings.
>
>I hope my explanations are clear!
>
>Charles
>

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From kvddrift at earthlink.net Mon Jan 17 20:28:07 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 17 Jan 2005 20:28:07 -0500
Subject: [Biococoa-dev] BioCocoa.framework path
In-Reply-To:
References:
Message-ID: <34D145EA-68F0-11D9-A4CB-003065A5FDCC@earthlink.net>

On Jan 17, 2005, at 1:24 PM, Charles PARNOT wrote:

> OK, I am officially a developer now :-)
>
> I have set up ssh and all, and I am quite impressed by Xcode support
> for CVS. I had not tried in a while, and will probably give up the
> command-line and the CVS-GUI tool I had been using on my other
> projects!

Is it possible to make a branch using Xcode, or do I need to do that from the CLI?

> My first commit was to fix that path issue documented below, let me
> know if it causes any trouble to you.

No problems compiling here.

>
> I will be working on the BCSequenceGeneric, or whatever we call it in
> the end.

Great, I'm looking forward to seeing what it looks like in real code :)

- Koen.
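
[Editorial sketch, not part of the original thread.] The placeholder design for BCSequenceGeneric discussed in the messages above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code that was eventually committed to BioCocoa: it assumes ARC, a plain NSString as storage, and a deliberately crude rule for guessing the sequence type. The method names -initWithString: and -sequenceString and the guessing rule are illustrative assumptions; only the class names and -complement come from the thread.

```objective-c
#import <Foundation/Foundation.h>

// Abstract superclass. Storage is a plain string here (an assumption made
// for brevity; the thread talks about arrays of BCSymbol objects).
@interface BCSequence : NSObject {
    NSString *sequenceString;
}
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
- (BCSequence *)complement;
@end

@interface BCSequenceDNA : BCSequence
@end

@interface BCSequenceProtein : BCSequence
@end

// Placeholder class: its initializer hands back a typed subclass instance.
@interface BCSequenceGeneric : BCSequence
@end

@implementation BCSequence
- (id)initWithString:(NSString *)aString {
    self = [super init];
    if (self) {
        sequenceString = [aString copy];
    }
    return self;
}
- (NSString *)sequenceString {
    return sequenceString;
}
// Default for types where the call makes no sense (e.g. a protein):
// return self rather than nil or an empty sequence.
- (BCSequence *)complement {
    return self;
}
@end

@implementation BCSequenceDNA
- (BCSequence *)complement {
    NSString *source = [self sequenceString];
    NSMutableString *result = [NSMutableString stringWithCapacity:[source length]];
    for (NSUInteger i = 0; i < [source length]; i++) {
        unichar c = [source characterAtIndex:i];
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break; // leave any other symbol unchanged
        }
        [result appendFormat:@"%C", c];
    }
    return [[BCSequenceDNA alloc] initWithString:result];
}
@end

@implementation BCSequenceProtein
@end

@implementation BCSequenceGeneric
- (id)initWithString:(NSString *)aString {
    // Hypothetical guessing rule: only A, C, G, T means DNA, else protein.
    NSCharacterSet *nonDNA =
        [[NSCharacterSet characterSetWithCharactersInString:@"ACGT"] invertedSet];
    BOOL looksLikeDNA =
        ([aString rangeOfCharacterFromSet:nonDNA].location == NSNotFound);
    Class concrete = looksLikeDNA ? [BCSequenceDNA class] : [BCSequenceProtein class];
    // Swap the placeholder for an instance of the typed subclass.
    self = [[concrete alloc] initWithString:aString];
    return self;
}
@end
```

With these definitions, [[BCSequenceGeneric alloc] initWithString:@"GATTACA"] yields a BCSequenceDNA, and sending -complement to a sequence created from a protein string returns the same object, which is the "self instead of an empty sequence" behavior suggested earlier in the thread.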

From charles.parnot at stanford.edu Mon Jan 17 22:19:36 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 17 Jan 2005 19:19:36 -0800
Subject: [Biococoa-dev] BioCocoa.framework path
In-Reply-To: <34D145EA-68F0-11D9-A4CB-003065A5FDCC@earthlink.net>
References: <34D145EA-68F0-11D9-A4CB-003065A5FDCC@earthlink.net>
Message-ID:

At 8:28 PM -0500 1/17/05, Koen van der Drift wrote:
>On Jan 17, 2005, at 1:24 PM, Charles PARNOT wrote:
>
>>OK, I am officially a developer now :-)
>>
>> I have set up ssh and all, and I am quite impressed by Xcode support for CVS. I had not tried in a while, and will probably give up the command-line and the CVS-GUI tool I had been using on my other projects!
>
>
>Is it possible to make a branch using Xcode, or do I need to do that from the CLI?
>

Wow, too complicated for me!! I don't know. I just do basic commit/diff... And when I tried some time ago in Xcode, it was not quite there, plus some issues with nib files. Now it is much better.

Charles

--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From mek at mekentosj.com Tue Jan 18 06:02:03 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Tue, 18 Jan 2005 12:02:03 +0100
Subject: [Biococoa-dev] BioCocoa.framework path
In-Reply-To:
References:
Message-ID: <62A59916-6940-11D9-A430-000D93AE89A4@mekentosj.com>

> OK, I am officially a developer now :-)

Great!! Welcome ;-)

>
> I have set up ssh and all, and I am quite impressed by Xcode support
> for CVS. I had not tried in a while, and will probably give up the
> command-line and the CVS-GUI tool I had been using on my other
> projects!

It's very nice indeed, although I noticed that you have to commit some things by hand, like updated NIB files...

> My first commit was to fix that path issue documented below, let me
> know if it causes any trouble to you.

No problems here either.

>
> I will be working on the BCSequenceGeneric, or whatever we call it in
> the end.

Very nice, looking forward to that Charles,
Cheers,
Alex

P.S. Did you think about the xgrid widget already? Next week I'm on holiday, so if we want to drop it before then, we'd have to think about it before Thursday night....

>
> Charles
>
>
>
> At 20:18 -0800 1/13/05, Charles PARNOT wrote:
>> I identified one very little issue with the project settings. This
>> issue prevented me from compiling the demo apps 'out of the box' from
>> the CVS-ed project (and could prevent others for the same reason).
>>
>> The project includes a reference to the file 'BioCocoa.framework' (I
>> am talking about the one in the root group, not the one in the
>> 'Products' group). This reference is the one being used for building
>> the demo apps (inside the corresponding targets, an alias of it is
>> found in the 'Frameworks & Libraries' group).
>>
>> The problem is that the path for this file is set to
>> "build/BioCocoa.framework" and "relative to enclosing group". It
>> should be set to "BioCocoa.framework" and "relative to build
>> products". People can have different settings for the build products,
>> and this ensures the right path will be used no matter what the
>> settings are. It should not change anything to the final path for any
>> of you (apparently, you all have your build products and intermediate
>> inside the project folder).
>>
>> It would then work better for me as I have set my build products to
>> go in a special folder (namely ~/Xcode/Products), and not inside the
>> project (the reason is that I don't like to have this big 'build'
>> folder inside my project; it is annoying when I copy the project
>> from one computer to another, which I do quite often). And I may not
>> be the only one with these settings.
>>
>> I hope my explanations are clear!
>>
>> Charles
>>
>

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam

Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com

EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************

From charles.parnot at stanford.edu Tue Jan 18 17:56:56 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 18 Jan 2005 14:56:56 -0800
Subject: [Biococoa-dev] immutability and multithreading
Message-ID:

I just remembered another advantage of immutable objects: multithreading. In general, immutable objects are safe to share between threads; mutable ones are not. So if you only have mutable objects, you have to copy them before use and make sure the copy is private to the thread.

Multithreading support is probably not a priority, but it is something to keep in mind too.
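
As an illustration of that copy-before-use idea, here is a minimal sketch, not actual BioCocoa code. GCWorker is a hypothetical class made up for this example, a plain NSString stands in for a sequence object, and the code assumes ARC (under manual retain/release it would also need a -dealloc). The worker takes an immutable snapshot with -copy when it is created, so the caller can keep editing its own mutable string while the thread runs.

```objc
#import <Foundation/Foundation.h>

// GCWorker is a hypothetical class for illustration; it is not part of BioCocoa.
// It counts the G and C letters of a sequence string on a background thread.
@interface GCWorker : NSObject
{
    NSString *snapshot;     // immutable copy, only read after init
    NSConditionLock *gate;  // condition 0 = still running, 1 = finished
    NSUInteger gcCount;     // written by the worker thread under the lock
}
- (id)initWithSequence:(NSString *)aSequence;
- (void)start;
- (NSUInteger)waitForCount;
@end

@implementation GCWorker

- (id)initWithSequence:(NSString *)aSequence
{
    self = [super init];
    if (self != nil) {
        // -copy gives an immutable snapshot even when the caller passes an
        // NSMutableString, so later edits by the caller cannot reach it.
        snapshot = [aSequence copy];
        gate = [[NSConditionLock alloc] initWithCondition:0];
    }
    return self;
}

// Thread entry point: reads only the private, immutable snapshot.
- (void)run:(id)unused
{
    @autoreleasepool {
        NSUInteger i, n = 0, length = [snapshot length];
        for (i = 0; i < length; i++) {
            unichar c = [snapshot characterAtIndex:i];
            if (c == 'G' || c == 'C') n++;
        }
        [gate lock];
        gcCount = n;
        [gate unlockWithCondition:1];
    }
}

- (void)start
{
    [NSThread detachNewThreadSelector:@selector(run:) toTarget:self withObject:nil];
}

// Blocks until the worker thread has published its result.
- (NSUInteger)waitForCount
{
    [gate lockWhenCondition:1];
    NSUInteger n = gcCount;
    [gate unlockWithCondition:1];
    return n;
}

@end

int main(void)
{
    @autoreleasepool {
        NSMutableString *editable = [NSMutableString stringWithString:@"ATGCGC"];
        GCWorker *worker = [[GCWorker alloc] initWithSequence:editable];
        [worker start];
        // The caller keeps editing its own object while the thread runs.
        [editable appendString:@"GGGG"];
        NSLog(@"G+C in the snapshot: %lu", (unsigned long)[worker waitForCount]);
    }
    return 0;
}
```

The worker reports 4, the G and C letters of the original ATGCGC, whatever the caller appends afterwards. With an immutable sequence class the -copy would cost almost nothing, which is the point above: nothing has to be duplicated to keep the thread safe.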

Charles

From kvddrift at earthlink.net Sat Jan 22 17:01:46 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 22 Jan 2005 17:01:46 -0500
Subject: [Biococoa-dev] BioCocoa.framework path
In-Reply-To:
References:
Message-ID: <35302DC6-6CC1-11D9-8B33-003065A5FDCC@earthlink.net>

On Jan 17, 2005, at 1:24 PM, Charles PARNOT wrote:

> I will be working on the BCSequenceGeneric, or whatever we call it in
> the end.
>

And then there was silence... ;-)

Charles, do you need any help with this class? If you just have some snippets, or empty methods, go ahead and post them here or submit them to CVS. This way we all can have a look at what you have in mind, and maybe fill in some of the blanks.

- Koen.

From charles.parnot at stanford.edu Sun Jan 23 17:45:05 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Sun, 23 Jan 2005 14:45:05 -0800
Subject: [Biococoa-dev] BioCocoa.framework path
Message-ID:

>On Jan 17, 2005, at 1:24 PM, Charles PARNOT wrote:
>
>>I will be working on the BCSequenceGeneric, or whatever we call it in the end.
>>
>
>And then there was silence... ;-)

hey, it's been less than a week! :-)

>
>Charles, do you need any help with this class? If you just have some snippets, or empty methods, go ahead and post them here or submit them to CVS. This way we all can have a look at what you have in mind, and maybe fill in some of the blanks.
>
>
>- Koen.

OK, before writing the header, I wanted to make sure I fully understood the BCSequence design in terms of init methods. Along the way, I came up with a couple of questions that I was about to share with the list... when I have more than 5 min of time, because you know me, I need to write a long email!
I am the kind that likes to have the design fully thought out in detail before writing the code (which then does not take very long). Anyway, the contents of that class should be quite independent from the rest, so please go on with the other stuff!!

Maybe one thing that can be said in less than 5 minutes:
could you guys vote on the class naming:
* superclass = BCSequence or BCSequenceAbstract or ...
* generic subclass = BCSequence or BCSequenceGeneric or ...

Another email soon!

Charles

From kvddrift at earthlink.net Sun Jan 23 21:43:02 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 23 Jan 2005 21:43:02 -0500
Subject: [Biococoa-dev] BioCocoa.framework path
In-Reply-To:
References:
Message-ID:

On Jan 23, 2005, at 5:45 PM, Charles PARNOT wrote:

>
> Maybe one thing that can be said in less than 5 minutes:
> could you guys vote on the class naming:
> * superclass = BCSequence or BCSequenceAbstract or ...
> * generic subclass = BCSequence or BCSequenceGeneric or ...
>

I vote for BCSequence as the generic subclass, and BCAbstractSequence as the superclass.

- Koen.

From charles.parnot at stanford.edu Mon Jan 31 19:03:05 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 31 Jan 2005 16:03:05 -0800
Subject: [Biococoa-dev] BCSymbolList replacement
Message-ID:

OK, I finally took the time to write some code yesterday. I have done the most difficult part of it, I think, which was to refactor the BCSequence class tree:

* Superclass = BCAbstractSequence (Koen won the vote on the name, 100%!) that includes annotations (no code yet, though!!)
* BCSymbolList is thus gone; sorry Koen, it seems that I had to undo most of what you did when adding that class; the vote was for removal of that hierarchy for the sake of simplification; we don't have to, we can also keep BCSymbolList as the super-superclass and then BCAbstractSequence;
* BCSequence = one of the subclasses of BCAbstractSequence (I have not added all the code yet, I have more questions to ask you guys... next email!)
* BCSequenceDNA, BCSequenceRNA,... : same as before, they are simply now subclasses of BCAbstractSequence

Before committing, I just wanted to check one last time, because I had to modify ~30 classes to replace the 'symbolList' namings with 'sequence', also in method names (several 'Find and Replace' were necessary). The compilation is fine and everything seems to work OK. I don't know if you want to add a tag before such major refactoring (this is probably ~ the same as when you added the BCSymbolList, Koen?).

I suppose such major refactorings are bound to happen at the early stages of a project, and are a good thing? :-)

charles

From kvddrift at earthlink.net Mon Jan 31 19:16:52 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 31 Jan 2005 19:16:52 -0500
Subject: [Biococoa-dev] BCSymbolList replacement
In-Reply-To:
References:
Message-ID: <9b16c8d2b8afff9b8800873cfe65eca3@earthlink.net>

Great news, Charles. I look forward to your commits.

> Before committing, I just wanted to check one last time, because I had
> to modify ~30 classes to replace the 'symbolList' namings with
> 'sequence', also in method names (several 'Find and Replace' were
> necessary). The compilation is fine and everything seems to work OK.
> I don't know if you want to add a tag before such major refactoring
> (this is probably ~ the same as when you added the BCSymbolList,
> Koen?).

Actually, I didn't tag it (don't know how to do that with Xcode :). Alex did another reorganization a while ago, so you might want to check with him what would be the easiest procedure. However, since we all agree on this change (I hope we still do after we see what you did to the code ;-), I would say just commit it. It's in CVS, so we can always revert certain changes.

> I suppose such major refactorings are bound to happen at the early stages of
> a project, and are a good thing? :-)

I guess they happen a lot, but I am not sure if they are a good thing so early. However, all of us were/are pretty new at writing a framework with a group of developers, so we have a good excuse. Usually major changes (should) happen after a few stable releases. BioJava is currently in the process of a complete rewrite. I am glad I'm not in that project!

cheers,

- Koen.

From charles.parnot at stanford.edu Mon Jan 31 19:28:45 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 31 Jan 2005 16:28:45 -0800
Subject: [Biococoa-dev] BCSymbolList replacement
In-Reply-To: <9b16c8d2b8afff9b8800873cfe65eca3@earthlink.net>
References: <9b16c8d2b8afff9b8800873cfe65eca3@earthlink.net>
Message-ID:

At 19:16 -0500 1/31/05, Koen van der Drift wrote:
>Great news, Charles. I look forward to your commits.
>

OK, I'll just go ahead tonight (well, tonight is 24, so maybe tomorrow), and will stop slowing things down! We have to get something done with annotations too...

Charles