From kvddrift at earthlink.net Tue Mar 1 18:44:27 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 1 Mar 2005 18:44:27 -0500
Subject: [Biococoa-dev] Target upgraded
In-Reply-To:
References:
Message-ID:

On Feb 28, 2005, at 1:26 AM, Charles PARNOT wrote:

> The BioCocoa target has been upgraded to a native target, and its
> SDKROOT set to 10.2.8 in the build settings.

FYI, with the new target, the translation demo compiles without any errors.

- Koen.

From kvddrift at earthlink.net Tue Mar 1 18:46:26 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 1 Mar 2005 18:46:26 -0500
Subject: [Biococoa-dev] BCSequenceView
In-Reply-To: <0ffa67f4b9db7861cf78ee89826d10ad@earthlink.net>
References: <0ffa67f4b9db7861cf78ee89826d10ad@earthlink.net>
Message-ID: <0322c5351b071dedbab5e170020f1474@earthlink.net>

On Feb 25, 2005, at 9:05 PM, Koen van der Drift wrote:

> So I added the textDidChange method directly to BCSequenceView,
> although this actually is a method for the delegate. Indeed the
> whitespaces are now added and updated when editing the text. The only
> problem now is that I cannot put the cursor in the middle of the text.
> New characters are only added to the end of the text.
>
> Any ideas what might be going on and how to fix it without using a
> delegate object?

Actually, maybe I should write the code to use a delegate object with BCSequenceView. The delegate can then inform the view that the sequence has been changed, send the new NSString to the view, and let the view display it. What do you guys think?

- Koen.
From kvddrift at earthlink.net Tue Mar 1 20:12:43 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 1 Mar 2005 20:12:43 -0500 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: References: Message-ID: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> On Mar 1, 2005, at 7:55 PM, John Timmer wrote: > So, with the caveat that I have no experience with this myself, have > you > checked into NSLayoutManager? It seems to be designed to display text > in > custom formats. It's probably more difficult than handling delegate > messages and such, but it sounds like it's more capable for creating a > free > standing solution, one that's not dependent upon how you connect > things up > in the NIB. > The NSLayoutManager is for this case way too complicated. The problem is not drawing the line/symbol numbers, but the interaction with the controller. In my own app, I have solved it by using delegate methods, but sofar was trying to make BCSequenceView stand alone. But probably it wont hurt if I make it part of a MVC pattern. It is up to the user to call the correct methods to make it work with a controller. Just like we have to do when we eg implement an NSTable. - Koen. From kvddrift at earthlink.net Tue Mar 1 20:47:45 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 1 Mar 2005 20:47:45 -0500 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> References: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> Message-ID: <003ff5b667711c3ded8ccdfba946dcd1@earthlink.net> On Mar 1, 2005, at 8:12 PM, Koen van der Drift wrote: > The NSLayoutManager is for this case way too complicated. The problem > is not drawing the line/symbol numbers, but the interaction with the > controller. In my own app, I have solved it by using delegate methods, > but sofar was trying to make BCSequenceView stand alone. But probably > it wont hurt if I make it part of a MVC pattern. 
It is up to the user > to call the correct methods to make it work with a controller. Just > like we have to do when we eg implement an NSTable. So, I made theController.m in the translation example a delegate of BCSequenceView. I think it is a good solution, because this is also what should happen in a 'real' app. The demo also looks better now, and editing works ... almost. What goes wrong is that after each time I edit the sequence manually in the view, the cursor jumps to the end. I have no clue how to fix this. Also, I am adding headerdoc info to BCSequenceView. Almost all methods are only supposed to be used by the view itself. Should I still make an entry for those? - Koen. From kvddrift at earthlink.net Wed Mar 2 20:07:02 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 2 Mar 2005 20:07:02 -0500 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: References: Message-ID: <7d36df82618c2562674aa4b09e2abae7@earthlink.net> On Mar 1, 2005, at 7:55 PM, John Timmer wrote: > So, with the caveat that I have no experience with this myself, have > you > checked into NSLayoutManager? It seems to be designed to display text > in > custom formats. It's probably more difficult than handling delegate > messages and such, but it sounds like it's more capable for creating a > free > standing solution, one that's not dependent upon how you connect > things up > in the NIB. > Alex, You have a sequence view in 4peaks. Is this an NSTextView where you use the NSLayoutManager? It might be actually a good idea if we want to make the BCSequenceView a little more advanced. - Koen. From charles.parnot at stanford.edu Thu Mar 3 01:04:04 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Wed, 2 Mar 2005 22:04:04 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: >Incidentally, how does ObjC work: you have a pointer typed to one thing, >and it actually holds a subclass - when you request its class, what do you >get? 
I forgot to answer that earlier.

ObjC is dynamic. When you send the message 'class' to an object at runtime, you are actually asking the runtime to take whatever pointer is held by the variable and get the struct pointed to by that reference. Normally, this struct should be an object with all its ivars, including all the ivars from all the superclasses, such as the ivar 'isa' (assuming the pointer really does point to the struct of an object). This ivar 'isa' points to another struct, which is the class object. The class object can then be used to find the pointer to the function actually implementing the method with selector 'class', starting with the object's own class, and going up the class tree until it finds a class object that does implement the selector. That function is then called. In the case of 'class', NSObject implements it. I don't know what the code is, but it probably simply returns the 'isa' value.

In summary, because it is dynamic, you always get the real class at runtime (except if you override the method 'class' to return junk!). For instance:

    NSArray *anArray=[NSString stringWithString:@"a string"];
    NSLog(@"%@ : %@",[anArray class],anArray);

---> prints the real string class, e.g. 'NSCFString : a string' (NSString is a class cluster, so what you get back is a concrete private subclass of NSString, never NSArray)

and does not give any compiler warning(!!) because the type returned by 'stringWithString:' is an id. In other words, the compiler can check typing, but will not if you use id. Depending on the situation, you thus may have a compiler warning and no runtime error, or you may have no compiler warning but a runtime error, just because the type of an object as guessed by the compiler may be different at runtime. This is also what my placeholder sequence class BCSequence does.

Another little comment on the 'class' method. This is something quite important to remember to use on some occasions, for instance when you write a 'copy' method. You should always use [[self class] alloc] instead of [className alloc] if you need to create a new instance that will be the copy.
This is in case the method is actually run for an instance of a subclass. Otherwise, you get an instance of the superclass when calling copy on an instance of a subclass (if this subclass does not override the copy method). This might matter for the sequence class tree, where we may be able to implement copy in the superclass, without overriding it in the subclasses.

charles

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Thu Mar 3 01:25:51 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Wed, 2 Mar 2005 22:25:51 -0800
Subject: [Biococoa-dev] More on BCSymbolSets
In-Reply-To: <1fd716c728644fa55ff651f015037831@earthlink.net>
References: <1fd716c728644fa55ff651f015037831@earthlink.net>
Message-ID:

>ps you guys are going *way* too fast with all those emails. I only have a limited time each day to read them, understand them and possibly reply to them. Sorry if I don't address all issues :(

We slowed down ;-)

I am particularly interested in getting that symbolSet thing settled (for some reason, I am focused on the sequence classes for now!). So let's try to make decisions by answering these questions (I seem to answer some of these without leaving room for discussion, but please, you know me now, I tend to be very -too?- affirmative about things):

* do we want symbol sets?
I am in favor of them, Koen is, it looks like John is too. As long as somebody is ready to write code, I don't think we can stop them! So this has to be yes...

* what do we want symbolSet for?
Koen and I both had 'filtering' in mind, it seems John had it in mind too a long time ago. If we use symbol sets in the sequence classes, they have to do something and that something would be filtering then. There are problems raised by John.
We will have to tackle them and if we can't, then we will just give up the symbol sets (or maybe we could go to the point where any remaining problems would be unlikely to appear in reasonable use of the framework).

* Do we let the user define the symbol set at creation by providing a 'symbolSet' argument in some of the initializers? The other option is to decide the symbol set ourselves.
My opinion is we should give the user the possibility to use the symbol set she wants, but also provide a default symbol set when she just wants to create a sequence and not bother.
Another option is to leave that open for later: use symbol sets internally for now, see how they do (performance, usability, ...), and add more possibilities for the user later.

About sequenceTypes:

* Should we extend the number of sequence types to take into account the different symbol sets?
Proposed by Koen. John and I agree this would be redundant with symbol sets, and an enum would not catch all the possibilities that we may end up having. The whole idea of symbol sets is indeed to extend the sequence type.

* Will all instances of one given sequence class always have the same sequenceType? e.g. all instances of BCDNASequence will be of type 'BCDNASequence'.
It seems to be the case at present. Given the answer to the previous question, I don't see a case where we would want to have a different sequence type. It seems symbol sets will provide for more refined typing. So the answer to the question seems to be yes.

There are probably more basic questions that I am missing. After answering the design questions, we can move to the details of the implementations.
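
To make the initializer question above concrete, here is a minimal sketch of the two options side by side: an initializer that takes an explicit symbol set and filters the input through it, and a convenience initializer that falls back to a permissive default set. This is not BioCocoa code. The class names SketchSymbolSet and SketchSequence, the method names, and the choice of the IUPAC nucleotide codes plus '-' as the default are all assumptions made for illustration, and the sketch assumes ARC.

```objc
// Hypothetical sketch, not the BioCocoa API. Assumes ARC
// (e.g. clang -fobjc-arc -framework Foundation sketch.m).
#import <Foundation/Foundation.h>

// Stand-in for a symbol set: an immutable set of allowed characters.
@interface SketchSymbolSet : NSObject
@property (nonatomic, readonly) NSCharacterSet *allowed;
+ (instancetype)symbolSetWithSymbols:(NSString *)symbols;
- (NSString *)filter:(NSString *)raw;   // keeps only symbols in the set
@end

@implementation SketchSymbolSet
- (instancetype)initWithSymbols:(NSString *)symbols {
    if ((self = [super init])) {
        _allowed = [NSCharacterSet characterSetWithCharactersInString:symbols];
    }
    return self;
}
+ (instancetype)symbolSetWithSymbols:(NSString *)symbols {
    return [[self alloc] initWithSymbols:symbols];
}
- (NSString *)filter:(NSString *)raw {
    NSMutableString *kept = [NSMutableString string];
    for (NSUInteger i = 0; i < raw.length; i++) {
        unichar c = [raw characterAtIndex:i];
        if ([self.allowed characterIsMember:c]) {
            [kept appendFormat:@"%C", c];
        }
    }
    return kept;
}
@end

// Stand-in for a sequence class that filters its input at creation.
@interface SketchSequence : NSObject
@property (nonatomic, readonly) SketchSymbolSet *symbolSet;
@property (nonatomic, readonly) NSString *symbols;
+ (SketchSymbolSet *)defaultSymbolSet;            // subclasses could override
- (instancetype)initWithString:(NSString *)raw
                     symbolSet:(SketchSymbolSet *)set;  // does the real work
- (instancetype)initWithString:(NSString *)raw;   // uses the default set
@end

@implementation SketchSequence
+ (SketchSymbolSet *)defaultSymbolSet {
    // Assumed default: IUPAC nucleotide codes plus the gap character.
    return [SketchSymbolSet symbolSetWithSymbols:@"ACGTRYKMSWBDHVN-"];
}
- (instancetype)initWithString:(NSString *)raw
                     symbolSet:(SketchSymbolSet *)set {
    if ((self = [super init])) {
        _symbolSet = set;
        _symbols = [[set filter:raw] copy];
    }
    return self;
}
- (instancetype)initWithString:(NSString *)raw {
    return [self initWithString:raw
                      symbolSet:[[self class] defaultSymbolSet]];
}
@end

int main(void) {
    @autoreleasepool {
        SketchSymbolSet *strict = [SketchSymbolSet symbolSetWithSymbols:@"ACGT"];
        SketchSequence *a = [[SketchSequence alloc] initWithString:@"AC-GTNX"
                                                         symbolSet:strict];
        SketchSequence *b = [[SketchSequence alloc] initWithString:@"AC-GTNX"];
        NSLog(@"strict: %@", a.symbols);   // strict: ACGT
        NSLog(@"default: %@", b.symbols);  // default: AC-GTN
    }
    return 0;
}
```

With the strict set, the gap and the ambiguous 'N' are dropped along with the unknown 'X'; with the assumed default set only 'X' is dropped. That difference is the behavior the user would be choosing when she passes her own symbol set.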
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Thu Mar 3 03:01:14 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 3 Mar 2005 00:01:14 -0800 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: <003ff5b667711c3ded8ccdfba946dcd1@earthlink.net> <7d36df82618c2562674aa4b09e2abae7@earthlink.net> <0322c5351b071dedbab5e170020f1474@earthlink.net> <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> References: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> <003ff5b667711c3ded8ccdfba946dcd1@earthlink.net> <7d36df82618c2562674aa4b09e2abae7@earthlink.net> <0ffa67f4b9db7861cf78ee89826d10ad@earthlink.net> <0322c5351b071dedbab5e170020f1474@earthlink.net> <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> Message-ID: Sorry Koen I did not follow that thread very closely. I will have more time this week to think about it too. My gut feeling is that subclassing NSTextView is 'evil', and that instead the NSTextView should be a ivar and a subview (taking all the space for now, but in the future a BCSequenceView could have several subviews with several text fields like 'sequence length', 'current selection',...), and that BCSequenceView could be the NSTextView delegate to respond to user input and reformat the text view contents accordingly. Sorry this is not very detailed, so you can ignore it if that does not make sense! And what about the symbol set thread? ;-) charles At 6:46 PM -0500 3/1/05, Koen van der Drift wrote: >Actually, maybe I should make code to use a delegate object with BCSequenceView. The delegate can then inform the view that the sequence has been changed, send the new NSString to the view, and let the view display it. What do you guys think? 
> At 8:12 PM -0500 3/1/05, Koen van der Drift wrote: >The NSLayoutManager is for this case way too complicated. The problem is not drawing the line/symbol numbers, but the interaction with the controller. In my own app, I have solved it by using delegate methods, but sofar was trying to make BCSequenceView stand alone. But probably it wont hurt if I make it part of a MVC pattern. It is up to the user to call the correct methods to make it work with a controller. Just like we have to do when we eg implement an NSTable. > >- Koen. At 8:47 PM -0500 3/1/05, Koen van der Drift wrote: >So, I made theController.m in the translation example a delegate of BCSequenceView. I think it is a good solution, because this is also what should happen in a 'real' app. The demo also looks better now, and editing works ... almost. What goes wrong is that after each time I edit the sequence manually in the view, the cursor jumps to the end. I have no clue how to fix this. > >Also, I am adding headerdoc info to BCSequenceView. Almost all methods are only supposed to be used by the view itself. Should I still make an entry for those? > >- Koen. At 8:07 PM -0500 3/2/05, Koen van der Drift wrote: >Alex, > >You have a sequence view in 4peaks. Is this an NSTextView where you use the NSLayoutManager? It might be actually a good idea if we want to make the BCSequenceView a little more advanced. > >- Koen. 
-- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From mek at mekentosj.com Thu Mar 3 16:47:53 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 3 Mar 2005 22:47:53 +0100 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: <7d36df82618c2562674aa4b09e2abae7@earthlink.net> References: <7d36df82618c2562674aa4b09e2abae7@earthlink.net> Message-ID: <2ebc22e6013b7b3d41526565158c8ff0@mekentosj.com> Sorry guys, terribly busy, but I'll try to catch up again from tomorrow... (and the reason is not only the fact that we did not have that much snow yesterday in the past 20 years! It took me 1.5 hours plowing by bike through 40cm of snow to get to work ;-) To quickly answer Koen's question: On 3-mrt-05, at 2:07, Koen van der Drift wrote: > Alex, > > You have a sequence view in 4peaks. Is this an NSTextView where you > use the NSLayoutManager? It might be actually a good idea if we want > to make the BCSequenceView a little more advanced. Yep, it's the JSDTextView from James S. Derry, who thanks you by the way in this header comments for some code you helped him with: / ************************************************************************ *************** JSDTextView.m Modify NSTextView for a single, special purpose. We will: o non-wrapping text with a line counter to the left. o self contained (hence multiple classes in this header) We will: o hand-build an NSScrollView, NSTextView, NSLayoutManager, NSTextContainer, and NSTextStorage for textView, and add a JSDNumberView for numberView o allow turning wrapping of text to YES or NO. o allow turning line number counting to YES or NO. I'd like to give especial thanks to Koen van der Drift for the means to implement the line numbering in the NSTextView subclass! 
************************************************************************ ***************/ So it "re-creates" the components of the textview system (though conveniently all in one .m/.h filepair). It does contain a nasty "black flash" bug the first time you use it, something I still haven't found a fix for. I was thinking of switching to your variant though Koen ;-) The cocoa textsystem is quite a thing to build, and I'm still figuring what would be the best way to go for in our situation, it will be quite some work to make that work natively with our bcsequences, but it must be possible. For instance, the "peaks-view" in 4Peaks is completely custom-made, basically from scratch. It does use the textediting features from NSText though, and also the NSTextContainer, NSTextStorage, NSLayoutManager trio to draw the strings (which made it about 500% faster than using the NSString drawString: convenience methods). Anyway, I'll read the other message soon and jump back in... Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 3326 bytes Desc: not available URL: From kvddrift at earthlink.net Thu Mar 3 19:43:58 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 3 Mar 2005 19:43:58 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> Message-ID: On Mar 3, 2005, at 1:25 AM, Charles PARNOT wrote: > * Do we let the user define the symbol set at creation by providing a > 'symbolSet' argument in some of the initializers? The other option is > to decide the symbol set ourselves. > My opinion is we should give the user the possibility to use the > symbol set she wants, but also provide a default symbol set when she > just wants to create a sequence and not bother. > Another option is to let that option open for later and internally use > symbol sets for now, and see how it is doing (performance, > usability,...), and then add more possibility for the user later. I think it would be a good idea if we allow the user to pass a symbolset, defining the type of sequence. In fact you not only make a filter for whatever string or array is supplied to create the sequence, but you also have immediately an identifier of the sequence. > > About sequenceTypes: > > * Should we extend the number of sequence types to take into account > the different symbol sets? > Proposed by Koen. I am not in favor of extending the number of sequence types, It was more a question based on the comments made by John. Actually, I would propose to not use the sequencetype at all, but only use symbolsets, since they also act as identifiers (see above). > * Will all instances of one given sequence classalways have the same > sequenceType? e.g. all instances of BCDNASequence will be of type > 'BCDNASequence'. Probably not. A BCSequenceDNA can have ambiguous symbols, but can also be strict. It can allow for gaps in an alignment, etc. 
By assigning it a sequence type, still doesn't tell anything about the possible symbols. Therefore a symbolset will be much more useful. Another thing that bugs me is that the sequence is BCSequenceDNA but the type is BCDNASequence. Very confusing :) cheers, - Koen. From kvddrift at earthlink.net Thu Mar 3 20:12:29 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 3 Mar 2005 20:12:29 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> Message-ID: <8a7f460754e5343579b9986b3100f479@earthlink.net> On Mar 3, 2005, at 1:25 AM, Charles PARNOT wrote: > There are probably more basic questions that I am missing. After > answering the design questions, we can move to the details of the > implementations. > > I already made some changes, the symbolsets are now singletons, as discussed earlier. Other things that come to mind: * when using a symbolset in the initializer, which method should be the designated initializer? My preference would be to make the initWithString:usingSymbolSet: the designated initializer. This is probably the most common case. When using an array of symbols, the symbolset is already known from the original sequence, so that one can be used. If no symbolset is used, the code in sequencetypefromstring can be used to determine what the symbolset should be. * do we still need the 'skippingunknownsymbols' flag? I would say no. I cannot think of a situation when that flag will be NO. And also using a symbolset will filter these out. - Koen. From charles.parnot at stanford.edu Thu Mar 3 21:47:45 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 3 Mar 2005 18:47:45 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> Message-ID: >I think it would be a good idea if we allow the user to pass a symbolset, defining the type of sequence. 
In fact you not only make a filter for whatever string or array is supplied to create the sequence, but you also have immediately an identifier of the sequence.

So we would need to provide an initializer with a symbolSet argument, e.g. 'initWithSymbolArray:symbolSet:'. OK, we agree :-)

What do you mean by an identifier?

>>
>>About sequenceTypes:
>>
>>* Should we extend the number of sequence types to take into account the different symbol sets?
>>Proposed by Koen.
>
>I am not in favor of extending the number of sequence types. It was more a question based on the comments made by John. Actually, I would propose to not use the sequencetype at all, but only use symbolsets, since they also act as identifiers (see above).

Still unclear how they would be identifiers of the sequence (as in a unique id??).
OK, we basically agree that sequence type as it is now is not super useful, except as a shortcut for the sequence class.
I still think that a BCSequenceType has a use. A symbolSet should not be allowed to hold symbols of different types/classes. So a symbolSet would have a type. And a symbolSet should be allowed to be associated with a sequence only if it has the right type. Instead of checking the class all the time, it is probably better to use an enum like BCSequenceType.

>>* Will all instances of one given sequence class always have the same sequenceType? e.g. all instances of BCDNASequence will be of type 'BCDNASequence'.
>
>Probably not. A BCSequenceDNA can have ambiguous symbols, but can also be strict. It can allow for gaps in an alignment, etc. By assigning it a sequence type, still doesn't tell anything about the possible symbols. Therefore a symbolset will be much more useful. Another thing that bugs me is that the sequence is BCSequenceDNA but the type is BCDNASequence. Very confusing :)

I agree that symbolSets will be different for each instance.
But the sequenceType, if we keep it in addition of the symbolSet (for the reason above), then it will be always the same for all instances of a class. Regarding the naming conventions, BCSequenceDNA for the class, BCDNASequence for the type, it is indeed quite confusing; how about BCSequenceTypeDNA et al.? charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Thu Mar 3 21:56:05 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 3 Mar 2005 21:56:05 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> Message-ID: <71772beb057d4031fd700d20cc32fb8d@earthlink.net> On Mar 3, 2005, at 9:47 PM, Charles PARNOT wrote: > >> I think it would be a good idea if we allow the user to pass a >> symbolset, defining the type of sequence. In fact you not only make a >> filter for whatever string or array is supplied to create the >> sequence, but you also have immediately an identifier of the >> sequence. > > So we would need to provide an initializer with a symbolSet argument, > e.b. 'initWithSymbolArray:symbolSet'. OK, we agree :-) > > What do you mean an identifier? I mean the sequence type. > OK, we basically agree that sequence type as it is now is not super > useful, except as a shortcut for the sequence class. > I still think that a BCSequenceType has a use. A symbolSet should not > be allowed to hold symbols of different types/classes. So symbolSet > would have a type. This will be taken care of when the symbolset is created, see the BCSymbol class. The dnaSymbolSet only holds nucleotides, the proteinSymbolSet holds only amino acids. > And a symbolSet should be allowed to be associated with a sequence > only if the right type. 
> Instead of checking the class all the time, it is probably better to > use an enum like BCSequenceType. This won't happen that much, maybe only during creation, so I don't think there will be much slowdown by calling the class instead of the sequenceType. > >>> * Will all instances of one given sequence classalways have the same >>> sequenceType? e.g. all instances of BCDNASequence will be of type >>> 'BCDNASequence'. >> >> Probably not. A BCSequenceDNA can have ambiguous symbols, but can >> also be strict. It can allow for gaps in an alignment, etc. By >> assigning it a sequence type, still doesn't tell anything about the >> possible symbols. Therefore a symbolset will be much more useful. >> Another thing that bugs me is that the sequence is BCSequenceDNA but >> the type is BCDNASequence. Very confusing :) > > I agree that symbolSets will be different for each instance. But the > sequenceType, if we keep it in addition of the symbolSet (for the > reason above), then it will be always the same for all instances of a > class. > Regarding the naming conventions, BCSequenceDNA for the class, > BCDNASequence for the type, it is indeed quite confusing; how about > BCSequenceTypeDNA et al.? *If* we decide to keep it, that would indeed be better, yes. - Koen. From charles.parnot at stanford.edu Thu Mar 3 21:58:14 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 3 Mar 2005 18:58:14 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: <8a7f460754e5343579b9986b3100f479@earthlink.net> References: <1fd716c728644fa55ff651f015037831@earthlink.net> <8a7f460754e5343579b9986b3100f479@earthlink.net> Message-ID: At 8:12 PM -0500 3/3/05, Koen van der Drift wrote: >On Mar 3, 2005, at 1:25 AM, Charles PARNOT wrote: > >>There are probably more basic questions that I am missing. After answering the design questions, we can move to the details of the implementations. 
>>
>
>I already made some changes, the symbolsets are now singletons, as discussed earlier.

I hate to be picky with words, but I had a hard time following this idea because a 'singleton' class is a class with only one instance... so I was confused. What you guys mean is to provide a bunch of pre-built immutable symbolSets accessible through factory methods. I think this is excellent design, and the user will rarely need anything outside of these instances.
I suggested in a previous email that we make all instances immutable (basically sticking to NSSet), for simplicity. In particular, one should not be able to change the symbolSet of a sequence, that would be disastrous. A mutable symbol set would have these problems.
What do you think?

>Other things that come to mind:
>
>* when using a symbolset in the initializer, which method should be the designated initializer?
>
>My preference would be to make the initWithString:usingSymbolSet: the designated initializer. This is probably the most common case. When using an array of symbols, the symbolset is already known from the original sequence, so that one can be used. If no symbolset is used, the code in sequencetypefromstring can be used to determine what the symbolset should be.

Yes, that is exactly what I think! Did you look at my code? ;-)

If no symbolset is passed as argument in the initializer, I propose instead to use a default symbol set provided as a method and overridden in each subclass:

    - (BCSymbolSet *)defaultSymbolSet;

This default symbol set can then be accessed in the implementation of the initializer in the superclass. This default set would be the most general. I think if the user starts a sequence with @"ATGTG" and later adds @"BYTGTG", the symbols "BY" should be recognized (in what you propose, this would not be recognized). That should really be standard.
In the absence of choice, we should leave the symbol set as open as possible, or the user will have different behavior depending on how the sequence was initialized. >* do we still need the 'skippingunknownsymbols' flag? > >I would say no. I cannot think of a situation when that flag will be NO. And also using a symbolset will filter these out. I vote no too. Symbol sets will take care of that, yes!! NB: I will try to post an updated version of the code I posted previously on the list. Tell me I can't wait to read your responses to my responses to your responses! charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From jtimmer at bellatlantic.net Thu Mar 3 22:07:23 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 03 Mar 2005 22:07:23 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: <71772beb057d4031fd700d20cc32fb8d@earthlink.net> Message-ID: >> I agree that symbolSets will be different for each instance. But the >> sequenceType, if we keep it in addition of the symbolSet (for the >> reason above), then it will be always the same for all instances of a >> class. >> Regarding the naming conventions, BCSequenceDNA for the class, >> BCDNASequence for the type, it is indeed quite confusing; how about >> BCSequenceTypeDNA et al.? > > *If* we decide to keep it, that would indeed be better, yes. > Renaming it's fine, but we HAVE to keep it. There are going to be literally dozens of symbol sets, plus the potential for user-generated sets, and I would not want to loop through all the possibilities just to find out whether a sequence could be translated or complemented. 
JT

_______________________________________________
This mind intentionally left blank

From charles.parnot at stanford.edu Fri Mar 4 01:44:22 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 3 Mar 2005 22:44:22 -0800
Subject: [Biococoa-dev] More on BCSymbolSets
In-Reply-To:
References:
Message-ID:

>
>>> Regarding the naming conventions, BCSequenceDNA for the class,
>>> BCDNASequence for the type, it is indeed quite confusing; how about
>>> BCSequenceTypeDNA et al.?
>>
>> *If* we decide to keep it, that would indeed be better, yes.
>>
>
>Renaming it's fine, but we HAVE to keep it. There are going to be literally
>dozens of symbol sets, plus the potential for user-generated sets, and I
>would not want to loop through all the possibilities just to find out
>whether a sequence could be translated or complemented.
>
>JT

I totally agree with John. It will already be useful for us, but for the user it is critical. If we let the sequence type be checked using the class, you have a number of potential big issues:

* testing for class equality, e.g. [sequence class]==[BCSequenceDNA class], breaks if the object is an instance of a subclass
* using the isKindOfClass: method is also dangerous if the class tree changes
* how about BCSequence?? The user should not have to know that at runtime an instance could actually be one of the other classes, while [someSequence sequenceType] is simple

Now, what do you guys think of a 'sequenceType' on the symbolSet as well, one that controls what kind of symbols can get in, and that allows us to check the compatibility of a sequence and a symbolSet before initialization?
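
The three ways of checking a sequence's kind listed above, plus the idea of a 'sequenceType' on the symbol set, can be sketched as follows. This is only an illustration under stated assumptions: the Demo... classes, the DemoSequenceType enum, and the isCompatibleWithSequence: method are made-up names, not existing BioCocoa API, and the sketch assumes ARC.

```objc
// Hypothetical sketch, not the BioCocoa API. Assumes ARC
// (e.g. clang -fobjc-arc -framework Foundation sketch.m).
#import <Foundation/Foundation.h>

typedef enum {
    DemoSequenceTypeOther,
    DemoSequenceTypeDNA,
    DemoSequenceTypeProtein
} DemoSequenceType;

// Small class tree: a generic sequence, a DNA sequence, and a DNA subclass.
@interface DemoSequence : NSObject
- (DemoSequenceType)sequenceType;
@end
@implementation DemoSequence
- (DemoSequenceType)sequenceType { return DemoSequenceTypeOther; }
@end

@interface DemoDNASequence : DemoSequence
@end
@implementation DemoDNASequence
- (DemoSequenceType)sequenceType { return DemoSequenceTypeDNA; }
@end

@interface DemoGappedDNASequence : DemoDNASequence
@end
@implementation DemoGappedDNASequence
@end

// A symbol set that carries its own type, so that compatibility with a
// sequence can be checked without looking at classes at all.
@interface DemoSymbolSet : NSObject
@property (nonatomic, readonly) DemoSequenceType sequenceType;
- (instancetype)initWithSequenceType:(DemoSequenceType)type;
- (BOOL)isCompatibleWithSequence:(DemoSequence *)sequence;
@end
@implementation DemoSymbolSet
- (instancetype)initWithSequenceType:(DemoSequenceType)type {
    if ((self = [super init])) {
        _sequenceType = type;
    }
    return self;
}
- (BOOL)isCompatibleWithSequence:(DemoSequence *)sequence {
    return [sequence sequenceType] == self.sequenceType;
}
@end

int main(void) {
    @autoreleasepool {
        DemoSequence *seq = [[DemoGappedDNASequence alloc] init];

        // Class equality fails for an instance of a subclass.
        BOOL exact = ([seq class] == [DemoDNASequence class]);     // NO
        // isKindOfClass: works here, but depends on the class tree.
        BOOL kind = [seq isKindOfClass:[DemoDNASequence class]];   // YES
        // The type enum does not depend on the class tree.
        BOOL typed = ([seq sequenceType] == DemoSequenceTypeDNA);  // YES

        DemoSymbolSet *dnaSet =
            [[DemoSymbolSet alloc] initWithSequenceType:DemoSequenceTypeDNA];
        DemoSymbolSet *proteinSet =
            [[DemoSymbolSet alloc] initWithSequenceType:DemoSequenceTypeProtein];

        NSLog(@"exact=%d kind=%d typed=%d", exact, kind, typed);
        NSLog(@"dna set ok=%d, protein set ok=%d",
              [dnaSet isCompatibleWithSequence:seq],
              [proteinSet isCompatibleWithSequence:seq]);
    }
    return 0;
}
```

In this sketch the subclass inherits sequenceType from DemoDNASequence, so the enum check keeps working even if more subclasses are added later, which is the argument for keeping the type alongside the symbol set.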
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Fri Mar 4 06:29:13 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 4 Mar 2005 06:29:13 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> <8a7f460754e5343579b9986b3100f479@earthlink.net> Message-ID: On Mar 3, 2005, at 9:58 PM, Charles PARNOT wrote: > I suggested in a previous email that we make all instances immutable > (basically sticking to NSSet), for simplicity. In particular, one > should not be able to change the symbolSet of a sequence, that would > be disastrous. A mutable symbol set would have these problems. > What do you think? Yes, immutable sets sounds good to me. But how do we populate them? The way the code now works is that each symbol is added one by one. I guess we can make an array of symbols and then add them at once using setWithArray. > Yes, that is exactly what I think! Did you look at my code? ;-) Hmm, not sure what you mean - I didn't see anything in the code. Maybe my Xcode - cvs is screwed up again :( > > If no symbolset is passed as argument in the initializer, I propose > instead to use a default symbol set provided as a method and overridden > in each subclass: > - (BCSymbolSet *)defaultSymbolSet; > This default symbol set can then be accessed in the implementation of > the initializer in the superclass. This default set would be the most > general. I think if the user starts a sequence with @"ATGTG" and later > adds @"BYTGTG", the symbols "BY" should be recognized (in what you > propose, this would not be recognized). That should really be > standard.
In the absence of choice, we should leave the symbol set as > open as possible, or the user will have different behavior depending > on how the sequence was initialized. Sounds good to me. - Koen. From kvddrift at earthlink.net Fri Mar 4 06:32:52 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 4 Mar 2005 06:32:52 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: <9efb0aa32066213fbbb9d542aee8694e@earthlink.net> On Mar 3, 2005, at 10:07 PM, John Timmer wrote: > Renaming it's fine, but we HAVE to keep it. There are going to be > literally > dozens of symbol sets, plus the potential for user-generated sets, and > I > would not want to loop through all the possibilities just to find out > whether a sequence could be translated or complemented. > I am not sure if I follow this. As soon as a sequence is created, the symbolset is defined. So there is no need to iterate over all symbolsets to find out if a certain operation is possible. For convenience, we could extend BCSymbolSet with a method "containsNucleotides" that will return yes if the objects in the set are of that type. - Koen. From charles.parnot at stanford.edu Fri Mar 4 09:45:48 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Fri, 4 Mar 2005 06:45:48 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> <8a7f460754e5343579b9986b3100f479@earthlink.net> Message-ID: >>Yes, that is exactly what I think! Did you look at my code? ;-) > >Hmm, not sure what you mean - I didn't see anything in the code.
Maybe my Xcode - cvs is screwed up again :(

code posted on the list in a previous email

@interface BCAbstractSequence:NSObject
{
    BCSymbolSet *symbolSet;
    BCSequenceType *sequenceType;
    NSMutableArray *symbolArray;
}
//designated initializer
- (id)initWithSymbolArray:(NSArray *)anArray symbolSet:(BCSymbolSet *)aSet;
- (id)initWithString:(NSString *)aString;
- (id)initWithString:(NSString *)aString symbolSet:(BCSymbolSet *)aSet;
//methods to override in the subclasses
+ (BCSequenceType)sequenceType;
+ (BCSymbolSet *)defaultSymbolSet;
@end

@implementation BCAbstractSequence

- (id)initWithSymbolArray:(NSArray *)anArray symbolSet:(BCSymbolSet *)aSet
{
    self=[super init];
    if (self!=nil) {
        sequenceType=[[self class] sequenceType];
        //check that the symbol set is the right type, otherwise use default
        if ([aSet sequenceType]!=sequenceType)
            aSet=[[self class] defaultSymbolSet];
        //let the set check the symbols
        NSArray *finalArray=[aSet arrayByRemovingUnknownSymbolsFromArray:anArray];
        symbolArray=[[NSMutableArray alloc] initWithArray:finalArray];
    }
    return self;
}

- (id)initWithString:(NSString *)aString;
{
    return [self initWithString:aString symbolSet:[[self class] defaultSymbolSet]];
}

- (id)initWithString:(NSString *)aString symbolSet:(BCSymbolSet *)aSet;
{
    int i,n;
    NSMutableArray *anArray;
    BCSymbol *aSymbol;
    //check that the symbol set is the right type, otherwise use default
    if ([aSet sequenceType]!=[[self class] sequenceType])
        aSet=[[self class] defaultSymbolSet];
    //creates a symbol array
    n=[aString length];
    anArray=[NSMutableArray arrayWithCapacity:[aString length]];
    for (i=0;i++;i

Message-ID: >> Renaming it's fine, but we HAVE to keep it. There are going to be >> literally >> dozens of symbol sets, plus the potential for user-generated sets, and >> I >> would not want to loop through all the possibilities just to find out >> whether a sequence could be translated or complemented. >> > > I am not sure if I follow this.
As soon as a sequence is created, the > symbolset is defined. So there is no need to iterate over all > symbolsets to find out if a certain operation is possible. For > convenience, we could extend BCSymbolSet with a method > "containsNucleotides" that will return yes is the objects in the set > are of that type. I agree that the symbol set's defined, but you'd still need some way of recognizing which type of symbol set it is. I can't see how to do that without iteration. For example, let's say we provide all combinations of symbol sets using only the single bases (ATCG), those plus N, those plus N and gap, those plus, N, gap, and undefined, etc. You're easily up to about a dozen symbol sets for DNA alone. Then you add RNA, and protein, and you're probably in the area of 25. Now, you need to do a restriction digest. That only works with DNA, so you need to know if you have a DNA sequence. There's no easy way to do this with just a symbol set. You'd have to either iterate through all its symbols and determine whether they're all DNA nucleotides, or iterate through all the DNA symbol set singletons and test for equality to the set that the sequence is using. Translation's even worse, since it works with DNA and RNA. I don't see how you can avoid iteration, but you feel you can, so maybe i'm missing something. Your alternative, "containsNucleotides" is fine, but we already have the other system in place - it's simple, and it works, so I don't see the need to redo it. Anyway, as an aside, i've been thinking that the symbol set structure would allow for a nice encapsulation of a genetic code. The problem is that codons aren't symbols (since they have both amino acid and nucleotide information). Any suggestions on how to adapt things? 
JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Fri Mar 4 19:08:02 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 4 Mar 2005 19:08:02 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: On Mar 4, 2005, at 10:50 AM, John Timmer wrote: > > For example, let's say we provide all combinations of symbol sets > using only > the single bases (ATCG), those plus N, those plus N and gap, those > plus, N, > gap, and undefined, etc. You're easily up to about a dozen symbol > sets for > DNA alone. Then you add RNA, and protein, and you're probably in the > area > of 25. But each set will only have nucleotides or amino acids. They are not intended to mix. > > Now, you need to do a restriction digest. That only works with DNA, > so you > need to know if you have a DNA sequence. There's no easy way to do > this > with just a symbol set. Why not? Just test if the sequences' symbolset contains nucleotides, excluding 'U'. No need to go through each symbol in every symbolset. We could even move the sequenceType to the BCSymbolSet class as suggested by Charles. That way we just need a convenience method to check. > You'd have to either iterate through all its > symbols and determine whether they're all DNA nucleotides, or iterate > through all the DNA symbol set singletons and test for equality to the > set > that the sequence is using. Translation's even worse, since it works > with > DNA and RNA. Actually, that's even easier, you only need to check if the symbolset contains nucleotides! > > I don't see how you can avoid iteration, but you feel you can, so > maybe i'm > missing something. Your alternative, "containsNucleotides" is fine, > but we > already have the other system in place - it's simple, and it works, > so I > don't see the need to redo it. What other system are you referring to? 
> > > Anyway, as an aside, i've been thinking that the symbol set structure > would > allow for a nice encapsulation of a genetic code. The problem is that > codons aren't symbols (since they have both amino acid and nucleotide > information). Any suggestions on how to adapt things? Not saying that this should be the way that we should do this, but I remember that BioJava uses cross-alphabets for this. While googling for that I found this short explanation: http://www.biojava.org/docs/bj_in_anger/crossProd.htm I also came across this from our friend at biopython, who went through the same process of finding out what would be the best way to implement the various sequences: http://biopython.org/pipermail/biopython/2000-March/000190.html - Koen. From kvddrift at earthlink.net Fri Mar 4 19:38:29 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 4 Mar 2005 19:38:29 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> <8a7f460754e5343579b9986b3100f479@earthlink.net> Message-ID: On Mar 4, 2005, at 9:45 AM, Charles PARNOT wrote: >> Hmm, not sure what you mean - I didn't see anything in the code. >> Maybe my Xcode - cvs is screwed up again :( > > code posted on the list in a previous email > I must have missed that the first time. Your code looks good, although you made: - (id)initWithSymbolArray:(NSArray *)anArray symbolSet:(BCSymbolSet *)aSet; not - (id)initWithString:(NSString *)aString symbolSet:(BCSymbolSet *)aSet; the designated initializer like you do now. Probably the array-version is a better idea, because that is already closer to the BCSequence than just a string. Otherwise you have an array, then make a string, and then make an array again, which is a waste of time (and yes, I am contradicting a previous message from me where I stated the opposite :) As far as I am concerned you can commit this code. - Koen.
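For readers following the designated-initializer point above, here is a minimal sketch of the chaining being discussed: the string initializer builds an array and funnels into the array-based designated initializer, so the array is never turned back into a string. The class name SketchSequence is hypothetical and not part of BioCocoa, one-character NSStrings stand in for BCSymbol objects, the symbol set argument is left out to keep the example short, and the code assumes a compiler with ARC enabled (unlike the manual memory management of the 2005 code).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a BioCocoa sequence class; illustrative only.
@interface SketchSequence : NSObject {
    NSMutableArray *symbolArray;
}
// Designated initializer: every other initializer funnels into this one.
- (id)initWithSymbolArray:(NSArray *)anArray;
// Convenience initializer: splits the string, then calls the designated one.
- (id)initWithString:(NSString *)aString;
- (NSUInteger)length;
@end

@implementation SketchSequence

- (id)initWithSymbolArray:(NSArray *)anArray {
    self = [super init];
    if (self != nil) {
        symbolArray = [[NSMutableArray alloc] initWithArray:anArray];
    }
    return self;
}

- (id)initWithString:(NSString *)aString {
    NSUInteger i, n = [aString length];
    NSMutableArray *anArray = [NSMutableArray arrayWithCapacity:n];
    for (i = 0; i < n; i++) {
        // One-character strings stand in for BCSymbol objects here.
        [anArray addObject:[aString substringWithRange:NSMakeRange(i, 1)]];
    }
    return [self initWithSymbolArray:anArray];
}

- (NSUInteger)length {
    return [symbolArray count];
}

@end

int main(void) {
    @autoreleasepool {
        SketchSequence *seq = [[SketchSequence alloc] initWithString:@"ATGTG"];
        // The string path and the array path end up in the same storage.
        NSLog(@"%lu symbols", (unsigned long)[seq length]);
    }
    return 0;
}
```

Callers that already hold an array of symbols can call initWithSymbolArray: directly and skip the string round trip, which is the waste of time mentioned above.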
From jtimmer at bellatlantic.net Fri Mar 4 20:08:41 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 04 Mar 2005 20:08:41 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: Message-ID: >> For example, let's say we provide all combinations of symbol sets >> using only >> the single bases (ATCG), those plus N, those plus N and gap, those >> plus, N, >> gap, and undefined, etc. You're easily up to about a dozen symbol >> sets for >> DNA alone. Then you add RNA, and protein, and you're probably in the >> area >> of 25. > > But each set will only have nucleotides or amino acids. They are not > intended to mix. Okay, so we will have no user-defined sets, and I would go in, grab the first symbol, and test what type of symbol it is? Wouldn't that get into issues like testing its class, which Charles indicated might not be the best way of determining this? >> >> I don't see how you can avoid iteration, but you feel you can, so >> maybe i'm >> missing something. Your alternative, "containsNucleotides" is fine, >> but we >> already have the other system in place - it's simple, and it works, >> so I >> don't see the need to redo it. > > What other system are you referring to? The existing one, where we have an enumeration with a value associated with the sequence. Thanks for the links - I'll look into them tomorrow. JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Fri Mar 4 20:24:54 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 4 Mar 2005 20:24:54 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: <0859b61ad5592d0122515673c255f83a@earthlink.net> On Mar 4, 2005, at 8:08 PM, John Timmer wrote: >> But each set will only have nucleotides or amino acids. They are not >> intended to mix. > Okay, so we will have no user-defined sets, and I would go in, grab the > first symbol, and test what type of symbol it is? 
Wouldn't that get > into > issues like testing its class, which Charles indicated might not be > the best > way of determining this? That's a good point. So we could move the BCSequenceType ivar to BCSymbolSet. BTW, With the line 'They are not intended to mix' I didn't mean that there cannot be user-defined sets. Just meant that I don't see a situation where a symbol set is created containing both amino acids and nucleotides. That would be 'distorting biology', to use your own words :) - Koen. From jtimmer at bellatlantic.net Sat Mar 5 12:20:32 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 05 Mar 2005 12:20:32 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: <0859b61ad5592d0122515673c255f83a@earthlink.net> Message-ID: > > On Mar 4, 2005, at 8:08 PM, John Timmer wrote: > >>> But each set will only have nucleotides or amino acids. They are not >>> intended to mix. >> Okay, so we will have no user-defined sets, and I would go in, grab the >> first symbol, and test what type of symbol it is? Wouldn't that get >> into >> issues like testing its class, which Charles indicated might not be >> the best >> way of determining this? > > That's a good point. So we could move the BCSequenceType ivar to > BCSymbolSet. Yes, and just make the method that's now in BCSequence a convenience call-through to that. To maintain key/value coding, we could just have the setter method do nothing. > BTW, With the line 'They are not intended to mix' I didn't mean that > there cannot be user-defined sets. Just meant that I don't see a > situation where a symbol set is created containing both amino acids and > nucleotides. That would be 'distorting biology', to use your own words > :) Exactly. Which is why I expect it will happen, and want the code to be robust enough to cope with it. It still amazes me, the bugs people find when the app gets distributed to more than a dozen people. 
JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sat Mar 5 17:12:28 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 5 Mar 2005 17:12:28 -0500 Subject: [Biococoa-dev] functional groups Message-ID: <1ee37af387f8bb7c59da4f73e2ab33ff@earthlink.net> Hi, I am thinking about adding a class that will take care of functional groups on sequences, eg methyl, phosphate, etc. My first thought was to make it a subclass of BCSymbol. However, it will then also inherit the representedBy code, etc. That is not so useful for functional groups (or maybe they are?). So, what I plan to do is to make a new, abstract root class for symbols, maybe called BCRoot. Then BCSymbol and BCFunctionalGroup will inherit from BCRoot. The BCRoot class will have a lot of the code of BCSymbol, eg for name, mass, etc. However, some things will be in BCSymbol, eg initializeSymbolRelationships. The functional groups can be attached to the sequence, using the not-yet-existing BCFeature class. The BCFunctionalGroup can also have an ivar pointing to the symbol it is attached to. What do you guys think? For proteins functional groups are very important, and they probably need their own class. What about DNA/RNA? Are functional groups important, eg when digesting? Also, when looking at the code of BCSymbol, I noticed that we use the initializer initWithSymbol:(unichar)aChar. Maybe we can change this to initWithChar, or even add that method? When using the word 'symbol' I would expect a BCSymbol, not a char. I remember we discussed this earlier, but I forgot why we eventually decided to use initWithSymbol instead of initWithChar. cheers, - Koen.
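The proposed hierarchy might look roughly like the sketch below. It is only an illustration of the idea in the message above, not existing BioCocoa code: all class and method names (SketchRoot, SketchSymbol, SketchFunctionalGroup, combinedMass, and so on) are hypothetical, the masses are approximate monoisotopic values used purely as sample data, and ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical abstract root holding what symbols and groups share.
@interface SketchRoot : NSObject {
    NSString *name;
    double monoisotopicMass;
}
- (id)initWithName:(NSString *)aName mass:(double)aMass;
- (NSString *)name;
- (double)monoisotopicMass;
@end

@implementation SketchRoot
- (id)initWithName:(NSString *)aName mass:(double)aMass {
    self = [super init];
    if (self != nil) {
        name = [aName copy];
        monoisotopicMass = aMass;
    }
    return self;
}
- (NSString *)name { return name; }
- (double)monoisotopicMass { return monoisotopicMass; }
@end

// Symbol-only behaviour (a stand-in for representedBy and friends).
@interface SketchSymbol : SketchRoot
- (BOOL)isRepresentedBySymbol:(SketchSymbol *)other;
@end

@implementation SketchSymbol
- (BOOL)isRepresentedBySymbol:(SketchSymbol *)other {
    // Simplified: a symbol is represented only by a symbol of the same name.
    return [[self name] isEqualToString:[other name]];
}
@end

// A functional group shares name and mass, but not the symbol relationships.
@interface SketchFunctionalGroup : SketchRoot {
    SketchSymbol *attachedSymbol;
}
- (void)attachToSymbol:(SketchSymbol *)aSymbol;
- (double)combinedMass;
@end

@implementation SketchFunctionalGroup
- (void)attachToSymbol:(SketchSymbol *)aSymbol { attachedSymbol = aSymbol; }
- (double)combinedMass {
    // Mass of the group plus the symbol it is attached to (0 if unattached).
    return [self monoisotopicMass] + [attachedSymbol monoisotopicMass];
}
@end

int main(void) {
    @autoreleasepool {
        // Approximate residue and modification masses, sample data only.
        SketchSymbol *tyr = [[SketchSymbol alloc] initWithName:@"Tyr" mass:163.06];
        SketchFunctionalGroup *phospho =
            [[SketchFunctionalGroup alloc] initWithName:@"phosphate" mass:79.97];
        [phospho attachToSymbol:tyr];
        NSLog(@"%@ on %@: %.2f", [phospho name], [tyr name], [phospho combinedMass]);
    }
    return 0;
}
```

In this layout the symbol itself is untouched by the modification, so anything that queries the symbol alone would still see a plain tyrosine.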
From charles.parnot at stanford.edu Sat Mar 5 18:26:36 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 5 Mar 2005 15:26:36 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> <8a7f460754e5343579b9986b3100f479@earthlink.net> Message-ID: At 6:29 AM -0500 3/4/05, Koen van der Drift wrote: >Yes, immutable sets sounds good to me. But how do we populate them? The way the code now works is that each symbol is added one by one. I guess we can make an array of symbols and then add them at once using setWithArray. yes, or a nil terminated list à la NSArray. plus the 'intersect' and 'union' directly wrapping NSSet equivalent methods charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Sat Mar 5 18:34:00 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 5 Mar 2005 15:34:00 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: <1fd716c728644fa55ff651f015037831@earthlink.net> <8a7f460754e5343579b9986b3100f479@earthlink.net> Message-ID: At 7:38 PM -0500 3/4/05, Koen van der Drift wrote: >Your code looks good, although you made: > >- (id)initWithSymbolArray:(NSArray *)anArray symbolSet:(BCSymbolSet *)aSet; > >not > >- (id)initWithString:(NSString *)aString symbolSet:(BCSymbolSet *)aSet; > >the designated initializer like you do now. > >Probably the array-version is a better idea, because that is already closer to the BCSequence than just a string.
Otherwise you have an array, then make a string, and then make an array again, which is a waste of time (and yes, I am contradicting a previous message from me where I stated the opposite :) In the current code, there are 2 independent initializers, which bothers me a lot. Like you say, the symbolArray one makes more sense for a designated initializer. John wanted to make the class methods into instance methods, which is probably better, I agree. But I don't know if he had other concerns too. John? charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Sat Mar 5 18:41:33 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 5 Mar 2005 15:41:33 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: At 12:20 PM -0500 3/5/05, John Timmer wrote: > > >> On Mar 4, 2005, at 8:08 PM, John Timmer wrote: >> >>>> But each set will only have nucleotides or amino acids. They are not >>>> intended to mix. >>> Okay, so we will have no user-defined sets, and I would go in, grab the >>> first symbol, and test what type of symbol it is? Wouldn't that get >>> into >>> issues like testing its class, which Charles indicated might not be >>> the best >>> way of determining this? >> >> That's a good point. So we could move the BCSequenceType ivar to >> BCSymbolSet. > >Yes, and just make the method that's now in BCSequence a convenience >call-through to that. To maintain key/value coding, we could just have the >setter method do nothing. We're getting close! In the initializer, it is necessary to check that the BCSymbolSet is the right type before using it, right? 
So it would still be nice to have a sequenceType method for the BCAbstractSequence subclasses, which will always return the same value (no need for an ivar), and which does not rely on the BCSymbolSet (because, obviously, in the initializer, it has not been set yet). So the initializer in the superclass implementation can simply check that the BCSequence type and the BCSymbolSet type are compatible. charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sat Mar 5 18:42:46 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 5 Mar 2005 18:42:46 -0500 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: References: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> <003ff5b667711c3ded8ccdfba946dcd1@earthlink.net> <7d36df82618c2562674aa4b09e2abae7@earthlink.net> <0ffa67f4b9db7861cf78ee89826d10ad@earthlink.net> <0322c5351b071dedbab5e170020f1474@earthlink.net> <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> Message-ID: On Mar 3, 2005, at 3:01 AM, Charles PARNOT wrote: > My gut feeling is that subclassing NSTextView is 'evil', and that > instead the NSTextView should be a ivar and a subview (taking all the > space for now, but in the future a BCSequenceView could have several > subviews with several text fields like 'sequence length', 'current > selection',...), and that BCSequenceView could be the NSTextView > delegate to respond to user input and reformat the text view contents > accordingly. Sorry this is not very detailed, so you can ignore it if > that does not make sense! If BCSequenceView is the delegate, then I wouldn't call it a view :) Apple actually documents subclassing an NSTextView, so I don't see why it is evil. I think what you have in mind is more a complete window, with a view, and several widgets.
For that you would need to create a controller class, probably an NSWindowController subclass. If we want more control over the way the sequence is displayed in a view, overriding NSLayoutManager et al, as explained by Alex is probably the way to go. I am not sure however, if that would be in place 'in the framework' of BioCocoa. - Koen. From a.griekspoor at nki.nl Sat Mar 5 18:44:51 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 6 Mar 2005 00:44:51 +0100 Subject: [Biococoa-dev] functional groups In-Reply-To: <1ee37af387f8bb7c59da4f73e2ab33ff@earthlink.net> References: <1ee37af387f8bb7c59da4f73e2ab33ff@earthlink.net> Message-ID: Hmm, this is kind of a difficult thing we're talking about. To answer the easy question first: in principle this is much more important in the protein world than in the DNA world. Having said that, there are examples of special bases as well. For instance, methylated DNA bases, special nucleotides you can order to be built into your oligos etc. So ideally it would work with all sequence types. My doubts are specifically related to the question: should these things be a) a BCFeature instead of a symbol in the first place b) a BCSymbol variant c) now that I think of it, a BCSymbol feature instead of a BCSequence Feature? There is something to be said for each option I guess, they're both a feature and a special symbol. If we go for option a) it would mean that we would have problems implementing things like adding molecular weights, specific influences in calculations, etc. On the other hand going for b) makes it difficult to draw the border between what we call a feature and what should be an option. c) is some kind of new idea, but perhaps adds too much complexity. It's just to think out loud and start the discussion. Basically what might be a good thing to keep in mind, and a point at which to draw the border between a feature and a sequence, is that we're talking about modifications on a per symbol basis.
So while an alpha helix is clearly a feature, a phosphorylation is clearly a symbol. What are the borderline examples we can think of? I'm a bit worried about the addition of yet another symbol layer, and not sure whether BCRoot, BCToken or something alike would be better IMHO. Still, it feels a bit odd. Ideally, say a phosphorylation ADDs something to a symbol, it changes its properties, its mass etc but it does STAY for instance a tyrosine. That's why I came up with the idea of i.e. symbol features (now this is impossible because the symbols are singletons now that I think of it) but then perhaps the other way around, we should add symbolFeatures (single symbol features) that have additional effects beyond the general features, in that they're taken along in calculations. What I try to say is that I see a problem when you decide to say incorporate a phosphorylated-tyrosine instead of a standard tyrosine symbol because suddenly part of my BCSequence doesn't respond to representedBy: for instance. Or a methylated guanine does not respond to complement (which is still something to debate whether to return a methylated cytosine then or a non-methylated one but let's leave that for a while). I think that a modified symbol should respond to ALL methods of BCSymbol + do some additional stuff if necessary. But then again, I'm not sure if these things should be added as BCSymbol subclasses at all! Because what do we do now with BCSymbolSets, do we now suddenly have to add all possible modified symbols to a set as well, that will never work.... So in conclusion, I have no clue how to implement this ;-) Cheers, Alex On 5-mrt-05, at 23:12, Koen van der Drift wrote: > Hi, > > I am thinking about adding a class that will take care of functional > groups on sequences, eg methyl, phosphate, etc. My first thought was > to make it a subclass of BCSymbol. However, it will then also inherit > the representedBy code, etc.
That is not so useful for functional > groups (or maybe they are?). So, what I plan to do is to make a new, > abstract root class for symbols, maybe called BCRoot. Then BCSymbol > and BCFunctionalGroup will inherit from BCRoot. The BCRoot class will > have a lot of the code of BCSymbol, eg for name, mass, etc. However, > some things will be in BCSymbol, eg initializeSymbolRelationships. The > functional groups can be attached to the sequence, using the > not-yet-existing BCFeature class. The BCFunctionalGroup can also have > an ivar pointing to the symbol it is attached to. > > What do you guys think? For proteins functional groups are very > important, and they probably need their own class. What about DNA/RNA? > Are functional groups important, eg when digesting? > > > Also, when looking at the code of BCSymbol, I noticed that we use the > initializer initWithSymbol:(unichar)aChar. Maybe we can change this to > initWithChar, or even add that method? When using the word 'symbol' I > would expect a BCSymbol, not a char. I remember we discussed this > earlier, but I forgot why we eventually decided to use initWithSymbol > instead of initWithChar. > > > cheers, > > - Koen. > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh.
********************************************************* From charles.parnot at stanford.edu Sat Mar 5 18:49:21 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 5 Mar 2005 15:49:21 -0800 Subject: [Biococoa-dev] Symbols, codons and functional groups In-Reply-To: References: Message-ID: At 10:50 AM -0500 3/4/05, John Timmer wrote: >.......Anyway, as an aside, i've been thinking that the symbol set structure would >allow for a nice encapsulation of a genetic code. The problem is that >codons aren't symbols (since they have both amino acid and nucleotide >information). Any suggestions on how to adapt things? > >JT I thought of that too at some point. And my idea was: codons could simply be symbols, to benefit from the BCSymbol/BCSequence/BCSymbolSet design. At this point, symbols are one char, but a subclass could add an ivar. Then, in BCSequence et al., we should distinguish between countSymbols and length. At 5:12 PM -0500 3/5/05, Koen van der Drift wrote: >Hi, > >I am thinking about adding a class that will take care of functional groups on sequences, eg methyl, phosphate, etc. My first thought was to make it a subclass of BCSymbol. However, it will then also inherit the representedBy code, etc. That is not so useful for functional groups (or maybe they are?). So, what I plan to do is to make a new, abstract root class for symbols, maybe called BCRoot. Then BCSymbol and BCFunctionalGroup will inherit from BCRoot.
The BCRoot class will have a lot of the code of BCSymbol, eg for name, mass, etc. However, some things will be in BCSymbol, eg initializeSymbolRelationships. The functional groups can be attached to the sequence, using the not-yet-existing BCFeature class. The BCFunctionalGroup can also have an ivar pointing to the symbol it is attached to. > >What do you guys think? For proteins functional groups are very important, and they probably need their own class. What about DNA/RNA? Are functional groups important, eg when digesting? Functional groups can be useful for DNA and RNA as well. Methylation(s), ddNTP, dephosphorylation. I see a connection with codons here: we may need symbols with more than one char. These could be subclasses, either of the current root class (codons) or of DNA, RNA, protein symbols. Or we could have a symbol class that handles symbols with more than one char. The 'representedBy' method could be useful for functional groups too. If we use subclasses, then maybe we need a 'sequenceType' method, so a symbolSet can contain symbols of different classes and just needs to check the sequenceType compatibility... so that BCSymbol>>BCSymbolSet>>BCSequenceXXX of the same type always go together. > >Also, when looking at the code of BCSymbol, I noticed that we use the initializer initWithSymbol:(unichar)aChar. Maybe we can change this to initWithChar, or even add that method? When using the word 'symbol' I would expect a BCSymbol, not a char. I remember we discussed this earlier, but I forgot why we eventually decided to use initWithSymbol instead of initWithChar. Not sure about that (I was not there), I'd say that 'initWithChar' is more logical.
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sat Mar 5 18:50:45 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 5 Mar 2005 18:50:45 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: On Mar 5, 2005, at 6:41 PM, Charles PARNOT wrote: > We're getting close! > > In the initializer, it is necessary to check that the BCSymbolSet is > the right type before using it, right? Ha, in my idea it should be the other way around. The BCSymbolSet defines which symbols are allowed. The user wants to make a protein, so it passes a string/array and the proteinSymbolSet to filter the string/array. - Koen. From kvddrift at earthlink.net Sat Mar 5 18:56:36 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 5 Mar 2005 18:56:36 -0500 Subject: [Biococoa-dev] functional groups In-Reply-To: References: <1ee37af387f8bb7c59da4f73e2ab33ff@earthlink.net> Message-ID: On Mar 5, 2005, at 6:44 PM, Alexander Griekspoor wrote: > Ideally, say a phosphorylation ADDs something to a symbol, it changes > it properties, its mass etc but it does STAY for instance a tyrosine. > That's why I came up with the idea of i.e. symbol features (now this > is impossible because the symbols are singletons now that I think of > it) but than perhaps the other way around, we should add > symbolFeatures (single symbol features) that have additional effects > than the general features in that they're taken along in > calculations). > What I try to say is that I see a problem when you decide to say > incorporate a phosphorylated-tyrosine instead of a standard tyrosine > symbol because suddenly part of my BCSequence doesn't respond to > representedBy: for instance. 
Or a methylated guanine does not respond > to complement (which is still something to debate whether to return a > methylated cytosine then or a non-methylated one but let's leave that > for a while). You are absolutely right. What we need is an extension of the BCSymbol class - without changing its basic properties. So forget about the BCFunctionalGroup class :) I need to think what would be a proper solution for this. In Design Patterns lingo it would probably be the Decorator, see eg http://www.javaworld.com/javaworld/jw-12-2001/jw-1214- designpatterns.html. I will see how that can be applied to ObjC. Having fun with the snow ? - Koen. From a.griekspoor at nki.nl Sat Mar 5 18:58:15 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 6 Mar 2005 00:58:15 +0100 Subject: [Biococoa-dev] Symbols, codons and functional groups In-Reply-To: References: Message-ID: <791c51c6cad2bbe30f1d99e034b92b1b@nki.nl> > I thought of that too at some point. > And my idea was: codons could simply be symbols, to benefit from the > BCSymbol/BCSequence/BCSymbolSet design. At this point, symbols are one > char, but a subclass could add an ivar. If I'm correct, this is exactly the BioJava approach where Codons are symbols as well... For instance, what they call Alphabets is what we call SymbolSets. It might be handy to let us further study their approach a bit more to see the pros/cons. Some quotes from the cookbook: > CrossProductAlphabets are used to represent groups of Symbols as a > single Symbol. This is very useful for treating things like codons as > single Symbols. > CrossProductAlphabets result from the multiplication of other > Alphabets. CrossProductAlphabets are used to wrap up 2 or more Symbols > into a single "cross product" Symbol. For example using a 3 way cross > of the DNA alphabet you could wrap a codon as a Symbol. You could then > count those codon Symbols in a Count or you could used them in a > Distribution. 
http://www.biojava.org/docs/bj_in_anger/index.htm Again, I have absolutely no problem to adapt for a large part their setup/basic ideas. I've noticed with the advocacy of the symbolsets and recent discussions on features etc that we're seeing a nice example of convergent evolution ;-) Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From charles.parnot at stanford.edu Sun Mar 6 01:27:56 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 5 Mar 2005 22:27:56 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: At 6:50 PM -0500 3/5/05, Koen van der Drift wrote: >On Mar 5, 2005, at 6:41 PM, Charles PARNOT wrote: > >>We're getting close! >> >>In the initializer, it is necessary to check that the BCSymbolSet is the right type before using it, right? > >Ha, in my idea it should be the other way around. The BCSymbolSet defines which symbols are allowed. The user wants to make a protein, so it passes a string/array and the proteinSymbolSet to filter the string/array. > >- Koen. OK, that works well for BCSequence, but how about that code: BCSequenceDNA *dna=[BCSequenceDNA initWithString:@"ATGTTTGAT" symbolSet:proteinSymbolSet]; What should happen? 
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Sun Mar 6 01:46:31 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 5 Mar 2005 22:46:31 -0800 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: References: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> <003ff5b667711c3ded8ccdfba946dcd1@earthlink.net> <7d36df82618c2562674aa4b09e2abae7@earthlink.net> <0ffa67f4b9db7861cf78ee89826d10ad@earthlink.net> <0322c5351b071dedbab5e170020f1474@earthlink.net> <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> Message-ID: At 6:42 PM -0500 3/5/05, Koen van der Drift wrote: >On Mar 3, 2005, at 3:01 AM, Charles PARNOT wrote: > >>My gut feeling is that subclassing NSTextView is 'evil', and that instead the NSTextView should be a ivar and a subview (taking all the space for now, but in the future a BCSequenceView could have several subviews with several text fields like 'sequence length', 'current selection',...), and that BCSequenceView could be the NSTextView delegate to respond to user input and reformat the text view contents accordingly. Sorry this is not very detailed, so you can ignore it if that does not make sense! > > >If BCSequenceView is the delegate, then I wouldn't call it a view :) Apple actually documents subclassing an NSTextView, so I don't see why it is evil. I think what you have in mind is more a complete window, with a view, and several widgets. For that you would need to create a controller class, probably a NSWindowController subclass. > >If we want more control over the way the sequence is displayed in a view, overriding NSLayoutManager et al, as explained by Alex is probably the way to go. I am not sure however, if that would be in place 'in the framework' of BioCocoa. > >- Koen. 
Yes, sorry, I just wrote that email too fast, and 'evil' should have been 'scares me', and I should just read the docs before opening my big mouth, sometimes. I don't know anything about the text layout in Cocoa, so shame on me!

The way I thought of BCSequenceView was more a superview composed of different prebuilt elements, with an NSTextView being central to it and displaying the sequence per se, but maybe also a little text field with the length, sequence type, some other text fields for symbol counts,... That would be easy for me. I agree that a better integrated NSTextView might do a better job (as long as you get it to work!).

I continue this email 3 hours later...

Well, now that I think more about it, I actually have a maybe more meaningful concern. When you make your own program and subclass NSTextView, it is fine, because you are the only user of it, you design it, you know how to use it and the limitations. But when you try to create a class for a framework that other people will use, you may want to make the object more encapsulated. If it is a subclass of NSTextView, it clearly inherits many different things from it that users may be tempted to use. Even if you put warnings about not doing this or that with it, we may not be able to predict all the subtleties and consequences.

Conversely, if you wrap the NSTextView in an NSView subclass, and have a limited number of public methods, it may be easier to use and document. This NSView subclass would just add a layer around the objects in it, and shield the NSTextView from being abused and misunderstood. In addition, it might be easier to later change the internals and/or have several widgets composing the final view.

An example is the WebView in Apple's framework. It could have been made a subclass of NSTextView. After all, it is usually a lot of text, and scrollers. But that would be confusing to use, with all the inheritance from NSTextView + the new stuff dealing with web connections.
OK, a web view is more complicated that a sequence view, so it is not the best example, but I hope you see my point. just my 2 cents, from a naive user perspective charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sun Mar 6 07:24:47 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 6 Mar 2005 07:24:47 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: <084fbdeb7c3568ad8ca138ed33d42b4c@earthlink.net> On Mar 6, 2005, at 1:27 AM, Charles PARNOT wrote: > OK, that works well for BCSequence, but how about that code: > > BCSequenceDNA *dna=[BCSequenceDNA initWithString:@"ATGTTTGAT" > symbolSet:proteinSymbolSet]; My understanding of the whole new BCSequence construction, is that the user only needs to use BCSequence. The framework will deal with its subclasses if needed. So in that case, the above would change to: BCSequence *dna=[BCSequence initWithString:@"ATGTTTGAT" symbolSet:proteinSymbolSet]; which returns a protein object, named 'dna'. Can we prevent the user making such ambiguous situations, maybe by making the BCSequence subclasses private? Or will that break things? - Koen. 
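The "symbol set filters the input" idea from this exchange can be sketched in plain C. This is only an illustration under stated assumptions, not BioCocoa code: `filter_by_symbol_set`, `DNA_SYMBOLS` and `PROTEIN_SYMBOLS` are hypothetical names, and a string of allowed letters stands in for a BCSymbolSet.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol sets, modelled as strings of allowed one-letter codes. */
static const char *const DNA_SYMBOLS     = "ACGT";
static const char *const PROTEIN_SYMBOLS = "ACDEFGHIKLMNPQRSTVWY";

/* Copy into `out` only those characters of `input` that occur in `allowed`.
   `out` must have room for strlen(input) + 1 bytes.
   Returns the number of characters kept. */
static size_t filter_by_symbol_set(const char *input, const char *allowed, char *out)
{
    size_t kept = 0;
    for (const char *p = input; *p != '\0'; ++p) {
        if (strchr(allowed, *p) != NULL) {
            out[kept++] = *p;   /* symbol is in the set: keep it */
        }
    }
    out[kept] = '\0';
    return kept;
}
```

Under these assumptions, filtering "ATGTTTGAT" through the protein set keeps all nine letters, because A, T and G are also valid one-letter amino acid codes. That is the ambiguity raised above: the filter alone cannot tell that the string was meant as DNA.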
From kvddrift at earthlink.net Sun Mar 6 07:29:34 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 6 Mar 2005 07:29:34 -0500 Subject: [Biococoa-dev] BCSequenceView In-Reply-To: References: <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> <003ff5b667711c3ded8ccdfba946dcd1@earthlink.net> <7d36df82618c2562674aa4b09e2abae7@earthlink.net> <0ffa67f4b9db7861cf78ee89826d10ad@earthlink.net> <0322c5351b071dedbab5e170020f1474@earthlink.net> <0c5ace4be6df67de2526fda570bc6a3b@earthlink.net> Message-ID: <9c841f4b2c2fb4f6b78e8ace70eef485@earthlink.net> On Mar 6, 2005, at 1:46 AM, Charles PARNOT wrote: > An example is the WebView in Apple's framework. It could have been > made a subclass of NSTextView. After all, it is usually a lot of text, > and scrollers. But that would be confusing to use, with all the > inheritage from NSTextView + the new stuff dealing with web > connections. OK, a web view is more complicated that a sequence view, > so it is not the best example, but I hope you see my point. > I see your point, and I think it is a valid one. Not sure though, if that would fit in BCAppKit. It might need a separate target. including a IBPalette. See for another example the SM2DGraphView framework (http://developer.snowmintcs.com/frameworks/sm2dgraphview/). - Koen. From jtimmer at bellatlantic.net Sun Mar 6 09:16:23 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Sun, 06 Mar 2005 09:16:23 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: <084fbdeb7c3568ad8ca138ed33d42b4c@earthlink.net> Message-ID: > > On Mar 6, 2005, at 1:27 AM, Charles PARNOT wrote: > >> OK, that works well for BCSequence, but how about that code: >> >> BCSequenceDNA *dna=[BCSequenceDNA initWithString:@"ATGTTTGAT" >> symbolSet:proteinSymbolSet]; > > My understanding of the whole new BCSequence construction, is that the > user only needs to use BCSequence. The framework will deal with its > subclasses if needed. 
So in that case, the above would change to: > > BCSequence *dna=[BCSequence initWithString:@"ATGTTTGAT" > symbolSet:proteinSymbolSet]; > > which returns a protein object, named 'dna'. > > > Can we prevent the user making such ambiguous situations, maybe by > making the BCSequence subclasses private? Or will that break things? Don't know if it would break things, but Alex and I would be disappointed if they got made private, and I find it hard to believe that we'd be the only ones who want to work that way. Anyway, the point that Charles is trying to make (I think) is that symbol sets open up a new set of options for having things winding up internally inconsistent, and we need to police that. In this case, specifically, we need the init method of subclasses to ensure that the symbol set they've been handed is of the appropriate type - very simple, a line of code. The more significant question is what should be returned. Nil? an error? Use the default symbol set for that sequence type instead? Go 10.3 only and throw an exception? My vote would be to fall back to the default if we get anything non-sensical handed to us. _______________________________________________ This mind intentionally left blank From charles.parnot at stanford.edu Sun Mar 6 16:38:45 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 6 Mar 2005 13:38:45 -0800 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: At 9:16 AM -0500 3/6/05, John Timmer wrote: > > >> On Mar 6, 2005, at 1:27 AM, Charles PARNOT wrote: >> >>> OK, that works well for BCSequence, but how about that code: >>> >>> BCSequenceDNA *dna=[BCSequenceDNA initWithString:@"ATGTTTGAT" >>> symbolSet:proteinSymbolSet]; >> >> My understanding of the whole new BCSequence construction, is that the >> user only needs to use BCSequence. The framework will deal with its >> subclasses if needed. 
So in that case, the above would change to: >> >> BCSequence *dna=[BCSequence initWithString:@"ATGTTTGAT" >> symbolSet:proteinSymbolSet]; >> >> which returns a protein object, named 'dna'. >> >> >> Can we prevent the user making such ambiguous situations, maybe by >> making the BCSequence subclasses private? Or will that break things? > >Don't know if it would break things, but Alex and I would be disappointed if >they got made private, and I find it hard to believe that we'd be the only >ones who want to work that way. Anyway, the point that Charles is trying to >make (I think) is that symbol sets open up a new set of options for having >things winding up internally inconsistent, and we need to police that. In >this case, specifically, we need the init method of subclasses to ensure >that the symbol set they've been handed is of the appropriate type - very >simple, a line of code. Exactly my point. Koen, remember we need to give access to both a generic class (BCSequence) and typed sequence classes (for John and Alex... well, and potentially other users too!). >The more significant question is what should be returned. Nil? an error? >Use the default symbol set for that sequence type instead? Go 10.3 only and >throw an exception? > >My vote would be to fall back to the default if we get anything non-sensical >handed to us. I thought that too (see my code) but maybe another option is to use an empty set, which means you get empty sequences no matter what you do. The reason is you want something not ugly enough to generate runtime errors (no nil, no exception), but still strong enough that the error can not be missed (be it the user of the framework, or the user of the BioCocoa-based app, doing something silly). 
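The two fallback policies discussed here (use the default symbol set, or use an empty one) might look like this in plain C. Every name below (`SymbolSet`, `FallbackPolicy`, `resolve_dna_symbol_set`) is hypothetical; the sketch only models the one-line type check a typed initializer would perform and is not actual BioCocoa API.

```c
#include <stddef.h>

typedef enum { SEQ_DNA, SEQ_PROTEIN } SequenceType;

/* Hypothetical stand-in for a BCSymbolSet: a type tag plus allowed letters. */
typedef struct {
    SequenceType type;
    const char  *symbols;
} SymbolSet;

typedef enum { FALLBACK_DEFAULT, FALLBACK_EMPTY } FallbackPolicy;

static const SymbolSet DEFAULT_DNA = { SEQ_DNA, "ACGT" };
static const SymbolSet EMPTY_DNA   = { SEQ_DNA, "" };

/* Return the set a DNA sequence should actually use.
   A missing or wrongly typed set is never passed through: depending on the
   policy, the caller gets the default DNA set or an empty one. */
static const SymbolSet *resolve_dna_symbol_set(const SymbolSet *requested,
                                               FallbackPolicy policy)
{
    if (requested != NULL && requested->type == SEQ_DNA) {
        return requested;
    }
    return (policy == FALLBACK_DEFAULT) ? &DEFAULT_DNA : &EMPTY_DNA;
}
```

With `FALLBACK_EMPTY`, any string filtered through the returned set comes out empty, which gives the "no runtime error, but impossible to miss" behaviour described above. `FALLBACK_DEFAULT` silently repairs the mistake instead.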
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sun Mar 6 16:54:49 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 6 Mar 2005 16:54:49 -0500 Subject: [Biococoa-dev] More on BCSymbolSets In-Reply-To: References: Message-ID: <5b80b9c9482f65dc97a1cd98fd3da607@earthlink.net> On Mar 6, 2005, at 4:38 PM, Charles PARNOT wrote: > > Exactly my point. Koen, remember we need to give access to both a > generic class (BCSequence) and typed sequence classes (for John and > Alex... well, and potentially other users too!). Thanks for the reminder :) > > >> The more significant question is what should be returned. Nil? an >> error? >> Use the default symbol set for that sequence type instead? Go 10.3 >> only and >> throw an exception? >> >> My vote would be to fall back to the default if we get anything >> non-sensical >> handed to us. > > I thought that too (see my code) but maybe another option is to use an > empty set, which means you get empty sequences no matter what you do. > The reason is you want something not ugly enough to generate runtime > errors (no nil, no exception), but still strong enough that the error > can not be missed (be it the user of the framework, or the user of the > BioCocoa-based app, doing something silly). > I also would prefer an empty sequence. - Koen. From charles.parnot at stanford.edu Sun Mar 6 18:34:06 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 6 Mar 2005 15:34:06 -0800 Subject: [Biococoa-dev] SDKROOT Message-ID: I forgot to set the SDKROOT and MACOSX_DEPLOYMENT_TARGET of BioCocoa target to make it 10.2 or higher, which I just did. Remember to CVS-update your project file. 
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Mon Mar 7 01:00:43 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 6 Mar 2005 22:00:43 -0800 Subject: [Biococoa-dev] SDKROOT In-Reply-To: <1885e0ee86ab85484fac6ae08204d52d@earthlink.net> References: <1885e0ee86ab85484fac6ae08204d52d@earthlink.net> Message-ID: >>I forgot to set the SDKROOT and MACOSX_DEPLOYMENT_TARGET of BioCocoa target to make it 10.2 or higher, which I just did. Remember to CVS-update your project file. >> > >Ah, then you must be 'didou'? I wondered who that could be :) > >- Koen. Arghhh... You found my double life! OK, this is the username at home (my daughter's nickname). charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From biococoa at bioworxx.com Tue Mar 8 15:07:54 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Tue, 8 Mar 2005 21:07:54 +0100 Subject: [Biococoa-dev] can't compile framework Message-ID: <99dc40100232926d2c1253bf1002e9e8@bioworxx.com> Hi everybody, i just checked out the latest version and tried to compile the project, but it didn't work. Very strange problem: BioCocoa_Prefix.h:6:38: Foundation/Foundation.h: No such file or directory BioCocoa_Prefix.h:7:30: AppKit/AppKit.h: No such file or directory I think it has something to do with the new framework setting. I'm not very familiar with framework settings in cocoa, so perhaps anyone can help me. please :-) NEXT I want to start with the BCAlignment stuff and there are many things to discuss: 1. 
what exactly do we want an BCAlignment to be ?
   A slim Datastructure for different Alignment algorithms
   Or a comfortable datastructure, which is perhaps not very useful for programs concentrating on performance
2. We need a BCMatrix (protocol or class) for substitution matrices
3. A protocol for alignment generating algorithms
4..
5.. :-) i think it's enough for now

I'd like to know what you think about it and what you want to do with alignments in your application

@Charles: I'm interested in XGrid programming, do you know where i can find the api docs for the new XGridFoundation framework ? I looked in my software seed section in ADC but didn't find it. In my tiger distribution is just a demo app and the framework itself, but no documentation. Do you know where to find the documentation ?

Thx

Philipp Seibel
Lehrstuhl für Bioinformatik
Biozentrum, Am Hubland
Universität Würzburg
philipp.seibel at biozentrum.uni-wuerzburg.de

From jtimmer at bellatlantic.net Tue Mar 8 16:13:18 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 08 Mar 2005 16:13:18 -0500
Subject: [Biococoa-dev] can't compile framework
In-Reply-To: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com>
Message-ID:

Philip -

>
> i just checked out the latest version and tried to compile the project, but it
> didn't work. Very strange problem:
>
> BioCocoa_Prefix.h:6:38: Foundation/Foundation.h: No such file or directory
> BioCocoa_Prefix.h:7:30: AppKit/AppKit.h: No such file or directory
>
> I think it has something to do with the new framework setting. I'm not very
> familiar with framework settings in cocoa, so perhaps anyone can help me.
> please :-)

I just did a clean checkout, and switched to the new Framework target, and things worked without a hitch. Check that your target isn't the "old format" one.
>
>
> NEXT
>
> I want to start with the BCAlignment stuff and there are many things to
> discuss:
>
> 1. what exactly do we want an BCAlignment to be ?
> A slim Datastructure for different Alignment algorithms
> Or a comfortable datastructure, which is perhaps not very useful for
> programs concentrating on performance
>
Both, obviously ;). From what I've seen of many alignment programs, they're pretty processor intensive. I'd be willing to sacrifice using the full sequence classes in order to up the performance.

>
> 2. We need a BCMatrix (protocol or class) for substitution matrices
>
Could you explain what you need this for? Be gentle - I haven't taken a math course in about 15 years, and I'm a developmental biologist now.

>
> 3. A protocol for alignment generating algorithms
>
Well, as the person writing the code, what would you need in order to set things up? To ask you another question - are there any values that would help you perform alignments that we could put in the base symbol classes (nucleotide, amino acid)?

>
> I'd like to know what you think about it and what you want to do with
> alignments in your application
>
Two things, really - contig generation from overlapping sequences, and multiple alignments of related sequences so that I could generate a nice looking view of the conservations. As to what I'd think about it, I'm very excited!

Cheers,

JT

>

_______________________________________________
This mind intentionally left blank
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charles.parnot at stanford.edu Tue Mar 8 16:31:53 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 8 Mar 2005 13:31:53 -0800
Subject: [Biococoa-dev] can't compile framework
In-Reply-To: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com>
References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com>
Message-ID:

At 21:08 +0100 3/8/05, Philipp Seibel wrote:
>Hi everybody,
>
>i just checked out the latest version and tried to compile the
>project, but it didn't work. Very strange problem:
>
>BioCocoa_Prefix.h:6:38: Foundation/Foundation.h: No such file or directory
>BioCocoa_Prefix.h:7:30: AppKit/AppKit.h: No such file or directory
>

It is probably because you don't have the SDKs installed. I believe they are optional when you install Xcode. Look for /Developer/SDKs/MacOSxxx folders. If you don't have those, then yes, this is the problem.

To compile the framework, the compiler uses the SDKROOT build setting as the path to the headers. This is why it can't find the Foundation and AppKit headers if the SDKROOT points to an empty folder.

If this is the problem, you may want to run the latest Xcode installer and check the SDKs in the options.

charles

NB: the SDKs were supposed to be used before, but were actually not used until I changed the target type and added the SDKROOT setting.

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From charles.parnot at stanford.edu Tue Mar 8 20:17:01 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Tue, 8 Mar 2005 17:17:01 -0800 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> Message-ID: >NEXT > >I want to start with the BCAlignment stuff and there are many things to discuss: > >1. what exactly do we want an BCAlignment to be ? > A slim Datastructure for different Alignment algorithms > Or a comfortable datastructure, which is perhaps not very useful for programs concentrating on performance In terms of design, we will probably need at least 2 types of classes, following the current structure of the project: * some data objects with the results of alignement; I think there were some discussions about it before I joined, on what a sequence cluster might look like,... * some tool objects that do the job In the tools, you may have to transform the data structure for performance, if performance turns out to be an issue. Now my 2 cents with maybe obvious things, but important to keep in mind. It is probably a good idea to not focus too much on performance at first... and still leave some doors open for it, with some clean separation between the core functions and the rest. If you are familiar with Shark, it would be a good idea to use it after the first draft of your tools. If you are not familiar with it, it would still be a good idea! I was just reading this piece today, linked from cocoadev.com, and it is a very interesting reading: http://wilshipley.com/blog/2005/02/free-programming-tips-are-worth-every.html I suspect these considerations do not fully apply to scientific computing, where you can sometimes better predict performance bottleneck, but still, good reading. >2. We need a BCMatrix (protocol or class) for substitution matrices >3. A protocol for alignment generating algorithms >4.. >5.. 
:-) i think it's enough for now > I am not an expert in alignments, so I am listening to your ideas! Maybe you could present what you think the goals could be in more details, what kind of alignements you'd want to implement first, how far you want to go,... You could also start putting some code, and I would look at it and make comments from the naive user's perspective. I am very good at making long emails with endless comments ;-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Tue Mar 8 20:32:49 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 8 Mar 2005 20:32:49 -0500 Subject: [Biococoa-dev] can't compile framework In-Reply-To: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> Message-ID: <44580952cfc687478b6ed8e176ce8120@earthlink.net> On Mar 8, 2005, at 3:08 PM, Philipp Seibel wrote: > I want to start with the BCAlignment stuff and there are many things > to discuss: > > 1. what exactly do we want an BCAlignment to be ? > A slim Datastructure for different Alignment algorithms > Or a comfortable datastructure, which is perhaps not very useful for > programs concentrating on performance Although I have not much experience with alignments, I would suggest that we go for the first option. If programmed cleverly, it should allow for expansion to more sophisticated calculations. This expansion can be done within BioCocoa, or by the user of the framework. > > 2. We need a BCMatrix (protocol or class) for substitution matrices Could you please explain this in a littile more detail? My understanding of alignment calculations is that these are very complicated and need a lot of processing power (correct me if I am wrong). 
Therefore we should consider using C directly, without much overhead of ObjC classes. - Koen. From biococoa at bioworxx.com Wed Mar 9 14:28:14 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Wed, 9 Mar 2005 20:28:14 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> Message-ID: <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> Am 09.03.2005 um 02:17 schrieb Charles PARNOT: >> NEXT >> >> I want to start with the BCAlignment stuff and there are many things >> to discuss: >> >> 1. what exactly do we want an BCAlignment to be ? >> A slim Datastructure for different Alignment algorithms >> Or a comfortable datastructure, which is perhaps not very useful for >> programs concentrating on performance > > In terms of design, we will probably need at least 2 types of classes, > following the current structure of the project: > * some data objects with the results of alignement; I think there were > some discussions about it before I joined, on what a sequence cluster > might look like,... > * some tool objects that do the job > > In the tools, you may have to transform the data structure for > performance, if performance turns out to be an issue. > > Now my 2 cents with maybe obvious things, but important to keep in > mind. > It is probably a good idea to not focus too much on performance at > first... and still leave some doors open for it, with some clean > separation between the core functions and the rest. If you are > familiar with Shark, it would be a good idea to use it after the first > draft of your tools. If you are not familiar with it, it would still > be a good idea! > I was just reading this piece today, linked from cocoadev.com, and it > is a very interesting reading: > http://wilshipley.com/blog/2005/02/free-programming-tips-are-worth- > every.html > You are completly right, i will just start and do the optimization later on. 
> I suspect these considerations do not fully apply to scientific > computing, where you can sometimes better predict performance > bottleneck, but still, good reading. > > >> 2. We need a BCMatrix (protocol or class) for substitution matrices >> 3. A protocol for alignment generating algorithms >> 4.. >> 5.. :-) i think it's enough for now >> > > > I am not an expert in alignments, so I am listening to your ideas! > Maybe you could present what you think the goals could be in more > details, what kind of alignements you'd want to implement first, how > far you want to go,... I thought of the most common alignment algorithms, like smith-waterman and needleman-wunsch. All common sequence based algorithms need substitution matrices to compute the alignment score. ( Thats why i like to have a BCSubstitutionMatrix class ( e.g. PAM, BLOSSOM ) ). > You could also start putting some code, and I would look at it and > make comments from the naive user's perspective. I am very good at > making long emails with endless comments ;-) i just added a first version of the BCAlignment header. I consider two classes BCAlignment and BCMutableAlignment, what do you think about it ? > Phil From jtimmer at bellatlantic.net Wed Mar 9 14:45:55 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 09 Mar 2005 14:45:55 -0500 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> Message-ID: Just so I have this straight - Google tells me that the substitution matrix is just a way to look up a score that indicates how good an alignment between two amino acids is. If it's something rare, like W, a W-W match gets a high score. If it's something like alanine, it A-A matches get a lower score. If it's not a match, the score is calculated based on how similar the amino acids are. The matrix is simply something that researchers have devised, and not an inherent property - different matrixes exist, and may produce different alignments. 
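The lookup described in the paragraph above can be sketched in plain C as a small two-dimensional table. The scores are the illustrative DNA values Philipp posts in his follow-up, not a real PAM or BLOSUM matrix. The names `base_index`, `substitution_score` and `ungapped_score` are hypothetical, and scoring an unknown symbol as 0 is an assumption made only for this sketch.

```c
#include <stddef.h>

enum { BASE_COUNT = 4 };

/* Illustrative symmetric scores, row/column order A, T, G, C. */
static const int DNA_SCORES[BASE_COUNT][BASE_COUNT] = {
    /*        A  T  G  C */
    /* A */ { 9, 3, 2, 1 },
    /* T */ { 3, 9, 3, 2 },
    /* G */ { 2, 3, 9, 1 },
    /* C */ { 1, 2, 1, 9 },
};

/* Map a base letter to its row/column index, or -1 if it is not in the table. */
static int base_index(char base)
{
    switch (base) {
    case 'A': return 0;
    case 'T': return 1;
    case 'G': return 2;
    case 'C': return 3;
    default:  return -1;
    }
}

/* Score for pairing two bases; unknown symbols score 0 (assumption). */
static int substitution_score(char a, char b)
{
    int i = base_index(a);
    int j = base_index(b);
    if (i < 0 || j < 0) {
        return 0;
    }
    return DNA_SCORES[i][j];
}

/* Sum of pair scores for an ungapped alignment; stops at the shorter string. */
static int ungapped_score(const char *s1, const char *s2)
{
    int total = 0;
    for (size_t k = 0; s1[k] != '\0' && s2[k] != '\0'; ++k) {
        total += substitution_score(s1[k], s2[k]);
    }
    return total;
}
```

Swapping in a different matrix changes only the table, not the lookup code, which fits the point that the matrix is a choice made by researchers rather than a property of the symbols themselves.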
If all of this is correct, then I think I know enough to actually help ;).

JT

_______________________________________________
This mind intentionally left blank

From biococoa at bioworxx.com Wed Mar 9 14:59:58 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Wed, 9 Mar 2005 20:59:58 +0100
Subject: Fwd: [Biococoa-dev] starting BCAlignment
Message-ID: <88fa340c9550b701c9a2a22f28dc8344@bioworxx.com>

>
> On 09.03.2005 at 20:45, John Timmer wrote:
>
>> Just so I have this straight -
>>
>> Google tells me that the substitution matrix is just a way to look up a score that indicates how good an alignment between two amino acids is.
>
>      A  T  G  C
>   A  9  3  2  1
>   T  3  9  3  2
>   G  2  3  9  1
>   C  1  2  1  9
>
> for a DNA sequence matrix. As you can see, every matrix has to be symmetric.
>
>> If it's something rare, like W, a W-W match gets a high score. If it's something like alanine, it A-A matches get a lower score. If it's not a match, the score is calculated based on how similar the amino acids are.
>
> That's the point.
>
>> The matrix is simply something that researchers have devised, and not an inherent property - different matrixes exist, and may produce different alignments.
>
> Yes, there are special matrices for special purposes. For example, you can define a matrix to align specific domains better, and so on.
>
>>
>> If all of this is correct, then I think I know enough to actually help ;).
>
> You are completely right !!! :-)

From jtimmer at bellatlantic.net Wed Mar 9 18:31:37 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Wed, 09 Mar 2005 18:31:37 -0500
Subject: [Biococoa-dev] starting BCAlignment
In-Reply-To: <88fa340c9550b701c9a2a22f28dc8344@bioworxx.com>
Message-ID:

>>>
>>> If all of this is correct, then I think I know enough to actually help ;).
>>
>> you are completly right !!! :-)
>

Good to know. So, coding efficiently, how do matrix lookups normally work?
It's easy to see how a dictionary of dictionaries would work (ie - get dictionary with the key "Cys", look up value for "Tyr" in it), but I'd imagine they're normally implemented in much faster C code. The reason I ask is that I'm expecting the lookup to work with char variables, but I'd imagine the code could be adapted to use pointers instead of a char. If it is, since all our individual bases/amino acids are single instance pointers, it'd be very easy to get this to work, and I'd be happy to help you populate the matrix, as well as helping set up a .plist file format to store it. Cheers, JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Wed Mar 9 20:32:19 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 9 Mar 2005 20:32:19 -0500 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> Message-ID: Hi, As I said before, I don't know much about coding alignments, and leave that all up to you guys. I remember however, that Alex once mentioned that he knows someone in his department who is very familiar with alignment code. It could be an idea to see what he thinks about our approach. Maybe he even wants to become a BioCocoa member :) - Koen. From biococoa at bioworxx.com Thu Mar 10 03:46:23 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 09:46:23 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: References: Message-ID: <4455f60cfa9cdc2a1c9e9dfc52bc2c7e@bioworxx.com> Am 10.03.2005 um 00:31 schrieb John Timmer: > >>>> >>>> If all of this is correct, then I think I know enough to actually >>>> help ;). >>> >>> you are completly right !!! :-) >> > > Good to know. So, coding efficiently, how do matrix lookups normally > work? 
> It's easy to see how a dictionary of dictionaries would work (ie - get dictionary with the key "Cys", look up value for "Tyr" in it), but I'd imagine they're normally implemented in much faster C code

There is no need for a dictionary. I would suggest keeping an NSArray of BCSymbols. The indices of the symbols in that array can then be used to look up the substitution score in an int* matrix. It's perhaps not the fastest approach, but it works well and gives a clean interface. So internally we have an int* matrix for the scores. Just look at the small interface I just submitted.

> The reason I ask is that I'm expecting the lookup to work with char variables, but I'd imagine the code could be adapted to use pointers instead of a char. If it is, since all our individual bases/amino acids are single instance pointers, it'd be very easy to get this to work, and I'd be happy to help you populate the matrix, as well as helping set up a .plist file format to store it.

Yes, I also thought of storing the matrix in a .plist file. I also think we should provide the most common matrices with our framework.

Phil

From biococoa at bioworxx.com Thu Mar 10 08:54:49 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Thu, 10 Mar 2005 14:54:49 +0100
Subject: [Biococoa-dev] BCMultiSequenceTool
Message-ID: <366193552e27b83613c475f7d9a7faa2@bioworxx.com>

Hi everyone,

To build the first AlignmentTools, I would suggest making a BCMultiSequenceTool base class that holds an NSArray of abstract sequences. Does everyone agree with me? ( yes, sure ;-) )

Phil

From biococoa at bioworxx.com Thu Mar 10 09:03:11 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Thu, 10 Mar 2005 15:03:11 +0100
Subject: [Biococoa-dev] BCSymbolSet
Message-ID: <5cba0875c2320b3b1500dce968b5d75e@bioworxx.com>

Hi,

I'd like to suggest making BCSymbolSet a subclass of NSMutableSet, to get all set features for free. For example, I need the allObjects method for my BCScoringMatrix class.
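As a rough illustration of why an ordered array of symbols matters for a scoring matrix, here is a minimal plain C sketch of the index-based lookup discussed in this thread. It is not the BCScoringMatrix interface in CVS: the names kSymbols, kScores, symbol_index and substitution_score are made up for this example, plain chars stand in for BCSymbol objects, and the scores are simply the example DNA matrix posted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol order; a real implementation would take this
   from the symbol set (e.g. an array of symbols). */
static const char kSymbols[] = "ATGC";
enum { kSymbolCount = 4 };

/* Example DNA scores from earlier in the thread, stored row by row
   in one flat block (the "int*" layout instead of "int**"). */
static const int kScores[kSymbolCount * kSymbolCount] = {
    /*        A  T  G  C */
    /* A */   9, 3, 2, 1,
    /* T */   3, 9, 3, 2,
    /* G */   2, 3, 9, 1,
    /* C */   1, 2, 1, 9
};

/* Return the position of c in kSymbols, or -1 if it is not a known symbol. */
static int symbol_index(char c)
{
    const char *p;
    if (c == '\0')
        return -1;              /* strchr would match the terminator */
    p = strchr(kSymbols, c);
    return p ? (int)(p - kSymbols) : -1;
}

/* Store the score for the pair (a, b) in *score_out.
   Returns 0 on success, -1 if either symbol is unknown. */
static int substitution_score(char a, char b, int *score_out)
{
    int row = symbol_index(a);
    int col = symbol_index(b);
    assert(score_out != NULL);
    if (row < 0 || col < 0)
        return -1;
    /* one multiplication and one addition, no second pointer dereference */
    *score_out = kScores[row * kSymbolCount + col];
    return 0;
}
```

With this layout, substitution_score('T', 'G', &s) reads element 1 * 4 + 2 of the flat block and sets s to 3. The linear search in symbol_index is only for brevity; a 128-entry table indexed by character would avoid it.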
cheers, Phil From jtimmer at bellatlantic.net Thu Mar 10 14:17:35 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 10 Mar 2005 14:17:35 -0500 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: <5cba0875c2320b3b1500dce968b5d75e@bioworxx.com> Message-ID: > Hi, > > i'd like to make the suggestion to make BCSymbolSet a subclass of > NSMutableSet, to get all set features for free. I for example need the > allObjects: method for my BCScoringMatrix class. I think we were going to make BCSymbolSet already. I think the root class was going to be non-mutable and hold standard sets, like all nucleotides, etc. There was going to be a subclass that was mutable to allow user-defined sets. I wasn't involved in coding this, though, so I'm not sure where things stand. > to make the first AlignmentTools i would suggest to make a > BCMultiSequenceTool base class including a NSArray of abstract > sequences. That sounds reasonable. You had implied you had checked something in to the CVS in a previous email - is that so? If so, where can I find it? Cheers, JT _______________________________________________ This mind intentionally left blank From biococoa at bioworxx.com Thu Mar 10 14:31:48 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 20:31:48 +0100 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: References: Message-ID: Am 10.03.2005 um 20:17 schrieb John Timmer: >> Hi, >> >> i'd like to make the suggestion to make BCSymbolSet a subclass of >> NSMutableSet, to get all set features for free. I for example need the >> allObjects: method for my BCScoringMatrix class. > I think we were going to make BCSymbolSet already. I think the root > class > was going to be non-mutable and hold standard sets, like all > nucleotides, > etc. Yes i noticed that, but it derives from NSObject, would be cleaner if it derives from NS(Mutable)Set. (just my opinion) > There was going to be a subclass that was mutable to allow > user-defined sets. 
I wasn't involved in coding this, though, so I'm > not > sure where things stand. > > >> to make the first AlignmentTools i would suggest to make a >> BCMultiSequenceTool base class including a NSArray of abstract >> sequences. > That sounds reasonable. I looked at the current structure and perhaps its better to make some protocols instead of the Tools base classes, so we're more flexible with the tool organisation. > You had implied you had checked something in to the CVS in a previous > email > - is that so? If so, where can I find it? > in the BCFoundation->BCAlignment Folder cheers, Phil From jtimmer at bellatlantic.net Thu Mar 10 14:43:30 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 10 Mar 2005 14:43:30 -0500 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: Message-ID: >>> i'd like to make the suggestion to make BCSymbolSet a subclass of >>> NSMutableSet, to get all set features for free. I for example need the >>> allObjects: method for my BCScoringMatrix class. >> I think we were going to make BCSymbolSet already. I think the root >> class >> was going to be non-mutable and hold standard sets, like all >> nucleotides, >> etc. > > Yes i noticed that, but it derives from NSObject, would be cleaner if > it derives from NS(Mutable)Set. (just my opinion) Ah, but NSMutableSet's a class cluster, so you can't really subclass it very easily. JT _______________________________________________ This mind intentionally left blank From a.griekspoor at nki.nl Thu Mar 10 15:03:45 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 21:03:45 +0100 Subject: [Biococoa-dev] can't compile framework In-Reply-To: <44580952cfc687478b6ed8e176ce8120@earthlink.net> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <44580952cfc687478b6ed8e176ce8120@earthlink.net> Message-ID: >> I want to start with the BCAlignment stuff and there are many things >> to discuss: >> >> 1. what exactly do we want an BCAlignment to be ? 
>> A slim Datastructure for different Alignment algorithms >> Or a comfortable datastructure, which is perhaps not very useful for >> programs concentrating on performance Is this a rethorical question ;-) ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From a.griekspoor at nki.nl Thu Mar 10 15:07:55 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 21:07:55 +0100 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: References: Message-ID: Why NSMutableSet? NSSet contains the needed allObjects method already... Alex On 10-mrt-05, at 20:43, John Timmer wrote: > >>>> i'd like to make the suggestion to make BCSymbolSet a subclass of >>>> NSMutableSet, to get all set features for free. I for example need >>>> the >>>> allObjects: method for my BCScoringMatrix class. >>> I think we were going to make BCSymbolSet already. 
I think the root >>> class >>> was going to be non-mutable and hold standard sets, like all >>> nucleotides, >>> etc. >> >> Yes i noticed that, but it derives from NSObject, would be cleaner if >> it derives from NS(Mutable)Set. (just my opinion) > > Ah, but NSMutableSet's a class cluster, so you can't really subclass > it very > easily. > > JT > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From charles.parnot at stanford.edu Thu Mar 10 15:44:45 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 10 Mar 2005 12:44:45 -0800 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: References: Message-ID: At 14:43 -0500 3/10/05, John Timmer wrote: > >>> i'd like to make the suggestion to make BCSymbolSet a subclass of >>>> NSMutableSet, to get all set features for free. I for example need the >>>> allObjects: method for my BCScoringMatrix class. >>> I think we were going to make BCSymbolSet already. I think the root >>> class >>> was going to be non-mutable and hold standard sets, like all >>> nucleotides, >>> etc. >> >> Yes i noticed that, but it derives from NSObject, would be cleaner if >> it derives from NS(Mutable)Set. (just my opinion) > >Ah, but NSMutableSet's a class cluster, so you can't really subclass it very >easily. 
>
>JT

Sorry I did not step into the discussion earlier; I am quite busy, so just a quick note on that.

I think John's argument is good. We don't understand enough of Apple's class guts. Wrapping the NSSet inside a BCSymbolSet object adds a little bit of overhead, but in 99% of cases it should not matter. And when it does matter, one can always get the underlying NSSet to work with.

I have already modified BCSymbolSet to make it immutable (much safer if we are going to use it in BCSequence, where it is not going to be modified). We may have a mutable subclass, but this may not be necessary with a few clever factory methods. Philipp, tell me if you really need a mutable symbol set. We should probably add a mutable one only if really needed.

I also added a method called 'allSymbols' to get an array with the symbols. I have not committed any of that yet, because I have to modify the code for the prebuilt symbol sets to fit the new implementation. But you can assume 'allSymbols' exists and returns an NSArray. Add more methods to the BCSymbolSet header if you need more! Please do not put code in the implementation file if possible, as I am not sure how I would then merge those changes with my current uncommitted changes.
I will also try to add my 2 cents about the alignment implementation when I have some time tonight or tomorrow :-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From mek at mekentosj.com Thu Mar 10 15:46:32 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Thu, 10 Mar 2005 21:46:32 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> Message-ID: <8a914c5b87018ed76856ff63465c7456@mekentosj.com> On 10-mrt-05, at 2:32, Koen van der Drift wrote: > Hi, > > As I said before, I don't know much about coding alignments, and leave > that all up to you guys. I remember however, that Alex once mentioned > that he knows someone in his department who is very familiar with > alignment code. It could be an idea to see what he thinks about our > approach. Maybe he even wants to become a BioCocoa member :) > > - Koen. Unfortunately, Serge isn't an expert on alignments, he is (in my eyes at least) a guru in programming complicated structural analysis programs, in working with huge datasets, difficult mathematics and great expertise in optimizing code using altivec, multiprocessors etc. Especially the latter are things we could benefit from in the case of alignments... As I told some of you already, alignments have been the number one feature request for 4Peaks, and I have tried at least a dozen times to get it working. Two weeks ago, I finally managed to get it working and now it's a matter of implementing the GUI. And of course, now I tell people that I got it working they start asking already for contigs, aaarrrrgrrghghgghh. 
Anyway, in the past year I've read and learned quite a bit about alignments, so let me share my knowledge for those not familiar. Now, again I'm not an expert, so I'm fully confident that Philipp will correct me where I'm wrong and probably has some more things to add.

Basically there are two types of alignments, global and local.

It all started with the first, a global alignment, which aligns ALL symbols of sequence1 with sequence2. The algorithm generally used is the so-called Needleman-Wunsch algorithm, after the geniuses that came up with the idea. It is very simple: you create a 2D matrix like this:

//        W   O   R   D   1
//
//   W    0   0   0   0   0
//
//   O    0
//
//   R    0
//
//   D    0
//
//   2    0

You start by filling the outer edges of the matrix with score 0; this is the setup phase.

Next, you start from the upper left with the so-called fill phase, in which you calculate the score for
1) a match
2) a shift in word1
3) a shift in word2
From these 3 you pick the highest score, which also gives you a direction (diagonal, left, up respectively). This then allows you to calculate the score for the next position from the upper left corner, and like this you fill up the complete matrix.

Then comes the so-called reverse phase (usually called the traceback), in which, starting from the lower right corner, you trace back via the directions set in the fill phase, which gives you the alignment.

Soon after, Smith and Waterman came up with a modification of this algorithm, the so-called local alignment. A few simple changes (like a score may never drop below 0) remove the need to align all symbols; instead, only the part that is most similar gets aligned.

Now one of the biggest problems is quite obvious: as the aligned "words" get bigger, the matrix, and thus both the memory and time requirements, increase quadratically, rapidly becoming a disaster for your machine if these get too big.
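The setup, fill and reverse (traceback) phases described above can be sketched in plain C roughly as follows. This is only an illustration and not the attached GlobalAlignment.m: the names nw_align, MAXLEN, MATCH, MISMATCH and GAP, and the scoring values (+1 for a match, -1 for a mismatch, -1 per gap), are assumptions made for this example. One deliberate difference from the description above: the outer edges are filled with cumulative gap penalties instead of 0, which is the usual textbook setup when gaps at the ends should be penalised too.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN     64     /* assumption: short sequences only */
#define MATCH       1
#define MISMATCH  (-1)
#define GAP       (-1)

enum { DIAG, UP, LEFT };  /* direction each cell was reached from */

/* Globally align a and b. The gapped sequences are written to out_a and
   out_b (each must hold at least 2 * MAXLEN + 1 chars); the alignment
   score is returned. Uses static tables, so it is not reentrant. */
static int nw_align(const char *a, const char *b, char *out_a, char *out_b)
{
    static int score[MAXLEN + 1][MAXLEN + 1];
    static unsigned char dir[MAXLEN + 1][MAXLEN + 1];
    int n = (int)strlen(a), m = (int)strlen(b);
    int i, j, k = 0;

    assert(n <= MAXLEN && m <= MAXLEN);

    /* setup phase: outer edges hold cumulative gap penalties */
    score[0][0] = 0;
    for (i = 1; i <= n; i++) { score[i][0] = i * GAP; dir[i][0] = UP; }
    for (j = 1; j <= m; j++) { score[0][j] = j * GAP; dir[0][j] = LEFT; }

    /* fill phase: best of match/mismatch (diagonal) and the two shifts */
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            int d = score[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int u = score[i - 1][j] + GAP;   /* gap in b */
            int l = score[i][j - 1] + GAP;   /* gap in a */
            int best = d;
            unsigned char from = DIAG;
            if (u > best) { best = u; from = UP; }
            if (l > best) { best = l; from = LEFT; }
            score[i][j] = best;
            dir[i][j] = from;
        }
    }

    /* traceback from the lower right corner; output is built reversed */
    i = n; j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && dir[i][j] == DIAG) {
            out_a[k] = a[--i]; out_b[k] = b[--j];
        } else if (i > 0 && (j == 0 || dir[i][j] == UP)) {
            out_a[k] = a[--i]; out_b[k] = '-';
        } else {
            out_a[k] = '-';    out_b[k] = b[--j];
        }
        k++;
    }
    out_a[k] = out_b[k] = '\0';

    /* reverse both strings in place */
    for (i = 0, j = k - 1; i < j; i++, j--) {
        char t = out_a[i]; out_a[i] = out_a[j]; out_a[j] = t;
        t = out_b[i]; out_b[i] = out_b[j]; out_b[j] = t;
    }
    return score[n][m];
}
```

For example, aligning "ACGT" with "AGT" this way gives "ACGT" over "A-GT" with a score of 2 (three matches, one gap). The two (MAXLEN + 1) x (MAXLEN + 1) tables are exactly the quadratic memory cost mentioned above.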
So a few clever heads came up with more modifications to work at least towards a sub-quadratic memory requirement, and this is where Philipp comes in ;-)

Now one more thing about matrices, to explain a bit more to John:
You can imagine that in the DNA world a (very simple) scoring scheme can be:
a match positive, e.g. +1
a mismatch negative, e.g. -1
A simple char comparison is all it takes to get the score.
But in the protein world there's more information, as the change from amino acid X to Y can be more or less important depending on whether they belong to the same chemical class or not. Based on analysis of mutations in many sequences, people have created substitution matrices with this point in mind (examples are PAM and BLOSUM). As these matrices have to be accessed for each score, for performance reasons they are usually of type int** (or char**, but that's the same idea).

Perhaps this gives a bit of an introduction to the terminology; again I rely on Philipp to make corrections wherever needed ;-)
I have attached a .m file illustrating the original Needleman-Wunsch algorithm described above.

Cheers,
Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 4669 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GlobalAlignment.m
Type: application/octet-stream
Size: 4269 bytes
Desc: not available
URL:
-------------- next part --------------
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

The requirements said: Windows 2000 or better. So I got a Macintosh.
********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 673 bytes Desc: not available URL: From a.griekspoor at nki.nl Thu Mar 10 15:50:45 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 21:50:45 +0100 Subject: [Biococoa-dev] WWDC2005 Message-ID: Hi everybody, For the students among us, a reminder to apply for the student scholarship before the 25th at: http://developer.apple.com/wwdc/students/index.html Tom and I did already ;-) Peter, a draft email about the meeting to Apple will arrive in your emailbox tomorrow evening.... Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From biococoa at bioworxx.com Thu Mar 10 15:53:11 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 21:53:11 +0100 Subject: Fwd: [Biococoa-dev] BCSymbolSet Message-ID: <5298232f1c3e9558df4a31be4e44e61f@bioworxx.com> > Why NSMutableSet? NSSet contains the needed allObjects method > already... > Alex Because we have a NSMutableSet inside the current BCSymbolSet class. There are also mutability methods, so we need a mutable class. Phil > > On 10-mrt-05, at 20:43, John Timmer wrote: > >> >>>>> i'd like to make the suggestion to make BCSymbolSet a subclass of >>>>> NSMutableSet, to get all set features for free. I for example need >>>>> the >>>>> allObjects: method for my BCScoringMatrix class. 
>>>> I think we were going to make BCSymbolSet already. I think the root >>>> class >>>> was going to be non-mutable and hold standard sets, like all >>>> nucleotides, >>>> etc. >>> >>> Yes i noticed that, but it derives from NSObject, would be cleaner if >>> it derives from NS(Mutable)Set. (just my opinion) From a.griekspoor at nki.nl Thu Mar 10 15:55:35 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 21:55:35 +0100 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: <5298232f1c3e9558df4a31be4e44e61f@bioworxx.com> References: <5298232f1c3e9558df4a31be4e44e61f@bioworxx.com> Message-ID: Oh, sorry, I missed parts of the symbolset discussion, but I thought I remember having read a strong advocacy (John ?) for a non-mutable symbolset... Never mind... Alex On 10-mrt-05, at 21:53, Philipp Seibel wrote: >> Why NSMutableSet? NSSet contains the needed allObjects method >> already... >> Alex > > Because we have a NSMutableSet inside the current BCSymbolSet class. > There are also mutability methods, so we need a mutable class. > > Phil > >> >> On 10-mrt-05, at 20:43, John Timmer wrote: >> >>> >>>>>> i'd like to make the suggestion to make BCSymbolSet a subclass of >>>>>> NSMutableSet, to get all set features for free. I for example >>>>>> need the >>>>>> allObjects: method for my BCScoringMatrix class. >>>>> I think we were going to make BCSymbolSet already. I think the >>>>> root >>>>> class >>>>> was going to be non-mutable and hold standard sets, like all >>>>> nucleotides, >>>>> etc. >>>> >>>> Yes i noticed that, but it derives from NSObject, would be cleaner >>>> if >>>> it derives from NS(Mutable)Set. 
(just my opinion) > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From biococoa at bioworxx.com Thu Mar 10 16:05:55 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 22:05:55 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <8a914c5b87018ed76856ff63465c7456@mekentosj.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> Message-ID: <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> Am 10.03.2005 um 21:46 schrieb Alexander Griekspoor: > On 10-mrt-05, at 2:32, Koen van der Drift wrote: > >> Hi, >> >> As I said before, I don't know much about coding alignments, and >> leave that all up to you guys. I remember however, that Alex once >> mentioned that he knows someone in his department who is very >> familiar with alignment code. It could be an idea to see what he >> thinks about our approach. Maybe he even wants to become a BioCocoa >> member :) >> >> - Koen. 
> > Unfortunately, Serge isn't an expert on alignments, he is (in my eyes > at least) a guru in programming complicated structural analysis > programs, in working with huge datasets, difficult mathematics and > great expertise in optimizing code using altivec, multiprocessors etc. > Especially the latter are things we could benefit from in the case of > alignments... > As I told some of you already, alignments have been the number one > feature request for 4Peaks, and I have tried at least a dozen times to > get it working. Two weeks ago, I finally managed to get it working and > now it's a matter of implementing the GUI. And of course, now I tell > people that I got it working they start asking already for contigs, > aaarrrrgrrghghgghh. > Anyway, in the past year I've read and learned quite a bit of > alignments, so let me share my knowledge for those not familiar. Now, > again I'm not an expert so I'm fully confident that Philipp will > correct me where I'm wrong and probably has some more things to add. > Basically there are two types of alignments, global and local. > It all started with the first, a global alignment, which aligns ALL > symbols of sequence1 with sequence2. This algorithm generally used is > the so-called Needleman-Wunsch algorithm after the geniuses that came > up with the idea. So very simple, what you do is create a 2d matrix > like this: > // W O R D 1 > // > // W 0 0 0 0 0 > // > // O 0 > // > // R 0 > // > // D 0 > // > // 2 0 > > You start by filling the outer edges of the matrix with score 0, this > is the setup phase. 
> Next, you start from the upper left with the so-called fill phase, in which you calculate the score for 1) a match 2) a shift in word1 3) a shift in word2
> From these 3 you pick the highest score leading to a direction (diagonal, left, up respectively)
> This then allows you to calculate the score for the next position from the upper left corner, and like this you fill up the complete matrix
> Then comes the so-called reverse-phase in which starting from the lower right corner you trace back via the directions set in the fill phase, which gives you the alignment.
>

This part is also known as dynamic programming (for those who want to google for more information).

> Soon after, Smith and Waterman came up with a modification of this algorithm, the so-called local alignment. A few simple changes (like a score must always be larger than 0) removes the need to align all symbols, but instead the part that is most similar.
>

This is exactly the same approach except for one thing: you look for the subpath with the highest score.

> Now one of the biggest problems is quite obvious, as the aligned "words" get bigger the matrix and thus both the memory and time requirements increase quadratically. Rapidly becoming a disaster for your machine if these get too big. So a few clever heads came up with more modifications to work at least towards a sub-quadratic memory requirement, and this is where Philipp comes in ;-)
>

OK, here I am. What should I say, you're again completely right ;-). Getting sub-quadratic is, as far as I know, only possible with heuristics, but in that case you lose accuracy. I think we should first implement the basic algorithms.

> Now one thing more about matrices to explain John a bit more:
> You can imagine that in the DNA world a (very simple) scoring scheme can be:
> a match positive, e.g. +1
> a mismatch negative, e.g. -1
> A simple char comparison is all it takes to get the score.
> But in the protein world there's more info as the change from > aminoacid X to Y can be less or more important based on if they belong > to the same chemical class or not. Based on analysis of mutations in > many sequences, people have created substitution matrices with this > point in mind (examples are PAM and BLOSUM). As for each score these > matrices have to be accessed, for performance reasons they are usually > of type int** (or char** but that's the same). > I think we should use a int* instead of int** because its faster. Take a look at my BCScoringMatrix. > Perhaps this gives a bit of introduction in the terminology, again I > rely on Philipp to make corrections wherever needed ;-) > I have attached an .m file illustrating the original Needleman-wunsch > algorithm described above.. > Cheers, > Alex > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 5370 bytes Desc: not available URL: From jtimmer at bellatlantic.net Thu Mar 10 16:07:30 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 10 Mar 2005 16:07:30 -0500 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: Message-ID: > Oh, sorry, I missed parts of the symbolset discussion, but I thought I > remember having read a strong advocacy (John ?) for a non-mutable > symbolset... > Never mind... Well, we just want the pre-defined ones non-mutable. We can't allow a user to accidentally delete cysteine from the set of all amino acids.... 
JT _______________________________________________ This mind intentionally left blank From a.griekspoor at nki.nl Thu Mar 10 16:20:47 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 22:20:47 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> Message-ID: <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl> On 10-mrt-05, at 22:05, Philipp Seibel wrote: >> Now one thing more about matrices to explain John a bit more: >> You can imagine that in the DNA world a (very simple) scoring scheme >> can be: >> a match positive, e.g. +1 >> a mismatch negative, e.g. -1 >> A simple char comparison is all it takes to get the score. >> But in the protein world there's more info as the change from >> aminoacid X to Y can be less or more important based on if they >> belong to the same chemical class or not. Based on analysis of >> mutations in many sequences, people have created substitution >> matrices with this point in mind (examples are PAM and BLOSUM). As >> for each score these matrices have to be accessed, for performance >> reasons they are usually of type int** (or char** but that's the >> same). >> > I think we should use a int* instead of int** because its faster. Take > a look at my BCScoringMatrix. You're the expert! 
;-)
I came across this example code which I thought was quite elegant:
Generation of a (DNA) scoring matrix:

    match = 1;
    mismh = -1;
    /* set match and mismatch weights */
    for ( i = 0; i < 128 ; i++ )
        for ( j = 0; j < 128 ; j++ )
            if (i == j ) v[i][j] = match;
            else v[i][j] = mismh;

    v['N']['N'] = mismh;
    v['n']['n'] = mismh;
    v['A']['a'] = v['a']['A'] = match;
    v['C']['c'] = v['c']['C'] = match;
    v['G']['g'] = v['g']['G'] = match;
    v['T']['t'] = v['t']['T'] = match;

So, you simply build a 128x128 char matrix using the fact that chars are ints.
Next, to calculate the score:

    char a = A[++i];   // character i in sequence A
    char b = B[++j];   // character j in sequence B
    // insert a | in the case of a match and a space in the case of a mismatch
    *c++ = (a == b || (isdna && v[a][b] == MATCHSC)) ? '|' : ' ';

Again, my experience is pretty limited, so I believe you immediately that using a simple int array is faster than a matrix, and certainly much simpler!!

Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com
EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: text/enriched Size: 4116 bytes Desc: not available URL: From biococoa at bioworxx.com Thu Mar 10 16:23:30 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 22:23:30 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <8a914c5b87018ed76856ff63465c7456@mekentosj.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> Message-ID: <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> The piece of code alex just sent arround is great to understand what a global alignment is. Alex uses two types of scores, a match and a mismatch score, this is the simplest way to align sequences. It produces quite good alignments for dna sequences, but in the most common cases, you need the scoring matrix to score different kind of matches and mismatches. Another point is to differentiate between gap-open and gap-extension costs, but this is more relevant at local alignments. Ok i stop here, because nobody can follow me anymore ..... ;-) Phil From biococoa at bioworxx.com Thu Mar 10 16:33:33 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 22:33:33 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl> Message-ID: Am 10.03.2005 um 22:20 schrieb Alexander Griekspoor: > On 10-mrt-05, at 22:05, Philipp Seibel wrote: > >>> Now one thing more about matrices to explain John a bit more: >>> You can imagine that in the DNA world a (very simple) scoring scheme >>> can be: >>> a match positive, e.g. +1 >>> a mismatch negative, e.g. -1 >>> A simple char comparison is all it takes to get the score. 
>>> But in the protein world there's more info as the change from
>>> aminoacid X to Y can be less or more important based on if they
>>> belong to the same chemical class or not. Based on analysis of
>>> mutations in many sequences, people have created substitution
>>> matrices with this point in mind (examples are PAM and BLOSUM). As
>>> for each score these matrices have to be accessed, for performance
>>> reasons they are usually of type int** (or char** but that's the
>>> same).
>>>
>> I think we should use a int* instead of int** because its faster.
>> Take a look at my BCScoringMatrix.
>
> You're the expert! ;-)
> I came across this example code which I thought was quite elegant:
> Generation of a (DNA) scoring matrix:
>
>     match = 1;
>     mismh = -1;
>     /* set match and mismatch weights */
>     for ( i = 0; i < 128 ; i++ )
>         for ( j = 0; j < 128 ; j++ )
>             if (i == j ) v[i][j] = match;
>             else v[i][j] = mismh;
>
>     v['N']['N'] = mismh;
>     v['n']['n'] = mismh;
>     v['A']['a'] = v['a']['A'] = match;
>     v['C']['c'] = v['c']['C'] = match;
>     v['G']['g'] = v['g']['G'] = match;
>     v['T']['t'] = v['t']['T'] = match;
>
> So, you simply build a 128x128 char matrix using the fact that chars
> are ints.
> Next, to calculate the score:
>
>     char a = A[++i];   // character i in sequence A
>     char b = B[++j];   // character j in sequence B
>     // insert a | in the case of a match and a space in the case of a mismatch
>     *c++ = (a == b || (isdna && v[a][b] == MATCHSC)) ? '|' : ' ';
>

I think it's quite a good approach, but we have to decide whether we want to "ask" the matrix with two BCSymbols or just with chars. Take a look at my recent implementation of the scoring matrix. It's perhaps slower than this one, but more comfortable. I think we just have to test the performance once we've done the first algorithm.

> Again, my experience is pretty limited, so I believe you immediately
> that using a simple int array is faster than a matrix, and certainly
> much simpler!!
My experience is limited to several java alignment implementations, so i've never done this with a good programming language ;-) Phil > Cheers, > Alex -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 4026 bytes Desc: not available URL: From charles.parnot at stanford.edu Thu Mar 10 16:37:57 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 10 Mar 2005 13:37:57 -0800 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: References: <5298232f1c3e9558df4a31be4e44e61f@bioworxx.com> Message-ID: At 21:55 +0100 3/10/05, Alexander Griekspoor wrote: >Oh, sorry, I missed parts of the symbolset discussion, but I thought I remember having read a strong advocacy (John ?) for a non-mutable symbolset... >Never mind... >Alex > sorry, very short: * the user should not be able to modify a prebuilt symbol set [John (and me agree)] * One should not be able to change the symbolSet of a sequence, that would be disastrous [me] * all the arguments that usually apply in the mutable vs immutable design, with the added fact that symbol sets are small and thus easy to copy [me; also applies to sequences] charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From a.griekspoor at nki.nl Thu Mar 10 16:38:21 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 22:38:21 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl> Message-ID: > [code snippet ] > I think it's a quite good approach, but we have to decide wheter we 
> want to "ask" the matrix with two BCSymbols or just with chars. Take a > look at my recent implementation of the scoring matrix. It's perhaps > slower than this one, but more comfortable. I think we just have to > test the performance, when we've done the first algorithm. I'm a big advocate of using our symbols and sequences natively wherever possible, so I'm absolutely in favor of your implementation! > >> Again, my experience is pretty limited, so I believe you immediately >> that using a simple int array is faster than a matrix, and certainly >> much simpler!! > > My experience is limited to several java alignment implementations, so > i've never done this with a good programming language ;-) Oooohhh, you'd better hope no one from that camp is listening ;-) LOL I hope you don't see my remarks as obstructions to your plans, I'm very much looking forward to your implementations! Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From a.griekspoor at nki.nl Thu Mar 10 16:41:28 2005 From: a.griekspoor at nki.nl
(Alexander Griekspoor) Date: Thu, 10 Mar 2005 22:41:28 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> Message-ID: <483aa860d38603ca7b91f4edb525da23@nki.nl> I can ;-) Actually for 4Peaks I use a global alignment which 1) doesn't penaltize beginning and end gaps (almost making it local), 2) doesn't penalitize extra above a certain gap length (meaning that introns in your sequences don't end up with huge penalties).... A nice example where you pick a certain alignment algorithm for a certain situation simply because it's the best fit. In general one algorithm isn't better per se than the other, often they're just suited for different situations... Alex On 10-mrt-05, at 22:23, Philipp Seibel wrote: > The piece of code alex just sent arround is great to understand what a > global alignment is. Alex uses two types of scores, a match and a > mismatch score, this is the simplest way to align sequences. It > produces quite good alignments for dna sequences, but in the most > common cases, you need the scoring matrix to score different kind of > matches and mismatches. Another point is to differentiate between > gap-open and gap-extension costs, but this is more relevant at local > alignments. > Ok i stop here, because nobody can follow me anymore ..... 
;-) > > Phil > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From a.griekspoor at nki.nl Thu Mar 10 16:44:26 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 22:44:26 +0100 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: References: <5298232f1c3e9558df4a31be4e44e61f@bioworxx.com> Message-ID: On 10-mrt-05, at 22:37, Charles PARNOT wrote: > At 21:55 +0100 3/10/05, Alexander Griekspoor wrote: >> Oh, sorry, I missed parts of the symbolset discussion, but I thought >> I remember having read a strong advocacy (John ?) for a non-mutable >> symbolset... >> Never mind... >> Alex >> > > sorry, very short: > > * the user should not be able to modify a prebuilt symbol set [John > (and me agree)] Yep, same here. > * One should not be able to change the symbolSet of a sequence, that > would be disastrous [me] Me too, but perhaps you CAN add symbols (like ambiguity to a perfect ATCG sequence), but certainly not remove one. > * all the arguments that usually apply in the mutable vs immutable > design, with the added fact that symbol sets are small and thus easy > to copy [me; also applies to sequences] Yes, I think you have a point. 
Say you want to merge two sets, it's just as easy (or maybe easier) to get the merge than to add the symbols of one to the other... This is exactly why I asked about the mutability, so it should have been: (Charles?) ;-) Alex > > charles > > -- > Help science go fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ > > Charles Parnot > charles.parnot at stanford.edu > > Room B157 in Beckman Center > 279, Campus Drive > Stanford University > Stanford, CA 94305 (USA) > > Tel +1 650 725 7754 > Fax +1 650 725 8021 > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From biococoa at bioworxx.com Thu Mar 10 16:50:13 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 10 Mar 2005 22:50:13 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <483aa860d38603ca7b91f4edb525da23@nki.nl> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl> Message-ID: > I can ;-) > Actually for 4Peaks I use a global alignment which 1) doesn't > penaltize beginning and end gaps (almost making it local), 2) doesn't > penalitize extra above a certain gap 
length (meaning that introns in > your sequences don't end up with huge penalties).... A nice example > where you pick a certain alignment algorithm for a certain situation > simply because it's the best fit. In general one algorithm isn't > better per se than the other, often they're just suited for different > situations... > Alex Seems to grow up into an alignment battle between me and you LOL ;-). I want to start coding with the BCNeedlemanWunsch, but i'm not sure whether i should put it into my BCAlignments folder or in the BCTools folder. I would prefer the BCAlignment folder. What do you think Phil > > On 10-mrt-05, at 22:23, Philipp Seibel wrote: > >> The piece of code alex just sent arround is great to understand what >> a global alignment is. Alex uses two types of scores, a match and a >> mismatch score, this is the simplest way to align sequences. It >> produces quite good alignments for dna sequences, but in the most >> common cases, you need the scoring matrix to score different kind of >> matches and mismatches. Another point is to differentiate between >> gap-open and gap-extension costs, but this is more relevant at local >> alignments. >> Ok i stop here, because nobody can follow me anymore ..... 
;-) >> >> Phil >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev From a.griekspoor at nki.nl Thu Mar 10 16:54:56 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 10 Mar 2005 22:54:56 +0100 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl> Message-ID: <8dbe791778ae3d09352ccb37e7b0057d@nki.nl> >> A nice example where you pick a certain alignment algorithm for a >> certain situation simply because it's the best fit. In general one >> algorithm isn't better per se than the other, often they're just >> suited for different situations... >> Alex > > Seems to grow up into an alignment battle between me and you LOL ;-). No, please no, that was exactly what I didn't want to do.... sorry for that. The last sentence was purely informative for the others, without any offensive ideas towards your plans. > > I want to start coding with the BCNeedlemanWunsch, but i'm not sure > whether i should put it into my BCAlignments folder or in the BCTools > folder. I would prefer the BCAlignment folder. What do you think I think you're right, it would be nice to keep everything alignment in the BCAlignment folder. In fact I'm not even sure if we even need a tool by smart use of class methods, but time will tell... Alex >> >> On 10-mrt-05, at 22:23, Philipp Seibel wrote: >> >>> The piece of code alex just sent arround is great to understand what >>> a global alignment is. Alex uses two types of scores, a match and a >>> mismatch score, this is the simplest way to align sequences. 
It >>> produces quite good alignments for dna sequences, but in the most >>> common cases, you need the scoring matrix to score different kind of >>> matches and mismatches. Another point is to differentiate between >>> gap-open and gap-extension costs, but this is more relevant at local >>> alignments. >>> Ok i stop here, because nobody can follow me anymore ..... ;-) >>> >>> Phil >>> >>> _______________________________________________ >>> Biococoa-dev mailing list >>> Biococoa-dev at bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/biococoa-dev > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. 
********************************************************* From kvddrift at earthlink.net Thu Mar 10 19:53:03 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 10 Mar 2005 19:53:03 -0500 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: <8dbe791778ae3d09352ccb37e7b0057d@nki.nl> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl> <8dbe791778ae3d09352ccb37e7b0057d@nki.nl> Message-ID: <128c8d41b084dfbacdaa4fc9d9ce2882@earthlink.net> On Mar 10, 2005, at 4:54 PM, Alexander Griekspoor wrote: >> >> Seems to grow up into an alignment battle between me and you LOL ;-). > No, please no, that was exactly what I didn't want to do.... sorry for > that. The last sentence was purely informative for the others, without > any offensive ideas towards your plans. I'm still reading, but I get the picture now - great expansion! >> >> I want to start coding with the BCNeedlemanWunsch, but i'm not sure >> whether i should put it into my BCAlignments folder or in the BCTools >> folder. I would prefer the BCAlignment folder. What do you think > I think you're right, it would be nice to keep everything alignment in > the BCAlignment folder. In fact I'm not even sure if we even need a > tool by smart use of class methods, but time will tell... I am thinking how this will be used. The end user probably wants to try out one type of alignment, see the result, then try another one, compare the results, etc. So if we make a BCNeedlemanWunsch, and then a BCSmithWaterman where is the actual matrix that is used to calculate. I think it is a good idea if we have just one matrix, that is used as a basis for each different calculation. It would be a waste if for every calculation the starting matrix has to be re-calculated. Or maybe that's where BCMatrix comes in place? cheers, - Koen. 
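[Editorial sketch, not part of the original thread.] Koen's point about building the matrix once and reusing it for each kind of alignment can be illustrated in plain C. The following is only a minimal sketch under stated assumptions: the names ScoreTable, table_init and align_score are hypothetical and are not BioCocoa API, the table is the simple match/mismatch one from Alex's example, and gaps use a single linear penalty rather than separate gap-open and gap-extension costs. The same table is passed to a global (Needleman-Wunsch style) and a local (Smith-Waterman style) scoring pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALPHABET 128  /* 7-bit ASCII, as in the 128x128 example */

/* Hypothetical shared substitution table: built once, reused by every algorithm. */
typedef struct {
    int score[ALPHABET][ALPHABET];
} ScoreTable;

/* Fill the table with a plain match/mismatch scheme. */
void table_init(ScoreTable *t, int match, int mismatch) {
    for (int i = 0; i < ALPHABET; i++)
        for (int j = 0; j < ALPHABET; j++)
            t->score[i][j] = (i == j) ? match : mismatch;
}

static int max2(int a, int b) { return a > b ? a : b; }

/* Map a char to a table index; only 7-bit ASCII is accepted. */
static int table_index(char c) {
    unsigned char u = (unsigned char)c;
    assert(u < ALPHABET);
    return u;
}

/* Score-only alignment with a linear gap penalty (gap is negative).
   local == 0: global score (Needleman-Wunsch recurrence).
   local != 0: best local score (Smith-Waterman recurrence, floored at 0). */
int align_score(const ScoreTable *t, const char *a, const char *b,
                int gap, int local) {
    size_t n = strlen(a), m = strlen(b);
    int *prev = malloc((m + 1) * sizeof *prev);
    int *curr = malloc((m + 1) * sizeof *curr);
    assert(prev != NULL && curr != NULL);

    int best = 0;
    for (size_t j = 0; j <= m; j++)
        prev[j] = local ? 0 : (int)j * gap;

    for (size_t i = 1; i <= n; i++) {
        curr[0] = local ? 0 : (int)i * gap;
        for (size_t j = 1; j <= m; j++) {
            int s = t->score[table_index(a[i - 1])][table_index(b[j - 1])];
            int v = max2(prev[j - 1] + s,              /* (mis)match   */
                         max2(prev[j] + gap,           /* gap in b     */
                              curr[j - 1] + gap));     /* gap in a     */
            if (local && v < 0) v = 0;                 /* restart locally */
            if (v > best) best = v;
            curr[j] = v;
        }
        int *tmp = prev; prev = curr; curr = tmp;      /* roll the rows */
    }

    int result = local ? best : prev[m];
    free(prev);
    free(curr);
    return result;
}
```

With one table initialised by table_init(&t, 1, -1), calling align_score with local set to 0 and then to 1 on the same pair of strings compares the two algorithms without rebuilding anything, which is the workflow described above.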
From kvddrift at earthlink.net Thu Mar 10 20:29:28 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 10 Mar 2005 20:29:28 -0500 Subject: [Biococoa-dev] adding new files Message-ID: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> Hi, Just a general remark. When adding new files to the framework, make sure you do the following: 1. Add the files in the right location in Xcode, so they match the structure on your HD 2. Add the files to BCFoundation.h or BCAppKit.h 3. Click on the BioCocoa target and make sure all files are labled 'public' (except for BCCocoa_Prefix.h) 4. Files in BCFoundation only need to #import , not 5. Change _MyCompanyName_ to The BioCocoa Project BTW it would be nice if we can do this automatically, including the addition of the LGPL licence which I think should be at the top of each file. I know we can make our own templates, but they should be stored outside the project, somewhere in Application Support. Anyone knows if this can be set on a per project basis? 6. Commit the files to cvs, and don't forget to commit the changes to BCFoundation.h and project.pbxproj cheers, - Koen. From kvddrift at earthlink.net Thu Mar 10 21:13:34 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 10 Mar 2005 21:13:34 -0500 Subject: [Biococoa-dev] BCSymbolSet In-Reply-To: References: <5298232f1c3e9558df4a31be4e44e61f@bioworxx.com> Message-ID: <00375b6746c3fcf41f1ca51ee9aaee0d@earthlink.net> On Mar 10, 2005, at 4:44 PM, Alexander Griekspoor wrote: > On 10-mrt-05, at 22:37, Charles PARNOT wrote: > >> At 21:55 +0100 3/10/05, Alexander Griekspoor wrote: >>> Oh, sorry, I missed parts of the symbolset discussion, but I thought >>> I remember having read a strong advocacy (John ?) for a non-mutable >>> symbolset... >>> Never mind... >>> Alex >>> >> >> sorry, very short: >> >> * the user should not be able to modify a prebuilt symbol set [John >> (and me agree)] > Yep, same here. Me too. 
> >> * One should not be able to change the symbolSet of a sequence, that >> would be disastrous [me] > Me too, but perhaps you CAN add symbols (like ambiguity to a perfect > ATCG sequence), but certainly not remove one. Me three for the first part. However if the user wants to change the symbolset, maybe to include ambiguity, I think that she should create a new sequence, not change the symbolset. One reason could be, suppose the user wants to undo the change, that will be impossible if you can only add, but not remove symbols from a set. >> * all the arguments that usually apply in the mutable vs immutable >> design, with the added fact that symbol sets are small and thus easy >> to copy [me; also applies to sequences] > Yes, I think you have a point. Say you want to merge two sets, it's > just as easy (or maybe easier) to get the merge than to add the > symbols of one to the other... Me agrees. - Koen. From charles.parnot at stanford.edu Fri Mar 11 00:34:42 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 10 Mar 2005 21:34:42 -0800 Subject: [Biococoa-dev] adding new files In-Reply-To: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> Message-ID: Thanks, Koen, it is a very nice list. I did not realize all of this was indeed necessary. How about adding an entry in the dev-docs file to make this easier to retrieve and refer to? (BTW, guys, it is still in my agenda to expand those docs too, and add a word about the BCSequence 'placeholder' class and the overall structure of the sequence classes; and something about unit testings; be patient!). We could add this one too: In public headers, check that you do not use relative paths in your #import statements. Example: use #import "BCSequence.h" and not #import "../BCSequence/BCSequence.h" if called from a file in another folder. The compiler will find the header even without the correct path. 
charles At 8:29 PM -0500 3/10/05, Koen van der Drift wrote: >Hi, > >Just a general remark. When adding new files to the framework, make sure you do the following: > >1. Add the files in the right location in Xcode, so they match the structure on your HD > >2. Add the files to BCFoundation.h or BCAppKit.h > >3. Click on the BioCocoa target and make sure all files are labled 'public' (except for BCCocoa_Prefix.h) > >4. Files in BCFoundation only need to #import , not > >5. Change _MyCompanyName_ to The BioCocoa Project > >BTW it would be nice if we can do this automatically, including the addition of the LGPL licence which I think should be at the top of each file. I know we can make our own templates, but they should be stored outside the project, somewhere in Application Support. Anyone knows if this can be set on a per project basis? > >6. Commit the files to cvs, and don't forget to commit the changes to BCFoundation.h and project.pbxproj > > >cheers, > >- Koen. > >_______________________________________________ >Biococoa-dev mailing list >Biococoa-dev at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/biococoa-dev -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Fri Mar 11 01:21:18 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 10 Mar 2005 22:21:18 -0800 Subject: [Biococoa-dev] starting BCAlignment In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl> Message-ID: >I think it's a quite good approach, but we have to decide wheter we want to "ask" the matrix with two BCSymbols or 
just with chars. Take a look at my recent implementation of the scoring matrix. It's perhaps slower than this one, but more comfortable. I think we just have to test the performance, when we've done the first algorithm.

I was thinking about the alignment implementation while driving to the day care (!), and after looking at your code, I am so delighted to see that I had something very similar in mind. What you are doing is mapping symbols to int in your scoring matrix. My thought about it was the same, but using... symbol sets, of course. Which you probably had in mind, in fact, given the question you asked about 'allObjects'.

The current implementation is still very much OO, which is good. Of course, as a result, it might be slow, with the overhead from substituteSymbol:forSymbol, which scans the NSArray, and from accessing the symbols through the sequence objects, but Shark will tell. Then, if we need to optimize, there is an obvious(?) path, and here is how we could use symbol sets:

* The sequences you need to align define a SymbolSet, probably the union of the symbol sets of the sequences
* That instance of the BCSymbolSet class might then be able to provide a perfect and reproducible bijection between that set of symbols and int values
  --> e.g. '(int)equivalentIntValueForSymbol:(BCSymbol *)aSymbol'
  And ONLY the BCSymbolSet class can decide on that bijection. One way could be to simply sort the symbols alphabetically.
  So again, one symbol in one SymbolSet = one int (very similar to what you did in BCScoreMatrix)
* That bijection between symbols and int can be used to:
  - translate sequences into int array
  - translate the dictionary in the score matrix into int**
* Then the alignment algorithm manipulates only int, and is completely sequence-agnostic
* After alignment, everything is translated back to symbols to generate a BCAlignment object

Nothing really original:

    objects ------> C ------> algorithm -------> C -------> Objects

The first and last arrows are the 'translation' steps. To avoid problems, that translation should be all in one place, which means all in one class. For instance, BCSymbolSet (and not BCScoreMatrix). And then, BCSymbolSet becomes really important in the framework. In the end, also, a user could create exotic symbols, exotic sequences, and exotic score matrices, and still use the same algorithms.

A final comment about the score matrix in that design: because BCSymbolSet is in charge of the int<-->BCSymbol translation, the score matrix has to be defined as a dictionary, like John suggested. Such a dictionary could use the symbols as the key, for instance to get the score of substitution of symbolA for symbolB:

    NSNumber *score = [[scoreDictionary objectForKey:symbolA] objectForKey:symbolB];

(keys being copied by the dictionary, BCSymbol has to be immutable with that design... or we could use the string representation)
That makes matrices difficult to define programmatically, but easier through a plist. OK, maybe something else, but at least something fully OO and very readable.

Final question: how do gaps fit in the matrix score thing? Is there a score for a gap/symbol? Maybe gaps should be excluded from the symbol <--> int conversion? They would be a special case, with some special scoring schemes, like gap-open, gap-extension,...?
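
[Editorial sketch, not part of the original message.] The symbol <--> int translation described above can be illustrated in plain C. This is only a sketch under assumptions: symbols are stood in for by single ASCII characters instead of BCSymbol objects, the bijection is obtained by sorting the set alphabetically as suggested, and SymbolCoding, coding_init, coding_encode and flat_score are hypothetical names, not BioCocoa API. The score matrix is a flat int* indexed by the two codes, in line with the int* versus int** remark earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMBOLS 128  /* 7-bit ASCII stand-in for a symbol set */

/* Hypothetical coding table owned by "the symbol set": symbol <-> int. */
typedef struct {
    int  index_of[MAX_SYMBOLS];   /* symbol -> int, -1 if not in the set */
    char symbol_at[MAX_SYMBOLS];  /* int -> symbol (inverse mapping)     */
    int  count;                   /* number of distinct symbols          */
} SymbolCoding;

static int compare_chars(const void *a, const void *b) {
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Build a reproducible bijection by sorting the symbols alphabetically. */
void coding_init(SymbolCoding *c, const char *symbols) {
    char sorted[MAX_SYMBOLS];
    size_t n = strlen(symbols);
    assert(n < MAX_SYMBOLS);
    memcpy(sorted, symbols, n);
    qsort(sorted, n, sizeof sorted[0], compare_chars);

    for (int i = 0; i < MAX_SYMBOLS; i++) c->index_of[i] = -1;
    c->count = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)sorted[i];
        assert(ch < MAX_SYMBOLS);
        if (c->index_of[ch] < 0) {            /* ignore duplicates */
            c->index_of[ch] = c->count;
            c->symbol_at[c->count] = (char)ch;
            c->count++;
        }
    }
}

/* Translate a sequence into an int array (out must hold strlen(seq) ints).
   Returns 0 on success, -1 if a symbol is not in the set. */
int coding_encode(const SymbolCoding *c, const char *seq, int *out) {
    for (size_t i = 0; seq[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)seq[i];
        if (ch >= MAX_SYMBOLS || c->index_of[ch] < 0) return -1;
        out[i] = c->index_of[ch];
    }
    return 0;
}

/* Look up a score in a flat count x count matrix using the int codes. */
int flat_score(const int *matrix, const SymbolCoding *c, int a, int b) {
    return matrix[a * c->count + b];
}
```

An alignment routine written against the int arrays and flat_score never sees a symbol, and symbol_at gives the way back once the alignment is done. Gaps are deliberately left out of the coding here, matching the open question above about treating them as a special case.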
Well, these were my 2 cents ;-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Fri Mar 11 01:56:07 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Thu, 10 Mar 2005 22:56:07 -0800 Subject: [Biococoa-dev] adding new files In-Reply-To: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> Message-ID: Oups, just thought of another one (does not apply to anybody so far): between step 5 and step 6, make sure the framework compiles without error before commit ;-) charles At 8:29 PM -0500 3/10/05, Koen van der Drift wrote: >Hi, > >Just a general remark. When adding new files to the framework, make sure you do the following: > >1. Add the files in the right location in Xcode, so they match the structure on your HD > >2. Add the files to BCFoundation.h or BCAppKit.h > >3. Click on the BioCocoa target and make sure all files are labled 'public' (except for BCCocoa_Prefix.h) > >4. Files in BCFoundation only need to #import , not > >5. Change _MyCompanyName_ to The BioCocoa Project > >BTW it would be nice if we can do this automatically, including the addition of the LGPL licence which I think should be at the top of each file. I know we can make our own templates, but they should be stored outside the project, somewhere in Application Support. Anyone knows if this can be set on a per project basis? > >6. Commit the files to cvs, and don't forget to commit the changes to BCFoundation.h and project.pbxproj > > >cheers, > >- Koen. 
>
>_______________________________________________
>Biococoa-dev mailing list
>Biococoa-dev at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/biococoa-dev

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Fri Mar 11 02:18:16 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 10 Mar 2005 23:18:16 -0800
Subject: [Biococoa-dev] BCSymbolSet done
Message-ID:

I committed an updated version of BCSymbolSet, which is now immutable, can return an array of symbols, and has other goodies.
Here is the header below, so you don't necessarily have to update your project right now.

Comments and questions welcome!

charles

NB: next on my agenda is BCAbstractSequence et al... Also, my full agenda is in the TODO file in the project, if you are curious :-)

//
//  BCSymbolSet.h
//  BioCocoa
//
//  Created by Alexander Griekspoor on Fri Sep 10 2004.
//  Copyright (c) 2004 The BioCocoa Project. All rights reserved.
//

#import
#import "BCFoundationDefines.h"

@class BCSymbol;

/*!
@class      BCSymbolSet
@abstract   A collection of BCSymbols of the same type
@discussion BCSymbolSet objects provide a lot of flexibility for the user
* of the framework when creating and manipulating sequence objects.
* They can be thought of as filters, by restricting a sequence object
* to a certain set of symbols. For instance, a DNA sequence could be
* created that will only accept the non-ambiguous bases A, T, G, C
* but not the compound symbols like R and Y.
*
* BCSymbolSet objects are immutable. The class object provides a number
* of factory methods for prebuilt symbol sets such as
* '+ (BCSymbolSet *)dnaSymbolSet' and '+ (BCSymbolSet *)dnaStrictSymbolSet'.
* It is recommended to use these methods when creating a symbol set
* supported by the class. For other cases, a new BCSymbolSet can easily
* be created from an array of symbols, or by combining existing symbol sets.
* Because BCSymbolSet objects are immutable, new objects have to be created to
* modify existing symbol sets.
*/

@interface BCSymbolSet : NSObject {
    NSSet *symbolSet;
    BCSequenceType sequenceType;
}

////////////////////////////////////////////////////////////////////////////
//  OBJECT METHODS START HERE
//
#pragma mark ?
#pragma mark ?INITIALIZATION METHODS
//
//  INITIALIZATION
////////////////////////////////////////////////////////////////////////////

//designated initializer
- (id)initWithArray:(NSArray *)symbols sequenceType:(BCSequenceType)type;

//decide the sequence type based on the first symbol in the passed array
- (id)initWithArray:(NSArray *)symbols;

//initializes the symbol sets using a string, by scanning the characters
//and generating symbols of the right sequence type
// e.g. A --> Adenosine if sequence type is DNA, Alanine if protein
- (id)initWithString:(NSString *)stringOfCharacters sequenceType:(BCSequenceType)type;

//factory methods - return an autoreleased object
+ (BCSymbolSet *)symbolSetWithArray:(NSArray *)symbols;
+ (BCSymbolSet *)symbolSetWithArray:(NSArray *)symbols sequenceType:(BCSequenceType)type;
+ (BCSymbolSet *)symbolSetWithString:(NSString *)aString sequenceType:(BCSequenceType)type;

//pre-built symbol sets
+ (BCSymbolSet *)dnaSymbolSet;
+ (BCSymbolSet *)dnaStrictSymbolSet;
+ (BCSymbolSet *)rnaSymbolSet;
+ (BCSymbolSet *)rnaStrictSymbolSet;
+ (BCSymbolSet *)proteinSymbolSet;
+ (BCSymbolSet *)proteinStrictSymbolSet;
+ (BCSymbolSet *)unknownSymbolSet;
+ (BCSymbolSet *)unknownAndGapSymbolSet;

////////////////////////////////////////////////////////////////////////////
//
#pragma mark ?
#pragma mark ?GENERAL METHODS
//
//  GENERAL METHODS
////////////////////////////////////////////////////////////////////////////

- (NSSet *)symbolSet;
- (NSArray *)allSymbols;
- (NSCharacterSet *)characterSetRepresentation;
- (BCSequenceType)sequenceType;

- (BOOL)containsSymbol:(BCSymbol *)aSymbol;                  // aSymbol=W and contains A --> no
- (BOOL)containsSymbolRepresentedBy:(BCSymbol *)aSymbol;     // aSymbol=W and contains A --> yes
- (BOOL)containsAllSymbolsRepresentedBy:(BCSymbol *)aSymbol; // aSymbol=W and contains A,T --> yes
- (BOOL)containsSymbolRepresenting:(BCSymbol *)aSymbol;      // aSymbol=A and contains W --> yes

//creating new symbol sets from existing ones
- (BCSymbolSet *)symbolSetByFormingUnionWithSymbolSet:(BCSymbolSet *)otherSet;
- (BCSymbolSet *)symbolSetByFormingIntersectionWithSymbolSet:(BCSymbolSet *)otherSet;

/* TO DO (or not to do?)
- (BCSymbolSet *)complementSet;
- (BCSymbolSet *)expandedSet; // ambiguous symbols expanded
*/

- (BOOL)isSupersetOfSet:(BCSymbolSet *)theOtherSet;

//NSCopying formal protocol
- (id)copyWithZone:(NSZone *)zone;

//BCSymbolSet is immutable
//Keep this for a future BCMutableSymbolSet, if ever needed
/*
////////////////////////////////////////////////////////////////////////////
//
#pragma mark ?
#pragma mark ?MUTABILITY METHODS
//
//  MUTABILITY METHODS
////////////////////////////////////////////////////////////////////////////

- (void)addSymbol:(BCSymbol *)symbol;
- (void)addSymbols:(NSArray *)symbols;
- (void)addSymbolsInString:(NSString *)aString;
- (void)removeSymbol:(BCSymbol *)symbol;
- (void)removeSymbols:(NSArray *)symbols;
- (void)removeSymbolsInString:(NSString *)aString;
- (void)formUnionWithSymbolSet:(BCSymbolSet *)otherSet;
- (void)formIntersectionWithSymbolSet:(BCSymbolSet *)otherSet;
- (void)makeComplementary;
*/

@end

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Fri Mar 11 02:30:24 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 10 Mar 2005 23:30:24 -0800
Subject: [Biococoa-dev] Method names with 'and'
In-Reply-To:
References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl>
Message-ID:

Sorry I had one minor comment I forgot about your new code, Philip, more specifically method names: in some of the methods are something like 'initWithSymbols:andMatrix:'. I believe 'initWithSymbols:Matrix:' is more "Cocoa-standard".

... a few minutes interruption...

OK, because I did not want to look too picky and stupid, or be wrong, I looked in:
http://developer.apple.com/documentation/Cocoa/Conceptual/CodingGuidelines/Articles/NamingMethods.html#//apple_ref/doc/uid/20001282/BCIBJEFG

Excerpt:
"Don't use "and" to link keywords that are attributes of the receiver.
RIGHT: - (int)runModalForDirectory:(NSString *)path file: (NSString *) name types:(NSArray *)fileTypes; WRONG: - (int)runModalForDirectory:(NSString *)path andFile:(NSString *)name andTypes:(NSArray *)fileTypes; Although "and" may sound good in this example, it causes problems as you create methods with more and more keywords." Apple said it, not me ;-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From biococoa at bioworxx.com Fri Mar 11 03:04:34 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Fri, 11 Mar 2005 09:04:34 +0100 Subject: [Biococoa-dev] Method names with 'and' In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl> Message-ID: <1ab885745362d7e500c776f30219fe8b@bioworxx.com> Am 11.03.2005 um 08:30 schrieb Charles PARNOT: > Sorry I had one minor comment I forgot about your new code, Philip, > more specifically method names: in some of the methods are something > like 'initWithSymbols:andMatrix:'. I believe 'initWithSymbols:Matrix:' > is more "Cocoa-standard". > sure, thanks for your advice. I changed that. 
Phil

From biococoa at bioworxx.com Fri Mar 11 03:15:01 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Fri, 11 Mar 2005 09:15:01 +0100
Subject: [Biococoa-dev] starting BCAlignment
In-Reply-To:
References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl>
Message-ID: <22787a51dd2e1cfc749d8c0dca8d7dd5@bioworxx.com>

On 11.03.2005 at 07:21, Charles PARNOT wrote:

>> I think it's a quite good approach, but we have to decide whether we
>> want to "ask" the matrix with two BCSymbols or just with chars. Take
>> a look at my recent implementation of the scoring matrix. It's
>> perhaps slower than this one, but more comfortable. I think we just
>> have to test the performance, when we've done the first algorithm.
>
> I was thinking about the alignment implementation while driving to
> the day care (!), and after looking at your code, I am so delighted to
> see that I had something very similar in mind. What you are doing is
> mapping symbols to int in your scoring matrix. My thought about it was
> the same, but using... symbol sets, of course. Which you probably had
> in mind, in fact, given the question you asked about 'allObjects'.
>
> The current implementation is still very much OO, which is good. Of
> course, as a result, it might be slow, with the overhead from the
> substituteSymbol:forSymbol, that scans the NSArray, and accessing the
> symbols through the sequence objects, but Shark will tell.
>
>
> Then, if we need to optimize, there is an obvious(?) path, and here is
> how we could use symbol sets:
>
> * The sequences you need to align define a SymbolSet, probably the
>   union of the symbol sets of the sequences
>
> * That instance of the BCSymbolSet class might then be able to provide
>   a perfect and reproducible bijection between that set of symbols and
>   int values
>   --> e.g. '(int)equivalentIntValueForSymbol:(BCSymbol *)aSymbol'
>   And ONLY the BCSymbolSet class can decide on that bijection.
>   One way could be to simply sort the symbols alphabetically.
>   So again, one symbol in one SymbolSet = one int
>   (very similar to what you did in BCScoreMatrix)
>
> * That bijection between symbols and int can be used to:
>   - translate sequences into int array
>   - translate the dictionary in the score matrix into int**

or int* as I mentioned before ;-).

>
> * Then the alignment algorithm manipulates only int, and is completely
>   sequence-agnostic

That's what I want it to be.

> * After alignment, everything is translated back to symbols to
>   generate a BCAlignment object
>
>
> Nothing really original:
> objects ------> C ------> algorithm -------> C -------> Objects

I think that will be the approach to make it fast.

>
> The first and last arrow are the 'translation' steps. To avoid
> problems, that translation should be all in one place, which means all
> in one class. For instance, BCSymbolSet (and not BCScoreMatrix). And
> then, BCSymbolSet becomes really important in the framework. In the
> end, also, a user could create exotic symbols, exotic sequences, and
> exotic score matrices, and still use the same algorithms.
>
> A final comment about the score matrix in that design: because
> BCSymbolSet is in charge of the int<-->BCSymbol translation, the score
> matrix has to be defined as a dictionary, like John suggested. Such a
> dictionary could use the symbols as the key, for instance to get the
> score of substitution of symbolA for symbolB:
> NSNumber *score = [[scoreDictionary objectForKey:symbolA]
> objectForKey:symbolB];
> (key being copied for dictionary, BCSymbol has to be immutable with
> that design... or we could use the string representation)
> That makes matrices difficult to define programmatically, but easier
> through plist.

What about storing in .plists and representing as int* in the object?

> OK, maybe something else, but at something fully OO and very readable.
>
> Final question: how do gaps fit in the matrix score thing? Is there a
> score for a gap/symbol?

No, there is not. That is a separate option for the algorithm.

> Maybe gaps should be excluded from the symbol <--> int conversion? They
> would be a special case, with some special scoring schemes, like
> gap-open, gap-extension,...?
>
> Well, these were my 2 cents ;-)

Very helpful, thx

Phil

From kvddrift at earthlink.net Fri Mar 11 06:44:20 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 11 Mar 2005 06:44:20 -0500
Subject: [Biococoa-dev] adding new files
In-Reply-To:
References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net>
Message-ID: <509d7db9cb4db9ac655581d98b325472@earthlink.net>

On Mar 11, 2005, at 1:56 AM, Charles PARNOT wrote:

> Oups, just thought of another one (does not apply to anybody so far):
> between step 5 and step 6, make sure the framework compiles without
> error before commit ;-)
>
>

Haha, yes that is a good one. I would say compiles without warnings as well. I will make a separate doc that goes in the devdocs folder.

- Koen.

From biococoa at bioworxx.com Fri Mar 11 11:47:56 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Fri, 11 Mar 2005 17:47:56 +0100
Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix
Message-ID:

Hi everybody,

I just made some modifications to the Alignment stuff. I followed Alex' advice and made the Scoring Matrix char based.
Every symbol is cast to a char and used as a numeric key for the matrix. With this approach we have some memory overhead, but we're much faster, because we don't need to ask the NSArray for the symbol index every time.

I also copied some of Alex's code ( sorry for that Alex ;-) ) to provide a short overview of the global alignment.

@Charles: Perhaps we could discuss your symbol to int mapping in more detail, I didn't get the idea. ;-)

Phil

From a.griekspoor at nki.nl Fri Mar 11 16:17:24 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Fri, 11 Mar 2005 22:17:24 +0100
Subject: [Biococoa-dev] Method names with 'and'
In-Reply-To:
References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl>
Message-ID: <8fbb7947daa2e794405285f7bfe24009@nki.nl>

On 11-mrt-05, at 8:30, Charles PARNOT wrote:

> Sorry I had one minor comment I forgot about your new code, Philip,
> more specifically method names: in some of the methods are something
> like 'initWithSymbols:andMatrix:'. I believe 'initWithSymbols:Matrix:'
> is more "Cocoa-standard".
>

well, I believe it should then be initWithSymbols: matrix: (all words should start lowercase)
If I would be picky ;-)

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Windows vs Mac
65 million years ago, there were more dinosaurs than humans.
Where are the dinosaurs now?
*********************************************************

From a.griekspoor at nki.nl Fri Mar 11 16:20:27 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Fri, 11 Mar 2005 22:20:27 +0100
Subject: [Biococoa-dev] starting BCAlignment
In-Reply-To: <22787a51dd2e1cfc749d8c0dca8d7dd5@bioworxx.com>
References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <71fea05ed88ed8b6b13c35e5b1c37564@bioworxx.com> <7ee2818ee8c64bc9f92ae4b3cd27cda0@nki.nl> <22787a51dd2e1cfc749d8c0dca8d7dd5@bioworxx.com>
Message-ID:

On 11-mrt-05, at 9:15, Philipp Seibel wrote:

>> A final comment about the score matrix in that design: because
>> BCSymbolSet is in charge of the int<-->BCSymbol translation, the
>> score matrix has to be defined as a dictionary, like John suggested.
>> Such a dictionary could use the symbols as the key, for instance to
>> get the score of substitution of symbolA for symbolB:
>> NSNumber *score = [[scoreDictionary objectForKey:symbolA]
>> objectForKey:symbolB];
>> (key being copied for dictionary, BCSymbol has to be immutable with
>> that design... or we could use the string representation)
>> That makes matrices difficult to define programmatically, but easier
>> through plist.
>
> What about storing in .plists and representing as int* in the Object.

I think that's indeed better, Phil; we don't need to stick to dictionaries to have storage in plists...

**************************************************************
** Alexander Griekspoor **
**************************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

MacOS X: The power of UNIX with the simplicity of the Mac
***************************************************************

From a.griekspoor at nki.nl Fri Mar 11 16:31:24 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Fri, 11 Mar 2005 22:31:24 +0100
Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix
In-Reply-To:
References:
Message-ID:

On 11-mrt-05, at 17:47, Philipp Seibel wrote:

> Hi everybody,
>
> i just made some modifications to the Alignment stuff. I followed
> Alex' advice and made the Scoring Matrix char based. every symbol is
> cast to a char and used as a number key for the matrix. With this
> approach we have some memory overhead, but we're much faster, because
> we need not to ask the NSArray for the Symbol index every time.
>
> I also copied some of alex' code ( sorry for that alex ;-) ) to
> provide a short overview over the global alignment.

Absolutely no problem!
Just to make things clear for everyone: with alignments we're talking about two kinds of matrices. The first kind holds the scores; these are also known as substitution matrices, although you can implement them as arrays as well, as Phil demonstrated before.
These are different from the matrices used during the actual alignments (with the 3 phases, as you might remember). For the first kind we create the scoring matrix objects; the second kind is probably only used internally in the algorithm implementation.

So Koen, in this light your remark:
> I am thinking how this will be used.
The end user probably wants to > try out one type of alignment, see the result, then try another one, > compare the results, etc. So if we make a BCNeedlemanWunsch, and then > a BCSmithWaterman where is the actual matrix that is used to > calculate. I think it is a good idea if we have just one matrix, that > is used as a basis for each different calculation. It would be a waste > if for every calculation the starting matrix has to be re-calculated. > Or maybe that's where BCMatrix comes in place? The actual matrix used for calculation is the second one. But keeping the matrix only saves you the memory allocation, but different alignments fill the matrix differently so there's no use in keeping it around as it has to be refilled again with scores based on algorithm, penalty scores, gap costs etc. As most time goes into filling the matrix and tracing it back after the fill, you can't reuse it. Also, most algorithms that are subquadratic for memory requirements, chop up the matrix and use a divide-and-conquer approach because it's the storage of a complete-sized matrix that forms the memory problem. Does this make any sense? Alex > > @Charles: Perhaps we could discuss your symbol to int mapping in more > detail, i didn't get the idea. ;-) > > Phil > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 
2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From charles.parnot at stanford.edu Fri Mar 11 17:40:57 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Fri, 11 Mar 2005 14:40:57 -0800 Subject: [Biococoa-dev] Method names with 'and' In-Reply-To: <8fbb7947daa2e794405285f7bfe24009@nki.nl> References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl> <8fbb7947daa2e794405285f7bfe24009@nki.nl> Message-ID: At 22:17 +0100 3/11/05, Alexander Griekspoor wrote: >On 11-mrt-05, at 8:30, Charles PARNOT wrote: > >>Sorry I had one minor comment I forgot about your new code, Philip, more specifically method names: in some of the methods are something like 'initWithSymbols:andMatrix:'. I believe 'initWithSymbols:Matrix:' is more "Cocoa-standard". >> >well, I believe it should than be initWithSymbols: matrix: (all words should start lowercase) If I would be picky ;-) > arrghh.... I swear to God that was a typo !!! 
thanks ;-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From a.griekspoor at nki.nl Fri Mar 11 17:42:59 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Fri, 11 Mar 2005 23:42:59 +0100 Subject: [Biococoa-dev] Method names with 'and' In-Reply-To: References: <4b49dee839ec0002cce7bce3beb124f3@bioworxx.com> <0522d600a08d97128697a1ebdc61f2d7@bioworxx.com> <8a914c5b87018ed76856ff63465c7456@mekentosj.com> <06dc6ee729fae77c20a0520b62e6a8f2@bioworxx.com> <483aa860d38603ca7b91f4edb525da23@nki.nl> <8fbb7947daa2e794405285f7bfe24009@nki.nl> Message-ID: On 11-mrt-05, at 23:40, Charles PARNOT wrote: > At 22:17 +0100 3/11/05, Alexander Griekspoor wrote: >> On 11-mrt-05, at 8:30, Charles PARNOT wrote: >> >>> Sorry I had one minor comment I forgot about your new code, Philip, >>> more specifically method names: in some of the methods are something >>> like 'initWithSymbols:andMatrix:'. I believe >>> 'initWithSymbols:Matrix:' is more "Cocoa-standard". >>> >> well, I believe it should than be initWithSymbols: matrix: (all words >> should start lowercase) If I would be picky ;-) >> > > arrghh.... I swear to God that was a typo !!! LOL! 
> > thanks ;-) > > charles > -- > Help science go fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ > > Charles Parnot > charles.parnot at stanford.edu > > Room B157 in Beckman Center > 279, Campus Drive > Stanford University > Stanford, CA 94305 (USA) > > Tel +1 650 725 7754 > Fax +1 650 725 8021 > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From kvddrift at earthlink.net Fri Mar 11 18:18:35 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 11 Mar 2005 18:18:35 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix In-Reply-To: References: Message-ID: <0098d0562d0beefa0aa03e5dbd6a40dd@earthlink.net> On Mar 11, 2005, at 4:31 PM, Alexander Griekspoor wrote: > So Koen, in this light your remark: >> I am thinking how this will be used. The end user probably wants to >> try out one type of alignment, see the result, then try another one, >> compare the results, etc. So if we make a BCNeedlemanWunsch, and then >> a BCSmithWaterman where is the actual matrix that is used to >> calculate. I think it is a good idea if we have just one matrix, that >> is used as a basis for each different calculation. It would be a >> waste if for every calculation the starting matrix has to be >> re-calculated. 
Or maybe that's where BCMatrix comes in place? > The actual matrix used for calculation is the second one. But keeping > the matrix only saves you the memory allocation, but different > alignments fill the matrix differently so there's no use in keeping it > around as it has to be refilled again with scores based on algorithm, > penalty scores, gap costs etc. As most time goes into filling the > matrix and tracing it back after the fill, you can't reuse it. Also, > most algorithms that are subquadratic for memory requirements, chop up > the matrix and use a divide-and-conquer approach because it's the > storage of a complete-sized matrix that forms the memory problem. > Does this make any sense? > Yes :) - Koen. From kvddrift at earthlink.net Fri Mar 11 19:46:12 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 11 Mar 2005 19:46:12 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix In-Reply-To: References: Message-ID: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net> Another ignorant question: what's the difference between alignment and pairwise alignment? Regarding their place in BioCocoa, should BCPairwiseAlignment be a subclass of BCAlignment? Right now it isn't, although I would expect that based on their names. - Koen. On Mar 11, 2005, at 4:31 PM, Alexander Griekspoor wrote: > > On 11-mrt-05, at 17:47, Philipp Seibel wrote: > >> Hi everybody, >> >> i just made some modifications to the Alignment stuff. I followed >> Alex' advice and made the Scoring Matrix char based. every symbol is >> casted to a char and used as a number key for the matrix. With this >> approach we have some memory overhead, but we're much faster, because >> we need not to ask the NSArray for the Symbol index everytime. >> >> I also copied some of alex' code ( sorry for that alex ;-) ) to >> provide a short overview over the global alignment. > Absolutely no problem! 
> Just to make things clear for everyone, with alignments we're talking > about two kinds of matrices. The one with the scores one which are > also known as substitution matrices, although you can implement them > as arrays as well like phil demonstrated before. > These are different from the matrices used during the actual > alignments (with the 3 phases as you might remember). For the first we > create the scoring matrix objects, the second are probably only used > internally in the algorithm implementation. > > So Koen, in this light your remark: >> I am thinking how this will be used. The end user probably wants to >> try out one type of alignment, see the result, then try another one, >> compare the results, etc. So if we make a BCNeedlemanWunsch, and then >> a BCSmithWaterman where is the actual matrix that is used to >> calculate. I think it is a good idea if we have just one matrix, that >> is used as a basis for each different calculation. It would be a >> waste if for every calculation the starting matrix has to be >> re-calculated. Or maybe that's where BCMatrix comes in place? > The actual matrix used for calculation is the second one. But keeping > the matrix only saves you the memory allocation, but different > alignments fill the matrix differently so there's no use in keeping it > around as it has to be refilled again with scores based on algorithm, > penalty scores, gap costs etc. As most time goes into filling the > matrix and tracing it back after the fill, you can't reuse it. Also, > most algorithms that are subquadratic for memory requirements, chop up > the matrix and use a divide-and-conquer approach because it's the > storage of a complete-sized matrix that forms the memory problem. > Does this make any sense? > Alex > >> >> @Charles: Perhaps we could discuss your symbol to int mapping in more >> detail, i didn't get the idea. 
;-) >> >> Phil >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > 4Peaks - For Peaks, Four Peaks. > 2004 Winner of the Apple Design Awards > Best Mac OS X Student Product > http://www.mekentosj.com/4peaks > > ********************************************************* > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From kvddrift at earthlink.net Fri Mar 11 21:09:38 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 11 Mar 2005 21:09:38 -0500 Subject: [Biococoa-dev] initWithSymbol Message-ID: Hi, Anyone has objections if I in BCSymbol change: - (id)initWithSymbol:(unichar)aSymbol { .... to: - (id)initWithChar:(unichar)aChar { ... This seems more logical, initWithSymbol would imply that another BCSymbol is used. Maybe we can also rename the ivar symbol to symbolChar? cheers, - Koen. From kvddrift at earthlink.net Fri Mar 11 21:20:56 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 11 Mar 2005 21:20:56 -0500 Subject: [Biococoa-dev] BCSymbolSet done In-Reply-To: References: Message-ID: On Mar 11, 2005, at 2:18 AM, Charles PARNOT wrote: > I commited an updated version of BCSymbolSet, which is now immutable, > can return an array of symbols, etc and other goodies. > Here is the header below, so you don't have to necessarily update your > project right now. > Very nice, Charles! 
It's getting real shape now. > /* TO DO (or not to do?) > - (BCSymbolSet *)complementSet; > - (BCSymbolSet *)expandedSet; // ambigous symbols expanded > */ > I left these empty initially, because I had no idea how to code that. Feel free to add it :) - Koen. From charles.parnot at stanford.edu Sat Mar 12 02:17:57 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Fri, 11 Mar 2005 23:17:57 -0800 Subject: [Biococoa-dev] initWithSymbol In-Reply-To: References: Message-ID: While you are on it, it would be nice to replace the methods '+aaForChar' and '+baseForChar' with a unique name like '+symbolForChar' in the subclasses. I have some ugly code in symbol set class that tests the kind of class just to know which method to call. In the case of this method, having the same name makes sense (polymorphism!). Now that I think about it, funny how '+symbolForChar' and '-initWithChar' will have different and not so obvious meaning... I have to look at the code, I am not so sure now what the difference is! charles At 9:09 PM -0500 3/11/05, Koen van der Drift wrote: >Hi, > >Anyone has objections if I in BCSymbol change: > >- (id)initWithSymbol:(unichar)aSymbol >{ > .... > >to: > >- (id)initWithChar:(unichar)aChar >{ > ... > >This seems more logical, initWithSymbol would imply that another BCSymbol is used. Maybe we can also rename the ivar symbol to symbolChar? > > >cheers, > >- Koen. 
> >_______________________________________________ >Biococoa-dev mailing list >Biococoa-dev at bioinformatics.org >https://bioinformatics.org/mailman/listinfo/biococoa-dev -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Sat Mar 12 02:18:45 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Fri, 11 Mar 2005 23:18:45 -0800 Subject: [Biococoa-dev] BCSymbolSet done In-Reply-To: References: Message-ID: At 9:20 PM -0500 3/11/05, Koen van der Drift wrote: >On Mar 11, 2005, at 2:18 AM, Charles PARNOT wrote: > >>I commited an updated version of BCSymbolSet, which is now immutable, can return an array of symbols, etc and other goodies. >>Here is the header below, so you don't have to necessarily update your project right now. >> > >Very nice, Charles! It's getting real shape now. > >>/* TO DO (or not to do?) >> - (BCSymbolSet *)complementSet; >> - (BCSymbolSet *)expandedSet; // ambigous symbols expanded >> */ >> > > >I left these empty initially, because I had no idea how to code that. Feel free to add it :) > >- Koen. 
We'll do it when/if we need it ;-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From a.griekspoor at nki.nl Sat Mar 12 02:51:15 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sat, 12 Mar 2005 08:51:15 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix In-Reply-To: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net> References: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net> Message-ID: Yeah, I was about to raise the same question yesterday, I also thought that the BCAlignment would be the general "algorithm" class, and BCPairwiseAlignment one of it subclasses But, as far as I can see what Phil had in mind, they're two completely different things. BCPairwiseAlignment does the job, and BCAlignment is what it returns. It is the result of an alignment. Now, I think Koen has a point that this is quite confusing. First, indeed their maybe a benefit if we implement a general algorithm class BCAlignmentAlgorithm or something and have subclasses that implement the different alignments (NW, Smith-W). And have its name be more distinct from the alignment class. Phil, do you know NeoBio, a java framework for alignments, they also use classes throughout and maybe you can see how they gave names to them (not saying they're better though)... Cheers, Alex On 12-mrt-05, at 1:46, Koen van der Drift wrote: > Another ignorant question: > > what's the difference between alignment and pairwise alignment? > Regarding their place in BioCocoa, should BCPairwiseAlignment be a > subclass of BCAlignment? Right now it isn't, although I would expect > that based on their names. > > - Koen. 
> > > On Mar 11, 2005, at 4:31 PM, Alexander Griekspoor wrote: > >> >> On 11-mrt-05, at 17:47, Philipp Seibel wrote: >> >>> Hi everybody, >>> >>> i just made some modifications to the Alignment stuff. I followed >>> Alex' advice and made the Scoring Matrix char based. every symbol is >>> casted to a char and used as a number key for the matrix. With this >>> approach we have some memory overhead, but we're much faster, >>> because we need not to ask the NSArray for the Symbol index >>> everytime. >>> >>> I also copied some of alex' code ( sorry for that alex ;-) ) to >>> provide a short overview over the global alignment. >> Absolutely no problem! >> Just to make things clear for everyone, with alignments we're talking >> about two kinds of matrices. The one with the scores one which are >> also known as substitution matrices, although you can implement them >> as arrays as well like phil demonstrated before. >> These are different from the matrices used during the actual >> alignments (with the 3 phases as you might remember). For the first >> we create the scoring matrix objects, the second are probably only >> used internally in the algorithm implementation. >> >> So Koen, in this light your remark: >>> I am thinking how this will be used. The end user probably wants to >>> try out one type of alignment, see the result, then try another one, >>> compare the results, etc. So if we make a BCNeedlemanWunsch, and >>> then a BCSmithWaterman where is the actual matrix that is used to >>> calculate. I think it is a good idea if we have just one matrix, >>> that is used as a basis for each different calculation. It would be >>> a waste if for every calculation the starting matrix has to be >>> re-calculated. Or maybe that's where BCMatrix comes in place? >> The actual matrix used for calculation is the second one. 
But keeping >> the matrix only saves you the memory allocation, but different >> alignments fill the matrix differently so there's no use in keeping >> it around as it has to be refilled again with scores based on >> algorithm, penalty scores, gap costs etc. As most time goes into >> filling the matrix and tracing it back after the fill, you can't >> reuse it. Also, most algorithms that are subquadratic for memory >> requirements, chop up the matrix and use a divide-and-conquer >> approach because it's the storage of a complete-sized matrix that >> forms the memory problem. >> Does this make any sense? >> Alex >> >>> >>> @Charles: Perhaps we could discuss your symbol to int mapping in >>> more detail, i didn't get the idea. ;-) >>> >>> Phil >>> >>> _______________________________________________ >>> Biococoa-dev mailing list >>> Biococoa-dev at bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/biococoa-dev >>> >>> >> ********************************************************* >> ** Alexander Griekspoor ** >> ********************************************************* >> The Netherlands Cancer Institute >> Department of Tumorbiology (H4) >> Plesmanlaan 121, 1066 CX, Amsterdam >> Tel: + 31 20 - 512 2023 >> Fax: + 31 20 - 512 2029 >> AIM: mekentosj at mac.com >> E-mail: a.griekspoor at nki.nl >> Web: http://www.mekentosj.com >> >> 4Peaks - For Peaks, Four Peaks. 
>> 2004 Winner of the Apple Design Awards >> Best Mac OS X Student Product >> http://www.mekentosj.com/4peaks >> >> ********************************************************* >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From a.griekspoor at nki.nl Sat Mar 12 02:54:22 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sat, 12 Mar 2005 08:54:22 +0100 Subject: [Biococoa-dev] initWithSymbol In-Reply-To: References: Message-ID: absolutely not, good point, go ahead! On 12-mrt-05, at 3:09, Koen van der Drift wrote: > Hi, > > Anyone has objections if I in BCSymbol change: > > - (id)initWithSymbol:(unichar)aSymbol > { > .... > > to: > > - (id)initWithChar:(unichar)aChar > { > ... > > This seems more logical, initWithSymbol would imply that another > BCSymbol is used. Maybe we can also rename the ivar symbol to > symbolChar? > > > cheers, > > - Koen. 
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From charles.parnot at stanford.edu Sat Mar 12 03:18:36 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 12 Mar 2005 00:18:36 -0800 Subject: [Biococoa-dev] Symbol mapping for optimization In-Reply-To: References: Message-ID: >@Charles: Perhaps we could discuss your symbol to int mapping in more detail, i didn't get the idea. ;-) > >Phil Thanks for giving me a second chance! I reread my email, which I realized is quite obfuscated... I got carried away. I think what I am trying to say is basically what everybody has in mind, but I am trying to make it more formal, and I want to propose a specific design to handle the task. So I will start with the only clear lines in my previous email... We want something like this: objects ------> C ------> algorithm -------> C -------> Objects The first and last arrow are the 'translation' steps. The translation takes the data structure in the objects and make C arrays, that will be well adapted for performance. In what we are doing with BioCocoa, we will mostly need to provide a translation for symbols, and decide on the type to use to replace BCSymbol. That type could be int, but maybe char would be more efficient with 1 byte instead of 4. 
So I changed my mind: let's map BCSymbols to chars, which I will sometimes write BCSymbol<-->char.

I think we need to keep the following pieces REALLY separate:

(1) the BCSymbol classes and their cousins, BCAbstractSequence et al., BCScoreMatrix, BCSymbolSet... These are objects that only know about symbols as objects, not as chars (at least they don't know about the char used for mapping).

(2) the translator class: this is the only class that knows how to map a symbol into a char.

(3) the algorithm class: the algorithm code has no idea about the biology. It takes arrays of chars and does its job. Note that this will probably apply to alignments, but could apply to any other situation where we need to increase performance and where the code can manipulate arrays of chars without caring about their exact meaning.

I think what is really not obvious at first, and can be confusing, is the separation between (1) and (2). It seems obvious that a char should be the char corresponding to the BCSymbol, for instance base 'A' should be mapped to char 'A'. Maybe we will do that initially, but we want to be able to modify that in the future, or even to have a more dynamic mapping depending on the context. For instance, we might find later that mapping the bases ATGC to the chars 0x00, 0x01, 0x02, 0x03 is much better than mapping to the 'ATGC' chars, because there are no unused values between the chars we actually use. We then just have to modify the code in (2), probably only one or two lines, to propagate whatever optimization we make in the translation to the whole framework. If we don't keep the code in (2) separate, and instead spread it across the different classes of (1) and (3), we will have maintenance problems and will slow down future evolution. For instance, if we let the algorithm (3) decide, then we have to rewrite the same code for each different algorithm, and then modify everything if we change our mind.
Finally, we could offer different mappings, BCSymbol<-->char or BCSymbol<-->int, depending on which is best for a given algorithm.

At this point, I hope you see why having a separate translator class could make sense. Now, next step: its implementation. In the implementation of the translator, I can see how BCSymbolSet would be very useful. I think each BCSymbolSet could define a different mapping. For instance, a symbol set with ATGC would result in a certain mapping, where A is mapped to a certain char XXXX. But if the symbol set is ATGCBVHD, then symbol A could well be mapped to a different char, e.g. not XXXX but YYYY. Thus instead of having a fixed mapping BCSymbol<-->char, we could have a more dynamic mapping that depends only on a symbol set. This way, for instance, we could decide to always use the smallest possible matrix for scores, e.g. 4x4 for a symbol set of 4 symbols.

Symbol sets are easy to define before starting an alignment, and should be easy to define before any algorithm where BCSymbol<-->char mapping makes sense. In the case of alignment, we would do the following:
* Define a BCSymbolSet that covers the sequences to align, e.g. the union of the symbol sets of the sequences
* Use that symbol set to instantiate a new translator, e.g. 'translatorWithSymbolSet:'
* Call the translator to translate the BCSequences --> *char
* Call the translator to translate the BCScoreMatrix --> **int (the indexes will be chars cast to ints)
* Run the algorithm using only the chars
* Call the translator to translate back the chars into sequences et al.

(note about int**: you are right, Phil, that *int are faster to access, but you can have both **int and *int at the same time, because if you create a matrix a[][] as one block in memory, then you can use a[0] as an *int with single-index access, when needed).

Does this email make more sense??

Thanks for reading it all :-)
These were my 4 cents.

Charles

NB: we may use the name 'mapper' instead of 'translator'...
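The translation steps listed above could be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not BioCocoa code: the names `SymbolMapper`, `mapper_init`, `mapper_encode`, `mapper_decode`, `matrix_alloc` and `matrix_free` are hypothetical, and a symbol set is represented here as a plain C string instead of a BCSymbolSet. It shows a mapping that depends only on the symbol set (so 4 symbols give codes 0..3 and a 4x4 score matrix) and a matrix allocated as one block that can be read with either two indexes or one.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAPPER_UNKNOWN 0xFF /* marks chars that are not in the symbol set */

/* Hypothetical stand-in for the proposed translator ("mapper"). */
typedef struct {
    size_t count;                          /* number of distinct symbols */
    unsigned char to_code[UCHAR_MAX + 1];  /* symbol char -> dense code */
    char to_symbol[UCHAR_MAX + 1];         /* dense code -> symbol char */
} SymbolMapper;

/* Build the mapping from a symbol set given as a C string, e.g. "ATGC".
   Codes are assigned densely in order of first appearance: 0, 1, 2, ... */
void mapper_init(SymbolMapper *m, const char *symbol_set) {
    memset(m->to_code, MAPPER_UNKNOWN, sizeof m->to_code);
    memset(m->to_symbol, 0, sizeof m->to_symbol);
    m->count = 0;
    for (const char *p = symbol_set; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (m->to_code[c] == MAPPER_UNKNOWN) { /* ignore repeated symbols */
            m->to_code[c] = (unsigned char)m->count;
            m->to_symbol[m->count] = *p;
            m->count++;
        }
    }
}

/* Translate n symbol chars into dense codes.
   Returns 0 on success, -1 if a char is not part of the symbol set. */
int mapper_encode(const SymbolMapper *m, const char *seq,
                  unsigned char *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        unsigned char code = m->to_code[(unsigned char)seq[i]];
        if (code == MAPPER_UNKNOWN) return -1;
        out[i] = code;
    }
    return 0;
}

/* Translate n dense codes back into a NUL-terminated string.
   out must have room for n + 1 chars. */
void mapper_decode(const SymbolMapper *m, const unsigned char *codes,
                   char *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(codes[i] < m->count);
        out[i] = m->to_symbol[codes[i]];
    }
    out[n] = '\0';
}

/* Allocate a zeroed n x n score matrix as ONE block plus a row table.
   rows[i][j] and rows[0][i * n + j] then address the same cell. */
int **matrix_alloc(size_t n) {
    int **rows = malloc(n * sizeof *rows);
    int *block = calloc(n * n, sizeof *block);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) rows[i] = block + i * n;
    return rows;
}

void matrix_free(int **rows) {
    if (rows != NULL) {
        free(rows[0]); /* the single block */
        free(rows);    /* the row table */
    }
}
```

With this layout the algorithm only ever sees small integers: the scores for a pair of encoded symbols are looked up as `scores[a][b]`, and changing how symbols are numbered means touching `mapper_init` only.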
--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From biococoa at bioworxx.com Sat Mar 12 03:30:34 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Sat, 12 Mar 2005 09:30:34 +0100
Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix
In-Reply-To:
References: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net>
Message-ID: <3482debb0e041bc1af6ab56f489126e1@bioworxx.com>

On 12.03.2005 at 08:51, Alexander Griekspoor wrote:

> Yeah, I was about to raise the same question yesterday, I also thought
> that the BCAlignment would be the general "algorithm" class, and
> BCPairwiseAlignment one of its subclasses. But, as far as I can see what
> Phil had in mind, they're two completely different things.
> BCPairwiseAlignment does the job, and BCAlignment is what it returns.
> It is the result of an alignment. Now, I think Koen has a point that
> this is quite confusing. First, indeed there may be a benefit if we
> implement a general algorithm class BCAlignmentAlgorithm or something
> and have subclasses that implement the different alignments (NW,
> Smith-W). And have its name be more distinct from the alignment class.
> Phil, do you know NeoBio, a java framework for alignments, they also
> use classes throughout and maybe you can see how they gave names to
> them (not saying they're better though)...
> Cheers,
> Alex

Ok, we should discuss this point in more detail. I think it's logical to have a structure like this:

                                          BCPairwiseAlignmentAlgorithm
                                        /
BCAlignmentAlgorithm (Protocol or class)
                                        \
                                          BCMultipleAlignmentAlgorithm

oh nice ascii art ;-).

And inside the alignment algorithm classes we have static methods to perform different kinds of algorithms.
If they get too complex to have the methods inside one class, we could either do something like class clusters or simply pack them in different .m files with different categories. That would be more Objective-C-like than the NeoBio thing.

Phil

From biococoa at bioworxx.com Sat Mar 12 04:00:07 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Sat, 12 Mar 2005 10:00:07 +0100
Subject: [Biococoa-dev] Symbol mapping for optimization
In-Reply-To:
References:
Message-ID: <7b9bcb1e9831afe6607ccf9509d43a32@bioworxx.com>

>
>
> At this point, I hope you see why having a separate translator class
> could make sense. Now, next step: its implementation. In the
> implementation of the translator, I can see how BCSymbolSet would be
> very useful. I think each BCSymbolSet could define a different
> mapping. For instance, a symbol set with ATGC would result in a
> certain mapping, where A is mapped to a certain char XXXX. But if the
> symbol set is ATGCBVHD, then symbol A could well be mapped to a
> different char, e.g. not XXXX but YYYY. Thus instead of having a fixed
> mapping BCSymbol <--> char, we could have a more dynamic mapping only
> dependent on a symbol set. This way, for instance, we could decide to
> always use the smallest possible matrix for scores, e.g. 4x4 for a
> symbol set of 4 symbols.
>
> Symbol sets are easy to define before starting an alignment, and
> should be easy to define before any algorithm where BCSymbol<-->char
> mapping makes sense. In the case of alignment, we would do the
> following:
> * Define a BCSymbolSet that covers the sequences to align, e.g. union
> of the symbol sets of the sequences
> * Use that symbol set to instantiate a new translator, e.g.
> 'translatorWithSymbolSet:' > * Call the translator to translate the BCSequences --> *char > * Call the translator to translate the BCScoreMatrix --> **int (the > indexes will be chars cast to ints) > * Run the algorithm using only the chars > * Call the translator to translate back the chars into sequences et al. > Great, i like your idea very much. got it now ;-). Perhaps we should not run the translator for the BCScoreMatrix inside the Algorithm class, because when we want to do several alignments with one scoring matrix, we would have to translate it several times. It's better to run the translator during the initialization from the .plist file i think. so we have the matrix for the special SymbolSet already in int* (or int** ;-)) format. Can't wait having this structure ;-) Phil > (note about int**: you are right, Phil, that *int are faster to > access, but you can have both **int and *int at the same time, because > if you create a matrix a[][] as one block in memory, then you can use > a[0][] = an *int with single index access, when needed). > > > does this email make more sense?? > > Thanks for reading it all :-) > These were my 4 cents. > > Charles > > NB: we may use the name 'mapper' instead of 'translator'... 
> -- > Help science go fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ > > Charles Parnot > charles.parnot at stanford.edu > > Room B157 in Beckman Center > 279, Campus Drive > Stanford University > Stanford, CA 94305 (USA) > > Tel +1 650 725 7754 > Fax +1 650 725 8021 > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > From kvddrift at earthlink.net Sat Mar 12 06:45:41 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 06:45:41 -0500 Subject: [Biococoa-dev] initWithSymbol In-Reply-To: References: Message-ID: <51b1192b9b62ce475bd1069c70a0000e@earthlink.net> On Mar 12, 2005, at 2:54 AM, Alexander Griekspoor wrote: > absolutely not, good point, go ahead! > > Hmm, I now get a buch of the following warnings: warning: multiple declarations for method `initWithChar:' warning: using `-(id)initWithChar:(char)value' warning: also found `-(id)initWithChar:(unichar)aChar' warning: also found `-(id)initWithChar:(unichar)aChar' The '-(id)initWithChar:(char)value' version is in NSValue, the others are in BioCocoa. Any idea how I can get these warnings to go away? Are we not supposed to have method names that are already in the Foundation framework? - Koen. From kvddrift at earthlink.net Sat Mar 12 06:46:30 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 06:46:30 -0500 Subject: [Biococoa-dev] initWithSymbol In-Reply-To: References: Message-ID: <8947b49e5d046dadc14ef338ae28e41f@earthlink.net> On Mar 12, 2005, at 2:17 AM, Charles PARNOT wrote: > While you are on it, it would be nice to replace the methods > '+aaForChar' and '+baseForChar' with a unique name like > '+symbolForChar' in the subclasses. I have some ugly code in symbol > set class that tests the kind of class just to know which method to > call. 
In the case of this method, having the same name makes sense > (polymorphism!). > > Now that I think about it, funny how '+symbolForChar' and > '-initWithChar' will have different and not so obvious meaning... I > have to look at the code, I am not so sure now what the difference is! > Go ahead and make those changes too :) - Koen. From kvddrift at earthlink.net Sat Mar 12 06:49:48 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 06:49:48 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix In-Reply-To: References: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net> Message-ID: <75af9678365dab2c7425ca89ce6ce1a6@earthlink.net> On Mar 12, 2005, at 2:51 AM, Alexander Griekspoor wrote: > First, indeed their maybe a benefit if we implement a general > algorithm class BCAlignmentAlgorithm or something and have subclasses > that implement the different alignments (NW, Smith-W). Not sure if I like the name BCAlignmentAlgorithm, it screams for typo's ;) We could have the result named BCAlignmentResult or BCSequenceAlignment. - Koen. From a.griekspoor at nki.nl Sat Mar 12 06:52:22 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sat, 12 Mar 2005 12:52:22 +0100 Subject: [Biococoa-dev] initWithSymbol In-Reply-To: <51b1192b9b62ce475bd1069c70a0000e@earthlink.net> References: <51b1192b9b62ce475bd1069c70a0000e@earthlink.net> Message-ID: <3dd08ba9fc72202ce88532624cfaaa3b@nki.nl> Long live untyped methods ;-) The problem is that you haven't told the compiler which class it should lookup the method in... (for instance: [(id)theObject initWithChar: 'a'] will give you the problem, [(BCSymbol *)theObject initWithChar: 'a'] will not). > Are we not supposed to have method names that are already in the > Foundation framework? No, it's perfectly fine to use the same method names in different classes, in fact there are many examples and results in consistency. 
Example: initWithCapacity: ) Alex On 12-mrt-05, at 12:45, Koen van der Drift wrote: > > On Mar 12, 2005, at 2:54 AM, Alexander Griekspoor wrote: > >> absolutely not, good point, go ahead! >> >> > > Hmm, I now get a buch of the following warnings: > > warning: multiple declarations for method `initWithChar:' > warning: using `-(id)initWithChar:(char)value' > warning: also found `-(id)initWithChar:(unichar)aChar' > warning: also found `-(id)initWithChar:(unichar)aChar' > > The '-(id)initWithChar:(char)value' version is in NSValue, the others > are in BioCocoa. Any idea how I can get these warnings to go away? > Are we not supposed to have method names that are already in the > Foundation framework? > > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. 
********************************************************* From a.griekspoor at nki.nl Sat Mar 12 06:53:06 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sat, 12 Mar 2005 12:53:06 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix In-Reply-To: <75af9678365dab2c7425ca89ce6ce1a6@earthlink.net> References: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net> <75af9678365dab2c7425ca89ce6ce1a6@earthlink.net> Message-ID: > Not sure if I like the name BCAlignmentAlgorithm, it screams for > typo's ;) True! ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. ********************************************************* From philipp.seibel at byteworxx.de Sat Mar 12 07:05:54 2005 From: philipp.seibel at byteworxx.de (Philipp Seibel) Date: Sat, 12 Mar 2005 13:05:54 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment & BCScoreMatrix In-Reply-To: <75af9678365dab2c7425ca89ce6ce1a6@earthlink.net> References: <84c45d6c6a7125c879dfb4c4c929520d@earthlink.net> <75af9678365dab2c7425ca89ce6ce1a6@earthlink.net> Message-ID: <8cbb68c242987eb996b4dae50bc16581@byteworxx.de> > > On Mar 12, 2005, at 2:51 AM, Alexander Griekspoor wrote: > >> First, indeed their maybe a benefit if we implement a general >> algorithm class BCAlignmentAlgorithm or something and have subclasses >> that implement the different alignments (NW, Smith-W). > > Not sure if I like the name BCAlignmentAlgorithm, it screams for > typo's ;) We could have the result named BCAlignmentResult or > BCSequenceAlignment. > I agree with you. 
I like the BCSequenceAlignment version.

> - Koen.

From kvddrift at earthlink.net Sat Mar 12 07:17:39 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 12 Mar 2005 07:17:39 -0500
Subject: [Biococoa-dev] initWithSymbol
In-Reply-To: <3dd08ba9fc72202ce88532624cfaaa3b@nki.nl>
References: <51b1192b9b62ce475bd1069c70a0000e@earthlink.net> <3dd08ba9fc72202ce88532624cfaaa3b@nki.nl>
Message-ID: <5c1e3e75d8fc4041096bf4216cb78094@earthlink.net>

On Mar 12, 2005, at 6:52 AM, Alexander Griekspoor wrote:

> The problem is that you haven't told the compiler which class it
> should lookup the method in... (for instance: [(id)theObject
> initWithChar: 'a'] will give you the problem, [(BCSymbol *)theObject
> initWithChar: 'a'] will not).

So would it be safe to have it return BCSymbol instead of id? Or is there another solution, maybe use initWithSymbolChar?

- Koen.

From biococoa at bioworxx.com Sat Mar 12 07:25:34 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Sat, 12 Mar 2005 13:25:34 +0100
Subject: [Biococoa-dev] BCPairwiseAlignment
Message-ID: <7aa45cee71e384a9cb9648a971715087@bioworxx.com>

>> Not sure if I like the name BCAlignmentAlgorithm, it screams for
>> typo's ;) We could have the result named BCAlignmentResult or
>> BCSequenceAlignment.
>>
> I agree with you. I like the BCSequenceAlignment version.

So there is my new idea:

You can see alignment algorithms as convenience constructors of a BCAlignment (or BCSequenceAlignment), so we could have categories for BCAlignment like

@interface BCAlignment ( PairwiseAlignment )

+ (BCAlignment *)needlemanWunschAlignment......
+ (BCAlignment *)smithWatermanAlignment....

@end


@interface BCAlignment ( MultipleAlignment )

+ (BCAlignment *)clustalWAlignment.....

@end

so everybody who wants to add alignment algorithms does that in additions (categories).

Just another idea ....

Phil

From kvddrift at earthlink.net Sat Mar 12 07:38:10 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 12 Mar 2005 07:38:10 -0500
Subject: [Biococoa-dev] initWithSymbol
In-Reply-To: <5c1e3e75d8fc4041096bf4216cb78094@earthlink.net>
References: <51b1192b9b62ce475bd1069c70a0000e@earthlink.net> <3dd08ba9fc72202ce88532624cfaaa3b@nki.nl> <5c1e3e75d8fc4041096bf4216cb78094@earthlink.net>
Message-ID: <91169e07ad4764207f3459b072d6db97@earthlink.net>

On Mar 12, 2005, at 7:17 AM, Koen van der Drift wrote:

> So would it be safe to have it return BCSymbol instead of id? Or is
> there another solution, maybe use initWithSymbolChar?
>

I changed it to use initWithSymbolChar and also committed Charles' request to replace aaForSymbol and baseForSymbol with symbolForChar.

- Koen.

From a.griekspoor at nki.nl Sat Mar 12 10:32:14 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Sat, 12 Mar 2005 16:32:14 +0100
Subject: [Biococoa-dev] BCPairwiseAlignment
In-Reply-To: <7aa45cee71e384a9cb9648a971715087@bioworxx.com>
References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com>
Message-ID: <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl>

Why the need for such categories to add sequence algorithms, and not just put them in the BCSequenceAlignment class? The framework is completely open source, so people can just modify the framework, right? (It would also stimulate them to add their additions to the public framework)... I like the idea of class methods to instantiate different kinds of alignments though!
Cheers,
Alex

On 12-Mar-05, at 13:25, Philipp Seibel wrote:

>
>>> Not sure if I like the name BCAlignmentAlgorithm, it screams for
>>> typo's ;) We could have the result named BCAlignmentResult or
>>> BCSequenceAlignment.
>>>
>> I agree with you. I like the BCSequenceAlignment version.
> > So there is my new idea: > > You can see a alignment-algorithms as convenient constructors of a > BCAlignment (or BCSequenceAlignment), so we could have categories for > BCAlignment like > > @interface BCAlignment ( PairwiseAlignment ) > > + (BCAlignment *)needlemanWunschAlignment...... > + (BCAlignment *)smithWatermanAlignment.... > > @end > > > @interface BCAlignment ( MultipleAlignment ) > > + (BCAlignment *)clustalWAlignment..... > > @end > > so everybody who wants to add alignment algorithms do that in > additions (categories). > > Just another idea .... > > > Phil > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From kvddrift at earthlink.net Sat Mar 12 11:42:37 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 11:42:37 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> Message-ID: On Mar 12, 2005, at 10:32 AM, Alexander Griekspoor wrote: > Why the need for such categories to add sequence algorithms and not > just in the BCSequenceAlignment class? The framework is completely > opensource, so people can just modify the framework right (it would > also stimulate them to add their additions to the public framework)... 
> I like the idea of class methods to instantiate different kinds of > alignments though! > I agree with Alex, we should try to use subclasses as much as possible instead of categories. - Koen. From biococoa at bioworxx.com Sat Mar 12 12:20:16 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sat, 12 Mar 2005 18:20:16 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> Message-ID: <43fbb85737659e8877292fa32b98254c@bioworxx.com> Am 12.03.2005 um 17:42 schrieb Koen van der Drift: > > On Mar 12, 2005, at 10:32 AM, Alexander Griekspoor wrote: > >> Why the need for such categories to add sequence algorithms and not >> just in the BCSequenceAlignment class? The framework is completely >> opensource, so people can just modify the framework right (it would >> also stimulate them to add their additions to the public >> framework)... I like the idea of class methods to instantiate >> different kinds of alignments though! >> > > I agree with Alex, we should try to use subclasses as much as possible > instead of categories. > > - Koen. > if i understand alex right, he also wants to add class methods to the new BCSequenceAlignment ( which is the new BCAlignment class ???? or not ?? ). The categories are only to seperate the different algorithms, there is no need of course, but i think its better to administrate. But Koen if you have a logical class - subclass structure just tell .... 
Phil From kvddrift at earthlink.net Sat Mar 12 12:50:57 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 12:50:57 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <43fbb85737659e8877292fa32b98254c@bioworxx.com> References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> <43fbb85737659e8877292fa32b98254c@bioworxx.com> Message-ID: On Mar 12, 2005, at 12:20 PM, Philipp Seibel wrote: > > But Koen if you have a logical class - subclass structure just tell > .... > Uhm, I am not sure if I understand what you mean by that :) - Koen. From charles.parnot at stanford.edu Sat Mar 12 15:17:43 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 12 Mar 2005 12:17:43 -0800 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> <43fbb85737659e8877292fa32b98254c@bioworxx.com> Message-ID: At 12:50 PM -0500 3/12/05, Koen van der Drift wrote: >On Mar 12, 2005, at 12:20 PM, Philipp Seibel wrote: > >> >>But Koen if you have a logical class - subclass structure just tell .... >> > >Uhm, I am not sure if I understand what you mean by that :) > >- Koen. Allow me to jump in the discussion (and hopefully understand the issue). Subclasses should be used when it makes sense, when the design helps. And not just to separate chunks of code. Categories are not 'evil'. They are often warned against, but this is mostly when using categories on Apple's classes, because you don't know in which order they are loaded, and you don't know if your category might interfere with someone else's category, and which one will override which in case two methods have the same name. 
However, categories used inside your own code can be a nice way to cut your class in smaller chunks, both physically (different files), and logically (I believe the compiler will only recompile the category if only the catgory is modified, which makes for faster builds). For the user of our framework, it is completely transparent, as long as you put all the interfaces in the same header file, so that all the headers can be #import-ed at once. Does that fit with the ongoing discussion? charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sat Mar 12 15:53:04 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 15:53:04 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> <43fbb85737659e8877292fa32b98254c@bioworxx.com> Message-ID: <510d49fb7954b42cc8ac026086d18068@earthlink.net> On Mar 12, 2005, at 3:17 PM, Charles PARNOT wrote: > Categories are not 'evil'. They are often warned against, but this is > mostly when using categories on Apple's classes, because you don't > know in which order they are loaded, and you don't know if your > category might interfere with someone else's category, and which one > will override which in case two methods have the same name. > > However, categories used inside your own code can be a nice way to cut > your class in smaller chunks, both physically (different files), and > logically (I believe the compiler will only recompile the category if > only the catgory is modified, which makes for faster builds). 
For the > user of our framework, it is completely transparent, as long as you > put all the interfaces in the same header file, so that all the > headers can be #import-ed at once. > Thanks - I was under the impression that categories were intended to be used to extend private frameworks, such as Foundation. - Koen. From kvddrift at earthlink.net Sat Mar 12 20:00:19 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 20:00:19 -0500 Subject: [Biococoa-dev] BCSymbolSet done In-Reply-To: References: Message-ID: <5c721b382e7a10cddea2b6f3ac25c095@earthlink.net> On Mar 11, 2005, at 2:18 AM, Charles PARNOT wrote: > I commited an updated version of BCSymbolSet, which is now immutable, > can return an array of symbols, etc and other goodies. > Here is the header below, so you don't have to necessarily update your > project right now. > > Comments and questions welcome! > In: + (BCSymbolSet *)rnaStrictSymbolSet { if ( rnaStrictSymbolSetRepresentation == nil ) { rnaStrictSymbolSetRepresentation = [[BCSymbolSet alloc] initWithString:@"ACGT" Shouldn't that be initWithString:@"ACGU" ? Also for the rnaSymbolSet I think it should be initWithString:@"ACGURYMKSWHBVDN" - Koen. From kvddrift at earthlink.net Sat Mar 12 20:06:48 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 12 Mar 2005 20:06:48 -0500 Subject: [Biococoa-dev] adding new files In-Reply-To: References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> Message-ID: <0af5d607be63174f89a30f4bcc2995bf@earthlink.net> On Mar 11, 2005, at 12:34 AM, Charles PARNOT wrote: > In public headers, check that you do not use relative paths in your > #import statements. > Example: use #import "BCSequence.h" and not #import > "../BCSequence/BCSequence.h" if called from a file in another folder. > The compiler will find the header even without the correct path. > Wasn't this changed the other way around when Alex did a big file reorganization? 
From #import "BCSequence.h" to #import "../BCSequence/BCSequence.h"? If we can go back to just #import "BCSequence.h", I will make the changes in the existing files as well. - Koen. From a.griekspoor at nki.nl Sun Mar 13 08:41:49 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 14:41:49 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <510d49fb7954b42cc8ac026086d18068@earthlink.net> References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> <43fbb85737659e8877292fa32b98254c@bioworxx.com> <510d49fb7954b42cc8ac026086d18068@earthlink.net> Message-ID: <5a249cfa5d477ead72bde179efb16682@nki.nl> > On Mar 12, 2005, at 3:17 PM, Charles PARNOT wrote: > >> Categories are not 'evil'. They are often warned against, but this is >> mostly when using categories on Apple's classes, because you don't >> know in which order they are loaded, and you don't know if your >> category might interfere with someone else's category, and which one >> will override which in case two methods have the same name. >> >> However, categories used inside your own code can be a nice way to >> cut your class in smaller chunks, both physically (different files), >> and logically (I believe the compiler will only recompile the >> category if only the catgory is modified, which makes for faster >> builds). For the user of our framework, it is completely transparent, >> as long as you put all the interfaces in the same header file, so >> that all the headers can be #import-ed at once. >> > > Thanks - I was under the impression that categories were intended to > be used to extend private frameworks, such as Foundation. They are also. But they're used to separate code within the same .h/.m file as well. Phil, the problem Koen and I have as far as I can speak for the two of us, is that we don't really see the structure you have in mind with the categories. 
Charles is right, the class-subclass layering only has use if there's code that can be grouped together between in this case different algorithms. So, if there is lots of code to share between let's say a smith-waterman and needleman-wunsch alignment then it makes sense to create a superclass for all algorithms. If there's hardly any, than there's not much use. So to clear things up from my side: BCAlignment becomes BCSequenceAlignment, this is the net result of an alignment, some way to store them in the end Then the question is what do we call for instance a Smith-Waterman local alignment and does it make sense to let them derive from a BCAlignmentAlgorithm (I believe Koen had a better name) superclass. Finally, the categories. I'm not sure where this would fit in, but the reason I said to just incorporate the convenience method in the class was based on the idea that given a class named BCSmithWatermanAlgorithm (to mention a horrible name), it would just have a class method: + (BCSequenceAlignment *)alignmentOfSequences: (NSArray *)sequences criteria: (NSDictionary *) dict; (to mention another horrible method name). I don't really see where the categories come in and what structure Phil had in mind.. Does this make my remarks more clear? 
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From a.griekspoor at nki.nl Sun Mar 13 08:42:22 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 14:42:22 +0100 Subject: [Biococoa-dev] BCSymbolSet done In-Reply-To: <5c721b382e7a10cddea2b6f3ac25c095@earthlink.net> References: <5c721b382e7a10cddea2b6f3ac25c095@earthlink.net> Message-ID: Yep! On 13-mrt-05, at 2:00, Koen van der Drift wrote: > > On Mar 11, 2005, at 2:18 AM, Charles PARNOT wrote: > >> I commited an updated version of BCSymbolSet, which is now immutable, >> can return an array of symbols, etc and other goodies. >> Here is the header below, so you don't have to necessarily update >> your project right now. >> >> Comments and questions welcome! >> > > In: > > + (BCSymbolSet *)rnaStrictSymbolSet > { > if ( rnaStrictSymbolSetRepresentation == nil ) { > rnaStrictSymbolSetRepresentation = [[BCSymbolSet alloc] > initWithString:@"ACGT" > > > Shouldn't that be initWithString:@"ACGU" ? > > Also for the rnaSymbolSet I think it should be > initWithString:@"ACGURYMKSWHBVDN" > > > - Koen. 
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From a.griekspoor at nki.nl Sun Mar 13 08:43:27 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 14:43:27 +0100 Subject: [Biococoa-dev] adding new files In-Reply-To: <0af5d607be63174f89a30f4bcc2995bf@earthlink.net> References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> <0af5d607be63174f89a30f4bcc2995bf@earthlink.net> Message-ID: <478639497796f06d4dafe42c368f9637@nki.nl> > Wasn't this changed the other way around when Alex did a big file > reorganization? From #import "BCSequence.h" to #import > "../BCSequence/BCSequence.h"? If we can go back to just #import > "BCSequence.h", I will make the changes in the existing files as well. I can't remember why at the time I had to use this approach to make it work, it just was the case. But if the simple import works, yes please change it! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! 
http://www.mekentosj.com/labassistant ********************************************************* From biococoa at bioworxx.com Sun Mar 13 08:58:31 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 14:58:31 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <5a249cfa5d477ead72bde179efb16682@nki.nl> References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> <43fbb85737659e8877292fa32b98254c@bioworxx.com> <510d49fb7954b42cc8ac026086d18068@earthlink.net> <5a249cfa5d477ead72bde179efb16682@nki.nl> Message-ID: <7a579e0566b0b8748b9deb8d16e4a775@bioworxx.com> >> > They are also. But they're used to separate code within the same .h/.m > file as well. > Phil, the problem Koen and I have as far as I can speak for the two of > us, is that we don't really see the structure you have in mind with > the categories. > Charles is right, the class-subclass layering only has use if there's > code that can be grouped together between in this case different > algorithms. So, if there is lots of code to share between let's say a > smith-waterman and needleman-wunsch alignment then it makes sense to > create a superclass for all algorithms. If there's hardly any, than > there's not much use. > So to clear things up from my side: > BCAlignment becomes BCSequenceAlignment, this is the net result of an > alignment, some way to store them in the end > Then the question is what do we call for instance a Smith-Waterman > local alignment and does it make sense to let them derive from a > BCAlignmentAlgorithm (I believe Koen had a better name) superclass. Ok, i don't see anything a multiple alignment and a pairwise alignment have in common. Ok clustal uses Pairwise alignments to compute a non optimal multiple alignment for example, but we don't need a super - subclass structure. > Finally, the categories. 
I'm not sure where this would fit in, but the > reason I said to just incorporate the convenience method in the class > was based on the idea that given a class named > BCSmithWatermanAlgorithm (to mention a horrible name), it would just > have a class method: > + (BCSequenceAlignment *)alignmentOfSequences: (NSArray *)sequences > criteria: (NSDictionary *) dict; (to mention another horrible method > name). > I don't really see where the categories come in and what structure > Phil had in mind.. Does this make my remarks more clear? My idea was just to put all alignment algorithms into BCSequenceAlignment as convenient methods. The BCSequenceAlignment can represent multiple & pairwise alignments, so the categories just came in to make it more readable, nothing else ;-) Hope this was understandable Phil From a.griekspoor at nki.nl Sun Mar 13 09:01:11 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 15:01:11 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <7a579e0566b0b8748b9deb8d16e4a775@bioworxx.com> References: <7aa45cee71e384a9cb9648a971715087@bioworxx.com> <2fd0daddee2b9c3ae9e8eea732bcccd6@nki.nl> <43fbb85737659e8877292fa32b98254c@bioworxx.com> <510d49fb7954b42cc8ac026086d18068@earthlink.net> <5a249cfa5d477ead72bde179efb16682@nki.nl> <7a579e0566b0b8748b9deb8d16e4a775@bioworxx.com> Message-ID: > My idea was just to put all alignment algorithms into > BCSequenceAlignment as convenient methods. The BCSequenceAlignment can > represent multiple & pairwise alignments, so > the categories just came in to make it more readable, nothing else ;-) > > Hope this was understandable Yep, got it, so we're talking about the categories within the same .h file right? 
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From philipp.seibel at byteworxx.de Sun Mar 13 09:12:38 2005 From: philipp.seibel at byteworxx.de (Philipp Seibel) Date: Sun, 13 Mar 2005 15:12:38 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment Message-ID: Am 13.03.2005 um 15:01 schrieb Alexander Griekspoor: >> My idea was just to put all alignment algorithms into >> BCSequenceAlignment as convenient methods. The BCSequenceAlignment >> can represent multiple & pairwise alignments, so >> the categories just came in to make it more readable, nothing else ;-) >> >> Hope this was understandable > Yep, got it, so we're talking about the categories within the same .h > file right? Yes thats it. What do you think about it. Btw: How can i change the BCAlignment to BCSequenceAlignment. Should i create new files ?? Phil From a.griekspoor at nki.nl Sun Mar 13 11:23:41 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 17:23:41 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: Message-ID: <631ef0313b51836eede83b52fcbbb725@nki.nl> >> Yep, got it, so we're talking about the categories within the same .h >> file right? > Yes thats it. What do you think about it. Now that I understand what you meant, it sounds good phil! > Btw: How can i change the BCAlignment to BCSequenceAlignment. Should i > create new files ?? You can even do that in XCode, it will handle the delete and creation. 
Right click the file and rename it, in the SCM window the changes will take place upon a commit. Cheers, Alex > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows vs Mac 65 million years ago, there were more dinosaurs than humans. Where are the dinosaurs now? ********************************************************* From kvddrift at earthlink.net Sun Mar 13 12:05:59 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 13 Mar 2005 12:05:59 -0500 Subject: [Biococoa-dev] adding new files In-Reply-To: <478639497796f06d4dafe42c368f9637@nki.nl> References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> <0af5d607be63174f89a30f4bcc2995bf@earthlink.net> <478639497796f06d4dafe42c368f9637@nki.nl> Message-ID: On Mar 13, 2005, at 8:43 AM, Alexander Griekspoor wrote: > I can't remember why at the time I had to use this approach to make it > work, it just was the case. But if the simple import works, yes please > change it! > Fixed. - Koen. From kvddrift at earthlink.net Sun Mar 13 12:06:20 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 13 Mar 2005 12:06:20 -0500 Subject: [Biococoa-dev] BCSymbolSet done In-Reply-To: References: <5c721b382e7a10cddea2b6f3ac25c095@earthlink.net> Message-ID: On Mar 13, 2005, at 8:42 AM, Alexander Griekspoor wrote: >> >> Shouldn't that be initWithString:@"ACGU" ? >> >> Also for the rnaSymbolSet I think it should be >> initWithString:@"ACGURYMKSWHBVDN" >> Fixed. - Koen. 
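The corrected RNA symbol strings from this exchange can be sanity-checked with a small plain-C sketch. It is only an illustration under stated assumptions: `kRnaStrictSymbols`, `kRnaSymbols` and `sequence_uses_only` are hypothetical names and are not part of the BCSymbolSet API; only the two strings come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Symbol strings as corrected in the thread (U instead of T for RNA).
   The constant names are hypothetical, not BioCocoa identifiers. */
const char kRnaStrictSymbols[] = "ACGU";
const char kRnaSymbols[] = "ACGURYMKSWHBVDN";

/* Returns 1 if every character of seq occurs in allowed, 0 otherwise.
   An empty sequence is treated as valid. The check is case-sensitive. */
int sequence_uses_only(const char *seq, const char *allowed)
{
    assert(seq != NULL && allowed != NULL);
    /* strspn gives the length of the leading run made only of allowed chars */
    return strspn(seq, allowed) == strlen(seq);
}
```

With the original "ACGT" string, a strict RNA sequence containing U would have been rejected and one containing T accepted, which is the bug reported above.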
From biococoa at bioworxx.com Sun Mar 13 12:28:15 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 18:28:15 +0100 Subject: [Biococoa-dev] BCFoundation.h Message-ID: Hi, someone has changed the BCFoundation header in a wrong way. We have to write "MyHeader.h" instead of because we're inside the framework. So please change it ( the one who does the changes ). Phil From biococoa at bioworxx.com Sun Mar 13 13:07:49 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 19:07:49 +0100 Subject: [Biococoa-dev] BCFoundation.h In-Reply-To: References: Message-ID: On 13.03.2005, at 18:28, Philipp Seibel wrote: > Hi, > > someone has changed the BCFoundation header in a wrong way. We have to > write "MyHeader.h" instead of because we're > inside the framework. So please change it ( the one who does the > changes ). > > Phil > Sorry, I was wrong ;-). My fault. Phil From biococoa at bioworxx.com Sun Mar 13 14:46:07 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 20:46:07 +0100 Subject: [Biococoa-dev] BCSymbolMapping Message-ID: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> Hi, I'd like to implement Charles's BCSymbolMapping for the alignment algorithms, so we have to discuss the details. I think we should implement a class called BCSymbolMapping with methods like - (int)intMappingForSymbol:(BCSymbol *)symbol; - (char)charMappingForSymbol:(BCSymbol *)symbol; - (NSRange)rangeForCharMapping; - (NSRange)rangeForIntMapping; Every symbolSet class gets a method like - (BCSymbolMapping *)symbolMapping; and that's it *g* Any ideas how to implement the mapping efficiently? So the discussion is declared open ;-).
Phil From kvddrift at earthlink.net Sun Mar 13 15:58:21 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 13 Mar 2005 15:58:21 -0500 Subject: [Biococoa-dev] BCSymbolMapping In-Reply-To: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> Message-ID: <9071d0e5af6a4017dde03aee194c849c@earthlink.net> Philipp, Could you expand a little on what this class would do? We already have BCSymbol methods to get a char for each symbol, so I am not sure what additional advantages your BCSymbolMapping class would have. - Koen. On Mar 13, 2005, at 2:46 PM, Philipp Seibel wrote: > Hi, > > i'd like to implement charles BCSymbolMapping for the alignment > algorithms, so we have to discuss the details. > > I think we should implement a class called BCSymbolMapping with > methods like > > - (int)intMappingForSymbol:(BCSymbol *)symbol; > - (char)charMappingForSymbol:(BCSymbol *)symbol; > > - (NSRange)rangeForCharMapping; > - (NSRange)rangeForIntMapping; > > Every symbolSet class gets a method like > > - (BCSymbolMapping *)symbolMapping; > > and thats it *g* > > any ideas how to implement the mapping efficiently ? > so the discussion is declared open ;-). > > Phil > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From biococoa at bioworxx.com Sun Mar 13 16:15:43 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 22:15:43 +0100 Subject: [Biococoa-dev] BCSymbolMapping In-Reply-To: <9071d0e5af6a4017dde03aee194c849c@earthlink.net> References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> <9071d0e5af6a4017dde03aee194c849c@earthlink.net> Message-ID: <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> Am 13.03.2005 um 21:58 schrieb Koen van der Drift: > Philipp, > > Could you expand a little on what this class would do? 
We already have > BCSymbol methods to get a char for each symbol, so I am not sure what > additional advantages your BCSymbolMapping class would have. > > > - Koen. > Sure Koen, Charles wrote ( in the thread "symbol mapping for optimization", just look in the list ;-) ): I think what is really not obvious at first, and that can be confusing, is the separation between (1) and (2). It seems obvious that a char should be the char corresponding to the BCSymbol, for instance base 'A' should be mapped to char 'A'. Maybe we will do that initially but we want to be able to modify that in the future, or even to have more dynamic mapping depending on the context. For instance, we might find later that mapping the bases ATGC to the chars '0x00-0x01-0x02-0x03' is much better than mapping to the 'ATGC' chars, because we don't have useless chars in between each used char. We then just have to modify the code in (2), and probably only one or two lines of code, to propagate whatever optimization we make in the translation to the whole framework. Phil ( that's me ;-) ): So we need an optimal mapping for special SymbolSets. For example ATCG should map to 0, 1, 2, 3 to get the best mapping for algorithms. If we take the current char method we would get the int representation of a special character ( e.g. A = 'A' = (int)'A' = don't know the ASCII number ;-) ), but that's not what we need. So amino acids should be mapped to 0...22 and nucleotides should be mapped to 0...3.
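A minimal plain-C sketch of the compact mapping described above, where the symbols of a set are numbered 0, 1, 2, ... in the order given and upper- and lowercase letters share one index. Everything here is an assumption made for illustration: `SymbolMap`, `symbol_map_init`, `symbol_map_index` and `BC_UNMAPPED` are hypothetical names, not the proposed BCSymbolMapping interface.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define BC_UNMAPPED (-1)

/* Hypothetical lookup table: ASCII code -> compact index, or BC_UNMAPPED. */
typedef struct {
    signed char index[128];
    int count;               /* number of distinct symbols mapped */
} SymbolMap;

/* Gives the k-th distinct character of alphabet the index k.
   Both cases of a letter receive the same index. */
void symbol_map_init(SymbolMap *map, const char *alphabet)
{
    const char *p;
    assert(map != NULL && alphabet != NULL);
    memset(map->index, BC_UNMAPPED, sizeof map->index);
    map->count = 0;
    for (p = alphabet; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c >= 128 || map->index[c] != BC_UNMAPPED)
            continue;        /* skip non-ASCII and repeated characters */
        map->index[toupper(c)] = (signed char)map->count;
        map->index[tolower(c)] = (signed char)map->count;
        map->count++;
    }
}

/* Returns the compact index of symbol, or BC_UNMAPPED if it is not in the set. */
int symbol_map_index(const SymbolMap *map, char symbol)
{
    unsigned char c = (unsigned char)symbol;
    assert(map != NULL);
    return (c < 128) ? map->index[c] : BC_UNMAPPED;
}
```

Initialised with "ACGT", the map gives 'A' and 'a' the index 0 and 'T' the index 3, so an algorithm that works on indices could use a 4x4 score table and would not need its own case handling.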
Hope you got it, feel free to ask again ;-) Phil From a.griekspoor at nki.nl Sun Mar 13 16:41:22 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 22:41:22 +0100 Subject: [Biococoa-dev] BCSymbolMapping In-Reply-To: <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> <9071d0e5af6a4017dde03aee194c849c@earthlink.net> <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> Message-ID: <34412a3bfff00e8ec2e027e1dd265c44@nki.nl> Hmmm, somehow I totally miss the reason for the remapping. Why would it be leaner/faster? What's the difference between: char c = ('a' == 'a') ? 'I' : 'X'; and: char c = (0x00 == 0x00) ? 'I' : 'X'; So in the example I borrowed from the sample code I used previously, the substitution matrix is a simple 128x128 char array and the characters are placed at their own spot. > match = 1; > mismh = -1; > /* set match and mismatch weights */ > for ( i = 0; i < 128 ; i++ ) > for ( j = 0; j < 128 ; j++ ) > if (i == j ) v[i][j] = match; > else v[i][j] = mismh; > > v['N']['N'] = mismh; > v['n']['n'] = mismh; > v['A']['a'] = v['a']['A'] = match; > v['C']['c'] = v['c']['C'] = match; > v['G']['g'] = v['g']['G'] = match; > v['T']['t'] = v['t']['T'] = match; > > So, you simply build a 128x128 char matrix using the fact that chars > are ints > Next to calculate the score: > > char *a = A[++i]; // character i in sequence A > char *b = B[++j]; // character j in sequence B > char *c++ = (*a == *b || isdna && v[*a][*b] == MATCHSC ) ? '|' : ' '; So again, if we convert the sequences to char arrays why the remap? In the sample code above this 128x128 matrix is instantiated only once, takes up hardly any memory and avoids the time needed for the remap! So why the hassle for the few unused spots in the matrix? Is it really worth all the trouble going from a 128x128 array (we're talking about 16 KB of RAM!) to a 16x16 array or so?
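For comparison, a self-contained C sketch of the 128x128 approach quoted above. It follows the quoted sample (identity matches, case-insensitive matches for A, C, G and T, and N never matching) but is not the original code: `ScoreMatrix`, `score_matrix_init`, `score_matrix_score` and `match_line` are hypothetical names, and wrapping the array in a struct is a choice made here, not something the sample does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ALPHABET_SIZE 128
#define MATCH_SCORE 1
#define MISMATCH_SCORE (-1)

/* One score per pair of ASCII characters: 128 * 128 = 16384 one-byte entries. */
typedef struct {
    signed char v[ALPHABET_SIZE][ALPHABET_SIZE];
} ScoreMatrix;

/* Fills the matrix in the same way as the quoted sample. */
void score_matrix_init(ScoreMatrix *m)
{
    static const char bases[] = "ACGT";
    const char *p;
    int i, j;
    assert(m != NULL);
    for (i = 0; i < ALPHABET_SIZE; i++)
        for (j = 0; j < ALPHABET_SIZE; j++)
            m->v[i][j] = (i == j) ? MATCH_SCORE : MISMATCH_SCORE;
    /* the ambiguity code N is never counted as a match */
    m->v['N']['N'] = MISMATCH_SCORE;
    m->v['n']['n'] = MISMATCH_SCORE;
    /* upper- and lowercase forms of the same base match each other */
    for (p = bases; *p != '\0'; p++) {
        unsigned char up = (unsigned char)*p;
        unsigned char lo = (unsigned char)tolower(up);
        m->v[up][lo] = m->v[lo][up] = MATCH_SCORE;
    }
}

/* Looks up the score for a pair; characters outside ASCII score as mismatches. */
int score_matrix_score(const ScoreMatrix *m, char x, char y)
{
    unsigned char ux = (unsigned char)x, uy = (unsigned char)y;
    assert(m != NULL);
    if (ux >= ALPHABET_SIZE || uy >= ALPHABET_SIZE)
        return MISMATCH_SCORE;
    return m->v[ux][uy];
}

/* Writes '|' where a[k] and b[k] match and ' ' elsewhere; out must hold n + 1 chars. */
void match_line(const ScoreMatrix *m, const char *a, const char *b,
                size_t n, char *out)
{
    size_t k;
    assert(a != NULL && b != NULL && out != NULL);
    for (k = 0; k < n; k++)
        out[k] = (score_matrix_score(m, a[k], b[k]) == MATCH_SCORE) ? '|' : ' ';
    out[n] = '\0';
}
```

In this version the case handling lives in the table itself, which is the point of contention in the thread: it costs 16384 bytes once, while a remapped alphabet would move that knowledge into the symbol classes instead.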
I understand the conversion from BCSequence to char-array, but that can still be done with the normal chars right? Or is the idea that when we do the conversion we can do the remap along? I'm just worried that the code won't be easier to understand and much more error prone if we're have to remap everything all the time. And Koen has a point, can we just add the method charRepresentation in BCSequence for instance, which does the translation job (and sequenceFromCharArray) or something. No need for a translation object right? Again, perhaps I'm taking to many steps in the wrong direction at once... Alex On 13-mrt-05, at 22:15, Philipp Seibel wrote: > > Am 13.03.2005 um 21:58 schrieb Koen van der Drift: > >> Philipp, >> >> Could you expand a little on what this class would do? We already >> have BCSymbol methods to get a char for each symbol, so I am not sure >> what additional advantages your BCSymbolMapping class would have. >> >> >> - Koen. >> > sure koen, > > charles wrote ( in Thread "symbol mapping for optimization" just look > in the list ;-) ): > > I think what is really not obvious at first, and that can be > confusing, is the separation between (1) and (2). It seems obvious > that a char should be the char corresponding to the BCSymbol, for > instance base 'A' should be mapped to char 'A'. Maybe we will do that > initially but we want to be able to modify that in the future, or even > to have more dynamic mapping depending on the context. For instance, > we might find later that mapping the bases ATGC to teh chars > '0x00-0x01-0x02-0x03' is much better than mapping to the 'ATGC' chars, > because we don't have useless chars in between each used char. We then > just have to modify the code in (2), and probably only one or two > lines of code, to propagate whatever optimization we make in the > translation to the whole framework. > > phil ( thats me ;-) ): > > So we need an optimal mapping for special SymbolSets. 
For example ATCG > should map to 0, 1, 2, 3 to get the best mapping for algorithms. If we > take the current char method we would get the int representation of a > special character ( e.g. A = 'A' = (int)'A' = don't know the ASCII > number ;-) ), but that's not what we need. > So amino acids should be mapped to 0...22 and nucleotides should be > mapped to 0...3. > > Hope you got it, feel free to ask again ;-) > > Phil > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com The requirements said: Windows 2000 or better. So I got a Macintosh. ********************************************************* From biococoa at bioworxx.com Sun Mar 13 16:55:02 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 22:55:02 +0100 Subject: [Biococoa-dev] BCSymbolMapping In-Reply-To: <34412a3bfff00e8ec2e027e1dd265c44@nki.nl> References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> <9071d0e5af6a4017dde03aee194c849c@earthlink.net> <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> <34412a3bfff00e8ec2e027e1dd265c44@nki.nl> Message-ID: On 13.03.2005, at 22:41, Alexander Griekspoor wrote: > Hmmm, somehow I totally miss the reason for the remapping. Why would it be > leaner/faster? It would not be faster, but more flexible, because we map the symbols to the minimal set of ints.
Not only for perfomance or memory optimization. The next problem is the handling only through the char method, because we need to check for uppercase or other semantic things, where the algorithm is not really responsible for. These things should be handled by the BCSymbol stuff. For example a 'a' and 'A' should be mapped to the same int int the dna symbol class. > What's the difference between: > char c = ('a' == 'a') ? 'I' : 'X'; > and: > char c = ('0x00' == '0x00') ? 'I' : 'X'; > So in the example I lend from the sample code I used previously > already, the substitution matrix is a simple 128x128 char array and > the characters are placed at their own spot. > >> match = 1; >> mismh = -1; >> /* set match and mismatch weights */ >> for ( i = 0; i < 128 ; i++ ) >> for ( j = 0; j < 128 ; j++ ) >> if (i == j ) v[i][j] = match; >> else v[i][j] = mismh; >> >> v['N']['N'] = mismh; >> v['n']['n'] = mismh; >> v['A']['a'] = v['a']['A'] = match; >> v['C']['c'] = v['c']['C'] = match; >> v['G']['g'] = v['g']['G'] = match; >> v['T']['t'] = v['t']['T'] = match; >> >> So, you simply build a 128x128 char matrix using the fact that chars >> are ints >> Next to calculate the score: >> >> char *a = A[++i]; // character i in sequence A >> char *b = B[++j]; // character j in sequence B >> char *c++ = (*a == *b || isdna && v[*a][*b] == MATCHSC ) ? '|' : ' '; > > So again, if we convert the sequences to char arrays why the remap? In > the sample code above this 128x128 matrix is instantiated only once, > takes up hardly any memory and prevents the time needed for the remap! > So why the hassle for the few unused spots in the matrix? It it really > worth all the trouble going from a 128x128 array (we're talking about > 16Kb of RAM!) to a 16x16 array or so? > I understand the conversion from BCSequence to char-array, but that > can still be done with the normal chars right? Or is the idea that > when we do the conversion we can do the remap along? 
> I'm just worried that the code will be harder to understand and much
> more error-prone if we have to remap everything all the time.
> And Koen has a point: can we just add a method charRepresentation in
> BCSequence, for instance, which does the translation job (and
> sequenceFromCharArray) or something? No need for a translation object,
> right?
> Again, perhaps I'm taking too many steps in the wrong direction at
> once...

Phil

From a.griekspoor at nki.nl Sun Mar 13 17:11:25 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Sun, 13 Mar 2005 23:11:25 +0100
Subject: [Biococoa-dev] BCSymbolMapping
In-Reply-To:
References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> <9071d0e5af6a4017dde03aee194c849c@earthlink.net> <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> <34412a3bfff00e8ec2e027e1dd265c44@nki.nl>
Message-ID:

> On 13.03.2005 at 22:41, Alexander Griekspoor wrote:
>
>> Hmmm, somehow I totally miss the reason for the remapping. Why would it
>> be leaner/faster?
>
> It would not be faster, but more flexible, because we map the symbols
> to the minimal set of ints.

An int is 4 times the size of a char, so there goes part of the optimization. And I don't see why it would be more flexible: a basic 128x128 array covers all the ASCII characters that you want.

> Not only for performance or memory optimization.

All that trouble to avoid using 16 KB (a 128x128 char matrix)?! And it's only allocated once!
As far as a 500-nucleotide char array goes, it will be just as big in memory whether it holds 'ACGT' or '0x00 0x01 0x02 0x03'. And which code is easier to read? Also, the remapping will come at a cost (not much, but hey, more code is more time and more errors).
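
The sizes argued about here are easy to check. The following is only a minimal C sketch, assuming a 1-byte char and (for the larger figure) a 4-byte score type; `score_matrix_bytes` and `fill_ascii_matrix` are illustrative names, not BioCocoa API. The fill routine mirrors the match/mismatch sample quoted in this thread, with the matrix indexed directly by ASCII code.

```c
#include <stddef.h>

/* Number of 7-bit ASCII codes; the matrix is indexed directly by char value. */
enum { ASCII_SIZE = 128 };

/* Bytes needed for a square score matrix with `side` rows and columns,
   where each cell occupies `cell_size` bytes. */
size_t score_matrix_bytes(size_t side, size_t cell_size)
{
    return side * side * cell_size;
}

/* Fill an ASCII-indexed matrix: identical characters match, everything
   else mismatches, 'N' never matches itself, and upper/lower case
   variants of A, C, G, T match each other. */
void fill_ascii_matrix(signed char v[ASCII_SIZE][ASCII_SIZE],
                       signed char match, signed char mismatch)
{
    static const char bases[] = "ACGT";
    int i, j;

    for (i = 0; i < ASCII_SIZE; i++)
        for (j = 0; j < ASCII_SIZE; j++)
            v[i][j] = (i == j) ? match : mismatch;

    v['N']['N'] = mismatch;
    v['n']['n'] = mismatch;

    for (i = 0; bases[i] != '\0'; i++) {
        int upper = bases[i];
        int lower = upper + ('a' - 'A');
        v[upper][lower] = match;
        v[lower][upper] = match;
    }
}
```

Under these assumptions, `score_matrix_bytes(128, 1)` is 16384 bytes (the 16 KB mentioned above), a 4-byte score type gives 65536 bytes, and a 4x4 char matrix for a remapped ACGT alphabet would take 16 bytes. Whether that difference matters in practice is exactly what the thread leaves open.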
> The next problem is the handling only through the char method, because > we need to check for uppercase or other semantic things, where the > algorithm is not really responsible for. No we do not have to, because we know what char each symbol will return. The symbol templates dictate that (currently uppercase)! The proposed -charArrayRepresentation (or something alike) method in the BCSequence superclass will simply itterate over the symbols and ask each one for it's symbol via the - (unichar) symbol; method. For the otherway around we should just add an initFromCharArray or somthing to BCSequence. > These things should be handled by the BCSymbol stuff. For example a > 'a' and 'A' should be mapped to the same int int the dna symbol class. Well, you can store 4 variants of a char in the space of one int ;-) But again that's not an issue, see above. > > >> What's the difference between: >> char c = ('a' == 'a') ? 'I' : 'X'; >> and: >> char c = ('0x00' == '0x00') ? 'I' : 'X'; >> So in the example I lend from the sample code I used previously >> already, the substitution matrix is a simple 128x128 char array and >> the characters are placed at their own spot. >> >>> match = 1; >>> mismh = -1; >>> /* set match and mismatch weights */ >>> for ( i = 0; i < 128 ; i++ ) >>> for ( j = 0; j < 128 ; j++ ) >>> if (i == j ) v[i][j] = match; >>> else v[i][j] = mismh; >>> >>> v['N']['N'] = mismh; >>> v['n']['n'] = mismh; >>> v['A']['a'] = v['a']['A'] = match; >>> v['C']['c'] = v['c']['C'] = match; >>> v['G']['g'] = v['g']['G'] = match; >>> v['T']['t'] = v['t']['T'] = match; >>> >>> So, you simply build a 128x128 char matrix using the fact that chars >>> are ints >>> Next to calculate the score: >>> >>> char *a = A[++i]; // character i in sequence A >>> char *b = B[++j]; // character j in sequence B >>> char *c++ = (*a == *b || isdna && v[*a][*b] == MATCHSC ) ? '|' : ' >>> '; >> >> So again, if we convert the sequences to char arrays why the remap? 
>> In the sample code above this 128x128 matrix is instantiated only >> once, takes up hardly any memory and prevents the time needed for the >> remap! So why the hassle for the few unused spots in the matrix? It >> it really worth all the trouble going from a 128x128 array (we're >> talking about 16Kb of RAM!) to a 16x16 array or so? >> I understand the conversion from BCSequence to char-array, but that >> can still be done with the normal chars right? Or is the idea that >> when we do the conversion we can do the remap along? I'm just worried >> that the code won't be easier to understand and much more error prone >> if we're have to remap everything all the time. >> And Koen has a point, can we just add the method charRepresentation >> in BCSequence for instance, which does the translation job (and >> sequenceFromCharArray) or something. No need for a translation object >> right? >> Again, perhaps I'm taking to many steps in the wrong direction at >> once... > > > > Phil > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: text/enriched Size: 6203 bytes Desc: not available URL: From biococoa at bioworxx.com Sun Mar 13 17:18:53 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sun, 13 Mar 2005 23:18:53 +0100 Subject: [Biococoa-dev] BCSymbolMapping In-Reply-To: References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> <9071d0e5af6a4017dde03aee194c849c@earthlink.net> <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> <34412a3bfff00e8ec2e027e1dd265c44@nki.nl> Message-ID: ok you won ;-). I just want to finish one version ;-) Phil Am 13.03.2005 um 23:11 schrieb Alexander Griekspoor: > >> Am 13.03.2005 um 22:41 schrieb Alexander Griekspoor: >> >>> Hmmm, somehow I totally miss the reason the remapping. Why would it >>> be leaner/faster? >> >> It would not be faster, but more flexible, because we map the symbols >> to the minimal set of ints. > An int is 4 times the size of a char so there goes part of the > optimization. And why it would be more flexible I don't see, a basic > 128x128 array features all the ascii characters that you want. > >> Not only for perfomance or memory optimization. > All the trouble to save from using 16kb (a 128x128 char matrix)?! And > it's only allocated once! > As far as a 500 nucleotide char array goes, it will be just as big in > memory if it is: 'ACGT' as '0x00 0x01 0x02 0x03'. And what code is > easier to read? Also the remapping will come with cost (not much but > hey, more code is more time and more errors). > >> The next problem is the handling only through the char method, >> because we need to check for uppercase or other semantic things, >> where the algorithm is not really responsible for. > No we do not have to, because we know what char each symbol will > return. The symbol templates dictate that (currently uppercase)! 
> The proposed -charArrayRepresentation (or something alike) method in > the BCSequence superclass will simply itterate over the symbols and > ask each one for it's symbol via the - (unichar) symbol; method. For > the otherway around we should just add an initFromCharArray or > somthing to BCSequence. > >> These things should be handled by the BCSymbol stuff. For example a >> 'a' and 'A' should be mapped to the same int int the dna symbol >> class. > Well, you can store 4 variants of a char in the space of one int ;-) > But again that's not an issue, see above. >> >> >>> What's the difference between: >>> char c = ('a' == 'a') ? 'I' : 'X'; >>> and: >>> char c = ('0x00' == '0x00') ? 'I' : 'X'; >>> So in the example I lend from the sample code I used previously >>> already, the substitution matrix is a simple 128x128 char array and >>> the characters are placed at their own spot. >>> >>>> match = 1; >>>> mismh = -1; >>>> /* set match and mismatch weights */ >>>> for ( i = 0; i < 128 ; i++ ) >>>> for ( j = 0; j < 128 ; j++ ) >>>> if (i == j ) v[i][j] = match; >>>> else v[i][j] = mismh; >>>> >>>> v['N']['N'] = mismh; >>>> v['n']['n'] = mismh; >>>> v['A']['a'] = v['a']['A'] = match; >>>> v['C']['c'] = v['c']['C'] = match; >>>> v['G']['g'] = v['g']['G'] = match; >>>> v['T']['t'] = v['t']['T'] = match; >>>> >>>> So, you simply build a 128x128 char matrix using the fact that >>>> chars are ints >>>> Next to calculate the score: >>>> >>>> char *a = A[++i]; // character i in sequence A >>>> char *b = B[++j]; // character j in sequence B >>>> char *c++ = (*a == *b || isdna && v[*a][*b] == MATCHSC ) ? '|' : ' >>>> '; >>> >>> So again, if we convert the sequences to char arrays why the remap? >>> In the sample code above this 128x128 matrix is instantiated only >>> once, takes up hardly any memory and prevents the time needed for >>> the remap! So why the hassle for the few unused spots in the matrix? 
>>> It it really worth all the trouble going from a 128x128 array (we're >>> talking about 16Kb of RAM!) to a 16x16 array or so? >>> I understand the conversion from BCSequence to char-array, but that >>> can still be done with the normal chars right? Or is the idea that >>> when we do the conversion we can do the remap along? I'm just >>> worried that the code won't be easier to understand and much more >>> error prone if we're have to remap everything all the time. >>> And Koen has a point, can we just add the method charRepresentation >>> in BCSequence for instance, which does the translation job (and >>> sequenceFromCharArray) or something. No need for a translation >>> object right? >>> Again, perhaps I'm taking to many steps in the wrong direction at >>> once... -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 5461 bytes Desc: not available URL: From a.griekspoor at nki.nl Sun Mar 13 17:33:27 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Sun, 13 Mar 2005 23:33:27 +0100 Subject: [Biococoa-dev] BCSymbolMapping In-Reply-To: References: <26f52783c4c18ccecd386437ab51bca9@bioworxx.com> <9071d0e5af6a4017dde03aee194c849c@earthlink.net> <9efcefd6f3ada9c797a2d118c5453f5c@bioworxx.com> <34412a3bfff00e8ec2e027e1dd265c44@nki.nl> Message-ID: <79b6e48ad6fd1e5cf53ab65354db1ea2@nki.nl> you asked for a debate, you got one ;-) Well, discussion instead of debate perhaps, the others might still finish me off ;-) Cheers, Alex On 13-mrt-05, at 23:18, Philipp Seibel wrote: > ok you won ;-). I just want to finish one version ;-) > > Phil > > Am 13.03.2005 um 23:11 schrieb Alexander Griekspoor: > >> >>> Am 13.03.2005 um 22:41 schrieb Alexander Griekspoor: >>> >>>> Hmmm, somehow I totally miss the reason the remapping. Why would it >>>> be leaner/faster? >>> >>> It would not be faster, but more flexible, because we map the >>> symbols to the minimal set of ints. 
> _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 6522 bytes Desc: not available URL: From kvddrift at earthlink.net Sun Mar 13 19:49:44 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 13 Mar 2005 19:49:44 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: Message-ID: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> On Mar 13, 2005, at 9:12 AM, Philipp Seibel wrote: > Btw: How can i change the BCAlignment to BCSequenceAlignment. Should i > create new files ?? > Should BCPairwiseAlignment still be in the framework? I noticed you removed it from BCFoundation.h, but it is still in the project. - Koen. From kvddrift at earthlink.net Sun Mar 13 20:40:57 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 13 Mar 2005 20:40:57 -0500 Subject: [Biococoa-dev] BCScanner Message-ID: <15ff215188b6f2477d50e97fb09cee9d@earthlink.net> Hi, Because it is needed for the BCDigest class, I started working on the BCScanner class. First method I attempted was - (BOOL)scanSequence:(BCAbstractSequence*)subSequence intoSequence:(BCAbstractSequence **)value. Trying to emulate what NSScanner does with a string, I am now using BCToolSequenceFinder to find the first occurance of the passed sequence. 
Because the BCScanner is probably going to be called a number of times in succession, I doubt that using the BCToolSequenceFinder in this way is the most efficient. Maybe the BCToolSequenceFinder should be an ivar of the class? That would probably also be a better way to keep track of the scanLocation.

I committed my first attempt, so have a look and let me know what you think.

cheers,

- Koen.

From charles.parnot at stanford.edu Mon Mar 14 02:09:46 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Sun, 13 Mar 2005 23:09:46 -0800
Subject: [Biococoa-dev] adding new files
In-Reply-To:
References: <825b4740c04ab2a1aff98b46f9d3ade7@earthlink.net> <0af5d607be63174f89a30f4bcc2995bf@earthlink.net> <478639497796f06d4dafe42c368f9637@nki.nl>
Message-ID:

At 12:05 PM -0500 3/13/05, Koen van der Drift wrote:
>On Mar 13, 2005, at 8:43 AM, Alexander Griekspoor wrote:
>
>>I can't remember why at the time I had to use this approach to make it work, it just was the case. But if the simple import works, yes please change it!
>>
>
>Fixed.
>
>- Koen.

I started the discussion, so I should clarify what I think happens.

(1) Why do we NEED to remove the relative paths in the public headers?
Because when the framework is linked against another application and the person doing this tries to compile her stuff, the headers are parsed by the compiler. But at that moment, all the public headers of the BioCocoa framework are in a flat folder, so relative paths don't make sense. So these relative paths need to be removed in PUBLIC HEADERS, but not necessarily in other places. For example, in implementation files it does not matter. Note that #import statements in headers are relatively rare: they are usually needed for declaring subclasses, where you have to load the superclass header. In most cases, @class statements are sufficient.

(2) Why CAN we remove the relative paths and not confuse the compiler when we are compiling the framework itself?
My guess (and it is just a guess) is that the compiler first gathers all the public headers to construct the 'Headers' folder in the bundle and these are remembered throughout the compilation (this is even more explicit with the notation ). However, the gathering will only happen for public headers, and privates headers won't be included (me think). I don't think we have private headers anyway in the framework, but should they appear as more code is added, they might not be #import-ed properly if their path is not included. charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Mon Mar 14 02:23:10 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 13 Mar 2005 23:23:10 -0800 Subject: [Biococoa-dev] BCSymbolSet done In-Reply-To: References: <5c721b382e7a10cddea2b6f3ac25c095@earthlink.net> Message-ID: At 12:06 PM -0500 3/13/05, Koen van der Drift wrote: >On Mar 13, 2005, at 8:42 AM, Alexander Griekspoor wrote: > >>> >>>Shouldn't that be initWithString:@"ACGU" ? >>> >>>Also for the rnaSymbolSet I think it should be initWithString:@"ACGURYMKSWHBVDN" >>> > >Fixed. > >- Koen. Nice job... 
Interestingly, here is the original method:

+ (BCSymbolSet *)rnaStrictSymbolSet
{
    if ( rnaStrictSymbolSetRepresentation == nil ) {
        rnaStrictSymbolSetRepresentation = [[BCSymbolSet alloc] init];
        [rnaStrictSymbolSetRepresentation addSymbol: [BCNucleotideRNA baseForSymbol: 'A']];
        [rnaStrictSymbolSetRepresentation addSymbol: [BCNucleotideRNA baseForSymbol: 'C']];
        [rnaStrictSymbolSetRepresentation addSymbol: [BCNucleotideRNA baseForSymbol: 'G']];
        [rnaStrictSymbolSetRepresentation addSymbol: [BCNucleotideRNA baseForSymbol: 'T']];
    }
    return rnaStrictSymbolSetRepresentation;
}

BBEdit did the job for me, and BBEdit does not know biology very well, apparently. I will file a bug report to Bare Bones Software. Well, and it was ~midnight...

charles

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Mon Mar 14 03:13:04 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 14 Mar 2005 00:13:04 -0800
Subject: [Biococoa-dev] (no subject)
Message-ID:

At 9:28 AM +0100 3/12/05, Alexander Griekspoor wrote:
>Sounds awesome Charles, great ideas. I guess many algorithms can benefit from this approach. It's indeed very wise to "standardize" this conversion path and provide some "legal" form to go from symbols to c structures and vice versa for performance reasons in the case that native use of BCSequences is not possible or does not suffice.
>Alex

At 10:41 PM +0100 3/13/05, Alexander Griekspoor wrote:
>Hmmm, somehow I totally miss the reason for the remapping. Why would it be leaner/faster?

Somehow, you have to explain that in more detail ;-)

The BCSymbolMapping class proposed by Phil is exactly what I had in mind.
I would add the following methods:

- (char *)charMappingForSequence:(BCAbstractSequence *)sequence;
- (char **)charMappingForScoreMatrix:yadayada..;
... and the same backwards...

The BCSymbolMapping can even take care of the malloc, as in the methods above (with automatic autorelease; I can give more details on how). It could implement some caching in the future if needed (@Phil: BTW, I would rather have BCSymbolMapping do the caching than BCScoreMatrix, ref: a previous email from you, see what I mean?).

The whole idea of this class, again, would be to have a separate class that takes care of the mapping, and only of the mapping:

objects ------> C ------> algorithm -------> C -------> Objects

The algorithm should not know anything about the biology. I would not want to see anything like -whatevermatrix['A']['G']- in the middle of the algorithm. Having the mapping done in a separate class allows writing the algorithm like this:

BCSymbolSet *set=....union of the symbol sets of seq 1 and 2...
BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set];
char *seq1=[mapping charMappingForSequence:sequenceObject1];
char *seq2=[mapping charMappingForSequence:sequenceObject2];
int **scores=[mapping charMappingForScoreMatrix:matrix];
// .... run the algorithm...
BCSequenceAlignment *result=[mapping alignementForSequences:(int)count length:(int)length charBuffer:(char*)seqs];

Again, I do think that mapping to the representing char of a symbol will make sense and might do the job (and will be VERY convenient for debugging), so I agree with you, Koen and Alex.
But separating the mapping step allows for easier modifications in the future:
* it is possible that a 16-byte score matrix will use the caches more efficiently than a 16-kilobyte one; it is not just a RAM issue; L2 cache is 512 kb on dual G5, not sure about L1; it may even fit in registers (?)
* if a score is an int or a float, the matrix is actually 128 x 128 x 4 = 64 kilobytes * it is possible that int will be better than char because of the cast step? I know it is a big issue for float to int, but I don't know about char --> int; so maybe we will use int? The most important is: we don't know yet any of that and we will know only later, after running Shark on real cases. If we have everything in place to easily test and choose the best mapping, it will be easier. Also, the mapping could be useful for other purposes (like saving as binary and compress, but not the best example!). Finally, if we find that we need to improve the mapping step, at least there will be mostly one class that will have to be modified. The mapping class may evolve to take more parameters and implement different approaches depending on the symbol set (at which point it would become a class cluster, but don't get me there). Sorry this whole email comes a bit after the discussion, but my main point is to make a case in favor of a separate class for mapping. I think it will help, and not obfuscate things, but actually separate things better, and make them clearer! Phil, hang in there. Let's not let these guys take us down ;-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From biococoa at bioworxx.com Mon Mar 14 03:19:58 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Mon, 14 Mar 2005 09:19:58 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> References: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> Message-ID: Am 14.03.2005 um 01:49 schrieb Koen van der Drift: > > On Mar 13, 2005, at 9:12 AM, Philipp Seibel wrote: > >> Btw: How can i change the BCAlignment to BCSequenceAlignment. 
>> Should I create new files?
>>
>
> Should BCPairwiseAlignment still be in the framework? I noticed you
> removed it from BCFoundation.h, but it is still in the project.

I still need BCPairwiseAlignment.m, but not the header file. The .m imports the header BCSequenceAlignment.h. I did this to separate the code.

Phil

From charles.parnot at stanford.edu Mon Mar 14 03:24:46 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 14 Mar 2005 00:24:46 -0800
Subject: [Biococoa-dev] initWithSymbol
In-Reply-To: <91169e07ad4764207f3459b072d6db97@earthlink.net>
References: <51b1192b9b62ce475bd1069c70a0000e@earthlink.net> <3dd08ba9fc72202ce88532624cfaaa3b@nki.nl> <5c1e3e75d8fc4041096bf4216cb78094@earthlink.net> <91169e07ad4764207f3459b072d6db97@earthlink.net>
Message-ID:

At 7:38 AM -0500 3/12/05, Koen van der Drift wrote:
>On Mar 12, 2005, at 7:17 AM, Koen van der Drift wrote:
>
>>So would it be safe to have it return BCSymbol instead of id? Or is there another solution, maybe use initWithSymbolChar?
>>
>
>I changed it to use initWithSymbolChar and also committed Charles' request to replace aaForSymbol and baseForSymbol with symbolForChar.
>
>
>- Koen.

Thanks, Koen! And thanks for the modifications to the dev-docs. They make sense and the new item looks good to me.

Regarding the symbol init, I know it is too late, but another option could have been 'initWithUnichar:'... After all, the conflict was with a method that has the same name but a different signature, since that one really uses the actual 'char' type.
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From a.griekspoor at nki.nl Mon Mar 14 03:31:54 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 14 Mar 2005 09:31:54 +0100 Subject: [Biococoa-dev] (no subject) In-Reply-To: References: Message-ID: <89b447e344a923125b4960ba5196df96@nki.nl> On 14-mrt-05, at 9:13, Charles PARNOT wrote: > At 9:28 AM +0100 3/12/05, Alexander Griekspoor wrote: >> Sounds awesome Charles, great ideas. I guess many algorithms can >> benefit from this approach. It's indeed very wise to "standardize" >> this conversion path and provide some "legal" form to go from symbols >> to c structures and vice versa for performance reasons in the case >> that native use of BCSequences is not possible or does not suffice. >> Alex > > At 10:41 PM +0100 3/13/05, Alexander Griekspoor wrote: >> Hmmm, somehow I totally miss the reason the remapping. Why would it >> be leaner/faster? You are right, sounds pretty contradictory (and is to some extent as Koen made a good point which made me think about it again and see things differently a bit). The point is that I do see the need for a CONVERTION of BCSequences to c structures (i.e., c arrays) that's clear. However I do not see the need for REMAPPING char symbols to different characters. Given that, and Koen's remark, I do not see why this would need a special object and can't be done in two or 3 methods in BCSequence itself. Hope that makes my schizophrenia more explainable ;-) More to follow... 
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com 4Peaks - For Peaks, Four Peaks. 2004 Winner of the Apple Design Awards Best Mac OS X Student Product http://www.mekentosj.com/4peaks ********************************************************* From a.griekspoor at nki.nl Mon Mar 14 03:46:29 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 14 Mar 2005 09:46:29 +0100 Subject: [Biococoa-dev] (no subject) In-Reply-To: References: Message-ID: <42857544500b21b42dd9f5913091fabf@nki.nl> > Somehow, you have to explain that in more details ;-) > > The BCSymbolMapping class proposed by Phil is exactly what I had in > mind. I would add the following methods: > - (char *)charMappingForSequence:(BCAbstractSequence *)sequence; > - (char **)charMappingForScoreMatrix:yadayada..; > ... and the same backwards... > > The BCSymbolMapping can even take care of the malloc, like put above > (with automatic autorelease; I can give more details how). It could > implement some caching in the future if needed (@Phil: BTW, I would > rather have BCSymbolMapping do the caching than BCScoreMatrix, ref: a > previous email from you, see what I mean?).
Caching would be nice, but again, why not let the BCSequence do the job itself (no hassle with helper objects)? It's also THE place to store the cache IMHO...

> The whole idea of this class, again, would be to have a separate class
> that takes care of the mapping, and only of the mapping:
>
> objects ------> C ------> algorithm -------> C -------> Objects
>
> The algorithm should not know anything about the biology. I would not
> want to see anything like -whatevermatrix['A']['G']- in the middle of
> the algorithm. Having the mapping done in a separate class allows to
> write the algorithm like this:

Well, perhaps I'm more humanoid, but I like it better than whatevermatrix['0x00']['0x03'];

>

Also it would change the code dramatically as well:

BCSymbolSet *set=....union of the symbol sets of seq 1 and 2...
    -> not necessary (unless we make the matrix creation dependent on the symbolset (see below))
BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set];
    -> not necessary
char *seq1=[mapping charMappingForSequence:sequenceObject1];
    -> same
char *seq2=[mapping charMappingForSequence:sequenceObject2];
    -> same
int **scores=[mapping charMappingForScoreMatrix:matrix];
    -> int **scores = [BCAlignment matrixForSymbolSet: set];
// .... run the algorithm...
BCSequenceAlignment *result=[BCAlignment alignementForSequences(int)count length:(int)length charBuffer:(char*)seqs];
    -> Why make BCSymbolMapping the mother of alignments?!

>
> Again, I do think that mapping to the representing char of a symbol
> will make sense and might do the job (and will be VERY convenient for
> debugging), so I agree with you Koen and Alex.
> But separating the mapping step allows for easier modifications in the
> future:
> * it is possible that a 16 bytes score matrix will use the caches more
> efficiently than a 16 kilobytes; it is not just a RAM issue; L2 cache
> is 512 kb on dual G5, not sure about L1; if may even fit in registers
> (?)
Yes, could be, but I really doubt this is the bottleneck in the algorithm; this would be a typical example of doing lots of tuning before we even know where the problem is! Let's first build the thing the SIMPLE way and then optimize it. We can always implement the remapping IF there is indeed a lot to win in this area.

> * if a score is an int or a float, the matrix is actually 128 x 128 x
> 4 = 64 kilobytes

That's right, but come on, 64 KB is nothing.

> * it is possible that int will be better than char because of the cast
> step? I know it is a big issue for float to int, but I don't know
> about char --> int; so maybe we will use int?

Same thing: let's build it and Shark will tell us.

>
> The most important is: we don't know yet any of that and we will know
> only later, after running Shark on real cases.

Aha, too early again ;-)

> If we have everything in place to easily test and choose the best
> mapping, it will be easier.

No mapping at all ;-)

> Also, the mapping could be useful for other purposes (like saving as
> binary and compress, but not the best example!). Finally, if we find
> that we need to improve the mapping step, at least there will be
> mostly one class that will have to be modified.

Or none; well, you get the point. Sorry for that, couldn't resist.

> The mapping class may evolve to take more parameters and implement
> different approaches depending on the symbol set (at which point it
> would become a class cluster, but don't get me there).
>
> Sorry this whole email comes a bit after the discussion, but my main
> point is to make a case in favor of a separate class for mapping. I
> think it will help, and not obfuscate things, but actually separate
> things better, and make them clearer!
That I have no problem with. I believe there might be a need for this thing in the future, but I don't see why we would need it in alignments before we start to optimize things, and thus I don't see why we would implement it now if there's no purpose for it yet. We had better focus on writing a damn fast BCSequence-to-char-array converter ;-)

>
> Phil, hang in there. Let's not let these guys take us down ;-)

GRRRRR!!!! LOL,
Cheers mates!
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

iRNAi, do you?
http://www.mekentosj.com/irnai

*********************************************************

From biococoa at bioworxx.com Mon Mar 14 03:46:44 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Mon, 14 Mar 2005 09:46:44 +0100
Subject: [Biococoa-dev] (no subject)
In-Reply-To: <89b447e344a923125b4960ba5196df96@nki.nl>
References: <89b447e344a923125b4960ba5196df96@nki.nl>
Message-ID:

On 14.03.2005, at 09:31, Alexander Griekspoor wrote:

> On 14 Mar 2005, at 9:13, Charles PARNOT wrote:
>
>> At 9:28 AM +0100 3/12/05, Alexander Griekspoor wrote:
>>> Sounds awesome Charles, great ideas. I guess many algorithms can
>>> benefit from this approach. It's indeed very wise to "standardize"
>>> this conversion path and provide some "legal" form to go from
>>> symbols to c structures and vice versa for performance reasons in
>>> the case that native use of BCSequences is not possible or does not
>>> suffice.
>>> Alex
>>
>> At 10:41 PM +0100 3/13/05, Alexander Griekspoor wrote:
>>> Hmmm, somehow I totally miss the reason the remapping. Why would it
>>> be leaner/faster?
> > You are right, sounds pretty contradictory (and is to some extent as > Koen made a good point which made me think about it again and see > things differently a bit). The point is that I do see the need for a > CONVERTION of BCSequences to c structures (i.e., c arrays) that's > clear. However I do not see the need for REMAPPING char symbols to > different characters. This is a good point for large sequences. Could be much faster just to call +stringWithCString, but we will see. Oh this discussion is going to make me schizo, too. > Given that, and Koen's remark, I do not see why this would need a > special object and can't be done in two or 3 methods in BCSequence > itself. Hope that makes my schizophrenia more explainable ;-) More to > follow... btw: i still like charles version better ;-) Phil From biococoa at bioworxx.com Mon Mar 14 03:58:28 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Mon, 14 Mar 2005 09:58:28 +0100 Subject: [Biococoa-dev] (no subject) In-Reply-To: <42857544500b21b42dd9f5913091fabf@nki.nl> References: <42857544500b21b42dd9f5913091fabf@nki.nl> Message-ID: <71fb92129503c74873ca4527d979fb90@bioworxx.com> Wow, seems to become a very hot topic. > Well, perhaps I'm more humanoid, but I like it better than > whatevermatrix['0x00']['0x03']; fast algorithms may not be human readable ;-) >> > Also it would change the code dramatically as well: > BCSymbolSet *set=....union of the symbol sets of seq 1 and 2... 
-> > not necessary (unless we make the matrix creation dependent on the > symbolset (see below) > BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set]; > -> not necessary > char *seq1=[mapping charMappingForSequence:sequenceObject1]; -> same > char *seq2=[mapping charMappingForSequence:sequenceObject2]; -> same > int **scores=[mapping charMappingForScoreMatrix:matrix]; -> int > **scores = [BCAlignment matrixForSymbolSet: set]; can't agree with this, because we need make the scoringMatrix customizable, so caching and converting has to be outside the BCAlignment class. Phil > // .... run the algorithm... > BCSequenceAlignment *result=[BCAlignment > alignementForSequences(int)count length:(int)length > charBuffer:(char*)seqs]; -> Why make BCSymbolmapping the mother of > alignments?! > >> >> Again, I do think that mapping to the representing char of a symbol >> will make sense and might do the job (and will be VERY convenient for >> debugging), so I agree with you Koen and Alex. > >> But separating the mapping step allows for easier modifications in >> the future: >> * it is possible that a 16 bytes score matrix will use the caches >> more efficiently than a 16 kilobytes; it is not just a RAM issue; L2 >> cache is 512 kb on dual G5, not sure about L1; if may even fit in >> registers (?) > Yes could be, but I really doubt if this is the bottleneck in the > algorithm, this would be a typical example of doing lots of tuning > before we even know where the problem is! Let's first make the thing > in the SIMPLE way and then optimize it. We can always implement the > remapping IF indeed there's lots to win in this area. > >> * if a score is an int or a float, the matrix is actually 128 x 128 x >> 4 = 64 kilobytes > That's right, but come on, 64kb that's nothing. > >> * it is possible that int will be better than char because of the >> cast step? I know it is a big issue for float to int, but I don't >> know about char --> int; so maybe we will use int? 
> Same thing, let's make the thing and Shark will tell us. >> >> The most important is: we don't know yet any of that and we will know >> only later, after running Shark on real cases. > Aha, to early again ;-) >> If we have everything in place to easily test and choose the best >> mapping, it will be easier. > No mapping it all ;-) >> Also, the mapping could be useful for other purposes (like saving as >> binary and compress, but not the best example!). Finally, if we find >> that we need to improve the mapping step, at least there will be >> mostly one class that will have to be modified. > Or none, well you got the point. Sorry for that couldn't resist. >> The mapping class may evolve to take more parameters and implement >> different approaches depending on the symbol set (at which point it >> would become a class cluster, but don't get me there). >> >> Sorry this whole email comes a bit after the discussion, but my main >> point is to make a case in favor of a separate class for mapping. I >> think it will help, and not obfuscate things, but actually separate >> things better, and make them clearer! > That I have no problem with, I believe there might be a need in the > future for this thing, but I don't see why we would need it in > alignments before we start to optimize things, and thus I don't see > why we would implement it now if there's not yet a purpose. We can > better focus on writing a damn fast BCSequence to char array converter > ;-) >> >> Phil, hang in there. Let's not let these guys take us down ;-) > GRRRRR!!!! LOL, > Cheers mates! > Alex From charles.parnot at stanford.edu Mon Mar 14 09:43:27 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Mon, 14 Mar 2005 06:43:27 -0800 Subject: Fwd: Re: [Biococoa-dev] (no subject) Message-ID: Phil, you sent this to me, but not the list, it seems (?). 
>X-Sieve: CMU Sieve 2.2 >From: Philipp Seibel >Subject: Re: [Biococoa-dev] (no subject) >Date: Mon, 14 Mar 2005 09:38:59 +0100 >To: Charles PARNOT >X-Virus-Scanned: by amavisd-new at mail.hoehmann.biz > > >Am 14.03.2005 um 09:13 schrieb Charles PARNOT: > >>At 9:28 AM +0100 3/12/05, Alexander Griekspoor wrote: >>>Sounds awesome Charles, great ideas. I guess many algorithms can benefit from this approach. It's indeed very wise to "standardize" this conversion path and provide some "legal" form to go from symbols to c structures and vice versa for performance reasons in the case that native use of BCSequences is not possible or does not suffice. >>>Alex >> >>At 10:41 PM +0100 3/13/05, Alexander Griekspoor wrote: >>>Hmmm, somehow I totally miss the reason the remapping. Why would it be leaner/faster? >> >> >>Somehow, you have to explain that in more details ;-) >> >>The BCSymbolMapping class proposed by Phil is exactly what I had in mind. I would add the following methods: >>- (char *)charMappingForSequence:(BCAbstractSequence *)sequence; >>- (char **)charMappingForScoreMatrix:yadayada..; >>... and the same backwards... >> >>The BCSymbolMapping can even take care of the malloc, like put above (with automatic autorelease; I can give more details how). It could implement some caching in the future if needed (@Phil: BTW, I would rather have BCSymbolMapping do the caching than BCScoreMatrix, ref: a previous email from you, see what I mean?). >> >> >> >>The whole idea of this class, again, would be to have a separate class that takes care of the mapping, and only of the mapping: >> >> objects ------> C ------> algorithm -------> C -------> Objects >> >>The algorithm should not know anything about the biology. I would not want to see anything like -whatevermatrix['A']['G']- in the middle of the algorithm. Having the mapping done in a separate class allows to write the algorithm like this: >> >>BCSymbolSet *set=....union of the symbol sets of seq 1 and 2... 
>>BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set]; >>char *seq1=[mapping charMappingForSequence:sequenceObject1]; >>char *seq2=[mapping charMappingForSequence:sequenceObject2]; >>int **scores=[mapping charMappingForScoreMatrix:matrix]; >>// .... run the algorithm... >>BCSequenceAlignment *result=[mapping alignementForSequences(int)count length:(int)length charBuffer:(char*)seqs]; >> >>Again, I do think that mapping to the representing char of a symbol will make sense and might do the job (and will be VERY convenient for debugging), so I agree with you Koen and Alex. But separating the mapping step allows for easier modifications in the future: >>* it is possible that a 16 bytes score matrix will use the caches more efficiently than a 16 kilobytes; it is not just a RAM issue; L2 cache is 512 kb on dual G5, not sure about L1; if may even fit in registers (?) >>* if a score is an int or a float, the matrix is actually 128 x 128 x 4 = 64 kilobytes >>* it is possible that int will be better than char because of the cast step? I know it is a big issue for float to int, but I don't know about char --> int; so maybe we will use int? >> >>The most important is: we don't know yet any of that and we will know only later, after running Shark on real cases. If we have everything in place to easily test and choose the best mapping, it will be easier. Also, the mapping could be useful for other purposes (like saving as binary and compress, but not the best example!). Finally, if we find that we need to improve the mapping step, at least there will be mostly one class that will have to be modified. The mapping class may evolve to take more parameters and implement different approaches depending on the symbol set (at which point it would become a class cluster, but don't get me there). >> >>Sorry this whole email comes a bit after the discussion, but my main point is to make a case in favor of a separate class for mapping. 
I think it will help, and not obfuscate things, but actually separate things better, and make them clearer! > >Puh charles, last second ;-) > >>Phil, hang in there. Let's not let these guys take us down ;-) > >Here i am !!!! I personally like that approach very much, because it will allow us to adapt more sequence based algorithms in c. And we will get FAST !!! And we need to get fast, because the next step i want to go is phylogenetics ..... ( but this is a long long way ;-) ) > >So charles perhaps you can set up the class, and i take it to finish the alignment stuff ( BCSequenceAlignment, BCScoreMatrix ). >When the class is there the discussion will end :-). >Can't remember who brought it up ...... lol > > >Phil -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From jtimmer at bellatlantic.net Tue Mar 15 10:40:18 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 15 Mar 2005 10:40:18 -0500 Subject: [Biococoa-dev] BCScanner In-Reply-To: <15ff215188b6f2477d50e97fb09cee9d@earthlink.net> Message-ID: > Because it is needed for the BCDigest class, I started working on the > BCScanner class. First method I attempted was > > - (BOOL)scanSequence:(BCAbstractSequence*)subSequence > intoSequence:(BCAbstractSequence **)value. > > Trying to emulate what NSScanner does with a string, I am now using > BCToolSequenceFinder to find the first occurance of the passed > sequence. Because the BCScanner is probably going to be called a number > of times in succession, I am doubting if using the BCToolSequenceFinder > in this way is the most efficient. Maybe the BCToolSequenceFinder > should be an ivar of the class? That would probably also be a better > way to keep track of the scanLocation. 
>
> I committed my first attempt, so have a look and let me know what you
> think.

Koen - I haven't had a chance to check out the code you put in, but just a quick question: what's the advantage of having a scanner for digests, as opposed to just having the sequence finder return the array of all site ranges? Just a gut response, but having one less intervening object would improve code efficiency and readability, so I'm wondering what having the scanner would provide. In no way am I saying we shouldn't eventually create a scanner, mind you, just wondering about its use in this case.

Incidentally, I've looked over the key method "findSequence" in the sequence finder, and I think it would be very easy to optimize this in some significant ways, at the expense of only a little readability. A few of my ideas:

We have several "if" conditionals inside tight loops - if we inverted things and put the conditionals outside the loops (which would cause some code repetition), we'd cut down on the code inside the loops substantially. I think we'd have to code 4 separate loops (only 1 of which would be used during a given method call).

We call an external method - "compareSymbols" - within the loop. The method's very short, so we could either move it inside the loop, or convert it to a static inline function (which essentially does the same thing).

In that method, we use an "isEqualToSymbol" comparison between two objects in the case where we're looking at a strict comparison. Since we're only using singletons for symbols, we could replace this with "==" and cut out the overhead of function calls.

Any objections to me implementing this? I may even put some cases in the test app and do a before/after comparison.
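As a rough illustration of the three ideas above, here is a minimal C sketch. It is not the BCToolSequenceFinder code: the Symbol struct, the SYM_* instances and the function names are hypothetical stand-ins, and the loose comparison (two symbols match if they share at least one base) is an assumption made only for this example. It shows the strict/loose test hoisted out of the loops, the comparisons written as static inline functions, and strict equality done by pointer comparison on singleton symbols.

```c
#include <stddef.h>

/* Hypothetical stand-in for a symbol singleton: one shared instance per
   symbol, so two pointers are equal exactly when the symbols are the same. */
typedef struct Symbol {
    char code;          /* display character, e.g. 'A' or 'R' */
    const char *bases;  /* strict bases this symbol can stand for */
} Symbol;

const Symbol SYM_A = {'A', "A"};
const Symbol SYM_C = {'C', "C"};
const Symbol SYM_G = {'G', "G"};
const Symbol SYM_T = {'T', "T"};
const Symbol SYM_R = {'R', "AG"};  /* purine */

/* Strict comparison: pointer identity is enough for singletons. */
static inline int same_symbol(const Symbol *a, const Symbol *b) {
    return a == b;
}

/* Loose comparison (assumed rule): true if the symbols share a base.
   The test is symmetric in a and b. */
static inline int overlapping_symbol(const Symbol *a, const Symbol *b) {
    for (const char *p = a->bases; *p != '\0'; p++)
        for (const char *q = b->bases; *q != '\0'; q++)
            if (*p == *q) return 1;
    return 0;
}

/* Index of the first match of pat in seq, or -1 if there is none.
   The strict flag is tested once; each branch has its own tight loop. */
long find_first(const Symbol *const *seq, size_t n,
                const Symbol *const *pat, size_t m, int strict) {
    if (m == 0 || m > n) return -1;
    if (strict) {
        for (size_t i = 0; i + m <= n; i++) {
            size_t j = 0;
            while (j < m && same_symbol(seq[i + j], pat[j])) j++;
            if (j == m) return (long)i;
        }
    } else {
        for (size_t i = 0; i + m <= n; i++) {
            size_t j = 0;
            while (j < m && overlapping_symbol(seq[i + j], pat[j])) j++;
            if (j == m) return (long)i;
        }
    }
    return -1;
}
```

With the sequence A C G T A G, a strict search for G T finds index 2, while the pattern R G (R being a purine) only matches in the loose mode, at index 4. The price is the duplicated loop body, one copy per comparison mode.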
JT

_______________________________________________
This mind intentionally left blank

From jtimmer at bellatlantic.net Tue Mar 15 10:54:44 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 15 Mar 2005 10:54:44 -0500
Subject: [Biococoa-dev] Weighted sequence score
Message-ID:

One of the things the alignment work has gotten me thinking about implementing is a weighted sequence score. This is for situations like splice sites or transcription factor binding sites, where you don't tend to have absolute sequences, but often have situations like "80% of the time, the first base is an A, and when it's not, 15% of the time it's a G". The best you can do is evaluate how close a given sequence is to the ideal sequence - i.e., the best score you can get at position 1 in the example above is only 80%, not 100%.

The actual implementation of this doesn't seem that hard, but the details are driving me nuts. Three in particular:

How to provide the user a way to set up the scoring table. My best idea would be to require a formatted string, like this:
A:80,G:15,C:5
T:60,C:40
Etc.
Does this sound good?

The second is ambiguity. I could just require that the queried sequence be strict, but that seems pretty limiting. The question then becomes how to evaluate a situation where the first base in the example above is compared to a purine. It shouldn't score as well as matching A, but it shouldn't be penalized as much as matching to an N. I could just require the user to supply a value for purines, but that may become a real pain for fairly ambiguous sequences.

Non-100% value totals. What if the user, for base 1, doesn't supply a C value, meaning that 5% of the time it could be anything? I could just score it as 5%. The problem with that is how to score a position where 100% of the symbols are defined, but it's compared with an N. My gut response there would be to give a 25% score, but then that's penalized less than a known base that gets the 5% score, which seems odd.
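One way to make the questions above concrete is a small C sketch. Everything in it is an assumption for illustration rather than a settled design: the function names parse_position and score_symbol are made up, only a few IUPAC codes are handled, and an ambiguous symbol is scored as the mean weight of the bases it can stand for. Under that convention a purine at position 1 scores between A and N, and an N against a fully defined position scores 25.

```c
#include <stdio.h>
#include <string.h>

enum { BASES = 4 };
static const char BASE_ORDER[] = "ACGT";  /* column order of the weights */

/* Parse one position such as "A:80,G:15,C:5" into w (order A, C, G, T).
   Bases that are not listed get weight 0. Returns 0 on success, -1 on
   malformed input. */
int parse_position(const char *spec, double w[BASES]) {
    for (int i = 0; i < BASES; i++) w[i] = 0.0;
    while (*spec != '\0') {
        char base;
        double pct;
        int used = 0;
        if (sscanf(spec, " %c:%lf%n", &base, &pct, &used) != 2) return -1;
        const char *slot = strchr(BASE_ORDER, base);
        if (slot == NULL) return -1;
        w[slot - BASE_ORDER] = pct;
        spec += used;
        if (*spec == ',') spec++;
    }
    return 0;
}

/* Strict bases represented by a symbol (only a subset of IUPAC codes). */
static const char *expand(char symbol) {
    switch (symbol) {
    case 'A': return "A";
    case 'C': return "C";
    case 'G': return "G";
    case 'T': return "T";
    case 'R': return "AG";    /* purine */
    case 'Y': return "CT";    /* pyrimidine */
    case 'W': return "AT";
    case 'N': return "ACGT";  /* any base */
    default:  return "";
    }
}

/* Assumed convention: the score of a symbol at a position is the mean
   weight of the strict bases it represents; unknown symbols score 0. */
double score_symbol(const double w[BASES], char symbol) {
    const char *bases = expand(symbol);
    size_t n = strlen(bases);
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += w[strchr(BASE_ORDER, bases[i]) - BASE_ORDER];
    return sum / (double)n;
}
```

For the first position of the example table (A:80,G:15,C:5), this gives 80 for A, 47.5 for a purine (R) and 25 for N. It does not settle how to treat totals below 100, and a known T still scores 0, below N, which is the oddity noted above.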
Anyway, ideas or suggestions would be welcome. In the meantime, I'm probably going to try to dig through BioJava and see what they do.

JT

_______________________________________________
This mind intentionally left blank

From biococoa at bioworxx.com Tue Mar 15 11:16:37 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Tue, 15 Mar 2005 17:16:37 +0100
Subject: [Biococoa-dev] Weighted sequence score
In-Reply-To:
References:
Message-ID: <058cf4eddba810fccff8f17083089545@bioworxx.com>

On 15.03.2005, at 16:54, John Timmer wrote:

> One of the things the alignment work has gotten me thinking about
> implementing is a weighted sequence score. This is for situations like
> splice sites or transcription factor binding sites, where you don't
> tend to
> have absolute sequences, but often have situations like "80% of the
> time,
> the first base is an A, and when it's not, 15% of the time it's a G".
> The
> best you can do is evaluate how close a given sequence is to the ideal
> sequence - ie, the best score you can get at position 1 in the example
> above is only 80%, not 100%.

This seems to be something like sequence profiles, am I right? You want to know how well a sequence fits a profile of other sequences, built for example from an alignment? That's a very good thing; I'd like to have this as well. It could be used for sequence searching or phylogenetics.

Phil

>
> The actual implementation of this doesn't seem that hard, but the
> details
> are driving me nuts. Three in particular:
>
> How to provide the user a way to set up the scoring table. My best
> idea
> would be to require a formatted string, like this:
> A:80,G:15,C:5
> T:60,C:40
> Etc.
> Does this sound good?
>
> The second is ambiguity. I could just require that the queried
> sequence be
> strict, but that seems pretty limiting. The question then becomes how
> to
> evaluate a situation where the first base in the example above is
> compared
> to a purine?
It shouldn't score as well as matching A, but it > shouldn't be > penalized as much as matching to an N. I could just require the user > to > supply a value for purines, but that may become a real pain for fairly > ambiguous sequences. > > Non-100% value totals. What if the user, for base 1, doesn't supply a > C > value, meaning that 5% of the time it could be anything? I could just > score > it as 5%. The problem with that is how to score position where > there's > 100% defined symbols, but it's compared with an N? My gut response > there > would be to give a 25% score, but then that's penalized less than a > known > base that gets the 5% score, which seems odd. > > Anyway, ideas or suggestions would be welcome. In the mean time, I'm > probably going to try to dig through BioJava and see what they do. > > JT > > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > From a.griekspoor at nki.nl Tue Mar 15 13:06:26 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Tue, 15 Mar 2005 19:06:26 +0100 Subject: [Biococoa-dev] BCScanner In-Reply-To: References: Message-ID: > Any objections to me implementing this? I may even put some cases in > the > test app and do a before/after comparison. Sounds good! Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2- bit company without 1 bit of sense. 
********************************************************* From a.griekspoor at nki.nl Tue Mar 15 15:03:49 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Tue, 15 Mar 2005 21:03:49 +0100 Subject: [Biococoa-dev] Weighted sequence score In-Reply-To: References: Message-ID: On 15-mrt-05, at 16:54, John Timmer wrote: > One of the things the alignment work has gotten me thinking about > implementing is a weighted sequence score. This is for situations like > splice sites or transcription factor binding sites, where you don't > tend to > have absolute sequences, but often have situations like "80% of the > time, > the first base is an A, and when it's not, 15% of the time it's a G". > The > best you can do is evaluate how close a given sequence is to the ideal > sequence - ie, the best score you can get at position 1 in the example > above is only 80%, not 100%. Nice idea indeed, perfect to find consensus sequences in your sequence. > > The actual implementation of this doesn't seem that hard, but the > details > are driving me nuts. Three in particular: > > How to provide the user a way to set up the scoring table. My best > idea > would be to require a formatted string, like this: > A:80,G:15,C:5 > T:60,C:40 > Etc. > Does this sound good? Hmm, not really, but I don't have a good alternative either, perhaps some "consensus site object". > > The second is ambiguity. I could just require that the queried > sequence be > strict, but that seems pretty limiting. Absolutely because that's the idea of the thing right! If I'm not allowed to input W:100, I will just input A:50, T:50 right ;-) In fact that is how you might solve the problem... I'll think about the other problems John... Alex > The question then becomes how to > evaluate a situation where the first base in the example above is > compared > to a purine? It shouldn't score as well as matching A, but it > shouldn't be > penalized as much as matching to an N. 
I could just require the user
> to
> supply a value for purines, but that may become a real pain for fairly
> ambiguous sequences.
>
> Non-100% value totals. What if the user, for base 1, doesn't supply a
> C
> value, meaning that 5% of the time it could be anything? I could just
> score
> it as 5%. The problem with that is how to score position where
> there's
> 100% defined symbols, but it's compared with an N? My gut response
> there
> would be to give a 25% score, but then that's penalized less than a
> known
> base that gets the 5% score, which seems odd.
>
> Anyway, ideas or suggestions would be welcome. In the mean time, I'm
> probably going to try to dig through BioJava and see what they do.
>
> JT
>
> _______________________________________________
> This mind intentionally left blank
>
>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>
>

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Claiming that the Macintosh is inferior to Windows
because most people use Windows, is like saying
that all other restaurants serve food that is
inferior to McDonalds

*********************************************************

From jtimmer at bellatlantic.net Tue Mar 15 15:45:09 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 15 Mar 2005 15:45:09 -0500
Subject: [Biococoa-dev] Optimizations
In-Reply-To:
Message-ID:

Well, my work has paid off in an ironic way. I cut about 1/4 off the time for the strict search. Unfortunately, I caught a bug in the non-strict search (the first equivalence test only tested representations in one direction, instead of in both). Fixing that greatly increased the comparisons done, which nicely doubled the time it took. Sigh.

Anyway, I committed the changes, and left the previous version there as slow_findSequence. Maybe I should change that to faster_buggier_findSequence?

Right now, about 1/4 the time is spent in the "representsSymbol" method, querying arrays, so I'll look into finding a way to speed that up.

JT

> Incidentally, I've looked over the key method "findSequence" in the sequence
> finder, and I think it would be very easy to optimize this in some
> significant ways, at the expense of only a little readability. A few of my
> ideas:
> We have several "if" conditionals inside tight loops - if we inverted things
> and put the conditionals outside the loops (which would cause some code
> repetition), we'd cut down on the code inside the loops substantially. I
> think we'd have to code 4 separate loops (only 1 of which would be used
> during a given method call).
>
> We call to an external method - "compareSymbols" - within the loop. The
> method's very short, so we could either move it inside the loop, or convert
> it to a static inline function (which essentially does the same thing).
>
> In that method, we use a "isEqualToSymbol" operator between two objects in
> the case where we're looking at a strict comparison.
Since we're only using
> singletons for symbols, we could replace this with "==" and cut out the
> overhead of function calls.
>

_______________________________________________
This mind intentionally left blank

From jtimmer at bellatlantic.net Tue Mar 15 16:34:39 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 15 Mar 2005 16:34:39 -0500
Subject: [Biococoa-dev] Optimizations
In-Reply-To:
Message-ID:

> Right now, about 1/4 the time is spent in the "representsSymbol" method,
> querying arrays, so I'll look into finding a way to speed that up.

In answer to my own question, NSSet seems to be more efficient than an array for this use. Any thoughts on using one? We could either use an internal, private ivar only for tests such as this, or change the array to a set. Arrays and sets seem pretty readily convertible, and all this is in the BCSymbol class, so this shouldn't be a big deal.

JT

_______________________________________________
This mind intentionally left blank

From biococoa at bioworxx.com Tue Mar 15 17:06:02 2005
From: biococoa at bioworxx.com (Philipp Seibel)
Date: Tue, 15 Mar 2005 23:06:02 +0100
Subject: [Biococoa-dev] Weighted sequence score
In-Reply-To:
References:
Message-ID:

On 15.03.2005, at 21:03, Alexander Griekspoor wrote:

> On 15 Mar 2005, at 16:54, John Timmer wrote:
>
>> One of the things the alignment work has gotten me thinking about
>> implementing is a weighted sequence score.

It's more a weighted base score, isn't it?

>> This is for situations like
>> splice sites or transcription factor binding sites, where you don't
>> tend to
>> have absolute sequences, but often have situations like "80% of the
>> time,
>> the first base is an A, and when it's not, 15% of the time it's a G".
>> The
>> best you can do is evaluate how close a given sequence is to the ideal
>> sequence - ie, the best score you can get at position 1 in the
>> example
>> above is only 80%, not 100%.
> Nice idea indeed, perfect to find consensus sequences in your
> sequence.

A Profile is nothing else than a bunch of sequences represented by a
"probabilistic" model. So if you look at it, like
80% of my sequences have at a specific position an A and 15% of them
have a G, it will bring you to a convenient method like:

+ (BCSequenceProfile *)profileWithSequenceArray:(NSArray *)array;

>>
>> The actual implementation of this doesn't seem that hard, but the
>> details are driving me nuts. Three in particular:
>>
>> How to provide the user a way to set up the scoring table. My best
>> idea would be to require a formatted string, like this:
>> A:80,G:15,C:5
>> T:60,C:40
>> Etc.
>> Does this sound good?
> Hmm, not really, but I don't have a good alternative either, perhaps
> some "consensus site object".

Don't think we will need it, because you can construct a profile like
this:

sequenceA : AAAATATAGC
sequenceB : AAATATATAT
sequenceC : AAATTATATT

with the previous described method

A: 100
A: 100
A: 100
A: 33 T: 66
A: 33 T: 66
....

Of course profiles could have a convenient method like this:

+ (BCSequenceProfile *)profileWithAlignment:(BCSequenceAlignment *)alignment;

Phil

>>
>> The second is ambiguity. I could just require that the queried
>> sequence be strict, but that seems pretty limiting.
> Absolutely because that's the idea of the thing right! If I'm not
> allowed to input W:100, I will just input A:50, T:50 right ;-) In fact
> that is how you might solve the problem...
> I'll think about the other problems John...
> Alex
>
>
>> The question then becomes how to
>> evaluate a situation where the first base in the example above is
>> compared to a purine? It shouldn't score as well as matching A, but it
>> shouldn't be penalized as much as matching to an N. I could just
>> require the user to supply a value for purines, but that may become a
>> real pain for fairly ambiguous sequences.
>>
>> Non-100% value totals.
What if the user, for base 1, doesn't supply >> a C >> value, meaning that 5% of the time it could be anything? I could >> just score >> it as 5%. The problem with that is how to score position where >> there's >> 100% defined symbols, but it's compared with an N? My gut response >> there >> would be to give a 25% score, but then that's penalized less than a >> known >> base that gets the 5% score, which seems odd. >> >> Anyway, ideas or suggestions would be welcome. In the mean time, I'm >> probably going to try to dig through BioJava and see what they do. >> >> JT >> >> _______________________________________________ >> This mind intentionally left blank >> >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > Claiming that the Macintosh is inferior to Windows > because most people use Windows, is like saying > that all other restaurants serve food that is > inferior to McDonalds > > ********************************************************* > > > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > 4Peaks - For Peaks, Four Peaks. 
> 2004 Winner of the Apple Design Awards > Best Mac OS X Student Product > http://www.mekentosj.com/4peaks > > ********************************************************* > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > From jtimmer at bellatlantic.net Tue Mar 15 17:36:51 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 15 Mar 2005 17:36:51 -0500 Subject: [Biococoa-dev] Weighted sequence score In-Reply-To: Message-ID: >>> One of the things the alignment work has gotten me thinking about >>> implementing is a weighted sequence score. > > Its more a weighted base score, isn't it ? Well, it should work for anything, although the examples I'm thinking of using it for are bases. That's just my personal bias, though. Regarding what's below, I think it's a great idea as one alternative for creating a weighted consensus, but it's not a general case. There's statistics generated from several hundred mammalian splice sites, which have awkward fractions like 43%, and we'd need a way of having those imported without forcing the user to create enough sequences to generate a 43% fraction. > A Profile is nothing else than a bunch of sequences represented by a > "probabilistic" model. So if you look at it, like > 80% of my sequences have at a specific position an A and 15% of them > have a G, it will bring you to a convenient method like: > > + (BCSequenceProfile *)profileWithSequenceArray:(NSArray *)array; > >> Hmm, not really, but I don't have a good alternative either, perhaps >> some "consensus site object". > > Don't think we will need it, because you can construct a profile like > this: > > sequenceA : AAAATATAGC > sequenceB : AAATATATAT > sequenceC: AAATTATATT > > with the previous described method > > A: 100 > A: 100 > A: 100 > A: 33 T: 66 > A: 33 T: 66 > .... 
> > Of course profiles could have a convenient method like this: > > + (BCSequenceProfile *)profileWithAlignment:(BCSequenceAlignment > *)alignment; > > Phil _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Tue Mar 15 17:44:06 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 15 Mar 2005 17:44:06 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> Message-ID: On Mar 14, 2005, at 3:19 AM, Philipp Seibel wrote: >> Should BCPairwiseAlignment still be in the framework? I noticed you >> removed it from BCFoundation.h, but it is still in the project. > > I still need the BCPairwiseAlignment.m, but not the header file. The > .m imports the header BCSequenceAlignment.h. I did this to seperate > the code. > Ah, yes, BCPairwiseAlignment is now a category for BCSequenceAlignment. Then BCPairwiseAlignment.h can be removed from the framework? I still have it in my list. - Koen. From biococoa at bioworxx.com Tue Mar 15 18:03:55 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Wed, 16 Mar 2005 00:03:55 +0100 Subject: [Biococoa-dev] Weighted sequence score In-Reply-To: References: Message-ID: Am 15.03.2005 um 23:36 schrieb John Timmer: > >>>> One of the things the alignment work has gotten me thinking about >>>> implementing is a weighted sequence score. >> >> Its more a weighted base score, isn't it ? > > Well, it should work for anything, although the examples I'm thinking > of > using it for are bases. That's just my personal bias, though. i understand, i just wanted to differentiate to what is called "sequence weighting" which is a method to give weights to complete sequences. > Regarding what's below, I think it's a great idea as one alternative > for > creating a weighted consensus, but it's not a general case. 
There's > statistics generated from several hundred mammalian splice sites, > which have > awkward fractions like 43%, and we'd need a way of having those > imported > without forcing the user to create enough sequences to generate a 43% > fraction. Yes you are right, i was just thinking of the original meaning of a sequence profile. Of course a profile can represent several thousand sequences, so the user shouldn't need to create the profile with all these sequences. There has to be a method to set the percentage for one specific position of course. just wanted to be sure that we are talking about, what usually is called sequence profile. ;-) Phil >> A Profile is nothing else than a bunch of sequences represented by a >> "probabilistic" model. So if you look at it, like >> 80% of my sequences have at a specific position an A and 15% of them >> have a G, it will bring you to a convenient method like: >> >> + (BCSequenceProfile *)profileWithSequenceArray:(NSArray *)array; >> >>> Hmm, not really, but I don't have a good alternative either, perhaps >>> some "consensus site object". >> >> Don't think we will need it, because you can construct a profile like >> this: >> >> sequenceA : AAAATATAGC >> sequenceB : AAATATATAT >> sequenceC: AAATTATATT >> >> with the previous described method >> >> A: 100 >> A: 100 >> A: 100 >> A: 33 T: 66 >> A: 33 T: 66 >> .... 
>> >> Of course profiles could have a convenient method like this: >> >> + (BCSequenceProfile *)profileWithAlignment:(BCSequenceAlignment >> *)alignment; >> >> Phil > > > _______________________________________________ > This mind intentionally left blank > > > From biococoa at bioworxx.com Tue Mar 15 18:06:09 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Wed, 16 Mar 2005 00:06:09 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> Message-ID: Am 15.03.2005 um 23:44 schrieb Koen van der Drift: > > On Mar 14, 2005, at 3:19 AM, Philipp Seibel wrote: > >>> Should BCPairwiseAlignment still be in the framework? I noticed you >>> removed it from BCFoundation.h, but it is still in the project. >> >> I still need the BCPairwiseAlignment.m, but not the header file. The >> .m imports the header BCSequenceAlignment.h. I did this to seperate >> the code. >> > > Ah, yes, BCPairwiseAlignment is now a category for > BCSequenceAlignment. Then BCPairwiseAlignment.h can be removed from > the framework? I still have it in my list. > > - Koen. yes it can. i didn't remove it, because the discussion about the structure of alignments and alignment algorithms isn't finished yet. Phil From kvddrift at earthlink.net Tue Mar 15 19:11:16 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 15 Mar 2005 19:11:16 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: References: Message-ID: On Mar 15, 2005, at 4:34 PM, John Timmer wrote: > In answer to my own question, NSSet seems to be more efficient than an > array > for this use. Any thoughts on using one? We could either use an > internal, > private ivar only for tests such as this, or change the array to a set. > Arrays and sets seem pretty readily convertible, and all this is in the > BCSymbol class, so this shouldn't be a big deal. > Where do you want to use an NSSet, as a return value for findSequence? 
The advantage of the array is that the found sequences are in the 'right order'. - Koen. From kvddrift at earthlink.net Tue Mar 15 19:21:36 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 15 Mar 2005 19:21:36 -0500 Subject: [Biococoa-dev] BCScanner In-Reply-To: References: Message-ID: <914ae27279cb1c51f5ab4c9da069d8c4@earthlink.net> On Mar 15, 2005, at 10:40 AM, John Timmer wrote: > I haven't had a chance to checkout the code you put in, but just a > quick > question: what's the advantage to having a scanner for digests, as > opposed > to just having the sequence finder return the array of all site ranges? > Just a gut response, but having one less intervening object would > improve > code efficiency and readability, so I'm wondering what having the > scanner > would provide. > > In no way am I saying we shouldn't eventually create a scanner, mind > you, > just wondering about its use in this case. > This dates back to an early discussion, where it was suggested that it would be nice if we have a NSScanner equivalent that deals with native BCSequences. I think Alex made the initial interface of BCScanner, I just started to add some implementation. See this thread: http://bioinformatics.org/pipermail/biococoa-dev/2004-September/ 000286.html If we use a plain NSScanner, we add two conversion steps between a BCSequence and its string which could add some overhead: BCSequence -> string -> NSScanner -> strings -> BCSequences. - Koen. From jtimmer at bellatlantic.net Tue Mar 15 19:29:26 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 15 Mar 2005 19:29:26 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: Message-ID: > > On Mar 15, 2005, at 4:34 PM, John Timmer wrote: > >> In answer to my own question, NSSet seems to be more efficient than an >> array >> for this use. Any thoughts on using one? We could either use an >> internal, >> private ivar only for tests such as this, or change the array to a set. 
>> Arrays and sets seem pretty readily convertible, and all this is in the >> BCSymbol class, so this shouldn't be a big deal. >> > > Where do you want to use an NSSet, as a return value for findSequence? > The advantage of the array is that the found sequences are in the > 'right order'. > Sorry for my lack of clarity. Shark says that over 30% of the execution time in the "findSequence" method is spent checking whether one symbol represents another. Currently, that's done by checking whether the submitted symbol occurs in the array of represented symbols. According to the docs, making the represented symbols a set instead of an array will speed this up significantly. Returning an array from the method doesn't enter into this issue, and definitely should not be changed. I may be obsessing about this, but my tests earlier today showed that the non-strict version of the code to take 4-5X the time to execute compared to the strict one. In a 1.2Kb sequence, it's the difference between barely perceptible and wondering whether something's broken. JT PS - Once symbol sets are done, a quick test for the symbol set used would also allow us to set the strict flag, even if the user hasn't done so, and speed up many cases, so let me know when it's done and in use. _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Tue Mar 15 19:38:36 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Tue, 15 Mar 2005 19:38:36 -0500 Subject: [Biococoa-dev] BCScanner In-Reply-To: <914ae27279cb1c51f5ab4c9da069d8c4@earthlink.net> Message-ID: > > This dates back to an early discussion, where it was suggested that it > would be nice if we have a NSScanner equivalent that deals with native > BCSequences. I think Alex made the initial interface of BCScanner, I > just started to add some implementation. 
See this thread:
>
> http://bioinformatics.org/pipermail/biococoa-dev/2004-September/000286.html
>
>
> If we use a plain NSScanner, we add two conversion steps between a
> BCSequence and its string which could add some overhead:
>
> BCSequence -> string -> NSScanner -> strings -> BCSequences.

Right, and I agree with that completely. But my thought was more as to
whether a scanner-type object is needed at all for digests, since scanners
are useful for reading ordered objects one at a time. The products of a
digest aren't necessarily ordered in any way. I would have thought a digest
should generate an array containing all fragments at once, through a process
like:

BCSequence
Processed by BCFindSequence -> array of ranges
BCSequence subSequenceInRange -> individual digest fragments

A scanner seems like overkill for this need.

JT

_______________________________________________
This mind intentionally left blank

From kvddrift at earthlink.net Tue Mar 15 20:17:40 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 15 Mar 2005 20:17:40 -0500
Subject: [Biococoa-dev] BCScanner
In-Reply-To: 
References: 
Message-ID: <5fb724569a5baff7a5923f26671ad0dc@earthlink.net>

On Mar 15, 2005, at 7:38 PM, John Timmer wrote:

> BCSequence
> Processed by BCFindSequence -> array of ranges
> BCSequence subSequenceInRange -> individual digest fragments
>
> A scanner seems like overkill for this need.
>

The nice thing about a scanner is that you can pass it a
symbolset/characterset containing all the symbols at which a sequence has
to be cut. At least this is how I use it for proteins in my own app. E.g.,
trypsin cuts at K and R; if we pass KR to findSequence, it will look for
that actual two-symbol sequence. If we pass it as a symbolset to a scanner,
it will look for each individual symbol, which is what we want.

- Koen.
From kvddrift at earthlink.net Tue Mar 15 20:20:47 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 15 Mar 2005 20:20:47 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: References: Message-ID: <1c8f30378167ddbe2e040c641b619453@earthlink.net> On Mar 15, 2005, at 7:29 PM, John Timmer wrote: > Sorry for my lack of clarity. Shark says that over 30% of the > execution > time in the "findSequence" method is spent checking whether one symbol > represents another. Currently, that's done by checking whether the > submitted symbol occurs in the array of represented symbols. > According to > the docs, making the represented symbols a set instead of an array will > speed this up significantly. > > Returning an array from the method doesn't enter into this issue, and > definitely should not be changed. > > I may be obsessing about this, but my tests earlier today showed that > the > non-strict version of the code to take 4-5X the time to execute > compared to > the strict one. In a 1.2Kb sequence, it's the difference between > barely > perceptible and wondering whether something's broken. > Ah, I see what you mean now, and yes, a BCSymbolSet could be much faster. I think the symbolsets are ready for use (is that right, Charles?). What's missing so far is that they have not been implemented to the BCSequence code yet. - Koen. From a.griekspoor at nki.nl Wed Mar 16 13:13:39 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Wed, 16 Mar 2005 19:13:39 +0100 Subject: [Biococoa-dev] Optimizations Message-ID: <4de7a9cff58290e661a4bebe193c86b5@nki.nl> On 16-mrt-05, at 1:29, John Timmer wrote: >> Where do you want to use an NSSet, as a return value for findSequence? >> The advantage of the array is that the found sequences are in the >> 'right order'. >> > > Sorry for my lack of clarity. Shark says that over 30% of the > execution > time in the "findSequence" method is spent checking whether one symbol > represents another. 
Currently, that's done by checking whether the > submitted symbol occurs in the array of represented symbols. > According to > the docs, making the represented symbols a set instead of an array will > speed this up significantly. It would make sense to turn the representedsymbols as a NSSet, there's no specific order to keep in mind, so go ahead. Be careful though how to implement this in combination with the aminoacid template plist we use. I don't believe NSSet can be used directly in a plist, so you perhaps have to do the conversion from the array you get from the plist... Alex ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From jtimmer at bellatlantic.net Wed Mar 16 14:24:55 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 16 Mar 2005 14:24:55 -0500 Subject: [Biococoa-dev] BCScanner In-Reply-To: <5fb724569a5baff7a5923f26671ad0dc@earthlink.net> Message-ID: >> BCSequence >> Pocessed by BCFindSequence -> array of ranges >> BCSequence subSequenceInRange -> individual digest fragments >> >> A scanner seems like overkill for this need. >> > > The nice thing of a scanner is that you can pass it a > symbolset/characterset, containing all the various locations where a > sequence has to be cut. At least this is how I use it for proteins in > my own app. Eg trypsin cuts at K and R, if we pass KR to findSequence, > it will look for that actual sequence. If we pass it as a symbolset to > a scanner, it will look for each individual symbol, which is what we > want. Ah, I hadn't thought of that case. 
I keep thinking in nucleotide terms, where every possible combination of nucleotides is represented by a single symbol. This might be another case where splitting the tool two ways (one for aa's, one for nt's) could be justified by performance profiling. JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Wed Mar 16 18:58:41 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 16 Mar 2005 18:58:41 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: <4de7a9cff58290e661a4bebe193c86b5@nki.nl> References: <4de7a9cff58290e661a4bebe193c86b5@nki.nl> Message-ID: On Mar 16, 2005, at 1:13 PM, Alexander Griekspoor wrote: > It would make sense to turn the representedsymbols as a NSSet, there's > no specific order to keep in mind, so go ahead. Be careful though how > to implement this in combination with the aminoacid template plist we > use. I don't believe NSSet can be used directly in a plist, so you > perhaps have to do the conversion from the array you get from the > plist... > Or maybe we should make it into a BCSymbolSet? The we can just pass a NSString of symbolChars to create the set. - Koen. From jtimmer at bellatlantic.net Wed Mar 16 19:25:26 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Wed, 16 Mar 2005 19:25:26 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: Message-ID: > > On Mar 16, 2005, at 1:13 PM, Alexander Griekspoor wrote: > >> It would make sense to turn the representedsymbols as a NSSet, there's >> no specific order to keep in mind, so go ahead. Be careful though how >> to implement this in combination with the aminoacid template plist we >> use. I don't believe NSSet can be used directly in a plist, so you >> perhaps have to do the conversion from the array you get from the >> plist... >> > > Or maybe we should make it into a BCSymbolSet? The we can just pass a > NSString of symbolChars to create the set. Maybe - I'll look into it. 
It depends on how deeply into the class structure I'd have to dig. If it turns out that I'd have to dig into every subclass's initialization code to do this, I'll probably just make an NSSet a private ivar so as not to disrupt everything we have working. If all the code I'd need to tweak is in the superclass, then replacing it with a SymbolSet should be okay, although it does have an expense in terms of overhead and the ability of people to follow through our code. I'll try to spend some time on it tomorrow. JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Wed Mar 16 19:43:29 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 16 Mar 2005 19:43:29 -0500 Subject: [Biococoa-dev] string definitions Message-ID: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> Hi, While looking at the alignment code (and trying to understand it ;-) I noticed the NSString definitions that Philipp put in. I think this is a good idea, I suggest we should also use those for reading the plists for the BCSymbols and other places. So instead of hardcoding something like: name = [[symbolInfo objectForKey:@"Name"] copy]; We could define const NSStrings. In this case @"Name" could be replaced by BCSymbolName, or something equivalent. I also suggest if we implement this, we do this in one general headerfile, instead of each individual file. what do you think? (BTW what is FOUNDATION_EXPORT, I didn't find it in the devdocs.) - Koen. From kvddrift at earthlink.net Wed Mar 16 19:45:59 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 16 Mar 2005 19:45:59 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> Message-ID: On Mar 15, 2005, at 6:06 PM, Philipp Seibel wrote: > yes it can. i didn't remove it, because the discussion about the > structure of alignments and alignment algorithms isn't finished yet. > That makes sense. 
BTW, could any of you recommend some good reading on understanding the
alignment coding? I sort of understand the basics, but when I look at the
code it's difficult for me to follow :(

thanks,

- Koen.

From charles.parnot at stanford.edu Thu Mar 17 00:15:58 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Wed, 16 Mar 2005 21:15:58 -0800
Subject: [Biococoa-dev] BCSymbolMapping (was: no subject)
In-Reply-To: <42857544500b21b42dd9f5913091fabf@nki.nl>
References: <42857544500b21b42dd9f5913091fabf@nki.nl>
Message-ID: 

Sorry I could not reply earlier...

About optimizing later: this is a very true statement, I actually brought it up several times, but the other important thing to keep in mind is that you still want to keep your code 'optimizable' later when you feel there is a chance something could be done, and not lock yourself into a difficult-to-change implementation. What you propose is not that bad, I have to say, but it still won't allow us to test different options. Specifically, we would probably want to test other mapping options if we find that the algorithm spends more than 20% of the time retrieving scores from the score matrix. And I am quite confident that will happen... But there is a very good chance that I am wrong, so we would have to ask Shark. And then test different mappings if necessary. To test different mappings, we would need BCSymbolMapping.

So here is what I propose:
* we try a little test program to see how much time is spent on the score retrieval for the alignment algorithm, say to align two sequences of 1000 bases and 2 sequences of 10000 bases?
* if the algorithm spends a lot of time there, then we implement BCSymbolMapping

I actually started writing the program by copying and pasting the code, but the alignment does not work. I will post the code in a separate message for Phil and you to have a look, because I have no idea how the algorithm works (Koen, you are not alone!).
Now, I still need to answer some of your points and continue the battle ;-) >Caching would be nice, but again, why not let the BCSequence do the job itself (no hassle with helper objects), it's also THE place to store the cache IMHO... If BCSequence takes care of the mapping, yes, sure. But if the mapping is dependent on the BCSymbolSet used for it, then no, because the symbol set may be different from the sequence symbol set. >>The whole idea of this class, again, would be to have a separate class that takes care of the mapping, and only of the mapping: >> >> objects ------> C ------> algorithm -------> C -------> Objects >> >>The algorithm should not know anything about the biology. I would not want to see anything like -whatevermatrix['A']['G']- in the middle of the algorithm. Having the mapping done in a separate class allows to write the algorithm like this: >Well, perhaps I'm more humanoid, but I like it better than whatevermatrix['0x00']['0x03']; Sorry it was not clear. My point was more that the algorithm should not know what an 'A' or a 'G' is. This is why you should not see whatevermatrix['A']['G'], and you should not see whatevermatrix['0x00']['0x03']. The algorithm could well be aligning the Bible with the BioCocoa framework code, and do the job and not care. If the mapping is not known from the algorithm, then no risk that some assumptions are made. This is what I really meant, just separating code. >Also it would change the code dramatically as well: >BCSymbolSet *set=....union of the symbol sets of seq 1 and 2... -> not necessary (unless we make the matrix creation dependent on the symbolset (see below) >BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set]; >-> not necessary >char *seq1=[mapping charMappingForSequence:sequenceObject1]; -> same >char *seq2=[mapping charMappingForSequence:sequenceObject2]; -> same >int **scores=[mapping charMappingForScoreMatrix:matrix]; -> int **scores = [BCAlignment matrixForSymbolSet: set]; >// .... 
run the algorithm... >BCSequenceAlignment *result=[BCAlignment alignementForSequences(int)count length:(int)length charBuffer:(char*)seqs]; -> Why make BCSymbolmapping the mother of alignments?! BCSymbolMapping would not be the mother of anybody! It should be able to map any of the BioCocoa objects into c arrrays. Maybe not BCSequenceAlignement, as they can be reconstructed from an array of sequences, I suppose. >>* if a score is an int or a float, the matrix is actually 128 x 128 x 4 = 64 kilobytes >That's right, but come on, 64kb that's nothing. This is bigger than the L1 cache of most macs out there. I believe this is the size of the L1 cache on the most recent G5. This is also 1/8 of the L2 cache. This means that the chip might even go back to RAM every time it tries to access the score matrix. And it will access the score matrix a lot, every time it compares two symbols. Like I said, Shark will tell. I just wanted to make my point about the size of that array, not in terms of RAM, but in terms of cache. >>* it is possible that int will be better than char because of the cast step? I know it is a big issue for float to int, but I don't know about char --> int; so maybe we will use int? >Same thing, let's make the thing and Shark will tell us. How will Shark tell us if we cannot easily change the mapping and compare implementations with everything else equal? >>Phil, hang in there. Let's not let these guys take us down ;-) >GRRRRR!!!! LOL, >Cheers mates! >Alex Now I am going to add a little nerve playing part...ah,ah,ah... We had a nice barbecue yesterday evening after the swimming-pool. It was so warm outside it was really a relief to get in the water. This is why I did not answer the email earlier... Or the days before. 
cheers :-)

charles

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Thu Mar 17 00:22:03 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Wed, 16 Mar 2005 21:22:03 -0800
Subject: [Biococoa-dev] testing alignements
In-Reply-To: 
References: <42857544500b21b42dd9f5913091fabf@nki.nl>
Message-ID: 

>I actually started writing the program by copying and pasting the code, but the alignment does not work. I will post the code in a separate message for Phil and you to have a look, because I have no idea how the algorithm works (Koen, you are not alone!).

Here is the code I used for testing purposes, but the alignment does not work, it seems

/*****************STARTING HERE***************/

/*
 *  TestAlignment.c
 *
 */

#define DIAG (idxB - 1) * lenA + (idxA - 1)
#define LEFT idxB * lenA + (idxA - 1)
#define UP (idxB -1) * lenA + idxA

typedef enum {
    kNone = 0,
    kDiagonal,
    kLeft,
    kUp
} Pointers;

void alignSequences(char *seqA, char *seqB, int lenA, int lenB)
{
    /* set up score matrix */
    int *scoreMatrix = (int *)malloc( sizeof( int ) * 128 * 128 );
    int match = 1;
    int mismatch = -1;
    int ii,jj;
    for ( ii = 0; ii < 128 ; ii++ )
        for ( jj = 0; jj < 128 ; jj++ )
            if (ii == jj )
                scoreMatrix[ii+128*jj] = match;
            else
                scoreMatrix[ii+128*jj] = mismatch;

    int gapCosts = -1;
    int *backtracking = (int *)malloc( sizeof( int ) * lenA * lenB );
    int *dynMatrix = (int *)malloc( sizeof( int ) * lenA * lenB );

    unsigned int idxA;
    unsigned int idxB;

    dynMatrix[ 0 ] = scoreMatrix[seqA[0]*128+seqB[0]];
    backtracking[ 0 ] = kNone;

    for ( idxA = 1; idxA < lenA; idxA++ ) {
        backtracking[ idxA ] = kLeft;
        dynMatrix[ idxA ] = idxA * gapCosts;
    }

    for ( idxB = 1; idxB < lenB; idxB++ ) {
        backtracking[ idxB * lenA ] = kUp;
        dynMatrix[ idxB * lenA ] = idxB * gapCosts;
    }

    for ( idxA = 1; idxA < lenA; idxA++ ) {
        for ( idxB = 1; idxB < lenB; idxB++ ) {
            unsigned int currPos = idxB * lenA + idxA;
            int substitutionScore = scoreMatrix[seqA[idxA]*128+seqB[idxB]];
            int diagScore = dynMatrix[ DIAG ] + substitutionScore;
            int rightScore = dynMatrix[ LEFT ] + gapCosts;
            int downScore = dynMatrix[ UP ] + gapCosts;

            if ( diagScore >= rightScore ) {
                if ( diagScore > downScore ) {
                    backtracking[ currPos ] = kDiagonal;
                    dynMatrix[ currPos ] = diagScore;
                }
                else {
                    backtracking[ currPos ] = kUp;
                    dynMatrix[ currPos ] = downScore;
                }
            }
            else {
                if ( rightScore > downScore ) {
                    backtracking[ currPos ] = kLeft;
                    dynMatrix[ currPos ] = rightScore;
                }
                else {
                    backtracking[ currPos ] = kUp;
                    dynMatrix[ currPos ] = downScore;
                }
            }
        }
    }

    int i = lenA;
    int j = lenB;
    int k = 0;
    char *a = ( char * ) malloc( (lenA + lenB) * sizeof(char));
    char *b = ( char * ) malloc( (lenA + lenB) * sizeof(char));

    while ( 1 ) {
        // escape when origin is reached
        if(backtracking[i * lenA + j ] == kNone) break;
        switch(backtracking[i * lenA + j ]){
            case kDiagonal :
                a[k] = seqA[i - 1];
                b[k] = seqB[j - 1];
                i--;
                j--;
                k++;
                break;
            case kLeft :
                a[k] = seqA[i - 1];
                b[k] = '-';
                i--;
                k++;
                break;
            case kUp :
                a[k] = '-';
                b[k] = seqB[j - 1];
                j--;
                k++;
                break;
        }
    }

    for(i=k-1;i>=0;i--)
        printf("%c",a[i]);
    printf("\n");
    for(j=k-1;j>=0;j--)
        printf("%c",b[j]);
    printf("\n");
}

void alignment1()
{
    char *seqA = "ATGTAGTCTGATGATGAGATGACGT";
    char *seqB = "ATGTCAGTCTGATGATGAGATGACGAT";
    alignSequences(seqA,seqB,25,27);
}

int main (int argc, const char * argv[])
{
    while(1<2) {
        alignment1();
    }
    return 0;
}

/*****************ENDING HERE***************/

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From a.griekspoor at nki.nl Thu Mar 17 05:05:35 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Thu, 17 Mar 2005
11:05:35 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: References: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> Message-ID: <85509f99521c0c915ff398e6d89760cb@nki.nl> Hi guys, I've (over-commented) the code of phil to give you an idea what's going on, perhaps it makes it easier to understand... I did not add it to the CVS don't worry. I leave it up to Phil to do the real documentation once the implementation is ready in his opinion, he can copy the comments or discard them from this file.... More to follow... Cheers, Alex -------------- next part -------------- A non-text attachment was scrubbed... Name: BCPairwiseAlignment.m Type: application/octet-stream Size: 6666 bytes Desc: not available URL: -------------- next part -------------- On 17-mrt-05, at 1:45, Koen van der Drift wrote: > > On Mar 15, 2005, at 6:06 PM, Philipp Seibel wrote: > >> yes it can. i didn't remove it, because the discussion about the >> structure of alignments and alignment algorithms isn't finished yet. >> > > That makes sense. > > BTW, could any of you recommend some good reading on understanding the > alignment coding? I sort of understand the basics, but when I look at > the code it's difficult to follow for me :( > > > thanks, > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? 
http://www.mekentosj.com/irnai ********************************************************* From a.griekspoor at nki.nl Thu Mar 17 06:10:24 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 17 Mar 2005 12:10:24 +0100 Subject: [Biococoa-dev] BCPairwiseAlignment Message-ID: On 17-mrt-05, at 1:45, Koen van der Drift wrote: > BTW, could any of you recommend some good reading on understanding the > alignment coding? I sort of understand the basics, but when I look at > the code it's difficult to follow for me :( The two books I've read that helped me a lot were: BLAST from O'Reilly. Centered around BLAST (obviously) it starts with a nice introduction on sequence alignments. But perhaps you can better read "An introduction Bioinformatics Algorithms" by Jones and Pevzner, very nice book describing many algorithms in a clear way. Check out the website http://www.bioalgorithms.info/ Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From kvddrift at earthlink.net Thu Mar 17 06:35:23 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 17 Mar 2005 06:35:23 -0500 Subject: [Biococoa-dev] BCPairwiseAlignment In-Reply-To: <881f42fc4a3dcad4fd24ee3f2244cb3e@nki.nl> References: <04f6e5330d6947d613b3dde3bf65fbea@earthlink.net> <881f42fc4a3dcad4fd24ee3f2244cb3e@nki.nl> Message-ID: <9eb4780ee4c92fc64e6692487a20cf1a@earthlink.net> On Mar 17, 2005, at 6:09 AM, Alexander Griekspoor wrote: > But perhaps you can better read "An introduction Bioinformatics > Algorithms" by Jones and Pevzner, very 
nice book describing many > algorithms in a clear way. Check out the website > http://www.bioalgorithms.info/ > Thanks - the website also has a nice section on molecular biology which is also helpful for me :) - Koen. From a.griekspoor at nki.nl Thu Mar 17 16:36:35 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Thu, 17 Mar 2005 22:36:35 +0100 Subject: [Biococoa-dev] BCSymbolMapping (was: no subject) Message-ID: <121e14a96f050a8fa4ea409ffd4dc1de@nki.nl> On 17-mrt-05, at 6:15, Charles PARNOT wrote: > Sorry I could not reply earlier... > > About optimizing later: this is a very true statement, I actually > brought it up several times, but the other important thing to keep in > mind is you still want to keep your code 'optimizable' later when you > feel there is a chance something could be done, and not lock you up in > a difficult to change implementation. True. > What you propose is not that bad, I have to say, but still won't allow > to test different options. Specifically, we would probably want to > test other mapping options if we find that the algorithm spends more > than 20% of the time retrieving scores from the score matrix. And I am > quite confident that will happen... But there is a very good chance > that I am wronf, so we would have to ask Shark. And then test > different mapping if necessary. To test different mapping, we would > need BCSymbolMapping. So here is what I propose: > * we try a little test program to see how much time is spent on the > score retrieval for the alignment algorithm, say to align two > sequences of 1000 bases and 2 sequences of 10000 bases? > * if the algorithm spends a lot of time there, then we implement > BCSymbolMapping Yes, exactly the idea, only start such implementations once you have actually seen that the problem is there. > > I actually started writing the program by copying and pasting the > code, but the alignement does not work. 
I will post the code on a > separate message for Phil and you to have a look, because I have no > idea how the algorithm (Koen, you are not alone!). Did my comments in the .m file help? > > Now, I still need to answer some of your points and continue the > battle ;-) Oh oh... > >> Caching would be nice, but again, why not let the BCSequence do the >> job itself (no hassle with helper objects), it's also THE place to >> store the cache IMHO... > If BCSequence takes care of the mapping, yes, sure. > But if the mapping is dependent on the BCSymbolSet used for it, then > no, because the symbol set may be different from the sequence symbol > set. True, if we implement the mapping the situation is different indeed. > >>> The whole idea of this class, again, would be to have a separate >>> class that takes care of the mapping, and only of the mapping: >>> >>> objects ------> C ------> algorithm -------> C -------> Objects >>> >>> The algorithm should not know anything about the biology. I would >>> not want to see anything like -whatevermatrix['A']['G']- in the >>> middle of the algorithm. Having the mapping done in a separate class >>> allows to write the algorithm like this: >> Well, perhaps I'm more humanoid, but I like it better than >> whatevermatrix['0x00']['0x03']; > Sorry it was not clear. My point was more that the algorithm should > not know what an 'A' or a 'G' is. This is why you should not see > whatevermatrix['A']['G'], and you should not see > whatevermatrix['0x00']['0x03']. The algorithm could well be aligning > the Bible with the BioCocoa framework code, and do the job and not > care. > If the mapping is not known from the algorithm, then no risk that some > assumptions are made. This is what I really meant, just separating > code. Yep that's a good point, again the same thing applies as above, IF we go for the mapping you're absolutely right. 
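
To make the separation discussed above a bit more concrete, here is a minimal sketch in plain C of what the mapping step could look like, assuming a BCSymbolMapping-like helper that turns symbols into small integer indices before the algorithm runs. The names SymbolMap, symbol_map_init, symbol_map_encode and score_matrix_bytes are hypothetical and are not part of BioCocoa. The algorithm would then only ever see indices 0..size-1, and the score matrix would need size x size entries rather than 128 x 128.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compact mapping (not BioCocoa API): every symbol of the
   alphabet gets a small index, so the algorithm never sees 'A' or 'G'. */
typedef struct {
    int size;                 /* number of symbols in the alphabet      */
    signed char index[256];   /* symbol -> compact index, -1 if unknown */
} SymbolMap;

/* Build the map from an alphabet string such as "ACGT"
   (assumes at most 127 distinct symbols). */
void symbol_map_init(SymbolMap *map, const char *alphabet)
{
    int i;
    memset(map->index, -1, sizeof map->index);
    map->size = (int)strlen(alphabet);
    for (i = 0; i < map->size; i++)
        map->index[(unsigned char)alphabet[i]] = (signed char)i;
}

/* Translate a sequence into compact indices.
   Returns 0 on success, -1 if a symbol is not in the alphabet. */
int symbol_map_encode(const SymbolMap *map, const char *seq, int len,
                      unsigned char *out)
{
    int i;
    for (i = 0; i < len; i++) {
        int idx = map->index[(unsigned char)seq[i]];
        if (idx < 0)
            return -1;
        out[i] = (unsigned char)idx;
    }
    return 0;
}

/* Memory needed for an int score matrix indexed by compact indices. */
size_t score_matrix_bytes(const SymbolMap *map)
{
    return (size_t)map->size * (size_t)map->size * sizeof(int);
}
```

With a four-letter DNA alphabet and 4-byte ints the matrix would take 4 x 4 x 4 = 64 bytes instead of 128 x 128 x 4 = 65536 bytes. Whether that difference matters in practice is still a question for Shark.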
> >> Also it would change the code dramatically as well: >> BCSymbolSet *set=....union of the symbol sets of seq 1 and 2... -> >> not necessary (unless we make the matrix creation dependent on the >> symbolset (see below) >> BCSymbolMapping *mapping=[BCSymbolMapping mappingWithSymbolSet:set]; >> -> not necessary >> char *seq1=[mapping charMappingForSequence:sequenceObject1]; -> same >> char *seq2=[mapping charMappingForSequence:sequenceObject2]; -> same >> int **scores=[mapping charMappingForScoreMatrix:matrix]; -> int >> **scores = [BCAlignment matrixForSymbolSet: set]; >> // .... run the algorithm... >> BCSequenceAlignment *result=[BCAlignment >> alignementForSequences(int)count length:(int)length >> charBuffer:(char*)seqs]; -> Why make BCSymbolmapping the mother of >> alignments?! > BCSymbolMapping would not be the mother of anybody! It should be able > to map any of the BioCocoa objects into c arrrays. Hmm, here I am again, I would then still vote to have convenience methods as well (that work via BCSymbolMapping objects). I just like to call [myBCSequence mapping] (which uses the BCSequence' symbolset by default) and [myBCSequence mappingWithSymbolSet:set] (which allows you to use a different set) instead of having to go through the helper object explicitly. Understand me well, I have to problem with the fact that it can be done as above, doing things manually can be handy to cache the object for instance (like the example above where you use mapping a few times in a row), but in general I hate the BioJava exorbitant use of factories and helper objects. NSString as an example, imagine having to great a helper object any time you want its filesystemRepresentation.... >>> * if a score is an int or a float, the matrix is actually 128 x 128 >>> x 4 = 64 kilobytes >> That's right, but come on, 64kb that's nothing. > > This is bigger than the L1 cache of most macs out there. I believe > this is the size of the L1 cache on the most recent G5. 
This is also > 1/8 of the L2 cache. This means that the chip might even go back to > RAM every time it tries to access the score matrix. And it will access > the score matrix a lot, every time it compares two symbols. Like I > said, Shark will tell. I just wanted to make my point about the size > of that array, not in terms of RAM, but in terms of cache. Ok, well I'm definitely off my terrain here, so stupid things may now follow. But if you know the position in the matrix you want the value for, and you have the pointer to the memory location, do you really have to feed the whole matrix into the processor's cache? [ignorant fool's talk] Why can't it just read an int from that memory position? [/ignorant fool's talk]. It's embarrassing to know so little about how these things work... > >>> * it is possible that int will be better than char because of the >>> cast step? I know it is a big issue for float to int, but I don't >>> know about char --> int; so maybe we will use int? >> Same thing, let's make the thing and Shark will tell us. > How will Shark tell us if we cannot easily change the mapping and > compare implementations with everything else equal? I thought the conclusion was to ask Shark if there's a problem at all centered around this step and then implement the mapping and see if it helps ;-) > > Now I am going to add a little nerve playing part...ah,ah,ah... We had > a nice barbecue yesterday evening after the swimming-pool. It was so > warm outside it was really a relief to get in the water. This is why I > did not answer the email earlier... Or the days before. That's playing unfair, shitty Dutch weather... 
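
On the test program that does not align: reading the TestAlignment.c posted earlier in this thread, two things look suspicious. The matrices are filled with the layout idxB * lenA + idxA, but the traceback reads backtracking[i * lenA + j] with i running over seqA, so it appears to walk a transposed matrix. The matrices also have no extra row and column for the empty prefix, although the traceback starts at (lenA, lenB) and reads seqA[i - 1]. Below is a minimal sketch of a global alignment that keeps one layout throughout. It is a rewrite under assumptions, not the version Phil committed to CVS: the name align_global, the inline match/mismatch scoring (1 / -1, gap -1) in place of the 128 x 128 score matrix, and the caller-supplied output buffers are all choices made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

enum { kNone = 0, kDiagonal, kLeft, kUp };

/* Global alignment of two NUL-terminated strings (hypothetical helper).
   outA and outB must each have room for strlen(seqA)+strlen(seqB)+1 chars.
   Returns the alignment length, or -1 if memory could not be allocated. */
int align_global(const char *seqA, const char *seqB, char *outA, char *outB)
{
    const int match = 1, mismatch = -1, gap = -1;
    int lenA = (int)strlen(seqA), lenB = (int)strlen(seqB);
    int cols = lenA + 1;   /* extra row and column for the empty prefix */
    size_t cells = (size_t)cols * (size_t)(lenB + 1);
    int *score = malloc(sizeof(int) * cells);
    unsigned char *back = malloc(cells);
    int i, j, k = 0, n;

    if (!score || !back) { free(score); free(back); return -1; }

    /* first row and column: only gaps */
    score[0] = 0; back[0] = kNone;
    for (i = 1; i <= lenA; i++) { score[i] = i * gap; back[i] = kLeft; }
    for (j = 1; j <= lenB; j++) { score[j * cols] = j * gap; back[j * cols] = kUp; }

    /* fill: cell (i, j) lives at j * cols + i everywhere */
    for (j = 1; j <= lenB; j++) {
        for (i = 1; i <= lenA; i++) {
            int pos  = j * cols + i;
            int diag = score[pos - cols - 1]
                     + (seqA[i - 1] == seqB[j - 1] ? match : mismatch);
            int left = score[pos - 1] + gap;      /* gap in B */
            int up   = score[pos - cols] + gap;   /* gap in A */
            if (diag >= left && diag >= up) { score[pos] = diag; back[pos] = kDiagonal; }
            else if (left >= up)            { score[pos] = left; back[pos] = kLeft; }
            else                            { score[pos] = up;   back[pos] = kUp; }
        }
    }

    /* traceback from the bottom-right corner, same indexing as the fill */
    i = lenA; j = lenB;
    while (back[j * cols + i] != kNone) {
        switch (back[j * cols + i]) {
        case kDiagonal: outA[k] = seqA[--i]; outB[k] = seqB[--j]; break;
        case kLeft:     outA[k] = seqA[--i]; outB[k] = '-';       break;
        default:        outA[k] = '-';       outB[k] = seqB[--j]; break;
        }
        k++;
    }

    /* the traceback produced the alignment backwards: reverse in place */
    for (n = 0; n < k / 2; n++) {
        char t = outA[n]; outA[n] = outA[k - 1 - n]; outA[k - 1 - n] = t;
        t = outB[n]; outB[n] = outB[k - 1 - n]; outB[k - 1 - n] = t;
    }
    outA[k] = outB[k] = '\0';

    free(score);
    free(back);
    return k;
}
```

Called with two buffers of sufficient size, for example align_global("ACGT", "AGT", a, b), it fills a and b with the two gapped rows of the alignment. Because the scoring is a plain equality test here, the routine could also serve as a baseline when timing a version that looks scores up in a matrix.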
Cheers, Alex > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Claiming that the Macintosh is inferior to Windows because most people use Windows, is like saying that all other restaurants serve food that is inferior to McDonalds ********************************************************* From jtimmer at bellatlantic.net Thu Mar 17 17:49:58 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Thu, 17 Mar 2005 17:49:58 -0500 Subject: [Biococoa-dev] BCSymbolMapping (was: no subject) In-Reply-To: <121e14a96f050a8fa4ea409ffd4dc1de@nki.nl> Message-ID: >> Now I am going to add a little nerve playing part...ah,ah,ah... We had >> a nice barbecue yesterday evening after the swimming-pool. It was so >> warm outside it was really a relief to get in the water. This is why I >> did not answer the email earlier... Or the days before. > That's playing unfair, shitty Dutch weather... Not just Dutch. New York's hardly been a pleasure to live in recently, weather wise. The worst thing is that, having done my PhD at Berkeley, I know exactly what I'm missing over in the Stanford area... 
JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Thu Mar 17 18:25:33 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 17 Mar 2005 18:25:33 -0500 Subject: [Biococoa-dev] BCSymbolMapping (was: no subject) In-Reply-To: References: Message-ID: <405cbc246d69d42e5b65b62f5f5f5753@earthlink.net> On Mar 17, 2005, at 5:49 PM, John Timmer wrote: >> That's playing unfair, shitty Dutch weather... > > Not just Dutch. New York's hardly been a pleasure to live in recently, > weather wise. The worst thing is that, having done my PhD at > Berkeley, I > know exactly what I'm missing over in the Stanford area... > We even had snow today in North Carolina :( That stupid groundhog was right. But the basketball makes up for that ;-) - Koen. From a.griekspoor at nki.nl Thu Mar 17 18:34:48 2005 From: a.griekspoor at nki.nl (a.griekspoor at nki.nl) Date: Fri, 18 Mar 2005 00:34:48 +0100 Subject: [Biococoa-dev] BCSymbolMapping (was: no subject) Message-ID: <667464FDA2C81D4CA79D7F3B728D10E73C151C@adsrv100.nki.nl> Strange things are happening, within 2 weeks we went from 50cm of snow (last time was 1979) and - 20 (never so cold in march since they started official measurements two centuries ago) to + 20 the coming weekend ;-) hmmm. Alex -----Original Message----- From: Koen van der Drift [mailto:kvddrift at earthlink.net] Sent: Fri 3/18/2005 12:25 AM To: John Timmer Cc: Alexander Griekspoor; BioCocoa Mailinglist Subject: Re: [Biococoa-dev] BCSymbolMapping (was: no subject) On Mar 17, 2005, at 5:49 PM, John Timmer wrote: >> That's playing unfair, shitty Dutch weather... > > Not just Dutch. New York's hardly been a pleasure to live in recently, > weather wise. The worst thing is that, having done my PhD at > Berkeley, I > know exactly what I'm missing over in the Stanford area... > We even had snow today in North Carolina :( That stupid groundhog was right. But the basketball makes up for that ;-) - Koen. 
From kvddrift at earthlink.net Thu Mar 17 18:39:16 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 17 Mar 2005 18:39:16 -0500 Subject: [Biococoa-dev] BCSymbolMapping (was: no subject) In-Reply-To: <667464FDA2C81D4CA79D7F3B728D10E73C151C@adsrv100.nki.nl> References: <667464FDA2C81D4CA79D7F3B728D10E73C151C@adsrv100.nki.nl> Message-ID: On Mar 17, 2005, at 6:34 PM, wrote: > Strange things are happening, within 2 weeks we went from 50cm of snow > (last time was 1979) and - 20 (never so cold in march since they > started official measurements two centuries ago) to + 20 the coming > weekend ;-) hmmm. > It's the revenge of the leprechauns :D - Koen. From kvddrift at earthlink.net Thu Mar 17 18:41:15 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 17 Mar 2005 18:41:15 -0500 Subject: [Biococoa-dev] string definitions In-Reply-To: <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> Message-ID: On Mar 17, 2005, at 4:17 PM, Alexander Griekspoor wrote: >> We could define const NSStrings. In this case @"Name" could be >> replaced by BCSymbolName, or something equivalent. I also suggest if >> we implement this, we do this in one general headerfile, instead of >> each individual file. >> >> what do you think? > > Yes! Indeed very nice as I also mentioned in the commented alignment > .m file. > I suggest we use BCStringDefinitions.h. If no-one objects, I will go ahead and add that file. - Koen. From a.griekspoor at nki.nl Fri Mar 18 02:11:51 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Fri, 18 Mar 2005 08:11:51 +0100 Subject: [Biococoa-dev] string definitions In-Reply-To: References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> Message-ID: Good plan! 
On 18-mrt-05, at 0:41, Koen van der Drift wrote: > > On Mar 17, 2005, at 4:17 PM, Alexander Griekspoor wrote: > >>> We could define const NSStrings. In this case @"Name" could be >>> replaced by BCSymbolName, or something equivalent. I also suggest if >>> we implement this, we do this in one general headerfile, instead of >>> each individual file. >>> >>> what do you think? >> >> Yes! Indeed very nice as I also mentioned in the commented alignment >> .m file. >> > > I suggest we use BCStringDefinitions.h. If no-one objects, I will go > ahead and add that file. > > > - Koen. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From biococoa at bioworxx.com Sat Mar 19 03:49:40 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Sat, 19 Mar 2005 09:49:40 +0100 Subject: [Biococoa-dev] testing alignements In-Reply-To: References: <42857544500b21b42dd9f5913091fabf@nki.nl> Message-ID: Sorry me writing so late, but i was in cologne for two days. I will comment the code soon, i promise. Your code doesn't work because you used an old version.... , there is a new version in the cvs since wednesday. If the new piece of code doesn't work, please tell. Phil Am 17.03.2005 um 06:22 schrieb Charles PARNOT: >> I actually started writing the program by copying and pasting the >> code, but the alignement does not work. 
I will post the code on a >> separate message for Phil and you to have a look, because I have no >> idea how the algorithm works (Koen, you are not alone!). > > Here is the code I used for testing purpose, but the alignement does > not work, it seems > > > /*****************STARTING HERE***************/ > > /* > * TestAlignment.c > * > */ > > #define DIAG (idxB - 1) * lenA + (idxA - 1) > #define LEFT idxB * lenA + (idxA - 1) > #define UP (idxB -1) * lenA + idxA > > typedef enum { > kNone = 0, > kDiagonal, > kLeft, > kUp > } Pointers; > > > void alignSequences(char *seqA, char *seqB, int lenA, int lenB) > { > > /* set up score matrix */ > int *scoreMatrix = (int *)malloc( sizeof( int ) * 128 * 128 ); > int match = 1; > int mismatch = -1; > int ii,jj; > for ( ii = 0; ii < 128 ; ii++ ) > for ( jj = 0; jj < 128 ; jj++ ) > if (ii == jj ) > scoreMatrix[ii+128*jj] = match; > else > scoreMatrix[ii+128*jj] = mismatch; > > int gapCosts = -1; > > int *backtracking = (int *)malloc( sizeof( int ) * lenA * lenB ); > int *dynMatrix = (int *)malloc( sizeof( int ) * lenA * lenB ); > > unsigned int idxA; > unsigned int idxB; > > dynMatrix[ 0 ] = scoreMatrix[seqA[0]*128+seqB[0]]; > backtracking[ 0 ] = kNone; > > for ( idxA = 1; idxA < lenA; idxA++ ) { > backtracking[ idxA ] = kLeft; > dynMatrix[ idxA ] = idxA * gapCosts; > } > > for ( idxB = 1; idxB < lenB; idxB++ ) { > backtracking[ idxB * lenA ] = kUp; > dynMatrix[ idxB * lenA ] = idxB * gapCosts; > } > > for ( idxA = 1; idxA < lenA; idxA++ ) { > for ( idxB = 1; idxB < lenB; idxB++ ) { > unsigned int currPos = idxB * lenA + idxA; > > int substitutionScore = scoreMatrix[seqA[idxA]*128+seqB[idxB]]; > int diagScore = dynMatrix[ DIAG ] + substitutionScore; > int rightScore = dynMatrix[ LEFT ] + gapCosts; > int downScore = dynMatrix[ UP ] + gapCosts; > > if ( diagScore >= rightScore ) { > if ( diagScore > downScore ) { > backtracking[ currPos ] = kDiagonal; > dynMatrix[ currPos ] = diagScore; > } > else { > backtracking[ currPos ] = 
kUp; > dynMatrix[ currPos ] = downScore; > } > } > else { > if ( rightScore > downScore ) { > backtracking[ currPos ] = kLeft; > dynMatrix[ currPos ] = rightScore; > } > else { > backtracking[ currPos ] = kUp; > dynMatrix[ currPos ] = downScore; > } > } > } > } > > int i = lenA; > int j = lenB; > int k = 0; > char *a = ( char * ) malloc( (lenA + lenB) * sizeof(char)); > char *b = ( char * ) malloc( (lenA + lenB) * sizeof(char)); > > while ( 1 ) { > // escape when origin is reached > if(backtracking[i * lenA + j ] == kNone) break; > > switch(backtracking[i * lenA + j ]){ > case kDiagonal : > a[k] = seqA[i - 1]; > b[k] = seqB[j - 1]; > i--; > j--; > k++; > break; > > case kLeft : > a[k] = seqA[i - 1]; > b[k] = '-'; > i--; > k++; > break; > > case kUp : > a[k] = '-'; > b[k] = seqB[j - 1]; > j--; > k++; > break; > } > } > > for(i=k-1;i>=0;i--) printf("%c",a[i]); > printf("\n"); > for(j=k-1;j>=0;j--) printf("%c",b[j]); > printf("\n"); > > } > > void alignment1() > { > char *seqA = "ATGTAGTCTGATGATGAGATGACGT"; > char *seqB = "ATGTCAGTCTGATGATGAGATGACGAT"; > alignSequences(seqA,seqB,25,27); > } > > > int main (int argc, const char * argv[]) { > while(1<2) { > alignment1(); > } > return 0; > } > > /*****************ENDING HERE***************/ > > -- > Help science go fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ > > Charles Parnot > charles.parnot at stanford.edu > > Room B157 in Beckman Center > 279, Campus Drive > Stanford University > Stanford, CA 94305 (USA) > > Tel +1 650 725 7754 > Fax +1 650 725 8021 > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > From mek at mekentosj.com Sat Mar 19 07:22:35 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 19 Mar 2005 13:22:35 +0100 Subject: [BioCocoa-dev] Peptides... 
Message-ID: <865134ceae5879a882c2faf3050ca357@mekentosj.com>

Hi everyone,

Someone in our institute asked me to create a little Cocoa app for a specific problem he had: given a certain protein sequence and the weight for a peptide reported by the mass spec, which peptide would be the closest match? Of course an ideal situation to use BioCocoa ;-)

I've used a modified version of our translation demo, which can be downloaded here:
http://www.mekentosj.com/temporary/Peptides.zip
Please check it out and perhaps we can add the example or parts of it to the framework.

Often using the framework is the best way to discover problems or come up with new ideas (so I guess I better start working on the demo app we plan to make), so here are a few suggestions/changes I'd like to feed back into the repository, but not before you have agreed to them...

1) In this case I have one sequence of which I want to calculate the mass of many peptides. So I wanted to cache the BCToolMassCalculator and ask it for the mass of a certain range. To avoid the overhead of having to create a subsequence first, instantiate a new BCToolMassCalculator from it and calculate the mass, I've added a simple method:

-(NSArray *)calculateMassForRange: (NSRange)aRange;

and changed the original calculateMass method to a convenience method:

-(NSArray *)calculateMass{
    return [self calculateMassForRange: NSMakeRange(0, [[self sequence] length])];
}

As BCToolMassCalculator uses BCToolSymbolCounter (very elegant Koen!) I've added the same method there as well:

- (NSCountedSet *)countSymbols;
- (NSCountedSet *)countSymbolsForRange: (NSRange)aRange;

2) I noticed that the BCSequenceView still needs a lot of work. For one, it can display line numbers (not so useful), but it displays symbol numbers wrongly. Also many things are not configurable yet (like the indent of the spacing). Also, none of the selection, marking etc. methods take the spaces into account, so for a start I added an override of the setSelectedRange: method:

- (void)setSelectedRange:(NSRange)charRange{
    int start = charRange.location;
    int end = charRange.location + charRange.length;
    start += start/10;
    end += end/10;
    [super setSelectedRange: NSMakeRange(start, end-start)];
}

Again, the 10 here should become configurable later. I'd like to spend some more time on this one and see what we can do... (it's also required for the demo app).

Finally, I think the program is acceptably fast for the purpose I made it for (it calculates the mass of approx. 1200 peptides a second on my 1.5 GHz G4), but I'm sure it can be a lot faster if we optimize the symbol counter, where it spends most of its time (doing a lot of object messaging). But first I'll let the guy who asked me to help play with it; I challenged him to come up with something dramatically faster ;-)

Cheers,
Alex

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

4Peaks - For Peaks, Four Peaks.
2004 Winner of the Apple Design Awards
Best Mac OS X Student Product
http://www.mekentosj.com/4peaks
*********************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 4085 bytes
Desc: not available
URL:

From kvddrift at earthlink.net Sat Mar 19 07:52:48 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 07:52:48 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <865134ceae5879a882c2faf3050ca357@mekentosj.com> References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> Message-ID: Alex, I can't build it, get a lot of build errors: /Users/koen/Desktop/Peptides/theController.m:10:34: BioCocoa/BCFoundation.h: No such file or directory /Users/koen/Desktop/Peptides/theController.m:11:30: BioCocoa/BCAppKit.h: No such file or directory I also linked to the framework on my HD, but that didn't help. - Koen. On Mar 19, 2005, at 7:22 AM, Alexander Griekspoor wrote: > Hi everyone, > > Someone in our institute asked me to create a little Cocoa app for a > specific problem he had. Given a certain protein sequence and the > weight for a peptide reported by the mass spec, which peptide would be > the closest match? > Of course an ideal situation to use BioCocoa ;-) > I've used a modified version of our translation demo, which can be > downloaded here: > http://www.mekentosj.com/temporary/Peptides.zip > Please check it out and perhaps we can add the example or parts of it > to the framework. > > Often using the framework is the best way to discover problems or come > up with new ideas (so I guess I better start working on the demo app > we plan to make), so here are a few suggestions/changes I'd like to > feed back in the repository but not before you have agreed to do so... > 1) In this case I have one sequence of which I want to calculate the > mass of many peptides. So I wanted to cache the BCToolMassCalculator > and ask it for the mass of a certain range. 
To avoid the overhead of > having to create a subsequence first, instantiate a new > BCToolMassCalculator from it and calculate the mass, I've added a > simple method: > -(NSArray *)calculateMassForRange: (NSRange)aRange; > and changed the original calculateMass method to a convenience method: > --(NSArray *)calculateMass{ > return [self calculateMassForRange: NSMakeRange(0, [[self sequence] > length])]; > } > As BCToolMassCalculator uses BCToolSymbolCounter (very elegant Koen!) > I've added the same method there as well: > - (NSCountedSet *)countSymbols; > - (NSCountedSet *)countSymbolsForRange: (NSRange)aRange; > > 2) I noticed that the BCSequenceView still needs a lot of works. For > one, it can display line numbers (not so useful), but it wrongly > displays symbol numbers. Also many things are not configurable yet > (like the indent of the spacing). Also, none of the selection, marking > etc methods take the spaces into account, so for a start I added an > override of the setSelectedRange: method: > - (void)setSelectedRange:(NSRange)charRange{ > int start = charRange.location; > int end = charRange.location + charRange.length; > start += start/10; > end += end/10; > [super setSelectedRange: NSMakeRange(start, end-start)]; > } > Again, the 10 here should become configurable later.. I'd like to > spend some time on this one further, and see what we can do... (it's > also required for the demo app). > > Finally, I think the program is acceptable fast for the purpose I made > it for (it calculates the mass of approx. 1200 peptides a second on my > 1.5Ghz G4), but I'm sure it can be a lot faster if we optimize the > symbolcounter where it spends most of its time (doing a lot of object > messaging). 
But first I'll let the guy who asked me to help play with > it, I challenged him to come up with something dramatically faster ;-) > Cheers, > Alex > > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > 4Peaks - For Peaks, Four Peaks. > 2004 Winner of the Apple Design Awards > Best Mac OS X Student Product > http://www.mekentosj.com/4peaks > > *********************************************************______________ > _________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev From mek at mekentosj.com Sat Mar 19 09:07:45 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sat, 19 Mar 2005 15:07:45 +0100 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> Message-ID: <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> Ok, don't ask why, but for some reason XCode can't find the build framework anymore if you open the project on another machine. The trick is to delete the reference in the xcode project and add the framework again (be sure to also add it to the Copy Files build phase in the Target to make the app standalone. It doesn't even help if you add the framework "relative to the project". Anyone a clue? Also, I forgot to build it as with the deployment build style so the build app doesn't work also. I changed that in the version now available at: http://www.mekentosj.com/temporary/Peptides.zip But you still have to re-add the framework again. 
The built app itself can be downloaded from:
http://www.mekentosj.com/temporary/Peptides_app.zip
if you just want to get an idea of the app...
Hope this helps,
Alex

On 19-mrt-05, at 13:52, Koen van der Drift wrote:

> Alex,
>
> I can't build it, get a lot of build errors:
>
> /Users/koen/Desktop/Peptides/theController.m:10:34:
> BioCocoa/BCFoundation.h: No such file or directory
> /Users/koen/Desktop/Peptides/theController.m:11:30:
> BioCocoa/BCAppKit.h: No such file or directory
>
> I also linked to the framework on my HD, but that didn't help.
>
> - Koen.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

iRNAi, do you?
http://www.mekentosj.com/irnai

*********************************************************

From kvddrift at earthlink.net Sat Mar 19 14:45:30 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 14:45:30 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID:

On Mar 19, 2005, at 9:07 AM, Alexander Griekspoor wrote:

> Ok, don't ask why, but for some reason XCode can't find the built
> framework anymore if you open the project on another machine. The
> trick is to delete the reference in the xcode project and add the
> framework again (be sure to also add it to the Copy Files build phase
> in the Target to make the app standalone).

Thanks, now it works! I like it a lot, also the icon :)

Maybe we can add a margin window, where we set the search range. In
mass spec you usually only search for peptides within a small window of
the mass you type in. The default value could be 1 Da (or a value in
ppm). Also, for peptides, monoisotopic mass should be the default.

- Koen.

From kvddrift at earthlink.net Sat Mar 19 15:41:56 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 15:41:56 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID: <05ac80d856b335b9e9626de2997075ca@earthlink.net>

On Mar 19, 2005, at 2:45 PM, Koen van der Drift wrote:

> Maybe we can add a margin window, where we set the search range. In
> mass spec you usually only search for peptides within a small window
> of the mass you type in.

Even better would be to leave the sign in the value for the setDiff
method. Only use abs to compare the difference with the value in the
window:

    float diff, searchdiff;

    searchdiff = [diffinput floatValue];  // need to add an input field to the window
    diff = mw - theweight;                // signed distance from target mw

    if (fabs(diff) <= searchdiff) {
        [peptide setDiff: diff];
        [peptide setRange: aRange];
        // add it to the results array, release it to counterbalance the
        // init as it is now retained by the array
        [results addObject: peptide];
        [peptide release];
        // yep, we have another one screened:
        counter++;
    }

- Koen.

From kvddrift at earthlink.net Sat Mar 19 16:17:21 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 16:17:21 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID:

Hi,

I found a bug in the display of the peptides: it is an off-by-one error
between 0-based and 1-based calculations. If I select a peptide in the
right pane, the peptide shown in the left pane is one position off (too
high). First I tried changing that in the search code, but that is not
right because all calculations are 0-based. So it has to be changed in
the display code:

    - (void)tableViewSelectionDidChange:(NSNotification *)aNotification{
        // is there a selection?
        if ([[aNotification object] selectedRow] == -1) return;
        else {
            NSRange aRange;
            // which result is selected?
            Result *res = [results objectAtIndex: [tv selectedRow]];
            // select the location of the peptide in the inputview
            aRange = [res range];
            aRange.location -= 1;
            [theInput setSelectedRange: aRange];
        }
    }

As you already mentioned in the comments, the code can be sped up by
using a search window based on the average MW of an amino acid
(= 110 Da), +/- 10% (or maybe 15%).

- Koen.
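
A minimal sketch of the two search ideas above, written in plain C so
that it also compiles as Objective-C. The helper names (length_window,
within_tolerance) and the constant name are hypothetical and are not
part of BioCocoa or the Peptides demo; the 110 Da average and the +/-
fraction are the figures quoted in the message. It keeps the signed
difference for display, uses its absolute value only for the tolerance
test, and estimates which peptide lengths are worth testing for a given
target mass.

```c
/* Average residue mass in Da, as quoted in the thread (approximate). */
#define AVG_RESIDUE_MASS 110.0

/* Hypothetical helper: estimate which peptide lengths (in residues) are
   worth testing for a positive target mass, allowing +/- `fraction`
   (e.g. 0.15) around the length implied by the average residue mass. */
void length_window(double target_mass, double fraction,
                   int *min_len, int *max_len)
{
    double expected = target_mass / AVG_RESIDUE_MASS;
    double low = expected * (1.0 - fraction);
    double high = expected * (1.0 + fraction);
    int lo = (int)low;   /* truncation rounds down for positive values */
    int hi = (int)high;
    if ((double)hi < high)
        hi++;            /* round the upper bound up */
    *min_len = (lo < 1) ? 1 : lo;
    *max_len = (hi < *min_len) ? *min_len : hi;
}

/* Hypothetical helper: store the signed difference (for display) and
   use its absolute value only for the tolerance test.
   Returns 1 when the mass lies within the tolerance, 0 otherwise. */
int within_tolerance(double mw, double target, double tolerance,
                     double *signed_diff)
{
    double diff = mw - target;
    *signed_diff = diff;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance;
}
```

Under these assumptions, a target of 1000 Da with a 15% margin would
restrict the search to peptides of 7 to 11 residues.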

From kvddrift at earthlink.net Sat Mar 19 16:56:56 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 16:56:56 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID:

On Mar 19, 2005, at 4:17 PM, Koen van der Drift wrote:

> I found a bug in the display of the peptides, it is an error between
> 0-based and 1-based calculations. If I select a peptide in the right
> pane, in the left pane the peptide shown is one position off (too
> high). First I tried changing that in the search code, but that is not
> right because all calculations are 0-based. So it has to be changed in
> the display code:

Sorry, this is not right either :). The change should be in Results.m:

    - (NSString *)description{
        return [NSString stringWithFormat: @"Peptide: %3.d-%3.d\t MW: %.3f\t (%.4f)",
                range.location + 1, range.location + range.length, mw, diff];
    }

- Koen.

From kvddrift at earthlink.net Sat Mar 19 16:57:56 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 16:57:56 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <865134ceae5879a882c2faf3050ca357@mekentosj.com>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com>
Message-ID: <675a758b212809b0dff2fb9a303b274c@earthlink.net>

On Mar 19, 2005, at 7:22 AM, Alexander Griekspoor wrote:

> Often using the framework is the best way to discover problems or come
> up with new ideas (so I guess I better start working on the demo app
> we plan to make), so here are a few suggestions/changes I'd like to
> feed back in the repository but not before you have agreed to do so...
> 1) In this case I have one sequence of which I want to calculate the
> mass of many peptides. So I wanted to cache the BCToolMassCalculator
> and ask it for the mass of a certain range. To avoid the overhead of
> having to create a subsequence first, instantiate a new
> BCToolMassCalculator from it and calculate the mass, I've added a
> simple method:
> - (NSArray *)calculateMassForRange: (NSRange)aRange;
> and changed the original calculateMass method to a convenience method:
> - (NSArray *)calculateMass{
>     return [self calculateMassForRange: NSMakeRange(0, [[self sequence] length])];
> }

Sounds good.

Some comments on the mass calculation. In mass spectrometry a molecule
can only be detected if it has a charge. With most modern MS equipment
the charge of peptides and proteins is obtained by addition of one or
more protons from the solvent. Usually this is denoted as [M+H]+ or
[M+2H]2+. Especially for peptides, the charge in general is 2+; a
peptide of MW 2000 will therefore be observed at
[2000 + 2 * protonmass]/2 = 1001 (in general,
m/z = [M + z * protonmass]/z for charge z). This is referred to as the
mass-over-charge ratio (m/z). So when a mass spectrum shows a peptide
at 1001, the peptide can actually have an uncharged mass of 1000
(singly charged) or 2000 (doubly charged). Looking at the spectrum will
reveal if a species is 'singly-charged' or 'doubly-charged'. So for our
search program, we need to take into account the charge of the peptide
that we are looking at. The mass of a proton is defined in
BCFoundationDefines.h: H_mono and H_ave. This code probably needs to be
added to the BCMassCalculator tool - I will look into that and will
start with just protons, but we need to keep in mind that there are
more possibilities, e.g. sodium.

> As BCToolMassCalculator uses BCToolSymbolCounter (very elegant Koen!)
> I've added the same method there as well:
> - (NSCountedSet *)countSymbols;
> - (NSCountedSet *)countSymbolsForRange: (NSRange)aRange;

Sounds good, too.

>
> 2) I noticed that the BCSequenceView still needs a lot of work.
> For one, it can display line numbers (not so useful), but it displays
> symbol numbers wrongly. Also many things are not configurable yet
> (like the indent of the spacing). Also, none of the selection, marking
> etc. methods take the spaces into account, so for a start I added an
> override of the setSelectedRange: method:
> - (void)setSelectedRange:(NSRange)charRange{
>     int start = charRange.location;
>     int end = charRange.location + charRange.length;
>     start += start/10;
>     end += end/10;
>     [super setSelectedRange: NSMakeRange(start, end-start)];
> }
> Again, the 10 here should become configurable later.. I'd like to
> spend some time on this one further, and see what we can do... (it's
> also required for the demo app).

As discussed a while ago, for our BCSequenceView it might eventually be
better to roll our own view, including an NSLayoutManager, etc. The
code that is currently used is sufficient for displaying a simple
sequence in an NSTextView.

- Koen.

From mek at mekentosj.com Sat Mar 19 17:48:30 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 19 Mar 2005 23:48:30 +0100
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID:

Hi Koen,

> Thanks, now it works! I like it a lot, also the icon :)

Great! Unfortunately we can't use the icon, as I ripped it from
somewhere on the internet ;-)

> Maybe we can add a margin window, where we set the search range. In
> mass spec you usually only search for peptides within a small window
> of the mass you type in.

That's a good idea...

> The default value is maybe 1 Da (or in ppm).

What do you mean exactly? Round the values to 1 Da?

> Also for peptides, monoisotopic mass should be the default.

That's easy to change.

I basically did my thing and forwarded the project to the guy I helped.
What shall we do? Are you interested in further modifying it?
Do you want to add it to the project (the easiest way to further update
it a bit)? It's pretty similar to the translation demo, however (which
in itself is not really finished either)...

> Even better would be to leave the sign in the value for the setDiff
> method. Only use abs to look for the difference with the value in the
> window

That's a good suggestion as well; we only need the absolute function in
the example you gave and in the compare: method actually...

> I found a bug in the display of the peptides, it is an error between
> 0-based and 1-based calculations. If I select a peptide in the right
> pane, in the left pane the peptide shown is one position off (too
> high). First I tried changing that in the search code, but that is not
> right because all calculations are 0-based. So it has to be changed in
> the display code:

Yep, I'll change that...

> As you already mentioned in the comments, the code can be sped up by
> using a search window of the average MW of an amino acid (= 110 Da),
> +/- 10% (or maybe 15%).

It would definitely make things go faster with larger protein
sequences.

> Some comments on the mass calculation. In mass spectrometry a molecule
> can only be detected if it has a charge. With most modern MS equipment
> the charge of peptides and proteins is obtained by addition of one or
> more protons from the solvent. Usually this is denoted as [M+H]+ or
> [M+2H]2+. Especially for peptides, the charge in general is 2+; a
> peptide of MW 2000, will therefore be observed as [2000 + 2*
> protonmass]/2 = 1001. This is referred to as the mass-over-charge
> ratio (m/z). So when a mass spectrum shows a peptide of 1001, the
> peptide can actually have an uncharged mass of 1000 or 2000. Looking
> at the spectrum will reveal if a species is 'singly-charged' or
> 'doubly-charged'. So for our search program, we need to take into
> account what the charge of the peptide is that we are looking at. The
> mass of a proton is defined in BCFoundationDefines.h: H_mono and
> H_ave. This code probably needs to be added to BCMassCalculator tool -
> I will look into that and will start with just protons, but we need to
> keep in mind that there are more possibilities, eg sodium.

I was already hoping you could give me some more insight, thanks!
Again, you have the knowledge to further improve the app a lot, so feel
free to do so if you're interested... A simple popup button of expected
modifications, or at least a matrix for singly or doubly charged, would
be easy to add and compensate for...

> As discussed a while ago, for our BCSequenceView, it might eventually
> be better to roll our own view, including a NSLayoutManager, etc. The
> code that is currently used is sufficient for displaying a simple
> sequence in an NSTextView.

Absolutely true. I think there are two possibilities here, however.
First, we can further improve the current BCSequenceView quite a bit
with a few relatively simple additions to take the spacing into
account. The second is indeed a far more difficult one: to create a
"native" BCSequence view, ideally one that can be further extended to
display alignments as well.

Finally, here are the initial comments from Tassos, the guy who
challenged me to create the project. It's interesting to see that he
indeed managed to get the thing working at 50x the speed our framework
manages to get (at least he claims to ;-). Shark tells me that most
time (65%) is already lost to object messaging in the symbol counter,
so perhaps that could deserve some optimization ;-)
The kudos for the added water molecule go to you! And the comment about
the framework to us all!

nice ... very nice ;-)

did not look at the code yet, but for the same sequence size I can do
it at less than 1 second instead of 30 secs, really curious to see what
you screw up in the coding ;-)

my speed test was looking for the same lengths as you did, which - due
to bad advice from me - was excessive by far. checking +/-15 is far
enough. (do I sound like Victor ?)

but, very nice, you added 18 to all fragments ;-) obvious error to make
in first implementation avoided ...!

Only one little bug, you display the wrong aa range (one aa to the
left) but you highlight the correct one ;-)

scientifically now, its clear that the ms accuracy you need can not be
achieved .... so, i can have fun coding a feature to input 1-3 aa from
N-term sequencing, while I try and speed up the code ... and maybe fix
the bugs ;-)

really thanx, its a really nice framework to play with !

A.

*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant

*********************************************************

From kvddrift at earthlink.net Sat Mar 19 17:54:22 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 17:54:22 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID:

On Mar 19, 2005, at 5:48 PM, Alexander Griekspoor wrote:

> I basically did my thing and forwarded the project to the guy I
> helped.
> What shall we do? Are you interested in further modifying it?
> Do you want to add it to the project (the easiest way to further
> update it a bit)? It's pretty similar to the translation demo however
> (which on itself is not too finished though)...

No problem, I will add it to the project, including my fixes. So if you
have more changes, from now on you can use the code in the framework.

- Koen.

From mek at mekentosj.com Sat Mar 19 17:55:46 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sat, 19 Mar 2005 23:55:46 +0100
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID: <1a278fbae50bc433edcddbccc740424d@mekentosj.com>

Super!

From a.griekspoor at nki.nl Sat Mar 19 18:10:16 2005
From: a.griekspoor at nki.nl (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 00:10:16 +0100
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <1a278fbae50bc433edcddbccc740424d@mekentosj.com>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> <1a278fbae50bc433edcddbccc740424d@mekentosj.com>
Message-ID: <964fd50b4913c9ef1020982ab8c1fb23@nki.nl>

I've committed the changes to BCSymbolCounter and BCToolMassCalculator
so the Peptides demo app should work with the current framework.
Alex

From kvddrift at earthlink.net Sat Mar 19 18:11:41 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 18:11:41 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <964fd50b4913c9ef1020982ab8c1fb23@nki.nl>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> <1a278fbae50bc433edcddbccc740424d@mekentosj.com> <964fd50b4913c9ef1020982ab8c1fb23@nki.nl>
Message-ID:

Super!

On Mar 19, 2005, at 6:10 PM, Alexander Griekspoor wrote:

> I've committed the changes to BCSymbolCounter and BCToolMassCalculator
> so the Peptides demo app should work with the current framework.

From kvddrift at earthlink.net Sat Mar 19 18:32:18 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sat, 19 Mar 2005 18:32:18 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID: <53ecb01a6ce610ba0710511be21cdfe2@earthlink.net>

On Mar 19, 2005, at 5:48 PM, Alexander Griekspoor wrote:

> Shark tells me that most time (65%) is already lost to object
> messaging in the symbol counter, so perhaps that could deserve some
> optimization ;-)

Try setting the #if 0 in calculateMassForRange. Then it won't use the
symbolCounter, but calculate the mass for each individual symbol. Maybe
that's faster.

- Koen.

From mek at mekentosj.com Sat Mar 19 18:34:58 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 00:34:58 +0100
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <53ecb01a6ce610ba0710511be21cdfe2@earthlink.net>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> <53ecb01a6ce610ba0710511be21cdfe2@earthlink.net>
Message-ID:

I'll test it and see what's the fastest method...
Alex
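
The per-symbol approach suggested above can be sketched in plain C
(which also compiles as Objective-C). This is only an illustration
under stated assumptions: residue_mass and peptide_mass are
hypothetical names, not BioCocoa methods, and the table holds the
standard monoisotopic residue masses rather than values read from
BCFoundationDefines.h. It sums one table lookup per symbol over a range
and adds one water, so no objects are messaged inside the loop.

```c
#include <stddef.h>

/* Monoisotopic mass of water (Da), added once per peptide. */
#define WATER_MONO 18.01056

/* Standard monoisotopic residue masses (Da) for the 20 common amino
   acids, by upper-case one-letter code; unknown symbols count as 0. */
static double residue_mass(char symbol)
{
    switch (symbol) {
    case 'G': return 57.02146;   case 'A': return 71.03711;
    case 'S': return 87.03203;   case 'P': return 97.05276;
    case 'V': return 99.06841;   case 'T': return 101.04768;
    case 'C': return 103.00919;  case 'L': return 113.08406;
    case 'I': return 113.08406;  case 'N': return 114.04293;
    case 'D': return 115.02694;  case 'Q': return 128.05858;
    case 'K': return 128.09496;  case 'E': return 129.04259;
    case 'M': return 131.04049;  case 'H': return 137.05891;
    case 'F': return 147.06841;  case 'R': return 156.10111;
    case 'Y': return 163.06333;  case 'W': return 186.07931;
    default:  return 0.0;
    }
}

/* Hypothetical C counterpart of calculateMassForRange: sums the residue
   masses of seq[location .. location + length - 1] and adds one water.
   The caller must make sure the range lies inside the sequence. */
double peptide_mass(const char *seq, size_t location, size_t length)
{
    double mass = WATER_MONO;
    size_t i;
    for (i = 0; i < length; i++)
        mass += residue_mass(seq[location + i]);
    return mass;
}
```

Because this works on a plain character buffer, the inner loop is just
a lookup and an addition per symbol, which is the part Shark reported
as dominated by object messaging. Whether it is actually faster than
the symbol counter path in the framework would still need to be
measured.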
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From mek at mekentosj.com Sun Mar 20 05:14:17 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Sun, 20 Mar 2005 11:14:17 +0100 Subject: [BioCocoa-Dev] Project update Message-ID: Guys, right now I'm changing some stuff in the Xcode project, please don't commit anything in the next hour or so... More to follow... Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! 
http://www.mekentosj.com/labassistant
*********************************************************

From a.perrakis at nki.nl Sun Mar 20 06:55:48 2005
From: a.perrakis at nki.nl (Anastassis Perrakis)
Date: Sun, 20 Mar 2005 12:55:48 +0100
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID:

>> Some comments on the mass calculation. In mass spectrometry a
>> molecule can only be detected if it has a charge. With most modern MS
>> equipment the charge of peptides and proteins is obtained by addition
>> of one or more protons from the solvent. Usually this is denoted as
>> [M+H]+ or [M+2H]2+. Especially for peptides, the charge in general is
>> 2+; a peptide of MW 2000, will therefore be observed as [2000 + 2*
>> protonmass]/2 = 1001. This is referred to as the mass-over-charge
>> ratio (m/z). So when a mass spectrum shows a peptide of 1001, the
>> peptide can actually have an uncharged mass of 1000 or 2000. Looking
>> at the spectrum will reveal if a species is 'singly-charged' or
>> 'doubly-charged'. So for our search program, we need to take into
>> account what the charge of the peptide is that we are looking at. The
>> mass of a proton is defined in BCFoundationDefines.h: H_mono and
>> H_ave.
>> This code probably needs to be added to BCMassCalculator tool
>> - I will look into that and will start with just protons, but we need
>> to keep in mind that there are more possibilities, eg sodium.
> I already hoped you could give me some more insight, thanks! Again,
> you have the knowledge to further improve the app a lot, feel free to
> do so if you're interested... A simple popupbutton of expected
> modifications, or at least a matrix for singly or double charged would
> be easy to add and compensated for...

Although all the above is 100% correct, the application I suggested to Alex was to find an already corrected mass in the sequence. Thus the charge correction is not needed; the mass is coming from 8-14 peaks typically and then is corrected.

What of course would be cool would be to read the scan, find the peaks, get the MW and correct. They sell such software for 12.000 Euro - believe it or not! I got an Excel sheet and a little c application using GSL - they took 30 mins each to make, but for both you need to type in the peaks.

To identify peaks is trivial - I just need to get my 3-D code to 1-D ;-) I can prototype that in c or f77 and then let Alex slow it down by a factor of 50 or so ... Can be a fun project if I get some time.

Do you guys have any MS tools for proteins already though?

>
>> As discussed a while ago, for our BCSequenceView, it might eventually
>> be better to roll our own view, including a NSLayoutManager, etc. The
>> code that is currently used is sufficient for displaying a simple
>> sequence in an NSTextView.
> Absolutely true, I think there are two possibilities here however.
> First we can further improve the current BCSequenceView quite a bit
> with a few relatively simple additions to take the spacing into
> account. The second is indeed a far more difficult one, to create a
> "native" BCSequence view. Ideally one that can be further extended to
> display alignments as well.
>
> Finally, here are the initial comments from Tassos, the guy who
> challenged me to create the project. It's interesting to see that he
> indeed managed to get the thing working at 50x the speed our framework
> manages to get (at least he claims to ;-).

f77 available on request ;-) ... hmm, but I was so lazy I did hardcode the mass you look for .. another 10 mins of programming to fix it.

A.

> Shark tells me that most time (65%) is already lost to object
> messaging in the symbol counter, so perhaps that could deserve some
> optimization ;-) The kudos for the added water molecule go to you! And
> the comment about the framework to us all!
>
>

... actually, it's not a water molecule: a residue has N/CA/C/O atoms, plus the side chain. When it's cleaved it will get an additional OH group at the C+ that's created. And, you need to count one extra H at the N-term ;-) ... well, here is your 'water' ;-)

(Alex has been doing too much FRET lately and he needs his chemistry reminded)

Ciao guys!

A.

> nice ... very nice ;-)
>
> did not look at the code yet, but for the same sequence size I can do
> it at less than 1 second instead of 30 secs, really curious to see
> what you screw up in the coding ;-)
> my speed test was looking for the same lengths as you did, which - due
> to bad advice from me - was excessive by far. checking +/-15 is far
> enough.
>
> (do I sound like Victor ?)
>
> but, very nice, you added 18 to all fragments ;-) obvious error to
> make in first implementation avoided ...! Only one little bug, you
> display the wrong aa range (one aa to the left) but you highlight the
> correct one ;-)
>
> scientifically now, it's clear that the ms accuracy you need can not be
> achieved ....
> so, i can have fun coding a feature to input 1-3 aa from N-term
> sequencing, while I try and speed up the code ... and maybe fix the
> bugs ;-)
>
> really thanx, it's a really nice framework to play with !
>
> A.
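The mass bookkeeping discussed in this thread (sum of residue masses, plus the OH at the C-terminus and the H at the N-terminus that together amount to one water, plus an optional charge correction) can be sketched in plain C. This is only an illustration under assumptions: the function names are hypothetical, the three residue masses and the water and proton masses are commonly quoted monoisotopic values typed in here, and none of it is taken from BCFoundationDefines.h or the mass calculator tool.

```c
#include <stddef.h>

/* Assumed monoisotopic masses in Da (illustrative, not BioCocoa's). */
#define WATER_MONO  18.010565
#define PROTON_MONO 1.007276

/* Hypothetical lookup covering only three residues for the example.
   Unknown letters contribute 0.0 in this sketch. */
double residue_mass(char aa)
{
    switch (aa) {
    case 'G': return 57.021464;
    case 'A': return 71.037114;
    case 'S': return 87.032028;
    default:  return 0.0;
    }
}

/* Neutral peptide mass: residues plus one water for the two termini. */
double peptide_mass(const char *seq)
{
    double m = WATER_MONO;
    for (size_t i = 0; seq[i] != '\0'; i++)
        m += residue_mass(seq[i]);
    return m;
}

/* m/z for [M + zH]z+. A charge of 0 (or less) returns the neutral
   mass unchanged, matching a "no charge" setting. */
double mass_over_charge(double neutral_mass, int z)
{
    if (z <= 0)
        return neutral_mass;
    return (neutral_mass + z * PROTON_MONO) / z;
}
```

With these assumed constants, mass_over_charge(2000.0, 2) and mass_over_charge(1000.0, 1) both come out just above 1001, which is the ambiguity described in the quoted mass spectrometry comments.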
>
> *********************************************************
> ** Alexander Griekspoor **
> *********************************************************
> The Netherlands Cancer Institute
> Department of Tumorbiology (H4)
> Plesmanlaan 121, 1066 CX, Amsterdam
> Tel: + 31 20 - 512 2023
> Fax: + 31 20 - 512 2029
> AIM: mekentosj at mac.com
> E-mail: a.griekspoor at nki.nl
> Web: http://www.mekentosj.com
>
> LabAssistant - Get your life organized!
> http://www.mekentosj.com/labassistant
>
> *********************************************************

From kvddrift at earthlink.net Sun Mar 20 07:13:01 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 07:13:01 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com>
Message-ID: <0ddbd668025a6d5e802958f8dbfbc7bd@earthlink.net>

On Mar 20, 2005, at 6:55 AM, Anastassis Perrakis wrote:

> Although all the above 100% correct the application I suggested to
> Alex was to find a already corrected mass in the sequence. Thus the
> charge correction is not needed, the mass is coming from 8-14 peaks
> typically and then is corrected.
>

I guess we need to add a charge popup button, including no charge (z=0).

> What of course would be cool would be to read the scan, find the
> peaks, get the MW and correct.
> They sell such software for 12.000 Euro - believe it or not !
> I got an Excel sheet and a little c application using GSL - they took
> 30 mins each to make, but for both
> you need to type in the peaks.
>
> To identify peaks is trivial - I just need to get my 3-D code to 1-D
> ;-)
> I can prototype that in c or f77 and then let Alex slow it down by a
> factor of 50 or so ...
> Can be a fun project if I get some time.
>
> Do you guys have any MS tools for proteins already though ?
>

Not yet, I have some code to do digesting in silico, but it is pending on the introduction of the BCScanner class. We had some discussions about this a few days ago. If you want, I can mail you (offline) my app that already does some of this (using a non BioCocoa class).

- Koen.

From mek at mekentosj.com Sun Mar 20 07:15:23 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 13:15:23 +0100
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <0ddbd668025a6d5e802958f8dbfbc7bd@earthlink.net>
References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> <0ddbd668025a6d5e802958f8dbfbc7bd@earthlink.net>
Message-ID: <4564f18cbafef4be422d00fe758b7cc6@mekentosj.com>

Watch out for this guy, before you know it he has sped up your code 50 times and you're the fool in this play... ;-)
Alex

On 20-mrt-05, at 13:13, Koen van der Drift wrote:

>
> On Mar 20, 2005, at 6:55 AM, Anastassis Perrakis wrote:
>
>> Although all the above 100% correct the application I suggested to
>> Alex was to find a already corrected mass in the sequence. Thus the
>> charge correction is not needed, the mass is coming from 8-14 peaks
>> typically and then is corrected.
>>
>
> I guess we need to add a charge popup button, including no charge
> (z=0).
>
>
>> What of course would be cool would be to read the scan, find the
>> peaks, get the MW and correct.
>> They sell such software for 12.000 Euro - believe it or not !
>> I got an Excel sheet and a little c application using GSL - they took
>> 30 mins each to make, but for both
>> you need to type in the peaks.
>>
>> To identify peaks is trivial - I just need to get my 3-D code to 1-D
>> ;-)
>> I can prototype that in c or f77 and then let Alex slow it down by a
>> factor of 50 or so ...
>> Can be a fun project if I get some time.
>>
>> Do you guys have any MS tools for proteins already though ?
>>
>
> Not yet, I have some code to do digesting in silico, but it is pending
> on the introduction of the BCScanner class. We had some discussions
> about this a few days ago. If you want, I can mail you (offline) my
> app that already does some of this (using a non BioCocoa class).
>
>
> - Koen.
>
>
>
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com

EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************

From mek at mekentosj.com Sun Mar 20 07:23:34 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 13:23:34 +0100
Subject: [BioCocoa-Dev] Project Update...cvs struggle
Message-ID: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com>

I HATE CVS!
My god, after struggling for more than an hour I give up. I've modified the project in several places:

Added Peptides demo-app
Modified Project:
- removed old targets
- updated everything to new targets
- changed plist names
- Reorganized some stuff

But to get it all in CVS is hell! First it started to complain that my pbx project file was inconsistent, which I managed to fix (don't ask how) and now everything seems to be ok, except that I can't get the english.lproj folder of my Peptides demo app in. So I've attached the files and ask your help to force CVS to eat the thing and place it at Examples/Peptides/. The Xcode project already knows of their presence, but obviously can't find them right now.
If you manage to do so, please let me know how I should have done that.

I hope the project works fine again; if you want to be sure, please do a fresh checkout to get all the modifications...
Also let me know if all targets work properly. Charles, could you check things like the SDKRoot variables and stuff so that we build against the proper SDK for all targets?

I also added the earlier discussed stringByAddingURLEscapesUsingEncoding to BCUtilStrings, which removes the build warning in BCUtilCGI and makes our framework fully 10.2 compatible ;-)

I have to get rid of some anger now... I'm off running...
Cheers,
Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: English.lproj.zip
Type: application/zip
Size: 19032 bytes
Desc: not available
URL:
-------------- next part --------------
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

iRNAi, do you?
http://www.mekentosj.com/irnai
*********************************************************

From kvddrift at earthlink.net Sun Mar 20 08:09:57 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 08:09:57 -0500
Subject: [BioCocoa-Dev] Project Update...cvs struggle
In-Reply-To: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com>
References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com>
Message-ID: <350f937dd69ab015b9c97e44c6eff340@earthlink.net>

On Mar 20, 2005, at 7:23 AM, Alexander Griekspoor wrote:

> So I've attached the files and ask your help to force CVS to eat the
> thing and place it at Examples/Peptides/ The Xcode project already
> knows of there present, but obviously can't find them right now..
> If you manage to do so, please let me know how I should have done that.
>

This is a big pita :) Even when ssh'ed into biococoa I cannot get it to work :( I'll try some more after breakfast.

- Koen.

From kvddrift at earthlink.net Sun Mar 20 08:25:21 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 08:25:21 -0500
Subject: [BioCocoa-Dev] Project Update...cvs struggle
In-Reply-To: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com>
References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com>
Message-ID:

I think I got it to work. Let me know if you get the right files in the right place. If so, I will tell you how I did it ;-)

- Koen.

On Mar 20, 2005, at 7:23 AM, Alexander Griekspoor wrote:

> I HATE CVS!
> My god, after struggling for more than an hour I give up. I've
> modified the project in several places:
> Added Peptides demo-app
> Modified Project:
> - removed old targets
> - updated everything to new targets
> - changed plist names
> - Reorganized some stuff
>
> But to get it all in CVS is hell!
First it started to complain that my > pbx project file was inconsistent, which I managed to fix (don't ask > how) and now everything seems to be ok, except that I can't get the > english.lproj folder of my Peptides demo app in. So I've attached the > files and ask your help to force CVS to eat the thing and place it at > Examples/Peptides/ The Xcode project already knows of there present, > but obviously can't find them right now.. > If you manage to do so, please let me know how I should have done that. > > I hope the project works fine again, if you want to be sure, please do > a fresh checkout the get all the modifications... Also let me know if > all targets work properly. Charles could you check things like the > SDKRoot variables and stuff so that we build against the proper SDK > for all targets > > I also added the earlier discussed > stringByAddingURLEscapesUsingEncoding to BCUtilStrings which removes > the build warning in BCUtilCGI and makes our framework fully 10.2 > compatible ;-) > > I have to get rid of some anger now... I'm off running... > Cheers, > Alex > > > > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > iRNAi, do you? 
> http://www.mekentosj.com/irnai
>
> *********************************************************
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev

From kvddrift at earthlink.net Sun Mar 20 09:19:41 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 09:19:41 -0500
Subject: [Biococoa-dev] IB question
Message-ID:

Hi,

I added a popup for the charge state in the Peptides example. Now how do I get the value? In the code I added:

    int chargeState = [[chargeStatePopup titleOfSelectedItem] intValue];

but it always returns zero.

thanks,

- Koen.

From mek at mekentosj.com Sun Mar 20 09:25:21 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 06:25:21 -0800
Subject: [Biococoa-dev] IB question
Message-ID: <200503200625.AA1531052154@mekentosj.com>

[chargeStatePopup indexOfSelectedItem] from the top of my head...
I can't check the CVS here at work, I'll let you know if you managed to get the lproj folder added...
Thanks!
Alex

---------- Original Message ----------------------------------
From: Koen van der Drift
Date: Sun, 20 Mar 2005 09:19:41 -0500

>Hi,
>
>I added a popup for the charge state in the Peptides example. Now how
>do I get the value? In the code I added:
>
> int chargeState = [[chargeStatePopup titleOfSelectedItem] intValue];
>
>but it always returns zero.
>
>
>thanks,
>
>- Koen.
>
>_______________________________________________
>Biococoa-dev mailing list
>Biococoa-dev at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/biococoa-dev
>

From kvddrift at earthlink.net Sun Mar 20 12:22:11 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 12:22:11 -0500
Subject: [Biococoa-dev] IB question
In-Reply-To: <200503200625.AA1531052154@mekentosj.com>
References: <200503200625.AA1531052154@mekentosj.com>
Message-ID: <6fcc795325ecf004cbdce7b5e92815ab@earthlink.net>

On Mar 20, 2005, at 9:25 AM, Alexander Griekspoor wrote:

> [chargeStatePopup indexOfSelectedItem] from the top of my head...
>

Tried that too, every setting gives '0'. I guess I missed something in IB.

- Koen.

From mek at mekentosj.com Sun Mar 20 12:33:10 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 18:33:10 +0100
Subject: [BioCocoa-Dev] Project Update...cvs struggle
In-Reply-To:
References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com>
Message-ID: <30c330165876e56a26cebb3c69237820@mekentosj.com>

Yes, it worked!! How how how? And the legal way? ;-)
Alex

On 20-mrt-05, at 14:25, Koen van der Drift wrote:

> I think I got it to work. Let me know if you get the right files in
> the right palce. If so, I will tell you how I did it ;-)
>
>
> - Koen.
>
>
>
> On Mar 20, 2005, at 7:23 AM, Alexander Griekspoor wrote:
>
>> I HATE CVS!
>> My god, after struggling for more than an hour I give up. I've
>> modified the project in several places:
>> Added Peptides demo-app
>> Modified Project:
>> - removed old targets
>> - updated everything to new targets
>> - changed plist names
>> - Reorganized some stuff
>>
>> But to get it all in CVS is hell! First it started to complain that
>> my pbx project file was inconsistent, which I managed to fix (don't
>> ask how) and now everything seems to be ok, except that I can't get
>> the english.lproj folder of my Peptides demo app in.
So I've attached >> the files and ask your help to force CVS to eat the thing and place >> it at Examples/Peptides/ The Xcode project already knows of there >> present, but obviously can't find them right now.. >> If you manage to do so, please let me know how I should have done >> that. >> >> I hope the project works fine again, if you want to be sure, please >> do a fresh checkout the get all the modifications... Also let me know >> if all targets work properly. Charles could you check things like the >> SDKRoot variables and stuff so that we build against the proper SDK >> for all targets >> >> I also added the earlier discussed >> stringByAddingURLEscapesUsingEncoding to BCUtilStrings which removes >> the build warning in BCUtilCGI and makes our framework fully 10.2 >> compatible ;-) >> >> I have to get rid of some anger now... I'm off running... >> Cheers, >> Alex >> >> >> >> ********************************************************* >> ** Alexander Griekspoor ** >> ********************************************************* >> The Netherlands Cancer Institute >> Department of Tumorbiology (H4) >> Plesmanlaan 121, 1066 CX, Amsterdam >> Tel: + 31 20 - 512 2023 >> Fax: + 31 20 - 512 2029 >> AIM: mekentosj at mac.com >> E-mail: a.griekspoor at nki.nl >> Web: http://www.mekentosj.com >> >> iRNAi, do you? 
>> http://www.mekentosj.com/irnai
>>
>> *********************************************************
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>
>
>
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant
*********************************************************

From mek at mekentosj.com Sun Mar 20 12:34:08 2005
From: mek at mekentosj.com (Alexander Griekspoor)
Date: Sun, 20 Mar 2005 18:34:08 +0100
Subject: [Biococoa-dev] IB question
In-Reply-To: <6fcc795325ecf004cbdce7b5e92815ab@earthlink.net>
References: <200503200625.AA1531052154@mekentosj.com> <6fcc795325ecf004cbdce7b5e92815ab@earthlink.net>
Message-ID: <09a81641b24964671349a5a17496f1cd@mekentosj.com>

Indeed sounds like a non-connected popupbutton...
alex

On 20-mrt-05, at 18:22, Koen van der Drift wrote:

>
> On Mar 20, 2005, at 9:25 AM, Alexander Griekspoor wrote:
>
>> [chargeStatePopup indexOfSelectedItem] from the top of my head...
>>
>
> Tried that too, every setting gives '0'. I guess I missed something in
> IB.
>
> - Koen.
>
>
>
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com

Microsoft is not the answer,
Microsoft is the question,
NO is the answer
*********************************************************

From kvddrift at earthlink.net Sun Mar 20 12:44:28 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 12:44:28 -0500
Subject: [Biococoa-dev] IB question
In-Reply-To: <09a81641b24964671349a5a17496f1cd@mekentosj.com>
References: <200503200625.AA1531052154@mekentosj.com> <6fcc795325ecf004cbdce7b5e92815ab@earthlink.net> <09a81641b24964671349a5a17496f1cd@mekentosj.com>
Message-ID:

On Mar 20, 2005, at 12:34 PM, Alexander Griekspoor wrote:

> Indeed sounds like a non-connected popupbutton...
>

Well, it was connected. However, I used a different name in IB and theController ;-) Now it is working, and I will commit some changes soon.

- Koen.

From kvddrift at earthlink.net Sun Mar 20 12:51:00 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 12:51:00 -0500
Subject: [BioCocoa-Dev] Project Update...cvs struggle
In-Reply-To: <30c330165876e56a26cebb3c69237820@mekentosj.com>
References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com>
Message-ID:

On Mar 20, 2005, at 12:33 PM, Alexander Griekspoor wrote:

> Yes, it worked!! How how how? And the legal way? ;-)
> Alex
>
>

This is not for the faint of heart, and all from the command line.

First I ssh'ed into the Biococoa directory at bioinformatics.org. I completely removed the English.lproj folder from the repository.

Then I went to my HD, and first created the enclosing folder (English.lproj/), added and committed it to cvs. Then I cd'ed into that folder, and added the next item. Be aware though, MainMenu is a folder as well. So I created an empty MainMenu.nib folder, add + commit to cvs, and again added one by one the three items (info.nib, classes.nib, and keyedobjects.nib). Because keyedobjects.nib is a binary file, I used the -kb flag for the cvs commands.

In the file you mailed there were some items named ..java.., which I did not use.

Finally, I had to restore the right MainMenu.nib file in the translation demo. I think with your reorganization you by accident used the one from the Peptides demo ;-)

- Koen.

From kvddrift at earthlink.net Sun Mar 20 20:41:58 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Sun, 20 Mar 2005 20:41:58 -0500
Subject: [Biococoa-dev] string definitions
In-Reply-To:
References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl>
Message-ID: <99b137d484bbe47ee1b596a16c79cb6a@earthlink.net>

Hi,

I was adding the string definitions to some more source files, and ran into a weird build error. I can add #import "BCStringDefinitions.h" to one file (BCSymbol.m in this case), and use the replacement properties for various strings.
However, if I then add #import "BCStringDefinitions.h" to the next file (BCAminoAcid.m), I get linker errors such as:

ld: multiple definitions of symbol _BCSymbolpKaProperty
/Users/koen/Documents/Development/Active Projects/FrameWorks/BioCocoa/build/BioCocoa.build/BioCocoa.build/Objects-normal/ppc/BCAminoAcid.o definition of _BCSymbolpKaProperty in section (__DATA,__const)
/Users/koen/Documents/Development/Active Projects/FrameWorks/BioCocoa/build/BioCocoa.build/BioCocoa.build/Objects-normal/ppc/BCSymbol.o definition of _BCSymbolpKaProperty in section (__DATA,__const)
ld: warning prebinding disabled because dependent library: /Developer/SDKs/MacOSX10.2.8.sdk/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit can't be searched

Does anyone have more knowledge about linkers, and hopefully know how to fix this? The file /Developer/SDKs/MacOSX10.2.8.sdk/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit is present on my system.

thanks,

- Koen.

On Mar 18, 2005, at 2:11 AM, Alexander Griekspoor wrote:

> Good plan!
> On 18-mrt-05, at 0:41, Koen van der Drift wrote:
>
>>
>> On Mar 17, 2005, at 4:17 PM, Alexander Griekspoor wrote:
>>
>>>> We could define const NSStrings. In this case @"Name" could be
>>>> replaced by BCSymbolName, or something equivalent. I also suggest
>>>> if we implement this, we do this in one general headerfile, instead
>>>> of each individual file.
>>>>
>>>> what do you think?
>>>
>>> Yes! Indeed very nice as I also mentioned in the commented alignment
>>> .m file.
>>>
>>
>> I suggest we use BCStringDefinitions.h. If no-one objects, I will go
>> ahead and add that file.
>>
>>
>> - Koen.
>> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ************************************************************** > ** Alexander Griekspoor ** > ************************************************************** > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > MacOS X: The power of UNIX with the simplicity of the Mac > > *************************************************************** > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From charles.parnot at stanford.edu Mon Mar 21 00:16:53 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 20 Mar 2005 21:16:53 -0800 Subject: [Biococoa-dev] testing alignements In-Reply-To: References: <42857544500b21b42dd9f5913091fabf@nki.nl> Message-ID: At 9:49 AM +0100 3/19/05, Philipp Seibel wrote: >Sorry me writing so late, but i was in cologne for two days. >I will comment the code soon, i promise. >Your code doesn't work because you used an old version.... , there is a new version in the cvs since wednesday. >If the new piece of code doesn't work, please tell. > >Phil Thanks! I profiled the code in Shark (on my old iMac G3 400), and found it spends ~35% of the time reading the score matrix. I don't know if it is worth changing the char mapping to find if things are improved? What do you think, guys? 
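If reading the score matrix goes through a character-to-index mapping, one common alternative is a precomputed 256-entry lookup table, so that each score lookup is just two array reads. A minimal sketch in C follows. It assumes a four-letter alphabet and a square integer matrix; the names, the alphabet and the choice to score unknown characters as 0 are all hypothetical and not taken from the alignment code in CVS.

```c
#include <string.h>

#define ALPHABET      "ACGT"  /* assumed alphabet, for illustration only */
#define ALPHABET_SIZE 4

/* Fill map so that map[c] is the row/column index of character c in
   ALPHABET, or -1 for characters outside the alphabet. */
void build_index_map(signed char map[256])
{
    memset(map, -1, 256);
    for (int i = 0; i < ALPHABET_SIZE; i++)
        map[(unsigned char)ALPHABET[i]] = (signed char)i;
}

/* Score a pair of characters with two table reads and one matrix read.
   Characters outside the alphabet score 0 in this sketch. */
int score(int matrix[ALPHABET_SIZE][ALPHABET_SIZE],
          const signed char map[256], char a, char b)
{
    int i = map[(unsigned char)a];
    int j = map[(unsigned char)b];
    if (i < 0 || j < 0)
        return 0;
    return matrix[i][j];
}
```

The map is built once per alignment; whether this removes the ~35% reported by Shark would need to be profiled again.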
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Mon Mar 21 00:26:20 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 20 Mar 2005 21:26:20 -0800 Subject: [POSSIBLE VIRUS:###] [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> Message-ID: At 1:23 PM +0100 3/20/05, Alexander Griekspoor wrote: >I HATE CVS! Part of the problem is Xcode. I gave up on cvs integration in Xcode for the BioCocoa project, and now went back to CVL, a GUI for CVS which is very nice indeed (interestingly, same guys as OCUnit). I can do project-wide update (include automatically new folders) and commit easily with recursive behavior, add folders, files,... The only thing is you need to close your Xcode project if the main file was changed. Nib files are folders and are therefore visualized as such. The same with the project file. And files not seen by Xcode are handled too, of course, which helps in some situations. 
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Mon Mar 21 00:42:42 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 20 Mar 2005 21:42:42 -0800 Subject: [Biococoa-dev] string definitions In-Reply-To: <99b137d484bbe47ee1b596a16c79cb6a@earthlink.net> References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> <99b137d484bbe47ee1b596a16c79cb6a@earthlink.net> Message-ID: >Hi, > >I was adding the stringdefinitions to some more source files, and ran into a weird build error. I can add #import "BCStringDefinitions.h" to one file (BCSymbol.m in this case), and use the replacement properties for various strings. However, if I then add #import "BCStringDefinitions.h" to the next file (BCAminoAcid.m), I get linker errors such as: > >ld: multiple definitions of symbol _BCSymbolpKaProperty >/Users/koen/Documents/Development/Active Projects/FrameWorks/BioCocoa/build/BioCocoa.build/BioCocoa.build/Objects-normal/ppc/BCAminoAci<...snip...> >thanks, > >- Koen. > You can't define the value of a variable in a header and then import that header from multiple files. If you do, the compiler creates one copy of the variable for each implementation file, that is, one in each compiled '.o' file, and the linker then finds multiple definitions of the same symbol. What you are supposed to do is declare those variables in the header using extern, so every file that imports the header knows that these variables will exist, without defining them again. And then the values should be set in one implementation file. Constants have to be associated with one particular class. It should probably be in BCSymbol.m.
So the header should have extern NSString *BCSymbolNameProperty; and the implementation NSString *BCSymbolNameProperty=@"Name"; charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From a.griekspoor at nki.nl Mon Mar 21 03:27:28 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 21 Mar 2005 09:27:28 +0100 Subject: [POSSIBLE VIRUS:###] [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> Message-ID: Well I used CVL yesterday but that didn't help much either, for instance it refused to get the English.lproj folder in as well. Another such irritations, it started to mark almost every folder as "need update" because it didn't know how to handle DS_Store files ?! But perhaps this was all because XCode had already screwed up for me... For simple updates/modifications XCode is fine, but the moment you want to add things to the repository it's hell. Charles, have you completely disabled CVS in XCode? Alex On 21-mrt-05, at 6:26, Charles PARNOT wrote: > At 1:23 PM +0100 3/20/05, Alexander Griekspoor wrote: >> I HATE CVS! > > > Part of the problem is Xcode. I gave up on cvs integration in Xcode > for the BioCocoa project, and now went back to CVL, a GUI for CVS > which is very nice indeed (interestingly, same guys as OCUnit). I can > do project-wide update (include automatically new folders) and commit > easily with recursive behavior, add folders, files,... The only thing > is you need to close your Xcode project if the main file was changed. > Nib files are folders and are therefore visualized as such. The same > with the project file. And files not seen by Xcode are handled too, of > course, which helps in some situations. 
> > charles > > -- > Help science go fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ > > Charles Parnot > charles.parnot at stanford.edu > > Room B157 in Beckman Center > 279, Campus Drive > Stanford University > Stanford, CA 94305 (USA) > > Tel +1 650 725 7754 > Fax +1 650 725 8021 > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? http://www.mekentosj.com/irnai ********************************************************* From mek at mekentosj.com Mon Mar 21 03:30:00 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 21 Mar 2005 09:30:00 +0100 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> Message-ID: > Finally, I had to restore the right MainMenu.nib file in the > translation demo. I think with your reorganization you by accident > used the one from the Peptides demo ;-) Yeah, now I remember something like that indeed, the trick is to FIRST copy the files in the proper folder, then add them to the XCode project. Initially I just dropped the files from another location into the project, resulting in all files being copied in the root. Then I had to move them and somewhere that is where the mistake must have happened. Pfeww, I'm glad this is all over and everything works now. No one of you had troubles with the renames/moved plist files/targets? Great! Alex > > > - Koen. 
> > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From a.griekspoor at nki.nl Mon Mar 21 03:49:54 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 21 Mar 2005 09:49:54 +0100 Subject: [Biococoa-dev] testing alignements In-Reply-To: References: <42857544500b21b42dd9f5913091fabf@nki.nl> Message-ID: <5d67fd2141deb19f55c9f9a3766aaaba@nki.nl> > I profiled the code in Shark (on my old iMac G3 400), and found it > spends ~35% of the time reading the score matrix. I don't know if it > is worth changing the char mapping to find if things are improved? > What do you think, guys? I would say that depends on a number of things I guess. Let me explain:
- How well is the code performing right now? Is it slow?
- Is this the only major part the code spends its time in, with the rest fragmented? Then yes. If there's something else the algorithm spends 50% of its time in, I would first focus on that process.
- Are there real reasons to believe that remapping will significantly speed things up? Then it's worth trying (don't forget to include the time the remapping itself takes, for a fair comparison).
- Finally, and perhaps most important, how much would remapping help in other implementations? I don't want to sound too picky but, although very nice, the current algorithms we have still have the classical memory limits. So if we can't run large alignments because we run out of memory, why spend all the energy on optimizing speed instead of implementing an algorithm with subquadratic memory requirements?
The real question then becomes, do we need the mapping there? Phil can probably tell... Cheers, Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* From kvddrift at earthlink.net Mon Mar 21 06:39:49 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 21 Mar 2005 06:39:49 -0500 Subject: [Biococoa-dev] string definitions In-Reply-To: References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> <99b137d484bbe47ee1b596a16c79cb6a@earthlink.net> Message-ID: <748fb4ed438d490f2412c2e2c2686d46@earthlink.net> On Mar 21, 2005, at 12:42 AM, Charles PARNOT wrote: > You can't define the value of a variable in a header and then call > that header from multiple file. Otherwise, the compiler creates one > variable for each implementation file and each compiled '.o' piece of > code. What you are supposed to do is declare those variables in the > header using extern, so the linker knows that these variables will > exist. And then the values should be set in an implementation file. > Constants have to be associated with one particular class. It should > probably be in BCSymbol.m. > > So the header should have > extern NSString *BCSymbolNameProperty; > > and the implementation > NSString *BCSymbolNameProperty=@"Name"; > Well then that would defeat the purpose of one central file with all the strings defines, some of which might occur in more than one file. Maybe there is another construction possible to do such a thing? thanks, - Koen. 
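One construction that keeps a single central header is sketched below, under some assumptions: the header only declares the constants with extern, and one companion implementation file defines them. The three "files" are shown in one listing for brevity. The file name BCStringDefinitions.m, the "pKa" value, the added const qualifiers, and the helper BCDemoNameFromProperties are illustrative choices made for this sketch, not the project's actual code.

```objc
#import <Foundation/Foundation.h>

// ---- BCStringDefinitions.h (central header, imported wherever needed) ----
// 'extern' only declares the names; no storage is created here, so the
// header can be imported from any number of .m files.
extern NSString * const BCSymbolNameProperty;
extern NSString * const BCSymbolpKaProperty;

// ---- BCStringDefinitions.m (hypothetical; compiled exactly once) ----
// The single place where the values are set.
// The "pKa" value is an assumption made for this sketch.
NSString * const BCSymbolNameProperty = @"Name";
NSString * const BCSymbolpKaProperty = @"pKa";

// ---- any client file, e.g. BCAminoAcid.m ----
// Hypothetical helper: looks up a value through the shared constant
// instead of repeating the literal @"Name".
NSString *BCDemoNameFromProperties(NSDictionary *properties)
{
    return [properties objectForKey:BCSymbolNameProperty];
}
```

Because only the definitions file creates storage, importing the header from both BCSymbol.m and BCAminoAcid.m should no longer produce duplicate symbols at link time.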
From charles.parnot at stanford.edu Mon Mar 21 14:52:22 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Mon, 21 Mar 2005 11:52:22 -0800 Subject: [Biococoa-dev] testing alignements In-Reply-To: <5d67fd2141deb19f55c9f9a3766aaaba@nki.nl> References: <42857544500b21b42dd9f5913091fabf@nki.nl> <5d67fd2141deb19f55c9f9a3766aaaba@nki.nl> Message-ID: At 9:49 +0100 3/21/05, Alexander Griekspoor wrote: >>I profiled the code in Shark (on my old iMac G3 400), and found it spends ~35% of the time reading the score matrix. I don't know if it is worth changing the char mapping to find if things are improved? What do you think, guys? > >I would say that depends on a number of things I guess. Let me explain: >- how well is the code performing right now? Is it slow? I don't know what fast or slow is supposed to be ;-) Faster than my eye can follow! >- is this the only major part the code spends it time in and the rest is fragmented? Then yes, if there's something else where the algorithm stays 50% of the time in, I would first focus on that process Yes, this is the situation at this point. The rest of the time is spread out over several instructions. >- are there real reasons to believe that remapping will significantly speed up things, then it's worth trying (don't forget to take the time remapping takes into account for a fair comparison) I don't know. I suppose at this point, we would need to write a separate class for the mapping to do fair comparisons and easily test different options, but maybe we will wait for more results from other algorithms and see what our needs are. Wait and see... 
I will go back to my 'agenda' for now and work on BCSequence with BCSymbolSet :-) charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Mon Mar 21 15:04:27 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Mon, 21 Mar 2005 12:04:27 -0800 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> Message-ID: At 9:27 +0100 3/21/05, Alexander Griekspoor wrote: >Well I used CVL yesterday but that didn't help much either, for instance it refused to get the English.lproj folder in as well. Another such irritations, it started to mark almost every folder as "need update" because it didn't know how to handle DS_Store files ?! >But perhaps this was all because XCode had already screwed up for me... For simple updates/modifications XCode is fine, but the moment you want to add things to the repository it's hell. Charles, have you completely disabled CVS in XCode? >Alex Yes, I have. The problem with Xcode is:
* it does not care about files that are not referenced in the project, but cvs does
* it does not handle new folders
* it does the cvs removal for you when you remove a file from the project, which is often not wanted
CVL is much closer to CVS in terms of the logic. The fact that it is a GUI makes it much simpler than the CVS CLI, and seeing the filesystem hierarchy in the GUI is also very useful. At 9:30 +0100 3/21/05, Alexander Griekspoor wrote: > >>Finally, I had to restore the right MainMenu.nib file in the translation demo.
I think with your reorganization you by accident used the one from the Peptides demo ;-) > >Yeah, now I remember something like that indeed, the trick is to FIRST copy the files in the proper folder, then add them to the XCode project. Initially I just dropped the files from another location into the project, resulting in all files being copied in the root. Then I had to move them and somewhere that is where the mistake must have happened. Pfeww, I'm glad this is all over and everything works now. No one of you had troubles with the renames/moved plist files/targets? Great! >Alex > >>- Koen. > Again, CVL would have been a bit easier here, because at least you see immediately what goes wrong. (remember to do command-L to refresh the status, when appropriate). I was using CVL before starting BioCocoa, then gave Xcode another chance when I started working on the project, and moved back to CVL 3 weeks ago. You still need the public/private key trick, but don't put any password on your private key. You can still remove it without generating a new key. I was thinking of adding a how-to about that in the developer docs. 
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Mon Mar 21 15:06:53 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Mon, 21 Mar 2005 12:06:53 -0800 Subject: [Biococoa-dev] string definitions In-Reply-To: <748fb4ed438d490f2412c2e2c2686d46@earthlink.net> References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> <99b137d484bbe47ee1b596a16c79cb6a@earthlink.net> <748fb4ed438d490f2412c2e2c2686d46@earthlink.net> Message-ID: At 6:39 -0500 3/21/05, Koen van der Drift wrote: >On Mar 21, 2005, at 12:42 AM, Charles PARNOT wrote: > >>You can't define the value of a variable in a header and then call that header from multiple file. Otherwise, the compiler creates one variable for each implementation file and each compiled '.o' piece of code. What you are supposed to do is declare those variables in the header using extern, so the linker knows that these variables will exist. And then the values should be set in an implementation file. Constants have to be associated with one particular class. It should probably be in BCSymbol.m. >> >>So the header should have >>extern NSString *BCSymbolNameProperty; >> >>and the implementation >>NSString *BCSymbolNameProperty=@"Name"; >> > > >Well then that would defeat the purpose of one central file with all the strings defines, some of which might occur in more than one file. Maybe there is another construction possible to do such a thing? > >thanks, > >- Koen. The header can still be separate from the BCSymbol header. But the values should be set somewhere. It could be in BCSymbol.m, or it could be in another file 'BCStringDefinitions.m'.
When the framework is loaded, then all the global variables are loaded too, no matter where they are set. charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From a.griekspoor at nki.nl Mon Mar 21 15:09:55 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 21 Mar 2005 21:09:55 +0100 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> Message-ID: <55f733a37f4a7c8d00b602894a72322e@nki.nl> Yes please! alex On 21-mrt-05, at 21:04, Charles PARNOT wrote: > I was using CVL before starting BioCocoa, then gave Xcode another > chance when I started working on the project, and moved back to CVL 3 > weeks ago. You still need the public/private key trick, but don't put > any password on your private key. You can still remove it without > generating a new key. > > I was thinking of adding a how-to about that in the developer docs. > > charles ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com iRNAi, do you? 
http://www.mekentosj.com/irnai ********************************************************* From kvddrift at earthlink.net Mon Mar 21 21:09:25 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 21 Mar 2005 21:09:25 -0500 Subject: [Biococoa-dev] string definitions In-Reply-To: References: <6fe0eb471e8e2c7895299ec59ef97c8b@earthlink.net> <067fed2f07ba0cc280e58c70a4a45e9d@nki.nl> <99b137d484bbe47ee1b596a16c79cb6a@earthlink.net> <748fb4ed438d490f2412c2e2c2686d46@earthlink.net> Message-ID: <87393708e5e8234eef8a4c2417ddc6a0@earthlink.net> On Mar 21, 2005, at 3:06 PM, Charles PARNOT wrote: > The header can still be separate from the BCSymbol header. But they > should be set somewhere. It could in BCSymbol.m, or it could be in > another file 'BCStringDefinitions.m'. When the framework is loaded, > then all the global variables are loaded too, no matter where they are > set. > > Yes, thanks, now it works! I encourage everyone to replace occurrences of literal strings with a BCFooProperty constant. - Koen. From kvddrift at earthlink.net Mon Mar 21 21:12:25 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 21 Mar 2005 21:12:25 -0500 Subject: [POSSIBLE VIRUS:###] [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> Message-ID: <77d431a00b5b716721380adfb55007ff@earthlink.net> On Mar 21, 2005, at 3:27 AM, Alexander Griekspoor wrote: > Another such irritations, it started to mark almost every folder as > "need update" because it didn't know how to handle DS_Store files ?! I added a .cvsignore file to the repository for .DS_Store files, maybe that could have caused it? Anyway, some instructions on how to use CVL with our project (as Charles offered to do) would be appreciated. So far I haven't been able to connect to the bioinformatics server with CVL.
- Koe From charles.parnot at stanford.edu Tue Mar 22 02:17:44 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Mon, 21 Mar 2005 23:17:44 -0800 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: <55f733a37f4a7c8d00b602894a72322e@nki.nl> <77d431a00b5b716721380adfb55007ff@earthlink.net> References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <55f733a37f4a7c8d00b602894a72322e@nki.nl> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <77d431a00b5b716721380adfb55007ff@earthlink.net> Message-ID: I added some instructions on how to use CVL... :-) Update your project! charles At 9:09 PM +0100 3/21/05, Alexander Griekspoor wrote: >Yes please! >alex > >On 21-mrt-05, at 21:04, Charles PARNOT wrote: > >>I was using CVL before starting BioCocoa, then gave Xcode another chance when I started working on the project, and moved back to ... >>...snip... >>... can still remove it without generating a new key. >> >>I was thinking of adding a how-to about that in the developer docs. >> >>charles > >********************************************************* > ** Alexander Griekspoor ** >********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > iRNAi, do you? > http://www.mekentosj.com/irnai > >********************************************************* At 9:12 PM -0500 3/21/05, Koen van der Drift wrote: >On Mar 21, 2005, at 3:27 AM, Alexander Griekspoor wrote: > >> Another such irritations, it started to mark almost every folder as "need update" because it didn't know how to handle DS_Store files ?! > >I added a cvsingnore file in the repository for .DS_Store files, maybe that could have caused it? 
> >Anyway, some instructions on how to use CVL with our project (as Charles offered to do) would be appreciated. So far I haven't been able to connect to the bioinformatics server with CVL. > > >- Koe -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Tue Mar 22 18:32:06 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Tue, 22 Mar 2005 18:32:06 -0500 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <55f733a37f4a7c8d00b602894a72322e@nki.nl> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <77d431a00b5b716721380adfb55007ff@earthlink.net> Message-ID: <1d1383c95603798142ba9dc6c8d62b25@earthlink.net> On Mar 22, 2005, at 2:17 AM, Charles PARNOT wrote: > I added some instructions on how to use CVL... :-) > > Update your project! > Very nice, thanks. I improved the document a little bit (added some text and links). - Koen.
From charles.parnot at stanford.edu Wed Mar 23 03:15:02 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Wed, 23 Mar 2005 00:15:02 -0800 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: <1d1383c95603798142ba9dc6c8d62b25@earthlink.net> References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <55f733a37f4a7c8d00b602894a72322e@nki.nl> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <77d431a00b5b716721380adfb55007ff@earthlink.net> <1d1383c95603798142ba9dc6c8d62b25@earthlink.net> Message-ID: At 6:32 PM -0500 3/22/05, Koen van der Drift wrote: >On Mar 22, 2005, at 2:17 AM, Charles PARNOT wrote: > >>I added some instructions on how to use CVL... :-) >> >>Update your project! >> > >Very nice, thanks. I improvement the document a little bit (added some text and links). > > >- Koen. Thanks. This is indeed better and looks more like what I was aiming at :-) I added the symbol set code to the BCAbstractSequence and subclasses. The code is much simpler as a result... BCSequence still needs a little bit of work. In my todo list is still an update to the dev-docs to explain the sequence class design. 
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Wed Mar 23 06:07:46 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Wed, 23 Mar 2005 06:07:46 -0500 Subject: [BioCocoa-Dev] Project Update...cvs struggle In-Reply-To: References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <55f733a37f4a7c8d00b602894a72322e@nki.nl> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <77d431a00b5b716721380adfb55007ff@earthlink.net> <1d1383c95603798142ba9dc6c8d62b25@earthlink.net> Message-ID: <3d57452c1debc6fc190e67f548471249@earthlink.net> On Mar 23, 2005, at 3:15 AM, Charles PARNOT wrote: > I added the symbol set code to the BCAbstractSequence and subclasses. > The code is much simpler as a result... BCSequence still needs a > little bit of work. > Very nice. I spotted this part: for (i=0;i References: <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <30c330165876e56a26cebb3c69237820@mekentosj.com> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <55f733a37f4a7c8d00b602894a72322e@nki.nl> <0f549186b3e473ce5e96d435ebfff938@mekentosj.com> <77d431a00b5b716721380adfb55007ff@earthlink.net> <1d1383c95603798142ba9dc6c8d62b25@earthlink.net> <3d57452c1debc6fc190e67f548471249@earthlink.net> Message-ID: At 6:07 AM -0500 3/23/05, Koen van der Drift wrote: >On Mar 23, 2005, at 3:15 AM, Charles PARNOT wrote: > >>I added the symbol set code to the BCAbstractSequence and subclasses. The code is much simpler as a result... BCSequence still needs a little bit of work. >> > >Very nice. 
I spotted this part: > > for (i=0;i unichar aChar=[aString characterAtIndex:i]; > if (aSymbol=[aSet symbolForChar:aChar]) > [anArray addObject:aSymbol]; > } > > >Shouldn't that be > > if (aSymbol == [aSet symbolForChar:aChar]) ? > >- Koen. No, the single '=' is intended: the result is assigned to aSymbol and then tested, and the set returns nil if the symbol is not in it. Just like 'objectForKey:' (NSDictionary) or 'member:' (NSSet). charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From biococoa at bioworxx.com Thu Mar 24 12:22:02 2005 From: biococoa at bioworxx.com (Philipp Seibel) Date: Thu, 24 Mar 2005 18:22:02 +0100 Subject: [Biococoa-dev] Genbank - Annotations & Features Message-ID: <0f81e9b2594b30487c15b8a918404f62@bioworxx.com> Hi everybody, I'm currently wrangling with GenBank entries. I'd like to represent the entries in a BCSequence with features and annotations. It seems that our Annotation structure isn't flexible enough to handle the GenBank structure, for example when we have several locations for one feature, or more than one annotation for author. Alex, can you explain to me the idea behind your implementation and how to handle my problem with it? Phil From kvddrift at earthlink.net Thu Mar 24 20:06:03 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 24 Mar 2005 20:06:03 -0500 Subject: [Biococoa-dev] docs Message-ID: Hi, Just in case you haven't noticed it in the project, I cleaned up the docs a little bit, and even added a small stylesheet to make them a little bit more readable. Now back to coding... ;-) - Koen.
From kvddrift at earthlink.net Thu Mar 24 20:38:18 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 24 Mar 2005 20:38:18 -0500 Subject: [Biococoa-dev] cvs suggestion Message-ID: <16b7393e642cec122c1999756bb85896@earthlink.net> Hi, Currently the BioCocoa.pbproj folder contains the individual project settings for most of us. If nobody objects, I will remove these from the repository. Your local settings (containing window size, position, etc) are already on your own HD, and there is no need to save them in the repository. This will not only clean up the project, but also prevent cvs problems such as the one I had a while ago, which was caused by a conflicting pbxuser file. - Koen. From kvddrift at earthlink.net Thu Mar 24 22:11:51 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Thu, 24 Mar 2005 22:11:51 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> <53ecb01a6ce610ba0710511be21cdfe2@earthlink.net> Message-ID: <862d450adf25cc038238ed33811940f5@earthlink.net> Without using Shark, just the counter in Peptides, the symbolcounter code is about 2 times faster than the other one... - Koen. On Mar 19, 2005, at 6:34 PM, Alexander Griekspoor wrote: > I'll test it and see what's the fasted method... > Alex > > On 20-mrt-05, at 0:32, Koen van der Drift wrote: > >> >> On Mar 19, 2005, at 5:48 PM, Alexander Griekspoor wrote: >> >>> Shark tells me that most time (65%) is already lost to object >>> messaging in the symbol counter, so perhaps that could deserve some >>> optimization ;-) >> >> >> Try setting the #if 0 in calculateMassForRange. Then it won't use the >> symbolCounter, but calculate each individual symbol. Maybe that's >> faster. >> >> - Koen.
>> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> >> > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > LabAssistant - Get your life organized! > http://www.mekentosj.com/labassistant > > ********************************************************* > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From kvddrift at earthlink.net Fri Mar 25 07:18:38 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 25 Mar 2005 07:18:38 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: <862d450adf25cc038238ed33811940f5@earthlink.net> References: <865134ceae5879a882c2faf3050ca357@mekentosj.com> <9d0cc985a7e2df5b0f6ac1a1cc1a3f0f@mekentosj.com> <53ecb01a6ce610ba0710511be21cdfe2@earthlink.net> <862d450adf25cc038238ed33811940f5@earthlink.net> Message-ID: <219788ee1c186b704dc237fc70f1103a@earthlink.net> Aha, I found the culprit of the slowdown. Right now the accessor for symbolArray in BCAbstractSequence returns a copy of the array, so a copy has to be created every time it is called. Changing it to:

- (NSArray *)symbolArray
{
    // return [[symbolArray copy] autorelease];
    return symbolArray;
}

gave a huge speed increase (8 -> 1.2 sec on a G4 867 MHz). I suggest we just use "return symbolArray". If people want an actual copy, we should add a different accessor, or leave it to them to create a copy after getting the pointer from the accessor.
I will look in other classes and see what the accessors look like. cheers, - Koen. On Mar 24, 2005, at 10:11 PM, Koen van der Drift wrote: > Without using Shark, just the counter in Peptides, the symbolcounter > code is about 2 times faster than the other one... > > - Koen. > > > > On Mar 19, 2005, at 6:34 PM, Alexander Griekspoor wrote: > >> I'll test it and see what's the fasted method... >> Alex >> >> On 20-mrt-05, at 0:32, Koen van der Drift wrote: >> >>> >>> On Mar 19, 2005, at 5:48 PM, Alexander Griekspoor wrote: >>> >>>> Shark tells me that most time (65%) is already lost to object >>>> messaging in the symbol counter, so perhaps that could deserve some >>>> optimization ;-) >>> >>> >>> Try setting the #if 0 in calculateMassForRange. Then it won't use >>> the symbolCounter, but calculate each individual symbol. Maybe >>> that's faster. >>> >>> - Koen. >>> >>> _______________________________________________ >>> Biococoa-dev mailing list >>> Biococoa-dev at bioinformatics.org >>> https://bioinformatics.org/mailman/listinfo/biococoa-dev >>> >>> >> ********************************************************* >> ** Alexander Griekspoor ** >> ********************************************************* >> The Netherlands Cancer Institute >> Department of Tumorbiology (H4) >> Plesmanlaan 121, 1066 CX, Amsterdam >> Tel: + 31 20 - 512 2023 >> Fax: + 31 20 - 512 2029 >> AIM: mekentosj at mac.com >> E-mail: a.griekspoor at nki.nl >> Web: http://www.mekentosj.com >> >> LabAssistant - Get your life organized! 
>> http://www.mekentosj.com/labassistant >> >> ********************************************************* >> >> _______________________________________________ >> Biococoa-dev mailing list >> Biococoa-dev at bioinformatics.org >> https://bioinformatics.org/mailman/listinfo/biococoa-dev >> > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From jtimmer at bellatlantic.net Fri Mar 25 08:40:27 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 25 Mar 2005 08:40:27 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: <219788ee1c186b704dc237fc70f1103a@earthlink.net> Message-ID: Don't worry, I'm changing that code today anyway. I finally should have some free time this afternoon, so I'm going to test both NSSet and SymbolSet instead of the array. I will definitely keep this info in mind when I make the changes. I guess my excessive caution had a cost here. We'll just have to trust people not to poke around in the memory at the other end of that pointer - thanks for the info. Thanks - JT > Aha, > > I found the cuplprit of the slowing down. Right now the accessor for > symbolArray in BCAbstractSequence returns a copy of the array. So > everytime a copy has to be created. Changing it to: > > - (NSArray *)symbolArray > { > // return [[symbolArray copy] autorelease]; > return symbolArray; > } > > gave a huge speed increase (8 -> 1.2 sec on a G4 867 MHz). I suggest we > just use "return symbolArray". If people want an actual copy, we should > add a different accessor, or leave it to their responsibility to create > a copy after getting the pointer from the accessor. I will look in > other classes and see what the accessors look like. 
> _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Fri Mar 25 09:44:49 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 25 Mar 2005 09:44:49 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: Message-ID: <93053ec7454fbca8b72b828d049c35af@earthlink.net> FWIW, I found some more info on speeding up iterations at: http://www.mulle-kybernetik.com/artikel/Optimization/opti-3-imp-deluxe.html - Koen. On Mar 25, 2005, at 8:40 AM, John Timmer wrote: > Don't worry, I'm changing that code today anyway. I finally should > have > some free time this afternoon, so I'm going to test both NSSet and > SymbolSet > instead of the array. > > I will definitely keep this info in mind when I make the changes. I > guess my > excessive caution had a cost here. We'll just have to trust people > not to > poke around in the memory at the other end of that pointer - thanks > for the > info. From kvddrift at earthlink.net Fri Mar 25 10:48:14 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Fri, 25 Mar 2005 10:48:14 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: Message-ID: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> On Mar 25, 2005, at 8:40 AM, John Timmer wrote: >> // return [[symbolArray copy] autorelease]; >> There are indeed some more places where this construction is used. I understand the carefulness why this was introduced, but in most cases speed is probably a more important factor. I don't think we should throw these out, but maybe we should think of adding some accessors that have the word 'copy' in it. These can then be called when a copy is really needed, instead of only a pointer. For instance: - (NSArray *)copyOfSymbolArray or - (NSArray *)symbolArrayCopy Or something else. If we make the difference clear in the docs, I don't think there will be an issue. What do you think? - Koen.
From jtimmer at bellatlantic.net Fri Mar 25 11:11:31 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Fri, 25 Mar 2005 11:11:31 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: <93053ec7454fbca8b72b828d049c35af@earthlink.net> Message-ID: Yeah, I'd read that one over before. Basically, it suggests grabbing everything you need in a loop - the underlying C classes and the pointers to their functions - before hand, and using raw C within the loop. As you can see from the example code, there's a HUGE penalty in terms of readability of the code. I've also not had a lot of experience with handling function pointers, so I'm hesitant to rely on my ability to do so for something like BioCocoa. Still, long term it might pay to try this for a couple of situations. JT > FWIW, > > I found some more info on speeding up iterations at: > > http://www.mulle-kybernetik.com/artikel/Optimization/opti-3-imp- > deluxe.html > > > - Koen. > > > > On Mar 25, 2005, at 8:40 AM, John Timmer wrote: > >> Don't worry, I'm changing that code today anyway. I finally should >> have >> some free time this afternoon, so I'm going to test both NSSet and >> SymbolSet >> instead of the array. >> >> I will definitely keep this info in mind when I make the changes. I >> guess my >> excessive caution had a cost here. We'll just have to trust people >> not to >> poke around in the memory at the other end of that pointer - thanks >> for the >> info. > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev _______________________________________________ This mind intentionally left blank From charles.parnot at stanford.edu Sat Mar 26 03:11:03 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 26 Mar 2005 00:11:03 -0800 Subject: [BioCocoa-dev] Peptides... 
In-Reply-To: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> Message-ID: At 10:48 AM -0500 3/25/05, Koen van der Drift wrote: >On Mar 25, 2005, at 8:40 AM, John Timmer wrote: > >>>// return [[symbolArray copy] autorelease]; >>> > >There are indeed some more places where this construction is used. I understand the carefulness why this was introduced, but in most cases speed is probably a more important factor. I don't think we should throw these out, but maybe we should think of adding some accessors that have the word 'copy' in it. These can then be called when a copy is really needed, instead of only a pointer. For instance: > > >- (NSArray *)copyOfSymbolArray or - (NSArray *)symbolArrayCopy > > >Or something else. If we make the difference clear in the docs, I don't think there will be an issue. What do you think? > >- Koen. I don't think we need to add any of those, because the returned object is typed as an NSArray, so the interface would be confusing for the user: what is the difference between an NSArray and its copy? It can be left as is: anybody trying to use NSMutableArray methods on the returned object will get compiler warnings. So even the not-too-careful user won't mess up the original array. However, returning the original object (disguised as an NSArray) can come back to bite us in the near future. Suppose you are the user and you get the array back as a snapshot of the sequence at the time it is returned. Then you modify the BCSequence. You would expect the NSArray to still be the same, e.g.: BCSequence *seq; NSArray *myArray=[seq symbolArray]; int i=[myArray count]; [seq appendSequence:@"ATGT"]; int j=[myArray count]; NSAssert(i == j, @"An NSArray can't change!?? What happened"); This might seem trivial, but it can be much more subtle than that and very difficult to debug when you expect an NSArray to be immutable. With multithreading, things can get even uglier.
I did not follow the peptides story, but would an immutable sequence object be useful in that context? If yes, then here is a hint that something needs to be done. (it is on my agenda) Another option is to have a 'mutableSymbolArray' and 'symbolArray' methods. charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From charles.parnot at stanford.edu Sat Mar 26 03:14:58 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sat, 26 Mar 2005 00:14:58 -0800 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: Message-ID: At 11:11 AM -0500 3/25/05, John Timmer wrote: >Yeah, I'd read that one over before. Basically, it suggests grabbing >everything you need in a loop - the underlying C classes and the pointers to >their functions - before hand, and using raw C within the loop. As you can >see from the example code, there's a HUGE penalty in terms of readability of >the code. I've also not had a lot of experience with handling function >pointers, so I'm hesitant to rely on my ability to do so for something like >BioCocoa. > >Still, long term it might pay to try this for a couple of situations. > >JT I agree with that very much!! Even the CFArray calls I have seen here and there in the code have made the code a bit hard to read. Any optimization should be very clearly documented, explaining why. Otherwise, someone will come and modify the code again, thinking somebody made it more complicated than necessary or missed a possible issue (could very easily happen on the 'symbolArray'!)
charles -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From kvddrift at earthlink.net Sat Mar 26 07:03:19 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 26 Mar 2005 07:03:19 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: Message-ID: On Mar 26, 2005, at 3:14 AM, Charles PARNOT wrote: >> Yeah, I'd read that one over before. Basically, it suggests grabbing >> everything you need in a loop - the underlying C classes and the >> pointers to >> their functions - before hand, and using raw C within the loop. As >> you can >> see from the example code, there's a HUGE penalty in terms of >> readability of >> the code. I've also not had a lot of experience with handling >> function >> pointers, so I'm hesitant to rely on my ability to do so for >> something like >> BioCocoa. >> >> Still, long term it might pay to try this for a couple of situations. >> >> JT > > I agree with that very much!! Even the CFArray I have seen here and > there in the code have made the code a bit hard to read. Any > optimization should be very clearly documented, explaining why. > Otherwise, someone will come and modify the code again, thinking > somebody made it more complicated than nexessary or missed a possible > issue (could very easily happen on the 'symbolArray'!) > > I tried one of the optimizations, and it didn't increase the speed in this case. And I also agree that readability is very important too. - Koen. From kvddrift at earthlink.net Sat Mar 26 08:01:04 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 26 Mar 2005 08:01:04 -0500 Subject: [BioCocoa-dev] Peptides...
In-Reply-To: References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> Message-ID: On Mar 26, 2005, at 3:11 AM, Charles PARNOT wrote: > Another option is to have a 'mutableSymbolArray' and 'symbolArray' > methods. > > I like the mutableSymbolArray (or maybe mutableSymbolArrayCopy) method. That should indeed return a new instance with which the user can do what she wants. The symbolArray accessor should just return a pointer as it is right now (I already changed that in CVS). - Koen. From jtimmer at bellatlantic.net Sat Mar 26 16:56:01 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 26 Mar 2005 16:56:01 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: Message-ID: Okay, I changed the represents/representedBy collections from arrays to sets and fixed all the related calls to adjust to that. It definitely makes a difference - cuts the time about in half. On a dual 1.8 G5, I got the following results using a 6-mer searching a 1.2KB DNA sequence 50 times: 2005-03-26 16:38:51.902 Translation[8375] ambiguous finding took -0.821547 seconds 2005-03-26 16:38:52.775 Translation[8375] ambiguous old finding took -0.873676 seconds 2005-03-26 16:38:53.034 Translation[8375] strict finding took -0.258846 seconds 2005-03-26 16:38:53.466 Translation[8375] strict old finding took -0.431471 seconds This is after catching two bugs, one in the old and one in the new method. It was pretty funny - the old version kept coming in faster, so I knew there had to be something wrong ;). For the curious, extrapolating from this single data point indicates that the ambiguous search is faster than searching for each of its possible strict sequences as soon as the ambiguity can't be resolved into <4 strict sequences. Given the big boosts, I'm going to do the same for complements now - I expect that will significantly boost translation speeds.
JT _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sat Mar 26 20:38:35 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sat, 26 Mar 2005 20:38:35 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: References: Message-ID: Hi, Looks very nice John. Would there be any problems to use BCSymbolSet instead of NSSet? Basically BCSymbolSet is a wrapper around NSSet, but it could give us some additional advantages over directly using an NSSet. Or will that slow down your code? cheers, - Koen. On Mar 26, 2005, at 4:56 PM, John Timmer wrote: > Okay, I changed the represents/representedBy collections from arrays > to sets > and fixed the all the related calls to adjust to that. It definitely > makes > a difference - cuts the time about in half. On a dual 1.8 G5, I got > the > following results using a 6-mer searching a 1.2KB DNA sequence 50 > times: > > 2005-03-26 16:38:51.902 Translation[8375] ambiguous finding took > -0.821547 > seconds > 2005-03-26 16:38:52.775 Translation[8375] ambiguous old finding took > -0.873676 seconds > 2005-03-26 16:38:53.034 Translation[8375] strict finding took -0.258846 > seconds > 2005-03-26 16:38:53.466 Translation[8375] strict old finding took > -0.431471 > seconds > > This is after catching two bugs, one in the old and one in the new > method. > It was pretty funny - the old version kept coming in faster, so I knew > there > had to be something wrong ;). > > For the curious, extrapolating from this single data point indicates > that > the ambiguous search is faster than searching for each of its possible > strict sequences as soon as the ambiguity can't be resolved into <4 > strict > sequences. > > Given the big boosts, I'm going to do the same for complements now - I > expect that will significantly boost translation speeds. 
> > JT > _______________________________________________ > This mind intentionally left blank > > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > From jtimmer at bellatlantic.net Sat Mar 26 23:02:26 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Sat, 26 Mar 2005 23:02:26 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: Message-ID: I knew I was forgetting something - I'll give it a try on Monday and find out. > Looks very nice John. Would there be any problems to use BCSymbolSet > instead of NSSet? Basically BCSymbolSet is a wrapper around NSSet, but > it could give us some additional advantages over directly using an > NSSet. Or will that slow down your code? _______________________________________________ This mind intentionally left blank From kvddrift at earthlink.net Sun Mar 27 22:58:42 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Sun, 27 Mar 2005 22:58:42 -0500 Subject: [BioCocoa-Dev] Project update In-Reply-To: References: Message-ID: On Mar 20, 2005, at 5:14 AM, Alexander Griekspoor wrote: > Guys, right now I'm changing some stuff in the Xcode project, please > don't commit anything in the next hour or so... > More to follow... > Didn't notice it before, but with the changes I can now have one of the examples open, edit the something in the main target, and build the examples without changing targets. The main target will automatically build first. Great timesaver! - Koen. From charles.parnot at stanford.edu Mon Mar 28 02:53:46 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Sun, 27 Mar 2005 23:53:46 -0800 Subject: [BioCocoa-dev] Peptides... 
In-Reply-To: References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> Message-ID: At 8:01 AM -0500 3/26/05, Koen van der Drift wrote: >On Mar 26, 2005, at 3:11 AM, Charles PARNOT wrote: > >>Another option is to have a 'mutableSymbolArray' and 'symbolArray' methods. >> > >I like the mutableSymbolArray (or maybe mutableSymbolArrayCopy) >method. That should indeed return a new instance with which the the >user can do what she want. The symbolArray should just return a >pointer as it is right now (I already changed that in CVS). > > >- Koen. I did not see the change (did you commit?) so I made it while I was working on the final integration of sequence classes with symbol sets. I was not sure: where should the call to symbolArray be changed to mutableSymbolArray so that 'Peptides' runs fast? charles NB: here are the methods - (NSMutableArray *)mutableSymbolArray { return symbolArray; } - (NSArray *)symbolArray { return [[symbolArray copy] autorelease]; } -- Help science go fast forward: http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ Charles Parnot charles.parnot at stanford.edu Room B157 in Beckman Center 279, Campus Drive Stanford University Stanford, CA 94305 (USA) Tel +1 650 725 7754 Fax +1 650 725 8021 From mek at mekentosj.com Mon Mar 28 03:46:05 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 28 Mar 2005 10:46:05 +0200 Subject: [BioCocoa-Dev] Project update In-Reply-To: References: Message-ID: Yeah pretty neat, that's done by adding a dependency to the target: Select the target -> Get info -> In the first tab ("General") add the framework as a direct dependency. I've done that for all examples. That reminds me that I still have to add the target addition instructions...
Cheers, Alex On 28-mrt-05, at 5:58, Koen van der Drift wrote: > > On Mar 20, 2005, at 5:14 AM, Alexander Griekspoor wrote: > >> Guys, right now I'm changing some stuff in the Xcode project, please >> don't commit anything in the next hour or so... >> More to follow... >> > > Didn't notice it before, but with the changes I can now have one of > the examples open, edit the something in the main target, and build > the examples without changing targets. The main target will > automatically build first. Great timesaver! > > - Koen. > > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com LabAssistant - Get your life organized! http://www.mekentosj.com/labassistant ********************************************************* ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From a.griekspoor at nki.nl Mon Mar 28 03:51:45 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 28 Mar 2005 10:51:45 +0200 Subject: [BioCocoa-dev] Peptides... 
In-Reply-To: References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> Message-ID: <35d5e1333a1fffd1cc61f87370730a60@nki.nl> And perhaps you've done that already, but the "speed" issue with the two accessors should definitely be mentioned in the headerdoc, and if this is a general approach we take (which I think would be a good thing on many occasions) we should document the convention we adopt here and spend a few words on safety vs performance... Good work on the speed-up, guys! Alex On 28-mrt-05, at 9:53, Charles PARNOT wrote: > At 8:01 AM -0500 3/26/05, Koen van der Drift wrote: > On Mar 26, 2005, at 3:11 AM, Charles PARNOT wrote: > > Another option is to have a 'mutableSymbolArray' and 'symbolArray' > methods. > > > I like the mutableSymbolArray (or maybe mutableSymbolArrayCopy) > method. That should indeed return a new instance with which the the > user can do what she want. The symbolArray should just return a > pointer as it is right now (I already changed that in CVS). > > > - Koen. > > I did not see the change (did you commit?) so I made it while I was > working on the final integration of sequence classes with symbol sets. > I was not sure where to change the call to symbolArray into > mutableSymbolArray, so that 'Peptides' runs fast? > > charles > > NB: here are the methods > > - (NSMutableArray *)mutableSymbolArray > { > return symbolArray; > } > > - (NSArray *)symbolArray > { > return [[symbolArray copy] autorelease]; > } > > -- > > Help science go fast forward: > http://cmgm.stanford.edu/~cparnot/xgrid-stanford/ > > Charles Parnot > charles.parnot at stanford.edu > > Room
B157 in Beckman Center > 279, Campus Drive > Stanford University > Stanford, CA 94305 (USA) > > Tel +1 650 725 7754 > Fax +1 650 725 8021 > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > ************************************************************** ** Alexander Griekspoor ** ************************************************************** The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com MacOS X: The power of UNIX with the simplicity of the Mac *************************************************************** From mek at mekentosj.com Mon Mar 28 05:59:45 2005 From: mek at mekentosj.com (Alexander Griekspoor) Date: Mon, 28 Mar 2005 12:59:45 +0200 Subject: [BioCocoa-dev] Target namespace Message-ID: <68a7cf6b7c1cfaa290feabae760d8d69@mekentosj.com> Charles, in the light of some target name standardization, could you rename your target "BCFoundation-Tests" to "Test - BCFoundation" ? Also, perhaps you want to add the dependency to the biococoa framework as well...
Alex ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Microsoft is not the answer, Microsoft is the question, NO is the answer ********************************************************* From a.griekspoor at nki.nl Mon Mar 28 06:37:24 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 28 Mar 2005 13:37:24 +0200 Subject: [BioCocoa-dev] Target namespace In-Reply-To: <68a7cf6b7c1cfaa290feabae760d8d69@mekentosj.com> References: <68a7cf6b7c1cfaa290feabae760d8d69@mekentosj.com> Message-ID: Ok, I've added the promised docs on the addition of a new target. Please check if I didn't forget anything. Also, I reduced the font size of the docs a bit in the css. Koen are you developing on a 30'' screen or what? ;-) Cheers, Alex On 28-mrt-05, at 12:59, Alexander Griekspoor wrote: > Charles, in the light of some target name standardization, could you > rename your target "BCFoundation-Tests" to "Test - BCFoundation" ? > Also, perhaps you want to add the dependency to the biococoa framework > as well... 
> Alex > > ********************************************************* > ** Alexander Griekspoor ** > ********************************************************* > The Netherlands Cancer Institute > Department of Tumorbiology (H4) > Plesmanlaan 121, 1066 CX, Amsterdam > Tel: + 31 20 - 512 2023 > Fax: + 31 20 - 512 2029 > AIM: mekentosj at mac.com > E-mail: a.griekspoor at nki.nl > Web: http://www.mekentosj.com > > Microsoft is not the answer, > Microsoft is the question, > NO is the answer > > ********************************************************* > > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 E-mail: a.griekspoor at nki.nl AIM: mekentosj at mac.com Web: http://www.mekentosj.com EnzymeX - To cut or not to cut http://www.mekentosj.com/enzymex ********************************************************* From kvddrift at earthlink.net Mon Mar 28 06:38:51 2005 From: kvddrift at earthlink.net (Koen van der Drift) Date: Mon, 28 Mar 2005 06:38:51 -0500 Subject: [BioCocoa-dev] Peptides... In-Reply-To: References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> Message-ID: <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> On Mar 28, 2005, at 2:53 AM, Charles PARNOT wrote: > At 8:01 AM -0500 3/26/05, Koen van der Drift wrote: > On Mar 26, 2005, at 3:11 AM, Charles PARNOT wrote: > > Another option is to have a 'mutableSymbolArray' and 'symbolArray' > methods. > > > I like the mutableSymbolArray (or maybe mutableSymbolArrayCopy) > method. That should indeed return a new instance with which the the > user can do what she want. 
The symbolArray should just return a > pointer as it is right now (I already changed that in CVS). > > > - Koen. > > I did not see the change (did you commit?) so I made it while I was > working on the final integration of sequence classes with symbol sets. > I was not sure where to change the call to symbolArray into > mutableSymbolArray, so that 'Peptides' runs fast? > > charles > > NB: here are the methods > > - (NSMutableArray *)mutableSymbolArray > { > return symbolArray; > } > > - (NSArray *)symbolArray > { > return [[symbolArray copy] autorelease]; > } > I think you switched them around. IMO it should be as follows: - (NSMutableArray *)mutableSymbolArray { return [[symbolArray mutableCopy] autorelease]; /* do we need an autorelease here? */ } - (NSArray *)symbolArray { return symbolArray; } - Koen. From a.griekspoor at nki.nl Mon Mar 28 07:38:18 2005 From: a.griekspoor at nki.nl (Alexander Griekspoor) Date: Mon, 28 Mar 2005 14:38:18 +0200 Subject: [BioCocoa-dev] Peptides... In-Reply-To: <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> Message-ID: It makes more sense to me.... You need the autorelease because (mutable)copy returns an object that the caller owns (retain count +1), and we have to balance that before handing it off. Alex On 28-mrt-05, at 13:38, Koen van der Drift wrote: > I think you switched them around. IMO it should be as follows: > > - (NSMutableArray *)mutableSymbolArray > { > return [[symbolArray mutableCopy] autorelease]; // do we need an > autorelease here? > } > > - (NSArray *)symbolArray > { > return symbolArray; > } > > > > - Koen.
> > _______________________________________________ > Biococoa-dev mailing list > Biococoa-dev at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/biococoa-dev > > ********************************************************* ** Alexander Griekspoor ** ********************************************************* The Netherlands Cancer Institute Department of Tumorbiology (H4) Plesmanlaan 121, 1066 CX, Amsterdam Tel: + 31 20 - 512 2023 Fax: + 31 20 - 512 2029 AIM: mekentosj at mac.com E-mail: a.griekspoor at nki.nl Web: http://www.mekentosj.com Windows is a 32-bit patch to a 16-bit shell for an 8-bit operating system, written for a 4-bit processor by a 2-bit company without 1 bit of sense. ********************************************************* From jtimmer at bellatlantic.net Mon Mar 28 11:48:35 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 28 Mar 2005 11:48:35 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: Message-ID: > Hi, > > Looks very nice John. Would there be any problems to use BCSymbolSet > instead of NSSet? Basically BCSymbolSet is a wrapper around NSSet, but > it could give us some additional advantages over directly using an > NSSet. Or will that slow down your code? > Just for context, the original timing for a 50 repeat find of a 6-mer with two ambiguous bases in a 1.2Kb sequence was around 0.821547 seconds. I tried replacing the NSSets with BCSymbolSets on the same machine I did my previous timings with. After a couple of runs under the same conditions, it appears that doing so adds about .15 sec, or somewhere between 15% and 20% to the execution time. Presumably, this is all spent on message sending, though I haven't checked with Shark to confirm this. In contrast, keeping it as an NSSet and using the CoreFoundation CFSet function that's the equivalent of "containsObject" cut the time by about .2 seconds, knocking the time down to about .62 seconds total.
Again, I'll assume this is entirely due to ditching the message sending overhead, since it's only changing a single line of code. Given those numbers, my preference would be to stick with NSSet and use the CF function. Since NSSets and BCSymbolSets are very easy to interchange and we already have methods in place to return either, there's really no difference externally, and internally, we're talking about a > 30% difference in performance. Unless somebody disagrees, I'll commit those changes later today. JT _______________________________________________ This mind intentionally left blank From jtimmer at bellatlantic.net Mon Mar 28 12:01:54 2005 From: jtimmer at bellatlantic.net (John Timmer) Date: Mon, 28 Mar 2005 12:01:54 -0500 Subject: [Biococoa-dev] Optimizations In-Reply-To: Message-ID: Just to follow up on my own info: to call the CFSet function, I was typecasting it using (CFSetRef *), but that was spitting out a warning anyway, so I tried just getting rid of it. It worked anyway, and actually appeared to be slightly faster. Would that be expected? If so, there's a bunch of places where we're using CF structs and we could trade compiler warnings for performance... JT _______________________________________________ This mind intentionally left blank From charles.parnot at stanford.edu Mon Mar 28 12:22:12 2005 From: charles.parnot at stanford.edu (Charles PARNOT) Date: Mon, 28 Mar 2005 09:22:12 -0800 Subject: [BioCocoa-dev] Peptides... In-Reply-To: <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> Message-ID: >I think you switched them around. IMO it should be as follows: > >- (NSMutableArray *)mutableSymbolArray > { > return [[symbolArray mutableCopy] autorelease]; // do we need an autorelease here? > } > > - (NSArray *)symbolArray > { > return symbolArray; > } > > > >- Koen. 
About the autorelease, yes: 'copy' adds one to the retain count, just like alloc/init. So we need to autorelease.

Now, regarding the switching, there was a misunderstanding at some point...

My issue is: all the problems come if you return an NSMutableArray instead of an NSArray in the symbolArray accessor. When you tell the user she gets an NSArray, the user expects that the array will never change. But if we return the ivar, the array may change and this will yield unexpected and difficult to debug results. For instance:

    BCSequence *seq=[BCSequence sequenceWithString:@"ATGT"];
    NSArray *symbols=[seq symbolArray];
    int n=[symbols count];
    [seq removeSymbolAtIndex:1];
    BCSymbol *last=[symbols objectAtIndex:n-1]; //UNEXPECTED EXCEPTION

To make things clear, my 2-cents idea was that we could return the NSMutableArray ivar directly when performance is an issue. The headerdoc would clearly state that the method returns a pointer to the ivar, and that it should only be used when performance is an issue (and it is actually already in the headerdoc, yes, Alex ;-). Even if the user does not read the doc, at least she can't expect the returned array to be immutable and may be more careful anyway, and if it behaves in a weird way, the fact that it is an NSMutableArray should be a clue.

My opinion is that the standard accessor 'symbolArray' should behave in a very standard way, because the average user will use it and it should be rock-solid. So when we say we return an immutable NSArray, it should really return an immutable NSArray.

What do the others think? Alex? John? Phil?...

Ultimately, we will probably want immutable versions of the sequence classes. Or is it just me?
;-)

charles

--
Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Charles Parnot
charles.parnot at stanford.edu

Room B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

From charles.parnot at stanford.edu Mon Mar 28 12:24:31 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Mon, 28 Mar 2005 09:24:31 -0800
Subject: [BioCocoa-dev] Target namespace
In-Reply-To: <68a7cf6b7c1cfaa290feabae760d8d69@mekentosj.com>
References: <68a7cf6b7c1cfaa290feabae760d8d69@mekentosj.com>
Message-ID:

At 12:59 +0200 3/28/05, Alexander Griekspoor wrote:
>Charles, in the light of some target name standardization, could you rename your target "BCFoundation-Tests" to "Test - BCFoundation" ?
>Also, perhaps you want to add the dependency to the biococoa framework as well...
>Alex

OK, I'll try to remember that!

charles

NB: is the location in the filesystem and the Xcode groups appropriate (it is inside BCFoundation)?

From kvddrift at earthlink.net Mon Mar 28 21:01:53 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 28 Mar 2005 21:01:53 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net>
Message-ID:

On Mar 28, 2005, at 12:22 PM, Charles PARNOT wrote:

> My issue is: all the problems come if you return a NSMutableArray
> instead of a NSArray in the symbolArray accessor. When you tell the
> user it gets an NSArray, the user expects that the array will never
> change. But if we return the ivar, the array may change and this will
> yield unexpected and difficult to debug results.
> for instance:
>
>     BCSequence *seq=[BCSequence sequenceWithString:@"ATGT"];
>     NSArray *symbols=[seq symbolArray];
>     int n=[symbols count];
>     [seq removeSymbolAtIndex:1];
>     BCSymbol *last=[symbols objectAtIndex:n-1]; //UNEXPECTED EXCEPTION
>
> To make things clear, my 2-cents idea was that we could return the
> NSMutableArray ivar directly when performance is an issue. The
> headerdoc would clearly tell that the method returns a pointer to the
> ivar, and that it should only be used when performance is an issue
> (and it is actually already in the headerdoc, yes, Alex;-). Even if
> the user does not read the doc, at least it can't expect the returned
> array to be immutable and may be more careful anyway and if it behaves
> in a weird way, the fact that it is an NSMutableArray should be a
> clue.
>
> My opinion is the standard accessor 'symbolArray' should behave in a
> very standard way, because the average user will use it and it should
> be rock-solid. So when we say we return an immutable NSArray, it
> should really return an immutable NSArray.
>

The original issue for the Peptides example is not the difference between immutable and mutable. Instead, the issue was that the accessor symbolArray returned a copy of the symbolArray, instead of just a pointer to that array, which is usually the case with accessors. Copying the array slowed down the Peptides calculation considerably, while it was not necessary at all to create a copy. Therefore I suggested that we have the standard accessor:

    - (NSArray *)symbolArray
    {
        return symbolArray;
    }

If the user really needs a copy of the array, there are at least two possibilities:

1. we supply an additional accessor that returns a copy (using [[symbolArray copy] autorelease])
2. the user uses the standard accessor, and creates the copy herself.

The mutable/immutable question is unrelated to this.

- Koen.
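
The two positions in this exchange (return the ivar for speed, or return a copy for safety) can be compared side by side with plain Foundation classes. What follows is only a minimal sketch, not BioCocoa code: `SketchSequence`, `symbolArrayCopy` and the use of NSString objects as symbols are hypothetical stand-ins for BCSequence and BCSymbol, and the code assumes manual reference counting, as used elsewhere in this thread.

```objective-c
// Minimal sketch, not BioCocoa code: SketchSequence is a hypothetical
// stand-in for BCSequence, with NSString objects standing in for BCSymbol.
// Assumes manual reference counting (no ARC).
// Build (macOS): clang -fno-objc-arc sketch.m -framework Foundation
#import <Foundation/Foundation.h>

@interface SketchSequence : NSObject
{
    NSMutableArray *symbolArray;       // mutable backing store
}
- (id)initWithSymbols:(NSArray *)symbols;
- (NSArray *)symbolArray;              // returns the ivar itself: fast, but aliased
- (NSArray *)symbolArrayCopy;          // returns an immutable snapshot: slower, but stable
- (void)removeSymbolAtIndex:(NSUInteger)idx;
@end

@implementation SketchSequence
- (id)initWithSymbols:(NSArray *)symbols
{
    self = [super init];
    if (self != nil) {
        symbolArray = [symbols mutableCopy];   // owned, released in dealloc
    }
    return self;
}

- (void)dealloc
{
    [symbolArray release];
    [super dealloc];
}

- (NSArray *)symbolArray
{
    return symbolArray;
}

- (NSArray *)symbolArrayCopy
{
    // copy returns an owned object, so it is balanced with autorelease
    return [[symbolArray copy] autorelease];
}

- (void)removeSymbolAtIndex:(NSUInteger)idx
{
    [symbolArray removeObjectAtIndex:idx];
}
@end

int main(void)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    SketchSequence *seq = [[[SketchSequence alloc] initWithSymbols:
        [NSArray arrayWithObjects:@"A", @"T", @"G", @"T", nil]] autorelease];

    NSArray *alias = [seq symbolArray];         // same object as the ivar
    NSArray *snapshot = [seq symbolArrayCopy];  // independent of later edits
    [seq removeSymbolAtIndex:1];

    // The alias shrinks together with the sequence; the snapshot does not.
    NSLog(@"alias count = %lu, snapshot count = %lu",
          (unsigned long)[alias count], (unsigned long)[snapshot count]);

    [pool release];
    return 0;
}
```

After the removal the alias reports three symbols and the snapshot still reports four, which is why an index computed from the alias before the edit can go out of bounds afterwards.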
From kvddrift at earthlink.net Mon Mar 28 21:04:50 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Mon, 28 Mar 2005 21:04:50 -0500
Subject: [BioCocoa-dev] Target namespace
In-Reply-To:
References: <68a7cf6b7c1cfaa290feabae760d8d69@mekentosj.com>
Message-ID:

On Mar 28, 2005, at 6:37 AM, Alexander Griekspoor wrote:

> Also, I reduced the font size of the docs a bit in the css. Koen are
> you developing on a 30'' screen or what? ;-)
>

Only a 15" PB :(

BTW, I found that if you set the file-type of .html files in Xcode to text.html.documentation, it will show you the HTML rendition in the project.

- Koen.

From charles.parnot at stanford.edu Tue Mar 29 12:06:43 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 29 Mar 2005 09:06:43 -0800
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net>
Message-ID:

>The original issue for the Peptides example is not the difference between immutable and mutable. Instead, the issue was that the accessor symbolArray returned a copy of the symbolArray, instead of just a pointer to that array, which is usually the case with accessors. Copying the array highly slowed down the Peptides calculation, while it was not necessary at all to create a copy. Therefore I suggested that we have the standard accessor:
>
>- (NSArray *)symbolArray
>{
>    return symbolArray;
>}
>
>If the user really needs a copy of the array, there are at least two possibilities: 1. we supply an additional accessor that returns a copy (using [[symbolArray copy] autorelease]) 2. the user uses the standard accessor, and creates the copy herself.
>
>The mutable/immutable question is unrelated to this.
>
>- Koen.

OK, sorry I was still confusing.
I will try to make it shorter and more to the point this time ;-)

I am completely aware of the optimization issue, coming from the copy (this is the kind of issue that immutable objects are supposed to address, actually, but I will keep that for later!).

The problem is that in the proposed implementation, you return an NSArray, when in fact, it is a disguised NSMutableArray. This can easily bite us or the user back, for instance:

    BCSequence *seq=[BCSequence sequenceWithString:@"ATGT"];
    NSArray *symbols=[seq symbolArray];
    int n=[symbols count];
    [seq removeSymbolAtIndex:1];
    BCSymbol *last=[symbols objectAtIndex:n-1]; //UNEXPECTED EXCEPTION

Don't you guys think this is an issue? Sorry if I am insisting, but you have not answered that particular point yet :-)

charles

From kvddrift at earthlink.net Tue Mar 29 17:38:54 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 29 Mar 2005 17:38:54 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net>
Message-ID: <3c5900c67e2fec9a5178f811c0f0e869@earthlink.net>

On Mar 29, 2005, at 12:06 PM, Charles PARNOT wrote:

> The problem is that in the proposed implementation, you return an
> NSArray, when in fact, it is a disguised NSMutableArray. This can
> easily bite us or the user back, for instance:

Ah, now I see what you mean :) Yes that could be a potential problem. I can think of two solutions:

Make the symbolArray ivar non-mutable, this would mean some additional changes in the various init methods. Or we return a mutable array from the accessor.
I would prefer the second solution:

    - (NSMutableArray *)symbolArray
    {
        return symbolArray;
    }

Using Cocoa/ObjC conventions, I wouldn't call the accessor mutableSymbolArray but just the name of the ivar. Also, we can always add an additional accessor to get a real copy.

I hope this answers your question?

cheers,

- Koen.

>
>     BCSequence *seq=[BCSequence sequenceWithString:@"ATGT"];
>     NSArray *symbols=[seq symbolArray];
>     int n=[symbols count];
>     [seq removeSymbolAtIndex:1];
>     BCSymbol *last=[symbols objectAtIndex:n-1]; //UNEXPECTED EXCEPTION
>
> Don't you guys think this is an issue? Sorry if I am insisting, but
> you have not answered that particular point yet :-)
>
> charles

From charles.parnot at stanford.edu Tue Mar 29 17:47:43 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Tue, 29 Mar 2005 14:47:43 -0800
Subject: [BioCocoa-dev] Peptides...
In-Reply-To: <3c5900c67e2fec9a5178f811c0f0e869@earthlink.net>
References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> <3c5900c67e2fec9a5178f811c0f0e869@earthlink.net>
Message-ID:

At 17:38 -0500 3/29/05, Koen van der Drift wrote:
>On Mar 29, 2005, at 12:06 PM, Charles PARNOT wrote:
>>The problem is that in the proposed implementation, you return an NSArray, when in fact, it is a disguised NSMutableArray. This can easily bite us or the user back, for instance:
>
>
>Ah, now I see what you mean :) Yes that could be a potential problem.
>I can think of two solutions:
>
>Make the symbolArray ivar non-mutable, this would mean some additional changes in the various init methods. Or we return a mutable array from the accessor. I would prefer the second solution:
>
>- (NSMutableArray *)symbolArray
>{
>    return symbolArray;
>}
>
>Using Cocoa/ObjC conventions, I wouldn't call the accessor mutableSymbolArray but just the name of the ivar. Also, we can always add an additional accessor to get a real copy.
>
>I hope this answers your question?
>
>
>cheers,
>
>- Koen.

Yes, I agree it makes more sense at this stage to have simply:

    - (NSMutableArray *)symbolArray
    {
        return symbolArray;
    }

And indeed we don't need the mutableSymbolArray method. And I would be completely happy with this :-)

The only little issue is we are trusting the user will read the docs and not mess with the array. A real NSArray is nice, because it respects encapsulation much more.

If we make the symbolArray non-mutable, it almost means we have an immutable sequence... though we can always create a new array every time the sequence is modified and thus have the sequence be mutable. But that would not be very good for performance and it would become a problem in some other situations!

Anyway, my opinion is that at some point, we will want to have immutable and mutable versions of the sequence, as the user is then free to use the most appropriate object depending on what she is doing, in terms of performance.

charles

From kvddrift at earthlink.net Tue Mar 29 17:56:11 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 29 Mar 2005 17:56:11 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References: <25387cfbc5ef84c67b0f2efc05d6e06a@earthlink.net> <3a601fa9d498df9d4c4fbc067439f5e5@earthlink.net> <3c5900c67e2fec9a5178f811c0f0e869@earthlink.net>
Message-ID: <15973cdbfabaa514f748e953dbfc5823@earthlink.net>

On Mar 29, 2005, at 5:47 PM, Charles PARNOT wrote:

> Yes, I agree it makes more sense at this stage to have simply:
> - (NSMutableArray *)symbolArray
> {
>     return symbolArray;
> }
>
> And indeed we don't need the mutableSymbolArray method. And I would be
> completely happy with this :-)
>
> The only little issue is we are trusting the user will read the docs
> and not mess up with the array. A real NSArray is nice, because it
> respects encapsulation much more.
>

Well that's part of using a framework I guess ;-)

I will commit the changes later tonight, including some headerdoc info on the method.

- Koen.

From kvddrift at earthlink.net Tue Mar 29 18:13:24 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 29 Mar 2005 18:13:24 -0500
Subject: [Biococoa-dev] Optimizations
In-Reply-To:
References:
Message-ID: <47c2a3569f34f6f1e2c58d01317cba1a@earthlink.net>

On Mar 28, 2005, at 12:01 PM, John Timmer wrote:

> Just to follow up on my own info: to call the CFSet function, I was
> typecasting it using (CFSetRef *), but that was spitting out a warning
> anyway, so I tried just getting rid of it. It worked anyway, and
> actually
> appeared to be slightly faster. Would that be expected?
>

You don't get the warning if you typecast it as (CFSetRef) - without the pointer symbol.

- Koen.

From jtimmer at bellatlantic.net Tue Mar 29 18:31:54 2005
From: jtimmer at bellatlantic.net (John Timmer)
Date: Tue, 29 Mar 2005 18:31:54 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
Message-ID:

> The only little issue is we are trusting the user will read the docs and not
> mess up with the array. A real NSArray is nice, because it respects
> encapsulation much more.
The other alternative is that, if this method is called, the sequence starts key/value observing the array. I'm not sure when array operators were introduced, but if it's 10.2, we could start observing the @count of the array as soon as somebody got access to it.

That won't help us if anybody starts replacing nucleotides with an amino acid, but at least allows us to do bounds safety.

JT

From kvddrift at earthlink.net Tue Mar 29 19:38:02 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Tue, 29 Mar 2005 19:38:02 -0500
Subject: [BioCocoa-dev] Peptides...
In-Reply-To:
References:
Message-ID: <4d3681ed9e47c5cad4b4a06d8af457c0@earthlink.net>

On Mar 29, 2005, at 6:31 PM, John Timmer wrote:

> That won't help us if anybody starts replacing nucleotides with an
> amino
> acid, but at least allows us to do bounds safety.
>

I think we should discourage users from getting the symbolArray, and let them use the BCSequence as much as possible, unless they really know what they are doing. If they stick to the BCSequence, the symbol set will prevent nucleotides from being replaced by amino acids.

- Koen.
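
The idea of routing every edit through the sequence, so that its symbol set can reject symbols from the wrong alphabet, can be sketched as follows. This is only an illustration under stated assumptions, not the BioCocoa implementation: `CheckedSequence`, its methods, the NSString symbols and the plain NSSet are hypothetical stand-ins for BCSequence, BCSymbol and BCSymbolSet, and manual reference counting is assumed.

```objective-c
// Minimal sketch, not BioCocoa code: CheckedSequence, the NSString symbols
// and the NSSet of allowed symbols are hypothetical stand-ins for
// BCSequence, BCSymbol and BCSymbolSet.
// Assumes manual reference counting (no ARC).
// Build (macOS): clang -fno-objc-arc checked.m -framework Foundation
#import <Foundation/Foundation.h>

@interface CheckedSequence : NSObject
{
    NSMutableArray *symbols;       // never handed out directly
    NSSet *allowedSymbols;         // the alphabet this sequence accepts
}
- (id)initWithAllowedSymbols:(NSSet *)allowed;
- (BOOL)appendSymbol:(NSString *)symbol;        // NO if the symbol is rejected
- (NSString *)symbolAtIndex:(NSUInteger)idx;    // nil if idx is out of bounds
- (NSUInteger)length;
@end

@implementation CheckedSequence
- (id)initWithAllowedSymbols:(NSSet *)allowed
{
    self = [super init];
    if (self != nil) {
        symbols = [[NSMutableArray alloc] init];
        allowedSymbols = [allowed copy];
    }
    return self;
}

- (void)dealloc
{
    [symbols release];
    [allowedSymbols release];
    [super dealloc];
}

- (BOOL)appendSymbol:(NSString *)symbol
{
    // Every mutation goes through the alphabet check.
    if (![allowedSymbols containsObject:symbol]) {
        return NO;
    }
    [symbols addObject:symbol];
    return YES;
}

- (NSString *)symbolAtIndex:(NSUInteger)idx
{
    // Bounds are checked here, so callers never index a stale array.
    if (idx >= [symbols count]) {
        return nil;
    }
    return [symbols objectAtIndex:idx];
}

- (NSUInteger)length
{
    return [symbols count];
}
@end

int main(void)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSSet *dna = [NSSet setWithObjects:@"A", @"C", @"G", @"T", nil];
    CheckedSequence *seq =
        [[[CheckedSequence alloc] initWithAllowedSymbols:dna] autorelease];

    BOOL addedBase = [seq appendSymbol:@"A"];      // a nucleotide: accepted
    BOOL addedResidue = [seq appendSymbol:@"W"];   // tryptophan: rejected

    NSLog(@"base accepted: %d, residue accepted: %d, length: %lu",
          (int)addedBase, (int)addedResidue, (unsigned long)[seq length]);

    [pool release];
    return 0;
}
```

Because callers never hold the backing array, both the alphabet check and the bounds check live in one place; the cost is an extra message send per access compared with working on the raw array.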
From kvddrift at earthlink.net Wed Mar 30 17:04:36 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Wed, 30 Mar 2005 17:04:36 -0500
Subject: [Biococoa-dev] using alignment code
Message-ID:

Hi,

Being curious how the alignment works, I added the following code to one of the demos:

    // do some testing for the alignment code
    BCSequence *first = [BCSequence sequenceWithString:@"CTATGTTGATTTGGAA"];
    BCSequence *second = [BCSequence sequenceWithString:@"ATGGTGATTTTGAA"];

    BCSequenceAlignment *alignArray = [BCSequenceAlignment
        needlemanWunschAlignmentWithSequences: [NSArray arrayWithObjects: first, second, nil]
        properties: nil];

    NSLog ( @"the first alignment sequence is %@", [alignArray sequenceAtIndex:2] );
    NSLog ( @"the second alignment sequence is %@", [alignArray sequenceAtIndex:1] );

The output is as follows:

    2005-03-30 17:01:20.529 Translation[24738] the first alignment sequence is
    2005-03-30 17:01:20.530 Translation[24738] the second alignment sequence is ?ATGGTGATTTTGAA

First question, is this the correct approach? Second question is, what do I use for the 'properties'?

cheers,

- Koen.

From kvddrift at earthlink.net Thu Mar 31 10:23:56 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 31 Mar 2005 10:23:56 -0500
Subject: [Biococoa-dev] more on Peptides
Message-ID:

Hi,

Right now the Peptides example uses the Results class for the peptides that were found. Since we are using the BioCocoa framework (d'oh), shouldn't we be using the BCSequenceProtein class instead? To see if this would make a speed difference, I added a new class to the target, Peptide, which now is a subclass of BCSequenceProtein. Also in the theController.m I had to make some changes, however, I have bracketed these with a #define that can be set to 0 or 1 to use the Peptide or Result class, respectively.
It is located at the top of theController.m:

    #define USE_RESULT_CLASS 0

Looking at the speed (on an iMac G3 400 MHz), and these settings:

    peptide mass: 876
    charge: 2
    accuracy: 150 ppm -> actually this should be called tolerance, I will commit that too with the extra code

Using the Peptide class averages at 2.5 sec, using the Result class averages at 2.4 sec. So there is not much of a difference on my iMac.

What are others' opinions on this?

cheers,

- Koen.

From charles.parnot at stanford.edu Thu Mar 31 11:51:41 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 31 Mar 2005 08:51:41 -0800
Subject: [Biococoa-dev] more on Peptides
In-Reply-To:
References:
Message-ID:

The difference in speed does not look significant to me, so yes, we could use a BCPeptide class if this is useful!

charles

At 10:23 -0500 3/31/05, Koen van der Drift wrote:
>Hi,
>
>Right now the Peptides example uses the Results class for the peptides that were found. Since we are using the BioCocoa framework (d'oh), shouldn't we be using the BCSequenceProtein class instead? To see if this would make a speed difference, I added a new class to the target, Peptide, which now is a subclass of BCSequenceProtein. Also in the theController.m I had to make some changes, however, I have bracketed these with a #define that can be set to 0 or 1 to use the Peptide or Result class, respectively. It is located at the top of theController.m:
>
>#define USE_RESULT_CLASS 0
>
>
>Looking at the speed (on an iMac G3 400 MHz), and these settings:
>
>peptide mass: 876
>charge: 2
>accuracy: 150 ppm -> actually this should be called tolerance, I will commit that too with the extra code
>
>Using the Peptide class averages at 2.5 sec, using the Result class averages at 2.4 sec. So there is not much of a difference on my iMac.
>
>What are others' opinions on this?
>
>
>cheers,
>
>- Koen.
From kvddrift at earthlink.net Thu Mar 31 12:00:49 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 31 Mar 2005 12:00:49 -0500
Subject: [Biococoa-dev] more on Peptides
In-Reply-To:
References:
Message-ID: <7390c52d8704ad406aaf7d67ceb215a4@earthlink.net>

On Mar 31, 2005, at 11:51 AM, Charles PARNOT wrote:

> The difference in speed does not look significant to me, so yes, we
> could use a BCPeptide class if this is useful!
>

There is actually no need for a separate BCPeptide class, I think. It's just a very short protein, so we can use the BCSequenceProtein class for it. The new class in the demo has some additional ivars specifically to be used by the GUI of that app, which is why I created the Peptide class.

- Koen.

From charles.parnot at stanford.edu Thu Mar 31 12:13:55 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 31 Mar 2005 09:13:55 -0800
Subject: [Biococoa-dev] more on Peptides
In-Reply-To: <7390c52d8704ad406aaf7d67ceb215a4@earthlink.net>
References: <7390c52d8704ad406aaf7d67ceb215a4@earthlink.net>
Message-ID:

At 12:00 -0500 3/31/05, Koen van der Drift wrote:
>On Mar 31, 2005, at 11:51 AM, Charles PARNOT wrote:
>
>>The difference in speed does not look significant to me, so yes, we could use a BCPeptide class if this is useful!
>>
>
>There is actually no need for a separate BCPeptide class, I think. It's just a very short protein, so we can use the BCSequenceProtein class for it.
>The new class in the demo has some additional ivars specifically to be used by the GUI of that app, which is why I created the Peptide class.
>
>- Koen.

OK, I did not understand in the first place, but you were just talking about the Peptide example implementation and I have not followed that very much, so I don't really have an opinion. Still true, though: the difference in speed does not look significant to me :-)

charles

From kvddrift at earthlink.net Thu Mar 31 12:24:18 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu, 31 Mar 2005 12:24:18 -0500
Subject: [Biococoa-dev] BCFoundation Test
Message-ID:

Hi Charles,

The BCFoundation-Test target now gives many warnings about skippingUnknownSymbols, which you removed from the BCAbstractSequence init methods a few days ago. I don't want to mess with your code, so would you mind fixing that and committing it to cvs? Also there is one error:

    ld: warning prebinding disabled because of undefined symbols
    ld: Undefined symbols:
    .objc_class_name_BCSequence
    .objc_class_name_BCSequenceDNA
    .objc_class_name_BCSequenceRNA
    .objc_class_name_BCSequenceProtein
    /usr/bin/libtool: internal link edit command failed

I tried adding the BioCocoa framework to the target, but that didn't work (the drag-drop was not accepted). Any idea how to fix that?

cheers,

- Koen.

From charles.parnot at stanford.edu Thu Mar 31 12:54:34 2005
From: charles.parnot at stanford.edu (Charles PARNOT)
Date: Thu, 31 Mar 2005 09:54:34 -0800
Subject: [Biococoa-dev] Re: BCFoundation Test
In-Reply-To:
References:
Message-ID:

Yes, I know about that. This is next on my agenda... I want to update the tests to use the modified BCSequence, and add more info in the dev docs.
If you are trying to write some tests, you could temporarily remove the faulty test classes from the target (uncheck the Target box). I really think a lot of things could be added to the test classes.

I also need to change the name to 'Test - BCFoundation'.

charles

At 12:24 -0500 3/31/05, Koen van der Drift wrote:
>Hi Charles,
>
>The BCFoundation-Test target now gives many warnings about skippingUnknownSymbols, which you removed from the BCAbstractSequence init methods a few days ago. I don't want to mess with your code, so would you mind fixing that and committing it to cvs? Also there is one error:
>
>ld: warning prebinding disabled because of undefined symbols
>ld: Undefined symbols:
>.objc_class_name_BCSequence
>.objc_class_name_BCSequenceDNA
>.objc_class_name_BCSequenceRNA
>.objc_class_name_BCSequenceProtein
>/usr/bin/libtool: internal link edit command failed
>
>I tried adding the BioCocoa framework to the target, but that didn't work (the drag-drop was not accepted). Any idea how to fix that?
>
>cheers,
>
>- Koen.