<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; ">Hi guys,<DIV><BR class="khtml-block-placeholder"></DIV><DIV>My apologies for not jumping in earlier. To Koen and John in particular: I'm sorry, I should have sent a summary of the WWDC meeting much sooner. I understand that all of this comes out of the blue, but please don't misunderstand us, this is all still open for debate. In fact I hope we will discuss things in more detail on the list in order to come up with the best implementation. I'll try to summarise the topics discussed at the WWDC, and the thoughts behind them, below.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The story begins while Phil and I were preparing the slides for our small presentation planned for Wednesday evening. I have to admit that I had spent very little time on BioCocoa in the month before and did not have the exact structure in my head anymore. When I looked at our implementation again, it was far from obvious how the sequence class cluster was set up, and Phil also had trouble getting the exact idea. John did a wonderful job explaining many of the ideas in the document he created just before the WWDC, but I still think the design needs reconsideration. If even developers of the framework can't grasp it easily, imagine new users. So we decided not to spend much time on the implementation during the presentation, both because it's still a moving target and because we thought it would not be of particular interest to the audience. We did decide to talk about our BioJava-like approach of singleton BCSymbol objects. That pattern is easy to explain and easy to get. Our main focus, however, was on what we had in mind for the framework, its potential use, and a request for feedback and input. 
What needs can it fulfil and what are people looking for?</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Partially due to the rescheduled Apple Design Awards (hooray for Peter!) we had a fairly small group of listeners, but the discussion with the group alone made it worth coming together, I think. It was clear that most "new" people were from fields that focus on large-scale genomics projects, clearly a different "target audience" than the one our framework aims at. If I may be so blunt, I think it's safe to say that initially we aim at developers like ourselves, who create fairly small applications with many standard (and fairly simple) sequence editing routines on small sequences. Of course, we should aim to extend this to much larger scales eventually, but that's not our initial goal, right? One of the guys in the audience explained that the philosophy behind BioJava was actually the opposite: aimed at large-scale, genome-sized sequences and mainly focused on annotations. It was even difficult to convince him that there was a need for what we do! I told him that there clearly is a need for programs like Vector NTI, which he agreed with in the end.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Not surprisingly, the main topic of discussion quickly turned to performance, with the basic question: where do we draw the line between objects and structures? We want a Cocoa-like interface and ease of use, but also performance in terms of speed and memory footprint. Ideally we would like to have something like NSString, which is easy to use and has many convenient methods, but is fast because its under-the-hood implementation uses different C structures depending on the type of string you use. Now the problem is that we have to design that under-the-hood part of our sequence objects. </DIV><DIV>Initially we chose the BioJava approach of singleton objects (yes, I was (and am) a great fan). 
</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Let me summarize the benefits:</DIV><DIV>- Objects! Powerful methods, easily accessible properties, etc.: all the nice goodies from Cocoa</DIV><DIV>- Way more powerful than a simple char</DIV><DIV>- Singleton objects dramatically reduce the memory footprint: a sequence is simply a list of pointers to the singleton objects.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>However, there are clear negatives as well, many discussed before:</DIV><DIV>- Objects! Bigger than a char, not by that much, but still. Storing 200Mb of sequence, or 4-8 times as much, makes a difference! The singletons do make it dramatically different though, and I still consider this one of the smallest problems.</DIV><DIV>- Speed. Object messaging is the number one problem here, requiring all kinds of hacks and tricks to get decent performance. The main problem lies in the use of NSArray and the like to store the list of pointers to the symbols. Although very convenient for editing, this kills performance, certainly when the most frequent operation on sequences is iterating over the array.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>In conclusion, the singleton symbols are great! But the problem lies in the NSArray way of storing the sequence of them! </DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Now is there a better solution? Well, one obvious theme that came up many times was the old trick of converting the sequence object to a string, doing what needs to be done, and converting the result back to a sequence object. The benefits are easy to see: chars are smaller and faster to work with, and another plus: many algorithms are already available for strings. We also realized that this would often be needed, and thus called for a general sequence-to-string-and-back implementation. Why not? </DIV><DIV>It's slow, even more slowdowns! 
True, the conversion time would often be negligible compared to the actual computation, but it would still take time.</DIV><DIV>I always opposed all this quite strongly when I could. The idea was simple: if we go for a certain implementation we should eat our own dog food; it should be so good that it can handle the problems described. Alignments should work natively with BCSequences, and so should reversing, etc. I realized that that was an illusion, and not practical. </DIV><DIV>But now I realize even more that this indeed tells us that we were on the wrong track! Our BCSequences could not be used for this; they're not suited for most of the tasks they should perform! We need another implementation. </DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>Credit has to go to Jeff, a graduate student new to BioCocoa who I hope will join the project one day. From all of the above it should be obvious what to do: we should use strings (or char arrays, to be more precise). Now to quote Koen: WTF, are we throwing away all the things we did in the past months?</DIV><DIV>No, absolutely not. The idea is simple. The native way of storing the sequence INSIDE a BCSequence object should not be an NSArray of pointers to symbols, but a char array (or an NSData object as Charles suggested, but let's skip the implementation for now and focus on the idea). The BCSequence object would become a wrapper object around the string as its "data store". The benefits are easy to see:</DIV><DIV>- size is as compact as possible; one could even think of applying classical compression algorithms to make them even smaller.</DIV><DIV>- the string is always available to any implementation, so:</DIV><DIV> - no conversion needed, the string is always there</DIV><DIV> - speed, all implementations work with strings, no iterations over NS/CFArrays</DIV><DIV> - we can use all existing and standard string-based algorithms, e.g. 
for alignments, but also for instance standard regular expression libraries for searching, matching, etc.</DIV><DIV>However, to the OUTSIDE world we ARE (or perhaps better, SEEM to be) arrays of singleton objects. If the sequence is asked for the symbol at position 18, for instance, we return the singleton object. If they want a subsequence, however, we again return a BCSequence, which internally has its char array of course. If you think about how often you really want the symbol itself, and not for instance a sequence, range or annotation, that's not very often, I think. </DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The only real downside, I think, is that programming the implementations using strings is somewhat more complex: more C, less Cocoa, more pointer fiddling, fewer enumerators. But since on many occasions we had already started doing that to "hack" things faster, and had already opted to do the conversions necessary to get to that point, I guess it's not much of a problem. In fact, we can now use many standard char implementations that are already available (and tested). 
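<DIV><BR class="khtml-block-placeholder"></DIV><DIV>To make the wrapper idea a bit more concrete, here is a minimal sketch in plain C. It is not BioCocoa code: every name in it (the BCSymbol and BCSequence struct layouts, bc_symbol_for_code, bc_symbol_at, bc_subsequence, bc_count_base) is hypothetical, the alphabet is limited to the four DNA bases, and the subsequence simply points into the parent's chars rather than copying them, which is an assumption made for brevity. It only shows the split between a char array on the inside and shared symbols on the outside.</DIV><PRE>
```c
/* Illustrative sketch only; all names are hypothetical, not BioCocoa API. */

/* One shared, immutable record per base (the "singleton symbol"). */
typedef struct BCSymbol {
    char code;          /* one-letter code, e.g. 'A' */
    const char *name;   /* human-readable name */
    char complement;    /* code of the complementary base */
} BCSymbol;

static const BCSymbol kSymbols[] = {
    { 'A', "adenine",  'T' },
    { 'C', "cytosine", 'G' },
    { 'G', "guanine",  'C' },
    { 'T', "thymine",  'A' },
};
enum { kSymbolCount = sizeof kSymbols / sizeof kSymbols[0] };

/* The sequence stores plain chars; symbols are looked up only on request. */
typedef struct BCSequence {
    const char *bases;  /* borrowed char array, not owned by the wrapper */
    int length;
} BCSequence;

/* Return the shared symbol for a one-letter code, or a null pointer. */
static const BCSymbol *bc_symbol_for_code(char code)
{
    int i;
    for (i = 0; i < kSymbolCount; i++) {
        if (kSymbols[i].code == code)
            return kSymbols + i;
    }
    return 0;
}

/* Outside view: "symbol at position i", as with an array of singletons. */
static const BCSymbol *bc_symbol_at(const BCSequence *seq, int index)
{
    if (index < 0 || index >= seq->length)
        return 0;
    return bc_symbol_for_code(seq->bases[index]);
}

/* A subsequence is again a wrapper. Assumption: it shares the parent's
   chars instead of copying them; an empty wrapper signals a bad range. */
static BCSequence bc_subsequence(const BCSequence *seq, int start, int length)
{
    BCSequence sub = { 0, 0 };
    if (start >= 0 && length >= 0 && start <= seq->length - length) {
        sub.bases = seq->bases + start;
        sub.length = length;
    }
    return sub;
}

/* Inside view: algorithms run directly on the chars, no symbol objects. */
static int bc_count_base(const BCSequence *seq, char code)
{
    int i, count = 0;
    for (i = 0; i < seq->length; i++) {
        if (seq->bases[i] == code)
            count++;
    }
    return count;
}
```
</PRE><DIV>In this sketch, asking twice for the symbol of an 'A' returns the same pointer, so callers still see shared symbols, while something like bc_count_base never touches a symbol at all. A real version would of course sit behind the Objective-C BCSequence interface and would have to decide who owns the char array.</DIV><DIV><BR class="khtml-block-placeholder"></DIV>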
Of course, if speed is not an issue we can still do it the old way because there is still a way to get the pointer to the symbol for any position.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>So far the theory, now part II: implementing the thing....</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV><BR><DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">*********************************************************<SPAN class="Apple-converted-space"> </SPAN></FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>** Alexander Griekspoor **</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">*********************************************************<SPAN class="Apple-converted-space"> </SPAN></FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>The Netherlands Cancer Institute</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>Department of Tumorbiology (H4)</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>Plesmanlaan 121, 1066 CX, Amsterdam</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN 
class="Apple-converted-space"> </SPAN>Tel:<SPAN class="Apple-converted-space"> </SPAN>+ 31 20 - 512 2023</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>Fax:<SPAN class="Apple-converted-space"> </SPAN>+ 31 20 - 512 2029</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>AIM: <A href="mailto:mekentosj@mac.com">mekentosj@mac.com</A></FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>E-mail: <A href="mailto:a.griekspoor@nki.nl">a.griekspoor@nki.nl</A></FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>Web: <A href="http://www.mekentosj.com">http://www.mekentosj.com</A></FONT></DIV><P style="margin: 0.0px 0.0px 0.0px 0.0px; min-height: 14.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN></FONT><BR class="khtml-block-placeholder"></P><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>Microsoft is not the answer,</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>Microsoft is the question,</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT 
face="Helvetica" size="3" style="font: 12.0px Helvetica"><SPAN class="Apple-converted-space"> </SPAN>NO is the answer</FONT></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 12px/normal Helvetica; min-height: 14px; "><BR></DIV><DIV style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">*********************************************************</FONT></DIV> </DIV><BR></DIV></BODY></HTML>