[Biococoa-dev] New Structure for BioCocoa

Sat Jul 2 20:59:39 EDT 2005

First quick reaction:  WTF - is this going to throw away all our 
efforts up until now? Should I stop adding stuff to the framework until 
the new structure is in place?

Second reaction: using an internal string does make a lot of sense, 
especially because a lot of manipulations become much easier.  It 
probably also makes it easier to read and write text files, databases 
and XML.
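
As a rough illustration of why string-style storage makes manipulations easy, here is a minimal C sketch of an in-place reverse complement on a plain char buffer. The function names and the choice to map unknown characters to 'N' are assumptions made for the example; none of this is existing BioCocoa code.

```c
#include <assert.h>
#include <stddef.h>

/* Complement of one DNA base. Mapping anything else to 'N' is an
   assumption of this sketch, not BioCocoa behaviour. */
static char complement_base(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return 'N';
    }
}

/* Reverse-complement `len` chars of `seq` in place: walk inwards from
   both ends, complementing and swapping each pair. */
void reverse_complement(char *seq, size_t len)
{
    assert(seq != NULL || len == 0);
    for (size_t i = 0, j = len; i < j; ++i) {
        --j;
        char left = complement_base(seq[i]);
        char right = complement_base(seq[j]);
        seq[i] = right;
        seq[j] = left;
    }
}
```

With an array of symbol objects the same operation would need one object lookup per position; on a char buffer it is a single pass with no allocation.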

Now I need some time to really think about the new development ideas. 
Are there more surprises from WWDC?

cheers,

- Koen.

On Jul 2, 2005, at 7:39 PM, Charles Parnot wrote:

> Thanks Phil!
>
> I like the parser idea, particularly if it is already written by you 
> ;-)
> I won't be of any help with C++, though!
>
> The structure you outline looks fine to me, and I am not sure why we 
> should stop implementing stuff now. Clearly, if we agree to use a 
> parser, we should not write code for the IO until it is ready (though 
> to test the parser, the best is to use it, so the IO would probably 
> grow at the same time as the parser). But the modifications in the 
> sequence structure can be implemented now. I think we should simply 
> define goals, have everybody make clear what they want to 
> contribute to, and run several lines of development that do not 
> depend too much on each other. Here is a possible roadmap, made up in 
> 5 minutes (needs some refinement!):
> * get the IO to work (at least read sequences)
> * modify the sequence structure (read below) and make sure we have 
> some methods that can be used by the parser to create the sequence 
> (the internals of BCSequence should be as much as possible 
> encapsulated and not directly accessed by the parser)
> * get the annotations up and running; the annotation issue should not 
> prevent the IO from being implemented; in a first phase, the IO can 
> parse the annotations but not use them; classes and methods to 
> manipulate annotations can be later added to the sequence object, and 
> the parser modified to add these calls.
>
>
>
> Now, Koen rightly complained he did not get a report of the WWDC 
> meeting (and the others who were absent did not get one either). Here 
> is a (complete?) list of the decisions/discussions we had:
> * change the internal structure of the sequence string in BCSequence 
> (read below)
> * think about annotations
> * look at the internals of NSAttributedString in GNUstep to see how 
> the attributes are done, because the structure of NSAttributedString 
> is very similar to sequence annotations
> * probably not worry about performance issues with annotations; 
> manipulating annotations will not happen that often, mostly when 
> modifying a sequence, and generating a subsequence; the bottom line is 
> we can probably stick to NSMutableDictionary (I discussed that in a 
> previous email)
> * still think even more about annotations
> * better define the purpose of BioCocoa, and the programmer niche we 
> are trying to target (the niche is probably us, at this point!)
> * write some code
>
>
> Regarding the sequence structure Phil mentions, I will try to explain 
> it now for those of us that were not part of the discussion.
>
> Short version
> -------------
> Replace the NSArray of BCSymbol with a char [ ]...
>
>
> Long version
> ------------
>
> * The sequence will be stored internally as an array of char, which 
> will make the performance discussions moot. A lot of the sequence 
> manipulations are particularly easy to handle as strings. I don't know 
> if we have decided to use an NSMutableData ivar, or do the malloc 
> ourselves. Using NSData is probably a better idea, as it will already 
> be optimized for
>
> * The public interface will expose arrays of BCSymbols. Because a 
> BCSequence always has a BCSymbolSet associated with it, it is easy to 
> convert between chars and BCSymbol objects on demand. All the methods 
> for that are already available. The NSArray can even be cached (and 
> reconstructed as needed as soon as the sequence is modified).
>
> * The public interface could probably also have a method to return the 
> array of chars as an autoreleased object. This is very easy, 
> e.g. by creating an autoreleased NSData populated with a copy of the 
> sequence bytes (and returning either the char * or the NSData itself). The 
> copy of the bytes (necessary for mutable sequences) will be fast, much 
> faster than copying the NSArray (with all the useless retain/release 
> of the singleton BCSymbols). So we don't have to worry about the issue 
> of returning the internal array used by the sequence when the sequence 
> is mutable (we only have mutable sequences at this point, but I plan 
> to add immutable ones, I know, I am obsessed with that issue).
>
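
The on-demand conversion between chars and symbol objects described above can be sketched in plain C. `Symbol`, `dna_symbols`, `symbol_for_char` and `symbols_from_chars` are hypothetical stand-ins for BCSymbol, a BCSymbolSet and its lookup methods; the real classes are Objective-C and may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BCSymbol: one shared instance per letter. */
typedef struct {
    char letter;
    const char *name;
} Symbol;

/* Hypothetical stand-in for a BCSymbolSet: the four DNA bases. */
static const Symbol dna_symbols[] = {
    { 'A', "adenine" },
    { 'C', "cytosine" },
    { 'G', "guanine" },
    { 'T', "thymine" },
};

/* Return the shared symbol for a char, or NULL if it is not in the set. */
const Symbol *symbol_for_char(char c)
{
    size_t count = sizeof dna_symbols / sizeof dna_symbols[0];
    for (size_t i = 0; i < count; ++i) {
        if (dna_symbols[i].letter == c) {
            return &dna_symbols[i];
        }
    }
    return NULL;
}

/* Fill `out` with pointers to the shared symbols for `len` chars of `seq`.
   Stops at the first char outside the set and returns how many entries
   were written. */
size_t symbols_from_chars(const char *seq, size_t len, const Symbol **out)
{
    assert(seq != NULL && out != NULL);
    size_t i;
    for (i = 0; i < len; ++i) {
        const Symbol *symbol = symbol_for_char(seq[i]);
        if (symbol == NULL) {
            break;
        }
        out[i] = symbol;
    }
    return i;
}
```

Because every entry points at a shared instance, the symbol array is cheap to rebuild, and a cached copy would only need to be discarded when the underlying chars change.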
>
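
To make the point about returning a copy of the bytes concrete, here is a small C sketch using malloc and memcpy. `MutableSequence` and its functions are hypothetical names for illustration only; in Cocoa the copy would more likely be wrapped in an autoreleased NSData.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mutable sequence that owns its byte buffer. */
typedef struct {
    char *bytes;
    size_t length;
} MutableSequence;

/* Initialise `seq` with a private copy of `text`. Returns 0 on success,
   -1 if the allocation fails. */
int sequence_init(MutableSequence *seq, const char *text)
{
    assert(seq != NULL && text != NULL);
    seq->length = strlen(text);
    seq->bytes = malloc(seq->length + 1);
    if (seq->bytes == NULL) {
        return -1;
    }
    memcpy(seq->bytes, text, seq->length + 1);
    return 0;
}

/* Return a NUL-terminated copy of the sequence bytes, or NULL on
   allocation failure. The caller owns the copy and must free() it. */
char *sequence_copy_bytes(const MutableSequence *seq)
{
    assert(seq != NULL && seq->bytes != NULL);
    char *copy = malloc(seq->length + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, seq->bytes, seq->length + 1);
    return copy;
}

/* Release the buffer owned by `seq`. */
void sequence_free(MutableSequence *seq)
{
    assert(seq != NULL);
    free(seq->bytes);
    seq->bytes = NULL;
    seq->length = 0;
}
```

Since callers only ever receive a copy, later edits to the sequence cannot change bytes they already hold, and the internal buffer is never exposed.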
> On Jul 2, 2005, at 8:45 AM, Philipp Seibel wrote:
>
>> Hi all,
>>
>> I want to continue on the mailing list the discussion we already 
>> started at WWDC.
>> In my view, the BioCocoa project needs a modular and 
>> flexible structure. The attached PDF shows my suggestion for a 
>> possible new structure.
>> The next thing we have to discuss is the implementation of the 
>> data structures in the BCFoundation framework. Our WWDC discussion 
>> led to a new string-based sequence structure.
>> I think we should spend quite some time to plan the future structure 
>> of BioCocoa and stop implementation until the new structure is 
>> decided. We all want a 1.0 version of the framework and there are at 
>> least two people from WWDC who want to use BioCocoa in their 
>> projects, so we should go for it. :-) (I should teach professional 
>> motivation practices :-)).
>>
>> The discussion is open .......
>>
>> BTW: I already started the BCParser.framework mentioned in the 
>> attached document. I am thinking of a very flexible high-level parser 
>> framework with event-driven parsers like NSXMLParser.
>> This allows easy implementation of various file formats for different 
>> data structures. Not everybody is satisfied with a BioCocoa sequence; 
>> some want their own structure. The parser API allows parsing 
>> the files into any data structure, and of course also into our future 
>> BCFoundation structures. The API is based on the C++ Boost.Spirit 
>> parser APIs and is developed as an Objective-C++ framework, without any 
>> dynamic linking dependencies. Just tell me what you think about it 
>> ....
>>
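
The event-driven idea can be illustrated with a small C sketch: the parser only reports events through callbacks, and the client decides what data structure, if any, to build. This is not the BCParser API, which is not shown in this thread; `FastaHandlers`, `parse_fasta` and the counting client are hypothetical, and the input is assumed to be FASTA-like text with '\n' line endings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of callbacks; `ctx` is passed back to each one. */
typedef struct {
    void (*on_header)(void *ctx, const char *text, size_t len);
    void (*on_residues)(void *ctx, const char *text, size_t len);
    void *ctx;
} FastaHandlers;

/* Walk FASTA-like text line by line and report each line as an event.
   The parser itself builds no data structure. */
void parse_fasta(const char *input, const FastaHandlers *handlers)
{
    assert(input != NULL && handlers != NULL);
    const char *line = input;
    while (*line != '\0') {
        size_t len = strcspn(line, "\n");   /* length up to end of line */
        if (len > 0 && line[0] == '>') {
            if (handlers->on_header != NULL) {
                handlers->on_header(handlers->ctx, line + 1, len - 1);
            }
        } else if (len > 0) {
            if (handlers->on_residues != NULL) {
                handlers->on_residues(handlers->ctx, line, len);
            }
        }
        line += len;
        if (*line == '\n') {
            ++line;                         /* skip the line terminator */
        }
    }
}

/* Example client: counts records and residues instead of storing them. */
typedef struct {
    size_t records;
    size_t residues;
} Counts;

static void count_header(void *ctx, const char *text, size_t len)
{
    (void)text;
    (void)len;
    ((Counts *)ctx)->records++;
}

static void count_residues(void *ctx, const char *text, size_t len)
{
    (void)text;
    ((Counts *)ctx)->residues += len;
}
```

A different client could append the residue events to its own sequence type with the same parser, which is the flexibility the paragraph above is after.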
>> cheers,
>>
>> Phil
>>
>>
>> <BCFrameworks.pdf>
>>  _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com
>