[Biococoa-dev] New Structure for BioCocoa

Sun Jul 3 06:12:22 EDT 2005

Hey Charles,

On 03.07.2005 at 01:39, Charles Parnot wrote:

> Thanks Phil!
>
> I like the parser idea, particularly if it is already written by  
> you ;-)
> I won't be of any help with C++, though!

This is no problem, because there is not much need for C++ apart from
expression templates.
The parser isn't already written by me, sorry ;-), but I'm working on
it.

I'd like to know which sequence formats I should implement, so
that I can finish the parsers for the sequence IO first.

> The structure you outline looks fine to me, and I am not sure why  
> we should stop implementing stuff now. Clearly, if we agree to use  
> a parser, we should not write code for the IO until it is ready  
> (though to test the parser, the best is to use it, so the IO would  
> probably grow at the same time as the parser). But the  
> modifications in the sequence structure can be implemented now. I  
> think we should simply define goals and have everybody make it  
> clear what they want to contribute to, and have several
> independent lines of development that do not depend too much on  
> each other and that can be done independently. Here is a possible  
> roadmap, made up in 5 minutes (needs some refinement!):
> * get the IO to work (at least read sequences)
> * modify the sequence structure (read below) and make sure we have  
> some methods that can be used by the parser to create the sequence  
> (the internals of BCSequence should be as much as possible  
> encapsulated and not directly accessed by the parser)
> * get the annotations up and running; the annotation issue should  
> not prevent the IO from being implemented; in a first phase, the IO  
> can parse the annotations but not use them; classes and methods to  
> manipulate annotations can be later added to the sequence object,  
> and the parser modified to add these calls.

Sounds good to me. I just wanted to make sure that we don't do any
work that we can't use in the future.
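
To make the second roadmap item a bit more concrete (methods the parser can call to build a sequence without touching its internals), here is a minimal sketch in plain C. All names (bc_sequence, bc_sequence_append, and so on) are hypothetical and not part of BioCocoa; in a real library the struct definition would live in the implementation file so that callers only ever see the handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BCSequence. Callers are expected to go
   through the functions below and never touch the fields directly. */
typedef struct bc_sequence {
    char  *bytes;     /* one char per symbol */
    size_t length;    /* number of symbols stored */
    size_t capacity;  /* number of bytes allocated */
} bc_sequence;

/* Create an empty sequence; returns NULL on allocation failure. */
bc_sequence *bc_sequence_create(void) {
    return calloc(1, sizeof(bc_sequence));
}

/* Append n symbols. Returns 0 on success, -1 on allocation failure.
   This is the only call a parser would need while reading a file. */
int bc_sequence_append(bc_sequence *seq, const char *symbols, size_t n) {
    assert(seq != NULL);
    if (n == 0) return 0;
    assert(symbols != NULL);
    if (seq->length + n > seq->capacity) {
        size_t new_cap = seq->capacity ? seq->capacity : 64;
        while (new_cap < seq->length + n) new_cap *= 2;
        char *grown = realloc(seq->bytes, new_cap);
        if (grown == NULL) return -1;
        seq->bytes = grown;
        seq->capacity = new_cap;
    }
    memcpy(seq->bytes + seq->length, symbols, n);
    seq->length += n;
    return 0;
}

/* Number of symbols currently stored. */
size_t bc_sequence_length(const bc_sequence *seq) {
    assert(seq != NULL);
    return seq->length;
}

/* Read one symbol by position. */
char bc_sequence_symbol_at(const bc_sequence *seq, size_t i) {
    assert(seq != NULL && i < seq->length);
    return seq->bytes[i];
}

/* Release the sequence and its storage. */
void bc_sequence_free(bc_sequence *seq) {
    if (seq != NULL) {
        free(seq->bytes);
        free(seq);
    }
}
```

With an interface like this, the storage can change later (NSData, malloc, something else) without the parser noticing.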

>
>
> Now, Koen rightly complained he did not get a report of the WWDC
> meeting (and the others who were absent did not get one either). Here is a
> (complete?) list of the decisions/discussions we had:
> * change the internal structure of the sequence string in
> BCSequence (read below)
> * think about annotations
> * look at the internals of NSAttributedString in GNUstep to see how
> the annotations are done, because the structure of
> NSAttributedString is very similar to sequence annotations
> * probably not worry about performance issues with annotations;  
> manipulating annotations will not happen that often, mostly when  
> modifying a sequence, and generating a subsequence; the bottom line  
> is we can probably stick to NSMutableDictionary (I discussed that  
> in a previous email)
> * still think even more about annotations
> * better define the purpose of BioCocoa, and the programmer niche  
> we are trying to target (the niche is probably us, at this point!)
> * write some code
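
One point in the list above, annotations when generating a subsequence, can be illustrated with a small sketch. It assumes an annotation is a label attached to a range of positions, in the spirit of attributed-string ranges; the thread has not decided on that representation, and the type and function names are hypothetical. The function clips an annotation to the requested subsequence and re-bases its start to the subsequence's coordinates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical annotation: a label covering [start, start + length). */
typedef struct {
    size_t      start;
    size_t      length;
    const char *label;
} annotation;

/* Clip `in` to the subsequence [sub_start, sub_start + sub_length).
   Returns 1 and fills *out (in subsequence coordinates) if the two
   ranges overlap, or 0 if the annotation falls outside entirely. */
int annotation_clip(const annotation *in, size_t sub_start,
                    size_t sub_length, annotation *out) {
    assert(in != NULL && out != NULL);
    size_t in_end  = in->start + in->length;
    size_t sub_end = sub_start + sub_length;
    size_t lo = in->start > sub_start ? in->start : sub_start;
    size_t hi = in_end < sub_end ? in_end : sub_end;
    if (lo >= hi) return 0;          /* no overlap */
    out->start  = lo - sub_start;    /* re-base to the subsequence */
    out->length = hi - lo;
    out->label  = in->label;
    return 1;
}
```

This kind of work only happens when a sequence is modified or sliced, which fits the view that annotation performance is not critical.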
>
>
> Regarding the sequence structure Phil mentions, I will try to  
> explain it now for those of us that were not part of the discussion.
>
> Short version
> -------------
> Replace the NSArray of BCSymbol with a char [ ]...
>
>
> Long version
> ------------
>
> * The sequence will be stored internally as an array of char, which  
> will make the performance discussions moot. A lot of the sequence  
> manipulations are particularly easy to handle as strings. I don't  
> know if we have decided to use an NSMutableData ivar, or do the  
> malloc ourselves. Using NSData is probably a better idea, as it  
> will already be optimized for
>
> * The public interface will expose arrays of BCSymbols. Because a
> BCSequence always has a BCSymbolSet associated with it, it is easy
> to convert between chars and BCSymbol objects on demand. All the  
> methods for that are already available. The NSArray can even be  
> cached (and reconstructed as needed as soon as the sequence is  
> modified).
>
> * The public interface could probably also have a method that returns
> the array of chars as an autoreleased object. This is very easy,
> e.g. by creating an autoreleased NSData populated with a copy of the
> sequence bytes (and returning either the char * or the NSData itself).
> The copy of the bytes (necessary for mutable sequences) will be  
> fast, much faster than copying the NSArray (with all the useless  
> retain/release of the singleton BCSymbols). So we don't have to  
> worry about the issue of returning the internal array used by the  
> sequence when the sequence is mutable (we only have mutable  
> sequences at this point, but I plan to add immutable ones, I know,  
> I am obsessed with that issue).

Very good summary of the discussion.
I think we should try to implement the string storage in a few different
ways and benchmark them, to see which one performs best.
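
As a starting point for such a comparison, the char-array variant from Charles's summary could be sketched in plain C as follows. This is only an illustration under assumptions: the symbol table stands in for BCSymbolSet, the `symbol` struct for the BCSymbol singletons, and every name here is hypothetical. It shows the three ideas from the summary: chars as internal storage, conversion to symbol objects on demand, and handing out a copy of the bytes instead of the internal buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BCSymbol singleton. */
typedef struct {
    char        code;
    const char *name;
} symbol;

static const symbol dna_symbols[] = {
    {'A', "adenine"}, {'C', "cytosine"}, {'G', "guanine"}, {'T', "thymine"},
};

/* Hypothetical stand-in for BCSymbolSet: map a char to its singleton,
   or NULL if the char is not part of the set. */
const symbol *dna_symbol_for_char(char c) {
    for (size_t i = 0; i < sizeof dna_symbols / sizeof dna_symbols[0]; i++) {
        if (dna_symbols[i].code == c) return &dna_symbols[i];
    }
    return NULL;
}

/* Sequence stored internally as a plain array of chars. */
typedef struct {
    char  *bytes;
    size_t length;
} char_sequence;

/* Build a sequence from a C string. Returns 0 on success, -1 if the
   text contains a char outside the symbol set or allocation fails. */
int char_sequence_init(char_sequence *seq, const char *text) {
    assert(seq != NULL && text != NULL);
    size_t n = strlen(text);
    for (size_t i = 0; i < n; i++) {
        if (dna_symbol_for_char(text[i]) == NULL) return -1;
    }
    seq->bytes = malloc(n ? n : 1);
    if (seq->bytes == NULL) return -1;
    memcpy(seq->bytes, text, n);
    seq->length = n;
    return 0;
}

/* On-demand conversion from the stored char to its symbol object. */
const symbol *char_sequence_symbol_at(const char_sequence *seq, size_t i) {
    assert(seq != NULL && i < seq->length);
    return dna_symbol_for_char(seq->bytes[i]);
}

/* Return a NUL-terminated copy of the bytes; the caller frees it.
   Later changes to the sequence do not affect the copy. */
char *char_sequence_copy_bytes(const char_sequence *seq) {
    assert(seq != NULL);
    char *copy = malloc(seq->length + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, seq->bytes, seq->length);
    copy[seq->length] = '\0';
    return copy;
}
```

An NSMutableData-backed version would expose the same functions, which would make it easy to time the two against each other.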

cheers,

Phil

> On Jul 2, 2005, at 8:45 AM, Philipp Seibel wrote:
>
>
>> Hi all,
>>
>> I want to continue on the mailing list the discussion we already
>> started at WWDC.
>> In my view, the BioCocoa project needs a modular
>> and flexible structure. The attached pdf shows my suggestion of  
>> the possible new structure.
>> The next thing we have to discuss is the implementation of the  
>> data structures in the BCFoundation framework. Our WWDC discussion
>> led to a new string-based sequence structure.
>> I think we should spend quite some time to plan the future  
>> structure of BioCocoa and stop implementation until the new  
>> structure is decided. We all want a 1.0 version of the framework,
>> and there are at least two people from WWDC who want to use
>> BioCocoa in their projects, so we should go for it. :-) (I should
>> teach professional motivation practices :-)).
>>
>> The discussion is open .......
>>
>> BTW: I have already started the BCParser.framework mentioned in the
>> attached document. I am thinking of a very flexible high-level parser
>> framework with event-driven parsers like NSXMLParser.
>> This allows easy implementation of various file formats for
>> different data structures. Not everybody is satisfied with a
>> BioCocoa sequence; some will want their own structure. The parser
>> API allows parsing the files into any data structure, and of
>> course also into our future BCFoundation structures. The API is
>> based on the C++ Boost.Spirit parser APIs and is developed as an
>> Objective-C++ framework, without any dynamic linking dependencies.
>> Just tell me what you think about it ....
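
The event-driven idea described above can be illustrated with a small sketch in plain C. It is not the BCParser API: the callback table plays the role that a delegate plays for NSXMLParser, the FASTA-like input (a '>' header line followed by symbol lines) is an assumed example format, and all names are hypothetical. The parser reports events and never builds a data structure itself, so the receiver decides what to construct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event table; any callback may be NULL. */
typedef struct {
    void (*did_start_record)(void *ctx, const char *name, size_t name_len);
    void (*did_find_symbols)(void *ctx, const char *symbols, size_t n);
    void (*did_end_record)(void *ctx);
} parser_events;

/* Walk the text line by line and report events to the receiver.
   Only '\n' line endings are handled in this sketch. */
void parse_fasta(const char *text, const parser_events *ev, void *ctx) {
    assert(text != NULL && ev != NULL);
    int in_record = 0;
    const char *line = text;
    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        if (len > 0 && line[0] == '>') {
            /* A header line closes the previous record and opens a new one. */
            if (in_record && ev->did_end_record) ev->did_end_record(ctx);
            if (ev->did_start_record) ev->did_start_record(ctx, line + 1, len - 1);
            in_record = 1;
        } else if (len > 0 && in_record) {
            if (ev->did_find_symbols) ev->did_find_symbols(ctx, line, len);
        }
        line += len;
        if (*line == '\n') line++;
    }
    if (in_record && ev->did_end_record) ev->did_end_record(ctx);
}

/* Example receiver: it only counts, but it could just as well fill
   a sequence object or any other structure of the caller's choosing. */
typedef struct {
    size_t records;
    size_t symbols;
} counts;

void count_record_start(void *ctx, const char *name, size_t name_len) {
    (void)name;
    (void)name_len;
    ((counts *)ctx)->records++;
}

void count_record_symbols(void *ctx, const char *symbols, size_t n) {
    (void)symbols;
    ((counts *)ctx)->symbols += n;
}
```

A receiver is wired up by filling a parser_events value with its functions and passing it, together with its own context pointer, to parse_fasta.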
>>
>> cheers,
>>
>> Phil
>>
>>
>> <BCFrameworks.pdf>
>>  _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>>
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com
>
>
>
