[Biococoa-dev] New Structure for BioCocoa
Philipp Seibel
biococoa at bioworxx.com
Sun Jul 3 06:12:22 EDT 2005
Hey Charles,
On 03.07.2005, at 01:39, Charles Parnot wrote:
> Thanks Phil!
>
> I like the parser idea, particularly if it is already written by
> you ;-)
> I won't be of any help with C++, though!
This is no problem, because there is not much need for C++ apart from
expression templates.
Sorry, the parser isn't already written ;-), but I'm working on
it.
I'd like to know which sequence formats I should implement,
so that I can finish the parsers for the sequence IO first.
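The "event-driven parser like NSXMLParser" idea can be made concrete for one sequence format. Below is a minimal sketch in plain C that assumes FASTA as the example format. Every name in it (FastaHandler, parse_fasta, FastaStats and the stats_* callbacks) is hypothetical and is not part of BCParser.framework or its Boost.Spirit-based API. The parser only reports events: a record starts with its header, residue lines arrive, and the record ends. The client decides what data structure to build from those events, which is how one parser can fill any data structure.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table, loosely modelled on an NSXMLParser delegate:
   the parser reports events and the client builds whatever structure it wants.
   Any callback may be NULL. */
typedef struct {
    void (*begin_record)(void *ctx, const char *header, size_t len);
    void (*residues)(void *ctx, const char *chars, size_t len);
    void (*end_record)(void *ctx);
} FastaHandler;

/* Walks a FASTA buffer line by line and fires events; returns the number of
   records seen. Lines before the first '>' header and empty lines are skipped. */
size_t parse_fasta(const char *text, size_t n, const FastaHandler *h, void *ctx)
{
    assert(text != NULL && h != NULL);
    size_t i = 0, records = 0;
    int in_record = 0;
    while (i < n) {
        const char *nl = memchr(text + i, '\n', n - i);
        size_t start = i;
        size_t end = nl ? (size_t)(nl - text) : n;
        i = nl ? end + 1 : n;
        size_t len = end - start;
        if (len > 0 && text[start + len - 1] == '\r')
            len--;                                  /* tolerate CRLF line endings */
        if (len == 0)
            continue;
        if (text[start] == '>') {
            if (in_record && h->end_record)
                h->end_record(ctx);
            if (h->begin_record)
                h->begin_record(ctx, text + start + 1, len - 1);
            in_record = 1;
            records++;
        } else if (in_record && h->residues) {
            h->residues(ctx, text + start, len);
        }
    }
    if (in_record && h->end_record)
        h->end_record(ctx);
    return records;
}

/* Example client: counts records and non-whitespace residues. */
typedef struct { size_t records; size_t residues; } FastaStats;

static void stats_begin(void *ctx, const char *header, size_t len)
{
    (void)header;
    (void)len;
    ((FastaStats *)ctx)->records++;
}

static void stats_residues(void *ctx, const char *chars, size_t len)
{
    FastaStats *s = ctx;
    for (size_t k = 0; k < len; k++)
        if (!isspace((unsigned char)chars[k]))
            s->residues++;
}
```

A client that wants BioCocoa sequences would supply callbacks that append residues to a sequence object. A client that only needs statistics, like FastaStats here, gets them without building any intermediate objects.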
> The structure you outline looks fine to me, and I am not sure why
> we should stop implementing stuff now. Clearly, if we agree to use
> a parser, we should not write code for the IO until it is ready
> (though to test the parser, the best is to use it, so the IO would
> probably grow at the same time as the parser). But the
> modifications in the sequence structure can be implemented now. I
> think we should simply define goals and have everybody make it
> clear what they want to contribute to, and have several
> independent lines of development that do not depend too much on
> each other and that can be done independently. Here is a possible
> roadmap, made up in 5 minutes (needs some refinement!):
> * get the IO to work (at least read sequences)
> * modify the sequence structure (read below) and make sure we have
> some methods that can be used by the parser to create the sequence
> (the internals of BCSequence should be as much as possible
> encapsulated and not directly accessed by the parser)
> * get the annotations up and running; the annotation issue should
> not prevent the IO from being implemented; in a first phase, the IO
> can parse the annotations but not use them; classes and methods to
> manipulate annotations can be later added to the sequence object,
> and the parser modified to add these calls.
Sounds good to me; I just wanted to make clear that we shouldn't do any
work that we can't use in the future.
>
>
> Now, Koen rightly complained he did not get a report of the WWDC
> meeting (and the others who were absent did not get one either). Here is a
> (complete?) list of the decisions/discussions we had:
> * change the internal structure of the sequence string in
> BCSequence (read below)
> * think about annotations
> * look at the internals of NSAttributedString in GNUstep to see how
> the annotations are done, because the structure of
> NSAttributedString is very similar to sequence annotations
> * probably not worry about performance issues with annotations;
> manipulating annotations will not happen that often, mostly when
> modifying a sequence, and generating a subsequence; the bottom line
> is we can probably stick to NSMutableDictionary (I discussed that
> in a previous email)
> * still think even more about annotations
> * better define the purpose of BioCocoa, and the programmer niche
> we are trying to target (the niche is probably us, at this point!)
> * write some code
>
>
> Regarding the sequence structure Phil mentions, I will try to
> explain it now for those of us that were not part of the discussion.
>
> Short version
> -------------
> Replace the NSArray of BCSymbols with a char[]...
>
>
> Long version
> ------------
>
> * The sequence will be stored internally as an array of char, which
> will make the performance discussions moot. A lot of the sequence
> manipulations are particularly easy to handle as strings. I don't
> know if we have decided to use an NSMutableData ivar, or do the
> malloc ourselves. Using NSData is probably a better idea, as it
> will already be optimized for
>
> * The public interface will expose arrays of BCSymbols. Because a
> BCSequence always has a BCSymbolSet associated with it, it is easy
> to convert between chars and BCSymbol objects on demand. All the
> methods for that are already available. The NSArray can even be
> cached (and reconstructed as needed as soon as the sequence is
> modified).
>
> * The public interface could probably have a method to return the
> array of chars as well, as an autoreleased object. This is very easy,
> e.g. by creating an autoreleased NSData populated with a copy of the
> sequence bytes (and returning either the char * or the NSData itself).
> The copy of the bytes (necessary for mutable sequences) will be
> fast, much faster than copying the NSArray (with all the useless
> retain/release of the singleton BCSymbols). So we don't have to
> worry about the issue of returning the internal array used by the
> sequence when the sequence is mutable (we only have mutable
> sequences at this point, but I plan to add immutable ones, I know,
> I am obsessed with that issue).
Very good summary of the discussion.
I think we should try implementing the string-based storage in several
ways and benchmark them to see which one is best.
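As one candidate for such a comparison, here is a minimal sketch in plain C of the char-array storage described above. It takes the "do the malloc ourselves" route rather than NSMutableData. Symbol, SymbolSet, Sequence and the seq_* functions are hypothetical stand-ins and are not BioCocoa's BCSymbol/BCSymbolSet/BCSequence API. The residues live in one growable char buffer. The array of symbol pointers is built on demand through a per-set lookup table and cached until the next mutation. Callers get a copy of the bytes, never the mutable buffer itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for BCSymbol / BCSymbolSet: one shared symbol per
   character code, looked up through a 256-entry table (NULL = not in the set). */
typedef struct { char code; const char *name; } Symbol;
typedef struct { const Symbol *table[256]; } SymbolSet;

/* Hypothetical sequence: residues live in a plain growable char buffer; the
   symbol view is built lazily and discarded whenever the bytes change. */
typedef struct {
    const SymbolSet *set;
    char *bytes;
    size_t length, capacity;
    const Symbol **cached_symbols;   /* NULL when not built or stale */
} Sequence;

void seq_init(Sequence *s, const SymbolSet *set)
{
    assert(s != NULL && set != NULL);
    s->set = set;
    s->bytes = NULL;
    s->length = s->capacity = 0;
    s->cached_symbols = NULL;
}

static void seq_invalidate(Sequence *s)
{
    free(s->cached_symbols);
    s->cached_symbols = NULL;
}

/* Encapsulated mutator a parser could call instead of touching the buffer.
   Returns 0 on success, -1 if memory could not be allocated. */
int seq_append(Sequence *s, const char *chars, size_t n)
{
    if (n == 0)
        return 0;
    if (s->length + n > s->capacity) {
        size_t cap = s->capacity ? s->capacity : 64;
        while (cap < s->length + n)
            cap *= 2;
        char *p = realloc(s->bytes, cap);
        if (!p)
            return -1;
        s->bytes = p;
        s->capacity = cap;
    }
    memcpy(s->bytes + s->length, chars, n);
    s->length += n;
    seq_invalidate(s);
    return 0;
}

/* Symbol view, converted from the chars on demand and cached until the next
   mutation. Returns NULL for an empty sequence or on allocation failure. */
const Symbol **seq_symbols(Sequence *s)
{
    if (s->cached_symbols == NULL && s->length > 0) {
        const Symbol **syms = malloc(s->length * sizeof *syms);
        if (!syms)
            return NULL;
        for (size_t i = 0; i < s->length; i++)
            syms[i] = s->set->table[(unsigned char)s->bytes[i]];
        s->cached_symbols = syms;
    }
    return s->cached_symbols;
}

/* NUL-terminated copy of the raw bytes; the caller frees it. Like returning
   an NSData copy, callers never get a pointer into the mutable buffer. */
char *seq_copy_bytes(const Sequence *s)
{
    char *copy = malloc(s->length + 1);
    if (!copy)
        return NULL;
    if (s->length > 0)
        memcpy(copy, s->bytes, s->length);
    copy[s->length] = '\0';
    return copy;
}

void seq_free(Sequence *s)
{
    seq_invalidate(s);
    free(s->bytes);
    seq_init(s, s->set);
}
```

In a Cocoa version, an NSMutableData ivar would presumably replace the realloc'd buffer and an NSArray the cached symbol pointers. Timing both variants on realistic sequence lengths would show whether the hand-rolled buffer buys anything.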
cheers,
Phil
> On Jul 2, 2005, at 8:45 AM, Philipp Seibel wrote:
>
>
>> Hi all,
>>
>> I want to continue the discussion we already started at WWDC
>> here on the mailing list.
>> In my view, the BioCocoa project needs a modular
>> and flexible structure. The attached PDF shows my suggestion for
>> a possible new structure.
>> The next thing we have to discuss is the implementation of the
>> data structures in the BCFoundation framework. Our WWDC discussion
>> led to a new string-based sequence structure.
>> I think we should spend quite some time planning the future
>> structure of BioCocoa and stop implementing until the new
>> structure is decided. We all want a 1.0 version of the framework,
>> and there are at least two people from WWDC who want to use
>> BioCocoa in their projects, so we should go for it. :-) (I should
>> teach professional motivation practices :-)).
>>
>> The discussion is open .......
>>
>> BTW: I already started the BCParser.framework mentioned in the
>> attached document. I am thinking of a very flexible high-level parser
>> framework with event-driven parsers like NSXMLParser.
>> This allows easy implementation of various file formats for
>> different data structures. Not everybody is satisfied with a
>> BioCocoa sequence; some want their own structure, and the parser
>> API allows parsing the files into any data structure, including
>> of course our future BCFoundation structures. The API is
>> based on the C++ Boost.Spirit parser APIs and is developed as an
>> Objective-C++ framework, without any dynamic linking dependencies.
>> Just tell me what you think about it ....
>>
>> cheers,
>>
>> Phil
>>
>>
>> <BCFrameworks.pdf>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>>
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com
>
>
>
>