[Biococoa-dev] New Structure for BioCocoa

Sat Jul 2 19:39:45 EDT 2005

Thanks Phil!

I like the parser idea, particularly if it is already written by you ;-)
I won't be of any help with C++, though!

The structure you outline looks fine to me, and I am not sure why we  
should stop implementing stuff now. Clearly, if we agree to use a  
parser, we should not write code for the IO until it is ready (though  
to test the parser, the best is to use it, so the IO would probably  
grow at the same time as the parser). But the modifications in the  
sequence structure can be implemented now. I think we should simply
define goals, have everybody make clear what they want to
contribute to, and run several lines of development that do not
depend too much on each other. Here is a possible roadmap, made up
in 5 minutes (needs some refinement!):
* get the IO to work (at least read sequences)
* modify the sequence structure (read below) and make sure we have  
some methods that can be used by the parser to create the sequence  
(the internals of BCSequence should be as much as possible  
encapsulated and not directly accessed by the parser)
* get the annotations up and running; the annotation issue should not  
prevent the IO from being implemented; in a first phase, the IO can  
parse the annotations but not use them; classes and methods to  
manipulate annotations can be later added to the sequence object, and  
the parser modified to add these calls.
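
To make the second and third roadmap items more concrete, here is a minimal sketch in plain C (not Objective-C, and not actual BioCocoa or BCParser code). It assumes a hypothetical `Sequence` type that the parser fills only through `sequence_append`, and a toy FASTA-like parser that reports events through callbacks. All names (`Sequence`, `ParserHandlers`, `parse_fasta`, ...) are invented for illustration. The header event is parsed but may be ignored, mirroring the "parse the annotations but not use them" first phase.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BCSequence. Callers use only the functions below. */
typedef struct {
    char  *bytes;
    size_t length;
    size_t capacity;
} Sequence;

static Sequence *sequence_create(void) {
    return calloc(1, sizeof(Sequence));   /* empty sequence, or NULL on failure */
}

/* Append n residues; returns 0 on success, -1 if memory runs out. */
static int sequence_append(Sequence *s, const char *chars, size_t n) {
    assert(s != NULL);
    if (n == 0) return 0;
    if (s->length + n > s->capacity) {
        size_t cap = s->capacity ? s->capacity : 16;
        while (cap < s->length + n) cap *= 2;
        char *grown = realloc(s->bytes, cap);
        if (grown == NULL) return -1;
        s->bytes = grown;
        s->capacity = cap;
    }
    memcpy(s->bytes + s->length, chars, n);
    s->length += n;
    return 0;
}

static size_t sequence_length(const Sequence *s) { return s->length; }

static void sequence_free(Sequence *s) {
    if (s) { free(s->bytes); free(s); }
}

/* Event callbacks; a NULL entry means "parse this, but do not use it yet". */
typedef struct {
    void (*on_header)(void *ctx, const char *text, size_t n);
    void (*on_residues)(void *ctx, const char *chars, size_t n);
} ParserHandlers;

/* Toy line-based parser: '>' lines are headers, other non-empty lines are residues. */
static void parse_fasta(const char *text, const ParserHandlers *h, void *ctx) {
    const char *line = text;
    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        if (len > 0 && line[0] == '>') {
            if (h->on_header) h->on_header(ctx, line + 1, len - 1);
        } else if (len > 0) {
            if (h->on_residues) h->on_residues(ctx, line, len);
        }
        line += len;
        if (*line == '\n') line++;
    }
}

/* Handler that builds the sequence through its public function only. */
static void collect_residues(void *ctx, const char *chars, size_t n) {
    sequence_append((Sequence *)ctx, chars, n);
}
```

A caller would create a `Sequence`, fill in a `ParserHandlers` with `collect_residues` (leaving `on_header` as NULL for now), and pass both to `parse_fasta`. Annotation support could later be added by supplying `on_header` without touching the parser.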

Now, Koen rightly complained he did not get a report of the WWDC
meeting (and the others who were absent did not get one either). Here
is a (complete?) list of the decisions/discussions we had:
* change the internal structure of the sequence string in BCSequence  
(read below)
* think about annotations
* look at the internals of NSAttributedString in GNUstep to see how
the attributes are handled, because the structure of
NSAttributedString is very similar to sequence annotations
* probably not worry about performance issues with annotations;  
manipulating annotations will not happen that often, mostly when  
modifying a sequence, and generating a subsequence; the bottom line  
is we can probably stick to NSMutableDictionary (I discussed that in  
a previous email)
* still think even more about annotations
* better define the purpose of BioCocoa, and the programmer niche we  
are trying to target (the niche is probably us, at this point!)
* write some code

Regarding the sequence structure Phil mentions, I will try to explain  
it now for those of us that were not part of the discussion.

Short version
-------------
Replace the NSArray of BCSymbol with a char [ ]...

Long version
------------

* The sequence will be stored internally as an array of char, which  
will make the performance discussions moot. A lot of the sequence  
manipulations are particularly easy to handle as strings. I don't  
know if we have decided to use an NSMutableData ivar, or do the  
malloc ourselves. Using NSData is probably a better idea, as it will  
already be optimized for

* The public interface will expose arrays of BCSymbols. Because a
BCSequence always has a BCSymbolSet associated with it, it is easy to
convert between chars and BCSymbol objects on demand. All the methods  
for that are already available. The NSArray can even be cached (and  
reconstructed as needed as soon as the sequence is modified).
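
As a rough illustration of the two points above, here is a plain C sketch (not BioCocoa code) of a sequence that stores chars internally, converts them to symbol pointers on demand through its symbol set, and caches the result until the next modification. The types `Symbol`, `SymbolSet` and `SymbolSequence` and all function names are hypothetical stand-ins for BCSymbol, BCSymbolSet and BCSequence. The symbols live in a static table, which plays the role of the singleton BCSymbols.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for BCSymbol and BCSymbolSet. */
typedef struct { char code; const char *name; } Symbol;
typedef struct { const Symbol *symbols; size_t count; } SymbolSet;

static const Symbol DNA_SYMBOLS[] = {
    {'A', "adenine"}, {'C', "cytosine"}, {'G', "guanine"}, {'T', "thymine"}
};
static const SymbolSet DNA_SET = {
    DNA_SYMBOLS, sizeof DNA_SYMBOLS / sizeof DNA_SYMBOLS[0]
};

/* Look up the shared symbol for a char; NULL if it is not in the set. */
static const Symbol *symbol_for_char(const SymbolSet *set, char c) {
    for (size_t i = 0; i < set->count; i++)
        if (set->symbols[i].code == c) return &set->symbols[i];
    return NULL;
}

/* Hypothetical stand-in for BCSequence: chars inside, symbols outside. */
typedef struct {
    char            *bytes;
    size_t           length;
    const SymbolSet *set;
    const Symbol   **cache;   /* NULL when not built or stale */
} SymbolSequence;

static SymbolSequence *symseq_create(const char *chars, const SymbolSet *set) {
    SymbolSequence *seq = malloc(sizeof *seq);
    if (seq == NULL) return NULL;
    seq->length = strlen(chars);
    seq->bytes = malloc(seq->length + 1);
    if (seq->bytes == NULL) { free(seq); return NULL; }
    memcpy(seq->bytes, chars, seq->length + 1);
    seq->set = set;
    seq->cache = NULL;
    return seq;
}

/* Return the symbol array, building it from the chars only when needed. */
static const Symbol **symseq_symbols(SymbolSequence *seq) {
    if (seq->cache == NULL && seq->length > 0) {
        seq->cache = malloc(seq->length * sizeof *seq->cache);
        if (seq->cache == NULL) return NULL;
        for (size_t i = 0; i < seq->length; i++)
            seq->cache[i] = symbol_for_char(seq->set, seq->bytes[i]);
    }
    return seq->cache;
}

/* Any modification throws the cached symbol array away. */
static void symseq_set_char(SymbolSequence *seq, size_t index, char c) {
    assert(index < seq->length);
    seq->bytes[index] = c;
    free(seq->cache);
    seq->cache = NULL;
}

static void symseq_free(SymbolSequence *seq) {
    if (seq) { free(seq->cache); free(seq->bytes); free(seq); }
}
```

Repeated calls to `symseq_symbols` on an unmodified sequence return the same cached array; after `symseq_set_char` the next call rebuilds it.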

* The public interface could probably also have a method to return the
array of chars, wrapped in an autoreleased object. This is very easy,
e.g. by creating an autoreleased NSData populated with a copy of the
sequence bytes (and returning either the char * or the NSData itself).
The copy of the bytes (necessary for mutable sequences) will be fast,
much faster than copying the NSArray (with all the useless
retain/release of the singleton BCSymbols). So we don't have to worry
about the issue of returning the internal array used by the sequence
when the sequence is mutable (we only have mutable sequences at this
point, but I plan to add immutable ones; I know, I am obsessed with
that issue).
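
The copy-on-return idea can be sketched in plain C as follows. This is an illustration only: `ByteSequence` and its functions are hypothetical, and where Cocoa would hand back an autoreleased NSData, this C version returns a malloc'ed buffer that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mutable sequence that owns its char buffer. */
typedef struct {
    char  *bytes;
    size_t length;
} ByteSequence;

static ByteSequence *byteseq_create(const char *chars) {
    ByteSequence *seq = malloc(sizeof *seq);
    if (seq == NULL) return NULL;
    seq->length = strlen(chars);
    seq->bytes = malloc(seq->length + 1);
    if (seq->bytes == NULL) { free(seq); return NULL; }
    memcpy(seq->bytes, chars, seq->length + 1);
    return seq;
}

/* Return a NUL-terminated copy of the bytes, never the internal buffer.
   One memcpy, no per-element work. The caller owns the result and frees it. */
static char *byteseq_copy_bytes(const ByteSequence *seq) {
    assert(seq != NULL);
    char *copy = malloc(seq->length + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, seq->bytes, seq->length);
    copy[seq->length] = '\0';
    return copy;
}

static void byteseq_free(ByteSequence *seq) {
    if (seq) { free(seq->bytes); free(seq); }
}
```

Because the caller only ever sees a copy, later changes to the mutable sequence cannot alter bytes that were already handed out.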

On Jul 2, 2005, at 8:45 AM, Philipp Seibel wrote:

> Hi all,
>
> I want to start the discussion on the mailing list that we already
> started at the WWDC.
> In my point of view the BioCocoa project needs to get a modular and  
> flexible structure. The attached pdf shows my suggestion of the  
> possible new structure.
> The next thing we have to discuss is the implementation of the  
> datastructures in the BCFoundation framework. Our WWDC discussion
> led to a new string based sequence structure.
> I think we should spend quite some time to plan the future  
> structure of BioCocoa and stop implementation until the new  
> structure is decided. We all want a 1.0 version of the framework  
> and there are at least two persons from the wwdc, who want to use  
> BioCocoa in their projects, so we should go for it. :-) (i should  
> teach professional motivation practices :-)).
>
> The discussion is open .......
>
> BTW: I already started the BCParser.framework mentioned in the  
> attached document. I think of a very flexible highlevel parser  
> framework with event driven parsers like NSXMLParser.
> This allows easy implementation of various file formats for  
> different datastructures. Not everybody is satisfied with a BioCocoa  
> sequence and wants to have his own structure, the parser api allows  
> to parse the files into any datastructure, and of course also into  
> our future BCFoundation structures. The api is based on the c++  
> boost-spirit parser apis and is developed as objective-c++  
> framework, without any dynamic linking dependencies. Just tell me  
> what you think about it ....
>
> cheers,
>
> Phil
>
>
> <BCFrameworks.pdf>

--
Xgrid-at-Stanford
Help science move fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford

Charles Parnot
charles.parnot at gmail.com