[Biococoa-dev] New Structure for BioCocoa
charles.parnot at gmail.com
Sat Jul 2 19:39:45 EDT 2005
I like the parser idea, particularly if it is already written by you ;-)
I won't be of any help with C++, though!
The structure you outline looks fine to me, and I am not sure why we
should stop implementing stuff now. Clearly, if we agree to use a
parser, we should not write code for the IO until it is ready (though
the best way to test the parser is to use it, so the IO would probably
grow at the same time as the parser). But the modifications to the
sequence structure can be implemented now. I think we should simply
define goals, have everybody make clear what they want to
contribute, and have several lines of development that do not
depend too much on each other. Here is a possible roadmap, made up
in 5 minutes (needs some refinement!):
* get the IO to work (at least read sequences)
* modify the sequence structure (see below) and make sure we have
some methods that the parser can use to create the sequence
(the internals of BCSequence should be encapsulated as much as
possible and not accessed directly by the parser)
* get the annotations up and running; the annotation issue should not
prevent the IO from being implemented; in a first phase, the IO can
parse the annotations but not use them; classes and methods to
manipulate annotations can be later added to the sequence object, and
the parser modified to add these calls.
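To make the point "the parser calls methods on the sequence, never its internals" concrete, here is a minimal sketch in plain C, standing in for the Objective-C classes. It assumes an event-driven design in the spirit of NSXMLParser, with three callbacks; `ParserCallbacks`, `parse_fasta`, and `SeqBuilder` are hypothetical names, not BioCocoa or BCParser API, and the FASTA handling is deliberately simplistic (headers are reported but ignored, as annotations would be in a first phase).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event callbacks, loosely modeled on NSXMLParser's delegate
 * style; none of these names are BioCocoa or BCParser API. */
typedef struct {
    void (*did_start_record)(void *ctx, const char *header, size_t len);
    void (*did_read_residues)(void *ctx, const char *chars, size_t len);
    void (*did_end_record)(void *ctx);
} ParserCallbacks;

/* Reads FASTA-like text line by line and reports events; it never touches
 * the consumer's storage. Returns the number of records seen. */
static int parse_fasta(const char *text, const ParserCallbacks *cb, void *ctx)
{
    assert(text != NULL && cb != NULL);
    int records = 0;
    int in_record = 0;
    const char *p = text;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        if (len > 0 && p[0] == '>') {
            if (in_record)
                cb->did_end_record(ctx);       /* close the previous record */
            cb->did_start_record(ctx, p + 1, len - 1);
            in_record = 1;
            records++;
        } else if (len > 0 && in_record) {
            cb->did_read_residues(ctx, p, len);
        }
        p = eol ? eol + 1 : p + len;           /* advance to the next line */
    }
    if (in_record)
        cb->did_end_record(ctx);
    return records;
}

/* Hypothetical consumer standing in for a sequence object: it only offers
 * "start", "append residues", and "finish", so its storage stays private.
 * It keeps the most recent record only. */
typedef struct {
    char residues[256];
    size_t length;
    int finished;
} SeqBuilder;

static void builder_start(void *ctx, const char *header, size_t len)
{
    SeqBuilder *b = ctx;
    (void)header;                              /* header (annotation) ignored */
    (void)len;
    b->length = 0;
    b->residues[0] = '\0';
    b->finished = 0;
}

static void builder_append(void *ctx, const char *chars, size_t len)
{
    SeqBuilder *b = ctx;
    size_t room = sizeof b->residues - 1 - b->length;
    if (len > room)
        len = room;                            /* toy buffer: truncate */
    memcpy(b->residues + b->length, chars, len);
    b->length += len;
    b->residues[b->length] = '\0';
}

static void builder_end(void *ctx)
{
    SeqBuilder *b = ctx;
    b->finished = 1;
}
```

Because the parser only reports events, the same `parse_fasta` could feed a BCSequence, a user's own structure, or a test double without modification.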
Now, Koen rightly complained that he did not get a report of the WWDC
meeting (nor did the others who were absent). Here is a
(complete?) list of the decisions/discussions we had:
* change the internal structure of the sequence string in BCSequence
* think about annotations
* look at the internals of NSAttributedString in GNUstep to see how
the attributes are handled, because the structure of NSAttributedString
is very similar to sequence annotations
* probably not worry about performance issues with annotations;
manipulating annotations will not happen that often, mostly when
modifying a sequence or generating a subsequence; the bottom line
is that we can probably stick to NSMutableDictionary (I discussed that
in a previous email)
* still think even more about annotations
* better define the purpose of BioCocoa, and the programmer niche we
are trying to target (the niche is probably us, at this point!)
* write some code
Regarding the sequence structure Phil mentions, I will try to explain
it now for those of us who were not part of the discussion.
Replace the NSArray of BCSymbols with a char[]:
* The sequence will be stored internally as an array of char, which
will make the performance discussions moot. A lot of the sequence
manipulations are particularly easy to handle as strings. I don't
know if we have decided to use an NSMutableData ivar, or do the
malloc ourselves. Using NSData is probably a better idea, as it will
already be optimized for
* The public interface will expose arrays of BCSymbols. Because a
BCSequence always has a BCSymbolSet associated with it, it is easy to
convert between chars and BCSymbol objects on demand. All the methods
for that are already available. The NSArray can even be cached (and
rebuilt on demand after the sequence is modified).
* The public interface could probably also have a method returning the
array of chars as an autoreleased object. This is very easy, e.g. by
creating an autoreleased NSData populated with a copy of the
sequence bytes (and returning either the char * or the NSData itself).
The copy of the bytes (necessary for mutable sequences) will be fast,
much faster than copying the NSArray (with all the useless
retain/release calls on the singleton BCSymbols). So we don't have to
worry about returning the internal array used by the sequence when
the sequence is mutable (we only have mutable sequences at this
point, but I plan to add immutable ones, I know, I am obsessed with
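The storage scheme described in the list above can be sketched in plain C as a stand-in for the Objective-C version: residues live in a char buffer, symbols are produced on demand through a symbol-set lookup table, and callers get a copy of the bytes rather than the internal buffer. All names (`Symbol`, `SymbolSet`, `Sequence`, `sequence_copy_chars`, etc.) are hypothetical; in BioCocoa the table would be a BCSymbolSet, the symbols BCSymbol singletons, and the copy an autoreleased NSData. This is an illustration under those assumptions, not an agreed design.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: a BCSymbol-like shared object and a
 * BCSymbolSet-like table mapping each char code to its symbol. */
typedef struct {
    char code;
    const char *name;
} Symbol;

typedef struct {
    const Symbol *by_code[256];   /* NULL = char not in this alphabet */
} SymbolSet;

static const Symbol kA = {'A', "adenine"};
static const Symbol kC = {'C', "cytosine"};
static const Symbol kG = {'G', "guanine"};
static const Symbol kT = {'T', "thymine"};

static void dna_symbol_set_init(SymbolSet *set)
{
    static const SymbolSet empty = {{0}};   /* all entries NULL */
    *set = empty;
    set->by_code['A'] = &kA;
    set->by_code['C'] = &kC;
    set->by_code['G'] = &kG;
    set->by_code['T'] = &kT;
}

/* Hypothetical sequence: one char per residue, plus its symbol set. */
typedef struct {
    char *bytes;
    size_t length;
    const SymbolSet *symbols;
} Sequence;

/* Copies the chars in; returns 0 if any char is not in the symbol set. */
static int sequence_init(Sequence *seq, const char *chars, size_t len,
                         const SymbolSet *set)
{
    for (size_t i = 0; i < len; i++)
        if (set->by_code[(unsigned char)chars[i]] == NULL)
            return 0;
    seq->bytes = malloc(len ? len : 1);
    if (seq->bytes == NULL)
        return 0;
    memcpy(seq->bytes, chars, len);
    seq->length = len;
    seq->symbols = set;
    return 1;
}

/* Public symbol view, converted on demand from the stored char. */
static const Symbol *sequence_symbol_at(const Sequence *seq, size_t i)
{
    assert(i < seq->length);
    return seq->symbols->by_code[(unsigned char)seq->bytes[i]];
}

/* Mutation with validation against the symbol set. */
static int sequence_set_char(Sequence *seq, size_t i, char c)
{
    assert(i < seq->length);
    if (seq->symbols->by_code[(unsigned char)c] == NULL)
        return 0;                          /* not in the alphabet: reject */
    seq->bytes[i] = c;
    return 1;
}

/* Caller-owned, NUL-terminated copy of the raw chars (the role played by
 * the autoreleased NSData); later mutations do not affect it. free() it. */
static char *sequence_copy_chars(const Sequence *seq)
{
    char *copy = malloc(seq->length + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, seq->bytes, seq->length);
    copy[seq->length] = '\0';
    return copy;
}

static void sequence_free(Sequence *seq)
{
    free(seq->bytes);
    seq->bytes = NULL;
    seq->length = 0;
}
```

Caching the derived symbol array, as suggested above, would add a cache that `sequence_set_char` invalidates; it is left out here to keep the sketch short.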
On Jul 2, 2005, at 8:45 AM, Philipp Seibel wrote:
> Hi all,
> I want to bring to the mailing list the discussion we already
> started at WWDC.
> In my view, the BioCocoa project needs a modular and
> flexible structure. The attached PDF shows my suggestion for a
> possible new structure.
> The next thing we have to discuss is the implementation of the
> data structures in the BCFoundation framework. Our WWDC discussion
> led to a new string-based sequence structure.
> I think we should spend quite some time planning the future
> structure of BioCocoa and stop implementation until the new
> structure is decided. We all want a 1.0 version of the framework,
> and at least two people from WWDC want to use
> BioCocoa in their projects, so we should go for it. :-) (I should
> teach professional motivation practices :-)).
> The discussion is open .......
> BTW: I already started the BCParser.framework mentioned in the
> attached document. I am thinking of a very flexible high-level parser
> framework with event-driven parsers like NSXMLParser.
> This allows easy implementation of various file formats for
> different data structures. Not everybody will be satisfied with a
> BioCocoa sequence; some will want their own structure, and the parser
> API makes it possible to parse files into any data structure, including
> our future BCFoundation structures. The API is based on the C++
> Boost.Spirit parser APIs and is developed as an Objective-C++
> framework, without any dynamic linking dependencies. Just tell me
> what you think about it ....
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org