[Biococoa-dev] SequenceIO

Charles Parnot charles.parnot at gmail.com
Wed Jun 29 18:35:18 EDT 2005


> Changing the code was not that difficult and I will commit the
> files soon, so everyone can see what is going on. That being said,
> I am running into the following problem. Some file formats have many
> lines with annotations, e.g. the test2.txt file in the Translation
> example. As you can see, some lines have the same identifier (DT,
> OC, etc.). If I use that as the key, the final dictionary will only
> contain the last line, because it will override existing keys. I
> can think of a few solutions. The first, which is what I do now, is
> to append the values to the existing one, leaving only one entry per
> identifier. This works fine, but could cause problems if we want to
> write the files out, because we don't know where the different
> lines begin and end. We could of course put some kind of marker
> in between the strings, so we know where each next one begins.
> Another solution could be to assign numbers to identifiers with
> multiple lines: ID1, ID2, ID3, etc. The problem here is that this
> will cause problems when searching for a specific key. My preference
> for now would be the first solution, but if anyone has a better
> suggestion, please shout.

Yes, concatenating all the lines, separated by a newline, seems very
reasonable, and easy to revert. You can use
'componentsJoinedByString:' and 'componentsSeparatedByString:', using
@"\n" as the separator (...or @"\r"???).
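
A minimal sketch of that round trip, assuming each annotation value comes from a single line (so it never contains "\n" itself). The helper names AppendAnnotationLine and AnnotationLines are hypothetical and not part of BioCocoa; only the Foundation calls are real.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: store a value under an annotation key. If the key
// already exists, append the new value after a "\n" instead of overwriting.
static void AppendAnnotationLine(NSMutableDictionary *annotations,
                                 NSString *key, NSString *value)
{
    NSString *existing = [annotations objectForKey:key];
    if (existing == nil) {
        [annotations setObject:value forKey:key];
    } else {
        NSArray *parts = [NSArray arrayWithObjects:existing, value, nil];
        [annotations setObject:[parts componentsJoinedByString:@"\n"]
                        forKey:key];
    }
}

// Hypothetical helper: recover the original lines for one key, e.g. when
// writing the file back out. Returns nil if the key is absent.
static NSArray *AnnotationLines(NSDictionary *annotations, NSString *key)
{
    return [[annotations objectForKey:key]
               componentsSeparatedByString:@"\n"];
}
```

Searching by key still works as before, since every identifier maps to exactly one entry; only the writer needs to split the value again.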


> Another issue is nested annotations. Again, see the test2.txt file
> and look for RN (for reference). It is followed by a set of
> identifiers for that reference, and then by another
> reference. I guess I could put the subannotations in a new
> dictionary, and put those in the content of the RN annotation. A
> similar issue can be found in NCBI files (see test4.txt).

Nested annotations are a big issue, particularly regarding sequence  
position. We have to come up with something good...
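
For discussion only, here is a rough sketch of the "dictionary per reference" idea from the quoted message. It assumes the reference lines have already been parsed into [key, value] pairs and that every occurrence of the start key (RN here) opens a new group; the function name GroupAnnotations is hypothetical. It does not address sequence positions, and a sub-key repeated within one group would be overwritten.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turn a flat list of [key, value] pairs into an array
// of dictionaries, one per nested annotation. A pair whose key equals
// startKey (e.g. @"RN") begins a new dictionary; later pairs are added to it.
static NSArray *GroupAnnotations(NSArray *pairs, NSString *startKey)
{
    NSMutableArray *groups = [NSMutableArray array];
    NSMutableDictionary *current = nil;
    NSUInteger i;
    for (i = 0; i < [pairs count]; i++) {
        NSArray *pair = [pairs objectAtIndex:i];
        NSString *key = [pair objectAtIndex:0];
        NSString *value = [pair objectAtIndex:1];
        if ([key isEqualToString:startKey]) {
            current = [NSMutableDictionary dictionary];
            [groups addObject:current];
        }
        // Pairs seen before the first startKey are silently skipped,
        // because current is still nil and messaging nil does nothing.
        [current setObject:value forKey:key];
    }
    return groups;
}
```

The resulting array could then be stored as the content of the RN annotation, which keeps the order of the references intact.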

Thanks, Koen, for all the work!


charles

--
Xgrid-at-Stanford
Help science move fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford

Charles Parnot
charles.parnot at gmail.com





