[Biococoa-dev] Even more on sequence formats

Alexander Griekspoor a.griekspoor at nki.nl
Wed Apr 12 03:54:22 EDT 2006


Thanks Koen, an important file format added -cough-!
On a more relevant note, reading the binary file formats now works on
Intel as well. Here are the updated methods:

- (NSDictionary *)readStriderFile:(NSString *)textFile{
	
	/*
	 Binary file format, read in header, determine features and sequence
	 -> create dictionary.
	 */
	
	STRIDER_HEADER *signature;

	NSMutableDictionary *matrixDictionary = [NSMutableDictionary dictionary];
	NSMutableDictionary *striderDictionary = [NSMutableDictionary dictionary];
	NSMutableArray *itemArray = [NSMutableArray arrayWithCapacity:10];

	NSData *data = [NSData dataWithContentsOfFile: textFile];
	
	// Bail out if the file could not be read or is too short to hold the header
	if (data == nil || [data length] < sizeof(STRIDER_HEADER)) return nil;

	// Memory alloc and read in struct
	signature = malloc(sizeof(STRIDER_HEADER));
	[data getBytes: signature length: sizeof(STRIDER_HEADER)];
	
	// Sequence (the length is stored big-endian, so swap it to host byte order)
	NSData *seqdata = [data subdataWithRange:
		NSMakeRange(sizeof(STRIDER_HEADER), CFSwapInt32BigToHost(signature->nLength))];
	NSString *sequence = [[NSString alloc] initWithBytes: [seqdata bytes]
		length: [seqdata length] encoding: NSASCIIStringEncoding];
	NSString *filename = [[textFile lastPathComponent] stringByDeletingPathExtension];
	
	[matrixDictionary setObject:sequence forKey:filename];
	[itemArray addObject: filename];
	[sequence release];

	// Comments (swap before testing, so the check sees the real length on Intel
	// even if the header field is declared as a signed type)
	if(CFSwapInt32BigToHost(signature->com_length) > 0){
		NSData *comdata = [data subdataWithRange:
			NSMakeRange([data length] - CFSwapInt32BigToHost(signature->com_length),
			            CFSwapInt32BigToHost(signature->com_length))];
		NSString *comments = [[NSString alloc] initWithBytes: [comdata bytes]
			length: [comdata length] encoding: NSASCIIStringEncoding];
		[striderDictionary setObject:comments forKey:@"comments"];
		[comments release];
	}
	
	[striderDictionary setObject:matrixDictionary forKey:@"matrix"];
	[striderDictionary setObject:itemArray forKey:@"items"];
	[striderDictionary setObject:@"DNAStrider" forKey:@"fileType"];
	
	// Clean up
	free(signature);

	return striderDictionary;
	
}
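
The reader above trusts the lengths in the header when it builds its ranges, and subdataWithRange: raises an exception if a range runs past the end of the data. A small check along these lines could be run first. It is only a sketch in plain C, assuming the layout the code implies (header, then sequence, with the comments at the very end of the file); lengths_fit and its parameter names are made up for illustration and are not part of BioCocoa:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a file of file_len bytes can hold a header of header_len
   bytes, followed by seq_len sequence bytes, with com_len comment bytes
   at the very end of the file; returns 0 otherwise.
   The lengths are expected to be in host byte order already. */
static int lengths_fit(size_t file_len, size_t header_len,
                       uint32_t seq_len, uint32_t com_len)
{
    if (file_len < header_len)
        return 0;                       /* not even a full header */

    size_t remaining = file_len - header_len;
    if (seq_len > remaining)
        return 0;                       /* sequence would run off the end */

    remaining -= seq_len;
    return com_len <= remaining;        /* comments must not overlap the sequence */
}
```

In readStriderFile: this would be called with [data length], sizeof(STRIDER_HEADER) and the two swapped length fields before any subdata is taken.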



- (NSDictionary *)readGCKFile:(NSString *)textFile{
	
	/*
	 Binary file format, read in header, determine features and sequence
	 -> create dictionary.
	 Same as DNA strider but comments are ignored
	 */
	
	GCK_HEADER *signature;
	
	NSMutableDictionary *matrixDictionary = [NSMutableDictionary dictionary];
	NSMutableDictionary *gckDictionary = [NSMutableDictionary dictionary];
	NSMutableArray *itemArray = [NSMutableArray arrayWithCapacity:10];

	NSData *data = [NSData dataWithContentsOfFile: textFile];
	
	// Bail out if the file could not be read or is too short to hold the header
	if (data == nil || [data length] < sizeof(GCK_HEADER)) return nil;

	// Memory alloc and read in struct
	signature = malloc(sizeof(GCK_HEADER));
	[data getBytes: signature length: sizeof(GCK_HEADER)];
	
	// Sequence (the length is stored big-endian, so swap it to host byte order)
	NSData *seqdata = [data subdataWithRange:
		NSMakeRange(sizeof(GCK_HEADER), CFSwapInt32BigToHost(signature->nLength))];
	NSString *sequence = [[NSString alloc] initWithBytes: [seqdata bytes]
		length: [seqdata length] encoding: NSASCIIStringEncoding];
	NSString *filename = [[textFile lastPathComponent] stringByDeletingPathExtension];
	
	[matrixDictionary setObject:sequence forKey:filename];
	[itemArray addObject: filename];
	[sequence release];
	
	[gckDictionary setObject:matrixDictionary forKey:@"matrix"];
	[gckDictionary setObject:itemArray forKey:@"items"];
	[gckDictionary setObject:@"Gene Construction Kit" forKey:@"fileType"];
	
	// Clean up
	free(signature);

	return gckDictionary;
	
}
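
A side note on the CFSwapInt32BigToHost calls above: the length fields in these files are stored big-endian, so on Intel the bytes have to be reordered before they can be used as a number. Below is a minimal portable C sketch of the same idea, decoding the bytes by hand instead of calling CoreFoundation. read_be32 is a hypothetical helper name used only for illustration:

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from four raw file bytes.
   The result is the same on PowerPC and Intel because the bytes are
   combined arithmetically and never reinterpreted as a native integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

Given the four bytes 00 00 01 02 this yields 258 on either architecture, whereas copying them straight into an int on Intel would give a very different value.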


Cheers,
Alex




On 12-apr-2006, at 1:02, Charles Parnot wrote:

> I got you on this one, Koen :-)
>
> btw, great work. I see all these entries in the BioCocoa svn RSS  
> feed in NetNewsWire, and I am amazed!
>
> charles
>
>
>> On Apr 11, 2006, at 6:19 PM, Alexander Griekspoor wrote:
>>
>>> Hi Koen,
>>>
>>> The format is explained in detail on this page I happened to  
>>> encounter: http://www.mekentosj.com/enzymex
>>> I copied the relevant part below:
>>>
>>> Another sequence format?
>>> Not really. The files EnzymeX creates look like normal files, but  
>>> right-click and open their contents and you will see that they  
>>> consist of a simple FASTA file and a file in which EnzymeX stores  
>>> its preferences. Send a file to someone who doesn't have EnzymeX  
>>> or to a Windows user and they will simply see a folder with a  
>>> FASTA file. No problem! If you have any questions about the exDNA  
>>> file "format", don't hesitate to contact us.
>>
>>
>> Hehehehehe :)
>>
>> - Koen.
>
> --
> Xgrid-at-Stanford
> Help science move fast forward:
> http://cmgm.stanford.edu/~cparnot/xgrid-stanford
>
> Charles Parnot
> charles.parnot at gmail.com
>
>
>
>

*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                     Tel:  + 31 20 - 512 2023
                     Fax:  + 31 20 - 512 2029
                     AIM: mekentosj at mac.com
                     E-mail: a.griekspoor at nki.nl
                 Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-
bit company without 1 bit of sense.

*********************************************************

