[Biococoa-dev] ultraconserved sarray info

Scott Christley schristley at mac.com
Wed Jul 16 21:18:16 EDT 2008


Hello Paolo,

When I originally designed the programs, my thinking was similar:
combine all of the suffix arrays together.  Unfortunately this is not
very efficient.  It takes a long time to combine the suffix arrays, and
it produces a very large file, as you can imagine.  Also, one of my
design goals was to be able to split the work across multiple machines,
and with a single file you are limited to just one machine.  So what I
did instead was to union the results of the mcp_sarray runs; that is
what the union_mcp command does.

Later, though, I did find a need to combine suffix arrays, but only
programmatically; I never wrote a command-line program for it.  That
code uses BCSuffixArrayUnionEnumerator.

Anyway, what I did in my work was write a shell script that loops
through all the chromosome files, such as this:


# i is mouse chromosomes
# j is human chromosomes
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y; do
    for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        mcp_sarray N human_chr${j} mouse_chr${i} h${j}_m${i} 40
    done
done
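
If you want to split the work up, one option is to print the commands
into a job list first rather than running them directly.  This is only a
minimal sketch and not the setup I used on the cluster: MOUSE, HUMAN,
make_jobs and jobs.txt are names made up for the example, and the
mcp_sarray arguments are copied from the loop above.  The list has
21 x 24 = 504 lines, one per chromosome pair.

```shell
#!/bin/sh
# Sketch only: print one mcp_sarray command per chromosome pair instead
# of running it, so the list can be divided among machines or cores.
# MOUSE, HUMAN and make_jobs are hypothetical names for this example.
MOUSE="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y"
HUMAN="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y"

make_jobs() {
    for i in $MOUSE; do
        for j in $HUMAN; do
            echo "mcp_sarray N human_chr${j} mouse_chr${i} h${j}_m${i} 40"
        done
    done
}

# Possible uses (left commented out so nothing runs by accident):
#   make_jobs > jobs.txt
#   split -l 63 jobs.txt jobs_part_          # 8 pieces, e.g. one per machine
#   xargs -P 4 -I CMD sh -c CMD < jobs.txt   # 4 jobs at a time on one machine
```

The -P option of xargs is an extension found in the GNU and BSD
versions, not in plain POSIX, so check that your system has it.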


I actually did something a little more complicated than this because I
distributed the work across a cluster of machines.  Either way, this
produces result files named hA_mB, where A is the human chromosome and
B is the mouse chromosome; you can then use another script to combine
all of them into a single result.


# i is mouse chromosomes
# j is human chromosomes
# hm_${L} is the previous union result, hm_${K} is the next one

# mouse chromosome 1: start with the first two files, then add the rest
union_mcp h1_m1 h2_m1 hm_1
L=1
K=2
for j in 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
    union_mcp hm_${L} h${j}_m1 hm_${K}
    K=$((K + 1))
    L=$((L + 1))
done

# remaining mouse chromosomes
for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y; do
    for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        union_mcp hm_${L} h${j}_m${i} hm_${K}
        K=$((K + 1))
        L=$((L + 1))
    done
done


This produces files hm_K where K is an increasing number.  The file  
with the largest K at the end is your final result file.
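
To check the file naming before running anything, the same chain can be
printed as a dry run.  This is just a sketch under the same assumptions
as the scripts above; MOUSE, HUMAN and union_chain are names made up for
the example.  It folds every hA_mB file into a running union with a
single loop and prints the union_mcp commands instead of executing them.

```shell
#!/bin/sh
# Dry run of the union chain: prints the commands, does not execute them.
# MOUSE, HUMAN and union_chain are hypothetical names for this example.
MOUSE="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y"
HUMAN="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y"

union_chain() {
    prev=""   # file holding the running union so far
    k=0       # number of unions issued so far
    for i in $MOUSE; do
        for j in $HUMAN; do
            cur="h${j}_m${i}"
            if [ -z "$prev" ]; then
                prev="$cur"      # first file: nothing to union with yet
            else
                k=$((k + 1))
                echo "union_mcp ${prev} ${cur} hm_${k}"
                prev="hm_${k}"
            fi
        done
    done
}

# Show the last command of the chain; its third argument is the final file.
union_chain | tail -n 1
# prints: union_mcp hm_502 hY_mY hm_503
```

With 504 input files there are 503 unions, so the final result lands in
hm_503.  Piping the output into sh (union_chain | sh) would run the
commands in order.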


cheers
Scott


On Jul 16, 2008, at 2:40 PM, paolo siniselli wrote:

> Hi Scott,
> How can I combine all the suffix array files for an organism into a
> single suffix array, so that mcp_sarray is run just once?
> I'd like to compare only two sarray files for each two-organism
> comparison, such as human vs mouse...
> thanks
>
>
>
> --- On Mon 14/7/08, Scott Christley <schristley at mac.com> wrote:
> From: Scott Christley <schristley at mac.com>
> Subject: Re: [Biococoa-dev] Biococoa2 Install issue
> To: paolo_siniselli at yahoo.it
> Cc: "biococoa" <biococoa-dev at bioinformatics.org>
> Date: Monday, 14 July 2008, 23:27
>
> Hello Paolo,
>
> There is an environment configuration script that comes with
> GNUstep; you need to "source" this script to set up the GNUstep
> environment before compiling or running applications.  Maybe
> something like this:
>
> . /usr/share/GNUstep/Makefiles/GNUstep.sh
>
> As a side note, I've heard reports from people that the GNUstep
> which comes with Debian is old and has bugs, so I cannot say how
> well it will work with BioCocoa.
>
> cheers
> Scott
>
>
> On Jul 14, 2008, at 2:06 PM, paolo siniselli wrote:
>
>> I'm trying to install BioCocoa2 on my Linux Debian (Lenny)
>> I've already installed GNUstep
>> So I went into truck/GNUstep folder and I typed "make install" but  
>> I received an error back:
>> GNUmakefile:27: *** You need to run the GNUstep configuration  
>> script before compiling!.  Stop.
>>
>> These are the files in the GNUstep directory:
>> aminoacids.plist  GNUmakefile.postamble  GNUmakefile.subproj
>> GNUmakefile       GNUmakefile.preamble   nucleotides.plist
>>
>> And these are the files in the truck directory:
>> BCAppKit           BioCocoa.xcodeproj  Examples     Publications.txt
>> BCFoundation       Copying.txt         GNUstep      Readme.txt
>> BC.icns            DeveloperDocs       license.txt  Resources
>> BioCocoa_Prefix.h  Documentation       main.c       Tests
>>
>> Thanks
>> Paolo
>>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/biococoa-dev
>
>
