[BiO BB] Seek advice on programming of miRNA problem

Alex Zhang mayagao1999 at yahoo.com
Fri Sep 23 00:15:18 EDT 2005


Dear all,

I am working on miRNA mapping and would appreciate any
suggestions on the problem described below. Thank you
very much in advance.

Sincerely,
   Alex



Basically, 

1.	There are two files to be processed. The first one
is “HuBACmap_OG1_4x4_Bld35v1_Mar8-05”, which contains
the array information about Bacterial Artificial
Chromosomes (BACs). The other one is called “miRNA maps”,
which holds the information about the microRNAs obtained
from the UCSC Genome Center.
2.	The “HuBACmap_OG1_4x4_Bld35v1_Mar8-05” file has 8
columns and 9575 rows. The first row looks like:

ID   Block   Column   Row   Chrom   Start   End   Mid

From the 2nd row to the last row, the information
about each BAC is listed block by block (there are 16
blocks, from Block 1 to Block 16). A snapshot looks
like:

RP11-125P20	 1	18	 1	01	 88031403    88194307   88112855
RP11-72M14	 1	19	 1	01	112869186   112904105 112886645
RP11-130B18	 1	20	 1		NA	NA	NA
RP11-192J8	 1	21	 1	01	118214732   118215022 118214877




The features of this file are:
(1)	The important fields that will be used are:
“ID”, which identifies the individual BAC on each
chromosome.
“Chrom”, which gives the chromosome. There are 23
chromosomes.
“Start” and “End”, which specify the location of each
BAC on its chromosome.
(2)	This file includes some unknown values, which
are designated by “NA”. Any row containing “NA”
should be ignored.
(3)	There are duplicate entries for some BACs. For
example, you may find:
       RP11-192J8	 1	21	 1	01	118214732   118215022   118214877
       RP11-192J8	 2	45	 1	01	118214732   118215022   118214877
       These two rows hold the same information, at
least for ID, Chrom, Start, End and Mid, even though
the values for Block or Column are different. In this
case, the program should recognize the duplicate
information and pick any one of the rows for further
processing.
(4) The length of the site that each BAC occupies
ranges from 100 bp to 350 bp.
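
To make features (2) and (3) concrete, here is a minimal Python sketch
of how the BAC file might be read. It assumes the columns are separated
by tabs or spaces and that "01" and "1" name the same chromosome. The
function name load_bacs is made up for illustration.

```python
import io

def load_bacs(handle):
    """Return {chrom: sorted list of (start, end, bac_id)} from the BAC file."""
    bacs = {}
    seen = set()
    next(handle)  # skip the header row (ID, Block, Column, ...)
    for line in handle:
        fields = line.split()
        # Feature (2): rows with NA (which may also lack a Chrom value) are ignored.
        if len(fields) != 8 or "NA" in fields:
            continue
        bac_id, _block, _column, _row, chrom, start, end, _mid = fields
        # Feature (3): keep only the first of several duplicate rows.
        key = (bac_id, chrom, start, end)
        if key in seen:
            continue
        seen.add(key)
        # Assumption: "01" and "1" name the same chromosome.
        chrom = chrom.lstrip("0")
        bacs.setdefault(chrom, []).append((int(start), int(end), bac_id))
    for rows in bacs.values():
        rows.sort()
    return bacs

# Rows copied from the snapshot and the duplicate example in this post.
sample = io.StringIO(
    "ID\tBlock\tColumn\tRow\tChrom\tStart\tEnd\tMid\n"
    "RP11-125P20\t1\t18\t1\t01\t88031403\t88194307\t88112855\n"
    "RP11-130B18\t1\t20\t1\t\tNA\tNA\tNA\n"
    "RP11-192J8\t1\t21\t1\t01\t118214732\t118215022\t118214877\n"
    "RP11-192J8\t2\t45\t1\t01\t118214732\t118215022\t118214877\n"
)
bacs = load_bacs(sample)
print(bacs)
```

Sorting each chromosome's list by start position is only a convenience
for the later search; it is not required by the file itself.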
3.	The “miRNA maps” file has 10 columns and 325 rows.
The first row looks like:

#bin   chrom   chromStart   chromEnd   name   score   strand   thickStart   thickEnd   type

From the 2nd row to the last row, the information
about each microRNA is listed chromosome by chromosome
(there are 23 chromosomes, from Chr 1 to Chr X). A
snapshot looks like:

      593	chr1	1142406	1142501     hsa-mir-200b	960	+	1142459	1142483	miRna
      593	chr1	1143165	1143255     hsa-mir-200a	960	+	1143180	1143202	miRna




The features of this file are:
(1) The important fields that will be used are:
“chrom”, which identifies the chromosome on which the
microRNA lies.
“name”, which is the unique name of each microRNA.
“thickStart” and “thickEnd”, which specify the site that
the microRNA occupies.
(2) Usually the length of each site is around 22 bp.
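
As a rough illustration of how the fields in (1) could be pulled out,
here is a small Python sketch. It assumes whitespace-separated columns
in the order shown in the header row, and it strips the "chr" prefix so
that the names can later be compared with the "Chrom" column of the BAC
file (that matching rule is an assumption). The function name
load_mirnas is made up.

```python
import io

def load_mirnas(handle):
    """Return a list of (chrom, name, thick_start, thick_end) tuples."""
    mirnas = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # the header row starts with "#bin"
        fields = line.split()
        chrom, name = fields[1], fields[4]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # assumption: "chr1" corresponds to Chrom "01"
        mirnas.append((chrom, name, int(fields[7]), int(fields[8])))
    return mirnas

# Rows copied from the snapshot in this post.
sample_mirna = io.StringIO(
    "#bin\tchrom\tchromStart\tchromEnd\tname\tscore\tstrand\tthickStart\tthickEnd\ttype\n"
    "593\tchr1\t1142406\t1142501\thsa-mir-200b\t960\t+\t1142459\t1142483\tmiRna\n"
    "593\tchr1\t1143165\t1143255\thsa-mir-200a\t960\t+\t1143180\t1143202\tmiRna\n"
)
mirnas = load_mirnas(sample_mirna)
print(mirnas)
```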
4.	The task is to identify, on each chromosome,
the flanking BACs for each microRNA and to determine
which one is closer to the microRNA. To make this
easier to understand, here is a figure:

___________________*      miRNA       *_____________________________________
 *BAC0* *  BAC1*___|                  |____________*  BAC2          *   *  BAC3  *
                D1                          D2
Note: 

"*" marks a boundary of the miRNA or of a BAC.
BAC1 is one of the flanking BACs for the miRNA because
it is closer than BAC0;
BAC2 is the other flanking BAC for the miRNA because it
is closer than BAC3.

After determining the flanking BACs, we will
calculate the distance between each flanking BAC and
the miRNA, which gives D1 and D2. If D1 is greater than
D2, BAC2 is closer to the miRNA than BAC1.
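
The search described above could be sketched in Python roughly as
follows. This is only one possible reading of the task: it assumes the
left flanking BAC is the one whose End is nearest below thickStart, the
right flanking BAC is the one whose Start is nearest above thickEnd,
and D1 and D2 are the gaps between those boundaries. The function name
and the coordinates in the example are made up.

```python
def flanking_bacs(bacs, thick_start, thick_end):
    """bacs: list of (start, end, bac_id) on one chromosome.

    Returns (left_id, d1, right_id, d2, closer_id); an entry is None
    when no BAC exists on that side.
    """
    left = right = None
    for start, end, bac_id in bacs:
        # Candidate on the left: ends at or before the miRNA starts.
        if end <= thick_start and (left is None or end > left[1]):
            left = (start, end, bac_id)
        # Candidate on the right: starts at or after the miRNA ends.
        if start >= thick_end and (right is None or start < right[0]):
            right = (start, end, bac_id)
    d1 = thick_start - left[1] if left else None
    d2 = right[0] - thick_end if right else None
    if left and right:
        closer = left[2] if d1 <= d2 else right[2]
    elif left or right:
        closer = (left or right)[2]
    else:
        closer = None
    return (left[2] if left else None, d1,
            right[2] if right else None, d2, closer)

# Made-up coordinates laid out like the figure: BAC0 BAC1 miRNA BAC2 BAC3.
example_bacs = [(100, 200, "BAC0"), (300, 400, "BAC1"),
                (700, 800, "BAC2"), (900, 1000, "BAC3")]
result = flanking_bacs(example_bacs, 450, 472)
print(result)
```

A plain scan over all BACs on the chromosome is used here for
simplicity; with 9575 BAC rows and 325 miRNAs that should be fast
enough, and a binary search over the sorted list could replace it if
needed.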

Then we will end up with a list like:

Chrom   name   ID1   ID2   ID3   D1   D2   thickStart   thickEnd

“Chrom” is the chromosome;
“name” is the name of the miRNA;
“ID1” is the left flanking BAC of the miRNA;
“ID2” is the right flanking BAC of the miRNA;
“ID3” is the flanking BAC that is closer to the miRNA;
“D1” is the distance between the left flanking BAC
and the miRNA;
“D2” is the distance between the right flanking BAC
and the miRNA;
“thickStart” and “thickEnd” specify the site that the
microRNA occupies.
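
For the output list, a tab-separated file with the columns above might
be written like this in Python. The function name write_results is made
up, and the single data row uses placeholder BAC names and distances
purely to show the layout; it is not a real result.

```python
import csv
import io

COLUMNS = ["Chrom", "name", "ID1", "ID2", "ID3",
           "D1", "D2", "thickStart", "thickEnd"]

def write_results(rows, handle):
    """Write the header and one tab-separated line per miRNA."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)

# Placeholder row: the BAC IDs and distances are invented for illustration.
out = io.StringIO()
write_results(
    [("1", "hsa-mir-200b", "BAC1", "BAC2", "BAC1", 50, 228, 1142459, 1142483)],
    out,
)
print(out.getvalue())
```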

5. We can also imagine some special cases that might
exist. For example, there is a possibility that a miRNA
shares a common region with its flanking BAC, which
looks like:
 
_________________*   miRNA   *____________________
                 *    BAC             *
In this case we simply say that this BAC is a flanking
one for this miRNA.
There is also the possibility that a miRNA lies
within a BAC, which looks like:
 
_________________*   miRNA   *____________________
                 *          BAC                                *
This may happen because the length of a miRNA is
around 22 bp and the length of a BAC is more than 100 bp.

In this case, we will report that this miRNA has
only one BAC and that the distance is 0. For example:

Chrom   name          ID1    ID2    ID3    D1   D2   thickStart   thickEnd
1       hsa-mir-200   RP12   RP12   RP12   0    0    116677       116699
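
The two special cases could be told apart with a small check like the
Python sketch below. The function names are made up, and the BAC
coordinates in the example are invented, since the example row above
gives only the miRNA position. Only the "miRNA within a BAC" case is
turned into an output row here, because that is the only case for which
the distances are stated (both 0).

```python
def relation(bac_start, bac_end, thick_start, thick_end):
    """Classify how one BAC lies relative to one miRNA site."""
    if bac_start <= thick_start and thick_end <= bac_end:
        return "contains"   # the miRNA is within the BAC
    if bac_start < thick_end and thick_start < bac_end:
        return "overlaps"   # the two share a common region
    return "separate"

def contained_row(chrom, name, bac_id, thick_start, thick_end):
    """Output row for a miRNA within a BAC: one BAC, both distances 0."""
    return (chrom, name, bac_id, bac_id, bac_id, 0, 0, thick_start, thick_end)

# Hypothetical BAC coordinates (116600 to 116900) around the example miRNA.
kind = relation(116600, 116900, 116677, 116699)
row = contained_row("1", "hsa-mir-200", "RP12", 116677, 116699)
print(kind, row)
```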

		