[BiO BB] Seek advice on programming of miRNA problem
Alex Zhang
mayagao1999 at yahoo.com
Fri Sep 23 00:15:18 EDT 2005
Dear all,
I am working on a miRNA mapping problem and would
appreciate your advice on how to program it. Thank you
very much in advance.
Sincerely,
Alex
Basically,
1. There are two files to be processed. The first one
is HuBACmap_OG1_4x4_Bld35v1_Mar8-05, which contains
the array information about Bacterial Artificial
Chromosomes (BACs). The other one is called miRNA maps,
which holds the information about the microRNAs obtained
from the UCSC Genome Center.
2. The HuBACmap_OG1_4x4_Bld35v1_Mar8-05 file has 8
columns and 9575 rows. The first row looks like:
ID Block Column Row Chrom Start End Mid
From the 2nd row to the last row, the information
about each BAC is listed block by block (there are 16
blocks, from Block 1 to Block 16). A snapshot looks
like:
RP11-125P20 1 18 1 01 88031403 88194307 88112855
RP11-72M14 1 19 1 01 112869186 112904105 112886645
RP11-130B18 1 20 1 NA NA NA
RP11-192J8 1 21 1 01 118214732 118215022 118214877
The features of this file are:
(1) The columns that will be used are:
ID, which identifies the individual BAC.
Chrom, which gives the chromosome the BAC lies on.
There are 23 chromosomes.
Start and End, which specify the location of each
BAC on its chromosome.
(2) This file includes some unknown values, which
are designated by NA. Any row that contains NA
should be ignored.
(3) Some BACs appear more than once. For
example, you may find:
RP11-192J8 1 21 1 01 118214732 118215022 118214877
RP11-192J8 2 45 1 01 118214732 118215022 118214877
These two rows hold the same information for ID,
Chrom, Start, End and Mid even though the values for
Block or Column are different. In this case, the
program should recognize the duplicate and keep any
one of the rows for further processing.
(4) The length of the site that each BAC occupies
ranges from 100 bp to 350 bp.
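
One possible way to handle features (2) and (3) when reading the BAC file
is sketched below. It is only a sketch under assumptions that the post does
not confirm: the file is taken to be whitespace-delimited with the header
shown above, and the function name read_bacs is made up for illustration.

```python
import io

def read_bacs(handle):
    """Return {chrom: sorted list of (start, end, bac_id)} from a BAC map.

    Assumes whitespace-delimited columns in the order
    ID Block Column Row Chrom Start End Mid (an assumption).
    """
    seen = set()
    by_chrom = {}
    next(handle, None)  # skip the header row
    for line in handle:
        fields = line.split()
        # Feature (2): rows with unknown positions carry NA; skip them.
        if len(fields) < 7 or "NA" in fields[4:7]:
            continue
        bac_id, chrom = fields[0], fields[4]
        start, end = int(fields[5]), int(fields[6])
        # Feature (3): the same BAC may be listed in several blocks/columns.
        key = (bac_id, chrom, start, end)
        if key in seen:
            continue
        seen.add(key)
        by_chrom.setdefault(chrom, []).append((start, end, bac_id))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom

# Rows copied from the snapshot in the post.
sample = io.StringIO(
    "ID Block Column Row Chrom Start End Mid\n"
    "RP11-125P20 1 18 1 01 88031403 88194307 88112855\n"
    "RP11-130B18 1 20 1 NA NA NA\n"
    "RP11-192J8 1 21 1 01 118214732 118215022 118214877\n"
    "RP11-192J8 2 45 1 01 118214732 118215022 118214877\n"
)
bacs = read_bacs(sample)
print(bacs)
```

Grouping the BACs by chromosome and sorting them by Start is a design
choice that makes the later search for neighbours easier; it is not
required by the file itself.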
3. The miRNA maps file has 10 columns and 325 rows.
The first row looks like:
#bin chrom chromStart chromEnd name score strand thickStart thickEnd type
From the 2nd row to the last row, the information
about each microRNA is listed chromosome by chromosome
(there are 23 chromosomes, from Chr 1 to Chr X). A
snapshot looks like:
593 chr1 1142406 1142501 hsa-mir-200b 960 + 1142459 1142483 miRna
593 chr1 1143165 1143255 hsa-mir-200a 960 + 1143180 1143202 miRna
The features of this file are:
(1) The columns that will be used are:
chrom, which identifies the chromosome the microRNA
lies on.
name, which is the unique name of each microRNA.
thickStart and thickEnd, which specify the site the
microRNA occupies.
(2) Usually the length of each site is around 22 bases.
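
Reading the miRNA file could look like the sketch below. It assumes
whitespace-delimited columns in the order of the header shown above. It
also assumes that "chr1" in this file corresponds to "01" in the BAC file,
so numeric chromosome names are zero-padded; how the BAC file labels
chromosome X is not shown in the post, so "X" is left unchanged. The
function name read_mirnas is made up for illustration.

```python
import io

def read_mirnas(handle):
    """Yield (chrom, name, thick_start, thick_end) for each data row."""
    for line in handle:
        # The header row starts with "#bin"; also skip blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        chrom = fields[1]
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        # Assumption: the BAC file writes chromosome 1 as "01".
        if chrom.isdigit():
            chrom = chrom.zfill(2)
        yield chrom, fields[4], int(fields[7]), int(fields[8])

# Rows copied from the snapshot in the post.
sample = io.StringIO(
    "#bin chrom chromStart chromEnd name score strand thickStart thickEnd type\n"
    "593 chr1 1142406 1142501 hsa-mir-200b 960 + 1142459 1142483 miRna\n"
    "593 chr1 1143165 1143255 hsa-mir-200a 960 + 1143180 1143202 miRna\n"
)
mirnas = list(read_mirnas(sample))
print(mirnas)
```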
4. The task is to identify, on each chromosome,
the two BACs flanking each microRNA and to determine
which of the two is closer to the microRNA. The
figure below illustrates this:
_________________* miRNA *_____________________________________
*BAC0*  * BAC1 *___|       |____________* BAC2 *  * BAC3 *
                 D1             D2
Note:
"*" marks the boundary of a miRNA or a BAC.
BAC1 is the left flanking BAC for the miRNA because it
is closer to the miRNA than BAC0;
BAC2 is the right flanking BAC for the miRNA because it
is closer to the miRNA than BAC3.
After determining the flanking BACs, we
calculate the distance between each flanking BAC and
the miRNA, which gives D1 and D2. If D1 is greater than
D2, BAC2 is closer to the miRNA than BAC1.
We then end up with a list like:
Chrom name ID1 ID2 ID3 D1 D2 thickStart thickEnd
Chrom is the chromosome;
name is the name of the miRNA;
ID1 is the left flanking BAC of the miRNA;
ID2 is the right flanking BAC of the miRNA;
ID3 is the flanking BAC that is closer to the miRNA;
D1 is the distance between the left flanking BAC
and the miRNA;
D2 is the distance between the right flanking BAC
and the miRNA;
thickStart and thickEnd specify the site which
the microRNA occupies.
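
The search for the flanking BACs and the output row described above could
be sketched as follows. This is one possible approach, not a definitive
one. The BAC and miRNA coordinates in the example are invented for
illustration, the function names are hypothetical, and two choices are
assumptions: a tie between D1 and D2 goes to the left BAC, and a missing
neighbour is reported as NA. BACs that overlap the miRNA are skipped by
this sketch.

```python
def flanking_bacs(bacs, thick_start, thick_end):
    """Return ((left_id, d1), (right_id, d2)); either side may be None.

    bacs is a list of (start, end, bac_id) on one chromosome.
    A plain linear scan is used; with about 9575 BACs and 325 miRNAs
    this stays cheap.
    """
    left = right = None
    for start, end, bac_id in bacs:
        if end <= thick_start:            # BAC lies left of the miRNA
            d = thick_start - end
            if left is None or d < left[1]:
                left = (bac_id, d)
        elif start >= thick_end:          # BAC lies right of the miRNA
            d = start - thick_end
            if right is None or d < right[1]:
                right = (bac_id, d)
    return left, right

def result_row(chrom, name, bacs, thick_start, thick_end):
    """Build (Chrom, name, ID1, ID2, ID3, D1, D2, thickStart, thickEnd)."""
    left, right = flanking_bacs(bacs, thick_start, thick_end)
    id1, d1 = left if left else ("NA", None)
    id2, d2 = right if right else ("NA", None)
    if d1 is None and d2 is None:
        id3 = "NA"
    elif d2 is None or (d1 is not None and d1 <= d2):
        id3 = id1                          # tie goes left (assumption)
    else:
        id3 = id2
    return (chrom, name, id1, id2, id3, d1, d2, thick_start, thick_end)

# Invented coordinates that mimic the figure: BAC0 BAC1 | miRNA | BAC2 BAC3
demo_bacs = [
    (100, 300, "BAC0"),
    (500, 700, "BAC1"),
    (1200, 1400, "BAC2"),
    (1800, 2000, "BAC3"),
]
row = result_row("01", "mir-demo", demo_bacs, 800, 822)
print("\t".join(str(value) for value in row))
```

With these invented numbers D1 is 100 and D2 is 378, so BAC1 is reported
as ID3.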
5. We can also imagine some special cases that might
exist. For example, there is a possibility that a miRNA
shares a common region with its flanking BAC, which
looks like:
_________________* miRNA *____________________
           *     BAC     *
In this case we simply say that this BAC is a flanking
BAC for this miRNA.
Besides, there is another possibility: a miRNA may lie
entirely within a BAC, which looks like:
_________________* miRNA *____________________
          *          BAC          *
This may happen because the length of a miRNA is
around 22 and the length of a BAC is more than 100.
In this case, we report that this miRNA has
only one BAC and that the distance is 0. For example:
Chrom name ID1 ID2 ID3 D1 D2 thickStart thickEnd
1 hsa-mir-200 RP12 RP12 RP12 0 0 116677 116699
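
The two special cases could be detected with a small helper like the one
sketched below. The BAC coordinates used for RP12 are invented, since the
post gives only the miRNA site. Treating a partial overlap as distance 0
is an assumption; the post states the distance only for the case where
the miRNA lies within the BAC. The function names are hypothetical.

```python
def gap(bac_start, bac_end, thick_start, thick_end):
    """Distance between a BAC and a miRNA site; 0 if they touch or overlap."""
    return max(0, thick_start - bac_end, bac_start - thick_end)

def classify(bac_start, bac_end, thick_start, thick_end):
    """Describe where the BAC lies relative to the miRNA site."""
    if bac_start <= thick_start and thick_end <= bac_end:
        return "contains"      # miRNA lies within the BAC
    if bac_start < thick_end and thick_start < bac_end:
        return "overlaps"      # BAC shares a region with the miRNA
    return "left" if bac_end <= thick_start else "right"

# miRNA site from the example row; BAC coordinates are invented.
thick_start, thick_end = 116677, 116699

inside = classify(116600, 116900, thick_start, thick_end)
partial = classify(116600, 116685, thick_start, thick_end)
apart = classify(116000, 116500, thick_start, thick_end)

if inside == "contains":
    # Same BAC in ID1, ID2 and ID3, both distances 0, as in the example row.
    row = ("1", "hsa-mir-200", "RP12", "RP12", "RP12", 0, 0,
           thick_start, thick_end)
    print(" ".join(str(value) for value in row))
```

Checking for containment before the general overlap test keeps the two
cases apart, so the "only one BAC" row is produced only when the miRNA
lies entirely inside the BAC.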