[Biodevelopers] database schema of OMIM.

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Mon Sep 15 22:06:22 EDT 2003


Hello

If someone else has done this you should use their
schema. However, below are some pointers.

I would love to see a repository of RDB schemas 
for common databases. I think a community RDB
iniative would be a big help for data integration
and 'common platform' research. I think a few
public domain 'integrated databases' exist,
but I don't know where or how to get access to them.

Cheers

> 1.2|9|25|01|1pter-p36.13|ENO1, PPH, MPB1|C|Enolase-1,
> alpha||172430|S,
> F, R, REa|||Enolase deficiency (1)| | |4(Eno1)|

 
> 1  - Numbering system, in the format 
> Chromosome.Map_Entry_Number

Either use a float, or split the two numbers and create an INT
column for both.

When it comes to INTs you can save time and space by storing the
smallest kind of int you will need (mysql uses different numbers
of bits to store different sized numbers), I.e. ...

+------+--------+----------+-------------+----------------------+
| tiny | small  | medium   | regular     | big                  |
+------+--------+----------+-------------+----------------------+
|  127 |  32767 |  8388607 |  2147483647 |  9223372036854775807 |
| -128 | -32768 | -8388608 | -2147483648 | -9223372036854775808 |
+------+--------+----------+-------------+----------------------+

These represent the maximum and minimum possible integer size,
which you can double if you specify that you will not use -ve 
integers.

> 2  - Month entered

If this is *always* a number between 1 and 12 you can make uber
savings with the use of an ENUM column, which works just like an
array lookup, allowing you to store months as numbers and display
month names for nice output.

> 3  - Day     "
> 4  - Year    "

Probably a bit too many different values to mess around with ENUM,
so you could put the lot in a DATE column, they are good for dates!

> 5  - Location

Looks variable, best to use VARCHAR(30), and check after insert for
longer entries (can be very fustrating!)


> 6  - Gene Symbol(s)

Looks less variable, consider using a 'fixed width' CHAR column..

I just noticed the one to many... if you are keen you should 
split these up like so...

MAIN TABLE

PRIMARY KEY | DATA...
------------+--------
1.2         | 1pter-p36.13

SUB TABLE

FORIGN KEY  | DATA...
------------+--------
1.2         | ENO1
1.2         | PPH
1.2         | MPB1

Can be a pain in the assets.


> 7  - Gene Status (see below for codes)

Sounds good for an enum!


> 8  - Title

Go for VARCHAR(255) or a TEXT if this isn't long enough...
Look for warnings on insert for signs of data truncation.

> 9  - 

Dont bother. 

> 10 - MIM Number

MEDIUMINT UNSIGNED ?

> 11 - Method (see below for codes)

Maby an enum?
If their are only a few different codes, you could be brave
and try a SET column, which allows you to store one to many
relationships in a singe table column.

I.E.

PRIMARY KEY | DATA...      | METHODS SET
------------+--------------+--------------
1.2         | 1pter-p36.13 | S:F:R:REa

The only contraint on this kind of column is the total number of
methods.


> 12 - Comments

Go TEXT

> 13 -
> 14 - Disorders
> 15 - Disorders, cont.
> 16 - Disorders, cont

Maby VARCHAR(255)

> 17 - Mouse correlate
> 18 - Reference

??

In doubt go for TEXT, the data will be easyier
to normalize once you have it in a table;)

In theory you should create a sub table for all 
columns which can have multiple values in one
row, but sometimes it dosn't feel like the effort.

Thanks,


> 
> I now want to 
> CREATE TABLE genemap (
>  Chromosome.Map_Entry_Number VARCHAR(??)
>  Month_entered ????(?)
>  ...
>  )
> but I'm a bit unsure as of what
> parameters to use (VARCHAR, INTEGER,...), and what the
> values are of these
> parameters.
> 
> So, has anybody done this before, or can you point me
> to a place where I can the database schema of OMIM?  A
> visual representation (like the physical model of
> dbSNP, see
> http://www.ncbi.nlm.nih.gov/SNP/dbSNP_work000401_gif_frame.htm)
> or table structure forms (like in the GoldenPath
> database, see
> http://genome.ucsc.edu/goldenPath/hg16/database/),
> would be perfect.
> 
> Thanks in advance, Jeroen
> 
> __________________________________
> Do you Yahoo!?
> Yahoo! SiteBuilder - Free, easy-to-use web site design software
> http://sitebuilder.yahoo.com
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biodevelopers
> 




More information about the Biodevelopers mailing list