root/bioseg/README.bioseg
Revision: 1
Committed: Tue Aug 14 22:49:05 2007 UTC by kmr
File size: 5713 byte(s)
Log Message:
Initial bioseg checkin.

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:

    gzip -d < bioseg-x.y.tar.gz | tar xf -

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL.

This only installs the type implementation and documentation. To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built like this:

    make USE_PGXS=t
    make install USE_PGXS=t

The type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The external representation of an interval is either a single integer
greater than 0, or two such integers joined by the range operator
('..' or '...'). The first integer must be less than or equal to the second.

11..22    An interval from 11 to 22 inclusive, length 12 (= 22-11+1)

1...2     The same as 1..2

50        The same as 50..50

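As an illustration of the syntax rules above, the following Python sketch
parses the external representation. It is not the parser in bioseg.c: the
names parse_interval and interval_length are hypothetical, and the sketch
makes no attempt to mirror details such as whitespace handling.

```python
import re

# Illustrative only: this models the textual rule, not the code in bioseg.c.
# One integer, optionally followed by '..' or '...' and a second integer.
_INTERVAL_RE = re.compile(r"([0-9]+)(?:\.{2,3}([0-9]+))?")


def parse_interval(text):
    """Return (start, end) for strings such as '11..22', '1...2' or '50'."""
    match = _INTERVAL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    start = int(match.group(1))
    # A single integer N is shorthand for N..N.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("coordinates must be greater than 0")
    if start > end:
        raise ValueError("first integer must not exceed the second")
    return start, end


def interval_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1
```

For example, parse_interval("11..22") gives (11, 22), and
interval_length(11, 22) gives 12.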

USAGE
=====

Available operators include:

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap.

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise.

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d.

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d.

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d.

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators do not make a lot of sense for any practical
    purpose but sorting. They first compare a to c and, if these
    are equal, compare b to d. That gives reasonably good sorting
    in most cases, which is useful if you want to use ORDER BY with
    this type.

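The operator conditions listed in this section can be modelled in a few
lines of Python. This is only a sketch of the documented semantics for the
1-based type, not code taken from bioseg.c: intervals are plain
(start, end) tuples, the function names are hypothetical, and the overlap
condition is an assumption derived from treating both ends as inclusive.

```python
# Intervals are modelled as (start, end) tuples in 1-based, inclusive
# coordinates. Function names are hypothetical; bioseg exposes these
# checks only as SQL operators.

def overlaps(x, y):            # x && y
    # Assumed condition: inclusive intervals share at least one base.
    return x[0] <= y[1] and y[0] <= x[1]

def left_of(x, y):             # x << y
    return x[1] < y[0]

def right_of(x, y):            # x >> y
    return x[0] > y[1]

def not_right_of(x, y):        # x &< y
    return x[1] <= y[1]

def not_left_of(x, y):         # x &> y
    return x[0] >= y[0]

def same_as(x, y):             # x = y
    return x[0] == y[0] and x[1] == y[1]

def contains(x, y):            # x @> y
    return x[0] <= y[0] and x[1] >= y[1]

def contained_in(x, y):        # x <@ y
    return x[0] >= y[0] and x[1] <= y[1]

def less_than(x, y):           # x < y
    # Compare starts first; fall back to ends only on a tie.
    if x[0] != y[0]:
        return x[0] < y[0]
    return x[1] < y[1]

def greater_than(x, y):        # x > y
    return less_than(y, x)
```

Python tuples already compare element by element, so
sorted([(10, 20), (5, 30), (10, 12)]) returns
[(5, 30), (10, 12), (10, 20)], the same order that less_than describes.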

NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns, which will greatly speed up
overlaps and contains queries. For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then restart PostgreSQL and run:

    psql -d databasename < bioseg.sql

as usual to install the type in the database.

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
the length of '1..10'::bioseg is 10.

See:
    http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.

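The arithmetic behind the Note above can be written out as a short Python
sketch. It only restates the two conventions and is not taken from the
bioseg sources; the function names are hypothetical, and the interbase
overlap test assumes non-empty intervals.

```python
# (start, end) tuples; names are hypothetical and for illustration only.

def length_one_based(start, end):
    # Both ends are inclusive, so the end base itself is counted.
    return end - start + 1

def length_interbase(start, end):
    # Coordinates sit between bases, so the length is a plain difference.
    return end - start

def overlaps_one_based(x, y):
    # Inclusive ends: touching at a shared coordinate is an overlap.
    return x[0] <= y[1] and y[0] <= x[1]

def overlaps_interbase(x, y):
    # Half-open intervals: sharing only a boundary is not an overlap.
    return x[0] < y[1] and y[0] < x[1]
```

With these definitions, the pair (1, 10) and (10, 20) overlaps under
overlaps_one_based but not under overlaps_interbase, and the interval
(1, 10) has length 10 in the 1-based system and 9 in the interbase system.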

TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr., author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>