This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:
    gzip -d < bioseg-x.y.tar.gz | tar xf -

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL.

This only installs the type implementation and documentation.  To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built like this:

    make USE_PGXS=t
    make install USE_PGXS=t

and the type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

11..22  An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

1...2   The same as 1..2

50      The same as 50..50

In a statement, bioseg values have the form:
    '<start>..<end>'::bioseg
or can be created with:
    bioseg_create(start, end)

For example:
    CREATE TABLE test_bioseg (id integer, seg bioseg);
    INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);
or, equivalently
    INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));
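
The textual forms described in this section can be modelled outside the
database.  The following is a minimal Python sketch, not the parser in
bioseg.c: the names parse_bioseg and length are hypothetical, and the
tolerance for surrounding whitespace is an assumption.

```python
import re

# One integer, optionally followed by '..' or '...' and a second integer.
# Hypothetical model of the syntax; whitespace handling is an assumption.
_BIOSEG_RE = re.compile(r"^\s*(\d+)\s*(?:\.\.\.?\s*(\d+))?\s*$")


def parse_bioseg(text):
    """Return (start, end) for 'a..b', 'a...b' or 'a' (meaning 'a..a')."""
    match = _BIOSEG_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    start = int(match.group(1))
    # A single integer stands for an interval of one base.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("coordinates must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def length(seg):
    """Number of bases in a 1-based, inclusive interval."""
    start, end = seg
    return end - start + 1
```

With this model, parse_bioseg("50") gives the same pair as
parse_bioseg("50..50"), and an interval whose start exceeds its end is
rejected with ValueError.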


USAGE
=====

The following is a list of the available operators.  The [a, b] should be
replaced in a statement with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]        Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap

[a, b] << [c, d]        Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d].  That is, [a, b] << [c, d] is true if
    b < c and false otherwise

[a, b] >> [c, d]        Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise

[a, b] &< [c, d]        Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]        Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]         Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d

[a, b] @> [c, d]        Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d

[a, b] <@ [c, d]        Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d
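
The conditions listed above can be written as plain predicates.  The
following Python sketch only models the documented semantics; the Seg class
and the function names are hypothetical.  The overlap test assumes closed,
1-based intervals, which agrees with the statement elsewhere in this file
that '1..10'::bioseg and '10..20'::bioseg overlap by one base.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Illustrative stand-in for a 1-based, inclusive bioseg [start, end]."""
    start: int
    end: int


def overlaps(x, y):        # x && y (assumed closed-interval overlap)
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):         # x << y : b < c
    return x.end < y.start


def right_of(x, y):        # x >> y : a > d
    return x.start > y.end


def not_right_of(x, y):    # x &< y : b <= d
    return x.end <= y.end


def not_left_of(x, y):     # x &> y : a >= c
    return x.start >= y.start


def same(x, y):            # x = y : a == c and b == d
    return x.start == y.start and x.end == y.end


def contains(x, y):        # x @> y : a <= c and b >= d
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):    # x <@ y : a >= c and b <= d
    return contains(y, x)
```

For example, Seg(1, 10) and Seg(10, 20) overlap under this model, while
Seg(1, 9) is left of Seg(10, 20).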

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]         Less than
[a, b] > [c, d]         Greater than

    These operators do not make a lot of sense for any practical
    purpose but sorting.  They first compare (a) to (c), and if
    these are equal, compare (b) to (d).  That gives reasonably
    good sorting in most cases, which is useful if you want to
    use ORDER BY with this type.
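
The ordering described above (start first, then end) is ordinary
lexicographic comparison of the pair (a, b).  A small Python sketch, with
hypothetical names and made-up sample intervals, shows the resulting order:

```python
def sort_key(seg):
    """Order by start, then by end, as the comparison operators are described."""
    start, end = seg
    return (start, end)


def less_than(x, y):
    """Model of [a, b] < [c, d]: compare a to c, and b to d only on a tie."""
    return sort_key(x) < sort_key(y)


# Sample intervals chosen only for illustration.
segments = [(100, 200), (50, 500), (100, 150), (50, 60)]
ordered = sorted(segments, key=sort_key)
```

Here ordered is [(50, 60), (50, 500), (100, 150), (100, 200)]: intervals
sharing a start are ranked by their end.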


NOTE: The performance of an R-tree index can largely depend on the
order of input values.  It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns that will greatly speed up
overlaps and contains queries.  For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1.  If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t

This will compile and install the implementation for the "bioseg0" type;
the "0" in the name is a mnemonic for "0-based".

Then restart PostgreSQL and load "bioseg0.sql" to install the type in a
database:
    psql -d databasename < bioseg0.sql

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a one
base overlap.  Also note that the length of '1..10'::bioseg0 is 9, whereas the
length of '1..10'::bioseg is 10.

See:
    http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.
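
The differences in the note above come down to whether the end coordinate
is included.  The following Python sketch reproduces the two examples; the
function names are hypothetical, and the formulas are assumptions chosen to
match the note (closed intervals for bioseg, half-open for bioseg0).  It
does not address zero-length interbase intervals.

```python
def length_1based(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def length_interbase(start, end):
    """Length of an interbase (half-open) interval."""
    return end - start


def overlaps_1based(a, b, c, d):
    """Closed intervals [a, b] and [c, d] share at least one base."""
    return a <= d and c <= b


def overlaps_interbase(a, b, c, d):
    """Half-open intervals share a base only if each starts before the other ends."""
    return a < d and c < b
```

Under these assumptions, 1..10 has length 10 in the 1-based system and 9 in
the interbase system, and 1..10 overlaps 10..20 only in the 1-based system.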


TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr., author of the SEG type
(contrib/seg in the PostgreSQL source).  All bugs are due to me (kmr).


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.