This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile          building instructions for the shared library

README.bioseg     the file you are now reading

bioseg.c          the implementation of this data type in C

bioseg.sql.in     SQL code needed to register this type with PostgreSQL
                  (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory of the PostgreSQL source tree and unpack
the bioseg tar file:
  gzip -d < bioseg-x.y.tar.gz | tar xf -

(Or check out from Subversion with:
  svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
in the contrib directory.)

To install the type, change to the bioseg directory and run

  make
  make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

  sudo make install

This only installs the type implementation and documentation. To make the
type available in any particular database, do

  psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

  make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built like:

  make USE_PGXS=t
  make install USE_PGXS=t
  (or: sudo make install USE_PGXS=t)

and the type can then be installed in a particular database by any user with:

  psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

  11..22     An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

  1...2      The same as 1..2

  50         The same as 50..50

In a statement, bioseg values have the form:
  '<start>..<end>'::bioseg
or can be created with:
  bioseg_create(start, end)

For example:
  CREATE TABLE test_bioseg (id integer, seg bioseg);
  INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);
or, equivalently
  INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));

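As an illustration only (this is not part of bioseg), the textual rules
above can be sketched in Python. The names parse_bioseg and bioseg_length
are hypothetical; the real parser lives in bioseg.c and may treat edge
cases, such as surrounding whitespace, differently.

```python
import re

# Hypothetical helper, not part of bioseg: one integer, optionally followed
# by '..' or '...' and a second integer.
_BIOSEG_RE = re.compile(r"([0-9]+)(?:\.{2,3}([0-9]+))?")


def parse_bioseg(text):
    """Return (start, end) for text such as '11..22', '1...2' or '50'."""
    match = _BIOSEG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    start = int(match.group(1))
    # A single integer N is shorthand for N..N.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("coordinates must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def bioseg_length(start, end):
    """Number of bases covered by a 1-based, inclusive interval."""
    return end - start + 1
```

With these definitions, parse_bioseg("11..22") gives the pair (11, 22), and
bioseg_length(11, 22) gives 12, matching the first example above.
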

USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. Each [a, b] stands for
a bioseg value and should be written in a statement as 'a..b'::bioseg or
bioseg_create(a, b).

  [a, b] && [c, d]      Overlaps

      Returns true if and only if segments [a, b] and [c, d] overlap

  [a, b] << [c, d]      Is left of

      The left operand, [a, b], occurs entirely to the left of the
      right operand, [c, d]. That is, [a, b] << [c, d] is true if
      b < c and false otherwise

  [a, b] >> [c, d]      Is right of

      [a, b] occurs entirely to the right of [c, d].
      [a, b] >> [c, d] is true if a > d and false otherwise

  [a, b] &< [c, d]      Overlaps or is left of

      This might be better read as "does not extend to right of".
      It is true when b <= d.

  [a, b] &> [c, d]      Overlaps or is right of

      This might be better read as "does not extend to left of".
      It is true when a >= c.

  [a, b] = [c, d]       Same as

      The segments [a, b] and [c, d] are identical, that is, a == c
      and b == d

  [a, b] @> [c, d]      Contains

      The segment [a, b] contains the segment [c, d], that is,
      a <= c and b >= d

  [a, b] <@ [c, d]      Contained in

      The segment [a, b] is contained in [c, d], that is,
      a >= c and b <= d

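The conditions above can be restated as a small Python sketch. This is
illustrative only and is not how bioseg.c is written; the Seg class and the
function names are hypothetical. The overlap test is an assumption for
1-based, inclusive intervals (two segments overlap when they share at least
one base), since the list above does not spell out the condition.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Illustrative stand-in for a 1-based, inclusive bioseg [start, end]."""
    start: int
    end: int


def overlaps(x, y):        # x && y (assumed: at least one shared base)
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):         # x << y: b < c
    return x.end < y.start


def right_of(x, y):        # x >> y: a > d
    return x.start > y.end


def over_left(x, y):       # x &< y: b <= d
    return x.end <= y.end


def over_right(x, y):      # x &> y: a >= c
    return x.start >= y.start


def same(x, y):            # x = y: a == c and b == d
    return x.start == y.start and x.end == y.end


def contains(x, y):        # x @> y: a <= c and b >= d
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):    # x <@ y: a >= c and b <= d
    return contains(y, x)
```

For example, left_of(Seg(1, 10), Seg(11, 20)) is true, while
left_of(Seg(1, 10), Seg(10, 20)) is false because base 10 is shared.
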
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

  [a, b] < [c, d]       Less than
  [a, b] > [c, d]       Greater than

      These operators do not make a lot of sense for any practical
      purpose but sorting. They first compare (a) to (c) and, if
      these are equal, compare (b) to (d). That gives reasonably
      good sorting in most cases, which is useful if you want to
      use ORDER BY with this type.

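The comparison rule just described (start first, then end) can be sketched
in Python as follows. The function names are hypothetical, and segments are
modelled as plain (start, end) tuples rather than real bioseg values.

```python
def less_than(x, y):
    """x < y: compare the starts, and the ends only when the starts tie."""
    if x[0] != y[0]:
        return x[0] < y[0]
    return x[1] < y[1]


def greater_than(x, y):
    """x > y is the mirror image of y < x."""
    return less_than(y, x)


def order_by(segments):
    """Sort (start, end) pairs the way the rule above would order them."""
    return sorted(segments, key=lambda seg: (seg[0], seg[1]))
```

Under this rule, (10, 20) sorts before (10, 40), and both sort before
(50, 60) regardless of how long each segment is.
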

NOTE: The performance of an R-tree index can depend heavily on the
order of input values. It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns; it will greatly speed up
overlaps and contains queries. For example:

  CREATE TABLE tt (range bioseg, id integer);
  CREATE INDEX tt_range_idx ON tt USING gist (range);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

  make INTERBASE_COORDS=t
  make install INTERBASE_COORDS=t

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database:
  psql -d databasename < bioseg0.sql
to install the type.

The bioseg and bioseg0 types can be mixed in the same database.

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a one
base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
length of '1..10'::bioseg is 10.

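The arithmetic behind this note can be sketched in Python. This is an
illustration, not bioseg code: all function names are hypothetical,
intervals are plain (start, end) tuples, and the interbase overlap test
assumes both intervals are non-empty. The conversion helper relies on the
usual convention that the 1-based interval [s, e] covers the same bases as
the interbase interval (s - 1, e); that convention is not stated in this
README.

```python
def length_1based(start, end):
    """Length of a 1-based, inclusive interval (bioseg)."""
    return end - start + 1


def length_interbase(start, end):
    """Length of an interbase, half-open interval (bioseg0)."""
    return end - start


def overlaps_1based(x, y):
    """Inclusive ends: sharing a single base counts as an overlap."""
    return x[0] <= y[1] and y[0] <= x[1]


def overlaps_interbase(x, y):
    """Half-open ends: intervals that merely touch do not overlap."""
    return x[0] < y[1] and y[0] < x[1]


def to_interbase(start, end):
    """Assumed convention: 1-based [s, e] covers the bases of (s - 1, e)."""
    return (start - 1, end)
```

Here overlaps_1based((1, 10), (10, 20)) is true and
overlaps_interbase((1, 10), (10, 20)) is false, while length_1based(1, 10)
is 10 and length_interbase(1, 10) is 9, as stated in the note.
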
See:
  http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.


TESTS
=====

The installation of bioseg can be checked by running:

  make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr., author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.