root/bioseg/trunk/README.bioseg
Revision: 32
Committed: Thu Aug 21 16:13:43 2008 UTC by kmr
File size: 6960 byte(s)
Log Message:
Allow negative coordinates when using bioseg0.

1 This directory contains the code for the user-defined type,
2 BIOSEG, representing contiguous intervals in biological sequence.
3
4 (Most of this documentation is copied from contrib/seg/README.seg in the
5 PostgreSQL source).
6
7
8 FILES
9 =====
10
11 Makefile building instructions for the shared library
12
13 README.bioseg the file you are now reading
14
15 bioseg.c the implementation of this data type in C
16
17 bioseg.sql.in SQL code needed to register this type with PostgreSQL
18 (transformed to bioseg.sql by make)
19
20 INSTALLATION
21 ============
22
23 Change into the contrib directory in PostgreSQL and unpack the bioseg tar
24 file:
25 gzip -d < bioseg-x.y.tar.gz | tar xf -
26
27 (Or check-out from subversion with:
28 svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
29 in the contrib directory)
30
31 To install the type, change to the bioseg directory and run
32
33 make
34 make install
35
36 The user running "make install" may need root access, depending on the
37 configuration of PostgreSQL. If so, this may work:
38
39 sudo make install
40
41 This only installs the type implementation and documentation. To make the
42 type available in any particular database, do
43
44 psql -d databasename < bioseg.sql
45
46 If you install the type in the template1 database, all subsequently created
47 databases will inherit it.
48
49 To test the new type, after "make install" do
50
51 make installcheck
52
53 If it fails, examine the file regression.diffs to find out the reason (the
54 test code is a direct adaptation of the regression tests from the main
55 source tree).
56
57 If you have a full installation of PostgreSQL, including the pg_config
58 program, bioseg can be unpacked anywhere and built like:
59
60 make USE_PGXS=t clean
61 make USE_PGXS=t
62 make install USE_PGXS=t
63 (or: sudo make install USE_PGXS=t)
64
65 and the type can then be installed in a particular database by any user with:
66
67 psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql
68
69
70 SYNTAX
71 ======
72
73 The user visible representation of an interval is formed using one or two
74 integers greater than 0 joined by the range operator ('..' or '...').
75 The first integer must be less than or equal to the second.
76
77 11..22 An interval from 11 to 22 inclusive - length 12 (= 22-11+1)
78
79 1...2 The same as 1..2
80
81 50 The same as 50..50
82
83 In a statement, bioseg values have the form:
84 '<start>..<end>'::bioseg
85 or can be created with:
86 bioseg_create(start, end)
87
88 For the bioseg type <start> must be less than or equal to <end> and <start>
89 must be 1 or more.
90
91 For example:
92 CREATE TABLE test_bioseg (id integer, seg bioseg);
93 insert into test_bioseg values (1, '1000..2000'::bioseg);
94 or, equivalently
95 insert into test_bioseg values (1, bioseg_create(1000, 2000));
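
The textual rules above can be sketched outside the database. The
following Python is only an illustration of the syntax as described in
this README; it is not the parser in bioseg.c. The names parse_bioseg
and bioseg_length are hypothetical, and stripping surrounding
whitespace is an assumption.

```python
import re

# Hypothetical helper, not part of bioseg: one integer, optionally followed
# by '..' or '...' and a second integer.
_BIOSEG_RE = re.compile(r"(\d+)(?:\.\.\.?(\d+))?")


def parse_bioseg(text):
    """Return (start, end) for '11..22', '1...2' or '50'.

    Raises ValueError if the text does not follow the rules in SYNTAX.
    """
    # Assumption: surrounding whitespace is ignored.
    match = _BIOSEG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a bioseg literal: {text!r}")
    start = int(match.group(1))
    # A single integer n stands for n..n.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("start must be 1 or more")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def bioseg_length(start, end):
    """Length of a 1-based inclusive interval: end - start + 1."""
    return end - start + 1
```

With these definitions, parse_bioseg("11..22") gives (11, 22) and
bioseg_length(11, 22) gives 12, matching the first example in this
section.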
96
97
98 USAGE
99 =====
100
101 See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
102 examples.
103
104 The following is a list of the available operators. The [a, b] should be
105 replaced in a statement with 'a..b'::bioseg or bioseg_create(a, b).
106
107 [a, b] && [c, d] Overlaps
108
109 Returns true if and only if segments [a, b] and [c, d] overlap
110
111 [a, b] << [c, d] Is left of
112
113 The left operand, [a, b], occurs entirely to the left of the
114 right operand, [c, d]. That is, [a, b] << [c, d] is true if
115 b < c and false otherwise.
116
117 [a, b] >> [c, d] Is right of
118
119 [a, b] occurs entirely to the right of [c, d].
120 [a, b] >> [c, d] is true if a > d and false otherwise.
121
122 [a, b] &< [c, d] Overlaps or is left of
123
124 This might be better read as "does not extend to right of".
125 It is true when b <= d.
126
127 [a, b] &> [c, d] Overlaps or is right of
128
129 This might be better read as "does not extend to left of".
130 It is true when a >= c.
131
132 [a, b] = [c, d] Same as
133
134 The segments [a, b] and [c, d] are identical, that is, a == c
135 and b == d
136
137 [a, b] @> [c, d] Contains
138
139 The segment [a, b] contains the segment [c, d], that is,
140 a <= c and b >= d
141
142 [a, b] <@ [c, d] Contained in
143
144 The segment [a, b] is contained in [c, d], that is,
145 a >= c and b <= d
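
As a cross-check, the operator definitions above can be written as
plain predicates on 1-based inclusive intervals. This is a sketch in
Python, not the C implementation; the Seg class and the function names
are hypothetical, and the overlap test (a <= d and c <= b) is the usual
rule for closed intervals, assumed here rather than taken from
bioseg.c.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a bioseg value [start, end], both inclusive."""
    start: int
    end: int


def overlaps(x, y):        # [a, b] && [c, d]
    # Assumed closed-interval rule: each segment starts before the other ends.
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):         # [a, b] << [c, d]: b < c
    return x.end < y.start


def right_of(x, y):        # [a, b] >> [c, d]: a > d
    return x.start > y.end


def not_right_of(x, y):    # [a, b] &< [c, d]: b <= d
    return x.end <= y.end


def not_left_of(x, y):     # [a, b] &> [c, d]: a >= c
    return x.start >= y.start


def same_as(x, y):         # [a, b] = [c, d]: both endpoints agree
    return x.start == y.start and x.end == y.end


def contains(x, y):        # [a, b] @> [c, d]: a <= c and b >= d
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):    # [a, b] <@ [c, d]: a >= c and b <= d
    return contains(y, x)
```

For example, overlaps(Seg(1, 10), Seg(10, 20)) is true because the two
segments share base 10, while left_of(Seg(1, 10), Seg(10, 20)) is
false.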
146
147 Although the mnemonics of the following operators are questionable, I
148 preserved them to maintain visual consistency with other geometric
149 data types defined in PostgreSQL.
150
151 Other operators:
152
153 [a, b] < [c, d] Less than
154 [a, b] > [c, d] Greater than
155
156 These operators do not make a lot of sense for any practical
157 purpose but sorting. These operators first compare (a) to (c),
158 and if these are equal, compare (b) to (d). That accounts for
159 reasonably good sorting in most cases, which is useful if
160 you want to use ORDER BY with this type
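
The ordering described above is a lexicographic comparison of (start,
end) pairs. A minimal Python sketch, assuming segments are held as
plain tuples; seg_less_than and sort_segments are hypothetical names
and are not part of bioseg:

```python
def seg_less_than(x, y):
    """'<' as described above: compare starts, then ends if the starts tie."""
    if x[0] != y[0]:
        return x[0] < y[0]
    return x[1] < y[1]


def sort_segments(segments):
    """Return the (start, end) pairs in the order ORDER BY would be expected to give."""
    # Python tuples already compare element by element, so the default
    # ordering of (start, end) pairs follows the same rule as seg_less_than.
    return sorted(segments)
```

Sorting [(5, 9), (1, 20), (1, 3), (5, 6)] this way gives
[(1, 3), (1, 20), (5, 6), (5, 9)].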
161
162
163 NOTE: The performance of an R-tree index can largely depend on the
164 order of input values. It may be very helpful to sort the input table
165 on the BIOSEG column (see the script sort-segments.pl for an example)
166
167
168 INDEXES
169 =======
170
171 A GiST index can be created for bioseg columns that will greatly speed up
172 overlaps and contains queries. For example:
173
174 CREATE TABLE tt (range bioseg, id integer);
175 CREATE INDEX tt_range_idx ON tt USING gist (range);
176
177
178 INTERBASE COORDINATES
179 =====================
180
181 The standard bioseg type uses the common convention of numbering the bases
182 starting at 1. If you wish to use "interbase" coordinates (also known as "0
183 based" or "half-open intervals") run the build with INTERBASE_COORDS defined
184 in make, i.e.:
185
186 make clean
187 make INTERBASE_COORDS=t
188 make install INTERBASE_COORDS=t
189 (or: sudo make install INTERBASE_COORDS=t)
190
191 This will compile and install the implementation for the "bioseg0" type.
192 The "0" in the name is a mnemonic for "0-based".
193
194 Then read "bioseg0.sql" into your database:
195 psql -d databasename < bioseg0.sql
196 to install the type.
197
198 The bioseg and bioseg0 types can be mixed in the same database.
199
200 Notes
201 -----
202 In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
203 whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a one
204 base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
205 length of '1..10'::bioseg is 10.
206
207 Unlike the bioseg type, the start and/or end of a bioseg0 can be negative,
208 with the expected results,
209 e.g. bioseg0_size('-10..10'::bioseg0) == 20
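
The differences in these notes come down to two small formulas. The
Python below is a sketch of that arithmetic only, not of the bioseg0
implementation. All function names are hypothetical. The interbase
overlap rule (a < d and c < b) and the conversion (subtract 1 from the
start, keep the end) are the usual conventions for half-open intervals
and are assumptions here, not taken from this README.

```python
def size_1based(start, end):
    """Length of a 1-based inclusive interval (bioseg convention)."""
    return end - start + 1


def size_interbase(start, end):
    """Length of an interbase (0-based, half-open) interval (bioseg0 convention)."""
    return end - start


def overlaps_1based(a, b, c, d):
    # Closed intervals overlap when they share at least one base.
    return a <= d and c <= b


def overlaps_interbase(a, b, c, d):
    # Assumed half-open rule: touching at a boundary is not an overlap.
    return a < d and c < b


def to_interbase(start, end):
    # Assumed conversion: the 1-based base n lies between positions n-1 and n.
    return start - 1, end
```

With these definitions, size_interbase(1, 10) is 9 and
size_1based(1, 10) is 10; overlaps_interbase(1, 10, 10, 20) is false
while overlaps_1based(1, 10, 10, 20) is true; and
size_interbase(-10, 10) is 20, in line with the bioseg0_size example
above.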
210
211 See:
212 http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
213 for a longer discussion of the differences between the coordinate systems.
214
215
216 TESTS
217 =====
218
219 The installation of bioseg can be checked by running:
220
221 make installcheck
222
223
224 CREDITS
225 =======
226
227 Note from the author: Most of the code and all of the hard work needed to
228 implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
229 in the PostgreSQL source). All bugs are due to me (kmr).
230
231
232 THANKS
233 ======
234
235 Thanks to bioinformatics.org for hosting the project.
236
237
238 AUTHOR
239 ======
240
241 Kim Rutherford <kmr@flymine.org>
242
243 SEG code by Gene Selkov, Jr.