Revision: 38
Committed: Thu Aug 21 16:37:31 2008 UTC (10 years, 9 months ago) by kmr
File size: 6812 byte(s)
Log Message:
Don't mention sort-segments.pl in the README - it doesn't exist.

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:
    gzip -d < bioseg-x.y.tar.gz | tar xf -

(Or check out from Subversion with:
    svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
in the contrib directory)

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

    sudo make install

This only installs the type implementation and documentation. To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built like this:

    make USE_PGXS=t clean
    make USE_PGXS=t
    make install USE_PGXS=t
    (or: sudo make install USE_PGXS=t)

and the type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is either a single integer
or two integers joined by the range operator ('..' or '...'). All integers
must be greater than 0, and the first integer must be less than or equal to
the second.

11..22    An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

1...2     The same as 1..2

50        The same as 50..50

In a statement, bioseg values have the form:
    '<start>..<end>'::bioseg
or can be created with:
    bioseg_create(start, end)

For example:
    CREATE TABLE test_bioseg (id integer, seg bioseg);
    INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);
or, equivalently
    INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));
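
The equivalences described above can be checked directly. This is a small
sketch that assumes the bioseg type has already been installed in the
current database; it uses only the input forms, the "=" operator and the
bioseg_create function described in this file.

```sql
-- Each query should return true if the forms are equivalent as described.
SELECT '1...2'::bioseg = '1..2'::bioseg;                   -- '...' vs '..'
SELECT '50'::bioseg = '50..50'::bioseg;                    -- single integer
SELECT '1000..2000'::bioseg = bioseg_create(1000, 2000);   -- literal vs function
```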


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. Each [a, b] stands for
a segment and should be written in a statement as 'a..b'::bioseg or
bioseg_create(a, b).

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d
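
As an illustration, the operators above might be used in WHERE clauses as
follows. This sketch assumes the bioseg type is installed and reuses the
test_bioseg table from the SYNTAX section; the coordinate values are
arbitrary and chosen only for the example.

```sql
-- Rows whose segment overlaps 1500..2500
SELECT id, seg FROM test_bioseg WHERE seg && '1500..2500'::bioseg;

-- Rows whose segment contains 1200..1300
SELECT id, seg FROM test_bioseg WHERE seg @> bioseg_create(1200, 1300);

-- Rows whose segment is contained in 1..5000
SELECT id, seg FROM test_bioseg WHERE seg <@ '1..5000'::bioseg;

-- Rows whose segment lies entirely to the left of 3000..4000
-- (that is, the segment's end is less than 3000)
SELECT id, seg FROM test_bioseg WHERE seg << '3000..4000'::bioseg;
```

With the single row inserted in the SYNTAX section (1000..2000), each of
these queries would be expected to return that row, going by the operator
definitions above.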

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators make little sense for any practical purpose
    other than sorting. They first compare (a) to (c) and, if these
    are equal, compare (b) to (d). That gives reasonably good
    sorting in most cases, which is useful if you want to use
    ORDER BY with this type
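
A short sketch of the ordering rule, assuming the bioseg type is installed
and the test_bioseg table from the SYNTAX section exists. The literal
values are illustrative only.

```sql
-- Starts differ: 5 < 10, so this should be true regardless of the ends.
SELECT '5..100'::bioseg < '10..20'::bioseg;

-- Starts are equal, so the ends decide: 20 < 30, so this should be true.
SELECT '10..20'::bioseg < '10..30'::bioseg;

-- Sort rows by start position, then by end position.
SELECT id, seg FROM test_bioseg ORDER BY seg;
```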


NOTE: The performance of an R-tree index can largely depend on the order of
input values. It may be helpful to sort the input table on the BIOSEG column.
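
One possible way to follow this advice is to build a sorted copy of the
table and index the copy. This is only a sketch: the names
test_bioseg_sorted and test_bioseg_sorted_seg_idx are hypothetical, the
source table is the test_bioseg example from the SYNTAX section, and
whether sorting helps at all will depend on the data and the PostgreSQL
version.

```sql
-- Copy the rows in segment order (hypothetical table name).
CREATE TABLE test_bioseg_sorted AS
    SELECT id, seg FROM test_bioseg ORDER BY seg;

-- Index the sorted copy (hypothetical index name).
CREATE INDEX test_bioseg_sorted_seg_idx
    ON test_bioseg_sorted USING gist (seg);
```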


INDEXES
=======

A GiST index can be created on a bioseg column; it will greatly speed up
overlap and containment queries. For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);
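
Queries of the following shape are the ones such an index is meant to help
with. This sketch assumes the tt table and tt_range_idx index created
above; the coordinates are arbitrary. Note that the planner may still
choose a sequential scan on small tables, so EXPLAIN is the way to see
what is actually used.

```sql
-- Overlap query: ids of all rows whose range overlaps 1000..2000
SELECT id FROM tt WHERE range && '1000..2000'::bioseg;

-- Containment query: ids of all rows lying inside 1..100000
SELECT id FROM tt WHERE range <@ '1..100000'::bioseg;

-- Show the plan to check whether tt_range_idx is used
EXPLAIN SELECT id FROM tt WHERE range && '1000..2000'::bioseg;
```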


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as "0
based" or "half-open intervals") run the build with INTERBASE_COORDS defined
in make, i.e.:

    make clean
    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t
    (or: sudo make install INTERBASE_COORDS=t)

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database:
    psql -d databasename < bioseg0.sql
to install the type.

The bioseg and bioseg0 types can be mixed in the same database.

Notes
-----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
the length of '1..10'::bioseg is 10.

Unlike the bioseg type, the start and/or end of a bioseg0 can be negative,
with the expected results.
e.g. bioseg0_size('-10..10'::bioseg0) == 20
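
The differences described in these notes can be seen side by side. This
sketch assumes both bioseg and bioseg0 have been installed in the same
database; the expected results in the comments are taken from the notes
above, not from a separate test run.

```sql
-- 1-based: 1..10 and 10..20 share base 10, so this should be true.
SELECT '1..10'::bioseg && '10..20'::bioseg;

-- Interbase: the same coordinates do not overlap, so this should be false.
SELECT '1..10'::bioseg0 && '10..20'::bioseg0;

-- Interbase lengths: end minus start.
SELECT bioseg0_size('1..10'::bioseg0);     -- expected 9
SELECT bioseg0_size('-10..10'::bioseg0);   -- expected 20
```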

See:
    http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.


TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr, author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.