root/bioseg/tags/release-0.6/README.bioseg
Revision: 37
Committed: Thu Aug 21 16:20:40 2008 UTC (11 years, 1 month ago) by kmr
File size: 6864 byte(s)
Log Message:
Tagged 0.6

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source.)


FILES
=====

Makefile          building instructions for the shared library

README.bioseg     the file you are now reading

bioseg.c          the implementation of this data type in C

bioseg.sql.in     SQL code needed to register this type with PostgreSQL
                  (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory of the PostgreSQL source tree and unpack
the bioseg tar file:

  gzip -d < bioseg-x.y.tar.gz | tar xf -

(Or check out from Subversion with:

  svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg

in the contrib directory.)

To install the type, change to the bioseg directory and run:

  make
  make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

  sudo make install

This only installs the type implementation and documentation. To make the
type available in a particular database, run:

  psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" run:

  make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built with:

  make USE_PGXS=t clean
  make USE_PGXS=t
  make install USE_PGXS=t
  (or: sudo make install USE_PGXS=t)

The type can then be installed in a particular database by any user with:

  psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

  11..22    An interval from 11 to 22 inclusive: length 12 (= 22 - 11 + 1)

  1...2     The same as 1..2

  50        The same as 50..50
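
The following C sketch illustrates the syntax rules above: it accepts
"a..b", "a...b" or a bare "a", rejects bounds below 1 and reversed bounds,
and computes the inclusive length. The struct seg_sketch and the functions
parse_seg_sketch() and seg_sketch_length() are hypothetical names used only
for illustration; this is not the parser in bioseg.c, which may differ in
details such as whitespace handling and integer width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interval type; NOT the struct used in bioseg.c. */
typedef struct {
    long lower;
    long upper;
} seg_sketch;

/* Parse "a..b", "a...b" or "a" (meaning a..a) into *out.
 * Returns false for malformed input, bounds < 1, or lower > upper. */
static bool parse_seg_sketch(const char *text, seg_sketch *out)
{
    char *end;
    long lower = strtol(text, &end, 10);
    if (end == text || lower < 1)
        return false;                   /* no digits, or not greater than 0 */

    long upper = lower;                 /* a bare "50" means 50..50 */
    if (*end == '.') {
        size_t dots = strspn(end, ".");
        if (dots != 2 && dots != 3)     /* range operator is ".." or "..." */
            return false;
        const char *rest = end + dots;
        upper = strtol(rest, &end, 10);
        if (end == rest)
            return false;               /* operator not followed by a number */
    }
    if (*end != '\0' || upper < lower)
        return false;                   /* trailing junk or reversed bounds */

    out->lower = lower;
    out->upper = upper;
    return true;
}

/* Length of a 1-based, inclusive interval: 11..22 covers 22 - 11 + 1 = 12 bases. */
static long seg_sketch_length(seg_sketch s)
{
    assert(s.lower <= s.upper);
    return s.upper - s.lower + 1;
}
```

For example, parse_seg_sketch("11..22", &s) fills s with 11..22 and
seg_sketch_length(s) returns 12, while "22..11" and "0..5" are rejected.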

In a statement, bioseg values have the form:

  '<start>..<end>'::bioseg

or can be created with:

  bioseg_create(start, end)

For example:

  CREATE TABLE test_bioseg (id integer, seg bioseg);
  INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);

or, equivalently:

  INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. In a statement, [a, b]
should be replaced with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap.

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise.

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d.

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d.

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d.
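
As a cross-check of the definitions above, here is a C sketch of the same
predicates on closed, 1-based intervals. The README states the condition for
every operator except &&; for that one the sketch assumes the standard test
for closed intervals, a <= d and c <= b. The type and function names
(seg_sketch, seg_make(), seg_overlaps() and so on) are hypothetical and are
not taken from bioseg.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical closed interval [lower, upper]; NOT the bioseg.c struct. */
typedef struct {
    long lower;
    long upper;
} seg_sketch;

/* Loosely mirrors bioseg_create(start, end): requires start <= end. */
static seg_sketch seg_make(long lower, long upper)
{
    assert(lower <= upper);
    seg_sketch s = { lower, upper };
    return s;
}

/* [a, b] && [c, d]: assumed test, a <= d and c <= b (share at least one base). */
static bool seg_overlaps(seg_sketch x, seg_sketch y)
{
    return x.lower <= y.upper && y.lower <= x.upper;
}

/* [a, b] << [c, d]: b < c */
static bool seg_left_of(seg_sketch x, seg_sketch y) { return x.upper < y.lower; }

/* [a, b] >> [c, d]: a > d */
static bool seg_right_of(seg_sketch x, seg_sketch y) { return x.lower > y.upper; }

/* [a, b] &< [c, d]: does not extend to the right of, b <= d */
static bool seg_over_left(seg_sketch x, seg_sketch y) { return x.upper <= y.upper; }

/* [a, b] &> [c, d]: does not extend to the left of, a >= c */
static bool seg_over_right(seg_sketch x, seg_sketch y) { return x.lower >= y.lower; }

/* [a, b] = [c, d]: a == c and b == d */
static bool seg_same(seg_sketch x, seg_sketch y)
{
    return x.lower == y.lower && x.upper == y.upper;
}

/* [a, b] @> [c, d]: a <= c and b >= d */
static bool seg_contains(seg_sketch x, seg_sketch y)
{
    return x.lower <= y.lower && x.upper >= y.upper;
}

/* [a, b] <@ [c, d]: a >= c and b <= d, i.e. [c, d] @> [a, b] */
static bool seg_contained_in(seg_sketch x, seg_sketch y)
{
    return seg_contains(y, x);
}
```

With these definitions 1000..2000 and 2001..3000 do not overlap, but
1000..2000 << 2001..3000 is true.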

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators do not make a lot of sense for any practical
    purpose but sorting. They first compare (a) to (c), and if
    these are equal, compare (b) to (d). That gives reasonably
    good sorting in most cases, which is useful if you want to
    use ORDER BY with this type.
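
The ordering described above is a lexicographic comparison on (start, end).
A minimal C sketch of it as a qsort() comparator follows. The names
seg_sketch, seg_cmp(), seg_lt(), seg_gt() and seg_sort() are hypothetical;
the comparator returns -1/0/1 explicitly rather than subtracting bounds,
which avoids overflow for extreme values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical closed interval [lower, upper]; NOT the bioseg.c struct. */
typedef struct {
    long lower;
    long upper;
} seg_sketch;

/* Compare (a) to (c) first; only if they are equal compare (b) to (d).
 * Returns a negative, zero or positive value, as qsort() expects. */
static int seg_cmp(const void *pa, const void *pb)
{
    const seg_sketch *x = pa;
    const seg_sketch *y = pb;
    if (x->lower != y->lower)
        return x->lower < y->lower ? -1 : 1;
    if (x->upper != y->upper)
        return x->upper < y->upper ? -1 : 1;
    return 0;
}

/* [a, b] < [c, d] and [a, b] > [c, d] expressed through seg_cmp(). */
static int seg_lt(seg_sketch x, seg_sketch y) { return seg_cmp(&x, &y) < 0; }
static int seg_gt(seg_sketch x, seg_sketch y) { return seg_cmp(&x, &y) > 0; }

/* Sort an array of segments in the (start, end) order described above. */
static void seg_sort(seg_sketch *segs, size_t n)
{
    assert(segs != NULL || n == 0);
    if (n > 1)
        qsort(segs, n, sizeof *segs, seg_cmp);
}
```

Sorting { 5..9, 1..10, 5..6, 1..2 } this way yields 1..2, 1..10, 5..6, 5..9.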


NOTE: The performance of an R-tree index can depend heavily on the
order of input values. It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns; it will greatly speed up
overlaps and contains queries. For example:

  CREATE TABLE tt (range bioseg, id integer);
  CREATE INDEX tt_range_idx ON tt USING gist (range);
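
Why does a GiST index help? In a GiST tree each internal entry stores a key
that covers all of its children; for intervals that key can be the smallest
segment spanning them. An overlap search can then skip a whole subtree
whenever the query misses that covering segment. The C sketch below models a
single node to show the pruning step. It is a simplified illustration under
those assumptions, not bioseg's GiST support code; node_sketch, make_node()
and count_overlaps() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical closed interval [lower, upper]; NOT the bioseg.c struct. */
typedef struct {
    long lower;
    long upper;
} seg_sketch;

/* One simplified index node: the covering segment plus its children. */
typedef struct {
    seg_sketch bound;              /* covers every child; kept in the parent entry */
    const seg_sketch *children;
    size_t n;
} node_sketch;

/* Smallest segment spanning both inputs. */
static seg_sketch seg_union(seg_sketch x, seg_sketch y)
{
    seg_sketch u;
    u.lower = x.lower < y.lower ? x.lower : y.lower;
    u.upper = x.upper > y.upper ? x.upper : y.upper;
    return u;
}

static bool seg_overlaps(seg_sketch x, seg_sketch y)
{
    return x.lower <= y.upper && y.lower <= x.upper;
}

/* Build a node and compute its covering segment once, up front. */
static node_sketch make_node(const seg_sketch *children, size_t n)
{
    assert(children != NULL && n > 0);
    node_sketch node = { children[0], children, n };
    for (size_t i = 1; i < n; i++)
        node.bound = seg_union(node.bound, children[i]);
    return node;
}

/* Count children overlapping the query, pruning via the covering segment. */
static size_t count_overlaps(const node_sketch *node, seg_sketch query)
{
    if (!seg_overlaps(node->bound, query))
        return 0;                  /* pruned: no child is examined */
    size_t hits = 0;
    for (size_t i = 0; i < node->n; i++)
        if (seg_overlaps(node->children[i], query))
            hits++;
    return hits;
}
```

For children 100..200, 150..400 and 380..500 the covering segment is
100..500, so a query for 600..700 is answered without examining any child.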


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

  make clean
  make INTERBASE_COORDS=t
  make install INTERBASE_COORDS=t
  (or: sudo make install INTERBASE_COORDS=t)

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database to install the type:

  psql -d databasename < bioseg0.sql

The bioseg and bioseg0 types can be mixed in the same database.

Notes
-----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
the length of '1..10'::bioseg is 10.

Unlike the bioseg type, the start and/or end of a bioseg0 can be negative,
with the expected results, e.g.:

  bioseg0_size('-10..10'::bioseg0) == 20

See:
  http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.
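
The differences described in the notes come down to closed versus half-open
arithmetic. The C sketch below assumes the usual conventions: a 1-based
segment s..e covers bases s to e inclusive, while an interbase segment s..e
lies between positions s and e, so the same bases are written (s - 1)..e in
interbase coordinates. It reproduces the examples from the notes. All names
are hypothetical, and bioseg0 may treat corner cases such as zero-length
segments differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coordinate pair; NOT the bioseg/bioseg0 structs. */
typedef struct {
    long start;
    long end;
} span_sketch;

/* 1-based, closed [start, end] (bioseg-style model). */
static long one_based_length(span_sketch s)
{
    assert(s.start <= s.end);
    return s.end - s.start + 1;
}

static bool one_based_overlap(span_sketch x, span_sketch y)
{
    return x.start <= y.end && y.start <= x.end;
}

/* Interbase, half-open [start, end) (bioseg0-style model); negatives allowed. */
static long interbase_length(span_sketch s)
{
    assert(s.start <= s.end);
    return s.end - s.start;
}

static bool interbase_overlap(span_sketch x, span_sketch y)
{
    return x.start < y.end && y.start < x.end;
}

/* The bases covered by 1-based s..e are interbase (s - 1)..e. */
static span_sketch one_based_to_interbase(span_sketch s)
{
    span_sketch r = { s.start - 1, s.end };
    return r;
}
```

Under this model 1..10 and 10..20 overlap by one base when 1-based but not
at all when interbase, and the interbase length of -10..10 is 20.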


TESTS
=====

The installation of bioseg can be checked by running:

  make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr, author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.