root/bioseg/trunk/README.bioseg
Revision: 43
Committed: Fri Dec 19 14:34:50 2008 UTC by kmr
File size: 7011 byte(s)
Log Message:
Documentation fix.

1 This directory contains the code for the user-defined type,
2 BIOSEG, representing contiguous intervals in biological sequence.
3
4 (Most of this documentation is copied from contrib/seg/README.seg in the
5 PostgreSQL source).
6
7
8 FILES
9 =====
10
11 Makefile building instructions for the shared library
12
13 README.bioseg the file you are now reading
14
15 bioseg.c the implementation of this data type in C
16
17 bioseg.sql.in SQL code needed to register this type with PostgreSQL
18 (transformed to bioseg.sql by make)
19
20 INSTALLATION
21 ============
22
23 Change into the contrib directory in the PostgreSQL source and unpack the
24 bioseg tar file:
25 gzip -d < bioseg-x.y.tar.gz | tar xf -
26
27 (Or check-out from subversion with:
28 svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
29 in the contrib directory)
30
31 To install the type, change to the bioseg directory and run
32
33 make
34 make install
35
36 The user running "make install" may need root access, depending on the
37 configuration of PostgreSQL. If so, this may work:
38
39 sudo make install
40
41 This only installs the type implementation and documentation. To make the
42 type available in any particular database, do
43
44 psql -d databasename < bioseg.sql
45
46 If you install the type in the template1 database, all subsequently created
47 databases will inherit it.
48
49 To test the new type, after "make install" do
50
51 make installcheck
52
53 If it fails, examine the file regression.diffs to find out the reason (the
54 test code is a direct adaptation of the regression tests from the main
55 source tree).
56
57 If you have a full installation of PostgreSQL, including the pg_config
58 program, bioseg can be unpacked anywhere and built with:
59
60 make USE_PGXS=t clean
61 make USE_PGXS=t
62 make install USE_PGXS=t
63 (or: sudo make install USE_PGXS=t)
64
65 and the type can then be installed in a particular database by any user with:
66
67 psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql
68
69
70 SYNTAX
71 ======
72
73 The user-visible representation of an interval is formed using one or two
74 integers greater than 0, joined by the range operator ('..' or '...').
75 The first integer must be less than or equal to the second.
76
77 11..22 An interval from 11 to 22 inclusive - length 12 (= 22-11+1)
78
79 1...2 The same as 1..2
80
81 50 The same as 50..50
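
The syntax rules above can be modelled in a few lines of Python. This is
only a sketch of the textual format as described in this section, not the
parser in bioseg.c; the names parse_bioseg and bioseg_length are
hypothetical, and whitespace handling is left out because it is not
described here.

```python
import re

# Hypothetical model of the bioseg text format described above:
# one integer, or two integers joined by '..' or '...'.
_BIOSEG_RE = re.compile(r"([0-9]+)(?:\.{2,3}([0-9]+))?")


def parse_bioseg(text):
    """Return (start, end) for strings such as '11..22', '1...2' or '50'."""
    match = _BIOSEG_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bioseg literal: {text!r}")
    start = int(match.group(1))
    # A single integer N is shorthand for N..N.
    end = int(match.group(2)) if match.group(2) is not None else start
    # Integers must be greater than 0 and start must not exceed end.
    if start < 1 or start > end:
        raise ValueError(f"need 1 <= start <= end, got {start}..{end}")
    return start, end


def bioseg_length(seg):
    """Length of a 1-based inclusive interval: end - start + 1."""
    start, end = seg
    return end - start + 1
```

For example, bioseg_length(parse_bioseg("11..22")) gives 12, matching the
first example above.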
82
83 In a statement, bioseg values have the form:
84 '<start>..<end>'::bioseg
85 or can be created with:
86 bioseg_create(start, end)
87
88 For example:
89 CREATE TABLE test_bioseg (id integer, seg bioseg);
90 insert into test_bioseg values (1, '1000..2000'::bioseg);
91 or, equivalently
92 insert into test_bioseg values (1, bioseg_create(1000, 2000));
93
94
95 USAGE
96 =====
97
98 See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
99 examples.
100
101 The following is a list of the available operators. The [a, b] should be
102 replaced in a statement with 'a..b'::bioseg or bioseg_create(a, b).
103
104 [a, b] && [c, d] Overlaps
105
106 Returns true if and only if segments [a, b] and [c, d] overlap
107
108 [a, b] << [c, d] Is left of
109
110 The left operand, [a, b], occurs entirely to the left of the
111 right operand, [c, d]. That is, [a, b] << [c, d] is true if b
112 < c and false otherwise
113
114 [a, b] >> [c, d] Is right of
115
116 [a, b] occurs entirely to the right of [c, d].
117 [a, b] >> [c, d] is true if a > d and false otherwise
118
119 [a, b] &< [c, d] Overlaps or is left of
120
121 This might be better read as "does not extend to right of".
122 It is true when b <= d.
123
124 [a, b] &> [c, d] Overlaps or is right of
125
126 This might be better read as "does not extend to left of".
127 It is true when a >= c.
128
129 [a, b] = [c, d] Same as
130
131 The segments [a, b] and [c, d] are identical, that is, a == c
132 and b == d
133
134 [a, b] @> [c, d] Contains
135
136 The segment [a, b] contains the segment [c, d], that is,
137 a <= c and b >= d
138
139 [a, b] <@ [c, d] Contained in
140
141 The segment [a, b] is contained in [c, d], that is,
142 a >= c and b <= d
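
The operator definitions above can be summarised as a small Python model.
This is a sketch for the 1-based bioseg type only, assuming inclusive
endpoints (so 1..10 and 10..20 share one base); the Seg class and the
function names are hypothetical and are not part of bioseg.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a 1-based, inclusive bioseg [start, end]."""
    start: int
    end: int


def overlaps(x, y):        # x && y
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):         # x << y
    return x.end < y.start


def right_of(x, y):        # x >> y
    return x.start > y.end


def over_left(x, y):       # x &< y : does not extend to the right of y
    return x.end <= y.end


def over_right(x, y):      # x &> y : does not extend to the left of y
    return x.start >= y.start


def same(x, y):            # x = y : both endpoints are equal
    return x.start == y.start and x.end == y.end


def contains(x, y):        # x @> y
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):    # x <@ y
    return contains(y, x)
```

In this model overlaps(Seg(1, 10), Seg(10, 20)) is true, while
left_of(Seg(1, 10), Seg(10, 20)) is false because base 10 is shared.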
143
144 Other operators:
145
146 [a, b] < [c, d] Less than
147 [a, b] > [c, d] Greater than
148
149 These operators have little practical use other than
150 sorting. They first compare (a) to (c),
151 and if these are equal, compare (b) to (d). That gives a
152 reasonable sort order in most cases, which is useful if
153 you want to use ORDER BY with this type.
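
The ordering described above (compare starts, then ends on a tie) is the
same as lexicographic ordering of (start, end) pairs. A minimal Python
illustration, with a hypothetical helper name and made-up values:

```python
def sort_segments(segments):
    """Sort (start, end) pairs by start, breaking ties by end."""
    # Python tuples already compare element by element, which matches
    # the "compare a to c, then b to d" rule described above.
    return sorted(segments)


ordered = sort_segments([(5, 9), (1, 20), (5, 7), (1, 3)])
print(ordered)  # [(1, 3), (1, 20), (5, 7), (5, 9)]
```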
154
155
156 NOTE: The performance of an R-tree index can largely depend on the order of
157 input values. It may be helpful to sort the input table on the BIOSEG column.
158
159
160 INDEXES
161 =======
162
163 A GiST index can be created for bioseg columns that will greatly speed up
164 overlaps and contains queries. For example:
165
166 CREATE TABLE tt (range bioseg, id integer);
167 CREATE INDEX tt_range_idx ON tt USING gist (range);
168
169 Or, for an existing table, a functional (expression) index can be used. For example on a
170 feature table with fmin and fmax:
171
172 CREATE INDEX bioseg_index ON feature USING gist (bioseg_create(fmin, fmax));
173
174 This query will then find features that overlap 2000..3000, using the index:
175
176 SELECT * FROM feature
177 WHERE '2000..3000'::bioseg && bioseg_create(fmin, fmax);
178
179
180 INTERBASE COORDINATES
181 =====================
182
183 The standard bioseg type uses the common convention of numbering the bases
184 starting at 1. If you wish to use "interbase" coordinates (also known as "0
185 based" or "half-open intervals") run the build with INTERBASE_COORDS defined
186 in make, i.e.:
187
188 make clean
189 make INTERBASE_COORDS=t
190 make install INTERBASE_COORDS=t
191 (or: sudo make install INTERBASE_COORDS=t)
192
193 This will compile and install the implementation for the "bioseg0" type.
194 The "0" in the name is a mnemonic for "0-based".
195
196 Then read "bioseg0.sql" into your database:
197 psql -d databasename < bioseg0.sql
198 to install the type.
199
200 The bioseg and bioseg0 types can be mixed in the same database.
201
202 Notes
203 -----
204 In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
205 whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a one
206 base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
207 length of '1..10'::bioseg is 10.
208
209 Unlike the bioseg type, the start and/or end of a bioseg0 can be negative, with
210 the expected results,
211 e.g. bioseg0_size('-10..10'::bioseg0) == 20
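
The difference between the two coordinate systems can be checked with a
short Python model. This is a sketch based only on the notes above, not on
the C implementation: bioseg0 is treated as a half-open interval and bioseg
as a closed one, and all function names are hypothetical.

```python
def size_1based(start, end):
    """Length of a 1-based, inclusive interval (bioseg)."""
    return end - start + 1


def size_interbase(start, end):
    """Length of an interbase, half-open interval (bioseg0)."""
    return end - start


def overlaps_1based(x, y):
    """Closed intervals overlap if they share at least one base."""
    return x[0] <= y[1] and y[0] <= x[1]


def overlaps_interbase(x, y):
    """Half-open intervals that merely touch at an endpoint do not overlap."""
    return x[0] < y[1] and y[0] < x[1]
```

With these definitions, (1, 10) and (10, 20) overlap in the 1-based model
but not in the interbase model, and size_interbase(-10, 10) is 20.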
212
213 See:
214 http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
215 for a longer discussion of the differences between the coordinate systems.
216
217
218 TESTS
219 =====
220
221 The installation of bioseg can be checked by running:
222
223 make installcheck
224
225
226 CREDITS
227 =======
228
229 Note from the author: Most of the code and all of the hard work needed to
230 implement BIOSEG was done by Gene Selkov, Jr, author of the SEG type (contrib/seg
231 in the PostgreSQL source). All bugs are due to me (kmr).
232
233
234 THANKS
235 ======
236
237 Thanks to bioinformatics.org for hosting the project.
238
239
240 AUTHOR
241 ======
242
243 Kim Rutherford <kmr@flymine.org>
244
245 SEG code by Gene Selkov, Jr.