Revision: 51
Committed: Mon Aug 29 00:57:17 2011 UTC by kmr
File size: 7144 byte(s)
Log Message:
Reword the README to put the USE_PGXS=t installation method first.

This directory contains the code for the user-defined type BIOSEG,
which represents contiguous intervals in biological sequences.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source.)


FILES
=====

Makefile          building instructions for the shared library

README.bioseg     the file you are now reading

bioseg.c          the implementation of this data type in C

bioseg.sql.in     SQL code needed to register this type with PostgreSQL
                  (transformed to bioseg.sql by make)

INSTALLATION
============

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built with:

  gzip -d < bioseg-x.y.tar.gz | tar xf -
  cd bioseg-x.y
  make USE_PGXS=t clean
  make USE_PGXS=t
  make install USE_PGXS=t
  (or: sudo make install USE_PGXS=t)

The type can then be installed in a particular database by any user with:

  psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql

Alternatively, if you don't have pg_config but you do have access to the
PostgreSQL source code, unpack the bioseg tarball into the contrib
directory with:

  gzip -d < bioseg-x.y.tar.gz | tar xf -

Or, from within the contrib directory, check out the code from Subversion
with:

  svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg

Then, to install the type, change to the bioseg directory and run:

  make
  make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

  sudo make install

This only installs the type implementation and documentation. To make the
type available in a particular database, run:

  psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" run:

  make installcheck

If it fails, examine the file regression.diffs to find out why (the test
code is a direct adaptation of the regression tests from the main source
tree).


SYNTAX
======

The user-visible representation of an interval is either a single integer
greater than 0, or two such integers joined by the range operator ('..' or
'...'). The first integer must be less than or equal to the second.

  11..22    An interval from 11 to 22 inclusive, of length 12 (= 22 - 11 + 1)

  1...2     The same as 1..2

  50        The same as 50..50

In a statement, bioseg values have the form:

  '<start>..<end>'::bioseg

or can be created with:

  bioseg_create(start, end)

For example:

  CREATE TABLE test_bioseg (id integer, seg bioseg);
  INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);

or, equivalently:

  INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));
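
The literal rules above can be summarised as a small parser. This is a
hypothetical Python sketch (parse_bioseg and bioseg_length are illustrative
names, not part of bioseg); the real input function is written in C in
bioseg.c and may differ in details such as whitespace handling and error
messages.

```python
import re
from typing import Tuple

# Hypothetical pattern for the 1-based bioseg text form: a single integer,
# or two integers joined by '..' or '...'. Negative numbers never match.
_BIOSEG_RE = re.compile(r"([0-9]+)(?:\.\.\.?([0-9]+))?")


def parse_bioseg(text: str) -> Tuple[int, int]:
    """Return (start, end) for a literal such as '11..22', '1...2' or '50'."""
    m = _BIOSEG_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a bioseg literal: {text!r}")
    start = int(m.group(1))
    # A single integer N is shorthand for N..N.
    end = int(m.group(2)) if m.group(2) is not None else start
    if start < 1:
        raise ValueError("bioseg coordinates must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def bioseg_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval: both end bases are counted."""
    return end - start + 1
```

For example, parse_bioseg('11..22') gives (11, 22), and bioseg_length on
that pair gives 12, matching the table above.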


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. In each entry, [a, b]
stands for 'a..b'::bioseg or bioseg_create(a, b) in a statement.

[a, b] && [c, d]        Overlaps

        Returns true if and only if segments [a, b] and [c, d] overlap;
        for the 1-based bioseg type that means a <= d and c <= b.

[a, b] << [c, d]        Is left of

        The left operand, [a, b], occurs entirely to the left of the
        right operand, [c, d]. That is, [a, b] << [c, d] is true if
        b < c and false otherwise.

[a, b] >> [c, d]        Is right of

        [a, b] occurs entirely to the right of [c, d]. That is,
        [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]        Overlaps or is left of

        This might be better read as "does not extend to the right of".
        It is true when b <= d.

[a, b] &> [c, d]        Overlaps or is right of

        This might be better read as "does not extend to the left of".
        It is true when a >= c.

[a, b] = [c, d]         Same as

        The segments [a, b] and [c, d] are identical, that is, a == c
        and b == d.

[a, b] @> [c, d]        Contains

        The segment [a, b] contains the segment [c, d], that is,
        a <= c and b >= d.

[a, b] <@ [c, d]        Contained in

        The segment [a, b] is contained in [c, d], that is,
        a >= c and b <= d.
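
For the 1-based bioseg type, the operator definitions above can be restated
as plain comparisons on the end points. The sketch below is illustrative
Python only; the Seg class and the function names are hypothetical stand-ins
for the SQL operators, not functions provided by bioseg.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a 1-based, inclusive bioseg [start, end]."""
    start: int
    end: int


def overlaps(x: Seg, y: Seg) -> bool:      # x && y
    return x.start <= y.end and y.start <= x.end


def left_of(x: Seg, y: Seg) -> bool:       # x << y
    return x.end < y.start


def right_of(x: Seg, y: Seg) -> bool:      # x >> y
    return x.start > y.end


def over_left(x: Seg, y: Seg) -> bool:     # x &< y: does not extend to the right of y
    return x.end <= y.end


def over_right(x: Seg, y: Seg) -> bool:    # x &> y: does not extend to the left of y
    return x.start >= y.start


def same(x: Seg, y: Seg) -> bool:          # x = y
    return x.start == y.start and x.end == y.end


def contains(x: Seg, y: Seg) -> bool:      # x @> y
    return x.start <= y.start and x.end >= y.end


def contained_in(x: Seg, y: Seg) -> bool:  # x <@ y
    return contains(y, x)
```

For instance, overlaps(Seg(1, 10), Seg(10, 20)) is true because the two
segments share base 10, while left_of(Seg(1, 10), Seg(10, 20)) is false.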

Other operators:

[a, b] < [c, d]         Less than
[a, b] > [c, d]         Greater than

        These operators are of little practical use except for sorting.
        They first compare a to c and, if these are equal, compare b
        to d. This gives a reasonable sort order in most cases, which is
        useful if you want to use ORDER BY with this type.
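
The sort order used by < and > is a lexicographic comparison on (start, end).
A minimal sketch, assuming segments are represented as hypothetical
(start, end) tuples in Python, where tuple comparison already behaves this
way; bioseg_sort_key, less_than and greater_than are illustrative names only.

```python
from typing import List, Tuple

Seg = Tuple[int, int]  # hypothetical (start, end) representation


def bioseg_sort_key(seg: Seg) -> Tuple[int, int]:
    """Order by start (a vs c), breaking ties on end (b vs d)."""
    start, end = seg
    return (start, end)


def less_than(x: Seg, y: Seg) -> bool:      # x < y
    return bioseg_sort_key(x) < bioseg_sort_key(y)


def greater_than(x: Seg, y: Seg) -> bool:   # x > y
    return bioseg_sort_key(x) > bioseg_sort_key(y)


segments: List[Seg] = [(30, 40), (10, 50), (10, 20), (5, 100)]
ordered = sorted(segments, key=bioseg_sort_key)
```

Here ordered is [(5, 100), (10, 20), (10, 50), (30, 40)]: segments with the
same start are ordered by their end.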


NOTE: The performance of an R-tree index can depend largely on the order of
input values. It may be helpful to sort the input table on the BIOSEG column.


INDEXES
=======

A GiST index can be created for bioseg columns that will greatly speed up
overlaps and contains queries. For example:

  CREATE TABLE tt (range bioseg, id integer);
  CREATE INDEX tt_range_idx ON tt USING gist (range);

For an existing table without a bioseg column, an index on an expression can
be used instead. For example, on a feature table with fmin and fmax columns:

  CREATE INDEX bioseg_index ON feature USING gist (bioseg_create(fmin, fmax));

This query will then find features that overlap 2000..3000, using the index:

  SELECT * FROM feature
   WHERE '2000..3000'::bioseg && bioseg_create(fmin, fmax);
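
Why the index helps, and why input order matters, can be illustrated with a
much-simplified model of an R-tree: segments are grouped into pages, each
page records a bounding segment covering all of its members, and a search
skips any page whose bounding segment does not overlap the query. This is a
hypothetical, single-level Python sketch (build_pages and search are
illustrative names); a real GiST index is a balanced, multi-level tree whose
layout is decided by the type's GiST support functions.

```python
from typing import List, Tuple

Seg = Tuple[int, int]  # (start, end), 1-based inclusive
Page = Tuple[Seg, List[Seg]]  # (bounding segment, member segments)


def build_pages(segs: List[Seg], page_size: int) -> List[Page]:
    """Group segments, in input order, into pages with a bounding segment."""
    pages = []
    for i in range(0, len(segs), page_size):
        chunk = segs[i:i + page_size]
        bound = (min(s for s, _ in chunk), max(e for _, e in chunk))
        pages.append((bound, chunk))
    return pages


def overlaps(x: Seg, y: Seg) -> bool:
    return x[0] <= y[1] and y[0] <= x[1]


def search(pages: List[Page], query: Seg) -> Tuple[List[Seg], int]:
    """Return the matching segments and the number of pages scanned."""
    hits: List[Seg] = []
    scanned = 0
    for bound, chunk in pages:
        if not overlaps(bound, query):
            continue  # no member of this page can overlap the query
        scanned += 1
        hits.extend(s for s in chunk if overlaps(s, query))
    return hits, scanned


segs = [(i * 10 + 1, i * 10 + 10) for i in range(100)]   # 1..10, 11..20, ...
shuffled = [segs[(i * 37) % 100] for i in range(100)]    # same data, scattered
hits_sorted, scanned_sorted = search(build_pages(segs, 10), (250, 260))
hits_shuffled, scanned_shuffled = search(build_pages(shuffled, 10), (250, 260))
```

With the sorted input only one of the ten pages is scanned for 250..260;
with the same segments in a scattered order the pages tend to have wide
bounding segments, so more pages are scanned for the same result. This is one
way to see why the NOTE above suggests sorting the input on the bioseg column.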


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

  make clean
  make INTERBASE_COORDS=t
  make install INTERBASE_COORDS=t
  (or: sudo make install INTERBASE_COORDS=t)

This will compile and install the implementation of the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database to install the type:

  psql -d databasename < bioseg0.sql

The bioseg and bioseg0 types can be mixed in the same database.

Notes
-----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
the length of '1..10'::bioseg is 10.

Unlike the bioseg type, the start and/or end of a bioseg0 can be negative,
with the expected results, e.g.:

  bioseg0_size('-10..10'::bioseg0) == 20
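
The differences listed in these notes follow from treating a bioseg as a
closed interval of bases and a bioseg0 as a half-open interval between bases.
The Python sketch below is illustrative only (all function names are
hypothetical, not part of bioseg); to_interbase assumes the usual convention,
discussed on the GMOD page referenced below, that 1-based bases start..end
correspond to interbase start-1..end.

```python
from typing import Tuple

Seg = Tuple[int, int]  # (start, end)


def size_1based(start: int, end: int) -> int:
    """bioseg: closed interval, both end bases counted ('1..10' -> 10)."""
    return end - start + 1


def size_interbase(start: int, end: int) -> int:
    """bioseg0: half-open interval between bases ('1..10' -> 9)."""
    return end - start


def overlaps_1based(x: Seg, y: Seg) -> bool:
    return x[0] <= y[1] and y[0] <= x[1]


def overlaps_interbase(x: Seg, y: Seg) -> bool:
    # Touching end points (e.g. 1..10 and 10..20) share no base.
    return x[0] < y[1] and y[0] < x[1]


def to_interbase(start: int, end: int) -> Seg:
    """Assumed conversion of 1-based bases start..end to interbase."""
    return start - 1, end
```

For example, size_interbase(-10, 10) is 20, matching the bioseg0_size
example above, and to_interbase(1, 10) is (0, 10), which covers the same ten
bases as '1..10'::bioseg.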

See:
  http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.


TESTS
=====

The installation of bioseg can be checked by running:

  make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.