root/bioseg/trunk/README.bioseg
Revision: 18
Committed: Sat Mar 15 20:18:07 2008 UTC (11 years, 3 months ago) by kmr
File size: 6299 byte(s)
Log Message:
Mention the bioseg wiki in the USAGE section of the README.

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:
    gzip -d < bioseg-x.y.tar.gz | tar xf -

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL.

This only installs the type implementation and documentation. To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.
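
To confirm that the type is registered in a given database, one option is to
query the system catalog from psql. This is a minimal sketch that relies only
on the standard pg_type catalog; it is not part of the bioseg distribution.

```sql
-- Run inside psql, connected to the database that bioseg.sql was loaded into.
-- pg_type is the standard PostgreSQL catalog of data types; at least one
-- row comes back if a type named "bioseg" exists in this database.
SELECT typname FROM pg_type WHERE typname = 'bioseg';
```

An empty result suggests that bioseg.sql has not been loaded into this
particular database yet.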

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built with:

    make USE_PGXS=t
    make install USE_PGXS=t

and the type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

    11..22          An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

    1...2           The same as 1..2

    50              The same as 50..50

In a statement, bioseg values have the form:
    '<start>..<end>'::bioseg
or can be created with:
    bioseg_create(start, end)

For example:
    CREATE TABLE test_bioseg (id integer, seg bioseg);
    insert into test_bioseg values (1, '1000..2000'::bioseg);
or, equivalently
    insert into test_bioseg values (1, bioseg_create(1000, 2000));
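
The equivalences between the input forms can be checked directly with the
"=" operator. The statements below are a small sketch; the results noted in
the comments are what the rules in this section imply, not captured output.

```sql
-- '...' is accepted as an alternative spelling of '..'
SELECT '1...2'::bioseg = '1..2'::bioseg;          -- expected: true

-- a single integer is shorthand for an interval of length 1
SELECT '50'::bioseg = '50..50'::bioseg;           -- expected: true

-- the text form and bioseg_create() describe the same value
SELECT bioseg_create(11, 22) = '11..22'::bioseg;  -- expected: true

-- the first integer must not exceed the second, so this input
-- should be rejected (the exact error text is not shown here)
SELECT '22..11'::bioseg;
```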


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. In a statement, replace
each [a, b] with 'a..b'::bioseg or bioseg_create(a, b).

    [a, b] && [c, d]        Overlaps

        Returns true if and only if segments [a, b] and [c, d] overlap.

    [a, b] << [c, d]        Is left of

        The left operand, [a, b], occurs entirely to the left of the
        right operand, [c, d]. That is, [a, b] << [c, d] is true if
        b < c and false otherwise.

    [a, b] >> [c, d]        Is right of

        [a, b] occurs entirely to the right of [c, d].
        [a, b] >> [c, d] is true if a > d and false otherwise.

    [a, b] &< [c, d]        Overlaps or is left of

        This might be better read as "does not extend to right of".
        It is true when b <= d.

    [a, b] &> [c, d]        Overlaps or is right of

        This might be better read as "does not extend to left of".
        It is true when a >= c.

    [a, b] = [c, d]         Same as

        The segments [a, b] and [c, d] are identical, that is, a == c
        and b == d.

    [a, b] @> [c, d]        Contains

        The segment [a, b] contains the segment [c, d], that is,
        a <= c and b >= d.

    [a, b] <@ [c, d]        Contained in

        The segment [a, b] is contained in [c, d], that is,
        a >= c and b <= d.
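
As an illustration of the list above, each operator can be tried on literal
values. This is a sketch using the 1-based bioseg type; the values are
arbitrary and the results in the comments follow from the definitions given
for each operator rather than from captured output.

```sql
SELECT '1..10'::bioseg  && '10..20'::bioseg;    -- overlaps (both include base 10): true
SELECT '1..10'::bioseg  << '11..20'::bioseg;    -- left of, 10 < 11: true
SELECT '11..20'::bioseg >> '1..10'::bioseg;     -- right of, 11 > 10: true
SELECT '1..10'::bioseg  &< '5..20'::bioseg;     -- does not extend to right of, 10 <= 20: true
SELECT '5..20'::bioseg  &> '1..10'::bioseg;     -- does not extend to left of, 5 >= 1: true
SELECT '1..10'::bioseg  =  bioseg_create(1, 10); -- same start and end: true
SELECT '1..100'::bioseg @> '20..30'::bioseg;    -- contains, 1 <= 20 and 100 >= 30: true
SELECT '20..30'::bioseg <@ '1..100'::bioseg;    -- contained in, 20 >= 1 and 30 <= 100: true
```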

Although the mnemonics of the operators above are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

    [a, b] < [c, d]         Less than
    [a, b] > [c, d]         Greater than

        These operators do not make a lot of sense for any practical
        purpose but sorting. They first compare (a) to (c), and if
        these are equal, compare (b) to (d). That accounts for
        reasonably good sorting in most cases, which is useful if
        you want to use ORDER BY with this type.
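
A short sketch of sorting on a bioseg column follows. The table name
sort_demo and its rows are hypothetical, chosen so that two intervals share
the same start; the order noted in the comment is what the comparison rule
above implies.

```sql
-- Hypothetical demo table, not part of the bioseg distribution
CREATE TABLE sort_demo (id integer, seg bioseg);
INSERT INTO sort_demo VALUES (1, '100..500'::bioseg);
INSERT INTO sort_demo VALUES (2, '100..200'::bioseg);
INSERT INTO sort_demo VALUES (3, '50..900'::bioseg);

-- Starts are compared first, then ends, so the expected order is:
--   50..900, 100..200, 100..500
SELECT id, seg FROM sort_demo ORDER BY seg;
```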


NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns that will greatly speed up
overlaps and contains queries. For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);
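
The queries below sketch the kind of overlap and containment lookups that
such an index is meant to serve. The rows are hypothetical, and whether the
planner actually uses tt_range_idx depends on table size and statistics; on
a table this small a sequential scan is likely, so EXPLAIN is included only
to show how to check.

```sql
-- Hypothetical sample rows for the tt table defined above
INSERT INTO tt VALUES ('1000..2000'::bioseg, 1);
INSERT INTO tt VALUES ('1500..1600'::bioseg, 2);
INSERT INTO tt VALUES ('5000..6000'::bioseg, 3);

-- Overlap query: rows sharing at least one base with 1200..1800
-- (by the definition of &&, ids 1 and 2)
SELECT id FROM tt WHERE range && '1200..1800'::bioseg;

-- Containment query: rows lying entirely within 1000..2000
-- (by the definition of <@, ids 1 and 2)
SELECT id FROM tt WHERE range <@ '1000..2000'::bioseg;

-- Show the plan chosen for the overlap query
EXPLAIN SELECT id FROM tt WHERE range && '1200..1800'::bioseg;
```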


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as "0
based" or "half-open intervals"), run the build with INTERBASE_COORDS defined
in make, i.e.:

    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then restart PostgreSQL and load "bioseg0.sql" to install the type in a
database:

    psql -d databasename < bioseg0.sql

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
the length of '1..10'::bioseg is 10.
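
The difference can be seen by running the same overlap test against both
types. This sketch assumes that both bioseg.sql and bioseg0.sql have been
loaded into the database being queried; the results in the comments restate
the note above.

```sql
-- Interbase (0-based, half-open): the intervals only touch at position 10
SELECT '1..10'::bioseg0 && '10..20'::bioseg0;   -- expected: false

-- 1-based (inclusive): both intervals include base 10
SELECT '1..10'::bioseg && '10..20'::bioseg;     -- expected: true
```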

See:
    http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.


TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr., author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.
209 implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
210 in the PostgreSQL source). All bugs are due to me (kmr).
211
212
213 AUTHOR
214 ======
215
216 Kim Rutherford <kmr@flymine.org>
217
218 SEG code by Gene Selkov, Jr.