root/bioseg/trunk/README.bioseg
Revision: 27
Committed: Wed Aug 20 20:10:26 2008 UTC (10 years, 4 months ago) by kmr
File size: 6626 byte(s)
Log Message:
Fixed typo in README.

Line File contents
1 This directory contains the code for the user-defined type BIOSEG,
2 which represents contiguous intervals in biological sequences.
3
4 (Most of this documentation is copied from contrib/seg/README.seg in the
5 PostgreSQL source).
6
7
8 FILES
9 =====
10
11 Makefile         building instructions for the shared library
12
13 README.bioseg    the file you are now reading
14
15 bioseg.c         the implementation of this data type in C
16
17 bioseg.sql.in    SQL code needed to register this type with PostgreSQL
18                  (transformed to bioseg.sql by make)
19
20 INSTALLATION
21 ============
22
23 Change into the contrib directory of the PostgreSQL source tree and unpack
24 the bioseg tar file:
25 gzip -d < bioseg-x.y.tar.gz | tar xf -
26
27 (Or check out from Subversion with:
28 svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
29 in the contrib directory.)
30
31 To install the type, change to the bioseg directory and run
32
33 make
34 make install
35
36 The user running "make install" may need root access, depending on the
37 configuration of PostgreSQL. If so, this may work:
38
39 sudo make install
40
41 This only installs the type implementation and documentation. To make the
42 type available in any particular database, do
43
44 psql -d databasename < bioseg.sql
45
46 If you install the type in the template1 database, all subsequently created
47 databases will inherit it.
48
49 To test the new type, after "make install" do
50
51 make installcheck
52
53 If it fails, examine the file regression.diffs to find out the reason (the
54 test code is a direct adaptation of the regression tests from the main
55 source tree).
56
57 If you have a full installation of PostgreSQL, including the pg_config
58 program, bioseg can be unpacked anywhere and built with:
59
60 make USE_PGXS=t
61 make install USE_PGXS=t
62 (or: sudo make install USE_PGXS=t)
63
64 and the type can then be installed in a particular database by any user with:
65
66 psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql
67
68
69 SYNTAX
70 ======
71
72 The user-visible representation of an interval is a single integer, or two
73 integers joined by the range operator ('..' or '...'). Each integer must be
74 greater than 0, and the first must be less than or equal to the second.
75
76 11..22    An interval from 11 to 22 inclusive; length 12 (= 22 - 11 + 1)
77
78 1...2     The same as 1..2
79
80 50        The same as 50..50
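
As a rough illustration of the syntax rules above, the following Python
sketch parses the text form of an interval and computes its length. It is
not bioseg's actual parser (that lives in bioseg.c); the names parse_bioseg
and seg_length are hypothetical, and rejecting surrounding whitespace is an
assumption made here for simplicity.

```python
import re

# Hypothetical model of the text syntax; not taken from bioseg.c.
# One integer, optionally followed by '..' or '...' and a second integer.
_SEG_RE = re.compile(r"(\d+)(?:\.{2,3}(\d+))?")


def parse_bioseg(text):
    """Return (start, end) for strings such as '11..22', '1...2' or '50'."""
    match = _SEG_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    start = int(match.group(1))
    # A single integer n is shorthand for n..n.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("positions must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def seg_length(seg):
    """Length of a 1-based, inclusive interval."""
    start, end = seg
    return end - start + 1


print(parse_bioseg("11..22"))              # (11, 22)
print(seg_length(parse_bioseg("11..22")))  # 12
print(parse_bioseg("50"))                  # (50, 50)
```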
81
82 In a statement, bioseg values have the form:
83 '<start>..<end>'::bioseg
84 or can be created with:
85 bioseg_create(start, end)
86
87 For example:
88 CREATE TABLE test_bioseg (id integer, seg bioseg);
89 INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);
90 or, equivalently
91 INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));
92
93
94 USAGE
95 =====
96
97 See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
98 examples.
99
100 The following is a list of the available operators. In a statement, replace
101 each [a, b] with 'a..b'::bioseg or bioseg_create(a, b).
102
103 [a, b] && [c, d] Overlaps
104
105 Returns true if and only if segments [a, b] and [c, d] overlap
106
107 [a, b] << [c, d] Is left of
108
109 The left operand, [a, b], occurs entirely to the left of the
110 right operand, [c, d]. That is, [a, b] << [c, d] is true if
111 b < c and false otherwise.
112
113 [a, b] >> [c, d] Is right of
114
115 [a, b] occurs entirely to the right of [c, d].
116 [a, b] >> [c, d] is true if a > d and false otherwise.
117
118 [a, b] &< [c, d] Overlaps or is left of
119
120 This might be better read as "does not extend to right of".
121 It is true when b <= d.
122
123 [a, b] &> [c, d] Overlaps or is right of
124
125 This might be better read as "does not extend to left of".
126 It is true when a >= c.
127
128 [a, b] = [c, d] Same as
129
130 The segments [a, b] and [c, d] are identical, that is, a == c
131 and b == d
132
133 [a, b] @> [c, d] Contains
134
135 The segment [a, b] contains the segment [c, d], that is,
136 a <= c and b >= d
137
138 [a, b] <@ [c, d] Contained in
139
140 The segment [a, b] is contained in [c, d], that is,
141 a >= c and b <= d
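
The operator definitions above can be summarised as predicates on the two
endpoints. The Python sketch below is only a model of those rules, not the
C implementation: the Seg class and the function names are hypothetical,
and the overlap test assumes 1-based, inclusive intervals that overlap
when they share at least one base.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a 1-based, inclusive bioseg [start, end]."""
    start: int
    end: int


def overlaps(x, y):       # x && y (assumed: at least one shared base)
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):        # x << y
    return x.end < y.start


def right_of(x, y):       # x >> y
    return x.start > y.end


def not_right_of(x, y):   # x &< y, "does not extend to right of"
    return x.end <= y.end


def not_left_of(x, y):    # x &> y, "does not extend to left of"
    return x.start >= y.start


def contains(x, y):       # x @> y
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):   # x <@ y
    return contains(y, x)


# "Same as" (=) is plain tuple equality: both endpoints must match.
gene = Seg(1000, 2000)
exon = Seg(1200, 1300)
print(contains(gene, exon))               # True
print(overlaps(Seg(1, 10), Seg(10, 20)))  # True: base 10 is shared
print(left_of(Seg(1, 10), Seg(10, 20)))   # False: 10 < 10 does not hold
```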
142
143 Although the mnemonics of the following operators are questionable, I
144 preserved them to maintain visual consistency with other geometric
145 data types defined in PostgreSQL.
146
147 Other operators:
148
149 [a, b] < [c, d] Less than
150 [a, b] > [c, d] Greater than
151
152 These operators make little practical sense except for
153 sorting. They first compare (a) to (c) and, if these are
154 equal, compare (b) to (d). This gives reasonably good
155 sorting in most cases, which is useful if you want to use
156 ORDER BY with this type.
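
The ordering described above (start first, then end) can be sketched in
Python as follows. This is an illustration of the comparison rule only;
the compare function and the sample (start, end) pairs are hypothetical.

```python
from functools import cmp_to_key


def compare(x, y):
    """Compare starts first; use the ends only when the starts are equal."""
    (a, b), (c, d) = x, y
    if a != c:
        return -1 if a < c else 1
    return (b > d) - (b < d)


# Hypothetical (start, end) pairs standing in for bioseg values.
segs = [(200, 300), (100, 500), (100, 150), (50, 400)]
ordered = sorted(segs, key=cmp_to_key(compare))
print(ordered)  # [(50, 400), (100, 150), (100, 500), (200, 300)]
```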
157
158
159 NOTE: The performance of an R-tree index can depend heavily on the
160 order of input values. It may be very helpful to sort the input table
161 on the BIOSEG column (see the script sort-segments.pl for an example).
162
163
164 INDEXES
165 =======
166
167 A GiST index can be created for bioseg columns that will greatly speed up
168 overlaps and contains queries. For example:
169
170 CREATE TABLE tt (range bioseg, id integer);
171 CREATE INDEX tt_range_idx ON tt USING gist (range);
172
173
174 INTERBASE COORDINATES
175 =====================
176
177 The standard bioseg type uses the common convention of numbering the bases
178 starting at 1. If you wish to use "interbase" coordinates (also known as
179 "0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
180 defined in make, i.e.:
181
182 make INTERBASE_COORDS=t
183 make install INTERBASE_COORDS=t
184
185 This will compile and install the implementation for the "bioseg0" type.
186 The "0" in the name is a mnemonic for "0-based".
187
188 Then read "bioseg0.sql" into your database:
189 psql -d databasename < bioseg0.sql
190 to install the type.
191
192 The bioseg and bioseg0 types can be mixed in the same database.
193
194 Note
195 ----
196 In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
197 whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
198 one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
199 the length of '1..10'::bioseg is 10.
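
The two examples in this note can be reproduced with a small Python
sketch. It assumes that interbase intervals behave as half-open ranges, so
the length is end - start and the overlap test uses strict inequalities.
These rules are inferred from the examples above rather than taken from
bioseg.c, and all function names are hypothetical.

```python
def length_one_based(start, end):
    """Length of a 1-based, inclusive interval (bioseg)."""
    return end - start + 1


def length_interbase(start, end):
    """Length of an interbase interval (bioseg0), assumed half-open."""
    return end - start


def overlaps_one_based(a, b, c, d):
    """[a, b] and [c, d] share at least one base."""
    return a <= d and c <= b


def overlaps_interbase(a, b, c, d):
    """Assumed rule: touching at a single boundary is not an overlap."""
    return a < d and c < b


print(length_one_based(1, 10))            # 10
print(length_interbase(1, 10))            # 9
print(overlaps_one_based(1, 10, 10, 20))  # True
print(overlaps_interbase(1, 10, 10, 20))  # False
```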
200
201 See:
202 http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
203 for a longer discussion of the differences between the coordinate systems.
204
205
206 TESTS
207 =====
208
209 The installation of bioseg can be checked by running:
210
211 make installcheck
212
213
214 CREDITS
215 =======
216
217 Note from the author: Most of the code and all of the hard work needed to
218 implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
219 in the PostgreSQL source). All bugs are due to me (kmr).
220
221
222 THANKS
223 ======
224
225 Thanks to bioinformatics.org for hosting the project.
226
227
228 AUTHOR
229 ======
230
231 Kim Rutherford <kmr@flymine.org>
232
233 SEG code by Gene Selkov, Jr.