root/bioseg/trunk/README.bioseg
Revision: 22
Committed: Mon Apr 7 10:43:47 2008 UTC by kmr
File size: 6383 byte(s)
Log Message:
Documentation fix.

Line File contents
1 This directory contains the code for the user-defined type,
2 BIOSEG, representing contiguous intervals in biological sequence.
3
4 (Most of this documentation is copied from contrib/seg/README.seg in the
5 PostgreSQL source).
6
7
8 FILES
9 =====
10
11 Makefile building instructions for the shared library
12
13 README.bioseg the file you are now reading
14
15 bioseg.c the implementation of this data type in C
16
17 bioseg.sql.in SQL code needed to register this type with PostgreSQL
18 (transformed to bioseg.sql by make)
19
20 INSTALLATION
21 ============
22
23 Change into the contrib directory in PostgreSQL and unpack the bioseg tar
24 file:
25 gzip -d < bioseg-x.y.tar.gz | tar xf -
26
27 To install the type, change to the bioseg directory and run
28
29 make
30 make install
31
32 The user running "make install" may need root access, depending on the
33 configuration of PostgreSQL. If so, this may work:
34
35 sudo make install
36
37 This only installs the type implementation and documentation. To make the
38 type available in any particular database, do
39
40 psql -d databasename < bioseg.sql
41
42 If you install the type in the template1 database, all subsequently created
43 databases will inherit it.
44
45 To test the new type, after "make install" do
46
47 make installcheck
48
49 If it fails, examine the file regression.diffs to find out the reason (the
50 test code is a direct adaptation of the regression tests from the main
51 source tree).
52
53 If you have a full installation of PostgreSQL, including the pg_config
54 program, bioseg can be unpacked anywhere and built like:
55
56 make USE_PGXS=t
57 make install USE_PGXS=t
58
59 and the type can then be installed in a particular database by any user with:
60
61 psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql
62
63
64 SYNTAX
65 ======
66
67 The user visible representation of an interval is formed using one or two
68 integers greater than 0 joined by the range operator ('..' or '...').
69 The first integer must be less than or equal to the second.
70
71 11..22 An interval from 11 to 22 inclusive - length 12 (= 22-11+1)
72
73 1...2 The same as 1..2
74
75 50 The same as 50..50
76
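The rules above can be modelled outside the database. The following is a
minimal Python sketch of the documented text format, not the parser in
bioseg.c. The names parse_bioseg and length are hypothetical, and details
such as the handling of surrounding whitespace are assumptions.

```python
import re

# Hypothetical helper: models the documented text format only.
# One integer, optionally followed by '..' or '...' and a second integer.
_BIOSEG_RE = re.compile(r"^([0-9]+)(?:\.{2,3}([0-9]+))?$")


def parse_bioseg(text):
    """Return (start, end) for '<start>..<end>', '<start>...<end>' or '<start>'."""
    match = _BIOSEG_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a bioseg literal: {text!r}")
    start = int(match.group(1))
    # A single integer n is shorthand for n..n.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("positions must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def length(start, end):
    """Number of bases covered by a 1-based, inclusive interval."""
    return end - start + 1


start, end = parse_bioseg("11..22")
print(start, end, length(start, end))  # prints: 11 22 12
```
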
77 In a statement, bioseg values have the form:
78 '<start>..<end>'::bioseg
79 or can be created with:
80 bioseg_create(start, end)
81
82 For example:
83 CREATE TABLE test_bioseg (id integer, seg bioseg);
84 insert into test_bioseg values (1, '1000..2000'::bioseg);
85 or, equivalently
86 insert into test_bioseg values (1, bioseg_create(1000, 2000));
87
88
89 USAGE
90 =====
91
92 See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
93 examples.
94
95 The following is a list of the available operators. The [a, b] should be
96 replaced in a statement with 'a..b'::bioseg or bioseg_create(a, b).
97
98 [a, b] && [c, d] Overlaps
99
100 Returns true if and only if segments [a, b] and [c, d] overlap
101
102 [a, b] << [c, d] Is left of
103
104 The left operand, [a, b], occurs entirely to the left of the
105 right operand, [c, d]. That is, [a, b] << [c, d] is true if b
106 < c and false otherwise.
107
108 [a, b] >> [c, d] Is right of
109
110 [a, b] occurs entirely to the right of [c, d].
111 [a, b] >> [c, d] is true if a > d and false otherwise.
112
113 [a, b] &< [c, d] Overlaps or is left of
114
115 This might be better read as "does not extend to right of".
116 It is true when b <= d.
117
118 [a, b] &> [c, d] Overlaps or is right of
119
120 This might be better read as "does not extend to left of".
121 It is true when a >= c.
122
123 [a, b] = [c, d] Same as
124
125 The segments [a, b] and [c, d] are identical, that is, a == c
126 and b == d
127
128 [a, b] @> [c, d] Contains
129
130 The segment [a, b] contains the segment [c, d], that is,
131 a <= c and b >= d
132
133 [a, b] <@ [c, d] Contained in
134
135 The segment [a, b] is contained in [c, d], that is,
136 a >= c and b <= d
137
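The operator definitions above can be written down as plain predicates. This
is a Python sketch of the documented semantics for the 1-based bioseg type,
not the C implementation. The Seg class and the function names are
hypothetical. The overlap test is an assumption derived from the intervals
being inclusive at both ends; the other predicates follow the conditions
listed above.

```python
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a 1-based, inclusive bioseg value."""
    start: int
    end: int


def overlaps(x, y):       # x && y (assumed: the two share at least one base)
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):        # x << y
    return x.end < y.start


def right_of(x, y):       # x >> y
    return x.start > y.end


def not_right_of(x, y):   # x &< y, "does not extend to right of"
    return x.end <= y.end


def not_left_of(x, y):    # x &> y, "does not extend to left of"
    return x.start >= y.start


def same(x, y):           # x = y
    return x.start == y.start and x.end == y.end


def contains(x, y):       # x @> y
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):   # x <@ y
    return contains(y, x)


gene = Seg(1000, 2000)
exon = Seg(1200, 1300)
print(contains(gene, exon), overlaps(gene, exon), left_of(exon, gene))
# prints: True True False
```
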
138 Although the mnemonics of the following operators are questionable, I
139 preserved them to maintain visual consistency with other geometric
140 data types defined in PostgreSQL.
141
142 Other operators:
143
144 [a, b] < [c, d] Less than
145 [a, b] > [c, d] Greater than
146
147 These operators do not make a lot of sense for any practical
148 purpose but sorting. These operators first compare (a) to (c),
149 and if these are equal, compare (b) to (d). That accounts for
150 reasonably good sorting in most cases, which is useful if
151 you want to use ORDER BY with this type
152
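The ordering described above (compare the starts, then the ends when the
starts are equal) is ordinary lexicographic ordering on (start, end) pairs. A
minimal Python sketch, using plain tuples as hypothetical stand-ins for bioseg
values:

```python
def less_than(x, y):
    """Model of [a, b] < [c, d]: compare a to c, then b to d on a tie."""
    if x[0] != y[0]:
        return x[0] < y[0]
    return x[1] < y[1]


segments = [(5, 9), (1, 20), (5, 7), (1, 3)]

# Python compares tuples element by element, which gives the same order.
ordered = sorted(segments)
print(ordered)  # prints: [(1, 3), (1, 20), (5, 7), (5, 9)]
```
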
153
154 NOTE: The performance of an R-tree index can largely depend on the
155 order of input values. It may be very helpful to sort the input table
156 on the BIOSEG column (see the script sort-segments.pl for an example)
157
158
159 INDEXES
160 =======
161
162 A GiST index can be created for bioseg columns that will greatly speed up
163 overlaps and contains queries. For example:
164
165 CREATE TABLE tt (range bioseg, id integer);
166 CREATE INDEX tt_range_idx ON tt USING gist (range);
167
168
169 INTERBASE COORDINATES
170 =====================
171
172 The standard bioseg type uses the common convention of numbering the bases
173 starting at 1. If you wish to use "interbase" coordinates (also known as "0
174 based" or "half-open intervals"), run the build with INTERBASE_COORDS defined
175 in make, i.e.:
176
177 make INTERBASE_COORDS=t
178 make install INTERBASE_COORDS=t
179
180 This will compile and install the implementation for the "bioseg0" type.
181 The "0" in the name is a mnemonic for "0-based".
182
183 Then read "bioseg0.sql" into your database:
184 psql -d databasename < bioseg0.sql
185 to install the type.
186
187 The bioseg and bioseg0 types can be mixed in the same database.
188
189 Note
190 ----
191 In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
192 whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
193 one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
194 length of '1..10'::bioseg is 10.
195
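The difference between the two systems can be checked with a little
arithmetic. The following Python sketch models the behaviour described in this
note and is not taken from bioseg.c. All function names are hypothetical, the
overlap tests are assumptions that only consider non-empty intervals, and
to_interbase shows the usual conversion convention (only the start shifts),
which is not a function provided by bioseg.

```python
def overlaps_1based(a, b, c, d):
    """1-based, inclusive: sharing an end position counts as overlap."""
    return a <= d and c <= b


def overlaps_interbase(a, b, c, d):
    """Interbase (half-open): touching at a boundary is not overlap."""
    return a < d and c < b


def length_1based(a, b):
    return b - a + 1


def length_interbase(a, b):
    return b - a


def to_interbase(start, end):
    """Convert a 1-based inclusive interval to interbase coordinates."""
    return start - 1, end


print(overlaps_1based(1, 10, 10, 20))     # prints: True
print(overlaps_interbase(1, 10, 10, 20))  # prints: False
print(length_1based(1, 10), length_interbase(1, 10))  # prints: 10 9
print(to_interbase(1, 10))                # prints: (0, 10)
```
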
196 See:
197 http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
198 for a longer discussion of the differences between the coordinate systems.
199
200
201 TESTS
202 =====
203
204 The installation of bioseg can be checked by running:
205
206 make installcheck
207
208
209 CREDITS
210 =======
211
212 Note from the author: Most of the code and all of the hard work needed to
213 implement BIOSEG was done by Gene Selkov, Jr, author of the SEG type (contrib/seg
214 in the PostgreSQL source). All bugs are due to me (kmr).
215
216
217 AUTHOR
218 ======
219
220 Kim Rutherford <kmr@flymine.org>
221
222 SEG code by Gene Selkov, Jr.