root/bioseg/trunk/README.bioseg
Revision: 22
Committed: Mon Apr 7 10:43:47 2008 UTC (11 years, 1 month ago) by kmr
File size: 6383 byte(s)
Log Message:
Documentation fix.

Line User Rev File contents
1 kmr 1 This directory contains the code for the user-defined type,
2     BIOSEG, representing contiguous intervals in biological sequence.
3    
4     (Most of this documentation is copied from contrib/seg/README.seg in the
5     PostgreSQL source).
6    
7    
8     FILES
9     =====
10    
11     Makefile building instructions for the shared library
12    
13     README.bioseg the file you are now reading
14    
15     bioseg.c the implementation of this data type in C
16    
17     bioseg.sql.in SQL code needed to register this type with PostgreSQL
18     (transformed to bioseg.sql by make)
19    
20     INSTALLATION
21     ============
22    
23     Change into the contrib directory in PostgreSQL and unpack the bioseg tar
24     file:
25     gzip -d < bioseg-x.y.tar.gz | tar xf -
26    
27     To install the type, change to the bioseg directory and run
28    
29     make
30     make install
31    
32     The user running "make install" may need root access, depending on the
33 kmr 22 configuration of PostgreSQL. If so, this may work:
34 kmr 1
35 kmr 22 sudo make install
36    
37 kmr 1 This only installs the type implementation and documentation. To make the
38     type available in any particular database, do
39    
40     psql -d databasename < bioseg.sql
41    
42     If you install the type in the template1 database, all subsequently created
43     databases will inherit it.
44    
45     To test the new type, after "make install" do
46    
47     make installcheck
48    
49     If it fails, examine the file regression.diffs to find out the reason (the
50     test code is a direct adaptation of the regression tests from the main
51     source tree).
52    
53     If you have a full installation of PostgreSQL, including the pg_config
54     program, bioseg can be unpacked anywhere and built like:
55    
56 kmr 22 make USE_PGXS=t
57     make install USE_PGXS=t
58 kmr 1
59     and the type can then be installed in a particular database by any user with:
60    
61     psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql
62    
63    
64     SYNTAX
65     ======
66    
67 kmr 7 The user-visible representation of an interval is formed using one or two
68 kmr 1 integers greater than 0 joined by the range operator ('..' or '...').
69     The first integer must be less than or equal to the second.
70    
71 kmr 7 11..22 An interval from 11 to 22 inclusive - length 12 (= 22-11+1)
72 kmr 1
73 kmr 7 1...2 The same as 1..2
74 kmr 1
75 kmr 7 50 The same as 50..50
76 kmr 1
77 kmr 7 In a statement, bioseg values have the form:
78     '<start>..<end>'::bioseg
79     or can be created with:
80     bioseg_create(start, end)
81 kmr 1
82 kmr 7 For example:
83     CREATE TABLE test_bioseg (id integer, seg bioseg);
84     insert into test_bioseg values (1, '1000..2000'::bioseg);
85     or, equivalently
86     insert into test_bioseg values (1, bioseg_create(1000, 2000));
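
The textual rules above can be sketched in a few lines of Python. This is only
an illustration of the syntax as described in this README, not the parser in
bioseg.c; the names parse_bioseg and length are hypothetical, and the sketch
assumes no whitespace is allowed inside the value.

```python
import re

# Hypothetical helper, not part of bioseg: one or two positive integers
# joined by '..' or '...'.
_BIOSEG_RE = re.compile(r"([0-9]+)(?:\.{2,3}([0-9]+))?")


def parse_bioseg(text):
    """Return (start, end) for input such as '11..22', '1...2' or '50'."""
    match = _BIOSEG_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    start = int(match.group(1))
    # A single integer N is shorthand for N..N.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("coordinates must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def length(start, end):
    """Number of bases covered by a 1-based inclusive interval."""
    return end - start + 1
```

With these definitions, parse_bioseg("11..22") gives (11, 22) and
length(11, 22) gives 12.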
87    
88    
89 kmr 1 USAGE
90     =====
91    
92 kmr 18 See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
93     examples.
94    
95 kmr 7 The following is a list of the available operators. The [a, b] should be
96 kmr 18 replaced in a statement with 'a..b'::bioseg or bioseg_create(a, b).
97 kmr 1
98     [a, b] && [c, d] Overlaps
99    
100     Returns true if and only if segments [a, b] and [c, d] overlap
101    
102     [a, b] << [c, d] Is left of
103    
104     The left operand, [a, b], occurs entirely to the left of the
105     right operand, [c, d]. That is, [a, b] << [c, d] is true if
106     b < c and false otherwise.
107    
108     [a, b] >> [c, d] Is right of
109    
110     [a, b] occurs entirely to the right of [c, d].
111     [a, b] >> [c, d] is true if a > d and false otherwise.
112    
113     [a, b] &< [c, d] Overlaps or is left of
114    
115     This might be better read as "does not extend to right of".
116     It is true when b <= d.
117    
118     [a, b] &> [c, d] Overlaps or is right of
119    
120     This might be better read as "does not extend to left of".
121     It is true when a >= c.
122    
123     [a, b] = [c, d] Same as
124    
125     The segments [a, b] and [c, d] are identical, that is, a == c
126     and b == d
127    
128     [a, b] @> [c, d] Contains
129    
130     The segment [a, b] contains the segment [c, d], that is,
131     a <= c and b >= d
132    
133     [a, b] <@ [c, d] Contained in
134    
135     The segment [a, b] is contained in [c, d], that is,
136     a >= c and b <= d
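
As a rough model of the operator definitions above, the comparisons can be
written out in Python. The Seg tuple and the function names are hypothetical
stand-ins, not the C implementation. The overlap test is an assumption for
1-based inclusive intervals: two segments overlap when neither lies entirely
to one side of the other.

```python
from collections import namedtuple

# Hypothetical stand-in for a bioseg value; not the extension's C struct.
Seg = namedtuple("Seg", ["start", "end"])


def overlaps(x, y):        # x && y (assumed closed-interval test)
    return x.start <= y.end and y.start <= x.end


def left_of(x, y):         # x << y
    return x.end < y.start


def right_of(x, y):        # x >> y
    return x.start > y.end


def not_right_of(x, y):    # x &< y, "does not extend to right of"
    return x.end <= y.end


def not_left_of(x, y):     # x &> y, "does not extend to left of"
    return x.start >= y.start


def same(x, y):            # x = y
    return x.start == y.start and x.end == y.end


def contains(x, y):        # x @> y
    return x.start <= y.start and x.end >= y.end


def contained_in(x, y):    # x <@ y
    return contains(y, x)
```

For example, Seg(1, 10) and Seg(10, 20) overlap under this model, while
Seg(1, 10) is left of Seg(11, 20).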
137    
138     Although the mnemonics of the following operators are questionable, I
139     preserved them to maintain visual consistency with other geometric
140     data types defined in PostgreSQL.
141    
142     Other operators:
143    
144     [a, b] < [c, d] Less than
145     [a, b] > [c, d] Greater than
146    
147     These operators do not make a lot of sense for any practical
148     purpose but sorting. These operators first compare (a) to (c),
149     and if these are equal, compare (b) to (d). That accounts for
150     reasonably good sorting in most cases, which is useful if
151     you want to use ORDER BY with this type
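
The ordering described above (compare the starts, then the ends) matches plain
tuple ordering. The following sketch shows it in Python; the function name and
the sample values are made up for illustration.

```python
def less_than(x, y):
    """Model of [a, b] < [c, d]: compare a to c, then b to d on a tie."""
    a, b = x
    c, d = y
    if a != c:
        return a < c
    return b < d


# Intervals as (start, end) pairs; values invented for this example.
segments = [(100, 200), (50, 300), (100, 150), (50, 60)]

# Sorting by (start, end) gives the same order as repeated less_than tests.
ordered = sorted(segments, key=lambda seg: (seg[0], seg[1]))
```

Here ordered is [(50, 60), (50, 300), (100, 150), (100, 200)].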
152    
153    
154     NOTE: The performance of an R-tree index can largely depend on the
155     order of input values. It may be very helpful to sort the input table
156     on the BIOSEG column (see the script sort-segments.pl for an example)
157    
158    
159     INDEXES
160     =======
161    
162     A GiST index can be created for bioseg columns that will greatly speed up
163     overlaps and contains queries. For example:
164    
165     CREATE TABLE tt (range bioseg, id integer);
166     CREATE INDEX tt_range_idx ON tt USING gist (range);
167    
168    
169     INTERBASE COORDINATES
170     =====================
171    
172     The standard bioseg type uses the common convention of numbering the bases
173     starting at 1. If you wish to use "interbase" coordinates (also known as "0
174     based" or "half-open intervals"), run the build with INTERBASE_COORDS defined
175     in make, i.e.:
176    
177     make INTERBASE_COORDS=t
178     make install INTERBASE_COORDS=t
179    
180     This will compile and install the implementation for the "bioseg0" type.
181 kmr 22 The "0" in the name is a mnemonic for "0-based".
182 kmr 1
183 kmr 22 Then read "bioseg0.sql" into your database:
184 kmr 5 psql -d databasename < bioseg0.sql
185 kmr 22 to install the type.
186 kmr 1
187 kmr 22 The bioseg and bioseg0 types can be mixed in the same database.
188    
189 kmr 1 Note
190     ----
191     In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
192     whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
193     one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
194     length of '1..10'::bioseg is 10.
195    
196     See:
197     http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
198     for a longer discussion of the differences between the coordinate systems.
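
The difference between the two systems in the note above can be sketched in
Python. The function names are hypothetical and this is not code from
bioseg.c. The conversion shown is the usual convention (subtract 1 from the
start, keep the end), stated here as an assumption rather than as something
the extension provides.

```python
def length_one_based(start, end):
    """Length of a 1-based inclusive interval (bioseg)."""
    return end - start + 1


def length_interbase(start, end):
    """Length of an interbase (half-open) interval (bioseg0)."""
    return end - start


def overlaps_one_based(a, b, c, d):
    """[a, b] and [c, d] share at least one base; ends are inclusive."""
    return a <= d and c <= b


def overlaps_interbase(a, b, c, d):
    """Half-open intervals that merely touch at a boundary do not overlap."""
    return a < d and c < b


def one_based_to_interbase(start, end):
    """Assumed conversion: the same bases expressed in interbase coordinates."""
    return start - 1, end
```

Under this model overlaps_interbase(1, 10, 10, 20) is False while
overlaps_one_based(1, 10, 10, 20) is True, and the lengths of 1..10 are 9 and
10 respectively, in agreement with the note.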
199    
200    
201     TESTS
202     =====
203    
204     The installation of bioseg can be checked by running:
205    
206     make installcheck
207    
208    
209     CREDITS
210     =======
211    
212     Note from the author: Most of the code and all of the hard work needed to
213     implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
214 kmr 7 in the PostgreSQL source). All bugs are due to me (kmr).
215 kmr 1
216    
217     AUTHOR
218     ======
219    
220     Kim Rutherford <kmr@flymine.org>
221 kmr 7
222     SEG code by Gene Selkov, Jr.