root/bioseg/trunk/README.bioseg
Revision: 7
Committed: Sat Aug 18 14:25:34 2007 UTC by kmr
File size: 6214 byte(s)
Log Message:
README improvements.

This directory contains the code for the user-defined type BIOSEG,
representing contiguous intervals in biological sequences.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source.)


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:

    gzip -d < bioseg-x.y.tar.gz | tar xf -

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL.

This only installs the type implementation and documentation. To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built with:

    make USE_PGXS=t
    make install USE_PGXS=t

and the type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

    11..22      An interval from 11 to 22 inclusive: length 12 (= 22-11+1)

    1...2       The same as 1..2

    50          The same as 50..50

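To make the syntax rules above concrete, here is a minimal C sketch of a parser for the text form. It is not the parser in bioseg.c: the struct seg_t and the functions parse_bioseg() and bioseg_length() are hypothetical names used only for illustration, and the sketch assumes the 1-based bioseg type (integers of at least 1, closed intervals).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of a 1-based, closed bioseg. */
typedef struct {
    int lower; /* first base, inclusive */
    int upper; /* last base, inclusive */
} seg_t;

/* Parse "a..b", "a...b" or "a" (meaning "a..a"). Returns false for
 * malformed text, for integers below 1, or when a > b. */
static bool parse_bioseg(const char *text, seg_t *out)
{
    char *end;
    long lower = strtol(text, &end, 10);
    if (end == text || lower < 1)
        return false;

    long upper = lower;                    /* single integer: n..n */
    if (*end == '.') {
        /* the range operator may be written ".." or "..." */
        if (strncmp(end, "...", 3) == 0)
            end += 3;
        else if (strncmp(end, "..", 2) == 0)
            end += 2;
        else
            return false;
        const char *second = end;
        upper = strtol(second, &end, 10);
        if (end == second)
            return false;
    }
    /* reject trailing junk, reversed bounds and values that overflow int */
    if (*end != '\0' || upper < lower || upper > INT_MAX)
        return false;

    out->lower = (int) lower;
    out->upper = (int) upper;
    return true;
}

/* Length of a 1-based closed interval, e.g. 11..22 -> 22 - 11 + 1 = 12. */
static int bioseg_length(seg_t s)
{
    assert(s.lower <= s.upper);
    return s.upper - s.lower + 1;
}
```

With this sketch, parse_bioseg("11..22", &s) stores 11 and 22 and bioseg_length(s) gives 12, while "22..11" and "0..5" are rejected. Because it relies on strtol(), it also tolerates leading whitespace and a '+' sign, which a stricter parser might refuse.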
In a statement, bioseg values have the form:

    '<start>..<end>'::bioseg

or can be created with:

    bioseg_create(start, end)

For example:

    CREATE TABLE test_bioseg (id integer, seg bioseg);
    INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);

or, equivalently:

    INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));


USAGE
=====

The following is a list of the available operators. In a statement,
replace [a, b] with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]        Overlaps

        Returns true if and only if segments [a, b] and [c, d] overlap;
        for the 1-based bioseg type that means a <= d and c <= b.

[a, b] << [c, d]        Is left of

        The left operand, [a, b], occurs entirely to the left of the
        right operand, [c, d]. That is, [a, b] << [c, d] is true if
        b < c and false otherwise.

[a, b] >> [c, d]        Is right of

        [a, b] occurs entirely to the right of [c, d].
        [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]        Overlaps or is left of

        This might be better read as "does not extend to the right of".
        It is true when b <= d.

[a, b] &> [c, d]        Overlaps or is right of

        This might be better read as "does not extend to the left of".
        It is true when a >= c.

[a, b] = [c, d]         Same as

        The segments [a, b] and [c, d] are identical, that is, a == c
        and b == d.

[a, b] @> [c, d]        Contains

        The segment [a, b] contains the segment [c, d], that is,
        a <= c and b >= d.

[a, b] <@ [c, d]        Contained in

        The segment [a, b] is contained in [c, d], that is,
        a >= c and b <= d.
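The conditions listed above can be written as plain C predicates. This is a sketch for the 1-based bioseg type only, not code taken from bioseg.c; seg_t, seg() and the seg_* function names are hypothetical. For Overlaps it uses the closed-interval test a <= d and c <= b.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of a 1-based, closed bioseg [lower, upper]. */
typedef struct {
    int lower;
    int upper;
} seg_t;

/* Build a segment, checking the same rules as the text syntax. */
static seg_t seg(int lower, int upper)
{
    assert(lower >= 1 && lower <= upper);
    seg_t s = { lower, upper };
    return s;
}

/* [a, b] && [c, d]: the closed intervals share at least one base */
static bool seg_overlaps(seg_t x, seg_t y) { return x.lower <= y.upper && y.lower <= x.upper; }
/* [a, b] << [c, d]: b < c */
static bool seg_left(seg_t x, seg_t y) { return x.upper < y.lower; }
/* [a, b] >> [c, d]: a > d */
static bool seg_right(seg_t x, seg_t y) { return x.lower > y.upper; }
/* [a, b] &< [c, d]: b <= d ("does not extend to the right of") */
static bool seg_over_left(seg_t x, seg_t y) { return x.upper <= y.upper; }
/* [a, b] &> [c, d]: a >= c ("does not extend to the left of") */
static bool seg_over_right(seg_t x, seg_t y) { return x.lower >= y.lower; }
/* [a, b] = [c, d]: a == c and b == d */
static bool seg_same(seg_t x, seg_t y) { return x.lower == y.lower && x.upper == y.upper; }
/* [a, b] @> [c, d]: a <= c and b >= d */
static bool seg_contains(seg_t x, seg_t y) { return x.lower <= y.lower && x.upper >= y.upper; }
/* [a, b] <@ [c, d]: a >= c and b <= d, i.e. [c, d] contains [a, b] */
static bool seg_contained(seg_t x, seg_t y) { return seg_contains(y, x); }
```

Note that &< and &> each test only one end: in this sketch seg_over_left(seg(1, 10), seg(5, 10)) is true even though 1..10 starts further left than 5..10.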

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]         Less than
[a, b] > [c, d]         Greater than

        These operators do not make a lot of sense for any practical
        purpose but sorting. They first compare a to c and, if these
        are equal, compare b to d. That gives reasonably good sorting
        in most cases, which is useful if you want to use ORDER BY
        with this type.
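The ordering used by < and > (compare the starts, then the ends) can be sketched as a qsort() comparator. seg_t, seg_cmp() and sort_segs() are hypothetical names; inside the database ORDER BY relies on the type's own operators, so this only mirrors the rule in C, for example to pre-sort data before loading it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical representation of a bioseg [lower, upper]. */
typedef struct {
    int lower;
    int upper;
} seg_t;

/* Order by lower bound first, then by upper bound.
 * Returns <0, 0 or >0 as qsort() expects. */
static int seg_cmp(const void *pa, const void *pb)
{
    const seg_t *a = pa;
    const seg_t *b = pb;
    if (a->lower != b->lower)
        return (a->lower < b->lower) ? -1 : 1;
    if (a->upper != b->upper)
        return (a->upper < b->upper) ? -1 : 1;
    return 0;
}

/* Sort an array of segments in place, e.g. before bulk-loading a table. */
static void sort_segs(seg_t *segs, size_t n)
{
    assert(segs != NULL || n == 0);
    qsort(segs, n, sizeof *segs, seg_cmp);
}
```

Sorting 5..9, 1..20, 5..6 and 1..3 this way gives 1..3, 1..20, 5..6, 5..9: ties on the start are broken by the end.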


NOTE: The performance of an R-tree index can depend heavily on the order
of input values. It may be very helpful to sort the input table on the
BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created on bioseg columns; it will greatly speed up
overlaps and contains queries. For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);
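Roughly, a GiST index over intervals keeps, for each subtree, an interval covering every key below it, and uses how much that covering interval would have to grow to decide where a new key goes. The C sketch below illustrates that idea under the assumption of a union/penalty scheme like the one used for the SEG type; seg_union(), seg_penalty() and subtree_may_overlap() are hypothetical helpers, not the actual GiST support code in bioseg.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of a 1-based, closed bioseg [lower, upper]. */
typedef struct {
    int lower;
    int upper;
} seg_t;

/* Bounding interval of n keys: what an internal index entry would store. */
static seg_t seg_union(const seg_t *keys, size_t n)
{
    assert(n > 0);
    seg_t u = keys[0];
    for (size_t i = 1; i < n; i++) {
        if (keys[i].lower < u.lower) u.lower = keys[i].lower;
        if (keys[i].upper > u.upper) u.upper = keys[i].upper;
    }
    return u;
}

/* Number of bases in a 1-based closed interval. */
static int seg_size(seg_t s) { return s.upper - s.lower + 1; }

/* Cost of adding key to a subtree bounded by orig:
 * how many bases the bounding interval would have to grow by. */
static int seg_penalty(seg_t orig, seg_t key)
{
    seg_t pair[2] = { orig, key };
    return seg_size(seg_union(pair, 2)) - seg_size(orig);
}

/* During an overlaps search, a subtree whose bounding interval does not
 * overlap the query cannot contain a match and can be skipped. */
static bool subtree_may_overlap(seg_t bound, seg_t query)
{
    return bound.lower <= query.upper && query.lower <= bound.upper;
}
```

In this sketch, a search for keys overlapping a query descends only into subtrees for which subtree_may_overlap() is true, which is why such an index avoids scanning the whole table.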


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t

This will compile and install the implementation of the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then restart PostgreSQL and load "bioseg0.sql" to install the type in a
database:

    psql -d databasename < bioseg0.sql

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't
overlap, whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg
have a one-base overlap. Also note that the length of '1..10'::bioseg0 is 9,
whereas the length of '1..10'::bioseg is 10.

See:
http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.
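The differences described in the note can be checked with a short C sketch. It assumes bioseg0 treats '<start>..<end>' as a half-open interval, so the length is end - start and two intervals overlap only if each starts before the other ends. coords_t and all function names are hypothetical; to_interbase() uses the usual rule that a 1-based closed interval [s, e] becomes the interbase interval (s - 1, e).

```c
#include <assert.h>
#include <stdbool.h>

/* Both types store a start and an end; only the interpretation differs. */
typedef struct {
    int start;
    int end;
} coords_t;

/* 1-based, closed: "1..10" covers bases 1 to 10 inclusive */
static int length_1based(coords_t s) { return s.end - s.start + 1; }
static bool overlaps_1based(coords_t x, coords_t y)
{
    return x.start <= y.end && y.start <= x.end;
}

/* interbase, half-open: "1..10" runs between positions 1 and 10 */
static int length_interbase(coords_t s) { return s.end - s.start; }
static bool overlaps_interbase(coords_t x, coords_t y)
{
    return x.start < y.end && y.start < x.end;
}

/* Convert a 1-based closed interval [s, e] to interbase (s - 1, e). */
static coords_t to_interbase(coords_t s)
{
    assert(s.start >= 1 && s.start <= s.end);
    coords_t r = { s.start - 1, s.end };
    return r;
}
```

With a = 1..10 and b = 10..20 this reproduces the note: lengths of 10 (1-based) and 9 (interbase), and an overlap in the 1-based system but not in the interbase one. The conversion keeps the length unchanged: 1..10 in 1-based coordinates becomes 0..10 in interbase coordinates, both covering 10 bases.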


TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr, author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.