root/bioseg/trunk/README.bioseg
Revision: 18
Committed: Sat Mar 15 20:18:07 2008 UTC (11 years, 7 months ago) by kmr
File size: 6299 byte(s)
Log Message:
Mention the bioseg wiki in the USAGE section of the README.

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:
    gzip -d < bioseg-x.y.tar.gz | tar xf -

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL.

This only installs the type implementation and documentation. To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built like this:

    make USE_PGXS=t
    make install USE_PGXS=t

and the type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

11..22    An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

1...2     The same as 1..2

50        The same as 50..50

In a statement, bioseg values have the form:
    '<start>..<end>'::bioseg
or can be created with:
    bioseg_create(start, end)

For example:
    CREATE TABLE test_bioseg (id integer, seg bioseg);
    INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);
or, equivalently
    INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));
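
The syntax rules in this section can be modelled outside the database. The
Python sketch below is an illustration only, not the parser in bioseg.c. The
names parse_bioseg and length are hypothetical, and ignoring surrounding
whitespace is an assumption. It accepts the three forms listed in this section
and rejects values that break the stated rules.

```python
import re

# Hypothetical helper, not part of bioseg: it mirrors the textual rules
# given in the SYNTAX section for the 1-based bioseg type.
_PATTERN = re.compile(r"^([0-9]+)(?:\.{2,3}([0-9]+))?$")


def parse_bioseg(text):
    """Return (start, end) for strings like '11..22', '1...2' or '50'."""
    # Assumption: leading and trailing whitespace is ignored.
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not a valid interval: %r" % (text,))
    start = int(match.group(1))
    # A single integer N is shorthand for N..N.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("coordinates must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return (start, end)


def length(interval):
    """Number of bases covered by a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1
```

With this model, length(parse_bioseg('11..22')) is 12, and parse_bioseg('50')
gives the same result as parse_bioseg('50..50').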


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. In a statement, replace
[a, b] with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap.

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise.

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d.

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d.

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d.
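
The conditions listed for these operators can be written down as plain
predicates. The Python sketch below is a model of the 1-based bioseg type for
illustration, not the C implementation. The function names are hypothetical.
The overlap test is an assumption inferred from inclusive coordinates, under
which '1..10' and '10..20' share one base.

```python
# Illustrative model only. Each interval is a (start, end) tuple of
# 1-based inclusive coordinates; all function names are hypothetical.

def overlaps(x, y):           # x && y
    # Assumed from inclusive coordinates: touching ends share a base.
    return x[0] <= y[1] and y[0] <= x[1]


def left_of(x, y):            # x << y : true if b < c
    return x[1] < y[0]


def right_of(x, y):           # x >> y : true if a > d
    return x[0] > y[1]


def not_right_of(x, y):       # x &< y : true when b <= d
    return x[1] <= y[1]


def not_left_of(x, y):        # x &> y : true when a >= c
    return x[0] >= y[0]


def same_as(x, y):            # x = y : a == c and b == d
    return x[0] == y[0] and x[1] == y[1]


def contains(x, y):           # x @> y : a <= c and b >= d
    return x[0] <= y[0] and x[1] >= y[1]


def contained_in(x, y):       # x <@ y : a >= c and b <= d
    return x[0] >= y[0] and x[1] <= y[1]
```

For example, contains((1000, 2000), (1200, 1300)) is true, and
left_of((1, 10), (10, 20)) is false because the two intervals share base 10.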

Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators do not make a lot of sense for any practical
    purpose but sorting. They first compare (a) to (c), and if
    these are equal, compare (b) to (d). That accounts for
    reasonably good sorting in most cases, which is useful if
    you want to use ORDER BY with this type.
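
The ordering described for < and > amounts to comparing (start, end) pairs:
starts first, ends only on a tie. The Python sketch below models that
comparison for illustration. It is not taken from bioseg.c, and the names
less_than, sort_key and the sample data are hypothetical.

```python
# Illustrative model of the ordering: compare starts first, and
# compare ends only when the starts are equal.

def less_than(x, y):
    if x[0] != y[0]:
        return x[0] < y[0]
    return x[1] < y[1]


def sort_key(interval):
    # Python compares tuples element by element, which matches the
    # "start, then end" rule above.
    start, end = interval
    return (start, end)


# Hypothetical sample data.
segments = [(1000, 2000), (5, 50), (1000, 1500), (5, 20)]
ordered = sorted(segments, key=sort_key)
# ordered is [(5, 20), (5, 50), (1000, 1500), (1000, 2000)]
```

This is the order that ORDER BY on a bioseg column would be expected to give
under the rule stated above.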


NOTE: The performance of an R-tree index can depend heavily on the
order of input values. It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns that will greatly speed up
overlaps and contains queries. For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as "0
based" or "half-open intervals"), run the build with INTERBASE_COORDS defined
in make, i.e.:

    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then restart PostgreSQL and read "bioseg0.sql" to install the type in a
database:

    psql -d databasename < bioseg0.sql

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas
the length of '1..10'::bioseg is 10.
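
The arithmetic behind this note can be checked with a small model. The Python
sketch below is an illustration under the assumption that interbase intervals
behave as half-open ranges, as the note implies. It is not taken from the
bioseg0 source, and the function names are hypothetical.

```python
# Illustrative model only. Each interval is a (start, end) tuple.

def length_one_based(interval):
    # 1-based inclusive: both end points are counted.
    start, end = interval
    return end - start + 1


def length_interbase(interval):
    # Interbase: coordinates name the gaps between bases.
    start, end = interval
    return end - start


def overlap_one_based(x, y):
    # Number of shared bases, 0 if the intervals are disjoint.
    return max(0, min(x[1], y[1]) - max(x[0], y[0]) + 1)


def overlap_interbase(x, y):
    # Assumption: half-open behaviour, so touching ends share nothing.
    return max(0, min(x[1], y[1]) - max(x[0], y[0]))
```

With (1, 10) and (10, 20), overlap_one_based gives 1 and overlap_interbase
gives 0. The lengths of (1, 10) are 10 and 9 respectively, matching the
figures in the note.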

See:
    http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.


TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr., author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.