root/bioseg/trunk/README.bioseg
Revision: 43
Committed: Fri Dec 19 14:34:50 2008 UTC by kmr
File size: 7011 byte(s)
Log Message:
Documentation fix.

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in the PostgreSQL source and unpack the
bioseg tar file:

  gzip -d < bioseg-x.y.tar.gz | tar xf -

(Or check out from Subversion with:
  svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
in the contrib directory)

To install the type, change to the bioseg directory and run

  make
  make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

  sudo make install

This only installs the type implementation and documentation. To make the
type available in any particular database, do

  psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

  make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built with:

  make USE_PGXS=t clean
  make USE_PGXS=t
  make install USE_PGXS=t
  (or: sudo make install USE_PGXS=t)

and the type can then be installed in a particular database by any user with:

  psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is either a single integer
or two integers joined by the range operator ('..' or '...'). All integers
must be greater than 0, and the first integer must be less than or equal to
the second.

11..22    An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

1...2     The same as 1..2

50        The same as 50..50

In a statement, bioseg values have the form:
  '<start>..<end>'::bioseg
or can be created with:
  bioseg_create(start, end)

For example:
  CREATE TABLE test_bioseg (id integer, seg bioseg);
  insert into test_bioseg values (1, '1000..2000'::bioseg);
or, equivalently
  insert into test_bioseg values (1, bioseg_create(1000, 2000));
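
The syntax rules above can be sketched as a small parser. This is an
illustrative Python model only, not the parser in bioseg.c; the names
parse_bioseg and bioseg_length are hypothetical, and the sketch assumes no
whitespace inside the literal.

```python
import re

# Hypothetical helper, not part of bioseg: accepts "N", "N..M" or "N...M".
_BIOSEG_PATTERN = re.compile(r"(\d+)(?:\.{2,3}(\d+))?")


def parse_bioseg(text):
    """Return (start, end) for a 1-based bioseg literal, or raise ValueError."""
    match = _BIOSEG_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("not a bioseg literal: %r" % text)
    start = int(match.group(1))
    # A single integer N is shorthand for N..N.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("coordinates must be greater than 0")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def bioseg_length(start, end):
    """Length of a closed 1-based interval: both end points are counted."""
    return end - start + 1
```

For example, parse_bioseg("11..22") gives (11, 22), and bioseg_length(11, 22)
gives 12, matching the first row of the table above.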


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. In a statement, replace
each [a, b] with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap.

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise.

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d.

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d.

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d.
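
The operator definitions above can be written out as plain comparisons. The
following Python sketch is a model for reasoning about the operators, not
the C implementation; the Seg type and function names are made up for
illustration. The overlap test is an assumption inferred from the closed,
1-based intervals this type uses (two segments sharing a single base
overlap).

```python
from collections import namedtuple

# Illustrative model only; the real operators are implemented in bioseg.c.
Seg = namedtuple("Seg", ["lower", "upper"])  # closed interval, 1-based


def overlaps(x, y):        # x && y (assumed closed-interval test)
    return x.lower <= y.upper and y.lower <= x.upper


def left_of(x, y):         # x << y
    return x.upper < y.lower


def right_of(x, y):        # x >> y
    return x.lower > y.upper


def over_left(x, y):       # x &< y, "does not extend to right of"
    return x.upper <= y.upper


def over_right(x, y):      # x &> y, "does not extend to left of"
    return x.lower >= y.lower


def same(x, y):            # x = y
    return x.lower == y.lower and x.upper == y.upper


def contains(x, y):        # x @> y
    return x.lower <= y.lower and x.upper >= y.upper


def contained_in(x, y):    # x <@ y
    return contains(y, x)
```

For example, with x = Seg(1000, 2000) and y = Seg(1500, 1600), contains(x, y)
and overlaps(x, y) are both true, while left_of(x, y) is false.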

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators make little sense for any practical purpose other
    than sorting. They first compare a to c and, if these are equal,
    compare b to d. This gives a reasonably good ordering in most
    cases, which is useful if you want to use ORDER BY with this type.
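
The ordering described above is the same as comparing (start, end) pairs
lexicographically. A minimal Python sketch, with made-up sample data and a
hypothetical function name, showing the order that ORDER BY would be
expected to produce under that rule:

```python
# Illustrative only: segments are modelled as (start, end) tuples.
segments = [(200, 300), (100, 500), (100, 150), (50, 60)]


def bioseg_less_than(x, y):
    """Model of x < y: compare starts, and compare ends only on a tie."""
    if x[0] != y[0]:
        return x[0] < y[0]
    return x[1] < y[1]


# Python's built-in tuple comparison follows the same rule.
ordered = sorted(segments)
```

Here ordered is [(50, 60), (100, 150), (100, 500), (200, 300)]: the two
segments starting at 100 are ranked by their end coordinate.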


NOTE: The performance of an R-tree index can depend heavily on the order of
input values. It may be helpful to sort the input table on the BIOSEG column.


INDEXES
=======

A GiST index can be created for bioseg columns; it will greatly speed up
overlaps and contains queries. For example:

  CREATE TABLE tt (range bioseg, id integer);
  CREATE INDEX tt_range_idx ON tt USING gist (range);

Or, for an existing table, a function index can be used. For example, on a
feature table with fmin and fmax:

  CREATE INDEX bioseg_index ON feature USING gist (bioseg_create(fmin, fmax));

This query will then find features that overlap 2000..3000, using the index:

  SELECT * FROM feature
    WHERE '2000..3000'::bioseg && bioseg_create(fmin, fmax);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as "0
based" or "half-open intervals"), run the build with INTERBASE_COORDS defined
in make, i.e.:

  make clean
  make INTERBASE_COORDS=t
  make install INTERBASE_COORDS=t
  (or: sudo make install INTERBASE_COORDS=t)

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database:
  psql -d databasename < bioseg0.sql
to install the type.

The bioseg and bioseg0 types can be mixed in the same database.

Notes
-----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a one
base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
length of '1..10'::bioseg is 10.

Unlike the bioseg type, the start and/or end of a bioseg0 can be negative, with
the expected results.
e.g. bioseg0_size('-10..10'::bioseg0) == 20

See:
  http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.
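
The differences in the notes above come down to whether the end coordinate
is counted. The following Python sketch models both conventions; the
function names are hypothetical, and the two overlap tests are assumptions
chosen to reproduce the examples given in the notes rather than a copy of
the C code.

```python
def length_one_based(start, end):
    """bioseg: closed interval, both the start and end bases are included."""
    return end - start + 1


def length_interbase(start, end):
    """bioseg0: half-open interval, coordinates name gaps between bases."""
    return end - start


def overlaps_one_based(a, b, c, d):
    # Closed intervals that share a single base (b == c) do overlap.
    return a <= d and c <= b


def overlaps_interbase(a, b, c, d):
    # Half-open intervals that merely touch (b == c) share no base.
    return a < d and c < b
```

With these definitions, 1..10 has length 10 in the 1-based system and 9 in
the interbase system, -10..10 has interbase length 20, and 1..10 overlaps
10..20 only in the 1-based system.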


TESTS
=====

The installation of bioseg can be checked by running:

  make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.