root/bioseg/trunk/README.bioseg
Revision: 32
Committed: Thu Aug 21 16:13:43 2008 UTC (10 years, 9 months ago) by kmr
File size: 6960 byte(s)
Log Message:
Allow negative coordinates when using bioseg0.

This directory contains the code for the user-defined type,
BIOSEG, representing contiguous intervals in biological sequence.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source).


FILES
=====

Makefile        building instructions for the shared library

README.bioseg   the file you are now reading

bioseg.c        the implementation of this data type in C

bioseg.sql.in   SQL code needed to register this type with PostgreSQL
                (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:
    gzip -d < bioseg-x.y.tar.gz | tar xf -

(Or check out from subversion with:
    svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg
in the contrib directory)

To install the type, change to the bioseg directory and run

    make
    make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

    sudo make install

This only installs the type implementation and documentation. To make the
type available in any particular database, do

    psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

    make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built like:

    make USE_PGXS=t clean
    make USE_PGXS=t
    make install USE_PGXS=t
    (or: sudo make install USE_PGXS=t)

and the type can then be installed in a particular database by any user with:

    psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql


SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

11..22    An interval from 11 to 22 inclusive - length 12 (= 22-11+1)

1...2     The same as 1..2

50        The same as 50..50

In a statement, bioseg values have the form:
    '<start>..<end>'::bioseg
or can be created with:
    bioseg_create(start, end)

For the bioseg type <start> must be less than or equal to <end> and <start>
must be 1 or more.

For example:
    CREATE TABLE test_bioseg (id integer, seg bioseg);
    insert into test_bioseg values (1, '1000..2000'::bioseg);
or, equivalently
    insert into test_bioseg values (1, bioseg_create(1000, 2000));
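
The textual rules above can be modelled outside the database. The following
is a minimal Python sketch, not the parser in bioseg.c: the names
parse_bioseg and length are hypothetical, and stripping surrounding
whitespace is an assumption rather than documented behaviour.

```python
import re

# Hypothetical model of the documented text syntax: one or two positive
# integers joined by '..' or '...'. This is not the parser from bioseg.c.
_INTERVAL = re.compile(r"(\d+)(?:\.{2,3}(\d+))?")


def parse_bioseg(text):
    """Return (start, end), 1-based and inclusive, for '11..22', '1...2' or '50'."""
    match = _INTERVAL.fullmatch(text.strip())  # stripping whitespace is an assumption
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    start = int(match.group(1))
    # A single integer such as '50' means the one-base interval 50..50.
    end = int(match.group(2)) if match.group(2) is not None else start
    if start < 1:
        raise ValueError("start must be 1 or more")
    if start > end:
        raise ValueError("start must be less than or equal to end")
    return start, end


def length(interval):
    """Number of bases covered by a 1-based inclusive interval."""
    start, end = interval
    return end - start + 1
```

With this sketch, parse_bioseg("11..22") gives (11, 22) and its length is
12, matching the first example in the table above.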


USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. The [a, b] should be
replaced in a statement with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap.

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise.

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d].
    [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d.

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d.

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d.
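
The operator conditions listed above can be written as plain predicates on
(start, end) pairs. This is an illustrative Python sketch with hypothetical
function names, not the C implementation. The overlap test (a <= d and
c <= b) is an assumption for 1-based inclusive intervals; this file does not
spell that formula out.

```python
# Each interval is a (start, end) tuple, 1-based and inclusive, start <= end.

def overlaps(x, y):       # models &&  (assumed: a <= d and c <= b)
    return x[0] <= y[1] and y[0] <= x[1]

def left_of(x, y):        # models <<  (b < c)
    return x[1] < y[0]

def right_of(x, y):       # models >>  (a > d)
    return x[0] > y[1]

def not_right_of(x, y):   # models &<  (b <= d)
    return x[1] <= y[1]

def not_left_of(x, y):    # models &>  (a >= c)
    return x[0] >= y[0]

def same_as(x, y):        # models =   (a == c and b == d)
    return x[0] == y[0] and x[1] == y[1]

def contains(x, y):       # models @>  (a <= c and b >= d)
    return x[0] <= y[0] and x[1] >= y[1]

def contained_in(x, y):   # models <@  (a >= c and b <= d)
    return x[0] >= y[0] and x[1] <= y[1]
```

For example, left_of((1, 10), (11, 20)) is true, while
left_of((1, 10), (10, 20)) is false because base 10 is shared.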

Although the mnemonics of the operators above are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators do not make a lot of sense for any practical
    purpose but sorting. They first compare (a) to (c), and if these
    are equal, compare (b) to (d). That accounts for reasonably good
    sorting in most cases, which is useful if you want to use ORDER BY
    with this type.
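
The comparison rule described for < and > (start first, then end) is the
same as lexicographic ordering of (start, end) pairs. As a small Python
sketch with made-up sample values, the order that ORDER BY would be expected
to produce can be previewed like this:

```python
# Made-up sample intervals as (start, end) pairs.
segments = [(1000, 2000), (10, 50), (10, 20), (500, 600)]

# Python compares tuples element by element: first start, then end,
# which mirrors the comparison rule described for < and >.
ordered = sorted(segments)

print(ordered)
```

Here (10, 20) sorts before (10, 50) because the starts are equal and the
ends break the tie.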


NOTE: The performance of an R-tree index can largely depend on the
order of input values. It may be very helpful to sort the input table
on the BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns that will greatly speed up
overlaps and contains queries. For example:

    CREATE TABLE tt (range bioseg, id integer);
    CREATE INDEX tt_range_idx ON tt USING gist (range);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as "0
based" or "half-open intervals") run the build with INTERBASE_COORDS defined
in make, i.e.:

    make clean
    make INTERBASE_COORDS=t
    make install INTERBASE_COORDS=t
    (or: sudo make install INTERBASE_COORDS=t)

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database:
    psql -d databasename < bioseg0.sql
to install the type.

The bioseg and bioseg0 types can be mixed in the same database.

Notes
-----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a one
base overlap. Also note that the length of '1..10'::bioseg0 is 9, whereas the
length of '1..10'::bioseg is 10.

Unlike the bioseg type, the start and/or end of a bioseg0 can be negative, with
the expected results.
    e.g. bioseg0_size('-10..10'::bioseg0) == 20

See:
    http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.
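
The difference between the two coordinate systems comes down to whether the
end position is included. The Python sketch below models both conventions
with hypothetical function names. It is an illustration of the arithmetic in
the notes above, not the code in bioseg.c, and the overlap formulas are
assumptions that follow from the inclusive and half-open readings.

```python
def size_1based(start, end):
    """Length of a 1-based interval; both ends are included."""
    return end - start + 1

def size_interbase(start, end):
    """Length of an interbase (half-open) interval; the end is excluded."""
    return end - start

def overlaps_1based(x, y):
    """Inclusive intervals overlap when they share at least one base."""
    return x[0] <= y[1] and y[0] <= x[1]

def overlaps_interbase(x, y):
    """Half-open intervals that merely touch at a boundary do not overlap."""
    return x[0] < y[1] and y[0] < x[1]
```

Under these definitions 1..10 and 10..20 overlap in the 1-based model but
not in the interbase model, and size_interbase(-10, 10) is 20, in line with
the notes above.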


TESTS
=====

The installation of bioseg can be checked by running:

    make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was by Gene Selkov, Jr, author of the SEG type (contrib/seg
in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.