Revision: 23
Committed: Wed Jul 9 12:18:51 2008 UTC by kmr
Log Message:
Mention subversion checkout in the README.

This directory contains the code for the user-defined type BIOSEG, which
represents contiguous intervals in biological sequences.

(Most of this documentation is copied from contrib/seg/README.seg in the
PostgreSQL source.)

FILES
=====

Makefile          building instructions for the shared library

README.bioseg     the file you are now reading

bioseg.c          the implementation of this data type in C

bioseg.sql.in     SQL code needed to register this type with PostgreSQL
                  (transformed to bioseg.sql by make)

INSTALLATION
============

Change into the contrib directory in PostgreSQL and unpack the bioseg tar
file:

  gzip -d < bioseg-x.y.tar.gz | tar xf -

Alternatively, check out the code from subversion while in the contrib
directory:

  svn checkout svn://bioinformatics.org/svnroot/bioseg/trunk bioseg

To install the type, change to the bioseg directory and run

  make
  make install

The user running "make install" may need root access, depending on the
configuration of PostgreSQL. If so, this may work:

  sudo make install

This only installs the type implementation and documentation. To make the
type available in any particular database, do

  psql -d databasename < bioseg.sql

If you install the type in the template1 database, all subsequently created
databases will inherit it.

To test the new type, after "make install" do

  make installcheck

If it fails, examine the file regression.diffs to find out the reason (the
test code is a direct adaptation of the regression tests from the main
source tree).

If you have a full installation of PostgreSQL, including the pg_config
program, bioseg can be unpacked anywhere and built with:

  make USE_PGXS=t
  make install USE_PGXS=t
  (or: sudo make install USE_PGXS=t)

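and the type can then be installed in a particular database with the
command below. This step needs no root access, but because bioseg.sql
creates C-language functions it normally has to be run as a PostgreSQL
superuser:

  psql -d databasename < `pg_config --sharedir`/contrib/bioseg.sql

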
SYNTAX
======

The user-visible representation of an interval is formed using one or two
integers greater than 0 joined by the range operator ('..' or '...').
The first integer must be less than or equal to the second.

11..22     An interval from 11 to 22 inclusive, length 12 (= 22-11+1)

1...2      The same as 1..2

50         The same as 50..50

In a statement, bioseg values have the form:

  '<start>..<end>'::bioseg

or can be created with:

  bioseg_create(start, end)

For example:

  CREATE TABLE test_bioseg (id integer, seg bioseg);
  INSERT INTO test_bioseg VALUES (1, '1000..2000'::bioseg);

or, equivalently:

  INSERT INTO test_bioseg VALUES (1, bioseg_create(1000, 2000));

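The textual rules above can be illustrated with a small C sketch. It is a
hypothetical example, not the input function in bioseg.c: the names seg_t,
seg_parse and seg_length are invented here, and the sketch assumes the
1-based, inclusive convention of the standard bioseg type (so '11..22' has
length 12).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of a 1-based, inclusive interval. */
typedef struct {
    long lower;
    long upper;
} seg_t;

/* Read one integer greater than 0 at *p and advance *p past it.
   Returns 0 on success, -1 on failure. */
static int parse_positive(const char **p, long *out)
{
    char *end;
    long v;

    if (!isdigit((unsigned char)**p))   /* reject signs and spaces */
        return -1;
    errno = 0;
    v = strtol(*p, &end, 10);
    if (errno != 0 || v <= 0)
        return -1;
    *out = v;
    *p = end;
    return 0;
}

/* Parse "a", "a..b" or "a...b". Returns 0 on success, -1 on bad input. */
int seg_parse(const char *s, seg_t *out)
{
    const char *p = s;
    long lo, hi;

    if (parse_positive(&p, &lo) != 0)
        return -1;
    if (*p == '\0') {
        hi = lo;                        /* "50" is the same as "50..50" */
    } else {
        if (strncmp(p, "...", 3) == 0)  /* try the longer operator first */
            p += 3;
        else if (strncmp(p, "..", 2) == 0)
            p += 2;
        else
            return -1;
        if (parse_positive(&p, &hi) != 0 || *p != '\0')
            return -1;
    }
    if (lo > hi)                        /* start must not exceed end */
        return -1;
    out->lower = lo;
    out->upper = hi;
    return 0;
}

/* Number of bases in an inclusive interval: both ends are counted. */
long seg_length(seg_t s)
{
    assert(s.lower <= s.upper);
    return s.upper - s.lower + 1;
}
```

Trying '...' before '..' matters: if '..' were matched first, "1...2" would
leave ".2" behind, which is not a valid integer.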

USAGE
=====

See http://www.bioinformatics.org/bioseg/wiki/Main/BiosegUsage for usage
examples.

The following is a list of the available operators. In a statement, [a, b]
should be replaced with 'a..b'::bioseg or bioseg_create(a, b).

[a, b] && [c, d]    Overlaps

    Returns true if and only if segments [a, b] and [c, d] overlap.

[a, b] << [c, d]    Is left of

    The left operand, [a, b], occurs entirely to the left of the
    right operand, [c, d]. That is, [a, b] << [c, d] is true if
    b < c and false otherwise.

[a, b] >> [c, d]    Is right of

    [a, b] occurs entirely to the right of [c, d]. That is,
    [a, b] >> [c, d] is true if a > d and false otherwise.

[a, b] &< [c, d]    Overlaps or is left of

    This might be better read as "does not extend to the right of".
    It is true when b <= d.

[a, b] &> [c, d]    Overlaps or is right of

    This might be better read as "does not extend to the left of".
    It is true when a >= c.

[a, b] = [c, d]     Same as

    The segments [a, b] and [c, d] are identical, that is, a == c
    and b == d.

[a, b] @> [c, d]    Contains

    The segment [a, b] contains the segment [c, d], that is,
    a <= c and b >= d.

[a, b] <@ [c, d]    Contained in

    The segment [a, b] is contained in [c, d], that is,
    a >= c and b <= d.

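The definitions above translate directly into C predicates. This is a
hypothetical sketch, not the functions in bioseg.c (seg_t, seg_make and the
seg_* names are invented). The formula used for && is an assumption, since
the list does not give one: for the 1-based, inclusive bioseg type, two
segments overlap when each starts no later than the other ends, which agrees
with the one-base overlap of '1..10' and '10..20' described under INTERBASE
COORDINATES.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 1-based, inclusive segment [lower, upper]. */
typedef struct {
    long lower;
    long upper;
} seg_t;

/* Build a segment, enforcing "first integer <= second" from SYNTAX. */
seg_t seg_make(long lower, long upper)
{
    assert(lower <= upper);
    seg_t s = { lower, upper };
    return s;
}

/* &&  Overlaps: assumed formula for inclusive ends, a <= d and c <= b. */
bool seg_overlap(seg_t x, seg_t y) { return x.lower <= y.upper && y.lower <= x.upper; }

/* <<  Is left of: b < c. */
bool seg_left(seg_t x, seg_t y) { return x.upper < y.lower; }

/* >>  Is right of: a > d. */
bool seg_right(seg_t x, seg_t y) { return x.lower > y.upper; }

/* &<  Does not extend to the right of: b <= d. */
bool seg_over_left(seg_t x, seg_t y) { return x.upper <= y.upper; }

/* &>  Does not extend to the left of: a >= c. */
bool seg_over_right(seg_t x, seg_t y) { return x.lower >= y.lower; }

/* =   Same as: a == c and b == d. */
bool seg_same(seg_t x, seg_t y) { return x.lower == y.lower && x.upper == y.upper; }

/* @>  Contains: a <= c and b >= d. */
bool seg_contains(seg_t x, seg_t y) { return x.lower <= y.lower && x.upper >= y.upper; }

/* <@  Contained in: a >= c and b <= d, i.e. @> with the operands swapped. */
bool seg_contained(seg_t x, seg_t y) { return seg_contains(y, x); }
```

With x = seg_make(1, 10) and y = seg_make(10, 20), seg_overlap(x, y) is true
while seg_left(x, y) is false, because b < c fails when both segments include
base 10.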
Although the mnemonics of the following operators are questionable, I
preserved them to maintain visual consistency with other geometric
data types defined in PostgreSQL.

Other operators:

[a, b] < [c, d]     Less than
[a, b] > [c, d]     Greater than

    These operators do not make a lot of sense for any practical
    purpose but sorting. They first compare a to c, and if these are
    equal, compare b to d. This gives a reasonably good sort order in
    most cases, which is useful if you want to use ORDER BY with this
    type.

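The ordering used by < and > can be sketched as a qsort() comparator in C.
The names seg_t, cmp_long, seg_cmp and seg_sort are hypothetical and the code
is not taken from bioseg.c; it only mirrors the rule described above (starts
first, then ends).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical segment with start (lower) and end (upper). */
typedef struct {
    long lower;
    long upper;
} seg_t;

/* Compare two longs without the overflow risk of subtracting them. */
static int cmp_long(long x, long y)
{
    return (x > y) - (x < y);
}

/* Three-way comparison in the style qsort() expects: compare the
   starts, and only if they are equal compare the ends. */
int seg_cmp(const void *pa, const void *pb)
{
    const seg_t *a = pa;
    const seg_t *b = pb;
    int c = cmp_long(a->lower, b->lower);

    return c != 0 ? c : cmp_long(a->upper, b->upper);
}

/* Sort n segments in place into the order implied by < and >. */
void seg_sort(seg_t *segs, size_t n)
{
    if (n < 2)
        return;                 /* nothing to reorder */
    assert(segs != NULL);
    qsort(segs, n, sizeof *segs, seg_cmp);
}
```

Keeping the three-way helper separate avoids returning a difference of two
longs, which could overflow or be truncated to int for extreme values.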

NOTE: The performance of an R-tree index can depend heavily on the order
of input values. It may be very helpful to sort the input table on the
BIOSEG column (see the script sort-segments.pl for an example).


INDEXES
=======

A GiST index can be created for bioseg columns that will greatly speed up
overlaps and contains queries. For example:

  CREATE TABLE tt (range bioseg, id integer);
  CREATE INDEX tt_range_idx ON tt USING gist (range);


INTERBASE COORDINATES
=====================

The standard bioseg type uses the common convention of numbering the bases
starting at 1. If you wish to use "interbase" coordinates (also known as
"0-based" or "half-open intervals"), run the build with INTERBASE_COORDS
defined in make, i.e.:

  make INTERBASE_COORDS=t
  make install INTERBASE_COORDS=t

This will compile and install the implementation for the "bioseg0" type.
The "0" in the name is a mnemonic for "0-based".

Then read "bioseg0.sql" into your database to install the type:

  psql -d databasename < bioseg0.sql

The bioseg and bioseg0 types can be mixed in the same database.

Note
----
In the interbase system '1..10'::bioseg0 and '10..20'::bioseg0 don't overlap,
whereas in the 1-based system '1..10'::bioseg and '10..20'::bioseg have a
one-base overlap. Also note that the length of '1..10'::bioseg0 is 9,
whereas the length of '1..10'::bioseg is 10.

See:
http://www.gmod.org/wiki/index.php/Introduction_to_Chado#Interbase_Coordinates
for a longer discussion of the differences between the coordinate systems.

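The differences described in the Note can be checked with a short C sketch.
The names are hypothetical and the code is not the bioseg or bioseg0
implementation; it assumes that bioseg treats 'a..b' as the closed range of
bases a to b, and that bioseg0 treats it as the half-open span between
interbase positions a and b, as in the GMOD discussion linked above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical segment; which convention applies is up to the caller. */
typedef struct {
    long lower;
    long upper;
} seg_t;

/* 1-based, closed (bioseg): both ends are bases inside the segment. */
long len_one_based(seg_t s) { return s.upper - s.lower + 1; }

bool overlap_one_based(seg_t x, seg_t y)
{
    return x.lower <= y.upper && y.lower <= x.upper;
}

/* Interbase, half-open (bioseg0): positions lie between bases, so
   lower..upper spans upper - lower bases. */
long len_interbase(seg_t s) { return s.upper - s.lower; }

bool overlap_interbase(seg_t x, seg_t y)
{
    return x.lower < y.upper && y.lower < x.upper;
}

/* Convert a 1-based segment to interbase: the start shifts down by one,
   the end stays put, and the number of bases is unchanged. */
seg_t one_based_to_interbase(seg_t s)
{
    assert(s.lower >= 1 && s.lower <= s.upper);
    seg_t r = { s.lower - 1, s.upper };
    return r;
}
```

Under these assumptions the sketch maps the 1-based segment 1..10 to the
interbase segment 0..10, and both describe the same ten bases.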

TESTS
=====

The installation of bioseg can be checked by running:

  make installcheck


CREDITS
=======

Note from the author: Most of the code and all of the hard work needed to
implement BIOSEG was done by Gene Selkov, Jr., author of the SEG type
(contrib/seg in the PostgreSQL source). All bugs are due to me (kmr).


THANKS
======

Thanks to bioinformatics.org for hosting the project.


AUTHOR
======

Kim Rutherford <kmr@flymine.org>

SEG code by Gene Selkov, Jr.