[Genquire-dev] Re: mapping ensembl to Genquire (fwd)

Tue, 6 Nov 2001 12:59:49 -0800 (PST)

On Tue, 6 Nov 2001, Ewan Birney wrote:

> 
> David - 
> 
> 
> Sounds like a great project, and if you want to code, no one is stopping
> you, that's for sure!
> 
The only thing to fear is fear itself.

I'm not sure why that quote came to mind :)

> 
> ;) I guess we are talking about Genquire for browsing, not for write back
> quite yet, but I am happy to get into write back. 
> 
> 
Yeah, that will come.  For now, I just want to see stuff.

> 
> 
> I suspect as I know the current Ensembl API well and the Bioperl
> interfaces perfectly I would be the man to help out here.
> 
> 

Hopefully I don't add too much to your plate.

> 
> Big question for you 
> 
> 
>    how do you handle scrolling (or imagine handling scrolling) across v.
> large regions of DNA - ie, you have 250MB of chr1. Do you want to
> 

We handled that in Arabidopsis with a contig-based chromosome viewer 
screen, kind of like DAS entry points.  Then the easiest way into the 
genome was to pick your chromosome, then pick the contig of interest on 
that chromosome.  100 Kb is no big deal for features, and bigger than that 
you are flying so high above the data that individual features are 
irrelevant.  
This approach is not quite so relevant once we think in terms of whole 
chromosomes.  I think our approach will have to be to use the GUI to 
encourage people to download smaller chunks at a time, and then to scroll 
chunk by chunk using the chromosome-level window.

> 
>    (i) pull out a sequence of the whole thing and then make calls like
> 
>         get_SeqFeatures_range(10000000,20000000);
> 
>         with the returned features starting at 10000000
> 
I don't like this, but it could be done.

> 
>    (ii) pull out a sequence between 1000000,2000000 and then make calls 
> like
> 
>         get_SeqFeatures();
> 
>         with the returned features starting at 1
> 
> 
This is the way I think about things, and the way we've done it in 
Arabidopsis.
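
To make the difference between (i) and (ii) concrete, here is a minimal 
sketch in Python (not Ensembl or Bioperl code).  The Feature class and the 
two helper functions are hypothetical names, and 1-based inclusive 
coordinates are an assumption.  It only shows the offset arithmetic needed 
to move between chromosome coordinates and slice-relative coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int


def to_slice_coords(feature, slice_start):
    """Option (ii): express a chromosome-coordinate feature relative to a
    slice, so that chromosome position slice_start becomes position 1."""
    offset = slice_start - 1
    return Feature(feature.name, feature.start - offset, feature.end - offset)


def to_chromosome_coords(feature, slice_start):
    """Option (i): map a slice-relative feature back to chromosome coordinates."""
    offset = slice_start - 1
    return Feature(feature.name, feature.start + offset, feature.end + offset)


# A feature at 1,000,500..1,000,800 on the chromosome, viewed from a slice
# that begins at chromosome position 1,000,000.
exon = Feature("exon1", 1_000_500, 1_000_800)
local = to_slice_coords(exon, 1_000_000)
restored = to_chromosome_coords(local, 1_000_000)
```

Since the conversion is a single offset in each direction, adapting between 
the two styles should mostly be a matter of deciding where that offset is 
applied.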

> Remember we are talking millions of features across chr1, so pulling them
> all out into memory is not going to happen!
> 
No, we want to lazy load as much as possible, and encourage the user to 
restrict the length of sequence downloaded to something reasonable.
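
As a rough illustration of lazy loading, here is a minimal Python sketch 
that fetches features one fixed-size chunk at a time and keeps only the 
most recently used chunks in memory.  Everything in it is an assumption 
for illustration: the ChunkCache class, the fetch callback, the 100 Kb 
chunk size, and the simplification that a feature belongs to the chunk in 
which it starts (so a feature that begins in an earlier chunk and runs into 
the requested range is missed).  It is not how Genquire or Ensembl do it.

```python
from collections import OrderedDict

CHUNK_SIZE = 100_000  # assumed chunk length, roughly one 100 Kb contig


class ChunkCache:
    """Hypothetical lazy loader: fetches a chunk only when it is first
    needed and evicts the least recently used chunk when full."""

    def __init__(self, fetch_chunk, max_chunks=4):
        self._fetch = fetch_chunk      # callback: chunk index -> list of (start, end, name)
        self._max = max_chunks
        self._chunks = OrderedDict()   # chunk index -> features, oldest first

    def _get(self, index):
        if index in self._chunks:
            self._chunks.move_to_end(index)        # mark as recently used
        else:
            self._chunks[index] = self._fetch(index)
            if len(self._chunks) > self._max:
                self._chunks.popitem(last=False)   # drop least recently used
        return self._chunks[index]

    def features_in(self, start, end):
        """Return features overlapping the 1-based inclusive range start..end."""
        first = (start - 1) // CHUNK_SIZE
        last = (end - 1) // CHUNK_SIZE
        found = []
        for index in range(first, last + 1):
            for feat_start, feat_end, name in self._get(index):
                if feat_start <= end and feat_end >= start:
                    found.append((feat_start, feat_end, name))
        return found


# Stand-in for a database call; records which chunks were actually fetched.
calls = []


def fake_fetch(index):
    calls.append(index)
    base = index * CHUNK_SIZE
    return [(base + 10, base + 50, f"feature_{index}")]


cache = ChunkCache(fake_fetch, max_chunks=2)
hits = cache.features_in(1, 150_000)
hits_again = cache.features_in(1, 150_000)  # served from memory, no new fetch
```

Scrolling chunk by chunk then only costs a fetch for chunks that are not 
already held, and memory use is bounded by max_chunks.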

> 
> Efficiently caching and managing the memory for the scrolling seems to be
> where a lot of "magic" has to happen for these sorts of browsers.
> 
Maybe we're punting by using a higher-level screen.  Mark, what do you 
think?

> 
> I/We/Ensembl can accommodate both ways. (ii) is currently easier with the
> current API; (i) will be easier with a future API, and so I'd love to see
> how easy it is to adapt between the two things.
> 
> 
We should be able to accommodate (i), but (ii) is a more natural fit.  The 
problem is the DAS entry points list.  In Arabidopsis, the list of contigs 
available from TIGR was the natural set to work with.  They are almost all 
around 100 Kb, so loading them up was trivial.  Anything bigger than that 
will take some work.

> 
> 
> Secondly - How do you want the GO things attached to genes (DBxRefs?) and
> do you want to reuse all the lovely GeneStructureI stuff inside
> Bioperl? (I presume yes). Should GeneStructureI also have-a
> AnnotationCollectionI (talking bioperl) or should we hook it up someway
> else?
> 
These are write-back questions, correct?  Mark and I stored GO things 
somewhat crudely and directly inside our TagValue table, using a small 
hack.  I'm not sure how we would want to handle this.

We have had discussions about where annotations belong, on the gene or on 
the transcript.  The Genquire annotation code is part of 
GQ::Server::GenericFeature, so it can hang off of Genes, Transcripts, or 
Features (Exons, etc.).

Does the GeneStructureI map to Ensembl adaptors okay?  Genquire implements 
all of those interfaces as well, so our business objects look like 
GeneStructureI objects.  They just have a 'context', which is their 
persistence hook (DbObj was too ugly), from which they receive an 
appropriate adaptor, which is called to do persistence-y type things.
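
The context idea can be sketched roughly as follows, in Python rather than 
Perl.  All class and method names here (Context, InMemoryAdaptor, Gene, 
adaptor_for, store) are hypothetical and are not the GQ::Server or Ensembl 
API.  The point is only that the business object never talks to storage 
directly; it asks its context for an adaptor and delegates to it.

```python
class InMemoryAdaptor:
    """Hypothetical adaptor that 'persists' objects into a dict."""

    def __init__(self):
        self.rows = {}

    def store(self, obj):
        # Copy the tags so later changes to the object do not alter the stored row.
        self.rows[obj.name] = dict(obj.tags)


class Context:
    """Hypothetical persistence hook: hands out the adaptor registered
    for an object's class."""

    def __init__(self):
        self._adaptors = {}

    def register(self, cls, adaptor):
        self._adaptors[cls] = adaptor

    def adaptor_for(self, obj):
        return self._adaptors[type(obj)]


class Gene:
    """Business object; its only link to storage is its context."""

    def __init__(self, name, context):
        self.name = name
        self.tags = {}
        self.context = context

    def save(self):
        self.context.adaptor_for(self).store(self)


context = Context()
gene_adaptor = InMemoryAdaptor()
context.register(Gene, gene_adaptor)

gene = Gene("geneA", context)
gene.tags["note"] = "example annotation"
gene.save()
```

Under this arrangement, pointing the same business objects at a different 
back end would mean registering a different adaptor with the context, 
leaving the Gene class untouched.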

> 
> I foresee a genquire-ensembl-bridge cvs repository existing somewhere out
> there....
> 
It will probably be a fairly major sub-project within Genquire, so it can 
hang out at bioinformatics.org, or whatever.  How well does mixing cvs 
roots work in a single installation?  Namespace shouldn't be too hard, 
since genquire is all in GQ::Server or GQ::Client.  But can I keep a 
sub-directory (GQ::Server::Ensembl) stored in a different cvs root?  I'm 
sure you deal with that, with all the different projects on the go at one 
time in your world :)

Thanks for responding.

BTW, it looks like I'm going to be spending some quality commuting time on 
a train here in California.  I look forward to some Ewan-ish outbursts in 
my future!

Dave