[Owl-devel] Uniprot entries and mapping to Pdb structures

Tue Apr 27 12:14:37 EDT 2010

Hi Jose,

> Sorry I should have mentioned some of the things I've been doing.

No need to ask for permission :) The worst that can happen is that I
will be asking questions like this.
I'm actually really happy with the development as it works now.
Sometimes cool features will just magically appear in the repository.
I guess this is called Open Source :)

> Could you check if you are using a 1.6 compliant level for the compilation?

Right, mine was set to 1.5. I changed it and it's now rebuilding the
workspace. Hopefully this will solve the problems.

> Other than that it could only be the version of eclipse (I'm using 3.5.2). Let me know what you find.

3.4.2 here. Actually Eclipse is really buggy here. The code completion
feature is broken (it throws a null pointer exception, really weird) and
it frequently crashes because it runs out of memory. I would really like
to reinstall but I'm too afraid of having to install all the plugins and
stuff all over again.

> I did already commit your UniprotConnection class with all its dependencies...

Oh, I didn't see. That's great.

> What I didn't do was redirect your structuralimpact code to the new class in owl.

No need. I can do it some time.

> What I'm then doing is using this UniprotHomolog to store sequence and
> related data (at the moment blast hit, taxonomy id, associated embl cds
> sequences, uniprot ids). I need local blast for this because it's a lot of
> queries that I'm doing.

Do you have a script to create the blast db? Could we also commit that to
the scripts dir? It's a pain to figure out how to do it every time.

> Ok here as usual I'm working as I go along... The SiftsConnection class
> does the parsing of the local/remote SIFTS file. Then it stores it all in a
> Map in memory and you can then query it and it returns SiftsFeatures (which
> are actually just mappings of pdb sequences to uniprot sequences,
> essentially the same as your SiftsMappings class). Those you can then add
> to a Pdb (see owl.tests.core.connections.SiftsConnectionTest for an
> example). I'm really not sure whether SiftsFeature is an appropriate name
> for the mappings but after giving it some thought that's the best solution
> I came up with.

First of all, it's really great that the Pdb class now implements
HasFeatures. This will now be our reference implementation since there
is no other yet :)
But if I understand it correctly, shouldn't the SiftsFeature be a
feature of the UniprotEntry rather than of the Pdb object? A feature
should be something you can completely localize in the sequence of its
parent. A Uniprot protein usually has one or more subdomains for which
the structure is known. Ah, now I see why: the mapping in the pdb file
is not necessarily complete either.
Yes, that's true, but I think what's missing is usually only a His-tag
or something. In most cases I've seen, the Pdb maps more or less
completely to the Uniprot entry. So I'd make the SiftsFeature a property
of the UniprotEntry (which is yet to be created, hence my previous
question, see below). In practice, as you mentioned, you will actually
need to do an alignment of the pdb to the uniprot sequence. That's why
in my mutanom project I created a class Substructure which holds the
alignment. Alternatives would be to add an alignment member to
SiftsFeature, or even to require any feature to always contain a
(possibly trivial) alignment. But maybe this would be a bit of overkill
for features like mutations which are only one amino acid long. The nice
thing about having an alignment is that you can translate between
positions in Uniprot sequence space and Pdb sequence space.
Then you can decide whether e.g. a SCOP domain or a mutation should be
a feature of the Pdb or the UniprotEntry. Both are possible and the only
difference is the numbering. So which one makes more sense depends on
your application.
BTW, one cool thing about doing all this Feature stuff is that one day
we can write a viewer class that can display any object implementing the
Feature interface along a sequence or in a structure.
I suddenly feel gravity declining, so I'd better stop the architecture
talk...
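
To make the point about translating between Uniprot and Pdb sequence
space a bit more concrete, here is a minimal sketch. It is not code from
owl or from mutanom: the class AlignmentMapping, its method names, and
the convention of returning -1 for unaligned positions are all made up
for illustration. It assumes the alignment is given as two gapped
strings of equal length with '-' as the gap character.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of owl: translates residue positions between
 * Uniprot and Pdb sequence numbering, given a gapped pairwise alignment.
 */
class AlignmentMapping {
    private static final char GAP = '-';

    // 1-based position in one sequence -> 1-based position in the other
    private final Map<Integer, Integer> uniprotToPdb = new HashMap<Integer, Integer>();
    private final Map<Integer, Integer> pdbToUniprot = new HashMap<Integer, Integer>();

    AlignmentMapping(String alignedUniprot, String alignedPdb) {
        if (alignedUniprot.length() != alignedPdb.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int uniprotPos = 0; // last residue number seen in the Uniprot sequence
        int pdbPos = 0;     // last residue number seen in the Pdb sequence
        for (int col = 0; col < alignedUniprot.length(); col++) {
            boolean inUniprot = alignedUniprot.charAt(col) != GAP;
            boolean inPdb = alignedPdb.charAt(col) != GAP;
            if (inUniprot) uniprotPos++;
            if (inPdb) pdbPos++;
            // Only columns with a residue in both sequences define a mapping.
            if (inUniprot && inPdb) {
                uniprotToPdb.put(uniprotPos, pdbPos);
                pdbToUniprot.put(pdbPos, uniprotPos);
            }
        }
    }

    /** Pdb position aligned to the given Uniprot position, or -1 if there is none. */
    int toPdb(int uniprotPos) {
        Integer pos = uniprotToPdb.get(uniprotPos);
        return pos == null ? -1 : pos;
    }

    /** Uniprot position aligned to the given Pdb position, or -1 if there is none. */
    int toUniprot(int pdbPos) {
        Integer pos = pdbToUniprot.get(pdbPos);
        return pos == null ? -1 : pos;
    }
}
```

With something like this, a mutation given in Uniprot numbering could be
looked up in the structure via toPdb(), and residues that exist only in
the Pdb sequence (a His-tag, say) simply have no Uniprot position.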

> Please feel free to restructure/refactor any of this if you
> have clearer ideas about how to do it. (BTW perhaps SiftsFeature should
> belong in owl.core.structure.features instead of owl.core.features)

Right, that's something I forgot to explain. The things in
owl.core.structure.features were supposed to be something temporary.
They are in fact feature-like things which are not implemented with the
Feature interface but eventually should be. So SiftsFeature should
really stay in owl.core.features and we should move more things there
eventually. Logically, I feel there is no real difference between a
structural feature and a sequence feature.

>> - Once you have selected the appropriate protein from your
>> UniprotHomologs, do you have a UniprotEntry class which owns the
>> SiftsFeatures? Otherwise shall we create one in owl.core.sequence?

That's what I was thinking about above. Some class which represents a
full protein sequence (e.g. from Uniprot) as you will find it in the genome.
It should implement HasFeatures and may reference the gene sequence
as well. What should we call this? UniprotEntry? Or simply 'Protein'?
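
As a rough sketch of what I have in mind (assumptions throughout: the
Feature and HasFeatures interfaces below are simplified stand-ins with
invented method names, not the actual ones in owl.core.features, and
StructureMappingFeature and the accession used in the example are
placeholders):

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a feature interface; the real owl interface may differ. */
interface Feature {
    int getStart();          // 1-based, inclusive, in the parent's numbering
    int getEnd();            // 1-based, inclusive
    String getDescription();
}

/** Simplified stand-in for HasFeatures; the method names are assumptions. */
interface HasFeatures {
    void addFeature(Feature feature);
    List<Feature> getFeaturesAt(int position);
}

/** Hypothetical feature marking a stretch of the protein with known structure. */
class StructureMappingFeature implements Feature {
    private final int start;
    private final int end;
    private final String pdbCode;

    StructureMappingFeature(int start, int end, String pdbCode) {
        this.start = start;
        this.end = end;
        this.pdbCode = pdbCode;
    }

    public int getStart() { return start; }
    public int getEnd() { return end; }
    public String getDescription() { return "structure " + pdbCode; }
}

/** Hypothetical full-length protein, e.g. built from a Uniprot entry. */
class Protein implements HasFeatures {
    private final String uniprotId;
    private final String sequence;
    private final List<Feature> features = new ArrayList<Feature>();

    Protein(String uniprotId, String sequence) {
        this.uniprotId = uniprotId;
        this.sequence = sequence;
    }

    String getUniprotId() { return uniprotId; }

    /** Accepts only features that lie completely within this sequence. */
    public void addFeature(Feature feature) {
        if (feature.getStart() < 1 || feature.getEnd() > sequence.length()
                || feature.getStart() > feature.getEnd()) {
            throw new IllegalArgumentException("feature does not fit into sequence");
        }
        features.add(feature);
    }

    /** Returns all features covering the given 1-based position. */
    public List<Feature> getFeaturesAt(int position) {
        List<Feature> hits = new ArrayList<Feature>();
        for (Feature f : features) {
            if (f.getStart() <= position && position <= f.getEnd()) {
                hits.add(f);
            }
        }
        return hits;
    }
}
```

The range check in addFeature() is the "completely localize in the
sequence of its parent" rule from above. A reference to the gene
sequence is left out here to keep the sketch short.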

I hope this all makes sense. Otherwise please overrule me :)

I figure this should probably have gone to the list. Any objections to
forwarding this conversation to the list for archiving?

Cheers,
Henning