[Owl-devel] Features

Thu Mar 18 19:27:18 EDT 2010

Hi Jose,

thanks for making those changes. OWL is really getting to the point
where it will be the better Biojava :) As for your questions, I think I
can answer most of them, but I can't look at the code at the moment, so
let's see:

> I'm trying to get Uniprot mapping from the data file provided by the SIFTS
> project (http://www.ebi.ac.uk/msd/sifts/), so now I need to use the
> UniprotFeature class.

UniprotFeature is currently only for sequence features annotated in
Uniprot, such as active sites, post-translational modifications etc.
These are parsed by my UniprotConnection class (using the uniprotjapi)
and returned as UniprotFeatures. For Pdb structures associated with a
Uniprot entry I don't use the UniprotFeature class, although now that I
think about it, it would also make sense to implement this as a feature.
Instead, I'm using a very simple SiftsConnection class, which parses
SIFTS (either locally or from the ftp site). The thing is that after
taking the data from SIFTS (which gives you the start and end residues
in the Uniprot and Pdb entries) you still need to do an alignment to get
the actual residue mapping, because there may still be gaps and/or
mutations.
Neither UniprotConnection nor SiftsConnection is checked in yet, but
I can hopefully do that tomorrow.
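
To illustrate that last step, here is a minimal sketch (not the actual
OWL code) of how a residue mapping could be read off once the two
sequences have been aligned. It assumes the alignment already exists as
two gapped rows of equal length with '-' as the gap character; the class
name ResidueMapper, the method mapResidues and its parameters are made
up for this example. The start positions would be the segment starts
taken from SIFTS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OWL.
class ResidueMapper {

    /**
     * Walks two gapped alignment rows of equal length and pairs up the
     * residue numbers of every column where neither row has a gap.
     * Mismatched columns (mutations) are still mapped; gap columns are not.
     */
    static Map<Integer, Integer> mapResidues(String alnUniprot, String alnPdb,
                                             int uniprotStart, int pdbStart) {
        if (alnUniprot.length() != alnPdb.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        Map<Integer, Integer> mapping = new LinkedHashMap<>();
        int u = uniprotStart; // current Uniprot residue number
        int p = pdbStart;     // current Pdb residue number
        for (int i = 0; i < alnUniprot.length(); i++) {
            boolean uGap = alnUniprot.charAt(i) == '-';
            boolean pGap = alnPdb.charAt(i) == '-';
            if (!uGap && !pGap) {
                mapping.put(u, p);
            }
            if (!uGap) u++;
            if (!pGap) p++;
        }
        return mapping;
    }
}
```

For example, mapResidues("MKT-AYI", "MK-LAFI", 10, 1) maps Uniprot
residue 13 to Pdb residue 4 and leaves Uniprot residue 12 unmapped,
since it falls opposite a gap.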

> - I'm not sure what the fields in the UniprotFeature are. What's the
> uniprotTypeName and what's the description?

As mentioned, UniprotFeature stores sequence annotations as described
in http://www.uniprot.org/manual/sequence_annotation.
uniprotTypeName is the category of annotation, e.g. Modified residue
(actually the abbreviation returned by the uniprotjapi, i.e. MOD_RES)
and description is the actual annotation, e.g. Phosphorylation.

> - How are you doing the mapping between Uniprot and pdb? Maybe this is
> provided by the Uniprot API and you don't need to be doing that kind of
> thing? The thing is the SIFTS file has the mapping so now I'm trying to see
> how to design the flow of things: where and how to store the mapping and so
> on.

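See above: SIFTS gives the start and end residues, and an alignment on
top of that gives the actual residue mapping.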

> - How do you go about storing the Collection of features belonging to a
> certain structure/sequence. I can see that the HasFeatures interface intends
> to help there. Have you already implemented something with it? That'd be
> nice to have as an example.

I'm using a class 'Gene' which implements HasFeatures. I guess we
could make the classes Pdb and/or Sequence implement the HasFeatures
interface as examples. For the moment, I'll send you my Gene class for
reference.
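
Until then, a rough sketch of the general idea: a class keeps its
features grouped by type and hands them out through an interface. All
names here (SimpleFeature, HasFeaturesSketch, GeneSketch and their
methods) are invented for illustration; the real HasFeatures interface
and Gene class may well look different.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a feature such as UniprotFeature.
class SimpleFeature {
    final String typeName;    // annotation category, e.g. "MOD_RES"
    final String description; // the annotation itself, e.g. "Phosphorylation"
    final int begin;          // first residue covered (1-based)
    final int end;            // last residue covered (inclusive)

    SimpleFeature(String typeName, String description, int begin, int end) {
        this.typeName = typeName;
        this.description = description;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical interface; the real HasFeatures may declare other methods.
interface HasFeaturesSketch {
    void addFeature(SimpleFeature feature);
    Collection<SimpleFeature> getFeatures();
    Collection<SimpleFeature> getFeaturesOfType(String typeName);
}

class GeneSketch implements HasFeaturesSketch {
    // Features grouped by type name, in insertion order.
    private final Map<String, List<SimpleFeature>> byType = new LinkedHashMap<>();

    public void addFeature(SimpleFeature feature) {
        List<SimpleFeature> list = byType.get(feature.typeName);
        if (list == null) {
            list = new ArrayList<>();
            byType.put(feature.typeName, list);
        }
        list.add(feature);
    }

    public Collection<SimpleFeature> getFeatures() {
        List<SimpleFeature> all = new ArrayList<>();
        for (List<SimpleFeature> list : byType.values()) {
            all.addAll(list);
        }
        return all;
    }

    public Collection<SimpleFeature> getFeaturesOfType(String typeName) {
        List<SimpleFeature> list = byType.get(typeName);
        if (list == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(list);
    }
}
```

Pdb or Sequence could implement the same interface in the same way,
which would let code that only cares about features treat them alike.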

> By the way I've created now the owl.runners package and put some new runner
> classes there that I extracted from the Pdb class (e.g. dssp runner). Also
> I've added some new stuff to the owl.connections package, again things
> extracted from the Pdb class. When I have some time I will move more stuff
> into owl.runners.

Great.

> While doing that I also realised that the tests we have are hard-coded to
> work only over there. So I've created a "owl_tests_path.dat" file that
> contains the paths for external programs needed to run the tests, at the
> moment only tests.proteinstructure.PdbTest uses it. I'm already thinking in
> redoing that and have something that is more general for the whole OWL
> library, some sort of global config file.

Sounds good. There are actually quite a few hard-coded references to
executables, data files etc. They should all be in some config file.
Maybe we should look at how other projects implement this and follow
some best practice.
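
One common approach in Java projects is a plain key=value file read
with java.util.Properties. A minimal sketch, assuming a made-up class
ConfigSketch and a made-up key DSSP_EXE (neither exists in OWL):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical global config holder, not part of OWL.
class ConfigSketch {
    private final Properties props = new Properties();

    // Reads key=value pairs; lines starting with '#' are comments.
    ConfigSketch(Reader in) throws IOException {
        props.load(in);
    }

    // Convenience factory for in-memory text, e.g. in tests.
    static ConfigSketch fromString(String text) {
        try {
            return new ConfigSketch(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the configured value, or the fallback if the key is absent.
    String get(String key, String fallback) {
        return props.getProperty(key, fallback);
    }

    // Returns the configured value, or fails loudly if the key is absent.
    String require(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Missing config key: " + key);
        }
        return value;
    }
}
```

A test could then ask for require("DSSP_EXE") instead of carrying its
own path, and fail with a clear message when the key is not set. For a
real file, the constructor can be given a java.io.FileReader.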

Hope this makes things a little bit clearer. Otherwise just ask again.

Henning