[Pipet Devel] python data structure

Konrad Hinsen hinsen at cnrs-orleans.fr
Tue Jan 12 04:04:35 EST 1999


> From Thomas....
> 
> [Konrad, can you take a shot at this question?]

I'll try...

> What python data type would you recommend for the class representation
> of a nucleotide sequence ? 
> - string, list or array (module) ?
> I am not (yet) familiar with the performance questions of python types, but 
> I got the impression that lists are very slow - and I have no idea how the
> array module is implemented. (btw I used strings in Tcl)

The main question is what operations you want to perform on nucleotide
sequences. Here are some considerations:

- Strings are compact and benefit from a large range of string operations
  (in module "string"). However, elements can only be characters,
  and strings are immutable, i.e. cannot be changed once created.
  So any modification requires constructing a new string. But being
  immutable can be an advantage as well, e.g. you can use strings as
  keys in dictionaries.

- Lists can store any data type, and can be modified in a very general
  way (including insertion of lists etc.), but there are fewer
  operations available on them.

- Tuples are just immutable lists.

- Arrays don't seem to be very useful for non-numerical data, with two
  exceptions: they can most easily be accessed from C modules, and
  they facilitate certain structural operations.
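
To make the trade-offs above concrete, here is a minimal sketch in
current Python. The sequence "GATTACA" and all variable names are
made up for illustration, and the one-byte-per-base array encoding is
only one possible choice, not something prescribed above.

```python
from array import array

seq = "GATTACA"

# Strings are immutable: "changing" a base means building a new string.
mutated = seq[:3] + "C" + seq[4:]

# Immutability is what makes strings usable as dictionary keys.
annotations = {seq: "example fragment"}

# Lists are mutable: elements can be replaced and lists inserted in place.
bases = list(seq)
bases[3] = "C"
bases[1:1] = ["N", "N"]
as_string = "".join(bases)

# Tuples behave like immutable lists.
frozen = tuple(seq)

# The array module stores compact typed values, here one byte per base.
codes = array("B", seq.encode("ascii"))
```

Converting between the representations (list(seq), "".join(bases))
is cheap enough that a class could keep a string internally and switch
to a list only for a batch of modifications.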

In terms of performance, there is not much difference for basic
operations (creation, indexing, etc.). The main concern should be to
use as many built-in operations as possible for typical manipulations;
any piece of Python code is much slower than a simple call to a
built-in function implemented in C! So the first thing to do is to
find out which operations are to be performed on nucleotide sequences,
and which of them occur most frequently.
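
As a sketch of what "prefer built-in operations" can mean for a
string-based sequence, the following compares a hand-written loop with
calls that run in C. The function names and the choice of GC counting
and reverse complement as "typical manipulations" are assumptions for
illustration; note that in current Python these operations are string
methods rather than functions in module "string".

```python
seq = "GATTACAGATTACA"

def count_gc_loop(s):
    # Pure-Python loop: every iteration is interpreted bytecode.
    n = 0
    for base in s:
        if base == "G" or base == "C":
            n += 1
    return n

def count_gc_builtin(s):
    # Two passes over the string, but each pass runs inside str.count.
    return s.count("G") + s.count("C")

# Translation table built once and reused for every call.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s):
    # translate() and the reversing slice are both built-in operations.
    return s.translate(COMPLEMENT)[::-1]
```

Both counting functions return the same result; how much faster the
built-in version is depends on sequence length and Python version, so
it is worth timing on realistic data (for example with the timeit
module) before settling on a representation.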

Konrad.
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen at cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------


