[Biophp-dev] Re: New question from the Newbie

S Clark biophp-dev@bioinformatics.org
Mon, 22 Mar 2004 10:57:41 -0700


I'm not Nico, but...

The swissprot.inc.php class handles the 'loading the data into the array'
internally.  If you're reading from a file, just pass the filename
(or the file handle, if you've already opened the file at this point)
to the class, like:

$importer = new parse_swissprot("/home/frederic/swissprot/swissprot-data.swp");

See the setSource() and readfromFile() methods for the actual steps
that this parser takes to pull the lines of data into the internal
array...

(See also the readRecord() 'wrapper' method that decides what methods to
call based on whether the data given to swissprot parser was a pre-read
string of text or a file)
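
To make the "file or pre-read text" decision concrete, here is a minimal
sketch of that kind of branch. This is NOT the actual swissprot parser code:
the function name sketch_open_source() and the returned array layout are
made up for illustration, and treating any single-line string that names an
existing file as a filename is my assumption.

```php
<?php
// Hypothetical helper, not BioPHP's API: work out whether the caller gave us
// an open file handle, a filename, or a pre-read string of text.
function sketch_open_source($source): array
{
    // Already-opened file handle: read from it directly.
    if (is_resource($source)) {
        return ['mode' => 'file', 'handle' => $source];
    }
    // Assumption: a single-line string naming an existing file is a filename.
    if (is_string($source) && strpos($source, "\n") === false && is_file($source)) {
        return ['mode' => 'file', 'handle' => fopen($source, 'r')];
    }
    // Anything else is treated as record text and split into lines.
    return ['mode' => 'text', 'lines' => preg_split('/\R/', (string) $source)];
}
```

A wrapper method could then look at 'mode' and call either the file-reading
or the text-reading code path.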

I think most of the parsers work this way - if given a file rather than
text data, the parser will read and process one record at a time from the
file rather than loading the entire file into memory, hence the need for
a branch in the code between file reading and text reading.  This is just
to make it possible to read really big files. Think of the extreme example of
someone wanting to download the entire Genbank database and read through it,
saving data from only certain types of records that match: who'd want to
load 3GB of text into their system's memory before they could start parsing?
Reading record by record isn't absolutely necessary, though, and in some
cases it is impossible: clustal files have the sequence data interleaved, so
you HAVE to read the entire file into memory before you can get any of the
complete sequences parsed from them.
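
As a rough illustration of the one-record-at-a-time idea (again, a sketch and
not the parser's real code), the function below pulls lines from an open file
handle until it hits a record terminator and hands back one record per
iteration. The name sketch_read_records() is hypothetical, and it relies on
records ending with a line containing only "//", as SwissProt flat files do.
It uses a generator, so it needs a reasonably modern PHP.

```php
<?php
// Hypothetical sketch: yield one record (an array of lines) at a time, so
// only the current record is ever held in memory.
function sketch_read_records($handle): Generator
{
    $record = [];
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '//') {      // end-of-record marker
            yield $record;
            $record = [];
            continue;
        }
        $record[] = $line;
    }
    if ($record !== []) {
        yield $record;             // last record had no terminator line
    }
}

// Usage with an in-memory stream standing in for a real data file.
$demo = fopen('php://memory', 'w+');
fwrite($demo, "ID   A\nAC   P1\n//\nID   B\n//\n");
rewind($demo);
foreach (sketch_read_records($demo) as $record) {
    echo $record[0], "\n";         // first line of each record
}
fclose($demo);
```

The point of the design is that a filter over a multi-gigabyte file can throw
each record away as soon as it has been checked.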

If locuslink files are never going to be very large, it's probably easiest
just to go ahead and 'internally' have the locuslink parser read the entire
file into memory and just work directly from the text.  (See the clustal
parser for an example of this).
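
For the small-file case, the read-everything approach can be as short as the
sketch below. It is only an illustration, not the clustal parser's code, and
sketch_read_all_lines() is a made-up name. It uses PHP's built-in file()
function to load every line into an array in one go.

```php
<?php
// Hypothetical helper: load a (small) file completely and return its lines
// without trailing newlines.
function sketch_read_all_lines(string $filename): array
{
    $lines = file($filename, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $filename");
    }
    return $lines;
}
```

After that the parser can loop over the array as often as it needs to, which
is what an interleaved format requires anyway.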

Sean

On Monday 22 March 2004 09:53 am, Frederic.Fleche@aventis.com wrote:
> Hi Nico,
>
> The line was the #166 of the file parsers/swissprot_inc.php that you talk
> about on your 19 March e-mail.
> So if $sourcelines is an array, I understand how that loops works but I
> don't understand where is the step that put all my swissprot file (cause I
> use a file) in this array. So if you could tell me the file and the line of
> this step it would be great.
>
> Thanks a lot.
>
> Fred