[BiO BB] problem with LWP::Simple
DMUTANTZ at aol.com
DMUTANTZ at aol.com
Sun Jun 26 13:27:35 EDT 2005
Hello
I would be garteful for any help with this.
I want to pull an id number (UniProt protein accession number) from a file
using a regex. This works OK.
I then wanted to use the number as part of a url to pull the relevant page
back , so I could parse some information about the protein from the page.
The code is very basic.
My perl script:
#!/usr/bin/perl
# A script to pull out an id number from a file using a regex.
#The id number(s0 are put into an array @accnumber.
#The file I read in is html_test2.txt (attached to this mail).
#Then use the id number as part of a url to get and store a webpage.
#In this case to simplify things I just want to take the first
#element of the @accnumber array and use that in the url
use LWP::Simple;
$a = 0;
#ask for the file name
print "please enter file name", "\n";
#open and read the file
$filename1 = <>;
open fileone, "$filename1"
or die;
while (!eof(fileone))
{
my $line = <fileone>;
if ( $line =~/UNIPROT:?\w+\s(\w{6})\s/)
{
@accnumber[$a]= $1."\n";
$a++;
}
}
close fileone;
$query_number = @accnumber[0];
#as a sanity check I print the number to STDOUT
print $query_number;
#I call the subroutine to return the webpage
get_page($query_number);
sub get_page {
my $address = $_[0];
my $url = 'http://www.ebi.uniprot.org/uniprot-srv/xmlView.do?proteinId='
.$address
.'_ORYSA&pager.offset=0';
my $html_file = 'page.html';
my $status = getstore($url, $html_file);
die "No _URL::Error_ (:Error) " unless is_success($status);
}
exit;
and the text file I parse to get my regex:
BLASTP 2.0MP-WashU [13-Dec-2004] [decunix5.0a-ev6-IP32LF64
2004-12-15T17:03:39]
Copyright (C) 1996-2004 Washington University, Saint Louis, Missouri USA.
All Rights Reserved.
Reference: Gish, W. (1996-2004) _http://blast.wustl.edu_
(http://blast.wustl.edu)
Query= 24061 17154533 emb|CAC80823.1 (AJ251791) putative IAA1 protein
[Oryza
sativa] 1e-130 235 236 99.5% top hit
(237 letters; record 1)
Database: uniprot
1,880,849 sequences; 604,459,357 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done
Smallest
Sum
High Probability
Sequences producing High-scoring Segment Pairs: Score P(N)
N
UNIPROT:Q75KX3_ORYSA Q84PD9 Putative auxin-responsive pro... 1203 1.2e-121
1
All Rights Reserved.
Reference: Gish, W. (1996-2004)
Thanks for any help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20050626/0c3ee633/attachment.html>
More information about the BBB
mailing list