[Bioclusters] error on qsub/mpirun jobs
Zhiliang Hu
zhu at iastate.edu
Tue Sep 9 10:59:40 EDT 2008
Rob,
Yes, I can ssh to any node and ssh back to the head node or to other nodes.
I see these files in my .ssh folder:
-rw------- 1 yhu yhu 668 Aug 29 2007 id_dsa
-rw-r--r-- 1 yhu yhu 620 Aug 29 2007 id_dsa.pub
-rw------- 1 yhu yhu 1675 Dec 14 2007 id_rsa
-rw-r--r-- 1 yhu yhu 410 Dec 14 2007 id_rsa.pub
so I suppose the RSA/DSA keys are installed.
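Note that the files listed above are the user's own identity keys, which let the user log in to other hosts. "Host key verification failed" concerns the opposite check: whether the machine making the connection already has the remote host's key recorded in a known_hosts file, under the exact name it is connecting to. The sketch below is one way to look that up from each compute node. It assumes the head node is contacted as "hist" (the host named in the Torque error) and takes node names as arguments; the "HEAD" variable and the "lookup_head_key" function are hypothetical names. It relies on "ssh-keygen -F" exiting non-zero when no entry is found, and it only searches ~/.ssh/known_hosts, not a system-wide /etc/ssh/ssh_known_hosts.

```shell
#!/bin/sh
# Sketch: for each compute node named on the command line, ask whether that
# node's ~/.ssh/known_hosts already holds a host key for the head node.
# Assumption: the head node is contacted as "hist" (override with HEAD=...).
HEAD=${HEAD:-hist}

lookup_head_key() {
    # Run "ssh-keygen -F <host>" on the node; it searches ~/.ssh/known_hosts
    # (hashed entries included) and exits non-zero when no entry is found.
    # BatchMode=yes makes the outer ssh fail instead of prompting.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$1" \
        ssh-keygen -F "$HEAD" >/dev/null
    then
        echo "$1: known_hosts has an entry for $HEAD"
    else
        echo "$1: no entry for $HEAD in ~/.ssh/known_hosts (or the node was unreachable)"
    fi
}

for node in "$@"; do
    lookup_head_key "$node"
done
```

For example: sh check-known-hosts.sh node01 node02 (hypothetical script and node names). If home directories are shared across the cluster, every node reads the same ~/.ssh/known_hosts, so the answer should be identical on all nodes.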
One more piece of information: this qsub/MPI job used to work on this machine. A couple of months ago we had a few bad nodes/disks, and since we brought all the nodes back to life, jobs submitted with qsub have had this problem. (Running "mpirun" on the command line still works fine.)
Zhiliang
At 01:00 PM 9/8/2008 -0300, Rob Hutten wrote:
>Hi Zhiliang,
>
>Do you have RSA/DSA authentication in both directions, i.e. can you ssh
>back to the head nodes from the compute nodes?
>-Rob
>
>
>On Fri, Sep 5, 2008 at 5:55 PM, Zhiliang Hu <zhu at iastate.edu> wrote:
>> I have an mpiBLAST job that runs well from the command line ("mpirun"),
>> but it fails with errors when I submit it with "qsub":
>>
>> qsub -l nodes=6:ppn=2 \
>> -e /path/to/locationA \
>> -o /path/to/locationA \
>> /path/to/program
>>
>> ----------------------------------------------------------
>> Unable to copy file /var/spool/torque/spool/658.nagrp2..ER to
>> hu@hist:/raid/pub/ncbi/blast/www/mpiblast.tmp
>> >>> error from copy
>> Host key verification failed.
>> lost connection
>> >>> end error output
>> Output retained on that host in: /var/spool/torque/undelivered/658.nagrp2..ER
>> ----------------------------------------------------------
>>
>> Note: when I check manually, the "retained" file is not there:
>> "/var/spool/torque/undelivered/658.nagrp2..ER"
>>
>> I wonder why I get "Host key verification failed". Could this be the cause?
>> (I can ssh to all nodes with no problem.)
>>
>> Any hints on where to look further would be appreciated.
>>
>> Zhiliang
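Rob's question can be tested the way the batch system actually uses ssh. The "Host key verification failed" and "lost connection" lines in the error are what scp prints when ssh cannot confirm the destination's host key and has no terminal on which to ask; an interactive login where someone once answered "yes" to the host key prompt does not exercise that path. Below is a minimal sketch, assuming the copy-back goes to the head node under exactly the name "hist" and that it is run as the user who submits the jobs (the "HEAD" variable and the "check_return_path" function are hypothetical names). It logs in to each node and from there runs a non-interactive ssh back to the head node with BatchMode=yes, which makes ssh fail rather than prompt.

```shell
#!/bin/sh
# Sketch: from each compute node named on the command line, ssh back to the
# head node with no terminal and no chance to answer a prompt, similar to
# the scp that copies a job's output files back.
# Assumption: the head node is contacted as "hist" (override with HEAD=...).
HEAD=${HEAD:-hist}

check_return_path() {
    # BatchMode=yes: fail instead of prompting for a password or for
    # accepting an unknown or changed host key.
    # ConnectTimeout=10: do not hang on an unreachable node.
    # stderr is not redirected, so ssh's own error message stays visible.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=10 "$1" \
        ssh -o BatchMode=yes -o ConnectTimeout=10 "$HEAD" true >/dev/null
    then
        echo "$1 -> $HEAD: ok"
    else
        echo "$1 -> $HEAD: FAILED"
    fi
}

for node in "$@"; do
    check_return_path "$node"
done
```

Run it from the head node, e.g. sh check-return-path.sh node01 node02 (hypothetical script and node names). Because stderr is left alone, a FAILED line is preceded by ssh's reason, such as "Host key verification failed."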