[Bioclusters] error on qsub/mpirun jobs

Zhiliang Hu zhu at iastate.edu
Tue Sep 9 10:59:40 EDT 2008


Rob,

Yes, I can ssh to any node and, from there, ssh back to the head node or to other nodes.

I see these files in my .ssh folder:
-rw------- 1 yhu yhu  668 Aug 29  2007 id_dsa
-rw-r--r-- 1 yhu yhu  620 Aug 29  2007 id_dsa.pub
-rw------- 1 yhu yhu 1675 Dec 14  2007 id_rsa
-rw-r--r-- 1 yhu yhu  410 Dec 14  2007 id_rsa.pub

so I suppose the rsa/dsa keys are set up.

One more piece of background: this qsub/mpi program used to work on this machine.  A couple of months ago we had a few bad nodes/disks.  After we brought all the nodes back to life, jobs submitted through qsub started to show this problem ("mpirun" on the command line still works fine).
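
Since "Host key verification failed" normally means the connecting side has no matching known_hosts entry for the target name, one thing that could be checked on each compute node is whether an entry exists for the exact host name in the error message ("hist"). The following is only a sketch under assumptions, not a confirmed diagnosis: the function name has_host_key is made up for illustration, and it assumes the relevant file is the user's ~/.ssh/known_hosts (a system-wide file such as /etc/ssh/ssh_known_hosts can be passed as the second argument instead).

```shell
# Hypothetical helper (name is illustrative, not part of Torque or OpenSSH).
# Reports whether HOST has an entry in a known_hosts file.
# Usage: has_host_key HOST [KNOWN_HOSTS_FILE]
has_host_key() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}
    # "ssh-keygen -F" searches a known_hosts file for a host name;
    # it also matches hashed entries and exits non-zero when nothing is found.
    if ssh-keygen -F "$host" -f "$file" >/dev/null 2>&1; then
        echo "$host: host key found in $file"
    else
        echo "$host: NO host key in $file"
        return 1
    fi
}
```

Running "has_host_key hist" on a compute node, as the user who submitted the job, would show whether that node knows the key for "hist" under that name. An interactive ssh can still succeed when the entry is missing, because ssh then asks to accept the key, while a non-interactive copy cannot answer that prompt.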

Zhiliang


At 01:00 PM 9/8/2008 -0300, Rob Hutten wrote:
>Hi Zhiliang ,
>
>Do you have rsa/dsa authentication in both directions, ie can you ssh
>back to the headnodes from the compute nodes?
>-Rob
>
>
>On Fri, Sep 5, 2008 at 5:55 PM, Zhiliang Hu <zhu at iastate.edu> wrote:
>> I have an mpiblast job that runs well on the command line ("mpirun"),
>> but it gives errors when I submit it with "qsub":
>>
>> qsub -l nodes=6:ppn=2 \
>>     -e /path/to/locationA \
>>     -o /path/to/locationA \
>>     /path/to/program
>>
>> ----------------------------------------------------------
>> Unable to copy file /var/spool/torque/spool/658.nagrp2..ER to
>> hu at hist:/raid/pub/ncbi/blast/www/mpiblast.tmp
>> >>> error from copy
>> Host key verification failed.
>> lost connection
>> >>> end error output
>> Output retained on that host in: /var/spool/torque/undelivered/658.nagrp2..ER
>> ----------------------------------------------------------
>>
>> Note: When I check manually, the "retained" file is not there:
>> "/var/spool/torque/undelivered/658.nagrp2..ER"
>>
>> I wonder why "Host key verification failed"?  Could this be the cause?
>> (I can ssh to all nodes no problem)
>>
>> Any hint on where to look further is appreciated.
>>
>> Zhiliang



