[Bioclusters] Current Dell Powerconnect 3048 switches can fail
in some conditions under high NFS traffic loads
chris dagdigian
bioclusters@bioinformatics.org
Thu, 21 Nov 2002 14:00:46 -0500
Responding to 2 messages at once here...
This response may not be all that helpful -
We had problems with the 3048 switch locking up under heavy NFSv3 load,
as I wrote in my August bioclusters post (archived at
https://bioinformatics.org/pipermail/bioclusters/2002-August/000341.html).
I was able to get in touch with both the Powerconnect product manager
and an engineer who was working on the problem. Both reported that the
problem had been seen before at a "huge customer" and that it was being
taken seriously.
My guess is that this was the switch that Dell used for the massive
4,000 CPU cluster at SUNY Buffalo. Given the PR they are trying to
squeeze from that project I'd say that any switch problems would get the
highest level of attention. Dell told me that the powerconnect people
were working with the internal Dell high performance clustering group to
try to recreate the failure in the lab. Last I heard they had been
partially successful at getting the switch to lock up in their lab.
We (Bioteam.net) were under heavy deadline pressure for the client
cluster we were building. We tried one firmware fix that Dell provided
directly to us and when that failed we met with our customer and decided
that we didn't have the time to wait for a Dell fix. The customer ended
up taking the 3048's out of the cluster and deploying them elsewhere in
the company, where they have been working fine ever since in a less
demanding network environment (no crashes).
We replaced the 3048's with the Dell Powerconnect 3248's which were more
expensive and had all these layer3/4 features that we didn't need or
want. But-- the 3248 runs totally different firmware and has a totally
different command-line interface.
Since replacing the 3048's with a pair of trunked 3248's we've had zero
problems with the switches in the cluster. In fact the cluster has been
working wonderfully and as of today has an uptime figure of 44 days
straight which is cool given the somewhat odd and experimental stuff we
have been doing to that poor system.
I'd like to assume that the 3048 problem has long since been fixed.
Keith and Tony -- if you are still having problems I can contact you
directly if you like and pass along the email addresses and name of the
powerconnect product manager. I'd guess that he would be able to talk
about the current state of the 3048.
Regards,
Chris
www.BioTeam.net
> Keith Maples wrote:
>> We just purchased a Powerconnect 3048 and had the same problems you list
>> in your report. They had us do a firmware update on the switch. I
>> checked out the date of the firmware and it was in June, which is some
>> time before you wrote the article on these switches. Does this mean
>> that the firmware didn't work for you on the switch? I hate to bother
>> you, but we just got the switch and we'll return it if it's still having
>> issues with heavy NFS traffic.
>>
>> We have a Macintosh Server OS X 10.1 uplinked to it via Gigabit and
>> assumed it was the culprit. Any input would be GREATLY appreciated.
>>
>> Thank you very much...
Meoni, Tony wrote:
> Chris Dagdigian,
>
> Read your warning tonight on bioinformatics.org, but unfortunately too
> late. We have these lemons and they have been trouble. Loaded a fix
> 5.2.8 on them and they worked for about 17 days and then all 6 of them
> crashed. Separated them from a stack and couldn't ping between two PCs
> on any of the stacks. Can you shed any other light on this? Have the
> 3248's still been working okay? Maybe I'll ask Dell for an exchange.
>
> Thanks for any help or info you can give.
>
> ____________
> Anthony Meoni
> tmeoni@aahp.org