Troubleshooting network strangeness

Forum: LXer Meta Forum
Total Replies: 15
techiem2

Oct 07, 2011
8:21 PM EDT
I got called this morning and was told that the school network wasn't working (couldn't load web pages/email/etc.).

I've been looking around (remotely, no time to go in and mess with stuff yet):

The router/vm server has 2 interfaces.

eth0 is the internet, eth1 is the bridge/lan/etc. interface.

eth0 seems to be fine, I can connect in, ping out, etc.

eth1 seems to be doing strange things.

When testing from being logged into the router:

I can ping the lan address of the interface fine.

When I ping one of the virtual machines, there is a good chance of packet loss.

When I ping the fileserver (separate machine) there is packet loss.

dmesg has quite a few entries like:

printk: 1543 messages suppressed.

eth1: received packet with own address as source address

Suggestions/Ideas?

Everything was working fine yesterday.
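
For a rough sense of how heavy the flood of warnings is, the two dmesg lines quoted above can be tallied from a saved log. This is a minimal sketch: the function name `summarize` and the regular expressions are illustrative assumptions, matched only against the exact wording quoted in this post.

```python
import re
from collections import Counter

# Matches the warning as quoted in the post, e.g.
# "eth1: received packet with own address as source address"
OWN_ADDR = re.compile(r"(\w+): received packet with own address as source address")
# Matches the rate-limiter notice, e.g. "printk: 1543 messages suppressed."
SUPPRESSED = re.compile(r"printk: (\d+) messages suppressed")


def summarize(log_text):
    """Return (warning count per interface, total messages reported as suppressed)."""
    per_iface = Counter()
    suppressed = 0
    for line in log_text.splitlines():
        match = OWN_ADDR.search(line)
        if match:
            per_iface[match.group(1)] += 1
            continue
        match = SUPPRESSED.search(line)
        if match:
            suppressed += int(match.group(1))
    return per_iface, suppressed


if __name__ == "__main__":
    sample = (
        "printk: 1543 messages suppressed.\n"
        "eth1: received packet with own address as source address\n"
    )
    counts, hidden = summarize(sample)
    print(dict(counts), hidden)
```

A large suppressed count next to a single interface name suggests the problem sits on that interface's segment, though it does not say what is causing it.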

tuxchick

Oct 07, 2011
8:32 PM EDT
Replace the NIC and see what happens. They do fail sometimes, and this sounds like a dying NIC.
techiem2

Oct 07, 2011
8:39 PM EDT
Yeah...that's kinda what I've been leaning towards...what with the packet loss on most ping tests...
cr

Oct 07, 2011
8:48 PM EDT
Suggestions:

1. You say it's a vm server, so I assume it's a linux box, not just a WRT-type router. Sounds like you can ssh into that box to see what's up. Can you install iptraf on it? I learn a lot from having iptraf on my firewall/gateway, sometimes more than I learn from having etherape on my network monitor.

2. Pull a copy of the routing table.

3. Pull a copy of the current firewall rules (in my case "ipchains -L" -- yeah, I'm running old stuff).

4. Pull a copy of "ifconfig -a".

By 'pull', I mean copy all that data to text files on a local machine for comparison study, because my experience is that, if it's volatile, I'm hurried and I overlook stuff.

5. If anybody can get at that equipment, have them push eth1's RJ45 firmly into its socket several times, just in case some oxidation has built up on the pins. They're supposed to be self-wiping but that's only at poke-home, so poke em again.

6. Has there been electrical storm activity there? If so, consider that eth1 might be physically degraded. Semiconductors don't always fail cleanly; a momentary marginal voltage overload can do local heating but not catastrophic punchthrough; when it's an analog circuit, it can take calculating out expected bias voltages on the schematic to find the fault node, it can be that subtle. In digital, it's often weird waveforms that give it away. You don't have to diagnose to that level, only to board-level; I'm just saying that permanent damage can be subtle.
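
The "pull it to text files and compare" advice in items 2 to 4 can be sketched as below. This is only an illustration under assumptions: the command list (`ip route show`, `iptables -L -n`, `ifconfig -a`), the file layout, and the helper names `snapshot` and `compare` are not from the thread, and the commands should be swapped for whatever the box actually provides.

```python
import difflib
import subprocess
from pathlib import Path

# Assumed commands; substitute whatever the router actually has installed.
COMMANDS = {
    "routes": ["ip", "route", "show"],
    "firewall": ["iptables", "-L", "-n"],
    "interfaces": ["ifconfig", "-a"],
}


def run(argv):
    """Run one command and return its standard output as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


def snapshot(outdir, commands=COMMANDS, runner=run):
    """Write each command's output to <outdir>/<name>.txt."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for name, argv in commands.items():
        (outdir / f"{name}.txt").write_text(runner(argv))


def compare(old_dir, new_dir):
    """Return unified-diff lines for every snapshot file found in old_dir."""
    lines = []
    for old in sorted(Path(old_dir).glob("*.txt")):
        new = Path(new_dir) / old.name
        new_text = new.read_text() if new.exists() else ""
        lines.extend(
            difflib.unified_diff(
                old.read_text().splitlines(),
                new_text.splitlines(),
                fromfile=str(old),
                tofile=str(new),
                lineterm="",
            )
        )
    return lines
```

Taking one snapshot while things work and another during the outage turns "it all looks normal" into a diff that is either empty or not.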
techiem2

Oct 07, 2011
8:58 PM EDT
1. Yeah, I was looking at IPTRAF. Nothing unusual, mostly lots of netbios-ns requests (probably due to the pdc vm being down due to not being able to boot fully because of the massive packet loss to the fileserver...).

2. Routing table is simple and boring as always (nothing unusual done here, just the lan and the inet).

3. Firewall rules haven't changed. I'm running shorewall, I've checked it, and I rebooted the machine this afternoon just in case something odd was going on with a vm or the firewall or something.

4. I looked at ifconfig and it all looks normal, and everything seems to be working..except for that packet loss thing. :P

5/6. I was thinking of this too. I don't recall there being any odd electrical activity last night, and the whole rack is on a UPS, but was thinking full powerdown and replugging the connections might be worth a shot anyway just to be sure (maybe unplug/replug all the connections into the switches as well).

I'll have a little time when I get home from work in the morning to try a couple things before heading to bed.

Of course this all happens today, when I have my first night of 4 in a row working....

I just hope there's an open slot and I have a low profile bracket NIC around if that's the problem...iirc the server is a 1u or 2u and we're using both onboard nics ....

cr

Oct 07, 2011
9:12 PM EDT
The cable at eth1 plugs into what at the far end? It's equally suspect.
techiem2

Oct 07, 2011
9:59 PM EDT
One of the switches.
tracyanne

Oct 07, 2011
10:02 PM EDT
I've had switches fail more often than NICs, so that's where I would start.
cr

Oct 08, 2011
2:32 AM EDT
If the switch has a wall-wart power supply, replug it at both ends. And see what else is plugged into the power strip or wall socket with it: if a fluorescent lamp or CFL is stuttering and about to fail, that's good for powerline glitches which cheap PSes might not adequately filter out (FCC's part-15 subpart-J, if it's still called that, only covers radiated frequencies).
techiem2

Oct 08, 2011
8:02 AM EDT
Crisis averted! One of my friends was looking at a packet capture with me and noticed lots of duplicate packets and told me to look for a loop or some such. I determined it was something in the second switch and started systematically unplugging the devices from it and found it was a port in the lab. Apparently a certain person wasn't paying very close attention when plugging the netbooks into the wired network to finish getting them set up. I found one cat5 hooked between two of the netbooks, and another cat5 hooked between two of the wall jacks. Oops. :P
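
The "lots of duplicate packets" clue can be checked mechanically once frames have been pulled out of a capture. The sketch below assumes the frames are already available as raw `bytes` objects (how they are extracted from the capture file is left out), and the function name `duplicate_ratio` is hypothetical.

```python
import hashlib
from collections import Counter


def duplicate_ratio(frames):
    """Fraction of frames that are byte-for-byte repeats of an earlier frame."""
    # Hash each frame so large captures do not keep every payload as a dict key.
    seen = Counter(hashlib.sha256(frame).digest() for frame in frames)
    total = sum(seen.values())
    repeats = sum(count - 1 for count in seen.values())
    return repeats / total if total else 0.0


if __name__ == "__main__":
    # Three copies of one frame plus one unique frame: half the capture is repeats.
    capture = [b"arp who-has", b"arp who-has", b"arp who-has", b"dns query"]
    print(duplicate_ratio(capture))
```

A high ratio is only a hint: ordinary retransmissions also produce repeats, so it is a reason to go looking for a loop, not proof of one.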

cr

Oct 08, 2011
10:22 AM EDT
External loopback? Kinky...
gus3

Oct 08, 2011
11:04 AM EDT
Which is why Radia Perlman says "always include a TTL in your network protocol."
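
A toy illustration of the point, not a model of any real switch: Ethernet frames carry no TTL field, so a frame caught in a layer-2 loop has nothing that retires it, whereas an IP packet's TTL is decremented at every router hop until it is dropped. The function name `forward_in_loop` and the `max_hops` cap are assumptions made for the simulation.

```python
def forward_in_loop(ttl=None, max_hops=1000):
    """Forward one packet around a loop until it is dropped or the cap is hit.

    With a TTL, each node decrements it and drops the packet at zero.
    Without one (ttl=None), only the simulation cap max_hops stops the loop.
    Returns (number of times the packet was forwarded, dropped_by_ttl).
    """
    hops = 0
    while hops < max_hops:
        if ttl is not None:
            ttl -= 1
            if ttl <= 0:
                return hops, True
        hops += 1
    return hops, False


if __name__ == "__main__":
    print(forward_in_loop(ttl=64))    # the TTL ends the loop
    print(forward_in_loop(ttl=None))  # runs until the artificial cap
```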
cr

Oct 08, 2011
11:31 AM EDT
Quoting: Crisis averted!


Good, because my next suggestion would be to test the output of the wall-wart PS for voltage, and that involves either finding a replacement to swap in or pulling the cover off to get at the regulated voltage under load to see if it's dipping out of regulation, and that's a pain. (Drooping power rails can crosstalk like crazy, and yes, I've had wall warts fail like that.)
techiem2

Oct 08, 2011
5:22 PM EDT
Lol. The main switches don't use warts, though some of the ancient cheapos elsewhere do. One of these days I'll get them all replaced with gigabit switches...
tuxchick

Oct 08, 2011
8:44 PM EDT
I so wanted it to be a bad NIC. Oh well. Good job!
techiem2

Oct 08, 2011
8:54 PM EDT
Lol I didn't. It's a 2u server using dual onboard NICs. :P

Thanks though lol.

It's been blogged now of course lol. Though the perpetrator shall remain nameless (though I have no doubt the person will hear about it at the school lol).
