How many times have we seen modules on a switch go down after a reboot? This can happen for a couple of reasons. In this article I'll talk about modules being powered down due to SCP communication failures.
SCP (Switch-module Configuration Protocol, not to be confused with Secure Copy) is used for communication between the switch processor and the line cards over the Ethernet Out-of-Band Channel (EOBC). The EOBC is a bus on the chassis, and line cards communicate with the supervisor only through this bus. Cisco 65xx series switches have a single EOBC, and it operates in half duplex.
Whenever you see a module powered down, check for SCP or keep-alive polling failures. These indicate a problem in communication between the supervisor and the line cards.
Error message:
Oct 21 09:43:44.121 utc: %ONLINE-SP-6-REGN_TIMER: Module 3, Proc. 0. Failed to bring online because of registration timer event
Oct 21 09:43:44.121 utc: %C6KPWR-SP-4-DISABLED: power to module in slot 3 set off (Module Failed SCP dnld)
Oct 24 09:43:44.121 utc: %C6KPWR-SP-4-DISABLED: power to module in slot 2 set off (Module not responding to Keep Alive polling)
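If you have a saved log to go through, a small script can pull out the power-down messages for you. The following is only a sketch in Python. It assumes the log lines look like the ones above, and the function names are made up for illustration; the keyword check for "SCP-related" reasons is a simple heuristic, not a Cisco-defined classification.

```python
import re

# Matches the %C6KPWR-SP-4-DISABLED line format shown in the messages above.
POWER_OFF = re.compile(
    r"%C6KPWR-SP-4-DISABLED: power to module in slot (\d+) set off \((.+)\)"
)


def powered_down_modules(log_text):
    """Return a list of (slot, reason) tuples found in the log text."""
    found = []
    for line in log_text.splitlines():
        match = POWER_OFF.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


def is_scp_related(reason):
    """Heuristic (assumption): SCP download and keep-alive failures count."""
    lowered = reason.lower()
    return "scp" in lowered or "keep alive" in lowered
```

Calling powered_down_modules() on the three messages above would report slots 3 and 2, and both reasons pass the is_scp_related() check.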
Troubleshooting:
1. Check the SCP counters. If the error-type counters (for example, retransmitted packets) keep increasing between successive checks, the EOBC is likely congested.
received packets = 6398025
transmitted packets = 1801147
retransmitted packets = 110
fast retransmitted packets = 0
loop back packets = 4956282
transmit failures = 0
recv pkts not for me = 0
recv pkts to dead process = 0
recv pkts not enqueuable = 0
response has wrong opcode = 0
response has wrong seqnum = 0
response is not an ack = 0
response is too big = 38975
received expedited packets = 0
transmitted expedited pkts = 0
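Since what matters is whether a counter is increasing, it helps to capture the output twice and compare. Below is a rough Python sketch of that comparison. It assumes each counter is printed as "name = value" as in the output above; the helper names are hypothetical.

```python
def parse_counters(text):
    """Turn 'name = value' lines into a dict of counter name -> int."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value.strip())
    return counters


def increased(before, after):
    """Return counters present in both snapshots whose value went up."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }
```

Normal traffic counters such as received and transmitted packets will always show up in the result, so the ones to look at are the error-type counters.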
2. Check the per-module SCP receive/transmit counters and look for incrementing SCP retries.
Switch1#remote command switch show scp status
Rx 6411004, Tx 1804882, scp_my_addr 0x4
Id Sap Channel name current/peak/retry/dropped/total time(queue/process/ack)
-- ---- ------------------- -------------------------------- ----------------------
0 11 SCP Unsolicited:11 0/ 0/ 0/ 0/ 0 0/ 0/ 0
1 20 SCP Unsolicited:20 0/ 0/ 0/ 0/ 0 0/ 0/ 0
2 0 SCP Unsolicited:0 0/ 3/ 0/ 0/2447294 0/ 0/8244
3 2 SCP Unsolicited:2 0/ 4/ 0/ 0/2516140 0/ 0/ 0
4 21 SCP Unsolicited:21 0/ 0/ 0/ 0/ 0 0/ 0/ 0
5 16 SCP Unsolicited:16 0/ 0/ 0/ 0/ 0 0/ 0/ 0
6 1 SCP Unsolicited:1 0/ 4/ 0/ 0/18962 0/ 0/ 236
7 18 SCP Unsolicited:18 0/ 0/ 0/ 0/ 0 0/ 0/ 0
8 17 SCP Unsolicited:17 0/ 0/ 0/ 0/ 0 0/ 0/ 0
9 33 SCP async: LCP#5 0/ 39/ 0/ 0/652887 152/ 40/ 8
10 32 SCP async: LCP#1 0/ 150/ 0/ 0/128237 456/ 232/ 228
11 36 SCP async: LCP#8 0/ 150/ 0/ 0/99088 444/ 228/ 228
12 35 SCP async: LCP#9 0/ 150/ 0/ 0/98919 816/ 228/ 228
13 37 SCP async: LCP#2 0/ 150/ 0/ 0/126100 828/ 228/ 228
14 41 SCP async: LCP#7 0/ 17/ 0/ 0/86316 204/ 228/ 228
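To spot channels with retries or drops without reading every row, the table can be parsed with a script. This is a minimal Python sketch. It assumes the row layout shown above (Id, Sap, channel name, then current/peak/retry/dropped/total and the three time values); the regular expression and function name are my own and may need adjusting for other software versions.

```python
import re

# Id, Sap, channel name, current/peak/retry/dropped/total, queue/process/ack
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(.+?)\s+"
    r"(\d+)/\s*(\d+)/\s*(\d+)/\s*(\d+)/\s*(\d+)\s+"
    r"(\d+)/\s*(\d+)/\s*(\d+)\s*$"
)


def channels_with_errors(status_text):
    """Return (channel name, retry, dropped) for rows with retries or drops."""
    flagged = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or the Rx/Tx summary line
        retry = int(match.group(6))
        dropped = int(match.group(7))
        if retry or dropped:
            flagged.append((match.group(3), retry, dropped))
    return flagged
```

For the output above the function returns an empty list, because the retry and dropped columns are zero on every channel.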
3. SCP ping from supervisor to module
Switch1#remote command switch test scp ping 3
pinging addr 3(0x3)
assigned sap 0x28
no response from addr 3(0x3) // communication between the supervisor and the line card has a problem
Switch1#remote command switch test scp ping 1
pinging addr 1(0x1)
assigned sap 0x28
addr 1(0x1) is alive // communication between the supervisor and the line card is good
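If you run the SCP ping against every slot from a script, the result can be classified from the captured text. The Python sketch below assumes the two response strings shown above are the only ones of interest; the function name is hypothetical, and how you collect the output from the switch is left out.

```python
def scp_ping_alive(output):
    """Return True if the module answered, False if not, None if unclear."""
    for line in output.splitlines():
        if "is alive" in line:
            return True
        if "no response from addr" in line:
            return False
    return None  # neither response string was found in the output
```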
4. Change the diagnostic level to complete and reseat the module
Switch1(config)#diagnostic level complete
Switch1#show diagnostic result module 2 | inc Diagnostic
Overall Diagnostic Result for Module 2 : PASS
Diagnostic level at card bootup: complete
Further reading:
EOBC interface