Tuesday, October 27, 2015

Switch-module Configuration Protocol (SCP) and line cards powered down

How many times have we seen modules on a switch go down after a reboot? This can happen for a couple of reasons. In this article I'll talk about modules being powered down due to SCP communication failures.

SCP here is the Switch-module Configuration Protocol, not Secure Copy. It is used for communication between the switch processor and the line cards over the Ethernet Out-of-Band Channel (EOBC). The EOBC is a bus on the chassis, and line cards communicate with the supervisor only through this bus. Cisco 65xx series switches have a single EOBC, which operates in half duplex.

Whenever you see a module powered down, check for SCP or keep-alive polling failures. These indicate a problem in communication between the supervisor and the line cards.

Error messages:
Oct 21 09:43:44.121 utc: %ONLINE-SP-6-REGN_TIMER: Module 3, Proc. 0. Failed to bring online because of registration timer event
Oct 21 09:43:44.121 utc: %C6KPWR-SP-4-DISABLED: power to module in slot 3 set off (Module  Failed SCP dnld)

Oct 24 09:43:44.121 utc:%C6KPWR-SP-4-DISABLED: power to module in slot 2 set off (Module not responding to Keep Alive polling)
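When a switch has many modules, it can help to pull the power-down events out of a saved log automatically. The following is a minimal Python sketch, assuming the messages keep the exact wording shown above; the function name `powered_down_modules` and the regular expression are illustrative and not part of any Cisco tool.

```python
import re

# Matches the power-down message format shown in the log excerpts above.
# Assumption: the wording "power to module in slot N set off (reason)" is stable.
POWER_OFF = re.compile(
    r"%C6KPWR-SP-4-DISABLED: power to module in slot (\d+) set off \((.*?)\)"
)


def powered_down_modules(log_text):
    """Return (slot, reason) tuples for every power-down message found."""
    return [
        (int(slot), " ".join(reason.split()))  # collapse repeated spaces
        for slot, reason in POWER_OFF.findall(log_text)
    ]


SAMPLE_LOG = """\
Oct 21 09:43:44.121 utc: %ONLINE-SP-6-REGN_TIMER: Module 3, Proc. 0. Failed to bring online because of registration timer event
Oct 21 09:43:44.121 utc: %C6KPWR-SP-4-DISABLED: power to module in slot 3 set off (Module  Failed SCP dnld)
Oct 24 09:43:44.121 utc:%C6KPWR-SP-4-DISABLED: power to module in slot 2 set off (Module not responding to Keep Alive polling)
"""

for slot, reason in powered_down_modules(SAMPLE_LOG):
    print(f"slot {slot}: {reason}")
```

A reason that mentions SCP or Keep Alive polling points to the supervisor-to-line-card path, which is what the steps below examine.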

Troubleshooting:

1. Check the SCP counters. Run the command several times; if the error-related counters (for example, retransmitted packets) keep increasing between runs, the EOBC is likely congested.


Switch1#remote command switch show scp counters
received packets            = 6398025
transmitted packets         = 1801147
retransmitted packets       = 110
fast retransmitted packets  = 0
loop back packets           = 4956282
transmit failures           = 0
recv pkts not for me        = 0
recv pkts to dead process   = 0
recv pkts not enqueuable    = 0
response has wrong opcode   = 0
response has wrong seqnum   = 0
response is not an ack      = 0
response is too big         = 38975
received expedited packets  = 0
transmitted expedited pkts  = 0
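A single snapshot of these counters says little; what matters is whether they grow between two runs of the command. Here is a small Python sketch of that comparison, assuming the output keeps the `name = value` layout shown above. The helper names are made up for illustration, and the second snapshot is hypothetical data, not output from a real switch.

```python
def parse_scp_counters(output):
    """Parse 'name = value' lines from 'show scp counters' into a dict."""
    counters = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def increased(before, after):
    """Return {counter: delta} for counters that grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# First snapshot: a few lines taken from the output above.
FIRST = """\
retransmitted packets       = 110
transmit failures           = 0
response is too big         = 38975
"""

# Second snapshot: HYPOTHETICAL values, invented to show the comparison.
SECOND = """\
retransmitted packets       = 125
transmit failures           = 0
response is too big         = 38975
"""

delta = increased(parse_scp_counters(FIRST), parse_scp_counters(SECOND))
for name, grew_by in delta.items():
    print(f"{name}: +{grew_by}")
```

Counters that stay flat, even at a large value, are usually history from an earlier event rather than an ongoing problem.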

2. Check the per-module SCP receive/transmit counters and look for incrementing SCP retries.

Switch1#remote command switch show scp status
Rx 6411004,  Tx 1804882,  scp_my_addr 0x4
Id Sap      Channel name    current/peak/retry/dropped/total  time(queue/process/ack)
-- ---- ------------------- --------------------------------  ----------------------
0  11   SCP Unsolicited:11      0/    0/    0/      0/    0      0/   0/   0
1  20   SCP Unsolicited:20      0/    0/    0/      0/    0      0/   0/   0
2  0    SCP Unsolicited:0       0/    3/    0/      0/2447294      0/   0/8244
3  2    SCP Unsolicited:2       0/    4/    0/      0/2516140      0/   0/   0
4  21   SCP Unsolicited:21      0/    0/    0/      0/    0      0/   0/   0
5  16   SCP Unsolicited:16      0/    0/    0/      0/    0      0/   0/   0
6  1    SCP Unsolicited:1       0/    4/    0/      0/18962      0/   0/ 236
7  18   SCP Unsolicited:18      0/    0/    0/      0/    0      0/   0/   0
8  17   SCP Unsolicited:17      0/    0/    0/      0/    0      0/   0/   0
9  33   SCP async: LCP#5        0/   39/    0/      0/652887    152/  40/   8
10 32   SCP async: LCP#1        0/  150/    0/      0/128237    456/ 232/ 228
11 36   SCP async: LCP#8        0/  150/    0/      0/99088    444/ 228/ 228
12 35   SCP async: LCP#9        0/  150/    0/      0/98919    816/ 228/ 228
13 37   SCP async: LCP#2        0/  150/    0/      0/126100    828/ 228/ 228
14 41   SCP async: LCP#7        0/   17/    0/      0/86316    204/ 228/ 228
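The retry and dropped columns are easy to miss in this dense table. The sketch below, in Python, splits each channel row into named fields and lists the channels whose retry or dropped count is non-zero. It assumes the column layout shown above; the regular expression and function names are illustrative only.

```python
import re

# One row of the per-channel table: Id, Sap, channel name, five
# slash-separated counters, then three slash-separated timing values.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(.+?)\s+"
    r"(\d+)/\s*(\d+)/\s*(\d+)/\s*(\d+)/\s*(\d+)\s+"
    r"(\d+)/\s*(\d+)/\s*(\d+)\s*$"
)


def parse_scp_status(output):
    """Return one dict per channel row; non-matching lines are skipped."""
    rows = []
    for line in output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # Rx/Tx summary, column header, or separator line
        rows.append({
            "id": int(m.group(1)),
            "sap": m.group(2),  # kept as text; its number base is not assumed
            "name": m.group(3),
            "current": int(m.group(4)),
            "peak": int(m.group(5)),
            "retry": int(m.group(6)),
            "dropped": int(m.group(7)),
            "total": int(m.group(8)),
        })
    return rows


def channels_with_retries(rows):
    """Names of channels whose retry or dropped counter is non-zero."""
    return [r["name"] for r in rows if r["retry"] or r["dropped"]]


SAMPLE = """\
Rx 6411004,  Tx 1804882,  scp_my_addr 0x4
Id Sap      Channel name    current/peak/retry/dropped/total  time(queue/process/ack)
-- ---- ------------------- --------------------------------  ----------------------
2  0    SCP Unsolicited:0       0/    3/    0/      0/2447294      0/   0/8244
9  33   SCP async: LCP#5        0/   39/    0/      0/652887    152/  40/   8
10 32   SCP async: LCP#1        0/  150/    0/      0/128237    456/ 232/ 228
"""

rows = parse_scp_status(SAMPLE)
print(channels_with_retries(rows))
```

In the sample output above every retry and dropped value is 0, so the list comes back empty. As with the counters in step 1, compare two runs to see whether the values are still climbing.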

3. Run an SCP ping from the supervisor to the module.

Switch1#remote command switch test scp ping 3
pinging addr 3(0x3)
assigned sap 0x28
no response from addr 3(0x3)     // communication between the supervisor and the line card is failing

Switch1#remote command switch test scp ping 1
pinging addr 1(0x1)
assigned sap 0x28
addr 1(0x1) is alive                 // communication between the supervisor and the line card is good
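If you collect the ping output for every slot, the result can be classified with a few lines of Python. This is only a sketch that assumes the two response strings shown above ("is alive" and "no response from addr"); the function name `scp_ping_alive` is hypothetical.

```python
import re


def scp_ping_alive(output):
    """Return True if the module answered, False if it did not,
    and None if neither result line appears in the output."""
    if re.search(r"addr \d+\(0x[0-9a-fA-F]+\) is alive", output):
        return True
    if re.search(r"no response from addr \d+\(0x[0-9a-fA-F]+\)", output):
        return False
    return None


FAILED = """\
pinging addr 3(0x3)
assigned sap 0x28
no response from addr 3(0x3)
"""

GOOD = """\
pinging addr 1(0x1)
assigned sap 0x28
addr 1(0x1) is alive
"""

print(scp_ping_alive(FAILED))  # False
print(scp_ping_alive(GOOD))    # True
```

Returning None for unrecognized output keeps a truncated or unexpected capture from being mistaken for a dead module.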


4. Change the diagnostic level to complete and reseat the module, then check the diagnostic result once the module has booted.

Switch1(config)#diagnostic level complete

Switch1#show diagnostic result module 2 | inc Diagnostic
  Overall Diagnostic Result for Module 2 : PASS
  Diagnostic level at card bootup: complete


 



Further reading:
EOBC interface




