Outage in New Zealand Internet Exchange

[OUTAGE] AKL-IX | Layer 2 reachability

Resolved Minor
Started April 03, 2023

Incident Report

We are currently investigating reports of partial breaks in layer 2 communications to peers between MDR and Datacentre220.

Latest Updates (sorted most recent first)
RESOLVED at 04/03/2023 07:17AM

The incident has been resolved; however, the root cause is still under investigation. If any impactful or hazardous works are required for the permanent fix, notices will be sent out.

Today one of our redundant direct point-to-point paths between MDR and DataCentre220 was taken offline due to an unrelated layer 1 fault. This occurs from time to time and is normally neither an issue nor communicated to peers, as we have secondary paths ready to take over in the event of failure. OTDR tests were conducted and work to fix the fault was underway, which took time.

We shut this path down at 10:42am (NZST) when disruptive works were required, and our RSVP-TE secondary LSP took over, shifting that path's traffic between MDR and DataCentre220 via VDC Albany, as expected. A short time later, some peers advised of reachability issues between the two sites, with bilateral sessions dropping and ARP failing. At this time the AKL-IX layer 2 fabric was checked and no immediate issue stood out; we could not replicate the issues experienced, and the signalled RSVP-TE LSP was performing as expected. Additional requests were made to specific peers to identify the issue in greater detail.

With switch control planes reporting all in order but these peers clearly having issues, the failed-over LSP was removed to force traffic over the remaining site-to-site path, to no avail. Thankfully, the original layer 1 fault was resolved, and the path and its accompanying LSPs were brought back up at ~2:03pm (NZST), which resolved the issue.

So far this is isolated to a one-way MAC learning event from our switches between pe3.akl1 (DataCentre220) and pe2.akl3 (MDR), whereby ARP requests originating from MDR toward pe3.akl1 at DataCentre220 were being dropped.

Apologies for the inconvenience caused by this outage; we're working hard on ensuring this won't occur again.

INVESTIGATING at 04/03/2023 02:28AM

A fix has been implemented as of 2:03pm (NZST). Impacts were seen between peers on pe2.akl3 at MDR and pe3.akl1 at Datacentre220 with the root cause still under analysis.

INVESTIGATING at 04/03/2023 12:41AM

We are currently investigating reports of partial breaks in layer 2 communications to peers between MDR and Datacentre220.
