Outage in Scaleway

[DC5] - Servers down in Room 1

Resolved Major
June 12, 2023 - Started over 2 years ago - Lasted 4 months
Official incident page

Outage Details

Pro-6-x and Pro-5-x servers in DC5 Room 1 have IPMI down. We are currently investigating this issue.
Components affected
Scaleway DC5
Latest Updates (sorted most recent first)
RESOLVED almost 2 years ago - at 10/24/2023 04:17PM

New updates will be posted here, but the issue is currently resolved.

INVESTIGATING almost 2 years ago - at 10/23/2023 09:56AM

The issue has stabilized, but investigations into previous cases are still in progress, with the help of the server manufacturer, to understand the root cause.

INVESTIGATING about 2 years ago - at 06/28/2023 10:36AM

This is a very quick follow-up concerning our favourite bizarre production incident.

- Small numbers of servers continue to exhibit errant behaviour for the first time; we continue to replace them as necessary.
- We have sent hardware samples to our supplier for analysis, along with other information such as environmental data, as well as the findings from our own investigations.
- We will communicate the results of this investigation when there’s more to report.

INVESTIGATING over 2 years ago - at 06/21/2023 08:14AM

Most impacted servers have been replaced.
Our teams are working on the last few cases. We have sent collected data to our supplier for analysis.
We are working on reproducing various conditions related to the issue in order to pinpoint the root cause.

We will provide an update no later than Monday 5:00PM UTC+2.

INVESTIGATING over 2 years ago - at 06/16/2023 04:50PM

This is a follow-up regarding the continuing incident affecting a small number of servers at the DC5 data center.

- While this is still an open incident, only a handful of newly-misbehaving servers have been detected since 2023-06-15 (Thursday) at 17:00 UTC+2.

- Remediation actions are ongoing, and only a small fraction of the total Quanta X10E-9N machines remain down. The newly racked replacement blades have not exhibited the errant behavior.

- We do not yet know the root cause of the failures; however we continue to work with our hardware supplier and are developing a few hypotheses (which will need to be verified).

INVESTIGATING over 2 years ago - at 06/14/2023 08:27AM

Here is an update on the situation and some information on what happened:

- On the morning (UTC+2) of 2023-06-12 (Monday), an incident was identified. Over the preceding weekend, a series of Quanta X10E-9N machines, representing roughly 0.5% of the total machines in DC5, began to exhibit errant behavior resulting in complete crashes of the machines.
The rising number of failures triggered our incident process on Monday.

- The errant behavior is specific to this type of hardware—and only in DC5. Identical machines in our other data centers are not exhibiting this behavior, nor are any other types of machines.

- Speaking frankly, the behavior is remarkably inconsistent. We have not yet identified a pattern related to physical position, electrical connection, uptime, climate conditions, firmware version, or serial numbers (for example). That said, we were and are engaged in remediation actions, consisting largely of swapping the hard drives from the misbehaving machines into new hardware.

- We are currently working with the hardware supplier to understand the root cause of this issue and expect to have an explanation and action plan for prevention on 2023-06-16 (Friday).

INVESTIGATING over 2 years ago - at 06/13/2023 08:15AM

We are still investigating the anomaly.

We invite you to contact support if you notice an anomaly on your server.
We can then take action on your server.

INVESTIGATING over 2 years ago - at 06/12/2023 04:24PM

Some servers are still down. Please get back in touch with our support if your server is still unavailable.

INVESTIGATING over 2 years ago - at 06/12/2023 07:33AM

Pro-6-x and Pro-5-x servers in DC5 Room 1 are experiencing issues: unavailability, IPMI down, unexpected restarts.
We are currently investigating this issue.

INVESTIGATING over 2 years ago - at 06/12/2023 07:33AM

We are continuing to investigate this issue.

INVESTIGATING over 2 years ago - at 06/12/2023 05:53AM

Pro-6-x and Pro-5-x servers in DC5 Room 1 have IPMI down.
We are currently investigating this issue.
