Outage in Secureworks

Red Cloak Ingest Delay In All Environments

Status: Resolved · Severity: Minor
October 18, 2023 - Started over 1 year ago - Lasted 2 days

Outage Details

We have identified the issue and are currently monitoring the Red Cloak ingest delay in all environments.
Latest Updates (sorted newest to oldest)
RESOLVED over 1 year ago - at 10/20/2023 06:43PM

The delayed ingest experienced earlier has seen significant recovery with the full restoration of the AWS S3 Cross Region Replication (CRR) service. Additionally, we have successfully deployed a new Taegis subsystem to handle cross-region replication independently, eliminating our dependency on AWS CRR. The majority of downstream Taegis subsystems have caught up with the backlog; details on the remaining subsystems and their estimated recovery times are below:

US-1 (Charlie):
* Red Cloak Agent's Event Tracing for Windows reassembly into full ScriptBlock events will likely take another 8 hours to catch up.
* Event Filter (Taegis Watchlist) alerting for Process events will likely take another 8 hours to catch up.
* Tactic Graphs requiring Netflow events are expected to recover overnight.
* Raw Event Search for Netflow events is expected to catch up in 6 hours, and Auth events in 3 hours.
* Process Tree creation is expected to fully recover in 4 hours.

US-2 (Delta):
* Red Cloak Agent's Event Tracing for Windows reassembly into full ScriptBlock events will likely take another hour to catch up.
* Event Filter (Taegis Watchlist) alerting for Process events will likely take another 8 hours to catch up.

We appreciate the patience and cooperation of all stakeholders during this period. The newly deployed Taegis subsystem not only resolves the current issue but also strengthens our infrastructure against similar incidents in the future.
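
The update does not describe how the new Taegis replication subsystem works internally. As a rough, illustrative sketch only, an application-managed cross-region copy that sidesteps S3 CRR could look something like the following; the bucket names, region choices, and key prefix are hypothetical.

```python
# Illustrative sketch only: an application-managed cross-region copy loop that
# does not depend on S3 Cross Region Replication. Bucket names, regions, and
# the key prefix are hypothetical; the actual Taegis subsystem is not
# described in the update.
import boto3

SOURCE_BUCKET = "telemetry-us-east-1"    # hypothetical source bucket
DEST_BUCKET = "telemetry-us-west-2"      # hypothetical destination bucket
PREFIX = "red-cloak/"                    # hypothetical key prefix

source = boto3.client("s3", region_name="us-east-1")
destination = boto3.client("s3", region_name="us-west-2")

paginator = source.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # Server-side copy into the destination region; S3 performs the copy,
        # so object bodies never pass through this process.
        destination.copy_object(
            Bucket=DEST_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
        )
```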

IDENTIFIED over 1 year ago - at 10/20/2023 01:37PM

AWS S3 Replication has now fully recovered and AWS's backlog has been eliminated. We are now working through the telemetry backlog in Taegis. We cannot yet provide an estimate for when this backlog will be eliminated, but will do so as soon as we can.

In addition, the code change that was deployed yesterday is continuing to process real-time telemetry without relying on S3 Replication.

We will provide another update no later than 16:00 UTC.

IDENTIFIED over 1 year ago - at 10/20/2023 02:25AM

AWS S3 Replication has continued to recover, and we are now ingesting backlogged telemetry. Because the backlog exists on AWS, we cannot yet provide an accurate estimate for how long it will take to work through; AWS estimates that their backlog will clear in approximately 12 hours. We will provide a better estimate as soon as it is possible.

We have also deployed a fix to ingest new telemetry without relying on S3 replication. Thus, current telemetry is now being processed alongside older, backlogged telemetry.

We will provide another update no later than 13:00 UTC.
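
The fix described above processes current telemetry directly while the older, backlogged telemetry drains in parallel. As a loose conceptual sketch only (the actual Taegis pipeline is not described in the update), a dual-path consumer that keeps live data from queuing behind the backlog might look like this:

```python
# Loose conceptual sketch only: feed live and backlogged telemetry through the
# same processing step from independent queues, so live events are not held up
# behind the backlog. Queue contents and process() are hypothetical.
import queue
import threading

live_events = queue.Queue()      # telemetry read directly from the source region
backlog_events = queue.Queue()   # telemetry arriving late via S3 replication


def process(event):
    # Placeholder for downstream handling (alerting, search indexing, etc.).
    print("processed:", event)


def drain(source: queue.Queue) -> None:
    while True:
        event = source.get()
        if event is None:        # sentinel value shuts the worker down
            break
        process(event)


# One worker per path; both paths call the same process() function.
threading.Thread(target=drain, args=(live_events,), daemon=True).start()
threading.Thread(target=drain, args=(backlog_events,), daemon=True).start()
```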

IDENTIFIED over 1 year ago - at 10/19/2023 07:52PM

AWS S3 Replication has continued to recover, but is still not fully healthy.

At this time we are preparing to deploy a fix that should mitigate this issue. We will share more details about this fix in the next update.

We will provide another update no later than 23:00 UTC.

IDENTIFIED over 1 year ago - at 10/19/2023 04:57PM

We are continuing to monitor the situation. AWS S3 Replication has continued to recover, but is not fully healthy yet. As a reminder, please visit https://health.aws.amazon.com/health/status for their latest status.

We are also continuing to explore possible alternative solutions that will allow us to recover more quickly, and will provide an update on these efforts as soon as possible.

We will provide another update no later than 20:00 UTC.
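
For teams that prefer polling AWS status programmatically rather than reloading the status page, the AWS Health API exposes open service events. The sketch below is an illustration only and assumes an account with a Business or Enterprise Support plan, which the Health API requires.

```python
# Sketch only: list open AWS Health events for S3. The Health API requires a
# Business or Enterprise Support plan and is served from the us-east-1
# global endpoint.
import boto3

health = boto3.client("health", region_name="us-east-1")

response = health.describe_events(
    filter={
        "services": ["S3"],
        "eventStatusCodes": ["open"],
    }
)

for event in response.get("events", []):
    print(event["arn"], event.get("region"), event.get("statusCode"))
```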

IDENTIFIED over 1 year ago - at 10/19/2023 02:13PM

We are continuing to monitor the ingest delay.

AWS continues to work towards S3 Replication recovery. We are currently exploring alternative solutions to mitigate the outage.

We will provide another update by 17:00 UTC.

IDENTIFIED over 1 year ago - at 10/19/2023 12:04PM

We are continuing to monitor the ingest delay.

At this time, AWS has implemented software fixes and is also performing diagnostic checks to ensure that recovery is continuing. We have not yet seen S3 replication resume; when it does, we will provide a more accurate timeline for when ingest delay will be fully resolved.

For the latest from AWS, please visit the following link: https://health.aws.amazon.com/health/status.

IDENTIFIED over 1 year ago - at 10/19/2023 02:48AM

We are continuing to monitor the situation.

Once S3 replication has recovered we will be able to provide a more accurate timeline for when ingest delay will be fully resolved.

IDENTIFIED over 1 year ago - at 10/18/2023 11:53PM

The Red Cloak ingest delay is due to an issue on Amazon's side affecting all AWS S3 replication. Once replication recovers, AWS will start processing the backlog of delayed replication requests.

For the most up-to-date information from AWS, please visit the following link: https://health.aws.amazon.com/health/status.
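
S3 also reports per-object replication state, which is one way to see when replication resumes for specific objects. The bucket and key in the sketch below are hypothetical.

```python
# Sketch only: inspect the replication state S3 reports for a single object.
# The bucket and key are hypothetical. ReplicationStatus values include
# PENDING, COMPLETED, FAILED, and REPLICA (the last appears on the copy in
# the destination bucket).
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

head = s3.head_object(
    Bucket="telemetry-us-east-1",          # hypothetical source bucket
    Key="red-cloak/example-event.json",    # hypothetical object key
)

print(head.get("ReplicationStatus", "no replication status reported"))
```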

IDENTIFIED over 1 year ago - at 10/18/2023 01:30PM

We have identified the issue and are currently monitoring the Red Cloak ingest delay in all environments.
