Outage in AWS

Elevated Replication latency - N. Virginia

Resolved · Minor
October 18, 2023 - Lasted 2 days


Outage Details

We can confirm elevated latencies for S3 Replication Time Control out of the US-EAST-1 Region. We are pursuing multiple mitigation paths, and our engineering teams are working on a fix. We will provide a further update by 5:15 AM PDT.
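
For context, Replication Time Control (RTC) is an opt-in setting on an S3 replication rule that commits most objects to replicate within 15 minutes and turns on replication metrics. The sketch below shows what such a rule looks like via boto3's PutBucketReplication; the bucket names, account ID, and IAM role ARN are hypothetical placeholders rather than details from this incident, and the source bucket must already have versioning enabled.

```python
# Minimal sketch of an S3 replication rule with Replication Time Control
# (RTC), the feature affected by this event. All names and ARNs below are
# hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-source-bucket",  # versioning must already be enabled
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-replication-role",
        "Rules": [
            {
                "ID": "rtc-rule",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-bucket",
                    # RTC: replicate most new objects within 15 minutes.
                    "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                    # RTC requires replication metrics to be enabled.
                    "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
                },
            }
        ],
    },
)
```
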
Latest Updates (sorted newest to oldest)
UPDATE - 10/20/2023 08:30 AM

Incoming replication traffic is being processed without delay for Replication Time Control (RTC), Cross-Region Replication, and Same-Region Replication by our replication subsystem, which is operating normally. We are 99% done processing the backlog of delayed requests for Replication Time Control (RTC). We continue to work on the backlog of delayed replication for Cross-Region Replication and Same Region Replication, which is roughly 20% complete. We will provide a further update by Oct 20 3:30 AM PDT.

UPDATE - 10/20/2023 06:56 AM

After making further modifications to accelerate replication processing, our replication subsystem is now replicating any new incoming traffic without delay for all of our policy-driven replication, which includes RTC, Cross-Region Replication, and Same Region Replication. With the replication subsystem operating normally and processing new replication traffic without delay, we are focusing on clearing the backlog of delayed replication. We will provide a further update by Oct 20 1:30 AM PDT.

UPDATE - 10/20/2023 05:45 AM

At this point, our recovery operations are focused on processing the remaining replication backlog as our replication subsystem has been operating normally since 3:44 PM PDT. Replication Time Control (RTC) backlogs are almost completely processed. We’re focused on processing the high volume of replication traffic from Cross-Region Replication (CRR), Same Region Replication (SRR), and Replication operations in S3 Batch Operations from buckets in the US-EAST-1 Region. We have put together a plan to carefully accelerate the system redrives of delayed replication. We are in the early phase of the acceleration and will provide a further update by Oct 19 11:45 PM PDT.

UPDATE - 10/20/2023 04:54 AM

The replication subsystem has been operating normally since 3:44 PM PDT and working at maximum throughput to process the different types of replication traffic and backlog for S3 objects in the US-EAST-1 Region. So far, we have almost fully processed the traffic and backlog for RTC replication requests made for storage in the US-EAST-1 Region. Less than 10% of that replication traffic remains, and we continue to mitigate it. The replication subsystem is also processing, in parallel, the replication traffic and backlog for Cross-Region Replication (CRR), Same Region Replication (SRR), and Replication operations in S3 Batch Operations. At this time, we believe we will complete processing all replication traffic in approximately the next 8 hours, but we are working on mitigations now to accelerate our processing. We will provide a further update by Oct 19 10:45 PM PDT.

UPDATE - 10/20/2023 03:15 AM

We continue to see recovery for S3 Replication Time Control (RTC). Replication operations in S3 Batch Operations, Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies are continuing to work through backlogged events. We expect to complete backlog processing in the next 9 hours, although we expect many buckets will recover sooner. We will provide a further update by Oct 19 9:15 PM PDT.

UPDATE - 10/20/2023 02:16 AM

We continue to see recovery for S3 Replication Time Control (RTC). Replication operations in S3 Batch Operations, Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies are continuing to work through backlogged events. We expect to complete backlog processing in the next 10 hours, although we expect many buckets will recover sooner. We will provide a further update by Oct 19 8:15 PM PDT.

UPDATE - 10/20/2023 01:17 AM

We continue to see recovery for S3 Replication Time Control (RTC). Replication operations in S3 Batch Operations, Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies are continuing to work through backlogged events. We expect to complete backlog processing in the next 11 hours, although we expect many buckets will recover sooner. We will provide a further update by Oct 19 7:15 PM PDT.

UPDATE - 10/20/2023 12:19 AM

We have completed processing Replication Time Control (RTC) backlogs for all but a small number of buckets enabled for RTC, and we continue to work directly with the remaining few customers on backlog progress for their buckets. Replication operations in S3 Batch Operations, Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies are all in recovery and are working through backlogged events. We expect that processing to complete over the next 12 hours, although we expect many buckets will recover sooner. We will provide a further update by October 19 6:15 PM PDT.

UPDATE - 10/19/2023 11:48 PM

We continue to process the remaining Replication Time Control (RTC) backlogs with additional capacity and expect the bulk of buckets enabled for RTC to be fully caught up within the next hour. We have completed re-enabling the Replication operation in S3 Batch Operations, Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies, and we are adding more capacity to process all three. We will provide an ETA for full resolution within the next hour. We will provide you with a further update by Oct 19 5:45 PM PDT.

UPDATE - 10/19/2023 10:44 PM

We continue to process the remaining Replication Time Control (RTC) backlogs with additional capacity and expect the bulk of buckets enabled for RTC to be fully caught up in the next 2 hours. We completed re-enabling the Replication operation in S3 Batch Operations and have begun re-enabling processing of Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies. Once we have re-enabled all processing, we will be able to provide an ETA for full resolution. We will provide you with a further update by Oct 19 4:45 PM PDT.

UPDATE - 10/19/2023 09:50 PM

We have accelerated the processing of Replication Time Control (RTC) backlogs for remaining buckets and are continuing to add capacity. We began re-enabling the Replication operation in S3 Batch Operations and expect it to be fully enabled within the next 45 minutes. We will also begin re-enabling processing of Cross-Region Replication (CRR) policies and Same Region Replication (SRR) policies in the next hour. Once we have re-enabled all processing, we will be able to provide an ETA for full resolution. We will provide you with a further update by Oct 19 3:45 PM PDT.

UPDATE - 10/19/2023 08:47 PM

We have completed processing of Replication Time Control (RTC) backlogs for a number of buckets. We continue working on processing the remaining backlog. We are now executing our plan to increase throughput for replication request processing for Cross-Region Replication (CRR) policies, Same Region Replication (SRR) policies and the Replication operation in S3 Batch Operations. We will provide you with a further update by Oct 19 2:45 PM PDT.

UPDATE - 10/19/2023 07:43 PM

We have put together a plan to increase throughput for replication request processing for Cross-Region Replication (CRR) policies, Same Region Replication (SRR) policies and the Replication operation in S3 Batch Operations. We are in the early stages of implementing that plan. In parallel, we continue to process the remaining RTC current and backlog replication requests. We will provide you with a further update by Oct 19 1:45 PM PDT.

UPDATE - 10/19/2023 06:50 PM

We are seeing good progress with the RTC replication subsystem processing current and past replication requests. We are now working on a plan to increase throughput for other replication processes (specifically, Cross-Region Replication and the Replication operation in S3 Batch Operations) to accelerate current and past replication requests for S3 objects in the US-EAST-1 Region.

Customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 12:45 PM PDT.
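
As a rough illustration of this workaround, a single object can be copied to a second bucket with the COPY API via boto3; the bucket names and key below are hypothetical placeholders.

```python
# Sketch of the suggested workaround: copy an object to another bucket
# directly with S3's COPY API rather than waiting on delayed replication.
# Bucket names and the key are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

s3.copy_object(
    CopySource={"Bucket": "example-source-bucket", "Key": "data/report.csv"},
    Bucket="example-backup-bucket",  # e.g. a bucket in another Region
    Key="data/report.csv",
)
# For objects larger than 5 GB, use the managed transfer helper instead,
# which performs a multipart copy:
# s3.copy(CopySource={...}, Bucket="example-backup-bucket", Key="data/report.csv")
```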

UPDATE - 10/19/2023 05:46 PM

We are operating our replication subsystem at full throughput now so we can process RTC replication and backlog from the US-EAST-1 Region to different destinations. Customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by October 19 11:45 AM PDT.

UPDATE - 10/19/2023 04:44 PM

We continue to see improvement in the throughput and processing for our replication subsystems replicating storage from the US-EAST-1 Region to different destinations. We continue to focus on restoring RTC replication to normal operations and processing the backlog. Customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 10:45 AM PDT.
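
For copying many objects, the S3 Batch Operations route mentioned in these updates drives the same COPY operation from a CSV manifest of bucket/key pairs. A hedged sketch follows; the account ID, ARNs, and manifest ETag are all hypothetical placeholders.

```python
# Sketch of an S3 Batch Operations copy job fed by a CSV manifest.
# All ARNs, the account ID, and the ETag are hypothetical placeholders.
import boto3

s3control = boto3.client("s3control", region_name="us-east-1")

s3control.create_job(
    AccountId="123456789012",
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::123456789012:role/example-batch-ops-role",
    Operation={
        # The Batch Operations equivalent of the COPY API.
        "S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::example-backup-bucket"}
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],  # one source object per CSV row
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::example-manifests/keys-to-copy.csv",
            "ETag": "example-manifest-etag",
        },
    },
    Report={
        "Enabled": True,
        "Bucket": "arn:aws:s3:::example-reports",
        "Format": "Report_CSV_20180820",
        "ReportScope": "AllTasks",
        "Prefix": "batch-copy",
    },
)
```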

UPDATE - 10/19/2023 03:44 PM

We continue to sustain high throughput for RTC replication traffic for storage using the US-EAST-1 Region. We are prioritizing RTC traffic first and will then accelerate Cross-Region Replication (CRR) traffic. Customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 9:45 AM PDT.

UPDATE - 10/19/2023 02:46 PM

We are seeing steady progress in recovery of RTC replication traffic for storage using the US-EAST-1 Region, sustaining 80% throughput for First In First Out (FIFO) RTC replication. We will continue to increase our throughput to get back to normal operations for RTC FIFO replication and start processing our backlog of RTC replicated objects. Customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 8:45 AM PDT.

UPDATE - 10/19/2023 01:44 PM

We have started to increase the amount of replication traffic for our First In First Out (FIFO) replication. We are monitoring progress closely as we increase both the number of replication channels used for traffic and the bandwidth associated with each channel. Customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 7:45 AM PDT.

UPDATE - 10/19/2023 12:44 PM

We have completed the deep diagnostic checks needed to ensure that the recovery process can proceed as planned. Our next step is to increase the throughput for replication traffic. Our guidance remains that customers needing to replicate or back up S3 data while we restore normal operations to the replication subsystem should use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 6:45 AM PDT.
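
The third suggested alternative, AWS Backup, can take an on-demand backup of a bucket. A minimal sketch follows, assuming S3 support in AWS Backup is already configured on the account; the vault name, bucket, and role ARN are hypothetical placeholders.

```python
# Sketch of the AWS Backup alternative: start an on-demand backup job for
# an S3 bucket. All names and ARNs are hypothetical placeholders, and the
# account must already have AWS Backup set up for S3.
import boto3

backup = boto3.client("backup", region_name="us-east-1")

job = backup.start_backup_job(
    BackupVaultName="example-vault",
    ResourceArn="arn:aws:s3:::example-source-bucket",
    IamRoleArn="arn:aws:iam::123456789012:role/example-backup-role",
)
print("Started backup job:", job["BackupJobId"])
```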

UPDATE - 10/19/2023 11:40 AM

We have completed the isolation of a portion of traffic to a dedicated channel. We are continuing to run the deep diagnostic checks that we need to complete to ensure that the recovery progress is proceeding as planned. We will provide you with a further update by Oct 19 5:45 AM PDT.

UPDATE - 10/19/2023 10:48 AM

We're currently running a deep diagnostic check to make sure that our recovery progress is proceeding as planned. This involves validating a number of metrics across different subsystems. Once we determine that the changes made in our system are having the appropriate impact, we will proceed to our next step of scaling up replication traffic. We will provide you with a further update by Oct 19 4:30 AM PDT.

UPDATE - 10/19/2023 09:36 AM

We continue to make changes to improve the replication processing from the US-EAST-1 Region. We've completed the deployment of the code change to accelerate the replication operations subsystem and are isolating a portion of traffic to a dedicated channel. We will then scale up the traffic settings for the replication channels, which will increase the transactions per second (TPS) for replication processing for our RTC customers. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 3:30 AM PDT.

UPDATE - 10/19/2023 08:46 AM

We continue to make changes to improve the replication processing from the US-EAST-1 Region. In the last hour, we identified a portion of traffic that we plan to direct to a dedicated channel, and performed some diagnostics on our scaled-down replication subsystem. Next, we will deploy a code change to a subsystem that aggregates replication operations, and then we will scale up the traffic settings for the replication channels. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 2:30 AM PDT.

UPDATE - 10/19/2023 07:29 AM

We'd like to share more information about the recovery efforts we've been pursuing to reduce replication processing delays for objects in the US-EAST-1 Region replicating to other AWS Regions or to other buckets in the US-EAST-1 Region. Since our alarms fired on Oct 18 at 12:41 AM PDT upon detecting replication delays, our engineers have been focused on mitigating the impact of the delayed replication through changes to replication subsystems, resource adjustments, and other modifications to the replication environment.

We’ve seen some improvements in the replication of storage, but have not fully restored normal operations. Specifically, the replication rate of storage has increased for some customers’ objects and resulted in less delay. However, we have not yet restored normal operations for First In First Out (FIFO) replication requests nor processed our backlog of replicated storage accumulated during this event. We continue to work to resolve the issue with software and other mitigations that we’ve identified.

As we have noted, customers who want to replicate or back up their storage while we are restoring our replication system can use the S3 COPY API directly or through S3 Batch Operations, or use AWS Backup. For customers who have a lifecycle policy on replicated data, we can confirm that lifecycle policies will not take action on replicated data until all the replicated data in the backlog has been delivered to destination buckets. We want to confirm that during this event, all other S3 API requests (GET, PUT and LIST) and functionality have been operating normally in the US-EAST-1 Region. Our engineers have also audited and confirmed that replication operations in source AWS Regions other than the US-EAST-1 Region are and have been operating normally during this event as well.

We continue to actively work to resolve the issue and will continue until we restore normal operations for replication from the US-EAST-1 region and clear our backlog of delayed replicated objects.

UPDATE - 10/19/2023 06:35 AM

We are making progress on restarting a replication subsystem that distributes work via queues. We gradually slowed the system and are now scaling it back up so that we can closely observe how the software processes the increased load and mitigate any issues. That will allow us to resolve any problems in this subsystem, which plays an important role in processing replication for storage in US-EAST-1. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by Oct 19 12:30 AM PDT.

UPDATE - 10/19/2023 05:33 AM

We continue to implement different mitigations that improve replication processing for objects in the US-EAST-1 Region. Each mitigation is helping, although we have not yet fully recovered. We're currently in the process of restarting a replication subsystem that distributes work via queues. Once that system is fully operational, it will speed up the processing of replication for storage in US-EAST-1. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by 11:30 PM PDT.

UPDATE - 10/19/2023 04:33 AM

Our partition configuration change has increased the throughput of our replication delivery, but we have not yet returned to normal operations for First In First Out (FIFO) replication request processing. We are investigating an optimization to the number of channels used for replication, which should also increase the rate at which replication requests are processed. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by 10:30 PM PDT.

UPDATE - 10/19/2023 03:30 AM

We are working to sustain the improvements in replication processing that we saw when we deployed our partition configuration change. In parallel, we are investigating other mitigations. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by 9:30 PM PDT.

UPDATE - 10/19/2023 02:34 AM

We have finished implementing a change to adjust partition configuration within our replication subsystem, and have seen a corresponding improvement in replication processing for one part of our replication subsystem. We have other mitigations that we're still testing and validating for impact. Customers who need to replicate or back up S3 data while we restore normal operations to the replication subsystem can use S3's COPY API, S3's Batch Operations using the COPY API, or AWS Backup. We will provide you with a further update by 8:30 PM PDT.

UPDATE - 10/19/2023 01:28 AM

The replication subsystem is slowly recovering as we continue to mitigate impact to replication processing. We are evaluating new mitigations and putting them in place to speed up recovery. We will provide you with a further update by 7:30 PM PDT.

UPDATE - 10/19/2023 12:31 AM

Our changes to improve the speed of replication processing are helping our replication subsystem recover. However, replication processing is still delayed so we are looking at different ways to accelerate the recovery. For customers who have a lifecycle policy on replicated data, the lifecycle policy will not take action until the replication system is fully restored and the backlog is processed. We will provide you with a further update by 6:30 PM PDT.

UPDATE - 10/18/2023 11:36 PM

We have made additional changes to speed up replication processing and are closely monitoring the progress. We are taking multiple parallel paths to further improve the throughput rates. We continue to see incremental improvements in the number of new requests processed as expected, and once the system has recovered, we will start processing the backlog of delayed replication requests. We will provide you with a further update by 5:30 PM PDT.

UPDATE - 10/18/2023 10:39 PM

We recently made an additional change to speed up replication processing and are monitoring the progress. We are sustaining throughput between 40% and 60% of normal replication levels. Once we see the replication of the latest requests improving, we will start processing the backlog of delayed replication requests. We will provide you with a further update by 4:30 PM PDT.

UPDATE - 10/18/2023 09:29 PM

We are continuing to monitor progress and have sustained throughput between 40% and 60% of normal replication levels. We are focused on getting the replication engine processing the latest replication requests and have not yet started processing the delayed replication backlog. We are now preparing a third change that we expect to further increase throughput, and we are performing additional analysis to help identify any remaining issues. Once throughput has increased to a level that allows additional processing, we will begin processing the replication backlog. We will provide you with a further update by 3:30 PM PDT.

UPDATE - 10/18/2023 08:31 PM

The first software change has finished deploying and the second is deploying now. As a result of these two changes, we are now seeing recovery to 60% of normal throughput. We are continuing to monitor these deployments and evaluate additional mitigations to further increase throughput to a level that will allow us to process the outstanding backlog of objects awaiting replication more quickly. We will provide you with a further update by 2:30 PM PDT.

UPDATE - 10/18/2023 07:29 PM

We have further updates on the subsystem condition that is preventing S3 Replication from reacting normally to adjustments in volume. This issue is being caused by stale reservations for S3 Replication resources. We are deploying the first of two software changes now, and expect to see improvement in the next hour. Simultaneously, we are preparing a second update with further mitigations. We will provide you with a further update by 1:30 PM PDT.

UPDATE - 10/18/2023 06:30 PM

We have developed a software change to address the root cause of this issue, which involves the subsystem responsible for adjusting the capacity of the replication system. This subsystem encountered a condition that prevented it from reacting normally to adjustments in replication volume, leading to delays replicating S3 objects from the US-EAST-1 Region to other regions. S3 GET and PUT operations, as well as other S3 operations, have been and continue to operate normally during this event. We will provide you with a further update by 12:30 PM PDT.

UPDATE - 10/18/2023 05:35 PM

The amount of traffic replicated normally continues to increase for S3 objects with S3 Replication Time Control (RTC) using the US-EAST-1 Region as the source. Customers can use their S3 Replication metrics, accessible through the S3 Management Console, to get a real-time view of processing time. We can also confirm that S3 objects being replicated from the US-EAST-1 Region using S3's Cross-Region Replication or the Replication operation in S3 Batch Operations are affected by this processing delay. We will provide you with a further update by 11:30 AM PDT.
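
Beyond the console, S3 Replication metrics are also published to CloudWatch, so the replication delay can be watched programmatically. A sketch of pulling the last hour of replication latency for one rule follows; the bucket names and rule ID are hypothetical placeholders.

```python
# Sketch: S3 Replication metrics (enabled by RTC) are published to
# CloudWatch under the AWS/S3 namespace. Bucket names and the rule ID
# are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="ReplicationLatency",  # seconds the rule is behind
    Dimensions=[
        {"Name": "SourceBucket", "Value": "example-source-bucket"},
        {"Name": "DestinationBucket", "Value": "example-destination-bucket"},
        {"Name": "RuleId", "Value": "rtc-rule"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,  # 5-minute datapoints
    Statistics=["Maximum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"], "seconds behind")
```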

UPDATE - 10/18/2023 04:23 PM

We are now seeing recovery for 10-20% of affected replication traffic for S3 objects with Replication Time Control (RTC) in the US-EAST-1 Region. We continue to make progress on resolving the root cause of the event. This event only impacts S3 objects replicated from the US-EAST-1 Region as the source to another AWS Region as the destination. Replication of S3 objects from any other source Region has been operating normally. We will provide you with a further update by 10:30 AM PDT.

UPDATE - 10/18/2023 03:33 PM

We're making progress in resolving the root cause that triggered the delay in replicating S3 storage from the US-EAST-1 Region to other regions or to other S3 buckets in US-EAST-1. We are seeing some replication occurring normally at this time, although some objects continue to be delayed for replication. Once S3 replication is operating normally, it will automatically start processing the backlog of delayed S3 objects. We will provide you with a further update by 9:30 AM PDT.

UPDATE - 10/18/2023 02:22 PM

S3 Replication of some objects from the US-EAST-1 Region to other regions is currently delayed. We've isolated the cause to an issue within the system that manages the distribution of replication tasks across the fleet that completes object replication. We are currently working to isolate the specific root cause within this system, and in parallel taking actions to work around the delays. We are tracking all objects that are pending replication, and expect that once this issue is resolved, all pending objects will be replicated. There is no current ETA for resolution.

UPDATE - 10/18/2023 01:08 PM

We are still attempting mitigations for the elevated latencies for S3 Replication Time Control out of the US-EAST-1 Region. As a result of this issue, customers might observe that the 'x-amz-replication-status' header on source objects remains 'PENDING'. Our initial mitigation efforts have not yet resulted in full recovery, and we continue to work toward identifying the full root cause. We will provide you with another update by 7:00 AM PDT.
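
To check that header programmatically, a HEAD request on a source object returns its replication status. A minimal sketch, with a hypothetical bucket and key:

```python
# Sketch: read an object's replication status (surfaced as the
# x-amz-replication-status header) with a HEAD request. Bucket and key
# are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

head = s3.head_object(Bucket="example-source-bucket", Key="data/report.csv")
status = head.get("ReplicationStatus")  # e.g. PENDING, COMPLETED, FAILED, REPLICA
if status == "PENDING":
    print("Object is still awaiting replication.")
else:
    print("Replication status:", status)
```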

UPDATE - 10/18/2023 12:09 PM

We continue to work on resolving elevated latencies for S3 Replication Time Control out of the US-EAST-1 Region. As a result of this issue, customers might observe that the 'x-amz-replication-status' header on source objects remains 'PENDING'. We will continue to provide regular updates as we progress.

UPDATE - 10/18/2023 11:27 AM

We can confirm elevated latencies for S3 Replication Time Control out of the US-EAST-1 Region. We are pursuing multiple mitigation paths, and our engineering teams are working on a fix. We will provide a further update by 5:15 AM PDT.
