On the morning of February 26th, we migrated the TrekkSoft servers from Cloudscale, our hosting provider in Zurich, to Amazon Web Services (AWS) in Ireland.
After the migration was complete, usage increased as merchants began to take bookings.
We spotted a major drop in performance. The root cause was one database host that was being throttled under the number of requests per second. This database slowdown caused a significant drop in the performance of our applications (merchant landing pages (CMS), Backoffice, public and private APIs, and mobile apps), in some cases rendering them inoperable.
6:45am - 8:29am CET - We completed the AWS migration.
We tested all the main cases and monitored all hosts, and the preliminary results were satisfactory.
9:00am CET - Our applications began handling an increased number of requests as the system came back online and usage scaled up.
One of the main database hosts (MySQL) began struggling with the number of requests. This affected the performance of our applications, preventing normal functionality.
Uncertainty regarding the performance of the new AWS infrastructure vs CloudScale.
We compared all hosts in CloudScale vs AWS to ensure the same hardware requirements.
The infrastructures are different.
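A host-by-host comparison like the one described above can be sketched in a few lines. This is only an illustration: the field names (`vcpus`, `memory_gb`, `disk_iops`) and all values are hypothetical placeholders, not the actual specifications of the Cloudscale or AWS hosts, and `diff_host_specs` is a helper invented for this example.

```python
from typing import Any, Dict, Tuple


def diff_host_specs(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (old_value, new_value)} for every field that differs.

    A field present on only one side is reported with None for the other side.
    """
    fields = sorted(set(old) | set(new))
    return {f: (old.get(f), new.get(f)) for f in fields if old.get(f) != new.get(f)}


# Hypothetical numbers for illustration only; not the real hosts.
cloudscale_db = {"vcpus": 8, "memory_gb": 32, "disk_iops": 10000}
aws_db = {"vcpus": 8, "memory_gb": 32, "disk_iops": 3000}

for field, (old_value, new_value) in diff_host_specs(cloudscale_db, aws_db).items():
    print(f"{field}: Cloudscale={old_value} AWS={new_value}")
```

A check of this kind only covers the fields it is given. Matching headline figures such as CPU and memory does not guarantee equivalent behavior under load, which is why the infrastructures can still differ in practice.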
Increase the size of the database in AWS to increase performance (no downtime was required at this point).
Contact AWS support for more information about the resizing time.
The database resize would have taken AWS too long to deploy, so we decided to apply another workaround, described below.
We put all the webapps in maintenance mode (downtime).
We created a new, larger database (downtime was required to avoid data loss).
We copied all data from the old database to the new one, using a migration system in AWS.
The newly created database failed.
This required a new approach, described below.
We created a new empty database (again, downtime was required to avoid data loss).
We proceeded with a manual dump of the data from the old database to the new one. The process took 4 hours and was successful.
3:20pm CET (approx.) - The new infrastructure was ready to be released.
We have been monitoring and tweaking the system over the last 24 hours to improve performance.
Low number of bookings from 7:00am to 4:30pm CET (about 9 hours). Some merchants were unable to process any bookings, while others still managed to take some. The impact here is financial loss to all parties.
The objective behind the migration that caused the issue.
Overall long term increase in performance.
Up to date industry infrastructure.
More direct control over our infrastructure.
Infrastructure ready to apply autoscaling in case of a peak in requests per second.
We will strategically time operations of this scale so that we avoid peak booking hours and have more time to react.
Triple-check hardware and settings specifications.
Replicate the system and run stress tests.
Build our infrastructure with extra capacity and resources, or keep a larger infrastructure available as a backup.
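As a rough illustration of the stress-test step listed above, the sketch below fires a fixed number of calls from a pool of worker threads and reports throughput, failures, and 95th-percentile latency. It is a minimal example under stated assumptions, not the tooling TrekkSoft uses: `run_load_test` and `fake_query` are hypothetical names, and `fake_query` merely sleeps where a real test would issue a query or HTTP request against a replica of the production system.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def run_load_test(request_fn: Callable[[], None], total_requests: int, concurrency: int) -> Dict[str, float]:
    """Call request_fn total_requests times from `concurrency` threads and summarize the results."""
    if total_requests <= 0 or concurrency <= 0:
        raise ValueError("total_requests and concurrency must be positive")

    def timed_call(_: int) -> Tuple[bool, float]:
        # Time a single call and record whether it raised an exception.
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    latencies = sorted(latency for _, latency in results)
    failures = sum(1 for ok, _ in results if not ok)
    # Nearest-rank 95th percentile, clamped to a valid index.
    p95_index = min(len(latencies) - 1, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": total_requests,
        "failures": failures,
        "requests_per_second": total_requests / elapsed,
        "p95_latency_s": latencies[p95_index],
    }


def fake_query() -> None:
    # Hypothetical stand-in for a real database query or HTTP request.
    time.sleep(0.001)


summary = run_load_test(fake_query, total_requests=200, concurrency=20)
print(summary)
```

Running such a test against the replicated system at or above the expected peak in requests per second, before the migration, gives a baseline to compare the old and new infrastructure against.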
We apologize deeply for this incident.