Artionet - Network services status

Transactional email service - Mandrill

resolved

The issue has been resolved and is being monitored together with our provider.

identified

We have just received an update from our partner:

Hello,

Our Engineering team has restored Mandrill’s operations. For most customers, features should be working as expected. However, there may be some residual effects in your account. We’ve included information on that below.

Due to the nature of this outage, it was difficult to determine which individual accounts were affected so we’ve chosen to notify all Mandrill users of the outage, its effects, and our efforts to resolve it.

What happened and how we responded
For more background information on what caused the outage, please read our previous email to customers here. The outage began around 10:30pm EST on Sunday, February 3, and since then our Engineering team has worked around the clock to diagnose the issue, then develop a plan to fix it. We made code changes to work around the damaged database to preserve as much functionality for as many users as possible. We added machines, storage, and networking to attempt a variety of efforts at preserving all data and getting back online. Ultimately, we decided that our efforts to preserve all data would take too much time. We changed direction to truncate specific tables in order to get back online faster. These efforts were successful.

How we resolved it
We determined that some data was preventing necessary automatic processes from running. We deleted that data, which allowed those processes to successfully run and return the database to a usable state.

Residual effects
You may see some lingering effects in your account as a result of the outage. Stats and metrics that should have been tracked during the outage may be incorrect or missing entirely.

Next steps
We sincerely apologize for this outage and the impact to your business. We’re committed to compensating our customers for the disruption this outage has caused. Once we’ve fully resolved all issues, we’ll let you know more about how and when we plan to issue refunds or credits. You don’t need to take any action at this time.

Likewise, we’re committed to conducting a full post-event review. We plan to learn as much as we can from this event so that we can make improvements and better serve our customers.

– Mandrill Support

identified

Since Sunday, February 3 at 18:30, some websites have been having problems delivering transactional emails, such as shop order confirmations or contact form messages. Our external partner "Mandrill" is experiencing a general worldwide outage affecting 80% of its customers, with service being restored gradually over the coming hours.

Below is the email we received from their teams a few hours ago. Everything should be back to normal during the course of the week. If you run into any problem, please don't hesitate to contact our support.

***********************************

Hello,

We’re contacting you about an ongoing outage with the Mandrill app. This email provides background on what happened and how users are affected, what we’re doing to address the issue, and what’s next for our customers.

What happened
Mandrill uses a sharded Postgres setup as one of our main datastores. On Sunday, February 3, at 10:30pm EST, 1 of our 5 physical Postgres instances saw a significant spike in writes. The spike in writes triggered a Transaction ID Wraparound issue. When this occurs, database activity is completely halted. The database sets itself in read-only mode until offline maintenance (known as vacuuming) can occur.

The database is large—running the vacuum process takes a significant amount of time and resources, and there’s no clear way to track progress.

Customer impact
The impact to users could come in the form of not tracking opens, clicks, bounces, email sends, inbound email, webhook events, and more. Right now, it looks like the database outage is affecting up to 20% of our outbound volume as well as a majority of inbound email and webhooks.

What we’re doing to address this
We don’t have an estimated time for when the vacuum process and cleanup work will be complete. While we have a parallel set of tasks going to try to get the database back in working order, these efforts are also slow and difficult with a database of this size. We’re trying everything we can to finish this process as quickly as possible, but this could take several days, or longer. We hope to have more information and a timeline for resolution soon.

In the meantime, it’s possible that you may see errors related to sending and receiving emails. We’ll continue to update you on our progress by email and let you know as soon as these issues are fully resolved.

What’s next
We apologize for the disruption to your business. Once the outage is resolved, we plan to offer refunds to all affected users. You don’t need to take any action at this time—we’ll share details in a follow-up email and will automatically credit your account.

Again, we’re sorry for the interruption and we hope to have good news to share soon.

– Mandrill Support

***********************************

Please also note that every email sent is automatically stored in your IceCube2.Net databases, so no data is lost.
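Mandrill's email above attributes the outage to a PostgreSQL "Transaction ID Wraparound" triggered by a spike in writes. To illustrate why heavy write volume leads there, here is a simplified Python sketch of 32-bit transaction IDs (XIDs) compared with modulo-2^32 arithmetic, in the style PostgreSQL uses. It is an illustrative model only, not Mandrill's or PostgreSQL's code. The function names are hypothetical, and the model ignores PostgreSQL's reserved XIDs as well as the exact safety margin PostgreSQL keeps before it stops assigning new IDs.

```python
# Simplified, hypothetical model of PostgreSQL-style 32-bit transaction IDs.
# Not PostgreSQL's or Mandrill's code: reserved XIDs and the exact
# safety margin PostgreSQL keeps before halting are deliberately ignored.

XID_SPACE = 2 ** 32      # XIDs are 32-bit counters that wrap back to the start
HALF_SPACE = 2 ** 31     # only ~2**31 XIDs can be "in the past" at once


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID `a` is older than XID `b` (modulo-2**32 comparison)."""
    diff = (a - b) % XID_SPACE
    if diff >= HALF_SPACE:   # reinterpret as a signed 32-bit difference
        diff -= XID_SPACE
    return diff < 0


def xids_left_before_wraparound(oldest_unfrozen: int, next_xid: int) -> int:
    """How many more XIDs can be assigned before the oldest unfrozen XID
    would start to look like it is in the future."""
    age = (next_xid - oldest_unfrozen) % XID_SPACE
    return HALF_SPACE - age


if __name__ == "__main__":
    # Ordinary case: 100 is older than 200.
    print(xid_precedes(100, 200))
    # After the counter wraps, a very large XID is still older than a small one.
    print(xid_precedes(XID_SPACE - 296, 50))
    # A burst of writes eats the remaining headroom...
    print(xids_left_before_wraparound(1_000, 1_000 + HALF_SPACE - 10))
    # ...and a vacuum that freezes old rows moves oldest_unfrozen forward again.
    print(xids_left_before_wraparound(HALF_SPACE, HALF_SPACE + 10))
```

In this model, every write transaction consumes one XID, so a write spike shrinks the headroom quickly; only vacuuming (freezing old rows) moves the oldest unfrozen XID forward and restores it, which is why the database had to wait for that maintenance. In PostgreSQL itself, the role of `oldest_unfrozen` is played by the per-table `relfrozenxid` and per-database `datfrozenxid`, and `age(datfrozenxid)` can be queried from `pg_database` to watch how close a database is to the limit.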