On 8 April 2020 (UTC), ProForma’s cloud servers suffered a partial outage. During the outage, forms submitted through the customer portal and changes to the contents of forms may not have been saved.
The partial outage occurred because ProForma failed to handle a large volume of requests coming from multiple Jira instances. These requests were triggered by multiple customers performing bulk updates on thousands of issues at once.
This post mortem provides those affected with further information about the timing of the incident, its potential impact on you and how you can respond to the situation. We appreciate how important form data is and we want to do our utmost to ensure your processes aren’t unduly affected.
Impact
ProForma suffered a partial outage and failed to respond to some requests. This meant that users may not have been able to:
This potentially means that requests or issues could have been submitted/created which do not have forms attached, or changes to the contents of a form were not saved. Most users will have seen an error message on their web page, indicating that the server was not available. This should alert them that their work was not saved.
Timing
Our logs show that the partial outage affected customers between 13:00 UTC and approximately 17:00 UTC on Wednesday 8 April 2020. For different time zones this incident occurred at:
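If your time zone is not covered, the window can be converted with a few lines of Python. This is only a sketch: the function name `outage_window_in` and the example zones are illustrative choices, the 17:00 UTC end time is approximate as stated above, and the script assumes Python 3.9 or later with time zone data available on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Outage window as reported above (UTC). The end time is approximate.
OUTAGE_START_UTC = datetime(2020, 4, 8, 13, 0, tzinfo=timezone.utc)
OUTAGE_END_UTC = datetime(2020, 4, 8, 17, 0, tzinfo=timezone.utc)


def outage_window_in(tz_name: str) -> tuple[datetime, datetime]:
    """Return the outage start and end converted to the given IANA time zone."""
    tz = ZoneInfo(tz_name)
    return OUTAGE_START_UTC.astimezone(tz), OUTAGE_END_UTC.astimezone(tz)


if __name__ == "__main__":
    # Example zones only; substitute the zone that applies to you.
    for name in ("America/New_York", "Europe/London", "Australia/Sydney"):
        start, end = outage_window_in(name)
        print(f"{name}: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
```

Note that in some zones the window crosses midnight, so the end of the outage falls on 9 April local time.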
Possible repair pathways
Depending on the importance of the data being received, you will need to consider the following:
For internal Jira Users (Software & Business Projects)
Whether to advise your users that they need to check that any changes to the forms were saved during the outage. Unfortunately this will need to be a manual process.
For Customers using the Jira Service Desk Portal
We recommend checking the requests created during the outage to ensure that they have forms attached. For those requests that do not have forms attached, you could advise customers their request was not submitted properly due to a processing error.
To find the possible issues affected you can use the following JQL queries (adjusting the project key as appropriate). Note: In cloud the created date is relative to the timezone set for your instance:
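Because the created date is evaluated in your instance's time zone, the UTC outage window has to be shifted before it is used in a query. The following Python sketch builds one such query string. The helper name `outage_jql`, the project key `HELP` and the choice of time zone are hypothetical, and the query only narrows issues by creation time; whether each issue has a form attached still needs to be checked separately.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def outage_jql(project_key: str, instance_tz: str) -> str:
    """Build a JQL query for issues created during the outage window.

    project_key and instance_tz are placeholders: use your own project key
    and the IANA name of the time zone configured for your Jira instance.
    """
    tz = ZoneInfo(instance_tz)
    # Outage window from this report: 13:00 to approximately 17:00 UTC.
    start = datetime(2020, 4, 8, 13, 0, tzinfo=timezone.utc).astimezone(tz)
    end = datetime(2020, 4, 8, 17, 0, tzinfo=timezone.utc).astimezone(tz)
    fmt = "%Y-%m-%d %H:%M"  # date-time format accepted by JQL
    return (
        f'project = "{project_key}" '
        f'AND created >= "{start.strftime(fmt)}" '
        f'AND created <= "{end.strftime(fmt)}" '
        "ORDER BY created ASC"
    )


if __name__ == "__main__":
    # Hypothetical project key and instance time zone.
    print(outage_jql("HELP", "Australia/Sydney"))
```

Since the end of the outage is approximate, you may prefer to widen the window slightly before running the query.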
Once you have identified the issues you can either
To attach a form to the existing issue:
It may be useful to also label all the affected tickets for monitoring.
Next steps by ProForma
We know this problem occurs when multiple customers perform bulk updates on thousands of issues at the same time. We have now changed how our servers operate so that bulk update requests are handled by a cluster of servers separate from the rest of ProForma, which ensures that normal ProForma functions can continue to operate. We have also doubled our server capacity by splitting our ProForma full and ProForma Lite customers onto their own server clusters.
We sincerely apologize for this incident and we will do our utmost to avoid an incident of this nature happening again. We are also examining how we can improve our response time to incidents like this one, as we believe it took too long for us to restore normal operations.
We appreciate how important form data is, and we are genuinely sorry for the inconvenience this will cause you and your customers. We remain committed to ensuring this issue doesn’t impact you or your customers again.