Power Outage at the Hetzner Data Center

For our web hosting we use the services of Hetzner.de.

This morning around 11:00, Hetzner suffered a power outage in its data center. There was a voltage dip, and the UPS did not ride through it. Normally the UPS absorbs a dip like this, but for a reason that is still unknown, that did not happen.

The servers lost power, and Hetzner's own infrastructure was partly unreachable.

We were able to follow what was happening through Hetzner's status report:

Type: Fault report
Categories: Basic infrastructure
Start: May 24, 2018 11:30:00 AM CEST
End: Unknown
Description: Due to a current disruption of the power supply at the location Falkenstein, there are currently accessibility problems of some customer servers. Our technicians are currently working on root cause analysis and we will keep you up to date.

Update: May 24, 2018 12:10:00 PM CEST
Update: According to current knowledge, there was a power failure at 11:00 am in the DC7, DC10 + DC12. The consequence was a failure of the core networking equipment at the Falkenstein site, which led to a short-term outage of the entire network connectivity in Datacenter park location in Falkenstein.

Update: May 24, 2018 12:27:00 PM CEST
In the meantime, the power supply in the three affected data centers could be provisionally restored (without UPS protection).

The cause of the failure was a strong voltage reduction in the local power grid. Why it has caused a power failure despite our UPS backups, is still under investigation. Currently, the restoration of the secure operation has top priority.

Our technicians are in the process of determining how many servers need to be manually checked after the power interruption.


Update: May 24, 2018 12:43:00 PM CEST
Currently, the following support inquiries are open in the affected data centers:

DC 7: 237
DC10: 740
DC12: 164

We work on the processing with all available resources.


Update: May 24, 2018 12:58:00 PM CEST
Currently, the following two top-of-the-rack switches are not available:

fsn1-dc12-sw_508 fsn1-dc10-sw_38

Our network team is working hard to resolve the issue.


Update: May 24, 2018 1:29:00 PM CEST
Currently, the following support requests are available in the affected data centers:

DC 07 402
DC 10 914
DC 12 212

We work with all available forces on the processing

All top-of-the-rack switches are now available again.


Update: May 24, 2018 2:00:00 PM CEST
Currently, the following support requests are available in the affected data centers:

DC 07 497
DC 10 989
DC 12 232

We work with all available forces on the processing

Our technicians continue to work on checking all servers that are not accessible again after the interruption.


Update: May 24, 2018 2:28:00 PM CEST
In the meantime the UPS rail in the FSN1-DC10 is active again and the UPS protection could be restored in parts of DC10.

We are working with high pressure on the restoration of secured regular operation.


Update: May 24, 2018 4:36:00 PM CEST
In the meantime, all servers that did not restart automatically after the voltage failure were switched on or checked for defective power supplies. The next step is to work on the tickets in the data centers concerned – servers that are currently not available must all be checked individually, because the error situations can vary greatly.

The following tickets are currently in the queues:

DC 07 1013
DC 10 1408
DC 12 465

To avoid further delays, we ask you to create only one ticket per affected server.

Thank you very much.


Update: May 24, 2018 6:29:00 PM CEST
The analysis of the UPS infrastructure is still running at full speed. Some parts of the power-subdistributors are affected and were partly damaged by the strong voltage reduction in the local power grid.

Some of the affected UPS parts can be replaced using existing spare parts by technicians of the UPS system manufacturer. Other spare parts are in the delivery phase and we are still working at full speed on restoring reliable control operation. More detailed information on when the secure regular operation in the affected data centers will be restored (DCs 7/10/12) will probably only become apparent tomorrow. Until then, the affected data centers are partly or completely without UPS protection.

Thank you for your understanding.


Update: May 24, 2018 7:35:00 PM CEST
The number of open tickets is currently decreasing. We are continuously processing open support requests. The current number of tickets for the affected data centers is:

DC7: 704
DC10: 1286
DC12: 429

As you can see, the outage itself has been resolved, but its consequences are still being dealt with. Fortunately, our servers have come back online.
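
To make the trend in the updates easier to see, here is a minimal Python sketch that totals the open-ticket counts per update and finds the peak. The numbers are copied from the status updates quoted above; the dictionary layout and the function names `totals` and `peak` are our own illustration and not something Hetzner publishes. Note that the updates use slightly different wording ("support inquiries", "support requests", "tickets"), so treating them as one series is an assumption.

```python
# Open-ticket counts per data center, copied from the quoted status updates.
# Keys are the update times (CEST) on May 24, 2018. The structure and the
# helper names are illustrative assumptions, not part of any Hetzner API.
TICKETS = {
    "12:43": {"DC7": 237, "DC10": 740, "DC12": 164},
    "13:29": {"DC7": 402, "DC10": 914, "DC12": 212},
    "14:00": {"DC7": 497, "DC10": 989, "DC12": 232},
    "16:36": {"DC7": 1013, "DC10": 1408, "DC12": 465},
    "19:35": {"DC7": 704, "DC10": 1286, "DC12": 429},
}


def totals(tickets):
    """Return the total number of open tickets for each update time."""
    return {time: sum(counts.values()) for time, counts in tickets.items()}


def peak(tickets):
    """Return (time, total) for the update with the most open tickets."""
    per_update = totals(tickets)
    busiest = max(per_update, key=per_update.get)
    return busiest, per_update[busiest]


if __name__ == "__main__":
    for time, total in totals(TICKETS).items():
        print(f"{time} CEST: {total} open tickets")
    peak_time, peak_total = peak(TICKETS)
    print(f"Peak: {peak_total} open tickets at {peak_time} CEST")
```

Under that assumption, the total keeps rising for hours after power was provisionally restored at 12:27 and only starts to fall in the last update, which matches Hetzner's remark that each unreachable server had to be checked individually.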

What we learned from this incident:

  1. Only the web hosting for our websites, along with the mail service that comes with it, was unreachable. Our other servers, such as the Kerio mail servers, the DNS servers, and VoIP, remained reachable because we host them in other data centers, and our DNS servers are technically redundant across both data centers and providers.
  2. Our own website runs on the same server as our customers' sites. The idea is that you should use what you sell yourself. Unfortunately, it also means that during an outage we have no way to share information through our website.
  3. Power outages do happen, even in the most secure environments.

There is not much we can do about points 1 and 3 ourselves right away. Our supplier simply had bad luck that its power supply did not work as designed. We are very interested in what went wrong technically, and based on that we will decide what to do.

We already resolved point 2 this evening. Our website now runs separately from all other customers, in a separate network and data center. That way we can provide more information next time.

If we receive more information about this outage, we will add it below.