Loss of network at our Data Centre – Mar 2018

Posted on March 29, 2018 by eddie_admin

12:30PM Thurs 29th March 2018 – the Data Centre where our equipment is located are currently experiencing a total loss of network across a large part of their facility. If your VPS is currently unreachable, please stand by. We are monitoring updates from the DC, who are working to restore service, and will post here as soon as we hear from them.

Update 13:18 – We have received the following update from the Data Centre: “There has been a break in our fibre network. This is affecting customers with services in our Glasgow, Edinburgh and Manchester data centres. Our team is working to fix the problem but at the moment we don’t know how long this will take. Apologies to all affected.”

Update 13:47 – We have received the following update from the Data Centre: “We are currently experiencing fibre issues on our national network ring. Our technicians and network suppliers (Zayo) are aware of the issue and are working towards a resolution. We are in touch with our dark fibre team who are assisting with both faults on each side of the country. This will be affecting customers with services in our Northern part of the ring (Glasgow, Edinburgh, Manchester, St Asaph). We are working hard to get services back to normal but at the moment there is no information on when that will be. Apologies to the customers being affected.”

Update 13:57 – The fibre in question is supplied to our Data Centre by a huge fibre company called Zayo, who own 204,371 km of fibre across North America and Europe, so rest assured the people working to fix the problem are more than capable. There is still no ETA on when the network will be restored unfortunately, but as soon as we have more news we’ll post it here. Apologies to all of our customers for the inconvenience we know they will be experiencing right now.

Update 14:05 – Our Data Centre have provided us with this graphic showing 2 breaks in their fibre network – just below Manchester and between Glasgow and Nottingham.

Update 14:27 – We’ve been informed that the fibre break is affecting other Data Centre providers in the UK as well as IOMart (the new name of the Data Centre we use). This is good news for us, as it means the fibre company will be treating the incident with the highest priority.

Update 15:13 – We are pleased to report that full network service has just now returned 🙂 Further info will follow…

Update 16:18 – Now that this incident has been resolved, we should get a full post-mortem report from the Data Centre, most likely sometime next week. We are confident this will include plans for improvements to their network to ensure something like this cannot happen again. We will share the report with our customers directly when we receive it.

Loss of network at our Data Centre

Posted on September 27, 2016 by Eddie

10:50AM Tues 27th Sept – the Data Centre where our equipment is located are currently experiencing a total loss of network across a large part of their facility. If your VPS is currently unreachable, please stand by. We are monitoring updates from the DC, who are working to restore service, and will post here as soon as we hear from them.

Update 11:06AM – the issue has now been resolved and you should be able to access your VPS normally. Please accept our apologies for any inconvenience caused.

Loss of network at our Data Centre

Posted on April 5, 2016 by Eddie

12:30PM Tues 5th April – the Data Centre where our equipment is located are currently experiencing a total loss of network across a large part of their facility. If your VPS is currently unreachable, please stand by. We are monitoring updates from the DC, who are working to restore service, and will post here as soon as we hear from them.

Update 12:35PM – we’ve just seen that network service has returned to all of our VPS. You should find your VPS is reachable again now. If it is not, please raise a support ticket so we can look into it. Please note the DC has yet to confirm that the outage is over. We will post another update once this is confirmed.

Update 12:44PM – unfortunately it appears their problems are not over, as network connectivity has just been lost again at the DC. Customers will find that their VPS are unreachable again at this time.

Update 12:53PM – network service has again returned to all of our VPS, and you should find your VPS is reachable again now. Please note the DC has yet to confirm that the outage is over. We will post another update once this is confirmed.

Update 1:08PM – We have not seen any further interruptions to service since the last update, and the DC have confirmed that service should now be normal. You should find your VPS is reachable. If it is not, please raise a support ticket so we can look into it. We’d like to apologise to our customers for the inconvenience caused by this outage. It is likely the DC will give us a reason for this outage once they have completed their investigations. If and when they do, we will post it here.

Update 2:20PM – The DC have provided the following update: “We feel we have resolved the immediate issues with the network here we are still considering this investigation as ongoing. At this time we know an issue with a piece of power infrastructure appears to have led to a routing issue on our network. In turn, this caused network service to be lost across our two data centres. A root cause analysis will be made available within 7 days.”

We at Manchester VPS will pass on any further information we get here. Having been with the DC for a long time now, we are very confident in their infrastructure, their expertise, and the excellent service they always provide to us. It is extremely rare for them to suffer a major outage, so we would like to reassure our customers that you are in good hands.

Network outage Sept 28th

Posted on September 28, 2014 by Eddie

Sun Sept 28th 2014 – 16:50: Unfortunately the data centre we are in has just suffered a brief loss of network connectivity lasting about 5-10 mins. It was brief enough that it may not have been noticed, but our monitoring detected it. We’d like to apologise to any customers affected. At the moment we don’t have any information from the DC as to what the cause was, but we are very pleased to see that they were able to resolve it so quickly. If we do get any information from them we’ll pass it on in this post.

Network Outage

Posted on August 23, 2014 by Eddie

Sat 23rd Aug 2014 11:16AM – Some VPS customers will currently be experiencing a lack of network connectivity to their VPS. We are investigating urgently and will update here as soon as we have further news.

Update 11:28AM – We have identified the cause of the issue and have raised a request with on site technicians at the Data Centre to help resolve the problem.

Update 11:34AM – We’d like to apologise to any customers affected by this, we expect to have this resolved very shortly.

Update 11:36AM – This issue has now been resolved and network connectivity has resumed. Once again, we apologise for the inconvenience caused. A post-mortem will follow shortly.

Network packet loss

Posted on August 10, 2014 by Eddie

The Data Centre we are in are this evening experiencing intermittent network issues. There have been two occasions this evening, once at around 18:15, and again at around 20:55, where VPS users have seen varying levels of packet loss lasting between 2 – 4 minutes. Rest assured the Data Centre have a team of network engineers aware of the issues and currently working to resolve them. We’d like to apologise to customers for the inconvenience caused. We will provide a further update in due course.

Update 22:45 – The network has been stable since the last incident at about 20:55. If we see any further incidents we’ll add a further update but, for now, all is looking fine.

Update Mon 11th Aug, 18:41 – The Data Centre have informed us that the packet loss yesterday was the result of a persistent DDoS attack against one of their other customers. The DC have managed to mitigate the attack, and as a result we have not seen any further packet loss since yesterday.

Brief network outage

Posted on June 24, 2014 by Eddie

Tues 24th June 2014 14:25 – Unfortunately we just suffered a brief loss of connectivity on one of our network segments for approx. 10 minutes, which affected some of our VPS customers. This was as a result of a mistake we made while configuring a switch. Network connectivity is now fully restored, and lessons learned. We’d like to apologise to all customers affected.

Loss of service for some customers

Posted on April 10, 2014 by Eddie

We are currently aware that some customers’ VPS have stopped functioning. We are investigating urgently and will update here as soon as we have further information.

Update 06:22AM BST – We have identified the problem and are working hard to restore service for customers affected. Further updates will be posted when we have more information.

Update 06:44AM BST – Unfortunately we’ve had a failure of a RAID card on one of our VPS hosts. We have replaced the card with a new one but we are currently checking the drives to ensure there is no data loss. As soon as we are confident the drives are OK we will be able to restore service. We will keep this page updated.

Update 07:18AM BST – We are able to access the data on the drives and all looks fine, but we’d like to move the data to a fresh set of drives to be absolutely sure there is no further risk to customer data. We’re in the process of doing that and will then be able to restore service to affected customers on this particular VPS host. We will copy each VPS’s data in turn, restoring service to each VPS as its copy completes.

Update 08:47AM BST – Unfortunately the process is taking longer than expected, please accept our apologies for the delay. At the moment we cannot give an ETA, but will continue to update on here.

Update 10:39AM BST – Unfortunately it turns out the drives were not as fine as they initially looked. We found significant corruption and despite all our efforts we have been unable to restore the data. We are devastated to inform customers on this particular VPS host that their VPS data is lost with no hope of recovery. The data was stored on a RAID 10 array of 4 x SAS disks connected to an enterprise-level hardware RAID card. When the card failed it caused irreversible destruction of data on all 4 disks. As we do not yet offer backups for our VPS services (as stated in our FAQs) there are no backups from which we can restore anything. Affected customers will now find in the VPS Control Panel that they can start their VPS, but they will need to create fresh virtual drives for it. We ask affected customers to raise a support ticket with us if they need further assistance with anything. If you have backups of your VPS elsewhere, we will assist in any way we can to help you restore data to your VPS. Please raise a ticket so that we can discuss. We can only offer a sincere and heartfelt apology for this dreadful incident. We are now reviewing our storage infrastructure and are going to redouble our efforts to put in place a backup solution so that customers can take regular backups of their VPS. We will announce when this is ready.

Network outage 3rd March

Posted on March 3, 2014 by Eddie

12:48PM: We are currently aware that some customers are experiencing a loss of network connectivity to/from their VPS. We are investigating the issue and will update this post with further information once we know more.

Update 12:52PM: The data centre have just posted a status update advising that a network incident has just occurred, causing the issue, and that they are investigating.

We are able to see that all affected customer VPS are running fine. This is purely an issue with the Data Centre’s network. Once they have restored connectivity, traffic to/from your VPS will resume without any need for action from the customer.

Update 12:57PM: We have noticed that network connectivity has now been restored by the Data Centre. We are waiting for an update from them but it looks like the incident may be over now.

Update 1:04PM: Network connectivity was restored by the Data Centre at about 12:56PM and has been fine since then, so we are assuming for now that the incident is over. Our monitoring systems detected that the incident started at about 12:40PM, so the total time of the outage was about 15 minutes. As soon as we have heard from the Data Centre what the reason for the outage was, we will post details here.
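For anyone wanting to check the duration quoted above, here is a minimal sketch of the arithmetic in Python. The function name `outage_minutes` and the time format are our own choices for illustration. The two timestamps are the approximate start and restore times given in this post, not exact figures from our monitoring.

```python
from datetime import datetime

def outage_minutes(start, end, fmt="%I:%M%p"):
    """Return the minutes between two same-day clock times such as '12:40PM'."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# Approximate times from the update above: started ~12:40PM, restored ~12:56PM.
print(outage_minutes("12:40PM", "12:56PM"))
```

With those two times the gap works out at 16 minutes, which is consistent with the "about 15 minutes" stated above.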

Reason for outage

The day after the above outage, the Data Centre we are located in sent us an explanation of what happened, which we have posted below. The incident affected all of the Data Centre’s customers. Thankfully they dealt with it very swiftly. In the whole time we have been with them this is the only significant incident we have ever experienced, which is a testament to the excellent facility that it is, and to the fantastic staff who work there, whom we have met on many occasions and have always been highly impressed by. If any of our customers have any questions about this incident, please do not hesitate to contact us.

"As part of a routine firewall deployment, some new hardware was connected to our network and BGP sessions were configured on our route reflectors for these new devices.

Unfortunately, when these new sessions connected to our route reflectors, they caused an issue with the software we are using to power them – specifically the length of time taken to process and send the full BGP table to the new devices caused the “hold-timer” to expire on a few of the other sessions. This then resulted in these sessions disconnecting and reconnecting, and doing the same thing as the first sessions, causing further sessions to do the same, ultimately resulting in a loop where by the route reflectors were continuously dealing with sessions disconnecting and reconnecting.

This continued until such time as our Network Engineers identified the issue, and were able to log into the route reflectors and manually restart them to clear all the BGP sessions. The first of which was done at 12:50, the second at 12:58. All BGP sessions were re-established with all routes fully exchanged by 13:00 and traffic once again began to flow normally.

It has become apparent from this incident that we have reached a limit with the current Route Reflector software that we are using, and as such we are now in the process of replacing these with both improved hardware, and alternative software which has better handling of long-running tasks that doesn’t result in cpu-starvation of important tasks such as maintaining hold-timers.

Initially we will run these new route-reflectors alongside the old ones to prove their stability, before ultimately retiring the old devices, a further maintenance window will be scheduled for this which you will be notified about in due course."
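To illustrate the feedback loop the Data Centre describes, here is a toy simulation in Python. It is only a sketch under simplifying assumptions of ours, not a model of their actual route reflector software: the reflector is treated as strictly single-threaded, every starved session is assumed to drop at once (the explanation above says only a few did), and the function name, session names and timing values are all made up for illustration.

```python
from collections import deque

def simulate_reflector(established, new_sessions, table_send_s, hold_time_s,
                       max_steps=50):
    """Toy model of a single-threaded route reflector.

    Returns the number of full-table sends needed before no session is
    left waiting for a table, or None if the queue never drains within
    max_steps (the disconnect/reconnect loop).
    """
    # Seconds since each established session last had a keepalive serviced.
    idle = {f"peer{i}": 0.0 for i in range(established)}
    pending = deque(f"new{i}" for i in range(new_sessions))
    sends = 0
    while pending:
        if sends >= max_steps:
            return None
        session = pending.popleft()
        sends += 1
        # While the table is being sent, no other session is serviced.
        for peer in idle:
            idle[peer] += table_send_s
        # Sessions starved past the hold time drop and queue to reconnect.
        expired = [p for p, t in idle.items() if t > hold_time_s]
        for p in expired:
            del idle[p]
            pending.append(p)
        # After the send, surviving sessions are serviced again.
        for peer in idle:
            idle[peer] = 0.0
        idle[session] = 0.0
    return sends

# Assumed values: a 90 second hold time, with table sends of 30 s and 100 s.
print(simulate_reflector(3, 1, table_send_s=30, hold_time_s=90))   # 1
print(simulate_reflector(3, 1, table_send_s=100, hold_time_s=90))  # None
```

In this simplified model, once a single table send takes longer than the hold time, every reconnect knocks out the sessions that had just recovered, so the queue never empties. That matches the Data Centre's account that the loop only stopped when the route reflectors were manually restarted.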

Delays creating new disks

Posted on September 29, 2013 by Eddie

Update 3: Sept 29th 19:00 BST: The backlog has been processed and new disk jobs are now all being processed within 30 seconds as normal. Please accept our apologies for the inconvenience this will have caused some customers, and please raise a support ticket if you are still seeing any issues.

Update 2: Sept 29th 18:35 BST: The issue causing the problem has been resolved but we see a few disk jobs still waiting to be processed, which are now going through but taking longer than anticipated because of the load on the disk storage subsystems.

Update 1: Sept 29th 18:05 BST: There is currently an issue on our VPS Control Panel platform causing disk creation jobs to be delayed by up to 20 mins. We are working on resolving the issue causing the problem and then will be processing any backlog as quickly as possible.