Read the Original Article at http://www.informationweek.com/news/showArticle.jhtml?articleID=228900199
Skype says last week's daylong service outage that left millions of users unable to place voice or video calls was caused by a string of events that snowballed into the company's worst service disruption since 2007.
Skype's Wednesday "post-mortem" of the embarrassing snafu, which started around 8 a.m. Pacific on Dec. 22, exposed an inherent weakness in the company's peer-to-peer communications network: it relies on a large number of subscribers' computers working properly. In the latest outage, overloaded support servers were slow to respond, and the delayed responses caused some computers running a version of Skype's proprietary Windows software to crash, setting off a chain reaction.
The software, version 5.0.0.152, contained a flaw that prevented the application from processing the delayed responses. Roughly half of all Skype users worldwide were running that older version of the Windows application, and the resulting crashes took approximately 40% of the computers on the network offline.
Among the crashed clients were 25% to 30% of the computers Skype uses as "supernodes" on the network. These systems have the resources to act like phone directories that other computers use to make and receive calls. With so many supernodes out of commission, the load on the remaining supernodes spiked, and the spike grew worse when millions of Skype subscribers tried to get back on the network.
The initial crashes happened just before Skype's usual daily peak hour, and very shortly afterward traffic to the supernodes reached "about 100 times what would normally be expected at that time of day," Lars Rabbe, Skype's CIO, said in a blog post explaining the outage.
The overload shut down more supernodes, which increased the load on the remaining systems until they, too, shut down. The result was a massive outage that left more than half of the 20 million-plus users who make calls during peak hours each day without service.
The outage lasted for about 24 hours. Skype brought the network back up gradually by deploying several thousand "mega-supernodes" to offload work from the supernodes in the peer-to-peer cloud. To get the system running, Skype had to siphon off resources normally used to support group video calling. As a result, that service was down for an additional day.
Skype is reviewing the way it provides automatic software updates to help ensure that more subscribers have the latest version. If more subscribers had been running the latest version of the Windows application, the outage might have been avoided. The company will also review its testing procedures to try to prevent flaws in future versions.
The end-of-year outage was Skype's second service disruption this year and its worst since a 36-hour outage in August 2007. The latest snafu comes as Skype, which is preparing for an initial public offering, tries to boost capacity and network performance to impress Wall Street.
Skype announced its IPO plans over the summer but has yet to say when the stock launch will take place. In the meantime, the company has been working to beef up its paid services, particularly in the business market. The vast majority of Skype subscribers use their PCs to call each other free of charge. People who call landlines or mobile phones pay only pennies a minute.