By Mitch Wagner and Ted Kemp
EBay's persistent site failures stem from a lack of coordinated IT planning and a centralized database and storage structure that creates a single point of failure, insiders and experts say.
The series of mishaps that led to eBay's 11-hour site crash on Jan. 3 was reminiscent of 15 similar outages between August 1998 and November 1999, including one in June 1999 that made the popular auction site unreachable for a record 22 hours.
In a written statement, eBay (stock: EBAY) blamed the latest blackout on "unrelated" glitches in storage hardware and database software. But experts scoffed at the notion that such a string of severe outages could be a coincidence.
Indeed, five years after the site's launch, the company hasn't installed redundant storage hardware to take over if the main storage system fails, said one former eBay staff member who asked not to be identified.
The IT department alerted management to the risk of storage failure a year ago and recommended an upgrade to redundant storage arrays, but the expense wasn't approved, the source said.
What's more, eBay's application architecture makes it difficult to move to a more resilient load-balanced server farm. The initial eBay applications were designed by former Oracle employees who believed in relying on a single large database. To this day, a single Oracle database supports item listings and auction operations, the source said.
While eBay has begun to split the database for redundancy, resource constraints and a divided management have gotten in the way of major changes, the source said.
The source described eBay as a marketing company that adds features aggressively and worries about the infrastructure later.
IDC analyst Dan Kusnetzky agreed. "It sounds like they've been trying to address issues as they've been discovering them, one by one, with no one sitting back and saying, 'What are we trying to accomplish and what's the best way to do it?'"
EBay uses Sun Enterprise Servers in tightly integrated clusters, which group servers using special software and hardware interconnects so that they appear to be a single system to the user, application and systems manager.
That approach is easier to administer, but it's also more prone to breakdowns because servers tend to share central resources, such as an operating system and storage system, Kusnetzky said.
Other high-traffic sites use more loosely integrated server farms, in which a large number of small, low-end servers perform identical functions in a redundant configuration. Search engine Google, for instance, uses 4,000 Linux servers that do searches and serve up Web pages in parallel, while Yahoo uses clusters of BSD Unix servers. If one server goes down, the rest pick up the load.
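The failover behavior described above can be sketched as a round-robin dispatcher that skips any server marked as down. This is a minimal illustration only, assuming hypothetical server names and a simple up/down flag per server; it is not a description of how Google or Yahoo actually route their traffic.

```python
from itertools import cycle


class ServerFarm:
    """Round-robin dispatcher over identical servers (hypothetical sketch)."""

    def __init__(self, names):
        self.names = list(names)
        # Every server starts out healthy.
        self.up = {name: True for name in self.names}
        self._ring = cycle(self.names)

    def mark_down(self, name):
        self.up[name] = False

    def mark_up(self, name):
        self.up[name] = True

    def next_server(self):
        # Try at most one full lap of the ring, skipping servers flagged as down.
        for _ in range(len(self.names)):
            name = next(self._ring)
            if self.up[name]:
                return name
        raise RuntimeError("no servers available")


farm = ServerFarm(["web1", "web2", "web3"])
farm.mark_down("web2")
print([farm.next_server() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

When "web2" fails, requests simply flow to the remaining servers. The trade-off the article describes is also visible: each server is a separate entry that has to be tracked, and in practice administered, on its own.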
The downside is that management is more labor-intensive, because each server must be administered individually.
The server farm approach could be troublesome for a site like eBay, because users need to access the same centralized data simultaneously and in real time, said Munjal Shah, CEO of Andale, a service that helps small businesses manage auctions on eBay and other auction sites.
"It's not like Yahoo, where they can copy the data from a relatively static data center to servers all over the world," Shah said. "At eBay, everybody needs access to the current price."
Despite eBay's IT problems, observers are quick to note that the company is thriving financially, largely because it runs about 90 percent of all online consumer auctions. In its latest financial report, for the quarter ended Sept. 30, eBay said its revenue doubled from the previous quarter, to $113.4 million. Net income totaled $15.2 million.
"An eBay that's up part of the time is still probably better than a little auction site that's up all the time," said Yankee Group analyst Steve Vonder Haar.
Following a difficult 1999, eBay was relatively stable in 2000. The site suffered only one prolonged outage, in February, precipitated by a hacker attack on eBay and other big consumer sites. EBay says it has achieved 99 percent uptime for the last four quarters.
Even half-day outages won't threaten eBay's position, analysts say. "It's a little bit like what happens if the electricity goes out at Sears one day: Over time it's not going to have any significant impact," said ABN AMRO financial analyst Kevin Silverman.
Still, eBay said in a statement that it plans to upgrade its hardware and distribute its Oracle database across many separate servers, so that a failure would take down only part of the site. The project has been under way for eight months and should be completed by May 2001, the company said.
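The idea of splitting one database so that a failure affects only part of the site can be illustrated with hash partitioning, in which each item ID is mapped to one of several independent database servers. This sketch is an assumption for illustration only: eBay's statement does not say how it would divide its data, and the shard names and the `PartitionedStore` class are hypothetical.

```python
import zlib


class PartitionedStore:
    """Maps item IDs to independent shards (hypothetical sketch)."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        self.data = {name: {} for name in self.shard_names}
        self.down = set()  # shards currently unavailable

    def shard_for(self, item_id):
        # crc32 gives a hash that is stable across runs, unlike Python's
        # per-process salted hash() for strings.
        key = zlib.crc32(str(item_id).encode())
        return self.shard_names[key % len(self.shard_names)]

    def put(self, item_id, price):
        shard = self.shard_for(item_id)
        if shard in self.down:
            raise ConnectionError(f"{shard} is down")
        self.data[shard][item_id] = price

    def get(self, item_id):
        shard = self.shard_for(item_id)
        if shard in self.down:
            raise ConnectionError(f"{shard} is down")
        return self.data[shard][item_id]


store = PartitionedStore(["db1", "db2", "db3", "db4"])
for item_id in range(1000):
    store.put(item_id, 10.0)

# Simulate the failure of one of the four database servers.
store.down.add("db2")
reachable = 0
for item_id in range(1000):
    try:
        store.get(item_id)
        reachable += 1
    except ConnectionError:
        pass
print(reachable, "of 1000 items still reachable")
```

With one of four shards down, only the items hashed to that shard become unavailable, instead of the whole site. The catch, as Shah notes above, is that auction data changes constantly, so each shard still has to serve its slice of current prices in real time.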
EBay's 99 percent uptime is in line with other consumer sites such as Amazon.com and Yahoo, said Daniel Todd, chief technologist at Keynote Systems, which measures Web site uptime and responsiveness.
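To put the 99 percent figure in perspective, the downtime it permits can be computed directly. This sketch assumes a 365-day year and treats uptime as a simple fraction of wall-clock time; the article does not describe how Keynote actually measures it.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(uptime_fraction, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by a given uptime fraction over a period."""
    return (1.0 - uptime_fraction) * period_hours


for uptime in (0.99, 0.999, 0.9999):
    yearly = allowed_downtime_hours(uptime)
    quarterly = allowed_downtime_hours(uptime, HOURS_PER_YEAR / 4)
    print(f"{uptime:.2%} uptime: {yearly:.1f} h/year, {quarterly:.1f} h/quarter")
```

At 99 percent, the allowance is about 87.6 hours a year, or roughly 21.9 hours a quarter, so a single 11-hour outage can fit within one quarter's budget while still being very visible to users.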
But eBay should be held to tougher standards than consumer sites, Todd said. "If Yahoo goes down, I can't get news or e-mail for 20 minutes or an hour, but if eBay goes down, I might miss out on a $1,000 auction that I'm running my business on," he said.
Indeed, the transaction-heavy eBay better resembles an online brokerage. One difference: Brokerage sites are based on mature systems; eBay developed its systems from scratch. "The trading sites have got decades of effort," Shah said, "and it shows."