Jul 30, 2009 (10:07 AM EDT)
In SPSS, IBM Gains an Open R & Python Analytics Platform

Read the Original Article at InformationWeek

I love telling folks that I ran my first SPSS programs in 1976... and that I haven't run one since. I was in high school. I keypunched and submitted card decks for a researcher back when "SPSS" still stood for Statistical Package for the Social Sciences. SPSS has long since reinvented itself as a predictive analytics vendor. As numerous commentators have pointed out, the company's data mining capabilities will fill a gap in IBM's product line on completion of the announced acquisition and put new heat on rivals including SAS, SAP, and Oracle.

SPSS brings other, less-visible assets to the pending IBM deal. Readers whose interest goes beyond analyses of the "IBM gets to check off another capabilities box" variety may be interested in learning about one of them. SPSS provides an open stats platform that allows users to patch Python and R code into their SPSS routines. SPSS's Bring Your Own Analytics is a clear competitive differentiator. Whatever you call it, SPSS's stats platform is a pioneering example of hybrid commercial-open source analytical computing, with benefits for users and the company alike.

Jon Peck, who is Principal Software Engineer and Technical Advisor at SPSS, explained to me that programmability actually started with PASW (Predictive Analytics Software) Statistics version 14 with the introduction of Python and .NET plug-ins. R support was introduced in PASW Statistics version 16. PASW Statistics 18 automates the installation of R and Python programmability components. The August 17 version 18 release is slated to include a new component, PASW Statistics Developer, that provides UI, data management, graphics, and presentation capabilities that make it easy for users to create statistical procedures using R and Python code libraries. To see how this works, check out examples on Jon Peck's blog, for instance on Python and Productivity.

According to SPSS, "users can create and integrate customized functionality with R and Python in any module. They all contain the programmability extension that allows this." Peck says "there are numerous SPSS customers who use R and Python as part of their application development. We know that based on the amount of feedback they've sent us requesting additional accessibility to these programming languages."

The company hosts a site, SPSS Developer Central, to facilitate collaboration, support, and code exchange. SPSS clearly gets the open-source way.

Bob Muenchen, who is manager of research computing support at the University of Tennessee, is one user who has definitely bought into SPSS's approach. Muenchen uses and supports several stats packages; he's well aware of the capabilities and problem-suitability of a variety of stats options. He sees SPSS's openness to Python and R extension as a means of complementing the SPSS platform's built-in strengths and says that SPSS is positioning Python as a new, high-power macro (scripting) language. I share his view that SPSS (and SAS) scripting, which I'd define as repetitive, sequential programming with conditional execution, leaves a lot to be desired. Why not use a tool like Python that's designed for that style of programming?
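To make "repetitive, sequential programming with conditional execution" concrete, here is a minimal sketch in plain Python of the macro-style job described above: looping over a list of variables and choosing which SPSS command to generate for each. The variable names, the type labels, and the build_syntax helper are all hypothetical, invented for illustration; this is not code from SPSS, Peck, or Muenchen. The sketch only builds and prints command text. Inside SPSS, the Python plug-in would presumably be used to run the generated commands rather than print them.

```python
# Hypothetical sketch: generate repetitive SPSS command syntax with Python.
# The variable names and type labels below are made up for illustration.

def build_syntax(variables):
    """Return one SPSS command string per variable, chosen by variable type."""
    commands = []
    for name, kind in variables:
        if kind == "numeric":
            # Summary statistics for numeric variables.
            commands.append("DESCRIPTIVES VARIABLES=%s." % name)
        elif kind == "categorical":
            # Frequency tables for categorical variables.
            commands.append("FREQUENCIES VARIABLES=%s." % name)
        # Variables of any other type are skipped.
    return commands


# Assumed example input: (variable name, type label) pairs.
variables = [("age", "numeric"), ("region", "categorical"), ("notes", "text")]
syntax = build_syntax(variables)
print("\n".join(syntax))
```

The loop and the if/elif branch are the whole point: this is the kind of logic that is awkward in a traditional macro facility and routine in a general-purpose language.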

Muenchen estimates that 95% of the SPSS users he supports are satisfied with built-in capabilities, but he believes that when you hit a wall with built-in stats functions, the ability to patch in existing, external code, even when written in a foreign stats language like R, is the most practical remedy.

Not incidentally, Muenchen has written a book, R for SAS and SPSS Users. If you visit his book page, you'll find a link to a presentation he gave last year, R You Taking Advantage of R, on extending SPSS with R code. The presentation is zipped up with code samples.

SPSS isn't the only analytics vendor opening up to R. SAS, in announcing an R interface for SAS/IML Studio 3.2 last March, stated "both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers." IML is SAS's Interactive Matrix Language; I published my take on the announcement when it first came out. I thought I'd balance this current story by checking with SAS about progress bringing R into other parts of the company's product line. After all, as Bob Muenchen observed to me, "most customers I work with are using the main [SAS] product and want to do something [extra] without dealing with complexity... Users don't want to dive off into IML."

Anne H. Milley, SAS director of technology product marketing, let me know that "IML Studio is just the first step. Some areas of product development which we are pursuing to further interface to R (enabling SAS programs to call R programs):

  • Available in SAS/IML Studio (client) today
  • Planned for SAS/IML (server) next year
  • And with other SAS products, both client and server next year and beyond."

SPSS is a clear leader in delivering open, extensible, commercial analytics. IBM will continue to sell and support acquired SPSS software, surely for at least a couple of years and probably well beyond as IBM integrates the software into its overall analytics offering. While IBM didn't buy SPSS for the open extensibility of its stats tools, that extensibility, impelled by IBM support for the SPSS platform, will likely continue to set a pace for commercial open analytics for years to come.