Moving beyond spreadsheets into enterprise-ready statistical analysis

Moving beyond spreadsheets into enterprise-ready statistical analysis

Moving beyond spreadsheets into enterprise-ready statistical analysis
For most knowledge workers, spreadsheets can be an attractive choice for performing calculations. If the task is to carry out simple tests on a small number of variables, then a spreadsheet is as good a tool as any. Its principal advantages are ubiquity, familiarity and ease of use. 

However,  spreadsheets may prove to be inadequate for a wide range of statistical analysis purposes. Their statistical features and algorithms are generally not as sophisticated, flexible or accurate as those in solutions that were specifically designed for enterprise-grade statistical analysis. For business analysts and anyone else whose job depends on the ability to build complex statistical models on greater amounts of data from disparate sources, you want a tool that goes beyond spreadsheets. You want an enterprise-grade tool that offers greater reliability, accuracy, speed, flexibility, scalability, functionality and programmability. And if that tool is as easy to use, manage and master as any spreadsheet, all the better. 

Read Also:
Predictive Analytics, Healthcare IoT Markets See Major Growth

If we’re going to propose a practical alternative to spreadsheets for daily use, we first need to examine what limitations keep spreadsheets from being used in high-performance enterprise applications. They have three key limitations: superficiality, limited flexibility and inadequate decision support.

Spreadsheets are acceptable if you intend to settle for a superficial glimpse at your data. But when the patterns, trends and correlations in your data set aren’t immediately apparent, you want something more. What you need are regression analysis, data mining and other advanced statistical capabilities that a spreadsheet simply can’t offer. Also, spreadsheets typically impose constraints on the number of records that can be analyzed. Consequently, a spreadsheet-based model can’t scale if the data set you’re analyzing is large—as many of them are in this era of big data.

Yes, carrying out mathematically sophisticated calculations on huge data sets using a spreadsheet is possible. But you can’t easily account for complex factors such as seasonality in a business, performing what-if analyses or developing multivariate scenarios unless you have a sophisticated tool such as IBM SPSS Statistics.

Read Also:
Blog 5: Statistics Denial, Myth #1: Traditional Techniques Straw Man [Blog Series]

Spreadsheets are mass-market software programs that often lack the flexibility to be extended to handle tasks for which they weren’t designed. This characteristic is especially true for tasks involving many variables, huge amounts of data and exceptionally complex statistical correlations. Though many people use spreadsheets as database substitutes, they are unsuited to this role because a proper database has built-in rules for structuring data, maintains data integrity, provides audit trails and has other robust data management and extensibility features that spreadsheets generally lack.

In addition, spreadsheet programs such as Microsoft Excel lack the programmability of a true database management system (DBMS). They instead rely on what is known as a nonprocedural programming language rather than the more sophisticated procedural languages such as Basic, C, Fortran and Java that enterprises rely on. Furthermore, spreadsheets often have functional limitations when handling special types of data such as missing or categorical data and performing sophisticated predictive analyses beyond straight-line extrapolations of past trends. They also exhibit drawbacks when propagating complex data updates beyond the clumsy process of propagating changes through tens to hundreds of separate linked cells, rows, columns and formulas without sacrificing visibility into their impact on accuracy.

Read Also:
Machine Learning And Privacy: A Problem?

Spreadsheet software can be unreliable. For one thing, spreadsheet vendors continually need to issue patches and corrections for problems that might lead to calculation errors.

Read Full Story…

 

Leave a Reply

Your email address will not be published. Required fields are marked *