The right tools for (structured) BIG DATA handling - more Redshift

In my recent post on The right tools for (structured) BIG DATA handling, I looked at using AWS Redshift to generate summaries from a large fact table and compared it to previous benchmark results from a columnar database running on a fast SSD drive.

Redshift performed very well indeed, especially so as the number of facts returned by the queries increased.  In that initial testing I was aggregating the entire fact table to keep the tests comparable to the previous benchmark, but that's typically not how a reporting (or analytic) system would access the data.  In this follow-up post, then, let's look at how Redshift performs when we want to aggregate across a particular subset of records.
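To make the two query patterns concrete, here is a minimal sketch, assuming a hypothetical sales_fact table with product_id, store_id, week_id and sales_value columns (none of these names come from the actual benchmark) and using psycopg2, which speaks the Postgres wire protocol that Redshift accepts:

```python
# Two query shapes against a hypothetical Redshift fact table.
# Table and column names are invented for illustration only.
import psycopg2

# Full-table aggregation: the pattern used in the initial benchmark post.
FULL_SCAN_SQL = """
    SELECT product_id, SUM(sales_value) AS total_sales
    FROM sales_fact
    GROUP BY product_id;
"""

# Filtered aggregation: closer to how a reporting tool slices the data,
# touching only the records for a few stores and a few weeks.
FILTERED_SQL = """
    SELECT product_id, SUM(sales_value) AS total_sales
    FROM sales_fact
    WHERE store_id IN (101, 102, 103)
      AND week_id BETWEEN 201401 AND 201413
    GROUP BY product_id;
"""

def run(sql, dsn):
    """Run a query against the cluster and return all rows."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
```

Whether the filtered version actually scans fewer blocks depends on how the table's sort key lines up with the WHERE clause, so treat this as a sketch of the query shape rather than a tuned example.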

Next Generation DSRs - it's all about speed!

Recently, I have been working with a new-to-me BI tool that has reminded me just how much speed matters.  I'm not mentioning any names here, and it's not a truly bad tool; it's just too slow, and that's an insight killer!

Continuing my series on Next Generation DSRs, let's look at how speed impacts the exploratory process and the ability to generate insight and, more importantly, value.

Many existing DSRs do little more than spit out standard reports on a schedule, and if that's all you want, it doesn't matter too much if it takes a while to build the 8 standard reports you need.  Pass the build off to the cheapest resource capable of doing it and let them suffer.  Once built, if a report takes 30 minutes to run when the scheduler kicks it off, nobody is going to notice.

Exploratory, ad-hoc work is a different animal, and one that can generate much more value than standard reports.  It's a highly iterative, interactive process: define a query, see what comes back, and kick off 2-3 more queries to explain the anomalies you've discovered; filter it, order it, plot it, slice it, summarize it, mash it up with data from other sources, correlate, ..., model.  This needs speed.
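As a toy illustration of that loop (with invented data and column names, not any particular DSR), here is roughly what one pass looks like in pandas:

```python
# A toy pass through the explore loop: summarize, spot an anomaly, slice,
# re-summarize, plot.  The file and column names are invented for the example.
import pandas as pd

sales = pd.read_csv("sales.csv")          # hypothetical weekly sales extract

# First pass: summarize to see the overall shape of the data.
by_region = sales.groupby("region")["units"].sum().sort_values()

# The weakest region is an anomaly worth explaining, so slice and re-summarize.
weak = by_region.index[0]
weak_trend = (sales[sales["region"] == weak]
              .groupby("week")["units"].sum())
weak_trend.plot(title=f"Units per week, {weak}")

# ...and each answer spawns the next question: filter, join in other sources,
# correlate, model.  The loop only works if each step comes back in seconds.
```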

Visualizing Forecast Accuracy. When not to use the "start-at-zero" rule?

I recently joined a discussion on Kaiser Fung's blog Junk Charts ("When to use the start-at-zero rule") about when charts should force a 0 into the Y-axis.  BTW - if you have not done so already, add his blog to your RSS feed; it's superb and I have become a frequent visitor.

On this particular post, I would completely agree with his thoughts were it not for one metric I have problems visualizing: Forecast Accuracy.
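For context, here is one common way forecast accuracy gets calculated - a weighted-MAPE style measure, and only an assumption here, since the post may well use a different variant:

```python
# One common forecast accuracy definition (an illustrative assumption, not
# necessarily the variant discussed in the post):
#     accuracy = 1 - sum(|forecast - actual|) / sum(actual), floored at 0
import numpy as np

def forecast_accuracy(forecast, actual):
    """Weighted-MAPE style accuracy across a set of items or periods."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    err = np.abs(forecast - actual).sum() / actual.sum()
    return max(0.0, 1.0 - err)

# Accuracies in practice tend to sit in a fairly narrow band well above zero,
# which is exactly what makes a forced zero baseline awkward to read.
print(forecast_accuracy([95, 110, 42], [100, 105, 40]))  # ~0.95
```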

Recommended Reading: The Definitive Guide To Inventory Management

A little over 15 years ago now, I was set the task of modeling how much inventory was needed for each of our 3,000 or so products at every distribution center.  Prior to that point, inventory targets had been set at an aggregate level based on experience, and my management felt it was likely we had too much inventory in total and that what we did have was probably not where it was most needed.  (BTW - they were absolutely right, and we were ultimately able to make substantial cuts in inventory while raising service levels.)

I came to the project with a math degree, some programming expertise, practical experience simulating production lines, optimizing distribution networks and analyzing investments, and no real idea of how to get the job done.  The books I managed to get my hands on gave you some idea of how to use such a system but no real idea of how to build one; they left out all the hard/useful bits, I think.  So I set about working it out for myself, with a lot of simulation models to validate that the outputs made sense.
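To give a flavor of that kind of simulation check, here is a minimal sketch: a simple reorder-point policy simulated against random daily demand to see what service level it really delivers.  All of the numbers, and the policy itself, are made up for illustration; nothing here comes from the original project.

```python
# Simulate daily demand against a reorder-point policy and measure the fill
# rate actually achieved.  Numbers are invented for illustration.
import random

def simulate_fill_rate(reorder_point, order_qty, mean_demand,
                       lead_time_days, days=10_000, seed=42):
    rng = random.Random(seed)
    on_hand = reorder_point + order_qty
    pipeline = []                      # (arrival_day, qty) for open orders
    demand_total = filled_total = 0

    for day in range(days):
        # Receive any orders arriving today.
        on_hand += sum(q for d, q in pipeline if d == day)
        pipeline = [(d, q) for d, q in pipeline if d != day]

        # Uniform daily demand as a crude stand-in for a real demand model.
        demand = rng.randint(0, 2 * mean_demand)
        filled = min(demand, on_hand)
        on_hand -= filled
        demand_total += demand
        filled_total += filled

        # Reorder when the inventory position drops to the reorder point.
        position = on_hand + sum(q for _, q in pipeline)
        if position <= reorder_point:
            pipeline.append((day + lead_time_days, order_qty))

    return filled_total / demand_total   # fill rate

print(simulate_fill_rate(reorder_point=60, order_qty=100,
                         mean_demand=10, lead_time_days=5))
```

Even a toy like this will catch targets that look fine on paper but fall apart under variable demand, which is the kind of sanity check the simulations were there to provide.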
I still work occasionally in inventory modeling, and I'll be teaching some of it this fall, so I have been eagerly awaiting this new book: The Definitive Guide to Inventory Management: Principles and Strategies for the Efficient Flow of Inventory across... by CSCMP, Matthew A. Waller and Terry L. Esper (Mar 19, 2014).