posts in the Data Analysis category

How do you speed up 40,000 weighted least squares calculations? Skip 36,000 of them.

Despite having finished all the programming for Chapter 2 of MLFH a while ago, there’s been a long hiatus since thefirst post on that chapter.


Why the delay? The second part of the code focuses on two procedures: lowess scatterplot smoothing, and logistic regression. When implementing the former in statsmodels, I found that it was running dog slow on the data—in this case a scatterplot of 10,000 height-vs.-weight points. Indeed, for these 10,000 points, lowess, run with the default parameters, required about 23 seconds. After importing modules and defining variables according to my IPython notebook, we can run timeit on the function:

%timeit -n 3 lowess.lowess(heights, weights)

This results in

3 loops, best of 3: 42.6 s per loop

on the machine I’m writing this on  (a Windows laptop with a 2.67 GHz i5 processor; timings are faster, but still in the 30 sec. range on my 2.5 GHz i7 Macbook).

An R user—or really a user of any other statistical package—is going to be confused here. We’re all used to lowess being a relatively instantaneous procedure. It’s an oft-used option for ...

Page 1 / 1