Will it Python?

What is it?

Will it Python? posts are my attempts to port data analyses originally done in R into Python.

The objective isn’t to just make a key that translates functions and methods in R into Python equivalents. Instead, the goal is to reproduce the results and insights of the analysis in idiomatic Python (to the extent I’m qualified to judge such a thing). Sometimes there will be a direct translation from a line of R to a line of Python; other times Python will suggest an altogether different approach to the problem.
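
As a small illustration of the direct-translation case, here is a minimal sketch of two common R one-liners next to rough pandas equivalents. The toy data frame and its column names are made up for this example and are not taken from any of the posts.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "weight": [1.0, 3.0, 10.0, 20.0],
})

# R: tapply(df$weight, df$group, mean)
# pandas: split by group, then average the weight column within each group.
group_means = df.groupby("group")["weight"].mean()

# R: df[df$weight > 2, ]
# pandas: boolean-mask row selection.
heavy = df[df["weight"] > 2]

print(group_means)
print(heavy)
```

Both lines map almost one-to-one here. The more interesting cases are the ones where no such mapping exists and the Python version ends up structured differently.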

What’s the point?

The first goal of Will it Python? is entirely selfish—I just want to learn how to better use the Python data analysis stack (NumPy, SciPy, Matplotlib, Pandas, etc.).

Second, though, I hope these posts can also be a useful resource for others who are interested in the Python data analysis toolkit and its viability as an alternative to R. That includes analysts who have been working in R and are “py-curious,” but aren’t 100% sure they can get their work done in Python. It also includes developers in the Python data analysis stack, who are still evolving their tools. By taking case studies that are known-solvable in R and translating them to Python, we get a better idea of where the Python toolkit shines, and where it still falls short in features or usability.

What are the projects?

My first Will it Python? project is translating Machine Learning for Hackers by Drew Conway and John Myles White. The book has a great collection of case studies that showcase R doing what it does very well: getting users close to their data quickly, and running cutting-edge statistical techniques via high-level libraries. The authors also strike a good balance in the datasets they work with: they recognize that real-world data are messy, but they don’t use datasets so complex that the tedium of cleaning and munging overwhelms the fun stuff.

I’ll also pull examples from textbooks and other sources with R analyses as they strike my interest. For example, Gelman and Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models has a lot of great examples of running statistical models in R.

What else?

You can see all the Will it Python? posts here, or by clicking on the category label of any Will it Python? post.

All the code written for Will it Python? posts lives at the GitHub repo here.

The dumb name and logo were shamelessly appropriated from Blendtec’s Will it Blend? ad campaign.

This is a learning experience for me, and my Python is amateurish at best. I appreciate any and all comments letting me know when I’m doing something dumb.

I’m also up for taking requests if folks have R projects in mind that they’d like to see attempted in Python.

Viewing IPython notebooks online

Most of the projects are coded in IPython notebooks. As I finish them, I’ll post links to them via nbviewer, so they can be read online. Note that a couple of the earlier entries were written more with .py scripts in mind, so they don’t take full advantage of notebook features (like Markdown, LaTeX, etc.).

Machine Learning for Hackers

Data Analysis Using Regression and Multilevel/Hierarchical Models (ARM)