In the previous post, I loaded the raw UFO data into a Pandas data frame after cleaning up some irregularities in the text file. Since we’re ultimately concerned with analyzing UFO sightings over time and space, the next step is to clean those variables and prepare them for analysis and visualization.
Some Python techniques to note in this part are:
- Like in the last part, Python string methods are going to come in really handy, and be a simple, expressive solution to a lot of problems.
- When those aren’t enough, Python has a pretty straightforward set of functions for implementing regular expressions.
- The map() method in Pandas can be used to “vectorize” functions along a Series (i.e. a data frame column) and is similar to R’s apply. In general, using a NumPy ufunc (vectorized function) is preferable, but not all operations can be expressed in ufuncs. This is especially true for non-numeric operations, such as for strings or dates.
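To make the three techniques in this list concrete, here is a minimal sketch on a toy data frame. The column names (location, duration_sec), the sample values, and the helper functions clean_location and extract_state are all hypothetical and made up for illustration; they are not the UFO data or the code used later in this post.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the UFO data set from the post.
df = pd.DataFrame({
    "location": [" Iowa City, IA", " Milwaukee, WI ", "Shelton, WA"],
    "duration_sec": [4.0, 9.0, 16.0],
})


def clean_location(text):
    """Plain string method: drop leading/trailing whitespace."""
    return text.strip()


# map() calls the Python function once per element of the Series.
df["location"] = df["location"].map(clean_location)

# Regular expression: capture a two-letter state code at the end.
state_pattern = re.compile(r",\s*([A-Z]{2})$")


def extract_state(text):
    """Return the state code, or None when the pattern does not match."""
    match = state_pattern.search(text)
    return match.group(1) if match else None


df["state"] = df["location"].map(extract_state)

# Numeric columns can skip map(): a NumPy ufunc works on the whole
# Series at once.
df["sqrt_duration"] = np.sqrt(df["duration_sec"])
```

The string and regex steps go through map() because they are ordinary Python functions that act on one value at a time, while the square root is already a ufunc and can be applied to the column directly.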
Cleaning dates: mapping and subsetting.
The first two columns of the data are dates in YYYYMMDD format, and Pandas imported them as integers. R has a function, as.Date, that will operate on a vector ...