First Brush with Python
I'm well aware I'm late to the band wagon - Python is a big deal and has been for a long time. For a long time, I've been meaning to give it a shot and see what it was all about. I've had a few weak attempts to understand it here and there, but generally never had it really grab my attention. Now that I've managed to complete my time at UNH, it's about time to try and pick up some new and interesting things.
To quote the Python website itself:
Python is a remarkably powerful dynamic programming language that is used in a wide variety of application domains. Python is often compared to Tcl, Perl, Ruby, Scheme or Java. Some of its key distinguishing features include: ...
Python is an interpreted language that is open source and very portable (it can be supported by a variety of virtual machine back ends). The language is a multi-paradigmed including imperative, object-oriented, and functional. These can work interchangably quite easily if desired, and results in some code that's fairly unique to read, yet really simple to follow. In addition, the standard library is very extensive (and the example I've created only touches a few of those options).
Following in the steps of my previous post, I found it neat to attempt to put together a simple statistics package in Python. This is an easy way to check out how a language may express particular algorithms. This set of stats functions is fairly simple - Read in a set of numbers from standard input (into the built-in data stucture list) and calculate things such as the mean, median, mode, variance and standard deviation.
First, lets see how we can read values from standard input. Like many other languages, there is a file-like handler for standard input. This is located with the sys package (sys.stdin). This file handler contains a few functions, most notably the readlines() function, which returns a list of each line read from standard input (each line as a separate value within the list). In other words, a first very simple version of reading input may look like so:
def get_value_set():
input_values = sys.stdin.readlines()
return input_values
And with that, the values are read from standard input. This is fine and all, but it is important to bare in mind that Python uses dynamic typing (in other words, resolving the types at run time). This has its advantages (like supporting multiple types in single functions easily) but will break down when we try and use certain operators (+, -, ...) on different types. In other words, while this is nice, we want to ensure that we have all floating point values for these inputs.
Again, lucky for us we're provided with the float(s) function, transforming a value into a numeric type for us (there is also an int(s) function). This is useful for what we're trying to do, but we're left with the question of how to make each input a float. This is something that doesn't usually look great in an imperative language, requiring some kind of loop applying the transformation to each value within the collection and saving it in the collection. A tedioius and not great looking solution.
Remember that Python supports functional programming. This means we have operations like map/reduce available to us as programmers. We also have list comprehensions. List comprehensions are like what you've seen in set theory - It is a way to control and describe what is in the set (without listing all of the values, which would be impossible many times). List comprehensions allow us to apply certain rules to a collection quickly and in a straightforward manner (no loops, loading/storing values, etc).
Here's the next version of our get_value_set() function:
def get_value_set():
input_values = [float(x) for x in sys.stdin.readlines()]
return input_values
def get_value_set():
try:
input_values = [float(x) for x in sys.stdin.readlines()]
return input_values
except ValueError:
print "ERROR - Invalid input given. Cannot continue"
sys.exit(-1)
def mean(numbers):
if len(numbers) == 0:
return 0
return sum(numbers) / len(numbers)
Once again, the standard library comes out to help us out. The sum function will take an iterable values and sum all of the values contained within it. The len function will count how many values are contained within a given collection. Thus, as long as there are more than zero values, we'll compute the mean in two function calls.
For the sake of comparison, lets see what the previous posts' implementation of the mean function looked like in Java:
public static<T extends Number> Double mean(AbstractList<T> values) {
if (values == null || values.isEmpty())
return new Double(0.0);
double sum = 0.0;
for (T value : values) {
sum += value.doubleValue();
}
return new Double(sum / values.size());
}
While it's certainly not the most confusing piece of code ever, there is a lot of extra boiler plate code and details required to follow this function. In just a few short lines, we've created the same behavior in Python. Granted, however, we're comparing two vastly different languages and any direct comparison (I believe) doesn't make much sense. Regardless, it is important to view what kinds of code we may be implementing and may even help improve code in other languages.
Finally, we can see all of these ideas come together in the function to calculate variance. Variance is basically the observation of how far values are displaced from an expected value and is the value the standard deviation is derived from. In particular, we're interested in seeing how far numbers are from the mean. Lets take a look:
def variance(numbers):
if len(numbers) == 0:
return 0
calculated_mean = mean(numbers)
deviated_values = [pow(x - calculated_mean, 2) for x in numbers]
return sum(deviated_values) / len(deviated_values)
Once again, taking advantage of our previous function mean, we can generate the variance quickly. First, calculate the mean. Next, determine the distance it stands from the mean by using a list comprehension (in particular, for each value in numbers, calculate (x-u)^2). Once this is generated, return the sum of those values divided by the amount of numbers in the collection.
I wish I was a headlight on a north bound train
I wish I was a headlight on a north bound train
I'd shine my light through cool Colorado rain...
The Grateful Dead, I Know You Rider
Tags: programming, python, statistics, introduction
