Archive for January 2007

Sensemaking

January 29, 2007

Dan Russell, a full-time research scientist at Google, wrote a series of posts on sensemaking at the Creating Passionate Users blog. In the first post, Sensemaking 1, Dan starts with an interesting question: “How do you make sense of something that’s big and complicated?” He then goes on to give a brief overview of the responses people usually give him.

In the second post, Sensemaking 2: What I do to make sense, Dan explains his approach to sensemaking:

  • Figure out what it is that you’re trying to understand or get done
  • Collect a lot of information about the domain
  • Organize the information
  • Iterate
  • Do

Dan uses the third post, Sensemaking 3: The search for a representation, to show how he applied this approach to answer the question: How do people manage interruptions?

Finally, in the fourth post, Sensemaking 4: Summary of your comments, Dan addresses the key issues that readers raised in their comments.

If you have to make sense of data, this group of posts is a must-read!

Cheers,

Lino

Programming systems for kids

January 19, 2007

I am starting to think about programming systems for kids. My son is 4 years old now. He likes to play with the computer, just like his dad. He has also started to like the text editors I use when writing programs, especially because the colour of the words changes depending on the combination of keys he presses. He is learning to read and write, and I guess that he will soon be ready to start learning something more serious than playing Tux Racer. This brings me to the topic of this post. I recently read an article in the O’Reilly Radar about programming systems for kids. The comments on that post include some good references, which I will start checking in the near future. However, I would like to get other opinions. So, do you know of any programming system for kids that you would recommend? If so, please tell me in the comments.

Cheers,

Lino

State of the Computer Book Market at the O’Reilly Radar

January 17, 2007

Tim O’Reilly published an interesting post on the State of the Computer Book Market. One thing that caught my attention is that books related to data analysis are gaining some traction. Could that be because businesses are realizing how important it is to do something useful with the data they are collecting? I guess that we will know in time…

Cheers,

Lino

Taking the plunge into open source

January 15, 2007

From ZDNet:

“More software companies are finding that the best way to make money with software is to give it away, cherry-picking open-source software practices for commercial gain.”

The complete article is available here.

Fuzzy Clustering using the Perl Data Language

January 14, 2007

Hello,

Here is a new version of my clustering program using the Perl Data Language.

Cheers,

Lino


#!/usr/bin/perl
use warnings;
use strict;

use PDL;

# fcm: fuzzy c-means implementation in Perl
# usage: $fcm [number_of_clusters] [fuzzification_factor]
# [max_iter] [tolerance]
# returns: prototypes, partition_matrix
#

#
# reading data
#

my ( @data, @tmp, $number_of_patterns, $max_row_number, $max_column_number );

# each line of the __DATA__ section holds one whitespace-separated pattern
while (defined(my $line = <DATA>)) {
    chomp($line);
    @tmp = split /\s+/, $line;
    push @data, [ @tmp ];
}

$number_of_patterns = @data;

my $patterns = pdl(@data);

#
# assigning other variables
#
my $number_of_clusters = shift @ARGV;
my $fuzzification_factor = shift @ARGV;
my $max_iter = shift @ARGV;
my $tolerance = shift @ARGV;

# fall back to defaults for any argument that was not supplied
$number_of_clusters   = 2       unless defined $number_of_clusters;
$fuzzification_factor = 2.0     unless defined $fuzzification_factor;
$max_iter             = 40      unless defined $max_iter;
$tolerance            = 0.00001 unless defined $tolerance;

$number_of_clusters = abs($number_of_clusters);
$fuzzification_factor = abs($fuzzification_factor);
$max_iter = abs($max_iter);
$tolerance = abs($tolerance);

#
# initializing partition matrices
#
my $previous_partition_matrix;
my $current_partition_matrix =
    initialize_partition_matrix($number_of_clusters, $number_of_patterns);

#
# output variables
#
my $prototypes;
my $performance_index;

#
# fuzzy c means implementation
#
$max_row_number = $number_of_patterns - 1;
$max_column_number = $number_of_clusters - 1;
my $iter = 0;
while (1) {
    # computing each prototype
    my $temporal_partition_matrix = $current_partition_matrix ** $fuzzification_factor;
    my $temp_prototypes = mv( $temporal_partition_matrix x $patterns, 1, 0 ) / sumover($temporal_partition_matrix);
    $prototypes = mv( $temp_prototypes, 1, 0 );

    # copying partition matrix
    $previous_partition_matrix = $current_partition_matrix->copy;

    # updating the partition matrix
    my $dist = zeroes $number_of_patterns, $number_of_clusters;
    for my $i (0..$max_row_number) {
        for my $j (0..$max_column_number) {
            my $temp_distance = distance( $patterns->slice(":,$i"), $prototypes->slice(":,$j"), \&euclidean );
            $dist->set($i, $j, $temp_distance);
        }
    }

    my $temp_variable = $dist ** (-2/($fuzzification_factor - 1));
    $current_partition_matrix = $temp_variable / sumover(mv($temp_variable, 1, 0));

    #
    # Performance Index calculation
    #
    $temporal_partition_matrix = $current_partition_matrix ** $fuzzification_factor;
    $performance_index = sum($temporal_partition_matrix * ( $dist ** 2 ));

    # checking stop conditions: stop when the largest membership change
    # drops below the tolerance or the iteration limit is exceeded
    my $diff_partition_matrix = $current_partition_matrix - $previous_partition_matrix;
    $iter++;
    if ( ($diff_partition_matrix->abs->max < $tolerance) || ($iter > $max_iter) ) {
        last;
    }
    print "iter = $iter\n";
}

print "=======================================\n";
print "clustering completed\n";
print "performance index = $performance_index\n";
print "prototypes = \n";
print $prototypes;
print "current partition matrix = \n";
print $current_partition_matrix;

# ================================
# initialize_partition_matrix
# partition_matrix =
# initialize_partition_matrix(
# num_clusters, num_patterns)
# ================================
sub initialize_partition_matrix {
    my ($num_clusters, $num_patterns) = @_;

    # random memberships, normalised so each pattern's memberships sum to 1
    my $partition_matrix = random($num_patterns, $num_clusters);
    my $column_sum = sumover(mv($partition_matrix, 1, 0));    # sum over columns
    $partition_matrix /= $column_sum;

    return $partition_matrix;
}

# ====================================
# compute distance between two vectors
# dist = distance( vector1, vector2, \&type_of_distance )
# ====================================
sub distance {
    my ($vector1, $vector2, $type_of_distance) = @_;
    my $r = $vector1 - $vector2;
    return $type_of_distance->($r);
}

sub manhattan    { sum( abs( $_[0] ) ); }
sub euclidean    { sqrt( sum( $_[0] ** 2 ) ); }    # square root of the sum of squares
sub tschebyschev { max( abs( $_[0] ) ); }

__DATA__
4.0 4.0
4.0 5.0
5.0 4.0
5.5 6.0
5.0 5.0
4.5 4.5
5.0 5.5
5.5 5.0
5.0 4.5
4.5 5.0
9.5 9.0
9.0 9.5
8.0 8.0
7.0 8.0
8.0 7.0
8.5 7.0
7.0 8.5
7.0 7.0
7.5 7.0
6.5 8.0
8.0 6.5
6.5 7.0
10.0 10.0
10.0 9.0
10.0 9.0
9.5 10.0
8.0 10.0
9.5 9.5
9.0 9.0
9.0 10.0
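
For readers who do not have PDL installed, the two alternating updates used above (recompute the prototypes from the memberships, then recompute the memberships from the distances) can be sketched in plain Python. This is only a minimal sketch, not a translation of the Perl program: the function name fuzzy_c_means, its parameters and the seed argument are illustrative assumptions, it supports Euclidean distance only, and it handles a point that lands exactly on a prototype as a special case, which the Perl version does not attempt.

```python
import math
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, max_iter=40, tol=1e-5, seed=0):
    """Cluster `points` (a list of equal-length tuples) with fuzzy c-means.

    Hypothetical helper written for illustration; defaults mirror the
    defaults of the Perl script (2 clusters, m = 2.0, 40 iterations).
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Random memberships u[i][k], normalised so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(max_iter):
        # Prototype k is the mean of the points weighted by u[i][k] ** m.
        weights = [[u[i][k] ** m for i in range(n)] for k in range(n_clusters)]
        centers = [
            tuple(
                sum(w * p[d] for w, p in zip(weights[k], points)) / sum(weights[k])
                for d in range(dim)
            )
            for k in range(n_clusters)
        ]

        # Membership update: u[i][k] is proportional to dist ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a prototype: give it full membership.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            new_u.append([value / total for value in row])

        # Stop once no membership changed by more than the tolerance.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return centers, u


if __name__ == "__main__":
    sample = [(4.0, 4.0), (4.0, 5.0), (5.0, 4.0), (9.5, 9.0), (9.0, 9.5), (10.0, 10.0)]
    prototypes, memberships = fuzzy_c_means(sample)
    print("prototypes =", prototypes)
    for point, row in zip(sample, memberships):
        print(point, [round(value, 3) for value in row])
```

The sample points in the usage section are a handful of rows taken from the data above. As in the Perl program, the returned prototypes are the ones computed at the start of the last iteration, so they lag the final membership matrix by one update.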

Happy 2007!

January 10, 2007

I know I have not updated this blog in a while, but I am back! For the last couple of months, I have been quite busy improving my Perl programming skills and learning to use the Perl Data Language (PDL). In the coming posts, I will share my experience using the PDL.

Cheers,

Lino