Archive for the ‘OSCON’ category

Machine Learning Made Easy with Perl (the day before)

July 24, 2007

Machine Learning Made Easy with Perl is the name of the session I am giving tomorrow afternoon at OSCON. I really worked hard on this one 🙂 It took me more time than I expected to make machine learning easy 😉 I do not want to spoil the surprise, but the talk is really packed, so if you are attending, do not close your eyes for a second because you might miss one of the pointers that could save your next machine learning project.

There is a small update to the session: I will cover only two case studies, “Exploratory financial data analysis using fuzzy clustering” and “Medical decision support systems using support vector machines”, so that I can provide more in-depth information on each. Come and see what I mean 🙂

I hope to see many faces there. By the way, I will make the slides and the source code available one week after the talk.




OSCON 2007: 16 days away

July 7, 2007

Only 16 days separate us from OSCON and I am still polishing the material for my session 😉 I asked my fellow PerlMonks for feedback on a preliminary version of the presentation’s outline and, as usual, the comments were really useful. Based on them, I decided to present two case studies instead of the three I originally planned. I believe that this way I will have more time to clearly explain the techniques.


By the way, with this post I will start a series of posts in which I show some of the snippets I will be presenting. Here is the first one:


A common practice in machine learning is to preprocess the data before building a model. One popular preprocessing technique is data normalization, which rescales each variable to have a mean of 0 and a standard deviation of 1. This is important to achieve efficient and numerically stable computation.
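To make the idea concrete, here is a quick plain-Perl illustration of normalizing a single variable (the data values are made up, and I use the sample standard deviation, i.e. the n - 1 divisor):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Normalize a single variable to mean 0 and standard deviation 1
# (the classic z-score), using the sample standard deviation.
my @x     = ( 3, 5, 7 );
my $n     = scalar @x;
my $mean  = sum(@x) / $n;
my $stdev = sqrt( sum( map { ( $_ - $mean )**2 } @x ) / ( $n - 1 ) );
my @z     = map { ( $_ - $mean ) / $stdev } @x;

print "mean = $mean, stdev = $stdev\n";   # mean = 5, stdev = 2
print "normalized: @z\n";                 # normalized: -1 0 1
```

The PDL snippet below does exactly this, but for every variable of a data set at once.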

In this snippet, I present how to do data normalization using the Perl Data Language. The input is a piddle (see the note below for a definition) in which each column represents a variable and each row represents a pattern. The output is a normalized piddle (in which each variable has a mean of 0 and a standard deviation of 1), along with the mean and standard deviation of the input piddle.

What are Piddles?

They are a new data structure defined in the Perl Data Language. As indicated in RFC: Getting Started with PDL (the Perl Data Language):

Piddles are numerical arrays stored in column-major order (meaning that the fastest-varying dimension represents the columns, following computational convention, rather than the rows as mathematicians prefer). Even though piddles look like Perl arrays, they are not. Unlike Perl arrays, piddles are stored in consecutive memory locations, facilitating the passing of piddles to the C and FORTRAN code that handles the element-by-element arithmetic. One more thing to note about piddles is that they are referenced with a leading $.


use warnings;
use strict;

use PDL;
use PDL::NiceSlice;

# ================================
# normalize
# ( $output_data, $mean_of_input, $stdev_of_input ) =
#     normalize( $input_data )
# Processes $input_data so that $output_data
# has 0 mean and 1 stdev:
# $output_data = ( $input_data - $mean_of_input ) / $stdev_of_input
# ================================
sub normalize {
    my ( $input_data ) = @_;
    my ( $mean, $stdev, $median, $min, $max, $adev )
        = $input_data->xchg(0,1)->statsover();

    # Guard against division by zero for constant variables
    my $idx = which( $stdev == 0 );
    $stdev( $idx ) .= 1e-10;

    my ( $number_of_dimensions, $number_of_patterns )
        = $input_data->dims();
    my $output_data
        = ( $input_data - $mean->dummy(1, $number_of_patterns) )
        / $stdev->dummy(1, $number_of_patterns);

    return ( $output_data, $mean, $stdev );
}
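For readers who want to check the arithmetic without installing PDL, here is a plain-Perl sketch of the same per-column normalization (the function name normalize_columns is my own choice; like statsover, it uses the sample standard deviation with its n - 1 divisor):

```perl
use strict;
use warnings;

# Normalize each column of a matrix (a list of row arrayrefs) to
# mean 0 and standard deviation 1, mirroring the PDL normalize()
# above but in plain Perl.
sub normalize_columns {
    my @rows = @_;
    my $n    = scalar @rows;
    my $dims = scalar @{ $rows[0] };
    my ( @mean, @stdev );

    for my $j ( 0 .. $dims - 1 ) {
        my $sum = 0;
        $sum += $_->[$j] for @rows;
        $mean[$j] = $sum / $n;

        my $ss = 0;
        $ss += ( $_->[$j] - $mean[$j] )**2 for @rows;
        my $sd = sqrt( $ss / ( $n - 1 ) );
        $stdev[$j] = $sd == 0 ? 1e-10 : $sd;    # avoid division by zero
    }

    my @out = map {
        my $row = $_;
        [ map { ( $row->[$_] - $mean[$_] ) / $stdev[$_] } 0 .. $dims - 1 ];
    } @rows;

    return ( \@out, \@mean, \@stdev );
}

my ( $out, $mean, $stdev )
    = normalize_columns( [ 2, 10 ], [ 4, 20 ], [ 6, 30 ] );
print "means:  @$mean\n";                                            # means:  4 20
print "stdevs: @$stdev\n";                                           # stdevs: 2 10
print "first column: ", join( ' ', map { $_->[0] } @$out ), "\n";    # first column: -1 0 1
```

Of course, the PDL version is the one to use in practice: it pushes the loops down into compiled code instead of doing them element by element in Perl.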

Machine Learning Made Easy with Perl

June 15, 2007

That is the title of a session I am giving on July 25, 2007 at OSCON. Here is the abstract:

Machine learning is concerned with the development of algorithms and techniques that allow computers to “learn” from large data sets. This talk presents an overview of a number of machine learning techniques and the main configuration issues the participants need to understand to successfully deploy machine learning applications. The talk also covers three case studies in which we will use Perl scripts to solve real-life problems:

  1. Medical decision support systems using support vector machines
  2. Exploratory financial data analysis using fuzzy clustering
  3. Pattern recognition in weather data using neural networks

This talk offers an intensive presentation of machine learning terminology, best practices, standard process, and strategy. Participants will get to know the techniques, but more importantly, they will learn when and why to use them. The talk is appropriate for educators and programmers who want to use machine learning in their own problem domains.

I will be posting more details about the session as we get closer to OSCON.



OSCON Recap Part III

September 22, 2006

Here is the last part describing my first visit to OSCON. Before I start, I have to thank O’Reilly for giving us the opportunity to talk about Venezuela’s quest to achieve technological independence. Second, I have to thank Jeff for the invitation and his hospitality. And of course, I have to thank Chris for letting me stay at his place.

On Friday we gave our talk to a group of social activists in Portland. It went quite well: Alejandro had plenty of time to give his portion of the talk 😉

People were quite engaged and we hope the message went across. The audio of the talk is available here thanks to Chris.

Finally, I leave you with the note inviting people to our talk.




OSCON Recap Part II

September 22, 2006

I have been so busy with my research and helping organize the 4th World Forum on Free Knowledge that I did not have the time to write down my thoughts about OSCON until now. Anyway, I will do my best to give you a short description of what our session was about.

Our session was named Software Libre: FOSS in Venezuela. We had a decent number of attendees considering that we were targeting people interested in the social aspects of Free and Open Source Software at a conference where the business aspects were king.

Jeff gave a not-so-brief introduction to Venezuela’s efforts to move to free and open source software, and then I started my talk. Somehow I got carried away and talked longer than we had agreed to. This left Alejandro little time to talk about the project he is working on to make the prisons in our country more human-friendly.

In short, we got some nice reviews, especially this one. We hope people enjoyed the talk while learning about what is happening in Venezuela and how that relates to the global Free and Open Source Software movement.

For those interested, I posted the slides of my talk here.



OSCON recap. Part I

August 3, 2006

I decided to write three posts about my experience at OSCON 2006. The first post (this one) will focus on the first day I attended: Wednesday, July 26, 2006. The second one will focus on our session, Software Libre: FOSS in Venezuela. The last one will cover our experience giving a similar talk for the general public in Portland.

OSCON Day 1.

The highlight of the day was Tim O’Reilly saying in his keynote that open source licenses are obsolete. This is still creating some discussion, as can be seen on the O’Reilly Radar. However, that was not the only interesting topic Tim talked about. He mentioned 5 topics that should be on our radar:

  1. The Architecture of participation beyond Web 2.0
  2. The fact that open source licenses are obsolete
  3. The fact that open source allows for asymmetric competition
  4. The fact that “operations” is the competitive advantage for open source software companies
  5. Open data as a revolution with larger impact than the open source revolution

After Tim’s talk, Scott Yara from Greenplum had the opportunity to shine. In his talk, School of Rock, Scott compared the music industry associated with Rock and Roll with the software industry associated with Open Source. The highlight of his talk was reminding us that we (the open source developers and activists) should

“Keep it [Open Source Software] Real!”

“Keep it Dangerous! [dangerous to the establishment]”

The other two talks in the plenary session were not as interesting. In case you are wondering about their topics you can find them here or here.

Later in the day, I attended the session on Data Warehousing and Business Intelligence using PostgreSQL by Luke Lonergan. The talk was OK; however, the title was misleading, since the Business Intelligence part was absent from everything but the title.

After that, I moved to Easy AI with Python by Raymond Hettinger. Raymond drew a very nice analogy between a database and an artificial neural network. Based on that analogy, he was able to create complex queries (similar to the way people talk) to extract information from the database. The only drawback of his approach is that the artificial neural network is not able to learn the connections, so it is not intelligent at all. Anyway, it is a talk you might want to check out. The presentation slides are available here.

After Raymond’s talk, I had a meeting regarding our session on Software Libre. The next talk I attended was The Semasiology of Open Source (Part III) by Robert Lefkowitz. Robert is an outstanding speaker and the talk was hilarious. As someone described him the following day: Lefkowitz is a master of the metaphor. His talk is available here for those of you who might be interested. By the way, the talk was so interesting that we let Robert run into part of the break, so I did not have much time to see the exhibits.

Because my friends (Alejandro and Jeff) are Perl fans, I decided to join them for the Perl Lightning Talks. The talks were really good. So good, in fact, that Alejandro convinced me to start learning Perl. So here I am, going through the Llama Book (Learning Perl by R. Schwartz, T. Phoenix, and B. Foy).

A rather disappointing talk I attended to close the day was Data Mining Using Orange and Python. It was disappointing because the speakers were experts neither in Orange nor in data mining. The only good thing is that they showed me that even they could use Orange, so it must be a really good tool.

Well, that is all for now. I will cover Part II of this series of posts during the weekend.



Going to OSCON!

July 25, 2006

In a couple of hours I am taking a plane first to Vancouver and from there to Portland. I will report again from there.