Archive for the ‘knowledge discovery’ category

Machine Learning Development with Perl

September 11, 2007

I just posted on PerlMonks a draft of a 45-minute talk on Machine Learning Development with Perl. Here is an extract of that post:


Machine Learning Development with Perl

The development of machine learning applications can be seen as a three-phase process: preparation, modeling, and implementation (see Fig. 1).

As a developer, you have to move back and forth between phases until you get a satisfactory result.


In the preparation phase, you work with your customer to define the problem. You then gather some data, analyze it, clean it if necessary, and select the features you are going to use in the model. Based on the type of problem, you may also decide what type of model you want to develop: a classifier, an estimator, or a clustering application.


In the modeling phase, you select the model (if you did not already do so in the preparation phase), then develop it, and finally evaluate it. Based on the results you get, you may decide to go back to the preparation phase and select other features, another cleaning method, or maybe another type of model.
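To make the back-and-forth between preparation and modeling concrete, here is a minimal sketch. It is written in Python for brevity rather than Perl, and it is not taken from the full post: the function names, the toy data, and the choice of a nearest-centroid classifier are all assumptions made for illustration.

```python
import random

def clean(rows):
    """Preparation: drop records that contain a missing value (None)."""
    return [r for r in rows if None not in r[0]]

def select(rows, feature_idx):
    """Preparation: keep only the chosen feature columns."""
    return [([x[i] for i in feature_idx], y) for x, y in rows]

def train_centroids(rows):
    """Modeling: nearest-centroid classifier, one mean vector per class."""
    sums, counts = {}, {}
    for x, y in rows:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def evaluate(rows, feature_idx, seed=0):
    """Clean, select features, train on 2/3 of the data, score on the rest."""
    data = select(clean(rows), feature_idx)
    random.Random(seed).shuffle(data)
    cut = len(data) * 2 // 3
    model = train_centroids(data[:cut])
    test = data[cut:]
    return sum(predict(model, x) == y for x, y in test) / len(test)

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [([0.1 * i, float(i % 3)], "a") for i in range(6)]
rows += [([9.0 + 0.1 * i, float(i % 3)], "b") for i in range(6)]
rows.append(([None, 1.0], "a"))  # a record with a missing value

# Try one feature set, look at the score, then go back and try another.
for idx in ([1], [0]):
    print(idx, evaluate(rows, idx))
```

If the score for one feature set is unsatisfactory, you return to the preparation step and call `evaluate` again with a different selection, which is the loop described above.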


In the implementation phase, you deploy your model. One important consideration is that your model should continue learning from new data. Sometimes, in machine learning, a model works well initially, but when the data grow significantly it no longer performs as well as before. This is why it is important to allow the model to continue learning as more data become available.
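One simple way to let a deployed model keep learning is to update it incrementally as each new labeled record arrives. The following is a minimal sketch, again in Python rather than Perl; the `OnlineCentroids` class and the sample values are hypothetical and only illustrate the idea of a running update.

```python
class OnlineCentroids:
    """Nearest-centroid classifier whose class means are updated one record at a time."""

    def __init__(self):
        self.means = {}   # class label -> running mean vector
        self.counts = {}  # class label -> number of records seen

    def update(self, x, y):
        """Fold one new labeled record into the running mean of its class."""
        n = self.counts.get(y, 0) + 1
        mean = self.means.get(y, [0.0] * len(x))
        self.means[y] = [m + (v - m) / n for m, v in zip(mean, x)]
        self.counts[y] = n

    def predict(self, x):
        """Return the class whose current mean is closest to x."""
        return min(self.means,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.means[y])))

model = OnlineCentroids()
for x, y in [([1.0], "low"), ([3.0], "low"), ([9.0], "high"), ([11.0], "high")]:
    model.update(x, y)
before = model.predict([7.0])

# Later, new "low" records arrive with larger values; the model adapts.
for x, y in [([8.0], "low"), ([8.0], "low")]:
    model.update(x, y)
after = model.predict([7.0])

print(before, after)
```

Because only the running means and counts are stored, the model can absorb new data without being retrained from scratch.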


The full post (including source code) is available at RFC: Machine Learning Development with Perl.




Presenting AranduCorp

September 10, 2007

AranduCorp is a consulting firm focused on helping organizations improve their business processes, marketing, and sales. AranduCorp offers training and consulting services on predictive analytics and will soon offer affordable predictive analytics software solutions for small and medium-sized businesses.

For more information visit:

AranduCorp’s Website

AranduCorp’s Blog

Machine Learning Made Easy with Perl

June 15, 2007

That is the title of a session I am giving on July 25, 2007 at OSCON. Here is the abstract:

Machine learning is concerned with the development of algorithms and techniques that allow computers to “learn” from large data sets. This talk presents an overview of a number of machine learning techniques and the main configuration issues participants need to understand to successfully deploy machine learning applications. The talk also covers three case studies in which we will use Perl scripts to solve real-life problems:

  1. Medical decision support systems using support vector machines
  2. Exploratory financial data analysis using fuzzy clustering
  3. Pattern recognition in weather data using neural networks

This talk offers an intensive presentation of machine learning terminology, best practices, standard process, and strategy. Participants will get to know the techniques but, more importantly, they will learn when and why to use them. The talk is appropriate for educators and programmers who want to use machine learning in their own problem domains.

I will be posting more details about the session as we get closer to OSCON.



PhD Thesis: I got a date for the defense

April 26, 2007

I finally have a date for my thesis defense: September 12, 2007 (give or take one week, depending on the other commitments of the jury). The tentative title of my thesis is:

Knowledge Based Systems for the Assessment of Scoliosis Severity

In the thesis, I describe my research on image and data analysis using machine learning for the development of clinical applications. More details are forthcoming …




January 29, 2007

Dan Russell, a full-time research scientist at Google, wrote a series of posts on sensemaking at the Creating Passionate Users blog. In the first post, Sensemaking 1, Dan starts with an interesting question: “How do you make sense of something that’s big and complicated?” Dan then goes on to give a brief overview of the responses people usually give him.

In the second post, Sensemaking 2: What I do to make sense, Dan explains his approach to sensemaking:

  • Figure out what it is that you’re trying to understand or get done
  • Collect a lot of information about the domain
  • Organize the information
  • Iterate
  • Do

Dan uses the third post, Sensemaking 3: The search for a representation, to show us how he used his approach to sensemaking to answer the question: How do people manage interruptions?

Finally, in the fourth post, Sensemaking 4: Summary of your comments, Dan discusses the key issues that readers raised in their comments.

If you have to make sense of data, this group of posts is a must-read!



State of the Computer Book Market at the O’Reilly Radar

January 17, 2007

Tim O’Reilly published an interesting post on the State of the Computer Book Market. One thing that caught my attention is that books related to data analysis are gaining some traction. Could that be because businesses are realizing how important it is to do something useful with the data they are collecting? I guess we will know in time…