
Machine Learning Development with Perl

September 11, 2007

I just posted on PerlMonks a draft of a 45-minute talk on Machine Learning Development with Perl. Here is an extract from that post:


Machine Learning Development with Perl

The development of machine learning applications can be seen as a three-phase process: preparation, modeling, and implementation (see Fig. 1).

As a developer, you have to move back and forth between phases until you get a satisfactory result.


In the preparation phase, you work with your customer to define the problem. You then gather data, analyze it, clean it if necessary, and select the features you are going to use in the model. Based on the type of problem, you may decide what type of model you want to develop: a classifier, an estimator, or a clustering application.
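To make the cleaning and feature-selection steps concrete, here is a minimal sketch. It is written in Python and is not the Perl code from the full post. The record layout, the function names clean_rows and select_features, and the variance threshold are all assumptions made for illustration. It drops records with missing values and then discards any candidate feature that never varies.

```python
from statistics import pvariance


def clean_rows(rows):
    """Drop records that contain a missing value (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def select_features(rows, candidates, min_variance=1e-9):
    """Keep only candidate features whose values vary across records.

    min_variance is a hypothetical threshold; a constant feature carries
    no information that could separate one record from another.
    """
    selected = []
    for name in candidates:
        values = [row[name] for row in rows]
        if len(values) > 1 and pvariance(values) > min_variance:
            selected.append(name)
    return selected


# Made-up records: one has a missing weight, and "planet" is constant.
raw = [
    {"height": 1.70, "weight": 65.0, "planet": 3, "label": "a"},
    {"height": 1.82, "weight": None, "planet": 3, "label": "b"},
    {"height": 1.60, "weight": 54.0, "planet": 3, "label": "a"},
    {"height": 1.91, "weight": 88.0, "planet": 3, "label": "b"},
]

cleaned = clean_rows(raw)
features = select_features(cleaned, ["height", "weight", "planet"])
print(len(cleaned), features)
```

Dropping incomplete records is only one possible cleaning method. Filling in missing values is another, and switching between them is one of the choices you may revisit later.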


In the modeling phase, you select the model if you did not do so in the preparation phase, then develop it, and finally evaluate it. Based on the results, you may decide to go back to the preparation phase and select other features, another cleaning method, or perhaps another type of model.
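The develop-then-evaluate loop can be sketched as follows. This is an illustrative Python example, not the code from the full post. It assumes a nearest-centroid classifier as the model, a small held-out set for evaluation, and a hypothetical accuracy threshold of 0.9 for deciding whether to return to the preparation phase.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for features, label in samples if predict(centroids, features) == label)
    return hits / len(samples)


# Made-up data: the model is developed on `train` and evaluated on `held_out`.
train = [([1.0, 1.0], "low"), ([1.2, 0.8], "low"),
         ([4.0, 4.2], "high"), ([3.8, 4.0], "high")]
held_out = [([0.9, 1.1], "low"), ([4.1, 3.9], "high")]

model = train_centroids(train)
score = accuracy(model, held_out)

# Hypothetical decision rule: below the threshold, go back to preparation.
needs_rework = score < 0.9
print(score, needs_rework)
```

Evaluating on data the model has not seen during development is what makes the score a useful signal for going back and changing features, cleaning, or model type.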


In the implementation phase, you simply implement your model. One important consideration is that your model should continue learning from new data. In machine learning, a model sometimes works well initially but performs worse once the data grow significantly. This is why it is important to allow the model to continue learning as more data become available.
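One way to let a deployed model keep learning is to update it incrementally as each new labeled example arrives, instead of retraining from scratch. The sketch below is illustrative Python, not the code from the full post. The class name OnlineCentroids and the data are assumptions. It keeps a running mean per label and shows a prediction changing after new data shift one class.

```python
class OnlineCentroids:
    """Nearest-centroid classifier that updates one example at a time."""

    def __init__(self):
        self.centroids = {}  # label -> running mean vector
        self.counts = {}     # label -> number of examples seen

    def update(self, features, label):
        """Fold one new example into the running mean for its label."""
        n = self.counts.get(label, 0) + 1
        self.counts[label] = n
        centroid = self.centroids.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            centroid[i] += (value - centroid[i]) / n

    def predict(self, features):
        """Return the label whose centroid is closest to the given point."""
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


model = OnlineCentroids()
model.update([1.0], "low")
model.update([5.0], "high")
before = model.predict([2.8])  # closer to the "low" centroid at this point

# New data arrive after deployment: "high" examples now sit near 3.0.
for _ in range(3):
    model.update([3.0], "high")
after = model.predict([2.8])

print(before, after)
```

A plain running mean weights old and new examples equally. If older data should count for less, a decaying average is a common alternative, at the cost of one more parameter to tune.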


The full post (including source code) is available at RFC: Machine Learning Development with Perl