Accuracy in machine learning

July 31, 2016 § Leave a comment


In an interview I was asked to state why accuracy is a bad measure. My answer was that the threshold must be fixed, and that a precision-recall curve, where the threshold varies, would be better. I then said that with the curve one can see whether the classifier is sensitive to peaks and whether it generalizes well. I still think this answer is fine, but accuracy is trickier than I thought.

Wikipedia states that accuracy and precision are not the same and cites a very good source:



Accuracy: closeness of agreement between a measured quantity value and a true quantity value of a measurand. Whereas the precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.

In binary classification

Accuracy is also used as a statistical measure of how well a binary classification test correctly identifies or excludes a condition. That is, the accuracy is the proportion of true results (both true positives and true negatives) among the total number of cases examined.[6] To make the context clear by the semantics, it is often referred to as the “Rand accuracy” or “Rand index”.[7][8][9] It is a parameter of the test.

Still, this does not answer when accuracy is a good measure. A very good blog post about it is:

Why accuracy alone is a bad measure for classification tasks, and what we can do about it

One problem with it is that the formulas are not written following one convention, so one might get confused about what TP, TN, FP, and FN are. First he introduces

accuracy = (TP + TN)/(TP + TN + FP + FN)

but then later uses

accuracy = (TP + TN)/(TP + FP + FN + TN)

Still, I was not fully aware of the accuracy paradox and thought only about the threshold-setting issue (which is better assessed with a ROC curve or AUCPR). I was aware that in multi-label classification the true negatives can become too important for accuracy, so that the true positives, which carry the most important information, can be obfuscated. In the multiclass setting, for presentation reasons, an error rate sometimes works better.
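To make the accuracy paradox concrete: on an imbalanced dataset, a classifier that always predicts the majority class gets a high accuracy while its recall is zero. A minimal sketch in C++ (the struct and function names are mine, for illustration only, not from any of the cited sources):

```cpp
#include <vector>

// Minimal confusion-matrix bookkeeping for binary labels (0/1).
struct Confusion { int tp = 0, tn = 0, fp = 0, fn = 0; };

Confusion confusion(const std::vector<int>& truth, const std::vector<int>& pred) {
    Confusion c;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (truth[i] == 1 && pred[i] == 1) ++c.tp;
        else if (truth[i] == 0 && pred[i] == 0) ++c.tn;
        else if (truth[i] == 0 && pred[i] == 1) ++c.fp;
        else ++c.fn;
    }
    return c;
}

// accuracy = (TP + TN) / (TP + TN + FP + FN)
double accuracy(const Confusion& c) {
    return double(c.tp + c.tn) / double(c.tp + c.tn + c.fp + c.fn);
}

// recall = TP / (TP + FN); defined as zero when there are no positives
double recall(const Confusion& c) {
    return (c.tp + c.fn == 0) ? 0.0 : double(c.tp) / double(c.tp + c.fn);
}
```

With 95 negatives and 5 positives, the always-negative classifier scores an accuracy of 0.95 while its recall is 0, which is exactly the paradox.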

There are a lot of other measures, but I will discuss that in another post.

Bottom line: accuracy might be good for talking to laypeople, error rates are usually more impressive when showing gains, but as a data scientist you should go with other measures, and also be aware of why you chose them.


HARAM on RCV1 training, Dense and Sparse implementation comparison in C++

April 12, 2016

I was comparing liblinear to the HARAM approach, and the Python implementation (numpy for dense, scipy for sparse) was relatively slow. Even on large datasets it was at most comparable.
The algorithm is pretty simple and can be explained with hyperboxes that grow. The growing is done by taking the minimum of the hyperbox's lower-left corner with the sample, and the maximum of the sample with the upper-right corner. The latter can also be translated into a minimum (using 1-x), so it is mainly about taking minima. Yet for a sparse implementation there is a better alternative, which was explained in the paper.
The numpy/scipy part was mainly the calculation of the minimum, which is pretty easy to implement. I went even further and used Cython to implement that part in sparse mode, since there was a good optimization when traversing the indices for minimum and maximum.
I decided to implement the algorithm in C++ and see what could be optimized.
Suffice it to say that the dense C++ implementation was, on sparse data, about 10 times faster than the scipy/Cython one.
I also planned a sparse implementation, which would involve not simple double (float) arrays but two vectors, one for indices and one for values.
Interestingly, the minimum and maximum over the sparse vectors can be optimized if some assumptions hold: the lower bound will most likely be zero, so only the intersection of nonzero values needs to be considered for the minimum; and the upper bound of the neuron will be denser than the sample it is compared with, so saving the sum of the upper bound avoids considerable computation.

//a is neuron lower bound, b is sample
double minimum_AB_sum(vector<double> aData, vector<int> aInd,
                      vector<double> bData, vector<int> bInd){
    double sumv = 0;
    int ia = 0;
    int na = aInd.size();
    int ib = 0;
    int nb = bInd.size();
    while (ia < na && ib < nb){
        if (aInd[ia] > bInd[ib]){
            ib++;               // index only in b, minimum is 0
        } else if (aInd[ia] < bInd[ib]){
            ia++;               // index only in a, minimum is 0
        } else {                // index in both: take the smaller value
            if (aData[ia] < bData[ib])
                sumv += aData[ia];
            else
                sumv += bData[ib];
            ia++;
            ib++;
        }
    }
    return sumv;
}

This simple while loop computes the intersection between the index vectors aInd and bInd.

double maximum_AB_sum_fast(vector<double> aData, vector<int> aInd, double sumA,
                           vector<double> bData, vector<int> bInd){
    //A is neuron with many indices, B is sample with few indices
    //sumA holds the precomputed sum of A; only entries where B wins are corrected
    double sumB = 0;
    int ia = 0;
    int na = aInd.size();
    int ib = 0;
    int nb = bInd.size();
    while (ia < na && ib < nb){
        if (aInd[ia] < bInd[ib]){
            ia++;                   // value only in A, already counted in sumA
        } else if (aInd[ia] > bInd[ib]){
            sumB += bData[ib];      // value only in B
            ib++;
        } else {                    // index in both
            if (bData[ib] > aData[ia])
                sumB += bData[ib] - aData[ia]; // B wins, correct A's contribution
            ia++;
            ib++;
        }
    }
    //values only in B
    while (ib < nb){
        sumB += bData[ib];
        ib++;
    }
    return sumA + sumB;
}

This is a pretty fast solution; still, there was a big mistake, which comes from the fact that I am more comfortable with Java.
The vectors need to be passed by reference, i.e. double maximum_AB_sum_fast(vector<double> &aData, vector<int> &aInd,… because otherwise C++ will copy the whole vector instead of just passing the reference.
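A minimal sketch of the difference (the function names are mine, for illustration):

```cpp
#include <vector>

// Pass by value: C++ copies the whole vector on every call (O(n) extra work).
// This is the Java habit that bit me; in Java such a parameter is a reference.
double sum_by_value(std::vector<double> v) {
    double s = 0;
    for (double e : v) s += e;
    return s;
}

// Pass by (const) reference: no copy, just an alias to the caller's vector.
double sum_by_ref(const std::vector<double>& v) {
    double s = 0;
    for (double e : v) s += e;
    return s;
}
```

Both return the same result; only the by-reference version avoids the hidden copy, which is what makes the timings differ.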
With copying, RCV1 training (23149 samples) with 20835 and 103 labels took 1.2 seconds to train and 6.8 s to test (2314 samples).
Using references it took 0.8 s to train and 4.1 s to test.
This problem was massive when I tried to use hash maps instead of the two vectors (one for indices, one for data).
They were massively slow; with references they were much faster, but still not as fast as the routine above.
std::unordered_map was faster, with a training time of 1.2 s and a test time of 9.3 s; std::map took 2.6 s to train and 21.4 s to test.
This also supports the findings of this comparison.
On the whole dataset, with 780k training samples and 23149 test samples, it took about x seconds to train and 217 s to test;
liblinear needed only 6.7 s. Still, comparing the number of computations, the code is close to the efficiency of liblinear:
ML-HARAM needs the minimum operation for the lower and upper bound, requiring on average about 2884 (= 2 * (354 clusters + 17 * 22684 neurons / 354 clusters)) operations per sample and nonzero feature. Since the SVM is used in a binary-relevance (BR) setup, it would do this 103 times, so the difference should be about 28 (2884/103) times. 6.3 s * 28 is 176.4 s, which is close to the 217.8 s that ML-HARAM needed.

Minecraft-pi offset two windows

September 15, 2015

The Raspberry Pi showed me some strange behavior yesterday, as I tried to set it up for my son to play Minecraft (granted, I also wanted to play the thing myself).

I installed raspbmc with retropie on my Raspberry Pi SD card, which works almost fine; only stella crashes too often… and snes is not that easy to set up, but I will get to that later.

I exited xbmc, typed startx, and could not read the letters in X. I had set the resolution very low because I had performance problems playing rmvb files over the LAN… so I set the resolution to 1280×1024 (through xbmc). Now I could read the context menu, but when I started Minecraft I got two windows: one black screen with the window frame, and another with the Minecraft screen outside the first one, without any frame and not centered, so I could only see part of it.

The solution was to set the resolution to 1920×1080. The problem is that Minecraft renders on the GPU, so it reads the resolution of the TV directly and centers its window on that, which is not the middle of the X resolution. Syncing both resolutions puts the two windows one over the other.

windows 7 installation from USB Stick

May 29, 2013

I recently had to install Windows 7, and the laptop in question did not have a working DVD drive. Getting Windows 7 onto a USB stick, which had been used before for booting Ubuntu, turned out to be a hard challenge. It was not only necessary to use diskpart to create a primary partition on the USB: the automatic installer from Windows always complained that it could not install the software onto the USB. Fortunately, I found a nice post (which I have since lost) mentioning that wust (the Windows 7 USB/DVD download tool) would not work properly because of the Avira antivirus. The author also noted that writing to the master boot sector of any device should indeed look suspicious. After that I had the problem that multiple USB devices were plugged in, and the Vostro 1710 from Dell would not show which one it used to boot (ok, 5 years ago it was very rare to have that many USB sticks/drives attached when installing something). So unplugging the USB drive with the backups turned out to be the last stone in the way.

Evince and Debian

April 16, 2013

Recently I made an update, and Evince no longer automatically reloads after I compile a new version of my articles. It was not as bad as I thought… After proofreading, correcting, and compiling in emacs, I actually prefer to automate this on the Evince end instead of in emacs.

To do so, gconf-editor should be opened and the auto-reload setting under apps should be set to true (see

Matlab access sparse row fast

April 2, 2013

Matlab stores the values of a sparse matrix in column format. In contrast to scipy, it does not offer multiple ways of storing a sparse matrix. Accessing a whole row of the matrix can be very expensive, since Matlab must go through all the columns searching for that very index. Scipy offers row-wise, dictionary-of-keys, linked-list, and also column-wise storage of sparse matrices.

A solution for this issue is simple but not trivial: transpose the matrix and store it again as sparse. So instead of accessing the row you access the column, which improves the speed greatly:

>> inds=randsample(10e5,1000);
>> m1=sparse(10000,10000);
>> m1(inds)=1;
>> tic;c1=0; for i=1:10000; c1=c1+nnz(m1(i,:));end;toc;
Elapsed time is 1.483025 seconds.
>> tic;c1=0; for i=1:10000; c1=c1+nnz(m1(:,i));end;toc;
Elapsed time is 0.168305 seconds.

So the solution is:

>> m1_t=m1';
>> tic;c1=0; for i=1:10000; c1=c1+nnz(m1_t(i,:));end;toc;
Elapsed time is 1.613423 seconds.
>> tic;c1=0; for i=1:10000; c1=c1+nnz(m1_t(:,i));end;toc;
Elapsed time is 0.151909 seconds.

Here the columns of m1_t are the rows of the original matrix m1.
For bigger matrices the gain is even more impressive.

Why Computer games are addictive (in my case)

March 16, 2012

Sometimes you are at work and you can’t get things done. You have already tried this new “get things done” method, but it does not motivate you. You think how rewarding it would be to play a game and feel cool, smart, and even powerful again. The problem is that this illusion only lasts a few hours; after that, the load of work to do has gotten even bigger, the energy even smaller, and the motivation loss has gone through the roof…

Why is it that playing a game some nerd designed gives such badly missed emotions?

First, a game has a clear goal, with clear sub-missions. A good game also feels somehow real; it sucks you into its own world. And most importantly, you get things done and are directly rewarded, without existential pressure. Although you are working in this world/illusion (you are getting things done step by step), you do not feel like you are working. So it gives me two things I miss in my own work. The first is clear goals, which are hard to set when your work is about creativity and knowledge, because you need a good idea, and you also need to know the other, bad ideas. The second is the reward, guaranteed and without pressure. Normally in real life you do not know whether things will work the way you want. Even if you have a great idea, it may be great only to you; others may have a very different opinion.

So how do you make your work as interesting as a game? First you need a clear goal. In the game it is to finish it. In real life it can be to get promoted, to move to another company with better conditions, or (as in my case) to get your damn Ph.D. Then you somehow need a plan. It will be tentative; in a game you normally need many tries to reach the goal, and since in real life everything is more complex, you must scale that up: you will need a lot of tries.

You must get rewards. Some people think food works; it does not work that well for me, but if you put yourself on a very strict diet, every little chocolate is already a reward. Also, write diaries and think about what you find good about your job. If there is nothing good, either you do not value your work, in which case you should think about changing it, or you do not value yourself. The latter was my case. I tried to find out what I liked and tried to pursue it. Richard Feynman also had a time when he was not that motivated, and he spent time trying to understand why spinning plates behave in a certain way. Finding the solution got him motivated, and it helped him solve another problem in quantum physics.

How do you find such things to do? Normally, by boring yourself. That is the hard part: you must get used to your own thoughts, hear what you want, give yourself breaks. So no reading (or surfing; you spend concentration on that too), no listening to very intense music (that also costs energy and concentration), no playing stupid games (they need more concentration than they give back). Another thing is to do productive work like writing a blog (now you know my real motivation for writing this one), doing stuff with people you like (but not at work…), doing things where you feel smart and that give you something back.

I must further admit that I am taking antidepressants, so maybe I feel this good because of the serotonin reuptake inhibitors… But I think that trying hard, and feeling the emotions I can feel when doing productive stuff, also does a great part of it. Still, I try to keep my black humor, but as Dr. Cox (Scrubs) said, children make one soft.