Histogram intersection kernel optimization in MATLAB

jperezmartin

I want to try a svm classifier using histogram intersection kernel, for a dataset of 153 images but it takes a long time. This is my code:

a = load('...'); %vectors
b = load('...'); %labels
g = dataset(a,b);

error = crossval(g,libsvc([],proxm([],'ih'),100),10,10);
error1 = crossval(g,libsvc([],proxm([],'ih'),10),10,10);
error2 = crossval(g,libsvc([],proxm([],'ih'),1),10,10);

My implementation of the kernel within the proxm function is:

...
case {'dist_histint','ih'}
    [m,d]=size(A);
    [n,d1]=size(B);
    if (d ~= d1)
        error('column length of A (%d) != column length of B (%d)\n',d,d1);
    end

    % With the MATLAB JIT compiler the trivial implementation turns out
    % to be the fastest, especially for large matrices.
    D = zeros(m,n);
    for i=1:m % m is number of samples of A 
        if (0==mod(i,1000)) fprintf('.'); end
        for j=1:n % n is number of samples of B
            D(i,j) = sum(min([A(i,:);B(j,:)]));%./max(A(:,i),B(:,j)));
        end            
    end

I need some matlab optimization for this code!

Divakar

You can get rid of that kernel loop to calculate D with this bsxfun based vectorized approach -

D = squeeze(sum(bsxfun(@min,A,permute(B,[3 2 1])),2))

Or avoid squeeze with this modification -

D = sum(bsxfun(@min,permute(A,[1 3 2]),permute(B,[3 1 2])),3)

If the calculations of D involve max instead of min, just replace @min with @max there.


Explanation: The way bsxfun works is that it does expansion on singleton dimensions and performs the operation as listed with @ inside its call. Now, this expansion is basically how one achieves vectorized solutions that replace for-loops. By singleton dimensions in arrays, we mean dimensions of 1 in them.

In many cases, singleton dimensions aren't already present and for vectorization with bsxfun, we need to create singleton dimensions. One of the tools to do so is with permute. That's basically all about the way vectorized approach stated earlier would work.

Thus, your kernel code -

...
case {'dist_histint','ih'}
    [m,d]=size(A);
    [n,d1]=size(B);
    if (d ~= d1)
        error('column length of A (%d) != column length of B (%d)\n',d,d1);
    end

    % With the MATLAB JIT compiler the trivial implementation turns out
    % to be the fastest, especially for large matrices.
    D = zeros(m,n);
    for i=1:m % m is number of samples of A 
        if (0==mod(i,1000)) fprintf('.'); end
        for j=1:n % n is number of samples of B
            D(i,j) = sum(min([A(i,:);B(j,:)]));%./max(A(:,i),B(:,j)));
        end            
    end

reduces to -

...
case {'dist_histint','ih'}
    [m,d]=size(A);
    [n,d1]=size(B);
    if (d ~= d1)
        error('column length of A (%d) != column length of B (%d)\n',d,d1);
    end
    D = squeeze(sum(bsxfun(@min,A,permute(B,[3 2 1])),2))
    %// OR D = sum(bsxfun(@min,permute(A,[1 3 2]),permute(B,[3 1 2])),3)

I am assuming the line: if (0==mod(i,1000)) fprintf('.'); end isn't important to the calculations as it does printing of some message.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related