MATLAB efficient histogram look up

marcman

I have a large 3-dimensional matrix (on the order of 1000x1000x100) containing values that correspond to a bins in normalized high-resolution histograms. There is one histogram for every index in the 3rd matrix dimension (e.g. 100 histograms for the example dimensions).

What is the fastest way to check the find the probability (i.e. the value associated with a bin in a normalized histogram) of the value at the 2D indices?

The code I have now is exorbitantly slow:

probs = zeros(rows, cols, dims);
for k = 1 : dims
    tmp = data(:,:,k);
    [h, centers] = hist(tmp, 1000);
    h = h / sum(h); % Normalize the histogram
    for r = 1 : rows
        for c = 1 : cols
            % Identify bin center closest to value
            [~, idx] = min(abs(centers - data(r, c, k)));
            probs(r,c,k) = h(idx);
        end
    end
end

For loops are generally (although not always) less efficient than vectorized code, and nest for loops are usually even worse. How can I go about doing this with fewer loops but also without running out of memory? I tried some repmat calls to vectorize the whole process, but crashed my MATLAB session with 1000x1000x1000x100 matrices.

NOTE: I only have MATLAB 2014a, so while solutions using the new histogram() function are welcome, I'm still stuck with hist().

Here's a small-scale demo example that should run in a replicable way:

rng(2); % Seed the RNG for repeatability
rows = 3;
cols = 3;
dims = 2;
data = repmat(1:3,3,1,2);
probs = zeros(rows, cols, dims);
for k = 1 : dims
    tmp = normrnd(0,1,1000,1);
    [h, centers] = hist(tmp);
    h = h / sum(h); % Normalize the histogram
    for r = 1 : rows
        for c = 1 : cols
            % Identify bin center closest to value
            [~, idx] = min(abs(centers - data(r, c, k)));
            probs(r,c,k) = h(idx);
        end
    end
end

When I ran the above code, I got the following output (which is logical as the histogram is a normal Gaussian):

probs(:,:,1) =

0.1370    0.0570    0.0030
0.1370    0.0570    0.0030
0.1370    0.0570    0.0030


probs(:,:,2) =

0.1330    0.0450    0.0050
0.1330    0.0450    0.0050
0.1330    0.0450    0.0050

NOTE: I found an efficient solution to this as posted in the answer below.

marcman

BEST SOLUTION:

Without using the new histogram() function (i.e. all versions before R2014b), the best way to do this is to take advantage of both the hist() function and the histc() function in tandem.

There are two cases to consider as well:

  1. Binning data and then looking up the same data in the histogram
  2. Looking up bins in a histogram formed from different data

The first case is the simpler one. A nice feature of histc() is that it returns both the histogram and the indices of the histogram into which the data was binned. At this point, we should be done. Alas! Sadly we are not. Because the code behind histc() and hist() bins the data differently, we end up with two different histograms depending on which we use. The reason appears to be that histc() chooses bins based on a strictly greater-than criteria, while hist() chooses bins using greater-than-or-equal-to. As a result, the equivalent function call:

% Using histc
binEdges = linspace(min(tmp),max(tmp),numBins+1);
[h1, indices] = histc(data, binEdges);

% Using hist
[h2, indices] = hist(data, numBins);

leads to different histograms: length(h1) - length(h2) = 1.

So to handle this we can just add the value in the last bin of h1 to the value in the second-to-last bin of h1, lop off that last bin, and adjust the indices accordingly:

% Account for "strictly greater than" bug that results in an extra bin

h1(:, numBins) = h1(:, numBins) + h1(:, end); % Combine last two bins
indices(indices == numBins + 1) = numBins; % Adjust indices to point to right spot
h1 = h1(:, 1:end-1); % Lop off the extra bin

Now you are left with an h1 that matches h2 and and indices vector corresponding to where your data went in h1. Thus, you can look up the probability information by efficient indexing instead of loops.

Runnable sample code:

rng(2); % Seed the RNG for repeatability

% Generate some data
numBins = 6;
data = repmat(rand(1,5), 3, 1, 2);
[rows, cols, dims] = size(data);
N = rows*cols;

% Bin all data into a histogram, keeping track of which bin each data point
% gets mapped to
h = zeros(dims, numBins + 1);
indices = zeros(dims, N);
for k = 1 : dims
    tmp = data(:,:,k);
    tmp = tmp(:)';
    binEdges = linspace(min(tmp),max(tmp),numBins+1);
    [h(k,:), indices(k,:)] = histc(tmp, binEdges);
end

% Account for "strictly greater than" bug that results in an extra bin
h(:, numBins) = h(:, numBins) + h(:, end); % Add count in last bin to the second-to-last bin
indices(indices == numBins + 1) = numBins; % Adjust indices accordingly
h = h(:,1:end-1); % Lop off the extra bin
h = h ./ repmat(sum(h,2), 1, numBins); % Normalize all histograms

% Now we can efficiently look up probabilities by indexing instead of
% looping
for k = 1 : dims
    probs(:, :, k) = reshape(h(sub2ind(size(h), repmat(k, 1, size(indices, 2)), indices(k,:))), rows, cols);
end
probs

In the second case, look-ups are tougher because you don't have the luxury of tracking bin-indices during histogram creation. However, we can work around this by building a second histogram with identical bins to the first, and tracking the indices during the binning process.

For this approach, you first compute an initial histogram using hist() on some histogram training data. All you need to store are the minimum and maximum values of that training data. With this information, we can generate the same histogram using linspace() and histc(), adjusting for the extra-bin "bug" that histc() gives.

The key here is to handle outlier data. That is, the data from your new data set that falls outside the pre-computed histogram. As that should be assigned a frequency/probability of 0, we just add an extra bin to the pre-computed histogram with a value of 0, and we direct any unbinned new data to map to that index.

Here is some commented, runnable code for this second method:

% PRE-COMPUTE A HISTOGRAM
rng(2); % Seed the RNG for repeatability

% Build some data
numBins = 6;
old_data = repmat(rand(1,5), 3, 1, 2);
[rows, cols, dims] = size(old_data);

% Store min and max of each histogram for reconstruction process
min_val = min(old_data, [], 2);
max_val = max(old_data, [], 2);

% Just use hist() function while specifying number of bins this time
% No need to track indices because we are going to be using this histogram
% as a reference for looking up a different set of data
h = zeros(dims, numBins);
for k = 1 : dims
    tmp = old_data(:,:,k);
    tmp = tmp(:)';
    h(k,:) = hist(tmp, numBins);
end
h = h ./ repmat(sum(h, 2), 1, numBins); % Normalize histograms
h(:, end + 1) = 0; % Map to here any data to that falls outside the pre-computed histogram

% NEW DATA
rng(3); % Seed RNG again for repeatability

% Generate some new data
new_data = repmat(rand(1,4), 4, 1, 2); % NOTE: Doesn't have to be same size
[rows, cols, dims] = size(new_data);
N = rows*cols;


% Bin new data with histc() using boundaries from pre-computed histogram
h_new = zeros(dims, numBins + 1);
indices_new = zeros(dims, N);
for k = 1 : dims
    tmp = new_data(:,:,k);
    tmp = tmp(:)';

    % Determine bins for new histogram with the same boundaries as
    % pre-computed one. This ensures that our resulting histograms are
    % identical, except for the "greater-than" bug which is accounted for
    % below.
    binEdges = linspace(min_val(k), max_val(k), numBins+1);
    [h_new(k,:), indices_new(k,:)] = histc(tmp, binEdges);
end

% Adjust for the "greater-than" bug
% When adjusting this histogram, we are directing outliers that don't
% fit into the pre-computed histogram to look up probabilities from that 
% extra bin we added to the pre-computed histogram.
h_new(:, numBins) = h_new(:, numBins) + h_new(:, end); % Add count in last bin to the second-to-last bin
indices_new(indices_new == numBins + 1) = numBins; % Adjust indices accordingly
indices_new(indices_new == 0) = numBins + 1; % Direct any unbinned data to the 0-probability last bin
h_new = h_new ./ repmat(sum(h_new,2), 1, numBins + 1); % Normalize all histograms

% Now we should have all of the new data binned into a histogram
% that matches the pre-computed one. The catch is, we now have the indices
% of the bins the new data was matched to. Thus, we can now use the same
% efficient indexing-based look-up strategy as before to get probabilities
% from the pre-computed histogram.
for k = 1 : dims
    probs(:, :, k) = reshape(h(sub2ind(size(h), repmat(k, 1, size(indices_new, 2)), indices_new(k,:))), rows, cols);
end
probs

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Best data structure for efficient look-up by date in R

From Dev

Histogram in MatLab

From Dev

Increase image contrast using look up table in MATLAB

From Dev

A fast look up value in vectors MATLAB code comparion

From Dev

More efficient way to look up dictionary values whose keys start with same prefix

From Dev

Efficient way to replace an element in a list of tuples by performing look up from another list (python)

From Dev

most efficient way to look up the same index number in two different lists to compare values

From Dev

multiple colours on matlab histogram

From Dev

Image histogram implementation with Matlab

From Dev

Estimating skewness of histogram in MATLAB

From Dev

stretching histogram of image in matlab

From Dev

a color histogram algorithm in matlab

From Dev

multiple colours on matlab histogram

From Dev

Normalizing a histogram in matlab

From Dev

Change axis in histogram Matlab

From Dev

Histogram of labeled strings in matlab

From Dev

matlab histogram (with for loops)

From Dev

Efficient way to look in list of lists?

From Dev

Most efficient way to look for these inconsistencies?

From Dev

Is it possible to share data between MATLAB objects, for example a look-up table?

From Dev

MATLAB Need help using 2D index table (look up table?)

From Dev

Bookmark look up confusion

From Dev

jndi look up for DefaultFtpSessionFactory

From Dev

Strange function look up

From Dev

vba - Look up in Loop

From Dev

Strange function look up

From Dev

Matplotlib xticks not lining up with histogram

From Dev

Combine Histogram and Cumulative Distribution Matlab

From Dev

Histogram from Spherical Plots in Matlab

Related Related

HotTag

Archive