As I promised before, not all of these lessons will result in pretty code, and this one, I believe, is the worst offender. Efficiency-wise, though, it is the most beneficial. So let's follow up on an example from the last lesson!
function C = batchMul(A,B)
% Arrange so that the sum of products (the inner-product step)
% will be taken along the rows.
A = permute(A,[2 1 4 3]);
B = permute(B,[1 4 2 3]);
% Replicate the dimensions such that the "resulting"
% matrix now resides in the 2nd and 3rd dimensions.
A = repmat(A,[1 1 size(B,3)]);
B = repmat(B,[1 size(A,2)]);
% Sum of products along the rows
C = sum(A.*B,1);
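For orientation, the intended call looks roughly like this (the sizes below are my own arbitrary choice; the last dimension indexes the batch):
A = rand(3,4,10);    % a batch of ten 3x4 matrices
B = rand(4,5,10);    % a batch of ten 4x5 matrices
C = batchMul(A,B);   % squeeze(C(1,:,:,k)) equals A(:,:,k)*B(:,:,k)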
Notice the matrix replication lines. Before discussing how to make them more efficient, let's take a short detour into the back-end of the Matlab/Octave platforms. While doing some MEX (Matlab EXecutable) programming back in the day, I noticed that every operation allocates fresh memory for its output matrix. Naturally, there are cases in which this is redundant. For example, take the simple addition operation
A = rand(100);
B = rand(100);
A = addTwoVars(A,B);
where addTwoVars is implemented by the following C source code:
#include <mex.h>
#include <matrix.h>

void mexFunction(int nOutput, mxArray ** Output, int nInput, const mxArray ** Input)
{
    // This example shows a per-element addition of two arbitrarily sized matrices
    // Input[0]  - Matrix A, "double" class
    // Input[1]  - Matrix B, "double" class
    // Output[0] - Per-element addition of matrices A and B. Same as writing A + B, "double" class

    // Verify that there are exactly two inputs to this function
    if(nInput != 2)
        mexErrMsgTxt("There must be exactly two inputs\n");
    // Verify that both are of "double" class
    if(!mxIsDouble(Input[0]) || !mxIsDouble(Input[1]))
        mexErrMsgTxt("Both inputs must be of 'double' class\n");
    // Verify that the number of dimensions of both matrices is the same
    if(mxGetNumberOfDimensions(Input[0]) != mxGetNumberOfDimensions(Input[1]))
        mexErrMsgTxt("Both inputs must consist of the same number of dimensions\n");
    // For each dimension, verify that the sizes of the two matrices match
    for(mwSize dimIdx = 0 ; dimIdx < mxGetNumberOfDimensions(Input[0]) ; dimIdx++)
        if(mxGetDimensions(Input[0])[dimIdx] != mxGetDimensions(Input[1])[dimIdx])
            mexErrMsgTxt("Dimension mismatch between inputs\n");
    // Verify that there is exactly one output
    if(nOutput != 1)
        mexErrMsgTxt("There must be one output argument");

    /***************
     * Start adding *
     ***************/
    // Get number of elements
    size_t nElements = mxGetNumberOfElements(Input[0]);
    // Create output matrix
    Output[0] = mxCreateNumericArray( mxGetNumberOfDimensions(Input[0]),
                                      mxGetDimensions(Input[0]),
                                      mxDOUBLE_CLASS,
                                      mxREAL);
    // Get pointers to all data arrays
    double *outData = mxGetPr(Output[0]),
           *A       = mxGetPr(Input[0]),
           *B       = mxGetPr(Input[1]);
    // Per-element addition
    for(size_t idx = 0 ; idx < nElements ; idx++)
        outData[idx] = A[idx] + B[idx];
}
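To try it out, here is a minimal compile-and-check sketch (assuming the code above is saved as addTwoVars.c on the path):
% Compile the MEX file (the file name addTwoVars.c is assumed)
mex addTwoVars.c
% Sanity check against the built-in addition
A = rand(100);
B = rand(100);
C = addTwoVars(A,B);
max(abs(C(:) - A(:) - B(:)))   % should print 0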
Notice that the output variable was allocated from scratch in this code, even though in the call A = addTwoVars(A,B) the result simply overwrites A. Such re-allocations are not always avoidable, but the redundant ones can definitely be trimmed down.
So, going back to the repmat example: instead of replicating the matrices beforehand, I suggest using the Binary Singleton eXpansion FUNction, a.k.a. the renowned bsxfun. Whereas repmat physically allocates and fills the replicated copies before the arithmetic even starts, bsxfun expands the singleton dimensions virtually, so those intermediate copies are never materialized. The usage rule is simple: if one of the input variables has a singleton dimension that needs to be replicated, use bsxfun. For example, the lines
% Replicate the dimensions such that the "resulting"
% matrix now resides in the 2nd and 3rd dimensions.
A = repmat(A,[1 1 size(B,3)]);
B = repmat(B,[1 size(A,2)]);
% Sum of products along the rows
C = sum(A.*B,1);
can (and should) be replaced with
% Per-element multiplication
C = bsxfun(@times,A,B);
% Sum of products along the rows
C = sum(C,1);
Or, even better, with
% Sum of products along the rows
C = sum(bsxfun(@times,A,B),1);
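For completeness, here is how the whole batchMul helper from the top of the post might look once the repmat lines are gone (a sketch; the final permute, which drops the leading singleton dimension, is my own addition and depends on the output shape you actually want):
function C = batchMul(A,B)
% Arrange so that the sum of products will be taken along the rows
A = permute(A,[2 1 4 3]);
B = permute(B,[1 4 2 3]);
% bsxfun expands the singleton dimensions on the fly - no replication needed
C = sum(bsxfun(@times,A,B),1);
% Drop the leading singleton so that C(:,:,k) equals A(:,:,k)*B(:,:,k) of the original inputs
C = permute(C,[2 3 4 1]);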
Not half bad, actually. The result in this case is quite pleasing to the eye! In most cases, however, bsxfun decreases readability. For example, I found a gem in my thesis work: I constantly had to validate whether some coordinates are within a specific range, so I performed a simple Boolean operation.
% Now check if within bounds, define as regular
% inSeg = (cxListRT < segLengths) & (cxListRT >= 0);
inSeg = bsxfun(@and,bsxfun(@lt,cxListRT,segLengths),cxListRT >= 0);
Notice the difference in readability between the line that describes the operation plainly (the second comment line) and the way it actually had to be written. The mere fact that I had to document the line with its own plain-comparison counterpart says everything about its readability.
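One mild remedy, purely my own suggestion and not part of the original thesis code, is to hide the bsxfun calls behind tiny anonymous-function handles so the intent stays visible:
% Hypothetical wrappers: comparison and logical AND with singleton expansion
lt_  = @(x,y) bsxfun(@lt ,x,y);
and_ = @(x,y) bsxfun(@and,x,y);
% Same bounds check as above, slightly easier on the eye
inSeg = and_(lt_(cxListRT,segLengths), cxListRT >= 0);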
What about acceleration? Well, the Octave results I got were a bit exaggerated. For the following test script,
clc
clear all
close all
Ms = [10 25 50 100 250 500 1000 2500];
N = 100;
nIters = 100;
t_repmat = zeros(size(Ms));
t_bsxfun = t_repmat;
t_repmat_curr = 0;
t_bsxfun_curr = 0;
for mIdx = 1:numel(Ms)
M = Ms(mIdx);
A = rand([M 1]);
B = rand([1 M]);
for i = 1:nIters
tic
C = repmat(A,[1 size(B,2)]) + repmat(B,[size(A,1) 1]);
t_repmat_curr = toc;
t_repmat(mIdx) = t_repmat(mIdx) + t_repmat_curr;
end
for i = 1:nIters
tic
C = bsxfun(@plus,A,B);
t_bsxfun_curr = toc;
t_bsxfun(mIdx) = t_bsxfun(mIdx) + t_bsxfun_curr;
end
end
% Average the timings to diminish effects of "first run"
t_repmat = t_repmat/nIters;
t_bsxfun = t_bsxfun/nIters;
figure('position',[250 214 1188 421]);
subplot(1,2,1);
semilogx(Ms,t_repmat*1000,'-b');
hold on;
semilogx(Ms,t_bsxfun*1000,'-r');
hold off;
xlabel('Number of rows\cols','fontsize',17);
ylabel('t [msec]','fontsize',17);
set(gca,'fontsize',15)
hdl = legend({'repmat','bsxfun'});
set(hdl,'fontsize',15);
subplot(1,2,2);
semilogx(Ms,t_repmat./t_bsxfun);
xlabel('Number of rows\cols','fontsize',17);
ylabel('Speedup: t_{repmat}/t_{bsxfun}','fontsize',17);
set(gca,'fontsize',15)
I got insane accelerations. This, again, is probably due to the unpredictable nature of the Octave implementation. But until I have paid good cash for the superior Matlab, I am not allowed to complain. Or at least until I have found a collaborator who has…

Cheers!