pilab is a Matlab package that contains my standard tools for data analysis of functional MRI, behavioural and eye tracking data.

This post is the first of a series of tutorials where we run pilab analyses on simulated data. Today we will cover a basic single subject fMRI analysis, including cross-validated prediction of time series. You will need my Matlab figure code and probably my general-purpose utility code to get the example below to run. Have fun!

Building simulated fMRI data

Let’s first make up some data to model. Notice that we assign each condition a true parameter b, which is a weight for each condition on each voxel.

In [1]:
% shuffle randomisation
% control variables for the simulation
% number of volumes (TRs)
n = 200;
% frame period - one volume every tr s
tr = 2;
% number of onsets to include in each run
ntrial = 20;
ncon = 4;
nvox = 50;
nrun = 2;
% true parameters - same across runs. +1 to model variations in constant
b = rand(ncon+1,nvox);

For each run of the experiment, we make up a set of onsets (uniformly distributed over the run), and assign each onset to a conind (effectively the column index in the final design matrix). We then convolve the onsets with a hemodynamic response function to construct the run-specific design matrix X.

The signal in this run is simply the design matrix weighted by the true parameters, X*b. To this we add a touch of noise to construct the final data.

Finally, we plug the convolved design matrix and data into Volume instances. These are container classes that make is easy to index in the sample dimension (e.g. runs) or feature dimension (e.g., regions of interest) while keeping any meta data in register.

In [2]:
designvol = cell(nrun,1);
datavol = cell(nrun,1);
for r = 1:nrun
    % different trial order in each run
    conind = ceil(rand(ntrial,1) * ncon);
    onsets = rand(ntrial,1) * n;
    X = [convolveonsets(onsets,conind,tr,n,'dur',1) ones(n,1)];
    % construct data as some mixture of gaussian noise and the true parameters
    % scaled by each predictor
    data = randn(n,nvox)*.3*max(X(:)) + X*b;
    % build Volume instances for convolved design and data
    metasamp = struct('chunks',ones(n,1)*r);
    designvol{r} = Volume(X,'frameperiod',tr,'metasamples',metasamp,...
    datavol{r} = Volume(data,'frameperiod',tr,'metasamples',metasamp);
% operator overloading means we can use standard concatenation syntax to combine
% the Volume instances over runs
datavol = cat(1,datavol{:});
designvol = cat(1,designvol{:});
datavol = 
  Volume with properties:

              data: [400x50 double]
          nsamples: 400
         nfeatures: 50
              meta: [1x1 struct]
              desc: [1x1 struct]
    standardstruct: [1x1 struct]
       frameperiod: 2

Let’s see what we have so far. Notice how the meta data is used to provide labels for the legend, and the frameperiod to express the X axis in seconds.

In [3]:
%plot -s 700,700 -r 300 -f svg
ax = subplot(2,1,1);
% hide the x axis. Don't you wish Matlab provided a more elegant way of doing this?
set(gca,'xcolor',[1 1 1]);
ax(2) = subplot(2,1,2);
% plot the data in second units by scaling frameperiod
ph = plot((1:designvol.nsamples)*designvol.frameperiod,designvol.data);
xlabel('time (s)');
lh = legend(ph,designvol.meta.features.labels,'location','southoutside');
set([lh ax],'box','off');

Time course fit and out of sample prediction

The basic design logic of pilab is to have independent classes for storing and analysing data. So now we convert the Volume to a GLM instance. Other converters are available for data that doesn’t come from a volume (see spm2glm, array2glm).

Before we do the conversion, we project out the constant from data and design matrix. This is equivalent to a mean subtraction. It’s often convenient to remove the constant (and higher degree polynomials) before cross-validating to ensure that performance is not dominated by trends in the data.

In [4]:
[filtereddatavol,filtereddesignvol] = preprocessvols(datavol,designvol,...
model = vol2glm(filtereddesignvol,filtereddatavol);
% notice that the model only contains 4 regressors as the constant has been projected out
(preprocessvols) projecting out 1 covariates
ans = 
  GLM with properties:

              X: [200x4 double]
           data: [200x50 double]
       nsamples: 200
      nfeatures: 50
    npredictors: 4
        cvgroup: []
      nrandsamp: 200

Let’s see how well the parameter estimates from run 1 can predict the time courses from run 2. We calculate R2 - the coefficient of determination. Which incidentally is not the same as the squared correlation when tested on independent data - this is why a few r2 estimates below are negative. We plot the first voxel to illustrate.

In [5]:
%plot -s 700,350 -r 300 -f svg
% cross-validated prediction - use model(2) fit and model(1) design matrix
% Notice that the GLM is an object array class, so can be indexed like a struct
% to run particular methods on say a train and a test split of the data.
prediction = predictY(model(2),model(1).X);
% evaluate on model(1) data
r2 = rsquare(model(1),prediction)
% visualise fit
ph = plot([prediction(:,1) model(1).data(:,1)],'.-');
lh = legend(ph,{'prediction','data'},'location','best');
r2 =
  Columns 1 through 7
    0.3414    0.0713    0.3359    0.3077    0.1362    0.3609    0.2496
  Columns 8 through 14
    0.1600    0.3858    0.1960    0.0725    0.0861    0.1064    0.2729
  Columns 15 through 21
    0.1266    0.2903    0.2676    0.0071   -0.0006    0.2975    0.1183
  Columns 22 through 28
    0.0783    0.3333    0.1941    0.3377    0.1581    0.1732    0.0449
  Columns 29 through 35
    0.1644    0.0830    0.0951    0.1146    0.2253    0.0880    0.2056
  Columns 36 through 42
    0.1893    0.1403    0.2310    0.1428    0.1101    0.2032    0.2370
  Columns 43 through 49
    0.0812    0.2415    0.1533    0.0984    0.1286    0.1373    0.1396
  Column 50

That’s it for today. Next time we will look at how to calculate linear discriminant contrasts - a pseudo-distance metric that is often used for representational similarity analysis.


comments powered by Disqus