Multi-Voxel Pattern Analysis (MVPA)
This example provides one method of performing MVPA using AFNI. It is meant to be a relatively straightforward example for beginners wanting to learn the basics of MVPA.
We would like to emphasize that there are many ways of performing MVPA that may be better suited to the design of your study. Be sure to take a look at the links in the Additional Resources section at the bottom of this page for more information.
Prior to beginning this analysis, you will need to install AFNI on your computer. The following resource will assist you in downloading this program: AFNI download help page
If you are unfamiliar with UNIX or shell scripting, visit AFNI's Unix Tutorial before beginning this analysis.
The commands in this tutorial are meant to be used in a bash shell, so you may need to update your .bashrc file with the AFNI path if you only have AFNI configured for tcsh. See Step 3 on the help page for more information.
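For example, if AFNI was installed to the default location used by the AFNI install scripts (~/abin), adding a line like the following to your ~/.bashrc would make the AFNI programs available in bash (adjust the path if your binaries live elsewhere):
export PATH=$PATH:$HOME/abin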
We also assume that you have a general knowledge of imaging analysis, so we do not cover pre-processing, stimulus timing file creation, or ROI mask generation in detail.
To follow along with our analysis steps, download the example dataset. This folder includes the following:
- raw data for the 8 runs (run#.nii)
- pre-processed data for the 8 runs (run#.preproc.nii)
- stimulus timing files, one per category (cars.block.onsets.1D, faces.block.onsets.1D, houses.block.onsets.1D, shoes.block.onsets.1D)
- ROI mask (final.mask.nii)
- training dataset labels (trainLabels.1D)
Note that this example is for a single subject only, so we will provide you with specific commands to run rather than a script for processing multiple subjects. It would be helpful to create your own script using these commands for use with multiple subjects.
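For example, a multi-subject wrapper might look something like the sketch below (the directory layout is hypothetical; the loop body would contain the commands from Steps 1 through 7):
for subj in subj01 subj02 subj03; do
  cd $subj
  # pre-processing, 3dDeconvolve, 3dTcat, and 3dsvm commands from this tutorial go here
  cd ..
done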
If you encounter any issues along the way, the AFNI help forum is a great place to search for solutions to your errors.
Contents:
- Overview of study design
- Step 1: Pre-processing
- Step 2: Regression analysis
- Step 3: Create training set
- Step 4: Create testing set
- Step 5: Create mask
- Step 6: Algorithm training
- Step 7: Algorithm testing
- Additional Resources
Overview of study design
To fully understand this analysis, you will need to understand the design of the study for the sample dataset we provided.
Our study was designed similarly to that of Rice et al. (2014).
Specifically, we presented a participant with 4 different types of visual stimuli (cars, shoes, faces, houses) in a blocked design. The participant passively viewed these images (no task was performed). The study consisted of 8 runs, each comprising four blocks, one per stimulus category. Within each block, 10 images from a single stimulus condition were presented. The block order was randomized within each run, and no image was repeated across runs (80 images per category in total).
The purpose of this MVPA example is to (1) train an algorithm to learn what brain activity looks like when the participant was viewing the four different stimuli (cars, faces, shoes, houses) and then (2) test if the algorithm can accurately predict what type of stimulus the participant was viewing for any given trial.
Step 1: Pre-processing
As in almost all other imaging analyses, we first need to pre-process the raw data. To make things easier for you, we have provided the already pre-processed data in the example dataset folder (pre-processed data labeled as 'run#.preproc.nii').
The following pre-processing steps were performed to generate these datasets:
- slice timing correction (3dTshift)
- motion correction (3dvolreg)
- spatial smoothing (3dmerge)
- temporal smoothing (3dTsmooth)
- deobliquing (3dWarp)
- brain masking
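For reference, a rough per-run sketch using the AFNI programs listed above is shown below. This is illustrative only: the intermediate file names and parameter values (such as the blur size), and the use of 3dAutomask plus 3dcalc for the brain-masking step, are our own assumptions, not the exact settings used to create the provided run#.preproc.nii files.
3dTshift -prefix run1.tshift.nii run1.nii                                               # slice timing correction
3dvolreg -prefix run1.volreg.nii -base run1.tshift.nii'[0]' -1Dfile run1.motion.1D run1.tshift.nii   # motion correction
3dmerge -prefix run1.blur.nii -1blur_fwhm 4 -doall run1.volreg.nii                      # spatial smoothing (4 mm FWHM, illustrative)
3dTsmooth -prefix run1.tsmooth.nii -lin run1.blur.nii                                   # temporal smoothing
3dWarp -prefix run1.deoblique.nii -deoblique run1.tsmooth.nii                           # deobliquing
3dAutomask -prefix run1.brainmask.nii run1.deoblique.nii                                # brain mask
3dcalc -a run1.deoblique.nii -b run1.brainmask.nii -expr 'a*b' -prefix run1.preproc.nii # apply mask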
This tutorial assumes that you understand the basics of pre-processing with AFNI. Your dataset may require different pre-processing steps than what we performed, so we want to emphasize again that this is just a single example.
Step 2: Regression analysis
After pre-processing, our next step is to perform a regression analysis using 3dDeconvolve.
Before beginning this step, you will need to create stimulus timing files that correspond to the timing of the events in your study. The timing files for our example dataset can be found in the example dataset folder you downloaded at the start of this tutorial. We are assuming that you know how to create stimulus timing files, so we won't cover in detail how we created these files for our study.
Most traditional regression analyses average beta estimates across all trials of a given condition. For MVPA, however, we want a separate beta for each presentation of each stimulus condition. Because of this, we will run 3dDeconvolve with the -stim_times_IM option rather than the traditional -stim_times option, which generates a beta for each individual trial of each condition. In our example each condition appears in one block per run across 8 runs, so the regression should produce 8 betas per condition (32 total).
Again, we want to note that there are many ways to generate these betas – this is just one way of doing so that we found works well with our study.
Note: all commands are meant to be used in a bash shell (not tcsh)
To perform the regression analysis, we will use the following command:
3dDeconvolve -input run1.preproc.nii run2.preproc.nii run3.preproc.nii run4.preproc.nii run5.preproc.nii run6.preproc.nii run7.preproc.nii run8.preproc.nii \
    -polort 1 \
    -local_times \
    -censor allRuns.censor.1D \
    -num_stimts 4 \
    -stim_times_IM 1 cars.block.onsets.1D 'BLOCK(9,1)' -stim_label 1 cars \
    -stim_times_IM 2 faces.block.onsets.1D 'BLOCK(9,1)' -stim_label 2 faces \
    -stim_times_IM 3 houses.block.onsets.1D 'BLOCK(9,1)' -stim_label 3 houses \
    -stim_times_IM 4 shoes.block.onsets.1D 'BLOCK(9,1)' -stim_label 4 shoes \
    -bucket MVPA.BLOCK.nii
The result of this command is a file called MVPA.BLOCK.nii, which should contain 33 sub-bricks. Sub-brick 0 is the full-model F statistic; sub-bricks 1 through 8 are the betas for the car stimuli in runs 1-8, the next 8 sub-bricks are the face betas for runs 1-8, and so forth for houses and shoes.
To extract these sub-bricks into individual files, type the following commands in terminal:
for a in $(seq 1 8); do 3dTcat -prefix cars.$a.nii "MVPA.BLOCK.nii[${a}]"; done
for a in $(seq 9 16); do b=$((a - 8)); 3dTcat -prefix faces.$b.nii "MVPA.BLOCK.nii[${a}]"; done
for a in $(seq 17 24); do b=$((a - 16)); 3dTcat -prefix houses.$b.nii "MVPA.BLOCK.nii[${a}]"; done
for a in $(seq 25 32); do b=$((a - 24)); 3dTcat -prefix shoes.$b.nii "MVPA.BLOCK.nii[${a}]"; done
(The quotes around the sub-brick selector keep the shell from treating the square brackets as a glob pattern.)
This will create 8 files for each stimulus category (cars.1.nii...cars.8.nii, etc). These files contain the beta coefficient datasets for each specific stimulus for each of the 8 runs.
Step 3: Create training set
The next step in this tutorial is to create a training set, which will then be used to train the algorithm as to what the brain activity looks like when the participant is viewing a face, house, etc.
To create a training set for each stimulus category, you will need to randomly select 6 of the 8 runs for each category.
For the stimulus category "cars", we will randomly select runs 3, 8, 4, 6, 5, and 7 to use for training (runs 1 and 2 will be used for testing).
Once the runs are randomly selected, we will then run the following command to generate a training set for the stimulus category "cars":
3dTcat -prefix cars.train.nii cars.3.nii cars.8.nii cars.4.nii cars.6.nii cars.5.nii cars.7.nii
We will then do the same thing for the next stimulus category, "faces", randomly selecting runs 2, 8, 1, 7, 4, and 5 for the training set (leaving runs 3 and 6 for testing):
3dTcat -prefix faces.train.nii faces.2.nii faces.8.nii faces.1.nii faces.7.nii faces.4.nii faces.5.nii
We will continue the same way with houses, randomly selecting runs 5, 3, 4, 8, 6, 7:
3dTcat -prefix houses.train.nii houses.5.nii houses.3.nii houses.4.nii houses.8.nii houses.6.nii houses.7.nii
And again with shoes, randomly selecting runs 8, 7, 4, 3, 6, 2:
3dTcat -prefix shoes.train.nii shoes.8.nii shoes.7.nii shoes.4.nii shoes.3.nii shoes.6.nii shoes.2.nii
Note: in a typical study, you would want to generate multiple training sets per category (~10 sets) and then concatenate those multiple sets into a single training set for each category. For simplification purposes with this tutorial, we are only showing how to generate one training set per category.
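If you would rather have the shell pick the random runs for you than choose them by hand, a minimal sketch (assuming GNU shuf is available; the variable names are our own) is:
train_runs=$(shuf -i 1-8 -n 6)                        # pick 6 of the 8 runs at random
test_runs=$(seq 1 8 | grep -vwF "$train_runs")        # the 2 runs not used for training
3dTcat -prefix cars.train.nii $(for r in $train_runs; do echo cars.$r.nii; done)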
After creating a training set for each category, we then need to concatenate these four training sets into a single training set. To do this, enter the following command in terminal:
3dTcat -prefix trainBlock.nii cars.train.nii faces.train.nii houses.train.nii shoes.train.nii
Finally, we need to create a file that tells the algorithm what order the stimuli appear in the final concatenated training set (trainBlock.nii). We have provided you with this file in the example dataset folder (trainLabels.1D), or you can create your own according to the process below:
The trainLabels.1D file should consist of a single column of category labels (we use the numbers 1-4) corresponding to the order of the stimulus categories in the dataset. You can create this 1D file in a simple text editor.
Because our final training block (trainBlock.nii) consists of a concatenated dataset of 6 car betas, followed by 6 face betas, followed by 6 house betas, followed by 6 shoe betas, the trainLabels.1D file should consist of a column of 6 1's (car label), followed by 6 2's (face label), followed by 6 3's (house label), followed by 6 4's (shoe label).
Thus, the contents of the 1D file should look like this:
1
1
1
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
4
4
4
4
4
4
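If you would rather generate this file from the command line than type it by hand, a quick bash sketch is:
for label in 1 2 3 4; do
  for i in $(seq 1 6); do echo $label; done
done > trainLabels.1D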
Step 4: Create testing set
After creating a final concatenated training set containing all four stimulus categories and a corresponding training labels file, we now want to generate a testing set. We will do this the same way we generated the training set.
Recall from Step 3 that the runs left out of training were runs 1 and 2 for the cars test set, runs 3 and 6 for the faces test set, runs 2 and 1 for the houses test set, and runs 5 and 1 for the shoes test set.
Again – you probably want to have ~10 training sets, which would result in ~10 testing sets for each category. You would then concatenate these 10 testing sets into one test set per category. However, for simplicity, we are only showing you how to create one test set per category.
To create the test sets, enter the following commands:
3dTcat -prefix cars.test.nii cars.1.nii cars.2.nii
3dTcat -prefix faces.test.nii faces.3.nii faces.6.nii
3dTcat -prefix houses.test.nii houses.2.nii houses.1.nii
3dTcat -prefix shoes.test.nii shoes.5.nii shoes.1.nii
Finally, in the same manner as in the training set step above, we need to concatenate these four category test sets into a single test set by entering the following command in terminal:
3dTcat -prefix testBlock.nii cars.test.nii faces.test.nii houses.test.nii shoes.test.nii
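If you want to verify the concatenation, 3dinfo can report the number of sub-bricks in each file. With one training/testing split per category, trainBlock.nii should contain 24 sub-bricks (6 x 4 categories) and testBlock.nii should contain 8 (2 x 4 categories):
3dinfo -nv trainBlock.nii
3dinfo -nv testBlock.nii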
Step 5: Create mask
After creating training and testing sets, the next step is to create a mask of the brain regions typically involved in your task. We do this to improve accuracy when training and testing, so that the algorithm is only training based on activity in task-relevant regions.
In a study with many participants, you would typically normalize each brain to a standard space (e.g. MNI152) during pre-processing and then use an atlas in that space to choose the brain regions of interest.
However, for the purposes of this tutorial, we have already created a mask in the subject's native space using a FreeSurfer parcellation. The mask is provided to you in the example dataset folder (final.mask.nii).
More information on how to create ROI masks in AFNI.
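If you want to build a similar mask for your own data, one possible route (a sketch only; the file names and parcellation label values below are hypothetical) is to convert the FreeSurfer parcellation to a volume AFNI can read (e.g. with FreeSurfer's mri_convert), keep only the labels you care about with 3dcalc, and resample the result onto the EPI grid:
3dcalc -a aparc+aseg.nii -expr 'amongst(a,1011,2011)' -prefix roi.mask.nii              # keep only the chosen parcellation labels (values are illustrative)
3dresample -master run1.preproc.nii -rmode NN -prefix final.mask.nii -input roi.mask.nii # match the EPI grid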
Step 6: Algorithm training
Next, we need to train the algorithm to understand what brain activation looks like when the participant is viewing the various stimuli (faces, houses, cars, shoes). To do this, we will use a support vector machine analysis program in AFNI called 3dsvm. For more information on this program, visit the program help page. If you are interested in understanding the math behind support vector machines, you may find this resource to be of interest.
Again, we want to emphasize that 3dsvm is not the only program to use for training/testing. There are many different programs that have been developed for this purpose that you may find better suited for your study – this just happens to be the program we chose to use for our tutorial.
The inputs to this command are the trainBlock.nii and trainLabels.1D files created in Step 3, as well as the mask (final.mask.nii) of regions known to be activated by our task.
To run 3dsvm training, enter the following command in terminal:
3dsvm -trainvol trainBlock.nii \
    -trainlabels trainLabels.1D \
    -model trainSet.model.nii \
    -mask final.mask.nii
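As an optional sanity check, you can run 3dsvm in test mode on the training data itself; passing -testlabels makes 3dsvm report classification accuracy, which should be very high when the model is tested on the same data it was trained on (the prediction prefix trainCheck below is our own choice):
3dsvm -testvol trainBlock.nii \
    -model trainSet.model.nii \
    -testlabels trainLabels.1D \
    -classout \
    -predictions trainCheck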
Step 7: Algorithm testing
The final step in our MVPA tutorial is to then test whether the algorithm can predict what type of stimuli the participant was viewing. To do this, we will run 3dsvm with the testBlock.nii file we created in Step 4.
To run 3dsvm testing, enter the following command in terminal:
3dsvm -testvol testBlock.nii \
    -model trainSet.model.nii \
    -classout \
    -predictions exemplar
The output of this command that is of interest to us is the file exemplar_overall_DAG.1D.
Note: because of the way the algorithm works, you may not get exactly the same predictions every time you run this command, so we cannot tell you exactly what your output will look like. Below is an example of what a potential exemplar_overall file might contain.
Remember that we labeled cars as 1, faces as 2, houses as 3, and shoes as 4, and that the testing set contains two car datasets, followed by two face datasets, two house datasets, and two shoe datasets. If the algorithm guessed every trial correctly, the exemplar_overall file would therefore look like the following:
1 1 2 2 3 3 4 4
However, the predictor algorithm is not always perfectly accurate, so we need to compare its actual guesses (exemplar_overall_DAG.1D) to what we were expecting (outlined above). In our run, the predictions were as follows:
1 3 2 2 3 3 1 4
This means that the first car trial was guessed correctly while the second was guessed incorrectly, both trials of faces and houses were guessed correctly, and the first shoe trial was guessed incorrectly while the second was guessed correctly.
Remember, you may not get this exact result even though you are running the same 3dsvm command that we ran because the algorithm doesn't always produce the same results.
We also want to emphasize the importance of examining prediction performance on a per-category basis, rather than just the overall accuracy across all categories. In our example above, the algorithm predicted faces and houses with 100% accuracy, but cars and shoes with only 50% accuracy. This is likely because faces and houses evoke more distinct patterns of activation, allowing for easier prediction. Remember that with four categories, chance performance is 25%, so even 50% accuracy for cars and shoes is still well above chance.
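If you would like to compute those per-category numbers rather than tally them by eye, a small sketch follows. It assumes you also create a testLabels.1D file (analogous to trainLabels.1D, and not part of the provided dataset) listing the true category of each test sample in order (1 1 2 2 3 3 4 4, one value per line):
paste testLabels.1D exemplar_overall_DAG.1D | awk '{ n[$1]++; if ($1 == $2) ok[$1]++ } END { for (c in n) printf "category %s: %d of %d correct\n", c, ok[c]+0, n[c] }'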
Additional Resources
If you don't know where to begin with MVPA, James Haxby provides an interesting overview of how the technique originated.
Here is a useful guide for help in understanding how to design an MVPA study
If you are interested in the math behind support vector machines, you may wish to read this paper
The authors of the above paper also provide a list of useful MVPA links that may be of use in your journey through understanding MVPA.
This "MVPA Meanderings" blog covers a wide range of MVPA topics – the link provided will direct you to a very helpful post with many useful links to MVPA literature, software packages, etc
To read up on MVPA in the literature, this site outlines several MVPA papers of interest.