Comparison of multiple datasets • TrackMateR

In the first vignette vignette("TrackMateR") we saw how to analyse one TrackMate XML file. This is one dataset, the result of a TrackMate tracking session on a single movie. A more likely scenario is that you have several datasets, from different conditions.

In this scenario we would like to see:

a report for each dataset
a compiled report for all datasets in that condition
a comparison between conditions of key parameters

Organising your data

Recommended Start a new RStudio session in a new directory using Create Project. We find it useful to follow this convention for RStudio projects. This gist can be run in the console to quickly set up the folder structure for your new project. Now place your data in a folder called Data/ in this directory.

Required Place all your TrackMate XML files into subfolders according to condition. For example you might have:

Data/
- Control/
  - cell1.xml
  - cell2.xml
- Drug/
  - cell1.xml
  - cell2.xml
  - cell3.xml

Only one level of folders is allowed.

Analysing multiple datasets

Once the data is in place, a single command will generate a series of reports, summaries and a comparison

library(TrackMateR)
compareDatasets()

The outputs are generated using standard parameters equivalent to reportDataset(). However, it is possible to change the defaults by passing additional parameters. For example, compareDatasets(radius = 100) will use the defaults for everything except the radius for searching for neighbours in the track density analysis (this example will use a value of 100 units for the radius). Parameters that can be changed are:

N, short - from MSD analysis
deltaT, mode, nPop, init, timeRes, breaks - from jump distance calculation and fitting
radius - from track density analysis
msdplot - change the MSD plots from linear-linear (default) to e.g. log-log. Options: loglog, loglin, linlog, linlin.

Outputs

In the example above, the code will produce the following in a directory called Output/

Plots/
- comparison.pdf
- Control/
  - combined.pdf
  - report_1.pdf
  - report_2.pdf
- Drug/
  - combined.pdf
  - report_1.pdf
  - report_2.pdf
  - report_3.pdf

That is, the individual reports are saved with a new name (regardless of the original XML filename), all datasets are combined (per condition) in combined.pdf and a comparison of a summary of all conditions is saved in the top level folder (comparison.pdf).

As well as graphical outputs, compareDatasets() saves several csv files of data (to Outputs/Data/). In each condition folder, the data frames are collated and saved. This includes: TrackMate data, MSD, jump distance and fractal dimension data. There is also a csv file of the data frame used to plot the average msd for this condition.

Above the condition folders, three csv files are saved. A collation of the msd summary data allMSCurves.csv for all conditions; summary data per dataset allComparison.csv which is the data frame used for making the comparison plots; and allTraceData.csv which is a data frame of properties per trace per dataset for all conditions.

The idea with these files is that a user can load them back into R and process the data in new ways and go beyond TrackMateR. An advanced user can make their own data frames by running TrackMateR functions. A good place to start is to peruse the code for compareDatasets() and modify from there.

No data? No problem!

If you don’t have your own datasets, or if you’d like to download some data to process and understand how comparison between datasets work in TrackMateR, there are some test datasets available here. There is raw data, TrackMate XML files and example TrackMateR outputs for movies with particles undergoing 6 different kinds of motion. To try it out, download the files and place the contents of TrackMateOutput folder into you RStudio project folder under Data as described above. Now, run

library(TrackMateR)
compareDatasets()

to see how the data can be processed. Using the default parameters, the track density is quite sparse, so one option to try to alter first is radius, i.e. compareDatasets(radius = 4).

The test datasets are generated by tracking synthetic data. Since the movies are synthetic, we have ground truth data for the exact position and identities of the tracks. The ground truth datasets are useful for verifying TrackMate outputs, benchmarking TrackMate performance and so on. To load a single ground truth data set, use readGTFile("path/to/file") and then recalibrate as you would for a TrackMate file (the ground truth data is in pixels and frames). Or you can compare many ground truth csv files using compareGTDatasets("path/to/Data") and specifying the rescaling arguments.

Finally, to generate your own “ground truth” data without using the code here, all that is required is a csv file specifying the tracks with columns TrackID,x,y,frame.

Limitations

When using compareDatasets() it is assumed that the files and datasets have similar scaling. If files need recalibration, see vignette("calibration"). Be aware that if datasets with varying calibration are used, some plots generated by compareDatasets() will be misleading. If the scaling is different between conditions:

if the scaling is correct and consistent - all plots are OK
if the time scaling is correct but inconsistent within a condition
- jump distance plot - in combined.pdf will be meaningless, individual plots are OK
- MSD plot - combined average may look odd since different durations may be used. This is also an issue if movies have different lengths or if tracking is poor in some movies.
if the spatial scaling is correct but inconsistent - all plots are OK

Note that if either spatial or temporal scaling is incorrect, most of the plots will be garbage.