Advanced SuperPlots

In this vignette, we will explore some of the more advanced features of SuperPlotR.

But first, we need to deal with how to scale the plot so that it looks good in your paper!

Sizing the SuperPlot

By default, superplot uses point sizes of 2 for the data and 3 for the summary points, and a font size of 12. This looks good in the RStudio viewer, but is not well suited to a published figure, where the plot is likely to be very small.

library(SuperPlotR)
# the default plot
superplot(lord_jcb, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)")

# the same plot but with custom sizing
superplot(lord_jcb, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)",
          size = c(0.8,1.5), fsize = 9)

This does not look great in the viewer, but it will look better in a figure.

library(ggplot2)
# the same plot but with custom sizing
superplot(lord_jcb, "Speed", "Treatment", "Replicate",
               ylab = "Speed (µm/min)", size = c(0.8,1.5), fsize = 9)

ggsave("plot.pdf", width = 88, height = 50, units = "mm") # final size

This is preferable to saving the plot with the default sizing:

library(ggplot2)
# the default sizing, saved at the same final size
superplot(lord_jcb, "Speed", "Treatment", "Replicate",
               ylab = "Speed (µm/min)")

ggsave("plot.pdf", width = 88, height = 50, units = "mm") # final size

Customising the SuperPlot

A couple of simple tweaks: an x label can be added, and the transparency of points can be altered like this.

superplot(lord_jcb, "Speed", "Treatment", "Replicate",
          xlab = "Drug", ylab = "Speed (µm/min)", alpha = c(0.3,1))

SuperPlotR returns a ggplot object which can be customised how you like. For example, the theme can be overridden like this:

p <- superplot(lord_jcb, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)")
p + theme_minimal()

superplot can also accept a ggplot object via the gg parameter and add the SuperPlot to it (within reason!). For example, you might want to plot something behind the SuperPlot.

p <- ggplot() +
  geom_hline(yintercept = 20, linetype = "dashed", col = "grey")
superplot(lord_jcb, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)", gg = p)

Ordering the x-axis

This is best done by reordering the levels of the factor in the input dataframe before calling superplot.

df <- lord_jcb
df$Treatment <- factor(df$Treatment, levels = c("Drug", "Control"))
superplot(df, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)")

It is also possible to reorder the Replicates using a similar strategy. You might want to do this so that colours and shapes are assigned in a different order from the default. Another way to achieve the same thing is to supply a reordered colour palette to superplot.
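As a minimal sketch of the replicate reordering described above: the level order c(3, 1, 2) is an arbitrary choice for illustration, and the example assumes that Replicate in lord_jcb holds the integers 1 to 3, as the summary output in this vignette suggests.

```r
library(SuperPlotR)

df <- lord_jcb
# Convert Replicate to a factor and list the levels in the desired order.
# The order c(3, 1, 2) is only an example.
df$Replicate <- factor(df$Replicate, levels = c(3, 1, 2))
levels(df$Replicate)

superplot(df, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)")
```

Colours and shapes should then follow the new level order rather than the numerical order of the replicates.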

Getting information about your SuperPlot

Having made your SuperPlot, you might want to know a bit more about it. For example, you might wonder which replicate is which or perhaps you received a warning that some replicates are missing some conditions.

You can set the option info = TRUE when you call superplot to get more detailed information.

superplot(lord_jcb, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)",
          info = TRUE)
#> SuperPlot information
#> =====================
#> Number of conditions: 2
#> Number of replicates: 3
#> Number of data points: 300
#> Number of summary points: 6
#> =====================
#> Colour palette: tol_bright
#> Data distribution: sina
#> Summary statistic: rep_mean
#> No bars
#> X-axis label:
#> Y-axis label: Speed (µm/min)
#> Point sizes: 2 (individual), 3 (summary)
#> Alpha for points: 0.5 (individual), 0.7 (summary)
#> Font size: 12
#> No statistics
#> =====================
#> Colours for replicates: #4477AA, #CCBB44, #EE6677
#> Shapes for replicates: 21, 21, 21
#> =====================
#> Summary statistics:
#> # A tibble: 6 × 6
#>   Treatment Replicate rep_mean rep_median sp_colour sp_shape
#>   <chr>     <fct>        <dbl>      <dbl> <fct>     <fct>   
#> 1 Control   1             41.5       41.7 #4477AA   21      
#> 2 Control   2             32.6       34.4 #CCBB44   21      
#> 3 Control   3             20.6       20.3 #EE6677   21      
#> 4 Drug      1             29.6       30.1 #4477AA   21      
#> 5 Drug      2             22.3       21.9 #CCBB44   21      
#> 6 Drug      3             12.9       12.6 #EE6677   21

When info = TRUE is set, the SuperPlot will contain a legend; otherwise, the legend is not shown. This is because the legend is not very useful in most cases, as the colours and shapes are already shown in the plot. However, if you want to see the legend, you can either set info = TRUE or append + theme(legend.position = "right") to the output.
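The second route, adding a theme to the returned plot, might look like the sketch below. It relies only on superplot returning a ggplot object, as described earlier in this vignette; the variable name p is arbitrary.

```r
library(SuperPlotR)
library(ggplot2)

# Build the plot without the extra console output from info = TRUE
p <- superplot(lord_jcb, "Speed", "Treatment", "Replicate", ylab = "Speed (µm/min)")

# Switch the legend on by overriding its position in the theme
p + theme(legend.position = "right")
```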

Retrieving the summary data

If you need to retrieve the summary data used to create the SuperPlot, you can use the get_sp_summary function. It returns a data frame with one row for each combination of condition and replicate.

summary_data <- get_sp_summary(lord_jcb, "Speed", "Treatment", "Replicate")
head(summary_data)
#> # A tibble: 6 × 4
#>   Treatment Replicate rep_mean rep_median
#>   <chr>         <int>    <dbl>      <dbl>
#> 1 Control           1     41.5       41.7
#> 2 Control           2     32.6       34.4
#> 3 Control           3     20.6       20.3
#> 4 Drug              1     29.6       30.1
#> 5 Drug              2     22.3       21.9
#> 6 Drug              3     12.9       12.6
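To make the contents of this table concrete, here is a base R sketch that builds a comparable per-replicate summary by hand. It is not the implementation of get_sp_summary: it assumes that rep_mean and rep_median are the mean and median of the measurement within each condition and replicate, as the column names suggest, and it uses a small made-up data frame (toy) rather than lord_jcb.

```r
# Hypothetical toy data: two conditions, two replicates, three cells each
toy <- data.frame(
  Treatment = rep(c("Control", "Drug"), each = 6),
  Replicate = rep(rep(1:2, each = 3), times = 2),
  Speed     = c(40, 42, 44, 30, 32, 34, 28, 30, 32, 20, 22, 24)
)

# One row per Treatment x Replicate combination for each statistic
rep_mean   <- aggregate(Speed ~ Treatment + Replicate, data = toy, FUN = mean)
rep_median <- aggregate(Speed ~ Treatment + Replicate, data = toy, FUN = median)

# Combine the two statistics into a single table
toy_summary <- merge(rep_mean, rep_median,
                     by = c("Treatment", "Replicate"),
                     suffixes = c("_mean", "_median"))
names(toy_summary)[3:4] <- c("rep_mean", "rep_median")
toy_summary
```

With two conditions and two replicates, toy_summary has four rows, in the same way that the lord_jcb summary above has six.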

Finding representative datapoints

If you want to find the representative datapoints for each condition, then you can use the representative() function. This is handy if the data come from a set of images and you’d like to show a representative image in the figure.

The function returns a data frame with the datapoints ranked by closeness to the summary of each replicate or condition (see ?representative for more information). It also prints the top-ranked datapoint for each group to the console.

representative_data <- representative(lord_jcb, "Speed", "Treatment", "Replicate")
#> # A tibble: 6 × 6
#>   Treatment Replicate Speed rowno     diff  rank
#>   <chr>     <chr>     <dbl> <int>    <dbl> <int>
#> 1 Control   1          41.5     5 0.0524       1
#> 2 Control   2          32.9    98 0.220        1
#> 3 Control   3          20.7   124 0.0541       1
#> 4 Drug      1          29.4   178 0.217        1
#> 5 Drug      2          22.3   207 0.000237     1
#> 6 Drug      3          12.9   285 0.0113       1
head(representative_data)
#> # A tibble: 6 × 6
#>   Treatment Replicate Speed rowno   diff  rank
#>   <chr>     <chr>     <dbl> <int>  <dbl> <int>
#> 1 Control   1          41.5     5 0.0524     1
#> 2 Control   1          41.4    13 0.0875     2
#> 3 Control   1          41.9     2 0.366      3
#> 4 Control   1          42.2    31 0.688      4
#> 5 Control   1          40.2    26 1.26       5
#> 6 Control   1          39.6    36 1.86       6

In this example, the dataset has no label column, so the row number is used as the label. If you have a label column, you can specify it using the label parameter. The label could be a filename or another identifier that shows where the datapoint came from.

# lord_jcb has no label column, so add an illustrative "FileName" column
example <- lord_jcb
example$FileName <- paste0("Image_", seq_len(nrow(example)), ".tif")
representative_data <- representative(example, "Speed", "Treatment", "Replicate",
                                      label = "FileName")
#> # A tibble: 6 × 6
#>   Treatment Replicate Speed FileName          diff  rank
#>   <chr>     <chr>     <dbl> <chr>            <dbl> <int>
#> 1 Control   1          41.5 Image_5.tif   0.0524       1
#> 2 Control   2          32.9 Image_98.tif  0.220        1
#> 3 Control   3          20.7 Image_124.tif 0.0541       1
#> 4 Drug      1          29.4 Image_178.tif 0.217        1
#> 5 Drug      2          22.3 Image_207.tif 0.000237     1
#> 6 Drug      3          12.9 Image_285.tif 0.0113       1
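The ranking idea behind representative() can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it assumes that closeness is the absolute difference from the replicate mean, and it uses a small made-up data frame (toy). The real function may use a different summary or distance, so check ?representative.

```r
# Hypothetical toy data: two conditions, two replicates, three cells each
toy <- data.frame(
  Treatment = rep(c("Control", "Drug"), each = 6),
  Replicate = rep(rep(1:2, each = 3), times = 2),
  Speed     = c(40, 41, 45, 30, 33, 39, 28, 29, 33, 20, 23, 29)
)
toy$rowno <- seq_len(nrow(toy))

# Group by condition and replicate
grp <- interaction(toy$Treatment, toy$Replicate)

# Distance of each datapoint from the mean of its own group
toy$diff <- abs(toy$Speed - ave(toy$Speed, grp, FUN = mean))

# Rank within each group, smallest distance first
toy$rank <- ave(toy$diff, grp, FUN = function(d) rank(d, ties.method = "first"))

# Sort so that the best candidate in each group comes first
ranked <- toy[order(toy$Treatment, toy$Replicate, toy$rank), ]

# The top-ranked datapoint per group
top <- ranked[ranked$rank == 1, ]
top
```

In a real dataset, the rowno column could be swapped for a filename column, which is the role the label parameter plays in representative().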