Variability in photometry signals highlights the need for trial-level analyses.

Signals were recorded from a Pavlovian task in which reward delivery (sweetened water) followed stimulus presentation (0.5 sec auditory cue) after a 1 sec delay. Signals are aligned to cue onset. (A) Signals exhibit heterogeneity across animals. Each trace is a trial-averaged signal on one session for one animal. (B) Signals exhibit heterogeneity across animals in the effect of condition. Each trace is from one animal on the same session as in (A). Signals were separately averaged across trials in which animals did (Lick+) or did not (Lick-) engage in anticipatory licking. Each trace represents the pointwise difference between the average Lick+ and Lick- signals. (C) Signals exhibit heterogeneity across trials within animal. Each trace is a randomly selected trial from the same animal on the same session. (D) Signals exhibit heterogeneity across sessions. Each trace is the trial-averaged signal for one session for one subject. (E) Illustration of common summary measures. Depending on the authors, Area-Under-the-Curve (AUC) can be the area of the shaded region or the average signal amplitude. (F) Example hypothesis test of Lick+/Lick- differences using peak amplitude as the summary measure. All signals are measurements of calcium dynamics from axons of mesolimbic dopamine neurons recorded from fibers in the nucleus accumbens (Coddington et al., 2023).
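As a rough illustration of the summary measures in (E) and (F), the sketch below computes both readings of AUC (trapezoidal area and average amplitude) and the peak amplitude for a single trial. The function names, the toy signal, and the 0.5 sec sampling interval are assumptions made for the example, not values from the recordings.

```python
def trapezoid_auc(signal, dt):
    """Area under the curve via the trapezoidal rule (signal units x seconds)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(signal[:-1], signal[1:]))


def mean_amplitude(signal):
    """Average signal amplitude, the other quantity sometimes reported as 'AUC'."""
    return sum(signal) / len(signal)


def peak_amplitude(signal):
    """Largest value of the signal in the window."""
    return max(signal)


# Hypothetical single-trial signal sampled every 0.5 sec (toy values).
signal = [0.0, 1.0, 3.0, 1.0, 0.0]
dt = 0.5

auc_area = trapezoid_auc(signal, dt)
auc_mean = mean_amplitude(signal)
peak = peak_amplitude(signal)
```

The two "AUC" definitions give different numbers for the same trace, so the definition in use should be stated when results are compared across papers.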

Functional Linear Mixed Models estimation.

(A) General procedure. (B) Example analysis of Latency–signal (Latency-to-lick) association. To illustrate how the plots are constructed, we show the procedure at an example trial time-point (s = 1.7 sec), corresponding to values in the heatmap [Top left]. Each point in the FLMM β1(s) coefficient plot [Bottom Left] can be conceptualized as pooling signal values at time s across trials/animals (a slice of the heatmap) and correlating that pooled vector, Y(1.7), against Latency, via an LMM. [Bottom Right] shows how functional random-effects can be used to model variability in the Latency–DA slope across animals. The inset shows how, at s = 1.7, the model allows an example animal’s slope, β1(1.7) + γ2,1(1.7) = 0.95, to differ from the shared/common fixed-effect, β1(1.7) = 0.82.
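The "slice of the heatmap" idea in (B) can be sketched as a loop over time-points, regressing the pooled signal values at each time-point on the covariate. This is a simplified stand-in: it uses ordinary least squares rather than the LMM named in the caption, so it has no random effects, and the function names and toy data are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def pointwise_slopes(signals, covariate):
    """signals: one list per trial, each holding the signal at every time-point.

    Returns one slope per time-point: the slice Y(s) across trials
    regressed on the trial-level covariate.
    """
    n_time = len(signals[0])
    return [
        ols_slope(covariate, [trial[s] for trial in signals])
        for s in range(n_time)
    ]


# Toy data: 4 trials x 3 time-points. The signal is flat in Latency at the
# first time-point, rises with Latency at the second, and falls at the third.
latency = [1.0, 2.0, 3.0, 4.0]
signals = [[0.5, 2.0 * lat, 1.0 - lat] for lat in latency]

beta1 = pointwise_slopes(signals, latency)
```

Plotting `beta1` against time gives a coefficient curve analogous in spirit to the β1(s) plot, without the uncertainty quantification or animal-specific slopes that the mixed model provides.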

Nested longitudinal designs in photometry experiments can result in correlation patterns and missing data that dilute effects if not accounted for statistically.

Descriptive statistics and figures pertain to data from Jeong et al. (2022), reanalyzed in Section 2.3. (A) Experiment trial time-windows used to construct photometry signal summary measures (AUC). The reward was delivered at random times and signals were aligned to the first lick following reward delivery. Reward delivery may occur during the Baseline Period or Reward Period, depending on the lick time. (B) The Reward Number is defined as the cumulative number of rewards (interchangeably referred to as “trials”) pooled across sessions. Each session involved delivery of 100 trials. (C) The time between two rewards (inter-reward interval or IRI) was a random draw from an exponential distribution (mean 14). (D) Examples of experimental designs that exhibit hierarchical nesting structure. Trials/sessions and conditions such as cue-type (e.g., CS+/CS-) contribute to variability within-animal. Between-subject variability can arise from, for example, experimental groups, photometry probe placement, or natural between-animal differences. (E) Reward Period AUC values are correlated across sessions. Each dot indicates the average reward period AUC value of one trial. Between-session correlation in AUC values can be seen within-subject since reward period AUC values are similar within-animal on adjacent sessions. Between-session correlation can be seen on average across animals: session boxplot medians are similar on adjacent sessions. (F) Temporal correlation within-subject on session 3, chosen because it is the only session common to all animals. Reward period AUC on each trial for any animal is similar on adjacent trials. (G) Lines show association (OLS) between IRI and reward period AUC for each animal and session, revealing individual differences in association magnitude. The heterogeneity in line slopes highlights the need for random-effects to account for between-animal and between-session variability. 
(H) The number of sessions, and the number of trials per session that meet inclusion criteria, vary considerably between animals. For example, one animal’s data was collected on sessions 1-11 while another’s was collected on sessions 1-3.
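The definitions in (B) and (C) can be made concrete with a small sketch. It assumes every session contributes all 100 trials (panel (H) notes that this does not hold after inclusion criteria are applied), and the function names and random seed are hypothetical.

```python
import random

TRIALS_PER_SESSION = 100  # stated in (B)
MEAN_IRI = 14.0           # stated in (C); the caption does not give units


def reward_number(session, trial, trials_per_session=TRIALS_PER_SESSION):
    """Cumulative reward count for a 1-indexed session and within-session trial."""
    return (session - 1) * trials_per_session + trial


def draw_iris(n, mean_iri=MEAN_IRI, seed=0):
    """Draw n inter-reward intervals from an exponential distribution."""
    rng = random.Random(seed)
    # random.expovariate takes the rate parameter, i.e. 1 / mean.
    return [rng.expovariate(1.0 / mean_iri) for _ in range(n)]


iris = draw_iris(20000)
sample_mean = sum(iris) / len(iris)
first_of_session_3 = reward_number(session=3, trial=1)
```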

FLMM reveals distinct components obscured by summary measure analyses.

(A)-(B) show a recreation of statistical analyses conducted in Jeong et al. (2022) on the random inter-reward interval (IRI) reward delivery experiment, and (C)-(D) show our analyses. (A) Analysis of IRI–AUC correlation on all trials in an example animal, as presented in Jeong et al. (2022). (B) Recreation of boxplot summarizing IRI–AUC correlation coefficients from each animal. (C)-(D) Coefficient estimates from FLMM analysis of IRI–DA association: functional intercept estimate (C), and functional IRI slope (D). Although we do not use AUC in this analysis, we indicate the trial periods, “Baseline” and “Reward Period,” that Jeong et al. (2022) used to calculate the AUC. They quantified DA by a measure of normalized AUC of ΔF/F during a window ranging from 0.5 s before to 1 s after the first lick following reward delivery. All plots are aligned to this first lick after reward delivery. The IRI–DA association is statistically significantly positive in the time interval ∼[-0.5, 1.75] sec.

FLMM identifies within-session signal decreases obscured by standard analyses.

(A) Visualization of Simpson’s paradox: the AUC decreases within-session, but increases across sessions. Plot shows Reward Number–AUC linear regressions fit to data pooled across sessions (black lines), or fit to each session separately (colored lines) in three example animals. Each colored dot is the AUC value for that animal on the corresponding session and trial. The black dots at the left of each colored line indicate the intercept value of the session- and animal-specific linear regression model. Intercepts were parameterized to yield the interpretation as the “expected AUC value on the first trial of the session for that animal.” The dotted lines indicate the animal-specific median of the intercepts (across sessions) and are included to visualize that the intercepts increase over sessions. (B)-(C) Coefficient estimates from FLMM analysis of the Reward Number–DA association that models Reward Number with Session Number and Trial Number (linear) effects to capture between-session and within-session effects, respectively. The plots are aligned to the first lick after reward delivery. The Baseline and Reward Period show the time-windows used to construct AUCs in the summary measure analysis from Jeong et al. (2022). Pre-lick and post-lick time-windows indicate the portions of the Reward Period that occur before and after the lick, respectively. The Session Number effect is jointly significantly positive roughly in the interval [-0.25,0.5] sec, and peaks before lick-onset. This suggests DA increases across sessions during that interval. The Trial Number effect is briefly pointwise significantly positive around 0.3 sec and jointly significantly negative in the interval [0,2.5] sec. This suggests DA decreases across trials within-session during the interval [0,2.5] sec. (D) Average signal pooled across sessions and animals. Shaded region shows standard error of the mean.
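The pattern in (A) can be reproduced with a small synthetic example: the AUC falls within every session, yet a single regression on the pooled Reward Number has a positive slope because the session intercepts rise. All numbers below are toy values chosen for illustration (sessions of 5 trials rather than 100), and the helper name is hypothetical.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


TRIALS = 5  # toy session length

# Session intercepts rise (1, 3, 5) while the AUC drops by 0.2 per trial.
sessions = {}
for k in range(3):
    trial = list(range(1, TRIALS + 1))
    auc = [1.0 + 2.0 * k - 0.2 * (t - 1) for t in trial]
    sessions[k + 1] = (trial, auc)

# Session-specific fits (the colored lines).
within_slopes = {s: ols_fit(t, a)[1] for s, (t, a) in sessions.items()}

# Pooled fit on cumulative Reward Number (the black line).
reward_number = [(s - 1) * TRIALS + t for s, (trial, _) in sessions.items() for t in trial]
pooled_auc = [a for _, (_, auc) in sessions.items() for a in auc]
pooled_slope = ols_fit(reward_number, pooled_auc)[1]
```

Here every entry of `within_slopes` is negative while `pooled_slope` is positive, which is why separating Session Number from Trial Number in the model matters.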

FLMM identifies significant temporal dynamics effects missed by summary measure analyses.

The analysis of the Delay Length change experiment in Jeong et al. (2022) used the following summary measure: the average Cue Period AUC − Baseline AUC (AUCs in the windows [0,2] and [-1,0] sec, respectively, relative to cue onset). (A)-(B) Behavioral task design and Baseline/Cue Period are illustrated for short-delay (A) and long-delay (B) sessions. (C)-(H) These plots show coefficient estimates from FLMM re-analysis of the experiment. (C) The coefficient value at time-point s on the plot is interpreted as the mean change in average DA signal at time-point s between long- and short-delay trials (i.e., positive values indicate a larger signal on long-delay trials), aligned to cue onset. (D) Same plot as in (C), but the inset shows the interpretation of an example time-point (s = 9.4): the difference in magnitude between the average traces (pooled across animals and trials) of long- and short-delay sessions. (E)-(F) Gold lines indicate the fixed-effect estimates and grey lines indicate animal-specific estimates (calculated as the sum of functional fixed-effect and random-effect estimates; Best Linear Unbiased Predictor) for the random intercept and random slope, respectively. (G)-(H) Fixed-effect coefficient estimates shown with expanded time axis. In (H), it is clear that long-delay trials exhibit average (relative) increases (sub-interval (1)) and decreases (sub-interval (2)) in signal that would likely cancel out and dilute the effect if analyzed with a summary measure (AUC) that averages the signal over the entire Cue Period.
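The cancellation described for (H) is easy to see numerically. In the sketch below, a hypothetical long-minus-short difference curve is positive in the first half of a 2 sec window and negative in the second half; the window average is zero even though the pointwise differences are not. The curve values and sampling grid are toy assumptions.

```python
def window_mean(values, times, start, end):
    """Average of the values whose time stamps fall in [start, end)."""
    inside = [v for v, t in zip(values, times) if start <= t < end]
    return sum(inside) / len(inside)


# Time grid from 0.0 to 1.75 sec in 0.25 sec steps (toy sampling).
times = [i * 0.25 for i in range(8)]

# Hypothetical difference curve: +1 in sub-interval (1), -1 in sub-interval (2).
diff = [1.0 if t < 1.0 else -1.0 for t in times]

whole_window = window_mean(diff, times, 0.0, 2.0)  # summary-measure view
first_half = window_mean(diff, times, 0.0, 1.0)    # sub-interval (1)
second_half = window_mean(diff, times, 1.0, 2.0)   # sub-interval (2)
largest_pointwise = max(abs(v) for v in diff)
```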

Realistic simulation experiments show that FLMM exhibits desirable statistical properties for photometry analyses.

The simulation produces synthetic photometry data similar to that of Jeong et al. (2022), with the same sources of variability across trials and animals. (A) Lines show average traces from the original photometry data. The traces are averaged across trials and animals from the last short- and the first long-delay session. Bar shows the cue period analyzed in the paper and in our experiments. (B) Each thin line is the signal from a single simulated trial from the same “animal”; bold lines show the average trace for each trial type. (C) Each line is the trial-averaged trace from one session for seven simulated “animals.” (D) FLMM exhibits approximately correct joint 95% CI coverage. Perm does not provide joint CIs and thus its joint coverage is low, as expected. (E) FLMM exhibits approximately correct pointwise 95% CI coverage. (F) FLMM improves statistical power during the cue period compared to standard methods at each sample size tested. Power is calculated for Perm based on the full consecutive threshold criterion. For figures (E) and (F), the LMM and t-test were fit on the cue-period AUC and thus each replicate yields one indicator of CI inclusion and statistical significance. We represent the corresponding proportions with a line plot. For other methods, estimates are provided at each time-point. We therefore average performance across the cue period and then summarize variability of these replicate-specific averages with a 95% confidence band.
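The distinction between the pointwise coverage in (E) and the joint coverage in (D) can be sketched as follows: pointwise coverage counts how often the band contains the truth at individual time-points, whereas joint coverage requires the band to contain the truth at every time-point of a replicate. The bands and the zero "truth" below are toy values, and the function names are hypothetical.

```python
def covers(lower, upper, truth):
    """Per-time-point indicators of whether [lower, upper] contains the truth."""
    return [lo <= t <= hi for lo, hi, t in zip(lower, upper, truth)]


def coverage(bands, truth):
    """bands: one (lower, upper) pair of curves per simulation replicate."""
    hits = [covers(lo, hi, truth) for lo, hi in bands]
    # Pointwise: share of all (replicate, time-point) pairs that are covered.
    pointwise = sum(sum(h) for h in hits) / sum(len(h) for h in hits)
    # Joint: share of replicates covered at every time-point simultaneously.
    joint = sum(all(h) for h in hits) / len(hits)
    return pointwise, joint


truth = [0.0, 0.0, 0.0]
bands = [
    ([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]),  # covers everywhere
    ([-1.0, 0.5, -1.0], [1.0, 1.5, 1.0]),   # misses at the 2nd time-point
    ([-2.0, -2.0, -2.0], [2.0, 2.0, 2.0]),  # covers everywhere
    ([0.2, 0.3, -1.0], [1.0, 1.0, 1.0]),    # misses at the 1st and 2nd time-points
]

pointwise_cov, joint_cov = coverage(bands, truth)
```

Joint coverage can never exceed pointwise coverage for the same bands, which is consistent with pointwise-only intervals showing low joint coverage.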

Example repeated-measures data from a single test session. (A) Session-average approach: Photometry signals from all trials for Animal 1 are averaged across trials and summarized (peak amplitude). The session-average summary is then plotted (inset) against the session-average behavior for each of 5 animals. The example animal’s session average is the filled circle. (B) Trial-level approach: the trial-level signal summary measure (peak amplitude) is pooled across animals and correlated with trial-level behavior. An example signal from one trial for Animal 1 is highlighted in the trace plot. That example trial is represented as the filled circle in the inset. Each dot in the inset is one trial from one animal; dot color indicates the animal ID. (C) Inset from (B) is magnified. Linear regressions (OLS: Ordinary Least Squares) are fit separately to each animal’s trial-level data. A global regression fit to the trial-level data pooled across animals is displayed as the dotted black line. (D) Linear Mixed Modeling (LMM) strikes a balance between one model common to all animals and fitting many animal-specific models. The “global” fixed-effects fit and the fits including the subject-level random-effect estimates (Best Linear Unbiased Predictor) are displayed. The subject-specific fit for animal i is calculated by adding that animal’s random-effect estimates to the fixed-effect estimates. Note that the fixed effects and random effects are estimated in the same model.

Regression model classes based on functional vs. scalar response variables and for longitudinal (repeated-measures) vs. cross-sectional data. We use “functional mixed models” as a short-hand for function-on-scalar mixed models. We use “functional regression” as a short-hand for single level function-on-scalar regression.

Cross-sectional regression model classes based on functional vs. scalar predictor variables (i.e., covariates) and functional vs. scalar outcome variables. We take the FoFR, FoSR and SoFR to be the single level (non-longitudinal) versions of these methods.

[Left] Google Scholar mentions of the string “Fiber Photometry” by year. There were 549 mentions between January 1, 2023 – June 22, 2023. [Right] Web of Science citations of papers that include the string “Fiber Photometry” by year. There were 50 citations between January 1, 2023 – June 22, 2023. Blue lines indicate fitted values from an exponential fit to the data: Citations_i = α ∗ exp[β ∗ Year_i], where α and β were estimated with the nls() function in R. The 1,500 references to photometry described in the main text refer to Google Scholar mentions in the previous 12 months.
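For readers who want to reproduce a curve of the form Citations_i = α ∗ exp[β ∗ Year_i], the sketch below fits it by ordinary least squares on the log scale. This is a swapped-in simplification, not the nonlinear least-squares fit used for the figure, and the two approaches generally give somewhat different estimates on noisy data. The data are synthetic, and years are expressed relative to a hypothetical reference year to keep the exponent small.

```python
import math


def fit_exponential(years, counts):
    """Fit counts = alpha * exp(beta * years) by OLS on log(counts).

    Requires strictly positive counts. Returns (alpha, beta).
    """
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


# Synthetic, noise-free counts generated with alpha = 2 and beta = 0.5.
years = [0, 1, 2, 3, 4]  # years since a hypothetical reference year
counts = [2.0 * math.exp(0.5 * y) for y in years]

alpha_hat, beta_hat = fit_exponential(years, counts)
```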

Reward Number–AUC correlation within-session and across-session.

(A) Color indicates session number, rows denote animal number. Trial Number is the within-session Reward Number and ranges from 1-100 for each session. The black line that spans across sessions is a Reward Number–AUC linear regression fit, while the session-colored lines indicate within-session Trial Number–AUC linear regression fits. The large black circles on the left side of each session-specific fit are the intercepts, parameterized to yield the interpretation as the “expected AUC on the first trial of the corresponding session.” Dotted horizontal lines are set at the median of the intercepts to facilitate comparison. The intercepts tend to rise across sessions, while few slopes are significantly positive. (B)-(C) Each dot indicates the estimated intercept value (B) or slope (C) from the fits shown in (A). Lines and p-values were calculated in an LMM that was fit to the session-specific linear regression slopes and intercepts shown in (A). The LMM included animal-specific random intercepts and slopes. These plots quantify the trend observed in (A): the estimated intercepts significantly increase across sessions, but the slopes are mostly negative.

Trial Number–DA correlation within-session on the random IRI task.

Row indicates session number. The intercept is parameterized to yield the interpretation as the “expected signal magnitude on the first trial of the corresponding session.” Effects are aligned to the first lick after reward delivery. This is the session-by-session version of the analysis presented in Figure 5J-K.

FLMM reveals details occluded by summary measure analyses.

Coefficient estimates from an FLMM analysis of the random inter-reward interval (IRI) reward delivery experiment. The top row contains the intercept term plots where the title provides interpretation of the intercept: the average dopamine (DA) signal on trials when Lick Rate is at its average value. The bottom row shows the coefficient estimate plot of the covariate in the model. The “Baseline” and “Reward Period” bars show the trial period that the original authors used to calculate the summary measure (AUC). Specifically, they quantified DA by a measure of normalized AUC of ΔF/F during a window from 0.5 s before to 1 s after the first lick following reward delivery. All plots are aligned to this first lick after reward delivery. The y-value of the bottom plot at any time-point s is interpreted as the mean change in the dopamine signal at s for a one unit change in Lick Rate. The analysis estimates the association between DA and Lick Rate, aligned to lick bout onset. In a summary measure analysis, time-points when Lick Rate was negatively associated with DA (negative coefficient estimates in the final 1s of the 1.5s window) may have diluted time-points when they were positively associated (positive coefficient estimates in the first 0.5s of the reward period).

Random IRI experiment aligned to lick-onset: Reward Number–DA association analyzed as within-session (Trial Number) and between-session (Session Number) linear effects.

The average reward signal shows the average trace with standard error of the mean indicated by the shaded region. The Trial Number and Session Number effects are FLMM coefficient estimate plots.

Random IRI experiment aligned to reward-delivery: Reward Number–DA association analyzed as within-session (Trial Number) and between-session (Session Number) linear effects.

The average reward signal shows the average trace with standard error of the mean indicated by the shaded region. The Trial Number and Session Number effects are FLMM coefficient estimate plots.

FLMM identifies how the signal increases across trials during Cue Period and decreases across trials after reward-delivery (3 sec).

Background Reward Experiment Analyses

FLMM identifies how the signal evolves across trial time-points, and how the temporal location of transients progresses across sessions in a statistically significant manner.

The panels show coefficient estimates from FLMM analyses of the “backpropagation” experiment. Panel (A) contains the intercept term plot corresponding to the average signal on the first session of training. The “Early” (0-1s), “Mid” (1-2s), and “Late” (2-3s) bars show the trial time-periods that the original authors used to calculate summary AUC measures. Trials are aligned to cue onset (cues lasted 2s) and rewards were delivered at 3s. Panels (B)-(D) show the coefficient estimates corresponding to the mean change in signal values from 2nd, 3rd or 4th sessions, respectively, compared to the first session (positive values indicate an increase from the first session). Plots are aligned to cue onset.
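The interpretation of panels (A)-(D) follows from indicator coding of session with the first session as the reference level. As a simplified sketch (plain group means at a single time-point, ignoring the random effects in the FLMM), the intercept equals the first-session mean and each session coefficient equals that session's mean minus the first-session mean. The signal values and function name below are hypothetical.

```python
def session_contrasts(values_by_session, reference=1):
    """Return (intercept, {session: difference from the reference session}).

    With indicator (treatment) coding and no other covariates, least squares
    reproduces exactly these group-mean differences.
    """
    means = {s: sum(v) / len(v) for s, v in values_by_session.items()}
    intercept = means[reference]
    contrasts = {s: m - intercept for s, m in means.items() if s != reference}
    return intercept, contrasts


# Toy signal values at one time-point s, grouped by session.
signal_at_s = {
    1: [1.0, 1.2, 0.8],
    2: [1.5, 1.5, 1.5],
    3: [2.0, 2.2, 1.8],
    4: [0.5, 0.4, 0.6],
}

intercept, contrasts = session_contrasts(signal_at_s)
```

A positive entry in `contrasts` corresponds to an increase from the first session at that time-point, matching the sign convention in the caption.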

Individual-level coefficient estimates from FLMM analyses of “backpropagation” experiment: gold lines indicate the fixed-effect estimates and grey lines indicate animal-specific functional random-effect estimates (Best Linear Unbiased Predictor). Panel (A) contains the intercept term plot where the title provides an interpretation: the average signal on the first session of training. The “Early” (0-1s), “Mid” (1-2s), and “Late” (2-3s) bars show the trial time-periods that the original authors used to calculate summary AUC measures. Trials are aligned to cue onset and rewards were delivered at 3s. Panels (B)-(D) show the coefficient estimates, which are interpreted as the change in mean signal from 2nd, 3rd or 4th sessions, respectively, compared to the first session (positive values indicate an increase from the first session). Plots are aligned to cue onset.

Summary measure analyses are highly sensitive to minor changes in the summary time-window. (A) Average Short/Long delay data. Bars show the cue period length used (2s) and the “extended” delays analyzed in additional simulations. (B) Estimation error (RMSE) for the mean difference during the cue periods (panels), with n on the x-axis. Lower numbers indicate more accurate estimates. (C) Pointwise 95% CI coverage for the mean difference during the cue period (panels), with n on the x-axis. Higher values indicate better CI coverage. (D) Statistical power during the cue period. The LMM and t-test were fit on the signal averaged over the cue period and thus each simulation replicate yields a single indicator of CI inclusion or statistical significance, which we represent with a line plot. For other methods, estimates are provided at each time-point and performance is averaged across the time-points. We summarize these simulation replicate-specific averages with a boxplot.

Statistical power for the mean difference when defining the cue period as 2s, 2.5s, and 3s (panels), with sample sizes (numbers of animals) on the x-axis. The two panels present the same data in either violin or boxplot form. Higher numbers indicate better power. For FLMM and Perm, power is averaged across the time-points in the cue period, whereas the other methods assess power using the average signal (across the cue period) as the outcome. Since each simulation replicate takes the proportion of significant time-points in the cue period for FLMM and Perm, these are presented as boxplots (or violin plots), whereas the rest are simply presented as the proportion of simulation replicates that identified the mean signal during the cue as statistically significant (either 0 or 1 for each replicate).
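The two ways of summarizing power described here can be sketched as follows. For time-point-wise methods, each replicate yields a proportion of significant time-points (hence a distribution across replicates); for scalar-outcome methods, each replicate yields a single 0/1 indicator, so only an overall proportion is available. The significance indicators below are toy values and the function names are hypothetical.

```python
def replicate_power_functional(significant_by_time):
    """Proportion of cue-period time-points flagged significant in one replicate."""
    return sum(significant_by_time) / len(significant_by_time)


def power_scalar(significant_by_replicate):
    """Proportion of replicates in which the single scalar test was significant."""
    return sum(significant_by_replicate) / len(significant_by_replicate)


# Toy indicators: 3 replicates x 4 cue-period time-points for a functional method.
functional_flags = [
    [True, True, False, False],
    [True, True, True, True],
    [False, False, False, True],
]
# One value per replicate; these are what a boxplot or violin plot would display.
functional_power = [replicate_power_functional(rep) for rep in functional_flags]

# Toy indicators: one 0/1 outcome per replicate for a scalar (summary measure) method.
scalar_flags = [True, False, True, True]
scalar_power = power_scalar(scalar_flags)
```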

Time to fit FLMM fit with our software (using a closed-form variance calculation) on simulated data (each datapoint represents one replicate). pffr shows the time to fit the functional linear mixed model (with the same model specification) with the pffr() function in the refund package. Number of animals in simulations shown in plots ranges from 4-8 (i.e., n = trials/100).

FLMM fit with our software achieves comparable or superior performance to the functional linear mixed model (with the same model specification) fit with the pffr() function in the refund package. (A) FLMM achieves joint 95% CI coverage at roughly the nominal level. pffr does not provide joint 95% CIs and thus the pointwise 95% CIs that it does provide achieve low joint coverage. (B) FLMM achieves pointwise 95% CI coverage at or above the nominal level. The pointwise 95% CI coverage of pffr is close to but below the nominal level. Pointwise 95% CI coverage is shown for the mean difference during the cue period (panels), with n on the x-axis. Higher values indicate better CI coverage. (C) FLMM and pffr exhibit comparable fixed-effects estimation performance. Estimation error (RMSE) is shown for the mean difference during the cue periods (panels), with n on the x-axis. Lower numbers indicate more accurate estimates. (D) FLMM exhibits superior statistical power compared to pffr during the cue period. (B)-(D) Since estimates are provided at each time-point for both methods, pointwise performance is averaged across the time-points. We summarize these simulation replicate-specific averages as one point in a boxplot.

Coefficient estimates from FLMM analyses of Final Latency and Lick Probability models. The top row contains intercept term plots where the title provides interpretation of the intercept: the average NAc-DA signal on trials when the covariates in the model are at their average value. The bottom row shows the coefficient estimate plot of the covariate in the model. (Left) Association between average NAc-DA (averaged over first 100 trials) and Final Latency, the latency to lick (averaged over trials 700-800). (Right) Association between average NAc-DA (averaged over first 100 trials) and lick probability (averaged over trials 700-800).

Coefficient estimates from a single FLMM analysis of Final Latency × Lick State interaction model (including main effects). A simple FLMM model enables characterization of how the association between photometry signals and behavioral responding (Final Latency) differs between conditions (Lick+/Lick-) at each time-point in the trial. Panel (A) contains the intercept term plot where the title provides an interpretation: the average photometry signal on Lick- trials for animals that exhibit average Final Latency values. Panels (B)-(D) show the coefficients for the two main effects and the interaction of Lick State and Final Latency. The Final Latency functional coefficient is interpreted as the effect of Final Latency on Lick- trials. The Lick State main effect is interpreted as the difference in average NAc–DA between Lick+ and Lick- trials for an animal with an average Final Latency value. The interaction is interpreted as a difference in differences: during most of the Reward period, a 1 standard deviation increase in average Final Latency is associated with pointwise significantly higher NAc–DA signals on Lick+ than on Lick- trials (with a portion of joint significance).
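The "difference in differences" reading of the interaction can be checked with a small calculation at a single time-point. The sketch assumes Lick State is coded 0 for Lick- and 1 for Lick+ and that Final Latency is standardized, which is consistent with the interpretations given above; the coefficient values and names are hypothetical, not estimates from the analysis.

```python
def predicted_signal(latency_z, lick_plus, b0, b_latency, b_lick, b_interaction):
    """Fixed-effects prediction at one time-point for an interaction model.

    latency_z: Final Latency in standard-deviation units (0 = average).
    lick_plus: 0 for Lick- trials, 1 for Lick+ trials (assumed coding).
    """
    return (
        b0
        + b_latency * latency_z
        + b_lick * lick_plus
        + b_interaction * latency_z * lick_plus
    )


# Hypothetical coefficient values at a single time-point s.
coef = dict(b0=1.0, b_latency=0.2, b_lick=0.5, b_interaction=0.3)

# Effect of a 1 SD increase in Final Latency within each Lick State.
slope_lick_minus = predicted_signal(1, 0, **coef) - predicted_signal(0, 0, **coef)
slope_lick_plus = predicted_signal(1, 1, **coef) - predicted_signal(0, 1, **coef)

# The interaction coefficient is the difference between those two effects.
diff_in_diff = slope_lick_plus - slope_lick_minus
```

Under this coding, the Final Latency coefficient alone gives the Lick- slope, and the Lick+ slope is the sum of the Final Latency and interaction coefficients.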