
Setpoint Model

(4) y_n = x_n + s_n + d_rot
(5) u_n = K_s s_n - K y_n

1. The state (x) is updated by the decaying memory (A) of the previous state plus the sensitivity (B) to the feedback controller (u).
2. Target error (y) on movement (n) is the implicit state (x) plus the contribution of the visuomotor rotation (d_rot).
3. For simplicity, the controller (u) was modeled as output feedback: the error (y) multiplied by a negative gain (-K).
4. Error (y) is the sum of the strategy (s), the implicit state (x), and the rotation (d_rot).
5. The strategy (s), multiplied by a strategy gain (K_s), was incorporated into u, with s ≡ -d_rot for 41 < n < 121.
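The disturbance model (Eqs. 1-3) and the setpoint model (Eqs. 1, 4, 5) described above can be sketched as a short simulation. This is a minimal sketch: the parameter values (A, B, K, K_s) and the 30° rotation are illustrative placeholders, not the fitted values from the poster.

```python
# Illustrative parameter values only (the poster fits these to data).
A, B, K, Ks = 0.99, 0.2, 0.3, 0.2
d_rot = 30.0                                     # visuomotor rotation magnitude

def simulate(use_strategy):
    """Disturbance model (use_strategy=False) vs. setpoint model (True)."""
    x, errors = 0.0, []
    for n in range(1, 161):
        rot = d_rot if 41 <= n <= 120 else 0.0   # rotation on for 80 movements
        # Instructed strategy, s = -d_rot for 41 < n < 121 (item 5 above)
        s = -d_rot if use_strategy and 41 < n < 121 else 0.0
        y = x + s + rot                          # Eq. 4 (Eq. 2 when s = 0)
        u = Ks * s - K * y                       # Eq. 5 (Eq. 3 when s = 0)
        x = A * x + B * u                        # Eq. 1: implicit state update
        errors.append(y)
    return errors
```

With these illustrative values the setpoint model shows the qualitative prediction of interest: the error is near zero when the strategy is first applied, then drifts away as the leaked strategy setpoint (K_s s) keeps driving implicit adaptation, and a negative aftereffect remains after washout.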

[Figure 2: "Disturbance Model" panel; "Aftereffect" insets; y-axis: Target Error]

Figure 2: Model predictions. Disturbance model (black). Setpoint model (cyan). Setpoint model with variable leakage gain (K_s; magenta). Rotation is present for 80 movements (white region) and absent before and after this phase (shaded regions).

Figure 1. Replication experiment. The rotation was presented for 80 movements (white region). Participants experienced the rotation for the first two movements before being instructed to use the strategy (magenta circle). Setpoint model fit (black).

Disturbance Model

(1) x_{n+1} = A x_n + B u_n
(2) y_n = x_n + d_rot
(3) u_n = -K y_n
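As a quick check on Eqs. 1-3, the steady-state target error under a constant rotation follows by setting x_{n+1} = x_n = x*:

```latex
x^* = A x^* + B u^*, \quad u^* = -K y^*, \quad y^* = x^* + d_{\mathrm{rot}}
\;\Rightarrow\;
x^* = \frac{-BK\,d_{\mathrm{rot}}}{1 - A + BK},
\qquad
y^* = \frac{(1-A)\,d_{\mathrm{rot}}}{1 - A + BK}.
```

So the disturbance model predicts incomplete adaptation (a residual error proportional to 1 - A) and, once the rotation is removed, an aftereffect of size x* in the opposite direction.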

[Figure 3: panels No-Reward, Reward, No-Aiming Targets; x-axis: Movement Number (1-80)]

Figure 3. Experiment 2. Participants first practiced moving to the cued target without a rotation (black) and while using the strategy without a rotation (orange). Rotation block (white region). The rotation was experienced for the first 2 movements without using the strategy (Xs). A) No-Reward (rotation, blue; washout, cyan). B) Reward (rotation, red; washout, magenta). C) No-Aiming Targets (rotation, green; washout, lt. green). D) Aftereffects (binned).

Strategy Reward Model

(6) s_{n+1} = s_n + δ(r_n - w_r E{r_{1:n-1}})
  a. r_n = exp(-y_n^2 / (2σ^2))

Aiming Target Model

(7) u_n = K_s s_n - K[w_v v_n + (1 - w_v) p_n]
  a. v_n ≡ y_n
  b. p_n ≡ y_n - d_rot

6. The strategy (s) is updated from the previous strategy by a reward prediction error: the difference between the previous reward and the weighted (w_r) expectation (E) of reward over previous rewards, scaled by the reward sensitivity (δ). The reward (r) is a Gaussian function of the error (y), with the reward window given by σ.
7. The control signal (u) is a weighted (w_v) combination of vision (v) and proprioception (p) relative to the strategy setpoint (s). The visual feedback (v) is the same as the error (y); the proprioceptive feedback (p) equals (y) with the rotation removed.

Equations 1, 4, 5, and 6 were used to model the differences between the No-Reward and Reward conditions. To model the differences between the Reward and No-Aiming Targets conditions, the parameters from Eq. 6 were held fixed and Eq. 7 replaced Eq. 5.
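The combined model (Eqs. 1, 4, 6, with the Eq. 7 controller, which reduces to Eq. 5 when w_v = 1) can be sketched the same way. All parameter values below are hypothetical placeholders chosen for illustration, not the fitted values.

```python
import math

# Illustrative parameter values only (the poster fits these per condition).
A, B, K, Ks = 0.99, 0.2, 0.3, 0.2
delta, w_r, sigma = 2.0, 1.0, 5.0   # reward sensitivity, expectation weight, reward window
d_rot = 30.0

def simulate(reward_on, w_v=1.0):
    """Eqs. 1, 4, 6 with the Eq. 7 controller (Eq. 5 when w_v = 1)."""
    x, s, rewards, strategies = 0.0, 0.0, [], []
    for n in range(1, 81):                           # 80 rotation movements
        if n == 3:
            s = -d_rot                               # strategy instructed after 2 movements
        y = x + s + d_rot                            # Eq. 4: target error
        v, p = y, y - d_rot                          # Eq. 7a-b: visual vs. proprioceptive error
        u = Ks * s - K * (w_v * v + (1 - w_v) * p)   # Eq. 7
        x = A * x + B * u                            # Eq. 1: implicit state update
        if reward_on and n >= 3:                     # Eq. 6: reward-based strategy update
            r = math.exp(-y ** 2 / (2 * sigma ** 2))            # Eq. 6a: Gaussian reward
            r_bar = sum(rewards) / len(rewards) if rewards else 0.0
            s += delta * (r - w_r * r_bar)
            rewards.append(r)
        strategies.append(s)
    return strategies
```

Without the reward update the strategy stays clamped at -d_rot (the No-Reward case under Eq. 5); turning the update on lets the strategy move away from the instructed aim as rewards deviate from their running expectation, and lowering w_v down-weights the rotated visual error in favor of proprioception (the No-Aiming Targets case).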

Figure 4. A) Model fits for the No-Reward (blue), Reward (red), and No-Aiming Targets (green) conditions. B) Changes over time in the hidden implicit state (dashed) and strategy (solid). C) Estimated reward window (σ) for the No-Reward group (blue) and Reward group (red). Shading is the 95% confidence interval of the mean. D) Model parameters of interest between the No-Reward (blue) and Reward (red) conditions: strategy leakage (K_s), reward sensitivity (δ), and expectation of reward (w_r). Weighting of visual error relative to the strategy (w_v) was fit separately for Reward (red) and No-Aiming Targets (green).