Pigeons were rewarded with food for pecking keys in various forms of the two-armed bandit situation over an extended series of daily sessions in two experiments. The average daily preference (S = R/[R+L], the proportion of Right responses) is very well fit by a Markovian linear model in which predicted preference today is a weighted average of predicted preference yesterday and today's reinforcement conditions: s(N+1) = a·s(N) + (1-a)·A(N+1), where A(N+1) is set equal to 1 when all rewards are for the Right response and 0 when all are for the Left, and a is a long-term memory parameter. This linear model explains some apparent paradoxes in earlier reports of memory effects in two-armed bandit experiments. Nevertheless, closer examination of preference changes within each experimental session revealed several kinds of non-Markovian effects. The most important was a regression at the beginning of each session toward a preference characteristic of earlier sessions (spontaneous recovery). This effect, though not a smaller and less reliable non-Markovian reminiscence effect, is consistent with a very simple rule: the effect on preference of each individual reward for a Right or Left response is inversely related to how long ago that reward occurred. Thus, animals learn to prefer the rewarded side each day because those rewards are recent, but they regress toward earlier preferences overnight because, with the lapse of time, the most recent rewards become relatively less recent.
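The two accounts above can be contrasted in a short sketch. The first function iterates the linear model s(N+1) = a·s(N) + (1-a)·A(N+1) exactly as stated; the starting preference `s0` is an assumption the abstract does not specify. The second function is a hypothetical illustration of the recency rule: it weights each reward by 1/(elapsed time), which is only one possible "inversely related" function, and the function names, the time units, and the reward schedule in the usage lines are all assumptions chosen for illustration, not the authors' procedure or data.

```python
"""Sketch of the two preference models described in the abstract.

All names, parameter values and reward times are hypothetical.
"""


def linear_model(a, A, s0):
    """Markovian linear model: s(N+1) = a*s(N) + (1-a)*A(N+1).

    a  -- long-term memory parameter, 0 <= a <= 1
    A  -- daily reinforcement conditions (1 = all rewards Right, 0 = all Left)
    s0 -- predicted preference before the first session (assumed value)
    Returns the predicted preferences s(1), ..., s(len(A)).
    """
    s = s0
    predictions = []
    for A_next in A:
        # Today's prediction is a weighted average of yesterday's prediction
        # and today's reinforcement condition.
        s = a * s + (1 - a) * A_next
        predictions.append(s)
    return predictions


def recency_preference(rewards, now):
    """Preference for Right when each reward's weight is 1 / (now - t).

    rewards -- list of (time, side) pairs, side is "R" or "L"
    now     -- time of evaluation; must be later than every reward time
    The 1/age weighting is a hypothetical choice of an inverse function.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    if any(t >= now for t, _ in rewards):
        raise ValueError("every reward must precede the evaluation time")
    w_right = sum(1.0 / (now - t) for t, side in rewards if side == "R")
    w_total = sum(1.0 / (now - t) for t, _ in rewards)
    return w_right / w_total


if __name__ == "__main__":
    # Linear model with a = 0.5: two Right days, then one Left day.
    print(linear_model(0.5, [1, 1, 0], 0.5))  # [0.75, 0.875, 0.4375]

    # Hypothetical schedule: 10 Left rewards in an earlier session,
    # then 10 Right rewards in the latest session.
    history = [(t, "L") for t in range(10)] + [(t, "R") for t in range(20, 30)]
    end_of_session = recency_preference(history, now=30)
    next_morning = recency_preference(history, now=100)
    # Right preference is lower "next morning": the Right rewards have
    # become relatively less recent, so preference drifts back toward Left.
    print(end_of_session, next_morning)
```

The contrast is the point of the sketch: the linear model changes only when a new session's A(N+1) arrives, so it predicts no change between the end of one session and the start of the next, whereas the elapsed-time weighting lets preference drift back toward earlier sessions overnight, the spontaneous-recovery pattern the abstract describes.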