Passive learning: a critique by example

Experimentation, Learning, D83

article

The (in)appropriate benchmark when beliefs are not the only state variable

In models of learning by experimentation, there is a natural benchmark of myopia when the only intertemporal link is the agent's subjective belief (signal independence). An alternative benchmark using a passive learner has been proposed when there is a further intertemporal link that directly affects payoffs (signal dependence). The purpose of this note is to suggest that the use of this particular benchmark is flawed for two reasons: first, passive learning does not disentangle the effects of knowing that beliefs might change as well as other state variables, and we offer another benchmark using a naive learner that does, and so necessarily reduces to myopia in the signal-independent case; second, and maybe more tellingly, passive learning does not do what it is supposed to do, namely help measure the gains from active experimentation, since the payoffs of a passive learner can be markedly lower than those of a naive learner. ; Experimentation, Learning

preprint

Optimal Experimentation in a Changing Environment

This paper studies optimal experimentation by a monopolist who faces an unknown demand curve subject to random changes, and who maximises profits over an infinite horizon in continuous time. We show that there are two qualitatively very different regimes, determined by the discount rate and the intensities of demand curve switching, and the dependence of the optimal policy on these parameters is discontinuous. One regime is characterised by extreme experimentation and good tracking of the prevailing demand curve, the other by moderate experimentation and poor tracking. Moreover, in the latter regime the agent eventually becomes 'trapped' into taking actions in a strict subset of the feasible set. ; Bayesian Learning, Monopoly Experimentation, Optimal Control

preprint

Market Experimentation in a Dynamic Differentiated-Goods Duopoly

We study the evolution of prices in a symmetric duopoly where firms are uncertain about the degree of product differentiation. Customers sometimes perceive the products as close substitutes, sometimes as highly differentiated. Firms learn about their competitive environment from the quantities sold and a background signal. As the informativeness of the market outcome increases with the price differential, there is scope for active learning. In a setting with linear demand curves, we derive firms' pricing strategies as payoff-symmetric mixed or correlated Markov perfect equilibria of a stochastic differential game where the common posterior belief is the natural state variable. When information has low value, firms charge the same price as would be set by myopic players, and there is no price dispersion. When firms value information more highly, on the other hand, they actively learn by creating price dispersion. This market experimentation is transient, and most likely to be observed when the firms' environment changes sufficiently often, but not too frequently. ; Duopoly Experimentation, Bayesian Learning, Stochastic Differential Game, Markov Perfect Equilibrium, Mixed Strategies, Correlated Equilibrium

preprint

Strategic Experimentation with Poisson Bandits

We study a game of strategic experimentation with two-armed bandits where the risky arm distributes lump-sum payoffs according to a Poisson process. Its intensity is either high or low, and unknown to the players. We consider Markov perfect equilibria with beliefs as the state variable. As the belief process is piecewise deterministic, payoff functions solve differential-difference equations. There is no equilibrium where all players use cut-off strategies, and all equilibria exhibit an 'encouragement effect' relative to the single-agent optimum. We construct asymmetric equilibria in which players have symmetric continuation values at sufficiently optimistic beliefs yet take turns playing the risky arm before all experimentation stops. Owing to the encouragement effect, these equilibria Pareto dominate the unique symmetric one for sufficiently frequent turns. Rewarding the last experimenter with a higher continuation value increases the range of beliefs where players experiment, but may reduce average payoffs at more optimistic beliefs. Some equilibria exhibit an 'anticipation effect': as beliefs become more pessimistic, the continuation value of a single experimenter increases over some range because a lower belief means a shorter wait until another player takes over. ; Strategic Experimentation, Two-Armed Bandit, Poisson Process, Bayesian Learning, Piecewise Deterministic Process, Markov Perfect Equilibrium, Differential-Difference Equation

preprint

Strategic Experimentation with Exponential Bandits

We analyze a game of strategic experimentation with two-armed bandits whose risky arm might yield payoffs after exponentially distributed random times. Free-riding causes an inefficiently low level of experimentation in any equilibrium where the players use stationary Markovian strategies with beliefs as the state variable. We construct the unique symmetric Markovian equilibrium of the game, followed by various asymmetric ones. There is no equilibrium where all players use simple cut-off strategies. Equilibria where players switch finitely often between experimenting and free-riding all yield a similar pattern of information acquisition, greater efficiency being achieved when the players share the burden of experimentation more equitably. When players switch roles infinitely often, they can acquire an approximately efficient amount of information, but still at an inefficient rate. In terms of aggregate payoffs, all these asymmetric equilibria dominate the symmetric one wherever the latter prescribes simultaneous use of both arms. Copyright The Econometric Society 2005.

article

Unemployment, Participation and Market Size

We construct an equilibrium random matching model of the labour market, with endogenous market participation and a general matching technology that allows for market size effects: the job-finding rate for workers and the incentives for participation change with the level of unemployment. In comparison to standard models with constant returns to scale in matching, agent behaviour is more complex - the model generates plausible joint dynamics of employment, unemployment and participation with heterogeneity in search behaviour for workers with different degrees of attachment to the labour market. Techniques are developed to reduce the dimensionality of the problem to establish local and global stability; a complicating factor is the possibility of multiple equilibria, welfare-ranked by market size. A Hosios-type condition internalises search externalities. ; Unemployment, Participation, Job Search, Matching Function, Returns to Scale, Multiple Equilibria, Stability, Coordination, Search Externalities

preprint

