In this paper, we first explain the statistical model underlying the ordinal regression technique used by Pollet and Nettle (2009), including the two possible ways of calculating the likelihood function (section 1). We then show that the model fit criteria reported were in fact invalid, and calculate the correct ones, showing that this leads to ...

In this paper, we first explain the statistical model underlying the ordinal regression technique used by Pollet and Nettle (2009), including the two possible ways of calculating the likelihood function (section 1). We then show that the model fit criteria reported were in fact invalid, and calculate the correct ones, showing that this leads to a different choice of best model (section 2). We then suggest two other strategies of model selection for these data, and show that these also lead to different best-fitting models than that reported by Pollet and Nettle (2009) (sections 3 and 4). 1 Ordinal regression: The cumulative Logit Model The appropriate model for a dependent variable Yi ∈ {1,.,R}, i = 1,.,n, consisting of ranked outcome categories is a cumulative logit model (Agresti, 2002): P(Yi ≤ r|xi) = exp(β0r −x ⊤ i β) 1+exp(β0r −x ⊤ r = 1,.,R−1. i β), The model includes intercepts β0r for each category and a global parameter vector β = (β1,.,βp) for the p covariates. To obtain parameter estimates the maximum-likelihood method is used. The responses are conditionally independent and follow a multinomial distribution with yi|xi ∼ M(1,πi), yi = (yi1,.,yiR−1) = (0,.,0,}{{} 1,0,.,0) ⇔ Yi = r, r−th position πi = (πi1,.,πiR−1) with πir = P(Yi = r|xi) = P(Yi ≤ r|xi)−P(Yi ≤ r −1|xi), r = 1,.,R−1. The associated likelihood function is L(β01,.β0R−1,β;x1,.xn) = n∏ i=1 πi1 yi1 ·πi2 yi2 ·.·(1−πi1 −.−πiR−1) 1−yi1−.−yiR−1. 1 To obtain the parameter estimates, the data are often (as by default in SPSS 15.0) pooled in K groups, and the likelihood of the grouped data is maximized, instead of the likelihood of the individual data. Group k, k = 1,.K, includes all hk observations with the value ˜xk = (˜xk1,.,˜xkp) of the covariates x = (x1,.,xp). The responses again follow a multinomial distribution:

Minimize