We study the ordering dynamics of nonlinear voter models with multiple states, and also discuss the two-state model. The rate at which an individual adopts an opinion scales as the q-th power of the number of the individual's neighbours in that state. For q>1 the dynamics favor the opinion held by the most agents; the ordering to consensus is driven by deterministic drift, and noise plays only a minor role. For q<1 the dynamics favor minority opinions, and for multi-state models the ordering proceeds through a noise-driven succession of metastable states. Unlike linear multi-state systems, the nonlinear model cannot be reduced to an effective two-state model. We find that, for q<1, the average density of active interfaces in the model with multiple opinion states does not show a single exponential decay in time, again at variance with the linear model. This highlights the special character of the conventional (linear) voter model, in which deterministic drift is absent. As part of our analysis, we develop a pair approximation for the multi-state model on graphs, valid for any positive real value of q, which improves on previous approximations for nonlinear two-state voter models.
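
As a minimal illustration of the adoption rule stated above, the following Python sketch computes, for a single individual, the unnormalised rate of adopting each opinion as the q-th power of the number of neighbours holding it. The function names, the example neighbourhood, and the choice of a unit proportionality constant are assumptions made for illustration only; this is not the simulation scheme used in the analysis.

```python
from collections import Counter


def adoption_rates(neighbour_states, q):
    """Unnormalised rate of adopting each opinion present among the neighbours.

    Assumption: rate = (number of neighbours holding the opinion) ** q,
    with the proportionality constant set to 1.
    """
    if q <= 0:
        raise ValueError("q must be a positive real number")
    counts = Counter(neighbour_states)
    return {state: count ** q for state, count in counts.items()}


def rate_ratio(neighbour_states, q, state_a, state_b):
    """Ratio of the adoption rates of state_a and state_b for one individual."""
    rates = adoption_rates(neighbour_states, q)
    return rates[state_a] / rates[state_b]


# Hypothetical neighbourhood: three neighbours hold "A", one holds "B".
neighbours = ["A", "A", "A", "B"]

linear = rate_ratio(neighbours, 1, "A", "B")      # q = 1: linear voter model
majority = rate_ratio(neighbours, 2, "A", "B")    # q > 1: bias towards the majority
minority = rate_ratio(neighbours, 0.5, "A", "B")  # q < 1: bias towards the minority
```

In this example the majority opinion is adopted three times as fast as the minority opinion in the linear case, nine times as fast for q=2, and only about 1.73 times as fast for q=1/2. Relative to the linear model, q>1 therefore amplifies the majority and q<1 favors the minority.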