Under the assumption that good news and bad news are equally likely (i.e., $\delta p>0$ and $\delta p<0$ occur with equal probability), the probability of informed trading (PIN) can be shown to be
$$PIN=\frac{\alpha\mu}{\alpha\mu+2\epsilon}$$
where $\alpha$ is the probability of an informational event, $\mu$ is the arrival rate of informed orders and $\epsilon$ is the arrival rate of uninformed orders. Heuristically, it is the fraction of the overall order flow that comes from informed traders.
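As a quick numerical check of the formula above, the following minimal sketch evaluates it for one set of parameter values. The function name `pin` and the numbers are hypothetical, chosen only for illustration and not estimated from any data.

```python
def pin(alpha: float, mu: float, epsilon: float) -> float:
    """Fraction of expected order flow that is informed: alpha*mu / (alpha*mu + 2*epsilon).

    Assumes, as in the text, that good and bad news are equally likely.
    """
    informed = alpha * mu               # expected informed flow
    uninformed = 2.0 * epsilon          # expected uninformed flow (buys plus sells)
    return informed / (informed + uninformed)


# Hypothetical inputs: an event occurs 40% of the time, informed orders
# arrive at rate 50 and uninformed orders at rate 40 on each side.
example = pin(alpha=0.4, mu=50.0, epsilon=40.0)   # 20 / (20 + 80)
print(round(example, 6))
```

With these inputs the informed flow is 20 out of a total expected flow of 100, so one fifth of the flow is attributed to informed traders.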
Let $V$ be the common size of a set of volume bars $(V_\tau)_{1\leq\tau\leq N}$. Of the ticks associated with bar $\tau$, let $V^B_\tau$ and $V^S_\tau$ be the total volume arising from buys and sells respectively, where each tick is classified as a buy or a sell using an algorithm such as the tick rule or the Lee-Ready algorithm.
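The construction of $V^B_\tau$ and $V^S_\tau$ can be sketched as follows, using the tick rule only (Lee-Ready, which also needs quotes, is not shown). Several choices here are assumptions made for illustration: the function names, labelling the first trade as a buy, integer trade sizes, and splitting a trade that straddles a bar boundary across the two bars.

```python
from typing import List, Sequence, Tuple


def tick_rule(prices: Sequence[float]) -> List[int]:
    """Label each trade +1 (buy) or -1 (sell) from the sign of the price change.

    A zero price change carries the previous label forward.
    Assumption: the first trade, which has no predecessor, is labelled +1.
    """
    labels: List[int] = []
    label = 1
    for i, price in enumerate(prices):
        if i > 0:
            if price > prices[i - 1]:
                label = 1
            elif price < prices[i - 1]:
                label = -1
        labels.append(label)
    return labels


def buy_sell_volume_bars(
    prices: Sequence[float], volumes: Sequence[int], bar_size: int
) -> List[Tuple[int, int]]:
    """Return (buy volume, sell volume) for each completed bar of size bar_size.

    Integer volumes keep the bar arithmetic exact. A trade that overshoots the
    current bar is split, and the remainder starts the next bar (an assumption).
    Volume left in an unfinished final bar is discarded.
    """
    bars: List[Tuple[int, int]] = []
    buy = sell = 0
    for label, volume in zip(tick_rule(prices), volumes):
        remaining = volume
        while remaining > 0:
            take = min(bar_size - (buy + sell), remaining)
            if label > 0:
                buy += take
            else:
                sell += take
            remaining -= take
            if buy + sell == bar_size:   # bar is full: emit it and start a new one
                bars.append((buy, sell))
                buy = sell = 0
    return bars


# Made-up ticks, for illustration only.
prices = [100.0, 101.0, 101.0, 100.0, 100.0, 102.0]
volumes = [30, 50, 40, 60, 10, 10]
print(buy_sell_volume_bars(prices, volumes, bar_size=100))
```

In this toy input the third trade fills the first bar and spills 20 units into the second, so every emitted bar has total volume exactly equal to the bar size.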
Under the same assumptions, it can be shown that $V=E(V^B_\tau+V^S_\tau)=\alpha\mu+2\epsilon$ and $E|V^B_\tau-V^S_\tau|=\alpha\mu$. Consequently, in volume bar coordinates, we have (adding the "V" for emphasis)
$$VPIN=\frac{E|V^B_\tau-V^S_\tau|}{V}=\frac{1}{V}E|2V^B_\tau-V|$$
If we re-define $V^B_\tau$ and $V^S_\tau$ as the fractions of buy and sell volume relative to $V$ (so that $V^B_\tau+V^S_\tau=1$), then
$$VPIN=E|2V^B_\tau-1|=E|1-2V^S_\tau|$$
We define the quantity inside the absolute value, $2V^B_\tau-1$, as the order flow imbalance in volume bar $\tau$, denoted $OI_\tau$. Note that
$$|OI_\tau|\gg0$$
is a necessary, yet insufficient condition for adverse selection. To be sufficient, we need
$$|E_0(OI_\tau)-OI_\tau|\gg0.$$
That is, $OI_\tau$ must be both large and unpredictable to the market maker: only then will the market maker trade with an informed trader at a bid-offer spread that is too narrow, and so be adversely selected.
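The per-bar imbalance and the volume-bar form of VPIN can be sketched as below. The expectation is replaced by a rolling sample mean over the most recent bars, which is an assumption about how to estimate it. The function names, the `window` parameter and the sample bars are hypothetical.

```python
from typing import List, Sequence, Tuple


def order_imbalance(bars: Sequence[Tuple[float, float]], bar_size: float) -> List[float]:
    """Signed imbalance per bar: 2 * (buy volume / bar_size) - 1, which lies in [-1, 1]."""
    return [2.0 * buy / bar_size - 1.0 for buy, _sell in bars]


def rolling_vpin(
    bars: Sequence[Tuple[float, float]], bar_size: float, window: int
) -> List[float]:
    """Rolling mean of |imbalance| over the last `window` bars.

    Assumption: the sample mean stands in for the expectation in the text.
    The first value is produced once `window` bars are available.
    """
    abs_oi = [abs(x) for x in order_imbalance(bars, bar_size)]
    return [
        sum(abs_oi[i - window:i]) / window
        for i in range(window, len(abs_oi) + 1)
    ]


# Made-up (buy, sell) volumes for four bars of size 100.
bars = [(80, 20), (50, 50), (10, 90), (65, 35)]
print(order_imbalance(bars, bar_size=100))      # signed imbalance per bar
print(rolling_vpin(bars, bar_size=100, window=2))
```

This only measures how large the imbalance is. Whether it was predictable to the market maker requires a forecast of the imbalance, which this sketch does not attempt.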
We can use information theory to extract a useful feature from this analysis. Let $(V^B_\tau)_{1\leq\tau\leq N}$ be a sequence of buy-volume fractions for a set of volume bars of size $V$. For a specified $q$, let $\{K_1,\ldots,K_q\}$ be the corresponding collection of $q$ quantile bins, and set $f(V^B_\tau)=i$ if $V^B_\tau\in K_i$. We then "quantize" the sequence of volume fractions to an integer sequence
$$X_\tau:=(f(V^B_{\tau_1}),\ldots,f(V^B_{\tau_N}))$$
for which we can estimate the entropy $H[X_\tau]$ using the Lempel-Ziv algorithm. We then derive the cumulative distribution $F(H[X_\tau])$ and use the resulting time series as a feature for classifying adverse selection in the order flow.
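A minimal sketch of this pipeline follows, under several assumptions that are not in the text. Quantile edges are taken from order statistics of the full sample. Entropy is estimated with the simple LZ78 phrase-count estimator $c\log_2 c/n$ (where $c$ is the number of parsed phrases), which is one Lempel-Ziv-type estimator and not necessarily the variant intended here. Entropy is computed on rolling windows of a hypothetical length `window`, and $F$ is the empirical distribution function of those estimates. All function names are illustrative.

```python
import bisect
import math
from typing import List, Sequence


def quantize(values: Sequence[float], q: int) -> List[int]:
    """Map each value to the index (1..q) of the empirical q-quantile bin containing it."""
    ordered = sorted(values)
    n = len(ordered)
    # Interior bin edges taken from order statistics (a simple convention).
    edges = [ordered[min(n - 1, (k * n) // q)] for k in range(1, q)]
    return [bisect.bisect_right(edges, v) + 1 for v in values]


def lz78_phrase_count(symbols: Sequence[int]) -> int:
    """Number of phrases in the LZ78 incremental parsing of the sequence."""
    seen = set()
    current: tuple = ()
    count = 0
    for s in symbols:
        current += (s,)
        if current not in seen:      # shortest phrase not seen before
            seen.add(current)
            count += 1
            current = ()
    if current:                      # trailing, already-seen phrase
        count += 1
    return count


def lz78_entropy(symbols: Sequence[int]) -> float:
    """Entropy-rate estimate c * log2(c) / n, in bits per symbol.

    Consistent for long stationary ergodic sequences; biased on short ones.
    """
    n = len(symbols)
    if n == 0:
        return 0.0
    c = lz78_phrase_count(symbols)
    return c * math.log2(c) / n


def entropy_feature(buy_fractions: Sequence[float], q: int, window: int) -> List[float]:
    """Empirical CDF value of the rolling-window entropy estimate, one per window."""
    codes = quantize(buy_fractions, q)
    entropies = [
        lz78_entropy(codes[i - window:i]) for i in range(window, len(codes) + 1)
    ]
    ordered = sorted(entropies)
    return [bisect.bisect_right(ordered, h) / len(ordered) for h in entropies]


# Made-up buy-volume fractions, for illustration only.
fractions = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
print(quantize(fractions, q=4))
print(entropy_feature(fractions, q=4, window=4))
```

Because the quantile edges and the empirical distribution here use the whole sample, the feature at bar $\tau$ depends on later bars. For use in a predictive model, both would more plausibly be computed on an expanding or trailing window.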