When good news and bad news are equally likely (i.e.,
\delta p<0 and
\delta p>0 occur with equal probability), the probability of informed trading can be shown to be
PIN=\frac{\alpha\mu}{\alpha\mu+2\epsilon}
where
\alpha is the probability of an informational event,
\mu is the arrival rate of informed orders and
\epsilon is the arrival rate of uninformed orders. Heuristically, this ratio is the fraction of the total order flow that comes from informed traders.
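As a quick numerical illustration, the ratio above can be evaluated directly. This is a minimal sketch: the function name `informed_trading_probability` and the parameter values in the usage line are hypothetical and are not taken from the text.

```python
def informed_trading_probability(alpha: float, mu: float, epsilon: float) -> float:
    """Evaluate alpha*mu / (alpha*mu + 2*epsilon).

    alpha   -- probability of an informational event (between 0 and 1)
    mu      -- arrival rate of informed orders (non-negative)
    epsilon -- arrival rate of uninformed orders (non-negative)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability in [0, 1]")
    if mu < 0.0 or epsilon < 0.0:
        raise ValueError("arrival rates must be non-negative")
    informed_flow = alpha * mu
    total_flow = informed_flow + 2.0 * epsilon
    if total_flow == 0.0:
        raise ValueError("total order flow is zero, so the ratio is undefined")
    return informed_flow / total_flow


# Hypothetical inputs, chosen only to show the call.
print(informed_trading_probability(alpha=0.4, mu=50.0, epsilon=40.0))
```

With no uninformed flow the ratio is 1, and with no informed flow it is 0, which matches the heuristic reading of the formula.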
Let
V be the common size of each bar in a set of volume bars
(V_\tau)_{1\leq\tau\leq N}. Of the ticks associated with bar
\tau, let
V^B_\tau and
V^S_\tau be the total volume arising from buys and sells, respectively, where each tick is
classified as a buy or a sell using an algorithm such as the tick rule or the Lee-Ready algorithm.
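The tick rule mentioned above can be sketched as follows: a trade at a higher price than the previous trade is labelled a buy, a trade at a lower price a sell, and an unchanged price inherits the previous label. The function names `tick_rule` and `buy_sell_volume`, and the convention that the first tick defaults to a buy, are assumptions made for this illustration only.

```python
from typing import List, Sequence, Tuple


def tick_rule(prices: Sequence[float], initial_side: int = 1) -> List[int]:
    """Label each trade +1 (buy) or -1 (sell) from the sign of the price change.

    A zero price change carries the previous label forward. The label of the
    very first trade is `initial_side`, which is an arbitrary assumption.
    """
    sides: List[int] = []
    last_side = initial_side
    previous_price = None
    for price in prices:
        if previous_price is not None:
            if price > previous_price:
                last_side = 1
            elif price < previous_price:
                last_side = -1
            # equal prices: keep last_side unchanged
        sides.append(last_side)
        previous_price = price
    return sides


def buy_sell_volume(prices: Sequence[float],
                    volumes: Sequence[float]) -> Tuple[float, float]:
    """Split the volume of one bar into buy volume and sell volume."""
    sides = tick_rule(prices)
    buy = sum(v for s, v in zip(sides, volumes) if s == 1)
    sell = sum(v for s, v in zip(sides, volumes) if s == -1)
    return buy, sell


# Hypothetical ticks for a single bar.
prices = [100.0, 101.0, 101.0, 100.0, 100.0, 102.0]
volumes = [10, 20, 30, 40, 50, 60]
print(tick_rule(prices))
print(buy_sell_volume(prices, volumes))
```

The Lee-Ready algorithm additionally uses quote data, which is why it is not sketched here.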
Under the same assumptions, it can be shown that
V=E(V^B_\tau+V^S_\tau)=\alpha\mu+2\epsilon and
E|V^B_\tau-V^S_\tau|=\alpha\mu. Consequently, in volume bar coordinates, and since
V^S_\tau=V-V^B_\tau, we have (adding the "V" for emphasis)
VPIN=\frac{E|V^B_\tau-V^S_\tau|}{V}=\frac{1}{V}E|2V^B_\tau-V|
If we redefine
V^B_\tau and
V^S_\tau as the fractions of buy and sell volume relative to
V (so that
V^B_\tau+V^S_\tau=1), then
VPIN=E|2V^B_\tau-1|=E|1-2V^S_\tau|
We define the signed quantity
2V^B_\tau-1 as the order flow imbalance in volume bar
\tau, or
OI_\tau, so that VPIN is its expected absolute value. Note that
|OI_\tau|\gg0
is a necessary but
not sufficient condition for adverse selection. For sufficiency, we also need
|E_0(OI_\tau)-OI_\tau|\gg0.
That is,
OI_\tau must be both large and
unpredictable for the market maker, who then trades with an informed trader while charging an insufficient bid-offer spread and is adversely selected.
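To make the quantities above concrete, the sketch below computes the signed order flow imbalance per bar from the buy-volume fractions and approximates the expectation in VPIN by a sample average over the most recent bars. The names `order_flow_imbalance` and `rolling_vpin`, the `window` parameter, and the use of a rolling mean as the estimator are assumptions made for illustration; the input numbers are made up.

```python
from typing import List, Sequence


def order_flow_imbalance(buy_fractions: Sequence[float]) -> List[float]:
    """Signed imbalance 2*b - 1 per bar, where b is the buy-volume fraction."""
    return [2.0 * b - 1.0 for b in buy_fractions]


def rolling_vpin(buy_fractions: Sequence[float], window: int) -> List[float]:
    """Average of |2*b - 1| over each run of `window` consecutive bars.

    The rolling mean stands in for the expectation E|2*b - 1|; the window
    length is a free choice, not something fixed by the derivation.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    abs_oi = [abs(x) for x in order_flow_imbalance(buy_fractions)]
    return [sum(abs_oi[i - window:i]) / window
            for i in range(window, len(abs_oi) + 1)]


# Hypothetical buy-volume fractions for four consecutive volume bars.
fractions = [0.5, 0.75, 1.0, 0.25]
print(order_flow_imbalance(fractions))
print(rolling_vpin(fractions, window=2))
```

Measuring how unpredictable the imbalance is would also require a forecast of it, which this sketch does not attempt.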
We can use information theory to extract a useful feature from this analysis. Let
(V^B_\tau)_{1\leq\tau\leq N} be a sequence of buy-volume fractions for a set of volume bars of size
V. For a specified
q, let
\{K_1,\ldots,K_q\} be the corresponding collection of
q quantiles and define
f(V^B_\tau)=i if
V^B_\tau\in K_i. We then "quantize" the sequence of buy-volume fractions to an integer sequence
X_\tau:=(f(V^B_{\tau_1}),\ldots,f(V^B_{\tau_N}))
for which we can estimate the entropy
H[X_\tau] using the Lempel-Ziv algorithm. We then derive the cumulative distribution
F(H[X_\tau]) and use the resulting time series as a feature for classifying adverse selection in the order flow.
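The pipeline above (quantile encoding, a Lempel-Ziv entropy estimate, then a cumulative distribution) might be sketched as follows. Several choices here are assumptions rather than anything specified in the text: the quantile edges are taken from the full sample (which introduces look-ahead and would need an out-of-sample fit in practice), the entropy is approximated with an LZ78-style phrase count c normalised as c*log2(c)/n, the entropy is computed over a rolling window, and F is an expanding empirical CDF. All function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def quantile_encode(values: Sequence[float], q: int) -> List[int]:
    """Map each value to a quantile index in 1..q using in-sample edges."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[min(n - 1, (i * n) // q)] for i in range(1, q)]
    return [bisect.bisect_right(edges, v) + 1 for v in values]


def lz_phrase_count(symbols: Sequence[int]) -> int:
    """Number of phrases in an LZ78-style incremental parse."""
    seen = set()
    phrase: Tuple[int, ...] = ()
    count = 0
    for s in symbols:
        phrase += (s,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # trailing phrase that was already in the dictionary
        count += 1
    return count


def lz_entropy_rate(symbols: Sequence[int]) -> float:
    """Crude entropy-rate proxy in bits per symbol: c*log2(c)/n."""
    n = len(symbols)
    if n == 0:
        raise ValueError("empty sequence")
    c = lz_phrase_count(symbols)
    return c * math.log2(c) / n


def entropy_cdf_feature(buy_fractions: Sequence[float], q: int,
                        window: int) -> Tuple[List[float], List[float]]:
    """Rolling entropy estimates and their expanding empirical CDF values."""
    codes = quantile_encode(buy_fractions, q)
    entropies = [lz_entropy_rate(codes[i - window:i])
                 for i in range(window, len(codes) + 1)]
    feature = []
    for t, h in enumerate(entropies):
        history = entropies[:t + 1]  # only values observed up to bar t
        feature.append(sum(x <= h for x in history) / len(history))
    return entropies, feature


# Hypothetical buy-volume fractions.
fractions = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
entropies, feature = entropy_cdf_feature(fractions, q=2, window=4)
print(entropies)
print(feature)
```

On short windows the phrase-count estimate is noisy, so in practice the window and q would need tuning, and other Lempel-Ziv based estimators could be swapped in for `lz_entropy_rate`.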