10 March, 2019

Using Entropy to Estimate Probability of Adverse Selection

Assuming that good news and bad news are equally likely (i.e., $\delta p<0$ and $\delta p>0$ occur with equal probability), the probability of informed trading can be shown to be

$$PIN=\frac{\alpha\mu}{\alpha\mu+2\epsilon}$$

where $\alpha$ is the probability of an informational event, $\mu$ is the arrival rate of informed orders, and $\epsilon$ is the arrival rate of uninformed orders. Heuristically, it is the share of the overall order flow that comes from informed traders.
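
As a quick numerical check of the formula above, here is a minimal sketch. The function name `pin` and the parameter values are illustrative assumptions, not estimates from data.

```python
def pin(alpha: float, mu: float, epsilon: float) -> float:
    """Probability of informed trading when good and bad news are equally likely."""
    informed = alpha * mu        # expected informed order flow
    uninformed = 2.0 * epsilon   # uninformed buys plus uninformed sells
    return informed / (informed + uninformed)


# Hypothetical parameters, chosen only for illustration.
print(pin(alpha=0.5, mu=100.0, epsilon=75.0))  # 50 / (50 + 150) = 0.25
```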

Let $V$ be the common size of each bar in a set of volume bars $(V_\tau)_{1\leq\tau\leq N}$. Of the ticks associated with bar $\tau$, let $V^B_\tau$ and $V^S_\tau$ be the total volume arising from buys and sells respectively, where each tick is classified as a buy or a sell using an algorithm such as the tick rule or the Lee-Ready algorithm.
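
The construction of $V^B_\tau$ and $V^S_\tau$ can be sketched as follows, using the tick rule for classification. This is a simplified illustration under stated assumptions: the function names are hypothetical, the first trade is arbitrarily labeled a buy, and trades are not split across bars, so a bar may slightly exceed `bar_size`.

```python
import numpy as np


def tick_rule(prices):
    """Sign each trade: +1 on an uptick, -1 on a downtick, previous sign if unchanged."""
    prices = np.asarray(prices, dtype=float)
    signs = np.ones(len(prices))  # assumption: the first trade is labeled a buy
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        signs[i] = np.sign(diff) if diff != 0 else signs[i - 1]
    return signs


def buy_sell_volume_bars(prices, volumes, bar_size):
    """Group trades into bars of at least `bar_size` volume; return rows (V^B, V^S)."""
    signs = tick_rule(prices)
    bars, buy, sell = [], 0.0, 0.0
    for s, v in zip(signs, volumes):
        if s > 0:
            buy += v
        else:
            sell += v
        if buy + sell >= bar_size:  # close the bar once the threshold is reached
            bars.append((buy, sell))
            buy, sell = 0.0, 0.0
    return np.array(bars)  # any unfinished final bar is dropped


# Made-up trades for illustration.
prices = [10.0, 10.1, 10.1, 10.0, 10.0, 10.2]
volumes = [5, 5, 5, 5, 5, 5]
print(buy_sell_volume_bars(prices, volumes, bar_size=10))
```

Dividing each row by its sum gives the buy and sell fractions of the bar.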

Under the same assumptions, it can be shown that $V=E(V^B_\tau+V^S_\tau)=\alpha\mu+2\epsilon$ and $E|V^B_\tau-V^S_\tau|=\alpha\mu$. Consequently, since $V^S_\tau=V-V^B_\tau$ within each bar, in volume bar coordinates we have (adding the "V" for emphasis)

$$VPIN=\frac{E|V^B_\tau-V^S_\tau|}{V}=\frac{1}{V}E|2V^B_\tau-V|$$

If we redefine $V^B_\tau$ and $V^S_\tau$ as the fractions of buy and sell volume relative to $V$ (so that $V^B_\tau+V^S_\tau=1$), then

$$VPIN=E|2V^B_\tau-1|=E|1-2V^S_\tau|$$

We define the signed quantity inside the expectation, $OI_\tau:=2V^B_\tau-1$, as the order flow imbalance in volume bar $\tau$, so that $VPIN=E|OI_\tau|$. Note that

$$|OI_\tau|\gg 0$$

is a necessary but not sufficient condition for adverse selection. For sufficiency, we need

$$|E_0(OI_\tau)-OI_\tau|\gg0.$$

That is, $OI_\tau$ must be both large and unpredictable to the market maker, who then trades with informed traders at a bid-offer spread too narrow to compensate for the risk, and is thereby adversely selected.
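
The imbalance, its expected magnitude, and the forecast error discussed above can be estimated from a sequence of buy fractions roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the trailing-mean forecast is only a stand-in for $E_0(OI_\tau)$, not a model of how a market maker actually forms expectations.

```python
import numpy as np


def order_imbalance(buy_frac):
    """OI_tau = 2 * v^B_tau - 1, where v^B_tau is the buy fraction of bar tau."""
    return 2.0 * np.asarray(buy_frac, dtype=float) - 1.0


def vpin(buy_frac):
    """Sample-mean estimate of E|2 v^B_tau - 1| over the supplied bars."""
    return float(np.mean(np.abs(order_imbalance(buy_frac))))


def imbalance_surprise(buy_frac, window=3):
    """|forecast - OI_tau|, with the forecast taken as the mean of the
    previous `window` imbalances (an assumed proxy for E_0(OI_tau))."""
    oi = order_imbalance(buy_frac)
    out = np.full(len(oi), np.nan)  # no forecast for the first `window` bars
    for t in range(window, len(oi)):
        out[t] = abs(oi[t - window:t].mean() - oi[t])
    return out


# Made-up buy fractions for four consecutive volume bars.
buy_frac = [0.5, 0.75, 0.25, 1.0]
print(vpin(buy_frac))                # 0.5
print(imbalance_surprise(buy_frac))  # [nan nan nan  1.]
```

In this toy example the last bar is entirely buy volume after three bars that average to zero imbalance, so both the imbalance and the surprise are at their largest there.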

We can use information theory to extract a useful feature from this analysis. Let $(V^B_\tau)_{1\leq\tau\leq N}$ be a sequence of buy volume portions for a set of volume bars of size $V$. For a specified $q$, let $\{K_1,\ldots,K_q\}$ be the corresponding collection of $q$ quantiles, and set $f(V^B_\tau)=i$ if $V^B_\tau\in K_i$. We then "quantize" the sequence of volume portions into an integer sequence

$$X_\tau:=(f(V^B_{\tau_1}),\ldots,f(V^B_{\tau_N}))$$

for which we can estimate the entropy $H[X_\tau]$ using the Lempel-Ziv algorithm. We then derive the cumulative distribution $F(H[X_\tau])$ and use the resulting time series as a feature for classifying adverse selection in the order flow.
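
One possible way to put the steps above together is sketched below. The text does not say which Lempel-Ziv variant is meant, so this sketch assumes an LZ78-style incremental parse with the estimate $c\log_2 c/n$ bits per symbol, where $c$ is the number of parsed phrases and $n$ the sequence length. The function names, the rolling `window`, and the default `q` are all hypothetical choices.

```python
import numpy as np


def quantize(buy_frac, q):
    """Map each buy fraction to the index (1..q) of the quantile bin containing it."""
    x = np.asarray(buy_frac, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, q + 1)[1:-1])  # q - 1 interior cut points
    return np.searchsorted(edges, x, side="right") + 1


def lz78_phrase_count(seq):
    """Number of phrases in an LZ78-style incremental parse of `seq`."""
    seen, phrase, count = set(), (), 0
    for s in seq:
        phrase += (int(s),)
        if phrase not in seen:  # shortest prefix not seen before: new phrase
            seen.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)  # count a trailing partial phrase


def lz_entropy(seq):
    """Assumed estimator: c * log2(c) / n bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    c = lz78_phrase_count(seq)
    return c * np.log2(c) / n


def rolling_entropy_feature(buy_frac, q=4, window=50):
    """Entropy over rolling windows, plus its empirical CDF value per window."""
    codes = quantize(buy_frac, q)
    h = np.array([lz_entropy(codes[t - window:t])
                  for t in range(window, len(codes) + 1)])
    # Empirical CDF: fraction of windows whose entropy is <= the current one.
    cdf = np.searchsorted(np.sort(h), h, side="right") / len(h)
    return h, cdf


# Synthetic buy fractions, for illustration only.
rng = np.random.default_rng(0)
h, cdf = rolling_entropy_feature(rng.uniform(size=200), q=4, window=50)
print(h.shape, cdf.shape)  # (151,) (151,)
```

Note that both the quantile edges and the empirical CDF here are computed over the full sample, which uses future information; a feature meant for live classification would need to compute them on past data only.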