Continuous Bernoulli distribution

Probability density function
[Figure: probability density function of the continuous Bernoulli distribution]

Notation: {\displaystyle {\mathcal {CB}}(\lambda )}
Parameters: {\displaystyle \lambda \in (0,1)}
Support: {\displaystyle x\in [0,1]}
PDF: {\displaystyle C(\lambda )\lambda ^{x}(1-\lambda )^{1-x}}, where {\displaystyle C(\lambda )={\begin{cases}2&{\text{if }}\lambda ={\frac {1}{2}}\\{\frac {2\tanh ^{-1}(1-2\lambda )}{1-2\lambda }}&{\text{otherwise}}\end{cases}}}
CDF: {\displaystyle F(x)={\begin{cases}x&{\text{if }}\lambda ={\frac {1}{2}}\\{\frac {\lambda ^{x}(1-\lambda )^{1-x}+\lambda -1}{2\lambda -1}}&{\text{otherwise}}\end{cases}}}
Mean: {\displaystyle \operatorname {E} [X]={\begin{cases}{\frac {1}{2}}&{\text{if }}\lambda ={\frac {1}{2}}\\{\frac {\lambda }{2\lambda -1}}+{\frac {1}{2\tanh ^{-1}(1-2\lambda )}}&{\text{otherwise}}\end{cases}}}
Variance: {\displaystyle \operatorname {var} [X]={\begin{cases}{\frac {1}{12}}&{\text{if }}\lambda ={\frac {1}{2}}\\-{\frac {(1-\lambda )\lambda }{(1-2\lambda )^{2}}}+{\frac {1}{(2\tanh ^{-1}(1-2\lambda ))^{2}}}&{\text{otherwise}}\end{cases}}}

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter {\displaystyle \lambda \in (0,1)}, defined on the unit interval {\displaystyle x\in [0,1]}, by:

{\displaystyle p(x|\lambda )\propto \lambda ^{x}(1-\lambda )^{1-x}.}

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart to the commonly used binary cross entropy loss, which is often applied to continuous, {\displaystyle [0,1]}-valued data.[6][7][8][9] That practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss defines a true log-likelihood only for discrete, {\displaystyle \{0,1\}}-valued data.
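
Concretely, the binary cross entropy loss and the normalized continuous Bernoulli log-likelihood differ by exactly the constant {\displaystyle \log C(\lambda )}, independent of the data. A minimal numerical check of this, using PyTorch's built-in ContinuousBernoulli distribution[2] (the particular values of λ and x below are arbitrary):

import torch
import torch.nn.functional as F

lam = torch.tensor(0.3)   # shape parameter lambda in (0, 1), chosen arbitrarily
x = torch.rand(5)         # continuous observations in [0, 1]

# Binary cross entropy: the negative *unnormalized* continuous Bernoulli log-likelihood.
bce = F.binary_cross_entropy(lam.expand_as(x), x, reduction="none")

# Normalized log-density from the built-in distribution.
cb = torch.distributions.ContinuousBernoulli(probs=lam)
log_p = cb.log_prob(x)

# log C(lambda) for lambda != 1/2, per the normalizing constant above.
log_C = torch.log(2 * torch.atanh(1 - 2 * lam) / (1 - 2 * lam))

# The normalized log-likelihood and -bce differ by exactly the constant log C(lambda).
assert torch.allclose(log_p + bce, log_C.expand_as(x), atol=1e-5)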

The continuous Bernoulli also defines an exponential family of distributions. Writing {\displaystyle \eta =\log \left(\lambda /(1-\lambda )\right)} for the natural parameter, the density can be rewritten in canonical form: {\displaystyle p(x|\eta )\propto \exp(\eta x)}.
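
The normalizing constant follows by integrating this canonical form. For {\displaystyle \lambda \neq {\tfrac {1}{2}}} (so that {\displaystyle \eta \neq 0}):

{\displaystyle \int _{0}^{1}\lambda ^{x}(1-\lambda )^{1-x}\,dx=(1-\lambda )\int _{0}^{1}e^{\eta x}\,dx=(1-\lambda )\,{\frac {e^{\eta }-1}{\eta }}={\frac {2\lambda -1}{\log \left(\lambda /(1-\lambda )\right)}},}

so that {\displaystyle C(\lambda )={\frac {\log \left(\lambda /(1-\lambda )\right)}{2\lambda -1}}={\frac {2\tanh ^{-1}(1-2\lambda )}{1-2\lambda }}}, using the identity {\displaystyle \tanh ^{-1}(z)={\tfrac {1}{2}}\log((1+z)/(1-z))}.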


Related distributions

Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set {\displaystyle \{0,1\}} by the probability mass function:

{\displaystyle p(x)=p^{x}(1-p)^{1-x},}

where {\displaystyle p} is a scalar parameter between 0 and 1. Applying the same functional form on the continuous interval {\displaystyle [0,1]} results in the continuous Bernoulli probability density function, up to a normalizing constant.
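
Since the functional forms coincide, the two log-densities differ by the constant {\displaystyle \log C(\lambda )} wherever both are defined. A quick sketch of this in PyTorch (the parameter value is arbitrary):

import torch

p = torch.tensor(0.3)  # shared parameter, chosen arbitrarily
bern = torch.distributions.Bernoulli(probs=p)
cb = torch.distributions.ContinuousBernoulli(probs=p)

# At the endpoints x = 0 and x = 1, where the Bernoulli pmf is defined,
# the continuous Bernoulli log-density exceeds the Bernoulli log-pmf
# by exactly the normalizing constant log C(lambda).
for x in (torch.tensor(0.0), torch.tensor(1.0)):
    print(x.item(), (cb.log_prob(x) - bern.log_prob(x)).item())  # both print log C(0.3)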

Beta distribution

The Beta distribution has the density function:

{\displaystyle p(x)\propto x^{\alpha -1}(1-x)^{\beta -1},}

which can be rewritten as:

{\displaystyle p(x)\propto x_{1}^{\alpha _{1}-1}x_{2}^{\alpha _{2}-1},}

where {\displaystyle \alpha _{1},\alpha _{2}} are positive scalar parameters, and {\displaystyle (x_{1},x_{2})} represents an arbitrary point in the 1-simplex, {\displaystyle \Delta ^{1}=\{(x_{1},x_{2}):x_{1}>0,x_{2}>0,x_{1}+x_{2}=1\}}. Switching the roles of the parameter and the argument in this density function, we obtain:

{\displaystyle p(x)\propto \alpha _{1}^{x_{1}}\alpha _{2}^{x_{2}}.}

This family is identifiable only up to the linear constraint {\displaystyle \alpha _{1}+\alpha _{2}=1}, since scaling both parameters by a constant {\displaystyle c>0} multiplies the density by {\displaystyle c^{x_{1}+x_{2}}=c}, which is absorbed into the normalizing constant. Imposing the constraint and writing {\displaystyle \lambda =\alpha _{1}}, so that {\displaystyle \alpha _{2}=1-\lambda }, we obtain:

{\displaystyle p(x)\propto \lambda ^{x_{1}}(1-\lambda )^{x_{2}},}

which corresponds exactly to the continuous Bernoulli density, identifying {\displaystyle x_{1}=x} and {\displaystyle x_{2}=1-x}.

Exponential distribution

An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution. Specifically, truncating an exponential distribution with rate {\displaystyle \theta } to {\displaystyle [0,1]} gives a density proportional to {\displaystyle e^{-\theta x}}, which matches the canonical form {\displaystyle p(x|\eta )\propto \exp(\eta x)} with {\displaystyle \eta =-\theta }, i.e., a continuous Bernoulli distribution with parameter {\displaystyle \lambda =1/(1+e^{\theta })}.
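
A numerical check of this equivalence (a minimal sketch; the rate {\displaystyle \theta =1.5} is arbitrary):

import torch

theta = torch.tensor(1.5)     # exponential rate, chosen arbitrarily
lam = torch.sigmoid(-theta)   # matching continuous Bernoulli parameter 1/(1 + e^theta)
cb = torch.distributions.ContinuousBernoulli(probs=lam)

x = torch.rand(5)
# Log-density of an exponential distribution with rate theta truncated to [0, 1]:
# log p(x) = log(theta) - theta*x - log(1 - exp(-theta))
log_trunc_exp = torch.log(theta) - theta * x - torch.log(1 - torch.exp(-theta))

assert torch.allclose(cb.log_prob(x), log_trunc_exp, atol=1e-5)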

Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous-categorical distribution.[10]

References

  1. Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
  2. PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
  3. TensorFlow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli (archived 2020-11-25 at the Wayback Machine).
  4. Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
  5. Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
  6. Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International Conference on Machine Learning (pp. 1558-1566).
  7. Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
  8. PyTorch VAE tutorial. https://github.com/pytorch/examples/tree/master/vae
  9. Keras VAE tutorial. https://blog.keras.io/building-autoencoders-in-keras.html
  10. Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 37th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).