Peer-Reviewed

Posterior Predictive Checks for the Generalized Pareto Distribution Based on a Dirichlet Process Prior

Received: 14 October 2019     Accepted: 4 December 2019     Published: 25 December 2019
Abstract

Extreme value modelling is widely applied where the behavior of a process at high levels must be assessed accurately. Three considerations make it imperative for a practitioner to consider Bayesian methodology when modelling extremes: the inherent scarcity of extreme value data, the natural objective of predicting future extreme values of a process, and the regularity assumptions required within the frequentist framework by the likelihood and probability weighted moments methods of parameter estimation. Within the Bayesian paradigm, a widely used tool for assessing the fit of a model is the posterior predictive check (PPC). The method compares the posterior predictive distribution of future observations with the historical data; posterior predictive inference is the prediction of unobserved variables in light of observed data. This paper considers posterior predictive checks for assessing the fit of the generalized Pareto model based on a Dirichlet process prior. The posterior predictive distribution for the Dirichlet-process-based model is derived. The threshold is selected by minimizing the negative differential entropy of the Dirichlet distribution. Predictions are drawn from the Bayesian posterior distribution by Markov chain Monte Carlo simulation (the Metropolis-Hastings algorithm). Two graphical measures of discrepancy between the predicted and observed values that are commonly applied in practical extreme value modelling are considered: the cumulative distribution function plot and the quantile plot. These are supported by the Nash-Sutcliffe coefficient of model efficiency, a numerical measure that evaluates the error in the predicted observations relative to the natural variation in the observed values. The finite-sample performance of the proposed procedure is illustrated with simulated data.
The results of the study suggest that posterior predictive checks are reasonable diagnostic tools for assessing the fit of the generalized Pareto distribution. In addition, the posterior predictive quantile plot appears to be more informative than the probability plot. Most interestingly, selecting the threshold by minimizing the negative differential entropy of a Dirichlet process has the added advantage of allowing the analyst to estimate the concentration parameter from the model, rather than specifying it as a measure of their belief in the proposed model as a prior guess for the unknown distribution that generated the observations. Lastly, the application to real-life data shows that the distribution of the annual maximal inflows into the Okavango River at Mohembo, Botswana, can be adequately described by the generalized Pareto distribution.
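As a rough illustration of the kind of check the abstract describes, the following Python sketch fits a generalized Pareto distribution to simulated threshold exceedances with a random-walk Metropolis-Hastings sampler, draws posterior predictive values, and compares predicted and observed quantiles through the Nash-Sutcliffe coefficient. It is not the paper's Dirichlet-process-based model: the priors (normal on log scale and on shape), the step size, the plotting positions i/(n+1), and all function names are assumptions made for this example only.

```python
import numpy as np


def gpd_quantile(p, sigma, xi):
    """Quantile function of the GPD for exceedances over the threshold."""
    p = np.asarray(p, dtype=float)
    if abs(xi) < 1e-8:  # exponential limit as xi -> 0
        return -sigma * np.log1p(-p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)


def log_posterior(theta, y):
    """Log posterior of theta = (log sigma, xi), up to an additive constant.

    Assumed priors (illustrative only): log sigma ~ N(0, 10^2), xi ~ N(0, 1).
    """
    log_sigma, xi = theta
    sigma = np.exp(log_sigma)
    z = 1.0 + xi * y / sigma
    if np.any(z <= 0):  # an observation lies outside the GPD support
        return -np.inf
    if abs(xi) < 1e-8:
        loglik = np.sum(-log_sigma - y / sigma)
    else:
        loglik = np.sum(-log_sigma - (1.0 / xi + 1.0) * np.log(z))
    return loglik - 0.5 * (log_sigma / 10.0) ** 2 - 0.5 * xi ** 2


def metropolis_hastings(y, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings; returns draws of (sigma, xi)."""
    rng = np.random.default_rng(seed)
    theta = np.array([np.log(np.mean(y)), 0.1])  # starting value
    logp = log_posterior(theta, y)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(2)  # symmetric proposal
        logp_new = log_posterior(proposal, y)
        if np.log(rng.uniform()) < logp_new - logp:  # accept or reject
            theta, logp = proposal, logp_new
        draws[i] = np.exp(theta[0]), theta[1]
    return draws


def posterior_predictive_quantiles(y, draws, seed=1):
    """One predictive value per posterior draw, summarized at i/(n+1)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=len(draws))
    y_rep = np.array([gpd_quantile(ui, s, x) for ui, (s, x) in zip(u, draws)])
    p = np.arange(1, len(y) + 1) / (len(y) + 1.0)
    return np.quantile(y_rep, p)


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, 0 matches the mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    variation = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variation


# Simulated exceedances from a GPD with sigma = 2 and xi = 0.2 (arbitrary values).
rng = np.random.default_rng(42)
y = gpd_quantile(rng.uniform(size=200), sigma=2.0, xi=0.2)

draws = metropolis_hastings(y)[1000:]  # discard the first 1000 draws as burn-in
pred_q = posterior_predictive_quantiles(y, draws)
nse = nash_sutcliffe(np.sort(y), pred_q)
print(f"posterior mean sigma = {draws[:, 0].mean():.2f}, "
      f"xi = {draws[:, 1].mean():.2f}, NSE = {nse:.3f}")
```

Plotting `np.sort(y)` against `pred_q` gives a posterior predictive quantile plot; a coefficient close to 1 indicates that the predicted quantiles track the observed ones closely.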

Published in American Journal of Theoretical and Applied Statistics (Volume 8, Issue 6)
DOI 10.11648/j.ajtas.20190806.20
Page(s) 287-295
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2019. Published by Science Publishing Group

Keywords

Dirichlet Process Prior, Generalized Pareto Distribution, Markov Chain Monte Carlo, Peaks Over Threshold, Posterior Predictive Checks

Cite This Article
  • APA Style

    Wilson Moseki Thupeng, Boikanyo Mokgweetsi, Thuto Mothupi. (2019). Posterior Predictive Checks for the Generalized Pareto Distribution Based on a Dirichlet Process Prior. American Journal of Theoretical and Applied Statistics, 8(6), 287-295. https://doi.org/10.11648/j.ajtas.20190806.20

    ACS Style

    Wilson Moseki Thupeng; Boikanyo Mokgweetsi; Thuto Mothupi. Posterior Predictive Checks for the Generalized Pareto Distribution Based on a Dirichlet Process Prior. Am. J. Theor. Appl. Stat. 2019, 8(6), 287-295. doi: 10.11648/j.ajtas.20190806.20

    AMA Style

    Wilson Moseki Thupeng, Boikanyo Mokgweetsi, Thuto Mothupi. Posterior Predictive Checks for the Generalized Pareto Distribution Based on a Dirichlet Process Prior. Am J Theor Appl Stat. 2019;8(6):287-295. doi: 10.11648/j.ajtas.20190806.20

  • BibTeX

    @article{10.11648/j.ajtas.20190806.20,
      author = {Wilson Moseki Thupeng and Boikanyo Mokgweetsi and Thuto Mothupi},
      title = {Posterior Predictive Checks for the Generalized Pareto Distribution Based on a Dirichlet Process Prior},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {8},
      number = {6},
      pages = {287-295},
      doi = {10.11648/j.ajtas.20190806.20},
      url = {https://doi.org/10.11648/j.ajtas.20190806.20},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20190806.20},
      year = {2019}
    }
    

  • RIS

    TY  - JOUR
    T1  - Posterior Predictive Checks for the Generalized Pareto Distribution Based on a Dirichlet Process Prior
    AU  - Wilson Moseki Thupeng
    AU  - Boikanyo Mokgweetsi
    AU  - Thuto Mothupi
    Y1  - 2019/12/25
    PY  - 2019
    N1  - https://doi.org/10.11648/j.ajtas.20190806.20
    DO  - 10.11648/j.ajtas.20190806.20
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 287
    EP  - 295
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20190806.20
    VL  - 8
    IS  - 6
    ER  - 

Author Information
  • Wilson Moseki Thupeng, Department of Statistics, University of Botswana, Gaborone, Botswana

  • Boikanyo Mokgweetsi, Department of Statistics, University of Botswana, Gaborone, Botswana

  • Thuto Mothupi, Department of Statistics, University of Botswana, Gaborone, Botswana
