Home | Libraries | People | FAQ | More |
#include <boost/math/distributions/hyperexponential.hpp>
namespace boost{ namespace math{ template <typename RealType = double, typename Policy = policies::policy<> > class hyperexponential_distribution; typedef hyperexponential_distribution<> hyperexponential; template <typename RealType, typename Policy> class hyperexponential_distribution { public: typedef RealType value_type; typedef Policy policy_type; // Constructors: hyperexponential_distribution(); // Default. template <typename RateIterT, typename RateIterT2> hyperexponential_distribution( // Default equal probabilities. RateIterT const& rate_first, RateIterT2 const& rate_last); // Rates using Iterators. template <typename ProbIterT, typename RateIterT> hyperexponential_distribution(ProbIterT prob_first, ProbIterT prob_last, RateIterT rate_first, RateIterT rate_last); // Iterators. template <typename ProbRangeT, typename RateRangeT> hyperexponential_distribution(ProbRangeT const& prob_range, RateRangeT const& rate_range); // Ranges. template <typename RateRangeT> hyperexponential_distribution(RateRangeT const& rate_range); #if !defined(BOOST_NO_CXX11_HDR_INITIALIZER_LIST) // C++11 initializer lists supported. hyperexponential_distribution(std::initializer_list<RealType> l1, std::initializer_list<RealType> l2); hyperexponential_distribution(std::initializer_list<RealType> l1); #endif // Accessors: std::size_t num_phases() const; std::vector<RealType> probabilities() const; std::vector<RealType> rates() const; }; }} // namespaces
Note | |
---|---|
An implementation-defined mechanism is provided to avoid ambiguity between constructors accepting ranges, iterators and constants as parameters. This should be transparent to the user. See below and the header file hyperexponential.hpp for details and explanatory comments. |
The class type hyperexponential_distribution
represents a hyperexponential
distribution.
A k-phase hyperexponential distribution is a continuous probability distribution obtained as a mixture of k Exponential Distributions. It is also referred to as mixed exponential distribution or parallel k-phase exponential distribution.
A k-phase hyperexponential distribution is characterized by two parameters, namely a phase probability vector α=(α1,...,αk) and a rate vector λ=(λ1,...,λk).
The probability density function for random variate x in a hyperexponential distribution is given by:
The following graph illustrates the PDF of the hyperexponential distribution with five different parameters, namely:
Also, the following graph illustrates the PDF of the hyperexponential distribution (solid lines) where only the phase probability vector changes together with the PDF of the two limiting exponential distributions (dashed lines):
As expected, as the first element α1 of the phase probability vector approaches to 1 (or, equivalently, α2 approaches to 0), the resulting hyperexponential distribution nears the exponential distribution with parameter λ=0.5. Conversely, as the first element α2 of the phase probability vector approaches to 1 (or, equivalently, α1 approaches to 0), the resulting hyperexponential distribution nears the exponential distribution with parameter λ=1.5.
Finally, the following graph compares the PDF of the hyperexponential distribution with different number of phases but with the same mean value equal to 2:
As can be noted, even if the three distributions have the same mean value, the two hyperexponential distributions have a longer tail with respect to the one of the exponential distribution. Indeed, the hyperexponential distribution has a larger variability than the exponential distribution, thus resulting in a Coefficient of Variation greater than 1 (as opposed to the one of the exponential distribution which is exactly 1).
A k-phase hyperexponential distribution is frequently used in queueing theory to model the distribution of the superposition of k independent events, like, for instance, the service time distribution of a queueing station with k servers in parallel where the i-th server is chosen with probability αi and its service time distribution is an exponential distribution with rate λi (Allen,1990; Papadopolous et al.,1993; Trivedi,2002).
For instance, CPUs service-time distribution in a computing system has often been observed to possess such a distribution (Rosin,1965). Also, the arrival of different types of customer to a single queueing station is often modeled as a hyperexponential distribution (Papadopolous et al.,1993). Similarly, if a product manufactured in several parallel assembly lines and the outputs are merged, the failure density of the overall product is likely to be hyperexponential (Trivedi,2002).
Finally, since the hyperexponential distribution exhibits a high Coefficient of Variation (CoV), that is a CoV > 1, it is especially suited to fit empirical data with large CoV (Feitelson,2014; Wolski et al.,2013) and to approximate long-tail probability distributions (Feldmann et al.,1998).
1
, the hyperexponential distribution
is simply an Exponential
Distribution.
Suppose a customer is buying an appliance and is choosing at random between an appliance with average lifetime of 10 years and an appliance with average lifetime of 12 years. Assuming the lifetime of this appliance follows an exponential distribution, the lifetime distribution of the purchased appliance can be modeled as a hyperexponential distribution with phase probability vector α=(1/2,1/2) and rate vector λ=(1/10,1/12) (Wolfram,2014).
In the rest of this section, we provide an example C++ implementation for computing the average lifetime and the probability that the appliance will work for more than 15 years.
#include <boost/math/distributions/hyperexponential.hpp> #include <iostream> int main() { const double rates[] = { 1.0 / 10.0, 1.0 / 12.0 }; boost::math::hyperexponential he(rates); std::cout << "Average lifetime: " << boost::math::mean(he) << " years" << std::endl; std::cout << "Probability that the appliance will work for more than 15 years: " << boost::math::cdf(boost::math::complement(he, 15.0)) << std::endl; }
The resulting output is:
Average lifetime: 11 years Probability that the appliance will work for more than 15 years: 0.254817
Cloud computing has become a popular metaphor for dynamic and secure self-service access to computational and storage capabilities. In (Wolski et al.,2013), the authors analyze and model workloads gathered from enterprise-operated commercial private clouds and show that 3-phase hyperexponential distributions (fitted using the Expectation Maximization algorithm) capture workload attributes accurately.
In this type of computing system, user requests consist in demanding the provisioning of one or more Virtual Machines (VMs). In particular, in (Wolski et al.,2013) the workload experienced by each cloud system is a function of four distributions, one for each of the following workload attributes:
The authors assume that all VMs in a request have the same core count, but request sizes and core counts can vary from request to request. Moreover, all VMs within a request are assumed to have the same lifetime. Given these assumptions, the authors build a statistical model for the request interarrival time and VM lifetime attributes by fitting their respective data to a 3-phase hyperexponential distribution.
In the following table, we show the sample mean and standard deviation (SD), in seconds, of the request interarrival time and of the VM lifetime distributions of the three datasets collected by authors:
Dataset |
Mean Request Interarrival Time (SD) |
Mean Multi-core VM Lifetime (SD) |
Mean Single-core VM Lifetime (SD) |
---|---|---|---|
DS1 |
2202.1 (2.2e+04) |
257173 (4.6e+05) |
28754.4 (1.6e+05) |
DS2 |
41285.7 (1.1e+05) |
144669.0 (7.9e+05) |
599815.0 (1.7e+06) |
DS3 |
11238.8 (3.0e+04) |
30739.2 (1.6e+05) |
44447.8 (2.2e+05) |
Whereas in the following table we show the hyperexponential distribution parameters resulting from the fit:
Dataset |
Request Interarrival Time |
Multi-core VM Lifetime |
Single-core VM Lifetime |
---|---|---|---|
DS1 |
α=(0.34561,0.08648,0.56791), λ=(0.008,0.00005,0.02894) |
α=(0.24667,0.37948,0.37385), λ=(0.00004,0.000002,0.00059) |
α=(0.09325,0.22251,0.68424), λ=(0.000003,0.00109,0.00109) |
DS2 |
α=(0.38881,0.18227,0.42892), λ=(0.000006,0.05228,0.00081) |
α=(0.42093,0.43960,0.13947), λ=(0.00186,0.00008,0.0000008) |
α=(0.44885,0.30675,0.2444), λ=(0.00143,0.00005,0.0000004) |
DS3 |
α=(0.39442,0.24644,0.35914), λ=(0.00030,0.00003,0.00257) |
α=(0.37621,0.14838,0.47541), λ=(0.00498,0.000005,0.00022) |
α=(0.34131,0.12544,0.53325), λ=(0.000297,0.000003,0.00410) |
In the rest of this section, we provide an example C++ implementation for computing some statistical properties of the fitted distributions for each of the analyzed dataset.
#include <boost/math/distributions.hpp> #include <iostream> #include <string> struct ds_info { std::string name; double iat_sample_mean; double iat_sample_sd; boost::math::hyperexponential iat_he; double multi_lt_sample_mean; double multi_lt_sample_sd; boost::math::hyperexponential multi_lt_he; double single_lt_sample_mean; double single_lt_sample_sd; boost::math::hyperexponential single_lt_he; }; // DS1 dataset ds_info make_ds1() { ds_info ds; ds.name = "DS1"; // VM interarrival time distribution const double iat_fit_probs[] = { 0.34561,0.08648,0.56791 }; const double iat_fit_rates[] = { 0.0008,0.00005,0.02894 }; ds.iat_sample_mean = 2202.1; ds.iat_sample_sd = 2.2e+4; ds.iat_he = boost::math::hyperexponential(iat_fit_probs, iat_fit_rates); // Multi-core VM lifetime distribution const double multi_lt_fit_probs[] = { 0.24667,0.37948,0.37385 }; const double multi_lt_fit_rates[] = { 0.00004,0.000002,0.00059 }; ds.multi_lt_sample_mean = 257173; ds.multi_lt_sample_sd = 4.6e+5; ds.multi_lt_he = boost::math::hyperexponential(multi_lt_fit_probs, multi_lt_fit_rates); // Single-core VM lifetime distribution const double single_lt_fit_probs[] = { 0.09325,0.22251,0.68424 }; const double single_lt_fit_rates[] = { 0.000003,0.00109,0.00109 }; ds.single_lt_sample_mean = 28754.4; ds.single_lt_sample_sd = 1.6e+5; ds.single_lt_he = boost::math::hyperexponential(single_lt_fit_probs, single_lt_fit_rates); return ds; } // DS2 dataset ds_info make_ds2() { ds_info ds; ds.name = "DS2"; // VM interarrival time distribution const double iat_fit_probs[] = { 0.38881,0.18227,0.42892 }; const double iat_fit_rates[] = { 0.000006,0.05228,0.00081 }; ds.iat_sample_mean = 41285.7; ds.iat_sample_sd = 1.1e+05; ds.iat_he = boost::math::hyperexponential(iat_fit_probs, iat_fit_rates); // Multi-core VM lifetime distribution const double multi_lt_fit_probs[] = { 0.42093,0.43960,0.13947 }; const double multi_lt_fit_rates[] = { 0.00186,0.00008,0.0000008 }; ds.multi_lt_sample_mean = 144669.0; ds.multi_lt_sample_sd = 7.9e+05; ds.multi_lt_he = boost::math::hyperexponential(multi_lt_fit_probs, multi_lt_fit_rates); // Single-core VM lifetime distribution const double single_lt_fit_probs[] = { 0.44885,0.30675,0.2444 }; const double single_lt_fit_rates[] = { 0.00143,0.00005,0.0000004 }; ds.single_lt_sample_mean = 599815.0; ds.single_lt_sample_sd = 1.7e+06; ds.single_lt_he = boost::math::hyperexponential(single_lt_fit_probs, single_lt_fit_rates); return ds; } // DS3 dataset ds_info make_ds3() { ds_info ds; ds.name = "DS3"; // VM interarrival time distribution const double iat_fit_probs[] = { 0.39442,0.24644,0.35914 }; const double iat_fit_rates[] = { 0.00030,0.00003,0.00257 }; ds.iat_sample_mean = 11238.8; ds.iat_sample_sd = 3.0e+04; ds.iat_he = boost::math::hyperexponential(iat_fit_probs, iat_fit_rates); // Multi-core VM lifetime distribution const double multi_lt_fit_probs[] = { 0.37621,0.14838,0.47541 }; const double multi_lt_fit_rates[] = { 0.00498,0.000005,0.00022 }; ds.multi_lt_sample_mean = 30739.2; ds.multi_lt_sample_sd = 1.6e+05; ds.multi_lt_he = boost::math::hyperexponential(multi_lt_fit_probs, multi_lt_fit_rates); // Single-core VM lifetime distribution const double single_lt_fit_probs[] = { 0.34131,0.12544,0.53325 }; const double single_lt_fit_rates[] = { 0.000297,0.000003,0.00410 }; ds.single_lt_sample_mean = 44447.8; ds.single_lt_sample_sd = 2.2e+05; ds.single_lt_he = boost::math::hyperexponential(single_lt_fit_probs, single_lt_fit_rates); return ds; } void print_fitted(ds_info const& ds) { const double secs_in_a_hour = 3600; const double secs_in_a_month = 30 * 24 * secs_in_a_hour; std::cout << "### " << ds.name << std::endl; std::cout << "* Fitted Request Interarrival Time" << std::endl; std::cout << " - Mean (SD): " << boost::math::mean(ds.iat_he) << " (" << boost::math::standard_deviation(ds.iat_he) << ") seconds." << std::endl; std::cout << " - 99th Percentile: " << boost::math::quantile(ds.iat_he, 0.99) << " seconds." << std::endl; std::cout << " - Probability that a VM will arrive within 30 minutes: " << boost::math::cdf(ds.iat_he, secs_in_a_hour / 2.0) << std::endl; std::cout << " - Probability that a VM will arrive after 1 hour: " << boost::math::cdf(boost::math::complement(ds.iat_he, secs_in_a_hour)) << std::endl; std::cout << "* Fitted Multi-core VM Lifetime" << std::endl; std::cout << " - Mean (SD): " << boost::math::mean(ds.multi_lt_he) << " (" << boost::math::standard_deviation(ds.multi_lt_he) << ") seconds." << std::endl; std::cout << " - 99th Percentile: " << boost::math::quantile(ds.multi_lt_he, 0.99) << " seconds." << std::endl; std::cout << " - Probability that a VM will last for less than 1 month: " << boost::math::cdf(ds.multi_lt_he, secs_in_a_month) << std::endl; std::cout << " - Probability that a VM will last for more than 3 months: " << boost::math::cdf(boost::math::complement(ds.multi_lt_he, 3.0*secs_in_a_month)) << std::endl; std::cout << "* Fitted Single-core VM Lifetime" << std::endl; std::cout << " - Mean (SD): " << boost::math::mean(ds.single_lt_he) << " (" << boost::math::standard_deviation(ds.single_lt_he) << ") seconds." << std::endl; std::cout << " - 99th Percentile: " << boost::math::quantile(ds.single_lt_he, 0.99) << " seconds." << std::endl; std::cout << " - Probability that a VM will last for less than 1 month: " << boost::math::cdf(ds.single_lt_he, secs_in_a_month) << std::endl; std::cout << " - Probability that a VM will last for more than 3 months: " << boost::math::cdf(boost::math::complement(ds.single_lt_he, 3.0*secs_in_a_month)) << std::endl; } int main() { print_fitted(make_ds1()); print_fitted(make_ds2()); print_fitted(make_ds3()); }
The resulting output (with floating-point precision set to 2) is:
### DS1 * Fitted Request Interarrival Time - Mean (SD): 2.2e+03 (8.1e+03) seconds. - 99th Percentile: 4.3e+04 seconds. - Probability that a VM will arrive within 30 minutes: 0.84 - Probability that a VM will arrive after 1 hour: 0.092 * Fitted Multi-core VM Lifetime - Mean (SD): 2e+05 (3.9e+05) seconds. - 99th Percentile: 1.8e+06 seconds. - Probability that a VM will last for less than 1 month: 1 - Probability that a VM will last for more than 3 months: 6.7e-08 * Fitted Single-core VM Lifetime - Mean (SD): 3.2e+04 (1.4e+05) seconds. - 99th Percentile: 7.4e+05 seconds. - Probability that a VM will last for less than 1 month: 1 - Probability that a VM will last for more than 3 months: 6.9e-12 ### DS2 * Fitted Request Interarrival Time - Mean (SD): 6.5e+04 (1.3e+05) seconds. - 99th Percentile: 6.1e+05 seconds. - Probability that a VM will arrive within 30 minutes: 0.52 - Probability that a VM will arrive after 1 hour: 0.4 * Fitted Multi-core VM Lifetime - Mean (SD): 1.8e+05 (6.4e+05) seconds. - 99th Percentile: 3.3e+06 seconds. - Probability that a VM will last for less than 1 month: 0.98 - Probability that a VM will last for more than 3 months: 0.00028 * Fitted Single-core VM Lifetime - Mean (SD): 6.2e+05 (1.6e+06) seconds. - 99th Percentile: 8e+06 seconds. - Probability that a VM will last for less than 1 month: 0.91 - Probability that a VM will last for more than 3 months: 0.011 ### DS3 * Fitted Request Interarrival Time - Mean (SD): 9.7e+03 (2.2e+04) seconds. - 99th Percentile: 1.1e+05 seconds. - Probability that a VM will arrive within 30 minutes: 0.53 - Probability that a VM will arrive after 1 hour: 0.36 * Fitted Multi-core VM Lifetime - Mean (SD): 3.2e+04 (1e+05) seconds. - 99th Percentile: 5.4e+05 seconds. - Probability that a VM will last for less than 1 month: 1 - Probability that a VM will last for more than 3 months: 1.9e-18 * Fitted Single-core VM Lifetime - Mean (SD): 4.3e+04 (1.6e+05) seconds. - 99th Percentile: 8.4e+05 seconds. - Probability that a VM will last for less than 1 month: 1 - Probability that a VM will last for more than 3 months: 9.3e-12
Note | |
---|---|
The above results differ from the ones shown in Tables III, V, and VII of (Wolski et al.,2013). We carefully double-checked them with Wolfram Mathematica 10, which confirmed our results. |
hyperexponential_distribution();
Constructs a 1-phase hyperexponential distribution
(i.e., an exponential distribution) with rate 1
.
template <typename ProbIterT, typename RateIterT> hyperexponential_distribution(ProbIterT prob_first, ProbIterT prob_last, RateIterT rate_first, RateIterT rate_last);
Constructs a hyperexponential distribution with phase probability
vector parameter given by the range defined by [prob_first
, prob_last
)
iterator pair, and rate vector parameter given by
the range defined by the [rate_first
,
rate_last
) iterator pair.
prob_first
, prob_last
: the range of non-negative
real elements representing the phase probabilities; elements are normalized
to sum to unity.
rate_first
, rate_last
: the range of positive
elements representing the rates.
ProbIterT
, RateIterT
: must meet the requirements
of the InputIterator
concept.
std::array<double, 2> phase_prob = { 0.5, 0.5 }; std::array<double, 2> rates = { 1.0 / 10, 1.0 / 12 }; hyperexponential he(phase_prob.begin(), phase_prob.end(), rates.begin(), rates.end());
template <typename ProbRangeT, typename RateRangeT> hyperexponential_distribution(ProbRangeT const& prob_range, RateRangeT const& rate_range);
Constructs a hyperexponential distribution with phase probability
vector parameter given by the range defined by prob_range
, and rate vector
parameter given by the range defined by rate_range
.
Note | |
---|---|
As an implementation detail, this constructor uses Boost's enable_if/disable_if mechanism to disambiguate between this and other 2-argument constructors. Refer to the source code for more details. |
prob_range
: the range
of non-negative real elements representing the phase probabilities;
elements are normalized to sum to unity.
rate_range
: the range
of positive real elements representing the rates.
ProbRangeT
, RateRangeT
: must meet the requirements
of the Range
concept: that includes native C++ arrays, standard library containers,
or a std::pair or iterators.
// We could be using any standard library container here... vector, deque, array, list etc: std::array<double, 2> phase_prob = { 0.5, 0.5 }; std::array<double, 2> rates = { 1.0 / 10, 1.0 / 12 }; hyperexponential he1(phase_prob, rates); // Construct from standard library container. double phase_probs2[] = { 0.5, 0.5 }; double rates2[] = { 1.0 / 10, 1.0 / 12 }; hyperexponential he2(phase_probs2, rates2); // Construct from native C++ array.
template <typename RateIterT, typename RateIterT2> hyperexponential_distribution(RateIterT const& rate_first, RateIterT2 const& rate_last);
Constructs a hyperexponential distribution with rate vector
parameter given by the range defined by the [rate_first
,
rate_last
) iterator pair,
and phase probability vector set to the equal phase
probabilities (i.e., to a vector of the same length n
of the rate vector and with each element set to 1.0/n
).
Note | |
---|---|
As an implementation detail, this constructor uses Boost's enable_if/disable_if mechanism to disambiguate between this and other 2-argument constructors. Refer to the source code for more details. |
rate_first
, rate_last
: the range of positive
elements representing the rates.
RateIterT
, RateIterT2
: must meet the requirements
of the InputIterator
concept.
// We could be using any standard library container here... vector, deque, array, list etc: std::array<double, 2> rates = { 1.0 / 10, 1.0 / 12 }; hyperexponential he(rates.begin(), rates.end()); BOOST_ASSERT(he.probabilities()[0] == 0.5); // Phase probabilities will be equal and normalised to unity.
template <typename RateRangeT> hyperexponential_distribution(RateRangeT const& rate_range);
Constructs a hyperexponential distribution with rate vector
parameter given by the range defined by rate_range
,
and phase probability vector set to the equal phase
probabilities (i.e., to a vector of the same length n
of the rate vector and with each element set to 1.0/n
).
rate_range
: the range
of positive real elements representing the rates.
RateRangeT
: must meet
the requirements of the Range
concept: this includes native C++ array, standard library containers,
and a std::pair
of iterators.
std::array<double, 2> rates = { 1.0 / 10, 1.0 / 12 }; hyperexponential he(rates); BOOST_ASSERT(he.probabilities()[0] == 0.5); // Phase probabilities will be equal and normalised to unity.
hyperexponential_distribution(std::initializer_list<RealType> l1, std::initializer_list<RealType> l2);
Constructs a hyperexponential distribution with phase probability
vector parameter given by the brace-init-list
defined by l1
, and rate
vector parameter given by the brace-init-list
defined by l2
.
l1
: the brace-init-list
of non-negative real elements representing the phase probabilities;
elements are normalized to sum to unity.
l2
: the brace-init-list
of positive real elements representing the rates.
The number of elements of the phase probabilities list and the rates list must be the same.
hyperexponential he = { { 0.5, 0.5 }, { 1.0 / 10, 1.0 / 12 } };
hyperexponential_distribution(std::initializer_list<RealType> l1);
Constructs a hyperexponential distribution with rate vector
parameter given by the brace-init-list
defined by l1
, and phase
probability vector set to the equal phase probabilities (i.e.,
to a vector of the same length n
of the rate vector and with each element set to 1.0/n
).
l1
: the brace-init-list
of non-negative real elements representing the phase probabilities;
they are normalized to ensure that they sum to unity.
hyperexponential he = { 1.0 / 10, 1.0 / 12 }; BOOST_ASSERT(he.probabilities()[0] == 0.5);
std::size_t num_phases() const;
Gets the number of phases of this distribution (the size of both the rate and probability vectors).
An non-negative integer number representing the number of phases of this distribution.
std::vector<RealType> probabilities() const;
Gets the phase probability vector parameter of this distribution.
Note | |
---|---|
The returned probabilities are the normalized versions of the probability parameter values passed at construction time. |
A vector of non-negative real numbers representing the phase probability vector parameter of this distribution.
std::vector<RealType> rates() const;
Gets the rate vector parameter of this distribution.
A vector of positive real numbers representing the rate vector parameter of this distribution.
Warning | |
---|---|
The return type of these functions is a vector-by-value. This is deliberate as we wish to hide the actual container used internally which may be subject to future changes (for example to facilitate vectorization of the cdf code etc). Users should note that some code that might otherwise have been expected to work does not. For example, an attempt to output the (normalized) probabilities: std::copy(he.probabilities().begin(), he.probabilities().end(), std::ostream_iterator<double>(std::cout, " ")); fails at compile or runtime because iterator types are incompatible, but, for example, std::cout << he.probabilities()[0] << ' ' << he.probabilities()[1] << std::endl; outputs the expected values. In general if you want to access a member of the returned container, then assign to a variable first, and then access those members: std::vector<double> t = he.probabilities(); std::copy(t.begin(), t.end(), std::ostream_iterator<double>(std::cout, " ")); |
All the usual non-member accessor functions that are generic to all distributions are supported: Cumulative Distribution Function, Probability Density Function, Quantile, Hazard Function, Cumulative Hazard Function, mean, median, mode, variance, standard deviation, skewness, kurtosis, kurtosis_excess, range and support.
The formulae for calculating these are shown in the table below.
The hyperexponential distribution is implemented in terms of the Exponential Distribution and as such should have very small errors, usually an epsilon or few.
In the following table:
Function |
Implementation Notes |
---|---|
support |
x ∈ [0,∞) |
|
|
cdf |
|
cdf complement |
|
quantile |
No closed form available. Computed numerically. |
quantile from the complement |
No closed form available. Computed numerically. |
mean |
|
variance |
|
mode |
|
skewness |
|
kurtosis |
|
kurtosis excess |
kurtosis |