HostFaultInjection

public class HostFaultInjection extends CloudSimEntity

Generates random failures for the PEs of Hosts inside a given Datacenter. A fault injection object usually has to be created after the VMs are created, to make it easier to define a function that clones failed VMs. The events happen in the following order:

  1. a time to inject a Host failure is generated using a given Random Number Generator;
  2. a Host is randomly selected to fail at that time using an internal Uniform Random Number Generator with the same seed as the given generator;
  3. the number of Host PEs to fail is randomly generated using the internal generator;
  4. failed physical PEs are removed from the affected VMs; VMs left with no PEs are destroyed and clones of them are submitted to the DatacenterBroker of the failed VMs;
  5. another failure is scheduled for a future time using the given generator;
  6. the process repeats until the end of the simulation.
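The cycle above can be sketched without CloudSim Plus. The following is only an illustration under stated assumptions: the class FaultInjectionCycleSketch, its Fault type and the plan method are hypothetical names, java.util.Random stands in for both generators, and the values of the given generator are taken to be hours, as the constructor parameter documentation states. Step 4 (removing PEs from VMs) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleSupplier;

/** Standalone sketch of the failure cycle; it does not use CloudSim Plus. */
final class FaultInjectionCycleSketch {
    /** One planned failure: when it happens, on which host, and how many PEs fail. */
    static final class Fault {
        final double timeSecs;
        final int hostIndex;
        final int failedPes;

        Fault(double timeSecs, int hostIndex, int failedPes) {
            this.timeSecs = timeSecs;
            this.hostIndex = hostIndex;
            this.failedPes = failedPes;
        }
    }

    static List<Fault> plan(DoubleSupplier arrivalHours, long seed,
                            int[] pesPerHost, double simulationEndSecs) {
        Random internal = new Random(seed);      // stand-in for the internal uniform generator
        int[] working = pesPerHost.clone();      // working PEs left on each host
        List<Fault> faults = new ArrayList<>();
        double clockSecs = 0;
        while (true) {
            double delayHours = arrivalHours.getAsDouble();   // steps 1 and 5
            if (delayHours <= 0) {
                throw new IllegalArgumentException("delays must be positive");
            }
            clockSecs += delayHours * 3600;                   // hours to seconds
            if (clockSecs > simulationEndSecs) {
                break;                                        // step 6
            }
            int host = internal.nextInt(working.length);      // step 2
            if (working[host] == 0) {
                continue;                                     // no working PE left to fail
            }
            int failed = 1 + internal.nextInt(working[host]); // step 3
            working[host] -= failed;
            faults.add(new Fault(clockSecs, host, failed));
        }
        return faults;
    }

    public static void main(String[] args) {
        Random given = new Random(7);
        // Hypothetical generator: delays uniformly spread between 0.5 and 1.5 hours.
        List<Fault> faults = plan(() -> 0.5 + given.nextDouble(), 7L, new int[]{4, 8}, 36_000);
        for (Fault f : faults) {
            System.out.printf("t=%.0fs host=%d failedPes=%d%n", f.timeSecs, f.hostIndex, f.failedPes);
        }
    }
}
```

The sketch keeps the two random sources separate, mirroring the description: one decides when a failure happens, the other decides where and how large it is.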

When a Host’s PEs fail, if the Host still has more working PEs than its running VMs require, no VM is affected.

Consider that X is the number of failed PEs and that it is lower than the total number of available PEs. In this case, the X PEs are removed cyclically, one by one, from running VMs. This way, some VMs may continue running with fewer PEs than they initially requested. On the other hand, if after the failure the number of working Host PEs is lower than required to run all VMs, some VMs will be destroyed.
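A minimal sketch of the cyclic removal, assuming every failed PE is taken from a VM and that VMs are visited in list order. CyclicPeRemovalSketch and removeCyclically are hypothetical names; the real class works on Vm and Host objects rather than on plain arrays.

```java
import java.util.Arrays;

/** Standalone sketch of removing failed PEs from VMs one by one, in a cycle. */
final class CyclicPeRemovalSketch {
    /**
     * Removes up to failedPes PEs from the VMs, one PE per VM per turn.
     * Returns the PEs each VM is left with; the input array is not modified.
     */
    static int[] removeCyclically(int[] vmPes, int failedPes) {
        int[] left = vmPes.clone();
        int allocated = Arrays.stream(left).sum();
        int toRemove = Math.min(failedPes, allocated);   // cannot remove more than exist
        for (int i = 0; toRemove > 0; i = (i + 1) % left.length) {
            if (left[i] > 0) {       // VMs already at zero PEs are skipped
                left[i]--;
                toRemove--;
            }
        }
        return left;
    }

    public static void main(String[] args) {
        // Three VMs with 2, 2 and 1 PEs; 3 PEs fail.
        System.out.println(Arrays.toString(removeCyclically(new int[]{2, 2, 1}, 3)));
        // prints [1, 1, 0]
    }
}
```

In this example the third VM ends with zero PEs, which is the case where the VM would be destroyed.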

If all PEs are removed from a VM, it is automatically destroyed and a snapshot (clone) of it is taken and submitted to the broker, so that the clone can start executing on another host. In this case, all the cloudlets that were still running inside the VM are cloned too and restart executing from the beginning.

If a Cloudlet running inside a VM that was affected by a PE failure requires Y PEs but the VM no longer has that many PEs, the Cloudlet will continue executing, but it will take longer to finish. For instance, if a Cloudlet requires 2 PEs but after the failure the VM was left with just 1 PE, the Cloudlet will take twice as long to finish.
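The slowdown in the example can be expressed as a simple ratio. This sketch assumes the Cloudlet's work divides evenly across PEs with no scheduling overhead, which is a simplification. CloudletSlowdownSketch and finishTimeSecs are hypothetical names.

```java
/** Standalone sketch: finish time of a Cloudlet when its VM lost some PEs. */
final class CloudletSlowdownSketch {
    /**
     * @param originalSecs time the Cloudlet needs when it gets all required PEs
     * @param requiredPes  PEs the Cloudlet requires
     * @param availablePes PEs the VM still has after the failure
     */
    static double finishTimeSecs(double originalSecs, int requiredPes, int availablePes) {
        if (requiredPes <= 0 || availablePes <= 0) {
            throw new IllegalArgumentException("PE counts must be positive");
        }
        int usable = Math.min(requiredPes, availablePes);   // extra PEs bring no speed-up
        return originalSecs * requiredPes / usable;
    }

    public static void main(String[] args) {
        // 2 PEs required, 1 PE left: the Cloudlet takes twice as long.
        System.out.println(finishTimeSecs(100, 2, 1));
    }
}
```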

NOTES:

  • Host PE failures may happen after all of the Host’s VMs have finished executing. As a result, the presented simulation results may show that the number of PEs in a Host is lower than the number required by its VMs. In this case, the VMs shown in the results finished executing before some of the failures happened. This is easy to confirm by analysing the logs.
  • Failure inter-arrival times are defined in hours, since the second is too small a time unit for such values, and it doesn’t make sense to define a number of failures per second. Therefore, the generator of failure arrival times given to the constructor works in hours, even though the simulation time unit is seconds. Since Cloudlets commonly take just a few seconds to finish, mainly in simulation examples, failures may happen only after the Cloudlets have finished. One should therefore make sure that the Cloudlets’ lengths are large enough for failures to happen before they end.
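As a rough aid for the last note, the sketch below estimates how long a Cloudlet must be to still be running when a failure arrives. It assumes the Cloudlet runs alone on one PE at a constant speed and that the expected failure time has already been converted to seconds. CloudletLengthSketch and minLengthMI are hypothetical names.

```java
/** Standalone sketch: minimum Cloudlet length so that a failure can hit it. */
final class CloudletLengthSketch {
    /**
     * @param peMips              capacity of the PE running the Cloudlet, in MIPS
     * @param expectedFailureSecs simulation time (seconds) of the expected failure
     * @return the smallest length, in million instructions (MI), that keeps
     *         the Cloudlet running until that time
     */
    static long minLengthMI(double peMips, double expectedFailureSecs) {
        return (long) Math.ceil(peMips * expectedFailureSecs);
    }

    public static void main(String[] args) {
        // A failure expected after one hour (3600 s) on a 1000 MIPS PE.
        System.out.println(minLengthMI(1000, 3600));
    }
}
```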

For more details, check Raysa Oliveira’s Master Thesis (only in Portuguese).

Author: raysaoliveira

See also: SAP Blog: Availability vs Reliability

Constructors

HostFaultInjection

public HostFaultInjection(Datacenter datacenter)

Creates a fault injection mechanism for the Hosts of a given Datacenter. Host failures are randomly injected according to a UniformDistr pseudo random number generator, which indicates the mean number of failures to be generated per hour (also called the event rate or rate parameter).

Parameters:
  • datacenter – the Datacenter whose Hosts will have failures randomly injected

See also: HostFaultInjection(Datacenter, ContinuousDistribution)

HostFaultInjection

public HostFaultInjection(Datacenter datacenter, ContinuousDistribution faultArrivalHoursGenerator)

Creates a fault injection mechanism for the Hosts of a given Datacenter. Host failures are randomly injected according to the given pseudo random number generator, which indicates the mean number of failures to be generated per hour (also called the event rate or rate parameter).

Parameters:
  • datacenter – the Datacenter whose Hosts will have failures randomly injected
  • faultArrivalHoursGenerator – a pseudo random number generator that produces the times at which Host failures will occur. The values returned by the generator are interpreted as hours. A PoissonDistr is frequently used to generate failure arrivals, but any ContinuousDistribution can be used.
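To illustrate what a rate parameter means for arrival times: in a Poisson process with a rate of λ failures per hour, the gaps between failures are exponentially distributed with a mean of 1/λ hours. The sketch below samples such gaps with java.util.Random. It is a standalone stand-in, not the PoissonDistr class, whose sampling details are not described here. FaultArrivalSketch and nextHours are hypothetical names.

```java
import java.util.Random;

/** Standalone sketch: inter-arrival times (in hours) for a given failure rate. */
final class FaultArrivalSketch {
    private final Random random;
    private final double faultsPerHour;

    FaultArrivalSketch(double faultsPerHour, long seed) {
        if (faultsPerHour <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.faultsPerHour = faultsPerHour;
        this.random = new Random(seed);
    }

    /** Next gap between failures, in hours (inverse-transform sampling). */
    double nextHours() {
        // nextDouble() is in [0, 1), so the argument of log is in (0, 1].
        return -Math.log(1.0 - random.nextDouble()) / faultsPerHour;
    }

    public static void main(String[] args) {
        FaultArrivalSketch arrivals = new FaultArrivalSketch(0.5, 1L); // one failure every 2 hours on average
        for (int i = 0; i < 5; i++) {
            System.out.printf("next failure in %.2f hours%n", arrivals.nextHours());
        }
    }
}
```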

Methods

addVmCloner

public void addVmCloner(DatacenterBroker broker, VmCloner cloner)

Adds a VmCloner that creates a clone for the last failed Vm belonging to a given broker, when all VMs of that broker have failed.

This is optional. If a VmCloner is not set, VMs will not be recovered from failures.

Parameters:
  • broker – the broker to set the VmCloner for
  • cloner – the VmCloner to set

availability

public double availability()

Gets the Datacenter’s availability as a value between 0 and 1 (where 1 means 100%), based on VMs’ downtime (the time VMs took to be repaired).

availability

public double availability(DatacenterBroker broker)

Gets the availability for a given broker as a value between 0 and 1 (where 1 means 100%), based on VMs’ downtime (the time VMs took to be repaired).

Parameters:
  • broker – the broker whose VMs’ availability is to be computed
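A common way to turn uptime and repair figures into an availability between 0 and 1 is MTBF / (MTBF + MTTR). The sketch below applies that textbook formula. Whether this class computes availability exactly this way is an assumption, as is returning 1 when no fault has happened. AvailabilitySketch is a hypothetical name.

```java
/** Standalone sketch: availability from MTBF and MTTR (both in minutes). */
final class AvailabilitySketch {
    static double availability(double mtbfMinutes, double mttrMinutes) {
        if (mtbfMinutes < 0 || mttrMinutes < 0) {
            throw new IllegalArgumentException("times must not be negative");
        }
        if (mtbfMinutes + mttrMinutes == 0) {
            return 1;   // assumption: no recorded faults means full availability
        }
        return mtbfMinutes / (mtbfMinutes + mttrMinutes);
    }

    public static void main(String[] args) {
        // 90 minutes between failures on average, 10 minutes to repair.
        System.out.println(availability(90, 10));
    }
}
```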

generateHostFault

public void generateHostFault(Host host)

Generates a fault for all PEs of a Host.

Parameters:
  • host – the Host to generate the fault for

generateHostFault

public void generateHostFault(Host host, long numberOfPesToFail)

Generates a fault for a given number of random PEs of a Host.

Parameters:
  • host – the Host to generate the fault for
  • numberOfPesToFail – number of PEs that must fail
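One simple way to choose a given number of random PEs is to shuffle the PE indices and take the first ones. The sketch below does that. It is an illustration only, and how this method actually selects PEs is not described here. RandomPeSelectionSketch and pickPesToFail are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Standalone sketch: picking which PEs of a host fail. */
final class RandomPeSelectionSketch {
    /**
     * Returns the indices of the PEs chosen to fail. If more PEs are requested
     * than the host has working, all working PEs are returned.
     */
    static List<Integer> pickPesToFail(int workingPes, long numberOfPesToFail, Random random) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < workingPes; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, random);   // random order without repetition
        int count = (int) Math.max(0, Math.min(numberOfPesToFail, workingPes));
        return new ArrayList<>(ids.subList(0, count));
    }

    public static void main(String[] args) {
        // Fail 3 of the 8 working PEs of a host.
        System.out.println(pickPesToFail(8, 3, new Random(42)));
    }
}
```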

getDatacenter

public Datacenter getDatacenter()

Gets the datacenter in which failures will be injected.

getLastFailedHost

public Host getLastFailedHost()

Gets the last Host for which a failure was injected.

Returns: the last failed Host or Host.NULL if no Host has failed yet.

getMaxTimeToFailInHours

public double getMaxTimeToFailInHours()

Gets the maximum time to generate a failure (in hours). After that time, no failure will be generated.

See also: getMaxTimeToFailInSecs()

getNumberOfFaults

public long getNumberOfFaults()

Gets the total number of faults which affected all VMs from any broker.

getNumberOfFaults

public long getNumberOfFaults(DatacenterBroker broker)

Gets the total number of Host faults which affected all VMs from a given broker or VMs from all existing brokers.

Parameters:
  • broker – the broker to get the number of Host faults affecting its VMs, or null to count Host faults affecting VMs from any broker

getNumberOfHostFaults

public int getNumberOfHostFaults()

Gets the total number of faults that happened on existing hosts. This isn’t the total number of failed hosts, because one host may fail multiple times.

getRandomRecoveryTimeForVmInSecs

public double getRandomRecoveryTimeForVmInSecs()

Gets a pseudo random number used as the recovery time (in seconds) for each VM that has failed.

meanTimeBetweenHostFaultsInMinutes

public double meanTimeBetweenHostFaultsInMinutes()

Computes the current Mean Time Between Host Failures (MTBF) in minutes. Since Hosts don’t actually recover from failures, there are no recovery times to simplify the computation of the Host MTBF, unlike the MTBF computed directly for VMs.

Returns: the current mean time (in minutes) between Host failures (MTBF) or zero if no failures have happened yet

See also: meanTimeBetweenVmFaultsInMinutes()
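Without recovery times, an MTBF can still be estimated from the recorded failure times alone. The sketch below averages the gaps between consecutive failures, counting the first gap from simulation time zero. That choice, and the names HostMtbfSketch and meanTimeBetweenFaultsInMinutes, are assumptions made for illustration and are not necessarily how this method computes its result.

```java
import java.util.Arrays;

/** Standalone sketch: MTBF estimated from failure times only. */
final class HostMtbfSketch {
    /**
     * @param faultTimesSecs simulation times (seconds) at which failures happened
     * @return the mean gap between failures in minutes, or 0 if there were none
     */
    static double meanTimeBetweenFaultsInMinutes(double[] faultTimesSecs) {
        if (faultTimesSecs.length == 0) {
            return 0;
        }
        double[] sorted = faultTimesSecs.clone();
        Arrays.sort(sorted);
        double sumOfGaps = 0;
        double previous = 0;                 // first gap is measured from time zero
        for (double time : sorted) {
            sumOfGaps += time - previous;
            previous = time;
        }
        return sumOfGaps / sorted.length / 60.0;   // seconds to minutes
    }

    public static void main(String[] args) {
        // Failures at 10, 30 and 60 minutes: gaps of 10, 20 and 30 minutes.
        System.out.println(meanTimeBetweenFaultsInMinutes(new double[]{600, 1800, 3600}));
    }
}
```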

meanTimeBetweenVmFaultsInMinutes

public double meanTimeBetweenVmFaultsInMinutes()

Computes the current Mean Time Between Host Failures (MTBF) in minutes, considering the failures that affected VMs from any broker in the entire Datacenter. It uses a straightforward way to compute the MTBF: since the VM recovery times are stored, they can be used to simplify the computation, unlike for the Hosts’ MTBF.

Returns: the current Mean Time Between Host Failures (MTBF) in minutes or zero if no VM was destroyed due to Host failure

See also: meanTimeBetweenHostFaultsInMinutes()

meanTimeBetweenVmFaultsInMinutes

public double meanTimeBetweenVmFaultsInMinutes(DatacenterBroker broker)

Computes the current Mean Time Between Host Failures (MTBF) in minutes, considering the failures that affected VMs from a given broker. It uses a straightforward way to compute the MTBF: since the VM recovery times are stored, they can be used to simplify the computation, unlike for the Hosts’ MTBF.

Parameters:
  • broker – the broker to get the MTBF for
Returns: the current mean time (in minutes) between Host failures (MTBF) or zero if no VM was destroyed due to Host failure

See also: meanTimeBetweenHostFaultsInMinutes()
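When recovery times are known, a usual definition of MTBF is total uptime divided by the number of failures, where uptime is the observed time minus the time spent recovering. The sketch below applies that definition. It is an assumption that this method follows the same formula, and VmMtbfSketch and mtbfMinutes are hypothetical names.

```java
import java.util.Arrays;

/** Standalone sketch: MTBF for VMs using stored recovery times. */
final class VmMtbfSketch {
    /**
     * @param observedMinutes total observed time, in minutes
     * @param recoveryMinutes one recovery time (minutes) per fault
     * @return uptime divided by the number of faults, or 0 if there were no faults
     */
    static double mtbfMinutes(double observedMinutes, double[] recoveryMinutes) {
        int faults = recoveryMinutes.length;
        if (faults == 0) {
            return 0;
        }
        double downtime = Arrays.stream(recoveryMinutes).sum();
        return (observedMinutes - downtime) / faults;
    }

    public static void main(String[] args) {
        // 100 minutes observed, two faults that took 5 and 15 minutes to recover.
        System.out.println(mtbfMinutes(100, new double[]{5, 15}));
    }
}
```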

meanTimeToRepairVmFaultsInMinutes

public double meanTimeToRepairVmFaultsInMinutes()

Computes the current Mean Time To Repair failures of VMs in minutes (MTTR) in the Datacenter, for all existing brokers.

Returns: the MTTR (in minutes) or zero if no VM was destroyed due to Host failure

meanTimeToRepairVmFaultsInMinutes

public double meanTimeToRepairVmFaultsInMinutes(DatacenterBroker broker)

Computes the current Mean Time To Repair (MTTR) failures of VMs belonging to a given broker, in minutes. If a null broker is given, computes the MTTR of all VMs for all existing brokers.

Parameters:
  • broker – the broker to get the MTTR for or null if the MTTR is to be computed for all brokers
Returns: the current MTTR (in minutes) or zero if no VM was destroyed due to Host failure
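MTTR is conventionally the average of the individual repair times. The sketch below computes it from recovery times given in seconds and converts the result to minutes. The input format and the names MttrSketch and mttrMinutes are assumptions made for illustration.

```java
import java.util.Arrays;

/** Standalone sketch: mean time to repair from a list of recovery times. */
final class MttrSketch {
    /**
     * @param recoverySecs one recovery time (seconds) per failed VM
     * @return the mean recovery time in minutes, or 0 if no VM failed
     */
    static double mttrMinutes(double[] recoverySecs) {
        return Arrays.stream(recoverySecs).average().orElse(0) / 60.0;
    }

    public static void main(String[] args) {
        // Two VMs recovered in 60 s and 180 s: 2 minutes on average.
        System.out.println(mttrMinutes(new double[]{60, 180}));
    }
}
```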

processEvent

public void processEvent(SimEvent evt)

setDatacenter

protected final void setDatacenter(Datacenter datacenter)

Sets the datacenter in which failures will be injected.

Parameters:
  • datacenter – the datacenter to set

setMaxTimeToFailInHours

public void setMaxTimeToFailInHours(double maxTimeToFailInHours)

Sets the maximum time to generate a failure (in hours). After that time, no failure will be generated.

Parameters:
  • maxTimeToFailInHours – the maximum time to set (in hours)

startEntity

protected void startEntity()