Wednesday, September 29, 2010

OAM for FM and PM

The Operations, Administration, and Maintenance (OAM) functionality provided in all modern communications systems supports two distinguishable functions, namely Fault Management (FM) and Performance Management (PM).
It is important to remember that despite the use of the word “management” here, OAM is a user-plane function. OAM may trigger control plane procedures (e.g., protection switching) or management plane actions (such as alarms), but the OAM itself is data that runs along with the user data.

FM deals with the detection and reporting of malfunctions. ITU-T Recommendation G.806 defines a scale of such malfunctions :
  • anomaly (n): smallest observable discrepancy between desired and actual characteristics
  • defect (d): sufficient density of anomalies that interrupts some required function
  • fault cause (c): root cause behind multiple defects
  • failure (f): persistent fault cause such that the ability to perform the function is terminated

The main FM functions include :

  • Continuity Check (CC): checking that data sent from A to B indeed arrives at B
  • Connectivity Verification (CV): checking that data set from A to B does not incorrectly arrive at C
  • Loopback (LB): checking that data can be sent from A to B can be returned from B and received at A
  • Forward Defect Indication (FDI) also called Alarm Indication Signal (AIS): when data sent from A to B is destined for C, B reports to C that it did not receive data from A
  • Backward Defect Indication (BDI) also called Reverse Defect Indication (RDI): when data is sent from A to B, B reports to A that it did not receive the data.

PM deals with monitoring of parameters such as end-to-end delay, Bit Error Rate (BER), and Packet Loss Ratio (PLR). While there may not be loss of basic connectivity if performance parameters are not maintained within their desired realms, the ability to provide specific services may be compromised, even to the extent that there is a loss of service. For example, excessive round-trip delay makes it difficult to hold interactive audio conferences, and excessive PLR may lead to loss of an IPTV service. For this reason, Service Providers (SPs) commit to Service Level Agreements (SLAs) that specify the acceptable PM parameters.

A partial list of PM parameters that may appear in an SLA is :

  • BER or PLR (for packet oriented networks)
  • 1-way delay (1DM) also called latency: the amount of time it takes for data to go between two points of interest (this measurement requires clock synchronization between endpoints)
  • 2-way delay also called roundtrip delay (RTD): the amount of time it takes for data to go to a point of interest and return (does not require clock synchronization)
  • Packet Delay Variation (PDV): the variation of delay (may be 1-way or 2-way, but even 1-way does not require time synchronization, although frequency synchronization may be required for highly accurate measurements)
  • Availability: percentage of time that the service can be provided
  • Throughput or Bandwidth profile (for packet oriented networks): methods of quantifying the sustainable data rate (will generally be needed for each direction separately)

While certain FM functions, in particular Continuity Check (CC), are usually run periodically, PM functions are frequently called on an ad-hoc basis. However, with an SLA in effect, the SP needs to periodically monitor the PM parameters, and the customer may want to do so as well. In fact, while customers typically trust legacy SPs to provide the promised service level (after all, a 2.048 Mbps leased line is never going to deliver only 1.9 Mbps!), they have much less trust for newer services (it is relatively easy for a SP to cheat and provide 8 Mbps Ethernet throughput instead of the promised 10 Mbps).

In future entries I will deal with questions such as what parameter levels are needed for particular applications, how PM impacts user experience, and how SPs and customers should monitor performance.

Y(J)S