Monday, April 29, 2013

Physics (Installment 2 of a series)

In the previous installment of this series on NFV, SDN and related issues, I talked about the relationship between communications and computation (the merging of which I dubbed computications); now we are ready to take these ideas one step further.

A computational resource (such as a server or a laptop computer), as well as a communications resource (such as a router or a switch), may be a genuine physical entity. You can touch such resources with your hand; and thus we call such instantiations – hardware. On the other hand an algorithm or a protocol, as well as data input, output or transferred, can’t be held in your hand or seen with your eyes. So we fancifully call them (after Tukey) software. Software is more logic or mathematics (i.e., pure thought) than physics.

Hardware – the part of a malfunctioning computing system that you can kick.
Software - the part of a malfunctioning computing system that you can merely curse.
- Anonymous sage

However, it is commonplace to recognize that there is a spectrum between these two extremes. Closer to the software is the so-called firmware, which is software that is not frequently reprogrammed and lives in persistent memory.  Closer to the hardware is an FPGA which can be held in your hand, but whose true significance lies in its HDL programming. In between we can have a processor core in an FGPA, network and DSP processors that are programmed but can’t be efficiently replaced with software on a general purpose CPU, driver software, and so on.

 

Every time we plan implementation of a task, we need to consider where to instantiate it on this spectrum. Usually it is optimal to situate simple logic that needs to run at high speeds as close to the hardware as possible; while complex logic that needs to run at low speeds is best located close to the software end of the spectrum. However, in many cases multiple factors need to be considered before making a final placement decision.

Concretization (well, maybe the word is not often used in this way) is the act of moving a task usually implemented closer to the software end of the spectrum towards the hardware end (from right to left in the above figure). Thus when an algorithm originally coded in a general purpose programming language in order to run on a CPU is rewritten in assembler to run on a DSP, or when a task in DSP code is re-implemented in FPGA gates, or when an FPGA design is cast into discrete unprogrammable hardware, we have performed concretization of the task.

Virtualization is the opposite procedure, i.e., moving a task usually implemented closer to the hardware end of the spectrum towards the software end (from left to right in the aforementioned figure). Thus when a task originally carried out by analog circuitry is converted to a DSP-specific algorithm, or a task running on a DSP is transferred to a code running on a general purpose processor, we have virtualized the task. The term virtualization is frequently reserved for the extreme leap of taking hardware functionality directly to software.

The reasons for performing concretization are many, and mostly fairly obvious. When mass-produced, application specific hardware is less expensive than general purpose processing power; making the initial development investment worthwhile for functionality that is widely deployed. Special purpose hardware may be miniaturized and packaged in ways enabling or facilitating desired applications. Dedicated hardware may attain higher communications data-rates than its software emulation. Constrained designs may achieve higher energy efficiency and lower heat dissipation than more flexible instantiations of the same functionality.

The reasoning behind virtualization is initially harder to grasp. For functionality of limited scope it may not be economically worthwhile to expend the development efforts necessary to implement close to hardware. Flexibility is certainly another desirable attribute to be gained by implementing communications functionality in software; virtualization facilitates upgrading a communications device’s functionality, or even adding totally new functionality (due to new operational needs or new protocol developments).

Software Defined Radio (SDR) is a highly developed example of virtualization of physical layer processing. Transmitters and receivers were once exclusively implemented by analog circuitry, which was subsequently replaced by DSP code both for higher accuracy (i.e., lower noise) and to enable more sophisticated processing. For example, the primitive AM envelope detector and FM ring demodulator are today be replaced by digitally calculated Hilbert transforms followed extraction of instantaneous amplitude and frequency. This leads to consequent noise reduction and facilitation of advanced features such as tracking frequency drift and notching out interfering signals. The SDR approach leveraged this development and further allowed downloading of DSP code for the transmitter and/or receiver of interest. Thus a single platform could be an LF AM receiver, or an HF SSB receiver, or a VHF FM receiver, depending on the downloaded executable software. A follow-on development was cognitive radio, where the SDR transceiver can dynamically select the best communications channel available (based on regulatory constraints, spectrum allocation, noise present at particular frequencies, measured performance, etc.) and set its transmission and reception parameters accordingly.

A more recent example of virtualization is Network Functions Virtualization (NFV). Here the communications functionalities being virtualized are higher up the stack at the networking layer. This concept is being evangelized by the ETSI Industry Specification Group on NFV, based on service provider requirements. The specific pain being addressed is that each new service offered requires deploying one or more new hardware appliances throughout the service provider's network. This then engenders not only the capital expense of acquiring these new devices, but allocation of shelf space and power to accommodate them, and training staff to configure and maintain them. As the number of new services accelerates while their lifecycles shorten, acute pressure is placed on the service provider's bottom line. NFV proposes implementing new functionalities in industry standard servers (located in datacenters, POPs, or customer premises), thus reducing CAPEX, consolidating multiple equipment types, reducing time-to-market, and simplifying both initial deployment and maintenance. The network functions that may be virtualized may include packet forwarding devices such as switches and routers; security appliances such as firewalls, virus scanners, IDS, and IPsec gateways; network optimization functions such as NATs, CDNs, load balancers and application accelerators; cellular network nodes such as HLR, MME, and SGSN; etc. NFV proponents believe that it is the only way to keep up with explosion in nework traffic volumes and service types while restraining CAPEX.

Virtualization as we have just defined deals with the instantiation of the communications functionality itself; leaving open the issues of mechanisms used to configure and maintain devices implementing these functionalities. That topic will be the subject of the next installment.

Y(J)S

Thursday, April 25, 2013

Computications (Installment 1 of a series)

We have been hearing a lot recently about Network Functions Virtualization (NFV), Software Defined Networking (SDN), and how communications could learn a lot from computer science. This is the first blog entry of a series in which I will attempt to impose some structure on these ideas by exploring five distinguishable trends, which I shall dub, respectively :
  1. Computications (on the relationship between communications and computation)
  2. Physics (a propos the spectrum of implementation options between hardware and software, and the trends called SDR and NFV)
  3. Philosophy (concerned with touchless, configurable, and fully programmable devices, and the trend known as SDN)
  4. Geography (about where a particular functionality is located, e.g., locally or remotely)
  5. Politics (regarding distributed control planes and centralized management planes).
It was once simple and straightforward to distinguish between communications and computation, in fact the overlap was almost nonexistent. A telephone, a radio, and even a modem were solely for communications, while a slide-rule, a calculator, a mainframe and a minicomputer were for computation. And since most people don’t need to perform complex calculations in their everyday life, even the computer visionaries of the time believed computers to be hopelessly alien to the populace:

"I think there is a world market for maybe five computers."  - Thomas Watson, chairman of IBM (1943)

"There is no reason anyone would want a computer in their home."  - Ken Olson, president, chairman and founder of Digital Equipment Corp. (1977)

This has radically changed, mostly because the vast majority of computers are not used for computation at all. Most home computers are actually entertainment and communications devices, while those in professional environments are used almost exclusively for composing and reading documents. Of course the rendering of graphics and the decompression of videos and songs requires substantial computational power, but the average user has no idea that this computation is taking place.
In addition, telephones have been imbued with vastly more computational power than computers had ten years ago. A host of new devices such as ebook readers and tablets further blur the lines between computation, communications, and everyday life.

"Computing is not about computers any more. It is about living." - Nicholas Negroponte, founder of MIT’s Media Lab and One Laptop Per Child Association (1995)

I personally first appreciated this overlap of communications and computation when working on DSP for facsimile in the 1980s. A fax call starts with a set of tones to establish that both sides are fax machines, a handshake in which one side sends its capabilities and the other side sends its decisions as to which modes to use. This initial exchange is so similar to a human conversation that with a little experience anyone can follow it (and indeed many faxes let you hear the initial stages to diagnose problems). This exchange is a rather typical protocol (documented in ITU-T Recommendation T.30), and while both sides require software with a tremendously complex state machine, there isn’t really any computation in the usual sense of the term. However, the tone generators/detectors and modulator/demodulators do implement numerous numerical signal processing algorithms (sinusoid generation, spectral analysis, digital filtering, timing recovery, phased-lock loops, equalization, geometric slicing), and once the handshake is over the image is compressed and decompressed and error correction/detection is performed (as documented in ITU-T Recommendation T.4). So implementing the communications protocol requires implementing computation on both sides, and ITU documents may define algorithms in addition to protocols. All of this led me to wonder where to draw the dividing line between the protocol and the algorithms. My revelation was:

"Protocols are to communications as algorithms are to computation." - Y(J)S (circa 1985)

By that I meant that although both algorithms and protocols are step by step recipes for accomplishing a desired task, algorithms are logically carried out by a single actor (even when run on parallel computation devices), while protocols are used simultaneously by at least two distinct remotely situated actors. Of course, protocols require algorithms on each actor, but while non-communications algorithms interact with their environment solely via inputs and outputs, algorithms that support communications also interact with another instantiation of the same algorithm (or possibly an instantiation of a complementary algorithm) via the protocol.
Many years later, after I had delved more deeply into the fundamentals of telecommunications, I was confronted with an even more fundamental point of contact – the bit. In computation we use what I call the Tukey bit in contradistinction to the Shannon bit used in communications. The Tukey bit is simply a digit used to represent a number in binary representation. It is a contraction of “binary digit” and was coined by John Tukey, the Princeton statistics professor better known for having co-invented the FFT algorithm. Interestingly, Tukey also coined the word software.

The Shannon bit is the basic unit for quantifying information of any sort. Before Shannon’s work in information theory, it was not clear that any two bodies of information could be compared in the same way as the weights or lengths of physical bodies are compared. Shannon taught us that the information content of a song, a movie, a poem, and a thought can be quantified, and that the natural unit was the bit. The number of bits of information is the smallest number of yes/no answers one needs to provide to uniquely describe the information.
To understand why Tukey’s bit is a special case of Shannon’s bit, consider the special case in which I wish to communicate the value of a variable that can be from zero to fifteen, for instance 10. In Tukey terms such number can be described by 4 Tukey bits - 1010. In Shannon terms the minimum number of questions that I need to answer is four :
  1. Is the number over 7 ? Answer YES encoded as 1 (the number is 8, 9, 10, 11, 12, 13, 14, or 15).
  2. Is the number over 12? Answer NO encoded 0 (the number is 8, 9, 10, or 11).
  3. Is the number over 9? Answer YES encoded 1 (the number is 10 or 11).
  4. Is the number over 10? Answer NO encoded 0 (the number is 10).
The four answers are thus 1010, which indeed agrees with the Tukey representation.
However, the Shannon bit is still meaningful for arbitrary non-numerical information (think of game called “20 questions”) for which the Tukey bit is undefined. So information, the basic quantity transported by communications, generalizes the idea of number, the basic tenet of numerical computation.

However, there are certainly additional, possibly less subtle but more visible, connections between communications and computation. For example, dedicated communications hardware may be replaced by pure processing power, a trend embodied in Software Defined Radio (SDR) and Network Function Virtualization (NFV). Control and management plane protocols that discover and negotiate communications functionalities may be replaced by software running on servers, a trend called Software Defined Networking. We will visit these trends in the upcoming installments of this blog.

Wednesday, January 9, 2013

CAPping routed and SDN networks

In my last blog entry I discussed two differences between SDN and conventional networking. The second was that SDN returns to centralized control as compared to a distributed control plane (I’ll return to the first difference in just a moment). And this means that there is a single point of failure, even if the controller itself has built-in redundancy.

There has been discussion in the OpenFlow community about failover (and even load balancing) between multiple controllers. The OF reference implementation includes a simple mechanism wherein an OF switch can be configured with a list of controllers, and if the master controller fails the OF switch selects the next on the list. Open vSwitch allows sending messages from OF switches to multiple controllers, and enables these controllers to elect a master.

What I want to elucidate here is why such solutions can never provide a complete resolution to the problem. My explanation is based on the CAP theorem for distributed computing systems.

A distributed computing system is composed of multiple computational resources (CPU, memory, storage) that are connected via a communications network and together perform a task. To make things more concrete consider a database, which can queried and into which new data can be entered, via a large number of servers around the world, connected over the public Internet.

There are three characteristics of distributed systems that are universally desirable:
Consistency, meaning that the system responds identically to a query no matter which node receives the request (or does not respond at all);
Availability, i.e., that the system always responds to a request (although the response may not be consistent or correct); and
Partition tolerance, namely that the system continues to function even when nodes or the communications network fail.

In 2000 Eric Brewer conjectured that a distributed system can satisfy any two of these guarantees at the same time, but not all three. This conjecture was later proven by Gilbert and Lynch and is now called Brewer’s theorem, or the CAP theorem.

Without going into a formal proof, you can readily convince yourself of the correctness of the CAP theorem by requiring one of the characteristics and then proving that the other two can’t co-exist. For example, let's require consistency. There are basically two ways to handle data; we can have a single database that all nodes need to update and retrieve from, or maintain a local copies of the database at each node. If there is a single database, then consistency demands that once any node starts updating the database, all the other nodes must wait until the update is complete. If there are multiple copies, then each time one node updates its local copy, consistency demands it to send messages to all the others instructing them to update their copies, and to wait for acknowledgements from all the others. In the first case, a communications network failure that separates the node in question from the database immediately leads to lack of availability. In the second case a network failure that causes even a single node’s acknowledgement not to arrive also causes lack of availability. So availability and partition tolerance can’t be simultaneously achieved.

Now what does this have to do with networking? Well, SDN teaches us that packet forwarding is simply a computational problem. That was the first difference between SDN and conventional networks that I discussed in my last blog entry. And since the task of forwarding a packet from network ingress to network egress is obviously carried out by a large number of forwarding elements, the network of packet forwarding devices is a distributed computational system. And ergo the CAP theorem applies.

So which two of the three desirable characteristics of distributed systems do we want to achieve, and more importantly, which one are we willing to forego?

In conventional routed networks we forego consistency. Each router has its own local Forwarding Information Base (FIB), which is locally created and only weakly tied to the FIBs of the others. Lack of forwarding consistency means that packets can sometimes become misrouted or “black-holed” (go around in circles until they are dropped). That is the reason the IP packet header has a TTL field, and the reason TCP is layered over IP to retransmit lost packets. On the other hand, IP service unavailability is almost always due to a local break that physically disconnects the host from the IP network, and, as previously discussed, IP networks are extremely tolerant to partition failures.

At the other extreme is OpenFlow, which stresses consistency. The controller acts as a global database, and specific OF mechanisms ensure that a packet entering the network is handled consistently by all switches. OpenFlow also attempts to provide high availability, with mechanisms for handling the first packet of a new flow, and popular test suites examining scalability and availability. So, what must be sacrificed? The only remaining characteristic is partition tolerance.

So what is wrong with using multiple OF controllers? If you have read everything up to now you should understand that these controllers can’t be kept consistent even when the network connecting them fails without having them hang indefinitely.

Since you can't have your CAP and eat it too, every network architecture must decide which of consistency, availability, and partition tolerance it is willing to do without. On this issue conventional IP networks and SDN have taken different routes.

Y(J)S

Tuesday, October 9, 2012

SDN : A Backward Step Forward

Everyone in the industry is talking about Software Defined Networks or SDNs, but unfortunately no two people using the term mean precisely the same thing. However, after hearing dozens of academicians and even more networking converts, and after reading hundreds of press releases, I believe that when people mention SDN they are talking about a network based on one or both of two guiding principles – 1) fully programmable forwarding behavior and 2) centralized control.

Both of these principles are in fact revolutionary in that they challenge well established principles of packet switched networks (PSNs) and actually regress to communications technology as it existed before the development of PSNs and the Internet. As I will discuss below, fully programmable forwarding behavior as espoused by SDN is fundamentally at odds with network layering, and centralized control diametrically opposes the philosophy of distributed routing protocols. It is worthwhile understanding these tensions before considering whether it makes sense to discard everything that has been learnt about networking in the past thirty years.


Programmable behavior vs. network layering

When packet switched networks were first proposed as an alternative to circuit switched ones, it was realized that the subject was so complex that it had to be broken down into more readily digestible parts. That’s why network layers (the seven OSI layers) were first invented. Later it was realized that network layering (a la ITU-T G.800) enabled hierarchies of service providers to coexist. Layering is employed despite the fact that it significantly reduces the efficiency of communications networks by introducing constructs that are not mandated by Shannon’s separation theorem. For example, in a VoIP application one may go to great lengths to compress 10 milliseconds of telephony quality speech from 80 Bytes down to 10 Bytes, but to that payload one then adds an RTP header of at least 12 bytes, a UDP header of 8 bytes, an IPv4 header of 20 bytes, and an Ethernet header and trailer of at least 18 bytes, for a grand total of at least 48 bytes of overhead – almost five times the payload size!

So why do we employ all these layers? Simply because it makes the task at hand manageable! Each layer contacts the layer above it (except for the top layer, i.e., the application) and below it (except for the bottom layer, i.e., the physical layer) through well-defined interfaces. Thus, when a lower layer receives information from an upper layer, it should treat it completely transparently, without attempting to discern what belongs to each of the layers above it. Similarly, a layer should not expect any special treatment from lower layers, other than that defined in the interface description.

It is thus understandable that communications theory purists shun layer violations which allow one layer’s processing to peek at information belonging to another layer. The consequences of disobeying this principle may be dire. Consider what would happen if MPLS label switching processing were to depend on details of the IP packet format. Then we would need two versions – one for IPv4 and one for IPv6, and we would not have pseudowires (which explicitly exploit the fact that MPLS is not cognizant of aspects of its payload). What would happen if UDP were to care about RTP fields? Then every modification to these fields would require a modification to UDP; and if IP were dependent on UDP then the domino effect would continue on down, negating the benefits of layering.

Of course in practice some layer violations are condoned, but only when they are absolutely necessary optimizations and can be shown to be benign. Even then they often take a heavy toll, as can be seen in the following three examples:
  1. Network Address Translation (NAT) intertwines the IP layer with the transport layer above it. This optimization became absolutely necessary since the world ran out of IPv4 addresses faster than it could adopt IPv6, but it broke the end-to-end principle underlying IP. Although there was little choice, NAT technology needed to embrace complex ALGs, and then various hole-punching technologies such as TURN, STUN, and ICE. And NAT still breaks various applications.
  2. The IEEE 1588 Transparent Clock (TC) modifies a field in the 1588 header (which sits above Ethernet or IP) based on arrival times of physical layer bits. Without this optimization there is no way to compensate for variable in-switch dwell times, and thus no way of distributing highly accurate timing over multiple hops. However, once again this layer violation comes at a price. The 1588 parser needs to be able to classify arbitrary Ethernet and/or IP packet formats and locate the offset of the field to be updated; not only does this require a complex processor, it also requires updating every time the IEEE or IETF changes a header field. Furthermore, updating the TC correction field is not compatible with Ethernet-layer security (e.g., MACsec that protects the integrity of the Ethernet frame, and optionally encrypts it).
  3. It is common practice for Ethernet Link Aggregation (LAG) to determine which physical link to employ by hashing fields from layer 2, 3, and 4 headers. This three-layer kludge enables transcending the link bandwidth limitations of a particular parallel set of links, and is only allowed because it doesn’t extend past the physical links involved.
But then along came SDN. The guiding SDN principle of completely programmable behavior means that an SDN switch treats an incoming packet as a simple sequence of bytes. The SDN switch may examine an arbitrary combination of fields, and based on these fields takes actions such as rewriting the packet and forwarding it in a particular way. The SDN switch neither knows nor cares to which network layer the bytes it examines belong.

Thus by programming an SDN switch to forward packets based on Ethernet addresses it can act as a layer-2 switch. By having it forward based on IP addresses it can act as a layer-3 router. By telling it to look UDP or TCP ports it can behave as a NAT. By having it block packets based on information further in the packet it can act as a firewall. In principle, we could program an SDN switch to edit a packet and forward it based on hashing arbitrary fields anywhere in the packet. This provides awe-inspiring flexibility, especially when the SDN switch is implemented as software running in a VM. Of course this flexibility comes at the price of obliterating the concept of layering.

Centralized control vs. distributed routing protocols

The most veteran communications network in the world is the Public Switched Telephone Network (PSTN). Unlike routing in the Internet, when a telephone call needs to be routed from source to destination, a centralized computer system works out the optimum path. The optimization algorithms employed can be very sophisticated (witness AT&T's blocking of Karmarkar from disclosing the details of his algorithm!) taking into account overall delay, the present loading throughout the network, etc.

The chief problem with centralized control is that there is a single point of failure. So when ARPA sponsored the design of a network that needed to survive network element failures (no, despite popular belief, there was no design goal to survive nuclear attack), it was decided to rely on distributed routing protocols. Using these protocols routers speak with other routers, and learn enough about the underlying topology to properly forward packets. Local forwarding decisions miraculously lead to globally optimal paths.

Yet, the optimality just mentioned is that of finding the shortest path from source to destination, not that of optimally utilizing network resources. Routing protocols can support traffic engineering, but this means reserving local resources for a flow, not locating under-utilized resources elsewhere and pressing them into service.

But then along came SDN. The guiding SDN principle of centralized control means that the controller sees the entire network (or at least network elements that it controls) and if provided with suitable algorithms can route packets while optimally utilizing network resources. This is precisely what Google are doing in their inter-datacenter WAN SDN – filling up the pipes much more efficiently than could be done purely based on distributed routing protocols. Of course this efficiency comes at the price of reintroducing the problem with a single point of failure. And as a corollary to the CAP theorem it is fundamentally impossible to circumvent this problem.


SDN - is it worth it?

So, no matter how you define it, SDN is fundamentally incompatible with one or more of the well-established principles of PSN design. The question is thus whether the benefits outweigh the costs.

OSI-style layering was an important crutch when PSNs were first being developed, but leads to inefficiencies that should have been addressed long ago. These inefficiencies are not only in bandwidth utilization, they are also in complexity, e.g., the need for ARP in order to match up layer 2 and layer 3 addresses. Were it not for the sheer number of deployed network elements, one could even imagine replacing the present stack of Ethernet, MPLS, IP, and TCP with a single end-to-end protocol (IPv6 formatting might be suitable). This could be accomplished using SDN, but several problems would need to be solved first. The single point of failure argument is not made moot by positing that the controller hardware can be made sufficiently resilient. It is also necessary to make the controller sufficiently secure against malicious attacks. In addition, the boot-strap problem of how the controlled switch can reach and be reached by the controller with a conventional underlay needs to be convincingly resolved.

The previous paragraph addressed OSI-style layering, but what of ITU-style layering imposed in order to support hierarchies of service providers? Eliminating the need for these “layer networks” requires a new model of providing data and communications services. That new model could be the cloud. A cloud service provider which is also a network service provider, or which has business agreements with network service providers, could leverage SDN network elements to revolutionize end-to-end networking. One could envision a host device passing a conventional packet to a first network element in the cloud, which terminates all the conventional layers and applies the single end-to-end protocol header. Thereafter the SDN switches would examine the single header and forward so as to simultaneously ensure high QoE and high network resource utilization efficiency. Present SDN deployments are simply emulating a subset of features of the present network layers, and are not attempting to embody this dream.

SDN technology is indeed a major step backward, but has the potential of being a revolutionary step forward.

Sunday, August 19, 2012

Quality of Experience (QoE)

As a reader of this blog you are doubtless familiar with the concept of the Quality of Service (QoS) of a telecommunications service, by which we mean meeting defined levels of a set of measureable network parameters, such as availability, delay, delay variability, and information loss. The precise set of parameters depends on the service type; for example Bit Error Rate (BER) and Errored Seconds Ratio, are important to TDM services, while Bandwidth Profile and Delay percentiles are two of the parameters measured for Ethernet services.

On the other hand, you may be less familiar with the related concept of Quality of Experience (QoE).

QoE is defined as the acceptability of a service, as perceived subjectively by the end-user (see ITU-T E.800, P.10, G.1080, and the ETSI 2010 QoS QoE User Experience Workshop). It too depends on the service being provided, being diminished when the user perceives low voice or video quality, long response times, service outages, information loss, lack of service reliability, or inconsistent behavior. Unfortunately, the end-user may not always distinguish whether QoE degradation is due to a defect in the communications network or in an information processing resource; for example, response time to a database query is partially due to computational resource availability and speed, and partially due to network delays in both directions.

While QoE as defined above is absolute and subjective, for reasons that we will discuss below, it may be measured in comparative and/or objective ways. By absolute QoE we mean the quality perceived by an end-user based solely on the received information, while comparative QoE refers to the somewhat artificial case of an end-user who has access to the non-degraded information. Subjective QoE determination is the perception of a true end-user, while objective QoE means QoE estimated by an algorithm designed to correlate with true user perception.

Telecommunications Service Providers originally earned their income by providing basic connectivity, but now, in the age of free WiFi, Skype, hotmail, dropbox, and other free best-effort services, the service provider’s only justification for charging a fee is providing a certain QoE level. When the QoE remains above a certain threshold the service is perceived as good, and the end-user is content. Below that level but above some other threshold the end-user perceives service degradation, but is able to tolerate it. Below the lower threshold the user becomes frustrated and typically abandons the service; surveys show that a large percentage of users experiencing low QoE desert the service provider without ever complaining to the provider’s customer service department.

Unfortunately, direct measurement of QoE is often difficult, and so for many years guaranteeing QoS levels has served as a proxy for QoE guarantees. The theory is that for the QoE for a given application is always a function of the network QoS parameters
                                 QoE = f (application, QoS)
but until recently one could only guess the form of this function. However, it is important to emphasize that QoS does not map to QoE independently of the application. For example, for interactive applications such as voice conversations, low delay is critical while packet loss is relatively insignificant, while for others, e.g., progressive download over TCP, the opposite is true.

Unsurprisingly, the first QoE parameter to be directly measured was voice quality, since telephony was for many years the paramount telecommunications service. Telephony service providers promised “toll quality” speech (literally, quality for which they could charge a “toll”), and it was thus natural to specify what that meant. This QoE was quantified using the Mean Opinion Score (MOS), defined in ITU-T Recommendation P.800. MOS is measured by having a number of listeners subjectively scoring the speech quality on a scale from 1 (bad) to 5 (excellent), and averaging over these scores (finding the mean). Many variations are defined, including Absolute Category Rating (ACR) in which the listeners hear only the degraded speech, and a comparative method called Degradation Category Rating (DCR) in which the listeners hear both original and degraded speech and compare the two. The comparative method is used here because it often returns more accurate results.

Unfortunately, direct measurement of MOS in this fashion is an expensive and time-consuming task. So the ITU-T looked into ways of defining objective measures that could be automated. The first method developed was called PSQM (ITU-T P.861), and the second PESQ (ITU-T P.862). Both of these are objective comparative measures in that they compare degraded speech with the original telephone quality speech, using appropriate signal processing (such as computing a logarithmic scale frequency representation) to model the human auditory perception system. Similarly, PEAQ (ITU-R BS-1387) determines the quality of wideband audio. The particular methods were selected in competitions to have highest correlation with human MOS scoring.

PSQM, PESQ, and PEAQ are all comparative, and are thus not suitable for estimating quality in operational systems where only the degraded audio is available. This was rectified by the ITU-T P.563 single-ended method for measuring absolute objective speech quality. P.563 determines the un-naturalness of telephone-grade speech sounds and how much non-speech-like noise is present.

Another approach championed by the ITU-T is the E-model (Recommendation G.107). The E-model is a planning tool that predicts a mouth-to-ear “transmission rating factor” R between 0 and 100, with higher values signifying better voice quality. An R value should be uniquely convertible to a MOS level. The expression for R starts with the basic signal to noise ratio and reduces it to account for various impairments including simultaneous impairments (loudness, quantization noise), delay impairments (delay, echo), and equipment impairments (codec distortion, packet loss). On the other hand, R is increased to compensate for advantageous scenarios such as mobility (cellphone, satellite).

Several years before ITU-T’s P.563, ETSI TIPHON (Telecommunications and Internet Protocol Harmonization Over Networks) produced TS 101 329-5 on QoS measurement methodologies. Annex E of that document described VQMON a single-ended method for estimating the E-model factors for VoIP, based on network parameters.

But voice is not the only service for which QoE has been defined. The ITU-R produced BT.500 on the subjective assessment of television quality. It defines MOS-like scores - television sequences are shown to a group of viewers, and their subjective opinions are averaged.

Among the notable ITU-T Recommendations for video quality are :
  • P.910 Subjective video quality assessment methods for multimedia applications 
  • P.911 Subjective audiovisual quality assessment methods for multimedia applications 
  • P.920 Interactive test methods for audiovisual communications 
  • P.930 Principles of a reference impairment system for video 
  • P.931 Multimedia communications delay, synchronization and frame rate measurement 
  • J.143 User requirements for objective perceptual video quality measurements in digital cable television 
  • J.144 Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference. 
  • J.246 Perceptual audiovisual quality measurement techniques for multimedia services over digital cable television networks in the presence of a reduced bandwidth reference 
  • J.247 Objective perceptual multimedia video quality measurement in the presence of a full reference (PEVQ) 
  • J.341 Objective perceptual multimedia video quality measurement of HDTV for digital cable television in the presence of a full reference 
Since 1997 the principle body working on video quality is the Video Quality Experts Group (VQEG). VQEG has produced a tutorial on comparative (“full-reference”) objective assessment of television quality, and is working on others.
In addition to audio and video, the ITU has looked into multimedia and data applications. Recommendation G.1011 is a reference guide to existing standards for QoE assessment, and identifies a taxonomy for such standards.
Recommendation G.1010 discusses applications (conversational voice, voice messaging, streaming audio, videophone, one-way video, web-browsing, bulk data transfer, email, e-commerce, interactive games, SMS, instant messaging, etc.) and gives performance targets for delay, delay variation, and loss QoS parameters for each.
Recommendation G.1030 provides network planners with end-to-end (E-model-like) tools for applications over IP networks, with an appendix devoted to web browsing. The appendix presents empirical perception of users to response times, and proposes a MOS measure. This work is complemented by G.1050 which describes an IP network model that can be used for evaluating the performance of IP streams based on QoS parameters (delay, delay variation, and loss). Recommendation G.1070 proposes an algorithm that estimates videophone quality for planners. Other documents include J.163 on QoS for real-time services over cable modems, and X.140 on QoS parameters for public data networks.

Outside the ITU, the Broadband Forum (BBF) has produced TR-126, which is an excellent tutorial on QoE as well as a useful set of guidelines for the relationship between QoE and QoS for broadband triple play applications. The document commences with a definition of QoE that is consistent with that of the ITU-T, namely a measure of end-to-end performance from the user’s perspective, in contrast with QoS as metrics of network performance. TR-126 provides a clear relationship between the two, so that given a set of QoS measurements, one could predict the QoE for a user, and conversely given a target QoE, one could deduce the required network performance. TR-126 discusses QoE “dimensions”: service set-up, operation, and tear-down; QoE “facets”: user effort, application responsiveness, information fidelity, security, and dependability/availability; and the service, application, and transport “layers”. While QoE is quintessentially end-to-end, TR-126 breaks down the contribution of various segments, such as access technologies (e.g., DSL and PON), ISPs, and application service providers. Specific guidelines are given for video (various kinds of entertainment video, video conferencing, surveillance video, streaming video, …), voice (wired, wireless, voice messaging, IVR), and best-effort Internet data (web browsing, email, file transfer, VPN, P2P, ecommerce, …).

The TeleManagement Forum (TMF), as could be expected, has documents discussing QoE from the Service Level Agreement (SLA) management perspective. TMF’s Wireless Services Measurement Handbook GB923 defines Key Quality Indicators (KQIs) and Key Performance Indicators (KPIs), similar to QoE scores and QoS parameters. KQIs experienced by end-users may in principle be determined from KPIs (although the mapping may be complex), while KPIs are derived from QoS parameters. The TMF has defined a set of KQIs including response time, service availability, speech/video quality, transaction rate, offered throughput, etc. An SLA consists of a set of thresholds for KQIs and KPIs, and these are specified in the SLA Management Handbook GB917 and its Application Notes.

The Apdex Alliance is a group of collaborating companies that functions as a program under the auspices of the IEEE Industry Standards and Technology Organization (IEEE-ISTO). Its mission is to develop open standards that define standardized methods to report, benchmark, and track application performance. The Application Performance Index (Apdex) is a number between 0 and 1 that attempts to capture user satisfaction with an application. Zero signifies that no user would be satisfied, while 1 would mean that all users would be. More formally, users are divided into three categories, satisfied, tolerating, and frustrated; and the Apdex represents the ratio between the number of satisfied users and half the tolerating ones, to the total number of users.
                         Satisfied Count + Tolerated Count / 2
 Apdex = -----------------------------------------------------------------------------
               Satisfied Count + Tolerated Count + Frustrated Count
Apdex deconstructs application transactions into sessions (the “connect” time) consisting of processes (interactions accomplishing a goal) that are made up of tasks (individual interactions), and further into turns, protocols, and individual packets. The user is mainly aware of the task response time, since (s)he must wait for the task to complete before proceeding. For example, users may be satisfied if a web page completely loads within 2 seconds, and may tolerate the delay if it loads within 8 seconds. Above that frustration sets in.

The problem with all of the above subjective and objective QoE measures is that they are service/application-specific. Since new applications are coming out every day, and furthermore different users may use completely different features of a single application, it is no longer feasible to study each application in depth. A new approach being studied is behavioral QoE estimation, where the user’s satisfaction is gauged based on his actions and reactions. An extreme example is the high measured correlation between a user being unsatisfied with a service level, and his aborting the application (or at least waiting until the service level improves). Such behavioral QoE may be used to automatically map QoS to QoE for new applications, or may be used directly instead of traditional QoE measurement.

Y(J)S

Sunday, May 20, 2012

What is time ?

There is an anecdote about a foreigner visiting London asking a man on the street “what is time?” and receiving the answer “I’m sorry, but I am not a philosopher”.

I don’t want to discuss here the philosophical or physical questions of what time is, but rather what we mean by time in telecommunications applications. In particular, we frequently hear the terms “UTC”, “GPS time”, “NTP time”, and “1588 time”, and I would like to clarify what these terms mean.

Everything starts with the question “what is a second?”. Until 1960 the second’s duration was based on the rotation of Earth. Specifically, the second was defined as the unit of time of which there are precisely 24*60*60 =86,400 of them in a mean solar day. Unfortunately, the Earth’s rotation is slowing down due to tidal friction, and so between 1960 and 1967 the second was redefined as a particular fraction of the duration of the year 1900. Since it is hard to reproduce the year 1900 in the lab, the second was finally linked to a stable, reproducible, physical phenomenon, namely the radiation emitted when an electron transitions between the two hyperfine levels of the ground state of the cesium 133 atom. Cesium atomic clocks need only count 9,192,631,770 oscillations and declare that a second has passed. (Cesium is chosen because all of its 55 electrons except the outermost one are in stable shells, minimizing their effect on the outermost electron.)

Even such a stable phenomenon as the hyperfine transition is somewhat subject to variability (due to contaminants, undesired fields, and General Theory of Relativity corrections due to height above sea level) leading to variability on the order of a nanosecond or two per day. In order to remove even this small variability the TAI international time scale (TAI stands for “Temps Atomique International” or International Atomic Time) maintained by the International Bureau of Weights and Measures (BIPM) in Paris, is defined as the weighted average of over 300 atomic clocks located around the world (the higher weightings going to the more stable clocks).

TAI is precisely defined, but has become entirely divorced from the Earth’s rotation. Were we to adopt only TAI the time of day would slowly lose connection with the position of the sun in the sky, and after a long enough time we would be having breakfast at 12 noon. IN order to resynchronize the two definitions of the second UTC is defined. UTC stands for Coordinated Universal Time (the order of letters is a compromise between several languages), and it replaced older time standards such as “GMT”. It is defined in ITU-R Recommendation TF.460-6 to be TAI adjusted by leap seconds introduced to compensate for the changing of Earth’s rotational velocity. When to introduce leap seconds is now determined by the International Earth Rotation and Reference Systems Service (IERS). While leap seconds can be either positive or negative, and can be introduced at the end of any month, there have only been positive ones (corresponding to slowing down of Earth’s rotation) and they have only been introduced on the last day of June or December. There are presently proposals to eliminate leap seconds entirely (in which case TAI would be abolished), and perhaps introduce leap hours should the need arise.

UTC is now exactly 34 seconds behind TAI, because of a 10 second introduced in 1972 when the present system was adopted, and 24 positive leap seconds that have been declared since then. The next leap second will be at the end of June 2012, increasing the difference to 35 seconds.

Actually there are several versions of Universal Time. UT0 and UT1 are found by observing the motion of stars (UT0) or distant quasars (UT1), as well as from laser ranging of the Moon and artificial Earth satellites (such as GPS satellites). UT1R and UT2R are smoothed versions of UT1, filtered to remove periodic and stochastic variations in the Earth’s rotation. UT2R is smoother than UT1, and any variations left in it are because of erratic changes in the Earth’s rotation, due to plate tectonics and climate change.

So, what kind of time do we use in GPS and our time distribution protocols ?

The time of day reported by GPS, which is often called “GPS time”, is not UTC. Every GPS satellite has several on-board atomic clocks, and these clocks are set according to the master clock at the US Naval Observatory in Boulder Colorado. “GPS time” does not include leap seconds, but GPS satellites periodically transmit a UTC offset message for this purpose (the GPS-UTC offset field is 8 bits and can thus accommodate 255 leap seconds, which should be sufficient for several hundred years). Once thus compensated, USNO time is within tens of nanoseconds of UTC. However, it can take over 10 minutes until you receive an offset message.

It is interesting that the on-board atomic clocks must be corrected for relativistic effects. Since the satellites are moving at high speeds with respect to an observer on the ground, the Special Theory of Relativity predicts that the on-board clocks will seem to be running about 7 microseconds per day slower than were they stationary with respect to the observer. On the other hand, the General Theory of Relativity predicts that because the satellite is high above the Earth, and thus experiences a weaker gravitational field, the on-board clocks will seem to be running faster by about 45 microseconds per day. The net relativistic correction is about 38 microseconds per day. After compensating for relativistic effects, the accuracy of time derived from a good GPS receiver is about 50 nanoseconds.

NTP (and that included SNTP) distributes UTC (i.e., it does takes leap seconds into account) and specifies UTC in seconds since Jan 1, 1900. The NTP 64-bit timestamp consists of 32 bits of whole seconds (about 136 years until roll-over) and 32 bits of fractional seconds (about 233 picoseconds of resolution). However, any specific NTP server distributes time according to the stratum of its reference clock. Of course, the time a particular NTP client obtains depends on the network between the client and the NTP server. You can expect an NTP client to be within tens of milliseconds of its server on a LAN, but only 100s of milliseconds of error over the Internet. However, NTP allows a client to track several servers, and thus improve its accuracy.

IEEE 1588 distributes TAI according to UNIX epochs. Since the UNIX time epoch started Jan 1, 1970, 1588 time is now ahead of UTC by 24 seconds (soon to be 25). The 1588v2 10-byte timestamp consists of 48 bits of whole seconds, and 32-bits of nanoseconds. Once again, the precise time accuracy depends on the type of grand master to which the 1588 master is synchronized. The big difference between 1588 and NTP is the possibility of on-path support in the network. If you have Boundary Clocks (BCs) or Transparent Clocks (TCs) in your network, the time error should be very small (perhaps a microsecond or less). 1588 can't simultaneously track multiple masters, but it can choose the best one from a list.

So that's what we mean by time.

Y(J)S

Tuesday, May 8, 2012

iPhone Storms

A few years ago RAD’s president Zohar Zisapel asked me to accompany him to a meeting with another Israeli company concerning possible cooperation on an important issue. On our way I asked him what this important issue was. He replied the iPhone problem and I immediately understood.

He informed me that he had been in the US the previous week, and although he carried a Blackberry and not an iPhone, he had experienced inability to connect to the network even for voice calls, calls dropping in the middle, cell breathing (which he graphically described as the signal strength bars undulating up and down), and of course inability to connect to data services. Once back in Tel Aviv, he had contacted companies with whom RAD could cooperate in trying to solve the problem.

I had seen many reports on the problems AT&T was experiencing in New York and San Francisco since the introduction of Apple’s iPhone, but had not known it was really that bad. Obviously the iPhone brought significantly increased bandwidth usage due to users being “always on” and consuming more video streaming and other high-datarate services rather than just sporadically sending an email or downloading a file. However, networks in other parts of the world with many different kinds of smartphones were not experiencing such catastrophic failures; in fact, many operators with whom I had spoken were not observing any problems at all!

What could be causing these problems? There were really only three possibilities:
  1. lack of resources in the air interface (known as spectrum crunch or spectral exhaustion),
  2. under-provisioning of the backhaul network,
  3. failure of the signaling servers (due to what are known as signaling storms);
and if the second item was the problem (or at least a major chunk of it), then RAD was uniquely positioned to help.

Why did we expect that the second problem to be at the root of the problem? Well, the backhaul network is extremely cost sensitive, and increasing bandwidth there was an expensive and time consuming task. We didn’t expect the air interface to be already congested (although we expected the spectrum to eventually become exhausted) since AT&T had already deployed HSPA+. We ruled out signaling as the major issue, since denser networks of smartphones were not experiencing similar problems.

Of course we now know that we were completely wrong, and that signaling server failure was the major problem. The explanation was intimately related to the slim design of the iPhone, and to fact that Americans had never adopted text and multimedia messaging as avidly as Europeans did.

To understand what went wrong and how the issue was eventually solved, I need to explain 3G Radio Resource Control (RRC) states. The RRC protocol is the control plane between the 3G network and the UE (User Equipment, e.g., cellphone). It is responsible for handling many interactions such as locating the UE, waking it up, establishing/releasing connections for voice and data, and for sending SMS’es.

The UE can be in one of five possible RRC states, called Idle, URA_PCH, Cell_PCH, Cell_FACH, and Cell_DCH. In Idle mode the UE is only known to the network by its IMSI (telephone number), and only listens to system broadcasts and paging information. It only very rarely transmits (and even then only location updates) and barely uses its receiver (waking up periodically to check if it has been paged). Battery drain is thus extremely low. At the other extreme is the Cell Dedicated Channel state. Here the UE is using a dedicated high-speed data channel, and may be consuming 100 times more battery power. In between are the PCH states where the UE is connected but still relatively inactive, consuming only a little battery power; and the FACH state where the UE is using shared channels for exchange of small bursts of data, and consuming perhaps half of what it would consume in DCH.

Now, a UE in the Cell_PCH state that needs to send a short data packet (e.g., an application keepalive) will need to transition to Cell_FACH. It does this by sending a single signaling message and receiving a single reply. After sending its data packet, the UE will only drop back to Cell_PCH after a relatively long timeout (several seconds), and in the meantime will be wasting battery power. In order to conserve battery power many manufacturers, starting with RIM in its Blackberry, but more notably Apple in the iPhone and various manufacturers for Android devices, devised a trick. The UE sends a SCRI Signaling Connection Release Indication message, a message that was intended to convey that some unexpected error has occurred in the UE, and that the network should immediately release its connection. The UE drops into the Idle state, with almost no battery drain. However, the network effectively forgets it, and the next time the UE needs to transmit something, it needs to go from idle state to FACH, which is a signaling-intensive (over 25 messages) and lengthy operation.

The consequences of this trick were not very apparent when it was only used by Blackberry handsets, which are mainly used for email and occasional short data transfers. On the other hand, iPhone users tend to continually pull and push data, watch and stream videos, and are generally “always on”. In addition, the iPhone’s iconic slimness meant that Apple couldn’t use anything larger than a 1400 mAh battery, so that Apple was particularly aggressive in sending SCRIs. Finally, in the US where SMS had never been as popular as in Europe, the signaling infrastructure was woefully undersized for millions of iPhones disconnecting and reconnecting to the network.

The initial resolution involved increasing server resources and freeing up bandwidth for signaling channels. The eventual solution was a signaling enhancement in 3GPP Release 8 called Fast Dormancy, which Apple adopted towards the end of 2010. This enhancement enables the UE to transition quickly from FACH state to PCH, rather than to Idle as in the trick. Thus the network remembers the UE, and it can rapidly transition back and forth between FACH and PCH states.

Of course, iPhones are not alone in having caused signaling storms. In mid 2011 the Android port of Angry Birds caused significant signaling traffic that stressed networks until an update solved the problem, and in January 2012 NTT Docomo suffered a 4½ hour outage in Tokyo due to an Android application that overloaded the signaling plane.

And according to many reports, spectral exhaustion is right around the corner.

Y(J)S

Monday, January 9, 2012

Jobs and Ritchie

October 2011 marked the passing away of two men well-known in the computation and communications industries. One was Steve Jobs. In his honor Apple, Microsoft, and Disneyland all flew their flags at half-staff. October 16, 2011, was declared "Steve Jobs Day" in California. President Obama gave a eulogy calling Jobs “among the greatest of American innovators … a visionary”.

The other was Dennis Ritchie.

Ritchie died alone. His passing was not mentioned on the TV news, and was not picked up as a major item by the press. The only formal recognition was the dedication to his memory of the Fedora 16 distribution. For those who don’t recognize his name, Ritchie is the R in K&R (Kernighan and Ritchie’s “The C Programming Language”), a book known by heart to everyone who has ever written in C. In addition to creating C and introducing many of the constructs of imperative programming, Ritchie, along with Ken Thompson, created the UNIX operating system. In fact, C was created as a vehicle to make UNIX more portable.

For his contributions to computer science, Ritchie was awarded the Turing Award, the Hamming Medal, and the US National Medal of Technology. Until his retirement in 2007, Ritchie was head of research at Lucent’s System Software Department.

The papers eulogized Jobs as a great inventor, but were not very specific as to what precisely he invented. Of course they extolled technologies and devices with which his name is connected - the Apple 2, the MacIntosh, the mouse+icon GUI, the iPod, iPhone, and iPad, but mostly admitted that his contributions were in the area of design and evangelization, rather than invention. What they omitted was his major invention – his amazingly successful method of monetizing. Bill Gates convinced people to pay for software rather than receive it free of charge when purchasing hardware, but it was Steve Jobs who convinced people to give him a 30% royalty on third-party software (and music and videos) just in order to use it on his hardware.

In contrast, Ritchie convinced his employer AT&T to distribute UNIX to universities, under license but free of charge. The sources (mostly in C) were widely circulated in the book form and enabled programmers to enhance its features as well as to create their own software. After its divestiture AT&T was allowed to market software and quickly changed Unix System V into a proprietary closed system. This prompted a group at Berkeley to continue development of the BSD UNIX as an Open Source alternative, Ritchie to help in the development of the GNU free version of UNIX, and eventually Linux Torvalds to create Linux.

The computer industry is now segmented into Microsoft, Google/Android, and Apple. Microsoft’s most important asset is its Windows Operating System; this indeed is not based on UNIX but is programmed in C++ and promotes C#, two direct descendants of C. Google’s Android may exploit the Java language, not a direct descendent of C, but is itself based on Linux, a descendant of UNIX. And Job’s Apple uses the iOS operating system, a version of UNIX, and Objective C language – a derivative of C. So while Job’s influence is limited to a small a minority of PCs and one sector of the smartphone market, there is no mainstream computer or smart device without Ritchie’s fingerprints all over it.

Y(J)S

Sunday, December 25, 2011

The meaning of Apple's '647 patent

On December 19th the U.S. International Trade Commission (ITC) issued its final determination on Apple's claims against HTC of Taiwan, finding that HTC violated Section 337 of the Tariff Act by selling Android phones containing a technology that infringed a patent held by Apple. Section 337 enables the ITC to block importation into the US of foreign products that unfairly compete with domestic products, and infringing a valid US patent is considered such an unfair practice. Recently Section 337 investigations are being used more and more as faster and lower cost alternatives to enforcing US patents against foreign entities through litigation in district courts.

Of the 10 patents originally claimed by Apple to be infringed, the ITC rejected all but two in an earlier ruling, and in the final determination reduced this further to a single patent. The two patents in question are 5,946,647 entitled System and Method for Performing an Action on a Structure in Computer-Generated Data (filed Feb. 1 1996 and granted Aug. 31 1999) and 6,343,263 entitled Real-time Signal Processing System for Serially Transmitted Data (filed Aug. 2 1994 and granted Jan. 29, 2002). The ITC found that HTC did not infringe the '263 patent that protects the use of a Hardware Abstraction Layer to isolate real-time code from architectural details of a specific DSP.

The '647 patent discloses a system wherein a computer detects structures in data (such as text data, but possibly digitized sounds or images), highlights these structures via a user interface, and enables the user to select a desired action to perform. Apple's complaint to the ITC gives as an example of infringement the detection and highlighting of a phone number (e.g., in a received SMS) and enabling the user to click to call that number.

I have seen in blogs and forums many completely erroneous statements about what this patent actually means. People have claimed that '647 can't be valid, as hyperlinks or regular expression matching or SQL queries clearly predate the filing. However, a careful reading of the '647 patent shows that it does not claim to cover such obviously prior art. The following analysis is based on the text of the patent and on documents openly available on the web, and should not be considered legal advice.

After eliminating text required for patent validity (an input device, an output device, memory, and a CPU) the invention of '647 has three essential elements. First, an analyzer server parses the input data looking for patterns (called "structures"). Second, via an API the user-interface receives notice of the detected structures and possible actions for each one; displays the detected structures to the user; offers the user a list of actions that can be performed for each structure; and receives the user's selection. Third, an action processor performs the user's selected action (possibly launching new applications). The text of the '647 patent gives as an example the regular expression parsing of an email to find phone numbers, postal addresses, zip-codes, email addresses, and dates, and enabling the user to call a phone number, enter addresses into a contact list, send a fax to a number, draft an email, and similar actions.

Of course plain hyperlinks that are manually inserted into HTML are not covered by this patent since they are not automatically detected by an analyzer. A regular expression engine can potentially be used as an analyzer (although not necessarily by all embodiments as the patent mentions neural networks matching patterns in sounds and images) but is not claimed. The automatic parsing of a document for a list of patterns without offering a list of actions to a user is also not protected; indeed the Rufus file-type classifier is cited as prior art. Even the use of a regular expression engine to parse text and insert hyperlinks into a document is considered prior art, as the application references the Myrmidon text-to-html converter. It is possible that an editor or IDE that offers possible completions of text being typed would be considered infringing, depending on how broadly the patent's concept of input device is interpreted.

The three elements of the '647 patent are all present in many applications and devices used today. Users of Microsoft's Outlook are familiar with its automatic hyperlinking of email addresses and URLs in received messages. My old 2004 Sony-Ericsson K700 2G phone automatically highlights phone numbers in SMSes enabling single-click calling. However, Apple has targeted a very specific infringement - Android's Linkify class. Linkify enables the definition of a list of regular expression patterns to be detected, and a corresponding list of schemes, i.e., actions the user can select to be executed. It even comes with a few pre-defined patterns - email addresses, postal addresses, phone numbers, and URLs - which are almost precisely the examples given in the '647 patent.

While Apple's claims of infringement of '647 may be selective, they are not frivolous. In order to invalidate '647 the Android community would need to find publication of all three essential elements before 1996. I am sure that they have tried.

Removal of the Linkify feature from Android phones will put them at a definite ease-of-use disadvantage in comparison with the iPhone. And HTC has been given until April 19th 2012 to do just that.


Y(J)S

Wednesday, November 30, 2011

On exa, zetta, and beyond

Anyone who lives in metric system countries knows what "kilo" means. A kilogram is 1000 grams, a kilometer is 1000 meters. Of course frequencies are measured in kiloHertz and in the computer world we have kilobits and kilobytes (although we are never quite sure if that is 1000 or 1024!).

Most people even know that "mega" means a million. Power stations output megawatts of electricity, FM radios receive at megaHertz frequencies, and atomic bombs deliver megatons. For years our disks were measured in megabytes, and for most of us our Internet connections are in megabits (although we are not quite sure whether that is 1,000,000 of 1024*1024!).

People with state-of-the-art computers are aware that giga means a (US) billion (a thousand million), and that tera means a thousand of those, but only because disk capacities have increased so rapidly. When you ask people what comes next, you tend to get puzzled looks. Most people aren't even sure whether when they say a billion they mean a thousand million or a million million, so don't expect them to be expert in anything bigger than that!

Up to now only astrophysicists were interested in such large numbers, but with global data traffic increasing at over 30% per year, networking people are getting accustomed to them as well.

For those who are interested, the following numbers are peta (10^15), exa (10^18), zetta (10^21), and finally yotta (10^24). The last two names were only formally accepted in 1991. For those who prefer powers of two, the
IEC has standardized kibi (Ki) for 2^10, mebi (Mi) for 2^20, gibi (Gi) for 2^30, tebi (Ti) for^40, etc., although these terms don't seem to have caught on.

Several years ago I heard that the total amount of printed information in the worlds' libraries does not exceed a few hundred petabytes. On the other hand, present estimates are that global IP traffic now amounts to about 30 exabytes per month, or about ten times the world's accumulated printed knowledge every day. By the middle of this decade should surpass 100 exabytes per month, i.e., about the entire world's printed knowledge per hour.

These datarates, and particularly their time derivatives, present the telecommunications community with major challenges. We have grown accustomed to sophisticated applications that transfer massive amounts of data. A prime example is the new breed of cellphone voice/meaning recognition that sends copious amounts of raw data back to huge servers for processing. Such applications can only continue to be efficiently and inexpensively provided if the transport infrastructure can keep up with the demand for datarates.

And that infrastructure is simply not scaling up quickly enough. We haven't found ways to continually increase the number of bits/sec we can put into long-distance fiber to compensate for >30% annual increase in demand (although new research into mode division multiplexing may help). Moore's law is only marginally sufficient to cover increases in raw computation power needed for routers, but we will need Koomey's law for power consumption (MIPS / unit of energy doubles every year and a half) to continue unabated as well. And we haven't even been able to transition away from IPv4 after all of its addresses were exhausted!

If we don't find ways to make the infrastructure scale, then keeping up with exponential increases in demand will require exponential increase in cost.


Y(J)S

Wednesday, November 23, 2011

My new CTO job

As you all probably know, I have changed job titles.
I am now RAD's Chief Technology Officer instead of (or perhaps in addition to?) Chief Scientist.

Our previous CTO, Prof. Daniel Kofman, is still in touch with the company. However, he is a bit busy since in addition to his position as Professor at Telecom ParisTech (formerly ENST), he has been appointed by France's Minister of Research and Innovation as director of LINCS (Laboratory of Information, Networking, and Communication Sciences), a new research center in Paris.

So, what will I do be doing? Well, I will no longer be managing any R&D teams. The physical layer DSL chip development department I used to run closed many years ago, and last year my DSP software development department was dissolved as well. With my new appointment my HW/FPGA/Innovations department has transitioned to the newly formed Hardware and Innovations department, and my software team is moving to the new Advanced Technologies department. The Algorithmic Research department will still report to me.

I will continue to be responsible for tracking fundamental technology trends, and to steer RAD's participation in standardization forums (IETF, ITU, MEF, BBF, etc.). I will be working with academic research groups here in Israel, and perhaps abroad as well. I will be spending more time on IPR work - over the last few years this work has tended to be more defensive than creative. I will be doing more lecturing and more writing, and will function as editor in chief of the RAD Series on Essentials of Telecommunications (more on that some other time).

And I hope to have more time to blog.

Y(J)S

Thursday, November 17, 2011

MPLS-TP update

At the MPLS Working Group meeting this week it was announced that the core set of MPLS-TP RFCs have been finished.

Indeed, we now have (I hope that I haven't missed too many):
•RFC 5586 MPLS Generic Associated Channel (G-ACh and GAL)
•RFC 5654 Requirements of an MPLS Transport Profile
•RFC 5718 An In-Band Data Communication Network for MPLS-TP
•RFC 5860 Requirements for OAM in MPLS Transport Networks
•RFC 5921 A Framework for MPLS in Transport Networks
•RFC 5950 Network Management Framework for MPLS-TP
•RFC 5951 Network Management Requirements for MPLS-TP
•RFC 5960 MPLS-TP Data Plane Architecture
•RFC 5994 Application of Ethernet Pseudowires to MPLS Transport Networks
•RFC 6215 MPLS-TP UNI and NNI
•RFC 6370 MPLS-TP Identifiers
•RFC 6371 OAM Framework for MPLS-TP
•RFC 6372 MPLS-TP Survivability Framework
•RFC 6373 MPLS-TP Control Plane Framework
•RFC 6374 Packet Loss and Delay Measurement for MPLS Networks
•RFC 6375 Packet Loss and Delay Measurement Profile for MPLS-TP
•RFC 6378 Linear Protection MPLS-TP
•RFC 6425 Detecting Data-Plane Failures in Point-to-Multipoint MPLS - Extensions to LSP Ping
•RFC 6426 MPLS On-Demand Connectivity Verification and Route Tracing
•RFC 6427 MPLS Fault Management Operations, Administration, and Maintenance (OAM)
•RFC 6428 Proactive Connectivity Verification, Continuity Check, and Remote Defect Indication for the MPLS-TP
•RFC 6435 MPLS Transport Profile Lock Instruct and Loopback Functions

In addition, before the IETF meeting the ITU issued a statement reasserting that the IETF holds the pen on MPLS-TP.

It seems that the game is over.

Y(J)S

Wednesday, November 16, 2011

The notorious IP checksum algorithm

I have been asked several times to explain the checksum calculation used in the IP suite (IPv4, TCP and UDP all utilize the same checksum algorithm).

RFC 791, which defines IPv4, gives the checksum algorithm as follows :
The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header. For purposes of
computing the checksum, the value of the checksum field is zero.
and the algorithm description was further updated in RFCs 1071, 1141, and 1624.


RFC 791 further states
This is a simple to compute checksum and experimental evidence
indicates it is adequate, but it is provisional and may be replaced by a CRC procedure, depending on further experience.

Back in 1981 when the RFC was written, Jon Postel already realized that this algorithm is very limited in its error detection capabilities (see below), but at the time CRC computation was too expensive computationally.

RFC 793, which defines TCP, says
The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header and text.

while RFC 768 for UDP says the same thing, but leaves a loophole
Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets

If the computed checksum is zero, it is transmitted as all ones (the equivalent in one's complement arithmetic). An all zero transmitted checksum value means that the transmitter generated no checksum (for debugging or for higher level protocols that don't care).

On this latter issue, RFC 1180 “A TCP/IP Tutorial” adds
An incoming IP packet with an IP header type field indicating "UDP" is passed up to the UDP module by IP. When the UDP module receives the UDP datagram from IP it examines the UDP checksum. If the checksum is zero, it means that checksum was not calculated by the sender and can be ignored. Thus the sending computer's UDP module may or may not generate checksums. If Ethernet is the only network between the 2 UDP modules communicating, then you may not need checksumming. However, it is recommended that checksum generation always be enabled because at some point in the future a route table change may send the data across less reliable media.

and RFC 1122 “Requirements for Internet Hosts” adds
Some applications that normally run only across local area networks have chosen to turn off UDP checksums for efficiency. As a result, numerous cases of undetected errors have been reported. The advisability of ever turning off UDP checksumming is very controversial.

IPv6, as defined in RFC 2460, doesn’t bother with a header checksum, but closes the UDP loophole
Unlike IPv4, when UDP packets are originated by an IPv6 node,
the UDP checksum is not optional. That is, whenever originating a UDP packet, an IPv6 node must compute a UDP checksum over the packet and the pseudo-header, and, if that computation yields a result of zero, it must be changed to hex FFFF for placement in the UDP header. IPv6 receivers must discard UDP packets containing a zero checksum, and should log the error.

So, how precisely does the IP checksum algorithm work, and why is it designed this way?

The simplest method to protect against bit errors would be to xor bytes (or 16-bit words) together. This method suffers from the disadvantage that two bit errors in the same column cancel out, leaving no trace. Checksums are slightly stronger since they add words together instead of xoring them. Thus, two bit errors in the same column indeed leave that column correct in the sum, but the carry to the next column will be different.

Why does the IP checksum algorithm take the ones complement after adding together all of the words? Since this is a one-to-one transformation, it obviously doesn’t reduce the number of undetected errors. It does, however, protect against one special case – that of all zeros. If somehow the entire packet were wiped out and replaced by all zeros, the sum would still be OK (sum of zeros is zero). By flipping the result we catch this kind of bug.

To compute the IP checksum of some sequence of an even number of bytes (if the length is odd one pads with a zero byte), one groups the bytes in pairs which are considered as 16-bit words. Were one to have a computer that employs ones complement arithmetic, the algorithm would be simple to describe. One adds all of these words together, and returns the negative of this sum.

Unfortunately, ones complement machines are no longer in vogue, and essentially all computers now use twos complement representation. Ones complement and twos complement agree on how to represent positive numbers - they have a zero in the MSB. They also agree that negative numbers have a one in the MSB, but disagree about all the rest. Neither simply set the MSB (that’s called “signed magnitude” representation). In ones complement machines the negative of a positive number (its ones complement) is made by flipping all its bits. In two complement machines one flips all the bits and then adds one. Note that in ones complement representation there are two zeros – positive zero is all zeros and negative zero is all ones. Twos complement has only one zero - all zeros (all ones means -1).

Because of the difference in representation, the addition algorithms are also somewhat different for the two machine types. Twos complement machines add bits from LSB to MSB, and discard any carry from the MSB. Ones complement addition similarly adds the bits, but if a carry remains from the MSB it is added back to the LSB.

So if everyone uses twos complement arithmetic today, why does the IP checksum algorithm use ones complement addition and ones complement negation? Well, perhaps when the checksum algorithm was chosen one’s complement machines were more common (sigh).

More importantly, ones complement arithmetic has two (minor) advantages.

The first has to do with big- and little-endian conventions. Saying that a machine uses twos complement arithmetic still doesn’t completely pin things down. When building larger integers from bytes big endian machines place the higher order bytes to the left of the lower ones, thus if A and B are bytes, AB means A*256+B. Little-endian machines do the opposite – AB means B*256+A. Ones complement arithmetic has an interesting characteristic – addition is the same for big-endian and little-endian machines. This is not the case for twos complement arithmetic due to the discarding of the MSB carries.

For example,
in ones complement FF.FF+02.00=02.00 while FF.FF+00.02=00.02
in twos complement FF.FF+02.00=01.FF while FF.FF+00.02=00.01
thus one can write generic IP checksum code that directly uses 16-bit words that runs correctly on little-endian or big-endian machines, without knowing which kind of machine you have and without putting in compilation conditionals (#ifdef).

Another reason for ones complement is that it is slightly better at catching errors. Remember that twos complement addition discards MSB carries, so two bit errors in MSB positions are not caught, while ones complement propagates these carries back to the LSBs, thus catching this type of error. The difference is minor (for large TCP or UDP payloads the percentage of two bit errors missed by XORing is 6.25%, twos complement summing misses 3.32%, while ones complement summing only misses 3.125%).

Do these two small advantages justify the added complexity of using ones complement arithmetic? Probably not, but it is too late to change. With the greater computational power now available, stronger error detection algorithms should now be implemented. However, when IP is sent via Ethernet, it enjoys Ethernet's Frame Check Sequence, which is not only a CRC rather than a checksum, but is 32 bits in length! This makes the IP checksums superfluous.

Y(J)S

Wednesday, October 5, 2011

Network Coding

In conventional communications networks the active network elements (e.g., Ethernet switches or IP routers) are store-and-forward devices. They perform no nontrivial computation. It turns out that in certain cases it is possible to optimize network operation (to conserve some network resource or to improve some network performance measure) by embedding more intelligence in the network elements.

In order to understand how this is done, it is useful to start with two special cases.

First case : Two individuals communicating via a satellite having a single downlink/uplink coverage beam.

A transmits to B via satellite S and B transmits back to A via the same satellite. Since A and B must share satellite resources (namely time and frequency), the uplink transmissions must be separated in either time or frequency. In the conventional case the downlink transmissions are separated as well. Thus, if it costs one cent to transmit an uplink message from A or B to the satellite, and similarly one cent for the downlink message from the satellite to A or B, then the exchange of two messages, one from A to B and one from B to A costs 4 cents.

But, this does not need to be the case! Rather than S transmitting A’s message to B and afterwards (or on another frequency) B’s message to A, it can transmit only once the message A xor B on a frequency and at a time when both A and B are listening. A retrieves B’s message by xoring the received message with his own message (since B = (A xor B) xor A), and B performs the same operation to retrieve the message from A (since A = (A xor B) xor B).

This reduces the price from 4 units to 3 units (since there are only three transmissions: A-S, B-S, S-A+B) at the cost of the satellite having to perform the simple operation of xoring two messages. The xor operation performed by the satellite is a kind of “coding” operation that leads to reduction of required network resources.

Second case : Using coding to protect real-time or broadcast transmissions against packet loss.

In real-time and broadcast transmissions it is not possible for a receiver to request retransmission of a lost packet, as TCP and ARQ systems do. Some critical control protocols send each packet multiple times (three times is common), but this is extremely wasteful in network resources. RFC 2198 proposes repeating the audio data from the previous RTP packet in the present one, thus maintain the number of packets per second, but still doubling bandwidth requirements. The FECFRAME working group in the IETF standardized more efficient mechanisms in RFCs 5053 and 6015. I will explain only the simplest possible coding.

Assume that we know that there will never be more than 1 packet loss in 4 consecutive packets. Then for every four packets transmitted, a fifth “protection” packet consisting of the xor of these four packets is sent. If all four packets are received then this fifth packet is discarded. If any single packet is lost then it can be recovered by xoring the received three packets with the fifth “protection” packet. Thus, packet loss can be mitigated with only an increase of 25% in bandwidth, and an increase in delay.

But what does this have to do with network coding? In both of the above cases an information source performed some nontrivial operation in order to conserve some resource or to protect against some network defect. The extension to full network coding requires simply that the computation be performed by some network element along the information path that is able to perform the network coding. Unfortunately, examples of network coding can be quite complex. The simplest one is the “butterfly network” (see Figure 1) presented in a paper by Ahlswede, Cai, Li, and Yeung entitled “Network Information Flow”.





In this example a source S needs to multicast two packets of information P1 and P2 to two destinations A and B over the particular network of network elements U, V, W and X shown in the figure. All of the links have the same bandwidth, which is precisely the bandwidth needed to transmit the packets in the desired time.





It turns out that S can send P1 and P2 to both A and B at once, as shown in Figure 2. Network elements U, V, and X are multicast devices that are able to replicate a packet received on its input port to both of its output port. Network element W performs network coding by calculating the xor of two packets received on its two input ports and sending this to its output port.



Were W not able to perform this operation, it would need first to send P1 and then P2, thus taking twice the time, or alternatively would require twice the bandwidth on the link to X (contrary to our assumption on link bandwidths). It is not hard to convince yourself that without the network coding it is not possible to perform the desired task.

Network coding can be used for purposes other than bandwidth or delay minimization and packet loss protection. Recent research has explored applications to energy reduction, information security, file sharing, congestion control, and fairness.

Y(J)S