Voice-over-IP as a standards-based technology is now roughly 22 years old, using the publication of H.323 and RTP as rough milestones. In those years, industry, academia and standards bodies have accomplished much: There are now robust, interoperable products, spanning all major sectors of telecommunications including enterprise desktop, call center, residential landline and mobile, through IMS, VoLTE and RCS. New applications benefiting the public, such as NG911 and video relay services, can draw on those standards and replace outdated legacy systems. Security has improved, with systems relying on cryptographic authentication and encryption rather than simply allowing anything that could find an RJ-11 plug to connect to the network.
However, voice communication is changing: some of the changes have been slowly gathering steam, while others are likely to seem more surprising. I will highlight a few themes:
Can VoIP be more than SS7 over IP?
VoIP-based systems use all the modern affordances, from web user interfaces and cloud to public key cryptography, yet the basic user-visible functionality for phones found on typical office worker desks still has not progressed much from the 1990s CLASS (Custom Local Area Signaling Services) features such as call waiting and call forwarding. We still struggle to add a third party to a call without worrying that the other side will get dropped. We still have to explain to callers that they just interrupted a meeting. At home, we get calls to voice lines from fax machines, or, if we are unlucky, somebody from a time zone 12 hours away calls at 3 am by accident or somebody calls our mobile phone during a wedding or funeral service. (For a few years now, Android and iOS have at least provided relatively functional do-not-disturb features that can prevent some of these annoyances, if one keeps one’s calendar in order.) Number portability in the United States has arcane restrictions that cannot be explained to any consumer. Except for some in-house calls and calls within a single cellular provider, calls are likely to be voice only, with the same G.711 voice “quality” that we have had since the 1960s. With few exceptions, only proprietary systems such as WhatsApp or Skype have experimented with new features and offer better voice quality. It is ironic that toll-quality used to mean “good”; now free international calls using over-the-top (OTT) apps sound better than the per-minute paid variety. Carriers routinely cite the age of their equipment to excuse the lack of any new features, as if charging for service while running equipment from the 1980s, which its manufacturers have long since given up on and which was fully depreciated a decade ago, were a sign of customer-focused engineering.
We can do better. Integrating personal address books, speech recognition (Alexa, Google Voice, among others) and calendars into enterprise systems is not that hard. Transitioning from TDM interconnection to end-to-end VoIP-based calls should not require an act of Congress (or, if there were the will, the FCC). Integrating real-time text into calls would be helpful not just for people with hearing loss.
“Just leave me alone”
Based on recent FCC estimates citing industry sources, about half of all residential voice calls are now unwanted robocalls, from outright scams to illegal telemarketing to unwanted surveys and election-related calls. Email faced a spam crisis in the 1990s; voice telephony now faces its own, and is likely to appear in general-interest news only because of a new scam or, lately, because politicians have discovered a bipartisan crowd pleaser in being against robocalls. We have all learned to ignore calls unless we recognize the caller ID, which defeats the purpose of an open communication system that should not require prior introduction.
There is some hope that, after long years of cajoling by regulators and some industry participants, carriers will implement call authentication, making spoofing of originating numbers much harder. Call authentication echoes the email authentication and sender validation efforts, such as DKIM, SPF and DMARC, from years ago. This won’t prevent unwanted robocalls by itself, but will make call filtering effective at scale. Enterprise call centers will need to plan on how they are going to participate in call authentication using the STIR/SHAKEN SIP call authentication framework, particularly if they are using outbound numbers that are not the same as their inbound numbers. Currently, the call authentication framework discussions are dominated by large carriers, and enterprises and cloud service providers need to get organized. The FCC has asked all carriers to authenticate all calls by the end of 2019 “or else.”
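To make the call authentication discussion a little more concrete, here is a minimal Python sketch of the token that STIR/SHAKEN attaches to a call: a PASSporT (RFC 8225) with the SHAKEN extension (RFC 8588), carried in the SIP Identity header. The function names, phone numbers and certificate URL are illustrative assumptions, and the attestation logic is a deliberately simplified reading of the A/B/C levels. The sketch stops at the string that would be signed; the actual ES256 signature needs a carrier-issued certificate and a cryptography library, both omitted here.

```python
import base64
import json
import uuid


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JSON Web Tokens require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def attestation_level(customer_known: bool, number_authorized: bool) -> str:
    """Simplified (hypothetical) mapping to SHAKEN attestation levels.

    A: the provider knows the customer and their right to use the number.
    B: the provider knows the customer but cannot vouch for the number,
       e.g. a call center presenting an outbound number from another carrier.
    C: the provider only knows where the call entered its network.
    """
    if customer_known and number_authorized:
        return "A"
    if customer_known:
        return "B"
    return "C"


def build_passport_signing_input(orig_tn, dest_tn, attest, iat, cert_url, origid=None):
    """Return the 'header.payload' string that would be signed with ES256."""
    header = {"alg": "ES256", "ppt": "shaken", "typ": "passport", "x5u": cert_url}
    payload = {
        "attest": attest,
        "dest": {"tn": [dest_tn]},
        "iat": iat,
        "orig": {"tn": orig_tn},
        "origid": origid or str(uuid.uuid4()),
    }

    def encode(obj):
        # PASSporT serializes keys in lexicographic order with no whitespace.
        return b64url(json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8"))

    return f"{encode(header)}.{encode(payload)}"


if __name__ == "__main__":
    # Example values only; the numbers and URL are placeholders.
    level = attestation_level(customer_known=True, number_authorized=False)
    print(build_passport_signing_input(
        "12155551212", "12125551213", level, 1700000000,
        "https://cert.example.org/passport.cer"))
```

The case singled out above, outbound numbers that differ from inbound numbers, is where a call center is most likely to end up with attestation "B" rather than "A" unless it arranges for its carrier to verify its right to use the number.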
Complaining that some robocalls are wanted and that these filters prevent consumers from receiving valuable calls will likely receive little sympathy, given how little the call center industry has done to make automated phone calls more than a nuisance. Longer term, legitimate commercial calls will need to be labeled with the type of call, such as “financial”, “survey” or “political”, and carry much more information about the caller, such as business locations and the nature of the call (“appointment reminder”), allowing consumers and service providers to offer more fine-grained filtering than a simple wanted-unwanted decision. If, despite accurate labeling, people do not want to get “valuable” reminders about overdue loans, it’s their home and their decision. Offering 15-character uppercase CNAM data, and mostly for landline callers, is not exactly what consumers expect in the age of million-pixel smartphones. SIP has offered these capabilities since 1996, but backward compatibility with 1980s TDM systems and a general lack of industry initiative seem to have prevented any significant improvements.
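As a rough illustration of what label-based filtering could look like on the receiving side, the following Python sketch applies per-user rules to a labeled call. Everything here is hypothetical: the `IncomingCall` fields, the rule tables and the action names are assumptions made for the example, not part of any standard, and the choice to send unauthenticated calls to voicemail is just one possible policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_ACTION = "voicemail"  # assumed fallback for calls no rule covers


@dataclass(frozen=True)
class IncomingCall:
    caller_name: str
    authenticated: bool              # did call authentication succeed?
    category: Optional[str] = None   # e.g. "financial", "survey", "political"
    purpose: Optional[str] = None    # e.g. "appointment reminder"


def decide(call: IncomingCall,
           purpose_rules: Dict[str, str],
           category_rules: Dict[str, str]) -> str:
    """Return "ring", "voicemail" or "block" for a labeled call."""
    # Labels on an unauthenticated call cannot be trusted, so ignore them.
    if not call.authenticated:
        return DEFAULT_ACTION
    # The most specific rule wins: purpose first, then category.
    if call.purpose in purpose_rules:
        return purpose_rules[call.purpose]
    if call.category in category_rules:
        return category_rules[call.category]
    return DEFAULT_ACTION


if __name__ == "__main__":
    # One user's preferences: reminders may ring, surveys never do.
    purposes = {"appointment reminder": "ring"}
    categories = {"survey": "block", "political": "voicemail"}
    call = IncomingCall("Example Dental", True, "medical", "appointment reminder")
    print(decide(call, purposes, categories))
```

The point of the sketch is that the decision belongs to the called party: the same labeled call can ring in one household and be blocked in another.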
Providers of cloud-based services used for outbound calls will likely have to be much more careful as to who uses their services for what purpose, or find their wholesale carrier cutting them off, just as high-volume senders of email with less-than-careful opt-in practices and faked sender addresses found themselves on email blacklists and saw their deliverability numbers tank. “Know your customer” will become not just something banks are required to worry about, but something that every cloud-based service provider and carrier will have to plan for.
“Can you please mute your line?”
Probably due to robust competition, conference call systems have improved markedly in the last decade, with robust screen sharing functionality, for example, but the basic interaction is still far too difficult and remote “dial-in callers” remain distinctly second-class participants. Callers on the conference bridge can only participate in a meeting room discussion by being rude and interrupting. Since many conference room phones have screens, why can’t there be a hand-raising tool? Why can’t tools allow the leader to keep track of who has been monopolizing the floor and who may need encouragement to contribute? Is there no better way to ensure call tranquility than exhorting everyone to please mute their line?
Participating in a conference call still often involves dialing a toll-free number, followed by some random digit sequence copied manually from the invitation email, hopefully for the correct instance of the call. Largely, these systems occupy a separate sphere from other collaboration tools, particularly for ongoing, long-lived collaborations spanning organizations. Most ad-hoc groups now involve some concoction of a web-based conferencing tool, email with a cc-list hopefully including everyone, Doodle for finding meeting slots and maybe a shared document store or a Wiki provided by a cloud provider, all with different logins, manual adding and removal of group members and no coherent user experience.
“Which video app are we using today?”
Interoperability seems to have decreased, not increased, in the past few years. Partially, this is due to the lack of innovation in standards-based systems (and maybe their internal complexity), but it makes users go through coordination efforts for simple tasks. (The cartoon at makes the point vividly for texting.) Texting, other than basic SMS, and video calling have largely gone proprietary, leaving no option other than installing half a dozen apps or Windows or macOS tools, or using a WebRTC application via a compatible browser. Maybe this is workable, but it leaves everyone at the mercy of the tool vendor. Building enhancements, like integration with calendars and project tools, content analysis or custom user experiences for niche applications or users with special needs, is difficult to impossible.
As modern real-time communication using Internet protocols enters its third decade, there’s still a lot of room for making these tools spark joy, rather than be seen as annoyances or something that the IT department is foisting on users. This requires a willingness to modernize ancient equipment, coordinate among industry players and put user interests first.