Understand Your Distributed Apps with the OpenTracing Standard


Microservices and service-oriented architecture are here to stay, but this kind of distributed system breaks traditional process monitoring. Companies still need to understand what is happening inside the flow of an application. Ben Sigelman, co-founder of LightStep, said in his keynote at CloudNativeCon that by adopting OpenTracing, a new standard for distributed applications, companies can tell those stories without building complex instrumentation or fundamentally changing application code.

“If you previously told a story about what happened in your process in that way with this squiggly line going through a single process, that story is gone,” Sigelman said. “In our conversations with numerous companies that have adopted this sort of technology, what they’ve been telling us is that, as they decouple their systems and their transactions, lo and behold, they are no longer on any single process, they are literally unable to answer the most basic questions about what’s happening.

“The solution historically for this has been distributed tracing,” Sigelman said. “It still is a solution and it’s wonderful. So the question is, why isn’t it ubiquitous? … That is what OpenTracing is here for. The reason is the instrumentation has been too difficult. It’s required you to instrument not just across processes but across library boundaries in a way that often couples you to poorly engineered libraries that were written in an afternoon or a weekend by someone.”

OpenTracing is a vendor-neutral API standard, not something that one deploys, Sigelman said. Instead, it is something you program against and build into your microservices architecture. The OpenTracing API sits between the code in a microservices process, such as application logic, control-flow packages, or existing instrumentation, and tracing infrastructure such as LightStep, Zipkin, or Jaeger.
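To make the "program against it" idea concrete, here is a minimal sketch in plain Python. It is not the OpenTracing library: the `Tracer` interface, the two backends, and the `order_donut` function are all hypothetical names invented for illustration. The point is only that application code is instrumented once against a neutral interface, and the tracing backend behind it can be swapped.

```python
import time
from abc import ABC, abstractmethod


class Tracer(ABC):
    """Hypothetical vendor-neutral interface that application code targets."""

    @abstractmethod
    def record(self, operation, duration_s):
        """Report one finished operation to whichever backend is plugged in."""


class NoopTracer(Tracer):
    """Backend that discards everything, e.g. when tracing is switched off."""

    def record(self, operation, duration_s):
        pass


class InMemoryTracer(Tracer):
    """Stand-in for a real tracing system; it just keeps records in a list."""

    def __init__(self):
        self.finished = []

    def record(self, operation, duration_s):
        self.finished.append((operation, duration_s))


def order_donut(tracer, flavor):
    """Application logic, instrumented once against the interface."""
    start = time.monotonic()
    result = f"one {flavor} donut"
    tracer.record("order_donut", time.monotonic() - start)
    return result
```

Calling `order_donut(InMemoryTracer(), "glazed")` and `order_donut(NoopTracer(), "glazed")` returns the same string; only where the timing record ends up differs, which is the decoupling the standard is meant to provide.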

Sigelman showed how OpenTracing works through a demo involving a fake donuts-as-a-service website he created (DonutSalon.com, imbued with the glorious motto “Move Fast and Bake Things”), showing how to track where bottlenecks occurred when the audience faithfully ordered free donuts all at once.

“This can be really powerful if you think about a real system, in that any time you have a latency issue, it’s probably due to some kind of throughput concurrency bottleneck,” Sigelman said. “Being able to actually root-cause where these requests came from in the distributed system is actually fairly profound and something that is not possible with logging at the local level.”

In a Kubernetes system, OpenTracing tracks both the breadth of transactions (called “spans”) and the depth (the communication between clients and services, called “references”). Tracking just one or the other is essentially traditional logging, Sigelman said, but capturing both gives a much better picture of the traffic in a distributed application.
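The two terms can be illustrated with a toy data model. In the OpenTracing specification, a span is a named, timed operation, and a reference is a relationship between spans, such as ChildOf. The sketch below is plain Python, not the OpenTracing library; the `Span` class, the helper functions, the operation names, and the timings are all made up for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)


@dataclass
class Span:
    operation: str
    start: float
    finish: float
    # Each reference is a (kind, parent_span) pair; "child_of" mirrors the
    # ChildOf reference type named in the OpenTracing specification.
    references: list = field(default_factory=list)
    span_id: int = field(default_factory=lambda: next(_ids))

    @property
    def duration(self):
        return self.finish - self.start


def child_of(parent, operation, start, finish):
    """Create a span that references `parent` as its causal parent."""
    return Span(operation, start, finish, references=[("child_of", parent)])


def children(spans, parent):
    """Return the spans that hold a reference pointing at `parent`."""
    return [s for s in spans if any(ref is parent for _, ref in s.references)]


def render(spans, root, depth=0):
    """Walk the references from `root` and return an indented outline."""
    lines = [f"{'  ' * depth}{root.operation} ({root.duration:.0f} ms)"]
    for child in children(spans, root):
        lines.extend(render(spans, child, depth + 1))
    return lines


# Made-up timings, in milliseconds, for a single donut order.
root = Span("POST /order", 0, 120)
auth = child_of(root, "auth.check", 5, 15)
fry = child_of(root, "fryer.fry", 20, 110)
lock = child_of(fry, "fryer.wait_for_lock", 20, 95)
trace = [root, auth, fry, lock]

print("\n".join(render(trace, root)))
```

With spans alone, the four timings would be an unordered list. The references are what let the outline show that most of the request was spent waiting on a lock inside the fryer call, which is the kind of root-cause question Sigelman described.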

“I think it’s possible to get good quality tracing and avoid the pain and suffering of adding a lot of instrumentation or even really changing an application in meaningful ways if we can add the existing API standardization of OpenTracing to a little bit of magic between applications, client proxies, and then the network that connects containers to each other,” he said.

