
Apache Daffodil advancing Data Format Description Language

Data integration and data loading efforts could get easier to execute, as the open source DFDL project Apache Daffodil becomes a Top-Level Project at the Apache Software Foundation.

Apache Daffodil graduated to Top-Level Project status within the Apache Software Foundation, signifying the stability of the technology, as well as the maturity of the project.

Data interchange depends on understanding the format of a given data set, and the Data Format Description Language (DFDL) is designed to help with that task.
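
To make "describing a data format" concrete, here is a minimal Python sketch of the idea. It is not DFDL syntax (real DFDL schemas are XML Schema documents annotated with DFDL properties) and not Daffodil's API: the record layout and the names FORMAT and parse_record are hypothetical. The point is that one generic parser, driven only by a declarative description, turns raw bytes into a structured XML tree, much as Daffodil parses data into an infoset that can be written out as XML.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical format description for a fixed-layout binary record.
# Each entry is (field name, struct format code); the record is big-endian.
FORMAT = [
    ("version", "B"),    # 1-byte unsigned integer
    ("msg_type", "B"),   # 1-byte unsigned integer
    ("length", "H"),     # 2-byte unsigned integer
    ("sensor_id", "I"),  # 4-byte unsigned integer
]


def parse_record(data: bytes, fmt=FORMAT) -> ET.Element:
    """Parse raw bytes into an XML tree using only the format description."""
    layout = ">" + "".join(code for _, code in fmt)  # ">" = big-endian, no padding
    values = struct.unpack_from(layout, data)
    root = ET.Element("record")
    for (name, _), value in zip(fmt, values):
        ET.SubElement(root, name).text = str(value)
    return root


raw = bytes([1, 7, 0, 16, 0, 0, 0, 42])
print(ET.tostring(parse_record(raw), encoding="unicode"))
# prints <record><version>1</version><msg_type>7</msg_type><length>16</length><sensor_id>42</sensor_id></record>
```

Supporting a different data format means editing FORMAT rather than parse_record, which is the same separation between description and parser that DFDL standardizes.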

The DFDL specification is defined by the Open Grid Forum and enabled with software implementations, including the open source Apache Daffodil project. Daffodil was created in 2009 at the University of Illinois National Center for Supercomputing Applications and the project joined the Apache Incubator in 2017.

Daffodil is used by several vendors and government organizations, including the Defense Advanced Research Projects Agency (DARPA), Raytheon BBN Technologies and Owl Cyber Defense, among others.

Ken Walker, CTO of Owl Cyber Defense, said the cybersecurity vendor uses Apache Daffodil as an embedded software library within its software platforms. Walker explained that the vendor's technology performs deep inspection of data to determine what is allowed to be transferred.

Owl Cyber Defense uses DFDL and Apache Daffodil to normalize the data, Walker explained.

Screenshot: Apache Daffodil is an implementation of the Data Format Description Language (DFDL) that can be used to help describe and understand data formats.

"We really have to know what the data is that's being transferred and be able to validate that that data should be allowed, so we built a set of filters to be able to inspect and manipulate that data," he said.

Once Daffodil has been used to understand the data format, the data is passed through a set of XML filters that inspect and manipulate it. Walker added that Daffodil enables Owl Cyber Defense to support many different data types.
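
The article does not describe Owl Cyber Defense's filters in detail. As a minimal, hypothetical sketch of the general pattern, the Python below assumes a message has already been parsed (for example by Daffodil) into XML with made-up element names (type, operator, code), then applies one inspection rule (reject disallowed message types) and one manipulation rule (redact a field). ALLOWED_TYPES, REDACTED_FIELDS and filter_infoset are illustrative names, not part of any real product.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical policy: which message types may be transferred,
# and which fields must be blanked out before transfer.
ALLOWED_TYPES = {"position", "status"}
REDACTED_FIELDS = {"operator"}


def filter_infoset(xml_text: str) -> Optional[str]:
    """Return sanitized XML text, or None if the message is not allowed."""
    root = ET.fromstring(xml_text)
    if root.findtext("type") not in ALLOWED_TYPES:
        return None  # inspection: reject disallowed content outright
    for name in REDACTED_FIELDS:
        for elem in root.iter(name):
            elem.text = "REDACTED"  # manipulation: sanitize a sensitive field
    return ET.tostring(root, encoding="unicode")


msg = "<message><type>status</type><operator>alice</operator><code>3</code></message>"
print(filter_infoset(msg))
# prints <message><type>status</type><operator>REDACTED</operator><code>3</code></message>
```

Working on a parsed XML tree rather than raw bytes lets each filter rule name fields directly instead of re-implementing format parsing.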

Walker noted that the graduation of Apache Daffodil to a Top-Level Project, made public March 4, should also make it easier for organizations to adopt the technology, because the project is now considered stable and mature.

Apache Daffodil and DFDL aim to make data integration easier

Apache Daffodil can play a role for any type of software or organization that is taking data in from different sources.


Michael Beckerle, vice president of Apache Daffodil and co-author of the DFDL specification, explained that the core part of any data integration environment is the ability to describe external data formats. With Apache Daffodil, he noted, users can understand a data format quickly, which helps software use the data effectively. Beckerle is also technical principal at Owl Cyber Defense.

Apache Daffodil can potentially be used for a number of applications, including data loading and data integration. Organizations often use extract, load and transform (ELT) tools to ingest data for enterprise data loading. Beckerle said that ELT tools can now embed Apache Daffodil to understand data formats, rather than each tool reinventing its own nonstandard approach.

Another use case is what Beckerle referred to as data-directed routing, which is less about data integration and more about understanding the data, so it can be routed to the appropriate place.
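
As a hypothetical sketch of data-directed routing, the Python below assumes each message has already been parsed (for example by Daffodil) into XML containing a made-up type element, and chooses a destination from an illustrative routing table. ROUTES, DEFAULT_ROUTE, route and the destination names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table: message type -> destination name.
ROUTES = {
    "telemetry": "timeseries-store",
    "alert": "security-queue",
}
DEFAULT_ROUTE = "quarantine"  # unrecognized or missing types are held for review


def route(infoset_xml: str) -> str:
    """Choose a destination based on the content of the parsed data."""
    root = ET.fromstring(infoset_xml)
    return ROUTES.get(root.findtext("type"), DEFAULT_ROUTE)


for doc in [
    "<msg><type>alert</type></msg>",
    "<msg><type>telemetry</type></msg>",
    "<msg><type>unknown</type></msg>",
]:
    print(route(doc))
# prints security-queue, then timeseries-store, then quarantine
```

Sending unrecognized messages to a quarantine destination rather than dropping them is one design choice; a stricter deployment might reject them outright.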

Apache Daffodil expanding Apache data stack

The Apache Software Foundation has a growing list of data projects that Daffodil will complement.

Among the many data projects at Apache are the Apache Spark analytics engine, the Apache Flink data processing framework and Apache Beam for data pipelines. Beckerle noted that the Daffodil project has started an integration effort with Apache NiFi, a data routing platform.

"I actually think Daffodil would be a great add-on, as a data importing and exporting capability for any of the Apache data processing fabrics," Beckerle said.

Looking forward, Beckerle said a key goal for the open source project will be improving usability, because the concepts behind DFDL can be complex to master and implement.

"[We're basically] just trying to make it easier for people to adopt," Beckerle said.
