Echoprint: Open acoustic fingerprinting

June 29, 2011

This article was contributed by Nathan Willis

Acoustic fingerprinting has been given a tremendous boost by the mobile smartphone business. You have probably seen the basic scenario in television commercials, if not in person: a user holds up a phone to capture a few seconds of audio playing nearby, and the application computes a "fingerprint" of the track, which is then used to query a remote database for the mystery artist and track name. The space has been dominated by proprietary software, but a new — and open source — project was unveiled last week, named Echoprint.

Fingerprints on the databases

Despite the name, acoustic fingerprinting has little in common with hash-based digital fingerprinting techniques used to detect alteration of a file. While a hash function is sensitive to changes in individual bits, an acoustic fingerprint function must robustly analyze the way the audio sounds, in a manner independent of the codec used, bitrate, or even static and ambient noise. Acoustic fingerprints focus on extracting perceptual data from the audio track, such as its tempo, average spectrum, and pattern of recurring tones. The canonical use is to discover the track information of an unknown audio clip, but other uses are possible as well, such as finding similar-sounding music (on any number of factors supported by the algorithm).
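
The toy Python sketch below makes the contrast concrete. It is not Echo Nest's ENMFP algorithm or any service's real method. The sample rate, frame length, and candidate pitches are arbitrary assumptions. The sketch hashes a synthetic four-note melody with SHA-256. It also computes a crude perceptual signature that records, for each quarter-second frame, which candidate pitch is loudest, using the Goertzel algorithm (a single-frequency DFT). Adding faint noise produces an unrelated SHA-256 digest but leaves the perceptual signature unchanged.

```python
import hashlib
import math
import random

# Toy parameters, chosen arbitrarily for illustration (not Echoprint's).
SAMPLE_RATE = 8000                                 # samples per second
FRAME_LEN = 2000                                   # 0.25 s per frame
CANDIDATES = [262.0, 330.0, 392.0, 440.0, 523.0]   # candidate pitches in Hz


def synthesize(melody, noise=0.0, seed=0):
    """Render one pure tone per frame, optionally adding uniform white noise."""
    rng = random.Random(seed)
    samples = []
    for freq in melody:
        for n in range(FRAME_LEN):
            s = math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            samples.append(s + rng.uniform(-noise, noise))
    return samples


def to_pcm16(samples):
    """Quantize to little-endian 16-bit PCM bytes, clamping to [-1, 1]."""
    return b"".join(
        int(max(-1.0, min(1.0, s)) * 32767).to_bytes(2, "little", signed=True)
        for s in samples
    )


def goertzel_power(frame, freq):
    """Squared magnitude of one frequency component (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def perceptual_signature(samples):
    """For each frame, the index of the loudest candidate pitch."""
    sig = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        powers = [goertzel_power(frame, f) for f in CANDIDATES]
        sig.append(powers.index(max(powers)))
    return sig


melody = [440.0, 523.0, 392.0, 330.0]
clean = synthesize(melody)
noisy = synthesize(melody, noise=0.05, seed=1)

clean_hash = hashlib.sha256(to_pcm16(clean)).hexdigest()
noisy_hash = hashlib.sha256(to_pcm16(noisy)).hexdigest()

print(clean_hash == noisy_hash)                                     # False
print(perceptual_signature(clean) == perceptual_signature(noisy))   # True
print(perceptual_signature(clean))                                  # [3, 4, 2, 1]
```

A real fingerprint must survive much more than faint white noise. It has to cope with lossy codecs and room acoustics, and above all it must work without knowing where in the track the sample starts.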

The proprietary software market is home to several acoustic fingerprinting services, most famously Shazam, SoundHound, and Gracenote. Gracenote is known to many in the free software community due to the controversy that erupted a decade ago when its corporate parent suddenly restricted the usage of CDDB, its user-built compact disc identifying database. Many felt betrayed by the policy change, because the CDDB data was submitted voluntarily by users when playing or ripping CDs, not entered or harvested by CDDB's owners themselves. The database was an early example of Internet crowd-sourcing, and many saw the sudden cut-off of access as outright exploitation of their effort.

Fast-forward to 2011, and most open source applications use competing, "open content" services instead, such as MusicBrainz, which is managed by the 501(c)(3) nonprofit MetaBrainz Foundation. For the past several years, MusicBrainz has supported acoustic fingerprinting through the closed-source MusicDNS service provided by MusicIP (later renamed AmpliFIND).

Although MusicBrainz had a perpetual contract with AmpliFIND for the service, it was never considered a good fit, since MusicBrainz's social contract requires it to remain "100% free". Recently, a handful of open source acoustic fingerprinting projects started picking up steam — such as Lukáš Lalinský's Acoustid — and MusicBrainz decided to start looking for open source and open content replacements for MusicDNS. Around the same time, the team at acoustic fingerprinting startup Echo Nest decided that its best strategy was to take its entire product open source and attempt to commoditize acoustic fingerprint services, rather than attempt to take on the entrenched players head-to-head.

Echo Nest and MusicBrainz had collaborated in the past on projects such as Rosetta Stone — a utility to match artist and track IDs between various music services' ID databases — so the mutual decision to launch Echoprint as an open project and begin integrating it with MusicBrainz was a good fit from both perspectives. But it did not hurt that AmpliFIND also sold off its intellectual property holdings — including MusicDNS and the portable unique identifier (PUID) database — to none other than Gracenote.

The Echoprint release

The Echoprint system consists of three components. The Codegen fingerprint generator takes an audio file (or audio sample) as input, and generates a fingerprint based on the Echo Nest Musical Fingerprint (ENMFP) algorithm. The Echoprint server maintains a database of fingerprints indexed to track information, and supports remote queries as well as inserting new fingerprints and tracks. The Echoprint database itself contains publicly accessible track and fingerprint data. The database contains fingerprint codes for the entire duration of each track, but as in most acoustic fingerprinting techniques, only a shorter segment is usually sent for comparison. Echo Nest claims that Echoprint provides accurate matches for fingerprint blocks computed from samples of at least 20 seconds in length.
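
The article does not say how a short query segment is matched against the full-track codes in the database. The Python sketch below shows one common approach, offered purely as an assumption: it is not taken from the Echoprint server, which builds on Solr. The approach uses an inverted index from each code to the (track, time offset) pairs where that code occurs. Each code in the query votes for a (track, offset difference) pair, and a genuine match shows up as many codes agreeing on one offset. `ToyFingerprintIndex`, the track IDs, and the integer codes are all hypothetical.

```python
from collections import Counter, defaultdict


class ToyFingerprintIndex:
    """Hypothetical in-memory index: code -> list of (track_id, time_offset)."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add_track(self, track_id, codes):
        """codes: iterable of (code, time_offset) pairs covering the whole track."""
        for code, offset in codes:
            self.postings[code].append((track_id, offset))

    def query(self, codes, min_votes=3):
        """Return (track_id, votes) for the best-aligned track, or None.

        Each query code votes for (track, track_offset - query_offset); codes
        from the right track at the right position all agree on one delta.
        """
        votes = Counter()
        for code, q_offset in codes:
            for track_id, t_offset in self.postings.get(code, ()):
                votes[(track_id, t_offset - q_offset)] += 1
        if not votes:
            return None
        (track_id, _delta), count = votes.most_common(1)[0]
        return (track_id, count) if count >= min_votes else None


index = ToyFingerprintIndex()
# Made-up codes, one per "time step", for two tracks.
index.add_track("track-a", [(c, t) for t, c in enumerate([11, 42, 7, 99, 23, 58, 61, 5])])
index.add_track("track-b", [(c, t) for t, c in enumerate([42, 3, 77, 23, 15, 8, 11, 90])])

# A short excerpt of track-a starting at time step 2, re-timed from zero.
excerpt = [(c, t) for t, c in enumerate([7, 99, 23, 58])]
print(index.query(excerpt))  # ('track-a', 4)
```

Voting on the offset difference lets an excerpt taken from anywhere in a track line up with the codes stored for the whole track. The `min_votes` threshold is an arbitrary illustrative value.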

In practical usage, an application would sample audio (either captured or from a file), use the Codegen library to compute a fingerprint, and query a compatible Echoprint server. The server would return any matching track records in JSON format. Alternatively, if there are no decent matches, the application could submit its fingerprint information to the server's database along with track metadata acquired through some other means.
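
The query-or-submit flow described above might be structured like the following Python sketch. `FakeServer` is a hypothetical in-process stand-in, not the Echoprint server's HTTP interface. The JSON field names (`matches`, `track`, `score`) and the `min_score` threshold are invented for illustration. The point is the control flow: parse the JSON reply and accept a strong enough match. Otherwise, contribute the fingerprint along with metadata obtained some other way.

```python
import json


class FakeServer:
    """Hypothetical stand-in for an Echoprint-compatible server (not its real API)."""

    def __init__(self):
        self.tracks = {}  # fingerprint string -> metadata dict

    def query(self, fingerprint):
        """Return a JSON string listing matching track records (possibly none)."""
        matches = []
        if fingerprint in self.tracks:
            matches.append({"track": self.tracks[fingerprint], "score": 1.0})
        return json.dumps({"matches": matches})

    def ingest(self, fingerprint, metadata):
        """Store a new fingerprint together with its track metadata."""
        self.tracks[fingerprint] = metadata


def identify_or_submit(server, fingerprint, fallback_metadata, min_score=0.5):
    """Return the best matching track, or submit the fingerprint and return None."""
    reply = json.loads(server.query(fingerprint))
    good = [m for m in reply["matches"] if m["score"] >= min_score]
    if good:
        return max(good, key=lambda m: m["score"])["track"]
    if fallback_metadata is not None:
        server.ingest(fingerprint, fallback_metadata)
    return None


server = FakeServer()
fp = "hypothetical-fingerprint-123"
meta = {"artist": "Example Artist", "title": "Example Title"}
print(identify_or_submit(server, fp, meta))  # None (unknown, so it is submitted)
print(identify_or_submit(server, fp, None))  # {'artist': 'Example Artist', 'title': 'Example Title'}
```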

The code for Codegen, the server, and various utilities (including an example iPhone app) is hosted on GitHub. The Codegen application and shared library are available under the MIT license, while the server (which is based on Apache Solr and Tokyo Tyrant) is under the Apache License 2.0.

The public Echoprint database is provided under its own terms, dubbed the "Echoprint Database License". It allows for commercial and noncommercial usage, and requires that anyone who downloads the data and adds to it contribute the additional data back to Echo Nest. That clause is something less than a Creative Commons-style "Share Alike" requirement, because it requires sending the data to Echo Nest alone. The preamble to the license seems to indicate that all such contributions will be shared with the public, but Echo Nest assumes no obligation to share the data. The initial release is "seeded" with approximately 13 million fingerprints generated from online music vendor 7Digital's digital holdings, with metadata provided by MusicBrainz.

There are some other potentially worrisome terms in the agreement, including a requirement to use Echoprint "powered by" logos in any application that accesses the data. In addition, the agreement is not clear about how Echo Nest can modify or terminate it down the road. Those who were burned by the CDDB debacle should pause here, since it is not at all clear that the same thing couldn't happen with the Echoprint database.

At the moment, Echo Nest has not published the details of its algorithm in a form suitable for casual reading. The source code for Codegen is provided, of course, but a white paper is supposed to be released shortly that will explain the process at length. Unfortunately, the current legal documents do not explicitly address patent grants in relation to the software (the MIT license is very brief), which might concern some developers. Acoustic fingerprinting is a patent-laden field, and indeed a little searching reveals several relevant filings in the name of Echo Nest and its founders Brian Whitman and Tristan Jehan. On the plus side, all of the proprietary acoustic fingerprinting services are in roughly the same position.

Currently Echo Nest's own "song/identify" server is the only up-and-running Echoprint database, although any application author could set up a server for testing purposes. The Codegen command-line application will build on any reasonably modern Linux system; the only significant dependencies are TagLib, Boost, and FFmpeg. The application generates a fingerprint from a file argument (optionally followed by a start time and duration, both in seconds). The output is a JSON object including ID3 tag information from the file and a base64-encoded representation of the fingerprint. This output can be posted directly to the Echo Nest server with cURL or a similar tool, as documented in the Codegen README file.
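
A thin Python wrapper around the Codegen command-line program could look like this. It builds the argument list the article describes (a file, then an optional start time and duration in seconds) and decodes the fingerprint from the JSON output. Two details are assumptions to check against the Codegen README, not facts stated in this article: the binary name `echoprint-codegen`, and the output layout. The sketch assumes a JSON list of objects with `metadata` and `code` fields, where `code` is URL-safe base64 of zlib-compressed data.

```python
import base64
import json
import subprocess
import zlib


def codegen_command(path, start=None, duration=None, binary="echoprint-codegen"):
    """Build argv: the file, then optional start time and duration (seconds).

    ASSUMPTION: the binary name; start and duration are positional arguments.
    """
    argv = [binary, path]
    if duration is not None and start is None:
        start = 0  # a duration only makes sense after a start time
    if start is not None:
        argv.append(str(start))
    if duration is not None:
        argv.append(str(duration))
    return argv


def run_codegen(path, start=None, duration=None):
    """Run the CLI and return its stdout (requires the binary on PATH)."""
    result = subprocess.run(codegen_command(path, start, duration),
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_codegen_output(text):
    """Return (metadata, decoded_code) for the first entry of the JSON output.

    ASSUMPTION: a JSON list of objects with "metadata" and "code" fields,
    where "code" is URL-safe base64 of zlib-compressed data.
    """
    entry = json.loads(text)[0]
    raw = zlib.decompress(base64.urlsafe_b64decode(entry["code"]))
    return entry.get("metadata", {}), raw
```

Keeping command construction separate from parsing means the parser can be exercised on saved output without the binary installed.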

Play or Pause

MusicBrainz's Robert Kaye said that the project plans to retain support for PUIDs and MusicDNS in the MusicBrainz database for the foreseeable future (or until "people pester me to get rid of it."). The project is running a test server that uses Echoprint in lieu of MusicDNS, but there is no time frame to add tables to the main database to support Echoprint.

Kaye said that he expects more tuning to be done to the Echoprint product before it is ready for widespread adoption, but he observed that "critical mass" is the most important factor — meaning support in client applications and a sizable database of reliable fingerprints. The 13 million tracks pre-loaded with 7Digital's help may sound like a lot, but for comparison, Shazam claims ~~more than one billion songs in its database~~ to have identified more than one billion songs.

Given the number of open source audio projects that use MusicBrainz, it is safe to say that Echoprint has its foot in the door. It is the first entirely open source acoustic fingerprinting system to hit the market in "ready to use" form, so it may spawn considerable development of song recognition in open source mobile applications. Without the burden of licensing fees, the technology could spread beyond stand-alone song-recognition apps, open or closed.

Nevertheless, Kaye emphasized that MusicBrainz's post-MusicDNS move is meant to make the project agnostic to acoustic fingerprinting algorithms. Acoustid is also still in active development, has documented the details of its algorithm, and does not require changing the MusicBrainz database format for support.

Whether the two fingerprinting techniques overlap, complement, or compete may ultimately be up to the users to determine. Echoprint is so new that it is difficult to predict where it will go from here. The MusicBrainz support is naturally a big boost, but better technical documentation and clarification of the fuzzy legal questions may be required before application authors can be expected to pick up the technology in large numbers. But without doubt, it is poised to fill a visible hole in open source mobile software. An open solution that works well with the crowd-sourcing techniques needed to build the fingerprint database will likely have staying power in a niche with so many similar proprietary offerings.

Index entries for this article
GuestArticles	Willis, Nathan

Echoprint: Open acoustic fingerprinting

Posted Jun 29, 2011 20:06 UTC (Wed) by jimparis (guest, #38647)

Shazam claims more than one billion songs in its database.

No way there are that many songs in the world. That link says that they've performed one billion identifications. This article from Dec 2008 says that Shazam's database holds 8 million songs. So a "resolvable catalog" of 13 million songs for Echo Nest sounds pretty good!

Echoprint: Open acoustic fingerprinting

Posted Jun 30, 2011 19:53 UTC (Thu) by xtifr (guest, #143)

No way there are [one billion] songs in the world.

Actually, I suspect there are far more than that! The number that have been recorded and distributed on the Internet may be much smaller though. But even there--there's a lot of music coming out of Japan and India, you know. Not to mention the rest of the world. As for not-recorded--my niece made up at least two songs in the last week alone! :)

It's also going to depend on how you define "song". Is the legendary Jimi Hendrix cover of Bob Dylan's All Along the Watchtower a different song from Bob's original? I suspect most people would say yes, but then what about the version Bob recorded in collaboration with the Grateful Dead? Is that the same as Bob's original? Or is it the same song as the version the Grateful Dead released on their own a couple of years later? Both? Neither? What about the over 1.5 million liberally-licensed live tracks hosted on the Internet Archive's Live Music Archive? (The overwhelming majority of which are from the USA.) What about the guys I regularly see on the streets around here selling privately made CDs of their own group's work (some of which probably does end up on the Internet)?

I suspect your estimates of the current database sizes are accurate, but I'm not so sure about your proposed theoretical limits.

Echoprint: Open acoustic fingerprinting

Posted Jul 1, 2011 0:22 UTC (Fri) by giraffedata (guest, #1954)

That link says that they've performed one billion identifications

Thanks for that. I couldn't derive that meaning either from the referenced web page or LWN's corrected text.

It's one of the language foibles that always irritates me: people say "I did X to N things" when they mean "I did X to something N times." Like when a transit agency says 50,000 people ride the bus on a typical day, when they really mean there are 50,000 boardings (by about 22,000 people) on a typical day.

Echoprint: Open acoustic fingerprinting

Posted Jun 30, 2011 8:08 UTC (Thu) by pabs (subscriber, #43278)

There is also a more general perceptual hash library:

http://phash.org/

Echoprint: Open acoustic fingerprinting

Posted Jun 30, 2011 15:23 UTC (Thu) by jengelh (subscriber, #33263)

Nothing beats Perl's soundex! :-)

Echoprint: Open acoustic fingerprinting

Posted Jun 30, 2011 9:51 UTC (Thu) by felipec (guest, #75494)

Years ago I told the MusicBrainz guys about the dangers of relying so much on a closed source component.

Now, if only there were a way to compile for Linux.

Echoprint: Open acoustic fingerprinting

Posted Jun 30, 2011 17:35 UTC (Thu) by iabervon (subscriber, #722)

It looks to me like this is evidence of the opposite: it's great to rely on a proprietary component, because a better open-source component will arise, either before you need it or soon after. In general, it's often good to start with some dependency as a temporary measure, until there is something really good available. The fact that having your dependency be proprietary is doomed in the long run is actually a benefit here, because it means that you will actually get around to making the transition.

Echoprint: Open acoustic fingerprinting

Posted Jul 7, 2011 7:06 UTC (Thu) by jamesh (guest, #1159)

Of course, this isn't the first time MusicBrainz has had its proprietary fingerprinting solution disappear from under them.

The PUID system they currently use was a replacement for a different proprietary system called TRM. The company behind the TRM fingerprinting had moved on to other things, and the server software was unreliable for the load it was being put under.

It was a bit surprising to see MB switch to another proprietary solution after their prior experience, but it is good that they've finally got an open system they can rely on.

Echoprint: Open acoustic fingerprinting

Posted Jul 7, 2011 14:42 UTC (Thu) by lukaslalinsky (guest, #76480)

There were these two options:

1) Proprietary solution
2) No fingerprinting at all

I think it's a fairly obvious choice. If there was an open solution back then, MusicBrainz would have used it, but there wasn't and MB was never interested in developing its own fingerprinting technology because it's not the primary goal of the project.

Echoprint Database License?

Posted Jun 30, 2011 12:48 UTC (Thu) by giggls (subscriber, #48434)

Looks like the ODbL (http://www.opendatacommons.org/licenses/odbl/) of OpenStreetMap fame would have been a perfect match here.

MPEG-7?

Posted Jul 8, 2011 16:37 UTC (Fri) by rzm (guest, #116)

I wonder how MPEG-7 fits into all this. Is it impractical? Too complicated?