Rackspace Open Sources Atom Nuke, The Fast Atom Framework

Filed in Product & Development by Chad Lung | September 11, 2012 3:30 pm

What if you had a tremendous mountain of data, broken up and stored across thousands of servers, and your client wanted some specific portion of that data? You could assemble the whole mountain and send it to your client, leaving the client to pick out what’s needed. But you split the data up in the first place for good reasons: it’s too big to store in one place or to transfer without interruption. You also manage that data for reasons that include security and privacy, so moving the whole mountain might not be a good idea.

What if you could create something as complex as this, with data in multiple formats from multiple origins stored across multiple servers but aggregated for multiple consumers, who could then repackage it for consumers of their own?

If you couldn’t give your client a copy of all your data, you could ask the client to describe the specific data that’s needed and then assemble those items the client needs. However, if you had many clients, each with their own mountains of data, would you have to create a direct path from every consumer to every fragment of data they need?

What you need is an easy way to build a bridge that integrates any number of data origins with any number of data consumers. Enter Atom Nuke.

[1]

With Atom Nuke[2], no matter where your data originates and who consumes the data, it could be this simple to think about.

Atom Nuke Simplifies Integration

We created Atom Nuke[2] to give ourselves two kinds of power related to the high volumes of data produced by our Atom feeds.

[3]

A six-way integration requires nine paths, connecting each of three data origins directly to each of three data consumers so that every pair has direct and equal access. Adding one new origin or consumer requires adding a new path to every participant on the other side.

Atom Nuke is an open-source collection of utilities built on a simple, fast Atom implementation that aims to keep its dependency footprint minimal. The Atom implementation has its own model and uses a SAX[4] parser and a StAX[5] writer.
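To make the SAX-plus-own-model idea concrete, here is a minimal Python sketch. It is not Atom Nuke's actual API (Nuke's model is written in Java); it assumes only the standard library's `xml.sax` event parser, and `Entry`, `AtomEntryHandler` and `parse_entries` are hypothetical names. Only each entry's `id`, `title` and category terms are captured. The parser fires callbacks as elements stream past, no DOM tree is built, and each finished entry is frozen so it cannot be modified after parsing.

```python
import io
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass(frozen=True)
class Entry:
    """Hypothetical immutable view of an Atom entry (a small subset of RFC 4287)."""
    id: str
    title: str
    categories: Tuple[str, ...]


class AtomEntryHandler(ContentHandler):
    """SAX callbacks that build one frozen Entry per atom:entry, without a DOM."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: List[Entry] = []
        self._current: Optional[Dict] = None  # fields of the entry being read
        self._field: Optional[str] = None     # "id" or "title" while inside one
        self._text: List[str] = []

    def startElementNS(self, name, qname, attrs):
        ns, local = name
        if ns != ATOM_NS:
            return
        if local == "entry":
            self._current = {"id": "", "title": "", "categories": []}
        elif self._current is not None and local in ("id", "title"):
            self._field, self._text = local, []
        elif self._current is not None and local == "category":
            # Unprefixed attributes are keyed as (None, localname) in namespace mode.
            term = attrs.get((None, "term"))
            if term is not None:
                self._current["categories"].append(term)

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)  # text may arrive in several chunks

    def endElementNS(self, name, qname):
        ns, local = name
        if ns != ATOM_NS or self._current is None:
            return  # ignores feed-level elements such as the feed's own title
        if local == self._field:
            self._current[local] = "".join(self._text).strip()
            self._field = None
        elif local == "entry":
            c = self._current
            self.entries.append(Entry(c["id"], c["title"], tuple(c["categories"])))
            self._current = None


def parse_entries(xml_bytes: bytes) -> List[Entry]:
    """Parse an Atom feed document into a list of immutable entries."""
    parser = make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = AtomEntryHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.entries
```

Calling `parse_entries(feed_bytes)` on a feed document returns a list of `Entry` values; assigning to a field of one raises an error because the dataclass is frozen.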

With Atom Nuke providing a bridge, a six-way integration requires six paths, one from each of the three origins and three clients, with each path terminating at Atom Nuke. Adding one new origin or consumer requires adding one new path.
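The difference between the two topologies is simple arithmetic. Assuming one path per origin-consumer pair for direct integration, and one path per participant when everything terminates at a bridge such as Nuke, a short Python sketch (the function names are illustrative only) shows how quickly point-to-point wiring grows:

```python
def direct_paths(origins: int, consumers: int) -> int:
    """Point-to-point: every origin is wired to every consumer."""
    return origins * consumers


def bridged_paths(origins: int, consumers: int) -> int:
    """Hub-and-spoke: every origin and every consumer has one path to the bridge."""
    return origins + consumers


def paths_added_by_new_origin(consumers: int, bridged: bool) -> int:
    """A new origin needs one path to the bridge, or one path per existing consumer."""
    return 1 if bridged else consumers
```

With ten origins and ten consumers, that is 100 direct paths versus 20 through the bridge, and the gap keeps widening because one count multiplies while the other only adds.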

We designed our Nuke implementation for immutability, maximum simplicity and memory efficiency. Nuke also contains a polling event framework that can poll multiple sources. Each source may be registered with its own polling interval, which governs how often the source is polled during normal operation, and may have any number of Atom listeners added to its dispatch list. Newly added listeners begin receiving events on the next scheduled poll.
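The polling behavior described above can be sketched in Python as a small scheduler. This is an illustration under stated assumptions, not Nuke's implementation: `PollingScheduler`, `register_source`, `add_listener` and `run_until` are hypothetical names, each source is modeled as a plain function returning a list of event strings, and a simulated millisecond clock replaces real timers so the ordering is easy to follow. A heap keeps the next due time for every source, and listeners are read from the dispatch list at poll time, which is why a newly added listener starts receiving events on the next scheduled poll.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

Fetch = Callable[[], List[str]]              # polls a source, returns new events
Listener = Callable[[str, List[str]], None]  # receives (source name, events)


class PollingScheduler:
    """Hypothetical sketch: each source is polled on its own interval and its
    events are dispatched to that source's listeners."""

    def __init__(self) -> None:
        self.now_ms = 0                               # simulated clock
        self._due: List[Tuple[int, int, str]] = []    # heap of (due ms, seq, source)
        self._seq = itertools.count()                 # tie-breaker for equal due times
        self._sources: Dict[str, Tuple[int, Fetch]] = {}
        self._listeners: Dict[str, List[Listener]] = {}

    def register_source(self, name: str, interval_ms: int, fetch: Fetch) -> None:
        if interval_ms <= 0:
            raise ValueError("polling interval must be positive")
        self._sources[name] = (interval_ms, fetch)
        self._listeners.setdefault(name, [])
        heapq.heappush(self._due, (self.now_ms + interval_ms, next(self._seq), name))

    def add_listener(self, name: str, listener: Listener) -> None:
        # The dispatch list is read at poll time, so the new listener
        # first hears from this source on its next scheduled poll.
        self._listeners[name].append(listener)

    def run_until(self, t_ms: int) -> None:
        """Advance the clock to t_ms, performing every poll due on the way."""
        while self._due and self._due[0][0] <= t_ms:
            due, _, name = heapq.heappop(self._due)
            self.now_ms = due
            interval_ms, fetch = self._sources[name]
            events = fetch()
            for listener in list(self._listeners[name]):
                listener(name, events)
            heapq.heappush(self._due, (due + interval_ms, next(self._seq), name))
        self.now_ms = t_ms
```

For example, registering one source at 100 ms and another at 250 ms and then calling `run_until(300)` polls the first source at 100, 200 and 300 ms and the second once at 250 ms. A production version might use real timers or a thread pool, such as Java's `ScheduledExecutorService`, instead of a simulated clock.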

Atom as a Building Block

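Atom is a self-discoverable and generic syndication protocol. The Internet Engineering Task Force (IETF) describes Atom in several ratified Requests for Comments (RFCs): RFC 4287, The Atom Syndication Format[6]; RFC 5005, Feed Paging and Archiving[7]; and RFC 5023, The Atom Publishing Protocol[8].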

The unique properties of the Atom specification have made it popular as a protocol for generic event distribution, syndication and aggregation. Using Atom as a common interchange format, event publishers add their domain-specific events to an Atom publication endpoint. Downstream, subscribers are notified of events they’ve pre-identified as relevant, controlling what they consume from potentially vast collections of published data.
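On the subscriber side, "pre-identified as relevant" can be modeled as matching Atom category terms. The Python sketch below illustrates that routing idea only; it is not part of Atom Nuke or Atom Hopper, and `Event`, `CategoryRouter`, `subscribe` and `publish` are hypothetical names. A subscriber registers the category terms it cares about and receives only entries carrying at least one of them.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical published entry: an Atom id plus its category terms."""
    id: str
    categories: FrozenSet[str]


class CategoryRouter:
    """Delivers each published event only to subscribers that asked for one of its categories."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[FrozenSet[str], Callable[[Event], None]]] = []

    def subscribe(self, categories: Iterable[str], callback: Callable[[Event], None]) -> None:
        self._subscriptions.append((frozenset(categories), callback))

    def publish(self, event: Event) -> int:
        """Notify matching subscribers and return how many were notified."""
        notified = 0
        for wanted, callback in self._subscriptions:
            if wanted & event.categories:  # at least one shared category term
                callback(event)
                notified += 1
        return notified
```

A subscriber to `{"billing"}` never receives an entry categorized only as `compute.create`, so each consumer controls its own slice of the published data.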

Atom Nuke Within Rackspace

Within Rackspace, the Cloud Integration team builds tools for all our software development teams to use. We need to provide high-quality tools, but we also need them to be easy to use and to work smoothly together so that we can encourage adoption throughout Rackspace.

Using Atom Nuke, we collect data from the Atom feeds supplied by Atom Hopper[9], another of our open-source tools. We then feed that Atom data into several systems, including those that perform analytics on OpenStack[10] deployments throughout our data centers. The analytics engine uses Nuke to collect the complete Atom feed data so it can be marshalled into a Hadoop[11] cluster. By combining our Atom Nuke and Atom Hopper tools, we’ve enabled complete portability of data: we can combine Atom events with data from other sources, such as RabbitMQ[12] messages and Flume[13] logs, without requiring consumers of that data to deal with the complexities of interacting with those dissimilar sources.

Nuke Makes Working with Atom Easy

Atom Nuke excels as an Atom feed crawler, since you can poll multiple feeds from multiple endpoints and define polling intervals down to the millisecond. In addition, you can select events in response to specific triggers, such as when an Atom entry contains a subscribed category. However, Nuke is much more than a feed crawler; it can also create its own Atom feeds if needed.
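Creating a feed is the mirror image of crawling one. Nuke does this in Java with a StAX writer; as a rough Python analogue, the sketch below swaps in the standard library's streaming `xml.sax.saxutils.XMLGenerator`, which writes elements as they are emitted instead of building a tree first. `write_feed` and its tuple-based entry format are hypothetical, and the elements written are a small RFC 4287 subset (`id`, `title`, `updated`, `author` and `category`).

```python
import io
from typing import Iterable, Tuple
from xml.sax.saxutils import XMLGenerator

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical entry format: (id, title, updated RFC 3339 timestamp, category term)
EntryTuple = Tuple[str, str, str, str]


def _text_element(gen: XMLGenerator, name: str, text: str) -> None:
    gen.startElement(name, {})
    gen.characters(text)  # escapes &, < and > in element text
    gen.endElement(name)


def write_feed(feed_id: str, title: str, updated: str, author: str,
               entries: Iterable[EntryTuple]) -> str:
    """Stream a minimal Atom feed and return it as a string."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    gen.startDocument()
    gen.startElement("feed", {"xmlns": ATOM_NS})  # default namespace declaration
    _text_element(gen, "id", feed_id)
    _text_element(gen, "title", title)
    _text_element(gen, "updated", updated)
    gen.startElement("author", {})  # RFC 4287: a feed needs an author unless every entry has one
    _text_element(gen, "name", author)
    gen.endElement("author")
    for entry_id, entry_title, entry_updated, term in entries:
        gen.startElement("entry", {})
        _text_element(gen, "id", entry_id)
        _text_element(gen, "title", entry_title)
        _text_element(gen, "updated", entry_updated)
        gen.startElement("category", {"term": term})  # written as <category term="..."/>
        gen.endElement("category")
        gen.endElement("entry")
    gen.endElement("feed")
    gen.endDocument()
    return out.getvalue()
```

A real feed would usually add `link` elements and per-entry content, but the streaming pattern stays the same: each element is written once, in order, and never held in memory as a tree.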

We built Atom Nuke in Java[14], but we have recently extended support to Python[15]. Nuke is licensed under the Apache 2 license[16] and was created by John Hopper[17], a software engineer on the Rackspace Cloud Integration team. We’ve created some tutorials to get developers started with Nuke[18].

Building with Boxes, Not Bricks

Writing about a different kind of atom in a world that was just beginning to understand atomic structure and atomic energy, H.G. Wells (1866-1946) imagined a future in which using the power stored within atoms transformed many aspects of human life:

“I feel that we are but beginning the list. And we know now that the atom, that once we thought hard and impenetrable, and indivisible and final and–lifeless–lifeless, is really a reservoir of immense energy. That is the most wonderful thing about all this work. A little while ago we thought of the atoms as we thought of bricks, as solid building material, as substantial matter, as unit masses of lifeless stuff, and behold! these bricks are boxes, treasure boxes, boxes full of the intensest force.”

—H.G. Wells, The World Set Free, 1914

We’re now at a similar point with the technology of our time. We have explored enabling technologies, such as Atom, and have begun fully using and building upon their capabilities, putting them to work in new ways to make new things possible. As we begin building with Atom Nuke, we’re using Atom not as a brick, but as a treasure box, containing amazing possibilities for fission and fusion, dividing and combining data to make new applications possible. By making Atom Nuke and some of our other projects such as Atom Hopper[9] available as open source, we hope we are also creating treasure boxes filled with ideas and possibilities.

To learn more about Atom Nuke, visit our project site[19] and check out the source code on GitHub[20].

Endnotes:
  1. [Image]: http://ddf912383141a8d7bbe4-e053e711fc85de3290f121ef0f0e3a1f.r87.cf1.rackcdn.com/atom-nuke-inall-outall.png
  2. Atom Nuke: http://atomnuke.org/
  3. [Image]: http://ddf912383141a8d7bbe4-e053e711fc85de3290f121ef0f0e3a1f.r87.cf1.rackcdn.com/atom-nuke-hardway-nonuke.png
  4. SAX: http://www.saxproject.org/
  5. StAX: http://stax.codehaus.org/
  6. Atom RFC: http://tools.ietf.org/html/rfc4287
  7. Atom Paging and Archiving RFC: http://tools.ietf.org/html/rfc5005
  8. Atom Publishing Protocol RFC: http://tools.ietf.org/html/rfc5023
  9. Atom Hopper: http://atomhopper.org/
  10. OpenStack: http://openstack.org/
  11. Hadoop: http://hadoop.apache.org/
  12. RabbitMQ: http://www.rabbitmq.com/
  13. Flume: http://flume.apache.org/
  14. Java: http://java.com/
  15. Python: http://python.org/
  16. Apache 2 license: http://www.apache.org/licenses/LICENSE-2.0.html
  17. John Hopper: https://github.com/zinic
  18. started with Nuke: http://www.giantflyingsaucer.com/blog/?cat=61
  19. project site: http://atomnuke.org/
  20. source code on GitHub: https://github.com/zinic/atom-nuke/

Source URL: http://www.rackspace.com/blog/rackspace-open-sources-atom-nuke-the-fast-atom-framework/