Coding in the Cloud – Rule 4 – Avoid Unnecessary External Dependencies

By Adrian Otto

This continues my series, Rules for Coding in the Cloud – rules I’ve developed after watching applications encounter problems at scale when deployed on Cloud Sites.

Avoid Unnecessary External Dependencies

Time after time on Cloud Sites, a new site comes online that displays information from another web site, such as stock quotes. Let’s say the site sells dump trucks, and the owners want to show stock quotes for CAT and the other equipment manufacturers whose products they sell. Every time there’s a page view, the site makes an outgoing HTTP connection to a stock web site, downloads the stock ticker data for those companies and then displays it as part of the HTML output of their own web site.

This works just fine, provided you’re not doing a whole lot of it. But if your site suddenly becomes exceedingly popular because of press mentions, links from very busy web sites or Twitter, all of a sudden two million people are trying to access your site. Every one of those page views also hits the stock site, which can crash it and take yours down with it.

The first thing the frustrated customer does is ask us why their site crashed. When we look, we see that it jammed up waiting for stocks.whatever.com to respond. In other words, the load does not crash your site on Cloud Sites directly. It crashes the remote site, the stock site in this scenario, and your dependency on that site causes a train wreck that ends in customer frustration.

Lesson learned: be smart about external dependencies. Eliminate all external dependencies you don’t need and be smart about the ones you do, whether they are sites that offer stock quotes, geolocation services, or anything else that requires you to call somebody else’s web service, because you just can’t trust that their site is going to scale as well as your own. This can happen no matter what the size of the external site. We’ve seen it happen when the external site was big, like stocks.yahoo.com. In some cases we’ve clogged stocks.yahoo.com in this very way: they see all of our requests coming from a single place, and because of the way the request routing works, the site becomes completely unreachable from our network. You must not assume that because the remote web site is big or hosted by a big company, it’s running on an infrastructure that’s going to scale when you access it from your web app. That’s not necessarily the case.
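One way to be smart about a dependency you keep is to bound how long a page view will wait for it. The following is a minimal sketch in Python, used here purely for illustration. The function name `fetch_with_timeout`, the two-second default, the injectable `opener` parameter and the `stocks.example` URL are all assumptions made for this example, not part of any Cloud Sites API.

```python
import urllib.request


def fetch_with_timeout(url, timeout_seconds=2.0, fallback=None,
                       opener=urllib.request.urlopen):
    """Return the response body as text, or `fallback` if the remote site
    is slow, unreachable, or otherwise fails at the network level.

    `opener` is a hypothetical hook that defaults to urllib's urlopen; it
    exists so the remote call can be swapped out in tests.
    """
    try:
        # The timeout bounds each blocking socket operation (connect, read),
        # so a hung remote site cannot hold the page view indefinitely.
        with opener(url, timeout=timeout_seconds) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError.
        # Render the page without the remote data instead of failing.
        return fallback


if __name__ == "__main__":
    # Hypothetical URL; .example never resolves, so this prints the fallback.
    print(fetch_with_timeout("http://stocks.example/quote?symbol=CAT",
                             fallback="quote unavailable"))
```

With a fallback in place, a slow stock site costs each page view at most a couple of seconds and a missing ticker, not the whole page.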

An increasingly popular feature to add to sites is geolocation, where you get the location of the person browsing your site. You go to a site, and it might say, “Thanks for browsing from San Antonio. We have a special offer for you in our store at River Center Mall.” These services work by looking up the user’s IP address and using it to determine the user’s location. Some geolocation services are free and not very accurate; others are available for a price and tend to be more accurate. Regardless, this is just the kind of external dependency that can bring down your site. When the service starts responding slowly, every page view on your site waits for it. Since we are charging for the time that your application is running, that slowness translates directly into dollars. Now you’re paying a premium to have geolocation services on your site. If you really must have geolocation, don’t do it with a remote web service. Do it with some kind of a local logic map, like a lookup database that you consult directly and that’s under your own control.
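A local lookup of this kind can be sketched as a sorted table of IP ranges searched with a binary search. This is an illustrative Python sketch, not any vendor’s format: the sample rows use reserved documentation addresses with made-up city assignments, the function names are hypothetical, and the code assumes IPv4 ranges that do not overlap.

```python
import bisect
import ipaddress

# Hypothetical sample rows: (first address, last address, city).
# A real table would be loaded from a geolocation database you host yourself.
SAMPLE_RANGES = [
    ("192.0.2.0", "192.0.2.255", "San Antonio"),
    ("198.51.100.0", "198.51.100.255", "Austin"),
]


def build_index(ranges):
    """Convert dotted-quad ranges to sorted integer rows for binary search."""
    rows = sorted(
        (int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)), city)
        for first, last, city in ranges
    )
    starts = [row[0] for row in rows]
    return starts, rows


def lookup_city(index, ip, default=None):
    """Return the city whose range contains `ip`, or `default` if none does."""
    starts, rows = index
    value = int(ipaddress.IPv4Address(ip))
    # Find the last range that starts at or before the address.
    position = bisect.bisect_right(starts, value) - 1
    if position >= 0:
        first, last, city = rows[position]
        if first <= value <= last:
            return city
    return default


if __name__ == "__main__":
    index = build_index(SAMPLE_RANGES)
    print(lookup_city(index, "192.0.2.17", default="unknown"))
```

The lookup never leaves your own process, so its cost stays the same no matter how slowly a remote geolocation service is responding.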

Mashups are another popular use case for external dependencies, and they don’t scale well unless you have a way of caching the results from the dependent web site.  If you include a mashup that passes all of your traffic through a remote site, you are trusting that site to scale as well as yours will. Unfortunately, unless that remote site is running on Cloud Sites, it’s probably not going to scale well, simply because it’s not backed by hundreds of servers.

If you must reference external data, be smart about it. I wrote a piece of software that you can use as an example of how to help mitigate this problem. This PHP code allows you to display information from another site, but, because it uses a caching approach, it can get fresh remote data in a generally non-blocking fashion and at reasonable time intervals. You can configure the refresh interval to suit your needs. It allows you to have a remote dependency on your site while limiting the frequency with which you interact with that site.
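The general idea of caching with a refresh interval can be sketched as follows. This is not the PHP software mentioned above but a simplified Python illustration under stated assumptions: the class name `CachedRemoteValue`, its parameters, and the in-process storage are all hypothetical, and the refresh here happens inside the request that triggers it, so it is simpler than a truly non-blocking design.

```python
import time


class CachedRemoteValue:
    """Serve a cached copy of remote data, contacting the remote site at
    most once per refresh interval."""

    def __init__(self, fetch, refresh_seconds=300, clock=time.monotonic):
        self._fetch = fetch                # callable that contacts the remote site
        self._refresh_seconds = refresh_seconds
        self._clock = clock                # injectable so tests can control time
        self._value = None                 # last good value, None until first success
        self._next_attempt = None          # None means "never tried"

    def get(self):
        now = self._clock()
        if self._next_attempt is None or now >= self._next_attempt:
            # Schedule the next attempt before fetching, so a failing remote
            # site is retried once per interval, not on every page view.
            self._next_attempt = now + self._refresh_seconds
            try:
                self._value = self._fetch()
            except Exception:
                # Deliberately broad: any failure keeps the last good value.
                pass
        return self._value


if __name__ == "__main__":
    # Hypothetical fetcher standing in for a real HTTP call.
    quotes = CachedRemoteValue(lambda: "CAT 101.50", refresh_seconds=60)
    print(quotes.get())
    print(quotes.get())  # served from the cache, no second fetch
```

Under this sketch, two million page views within one interval produce a single request to the remote site, and a remote outage leaves visitors with slightly stale data instead of an error. In a hosting model where each request runs in a fresh process, the cached value would need to live in a file or shared cache instead of in memory.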

The bottom line with external dependencies is that they are evil when used blindly. Do everything you can to avoid them, or put a suitable buffer between your web app and any external dependency so that if the remote site does crash, your web app can still run.

About the Author

This is a post written and contributed by Adrian Otto.

Adrian serves as a Principal Architect for Rackspace, focusing on cloud services. He cares deeply about the future of cloud technology, and important projects like OpenStack. He also is a key contributor for and serves on the editing team for OASIS CAMP, a draft standard for application lifecycle management. He comes from an engineering and software development background, and has made a successful career as a serial entrepreneur.


  • http://daveyshafik.com Davey Shafik

    Considering the distributed nature of the cloud, web apps here and there and everywhere; remote data source integration is a huge factor of the current web.

    Saying that external dependencies are evil is irresponsible and outright wrong; however, I agree on one point: Be Smart™ about it.

    Developers should
    1) Set timeouts and handle them gracefully (in your geolocation example above: simply don’t display an offer, or give the user a form to choose their local store)
    2) Cache results (and give your users a way to invalidate it if necessary)
    3) Always find out the limits of the remote service; 100 hits per hour? Perhaps you should cache a little more aggressively.

    - Davey

    P.S.
    Curl is terrible, look into the HTTP Streams layer… it’s enabled by default, 100% cross-platform, zero external libs, and much nicer to use ;)

    • http://www.snipe.net snipe

      Well said, Davey. I agree 100%. To say “Eliminate all external dependencies” is simply absurd. The web is more open now than ever before, with APIs and feeds for just about everything, allowing developers to tie in existing functionality to enrich the user’s experience. Eliminating this would be setting us back 5 years. Do it right, handle exceptions properly, and as Davey said, cache aggressively – but ruling it out altogether is not the answer.

      You know I adore you, Adrian – and I know that part of your job at RS is to address issues like this – but I don’t think you should be suggesting people rule out something as important as this. Instead, recommend that developers handle it in a smart way.

  • http://www.jemery.com Jesse

    Probably worth noting that these are good practices even if you’re not in the cloud ;-) Learned these lessons the hard way years ago. It’s awfully embarrassing to have a site slow down cause of the weather feed…

  • http://www.thirdpartycode.com Vid Luther

    I agree with Jesse completely, nothing mentioned in this blog post is ‘new’. I’ve come across many sysadmins that are horrible programmers, and they usually don’t know anything about the application. In order to avoid headaches for themselves, and myself, all my source code is self contained, and doesn’t depend on ‘external dependencies’, like PEAR modules installed on the system.

    A lot of wordpress plugins today ‘talk to’ twitter, quite a few of them fail, whether you’re on the cloud or not, when twitter goes down. Proper error handling, and maybe some rules for timeouts would be more apt here.

    Asking people to not build mashups on the ‘cloud’ is just asking for trouble. Why would I build a local geoip service, when I could pay and get access to a better service remotely?

    Sure, we can cache some of the responses and better caching techniques could be discussed, but this post is bound to confuse people more than educate them.

    • ray

      I don’t see PEAR as an external dependency since it is installed on the system. Besides, really I’m amazed you don’t use PEAR. You must be a code genius or haven’t really worked on a comprehensive web app ;)

      • http://www.thirdpartycode.com Vid Luther

        Ray,
        I do use PEAR, but I’ve seen too many servers that require some different PEAR module. You never know what version of the module will be on the cloud, so you make sure all your external libs are part of your app. You can use svn externals etc. to keep everything in your app working the way you want it to, and you know exactly what version will be available.

  • http://adrianotto.com Adrian Otto

    Thanks for all your comments. The spirit of this article was not clear enough, so I made a few minor word changes to clarify my point. The whole reason I included an example of a caching methodology is because I recognize that external dependencies are needed sometimes.

    The example I provided of a geo-location service is a real world example. It keeps surprising people, and crashes their sites under load. The problem is real. The web services available for geo-location today, and the related CMS plugins for it are particularly bad, and cost our subscribers a lot of money and aggravation when they turn them on without realizing what’s happening. I suggest using a lookup table because I know that these services have such tables available for sale. Using that approach has proven to work much better than using their relatively slow web services.

    I want developers on the cloud to remember my post and to be careful about the plugins they do use, or code they write so that they don’t repeat the same mistakes that we’ve seen (smart) people make repeatedly.

    About mashups… they are not scalable without proper design to make them so. Simply putting a mashup on the cloud will not make it scale. To make it scale you need sensible timeouts, caching, and ideally a non-blocking design.

    Please take away this message: Be careful about external dependencies. They will burn you if you don’t protect yourself.

    • http://www.thirdpartycode.com Vid Luther

      Adrian,
      Thanks for the clarification. I understood your intent, but I think it was confusing to people who are not as familiar with the problems you see daily.

      I agree, that not thinking about caching etc, will cause the mashup to fail, but that’s not limited to the cloud, it’ll fail on a managed server from Rackspace. All I’m saying is that your suggestions are not exclusive to the cloud. In the end, you make very good points :).

      • http://adrianotto.com Adrian Otto

        Vid, thanks for your comment. Yes, you can take just about all of my suggestions and apply them outside the cloud. In general this series is about things that limit your scalability. Many of the things I will present in this series are about how to avoid the pitfalls that we see people fall into, not necessarily how to make the world’s best cloud app. I will blog more in the future about advanced subjects such as geographic redundancy, data replication, eventual consistency, data stores vs. databases, sharding, and other subjects that are highly specific to application development in the cloud. In this series it’s more about “what not to do”. I’ll start with the basics.

        • http://www.thirdpartycode.com Vid Luther

          Glad to hear that Adrian,
          One thing I’d really like to hear from you is how people deploy applications to the cloud. Lack of rsync makes deployments a fairly manual process. I’d love a shell, or just basic rsync capability, where I can push a tag up from within SVN.

          Maybe there’s a better way your bigger clients are using, but this is my biggest beef with the system. I’ll be detailing my griefs in a separate blog post on my site soon.

          • http://adrianotto.com Adrian Otto

            I’m one step ahead of you. ;-) You’ll see a blog post about that very subject and detailed instructions for how to get just what you need. Keep your eye on our RSS feed.

  • http://www.thirdpartycode.com Vid Luther

    Adrian,
    That’s awesome, I’ll wait on my post till then :).


©2014 Rackspace, US Inc.