Coding in the Cloud – Rule 6 – HTTP Includes

By Adrian Otto

This continues my series on Rules for Coding in the Cloud. These are rules I’ve developed after watching applications encounter problems at scale when deployed on Cloud Sites.

Rule 6: Never use HTTP include. Let me explain.

How does an HTTP include work?

You tell your PHP application, “I want to include a file,” but instead of a file name you supply a URL, which the server must download. Step by step: a client connects to a PHP web server, the web server runs an application, the application opens a file, and that file is actually a URL. The server contacts another server, downloads the URL, and inserts the output into the PHP script.
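To make the extra hop concrete, here is a minimal sketch of the same pattern. The post is about PHP; this sketch uses Python. A tiny HTTP server on the loopback interface plays the “other server,” and the hypothetical `render_page()` function plays your application: while building its own response, it opens a second HTTP connection, waits for the answer, and splices the body into the page. The handler name, the `/footer` path, and the fragment text are illustrative assumptions, and this is exactly the pattern the rule warns against, shown only so the moving parts are visible.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FragmentHandler(BaseHTTPRequestHandler):
    """Plays the 'other server' that owns the included URL (hypothetical)."""

    def do_GET(self):
        body = b"<p>fragment from another server</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def render_page(include_url):
    # The pattern Rule 6 warns against: while building its own response, the web
    # server opens a second HTTP connection, waits for it, and splices in the body.
    with urllib.request.urlopen(include_url, timeout=5) as resp:
        fragment = resp.read().decode("utf-8")
    return "<html><body>" + fragment + "</body></html>"


# Start the "other server" on a free loopback port in a background thread.
remote = HTTPServer(("127.0.0.1", 0), FragmentHandler)
threading.Thread(target=remote.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{remote.server_port}/footer"
page = render_page(url)
print(page)

remote.shutdown()      # stop the serve_forever loop
remote.server_close()  # release the listening socket
```

Every page view now costs two HTTP requests, and your page can render only as fast as the other server answers.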

Why is this a problem?

This creates not only a serious security problem but also a performance problem. Worse, it opens the door to a potentially disastrous outcome: an infinite loop in an elastic server environment. You can accidentally create an HTTP include that pulls in something from your own site, which includes something from your site, which includes something from your site, and… well, you get the idea. If you do that, a single client connection will open a connection to your own site, which opens another, over and over, until you have 50,000 of them running in parallel. Eventually a new connection hits the limit on how many you’re allowed to create, that request fails, and the entire chain unwinds all the way back to the original request. The application then carries on as if nothing had happened.

Unfortunately, you may not become aware of the issue until you receive a bill showing an outrageous amount of compute cycle usage. The cloud had to do a huge amount of work that you couldn’t even see! That’s the really scary part of this scenario: the site looks like it’s working just fine. When you browse your site, pages come up relatively quickly because the load is spread across the entire elastic system. Meanwhile, The Rackspace Cloud is receiving alerts, and you may not even know that your site has done the equivalent of 50,000 hits for every single hit.
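The runaway can be modeled without any network at all. The following Python sketch is a toy model, not how Cloud Sites actually accounts for connections: the hypothetical `SimulatedServer` has a connection cap (50 here, a stand-in for the much larger limit described above, kept small so the recursion stays shallow), and its page renders by “HTTP-including” a URL on the same site. In this model, much as a failed PHP include emits a warning and lets the script continue, a refused connection simply yields an empty fragment, so the page still renders.

```python
class ConnectionLimitExceeded(Exception):
    """Raised when the simulated server refuses to open another connection."""


class SimulatedServer:
    def __init__(self, max_connections):
        self.max_connections = max_connections  # hypothetical cap on simultaneous connections
        self.open_connections = 0               # connections currently held open
        self.total_requests = 0                 # every request the server ever handled

    def handle(self, path):
        """Accept one HTTP request, or refuse it if the cap is already reached."""
        if self.open_connections >= self.max_connections:
            raise ConnectionLimitExceeded(path)
        self.open_connections += 1
        self.total_requests += 1
        try:
            return self.render(path)
        finally:
            self.open_connections -= 1  # the connection closes once the response is built

    def http_include(self, path):
        # Stand-in for including "http://<your-site>" + path: the server calls itself.
        return self.handle(path)

    def render(self, path):
        # The page includes a fragment from its own site, which turns out to be itself.
        try:
            fragment = self.http_include(path)
        except ConnectionLimitExceeded:
            fragment = ""  # the innermost include fails quietly; the page renders anyway
        return "<div>" + fragment + "</div>"


server = SimulatedServer(max_connections=50)
page = server.handle("/index.php")  # ONE request from the browser
print(server.total_requests)        # 50 requests actually handled by the server
```

The browser made one request and received a normal-looking page, yet the server handled 50. In this model the hidden work grows one-for-one with the cap, which is the invisible cost described above.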

You may also inadvertently involve someone else’s site. If two sites depend on each other, they can end up fighting back and forth, creating a massive loop. Because the server, not the browser, is making these HTTP connections, the browser is completely unaware of them, so its own loop protection (such as its cap on redirect chains) can’t prevent it. And there’s no way to break the loop, because there’s no way to see where it starts.
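A toy Python model of two hypothetical sites that HTTP-include from each other shows why such a loop is so hard to trace: apart from the very first request, every entry in each site’s access log names the other server as the client, not a browser. `Site`, its `partner` attribute, and the per-site cap of 50 are illustrative assumptions, not a description of any real hosting platform.

```python
class Site:
    def __init__(self, name, max_connections):
        self.name = name
        self.max_connections = max_connections  # hypothetical per-site connection cap
        self.open_connections = 0
        self.access_log = []   # who connected to this site, in order
        self.partner = None    # the other site this one HTTP-includes from

    def handle(self, client):
        if self.open_connections >= self.max_connections:
            raise ConnectionRefusedError(f"{self.name}: connection limit reached")
        self.open_connections += 1
        self.access_log.append(client)
        try:
            try:
                # Server-side include from the partner site; the browser never sees it.
                fragment = self.partner.handle(client=self.name)
            except ConnectionRefusedError:
                fragment = ""  # the include fails quietly and rendering continues
            return f"<div>{fragment}</div>"
        finally:
            self.open_connections -= 1


site_a = Site("site-a", max_connections=50)
site_b = Site("site-b", max_connections=50)
site_a.partner, site_b.partner = site_b, site_a

site_a.handle(client="browser")  # one visit from one browser
print(len(site_a.access_log) + len(site_b.access_log))  # 100 requests in total
print(site_a.access_log.count("browser"))                # only 1 came from a browser
```

Looking at either site’s log alone, almost all of the traffic appears to come from the other site, which is why neither owner can easily tell where the loop began.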

There is more than one way to do an HTTP include. One of them lets you include PHP code from a remote URL and execute it as part of the local application. This feature (a gaping security hole, controlled in PHP by the allow_url_include setting) is disabled on Cloud Sites. What does work is calling fopen() with a URL as the argument. This lets you read data from the resulting file handle and process it (possibly just printing it out to the browser). Resist the temptation to eval() any of that output.
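If you do read remote content through a URL-capable open call, treat what comes back strictly as data. A minimal Python sketch, assuming a binary file-like object such as the one `urllib.request.urlopen()` returns (the closest Python analogue to PHP’s fopen() on a URL): the hypothetical `render_remote_data()` decodes the bytes and HTML-escapes them, so even a payload of PHP source stays inert text. Escaping is this sketch’s design choice rather than something the post prescribes; the essential point is that the content never reaches eval() or exec().

```python
import html
import io


def render_remote_data(stream, encoding="utf-8"):
    """Read all bytes from a binary file-like object and return them as inert HTML text.

    The content is decoded and HTML-escaped for display. It is never passed to
    eval() or exec(), no matter what it looks like.
    """
    raw = stream.read()
    text = raw.decode(encoding, errors="replace")
    return html.escape(text)


# An in-memory stand-in for a remote response that happens to contain PHP code.
remote_response = io.BytesIO(b"<?php system($_GET['cmd']); ?>")
safe = render_remote_data(remote_response)
print(safe)  # &lt;?php system($_GET[&#x27;cmd&#x27;]); ?&gt;
```

In real use you would pass the object returned by `urllib.request.urlopen(url, timeout=5)` instead of the in-memory stream; the timeout keeps a slow remote server from stalling your own page indefinitely.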

This may strike you as familiar advice. I covered a similar subject in Rule 4 – Avoid External Dependencies, which included a code example showing how to download content from a remote site on demand, cache a local copy, and provide non-blocking access to that data. This is a separate rule because I’ve seen it broken repeatedly in cases that have nothing to do with external dependencies: the risk here is a circular dependency, whether internal or external. People find reasons to HTTP include content from their own site, but please don’t! What seems like an innocent include can eventually lead to the infinite loop described above.
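When the content you want lives on your own site, the loop-proof alternative is to call the code that produces it rather than request its URL. A minimal Python sketch with hypothetical names: `render_recent_posts()` stands in for whatever a page such as `/recent-posts` would have returned over HTTP, and `render_home()` calls it directly as an ordinary function.

```python
import html


def render_recent_posts(titles):
    """Build the fragment that might otherwise be fetched from /recent-posts on your own site."""
    items = "".join(f"<li>{html.escape(t)}</li>" for t in titles)
    return f"<ul>{items}</ul>"


def render_home(titles):
    # Call the fragment's code directly instead of requesting your own URL over HTTP.
    return "<main>" + render_recent_posts(titles) + "</main>"


home = render_home(["Rule 5", "Rule 6"])
print(home)  # <main><ul><li>Rule 5</li><li>Rule 6</li></ul></main>
```

Because nothing leaves the process, there is no extra HTTP connection to count against your limits and no network round trip to wait on.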

Bottom line: Never use HTTP include.

About the Author

This is a post written and contributed by Adrian Otto.

Adrian serves as a Principal Architect for Rackspace, focusing on cloud services. He cares deeply about the future of cloud technology and important projects like OpenStack. He is also a key contributor to, and serves on the editing team for, OASIS CAMP, a draft standard for application lifecycle management. He comes from an engineering and software development background and has made a successful career as a serial entrepreneur.


  • http://www.thirdpartycode.com/ vid luther

    Great points, once again, Adrian. One side effect of a remote HTTP include is that it can cause your site to crash or stop working. If the server you are including from is down or slow, your visitors will experience a slowdown on your site.

    I know that technically falls under the external dependencies post, but I thought people should be aware of it. I’ve seen a lot of my customers do something similar on the cloud and then curse the cloud for being slow, when in fact the problem is their original programmer’s code.

    • Tulio

      As I understand your comment, it runs counter to what Rule 4 – Avoid Unnecessary External Dependencies – recommends.

  • Blue

    The Rackspace blog is difficult to navigate. Are these rules compiled somewhere?

    Thanks!

©2014 Rackspace, US Inc.