Tuesday, October 26, 2010

What is Reseller Hosting?

Apache log file analysis and configuration, with examples, in this excellent guide.

Configure Web Logs in Apache

Author's Note: While most of this piece discusses configuration options for any operating system Apache supports, some of the content will be Unix/Linux (*nix) specific, which now includes Macintosh OS X and its underlying Unix kernel.
One of the many pieces of the Website puzzle is Web logs. Traffic analysis is central to most Websites, and the key to getting the most out of your traffic analysis revolves around how you configure your Web logs. Apache is one of the most -- if not the most -- powerful open source solutions for Website operations. You will find that Apache's Web logging features are flexible for the single Website or for managing numerous domains requiring Web log analysis.
For the single site, Apache is pretty much configured for logging in the default install. The initial httpd.conf file (found in /etc/httpd/conf/httpd.conf in most cases) should have a section on logs that looks similar to this (Apache 2.0.x), with descriptive comments for each item. Your default logs folder will be found in /etc/httpd/logs. This location can be changed when dealing with multiple Websites, as we'll see later. For now, let's review this section of log configuration.
ErrorLog logs/error_log
LogLevel warn

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent


CustomLog logs/access_log combined
Error Logs
The error log contains messages sent from Apache for errors encountered during the course of operation. This log is very useful for troubleshooting Apache issues on the server side.
Apache Log Tip: If you are monitoring errors or testing your server, you can use the command line to interactively watch log entries. Open a shell session and type "tail -f /path/to/error_log". This will show you the last few entries in the file and also continue to show new entries as they occur.
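Building on that tip, tail can be combined with grep to watch only entries of a given severity. A minimal sketch (the sample log entries and path are hypothetical; point the commands at your real ErrorLog file):

```shell
# Simulated error_log entries, for illustration only.
printf '%s\n' \
  '[Tue Oct 26 10:00:00 2010] [notice] Apache configured -- resuming normal operations' \
  '[Tue Oct 26 10:00:05 2010] [error] [client 10.0.0.5] File does not exist: /var/www/html/favicon.ico' \
  > error_log

# Show only error-level lines. On a live server, replace the filename
# argument style with "tail -f error_log | grep ..." to follow new
# matching entries as they arrive.
grep '\[error\]' error_log
```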
There are no real customization options available, other than telling Apache where to establish the file, and what level of error logging you seek to capture. First, let's look at the error log configuration code from httpd.conf.
ErrorLog logs/error_log
You may wish to store all error-related information in one error log. If so, the above is fine, even for multiple domains. However, you can specify an error log file for each individual domain you have. This is done in the <VirtualHost> container with an entry like this:
<VirtualHost 10.0.0.2>
DocumentRoot "/home/sites/domain1/html/"
ServerName domain1.com
ErrorLog /home/sites/domain1/logs/error.log
</VirtualHost>

If you are responsible for reviewing error log files as a server administrator, it is recommended that you maintain a single error log. If you're hosting for clients, and they are responsible for monitoring the error logs, it's more convenient to specify individual error logs they can access at their own convenience.
The setting that controls the level of error logging to capture follows below.
LogLevel warn
Apache's error log levels, from most to least severe, are: emerg (system is unusable), alert (action must be taken immediately), crit (critical conditions), error (error conditions), warn (warning conditions), notice (normal but significant conditions), info (informational messages) and debug (debug-level messages). Setting a given level captures messages at that level and all more severe levels.
Tracking Website Activity
Often by default, Apache will generate three activity logs: access, agent and referrer. These track the accesses to your Website, the browsers being used to access the site, and the referring URLs that your site visitors have arrived from.
It is commonplace now to utilize Apache's "combined" log format, which compiles all three of these logs into one logfile. This is very convenient when using traffic analysis software as a majority of these third-party programs are easiest to configure and schedule when only dealing with one log file per domain.
Let's break down the code in the combined log format and see what it all means.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat starts the line and simply tells Apache you are defining a log file type (or nickname), in this case, combined. Now let's look at the cryptic symbols that make up this log file definition.
Breaking down the combined format: %h is the client's hostname or IP address, %l is the identd logname (almost always just "-"), %u is the authenticated username, if any, %t is the time the request was received, "%r" is the first line of the request (method, path and protocol), %>s is the final HTTP status code, %b is the size of the response body in bytes, and %{Referer}i and %{User-Agent}i capture those two request headers.
To review all of the available configuration codes for generating a custom log, see Apache's documentation for mod_log_config, the module that powers log files in Apache.
Apache Log Tip: You could capture more from the HTTP header if you so desired. A full listing and definition of the data in the header is found in the World Wide Web Consortium's HTTP specification.
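For example, one could define a custom format that extends combined with the time taken to serve each request (the %D directive, reported in microseconds in Apache 2.0) and the visitor's Accept-Language header. The nickname "timed" and the log file name below are illustrative, not standard:

```apache
# Hypothetical custom format: "combined" plus service time and one extra header
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D \"%{Accept-Language}i\"" timed
CustomLog logs/timed_log timed
```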
For a single Website, the default entry would suffice:
CustomLog logs/access_log combined
However, for logging multiple sites, you have a few options. The most common is to identify individual log files for each domain. This is seen in the example below, again using the log directive within the <VirtualHost> container for each domain.
<VirtualHost 10.0.0.2>
DocumentRoot "/home/sites/domain1/html/"
ServerName domain1.com
ErrorLog /home/sites/domain1/logs/error.log
CustomLog /home/sites/domain1/logs/web.log combined
</VirtualHost>

<VirtualHost 10.0.0.3>
DocumentRoot "/home/sites/domain2/html/"
ServerName domain2.com
ErrorLog /home/sites/domain2/logs/error.log
CustomLog /home/sites/domain2/logs/web.log combined
</VirtualHost>

<VirtualHost 10.0.0.4>
DocumentRoot "/home/sites/domain3/html/"
ServerName domain3.com
ErrorLog /home/sites/domain3/logs/error.log
CustomLog /home/sites/domain3/logs/web.log combined
</VirtualHost>

In the above example, we have three domains with three unique Web logs (using the combined format we defined earlier). A traffic analysis package could then be scheduled to process these logs and generate reports for each domain independently.
This method works well for most hosts. However, there may be situations where this could become unmanageable. Apache recommends a special single log file for large virtual host environments and provides a tool for generating individual logs per individual domain.
We will call this log type the cvh format, standing for "common virtual host." Simply by adding a %v (which stands for virtual host) to the beginning of the combined log format defined earlier and giving it a new nickname of cvh, we can compile all domains into one log file, then automatically split them into individual log files for processing by a traffic analysis package.
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" cvh
In this case, we do not make any CustomLog entries in the <VirtualHost> containers and simply have one log file generated by Apache. A Perl script called split-logfile is included in the support directory of the Apache sources. If you did not compile from source or do not have the sources, you can download the script from the Apache website.
The individual log files created from your master log file will be named for each domain (virtual host) and look like: virtualhost.log.
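If the split-logfile script is not at hand, the splitting itself is simple enough to sketch with standard tools. The following is a minimal illustration (the sample entries and file names are hypothetical) that writes each line of a cvh-format log to a per-domain file keyed on the leading %v field:

```shell
# Two hypothetical entries in the "cvh" format; the first field is %v,
# the virtual host that served the request.
printf '%s\n' \
  'domain1.com 10.0.0.5 - - [26/Oct/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 1043' \
  'domain2.com 10.0.0.6 - - [26/Oct/2010:10:00:01 +0000] "GET /a HTTP/1.1" 404 512' \
  > access_log

# Split on the first field into domain1.com.log, domain2.com.log, etc.,
# stripping the leading vhost so each output line is plain combined format.
awk '{ file = $1 ".log"; sub($1 " ", ""); print > file }' access_log
```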
Log Rotation
Finally, we want to address log rotation. High traffic sites will generate very large log files, which will quickly swallow up valuable disk space on your server. You can use log rotation to manage this process.
There are many ways to handle log rotation, and various third-party tools are available as well. However, we're focusing on configurations native to Apache, so we will look at a simple log rotation scheme here. I'll include links to more flexible and sophisticated log rotation options in a moment.
This example uses a rudimentary shell script to move the current Web log to an archive log, compresses the old file and keeps an archive for as long as 12 months, then restarts Apache with a pause to allow the log files to be switched out.
#!/bin/sh
# Shift the compressed archives up by one, keeping 12 months of history.
mv web11.tgz web12.tgz
mv web10.tgz web11.tgz
mv web9.tgz  web10.tgz
mv web8.tgz  web9.tgz
mv web7.tgz  web8.tgz
mv web6.tgz  web7.tgz
mv web5.tgz  web6.tgz
mv web4.tgz  web5.tgz
mv web3.tgz  web4.tgz
mv web2.tgz  web3.tgz
mv web1.tgz  web2.tgz
mv web.tgz   web1.tgz
# Set the current log aside, then signal Apache to reopen its log files.
mv web.log   web.old
/usr/sbin/apachectl graceful
# Pause so in-flight requests can finish writing to the old file handle.
sleep 300
tar cvfz web.tgz web.old

This code can be copied into a file called logrotate.sh and placed inside the folder where your web.log file is stored (or whatever you name your log file, e.g. access_log, etc.). Just be sure to modify it for your log file names and also chmod the file to 755 so it becomes executable.
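To run the script automatically, it can be scheduled with cron. A hypothetical crontab entry (assuming the script lives alongside the logs in /etc/httpd/logs) might run it early on the first of each month:

```
# min hour dom mon dow  command
10 0 1 * *  cd /etc/httpd/logs && ./logrotate.sh
```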
This works fine for a single busy site. If you have more complex requirements for log rotation, be sure to see some of the following sites. In addition, many Linux distributions now come with a log rotation utility included. For example, Red Hat 9 ships with logrotate, a highly configurable log rotation tool whose per-service rules live in /etc/logrotate.d. To find out more, on a Linux system with logrotate installed, type man logrotate.
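As a sketch of what a logrotate rule for the Apache logs above might look like (the paths and retention period are illustrative; see man logrotate for the full option set), a file such as /etc/logrotate.d/httpd could contain:

```
/etc/httpd/logs/*log {
    monthly
    rotate 12
    compress
    missingok
    notifempty
    postrotate
        /usr/sbin/apachectl graceful > /dev/null 2>&1 || true
    endscript
}
```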

What is Virtual Hosting?

Detailed guide explaining what virtual hosting is.

Virtual hosting offers the appearance of a company's own server with the technical reality of shared space. Through virtual hosting, a developer can secure space on a server and have shared access to the server's features. Hosting companies provide this service by maintaining a large server, and on that large server they maintain a number of virtual web hosts. The machine examines which "name" it is being called by and then responds appropriately. Thus, visitors to the site enter through the domain name of the developer and cannot tell that another company's server in fact hosts the site.

Simply stated by Crowder and Crowder (2000), virtual servers are "nothing more than directories on a hard drive. The webmaster can make each one of the directories seem as though it were a fully functional web server". With a virtual web host, you will have your own identity, but you will not be required to maintain the equipment.

Virtual hosting packages are the most common on the Internet and offer a professional and well-established look for personal and small business web sites. Financially, the average virtual hosting account runs between $15 and $30 per month. Ample space and bandwidth for small businesses, multiple email accounts, cgi-bin access, and a T3 connection are common account features. Additional fees are often required for more advanced features including database software or SSL (secure server) functionality.

What to look for in a Web Host?

When evaluating a web host, price is not the only important factor; the client will need to take other characteristics into consideration.

Host's Connection to the Internet
One of the most important features is the quality of the host's connection to the Internet. There are many variations: T-1, T-3, OC-3, OC-256, etc. (Fig 5). A company that offers a T-1 connection to the Internet can only allow 1.544 Mbps (megabits per second), while a T-3 can allow up to 45 Mbps. An OC-256, by contrast, can allow 13,000 Mbps, giving the ability to transfer much more information at a higher rate of speed before getting bogged down.
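To put those numbers in perspective, here is a back-of-envelope calculation of how long a 10-megabyte (80-megabit) file takes to move over each link. It deliberately ignores protocol overhead and line sharing, so real-world times are longer:

```shell
# Rough transfer times for an 80-megabit (10 MB) file; rates in Mbps.
# e.g. 80 / 1.544 is roughly 52 seconds over a T-1.
awk 'BEGIN {
  n = split("T-1=1.544 T-3=45 OC-256=13000", links, " ")
  for (i = 1; i <= n; i++) {
    split(links[i], kv, "=")
    printf "%s: %.1f seconds\n", kv[1], 80 / kv[2]
  }
}'
```

A T-1 needs roughly 52 seconds for a file a T-3 moves in under two.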
(Fig 5: Connection Types)
Speed
Customers seriously concerned about the speed at which viewers can see their site need to consider how fast the servers are. Although bandwidth and connections are major factors, server speeds are equally important. A server that hosts many sites being accessed simultaneously may get bogged down; no matter how fast the connection is, this can seriously slow a viewer's progress through a site. A simple way to test the speed at which a server responds is called "pinging" a site. This determines how quickly the server can receive and send back a small piece of data through the connection you have to it.

Processor speed is also important. Certain sites will make greater demands on the host's CPU and will consequently run slower - and slow down every other site on the server as well (Beginner's Guide, 2000). Streaming video and audio, discussion forums and message boards, online surveys, and high-level animation all require huge amounts of memory and fast access to the main server. Overloaded processors can slow down a site's transmission considerably.

Server Software
Server software can also affect a site. UNIX and Windows NT are the most common server software environments. Advanced developers should be aware of what applications they will be using and assess which software environment will best suit their needs. Some hosting companies only offer one of the two software options.

Security
Protecting a site's data from unwanted intrusions is another key consideration for the web developer when selecting a host. The hosting company's security protocols should be outlined. Protection from everyday denial of service attacks and the various hacks and cracks that will be attempted on your server is essential. The hosting company should be responsible for upgrading and maintaining these security measures. "The only thing worse than having no security is thinking you have some" (Finding the Host, 2001).

Customer Service
Service is another important aspect to consider when shopping for a host. Hosts offer a variety of customer service options. Services offered can include a 24-hour toll-free number, 24-hour email help, Frequently Asked Questions pages, and help forums. The amount of help you might need depends mainly on your experience and the problems you encounter with the server.

Reliability
Checking out the reliability of a service is also very important. Hosts usually have several backup systems in case something goes wrong with the main servers. They can also promise less "down time" by using backup power systems such as a diesel generator.

Thursday, September 16, 2010

An introduction to domain names, web servers, and website hosting

I assume that you know nothing about the inner workings of the Internet; maybe you're not even sure how people actually get to web sites, where the web sites are actually sitting, what the web is in the first place....
In this article I am going to give you the minimum you need to get your 'feet wet' so that we can quickly get into building web sites. I won't go into painful micro-details that would put all but true nerds to sleep, again there is just enough so that you have a basic understanding of what's going on.

What is the web?

In a nutshell, the web is a whole bunch of interconnected computers talking to one another. The computers (on the web) are typically connected by phone lines, digital satellite signals, cables, and other types of data-transfer mechanisms. A 'data-transfer mechanism' is a nerd's way of saying: a way to move information from point A to point B to point C and so on.
The computers that make up the web can be connected all the time (24/7), or they can be connected only periodically. The computers that are connected all the time are typically called 'servers'. Servers are computers just like the one you're using now to read this article, with one major difference: they have special software installed called 'server' software.

What is the function of server software / programs?

Server software is created to 'serve' web pages and web sites. Basically, the server computer has a bunch of web sites loaded on it and it just waits for people (via web browsers) to request or ask for a particular page. When the browser requests a page the server sends it out.

How does the web surfer find a web site?

The short answer is: by typing in the URL, or in other words, the web site address. So for example, if you wanted to find the web site www.killersites.com, you would type the address into your web browser's address bar or maybe use your 'favorites' or 'bookmarks' link to Killersites.
There are other ways to find web sites (like search engines), but behind the scenes web sites are all being found by going to the web site's official address. That brings us to our last nerd detail: how does a website get an official address so that the rest of the web can find it?

Registering your domain name

If you ever wondered what the heck registering a domain was all about ... you probably figured it out by now! But just in case - registering a domain name gets you an official address for your web site on the World Wide Web. With this 'official' address, the rest of the web can find you.
Just as your home address is unique in the real world, there can't be any duplicate addresses on the Internet either; otherwise no one would know where to go! In other words, domain names are unique addresses on the web.

Why does registering a domain name cost money?

If you want to have your own unique address on the web, your own domain name, it will cost a few bucks for each year you want to 'own' the name. The cost of registering a domain name ranges from less than $10 USD to about $30 USD per year. You can register a domain from 1 to 10 years.
The reason for the cost is that the central 'address book' of all the world's domain names needs to be updated - somebody's got to pay for that! You may have noticed that I just snuck in a little extra piece of information: the giant 'web address book' of domains.
That leads us to our last bit of nerd information: when you type in a website's domain name or click on a link that takes you to that domain name, your browser starts asking servers where that particular domain name is sitting (on the web) and the servers are then able to tell the browser where to go by referring to the giant address book I mentioned above.

How to be a good host: lessons from AdWords

The hosting defence
Article 14 of the E-Commerce Directive states that the provider of an ‘information society service’ that consists of the storage of information provided by a user of the service is not liable for illegal information, so long as the provider didn’t authorize, control or know about the information and removes it promptly when alerted to it. An ‘information society service’ is defined as any service normally provided for remuneration, at a distance, by electronic means and at the individual request of a recipient of the service.

Recital 42 says the service provider’s activity must be limited to a technical, automatic, passive process of operating a communication network over which third-party information is transmitted (or temporarily stored for the sole purpose of making the transmission more efficient).

Areas of confusion
During the Noughties, questions have been raised over who qualifies as an ‘information society service provider’ and what activities can be described as ‘storage of information’. Does Recital 42 really apply to hosts, or just conduits and caching?

Despite the fact that the hosting defence is so heavily relied upon, this is the first ECJ judgment, and the defence has received relatively little consideration by the UK courts, though Continental courts have explored it. In the UK it has been successfully relied upon in libel cases: for comments posted on Usenet newsgroups (Bunt v Tilley) and bulletin boards on websites (Karim v Newsquest). Google’s search engine was held not to be a host (Metropolitan International Schools v Designtechnica). There appears to be a public consensus that displaying third-party content on websites passes the hosting test – though many sites are not ‘normally provided for remuneration’.

Rather than analysing such simple phenomena, the ECJ finds itself jumping in at the deep end with more complex site-user interfaces: AdWords in the Google France cases and, in a year or two, eBay in L’Oréal v eBay.

The hosting defence in the Google France cases
In the Google cases, the question was: if Google’s use of AdWords does not constitute a trade mark infringement, does Google benefit from the hosting defence? It would appear that the purpose of the question was: if Google was not found liable for infringement itself, could it be liable for the infringement by a party that had paid for an AdWord? The ECJ held that Google’s use of trade marks is not an infringement. However, because advertisers that pay for AdWords are potentially liable, if Google were to have secondary liability for an advertiser’s infringement, would Google be shielded by the hosting defence?

The Court took the view that AdWords is an ‘information society service’. Moreover, AdWords fulfils the criterion of being a service limited to operating a communication network over which information is transmitted. Was AdWords ‘storage of information’ provided by the recipient of that service?

Last September the Advocate General said that AdWords nominally fulfilled the notion of hosting but should not benefit from the defence – it was not a ‘neutral information vehicle’ because of Google’s relationship with the advertisers. The ECJ, however, said that Google’s financial relationship with advertisers is not relevant.

In the view of the Court, Google could be said to be ‘storing data’ because it was holding it in its servers’ memory. However, to benefit from the hosting defence the defendant’s activity must be ‘ “of a mere technical, automatic and passive nature”, which implies that that service provider “has neither knowledge of nor control over the information which is transmitted or stored”.’ The ECJ described such a role as ‘neutral’ and not ‘an active role’. The Court has left it to the national court to determine if Google AdWords fits this description but relevant points include: ‘the role played by Google in the drafting of the commercial message which accompanies the advertising link or in the establishment or selection of keywords’ – the terms of the contract between Google and advertisers would help determine this. Perhaps there is a reference here to Google's Keyword Tool, which advertisers can use to choose keywords? If so, the Keyword Tool could be said to play a role that is both ‘active’ and ‘neutral’ in the sense that it is automated!

Storage
The Court held that Google ‘stores’ third-party data by holding it in the memory on its servers. This is true – but doesn’t AdWords go beyond this?

The storage of third-party data for AdWords is part of a process (cf. the discussion of eBay in L’Oreal para 437). Yes, Google holds the advertisers’ copy unchanged but that’s not all it does. It displays that copy at a given time, in a certain order and for a particular purpose. If I put a bicycle wheel on a shelf without altering it, then I am storing it. If I put it on a bicycle and ride it, I may not have altered the wheel, but ‘storage’ would not be the first word that comes to mind to describe what I have done. The website may not be changing the copy but it may be performing other activities around it: processing it, employing it, deploying it.

Which is not to say that the ECJ’s decision to base immunity from liability on ‘neutrality’ isn’t fair.

Lessons
Has this interpretation of the Directive helped clear things up for other sites?

Context doesn’t matter: the fact that it is acceptable for a host’s activities and deployment of third-party data to stretch far beyond static storage will come as welcome news for many, such as eBay. Similarly, if AdWords passes the test of being an activity limited to operating a communication network, then that must let a multitude of UGC sites off the hook.

Money doesn’t matter: receiving individual payments from users doesn’t weaken the defence – it would have trashed eBay if it had.

‘Storage’: Google ticked this box by holding data on its servers. What about parties that don’t have servers? The question of how freely ‘storage’ can be interpreted remains unclear.

‘Information society service provider’: well at least we now know for sure that it can include a website, not just ISPs. However, AdWords really can justify the claim that it is a ‘service’ and is ‘normally provided for remuneration’ – what about blogs and message boards that are free?

‘Lack of control’: this has emerged as central but hazy. It may suggest that sites should be wary of software that guides users’ choices. UGC is often produced through an interaction between users’ input and sites’ creative tools.

Under Article 21 of the E-Commerce Directive, the European Commission is obliged to re-examine the Directive every two years, proposing changes if necessary to adapt it to legal, technical and economic developments. Unfortunately only one report was ever produced. Another was planned for last year but has not yet materialized. When I asked why, the Commission told me it was its ‘prerogative’ to decide whether to produce reports.

The AdWords judgment has answered a few of the questions, but it looks like we’ll have to carry on making up the rules as we go along.

Web Hosting Lessons in the Gulf Oil Spill

(The Hosting News) – As you have no doubt heard – probably every day for the past several weeks – oil and gas exploration company BP experienced a deep water oil rig explosion April 20, 2010 off the coast of Louisiana. The collapse of the rig killed 11 workers, and has released at least 6 million gallons of crude oil into the Gulf of Mexico, according to a Coast Guard and BP estimate of the rate of oil flow out of the broken drill pipe.
If you haven’t actually seen the effects of the oil on the Gulf coast environment, then the devastating photos of the spill’s aftermath might come as a bit of a shock. One of the clearest lessons emerging from this tragedy is this: it is not enough to think about a disaster that might hit at some unspecified time in the future in an unknown way. To be prepared for a major negative event, you have to have all of your actions and systems tested and ready to go at a moment’s notice – for a particular eventuality.
What Web Hosting Companies Can Learn
These lessons, as applied to a web hosting enterprise, can be summed up in the following list of thoughts to prepare you for the worst. My thanks to UK reseller hosting provider 34SP.com for helping out with these concepts. I’m sure you can think up many more.
1. Assume that something really bad will go wrong and that it will happen next week. A key flaw in reacting to the oil spill was complacency built up over years of seeing no spills take place in the Gulf. That resulted in the erroneous assumption that there would be no incidents in the future. Web hosting companies should not make this mistake. Just because your servers and network have performed well for weeks or even months, doesn’t mean they can’t be disrupted suddenly. Spontaneous events such as power outages, loss of network connectivity, malicious activity, or hardware failure are facts of life in the hosting industry. To live in denial, assuming that this will never happen at your hosting firm is a recipe for disaster.
2. Make a comprehensive list of the major things that can go wrong. Start here: make a list of things that would really wreak havoc if they broke. Your main server components, main switching, network connections, backups failing, phones going out – whatever points of failure you can envision in your particular setup – write them down in a list.
3. Rank order your list from ‘worst’ to ‘least worst’. Review your list of your company’s unique potential points of failure, and then imagine that each item broke…badly. Now imagine the fallout on your business. As you imagine each of these scenarios in detail, arrange the list so that the most catastrophic scenarios sit at the top and those that merely hurt, but are not devastating, sit at the bottom. You now have a rank-ordered list of potential problems and can address what to do if they actually occur.
4. Starting with item number 1, create an action plan to deal with each potential disaster.
Get as specific as you can about your response. For example – who exactly on your staff will be notified, and in what order? Which vendors will potentially be involved? Who are those emergency contacts? Document any hardware or software that may be required in an emergency to fix the problem – or better yet – consider any ways to make that system redundant, or consider stocking spare parts to be used in the event of an emergency.
5. Practice having a disaster. I live in Los Angeles. A few weeks ago the Coast Guard conducted a statewide series of drills to simulate an actual disaster scenario. According to a summary in the Los Angeles Times, “The simulation will involve an actor pretending to be a gunman and a fake contamination of hazardous materials near Rainbow Harbor, officials said. The exercises are part of California’s annual two-day homeland security and disaster preparedness drills.” In other words, the emergency response teams imagined a scenario that would cause real devastation and practiced addressing it, as though it were actually happening. This is exactly what will prepare your web hosting company for your imagined ‘worst case’ scenario. Prep your team and discuss what you’re trying to accomplish with the drill. Then conduct the drill as best you can and debrief with the participants. You can always refine your testing and drilling through repetition.
While these five steps are only a start on getting ready for a major negative event at your web hosting company, we can all learn from the Gulf oil disaster: be prepared for things to go wrong, and have a plan ready and practiced for when they do.