Network+ TechNotes - Fault Tolerance & Disaster Recovery


Fault Tolerance

Fault tolerance refers to software or hardware options that allow a system to continue operating when a particular component fails. The main purpose of fault tolerance is to keep information systems available to users. The following are some of the most common fault-tolerant configurations.

UPS

A UPS (Uninterruptible Power Supply) is a hardware device installed between a power outlet and a system, such as a server, monitor, router, or other network device. When the main power fails, the UPS takes over and powers the system from its battery. This keeps the system running ‘uninterrupted’ long enough either to take it down properly after warning users and closing sessions, or to restore the main power before the UPS itself runs out of power.

There are two main types of UPS. The first is the standby UPS, which is active only when the main power fails. When that happens, the UPS switches to its battery pack to provide power to the connected devices. During this switchover, the power may be interrupted, even if only very briefly. The second type is the online UPS, also referred to as a true UPS, which always provides power from its battery pack. The battery pack is continuously recharged from the main power source, so it continues to provide power during short or longer power outages without any switchover.

Link Redundancy

A faulty network interface card (NIC) or cable can prevent an entire server from providing its services to users. To prevent a NIC and its cable from being a single point of failure for the server or network device, an extra NIC can be installed and connected. If one interface fails, the other automatically continues to operate. Multiple NICs can also be combined to provide load balancing in addition to fault tolerance, which means that network traffic is dynamically divided over the two connections. For increased bandwidth the two links can be combined to act as one, and they still provide fault tolerance by continuing to operate if one link fails. Link redundancy also refers to implementing multiple WAN connections, for example between branch offices or to the Internet.
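
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular teaming driver works: the `Link` class, the `pick_link` function, and the integer `flow_id` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    up: bool = True


def pick_link(links, flow_id):
    """Return the link that should carry a flow, skipping failed links.

    flow_id is a hypothetical integer identifying a conversation; this
    sketch simply uses it to spread flows over the available links.
    """
    healthy = [link for link in links if link.up]
    if not healthy:
        raise RuntimeError("all links are down")
    # Spread flows over the healthy links (load balancing). With only one
    # healthy link left, every flow lands on it (fault tolerance).
    return healthy[flow_id % len(healthy)]


team = [Link("nic0"), Link("nic1")]
before = [pick_link(team, flow).name for flow in range(4)]
team[0].up = False  # simulate a failed NIC or cable
after = [pick_link(team, flow).name for flow in range(4)]
print(before)  # ['nic0', 'nic1', 'nic0', 'nic1']
print(after)   # ['nic1', 'nic1', 'nic1', 'nic1']
```

While both links are up the flows alternate between them; once one link is marked down, all traffic moves to the surviving link without the caller changing anything.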

Mirrored Servers

A more advanced solution is to mirror complete servers, also known as clustering. A cluster contains two or more nodes (servers). If one node fails, another node takes over its duties. This process is known as fail-over. In modern configurations the nodes connect to a shared storage device using fiber optic cabling. The obvious advantage of using clusters is that the availability of services such as file and print sharing, databases, and shared Internet connections is protected. Even though the services can be mirrored on multiple servers, they still appear as a single server to users.
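
As a rough sketch of fail-over, the following Python class models a cluster that answers under one service name while the node doing the work can change. The `Cluster` class, its method names, and the node names are all hypothetical; real cluster software also has to detect failures and coordinate access to the shared storage, which is left out here.

```python
class Cluster:
    """Toy cluster that presents a single service name to users."""

    def __init__(self, service_name, nodes):
        self.service_name = service_name
        self.nodes = list(nodes)  # ordered by preference
        self.failed = set()

    def active_node(self):
        """Return the first node that has not been marked as failed."""
        for node in self.nodes:
            if node not in self.failed:
                return node
        raise RuntimeError("no healthy node left in the cluster")

    def mark_failed(self, node):
        self.failed.add(node)

    def handle(self, request):
        # Users only ever address the service name, never a node.
        return f"{self.service_name}: {request} (served by {self.active_node()})"


cluster = Cluster("fileserver", ["node-a", "node-b"])
first = cluster.handle("open report.doc")
cluster.mark_failed("node-a")  # simulate a node failure
second = cluster.handle("open report.doc")
print(first)   # fileserver: open report.doc (served by node-a)
print(second)  # fileserver: open report.doc (served by node-b)
```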

RAID

RAID (Redundant Array of Inexpensive (or Independent) Disks) allows multiple hard disks to be combined in a set to expand the maximum amount of storage and/or provide fault tolerance in case of a disk failure. RAID is primarily used on servers in corporate environments, but using RAID on workstations is no longer uncommon. The following are the three most common types of RAID that are relevant for the Network+ exam.

RAID 1 refers to Disk Mirroring/Duplexing. This configuration requires two hard disks, which in some cases must be identical. When data is written to a RAID 1 set, it is written to both the primary and the mirrored disk. This may slow down write performance, but it increases read performance since data can be read from both disks at the same time. It is called duplexing when each disk has its own hard disk controller, providing an extra level of redundancy. When a disk fails, the other disk can continue to operate. On the better RAID systems this happens entirely automatically.

RAID 5, also known as a stripe set with parity, is more advanced and requires at least three hard disks. When data is written to the RAID 5 set, it is distributed over all disks, and parity information about the data blocks is stored on a different disk than the blocks it protects. In case of a disk failure, the parity information and the data on the remaining disks can be used to reconstruct the data that was on the missing disk. Because data is spread out over several disks, RAID 5 offers better read performance than single or mirrored disks. However, because every write requires the parity calculation, write performance can be slower, especially when RAID 5 is implemented in software. If two disks in a RAID 5 set fail, you will need to replace the disks and restore the information from backup.
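
Parity in RAID 5 is normally computed with a bytewise XOR, which is what makes reconstruction possible: XOR-ing the surviving blocks of a stripe with the parity block yields the missing block. The sketch below shows this for a single stripe on a three-disk set; the `xor_blocks` helper and the sample data are hypothetical and only illustrate the arithmetic.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical three-disk set: two data blocks plus parity.
data_a = b"NETW"
data_b = b"ORK+"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b and rebuild it from the rest.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b)  # b'ORK+'
```

The same calculation cannot recover two missing blocks from one parity block, which is why a second disk failure means restoring from backup.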

Fault-tolerant RAID configurations implemented in hardware usually offer hot-swappable drives. This means you can pull out and replace a drive while the system is running, and the system will reconstruct the data on the new drive automatically.

Another type is RAID 0, also known as a stripe set. It requires at least two hard disks and does not offer fault tolerance. It is merely a method of combining hard disks to allow for larger storage volumes and, because blocks are spread over several disks, faster access. When a file is written to a RAID 0 stripe set with two disks, the first block is written to the first disk, the second block to the second disk, the third block to the first disk again, and so on. If one of the hard disks in the stripe set fails, the entire stripe set is lost and needs to be rebuilt and restored from backup.
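
The round-robin placement of blocks in a stripe set can be sketched as follows. The `stripe` function and the block labels are hypothetical names used only for this illustration.

```python
def stripe(blocks, disk_count):
    """Distribute blocks round-robin over disk_count disks."""
    disks = [[] for _ in range(disk_count)]
    for index, block in enumerate(blocks):
        disks[index % disk_count].append(block)
    return disks


layout = stripe(["block1", "block2", "block3", "block4", "block5"], 2)
print(layout[0])  # ['block1', 'block3', 'block5']
print(layout[1])  # ['block2', 'block4']
```

Each disk holds only part of every larger file, which shows why losing a single disk makes the whole stripe set unusable.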

Disaster Recovery

Implementing fault tolerance does not mean you have ‘implemented’ disaster recovery. Planning for disaster recovery is an essential task, no matter the level of fault tolerance. The goal of disaster recovery planning is to recover from a disaster as quickly as possible and keep the impact on day-to-day operations to a minimum.

Data Backups

Backing up data to tape regularly is the most common method to prepare for disaster recovery. The following are some important practices to consider when developing a tape backup strategy:

  • Use a carefully planned tape rotation scheme - You should avoid overwriting the data on tapes too frequently. Problems with data may have occurred long before they are discovered, and a recent backup of the data may include those same problems. On the other hand, using a new tape for every single day is often too costly. A common rotation scheme is Grandfather-Father-Son. For example, a "Son" tape is used for a daily incremental backup on Monday through Thursday. These 4 tapes are reused weekly. A "Father" tape is used for a full backup on Friday, and a different Father tape exists for every Friday in a month. These 5 tapes are reused monthly. A "Grandfather" tape is used to perform a full backup on the last business day of each month in a quarter. These 3 tapes are reused quarterly. This method ensures there is always a backup archive of at least 3 months.
  • Store tapes at an off-site location - Imagine a large office complex with several buildings. A company that has offices in two buildings can easily exchange backups at the end of a workday. If one building goes up in flames, the backup tapes will be safely stored in the other building. Having employees store backup tapes at home is not a reliable alternative.
  • Store tapes in a locked fire safe - This doesn't always mean they will be safe from every fire, because the heat can get so intense that the tapes melt anyway, but it is the least you can do.
  • Test backups frequently - A complete and reliable backup system can be a lifesaver for any organization, so it is imperative to make sure the backups can actually be restored. It is also important to test the restore procedure, so that you are prepared and have a guide for when you do need to restore a complete server, for example.
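
The Grandfather-Father-Son example in the first practice above can be sketched as a function that maps a calendar date to a tape. The function names and tape labels are hypothetical, and the sketch makes three assumptions the text does not state: no backup runs on weekends, public holidays are ignored when finding the last business day, and the Grandfather backup takes precedence when the last business day falls on a Friday.

```python
import calendar
from datetime import date


def last_business_day(year, month):
    """Last Monday-to-Friday date of the month (public holidays ignored)."""
    day = calendar.monthrange(year, month)[1]
    while date(year, month, day).weekday() > 4:  # 5 = Saturday, 6 = Sunday
        day -= 1
    return date(year, month, day)


def tape_for(day):
    """Return a (tape label, backup type) pair, or None on weekends."""
    if day.weekday() > 4:
        return None  # assumption: no backups on Saturday or Sunday
    if day == last_business_day(day.year, day.month):
        # Three Grandfather tapes, one per month of the quarter.
        return (f"Grandfather-{(day.month - 1) % 3 + 1}", "full")
    if day.weekday() == 4:
        # One Father tape per Friday of the month.
        return (f"Father-{(day.day - 1) // 7 + 1}", "full")
    names = ["Mon", "Tue", "Wed", "Thu"]
    return (f"Son-{names[day.weekday()]}", "incremental")


print(tape_for(date(2024, 1, 3)))   # ('Son-Wed', 'incremental')
print(tape_for(date(2024, 1, 12)))  # ('Father-2', 'full')
print(tape_for(date(2024, 1, 31)))  # ('Grandfather-1', 'full')
```

Under the precedence assumption used here, a fifth Friday always coincides with the last business day of the month, so this sketch never hands out a fifth Father tape; a scheme that keeps the two backups separate would need all five.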

To understand the various common backup types, you need to know about the archive file attribute. If a file has this attribute turned on, it indicates to the backup software that the file has changed since the archive attribute was last turned off. The archive attribute is turned off by certain types of backup, or manually, for example by using the 'attrib' command line utility or by changing the file properties in Windows Explorer. The table below lists the most common backup types:

Normal/Full

Backs up every selected file, regardless of the archive attribute setting, and clears the archive attribute.

Copy

Backs up every selected file, regardless of the archive attribute setting. Does not clear the archive attribute.

Daily

Backs up every selected file that has changed that day, regardless of the archive attribute setting. Does not clear the archive attribute.

Incremental

Backs up only those files created or changed since the last normal or incremental backup, and clears the archive attribute. This method is used in combination with a periodic full backup. For example, use a Normal/Full backup on Mondays and an incremental backup on the remaining days of the week. In case of a restore, you will need the last normal backup as well as all incremental backups since the last normal backup.

Differential

Backs up only those files created or changed since the last normal or incremental backup, and does not clear the archive attribute. Because the attribute is left on, each differential backup in a scheme without incremental backups contains everything that changed since the last normal backup. This method is also used in combination with a periodic full backup. For example, use a Normal/Full backup on Mondays and a differential backup on the remaining days of the week. In case of a restore, you will need the last normal backup and the last differential backup.
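
The effect of the archive attribute on each backup type can be simulated with a small sketch. The `run_backup` function and the file names are hypothetical, and the Daily type is left out because it selects files by modification date rather than by the archive attribute.

```python
def run_backup(files, kind):
    """Return the sorted file names that a backup of the given kind copies.

    files maps a file name to its archive attribute (True means changed
    since the attribute was last cleared). The dict is updated in place.
    """
    if kind in ("normal", "copy"):
        selected = sorted(files)
    elif kind in ("incremental", "differential"):
        selected = sorted(name for name, archive in files.items() if archive)
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("normal", "incremental"):
        for name in selected:
            files[name] = False  # clear the archive attribute
    return selected


files = {"a.doc": True, "b.xls": True, "c.txt": True}
monday = run_backup(files, "normal")           # all three files
files["a.doc"] = True                          # a.doc is edited on Tuesday
tuesday = run_backup(files, "differential")    # ['a.doc']
files["b.xls"] = True                          # b.xls is edited on Wednesday
wednesday = run_backup(files, "differential")  # ['a.doc', 'b.xls']
print(monday, tuesday, wednesday)
```

With "incremental" in place of "differential", Wednesday's backup would contain only b.xls, which is why a restore then needs every incremental backup since the last normal backup rather than just the most recent one.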


Hot and cold spares

Hot spare devices are fully configured spare devices that are identical to production devices and can be used to quickly replace a system in case of a disaster. Examples include routers, switches, and complete servers. Hot-spare systems are also referred to as standby systems. A cold spare is a device identical or similar to a device that is operational in the network, but it is not configured and does not contain any data. For example, when a disruptive event hits a production server, the server is replaced with a cold spare. The spare then needs to be configured, and data backups need to be restored, before it can serve clients again.

Alternate Sites

A more rigorous solution to ensure business continuity is an alternate site. These come in various shapes and sizes, from fully equipped data centers to empty buildings, and are often divided into three categories: hot, warm, and cold sites.

  • Hot site – A remote facility with power, heating, ventilation, network equipment, local and remote network connections, fully configured servers and clients, and anything else that is needed to continue the primary business operations as soon as possible after a disaster has occurred. Data from the original site must be replicated to the hot site very frequently. This usually requires a high-speed connection between the original site and the hot site.
  • Warm site – A remote facility with power, heating, ventilation, and ‘some’ network equipment and business-critical systems. This site can usually be made operational by restoring backups and configuring clients, servers, and network devices.
  • Cold site – A remote facility with power, heating, and ventilation. A cold site usually doesn’t contain any hardware, and is basically just an empty space. To make a cold site operational, new equipment must be installed and configured, and data needs to be restored from backup.

 

Current related exam objectives for the Network+ exam:

3.11 Identify the purpose and characteristics of fault tolerance:
- Power
- Link redundancy
- Storage
- Services

3.12 Identify the purpose and characteristics of disaster recovery:
- Backup / restore
- Offsite storage
- Hot and cold spares
- Hot, warm and cold sites




Author: Johan Hiemstra




 


 

All images and text are copyright protected, violations of these rights will be prosecuted to the full extent of the law.
2002-2011 TechExams.Net

TechExams.Net is not sponsored by, endorsed by or affiliated with CompTIA. CompTIA A+, Network+, Security+, Linux+, Server+, CTT+, and the CompTIA logo are trademarks or registered trademarks of CompTIA in the United States and certain other countries. All other trademarks, including those of Microsoft, Cisco, and CWNP, are trademarks of their respective owners.