UPS (Uninterruptible Power System) units play a vital role in ensuring IT reliability, so their own reliability is also a key consideration: a UPS failure can put mission-critical electrical loads at risk.
So what can organizations do to optimize UPS availability? This white paper argues that the common answer to this question is not the best one. The reliability of a UPS depends more on the overall design of the power system than on the design of the UPS itself (e.g., whether it uses line-interactive or double-conversion technology). The most effective way to improve UPS availability is to minimize overall repair time, for both the UPS and the entire power protection scheme, and to maximize redundancy.
This white paper also challenges the conventional wisdom that “more parts means more failures” and explains why a modular UPS design can provide superior reliability.
The Mean Time Between Failure (MTBF) Confusion
Historically, MTBF (Mean Time Between Failure) has been the key metric UPS manufacturers use to measure and describe UPS reliability. However, MTBF alone is not a convincing predictor of UPS availability.
Consider an example. Suppose a UPS has an MTBF of 200,000 hours. A non-expert might assume the device can run for 200,000 hours (about 23 years) without failure. In fact, UPS manufacturers cannot and do not test their products for 23 years of trouble-free operation. Instead, they calculate a preliminary MTBF value in advance based on the expected lifetimes of the UPS components. Once shipments grow large enough to be statistically significant, they replace these preliminary estimates with figures based on the actual field performance of the installed units. Even these revised figures can be misleading. For example, if 2,500 UPSs perform well over a 5-year study period, the resulting MTBF value could be quite high. But if the design has a service life of only six years, 90 percent of those units could fail in the year after the study period ends.
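The arithmetic behind this example can be sketched in Python. A field-based MTBF estimate is commonly computed as total observed unit-hours divided by the number of failures; the failure count of 50 below is a hypothetical value chosen for illustration, not a figure from this paper. The sketch also converts the 200,000-hour rating into years.

```python
HOURS_PER_YEAR = 8760


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF figure in hours into years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def field_mtbf(units: int, years_observed: float, failures: int) -> float:
    """Point estimate of MTBF from field data: total unit-hours / failures."""
    unit_hours = units * years_observed * HOURS_PER_YEAR
    return unit_hours / failures


# A 200,000 h rating corresponds to roughly 23 years of continuous running.
print(round(mtbf_years(200_000), 1))  # 22.8

# Hypothetical: 2,500 units observed for 5 years, 50 failures recorded.
estimate = field_mtbf(units=2_500, years_observed=5, failures=50)
print(f"{estimate:,.0f} h")  # 2,190,000 h
```

The estimate only reflects the five observed years: a design that wears out in year six would produce exactly the same number, which is why a high field MTBF says little about end-of-life behavior.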
Moreover, there is no universal standard for measuring MTBF. Many government agencies have long required manufacturers to provide calculations according to MIL-HDBK-217F, while many commercial customers have adopted the Telcordia (Bellcore) SR-332 standard. A recent technology industry review found that these methods, while useful, are not the only way manufacturers can assess product reliability. As a result, manufacturers today increasingly focus on Design for Reliability (DFR). Whereas earlier standards focused on individual electrical components and their relationship to the circuits in the product design, DFR focuses on the intended use of the product under various conditions.
Ultimately, however, there is no standard method for measuring the reliability of a UPS operating under load, so it remains difficult to compare one manufacturer’s UPS MTBF values with another’s.
Availability is a more meaningful measure for critical power backup systems. Given the important role UPSs play in data centers, the ability to quickly replace aging or faulty parts is critical. Availability expresses the relationship between MTBF and another metric, MTTR (Mean Time To Repair), which is the time it takes to detect a fault, respond to it, and fully repair it.
Availability is generally expressed as a percentage, often described by its number of nines, indicating the fraction of time a system is up and running over a year. For example, if a UPS has an MTBF of 500,000 hours and an MTTR of 4 hours, its availability is 0.999992, or 99.9992% (500,000 ÷ 500,004). This means the expected downtime for this UPS is about 4.2 minutes per year.
However, although availability is a better indicator of UPS reliability than MTBF, on its own it falls short in some important respects. Specifically, availability does not account for the time consumed by routine maintenance. If a system must be taken out of service every year for scheduled inspection, recalibration, or routine maintenance, its actual operational availability will be lower than the value given by the formula above.
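Written out, the availability formula and the annual-downtime step for this example (assuming 8,760 hours per year) are:

```latex
\begin{align*}
A &= \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
   = \frac{500{,}000}{500{,}000 + 4} \approx 0.999992 \\
\text{Annual downtime} &= (1 - A) \times 8{,}760\ \mathrm{h}
   = \frac{4}{500{,}004} \times 8{,}760\ \mathrm{h}
   \approx 0.0701\ \mathrm{h} \approx 4.2\ \mathrm{min}
\end{align*}
```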
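A minimal sketch of how scheduled maintenance lowers operational availability, assuming the load is unprotected for the whole of each maintenance window. The 8-hour annual window is a hypothetical figure, and the function names are illustrative only.

```python
HOURS_PER_YEAR = 8760


def inherent_availability(mtbf_h: float, mttr_h: float) -> float:
    """A = MTBF / (MTBF + MTTR); ignores scheduled maintenance."""
    return mtbf_h / (mtbf_h + mttr_h)


def operational_availability(mtbf_h: float, mttr_h: float,
                             planned_h_per_year: float) -> float:
    """Fraction of a year the load is protected, counting both expected
    unplanned repair time and scheduled maintenance windows."""
    unplanned_h = (1 - inherent_availability(mtbf_h, mttr_h)) * HOURS_PER_YEAR
    return 1 - (unplanned_h + planned_h_per_year) / HOURS_PER_YEAR


a = inherent_availability(500_000, 4)
# Hypothetical: 8 hours per year of scheduled maintenance without bypass protection.
ao = operational_availability(500_000, 4, planned_h_per_year=8)
print(f"inherent {a:.6f}, operational {ao:.6f}")  # inherent 0.999992, operational 0.999079
```

Even a single 8-hour window per year outweighs the roughly 4 minutes of expected unplanned downtime by two orders of magnitude, which is why maintenance bypass features matter.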
UPS Design and Internal Power Paths
Although increasing the number of power paths within a UPS increases costs, it ensures that critical loads are not interrupted if certain system components such as rectifiers, inverters or internal backup batteries fail.
UPS designs fall into four basic categories:
· A backup (standby) UPS normally passes utility power directly to the IT equipment (ITE). When it detects a power failure, it disconnects the mains supply and powers the load from its battery to protect the system. Some backup systems also provide limited protection during overvoltage or undervoltage conditions by drawing briefly on battery power. A backup UPS improves efficiency and reduces cost, but the protection it provides is not always comprehensive.
· A line-interactive UPS usually regulates the voltage as needed before supplying power to the protected equipment. However, it must switch to battery power to handle frequency anomalies and power outages.
· A double-conversion UPS completely isolates critical loads from utility power, ensuring clean, reliable power for IT equipment. Double-conversion UPSs consume more energy than backup and line-interactive UPSs, so they dissipate more heat in the data center or facility room.
· A double-conversion UPS with multiple operating modes usually runs in a high-efficiency mode, which saves money and energy, while utility power quality is good. When power quality degrades, it automatically switches to the higher level of protection provided by double-conversion mode. Additionally, most double-conversion UPSs with multiple operating modes use a modular design built from standard components, which increases system availability by reducing the time it takes to perform maintenance and repairs.
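The mode-switching behavior of a multi-mode double-conversion UPS can be illustrated with a simplified, hypothetical controller. The nominal 230 V / 50 Hz supply and all tolerance bands below are assumptions for illustration, not values from any product; a real UPS also applies timing, hysteresis, and waveform analysis that this sketch omits.

```python
from enum import Enum


class Mode(Enum):
    HIGH_EFFICIENCY = "high-efficiency"      # load fed via bypass path (assumed behavior)
    DOUBLE_CONVERSION = "double-conversion"  # rectifier + inverter regenerate the output
    BATTERY = "battery"                      # input unusable; inverter runs from battery


# Hypothetical values for illustration only, not from any product datasheet.
NOMINAL_V, NOMINAL_HZ = 230.0, 50.0
ECO_V_TOL, ECO_HZ_TOL = 0.05, 0.5   # tight band: stay in high-efficiency mode
DC_V_TOL, DC_HZ_TOL = 0.20, 5.0     # wider band the rectifier can still accept


def select_mode(volts: float, hz: float) -> Mode:
    """Pick an operating mode from the measured input voltage and frequency."""
    v_dev = abs(volts - NOMINAL_V) / NOMINAL_V  # fractional voltage deviation
    hz_dev = abs(hz - NOMINAL_HZ)               # absolute frequency deviation
    if v_dev <= ECO_V_TOL and hz_dev <= ECO_HZ_TOL:
        return Mode.HIGH_EFFICIENCY
    if v_dev <= DC_V_TOL and hz_dev <= DC_HZ_TOL:
        return Mode.DOUBLE_CONVERSION
    return Mode.BATTERY


for volts, hz in [(231.0, 50.0), (200.0, 50.0), (0.0, 0.0)]:
    print(volts, hz, select_mode(volts, hz).value)
```

The ordering of the checks captures the idea in the text: the UPS prefers the efficient mode, falls back to double conversion when input quality degrades but the rectifier can still use it, and only draws on the battery when the input is unusable.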
These UPS designs differ in their internal power paths. A backup UPS usually has two power paths, both controlled by a single power switch, so if that switch fails, the IT equipment loses power. Most backup UPSs are rated below 2 kVA, so such a failure affects only a small subset of IT equipment.
Figure 1: Power path of a standard backup UPS. If the power switch fails, the IT equipment loses power.
A line-interactive UPS usually has two fully independent power paths, one of which runs through the power (mains) interface. If the mains interface fails, the UPS runs on battery power long enough to allow a graceful shutdown of all connected equipment. Some top-of-the-line line-interactive systems also include a static bypass path that automatically bypasses failed components in the UPS and connects the IT equipment directly to utility power.
Figure 2: Power path of a standard line-interactive UPS
Most double-conversion UPSs have two power paths, one from utility or generator power and one from battery power. The UPS also includes:
· An automatic static bypass switch, which can bypass a failed rectifier or inverter and supply the IT equipment directly from mains power.
· A manual maintenance bypass, which allows technicians to perform system maintenance without interrupting power to the protected load.
Figure 3: Power path of a standard double-conversion UPS
In addition to the two power paths of a standard double-conversion UPS, some double-conversion UPSs with multiple operating modes include an automatic maintenance bypass that automatically bypasses the inverter when the UPS is being serviced or maintained. Furthermore, when a double-conversion UPS with multiple operating modes is used in a modular redundant design, it can automatically decide whether to transfer the load to bypass, ensuring that the load continues to be supported by the UPS’s redundant (backup) capacity during maintenance. This shortens MTTR and lowers the risk of downtime or unplanned outages during maintenance and repair.
Figure 4: Power path for a high-efficiency double-conversion UPS with multiple operating modes
Strategies for Improving UPS Power Path Availability
There are many ways to improve the reliability of the UPS power path:
· Add parallel battery strings: A UPS that relies on a single string of series-connected batteries is at much greater risk of failing to power the load. For example, suppose a large UPS has 40 batteries connected in series (i.e., the positive terminal of each battery is connected to the negative terminal of the next). If any one of these batteries fails, the entire string fails and the UPS cannot supply power properly. If a second string of 40 series-connected batteries is connected in parallel, then when one string fails, the UPS can still run from the healthy string for a period of time, allowing time to start a backup generator or gracefully shut down the load equipment.
Figure 5: For a UPS powered by two series-parallel battery packs, the possibility of the UPS failing to supply power due to battery failure will be reduced
· Install a generator: Battery power is only a temporary solution. Even the longest-lasting battery bank will be exhausted during a prolonged power outage, so a generator is the ideal backup power source for long-term outages.
Figure 6: UPS power path with emergency generator
· Make sure the UPS includes an automatic static bypass switch: If a fault occurs inside the UPS, or the load it powers experiences a severe overload or short circuit, the automatic static bypass switch routes power around the rectifier and inverter so that the utility source or generator supplies the IT equipment directly. The static bypass switch transfers the load in only 3-8 milliseconds, short enough that it should not disturb the normal operation of IT equipment.
Figure 7: UPS power path with built-in static switch
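The battery-string example earlier in this list can be quantified with a short Python sketch. It assumes batteries fail independently, that one failed battery opens its whole string, and that a single healthy string can carry the load for the required time; the per-battery survival probability of 0.99 is a hypothetical value.

```python
def series_string_reliability(r_battery: float, n_batteries: int) -> float:
    """A series string works only if every battery in it works."""
    return r_battery ** n_batteries


def parallel_strings_reliability(r_string: float, n_strings: int) -> float:
    """Parallel strings fail only if every string fails."""
    return 1 - (1 - r_string) ** n_strings


r_batt = 0.99  # hypothetical chance one battery survives the period of interest
one = series_string_reliability(r_batt, 40)
two = parallel_strings_reliability(one, 2)
print(f"one string: {one:.3f}, two strings: {two:.3f}")  # one string: 0.669, two strings: 0.890
```

Forty series-connected batteries at 99% each leave only about a two-in-three chance that the string survives; adding a second parallel string raises that to roughly nine in ten.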
Increasing Availability by Installing UPSs in Parallel
Redundant design logic applies not only to power protection schemes, but also to UPS designs. Building multiple power paths into a power supply design can radically improve system reliability.
Figure 8: System and subsystem reliability. Source: U.S. Department of Defense
Figure 8 illustrates two simple but important points. First, power path components connected in series (such as subsystems A, C, and D) weaken the overall reliability of the system. Second, redundant components connected in parallel (such as subsystem B) enhance overall availability. If any one of subsystems A, C, or D fails, the entire power path stops working. In subsystem B, by contrast, which consists of three components in parallel, if one component fails, the other two take over so that the whole system continues to operate as usual.
In other words, the weakest-link principle applies here: the overall performance of the power chain is limited by its weakest link. Adding redundancy at each point in the chain therefore improves its overall reliability. For this reason, the most reliable power distribution systems usually include multiple independent power paths from the main power source to the electrical load, with as little overlap as possible. In a redundantly configured power system, component failures or routine maintenance will not cause IT equipment to shut down.
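Assuming the layout in Figure 8 is subsystems A, B, C, and D connected in series, with B made of three identical components in parallel, and that failures are independent, the system reliability follows the standard series and parallel rules (R_b denotes the reliability of one component of B):

```latex
\begin{align*}
R_B &= 1 - (1 - R_b)^3 \\
R_{\mathrm{system}} &= R_A \, R_B \, R_C \, R_D
\end{align*}
```

For example, if every subsystem and every component of B had a hypothetical reliability of 0.9, the system would reach 0.9 × 0.9 × 0.9 × 0.999 ≈ 0.728, compared with 0.9 × 0.9 × 0.9 × 0.9 ≈ 0.656 if B were a single component.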
Figure 9: Multiple power paths are branched from the mains power supply to the UPS to supply IT equipment, thereby improving system reliability by adding redundancy
Parallel UPS Architecture
In the UPS industry, there are many ways to deploy systems in parallel. The two most common are a combined series-parallel architecture and a fully redundant parallel architecture.
Figure 10: System architecture for a series-parallel combination deployment in normal operation (top) and with faulty operation (bottom)
A series-redundant configuration is sometimes used when the base load must be supported by two different UPS models, or by UPS systems from two different manufacturers, that cannot be paralleled in a redundant configuration. Combining series and parallel connections works around this limitation.
However, series-parallel architectures provide limited redundancy and depend on a sequence of critical events succeeding to protect the load during a failure:
1.) The failed system must detect the fault.
2.) The failed system must transfer safely to its built-in static switch.
3.) The failed system must disconnect the faulty component from the output power bus.
4.) The backup system must be able to support the full load immediately on demand.
In addition, with a series-parallel architecture, the user must bear the operation and maintenance costs of a UPS that normally runs unloaded.
In general, fully redundant parallel architectures are more reliable, but this also depends on how they are implemented. Some UPSs claim to have a parallel architecture, but in reality only a limited number of components are paralleled. Although such a system provides some redundancy when one of the paralleled components fails, it has no independent subsystems. If a subsystem fails, the entire UPS must be shut down for maintenance.
Figure 11: Parallel architecture with partial built-in redundancy
Other UPS designs give each UPS independent subsystems and point-to-point parallel capability, meaning that each UPS controls itself rather than relying on a master controller; this provides the highest level of reliability. The goal of this parallel architecture is to eliminate as many single points of failure as possible without increasing design complexity. By combining independent subsystems with point-to-point control, it delivers the most reliable system design with the fewest points of failure.
Figure 12: Parallel redundant architecture with point-to-point control and independent subsystems per UPS
Admittedly, a parallel redundant UPS configuration with more components and connection points has more potential points of failure and therefore a shorter MTBF. As a result, IT managers often assume that a parallel architecture with fewer UPSs is more reliable. Adding components to a UPS architecture does eventually reach a point of diminishing returns, but a carefully designed system with more UPSs can provide higher availability than a system with fewer UPSs.
To illustrate, consider two sample parallel redundant architectures protecting a 60 kW load. The first consists of two conventional 60 kW UPSs; the second uses six 12 kW UPSs built from modular standard components.
Now consider how a hardware failure would affect each configuration:
· In the architecture with two 60 kW UPSs, the failed unit can be serviced only by trained professionals. Even if service personnel commit to arriving on site within 4 hours, the total repair time will likely be 6-8 hours, and if the crew does not carry the necessary replacement parts, it can stretch to 24 hours. Throughout this period, the IT equipment is at elevated risk because the system has no UPS redundancy.
· In contrast, the system with six 12 kW UPSs uses hot-swappable electronic and battery modules, so end users can replace faulty components themselves within minutes, provided they have spare parts on hand.
Figure 13: Two system architectures using parallel redundancy to provide power protection for 60 kW loads
Battery considerations provide further evidence. Typical UPS battery life is about 4 years. Thus, the architecture built from 60 kW UPSs may be without redundancy for at least 6 hours every four years because of battery-related work, whereas the architecture built from 12 kW UPSs may lose redundancy for only about 1 hour every four years.
The same holds for electromechanical components such as fans and capacitors, which are wear parts or consumables within the UPS. UPS products designed with hot-pluggable components rarely need downtime to replace them. Therefore, even though the architecture with six 12 kW UPSs has a lower component-failure MTBF than the architecture with two 60 kW UPSs, its MTTR is also much shorter, so its overall availability is still better.
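The trade-off between lower MTBF and shorter MTTR can be checked with a k-out-of-n availability sketch in Python. It treats the 2 × 60 kW system as 1+1 (one unit must be up) and the 6 × 12 kW system as 5+1 (five units must be up), and assumes independent, identical units whose availability is MTBF / (MTBF + MTTR). All MTBF and MTTR inputs are hypothetical, and common-mode failures, the output bus, and load-sharing controls are ignored.

```python
from math import ceil, comb

LOAD_KW = 60


def unit_availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one UPS unit: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def k_of_n_availability(a: float, n: int, k: int) -> float:
    """Probability that at least k of n independent, identical units are up."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def config_availability(unit_kw: int, n_units: int,
                        mtbf_h: float, mttr_h: float) -> float:
    """Probability that enough units are up to carry LOAD_KW on UPS power."""
    k_needed = ceil(LOAD_KW / unit_kw)  # units required to carry the load
    return k_of_n_availability(unit_availability(mtbf_h, mttr_h), n_units, k_needed)


# Hypothetical inputs: modular units get a LOWER MTBF (more parts)
# but a much shorter MTTR (hot-swap with spares on hand).
conventional = config_availability(60, 2, mtbf_h=200_000, mttr_h=8)
modular = config_availability(12, 6, mtbf_h=100_000, mttr_h=0.5)

print(f"2 x 60 kW (1+1): unavailability {1 - conventional:.1e}")
print(f"6 x 12 kW (5+1): unavailability {1 - modular:.1e}")
```

With these inputs the modular array comes out ahead despite halving the per-unit MTBF. Re-running the modular case with `mttr_h=24` (no spares on hand) reverses the result, which matches the paper's point that short repair times, not part counts, drive availability.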
How Batteries Affect Reliability
The UPS’s design philosophy determines how often it uses battery power, which in turn is directly related to the battery’s runtime and service life.
A backup UPS frequently switches to battery mode, which reduces battery runtime and shortens battery life. Moreover, the brief interruptions that occur during these frequent transfers may shut down the IT system. In addition, its output voltage varies over a wide range, which can cause IT power supplies to shut down.
A line-interactive UPS protects against power failures better than a backup UPS, but it must still rely on batteries when switching between normal and regulating modes, or to ride through voltage instability when a generator engine starts.
The battery usage of a double-conversion UPS is more modest. Over a wide input voltage tolerance range, the UPS rectifier and inverter work together to regulate the output voltage without the need for battery power. In addition, the switching time from normal power supply mode to battery power mode is very short, so there is no need to worry about the interruption of power supply to the IT system.
Newer high-efficiency double-conversion UPSs with multiple operating modes use batteries for a similar time and frequency as double-conversion UPSs, and in some cases less. Moreover, these UPSs can be as high as 99% efficient in normal operation. Higher efficiency equates to longer battery runtime and cooler operation, both of which help extend battery life.
Figure 14: Standard power usage patterns for different UPS designs
Summary: Six Key Steps to Maximizing Power System Availability
1. Choose a high-quality, standardized UPS design: Select manufacturers with strong industry credentials and a proven track record. The UPS design should include built-in redundancy for critical components, use multiple power paths and high-performance components, and be built under strict quality control.
2. Choose a UPS with multiple power paths built in: A good UPS design should provide multiple power paths for additional redundancy, including static bypass switches, manual maintenance bypass, or automatic maintenance bypass.
3. Find a UPS that meets your IT equipment’s needs: Some UPSs are inexpensive but cannot properly support the electrical load, which can cause IT equipment resets, data corruption, and even equipment shutdowns. A high-efficiency double-conversion UPS with multiple operating modes can condition power to stay well within the voltage and frequency ranges that IT and industrial equipment allow.
4. Deploy redundant parallel UPSs: This allows for redundant power paths, electronic components, and battery modules to provide the highest reliability protection.
5. Focus on features that can shorten MTTR: Choose a modular system design, and the UPS should use components that are easy to service, such as hot-swappable batteries and electronic components. Fundamentally, MTTR has a greater impact on availability than MTBF.
6. Choose a UPS that draws on its batteries as rarely as possible: A UPS that uses battery power frequently reduces the runtime and service life of its batteries. A high-efficiency double-conversion UPS with multiple operating modes uses its batteries less often, helping to extend battery life. ■