Desirable properties of real-time OSes (Part B)
(continuation)

B. DESIRABLE PROPERTIES

As usual, there are conflicting desires, or at least desires that conflict given the current state of the art. These desires fall into the following categories:
  1. Quality of service

  2. Amount of code that must be inspected to assure quality of service

  3. API provided

  4. Relative complexity of OS and applications

  5. Fault isolation: what non-RT failures endanger RT code?

  6. What hardware and software configurations are supported?

Each of these categories is expanded upon below, and later used to compare a number of proposed realtime approaches for Linux. The discussion does go on for some time, which is not surprising given that it is summarizing many hundreds of email messages. ;-)

  1. Quality of Service

    The traditional view is that the entire operating system is either hard realtime, soft realtime, or non-realtime, but this viewpoint is too coarse grained. Different workloads have different needs, and there is disagreement over the exact definitions of these three categories of realtime. For example, (at least) the following two definitions of "hard realtime" are in use:

    1. In the absence of hardware failures, software provably meets the specified deadlines. This is fine and good, but many applications simply do not need this "diamond hard" realtime.

    2. Failure to meet the specified deadline results in application failure. This is OK, but -only- if there is a corresponding required probability of success. Otherwise, one could claim "hard realtime" by simply failing the application every time it tries to do anything, which is clearly not useful.

    A better approach is simply to specify the required probability of meeting the specified deadline in the absence of hardware failure. A probability of 1.0 is consistent with the first definition above. Other applications will be satisfied with a probability such as 0.999999, which might be sufficiently high that the probability of software scheduling failure is "in the noise" compared with the probability of hardware failure. A recent LKML thread called this "metal hard" realtime. Or was it "ruby hard"? ;-)

    Of course, one can increase the reliability of hardware through redundancy, but no hardware configuration provides perfect reliability. For example, clusters can increase reliability, so that the probability of failure of the cluster is p^n, where "p" is the probability of a single node failing and "n" is the number of nodes. Note that this failure probability never reaches zero, so the cluster's reliability, 1 - p^n, never reaches 1, no matter how large "n" is. In addition, this mathematical expression assumes that nodes fail independently and that the failover software is perfectly reliable and perfectly configured. This assumption conflicts sharply with my own experience, in which there has always been a point beyond which adding nodes -decreased- cluster reliability.
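    As a rough illustration of the p^n expression, the following Python sketch compares the idealized cluster failure probability with the software deadline-miss probability implied by the 0.999999 figure. The per-node failure probability of 0.02 is an assumed value chosen only for illustration, and the function name is hypothetical. The model takes node failures to be independent and failover to be perfect, which is exactly the idealization that experience contradicts.

```python
def cluster_failure_probability(p: float, n: int) -> float:
    """Return p**n, the probability that all n nodes fail.

    Idealized model: node failures are independent and the failover
    software is perfect. Both are assumptions, not measurements.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return p ** n


if __name__ == "__main__":
    node_failure = 0.02             # assumed per-node failure probability
    software_miss = 1.0 - 0.999999  # software deadline-miss probability
    for n in range(1, 5):
        hw = cluster_failure_probability(node_failure, n)
        dominant = "hardware" if hw > software_miss else "software"
        print(f"n={n}: cluster failure {hw:.1e}, dominant risk: {dominant}")
```

    Under these assumed numbers, adding nodes eventually pushes the idealized hardware failure probability below the software miss probability, at which point the software is no longer "in the noise". The result is only as good as the independence and perfect-failover assumptions.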

    The timeframe is also critically important. Any system can provide hard realtime guarantees if the deadline is an infinite amount of time in the future. No computer system that I am aware of at this writing is capable of meeting a 1-picosecond scheduling deadline for any task of non-zero duration, but then neither can dedicated digital hardware. Some applications have definite response-time goals. For example, industrial process-control applications tend to have response-time goals ranging from hundreds of microseconds to small numbers of seconds. Other applications can benefit from any improvement in response time -- faster is better, think in terms of Doom players -- but even in these cases there is normally a point of diminishing returns.

    The services used by the realtime application also figure in. Given current disk technology, it is not possible to meet a 100-microsecond deadline for a 1MB synchronous write to disk. Not even if you cheat and supply the disk with a battery-backed-up DRAM. However, many realtime applications need only a few of the services that an operating system might provide. This list might include interrupt handling, process scheduling, disk I/O, network I/O, process creation/destruction, VM operations, and so on. Keep in mind that many popular RTOSes provide very little in the way of services! They frequently leave the complex stuff (e.g., web serving) to general-purpose operating systems.

    Note that each service can have an associated deadline that it can meet. The interrupt system might be able to meet a 1-microsecond deadline, the real-time process scheduler a 10-microsecond deadline, the disk I/O system a 10-millisecond deadline for moderate-sized I/Os, and so on. The deadline that a service can meet might also depend on the parameters, so that the disk-I/O system would be expected to take longer for larger I/Os.

    Furthermore, the probability might vary from service to service or with the parameters to that service. For example, the probability of network I/O completing successfully in minimal time might well be a function of the number of packets transmitted (to account for the probability of packet loss) as well as of packet size (to account for bit-error rate). To make things even more complicated, the probability of meeting the deadline will vary depending on the length of time allowed. Considering the networking example, a very short deadline might not allow the data transmission to complete, even if it proceeds at wire speed. A longer deadline might allow transmission to complete, but only if there are no transmission errors. An even longer deadline might allow time for a limited number of retransmissions, in order to recover from packet loss due to transmission errors. Of course, a deadline infinitely far into the future would allow guaranteed completion, but I for one am not that patient.
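    The three networking cases above (deadline too short, deadline that tolerates no loss, deadline that allows some retransmissions) can be sketched with a deliberately simple model. This is an illustration under stated assumptions, not a description of any real network stack: every packet takes the same fixed time on the wire, losses are independent, a lost packet is detected immediately and resent in the next slot, and the function and parameter names are hypothetical.

```python
from math import comb


def prob_meeting_deadline(n_packets: int, loss_prob: float,
                          deadline_us: int, packet_time_us: int) -> float:
    """Probability that n_packets are all delivered before the deadline.

    The deadline is divided into transmission slots of packet_time_us
    each. Delivery succeeds if at least n_packets of those slots carry a
    packet that is not lost, which is a binomial tail probability.
    """
    slots = deadline_us // packet_time_us
    if slots < n_packets:
        # Even at wire speed with no loss, the data cannot fit.
        return 0.0
    ok = 1.0 - loss_prob
    return sum(comb(slots, k) * ok ** k * loss_prob ** (slots - k)
               for k in range(n_packets, slots + 1))


if __name__ == "__main__":
    # Assumed values: 10 packets, 12 us per packet, 1% packet loss.
    for deadline_us in (100, 120, 144, 240):
        p = prob_meeting_deadline(10, 0.01, deadline_us, 12)
        print(f"deadline {deadline_us} us: P(meet) = {p:.6f}")
```

    In this model, the probability is zero until the deadline covers one slot per packet, equals (1 - loss_prob)^n_packets when there is no slack, and approaches (but never reaches) 1 as the deadline allows more retransmission slots.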

    Finally, the performance and scalability of both realtime and non-realtime applications running on the system can be important. Given the current state of the art, one must pay a performance penalty for realtime support, but the smaller the penalty, the better.

    So, to sum up, here are the components of a quality-of-service metric for realtime OSes:

    1. List of services for which realtime response is supported.

    2. For each service:

      1. Probability of meeting a deadline in the absence of hardware failure, ranging from 0 to 1, with the value of 1 corresponding to the hardest possible hard realtime.

      2. Allowable deadline, measured from the time that the request is initiated to the time by which the response must be received.

    3. Performance and scalability provided to both realtime and non-realtime applications.
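    One way to make the first two components of this metric concrete is to record them as data. The Python sketch below is a hypothetical representation, not an existing API: the class names, field names, and the probabilities shown are assumptions for illustration, and the deadlines only echo the "might be able to meet" figures given earlier. The third component (performance and scalability) is left out for brevity. The helper at the end estimates a probability from measured latencies, which can suggest but never prove a probability of 1.0.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ServiceQoS:
    name: str                # service for which realtime response is supported
    deadline_us: float       # allowable deadline, request to response
    meet_probability: float  # P(deadline met), 1.0 = hardest hard realtime


@dataclass
class RealtimeQoS:
    services: List[ServiceQoS] = field(default_factory=list)

    def satisfies(self, name: str, required_deadline_us: float,
                  required_probability: float) -> bool:
        """True if the named service meets the application's needs."""
        for s in self.services:
            if s.name == name:
                return (s.deadline_us <= required_deadline_us
                        and s.meet_probability >= required_probability)
        return False  # service not offered with realtime response


def observed_meet_probability(latencies_us: Sequence[float],
                              deadline_us: float) -> float:
    """Fraction of measured latencies that met the deadline (an estimate)."""
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    met = sum(1 for x in latencies_us if x <= deadline_us)
    return met / len(latencies_us)


if __name__ == "__main__":
    qos = RealtimeQoS([
        ServiceQoS("interrupt", 1.0, 1.0),            # assumed probability
        ServiceQoS("scheduler", 10.0, 0.999999),      # assumed probability
        ServiceQoS("disk-io", 10_000.0, 0.999),       # assumed probability
    ])
    print(qos.satisfies("scheduler", 50.0, 0.9999))   # application needs 50 us
    print(observed_meet_probability([5, 8, 12, 9], 10))
```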

  2. Amount of Code Inspection Required

    So you add a new feature to a realtime operating system. How much of the rest of the system must you inspect and understand in order to be able to guarantee that your new feature provides the required level of realtime response? The smaller this amount of code, the easier it is to add new features and fix bugs, and the greater the number of people who will be able to contribute to the project. In addition, the smaller the amount of such code, the smaller the probability that some well-intentioned bug fix will break realtime response.

    Each of the following categories of code might need to be inspected:

    1. The low-level interrupt-handling code.

    2. The realtime process scheduler.

    3. Any code that disables interrupts.

    4. Any code that disables preemption.

    5. Any code that holds a lock, mutex, semaphore, or other resource that is needed by the code implementing your new feature.

    Of course, use of automated tools could make such inspection much more reliable and less onerous, but such tools would need to deal with the very large number of CPU architectures and configuration options that Linux supports. The smaller the amount of code that must be inspected, the less chance there is that such a tool will fall victim to configuration-architecture combinatorial explosion.

    Each of the Linux realtime approaches uses a different strategy to minimize the amount of code in these categories. These differences are surprisingly important, and will be discussed in more detail when going over the various approaches to Linux realtime.

  3. API Provided

    I never have learned to -really- like the POSIX API, with the gets() primitive being a particular cause of heartburn, but given the huge amount of software out there that relies on it and the equally huge number of developers who are familiar with it, one should certainly strive to provide it, or at least a sizeable subset of it.

    Other popular APIs include the various Java runtime environments, and of course the feared and loathed, but quite ubiquitous, Windows API.

    There are a lot of developers and a lot of software out there. The more of these existing developers and software your API supports, the more successful your realtime facility is likely to be.

  4. Relative Complexity

    How much realtime capability should be added to the operating system? How much of this burden should the applications take on? Is it better to push some of the complexity into a nanokernel, hypervisor, or other software or firmware layer? Let's first look at the tradeoff between OS and application.

    For example, although it is certainly possible to program for separate realtime and non-realtime operating-system instances, doing so adds complexity to the application. Complexity is particularly deadly in the hard realtime arena, and can be literally so if human lives are at risk.

    Balancing this consideration is the need for simplicity in the operating-system kernel. This balancing act must be carefully considered, taking both the relative complexities and the number of uses into account. Some would argue that it is worthwhile adding 1,000 lines to the OS if that saves 100 lines in each of 1,000 applications. Others would disagree, perhaps citing the greater fault isolation that might be provided by the separation.

    But this balance clearly must be struck somewhere between writing the application to bare metal on the one hand (thereby achieving a perfectly simple zero-size operating system) and bloating the operating system beyond the limits of maintainability on the other hand.

    Similar arguments can be made for moving some functionality into a hypervisor or nanokernel layer, though fault isolation also comes into play here.

    Many of the most vociferous arguments seem to revolve around this complexity issue.

  5. Fault Isolation

    Can a programming error in a non-realtime application or in a non-realtime portion of the OS harm a realtime application?

    Some applications do not care: in these cases, a failure anywhere causes a user-visible failure, so it is not important to isolate faults. Of course, even in these cases, it may be valuable to isolate faults in order to aid debugging, but, other than that, the fault isolation does not help overall application reliability.

    In other cases, the realtime portion of the application is protecting someone's life and limb, but the non-realtime portion is only compiling statistics and reports. In this case, fault isolation can be of the utmost importance.

    What sorts of faults need isolating?

    • Excessive disabling of interrupts

    • Excessive disabling of preemption

    • Holding a lock, mutex, or semaphore for too long, when that resource must be acquired by realtime code

    • Memory corruption, either via wild pointers or via wild DMA

    These faults might occur in the main kernel, in a loadable module, or in some debugging tool, such as a kprobe procedure or a kernel-debugger breakpoint script. In the latter case, though, perhaps realtime deadlines should not be guaranteed while actively debugging. After all, straightforward debugging techniques, such as use of printk(), can cause response-time problems even in non-realtime environments.

  6. Hardware and Software Configurations

    Is SMP required? If so, how many CPUs? How many tasks? How many disks? How many HBAs?

    If all the code in the kernel were O(1), it might not matter, but the Linux kernel has not yet reached this goal. Therefore, some applications may choose to restrict the software or the hardware configuration of the platform in order to meet the realtime deadlines. This approach is consistent with traditional RTOS methodology -- RTOS vendors have been known to restrict the configurations in which they will support hard realtime guarantees.
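    To see why a non-O(1) code path forces a configuration limit, consider a hypothetical operation whose worst-case time grows linearly with the number of items it must scan (tasks, disks, and so on). The Python sketch below computes the largest configuration that still fits a deadline. The linear cost model, the function name, and all of the numbers are assumptions for illustration, not measurements of any kernel.

```python
from typing import Optional


def max_supported_items(deadline_us: int, fixed_cost_us: int,
                        per_item_cost_us: int) -> Optional[int]:
    """Largest n with fixed_cost_us + n * per_item_cost_us <= deadline_us.

    Returns None if even an empty configuration misses the deadline.
    """
    if per_item_cost_us <= 0:
        raise ValueError("per_item_cost_us must be positive")
    budget = deadline_us - fixed_cost_us
    if budget < 0:
        return None
    return budget // per_item_cost_us


if __name__ == "__main__":
    # Assumed: 100 us deadline, 20 us fixed overhead, 2 us per item scanned.
    print(max_supported_items(100, 20, 2))
```

    With an O(1) code path the per-item term disappears, so the deadline no longer bounds the configuration size, which is the point of the remark above.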





Use of this site is governed by our Terms of Service and Privacy Policy. Except where otherwise specified, the contents of this site are copyright © 1999-2008 Ziff Davis Enterprise Holdings Inc. All Rights Reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis Enterprise is prohibited. Linux is a registered trademark of Linus Torvalds. All other marks are the property of their respective owners.