Data Compression and Implementations


In group coding, the correspondence between each bit pattern in the input data and the flux transitions represented by it is independent of all the other bit patterns in the data stream. Each pattern directly corresponds to its own pattern of flux transitions. The group coding system can mindlessly match bit pattern for flux pattern to achieve the optimal storage density of each byte of data.

But group coding does not represent the most efficient way of squeezing information into a storage medium. Many of the bytes in a stream of data are redundant. Their information content could be represented in some other manner using many fewer bytes. Group coding seeks to represent the stream of individual bytes without regard to content, simply ensuring that each pattern can be faithfully reproduced. It does nothing to guarantee that the actual information is encoded in the data stream as efficiently as possible.

In contrast to the local view taken by the group-coding mechanism, data compression systems take a global view. By examining the patterns of bytes rather than the bit patterns inside each byte, the compression system seeks to find patterns that can be more efficiently represented. The goal of the data compression system is to eliminate redundancy, separating the bulk from the content. In effect, the compression system squeezes the air out of the data stream. Data compression can reduce fat files to their slimmest possible representation, which a decompression process can later reconstitute into their original form.

Most compression systems work by reducing recurrent patterns in the data stream into short tokens. For example, the two-byte pattern "at" could be coded as a single byte such as "@", cutting the storage requirement for that pattern in half. Most compression systems don't permanently assign tokens to bit patterns but instead make the assignments on the fly. They work on individual blocks of data one at a time, starting afresh with each block. Consequently, the patterns stored by the tokens of one block may be entirely different from those used in the next block. The key to decoding the patterns from the tokens is included as part of the data stream.
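The token idea can be sketched in a few lines of Python. This is only an illustration, not the scheme any particular product uses: the function names and the one-byte flag at the start of each block are assumptions made for the example. Each block picks its own most common two-byte pattern, replaces it with a byte value that does not otherwise occur in the block, and carries the key along with the data.

```python
from collections import Counter

def compress_block(block: bytes) -> bytes:
    """Replace the most common two-byte pattern in one block with a one-byte token."""
    unused = [b for b in range(256) if b not in block]
    pairs = Counter(block[i:i + 2] for i in range(len(block) - 1))
    if not unused or not pairs:
        return b"\x00" + block              # flag 0: block stored unchanged
    pair, _ = pairs.most_common(1)[0]
    token = bytes([unused[0]])              # a byte value that never occurs in the block
    # Flag 1, then the key (token and the pattern it stands for), then the data.
    return b"\x01" + token + pair + block.replace(pair, token)

def decompress_block(data: bytes) -> bytes:
    """Rebuild the original block using the key stored in front of the data."""
    if data[0] == 0:
        return data[1:]
    token, pair, body = data[1:2], data[2:4], data[4:]
    return body.replace(token, pair)

block = b"at the cat sat at the mat " * 10
packed = compress_block(block)
print(len(block), "bytes before,", len(packed), "bytes after")
```

Because the key costs a few bytes per block, a very short block can come out larger than it went in; the saving only appears once the pattern recurs often enough to pay for the key.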

Disk compression systems put data compression technology to work by increasing the apparent capacity of your disk drives. Generally, they work by creating a virtual drive with expanded capacity with which you can work as if it were a normal (but larger) disk drive. The compression system automatically takes care of compressing and decompressing your data as you work with it. The information is stored in compressed form on your physical disk drive, which is hidden from you.

The compression ratio compares the resultant storage requirements to those required by the uncompressed data. For example, a compression ratio of 90 percent would reduce storage requirements by 90 percent. The compressed data could be stored in 10 percent of the space required by its original form. Most data compression systems achieve about a 50 percent compression ratio on the mix of data that most people use.
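As a rough worked example of that definition, the following Python sketch computes the ratio as the percentage by which storage shrinks. The function name is hypothetical, and Python's standard zlib module stands in for whatever algorithm a given compression product actually uses.

```python
import zlib

def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Percentage by which storage shrinks, following the definition above."""
    return 100.0 * (1 - len(compressed) / len(original))

# Highly repetitive sample text, so the ratio here is far better than typical.
sample = b"Many of the bytes in a stream of data are redundant. " * 40
packed = zlib.compress(sample)
print(f"{len(sample)} bytes -> {len(packed)} bytes, "
      f"ratio {compression_ratio(sample, packed):.0f} percent")
```

The figure printed depends entirely on the data. Text that repeats as heavily as this sample compresses much better than the mixed files found on a typical disk.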

Because the compression ratio varies with the kind of data you store, the ultimate capacity of a disk that uses compression is impossible to predict. The available capacity reported by DOS on a compressed drive is only an estimate based on the assumed compression ratio of the system. You can change this assumption to increase the reported remaining capacity of your disk drive, but the actual remaining capacity (which depends on the data you store, not the assumption) will not change.
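The arithmetic behind such an estimate can be sketched as follows. This is an assumption about the general shape of the calculation, not a description of how DOS computes it internally, and the function name is made up for the example.

```python
def reported_free(physical_free_mb: float, assumed_ratio_percent: float) -> float:
    """Estimate how much uncompressed data fits in the remaining physical space."""
    if not 0 <= assumed_ratio_percent < 100:
        raise ValueError("assumed ratio must be at least 0 and below 100 percent")
    fraction_remaining = 1 - assumed_ratio_percent / 100
    return physical_free_mb / fraction_remaining

# The same 100 MB of physical space, reported under two different assumptions.
print(reported_free(100, 50))   # 200.0
print(reported_free(100, 75))   # 400.0
```

Raising the assumed ratio doubles the reported figure, but the 100 MB of physical space underneath is unchanged.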

Compression Implementations

Compression is a data transformation much like all the other manipulations made by a microprocessor. Consequently, an ordinary software program can convert your PC's microprocessor into an excellent data compressor.

Software-only compression systems such as the one built into MS DOS Versions 6.0 and later (and PC DOS Versions 6.1 and later) can be used to increase disk or tape capacity. Some software compression systems work as software drivers. They intercept the data stream headed for your hard disk, reroute it through a compression algorithm run by your PC's microprocessor, and pass the result to your disk instead of the original data. When the compressed data is later read, the compressed data stream is captured, its bytes are processed by a complementary decompression algorithm, and the results are passed to your application software.
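The intercept-and-reroute arrangement can be modeled in a few lines of Python. This is a toy model under stated assumptions: the class and method names are hypothetical, a dictionary plays the part of the physical disk, and zlib stands in for the driver's real compression algorithm.

```python
import zlib

class CompressingDriver:
    """Toy model of a driver sitting between applications and the physical disk."""

    def __init__(self) -> None:
        self.physical_disk: dict[str, bytes] = {}   # what is actually stored

    def write(self, name: str, data: bytes) -> None:
        # Intercept the outgoing data and store the compressed form instead.
        self.physical_disk[name] = zlib.compress(data)

    def read(self, name: str) -> bytes:
        # Capture the compressed stream and hand back the original bytes.
        return zlib.decompress(self.physical_disk[name])

driver = CompressingDriver()
report = b"Quarterly totals: 100, 100, 100, 100\r\n" * 50
driver.write("REPORT.TXT", report)
print(len(report), "bytes written,",
      len(driver.physical_disk["REPORT.TXT"]), "bytes stored")
```

The application only ever calls write and read and sees its own data; the compressed form exists only on the hidden physical side.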

With older commercial, software-only disk compression systems, these software drivers loaded through your system's CONFIG.SYS and AUTOEXEC.BAT files. This arrangement made the operation of the compression system and your PC confusing. The designers of the compression systems tried to make their disk compression invisible. Because of the way that DOS was designed, the only way the disk compression device driver could create a compressed drive was to give it a new name (drive letter). The disk compression software automatically switched the letter of the compressed drive with the letter assigned to your physical boot drive. The larger capacity compressed drive thus appeared to be drive C: (your boot drive), but your boot drive was actually hidden under some other name. Your CONFIG.SYS and AUTOEXEC.BAT files had to remain uncompressed for DOS to read and load them so that it could load the drivers to read the compressed parts of your disk drives. Although you expected to find these files on your boot drive (nominally drive C:), the files were really on the hidden physical drive.

The real solution was to change the structure of DOS, which Microsoft did with Version 6.0. Before MS DOS 6.0 reads your PC's CONFIG.SYS file, it checks for a separate driver file, DBLSPACE.BIN (renamed DRVSPACE.BIN in Version 6.22), which holds the disk compression driver. If present, this driver gets loaded first, so the compression system is operational even before DOS reads your CONFIG.SYS file. As a result, your CONFIG.SYS is stored on the virtual compressed drive in compressed format, and you access it the same way you would any file. (Your physical drive is still renamed something else, usually Drive H:, but you never need to access it.)

The advantage of software-only compression is that you pay for nothing other than the program (nothing at all if you rely on a recent version of DOS), yet you almost miraculously get about twice the storage space. As with any software, however, software-only disk compression imposes additional system overhead. In older PCs with 386 and earlier microprocessors, this overhead can be sufficient to slow disk response. You can take the load off your older microprocessor with hardware-based compression using a compression coprocessor board. The coprocessor substitutes its power for that of your microprocessor in compression operations, eliminating the performance handicap of software compression.

Lossless Versus Lossy Compression

Most compression systems assume that you want to get back every byte and every bit that you store. You don't want numbers disappearing from your spreadsheets or commands from your programs. You assume that decompressing the compressed data will yield everything you started with, without losing a bit. The processes that deliver that result are called lossless compression systems.

Sometimes, however, your data may contain more detail than you need. For example, you might scan a photo with a true-color 24-bit scanner and display it on an ordinary VGA system with a color range of only 256 hues. All the precise color information in your scan is wasted on your display, and the substantial disk space you use for storing it could be put to better use.

Analog images converted to digital form and analog audio recordings that are digitized often contain subtle nuances beyond the perception of most people. Some data reduction schemes, called lossy compression systems, ignore these fine nuances. Although the reconstituted data does not exactly replicate the original, lossy compression is often good enough for viewing or listening to the restored data. Because lossy compression systems work faster than lossless schemes and because their resulting compression ratios are higher, they are often used in time- and space-sensitive applications such as digital image and sound storage.
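A very crude form of lossy reduction can be sketched in Python to show the trade-off. This is an illustrative assumption, not how any real image or audio format works: the hypothetical functions below simply keep the top four bits of each 8-bit sample and pack two samples into one byte.

```python
def lossy_pack(samples: bytes) -> bytes:
    """Keep only the top 4 bits of each 8-bit sample, two samples per byte."""
    if len(samples) % 2:
        samples += b"\x00"      # pad to an even count; a real format would record the length
    return bytes((samples[i] & 0xF0) | (samples[i + 1] >> 4)
                 for i in range(0, len(samples), 2))

def lossy_unpack(packed: bytes) -> bytes:
    """Restore 8-bit samples; the discarded low 4 bits come back as zeros."""
    out = bytearray()
    for byte in packed:
        out.append(byte & 0xF0)
        out.append((byte & 0x0F) << 4)
    return bytes(out)

pixels = bytes([200, 13, 255, 128, 7, 64])
restored = lossy_unpack(lossy_pack(pixels))
print(list(pixels))
print(list(restored))   # close to the original values, but not identical
```

The packed data takes half the space, and every restored sample is within 15 of its original value, but the fine detail in the low bits is gone for good. A lossless system, by contrast, must return the original bytes exactly.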



This article was sent to us by: Ralph Kitzper at 06012010
