Data Storage
• Data is stored in a computer system to be accessed by the processor.
• It is a process that allows the computer system to retain information temporarily or permanently.
• This data is usually held in optical or electromagnetic form.
Types of Data Storage:
• There are two types of data storage, i.e. primary and secondary.
• The primary storage retains data in RAM (Random Access Memory), ROM (Read Only Memory), or L1 & L2 cache.
• The secondary storage stores data in hard disks, RAID (Redundant Array of Independent Disks) systems, Zip drives, etc.
• Primary storage is faster to access whereas secondary storage can store more data.
• Primary storage is also known as Main Storage whereas Secondary Storage is also known as Auxiliary Storage.
File Compression:
• It is a process that allows you to package a single file or multiple files to use less disk space.
• There are two types of file compression:
1. Lossy file compression
2. Lossless file compression
Lossless File Compression:
• This file compression allows the original file to be reconstructed when uncompressed.
• It is best for file formats where data loss can damage the information. E.g. account statements, attendance spreadsheets, etc.
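The round trip described above can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as the lossless compressor; the spreadsheet-like sample data and the `roundtrip` helper are hypothetical and only serve the illustration.

```python
import zlib


def roundtrip(data):
    """Compress data losslessly, then decompress it again."""
    compressed = zlib.compress(data)
    restored = zlib.decompress(compressed)
    return compressed, restored


# Hypothetical attendance data with a lot of repetition.
original = b"name,present\n" + b"student,yes\n" * 200
compressed, restored = roundtrip(original)

print(len(compressed) < len(original))  # True: the compressed copy is smaller
print(restored == original)             # True: every byte is recovered
```

Because the restored bytes are identical to the original, no figure in an account statement or spreadsheet can change.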
Lossy File Compression:
• In contrast to lossless compression, lossy compression removes data considered unnecessary in order to compress the files.
• The original file cannot be reconstructed.
• It is used where some loss of quality does not harm the information, e.g. MP3 and JPEG.
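Why the original cannot be reconstructed can be seen in a toy sketch. It is not how MP3 or JPEG actually work; it only assumes a simple quantization step (the `quantize` and `dequantize` helpers and the sample values are hypothetical) that throws away the low-order detail of each value.

```python
def quantize(samples, step=16):
    """Lossy step: keep only how many whole steps fit into each sample."""
    return [s // step for s in samples]


def dequantize(codes, step=16):
    """Best-effort reconstruction: the discarded remainder is gone."""
    return [c * step for c in codes]


samples = [200, 203, 207, 90, 95]   # hypothetical 8-bit sample values
codes = quantize(samples)           # each code now fits in 4 bits
approx = dequantize(codes)

print(codes)              # [12, 12, 12, 5, 5]
print(approx)             # [192, 192, 192, 80, 80]
print(approx == samples)  # False: the exact originals are lost
```

The reconstructed values are close to the originals, which is acceptable for sound or images but not for data such as account statements.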
File Formats:
• In computer systems, there are various types of file formats. Following are the ones we will discuss in detail:
- MP3
- MP4
- MIDI
- JPEG
- Text and numbers
MP3:
• MP3 is a technology that compresses music files.
• It is also known as audio compression.
• It compresses a typical music file by about 90%.
• E.g. a 100 MB music file can be converted into an MP3 file with a size of about 10 MB.
• These types of files can be used in cellphones, computers or MP3 players.
• The music files are compressed using a technology known as ‘Perceptual Music Shaping’.
• This technology removes the sounds that the human ear cannot hear, so the file is compressed by discarding part of the music without noticeably affecting its overall quality.
• It uses a Lossy Format for compression.
MP4:
• In contrast to the MP3 format, the MP4 format can store not only music but also video, animation, photos, etc.
• Using this format, videos can be streamed over the internet without a noticeable loss of quality.
JPEG:
• JPEG stands for Joint Photographic Experts Group.
• JPEG is an image file format that reduces the image resolution, i.e. the number of pixels per centimeter, to store the image file.
• When the image file is compressed, its size is reduced at the cost of quality.
• Since JPEG reduces the file size by sacrificing quality, it is also an example of lossy compression.
• The original quality cannot be reconstructed once the file is compressed.
MIDI (Musical Instrument Digital Interface):
• It is a standard that allows sound to be represented in binary format.
• It stores the sound description, not the sound itself.
• It stores a series of control messages containing sound events e.g. pitch, volume, and duration.
• When these control messages are received by a MIDI-compatible device, they are interpreted and the sound is reproduced.
• MIDI data can also be compressed; however, it does not need any special compression algorithm.
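To make the idea of "a description of the sound" concrete, here is a minimal sketch of two MIDI control messages built as raw bytes. It assumes the standard Note On (status 0x90) and Note Off (status 0x80) channel messages; the helper names `note_on` and `note_off` are hypothetical.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: status, pitch, volume (velocity)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity 0-127")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """Build a 3-byte Note Off message for the same pitch."""
    if not (0 <= channel <= 15 and 0 <= note <= 127):
        raise ValueError("channel must be 0-15; note 0-127")
    return bytes([0x80 | channel, note, 0])


# Middle C (note 60) at velocity 100 on the first channel.
start = note_on(0, 60, 100)
stop = note_off(0, 60)

print(start.hex())  # 903c64
print(stop.hex())   # 803c00
```

The duration of the note is not stored in either message; it follows from the time between the Note On and the matching Note Off. Six bytes describe the whole note, whereas a recording of the same sound would need thousands of samples.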
Text & Numbers:
• Text and numbers can be stored in various formats.
• Typically, the text is stored in ASCII.
• However, numbers can be stored in different number formats. E.g. real numbers, date, time, integers, currency, etc.
• Files containing numbers are compressed losslessly since this type of data cannot be compromised.
• Text can also be compressed, using algorithms that exploit redundancy, i.e. repeated patterns, in the text.
• The compression of text is also lossless.
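As a small illustration of exploiting redundancy, the sketch below uses run-length encoding, one of the simplest lossless schemes. Real text compressors use more elaborate algorithms; the `rle_encode` and `rle_decode` helpers and the sample string are hypothetical.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with (character, count)."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def rle_decode(runs):
    """Rebuild the original text from the (character, count) pairs."""
    return "".join(ch * count for ch, count in runs)


text = "AAAABBBCCD"
print(list(text.encode("ascii"))[:4])  # [65, 65, 65, 65]: ASCII code of 'A'

runs = rle_encode(text)
print(runs)                      # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(runs) == text)  # True: the compression is lossless
```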
Error Checking Methods
Introduction:
• When you transmit data, there is always a risk of data corruption, e.g. due to faults in communication equipment, noise, etc.
• In compressed data, the risk of losing information increases since redundancy has already been reduced to a minimum to shrink the file size.
• Therefore, error control measures are taken to make sure the data that is transferred through communication channels is error-free.
• These error control measures usually contain error detection and correction.
• Error detection detects the errors in the data or message while error correction is the process of reconstruction of the original data.
Error Detection & Correction Methods:
1. Parity
2. Checksum
3. Check Digit
4. Automatic Repeat Request (ARQ)
1. Parity:
• In this error detection method, a parity bit is added to the original message.
• A system that uses even parity counts the occurrences of 1s; it adds a 0 parity bit if the count is already even, and a 1 parity bit if the count is odd, so that the total number of 1s becomes even.
• In an odd parity system, the number of 1s, including the parity bit, needs to be odd.
Example 1:
Consider the 7-bit value 1101100, to which a parity bit is added to form a byte.
• If this byte is using an even parity system, then the parity bit needs to be ‘0’ since the number of occurrences of 1s is already even.
• However, if it is using an odd parity system then the parity bit needs to be ‘1’ to make the number of occurrences of 1s odd.
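The rule applied in Example 1 can be sketched as a short function. The name `parity_bit` and the choice to represent bits as a string are assumptions made for the illustration.

```python
def parity_bit(bits, even=True):
    """Return the parity bit ('0' or '1') for a string of data bits."""
    ones_are_even = bits.count("1") % 2 == 0
    if even:
        # Even parity: the total number of 1s must end up even.
        return "0" if ones_are_even else "1"
    # Odd parity: the total number of 1s must end up odd.
    return "1" if ones_are_even else "0"


data = "1101100"                     # four 1s
print(parity_bit(data, even=True))   # 0
print(parity_bit(data, even=False))  # 1
```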
Example 2:
Now consider the following example bytes and identify the parity system used in each one of them.
• In this byte, the parity system used is odd since the number of occurrences of 1s is odd.
• In this byte, the parity system used is even.
Example 3:
Consider an example in which an even parity system (with a vertical parity check) is used to transmit 9 bytes of data. The following table shows the data at the receiving end.
• If this table is studied carefully, it can be seen that:
• Row 8 has incorrect parity, i.e. the number of occurrences of 1s is not even, so the parity should have been 1.
• Column 5 also has an odd number of occurrences of 1s, so its parity bit is wrong.
• This information reveals that the error has occurred at the intersection of column 5 and row 8.
• Byte 8 is corrected by flipping the bit at that intersection.
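The row-and-column reasoning can be sketched in code. The block below does not reproduce the table from these notes; it builds its own hypothetical block of eight data bytes plus one parity byte, corrupts a single bit, and then locates it. The helper names `build_block` and `locate_error` are assumptions.

```python
def build_block(data_rows):
    """Append an even parity bit to each row, then append a parity byte."""
    block = [row + [sum(row) % 2] for row in data_rows]
    parity_byte = [sum(col) % 2 for col in zip(*block)]
    return block + [parity_byte]


def locate_error(block):
    """Return the 1-based (row, column) of a single flipped bit, or None."""
    bad_rows = [r for r, row in enumerate(block, start=1) if sum(row) % 2]
    bad_cols = [c for c, col in enumerate(zip(*block), start=1) if sum(col) % 2]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


# Hypothetical data: eight 7-bit values (not the table from these notes).
data = ["1101100", "0010111", "1110001", "0101010",
        "1000011", "0111100", "0001101", "1011010"]
sent = build_block([[int(b) for b in row] for row in data])  # 9 bytes in total

received = [row[:] for row in sent]
received[7][4] ^= 1                # corrupt row 8, column 5 in transit

print(locate_error(sent))          # None
print(locate_error(received))      # (8, 5)

row, col = locate_error(received)
received[row - 1][col - 1] ^= 1    # flip the bit back to repair byte 8
print(received == sent)            # True
```

Only one row and one column fail the even parity check, so their intersection identifies the damaged bit.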
Shortcoming of Parity:
• If an even number of bits in a byte are changed during transmission, the parity check cannot detect the error.
• Suppose that, using an even parity system, the following byte has been sent:
• This byte could have been received like this:
• Or like this:
• In both situations, no error would have been flagged since the number of occurrences of 1s has remained even.
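This shortcoming can be demonstrated with a short sketch. The byte used here is hypothetical (it is not the one from these notes), as are the helper names `has_even_parity` and `flip`.

```python
def has_even_parity(byte):
    """Receiver-side check: is the number of 1s in the byte even?"""
    return byte.count("1") % 2 == 0


def flip(byte, *positions):
    """Return a copy of the byte with the bits at the given positions inverted."""
    bits = list(byte)
    for p in positions:
        bits[p] = "1" if bits[p] == "0" else "0"
    return "".join(bits)


sent = "11011000"                  # hypothetical byte with four 1s
one_error = flip(sent, 0)          # a single bit changed in transit
two_errors = flip(sent, 0, 7)      # two bits changed in transit

print(has_even_parity(sent))        # True
print(has_even_parity(one_error))   # False: the error is detected
print(has_even_parity(two_errors))  # True: the error goes unnoticed
```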
2. Checksum:
• It is an error detection method that sends an additional value with the original data.
• This additional value is known as the checksum.
• It is a fixed-length sum of the message calculated with modular arithmetic, e.g. a sum that fits in one byte.
• This sum can be negated by a 1s complement operation before sending the data stream or message to detect errors in the message.
• To understand how it works, assume the checksum is 1 byte in length, i.e. its maximum value is 2^8 - 1 = 255.
<= 255:
• If the sum of all the bytes transferred is less than or equal to 255, then the checksum is this sum.
> 255:
• If the sum of all the bytes transferred is greater than 255, then the checksum is calculated using the following method.
Example 1:
Suppose the sum of the bytes is 1185.
• Since it is greater than 255, we use the second method.
• First, divide 1185 by 256, i.e. 1185/256 = 4.629
• Round this value down to the nearest whole number, i.e. 4.629 rounds down to 4
• Multiply the rounded value by 256, i.e. 4 * 256 = 1024
• Calculate the difference, i.e. 1185 - 1024 = 161, which is the checksum
Note:
• When data is to be transmitted, its checksum is calculated and attached to the original message before the transmission.
• At the receiving end, the checksum of the received block is again calculated and compared with the transmitted checksum.
• If both checksums are the same, the data is assumed to be error-free; otherwise, an error has occurred during transmission.
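The one-byte checksum method and the comparison at the receiving end can be sketched as follows. The function names and the sample messages are hypothetical; the calculation follows the divide, round down, multiply and subtract steps, which give the same result as taking the sum modulo 256.

```python
def checksum_from_sum(total):
    """One-byte checksum of a sum of bytes."""
    if total <= 255:
        return total
    whole = total // 256          # divide by 256 and round down
    return total - whole * 256    # subtract the multiple of 256


def checksum(message):
    """Checksum of a block of bytes."""
    return checksum_from_sum(sum(message))


def is_error_free(message, transmitted_checksum):
    """Receiver side: recalculate the checksum and compare."""
    return checksum(message) == transmitted_checksum


print(checksum_from_sum(1185))         # 161

sent = b"HELLO"                        # hypothetical message, byte sum 372
attached = checksum(sent)              # 116
print(is_error_free(b"HELLO", attached))  # True
print(is_error_free(b"HELLP", attached))  # False: one byte was corrupted
```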
3. Check Digit:
• It is an error detection system in which an additional number is added to the series (e.g. account no. etc.) to check the accuracy.
• This number is usually derived from the original series of numbers.
• For example, consider a number 232, the sum of these three digits (2+3+2=7) can be added as the last digit to the original series i.e. 2327.
Example 1:
Consider the ISBN-10 number 0 - 2 0 1 - 5 3 0 8 2 - X, as typically printed on books, which uses the modulo 11 system (the check digit X included).
• To calculate the value of X, first, we need to find the position of each digit, counting from 10 at the leftmost digit down to 1 at the check digit.
• Multiply each digit by its position,
(0x10) + (2x9) + (0x8) + (1x7) + (5x6) + (3x5) + (0x4) + (8x3) + (2x2) = 0 + 18 + 0 + 7 + 30 + 15 + 0 + 24 + 4 = 98
• Divide the total by 11,
98 / 11 = 8 remainder 10
• Check the difference, i.e. subtract the remainder from 11,
11 - 10 = 1
• This value is your check digit and the final ISBN becomes,
0 - 2 0 1 - 5 3 0 8 2 - 1
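The modulo 11 calculation can be sketched as a function. The name `isbn10_check_digit` is hypothetical; the sketch also follows the usual ISBN-10 convention that a result of 10 is written as the letter X and a result of 11 as 0.

```python
def isbn10_check_digit(first_nine):
    """Return the ISBN-10 check digit for the first nine digits."""
    digits = [int(ch) for ch in first_nine if ch.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    # Weights run from 10 for the first digit down to 2 for the ninth.
    total = sum(d * w for d, w in zip(digits, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


print(isbn10_check_digit("0-201-53082"))  # 1
```

For the number in Example 1 the weighted sum is 98, the remainder after dividing by 11 is 10, and 11 - 10 gives the check digit 1.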
4. Automatic Repeat Request (ARQ):
• This error control method uses acknowledgments and timeouts.
• An acknowledgment is a message sent by the receiver to indicate that the data has been received correctly.
• A timeout is the defined period that the sender waits for the acknowledgment to arrive.
• If the sender does not receive the acknowledgment before the timeout, the message is sent again automatically.
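The resend behaviour can be sketched with a small simulation. No real network is involved: the hypothetical `make_channel` helper builds a pretend channel that reports whether an acknowledgment arrived before the timeout, and `send_with_arq` is an assumed name for the sender's retry loop.

```python
def make_channel(lost_attempts):
    """Build a pretend channel; attempts listed in lost_attempts time out."""
    def channel(message, attempt):
        # True means an acknowledgment arrived before the timeout.
        return attempt not in lost_attempts
    return channel


def send_with_arq(message, channel, max_attempts=5):
    """Resend the message until it is acknowledged; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel(message, attempt):
            return attempt
        # No acknowledgment before the timeout: loop round and send again.
    raise TimeoutError("no acknowledgment after %d attempts" % max_attempts)


# The first two transmissions are never acknowledged; the third succeeds.
attempts = send_with_arq("hello", make_channel({1, 2}))
print(attempts)  # 3
```

Limiting the number of attempts is a design choice of this sketch, so that a permanently broken link does not cause endless resending.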