Compression, Encryption and Hashing | SPCC Computer Science

Compression

Compression uses an algorithm to reduce the storage size of a file. Users often compress files before downloading, emailing or uploading them, as a smaller file requires less transmission bandwidth. With some methods this means that the accuracy with which the data is represented is reduced and information is lost in the process, so those methods should only be used where the loss of detail is unlikely to be noticed or is not important.

Methods of Compression: Lossy vs Lossless

Lossy compression takes away some of the information from the original, often where the loss is not noticeable, for example by reducing the colour depth (the number of colours used to represent the image). Lossless compression preserves all the information from the original and is used where any loss is unacceptable, for example for a text file or an executable.
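As a rough illustration of reducing colour depth, the sketch below keeps only the top 2 bits of some 8-bit brightness values. The function names, the sample values and the choice of 2 bits are assumptions made for this example, not part of any particular image format.

```python
def reduce_depth(pixels, bits=2):
    """Keep only the top `bits` bits of each 8-bit value (this is the lossy step)."""
    shift = 8 - bits
    return [p >> shift for p in pixels]


def restore_depth(levels, bits=2):
    """Spread the reduced levels back over 0-255; the discarded detail does not return."""
    max_level = (1 << bits) - 1
    return [level * 255 // max_level for level in levels]


original = [12, 200, 130, 255]       # hypothetical 8-bit brightness values
levels = reduce_depth(original)      # each value now needs 2 bits instead of 8
restored = restore_depth(levels)

print(levels)    # [0, 3, 2, 3]
print(restored)  # [0, 255, 170, 255]
```

The restored values are close to the originals but not identical, which is exactly what makes the method lossy.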

Lossless Compression: Run Length Encoding

This is ideal for compressing bitmap images with large areas of the same colour. Each run of identical consecutive symbols is represented by the symbol and its number of occurrences, e.g. AAFFFCCCCC could be represented as 2A3F5C.

The heart image here can be coded by stating the colour of each run of spaces (W for white, B for black) and the number of spaces in that run.

Since RLE looks for repeated symbols or colours, it is unsuitable when there is little or no repetition. It is ideal for vector-style images with large areas of flat colour, not photographs.
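Run length encoding can be sketched in a few lines of Python. This is a minimal version that assumes the symbols are never digits (so the counts can be told apart from the symbols); the function names are made up for this example.

```python
from itertools import groupby
import re


def rle_encode(text):
    """Replace each run of identical symbols with <count><symbol>."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


def rle_decode(encoded):
    """Expand each <count><symbol> pair back into a run of symbols."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)


print(rle_encode("AAFFFCCCCC"))   # 2A3F5C
print(rle_decode("2A3F5C"))       # AAFFFCCCCC
print(rle_encode("ABCDEF"))       # 1A1B1C1D1E1F
```

The last line shows why RLE is unsuitable when there is no repetition: the "compressed" version is twice as long as the original.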

Lossless Compression: Dictionary Encoding

This is used for text documents. In dictionary encoding, frequently occurring pieces of data are substituted by shorter symbols or tokens. A dictionary maps each token to the group of data it stands for. To decompress back to the original, each token is looked up in the dictionary and replaced with its group of data.
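The idea can be sketched as follows. As a simplification, this version gives every distinct word a numeric token rather than only the frequent ones, and it assumes words are separated by single spaces; the function names and sample sentence are invented for the example.

```python
def dict_encode(text):
    """Return (dictionary, tokens): each word is replaced by its index in the dictionary."""
    dictionary = []   # token number -> word
    index_of = {}     # word -> token number
    tokens = []
    for word in text.split(" "):
        if word not in index_of:
            index_of[word] = len(dictionary)
            dictionary.append(word)
        tokens.append(index_of[word])
    return dictionary, tokens


def dict_decode(dictionary, tokens):
    """Look each token up in the dictionary to rebuild the original text."""
    return " ".join(dictionary[t] for t in tokens)


text = "the cat and the dog and the bird"
dictionary, tokens = dict_encode(text)

print(dictionary)   # ['the', 'cat', 'and', 'dog', 'bird']
print(tokens)       # [0, 1, 2, 0, 3, 2, 0, 4]
print(dict_decode(dictionary, tokens) == text)   # True
```

The more often a word repeats, the more is saved, because each repeat costs only one small token. Nothing is lost, so this is lossless.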

Encryption and Decryption

Encryption is encoding a document (or a directory, or an entire disk!) to make it unreadable to anyone without the key. This has been done since the dawn of civilisations that could write. Encryption using one key for both encrypting and decrypting is called symmetric encryption. This is easier to attack, as anyone who obtains the one shared key can read the data. Sensitive data therefore often uses asymmetric encryption.

Keys are long strings, often written as hexadecimal digits.

The number of possible values of this string is so great that guessing the key is not practical. It would take billions of years to crack by brute force with our current computing power.
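A quick calculation shows the scale. The key length (128 bits) and the attacker's speed (a million million guesses per second) are both assumptions chosen for illustration, not measurements.

```python
KEY_BITS = 128                        # assumed key length
GUESSES_PER_SECOND = 10**12           # assumed attacker speed
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

possible_keys = 2 ** KEY_BITS
years_to_try_all = possible_keys // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

print(f"{possible_keys:.2e} possible keys")
print(f"{years_to_try_all:.2e} years to try every key")
```

Under these assumptions the answer is in the region of ten million million million years, far more than "billions".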

Uses of Asymmetric Keys

This uses two keys: one to encrypt and one to decrypt at the other end. The keys are generated as a pair so that they work together. One key is public and the other is private. Anyone can access the public key, which is often stored in a key store in the cloud.

Public keys are exchanged between sender and receiver. The sender uses the receiver's public key to encrypt. The receiver uses their own private key to decrypt.
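To make this concrete, here is a toy sketch using RSA, one well-known asymmetric scheme (the text above does not name a specific one). The primes are deliberately tiny so the numbers are readable; real keys are hundreds of digits long, and the message here is just a number. It needs Python 3.8 or later for the modular inverse.

```python
# Receiver generates a key pair from two small primes (toy values, illustration only).
p, q = 61, 53
n = p * q                   # 3233, shared by both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent: the modular inverse of e

public_key = (e, n)         # given to anyone
private_key = (d, n)        # kept secret by the receiver


def encrypt(message, key):
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)     # sender uses receiver's public key
recovered = decrypt(ciphertext, private_key)  # receiver uses own private key

print(ciphertext)   # 2790
print(recovered)    # 65
```

Knowing the public key (17, 3233) is enough to encrypt, but decrypting needs the private exponent, which only the receiver holds.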

You can even use the private key to encrypt and the public key to decrypt. This does not keep the message secret, but the receiver knows the sender was authentic if the sender's public key can be used to decrypt it.
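The authenticity idea can be sketched with a toy RSA key pair (n = 3233, public exponent 17, private exponent 2753, built from the primes 61 and 53). These are illustrative values only, and real systems sign a hash of the message rather than the message itself.

```python
n = 3233            # toy modulus (61 * 53)
public_e = 17       # sender's public exponent, known to everyone
private_d = 2753    # sender's private exponent, known only to the sender

message = 65

# Sender transforms the message with their PRIVATE key.
signed = pow(message, private_d, n)

# Receiver reverses it with the sender's PUBLIC key.
genuine = pow(signed, public_e, n) == message

# An impostor without the private key has to guess an exponent (5 here is arbitrary).
forged = pow(message, 5, n)
forgery_accepted = pow(forged, public_e, n) == message

print(genuine)            # True
print(forgery_accepted)   # False
```

Only the holder of the private key could have produced a value that the public key turns back into the message, which is what proves who sent it.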

Or you can combine both: encrypt with your own private key and the receiver's public key.

Uses of Hashing

A hashing function transforms a string of characters into a fixed-length value or key. Unlike encryption, it is a one-way process: even if you know the hashing function, you cannot reverse the hash back to plain text.

This works for data such as passwords because you can compare hashed values without knowing the actual password. If both hash to the same value, they were almost certainly the same password.
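A minimal sketch of comparing passwords by their hashes, using SHA-256 from Python's standard hashlib module. SHA-256 is chosen here only as a familiar example, and the function names and passwords are made up; real login systems also add a random salt and use a deliberately slow hashing function.

```python
import hashlib


def hash_password(password):
    """Return the SHA-256 hash of the password as 64 hexadecimal characters."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_login(attempt, stored_hash):
    """Compare hashes only; the original password is never stored or needed."""
    return hash_password(attempt) == stored_hash


stored_hash = hash_password("correct horse")   # what the system keeps on file

print(len(hash_password("a")))                  # 64
print(len(hash_password("a" * 1000)))           # 64, same length whatever the input
print(check_login("correct horse", stored_hash))   # True
print(check_login("wrong horse", stored_hash))     # False
```

The system only ever stores `stored_hash`, so someone who steals the stored value still does not have the password itself.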

Hash Tables