Cryptanalysis

Cryptanalysis is the art and science of defeating the methods devised by cryptography; the goal is to find some weakness or insecurity in a cryptographic scheme, thus permitting its subversion. Cryptology is the overall discipline encompassing both cryptography and cryptanalysis.

Cryptanalysis is often undertaken by a malicious attacker, attempting to subvert a system; it is an essential part of communications intelligence and of some attacks on computer security. Often, the attacker's goal is to read material which the cryptosystem's users wish to keep secret. For example, the British ULTRA project in World War II read many secret German messages. The goal may also be to defeat a cryptographic authentication mechanism. For example, an attacker who can defeat a bank's authentication can use someone else's account for fun and profit, and an attacker who can defeat email authentication might create a bogus but verifiable message that would hugely embarrass the putative sender.

Cryptanalysis is also routinely done by a system's designer, and by others, attempting to evaluate whether a system has vulnerabilities. This is a normal part of the design process in cryptography; publish your design and invite others to attack it. Nearly all cryptographic algorithms and protocols undergo this process and are carefully examined. Surviving the process can be taken as establishing the practical security of the system (at least, under clear -- and hopefully reasonable -- assumptions). See cryptography is difficult for further discussion.

It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message. That is, an enemy who intercepts an encrypted message has provably no better chance of guessing the contents than an enemy who only knows the length of the message.

Even two-time use of the keys can lead to compromise, as shown by the VENONA project that allowed cryptanalysis of Soviet espionage traffic, in which a one-time pad was used more than once.

Any cipher except a one-time pad can be broken with enough computational effort (by brute force attack if nothing else), but the amount of effort needed to break a cipher may be exponentially dependent on the key size, as compared to the effort needed to use the cipher. In such cases, effective security can still be achieved if some conditions (e.g., key size) are such that the effort ('work factor' in Shannon's terms) is beyond the ability of any adversary.

Non-mathematical methods
Before discussing classic cryptanalyis, be aware that mathematical cryptanalysis is by no means the only way to access protected content.

Details of such attacks and defenses against them are beyond our scope here; see information security.

Practical cryptanalysis
Practical cryptanalysis is a euphemism for using physical or social means to compromise the cryptosystem, such as clandestinely breaking into a communications center and copying the keys, or placing a hidden video camera in position to record passwords as they are typed in, or a host of other such methods.

Any of the techniques of espionage &mdash; bribery, coercion, blackmail, deception ... &mdash; may be used to obtain keys. One variant is referred to as rubber hose cryptanalysis &mdash; beating, torturing or threatening someone to get him or her to reveal keys. A very simple attack that can be effective is shoulder surfing, reading a computer password or bank PIN over someone's shoulder as it is typed. Social engineering, deceiving the personnel who work with cryptosystems into providing valuable information, may be the most productive attack of all.

An organisation can partly defend against these attacks with good security policies and procedures: limiting some information on a "need to know" basis, requiring two signatures for certain actions, et cetera, and systems can be designed to enforce some of these procedures. For details, see information security.

Traffic analysis
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis, and can be quite useful to an alert adversary. Encrypting messages does not prevent this; an enemy may be able to gain useful information from the timing, size, source and destination of traffic, even if he cannot read the message contents.

Computer vulnerabilities
For computer-based security systems, host security is a critical prerequisite. No computer-based cryptographic system can be secure if the underlying computer is not.

Even systems generally thought to be secure, such as IPsec or PGP, are trivially easy to subvert if the enemy has unfettered access to the machine they run on. For example, given access to a PGP user's machine, an attacker can copy the secret key file and install a keystroke logger that captures the PGP passphrase. Once he has the key file and the passphrase to unlock it, PGP security for that key is completely and permanently lost; the attacker can read any messages sent with that key &mdash; either new ones or older messages he has previously intercepted and archived &mdash; and can forge digital signatures using the key.

For some systems, host security may be an impossible goal. Consider a Digital Rights Management system whose design goal is to protect content against the owner of the computer or DVD player it runs on. If that owner has full control over his device then the goal is not achievable.

Network security is also an issue, since in many cases a miscreant can "sniff" valuable data off a network to which he has access, even if every machine on the net except his is secure. Such sniffing can sometimes bypass other security measures. For example, when a user accesses data from a server, even if both machines are secure and the server implements access controls for that data, an enemy may still get the data while it is in transit. As another example, even if an IPsec tunnel protects data as it travels between two offices, and the IPsec gateways are secure, an attacker with access to either office network can see unencrypted data.

For further discussion, see computer security.

Classifying attacks
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways.

The attacker's objective
There are two main types of attack where the attacker must perform cryptanalysis; he has to defeat some cryptographic mechanism in order to conduct the attack.
 * In a passive attack, the attacker only eavesdrops, tries to read data without authorisation. Generally, this requires defeating a cipher; often the objective is to decrypt material without the key.
 * In an active attack, the attacker sends messages or alters existing messages in transit. Generally, this requires defeating some cryptographic authentication mechanism, and often a cipher as well.

Two other classes of attack can, in general, be conducted without cryptanalysis. For these attacks, the attacker can often achieve his goal &mdash; mess up communication or acquire some valuable information &mdash; without attacking the cryptography, Of course, if he does break the crypto, then he can do even more damage.
 * In a denial of service attack, the attacker attempts to disrupt communication. Some, but by no means all, such attacks require defeating a cryptographic authentication mechanism.
 * In traffic analysis the attacker attempts to infer useful information from the source, destination, timing and other characteristics of messages, without reading the actual content.

The attacker's methods
There are three passive attacks that will in theory break any cipher except a one-time pad; variants of these work for either block ciphers or stream ciphers:
 * brute force attack &mdash; try all possible keys
 * algebraic attack &mdash; write the cipher as a system of equations and solve for the key
 * code book attack &mdash; collect all possible plaintext/ciphertext pairs for a block cipher, or the entire pseudorandom stream until it starts repeating for a stream cipher

However, all of those attacks are spectacularly impractical against real ciphers. Brute force and algebraic attacks require the attacker to do far too much work. For a code book attack, he needs far too much data &mdash; a huge collection of intercepts, all encrypted with the same key. If the cipher user changes keys at reasonable intervals, a code book attack is impossible.

Linear cryptanalysis and differential cryptanalysis are often very efficient in terms of the attacker's effort, significantly better than brute force. However, they require large numbers of known or  chosen plaintexts, all encrypted with the same key. Re-keying often enough is an effective defense. Modern ciphers are also often designed to resist these attacks, even if the enemy can get huge numbers of known or chosen plaintexts.

Various newer attacks extend the ideas of differential cryptanalysis; examples include integral cryptanalysis, differential fault analysis, impossible cryptanalysis and boomerang attacks.

A meet-in-the-middle attack is quite effective if it can be used, but it cannot be used against most ciphers.

A birthday attack can be used whenever the issue is finding repeated output from some cryptographic technique &mdash; for example a challenge-response protocol repeating a challenge, or two inputs hashing to the same result.

A dictionary attack may be used against a password or other user authentication system. The attacker encrypts an entire dictionary of possible passwords, then checks to see if the encrypted forms of actual passwords match any of the encrypted dictionary entries.

Active attacks are generally difficult to conduct, and can be blocked by cryptographic authentication, but where they are possible, they are devastating. If the enemy can successfully forge messages, the cryptosystem becomes a liability. Variants include the man-in-the-middle attack and the rewrite attack.

Theoretical versus practical security
There are several attacks that will in theory break any symmetric cipher, but all real ciphers are designed to resist them. Against any sensible design, these attacks are utterly useless in practice because they require the attacker to deploy astronomically large resources. For example, a code book attack on a 128-bit block cipher such as AES does not become useful until the attacker collects 264 blocks (271 bytes) of material encrypted with a single key. At a gigabit a second, transmitting that much data would take several hundred thousand years; it seems reasonable to assume the user would change keys before that. Also, the attacker is likely to have trouble storing that much data; if he uses one-terabyte drives then he needs several billion of them. It is clear, then, that while AES (or any other 128-bit block cipher) is theoretically vulnerable to a codebook attack, this is of zero concern in practice. Even for a 64-bit block cipher, it is of no concern with any sane re-keying policy.

Linear cryptanalysis and differential cryptanalysis are, in theory, very powerful and very versatile attacks. However, both require very large samples of data encrypted under a single key, so in practice they are not a threat as long as the cipher is re-keyed at reasonable intervals.

There are ciphers that are provably secure against certain attacks, at least when properly used, but it does not necessarily follow that such ciphers will be secure in practice. For example, a one-time pad is provably secure in a rather strong sense, but in at least one case (see VENONA) a one-time pad system was broken in practice, because it was not used correctly. Also, even used correctly, a one-time pad is vulnerable to a rewrite attack unless other components of the cryptosystem prevent that. As another example, Serge Vaudenay's work on de-correlation theory shows how to construct ciphers that are provably secure against a broad class of attacks, including both linear and differential analysis. However, for some of Vaudenay's ciphers such as De-correlated Fast Cipher, weaknesses have been found by attackers that went outside the assumptions of the security proofs.

Some cryptographic techniques derive their security from the difficulty of a mathematical problem &mdash; the RSA algorithm from integer factorisation, the Diffie-Hellman protocol from the discrete logarithm problem, and other systems from various elliptic curve problems. In all cases the underlying problem is thought to be hard, so the cryptosystem is thought to be secure. In fact, it is demonstrably secure against all known methods of solving that problem, since no known method is efficient enough for an attack that is even close to practical. However, for any of these problems, no-one has been able to prove that no efficient method exists. Discovery of an efficient solution for any of these problems would render any cryptosystem based on it worthless.

Resources the attacker has
Another way to classify attacks is by what an attacker knows, what data is available to him.

Ciphertext-only
The ciphertext-only attack is the case where the cryptanalyst has access only to the ciphertext. Modern cryptosystems are generally effectively immune to ciphertext-only attacks.

Pure ciphertext-only attacks are rare in practice because the analyst is often able to guess some plaintext. This converts a ciphertext-only situation into a known plaintext attack; see next section. Even if he cannot guess anything, he will often have some general knowledge about the material; for example, he may know what language it is in.

Known plaintext
In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext, or to many such pairs.

Sometimes it is enough for the attacker to have partial knowledge of the plaintext &mdash; perhaps that it is ASCII text with the top bit of every byte zero, or that it is radar data in a known format. This gives him something to go on, a way to check if a decryption or partial decryption is correct; that may be all he needs.

Often the attacker can guess some plaintext. In British World War II ULTRA codebreaking, such guesses were known as "cribs". Many messages contain fixed text like dates or formal phrases like "your humble and obedient servant", and various systems such as compression algorithms or email handlers insert fixed-format headers; all these are free gifts to the cryptananlyst. In war, names of enemy officers, bases, ships or units (or their codenames) are good guesses, also perhaps words like "order" and "ammunition". An intelligence organisation that knows the enemy well may have additional cribs available, looking for "congratulations" to a promoted officer. "happy birthday" to a general, and so on.

Language structure may also provide cribs. Consider ordinary English text, where about one in seven characters are spaces, and "the" and "of" are the most common words. Suppose the cipher uses 64-bit (8 character) blocks. The chance that one of them encodes the 8-character string " of the " is significant. If the cipher user (foolishly) sends large volumes of data with the same key and the attacker has the determination and the (huge) resources to test them all, this crib is almost certain to break the cipher eventually. More plausibly, an attacker may be able to use text statistics as an entry point: space is the most common character, "e" the most common letter, "q" is often followed by "u", and so on.

Generally, if a true known plaintext attack (where the attacker actually knows some plaintext) is feasible, then variants based on guessed plaintext or on partial knowledge of plaintext will be more difficult, but not prohibitively so. Suppose there is a known plaintext attack that breaks the cipher at reasonable cost, but the attacker has only some guessed plaintext that has a 10% chance of being right. That gives him a one-in-ten chance of solving the cipher on the first try. If he has many such guessed cribs available, he is almost certain to solve it eventually at some cost not horrifically more than the cost of a pure known plaintext attack.

Or suppose he only knows the plaintext is ASCII; the top bit of every byte is zero. If we are dealing with a block cipher that has 64-bit blocks, and there is a feasible known plaintext attack, then an enemy who knows 64 bits of plaintext in a single block can break the cipher. However, if the data is known ASCII and the enemy has intercepted N blocks, then he knows that 8N bits of the plaintext are zero. Whether this lets him break the cipher or not is an extremely complex question depending on all the details of the cipher and the attack, and on any additional knowledge the attacker may have.

An analyst trying to evaluate the security of a cipher generally cannot answer that question. To be safe, he assumes the worst. If there is an effective known plaintext attack on a cipher, then the cipher must be considered insecure.

A number of attacks require known (including guessed) plaintext to work:
 * a brute force attack tries all possible keys; you need to know one block of plaintext so you can tell when you have found the right key
 * a meet-in-the-middle attack finds a middle value in two ways, by half-encrypting a block of known plaintext and half-decrypting the matching cyphertext, and searches for matching "middle" results; this is much more efficient than brute force but is not applicable to most ciphers
 * an algebraic attack writes the cipher operations as equations in some algebraic system, usually Boolean, then plugs in known values for plaintext and ciphertext and solves for the key. Depending on various details, this may need anywhere from one to a few dozen plaintexts
 * a code book attack requires huge numbers of known plaintexts, at least 2blocksize/2 before it becomes useful.
 * linear cryptanalysis and differential cryptanalysis are often very efficient in terms of the attacker's effort, significantly better than brute force. However, they require large numbers of known or chosen plaintexts.

All these should be completely impractical against any well-designed cipher, properly used. An important usage precaution is to re-key often enough to prevent code book, linear and differential attacks; this is standard practice.

Chosen plaintext
In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is the gardening used by the British during WWII.

Linear cryptanalysis and differential cryptanalysis can use either chosen plaintexts or a larger number of known plaintexts. Generally, both numbers are very large, larger than 2blocksize/2, so reasonably frequent re-keying prevents these attacks.

Chosen ciphertext
In a chosen-ciphertext attack, the cryptanalyst may choose ciphertexts and learn their corresponding plaintexts. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see ULTRA for some historical examples of this).

Related key attack
Using two or more related keys for different messages, different links, or different sessions may give a cryptanalyst an entry point.

The best-known failure of this type is for the WEP protocols used in wireless networking. WEP generates keys for different connections by concatenating a connection-specific initialisation value with another secret value, and this creates a weakness. See for example, "Breaking 104 bit WEP in less than 60 seconds".

Side channel attacks
While pure cryptanalysis uses plaintext and ciphertext, looking for weaknesses in the algorithms themselves, a side-channel attack looks at some other aspect of the behaviour of a cryptographic device which may reveal information of value to the analyst.

These attacks may be used against devices such as smartcards or against systems implemented on computers. Any cryptographic primitive &mdash; block cipher, stream cipher, public key or cryptographic hash &mdash; can be attacked this way. Side-channel attacks must be considered in the design of almost any cryptographic system.

For example, any electrical device handling fast-changing signals will produce electromagnetic radiation. An enemy might listen to the radiation from a computer or from crypto hardware. For the defenders, there are standards for limiting such radiation; see TEMPEST and protected distribution system. If the device emits sound, that is another side channel that might give an attack. Timing attacks make inferences from the length of time cryptographic operations take. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to break a cipher that is otherwise resistant to analysis. For example, the time required for a shift or rotation operation may depend on the distance shifted, and on some computers the time required for a multiplication operation depends on the number of ones in the binary representation of one operand. An analyst who gets timing data may therefore be able to infer something useful. Power analysis has also been used, in much the same way as timing. The two may be combined.

Differential fault analysis attacks a cipher embedded in a smartcard or other device. Apply stress (heat, mechanical stress, radiation, ...) to the device until it begins to make errors; with the right stress level, most will be single-bit errors. Comparing correct and erroneous output gives the cryptanalyst a window into cipher internals. This attack is extremely powerful; "we can extract the full DES key from a sealed tamper-resistant DES encryptor by analyzing between 50 and 200 ciphertexts generated from unknown but related plaintexts". Related attacks  defeat RSA and other public key systems.

See information security for discussion of defenses.

Strategies against symmetric cryptosystems
Cryptanalysis of symmetric-key techniques typically involves looking for efficient attacks against block ciphers or stream ciphers. Against an ideal cipher, there is no attack better than brute force.

For example, a simple brute force attack against DES requires one known plaintext and 255 decryptions, trying approximately half of the possible keys, before chances are better than even the key will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations. This is a considerable improvement on brute force attacks.

See also the stream cipher article.

Strategies against asymmetric cryptosystems
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (the RSA cryptosystem is based on a problem related to factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently. For instance, the best algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s.

Vulnerabilities of cryptographic primitives
Much of the theoretical work in cryptography concerns cryptographic primitives &mdash; algorithms with basic cryptographic properties &mdash; and their relationship to other cryptographic problems. For example, a one-way function is a function intended to be easy to compute but hard to invert. In a very general sense, for any cryptographic application to be secure (if based on such computational feasibility assumptions), one-way functions must exist. However, if one-way functions exist, this implies that P ≠ NP. . Since the P versus NP problem is currently unsolved, we don't know if one-way functions exist. If they do, however, we can build other cryptographic tools from them. For instance, if one-way functions exist, then secure pseudorandom generators and secure pseudorandom functions exist.

Other cryptographic primitives include cipher algorithms themselves, one-way permutations, trapdoor permutations, etc.

Vulnerabilities of cryptographic protocols
In many cases, cryptographic techniques involve back and forth communication among two or more parties in space or across time (e.g., cryptographically protected backup data). The term cryptographic protocol captures this general idea. Cryptographic protocols have been developed for a wide range of problems, including relatively simple ones like interactive proofs, secret sharing , and zero-knowledge , and much more complex ones like electronic cash and secure multiparty computation.

When the security of a cryptographic system fails, it is rare that the vulnerabilty leading to the breach will have been in a quality cryptographic primitive. Instead, weaknesses are often mistakes in the protocol design (often due to inadequate design procedures or less than thoroughly informed designers), in the implementation (e.g., a software bug), in a failure of the assumptions on which the design was based (e.g., proper training of those who will be using the system), or some other human error. Many cryptographic protocols have been designed and analyzed using ad hoc methods. Methods for formally analyzing the security of protocols, based on techniques from mathematical logic (see for example BAN logic), and more recently from concrete security principles, have been the subject of research for the past few decades. Unfortunately, these tools are cumbersome and not widely used for complex designs.