In probability theory and its applications, Bayes' theorem shows the relation between two conditional probabilities that are the reverse of each other. The theorem is named for Thomas Bayes and is often called Bayes' law or Bayes' rule. Bayes' theorem expresses the conditional probability, or "posterior probability", of a hypothesis H (i.e. its probability after evidence E is observed) in terms of the "prior probability" of H, the prior probability of E, and the conditional probability of E given H. It implies that evidence has a stronger confirming effect if it was less likely before being observed. Bayes' theorem is valid in all common interpretations of probability, and it is commonly applied in science and engineering. However, there is disagreement among statisticians regarding its proper implementation.
The key idea is that the probability of an event A given an event B (e.g., the probability that one has breast cancer given that one has tested positive in a mammogram) depends not only on the relationship between events A and B (i.e., the accuracy of mammograms) but also on the marginal probability (or "simple probability") of each event. For instance, if mammograms are known to be 95% accurate, the 5.0% error rate could consist entirely of false positives, entirely of false negatives (missed cases), or a mix of the two. Bayes' theorem allows one to calculate the conditional probability of having breast cancer, given a positive mammogram, for any of these three cases, and the probability of a positive mammogram will be different in each. In the example at hand, one point is of great practical importance: if, say, 5.0% of mammograms come back positive while the marginal probability of this type of cancer is closer to 1.0%, then a positive result is five times as likely as the cancer itself. Since P(positive | cancer) cannot exceed 1, the conditional probability that an individual with a positive result actually has cancer is at most 1.0%/5.0% = 20%, which is rather small. This shows the value of correctly understanding and applying Bayes' theorem.
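The mammogram arithmetic can be made concrete with a short sketch. It assumes the illustrative figures from the paragraph above (5.0% of mammograms positive, 1.0% prevalence) and a few hypothetical values of the test's sensitivity, P(positive | cancer); the function name `posterior_given_positive` is our own, not a standard API.

```python
def posterior_given_positive(sensitivity, prevalence, p_positive):
    """Bayes' theorem: P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)."""
    if p_positive == 0:
        raise ValueError("P(positive) must be nonzero")
    return sensitivity * prevalence / p_positive


# Illustrative figures from the text; the sensitivities below are hypothetical.
PREVALENCE = 0.01  # marginal probability of this type of cancer
P_POSITIVE = 0.05  # fraction of mammograms that come back positive

for sensitivity in (1.0, 0.9, 0.8):
    p = posterior_given_positive(sensitivity, PREVALENCE, P_POSITIVE)
    print(f"sensitivity={sensitivity:.1f}  P(cancer | positive)={p:.3f}")
```

The loop prints posteriors of about 0.20, 0.18 and 0.16: even a test that never misses a cancer leaves most positive results as false alarms under these assumptions.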
Thomas Bayes addressed both the case of discrete probability distributions of data and the more complicated case of continuous probability distributions. In the discrete case, Bayes' theorem relates the conditional and marginal probabilities of events A and B, provided that the probability of B does not equal zero:
$$ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} $$
In Bayes' theorem, each probability has a conventional name:
- P(A) is the prior probability (or "unconditional" or "marginal" probability) of A. It is "prior" in the sense that it does not take into account any information about B; however, the event B need not occur after event A. In the nineteenth century, the unconditional probability P(A) in Bayes' rule was called the "antecedent" probability; in deductive logic, the antecedent set of propositions and the inference rule imply consequences. The unconditional probability P(A) was called "a priori" by Ronald A. Fisher.
- P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
- P(B|A) is the conditional probability of B given A. It is also called the likelihood.
- P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.
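When the competing hypotheses form a finite partition (mutually exclusive and exhaustive), P(B) can be computed by the law of total probability, P(B) = Σᵢ P(B | Aᵢ) P(Aᵢ), which is exactly why it acts as a normalizing constant. The following minimal sketch assumes such a partition; the function name `posterior`, the dictionary layout, and the hypotheses H1 to H3 with their probabilities are illustrative inventions.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(A_i | B) for every hypothesis A_i in a finite partition.

    priors:      dict mapping hypothesis -> P(A_i)
    likelihoods: dict mapping hypothesis -> P(B | A_i)
    """
    # Numerators of Bayes' theorem: P(B | A_i) * P(A_i)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Law of total probability: P(B) = sum_i P(B | A_i) * P(A_i)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    # Divide every numerator by the normalizing constant P(B)
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical priors and likelihoods, chosen only for illustration
priors = {"H1": Fraction(1, 2), "H2": Fraction(1, 3), "H3": Fraction(1, 6)}
likelihoods = {"H1": Fraction(1, 10), "H2": Fraction(1, 5), "H3": Fraction(1, 2)}
print(posterior(priors, likelihoods))
# → {'H1': Fraction(1, 4), 'H2': Fraction(1, 3), 'H3': Fraction(5, 12)}
```

Exact fractions are used so the posteriors visibly sum to 1; ordinary floats work the same way up to rounding.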
Bayes' theorem with continuous prior and posterior distributions
Suppose a continuous probability distribution with probability density function $f_\Theta$ is assigned to an uncertain quantity Θ. (In the conventional language of mathematical probability theory Θ would be a "random variable.") The probability that the event B will be the outcome of an experiment depends on Θ; it is P(B | Θ). As a function of Θ this is the likelihood function:
$$ L(\theta \mid B) = P(B \mid \Theta = \theta) $$
Then the posterior probability distribution of Θ, i.e. the conditional probability distribution of Θ given the observed data B, has probability density function
$$ f_\Theta(\theta \mid B) = \text{constant} \times f_\Theta(\theta)\, L(\theta \mid B) $$
where the "constant" is a normalizing constant chosen so that the function integrates to 1, making it indeed a probability density function; explicitly, it equals $1 \big/ \int f_\Theta(\theta)\, L(\theta \mid B)\, d\theta$. This is the form of Bayes' theorem actually considered by Thomas Bayes.
In other words, Bayes' theorem says:
- To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize.
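The "multiply, then normalize" recipe can be approximated numerically by evaluating prior × likelihood on a grid of θ values and dividing by a midpoint-rule estimate of the integral. The sketch below assumes, purely for illustration, a uniform prior for Θ on [0, 1] and data B consisting of k successes in n independent trials with success probability Θ, so that the likelihood L(θ | B) = P(B | Θ = θ) is proportional to θ^k (1 − θ)^(n − k). In that setting the exact posterior mean is (k + 1)/(n + 2), which gives a check on the approximation. The function names and the values of n and k are hypothetical.

```python
def grid_posterior(prior_pdf, likelihood, n_points=10_000):
    """Approximate a posterior density on [0, 1]: multiply prior by likelihood, then normalize.

    Evaluates at the midpoints of n_points equal-width cells and approximates
    the normalizing integral with the midpoint rule.
    Returns (grid, posterior density values, cell width).
    """
    width = 1.0 / n_points
    grid = [(i + 0.5) * width for i in range(n_points)]
    unnormalized = [prior_pdf(t) * likelihood(t) for t in grid]
    integral = sum(unnormalized) * width  # approximates the integral of prior * likelihood
    constant = 1.0 / integral             # the normalizing constant
    return grid, [constant * u for u in unnormalized], width


# Hypothetical data B: k successes in n independent trials.
n, k = 10, 7


def uniform_prior(theta):
    return 1.0  # density of the uniform distribution on [0, 1]


def binomial_likelihood(theta):
    # P(B | Theta = theta) up to a binomial coefficient, which normalization absorbs
    return theta**k * (1 - theta) ** (n - k)


grid, post, width = grid_posterior(uniform_prior, binomial_likelihood)
posterior_mean = sum(t * p for t, p in zip(grid, post)) * width
print(f"grid posterior mean:   {posterior_mean:.4f}")
print(f"exact (k + 1)/(n + 2): {(k + 1) / (n + 2):.4f}")
```

Both printed values should agree to four decimal places. A grid is used here because it works for any prior and likelihood, not only for pairs with a closed-form posterior.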
More generally still, the new data B may be the value of an observed continuously distributed random variable X. The probability that X takes any particular value is then 0, so in such a case the likelihood function is the value of a probability density function of X given Θ, rather than a probability of B given Θ:
$$ L(\theta \mid x) = f_X(x \mid \Theta = \theta) $$
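With a continuous observation the same multiply-and-normalize step applies, with a density of X in place of a probability of B. As an assumed illustration not taken from the text, let the prior be normal, Θ ~ N(μ₀, τ₀²), and let a single observation satisfy X | Θ = θ ~ N(θ, σ²). For this conjugate pair the posterior is known to be normal with precision 1/τ₀² + 1/σ² and mean (μ₀/τ₀² + x/σ²)/(1/τ₀² + 1/σ²), so the sketch compares a grid approximation with that closed form. All numeric values and variable names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd**2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# Hypothetical prior and observation model
mu0, tau0 = 0.0, 2.0  # prior: Theta ~ N(mu0, tau0^2)
sigma = 1.0           # observation noise: X | Theta=theta ~ N(theta, sigma^2)
x_obs = 1.5           # the observed value of X

# Grid wide enough to hold essentially all of the posterior mass
lower, upper, n_cells = -10.0, 10.0, 20_000
width = (upper - lower) / n_cells
grid = [lower + (i + 0.5) * width for i in range(n_cells)]

# Prior density times likelihood; the likelihood is a density of X, not a probability
unnormalized = [normal_pdf(t, mu0, tau0) * normal_pdf(x_obs, t, sigma) for t in grid]
integral = sum(unnormalized) * width
post = [u / integral for u in unnormalized]
grid_mean = sum(t * p for t, p in zip(grid, post)) * width

# Closed-form conjugate result for comparison
precision = 1 / tau0**2 + 1 / sigma**2
exact_mean = (mu0 / tau0**2 + x_obs / sigma**2) / precision
print(f"grid mean = {grid_mean:.4f}, exact mean = {exact_mean:.4f}")
```

With these numbers both means come out at 1.2: the posterior sits between the prior mean 0 and the observation 1.5, pulled toward the observation because its variance σ² is smaller than τ₀².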
A simple example of Bayes' theorem
Suppose a school's students are 60% boys and 40% girls. The girls wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a random student from a distance and can tell only that this student is wearing trousers. What is the probability that this student is a girl? The answer can be computed using Bayes' theorem.
The event A is that the student observed is a girl, and the event B is that the student observed is wearing trousers. To compute P(A|B), we first need to know:
- P(B|A), or the probability of the student wearing trousers given that the student is a girl. Since girls are as likely to wear skirts as trousers, this is 0.5.
- P(A), or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the fraction of girls among the students is 40%, this probability equals 0.4.
- P(B), or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since half of the girls and all of the boys are wearing trousers, this is 0.5×0.4 + 1.0×0.6 = 0.8.
Given all this information, the probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:
$$ P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25 $$
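The substitution can be checked with exact rational arithmetic. This short sketch uses Python's standard `fractions` module; the variable names are our own.

```python
from fractions import Fraction

p_girl = Fraction(2, 5)                 # P(A): 40% of students are girls
p_boy = 1 - p_girl                      # 60% of students are boys
p_trousers_given_girl = Fraction(1, 2)  # P(B|A): half of the girls wear trousers
p_trousers_given_boy = Fraction(1)      # all boys wear trousers

# P(B) by the law of total probability: 0.5*0.4 + 1.0*0.6
p_trousers = p_trousers_given_girl * p_girl + p_trousers_given_boy * p_boy

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_girl_given_trousers = p_trousers_given_girl * p_girl / p_trousers
print(p_trousers, p_girl_given_trousers)  # → 4/5 1/4
```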
Another, essentially equivalent, way of obtaining the same result is as follows. Assume, for concreteness, that there are 100 students: 60 boys and 40 girls. Among these, 60 boys and 20 girls wear trousers, so there are 80 trouser-wearers in all, of whom 20 are girls. Therefore the chance that a random trouser-wearer is a girl equals 20/80 = 0.25. In terms of Bayes' theorem, the probability of a student being a girl is 40/100, and the probability that any given girl wears trousers is 1/2. The product of these two is 20/100; but since we know the student is wearing trousers, we exclude the 20 skirt-wearers and divide by the probability of wearing trousers, 80/100, giving (20/100)/(80/100) = 20/80.
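The counting argument can be mirrored directly by building the 100-student population and filtering it on the evidence. Representing each student as a (sex, clothing) tuple is an illustrative choice, not part of the original example.

```python
# The 100-student population described above
students = (
    [("girl", "trousers")] * 20
    + [("girl", "skirt")] * 20
    + [("boy", "trousers")] * 60
)

# Condition on the evidence: keep only the trouser-wearers
trouser_wearers = [s for s in students if s[1] == "trousers"]
girls_in_trousers = [s for s in trouser_wearers if s[0] == "girl"]

# Fraction of trouser-wearers who are girls: 20/80
print(len(girls_in_trousers), len(trouser_wearers),
      len(girls_in_trousers) / len(trouser_wearers))  # → 20 80 0.25
```

Filtering on the observed evidence and then counting is the frequency counterpart of dividing by P(B).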
It is often helpful, when calculating conditional probabilities, to create a simple table containing the number of occurrences (or the relative frequency) of each outcome for each combination of the variables. The table below illustrates this method for the girl-or-boy example above.
| | Girls | Boys | Total |
|---|---|---|---|
| Trousers | 20 | 60 | 80 |
| Skirts | 20 | 0 | 20 |
| Total | 40 | 60 | 100 |
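Once the table is filled in, both conditional probabilities can be read off it: P(Girl | Trousers) divides a cell by its row total, while the reverse conditional P(Trousers | Girl) divides the same cell by its column total. A minimal sketch, storing the counts as a nested dictionary (an illustrative layout; the helper names are hypothetical):

```python
from fractions import Fraction

# Counts from the table: table[clothing][sex]
table = {
    "trousers": {"girls": 20, "boys": 60},
    "skirts": {"girls": 20, "boys": 0},
}


def row_total(clothing):
    return sum(table[clothing].values())


def column_total(sex):
    return sum(row[sex] for row in table.values())


def p_sex_given_clothing(sex, clothing):
    """Cell count divided by its row total."""
    return Fraction(table[clothing][sex], row_total(clothing))


def p_clothing_given_sex(clothing, sex):
    """Cell count divided by its column total."""
    return Fraction(table[clothing][sex], column_total(sex))


print(p_sex_given_clothing("girls", "trousers"))  # → 1/4
print(p_clothing_given_sex("trousers", "girls"))  # → 1/2
```

The two results are the pair of reverse conditional probabilities that Bayes' theorem relates.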