Statistics is how we observe, analyze, and dissect data. Coupled with probability, it becomes a powerful mechanism for uncovering knowledge, tackling uncertainty, and making predictions from data. In this age of rapid digitization and AI, demand for skilled statisticians and data scientists is at an all-time high across nearly every industrial sector and is expected to grow by 30% in the coming decade.
If you love working with numbers and wrangling and analyzing data, mastering statistics is essential. It is undoubtedly challenging and will take consistent hard work, intelligent studying, and a thorough grasp of vital concepts. Revise fundamental concepts in probability and statistics with this article from MyAssignmentHelp, a leading global statistics assignment help service.
Let’s get started.
Conditional probability is a powerful tool for making probability predictions from available data. If information about the occurrence of one event affects the probability of another, we can use conditional probability to draw conclusions about both events.
If we have two events A and C that are not independent of each other, then the conditional probability of A given C (that is, the probability that A occurs, assuming that C has already occurred) is given as:

P(A|C) = P(A∩C) / P(C)

Note that P(C) cannot be zero; otherwise, the conditional probability is undefined. Also note that P(A|C) = P(A) if A and C are independent events.
Let’s have a look at a problem.
Q. Suppose we have a shuffled deck of 52 playing cards.
- What is the probability that a queen has been drawn?
- What is the probability of drawing a queen, given that the card drawn is from the Heart suit?
- Given that all non-face cards have been removed, what is the probability of drawing a King?
Can you use conditional probability to find the answers above? Here are some solved problems that can help you out.
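As a quick sketch (not a substitute for working the problems out by hand), the three card questions can be checked by enumerating the deck in Python. The helper `prob` and the event names below are illustrative, not standard library functions:

```python
from fractions import Fraction

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(r, s) for r in ranks for s in suits]

def prob(event, space):
    """P(event) by counting equally likely outcomes in `space`."""
    return Fraction(sum(1 for card in space if event(card)), len(space))

is_queen = lambda c: c[0] == "Q"
is_heart = lambda c: c[1] == "hearts"
is_king  = lambda c: c[0] == "K"
is_face  = lambda c: c[0] in ("J", "Q", "K")

# (a) P(queen) over the full deck.
print(prob(is_queen, deck))      # 1/13

# (b) P(queen | heart): conditioning restricts the sample space to hearts.
hearts = [c for c in deck if is_heart(c)]
print(prob(is_queen, hearts))    # 1/13

# (c) With all non-face cards removed, only the 12 face cards remain.
faces = [c for c in deck if is_face(c)]
print(prob(is_king, faces))      # 1/3
```

Notice how conditioning simply shrinks the sample space to the outcomes consistent with the given information.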
Bayes' theorem is a central aspect of conditional probability and a critical theorem in data science. The key to understanding this theorem is that it is used to update predictions about sequential occurrences: new information about subsequent events affects predictions about initial events, and vice versa.
Bayes' theorem deals with prior and posterior probabilities. The prior probability is the initial probability before obtaining additional information. The posterior probability is the revised value after considering that additional information.
The expression is as follows:

P(B|A) = [P(B) * P(A|B)] / P(A)
As may be evident from the above, Bayes' theorem allows us to invert conditional probabilities, that is, to find P(B|A) from P(A|B). It can be proven as follows:
The multiplication rule of probability expresses the probability P(A∩B) of events A and B occurring together as
P(A∩B) = P(A) * P(B|A) = P(B) * P(A|B)
If we divide both sides by P(A), we get Bayes' theorem:
P(B|A)= [P(B) * P(A|B)] / P(A)
Be careful not to treat P(B|A) as equivalent to P(A|B); in general, the two are different.
Here’s an example that can help elucidate things.
# Toss a coin 5 times. Let H1 be the event that the first toss is heads, and let HA be the event that all five tosses are heads. Then, P(H1|HA) = 1 but P(HA|H1) = 1/16
P(HA|H1) = [P(H1|HA) * P(HA)]/P(H1)
or, P(HA|H1) = [ 1/32] / [1/2]
or, P(HA|H1) = 1/16
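The coin-toss example can be verified by brute-force enumeration of all 32 equally likely sequences. This is a minimal sketch; the helper `cond` is an illustrative name, not a library function:

```python
from itertools import product
from fractions import Fraction

# All 2^5 equally likely toss sequences of heads (H) and tails (T).
space = list(product("HT", repeat=5))

H1 = [s for s in space if s[0] == "H"]               # first toss is heads
HA = [s for s in space if all(t == "H" for t in s)]  # all five tosses are heads

def cond(a, b):
    """P(A|B) = |A ∩ B| / |B| for equally likely outcomes."""
    inter = [s for s in b if s in a]
    return Fraction(len(inter), len(b))

print(cond(H1, HA))   # 1   (given five heads, the first toss is certainly heads)
print(cond(HA, H1))   # 1/16

# Bayes' theorem recovers the same value: P(HA|H1) = P(H1|HA) * P(HA) / P(H1)
bayes = cond(H1, HA) * Fraction(len(HA), 32) / Fraction(len(H1), 32)
print(bayes)          # 1/16
```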
As you may have gathered, conditional probability and Bayes' theorem are aspects of inferential statistics. Such is the importance of Bayesian inference in statistics that there's a separate branch, Bayesian statistics, that tackles problem scenarios using this versatile theorem. Bayesian learning is an area of machine learning that uses Bayesian inference to make predictions from available information.
Want to sharpen your skills in Bayesian statistics? Then here are some problems to try your hand at.
Randomness and uncertainty are intrinsic to all processes & data. Statistics attempts to manage and monitor randomness in data using random variables and distributions.
You should first be acquainted with sample spaces. In many cases, problems do not require you to consider every individual outcome; instead, you may need to look at distribution patterns and trends. Random variables and distributions are functions on the sample space that help track its randomness, often with respect to a particular condition or question of interest.
A random variable is a real-valued function on a sample space, the set of all possible outcomes of an experiment: it assigns a number to each outcome. Probability distributions are functions that determine the probability that the value of a random variable lies in a certain range.
Random variables and distributions can be discrete or continuous. The probability distributions of discrete random variables are called probability mass functions, while those of continuous random variables are called probability density functions.
For a better understanding, try to solve the problem below.
# Let X be a discrete random variable with the following probability mass function:
PX(x) =
- 0.1, for x = 0.2
- 0.2, for x = 0.4
- 0.2, for x = 0.5
- 0.3, for x = 0.8
- 0.2, for x = 1
- 0, otherwise
- (a) Find RX, that is, the range of the discrete random variable X.
- (b) Find P(X ≤ 0.5)
- (c) Find P(0.25 < X < 0.75)
- (d) Find P(X = 0.2 | X < 0.6)
Hints: For parts (b) and (c), add the probabilities of all values of the discrete random variable that fall within the given range. This works because the distinct values of a random variable correspond to mutually exclusive outcomes.
Part (d) is a conditional probability question, so solve it accordingly.
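Once you have attempted the parts by hand, a short script can check your answers. This sketch uses exact fractions to avoid floating-point round-off; the helper `P` is an illustrative name:

```python
from fractions import Fraction

# The PMF of X as given above, with values stored as exact fractions
# (0.2 -> 1/5, 0.1 -> 1/10, and so on).
pmf = {
    Fraction(1, 5): Fraction(1, 10),   # x = 0.2 -> 0.1
    Fraction(2, 5): Fraction(1, 5),    # x = 0.4 -> 0.2
    Fraction(1, 2): Fraction(1, 5),    # x = 0.5 -> 0.2
    Fraction(4, 5): Fraction(3, 10),   # x = 0.8 -> 0.3
    Fraction(1, 1): Fraction(1, 5),    # x = 1   -> 0.2
}
assert sum(pmf.values()) == 1          # sanity check: a PMF must sum to 1

def P(event):
    """Sum the PMF over all values of X satisfying `event`."""
    return sum(p for x, p in pmf.items() if event(x))

print(sorted(pmf))                                       # (a) the range R_X
print(P(lambda x: x <= Fraction(1, 2)))                  # (b) P(X <= 0.5)
print(P(lambda x: Fraction(1, 4) < x < Fraction(3, 4)))  # (c) P(0.25 < X < 0.75)
# (d) P(X = 0.2 | X < 0.6) = P(X = 0.2) / P(X < 0.6)
print(P(lambda x: x == Fraction(1, 5)) / P(lambda x: x < Fraction(3, 5)))
```

Summing the PMF over a set of values is exactly the "add the mutually exclusive outcomes" hint above, and part (d) applies the conditional probability formula directly.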
Learn more about random variables & distributions through these two university articles.
Descriptive/Sample Statistics & The Central Limit Theorem
Say we draw a random sample from a population. Let the sample size be N, and let X1, X2, …, XN be the random variables denoting each observation taken. Each Xi is independent of the others and has the same probability distribution; they are independent and identically distributed (i.i.d.).
For the above scenario, consider X', the sample mean, as an estimator for the population mean. X' is itself a random variable and varies according to the population distribution, the sample size, and the method of sampling. This means that X' has a distribution of its own, referred to as the sampling distribution of the sample mean.
The Central Limit Theorem
Just as the distribution of each X can vary, so can the distribution of the sample mean X', which is another random variable. The Central Limit Theorem states that as the sample size grows, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution. (If the population itself is normal, the sample mean is exactly normally distributed for any sample size.)
Suppose a population's mean and standard deviation are μ and σ, respectively. Then the distribution of sample means of that population will be (at least approximately, for large samples) normal with mean μ and standard deviation σ/√n, where n is the sample size.
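A small simulation illustrates the theorem: even for a skewed (exponential) population, the sample means cluster around μ with spread close to σ/√n. The parameter choices below are arbitrary, for illustration only:

```python
import random
import statistics

random.seed(0)

# Population: exponential with rate 1, which has mean mu = 1 and sd sigma = 1,
# and is decidedly non-normal (right-skewed).
mu, sigma = 1.0, 1.0
n = 50           # sample size
trials = 20_000  # number of samples drawn

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# CLT prediction: mean of the sample means ~ mu,
# standard deviation of the sample means ~ sigma / sqrt(n) = 1/sqrt(50) ~ 0.141.
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 3))
```

A histogram of `means` would look close to a bell curve even though the population itself is skewed; that is the content of the theorem.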
Revise your ideas with this handy handout from the University of Illinois at Chicago.
A hypothesis is a conjecture or estimation based on available information, intuition, and experience. Statistical hypotheses are conjectures about the parameters or characteristics of a population. Hypothesis testing is how we determine how well a proposed hypothesis is supported by the data. Browse myassignmenthelp.com for statistics and programming assignment help services in Canada & the USA.
The key steps in statistical hypothesis testing are:
- Formulating the hypothesis to be tested
In any hypothesis test, two primary and competing hypotheses are considered: the status quo, or NULL, hypothesis H0 and the research, or ALTERNATIVE, hypothesis HA. Hypothesis testing aims to determine whether the data provide enough evidence for the alternative hypothesis.
The test concludes by either rejecting the null hypothesis or failing to reject it. Make sure to check all conditions and state any assumptions you make.
- Determining an appropriate testing statistic
Choosing the right kind of statistical test is particularly important. Determine which test to use by closely analyzing the population, your sampling methods & samples, and your hypotheses. Check out these handouts (follow the links) that list all commonly used statistical tests.
- Calculating the P-value
The P-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme (in the direction of the alternative hypothesis) as the one actually obtained. The general convention is that if the p-value is less than a chosen significance level, the null hypothesis is rejected in favor of the alternative hypothesis; otherwise, the null hypothesis is not rejected.
This is similar to the reasoning in criminal trials, where suspects are considered innocent until proven guilty. If the suspect is innocent (if the null hypothesis is true), then how likely is the observed evidence (what is the probability of observing a test statistic this extreme in the direction of the alternative hypothesis)?
The threshold we compare the p-value against is called the significance level, often denoted α.
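A minimal sketch of the whole procedure, assuming a one-sample, two-sided z-test with known population standard deviation (the dataset and parameter values here are made up purely for illustration):

```python
import math
import random

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean = mu0, with known sd sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))       # test statistic
    p_value = 2.0 * (1.0 - phi(abs(z)))             # two-sided p-value
    return z, p_value

random.seed(1)
# Hypothetical data: 40 draws from N(10.5, 2), while H0 claims the mean is 10.
sample = [random.gauss(10.5, 2.0) for _ in range(40)]

z, p = one_sample_z_test(sample, mu0=10.0, sigma=2.0)
alpha = 0.05  # significance level chosen before looking at the data
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The four steps appear in order: the hypotheses (mu0 versus a two-sided alternative), the choice of test statistic (z), the p-value computation, and the comparison against the significance level α.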
Want to dig deeper into statistical hypothesis testing? Then, follow the link to a handout from the University of Colorado, Boulder.
Well, that’s all the space we have for this article. The concepts and techniques above remain some of the most potent and commonly used tools of professional statisticians and data scientists alike. Study them closely, solve different kinds of problems, and get aid from subject matter experts if need be.
Click here to connect with professional writers & SMEs of MyAssignmentHelp.com, a leading statistics and programming assignment help service in the USA.