Analysing Data Using Descriptive Statistics


Using the Random Number Generator feature in Excel, 5000 samples of size 10 are generated. Each observation takes one of the 13 possible integer values from 0 to 12 and is assumed to follow a binomial distribution with n = 12 trials and success probability p = 0.25.

The mean of a binomial distribution is the product of the number of trials n and the success probability p. In this case, $\mu = np = 12 \times 0.25 = 3$.

The variance of a binomial distribution is given by the following formula: $\sigma^2 = np(1 - p) = 12 \times 0.25 \times 0.75 = 2.25$.

However, when we calculate the probability of each individual value and then find the mean and standard deviation from those probabilities, only the mean agrees with the value calculated above, whereas the variance does not.
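The comparison between the closed-form moments and the moments computed from the individual probabilities can be checked with a short script. This is a minimal sketch, assuming n = 12 trials (inferred from the value range 0 to 12) and p = 0.25; the function names are illustrative and not part of the original Excel workbook.

```python
from math import comb

N_TRIALS = 12     # assumption: values 0..12 imply 12 trials
P_SUCCESS = 0.25  # success probability stated in the text


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def moments_from_pmf(n: int, p: float) -> tuple[float, float]:
    """Mean and variance obtained by summing over every possible value."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, variance


pmf_mean, pmf_variance = moments_from_pmf(N_TRIALS, P_SUCCESS)
formula_mean = N_TRIALS * P_SUCCESS
formula_variance = N_TRIALS * P_SUCCESS * (1 - P_SUCCESS)

print(f"mean:     pmf = {pmf_mean:.4f}, formula = {formula_mean:.4f}")
print(f"variance: pmf = {pmf_variance:.4f}, formula = {formula_variance:.4f}")
```

If the variance obtained in a spreadsheet differs from the formula value, a likely cause is that the spreadsheet function was applied to the list of probabilities rather than to the probability-weighted squared deviations.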

Question 1 – (b)

The test statistics of each of the samples are calculated in Excel as follows

| Statistic | sample1 | sample2 | sample3 | sample4 | sample5 | sample6 | sample7 | sample8 | sample9 | sample10 |
|---|---|---|---|---|---|---|---|---|---|---|
| mean (xbar) | 2.9844 | 2.9642 | 3.0004 | 2.9834 | 3.0004 | 2.9784 | 3.0008 | 3.0028 | 3.0154 | 2.9698 |
| median (Md) | -0.73539105 | -0.07547 | 0.000843 | -0.035 | 0.000843 | -0.04554 | 0.001687 | 0.005903 | 0.032466 | -0.06367 |
| unbiased s2 | 1.481011586 | 1.500056 | 1.498015 | 1.481878 | 1.502415 | 1.518418 | 1.487293 | 1.531555 | 1.46942 | 1.479907 |
| t-test | -0.90642186 | -2.06688 | 0.023109 | -0.96424 | 0.023075 | -1.23949 | 0.046385 | 0.159984 | 0.898324 | -1.75539 |
| chi-square | 3290.478631 | 3332.791 | 3328.256 | 3292.405 | 3338.033 | 3373.588 | 3304.436 | 3402.775 | 3264.724 | 3288.025 |
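The per-sample statistics can also be reproduced outside Excel. The following is a minimal sketch, assuming each observation is Binomial(12, 0.25) so that the hypothesised mean is 3 and the hypothesised variance is 2.25; the helper names and the random seed are illustrative, and the simulated numbers will not match the Excel output.

```python
import math
import random
import statistics

MU = 3.0       # assumed population mean, n * p
SIGMA2 = 2.25  # assumed population variance, n * p * (1 - p)


def draw_binomial_sample(size, trials=12, p=0.25, rng=random):
    """Draw `size` binomial observations by summing Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(trials)) for _ in range(size)]


def sample_statistics(sample, mu=MU, sigma2=SIGMA2):
    """Mean, median, unbiased variance and the z, t and chi-square statistics."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s2 = statistics.variance(sample)  # divides by n - 1, i.e. unbiased
    t = (xbar - mu) / math.sqrt(s2 / n) if s2 > 0 else float("nan")
    return {
        "mean": xbar,
        "median": statistics.median(sample),
        "s2": s2,
        "z": (xbar - mu) / math.sqrt(sigma2 / n),
        "t": t,
        "chi2": (n - 1) * s2 / sigma2,
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = sample_statistics(draw_binomial_sample(10, rng=rng))
print(simulated)

# A hand-checkable sample: mean 3, sum of squared deviations 12.
fixed = sample_statistics([3, 2, 4, 3, 1, 5, 3, 2, 4, 3])
print(fixed)
```

Repeating `sample_statistics` over 5000 simulated samples and collecting each entry gives the values from which the histograms are drawn.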

Histograms

Mean of means histogram

The histogram of the sample means is approximately normally distributed.

Median histogram

The histogram of the medians is also approximately normally distributed.

The histogram of z is also approximately normally distributed.

The histogram of t is approximately normally distributed.

Histogram of chi-square

Descriptive statistics

The descriptive statistics of each statistic across the 5000 samples are calculated in Excel and displayed below.

| Statistic | MEANS | MEDIAN | VARIANCE (s2) |
|---|---|---|---|
| Mean | 2.99 | 2.9111 | 2.243209 |
| Standard Error | 0.006577 | 0.008288 | 0.014639 |
| Median | | | 2.1 |
| Mode | 2.9 | | 1.877778 |
| Standard Deviation | 0.465076 | 0.586017 | 1.035133 |
| Sample Variance | 0.216295 | 0.343415 | 1.0715 |
| Kurtosis | -0.0789 | -0.01959 | 1.772618 |
| Skewness | 0.08915 | 0.087995 | 0.993063 |
| Range | 3.1 | | 8.4 |
| Minimum | 1.5 | | 0.222222 |
| Maximum | 4.6 | | 8.622222 |
| Sum | 14950 | 14555.5 | 11216.04 |
| Count | 5000 | 5000 | 5000 |
| z-test | -0.4714 | -4.19079 | -35.6755 |
| t-test | -1.52041 | -10.727 | -51.697 |
| CHI | 480.56 | 762.9929 | 2380.635 |

| Statistic | T | CHI |
|---|---|---|
| Mean | -0.07672 | 8.972836 |
| Standard Error | 0.015854 | 0.058556 |
| Median | | 8.4 |
| Mode | | 7.511111 |
| Standard Deviation | 1.121081 | 4.140531 |
| Sample Variance | 1.256824 | 17.144 |
| Kurtosis | 1.193707 | 1.772618 |
| Skewness | -0.26572 | 0.993063 |
| Range | 10.71052 | 33.6 |
| Minimum | -6.12795 | 0.888889 |
| Maximum | 4.582576 | 34.48889 |
| Sum | -383.615 | 44864.18 |
| Count | 5000 | 5000 |
| z-test | -145.038 | 281.5622 |
| t-test | -194.06 | 102.0022 |
| CHI | 2792.383 | 38090.16 |

Shape of the histograms

The histograms are positively skewed, meaning that they have a long right tail.

Based on what we discussed in class, when p = 0.5 the binomial distribution is symmetric and similar to the normal distribution, i.e. bell-shaped. However, when p < 0.5, as in this case, the histogram is positively skewed, and when p > 0.5 it is negatively skewed.
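The direction of the skew can be illustrated with the standard skewness coefficient of a binomial distribution, (1 - 2p) / sqrt(np(1 - p)). This is a small sketch, assuming n = 12 trials; it describes the skew of a single binomial observation rather than of the sample statistics, and the function name is illustrative.

```python
import math


def binomial_skewness(n: int, p: float) -> float:
    """Skewness coefficient of a Binomial(n, p) random variable."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


N_TRIALS = 12  # assumption: values 0..12 imply 12 trials

for p in (0.25, 0.5, 0.75):
    skew = binomial_skewness(N_TRIALS, p)
    if skew > 0:
        shape = "positively skewed"
    elif skew < 0:
        shape = "negatively skewed"
    else:
        shape = "symmetric"
    print(f"p = {p}: skewness = {skew:+.4f} ({shape})")
```

The coefficient shrinks as n grows, which is why the skew becomes less visible for statistics built from larger samples.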

The tabulated values of the test statistics are as follows:

Z0.025 = 1.96 (two-tailed); Z0.05 = 1.64 (one-tailed).

t4999 = 1.64

χ2 (0.05) = 124

The calculated results differ from the theoretical (tabulated) values because of the sampling error associated with each sample.
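The tabulated normal critical values can be recovered from the inverse of the standard normal distribution function. The sketch below uses Python's `statistics.NormalDist`; the function name `z_critical` is illustrative. With 4999 degrees of freedom the t distribution is practically indistinguishable from the standard normal, so the one-tailed z value also serves as the t value.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool) -> float:
    """Upper critical value of the standard normal for significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


z_two_tailed = z_critical(0.05, two_tailed=True)
z_one_tailed = z_critical(0.05, two_tailed=False)

print(f"two-tailed, alpha = 0.05: {z_two_tailed:.3f}")
print(f"one-tailed, alpha = 0.05: {z_one_tailed:.3f}")
```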

Question 2

(a) Using a larger sample (n = 50) would make the results more accurate than using a sample of n = 10. This is due to the law of large numbers, which states that as the sample size increases, the sample statistics move closer to the population parameters. Therefore, I do not expect the results from question 1 to be the same as those in question 2.
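The gain from the larger sample can be quantified through the standard error of the mean, sqrt(σ2 / n). This is a minimal sketch, assuming the population variance is 2.25 (that of a Binomial(12, 0.25) variable); the names are illustrative.

```python
import math

SIGMA2 = 2.25  # assumed population variance, n * p * (1 - p)


def standard_error(sigma2: float, n: int) -> float:
    """Standard deviation of the sample mean for samples of size n."""
    return math.sqrt(sigma2 / n)


se_10 = standard_error(SIGMA2, 10)
se_50 = standard_error(SIGMA2, 50)

print(f"n = 10: standard error = {se_10:.4f}")
print(f"n = 50: standard error = {se_50:.4f}")
print(f"ratio = {se_10 / se_50:.3f}")  # equals sqrt(50 / 10)
```

These theoretical values can be compared with the standard deviations of the 5000 sample means reported in the descriptive statistics for each question.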

(b) Histograms

Means of means

The histogram of the means is displayed below. Its shape is similar to that of a normal distribution.

The descriptive statistics of the means are as follows

DESCRIPTIVE STATISTICS OF MEANS

| Statistic | Value |
|---|---|
| Mean | 2.993536 |
| Median | |
| Mode | 2.98 |
| Standard Deviation | 0.212386907 |
| Sample Variance | 0.045108198 |
| Kurtosis | 0.010181873 |
| Skewness | 0.042006089 |
| T-TEST | -7.164994654; -97.006464 |
| CHI | 0.982356319 |

Clearly, the mean and median of the means of the 5000 samples, each with n = 50, are similar to the mean and median of the entire population.

Mean of medians

The histogram of the medians of the 5000 samples with each n=50 shows that the median is 3.

As for the descriptive statistics,

DESCRIPTIVE STATISTICS OF MEDIAN

| Statistic | Value |
|---|---|
| Mean | 2.9486 |
| Median | |
| Mode | |
| Standard Deviation | 0.260905451 |
| Sample Variance | 0.068071654 |
| T-TEST | -13.93044431; -2.423019237 |
| CHI-SQUARE | 1.482449361 |

It is also clear that the mean and median of the medians are approximately equal to the population mean and median.

Question 3

(a) The sample mean x̄ and the sample median Md can both serve as unbiased estimators of the population mean µ when the sample size is 50 and above, because of the law of large numbers. In addition, the central limit theorem states that, regardless of the distribution of the population, the sampling distribution of the mean of a sample drawn from it is approximately normal when the sample size is 30 and above.

(b) The results from question 1 use samples of size 10, whereas those from question 2 use samples of size 50. Because the sample size in question 1 is less than 30, the sample mean x̄ and the sample median Md are not unbiased estimators of the population mean µ. On the other hand, the samples in question 2 are larger than 30, and therefore the sample mean x̄ and the sample median Md are unbiased estimators of the population mean µ.

(c) The values obtained from question 1 are not consistent estimators because the sample size is smaller and thus they are not equal or near to the true population parameters. However, since the sample size in question 2 is large enough, the estimators are approximately equal to the population parameters and are therefore consistent estimators.

(d) Out of several estimators, the efficient estimator is the one with the lowest variance, meaning that it has the smallest deviation from the population parameter it is estimating. Of the estimators in question 1 (n = 10) and question 2 (n = 50), the latter is the more efficient. This is because it has a larger sample size than the former.
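Efficiency can be compared numerically as a ratio of variances. The sketch below uses the sample variances of the 5000 means and medians reported in the descriptive statistics for n = 10 and n = 50; the function name and constant names are illustrative.

```python
def relative_efficiency(var_a: float, var_b: float) -> float:
    """Efficiency of estimator A relative to B; above 1 means A has the lower variance."""
    return var_b / var_a


# Sample variances of the estimators as reported in the descriptive statistics.
VAR_MEAN_N10 = 0.216295
VAR_MEDIAN_N10 = 0.343415
VAR_MEAN_N50 = 0.045108198
VAR_MEDIAN_N50 = 0.068071654

mean_vs_median_n10 = relative_efficiency(VAR_MEAN_N10, VAR_MEDIAN_N10)
mean_vs_median_n50 = relative_efficiency(VAR_MEAN_N50, VAR_MEDIAN_N50)
mean_n50_vs_n10 = relative_efficiency(VAR_MEAN_N50, VAR_MEAN_N10)

print(f"mean vs median (n = 10): {mean_vs_median_n10:.3f}")
print(f"mean vs median (n = 50): {mean_vs_median_n50:.3f}")
print(f"mean, n = 50 vs n = 10:  {mean_n50_vs_n10:.3f}")
```

A ratio above 1 in the first two lines indicates that the sample mean varies less than the sample median at the same sample size.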

Question 4

The dataset provided in the Excel file records the amount of time customers take to pay their accounts. There are two sets of customers, namely country customers and city customers. The six-step procedure will be used to solve the problem.

(a) Using a higher significance level will reduce the chance of committing a Type II error, at the cost of a higher chance of committing a Type I error, since the rejection region is bigger than when we use 0.05 or 0.01.

(b) The descriptive statistics of the city customers and the country customers are as follows.

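| Statistic | CITY | COUNTRY |
|---|---|---|
| Mean | 34.93913043 | 51.65882 |
| Standard Error | 0.713847661 | 1.467439 |
| Median | | |
| Mode | | |
| Standard Deviation | 7.655163326 | 13.52912 |
| Sample Variance | 58.60152555 | 183.037 |
| Kurtosis | 0.711691012 | 0.920904 |
| Skewness | 0.0320181 | -0.02187 |
| Range | | |
| Minimum | | |
| Maximum | | |
| Sum | 4018 | 4391 |
| Count | 115 | 85 |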

The histogram of the city customers is

(c) The past population mean and standard deviation of the city customers are given as follows: µ = 34, σ = 6. The z-statistic is the appropriate test statistic for this problem. The assumption of the z-test is as follows:

The distribution of the sampled data is normal. This assumption is supported by the histogram displayed above.

(i) the six-step procedure

Step 1: Has the mean time taken by city customers changed from 34?

The hypothesis which we will be testing will be

H0: µ = 34

HA: µ ≠ 34

Step 2:

The standard deviation is known to be 6 (variance 36) and, according to the histogram, the data are normally distributed. Therefore, the appropriate test statistic is the standardized normal, which is

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

which follows the standard normal distribution N[0, 1].

Step 3: the level of significance is α = 0.05

Step 4: decision rule

Since the alternative hypothesis is two-sided, if |z| > zα/2 = 1.96, we reject the H0

Step 5: calculating the statistic

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = 1.26$

Step 6: conclusion

Since z (1.26) < zα/2 (1.96), we fail to reject the null hypothesis and conclude that there was no change.
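The six steps can be sketched in code as a one-sample z-test. This is an illustration only, assuming the city sample mean and count from the descriptive statistics table (34.93913043 and 115) together with µ = 34 and σ = 6; the function name is hypothetical, and the value it produces depends on these inputs and need not coincide with the figure reported above.

```python
import math
from statistics import NormalDist


def one_sample_z_test(xbar: float, mu0: float, sigma: float, n: int):
    """Return the z statistic and the two-tailed p-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


ALPHA = 0.05
z_stat, p_value = one_sample_z_test(xbar=34.93913043, mu0=34, sigma=6, n=115)
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

print(f"z = {z_stat:.4f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z_stat) > z_crit else "fail to reject H0")
```

Comparing |z| with the critical value and comparing the p-value with α are equivalent ways of reaching the same decision.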

(ii) the appropriate p-value

Our null hypothesis states that H0: µ = 34, whereas the alternative hypothesis is µ ≠ 34. Therefore, we are using the two-tailed test. We will evaluate the p-value at the 0.05 level of significance. If the p-value < 0.05, we reject the null hypothesis. If the p-value > 0.05, we do not reject the null hypothesis.

Step 1: Is the mean time of the city customers different from 34?

H0: µ = 34

HA: µ ≠ 34

Step 2: test statistic

The t-statistic will be used, which is $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, with a t-distribution of n – 1 degrees of freedom.

Step 3: level of significance is α = 0.05

Step 4: decision rule:

We will reject the H0 if |t| > tα/2, n-1 = t0.025, 114 = 1.982.

Step 5: the statistic is

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = 1.26$.

Step 6: since |t| (1.26) < t0.025, 114 (1.982), we fail to reject the null hypothesis and conclude that we do not have sufficient evidence of a difference in the mean.
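The t-based version of the test can be sketched the same way. This is a minimal illustration, assuming the city sample mean, standard deviation and count from the descriptive statistics table and taking the tabulated critical value 1.982 as given in the text; the function name is hypothetical, and the computed statistic depends on these inputs and need not equal the figure reported above.

```python
import math


def one_sample_t_statistic(xbar: float, mu0: float, s: float, n: int) -> float:
    """t statistic with n - 1 degrees of freedom for H0: mu = mu0."""
    return (xbar - mu0) / (s / math.sqrt(n))


T_CRITICAL = 1.982  # tabulated t(0.025, 114) as quoted in the text

t_stat = one_sample_t_statistic(xbar=34.93913043, mu0=34, s=7.655163326, n=115)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.4f}, critical value = {T_CRITICAL}")
print("reject H0" if reject_h0 else "fail to reject H0")
```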

(d) The six-step procedure

Step 1: hypothesis

The question which we will be answering will be:

Is the time taken to pay accounts different between city customers and country customers?

In our case, the city customers will be denoted as X1 and the country customers as X2. The hypotheses will be as follows

H0: µ1 - µ2 = 0

H1: µ1 - µ2 ≠ 0

Step 2: Test statistic and sampling distribution

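Since the variance is known and equal (σ = 6, so σ2 = 36), we will use the pooled sample variance,

n1 = 115, n2 = 85

$s_p^2 = \frac{(n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2}{n_1 + n_2 - 2} = \frac{114(36) + 84(36)}{198} = 36$

The degrees of freedom will be n1 + n2 – 2, which will be 115 + 85 – 2 = 198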

Step 3: Level of significance

The level of significance which we will use in this problem will be α = 10%

Step 4: Decision rule

In our case, we are conducting a two-tailed test because the alternative hypothesis states that H1: µ1 ≠ µ2. The degrees of freedom we will use will be 198.

The null hypothesis will be rejected if our test statistic is below -1.645 or above 1.645.

Step 5: calculating the t-statistic

The pooled variance has already been calculated as 36, so the test statistic will be calculated as follows

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = -19.8$

Step 6: Conclusion

Since t = -19.8 < -1.645, we reject the null hypothesis at the 10% significance level. We therefore have sufficient evidence at 10% to conclude that the time it takes city customers and country customers to pay their accounts is different.
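The pooled two-sample calculation can be sketched as follows. This is an illustration under stated assumptions: the sample means and counts are taken from the descriptive statistics table, both groups are assigned the common variance 36 used in the text, and the hypothesised difference `delta0` is 0; the function names are hypothetical, and the result depends on these inputs and may differ slightly from the reported figure.

```python
import math


def pooled_variance(n1: int, var1: float, n2: int, var2: float) -> float:
    """Weighted average of two variances with n1 + n2 - 2 degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t_statistic(xbar1, xbar2, n1, n2, sp2, delta0=0.0):
    """t statistic for H0: mu1 - mu2 = delta0 using the pooled variance sp2."""
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / standard_error


N_CITY, N_COUNTRY = 115, 85
sp2 = pooled_variance(N_CITY, 36.0, N_COUNTRY, 36.0)  # assumed common variance
t_stat = pooled_t_statistic(34.93913043, 51.65882, N_CITY, N_COUNTRY, sp2)

CRITICAL = 1.645  # two-tailed critical value at the 10% level, as in the text
print(f"pooled variance = {sp2:.2f}, t = {t_stat:.4f}")
print("reject H0" if abs(t_stat) > CRITICAL else "fail to reject H0")
```

Passing a non-zero `delta0` adapts the same function to a hypothesised difference such as 10 days.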

(e) Testing the population variance

Step 1: Stating the hypothesis

H0: σ2 = 36;

H1: σ2 ≠ 36

Step 2: test statistic

The appropriate test statistic which will be used will be the χ2 statistic:

$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$

The chi-square statistic has 115 – 1 = 114 degrees of freedom. The assumption of this test is that the dataset is normally distributed. The histogram above shows that the data follow a normal distribution.

Step 3: The level of significance chosen will be α = 0.05

Step 4: decision rule: Reject the null hypothesis H0 if χ2 > χ2α/2, n – 1

In our case the tabulated χ2 (α/2 = 0.025, df = 114) = 130

Step 5: Calculating the test statistic

$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = 185.81$

Step 6: decision

Since χ2 (185.81) > χ2 (α/2 = 0.025, df = 114) = 130, we reject the null hypothesis and conclude that we have enough evidence that the variance of the sample is different from the population variance.
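The variance test can be sketched in a few lines. This is an illustration, assuming the city sample variance and count from the descriptive statistics table, the hypothesised variance 36, and the tabulated critical value 130 taken as given in the text; the function name is hypothetical, and the computed statistic may differ slightly from the reported figure because of rounding in the inputs.

```python
def chi_square_variance_statistic(n: int, s2: float, sigma0_sq: float) -> float:
    """Chi-square statistic with n - 1 degrees of freedom for H0: sigma^2 = sigma0_sq."""
    return (n - 1) * s2 / sigma0_sq


CHI2_CRITICAL = 130.0  # tabulated upper critical value as quoted in the text

chi2_stat = chi_square_variance_statistic(n=115, s2=58.60152555, sigma0_sq=36.0)
reject_h0 = chi2_stat > CHI2_CRITICAL

print(f"chi-square = {chi2_stat:.2f}, critical value = {CHI2_CRITICAL}")
print("reject H0" if reject_h0 else "fail to reject H0")
```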

(f) In the past, country customers took 10 days more than the average city customer. Since we assume equal variance between the city and the country customers, we use the pooled variance.

Step 1: Is the time taken by the country customers more than 10 days longer than the time taken by the city customers?

Assume the country customers are represented by 1, and the city customers are represented by 2

H0: µ1 - µ2 = 10

HA: µ1 - µ2 > 10

Step 2: The test statistics and sampling distribution

The pooled variance will be calculated as follows

$s_p^2 = \frac{(n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2}{n_1 + n_2 - 2} = 36$

Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - 10}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = -19.6$

Step 6: The calculated t = -19.6, whereas the tabulated t (0.05, 198) = 1.653

Since t (calculated) < t (tabulated), we fail to reject the null hypothesis and conclude that the difference in time between the city customers and country customers is still 10 days.

(g) The finance director would like to test whether there is a difference in variance between the city and country customers in paying their accounts.

Step 1: Is the variance of the city customers different from the variance of the country customers?

Let the city customers be denoted by X1 and the country customers by X2.

H0: $\sigma_1^2 / \sigma_2^2 = 1$; HA: $\sigma_1^2 / \sigma_2^2 \neq 1$

Step 2: The test statistic

Since the populations are normally distributed, we will use the F statistic, which can be expressed through the following relationship

$F = s_1^2 / s_2^2$, where s1 is for the city customers and s2 is for the country customers.

The degrees of freedom of the F distribution will be n1 – 1 for the numerator and n2 – 1 for the denominator

Step 3: level of significance

The standard significance level which will be used will be 5%, i.e. α = 0.05

Step 4: Since it is a two-tailed test, the null hypothesis will be rejected if the calculated F is below the tabulated F0.975, 114, 84 = 2.98

Step 5

s21 = 58.60152555, s22 = 183.037

The test statistic = 58.6/183 = 0.32

Step 6: decision

Since the calculated F (0.32) < tabulated F (2.98), we reject the null hypothesis and conclude that the variances of the city customers and the country customers are not equal.
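The variance ratio itself is a one-line calculation. The sketch below assumes the two sample variances and counts from the descriptive statistics table; the function name is illustrative, and the tabulated F value is not recomputed here because the Python standard library has no F distribution.

```python
def f_statistic(var1: float, var2: float) -> float:
    """Ratio of two sample variances, numerator first."""
    return var1 / var2


S2_CITY = 58.60152555  # sample variance, city customers
S2_COUNTRY = 183.037   # sample variance, country customers
N_CITY, N_COUNTRY = 115, 85

f_stat = f_statistic(S2_CITY, S2_COUNTRY)
df_numerator = N_CITY - 1
df_denominator = N_COUNTRY - 1

print(f"F = {f_stat:.4f} with ({df_numerator}, {df_denominator}) degrees of freedom")
```

A ratio well below 1 indicates that the city payments vary much less than the country payments.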

January 19, 2024
