Statistics Project
Submitted by paulafk5 • May 9, 2014 • 1,941 words (8 pages) • 220 views
1-4-2014
1. MOTIVATION
Our project is based on two variables: income per capita in 2007 (X) and expenditure on tobacco in 2007 (Y). To analyze these variables we used a random sample of 25 countries from four different continents. This allows us not only to obtain varied results but also to observe the variables in developed, developing and third world countries. Both variables are economic because they are expressed in monetary units, but despite their economic nature, the relationship between them, and the topic of our project, could rather be considered socio-economic. This is because smoking is a social problem. Tobacco companies rely on the addictive properties of this vice, which make consumers buy the product regardless of its price. In other words, tobacco has an inelastic demand.
We chose these two variables because we consider them important in today's world. Tobacco is a product that has a negative impact on humans, causing addiction and illness that can even lead to death. In recent years we have had more information about how truly harmful consuming it can be, and society has launched a movement to inform the population about its negative effects. It is common knowledge that there is a direct relationship between smoking and negative health consequences. For this reason, we thought it was important to mention the “anti-smoking laws”, which began to be imposed in many countries in 2006 and continued strongly in 2007 (the year from which our data are taken) with restrictions on tobacco consumption in public spaces in countries such as Lithuania, Belgium, Hong Kong, Argentina, Thailand, France, United Kingdom, Iceland, Estonia and Finland.
This is one of the reasons we wanted to analyze its consumption (expenditure): to see how present-day knowledge has affected its purchase, not only in our country or in cautious developed countries, but globally. Mainly, however, the real motivation of this study is to find out how the wealth of a country determines its expenditure on tobacco and which countries spend the most money on this practice.
2. GATHERING THE DATA
These are the data we used. We recorded the X and Y variables for 25 countries, color-coded by continent:
Table1. Databases: “Passport (Euromonitor)” and Classora.
Graph1. Relationship between Income per Capita and Expenditure on Tobacco of each country.
3. STATISTICAL ANALYSIS
We calculated the measures of central tendency and variability. These are the results we obtained:
Table2: Central Tendency and Variability.
The mean gives the arithmetic average and the median gives the midpoint of the ranked values. Comparing the two helps us analyze the shape of each distribution.
All the distributions except Africa (Income per capita) are right-skewed. This means that the mean is larger than the median, so the shape is asymmetric and positive. A positive skew indicates that the tail on the right side is longer than the tail on the left side and the bulk of the values lie to the left of the mean.
Africa (Income per capita), on the other hand, is left-skewed. This means that the mean is smaller than the median, so the shape is asymmetric and negative. A negative skew indicates that the tail on the left side of the probability density function is longer than the tail on the right side and the bulk of the values (including the median) lie to the right of the mean.
In this case no variable has a mode, because no variable has repeated values. However, from the histograms we can conclude that each variable does have a modal interval once the data are grouped into intervals.
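The mean-versus-median comparison described above can be sketched in a few lines. This is only a rule of thumb, not a formal skewness test, and the two lists below are hypothetical illustration values, not the project's country data; the function name `skew_direction` is also our own.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify the shape by comparing the mean with the median (rule of thumb)."""
    m = mean(values)
    med = median(values)
    if m > med:
        return "right-skewed"   # long right tail pulls the mean above the median
    if m < med:
        return "left-skewed"    # long left tail pulls the mean below the median
    return "symmetric"


# Hypothetical samples, chosen only to show both cases.
right_tail = [1, 2, 2, 3, 3, 4, 20]       # mean 5, median 3
left_tail = [1, 17, 18, 18, 19, 19, 20]   # mean 16, median 18

print(skew_direction(right_tail))
print(skew_direction(left_tail))
```

Applied to each continent's column of raw data, the same comparison would reproduce the classification given in the text.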
Table3 Graph2
As we can observe in the graph, the largest group of countries (7) has an expenditure between $0 and $1000. This is the modal interval of variable Y.
Table4 Graph3
As we can observe in the graph, the largest group of countries (10) has an income per capita larger than $11000. This is the modal interval of variable X.
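Finding a modal interval amounts to grouping the values into classes and picking the class with the highest count. The following is a minimal sketch, assuming equal-width intervals; the sample values and the name `modal_interval` are hypothetical and do not come from our tables. It does not reproduce an open-ended last class such as "larger than $11000".

```python
from collections import Counter


def modal_interval(values, width):
    """Return (lower bound, upper bound, count) of the most populated interval."""
    # Each value falls into the interval [k * width, (k + 1) * width).
    counts = Counter(int(v // width) for v in values)
    index, count = max(counts.items(), key=lambda item: item[1])
    return index * width, (index + 1) * width, count


# Hypothetical expenditure values, for illustration only.
sample = [120, 450, 800, 950, 1500, 2300, 2700, 5200]

low, high, count = modal_interval(sample, 1000)
print(f"Modal interval: {low}-{high} ({count} values)")
```

If two intervals have the same count, this sketch simply reports the one encountered first.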
In descriptive statistics, the quartiles of a set of values are the three points that divide the ranked data set into four equal groups, each containing a quarter of the n sampled values. We will use them later to calculate the interquartile range (IQR), a measure of variability based on dividing a data set into quartiles.
To obtain these results we rank-ordered the data (Table3) and calculated the interquartile range as Q3 minus Q1.
          Minimum   Q1          Median(Q2)  Q3         Maximum
Total     32,88     694,86      2158,79     12469,51   73701,8
Asia      32,88     612,38575   6247,53     14751,99   22472,48
America   121,3     10130,93    3244,41     12760,57   73701,8
Africa    162,708   361,124     909,120     2150,41    2158,79
Europe    361,635   2152,08     10871,79    23743,61   29610,49
Table5: Quartile Variable Y; Expenditure on tobacco 2007.
          Minimum   Q1       Median(Q2)  Q3        Maximum
Total     834       2467,5   6645        37082     62451
Asia      834       1030,5   18094       44505     62451
America   3432      4684     7185        43185     46458
Africa    1123      1376,5   2928        3747      5933
Europe    2562      7073     33873       38019,5   59608
Table6: Quartile Variable X; Income per Capita 2007.
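The interquartile range follows directly from the quartiles as Q3 minus Q1. As a short sketch, the Q1 and Q3 values below are copied by hand from Table6 (income per capita), with decimal commas written as decimal points; the names `quartiles_x` and `interquartile_range` are our own.

```python
# (Q1, Q3) of income per capita (variable X), copied from Table6.
quartiles_x = {
    "Total": (2467.5, 37082.0),
    "Asia": (1030.5, 44505.0),
    "America": (4684.0, 43185.0),
    "Africa": (1376.5, 3747.0),
    "Europe": (7073.0, 38019.5),
}


def interquartile_range(q1, q3):
    """IQR = Q3 - Q1, the spread of the middle half of the ranked data."""
    return q3 - q1


iqr_x = {
    group: interquartile_range(q1, q3)
    for group, (q1, q3) in quartiles_x.items()
}

for group, iqr in iqr_x.items():
    print(f"{group}: IQR = {iqr}")
```

With these values Africa has by far the smallest IQR (2370.5), while Asia has the largest (43474.5).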
The range is the difference between the extreme values. The only problem with this measure is that it does not tell us whether there is a lot of variability between those extreme values, whether any values are repeated, whether they are continuous, etc. Usually the greater the value
...