Saturday, January 17, 2015

Simple Explanation and Sample Case of the Correlation Analysis

Correlation analysis is a statistical technique used to measure and describe the relationship between two variables. For example, rice production may decline when less fertilizer is applied, sales may fall when advertising spending is cut, and blood pressure may rise with weight gain. The relationship between two variables, say X and Y, can be positive or negative. It is positive if an increase in X is accompanied by an increase in Y, and a decrease in X by a decrease in Y. It is negative if an increase in X is accompanied by a decrease in Y, and vice versa.

Examples of positive relationships:
X = Fertilizer             Y = Production
X = Advertising Cost       Y = Sales Income
X = Body Weight            Y = Blood Pressure

Example of a negative relationship:
X = Price of Goods     Y = Demand

The relationship between two variables is measured by a value called the correlation coefficient, denoted by r, which represents the linear dependence between two variables or sets of data. The correlation coefficient ranges from -1 to 1, which can be written as follows:  -1 ≤ r ≤ 1
  • r = 1 means that the relationship between the two variables is perfectly positive (the closer r is to 1, the stronger the positive relationship).
  • r = -1 means that the relationship between the two variables is perfectly negative (the closer r is to -1, the stronger the negative relationship).
  • r = 0 means that there is no linear relationship between the two variables (the closer r is to 0, the weaker the relationship).
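The correlation coefficient is defined as follows:
r = (n ΣXY - ΣX ΣY) / sqrt([n ΣX² - (ΣX)²] [n ΣY² - (ΣY)²])
where n is the number of paired observations of X and Y.
The contribution of X to Y is measured by the coefficient of determination, denoted by CD. The coefficient of determination is defined as follows:
CD = r² (usually expressed as a percentage, r² × 100%)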
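
As a rough illustration, the sketch below computes r with the standard Pearson raw-sum formula, r = (n ΣXY - ΣX ΣY) / sqrt([n ΣX² - (ΣX)²] [n ΣY² - (ΣY)²]), and then the coefficient of determination as r² expressed as a percentage. The function name pearson_r and the five data pairs are made up for this example; they are not data from this post.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when X or Y is constant")
    return numerator / denominator


# Hypothetical values, for illustration only
ad_cost = [1, 2, 3, 4, 5]
sales_income = [2, 4, 5, 4, 5]

r = pearson_r(ad_cost, sales_income)
cd = r ** 2 * 100  # coefficient of determination as a percentage

print(f"r  = {r:.3f}")    # r  = 0.775
print(f"CD = {cd:.1f}%")  # CD = 60.0%
```

With these made-up numbers, r is about 0.775, a fairly strong positive relationship, and about 60% of the variation in Y is accounted for by its linear relationship with X.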

Sample Case of the Correlation Analysis

Suppose X is the advertising cost and Y is the sales income. Calculate the correlation coefficient (r) and the coefficient of determination for the two variables.


Friday, January 16, 2015

Basic Probability Theory


Probability is a value used to measure how likely a random event is to occur. "Experiment" is a term widely used in probability theory; simple examples of experiments are tossing a pair of coins and rolling a die. The two basic building blocks of probability are the sample space and the event.

The sample space (S) is the set of all possible outcomes of an experiment. For example, for the experiment of rolling a six-sided die, the sample space is

S = {1, 2, 3, 4, 5, 6}

An event is a subset of the sample space. Events are usually denoted by capital letters such as A and B. For example, in the experiment of rolling a die, the event that an odd number appears is

A = {an odd number appears} = {1, 3, 5}

The probability of an event A is written P(A) and is defined as follows:
P(A) = x / n
where  x  is the number of elements in the event (or its frequency of occurrence)
       n  is the number of elements in the sample space (or the number of observations)

The probability of an event always lies between 0 and 1: 0 ≤ P(A) ≤ 1
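
A minimal sketch of P(A) = x / n for the die example, assuming a fair die so that every outcome is equally likely. The helper name probability is made up for this illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event, space):
    """P(A) = x / n, assuming all outcomes in the space are equally likely."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


# Event A: an odd number appears
odd = {outcome for outcome in sample_space if outcome % 2 == 1}
p_odd = probability(odd, sample_space)

print(sorted(odd))  # [1, 3, 5]
print(p_odd)        # 1/2
```

Because an event is always a subset of the sample space, x can never exceed n, which is why the result stays between 0 and 1.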

Pattern of Events in Probability Theory

Complement or Opposite

Let S be the sample space of an experiment and A a subset of S. The event "not A", the complement of A, is the subset of S containing all elements that are not members of A. Its probability is P(not A) = 1 - P(A).


Intersection or Joint Probability
The intersection of two events A and B is the event that contains all elements common to both A and B.
The intersection of two events is denoted by A ∩ B.
Example:
Event A = {1,2,3,4,5}, event B = {2,4,6,8}, so A ∩ B = {2,4}
P(A and B) = P(A ∩ B) = P(A)P(B) if the two events are independent

Union
The union of two events is the set that includes all elements in A, in B, or in both. The union of two events A and B is denoted by A ∪ B.
Example: A = {2,3,5,8} and B = {3,6,8}, then A ∪ B = {2,3,5,6,8}
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) if the two events are not mutually exclusive
P(A ∪ B) = P(A) + P(B) if the two events are mutually exclusive

Mutually exclusive
Two events are mutually exclusive if they have no elements in common, that is, their intersection is empty and they cannot occur together.
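
The event relationships in this section can be checked with Python sets. The sketch below reuses the events A = {1,2,3,4,5} and B = {2,4,6,8} from the intersection example; the sample space (the integers 1 to 10, each equally likely) and the event C are assumptions added only for this illustration.

```python
from fractions import Fraction

# Assumed sample space: integers 1..10, each equally likely
S = set(range(1, 11))
A = {1, 2, 3, 4, 5}
B = {2, 4, 6, 8}
C = {7, 9}  # hypothetical event that shares no elements with A


def p(event):
    """Probability of an event as (elements in event) / (elements in S)."""
    return Fraction(len(event & S), len(S))


intersection = A & B  # elements in both A and B
union = A | B         # elements in A, in B, or in both
not_A = S - A         # complement of A

print(sorted(intersection))  # [2, 4]
print(sorted(union))         # [1, 2, 3, 4, 5, 6, 8]

# Addition rule for events that are not mutually exclusive
print(p(union) == p(A) + p(B) - p(intersection))  # True

# A and C are mutually exclusive, so the probabilities simply add
print(A & C == set(), p(A | C) == p(A) + p(C))    # True True

# Complement rule
print(p(not_A) == 1 - p(A))                       # True

# Product rule holds here only because A and B happen to be independent in S
print(p(intersection) == p(A) * p(B))             # True
```

In this assumed sample space P(A) = 1/2, P(B) = 2/5 and P(A ∩ B) = 1/5, so the product rule holds; with a different sample space or different events it generally would not.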

Conditional Probability
Two events are related by a conditional probability when the occurrence of one event is the condition for the occurrence of the other. The conditional probability is written P(A|B) and is read "the probability of event A given event B". It is defined by
P(A|B) = P(A ∩ B) / P(B), where P(B) > 0
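
A small sketch of the usual definition P(A|B) = P(A ∩ B) / P(B) with P(B) > 0, using one roll of a fair die. Event A (an odd number) comes from the earlier example; event B (a number greater than 3) and the function names are made up for this illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair six-sided die
A = {1, 3, 5}           # an odd number appears
B = {4, 5, 6}           # hypothetical condition: the number is greater than 3


def p(event):
    """Probability of an event, assuming equally likely outcomes in S."""
    return Fraction(len(event), len(S))


def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) is 0."""
    if not b:
        raise ValueError("P(B) must be greater than 0")
    return p(a & b) / p(b)


p_a_given_b = conditional(A, B)
print(p_a_given_b)  # 1/3
```

Knowing that the roll is greater than 3 leaves only the outcomes 4, 5 and 6, of which one is odd, so the probability of an odd number drops from 1/2 to 1/3.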


Sunday, January 4, 2015

CSPro 6.0.1


CSPro, short for the Census and Survey Processing System, is a public domain statistical package developed by the U.S. Census Bureau and ICF International. Serpro S.A. was involved in past development. Funding for development comes primarily from the U.S. Agency for International Development.
The software can be used for entering, editing, tabulating, mapping, and disseminating census and survey data. This package is widely used by statistical agencies in developing countries and major international household survey programmes, such as Multiple Indicator Cluster Surveys (MICS) and Demographic and Health Surveys (DHS). As of version 5, Unicode data entry is supported.
CSPro was designed and implemented through a joint effort by the developers of two software packages that were previously used to process census and survey data on DOS machines: the Integrated Microcomputer Processing System (IMPS) and the Integrated System for Survey Analysis (ISSA). The public domain distribution is binary-only, running exclusively on the Microsoft Windows family of operating systems. The source code has not been released to the public.
A new version of CSPro has been released, 6.0.1. This is a major release, as it also involves the release of the CSEntry Android app. Now data entry applications can be run on Windows, as before, but also on Android phones and tablets. For more information on this version, click on the CSPro 6.0.1 tag to read previous posts highlighting various new features.
Download the new version here, and as always, you can access older versions on the Software page.

What is Statistics?


Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". It deals with all aspects of data including the planning of data collection in terms of the design of surveys and experiments.

When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methodologies are used in data analysis: descriptive statistics, which summarizes data from a sample (for example, in a frequency distribution) using indexes such as the mean or standard deviation, and inferential statistics, which draws conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. To make an inference about unknown quantities, one or more estimators are evaluated using the sample.

Saturday, January 3, 2015

How to Construct a Frequency Distribution Table


A frequency distribution is a table that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency, or count, of the occurrences of values within a particular group or interval; in this way, the table summarizes the distribution of values in the sample.

A frequency distribution in statistics shows us a summarized grouping of data divided into mutually exclusive classes and the number of occurrences in a class. It is a way of showing unorganized data e.g. to show results of an election, income of people for a certain region, sales of a product within a certain period, student loan amounts of graduates, etc. Some of the graphs that can be used with frequency distributions are histograms, line charts, bar charts and pie charts. Frequency distributions are used for both qualitative and quantitative data.


Construction of frequency distributions:

1. Decide on the number of classes. An appropriate number of classes
    may be determined by Sturges' formula:

K = 1 + 3.322 log n
       where:

        K = number of classes
        n = number of observations in the data
        log = base-10 logarithm

2. Calculate the range of the data:

Range = Xmax - Xmin

        where Xmax and Xmin are the largest and smallest observation values

3. Decide on the width (interval) of each class, denoted by I:

I = Range / K


For example:
If the total number of observations is 50, the number of classes (K) would be

K = 1 + 3.322 log n
K = 1 + 3.322 log (50)
K = 1 + 3.322 (1.69897)
K = 1 + 5.644
K = 6.644, which is rounded up to 7
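
The three construction steps can be sketched in Python as below. This is only an illustration under a few assumptions: K is rounded to the nearest whole number, the class width is taken as Range / K without further rounding, the last class includes the maximum value, and the 20 scores are invented for the example. The function names are hypothetical as well.

```python
import math


def sturges_classes(n):
    """Step 1: K = 1 + 3.322 * log10(n), rounded to a whole number."""
    return round(1 + 3.322 * math.log10(n))


def frequency_distribution(data):
    """Return (lower, upper, count) for each equal-width class."""
    k = sturges_classes(len(data))
    lowest = min(data)
    data_range = max(data) - lowest  # Step 2: Range = Xmax - Xmin
    if data_range == 0:
        raise ValueError("all observations are equal; the range is zero")
    width = data_range / k           # Step 3: I = Range / K
    counts = [0] * k
    for x in data:
        # The maximum value would land one class too far, so clamp it
        index = min(int((x - lowest) / width), k - 1)
        counts[index] += 1
    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(k)
    ]


# Hypothetical scores, for illustration only
scores = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
          71, 72, 74, 75, 78, 80, 83, 85, 90, 92]

print(sturges_classes(50))  # 7, matching the worked example
for lower, upper, count in frequency_distribution(scores):
    print(f"{lower:5.1f} - {upper:5.1f} : {count}")
```

For these 20 invented scores the formula gives K = 5 classes of width 8, with counts 3, 5, 6, 3 and 3. In practice the class width is often rounded to a convenient number, which this sketch does not attempt.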