# Frequency Distribution

A tally, or frequency count, records how many people fall into a given category or how many times a characteristic occurs. It is expressed in both absolute terms (the actual number) and relative terms (the percentage).

The example below is typical output from the statistical software package SPSS. It provides the following information by column, from left to right:

• Column 1: the "valid" or useable information obtained as well as how many respondents did not provide the information and the total of the two.
• Column 2: the names of the "values" or choices people had in answering the particular question (in this case "high school or less", "some college/university", "graduated college/university or more"), the total of these values, and the reason why the information is missing. "System" refers to an answer that was not provided. Other choices might be "don’t know", "not applicable" or "can’t remember", which we might wish to discount in our total.
• Column 3: "frequency" refers to the actual number of respondents. As we can see, 95 people had a high school education or less, 263 had some college or university and 790 had graduated college/university or more, for a total of 1148. Another 60 people did not provide an answer, for a total of 1208 respondents in the survey. (This number should almost always be the same for all questions analyzed, although the number of "missing" could vary widely.)
• Column 4: "percent" is the calculation in percentage terms of the relative weight of each of the categories, but taking the full number of 1208 respondents in the survey as the base. Since we are rarely interested in including the ‘missing’ category as part of our analysis, this percentage is rarely used.
• Column 5: "valid percent" is the calculation in percentage terms of the relative weight of each of the "valid" categories only. Hence it uses only those who responded to the question, or 1148, for the calculation. It is this column that is normally used for all data analysis.
• Column 6: when adding the valid percent column together row by row, you get the corresponding "cumulative percentage". In this example, "high school or less" (8.3) plus "some college/university" (22.9) equals the cumulative percentage of 31.2 (8.3 + 22.9). When you add the third category of "graduated college/university or more" (68.8) to the first two, you will reach 100. This column is particularly useful when dealing with many response categories and trying to determine where the median falls.

Highest level of education

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | high school or less | 95 | 7.9 | 8.3 | 8.3 |
| | some college/university | 263 | 21.8 | 22.9 | 31.2 |
| | graduated college/university or more | 790 | 65.4 | 68.8 | 100.0 |
| | Total | 1148 | 95.0 | 100.0 | |
| Missing | System | 60 | 5.0 | | |
| Total | | 1208 | 100.0 | | |
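To make the arithmetic behind these columns concrete, here is a minimal Python sketch that recomputes percent, valid percent and cumulative percent from the frequencies in the table. It is not how SPSS itself works internally; the function names `frequency_table` and `median_category` are hypothetical, chosen only for illustration, and the sketch assumes the categories are listed in their natural order (lowest to highest education).

```python
# Frequencies copied from the SPSS output; "missing" is the System row.
counts = {
    "high school or less": 95,
    "some college/university": 263,
    "graduated college/university or more": 790,
}
missing = 60


def frequency_table(counts, missing):
    """Return one row per valid category with SPSS-style columns (hypothetical helper)."""
    valid_total = sum(counts.values())    # respondents who answered (1148)
    grand_total = valid_total + missing   # everyone in the survey (1208)
    rows = []
    running = 0
    for label, n in counts.items():
        running += n
        rows.append({
            "label": label,
            "frequency": n,
            # Percent uses all respondents, including missing, as the base.
            "percent": round(100 * n / grand_total, 1),
            # Valid percent uses only those who answered as the base.
            "valid_percent": round(100 * n / valid_total, 1),
            # Cumulative percent is a running total over the valid cases.
            "cumulative_percent": round(100 * running / valid_total, 1),
        })
    return rows


def median_category(rows):
    """Return the first category whose cumulative percent reaches 50 (hypothetical helper)."""
    for row in rows:
        if row["cumulative_percent"] >= 50:
            return row["label"]
    return None


table = frequency_table(counts, missing)
for row in table:
    print(row)
print("Median falls in:", median_category(table))
```

The median category is read off the cumulative percent column: it is the first row at which the running total reaches 50 percent of the valid cases.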

There are two common ways of graphically representing this information. The first is a pie chart (Figure 1), which takes the percentage column and represents each category as a slice of the pie sized according to its percentage. Note that any graph should be given a number (e.g. Figure 1), a title (e.g. Highest level of education) and the total number of respondents that participated in the survey (n=1208). Pie charts should be used only to express percentages or proportions, such as market share.

The second is a bar chart (Figure 2). In this case, we used a simple bar graph. Notice also that you are able to eliminate the missing category from the graph and therefore base the analysis on the 1148 respondents who answered this particular question. This is preferable to using a pie chart in the SPSS program, which will not allow you to eliminate the missing cases.
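As a rough illustration of a bar chart based only on valid cases, the following Python sketch draws a plain-text bar for each category, scaled by valid percent. It is a stand-in for a real charting tool, not a reproduction of SPSS output; the function name `text_bar_chart` and the `width` parameter are assumptions made for this example.

```python
# Valid frequencies only; the 60 missing cases are deliberately left out.
valid_counts = {
    "high school or less": 95,
    "some college/university": 263,
    "graduated college/university or more": 790,
}


def text_bar_chart(counts, title, width=40):
    """Return the lines of a simple text bar chart (hypothetical helper).

    Each bar's length is proportional to the category's share of the
    valid total; `width` is the length a 100 percent bar would have.
    """
    total = sum(counts.values())
    lines = [f"{title} (n={total})"]
    for label, n in counts.items():
        share = n / total
        bar = "#" * round(width * share)
        lines.append(f"{label:<38} {bar} {100 * share:.1f}%")
    return lines


chart = text_bar_chart(valid_counts, "Highest level of education")
print("\n".join(chart))
```

Because the base is the 1148 respondents who answered, the percentages printed next to the bars match the valid percent column rather than the percent column.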

Line charts are used when plotting the change in a variable over time. For example, if this same study had been undertaken every two years for the past ten, you might want to present this information graphically by showing the evolution of each education level with a line.