What is used to display the frequencies or proportions of observations in a categorical scale data set?

Frequency Distribution

What is a frequency distribution?

Frequency distributions are visual displays that organise and present frequency counts so that the information can be interpreted more easily.

Frequency distributions can show absolute frequencies or relative frequencies, such as proportions or percentages.


How do we show a frequency distribution?

A frequency distribution of data can be shown in a table or graph. Some common methods of showing frequency distributions include frequency tables, histograms or bar charts.

Frequency Tables

A frequency table is a simple way to display the number of occurrences of a particular value or characteristic.

For example, if we have collected data about height from a sample of 50 children, we could present our findings as:

Height of Children

Height (cm) of children    Absolute frequency    Relative frequency
120 – less than 130        9                     18%
130 – less than 140        10                    20%
140 – less than 150        13                    26%
150 – less than 160        11                    22%
160 – less than 170        7                     14%
Total                      50                    100%


From this frequency table we can quickly identify information such as 7 children (14% of all children) are in the 160 to less than 170 cm height range, and that there are more children with heights in the 140 to less than 150 cm range (26% of all children) than any other height range.
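
As a minimal sketch, the relative frequencies in the table can be reproduced from the absolute frequencies with a few lines of Python. The counts are copied from the table above; the names `height_counts` and `relative_frequencies` are chosen here for illustration only.

```python
# Absolute frequencies copied from the "Height of Children" table.
height_counts = {
    "120 - less than 130": 9,
    "130 - less than 140": 10,
    "140 - less than 150": 13,
    "150 - less than 160": 11,
    "160 - less than 170": 7,
}


def relative_frequencies(counts):
    """Return each count as a percentage of the total number of observations."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


percentages = relative_frequencies(height_counts)
# The range with the highest absolute frequency.
modal_range = max(height_counts, key=height_counts.get)

for label, n in height_counts.items():
    print(f"{label:<22}{n:>4}{percentages[label]:>7.0f}%")
print(f"Most common range: {modal_range}")
```

Dividing by the total rather than hard-coding 50 means the same function would work for a sample of any size.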

Data can also be presented in graphical form.

Frequency Graphs

Histograms and bar charts are both visual displays of frequencies using columns plotted on a graph. The Y-axis (vertical axis) generally represents the frequency count, while the X-axis (horizontal axis) generally represents the variable being measured.

A histogram is a type of graph in which each column represents a range of values of a numeric variable, in particular one that is continuous and/or grouped.

A histogram shows the distribution of all observations in a quantitative dataset. It is useful for describing the shape, centre and spread to better understand the distribution of the dataset.

Features of a histogram:

  • The height of the column shows the frequency for a specific range of values.
  • Columns are usually of equal width; however, a histogram may show data using unequal ranges (intervals) and therefore have columns of unequal width.
  • The values represented by each column must be mutually exclusive and exhaustive. Therefore, there are no spaces between columns and each observation can only ever belong in one column.
  • It is important that there is no ambiguity in the labelling of the intervals on the x-axis for continuous or grouped data (e.g. 0 to less than 10, 10 to less than 20, 20 to less than 30).

    For example:


    The histogram below shows the same information as the frequency table.
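
As a rough sketch of how observations are assigned to histogram columns, the Python below places each height in a 10 cm interval of the form "lower to less than upper", so that every value belongs to exactly one column. The sample heights and the function name `interval_for` are invented for illustration; they are not the data behind the frequency table.

```python
from collections import Counter

BIN_WIDTH = 10  # assumed interval width in cm, matching the 10 cm ranges in the table


def interval_for(height_cm, width=BIN_WIDTH):
    """Return the (lower, upper) interval with lower <= height_cm < upper."""
    lower = int(height_cm // width) * width
    return (lower, lower + width)


# Hypothetical measurements, invented for this example.
sample_heights = [121.5, 129.9, 130.0, 134.2, 140.0, 147.3, 149.9, 150.0, 163.8]

# Count how many observations fall in each interval.
histogram = Counter(interval_for(h) for h in sample_heights)

for (lower, upper), count in sorted(histogram.items()):
    print(f"{lower} to less than {upper}: {'#' * count}")
```

Because the upper bound is excluded, a height of exactly 130.0 cm falls in the 130 to less than 140 column and never in two columns at once.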

    A bar chart is a type of graph in which each column (plotted either vertically or horizontally) represents a category of a categorical variable or a value of a discrete ungrouped numeric variable.

    It is used to compare the frequency (count) for a category or characteristic with another category or characteristic.

    Features of a bar chart:

  • In a bar chart, the bar height (if vertical) or length (if horizontal) shows the frequency for each category or characteristic.
  • The distribution of the dataset is not important because the columns each represent an individual category or characteristic rather than intervals for a continuous measurement. Therefore, gaps are included between each bar and each bar can be arranged in any order without affecting the data.

    For example:


    If data had been collected for 'country of birth' from a sample of children, a bar chart could be used to plot the data as 'country of birth' is a categorical variable.

    Birthplace of Children

    Country of Birth            Absolute frequency    Relative frequency
    Australia                   16                    32%
    Fiji                        3                     6%
    India                       8                     16%
    Italy                       10                    20%
    New Zealand                 9                     18%
    United States of America    4                     8%
    Total                       50                    100%


    The bar chart below shows us that 'Australia' is the most commonly observed country of birth of the 50 children sampled, while 'Fiji' is the least common country of birth.
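
A text-only approximation of such a bar chart can be sketched in Python as follows. The counts are copied from the 'Birthplace of Children' table; the function name `text_bar_chart` and the choice to sort bars from longest to shortest are assumptions made for this example (as noted above, bars can be arranged in any order).

```python
# Absolute frequencies copied from the "Birthplace of Children" table.
birthplace_counts = {
    "Australia": 16,
    "Fiji": 3,
    "India": 8,
    "Italy": 10,
    "New Zealand": 9,
    "United States of America": 4,
}


def text_bar_chart(counts):
    """Build one horizontal bar per category, longest bar first."""
    width = max(len(name) for name in counts)
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{name:<{width}} | {'#' * n} {n}" for name, n in ordered]


most_common = max(birthplace_counts, key=birthplace_counts.get)
least_common = min(birthplace_counts, key=birthplace_counts.get)

print("\n".join(text_bar_chart(birthplace_counts)))
print(f"Most common: {most_common}; least common: {least_common}")
```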


    Data types are an important aspect of statistical analysis and need to be understood in order to apply statistical methods correctly to your data. There are 2 main types of data, namely categorical data and numerical data.

    As an individual who works with categorical data and numerical data, it is important to properly understand the differences and similarities between the two data types. This will make it easy for you to correctly collect, use, and analyze them.

    The importance of understanding the different data types in statistics cannot be overemphasized. Therefore, in this article, we will be studying the two main types of data, including their similarities and differences.

    What is Categorical Data?

    Categorical data is a type of data that can be sorted into groups or categories with the aid of names or labels. This grouping is usually made according to the characteristics of the data and the similarities between these characteristics, through a method known as matching.

    Also known as qualitative data, each element of a categorical dataset can be placed in only one category according to its qualities, where each of the categories is mutually exclusive. For example, gender is a categorical variable because it can be categorized into male and female according to some unique qualities possessed by each gender.

    There are 2 main types of categorical data, namely nominal data and ordinal data.

    Nominal data is the type of categorical data that names or labels. Sometimes called naming data, it has characteristics similar to those of a noun.

    E.g. name of a person, gender, school graduated from, etc.

    Ordinal data is the type of categorical data that includes elements that are ranked, ordered or have a rating scale attached. One can count and order ordinal data, but it cannot be measured.

    For example, suppose a group of customers were asked to taste the varieties of a restaurant’s new menu on a rating scale of 1 to 5—with each level on the rating scale representing strongly dislike, dislike, neutral, like, strongly like. In this case, a rating of 5 indicates more enjoyment than a rating of 4, making such data ordinal.
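
The ordering in this rating scale can be sketched in Python as below. The labels come from the example above; the list of responses and the helper name `rank` are made up for illustration.

```python
# Rating scale from the restaurant example: position in the list gives the rank.
RATING_SCALE = ["strongly dislike", "dislike", "neutral", "like", "strongly like"]


def rank(label):
    """Return the 1-to-5 rating that corresponds to a label."""
    return RATING_SCALE.index(label) + 1


# Hypothetical responses, invented for this example.
responses = ["like", "neutral", "strongly like", "dislike", "like"]

# Ordinal data can be sorted by rank, from least to most enjoyment.
ordered = sorted(responses, key=rank)

# Ordinal data can also be compared: a rating of 5 ranks above a rating of 4.
five_beats_four = rank("strongly like") > rank("like")

print(ordered)
print(f"'strongly like' ranks above 'like': {five_beats_four}")
```

Note that the ranks only express order. Nothing in the scale says that the step from 'like' to 'strongly like' is the same size as the step from 'neutral' to 'like'.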

    What is Numerical Data?

    Numerical data is a type of data that is expressed in terms of numbers rather than natural language descriptions. As its name suggests, it can only be collected in number form. Also known as quantitative data, numerical data can be used as a form of measurement, such as a person’s height, weight, IQ, etc.

    It can also be used to carry out arithmetic operations like addition, subtraction, multiplication, and division.

    There are 2 types of numerical data, namely discrete data and continuous data.

    Discrete data is a type of numerical data with countable elements, i.e. its values have a one-to-one mapping with natural numbers. Discrete data can either be countably finite or countably infinite. Some general examples of discrete data are age in completed years, number of students in a class, number of candidates in an election, etc.

    A countably finite data set can be counted from the beginning to the end, while a countably infinite data set cannot be completely counted because it tends to infinity.

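    For example, the bags of rice in a store are countably finite. The grains of rice in a bag are, strictly speaking, also finite, although far too numerous to count in practice; the natural numbers themselves are an example of a countably infinite set.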

    Continuous data is a numerical data type with uncountable elements. Its values are represented as a set of intervals on a real number line. Some examples of continuous data are student CGPA, height, etc.

    Similar to discrete data, the range of continuous data can be either bounded or unbounded. A bounded range, such as the values between 0 and 1, has an end, while an unbounded range tends to infinity.

    Continuous data can be further divided into interval data and ratio data.

    Interval data: This is when numbers have units that are of equal magnitude as well as rank order on a scale without an absolute zero. Scales of this type can have an arbitrarily assigned “zero”, but it will not correspond to an absence of the measured variable. An example is temperature on the Fahrenheit scale.

    Ratio data: This is when numbers have units that are of equal magnitude as well as rank order on a scale with an absolute zero. An example is blood pressure.
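
The practical difference between the two scales can be illustrated with a short Python sketch. The two temperatures are arbitrary values chosen for this example, and the kelvin scale is brought in here only as a familiar scale with an absolute zero; it is not mentioned in the text above.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to kelvin (ratio scale)."""
    return (deg_f - 32) * 5 / 9 + 273.15


# Arbitrary example temperatures in degrees Fahrenheit.
cool_f, warm_f = 40.0, 80.0

# Differences are meaningful on an interval scale.
difference_f = warm_f - cool_f

# Ratios are not: 80 / 40 is 2, but the zero point of Fahrenheit is arbitrary,
# so this does not mean "twice as hot".
naive_ratio = warm_f / cool_f

# The kelvin scale has an absolute zero, so a ratio of kelvin values is meaningful.
true_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"Difference: {difference_f:.0f} degrees Fahrenheit")
print(f"Naive ratio: {naive_ratio:.2f}, ratio in kelvin: {true_ratio:.2f}")
```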

    15 Key Differences Between Categorical & Numerical Data 

    Definitions

    Categorical data is a type of data that is used to group information with similar characteristics, while numerical data is a type of data that expresses information in the form of numbers. Numerical data combines numeric values to depict relevant information, while categorical data uses a descriptive approach to express information.

    We can see that the 2 definitions above are different. Therefore, categorical data and numerical data do not mean the same thing.

    Other Names

    Categorical data is also called qualitative data, while numerical data is also called quantitative data. This is because categorical data is used to qualify information before classifying it according to its similarities.

    Numerical data is used to express quantitative values and also supports arithmetic operations, which is a quantitative characteristic.

    Both numerical and categorical data have other names that depict their meaning. The names are, however, different from each other.

    Examples

    Categorical data examples include personal biodata information such as full name, gender, phone number, etc. Numerical data examples include a CGPA calculator, an interval scale, etc.

    The examples below illustrate categorical data and numerical data respectively.

    1. What is your hair colour?
    • Blonde
    • Brunette
    • Brown
    • Black
    • Red


    2. A CGPA calculator that asks students to input their grades in each course, and the number of units, to output their CGPA.

    In example 1 above, the categorical data to be collected is nominal and is collected using a question with a fixed list of options. Example 2 is a numerical data type.
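
To make the CGPA example concrete, here is a minimal Python sketch of such a calculator. It assumes that CGPA is the unit-weighted average of grade points and uses a hypothetical 5-point scale; the mapping `GRADE_POINTS`, the function name `cgpa` and the sample courses are all invented for illustration, and real institutions use different scales.

```python
# Hypothetical grade-point scale; real institutions use different scales.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}


def cgpa(courses):
    """Return the unit-weighted average grade point.

    `courses` is a list of (grade, units) pairs.
    """
    total_units = sum(units for _, units in courses)
    if total_units == 0:
        raise ValueError("at least one course with non-zero units is required")
    weighted = sum(GRADE_POINTS[grade] * units for grade, units in courses)
    return weighted / total_units


# Invented inputs: three courses with their grades and units.
result = cgpa([("A", 3), ("B", 2), ("C", 3)])
print(f"CGPA: {result:.2f}")
```

The grades entered are categorical labels, but once mapped to grade points they become numerical data on which multiplication, addition and division are meaningful.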


    Types

    Categorical data is divided into two types, namely nominal and ordinal data, while numerical data is categorised into discrete and continuous data. Continuous data is further divided into interval data and ratio data.

    Although each has 2 types, these types are not the same.

    Data Characteristics

    The characteristics of categorical data include: lack of a standardized order scale, natural language description, the ability to take numeric values that have qualitative properties, and visualization using bar charts and pie charts.

    Numerical data, on the other hand, has a standardized order scale and a numerical description, takes numeric values with numerical properties, and is visualized using bar charts, pie charts, scatter plots, etc.

    User-centred Design

    The numerical data collection method is more user-centred than that of categorical data. Most respondents do not want to spend a lot of time filling out forms or surveys, which is why questionnaires used to collect numerical data have a lower abandonment rate compared to those used for categorical data.

    This is because categorical data is mostly collected using open-ended questions, which take longer to answer.

    Data Collection Methods

    Categorical data can be collected through different methods, which may differ between the types of categorical data. For instance, nominal data is mostly collected using open-ended questions while ordinal data is mostly collected using multiple-choice questions.

    Numerical data, on the other hand, is mostly collected through multiple-choice questions. However, it is collected using open-ended questions whenever there is a need for calculation.

    Data Collection Tools

    Data collectors and researchers collect numerical data using questionnaires, surveys, interviews, focus groups and observations. Categorical data is collected using questionnaires, surveys, and interviews. 

    Data collection is usually straightforward with categorical data and hence, unlike numerical data, does not require technical tools. For example, numerical data of a participant’s score in different sections of an IQ test may be required to calculate the participant’s IQ.

    When collected using online forms, this may require some technical additions to the form, unlike categorical data which is simple.

    Analysis & Interpretation

    There are 2 methods of performing numerical data analysis, namely descriptive and inferential statistics. Some examples of these 2 methods include measures of central tendency, turf analysis, text analysis, conjoint analysis, trend analysis, etc.

    There are also 2 methods of analyzing categorical data, namely median and mode. In some cases, ordinal data is analyzed using univariate statistics, bivariate statistics, regression analysis, etc., which are used as alternatives to calculating the mean and standard deviation.
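
As a small sketch, the median and mode can be computed with Python's standard `statistics` module. The ratings and hair colours below are hypothetical values invented for this example.

```python
import statistics

# Hypothetical 1-to-5 ratings (ordinal data), invented for this example.
ratings = [4, 5, 3, 4, 2, 4, 5, 1]

# Hypothetical hair colours (nominal data), invented for this example.
hair_colours = ["Black", "Brown", "Black", "Red", "Blonde", "Black"]

# The mode is the most frequent category; it works for nominal and ordinal data.
rating_mode = statistics.mode(ratings)
colour_mode = statistics.mode(hair_colours)

# median_low always returns a value that occurs in the data, which suits ordinal
# data where an in-between value such as 3.5 is not a real category.
rating_median = statistics.median_low(ratings)

print(f"Rating mode: {rating_mode}, rating median: {rating_median}")
print(f"Most common hair colour: {colour_mode}")
```

The median is only defined when the categories can be ordered, so it applies to the ratings but not to the hair colours.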

    Uses

    Numerical data is mostly used for calculation problems in statistics due to its ability to perform arithmetic operations. For example, when designing a CGPA calculator, one may need to include commands that allow for the addition, subtraction, division, and multiplication. 

    Categorical data, on the other hand, is mostly used for performing research that requires the use of respondents’ personal information, opinions, etc. It is commonly used in business research.

    Advantage

    Numerical data is compatible with most statistical analysis methods, which makes it the most used data type among researchers. Categorical data, on the other hand, does not support most statistical analysis methods.

    There are alternatives to some of the statistical analysis methods not supported by categorical data. However, they cannot give results that are as accurate as the original methods.

    Disadvantage

    Numerical data analysis is mostly performed in a standardized or controlled environment, which may hinder a proper investigation. This is because natural factors that may influence the results have been eliminated, causing the results not to be completely accurate. 

    Numerical data collection is also strictly based on the researcher’s point of view, limiting the respondent’s influence on the result. This is not the case with categorical data. 

    Nominal data captures human emotions to an extent through open-ended questions. However, the setback with this is that the researcher may sometimes have to deal with irrelevant data.


    Compatibility

    Numerical data is compatible with most statistical methods of data analysis, but categorical data is incompatible with the majority of these methods. This hinders some kinds of research when dealing with categorical data.

    This is one more reason why most researchers prefer to use numerical data.

    Visualization

    Categorical data can be visualized using only a bar chart or a pie chart. The bar chart is used when measuring frequency (or mode), while the pie chart is used when dealing with percentages. Numerical data, on the other hand, can be visualized not only using bar charts and pie charts but also using scatter plots.

    Structure

    Categorical data can be considered as unstructured or semi-structured data. It is loosely formatted with very little to no structure, and as such cannot be collected and analyzed using conventional methods.

    Although there are some methods of structuring categorical data, it is still quite difficult to make proper sense of it. One such method has to do with indexing, which is what search engines like Google, Bing, and Yahoo use.

    Numerical data, on the other hand, is considered as structured data. It is formatted in such a way that it can be quickly organized and searched within relational databases, e.g. numbers and values found in spreadsheets.

    Similarities Between Categorical & Numerical Data 

    Although proven to be more inclined to categorical data, ordinal data can be classified as both categorical and numerical data. In some texts, ordinal data is defined as an intersection between numerical data and categorical data and is therefore classified as both. 

    Numerical and categorical data can both be used for research and statistical analysis. They may be used through different approaches, but can lead to the same result.

    Researchers sometimes explore both categorical and numerical data when investigating to explore different paths to a solution. For example, an organization may decide to investigate which type of data collection method will help to reduce the abandonment rate by exploring the 2 methods.

    Hence, the organization may ask these 2 questions to investigate the response rate. 

    Question 1:

    What do you think about our product?  ____

    Question 2:

    Rate our product on a scale of 1 to 5.

    • 1
    • 2
    • 3
    • 4
    • 5

    Numerical Value

    Both numerical and categorical data can take numerical values. Categorical data can take values like identification number, postal code, phone number, etc. The only difference is that arithmetic operations cannot be performed on the values taken by categorical data.
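
The point about arithmetic can be sketched in Python as follows. The postal codes are hypothetical values invented for this example; the idea is simply that codes made of digits behave like labels, not like quantities.

```python
from collections import Counter

# Hypothetical postal codes, invented for this example.
postal_codes = ["02134", "90210", "02134"]

# Treating a code as a number silently drops the leading zero.
as_number = int(postal_codes[0])

# Arithmetic on codes runs without error but produces a value with no meaning.
meaningless_sum = int(postal_codes[0]) + int(postal_codes[1])

# Keeping codes as strings preserves them, and counting by category still works.
code_counts = Counter(postal_codes)

print(f"As a number: {as_number} (original label: {postal_codes[0]})")
print(f"Sum of two codes: {meaningless_sum}")
print(f"Occurrences of each code: {dict(code_counts)}")
```

Storing such values as text is one simple way to keep them from being averaged or summed by accident.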

    Numerical and categorical data can both be collected through surveys, questionnaires, and interviews. 

    What Is The Best Tool For Collecting Numerical & Categorical Data? 

    It is not enough to understand the difference between numerical and categorical data in order to perform better statistical analysis. You also need to use Formplus, the best tool for collecting numerical and categorical data, to get better results.

    Formplus contains 30+ form fields that allow you to ask your respondents different types of questions. You also have access to the form analytics feature that shows you the form abandonment rate, the number of people who viewed your form and the devices they viewed it from.

    This makes it possible for you to track where your data comes from and ask better questions to get better response rates. It doesn’t matter whether the data is being collected for business or research purposes; Formplus will help you collect better data.

    Why Use Formplus to Collect Numerical and Categorical Data? 

    Work with real data & analytics that will help you reduce form abandonment rates. With Formplus, you can analyze respondents’ data, learn from their behaviour and improve your form conversion rate.

    The form analytics feature leaves no room for guesswork. That is, you strictly work with real data: you know the number of people who fill out your form, where they’re from, and what devices they’re using.

    Reduce form abandonment rates with visually appealing forms. The best part is that you don’t have to know how to write code or be a graphic designer to create beautiful forms with Formplus.

    There is also a pool of customizable form templates for you to choose from. You can easily edit these templates as you please.

    Respondents in remote locations or places without a reliable internet connection can fill out forms while offline. The data will be automatically synced once there is an internet connection.

    You can also use conversational SMS to fill forms, without needing internet access at all. This also helps to reduce abandonment rates and increase audience reach since it allows people without internet access to respond.

    Store your online forms, data and all files in the unlimited cloud storage provided by Formplus. That way, your data is not only kept safe and secure, but you can also easily access it anywhere and from any device.

    If you don’t want to use the Formplus storage, you can also choose another cloud storage. Formplus currently supports Google Drive, Microsoft OneDrive and Dropbox integrations.

    Allow respondents to save partially filled forms and continue at a later time with the Save & Resume feature from Formplus. Respondents can choose to save the form and send the link to their email and continue from where they stopped later.

    This is a great way to avoid form abandonment or the filling of incorrect data when respondents do not have an immediate answer to the questions.

    Conclusion

    Statistical analysis may be performed using categorical or numerical methods, depending on the kind of research that is being carried out. A researcher may choose to approach a problem by collecting numerical data and another by collecting categorical data, or even both in some cases.

    During the data collection phase, the researcher may collect both numerical and categorical data in order to explore different perspectives. However, one needs to understand the differences between these two data types to use them properly in research.

    This is one more reason why it is important to understand the different data types.