Data Analysis Walkthrough

Overview

Data analysis is about extracting meaningful information from your dataset using an appropriate tool. Displaying the data in graphical form is often a simple way to do this. However, ecological systems often contain a great deal of stochasticity (randomness, or noise), so ecologists commonly employ tools that specialize in separating meaning from noise: statistics!


The Ecoplexity website contains various tools for exploring, analyzing, and graphing your datasets. A common approach is to start by generating summary statistics and basic plots for each dataset. This helps you identify potential trends and anomalies in the data, and determine whether parametric or non-parametric statistics will be most appropriate. You can then carry out the formal analyses.


Tips

  • The best time to determine what type of data analysis to use is when you are planning your study. This way you can be sure to gather enough appropriate data, and avoid gathering unnecessary data.

  • One thing to identify is the type of data each of your variables contains. Common types are:

    1) Count data (e.g., number of individuals),

    2) Categorical data (e.g., forest vs. edge vs. meadow habitat), and

    3) Continuous data (e.g., inches of rain).

    Note that the same variable may be treated as a different type depending on the question you are asking. For example, dates might be categorical when comparing the mean temperature of a stream at different times of the year, but continuous when determining whether day of the year is correlated with mean stream temperature.


     


  • Once you have collected data, it can be helpful to get to know the structure of the dataset before running formal statistical tests.

    For continuous data, the most common approaches are to:

    1) Examine summary statistics, and

    2) Generate graphs of the distribution of each variable.


    Summary statistics (like the mean and range) are especially useful for getting an initial sense of how samples from two groups (levels of a categorical variable) compare. They can also help you identify erroneous values in your dataset. Because many parametric tests assume that data are "normally distributed" (follow a bell curve), graphs like histograms, box plots, and Q-Q (quantile-quantile) plots are very useful for deciding which test to perform.
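
As a rough illustration of this first look at a dataset, the Python sketch below computes summary statistics for two groups and a simple numerical stand-in for a Q-Q plot: the correlation between the sorted values and the quantiles expected under a normal distribution. The temperature values, the habitat names, and the helper functions summarize, qq_points, and qq_correlation are all invented for this example; they are not part of the Ecoplexity tools, and the plotting positions used are only one of several common conventions.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev


def summarize(values):
    """Return basic summary statistics for one sample."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def qq_points(values):
    """Pair each sorted observation with a standard normal quantile."""
    n = len(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values near 1 suggest a straight line."""
    xs, ys = zip(*qq_points(values))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical stream temperatures (degrees C) for two habitats.
# The 99.0 in the meadow sample mimics a data-entry error.
forest = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 11.9, 12.2]
meadow = [14.2, 15.1, 14.8, 99.0, 15.4, 14.6, 15.0, 14.9]

for name, sample in [("forest", forest), ("meadow", meadow)]:
    stats = summarize(sample)
    print(name, {key: round(value, 2) for key, value in stats.items()})
    print(name, "Q-Q correlation:", round(qq_correlation(sample), 3))
```

In the meadow sample, the maximum of 99.0 stands out against the other values, and the Q-Q correlation falls well below that of the forest sample. A single number like this is only a rough guide, so it is still worth looking at the actual histogram or Q-Q plot before choosing a test.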

Additional Resources: