Dr. John

Java applications for Bootstrapping and Resampling

Students learning statistics are often confused by topics like degrees of freedom (why do t-tests use N-1 and correlations N-2?), one-tailed vs. two-tailed comparisons, and paired vs. independent designs. The tables in the back of statistics books often provide only critical t-values for various degrees of freedom. Hidden behind this approach are two problems. First, with the small samples students typically collect, the distribution of means is often far from normal. Second, the rather tortured logic of null hypothesis testing masks the real question of interest: how variable would I expect my results to be if I repeated my experiment? This forces students into a cookbook approach that leaves precious little understanding of the principles underlying the tests, such as the variability of sampled means, or where the p-value comes from in the first place.

To solve these problems, we created three Java applications that use resampling to perform the functions of t, F, and correlation tests. Unlike those tests, however, they make no assumptions about the shape of the population distribution, and they make the link between samples and populations much clearer to students. These applications would suit any methods or statistics course in which the emphasis is on interpreting results rather than computing values.

You can download each application from the links below, or see descriptions and movies further down this page. All programs were written by Melissa Troyer, based on ideas and input from Rick Hullinger, under the direction of Tom Busey.

To download Java applications (double-clickable .jar files), select from the list below:

These are Java applications, and if necessary you can get Java at:

You can also run these Java applets inside your browser:

A primer, written by Rick Hullinger, describes the mathematical basis of resampling, and can be found here:

Within- and Between-Subjects Means Comparison

Often the first experiment a student conducts involves a control condition and an experimental condition. The figure below illustrates how a student would conduct a within-subject (paired) comparison. The data for the two conditions (here labeled Fast Task and Slow Task) are entered in columns, and each row contains the data for one subject. The top graph shows the distribution of differences. The bottom graph shows the distribution of resampled means for 10,000 resampled experiments. The gray region is the 95% confidence interval plotted around the obtained mean, and since zero falls outside this confidence interval, we are reasonably confident that the two conditions differ. The program also computes an exact probability: the proportion of resampled means that fell on the other side of zero from the majority of the data. Since this value is less than 0.025, we would report a statistically significant difference at the 0.05 level (two-tailed). See below the figure for more information about resampling.

Within and between sampling application
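The paired procedure described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the code inside the downloadable applications: the class `PairedBootstrap`, its methods, the fixed random seed, and the example scores are all hypothetical. It assumes the 95% interval is taken from the 2.5th and 97.5th percentiles of the resampled means (the simple percentile method); the application may construct its interval differently. Because each resampled subject brings both scores along, drawing rows is equivalent to drawing that subject's difference score, which is what the code does.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of a within-subjects (paired) bootstrap; not the applications' source code.
class PairedBootstrap {

    // Mean of an array of scores.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Resamples subjects with replacement and returns the sorted mean difference
    // (condB - condA) for each simulated experiment. Each subject keeps both scores,
    // so drawing a row is the same as drawing that subject's difference score.
    static double[] resampledMeanDiffs(double[] condA, double[] condB, int nResamples, long seed) {
        int n = condA.length;
        double[] diffs = new double[n];
        for (int i = 0; i < n; i++) diffs[i] = condB[i] - condA[i];

        Random rng = new Random(seed);
        double[] means = new double[nResamples];
        for (int r = 0; r < nResamples; r++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += diffs[rng.nextInt(n)]; // same sample size as the original experiment
            }
            means[r] = sum / n;
        }
        Arrays.sort(means);
        return means;
    }

    // Nearest-rank percentile of an already sorted array (p between 0 and 1).
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        idx = Math.max(0, Math.min(sorted.length - 1, idx));
        return sorted[idx];
    }

    // Proportion of resampled means at or beyond zero, on the side opposite the observed mean.
    static double proportionBeyondZero(double[] resampled, double observed) {
        int count = 0;
        for (double m : resampled) {
            boolean otherSide = observed >= 0 ? m <= 0 : m >= 0;
            if (otherSide) count++;
        }
        return (double) count / resampled.length;
    }

    public static void main(String[] args) {
        // Hypothetical response times (ms); index i is the same subject in both arrays.
        double[] fast = {512, 480, 530, 495, 560, 505, 470, 520};
        double[] slow = {540, 500, 528, 530, 590, 515, 495, 548};

        double[] diffs = new double[fast.length];
        for (int i = 0; i < fast.length; i++) diffs[i] = slow[i] - fast[i];
        double observed = mean(diffs);

        double[] boot = resampledMeanDiffs(fast, slow, 10_000, 42L);
        System.out.printf("observed mean difference = %.2f%n", observed);
        System.out.printf("95%% CI = [%.2f, %.2f]%n",
                percentile(boot, 0.025), percentile(boot, 0.975));
        System.out.printf("proportion beyond zero = %.4f%n", proportionBeyondZero(boot, observed));
    }
}
```

With a two-tailed criterion of 0.05, a proportion below 0.025 would be reported as significant, matching the rule described above.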

Resampling rests on the assumption that your sample was randomly selected from some population of interest. Because each subject stands in for a number of potential participants in the entire population, we can resample from the sample, treating it as a proxy for the entire population. We do this with replacement, which means that a given subject might enter our simulated experiment several times, or not at all. We resample as many subjects as were in the original experiment and compute the summary score of interest (such as the difference between two means). We repeat this a large number of times (say, 10,000) and plot a histogram of the resulting scores, which shows how much the summary score would vary if the experiment were repeated.
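For a between-subjects (independent groups) comparison, the procedure just described can be sketched as follows. The page does not spell out how the application resamples independent groups, so this sketch assumes one common scheme: each group is resampled separately, with replacement, at its original size, and the summary score is the difference between the two group means. The class name `BetweenSubjectsBootstrap`, the crude text histogram, the seed, and the example data are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of a between-subjects bootstrap; not the applications' source code.
class BetweenSubjectsBootstrap {

    // Draws xs.length scores from xs with replacement and returns their mean.
    static double resampleMean(double[] xs, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) sum += xs[rng.nextInt(xs.length)];
        return sum / xs.length;
    }

    // Difference between group means (groupB - groupA) for each simulated experiment.
    // Each group is resampled separately, so group sizes stay as in the original data.
    static double[] resampledDiffs(double[] groupA, double[] groupB, int nResamples, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[nResamples];
        for (int r = 0; r < nResamples; r++) {
            out[r] = resampleMean(groupB, rng) - resampleMean(groupA, rng);
        }
        return out;
    }

    // Counts scores in equal-width bins between the smallest and largest score.
    static int[] histogram(double[] scores, int bins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double s : scores) {
            int b = width == 0 ? 0 : (int) ((s - min) / width);
            counts[Math.min(b, bins - 1)]++; // the maximum score falls in the last bin
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical scores for two independent groups of subjects.
        double[] control = {12, 15, 11, 14, 13, 16, 12};
        double[] treatment = {17, 14, 19, 16, 18, 15, 20, 17};

        double[] diffs = resampledDiffs(control, treatment, 10_000, 7L);
        int[] counts = histogram(diffs, 20);
        for (int c : counts) {
            System.out.println("*".repeat(c / 50)); // one star per 50 resampled experiments
        }
    }
}
```

Resampling each group on its own keeps the group sizes fixed, which mirrors the rule of resampling as many subjects as were in the original experiment.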

The movie below explains how this works (click the play button in the lower-left corner):

 

More information about the basics of resampling is contained in a movie at the end of this page.

Correlational Data

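Similar procedures can be applied to correlational data, except that the resampling is done on the correlation coefficient (or on the slope; for testing whether the relationship differs from zero it doesn't matter which, because the slope equals the correlation multiplied by the ratio of the standard deviations of y and x, so the two always have the same sign):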

Correlation bootstrapping
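A correlational version of the same idea is sketched below. The key difference from the means comparison is that each subject's (x, y) pair is drawn together, so the relationship within each subject is preserved; this pairs-resampling scheme is an assumption, since the page does not describe the application's internals. The class `CorrelationBootstrap`, the data, the seed, and the choice to redraw any resample in which one variable has no variance (its correlation is undefined) are hypothetical. The `slope` method illustrates the point about slopes: the least-squares slope equals r multiplied by s_y/s_x, so it always has the same sign as r.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of bootstrapping a correlation; not the applications' source code.
class CorrelationBootstrap {

    // Pearson correlation of paired scores x[i], y[i]; NaN if either variable has zero variance.
    static double pearson(double[] x, double[] y) {
        double[] s = sums(x, y);
        return s[0] / Math.sqrt(s[1] * s[2]);
    }

    // Least-squares slope of y on x; equals pearson(x, y) * (s_y / s_x).
    static double slope(double[] x, double[] y) {
        double[] s = sums(x, y);
        return s[0] / s[1];
    }

    // Returns {sum dx*dy, sum dx*dx, sum dy*dy}, using deviations from the means.
    private static double[] sums(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return new double[] {sxy, sxx, syy};
    }

    // Resamples subjects with replacement, keeping each subject's (x, y) pair together,
    // and returns the sorted correlations. Assumes neither original variable is constant.
    static double[] resampledCorrelations(double[] x, double[] y, int nResamples, long seed) {
        int n = x.length;
        Random rng = new Random(seed);
        double[] rs = new double[nResamples];
        double[] bx = new double[n], by = new double[n];
        int r = 0;
        while (r < nResamples) {
            for (int i = 0; i < n; i++) {
                int j = rng.nextInt(n);
                bx[i] = x[j];
                by[i] = y[j];
            }
            double rr = pearson(bx, by);
            if (!Double.isNaN(rr)) rs[r++] = rr; // redraw degenerate resamples
        }
        Arrays.sort(rs);
        return rs;
    }

    public static void main(String[] args) {
        // Hypothetical paired scores for ten subjects.
        double[] hours = {2, 4, 1, 5, 3, 6, 2, 7, 4, 5};
        double[] score = {61, 70, 58, 74, 65, 80, 66, 83, 69, 72};

        double[] rs = resampledCorrelations(hours, score, 10_000, 3L);
        int n = rs.length;
        System.out.printf("observed r = %.3f%n", pearson(hours, score));
        System.out.printf("95%% percentile interval = [%.3f, %.3f]%n",
                rs[(int) Math.ceil(0.025 * n) - 1], rs[(int) Math.ceil(0.975 * n) - 1]);
    }
}
```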

A brief movie explaining resampling with correlations is below (click the play button in the lower-left corner):

Factorial Designs

More recently we've developed a version that performs resampling on 2x2 designs to look at interactions. The graph below illustrates an interaction; to test it, we resample the difference of differences. The program handles all possible combinations of within- and between-subjects variables (here gender is between subjects and Task is within subjects). The only difference between these combinations is how the subjects are resampled.

interactions
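The difference-of-differences statistic for a mixed design like the one described above (gender between subjects, Task within subjects) can be sketched as follows. The page says only that the design combinations differ in how subjects are resampled; this sketch assumes that, for a mixed design, subjects are resampled with replacement separately within each gender group and that each resampled subject brings both Task scores along. Because the within-subject Task effect is computed per subject first, resampling those effect scores is equivalent. The class `MixedInteractionBootstrap`, the group labels, the seed, and the data are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of bootstrapping a 2x2 mixed-design interaction; not the applications' source code.
class MixedInteractionBootstrap {

    // Within-subject Task effect for each subject: task 2 score minus task 1 score.
    static double[] taskEffects(double[][] subjects) {
        double[] e = new double[subjects.length];
        for (int i = 0; i < subjects.length; i++) e[i] = subjects[i][1] - subjects[i][0];
        return e;
    }

    // Mean of an array of scores.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Difference of differences: Task effect in group 2 minus Task effect in group 1.
    static double interaction(double[] effects1, double[] effects2) {
        return mean(effects2) - mean(effects1);
    }

    // Draws xs.length values from xs with replacement.
    static double[] resample(double[] xs, Random rng) {
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) out[i] = xs[rng.nextInt(xs.length)];
        return out;
    }

    // Resamples subjects separately within each between-subjects group and returns the
    // sorted interaction scores. Drawing a subject's effect score is equivalent to drawing
    // the subject with both Task scores kept together.
    static double[] resampledInteractions(double[][] group1, double[][] group2, int nResamples, long seed) {
        double[] e1 = taskEffects(group1), e2 = taskEffects(group2);
        Random rng = new Random(seed);
        double[] out = new double[nResamples];
        for (int r = 0; r < nResamples; r++) {
            out[r] = interaction(resample(e1, rng), resample(e2, rng));
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical {task 1, task 2} scores; rows are subjects within each gender group.
        double[][] women = {{510, 540}, {495, 520}, {530, 555}, {480, 515}, {505, 530}};
        double[][] men = {{500, 560}, {520, 575}, {490, 545}, {515, 580}, {505, 550}, {498, 562}};

        double observed = interaction(taskEffects(women), taskEffects(men));
        double[] boot = resampledInteractions(women, men, 10_000, 9L);
        int n = boot.length;
        System.out.printf("observed interaction = %.2f%n", observed);
        System.out.printf("95%% percentile interval = [%.2f, %.2f]%n",
                boot[(int) Math.ceil(0.025 * n) - 1], boot[(int) Math.ceil(0.975 * n) - 1]);
    }
}
```

For a fully within-subjects or fully between-subjects 2x2 design, only the `resampledInteractions` step would change, in line with the point that the combinations differ only in how subjects are resampled.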

Note that this version is still under development, so please report problems that might occur.

Summary

We have used these programs with over 1000 students in a second-year undergraduate methods course at Indiana University. We find that these tools obviate the need for more mathematical discussions of statistics in a research methods course, and provide all of the statistical tools the course requires.

For More Information

The basics of resampling are covered in this movie (click the play button in the lower-left corner):

For more information about these applications or to make suggestions for future development, contact Tom Busey at busey@indiana.edu.