23-APR-19: Created to outline the specifications for SP 19 final projects
DAT-102 Final project specifications
Acquire or deepen sample-based data inquiry skills through hands-on practice, including study design, implementation, sharing, and documentation
Formulate a compelling inquiry question about a unit of analysis that interests you and that CCAC students can study safely and easily (such as a Vehicle, a Student, or a Building). Design a unit sampling methodology and rationale. Implement your methodology, digitize the data, analyze the results, draw preliminary conclusions, and document your work thoroughly for future students.
Library book example
Final project idea generation activity
We'll use library books as a sample final project to illustrate each step of the data project life cycle. Use this in-class activity to formulate a study of your choosing.
- Generate a classification tree for Library books: call numbers delimit sub-populations
- Create an inquiry question about sub-populations
- Create data gathering spreadsheet and prepare data dictionary
- Devise sampling method to randomly sample from the sub-population
- Gather sample data
- Perform Analysis
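The sampling step in the activity above can be sketched in a few lines of Python. This is only one possible approach, assuming you already have a complete list of the call numbers in your population; the function name `sample_call_numbers`, the shelf list, and the seed value are all hypothetical and made up for illustration.

```python
import random

def sample_call_numbers(call_numbers, prefix, k, seed=None):
    """Draw k items at random, without replacement, from one sub-population."""
    # Sub-population: every call number that starts with the given prefix.
    subpopulation = [c for c in call_numbers if c.startswith(prefix)]
    if k > len(subpopulation):
        raise ValueError("sample size exceeds sub-population size")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(subpopulation, k)

# Hypothetical shelf list; real call numbers would come from the catalog.
shelf = ["CR 101", "CR 205", "CR 310", "QA 76", "QA 276", "CR 412", "PS 3501"]
picked = sample_call_numbers(shelf, "CR", 2, seed=19)
print(picked)
```

Recording the seed in your documentation lets a future student reproduce exactly the same draw.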
Final project requirements
The following are the base-level requirements for the Spring 2019 final project in DAT-102. All of these specs are negotiable through discussion with the instructor. Customization and creative modification of all requirements in the spirit of experimentation and exploration is strongly encouraged.
- Define a single unit of analysis, such as a Book or a CCAC Student or Colleague, or Bicycle, or Road.
- Formulate a compelling inquiry question (or small set of questions) related to this identified unit of analysis and compose a brief (1 paragraph) rationale for how you arrived at this question. Consider your own background, interests, hobbies, career/job, values, and curiosities.
- Generate a classification tree whose root is the complete population of all instances of your unit of analysis (e.g. all books cataloged by the Library of Congress) and whose sub-branches depict sub-populations of your unit of analysis (e.g. books with call numbers starting with CR (heraldry)).
- Formulate your tree iteratively over several drafts. Include all drafts, no matter how rough, with your final project documentation.
- At the conclusion of your project, create a final copy of your classification tree, either digitally or neatly by hand (and scanned). draw.io is a Google Drive-based tool that's excellent for making diagrams of all kinds.
- Include at least 5 sub-categories of your unit of analysis
Create data instruments and procedures
- Create a spreadsheet to serve as a data gathering tool. Each row should capture data about a single instance of your unit of analysis. One or more columns should be dedicated to each of your chosen variables.
- Add a data dictionary in a second tab of that spreadsheet. It should include the following:
- The name of each variable
- Data type (integer, text, date, count, etc.)
- Possible values or range of values, especially if coded (e.g. M = Male, F = Female, O = Other)
- Tips for accurate measurement
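If it helps to see a data dictionary in a more concrete form, the sketch below expresses one as a Python dictionary and uses it to check a row of gathered data. The variable names, the allowed codes, the page-count range, and the helper `validate_row` are all hypothetical, chosen only to match the library book example; your own dictionary will differ.

```python
# Hypothetical data dictionary for a library-book study.
# Each entry records the data type, the allowed values, and a measurement tip.
DATA_DICTIONARY = {
    "call_number": {"type": str, "allowed": None,
                    "tip": "Copy from the spine label."},
    "page_count": {"type": int, "allowed": range(1, 5001),
                   "tip": "Use the last numbered page."},
    "binding": {"type": str, "allowed": {"H", "P"},
                "tip": "H = hardcover, P = paperback."},
}

def validate_row(row, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in one spreadsheet row (a dict)."""
    problems = []
    for name, spec in dictionary.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif spec["allowed"] is not None and value not in spec["allowed"]:
            problems.append(f"{name}: value {value!r} not allowed")
    return problems

good = {"call_number": "CR 101", "page_count": 250, "binding": "H"}
bad = {"call_number": "CR 205", "page_count": 0, "binding": "X"}
print(validate_row(good))
print(validate_row(bad))
```

Checking rows against the dictionary as you enter them is one way to catch coding mistakes before analysis begins.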
Carry out your study & make claims
- Carry out your study: gather your data and record it digitally in your spreadsheet.
- Carry out basic descriptive statistics on each of your variables of interest: mean, median, mode, quartiles, standard deviation (if appropriate). Summarize the differences seen across sub-populations using these descriptive statistics.
- Generate compelling visualizations describing your results: box-and-whisker charts, neat hand-drawn figures, etc.
- If you feel comfortable with the concepts, apply a basic statistical test of means (e.g. interpreting the p-value generated from a t-test) to compare measured values between two sub-populations. See chapter 3 on confidence intervals in our Lock 5 statistics textbook, and chapter 6 for inferences on means and proportions.
- Extract data-backed claims that relate to your chosen inquiry question using your gathered data. A claim that suggests that not enough data was gathered, or that the data gathered is inconclusive (or even worthless), is entirely appropriate.
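For students who prefer code to spreadsheet formulas, the descriptive statistics and the means comparison described above might look roughly like the following. This is a minimal sketch using only Python's standard library. The page counts are invented, the function names `describe` and `welch_t` are hypothetical, and Welch's version of the t-test (which does not assume equal variances) is a choice made here, not a requirement of the project.

```python
import math
import statistics

def describe(values):
    """Summary statistics for one numeric variable in one sub-population."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented page counts for two sub-populations of books.
heraldry = [120, 150, 150, 200, 310]
mathematics = [220, 260, 300, 340, 380]

print(describe(heraldry))
print(describe(mathematics))
t, df = welch_t(heraldry, mathematics)
print(t, df)
```

Turning the t statistic and degrees of freedom into a p-value still requires a t-distribution table or a statistics package; the sketch stops short of that step.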
Documentation and sharing
- Write a short letter to a future DAT-102 student who may wish to continue your study detailing the following:
- Your experience in gathering the data (did anything unexpected happen?)
- Likely sources of error or bias in your sample: defend your approach to randomly drawing from the sub-population of interest.
- Revisions you would suggest making to your own study now that you've carried it out once.