Due: Fri Jun 6
For each question, provide: (i) a written description of the results of your analyses, including a brief justification for the test/analysis you used; (ii) the R source file used to produce these results; and (iii) a clear visualization of both the underlying data points and the variability in the data provided to you, with a description of what is being visualized (e.g., as in a figure legend).
Q1: You want to test the hypothesis that microbial diversity affects Parkinson’s disease (PD). You therefore collect a cohort of individuals, obtain microbial samples from different regions of the body (gut, skin) from each individual, calculate alpha diversity for each region, and group the alpha diversity values into two categories (high, low). Is high or low alpha diversity in either region associated with PD? If so, what is the effect (odds ratio) on PD? The data are provided here.
(10 points)
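As an illustration of the kind of analysis Q1 describes, here is a minimal R sketch that builds a 2x2 table of diversity category against disease status and reads off an odds ratio. It uses Fisher's exact test as one reasonable choice; a chi-squared test or logistic regression could be justified as well. The data are simulated, and the column names (gut, skin, pd) and the helper test_region are hypothetical; they do not come from the course dataset.

```r
# Simulated stand-in data; column names are assumptions, not the real dataset.
set.seed(1)
n <- 200
dat <- data.frame(
  gut  = factor(sample(c("low", "high"), n, replace = TRUE), levels = c("low", "high")),
  skin = factor(sample(c("low", "high"), n, replace = TRUE), levels = c("low", "high"))
)
# Simulate PD status so that low gut diversity raises the odds of PD.
p_pd <- plogis(-1 + 1.2 * (dat$gut == "low"))
dat$pd <- factor(ifelse(runif(n) < p_pd, "PD", "control"), levels = c("PD", "control"))

# Hypothetical helper: 2x2 table plus Fisher's exact test for one body region.
# With rows (low, high) and columns (PD, control), the reported odds ratio is
# the odds of PD for low diversity relative to high diversity.
test_region <- function(diversity, status) {
  tab <- table(diversity = diversity, status = status)
  ft <- fisher.test(tab)
  list(table = tab,
       odds_ratio = unname(ft$estimate),  # conditional maximum likelihood estimate
       conf_int = ft$conf.int,            # 95% confidence interval by default
       p_value = ft$p.value)
}

res_gut  <- test_region(dat$gut, dat$pd)
res_skin <- test_region(dat$skin, dat$pd)
print(res_gut$table)
print(c(OR = res_gut$odds_ratio, p = res_gut$p_value))
print(c(OR = res_skin$odds_ratio, p = res_skin$p_value))
```

Note that fisher.test reports a conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the simple cross-product ratio computed by hand from the table.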
Q2: You identify a novel biomarker of disease and want to determine whether age and BMI are associated with levels of the biomarker. You collect a cohort of individuals, measure their biomarker levels, and record their age and BMI; the data are stored here.
Is there evidence that age and/or BMI is associated with the biomarker? Use multiple regression, and check that the residuals are approximately normally distributed.
You then want to use your regression model to predict the biomarker levels of individuals given their age and BMI. For the five individuals listed here, what are their predicted biomarker levels?
(10 points)
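The workflow in Q2 (fit a multiple regression, check the residuals, then predict for new individuals) can be sketched in R as follows. This is only an outline under stated assumptions: the data are simulated, and the column names (age, bmi, biomarker) and the five example individuals are hypothetical placeholders for the files linked above.

```r
# Simulated stand-in data; column names and values are assumptions.
set.seed(2)
n <- 100
dat <- data.frame(age = runif(n, 20, 80), bmi = rnorm(n, mean = 27, sd = 4))
dat$biomarker <- 5 + 0.1 * dat$age + 0.3 * dat$bmi + rnorm(n, sd = 2)

# Multiple regression of biomarker level on age and BMI.
fit <- lm(biomarker ~ age + bmi, data = dat)
print(summary(fit)$coefficients)  # estimates, standard errors, t values, p values

# Residual diagnostics: a normal Q-Q plot and a Shapiro-Wilk test.
res <- residuals(fit)
qqnorm(res)
qqline(res)
sw <- shapiro.test(res)
print(sw$p.value)  # a small p value would suggest non-normal residuals

# Predictions for five hypothetical individuals, with 95% prediction intervals.
new_people <- data.frame(age = c(25, 40, 55, 60, 72),
                         bmi = c(22, 31, 27, 35, 24))
pred <- predict(fit, newdata = new_people, interval = "prediction")
print(cbind(new_people, pred))
```

Using interval = "prediction" rather than "confidence" gives intervals for an individual's biomarker level, not for the mean level at that age and BMI, which matches the per-person question being asked.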
Extra credit:
Can you find any issues with the statistical analyses in this paper? Walker et al., "Sex-specific transcriptional changes in response to adolescent social stress in the brain’s reward circuitry."
[Main text]
[Methods]