Assignment 2

Due: Friday May 9

For each question, provide: (i) a written description of your results and your interpretation of the analyses, including any visualizations of the data where applicable, and (ii) the R source file or notebook used to produce these results.

Q1: You are performing experiments to determine whether treatment with any of three different small molecules affects the viability of a neuronal cell line in response to stress, compared to mock treatment. For each small molecule and for the control (no treatment), you culture independent clones of the cell line (10 per group) and, after 72 hr, measure cell viability, performing two replicate measurements of the assay for each clone. The data from these experiments are here. Provide descriptive summary statistics for each group, a clear visualization of the underlying data and variability, and a description of what is being visualized (i.e., as in a figure legend). Finally, test the hypothesis that each small molecule affects viability. What can you conclude about the effects of each small molecule on neuronal viability in response to stress? (6 points)
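Hint: the two replicate measurements per clone are not independent observations, so one reasonable approach is to average them before summarizing or testing. The sketch below illustrates this on simulated data only. The column names (`group`, `clone_id`, `replicate`, `viability`), the group labels, and the choice of Welch t-tests with a Holm correction are all assumptions made for illustration, not a required method or the layout of the actual file.

```r
set.seed(1)

# Simulated stand-in for the assignment data (all names and values are hypothetical):
# 4 groups x 10 clones x 2 replicate measurements.
groups <- c("control", "mol_A", "mol_B", "mol_C")
dat <- expand.grid(replicate = 1:2, clone = 1:10, group = groups)
dat$clone_id <- paste(dat$group, dat$clone, sep = "_")
clone_effect <- rnorm(40, mean = 0, sd = 5)
names(clone_effect) <- unique(dat$clone_id)
dat$viability <- 70 + clone_effect[dat$clone_id] + rnorm(nrow(dat), sd = 2)

# Collapse the two replicate measurements so each clone contributes one value.
clone_means <- aggregate(viability ~ group + clone_id, data = dat, FUN = mean)

# Descriptive summary statistics per group, computed on the clone-level means.
summary_stats <- aggregate(
  viability ~ group, data = clone_means,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
)
print(summary_stats)

# Compare each small molecule to the control (Welch t-test is an assumed choice),
# then adjust the three p-values for multiple comparisons.
ctrl <- clone_means$viability[clone_means$group == "control"]
treatments <- setdiff(groups, "control")
p_raw <- sapply(treatments, function(g) {
  t.test(clone_means$viability[clone_means$group == g], ctrl)$p.value
})
p_adj <- p.adjust(p_raw, method = "holm")
print(p_adj)
```

To show the underlying data rather than only a summary, something like `stripchart(viability ~ group, data = clone_means, vertical = TRUE, method = "jitter")` plots one point per clone. Whether a t-test is appropriate depends on the distribution of the real data, which you should check.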

Q2: You are interested in identifying candidate genes for Parkinson’s Disease (PD) to select for detailed mechanistic study in cellular models. Your PI sends you unpublished data from a collaborator who has profiled substantia nigra regions from non-disease and PD patient samples from a biobank using RNA-seq and tested genes genome-wide for differential expression. The data they sent you are here; they include the raw p-values for each gene, which you would like to use to identify interesting candidates for your study. How many genes show significant changes in expression after accounting for multiple tests, and of these, how many are up-regulated in PD and how many are down-regulated in PD? (7 points)
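Hint: a minimal sketch of adjusting raw p-values for multiple testing and counting genes by direction of change, run on simulated data. The column names (`gene`, `log2fc`, `pvalue`), the presence of a fold-change column, the Benjamini-Hochberg method, and the 0.05 threshold are assumptions for illustration; check what the real file contains and justify your own choices.

```r
set.seed(2)

# Simulated stand-in for the collaborator's results (hypothetical columns and values).
n_genes <- 2000
de <- data.frame(
  gene   = paste0("gene", seq_len(n_genes)),
  log2fc = rnorm(n_genes),                                  # PD vs. non-disease
  pvalue = c(rbeta(200, 0.5, 20), runif(n_genes - 200))     # some true signal, mostly null
)

# Adjust the raw p-values; "BH" controls the false discovery rate.
de$padj <- p.adjust(de$pvalue, method = "BH")

# Keep genes passing the (assumed) 5% FDR threshold and split by direction.
sig    <- de[de$padj < 0.05, ]
n_sig  <- nrow(sig)
n_up   <- sum(sig$log2fc > 0)   # higher in PD
n_down <- sum(sig$log2fc < 0)   # lower in PD

cat("significant:", n_sig, " up:", n_up, " down:", n_down, "\n")
```

Comparing `sum(de$pvalue < 0.05)` with `n_sig` shows how many genes would have been called significant without any correction.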

Q3: You are designing an experiment to determine whether knocking out a gene of interest with CRISPR/Cas9 affects action potential duration in stem cell-derived cardiomyocytes. Your lab has generated preliminary data recording action potential duration (ms) from a range of lines, which is located here. Previous studies in the literature have identified multiple genes with significant effects on action potential duration using a similar system, and the reported percent changes in mean action potential duration from these studies are 40%, 20% and 10% for each gene, respectively. Given the parameters of the preliminary data generated by your lab, what sample size would you need to be well-powered to detect each of these reported changes for your gene of interest at an alpha of 0.05? What would your conclusions from this power calculation be?

As a follow-up, after designing your study, your PI comes back and says that, due to budget cuts, you only have the funding to perform these experiments in at most five lines per group. What power do you have to detect each of these effects with a sample size of five per group at an alpha of 0.05? What would your conclusions from this power calculation be, and how would that inform your interpretation of the results of your eventual hypothesis testing?

After designing your study, you perform your experiments and determine that your gene of interest has a significant effect (60% increase) on action potential duration. You are thrilled because you think now you can defend and graduate, but your PI then asks you to design and perform a follow-up study to determine whether the effect is reproducible. What sample size would you need to be well-powered to detect a 60% increase in action potential duration in your replication study at an alpha of 0.05? Will you ever graduate? (7 points)
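Hint: the calculations in Q3 can be sketched with `power.t.test` from base R. The preliminary values below are made up, and treating "well-powered" as 80% power with a two-sided, two-sample t-test is an assumption; use the mean and standard deviation from the lab's actual preliminary data and state the power target you choose.

```r
# Hypothetical preliminary action potential durations in ms (not the real data).
prelim_apd <- c(300, 480, 420, 350, 510, 390, 290, 460, 400, 400)
base_mean <- mean(prelim_apd)
base_sd   <- sd(prelim_apd)

# Percent changes from the question: 10%, 20%, 40%, plus the 60% replication effect.
pct_changes <- c(0.10, 0.20, 0.40, 0.60)
design <- data.frame(
  pct_change  = pct_changes,
  n_per_group = NA_real_,
  power_at_n5 = NA_real_
)

for (i in seq_along(pct_changes)) {
  # Convert the percent change into an absolute difference in ms.
  delta <- pct_changes[i] * base_mean

  # Sample size per group for the assumed 80% power target (rounded up).
  design$n_per_group[i] <- ceiling(
    power.t.test(delta = delta, sd = base_sd, sig.level = 0.05, power = 0.80)$n
  )

  # Power achieved when the budget limits the study to five lines per group.
  design$power_at_n5[i] <- power.t.test(
    n = 5, delta = delta, sd = base_sd, sig.level = 0.05
  )$power
}

print(design)
```

Because the required sample size depends on the ratio of `delta` to `base_sd`, the estimate is only as good as the preliminary standard deviation. An effect size observed in a first, small study is also often an overestimate, which is worth considering when planning the replication.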

Extra credit: Instead of having everyone self-assign to groups, I could have randomly assigned everyone to a group. Provide code in R that will randomly assign the students in the class to 9 groups, where each group has at least 3 members. The class list is here.
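Hint: one possible approach is to shuffle the class list and then deal students into groups in turn, which keeps group sizes within one of each other. The student names and class size below are placeholders, not the real class list.

```r
set.seed(4)

# Placeholder class list (hypothetical); replace with the names read from the real file.
students <- paste0("student_", 1:30)

n_groups <- 9
min_size <- 3

# With fewer than 27 students, 9 groups of at least 3 members are impossible.
stopifnot(length(students) >= n_groups * min_size)

# Shuffle the list, then assign group labels 1..9 cyclically.
shuffled <- sample(students)
assignment <- data.frame(
  student = shuffled,
  group   = rep_len(seq_len(n_groups), length(shuffled))
)

group_sizes <- table(assignment$group)
print(group_sizes)
```

Setting the seed makes the assignment reproducible; omit `set.seed` to get a different random assignment on each run.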