Published

16 Dec 2025

Guidelines for coursework 2

General information

The goal of coursework 2 is to conduct a small data analysis project that covers a complete workflow: obtaining raw data, cleaning and importing them, performing a simple statistical analysis, creating plots and writing a short report. In doing so, you will practise the key principles of data science and reproducible research and apply your R skills.

The dataset

The dataset consists of measurements from the weather station (Link) at the Heidfeldhof near the University of Hohenheim and covers the years 2007 to 2024. It contains the average and maximal values of various parameters (depending on the trait). The data are available as a downloadable .csv file on the course webpage: Link.

Each student will be assigned the data for one trait. You can find the assignment on the homepage of the course.

Objectives of the data science project

The goal of the project is to test the following hypothesis: Climate change is observable at the Heidfeldhof agricultural research station.

Your tasks

  1. Formulate a specific null hypothesis for the two time series you have been assigned.
  2. Think about how to conduct the analysis to test the null hypothesis.
  3. Analyse the data that were assigned to you using the tidyverse libraries of R.
  4. Produce a report from the analysis and discuss whether the hypothesis is supported by your results.

Implement your analysis as a fully reproducible document in the Quarto format (.qmd). The analysis involves the following steps:

  1. Import the data into an R data frame using the readr package of the tidyverse.
  2. Extract the two columns with the measurements and the month that were assigned to you.
  3. Check the proportion of missing data and report it.
  4. Plot a histogram of the distribution of each of the two datasets.
  5. Analyse the correlation between the two datasets.
  6. Implement the analyses that test your hypotheses. Use both statistical tests and plots. Report the results.
  7. Add any literature references, if needed (can be done with RStudio, but there is also information on the Quarto website).
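
Steps 1 to 5 could be sketched roughly as follows. This is only an illustration under stated assumptions: the file, the column names (year, month, obs, temp_mean, temp_max), the chosen month and all values are hypothetical toy data generated inside the example, not the actual course dataset. Replace them with the file and columns assigned to you.

```r
library(tidyverse)

# --- Hypothetical toy data (NOT the course dataset) -----------------------
# 18 years x 12 months x 10 observations, written to a temporary CSV file
# so that the import step can be demonstrated.
set.seed(1)
toy <- expand_grid(year = 2007:2024, month = 1:12, obs = 1:10) |>
  mutate(
    temp_mean = 10 + 0.05 * (year - 2007) + rnorm(n()),
    temp_max  = temp_mean + 5 + rnorm(n()),
    # introduce a few missing values for demonstration
    temp_max  = if_else(obs == 1 & year %in% c(2010, 2015), NA_real_, temp_max)
  )
toy_path <- tempfile(fileext = ".csv")
write_csv(toy, toy_path)

# --- Step 1: import with readr --------------------------------------------
weather <- read_csv(toy_path, show_col_types = FALSE)

# --- Step 2: keep the assigned month and the two measurement columns ------
# Month 7 is an arbitrary choice for this sketch.
assigned <- weather |>
  filter(month == 7) |>
  select(year, month, temp_mean, temp_max)

# --- Step 3: proportion of missing values per column ----------------------
missing_share <- assigned |>
  summarise(across(c(temp_mean, temp_max), ~ mean(is.na(.x))))

# --- Step 4: histograms of both variables ---------------------------------
assigned_long <- assigned |>
  pivot_longer(c(temp_mean, temp_max),
               names_to = "variable", values_to = "value")

hist_plot <- ggplot(assigned_long, aes(x = value)) +
  geom_histogram(bins = 20, na.rm = TRUE) +
  facet_wrap(~ variable, scales = "free_x") +
  labs(x = "Measured value (unit depends on the trait)", y = "Count")

# --- Step 5: correlation between the two variables ------------------------
# cor.test() uses only complete pairs of observations.
cor_result <- cor.test(assigned$temp_mean, assigned$temp_max)
```

In a report, hist_plot would be printed in its own chunk with a caption, and only the relevant numbers from missing_share and cor_result (for example the estimate and p-value) would be reported in the text.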

Structure of the report

The analysis report should consist of the following sections:

  • A header with your immatriculation number (do not include your name to avoid any bias during grading) and the dataset number.
  • An introduction with the hypothesis and description of the data (number of individuals, type of data, units)
  • A materials and methods section in which you describe the tools you used
  • A results section. Briefly describe which analysis you did, why you did it and what the result is.
  • Include the code for analysis in the report as code chunks. The code chunks should also be displayed in proper context in the PDF (or HTML).
  • A brief description of the results which summarises which information can be retrieved from the results and how it can be interpreted.
  • Figures and/or tables that show the results, properly referenced in the text sections and with proper caption and correct annotation (e.g., have axis labels). Avoid redundancy.
  • Each analysis should be fully repeatable with the given information.
  • A discussion section that interprets your results and puts them into context.

In the discussion section, summarise all your results and discuss them in relation to each other. Put the results in context (e.g., what could be a reason that the data are not normally distributed?). Also check whether the correlations you observe correspond to the trait correlations reported in the publication. Justify each of your conclusions with your analysis results.

The console output should not be included in the PDF, only the results of the analysis. Some methods produce a large amount of text that should not be included in a report. The report should be concise. It is not necessary to spend too much time on a specific statistical method (such as linear mixed models). Simple regression analyses are sufficient.
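
One possible way to keep console clutter out of the rendered report is to set chunk options at the top of each code chunk. The sketch below shows the body of a single R chunk and assumes Quarto's `#|` option syntax; the data and object names are hypothetical and serve only as an illustration.

```r
#| echo: true
#| message: false
#| warning: false

# The options above must come first in the chunk:
#   echo: true      displays the code itself in the report
#   message: false  hides package start-up messages
#   warning: false  hides warnings
library(tidyverse)

# Hypothetical toy values (NOT the course dataset)
toy <- tibble(
  year  = rep(2022:2024, each = 4),
  value = c(9.1, 9.4, 8.8, 9.0, 9.6, 9.3, 9.9, 9.5, 9.7, 10.1, 9.8, 10.0)
)

# Reduce the data to the numbers that are actually reported
yearly <- toy |>
  group_by(year) |>
  summarise(mean_value = mean(value), n = n())

# Print a compact table instead of raw console output
knitr::kable(yearly, digits = 2, col.names = c("Year", "Mean value", "n"))
```

The idea is to print only a small, formatted table or a plot from each chunk, rather than the full text output of a function.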

Hint: It is OK if you consider the data points within a month of a year as a random sample for a specific month of the year, both for plotting (i.e., all measurements for a month can be used for a box plot with jitter) and for the statistical analysis.
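
The hint could be applied along the following lines. This is a minimal sketch, assuming one measurement per day for a single month; the data are simulated and the column names (year, day, value) are hypothetical. The null hypothesis used here, that the slope of the measured value over the years is zero, is only one possible formulation.

```r
library(tidyverse)

# Simulated toy data for one month (NOT the course dataset):
# 31 daily values per year for the years 2007 to 2024.
set.seed(7)
july <- expand_grid(year = 2007:2024, day = 1:31) |>
  mutate(value = 18 + 0.05 * (year - 2007) + rnorm(n(), sd = 2))

# Box plot per year, with the individual measurements shown as jitter.
# Outliers are hidden in the box plot because the jitter already shows them.
box_plot <- ggplot(july, aes(x = factor(year), y = value)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.4) +
  labs(x = "Year", y = "Measured value (unit depends on the trait)")

# Simple linear regression of the measurements on the year.
# All values within the month are treated as a random sample for that year.
trend_fit <- lm(value ~ year, data = july)
slope     <- coef(trend_fit)[["year"]]
p_value   <- summary(trend_fit)$coefficients["year", "Pr(>|t|)"]
```

Here slope is the estimated change per year and p_value belongs to the test of a zero slope. Both can be reported in the text without printing the full model summary.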

Grading criteria

The following criteria are used for the evaluation of the coursework:

  1. You submit a .qmd file and a .pdf file. The PDF can be produced with the Typst program, which is installed with Quarto by default.
  2. The .qmd must be fully executable without any error message. Keep this in mind with your paths. Assume the dataset is in the same directory as the .qmd file.
  3. The analyses must be done with the tidyverse collection of R packages (e.g., using the ggplot2 package for plotting).
  4. Only the output that is relevant for the report is printed (use the corresponding options of the code chunks). Avoid any cluttering output.
  5. Overall layout and structure of report according to template. No need to use fancy styling!
  6. Figures and tables are appropriate and meaningful
  7. Length and structure of the introduction, data cleaning, analysis and discussion sections
  8. Typing errors and grammar
  9. Quality of writing and appropriate use of jargon terms
  10. Correct and complete references
  11. Connection between data cleaning and data analysis is correct
  12. Results of analysis are discussed in a sensible and meaningful manner

The paper is evaluated according to the formal criteria described above and then ranked among all term papers with respect to the criteria. All term papers will be checked for plagiarism! We usually write down some comments on the term paper and make them available to you after grading to give you some feedback.

Submission of the coursework 2 paper

Please upload the coursework 2 paper to the Coursework-2 folder on ILIAS as a single pdf.

Name the .qmd file as 123456_coursework2.qmd where 123456 is your immatriculation number. Name the accompanying .pdf file accordingly. Do not use your name in the report to avoid any conscious or unconscious bias in the grading process.

Please remember to submit the signed declaration of plagiarism as a separate PDF. You can download it from here: Link. It is fine if you print it out, sign it and scan it with your smartphone or tablet.

The due date for submission is given on the main page of the course.