Appendix B — Creating Fake Data

Creating fake data lets you control how the data are generated, which helps avoid the pitfalls of violating model assumptions (e.g., Section 5.2).

B.1 Linear Regression

The approach used to create the fake data in Chapter 5 was adapted from Mohr (2018) and extended to include multiple x variables.

# Fake data for linear regression

# For writing the CSV file
library(readr)

# Set random number generator for reproducibility
set.seed(2023)

# Number of observations
N <- 50

# Create x from a uniform distribution ranging from Low to High
# Adjust the Low and High for each x variable as desired
x1 <- runif(N, 0, 10)
x2 <- runif(N, 1, 20)
x3 <- runif(N, 0, 25)

# Create y from the equation y = b0 + b1*x1 + b2*x2 + b3*x3 + e
# e is normally distributed with mean = 0 and standard deviation = 2; adjust as desired
# Adjust the size and the sign (+ or -) of each coefficient as desired
y <- 12 - 0.5 * x1 - 0.25 * x2 + 0.8 * x3 + rnorm(N, 0, 2)

# Create the data frame
df <- data.frame(x1, x2, x3, y)

# Save the data
write_csv(df, "fake3.csv")
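
Because the coefficients used to build y are known, one way to check the fake data is to refit the same model and compare the estimates with the true values. The following is a minimal sketch of that check, not code from Chapter 5. It rebuilds the data frame with the same seed rather than reading fake3.csv, and the names fit, true_coef, and comparison are illustrative assumptions.

```r
# Recreate the fake data from B.1 (same seed and same order of random draws)
set.seed(2023)
N <- 50
x1 <- runif(N, 0, 10)
x2 <- runif(N, 1, 20)
x3 <- runif(N, 0, 25)
y <- 12 - 0.5 * x1 - 0.25 * x2 + 0.8 * x3 + rnorm(N, 0, 2)
df <- data.frame(x1, x2, x3, y)

# Fit the same linear model that was used to generate y
fit <- lm(y ~ x1 + x2 + x3, data = df)

# True coefficients, named to match the output of coef()
true_coef <- c("(Intercept)" = 12, x1 = -0.5, x2 = -0.25, x3 = 0.8)

# Put the true and estimated coefficients side by side
comparison <- data.frame(true = true_coef, estimate = coef(fit))
print(round(comparison, 3))

# Residual standard error, which estimates the standard deviation of e (set to 2)
print(sigma(fit))
```

The estimates will not match the true values exactly because of the random error term. Increasing N or decreasing the standard deviation of e should bring them closer.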

B.2 Logistic Regression

The approach taken to create the fake data in Chapter 6 was gleaned from Ford (2019).