# Fake data for linear regression
# For writing the CSV file
library(readr)
# Set random number generator for reproducibility
set.seed(2023)
# Number of observations
N <- 50
# Create x from a uniform distribution ranging from Low to High
# Adjust the Low and High for each x variable as desired
x1 <- runif(N, 0, 10)
x2 <- runif(N, 1, 20)
x3 <- runif(N, 0, 25)
# Create y by forcing the equation y = b0 + b1x1 + b2x2 + b3x3 + e
# e is normally distributed with mean = 0 and standard deviation = 2. Adjust as desired.
# Adjust each coefficient as well as the + or - of each coefficient as desired
y <- 12 - 0.5 * x1 - 0.25 * x2 + 0.8 * x3 + rnorm(N, 0, 2)
# Create the data frame
df <- data.frame(x1, x2, x3, y)
# Save the data
write_csv(df, "fake3.csv")
Appendix B — Creating Fake Data
Creating fake data from a known model helps avoid the pitfalls of violating model assumptions (e.g., Section 5.2).
B.1 Linear Regression
The approach taken to create the fake data in Chapter 5 was gleaned from Mohr (2018) and extended to include multiple x variables.
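Because the data-generating equation is known (y = 12 - 0.5 x1 - 0.25 x2 + 0.8 x3 + e), one quick sanity check is to fit a linear model to the fake data and confirm that the estimates land near those values. The sketch below regenerates the data with the same seed and settings so it runs on its own. The object name `fit` is an illustrative choice, and this check is an addition rather than part of the original script.

```r
# Regenerate the fake data with the same seed and settings
set.seed(2023)
N <- 50
x1 <- runif(N, 0, 10)
x2 <- runif(N, 1, 20)
x3 <- runif(N, 0, 25)
y <- 12 - 0.5 * x1 - 0.25 * x2 + 0.8 * x3 + rnorm(N, 0, 2)
df <- data.frame(x1, x2, x3, y)

# Fit the model that matches the data-generating equation
fit <- lm(y ~ x1 + x2 + x3, data = df)

# Estimates should be close to 12, -0.5, -0.25, and 0.8
coef(fit)

# Confidence intervals show how much sampling noise remains at N = 50
confint(fit)
```

With only 50 observations the estimates will not match the true coefficients exactly. Increasing `N` or lowering the error standard deviation tightens them.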
B.2 Logistic Regression
The approach taken to create the fake data in Chapter 6 was gleaned from Ford (2019).
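The script for the Chapter 6 data is not reproduced here. As a minimal sketch of how fake data for logistic regression can be generated, the code below applies the same pattern as the linear case: build a linear predictor on the log-odds scale, convert it to probabilities, and draw 0/1 outcomes. The coefficients, sample size, variable ranges, and object names (`eta`, `p`, `df_logit`) are all hypothetical choices for illustration, not the values used in Chapter 6.

```r
# Set random number generator for reproducibility
set.seed(2023)
# Number of observations (hypothetical)
N <- 200
# Create predictors from uniform distributions (hypothetical ranges)
x1 <- runif(N, 0, 10)
x2 <- runif(N, 1, 20)
# Linear predictor on the log-odds scale (hypothetical coefficients)
eta <- -1 + 0.6 * x1 - 0.2 * x2
# Convert log-odds to probabilities with the inverse logit
p <- plogis(eta)
# Draw one Bernoulli outcome per observation
y <- rbinom(N, size = 1, prob = p)
# Create the data frame
df_logit <- data.frame(x1, x2, y)

# Optional check: the fitted coefficients should be near -1, 0.6, and -0.2
fit_logit <- glm(y ~ x1 + x2, data = df_logit, family = binomial)
coef(fit_logit)
```

Unlike the linear case, no separate error term is added. The randomness comes entirely from the Bernoulli draw in `rbinom()`.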