Dec 23rd, 2020: Start with a Joke

It's a common strategy to start something new with a joke. Explaining the joke itself, however, is often frowned upon. I hereby claim that this disapproval is simply unwarranted. Hence the goal of this post is two-fold: 1) to get me started with the habit of writing by writing a joke; 2) to spend a serious amount of time explaining a joke and still make it funny.

Motivation

A colleague once told us that his favorite statistics question on StackExchange is the following:

Suppose we have data set (X_i, Y_i) with n points. We want to perform a linear regression, but first we sort the X_i values and the Y_i values independently of each other, forming data set (X_i, Y_j). Is there any meaningful interpretation of the regression on the new data set? Does this have a name?

If you know anything about statistics, I strongly encourage you to visit the website here to read people's comments and responses; it will make you smile.

If, like my friend Henry, you don't know how statistics works, that's okay too. As I mentioned at the very beginning, the goal of this post is to explain the joke and make it funny. Please keep reading. Below is a customized explanation I sent to Henry a couple of days ago; he managed to understand the joke immediately after reading it. Please let me know if anything is still unclear!

True Story: Henry and Henry's Research

Setting: suppose Henry's boss wants Henry to predict next year's revenue. Henry decided to build a regression model to impress his boss. Specifically, he decided to plot revenue against year and fit a regression line.

Henry spent the morning finding this extremely reliable source: the revenues between 2010 and 2019 were 79619, 29458, 29920, 30530, 31603, 28726, 28138, 27179, 29833, 27487 million dollars.

Henry was excited, but he had forgotten how to do regression, so he asked his intern (from Emory University?) to do it instead. Here is the plot his intern presented to him:

Coefficients:
 [[-3119.60606061]]
r-square: [0.67403379]

Henry was happy. When the intern asked him how the plot could be improved, Henry replied with only two words: "Bigger Plot". The intern looked concerned but went ahead and made this:

Coefficients:
 [[-3119.60606061]]
intercept: [50593.33333333]
r-square: [0.67403379]
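For anyone who wants to play intern, here is a minimal sketch of such a fit in Python. It is not the intern's actual code, which the post does not show. It assumes the years are encoded as 0 through 9 (2010 becomes 0), and it uses NumPy's `polyfit` rather than whatever library produced the output above. The printed values depend on that encoding and on the revenue figures as listed, so they need not reproduce the numbers quoted in this post.

```python
import numpy as np

# Revenue figures (in millions) for 2010-2019, copied from the post.
revenue = np.array([79619, 29458, 29920, 30530, 31603,
                    28726, 28138, 27179, 29833, 27487], dtype=float)

# Assumption: years are encoded as 0..9 (2010 -> 0), so 2020 would be x = 10.
year_index = np.arange(len(revenue), dtype=float)


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r_squared = fit_line(year_index, revenue)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r_squared:.3f}")
print(f"prediction for 2020 (x = 10): {slope * 10 + intercept:.0f}")
```

The slope comes out negative because the very large 2010 figure sits at the left edge of the plot and drags the line down toward the right.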

Henry was excited, not only because of the larger plot, which could impress his manager, but because he could now apply the solid quantitative skills he had developed at the Middlebury Econ Department to predict the 2020 revenue. Based on the coefficient, Henry did this:

y_2020 = -3119*10 + 50593 = 19403 million dollars (counting 2010 as year 0, so 2020 is year 10)

Henry felt crestfallen: "Below 20000 million!!! My boat, my bonus, my new Tesla model XXX! No, I won't show this to my manager". Henry never gave up; he thought the intern must have done something fundamentally stupid. And the plot looked ugly anyway, because there was no clear trend of going up or going down. Not good!

Suddenly, there was a violent lightning strike in his office and a light bulb appeared in Henry's head. The intern later claimed he saw the actual light bulb in Henry's head himself, as if seeing the face of god. It remains unclear whether that was a mystical experience for both Henry and his intern (this is still under debate by many famous theologians), but Henry's idea fundamentally revolutionized the sciences forever:

Henry's BIG idea: what if we sort the years and the revenues independently, each in increasing order? Will we have a better 2020 revenue? Will my boss love me again because of this?

Henry asked his intern to make another plot based on his new idea, though the intern looked very suspicious of Henry's new idea. But as always, the Emory intern had to listen to his boss. Here is the new plot the intern generated for Henry:

Coefficients:
 [[3431.58181818]]
intercept: [14811.8]
r-square: [0.59639035]
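To see what Henry's idea does to the data, here is a small sketch, again in Python with NumPy and again assuming the years are encoded as 0 through 9 (the original code is not shown, so this is an illustration rather than a reproduction). Sorting each column on its own throws away the pairing between a year and its revenue: the smallest revenue gets glued to the earliest year, the largest to the latest year. Once both columns are increasing, the fitted slope cannot be negative, no matter what the real trend was.

```python
import numpy as np

# Revenue figures (in millions) for 2010-2019, copied from the post.
revenue = np.array([79619, 29458, 29920, 30530, 31603,
                    28726, 28138, 27179, 29833, 27487], dtype=float)

# Assumption: years are encoded as 0..9 (2010 -> 0).
year_index = np.arange(len(revenue), dtype=float)

# Henry's idea: sort each column independently, breaking the (year, revenue) pairs.
sorted_year = np.sort(year_index)    # already increasing, so nothing changes here
sorted_revenue = np.sort(revenue)    # the 2010 figure now sits next to "2019"

slope_paired, _ = np.polyfit(year_index, revenue, deg=1)
slope_sorted, _ = np.polyfit(sorted_year, sorted_revenue, deg=1)
print(f"slope with real pairs:       {slope_paired:.2f}")
print(f"slope after Henry's sorting: {slope_sorted:.2f}")

# Any reshuffling of the revenues gives the same "trend" once it is sorted,
# which shows the sorted fit says nothing about which year earned what.
rng = np.random.default_rng(0)
shuffled = rng.permutation(revenue)
slope_shuffled, _ = np.polyfit(sorted_year, np.sort(shuffled), deg=1)
print(f"slope after shuffling, then sorting: {slope_shuffled:.2f}")
```

The last two slopes agree, which is the whole joke in one line: the upward trend is manufactured by the sorting, not found in the data.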

Henry was in ecstasy, not only because there was now a clear increasing trend in the plot, but because he could again apply the math skills he had developed at Midd to predict the 2020 revenue. Based on the coefficient, Henry did this:

y_2020 = 3431*2020 + 14811 = 6945431 million dollars!!!!!!!!!!!!
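For the record, Henry's two predictions can be checked with a few lines of Python, using the rounded coefficients quoted in this post. The helper `predict` is a made-up name for illustration. The figure of 19403 is what the first line gives at x = 10, that is, with 2020 counted as the tenth year after 2010 (an assumption inferred from the arithmetic), whereas 6945431 is what the second line gives when the calendar year 2020 is plugged in directly.

```python
def predict(slope, intercept, x):
    """Evaluate the fitted line y = slope * x + intercept at x."""
    return slope * x + intercept


# Rounded coefficients as quoted in the post: (slope, intercept).
paired_fit = (-3119, 50593)   # fit on the real (year, revenue) pairs
sorted_fit = (3431, 14811)    # fit after Henry's independent sorting

print(predict(*paired_fit, 10))     # 19403: 2020 as year index 10
print(predict(*sorted_fit, 2020))   # 6945431: calendar year plugged in directly
print(predict(*sorted_fit, 10))     # 49121: the sorted fit at year index 10
print(predict(*paired_fit, 2020))   # -6249787: the first fit at calendar year 2020
```

So a good part of Henry's windfall comes from feeding the line x = 2020 instead of x = 10, on top of the sorting trick.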

Sometimes just a small modification of a conventional idea can lead to BIG advances in science. Henry reported his predictions to his boss, and everyone was super excited. The very next day, the Nobel committee called, and Henry was awarded a Nobel prize in Economics for his special contribution to corporate finance. Nowadays, Henry's sorting method is known as the "stochastic Monte Carlo mystical reordering method", which has been widely adopted in both academia and industry. He later accepted a tenured position at UChicago and became so rich that he bought his best friend a new house in Hyde Park, Chicago, IL.