The concept of Linear Regression (Part-1)

Krishna
8 min read · Sep 18, 2022


Machine Learning is a vast field. When we want to start learning ML algorithms, the first one we usually begin with is Linear Regression; many online data science courses and even graduate courses start with it. The main reason might be that LR is easy to teach, yet it takes some time to grasp the underlying idea because of the math involved. Today, in part 1 of the concept of LR, we will understand why this algorithm is useful, when we can use it, and what basic math is involved in it. Later, in part 2, we will discuss the types of LR and how we can tune the hyperparameters to reduce the overall error. Let me first introduce a few concepts before we start discussing the LR algorithm.

Independent Variable

These are the inputs needed for a process to get started. For example, if we want to predict whether a person will qualify for an interview, we need specific details, such as the requirements for the interview. If the company needs candidates with at least a 65% overall grade in their bachelor's and 1 year of experience in the field of data analytics, then the overall grade and the experience are the independent variables here; they do not depend on anything.

Dependent variable

These are the outputs of an experiment run on a set of inputs (the independent variables). Continuing the same example, the overall grade and the experience are the independent variables, and whether the candidate applying for the role is eligible or not is the dependent variable. It depends on the inputs.

The slope of a line

Consider an example: a friend calls and invites you to a party, but you don't know where exactly the place is. What is the next question you would ask? 'Hey, can you please help me with the directions?'. In the same way, to know the direction of a line, and where it is moving, we find its slope.

How to Calculate a slope?

It is the change in y per change in x. Between two points (x1, y1) and (x2, y2) on the line:

m = (y2 - y1) / (x2 - x1)

What is a y-intercept of line (b)?

It is the point where the line crosses the y-axis.

(Note: there are both a y-intercept and an x-intercept; here we are only talking about the y-intercept.)
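To make both ideas concrete, here is a minimal Python sketch; the two points are made up purely for illustration:

```python
# Two made-up points on a line.
x1, y1 = 2, 7
x2, y2 = 5, 16

m = (y2 - y1) / (x2 - x1)   # slope: change in y per change in x
b = y1 - m * x1             # y-intercept: where the line crosses the y-axis

print(f"slope m = {m}, intercept b = {b}")  # slope m = 3.0, intercept b = 1.0
```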

If we understand what a slope is and why we calculate it, we will be comfortable with the math behind building an ML algorithm.

Consider the above image, a graph that represents the relationship between capacity and price. From the plot, we can understand how the price is affected by an increase in capacity: for a change in x, what is the change in y? That is the slope of the line, which we discussed above. In general, we use the LR algorithm for this kind of prediction, but when can we use LR? Let's first try to answer this, and later we can get started with the concept.

Linearity

When we have two features and we plot the data points on a graph, if we can see a relationship between the two features and can represent the data with a line, we can say the data has linear properties. In other words, if the data can be fit by a straight line, and if we can calculate what the change in the target would be when one feature value changes, then the relationship follows linearity and the model can be considered a linear model. In a linear model (y = mx + b, where m is the coefficient), the prediction must be a linear combination of the coefficients; that is what we call a linear model.

Dependent variable (y) = constant (b) + independent variable 1 (x1) * parameter (m1) + independent variable 2 (x2) * parameter (m2) + … + independent variable n (xn) * parameter (mn)

In short: y = b + m1*x1 + m2*x2 + … + mn*xn

If we can see this kind of relationship between the dependent and independent variables, we call it a Linear model.

Note: The independent variables can have higher-degree polynomial terms (which we will cover in part 2, polynomial regression).
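As a small sketch of what "a linear combination of coefficients" means in practice (the feature values and coefficients below are made up for illustration):

```python
import numpy as np

# A linear model is just y = b + m1*x1 + m2*x2 + ... + mn*xn.
x = np.array([3.0, 5.0, 2.0])   # independent variables x1..x3
m = np.array([1.5, -0.4, 2.0])  # parameters (coefficients) m1..m3
b = 0.7                         # constant (intercept)

y = b + np.dot(m, x)            # dependent variable (the prediction)
print(y)                        # 0.7 + 4.5 - 2.0 + 4.0 = 7.2
```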

Now that we have understood what a linear model is, let us understand when we should use it. When we want to solve a problem and have sufficient relevant data, we often dive into finding patterns through visualization and start building an ML model; this is where we need to be clear and cautious. During the visualization step, we need to assess whether the data points overlap with each other and how the data patterns are arranged (especially when we have many features). If there is a lot of overlap between the data points, or if the data points do not follow any linear pattern, it is hard to predict the output through LR (in such cases we need to go with Support Vector Regression (SVR) or Random Forest Regression (RFR)).

The above image will help us understand when we need to go for LR.

Now let us understand the intuition behind LR, for that we need to first understand

  1. OLS (Ordinary Least Squares)
  2. Cost Function
  3. Gradient Descent

OLS (Ordinary Least Squares)

Linear regression tries to establish a linear relationship between the independent variables and the dependent variable it predicts. We know the equation of a straight line from above:

y = mx + b

We will first consider a sample dataset as an example to solve OLS, which will help us understand the concept. (Note: we are considering only a small example for ease of explanation; in the real world, the data will be far more complex.)

Consider the above dataset, a small made-up movie dataset with one independent feature, cp; we need to predict what the revenue would be. We can establish a linear relationship through this equation:

y = b + m1*x1 + m2*x2 + … + mn*xn

Y = B0 + B1*X1   (1)

Here, ‘Y’ is the prediction.
B0 and B1 are the coefficients (or weights, which help to reduce the error).
This is similar to y = mx + b.

Here ‘Y’ is an estimate, not the true ‘y’ value. Let me make this clear first:
We have the above data, with one feature and a ‘y’ value. We will split this data into train and test sets (this is very, very small data, as mentioned, only for ease of explanation), train our LR model on the train data, and make the model predict the revenue for the test data. Later we will compare the predicted revenue to the true revenue that we split off earlier; this tells us the error, and based on that we will use some regularisation methods to minimize it (more on this in part 2). So, ‘Y’ is the revenue we are predicting, and ‘y’ is the true value. Equation 1 above is for ‘Y’, the estimated revenue.
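A minimal sketch of that workflow using scikit-learn (the tiny dataset below is made up, not the movie dataset from the image):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Made-up data: one feature and a target (e.g. revenue).
X = np.array([[5], [8], [10], [13], [15], [18], [20], [22]])
y = np.array([7, 14, 19, 26, 31, 38, 43, 47])

# Hold out part of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # train on the train split
Y_pred = model.predict(X_test)                    # estimated revenue 'Y'

# Compare the predictions to the true 'y' values we held back.
print("error (MSE):", mean_squared_error(y_test, Y_pred))
```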

The above image shows what exactly this equation represents: we have x values and a line used for prediction. For every point x multiplied by some B (coefficient), what is the prediction Y going to be? So, for each point we multiply by B, which means BX = Y.

If we consider one row in the dataset, Y = B0 + B1*X.
B1 is the correlation between X and Y multiplied by the ratio of the standard deviation of Y to the standard deviation of X:

B1 = ρ(x, y) * (σy / σx)   (2)

If we observe this formula, it is the slope: change in y per change in x.

If we expand this formula and solve the equation, we get:

B1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²   (3)

So, B0 = ȳ - B1*x̄; since we know B1, we can calculate B0.
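As a quick sketch, equations 2 and 3 translate almost directly into code; the x and y arrays below are made up for illustration:

```python
import numpy as np

# Made-up data for one feature x and target y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Equation 3: B1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# B0 = ȳ - B1*x̄
b0 = y.mean() - b1 * x.mean()

print(f"B1 = {b1:.3f}, B0 = {b0:.3f}")
```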

If we calculate these terms for the dataset above, we get the below values.

So, if we substitute the values in equation 3,

B1 = 344.3 / 144.25 ==> 2.4

B0 = ȳ - B1*x̄ ==> 27.3 - 2.4*13.5 ==> -5.1

So, Y = -5.1 + 2.4*X is what we have.

So, if a producer invests 100M, what would the revenue be? Substitute it into the above equation:

Y = -5.1 + 2.4 * 100 ==> 234.9M would be the revenue. This is how LR works.
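To double-check the arithmetic, here is a tiny sketch that plugs in only the summary statistics quoted above (the raw dataset lives in the image, so it is not reproduced here):

```python
# Summary statistics from the worked example above.
num = 344.3                  # Σ(x - x̄)(y - ȳ)
den = 144.25                 # Σ(x - x̄)²
x_mean, y_mean = 13.5, 27.3  # means of cp and revenue

b1 = num / den               # ≈ 2.39, rounded to 2.4 in the text
b0 = y_mean - b1 * x_mean    # ≈ -4.92; with the rounded B1 = 2.4 it is -5.1

# Predicted revenue for a 100M investment, using the rounded coefficients:
print(-5.1 + 2.4 * 100)      # 234.9
```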

The OLS method helps us solve the problem when we have only one feature, but what if we have multiple features? It is not practical to follow the same method for predicting Y. In that case, we use Gradient Descent to solve the problem. [Please refer to my article on GD]
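For a feel of the idea, here is a bare-bones gradient descent sketch for the single-feature case; the data, learning rate, and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Made-up data for one feature x and target y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = 0.0, 0.0   # start with zero coefficients
lr = 0.01           # learning rate (step size)
n = len(x)

for _ in range(5000):
    y_pred = b0 + b1 * x
    # Gradients of MSE with respect to b0 and b1.
    grad_b0 = (-2 / n) * np.sum(y - y_pred)
    grad_b1 = (-2 / n) * np.sum((y - y_pred) * x)
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

print(f"B0 ≈ {b0:.3f}, B1 ≈ {b1:.3f}")  # should approach the OLS solution
```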

Now we have calculated Y (the estimated value), and we have the true ‘y’ value. How can we judge whether the value we estimated is accurate or not? For that, we have to calculate the error. We call this the Cost Function: it measures the extra cost added to the estimated value relative to the true value, and knowing it, we can use regularisation methods to reduce the error.

Cost function

Linear regression is a machine learning algorithm built to predict a target by training itself on a set of features. For instance, suppose you were asked to build a machine learning algorithm that could predict house rents in your city, so that someone who wants to relocate can easily find a house that fits their needs. You decide to build a linear regression model and predict the rents, but how would you know whether the predictions are correct? For this purpose, you use an error metric to measure the difference between the actual rents and the predicted rents. This error metric is what we call a cost function.

In linear regression, Mean Squared Error (MSE) is used to find the cost:

MSE = (1/n) * Σ(a - p)²

n is the number of values.

‘a’ is the actual rent price.

‘p’ is the rent price that the model has predicted.
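A direct translation of this formula into code might look like the following; the rent values are made up for illustration:

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error: (1/n) * Σ(a - p)²."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean((actual - predicted) ** 2)

actual_rents = [1200, 950, 1500, 1100]     # 'a': true rents
predicted_rents = [1150, 1000, 1450, 1180] # 'p': model's predictions
print(mse(actual_rents, predicted_rents))  # 3475.0
```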

If the error is small, then it is a good model; if it is too large, then we need to tune some hyperparameters to reduce it. For this, we use gradient descent.

To be continued…

About me

Please check my LinkedIn profile.
