Yahoo

This Python Code Could Save You From Spending Too Much on Your Next Laptop

A laptop with the Python download webpage open on Chrome.
Hannah Stryker / How-To Geek

If you're thinking of buying a new laptop, you might wonder how much you should pay for one. You could sift through websites, but some Python code and a little linear regression could make the job easier.

Why Build a Laptop Price Predictor?

You could search through thousands of pages and trawl through physical stores, but this would take a lot of time. I like computers, but I don't have the time or the inclination for this task. I have other things I'd rather be doing. I would like to have a program where I can enter specs, like how much RAM I want or the screen resolution I need, and have it spit out a price.

There are so many machines on the market that it would be difficult for me to compute all of this information by hand. I also thought that sharing this info with other people who might be in the market for a new laptop could be helpful. Who wants to overpay for a machine? I don't, and I can guess that you probably don't either.

With my knowledge of linear regression from statistics, I realized I could easily build a model to answer these questions. Python is a great language, and I already had some basic familiarity with it. It's become popular in data analysis because it's simple enough for people without computer science backgrounds to pick up yet offers powerful libraries to analyze data.

Assembling the Python Statistical Toolkit

For this project, I used pieces of the Python statistical ecosystem that I was already comfortable with.

I'd already set up a Mamba environment with these tools. While many systems, including Linux, include Python, that copy is meant for supporting the system and less for user programs. If you upgrade the system Python, you might find that scripts that depend on it will break. Tools such as Mamba or virtualenv let you set up a separate environment for your own projects instead.

The first component is NumPy. It's a popular library for all kinds of numerical operations, particularly statistical and linear algebra calculations that will happen in the background.

The next library you'll need is Pandas, which will let you import the dataset and view it in columns as a "data frame." It's a bit like a cross between a relational database and a spreadsheet. You can also perform some powerful manipulations on your data.

Seaborn is a library for viewing statistical data plots. I use it for visualizing data distributions in histograms, scatterplots, and linear regressions.

Finally, Pingouin lets me perform many statistical tests easily, without having to memorize all those formulas I forgot in my college statistics class years ago. This is the library that will build the model through multiple linear regression of the retail price vs. all the laptop attributes.

Putting all this together is simple in most Unix-like environments, including Windows using the Windows Subsystem for Linux. You can follow the instructions on the web page to install it.

Jupyter notebooks provide a relatively user-friendly way to run the Python commands and view the results, as well as store the results for later, but they're strictly optional. I created a Jupyter notebook, and will be demonstrating code examples from it. I've posted it to my GitHub, so you can see the code and some examples I couldn't cover in this article.

With Mamba installed, you can create the environment you need. Like a cooking show, I already had one prepared. To activate it, I type this at the Linux shell:

Acquiring the Laptop Data

To build the dataset for the regression model, I could trawl through internet stores and build up a comprehensive database of laptops. That would take a long time to build up, as well as clean the data so it would be consistent. Fortunately, someone has done that already.

There's a database of laptops available on Kaggle with hardware specs like CPU speed, amount of RAM, amount of storage, and horizontal and vertical screen resolutions.

The prices of the laptops were in euros, but a quick check on Xe.com in July 2025 showed that the euro and the US dollar were fairly close in value.

Building the Regression Model

With the environment assembled and the data acquired, now is the time to build the model. First, I have to import the libraries I'm going to use.

These lines import the NumPy, Pandas, Seaborn, and Pingouin libraries. NumPy, Pandas, Seaborn, and Pingouin are shortened to "np," "pd," "sns," and "pg." The line that starts with "%" is for use in Jupyter notebooks. It tells Jupyter to have Matplotlib, the library that draws the plots, display them within the notebook. Otherwise, they'll be displayed in a separate window.

Next, we'll import the data with Pandas:

This will create a Pandas data frame. We can see how the data is laid out with the head() method:

Output of laptops.head() in Jupyter notebook.
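
As a minimal sketch of those two steps, the snippet below reads a tiny made-up CSV from memory instead of the Kaggle file. The column names and values are placeholders invented for illustration, not the dataset's actual ones. With the real file, you would pass its path to read_csv() instead.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the downloaded CSV file.
# Column names are hypothetical, not the Kaggle dataset's real headers.
csv_text = """brand,ram_gb,cpu_ghz,inches,price_euros
Acme,8,2.5,15.6,650
Acme,16,2.8,15.6,980
Bolt,8,1.8,13.3,720
Bolt,32,3.0,17.3,2100
Core,4,1.6,14.0,380
Core,16,2.6,14.0,1150
"""

# read_csv() accepts a file path or any file-like object.
laptops = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default; pass a number for fewer.
first_rows = laptops.head(3)
print(first_rows)
```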

We can also see basic descriptive statistics of all the numerical columns with the describe() method.

Descriptive statistics of the numerical columns in the laptops dataset.

This will show the count, the mean, the standard deviation, the minimum value, the lower quartile or 25th percentile, the median, the upper quartile or 75th percentile, and the maximum value of each column.
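
Here is a small sketch of describe() on a hand-made data frame, so the rows of the summary are easy to check by eye. The numbers and column names are invented for illustration and do not come from the laptop dataset.

```python
import pandas as pd

# Made-up numbers for illustration; column names are hypothetical.
laptops = pd.DataFrame({
    "ram_gb": [4, 8, 8, 16, 16, 32],
    "price_euros": [380, 650, 720, 980, 1150, 2100],
})

# describe() summarizes every numerical column at once.
summary = laptops.describe()
print(summary)

# The row labeled "50%" is the median of each column.
median_price = summary.loc["50%", "price_euros"]
```

The summary rows are labeled count, mean, std, min, 25%, 50%, 75%, and max.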

I also like to visualize the distributions of data through histograms. Seaborn's displot does this.

To see how the prices are distributed:

Histogram of laptop prices in a Jupyter notebook.

This tells Seaborn to plot the prices along the x axis and to use the laptops data frame as the source. The distribution is noticeably skewed to the right, with a long tail of expensive machines.
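
The skew visible in a histogram can also be checked numerically. This is a rough sketch on randomly generated prices, not the real dataset: in a right-skewed distribution, a few expensive machines pull the mean above the median, and the skew() method in Pandas returns a positive value.

```python
import numpy as np
import pandas as pd

# Synthetic, log-normally distributed "prices" (an assumption made for
# illustration; these are not the real laptop prices).
rng = np.random.default_rng(42)
prices = pd.Series(rng.lognormal(mean=6.9, sigma=0.5, size=1000))

mean_price = prices.mean()
median_price = prices.median()
skewness = prices.skew()  # positive for a right-skewed distribution

print(f"mean: {mean_price:.0f}, median: {median_price:.0f}, skew: {skewness:.2f}")
```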

We'll build up a model that uses various specs. It'll look something like this:

price = a(CPU speed) + b(RAM) + c(size in inches) ...

The letters are stand-ins for the coefficients defined by the regression. It's similar to simple linear regression you might have seen, but instead of fitting a line over a scatterplot, you're fitting a plane. Since there are more than three dimensions in this model, it's actually a hyperplane.

To obtain the regression of price in euros vs the laptop size, CPU speed, screen size, weight, primary storage and secondary storage, use Pingouin's linear regression function:

Laptop multiple regression model in a Jupyter notebook.

This will give us the coefficients for this regression equation. The relimp= option will tell Pingouin to calculate how much each variable contributes to the price. The coefficients will be displayed in the "coef" column on the left, with the column on the far right telling us that RAM is the biggest predictor of price. The number to pay attention to in determining how good the fit is is the square of the correlation coefficient, which is "r2" in this table. It's around 0.66, which means the model explains about two-thirds of the variation in price. That's a pretty good fit.
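
To make the coefficients and "r2" less of a black box, here is a sketch of the same ordinary least squares idea written directly in NumPy. It is not the Pingouin call used in the notebook, and it runs on synthetic data with made-up coefficients and hypothetical variable names, so the numbers it prints say nothing about real laptop prices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic specs (hypothetical variables, not the real dataset).
ram_gb = rng.choice([4, 8, 16, 32], size=n).astype(float)
cpu_ghz = rng.uniform(1.5, 3.5, size=n)

# Made-up "true" relationship plus noise, so the fit has something to find.
price = 100 + 40 * ram_gb + 150 * cpu_ghz + rng.normal(0, 150, size=n)

# Design matrix: a column of ones for the intercept, then one column per spec.
X = np.column_stack([np.ones(n), ram_gb, cpu_ghz])

# Ordinary least squares: choose coefficients minimizing squared error.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# R squared: the share of the variance in price explained by the model.
predicted = X @ coef
ss_res = np.sum((price - predicted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("intercept, RAM, CPU coefficients:", np.round(coef, 1))
print("r2:", round(r2, 2))
```

Because the synthetic prices were generated from known coefficients, the fitted values should land close to them, which is a handy sanity check before trusting the same procedure on real data.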

With the predicted coefficients, we can now plug values into the equation to predict the price. Here's a function that does just that:

You should indent the second line, but the limitations of our system require me to present it this way.
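
As a sketch of what such a function can look like, the version below takes the fitted coefficients as a dictionary. The coefficient values and spec names are made up for illustration and are not the ones from the laptop model; you would substitute the numbers from your own regression table.

```python
# Hypothetical coefficients; replace them with the "coef" values
# from your own regression output.
COEFFICIENTS = {"intercept": 100.0, "ram_gb": 50.0, "cpu_ghz": 200.0}


def predict_price(specs, coefficients=COEFFICIENTS):
    """Return the intercept plus coefficient * value for every spec."""
    price = coefficients["intercept"]
    for name, value in specs.items():
        price += coefficients[name] * value
    return price


# 100 + 50 * 16 + 200 * 2.5
estimate = predict_price({"ram_gb": 16, "cpu_ghz": 2.5})
print(estimate)  # 1400.0
```

Keeping the coefficients in a dictionary means the function doesn't need to change when you add or drop a spec from the model.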

Do Prices Really Differ Among Brands?

This regression model only looks at specs. You might wonder if brand is really a predictor of price. We can use analysis of variance, or ANOVA, to determine if the differences among brands are significant. Because the price data was skewed, as seen with the histogram, a non-parametric test will be more accurate. Pingouin has a Kruskal-Wallis test that does this.

This will test the null hypothesis that there's no relationship between price and brand:

The rounded p-value is 0, which means that the difference in price among brands is indeed significant. The rounding was done to make the result easier to read. Otherwise, the p-value would be shown in scientific notation. This means that we can reject the null hypothesis and conclude that brand is a predictor of price.
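
For readers who want to see the mechanics, here is a sketch of a Kruskal-Wallis test using SciPy's kruskal function rather than Pingouin. The three brands and their prices are randomly generated for illustration, with deliberately different price levels, so the tiny p-value is built in and proves nothing about real brands.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic prices for three hypothetical brands with different price levels.
brand_a = rng.normal(500, 100, size=50)
brand_b = rng.normal(1000, 100, size=50)
brand_c = rng.normal(2000, 100, size=50)

# Kruskal-Wallis compares the groups by rank, so it does not assume
# normally distributed prices.
h_stat, p_value = stats.kruskal(brand_a, brand_b, brand_c)

print("H statistic:", round(h_stat, 1))
print("raw p-value:", p_value)               # shown in scientific notation
print("rounded p-value:", round(p_value, 4))  # rounds to 0.0
```

A p-value that rounds to 0 is not literally zero; it is just smaller than the number of decimal places shown.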


I was able to build a price predictor to help me decide what a fair price to pay for a machine would be based on its specifications, and to run a second test to determine how significant brand was. This shows the power of Python and its libraries to reduce something that might have been difficult to do by hand to a few lines of code.
