Visualizing Stock Volume: Building a Dashboard with Pandas

Transform raw stock data into a clean, human-readable format using Python and Pandas to prepare for data visualization. This article walks through importing a CSV file, filtering the top five stocks by trading volume, and formatting the data for clarity.

Key Insights

Imported stock data from a CSV file into a Pandas DataFrame and used the nlargest function to isolate the top five stocks by trading volume.
Formatted volume data by converting raw values into millions and rounding to one decimal place for improved readability.
Noble Desktop’s curriculum emphasizes the importance of data preparation—such as filtering, sorting, and formatting—as a critical step before building visualizations.

This lesson is a preview from our Data Analytics Certificate Online (includes software). Enroll in this course for detailed lessons, live instructor support, and project-based training.


Our ultimate goal here is to create a dashboard that visualizes the shares of total stock sales for the top five most popular stocks in this data set. Now, by popular we mean by volume: how many shares were sold during that period. And to narrow down to the top five most popular and sort them by volume, we first need to get our data into that format.

And as anyone who's worked with data knows, a lot of what you're doing, like 90% of it, is massaging your data, cleaning it up, getting it into the right format, filtering the data, all of that. So we'll start with that. But first, we need to actually get the data.

To do that, we're going to grab it from a local CSV file. Now, we have the file right here, stockprices.csv. We take a look at that. It's your typical CSV file, comma separated values.

And you're probably familiar with these if you've worked with data at all. Our first row is a list of columns. And every row after that is one piece of data with all of its many characteristics.

The first value is the stock's ticker. The next comma-separated value is the company name, then the date, then the open value (the price the stock opened at that day), etc. So we have 10 stocks here.

And what we're going to do is get this into Pandas and do our work to get our top five by volume. Okay. So above all this Dash code, let's write some data code, some Pandas work.

The very first thing we're going to do is import Pandas. So: import pandas as pd. As is typical, pd is the standard name for it in a file.

And we already installed Pandas, but we need to pull it into this file to say, yes, this is a variable called pd that we're going to use in this file. Okay. So now we'll use Pandas to import into a variable we'll call stocks.

And that will be a DataFrame, a two-dimensional table of data similar to a spreadsheet. It takes those comma-separated values and puts them in a data structure that we can work with a little more easily. So we use Pandas and its read_csv method, which reads from a CSV file.

All it needs is a string of the path to the file. Now, we're going to run this from our data visualization curriculum main folder, which means we refer to the file by its path from that directory. And as we can see here, it's in the notebook one directory, as stockprices.csv. That's its path from data visualization curriculum main.

So it's going to be notebook dash one, then a slash (or a backslash on Windows), then stockprices.csv. Great. Let's make sure we did it right with a quick sanity check. Did this work? We'll print stocks.
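Here is a minimal sketch of that loading step. The column names and sample rows are placeholders invented for illustration, not the contents of the real stockprices.csv, and the file is written to a temporary folder only so the snippet runs on its own. In the lesson's app.py, the path would instead point at the CSV inside the project folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; the real stockprices.csv has its own columns and rows.
sample_csv = (
    "Ticker,Company,Date,Open,Volume\n"
    "AAA,Alpha Corp,2024-01-02,10.5,80200000\n"
    "BBB,Beta Inc,2024-01-02,20.0,45500000\n"
    "CCC,Gamma LLC,2024-01-02,30.25,38200000\n"
)

# Write the sample to a temporary file so the example is self-contained.
csv_path = Path(tempfile.mkdtemp()) / "stockprices.csv"
csv_path.write_text(sample_csv)

# read_csv takes the path to the file and returns a DataFrame.
stocks = pd.read_csv(csv_path)

# Sanity check: this prints in the terminal where the script runs.
print(stocks)
```

The first row of the file becomes the column names, and every later row becomes one row of the DataFrame.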

Now, I've actually quit running the app. The server is no longer running. So I'm going to press the up arrow to get back my previous command, which is to run Python on app.py. Okay, it actually printed out twice.

But that's because it started the server and then ran it again, essentially. So here we have our full 10 stocks. Notice that they're printing out here in the terminal. They're not printing out where we're programming, like they might with a Jupyter notebook.

They're not printing out here in the browser, either. That makes sense, because this print is a sort of internal reporting tool. We don't want it displayed in the dashboard.

That's for our ultimate user facing dashboard. And so it prints out where we're executing the code here in the terminal. So it's waiting for our changes.

Let's make some changes. And then it will update automatically. We've loaded the data.

It's a Pandas DataFrame. Now, what we want is to format this data. I'm going to put the first step above the print so that when we save changes and it reloads, it will print out what we've got so far.

We're going to reassign stocks to be a new value. It will be the DataFrame we get back by calling nlargest on that DataFrame. nlargest gives you the top n rows of a DataFrame based on a specific column.

So we need to say how many of the largest we want and how to measure the largest. So the number we want is five, the five largest. That's our n. And largest based on what column? And the answer is the volume column.

All right. So we're reading our stocks in from our CSV. And then we're changing stocks to be the part of the data frame that is the nlargest, the five largest by volume.
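As a rough illustration of that call, here is nlargest on a small made-up DataFrame. The tickers, the volumes, and the column name "Volume" are assumptions for this sketch, not the lesson's actual data.

```python
import pandas as pd

# Hypothetical data standing in for the rows of the lesson's CSV.
stocks = pd.DataFrame(
    {
        "Ticker": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"],
        "Volume": [
            80_200_000, 1_000_000, 5_000_000, 38_200_000,
            45_500_000, 2_000_000, 40_000_000,
        ],
    }
)

# Keep the 5 rows with the largest values in the Volume column.
# The result is ordered from largest to smallest and keeps the original row labels.
stocks = stocks.nlargest(5, "Volume")

print(stocks)
```

Because the original row labels are kept, the index of the result is no longer in ascending order; it follows the ranking by volume.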

If I save these changes, it will reload here. You can see our previous printing here, and now here's our new printing. It's the five largest by volume, and nlargest has already sorted them for us from largest to smallest by volume.

So it's looking at this column right here, this volume column and seeing, okay, here's the top by volume. Here's the second by volume, the third by volume, the fourth by volume and the fifth by volume. And they're not in their original order.

Here's row zero, then four, then six, then three, then eight. But those are our top five by volume. And this is a pretty nice little feedback loop where we can write some code and when we're ready, save it, check on the results.

We've got our five largest, but when we show this on our dashboard, we don't want to print out the volume as whatever this number is. I can't even read this number because it doesn't have commas in it. It's hard to know what it is.

I did some quick counting of the digits there, and it's about 80 million. But that's not very user-friendly.

So here's what we're going to do. We're going to have the volumes be in millions. All we need to do is divide each one by a million.

So get 80 million, 45 million, 38 million, et cetera. Let's try that out. We'll say, all right, stocks, let's change its volume column to be the old value for volume divided by 1 million.

And I don't know if you're aware, but in Python, we can put in these little underscores. You're probably aware, but if you weren't, this is a great day for you to learn that, you know, we don't have to write the numbers in a way that's not human readable. We can write it this way and those underscores don't affect the number.

It's still 1 million, but it's much more readable. So we'll divide each value in the volume column by a million. Let's save changes and switch to the terminal.
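A small sketch of the division step, using invented volumes and an assumed column name "Volume":

```python
import pandas as pd

# Underscores in numeric literals do not change the value; they only aid readability.
one_million = 1_000_000
print(one_million == 1000000)

# Hypothetical volumes; the real values come from the lesson's CSV.
stocks = pd.DataFrame({"Volume": [80_234_567, 45_512_345, 38_198_765]})

# Dividing a column by a single number divides every value in that column.
stocks["Volume"] = stocks["Volume"] / 1_000_000

print(stocks)
```

The division turns the whole-number volumes into floating-point values with many decimal places, which is why rounding is the natural next step.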

All right, this is much more human readable, 80 million, 45 million, et cetera. But we have all these decimal numbers. We don't really need them.

Let's round each one to the nearest, maybe one past the decimal would be good, like 45.5 million, or maybe we'll make them all into integers. Let's do round and then we can play around with it, but we'll make one more change. We'll wrap this all in a round function.

Here we go. So here we're surrounding the calculation of each value of the volume column divided by a million. Now we're rounding it.

round takes in the number you want to round and how many places past the decimal point you want. I think I want zero past the decimal point. Let's see what that looks like.

80.0, 46.0. It's a little awkward with the point zeros. I'm going to go with rounding it to one place. Let's see what that looks like.

80.2, 45.5, 38.2. That seems like the right amount of rounding to do. I like that better. All right.
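To make the rounding step concrete, here is a sketch with invented volumes and an assumed "Volume" column. It uses Python's built-in round, which pandas supports on a column; calling the column's own .round(1) method should give the same result.

```python
import pandas as pd

# Hypothetical volumes; the real values come from the lesson's CSV.
stocks = pd.DataFrame({"Volume": [80_234_567, 45_512_345, 38_198_765]})

millions = stocks["Volume"] / 1_000_000

# Zero decimal places still leaves floats, so values display with a trailing .0
whole = round(millions, 0)
print(whole.tolist())

# One decimal place keeps a bit more detail.
stocks["Volume"] = round(millions, 1)
print(stocks["Volume"].tolist())
```

With these sample numbers, rounding to zero places gives 80.0, 46.0, 38.0, and rounding to one place gives 80.2, 45.5, 38.2.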

So we've got our stocks in the right format now, and we can do some real dashboard work with this: take what is a table printed out in the terminal and visualize it for people. We'll make some charts and graphs, not only to make the data more readable and digestible, since a picture is worth a thousand words, but also to understand it better ourselves. How much more is Apple in volume than the others? And we'll think of different ways to visualize that.

All right. We'll jump into some true data visualization in our next part.
