Installing a Python Library in Visual Studio Code — Windows
In this quick blog post, I will share the steps you can follow to install a Python library with pip, using either the Terminal or a Jupyter Notebook in Visual Studio Code (VS Code) on a Windows computer.
Prerequisites
To complete the steps in this blog post, you need to install the following on your Windows computer:
- Visual Studio Code: you can find the steps to install it here.
- Python Extension for Visual Studio Code: you can find the steps to install it here.
- Python Interpreter: you can find the steps to install it here.
Installing a Python Library Using the Terminal in VSCode
1) Accessing Visual Studio Code Terminal
- Open the VS Code application.
- Go to the Terminal menu and select New Terminal.
- A new PowerShell-based terminal window opens.
2) Installing a Python Library
- Run the following command to validate that pip is installed on your computer:
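```
pip --version
```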
- Let us say that you want to install the Pandas Python library.
- Run the following command:
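```
pip install pandas
```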
- The Pandas library is now ready to be imported by any Python application. You can repeat this process for any Python library.
Installing a Python Library Using a Jupyter Notebook in VSCode
1) Creating a Jupyter Notebook in VSCode
- Create a Jupyter Notebook following the steps of My First Jupyter Notebook on Visual Studio Code (Python kernel)
2) Installing a Python Library
- Run the following command to validate that pip is installed on your computer:
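In a notebook cell, prefix the shell command with an exclamation mark:

```
!pip --version
```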
- Let us say that you want to install the Pandas Python library.
- Run the following command:
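The `%pip` magic is a safe choice here, since it installs into the environment backing the active kernel:

```
%pip install pandas
```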
Data Science in VS Code tutorial
This tutorial demonstrates using Visual Studio Code and the Microsoft Python extension with common data science libraries to explore a basic data science scenario. Specifically, using passenger data from the Titanic, you will learn how to set up a data science environment, import and clean data, create a machine learning model for predicting survival on the Titanic, and evaluate the accuracy of the generated model.
Prerequisites
The following installations are required for the completion of this tutorial. Make sure to install them if you haven’t already.
The Python extension for VS Code and Jupyter extension for VS Code from the Visual Studio Marketplace, along with Miniconda. For more details on installing extensions, see Extension Marketplace. Both extensions are published by Microsoft.
Note: If you already have the full Anaconda distribution installed, you don’t need to install Miniconda. Alternatively, if you’d prefer not to use Anaconda or Miniconda, you can create a Python virtual environment and install the packages needed for the tutorial using pip. If you go this route, you will need to install the following packages: pandas, jupyter, seaborn, scikit-learn, keras, and tensorflow.
Set up a data science environment
Visual Studio Code and the Python extension provide a great editor for data science scenarios. With native support for Jupyter notebooks combined with Anaconda, it’s easy to get started. In this section, you will create a workspace for the tutorial, create an Anaconda environment with the data science modules needed for the tutorial, and create a Jupyter notebook that you’ll use for creating a machine learning model.
Begin by creating an Anaconda environment for the data science tutorial. Open an Anaconda command prompt and run conda create -n myenv python=3.10 pandas jupyter seaborn scikit-learn keras tensorflow to create an environment named myenv. For additional information about creating and managing Anaconda environments, see the Anaconda documentation.
Next, create a folder in a convenient location to serve as your VS Code workspace for the tutorial, and name it hello_ds.
Open the project folder in VS Code by running VS Code and using the File > Open Folder command. You can safely trust opening the folder, since you created it.
Once VS Code launches, create the Jupyter notebook that will be used for the tutorial. Open the Command Palette (Ctrl+Shift+P on Windows/Linux, ⇧⌘P on macOS) and select Create: New Jupyter Notebook.
Note: Alternatively, from the VS Code File Explorer, you can use the New File icon to create a notebook file named hello.ipynb.
Save the file as hello.ipynb using File > Save As.
After your file is created, you should see the open Jupyter notebook in the notebook editor. For additional information about native Jupyter notebook support, you can read the Jupyter Notebooks topic.
Now select Select Kernel at the top right of the notebook.
Choose the Python environment you created above in which to run your kernel.
To manage your environment from VS Code's integrated terminal, open it with Ctrl+` (⌃` on macOS). If your environment is not activated, you can do so as you would in your terminal (conda activate myenv).
Prepare the data
This tutorial uses the Titanic dataset available on OpenML.org, which is obtained from Vanderbilt University’s Department of Biostatistics at https://hbiostat.org/data. The Titanic data provides information about the survival of passengers on the Titanic and characteristics about the passengers such as age and ticket class. Using this data, the tutorial will establish a model for predicting whether a given passenger would have survived the sinking of the Titanic. This section shows how to load and manipulate data in your Jupyter notebook.
To begin, download the Titanic data from hbiostat.org as a CSV file (download links in the upper right) named titanic3.csv and save it to the hello_ds folder that you created in the previous section.
If you haven't already opened the file in VS Code, open the hello_ds folder and the Jupyter notebook (hello.ipynb) by going to File > Open Folder.
Within your Jupyter notebook, begin by importing the pandas and numpy libraries, two common libraries used for manipulating data, and loading the Titanic data into a pandas DataFrame. To do so, copy the code below into the first cell of the notebook. For more guidance about working with Jupyter notebooks in VS Code, see the Working with Jupyter Notebooks documentation.
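A sketch of that first cell, assuming titanic3.csv sits next to the notebook:

```python
import pandas as pd
import numpy as np

# Load the Titanic passenger data into a DataFrame
data = pd.read_csv('titanic3.csv')
data.head()
```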
Now, run the cell using the Run cell icon or the Shift+Enter shortcut.
After the cell finishes running, you can view the data that was loaded using the Variables Explorer and Data Viewer. First select the Variables icon in the notebook’s upper toolbar.
A JUPYTER: VARIABLES pane will open at the bottom of VS Code. It contains a list of the variables defined so far in your running kernel.
To view the data in the Pandas DataFrame previously loaded, select the Data Viewer icon to the left of the data variable.
Use the Data Viewer to view, sort, and filter the rows of data. After reviewing the data, it can then be helpful to graph some aspects of it to help visualize the relationships between the different variables.
Before the data can be graphed, you need to make sure that there aren’t any issues with it. If you look at the Titanic csv file, one thing you’ll notice is that a question mark ("?") was used to identify cells where data wasn’t available.
While Pandas can read this value into a DataFrame, the result for a column like age is that its data type will be set to object instead of a numeric data type, which is problematic for graphing.
This problem can be corrected by replacing the question mark with a missing value that pandas is able to understand. Add the following code to the next cell in your notebook to replace the question marks in the age and fare columns with the numpy NaN value. Notice that we also need to update the column’s data type after replacing the values.
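```python
# Replace the '?' placeholders with NaN, then give age and fare numeric types
data.replace('?', np.nan, inplace=True)
data = data.astype({"age": np.float64, "fare": np.float64})
```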
Tip: To add a new cell, you can use the insert cell icon in the bottom left corner of an existing cell. Alternatively, you can press Esc to enter command mode, followed by the B key.
Note: If you ever need to see the data type that has been used for a column, you can use the DataFrame dtypes attribute.
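For example:

```python
data.dtypes
```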
Now that the data is in good shape, you can use seaborn and matplotlib to view how certain columns of the dataset relate to survivability. Add the following code to the next cell in your notebook and run it to see the generated plots.
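A sketch of such a cell; the specific plot types chosen here are one reasonable option:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# One subplot per input variable, each plotted against survival
fig, axs = plt.subplots(ncols=5, figsize=(30, 5))
sns.violinplot(x="survived", y="age", hue="sex", data=data, ax=axs[0])
sns.pointplot(x="sibsp", y="survived", hue="sex", data=data, ax=axs[1])
sns.pointplot(x="parch", y="survived", hue="sex", data=data, ax=axs[2])
sns.pointplot(x="pclass", y="survived", hue="sex", data=data, ax=axs[3])
sns.violinplot(x="survived", y="fare", hue="sex", data=data, ax=axs[4])
```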
Tip: To quickly copy your graph, you can hover over the upper right corner of your graph and click on the Copy to Clipboard button that appears. You can also better view details of your graph by clicking the Expand image button.
These graphs are helpful in seeing some of the relationships between survival and the input variables of the data, but it’s also possible to use pandas to calculate correlations. To do so, all the variables used need to be numeric for the correlation calculation and currently gender is stored as a string. To convert those string values to integers, add and run the following code.
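One simple way to do the conversion:

```python
# Map the string labels to integers so they can be used in correlations
data.replace({'male': 1, 'female': 0}, inplace=True)
```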
Now, you can analyze the correlation between all the input variables to identify the features that would be the best inputs to a machine learning model. The closer a value is to 1, the higher the correlation between the value and the result. Use the following code to correlate the relationship between all variables and survival.
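```python
# Absolute correlation of each numeric column against survival
# (numeric_only=True skips the remaining string columns; pandas >= 1.5)
data.corr(numeric_only=True).abs()[["survived"]]
```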
Looking at the correlation results, you’ll notice that some variables like gender have a fairly high correlation to survival, while others like relatives (sibsp = siblings or spouse, parch = parents or children) seem to have little correlation.
Let’s hypothesize that sibsp and parch are related in how they affect survivability, and group them into a new column called "relatives" to see whether the combination of them has a higher correlation to survivability. To do this, you will check if for a given passenger, the number of sibsp and parch is greater than 0 and, if so, you can then say that they had a relative on board.
Use the following code to create a new variable and column in the dataset called relatives and check the correlation again.
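```python
# 1 if the passenger had any siblings/spouses or parents/children aboard, else 0
data['relatives'] = data.apply(
    lambda row: int((row['sibsp'] + row['parch']) > 0), axis=1)

data.corr(numeric_only=True).abs()[["survived"]]
```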
You'll notice that, in fact, when looked at from the standpoint of whether a person had relatives, rather than how many, there is a higher correlation with survival. With this information in hand, you can now drop the low-value sibsp and parch columns from the dataset, as well as any rows that had NaN values, to end up with a dataset that can be used for training a model.
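A single line handles both steps:

```python
# Keep only the useful columns and drop rows with missing values
data = data[['sex', 'pclass', 'age', 'relatives', 'fare', 'survived']].dropna()
```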
Note: Although age had a low direct correlation, it was kept because it seems reasonable that it might still have correlation in conjunction with other inputs.
Train and evaluate a model
With the dataset ready, you can now begin creating a model. For this section, you’ll use the scikit-learn library (as it offers some useful helper functions) to do pre-processing of the dataset, train a classification model to determine survivability on the Titanic, and then use that model with test data to determine its accuracy.
A common first step to training a model is to divide up the dataset into training and validation data. This allows you to use a portion of the data to train the model and a portion of the data to test the model. If you used all your data to train the model, you wouldn’t have a way to estimate how well it would actually perform against data the model hasn’t yet seen. A benefit of the scikit-learn library is that it provides a method specifically for splitting a dataset into training and test data.
Add and run a cell with the following code in the notebook to split up the data.
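```python
from sklearn.model_selection import train_test_split

# Hold back 20% of the rows as a test set
# (the variable names here are one reasonable choice)
x_train, x_test, y_train, y_test = train_test_split(
    data[['sex', 'pclass', 'age', 'relatives', 'fare']],
    data.survived, test_size=0.2, random_state=0)
```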
Next, you'll normalize the inputs such that all features are treated equally. For example, within the dataset the values for age range from 0 to 100, while gender is only a 1 or 0. By normalizing all the variables, you can ensure that the ranges of values are all the same. Use the following code in a new code cell to scale the input values.
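```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training inputs only, then apply it to both sets
sc = StandardScaler()
X_train = sc.fit_transform(x_train)
X_test = sc.transform(x_test)
```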
There are many different machine learning algorithms that you could choose from to model the data. The scikit-learn library also provides support for many of them and a chart to help select the one that’s right for your scenario. For now, use the Naïve Bayes algorithm, a common algorithm for classification problems. Add a cell with the following code to create and train the algorithm.
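```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
```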
With a trained model, you can now try it against the test data set that was held back from training. Add and run the following code to predict the outcome of the test data and calculate the accuracy of the model.
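```python
from sklearn import metrics

predict_test = model.predict(X_test)
print(metrics.accuracy_score(y_test, predict_test))
```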
Looking at the result of the test data, you'll see that the trained algorithm had a 75% success rate at estimating survival.
(Optional) Use a neural network
A neural network is a model that uses weights and activation functions, modeling aspects of human neurons, to determine an outcome based on provided inputs. Unlike the machine learning algorithm you looked at previously, neural networks are a form of deep learning wherein you don’t need to know an ideal algorithm for your problem set ahead of time. It can be used for many different scenarios and classification is one of them. For this section, you’ll use the Keras library with TensorFlow to construct the neural network, and explore how it handles the Titanic dataset.
The first step is to import the required libraries and to create the model. In this case, you’ll use a Sequential neural network, which is a layered neural network wherein there are multiple layers that feed into each other in sequence.
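A minimal first cell for this:

```python
from keras.models import Sequential
from keras.layers import Dense, Input

model = Sequential()
```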
After defining the model, the next step is to add the layers of the neural network. For now, let’s keep things simple and just use three layers. Add the following code to create the layers of the neural network.
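A sketch of the layers; the explicit Input layer is added here for portability across Keras versions:

```python
model.add(Input(shape=(5,)))               # five inputs: sex, pclass, age, relatives, fare
model.add(Dense(5, activation='relu'))     # first layer
model.add(Dense(5, activation='relu'))     # middle layer, kept at 5 for simplicity
model.add(Dense(1, activation='sigmoid'))  # output: probability of survival
```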
- The first layer will be set to have a dimension of 5, since you have five inputs: sex, pclass, age, relatives, and fare.
- The last layer must output 1, since you want a 1-dimensional output indicating whether a passenger would survive.
- The middle layer was kept at 5 for simplicity, although that value could have been different.
The rectified linear unit (relu) activation function is used as a good general activation function for the first two layers, while the sigmoid activation function is required for the final layer as the output you want (of whether a passenger survives or not) needs to be scaled in the range of 0-1 (the probability of a passenger surviving).
You can also look at the summary of the model you built with this line of code:
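```python
model.summary()
```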
Once the model is created, it needs to be compiled. As part of this, you need to define what type of optimizer will be used, how loss will be calculated, and which metric should be optimized for. Add the following code to build and train the model, and note the accuracy reported as training runs.
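A sketch; the optimizer, batch size, and epoch count here are typical choices rather than requirements:

```python
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=50)
```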
Note: This step may take anywhere from a few seconds to a few minutes to run depending on your machine.
Now that the model is built and trained, we can see how it works against the test data.
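For example, thresholding the predicted probabilities at 0.5:

```python
# Convert predicted probabilities to 0/1 labels and score them
y_pred = (model.predict(X_test) > 0.5).astype(int).flatten()
print(metrics.accuracy_score(y_test, y_pred))
```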
Similar to the training, you’ll notice that you now have 79% accuracy in predicting survival of passengers. Using this simple neural network, the result is better than the 75% accuracy from the Naive Bayes Classifier tried previously.
Next steps
Now that you’re familiar with the basics of performing machine learning within Visual Studio Code, here are some other Microsoft resources and tutorials to check out.
How do I install pandas in Visual Studio Code?
I want to read an Excel CSV file, and after researching, I realized I need to import pandas as pd. Is there a way to install it in Visual Studio Code? I have tried typing import pandas as pd, but it shows a red line. I'm still new to Python.
7 Answers
I think the above answers are very well put already; just to add to that:
Windows:
1. Open a terminal.
2. Type python -m pip install pandas.
3. Restart Visual Studio Code.
Linux or macOS:
1. Open a terminal.
2. Type pip install pandas.
3. Restart Visual Studio Code.
As pandas is a Python library, you can install it using pip, Python's package management system. If you are using Python 2 >= 2.7.9 or Python 3 >= 3.4, pip is already installed with your Python. Ensure that Python has been added to PATH.
Then, to install pandas, simply run:
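```
pip install pandas
```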
You can install it using pip:

```
pip install pandas
```
For anyone else in a similar situation, I’d recommend following along with this VS Code official tutorial.
It guides you to use Conda instead of pip and to set up a Python environment, along with installing various packages like Pandas, Jupyter, etc.
For example, after installing the Python extension for VSCode and Miniconda or Anaconda:
In the terminal in VS Code, check and make sure Python is installed:
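```
py --version
```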
Then you can install libraries with:

```
py -m pip install packagename
```
This was a simple solution I came up with since the others weren’t working on my system. Hopefully this helps!
I also had the same question. As a newbie, I did not understand the answer. Perhaps these notes will help others in the same boat.
You need to type this into the command prompt (not Visual Studio or Python): pip install pandas
Before you do that, you must "Ensure that Python has been added to PATH". This did not make sense to me at first, but there are pages on this if you Google it.
Also useful to know: CMD and Terminal = Command Prompt (please correct me if that’s not true).
Hopefully this helps others. Thanks
You need to start off by installing Anaconda in order to create an environment for Pandas; you can manage this environment with Anaconda. Go to your terminal, then run conda create -n myenv python=3.9 pandas jupyter seaborn scikit-learn keras tensorflow. It will create an environment containing all of the libraries mentioned above. PS: this is an old post; please check Python's latest version.
After that, click on the kernel name (top right) and choose the environment that is associated with Anaconda.
How to Set up Python and Visual Studio Code IDE for Data Science
Setting up Python and running it smoothly on your PC is essential for data analytics or computational work. With advancements in open-source package managers, this has become simple and straightforward. In this tutorial, we will go through the whole process, from downloading and installing Python to setting up a custom environment for personal projects.
The entire process involves downloading Miniconda (a Python library manager) and Visual Studio Code, installing them, and creating an environment for managing and handling project libraries.
Why Miniconda?
Because it is small in size, easy to download, and contains only the required libraries and dependencies, which keeps installation time to a minimum.
Why Visual Studio Code?
Because VS Code is one of the best integrated development environments (IDEs), with a polished look and up-to-date functionality, used by millions of developers worldwide. It is maintained by Microsoft and is absolutely free to use.
Article Outline
- Downloading and installing Miniconda
- Downloading and installing Visual Studio Code
- Setting up a new environment
- Check for code execution
Let’s start with the set-up process.
1. Downloading and Installing Miniconda
Step 1.1: The very first step is to download Miniconda. Miniconda is the smaller installer version of Anaconda. It includes only conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib, and a few others.
Use the following link to download Miniconda. Download the version that includes Python 3.8 or above, based on your operating system and its configuration (32-bit or 64-bit).
Step 1.2: Once the downloading is complete, then start the installation process. It will first show the welcome page.
Click “Next >” button.
Step 1.3: The next step is to accept the license agreement by clicking “I Agree”.
Step 1.4: Once you accept the license agreement, then you need to select the installation type. For personal use, select “Just Me” and click “Next >”
Step 1.5: In the next step, select the directory where you would like to install Miniconda. If your default location has enough memory, then proceed with that directory.
Step 1.6: Next we need to set the following:
- Setting the path variable: tick “Add Miniconda3 to my PATH environment variable”.
- Next, tick “Register Miniconda3 as my default Python 3.9”. This sets Miniconda's Python as the primary Python 3.9 on the system. We can change it later during the customized Miniconda environment setup.
Step 1.7: Click the install button to start the Miniconda installation in your PC.
Step 1.8: Once the installation is done, it will show the “Installation Complete” on top of the installation page.
Click “Next >” to finish the installation process.
To finish the process, uncheck the remaining boxes (they are not that important) and click “Finish”.
2. Downloading and Installing VS Code
Once the Miniconda installation is complete, you can proceed with the Visual Studio Code installation.
Step 2.1: First, visit the following website to download the desired version of the VS Code.
Here, in this blog, I will go with the Windows 64-bit version. You can proceed with Windows, Linux, or Mac, whichever is your preferred operating system. The process is almost identical.
Step 2.2: Once you start the installation, it will first ask you to accept the license agreement.
After accepting the license agreement, click “Next >” to proceed to the next page.
Step 2.3: In the next step, select the directory where you would like to install VS Code. If your default location has enough memory, then proceed with that directory.
Step 2.4: On this page, the installer informs you that it will create a shortcut in the Start Menu folder.
Click “Next>” and proceed to the next page.
Step 2.5: Next tick the following boxes as illustrated in the image below.
Thereafter, proceed with “Next>”.
Step 2.6: On this page, the installer informs us that the application is now ready to be installed on your computer with all the settings we selected initially.
Proceed with “install” to begin the installation process.
Here, is a screenshot of the installation process.
Step 2.7: Now we have reached the final page, which is showing that the installation process is complete. We can now click the finish button to launch VS Code IDE.
This is what the VS Code home page looks like.
Step 2.8: To run Python in VS Code smoothly, we need to install the Python extension provided by Microsoft. It offers IntelliSense (Pylance), linting, debugging, code navigation, code formatting, refactoring, a variable explorer, a test explorer, and more!
To install it:
- First click the four dots menu on the left side called “Extensions”.
- Then type Python in the search bar (it requires internet connection).
- Look for the Python by Microsoft.
- Click it and look on the right side for install button.
- Click on the install button.
Here is a screenshot of the extension page after installation.
Step 2.9: Next, click on the “Explorer” located on the left side menu (top one).
- Next, press Ctrl + Shift + P to open the Command Palette. The Command Palette is the menu from which any functionality of VS Code can be set or altered.
- Next, type “Select Interpreter” into the search box. Once the option shows, click on it and wait for a few seconds.
After waiting a few seconds, it will show all the available Python interpreters. In the image below, it is showing Python 3.9.7 as the base interpreter. This is the default that was installed with Miniconda, and we could use it to run Python code.
But occasionally we need to create a separate environment for running Python, especially for a group project. This is where Python environments come in.
3. New Environment Set-up
Now, you might be curious: why do we want a separate Python environment?
Because we need it to manage Python packages for different projects. A Python environment allows us to avoid installing packages globally, which could break system tools or other projects.
Now let’s begin the environment set-up.
Step 3.1: First, go to the Windows menu and look for “Anaconda Prompt (Miniconda3)”.
Click it to open it in a separate window.
Once you open it, it looks like the following, where:
- (base) indicates that we are now in the base environment (under Miniconda).
Step 3.2: We want to create a new environment to manage all our libraries, so let's create one.
To create a new environment, we run the following command, entering your environment name in place of “yourenvname”. Our base environment is Python 3.9.7 (base), but let's assume that we specifically need Python version 3.8 for our new environment. So, at the end of the command, we will add the Python version: “python=3.8”.
Say I would like to name my new virtual environment “datascience” and want Python 3.8 as my default Python version. To create a virtual environment with this configuration, we just run the following command in the command prompt:
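```
conda create -n datascience python=3.8
```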
Once you press Enter and execute the command, it will prompt you to install various default packages (dependencies) in the new environment.
- Type “y” (which means “yes”) and press Enter. Wait until the installation is done.
After the installation, you can check the different environments available under Miniconda.
- Type “conda env list”
It will print the existing environment names. You can see that there are two environments: the base, and our newly created environment, datascience.
Step 3.3: Now we are ready with our new conda environment called “datascience”. Before we use this environment to run code in VS Code, we need to install some basic libraries/packages inside our “datascience” environment.
To install packages inside our datascience environment, we need to first activate it.
Just type “conda activate datascience” (without quotes) in the command prompt. You can now observe that the environment name in parentheses is (datascience) instead of (base), which indicates that our new environment, “datascience”, is activated.
Step 3.4: As the datascience environment is now activated, we can install different packages inside this environment that we are going to utilize for data analysis.
To install any package (available from the Anaconda repositories), we run the following command, replacing “library_name” with the real library name:
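```
conda install library_name
```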
First, we need to install the ipykernel library so that we can run Jupyter notebooks inside VS Code, using the following command:
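```
conda install ipykernel
```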
Similarly, we need to install pandas (for data wrangling), along with matplotlib, seaborn, and plotly (for plotting/data visualization).
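For example, in one command:

```
conda install pandas matplotlib seaborn plotly
```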
Here is a snapshot of pandas installation.
Note: While installing libraries, conda may ask you to install additional dependencies (upon which the current library depends). If it lists the packages to be installed and displays “y/n”, press “y” (yes) to install all dependencies. Afterward, just wait until all dependencies are installed in the activated “datascience” environment.
Step 3.5: Next, open VS Code and press Ctrl + Shift + P to open the Command Palette. Type “Select Interpreter”, click on it, and wait for a few seconds.
Now, it will show all the available interpreters. You can observe that it is showing Python 3.9.7 (base: conda) and our newly created environment Python 3.8.11 (datascience: conda).
- Select Python 3.8.11 (datascience: conda), as we are going to set it as the default environment for our current work or project.
Step 3.6: Next, we will create a new Jupyter Notebook so that we can test that our code runs in the newly created environment.
What is a Jupyter Notebook?
A Jupyter notebook is a web-based application used by research scholars, data engineers, data analysts, machine learning scientists, scientific researchers, or any general user who wants to do scientific computation, data processing, or visualization work.
- To create a new Jupyter notebook (also known as an IPython notebook), just open the Command Palette (Ctrl + Shift + P), type “New Jupyter Notebook” (without quotes), and click it to create a new notebook.
The snapshot below shows a blank Jupyter Notebook with an empty cell.
Step 3.7: First, we will check whether the datascience environment is working fine or not.
- Type 1+1 in the cell and run it by pressing the triangular “run” arrow on the left side of the cell. You can also run the cell with Ctrl + Enter. If it produces an answer of 2, then it is working well.
- Next, let's check whether the libraries were properly installed under the datascience environment. First, we need to create a new cell where we can test the libraries. To add a new cell, press Esc to enter command mode, then the B key (B for below). Now we will import the pandas library in the current Jupyter Notebook session. Type “import pandas as pd” (without quotes) in the new cell, then run the cell. If it runs without producing any error, then our environment is working fine.
Now our environment is ready, and we can use it to run code and perform data analysis.
Note: You can create unlimited environments based on your project requirements.
I hope you learned something new!
If you learned something new and liked this article, share it with your friends and colleagues. If you have any suggestions, drop a comment.