Learning Guy

This course content is AI generated

Advanced

Matplotlib, Seaborn, Pandas

Advanced Data Visualization with Python: From CSV to Publication-Ready Graphics

Master professional data visualization workflows using pandas, matplotlib, and seaborn, covering theory, best practices, and end-to-end project implementation from raw data to final visualizations.

Advanced Python Visualization Ecosystem Setup

Learning Outcomes

By the end of this chapter, you will be able to:

  • Create and manage isolated virtual environments for Python visualization projects.
  • Configure Jupyter to use specific environment kernels to avoid dependency conflicts.
  • Apply standardized import conventions and global configurations for consistent plotting.
  • Manage Matplotlib backends and troubleshoot common installation and environment issues.

Concepts

Environment Isolation

Professional visualization workflows require strict dependency management to ensure reproducibility. Installing every project's packages into the global Python environment leads to version conflicts, because different projects often need different versions of the same library. Use a separate virtual environment per project to isolate the visualization stack.

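  • venv: The virtual-environment module in Python's standard library. It is lightweight and sufficient when every dependency installs from pip wheels, as pandas, NumPy, Matplotlib, and seaborn do on common platforms.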
  • Conda: Preferred for scientific packages as it handles complex binary dependencies (like HDF5) better than pip.

Warning: Avoid mixing pip and conda to install packages whenever possible. Using pip inside a Conda environment can lead to broken dependencies and conflicts that are difficult to resolve. Stick to one package manager for a specific environment.

Always activate the environment before installing packages or launching Jupyter.
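To confirm from inside Python that an environment is actually active, you can compare interpreter prefixes. The sketch below is a heuristic rather than an official API: it assumes that a venv makes sys.prefix differ from sys.base_prefix, and that an activated Conda environment sets the CONDA_PREFIX variable (which is also set for Conda's base environment). The function name describe_environment is hypothetical.

```python
import os
import sys


def describe_environment() -> str:
    """Heuristically report which isolated environment (if any) this interpreter uses."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    if sys.prefix != sys.base_prefix:
        return f"venv: {sys.prefix}"
    # Conda environments are full installs, so the prefixes match;
    # instead, an activated Conda environment sets CONDA_PREFIX.
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        return f"conda: {conda_prefix}"
    return f"no isolated environment detected (base Python at {sys.prefix})"


print(describe_environment())
```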

Jupyter Kernel Architecture

Jupyter operates on a client-server model. The Kernel (backend) runs Python code, while the Frontend (browser) renders output. A common issue is installing a library in the global Python but launching a Jupyter Kernel from a virtual environment, or vice versa. This results in ModuleNotFoundError in the notebook even if the package is installed elsewhere on the system.

Solution: Install ipykernel in the target environment and register it as a Jupyter kernel. This allows you to select the specific environment explicitly from the Jupyter interface, ensuring the code runs with the correct dependencies.
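The important detail is that the kernel is registered by the same interpreter that holds your packages. A minimal sketch, assuming you run it from inside the target environment: the hypothetical helper kernel_install_command builds the ipykernel install command around sys.executable, so the kernel cannot accidentally point at a different Python. It only prints the command; actually running it requires ipykernel in that environment.

```python
import shlex
import sys


def kernel_install_command(name: str, display_name: str) -> list[str]:
    """Build the ipykernel registration command for the *current* interpreter."""
    # sys.executable ties the kernel to the environment this code runs in,
    # rather than to whatever "python" happens to be first on the PATH.
    return [
        sys.executable, "-m", "ipykernel", "install",
        "--user", "--name", name, "--display-name", display_name,
    ]


cmd = kernel_install_command("vis_env", "Python (Vis Env)")
print(shlex.join(cmd))
# To register it for real (requires ipykernel in this environment):
# import subprocess; subprocess.run(cmd, check=True)
```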

Import Standardization

To ensure code consistency, follow the standard import conventions used by the libraries' documentation. This reduces cognitive load and errors.

  • Pandas: import pandas as pd
  • NumPy: import numpy as np
  • Matplotlib Pyplot: import matplotlib.pyplot as plt
  • Seaborn: import seaborn as sns
  • Seaborn Objects API: import seaborn.objects as so

Global Configuration (Theming)

Seaborn and Matplotlib allow global configuration changes that persist for the duration of the session. sns.set_theme() applies a style (e.g., whitegrid) and context (e.g., talk, paper) which automatically scales fonts and lines. This ensures visual consistency across all plots without manual styling in every command.
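The effect of a context can be inspected before committing to it. This sketch compares the font sizes that the paper and talk contexts would apply (calling sns.plotting_context on its own only returns the settings), then applies a theme for the session and reads back a few Matplotlib rcParams. Exact values depend on your seaborn version, so treat the printed numbers as illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Inspect what each context would set; nothing is applied yet
paper = sns.plotting_context("paper")
talk = sns.plotting_context("talk")
print("paper font.size:", paper["font.size"], "| talk font.size:", talk["font.size"])

# Apply style + context globally for the rest of the session
sns.set_theme(style="whitegrid", context="talk")
print("axes.grid:", plt.rcParams["axes.grid"])
print("font.size:", plt.rcParams["font.size"])
```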

Backend Management

In both the classic Jupyter Notebook and JupyterLab, Matplotlib uses the inline backend (static images) by default. For interactive figures, switch to the widget backend with the %matplotlib widget magic.

  • Inline: Renders the plot as a static PNG/JPEG in the output cell. Fast and portable, but not interactive.
  • Widget: Renders an interactive figure that supports zooming, panning, and rotating (if 3D). This requires the ipympl package (pip install ipympl) and works best in JupyterLab or VS Code.

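Choose the backend before creating any figures: in a notebook, use the %matplotlib magic (%matplotlib inline or %matplotlib widget); in a script, call matplotlib.use() before importing matplotlib.pyplot. Switching backends mid-session can cause errors or unpredictable behavior, so if you need a different backend, restart the kernel and select it at the top of the notebook.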
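Outside Jupyter, for example in a batch script or on a server without a display, the usual approach is to select the non-interactive Agg backend before importing pyplot. A minimal sketch, assuming a plain Python script rather than a notebook (in a notebook, prefer the %matplotlib magics so you do not disable inline rendering):

```python
import io

import matplotlib

# Select the non-interactive Agg backend *before* importing pyplot.
# Suitable for scripts, CI jobs, or servers with no display.
matplotlib.use("Agg")
import matplotlib.pyplot as plt  # noqa: E402  (import after backend selection)

print("Active backend:", matplotlib.get_backend())

# Render a figure to an in-memory PNG to confirm the backend works
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
print("PNG bytes written:", buffer.getbuffer().nbytes)
```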

Examples

1. Environment Setup (Shell/Terminal)

Creating and populating a professional environment.

# 1. Create the virtual environment
python -m venv vis_env

# 2. Activate it
# Command for Windows:
vis_env\Scripts\activate
# Command for Mac/Linux:
source vis_env/bin/activate

# 3. Install Jupyter and libraries
# Using pip (standard for venv)
pip install jupyter pandas matplotlib seaborn numpy ipykernel

# 4. Register the kernel for Jupyter
python -m ipykernel install --user --name=vis_env --display-name="Python (Vis Env)"
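The same environment can also be created from Python with the standard-library venv module, which is handy in a project bootstrap script. This sketch builds a throwaway environment in a temporary directory; the name vis_env mirrors the shell example, and with_pip=False is chosen only to keep it fast. A real project would pass with_pip=True and then install packages with the new environment's own pip.

```python
import pathlib
import tempfile
import venv

# Create a throwaway environment inside a temporary directory.
# with_pip=False skips bootstrapping pip; use with_pip=True for real projects.
target = pathlib.Path(tempfile.mkdtemp()) / "vis_env"
venv.create(target, with_pip=False)

# Every venv has a pyvenv.cfg marker file at its root
print((target / "pyvenv.cfg").read_text())
```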

2. Global Configuration Script

Create a file named config.py to run at the start of every notebook to enforce consistency.

# config.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Global Seaborn Theme
# "context" scales plot elements (font, lines)
# "style" sets the background grid
sns.set_theme(style="whitegrid", context="talk", palette="deep")

# 2. Matplotlib RC Params
# Embed TrueType (Type 42) fonts instead of Type 3 so PDF text stays editable
plt.rcParams['pdf.fonttype'] = 42
plt.rcParams['figure.figsize'] = (10, 6) # Standard figure size

# 3. Pandas Display Options
# Show more rows/columns for inspection
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

print("Visualization environment configured.")
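The pdf.fonttype setting only takes effect when a figure is exported, so a quick export is the simplest way to exercise it. This sketch writes a small figure to an in-memory PDF under the same setting as config.py; in practice you would pass a filename such as figure.pdf (a hypothetical name) to savefig.

```python
import io

import matplotlib.pyplot as plt

# Same setting as config.py: TrueType (Type 42) fonts in PDF output
plt.rcParams["pdf.fonttype"] = 42

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3], [2, 4, 3])
ax.set_title("Font embedding check")

# Write to memory here; pass a file path instead for a real export
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf", bbox_inches="tight")
plt.close(fig)
print(buffer.getvalue()[:8])
```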

3. Correct Import Structure

In your analysis notebooks, follow this strict order to ensure configurations are applied before any plotting logic executes.

Why the order matters:

  1. %run config.py: Applies the theme, rcParams, and pandas display options before any figure is created. Any plot drawn before this cell runs falls back to Matplotlib's defaults and will not match the rest of the notebook.
  2. Data Loading: Separates data logic from visualization logic.
  3. Plotting: Keeps the visualization code clean and separate.

# Cell 1: Run configuration
%run config.py

# Cell 2: Data Loading
# load_dataset downloads seaborn's example data on first use (needs internet)
df = sns.load_dataset("penguins")

# Cell 3: Plotting (Seaborn) with a Matplotlib annotation overlay
# Create the Axes explicitly so later Matplotlib calls target this plot
fig, ax = plt.subplots(figsize=(8, 5))
sns.scatterplot(data=df, x="bill_length_mm", y="bill_depth_mm", hue="species", ax=ax)
ax.set_title("Bill Dimensions by Species")

# Annotate in the same cell: the inline backend renders and closes the
# figure when the cell finishes, so plt.gca() in a later cell would
# return a new, empty Axes instead of this one
ax.text(45, 17, "Outlier?", fontsize=12, color="red")

# Optional with the inline backend, but keeps the code portable to scripts
plt.show()

4. Verifying the Environment

Run this diagnostic cell to ensure you are using the correct libraries, versions, and the correct kernel.

import sys
import importlib.metadata

import matplotlib
import pandas as pd
import seaborn

print(f"Python Executable: {sys.executable}")
print(f"Pandas Version: {pd.__version__}")
print(f"Matplotlib Version: {matplotlib.__version__}")
print(f"Matplotlib Backend: {matplotlib.get_backend()}")
print(f"Seaborn Version: {seaborn.__version__}")
print(f"Ipykernel Version: {importlib.metadata.version('ipykernel')}")

# Check if running in a notebook
try:
    shell = get_ipython().__class__.__name__
    if shell == 'ZMQInteractiveShell':
        print("Environment: Jupyter Notebook/Lab")
    elif shell == 'TerminalInteractiveShell':
        print("Environment: IPython Terminal")
    else:
        print("Environment: Other")
except NameError:
    print("Environment: Standard Python")

Key Notes and Troubleshooting

Kernel vs. Environment Mismatch

If you see ModuleNotFoundError in Jupyter despite installing the package:

  1. Confirm the package is installed in the active environment (pip list in terminal).
  2. Verify the Jupyter kernel is pointing to that environment.
  3. Inside the notebook, run import sys; sys.executable to see which Python executable the kernel is using.
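  4. Use %pip list inside a notebook to see what that specific kernel sees. The %pip magic always targets the kernel's own interpreter, whereas !python -m pip list runs whichever python comes first on the shell PATH, which may belong to a different environment.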
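Steps 3 and 4 can be combined into one diagnostic cell. A sketch, assuming that python (or python3) on the PATH is what a ! shell command would run: it asks that interpreter for its sys.prefix and compares it with the kernel's own. The helper name path_python_prefix is hypothetical.

```python
import os
import shutil
import subprocess
import sys


def path_python_prefix():
    """Return sys.prefix of the 'python' a '!' shell command would run, or None."""
    exe = shutil.which("python") or shutil.which("python3")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:  # e.g. a broken or stub launcher
        return None
    return result.stdout.strip()


kernel_prefix = os.path.realpath(sys.prefix)
shell_prefix = path_python_prefix()
same_env = shell_prefix is not None and os.path.realpath(shell_prefix) == kernel_prefix

print(f"Kernel environment: {kernel_prefix}")
print(f"Shell 'python' env: {shell_prefix}")
if same_env:
    print("OK: shell commands and the kernel share one environment.")
else:
    print("Mismatch: use %pip, or call pip through the kernel's interpreter:")
    print(f'  "{sys.executable}" -m pip install <package>')
```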

Managing Styles

Do not manually set font sizes in every plot command. Use sns.set_context("talk") for presentations or sns.set_context("paper") for manuscripts. This ensures all plot elements scale proportionally.
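Contexts can also be applied temporarily. seaborn's plotting_context works as a context manager, so a single figure can use manuscript scaling while the session default stays at presentation scaling. A minimal sketch; create and save the figure inside the with-block so it is built with those settings:

```python
import io

import matplotlib.pyplot as plt
import seaborn as sns

# Session-wide default: presentation scaling
sns.set_theme(style="whitegrid", context="talk")

# Temporarily switch to manuscript scaling for one figure only
with sns.plotting_context("paper"):
    inside_size = plt.rcParams["font.size"]
    fig, ax = plt.subplots(figsize=(3.5, 2.5))
    ax.plot([0, 1, 2], [1, 3, 2])
    ax.set_xlabel("Time (s)")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="pdf")  # a real export would use a file path
plt.close(fig)

# On exit, the previous "talk" settings are restored
print("inside:", inside_size, "| after:", plt.rcParams["font.size"])
```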

Kernel Crashes and Backend Errors

If the kernel crashes or you receive backend errors (e.g., RuntimeError: Invalid DISPLAY variable):

  1. Restart the Kernel: Always the first step. (Kernel -> Restart).
  2. Check Backend Setup: Select the backend once, at the top of the notebook (for example %matplotlib widget). If you change backends (e.g., from inline to widget), you must restart the kernel.
  3. Installation Issues: If pip install ipympl fails, confirm that your environment is active and upgrade pip (python -m pip install --upgrade pip). In a Conda environment, install it with conda install -c conda-forge ipympl instead, which handles binary dependencies better and avoids mixing package managers.