Day 22: NumPy Advanced
Enterprise Objective
To manipulate data efficiently, you must master the art of selecting exactly what you need. Today we cover slicing matrices, filtering data with Boolean masks, and applying arithmetic across arrays of different shapes with broadcasting.
Strategic Overview
| # | Topic | Concept |
|---|---|---|
| 1 | Slicing | arr[0:2, :], Views |
| 2 | Masking | arr[arr > 5], Filtering |
| 3 | Broadcasting | Shape stretching (+) |
1. Slicing & Indexing: Navigating Matrices
NumPy slicing works like Python list slicing, extended to multiple dimensions. You access elements with arr[row, col]. Basic slices return views (not copies!), so if you modify a slice, the original array changes too.
import numpy as np
matrix = np.arange(16).reshape(4, 4)
# [start_row:stop_row, start_col:stop_col]
sub_matrix = matrix[0:2, 1:3]  # rows 0-1, columns 1-2
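A minimal sketch of the common 2D indexing forms, assuming the same 4x4 matrix built with np.arange(16).reshape(4, 4) that the concept checks use. It shows a single element, a whole row, a whole column, a sub-matrix, and a step slice:

```python
import numpy as np

# A 4x4 matrix holding 0..15, filled row by row
matrix = np.arange(16).reshape(4, 4)

element = matrix[1, 2]         # row 1, column 2 -> 6
row = matrix[0, :]             # first row -> [0 1 2 3]
column = matrix[:, 1]          # second column as a 1D array -> [ 1  5  9 13]
sub_matrix = matrix[0:2, 1:3]  # rows 0-1, columns 1-2 -> [[1 2] [5 6]]
every_other_row = matrix[::2]  # step slice: rows 0 and 2

print(element)
print(sub_matrix)
print(every_other_row)
```

Because the stop index is exclusive, 0:2 selects rows 0 and 1 only, exactly as with Python lists.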
Why Data Analysts Care
• Image Cropping: A color image is just a 3D NumPy array of shape (height, width, 3), one layer per RGB channel. Cropping is just slicing: img[100:200, 100:200]
• Data Sampling: Extracting every 10th row using a step slice: data[::10, :]
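Both use cases above can be sketched on made-up arrays. The image here is an all-black placeholder of an assumed 480x640 size, and data is a hypothetical 100x10 table; real data would come from an image library or a file:

```python
import numpy as np

# Hypothetical stand-in for a photo: height 480, width 640, 3 color channels
img = np.zeros((480, 640, 3), dtype=np.uint8)

# Crop rows 100-199 and columns 100-199; all 3 channels come along automatically
crop = img[100:200, 100:200]
print(crop.shape)    # (100, 100, 3)

# Hypothetical dataset: 100 rows, 10 columns
data = np.arange(1000).reshape(100, 10)

# Keep every 10th row (rows 0, 10, 20, ..., 90) and all columns
sample = data[::10, :]
print(sample.shape)  # (10, 10)
```

Omitting the channel axis in img[100:200, 100:200] works because any trailing axes you do not index are taken in full.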
The View Trap
Slicing a NumPy array does not copy data; it creates a view. view = arr[:2]; view[0] = 99 WILL change arr[0] in the original array. Use arr[:2].copy() if you need an independent copy.
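A short demonstration of the trap and the fix. The variable names are illustrative; np.shares_memory is used only to confirm which arrays overlap in memory:

```python
import numpy as np

original = np.arange(5)          # [0 1 2 3 4]
view = original[:2]              # basic slice -> view sharing original's memory
view[0] = 99
print(original)                  # [99  1  2  3  4]: the original changed

source = np.arange(5)
independent = source[:2].copy()  # .copy() allocates separate memory
independent[0] = 99
print(source)                    # [0 1 2 3 4]: the source is untouched

print(np.shares_memory(view, original))       # True
print(np.shares_memory(independent, source))  # False
```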
Concept Checks: Slicing
Q1. Create arr = np.arange(10). Slice the first 5 elements.
Q2. Create a 4x4 matrix using np.arange(16).reshape((4,4)). Print it.
Q3. From the 4x4 matrix, extract the 2x2 square in the top-right corner. Print it.
Q4. Demonstrate the view trap: extract the first row, change its first element to 999, and print the original matrix to see it changed.
Q5. Extract the second column from the matrix as a 1D array using matrix[:, 1].
2. Boolean Masking: Filtering Data
Boolean Masking is how we filter arrays. When you apply a condition to an array (e.g., arr > 5), it returns an array of Booleans (True/False). You can use this Boolean array inside the brackets to select only the True elements.
import numpy as np
arr = np.array([3, 7, 1, 9])
mask = arr > 5        # [False, True, False, True]
filtered = arr[mask]  # Keeps only the True values: [7, 9]
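A slightly fuller sketch on an illustrative array: the mask is a bool array with the same shape as the data, boolean indexing returns a new array rather than a view, and on a 2D array the selected elements come back flattened into 1D:

```python
import numpy as np

arr = np.array([3, 7, 1, 9, 5])
mask = arr > 5
print(mask)          # [False  True False  True False]
print(mask.dtype)    # bool: one True/False per element of arr

filtered = arr[mask]
print(filtered)      # [7 9]

# Unlike a basic slice, boolean indexing returns a copy
filtered[0] = 0
print(arr)           # [3 7 1 9 5]: unchanged

# On a 2D array the selected elements are returned as a 1D array
matrix = np.arange(9).reshape(3, 3)
evens = matrix[matrix % 2 == 0]
print(evens)         # [0 2 4 6 8]
```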
Why Data Analysts Care
• Outlier Removal: clean_data = data[(data > -3) & (data < 3)] to remove extreme z-scores
• Conditional Assignment: arr[arr < 0] = 0 to instantly set all negative numbers to zero
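Both one-liners above, run on small made-up arrays (the z-score values are hypothetical). Note the difference: filtering returns a smaller new array, while mask assignment edits the existing array in place:

```python
import numpy as np

# Hypothetical z-scores, including two extreme values
data = np.array([-4.2, -1.0, 0.3, 2.5, 3.8, -0.7])

# Outlier removal: keep only values strictly between -3 and 3
clean_data = data[(data > -3) & (data < 3)]
print(clean_data)   # the four values -1.0, 0.3, 2.5, -0.7

# Conditional assignment: write 0 wherever the mask is True (modifies arr in place)
arr = np.array([5, -2, 7, -9, 0])
arr[arr < 0] = 0
print(arr)          # [5 0 7 0 0]
```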
Pro Tip
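When combining conditions, you MUST use the element-wise operators & (and), | (or), and ~ (not) instead of Python's and/or/not, which need a single True/False for the whole array. You MUST also wrap each condition in parentheses, because & and | bind more tightly than comparisons: (arr > 2) & (arr < 8).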
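A sketch of the correct forms next to the two common mistakes. Both mistakes raise ValueError for the same underlying reason: Python asks a whole array for a single truth value:

```python
import numpy as np

arr = np.arange(10)

# Correct: each comparison in parentheses, combined element-wise
between = arr[(arr > 2) & (arr < 8)]
print(between)      # [3 4 5 6 7]
outside = arr[(arr < 2) | (arr > 7)]
print(outside)      # [0 1 8 9]
not_small = arr[~(arr < 5)]
print(not_small)    # [5 6 7 8 9]

errors = []

# Without parentheses, & binds first: arr > 2 & arr < 8 means
# arr > (2 & arr) < 8, a chained comparison that needs bool(array)
try:
    arr[arr > 2 & arr < 8]
except ValueError as err:
    errors.append(err)
    print("ValueError:", err)

# Python's `and` also needs a single True/False, so it fails the same way
try:
    arr[(arr > 2) and (arr < 8)]
except ValueError as err:
    errors.append(err)
    print("ValueError:", err)
```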
Concept Checks: Boolean Masking
Q1. Create arr = np.array([10, 50, 30, 80, 20]). Create a mask for values > 40 and print the mask.
Q2. Use the mask from Q1 to extract and print the values greater than 40.
Q3. Use a combined mask (arr > 20) & (arr < 60) to filter the array. Print the result.
Q4. Try to use the Python and keyword instead of & for the combined mask. Catch the ValueError.
Q5. Replace all values in the array that are < 30 with -1 using a mask assignment. Print the updated array.
3. Broadcasting: Shape Alignment
Broadcasting is NumPy's way of doing math between arrays of different shapes. If the shapes are compatible, NumPy 'stretches' the smaller array to match the larger one without actually making copies in memory.
| Array A | Array B | Result | Works? |
|---|---|---|---|
| (3, 3) | Scalar 5 | (3, 3) | Yes (Scalar stretches) |
| (3, 3) | (3,) | (3, 3) | Yes (Row stretches down) |
| (3, 3) | (4,) | Error | No (Dimensions mismatch) |
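The three rows of the table, reproduced with a 3x3 matrix of zeros (the values are illustrative; only the shapes matter):

```python
import numpy as np

a = np.zeros((3, 3))

# Row 1: scalar stretches to every element
scalar_result = a + 5
print(scalar_result.shape)   # (3, 3)

# Row 2: a (3,) row is repeated down all 3 rows
row_result = a + np.array([1, 2, 3])
print(row_result)

# Row 3: trailing dimensions 3 and 4 differ and neither is 1
try:
    a + np.arange(4)
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```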
Why Data Analysts Care
• Standardization: Subtracting each column's mean (and dividing by its standard deviation) across a large matrix: (matrix - means_array) / stds_array
• Color Adjustments: Multiplying an RGB image array of shape (1080, 1920, 3) by a per-channel brightness vector of shape (3,)
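A sketch of both use cases. The 100x3 feature matrix is randomly generated with assumed per-column means and spreads, and the image is a flat gray placeholder; the brightness factors are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples (rows) x 3 features (columns)
matrix = rng.normal(loc=[10.0, 50.0, -5.0], scale=[1.0, 5.0, 2.0], size=(100, 3))

means = matrix.mean(axis=0)              # shape (3,): one mean per column
stds = matrix.std(axis=0)                # shape (3,): one std per column
standardized = (matrix - means) / stds   # (100, 3) op (3,) -> (100, 3)
print(standardized.mean(axis=0))         # all close to 0
print(standardized.std(axis=0))          # all close to 1

# Hypothetical 1080p RGB image, uniformly gray
img = np.full((1080, 1920, 3), 100.0, dtype=np.float32)
brightness = np.array([1.2, 1.0, 0.8], dtype=np.float32)  # boost red, keep green, dim blue
adjusted = img * brightness              # (1080, 1920, 3) * (3,) -> (1080, 1920, 3)
print(adjusted[0, 0])                    # approximately [120, 100, 80]
```

float32 is used for the image only to keep memory use modest; any float dtype broadcasts the same way.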
Pro Tip
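Broadcasting compares shapes starting from the trailing (rightmost) dimension and works left. Each pair of dimensions must be equal, or one of them must be 1; if one array has fewer dimensions, its missing leading dimensions are treated as 1. If not, you get a ValueError: operands could not be broadcast together.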
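To make the rule concrete, here is a hypothetical helper, broadcast_shape, that applies it by hand. It is a teaching sketch, not part of NumPy; its result is checked against NumPy's own np.broadcast_shapes (available in NumPy 1.20 and later):

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the trailing-dimension rule by hand."""
    # Pad the shorter shape on the left with 1s so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Compare from the rightmost dimension leftwards
    for da, db in zip(reversed(a), reversed(b)):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))      # (4, 3)
print(broadcast_shape((4, 1), (1, 5)))    # (4, 5)
print(np.broadcast_shapes((4, 3), (3,)))  # (4, 3), NumPy's own check

try:
    broadcast_shape((4, 3), (4,))         # trailing dims 3 vs 4
except ValueError as err:
    print("ValueError:", err)
```

The (4, 3) vs (4,) case fails because (4,) is padded to (1, 4), and its trailing 4 is compared against the matrix's trailing 3.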
Concept Checks: Broadcasting
Q1. Create a 3x2 matrix of ones. Multiply it by the scalar 10. Print the result.
Q2. Create a 3x3 matrix of zeros. Add a 1D array [1, 2, 3] to it. Observe how it broadcasts across rows.
Q3. Reshape [1, 2, 3] into a column (3, 1). Add it to the zeros matrix. Observe how it broadcasts across columns.
Q4. Try to add a 1D array of length 4 to a 3x3 matrix. Catch the ValueError and print the error message.
Q5. Explain why (4, 3) broadcasts with (3,) but fails with (4,). (Hint: Right-to-left dimension matching).
Professional Practice Tasks
Theory is useless without muscle memory. Complete these tasks to solidify your understanding.
Task 1 (Matrix Borders): Create a 5x5 array of zeros. Use slicing to set the outer border (first row, last row, first col, last col) to 1. Print the result.
Task 2 (Checkerboard): Create an 8x8 array of zeros. Use step slicing (e.g. [::2] and [1::2] on rows and columns) to create a checkerboard pattern of 1s and 0s (like a chess board).
Task 3 (Outlier Capping): Create an array of 50 random numbers from a standard normal distribution (np.random.randn). Use boolean masking to cap any values > 2 to 2, and any values < -2 to -2. Print the min and max to verify.
Task 4 (Column Standardization): Create a 10x3 matrix of random integers. Calculate the mean of each column (.mean(axis=0)). Subtract this mean array from the matrix using broadcasting (the centering step of standardization). The columns of the new matrix should have a mean of 0, up to floating-point rounding.
Task 5 (Distance Matrix): Create a 1D array x = np.arange(5). Use broadcasting to create a 5x5 matrix where each element M[i,j] = abs(x[i] - x[j]). (Hint: reshape one copy of x into a column of shape (5, 1).)
Pure Coding Interview Questions
Q1. What is an array View in NumPy? How does it differ from a Copy?
Q2. How do you forcefully create a copy of a slice instead of a view?
Q3. Explain Boolean Masking. What type of array is generated as the mask?
Q4. Why does NumPy require & and | instead of and and or for boolean arrays?
Q5. Explain the broadcasting rules in NumPy. What does 'trailing dimensions' mean?
Q6. Write code to add a 1D array of length 3 to the columns of a 4x3 matrix.
Q7. How do you add a 1D array of length 4 to the rows of a 4x3 matrix? (Hint: np.newaxis or reshape).
Q8. What does np.where(condition, x, y) do? Write an example replacing negatives with 0.
Q9. How do you select specific arbitrary rows from a matrix using a list of indices? (Fancy Indexing).
Q10. Explain the difference between Slicing (arr[1:3]) and Fancy Indexing (arr[[1,2]]) in terms of Views vs Copies.
Q11. Write a one-liner to reverse the rows of a 2D matrix.
Q12. How do you find the unique elements and their counts in a NumPy array? (np.unique).
Q13. Write code to stack two 1D arrays horizontally and vertically (np.hstack, np.vstack).
Q14. Explain the axis parameter. What does .sum(axis=0) do on a 2D matrix?
Q15. Write a boolean mask to filter out all np.nan values from an array.
Q16. How do you concatenate two 2D matrices along the column axis?
Q17. Explain np.argmax() and np.argmin(). What do they return?
Q18. Write code to sort a 2D array by the values in its second column using np.argsort().
Q19. What is the difference between arr.flatten() and arr.ravel()? (View vs Copy).
Q20. Write a broadcasting operation that computes the outer product of two vectors [1,2,3] and [4,5,6].
Q21. How do you use np.clip()? Compare it to using boolean mask assignments.
Q22. Explain how memory layout (C-order vs Fortran-order) affects NumPy performance.
Q23. Write code to extract the diagonal elements of a matrix without using a loop.
Q24. What is the Ellipsis (...) used for in NumPy slicing?
Q25. How does NumPy handle operations between arrays of different dtypes? (Type Promotion).
Day 22 Executive Summary
| # | Topic | Key Takeaway |
|---|---|---|
| 1 | Slices | Slices are Views! Modifying a slice alters the original data |
| 2 | Masks | Use & and \| with parentheses: (arr > 1) & (arr < 5) |
| 3 | Broadcast | NumPy automatically stretches dimensions to align math operations |
Instructor's End-of-Day Checklist
• [ ] I can slice rows and columns from a 2D matrix.
• [ ] I can filter an array using a boolean condition.
• [ ] I understand how scalar broadcasting works.