โณ Loading Python Engine...

๐Ÿ“Š Day 28 : Pandas Time Series

๐ŸŽฏ Enterprise Objective

Data is rarely clean, and it usually involves Time. Today we master Pandas Accessors (.str and .dt) to clean text and extract date features. We then dive into Time Series analysis, learning how to resample frequencies and calculate Rolling Moving Averages.

๐Ÿ“‹ Strategic Overview

#TopicConcept
1Accessors.str.upper(), .dt.year
2ResampleGrouping by Time ('M', 'W')
3RollingMoving Averages (MA)

1. String and Datetime Accessors : .str and .dt

๐Ÿ” What is it?

Pandas provides specialized methods for specific data types using Accessors. If a column contains strings, use the .str accessor to apply string methods (like .upper()). If it contains dates, use .dt to extract the year or month.

# String Accessor
df['Name'] = df['Name'].str.upper().str.strip()

# Datetime Accessor
df['Year'] = df['Date'].dt.year

๐Ÿ’ผ Why Data Analysts Care

โ€ข Text Cleaning: Removing leading/trailing spaces from a column of messy user inputs

โ€ข Seasonality: Extracting the DayOfWeek from a Timestamp to see if weekend sales are higher

โš ๏ธ Type Errors

You cannot use .dt on a string column! You must first convert it: df['Date'] = pd.to_datetime(df['DateString']) before using the accessor.

In [ ]:

๐Ÿงช Concept Checks: Accessors

Q1. Given df["Text"] = [" hello ", " WORLD "]. Use .str.lower().str.strip() to clean it.

In [ ]:

Q2. Create a string column "100 USD". Use .str.replace(" USD", "") to remove the currency.

In [ ]:

Q3. Convert df["Date"] = ["2020-01-01"] to datetime using pd.to_datetime(). Print the dtypes.

In [ ]:

Q4. Use the .dt accessor to extract the .day from your converted datetime column.

In [ ]:

Q5. Use df["Text"].str.contains("hel") to create a boolean mask. Print it.

In [ ]:

2. Time Series Resampling : Rolling up Dates

๐Ÿ” What is it?

When working with Time Series data (where the Index is a Datetime), you can use .resample(). It is exactly like .groupby(), but specifically for time frequencies (e.g., grouping daily data into monthly averages).

FrequencyStringExample
Daily'D'df.resample('D').sum()
Weekly'W'df.resample('W').mean()
Monthly'M'df.resample('M').max()
Quarterly'Q'df.resample('Q').count()

๐Ÿ’ผ Why Data Analysts Care

โ€ข Financial Aggregations: Converting minute-by-tick stock data into Daily OHLC (Open, High, Low, Close) bars

โ€ข Smoothing: Aggregating noisy daily web traffic into smooth Weekly averages

๐Ÿง  Pro Tip

.resample() ONLY works if your DataFrame's index is a DatetimeIndex! Use df.set_index('DateColumn', inplace=True) before attempting to resample.
In [ ]:

๐Ÿงช Concept Checks: Resampling

Q1. Create dates = pd.date_range("2020-01-01", periods=30, freq="D") and a DF with it as the index.

In [ ]:

Q2. Fill the DF with a Sales column of random integers. Print df.head().

In [ ]:

Q3. Resample the daily data to Weekly ("W") and calculate the .sum(). Print it.

In [ ]:

Q4. Resample the data to Monthly ("M") and calculate the .mean(). Print it.

In [ ]:

Q5. What happens if you try to resample a DataFrame whose index is just integers (0,1,2...)? Catch the TypeError.

In [ ]:

3. Rolling Windows : Moving Averages

๐Ÿ” What is it?

A Rolling Window calculates statistics over a sliding window of time. The most common use case is the Moving Average, which smooths out short-term fluctuations in data to highlight longer-term trends.

# Calculate a 7-day moving average
df['7D_MA'] = df['Sales'].rolling(window=7).mean()

๐Ÿ’ผ Why Data Analysts Care

โ€ข Stock Analysis: Calculating the 50-day and 200-day moving averages to find 'Golden Cross' buy signals

โ€ข Trend Detection: Smoothing out weekday vs weekend sales spikes to see the true month-over-month growth

๐Ÿง  Pro Tip

The first N-1 rows of a rolling window of size N will result in NaN, because there isn't enough historical data to calculate the full window yet. You can use min_periods=1 to calculate partial windows.

In [ ]:

๐Ÿงช Concept Checks: Rolling

Q1. Given df["Prices"] = [10, 20, 30, 40, 50]. Calculate a rolling mean with window=2.

In [ ]:

Q2. Observe the NaN in the first row. Add min_periods=1 to the .rolling() call and print again.

In [ ]:

Q3. Calculate a rolling .max() with window=3. Print it.

In [ ]:

Q4. Calculate an expanding sum (cumulative sum) using df["Prices"].expanding().sum(). Print it.

In [ ]:

Q5. Explain why moving averages are essential for visualizing highly volatile daily data.

In [ ]:

๐Ÿ› ๏ธ Professional Practice Tasks

Theory is useless without muscle memory. Complete these tasks to solidify your understanding.

Task 1 (String Pipeline): Create df = pd.DataFrame({'Code': [' id-01 ', 'id-02', ' ID-03 ']}). Write a chained pipeline using .str to: strip whitespace, convert to uppercase, and replace '-' with '_'. Result should be 'ID_01'.

In [ ]:

Task 2 (Datetime Features): Create a single Datetime column for '2023-12-25'. Use the .dt accessor to extract the Year, Month, Day, and day_name() into 4 separate new columns.

In [ ]:

Task 3 (Monthly Aggregation): Generate 365 days of random sales data for the year 2023. Set the date as the index. Resample to Monthly ('M') and find the Total Sales and Max Daily Sale per month using .agg(['sum', 'max']).

In [ ]:

Task 4 (Golden Cross): Generate 100 days of stock prices. Calculate the 10-day moving average and the 30-day moving average. Add both as columns to the DataFrame.

In [ ]:

Task 5 (Shift and Differences): Use the .shift(1) method on a Sales column to create a Previous_Day_Sales column. Then calculate the Daily_Growth by subtracting them. This is how Day-over-Day metrics are made!

In [ ]:

๐Ÿ’ป Pure Coding Interview Questions

Q1.

Explain the difference between the .str accessor and the .dt accessor.

In [ ]:

Q2.

How do you convert a string column 'Jan 15, 2023' into a Pandas datetime object?

In [ ]:

Q3.

What happens if you use pd.to_datetime() on a column that has mixed or European date formats? (Hint: dayfirst=True or format=...).

In [ ]:

Q4.

Write code to extract the day of the week (e.g., 'Monday') from a datetime column.

In [ ]:

Q5.

Explain what a DatetimeIndex is. Why is it required for .resample()?

In [ ]:

Q6.

What is the difference between .resample('M').mean() and .groupby(df.index.month).mean()?

In [ ]:

Q7.

How do you handle the NaN values generated by a .rolling(window=30) calculation?

In [ ]:

Q8.

Explain the difference between a Rolling window and an Expanding window.

In [ ]:

Q9.

Write code to calculate the Exponential Moving Average (EMA) using .ewm().

In [ ]:

Q10.

How do you calculate the day-over-day percentage change in Pandas? (Hint: .pct_change()).

In [ ]:

Q11.

Write code using .str.contains() with a regex to filter rows that contain an email address.

In [ ]:

Q12.

How do you split a string column 'First Last' into two separate columns 'First' and 'Last'? (.str.split(expand=True)).

In [ ]:

Q13.

Explain .shift(). How is it used to calculate differences between consecutive rows?

In [ ]:

Q14.

What does .diff() do? How does it relate to .shift()?

In [ ]:

Q15.

How do you handle Time Zones in Pandas datetime objects? (.dt.tz_localize() and .tz_convert()).

In [ ]:

Q16.

Write a .resample() operation that aggregates trade data into OHLC (Open, High, Low, Close) bars.

In [ ]:

Q17.

Explain the pd.DateOffset object and how it's used to add exactly one month to a datetime column.

In [ ]:

Q18.

How do you fill missing dates in a Time Series so that every single day has a row? (Hint: .asfreq('D')).

In [ ]:

Q19.

Write code to interpolate missing values in a time series quadratically.

In [ ]:

Q20.

What is a Timedelta? How do you calculate the number of days between two datetime columns?

In [ ]:

Q21.

Explain how to use .str.get_dummies() for One-Hot Encoding a categorical string column.

In [ ]:

Q22.

Write code to filter a Time Series DataFrame to only include business days (Monday-Friday).

In [ ]:

Q23.

How do you slice a Time Series DataFrame using partial string indexing? (e.g., df.loc['2023-01']).

In [ ]:

Q24.

What is 'Look-ahead bias' in Time Series analysis, and how do rolling windows help prevent it?

In [ ]:

Q25.

Write code to calculate the rolling standard deviation (volatility) of a stock price over a 20-day window.

In [ ]:

๐Ÿ“Š Day 28 Executive Summary

#TopicKey Takeaway
1.str / .dtUnlock hundreds of type-specific methods on entire columns
2ResampleIt's just groupby but for dates. Requires a DatetimeIndex.
3RollingCalculates stats over a sliding window. First N rows will be NaN.

โœ… Instructor's End-of-Day Checklist

โ€ข [ ] I can use .str to clean text columns.

โ€ข [ ] I can .resample() daily data into monthly data.

โ€ข [ ] I can calculate a 7-day rolling moving average.