โณ Loading Python Engine...

📊 Day 15: File Handling

🎯 Enterprise Objective

Data analyst pipelines start by reading data and end by saving it. Today we master disk I/O, parse the universal language of the web (JSON), and modernize our code with the powerful, object-oriented pathlib module.

📋 Strategic Overview

| # | Topic | Concept |
|---|-------|---------|
| 1 | File I/O | open(), with context |
| 2 | JSON | loads(), dumps() |
| 3 | Pathlib | Object-oriented paths |

1. Reading & Writing Files: Disk I/O

🔍 What is it?

Data lives in files. Python interacts with files through the open(filename, mode) function. Common modes are 'r' (read, the default), 'w' (write, which overwrites any existing content), and 'a' (append). ALWAYS open files with a context manager (the with statement) so the file is closed properly, even if an error occurs.

with open('data.txt', 'w') as f:
    f.write('Hello World\n')
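The three modes can be compared side by side. The following is a minimal sketch, not part of the original lesson: the file name modes_demo.txt and the temporary directory are assumptions chosen so the demo does not touch real data.

```python
import os
import tempfile

# Hypothetical scratch directory; it is deleted when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "modes_demo.txt")

    # 'w' creates the file, or empties it first if it already exists.
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # 'a' keeps the existing content and writes at the end.
    with open(demo_path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # 'r' opens the file for reading only.
    with open(demo_path, "r", encoding="utf-8") as f:
        contents = f.read()

print(contents)
```

Opening the same path with 'w' a second time would discard both lines, which is why 'a' is the safer choice for logs and incremental exports.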

💼 Why Data Analysts Care

• Log Processing: Reading multi-gigabyte log files line-by-line without running out of RAM

• Data Exports: Saving analysis results to local text files
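The log-processing point relies on the fact that a file object can be iterated directly, yielding one line at a time. A rough sketch of that pattern follows; the helper name count_matching_lines, the file sample.log, and its contents are all made up for illustration.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count lines containing keyword without loading the whole file."""
    matches = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object yields one line per iteration
            if keyword in line:
                matches += 1
    return matches


# Build a tiny hypothetical log file to run the helper against.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "sample.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("WARN disk at 91%\nDEBUG heartbeat\nWARN retrying upload\n")
    warn_count = count_matching_lines(log_path, "WARN")

print(warn_count)
```

By contrast, f.read() and f.readlines() pull the entire file into memory, which is what makes them unsuitable for multi-gigabyte inputs.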

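⚠️ Resource Leaks

If you write f = open('data.txt') and forget to call f.close(), the file handle stays open: buffered writes may never reach the disk, and on Windows other programs may be unable to modify or delete the file. Always use with open(...) as f:.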
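One way to see the guarantee in action is to inspect the file object's closed attribute before and after the with block, even when the block fails partway through. This is an illustrative sketch; the file name closed_demo.txt and the simulated ValueError are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "closed_demo.txt")

    closed_inside = None
    try:
        with open(path, "w", encoding="utf-8") as f:
            closed_inside = f.closed  # still open inside the block
            raise ValueError("simulated failure mid-write")
    except ValueError:
        pass  # the with statement has already closed the file by now

    closed_after = f.closed

print(closed_inside, closed_after)
```

The name f is still bound after the block, but the underlying file has been closed, so no manual f.close() call is needed.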

In [ ]:

🧪 Concept Checks: File I/O

Q1. Write code to open a file "test.txt" in write mode ("w") and write your name to it.

In [ ]:

Q2. Open "test.txt" in read mode ("r"). Read the contents and print them.

In [ ]:

Q3. Open "test.txt" in append mode ("a"). Add a new line "Welcome to Python". Print the full file again.

In [ ]:

Q4. Explain why with open() as f: is superior to f = open(); f.read(); f.close().

In [ ]:

Q5. Write a memory-efficient for loop to read a file line-by-line. (Assume file is "test.txt").

In [ ]:

2. Parsing JSON: The Language of the Web

🔍 What is it?

JSON (JavaScript Object Notation) is the universal format for web APIs. Its objects and arrays map directly onto Python dictionaries and lists. The built-in json module provides tools to parse JSON strings into Python objects (loads) and to serialize Python objects into JSON strings (dumps).

| Function | Purpose | Input -> Output |
|----------|---------|-----------------|
| json.loads(s) | Load String | String -> Dictionary |
| json.dumps(d) | Dump String | Dictionary -> String |
| json.load(f) | Load File | File Object -> Dictionary |
| json.dump(d, f) | Dump File | Dictionary -> File Object |
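The four functions in the table can be exercised in one short round trip. The sketch below uses a made-up record and a hypothetical file name, record.json; it also shows that loads returns a list when the JSON text is an array rather than an object.

```python
import json
import os
import tempfile

record = {"id": 7, "tags": ["a", "b"], "active": True}

# String round trip: dumps produces text, loads parses it back.
text = json.dumps(record)
parsed = json.loads(text)

# File round trip: dump writes to an open file, load reads from one.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    with open(path, "r", encoding="utf-8") as f:
        from_file = json.load(f)

# The result type follows the JSON text: an array becomes a list.
as_list = json.loads("[1, 2, 3]")

print(text)
print(type(parsed), type(as_list))
```

A handy mnemonic: the trailing "s" in loads and dumps stands for "string", while load and dump work with file objects.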

💼 Why Data Analysts Care

• API Integration: Parsing REST API responses (which are almost always JSON)

• Configuration: Loading application settings from a .json file

🧠 Pro Tip

Use json.dumps(data, indent=4) to 'pretty-print' complex dictionaries for easy debugging.
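To make the difference concrete, here is a small sketch comparing compact and indented output. The report dictionary is invented for the example.

```python
import json

# Hypothetical nested data, standing in for an API response.
report = {"region": "EMEA", "totals": {"orders": 3, "refunds": 1}}

compact = json.dumps(report)           # everything on a single line
pretty = json.dumps(report, indent=4)  # one key per line, nested levels indented

print(compact)
print(pretty)
```

Both strings parse back to the same dictionary; indent only changes whitespace, so it is safe to use for debugging output and for config files meant to be read by people.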

In [ ]:

🧪 Concept Checks: JSON

Q1. Import json. Use json.loads() to parse '{"x": 10, "y": 20}' into a dictionary.

In [ ]:

Q2. Convert the dictionary {"color": "red", "sizes": [1, 2]} to a JSON string using json.dumps(). Print it.

In [ ]:

Q3. Use json.dumps() with the indent=4 argument to pretty-print {"a": {"b": 1}}.

In [ ]:

Q4. Write code to save a dictionary d directly to a file "data.json" using with open() and json.dump().

In [ ]:

Q5. Read "data.json" back into a dictionary using json.load(). Print the type of the loaded object.

In [ ]:

3. Pathlib: Modern File Paths

🔍 What is it?

Building file paths by hand as strings (e.g., 'data/users.txt') can cause bugs when code moves between operating systems, because Windows uses \ as its path separator while macOS and Linux use /. The modern Pythonic way is the pathlib module, which treats paths as objects.

from pathlib import Path

# Object-oriented paths
folder = Path('data')
file_path = folder / 'users.txt'  # The / operator intelligently joins paths!

💼 Why Data Analysts Care

• Cross-Platform Code: Write code on a Mac that executes flawlessly on a Windows server

• File Operations: Easily check if a file exists, get its suffix (e.g., .csv), or read its text instantly

🧠 Pro Tip

Pathlib objects have amazing built-in methods: path.exists(), path.read_text(), and path.suffix. Use them instead of the older os.path module.
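The methods named in the tip can be tried together on a throwaway file. This is a sketch under assumptions: the reports folder, the summary.csv file, and its contents are hypothetical, and a temporary directory keeps the demo away from real data.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # The / operator builds the path; nothing is created on disk yet.
    report = Path(tmp_dir) / "reports" / "summary.csv"
    existed_before = report.exists()

    # Create the parent folder, then write and read back in one call each.
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text("id,total\n1,9.5\n", encoding="utf-8")
    existed_after = report.exists()
    text = report.read_text(encoding="utf-8")

    # Path components are available as plain attributes.
    suffix = report.suffix
    name = report.name

print(existed_before, existed_after, suffix, name)
```

Note that read_text and write_text open and close the file internally, so no with statement is needed for these one-shot operations.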

In [ ]:

🧪 Concept Checks: Pathlib

Q1. Import Path from pathlib. Create a Path object for "folder" / "subfolder" / "file.csv". Print it.

In [ ]:

Q2. Create a Path object p = Path("demo.txt"). Use p.write_text("Hello") to create the file.

In [ ]:

Q3. Use p.read_text() to read the file created in Q2 and print it. Then check p.exists().

In [ ]:

Q4. Create a Path for "image.jpg". Print its .suffix and .stem (the name without extension).

In [ ]:

Q5. Use Path.cwd() to get the current working directory. Print it.

In [ ]:

🛠️ Professional Practice Tasks

Theory is useless without muscle memory. Complete these tasks to solidify your understanding.

Task 1 (Log Parser): Create a file server.log with 5 lines: 2 containing 'ERROR', 3 containing 'INFO'. Write a memory-efficient loop to read the file and print ONLY the 'ERROR' lines.

In [ ]:

Task 2 (JSON Config Updater): Write a function update_config(file_path, key, val). It should read a JSON file (or create {} if missing), update the key, and save the JSON back to the file.

In [ ]:

Task 3 (File Extension Counter): Create 3 files: a.txt, b.csv, c.txt in a new folder using Pathlib. Write a function that uses Path.iterdir() to iterate the folder and count how many .txt files exist.

In [ ]:

Task 4 (CSV to JSON): Write a simulated CSV string (e.g., 'id,name\n1,Alice\n2,Bob'). Parse it manually using split('\n') and split(','), convert to a list of dicts, and json.dumps() it.

In [ ]:

Task 5 (Safe File Reader): Write a function read_safe(path) that uses pathlib to check if a file exists. If so, return its text. If not, return None. Test with a valid and invalid path.

In [ ]:

💻 Pure Coding Interview Questions

Q1.

Explain the difference between open('f.txt', 'w') and open('f.txt', 'a').

In [ ]:

Q2.

Why is it essential to use a context manager (with statement) when opening files?

In [ ]:

Q3.

Explain the difference between f.read(), f.readline(), and f.readlines().

In [ ]:

Q4.

How do you read a 50GB file in Python without running out of RAM?

In [ ]:

Q5.

Explain the difference between json.loads() and json.load().

In [ ]:

Q6.

Write code to parse a JSON string, extract a specific field, and handle a json.JSONDecodeError.

In [ ]:

Q7.

Why shouldn't you use regular expressions to parse JSON or HTML?

In [ ]:

Q8.

Compare os.path.join with pathlib's / operator. Why is pathlib preferred in modern Python?

In [ ]:

Q9.

Write a script that uses pathlib to rename all .txt files in a directory to .md.

In [ ]:

Q10.

How do you write a list of dictionaries to a CSV file without using Pandas (using the csv module)?

In [ ]:

Q11.

Explain how character encodings work in Python. Why should you often use encoding='utf-8' in open()?

In [ ]:

Q12.

Write code that safely creates a nested directory structure (e.g., a/b/c) if it doesn't exist.

In [ ]:

Q13.

What is the pickle module? Why is json generally preferred over pickle for data serialization?

In [ ]:

Q14.

Write a generator function that reads a file and yields chunks of 1024 bytes at a time.

In [ ]:

Q15.

Explain the security risks of using yaml.load() or pickle.loads() on untrusted data.

In [ ]:

Q16.

Write code using the tempfile module to create a temporary file, write data, and auto-delete it.

In [ ]:

Q17.

How do you handle file locking in Python if two processes try to write to the same file simultaneously?

In [ ]:

Q18.

Explain what the __file__ variable is and how it's used to find relative asset paths.

In [ ]:

Q19.

Write a function that recursively finds all files larger than 1MB in a directory using pathlib.

In [ ]:

Q20.

How do you handle reading a file that might be locked or currently being written to by another program?

In [ ]:

Q21.

Write code using shutil to copy a file and preserve its metadata.

In [ ]:

Q22.

Explain the purpose of StringIO and BytesIO in the io module. When would you use them?

In [ ]:

Q23.

Write a script that merges 5 different JSON files into a single master JSON file.

In [ ]:

Q24.

How does Pandas read_csv differ fundamentally from the standard library csv.reader?

In [ ]:

Q25.

Write code to extract a ZIP file using the zipfile module or shutil.unpack_archive.

In [ ]:

📊 Day 15 Executive Summary

| # | Topic | Key Takeaway |
|---|-------|--------------|
| 1 | I/O | ALWAYS use context managers (with) |
| 2 | JSON | The bridge between Python dicts and the web |
| 3 | Pathlib | Replaces messy os.path strings |

✅ Instructor's End-of-Day Checklist

• [ ] I can safely open, read, and close a file.

• [ ] I can parse a JSON string into a dictionary.

• [ ] I can use pathlib to construct safe file paths.