Skyscraper

Posted on May 31, 2026

SKY-SCRAPER is a small Flask app plus two processing scripts for turning selected regions of a video into a PDF. It is designed for videos in which a small area of the frame (the control area) changes whenever the content of interest changes. The app lets you pick:

  1. A start time and stop time.
  2. A crop region to extract from each selected frame.
  3. A control region used to detect when a frame is novel.

When processing runs, process.py saves the cropped frames to tmp/, then buildpdf.py compiles those images into out.pdf. See the source code.

How It Works

  1. Open the web UI and load static/vid.mp4.
  2. Choose the start and stop timestamps.
  3. Draw the crop rectangle.
  4. Draw the control rectangle.
  5. Send the selection to the backend.
  6. Run process.py to extract frames.
  7. Run buildpdf.py to pack the extracted images into a PDF.

Project Files

  • main.py: Flask server that serves the UI and writes crop.cfg.
  • templates/index.html: video selection page.
  • static/main.js: front-end interaction logic for timestamps and rectangles.
  • static/style.css: basic page styling.
  • process.py: extracts cropped frames from the configured video range.
  • buildpdf.py: converts extracted JPEGs into out.pdf.
  • crop.cfg: generated selection data.
  • tmp/: generated cropped frame images.

Requirements

  • Python 3.14+
  • Flask
  • OpenCV
  • Pillow
  • tqdm

Install dependencies with your preferred tool, for example uv sync if you are using uv.

Usage

1. Prepare the video

Put the source video at static/vid.mp4.

2. Start the web app

Run:

python main.py

Open the app in your browser, select the start/stop times, crop area, and control area, then click Send.

This writes the current selection to crop.cfg.
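The on-disk format of crop.cfg is not described here, so the following is only a sketch of how such a selection could be saved and read back. It assumes a JSON layout; the key names (start, stop, crop, control), the [x, y, width, height] rectangle order, and the helper names save_selection and load_selection are all hypothetical and may not match what main.py actually writes.

```python
import json
from pathlib import Path


def save_selection(path, start, stop, crop, control):
    """Write the selection as JSON (assumed format, not necessarily main.py's)."""
    selection = {
        "start": start,      # start time in seconds
        "stop": stop,        # stop time in seconds
        "crop": crop,        # [x, y, width, height] in video pixels
        "control": control,  # [x, y, width, height] in video pixels
    }
    Path(path).write_text(json.dumps(selection, indent=2))


def load_selection(path):
    """Read the selection back and check that the time range is sane."""
    selection = json.loads(Path(path).read_text())
    if selection["start"] >= selection["stop"]:
        raise ValueError("start must be earlier than stop")
    return selection
```

Validating the time range on load means a bad selection fails immediately rather than partway through frame extraction.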

3. Extract frames

Run:

python process.py

This will:

  • Clear and recreate tmp/.
  • Read crop.cfg.
  • Scan the video between the selected start and stop times.
  • Save cropped frames only when the control region is considered novel.
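The novelty check in the last step can be sketched in plain Python. This is an illustration under assumptions, not the code in process.py: the similarity measure used here (one minus the normalised mean absolute difference of grayscale values) and the choice to compare each frame against the last kept frame are both guesses, and the function names are hypothetical. Only the threshold value comes from the project.

```python
SIMILARITY_THRESHOLD = 0.97  # value used by process.py according to the Notes


def similarity(a, b):
    """Return 1.0 for identical regions and 0.0 for maximally different ones.

    a and b are equal-length sequences of 8-bit grayscale values
    (a flattened control region). The metric itself is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("regions must have the same size")
    if not a:
        return 1.0
    total_diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - total_diff / (255 * len(a))


def select_novel(control_regions, threshold=SIMILARITY_THRESHOLD):
    """Return indices of frames whose control region differs enough
    from the control region of the most recently kept frame."""
    kept = []
    reference = None
    for index, region in enumerate(control_regions):
        if reference is None or similarity(region, reference) < threshold:
            kept.append(index)
            reference = region  # later frames are compared against this one
    return kept
```

For example, given five control regions where the second repeats the first and the fourth is almost identical to the third, only the first, third, and fifth frames would be kept. Comparing against the last kept frame rather than the previous frame avoids missing a slow transition in which each individual step stays above the threshold.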

4. Build the PDF

Run:

python buildpdf.py

This creates out.pdf from the images in tmp/.
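To illustrate how images might be arranged, here is a minimal layout sketch. It only computes page groupings and cell positions and does not write a PDF. The single-column, six-row grid, the 20-point margin, and the function names are assumptions. The project states only that up to 6 images go on each A4 page.

```python
A4_WIDTH_PT, A4_HEIGHT_PT = 595, 842  # A4 at 72 points per inch, rounded


def paginate(items, per_page=6):
    """Split a list of image paths into groups of at most per_page."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


def cell_boxes(count, rows=6, cols=1, margin=20,
               page_w=A4_WIDTH_PT, page_h=A4_HEIGHT_PT):
    """Return (left, top, width, height) for the first `count` grid cells,
    filled row by row inside the page margins."""
    if count > rows * cols:
        raise ValueError("too many images for one page")
    cell_w = (page_w - 2 * margin) / cols
    cell_h = (page_h - 2 * margin) / rows
    boxes = []
    for i in range(count):
        row, col = divmod(i, cols)
        boxes.append((margin + col * cell_w, margin + row * cell_h,
                      cell_w, cell_h))
    return boxes
```

With 14 extracted frames, paginate yields three pages holding 6, 6, and 2 images. Each image would then be scaled to fit its cell, preserving aspect ratio, before being pasted onto the page.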

Notes

  • process.py uses a similarity threshold (SIMILARITY_THRESHOLD = 0.97) to decide whether the control region of a frame is novel.
  • buildpdf.py currently places up to 6 images per A4 page.
  • The selected rectangles are stored in video pixel coordinates, not canvas coordinates.
  • The workflow is intentionally manual because fully automatic extraction is unreliable across arbitrary videos.
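The note about coordinates implies a conversion from the on-screen canvas to the video's native resolution. In the app this would happen in the front end (static/main.js). The sketch below shows the same arithmetic in Python and is not taken from the project. It assumes the canvas covers the whole video frame with no letterboxing, and the function name canvas_to_video is hypothetical.

```python
def canvas_to_video(rect, canvas_size, video_size):
    """Scale an (x, y, width, height) rectangle from canvas pixels
    to video pixels.

    Assumes the canvas shows the full video frame with no letterboxing.
    """
    x, y, w, h = rect
    canvas_w, canvas_h = canvas_size
    video_w, video_h = video_size
    scale_x = video_w / canvas_w
    scale_y = video_h / canvas_h
    return (round(x * scale_x), round(y * scale_y),
            round(w * scale_x), round(h * scale_y))
```

For a 1920x1080 video shown on a 960x540 canvas, a rectangle drawn at (100, 50) with size 200x100 maps to (200, 100) with size 400x200 in video pixels. Storing video coordinates means the extraction step does not need to know how large the video appeared in the browser.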

Outputs

  • crop.cfg: selected timestamps and rectangles.
  • tmp/frame-*.jpg: cropped frames.
  • out.pdf: final compiled document.

Acknowledgments

This project began while I was practising music from YouTube videos in which authors often display sheet music alongside the lesson (example). Many of those authors do not distribute the sheet music separately for free, which is understandable, and I’m grateful to them for sharing their tutorials. I created this tool out of curiosity and as an educational exercise, and definitely not to redistribute copyrighted material. I wanted to see whether automated extraction could be practical, and how it compares to manually cropping and assembling sheets into a PDF. In many cases, manual cropping remains superior: the tool requires parameter tuning and does not work reliably for all videos (example). Please use it responsibly and respect the original authors’ copyrights.

License

MIT