Skyscraper
SKY-SCRAPER is a small Flask app plus two processing scripts for turning selected regions of a video into a PDF. It is designed for videos where a small control area changes when the content changes. The app lets you pick:
- A start time and stop time.
- A crop region to extract from each selected frame.
- A control region used to detect when a frame is novel.
When processing runs, process.py saves the cropped frames to tmp/, then buildpdf.py compiles those images into out.pdf. See the source code.
How It Works
- Open the web UI and load static/vid.mp4.
- Choose the start and stop timestamps.
- Draw the crop rectangle.
- Draw the control rectangle.
- Send the selection to the backend.
- Run process.py to extract frames.
- Run buildpdf.py to pack the extracted images into a PDF.
Project Files
- main.py: Flask server that serves the UI and writes crop.cfg.
- templates/index.html: video selection page.
- static/main.js: front-end interaction logic for timestamps and rectangles.
- static/style.css: basic page styling.
- process.py: extracts cropped frames from the configured video range.
- buildpdf.py: converts extracted JPEGs into out.pdf.
- crop.cfg: generated selection data.
- tmp/: generated cropped frame images.
Requirements
- Python 3.14+
- Flask
- OpenCV
- Pillow
- tqdm
Install dependencies with your preferred tool, for example uv sync if you are using uv.
Usage
1. Prepare the video
Put the source video at static/vid.mp4.
2. Start the web app
Run:
python main.py
Open the app in your browser, select the start/stop times, crop area, and control area, then click Send.
This writes the current selection to crop.cfg.
3. Extract frames
Run:
python process.py
This will:
- Clear and recreate tmp/.
- Read crop.cfg.
- Scan the video between the selected start and stop times.
- Save cropped frames only when the control region is considered novel.
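The novelty check in the last step can be illustrated with a minimal sketch. This is not the code in process.py: the similarity metric (one minus the mean absolute difference of 8-bit grayscale pixels), the choice to compare against the last kept frame, and the function names similarity and select_novel are all assumptions made for illustration. Only the threshold value 0.97 comes from this README.

```python
SIMILARITY_THRESHOLD = 0.97  # value quoted in the Notes section of this README


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the two regions are identical.

    Assumed metric: 1 - mean absolute difference of 8-bit grayscale pixels.
    Each region is a flat sequence of integers in the range 0..255.
    """
    if len(a) != len(b) or not a:
        raise ValueError("regions must be non-empty and the same size")
    total_diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - total_diff / (255 * len(a))


def select_novel(control_regions, threshold=SIMILARITY_THRESHOLD):
    """Return the indices of frames whose control region counts as novel.

    A frame is novel when its control region scores below `threshold`
    against the control region of the last frame that was kept.
    """
    kept = []
    last = None
    for index, region in enumerate(control_regions):
        if last is None or similarity(region, last) < threshold:
            kept.append(index)
            last = region
    return kept


# Four tiny 4-pixel "control regions": frame 1 is nearly identical to
# frame 0, frame 2 changes a lot, frame 3 repeats frame 2.
frames = [[0, 0, 0, 0], [0, 0, 0, 1], [255, 255, 0, 0], [255, 255, 0, 0]]
print(select_novel(frames))  # [0, 2]
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow gradual changes from slipping through unnoticed, at the cost of occasionally saving a frame during a long fade.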
4. Build the PDF
Run:
python buildpdf.py
This creates out.pdf from the images in tmp/.
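The Notes section mentions that buildpdf.py places up to 6 images per A4 page. The sketch below shows one way such a layout could be computed. It is an assumption throughout: the single-column stack of full-width rows, the 20-point margin, and the helper names paginate and slot_boxes are hypothetical and may not match what buildpdf.py does.

```python
A4_WIDTH_PT, A4_HEIGHT_PT = 595, 842  # A4 size in PostScript points, rounded
IMAGES_PER_PAGE = 6                   # "up to 6 images per A4 page"


def paginate(paths, per_page=IMAGES_PER_PAGE):
    """Split a list of image paths into consecutive groups of at most `per_page`."""
    return [paths[i:i + per_page] for i in range(0, len(paths), per_page)]


def slot_boxes(count, per_page=IMAGES_PER_PAGE,
               page_w=A4_WIDTH_PT, page_h=A4_HEIGHT_PT, margin=20):
    """Return (x, y, width, height) boxes for `count` rows stacked top-down.

    The row height is always derived from `per_page`, so a partially
    filled last page uses the same row height as a full page.
    """
    if not 0 <= count <= per_page:
        raise ValueError("count must be between 0 and per_page")
    slot_w = page_w - 2 * margin
    slot_h = (page_h - 2 * margin) / per_page
    return [(margin, margin + row * slot_h, slot_w, slot_h) for row in range(count)]


# Hypothetical file names following the tmp/frame-*.jpg pattern.
names = [f"tmp/frame-{i}.jpg" for i in range(14)]
for page_number, page in enumerate(paginate(names), start=1):
    print(page_number, len(page), slot_boxes(len(page))[0])
```

Each image would then be scaled to fit inside its box while preserving its aspect ratio.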
Notes
- process.py uses a similarity threshold (SIMILARITY_THRESHOLD = 0.97) to decide whether a control frame is new.
- buildpdf.py currently places up to 6 images per A4 page.
- The selected rectangles are stored in video pixel coordinates, not canvas coordinates.
- The workflow is intentionally manual because fully automatic extraction is unreliable across arbitrary videos.
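The note about video pixel coordinates implies a conversion from the on-screen canvas to the video's native resolution. The sketch below shows that scaling in Python for consistency with the processing scripts; the real conversion presumably lives in static/main.js. The function name canvas_rect_to_video is hypothetical, and the sketch assumes the canvas covers the whole video frame with no letterboxing.

```python
def canvas_rect_to_video(rect, canvas_size, video_size):
    """Scale an (x, y, w, h) rectangle from canvas pixels to video pixels.

    Assumes the canvas shows the entire video frame, so each axis is
    scaled by the ratio of video size to canvas size.
    """
    x, y, w, h = rect
    canvas_w, canvas_h = canvas_size
    video_w, video_h = video_size
    scale_x = video_w / canvas_w
    scale_y = video_h / canvas_h
    return (round(x * scale_x), round(y * scale_y),
            round(w * scale_x), round(h * scale_y))


# A rectangle drawn on a 960x540 canvas displaying a 1920x1080 video.
print(canvas_rect_to_video((100, 50, 320, 90), (960, 540), (1920, 1080)))
# (200, 100, 640, 180)
```

Storing rectangles in video pixels means the saved selection stays valid regardless of how large the browser renders the video.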
Outputs
- crop.cfg: selected timestamps and rectangles.
- tmp/frame-*.jpg: cropped frames.
- out.pdf: final compiled document.
Acknowledgments
This project began while I was practising music from YouTube videos where authors often display sheet music alongside the lesson (example). Many of those authors do not distribute the sheet music separately for free, which is understandable. I’m grateful to them for sharing their tutorials. I created this tool out of curiosity and as an educational exercise, and definitely not to redistribute copyrighted material. I wanted to see whether automated extraction could be practical, and how it compares to manually cropping and assembling sheets into a PDF. For many cases, manual cropping remains superior: the tool requires parameter tuning, and doesn’t work reliably for all videos (example). Please use responsibly and respect the original authors’ copyrights.
License
MIT