Semantic Video

This project is designed to enhance course notes by semantically linking them with text extracted from video frames of a professor teaching slides. The system uses Optical Character Recognition (OCR) to extract text from the slides in the video, processes the extracted content, and integrates it into the course notes by matching the slide content to relevant sections.

Features

  • Extracts text from video frames of teaching sessions at specified intervals.
  • Processes extracted text and semantically matches it to existing course notes.
  • Uses frame comparison to skip redundant frames with minimal differences.
  • Utilizes Tesseract OCR for text extraction and OpenCV for frame processing.
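The frame-skipping idea above can be sketched as a simple pixel-difference check. This is a minimal illustration, not the project's actual code (which lives in scripts/video_text_extractor.py); the function name and threshold are hypothetical:

```python
import numpy as np

def frames_differ(prev_gray, curr_gray, threshold=0.02):
    """Return True if two grayscale frames differ enough to be worth OCR'ing.

    Compares the mean absolute pixel difference (normalized to 0..1)
    against `threshold`; near-identical frames are skipped as redundant.
    """
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return diff.mean() / 255.0 > threshold
```

In the real pipeline, frames would come from `cv2.VideoCapture` and frames that pass this check would be handed to Tesseract (e.g. via `pytesseract.image_to_string`).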

Requirements

  • Python 3.6 or higher
  • Required Python libraries (listed in requirements.txt)

Setup Instructions

1. Clone the Repository

Start by cloning the repository to your local machine and entering the project directory:

  • git clone https://github.com/KWARC/semantic-video.git
  • cd semantic-video

2. Create a Virtual Environment

Create a virtual environment to isolate the project dependencies:

  • python3 -m venv venv

Activate the virtual environment:

  • macOS/Linux: source venv/bin/activate

  • Windows: venv\Scripts\activate

3. Install Dependencies

  • pip install -r requirements.txt

4. Set the Environment Variables

  • Create a .env.local file in the root directory of your project and add the following lines:

VIDEOS_DIR=./data/videos/{course_id}  # Replace {course_id} with the actual course ID
OCR_EXTRACTED_FILE_PATH=./data/cache/{course_id}_extracted_contents.json  # File path for the cache JSON file (adjust file name accordingly)
PROCESSED_SLIDES_FILE_PATH=./data/slides/{course_id}_processed_slides.json  # File path for the processed slides JSON file (adjust file name accordingly)
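For illustration, the .env.local format above (KEY=VALUE lines with `#` comments) can be parsed with a few lines of Python. This is a hypothetical helper, not the project's code; the project may well use a library such as python-dotenv instead:

```python
from pathlib import Path

def load_env_file(path=".env.local"):
    """Parse a simple .env file: KEY=VALUE per line, '#' starts a comment.

    Returns a dict mapping variable names to their string values.
    """
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            env[key.strip()] = value.strip()
    return env
```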

5. Run the script

  • python scripts/video_text_extractor.py or
  • python3 scripts/video_text_extractor.py

Customize frame interval (optional)

The script processes frames at 10-second intervals by default. You can change the interval by modifying the following line in main.py:

  • interval_frames = int(fps * 10) # Change 10 to your desired interval (in seconds)
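The line above converts a time interval in seconds into a frame step using the video's frame rate. A small standalone sketch of that computation (the function wrapper is illustrative, not from the project):

```python
def interval_frames(fps, interval_seconds=10):
    """Number of frames between two processed frames.

    E.g. at 30 fps, a 10-second interval means every 300th frame is processed.
    """
    return int(fps * interval_seconds)
```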

Matching algorithm configuration (optional)

The slide matcher uses an enhanced algorithm for higher accuracy. You can tune it via environment variables in .env.local:

| Variable                    | Default | Description                                                        |
|-----------------------------|---------|--------------------------------------------------------------------|
| MATCH_MIN_OCR_LENGTH        | 50      | Minimum OCR text length (chars) to attempt matching                |
| MATCH_LOW_THRESHOLD         | 55      | Minimum similarity score to accept a match                         |
| MATCH_SHORT_SLIDE_THRESHOLD | 85      | Higher threshold for slides with <100 chars                        |
| MATCH_SEQUENTIAL_WINDOW     | 15      | Slides ±N from last match get priority (videos show slides in order) |
| MATCH_SEQUENTIAL_BOOST      | 5       | Extra score points when match is in expected sequential range      |

For higher accuracy, try lowering MATCH_LOW_THRESHOLD (e.g. 50) if you get many unmatched entries, or raising it if you get wrong matches.
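The acceptance logic these variables imply can be sketched as follows. This is a hypothetical reconstruction based only on the descriptions above, not the project's actual matcher:

```python
def accept_match(score, slide_index, slide_chars, last_matched_index,
                 low=55, short_threshold=85, window=15, boost=5):
    """Decide whether a similarity score is good enough to accept.

    Applies a sequential boost when the candidate slide is within
    `window` slides of the last match (slides usually appear in order),
    then compares against the threshold appropriate for the slide length.
    """
    if last_matched_index is not None and abs(slide_index - last_matched_index) <= window:
        score += boost  # reward matches in the expected sequential range
    threshold = short_threshold if slide_chars < 100 else low
    return score >= threshold
```

Under this sketch, lowering `low` (MATCH_LOW_THRESHOLD) accepts more borderline matches, while raising it rejects them, matching the tuning advice above.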

License

This project is licensed under the MIT License.
