Blade 47.3%
PHP 44.4%
Python 3.9%
Dockerfile 3.6%
Shell 0.5%
Other 0.2%

unurled c84e9b4188	Merge pull request 'fix UTC in ICS export' (#6) from dev into main. Reviewed-on: #6	2026-02-24 15:26:18 +01:00
.devcontainer	add sail	2025-11-27 12:36:43 +01:00
.github	Update build.yml	2026-01-03 18:39:17 +01:00
.junie/mcp	Initial commit	2025-11-27 11:12:56 +01:00
app	fix UTC in ICS export	2026-02-24 15:00:23 +01:00
bootstrap	add cronimage	2026-01-14 14:23:32 +01:00
config	fix ics export	2026-01-14 12:21:11 +01:00
database	feat: add functionality to hide/show events for users	2025-12-22 16:44:15 +01:00
docker	relpace crond for suercronic	2026-01-14 15:56:43 +01:00
docs	multiple fixes	2025-12-08 12:41:54 +01:00
lang	feat: implement localization for redirect links management and platform messages	2025-12-22 17:13:13 +01:00
public	last minutes fix	2025-12-08 13:09:37 +01:00
resources	admin course editing	2026-02-09 13:25:53 +01:00
routes	admin course editing	2026-02-09 13:25:53 +01:00
storage	Initial commit	2025-11-27 11:12:56 +01:00
tests	feat: add user creation and editing modals with validation	2025-12-22 21:33:15 +01:00
.dockerignore	another fix?	2026-01-14 14:30:11 +01:00
.editorconfig	Initial commit	2025-11-27 11:12:56 +01:00
.env.example	feat: add umami tracking	2025-12-11 14:25:20 +01:00
.gitattributes	Initial commit	2025-11-27 11:12:56 +01:00
.gitignore	fix ics export	2026-01-14 12:21:11 +01:00
artisan	Initial commit	2025-11-27 11:12:56 +01:00
boost.json	fix ics export	2026-01-14 12:21:11 +01:00
CLAUDE.md	fix ics export	2026-01-14 12:21:11 +01:00
compose.dev.yaml	fix: remove all mentions of email verification	2025-12-22 16:08:14 +01:00
compose.yaml	add cronimage	2026-01-14 14:23:32 +01:00
composer.json	fix ics export	2026-01-14 12:21:11 +01:00
composer.lock	fix ics export	2026-01-14 12:21:11 +01:00
Dockerfile	Update Dockerfile	2025-12-23 08:10:41 +00:00
main.py	import ics and import pdf	2025-11-28 14:28:59 +01:00
package-lock.json	Initial commit	2025-11-27 11:12:56 +01:00
package.json	add docker images	2025-12-03 19:17:46 +01:00
phpunit.xml	Initial commit	2025-11-27 11:12:56 +01:00
pyproject.toml	import ics and import pdf	2025-11-28 14:28:59 +01:00
README.md	import ics and import pdf	2025-11-28 14:28:59 +01:00
start-container.sh	fix: http 500	2025-12-11 15:46:18 +01:00
uv.lock	import ics and import pdf	2025-11-28 14:28:59 +01:00
vite.config.js	Initial commit	2025-11-27 11:12:56 +01:00

README.md

EDT OCR - PDF Timetable Extractor

A Python tool to extract structured timetable data from PDF files, specifically designed for course schedules with weekly layouts.

Features

Automatic table extraction from PDF files
Intelligent parsing of course and professor information
Structured output organized by day, time slot, and week
Multiple export formats: CSV, JSON, and ICS (iCalendar)
Flexible extraction: automatic detection or manual coordinate specification
Calendar integration: Export to ICS for Google Calendar, Outlook, Apple Calendar
Pretty console output for quick review

Timetable Format

This tool is designed for timetables with the following structure:

Columns: Weeks
Rows: Days of the week
Time slots:
- Morning: 8:30 - 12:15
- Afternoon: 13:30 - 17:15
Cell content: Course name + Professor name
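
The layout above could be flattened into one entry per non-empty cell along these lines. This is a minimal sketch, not the code in main.py: it assumes the extracted table is a list of rows whose first row holds the week labels, that each day spans one row per time slot, and that the day name sits in the first cell of the day's first row. The name grid_to_entries is hypothetical.

```python
TIME_SLOTS = ["Morning (8:30-12:15)", "Afternoon (13:30-17:15)"]


def grid_to_entries(table, time_slots=TIME_SLOTS):
    """Flatten a weeks-as-columns, days-as-rows grid into a list of dicts.

    Assumed layout (hypothetical): table[0] is the header row with week
    labels; each day spans len(time_slots) consecutive rows, with the day
    name in the first cell of the day's first row.
    """
    header, *rows = table
    weeks = header[1:]
    entries = []
    day = None
    slot_index = 0
    for row in rows:
        if row[0]:  # a non-empty first cell starts a new day
            day = row[0].strip()
            slot_index = 0
        slot = time_slots[slot_index % len(time_slots)]
        slot_index += 1
        for week, cell in zip(weeks, row[1:]):
            if not cell or not cell.strip():
                continue  # no class scheduled in this cell
            entries.append(
                {"day": day, "time_slot": slot, "week": week, "content": cell.strip()}
            )
    return entries
```

Each entry still carries the raw cell text in content; splitting it into course and professor is a separate step.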

Installation

Ensure you have Python 3.13+ installed.
Install the dependencies (icalendar is optional and only needed for ICS export):

pip install pdfplumber icalendar

Or using uv:

uv pip install pdfplumber icalendar

Usage

Basic Usage

Extract and parse timetable from PDF:

python main.py FIP1A_EDT_2025_2026-v12112025.pdf

This will:

Extract the table from the first page
Parse course and professor information
Display organized entries by day and time slot

Specific Page

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --page 0

View Raw Table

To see the raw extracted table without parsing:

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --raw

Limit displayed rows:

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --raw --max-rows 20

Extract from Specific Coordinates

If automatic detection doesn't work well, specify the exact table position:

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --page 0 --x 50 --y 100 --width 500 --height 600

Coordinates explained:

--x: X coordinate of the top-left corner (in PDF points)
--y: Y coordinate of the top-left corner (in PDF points)
--width: Width of the table region
--height: Height of the table region
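
pdfplumber describes regions as (x0, top, x1, bottom) measured from the top-left corner of the page, so the four options above would need to be converted before cropping. The helper below is a hypothetical sketch of that conversion; whether main.py does it this way is an assumption.

```python
def region_to_bbox(x, y, width, height):
    """Convert a top-left corner plus size into (x0, top, x1, bottom)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (x, y, x + width, y + height)


# A bounding box in this form can be passed to pdfplumber's Page.crop(), e.g.:
#   import pdfplumber
#   with pdfplumber.open("timetable.pdf") as pdf:
#       region = pdf.pages[0].crop(region_to_bbox(50, 100, 500, 600))
#       table = region.extract_table()
print(region_to_bbox(50, 100, 500, 600))  # (50, 100, 550, 700)
```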

Export to CSV

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --output timetable.csv

The CSV file includes these columns:

day: Day of the week
time_slot: Morning or Afternoon with time range
week: Week identifier
course: Course name
professor: Professor name
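
For reference, rows with these columns can be written with the standard csv module. This is a sketch under the assumption that parsed entries are plain dicts keyed by the column names; entries_to_csv is a hypothetical name, not a function from main.py.

```python
import csv
import io

FIELDS = ["day", "time_slot", "week", "course", "professor"]


def entries_to_csv(entries):
    """Serialize a list of entry dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()
```

csv.DictWriter quotes fields automatically, so a course name containing a comma still round-trips correctly.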

Export to JSON

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --output timetable.json

JSON format provides structured data with the same fields as CSV.
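
The exact JSON layout is not shown here. One plausible shape, assumed for this sketch, is a flat list of objects with the five fields; entries_to_json is a hypothetical name.

```python
import json


def entries_to_json(entries):
    """Serialize entry dicts as a JSON array, keeping accented characters readable."""
    return json.dumps(entries, ensure_ascii=False, indent=2)


sample = [
    {
        "day": "Monday",
        "time_slot": "Morning (8:30-12:15)",
        "week": "Week 1",
        "course": "Mathematics",
        "professor": "Prof. Smith",
    }
]
print(entries_to_json(sample))
```

ensure_ascii=False keeps French names such as "Hélène" as written instead of escaping them.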

Export to ICS (iCalendar)

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --output timetable.ics --year 2025

The ICS export creates calendar events that can be imported into:

Google Calendar
Microsoft Outlook
Apple Calendar
Any calendar application supporting iCalendar format

Features:

Events automatically scheduled on correct dates and times
Exam sessions marked with 🎓 EXAM prefix
Professor information included in event description
Special "EXAM" category for filtering exam events
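
To make the listed features concrete, here is a hand-written sketch of a single event, deliberately built without the icalendar package so it runs on the standard library alone. It rests on several assumptions that the README does not confirm: weeks are ISO week numbers within the given year, the weekday is passed as 1 (Monday) to 7 (Sunday), and the caller decides whether an entry is an exam. The name entry_to_vevent is hypothetical.

```python
from datetime import date, datetime, time


def entry_to_vevent(course, professor, year, iso_week, weekday, start, end, is_exam=False):
    """Build the content lines of one simplified VEVENT.

    A complete calendar file also needs the BEGIN:VCALENDAR/END:VCALENDAR
    wrapper plus UID and DTSTAMP on every event, and text values containing
    commas or semicolons must be escaped; all of that is omitted here.
    """
    day = date.fromisocalendar(year, iso_week, weekday)
    fmt = "%Y%m%dT%H%M%S"  # floating local time, no time zone attached
    summary = f"🎓 EXAM {course}" if is_exam else course
    lines = [
        "BEGIN:VEVENT",
        f"DTSTART:{datetime.combine(day, start).strftime(fmt)}",
        f"DTEND:{datetime.combine(day, end).strftime(fmt)}",
        f"SUMMARY:{summary}",
        f"DESCRIPTION:Professor: {professor}",
    ]
    if is_exam:
        lines.append("CATEGORIES:EXAM")
    lines.append("END:VEVENT")
    return lines


print("\n".join(entry_to_vevent("Mathematics", "Prof. Smith", 2025, 47, 1, time(8, 30), time(12, 15))))
```

In practice the icalendar package handles escaping, line folding, and required properties, which is a good reason to prefer it over string building like this.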

Combined Options

python main.py FIP1A_EDT_2025_2026-v12112025.pdf --page 0 --x 50 --y 100 --width 500 --height 600 --output schedule.csv

Output Example

Console Output

=== TIMETABLE ENTRIES ===

Monday:
------------------------------------------------------------
  [Morning (8:30-12:15)] Week 1: Mathematics (Prof. Smith)
  [Afternoon (13:30-17:15)] Week 1: Physics (Prof. Johnson)

Tuesday:
------------------------------------------------------------
  [Morning (8:30-12:15)] Week 1: Chemistry (Prof. Williams)
  [Afternoon (13:30-17:15)] Week 2: Biology (Prof. Brown)
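
A listing in this shape can be produced by grouping entries by day. The following is an illustrative sketch, assuming entries are dicts with the five fields used in the exports; format_entries is a hypothetical name and the separator width is a guess.

```python
from collections import defaultdict


def format_entries(entries):
    """Render entries grouped by day, in first-seen day order."""
    by_day = defaultdict(list)
    for entry in entries:
        by_day[entry["day"]].append(entry)
    out = ["=== TIMETABLE ENTRIES ===", ""]
    for day, items in by_day.items():
        out.append(f"{day}:")
        out.append("-" * 60)
        for e in items:
            out.append(f"  [{e['time_slot']}] {e['week']}: {e['course']} ({e['professor']})")
        out.append("")
    return "\n".join(out)
```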

CSV Output

day,time_slot,week,course,professor
Monday,Morning (8:30-12:15),Week 1,Mathematics,Prof. Smith
Monday,Afternoon (13:30-17:15),Week 1,Physics,Prof. Johnson
Tuesday,Morning (8:30-12:15),Week 1,Chemistry,Prof. Williams

How the Parser Works

The script splits each cell into a course name and a professor name by trying these rules in turn:

Newline separation: If a cell contains multiple lines, the first line is the course and the remaining lines are the professor
Pattern matching: Detects professor names introduced by a title (M., Mme, Dr., etc.) or enclosed in parentheses
Fallback: Uses heuristics based on capitalization and word count
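
The first two rules can be sketched as follows. This is not the actual parse_cell_content() from main.py: the title list and both regular expressions are illustrative assumptions, and the capitalization and word-count heuristics are left out, so anything unmatched ends up whole in the course field.

```python
import re

# Illustrative patterns (assumptions, not taken from main.py).
TITLE_RE = re.compile(r"(?<!\S)(?:M\.|Mme|Dr\.|Prof\.)\s+\S.*$")
PAREN_RE = re.compile(r"\(([^()]+)\)\s*$")


def parse_cell_content(cell):
    """Return (course, professor) for one table cell."""
    text = (cell or "").strip()
    if not text:
        return "", ""
    # Rule 1: first line is the course, remaining lines are the professor.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) > 1:
        return lines[0], " ".join(lines[1:])
    # Rule 2a: professor given in trailing parentheses.
    match = PAREN_RE.search(text)
    if match:
        return text[: match.start()].strip(), match.group(1).strip()
    # Rule 2b: professor introduced by a title somewhere after the course.
    match = TITLE_RE.search(text)
    if match and match.start() > 0:
        return text[: match.start()].strip(), match.group(0).strip()
    # No rule matched: keep the full cell content as the course.
    return text, ""
```

Checking parentheses before titles means a cell like "Physics (Prof. Johnson)" yields the name without the brackets.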

Finding Coordinates

If automatic table detection fails:

Run with --raw to see what's being extracted
Open the PDF in a viewer with coordinate display (Adobe Acrobat, PDF-XChange)
Note the bounding box of your table
Use those coordinates with --x, --y, --width, --height

Alternatively, use trial and error with the coordinate parameters until the table is properly extracted.

Dependencies

Python 3.13+
pdfplumber: PDF processing and table extraction library
icalendar: iCalendar file generation for calendar export (optional, only needed for ICS export)

Project Structure

edt-ocr/
├── main.py                          # Main script with parsing logic
├── pyproject.toml                   # Project configuration and dependencies
├── README.md                        # This file
└── FIP1A_EDT_2025_2026-v12112025.pdf  # Example PDF (your timetable)

Troubleshooting

No tables detected

Try different pages with --page N
Use coordinate-based extraction with --x, --y, --width, --height
Check if the PDF contains actual tables (not images of tables)
Use --raw to see what's being extracted

Course and professor not separated correctly

The parser tries multiple patterns to split course/professor
If it fails, the full cell content will be in the course field
You can manually adjust the parsing logic in the parse_cell_content() function

Incorrect table structure

Use --raw to verify the table structure
Adjust coordinates for more precise extraction
Check if the PDF has the expected structure (weeks as columns, days as rows)

Empty cells or missing data

Some cells might be empty (no class scheduled)
The parser skips empty cells automatically
Merged cells in the PDF might cause parsing issues

Advanced Usage

Customizing Time Slots

If your timetable has different time slots, you can modify the time_slots parameter in the parse_timetable() function:

time_slots = ["Morning (8:00-12:00)", "Afternoon (14:00-18:00)", "Evening (18:00-22:00)"]
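
If custom labels keep the "Name (H:MM-H:MM)" pattern used above, start and end times can be recovered from them, for example when building calendar events. The helper below is a hypothetical sketch; whether main.py derives times from the labels this way is an assumption.

```python
import re
from datetime import time

SLOT_RE = re.compile(r"\((\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})\)")


def slot_times(label):
    """Extract (start, end) as datetime.time objects from a slot label."""
    match = SLOT_RE.search(label)
    if match is None:
        raise ValueError(f"no time range found in {label!r}")
    h1, m1, h2, m2 = map(int, match.groups())
    return time(h1, m1), time(h2, m2)


time_slots = ["Morning (8:00-12:00)", "Afternoon (14:00-18:00)", "Evening (18:00-22:00)"]
for label in time_slots:
    print(label, slot_times(label))
```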

Customizing Day Names

The parser supports both English and French day names. To add more languages, edit the days_of_week list in parse_timetable().
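
One way to handle several languages is a lookup from lowercase day names to a weekday index. The mapping below is a hypothetical alternative to the days_of_week list, shown only to illustrate the idea; the names DAY_ALIASES and weekday_index are not from main.py.

```python
# Lowercase day name -> weekday index (0 = Monday), English and French.
DAY_ALIASES = {
    "monday": 0, "lundi": 0,
    "tuesday": 1, "mardi": 1,
    "wednesday": 2, "mercredi": 2,
    "thursday": 3, "jeudi": 3,
    "friday": 4, "vendredi": 4,
    "saturday": 5, "samedi": 5,
    "sunday": 6, "dimanche": 6,
}


def weekday_index(name):
    """Return 0-6 for a known day name, or None if the name is not recognized."""
    return DAY_ALIASES.get(name.strip().lower())


# Adding another language is a matter of extending the mapping:
DAY_ALIASES.update({"montag": 0, "dienstag": 1})
```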

Using ICS Files

After generating the ICS file:

Google Calendar:
- Go to Google Calendar
- Click the "+" next to "Other calendars"
- Select "Import"
- Upload the .ics file
Outlook:
- File → Open & Export → Import/Export
- Select "Import an iCalendar (.ics) file"
- Choose the file
Apple Calendar:
- Double-click the .ics file
- Or File → Import → select the file
Mobile Devices:
- Email the .ics file to yourself
- Open on mobile and import to your calendar app

License

MIT

Contributing

Feel free to open issues or submit pull requests for improvements!