Read Microsoft PST files with rich schemas for common MAPI types (emails, contacts, appointments, tasks)
Maintainer(s):
mach-kernel
Installing and Loading
INSTALL pst FROM community;
LOAD pst;
Example
-- Count all messages across PST files (supports globs)
D SELECT count(*) FROM read_pst_messages('enron/*.pst');
┌────────────────┐
│ count_star() │
│ int64 │
├────────────────┤
│ 1227193 │
└────────────────┘
-- Query contacts (supports remote URIs)
D SELECT given_name, surname FROM read_pst_contacts('https://example.com/outlook.pst');
┌────────────┬─────────┐
│ given_name │ surname │
│ varchar │ varchar │
├────────────┼─────────┤
│ John │ Doe │
│ Jane │ Smith │
└────────────┴─────────┘
-- Read messages with limit (applied during planning for large files)
D SELECT subject, sender_email_address, message_delivery_time
FROM read_pst_messages('*.pst', read_limit=100);
About pst
A DuckDB extension for reading Microsoft PST files with rich schemas for common MAPI types. Built on Microsoft's official PST SDK. Query emails, contacts, appointments, and more. Use it to analyze PST data in-place (locally or on object storage), import to DuckDB tables, or export to Parquet.
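The import and export workflows mentioned above could look roughly like the following sketch. The glob `mail/*.pst`, the table name `messages`, and the output file `messages.parquet` are placeholders chosen for illustration; the selected columns are the ones used in the example section.

```sql
-- Load the extension (once per session).
LOAD pst;

-- Import: materialize selected message columns into a regular DuckDB table.
-- 'mail/*.pst' and the table name 'messages' are placeholders.
CREATE TABLE messages AS
SELECT subject, sender_email_address, message_delivery_time
FROM read_pst_messages('mail/*.pst');

-- Export: write the same projection straight to a Parquet file
-- without creating an intermediate table.
COPY (
    SELECT subject, sender_email_address, message_delivery_time
    FROM read_pst_messages('mail/*.pst')
) TO 'messages.parquet' (FORMAT parquet);
```

Materializing into a table is likely the better fit when the same data is queried repeatedly, since each call to a `read_pst_*` function goes back to the PST files.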
Table Functions
| Function | MAPI Class | Description |
|---|---|---|
| read_pst_folders | * | Folder hierarchy |
| read_pst_messages | * | All messages with base IPM.Note schema |
| read_pst_notes | IPM.Note | Email messages |
| read_pst_contacts | IPM.Contact | Contacts with 78+ fields |
| read_pst_distribution_lists | IPM.DistList | Distribution lists with members |
| read_pst_appointments | IPM.Appointment | Calendar appointments and meetings |
| read_pst_sticky_notes | IPM.StickyNote | Sticky note items |
| read_pst_tasks | IPM.Task | Task items |
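Because each table function exposes its own schema, one way to explore them is to ask DuckDB to describe a query before selecting specific columns. This is a minimal sketch; `outlook.pst` is a placeholder path, and no column names are assumed.

```sql
-- List the column names and types that read_pst_contacts exposes.
-- 'outlook.pst' is a placeholder path.
DESCRIBE SELECT * FROM read_pst_contacts('outlook.pst');

-- Preview a handful of rows from a type-specific function.
SELECT * FROM read_pst_tasks('outlook.pst') LIMIT 5;
```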
Performance Features
- Query pushdown: projection and statistics pushdown
- Concurrent planning: parallel partition planning for directories with many PST files
- Late materialization: filter on virtual columns before expanding full projections
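To benefit from projection pushdown, a query would typically name only the columns it needs rather than using `SELECT *`. The sketch below also uses DuckDB's `EXPLAIN` to print the plan; the exact plan output depends on the DuckDB and extension versions, so none is shown here. The glob `mail/*.pst` is a placeholder.

```sql
-- Naming only the needed columns gives projection pushdown something
-- to work with, as opposed to SELECT *.
SELECT subject, message_delivery_time
FROM read_pst_messages('mail/*.pst');

-- EXPLAIN prints the query plan without running the query, which is
-- one way to check which columns the scan is asked to produce.
EXPLAIN
SELECT subject, message_delivery_time
FROM read_pst_messages('mail/*.pst');
```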
Parameters
| Parameter | Default | Description |
|---|---|---|
| read_body_size_bytes | 1000000 | Max bytes to read into body/body_html (0 for unlimited) |
| read_attachment_body | false | Whether to read attachment bytes |
| read_limit | NULL | Max items to read (applied during planning) |
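Parameters are passed as named arguments after the path, in the same way as `read_limit=100` in the example section. The following sketch combines them; the glob `mail/*.pst` is a placeholder, and the `body` column name is taken from the parameter description above rather than from the full schema documentation.

```sql
-- Read message bodies without the default size cap (0 = unlimited),
-- while capping the scan at 1000 items.
SELECT subject, body
FROM read_pst_messages('mail/*.pst', read_body_size_bytes=0, read_limit=1000);

-- Opt in to reading attachment bytes, which are skipped by default.
SELECT *
FROM read_pst_messages('mail/*.pst', read_attachment_body=true, read_limit=10);
```

Lifting the body size cap or enabling attachment bytes can increase memory use considerably on large mailboxes, so pairing either with `read_limit` while exploring is a cautious default.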
For full schema documentation and usage examples, see the GitHub repository.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| read_pst_appointments | table | NULL | NULL | |
| read_pst_contacts | table | NULL | NULL | |
| read_pst_distribution_lists | table | NULL | NULL | |
| read_pst_folders | table | NULL | NULL | |
| read_pst_messages | table | NULL | NULL | |
| read_pst_notes | table | NULL | NULL | |
| read_pst_sticky_notes | table | NULL | NULL | |
| read_pst_tasks | table | NULL | NULL | |