2022 MAR 30
UPDATE 2024 DEC 09: This tutorial is incomplete because Internet Explorer has reached end of support. I've given up on my old OneNote notes.
In this article we will explore how to export your data from OneNote into Obsidian using Pandoc and Internet Explorer.
Obsidian is a note-taking app much like OneNote, but with one key difference: all your data is stored locally in Markdown files, while OneNote uses a proprietary format. Due to recent changes in Microsoft's privacy policy, I think it's time to move away from OneNote.
Getting your data out of OneNote is tricky. It can export to PDF, XPS, and MHTML, none of which are great. PDF and XPS are useless in our case: we want to generate Markdown files with Pandoc, which cannot read PDF or XPS files. Instead, we will focus on MHTML.
MHTML is an old and poorly supported format for storing an HTML page and all its dependencies in a single file. As in email, the content is split into MIME parts: text parts and Base64-encoded binary blobs.
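Because MHTML reuses the MIME multipart structure of email, Python's standard `email` package can show how such a file is laid out. The sketch below only illustrates the format and is not part of the conversion workflow in this article. It assumes a tiny hand-written sample; the boundary string, file names and `list_parts` helper are made up, and a real OneNote export is far larger.

```python
from __future__ import annotations

from email import policy
from email.parser import BytesParser

# A tiny, hypothetical MHTML document: one HTML part plus one Base64-encoded image.
# The boundary string and Content-Location values are invented for illustration.
SAMPLE_MHT = b"""MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_example"

------=_NextPart_example
Content-Type: text/html; charset="utf-8"
Content-Location: file:///C:/MyStuff.htm

<html><body><p>Hello</p><img src="image001.png"></body></html>

------=_NextPart_example
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-Location: file:///C:/image001.png

iVBORw0KGgo=

------=_NextPart_example--
"""


def list_parts(raw: bytes) -> list[tuple[str, str | None, int]]:
    """Return (content type, Content-Location, decoded size in bytes) for each leaf part."""
    message = BytesParser(policy=policy.default).parsebytes(raw)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        location = part.get("Content-Location")
        payload = part.get_payload(decode=True) or b""  # Base64 parts are decoded to raw bytes
        parts.append((part.get_content_type(), str(location) if location else None, len(payload)))
    return parts


if __name__ == "__main__":
    for content_type, location, size in list_parts(SAMPLE_MHT):
        print(f"{content_type:10} {size:4} bytes  {location}")
```

The image part here decodes to just the 8-byte PNG signature. In a real export, each embedded picture would show up as one such Base64 part.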
Pandoc does not support MHTML. Internet Explorer does.
Tested with OneNote version 2202 Build 14931.20132
Open OneNote and export your data as Single File Web Page (*.mht).
OneNote should generate a single MHTML file containing your entire notebook, including images.
Open the resulting *.mht file in Internet Explorer.
Press File > Save As... and choose Webpage, complete (*.htm;*.html).
This should create a file called MyStuff.htm and a folder called MyStuff_files.
Convert the resulting HTML file to DOCX using Pandoc:
pandoc MyStuff.htm -o MyStuff.docx
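This step is necessary to recover the media files. The files in MyStuff_files were mangled by Internet Explorer, which unhelpfully renamed them all to names like mht7B30.tmp. Converting to DOCX embeds the images in the document itself, so Pandoc can extract them again in the next step.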
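Before running the final conversion, you may want to confirm that the images actually made it into MyStuff.docx. A DOCX file is a ZIP archive, and Word-style documents normally keep embedded images under `word/media/`; this sketch assumes Pandoc's DOCX output follows that layout. It uses only the standard library, and the `list_docx_media` name is hypothetical.

```python
from __future__ import annotations

import sys
import zipfile


def list_docx_media(path: str) -> list[str]:
    """Return the names of entries stored under word/media/ in a .docx (ZIP) file."""
    with zipfile.ZipFile(path) as archive:
        return [name for name in archive.namelist() if name.startswith("word/media/")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_media.py MyStuff.docx
    for name in list_docx_media(sys.argv[1]):
        print(name)
```

If the list comes back empty, the images were probably not picked up from MyStuff_files during the HTML to DOCX step.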
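Finally, convert the DOCX file to Markdown with Pandoc, extracting the images into a folder called Attachments:
pandoc MyStuff.docx -o MyStuff.md -t markdown_strict --extract-media=Attachments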
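If you have several notebooks to convert, the two Pandoc steps can be scripted. This is a minimal sketch, assuming `pandoc` is on your PATH and that each notebook has already been saved from Internet Explorer as `<name>.htm`. The helper names `pandoc_commands` and `convert` are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def pandoc_commands(stem: str, media_dir: str = "Attachments") -> list[list[str]]:
    """Build the two Pandoc invocations from this article for one notebook."""
    return [
        # Step 1: HTML saved by Internet Explorer -> DOCX (embeds the images)
        ["pandoc", f"{stem}.htm", "-o", f"{stem}.docx"],
        # Step 2: DOCX -> strict Markdown, extracting images into media_dir
        ["pandoc", f"{stem}.docx", "-o", f"{stem}.md",
         "-t", "markdown_strict", f"--extract-media={media_dir}"],
    ]


def convert(stem: str, media_dir: str = "Attachments") -> None:
    """Run both steps in order; check=True raises CalledProcessError if Pandoc fails."""
    for command in pandoc_commands(stem, media_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Usage: python convert.py MyStuff OtherNotebook
    for stem in sys.argv[1:]:
        convert(stem)
```

Running `python convert.py MyStuff` should be equivalent to typing the two commands by hand. Using `check=True` stops at the first failed conversion instead of carrying on with a missing DOCX file. By default every notebook shares the Attachments folder, so passing a separate `media_dir` per notebook may help avoid file-name clashes.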