Mark Koester

Personal Blog. A Data-Driven Life. Minding the Interstitial Spaces since 2007.

Paper to Paperless: A Guide to Digitalizing Your Journals With a Scanner App

One of the most effective and powerful tools for taking notes, capturing memories and creativity in general is a physical notebook or journal. Research has shown that writing by hand improves retention and learning and even benefits our imagination. While typing and digital photos might be the main mode most of us work in, I am increasingly convinced of the utility of a hand-written journal in my own life. Furthermore, it’s a key creative habit that all creatives, from writers and artists to programmers and data scientists, should consider.

But what’s to be done with the notebook afterwards? Once we are done with it, should we throw it away, stow it in a box or on a shelf, or might we do something else with it? Perhaps we can preserve it in another way? Perhaps we can take our paper journals and digitalize them?

While you may not want to go to the lengths of Microsoft engineer Gordon Bell, who scanned and archived almost every single aspect of his life, I’d argue that we can and should take our paper notebooks and make them digital. Unlike paper, digital files won’t mold and degrade. They can be easily stored in the cloud or in a digital backup so they can last indefinitely. With digital files in the cloud, you can access them anywhere with an internet connection and share them too. Digital files are easy to reorganize, so you can mix and match them. And by adding some textual annotations or a table of contents, you can quickly find pages from a certain date or on a particular topic.

Ultimately the principal benefits of a paper journal are in the moment. You capture things when and where you are. For example, you can read with a pen in hand, which allows you to take better or smarter notes. You might also make a sketch, note keywords, write down quotes, create lists, or make outlines. These are all powerful tools for learning and creating with a physical notebook, but there is benefit in digital files too. In most cases your final production or output will be digital, so it’s great to have your paper notebooks incorporated into your digital archive.

Paper and pen are powerful tools, but eventually you want to go digital and live paperless.

Fortunately, by using either an actual flat-bed scanner or, as I’ll show here, a scanner app on your phone, it’s easy to scan and digitalize your paper journals as well as photos, letters and anything else on paper. For example, you can digitalize an entire paper journal of 100+ pages in 20 or 30 minutes. Additionally, with a couple of extra steps you can add annotations or append a table of contents to keep things better organized and link paper pages to your digital doings.

In this post, I want to share a small slice of how to digitalize a life. Unlike Gordon Bell and his epic quest to digitalize everything, our target is a bit more modest: scanning and digitalizing our journals and notebooks. In the first part, I provide an overview of the tools and techniques for scanning with either a flat-bed scanner or an app, as well as how to add a table of contents. After that, in the second part, I’ll share a quick example of scanning one of my own notebooks and tell you just how long it took. Finally, I’ll conclude with a few general thoughts on the benefits of digitalizing your notebooks, lifelogging and incorporating them in your digital archive.


NOTE: This post covers the practical aspects of digitalizing a paper notebook or journal. For a bit of history and background, see my previous post, Gordon Bell and The Epic Quest to Digitalize Everything.

A Modest How-To Guide for Going Paperless

The end goal of digitalizing a journal is to have a single file showing an image of each physical page and a table of contents that lets you look up past entries quickly.

There are a few tools you’ll need for digitalizing a life in paper:

  • Scanning: Flat-bed Scanner or Scanner App
  • TOC / Annotations: PDF Editing Tool like Acrobat or Preview and/or some basic coding skills.

Physical Scanners: Highest Quality Way to Go Digital

A flat-bed scanner is arguably the best way to turn physical journals (or photos and letters for that matter) into digital documents. This was the method used by Gordon Bell in his own digitalization project.

Today there are even some scanners that can be loaded with multiple photos, papers or a book and automatically scan multiple pages at once. This was how Google managed to digitalize many public domain books too.

Ultimately, an actual scanner will provide the best fidelity in capturing physical objects digitally. For example, my father has been using a scanner in his own archive project, digitalizing photos and documents from his life (including his first grade report card!).

Though there is a wide range of prices and features, you can generally buy a good photo and document scanner for around $100 USD. Check out PCMag or Best Reviews for some reviews of the most popular models.

Unfortunately, the process of scanning and digitalizing with a physical scanner can be rather tedious and time consuming. You must load the document, paper or photo into the scanner and run the scan, which can take a few minutes. Then on the computer, you will likely need to use some software to correct the final images too. Auto-corrections for contrast and straightening work well in most cases, but they take time.

One thing to also note about physical scanners is that you need to check and clean them periodically. Smudges and dust can build up and degrade your scan quality.

If your objective is to get a digital version of your notebooks that can be printed at high quality, then a physical scanner is the way to go. But if you are merely looking for readability and general accuracy for personal usage in your scans, then an app might be a faster method to try.

Scanner Apps: The Quick and Efficient Way to Go Paperless

If you want to digitalize your notebooks quickly and with less effort, your best bet is arguably a scanner app. A scanner app uses software either to optimize digital images you’ve already taken or to turn your mobile phone’s camera into a scanner. Basically, it improves the photos you take so they are sharper and flatter and can be converted into documents.

Like photography and camera apps in general, scanner apps are one of the more popular categories in iOS and Android app stores. There are dozens, perhaps hundreds of apps to choose from. Some of the top options are Scanbot, Adobe Scan, Microsoft’s Office Lens, CamScanner, Tiny Scanner, and Google’s PhotoScan. I have tried all of them, and they all worked fine.

Most of them work well for general scanning needs and come with various features and sharing options. Unfortunately, Scanbot Pro and Adobe Scan both require a subscription now, so they likely aren’t a good choice for casual or budget-conscious users.

Genius Scan: My Preferred Document Scanner App

Based on my own testing and extensive usage on iOS, my personal pick among scanner apps is Genius Scan, which is also available for Android.

Instantly scan documents - Genius Scan smart algorithms automatically detect your document, apply perspective correction and enhance the colors.

Source: https://www.thegrizzlylabs.com/genius-scan

Genius Scan does a great job of automatically detecting document edges while you take photos using the camera or while directly editing static images. The app then resizes and positions the images accordingly to create a flat, document view. You can apply appropriate filters so docs are high contrast and black and white (or alternatively leave them as colored photos). You can also manually edit scans to correct misapplied edge detection, use an alternative color filter, or make other corrections.

You can take single pics or use multiple pictures to create a single scanned document. This feature works really well and makes it time-efficient if you want to scan a book, a series of individual pages, or even multiple receipts from a trip in a single session.

Once you’ve scanned and edited the documents to your liking, Genius Scan lets you export the entire document as a PDF and put it where you want it. For example, you can share it by email, save it to cloud storage like Dropbox or Google Drive, or store it locally on the device. You can also select individual scans and export them as JPEGs.

For general usage, open up the app and tap the PLUS button and start scanning pages using the camera. It should detect edges and start snapping pics for each page. Once complete, export it to a PDF.

The end result: You’ve transformed physical paper into digital!

NOTE about Example Scan: This is one of my earliest notebooks, from August 2004 to March 2006, following my graduation in Chicago and my move to Europe.

Adding a Table of Contents (TOC) to a PDF

While scanning and digitalizing your journal is the most important part of this process, I find that adding annotations and a table of contents (TOC) makes your scanned journals even more usable, better organized and powerful. Without a table of contents, you are forced to flip through page by page to find what you are looking for. But by appending a digital table of contents, you can jump to any of the main sections with a single click.

There are a few different options for creating a table of contents on a PDF document. The most common way we normally create a TOC is by first putting links into the Word document we are creating and then exporting it all to a PDF. In this way the TOC should automatically be included. Unfortunately, for our needs, we already have a PDF so we need to add a table of contents to an existing document.

TOC Editing Software

Since we already have a PDF and need to create or edit the TOC, our options are a bit more limited. One option is to use special PDF editing software to do this. Three options include:

Each of these provides a way to add or edit the TOC in your PDFs. Unfortunately, at the time of writing, none of these are free.

Appending a TOC using a Text File and a Python Script

One alternative way to add a table of contents to your scanned journal or PDF document is to use a Python script I’ve created and shared here. It requires a bit of coding and is slightly more technical, but it is also free and customizable.

Here’s how to do it.

On Mac, download or copy the script from GitHub Gist and paste the code into a file called something like pdf_toc_processor.py.

After that, you need to create a text file of your bookmarks or table of contents like the following example:

1, Cover
2, August 20 2018 @ Singapore
4, August 22 @ Los Angeles

The number is the page the bookmark links to, and the text is the title of the bookmark itself.

After creating your PDF file and preparing your bookmarks / TOC, you’ll need to run the main command:

python pdf_toc_processor.py -i <path-to-target.pdf> -b <path-to-bookmarks-file.txt> -o <path-to-output.pdf>

This command requires three input paths: initial PDF, bookmarks file, and output PDF.
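For the curious, here is a minimal sketch of what a script like this can do internally. It is not the Gist script itself: it assumes the third-party pypdf library (pip install pypdf), and the function names parse_bookmarks and add_toc are hypothetical. It reads the "page, title" bookmarks file described above and adds one PDF outline (bookmark) entry per line.

```python
from pathlib import Path


def parse_bookmarks(text: str) -> list[tuple[int, str]]:
    """Parse lines like "2, August 20 2018 @ Singapore" into (page, title) pairs.

    Only the first comma separates the page number from the title, so titles
    may contain commas themselves. Blank lines are ignored.
    """
    bookmarks = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        page_part, sep, title = line.partition(",")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'page, title', got {raw!r}")
        bookmarks.append((int(page_part.strip()), title.strip()))
    return bookmarks


def add_toc(input_pdf: str, bookmarks_file: str, output_pdf: str) -> None:
    """Copy every page of input_pdf to output_pdf and add outline entries.

    Assumes pypdf is installed; it is imported here so the parser above
    can be used without it.
    """
    from pypdf import PdfReader, PdfWriter

    bookmarks = parse_bookmarks(Path(bookmarks_file).read_text(encoding="utf-8"))
    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    page_count = len(reader.pages)
    for page, title in bookmarks:
        if not 1 <= page <= page_count:
            raise ValueError(
                f"bookmark {title!r} points to page {page}, "
                f"but the PDF has {page_count} pages"
            )
        # The bookmarks file counts pages from 1; pypdf counts from 0.
        writer.add_outline_item(title, page - 1)

    with open(output_pdf, "wb") as fh:
        writer.write(fh)
```

With pypdf installed, add_toc("journal.pdf", "bookmarks.txt", "journal-with-toc.pdf") plays the role of the -i, -b and -o arguments. Splitting only on the first comma keeps a title like "Paris, France" intact, and checking each page against the PDF’s page count makes a typo in the bookmarks file fail loudly instead of producing a broken link.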

Complete Step-By-Step Guide to Digitalizing a Physical Journal with a Scanner App

Scanned Notebook to PDF with Digital TOC

Example of a recent journal I scanned with the bookmarks file and appended TOC in the PDF.

For me there are two primary uses for scanner apps.

The first and most common use case for me is scanning receipts and various paper items I collect on trips. While it’s possible to just use your camera to do this, a dedicated app makes it easier to keep everything in one place and generate a final PDF export you can share with your company or keep for accounting purposes.

The second reason I started scanning was in the spirit of Bell’s MyLifeBits project: I wanted to digitalize and create an archive of my assorted journals and notebooks.

I’ve been writing regularly in some form since I was in high school. Though it has ebbed and flowed, I’ve basically always had some sort of notebook for jotting ideas, sketching and writing. Over the years it’s grown to quite a pile.


Some 20-odd physical journals I’ve created over the years. Is it time to digitalize?

As a minimalist and digital nomad, it doesn’t make sense for me to lug around an extra bag just for my old paper-bound scribblings that I rarely need anyway. In fact, I mostly use my notebooks as an initial step towards digital smart notes as part of my plain-text file archive. Instead, I wanted to create a digital copy for my archive and ideally annotate it too.

Step 1: Setup for Scanning

Here are two considerations I follow in digitalizing my journals:

  1. Find a Good Scanning Surface: Ideally you want a non-white surface that contrasts strongly with the material you are scanning, so the app can detect edges better. While a bed might work, I got the best results with a desktop or table.
  2. Check and Adjust the Lighting: The biggest issue you will likely confront in getting a good scan with your phone is shadows. The lighting in the room can cause your phone or arm to cast a darker shadow on the page. Ideally, use a desk lamp or set up during the day in a way that minimizes or eliminates unwanted shadows. Most apps can deal with shadows, but you’ll see better results without them.

Step 2: Test Scans

Once the space and lighting are ready, fire up your scanning app and do a few test scans. Before you scan an entire notebook, it’s best to do a “reality check” on a few pages to confirm the results are good enough for your needs.

Step 3: Scan an Entire Document

It’s likely you have multiple books you want to scan, and it might take multiple sessions. That said, I recommend starting with one journal and scanning all of its pages in one go. Avoid scanning the same document over multiple sessions, since the lighting might change, leading to an inconsistent final product.

Once everything looks good in testing, then just start scanning.

Depending on the app, it might attempt to detect edges while you take photos. If it doesn’t detect them right the first time, I recommend just continuing to photograph the complete document rather than editing while you shoot. Take advantage of the batching effect and focus on capturing all of the pages in one session.

Personally, even before editing and in spite of a few pages that had poor edge detection, I was quite pleased with the results:

Step 4: Check and Edit Your Scanned Document

Once you’ve captured all of the pages, it’s time to look at it page by page and edit accordingly. Genius Scan did a good job during the editing process of detecting most edges and making the appropriate corrections, but since the journal I used wasn’t completely flat and there were some colored pages, I needed to do some manual editing.

The editing process with Genius Scan and similar apps is quite straightforward. Your main job is to help the app identify the edges of the page by adjusting the rectangle so it matches them. Once that is set correctly, Genius Scan can reposition the page for ideal viewing and export.

Step 5: Export to PDF

Now that the journal pages have all been photographed and corrected, you are ready to export them. Like most apps, Genius Scan offers a range of options for export. For my purposes, a simple export to my file directory was enough.

Step 6: Typing up the Table of Contents

Now that everything has been digitalized, it’s up to you if you want to do anything else. For my purposes, I think taking a few extra minutes to add a table of contents makes the document a lot more usable. Using the script and example I provided above, go through the exported PDF and create a text file listing the page number and the title or topic of each section. This file can be used not only to create the TOC but also as a good reference to include in your own smart notes archive.

Step 7: Run Script to Append TOC

Once you’ve got the initial PDF and a bookmarks TOC, it’s time to run our script:

python pdf_toc_processor.py -i my-journal-2018.pdf -b my-journal-2018-toc.txt -o my-journal-2018-with-toc.pdf
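If you end up with several journals to process, a small wrapper can run the same command for each of them. This is a sketch under a few assumptions: it follows the naming pattern from the example command (name.pdf, name-toc.txt, name-with-toc.pdf), it expects pdf_toc_processor.py to be in the current directory, and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def toc_paths(journal_pdf: Path) -> tuple[Path, Path]:
    """Return (bookmarks file, output PDF) for a journal PDF, using the assumed
    pattern my-journal-2018.pdf -> my-journal-2018-toc.txt, my-journal-2018-with-toc.pdf."""
    stem = journal_pdf.stem
    return (
        journal_pdf.with_name(f"{stem}-toc.txt"),
        journal_pdf.with_name(f"{stem}-with-toc.pdf"),
    )


def build_command(journal_pdf: Path, script: str = "pdf_toc_processor.py") -> list[str]:
    """Build the same -i/-b/-o command as in Step 7 for one journal."""
    toc_file, output_pdf = toc_paths(journal_pdf)
    # sys.executable runs the script with the same Python as this wrapper.
    return [sys.executable, script,
            "-i", str(journal_pdf), "-b", str(toc_file), "-o", str(output_pdf)]


def process_folder(folder: Path, script: str = "pdf_toc_processor.py") -> None:
    """Append a TOC to every journal PDF in folder that has a matching TOC file."""
    for journal_pdf in sorted(folder.glob("*.pdf")):
        if journal_pdf.stem.endswith("-with-toc"):
            continue  # output of an earlier run, not a source journal
        toc_file, _ = toc_paths(journal_pdf)
        if not toc_file.exists():
            print(f"Skipping {journal_pdf.name}: no {toc_file.name} yet")
            continue
        subprocess.run(build_command(journal_pdf, script), check=True)
```

Calling process_folder(Path("journals")) would then handle every journal whose TOC file you have already typed up and skip the rest, so you can add TOC files one notebook at a time.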

Final Result: Scanned Notebook to PDF with Digital TOC

Using my smartphone, Genius Scan and a bit of simple scripting, I created a PDF document that was both extremely high quality and detailed. It had a TOC for easy reference, and I now had a digital version of my notebook forever. For this example, the final file clocked in at 110 MB.

So, how long did scanning and digitalizing this particular notebook take?

  • 15 to 20 minutes to capture all 170+ pages of this particular notebook in the app.
  • 20-25 minutes more for additional manual cropping, editing and exporting to PDF.
  • 20-25 minutes to type up the TOC and append it to the PDF.

Even though a physical scanner might produce higher quality, there is no doubt that using a phone and a scanner app is a faster approach. All told, scanning, editing and adding the TOC for a single notebook took me about an hour over 3 different sessions. I believe that as I do this more, I can decrease the time it takes slightly. That said, considering that I still have about 25 more notebooks to scan, I estimate this will take 20+ hours to complete. Now I just need the actual time to do it!
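As a rough check on that estimate, here is the arithmetic, assuming each remaining notebook takes about as long as this 170+ page one (shorter notebooks would go faster):

```python
# Per-notebook step times in minutes (low, high), from the list above.
steps = {
    "capture in the app": (15, 20),
    "manual cropping, editing and PDF export": (20, 25),
    "typing and appending the TOC": (20, 25),
}
remaining_notebooks = 25  # assumption: each is similar in size to this one

low_minutes = sum(low for low, _ in steps.values())     # 55 minutes per notebook
high_minutes = sum(high for _, high in steps.values())  # 70 minutes per notebook

low_hours = round(low_minutes * remaining_notebooks / 60, 1)
high_hours = round(high_minutes * remaining_notebooks / 60, 1)
print(f"{low_minutes}-{high_minutes} min per notebook, "
      f"{low_hours}-{high_hours} h for {remaining_notebooks} notebooks")
# prints: 55-70 min per notebook, 22.9-29.2 h for 25 notebooks
```

That range of roughly 23 to 29 hours is consistent with the 20+ hour estimate.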

Conclusion: Paper Made Digital: From Lifelogging to Meaningful Digital Archiving

I’ll admit that I am a strong adherent of the digital life. As readers of this blog will likely have realized, a number of tools and technologies now make it easy to track a life. For example, RescueTime logs the time you spend in different apps and sites on your computer (See: my guide to time tracking); Google Takeout, Twitter, Amazon’s Alexa, and Facebook let you export the bulk of your usage (like social media posts, photos, emails, voice messages, and more); read-it-later apps like Instapaper keep a record of the articles you read (See: my guide to article reading tracking); Apple and Android phones can track your screen time; by tweaking how you take notes, you can track your knowledge management and creative process; and with a wearable you can track your activity, steps, sleep, heart rate, stress, and more (See: Self-Tracking with Apple Watch).

In short, there is no shortage of ways to lead a digitally-mediated, data-driven and tracked life.

In my own life, I’m a full-time product manager, and I dabble regularly in mobile and web development and data science. All of these roles keep me in front of technology, and I spend a large part of my waking hours on digital devices. I’m a strong believer in time tracking as an enabler of more conscious time management. To put this in perspective, according to my sleep tracking I sleep about 7-8 hours a night; about one third of each day (roughly 7-9 hours) goes into computer time; and another 10-16% of each day (2-4 hours) is non-computer screen time on my smartphone or tablet (including article reading, YouTube, TV shows, movies, and PDF reading). And while book reading is harder to track, even with a Kindle, I have used my reading history to visualize what I read and estimate that I read at least an hour or two per day.

Obviously I have taken extra steps to track and make my technology and time usage more transparent, but nearly all of us today, including most millennials, live most of our lives in the digital realm. So, while it is impressive that Gordon Bell took on an epic quest to digitalize everything in his life, for most of us today that’s par for the course.

And yet there are things in our lives that will remain non-digital, and that’s ok. In fact, I’d argue that it’s good to have a part of our learning, creative, and note-taking process that is analog and non-digital. A journal is a powerful tool, and handwriting is a well-documented way to better retain information and build connections between ideas. In short, there is still a place for notebooks and pens today, and there are a range of techniques you can use to leverage physical paper in your creative process. This is a topic we will have to leave for another post.

With paper notebooks and bound journals, there often comes a moment when we want to have them in digital form. Maybe you are worried that a flood or natural disaster will destroy them. Or you travel extensively and like to have them on hand for reference. Perhaps you are concerned that time will degrade them. Or you want to share a few pages with someone. Possibly you want to reuse parts of them in a mash-up project. These are just a few of the reasons to digitalize your notebooks. Like many of us, you are likely bound to the digital world, and digitalizing your paper journals brings them into it.

While the “gold standard” for digitalization might be using an actual scanner, we all basically carry around a mini computer and digital camera that can easily be put to the same use as an actual scanner. A number of scanner apps do a pretty incredible job of fixing the orientation, highlighting text and automating improvements.

In this post, we dove into the technologies and tools needed to digitalize your notebooks. Using a simple scanner app and a smartphone, we also walked step by step from scanning and editing journal pages to exporting a PDF and appending a table of contents. The final result is a robust digital document you can navigate, reference and reuse. Depending on the number of pages, digitalizing a notebook might take only 30 minutes to an hour or so with a scanning app.

Today, it’s also not unreasonable to go nearly paperless. Most of us already live that way. But some things are still nicer with paper and pen. I’m a strong believer that learning and creativity find new avenues when expressed in analog forms. This is only possible on paper, and a journal is a great home for those sketches, ideas, and notes from the road.

Fortunately, it doesn’t need to be one or the other. Your paper journals can be made digital, and you can have the best of both worlds.


Best of luck and happy digitalizing!



BONUS: Image Scans to Digital Text: How to OCR Your Scanned Docs (Mac OSX)

In this post, we focused largely on digitalizing hand-written notebooks. But one interesting opportunity, especially if you have good handwriting, is OCR technology.

Optical character recognition (OCR) is the conversion of scanned images into machine-readable digital text. Basically, OCR software looks at the images and detects the actual letters and words. The recognized text is then added as a layer on top of the image or document itself.

While a deep dive into the topic of OCR is beyond the scope of this post, OCR can be a pretty powerful addition to any quest to digitalize everything, since it makes texts searchable.

A number of scanners, scanning programs and scanner apps provide an OCR feature, often as a paid add-on. Genius Scan includes it as one of its paid features. Additionally, various online services can provide this too.

If you are on a Mac, OCRmyPDF is a free, open source command line tool you can install that can process and add an OCR text layer to your PDF files. For example, when the Mueller report was first released as an image PDF, I used OCRmyPDF to convert it to text for easier reading and highlighting.
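In its simplest form, OCRmyPDF takes an input and an output file: ocrmypdf input.pdf output.pdf. If you want to script it, for example over a folder of scans, a thin Python wrapper is enough. This is a sketch, not part of OCRmyPDF itself, and the helper names are hypothetical. The -l (Tesseract language) and --skip-text (leave pages that already contain text alone) options are standard OCRmyPDF flags, but it is worth checking ocrmypdf --help for your installed version.

```python
import shutil
import subprocess


def build_ocr_command(input_pdf: str, output_pdf: str, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF command that adds a searchable text layer to a scanned PDF."""
    return [
        "ocrmypdf",
        "-l", language,  # Tesseract language code(s), e.g. "eng" or "eng+deu"
        "--skip-text",   # don't re-OCR pages that already contain text
        input_pdf,
        output_pdf,
    ]


def ocr_pdf(input_pdf: str, output_pdf: str, language: str = "eng") -> None:
    """Run OCRmyPDF, failing early with a clear message if it isn't installed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found; on macOS it can be installed with Homebrew")
    subprocess.run(build_ocr_command(input_pdf, output_pdf, language), check=True)
```

For an image-only PDF like the Mueller report example, ocr_pdf("mueller-report.pdf", "mueller-report-ocr.pdf") would produce a copy with a selectable, searchable text layer.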

While the primary example I shared in this post involved handwriting, which is difficult for OCR, this approach can be applied to scans of printed pages. And if you have clear handwriting and find the right software, you might be able to convert your paper journals into digital text docs now or one day in the future.
