
Why PDF text cannot be copied

Seeing letters in a PDF does not guarantee that there is copyable text underneath. Whether you can copy it depends on how the PDF was made and whether the document allows text extraction.

Quick answer

If you cannot copy text from a PDF, the file is usually one of three things: an image-only scan, a text PDF with broken font encoding, or a document with extraction restrictions. The visible letters on screen do not by themselves prove that the file contains reusable text.

Real user problem

The common situation is practical rather than technical: someone wants to quote a contract, reuse an address from an invoice, copy a table from a report, or search inside a scanned packet. The frustration starts when the PDF looks readable but behaves like a picture.

Image PDFs only look like text

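A scanned PDF is often just a picture of each page. You can see letters on the screen, but there may be no embedded text layer to copy or extract. In that case you need OCR (optical character recognition), which recognizes letters in the page image, rather than a normal text extractor, which can only read text that is already stored in the file.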
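One rough way to tell an image-only page from a text page is to look inside the page's content stream. Text is drawn with text-showing operators such as Tj and TJ, while a scanned page is usually painted as one image with the Do operator. The Python sketch below is a simplified illustration, not how PDFresh or any particular library works: it assumes you already have one page's content stream as bytes (optionally Flate-compressed), the function name guess_page_kind is hypothetical, and it ignores text nested inside form XObjects and operator-like bytes inside string literals.

```python
import re
import zlib

# Text-showing operators include Tj and TJ (the spec also defines ' and ",
# which this sketch skips). The lookarounds stop matches inside longer names.
TEXT_SHOW = re.compile(rb"(?<![A-Za-z])T[jJ](?![A-Za-z])")
# Do paints an XObject; scanned pages are typically drawn as one image this way.
PAINT_XOBJECT = re.compile(rb"(?<![A-Za-z])Do(?![A-Za-z])")


def guess_page_kind(stream: bytes, flate: bool = False) -> str:
    """Return 'text', 'image-only', or 'empty' for one page content stream.

    A rough heuristic: it ignores text inside form XObjects and can be
    fooled by operator-like bytes inside string literals.
    """
    if flate:
        stream = zlib.decompress(stream)  # FlateDecode data is zlib format
    if TEXT_SHOW.search(stream):
        return "text"
    if PAINT_XOBJECT.search(stream):
        return "image-only"
    return "empty"


print(guess_page_kind(b"BT /F1 12 Tf 72 720 Td (Invoice) Tj ET"))  # text
print(guess_page_kind(zlib.compress(b"q 612 0 0 792 0 0 cm /Im0 Do Q"), flate=True))  # image-only
```

A page that comes back as "image-only" here is a candidate for OCR; a page that comes back as "text" may still extract badly for the encoding reasons described next.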

Font encoding can break extraction

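Some PDFs use embedded fonts with custom or unusual character encodings, sometimes without a mapping from the font's character codes back to standard Unicode characters. The text may render correctly on screen but still extract as broken characters. Because many PDFs place words or glyphs by position rather than storing explicit spaces and reading order, extraction can also produce missing spaces or jumbled lines. This is common in older exported forms, niche business software, and PDFs built from print drivers.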
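The sketch below simulates the encoding problem. It assumes a hypothetical subset font whose producer numbered glyphs in order of first use, so the bytes in the content stream are not Unicode codes. PDFs can include a ToUnicode map that translates such codes back to characters; when that map is missing, an extractor may fall back to a standard single-byte encoding (approximated here with Latin-1) and output control characters instead of words. The byte values, the mapping, and the extract helper are invented for illustration.

```python
from __future__ import annotations

# Hypothetical subset font: glyphs were numbered in order of first use,
# so these bytes are font-internal codes, not ASCII or Unicode.
SHOWN_BYTES = bytes([1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8])

# What a ToUnicode map would provide: font code -> Unicode text.
TO_UNICODE = {1: "H", 2: "e", 3: "l", 4: "o", 5: " ", 6: "w", 7: "r", 8: "d"}


def extract(codes: bytes, to_unicode: dict[int, str] | None = None) -> str:
    """Decode shown bytes, with or without a ToUnicode-style map."""
    if to_unicode is None:
        # Naive fallback: treat each code as a Latin-1 character.
        return codes.decode("latin-1")
    # Unmapped codes become U+FFFD, the Unicode replacement character.
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


print(repr(extract(SHOWN_BYTES)))        # control characters, not words
print(extract(SHOWN_BYTES, TO_UNICODE))  # Hello world
```

The page looks identical in both cases because the font draws the right glyphs; only the route back to characters differs.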

Permissions can block copying

A PDF can also include security settings that restrict copying or text extraction. Even if the file visibly contains text, a reader or browser tool may refuse to expose it in a normal workflow. When the PDF library respects the restriction, a well-behaved browser tool should not pretend to bypass it.
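For PDFs protected with the standard password security handler, the restrictions are stored as bit flags in the /P entry of the encryption dictionary, and bit 5 controls copying and text extraction. The Python sketch below only decodes a /P value that a PDF library has already read; reading the encryption dictionary is assumed, and the helper names is_allowed and describe_permissions are hypothetical. It reports what the file asks for and does nothing to bypass it.

```python
from __future__ import annotations

# Bit positions are 1-based from the low-order bit, as in the PDF
# standard security handler's /P entry.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def is_allowed(p_value: int, bit: int) -> bool:
    """Return True if permission bit `bit` (1-based) is set in /P."""
    # /P is a signed 32-bit integer, so normalize it to unsigned first.
    unsigned = p_value & 0xFFFFFFFF
    return bool((unsigned >> (bit - 1)) & 1)


def describe_permissions(p_value: int) -> dict[str, bool]:
    """Map each known permission name to whether the file grants it."""
    return {name: is_allowed(p_value, bit) for bit, name in PERMISSION_BITS.items()}


# Hypothetical /P values: -4 grants everything; -20 clears bit 5 (value 16).
for p in (-4, -20):
    print(p, "copy allowed:", is_allowed(p, 5))
```

With -20, printing is still allowed but copying and extraction are not, which is exactly the "looks like text, will not copy" case.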

Checklist before you try another tool

Try selecting a short line in a normal PDF reader. If selection fails completely, suspect an image-only PDF or a protected file. If selection works but the pasted text is messy, suspect a font encoding problem. If only some pages fail, the document may mix scanned and digital pages.
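The checklist can be roughly automated if you already have the extracted text of each page from any text-extraction tool. The heuristic below is hypothetical: the 5% threshold and the choice of "bad" characters (U+FFFD replacement characters and control characters) are assumptions, and it cannot detect missing spaces or a wrong reading order. An empty page can also mean extraction was blocked, so check permissions before reaching for OCR.

```python
from __future__ import annotations


def triage_page(text: str, max_bad_ratio: float = 0.05) -> str:
    """Classify one page's extracted text as 'empty', 'garbled', or 'ok'."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    # Replacement characters and stray control characters often mean
    # the font's codes could not be mapped back to real characters.
    bad = sum(
        1
        for ch in stripped
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\t\n\r")
    )
    return "garbled" if bad / len(stripped) > max_bad_ratio else "ok"


def triage_document(pages: list[str]) -> str:
    """Combine per-page verdicts into one suggestion, following the checklist."""
    if not pages:
        raise ValueError("no pages to check")
    verdicts = [triage_page(page) for page in pages]
    if all(v == "empty" for v in verdicts):
        return "no text found: image-only or restricted, check permissions or try OCR"
    if "empty" in verdicts:
        return "mixed: some pages have no text layer and may need OCR"
    if "garbled" in verdicts:
        return "text found but garbled: suspect font encoding"
    return "text layer looks usable: still check it against the PDF"


print(triage_document(["Invoice 1042", ""]))
```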

What PDFresh can do

Extract PDF Text reads an existing text layer in the browser. It is useful for digital PDFs where the text is already stored in the file and you want a quick, browser-side way to pull it out without uploading the file for that task.

What PDFresh cannot do

PDFresh does not OCR image-only PDFs in the current text-extraction flow, and it does not override restrictions that your browser or PDF library respects. Extracted text may also have broken order, merged columns, missing spaces, or damaged characters, so check important output against the original PDF.
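Merged columns usually come from how positioned text fragments are put back in order. The sketch below assumes each fragment has an x position, a baseline y (PDF coordinates grow upward), and its text; the TextItem type and both ordering functions are hypothetical and do not describe PDFresh's actual logic. A naive top-to-bottom, left-to-right sort interleaves two columns line by line, while splitting at an assumed column boundary keeps them apart.

```python
from __future__ import annotations

from typing import NamedTuple


class TextItem(NamedTuple):
    x: float  # left edge in PDF points
    y: float  # baseline; PDF y coordinates grow upward
    text: str


def naive_order(items: list[TextItem]) -> list[str]:
    """Top-to-bottom, then left-to-right across the full page width."""
    return [item.text for item in sorted(items, key=lambda it: (-it.y, it.x))]


def column_order(items: list[TextItem], split_x: float) -> list[str]:
    """Read everything left of split_x first, then the right column."""
    left = [item for item in items if item.x < split_x]
    right = [item for item in items if item.x >= split_x]
    return naive_order(left) + naive_order(right)


# A hypothetical two-column page with two lines per column.
page = [
    TextItem(72, 700, "Left line 1"),
    TextItem(72, 686, "Left line 2"),
    TextItem(320, 700, "Right line 1"),
    TextItem(320, 686, "Right line 2"),
]
print(naive_order(page))
print(column_order(page, split_x=300))
```

With this sample page, naive_order returns Left line 1, Right line 1, Left line 2, Right line 2, which is the merged-column effect, while column_order keeps each column together. Real layouts rarely have one clean split position, which is one reason extracted text still needs checking.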

Practical examples

Digital invoices, reports exported from spreadsheets, and office-generated contracts often contain a real text layer. Scanned receipts, photographed pages, and multi-page copier scans often do not. Mixed files are common too: one section may copy correctly while an inserted scan does not.

Common mistakes and fixes

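A frequent mistake is assuming that visible text is always selectable text; test by selecting a line first. Another is assuming OCR and extraction are interchangeable: extraction reads text already stored in the file, while OCR recognizes text from page images. A third is trusting extracted text from legal or financial documents without checking the layout, numbers, and punctuation against the original PDF.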