Image PDFs only look like text
Visible letters in a PDF do not always mean there is real, copyable text underneath. Whether there is depends on how the PDF was made and whether the document allows extraction.
A scanned PDF is often just a page image. You can see letters on the screen, but there may be no embedded text layer for copying or extraction. In that case you need OCR rather than a normal text extractor.
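Some PDFs use custom fonts or unusual character encodings without a reliable mapping back to standard characters. The text may render correctly on screen, because the viewer only needs the glyph shapes to draw it, but still extract as broken characters, missing spaces, or jumbled reading order.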
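To see how text can look right on screen yet extract wrong, here is a toy model. It is not a PDF parser. It assumes a page stores a sequence of numeric character codes, and that an extractor turns them into Unicode using an optional code-to-character table (in real PDFs, the font's ToUnicode map usually plays this role). The function name, codes, and table are hypothetical.

```python
# Toy model only: real PDF fonts and ToUnicode maps are far more involved.
# The function name, character codes, and mapping table are hypothetical.

def extract(codes, to_unicode=None):
    """Turn character codes into text the way a naive extractor might.

    If a code has an entry in `to_unicode`, use it. Otherwise fall back to
    treating the raw code as a Unicode code point, which yields garbage
    when the font uses its own private numbering.
    """
    to_unicode = to_unicode or {}
    return "".join(to_unicode.get(code, chr(code)) for code in codes)


# Suppose a font numbers its glyphs 1, 2, 3, ... instead of using standard
# character codes, so "Hello" is stored as the codes below.
page_codes = [1, 2, 3, 3, 4]
font_map = {1: "H", 2: "e", 3: "l", 4: "o"}

print(repr(extract(page_codes, font_map)))  # 'Hello'
print(repr(extract(page_codes)))            # '\x01\x02\x03\x03\x04'
```

The glyphs drawn on screen are identical in both cases; only the second extraction lacks the table that says which character each code stands for.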
A PDF can also include security settings that restrict copying or extraction. Even if the file visibly contains text, your PDF tool or browser may refuse to let you copy or extract it.
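In encrypted PDFs that use the standard security handler from the PDF specification, these restrictions are stored as bit flags in the `P` entry of the encryption dictionary: a 32-bit integer whose bits are numbered from 1 at the low-order end, where bit 5 governs copying or otherwise extracting text and graphics. The sketch below only decodes such a value. Reading `P` out of a real file is left to whatever PDF library you use, and the function and dictionary names are hypothetical.

```python
# Sketch: decode permission flags from a PDF's /P value (standard security
# handler). Bit positions count from 1 at the low-order bit, as in the PDF
# spec. The names PERMISSION_BITS and decode_permissions are hypothetical.

PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_or_extract": 5,
    "annotate": 6,
}


def decode_permissions(p: int) -> dict:
    """Return which of the listed permissions the /P value grants.

    /P is stored as a signed 32-bit integer, so it is often negative;
    masking to 32 bits gives the underlying bit pattern.
    """
    bits = p & 0xFFFFFFFF
    return {
        name: bool((bits >> (pos - 1)) & 1)
        for name, pos in PERMISSION_BITS.items()
    }


print(decode_permissions(-4))   # every listed permission granted
print(decode_permissions(-20))  # same, but bit 5 (copy/extract) cleared
```

The flag is only a request: it is up to the reading software to honor it, which is why the same file can behave differently in different tools.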
Try selecting a short line in a normal PDF reader. If selection fails completely, it may be an image PDF or a protected file. If selection works but extracted text looks broken, encoding is more likely the problem.
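If you already have the text a tool extracted from a page, the same triage can be roughed out in code. This is a heuristic sketch, not the Extract PDF Text implementation: the function names, the result messages, and the 5% threshold are assumptions. Empty output suggests an image-only or restricted page; a high share of replacement characters, private-use code points, or control characters suggests an encoding problem.

```python
import unicodedata

# Hypothetical threshold: share of suspicious characters above which we
# suspect a font-encoding problem rather than ordinary text.
SUSPICIOUS_RATIO = 0.05


def is_suspicious(ch: str) -> bool:
    """Flag characters that rarely appear in healthy extracted text."""
    category = unicodedata.category(ch)
    return (
        ch == "\ufffd"          # Unicode replacement character
        or category == "Co"     # private-use code points
        or category == "Cc"     # control characters
    )


def triage_extracted_text(text: str) -> str:
    """Guess why extracted page text might be unusable (heuristic only)."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return "no text layer (image PDF) or extraction blocked"
    bad = sum(is_suspicious(ch) for ch in visible)
    if bad / len(visible) > SUSPICIOUS_RATIO:
        return "text layer present, but encoding looks broken"
    return "text layer looks usable"


print(triage_extracted_text(""))
print(triage_extracted_text("\x01\x02\x03\x03\x04"))
print(triage_extracted_text("Invoice total: 42.00"))
```

Like the manual check, this cannot tell an image PDF from a protected one, and it only catches character-level damage; missing spaces or scrambled reading order would need separate checks.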
Extract PDF Text reads an existing text layer in the browser. It does not OCR image-only PDFs and it cannot override document restrictions that your browser or PDF library respects.
Use Extract PDF Text when the file already has selectable text. If the file is a scan, read What OCR means for PDF text extraction. If you only need a few pages from the document, continue with How to extract pages from a PDF.