Image PDFs only look like text
Visible letters in a PDF do not always mean there is real, copyable text underneath. Whether you can copy or extract that text depends on how the PDF was made and whether the document allows extraction.
A scanned PDF is often just a page image. You can see letters on the screen, but there may be no embedded text layer for copying or extraction. In that case you need OCR rather than a normal text extractor.
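One rough way to check for a missing text layer is to look inside the page content streams for text-showing operators. The Python sketch below is a heuristic, not a real PDF parser. It uses only the standard library, assumes streams are either uncompressed or FlateDecode, locates them with a regular expression instead of their /Length entries, and only counts the Tj and TJ operators. The function name classify_pdf is hypothetical.

```python
import re
import zlib

# Bodies between "stream" and "endstream". A real parser would use /Length;
# this regex is a rough stand-in. (?<!end) skips the "stream" inside "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# A string operand (literal ")", hex ">", or array "]") followed by Tj or TJ.
TEXT_OP_RE = re.compile(rb"[)>\]]\s*T[jJ]\b")
# Image XObjects are declared in plain (uncompressed) stream dictionaries.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def _decode(body: bytes) -> bytes:
    """Inflate a FlateDecode stream; fall back to the raw bytes if that fails."""
    try:
        return zlib.decompressobj().decompress(body) or body
    except zlib.error:
        return body


def classify_pdf(pdf: bytes) -> str:
    """Hypothetical helper: guess whether a PDF has a text layer or only images."""
    text_ops = sum(
        len(TEXT_OP_RE.findall(_decode(m.group(1)))) for m in STREAM_RE.finditer(pdf)
    )
    images = len(IMAGE_RE.findall(pdf))
    if text_ops > 0:
        return "has a text layer"
    if images > 0:
        return "image-only (needs OCR)"
    return "unknown"
```

A file that reports image-only is a candidate for OCR. A scan that has already been through OCR usually carries an invisible text layer, so it reports a text layer, which matches what extraction tools see.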
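Some PDFs use custom fonts or unusual character encodings. The text may render correctly on screen, because drawing it only requires knowing which glyph each character code stands for, yet still extract as broken characters when the file lacks a reliable mapping from those codes back to Unicode. Missing spaces and jumbled reading order have a related cause: a PDF places pieces of text at positions on the page rather than storing them as flowing sentences, so the extractor has to guess where words and lines break.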
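Broken extraction often leaves recognisable traces in the output. The sketch below flags a few of them in text you have already extracted: Unicode replacement characters, Private Use Area code points (which some extractors emit for glyphs with no Unicode mapping), stray control characters, and very long runs without spaces. The function name and the 40-character threshold are assumptions chosen for illustration. It cannot detect jumbled reading order, which needs layout information or a human look.

```python
import unicodedata
from typing import List


def extraction_warnings(text: str) -> List[str]:
    """Hypothetical helper: list signs that a text layer decoded badly."""
    if not text.strip():
        return ["no text extracted"]
    warnings = []
    # U+FFFD is what decoders insert when bytes could not be mapped to a character.
    if "\ufffd" in text:
        warnings.append("replacement characters (U+FFFD)")
    # Category "Co" = Private Use Area, often a glyph with no Unicode mapping.
    if any(unicodedata.category(ch) == "Co" for ch in text):
        warnings.append("private-use characters")
    # Category "Cc" = control characters; ordinary line breaks and tabs are fine.
    if any(unicodedata.category(ch) == "Cc" and ch not in "\n\r\t" for ch in text):
        warnings.append("control characters")
    # Very long "words" suggest the spaces between words were lost.
    # The 40-character threshold is an arbitrary assumption.
    words = text.split()
    if words and max(len(w) for w in words) > 40:
        warnings.append("very long unbroken words (missing spaces?)")
    return warnings
```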
A PDF can also include security settings that restrict copying or text extraction. Even if the file visibly contains text, a viewer, browser, or tool may refuse to expose it in a normal workflow.
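These restrictions live in the /P entry of the encryption dictionary, a signed 32-bit integer of permission flags. In the PDF specification, bit 5 (counting from 1, so the value 16) covers copying or otherwise extracting text and graphics. The sketch below reads that entry with regular expressions. It assumes the trailer refers to the dictionary as an indirect object (for example /Encrypt 7 0 R) and that the dictionary appears uncompressed in the file; the function names are hypothetical. A file can be encrypted with an empty user password, so it opens without a prompt and still carries these flags, and whether a given viewer honours them is up to that viewer.

```python
import re
from typing import Optional

# Trailer entry pointing at the encryption dictionary, e.g. "/Encrypt 7 0 R".
ENCRYPT_REF_RE = re.compile(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R")
# Permission flags, e.g. "/P -3904" (signed, so usually negative).
P_ENTRY_RE = re.compile(rb"/P\s+(-?\d+)")
# Bit 5 in the spec's 1-based numbering: copy or extract text and graphics.
COPY_EXTRACT = 1 << 4


def permission_flags(pdf: bytes) -> Optional[int]:
    """Hypothetical helper: return /P from the encryption dictionary, or None."""
    ref = ENCRYPT_REF_RE.search(pdf)
    if ref is None:
        return None  # no /Encrypt entry, so no permission flags apply
    num, gen = ref.groups()
    # Find "num gen obj ... endobj" for the referenced object (assumed uncompressed).
    obj = re.search(rb"\b" + num + rb"\s+" + gen + rb"\s+obj\b(.*?)endobj", pdf, re.DOTALL)
    if obj is None:
        raise ValueError("encryption dictionary not found in plain form")
    p = P_ENTRY_RE.search(obj.group(1))
    if p is None:
        raise ValueError("encryption dictionary has no /P entry")
    return int(p.group(1))


def copy_allowed(pdf: bytes) -> bool:
    """True if the file is unencrypted or its flags permit text extraction."""
    flags = permission_flags(pdf)
    # Python's & treats negative ints as two's complement, so the bit test works.
    return flags is None or bool(flags & COPY_EXTRACT)
```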
Try selecting a short line in a normal PDF reader. If selection fails completely, it may be an image PDF or a protected file. If selection works but extracted text looks broken, encoding is more likely the problem.
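The selection test is a simple decision rule, and writing it down as a tiny function makes the two branches explicit. This sketch only encodes the observations described in the paragraph above; it does not inspect the file, and the enum and function names are hypothetical.

```python
from enum import Enum


class LikelyCause(Enum):
    """Hypothetical labels for the outcome of the selection test."""
    TEXT_LAYER_OK = "text layer looks usable"
    IMAGE_OR_PROTECTED = "image-only PDF or protected file"
    ENCODING = "font or character-encoding problem"


def diagnose(selection_works: bool, extracted_text_looks_broken: bool) -> LikelyCause:
    # Nothing can be selected: there is no text layer, or access is blocked.
    if not selection_works:
        return LikelyCause.IMAGE_OR_PROTECTED
    # Text can be selected, but what comes out is garbled: suspect the encoding.
    if extracted_text_looks_broken:
        return LikelyCause.ENCODING
    return LikelyCause.TEXT_LAYER_OK


print(diagnose(selection_works=True, extracted_text_looks_broken=True).value)
# prints "font or character-encoding problem"
```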
Extract PDF Text reads the existing text layer in your browser. It does not run OCR on image-only PDFs, and it cannot override document restrictions that your browser or PDF library honours.
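For illustration, reading an existing text layer can be as simple as collecting the strings that a page's content stream draws with the Tj operator. The Python sketch below does only that and is not a description of how the extraction tool works internally. It handles literal strings with the usual backslash escapes, ignores TJ arrays, hex strings, ToUnicode maps, and positioning, and decodes bytes as Latin-1. The function name read_text_layer is hypothetical. Nothing here looks at pixels, which is why a text-layer reader returns nothing for an image-only page.

```python
import re
import zlib

# A literal string "( ... )" (with backslash escapes, no nested parentheses) before Tj.
LITERAL_TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj", re.DOTALL)
# Named escapes; a backslash before a line break is a line continuation.
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"\n": b"", b"\r": b""}


def _unescape(raw: bytes) -> bytes:
    """Undo PDF literal-string escapes: named escapes and 1-3 digit octal codes."""
    def repl(m):
        if m.group(1) is not None:
            return bytes([int(m.group(1), 8) & 0xFF])
        ch = m.group(2)
        # Unknown escapes such as "\(" or "\\" keep just the escaped character.
        return ESCAPES.get(ch, ch)
    return re.sub(rb"\\(?:([0-7]{1,3})|(.))", repl, raw, flags=re.DOTALL)


def read_text_layer(content_stream: bytes) -> str:
    """Hypothetical helper: join the strings shown by Tj in one content stream."""
    try:
        data = zlib.decompress(content_stream)
    except zlib.error:
        data = content_stream  # assume the stream was stored uncompressed
    chunks = [_unescape(m.group(1)) for m in LITERAL_TJ_RE.finditer(data)]
    # Latin-1 maps each byte to one character. That is only right when the
    # font's encoding happens to match; custom encodings come out broken here.
    return " ".join(c.decode("latin-1") for c in chunks)
```

The Latin-1 step is where the encoding problems described earlier surface: it only gives the right characters when the font's encoding happens to match, as with plain ASCII in a standard font.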
Use Extract PDF Text when the file already has selectable text. If the file is a scan, read "What OCR means for PDF text extraction". If you only need a few pages from the document, continue with "How to extract pages from a PDF".