.\" Copyright 1997-2011 Glyph & Cog, LLC
.TH pdftotext 1 "15 August 2011"
.SH NAME
pdftotext \- Portable Document Format (PDF) to text converter
(version 3.03)
.SH SYNOPSIS
.B pdftotext
[options]
.I PDF-file
.RI [ text-file ]
.SH DESCRIPTION
.B Pdftotext
converts Portable Document Format (PDF) files to plain text.
.PP
Pdftotext reads the PDF file,
.IR PDF-file ,
and writes a text file,
.IR text-file .
If
.I text-file
is not specified, pdftotext converts
.I file.pdf
to
.IR file.txt .
If
.I text-file
is "\-", the text is sent to stdout. If
.I PDF-file
is "\-", it reads the PDF file from stdin.
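.PP
As an illustrative sketch of the three calling forms described above
(the name
.I report.pdf
is a placeholder, not a file shipped with pdftotext):
.PP
.RS
.nf
# Writes the text to report.txt.
pdftotext report.pdf

# Sends the text to stdout, here piped into wc to count words.
pdftotext report.pdf \- | wc \-w

# Reads the PDF from stdin and writes the text to stdout.
pdftotext \- \- < report.pdf
.fi
.RE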
.SH OPTIONS
.TP
.BI \-f " number"
Specifies the first page to convert.
.TP
.BI \-l " number"
Specifies the last page to convert.
.TP
.BI \-r " number"
Specifies the resolution, in DPI. The default is 72 DPI.
.TP
.BI \-x " number"
Specifies the x-coordinate of the top-left corner of the crop area.
.TP
.BI \-y " number"
Specifies the y-coordinate of the top-left corner of the crop area.
.TP
.BI \-W " number"
Specifies the width of the crop area, in pixels. The default is 0.
.TP
.BI \-H " number"
Specifies the height of the crop area, in pixels. The default is 0.
.TP
.B \-layout
Maintain (as well as possible) the original physical layout of the
text. The default is to "undo" physical layout (columns,
hyphenation, etc.) and output the text in reading order.
.TP
.BI \-fixed " number"
Assume fixed-pitch (or tabular) text, with the specified character
width (in points). This forces physical layout mode.
.TP
.B \-raw
Keep the text in content stream order. This is a hack which often
"undoes" column formatting, etc. Use of raw mode is no longer
recommended.
.TP
.B \-nodiag
Discard diagonal text (i.e., text that is not close to one of the
0, 90, 180, or 270 degree axes). This is useful for skipping
watermarks drawn on body text.
.TP
.B \-htmlmeta
Generate a simple HTML file, including the meta information. This
simply wraps the text in <pre> and </pre> and prepends the meta
headers.
.TP
.B \-bbox
Generate an XHTML file containing bounding box information for each
word in the file.
.TP
.B \-bbox\-layout
Generate an XHTML file containing bounding box information for each
block, line, and word in the file.
.TP
.B \-tsv
Generate a TSV file containing the bounding box information for each
block, line, and word in the file.
.TP
.B \-cropbox
Use the crop box rather than the media box with \-bbox and \-bbox\-layout.
.TP
.BI \-colspacing " number"
Specifies how much spacing is allowed after a word before adjacent text
is considered a new column, measured as a fraction of the font size.
The current default is 0.7; older releases used 0.3.
.TP
.BI \-enc " encoding-name"
Sets the encoding to use for text output. This defaults to "UTF-8".
.TP
.B \-listenc
Lists the available encodings.
.TP
.BI \-eol " unix | dos | mac"
Sets the end-of-line convention to use for text output.
.TP
.B \-nopgbrk
Don't insert page breaks (form feed characters) between pages.
.TP
.BI \-opw " password"
Specify the owner password for the PDF file. Providing this will
bypass all security restrictions.
.TP
.BI \-upw " password"
Specify the user password for the PDF file.
.TP
.B \-q
Don't print any messages or errors.
.TP
.B \-v
Print copyright and version information.
.TP
.B \-h
Print usage information.
.RB ( \-help
and
.B \-\-help
are equivalent.)
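.PP
The options above can be combined. The following is a sketch only; the
file names are placeholders, and the page numbers and crop values are
arbitrary:
.PP
.RS
.nf
# Pages 2 through 5 only, keeping the physical layout.
pdftotext \-f 2 \-l 5 \-layout report.pdf pages.txt

# A 300 by 400 pixel region whose top-left corner is at (50, 100).
pdftotext \-x 50 \-y 100 \-W 300 \-H 400 report.pdf region.txt

# UTF-8 output with Unix line endings and no form feeds between pages.
pdftotext \-enc UTF-8 \-eol unix \-nopgbrk report.pdf clean.txt
.fi
.RE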
.SH BUGS
Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from
these files.
.SH EXIT CODES
The Xpdf tools use the following exit codes:
.TP
0
No error.
.TP
1
Error opening a PDF file.
.TP
2
Error opening an output file.
.TP
3
Error related to PDF permissions.
.TP
99
Other error.
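.PP
A shell script might branch on these codes as in the following sketch
(the file names and messages are illustrative assumptions, not output
produced by pdftotext):
.PP
.RS
.nf
pdftotext \-q report.pdf report.txt
case $? in
    0)  echo "converted" ;;
    1)  echo "cannot open the PDF file" >&2 ;;
    2)  echo "cannot open the output file" >&2 ;;
    3)  echo "blocked by PDF permissions" >&2 ;;
    *)  echo "other error" >&2 ;;
esac
.fi
.RE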
.SH AUTHOR
The pdftotext software and documentation are copyright 1996-2011 Glyph
& Cog, LLC.
.SH "SEE ALSO"
.BR pdfdetach (1),
.BR pdffonts (1),
.BR pdfimages (1),
.BR pdfinfo (1),
.BR pdftocairo (1),
.BR pdftohtml (1),
.BR pdftoppm (1),
.BR pdftops (1),
.BR pdfseparate (1),
.BR pdfsig (1),
.BR pdfunite (1) | |