public class PDFParser extends AbstractParser

| Modifier and Type | Field and Description |
|---|---|
| protected static org.slf4j.Logger | log |

Fields inherited from class AbstractParser: content, encoding, filename, locale

| Constructor and Description |
|---|
| PDFParser() |

| Modifier and Type | Method and Description |
|---|---|
| String | getAuthor() |
| String | getSourceDate() |
| String | getTags() |
| String | getTitle() |
| void | internalParse(InputStream input): Invoked by the parse method |
| protected void | parseDocument(org.apache.pdfbox.pdmodel.PDDocument pdfDocument): Extract text and metadata from the main document |
| void | parseForm(org.apache.pdfbox.pdmodel.PDDocument pdfDocument): Extract the text from the form fields |
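
The method summary says that internalParse is invoked by the parse method, which suggests a template-method arrangement: the inherited parse is the entry point and each concrete parser supplies the format-specific work. The following is a minimal sketch of that arrangement only. ParserSketch, SketchAbstractParser and PlainTextParser are hypothetical stand-ins written for illustration; they are not the real AbstractParser or PDFParser, and the reset-then-delegate behaviour of parse is an assumption, not something this page documents.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for illustration; NOT the real AbstractParser/PDFParser.
class ParserSketch {

    /** Simplified base parser: parse() is the fixed entry point. */
    abstract static class SketchAbstractParser {
        protected final StringBuilder content = new StringBuilder();

        /** Template method: clears previous state, then calls the format-specific hook. */
        public final void parse(InputStream input) {
            content.setLength(0); // assumption: each parse starts from empty content
            internalParse(input);
        }

        public String getContent() {
            return content.toString();
        }

        /** Default metadata value; subclasses override it when they can extract a title. */
        public String getTitle() {
            return "";
        }

        /** Hook invoked by parse(); subclasses fill 'content' and any metadata here. */
        public abstract void internalParse(InputStream input);
    }

    /** Toy subclass: treats the input as UTF-8 text and uses the first line as the title. */
    static class PlainTextParser extends SketchAbstractParser {
        private String title = "";

        @Override
        public void internalParse(InputStream input) {
            title = "";
            try {
                String text = new String(input.readAllBytes(), StandardCharsets.UTF_8);
                content.append(text);
                int newline = text.indexOf('\n');
                title = (newline < 0 ? text : text.substring(0, newline)).trim();
            } catch (IOException e) {
                // Assumption: an unreadable stream leaves content and title empty.
                content.setLength(0);
            }
        }

        @Override
        public String getTitle() {
            return title;
        }
    }

    public static void main(String[] args) {
        PlainTextParser parser = new PlainTextParser();
        byte[] data = "Report\nBody text".getBytes(StandardCharsets.UTF_8);
        parser.parse(new ByteArrayInputStream(data)); // callers use parse(), not internalParse()
        System.out.println(parser.getTitle()); // prints "Report"
    }
}
```

In this sketch callers only ever touch parse and the getters, so a subclass can change how it reads its format without affecting calling code. Whether the real classes reset state or swallow I/O errors in the same way is not stated on this page.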

Methods inherited from class AbstractParser: getContent, getEncoding, getFilename, getLocale, getVersion, parse, parse, setEncoding, setFilename, setLocale

public String getAuthor()
getAuthor in interface Parser
getAuthor in class AbstractParser

public String getSourceDate()
getSourceDate in interface Parser
getSourceDate in class AbstractParser

public String getTags()
getTags in interface Parser
getTags in class AbstractParser

public String getTitle()
getTitle in interface Parser
getTitle in class AbstractParser

public void internalParse(InputStream input)
Description copied from class AbstractParser: Invoked by the parse method
internalParse in class AbstractParser

protected void parseDocument(org.apache.pdfbox.pdmodel.PDDocument pdfDocument)

public void parseForm(org.apache.pdfbox.pdmodel.PDDocument pdfDocument) throws IOException
Throws: IOException

Copyright © 2008-2014 Logical Objects. All Rights Reserved.