public class StringExtractor extends Object
The output of StringExtractor is suited to text indexing but less so to human consumption: formatting will most likely be lost, and some unwanted characters will inevitably slip through.
Modifier and Type | Field and Description |
---|---|
static String[] | COMMON_FONT_NAMES |
Constructor and Description |
---|
StringExtractor() |
Modifier and Type | Method and Description |
---|---|
String | extract(InputStream stream): Extract all human-readable text from an InputStream. |
protected boolean | isNormalWord(String word) |
protected boolean | isStartLine(String lineLowerCase): Determines whether the supplied line indicates the start of the textual contents. |
protected boolean | isTextCharacter(int charNumber): Checks whether the supplied character is a text character. |
protected boolean | isValidLine(String lineLowerCase): Determines whether the supplied line should be included in the end result. |
protected String | postProcessLine(String line) |
public static final String[] COMMON_FONT_NAMES
public String extract(InputStream stream) throws IOException
stream - The InputStream to read the bytes from. The stream will be fully consumed but not closed.
IOException - When reading characters from the InputStream caused an IOException.
protected boolean isStartLine(String lineLowerCase)
protected boolean isValidLine(String lineLowerCase)
protected boolean isTextCharacter(int charNumber)
protected boolean isNormalWord(String word)
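Because extract(InputStream) fully consumes the stream but does not close it, the caller remains responsible for closing it. The following is a minimal sketch of that calling pattern. The package of StringExtractor is not stated on this page, so the sketch uses a hypothetical stand-in method, extractPrintableRuns, which keeps runs of at least four printable ASCII characters. That heuristic, and the names ExtractContractSketch and MIN_RUN, are assumptions made for illustration and are not the library's actual algorithm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ExtractContractSketch {

    // Assumed threshold: shorter runs are treated as noise (hypothetical value).
    static final int MIN_RUN = 4;

    /**
     * Hypothetical stand-in for StringExtractor.extract(InputStream).
     * Reads the stream to the end and does NOT close it, mirroring the
     * documented contract. The filtering logic is an illustrative guess.
     */
    public static String extractPrintableRuns(InputStream stream) throws IOException {
        StringBuilder result = new StringBuilder();
        StringBuilder run = new StringBuilder();
        int b;
        while ((b = stream.read()) != -1) {
            if (b >= 0x20 && b < 0x7F) {
                run.append((char) b);   // printable ASCII: extend the current run
            } else {
                flush(run, result);     // any other byte ends the current run
            }
        }
        flush(run, result);             // keep a run that reaches end of stream
        return result.toString();
    }

    // Appends the run to the result if it is long enough, then resets it.
    private static void flush(StringBuilder run, StringBuilder result) {
        if (run.length() >= MIN_RUN) {
            if (result.length() > 0) {
                result.append('\n');
            }
            result.append(run);
        }
        run.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 1, 'H', 'e', 'l', 'l', 'o', 2, 'a', 'b', 3, 'w', 'o', 'r', 'l', 'd', 0};
        // The caller owns the stream: try-with-resources closes it after extraction.
        try (InputStream in = new ByteArrayInputStream(data)) {
            String text = extractPrintableRuns(in);
            System.out.println(text);   // prints "Hello" and "world" on separate lines
        }
    }
}
```

With the real class, the body of the try block would call new StringExtractor().extract(in) instead of the stand-in; the surrounding try-with-resources pattern stays the same.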
Copyright © 2008-2014 Logical Objects. All Rights Reserved.