Working with password-protected Office documents using Apache POI.
This project is focused on Excel (XLS and XLSX) spreadsheets.
Today we have the newer XML-based Office documents (XLSX, DOCX, etc.).
We also have the legacy binary document formats (XLS, DOC, etc.).
The current project includes features for both versions.
Apache POI doesn't work as well with the legacy format: in the tests below, it could not create encrypted XLS files.
Office documents offer protection against modification (read-only mode), i.e., you can open and read the document, but cannot change it unless you type the correct password.
There's also protection against opening the document, i.e., encryption that prevents anyone from opening the document without a password.
The former approach offers little security, since any other software can still read the document and unlock it. The latter is stronger because the entire file is encrypted.
I've tested POI reading both XLS and XLSX documents in the following scenarios:
- Reading a file protected against modification
- Reading a file protected against opening (encrypted)
- Reading a file protected against modification and opening (encrypted)
Then I've tested POI creating both XLS and XLSX documents in the following scenarios:
- Creating a file protected against modification
- Creating a file protected against opening (encrypted)
- Creating a file protected against modification and opening (encrypted)
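Results for reading XLSX:
- Protected against modification: Passed. It doesn't need any additional implementation.
- Protected against opening: Passed. It was necessary to use the Decryptor class.
- Protected against modification and opening: Passed. It was necessary to use the Decryptor class.
Results for reading XLS:
- Protected against modification: Passed. It doesn't need any additional implementation.
- Protected against opening: Passed. It was necessary to use the Biff8EncryptionKey class.
- Protected against modification and opening: Passed. It was necessary to use the Biff8EncryptionKey class.
Results for creating XLSX:
- Protected against modification: Passed. The protectSheet method from the POI API was used.
- Protected against opening: Passed. It was necessary to use the Encryptor class.
- Protected against modification and opening: Passed. It was necessary to use the Encryptor class and the protectSheet method.
Results for creating XLS:
- Protected against modification: Passed. The protectSheet method from the POI API was used.
- Protected against opening: Not passed. Encryption is not supported.
- Protected against modification and opening: Not passed. Encryption is not supported.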
"It doesn't need any additional implementation" means that POI reads the file as if it had no protection at all.
The other cases are discussed in the topics that follow.
In this project, the class XlsxDecryptor is responsible for loading documents protected against opening.
This class wraps the API provided by org.apache.poi.poifs.crypt.Decryptor.
Decryptor reads a binary InputStream and returns the actual content of the document.
This means Decryptor isn't coupled to any particular kind of document.
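As a rough illustration of the Decryptor flow described above, here is a minimal sketch. It assumes a recent Apache POI (4.x or later) on the classpath; the class name XlsxDecryptSketch and its open method are hypothetical and are not the project's XlsxDecryptor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;

import org.apache.poi.poifs.crypt.Decryptor;
import org.apache.poi.poifs.crypt.EncryptionInfo;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical helper, for illustration only.
public final class XlsxDecryptSketch {

    private XlsxDecryptSketch() {
    }

    public static XSSFWorkbook open(InputStream encrypted, String password)
            throws IOException, GeneralSecurityException {
        // An encrypted XLSX is stored inside an OLE2 container, not a plain ZIP.
        try (POIFSFileSystem fs = new POIFSFileSystem(encrypted)) {
            EncryptionInfo info = new EncryptionInfo(fs);
            Decryptor decryptor = Decryptor.getInstance(info);
            if (!decryptor.verifyPassword(password)) {
                throw new GeneralSecurityException("Wrong password");
            }
            // The data stream yields the decrypted OOXML package.
            try (InputStream plain = decryptor.getDataStream(fs)) {
                return new XSSFWorkbook(plain);
            }
        }
    }
}
```

Because the decrypted stream is just the underlying package, the same steps should apply to other OOXML formats by swapping the final constructor (for example, XWPFDocument for DOCX).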
In this project, the class XlsDecryptor is responsible for loading documents protected against opening.
This class wraps the API provided by org.apache.poi.hssf.record.crypto.Biff8EncryptionKey.
It's necessary to call the static method setCurrentUserPassword(password) to define the password for opening the file.
Then, the new HSSFWorkbook(input) constructor will use that password.
It's also necessary to call the static method again with null as the argument to clear the password for future documents.
This solution is thread-safe because the password is stored in a ThreadLocal variable.
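The set-then-clear sequence above could be sketched as follows. This assumes Apache POI is on the classpath; the class name XlsDecryptSketch is hypothetical and is not the project's XlsDecryptor. The try/finally block is a design choice of this sketch, so that the password is cleared even if loading fails.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.record.crypto.Biff8EncryptionKey;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

// Hypothetical helper, for illustration only.
public final class XlsDecryptSketch {

    private XlsDecryptSketch() {
    }

    public static HSSFWorkbook open(InputStream input, String password) throws IOException {
        // The password is kept per thread and picked up by the HSSFWorkbook constructor.
        Biff8EncryptionKey.setCurrentUserPassword(password);
        try {
            return new HSSFWorkbook(input);
        } finally {
            // Always clear it so later documents on this thread are not affected.
            Biff8EncryptionKey.setCurrentUserPassword(null);
        }
    }
}
```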
XSSFSheet provides the method protectSheet, which sets a password against modification of a sheet.
You need to do this for each sheet in the Workbook.
There is no method to protect the document as a whole.
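Since protection is applied sheet by sheet, a small loop covers the whole workbook. The following is a minimal sketch assuming Apache POI on the classpath; the class and method names (SheetProtectionSketch, protectAllSheets) are hypothetical. It is written against the common Sheet interface, which also declares protectSheet.

```java
import org.apache.poi.ss.usermodel.Workbook;

// Hypothetical helper, for illustration only.
public final class SheetProtectionSketch {

    private SheetProtectionSketch() {
    }

    public static void protectAllSheets(Workbook workbook, String password) {
        // protectSheet works per sheet, so every sheet is visited in turn.
        for (int i = 0; i < workbook.getNumberOfSheets(); i++) {
            workbook.getSheetAt(i).protectSheet(password);
        }
    }
}
```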
In this project, the class XlsxEncryptor is responsible for outputting documents protected against opening.
This class wraps the API provided by org.apache.poi.poifs.crypt.Encryptor.
Encryptor reads an InputStream (the original document) and outputs the binary encrypted version of it.
This class isn't coupled to any particular kind of document.
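For reference, here is a minimal sketch of encrypting a workbook with the POI Encryptor API, assuming a recent Apache POI (4.x or later) on the classpath. In POI itself, Encryptor hands out an OutputStream that the document is written to; the project's XlsxEncryptor may wrap this differently. The class name XlsxEncryptSketch and the choice of EncryptionMode.agile are assumptions of this sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.GeneralSecurityException;

import org.apache.poi.poifs.crypt.EncryptionInfo;
import org.apache.poi.poifs.crypt.EncryptionMode;
import org.apache.poi.poifs.crypt.Encryptor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical helper, for illustration only.
public final class XlsxEncryptSketch {

    private XlsxEncryptSketch() {
    }

    public static void writeEncrypted(XSSFWorkbook workbook, String password, OutputStream target)
            throws IOException, GeneralSecurityException {
        // The encrypted package is stored inside an OLE2 container.
        try (POIFSFileSystem fs = new POIFSFileSystem()) {
            EncryptionInfo info = new EncryptionInfo(EncryptionMode.agile);
            Encryptor encryptor = info.getEncryptor();
            encryptor.confirmPassword(password);
            // Closing this stream finalizes the encrypted entry in the container.
            try (OutputStream encrypted = encryptor.getDataStream(fs)) {
                workbook.write(encrypted);
            }
            fs.writeFilesystem(target);
        }
    }
}
```

To combine both kinds of protection, call protectSheet on each sheet before passing the workbook to a method like this one.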
Java Excel API is an alternative to Apache POI. However, it's somewhat more limited, so I tested it as a second option.
Unfortunately, it offers no password protection features other than sheet protection, which we already have in POI.
Contribute to POI project:
- Implement writeProtectWorkbook in XSSF
- Fix writeProtectWorkbook in HSSF