Efficient reconstruction techniques for disaster recovery in secret-split datastores

Document Type

Conference Proceeding

Publication Date



© 2018 IEEE. Increasingly, archival systems rely on authentication-based techniques that use secret-splitting rather than encryption to secure data for long-term storage. Secret-splitting data across multiple independent repositories reduces the complexity of key management, eliminates the need for updates as encryption algorithms are deprecated over time, and reduces the risk of insider compromise. While reconstruction of stored data objects is straightforward if a user-maintained index is available, the system must also support disaster recovery in case the index is unavailable. Designing a mechanism for efficient index-free reconstruction that does not increase the risk of attacker compromise is a challenge. Reconstruction requires associating the chunks that make up an object, which is exactly the kind of information attackers can use to identify the chunks they must steal to illicitly obtain data. We propose two new techniques, set-subset reconstruction and secret-split secure hash (S3H) reconstruction, which allow chunks of data to be correlated and quickly reconstructed without providing useful information to an attacker. Both techniques operate on the entire collection of secret-split chunks in the archive. While they can efficiently rebuild an entire archive, they are inefficient and impractical for rebuilding single objects, making them useless to attackers who do not have access to all of the data. Each technique can be tuned to trade off reconstruction performance against security, reducing overall runtime from O(N^K) (for N objects, each requiring K chunks to be recombined to recover the original object) to between O(N) and O(N^2). These runtimes are practical for archives containing as many as 10^7 objects for the secret-split secure hash method and 10^9 objects for the set-subset method. Larger archives can run these techniques with manageable runtimes by grouping data into separate smaller collections and running the algorithms on each collection in parallel.
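To make the O(N^K) baseline concrete, the following Python sketch shows index-free reconstruction by exhaustive search. It is not the set-subset or S3H technique, whose details the abstract does not give. It assumes a simple XOR-based K-of-K secret split and a truncated SHA-256 tag appended to each object so that a correct recombination can be recognized. The names `split`, `try_combine`, `build_repos`, `brute_force_rebuild`, and `TAG_LEN` are hypothetical and chosen only for illustration.

```python
import hashlib
import itertools
import random
import secrets
from typing import List, Optional, Sequence, Tuple

TAG_LEN = 8  # assumption: bytes of SHA-256 appended so a valid object is recognizable


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, k: int) -> List[bytes]:
    """Split data into k XOR shares; all k are needed to recover it."""
    payload = data + hashlib.sha256(data).digest()[:TAG_LEN]
    shares = [secrets.token_bytes(len(payload)) for _ in range(k - 1)]
    last = payload
    for share in shares:
        last = xor_bytes(last, share)
    return shares + [last]


def try_combine(chunks: Sequence[bytes]) -> Optional[bytes]:
    """XOR the chunks together; return the object only if its tag verifies."""
    payload = chunks[0]
    for chunk in chunks[1:]:
        if len(chunk) != len(payload):
            return None  # chunks of different sizes cannot belong together
        payload = xor_bytes(payload, chunk)
    data, tag = payload[:-TAG_LEN], payload[-TAG_LEN:]
    if hashlib.sha256(data).digest()[:TAG_LEN] == tag:
        return data
    return None


def build_repos(objects: Sequence[bytes], k: int) -> List[List[bytes]]:
    """Store one share of every object in each of k repositories.

    Each repository is shuffled to model the loss of the index: nothing
    records which chunks belong to the same object.
    """
    repos: List[List[bytes]] = [[] for _ in range(k)]
    for obj in objects:
        for repo, share in zip(repos, split(obj, k)):
            repo.append(share)
    for repo in repos:
        random.shuffle(repo)
    return repos


def brute_force_rebuild(repos: List[List[bytes]]) -> Tuple[List[bytes], int]:
    """Try one chunk from each repository in every combination: N^K attempts."""
    recovered, attempts = [], 0
    for combo in itertools.product(*repos):
        attempts += 1
        data = try_combine(combo)
        if data is not None:
            recovered.append(data)
    return recovered, attempts
```

With N objects spread over K repositories, `brute_force_rebuild` always performs exactly N^K recombination attempts. For an illustrative archive of N = 10^7 objects and K = 3 (a value of K chosen here only as an example), that is 10^21 attempts, compared with at most about 10^14 for an O(N^2) method.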

Publication Title

Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018