Archi – the possibility of long-term storage of the collected digital recordings
Long-term storage of collected digital recordings is one of the essential elements of IT use in business practice.
The presented design focuses on storing digital files and fulfills the basic requirements of a long-term archive:
- Built-in automatic protection of the physical state of records (performed in place, without the need for data migration)
- Energy efficiency – power is consumed only when performing tasks assigned by the user or for maintenance
- Authenticity checks for the collected resources
- Secure access to the stored digital files
- Scalability – additional storage is simple to add
- Access to the collected resources only on demand
The basic elements of the archive are data nodes equipped with mass storage. The nodes are controlled by embedded low-power computers, each of which is powered up independently and only when its storage is about to be accessed. This not only limits overall energy consumption but also relaxes environmental requirements (no air conditioning is needed).
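The power-on-demand behaviour described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Node` class, its methods, and the in-memory file store are hypothetical, and the 20 s wake-up figure from the technical data is kept as a plain number rather than simulated.

```python
from contextlib import contextmanager


class Node:
    """Hypothetical data node whose controller is powered only during access."""

    def __init__(self, node_id, power_on_delay_s=20.0):
        self.node_id = node_id
        # Assumed wake-up latency; stored as a number, never slept on.
        self.power_on_delay_s = power_on_delay_s
        self.powered = False
        self.power_cycles = 0
        self._files = {}  # stand-in for the node's disks

    @contextmanager
    def session(self):
        """Power the node up for one access and always power it down after."""
        self.powered = True
        self.power_cycles += 1
        try:
            yield self
        finally:
            self.powered = False

    def _require_power(self):
        if not self.powered:
            raise RuntimeError(f"node {self.node_id} is powered down")

    def write(self, name, data):
        self._require_power()
        self._files[name] = data

    def read(self, name):
        self._require_power()
        return self._files[name]
```

A caller would wrap each access in `with node.session(): ...`, so the node draws power only inside that block and any access outside it fails.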
The nodes are grouped in trays. The basic, recommended configuration allows for 30 nodes per tray, but this limit can be extended to 253. Each tray contains several networks, designed for data transport, device state control, and power supply. Communication with clients is conducted through buffers, which are the only parts visible from externally connected networks. Stored files are therefore completely isolated and cannot be accessed directly. Multiple trays located at a single physical site form a complete archive. The storage space can be split into virtual archives that are separated at the logical level.
The operating system of the data network stores from 3 to 7 copies of a single digital file on different nodes. In addition, further copies of the resource may be stored automatically in remotely located archives. The trays are treated as local parts of a wider, dispersed data network structure.
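One way to place 3 to 7 copies on distinct nodes is sketched below. The source does not say how nodes are selected; rendezvous (highest-random-weight) hashing is swapped in here purely as an illustration, and `place_copies`, its parameters, and the node numbering are hypothetical.

```python
import hashlib

MIN_COPIES, MAX_COPIES = 3, 7  # range stated for internal data duplication


def place_copies(file_id, node_ids, copies=3):
    """Pick `copies` distinct nodes for a file (illustrative rendezvous hashing)."""
    if not MIN_COPIES <= copies <= MAX_COPIES:
        raise ValueError("copies must be between 3 and 7")
    unique_nodes = sorted(set(node_ids))
    if copies > len(unique_nodes):
        raise ValueError("not enough distinct nodes for the requested copies")

    def score(node_id):
        # Each (file, node) pair gets a stable pseudo-random weight.
        return hashlib.sha256(f"{file_id}:{node_id}".encode()).digest()

    # The highest-scoring nodes hold the copies; the result is deterministic.
    return sorted(unique_nodes, key=score, reverse=True)[:copies]
```

With this scheme, the same file always maps to the same nodes, and adding a node to the tray only moves the files for which the new node scores highest.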
The archive software not only provides secure read and write operations but also automatically takes care of the stored data. It periodically regenerates the physical state of saved files. In case of a device failure, clients are transparently redirected to local or remote redundant copies.
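The transparent redirection to redundant copies might look roughly like the sketch below. It assumes that replicas are tried in a fixed order with local copies first; the exception type, the `read_with_failover` function, and the reader callables are hypothetical names, not part of the archive's software.

```python
class ReplicaUnavailable(Exception):
    """Raised when a copy cannot be read, e.g. after a device failure."""


def read_with_failover(file_id, replicas):
    """Return (location, data) from the first replica that answers.

    `replicas` is an ordered list of (location, reader) pairs, local copies
    first and remote ones last; each reader is a callable taking `file_id`.
    """
    failures = []
    for location, reader in replicas:
        try:
            return location, reader(file_id)
        except ReplicaUnavailable as exc:
            # Remember the failure and fall through to the next copy.
            failures.append(f"{location}: {exc}")
    raise ReplicaUnavailable(
        f"no readable copy of {file_id!r}: " + "; ".join(failures)
    )
```

The client sees only the returned data; which copy served the request is an internal detail, reported here only to make the behaviour visible.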
A mechanism of “software bots” has been implemented. The archive can be supplied with external programs that process files stored inside the data network. This allows for data analysis, indexing, post-data creation, statistical computations, or finding associations in unstructured Big Data sets. Only the output of a software bot can be accessed externally, which makes such operations very secure.
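The idea that a bot runs next to the data and only its output leaves the archive can be sketched as follows. This is a toy model under stated assumptions: `run_bot` and `word_count_bot` are hypothetical, files are held in a dictionary, and a read-only mapping stands in for the isolation that a real deployment would need to enforce with proper sandboxing.

```python
from types import MappingProxyType


def run_bot(bot, stored_files):
    """Run `bot` over the stored files and return only what it computes.

    The bot receives a read-only view, so it cannot add or replace entries;
    the caller never receives the files themselves, only the bot's result.
    """
    read_only_view = MappingProxyType(stored_files)
    return bot(read_only_view)


def word_count_bot(files):
    """Example bot: simple statistics over all stored files."""
    words = sum(len(content.split()) for content in files.values())
    return {"files": len(files), "words": words}
```

For example, `run_bot(word_count_bot, {"a.txt": b"one two", "b.txt": b"three"})` yields a small summary dictionary while the file contents stay inside the archive.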
Client programs communicate with the archive using a set of simple protocols based on key-value pair strings, which makes it convenient to build web interfaces for archive access and administration.
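The actual wire format of these key-value strings is not described here. As a hedged illustration only, the sketch below assumes URL-query-style encoding (`key=value` pairs joined by `&`), which is one format that web interfaces handle natively; the field names `cmd` and `file` are invented for the example.

```python
from urllib.parse import parse_qsl, urlencode


def encode_request(fields):
    """Serialize a dict of string fields into one key-value string."""
    return urlencode(fields)


def decode_request(message):
    """Parse a key-value string back into a dict; malformed input raises ValueError."""
    pairs = parse_qsl(message, keep_blank_values=True, strict_parsing=True)
    return dict(pairs)
```

A request such as `{"cmd": "read", "file": "a.txt"}` round-trips through `encode_request` and `decode_request` unchanged, and values containing spaces or slashes are escaped automatically.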
By automating the supervision of resources, reducing storage requirements, and precisely controlling energy consumption, the proposed solution significantly lowers the cost of long-term data storage.
Technical data
Node raw capacity | 9 HDDs for data storage (9 to 90 TB depending on the HDDs used, currently WD Green)
Tray capacity | 30 to 253 nodes
Internal data duplication | 3 to 7 copies including the original (adjustable)
Max. number of trays at a single physical location | 10 000
Max. number of virtual archives at a single physical location | 10 000
Max. number of nodes in dispersed archives | unlimited
Max. number of users | 10 million per virtual archive
Tray power supply | 230 V AC
Max. power per tray | 400 W (max. 1000 W in service mode)
Min. power per tray | 0 W
Environmental conditions | Office (10 °C to 35 °C), humidity 8% to 90% (non-condensing)
Minimal data retention time | 35 years
Power-on time after read/write request | 20 s
Data transfer | Standard 1 Gb/s
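The figures above can be combined into a rough capacity estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes that all copies of a file are stored within the same tray and ignores file system and metadata overhead; `tray_capacity_tb` is a hypothetical helper.

```python
def tray_capacity_tb(nodes, node_raw_tb, copies):
    """Return (raw, usable) capacity of one tray in TB.

    Usable capacity is raw capacity divided by the number of copies kept
    of every file, the original included.
    """
    if not 3 <= copies <= 7:
        raise ValueError("copies must be between 3 and 7")
    raw = nodes * node_raw_tb
    return raw, raw / copies


# Recommended tray (30 nodes) with the smallest nodes (9 TB) and 3 copies.
raw_tb, usable_tb = tray_capacity_tb(nodes=30, node_raw_tb=9, copies=3)
print(raw_tb, usable_tb)  # 270 90.0
```

Under the same assumptions, a fully extended tray of 253 nodes with 90 TB each holds 22 770 TB raw, or 7 590 TB of unique data at 3 copies.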
Contact: Aleksandra Śmietanka
http://www.softcream.pl/kontakt/
J.P. Walczak, K. Marasek, P. Sobótka