FAQ: Mass Store System (MSS)
- What is the MSS?
- Can anyone use the MSS?
- Why does the mssMkdir command fail with "permission denied"?
- Why does my mssGet command hang?
- Why did my file come back corrupted or short?
- My mssGet fails with an 'illegal seek'. What does that mean?
Q: What is the MSS?
A: The MSS consists of an ADIC AML/J tape robot with 1880 LTO slots, 8 LTO-1 tape drives, and 2 managed cache partitions of 600 GB each. LTO-1 tapes hold 100 GB native (200 GB compressed), which gives 188 TB of native near-line storage. The management software is ADIC's StorNext Management Suite.
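The 188 TB figure follows directly from the slot count and the native tape capacity. A small sketch of the arithmetic, assuming decimal units (1 TB = 1000 GB) and the nominal 2:1 compression ratio quoted for LTO-1; real compressed capacity depends on the data:

```python
# Figures taken from the answer above.
SLOTS = 1880
NATIVE_GB_PER_TAPE = 100
COMPRESSED_GB_PER_TAPE = 200  # nominal 2:1 ratio, not a guarantee
GB_PER_TB = 1000              # assumption: decimal units


def capacity_tb(slots, gb_per_tape):
    """Total capacity in TB if every slot holds one tape."""
    return slots * gb_per_tape / GB_PER_TB


native_tb = capacity_tb(SLOTS, NATIVE_GB_PER_TAPE)
compressed_tb = capacity_tb(SLOTS, COMPRESSED_GB_PER_TAPE)
print(f"native: {native_tb:g} TB, compressed: {compressed_tb:g} TB")
```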
Q: Can anyone use the MSS?
A: No. The MSS is available only to personnel of the Global Systems Division (GSD) and to users of approved HPCS projects.
Q: Why does the mssMkdir command fail with "permission denied"?
A: A known bug in CVFS (the file system on which the MSS disk cache is built) causes alternate groups to be ignored when creating directories and writing files. Even running the command under 'newgrp' does not help.
As a workaround, MSS administrators can create the directory or adjust the group information on the MSS server manually. Send mail to hpcshelp.fsl@noaa.gov if you run into this problem.
Q: Why does my mssGet command hang?
A: There are several possible causes:
1.) If you asked for a large number of files, the command may not be hanging at all; it may simply be working silently. Run mssGet with the --verbose option to find out. If mssGet is not hanging, it will echo the rcp/scp commands as they are spawned.
2.) The files you are asking for may be on tapes that are not currently in the tape robot. The robot can hold only a limited number of tapes. The rest are stored on a shelf and must be manually loaded back into the robot before files can be read from them. The system for notifying MSS staff that tapes need loading is rather limited, and the lag between notification and loading can be on the order of hours.
3.) rsh/ssh may be prompting for a password. mssGet wraps rsh/ssh commands and expects them to run without password prompts. The wrapping loses the prompt string, so a pending prompt looks like a hang. ssh in particular is known for being very strict about password checking, so even minor changes to a system can cause it to start asking for passwords unexpectedly. One way to check is to run these commands: "ls ssh jet-sun" If any of them prompts for a password, that could be the reason for the mssGet hang. Enter your password, then try the commands again. If any of them still prompts, there is an underlying rsh/ssh issue that you need to address first. See the "Connecting with SSH" FAQ item for information on how to set up ssh so that it does not prompt for passwords.
4.) There could be a problem with the MSS server itself. Contact the MSS administrators at hpcshelp.fsl@noaa.gov if the hang persists.
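For cause 3, a non-interactive check can show whether ssh would stop to ask for a password. The following is a minimal sketch, not part of the MSS tools. It assumes an OpenSSH client, whose BatchMode option makes ssh fail instead of prompting, and it uses jet-sun from the item above as the host. The function names and timeout values are illustrative.

```python
import subprocess


def batch_ssh_command(host):
    """Build an ssh command that fails rather than prompting for a password.

    Assumes an OpenSSH client; BatchMode=yes disables password prompts.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


def runs_without_prompt(cmd, timeout=30):
    """Return True if cmd exits with status 0 without needing any input."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # nothing can be typed at a prompt
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0
```

Calling runs_without_prompt(batch_ssh_command("jet-sun")) returns False when ssh would have needed a password, which suggests that a password prompt is what mssGet is waiting on. It also returns False if the host is unreachable, so a False result is a hint rather than a diagnosis.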
Q: Why did my file come back corrupted or short?
A: A typical symptom is that a file retrieved with mssGet is shorter than expected; for example, tar complains about a premature EOF when you try to unpack it.
We rarely see corrupted files. Because we save copies of every file on two different tapes, MSS administrators may be able to recover such a file by manually requesting it from the backup tape (contact the MSS administrators by sending mail to hpcshelp.fsl@noaa.gov). If this fails, however, the file is unfortunately lost.
Sometimes this loss is attributable to the MSS server software, but sometimes the corruption occurs during the store process. Perhaps the store was interrupted in transit, or some similar problem occurred. We strongly urge users to always check the exit code from mssPut and not to assume that a file has been stored on the MSS just because mssPut was run.
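One way to follow the advice about exit codes from a script is to wrap the store command and treat any non-zero status as a failure. This is a hedged sketch: the helper names are hypothetical, and because this FAQ does not document the mssPut command line, the caller must supply the full command.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd and return its exit code; raise if the code is non-zero."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {result.returncode}")
    return result.returncode


def store_or_report(cmd):
    """Return True if the store command succeeded; report on stderr otherwise."""
    try:
        run_checked(cmd)
    except (RuntimeError, OSError) as err:
        # OSError covers the case where the command cannot be started.
        print(f"store failed: {err}", file=sys.stderr)
        return False
    return True
```

A script would pass its own mssPut invocation as a list of arguments and keep the local copy of the file whenever store_or_report returns False.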
Q: My mssGet fails with an 'illegal seek'. What does that mean?
A: We have seen this problem in two different circumstances. The first is that mssGet's wrapper around scp is somewhat buggy and gets confused by the wrapped command's exit codes. To check for this, add the --verbose flag to the mssGet command line, then copy the scp command it echoes and run it directly to see the true error. The second is possible corruption of large files. See the "Why did my file come back corrupted or short?" FAQ item for more information.