Handling of rfs4_start_time is fragile
There are basically two issues with handling of rfs4_start_time.
When the rfs4_start_time is set in rfs4_state_init(), there is a scenario where the rfs4_start_time could be set to a value which was used by one of the previous nfs server instances. The scenario requires three nfsd restarts in less than one second.
1183 /* 1184 * Set the boot time. If the server 1185 * has been restarted quickly and has had the opportunity to 1186 * service clients, then the start_time needs to be bumped 1187 * regardless. A small window but it exists... 1188 */ 1189 if (rfs4_start_time != gethrestime_sec()) 1190 rfs4_start_time = gethrestime_sec(); 1191 else 1192 rfs4_start_time++;
During the first nfsd restart the rfs4_start_time is set to current time (let say T) in rfs4_state_init().
During the second restart of nfsd, the rfs4_start_time is bumped to T+1, because we restarted nfsd in the same second as the previous nfsd restart, so the condition at line 1189 (see above) is false.
During the third restart of nfsd, the condition at line 1189 is true (current time is T, rfs4_start_time is T+1), so we will set rfs4_start_time back to T, as we had after the first nfsd restart.
To fix this we need to change "not equal" to "less than" at line 1189.
The rfs4_start_time is used in what_stateid_error() to check whether the supplied stateid is from the current, or previous (or some other) nfs instance. In a case the supplied stateid is accidentally from some other (and newer) nfs instance, the following check will pass and NFS4ERR_STALE_STATEID won't be returned. NFS4ERR_BAD_STATEID will be returned instead.
3271 /* From a previous server instantiation, return STALE */ 3272 if (id->bits.boottime < rfs4_start_time) 3273 return (NFS4ERR_STALE_STATEID); ... 3293 3294 return (NFS4ERR_BAD_STATEID);
In general, NFS4ERR_BAD_STATEID should be returned only when the stateid was generated by the current server instance which is not our case here (boottime does not match rfs4_start_time).
To fix this we need to change "less than" at line 3272 above to "not equal".
Updated by Electric Monk about 8 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
commit 25a1318c7b398ea2cc4a31ea6d3fa5c1ffea91b9 Author: Marcel Telka <email@example.com> Date: 2014-05-10T22:46:36.000Z 4768 Handling of rfs4_start_time is fragile Reviewed by: Dan McDonald <firstname.lastname@example.org> Reviewed by: Jan Kryl <email@example.com> Approved by: Richard Lowe <firstname.lastname@example.org>