Bug #2083

stmf: worker manager doesn't scale workers anymore

Added by Emmanuel Hocdet about 2 years ago. Updated about 2 years ago.

Status:Resolved Start date:2012-02-06
Priority:Urgent Due date:
Assignee:Alexander Stetsenko % Done:

100%

Category:comstar - iSCSI/FC/SAS target Spent time: -
Target version:-
Difficulty:Medium Tags:

Description

it's a regression introduced in changeset: 13291:d5c7e34a4e3f
(https://www.illumos.org/issues/434)

in the new function stmf_svc_timeout: clock_t vars are now in the loop of stmf_svc instead out.

I attach a quick patch to address this issue:

stmf-nworker.patch (598 Bytes) Emmanuel Hocdet, 02/06/2012 11:27 am

stmf.diff (4.2 kB) Alexander Stetsenko, 02/07/2012 08:01 am

History

Updated by Dan McDonald about 2 years ago

Right idea, wrong way to fix it.

Statics in an MT-environment are dangerous. I see that these variables need to be kept in the caller state, but why not pass a pointer to them.

Updated by Alexander Stetsenko about 2 years ago

  • File stmf.diff added
  • Category set to comstar - iSCSI/FC/SAS target
  • Assignee set to Alexander Stetsenko

I guess it should look like this:

Updated by Dan McDonald about 2 years ago

Alexander Stetsenko wrote:

I guess it should look like this:

Yeah, that's more like it. Dumb question: Will the static initialization of one field be sufficient? Or should you zero out the whole structure? Can't remember if a lone 0 is sufficient to blank the whole thing.

Updated by Alexander Stetsenko about 2 years ago

Emmanuel Hocdet wrote:

it's a regression introduced in changeset: 13291:d5c7e34a4e3f (https://www.illumos.org/issues/434)

in the new function stmf_svc_timeout: clock_t vars are now in the loop of stmf_svc instead out.

I attach a quick patch to address this issue:

Emmanuel, could you provide more detailed description of the problem, what are the symptoms… ?
Also, could you test the supplied patch stmf.diff, does it really help? (testing is needed to make patch integration request)

Thanks

Updated by Emmanuel Hocdet about 2 years ago

Emmanuel, could you provide more detailed description of the problem, what are the symptoms… ?

stmf_worker_mgmt() is call many many time per second and don't scale stmf_nworkers_cur
It's because stmf_worker_mgmt() do nothing if it's call more than one in a second. ( cf /* still ramping up */ in the source)

with stmf-nworker.patch, stmf_worker_mgmt() is call each 5s, and scale stmf_nworkers_cur (like before the bug).

Also, could you test the supplied patch stmf.diff, does it really help? (testing is needed to make patch integration request)

I only test my patch (since one week) and work well.

Dan McDonald wrote:

Right idea, wrong way to fix it.

Statics in an MT-environment are dangerous. I see that these variables need to be kept in the caller state, but why not pass a pointer to them.

Yes, but it's the global use case in stmf.c. Caller and called use lot of static global vars and work (and design) in one and only one thread.
I don't like change something for look in such case, because it's quickly more difficult to understand code and changes, and it's the way to introduce new bug.

Updated by Dan McDonald about 2 years ago

Emmanuel Hocdet wrote:

<snip!>

Also, could you test the supplied patch stmf.diff, does it really help? (testing is needed to make patch integration request)

I only test my patch (since one week) and work well.

Could you please test the one Alexander made? I ask because...

Dan McDonald wrote:

Right idea, wrong way to fix it.

Statics in an MT-environment are dangerous. I see that these variables need to be kept in the caller state, but why not pass a pointer to them.

Yes, but it's the global use case in stmf.c. Caller and called use lot of static global vars and work (and design) in one and only one thread. I don't like change something for look in such case, because it's quickly more difficult to understand code and changes, and it's the way to introduce new bug.

There may come a day when that one thread in stmf becomes more than one. Using statics and globals tends to make future architectural changes much more difficult. I understand the principles of least surprise and minimal changes, which are what I believe underpin your argument. I'm more concerned about future-proofing than I am anything else.

I've reviewed Alexander's code changes, and find them (here http://cr.illumos.org/view/mgg7puw0/ ) perfectly reasonable, and more enabling of any future refactoring work.

Updated by Emmanuel Hocdet about 2 years ago

Dan McDonald wrote:

Alexander Stetsenko wrote:

I guess it should look like this:

Yeah, that's more like it. Dumb question: Will the static initialization of one field be sufficient? Or should you zero out the whole structure? Can't remember if a lone 0 is sufficient to blank the whole thing.

= {0}; blank the whole thing and set the first to 0.

= {}; blank the whole thing. This must be used instead.

Updated by Emmanuel Hocdet about 2 years ago

Dan McDonald wrote:

Could you please test the one Alexander made? I ask because...

I tested it and it's ok.

Updated by Rich Lowe about 2 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in r13626 f228617ce378

Also available in: Atom PDF