Schedulers launched twice - problem with RunOnlyOnIP null running on load balancer


The RunOnlyOnIP functionality still works fine.

But a ticket introduced the behavior that, in a load-balanced environment, if RunOnlyOnIP is null the scheduler is started on just one server (provided the servers are connected via hazelcast).

My test case involved the following:

  • started two servers linked by hazelcast at the same time

  • checked in idempiereMonitor: everything was working fine, with the schedulers running on just one server

  • restarted the server where the schedulers were not running

  • all the schedulers are now running twice

Probably the cause is that the schedulers are being started before the server joins the other hazelcast node.




Carlos Ruiz
October 1, 2020, 9:57 AM

Hi,

Delay the start of the scheduler server to give hazelcast time to start and join the other nodes.

Is it possible to know the status of hazelcast from the org.compiere.server.Scheduler or org.adempiere.server.AdempiereServerActivator?

If yes, I think a wait can be implemented until we know that the hazelcast plugin has started and has joined the hazelcast network (or is standalone).
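A minimal sketch of such a wait, assuming the hazelcast status can be exposed as a simple boolean check. The class `ClusterStartGate`, the method `awaitReady`, and the readiness supplier are hypothetical names used for illustration only; they are not part of iDempiere or hazelcast. The idea is to poll until the node reports ready or a deadline passes, then start the schedulers either way.

```java
import java.util.function.BooleanSupplier;

/** Sketch: hold back scheduler start-up until the cluster reports ready or a deadline passes. */
class ClusterStartGate {

    /**
     * Polls clusterReady until it returns true or timeoutMillis elapses.
     * Returns true if the cluster became ready, false on timeout or interruption.
     */
    static boolean awaitReady(BooleanSupplier clusterReady, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!clusterReady.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting; the caller decides how to proceed
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical readiness check: pretend the node joins the cluster after 200 ms.
        BooleanSupplier joinedAfter200ms = () -> System.currentTimeMillis() - start >= 200;
        boolean ready = awaitReady(joinedAfter200ms, 2_000, 50);
        System.out.println(ready
                ? "cluster ready, starting schedulers"
                : "timed out waiting for cluster, starting schedulers anyway");
    }
}
```

A bounded wait keeps the single-server case working: if no cluster ever appears, start-up continues after the timeout instead of blocking forever.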

If that's not possible, maybe we can implement some kind of semaphore (perhaps something like org.adempiere.plugin.utils.AbstractActivator.getLockPO).

Alternatively, add an isStarted/isScheduled flag to the AD_Scheduler table and use it to prevent more than one server from starting a particular scheduler record.

The problem I see with this alternative is that a crashed server would leave the flag set permanently, and manual intervention would be required to make it work again.


Heng Sin Low
October 13, 2020, 6:14 AM

Hi,

I’ve pushed a pull request that adds a 1 to 3 minute delay to wait for the hazelcast service.



Carlos Ruiz
October 17, 2020, 8:35 PM

Thanks, this was tested in the normal single-server scenario. Testing with a configuration of hazelcast nodes is still pending.


Carlos Ruiz
November 1, 2020, 6:20 PM

Hi, I found one server where the AdempiereServerMgr didn't start, and it left no clue in the log as to the reason.

It seems this code AdempiereMonitor.init:1285:

is swallowing the exception, so the failure is not noticeable to the user.

While debugging, I found the cause to be this exception:

I'm still trying to find the root cause so it can be fixed, but in any case I think it is important that the log shows the error.
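As an illustration of the logging point, here is a minimal sketch of a start-up step that reports a failure instead of hiding it. It is not the actual AdempiereMonitor code; the class `StartupLogging`, the method `startSafely`, and the simulated failure are hypothetical, and the JDK's java.util.logging is used only to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch: run a start-up step and log any failure with its stack trace. */
class StartupLogging {
    private static final Logger log = Logger.getLogger(StartupLogging.class.getName());

    /** Runs the step; returns false and logs the full exception if it fails. */
    static boolean startSafely(String name, Runnable step) {
        try {
            step.run();
            return true;
        } catch (RuntimeException e) {
            // Passing the exception to the logger keeps the stack trace in the log,
            // so the reason for the failed start is visible to the administrator.
            log.log(Level.SEVERE, "Failed to start " + name, e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical failing step, standing in for a server manager that cannot start.
        boolean started = startSafely("server manager", () -> {
            throw new IllegalStateException("simulated start-up failure");
        });
        System.out.println("started = " + started);
    }
}
```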



Carlos Ruiz
November 1, 2020, 6:31 PM

Going deeper into the root cause of the failure: the code creating the MSession record is reached without a context. Validation showed that:
getCtx().isEmpty() -> true
getCtx().getProperty("#AD_Client_ID") == null -> true

I don't know why this happens on one specific server but not on others. However, I think there may be something wrong with the context of the thread opened with the Adempiere.getThreadPoolExecutor().schedule(() ?
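To illustrate the suspicion, here is a sketch under the assumption that the context is held per thread. `ContextPropagation`, `CTX`, and `clientSeenByPoolThread` are hypothetical stand-ins, not iDempiere classes, and the client ID value is made up. A task scheduled on a pool thread does not see the caller's thread-local context unless that context is captured and set explicitly inside the task.

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: a thread-local context is empty on a pool thread unless it is propagated. */
class ContextPropagation {
    // Hypothetical per-thread context; each thread starts with empty Properties.
    static final ThreadLocal<Properties> CTX = ThreadLocal.withInitial(Properties::new);

    /** Returns the #AD_Client_ID value that a scheduled task sees on the pool thread. */
    static String clientSeenByPoolThread(boolean propagate) {
        // Snapshot the caller's context on the calling thread.
        Properties snapshot = new Properties();
        snapshot.putAll(CTX.get());

        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            return pool.schedule(() -> {
                if (propagate) {
                    CTX.set(snapshot); // hand the caller's context to the pool thread
                }
                return CTX.get().getProperty("#AD_Client_ID");
            }, 10, TimeUnit.MILLISECONDS).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        CTX.get().setProperty("#AD_Client_ID", "11"); // made-up value for the demo
        System.out.println("without propagation: " + clientSeenByPoolThread(false));
        System.out.println("with propagation:    " + clientSeenByPoolThread(true));
    }
}
```

Without propagation the pool thread sees an empty context, which matches the `getCtx().isEmpty() -> true` observation above. Whether this is what actually happens on the affected server is not confirmed in this thread.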



Heng Sin Low


Carlos Ruiz


