bogus usage jobs may be created when usage is behind in processing #12424

@DaanHoogland

problem

While troubleshooting missing data on a system where usage processing was behind, we found that many usage_jobs are created that won’t have a period to process:

2026-01-13 01:00:38,566 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) it's been 86438558 ms since last usage job and 66261443 ms until next job, scheduling an immediate job to catch up (aggregation duration is 1440 minutes)
2026-01-13 01:00:38,568 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2026-01-13 01:00:38,570 INFO  [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) starting usage job...
2026-01-13 01:00:38,606 INFO  [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) Parsing usage records between Mon Jan 12 01:00:00 CET 2026 and Tue Jan 13 00:59:59 CET 2026
2026-01-13 01:01:38,581 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) it's been 86498581 ms since last usage job and 66201420 ms until next job, scheduling an immediate job to catch up (aggregation duration is 1440 minutes)
2026-01-13 01:01:38,581 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2026-01-13 01:02:38,559 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) it's been 86558559 ms since last usage job and 66141442 ms until next job, scheduling an immediate job to catch up (aggregation duration is 1440 minutes)
2026-01-13 01:02:38,559 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2026-01-13 01:03:38,558 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) it's been 86618558 ms since last usage job and 66081443 ms until next job, scheduling an immediate job to catch up (aggregation duration is 1440 minutes)
2026-01-13 01:03:38,558 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2026-01-13 01:04:38,559 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) it's been 86678559 ms since last usage job and 66021442 ms until next job, scheduling an immediate job to catch up (aggregation duration is 1440 minutes)
2026-01-13 01:04:38,559 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2026-01-13 01:05:38,558 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) it's been 86738559 ms since last usage job and 65961442 ms until next job, scheduling an immediate job to catch up (aggregation duration is 1440 minutes)
2026-01-13 01:05:38,558 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2026-01-13 01:06:02,952 DEBUG [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) Creating networking offering: 18 for Vm: 75432 for account: 239496
2026-01-13 01:06:03,028 DEBUG [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) create volume with id : 163459 for account: 239496
2026-01-13 01:06:03,497 DEBUG [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) deleting network offering: 18 from Vm: 96981

The heartbeat thread should take existing jobs into account and not create the extra jobs.
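
The numbers in the log can be checked by hand: an aggregation duration of 1440 minutes is 86,400,000 ms, so every heartbeat shown sees the last job as slightly more than one period old and schedules a catch-up job again. Below is a minimal sketch of that arithmetic. It assumes the catch-up decision is simply "time since the last job exceeds one aggregation period"; the actual condition in UsageManagerImpl may differ, and `CatchUpCheck` and `isBehind` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not code taken from UsageManagerImpl.
class CatchUpCheck {

    // Assumed rule: usage is "behind" when more than one aggregation period
    // has passed since the last usage job.
    static boolean isBehind(long msSinceLastJob, int aggregationMinutes) {
        return msSinceLastJob > TimeUnit.MINUTES.toMillis(aggregationMinutes);
    }

    public static void main(String[] args) {
        // "ms since last usage job" values copied from the first three heartbeat log lines.
        long[] samples = {86438558L, 86498581L, 86558559L};
        long periodMs = TimeUnit.MINUTES.toMillis(1440);
        for (long ms : samples) {
            System.out.println(ms + " ms: behind=" + isBehind(ms, 1440)
                    + ", over by " + (ms - periodMs) + " ms");
        }
    }
}
```

Under that assumption the check stays true on every heartbeat until the running job has recorded its result, which would explain one "Scheduling Usage job..." line per minute while Usage-Job-1 is still parsing.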

versions

ACS 4.19

The steps to reproduce the bug

  1. Have an environment running for a while.
  2. Prune a couple of months of data from the intermediate and usage tables.
  3. Set the events of those months to unprocessed.
  4. Start usage.
  5. Observe that a job starts and that multiple other jobs are created.
  6. The extra jobs seem to have an interval of -1 second and fail (of course).

What to do about it?

Make sure the heartbeat job does not create jobs while a job is already being processed or scheduled.
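
One way to sketch that guard is an in-memory flag that the heartbeat must claim before it schedules anything and that is released only when the job finishes. This is an illustration under assumptions, not a proposed patch: `UsageJobGuard` and `scheduleIfIdle` are hypothetical names, and the real fix may need to consult the usage_job table rather than rely on in-process state.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard; the names are illustrative and not taken from UsageManagerImpl.
class UsageJobGuard {

    // True from the moment a job is scheduled until it has finished running.
    private final AtomicBoolean jobPending = new AtomicBoolean(false);

    // Intended to be called from the heartbeat.
    // Returns true only if the job was actually handed to the executor.
    boolean scheduleIfIdle(Runnable job, Executor executor) {
        // Atomically claim the slot; fails if a job is already scheduled or running.
        if (!jobPending.compareAndSet(false, true)) {
            return false;
        }
        try {
            executor.execute(() -> {
                try {
                    job.run();
                } finally {
                    // Release the slot even if the job throws.
                    jobPending.set(false);
                }
            });
        } catch (RuntimeException e) {
            // The executor rejected the task, so nothing is pending.
            jobPending.set(false);
            throw e;
        }
        return true;
    }

    boolean isJobPending() {
        return jobPending.get();
    }
}
```

With a guard like this, the heartbeat would log and skip instead of scheduling whenever `scheduleIfIdle` returns false, so only one catch-up job exists at a time.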
