[INSTRM-2696] Identify source of updateTelStatus timeouts - PFS-JIRA

Details

Type: Task
Status: Open
Priority: Normal
Resolution: Unresolved
Component/s: None
Labels:
None

Description

Although slightly alleviated by ~~INSTRM-2686~~ and ~~INSTRM-2695~~, the updateTelStatus command is still sometimes slow. The problem is in the opdb INSERT itself, but we haven't pinned it down beyond that. Turning on PostgreSQL logging and/or psycopg2/sqlalchemy logging might help, though either could cause trouble of its own.

We did turn on log_min_duration_statement but got no hits. I'm not convinced that setting actually measures what we are looking for: the end-to-end wall time between the statement reaching and leaving the server process. Still, I would try setting log_statement='mod' and lowering the associated duration threshold in any case. Kiyoto Yabe?
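As a complement to the server-side settings, the client-side wall time of the INSERT could be recorded directly. The following is only a minimal sketch using the Python standard library; the logger name `opdb.timing`, the helper `timed`, and the 0.5 s threshold are all hypothetical and not taken from the existing code.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name; not part of the existing actor code.
log = logging.getLogger("opdb.timing")


@contextmanager
def timed(label, slow_after=0.5):
    """Measure the client-side wall time of the enclosed block.

    Yields a dict that is filled in when the block exits. A warning is
    logged only when the elapsed time reaches `slow_after` seconds.
    """
    record = {"label": label, "elapsed": None, "slow": False}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["elapsed"] = time.perf_counter() - start
        record["slow"] = record["elapsed"] >= slow_after
        if record["slow"]:
            log.warning("%s took %.3f s", label, record["elapsed"])
```

Wrapping the call that issues the opdb INSERT in `with timed("updateTelStatus INSERT"):` would give timestamps that can be lined up against whatever the server logs. Note that this measures the round trip as seen by the client, including network and driver overhead, not the time inside the server process.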

WAL checkpointing (on a decently loaded server with basically one logical spindle) is a concern, both by reasoning and from the logs. I just don't know exactly how that IO/buffering affects statement processing. You can certainly see the effect of observing activity, but there are also much longer delays at times other than when we see failures in gen2. Not sure.

It would be nice to be able to turn psycopg2/sqlalchemy logs on/off at runtime: correlating those times with the server times could clarify things. Wilfred Gee?
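One possible way to do the runtime toggle for the sqlalchemy side, sketched under assumptions: SQLAlchemy emits its SQL statements through the standard `logging` module on the `sqlalchemy.engine` logger at INFO level, so changing that logger's level switches statement logging on or off. The function name `set_sql_logging` is hypothetical, and how it would be triggered (for example from an actor command) is left open. psycopg2 does not log through `logging` by default, so this sketch does not cover it.

```python
import logging

# "sqlalchemy.engine" is the logger SQLAlchemy uses for SQL statement output.
SQL_LOGGER_NAMES = ("sqlalchemy.engine",)


def set_sql_logging(enabled):
    """Turn SQLAlchemy statement logging on (INFO) or off (WARNING).

    Hypothetical helper; returns the level that was applied.
    """
    level = logging.INFO if enabled else logging.WARNING
    for name in SQL_LOGGER_NAMES:
        logging.getLogger(name).setLevel(level)
    return level
```

One caveat from the SQLAlchemy documentation: the engine only checks the logging level when a connection is procured from the pool, so a connection or session that is already active will not pick up the change until a new connection is checked out. A log handler with timestamps in its format is also needed for the correlation with server times to be useful.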

Issue Links

relates to

INSTRM-2686 Better handling of updateTelStatus timeouts

Done

INSTRM-2695 Clean up MCS frame id/tel status handling

Done

People

Assignee:

Unassigned

Reporter:

cloomis

Votes:

0

Watchers:

4

Dates

Created:

24/Sep/25 7:21 AM

Updated:

20/Oct/25 8:09 PM