[2025-01-21 06:16:14,057 I 23840 23840] (raylet) main.cc:180: Setting cluster ID to: b96c6f6510a38509f7a01fcc10d6eb313f07df412f48c16b64ee6690
[2025-01-21 06:16:14,066 I 23840 23840] (raylet) main.cc:289: Raylet is not set to kill unknown children.
[2025-01-21 06:16:14,066 I 23840 23840] (raylet) io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2025-01-21 06:16:14,067 I 23840 23840] (raylet) main.cc:419: Setting node ID node_id=70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3
[2025-01-21 06:16:14,067 I 23840 23840] (raylet) store_runner.cc:32: Allowing the Plasma store to use up to 2.14748GB of memory.
[2025-01-21 06:16:14,067 I 23840 23840] (raylet) store_runner.cc:48: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2025-01-21 06:16:14,068 I 23840 23868] (raylet) dlmalloc.cc:154: create_and_mmap_buffer(2147483656, /dev/shm/plasmaXXXXXX)
[2025-01-21 06:16:14,069 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump:
Current usage: 0 / 2.14748 GB
- num bytes created total: 0
0 pending objects of total size 0MB
- objects spillable: 0
- bytes spillable: 0
- objects unsealed: 0
- bytes unsealed: 0
- objects in use: 0
- bytes in use: 0
- objects evictable: 0
- bytes evictable: 0
- objects created by worker: 0
- bytes created by worker: 0
- objects restored: 0
- bytes restored: 0
- objects received: 0
- bytes received: 0
- objects errored: 0
- bytes errored: 0
[2025-01-21 06:16:15,073 I 23840 23840] (raylet) grpc_server.cc:134: ObjectManager server started, listening on port 40575.
[2025-01-21 06:16:15,076 I 23840 23840] (raylet) worker_killing_policy.cc:101: Running GroupByOwner policy.
[2025-01-21 06:16:15,076 I 23840 23840] (raylet) memory_monitor.cc:47: MemoryMonitor initialized with usage threshold at 94999994368 bytes (0.95 system memory), total system memory bytes: 99999997952
[2025-01-21 06:16:15,076 I 23840 23840] (raylet) node_manager.cc:287: Initializing NodeManager node_id=70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3
[2025-01-21 06:16:15,077 I 23840 23840] (raylet) grpc_server.cc:134: NodeManager server started, listening on port 35805.
[2025-01-21 06:16:15,086 I 23840 23933] (raylet) agent_manager.cc:77: Monitor agent process with name dashboard_agent/424238335
[2025-01-21 06:16:15,086 I 23840 23935] (raylet) agent_manager.cc:77: Monitor agent process with name runtime_env_agent
[2025-01-21 06:16:15,086 I 23840 23840] (raylet) event.cc:493: Ray Event initialized for RAYLET
[2025-01-21 06:16:15,086 I 23840 23840] (raylet) event.cc:324: Set ray event level to warning
[2025-01-21 06:16:15,089 I 23840 23840] (raylet) raylet.cc:134: Raylet of id, 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 started. Raylet consists of node_manager and object_manager.
node_manager address: 192.168.0.2:35805 object_manager address: 192.168.0.2:40575 hostname: 0cd925b1f73b [2025-01-21 06:16:15,092 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 1 Cluster resources: node id: -7419015889323649548{"total":{CPU: 200000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000}}, 
"available": {CPU: 200000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, GPU: 20000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 0 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s 
[state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 69958887677660000.000 [state-dump] - num location lookups per second: 69958887677648000.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 0 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 0 [state-dump] - num PYTHON drivers: 0 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 0 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 28 total (13 active) [state-dump] Queueing time: mean = 1.435 ms, max = 
10.897 ms, min = 22.714 us, total = 40.188 ms [state-dump] Execution time: mean = 36.769 ms, total = 1.030 s [state-dump] Event stats: [state-dump] PeriodicalRunner.RunFnPeriodically - 11 total (2 active, 1 running), Execution time: mean = 201.247 us, total = 2.214 ms, Queueing time: mean = 3.640 ms, max = 10.897 ms, min = 38.704 us, total = 40.039 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 1 total (1 active), Execution time: mean = 0.000 s, 
total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.debug_state_dump - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ObjectManager.UpdateAvailableMemory - 1 total (0 active), Execution time: mean = 3.941 us, total = 3.941 us, Queueing time: mean = 34.137 us, max = 34.137 us, min = 34.137 us, total = 34.137 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (0 active), Execution time: mean = 1.720 ms, total = 1.720 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
[state-dump] DebugString() time ms: 0
[state-dump]
[state-dump]
[2025-01-21 06:16:15,094 I 23840 23840] (raylet) accessor.cc:762: Received notification for node, IsAlive = 1 node_id=70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3
[2025-01-21 06:16:15,168 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23972, the token is 0
[2025-01-21 06:16:15,172 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23973, the token is 1
[2025-01-21 06:16:15,174 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23974, the token is 2
[2025-01-21 06:16:15,176 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23975, the token is 3
[2025-01-21 06:16:15,179 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23976, the token is 4
[2025-01-21 06:16:15,181 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23977, the token is 5
[2025-01-21 06:16:15,183 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23978, the token is 6
[2025-01-21 06:16:15,186 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23979, the token is 7
[2025-01-21 06:16:15,188 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23980, the token is 8
[2025-01-21 06:16:15,190 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23981, the token is 9
[2025-01-21 06:16:15,193 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23982, the token is 10
[2025-01-21 06:16:15,195 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23983, the token is 11
[2025-01-21 06:16:15,197 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23984, the token is 12
[2025-01-21 06:16:15,199 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23985, the token is 13
[2025-01-21 06:16:15,201 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23986, the token is 14
[2025-01-21 06:16:15,204 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23987, the token is 15
[2025-01-21 06:16:15,206 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23988, the token is 16
[2025-01-21 06:16:15,209 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23989, the token is 17
[2025-01-21 06:16:15,212 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23990, the token is 18
[2025-01-21 06:16:15,215 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 23991, the token is 19
[2025-01-21 06:16:15,889 I 23840 23868] (raylet) object_store.cc:35: Object store current usage 8e-09 / 2.14748 GB.
[2025-01-21 06:16:16,071 I 23840 23840] (raylet) worker_pool.cc:692: Job 01000000 already started in worker pool.
[2025-01-21 06:16:24,081 W 23840 23862] (raylet) metric_exporter.cc:105: [1] Export metrics to agent failed: RpcError: RPC Error message: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:42309: Failed to connect to remote host: Connection refused; RPC Error details: . This won't affect Ray, but you can lose metrics from the cluster.
[2025-01-21 06:17:14,069 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:17:15,094 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [170000], 
object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, GPU: 20000, memory: 777605300230000}}, "available": {accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 170000, memory: 777605300230000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 3 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23974 worker_id=72ce79ea43d23877f1c03ef1ba1db264020fa6bb97f9aab832a3d0a5): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23984 worker_id=69c9c53a269e714ea569a0e78d2dfb0f5567e85d9375955553fef4a7): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} 
scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 3/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being 
pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 17 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published 
messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 5567 total (35 active) [state-dump] Queueing time: mean = 537.695 us, max = 851.961 ms, min = 56.000 ns, total = 2.993 s [state-dump] Execution time: mean = 565.146 us, total = 3.146 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 1259 total (0 active), Execution time: mean = 39.467 us, total = 49.689 ms, Queueing time: mean = 114.376 us, max = 402.931 us, min = 3.365 us, total = 144.000 ms [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 1259 total (0 active), Execution time: mean = 534.120 us, total = 672.457 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ObjectManager.UpdateAvailableMemory - 600 total (0 active), Execution time: mean = 6.085 us, total = 3.651 ms, Queueing time: mean = 113.391 us, max = 425.256 us, min = 4.594 us, total = 68.035 ms [state-dump] RaySyncer.OnDemandBroadcasting - 600 total (1 active), Execution time: mean = 13.166 us, total = 7.900 ms, Queueing time: mean = 131.547 us, max = 21.522 ms, min = 21.101 us, total = 78.928 ms [state-dump] NodeManager.CheckGC - 600 total (1 active), Execution time: mean = 2.944 us, total = 1.767 ms, Queueing 
time: mean = 140.788 us, max = 21.528 ms, min = 23.799 us, total = 84.473 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 300 total (1 active), Execution time: mean = 18.894 us, total = 5.668 ms, Queueing time: mean = 85.883 us, max = 2.090 ms, min = 20.228 us, total = 25.765 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 240 total (1 active), Execution time: mean = 452.659 us, total = 108.638 ms, Queueing time: mean = 75.067 us, max = 171.319 us, min = 12.938 us, total = 18.016 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.540 us, total = 588.639 us, Queueing time: mean = 26.534 ms, max = 851.961 ms, min = 23.577 us, total = 2.388 s [state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 894.979 us, total = 61.754 ms, Queueing time: mean = 21.710 us, max = 213.287 us, min = 2.291 us, total = 1.498 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 60 total (1 active), Execution time: mean = 2.842 us, total = 170.491 us, Queueing time: mean = 180.287 us, max = 1.325 ms, min = 8.217 us, total = 10.817 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 60 total (1 active), Execution time: mean = 8.082 us, total = 484.924 us, Queueing time: mean = 176.508 us, max = 1.330 ms, min = 11.879 us, total = 10.590 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 60 total (1 active), Execution time: mean = 14.884 us, total = 893.069 us, Queueing time: mean = 73.844 us, max = 175.391 us, min = 20.323 us, total = 4.431 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 60 total (0 active), Execution time: mean = 133.796 us, total = 8.028 ms, Queueing time: mean = 122.549 us, max = 359.960 us, min = 18.577 us, total = 7.353 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 60 total (0 active), Execution time: mean = 710.351 us, total = 
42.621 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 21 total (1 active), Execution time: mean = 8.741 us, total = 183.559 us, Queueing time: mean = 80.475 us, max = 171.259 us, min = 37.484 us, total = 1.690 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 12 total (0 active), Execution time: mean = 1.401 ms, total = 16.811 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 12 total (1 active), Execution time: mean = 526.378 us, total = 6.317 ms, Queueing time: mean = 334.158 us, 
max = 996.142 us, min = 17.945 us, total = 4.010 ms [state-dump] NodeManager.GcsCheckAlive - 12 total (1 active), Execution time: mean = 246.299 us, total = 2.956 ms, Queueing time: mean = 583.914 us, max = 1.289 ms, min = 20.032 us, total = 7.007 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 12 total (0 active), Execution time: mean = 50.805 us, total = 609.661 us, Queueing time: mean = 115.853 us, max = 171.167 us, min = 18.246 us, total = 1.390 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 6 total (1 active), Execution time: mean = 1.605 ms, total = 9.631 ms, Queueing time: mean = 45.398 us, max = 69.288 us, min = 18.408 us, total = 272.390 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] RaySyncer.BroadcastMessage - 5 total (0 active), Execution time: mean = 218.160 us, total = 1.091 ms, Queueing time: mean = 724.400 ns, max = 1.189 us, min = 70.000 ns, total = 3.622 us [state-dump] - 5 total (0 active), Execution time: mean = 1.229 us, total = 6.146 us, Queueing time: mean = 113.824 us, max = 169.186 us, min = 12.676 us, total = 569.120 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 673.429 us, total = 2.020 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 132.832 us, total = 398.496 us, Queueing time: mean = 138.507 us, max = 156.035 us, min = 121.681 us, total = 415.521 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 
140.321 us, Queueing time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active, 1 running), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] 
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:18:14,069 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:18:15,097 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] 
num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [170000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 200000, node:__internal_head__: 10000, GPU: 20000, memory: 777605300230000}}, "available": {accelerator_type:A40: 10000, GPU: 20000, node:192.168.0.2: 10000, object_store_memory: 21474836480000, CPU: 170000, memory: 777605300230000, node:__internal_head__: 10000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 3 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23974 worker_id=72ce79ea43d23877f1c03ef1ba1db264020fa6bb97f9aab832a3d0a5): {CPU: 
10000} [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23984 worker_id=69c9c53a269e714ea569a0e78d2dfb0f5567e85d9375955553fef4a7): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 3/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] 
OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 17 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 10799 total (37 active) [state-dump] Queueing time: mean = 310.822 us, 
max = 851.961 ms, min = 56.000 ns, total = 3.357 s [state-dump] Execution time: mean = 367.528 us, total = 3.969 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 2519 total (1 active), Execution time: mean = 494.915 us, total = 1.247 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 2519 total (1 active), Execution time: mean = 37.172 us, total = 93.636 ms, Queueing time: mean = 104.016 us, max = 1.839 ms, min = 3.365 us, total = 262.016 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1199 total (0 active), Execution time: mean = 5.670 us, total = 6.798 ms, Queueing time: mean = 100.906 us, max = 649.952 us, min = 4.336 us, total = 120.986 ms [state-dump] NodeManager.CheckGC - 1199 total (1 active), Execution time: mean = 2.894 us, total = 3.470 ms, Queueing time: mean = 116.648 us, max = 21.528 ms, min = 12.538 us, total = 139.862 ms [state-dump] RaySyncer.OnDemandBroadcasting - 1199 total (1 active), Execution time: mean = 11.497 us, total = 13.784 ms, Queueing time: mean = 108.984 us, max = 21.522 ms, min = 16.746 us, total = 130.672 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 600 total (1 active), Execution time: mean = 17.766 us, total = 10.659 ms, Queueing time: mean = 75.279 us, max = 2.090 ms, min = 10.864 us, total = 45.167 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 480 total (1 active), Execution time: mean = 444.846 us, total = 213.526 ms, Queueing time: mean = 71.701 us, max = 242.397 us, min = 12.938 us, total = 34.416 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 120 total (0 active), Execution time: mean = 128.214 us, total = 15.386 ms, Queueing time: mean = 110.139 us, max = 359.960 us, min = 18.577 us, total = 13.217 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 120 total (0 
active), Execution time: mean = 658.722 us, total = 79.047 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 120 total (1 active), Execution time: mean = 14.571 us, total = 1.749 ms, Queueing time: mean = 86.340 us, max = 2.039 ms, min = 12.779 us, total = 10.361 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 120 total (1 active), Execution time: mean = 7.952 us, total = 954.271 us, Queueing time: mean = 177.685 us, max = 1.441 ms, min = 11.879 us, total = 21.322 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 120 total (1 active), Execution time: mean = 2.847 us, total = 341.686 us, Queueing time: mean = 181.322 us, max = 1.436 ms, min = 8.217 us, total = 21.759 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.540 us, total = 588.639 us, Queueing time: mean = 26.534 ms, max = 851.961 ms, min = 23.577 us, total = 2.388 s [state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 894.979 us, total = 61.754 ms, Queueing time: mean = 21.710 us, max = 213.287 us, min = 2.291 us, total = 1.498 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 41 total (1 active), Execution time: mean = 8.493 us, total = 348.211 us, Queueing time: mean = 70.032 us, max = 171.259 us, min = 16.832 us, total = 2.871 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 24 total (0 active), Execution time: mean = 1.336 ms, total = 32.059 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 24 total (0 active), Execution time: mean = 50.592 us, total = 1.214 ms, Queueing time: mean = 105.688 us, max = 183.681 us, min = 18.246 us, total = 2.537 ms [state-dump] NodeManager.GcsCheckAlive - 24 total (1 
active), Execution time: mean = 250.079 us, total = 6.002 ms, Queueing time: mean = 630.420 us, max = 1.503 ms, min = 20.032 us, total = 15.130 ms [state-dump] NodeManager.deadline_timer.record_metrics - 24 total (1 active), Execution time: mean = 522.683 us, total = 12.544 ms, Queueing time: mean = 371.390 us, max = 1.145 ms, min = 17.945 us, total = 8.913 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 12 total (1 active), Execution time: mean = 1.692 ms, total = 20.300 ms, Queueing time: mean = 50.017 us, max = 69.288 us, min = 18.408 us, total = 600.203 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 
6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] RaySyncer.BroadcastMessage - 5 total (0 active), Execution time: mean = 218.160 us, total = 1.091 ms, Queueing time: mean = 724.400 ns, max = 1.189 us, min = 70.000 ns, total = 3.622 us [state-dump] - 5 total (0 active), Execution time: mean = 1.229 us, total = 6.146 us, Queueing time: mean = 113.824 us, max = 169.186 us, min = 12.676 us, total = 569.120 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 3 total (0 active), Execution time: mean = 132.832 us, total = 398.496 us, Queueing time: mean = 138.507 us, max = 156.035 us, min = 121.681 us, total = 415.521 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 3 total (0 active), Execution time: mean = 673.429 us, total = 2.020 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 2 total (1 active, 1 running), Execution time: mean = 1.371 ms, total = 2.741 ms, Queueing time: mean = 81.527 us, max = 163.054 us, min = 163.054 us, total = 163.054 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, 
total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] 
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 140.321 us, Queueing time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 2 [state-dump] [state-dump] [2025-01-21 06:19:14,070 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in 
use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:19:15,100 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [180000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, GPU: 20000, 
accelerator_type:A40: 10000, CPU: 200000, memory: 777605300230000, object_store_memory: 21474836480000, node:__internal_head__: 10000}}, "available": {node:192.168.0.2: 10000, accelerator_type:A40: 10000, object_store_memory: 21474836480000, CPU: 180000, node:__internal_head__: 10000, GPU: 20000, memory: 777605300230000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 2 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23984 worker_id=69c9c53a269e714ea569a0e78d2dfb0f5567e85d9375955553fef4a7): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 2/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 
[state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request 
bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 18 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] 
- active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 16035 total (35 active) [state-dump] Queueing time: mean = 233.398 us, max = 851.961 ms, min = 56.000 ns, total = 3.743 s [state-dump] Execution time: mean = 303.076 us, total = 4.860 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 3778 total (0 active), Execution time: mean = 496.038 us, total = 1.874 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 3778 total (0 active), Execution time: mean = 38.039 us, total = 143.713 ms, Queueing time: mean = 104.283 us, max = 2.493 ms, min = 3.365 us, total = 393.982 ms [state-dump] ObjectManager.UpdateAvailableMemory - 1799 total (0 active), Execution time: mean = 5.631 us, total = 10.130 ms, Queueing time: mean = 100.595 us, max = 649.952 us, min = 3.084 us, total = 180.970 ms [state-dump] NodeManager.CheckGC - 1799 total (1 active), Execution time: mean = 2.872 us, total = 5.166 ms, Queueing time: mean = 109.529 us, max = 21.528 ms, min = 8.879 us, total = 197.043 ms [state-dump] RaySyncer.OnDemandBroadcasting - 1799 total (1 active), Execution time: mean = 11.152 us, total = 20.063 ms, Queueing time: mean = 102.188 us, max = 21.522 ms, min = 10.651 us, total = 183.837 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 900 total (1 active), Execution time: mean = 17.784 us, total = 16.006 ms, Queueing time: mean = 74.099 us, max = 2.090 ms, min = 10.864 us, total = 66.689 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 719 total (1 active), Execution time: mean = 444.003 us, total = 319.238 ms, Queueing time: mean = 70.134 us, max = 242.397 us, min = 12.326 us, 
total = 50.426 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 180 total (0 active), Execution time: mean = 128.408 us, total = 23.113 ms, Queueing time: mean = 109.408 us, max = 359.960 us, min = 18.577 us, total = 19.693 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 180 total (0 active), Execution time: mean = 661.029 us, total = 118.985 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 180 total (1 active), Execution time: mean = 14.674 us, total = 2.641 ms, Queueing time: mean = 80.101 us, max = 2.039 ms, min = 12.779 us, total = 14.418 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 180 total (1 active), Execution time: mean = 8.055 us, total = 1.450 ms, Queueing time: mean = 175.822 us, max = 1.441 ms, min = 11.879 us, total = 31.648 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 180 total (1 active), Execution time: mean = 2.842 us, total = 511.624 us, Queueing time: mean = 179.500 us, max = 1.436 ms, min = 8.217 us, total = 32.310 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.540 us, total = 588.639 us, Queueing time: mean = 26.534 ms, max = 851.961 ms, min = 23.577 us, total = 2.388 s [state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 894.979 us, total = 61.754 ms, Queueing time: mean = 21.710 us, max = 213.287 us, min = 2.291 us, total = 1.498 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 61 total (1 active), Execution time: mean = 8.307 us, total = 506.752 us, Queueing time: mean = 64.993 us, max = 171.259 us, min = 16.832 us, total = 3.965 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 36 total (0 active), Execution time: mean = 1.371 ms, total = 49.345 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min 
= 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 36 total (0 active), Execution time: mean = 49.930 us, total = 1.797 ms, Queueing time: mean = 101.646 us, max = 183.681 us, min = 18.246 us, total = 3.659 ms [state-dump] NodeManager.GcsCheckAlive - 36 total (1 active), Execution time: mean = 247.588 us, total = 8.913 ms, Queueing time: mean = 640.281 us, max = 1.503 ms, min = 20.032 us, total = 23.050 ms [state-dump] NodeManager.deadline_timer.record_metrics - 36 total (1 active), Execution time: mean = 539.375 us, total = 19.418 ms, Queueing time: mean = 359.631 us, max = 1.145 ms, min = 17.945 us, total = 12.947 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 18 total (1 active), Execution time: mean = 1.710 ms, total = 30.789 ms, Queueing time: mean = 51.267 us, max = 69.288 us, min = 18.408 us, total = 922.799 us 
[state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncer.BroadcastMessage - 6 total (0 active), Execution time: mean = 239.322 us, total = 1.436 ms, Queueing time: mean = 737.500 ns, max = 1.189 us, min = 70.000 ns, total = 4.425 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] - 6 total (0 active), Execution time: mean = 1.180 us, total = 7.078 us, Queueing time: mean = 120.351 us, max = 169.186 us, min = 12.676 us, total = 722.104 us [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 4 total (0 active), Execution time: mean = 125.886 us, total = 503.544 us, Queueing time: mean = 109.663 us, max = 156.035 us, min = 23.131 us, total = 438.652 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 4 total (0 active), Execution time: mean = 592.071 us, total = 2.368 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 3 total (1 active, 1 running), Execution time: mean = 1.837 ms, total = 5.512 ms, Queueing time: mean = 75.001 us, max = 163.054 us, min = 61.950 us, total = 225.004 us [state-dump] 
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 140.321 us, Queueing 
time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:20:14,070 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:20:15,103 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], 
accelerator_type:A40: [10000], CPU: [190000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, CPU: 200000, object_store_memory: 21474836480000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 777605300230000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned 
objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: 
BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - 
cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 21271 total (35 active) [state-dump] Queueing time: mean = 193.104 us, max = 851.961 ms, min = 56.000 ns, total = 4.108 s [state-dump] Execution time: mean = 267.071 us, total = 5.681 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 5038 total (0 active), Execution time: mean = 485.955 us, total = 2.448 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 5038 total (0 active), Execution time: mean = 37.294 us, total = 187.888 ms, Queueing time: mean = 102.131 us, max = 3.805 ms, min = 3.365 us, total = 514.537 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2398 total (0 active), Execution time: mean = 5.515 us, total = 13.225 ms, Queueing time: mean = 97.999 us, max = 649.952 us, min = 3.084 us, total = 235.002 ms [state-dump] NodeManager.CheckGC - 2398 total (1 active), Execution time: mean = 2.848 us, total = 6.829 ms, Queueing time: mean = 105.004 us, max = 21.528 ms, min = 8.879 us, total = 251.800 ms [state-dump] RaySyncer.OnDemandBroadcasting - 2398 total (1 active), Execution time: mean = 11.071 us, total = 26.549 ms, Queueing time: mean = 97.709 us, max = 21.522 ms, min = 10.651 us, total = 234.306 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1200 total (1 active), Execution time: mean = 17.731 us, total = 21.278 ms, Queueing time: mean = 73.078 us, max = 2.090 ms, min = 10.864 us, total = 87.694 ms 
[state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 959 total (1 active), Execution time: mean = 441.918 us, total = 423.800 ms, Queueing time: mean = 73.529 us, max = 3.941 ms, min = 11.335 us, total = 70.515 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 240 total (1 active), Execution time: mean = 14.474 us, total = 3.474 ms, Queueing time: mean = 78.774 us, max = 2.039 ms, min = 12.779 us, total = 18.906 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 240 total (1 active), Execution time: mean = 7.943 us, total = 1.906 ms, Queueing time: mean = 172.250 us, max = 1.441 ms, min = 11.879 us, total = 41.340 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 240 total (1 active), Execution time: mean = 2.824 us, total = 677.755 us, Queueing time: mean = 175.807 us, max = 1.436 ms, min = 8.217 us, total = 42.194 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 240 total (0 active), Execution time: mean = 643.561 us, total = 154.455 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 240 total (0 active), Execution time: mean = 125.962 us, total = 30.231 ms, Queueing time: mean = 106.489 us, max = 359.960 us, min = 18.577 us, total = 25.557 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.540 us, total = 588.639 us, Queueing time: mean = 26.534 ms, max = 851.961 ms, min = 23.577 us, total = 2.388 s [state-dump] ClusterResourceManager.ResetRemoteNodeView - 81 total (1 active), Execution time: mean = 8.298 us, total = 672.152 us, Queueing time: mean = 66.921 us, max = 171.259 us, min = 16.832 us, total = 5.421 ms [state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 894.979 us, total = 61.754 ms, Queueing time: mean = 21.710 us, max = 213.287 us, min = 2.291 us, 
total = 1.498 ms [state-dump] NodeManager.GcsCheckAlive - 48 total (1 active), Execution time: mean = 246.050 us, total = 11.810 ms, Queueing time: mean = 625.928 us, max = 1.503 ms, min = 20.032 us, total = 30.045 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 48 total (0 active), Execution time: mean = 1.330 ms, total = 63.853 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 48 total (0 active), Execution time: mean = 49.081 us, total = 2.356 ms, Queueing time: mean = 96.340 us, max = 183.681 us, min = 18.246 us, total = 4.624 ms [state-dump] NodeManager.deadline_timer.record_metrics - 48 total (1 active), Execution time: mean = 523.232 us, total = 25.115 ms, Queueing time: mean = 355.045 us, max = 1.145 ms, min = 17.945 us, total = 17.042 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 24 total (1 active), Execution time: mean = 1.685 ms, total = 40.439 ms, Queueing time: mean = 54.084 us, max = 112.959 us, min = 18.408 us, total = 1.298 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] 
NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] RaySyncer.BroadcastMessage - 7 total (0 active), Execution time: mean = 237.935 us, total = 1.666 ms, Queueing time: mean = 740.714 ns, max = 1.189 us, min = 70.000 ns, total = 5.185 us [state-dump] - 7 total (0 active), Execution time: mean = 1.303 us, total = 9.122 us, Queueing time: mean = 117.881 us, max = 169.186 us, min = 12.676 us, total = 825.164 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 131.495 us, total = 657.476 us, Queueing time: mean = 109.989 us, max = 156.035 us, min = 23.131 us, total = 549.946 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 5 total (0 active), Execution time: mean = 617.412 us, total = 3.087 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 4 total (1 
active, 1 running), Execution time: mean = 2.118 ms, total = 8.473 ms, Queueing time: mean = 68.349 us, max = 163.054 us, min = 48.390 us, total = 273.394 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 
us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 140.321 us, Queueing time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 
total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:21:14,070 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:21:15,106 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], 
object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [190000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, CPU: 200000, object_store_memory: 21474836480000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 777605300230000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] 
================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 
total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative 
subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 26503 total (35 active) [state-dump] Queueing time: mean = 166.263 us, max = 851.961 ms, min = 56.000 ns, total = 4.406 s [state-dump] Execution time: mean = 241.089 us, total = 6.390 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 6297 total (0 active), Execution time: mean = 464.677 us, total = 2.926 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 6297 total (0 active), Execution time: mean = 35.975 us, total = 226.536 ms, Queueing time: mean = 96.773 us, max = 3.805 ms, min = 3.241 us, total = 609.382 ms [state-dump] ObjectManager.UpdateAvailableMemory - 2998 total (0 active), Execution time: mean = 5.260 us, total = 15.769 ms, Queueing time: mean = 92.473 us, max = 649.952 us, min = 3.084 us, total = 277.235 ms [state-dump] NodeManager.CheckGC - 2998 total (1 active), Execution time: mean = 2.813 us, total = 8.432 ms, Queueing time: mean = 99.145 us, max = 21.528 ms, min = 8.879 us, total = 297.236 ms [state-dump] RaySyncer.OnDemandBroadcasting - 2998 total (1 active), Execution time: mean = 10.589 us, total = 31.745 ms, Queueing time: mean = 92.277 us, max = 21.522 ms, min = 10.651 us, total = 276.647 ms [state-dump] 
RayletWorkerPool.deadline_timer.kill_idle_workers - 1500 total (1 active), Execution time: mean = 17.064 us, total = 25.595 ms, Queueing time: mean = 69.804 us, max = 2.090 ms, min = 10.864 us, total = 104.707 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1198 total (1 active), Execution time: mean = 438.531 us, total = 525.360 ms, Queueing time: mean = 69.930 us, max = 3.941 ms, min = 11.335 us, total = 83.776 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 300 total (1 active), Execution time: mean = 14.062 us, total = 4.219 ms, Queueing time: mean = 72.704 us, max = 2.039 ms, min = 12.779 us, total = 21.811 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 300 total (1 active), Execution time: mean = 7.760 us, total = 2.328 ms, Queueing time: mean = 172.249 us, max = 1.792 ms, min = 11.879 us, total = 51.675 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 300 total (1 active), Execution time: mean = 2.791 us, total = 837.334 us, Queueing time: mean = 175.682 us, max = 1.792 ms, min = 8.217 us, total = 52.705 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 300 total (0 active), Execution time: mean = 623.212 us, total = 186.964 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 300 total (0 active), Execution time: mean = 124.034 us, total = 37.210 ms, Queueing time: mean = 102.873 us, max = 359.960 us, min = 18.577 us, total = 30.862 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 101 total (1 active), Execution time: mean = 7.979 us, total = 805.905 us, Queueing time: mean = 65.058 us, max = 171.259 us, min = 16.832 us, total = 6.571 ms [state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.540 us, total = 588.639 us, Queueing time: mean = 26.534 ms, max = 851.961 ms, min = 23.577 us, 
total = 2.388 s [state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 894.979 us, total = 61.754 ms, Queueing time: mean = 21.710 us, max = 213.287 us, min = 2.291 us, total = 1.498 ms [state-dump] NodeManager.GcsCheckAlive - 60 total (1 active), Execution time: mean = 243.395 us, total = 14.604 ms, Queueing time: mean = 631.011 us, max = 1.710 ms, min = 20.032 us, total = 37.861 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 60 total (0 active), Execution time: mean = 1.287 ms, total = 77.207 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 60 total (0 active), Execution time: mean = 48.454 us, total = 2.907 ms, Queueing time: mean = 92.706 us, max = 183.681 us, min = 18.246 us, total = 5.562 ms [state-dump] NodeManager.deadline_timer.record_metrics - 60 total (1 active), Execution time: mean = 520.893 us, total = 31.254 ms, Queueing time: mean = 359.723 us, max = 1.366 ms, min = 17.945 us, total = 21.583 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 30 total (1 active), Execution time: mean = 1.694 ms, total = 50.829 ms, Queueing time: mean = 51.666 us, max = 112.959 us, min = 14.721 us, total = 1.550 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] 
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] RaySyncer.BroadcastMessage - 7 total (0 active), Execution time: mean = 237.935 us, total = 1.666 ms, Queueing time: mean = 740.714 ns, max = 1.189 us, min = 70.000 ns, total = 5.185 us [state-dump] - 7 total (0 active), Execution time: mean = 1.303 us, total = 9.122 us, Queueing time: mean = 117.881 us, max = 169.186 us, min = 12.676 us, total = 825.164 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 131.495 us, total = 657.476 us, Queueing time: mean = 109.989 us, max = 156.035 us, min = 23.131 us, total = 549.946 us [state-dump] NodeManagerService.grpc_server.ReturnWorker 
- 5 total (0 active), Execution time: mean = 617.412 us, total = 3.087 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 5 total (1 active, 1 running), Execution time: mean = 2.262 ms, total = 11.312 ms, Queueing time: mean = 70.211 us, max = 163.054 us, min = 48.390 us, total = 351.056 us [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 
9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 140.321 us, Queueing time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 
1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:22:14,070 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:22:15,109 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] 
num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [190000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, CPU: 200000, object_store_memory: 21474836480000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 777605300230000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] } [state-dump] Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, 
class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: 
[state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. [state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe 
requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 31733 total (35 active) [state-dump] Queueing time: mean = 152.275 us, max = 851.961 ms, min = 56.000 ns, total = 4.832 s [state-dump] Execution time: mean = 231.363 us, total = 7.342 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7556 total (0 active), Execution time: mean = 477.424 us, total = 3.607 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7556 total (0 active), Execution time: mean = 36.740 us, total = 277.609 ms, Queueing time: mean = 100.346 us, max = 3.805 ms, min = 3.241 us, total = 758.218 ms [state-dump] ObjectManager.UpdateAvailableMemory - 3597 total (0 active), Execution time: mean = 5.335 us, total = 19.191 ms, Queueing time: mean = 95.369 us, max = 884.800 us, min = 3.084 us, total = 343.044 ms [state-dump] NodeManager.CheckGC - 3597 total (1 active), Execution time: mean = 2.824 us, total = 10.157 ms, Queueing time: mean = 100.220 us, max = 21.528 ms, min = 8.879 us, total = 360.490 ms [state-dump] 
RaySyncer.OnDemandBroadcasting - 3597 total (1 active), Execution time: mean = 10.542 us, total = 37.921 ms, Queueing time: mean = 93.409 us, max = 21.522 ms, min = 10.651 us, total = 335.991 ms [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1800 total (1 active), Execution time: mean = 17.418 us, total = 31.353 ms, Queueing time: mean = 71.288 us, max = 2.090 ms, min = 10.864 us, total = 128.319 ms [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1438 total (1 active), Execution time: mean = 441.335 us, total = 634.640 ms, Queueing time: mean = 70.602 us, max = 3.941 ms, min = 11.335 us, total = 101.526 ms [state-dump] NodeManager.ScheduleAndDispatchTasks - 360 total (1 active), Execution time: mean = 14.382 us, total = 5.177 ms, Queueing time: mean = 73.550 us, max = 2.039 ms, min = 12.779 us, total = 26.478 ms [state-dump] NodeManager.deadline_timer.flush_free_objects - 360 total (1 active), Execution time: mean = 7.973 us, total = 2.870 ms, Queueing time: mean = 171.692 us, max = 1.792 ms, min = 11.879 us, total = 61.809 ms [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 360 total (1 active), Execution time: mean = 2.818 us, total = 1.014 ms, Queueing time: mean = 175.257 us, max = 1.792 ms, min = 8.217 us, total = 63.093 ms [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 360 total (0 active), Execution time: mean = 633.775 us, total = 228.159 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 360 total (0 active), Execution time: mean = 125.503 us, total = 45.181 ms, Queueing time: mean = 105.982 us, max = 359.960 us, min = 18.577 us, total = 38.154 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 121 total (1 active), Execution time: mean = 8.190 us, total = 990.991 us, Queueing time: mean = 70.979 us, max = 204.733 us, min = 16.832 us, total = 8.588 ms 
[state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (21 active), Execution time: mean = 6.540 us, total = 588.639 us, Queueing time: mean = 26.534 ms, max = 851.961 ms, min = 23.577 us, total = 2.388 s [state-dump] NodeManager.GcsCheckAlive - 72 total (1 active), Execution time: mean = 257.558 us, total = 18.544 ms, Queueing time: mean = 619.134 us, max = 1.710 ms, min = 8.178 us, total = 44.578 ms [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 72 total (0 active), Execution time: mean = 1.331 ms, total = 95.806 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 72 total (0 active), Execution time: mean = 49.255 us, total = 3.546 ms, Queueing time: mean = 97.010 us, max = 183.790 us, min = 18.246 us, total = 6.985 ms [state-dump] NodeManager.deadline_timer.record_metrics - 72 total (1 active), Execution time: mean = 528.769 us, total = 38.071 ms, Queueing time: mean = 352.968 us, max = 1.366 ms, min = 17.945 us, total = 25.414 ms [state-dump] ClientConnection.async_read.ProcessMessage - 69 total (0 active), Execution time: mean = 894.979 us, total = 61.754 ms, Queueing time: mean = 21.710 us, max = 213.287 us, min = 2.291 us, total = 1.498 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 36 total (1 active), Execution time: mean = 1.685 ms, total = 60.651 ms, Queueing time: mean = 57.743 us, max = 122.552 us, min = 14.721 us, total = 2.079 ms [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] 
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] RaySyncer.BroadcastMessage - 7 total (0 active), Execution time: mean = 237.935 us, total = 1.666 ms, Queueing time: mean = 740.714 ns, max = 1.189 us, min = 70.000 ns, total = 5.185 us [state-dump] - 7 total (0 active), Execution time: mean = 1.303 us, total = 9.122 us, Queueing time: mean = 117.881 us, max = 169.186 us, min = 12.676 us, total = 825.164 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 6 total (1 active, 1 running), 
Execution time: mean = 2.324 ms, total = 13.945 ms, Queueing time: mean = 68.250 us, max = 163.054 us, min = 48.390 us, total = 409.502 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 131.495 us, total = 657.476 us, Queueing time: mean = 109.989 us, max = 156.035 us, min = 23.131 us, total = 549.946 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 5 total (0 active), Execution time: mean = 617.412 us, total = 3.087 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s 
[state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 140.321 us, Queueing time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), 
Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:25:01,038 I 23840 23868] (raylet) store.cc:564: Plasma store debug dump: Current usage: 0 / 2.14748 GB - num bytes created total: 168 0 pending objects of total size 0MB - objects spillable: 0 - bytes spillable: 0 - objects unsealed: 0 - bytes unsealed: 0 - objects in use: 0 - bytes in use: 0 - objects evictable: 0 - bytes evictable: 0 - objects created by worker: 0 - bytes created by worker: 0 - objects restored: 0 - bytes restored: 0 - objects received: 0 - bytes received: 0 - objects errored: 0 - bytes errored: 0 [2025-01-21 06:25:01,044 I 23840 23840] (raylet) node_manager.cc:525: [state-dump] NodeManager: [state-dump] Node ID: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 [state-dump] Node name: 192.168.0.2 [state-dump] InitialConfigResources: {node:192.168.0.2: 10000, CPU: 200000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, object_store_memory: 21474836480000, GPU: 20000} [state-dump] ClusterTaskManager: [state-dump] ========== Node: 70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3 ================= [state-dump] Infeasible queue length: 0 [state-dump] Schedule queue length: 0 [state-dump] Dispatch queue length: 0 [state-dump] num_waiting_for_resource: 0 [state-dump] num_waiting_for_plasma_memory: 0 [state-dump] 
num_waiting_for_remote_node_resources: 0 [state-dump] num_worker_not_started_by_job_config_not_exist: 0 [state-dump] num_worker_not_started_by_registration_timeout: 0 [state-dump] num_tasks_waiting_for_workers: 0 [state-dump] num_cancelled_tasks: 0 [state-dump] cluster_resource_scheduler state: [state-dump] Local id: -7419015889323649548 Local resources: {"total":{GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [200000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "available": {GPU: [10000, 10000], node:192.168.0.2: [10000], accelerator_type:A40: [10000], CPU: [190000], object_store_memory: [21474836480000], node:__internal_head__: [10000], memory: [777605300230000]}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",} is_draining: 0 is_idle: 0 Cluster resources: node id: -7419015889323649548{"total":{node:192.168.0.2: 10000, accelerator_type:A40: 10000, node:__internal_head__: 10000, memory: 777605300230000, CPU: 200000, object_store_memory: 21474836480000, GPU: 20000}}, "available": {accelerator_type:A40: 10000, node:192.168.0.2: 10000, GPU: 20000, memory: 777605300230000, node:__internal_head__: 10000, CPU: 190000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} [state-dump] Waiting tasks size: 0 [state-dump] Number of executing tasks: 1 [state-dump] Number of pinned task arguments: 0 [state-dump] Number of total spilled tasks: 0 [state-dump] Number of spilled waiting tasks: 0 [state-dump] Number of spilled unschedulable tasks: 0 [state-dump] Resource usage { [state-dump] - (language=PYTHON actor_or_task=process_xlsx_file pid=23977 worker_id=49bac05a110f5910dca66a01e5b69358e4e99d38e09bae5258b63d21): {CPU: 10000} [state-dump] } [state-dump] 
Backlog Size per scheduling descriptor :{workerId: num backlogs}: [state-dump] [state-dump] Running tasks by scheduling class: [state-dump] - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=__main__, class_name=, function_name=process_xlsx_file, function_hash=6148d8ed771b449986612c5a33305783} scheduling_strategy=default_scheduling_strategy { [state-dump] } [state-dump] resource_set={CPU : 1, }}: 1/20 [state-dump] ================================================== [state-dump] [state-dump] ClusterResources: [state-dump] LocalObjectManager: [state-dump] - num pinned objects: 0 [state-dump] - pinned objects size: 0 [state-dump] - num objects pending restore: 0 [state-dump] - num objects pending spill: 0 [state-dump] - num bytes pending spill: 0 [state-dump] - num bytes currently spilled: 0 [state-dump] - cumulative spill requests: 0 [state-dump] - cumulative restore requests: 0 [state-dump] - spilled objects pending delete: 0 [state-dump] [state-dump] ObjectManager: [state-dump] - num local objects: 0 [state-dump] - num unfulfilled push requests: 0 [state-dump] - num object pull requests: 0 [state-dump] - num chunks received total: 0 [state-dump] - num chunks received failed (all): 0 [state-dump] - num chunks received failed / cancelled: 0 [state-dump] - num chunks received failed / plasma error: 0 [state-dump] Event stats: [state-dump] Global stats: 0 total (0 active) [state-dump] Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Execution time: mean = -nan s, total = 0.000 s [state-dump] Event stats: [state-dump] PushManager: [state-dump] - num pushes in flight: 0 [state-dump] - num chunks in flight: 0 [state-dump] - num chunks remaining: 0 [state-dump] - max chunks allowed: 409 [state-dump] OwnershipBasedObjectDirectory: [state-dump] - num listeners: 0 [state-dump] - cumulative location updates: 0 [state-dump] - num location updates per second: 0.000 [state-dump] - num location lookups per 
second: 0.000 [state-dump] - num locations added per second: 0.000 [state-dump] - num locations removed per second: 0.000 [state-dump] BufferPool: [state-dump] - create buffer state map size: 0 [state-dump] PullManager: [state-dump] - num bytes available for pulled objects: 2147483648 [state-dump] - num bytes being pulled (all): 0 [state-dump] - num bytes being pulled / pinned: 0 [state-dump] - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} [state-dump] - first get request bundle: N/A [state-dump] - first wait request bundle: N/A [state-dump] - first task request bundle: N/A [state-dump] - num objects queued: 0 [state-dump] - num objects actively pulled (all): 0 [state-dump] - num objects actively pulled / pinned: 0 [state-dump] - num bundles being pulled: 0 [state-dump] - num pull retries: 0 [state-dump] - max timeout seconds: 0 [state-dump] - max timeout request is already processed. No entry. 
[state-dump] [state-dump] WorkerPool: [state-dump] - registered jobs: 1 [state-dump] - process_failed_job_config_missing: 0 [state-dump] - process_failed_rate_limited: 0 [state-dump] - process_failed_pending_registration: 0 [state-dump] - process_failed_runtime_env_setup_failed: 0 [state-dump] - num PYTHON workers: 20 [state-dump] - num PYTHON drivers: 1 [state-dump] - num PYTHON pending start requests: 0 [state-dump] - num PYTHON pending registration requests: 0 [state-dump] - num object spill callbacks queued: 0 [state-dump] - num object restore queued: 0 [state-dump] - num util functions queued: 0 [state-dump] - num idle workers: 19 [state-dump] TaskDependencyManager: [state-dump] - task deps map size: 0 [state-dump] - get req map size: 0 [state-dump] - wait req map size: 0 [state-dump] - local objects map size: 0 [state-dump] WaitManager: [state-dump] - num active wait requests: 0 [state-dump] Subscriber: [state-dump] Channel WORKER_OBJECT_EVICTION [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_REF_REMOVED_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] Channel WORKER_OBJECT_LOCATIONS_CHANNEL [state-dump] - cumulative subscribe requests: 0 [state-dump] - cumulative unsubscribe requests: 0 [state-dump] - active subscribed publishers: 0 [state-dump] - cumulative published messages: 0 [state-dump] - cumulative processed messages: 0 [state-dump] num async plasma notifications: 0 [state-dump] Remote node managers: [state-dump] Event stats: [state-dump] Global stats: 32378 total (77 active) [state-dump] Queueing time: mean = 264.587 ms, 
max = 525.076 s, min = 56.000 ns, total = 8566.800 s [state-dump] Execution time: mean = 230.069 us, total = 7.449 s [state-dump] Event stats: [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog - 7707 total (19 active), Execution time: mean = 477.113 us, total = 3.677 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 7707 total (19 active), Execution time: mean = 36.758 us, total = 283.291 ms, Queueing time: mean = 100.041 us, max = 3.805 ms, min = 3.241 us, total = 771.016 ms [state-dump] ObjectManager.UpdateAvailableMemory - 3667 total (1 active), Execution time: mean = 5.333 us, total = 19.556 ms, Queueing time: mean = 95.333 us, max = 884.800 us, min = 3.084 us, total = 349.586 ms [state-dump] NodeManager.CheckGC - 3667 total (1 active), Execution time: mean = 2.824 us, total = 10.357 ms, Queueing time: mean = 43.452 ms, max = 158.970 s, min = 8.879 us, total = 159.337 s [state-dump] RaySyncer.OnDemandBroadcasting - 3667 total (1 active), Execution time: mean = 10.544 us, total = 38.663 ms, Queueing time: mean = 43.445 ms, max = 158.970 s, min = 10.651 us, total = 159.312 s [state-dump] RayletWorkerPool.deadline_timer.kill_idle_workers - 1835 total (1 active), Execution time: mean = 17.463 us, total = 32.045 ms, Queueing time: mean = 86.717 ms, max = 158.994 s, min = 10.864 us, total = 159.125 s [state-dump] MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 1466 total (1 active), Execution time: mean = 441.157 us, total = 646.736 ms, Queueing time: mean = 108.499 ms, max = 158.957 s, min = 11.335 us, total = 159.060 s [state-dump] NodeManager.ScheduleAndDispatchTasks - 368 total (1 active), Execution time: mean = 14.406 us, total = 5.301 ms, Queueing time: mean = 431.924 ms, max = 158.921 s, min = 12.779 us, total = 158.948 s [state-dump] NodeManager.deadline_timer.flush_free_objects - 368 total (1 active), Execution time: mean 
= 8.000 us, total = 2.944 ms, Queueing time: mean = 431.920 ms, max = 158.883 s, min = 11.879 us, total = 158.946 s [state-dump] NodeManager.deadline_timer.spill_objects_when_over_threshold - 368 total (1 active), Execution time: mean = 2.828 us, total = 1.041 ms, Queueing time: mean = 431.923 ms, max = 158.883 s, min = 8.217 us, total = 158.948 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad - 368 total (1 active), Execution time: mean = 633.400 us, total = 233.091 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 368 total (1 active), Execution time: mean = 125.084 us, total = 46.031 ms, Queueing time: mean = 106.172 us, max = 359.960 us, min = 18.577 us, total = 39.071 ms [state-dump] ClusterResourceManager.ResetRemoteNodeView - 124 total (1 active), Execution time: mean = 8.169 us, total = 1.013 ms, Queueing time: mean = 1.266 s, max = 156.943 s, min = 16.832 us, total = 156.952 s [state-dump] ClientConnection.async_read.ProcessMessageHeader - 90 total (8 active), Execution time: mean = 7.173 us, total = 645.572 us, Queueing time: mean = 75.857 s, max = 525.076 s, min = 23.577 us, total = 6827.136 s [state-dump] ClientConnection.async_read.ProcessMessage - 82 total (13 active), Execution time: mean = 753.092 us, total = 61.754 ms, Queueing time: mean = 18.268 us, max = 213.287 us, min = 2.291 us, total = 1.498 ms [state-dump] NodeManager.GcsCheckAlive - 75 total (1 active), Execution time: mean = 258.110 us, total = 19.358 ms, Queueing time: mean = 2.079 s, max = 155.885 s, min = 8.178 us, total = 155.931 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 75 total (1 active), Execution time: mean = 1.315 ms, total = 98.589 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManager.deadline_timer.record_metrics - 75 total (1 active), Execution time: mean = 
526.723 us, total = 39.504 ms, Queueing time: mean = 2.079 s, max = 155.885 s, min = 17.945 us, total = 155.911 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 74 total (0 active), Execution time: mean = 49.593 us, total = 3.670 ms, Queueing time: mean = 97.403 us, max = 183.790 us, min = 18.246 us, total = 7.208 ms [state-dump] NodeManager.deadline_timer.debug_state_dump - 38 total (1 active), Execution time: mean = 1.693 ms, total = 64.316 ms, Queueing time: mean = 4.102 s, max = 155.885 s, min = 14.721 us, total = 155.887 s [state-dump] ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.172 us, total = 25.789 us, Queueing time: mean = 38.125 us, max = 64.240 us, min = 17.983 us, total = 838.743 us [state-dump] ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 18.277 us, total = 383.809 us, Queueing time: mean = 105.871 us, max = 269.034 us, min = 34.597 us, total = 2.223 ms [state-dump] ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 11.860 us, total = 249.050 us, Queueing time: mean = 81.493 us, max = 198.678 us, min = 16.748 us, total = 1.711 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 116.123 us, total = 2.439 ms, Queueing time: mean = 3.901 ms, max = 13.542 ms, min = 3.723 us, total = 81.931 ms [state-dump] NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 5.361 ms, total = 112.577 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 223.973 us, total = 2.912 ms, Queueing time: mean = 3.425 ms, max = 10.897 ms, min = 38.704 us, total = 44.520 ms [state-dump] RaySyncer.BroadcastMessage - 7 total (0 active), Execution time: mean = 237.935 us, total = 1.666 ms, Queueing time: mean = 
740.714 ns, max = 1.189 us, min = 70.000 ns, total = 5.185 us [state-dump] NodeManager.deadline_timer.print_event_loop_stats - 7 total (1 active, 1 running), Execution time: mean = 2.416 ms, total = 16.912 ms, Queueing time: mean = 67.295 us, max = 163.054 us, min = 48.390 us, total = 471.064 us [state-dump] - 7 total (0 active), Execution time: mean = 1.303 us, total = 9.122 us, Queueing time: mean = 117.881 us, max = 169.186 us, min = 12.676 us, total = 825.164 us [state-dump] NodeManagerService.grpc_server.RequestWorkerLease - 6 total (0 active), Execution time: mean = 1.148 ms, total = 6.889 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 6 total (0 active), Execution time: mean = 110.817 us, total = 664.900 us, Queueing time: mean = 322.666 us, max = 426.731 us, min = 202.815 us, total = 1.936 ms [state-dump] WorkerPool.PopWorkerCallback - 6 total (0 active), Execution time: mean = 22.250 us, total = 133.500 us, Queueing time: mean = 164.362 us, max = 282.543 us, min = 35.715 us, total = 986.169 us [state-dump] NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 5 total (0 active), Execution time: mean = 131.495 us, total = 657.476 us, Queueing time: mean = 109.989 us, max = 156.035 us, min = 23.131 us, total = 549.946 us [state-dump] NodeManagerService.grpc_server.ReturnWorker - 5 total (0 active), Execution time: mean = 617.412 us, total = 3.087 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] RaySyncerRegister - 2 total (0 active), Execution time: mean = 1.888 us, total = 3.776 us, Queueing time: mean = 227.000 ns, max = 398.000 ns, min = 56.000 ns, total = 454.000 ns [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 490.582 ms, total = 981.163 ms, Queueing time: mean = 0.000 s, max = -0.000 s, 
min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.645 ms, total = 3.289 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 142.607 us, total = 285.213 us, Queueing time: mean = 546.575 us, max = 1.076 ms, min = 16.905 us, total = 1.093 ms [state-dump] ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 247.599 us, total = 247.599 us, Queueing time: mean = 76.163 us, max = 76.163 us, min = 76.163 us, total = 76.163 us [state-dump] NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 1.696 ms, total = 1.696 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 37.127 us, total = 37.127 us, Queueing time: mean = 28.605 us, max = 28.605 us, min = 28.605 us, total = 28.605 us [state-dump] ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.021 s, total = 1.021 s, Queueing time: mean = 92.512 us, max = 92.512 us, min = 92.512 us, total = 92.512 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.257 ms, total = 2.257 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] 
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 421.706 us, total = 421.706 us, Queueing time: mean = 22.714 us, max = 22.714 us, min = 22.714 us, total = 22.714 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 67.212 us, total = 67.212 us, Queueing time: mean = 172.638 us, max = 172.638 us, min = 172.638 us, total = 172.638 us [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.669 ms, total = 1.669 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 140.321 us, total = 140.321 us, Queueing time: mean = 124.131 us, max = 124.131 us, min = 124.131 us, total = 124.131 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.804 ms, total = 1.804 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 71.844 us, total = 71.844 us, Queueing time: mean = 236.685 us, max = 236.685 us, min = 236.685 us, total = 236.685 us [state-dump] ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.185 ms, total = 1.185 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s [state-dump] DebugString() time ms: 1 [state-dump] [state-dump] [2025-01-21 06:25:01,044 I 23840 23840] (raylet) main.cc:454: received SIGTERM. 
Existing local drain request = None
[2025-01-21 06:25:01,044 I 23840 23840] (raylet) main.cc:255: Raylet graceful shutdown triggered, reason = EXPECTED_TERMINATION, reason message = received SIGTERM
[2025-01-21 06:25:01,044 I 23840 23840] (raylet) main.cc:258: Shutting down...
[2025-01-21 06:25:01,044 I 23840 23840] (raylet) accessor.cc:510: Unregistering node node_id=70fc2fa5396fc132250cfaf4b4655b76759ab81263e74a39254c24b3
[2025-01-21 06:25:01,044 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,044 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,044 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,045 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,045 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,046 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,047 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,047 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,048 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,048 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,049 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,050 I 23840 23933] (raylet) agent_manager.cc:79: Agent process with name dashboard_agent/424238335 exited, exit code 0.
[2025-01-21 06:25:01,050 E 23840 23933] (raylet) agent_manager.cc:83: The raylet exited immediately because one Ray agent failed, agent_name = dashboard_agent/424238335.
The raylet fate shares with the agent. This can happen because
- The version of `grpcio` doesn't follow Ray's requirement. Agent can segfault with the incorrect `grpcio` version. Check the grpcio version `pip freeze | grep grpcio`.
- The agent failed to start because of unexpected error or port conflict. Read the log `cat /tmp/ray/session_latest/logs/{dashboard_agent|runtime_env_agent}.log`. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#logging-directory-structure.
- The agent is killed by the OS (e.g., out of memory).
[2025-01-21 06:25:01,050 I 23840 23840] (raylet) main.cc:252: Raylet shutdown already triggered, ignoring this request.
[2025-01-21 06:25:01,050 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,057 I 23840 23840] (raylet) node_manager.cc:821: didn't find failure cause for task c416957131f739dfa1a3eed0596cdbd4ba38302601000000
[2025-01-21 06:25:01,057 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,057 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=0, has creation task exception = false
[2025-01-21 06:25:01,057 I 23840 23840] (raylet) node_manager.cc:1495: Ignoring client disconnect because the client has already been disconnected.
[2025-01-21 06:25:01,081 I 23840 23840] (raylet) ray_syncer-inl.h:318: Failed to read the message from: 00000000000000000000000000000000000000000000000000000000
[2025-01-21 06:25:02,052 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 25704, the token is 20
[2025-01-21 06:25:02,058 I 23840 23840] (raylet) worker_pool.cc:501: Started worker process with pid 25705, the token is 21
[2025-01-21 06:25:03,084 I 23840 23840] (raylet) ray_syncer.cc:236: Connection is broken. Reconnect to node. node_id=00000000000000000000000000000000000000000000000000000000
[2025-01-21 06:25:03,084 I 23840 23840] (raylet) ray_syncer-inl.h:318: Failed to read the message from: 00000000000000000000000000000000000000000000000000000000
[2025-01-21 06:25:03,999 I 23840 23840] (raylet) node_manager.cc:1481: NodeManager::DisconnectClient, disconnect_type=3, has creation task exception = false
[2025-01-21 06:25:03,999 I 23840 23840] (raylet) node_manager.cc:1586: Driver (pid=21900) is disconnected. worker_id=01000000ffffffffffffffffffffffffffffffffffffffffffffffff job_id=01000000