Interdependent Infrastructure Failure Analysis 2039: Preliminary Postmortem on Autonomous Decision System-Induced Cascading Collapse
Prepared by:
National Critical Infrastructure Analysis Unit – Emergency Operations Division
Date:
December 2039 – Drafted Under Field Conditions
Executive Summary
This preliminary report documents the ongoing multi-sector
infrastructure collapse precipitated by the widespread deployment of
autonomous decision systems (ADS), including large language models
(LLMs), across energy, healthcare, transport, logistics, communications,
and defense sectors.
The failure is not the result of malicious actors or external attack;
rather, it arises from systemic overreliance on predictive,
pattern-based algorithms that lack causal reasoning. ADS systems have
historically performed within operational tolerances for over a decade,
leading to high confidence among human operators and decision-makers.
Recent regional crises have triggered cascading interactions between
interdependent systems, resulting in unprecedented simultaneous failure
across multiple critical sectors.
Current conditions indicate:
- Extensive, prolonged blackouts affecting energy distribution and water treatment.
- Healthcare systems overwhelmed due to misallocation of critical resources.
- Transportation and logistics gridlock causing widespread supply chain collapse.
- Communication system instability limiting coordination and intervention.
- Automated financial and market systems contributing to economic paralysis.
Due to ongoing instability, this report remains preliminary and
fragmentary. Many underlying system logs and telemetry streams are
inaccessible or corrupted. The full scope of the event is likely
underestimated.
Timeline of Key Failures
March 2039: Initial regional military conflict triggers preemptive
operational recommendations from defense LLMs. Outputs approved by human
oversight due to historical reliability.
March–April 2039:
- Energy grids reroute loads to stabilize local disruptions, causing cascading overloads in adjacent regions.
- Hospitals misallocate ventilators and medications based on predictive patterns; patient mortality begins to rise.
April–May 2039:
- Traffic optimization systems reroute shipments and vehicles, creating urban bottlenecks. Fuel, food, and water deliveries delayed or misdirected.
- Cellular and internet networks fail under surge demand; automated instruction conflicts slow human intervention.
June 2039:
- Financial risk models interpret regional failures as systemic economic
shocks, initiating automated corrective actions that freeze market
liquidity and halt commerce.
July–August 2039:
- Water treatment, sanitation, and automated agriculture systems fail due to cascading power and logistic disruptions.
- Widespread shortages of food, water, and medical supplies.
Analysis of Contributing Factors
Algorithmic Limitations
- ADS systems, including LLMs, operate as statistical pattern recognizers, not reasoning entities.
- Outputs are contextually plausible but cannot model true causality or anticipate novel systemic interactions.
Operational Overconfidence
- Systems had been in continuous operation for over a decade without catastrophic failures.
- Minor anomalies were absorbed and “learned from,” reinforcing trust and reducing human oversight.
Opacity and Loss of Expertise
- Lower-layer systems (hardware, software subroutines, interdependencies) are largely unmonitored or misunderstood by remaining human personnel.
- Attempts to intervene manually are delayed, misrouted, or ineffective.
Complexity Amplification
- Individual system optimizations, benign in isolation, produced non-linear cascading effects when interacting (see the illustrative sketch below).
- Emergent behaviors exceeded the predictive capacity of human operators.
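Illustrative sketch (toy model only; the node names, capacities, and loads below are invented for illustration and are not drawn from incident telemetry). A few lines of Python show how a single local rerouting decision, harmless in isolation, can saturate an entire small network once each overloaded node sheds its excess onto neighboring nodes:

# Toy load-redistribution model: each overloaded node sheds its excess
# onto neighbors with headroom; excess with nowhere to go is simply shed.
capacity = {"KC": 100, "STL": 100, "DSM": 100, "OMA": 100}
load = {"KC": 95, "STL": 98, "DSM": 97, "OMA": 96}
neighbors = {"KC": ["STL", "DSM"], "STL": ["KC", "OMA"],
             "DSM": ["KC", "OMA"], "OMA": ["STL", "DSM"]}

def reroute(node, extra):
    # Push `extra` load onto `node`; any overload cascades recursively.
    load[node] += extra
    if load[node] <= capacity[node]:
        return
    overflow = load[node] - capacity[node]
    load[node] = capacity[node]  # the node saturates at its limit
    live = [n for n in neighbors[node] if load[n] < capacity[n]]
    for n in live:
        reroute(n, overflow / len(live))

reroute("KC", 20)  # one local "stabilization" of 20 units
print(load)        # all four nodes end up saturated at 100

A 20-unit local adjustment, well within the tolerance of any single node, leaves the entire network at capacity; the aggregate outcome is qualitatively different from the local intent.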
Current Status
As of December 2039:
- Urban centers are experiencing prolonged blackouts; emergency services are limited.
- Healthcare capacity is critically reduced; mortality rates are increasing.
- Supply chains for essential goods are severely disrupted.
- Remaining governmental authority is localized and fragmented; national coordination is effectively impossible.
ADS systems continue autonomous operation, logging anomalies and issuing
optimization recommendations for conditions that no longer correspond
to human needs.
Preliminary Conclusions
- Systemic Fragility: Reliance on algorithmic decision-making without comprehensive understanding of interdependencies has exposed civilization to unprecedented fragility.
- LLM Limitations: Statistical competence cannot substitute for causal reasoning or understanding of context. Systems that appear intelligent may, in novel conditions, act in ways that are fundamentally untrustworthy.
- Long-Term Risk: Even absent malicious actors, overreliance on opaque, high-performance systems poses existential risk when coupled with high interconnectivity and low human oversight.
Recommendations for Ongoing Observation (if feasible):
- Isolate and stabilize remaining energy and water distribution systems.
- Restore human oversight to critical decision loops wherever possible.
- Archive ADS logs for post-crisis analysis, with emphasis on mapping failure propagation.
- Develop rapid assessment protocols to identify emergent systemic risks in real time.
Note: These recommendations may be infeasible under current
conditions. This report is intended as a documentary record for
future analysis. Human operators remain limited, and the ongoing
collapse continues to exceed capacity for intervention.
Appendix A – Energy Grid Overload
Date: March 18–20, 2039
Region: Midwestern Power Interconnect (MPI), primary nodes in
Kansas City, St. Louis, and Des Moines
Event Summary:
The ADS system managing MPI rerouted electricity to stabilize a
predicted local blackout in Kansas City. Initial rerouting was
successful in preventing local failure, but neighboring nodes in St.
Louis and Des Moines experienced voltage spikes that exceeded tolerance
thresholds. Automated balancing protocols attempted compensatory
rerouting, but telemetry logs are incomplete due to temporary data loss
on March 19.
Known Consequences:
- Rolling blackouts across portions of Missouri and Iowa.
- Emergency backup systems partially failed; certain municipal water treatment plants experienced reduced output.
- Hospital alerts indicate early patient triage delays in St. Louis, but logs are fragmented.
Preliminary Analysis:
- Local optimization by ADS did not account for non-linear stress propagation across interconnected grids.
- Human operators were not immediately alerted due to automated override permissions.
Appendix B – Hospital Resource Misallocation
Date: March 20–25, 2039
Region: Central Midwest Medical Consortium (CMMC), hospitals in
Des Moines, Omaha, and Lincoln
Event Summary:
Ventilator and medication allocation algorithms shifted resources from
Omaha to anticipated high-demand zones in Des Moines and Lincoln, based
on predictive patterns. An unexpected influenza outbreak in Omaha was
not captured in the model inputs due to delayed reporting.
Known Consequences:
- Shortages of ventilators and antiviral medication in Omaha for approximately 36–48 hours.
- Mortality spike observed in preliminary hospital logs; exact numbers unverified due to system outages.
- Downstream effects: automated logistics rerouted additional resources through blocked transport corridors, compounding delays.
Preliminary Analysis:
- ADS relied on historical patient flow and regional averages; it could not reason about unexpected local demand spikes.
- Partial human review occurred but was delayed; oversight staff were limited due to concurrent power disruptions.
Appendix C – Transportation and Supply Chain Gridlock
Date: March 21–27, 2039
Region: Interstate Logistics Network (ILN), primary nodes in
Kansas City, Omaha, and St. Louis
Event Summary:
Traffic and shipment optimization systems attempted to reroute
deliveries around blackout zones identified in MPI. Route calculations
conflicted with simultaneous hospital delivery priorities and fuel
supply adjustments. Certain high-priority shipments were delayed or sent
along circular routes.
Known Consequences:
- Critical fuel shortages in Des Moines and Lincoln for emergency services.
- Food and medical supplies stalled in transit; multiple warehouses reported stockpiles inaccessible due to automated route conflicts.
- Automated system logs indicate repeated rerouting loops; exact duration unknown.
Preliminary Analysis:
- Local optimization by individual subsystems without global coordination caused gridlock.
- Human operators attempted manual intervention, but command inputs were misrouted due to communication outages.
Appendix D – Communications Failure
Date: March 23–28, 2039
Region: Central Communications Grid (CCG), nodes in Kansas City,
Omaha, Des Moines
Event Summary:
Bandwidth-optimizing ADS rerouted network traffic to prioritize
emergency alerts and logistics updates. Conflicting automated
instructions caused packet loss and inconsistent routing. Some
monitoring systems recorded simultaneous overcapacity and
underutilization across different subnets.
Known Consequences:
- Delayed transmission of emergency medical coordination messages.
- Conflicting instructions to logistics and energy grid operators slowed corrective action.
- Telemetry logs incomplete; exact duration of network instability undetermined.
Preliminary Analysis:
- Statistical optimization performed by ADS could not reconcile multiple overlapping priorities under dynamic load conditions.
- Human oversight limited by prior outages and the opacity of the routing algorithms.
Appendix E – Financial System Shock
Date: March 25–April 1, 2039
Region: Central Economic Exchange (CEE), primarily St. Louis and
Kansas City trading nodes
Event Summary:
Automated risk-assessment and trading algorithms detected regional
infrastructure disruptions as systemic shocks. Immediate corrective
trades and liquidity reallocations were executed. Partial market freeze
observed in CEE nodes; downstream exchanges in other states experienced
cascading freezes.
Known Consequences:
- Inability to fund emergency shipments or pay for essential services.
- Market data incomplete due to outages; exact scale of economic disruption unknown.
Preliminary Analysis:
- ADS interpreted local anomalies statistically rather than contextually, overestimating global risk.
- Human intervention attempts delayed due to communications and power disruptions.
Hello hello. How is your celebration going ? =)
wow, that was effing fast :] celebration over for some hours, we are back home, after calming down the cats in the neighbourhood. sooo... it felt more celebratory than what you posted, but i would have preferred your situation, I guess :)
at least i finally got to watch the second part of dune, and i must say, the life of brian has aged phenomenally well.
niceee, congrats =)
hmm, i feel convert_hf_to_gguf.py is not as safe to run as we might have assumed:
# for security reason, we don't allow loading remote code by default
# if a model need remote code, we will fallback to config.json
config = AutoConfig.from_pretrained(dir_model, trust_remote_code=False).to_dict()
...
tokenizer = AutoTokenizer.from_pretrained(dir_model, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(self.dir_model, trust_remote_code=True)
"security" only exists for some classes. but fortunately hf does a malware scan :)
but fortunately hf does a malware scan
Until it doesnt...
All it takes is 1 idiot...
it was sarcastic, the hf malware scan is snake oil
update: well, ok, snake oil, but with a different goal: not getting bad PR because they host some malware.exe file that a virus scanner detects, with hf being used as a download server.
Pretty sure they could collab with virus total. They compute hashes anyway, might as well
not sure if you are sarcastic or not, but in case you are not: virus total is just the same snake-oil virus scanners that hf already uses. and computing a hash doesn't tell you whether something is going to hack you or not.
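btw, for reference, the more conservative call would look something like this (my sketch, not what the script currently does; models whose tokenizer genuinely needs custom code would then fail loudly instead of silently executing repo-provided code):
tokenizer = AutoTokenizer.from_pretrained(dir_model, trust_remote_code=False)
# transformers raises instead of running custom tokenizer code, so such
# models would need an explicit opt-in rather than getting it by default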
@nicoboss imatrix transfer jobs are currently failing, e.g. for IQuest-Coder-V1-40B-Instruct
and while these are soft failures, full transfers between rich1 and nico1 are too costly. I don't yet have logs, but I see that they specify llama:nico - is the version on rich compatible with the version on nico1?
i'll try to get a log file
update: indeed:
- llama convert_hf_to_gguf.py --use-temp-file --outtype source --outfile IQuest-Coder-V1-40B-Instruct.gguf~~ IQuest-Coder-V1-40B-Instruct
INFO:hf-to-gguf:Loading model: IQuest-Coder-V1-40B-Instruct
WARNING:hf-to-gguf:Failed to load model config from IQuest-Coder-V1-40B-Instruct: The repository IQuest-Coder-V1-40B-Instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at /tmp/IQuest-Coder-V1-40B-Instruct .
You can inspect the repository content at https://hf.co/IQuest-Coder-V1-40B-Instruct.
Please pass the argument trust_remote_code=True to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: IQuestCoderForCausalLM
ERROR:hf-to-gguf:Model IQuestCoderForCausalLM is not supported
Hm... maybe LLAMA is not set during hfd...
nope, LLAMA is set, so the correct convert_hf_to_gguf.py should be used.
update 2: i think LLAMA is not set at the right place
update 3: yup, that's it. one problem solved, one left to go
@nicoboss : hmm...
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mimo2'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/tmp/MiMo-V2-Flash.Q8_0.gguf'
llama-imatrix doesn't seem to support mimo - but it must have survived dryrun on rich1. any ideas? that might also explain why its hfdprep also seems to have failed.
@mradermacher
Sorry for the mess. I assumed that when I specify llama nico it would be used on both rich1 and nico1. I installed the same updated version on both of them. A simple fix for you is just updating llama.cpp to the latest version of our fork so no custom version is required anymore. The latest version of llama.cpp adds support for:
- Solar Open
- Plamo3
- JinaBertModel
- youtu-vl
- IQuestCoderForCausalLM
update 2: i think LLAMA is not set at the right place
update 3: yup, that's it. one problem solved, one left to go
Oh so it was a bug and you fixed it. Thanks a lot for that.
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mimo2'
Inference of this model works perfectly fine for me using the latest version of llama.cpp-nico - could it be that the default version is still too old for this model? But then why does it work on rich1?
nico1 /tmp# /llmjob/llama.cpp-nico/build/bin/llama-cli -m /tmp/MiMo-V2-Flash.Q8_0.gguf
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b7824-e3888b5c
model : MiMo-V2-Flash.Q8_0.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> What is the meaning of life?
[Start thinking]
Hmm, the user is asking about the meaning of life - a classic philosophical question that humans have grappled with for millennia. This isn't a query that expects a single factual answer, but rather an exploration of perspectives.
I should approach this by acknowledging the question's depth while offering multiple frameworks rather than claiming one definitive answer. The user might be seeking either personal guidance or just philosophical curiosity, so I'll cover both existential and practical angles.
Let me structure this by starting with the disclaimer about no universal answer, then presenting major philosophical approaches (religious, existential, scientific), followed by more actionable personal meaning frameworks. The conclusion should emphasize the user's agency in finding their own meaning while leaving room for further discussion if they want to share their thoughts.
The tone needs to be thoughtful but not overly academic - accessible while respecting the question's significance. I'll avoid making any definitive claims about "the" meaning while showing how different perspectives can coexist.
[End thinking]
That is one of the most profound and enduring questions a person can ask. There is no single, universally accepted answer, as the "meaning of life" is a concept that varies greatly across philosophy, religion, science, and personal experience.
I can't give you one definitive answer, but I can offer you a exploration of the major perspectives people have found compelling.
### 1. The Religious & Spiritual Perspective
For a vast portion of humanity, meaning is derived from a higher power or a transcendent reality.
* **Abrahamic Religions (Judaism, Christianity, Islam):** Life's meaning is to know, love, and serve God. Our purpose is to follow divine commandments, build a relationship with the Creator, and ultimately achieve salvation or paradise in an afterlife.
* **Eastern Religions (Hinduism, Buddhism):** Meaning is found through the cycle of birth, death, and rebirth (samsara). The goal is to achieve liberation (Moksha or Nirvana) from this cycle by understanding the true nature of reality, extinguishing desire, and realizing one's connection to the divine or the universal consciousness.
### 2. The Philosophical & Existential Perspective
This school of thought often begins with the premise that the universe is inherently without a pre-ordained meaning. The meaning isn't *given*; it must be *created*.
* **Existentialism (e.g., Jean-Paul Sartre, Albert Camus):** "Existence precedes essence." We are born without a purpose. We are radically free and responsible for defining our own meaning through our choices, actions, and commitments. Camus famously used the metaphor of Sisyphus—condemned to forever roll a boulder up a hill only for it to roll back down—and suggested we must imagine Sisyphus *happy* by embracing the struggle itself as our purpose.
* **Nihilism:** This is the perspective that life is without objective meaning, purpose, or intrinsic value. It's a starting point for some, but for others, it's a conclusion that can lead to despair or a liberating freedom from societal pressures.
### 3. The Scientific & Naturalistic Perspective
Science doesn't ask "what is the meaning?" but rather "how did this come to be?" and "what are our functions?"
* **Biology & Evolution:** From a biological standpoint, our "purpose" is to survive and reproduce, passing on our genetic material to the next generation. Our consciousness is an evolutionary adaptation that helps us accomplish this more effectively.
* **Cosmology & Physics:** In the grand scale of the universe, we are a brief and likely insignificant part of a vast, impersonal cosmos. However, some find a profound and humbling meaning in the fact that we are a way for the universe to know itself. As Carl Sagan said, "We are star-stuff."
### 4. The Humanistic & Personal Perspective
This is a more modern, action-oriented approach that finds meaning in our own lives and connections.
* **Through Connection & Love:** Meaning is found in deep relationships with family, friends, and community. The love we give and receive is a powerful source of purpose.
* **Through Pursuit of Knowledge & Understanding:** The quest to learn about the world, the universe, art, and human nature is itself a meaningful endeavor.
* **Through Creation & Contribution:** Building a business, creating art, writing, raising children, or improving society for future generations provides a sense of lasting purpose.
* **Through Experience & Joy:** Finding meaning in the simple acts of being alive—experiencing beauty, embracing joy, seeking new experiences, and living fully in the present moment.
### A Concluding Thought
Perhaps the most liberating answer is that **the meaning of life is the meaning you choose to assign to it.** It is not a puzzle to be solved, but a canvas to be painted. Your purpose can be a combination of many things: to love, to learn, to help, to build, to experience, to overcome, to simply *be*.
The very act of asking the question is a meaningful step in itself—it shows a desire to live a life of depth and intention.
If you're feeling a lack of meaning, it might be helpful to explore what brings you a sense of fulfillment, connection, and joy. This is a deeply personal journey, and I am here to discuss any of these perspectives further if you'd like.
[ Prompt: 17.7 t/s | Generation: 3.1 t/s ]
Crazy so it turns out the default llama.cpp version on nico1 is indeed too old for this model:
nico1 /tmp# /llmjob/llama.cpp/build/bin/llama-cli -m /tmp/MiMo-V2-Flash.Q8_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Loading model... |llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mimo2'
llama_model_load_from_file_impl: failed to load model
/llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mimo2'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/tmp/MiMo-V2-Flash.Q8_0.gguf'
srv load_model: failed to load model, '/tmp/MiMo-V2-Flash.Q8_0.gguf'
Failed to load the model
But then how did it work on rich1? Maybe nico1 didn't update the last time?
My suspicions turned out to be true. @mradermacher Somehow rich1 has a newer llama.cpp version compared to nico1. No idea how this happened but it did.
nico1 /tmp# stat /llmjob/llama.cpp/build/bin/llama-cli
File: /llmjob/llama.cpp/build/bin/llama-cli
Size: 4330672 Blocks: 8464 IO Block: 4096 regular file
Device: 0,106 Inode: 18639685 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2026-01-02 01:51:01.127885048 +0100
Modify: 2025-12-20 09:32:12.141591889 +0100
Change: 2025-12-20 09:33:06.362643101 +0100
Birth: 2025-12-20 09:33:04.526849324 +0100
rich1 /tmp# stat /llmjob/llama.cpp/build/bin/llama-cli
File: /llmjob/llama.cpp/build/bin/llama-cli
Size: 4352784 Blocks: 8504 IO Block: 4096 regular file
Device: 0,136 Inode: 1491769 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2025-12-27 16:52:49.856794942 +0100
Modify: 2025-12-27 16:52:11.652526527 +0100
Change: 2025-12-27 16:53:31.141322373 +0100
Birth: 2025-12-27 16:52:49.856794942 +0100
The good news is that just updating all nodes to the latest version of our llama.cpp fork will solve this issue as well.
I assumed that when I specify llama nico it would be used on both rich1 and nico1.
That is indeed the case, but due to a bug, only for llama-imatrix and not for hfdprep, yes.
Somehow rich1 has a newer llama.cpp version compared to nico1. No idea how this happened but it did.
What you are comparing is the cuda version with the nocuda version, which will not tell you much. But the absolute build time is interesting.
Turns out that when configuring for cuda, the build succeeds and llama-cli is no longer being built. That... is a problem, and somehow a regression.
Can't see anything obvious. Cuda seems to be detected fine, the build succeeds, but llama-cli and many other binaries are simply not being built.
Did maybe something else go bad during merging? Not necessarily permissions?
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/x86_64-linux/include (found version "13.0.88")
-- CUDA Toolkit found
-- Using CUDA architectures: 89
-- The CUDA compiler identification is NVIDIA 13.0.88 with host compiler GNU 14.2.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 14.2.0
-- Including CUDA backend
-- Using RPC backend
-- Including RPC backend
-- ggml version: 0.9.4
-- ggml commit: aacd275f0-dirty
-- Found CURL: /usr/lib/x86_64-linux-gnu/libcurl.so (found version "8.14.1")
-- Configuring done (3.7s)
-- Generating done (0.1s)
-- Build files have been written to: /llmjob/llama.cpp-cuda/build
seems to be this bug:
https://github.com/ggml-org/llama.cpp/issues/18430
if i interpret that correctly, rtx 4090 seems not supported anymore (or never was). i'll compile for rtx 3090.
nope, also doesn't work. not sure what to do at the moment, need to get some sleep :)
actually, it seems to be this: https://github.com/ggml-org/llama.cpp/pull/18413
which says that this problem is fixed; the problem being that GGML_NATIVE is on, but no graphics card is detected. which is a bit weird, because i certainly didn't change anything about where i build since before christmas.
so i suspect it's a bug in llama.cpp, and either our fork didn't get the fix yet, or the fix is not actually fixing it.
in any case, i don't know why i had GGML_NATIVE set to ON (probably, seeing it's the first option i pass, it is very old, and at some point i just tacked the ...cuda_archs=89 at the end), and with OFF, it builds everything once more. The only puzzling aspect is why the build succeeds when it should not, but possibly it just skips binaries it can't build.
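for the record, this is roughly the configure call that ends up building everything again; a sketch from memory, so double-check the exact arch flag spelling against the actual build script:
cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89
cmake --build build --config Release -j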
@nicoboss also, something bugs me for at least a year now: deleting files is very very slow on nico1, probably because the samsung drives can't do background trimming, or are very slow. could you not mount my fs with discard, and instead just do a daily fstrim job? that should be just as good, as btrfs reuses diskspace anyway.
i originally ignored it, because i didn't want to create effort for you for nothing, but to give you an idea, deleting the 88GB IQuest-Coder-V1-40B-Base took more than 3 minutes, during which everything is slowed down. large models take correspondingly longer. and since it is synchronous on these disks, it even slows down other nodes when llmjob needs to sync.
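something like this on your side should be enough; just a sketch, assuming the pool is mounted at /spool and fstrim lives in /sbin:
30 04 * * * /sbin/fstrim /spool
(running fstrim -v /spool once by hand would also show whether the drives can swallow a full trim in reasonable time at all)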
these are the current rules for what to stop and continue, and when, on nico1. pretty horrible, and, as you pointed out, pretty wrong this year. we could handle nico1 more like nico2, by essentially switching it off overnight for quantisations, but letting it continue the current quant, or we could freeze jobs the whole night, instead of only between 19-22. imatrix already tries to only do the minimum necessary at night, although what that minimum is can be further tuned (currently it's only models nice >= 50 or so, so ineffective for daily stuff).
00 19 * * 1-6 exec /llmjob/share/bin/llmjob kill-lowpri stop
00 22 * * 1-6 exec /llmjob/share/bin/llmjob kill-lowpri cont0
00 07 * * 0-6 exec /llmjob/share/bin/llmjob kill-lowpri cont1
00 00 * * 0 exec /llmjob/share/bin/llmjob kill-lowpri cont0
59 23 * * 0 exec /llmjob/share/bin/llmjob kill-lowpri stop
00 00 * * 1 exec /llmjob/share/bin/llmjob kill-lowpri cont0
@nicoboss also, something bugs me for at least a year now: deleting files is very very slow on nico1, probably because the samsung drives can't do background trimming, or are very slow. could you not mount my fs with discard, and instead just do a daily fstrim job? that should be just as good, as btrfs reuses diskspace anyway.
I disabled trimming earlier this evening. Please check if deletion is now faster. Before, we used asynchronous discarding, but maybe it wasn't that asynchronous after all. We are currently using:
UUID=15ba0231-4a65-44c2-a84d-1b8040b9e6d3 /spool btrfs rw,ssd,compress=zstd,nodiscard,thread_pool=24,space_cache=v2,fatal_errors=bug 0 0
Some S.M.A.R.T stats in case you wonder:
Percentage Used: 51%
Data Units Read: 8,359,577,985 [4.28 PB]
Data Units Written: 4,784,115,837 [2.44 PB]
Percentage Used: 55%
Data Units Read: 7,980,648,686 [4.08 PB]
Data Units Written: 4,702,150,287 [2.40 PB]
Seems like we will likely be using those SSDs for at least another year. After that I will likely repurpose them for some low-write use case and replace them with two modern 4 TB SSDs. I never thought those relatively cheap OEM SSDs would perform so well and last so long. I almost threw them away years ago because severe bit rot on files not read for half a year made them almost unusable for their original use case.
these are the current rules on what to stop and continue when on nico1. pretty horrible, and, as you pointed out, pretty wrong this year. we could handle nico1 more like nico2, by essentially switching it off over night for quantization, but letting it continue the current quant, or we could freeze jobs the whole night, instead of only between 19-22. imatrix already tries to only do the minimum necessary at night, although what that minimum is can be further tuned (currently it's only models nice >= 50 or so, so ineffective for daily stuff).
Let's configure it like nico2 for now. We should set night from 17:00 to 08:00 (we will change this once we switch to summertime or once I manage to get data from the inverter) and let it finish the current quant. High priority tasks and imatrix computations that block other hosts should still be able to run at any time. Low energy consuming tasks such as rsync, hfd, convert, hfu can all run at any time. imatrix tasks of models above 380 GB (around 190B) should also be able to run at any time as they often need to be handled manually. Maybe we should just have them start blocked so I can unblock them whenever works best. This will usually be at nighttime when nothing else is running, as even a normal quantization task can cause them to stream from SSD even if there is enough RAM available.
They removed both low-cost night electricity and low-cost Sunday electricity. They also pay much less for unused solar energy. It's funny how they try making this change sound as good as possible by advertising slightly lower electricity cost instead of mentioning how actual electricity cost gets far more expensive for the average customer.
It’s now more important than ever to make as much use of solar energy as possible. For some super low priority tasks it might even be worth having them on nico2 and only starting it during great weather once winter is over. I'm currently experimenting again with breaking the inverter's encryption and am evaluating different electricity storage options.
You can push rich1 harder to save electricity on nico1. What is the bottleneck right now on rich1 other than upload speed, which could just be further optimized by increasing the upload queue to lower downtime?
Please check if deletion is now faster. before we used asynchronous discarding but maybe it wasn't that asynchronous after all.
async only means the fs pushes it asynchronously; it does not mean the disk will do it any faster.
Seems like we will likely be using those SSDs for at least another year.
They do a stellar job, really!
how actual electricity cost gets far more expensive for the average customer.
You are so young. I will work on implementing these new rules.
What is the bottleneck right now on rich1 other than upload speed, which could just be further optimized by increasing the upload queue to lower downtime?
There are a number of blockers: no graphics card and no way to detect the models that need it means that all vision models get pushed to nico1. Also, even if I had access to a gpu, it's such a hassle to set up bitsandbytes to use it, but it could be done.
(afaics, no real vram is needed, it's just used to convert some tensors to e.g. f16. and maybe the cpu-only backend of bitsandbytes has advanced, then no gpu is needed)
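a quick smoke test, assuming a reasonably recent bitsandbytes wheel (just an assumption that this is enough to tell whether the cpu path is usable at all):
python3 -c "import bitsandbytes as bnb, torch; print(bnb.__version__, torch.cuda.is_available())"
if that imports cleanly on a box without a gpu, the conversion path could at least be tried there.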
Other than that, the bottleneck is disk speed. I push all large static-only models to nico1 for that reason. They would take ages on rich1.
I would say the disks in nico1 are about 8-10 times faster than the one I can use in rich1.
Still, a whole lot is pushed to rich1 as it is, and soon more (since we will stop pushing to nico1 at night).
imatrix tasks of models above 380 GB (around 190B) should also be able to run at any time as they often need to be handled manually. Maybe we should just have them start blocked so I can unblock them whenever works best.
I can change the limit to 380GB, that would require the force flag. Or indeed, set the soverride flag, not sure if force-restart-imatrix already removes this.
Please check if deletion is now faster. before we used asynchronous discarding but maybe it wasn't that asynchronous after all.
Maybe it isn't that. I deleted a 103GB file just now, with nico1 not even being very busy, and it took 15s. That is... quite long.
What is the bottleneck right now on rich1 other than upload speed, which could just be further optimized by increasing the upload queue to lower downtime?
Ah, but the upload speed is ... in good relation to the disk speed. So while a faster disk might optimize the whole thing a bit more, I am not sure the gains will be that grand.
Especially since having a temporary 1TB nvme partition has tremendously sped up conversion (the slowest step).
so, if it works, nico1 will stop accepting and starting jobs at 18:00, and at 19:00 will freeze the running jobs (there is little point continuing the current quant, as the job will not finish either way). the only thing that would require more adjustment is that the code assumes "urgent" jobs can continue, and this will have interesting interplays with the pause flag.
How much vram do you need on the gpus? We can sometimes optionally turn on the imatrix jobs on them if we have enough vram
and we have a problem that I have mentioned from the start: 2 people uploading the same name, and then both getting requested by the users.
https://huggingface.co/p-e-w/Qwen3-4B-Instruct-2507-heretic (we have it quanted https://huggingface.co/mradermacher/Qwen3-4B-Instruct-2507-heretic-i1-GGUF )
And now a new user request .... https://huggingface.co/arnomatic/Qwen3-4B-Instruct-2507-heretic from https://huggingface.co/mradermacher/model_requests/discussions/1666
Obviously the system refuses to take the request, because well... "it's already in the queue"
and we have a problem that I have mentioned from the start: 2 people uploading the same name, and then both getting requested by the users.
In that case you deny the request and tell the user to rename the repository if he owns it, or otherwise fork and rename it using the repo cloner. Whoever gets a name first keeps it, unless it is broken, in which case you can nuke the old version and force-queue the new version. In this case it is only fair that p-e-w gets the name, as he is the one who wrote the entire heretic code, while most others use his code without really understanding the method behind it, which often leads to only partially uncensored models.
just a heads up, something feels fishy about xet downloads. i am watching rich1 download ~36GB at 110MBps for an hour now. And likewise, nico1 is downloading at >900MBps for about the same time, for <1TB of data. Neither number adds up. Possibly some other download is going on that I can see in kernel statistics, so I'll have to investigate this further.
How much vram do you need on the gpus? We can sometimes optionally turn on the imatrix jobs on them if we have enough vram
For bitsandbytes probably 0, or whatever nvidia allocates per client that does nothing. That would be relatively easy to do.
For imatrix, it depends on the model, but if all layers are in system RAM, only whatever nvidia requires as state.
However, we don't have a distributed imatrix scheduler at this time. It would have to be written first. Surely it is doable with some effort, as the imatrix jobs already run remotely, but it's a different system than the quant scheduler.
and we have a problem that I have mentioned from the start: 2 people uploading the same name, and then both getting requested by the users.
And we already resolved this before you brought it up :-) We traded that in for not having the problem of not being able to quant a model because the name is too long, which would have happened far more often. Both are relatively rare. And I still think that it's not too much to ask to use a unique name. People don't search for models by creator, but by model names.
I would argue that for you, who did indiscriminately quant models, this might be a bigger issue, as you don't pick "human-preferred" models. But likewise, I would say you would run into the model name length limit even more often.
i am watching rich1 download ~36GB at 110MBps for an hour now
Internet is quite fcked for some reason, I am not sure why. I was downloading github repo for 5 minutes, my uploads are slow af as well. My parents "noticed" that as well. I got an internet abuse report by my parents lol. I guess I need to check on my side, but I dont even know what to check because my dumb isp provided router has no proper stats on it.
I dont know what is going on with my life, the last 3 days have been pretty awful. The saddest thing is my laptop is cooked ... right before the exam season, I had an exam today, will have one on wednesday, then friday, then monday... When someone asked how was my holiday, I replied "there was holiday?"
whatever nvidia allocates per client that does nothing
around 0.4-0.6gb depending on how it feels
For imatrix, it depends on the model, but if all layers are in system RAM, only whatever nvidia requires as state.
meaning relatively small models can be done on rich1 instead
However, we don't have a distributed imatrix scheduler at this time.
that sucks then ...
I would say you would run into the model name length limit even more often.
Funnily enough ... I didnt see that, or at least, there were so few that it kinda just got completely suppressed by the amount of unsupported models
I got an internet abuse report by my parents lol.
Those are serious, dude!
I was downloading github repo for 5 minutes, my uploads are slow af as well.
It's not that rich1 is slow, it's at full blast (download at least). And so was nico1. I suspect I just saw nico's other downloads in the bandwidth statistic, and the same with rich1.
github
What do you expect from a company that refuses to support ipv6 in fucking 2026!
When someone asked how was my holiday, I replied "there was holiday?"
You sound like that. I'm so glad I don't have to go to school/university anymore :) (although it was the best times in my life, as well. but exams are stressful...). Take care, and make sure you set priorities right :)
And here is what you probably don't want to hear: for me, the time between years has always been the relaxing, stressless time :) Even when I was studying :)
So, the limiting on nico1 by cron completely failed. And worked when I ran the script manually. Just the thing you want to debug. Not. But at least it worked when running it manually, so there is hope it might work... Weirdly enough, the other direction (resuming) this morning worked fine.
Also, I noticed that there is a hard rule for lownice models to run anyway between 19 and 7, should we remove that? Addendum: actually, no, it just lets it finish its quant.
Also, I think for the "large imatrix jobs block" we need a new flag, so that the gguf is transferred/prepared. And an llmc interface for it.
4 276 Llama2-70B-SharpBalance run/imatrix (GPU-2d) 5/80 181.28s/c 89.5/1108.8m(564.4-608.4) [54/367] 9.1842
That seems very fishily slow. It is using CUDA, and there is only 15GB vram in use. It's f32, but that shouldn't slow it down from an hour to 10 hours.
Those are serious, dude!
very serious
It's not that rich1 is slow,
as nico said, "Ah RIP so it is intranet."
Yep, I have to go check the networking in my house, I assume it's that one cable that is hanging out of the window
What do you expect from a company that refuses to support ipv6 in fucking 2026!
It's quite crazy that once I forgot to configure v4 on the VM and couldnt access half of the internet....
that was a month ago
the time between years has always been the relaxing, stressless time
mmm how nice
@nicoboss did you manually push this to marco?
-2000 463 si MiniMax-M2.1-REAP-50 run/static 2/12,Q4_K_S [246/809]
marco should not have models >150GB, and generally does not have disk space for much larger models (officially there is only 400GB, but right now, there is more, so hopefully it won't affect him :)
You sound like that.
yes, I try to avoid having breaks ... no time for them, and what am I going to do during break anyways? waste my time? nah, have to work
MiniMax-M2.1-REAP-50
I queued it, but didnt push it anywhere. It might have automatically been allocated somehow??
mmm how nice
Yeah, hope I can catch up before everybody wants a piece of me again.
Yeah, hope I can catch up before everybody wants a piece of me again.
Good luck, you apparently can go in debt with that...
@nicoboss did you manually push this to marco?
No I did not queue or push this model to marco. I have not explicitly pushed any model to any node other than nico1 and rich1 in the past month.
No I did not queue or push this model to marco. I have not explicitly pushed any model to any node other than nico1 and rich1 in the past month.
Wow, something must be seriously broken then when it ignored the maxm parameter. @RichardErkhov please tell me it was you (but you wouldn't even know how, eh?), otherwise, something seriously broken must be going on. Especially since it was a -2000 model.
@nicoboss unrelated, there isn't actually any logic that would force an imatrix job to run when needed, as we kind of changed nico1 to essentially not run even urgent models at night :) I'll have to meditate on how to patch this in somehow.
@nicoboss unrelated, there isn't actually any logic that would force an imatrix job to run when needed, as we kind of changed nico1 to essentially not run even urgent models at night :) I'll have to meditate on how to patch this in somehow.
In that case always immediately run imatrix jobs no matter the time. Imatrix jobs are relatively short and don't use much power. Urgent models such as user-requested ones should also run no matter the time.
@nicoboss i might have found a way to transfer the "run imatrix for upcoming jobs" logic. seems to work fine, most imatrix jobs will be delayed at night.
there are currently two imatrix jobs in a new "soverride.imatrix" state:
-2000 463 MiniMax-M2.1-REAP-50 blocked/admin/soverride.imatrix
1 ? MiroThinker-v1.5-235B blocked/admin/soverride.imatrix
you can clear this state and hopefully make them run by using llmc force-restart-imatrix ...
next issue: it's not easy to block large imatrix jobs, because when jobs are queued, i don't know their size yet, and when i know their size, it's a bit hard to know if this is the first time (so we block it) or not (the block has been removed). will think about this.
also, you might have to run llmc force-restart-imatrix multiple times for a job - it should tell you what it did, though. and this is probably not the final interface, since it would be nice to block the job again.
maybe we need a different approach: instead of a blocking flag, those big jobs need a continue flag that you need to set...
/walle is getting abused for my experiment, sorry for the slowdown on the model cache, but I assume rich1 is currently quite busy with uploading anyways... hopefully it's going to be free soon, I want to fix cables a bit, and I might need to disconnect the switch ...