Comments (3)
- HW414700 describes an early chip bug in which some SUEs were not reported. Prior to Nimbus 2.1 there was a chance of missing errors, so applying this setting forces a checkstop in those cases.
- If you search the code you can see that HW414700 appears in a lot of places. It seems to affect more than just regular memory, since I see it in some of the other initfiles too. In general, it will cause more failures to checkstop rather than fail properly with an SUE/machinecheck.
Why are you interested in forcing checkstops for these kinds of errors? In general we would want the errors to flow upward into a possibly non-fatal machinecheck/SUE that the OS could handle accordingly. These systems are designed to avoid full system checkstops whenever possible.
from hostboot.
Thank you for your answer.
- On Nimbus 2.1, the OS gets stuck during the DIMM RAS test, and the system serial port keeps reporting the error "Memory failure: 0x20000000: reserved kernel page still referenced by 1 users" for several hours. I think this is abnormal and unacceptable from a usability perspective.
- Nimbus 2.2/2.3 does not show this behavior. After the DIMM RAS test, a checkstop is triggered and the corresponding DIMM is restarted.
So I suspect that Nimbus 2.1 also has the bug in which some SUEs are not reported. That is why I'm interested in "forcing checkstops for these kinds of errors".
from hostboot.
It seems unlikely that DD2.1 has the bug and that it went unaddressed. However, that level of part is technically only supported as part of the https://github.com/ibm-op-release/op-build branch. It looks like you are trying to use our most current code level. If you are using master, all sorts of other settings could be incorrect for DD2.1. It is possible that you are missing some other tangentially related behavior that the OS relies on to handle the error properly. It does seem like the OS knows the memory is bad, which I think means that the initial chip bug was fixed, since that bug was a case of not reporting the error at all (a silent failure).
from hostboot.