
open-power / hostboot


System initialization firmware for Power systems

License: Apache License 2.0

Makefile 0.48% Shell 0.20% Tcl 0.04% C 19.44% Perl 1.57% Python 1.97% C++ 76.19% Assembly 0.05% XSLT 0.01% Lex 0.01% Yacc 0.02% CMake 0.01% M4 0.01% Meson 0.02% Raku 0.01%

hostboot's People

Contributors

aamarin, anusrang, benatibm, brs332, cmolsen, cnpalmer, crgeddes, cvswen, davidduyue, dcrowell77, fenkes-ibm, ibmthi, ibmzach, jacobharvey, jjmcgill, mabaiocchi, mderkse1, mraybuck, op-jenkins, prasrang, premsjha, rjknight, sannerd, sglancy6, steffenchris, stermole, stillgs, velozr, wghoffa, zane131

hostboot's Issues

All PRD syslog entries aren't understandable by humans

I use "humans", plural, in the title because I believe there's only one human who could make sense of these kinds of log messages.

I think the following decodes to "Processor Runtime Diagnostics detected hardware failure", but what on earth this means should be a lot more obvious to me (and to an end user).

Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x80000001, addr 0x201140c, val 0xc8c0001000000000
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x20118c0, val 0xc00000000000
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x20118c3, val 0x79163401a47d3c00
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x20118c6, val 0x9c00000000000
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x20118c7, val 0x8ee00a9018800000
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x20118c8, val 0xc00000000000
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x201189e, val 0x0
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x20118ce, val 0x0
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70000
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70000
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70001
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70004
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70008
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70009
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF7001B
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF7DD81
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>PRD Signature 00040001 FBF70000
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: write: chip 0x0, addr 0x20118c1, val 0xfff63fffffffffff
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:>>addHwCallout(0x00040001 0x1 0x0 0x0)
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: TARG:[TARG] E> Number of Parent chip is not 1, but 0
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:E>ErrlEntry::collectTrace(): getBuffer(prdf) rets zero.
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:commitErrLog() called by E500 for plid=0x8900082D,Reasoncode=E504
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:I>Send an error log to hypervisor to commit. plid=0x8900082D
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:>>saveErrLogToPnor eid=8900082d
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:I>saveErrLogToPnor: INFORMATIONAL/RECOVERED log, skipping
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:<<saveErrLogToPnor returning true
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:I>Send msg to BMC for errlogId [0x8900082d]
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:>>sendErrLogToBmc errlogId 0x8900082d, i_sendSels 1
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:I>sendErrLogToBmc: 8900082D is INFORMATIONAL/RECOVERED; skipping
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:<<sendErrLogToBmc
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: ERRL:<<sendToHypervisor()
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:I>[ErrDataService::GenerateSrcPfa] PRD called to analyze an error: 0x00040001 0xfbf70000
Sep  3 00:18:25 YC01UNOS opal-prd: HBRT: PRDF:<<PRDF::main()
Sep  3 00:18:25 YC01UNOS opal-prd: SCOM: read: chip 0x0, addr 0x2000001, val 0x198000000000000

Error log parsing: procedure is in decimal but hex in code.

When we print out an error log the procedure callout is in decimal but the code uses hex for the enumeration numbers. Please review this and all other error log parsing to make sure we use the correct base.

Procedure 0x22 and decimal 22 both correspond to memory-related procedures, so this is especially confusing.
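To make the ambiguity concrete, here is a minimal Python sketch, assuming the parser formats the raw enum value directly. `format_procedure` is a hypothetical helper, not the actual errl parser code. An enum value of 0x16 prints as decimal 22, which a reader naturally looks up as 0x22 in the source.

```python
def format_procedure(value: int, use_hex: bool = True) -> str:
    """Render a procedure callout value for a parsed error log.

    Hypothetical helper: printing in hex keeps the output in the same
    base as the enumeration values used in the source code.
    """
    if use_hex:
        return f"Procedure: 0x{value:02X}"
    return f"Procedure: {value}"


# Enum value 0x16 printed in each base.
print(format_procedure(0x16, use_hex=False))  # Procedure: 22  (easily misread as 0x22)
print(format_procedure(0x16))                 # Procedure: 0x16
```

Printing with an explicit `0x` prefix removes the guesswork, since a reader can match the number directly against the enum definitions.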

BIOS hangs in a loop on Barreleye boot-up

I've seen a couple of Barreleye servers fail the boot-up process because they were stuck in a loop.

Hconsole logs show that they return to the start of the loop when they reach step 14, and this repeats over and over.

In general, is there a guide to decoding what the step information means? The full log is attached.
BIOS hang.pdf

22.35508|ISTEP 13.11
22.46796|ISTEP 13.12
22.46877|ISTEP 14. 1
22.54008|ISTEP 14. 2
22.56077|ISTEP 14. 3
2.10687|ERRL|Dumping errors reported prior to registration
3.75598|ISTEP 6. 3
4.09279|ISTEP 6. 4
4.09330|ISTEP 6. 5
8.99220|HWAS|PRESENT> DIMM[03]=A0A0A0A0A0A0A0A0
8.99221|HWAS|PRESENT> Membuf[04]=CCCC000000000000
8.99221|HWAS|PRESENT> Proc[05]=C000000000000000
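In the excerpt above, the timestamps drop from 22.56077 to 2.10687 right after ISTEP 14. 3, which is where the boot starts over. A minimal Python sketch for spotting such restarts in a long console capture, assuming every console line begins with a `<seconds>|` timestamp and that a restart shows up as that timestamp going backwards; `find_restarts` is a hypothetical helper, not a Hostboot tool.

```python
import re

# Console lines look like "22.46877|ISTEP 14. 1": a timestamp in seconds,
# then "ISTEP <major>.<minor>" (the minor step may be space-padded).
TIMESTAMP_RE = re.compile(r"^\s*(\d+\.\d+)\|")
ISTEP_RE = re.compile(r"^\s*\d+\.\d+\|ISTEP\s+(\d+)\.\s*(\d+)")


def find_restarts(lines):
    """Return the last istep reached before each apparent restart."""
    restarts = []
    last_time = None
    last_step = None
    for line in lines:
        ts = TIMESTAMP_RE.match(line)
        if ts is None:
            continue  # skip lines without a timestamp
        now = float(ts.group(1))
        if last_time is not None and now < last_time:
            # Timestamp went backwards: assume the host restarted.
            restarts.append(last_step)
            last_step = None
        last_time = now
        step = ISTEP_RE.match(line)
        if step:
            last_step = (int(step.group(1)), int(step.group(2)))
    return restarts


log = """\
22.54008|ISTEP 14. 2
22.56077|ISTEP 14. 3
2.10687|ERRL|Dumping errors reported prior to registration
3.75598|ISTEP 6. 3
""".splitlines()
print(find_restarts(log))  # [(14, 3)]
```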

FSPCT: No centaur memory detected after cold reset

No Centaur memory was detected after a cold reset of the BMC:

System Info:
BMC ip address= 9.41.164.80 name=habcap10.aus.stglabs.ibm.com

OS ip address=9.41.164.12 name=habcap10p1.aus.stglabs.ibm.com

Following is a list of commands that were executed and their outcome:

HOST OS: $ sudo /home/tyan/skiboot/external/xscom-utils/getscom -l
[sudo] password for tyan:

Chip ID Rev Chip type
80000005 DD2.0 Centaur memory buffer
80000004 DD2.0 Centaur memory buffer
80000001 DD2.0 Centaur memory buffer
80000000 DD2.0 Centaur memory buffer
00000000 DD2.0 P8 (Venice) processor
ipmitool -H habcap10 -I lanplus -U ${Userid} -P ${Password} mc cold reset

HOST OS: $ sudo /home/tyan/skiboot/external/xscom-utils/getscom -l
[sudo] password for tyan:

Chip ID Rev Chip type
00000000 DD2.0 P8 (Venice) processor <=========== No Centaur Memory

Also attached is the output captured from the serial port, which shows that the BMC was unable to write to or detect these memory buffers.
centaur_memory_error


Following is the response from skiboot team:
This isn't a skiboot problem. HostBoot didn't forward the information about the Centaurs to us.

VPD Caching Issue with CVPD when SN/PN not programmed

On a recent system bring-up, I encountered several VPD errors related to CVPD. All the reads failed with errors stating that the entire record/keyword was missing. This was a configuration with combined Centaur and planar VPD on one chip. The issue was that, with CVPD PNOR caching enabled, the code determined the cache was up to date and did not correctly populate the PNOR partition. I believe this was because no PN/SN had been programmed in the VPD at that point. I think we need an additional check in the logic that decides whether to write the VPD cache, one that handles the empty PN/SN case.
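The extra check could look like the following Python sketch. The function names are hypothetical, and it assumes an unprogrammed keyword reads back as all 0x00, all 0xFF, or ASCII spaces; the real cache logic is C++ and may compare different fields.

```python
def is_blank(field: bytes) -> bool:
    """Assumption: an unprogrammed keyword reads back as 0x00, 0xFF or spaces."""
    return all(b in (0x00, 0xFF, 0x20) for b in field)


def naive_cache_is_current(hw_pn, hw_sn, cached_pn, cached_sn) -> bool:
    # Match-only check: two blank PN/SN pairs compare equal, so the
    # cache is wrongly treated as up to date.
    return hw_pn == cached_pn and hw_sn == cached_sn


def cache_is_current(hw_pn, hw_sn, cached_pn, cached_sn) -> bool:
    # A blank PN/SN cannot identify the part, so never trust the cache.
    if is_blank(hw_pn) or is_blank(hw_sn):
        return False
    return naive_cache_is_current(hw_pn, hw_sn, cached_pn, cached_sn)


blank = b"\xff" * 8
print(naive_cache_is_current(blank, blank, blank, blank))  # True  (cache skipped)
print(cache_is_current(blank, blank, blank, blank))        # False (cache rewritten)
```

Treating a blank PN/SN as "unknown" costs an extra cache write on unprogrammed parts, which seems an acceptable price during bring-up.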

Hostboot broke op-build

This commit from November, 02353bc, breaks building the Palmetto PNOR.

Reverting it fixes the build.

###############################/scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images//buildpnor.pl --pnorOutBin /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/pnor/palmetto.pnor --pnorLayout /scratch/joel/op-build/output/build/openpower-pnor-40e407735b317d0174645b7543e0a72019709ce2/defaultPnorLayoutSingleSide.xml --binFile_HBD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//PALMETTO_HB.targeting.bin.ecc --binFile_SBE /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//venice_sbe.img.ecc --binFile_SBEC /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//centaur_sbec_pad.img.ecc --binFile_WINK /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//p8.ref_image.hdr.bin.ecc --binFile_HBB /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot.header.bin.ecc --binFile_HBI /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_extended.header.bin.ecc --binFile_HBRT /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_runtime.header.bin.ecc --binFile_HBEL /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hbel.bin.ecc --binFile_GUARD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//guard.bin.ecc --binFile_PAYLOAD /scratch/joel/op-build/output/images/skiboot.lid --binFile_BOOTKERNEL /scratch/joel/op-build/output/images/zImage.epapr --binFile_NVRAM /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//nvram.bin.ecc --binFile_MVPD 
/scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//mvpd_fill.bin.ecc --binFile_DJVPD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//djvpd_fill.bin.ecc --binFile_CVPD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cvpd.bin.ecc --binFile_ATTR_TMP /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_tmp.bin.ecc --binFile_ATTR_PERM /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_perm.bin.ecc --binFile_OCC /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/occ/occ.bin.ecc --binFile_FIRDATA /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//firdata.bin.ecc --binFile_CAPP /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cappucode.bin.ecc --binFile_VERSION /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_version/openpower-pnor.version.txt --fpartCmd "fpart" --fcpCmd "fcp"
TRACE:  main::loadPnorLayout: metadata: imageSize = 33554432, blockSize=4096, arrangement = A-D-B, numOfSides: 1, sideSize: 33554432, tocSize: 32768
TRACE: A-D-B: side:A HBB:32927744, primaryTOC:0, backupTOC:33521664, golden: no
TRACE: Done checkSpaceConstraints
TRACE: createPnorPartition:: 33521664
TRACE: createPnorImg:: 33521664
TRACE: createPnorPartition:: 0
TRACE: createPnorImg:: 0
TRACE: fpart --target /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/pnor/palmetto.pnor --partition-offset 33521664 --add --offset 33521664 --size 32768 --name BACKUP_PART --flags 0x0
fpart: unexpected : ../src/libffs.c(968) : (code=-1) 'BACKUP_PART' at offset 33521664 and size 32768 overlaps 'part' at offset 33521664 and size 4096
ERROR: Call to add partition BACKUP_PART failed. Aborting! at /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images//buildpnor.pl line 534.
Error running command: /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images//buildpnor.pl --pnorOutBin /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/pnor/palmetto.pnor --pnorLayout /scratch/joel/op-build/output/build/openpower-pnor-40e407735b317d0174645b7543e0a72019709ce2/defaultPnorLayoutSingleSide.xml --binFile_HBD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//PALMETTO_HB.targeting.bin.ecc --binFile_SBE /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//venice_sbe.img.ecc --binFile_SBEC /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//centaur_sbec_pad.img.ecc --binFile_WINK /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//p8.ref_image.hdr.bin.ecc --binFile_HBB /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot.header.bin.ecc --binFile_HBI /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_extended.header.bin.ecc --binFile_HBRT /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_runtime.header.bin.ecc --binFile_HBEL /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hbel.bin.ecc --binFile_GUARD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//guard.bin.ecc --binFile_PAYLOAD /scratch/joel/op-build/output/images/skiboot.lid --binFile_BOOTKERNEL /scratch/joel/op-build/output/images/zImage.epapr --binFile_NVRAM /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//nvram.bin.ecc --binFile_MVPD 
/scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//mvpd_fill.bin.ecc --binFile_DJVPD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//djvpd_fill.bin.ecc --binFile_CVPD /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cvpd.bin.ecc --binFile_ATTR_TMP /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_tmp.bin.ecc --binFile_ATTR_PERM /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_perm.bin.ecc --binFile_OCC /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/occ/occ.bin.ecc --binFile_FIRDATA /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//firdata.bin.ecc --binFile_CAPP /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cappucode.bin.ecc --binFile_VERSION /scratch/joel/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_version/openpower-pnor.version.txt --fpartCmd "fpart" --fcpCmd "fcp". Nonzero return code of (256) returned.
package/pkg-generic.mk:267: recipe for target '/scratch/joel/op-build/output/build/openpower-pnor-40e407735b317d0174645b7543e0a72019709ce2/.stamp_images_installed' failed
make: *** [/scratch/joel/op-build/output/build/openpower-pnor-40e407735b317d0174645b7543e0a72019709ce2/.stamp_images_installed] Error 1
make: Leaving directory '/scratch/joel/op-build/buildroot'

FSI Errors Halting IPL when Inband Scom is disabled

On a system with the Centaur on the planar, we discovered that FSI errors halt the IPL when inband SCOM is disabled. The failing istep is 16.3, mss_scrub(). Based on input from @dcrowell77 and @sannerd, we believe that FSI errors from mss_power_cleanup():

220.75511|ISTEPS_TRACE|Running mss_power_cleanup HWP on Centaur HUID 00040001, MBA0 HUID 000D0002, MBA1 HUID 000D0003,
220.75512|FAPI|mss_power_cleanup.C: Running mss_power_cleanupon centaur k0:n0:s0:p05
220.75513|FAPI|mss_power_cleanup.C: Starting mss_power_cleanup_mba_part1
220.75513|FAPI|mss_power_cleanup.C: working on a centaur whose functional is 0
220.75514|FAPI|mss_power_cleanup.C: working on an mba whose functional is 0
220.75514|FAPI|mss_power_cleanup.C: Starting mss_power_cleanup_mba_part1
220.75515|FAPI|mss_power_cleanup.C: working on a centaur whose functional is 0
220.75515|FAPI|mss_power_cleanup.C: working on an mba whose functional is 0
220.75516|FAPI|mss_power_cleanup.C: Starting mss_power_cleanup_mba_fence
220.75516|FAPI|mss_power_cleanup.C: mba0 functional is 0
220.75516|FAPI|mss_power_cleanup.C: mba1 functional is 0
220.75630|FSI|FsiDD::handleOpbErrors> Error during FSI access to 00040001 : relAddr=0x1008, absAddr=00050000->071008, OPB Status 00020001=0x80108000, l_opbErrorMask=FFFC7C00

aren't completely cleaned up. If inband SCOM is disabled, this same FSI master is used later in mss_scrub(), and the following FSI errors are observed:

208.00289|INITSVC|>>doIstep: step 16, substep 3, task mss_scrub
208.00289|ISTEP 16. 3
208.01753|INITSVC|I>Progress Code 16.3 Sent
208.01755|MBOX|W>MSGSEND - mailboxsp is disabled. Message dropped! msgQ=0x80000008 type=0x40000011
208.00801|PRDF|>>[PRDF::startScrub]
208.00802|PRDF|>>PRDF::noLock_refresh()
208.00802|PRDF|<<PRDF::noLock_refresh()
208.00813|PRDF|>>[PRDF::startScrub] HUID=0x00020000
208.03624|FSI|FsiDD::checkForErrors> After op to 00040000, MAEB(3070)=80000000 (Master=00050000)
208.03625|ERRL|>>addProcedureCallout(0x2d, 0x6)
208.03626|ERRL|>>addHwCallout(0x00040000 0x1 0x2 0x0)
208.03626|TARG|[TARG] E> Number of Parent chip is not 1, but 0
208.03628|FSI|FSI::getFFDC>
208.03631|FSI|FSI::getFFDC>
208.03632|FSI|000031D8 = 180E106B
208.03633|FSI|000031DC = 04000000
208.03634|FSI|000031D0 = 90005095
208.03635|FSI|00003050 = 00000000
208.03636|FSI|00003070 = 80000000
208.03637|FSI|000031D4 = 98000001
208.03638|FSI|000030D0 = 00000000
208.03639|FSI|000030D4 = 00000000
208.03640|FSI|000030D8 = 00000000
208.03641|FSI|000030DC = 00000000
208.03642|FSI|000030E0 = 00000000
208.03643|FSI|000030E4 = 00000000
208.03644|FSI|000030E8 = 00000000
208.03645|FSI|000030EC = 00000000
208.03650|FSI|errorCleanup> 00050000->003430 = 00000000
208.03651|FSI|errorCleanup> 00050000->003030 = 00000000
208.03652|FSI|errorCleanup> 00050000->003470 = 00000000
208.03653|FSI|errorCleanup> 00050000->003070 = 00000000
208.03654|FSI|FsiDD::write> FSI Errors after doing write operation : 00040000->00001000
208.03654|FAPI_I|fapiReturnCode.C: setPlatError: Creating PLAT error 0x2000001
208.03656|FAPI_I|fapiHwAccess.C: fapiPutScom failed - Target centaur.mba k0:n0:s0:p01:c0 , Addr 0000000003010612
208.03656|FAPI_I|fapiPlatHwpInvoker.C: fapiRcToErrl: PLAT error: 0x02000001

However, when inband SCOM is enabled, a different FSI master is used, so the leftover errors from the earlier mss_power_cleanup() do not affect anything. So I believe the focus here is to understand and resolve why FSI errors aren't cleaned up correctly on the FSI master during mss_power_cleanup(). The full log from a run with the above failure is attached.

122M1ABCD.TXT

git-describe invocation during op-build.

When doing an op-build, the Hostboot makefile still attempts to run 'git describe' to determine the current tag to embed in the image as its version. We need to eliminate this invocation when doing an op-build and pick another sane description based on package variables. Jeremy did something similar for skiboot (see the openpower/buildroot commit).
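A minimal sketch of that selection logic, written in Python for illustration rather than as the actual makefile change. It assumes op-build hands in a package version (for example a buildroot-style `HOSTBOOT_VERSION`, assumed here); `image_version` and the `hostboot-` prefix are hypothetical choices.

```python
import subprocess


def image_version(op_build: bool, package_version: str, src_dir: str = ".") -> str:
    """Choose the version string to embed in the Hostboot image.

    Hypothetical sketch: inside op-build, trust the version handed in by the
    package and never call git; outside op-build, try `git describe` first.
    """
    fallback = f"hostboot-{package_version}"
    if op_build:
        return fallback
    try:
        result = subprocess.run(
            ["git", "describe", "--always", "--dirty"],
            cwd=src_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # No git binary, or not a git checkout: use the package version.
        return fallback
    return result.stdout.strip() or fallback


print(image_version(True, "70b5e31"))  # hostboot-70b5e31
```

Falling back rather than failing keeps tarball builds working even outside op-build.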

System Halt during Barreleye bootup process

We have a Barreleye server on which:

a) The host boot-up process hangs and then restarts about 50 seconds after we power on.
b) After restarting on its own (as in a), the boot-up process gets further but finally hangs around the point where it tries to start the kernel:
[8082470429,5] INIT: Starting kernel at 0x20010000, fdt at 0x305efb18 (size 0x1b72d)
host hangs loading skiboot kernel.pdf

The attached log should give you an idea of what is happening.

Cannot build with -Os

After converting -Werror to -Wno-error (because various legitimate warnings are emitted), the build later fails with:

exception caught: Duplicate weak symbol with contained value member detected: _ZN9SingletonI6PnorDDE8instanceEv with member: _ZGVZN9SingletonI6PnorDDE8instanceEvE8instance

Hostboot build tools utilities are built as 32-bit

Currently, the hostboot utils are built with -m32. For example:

g++ -m32  -c  -g -O0   -I ../../../../src/include/usr -I ../../../../obj/genfiles  -o ../../../../obj/modules/errl/parser/errlparser.o  errlparser.C

This means that 64-bit build machines need an entire 32-bit toolchain. We've worked-around this by requiring 32-bit libc and a multilib-capable compiler in the README, but it'd be nice not to require that.

Unused variable errors

When building with GCC 5, hostboot errors out due to unused variable warnings:

In file included from ../../../src/include/usr/targeting/common/predicates/predicateisfunctional.H:58:0,
                 from ../../../src/include/usr/targeting/common/predicates/predicates.H:34,
                 from ../../../src/include/usr/targeting/common/commontargeting.H:39,
                 from compdesc.C:31:
../../../src/include/usr/targeting/common/targetservice.H:137:22: error: ‘TARGETING::MASTER_PROCESSOR_CHIP_TARGET_SENTINEL’ defined but not used [-Werror=unused-variable]
 static Target* const MASTER_PROCESSOR_CHIP_TARGET_SENTINEL
                      ^

A workaround is to build with -Wno-unused-variable.

Non-present FRUs are being reported as faulted.

In our IPMI code for OpenBMC, we observed that the hostboot code is setting the fault bit in the FRU sensor for every non-present device. For DIMM slots that are not populated and for processors with partial-good records, this is probably not the best option. Can this be changed in the host firmware without breaking other BMC implementations?

See openbmc/openbmc#92.
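The difference between the two reporting rules can be sketched as follows. This is a hypothetical Python model, not Hostboot's IPMI sensor code: it assumes each FRU is described only by a present flag and a functional flag.

```python
from dataclasses import dataclass


@dataclass
class Fru:
    present: bool
    functional: bool


def reported_fault(fru: Fru) -> bool:
    # Behaviour described in the issue (assumed model): every non-present
    # FRU gets the fault bit, as does a present but non-functional one.
    return not fru.present or not fru.functional


def proposed_fault(fru: Fru) -> bool:
    # Hypothetical alternative: only a present but non-functional FRU is
    # faulted; an absent FRU is simply reported as not present.
    return fru.present and not fru.functional


empty_dimm_slot = Fru(present=False, functional=False)
print(reported_fault(empty_dimm_slot))  # True
print(proposed_fault(empty_dimm_slot))  # False
```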

Intermittent failures with duplicated enum elements in fapiHwpReturnCodes.H

I rarely get the following kind of failure when compiling within OzLabs' CI:
05:06:47 In file included from ../../../../src/include/usr/hwpf/fapi/fapiReturnCode.H:78:0,
05:06:47 from ../../../../src/include/usr/hwpf/fapi/fapiUtil.H:63,
05:06:47 from fapiReturnCodeDataRef.C:47:
05:06:47 ../../../../obj/genfiles/fapiHwpReturnCodes.H:323:5: error: redefinition of 'RC_PROCPM_CHKSTOP'
05:06:47 RC_PROCPM_CHKSTOP = 0xd2d3e9,
05:06:47 ^
05:06:47 ../../../../obj/genfiles/fapiHwpReturnCodes.H:288:5: note: 'RC_PROCPM_CHKSTOP' previously defined here
05:06:47 RC_PROCPM_CHKSTOP = 0xd2d3e9,
05:06:47 ^
05:06:47 ../../../../obj/genfiles/fapiHwpReturnCodes.H:324:5: error: redefinition of 'RC_MSS_EFF_CONFIG_TERMINATION_INVALID_DIMM_RCD_IBT'
05:06:47 RC_MSS_EFF_CONFIG_TERMINATION_INVALID_DIMM_RCD_IBT = 0x0829ae,
05:06:47 ^
05:06:47 ../../../../obj/genfiles/fapiHwpReturnCodes.H:42:5: note: 'RC_MSS_EFF_CONFIG_TERMINATION_INVALID_DIMM_RCD_IBT' previously defined here
05:06:47 RC_MSS_EFF_CONFIG_TERMINATION_INVALID_DIMM_RCD_IBT = 0x0829ae,
....(many more)

I suspect that the code generation is triggering an issue in Perl (we are using Perl 5.20.2).

Sorting the output of keys() before emitting the enums seems to avoid triggering the bug, and also makes the generated output stable (the ordering of keys() is not guaranteed).
console.txt.zip
fapiHwpReturnCodes.H.zip
fapiHwpReturnCodes.H-sorted.zip
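The sorting idea, sketched in Python rather than the generator's Perl. A dictionary cannot hold duplicate keys, and `sorted()` makes the emitted order independent of hash ordering. The enum name `HwpReturnCodes` is a placeholder, not necessarily what the real header uses.

```python
def emit_enum(return_codes):
    """Emit a C++ enum from a name -> value mapping in a stable order."""
    lines = ["enum HwpReturnCodes", "{"]
    for name in sorted(return_codes):
        lines.append(f"    {name} = 0x{return_codes[name]:06x},")
    lines.append("};")
    return "\n".join(lines)


rcs = {
    "RC_PROCPM_CHKSTOP": 0xD2D3E9,
    "RC_MSS_EFF_CONFIG_TERMINATION_INVALID_DIMM_RCD_IBT": 0x0829AE,
}
print(emit_enum(rcs))
```

Because the output depends only on the set of keys, two runs over the same input produce byte-identical headers, which also makes the generated file easy to diff.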

attributeOverride creates file aligned wrong for ECC.

If you create an attribute override file with one entry, the resulting file has ECC added but is only 4096 bytes. Flashing it into the ATTR_TMP partition causes a PNOR UE to be reported. When the ECC option is enabled, we should align to 4608 bytes (4096 * 9 / 8) to account for the ECC bytes.
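The arithmetic as a minimal Python sketch, assuming the override data is padded to a whole 4096-byte block before ECC is inserted (one ECC byte per 8 data bytes, hence the 9/8 ratio). `ecc_file_size` is a hypothetical helper, not part of the attributeOverride tool.

```python
BLOCK_SIZE = 4096  # data bytes per block before ECC (assumed alignment)


def ecc_file_size(data_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Size of the file once data is padded to whole blocks and ECC is added."""
    blocks = -(-data_len // block_size)  # ceiling division
    # One ECC byte for every 8 data bytes: size grows by 9/8.
    return blocks * block_size * 9 // 8


print(ecc_file_size(1))     # 4608
print(ecc_file_size(4096))  # 4608
print(ecc_file_size(4097))  # 9216
```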

palmetto: Hostboot stuck in error loop at istep 21.1

On booting current op-build on palmetto (pass 2), I'm seeing hostboot get stuck at istep 21.1:

 70.54823|ISTEP 18.14
 70.58662|ISTEP 21. 1
 81.03411|================================================
 81.05850|Error reported by occc (0x2A00)
 81.05850|  <none>
 81.05850|  ModuleId   0x02 unknown
 81.05850|  ReasonCode 0x2a00 unknown
 81.05851|  UserData1  unknown : 0x0000000000000000
 81.05851|  UserData2  unknown : 0x000000000a000000
 81.05851|User Data Section 0, type UD
 81.05851|  Subsection type 0x0c
 81.05852|  ComponentId hb-trace (0x3100)
 81.05852|User Data Section 1, type UD
 81.05852|  Subsection type 0x00
 81.05852|  ComponentId occc (0x2a00)
 81.05852|User Data Section 2, type UD
 81.05853|  Subsection type 0x06
 81.05853|  ComponentId errl (0x0100)
 81.05853|  CALLOUT
 81.05853|  PROCEDURE ERROR
 81.05853|  Procedure: 85
 81.05854|User Data Section 3, type UD
 81.05854|  Subsection type 0x03
 81.05854|  ComponentId errl (0x0100)
 81.05854|User Data Section 4, type UD
 81.05854|  Subsection type 0x01
 81.05855|  ComponentId errl (0x0100)
 81.05855|  STRING
 81.05855|  Hostboot Build ID: 
 81.05855|================================================
This error repeats indefinitely.

This is a clean build, at op-build commit 5da6cb1.

VPD Caching broken on multiple socket, single CVPD systems (Barreleye)

Found a bug with the VPD caching:

In fsipres.C, Hostboot loops through all targets and processes the VPD cache. If the CPU or Centaur is not present, it invalidates the PNOR cache for that MVPD/CVPD; presence is determined by FSI hardware detection. If the CPU or Centaur is present, it loads the VPD cache from hardware. Since there is only one physical VPD chip, if anything is not present, the VPD cache is invalidated. This is a bug. Barreleye hit it because it is the only two-socket system we have with a single VPD chip for all Centaurs.

Here is current (failing) sequence:

  1. CPU0/Centaurs -> present -> VPD cache is loaded
  2. CPU1/Centaurs -> not present -> VPD cache is invalidated
  3. VPD cache is needed but has been invalidated

@dcrowell77 and I brainstormed a couple of solutions --

  1. Simple hack: reverse the order of presence detection, from CPU X down to CPU 0. We're not sure about side effects, but this works because CPU 0 always has to be present.

  2. The algorithm is using the FSI presence detect to qualify if VPD is even possible to read. In this case the singular copy of the VPD is behind the CPU (not the centaur). Fix would be to update the presence detect code for centaurs to take into account which target actually has the I2C master that reads the VPD and use that to determine if VPD is reachable. This is likely the proper fix, and what this issue was opened for.
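Both the failing sequence and the proposed fix can be modelled in a few lines of Python. This is a hypothetical sketch, not fsipres.C: it assumes one shared CVPD cache that each Centaur either reloads or invalidates in turn, and the target names are made up.

```python
from dataclasses import dataclass


@dataclass
class Centaur:
    name: str
    fsi_present: bool      # result of FSI presence detect for this Centaur
    vpd_i2c_master: str    # processor whose I2C master reaches the VPD chip


def current_policy(cen, present_procs):
    # Current behaviour: FSI presence alone decides load vs. invalidate.
    return cen.fsi_present


def proposed_policy(cen, present_procs):
    # Proposed fix: VPD is reachable whenever the processor that owns the
    # I2C master is present, whatever FSI says about this Centaur.
    return cen.vpd_i2c_master in present_procs


def shared_cache_valid(centaurs, present_procs, policy):
    """Walk the Centaurs in order, updating one shared CVPD cache."""
    valid = False
    for cen in centaurs:
        # True -> cache loaded from hardware, False -> cache invalidated.
        valid = policy(cen, present_procs)
    return valid


# Barreleye-like setup: one VPD chip behind proc0, CPU1 not installed.
centaurs = [
    Centaur("cen0", fsi_present=True, vpd_i2c_master="proc0"),
    Centaur("cen4", fsi_present=False, vpd_i2c_master="proc0"),
]
present = {"proc0"}
print(shared_cache_valid(centaurs, present, current_policy))   # False
print(shared_cache_valid(centaurs, present, proposed_policy))  # True
```

The model also shows why the ordering hack works: with the current policy, whichever Centaur is processed last decides whether the shared cache survives.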

op-build failure at building hostboot

Dear Hostboot Team,

I am trying to build op-build, but it fails while building hostboot. I tried on two machines: one building op-build v1.3 and the other building op-build v1.2.

Both fail with errors similar to the following:
make[8]: *** [../../../obj/modules/errl/errludcallout.o] Segmentation fault (core dumped)
make[8]: *** [../../../obj/modules/errl/errludcallout.o] Segmentation fault (core dumped)
make[8]: *** Deleting file `../../../obj/modules/errl/errludcallout.o'
make[8]: *** Waiting for unfinished jobs....
LD libdevicefw_rt.so
MAKE test CODE
TESTGEN testdevicefw_rt.C
DEP testdevicefw_rt.C
CXX testdevicefw_rt.C
LD libtestdevicefw_rt.so
make[7]: *** [_BUILD/PASSES/CODE/BODY] Error 2
make[6]: *** [_BUILD/SUBDIR/CODE/errl] Error 2
make[5]: *** [_BUILD/PASSES/CODE/POST] Error 2
make[4]: *** [_BUILD/SUBDIR/CODE/usr] Error 2
make[3]: *** [_BUILD/PASSES/CODE/POST] Error 2
make[2]: *** [_BUILD/SUBDIR/CODE/src] Error 2
make[1]: *** [_BUILD/PASSES/CODE/POST] Error 2
make[1]: Leaving directory `/home/weiwang/git/op-build/output/build/hostboot-70b5e31d74487d51e69a0e0a390adea6b4f32dc5'

Interestingly, the machine that initially failed with v1.3 later built successfully (palmetto.pnor was generated)! I did nothing special other than rerun "op-build".

However, on the other machine that I am building op-build v1.2, I still fail at this step.

By the way, could you also explain how palmetto.pnor is supposed to be used? Does it need to be combined with the occ package, as the occ README mentions with "op-build occ-rebuild openpower-pnor-rebuild"?

Thank you!

Wei Wang

Machine doesn't boot after software error

After hitting the checkstop (xstop) below, the machine no longer boots. Admittedly the checkstop is caused by a bug in system software, but it shouldn't permanently prevent the machine from booting:

CPU Summary:

== Recoverables ==

== Checkstops ==
p8 k0:n0:s0:p00 EX9 | 0x19040000 (00) | XFIR_IN0
p8 k0:n0:s0:p00 EX9 | 0x19010C00 (01) | TLBIE_SW_ERR

== FIR Lookup ==
TLBIE_SW_ERR | TLBIE received illegal AP/LP field from core.;
XFIR_IN0 | summary bit(any xstop);

I get the following errors from Hostboot on reboot:

3.64150|ISTEP 6. 3
3.99286|ISTEP 6. 4
3.99388|ISTEP 6. 5
4.44992|HWAS|PRESENT> DIMM[03]=00AA000000000000
4.44993|HWAS|PRESENT> Membuf[04]=4000000000000000
4.44993|HWAS|PRESENT> Proc[05]=8000000000000000
7.61691|ISTEP 6. 6
9.18410|================================================
9.18410|Error reported by unknown (0xE500)
9.18410|
9.18410| ModuleId 0x0b unknown
9.18411| ReasonCode 0xe540 unknown
9.18411| UserData1 unknown : 0x0006000600000101
9.18411| UserData2 unknown : 0x7e0d000100000000
9.18412|User Data Section 0, type UD
9.18412| Subsection type 0x04
9.18412| ComponentId errl (0x0100)
9.18412|User Data Section 1, type UD
9.18412| Subsection type 0x06
9.18413| ComponentId errl (0x0100)
9.18413| CALLOUT
9.18413| HW CALLOUT
9.18413| Reporting CPU ID: 75
9.18413| Called out entity:
9.18414|User Data Section 2, type UD
9.18414| Subsection type 0x06
9.18414| ComponentId errl (0x0100)
9.18414| CALLOUT
9.18414| PROCEDURE ERROR
9.18415| Procedure: 16
9.18415|User Data Section 3, type UD
9.18415| Subsection type 0x33
9.18415| ComponentId unknown (0xe500)
9.18415|User Data Section 4, type UD
9.18416| Subsection type 0x01
9.18416| ComponentId unknown (0xe500)
9.18416| STRING
9.18416|
9.18416|User Data Section 5, type UD
9.18417| Subsection type 0x0c
9.18417| ComponentId hb-trace (0x3100)
9.18417|User Data Section 6, type UD
9.18417| Subsection type 0x03
9.18418| ComponentId errl (0x0100)
9.18418|User Data Section 7, type UD
9.18418| Subsection type 0x01
9.18418| ComponentId errl (0x0100)
9.18418| STRING
9.18419| Hostboot Build ID: hostboot-70b5e31-opdirty-71b88d7/hbicore.bin/cyril
9.18419|================================================
9.54200|================================================
9.54201|Error reported by initservice (0x0500)
9.55198| Deconfigured occurred during an istep. The reconfig loop was not performed by Hostboot because either the Istep is outside the reconfig loop (desired steps 0), too many reconfig loops were attempted, in manufacturing mode or in istep mode.
9.55198| ModuleId 0x03 ISTEP_INITSVC_MOD_ID
9.55198| ReasonCode 0x050a ISTEP_FAILED_DUE_TO_DECONFIG
9.55199| UserData1 Istep that failed : 0x0000000600000006
9.55199| UserData2 SubStep that failed : 0x0000000000000000
9.55199|User Data Section 0, type UD
9.55199| Subsection type 0x0c
9.55200| ComponentId hb-trace (0x3100)
9.55200|User Data Section 1, type UD
9.55200| Subsection type 0x0c
9.55200| ComponentId hb-trace (0x3100)
9.55201|User Data Section 2, type UD
9.55201| Subsection type 0x06
9.55201| ComponentId errl (0x0100)
9.55201| CALLOUT
9.55201| PROCEDURE ERROR
9.55202| Procedure: 1
9.55202|User Data Section 3, type UD
9.55202| Subsection type 0x01
9.55202| ComponentId errl (0x0100)
9.55202| STRING
9.55203| libistepdisp.so
9.55203|User Data Section 4, type UD
9.55203| Subsection type 0x03
9.55203| ComponentId errl (0x0100)
9.55203|User Data Section 5, type UD
9.55204| Subsection type 0x01
9.55204| ComponentId errl (0x0100)
9.55204| STRING
9.55205| Hostboot Build ID: hostboot-70b5e31-opdirty-71b88d7/hbicore.bin/cyril
9.55205|================================================

And on subsequent reboots:
1.91500|ERRL|Dumping errors reported prior to registration
2.28838|IPMI: power cycle requested
2.69033|IPMI: shutdown complete
1.91555|ERRL|Dumping errors reported prior to registration
3.75812|ISTEP 6. 3
4.10948|ISTEP 6. 4
4.11048|ISTEP 6. 5
4.56734|HWAS|PRESENT> DIMM[03]=00AA000000000000
4.56734|HWAS|PRESENT> Membuf[04]=4000000000000000
4.56734|HWAS|PRESENT> Proc[05]=8000000000000000
7.93459|ISTEP 6. 6
8.17292|================================================
8.17292|Error reported by hwas (0x0C00)
8.18265| HWAS host_gard: no masterCore found
8.18265| ModuleId 0x81 MOD_HOST_GARD
8.18265| ReasonCode 0x0c84 RC_MASTER_CORE_NULL
8.18265| UserData1 0 : 0x0000000000000000
8.18266| UserData2 0 : 0x0000000000000000
8.18266|User Data Section 0, type UD
8.18266| Subsection type 0x06
8.18266| ComponentId errl (0x0100)
8.18267| CALLOUT
8.18267| PROCEDURE ERROR
8.18267| Procedure: 85
8.18267|User Data Section 1, type UD
8.18268| Subsection type 0x01
8.18268| ComponentId errl (0x0100)
8.18268| STRING
8.18268| host_gard
8.18269|User Data Section 2, type UD
8.18269| Subsection type 0x03
8.18269| ComponentId errl (0x0100)
8.18269|User Data Section 3, type UD
8.18270| Subsection type 0x01
8.18270| ComponentId errl (0x0100)
8.18270| STRING
8.18271| Hostboot Build ID: hostboot-70b5e31-opdirty-71b88d7/hbicore.bin/cyril
8.18271|================================================

Add "IPMI shutdown requested" to non-debug console

My Palmetto was being told by the BMC over IPMI to shut down, but until I put a debug build of hostboot on there I had no idea that was the case.

We should display something so the non-debug console knows what's going on.

  5.38149|IPMI|rp: queuing sync 18:6
  6.67555|IPMI|rp: timeout: 18:36
  6.67555|IPMI|rp: get_capabilities not ok 195, using defaults
  6.67563|IPMI|dd: I>write ok 18:35 seq 2 len 0
  6.69803|IPMI|dd: I>read b2h ok 1c:35 seq 2 len 10 cc 0
  6.69804|IPMI|rp: queuing event 3a:4 for handler
  6.69807|IPMI|rp: Graceful shutdown request recieved
  6.69809|INITSVC|doShutdown(i_status=0000000001230000)
  6.69809|INITSVC|doShutdown> status=0000000001230000
  6.69810|IPMI|dd: I>write ok 18:6 seq 3 len 2
  6.69836|IPMI|rp: I>MSG_STATE_GRACEFUL_SHUTDOWN: send power off command to BMC
  6.79797|IPMI|dd: I>read b2h ok 1c:6 seq 3 len 0 cc 0
  6.79804|

Doesn't build with GCC5

I'm seeing build failures with GCC 5.3 when building hostboot, caused by several unused-variable warnings. This seems to be caused by the way Hostboot defines some constants.

The patch below is a quick stab at solving it, but it only gets hostboot to compile; booting does not work (checkstop). This is probably because I am very rusty on C++ linkage for constants, and I'm not entirely sure how hostboot links itself at runtime, which may cause problems.

--- a/src/include/usr/targeting/attrPlatOverride.H
+++ b/src/include/usr/targeting/attrPlatOverride.H
@@ -52,13 +52,7 @@ struct AttrOverrideSection
  *        enums and make it obvious which layers map to what PNOR section.
  *        Currrently the pair is only used in a test case to keep it in order
  */
-const std::pair
-    tankLayerToPnor[AttributeTank::TANK_LAYER_LAST] =
-    {
-        std::make_pair(AttributeTank::TANK_LAYER_FAPI, PNOR::ATTR_TMP),
-        std::make_pair(AttributeTank::TANK_LAYER_TARG, PNOR::ATTR_TMP),
-        std::make_pair(AttributeTank::TANK_LAYER_PERM, PNOR::ATTR_PERM)
-    };
+extern const std::pair tankLayerToPnor[AttributeTank::TANK_LAYER_LAST];
  
 /**
  * @brief This function gets any Attribute Overrides in PNOR
diff --git a/src/include/usr/targeting/common/targetservice.H b/src/include/usr/targeting/common/targetservice.H
index 828fe86f3e51..8603f5174743 100644
--- a/src/include/usr/targeting/common/targetservice.H
+++ b/src/include/usr/targeting/common/targetservice.H
@@ -134,10 +134,7 @@ namespace TARGETING
  *      DD framework to bring PNOR device driver online.  Note this target
  *      cannot be used as input to any target service APIs.
  */
-static Target* const MASTER_PROCESSOR_CHIP_TARGET_SENTINEL
-    = (sizeof(void*) == 4) ?
-        reinterpret_cast<Target*>(0xFFFFFFFF)
-      : reinterpret_cast<Target*>(0xFFFFFFFFFFFFFFFFULL);
+extern Target* const MASTER_PROCESSOR_CHIP_TARGET_SENTINEL;
 
 /**
  *  @brief TargetService class
diff --git a/src/usr/i2c/i2c.C b/src/usr/i2c/i2c.C
index 2afd7730c4ba..0261dda6db9b 100755
--- a/src/usr/i2c/i2c.C
+++ b/src/usr/i2c/i2c.C
@@ -89,6 +89,7 @@ const TARGETING::ATTR_I2C_BUS_SPEED_ARRAY_type g_var = {{NULL}};
 
 namespace I2C
 {
+uint64_t g_I2C_NEST_FREQ_MHZ = i2cGetNestFreq();
 
 // Register the generic I2C perform Op with the routing code for Procs.
 DEVICE_REGISTER_ROUTE( DeviceFW::WILDCARD,
diff --git a/src/usr/i2c/i2c.H b/src/usr/i2c/i2c.H
index 05f11d5514d6..0b15c5a096d7 100755
--- a/src/usr/i2c/i2c.H
+++ b/src/usr/i2c/i2c.H
@@ -88,7 +88,7 @@ ALWAYS_INLINE inline uint64_t i2cGetNestFreq()
 
   return sysTarget->getAttr();
 };
-static uint64_t g_I2C_NEST_FREQ_MHZ = i2cGetNestFreq();
+extern uint64_t g_I2C_NEST_FREQ_MHZ;
 
 
 /**
diff --git a/src/usr/targeting/attrPlatOverride.C b/src/usr/targeting/attrPlatOverride.C
index 9368c4af1378..4edac1fa339a 100644
--- a/src/usr/targeting/attrPlatOverride.C
+++ b/src/usr/targeting/attrPlatOverride.C
@@ -30,6 +30,13 @@
 
 namespace TARGETING
 {
+const std::pair
+    tankLayerToPnor[AttributeTank::TANK_LAYER_LAST] =
+    {
+        std::make_pair(AttributeTank::TANK_LAYER_FAPI, PNOR::ATTR_TMP),
+        std::make_pair(AttributeTank::TANK_LAYER_TARG, PNOR::ATTR_TMP),
+        std::make_pair(AttributeTank::TANK_LAYER_PERM, PNOR::ATTR_PERM)
+    };
 
 errlHndl_t getAttrOverrides(PNOR::SectionInfo_t &i_sectionInfo,
                       AttributeTank* io_tanks[AttributeTank::TANK_LAYER_LAST])
diff --git a/src/usr/targeting/common/targetservice.C b/src/usr/targeting/common/targetservice.C
index cd3870dedced..2822b47080e3 100644
--- a/src/usr/targeting/common/targetservice.C
+++ b/src/usr/targeting/common/targetservice.C
@@ -61,7 +61,7 @@
 
 namespace TARGETING
 {
-
+Target* const MASTER_PROCESSOR_CHIP_TARGET_SENTINEL = reinterpret_cast<Target*>(~0);
 #define TARG_NAMESPACE "TARGETING::"
 
 #define TARG_CLASS "targetService"
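The patch moves each definition out of a header and leaves only an `extern` declaration behind, so files that include the header no longer each get their own `static` copy. A minimal single-file sketch of that pattern, assuming hypothetical stand-ins: the enums, `getNestFreq()` and the 2000 value are illustrative only, not Hostboot's real types or attribute lookup. The comments mark which part would go in the header and which in exactly one .C file.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for AttributeTank::TankLayer and PNOR section IDs.
enum TankLayer { TANK_LAYER_FAPI, TANK_LAYER_TARG, TANK_LAYER_PERM, TANK_LAYER_LAST };
enum SectionId { ATTR_TMP, ATTR_PERM };

// ---- Would live in the header ----
// Declarations only: no storage is created here, so a file that includes the
// header but never uses these names no longer owns an unused static copy.
extern const std::pair<TankLayer, SectionId> tankLayerToPnor[TANK_LAYER_LAST];
extern uint64_t g_nestFreqMhz;
uint64_t getNestFreq();

// ---- Would live in exactly one .C file ----
// The earlier extern declaration gives this const array external linkage,
// so the whole program shares one definition.
const std::pair<TankLayer, SectionId> tankLayerToPnor[TANK_LAYER_LAST] = {
    std::make_pair(TANK_LAYER_FAPI, ATTR_TMP),
    std::make_pair(TANK_LAYER_TARG, ATTR_TMP),
    std::make_pair(TANK_LAYER_PERM, ATTR_PERM),
};

// Placeholder for the real attribute lookup; 2000 is an arbitrary value.
uint64_t getNestFreq() { return 2000; }

// Dynamically initialized once, in this file only, instead of once per
// translation unit that includes the header.
uint64_t g_nestFreqMhz = getNestFreq();
```

This only illustrates the linkage change; it does not explain the checkstop seen with the patched image.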

about XSCOM

Hi williamspatrick,
I booted op-build 1.7 and there is an error message during IPL.
Can you give me some advice? Thanks.
What does UserData1 mean? Can you give me some material on XSCOM? I don't understand the function and theory of XSCOM.
UserData1 HMER value (piberr in bits 21:23) : 0x00c0040000000000

In the spec, bits 21:23 are HMER0_XSCOM_STATUS:
000 = no error
001 = XSCOM blocked due to pending error
010 = chiplet offline
011 = partial good
100 = invalid address / address error
101 = clock error
110 = parity err
111 = time out
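To check UserData1 against that table, the 3-bit field has to be extracted from the 64-bit HMER value. A minimal sketch, assuming the POWER documentation convention that bit 0 is the most significant bit of the register; `extractIbmBits` and `xscomStatusName` are hypothetical helpers, not Hostboot functions. Under that assumption, 0x00c0040000000000 has bit 21 set and bits 22:23 clear, which is 0b100.

```cpp
#include <cstdint>
#include <string>

// POWER documentation numbers register bits MSB-first: bit 0 is the most
// significant bit of a 64-bit value. Returns the inclusive field
// [first, last] under that convention (field width must be below 64 bits).
constexpr uint64_t extractIbmBits(uint64_t value, unsigned first, unsigned last) {
    return (value >> (63 - last)) & ((uint64_t{1} << (last - first + 1)) - 1);
}

// Translate HMER bits 21:23 (HMER0_XSCOM_STATUS) into the spec's wording.
inline std::string xscomStatusName(uint64_t hmer) {
    static const char* const kNames[8] = {
        "no error",
        "XSCOM blocked due to pending error",
        "chiplet offline",
        "partial good",
        "invalid address / address error",
        "clock error",
        "parity error",
        "time out",
    };
    return kNames[extractIbmBits(hmer, 21, 23)];
}
```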

110.60816|================================================
110.64730|Error reported by xscom (0x0400)
110.65466| XSCom access error
110.65467| ModuleId 0x07 XSCOM_DO_OP
110.65467| ReasonCode 0x0401 XSCOM_STATUS_ERR
110.65467| UserData1 HMER value (piberr in bits 21:23) : 0x00c0040000000000
110.65468| UserData2 XSCom address : 0x00000000006b0328
110.65468|User Data Section 0, type UD
110.65468| Subsection type 0x05
110.65468| ComponentId errl (0x0100)
110.65469|User Data Section 1, type UD
110.65469| Subsection type 0x15
110.65469| ComponentId hb-trace (0x3100)
110.65469|User Data Section 2, type UD
110.65470| Subsection type 0x04
110.65470| ComponentId errl (0x0100)
110.65470|User Data Section 3, type UD
110.65470| Subsection type 0x06
110.65471| ComponentId errl (0x0100)
110.65471| CALLOUT
110.65471| HW CALLOUT
110.65471| Reporting CPU ID: 8
110.65471| Called out entity:
110.65472|User Data Section 4, type UD
110.65472| Subsection type 0x06
110.65472| ComponentId errl (0x0100)
110.65472| CALLOUT
110.65472| PROCEDURE ERROR
110.65473| Procedure: 85
110.65473|User Data Section 5, type UD
110.65473| Subsection type 0x15
110.65473| ComponentId hb-trace (0x3100)
110.65474|User Data Section 6, type UD
110.65474| Subsection type 0x15
110.65474| ComponentId hb-trace (0x3100)
110.65474|User Data Section 7, type UD
110.65475| Subsection type 0x15
110.65475| ComponentId hb-trace (0x3100)
110.65475|User Data Section 8, type UD
110.65475| Subsection type 0x03
110.65476| ComponentId errl (0x0100)
110.65476|User Data Section 9, type UD
110.65476| Subsection type 0x01
110.65476| ComponentId errl (0x0100)
110.65476| STRING
110.65477| Hostboot Build ID:
110.65477|User Data Section 10, type UD
110.65477| Subsection type 0x04
110.65477| ComponentId errl (0x0100)
110.65478|================================================

System always shows "dumping errors" message

I have a Palmetto machine and it appears that every time I boot I see this on the console:

  2.16642|ERRL|Dumping errors reported prior to registration

Why does this happen?

Is there something I can do to fix it?

Thanks!

Document procedure callouts better.

The procedure callouts are not documented at all beyond their enumeration names. IBM typically uses these as part of our system support documentation. It would be useful for OpenPOWER partners to create similar documentation for their boards and systems, so we need to document the callouts better in Hostboot.

Document attributeOverride tool.

The attribute override text file format is not documented in any publicly available document. Please create a README under Hostboot and/or open-power/docs.

Uncorrectable ECC error after user initiated shutdown

See https://github.com/open-power/tyan-openpower/issues/49 for the history.

I was able to reproduce this pretty easily:

** Booted, waited until I saw "PNOR_PN/SN" traces in the console, then powered off via IPMI
target 00040001
  6.43127|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 0003000A
  6.46956|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00040001
  6.47407|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 0003000B
  for target 00040001
  6.55746|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 0003000D
   6.59989|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 0003000E
IsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 0003000F
  6.65167|VPD|>IpVpdFacade::findRecordOffsetPnor: No matching Record (VINI) found in TOC!
  6.65168|ERRL|>>addHwCallout(0x00040002 0x6 0x1 0x0)
  6.65168|ERRL|>>addProcedureCallout(0x4, 0x5)
  6.65169|ERRL|>>addProcedureCallout(0x55, 0x1)
  6.66640|VPD|E>IpVpdFacade::findRecordOffsetPnor: No matching Record (VINI) found in TOC!
  6.66640|ERRL|>>addHwCallout(0x00040002 0x6 0x1 0x0)
  6.66641|ERRL|>>addProcedureCallout(0x4, 0x5)
  6.66641|ERRL|>>addProcedureCallout(0x55, 0x1)
  6.66643|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN != SEEPROM_PN/SN, Loading PNOR from SEEPROM for target 00040002
  6.67729|PNOR|>>PnorDD::writeFlash(i_address=0x172000)>
  6.79018|PNOR|<<PnorDD::
  6.79864|VPD|E>IpVpdFacade::findKeywordAddr: No matching PT keyword found within VTOC record!
  6.79865|ERRL|>>addHwCallout(0x00040002 0x6 0x1 0x0)
  6.79865|ERRL|>>addProcedureCallout(0x4, 0x5)
  6.79866|ERRL|>>addProcedureCallout(0x55, 0x1)
0NR|>>PnorDD::writeFlashss01  7.1176ess=0x173200)> io_buflen=00001200
7.14732|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00030010
 00 for target 00030011
  7.23656|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00040002
  7.24078|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00030012
  7.27902|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00040002
  7.28192|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN  7.3209nsureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00030014
  7.36327|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00040002
  7.36622|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00030015
::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00030016
  7.44855|VPDreCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00040002
  7.45157|VPD|VPD::ensureCacheIsInSync: PNOR_PN/SN = SEEPROM_PN/SN for target 00030017
  7.46169|VPD|E>IpVpdFacade::findRecordOffsetPnor: No matching Record (VINI) found in TOC!
  7.46170|ERRL|>>addHwCallout(0x00040003 0x6 0x1 0x0)
  7.46171|ERRL|>>addProcedureCallout(0x4, 0x5)
  7.46171|ERRL|>>addProcedureCallout(0x55, 0x1)
  7.47164|VPD|E>IpVpdFacade::findRecordOffsetPnor: No matching Record (VINI) found in TOC!
  7.47165|ERRL|>>addHwCallout(0x00040003 0x6 0x1 0x0)
  7.47165|ERRL|>>addProcedureCallout(0x4PROM for target 00040003
  7.48210|PNOR|>>PnorDD::writeFlash(i_address=0x174400)>
  7.59941|PNOR|<PnorDD::writeFlash(i_a io_buflen=00001200
  7.61239|VPD|E>IpVpdFacade::findKeywordAddrut(0x4, 0x5)
  7.61241|ERRL|>>addProcedureCallout(0x55, 0x1)
n::d44

Then
[bofferdn@gfw161 ~]$ $IPMI chassis power off
[bofferdn@gfw161 ~]$   $IPMI raw 0x04 0x30 0x50 0x01 0x00 0x02 0 0 0 0 0   # make sure the boot count is 2
[bofferdn@gfw161 ~]$ $IPMI chassis power on

After booting, it came back and failed with the signature we're familiar with:
 7.29852|PNOR|PnorRP::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x173200
The failing address is somewhat near the addresses being accessed during the initial power interruption. I think that helps bolster our theory that PNOR pages are being wiped without ECC and the power-off request is intervening before the ECC can be rewritten.
It's more concerning because this was a user-initiated action.

Patrick noted: "The close time proximity, and the fact that there are no traces between the 'doShutdown' call and the IPMI-RP trace, indicate to me that there is a bug in this shutdown path where the IPMI-RP is sending the shutdown message to the BMC in the first callback instead of the second.

There is one shutdown callback to the IPMI-RP that tells it 'stop accepting new messages and flush your current ones'. Then we do the memory flush. Then there is a second callback to tell the IPMI-RP 'you are OK to ack the BMC now for that previous shutdown request'."
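Patrick's description amounts to a two-phase handshake. A minimal model of the ordering he describes, with hypothetical names (`ShutdownPhase`, `IpmiRpModel`, `runShutdown`) that are not Hostboot's real interfaces: the BMC acknowledgement is only emitted from the second callback, after the memory flush. The suspected bug corresponds to emitting it from the first branch instead.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the two shutdown callbacks; not Hostboot's real API.
enum class ShutdownPhase { StopAndFlush, AckBmc };

class IpmiRpModel {
public:
    explicit IpmiRpModel(std::vector<std::string>& log) : log_(log) {}

    void onShutdownCallback(ShutdownPhase phase) {
        if (phase == ShutdownPhase::StopAndFlush) {
            // First callback: stop accepting new messages and flush queued
            // ones. Acknowledging the BMC here is the suspected bug.
            accepting_ = false;
            log_.push_back("ipmi: stop accepting, flush queue");
        } else {
            // Second callback: memory has been flushed, so the BMC may now
            // be told to go ahead with the power off.
            log_.push_back("ipmi: ack BMC shutdown request");
        }
    }

    bool accepting() const { return accepting_; }

private:
    std::vector<std::string>& log_;
    bool accepting_ = true;
};

// Drives the intended order: first callback, memory flush, second callback.
inline std::vector<std::string> runShutdown() {
    std::vector<std::string> log;
    IpmiRpModel ipmi(log);
    ipmi.onShutdownCallback(ShutdownPhase::StopAndFlush);
    log.push_back("memory flush");
    ipmi.onShutdownCallback(ShutdownPhase::AckBmc);
    return log;
}
```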

[FATAL!] Could not find enumeration with ID of PCI_BASE_ADDRS_64

Hi,
I am running op-build for Habanero on a Fedora 23 distro.
After installing various missing Perl packages such as XML::Simple, XML::Parser and XML::SAX, I was able to get the hostboot build to proceed.
After that, when it started building openpower-mrw, it failed with the following error:

Creating XML: /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_mrw_scratch/HABANERO_hb.mrw.xml
MRW created successfully!

merge in any system specific attributes, hostboot attributes

/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/mergexml.sh /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_mrw_scratch/"HABANERO_hb.system.xml" /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/attribute_types.xml /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/attribute_types_hb.xml /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/target_types_merged.xml /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/target_types_hb.xml /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_mrw_scratch/"HABANERO_hb.mrw.xml" > /home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/temporary_hb.hb.xml;

creating the targeting binary

/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/xmltohb.pl --hb-xml-file=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/temporary_hb.hb.xml --fapi-attributes-xml-file=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/fapiattrs.xml --src-output-dir=none --img-output-dir=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/ --vmm-consts-file=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/vmmconst.h --noshort-enums --bios-xml-file=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_mrw_scratch/"HABANERO_bios.xml" --bios-schema-file=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/bios.xsd --bios-output-file=/home/nayna/Project/Code/16022016next/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/"habanero"_bios_metadata.xml
Warning: XML::LibXML compiled against libxml2 20903, but runtime libxml2 is older 20902
[FATAL!] Could not find enumeration with ID of PCI_BASE_ADDRS_64

 1: main::getEnumerationType(1222)
 2: main::packSingleSimpleTypeAttribute(4673)
 3: main::packAttribute(4746)
 4: main::generateTargetingImage(5453)
 5: (369)

hostboot fails to build in op-build with Fedora 20

I have a problem when building hostboot from op-build on Fedora 20.

I get these errors. Is it a problem in hostboot or in the build environment?
...
In file included from ../../../../obj/modules/testhwas/testhwas.C:18:0:
./hwasGardTest.H:48:28: error: ‘TargetInfo’ does not name a type
bool compareAffinity(const TargetInfo t1, const TargetInfo t2)
^
./hwasGardTest.H:48:39: error: ISO C++ forbids declaration of ‘t1’ with no type [-fpermissive]
bool compareAffinity(const TargetInfo t1, const TargetInfo t2)
^
./hwasGardTest.H:48:49: error: ‘TargetInfo’ does not name a type
bool compareAffinity(const TargetInfo t1, const TargetInfo t2)
^
./hwasGardTest.H:48:60: error: ISO C++ forbids declaration of ‘t2’ with no type [-fpermissive]
bool compareAffinity(const TargetInfo t1, const TargetInfo t2)
^
./hwasGardTest.H: In function ‘bool compareAffinity(int, int)’:
./hwasGardTest.H:50:19: error: request for member ‘affinityPath’ in ‘t1’, which is of non-class type ‘const int’
return t1.affinityPath < t2.affinityPath;
^
./hwasGardTest.H:50:37: error: request for member ‘affinityPath’ in ‘t2’, which is of non-class type ‘const int’
return t1.affinityPath < t2.affinityPath;
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc1()’:
./hwasGardTest.H:3388:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3388:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3389:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3390:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargetInfo;
^
./hwasGardTest.H:3390:20: error: expected ‘;’ before ‘l_TargetInfo’
TargetInfo l_TargetInfo;
^
./hwasGardTest.H:3392:9: error: ‘l_TargetInfo’ was not declared in this scope
l_TargetInfo.pThisTarget = NULL;
^
./hwasGardTest.H:3400:9: error: ‘l_targets’ was not declared in this scope
l_targets.push_back(l_TargetInfo);
^
./hwasGardTest.H:3402:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3402:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc2()’:
./hwasGardTest.H:3470:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3470:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3471:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3471:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3473:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.pThisTarget = NULL;
^
./hwasGardTest.H:3475:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:3477:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3515:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3515:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc3()’:
./hwasGardTest.H:3551:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3551:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3552:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3552:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3554:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:3555:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:3557:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3613:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3613:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc4()’:
./hwasGardTest.H:3650:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3650:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3651:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3651:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3653:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:3654:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:3656:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3671:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3671:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc5()’:
./hwasGardTest.H:3707:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3707:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3708:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3708:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3710:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:3711:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
LD libintr.so
./hwasGardTest.H:3713:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3730:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3730:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc6()’:
./hwasGardTest.H:3766:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3766:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3767:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3767:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3769:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:3770:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:3772:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3791:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3791:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc7()’:
./hwasGardTest.H:3826:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3826:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3827:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3827:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
DEP proc_dmi_scominit.C
./hwasGardTest.H:3829:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:3830:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:3832:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3872:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3872:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc8()’:
./hwasGardTest.H:3907:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:3907:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:3908:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3908:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:3910:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:3911:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:3913:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:3989:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:3989:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H: In member function ‘void HwasGardTest::testdeconfigPresentByAssoc9()’:
./hwasGardTest.H:4029:9: error: ‘TargetInfoVector’ was not declared in this scope
TargetInfoVector l_targets;
^
./hwasGardTest.H:4029:26: error: expected ‘;’ before ‘l_targets’
TargetInfoVector l_targets;
^
./hwasGardTest.H:4030:9: error: ‘TargetInfo’ was not declared in this scope
TargetInfo l_TargInfo;
^
./hwasGardTest.H:4030:20: error: expected ‘;’ before ‘l_TargInfo’
TargetInfo l_TargInfo;
^
./hwasGardTest.H:4032:9: error: ‘l_TargInfo’ was not declared in this scope
l_TargInfo.affinityPath = l_ep[0];
^
./hwasGardTest.H:4033:9: error: ‘l_targets’ was not declared in this scope
l_targets.insert(l_targets.begin(), NUM_TARGS, l_TargInfo);
^
./hwasGardTest.H:4035:26: error: expected ‘;’ before ‘l_targToDeconfig’
TargetInfoVector l_targToDeconfig;
^
./hwasGardTest.H:4087:35: error: ‘l_targToDeconfig’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
./hwasGardTest.H:4087:51: error: ‘presentByAssoc’ was not declared in this scope
presentByAssoc(l_targets, l_targToDeconfig);
^
MAKE test CODE
./hwasGardTest.H: In function ‘bool compareAffinity(int, int)’:
./hwasGardTest.H:51:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
TESTGEN testintr.C
cc1plus: all warnings being treated as errors
make[10]: *** [../../../../obj/modules/testhwas/testhwas.o] Error 1
make[9]: *** [_BUILD/PASSES/CODE/BODY] Error 2
make[8]: *** [_BUILD/SUBDIR/CODE/test] Error 2
make[7]: *** [_BUILD/PASSES/CODE/POST] Error 2
make[6]: *** [_BUILD/SUBDIR/CODE/hwas] Error 2

attributeOverride / bios section questions.

  1. How does the attribute code handle an attribute whose size changes between two firmware levels? If an attribute is a 1-byte BIOS setting and later needs to grow to 2 bytes, what happens? Will the ATTR_PERM parsing code handle this automatically?
  2. How does the attribute code handle an attribute that has been deleted? If an attribute is a BIOS setting in one firmware level and is then deleted entirely, what is the behavior? Does the attribute code crash, create an error log, or silently ignore the extra attribute?

Need to add more FFDC

Customers have recently reported a PRD error. PRD failed because targets that existed on the previous IPL no longer existed during the checkstop analysis. However, not enough FFDC was recorded for this failure. The logs have been shared with Daniel and Zane. We may need to modify the code to add more FFDC.

MVPD code does not account for more than 2 PT keywords in VTOC record with VPD cache enablement

Dean found the root cause and provided the following patch:

--- a/src/usr/vpd/ipvpd.C
+++ b/src/usr/vpd/ipvpd.C
@@ -1276,7 +1276,7 @@ IpVpdFacade::getRecordListSeeprom ( std::list & o_recList,
       offset = le16toh( toc_rec->record_offset ) + 1; // skip 'large resource'

       // Read the PT keyword(s) from the VTOC
-      for (uint16_t index = 0; index < 2; ++index)
+      for (uint16_t index = 0; index < 3; ++index)
       {
           pt_len = sizeof(l_buffer);
           err = retrieveKeyword( "PT",
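Dean's patch raises a fixed loop bound from 2 to 3, which would have to change again if a VTOC ever held a fourth PT keyword. A hedged sketch of a bound-free alternative, under the simplifying assumption that the VTOC can be viewed as a flat list of keyword entries; `Keyword` and `collectPtKeywords` are hypothetical and do not model the real IPVPD byte layout or `retrieveKeyword()`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a VTOC record: a flat list of
// (keyword, data) entries. Real IPVPD parsing works on raw bytes.
struct Keyword {
    std::string name;
    std::string data;
};

// Collect every PT keyword instead of assuming a fixed count: the loop runs
// to the end of the record, so no upper limit is baked in.
inline std::vector<std::string> collectPtKeywords(const std::vector<Keyword>& vtoc) {
    std::vector<std::string> pts;
    for (const Keyword& kw : vtoc) {
        if (kw.name == "PT") {
            pts.push_back(kw.data);
        }
    }
    return pts;
}
```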

about hostboot trace log

@cooktail commented on Mon Sep 12 2016

Hi ,
When I unset CONSOLE_OUTPUT_TRACE, the hostboot IPL log is concise, but I cannot see any trace messages.
Could hostboot be designed to print a concise log to SOL while letting me check the detailed trace messages somewhere else?

op-build fails to build pnor image when enabling console tracing in hostboot code

I tried building a PNOR image with console tracing enabled in the hostboot code, and it fails with the error below:
"""ERROR: PnorUtils::checkSpaceConstraints: Image provided (/home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_extended.header.bin.ecc) has size (11796480) which is greater than allocated space (5898240) for section=HBI. Aborting! at /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/PnorUtils.pm line 352."""
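The two sizes in the error can be compared directly. A sketch of the arithmetic, assuming PNOR's ECC format stores one ECC byte for every 8 data bytes (so an .ecc file is 9/8 the size of its payload); `eccToPayload` and the constant names are illustrative only.

```cpp
#include <cstdint>

// Assumption: PNOR ECC stores one ECC byte per 8 data bytes, so an .ecc
// image is 9/8 the size of its payload.
constexpr uint64_t eccToPayload(uint64_t eccBytes) { return eccBytes / 9 * 8; }

constexpr uint64_t kMiB = 1024 * 1024;

// Sizes copied from the PnorUtils::checkSpaceConstraints error message.
constexpr uint64_t kHbiImageEcc = 11796480;   // hostboot_extended.header.bin.ecc
constexpr uint64_t kHbiSectionEcc = 5898240;  // space allocated for section HBI

static_assert(kHbiImageEcc == 2 * kHbiSectionEcc, "image is exactly twice the section");
static_assert(eccToPayload(kHbiImageEcc) == 10 * kMiB, "10 MiB of payload");
static_assert(eccToPayload(kHbiSectionEcc) == 5 * kMiB, "5 MiB of capacity");
```

If the ECC assumption holds, enabling console tracing produced a 10 MiB HBI payload for a section sized for 5 MiB, so it would not fit without enlarging HBI in the PNOR layout or shrinking the image.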

op-build commit id:43882ebce3063d3b1c9353f598a61f4fa557dd71

#########################/home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images//buildpnor.pl --pnorOutBin /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/pnor/firestone.pnor --pnorLayout /home/ubuntu/tracing/op-build/output/build/openpower-pnor-cdfe37976dae7d3171ce9b999cf91f1b5d80e9cc/defaultPnorLayoutWithGoldenSide.xml --binFile_HBD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//FIRESTONE_HB.targeting.bin.ecc --binFile_SBE /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//venice_sbe.img.ecc --binFile_SBEC /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//centaur_sbec_pad.img.ecc --binFile_WINK /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//p8.ref_image.hdr.bin.ecc --binFile_HBB /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot.header.bin.ecc --binFile_HBI /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_extended.header.bin.ecc --binFile_HBRT /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_runtime.header.bin.ecc --binFile_HBEL /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hbel.bin.ecc --binFile_GUARD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//guard.bin.ecc --binFile_PAYLOAD /home/ubuntu/tracing/op-build/output/images/skiboot.lid.xz --binFile_BOOTKERNEL /home/ubuntu/tracing/op-build/output/images/zImage.epapr --binFile_NVRAM 
/home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//nvram.bin --binFile_MVPD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//mvpd_fill.bin.ecc --binFile_DJVPD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//djvpd_fill.bin.ecc --binFile_CVPD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cvpd.bin.ecc --binFile_ATTR_TMP /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_tmp.bin.ecc --binFile_ATTR_PERM /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_perm.bin.ecc --binFile_OCC /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/occ/occ.bin.ecc --binFile_FIRDATA /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//firdata.bin.ecc --binFile_CAPP /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cappucode.bin.ecc --binFile_SECBOOT /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//secboot.bin.ecc --binFile_VERSION /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_version/openpower-pnor.version.txt --fpartCmd "fpart" --fcpCmd "fcp"

TRACE: PnorUtils::loadPnorLayout: metadata: imageSize = 67108864, blockSize=4096, arrangement = A-D-B, numOfSides: 2, sideSize: 33554432, tocSize: 32768
TRACE: A-D-B: side:A HBB:32899072, primaryTOC:0, backupTOC:33521664, golden: no
TRACE: A-D-B: side:B HBB:66482176, primaryTOC:33554432, backupTOC:67076096, golden: yes
ERROR: PnorUtils::checkSpaceConstraints: Image provided (/home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_extended.header.bin.ecc) has size (11796480) which is greater than allocated space (5898240) for section=HBI. Aborting! at /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/PnorUtils.pm line 352.
Error running command: /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images//buildpnor.pl --pnorOutBin /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/pnor/firestone.pnor --pnorLayout /home/ubuntu/tracing/op-build/output/build/openpower-pnor-cdfe37976dae7d3171ce9b999cf91f1b5d80e9cc/defaultPnorLayoutWithGoldenSide.xml --binFile_HBD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//FIRESTONE_HB.targeting.bin.ecc --binFile_SBE /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//venice_sbe.img.ecc --binFile_SBEC /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//centaur_sbec_pad.img.ecc --binFile_WINK /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//p8.ref_image.hdr.bin.ecc --binFile_HBB /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot.header.bin.ecc --binFile_HBI /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_extended.header.bin.ecc --binFile_HBRT /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hostboot_runtime.header.bin.ecc --binFile_HBEL /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//hbel.bin.ecc --binFile_GUARD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//guard.bin.ecc --binFile_PAYLOAD /home/ubuntu/tracing/op-build/output/images/skiboot.lid.xz --binFile_BOOTKERNEL /home/ubuntu/tracing/op-build/output/images/zImage.epapr --binFile_NVRAM 
/home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//nvram.bin --binFile_MVPD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//mvpd_fill.bin.ecc --binFile_DJVPD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//djvpd_fill.bin.ecc --binFile_CVPD /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cvpd.bin.ecc --binFile_ATTR_TMP /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_tmp.bin.ecc --binFile_ATTR_PERM /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//attr_perm.bin.ecc --binFile_OCC /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/occ/occ.bin.ecc --binFile_FIRDATA /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//firdata.bin.ecc --binFile_CAPP /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//cappucode.bin.ecc --binFile_SECBOOT /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//secboot.bin.ecc --binFile_VERSION /home/ubuntu/tracing/op-build/output/host/usr/powerpc64le-buildroot-linux-gnu/sysroot/openpower_version/openpower-pnor.version.txt --fpartCmd "fpart" --fcpCmd "fcp". Nonzero return code of (65280) returned.
make[1]: *** [/home/ubuntu/tracing/op-build/output/build/openpower-pnor-cdfe37976dae7d3171ce9b999cf91f1b5d80e9cc/.stamp_images_installed] Error 255
make: *** [_all] Error 2
make: Leaving directory `/home/ubuntu/tracing/op-build/buildroot'
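
The two sizes in the error differ by exactly a factor of two. Assuming the `.ecc` images carry one ECC byte per 8 data bytes (a 9/8 expansion), the arithmetic below recovers the raw sizes: the traced HBI image holds 10 MiB of data, while the HBI section allows only 5 MiB. The helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: every 8 data bytes in a PNOR .ecc image are followed by 1 ECC byte.
constexpr std::uint64_t withEcc(std::uint64_t rawBytes) { return rawBytes / 8 * 9; }
constexpr std::uint64_t withoutEcc(std::uint64_t eccBytes) { return eccBytes / 9 * 8; }

// The build aborts when the image is larger than the space allocated to the section.
constexpr bool fitsInSection(std::uint64_t imageBytes, std::uint64_t sectionBytes)
{
    return imageBytes <= sectionBytes;
}

constexpr std::uint64_t kMiB = 1024 * 1024;
static_assert(withoutEcc(11796480) == 10 * kMiB, "traced HBI image: 10 MiB of data");
static_assert(withoutEcc(5898240) == 5 * kMiB, "HBI section: 5 MiB of data");
static_assert(!fitsInSection(11796480, 5898240), "matches the PnorUtils abort");
```

In other words, the traced image needs twice the space that this PNOR layout allocates to HBI.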

tracehash discards .group sections, can't link with gcc-4.9

Current hostboot won't build with gcc-4.9:

/home/jk/IBM/openpower/op-build/output/host/usr/bin/powerpc64le-buildroot-linux-gnu-ld -shared -z now -x -melf64ppc --nostdlib --sort-common -O3 -nostdlib --nostdlib --sort-common -O3 -nostdlib --nostdlib --sort-common -O3 -nostdlib --nostdlib --sort-common -O3 -nostdlib --nostdlib --sort-common -O3 -nostdlib --nostdlib --sort-common -O3 -nostdlib --nostdlib --sort-common -fPIC -Bsymbolic -Bsymbolic-functions -O3 -nostdlib --nostdlib --sort-common -fPIC -Bsymbolic -Bsymbolic-functions -O3 -nostdlib \
           ../../../obj/modules/trace/interface.o ../../../obj/modules/trace/service.o ../../../obj/modules/trace/compdesc.o ../../../obj/modules/trace/buffer.o ../../../obj/modules/trace/bufferpage.o ../../../obj/modules/trace/daemonif.o ../../../obj/modules/trace/debug.o ../../../obj/modules/trace/assert.o ../../../obj/core/module_init.o \
               -T ../../../src/module.ld -o ../../../img/libtrace.so
../../../obj/modules/trace/service.o:(.bss._ZGVZN9SingletonIN5TRACE13ComponentListEE8instanceEvE8instance+0x0): multiple definition of `guard variable for Singleton<TRACE::ComponentList>::instance()::instance'
../../../obj/modules/trace/interface.o:(.bss._ZGVZN9SingletonIN5TRACE13ComponentListEE8instanceEvE8instance+0x0): first defined here
../../../obj/modules/trace/service.o:(.bss._ZZN9SingletonIN5TRACE13ComponentListEE8instanceEvE8instance+0x0): multiple definition of `Singleton<TRACE::ComponentList>::instance()::instance'
../../../obj/modules/trace/interface.o:(.bss._ZZN9SingletonIN5TRACE13ComponentListEE8instanceEvE8instance+0x0): first defined here
../../../obj/modules/trace/service.o:(.bss._ZGVZN9SingletonIN5TRACE7ServiceEE8instanceEvE8instance+0x0): multiple definition of `guard variable for Singleton<TRACE::Service>::instance()::instance'
../../../obj/modules/trace/interface.o:(.bss._ZGVZN9SingletonIN5TRACE7ServiceEE8instanceEvE8instance+0x0): first defined here
../../../obj/modules/trace/service.o:(.bss._ZZN9SingletonIN5TRACE7ServiceEE8instanceEvE8instance+0x0): multiple definition of `Singleton<TRACE::Service>::instance()::instance'
../../../obj/modules/trace/interface.o:(.bss._ZZN9SingletonIN5TRACE7ServiceEE8instanceEvE8instance+0x0): first defined here

These duplicate templated symbols should have been unified during the link, but it looks like the tracehash tool drops the .group sections, which are required to coalesce these.

The input file to tracehash:

[jk@pablo trace]$ objdump -h service.o.trace 

service.o.trace:     file format elf64-big

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .group        00000008  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  1 .group        00000008  0000000000000000  0000000000000000  00000048  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  2 .group        00000008  0000000000000000  0000000000000000  00000050  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  3 .group        00000008  0000000000000000  0000000000000000  00000058  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  4 .group        00000008  0000000000000000  0000000000000000  00000060  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  5 .group        00000008  0000000000000000  0000000000000000  00000068  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  6 .text         00000000  0000000000000000  0000000000000000  00000070  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
[...]

and after tracehash has run:

[jk@pablo trace]$ objdump -h service.o

service.o:     file format elf64-big

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, DATA
[...]
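
The mangled names in the link errors (`_ZGVZN9SingletonIN5TRACE13ComponentListEE8instanceEvE8instance` and friends) are the guard variable and the object for a function-local static inside `Singleton<T>::instance()`. Below is a minimal sketch of that pattern, written for illustration rather than copied from hostboot's Singleton header; `TRACE::ComponentList` here is a hypothetical stand-in. Every translation unit that calls `instance()` emits its own copy of the static and its guard variable, each in a COMDAT (`.group`) section, and the linker keeps one copy per group. Once the `.group` sections are dropped, those copies become ordinary duplicate definitions, which matches the `multiple definition` errors above.

```cpp
#include <cassert>

// Illustrative Singleton template in the shape implied by the mangled names;
// hostboot's real header may differ.
template <typename T>
class Singleton
{
public:
    static T& instance()
    {
        // Function-local static in a template: GCC places this object and its
        // guard variable (_ZGVZ...) in COMDAT groups so duplicates can be merged.
        static T instance;
        return instance;
    }
};

// Hypothetical stand-in for TRACE::ComponentList.
namespace TRACE
{
    struct ComponentList
    {
        int registered = 0;
    };
}
```

Running `objdump -h` on the post-tracehash object and looking for `.group` entries, as done above, is a quick way to confirm whether a rewriting tool preserved them.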

Doesn't build with binutils 2.25

In my working tree for updating buildroot, we have the option of going to binutils 2.25 rather than a patched 2.24, but if we enable binutils 2.25, hostboot no longer builds.

I'll update this issue with more information as my tree stabilises.

Hostboot build fails with ENABLE_CHECKSTOP_ANALYSIS set in Palmetto config

When I set ENABLE_CHECKSTOP_ANALYSIS in the hostboot palmetto config, I get the following build failure:
exception caught: Could not find symbol _ZNK4PRDF17PnorFirDataReader7getScomEPN9TARGETING6TargetEmRm
make[4]: *** [../img/hbirt.bin] Error 255
make[4]: *** Waiting for unfinished jobs....
exception caught: Could not find symbol _ZNK4PRDF17PnorFirDataReader7getScomEPN9TARGETING6TargetEmRm
make[4]: *** [../img/hbirt_test.bin] Error 255
make[3]: *** [_BUILD/PASSES/IMAGE/BODY] Error 2
make[2]: *** [_BUILD/SUBDIR/IMAGE/src] Error 2
make[1]: *** [_BUILD/PASSES/IMAGE/POST] Error 2

My op-build is pointing to HB commit 3593853.
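
The missing symbol is easier to read when demangled: `_ZNK4PRDF17PnorFirDataReader7getScomEPN9TARGETING6TargetEmRm` is `PRDF::PnorFirDataReader::getScom(TARGETING::Target*, unsigned long, unsigned long&) const` on a 64-bit target. `c++filt` prints the same result. The sketch below does the same thing from C++ with the GCC/Clang `abi::__cxa_demangle` API, which is not available on every toolchain.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Demangle an Itanium-ABI symbol name; returns the input unchanged if it
// cannot be demangled. abi::__cxa_demangle allocates with malloc, so the
// result is released with std::free.
std::string demangle(const char* mangled)
{
    int status = 0;
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string(mangled);
}
```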

IPL failure with corrupt HBEL partition

(from #67)

Erase the HBEL partition without setting ECC:

pflash -P HBEL -e

Then, when you try to IPL, the boot fails:

  0.61819|ECC error in PNOR flash in section offset 0x00008000

  0.62322|System shutting down with error status 0x60F
  3.22583|Ignoring boot flags, incorrect version 0x0
  3.30757|ISTEP  6. 3
  1.09243|ECC error in PNOR flash in section offset 0x00008000

  1.09246|System shutting down with error status 0x60F
  3.68140|Ignoring boot flags, incorrect version 0x0
  3.76426|ISTEP  6. 3

Hostboot sends incomplete FRU data for CPU1

Seen on Barreleye; I have not checked other platforms.
Issue: Hostboot only sends the first chunk of VPD data for CPU1 (FRU_ID 2). Because the VPD data is incomplete, it is not written to the inventory.

BMC Traces:
** CPU0 (FRU_ID 1): the complete VPD data is sent in 2 chunks:

IPMI Incoming: Seq 0x0b, NetFn 0x0a, CMD: 0x12
0x000000: 01 00 00 01 00 00 01 00 00 00 fe 01 07 00 00 00 ................
0x000010: 00 c3 49 42 4d d0 50 52 4f 43 45 53 53 4f 52 20 ..IBM.PROCESSOR
0x000020: 4d 4f 44 55 4c 45 cc 59 41 33 39 33 32 36 34 37 MODULE.YA3932647
0x000030: 35 36 35 c7 30 30 55 4c 38 36 34 02             565.00UL864.
ipmi_storage_write_fru_data(): Start
IPMI WRITE-FRU-DATA for [/tmp/ipmifru01]  Offset = [0] Length = [57]
...
IPMI Incoming: Seq 0x0d, NetFn 0x0a, CMD: 0x12
0x000000: 01 39 00 20 20 00 00 c1 00 02                   .9.  .....
ipmi_storage_write_fru_data(): Start
IPMI WRITE-FRU-DATA for [/tmp/ipmifru01]  Offset = [57] Length = [7]

** CPU1 (FRU_ID 2) only receives the first chunk:

IPMI Incoming: Seq 0x0f, NetFn 0x0a, CMD: 0x12
0x000000: 02 00 00 01 00 00 01 00 00 00 fe 01 07 00 00 00 ................
0x000010: 00 c3 49 42 4d d0 50 52 4f 43 45 53 53 4f 52 20 ..IBM.PROCESSOR
0x000020: 4d 4f 44 55 4c 45 cc 59 41 33 39 33 32 36 34 37 MODULE.YA3932647
0x000030: 35 36 39 c7 30 30 55 4c 38 36 34 02             569.00UL864.
ipmi_storage_write_fru_data(): Start
IPMI WRITE-FRU-DATA for [/tmp/ipmifru02]  Offset = [0] Length = [57]
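
The dumps follow the IPMI Write FRU Data request layout (NetFn Storage 0x0A, command 0x12): the FRU device ID, a 2-byte little-endian offset, and then the data. For CPU0, the first request is `01 00 00` followed by 57 data bytes. The second request is `01 39 00` followed by 7 bytes (0x39 = 57), so the whole 64-byte record arrives. Below is a minimal sketch of that chunking. The 57-byte chunk size is inferred from the trace, not taken from hostboot's IPMI code, and `buildWriteFruChunks` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the request bodies for IPMI Write FRU Data (NetFn 0x0A, cmd 0x12).
// Each body is: [FRU device ID][offset LSB][offset MSB][data bytes...]
std::vector<std::vector<std::uint8_t>> buildWriteFruChunks(
    std::uint8_t fruId, const std::vector<std::uint8_t>& fruData, std::size_t maxDataPerChunk)
{
    assert(maxDataPerChunk > 0);  // a zero chunk size would never advance
    std::vector<std::vector<std::uint8_t>> chunks;
    for (std::size_t offset = 0; offset < fruData.size(); offset += maxDataPerChunk)
    {
        const std::size_t len = std::min(maxDataPerChunk, fruData.size() - offset);
        std::vector<std::uint8_t> req = {
            fruId,
            static_cast<std::uint8_t>(offset & 0xFF),
            static_cast<std::uint8_t>((offset >> 8) & 0xFF)};
        auto first = fruData.begin() + static_cast<std::ptrdiff_t>(offset);
        req.insert(req.end(), first, first + static_cast<std::ptrdiff_t>(len));
        chunks.push_back(req);
    }
    return chunks;  // every chunk must be sent, not just chunks[0]
}
```

For CPU1, the BMC should likewise have received a follow-up request at offset 0x39 (`02 39 00 ...`). The trace shows only the first request.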

Two-socket systems fail with a deconfigured slave processor.

8.50476|ISTEPS_TRACE|PRESENT> Proc[05]=C000000000000000
...
9.19614|HWAS_I|I>hwasPlatDeconfigGard.C: Get returning 1 GARD Records
9.21550|HWAS_I|I>deconfigGard.C: Deconfiguring Target 00050001, errlEid 0x90000015
...
9.23584|HWAS_I|I>deconfigGard.C: _deconfigureByAssoc BUS Peer: 000F0000
9.23585|HWAS_I|I>deconfigGard.C: Deconfiguring Target 000F0000, errlEid 0x90000015
...
19.51245|ISTEPS_TRACE|Running proc_a_x_pci_dmi_pll_initf HWP on target HUID 00050000
19.53177|FAPI|proc_a_x_pci_dmi_pll_initf.C:
Parameter1, start_XBUS=true
Parameter2, start_ABUS=false
Parameter3, start_PCIE=true
Parameter4, start_DMI=true
...
19.93265|FAPI_I|proc_start_clocks_chiplets.C: proc_start_clocks_chiplets: Entering ...
19.95199|XSCOM|E>XSCOM status error HMER: 00c0050000000000 ,XSComStatus = 5, Addr=8030006

It looks like, if the Abus unit is deconfigured (by association), we do not call PLL init for the Abus logic on the master processor, yet the very next istep still tries to touch that logic.

We do not see this problem on our multi-node system because we keep all the Abus units "configured" until much later in the IPL.
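
The failure comes down to two isteps disagreeing about the A-bus. The PLL init HWP was told `start_ABUS=false`, yet the following clock-start step still accessed that logic and hit an XSCOM status error. The sketch below is a hypothetical model of that invariant, not hostboot code; `BusPlan` and `planIsConsistent` are illustrative names.

```cpp
#include <cassert>

// Hypothetical model of what two consecutive isteps decide for one chip's A-bus.
struct BusPlan
{
    bool pllInitAbus;      // value passed as start_ABUS to the PLL init step
    bool touchesAbusLater; // whether a later step (e.g. clock start) accesses A-bus logic
};

// Invariant: a later step may only touch A-bus logic whose PLL was initialized.
constexpr bool planIsConsistent(const BusPlan& p)
{
    return p.pllInitAbus || !p.touchesAbusLater;
}

// The traced two-socket case: PLL init skipped, but the next istep touched the logic.
constexpr BusPlan kTracedCase{false, true};
static_assert(!planIsConsistent(kTracedCase), "the trace violates the invariant");
```

Keeping the A-bus units configured until later in the IPL, as the multi-node system does, corresponds to `{true, true}`, which satisfies the invariant.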

Hostboot is setting pa-features incorrectly

The other thing to check is whether you have MMU_FTR_CI_LARGE_PAGE:

$ dmesg|grep mmu_features
[ 0.000000] mmu_features = 0x7c000003

You can see 0x20000000 is set.

They have this:
root@ubuntu:/var/log# grep mmu_features dmesg*
dmesg:[ 0.000000] mmu_features = 0x5c000001

So it's not set.
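
The check above is a single bit test. `0x7c000003` has the 0x20000000 bit set and `0x5c000001` does not, because 0x7c and 0x5c differ in exactly that bit of the top byte. The sketch below uses the 0x20000000 value quoted above for `MMU_FTR_CI_LARGE_PAGE`; `kCiLargePage` and `hasCiLargePage` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Bit value for MMU_FTR_CI_LARGE_PAGE as quoted in the text above.
constexpr std::uint32_t kCiLargePage = 0x20000000;

constexpr bool hasCiLargePage(std::uint32_t mmuFeatures)
{
    return (mmuFeatures & kCiLargePage) != 0;
}

static_assert(hasCiLargePage(0x7c000003), "working system: bit set");
static_assert(!hasCiLargePage(0x5c000001), "reporter's system: bit clear");
```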

Looks like the bug is in hostboot. We have this in the device tree in Linux:

root@palm6-p1:~# lsprop /proc/device-tree/cpus/PowerPC,POWER8@20/ibm,pa-features
/proc/device-tree/cpus/PowerPC,POWER8@20/ibm,pa-features
00000006 00000000 000000f6 0000003f
000000c7 00000000 00000080 000000c0

Those are 32-bit values, but they should be 8-bit.

For OpenPOWER, that property is generated by hostboot, and skiboot just passes it through.

from:
https://github.com/open-power/hostboot/blob/master/src/usr/devtree/bld_devtree.C#L763

It looks like this will fix it in hostboot (totally untested)

diff --git a/src/usr/devtree/bld_devtree.C b/src/usr/devtree/bld_devtree.C
index ff5fce0..957f1e8 100644
--- a/src/usr/devtree/bld_devtree.C
+++ b/src/usr/devtree/bld_devtree.C
@@ -760,7 +760,7 @@ uint32_t bld_cpu_node(devTree * i_dt, dtOffset_t & i_parentNode,
      * of thread 0 of that core.
      */
 
-    uint32_t paFeatures[8] = { 0x6, 0x0, 0xf6, 0x3f, 0xc7, 0x0, 0x80, 0xc0 };
+    uint32_t paFeatures[2] = { 0x0600f63f, 0xc70080c0 };
     uint32_t pageSizes[4] = { 0xc, 0x10, 0x18, 0x22 };
     uint32_t segmentSizes[4] = { 0x1c, 0x28, 0xffffffff, 0xffffffff };
     uint32_t segmentPageSizes[] =

Dean, Patrick, does this seem like the right fix? There might be other features with the same problem, so that code might need an audit.

Mikey
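
`ibm,pa-features` is a byte string, but the original array puts each byte into its own 32-bit cell, which is why `lsprop` shows `00000006 00000000 000000f6 ...`. The proposed `paFeatures[2]` values are those same 8 bytes packed four per cell, with the most significant byte first. Stored big-endian (as the flattened device tree format requires, and as a big-endian host stores `uint32_t` natively), this yields the intended byte sequence. Below is a sketch of that packing; `packPaFeatures` is an illustrative name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack 8 property bytes into two 32-bit cells, first byte in the most
// significant position, so that big-endian storage reproduces the bytes in order.
std::array<std::uint32_t, 2> packPaFeatures(const std::array<std::uint8_t, 8>& bytes)
{
    std::array<std::uint32_t, 2> cells{};
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        cells[i / 4] |= static_cast<std::uint32_t>(bytes[i]) << (8 * (3 - i % 4));
    }
    return cells;
}
```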

HBRT infinite loop on ECC error during startup

Start opal-prd and observe this log before opal-prd gets stuck at 100% CPU.

HBRT: PRDF:>>PRDF::main() Global attnType=0004
HBRT: PRDF:>>PRDF::noLock_initialize() 
HBRT: PRDF:>>PegasusConfigurator::build()
HBRT: PRDF:<<PegasusConfigurator::build()
HBRT: PRDF:<<PRDF::noLock_initialize() 
HBRT: ERRL:>>ErrlManager::ErrlManager constructor.
HBRT: ERRL:iv_hiddenErrorLogsEnable = 0x0
HBRT: ERRL:>>setupPnorInfo
HBRT: PNOR:>>RtPnor::getSectionInfo
HBRT: PNOR:>>RtPnor::readFromDevice: i_offset=0x0, i_procId=0 sec=11 size=0x20000 ecc=1
HBRT: PNOR:RtPnor::readFromDevice: removing ECC...
HBRT: PNOR:RtPnor::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x0

(at which point everything stops with opal-prd chewing 100% CPU)

This turns out to be a fairly classic race: trying to log an error before everything has been initialized.

Consequently, opal-prd spins a core and is right off into the weeds.
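
The trace shows the ErrlManager constructor reading PNOR (`setupPnorInfo`) and hitting an uncorrectable ECC error, so the error-logging path itself has to log an error before it is ready. One generic way to break that cycle is to queue logs raised during initialization and commit them once initialization finishes. The class below is a hypothetical, simplified sketch of that pattern, not HBRT's ErrlManager or its actual fix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified error-log manager used only to illustrate deferral.
class ErrorLogManager
{
public:
    enum class State { Uninitialized, Initializing, Ready };

    // Simulates construction-time setup; readFails models an ECC error
    // raised while the manager is still initializing.
    void init(bool readFails)
    {
        state_ = State::Initializing;
        if (readFails)
        {
            commit("Uncorrectable ECC error during setup");  // re-entrant request
        }
        state_ = State::Ready;
        // Flush anything queued while initialization was in progress.
        for (const std::string& msg : pending_)
        {
            committed_.push_back(msg);
        }
        pending_.clear();
    }

    void commit(const std::string& msg)
    {
        if (state_ != State::Ready)
        {
            pending_.push_back(msg);  // defer instead of re-entering setup
            return;
        }
        committed_.push_back(msg);
    }

    const std::vector<std::string>& committed() const { return committed_; }
    State state() const { return state_; }

private:
    State state_ = State::Uninitialized;
    std::vector<std::string> pending_;
    std::vector<std::string> committed_;
};
```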

Error data from IPMI is not parsed correctly.

If IPMI logs an error, the console displays 'unknown' for all the fields that should be derived from the error log tags in the code (the /*@ ... */ block). Specifically, I saw this problem with this error in ipmibt.C:

            /* @errorlog tag
             * @errortype       ERRL_SEV_INFORMATIONAL
             * @moduleid        IPMI::MOD_IPMISRV_REPLY
             * @reasoncode      IPMI::RC_READ_EVENT_FAILURE
             * @userdata1       command of message
             * @userdata2       completion code
             * @devdesc         an async completion code was not CC_OK
             * @custdesc        Unexpected IPMI completion code from the BMC
             */

I suspect the reason for this is that the error comments have the stray "tag" in them after "errorlog" and it is throwing off one of the parsers we have.

We should either remove the stray tags or make sure that our parser code can handle any ASCII garbage after the 'errorlog' tag.
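
If we take the parser route, the key is to treat anything after `@errorlog` on the opening line as noise and to accept an optional space between `/*` and `@`. Below is a minimal C++ sketch using `std::regex`. It is not the parser hostboot uses, and `parseErrorlogTags` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Collect "@name value" pairs from an error-log comment block. The opening
// line may be "/*@errorlog" or "/* @errorlog", and any trailing text on it
// (such as a stray "tag") is ignored.
std::map<std::string, std::string> parseErrorlogTags(const std::string& block)
{
    static const std::regex opening(R"(/\*\s*@errorlog\b)");
    static const std::regex field(R"(^\s*\*\s*@(\w+)\s+(.*?)\s*$)");

    std::map<std::string, std::string> tags;
    std::istringstream in(block);
    std::string line;
    bool inBlock = false;
    std::smatch m;
    while (std::getline(in, line))
    {
        if (!inBlock)
        {
            inBlock = std::regex_search(line, opening);
            continue;  // whatever follows @errorlog on this line is ignored
        }
        if (line.find("*/") != std::string::npos)
        {
            break;  // end of the comment block
        }
        if (std::regex_match(line, m, field))
        {
            tags[m[1].str()] = m[2].str();
        }
    }
    return tags;
}
```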

palmetto: hostboot stuck at istep 8.6

After a power cycle, booting the palmetto board stopped at istep 8.6 with the firmware that came with the board. This only manifested itself once out of the five or six times I powered on the machine:

  3.04864|ISTEP  6. 3
  3.31803|ISTEP  6. 4
  3.31852|ISTEP  6. 5
  3.39041|ISTEP  6. 6
  3.45384|ISTEP  6. 7
  3.45469|ISTEP  6. 8
  3.49119|ISTEP  6. 9
  3.50965|ISTEP  6.10
  3.51005|ISTEP  6.11
  3.51084|ISTEP  6.12
  3.51141|ISTEP  6.13
  3.51185|ISTEP  7. 1
  3.55743|ISTEP  7. 2
  3.68424|ISTEP  7. 3
  3.70389|ISTEP  7. 4
  3.73991|ISTEP  7. 5
  3.90578|ISTEP  7. 6
  3.95356|ISTEP  7. 7
  3.95456|ISTEP  7. 8
  3.99539|ISTEP  7. 9
  3.99599|ISTEP  8. 1
  4.05983|ISTEP  8. 2
  4.06067|ISTEP  8. 3
  4.06123|ISTEP  8. 4
  4.07068|ISTEP  8. 5
  4.07160|ISTEP  8. 6

make infrastructure can't handle custom CFLAGS

I'm trying to compile hostboot with -mbig-endian by providing custom CFLAGS:

CROSS_PREFIX=/home/jk/IBM/openpower/op-build/output.tmp/host/usr/bin/powerpc64-buildroot-linux-gnu- CONFIG_FILE=/home/jk/IBM/openpower/op-build/openpower/configs/hostboot/palmetto.config HOST_BINUTILS_DIR=/home/jk/IBM/openpower/op-build/output/build/host-binutils-2.24/ HOST_PREFIX="" OPENPOWER_BUILD=1 BUILD_VERBOSE=1 CFLAGS=-mbig-endian make

but it looks like specifying CFLAGS= in the environment causes the make infrastructure to construct an invalid compiler command:

/home/jk/IBM/openpower/op-build/output.tmp/host/usr/bin/powerpc64-buildroot-linux-gnu-g++ -c  -mbig-endian -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -D__HOSTBOOT_MODULE=trace -fPIC -Bsymbolic -Bsymbolic-functions -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -D__HOSTBOOT_MODULE=trace -fPIC -Bsymbolic -Bsymbolic-functions -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -D__HOSTBOOT_MODULE=tracedaemon -fPIC -Bsymbolic -Bsymbolic-functions -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -D__HOSTBOOT_MODULE=tracedaemon -fPIC -Bsymbolic -Bsymbolic-functions -O3 -nostdlib -mcpu=power7 -nostdinc -g -mno-vsx -mno-altivec -Wall -Werror -mtraceback=no -pipe -ffunction-sections -fdata-sections -ffreestanding -nostdinc++ -fno-rtti -fno-exceptions -Wall -fuse-cxa-atexit daemon.C \
                -o ../../../../obj/modules/tracedaemon/daemon.o.trace -I../../../../src/include/usr -I../../../../src/include/ -I../../../../obj/genfiles -iquote .
<command-line>:0:0: error: "__HOSTBOOT_MODULE" redefined [-Werror]
<command-line>:0:0: note: this is the location of the previous definition
cc1plus: all warnings being treated as errors
It looks like the CFLAGS+= stanzas from the .env.mk files are being duplicated. If I run make -p, the top-level makefile constructs a valid CFLAGS, but the rest have duplicated components.
