omniosorg / illumos-omnios

This project forked from illumos/illumos-gate

Community developed and maintained version of the OS/Net consolidation

omnios illumos operating-system

illumos-omnios's Introduction

illumos-omnios

This is the OmniOS fork of the core illumos source tree.

Building

In general, and for least surprise, OmniOS must be built on an OmniOS system of the same version. See the build instructions for further information.

Contributing

The OmniOS fork is regularly updated with changes from the upstream illumos-gate. Contributions to OmniOS are also welcome via Github pull requests.

Contact

Contact details for OmniOS can be found on our web site.

Community

The illumos community is small but active. We welcome everybody who would like to use the software and participate in the community -- whether you have decades of experience in systems software, or you're just getting started; whether you work for a company that uses illumos, or you just find it personally interesting.

The Community guide includes details about Mailing Lists and IRC channels.

Licence

Most of the existing code is licensed under the CDDL and we expect new code will generally be under this licence as well. Modifications of existing code may not alter the original license terms. Integrations of code from upstream sources that use another open source license are generally permissible.

illumos-omnios's Issues

ELF vulnerability allowing non-privileged users to DoS a system locally

The following code in /usr/src/uts/common/exec/elf/elf.c does not sanity check the p_filesz of the PT_DYNAMIC segment. You would think that the call to ndyns = MIN(DYN_STRIDE, ndyns) would mitigate this, but ndyns overflows to a negative number so dynsize ends up being huge.

Solution: Make sure that dynamicphdr->p_filesz does not exceed the size of the file.

#define DYN_STRIDE      100
                for (i = 0; i < dynamicphdr->p_filesz;
                    i += sizeof (*dyn) * DYN_STRIDE) {
                        int ndyns = (dynamicphdr->p_filesz - i) / sizeof (*dyn);
                        size_t dynsize;

                        ndyns = MIN(DYN_STRIDE, ndyns);
                        dynsize = ndyns * sizeof (*dyn);

                        dyn = kmem_alloc(dynsize, KM_SLEEP);

                        if ((error = vn_rdwr(UIO_READ, vp, (caddr_t)dyn,
                            dynsize, (offset_t)(dynamicphdr->p_offset + i),
                            UIO_SYSSPACE, 0, (rlim64_t)0,
                            CRED(), &resid)) != 0) {
                                uprintf("%s: cannot read .dynamic section\n",
                                    exec_file);
                                goto out;
                        }
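
To illustrate the report, here is a minimal, standalone sketch that replays the arithmetic from the excerpt in user space. It is not the kernel code and not the actual fix: `excerpt_dynsize()`, `dyn_extent_ok()`, and the `entry_size` parameter (standing in for `sizeof (*dyn)`) are names made up for this example, and the offset check goes slightly beyond the proposed "p_filesz must not exceed the file size" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	DYN_STRIDE	100
#ifndef MIN
#define	MIN(a, b)	((a) < (b) ? (a) : (b))
#endif

/*
 * Replays the excerpt's arithmetic for the first loop iteration (i == 0)
 * and returns the size that would be passed to kmem_alloc().
 * entry_size stands in for sizeof (*dyn) (hypothetical parameter).
 */
static uint64_t
excerpt_dynsize(uint64_t p_filesz, uint64_t entry_size)
{
	assert(entry_size != 0);

	/* Same narrowing as "int ndyns = ..."; wraps on common compilers. */
	int ndyns = (int)(uint32_t)(p_filesz / entry_size);

	/* A negative ndyns is smaller than DYN_STRIDE, so MIN keeps it. */
	ndyns = MIN(DYN_STRIDE, ndyns);

	/* Converting the negative count to unsigned yields a huge size. */
	return ((uint64_t)ndyns * entry_size);
}

/*
 * Hypothetical check in the spirit of the proposed solution: the
 * PT_DYNAMIC file extent must fit inside the file. Written so that
 * p_offset + p_filesz is never computed and cannot overflow.
 */
static bool
dyn_extent_ok(uint64_t p_offset, uint64_t p_filesz, uint64_t file_size)
{
	if (p_filesz > file_size)
		return (false);
	return (p_offset <= file_size - p_filesz);
}
```

With a 16-byte entry, a `p_filesz` of 0x800000000 makes the entry count wrap to a negative `int`, which `MIN` leaves untouched; the extent check would reject such a header before any allocation is attempted.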

Panic during install of r151040g on Windows 11 Hyper-V

Host: Windows 11 Pro; Version 21H2; OS build 22000.527
Installation Media: omnios-r151040g.iso
VM Settings: Generation 2; Secure Boot disabled; Memory 8192 MB; 2 Virtual processors

To reproduce: start the VM and wait. The "Welcome to the OmniOS installer" screen is displayed; the countdown finishes and the system panics almost immediately.

This is a screen capture of the tail end of the boot log:

I have attached a named pipe to COM1 using:
Set-VMComPort -VMName atelier2 -Path \\.\pipe\atelier2-com1 -Number 1
However, I am unfamiliar with the techniques to capture the serial console output.

The Hypervisor scheduler type is 0x4 (see below). This is the root scheduler, which is the default on my system.

Get-WinEvent -FilterHashTable @{ProviderName="Microsoft-Windows-Hyper-V-Hypervisor"; ID=2} | select -First 1

   ProviderName: Microsoft-Windows-Hyper-V-Hypervisor

TimeCreated                     Id LevelDisplayName Message
-----------                     -- ---------------- -------
26/02/2022 15:10:42              2 Information      Hypervisor scheduler type is 0x4.

Happy to investigate further or provide more information.

Note, the following installation media can be installed successfully on the same VM:

Fedora-Workstation-Live-x86_64-35-1.2.iso (Secure Boot enabled)
ubuntu-21.10-desktop-amd64.iso (Secure Boot enabled)
FreeBSD-13.0-RELEASE-amd64-dvd1.iso (Secure Boot disabled)

clone -m copy broken for lipkg|ipkg

# zoneadm -z zone1 clone -m copy zone0
/usr/lib/brand/ipkg/clone: -m: unknown option

brand-specific usage: clone {sourcezone}
usage:  clone [-m method] [-s <ZFS snapshot>] [brand-specific args] zonename
        Clone the installation of another zone.  The -m option can be used to
        specify 'copy' which forces a copy of the source zone.  The -s option
        can be used to specify the name of a ZFS snapshot that was taken from
        a previous clone command.  The snapshot will be used as the source
        instead of creating a new ZFS snapshot.  All other arguments are passed
        to the brand clone function; see brands(5) for more information.

Hyper-V 18343 causing system to hang (r151030 LTS)

The system hangs on "Configuring devices" on Hyper-V 18343, in a Gen1 machine. After enabling verbose boot, it looks like the Hyper-V interface is recognized but then it just stops.
I've described the issue in more detail in this article.

Here is the tail of the boot log with verbose boot enabled:

Sparse zone network/physical depends on unavailable package

Networking fails to start in a sparse zone because it depends on /network/varpd, which isn't available. Changing the network/physical SMF manifest to mark the dependency optional_all allows the system to start up properly.

    <dependency name='network-physical-varpd' grouping='optional_all' restart_on='none' type='service'>
      <service_fmri value='svc:/network/varpd'/>
    </dependency>

Found in r151033

Hang configuring devices under Hyper-V with single guest CPU

Installed OmniOS (omniosce-r151030.iso) under Hyper-V on Windows Server 2012r2.
With a single guest CPU configured, it boots as far as configuring devices, then hangs.
Work-around: configure at least two guest CPUs.

We'd like single-CPU configurations to work too, as that's the default when using the Hyper-V UI to create a new guest.

whatis database not created automatically

After installation of OmniOS r151030e, man -k and apropos fail with:

/usr/local/man/whatis: No such file or directory
/usr/share/man/whatis: No such file or directory
/opt/ooce/share/man/whatis: No such file or directory
/usr/gnu/share/man/whatis: No such file or directory

After running catman -w, it's fixed. It would make sense to automate this, so that new users who are looking for man pages don't first have to create the index.

USB 3.0 (xhci) not working in r151024g

I tried to use the USB 3.0 ports on my box and they do not work. The hard disk does not seem to be registered by the system when connected to one of these ports, although the controller is recognized.

Here is some info from my console:

[~] prtconf -dD | grep -i xhci
            pci1043,8488 (pciex1b21,1042) [ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller] (driver name: xhci)
[~] mdb -ke '::prtusb'
INDEX   DRIVER      INST  NODE          GEN  VID.PID     PRODUCT             
1       ehci        0     pci1043,8389  2.0  0000.0000   No Product String
2       ehci        1     pci1043,8389  2.0  0000.0000   No Product String
3       ohci        0     pci1043,8389  1.1  0000.0000   No Product String
4       ohci        1     pci1043,8389  1.1  0000.0000   No Product String
5       ohci        2     pci1043,8389  1.1  0000.0000   No Product String
6       ohci        3     pci1043,8389  1.1  0000.0000   No Product String
7       ohci        4     pci1043,8389  1.1  0000.0000   No Product String
[~] cat /etc/release
  OmniOS v11 r151024g
  Copyright 2017 OmniTI Computer Consulting, Inc. All rights reserved.
  Copyright 2017 OmniOS Community Edition (OmniOSce) Association.
  All rights reserved. Use is subject to license terms.
[~]

Using the disk on a USB2.0 port works fine:

[~] mdb -ke '::prtusb'
INDEX   DRIVER      INST  NODE          GEN  VID.PID     PRODUCT             
1       ehci        0     pci1043,8389  2.0  0000.0000   No Product String
2       ehci        1     pci1043,8389  2.0  0000.0000   No Product String
3       ohci        0     pci1043,8389  1.1  0000.0000   No Product String
4       ohci        1     pci1043,8389  1.1  0000.0000   No Product String
5       ohci        2     pci1043,8389  1.1  0000.0000   No Product String
6       ohci        3     pci1043,8389  1.1  0000.0000   No Product String
7       ohci        4     pci1043,8389  1.1  0000.0000   No Product String
8       scsa2usb    0     storage       2.1  0480.a00c   External USB 3.0
[~]

Setting promiscphys=true on a vnic in bhyve breaks host/zone -> vm traffic

It took me a while to figure out, but enabling promiscphys to get things like CARP to work breaks host -> vm traffic.

origin   destination  works
host     vm           no
zone     vm           no
vm       vm           no
device*  vm           yes

* device: another device on the network, i.e. not the node running the VM

It looks like the TCP checksum is incorrect and the packet is rejected, which might also explain why ping does work.
I tried both IPv6 and IPv4; the result is the same.
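
For reference, this is roughly what the "should be" value in the capture is derived from. The sketch below is a generic RFC 1071 Internet checksum, not code from the illumos network stack; `inet_checksum()` is a name made up for this example. For TCP, the input would be the pseudo-header followed by the TCP segment with the checksum field zeroed. When checksum offload is in play, the stack may leave a partial value in that field for the hardware to finish, which is one possible (unconfirmed) explanation for the mismatch reported here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: the 16-bit ones' complement of the
 * ones' complement sum of the data, read as big-endian 16-bit words.
 */
static uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;

	assert(data != NULL || len == 0);

	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8;	/* pad the odd byte with zero */

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)~sum);
}
```

A receiver can verify a packet by running the same function over the data with the transmitted checksum left in place; a result of 0 means the checksum matches.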

Frame 1: 94 bytes on wire (752 bits), 94 bytes captured (752 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Aug 19, 2021 17:00:58.701450000 CEST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1629385258.701450000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 94 bytes (752 bits)
    Capture Length: 94 bytes (752 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ipv6:tcp]
    [Coloring Rule Name: Checksum Errors]
    [Coloring Rule String [truncated]: eth.fcs.status=="Bad" || ip.checksum.status=="Bad" || tcp.checksum.status=="Bad" || udp.checksum.status=="Bad" || sctp.checksum.status=="Bad" || mstp.checksum.status=="Bad" || cdp.checksum.status=="Bad" |]
Ethernet II, Src: Cyberdyn_eb:db:e2 (00:22:06:eb:db:e2), Dst: Cyberdyn_eb:5c:9f (00:22:06:eb:5c:9f)
    Destination: Cyberdyn_eb:5c:9f (00:22:06:eb:5c:9f)
    Source: Cyberdyn_eb:db:e2 (00:22:06:eb:db:e2)
    Type: IPv6 (0x86dd)
Internet Protocol Version 6, Src: 2a02:578:470f:10::101, Dst: 2a02:578:470f:10::1042
    0110 .... = Version: 6
    .... 0001 0000 .... .... .... .... .... = Traffic Class: 0x10 (DSCP: Unknown, ECN: Not-ECT)
    .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
    Payload Length: 40
    Next Header: TCP (6)
    Hop Limit: 60
    Source Address: 2a02:578:470f:10::101
    Destination Address: 2a02:578:470f:10::1042
Transmission Control Protocol, Src Port: 40704, Dst Port: 22, Seq: 0, Len: 0
    Source Port: 40704
    Destination Port: 22
    [Stream index: 0]
    [TCP Segment Len: 0]
    Sequence Number: 0    (relative sequence number)
    Sequence Number (raw): 3370996639
    [Next Sequence Number: 1    (relative sequence number)]
    Acknowledgment Number: 0
    Acknowledgment number (raw): 0
    1010 .... = Header Length: 40 bytes (10)
    Flags: 0x002 (SYN)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...0 .... = Acknowledgment: Not set
        .... .... 0... = Push: Not set
        .... .... .0.. = Reset: Not set
        .... .... ..1. = Syn: Set
            [Expert Info (Chat/Sequence): Connection establish request (SYN): server port 22]
                [Connection establish request (SYN): server port 22]
                [Severity level: Chat]
                [Group: Sequence]
        .... .... ...0 = Fin: Not set
        [TCP Flags: ··········S·]
    Window: 64080
    [Calculated window size: 64080]
    Checksum: 0xfea3 incorrect, should be 0x0d51(maybe caused by "TCP checksum offload"?)
        [Expert Info (Error/Checksum): Bad checksum [should be 0x0d51]]
            [Bad checksum [should be 0x0d51]]
            [Severity level: Error]
            [Group: Checksum]
    [Checksum Status: Bad]
    [Calculated Checksum: 0x0d51]
    Urgent Pointer: 0
    Options: (20 bytes), Maximum segment size, SACK permitted, Timestamps, No-Operation (NOP), Window scale
        TCP Option - Maximum segment size: 1440 bytes
            Kind: Maximum Segment Size (2)
            Length: 4
            MSS Value: 1440
        TCP Option - SACK permitted
            Kind: SACK Permitted (4)
            Length: 2
        TCP Option - Timestamps: TSval 80707984, TSecr 0
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 80707984
            Timestamp echo reply: 0
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - Window scale: 1 (multiply by 2)
            Kind: Window Scale (3)
            Length: 3
            Shift count: 1
            [Multiplier: 2]
    [Timestamps]
        [Time since first frame in this TCP stream: 0.000000000 seconds]
        [Time since previous frame in this TCP stream: 0.000000000 seconds]

The other packets after it are the same but marked as retransmit, which is what we'd expect.

Here is a capture from a nearly identical VM but without promiscphys=true, showing the first frame of an SSH connection:

Frame 1: 94 bytes on wire (752 bits), 94 bytes captured (752 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Aug 19, 2021 17:05:32.700451000 CEST
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1629385532.700451000 seconds
    [Time delta from previous captured frame: 0.000000000 seconds]
    [Time delta from previous displayed frame: 0.000000000 seconds]
    [Time since reference or first frame: 0.000000000 seconds]
    Frame Number: 1
    Frame Length: 94 bytes (752 bits)
    Capture Length: 94 bytes (752 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:ipv6:tcp]
    [Coloring Rule Name: TCP SYN/FIN]
    [Coloring Rule String: tcp.flags & 0x02 || tcp.flags.fin == 1]
Ethernet II, Src: Cyberdyn_eb:db:e2 (00:22:06:eb:db:e2), Dst: Cyberdyn_ae:a3:64 (00:22:06:ae:a3:64)
    Destination: Cyberdyn_ae:a3:64 (00:22:06:ae:a3:64)
    Source: Cyberdyn_eb:db:e2 (00:22:06:eb:db:e2)
    Type: IPv6 (0x86dd)
Internet Protocol Version 6, Src: 2a02:578:470f:10::101, Dst: 2a02:578:470f:10::2042
    0110 .... = Version: 6
    .... 0001 0000 .... .... .... .... .... = Traffic Class: 0x10 (DSCP: Unknown, ECN: Not-ECT)
    .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
    Payload Length: 40
    Next Header: TCP (6)
    Hop Limit: 60
    Source Address: 2a02:578:470f:10::101
    Destination Address: 2a02:578:470f:10::2042
Transmission Control Protocol, Src Port: 33012, Dst Port: 22, Seq: 0, Len: 0
    Source Port: 33012
    Destination Port: 22
    [Stream index: 0]
    [TCP Segment Len: 0]
    Sequence Number: 0    (relative sequence number)
    Sequence Number (raw): 1085336387
    [Next Sequence Number: 1    (relative sequence number)]
    Acknowledgment Number: 0
    Acknowledgment number (raw): 0
    1010 .... = Header Length: 40 bytes (10)
    Flags: 0x002 (SYN)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...0 .... = Acknowledgment: Not set
        .... .... 0... = Push: Not set
        .... .... .0.. = Reset: Not set
        .... .... ..1. = Syn: Set
            [Expert Info (Chat/Sequence): Connection establish request (SYN): server port 22]
                [Connection establish request (SYN): server port 22]
                [Severity level: Chat]
                [Group: Sequence]
        .... .... ...0 = Fin: Not set
        [TCP Flags: ··········S·]
    Window: 64080
    [Calculated window size: 64080]
    Checksum: 0xe390 [correct]
    [Checksum Status: Good]
    [Calculated Checksum: 0xe390]
    Urgent Pointer: 0
    Options: (20 bytes), Maximum segment size, SACK permitted, Timestamps, No-Operation (NOP), Window scale
        TCP Option - Maximum segment size: 1440 bytes
            Kind: Maximum Segment Size (2)
            Length: 4
            MSS Value: 1440
        TCP Option - SACK permitted
            Kind: SACK Permitted (4)
            Length: 2
        TCP Option - Timestamps: TSval 80980465, TSecr 0
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 80980465
            Timestamp echo reply: 0
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - Window scale: 1 (multiply by 2)
            Kind: Window Scale (3)
            Length: 3
            Shift count: 1
            [Multiplier: 2]
    [Timestamps]
        [Time since first frame in this TCP stream: 0.000000000 seconds]
        [Time since previous frame in this TCP stream: 0.000000000 seconds]

OmniOS installer kernel panic when pcie NVME drive is installed

Hi All,

We are trying to install the latest OmniOS release (r151030) on a system that has a PCIe NVMe drive.
The drive is an Oracle F320 NVMe AIC with P/N MZPLK3T2HCJL.

During installation, before the installer even starts, we hit this kernel panic:
SunOS Release 5.11 Version omnios-r151030-f1109fc02c Copyright (c) 1903, 2010, Oracle Configuring devices.
panic[cpu0]/thread=fffffcc26b4dcc20: programming error: async event request limit exceeded in cmd fffffe23b08c8e40

fffffcc26b4dcab0 genunix:dev_err+7b ()
fffffcc26b4dcad0 nvme:nvme_check_specific_cmd_status+aa ()
fffffcc26b4dcb40 nvme:nvme_async_event_task+445 ()
fffffcc26b4dcc00 genunix:taskq_thread+2d0 ()
fffffcc26b4dcc10 unix:thread_start+8 ()

If I remove the card, the install works normally.
When the card is re-added after the install is done, the system boots, but creating any BE fails again with the same panic.

Is there any way to fix this issue?

Thanks

OS-8181 exposed fault in lx netlink bind/sendto

Following 8390247 - LX: OS-8181 lx getsockopt of SO_PROTOCOL fails - newer lx zone images do not boot.

# zadm boot -c -de_ lxtest
[Connected to zone 'lxtest' console]
systemd 245.4-4ubuntu3.1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 20.04 LTS!

Failed to enqueue IPv4 loopback address add request: Invalid argument
Caught <SEGV>, dumped core as pid 11979.
Exiting PID 1...

systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Debian GNU/Linux 10 (buster)!

Set hostname to <localhost>.
Failed to enqueue IPv4 loopback address add request: Invalid argument
Assertion '*_head == _item' failed at ../src/libsystemd/sd-netlink/netlink-slot.c:100, function netlink_slot_disconnect(). Aborting.
Caught <ABRT>, dumped core as pid 1493.
Exiting PID 1...

@papertigers - is this the same on SmartOS?

slow NFS writes in 151026

I recently installed a new host. It is so new that I couldn't install LTS on it, so I installed 151026.

This host is strictly for serving ZFS-based NFS & CIFS. Everything else is just default.

Over time it has become fairly obvious to me that NFS writes are ... well, abysmal.

This example is copying a 36GB directory of mixed size/type files. The first copy is strictly on a filesystem on the new server. The second is reading from the new server to an existing one. The third is doing the same read/write activity as test one but on an existing server running 151022.

on new fileserver:

: || nomad@omics1 fs2test ; time cp -rp 004test omics1/004test-1

real 22m27.225s
user 0m0.188s
sys 0m29.880s

reading from new fileserver, writing to existing fileserver:

: || nomad@omics1 hvfs2test ; time cp -rp /misc/fs2test/004test .

real 2m9.770s
user 0m0.180s
sys 0m28.694s

existing fileserver:

: || nomad@omics1 hvfs2test ; time cp -rp 004test omics1/004test-1

real 2m14.158s
user 0m0.242s
sys 0m30.313s

While the user and system times are consistent across all tests, the wall clock time of the first test is 10x that of the others. I've seen these tests take as long as 50 minutes of wall clock time. All tests were run from the same CentOS 7 host.

Watching snoop collect packets I see multiple-minutes-long pauses while writing to the new server. If I'm reading the heat maps right - https://drive.google.com/open?id=1zcX9ryXjrPMH0_uUbfywiTTnJDau4WW0 - it seems to be spending about 81% of its time in _t_cancel, waiting on a thread to cancel. I'm not a dev, haven't looked at the code, so it's quite possible I'm misunderstanding what the map is saying.

The client spends so much time stuck in disk wait that the cp process can take several minutes to respond to a SIGINT, SIGHUP, or SIGKILL.

nomad

Panic after boot menu after upgrading to r151032

I have a server we use for testing, with an Asus X399 motherboard, running OmniOS r151028. After upgrading to r151032, I get a PCI(-X) Express panic right after exiting the initial boot menu.

Trying to install r151032 from a USB memstick gives the same result: a PCI(-X) Express panic after the boot menu.

The server boots fine if, at the boot menu, I select the BE from before the upgrade (on this server, r151028-7).

The server also boots fine with both the USB 3.1 controller and the illumos-compatible Intel I211-AT network card switched off.

I do not mind losing USB 3.1, but of course the network is a must.

I am reporting this FYI, as a curiosity: the same server works fine with r151028 but throws a kernel panic with r151032. You can close this at any time. I will probably end up buying a new NIC.

Review merges from illumos-gate for backport

This is the list of changes merged into master from illumos-gate since the r151022 branch was updated. We should review these for any that need backporting to r151022.

Current backport list:

8055 mr_sas online-controller-reset (OCR) does not work with some gen3 adapters.
8303 loader: biosdisk interface should be able to cope with 4k sectors
8226 missing boot environments cause bootadm list-menu to segfault

    8297 update mdocml to 1.14.1
    8082 last(1) should be able to print years in output
    8323 ndmpd: left shift of the negative value
    8415 loader: biosdisk comment wording
    8374 loader: devicename.c cleanup
    8376 cached v_path should be kept fresh
    3729 getifaddrs must learn to stop worrying and love the other address families
    8335 mr_sas - remove PDSUPPORT conditional.
    8396 uts: vm_dep.h error: left shift of negative value
    8336 ed: misleading-indentation
    8384 AVX512 dis - EVEX prefix support
    8385 32-bit avx dis test mishandles EVEX prefix
    8386 32-bit bound dis is incorrect
    8362 libc: install_legacy() overwrites __runetype, __maplower, and __mapupper for _DefaultRuneLocale
    8349 thrd_equal implements the wrong specification
    8371 remove warlock files in usr/src/uts
    8393 bnxe: left shift of negative value and bad macro
    8392 Do not cast the return value of xdr_free()
    8367 remove warlock leftovers from usr/src/uts Makefiles
    8329 ldapcachemgr: misleading-indentation
    8364 ldapcachemgr does not set log file in debug mode if -l was not used
    8223 libshell: misleading-indentation
    7876 libast: misleading-indentation errors
    8379 illumos-gate 'install' make target is too eager building things
    8360 ipdadm missing 'all' target
    8359 libzpool Makefiles are slightly broken
    8397 sysevent.h: C++11 requires a space between literal and string macro
    8410 ucoreadm links against libraries outside the proto area
    8394 fcoet: array subscript is above array bounds
    8398 pcmcia: sizeof on array function parameter
    8381 Convert ipsec_alg_lock from mutex to rwlock
    8229 ixgbe: misleading-indentation
    8362 libc: install_legacy() overwrites __runetype, __maplower, and __mapupper for _DefaultRuneLocale (breaks copy relocs)
    8331 zfs_unshare returns wrong error code for smb unshare failure
    8362 libc: install_legacy() overwrites __runetype, __maplower, and __mapupper for _DefaultRuneLocale
    8311 ZFS_READONLY is a little too strict
    8332 krb5: misleading-indentation
    8204 Makefile changes in zfstest cannot cope with empty directories
    8334 ipf: self-comparison always evaluates to false
    8375 Kernel memory leak in nvpair code
    5220 L2ARC does not support devices that do not provide 512B access
    8317 ddi_periodic_add(9F) has wrong type for arg in summary
    8322 nl: misleading-indentation
    8106 authloopback_marshal() can violate the RPC specification
    8109 Kernel AUTH_SYS and AUTH_LOOPBACK implementation can ignore provided credentials
    8354 sync regcomp(3C) with upstream (fix make catalog)
    5428 provide fts(), reallocarray(), and strtonum()
    8086 Add Stride parameter to dd
    8328 keyserv: sizeof on array function parameter
    8369 libcmdutils should be better about large file support
    8370 libcmdutils needlessly defines its own OFFSETOF() macro
    6856 sys/stream.h exposes unnecessary macros to userland
    8264 want support for promoting datasets in libzfs_core
    8319 dis support for new xsave instructions
    8232 pcmcia: misleading-indentation
    8350 mr_sas - replace sprintf() with snprintf()
    8302 svr4pkg unused variables
    8326 logger: misleading-indentation
    8316 srchtxt: misleading-indentation
    8296 tcopy: misleading-indentation
    8303 loader: biosdisk interface should be able to cope with 4k sectors
    8131 loader: add support for chain and device BE's
    8366 remove warlock leftovers from usr/src/cmd and usr/src/lib
    8355 need libc regex tests
    8354 sync regcomp(3C) with upstream
    8327 logadm: misleading-indentation
    8325 luxadm: misleading-indentation
    8338 format: misleading-indentation
    8173 workaround qemu-xhci HCIVERSION bug
    8321 passmgmt: misleading-indentation
    8314 tbl: misleading-indentation
    8315 sendmail: misleading-indentation
    8287 arn: misleading-indentation
    8320 regcmp: misleading-indentation
    8340 datadm: self-comparison always evaluates to false
    8333 idmap: misleading-indentation
    8130 loader: enable BE menu if we have BE list
    8129 bootadm: add support for non-zfs boot entries in menu.lst
    8226 missing boot environments cause bootadm list-menu to segfault
    8250 libnsl: Raw RPC client sends unlimited data
    8306 provide ofmt routines in public libofmt library and document them
    8269 dtrace stddev aggregation is normalized incorrectly
    8108 zdb -l fails to read labels 2 and 3
    8056 zfs send size estimate is inaccurate for some zvols
    8156 dbuf_evict_notify() does not need dbuf_evict_lock
    8168 NULL pointer dereference in zfs_create()
    8276 rpcbind leaks memory due to libumem per thread caching.
    8270 dnlc_reverse_lookup() is unsafe at any speed
    8300 fix man page issues found by mandoc 1.14.1
    8337 gss: misleading-indentation
    8324 more: misleading-indentation
    8304 zfs-tests/bin/zfstest should allow DISKS=(zvols)
    8005 poor performance of 1MB writes on certain RAID-Z configurations
    8155 simplify dmu_write_policy handling of pre-compressed buffers
    5097 psignal and psiginfo don't handle NULL arguments correctly
    8305 Need to handle NVMe devices with EUI64 values
    8293 fs.d: misleading-indentation and longjump issues
    8286 chxge: misleading-indentation
    8285 kssl: misleading-indentation
    8284 fdc: misleading-indentation
    6939 add sysevents to zfs core for commands
    7751 mpt_sas sometimes times out sending SEP messages
    8055 mr_sas online-controller-reset (OCR) does not work with some gen3 adapters.
    8295 tsol: misleading-indentation
    8294 auditd: case values not in enum
    8292 rdc: misleading-indentation
    8291 sdbc: misleading-indentation
    8290 libwanboot: misleading-indentation
    8289 sun_fc: Sun_fcAdapterCreateWWN.cc is missing unistd.h
    8288 uts: common/os/cap_util.c has misleading indentation
    8282 hci1394: self-comparison always evaluates to true
    8281 ucbcmd/ls: misleading-indentation
    8132 loader: boot does leave BE menu in environment
    6961 64-bit octal printf may overflow internal buffer
    8298 snoop: dhcp option_types list is missing strings
    8191 in.routed: misleading-indentation
    8162 cscope-fast: this statement may fall through
    8239 Want NVMe 1.2 support
    8240 AVX512 dis - opmask instruction support
    8271 loader: Replacing iterating over rootpath by strsep
    8247 uts: Remove archaic register keyword from zmod
    8238 xdr_callmsg() should clear residual bytes
    7768 Avoid vgatext dependency on agpmaster
    8262 sadp is neither built nor used
    8279 socketpair(AF_UNIX, SOCK_DGRAM,...) broken after 7590
    8194 kmfcfg: case value not in enumerated type
    8171 loader: distinguish NFS versus TFTP boot by rootpath
    8202 doors man pages contain extra whitespace
    5180 door_server_create(3c): Incomplete return type
    8170 update CLDR data to v31
    8263 pkgchk has unused -Q flag
    8099 loader: do not build complex commandline for mb2 kernel
    7908 add loader manpage to pkg://system/boot/loader
    8257 ifconfig configinfo confuses mtu with metric
    4713 rpc_svc_create(3nsl): "for the given program" is strange
    backout: 8067 zdb should be able to dump literal embedded block pointer (breaks build)
    8265 Reserve send stream flag for large dnode feature
    8067 zdb should be able to dump literal embedded block pointer
    7578 Fix/improve some aspects of ZIL writing.
    8192 in.ndpd: misleading-indentation
    8246 snoop(1m) clobbers status for the NFSv4 SETATTR operation
    8234 rpcib: misleading-indentation
    8166 zpool scrub thinks it repaired offline device
    8021 ARC buf data scatter-ization
    8100 8021 seems to cause random BAD TRAP: type=d (#gp General protection)
    7446 zpool create should support efi system partition
    8186 rdist: misleading-indentation
    8182 mac: misleading-indentation
    8230 e1000api: misleading-indentation
    8213 uts: get smbios from bootloader
    8198 acct: misleading-indentation
    8197 bnu: misleading-indentation
    8071 zfs-tests: 7290 missed some cases
    8125 kmem_move tunables must not be declared static
    8070 Add some ZFS comments
    8076 zfs-tests suite fails rootpool_002_neg
    8077 zfs-tests suite fails zpool_get_002_pos
    8072 zfs-tests: several test cases incorrectly spell TESTPOOL
    8064 need a static DTrace probe in VN_HOLD (incorporate review feedback)
    7590 sendmsg on AF_UNIX socket fails after process drops privileges
    8231 xdr_admin(3nsl): Invalid return types in the man page
    8064 need a static DTrace probe in VN_HOLD
    8149 deadlock between datalink deletion and kstat read
    7444 fs/xattr.c should be more transparent (zfs_acl_test)
    8221 libndmp: misleading-indentation
    8222 libdscfg: misleading-indentation
    8215 print: misleading-indentation
    OS-6052 need /proc/self/uid_map
    8219 mech_krb5: misleading-indentation
    8200 cmd/lp: misleading-indentation
    8216 libsldap: misleading-indentation
    8207 localedef links against libraries outside of proto area
    8218 libshare: misleading-indentation
    8217 libldap5: misleading-indentation
    8214 pam_modules: misleading-indentation
    8220 common/iscsi: misleading-indentation
    8180 Invalid netbuf decoded by xdr_netbuf()
    8193 pktool: misleading-indentation
    8190 in.talkd: misleading-indentation
    8189 dns-sd: misleading-indentation
    8188 tic: misleading-indentation
    8185 telnet: misleading-indentation

dfstab & sharetab - tool validation before edit

I was recently reminded that a path entry in /etc/dfs/dfstab should not match one in /etc/dfs/sharetab (and vice versa). I was reminded of this when our main file server became unavailable over both NFS and SMB because a filesystem was NFS-exported in both places.

It would be useful if the /usr/sbin/share and zfs set share* commands would validate that the entity being shared isn't already in the other config file, and exit with an error if it is, as a protective measure.

thanks,
nomad
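
As a rough illustration of the requested check, the sketch below compares the pathnames in the two files and reports any that appear in both. It is only a sketch under stated assumptions: it assumes each dfstab entry is a `share` command line whose last word is the pathname, and that the first whitespace-separated field of each sharetab line is the pathname. The function names are made up for this example and are not part of any existing tool.

```python
import os
import shlex


def dfstab_paths(text):
    """Return the set of pathnames named by `share` lines in dfstab-style text."""
    paths = set()
    for line in text.splitlines():
        # comments=True makes shlex drop '#' comments; blank lines give [].
        words = shlex.split(line, comments=True)
        # Assumption: the pathname is the last word of a `share` command line.
        if len(words) > 1 and os.path.basename(words[0]) == "share":
            paths.add(words[-1])
    return paths


def sharetab_paths(text):
    """Return the set of pathnames in sharetab-style text (first field per line)."""
    paths = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            paths.add(fields[0])
    return paths


def conflicting_paths(dfstab_text, sharetab_text):
    """Return the sorted list of pathnames present in both inputs."""
    return sorted(dfstab_paths(dfstab_text) & sharetab_paths(sharetab_text))
```

A wrapper could call `conflicting_paths()` on the contents of both files before an edit and refuse to proceed if the result is non-empty.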

bhhwcompat throws error: Cannot open /dev/vmmctl: No such file or directory

Tested on two installations with omnios-r151028-3aeadc46d9.
Bhyve is not working on either system, and when I use the bhhwcompat -v command to check compatibility, it throws this error: Cannot open /dev/vmmctl: No such file or directory

systemd-tmpfiles failing in updated lx zone

A customer reported that after updating their ubuntu lx zone several services failed to start after reboot, including sshd.

The problem is that the systemd-tmpfiles service fails to start properly and so several temporary directories under /var/run are not created after /run is mounted on tmpfs.

root@ubuntu-14-04-b:~# systemctl status systemd-tmpfiles-setup.service
 systemd-tmpfiles-setup.service - Create Volatile Files and Directories
   Loaded: loaded (/lib/systemd/system/systemd-tmpfiles-setup.service; static; v
   Active: failed (Result: exit-code) since Thu 2018-11-22 12:00:51 UTC; 3min 47
     Docs: man:tmpfiles.d(5)
           man:systemd-tmpfiles(8)
  Process: 28706 ExecStart=/bin/systemd-tmpfiles --create --remove --boot --excl
 Main PID: 28706 (code=exited, status=1/FAILURE)

The service log shows several instances of:

Failed to validate path /var/run/sshd: Too many levels of symbolic links

and strace shows:

open("/", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
openat(3, "var", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 4
openat(4, "run", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ELOOP (Too many levels of symbolic links)

/var/run is a symbolic link to /run

The Linux man page for openat(2) states:

O_PATH
              If pathname is a symbolic link and the O_NOFOLLOW flag is also
              specified, then the call returns a file descriptor referring
              to the symbolic link.  This file descriptor can be used as the
              dirfd argument in calls to fchownat(2), fstatat(2), linkat(2),
              and readlinkat(2) with an empty pathname to have the calls
              operate on the symbolic link.

Since the openat call is opening the /var/run symlink with O_PATH|O_NOFOLLOW, it should succeed under lx. Newer systemd (after systemd/systemd@addc3e302dad239fb11cf280b) expects this.
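
The expected Linux behaviour can be checked with a small test. The sketch below creates a symlink in a temporary directory, opens it with the same flag combination seen in the strace output, and uses fstat to confirm that the descriptor refers to the link itself. The function names and the temporary layout are made up for this illustration, and `os.O_PATH` is only available on Linux.

```python
import os
import stat
import tempfile


def open_nofollow_path(path):
    """Open path with O_PATH|O_NOFOLLOW; return True if the fd is a symlink."""
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_CLOEXEC | os.O_PATH
    fd = os.open(path, flags)  # raises OSError (ELOOP) if this is not supported
    try:
        return stat.S_ISLNK(os.fstat(fd).st_mode)
    finally:
        os.close(fd)


def demo():
    """Build <tmp>/var_run -> run and open the symlink itself."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "run"))
        link = os.path.join(root, "var_run")
        os.symlink("run", link)
        return open_nofollow_path(link)
```

On native Linux `demo()` should return True. Going by the strace output above, the same call in an affected lx zone would be expected to raise OSError with errno ELOOP instead.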

Requesting support for /etc/netgroups (not needing NIS for netgroups)

I have several NFS exports that go to so many hosts that I can't list them all on zfs sharenfs lines, so I have a mix of ZFS-shared and /etc/dfs/dfstab entries. There is also a high chance of error whenever a host is added to or removed from the list, because every entry has to be updated.

It would be very helpful if https://www.illumos.org/issues/3163 were to get some traction. Being able to just list netgroup entries would greatly improve reliability.

thanks,
nomad

ps in r151032 is truncating command arguments

Andy mentioned that r151032 has the new ps.

Unfortunately it appears to be truncating long commands. The simplest way to show this is to run something like the following.

Test program:

# cd /usr/share/man
# more */*

Result in r151030:

# ps uaxww 549|wc
       2   10597  261237

Result in r151032:

# ps uaxww 569|wc
       2     321    4227
# ps -Fl -p 569|wc
       2     329    4276

It appears that the command is being truncated to about 4k instead of being shown in full. For ps uaxww and ps uaxwwe this is a regression; for ps -F the behaviour is inconsistent with the man page.

backporting ashift property from 151032 to 151030

There are still a lot of 512e disks around, which sometimes end up with ashift=9 and other times with ashift=12. The old way of using sd.conf is not bulletproof, and using mdb to set the minimum allowed ashift is just a hack, since you need to change it back to W9 when using native 512-byte disks. I would therefore definitely like the ashift property backported to the current LTS.

Backport to r151030: 10215 lofiadm -la fails after lofiadm -a / lofiadm -d

Is there any chance this fix can be backported to r151030?
illumos@45ca534#diff-0319b8ec2cbc2de25a001d9d090f7123

This bug is causing me a lot of grief because as soon as I detach a lofi device after having a labelled device attached previously I'm not able to attach any more.

Thanks.

LX: systemd hanging on some boots

Since 8390247 (OS-8181 lx getsockopt of SO_PROTOCOL fails) and its follow-up 462ccf4 (OS-8181 exposed fault in lx netlink bind/sendto) were integrated, modern Linux systems with systemd intermittently fail to boot (in some cases they fail more often than not).

When it fails, systemd hangs early:

[NOTICE: Zone booting up]
Failed to read /proc/sys/fs/nr_open, ignoring: No such file or directory
<30>systemd[1]: systemd 245.4-4ubuntu3.1 running in system mode. (+PAM +AUDIT +S
ELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL
 +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hy
brid)
<30>systemd[1]: Detected virtualization container-other.
<30>systemd[1]: Detected architecture x86-64.

Welcome to Ubuntu 20.04 LTS!

<31>systemd[1]: sd-netlink: Failed to enable NETLINK_EXT_ACK option, ignoring: Invalid argument
<31>systemd[1]: Failed to bring loopback interface up: Connection timed out

strace on the systemd process shows it in a loop calling ppoll() and recvmsg() with MSG_PEEK:

recvmsg(4, {msg_name=0x7fffffeff750, msg_namelen=128->0, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 0
ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=5, tv_nsec=0}, NULL, 8) = 0 (Timeout)
recvmsg(4, {msg_name=0x7fffffeff750, msg_namelen=128->0, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 0
ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=5, tv_nsec=0}, NULL, 8) = 0 (Timeout)
recvmsg(4, {msg_name=0x7fffffeff750, msg_namelen=128->0, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 0
ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=5, tv_nsec=0}, NULL, 8) = 0 (Timeout)
recvmsg(4, {msg_name=0x7fffffeff750, msg_namelen=128->0, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 0

Extracting the stack trace (via mdb and /nap) shows:

0x7fffffeff4c8: libc.so.6`ppoll+0x4f
0x7fffffeff818: libsystemd-shared-245.so`socket_read_message+0x68
0x7fffffeff958: libsystemd-shared-245.so`close_nointr+0xb

I used dtrace to do some more delving here and fd 4 is a netlink socket. Modifying the lx init process to give me time to attach strace early shows that systemd sends three netlink messages in quick succession, and there are three replies.

sendto(4, " \0\0\0\24\0\5\0\1\0\0\0\0\0\0\0\2\10\200\376\1\0\0\0\10\0\2\0\177\0\0\1", 32, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 16)
sendto(4, ",\0\0\0\24\0\5\0\2\0\0\0\0\0\0\0\n\200\200\376\1\0\0\0\24\0\2\0\0\0\0\0"..., 44, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 16)
sendto(4, " \0\0\0\23\0\5\0\3\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0", 32, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 16)

ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=4, tv_nsec=999790000}, NULL, 8)
recvmsg(4, {msg_namelen=12}, MSG_PEEK|MSG_TRUNC)
recvmsg(4, {msg_namelen=12}, MSG_TRUNC)
ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=4, tv_nsec=999209000}, NULL, 8)
recvmsg(4, {msg_namelen=12}, MSG_PEEK|MSG_TRUNC)
recvmsg(4, {msg_namelen=12}, MSG_TRUNC)
ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=4, tv_nsec=998680000}, NULL, 8)
recvmsg(4, {msg_namelen=12}, MSG_PEEK|MSG_TRUNC)
recvmsg(4, {msg_namelen=12}, MSG_TRUNC)

Yet when it fails to boot it does not seem happy with the replies, and continues to wait.
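
The first sendto() payload can be decoded by hand to see what systemd is asking for. The sketch below unpacks the 16-byte netlink header, assuming the Linux `struct nlmsghdr` layout (length, type, flags, sequence, pid) and a little-endian host. `parse_nlmsghdr` is a name made up for this illustration.

```python
import struct

# Linux struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid.
# "<" assumes a little-endian host such as x86-64.
NLMSGHDR = struct.Struct("<IHHII")


def parse_nlmsghdr(buf):
    """Decode the netlink header at the start of buf into a dict."""
    length, mtype, flags, seq, pid = NLMSGHDR.unpack_from(buf)
    return {"len": length, "type": mtype, "flags": flags, "seq": seq, "pid": pid}


# First sendto() payload, copied from the strace output (octal escapes kept).
FIRST_REQUEST = (
    b" \0\0\0\24\0\5\0\1\0\0\0\0\0\0\0"
    b"\2\10\200\376\1\0\0\0\10\0\2\0\177\0\0\1"
)

header = parse_nlmsghdr(FIRST_REQUEST)
print({name: hex(value) for name, value in header.items()})
```

The header decodes to length 0x20, type 0x14, flags 0x5 and sequence 1. On Linux, type 0x14 is RTM_NEWADDR and flags 0x5 is NLM_F_REQUEST|NLM_F_ACK, so systemd is asking for an acknowledgement of each request.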

I used this dtrace script to investigate the messages going in each direction.

#!/usr/sbin/dtrace -FCs

fbt::lx_netlink_send:entry {
        lxsock = (lx_netlink_sock_t *)arg0;
        mp = (mblk_t *)arg1;
        msg = (struct msghdr *)arg2;
        hdr = (lx_netlink_hdr_t *)mp->b_rptr;

        printf("---------------------------------------------------------\n");
        printf("proto: %x, type: %x, seq: %x",
            lxsock->lxns_proto, hdr->lxnh_type, hdr->lxnh_seq);
        print(*hdr);
        printf("\n");
}

fbt::lx_netlink_reply_done:entry {
        r = (lx_netlink_reply_t *)arg0;
        mp = (mblk_t *)r->lxnr_err;
        self->rr = (lx_netlink_err_t *)mp->b_rptr;
}

fbt::lx_netlink_reply_sendup:entry {
        reply = (lx_netlink_reply_t *)arg0;
        mp = (mblk_t *)arg1;
        mpl = (mblk_t *)arg2;
        rr = (lx_netlink_err_t *)self->rr;

        printf("\n");
        print(*reply);
        printf("\n");
        print(*rr);
        printf("\n");
}

Here's the first send and receive:

# CPU FUNCTION
  0  -> lx_netlink_send                       ---------------------------------------------------------
                                                               proto: 0, type: 14, seq: 1
  0    -> lx_netlink_reply_sendup

lx_netlink_reply_t {
    lx_netlink_hdr_t lxnr_hdr = {
        uint32_t lxnh_len = 0x20
        uint16_t lxnh_type = 0x14
        uint16_t lxnh_flags = 0x5
        uint32_t lxnh_seq = 0x1
        uint32_t lxnh_pid = 0
    }
    lx_netlink_sock_t *lxnr_sock = 0xfffffe16f39045d8
    uint32_t lxnr_seq = 0
    uint16_t lxnr_type = 0
    mblk_t *lxnr_mp = 0
    mblk_t *lxnr_err = 0
    mblk_t *lxnr_mp1 = 0xfffffe17162cce20
    int lxnr_errno = 0x7a
}
lx_netlink_err_t {
    lx_netlink_hdr_t lxne_hdr = {
        uint32_t lxnh_len = 0x24
        uint16_t lxnh_type = 0x2
        uint16_t lxnh_flags = 0x6b6c
        uint32_t lxnh_seq = 0x1
        uint32_t lxnh_pid = 0x3d21
    }
    int32_t lxne_errno = 0x7a
    lx_netlink_hdr_t lxne_failed = {
        uint32_t lxnh_len = 0x20
        uint16_t lxnh_type = 0x14
        uint16_t lxnh_flags = 0x5
        uint32_t lxnh_seq = 0x1
        uint32_t lxnh_pid = 0
    }
}

That lxnh_flags value of 0x6b6c in the error header is a bit suspicious.
Looking at the netlink code, the error path does not set the flags field so it is uninitialised memory.
Bit 1 (0x2) of that flags field indicates a multipart message. A hypothesis is that if the reply is flagged as multipart, systemd will wait for a NLMSG_DONE packet.

I wrote a smaller dtrace script to just show the flags in the reply messages and this definitely seems to be the case.
Reproducibly, if the reply messages have flags with bit 1 set, then the zone does not boot.

For example, this is a failed boot:

 CPU     ID                    FUNCTION:NAME
  9  72240            lx_netlink_send:entry proto: 0, type: 14, seq: 1
  9  72186    lx_netlink_reply_sendup:entry Flags: 6f72
  9  72240            lx_netlink_send:entry proto: 0, type: 14, seq: 2
  9  72186    lx_netlink_reply_sendup:entry Flags: 4143
  9  72240            lx_netlink_send:entry proto: 0, type: 13, seq: 3
  9  72186    lx_netlink_reply_sendup:entry Flags: 6176

with bit 1 set in each of the flags

and this one succeeded:

 CPU     ID                    FUNCTION:NAME
  8  72240            lx_netlink_send:entry proto: 0, type: 14, seq: 1
  8  72186    lx_netlink_reply_sendup:entry Flags: 7570
  8  72240            lx_netlink_send:entry proto: 0, type: 14, seq: 2
  8  72186    lx_netlink_reply_sendup:entry Flags: 6c61
  8  72240            lx_netlink_send:entry proto: 0, type: 13, seq: 3
  8  72186    lx_netlink_reply_sendup:entry Flags: 6261

with bit 1 clear

When there is a mix, the zone also does not boot - here's one where the first reply does not have the flag set.

   3  72240            lx_netlink_send:entry proto: 0, type: 14, seq: 1
  3  72186    lx_netlink_reply_sendup:entry Flags: 6261
  3  72240            lx_netlink_send:entry proto: 0, type: 14, seq: 2
  3  72186    lx_netlink_reply_sendup:entry Flags: 6f72
  3  72240            lx_netlink_send:entry proto: 0, type: 13, seq: 3
  3  72186    lx_netlink_reply_sendup:entry Flags: 7473
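
The flag test behind the comparison above can be written out explicitly. The sketch below checks bit 0x2 (named NLM_F_MULTI in the Linux netlink headers) for the flag values captured in the three boots. The helper names are made up for this illustration, and `boot_expected_to_hang` encodes the hypothesis from this report rather than confirmed systemd behaviour.

```python
NLM_F_MULTI = 0x2  # Linux netlink "multipart message" flag


def is_multipart(flags):
    """True if the multipart bit is set in a netlink flags value."""
    return bool(flags & NLM_F_MULTI)


def boot_expected_to_hang(reply_flags):
    """Hypothesis: any reply flagged multipart leaves systemd waiting."""
    return any(is_multipart(f) for f in reply_flags)


def as_ascii(flags):
    """Show the two flag bytes as text, in little-endian memory order."""
    return flags.to_bytes(2, "little").decode("ascii")


# Flag values taken from the DTrace output in this report.
FAILED_BOOT = [0x6F72, 0x4143, 0x6176]
GOOD_BOOT = [0x7570, 0x6C61, 0x6261]
MIXED_BOOT = [0x6261, 0x6F72, 0x7473]

for name, flags in (("failed", FAILED_BOOT), ("good", GOOD_BOOT), ("mixed", MIXED_BOOT)):
    print(name, [is_multipart(f) for f in flags], boot_expected_to_hang(flags))
```

All of the captured values also decode as printable ASCII (0x6f72 is "ro", 0x4143 is "CA"), which is at least consistent with the field holding leftover memory contents rather than deliberately set flags.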

bloody: zones do not shut down properly.

Date: Fri, 21 Jul 2017 08:22:51 +0000
From: Jim Klimov
To: OmniOS-discuss
Subject: [OmniOS-discuss] Local zone regression in CE bloody

Hi all,

I have an OmniOS bloody box that was last running 151023. Yesterday I updated it
to latest available original omnios from May 15 or so, and updated that BE to
omniosce bloody. Between the two, zone shutdown stopped working for me (both
ipkg and lx), with the ce variant claiming that "datalinks remain in zone after
shutdown / unable to unconfigure network interfaces in zone / unable to destroy
zone".

When the zone boots it seems okay and usable, but when trying to halt it -
becomes "down" and I can not change the state (no boot/mount/ready/... options
succeed); killing its zoneadmd and wiping then /var/run/zones also does not help,
only GZ reboot.

Jim

new lipkg zone unexpectedly was not linked from the outset

@idodeclare:

Hello. I had an odd experience with a new lipkg zone yesterday. I created it as usual from the root with zonecfg/zoneadm, and then later in the day became aware of the r151024q update.

From the root, though, pkg update -r would not work because of outdated pkg, but then neither pkg install -r pkg:/package/pkg nor pkg install pkg:/package/pkg would work because the new zone apparently was already on the new bits and was raising IPS exceptions concerning version.

@citrus-it:

That is strange.. there was a pkg update in r151024q
did you manage to resolve it? You might have had to downgrade pkg in the zone
or, use pkg update -f

@idodeclare:

Yes. Since that's rare, I wonder if there is a latent bug where new lipkg zones might get unexpectedly new bits.

I.e. in violation of "linked" right off the bat even though the initial operations (zonecfg/zoneadm) happened from the root.

@citrus-it:

yes, definitely worth further testing.

`sendfile` Apparently not Translated to Linux Equivalent for lx Zone

Version: r151036 (will grab my uname later, am on my phone).
lx package version: (...)
Guest OS: Ubuntu Focal

I have re-encountered this bug occurring in Python, from within an Ubuntu guest in an lx zone.

It appears Python hadn't taken into account the differences in behaviour between Linux's sendfile and the illumos one: illumos returns EINVAL under certain conditions where Linux doesn't (glossing over it, when the source data is smaller than the arguments given). Their patch is to disable use of this syscall on non-Linux platforms for now.

I downloaded a newer version of python to find the problem persisted. Their patch uses the OS version as reported to determine whether to use sendfile or not, and within the guest, this is correctly returning a Linux distro. I had to edit the file in question and set the result to False manually, at which point it succeeded.

This implies to me that the syscall isn't correctly being mapped to a Linux equivalent implementation as part of the lx subsystem, and subsequently, the Solaris-specific behaviour is being seen within a Linux guest zone, causing the discrepancy.
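
One defensive pattern for application code is to try sendfile and fall back to a plain read/write loop when it fails with EINVAL. The sketch below illustrates that idea only. It is not the patch Python adopted, and `copy_fd` is a name made up for this example. It copies from offset 0 of the source descriptor.

```python
import errno
import os


def copy_fd(in_fd, out_fd, count, chunk=65536):
    """Copy up to count bytes from offset 0 of in_fd to out_fd.

    Prefers os.sendfile and switches to read/write if it reports
    EINVAL or ENOSYS. Returns the number of bytes copied.
    """
    copied = 0
    use_sendfile = hasattr(os, "sendfile")
    if not use_sendfile:
        os.lseek(in_fd, 0, os.SEEK_SET)
    while copied < count:
        want = min(chunk, count - copied)
        if use_sendfile:
            try:
                sent = os.sendfile(out_fd, in_fd, copied, want)
            except OSError as exc:
                if exc.errno not in (errno.EINVAL, errno.ENOSYS):
                    raise
                # sendfile with an explicit offset leaves the file position
                # untouched, so seek to where the fallback must resume.
                use_sendfile = False
                os.lseek(in_fd, copied, os.SEEK_SET)
                continue
            if sent == 0:  # source is shorter than the requested count
                break
            copied += sent
        else:
            data = os.read(in_fd, want)
            if not data:
                break
            view = memoryview(data)
            while view:  # os.write may write only part of the buffer
                written = os.write(out_fd, view)
                view = view[written:]
            copied += len(data)
    return copied
```

Asking for more bytes than the source holds, which is the condition described above, simply ends the copy early on either path.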

AES encrypt-then-decrypt does not produce original file

Please see below chat from 31-Dec-2019 where jbk diagnosed a bug in OmniOS encrypt/decrypt and identified the illumos-gate commit that would fix it:

[18:50:31] <idodeclare> I'm not sure when this broke but `encrypt -a aes ...` followed by `decrypt -a aes ...` does not produce the original input file
[19:00:29] <LeftWing> idodeclare: It'd help to produce a short shell program that demonstrates the failure
[19:03:28] <jbk> also, kernel/distro versions
[19:04:19] <jbk> the encrypt/decrypt commands used CKM_AES_CBC_PAD, there were some bugs that were fixed this past year
[19:04:50] <jbk> (so good to know if you are seeing the issue with/without the fixes)
[19:10:46] <jbk> (having found and looked at the source for them)
[19:25:32] <idodeclare> LeftWing: OK https://gist.github.com/idodeclare/5b07f5143d9115e3ce4128938895a55a to encrypt/decrypt /etc/resolv.conf. Run ./demo_encrypt_decrypt arcfour to show a working algo and then ./demo_encrypt_decrypt aes
[19:25:44] <idodeclare> jbk: I'm on latest Omni OS r151032
[19:29:27] <jbk> it looks like it's missing this commit:
[19:29:30] <jbk> commit 8d91e49dd95381d46f9364f5de9e9027a11e1118
[19:29:30] <jbk> Author: Jason King <jason.king at joyent dot com>
[19:29:30] <jbk> Date:   Fri Jun 28 00:45:40 2019 +0000
[19:29:30] <jbk>     11825 PKCS#11 CKM_AES_CBC_PAD decryption can fail
[19:29:30] <jbk>     Reviewed by: Dan McDonald <danmcd at joyent dot com>
[19:29:32] <jbk>     Approved by: Gordon Ross <gordon.w.ross at gmail dot com>
[19:30:24] <jbk> I've got to run and do a few things, but I can try that a bit later on the latest smartos (which has that fix)
[19:31:21] <jbk> which would tell us if that does fix the issue (I suspect it will, but always nice to be able to confirm it)
[19:31:36] <idodeclare> jbk: thank you for looking at it
[19:32:17] <jbk> well i accidentialy broke it trying to add support for other stuff (sorry about that :( ), so only fitting I should try to fix it
[19:32:57] <jbk> unfortunately, the padding requirements and the PKCS#11 API requirements intersect in some unfortunate and annoying ways
[19:42:07] <idodeclare> Oh :) well much appreciated
[20:56:03] <jbk> idodeclare: looks like on the latest smartos it works ok..
[20:57:09] <jbk> i'm not sure what the omnios policy is on backporting fixes, though the person most likely to know is in the Uk and is likely further along on the NYE celebrations than here in the US :), so it may be a day or two before someone knows the answer
[20:58:59] <jbk> https://pastebin.com/5zz9UrR2
[21:02:46] <jbk> if you're in a pinch and need something sooner, I can probably build you a fixed libpkcs11_softtoken.so (that's where the bug is) that you could replace or loopback mount to make the encrypt/decrypt commands work w/ AES
[21:06:11] <idodeclare> jbk: oh that's encouraging. No I'm not in a pinch — but thank you for offering. Happy New Year!
[21:13:52] <jbk> ok.. if that changes feel free to let me know.. and sorry again for the breakage
[21:16:39] <idodeclare> no worries!

pkg://omnios/system/test/smbclient should be in "extra" repository

Arguably, all packages in the core repository should be usable without the "extra" publisher configured. However, system/test/smbclient depends on ooce/runtime/expect which is in the "extra" repository:

$ uname -a
SunOS nfs 5.11 omnios-r151034-f57f507df0 i86pc i386 i86pc

$ pkg contents -t depend smbclient
TYPE FMRI
require consolidation/osnet/osnet-incorporation
require pkg:/system/[email protected]
require system/test/testrunner
require pkg:/[email protected]
require ooce/runtime/expect

Routing from non-global zone broken under ESXi as of r151026u

After updating to r151026u, routing from exclusive-IP non-global zones no longer works on hosts running under ESXi. From an NGZ, I can successfully ping the GZ interface, but pings to anything off the host fail.

I normally run illumos hosts under ESXi in a promiscuous port group to get zone routing to work, but even so it is now broken on OmniOS r151026u.

I also run OpenIndiana, and I confirm that Hipster 2018.04 NGZ routing still works.

pfexec.1 has mistaken duplication

There is the following weirdness in OmniOS's man pfexec possibly from a merge mistake in 5e6220e:

...
       For pfexec to function correctly, the pfexecd daemon must be running in
       the current zone. This is normally managed by the
       "svc:/system/pfexec:default" SMF service (see smf(5)).

USAGE
       For pfexec to function correctly, the pfexecd daemon must be running in
       the current zone. This is normally managed by the
       "svc:/system/pfexec:default" SMF service (see smf(5)).

USAGE
       pfexec is used to execute commands with predefined process attributes,
       such as specific user or group IDs.

       Refer to the sh(1), csh(1), and ksh(1) man pages for complete usage
       descriptions of the profile shells.
...

Install media not available by using USB 3.0 device

It's not possible to install OmniOS if the install media is a USB 3.0 flash drive, because the device is not detected during boot. This issue only affects USB 3.0 flash drives; the installation works with a USB 2.0 stick.

It might be related to:

Fix lint warnings

There are three lint warnings when building bloody, caused by the newer version of lint. These cause nightly to report a build failure even when the build was otherwise successful.

==== lint warnings src ====

"../../common/os/acct.c", line 438: warning: assignment causes implicit narrowing
conversion (E_ASSIGN_NARROW_CONV)

"/data/omnios-build/omniosorg/omnios.bloody/usr/src/uts/common/io/iwn/if_iwn.c",
 line 2052: warning: function returns value which is sometimes ignored: memset
(E_FUNC_RET_MAYBE_IGNORED2)

"/data/omnios-build/omniosorg/omnios.bloody/usr/src/uts/common/os/sysent.c",
line 1213: warning: function returns value which is always ignored: yield
(E_FUNC_RET_ALWAYS_IGNOR2)

eventfd() failed (38: Function not implemented)

It seems our lx is missing this patch: TritonDataCenter@662ce04

Ldap crash causing system to hang (illumos 8543)

From the mailing list:

Date: Mon, 31 Jul 2017 07:05:14 +0000
From: Oliver Weinmann
To: omnios-discuss
Subject: [OmniOS-discuss] Ldap crash causing system to hang fixed in    illumos>

Hi Guys,

I'm currently facing this bug under OmniOS 151022 and I just got informed that
this has been fixed:

https://www.illumos.org/issues/8543

As this bug is causing a complete system hang only a reboot helps. Can this
maybe be implemented?

Best Regards,
Oliver

ESXi / OmniOS passthrough (MSIX) problem Intel P4xxx P9xx

Hi,
can anybody solve this long-standing problem?
The last discussion of it was here:
https://illumos.topicbox.com/groups/discuss/T3388f5869edff6a1/esxi-omnios-passthrough-problem-intel-p4510

I can't program, but I can help with testing.
In the next few weeks I'll install a new ESXi test server with a P4600 and a P900.

grep dumps core with -C -B flags

Using built-in grep with -C or -B flags dumps core.

SunOS itstore.vsm.in 5.11 omnios-r151026-673c59f55d i86pc i386 i86pc Solaris
root@omnios:~# which grep
/usr/bin/grep
root@omnios:~# which ggrep
/usr/bin/ggrep

Test:

root@omnios:~# /usr/sbin/prtconf -v | grep -A 1 "6535/b"
            value='id1,kdev@w0025385971b16535/b'
        name='keyboard-layout' type=string items=1
Segmentation Fault (core dumped)

Same if the flags are implicit:

root@omnios:~# /usr/sbin/prtconf -v | grep -1 "6535/b"
            value='id1,kdev@w0025385971b16535/b'
        name='keyboard-layout' type=string items=1
Segmentation Fault (core dumped)

ggrep seems to be fine:

root@omnios:~# /usr/sbin/prtconf -v | ggrep -1 "6535/b"
        name='diskdevid' type=string items=1
            value='id1,kdev@w0025385971b16535/b'
        name='keyboard-layout' type=string items=1

If you wish, I can file a report on the illumos.org tracker, but I have no way to check this on other illumos-based operating systems.

__rpcb_findaddr_timed fails the NFS mount process when NFS server doesn't support rpcbind version 4

During the NFSv3 mount process, __rpcb_findaddr_timed first tries rpcbind version 4 and issues an RPCBPROC_GETADDRLIST call.
If the NFSv3 server doesn't support RPCBPROC_GETADDRLIST, __rpcb_findaddr_timed expects it to return RPC_PROCUNAVAIL or RPC_PROGUNAVAIL, and then proceeds with version 3.
However, some NFS servers on the market return NFS3ERR_NOTSUPP in this scenario, which __rpcb_findaddr_timed does not expect, so the NFSv3 mount fails.
I have verified that other Linux/Unix systems, such as CentOS or Solaris, use GETPORT instead of GETADDRLIST, and with that approach the NFSv3 mount works.
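
The fallback decision described above can be modelled in a few lines. This is a toy model only, not the libnsl code. The status values are plain strings, `try_v4` and `try_v3` are hypothetical callables that return a (status, address) pair, and `extra_fallback` is an invented knob showing how tolerating an unexpected status would change the outcome.

```python
RPC_SUCCESS = "RPC_SUCCESS"
# Statuses that, per the description above, trigger the retry with version 3.
FALLBACK_OK = {"RPC_PROCUNAVAIL", "RPC_PROGUNAVAIL"}


def find_addr(try_v4, try_v3, extra_fallback=()):
    """Return (address, rpcbind_version) or raise LookupError(status)."""
    status, addr = try_v4()
    if status == RPC_SUCCESS:
        return addr, 4
    if status in FALLBACK_OK or status in extra_fallback:
        status, addr = try_v3()
        if status == RPC_SUCCESS:
            return addr, 3
    raise LookupError(status)


def odd_server_v4():
    """Hypothetical server that answers GETADDRLIST with an unexpected status."""
    return "NFS3ERR_NOTSUPP", None


def server_v3():
    """Hypothetical successful version 3 lookup; the address is made up."""
    return RPC_SUCCESS, "192.0.2.10.8.1"
```

With the strict set, `find_addr(odd_server_v4, server_v3)` raises LookupError, mirroring the failed mount. Passing `extra_fallback={"NFS3ERR_NOTSUPP"}` lets the same lookup succeed through version 3.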

lx zone installation issue

It looks like, after the zone boots, it loops registering and deregistering its VNIC.
zoneadm shows the zone in the "ready" state, which does not allow shutdown.

Everything is explained in https://illumos.topicbox.com/groups/omnios-discuss/Tef54c5d573098a70-Md5ce4483a6aafb8db3b63c2b

kernel panic on halt - BAD TRAP: type=e (#pf Page fault) rp=fffffe00b6e59b50 addr=0 occurred in module "usba" due to a NULL pointer dereference

SunOS hvfs2 5.11 omnios-r151030-521a1fc4d1 i86pc i386 i86pc

We're using NUT (Network UPS Tools) to monitor our UPSs for power loss. Last week on the 23rd we had an event and the OmniOS hosts started shutting down. However, it seems the newest two (out of 4 total) panicked on the way down. Both panics were the same.

As part of a scheduled maintenance window I was testing the NUT config on the 25th and evidently triggered two more panics. Here's the last one from syslog:

Dec 25 10:06:55 chrufs nrpe[522]: [ID 702911 daemon.notice] Caught SIGTERM - shutting down...
Dec 25 10:06:55 chrufs nrpe[522]: [ID 702911 daemon.notice] Daemon shutdown
Dec 25 10:06:55 chrufs pseudo: [ID 129642 kern.info] pseudo-device: pm0
Dec 25 10:06:55 chrufs genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
Dec 25 10:06:55 chrufs syslogd: going down on signal 15
Dec 25 10:06:58 chrufs rpcbind: [ID 240694 daemon.error] rpcbind terminating on signal 15.
Dec 25 10:07:02 chrufs genunix: [ID 672855 kern.notice] syncing file systems...
Dec 25 10:07:02 chrufs genunix: [ID 904073 kern.notice] done
Dec 25 10:07:04 chrufs unix: [ID 836849 kern.notice]
Dec 25 10:07:04 chrufs ^Mpanic[cpu8]/thread=fffffe83d37b7420:
Dec 25 10:07:04 chrufs genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe00b6e59b50 addr=0 occurred in module "usba" due to a NULL pointer dereference
Dec 25 10:07:04 chrufs unix: [ID 100000 kern.notice]
Dec 25 10:07:04 chrufs unix: [ID 839527 kern.notice] svc.startd:
Dec 25 10:07:04 chrufs unix: [ID 753105 kern.notice] #pf Page fault
Dec 25 10:07:04 chrufs unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x0
Dec 25 10:07:04 chrufs unix: [ID 243837 kern.notice] pid=9, pc=0xfffffffff7b87f33, sp=0xfffffe00b6e59c40, eflags=0x10282
Dec 25 10:07:04 chrufs unix: [ID 619397 kern.notice] cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 3626f8<smap,smep,osxsav,pcide,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
Dec 25 10:07:05 chrufs unix: [ID 152204 kern.notice] cr2: 0
Dec 25 10:07:05 chrufs unix: [ID 634440 kern.notice] cr3: 7000000
Dec 25 10:07:05 chrufs unix: [ID 625715 kern.notice] cr8: f
Dec 25 10:07:05 chrufs unix: [ID 100000 kern.notice]
Dec 25 10:07:05 chrufs unix: [ID 592667 kern.notice] rdi: 0 rsi: 0 rdx: 8
Dec 25 10:07:05 chrufs unix: [ID 592667 kern.notice] rcx: 7 r8: 0 r9: 0
Dec 25 10:07:05 chrufs unix: [ID 592667 kern.notice] rax: fffffe83d317adc0 rbx: 0 rbp: fffffe00b6e59c60
Dec 25 10:07:05 chrufs unix: [ID 592667 kern.notice] r10: e0 r11: 0 r12: 2
Dec 25 10:07:05 chrufs unix: [ID 592667 kern.notice] r13: 1 r14: 0 r15: fffffe83d2e52d40
Dec 25 10:07:06 chrufs unix: [ID 592667 kern.notice] fsb: 0 gsb: fffffe83d1c0d000 ds: 4b
Dec 25 10:07:04 chrufs unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Dec 25 10:07:04 chrufs unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff7b87f33
Dec 25 10:07:04 chrufs unix: [ID 592667 kern.notice] cs: 30 rfl: 10282 rsp: fffffe00b6e59c40
Dec 25 10:07:04 chrufs unix: [ID 266532 kern.notice] ss: 38
Dec 25 10:07:04 chrufs unix: [ID 100000 kern.notice]
Dec 25 10:07:04 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59a40 unix:real_mode_stop_cpu_stage2_end+c8fc ()
Dec 25 10:07:04 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59b40 unix:trap+15b1 ()
Dec 25 10:07:04 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59b50 unix:cmntrap+e6 ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59c60 usba:usb_console_input_enter+13 ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59c80 hid:hid_polled_input_enter+18 ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59ca0 usbkbm:usbkbm_polled_enter+42 ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59cd0 conskbd:conskbd_polledio_enter+38 ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59cf0 wc:wc_polled_enter+24 ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59d00 genunix:prom_poll_enter+1f ()
Dec 25 10:07:05 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59d10 genunix:prom_exit_to_mon+9 ()
Dec 25 10:07:06 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59d40 unix:kdi_isr_end+4873 ()
Dec 25 10:07:06 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59da0 unix:mdboot+116 ()
Dec 25 10:07:04 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59e30 genunix:kadmin+446 ()
Dec 25 10:07:04 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59eb0 genunix:uadmin+16d ()
Dec 25 10:07:04 chrufs genunix: [ID 655072 kern.notice] fffffe00b6e59f10 unix:brand_sys_syscall32+1aa ()
Dec 25 10:07:04 chrufs unix: [ID 100000 kern.notice]
Dec 25 10:07:04 chrufs genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel + curproc
Dec 25 10:07:04 chrufs ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 4 reset port
Dec 25 10:07:04 chrufs ahci: [ID 405573 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 5 reset port
Dec 25 10:07:04 chrufs genunix: [ID 100000 kern.notice]
Dec 25 10:07:04 chrufs genunix: [ID 665016 kern.notice] ^M100% done: 1296002 pages dumped,
Dec 25 10:07:04 chrufs genunix: [ID 851671 kern.notice] dump succeeded

I didn't realize it at the time (because powerloss) so I didn't grab a savecore. However, it is possible the hosts never needed to swap so I did a savecore -v now and it didn't give any errors.

[[[[,,,savecore -v
,,,SunOS chrufs 5.11 omnios-r151030-521a1fc4d1 i86pc
,,,Crash dump time Wed Dec 25 10:07:04 2019

,,,/var/crash/chrufs/vmdump.0
Metrics:
Master cpu_seqid,8
Master cpu_id,8
dump_flags,0x4000b
dump_ioerr,0
Helpers:
,,000, * * * * * * * * M * * * * * * *
ncbuf_used,1
ncmap,1
Found 4M ranges,0
Found small pages,0
Compression level,0
Compression type,serial lzjb
Compression ratio,6.24
nhelper_used,1
Dump I/O rate MBS,63.90
..total bytes,849408000
..total nsec,13291674186
dumpbuf.iosize,131072
dumpbuf.size,131072
Dump pages/sec,48000
Dump pages,1296002
Dump time,27
per-cent map utilization,87

Per-page metrics:
bitmap nsec/page,14
map nsec/page,0
unmap nsec/page,1
copy nsec/page,1596
compress nsec/page,6462
write nsec/page,9675
inwait nsec/page,0
outwait nsec/page,0
freebufq.empty,0
helperq.empty,0
writerq.empty,0
mainq.empty,17828
I/O wait nsec/page,4134558

Copy pages,1296002
Copy time,4
Copy pages/sec,324000
]]]]
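
As a quick sanity check, the derived rates in the metrics above can be recomputed from the raw counters. The sketch below is only an illustration: the values are copied from the output above, the helper name `parse_metrics` is made up, and reading "MBS" as decimal megabytes per second is an inference from the numbers, not something savecore documents here.

```python
# Values copied verbatim from the savecore -v metrics above.
metrics_text = """\
Dump I/O rate MBS,63.90
..total bytes,849408000
..total nsec,13291674186
Dump pages/sec,48000
Dump pages,1296002
Dump time,27
Copy pages,1296002
Copy time,4
Copy pages/sec,324000
"""


def parse_metrics(text):
    """Parse 'name,value' lines into a dict of floats (hypothetical helper)."""
    out = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(",")
        if not sep:
            continue  # not a name,value line
        try:
            out[name.lstrip(".")] = float(value)
        except ValueError:
            continue  # non-numeric value, ignore
    return out


m = parse_metrics(metrics_text)

# Pages per second = pages / elapsed seconds.
dump_rate = m["Dump pages"] / m["Dump time"]
copy_rate = m["Copy pages"] / m["Copy time"]

# Assumption: "MBS" means 10**6 bytes per second.
io_mb_per_s = m["total bytes"] / (m["total nsec"] / 1e9) / 1e6

print("dump pages/sec: %.1f (reported %d)" % (dump_rate, m["Dump pages/sec"]))
print("copy pages/sec: %.1f (reported %d)" % (copy_rate, m["Copy pages/sec"]))
print("I/O rate MB/s:  %.2f (reported %.2f)" % (io_mb_per_s, m["Dump I/O rate MBS"]))
```

The recomputed figures agree with the reported ones to within rounding, so the metrics are at least internally consistent.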

What, if any, information would you like me to try to pull from the cores?

nomad

Allow CARP in bhyve-zones

When setting up CARP between two OpenBSD guests in bhyve, each with its own VNIC, the multicast MAC address gets filtered and the shared IP is not reachable.
Adding the multicast MAC address as a secondary address with "dladm set-linkprop" on the VNIC only succeeds on one host.
Setting it on a second machine returns the following error:
dladm: warning: cannot set link property 'secondary-macs' on 'znic1': object already exists
Would it be possible to allow a MAC address to be shared among two hosts?
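
For reference, a minimal sketch of the command being described, assuming the address in question is the CARP virtual MAC, which OpenBSD derives from the virtual host ID as 00:00:5e:00:01:<vhid>. The VHID of 1 is purely illustrative, the link name `znic1` is taken from the error message above, and both function names are hypothetical. The code only builds the command line; it does not run it and does not work around the "object already exists" error.

```python
def carp_virtual_mac(vhid):
    """Return the CARP virtual MAC for a virtual host ID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    return "00:00:5e:00:01:%02x" % vhid


def secondary_macs_command(link, vhid):
    """Build (but do not run) the dladm invocation described in the report."""
    prop = "secondary-macs=" + carp_virtual_mac(vhid)
    return ["dladm", "set-linkprop", "-p", prop, link]


# Illustrative values only: VHID 1 on the VNIC named in the error message.
print(" ".join(secondary_macs_command("znic1", 1)))
```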

/opt/onbld/env/omnios-* files need smatch

This may be folded into a larger smatch issue, but at the very least the sample env files should include smatch once it officially shows up in illumos-gate. Maybe commented-out configs would be nice in the short-term.

WARNING: vmem_destroy('ipf_minor'): leaked 65949 identifiers

Small svcs(1) fixes for more flexible zones

Thinking about cherry picking the 3 patches mentioned in the mailing list below, as they have (still) not been upstreamed.

https://illumos.topicbox.com/groups/developer/discussions/T8ca4788d89bc77de-Md7dfdeaceadd36b473495e60

Mainly as an exercise and to keep momentum here.

Any objections? @danmcd, do you know by chance if Joyent is still in the process of upstreaming these?

assertion failed: eop != 0, file: ../../common/io/i40e/i40e_transceiver.c, line: 1440

While trying to debug a slowness problem with aggregated network links (trunks), I was attempting to see whether we were getting the same speed with just one link as with two. To that end I physically removed one of the two links on aggr0 (twin i40e NICs). The speed did not change. I then reinstalled the link and paused briefly after the link light came back on. I then pulled the other link (to verify that traffic was actually able to flow over both links) and the kernel panicked.

aggr_nas0 aggr 1500 up -- i40e2,i40e3
aggr_net10 aggr 1500 up -- igb0,igb1
aggr0 aggr 1500 up -- i40e0,i40e1

fs2-i40e-panic_crash.txt

Disk subsystem is completely left off: no parted in pkg

There's seemingly an assumption that there're no 4K sector sized drives in the world. Well, there are (some).

Some fixes from OS-6175 would reduce build noise

A current build generates mandoc lint noise from lx.5 and zfd.7d

This is fixed in this commit for illumos-joyent:

TritonDataCenter@4a59c82#diff-aa688d062b10cee7ba0cd59cea7d7c8d

and the Joyent bugid is OS-6175

We only really want the changes to the two files mentioned, not the other 24 in that commit. (All of the manpages in illumos-gate were fixed before the mandoc update, to eliminate this noise.)

Write performance regression compared to r151014 ?

Howdy!

I have about 20 machines built on this image (circa 2015?):

OmniOS_Text_r151014.usb-dd

I just noticed that OmniOS is being continued as omniosce.org. I downloaded the latest install image and did some testing, and I got about 1/3 to 1/2 of the write performance I was expecting on a simple RAIDZ setup.

I installed fresh images from omnios.omniti.com and omniosce.org based on r22:

r151022.usb-dd

as well as the r14 image above.

I get about 1/3 to 1/2 of the write performance with r22 compared to r14. It is a simple write test using dd, with performance measured by zpool iostat.
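
For anyone wanting to reproduce a comparable measurement, here is a rough sketch of a dd-style sequential write test. It is not the reporter's actual test: the function name, block size, and total size are assumptions, and it writes random data so that ZFS compression (if enabled) does not inflate the result. Running the same script against the same pool under each release would give directly comparable numbers.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes to path in block_size chunks; return decimal MB/s."""
    block = os.urandom(block_size)  # incompressible payload
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / elapsed / 1e6


# Small demo run in a temporary directory; for a real test, point the path at
# a file on the pool under test and use a size well beyond the ARC.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "ddtest.bin")
    rate = sequential_write_mb_per_s(target, 8 * 1024 * 1024)
    size = os.path.getsize(target)

print("wrote %d bytes at %.1f MB/s" % (size, rate))
```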

The hardware and zpool are identical for each test. I simply swapped out the boot disks and booted a different image.

The hardware is simple: DELL R710 with LSI-SAS9201-16e.

Thoughts?

I would like to help resolve this if it interests anyone.

r14 suits my purpose well enough, but this issue must affect others...

-steve

Small typo in zlogin man page

"Standalone-processs Interactive Mode" should perhaps be "Standalone-process Interactive Mode"
https://github.com/omniosorg/illumos-omnios/blob/master/usr/src/man/man1/zlogin.1#L87

/usr/lib/smbsrv/smbd dumping core on startup

Host is running 151038. SMB services suddenly stopped. In debugging the problem I see the following in /var/svc/log/network-smb-server:default.log:

[ Aug 12 06:30:35 Executing start method ("/usr/lib/smbsrv/smbd start"). ]
smbd: smbd starting, pid 22478
smbd: NetBIOS services disabled
smbd: service initialized
[ Aug 12 06:30:35 Method "start" exited with status 0. ]
smbd_dc_monitor: online
smbd_localtime_monitor: online
@ Thu Aug 12 06:30:46 2021
smbd.info: smbd_dc_update: [redacted]: located [redacted]
[ Aug 12 06:31:46 Stopping because process dumped core. ]
[ Aug 12 06:31:46 Executing stop method (:kill). ]

This was followed by multiple retry attempts - all with the same result - then ultimately giving up since the service was restarting too frequently.

I had to get things working again quickly so rebooted the host and it has been stable for the past hour.

Kernel SMB (CIFS) fails to work by name after joining the domain; it works only by IP, from a Windows 10 client

OK. How to make R151026 fail with SMB:

Install a brand new, latest OmniOS from ISO, set a static IP, and set a hostname (ax55)
Set DNS and gateway on OmniOS (DNS set to the Windows Server AD DNS for local resolution), then run pkg update
Add a DNS record for the OmniOS IP in the Windows DNS server
Install Napp-it and add a simple pool with a filesystem and an SMB share (not guest enabled)
Test access to the share using both the DNS name of the OmniOS host and its IP; it will ask for a password, which is OK
Go to Napp-it and join the OmniOS server to the AD domain as usual; this works for me every time (Server 2016, 2012, 2008, whatever)
Open the share again from Windows 10 (\\ax55); if you are lucky it will open without asking for a password and work for a few minutes
Wait some time, or just reboot the Windows 10 PC from which you are accessing the \\ax55 share.
Try to open the share again (\\ax55). It fails with either "windows can't find...." or "unspecified error ..."

You can still resolve the name ax55 to its IP from the command prompt, and you can still ping the OmniOS host by IP and by hostname from Windows 10; only the share is not working.
You can still access the share by using the IP directly (\\192.168....)
It still works in Windows 7, XP, Server 2008, etc.
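
One way to narrow this down is to confirm that the SMB port itself is reachable both by name and by IP, which would point away from DNS or the network path and towards what happens after the TCP connection is made. The sketch below is only a diagnostic aid under assumptions: the function names are hypothetical, port 445 is the standard SMB-over-TCP port, and a successful connect says nothing about authentication or the SMB negotiation itself.

```python
import socket


def resolve(host):
    """Return the sorted addresses the local resolver gives for host."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_port_open(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def compare(name, ip, port=445):
    """Collect resolution and reachability results for a name and an IP."""
    return {
        "resolved": resolve(name),
        "by_name": tcp_port_open(name, port),
        "by_ip": tcp_port_open(ip, port),
    }


# Example call (not executed here); substitute the server's real address:
#   print(compare("ax55", "<server IP>"))
```

If both `by_name` and `by_ip` come back true while the share still fails only by name, the difference lies above the TCP layer.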

Now you can disconnect the OmniOS server from the domain, reboot it, whatever, and the share will never work (by name) again in Windows 10.
If I change the hostname, it works again until the server is joined to the domain, and then it fails as described above.

R151022 works as expected, nothing breaks, everything works. Something was changed from R151022 to R151026 to cause this...

Disabling or enabling SMB 1 doesn't change anything; I tried it all.

pkg:/archiver/gnu-tar missing dependency

pkg:/archiver/gnu-tar has a dependency on pkg:/compress/gzip (needs it for gzcat) which is not part of the manifest. I can't see the manifest in the source tree...