litedram's People

Contributors

acomodi, andrewb1999, antonblanchard, bunnie, cklarhorst, craigjb, enjoy-digital, gatecat, gsomlo, hansfbaier, jedrzejboczar, jersey99, johnsully, kgugala, maribu, mateusz-holenko, mglb, michalsieron, mithro, mkj, oskirby, ozbenh, rrozak, sd-fritze, teknoman117, timkpaine, tongchen126, trabucayre, ximinity, xiretza

litedram's Issues

tRRD and tFAW not respected

This is hidden when using only a single user port, because the time to complete a command exceeds these timings. However, when operating with multiple ports it is possible to violate both tRRD and tFAW, as the mux will issue a row command every cycle whenever a bank requests one.
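As a rough illustration of the constraint the mux would have to enforce, here is a minimal software sketch of an ACTIVATE checker. The class and method names are hypothetical and not part of LiteDRAM; timings are given in cycles and the values are made up for the example.

```python
from collections import deque


class ActivateTimingChecker:
    """Hypothetical model: tracks ACTIVATE commands across all banks."""

    def __init__(self, trrd, tfaw):
        self.trrd = trrd                 # min cycles between two ACTIVATEs
        self.tfaw = tfaw                 # window that may hold at most four ACTIVATEs
        self.history = deque(maxlen=4)   # cycles of the last four ACTIVATEs

    def can_activate(self, cycle):
        # tRRD: distance to the most recent ACTIVATE.
        if self.history and cycle - self.history[-1] < self.trrd:
            return False
        # tFAW: a fifth ACTIVATE must wait until tFAW after the first of the last four.
        if len(self.history) == 4 and cycle - self.history[0] < self.tfaw:
            return False
        return True

    def activate(self, cycle):
        if not self.can_activate(cycle):
            raise ValueError("ACTIVATE at cycle %d violates tRRD/tFAW" % cycle)
        self.history.append(cycle)


checker = ActivateTimingChecker(trrd=4, tfaw=20)
for cycle in (0, 4, 8, 12):
    checker.activate(cycle)
```

After four ACTIVATEs at cycles 0, 4, 8 and 12, a fifth one at cycle 16 satisfies tRRD but not tFAW, which is the situation a single-port test would never reach.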

List various use cases, find the best architecture and document how to get the best efficiency.

The controller can be used on various systems, each with specific needs. The current architecture tries to be simple and give reasonable efficiency/performance for most use cases.

Until now, performance has been good on the access patterns in use (mostly linear data buffers and storing CPU code). We still need to do #8 and #57 to have better metrics.

With more users, new use cases appear with different access patterns, and we need to understand what the potential bottlenecks are and how to address them.

Add BL8 support for S7DDRPHY in 1:2

S7DDRPHY works with:

  • DDR2 / 2 phases / BL4.
  • DDR3 / 4 phases / BL8.

To support DDR3 / 2 phases / BL8, the PHY needs some modifications.

Here is an old modified version of the PHY that has these modifications and that should be merged cleanly:

# 1:4, 1:2 frequency-ratio DDR2/DDR3 PHY for Xilinx's Series7
# DDR2: 400, 533, 667, 800 and 1066 MT/s
# DDR3: 1066, 1333 and 1600 MT/s

import math

from migen import *

from litex.soc.interconnect.csr import *

from litedram.common import PhySettings
from litedram.phy.dfi import *


def get_cl_cw(memtype, tck):
    if memtype == "DDR2":
        # ddr2-400
        if tck >= 2/400e6:
            cl, cwl = 3, 2
        # ddr2-533
        elif tck >= 2/533e6:
            cl, cwl = 4, 3
        # ddr2-667
        elif tck >= 2/667e6:
            cl, cwl = 5, 4
        # ddr2-800
        elif tck >= 2/800e6:
            cl, cwl = 6, 5
        # ddr2-1066
        elif tck >= 2/1066e6:
            cl, cwl = 7, 5
        else:
            raise ValueError
    elif memtype == "DDR3":
        # ddr3-1066
        if tck >= 2/1066e6:
            cl, cwl = 7, 6
        # ddr3-1333
        elif tck >= 2/1333e6:
            cl, cwl = 10, 7
        # ddr3-1600
        elif tck >= 2/1600e6:
            cl, cwl = 11, 8
        else:
            raise ValueError
    return cl, cwl

def get_sys_latency(nphases, cas_latency):
    return math.ceil(cas_latency/nphases)

def get_sys_phases(nphases, sys_latency, cas_latency, write=False):
    cmd_phase = 0
    dat_phase = 0
    diff_phase = 0
    while (diff_phase + cas_latency) != sys_latency*nphases:
        dat_phase += 1
        if dat_phase == nphases:
            dat_phase = 0
            cmd_phase += 1
        if write:
            diff_phase = dat_phase - cmd_phase
        else:
            diff_phase = cmd_phase - dat_phase
    return cmd_phase, dat_phase


class S7DDRPHY(Module, AutoCSR):
    def __init__(self, pads, with_odelay, memtype="DDR3", nphases=4, sys_clk_freq=100e6, iodelay_clk_freq=200e6):
        tck = 2/(2*nphases*sys_clk_freq)
        addressbits = len(pads.a)
        bankbits = len(pads.ba)
        databits = len(pads.dq)
        nphases = nphases

        iodelay_tap_average = {
            200e6: 78e-12,
            300e6: 52e-12,
        }
        half_sys8x_taps = math.floor(tck/(4*iodelay_tap_average[iodelay_clk_freq]))
        self._half_sys8x_taps = CSRStorage(4, reset=half_sys8x_taps)

        if with_odelay:
            self._wlevel_en = CSRStorage()
            self._wlevel_strobe = CSR()

        self._dly_sel = CSRStorage(databits//8)

        self._rdly_dq_rst = CSR()
        self._rdly_dq_inc = CSR()
        self._rdly_dq_bitslip_rst = CSR()
        self._rdly_dq_bitslip = CSR()

        if with_odelay:
            self._wdly_dq_rst = CSR()
            self._wdly_dq_inc = CSR()
            self._wdly_dqs_rst = CSR()
            self._wdly_dqs_inc = CSR()

        # compute phy settings
        cl, cwl = get_cl_cw(memtype, tck)
        cl_sys_latency = get_sys_latency(nphases, cl)
        cwl_sys_latency = get_sys_latency(nphases, cwl)

        rdcmdphase, rdphase = get_sys_phases(nphases, cl_sys_latency, cl)
        wrcmdphase, wrphase = get_sys_phases(nphases, cwl_sys_latency, cwl, write=True)
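        # NOTE: this overrides the computed write command phase; in 1:2 mode the
        # computed value can equal wrphase (see the 1:2 write command issue).
        wrcmdphase = 1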
        print("wrcmdphase: " + str(wrcmdphase) + " wrphase: " + str(wrphase))
        self.settings = PhySettings(
            memtype=memtype,
            dfi_databits=4*databits,
            nphases=nphases,
            rdphase=rdphase,
            wrphase=wrphase,
            rdcmdphase=rdcmdphase,
            wrcmdphase=wrcmdphase,
            cl=cl,
            cwl=cwl,
            read_latency=2 + cl_sys_latency + 2 + 1,
            write_latency=cwl_sys_latency
        )

        self.dfi = Interface(addressbits, bankbits, 4*databits, 4)

        # # #

        bl8_sel = Signal()

        # Clock
        ddr_clk = "sys2x" if nphases == 2 else "sys4x"
        for i in range(len(pads.clk_p)):
            sd_clk_se = Signal()
            self.specials += [
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=sd_clk_se,
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=0, i_D2=1, i_D3=0, i_D4=1,
                    i_D5=0, i_D6=1, i_D7=0, i_D8=1
                ),
                Instance("OBUFDS",
                    i_I=sd_clk_se,
                    o_O=pads.clk_p[i],
                    o_OB=pads.clk_n[i]
                )
            ]

        # Addresses and commands
        for i in range(addressbits):
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=pads.a[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=self.dfi.phases[0].address[i], i_D2=self.dfi.phases[0].address[i],
                    i_D3=self.dfi.phases[1].address[i], i_D4=self.dfi.phases[1].address[i],
                    i_D5=self.dfi.phases[2].address[i], i_D6=self.dfi.phases[2].address[i],
                    i_D7=self.dfi.phases[3].address[i], i_D8=self.dfi.phases[3].address[i]
                )
        for i in range(bankbits):
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=pads.ba[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=self.dfi.phases[0].bank[i], i_D2=self.dfi.phases[0].bank[i],
                    i_D3=self.dfi.phases[1].bank[i], i_D4=self.dfi.phases[1].bank[i],
                    i_D5=self.dfi.phases[2].bank[i], i_D6=self.dfi.phases[2].bank[i],
                    i_D7=self.dfi.phases[3].bank[i], i_D8=self.dfi.phases[3].bank[i]
                )
        controls = ["ras_n", "cas_n", "we_n", "cke", "odt"]
        if hasattr(pads, "reset_n"):
            controls.append("reset_n")
        if hasattr(pads, "cs_n"):
            controls.append("cs_n")
        for name in controls:
            self.specials += \
                Instance("OSERDESE2",
                   p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                   p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                   p_SERDES_MODE="MASTER",

                   o_OQ=getattr(pads, name),
                   i_OCE=1,
                   i_RST=ResetSignal(),
                   i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                   i_D1=getattr(self.dfi.phases[0], name), i_D2=getattr(self.dfi.phases[0], name),
                   i_D3=getattr(self.dfi.phases[1], name), i_D4=getattr(self.dfi.phases[1], name),
                   i_D5=getattr(self.dfi.phases[2], name), i_D6=getattr(self.dfi.phases[2], name),
                   i_D7=getattr(self.dfi.phases[3], name), i_D8=getattr(self.dfi.phases[3], name)
                )

        # DQS and DM
        oe_dqs = Signal()
        dqs_serdes_pattern = Signal(8, reset=0b01010101)
        if with_odelay:
            self.comb += \
                If(self._wlevel_en.storage,
                    If(self._wlevel_strobe.re,
                        dqs_serdes_pattern.eq(0b00000001)
                    ).Else(
                        dqs_serdes_pattern.eq(0b00000000)
                    )
                ).Else(
                    dqs_serdes_pattern.eq(0b01010101)
                )
        for i in range(databits//8):
            dm_o_nodelay = Signal()
            dm_data = Signal(8)
            dm_data_d = Signal(8)
            dm_data_muxed = Signal(4)
            self.comb += dm_data.eq(Cat(
                self.dfi.phases[0].wrdata_mask[0*databits//8+i], self.dfi.phases[0].wrdata_mask[1*databits//8+i],
                self.dfi.phases[0].wrdata_mask[2*databits//8+i], self.dfi.phases[0].wrdata_mask[3*databits//8+i],
                self.dfi.phases[1].wrdata_mask[0*databits//8+i], self.dfi.phases[1].wrdata_mask[1*databits//8+i],
                self.dfi.phases[1].wrdata_mask[2*databits//8+i], self.dfi.phases[1].wrdata_mask[3*databits//8+i]),
            )
            self.sync += dm_data_d.eq(dm_data)
            self.comb += \
                If(bl8_sel,
                    dm_data_muxed.eq(dm_data_d[4:])
                ).Else(
                    dm_data_muxed.eq(dm_data[:4])
                )
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=dm_o_nodelay if with_odelay else pads.dm[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=dm_data_muxed[0], i_D2=dm_data_muxed[1],
                    i_D3=dm_data_muxed[2], i_D4=dm_data_muxed[3]
                )
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=0,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i] & self._wdly_dq_rst.re,
                        i_CE=self._dly_sel.storage[i] & self._wdly_dq_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dm_o_nodelay, o_DATAOUT=pads.dm[i]
                    )

            dqs_nodelay = Signal()
            dqs_delayed = Signal()
            dqs_t = Signal()
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OFB=dqs_nodelay if with_odelay else Signal(),
                    o_OQ=Signal() if with_odelay else dqs_nodelay,
                    o_TQ=dqs_t,
                    i_OCE=1, i_TCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk) if with_odelay else ClockSignal(ddr_clk+"_dqs"), i_CLKDIV=ClockSignal(),
                    i_D1=dqs_serdes_pattern[0], i_D2=dqs_serdes_pattern[1],
                    i_D3=dqs_serdes_pattern[2], i_D4=dqs_serdes_pattern[3],
                    i_D5=dqs_serdes_pattern[4], i_D6=dqs_serdes_pattern[5],
                    i_D7=dqs_serdes_pattern[6], i_D8=dqs_serdes_pattern[7],
                    i_T1=~oe_dqs
                )
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=half_sys8x_taps,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i] & self._wdly_dqs_rst.re,
                        i_CE=self._dly_sel.storage[i] & self._wdly_dqs_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dqs_nodelay, o_DATAOUT=dqs_delayed
                    )
            self.specials += \
                Instance("OBUFTDS",
                    i_I=dqs_delayed if with_odelay else dqs_nodelay, i_T=dqs_t,
                    o_O=pads.dqs_p[i], o_OB=pads.dqs_n[i]
                )

        # DQ
        oe_dq = Signal()
        for i in range(databits):
            dq_o_nodelay = Signal()
            dq_o_delayed = Signal()
            dq_i_nodelay = Signal()
            dq_i_delayed = Signal()
            dq_t = Signal()
            dq_data = Signal(8)
            dq_data_d = Signal(8)
            dq_data_muxed = Signal(4)
            self.comb += dq_data.eq(Cat(
                self.dfi.phases[0].wrdata[0*databits+i], self.dfi.phases[0].wrdata[1*databits+i],
                self.dfi.phases[0].wrdata[2*databits+i], self.dfi.phases[0].wrdata[3*databits+i],
                self.dfi.phases[1].wrdata[0*databits+i], self.dfi.phases[1].wrdata[1*databits+i],
                self.dfi.phases[1].wrdata[2*databits+i], self.dfi.phases[1].wrdata[3*databits+i])
            )
            self.sync += dq_data_d.eq(dq_data)
            self.comb += \
                If(bl8_sel,
                    dq_data_muxed.eq(dq_data_d[4:])
                ).Else(
                    dq_data_muxed.eq(dq_data[:4])
                )
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=dq_o_nodelay, o_TQ=dq_t,
                    i_OCE=1, i_TCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=dq_data_muxed[0], i_D2=dq_data_muxed[1],
                    i_D3=dq_data_muxed[2], i_D4=dq_data_muxed[3],
                    i_T1=~oe_dq
                )
            dq_i_data = Signal(8)
            dq_i_data_d = Signal(8)
            self.specials += \
                Instance("ISERDESE2",
                    p_DATA_WIDTH=2*nphases, p_DATA_RATE="DDR",
                    p_SERDES_MODE="MASTER", p_INTERFACE_TYPE="NETWORKING",
                    p_NUM_CE=1, p_IOBDELAY="IFD",

                    i_DDLY=dq_i_delayed,
                    i_CE1=1,
                    i_RST=ResetSignal() | (self._dly_sel.storage[i//8] & self._rdly_dq_bitslip_rst.re),
                    i_CLK=ClockSignal(ddr_clk), i_CLKB=~ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_BITSLIP=self._dly_sel.storage[i//8] & self._rdly_dq_bitslip.re,
                    o_Q8=dq_i_data[7], o_Q7=dq_i_data[6],
                    o_Q6=dq_i_data[5], o_Q5=dq_i_data[4],
                    o_Q4=dq_i_data[3], o_Q3=dq_i_data[2],
                    o_Q2=dq_i_data[1], o_Q1=dq_i_data[0]
                )
            self.sync += dq_i_data_d.eq(dq_i_data)
            self.comb += [
                self.dfi.phases[0].rddata[0*databits+i].eq(dq_i_data_d[3]), self.dfi.phases[0].rddata[1*databits+i].eq(dq_i_data_d[2]),
                self.dfi.phases[0].rddata[2*databits+i].eq(dq_i_data_d[1]), self.dfi.phases[0].rddata[3*databits+i].eq(dq_i_data_d[0]),
                self.dfi.phases[1].rddata[0*databits+i].eq(dq_i_data[3]), self.dfi.phases[1].rddata[1*databits+i].eq(dq_i_data[2]),
                self.dfi.phases[1].rddata[2*databits+i].eq(dq_i_data[1]), self.dfi.phases[1].rddata[3*databits+i].eq(dq_i_data[0]),
            ]

            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=0,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i//8] & self._wdly_dq_rst.re,
                        i_CE=self._dly_sel.storage[i//8] & self._wdly_dq_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dq_o_nodelay, o_DATAOUT=dq_o_delayed
                    )
            self.specials += \
                Instance("IDELAYE2",
                    p_DELAY_SRC="IDATAIN", p_SIGNAL_PATTERN="DATA",
                    p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                    p_PIPE_SEL="FALSE", p_IDELAY_TYPE="VARIABLE", p_IDELAY_VALUE=0,

                    i_C=ClockSignal(),
                    i_LD=self._dly_sel.storage[i//8] & self._rdly_dq_rst.re,
                    i_CE=self._dly_sel.storage[i//8] & self._rdly_dq_inc.re,
                    i_LDPIPEEN=0, i_INC=1,

                    i_IDATAIN=dq_i_nodelay, o_DATAOUT=dq_i_delayed
                )
            self.specials += \
                Instance("IOBUF",
                    i_I=dq_o_delayed if with_odelay else dq_o_nodelay, o_O=dq_i_nodelay, i_T=dq_t,
                    io_IO=pads.dq[i]
                )

        # Flow control
        #
        # total read latency:
        #  2 cycles through OSERDESE2
        #  cl_sys_latency cycles CAS
        #  2 cycles through ISERDESE2
        rddata_en = self.dfi.phases[self.settings.rdphase].rddata_en
        for i in range(self.settings.read_latency-1):
            n_rddata_en = Signal()
            self.sync += n_rddata_en.eq(rddata_en)
            rddata_en = n_rddata_en
        if with_odelay:
            self.sync += [phase.rddata_valid.eq(rddata_en | self._wlevel_en.storage)
                for phase in self.dfi.phases]
        else:
            self.sync += [phase.rddata_valid.eq(rddata_en)
                for phase in self.dfi.phases]

        oe = Signal()
        last_wrdata_en = Signal(cwl_sys_latency+3)
        wrphase = self.dfi.phases[self.settings.wrphase]
        self.sync += last_wrdata_en.eq(Cat(wrphase.wrdata_en, last_wrdata_en[:-1]))
        self.comb += oe.eq(
            last_wrdata_en[cwl_sys_latency-1] |
            last_wrdata_en[cwl_sys_latency] |
            last_wrdata_en[cwl_sys_latency+1] |
            last_wrdata_en[cwl_sys_latency+2])
        if with_odelay:
            self.sync += \
                If(self._wlevel_en.storage,
                    oe_dqs.eq(1), oe_dq.eq(0)
                ).Else(
                    oe_dqs.eq(oe), oe_dq.eq(oe)
                )
        else:
            self.sync += [
                oe_dqs.eq(oe),
                oe_dq.eq(oe)
            ]

        self.sync += bl8_sel.eq(last_wrdata_en[cwl_sys_latency-1])


class V7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=True, **kwargs)


class K7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=True, **kwargs)

class A7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=False, **kwargs)

Example Constraint File

It would be nice to have an example constraint file for high speed operation (800MT/s and above).

Specifically to answer the questions:

  • Slew fast/slow (and are A pins different than Q pins given they are half the toggle rate)
  • IN_TERM values
  • SSTL15/SSTL15_R? (And DDR3L examples)

BL8 1:2 hangs with UART bridge

What I know so far:

  • Only reproduces in 1:2 mode; works fine in 1:4.
  • Is sensitive to the frequency (the design meets timing).
  • Writes succeed; only reads fail.

Add Reordering support

Ports can now expose banks to the user and allow accesses to the memory to be reordered.

To implement reordering, we could create a module that would work on two native ports:

  • one used by the user with all accesses in order and not exposing the banks.
  • one used internally with reordered accesses and exposing the banks.

user_port = LiteDRAMNativePort(..., with_reordering=False)
internal_port = LiteDRAMNativePort(..., with_reordering=True)

class LiteDRAMReordering(Module):
    def __init__(self, user_port, internal_port):
        [...]

We could implement this scenario:

For writes:

  • add bank cmd buffers (only store row/col).
  • each bank cmd buffer maintains an in/out count.
  • add data ram (min depth = nbanks * depth of cmd buffers)
  • redirect write cmd to the proper bank cmd buffer.
  • write cmd is accepted if the proper bank buffer is not full.
  • when bank cmd buffer accepts the cmd, accept the data and store it at location
    (bank << log2(buffers' depth)) + bank in count.
  • when bank cmd buffer outputs the cmd, retrieve the data in ram at location
    (bank << log2(buffers' depth)) + bank out count and put it in data queue that will
    be presented to the crossbar.
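The data ram location used in the last two steps can be sketched as below. This is only an illustration: the function name is hypothetical, and it assumes the cmd buffer depth is a power of two so that each bank owns a contiguous region of the data ram.

```python
def data_ram_address(bank, count, depth):
    """Location of a write data word: one region of `depth` words per bank.

    Assumes `depth` is a power of two; `count` is the bank's in count when
    storing and its out count when retrieving.
    """
    shift = (depth - 1).bit_length()        # log2(depth) for a power of two
    return (bank << shift) + (count % depth)


# Bank 2, sixth entry (count 5), cmd buffers of depth 8.
address = data_ram_address(bank=2, count=5, depth=8)
```

Note the parentheses: in Python (and in Migen expressions built from Python operators) `+` binds tighter than `<<`, so `bank << shift + count` would shift by `shift + count` instead.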

For reads:

  • maintain a global cmd_in and data_out count.
  • add bank cmd buffers (only store row/col/cmd_in count).
  • add data ram (min depth = nbanks * depth of cmd buffers).
  • redirect read cmd to the proper bank cmd buffer.
  • read cmd is accepted if the proper bank buffer is not full and if read data corresponding to the same cmd_in count value has been presented to the user (should be cmd_in count + 1 != data_out count).
  • when a bank accepts a cmd, put the cmd_in value output by the cmd buffer in a queue; use this queue to know where to store the next returned data in the data ram.
  • use a flip bit in data ram to indicate that data has been updated (flip this bit each time a location is used).
  • read data at data_out count location, if bit has flipped, present the data and increment data_out count to read next location.
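The read scheme above can be modelled in software roughly as follows. This is a sketch under simplifying assumptions, not a proposed implementation: the class and method names are hypothetical, the per-bank cmd buffers are left out, and the counters are unbounded integers rather than wrapping hardware counters, so the "not full" test is written as a difference instead of the `cmd_in count + 1 != data_out count` form.

```python
class ReadReorderModel:
    """Hypothetical model: data returns out of order, is presented in order."""

    def __init__(self, depth):
        self.depth = depth
        self.data = [None] * depth   # data ram
        self.flip = [0] * depth      # flip bit stored next to each data word
        self.cmd_in = 0              # number of read commands accepted
        self.data_out = 0            # number of data words presented to the user

    def can_accept(self):
        # Refuse a command whose slot still holds data not yet presented.
        return self.cmd_in - self.data_out < self.depth

    def accept_cmd(self):
        if not self.can_accept():
            raise RuntimeError("reorder buffer full")
        tag = self.cmd_in            # cmd_in count travels with the command
        self.cmd_in += 1
        return tag

    def store(self, tag, value):
        # Returned data is written at the location given by its cmd_in count.
        slot = tag % self.depth
        self.data[slot] = value
        self.flip[slot] ^= 1         # flip each time the location is used

    def present(self):
        # Present the word at data_out only if its flip bit has changed.
        slot = self.data_out % self.depth
        expected = (self.data_out // self.depth + 1) & 1
        if self.flip[slot] != expected:
            return None
        self.data_out += 1
        return self.data[slot]
```

For example, with three accepted commands whose data returns in the order 2, 0, 1, `present()` yields nothing until the data for command 0 has been stored, then returns the words in command order.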

BL8 1:2 controller fails to compile

Traceback (most recent call last):
  File "core.py", line 275, in <module>
    main()
  File "core.py", line 256, in main
    soc.generate_sdram_phy_py_header()
  File "core.py", line 240, in generate_sdram_phy_py_header
    self.sdram.controller.settings.timing))
  File "/home/john/repos/litedram/litedram/sdram_init.py", line 292, in get_sdram_phy_py_header
    init_sequence, _ = get_sdram_phy_init_sequence(phy_settings, timing_settings)
  File "/home/john/repos/litedram/litedram/sdram_init.py", line 175, in get_sdram_phy_init_sequence
    mr0 = format_mr0(bl, cl, wr, 1)
  File "/home/john/repos/litedram/litedram/sdram_init.py", line 125, in format_mr0
    mr0 |= wr_to_mr0[wr] << 9
KeyError: 4

The formula computing this uses tWTR*nphases, which I think is wrong. Micron says this should be configured with the following formula:
WR (cycles) = roundup (tWR [ns]/tCK [ns]).
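A minimal sketch of that formula, with one extra step: the mode register can only encode certain WR values, so the result is rounded up to the next encodable one. The function name is hypothetical, and the list of encodable values is an assumption written from memory of the DDR3 MR0 definition; it should be checked against the datasheet before use.

```python
import math

# Assumption: WR values that DDR3 MR0 can encode (verify against the datasheet).
DDR3_WR_VALUES = (5, 6, 7, 8, 10, 12, 14, 16)


def write_recovery_cycles(twr_ns, tck_ns, valid=DDR3_WR_VALUES):
    """WR (cycles) = roundup(tWR [ns] / tCK [ns]), then next encodable value."""
    wr = math.ceil(twr_ns / tck_ns)
    for value in valid:
        if value >= wr:
            return value
    raise ValueError("tWR of %s ns cannot be encoded at tCK = %s ns" % (twr_ns, tck_ns))


# Illustrative numbers: tWR = 15 ns with a 5 ns memory clock period.
wr = write_recovery_cycles(twr_ns=15, tck_ns=5)
```

With these illustrative numbers the raw formula gives 3 cycles, which is below the smallest encodable value, so the sketch returns 5 rather than producing a key that is missing from the lookup table.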

nizox/tapcfg is not available anymore

When trying to install LiteX:

Cloning into '/home/xilinxbox/litex/litex/build/sim/core/modules/ethernet/tapcfg'...
Username for 'https://github.com':
Password for 'https://github.com':
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/nizox/tapcfg/'
fatal: clone of 'https://github.com/nizox/tapcfg' into submodule path '/home/xilinxbox/litex/litex/build/sim/core/modules/ethernet/tapcfg' failed
Failed to clone 'litex/build/sim/core/modules/ethernet/tapcfg' a second time, aborting

And https://github.com/nizox/tapcfg gives a 404...

Multiple Timings Ignored

This commit fixes it; however, it's entangled with my auto_precharge pull request. Since timing has to be re-verified anyway, we might as well pull them both.

The specific fix is here: JohnSully@a4be642

Refactor S7DDRPHY / add proper 1:2 BL8 support

This will improve code sharing, ease understanding and allow implementing BL8 support more easily for 1:2 PHY:

  • move DDR2/DDR3 latency/phase computation functions to a common file (can be reused by other PHYs)
  • split code in modules: control / tx datapath / rx datapath
  • add proper BL8 support for the 1:2 PHY: control path is similar, we can just do some adaptation on the tx datapath / rx datapath signals.

ddr calibration fails when SoC temperature is hot

When die temp is high: 70.0107 (0x0ae5)

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: VexRiscv / 100MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jul 19 2018 15:06:12

BIOS CRC passed (5bf71210)
Initializing SDRAM...
Read bitslip: 3
Read delays scan:
m0: 00000000000000000000000000000000
m1: 00000000001111111111111000000000
m2: 00000000001111111111111000000000
m3: 00000000001111111111111100000000
Read delays: 3:10-24  2:10-24  1:10-24  0:32-33  completed
Memtest bus failed: 156/256 errors
Memtest data failed: 524288/524288 errors
Memtest addr failed: 8192/8192 errors
Memory initialization failed

With the latest release, the board fails to boot:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: VexRiscv / 100MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jul 19 2018 15:34:22

BIOS CRC passed (317e43f7)
Initializing SDRAM...
Read bitslip: 2
Read delays scan:
m0: 00000000000000000000000000000000
m1: 11111111000000000000000000000000
m2: 11111111000000000000000000000000
m3: 111111110000

Out of order interface is incomplete

This is more of an FYI that out of order is not yet fully completed. Specifically, the write interface must also output the bank it's using. In addition, OO is disabled in the frontend crossbar.

The interface is also not stable. If we implement re-ordering within banks (e.g. write batching) then we'll have to move to a tag system to track operations.

SDR / fix refresh

Refresh seems to be broken with SDR (working with DDR, DDR2, DDR3, DDR4).

Multirank: add dynamic ODT

To support higher frequencies in dual/quad rank mode, we will need to drive ODT dynamically.
Useful information can be found in:

  • TN-41-08: Design Guide for Two DDR3-1066 UDIMM Systems
  • TN-04-54: High-Speed DRAM Controller Design

SDRAM CKE as optional

Make it easy to not export the SDRAM CKE pin: it is optional and may be tied high on the target PCB, and there may not be a single unused pin left to assign this signal to.

migen

You should mention in the README the dependency on migen and how to install it.

settings.timing.tWTR expressed in nanoseconds but used as cycles

In multiplexer.py, the line fsm.delayed_enter("WTR", "READ", settings.timing.tWTR-1) uses the tWTR parameter. However, in the modules it is expressed in nanoseconds.

Other parameters such as tRP, also expressed in nanoseconds, are converted into cycles elsewhere; tWTR never is. This greatly exaggerates the delay when switching from write to read.
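To make the size of the error concrete, here is a small sketch of the missing conversion. The helper name and the numbers (tWTR = 7.5 ns, 100 MHz and 400 MHz clocks) are illustrative assumptions, not values taken from a specific module.

```python
import math


def ns_to_cycles(t_ns, clk_freq_hz):
    """Convert a timing in nanoseconds to whole clock cycles, rounding up."""
    tck_ns = 1e9 / clk_freq_hz
    return math.ceil(t_ns / tck_ns)


twtr_ns = 7.5                                   # illustrative value
cycles_100mhz = ns_to_cycles(twtr_ns, 100e6)    # 1 cycle
cycles_400mhz = ns_to_cycles(twtr_ns, 400e6)    # 3 cycles
```

Passing 7.5 straight through as a cycle count would wait several times longer than needed at 100 MHz, which matches the exaggerated write-to-read delay described above.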

Latest litedram fails on the Digilent Atlys board

https://travis-ci.org/mithro/HDMI2USB-litex-firmware/jobs/352035389 and https://api.travis-ci.org/v3/job/352035389/log.txt

Total REAL time to Placer completion: 3 mins 8 secs 
Total CPU  time to Placer completion: 3 mins 8 secs 
Running post-placement packing...
Writing output files...
*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
ERROR:PhysDesignRules:1939 - Issue with pin connections and/or configuration on
   block:<ODDR2_5>:<OLOGIC2_OUTFF>.  The OUTFF Flip-flop SRTYPE_OQ mode must be
   ASYNC in DDR mode with DDR_ALIGNMENT mode of C0 or C1.
ERROR:Pack:1642 - Errors in physical DRC.
Mapping completed.
See MAP report file "top_map.mrp" for details.
Problem encountered during the packing phase.
Design Summary
--------------
Number of errors   :   2
Number of warnings :  10

Problems instantiating on ice40 board

I have made an SDRAM plug-in board for the Lattice ICE40 HX8K EVB. It uses the AS4C16M16 which is already supported.

I am trying to make a simple Litex project to use it, but I get a synthesis error:

$ python3 ice40hx8k_litedram_nn.py 
lxbuildenv: v2019.8.19.1 (run ice40hx8k_litedram_nn.py --lx-help for help)
<__main__.Platform object at 0x7fb78610b7f0>
{'cpu_type': None, 'cpu_variant': None, 'integrated_rom_size': 0, 'integrated_sram_size': 0}
ERROR: Conflicting init values for signal 1'1 (\soc_sdram_master_p0_act_n = 1'1, \soc_sdram_choose_req_want_activates = 1'0).
Traceback (most recent call last):
  File "ice40hx8k_litedram_nn.py", line 224, in <module>
    main()
  File "ice40hx8k_litedram_nn.py", line 220, in main
    vns = builder.build()
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/soc/integration/builder.py", line 185, in build
    toolchain_path=toolchain_path, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/soc/integration/soc_core.py", line 452, in build
    return self.platform.build(self, *args, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/platform.py", line 34, in build
    return self.toolchain.build(self, *args, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/icestorm.py", line 189, in build
    _run_script(script)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/icestorm.py", line 121, in _run_script
    raise OSError("Subprocess failed")
OSError: Subprocess failed

I have created a repo here with a small example file and the build output checked in. I'm using Yosys 0.9+932 (git sha1 9e6632c4, clang 9.0.0 -fPIC -Os).

The issue is solved by the following patch against LiteDRAM, also included in the test repo above:

diff --git a/litedram/phy/dfi.py b/litedram/phy/dfi.py
index 2948556..44bca0b 100644
--- a/litedram/phy/dfi.py
+++ b/litedram/phy/dfi.py
@@ -15,7 +15,7 @@ def phase_cmd_description(addressbits, bankbits, nranks):
         ("cke",          nranks, DIR_M_TO_S),
         ("odt",          nranks, DIR_M_TO_S),
         ("reset_n",           1, DIR_M_TO_S),
-        ("act_n",             1, DIR_M_TO_S)
+        #("act_n",             1, DIR_M_TO_S)
     ]
 
 
@@ -52,7 +52,7 @@ class Interface(Record):
             p.cs_n.reset = (2**nranks-1)
             p.ras_n.reset = 1
             p.we_n.reset = 1
-            p.act_n.reset = 1
+            #p.act_n.reset = 1
 
     # Returns pairs (DFI-mandated signal name, Migen signal object)
     def get_standard_names(self, m2s=True, s2m=True):
@@ -85,11 +85,11 @@ class DDR4DFIMux(Module):
             self.comb += [
                 p_i.connect(p_o),
                 If(~p_i.ras_n & p_i.cas_n & p_i.we_n,
-                   p_o.act_n.eq(0),
+                   #p_o.act_n.eq(0),
                    p_o.we_n.eq(p_i.address[14]),
                    p_o.cas_n.eq(p_i.address[15]),
                    p_o.ras_n.eq(p_i.address[16])
                 ).Else(
-                    p_o.act_n.eq(1),
+                    #p_o.act_n.eq(1),
                 )
             ]

1:2 7-series Phy doesn't issue commands in write mode

The underlying issue appears to be that wrcmdphase has the same value as wrphase. wrphase wins and the command is silently dropped, although the bank machine believes it was sent.

Forcing wrcmdphase to 1 appears to work.

tFAW is too pessimistic

I haven't had a chance to look into why yet, but I've temporarily disabled it locally. It's a major bottleneck: the difference between 42% and 80% bus efficiency on my random write test. I'm not getting tFAW errors with it disabled, which makes me think the current check is not correct.
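For reference when checking a command trace by hand: tFAW limits the device to at most four ACTIVATE commands in any rolling tFAW window, so the fifth ACTIVATE must come at least tFAW after the first. The following is a minimal sketch of that check, independent of how LiteDRAM implements it. The function name and the cycle-based trace format are assumptions made for illustration.

```python
def find_tfaw_violations(activate_cycles, tfaw_cycles):
    """Return the indices of ACTIVATEs that violate tFAW.

    activate_cycles: cycle numbers at which ACTIVATE commands were issued
                     (hypothetical trace format, any rank-local ordering).
    tfaw_cycles:     tFAW expressed in the same cycle units.
    """
    times = sorted(activate_cycles)
    violations = []
    # ACTIVATE i and the four before it span five commands; the distance
    # from the first to the fifth must be at least tFAW.
    for i in range(4, len(times)):
        if times[i] - times[i - 4] < tfaw_cycles:
            violations.append(i)
    return violations


if __name__ == "__main__":
    # Five ACTIVATEs spaced 4 cycles apart with tFAW = 20 cycles.
    print(find_tfaw_violations([0, 4, 8, 12, 16], 20))
    # Same trace with the fifth ACTIVATE delayed until the window closes.
    print(find_tfaw_violations([0, 4, 8, 12, 20], 20))
```

Running a captured trace through a check like this with the controller's tFAW logic disabled would show whether the "no tFAW errors" observation holds because the constraint is met anyway or because nothing is checking it.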

Pin sdram_dm[1] stuck at GND: can this be ignored?

Warning (13410): Pin "sdram_dm[1]" is stuck at GND File: X:/GIT/litex-boards/litex_boards/partner/targets/soc_basesoc_c10lprefkit/gateware/top.v Line: 3435

Can this warning be ignored?

memtest reports OK, but having 1 out of 2 DM pins stuck at GND does not sound like a warning that should be ignored.

Allow all CL/CWL combinations to be used with DDR3

Some CL/CWL combinations are not possible since cmd_phase and dat_phase will be the same and there is no arbitration.

An assert (assert cmd_phase != dat_phase) is currently implemented in the code to prevent this case, but this limits the possible CL/CWL combinations.

We should probably remove this limitation by doing arbitration between cmd and data phases when both are valid.
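A behavioural sketch of what such an arbitration could look like is given here, only as an assumption about one possible policy (data has priority, the command is held back rather than dropped). The function name and signature are hypothetical and do not correspond to existing LiteDRAM code.

```python
def arbitrate_phase(cmd_valid, cmd_phase, dat_valid, dat_phase):
    """Decide which requests may use their phase this cycle.

    Returns (cmd_grant, dat_grant). When both requests are valid and
    target the same phase, data wins and the command is NOT granted,
    so the requester can keep it pending and retry on a later cycle
    instead of assuming it was sent.
    """
    collision = cmd_valid and dat_valid and cmd_phase == dat_phase
    dat_grant = dat_valid
    cmd_grant = cmd_valid and not collision
    return cmd_grant, dat_grant


if __name__ == "__main__":
    # Different phases: both proceed.
    print(arbitrate_phase(True, 1, True, 0))
    # Same phase: data proceeds, command is stalled.
    print(arbitrate_phase(True, 0, True, 0))
```

The important property is that the losing side sees an explicit "not granted" result. A collision that silently drops the command would leave the bank machine out of sync with the DRAM.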

Improve SDRAMModule speedgrade definition

From @mithro:

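Wouldn't it be nicer to have:

from collections import namedtuple

# namedtuple takes the type name first, then the list of field names.
SpeedGrade = namedtuple("SpeedGrade", ["tRP", "tRCD", "tWR", "tRFC", "tFAW", "tRC", "tRAS"])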

# DDR3
class MT41J128M16(SDRAMModule):
     memtype = "DDR3"
     # geometry
     nbanks = 8
     nrows  = 16384
     ncols  = 1024
     # speedgrade invariant timings
     tREFI = 64e6/8192
     tWTR  = (4, 7.5)
     tCCD  = (4, None)
     tRRD  = 10

     speed_grades = {
         800: SpeedGrade(tRP=13.1, tRCD=13.1, tWR=13.1, tRFC=64, tRC=50.625, tRAS=37.5),
         1066:  SpeedGrade(...),
     }
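To make the idea concrete, here is a small runnable variant of the proposal, offered as a sketch rather than as the eventual LiteDRAM API. Two things are assumptions added for illustration: the `defaults` argument, so that an entry that omits a field (tFAW in the 800 entry above) still constructs, and the `ns_to_cycles` helper. The timing values are copied from the proposal and the elided 1066 entry is left out.

```python
from collections import namedtuple
from math import ceil

# Every field defaults to None so that partially specified grades construct.
SpeedGrade = namedtuple(
    "SpeedGrade",
    ["tRP", "tRCD", "tWR", "tRFC", "tFAW", "tRC", "tRAS"],
    defaults=[None] * 7,
)

# Keyed as in the proposal; only the entry spelled out there is included.
speed_grades = {
    800: SpeedGrade(tRP=13.1, tRCD=13.1, tWR=13.1, tRFC=64, tRC=50.625, tRAS=37.5),
}


def ns_to_cycles(t_ns, clk_freq_hz):
    """Convert a timing in ns to whole clock cycles, rounding up.

    Hypothetical helper; returns None for timings left unspecified.
    """
    if t_ns is None:
        return None
    return ceil(t_ns * 1e-9 * clk_freq_hz)


if __name__ == "__main__":
    grade = speed_grades[800]
    for name, value in grade._asdict().items():
        print(name, value, ns_to_cycles(value, 100e6))
```

Keeping the grade-dependent timings in one tuple means a controller only has to pick a key once, and a missing timing shows up as None instead of a silently inherited class attribute.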

lpddr: NameError: name 'mr1' is not defined

(LX P=mimasv2 C=vexriscv R=master-clean) tansell@tansell:~/github/timvideos/litex-buildenv-clean$ make firmware                                                                                                   
mkdir -p build/mimasv2_base_vexriscv/
time python -u ./make.py --platform=mimasv2 --target=base --cpu-type=vexriscv --iprange=192.168.100 -Ob toolchain_path /opt/Xilinx/    --no-compile-gateware \                                                    
        2>&1 | tee -a /usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/build/mimasv2_base_vexriscv//output.20191030-132700.log; (exit ${PIPESTATUS[0]})                                       
Traceback (most recent call last):
  File "./make.py", line 164, in <module>
    main()
  File "./make.py", line 148, in main
    vns = builder.build(**dict(args.build_option))
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litex/litex/soc/integration/builder.py", line 167, in build                                                              
    self._generate_includes()
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litex/litex/soc/integration/builder.py", line 129, in _generate_includes                                                 
    self.soc.sdram.controller.settings.timing))
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 434, in get_sdram_phy_c_header                                                          
    init_sequence, mr1 = get_sdram_phy_init_sequence(phy_settings, timing_settings)
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 377, in get_sdram_phy_init_sequence                                                     
    }[phy_settings.memtype](phy_settings, timing_settings)
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 78, in get_lpddr_phy_init_sequence                                                      
    return init_sequence, mr1
NameError: name 'mr1' is not defined

real    0m0.371s
user    0m0.311s
sys     0m0.066s
make: *** [Makefile:303: firmware-cmd] Error 1

Can't calibrate

This is related to my other issue where calibration hangs. However, even when working around it with a delay between reads and writes, calibration still fails. It was working about a month ago.

Trying to figure it out on my own, but I'm running out of ideas.

TypeError: phase_cmd_description() missing 1 required positional argument: 'nranks'

litedram revision e5696ad

From https://travis-ci.org/timvideos/litex-buildenv/jobs/430294286

time python -u ./make.py --platform=atlys --target=base --cpu-type=lm32 --iprange=192.168.100     --no-compile-gateware \
		2>&1 | tee -a build/atlys_base_lm32//output.20180918-230444.log; (exit ${PIPESTATUS[0]})
Traceback (most recent call last):
  File "./make.py", line 156, in <module>
    main()
  File "./make.py", line 123, in main
    soc = get_soc(args, platform)
  File "./make.py", line 57, in get_soc
    soc = SoC(platform, ident=SoC.__name__, **soc_sdram_argdict(args), **dict(args.target_option))
  File "/home/travis/build/timvideos/litex-buildenv/targets/atlys/base.py", line 228, in __init__
    dqs_ddr_alignment="C0")
  File "/home/travis/build/timvideos/litex-buildenv/third_party/litedram/litedram/phy/s6ddrphy.py", line 111, in __init__
    r_dfi = Array(Record(phase_cmd_description(addressbits, bankbits)) for i in range(nphases))
  File "/home/travis/build/timvideos/litex-buildenv/third_party/litedram/litedram/phy/s6ddrphy.py", line 111, in <genexpr>
    r_dfi = Array(Record(phase_cmd_description(addressbits, bankbits)) for i in range(nphases))
TypeError: phase_cmd_description() missing 1 required positional argument: 'nranks'

Simulation is broken

Hi,
I'm trying to test a custom configuration in simulation. Before that, I ran sim.py without any changes and found a couple of issues:

  1. The mem_1.init file is not referenced (copied) anywhere in sim.py:
WARNING: File mem_1.init referenced on $PATH/litex/litedram/examples/sim/litedram_core.v at line 15687 cannot be opened for reading. Please ensure that this file is available in the current working directory.
  2. There is X propagation when litedram_core releases the user reset, which introduces an unknown state to the DDR3 model:
top_tb.dut.ddr3.main: at time 530100.0 ps ERROR: CK and CK_N are not allowed to go to an unknown state.

image

  3. I'm not sure whether I'm doing something wrong or an extra step is needed, but so far I haven't had a single successful simulation result. As you can see, init_done is never asserted, so the main FSM stays stuck in its idle states forever. Tracing back, that signal is ultimately asserted by a CSR, but a) there's no CPU or register write interface and b) both init memories are empty by default.

  4. There are a lot of redundant writes to variables, which makes the generated code verbose. In this excerpt from top.v (lines 335 to 341), the variable fsm0_next_state is written twice, in addition to being initialized at its declaration. This can be improved.

always @(*) begin
	sdram_checker_addr_gen_ce <= 1'd0;
	sdram_checker_sink_sink_valid <= 1'd0;
	fsm0_next_state <= 2'd0; // Already initialized by 0
	sdram_checker_cmd_counter_fsm0_next_value <= 24'd0;
	sdram_checker_cmd_counter_fsm0_next_value_ce <= 1'd0;
	fsm0_next_state <= fsm0_state; // Written again

Thank you.

DDR4 / Add optional features support

  • Manage bank groups.
  • Manage Vref DQ Calibration.
  • Add data CRC support (See tn_4003_DDR4_network_design_guide / p4).
  • Add DBI support (See tn_4003_DDR4_network_design_guide / p6).

Verify DDR3 MR0 Write Recovery configuration

Write Recovery should be computed as follows:
WR (cycles) = roundup (tWR [ns] / tCK [ns])

We are currently using a fixed value:
mr0 = format_mr0(bl, cl, 14, 1) # wr=8 FIXME: this should be ceiling(tWR/tCK)
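A minimal sketch of the computation follows, using integer picoseconds to avoid floating-point rounding surprises. The function names are hypothetical. The list of WR values that DDR3 MR0 can encode is stated from memory of the JEDEC encoding and should be checked against the module datasheet. The tWR = 15 ns figure in the usage example is only an illustrative value.

```python
# WR cycle counts assumed to be encodable in DDR3 MR0 (verify against the datasheet).
DDR3_MR0_WR_CYCLES = (5, 6, 7, 8, 10, 12, 14, 16)


def write_recovery_cycles(twr_ps, tck_ps):
    """WR (cycles) = ceiling(tWR / tCK), with both timings in picoseconds."""
    return -(-twr_ps // tck_ps)  # integer ceiling division


def supported_write_recovery(twr_ps, tck_ps, supported=DDR3_MR0_WR_CYCLES):
    """Round the computed WR up to the next value MR0 can encode."""
    wr = write_recovery_cycles(twr_ps, tck_ps)
    for cycles in supported:
        if cycles >= wr:
            return cycles
    raise ValueError("tWR/tCK needs %d cycles, beyond the supported range" % wr)


if __name__ == "__main__":
    twr_ps = 15000  # illustrative tWR of 15 ns
    for tck_ps in (2500, 1875, 1250, 1071):
        print(tck_ps, write_recovery_cycles(twr_ps, tck_ps),
              supported_write_recovery(twr_ps, tck_ps))
```

The result of a helper like `supported_write_recovery` would replace the hard-coded third argument in the `format_mr0` call quoted above, assuming that argument is the WR cycle count.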

Add migen testbenches for the core

The core is currently tested with a high-level simulation of the full controller, which is impractical and slow.

We need good migen testbenches for the cores to:

  • ensure we don't introduce regressions when making changes.
  • ensure all timing parameters are handled correctly.
  • provide a better way to evaluate efficiency according to the access pattern.
  • find the best architecture.

Allow sharing bank machines between banks

Currently, each bank has its own bank machine. On designs where efficiency is less critical, being able to reduce the number of bank machines and share one bank machine between multiple banks would be useful to reduce resource usage.
