enjoy-digital / litedram

Small footprint and configurable DRAM core

License: Other

Python 98.50% C 0.84% CSS 0.12% Verilog 0.10% Jinja 0.43%

litedram's Issues

Data corruption when out-of-order ports are accessed simultaneously

When I have an in-order port and an out-of-order port, traffic on the out-of-order port corrupts data on the in-order port. The issue goes away when traffic is stopped on the out-of-order port.

DDR4 / Add data CRC support

See tn_4003_DDR4_network_design_guide / p4.

memtest passing when one DQn pin is not connected

We get memtest OK results even though one of the DQ pins was not connected: due to an error in the platform file, the dq port in the top level had only 15 pins. Why does memtest still report OK?

Litedram is currently broken on the minispartan6+

The minispartan6+ has SDRAM rather than DDR.

This is hidden when only using a single user port, because the time to complete a command exceeds these timings. However, when operating with multiple ports it is possible to violate both tRRD and tFAW, since the mux will issue a row command every cycle if a bank requests one.
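The two constraints can be checked offline against a trace of ACTIVATE commands. The sketch below is only an illustration and is not part of litedram: `check_activates` and its arguments are hypothetical names, and all values are in DRAM clock cycles. tRRD is the minimum spacing between two consecutive ACTIVATEs, and tFAW is the window in which at most four ACTIVATEs may be issued.

```python
from collections import deque


def check_activates(activate_cycles, trrd, tfaw):
    """Return a list of (cycle, rule) pairs for every violated constraint.

    activate_cycles: increasing cycle numbers at which ACTIVATE was issued.
    trrd:            minimum number of cycles between two ACTIVATEs.
    tfaw:            window (in cycles) that may contain at most four ACTIVATEs.
    """
    violations = []
    window = deque(maxlen=4)  # cycles of the last four ACTIVATEs
    for cycle in activate_cycles:
        # tRRD: compare against the most recent ACTIVATE.
        if window and cycle - window[-1] < trrd:
            violations.append((cycle, "tRRD"))
        # tFAW: a fifth ACTIVATE must wait tFAW after the first of the last four.
        if len(window) == 4 and cycle - window[0] < tfaw:
            violations.append((cycle, "tFAW"))
        window.append(cycle)
    return violations


# One row command per cycle, as a mux without these checks could issue.
back_to_back = check_activates([0, 1, 2, 3, 4], trrd=4, tfaw=20)
# Commands spaced to respect both constraints.
spaced = check_activates([0, 4, 8, 12, 20], trrd=4, tfaw=20)
```

With these example numbers, the back-to-back trace violates tRRD on every command after the first and tFAW on the fifth, while the spaced trace is clean.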

List various usecases, find best architecture and document how to get the best efficiency.

The controller can be used on various systems which all have specific needs. The current architecture tries to be simple and give reasonable efficiency/performance for most of the use cases.

Until now, performance was good on the access patterns that were used (mostly linear data buffers and storing CPU code). We still need to do #8 and #57 to have better metrics.

With more users, new use cases appear with different access patterns, and we need to understand what the potential bottlenecks are and how to improve them.

Update / Refactor KUSDDRPHY

Apply the improvements made to S7DDRPHY and do a refactoring similar to #60.

Add BL8 support for S7DDRPHY in 1:2

S7DDRPHY works with:

DDR2 / 2 phases / BL4.
DDR3 / 4 phases / BL8.

To support DDR3 / 2 phases / BL8, the PHY needs some modifications.
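A quick way to see why this combination is different is to count how many system clock cycles one burst occupies. This is a back-of-the-envelope sketch, assuming each PHY phase carries one DRAM clock, i.e. two data beats; `sys_cycles_per_burst` is a hypothetical helper, not a litedram function.

```python
def sys_cycles_per_burst(burst_length, nphases):
    """Number of system clock cycles needed to transfer one burst."""
    beats_per_sys_cycle = 2 * nphases  # DDR: two beats per DRAM clock
    # Ceiling division, so a partially filled cycle still counts as one.
    return -(-burst_length // beats_per_sys_cycle)


ddr2_2ph_bl4 = sys_cycles_per_burst(4, 2)  # DDR2 / 2 phases / BL4
ddr3_4ph_bl8 = sys_cycles_per_burst(8, 4)  # DDR3 / 4 phases / BL8
ddr3_2ph_bl8 = sys_cycles_per_burst(8, 2)  # DDR3 / 2 phases / BL8
```

The two supported configurations fit a burst in one system cycle, whereas DDR3 / 2 phases / BL8 needs two, so the data path has to hold half of the burst for an extra cycle.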

Here is an old modified version of the PHY that has these modifications and that should be merged cleanly:

# 1:4, 1:2 frequency-ratio DDR2/DDR3 PHY for Xilinx's Series7
# DDR2: 400, 533, 667, 800 and 1066 MT/s
# DDR3: 1066, 1333 and 1600 MT/s

import math

from migen import *

from litex.soc.interconnect.csr import *

from litedram.common import PhySettings
from litedram.phy.dfi import *


def get_cl_cw(memtype, tck):
    if memtype == "DDR2":
        # ddr2-400
        if tck >= 2/400e6:
            cl, cwl = 3, 2
        # ddr2-533
        elif tck >= 2/533e6:
            cl, cwl = 4, 3
        # ddr2-667
        elif tck >= 2/667e6:
            cl, cwl = 5, 4
        # ddr2-800
        elif tck >= 2/800e6:
            cl, cwl = 6, 5
        # ddr2-1066
        elif tck >= 2/1066e6:
            cl, cwl = 7, 5
        else:
            raise ValueError
    elif memtype == "DDR3":
        # ddr3-1066
        if tck >= 2/1066e6:
            cl, cwl = 7, 6
        # ddr3-1333
        elif tck >= 2/1333e6:
            cl, cwl = 10, 7
        # ddr3-1600
        elif tck >= 2/1600e6:
            cl, cwl = 11, 8
        else:
            raise ValueError
    return cl, cwl

def get_sys_latency(nphases, cas_latency):
    return math.ceil(cas_latency/nphases)

def get_sys_phases(nphases, sys_latency, cas_latency, write=False):
    cmd_phase = 0
    dat_phase = 0
    diff_phase = 0
    while (diff_phase + cas_latency) != sys_latency*nphases:
        dat_phase += 1
        if dat_phase == nphases:
            dat_phase = 0
            cmd_phase += 1
        if write:
            diff_phase = dat_phase - cmd_phase
        else:
            diff_phase = cmd_phase - dat_phase
    return cmd_phase, dat_phase


class S7DDRPHY(Module, AutoCSR):
    def __init__(self, pads, with_odelay, memtype="DDR3", nphases=4, sys_clk_freq=100e6, iodelay_clk_freq=200e6):
        tck = 2/(2*nphases*sys_clk_freq)
        addressbits = len(pads.a)
        bankbits = len(pads.ba)
        databits = len(pads.dq)
        nphases = nphases

        iodelay_tap_average = {
            200e6: 78e-12,
            300e6: 52e-12,
        }
        half_sys8x_taps = math.floor(tck/(4*iodelay_tap_average[iodelay_clk_freq]))
        self._half_sys8x_taps = CSRStorage(4, reset=half_sys8x_taps)

        if with_odelay:
            self._wlevel_en = CSRStorage()
            self._wlevel_strobe = CSR()

        self._dly_sel = CSRStorage(databits//8)

        self._rdly_dq_rst = CSR()
        self._rdly_dq_inc = CSR()
        self._rdly_dq_bitslip_rst = CSR()
        self._rdly_dq_bitslip = CSR()

        if with_odelay:
            self._wdly_dq_rst = CSR()
            self._wdly_dq_inc = CSR()
            self._wdly_dqs_rst = CSR()
            self._wdly_dqs_inc = CSR()

        # compute phy settings
        cl, cwl = get_cl_cw(memtype, tck)
        cl_sys_latency = get_sys_latency(nphases, cl)
        cwl_sys_latency = get_sys_latency(nphases, cwl)

        rdcmdphase, rdphase = get_sys_phases(nphases, cl_sys_latency, cl)
        wrcmdphase, wrphase = get_sys_phases(nphases, cwl_sys_latency, cwl, write=True)
        wrcmdphase = 1
        print("wrcmdphase: " + str(wrcmdphase) + " wrphase: " + str(wrphase))
        self.settings = PhySettings(
            memtype=memtype,
            dfi_databits=4*databits,
            nphases=nphases,
            rdphase=rdphase,
            wrphase=wrphase,
            rdcmdphase=rdcmdphase,
            wrcmdphase=wrcmdphase,
            cl=cl,
            cwl=cwl,
            read_latency=2 + cl_sys_latency + 2 + 1,
            write_latency=cwl_sys_latency
        )

        self.dfi = Interface(addressbits, bankbits, 4*databits, 4)

        # # #

        bl8_sel = Signal()

        # Clock
        ddr_clk = "sys2x" if nphases == 2 else "sys4x"
        for i in range(len(pads.clk_p)):
            sd_clk_se = Signal()
            self.specials += [
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=sd_clk_se,
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=0, i_D2=1, i_D3=0, i_D4=1,
                    i_D5=0, i_D6=1, i_D7=0, i_D8=1
                ),
                Instance("OBUFDS",
                    i_I=sd_clk_se,
                    o_O=pads.clk_p[i],
                    o_OB=pads.clk_n[i]
                )
            ]

        # Addresses and commands
        for i in range(addressbits):
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=pads.a[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=self.dfi.phases[0].address[i], i_D2=self.dfi.phases[0].address[i],
                    i_D3=self.dfi.phases[1].address[i], i_D4=self.dfi.phases[1].address[i],
                    i_D5=self.dfi.phases[2].address[i], i_D6=self.dfi.phases[2].address[i],
                    i_D7=self.dfi.phases[3].address[i], i_D8=self.dfi.phases[3].address[i]
                )
        for i in range(bankbits):
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=pads.ba[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=self.dfi.phases[0].bank[i], i_D2=self.dfi.phases[0].bank[i],
                    i_D3=self.dfi.phases[1].bank[i], i_D4=self.dfi.phases[1].bank[i],
                    i_D5=self.dfi.phases[2].bank[i], i_D6=self.dfi.phases[2].bank[i],
                    i_D7=self.dfi.phases[3].bank[i], i_D8=self.dfi.phases[3].bank[i]
                )
        controls = ["ras_n", "cas_n", "we_n", "cke", "odt"]
        if hasattr(pads, "reset_n"):
            controls.append("reset_n")
        if hasattr(pads, "cs_n"):
            controls.append("cs_n")
        for name in controls:
            self.specials += \
                Instance("OSERDESE2",
                   p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                   p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                   p_SERDES_MODE="MASTER",

                   o_OQ=getattr(pads, name),
                   i_OCE=1,
                   i_RST=ResetSignal(),
                   i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                   i_D1=getattr(self.dfi.phases[0], name), i_D2=getattr(self.dfi.phases[0], name),
                   i_D3=getattr(self.dfi.phases[1], name), i_D4=getattr(self.dfi.phases[1], name),
                   i_D5=getattr(self.dfi.phases[2], name), i_D6=getattr(self.dfi.phases[2], name),
                   i_D7=getattr(self.dfi.phases[3], name), i_D8=getattr(self.dfi.phases[3], name)
                )

        # DQS and DM
        oe_dqs = Signal()
        dqs_serdes_pattern = Signal(8, reset=0b01010101)
        if with_odelay:
            self.comb += \
                If(self._wlevel_en.storage,
                    If(self._wlevel_strobe.re,
                        dqs_serdes_pattern.eq(0b00000001)
                    ).Else(
                        dqs_serdes_pattern.eq(0b00000000)
                    )
                ).Else(
                    dqs_serdes_pattern.eq(0b01010101)
                )
        for i in range(databits//8):
            dm_o_nodelay = Signal()
            dm_data = Signal(8)
            dm_data_d = Signal(8)
            dm_data_muxed = Signal(4)
            self.comb += dm_data.eq(Cat(
                self.dfi.phases[0].wrdata_mask[0*databits//8+i], self.dfi.phases[0].wrdata_mask[1*databits//8+i],
                self.dfi.phases[0].wrdata_mask[2*databits//8+i], self.dfi.phases[0].wrdata_mask[3*databits//8+i],
                self.dfi.phases[1].wrdata_mask[0*databits//8+i], self.dfi.phases[1].wrdata_mask[1*databits//8+i],
                self.dfi.phases[1].wrdata_mask[2*databits//8+i], self.dfi.phases[1].wrdata_mask[3*databits//8+i]),
            )
            self.sync += dm_data_d.eq(dm_data)
            self.comb += \
                If(bl8_sel,
                    dm_data_muxed.eq(dm_data_d[4:])
                ).Else(
                    dm_data_muxed.eq(dm_data[:4])
                )
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=dm_o_nodelay if with_odelay else pads.dm[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=dm_data_muxed[0], i_D2=dm_data_muxed[1],
                    i_D3=dm_data_muxed[2], i_D4=dm_data_muxed[3]
                )
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=0,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i] & self._wdly_dq_rst.re,
                        i_CE=self._dly_sel.storage[i] & self._wdly_dq_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dm_o_nodelay, o_DATAOUT=pads.dm[i]
                    )

            dqs_nodelay = Signal()
            dqs_delayed = Signal()
            dqs_t = Signal()
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OFB=dqs_nodelay if with_odelay else Signal(),
                    o_OQ=Signal() if with_odelay else dqs_nodelay,
                    o_TQ=dqs_t,
                    i_OCE=1, i_TCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk) if with_odelay else ClockSignal(ddr_clk+"_dqs"), i_CLKDIV=ClockSignal(),
                    i_D1=dqs_serdes_pattern[0], i_D2=dqs_serdes_pattern[1],
                    i_D3=dqs_serdes_pattern[2], i_D4=dqs_serdes_pattern[3],
                    i_D5=dqs_serdes_pattern[4], i_D6=dqs_serdes_pattern[5],
                    i_D7=dqs_serdes_pattern[6], i_D8=dqs_serdes_pattern[7],
                    i_T1=~oe_dqs
                )
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=half_sys8x_taps,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i] & self._wdly_dqs_rst.re,
                        i_CE=self._dly_sel.storage[i] & self._wdly_dqs_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dqs_nodelay, o_DATAOUT=dqs_delayed
                    )
            self.specials += \
                Instance("OBUFTDS",
                    i_I=dqs_delayed if with_odelay else dqs_nodelay, i_T=dqs_t,
                    o_O=pads.dqs_p[i], o_OB=pads.dqs_n[i]
                )

        # DQ
        oe_dq = Signal()
        for i in range(databits):
            dq_o_nodelay = Signal()
            dq_o_delayed = Signal()
            dq_i_nodelay = Signal()
            dq_i_delayed = Signal()
            dq_t = Signal()
            dq_data = Signal(8)
            dq_data_d = Signal(8)
            dq_data_muxed = Signal(4)
            self.comb += dq_data.eq(Cat(
                self.dfi.phases[0].wrdata[0*databits+i], self.dfi.phases[0].wrdata[1*databits+i],
                self.dfi.phases[0].wrdata[2*databits+i], self.dfi.phases[0].wrdata[3*databits+i],
                self.dfi.phases[1].wrdata[0*databits+i], self.dfi.phases[1].wrdata[1*databits+i],
                self.dfi.phases[1].wrdata[2*databits+i], self.dfi.phases[1].wrdata[3*databits+i])
            )
            self.sync += dq_data_d.eq(dq_data)
            self.comb += \
                If(bl8_sel,
                    dq_data_muxed.eq(dq_data_d[4:])
                ).Else(
                    dq_data_muxed.eq(dq_data[:4])
                )
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=dq_o_nodelay, o_TQ=dq_t,
                    i_OCE=1, i_TCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=dq_data_muxed[0], i_D2=dq_data_muxed[1],
                    i_D3=dq_data_muxed[2], i_D4=dq_data_muxed[3],
                    i_T1=~oe_dq
                )
            dq_i_data = Signal(8)
            dq_i_data_d = Signal(8)
            self.specials += \
                Instance("ISERDESE2",
                    p_DATA_WIDTH=2*nphases, p_DATA_RATE="DDR",
                    p_SERDES_MODE="MASTER", p_INTERFACE_TYPE="NETWORKING",
                    p_NUM_CE=1, p_IOBDELAY="IFD",

                    i_DDLY=dq_i_delayed,
                    i_CE1=1,
                    i_RST=ResetSignal() | (self._dly_sel.storage[i//8] & self._rdly_dq_bitslip_rst.re),
                    i_CLK=ClockSignal(ddr_clk), i_CLKB=~ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_BITSLIP=self._dly_sel.storage[i//8] & self._rdly_dq_bitslip.re,
                    o_Q8=dq_i_data[7], o_Q7=dq_i_data[6],
                    o_Q6=dq_i_data[5], o_Q5=dq_i_data[4],
                    o_Q4=dq_i_data[3], o_Q3=dq_i_data[2],
                    o_Q2=dq_i_data[1], o_Q1=dq_i_data[0]
                )
            self.sync += dq_i_data_d.eq(dq_i_data)
            self.comb += [
                self.dfi.phases[0].rddata[0*databits+i].eq(dq_i_data_d[3]), self.dfi.phases[0].rddata[1*databits+i].eq(dq_i_data_d[2]),
                self.dfi.phases[0].rddata[2*databits+i].eq(dq_i_data_d[1]), self.dfi.phases[0].rddata[3*databits+i].eq(dq_i_data_d[0]),
                self.dfi.phases[1].rddata[0*databits+i].eq(dq_i_data[3]), self.dfi.phases[1].rddata[1*databits+i].eq(dq_i_data[2]),
                self.dfi.phases[1].rddata[2*databits+i].eq(dq_i_data[1]), self.dfi.phases[1].rddata[3*databits+i].eq(dq_i_data[0]),
            ]

            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=0,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i//8] & self._wdly_dq_rst.re,
                        i_CE=self._dly_sel.storage[i//8] & self._wdly_dq_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dq_o_nodelay, o_DATAOUT=dq_o_delayed
                    )
            self.specials += \
                Instance("IDELAYE2",
                    p_DELAY_SRC="IDATAIN", p_SIGNAL_PATTERN="DATA",
                    p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                    p_PIPE_SEL="FALSE", p_IDELAY_TYPE="VARIABLE", p_IDELAY_VALUE=0,

                    i_C=ClockSignal(),
                    i_LD=self._dly_sel.storage[i//8] & self._rdly_dq_rst.re,
                    i_CE=self._dly_sel.storage[i//8] & self._rdly_dq_inc.re,
                    i_LDPIPEEN=0, i_INC=1,

                    i_IDATAIN=dq_i_nodelay, o_DATAOUT=dq_i_delayed
                )
            self.specials += \
                Instance("IOBUF",
                    i_I=dq_o_delayed if with_odelay else dq_o_nodelay, o_O=dq_i_nodelay, i_T=dq_t,
                    io_IO=pads.dq[i]
                )

        # Flow control
        #
        # total read latency:
        #  2 cycles through OSERDESE2
        #  cl_sys_latency cycles CAS
        #  2 cycles through ISERDESE2
        rddata_en = self.dfi.phases[self.settings.rdphase].rddata_en
        for i in range(self.settings.read_latency-1):
            n_rddata_en = Signal()
            self.sync += n_rddata_en.eq(rddata_en)
            rddata_en = n_rddata_en
        if with_odelay:
            self.sync += [phase.rddata_valid.eq(rddata_en | self._wlevel_en.storage)
                for phase in self.dfi.phases]
        else:
            self.sync += [phase.rddata_valid.eq(rddata_en)
                for phase in self.dfi.phases]

        oe = Signal()
        last_wrdata_en = Signal(cwl_sys_latency+3)
        wrphase = self.dfi.phases[self.settings.wrphase]
        self.sync += last_wrdata_en.eq(Cat(wrphase.wrdata_en, last_wrdata_en[:-1]))
        self.comb += oe.eq(
            last_wrdata_en[cwl_sys_latency-1] |
            last_wrdata_en[cwl_sys_latency] |
            last_wrdata_en[cwl_sys_latency+1] |
            last_wrdata_en[cwl_sys_latency+2])
        if with_odelay:
            self.sync += \
                If(self._wlevel_en.storage,
                    oe_dqs.eq(1), oe_dq.eq(0)
                ).Else(
                    oe_dqs.eq(oe), oe_dq.eq(oe)
                )
        else:
            self.sync += [
                oe_dqs.eq(oe),
                oe_dq.eq(oe)
            ]

        self.sync += bl8_sel.eq(last_wrdata_en[cwl_sys_latency-1])


class V7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=True, **kwargs)


class K7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=True, **kwargs)

class A7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=False, **kwargs)

Example Constraint File

It would be nice to have an example constraint file for high speed operation (800MT/s and above).

Specifically to answer the questions:

Slew fast/slow (and are A pins different than Q pins given they are half the toggle rate)
IN_TERM values
SSTL15/SSTL15_R? (And DDR3L examples)

revert TXXDController changes

once YosysHQ/yosys#850 lands in yosys master.

BL8 1:2 hangs with UART bridge

What I know so far:

Only reproduces in 1:2 mode; works fine in 1:4.
Is sensitive to the frequency (design meets timing)
Writes succeed, only reads fail.

Add Reordering support

Ports can now expose banks to the user and can allow accesses to the memory to be reordered.

To implement reordering, we could create a module that would work on two native ports:

one used by the user with all accesses in order and not exposing the banks.
one used internally with reordered accesses and exposing the banks.

user_port = LiteDRAMNativePort(..., with_reordering=False)
internal_port = LiteDRAMNativePort(..., with_reordering=True)

class LiteDRAMReordering(Module):
    def __init__(self, user_port, internal_port):
        [...]

We could implement this scenario:

For writes:

add bank cmd buffers (only store row/col).
each bank cmd buffer maintains an in/out count.
add data ram (min depth = nbanks * depth of cmd buffers).
redirect write cmd to the proper bank cmd buffer.
write cmd is accepted if the proper bank buffer is not full.
when a bank cmd buffer accepts the cmd, accept the data and store it at location (bank << log2(buffers' depth)) + bank in count.
when a bank cmd buffer outputs the cmd, retrieve the data in ram at location (bank << log2(buffers' depth)) + bank out count and put it in the data queue that will be presented to the crossbar.
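The data ram addressing used in the last two steps can be sketched in software as follows. This is only an illustration of the arithmetic, assuming the buffer depth is a power of two and the per-bank count wraps at that depth; `data_ram_address` is a hypothetical name.

```python
def data_ram_address(bank, count, depth):
    """Location of a write's data in the shared data ram.

    bank:  index of the bank cmd buffer holding the command.
    count: that buffer's in count (when storing) or out count (when reading).
    depth: depth of each bank cmd buffer, assumed to be a power of two.
    """
    if depth <= 0 or depth & (depth - 1):
        raise ValueError("depth must be a power of two")
    shift = depth.bit_length() - 1  # log2(depth)
    # Each bank owns a contiguous region of `depth` entries.
    return (bank << shift) + (count % depth)


first_slot = data_ram_address(bank=0, count=0, depth=8)
bank3_slot = data_ram_address(bank=3, count=5, depth=8)
wrapped_slot = data_ram_address(bank=1, count=9, depth=8)
```

Because each bank has its own region, data stored for one bank can never overwrite data waiting in another, whatever order the banks drain in.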

For reads:

maintain a global cmd_in and data_out count.
add bank cmd buffers (only store row/col/cmd_in count).
add data ram (min depth = nbanks * depth of cmd buffers).
redirect read cmd to the proper bank cmd buffer.
read cmd is accepted if the proper bank buffer is not full and if the read data corresponding to the same cmd_in count value has been presented to the user (should be cmd_in count + 1 != data_out count).
when a bank accepts a cmd, put the cmd_in value output by the cmd buffer in a queue; use this queue to know where to store the next returned data in the data ram.
use a flip bit in data ram to indicate that data has been updated (flip this bit each time a location is used).
read data at data_out count location, if bit has flipped, present the data and increment data_out count to read next location.
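The read scheme above can be modelled in a few lines of software. The class below is a simplified sketch, not a proposed implementation: it uses unbounded counters instead of wrapping hardware counters, it omits the per-bank cmd buffers, and the names (`ReadReorderBuffer`, `issue`, `complete`, `pop`) are hypothetical.

```python
class ReadReorderBuffer:
    """Returns read data in issue order even if it completes out of order."""

    def __init__(self, depth):
        self.depth = depth
        self.data = [None] * depth
        self.flip = [0] * depth      # toggled each time a location is written
        self.expected = [1] * depth  # flip value that marks fresh data
        self.cmd_in = 0              # number of read commands accepted
        self.data_out = 0            # number of read data words presented

    def can_accept(self):
        # A location may only be reused once its data has been presented.
        return self.cmd_in - self.data_out < self.depth

    def issue(self):
        """Accept a read command and return the count value that tags it."""
        if not self.can_accept():
            raise RuntimeError("reorder buffer full")
        tag = self.cmd_in
        self.cmd_in += 1
        return tag

    def complete(self, tag, value):
        """Store returned data at the location selected by the tag."""
        slot = tag % self.depth
        self.data[slot] = value
        self.flip[slot] ^= 1

    def pop(self):
        """Present the next in-order data word, or None if not yet returned."""
        slot = self.data_out % self.depth
        if self.flip[slot] != self.expected[slot]:
            return None
        self.expected[slot] ^= 1
        self.data_out += 1
        return self.data[slot]


rob = ReadReorderBuffer(depth=4)
t0, t1, t2 = rob.issue(), rob.issue(), rob.issue()
rob.complete(t2, "c")    # the last command returns first
early = rob.pop()        # nothing to present yet
rob.complete(t0, "a")
rob.complete(t1, "b")
in_order = [rob.pop(), rob.pop(), rob.pop()]
```

The flip bit lets the reader detect fresh data without a separate valid flag that would have to be cleared after each read.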

Add some information about performance / bandwidth

It would be good if the docs had some information about performance of litedram in different configurations.

Maybe it could be added to my spreadsheet here.

BL8 1:2 controller fails to compile

Traceback (most recent call last):
File "core.py", line 275, in
main()
File "core.py", line 256, in main
soc.generate_sdram_phy_py_header()
File "core.py", line 240, in generate_sdram_phy_py_header
self.sdram.controller.settings.timing))
File "/home/john/repos/litedram/litedram/sdram_init.py", line 292, in get_sdram_phy_py_header
init_sequence, _ = get_sdram_phy_init_sequence(phy_settings, timing_settings)
File "/home/john/repos/litedram/litedram/sdram_init.py", line 175, in get_sdram_phy_init_sequence
mr0 = format_mr0(bl, cl, wr, 1)
File "/home/john/repos/litedram/litedram/sdram_init.py", line 125, in format_mr0
mr0 |= wr_to_mr0[wr] << 9
KeyError: 4

The formula computing this uses tWTR*nphases which I think is wrong. Micron says this should be configured with the following formula:
WR (cycles) = roundup (tWR [ns]/tCK [ns]).
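As a sketch of that formula, the helper below rounds tWR/tCK up and then picks the next write recovery value that the mode register can encode, which avoids the KeyError above. The list of encodable values is the DDR3 set as I understand it and should be checked against the datasheet; `write_recovery_cycles` is a hypothetical name, not the function in sdram_init.py.

```python
import math
from fractions import Fraction

# Assumed set of WR values encodable in DDR3 MR0; verify against the datasheet.
DDR3_WR_VALUES = (5, 6, 7, 8, 10, 12, 14, 16)


def write_recovery_cycles(twr_ns, tck_ns, supported=DDR3_WR_VALUES):
    """WR (cycles) = roundup(tWR / tCK), raised to the next encodable value."""
    # Fractions avoid float rounding errors pushing the result up by one.
    cycles = math.ceil(Fraction(str(twr_ns)) / Fraction(str(tck_ns)))
    for value in sorted(supported):
        if value >= cycles:
            return value
    raise ValueError("tWR/tCK = %d cycles exceeds the largest WR value" % cycles)


wr_slow = write_recovery_cycles(15, 5)     # 3 cycles, raised to 5
wr_800 = write_recovery_cycles(15, 2.5)    # 6 cycles
wr_1600 = write_recovery_cycles(15, 1.25)  # 12 cycles
```

Rounding up to a larger WR value is conservative: it only lengthens the write recovery time.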

1:2 Sequential Access Throughput is Lower Than Expected

The 1:2 PHY only gets 64% bus efficiency with a sequential read pattern. The 1:4 PHY is capable of 97%.

My theory is that the arbiter in the bank machine mux is to blame.

nizox/tapcfg is not available anymore

When trying to install LiteX :

Cloning into '/home/xilinxbox/litex/litex/build/sim/core/modules/ethernet/tapcfg'...
Username for 'https://github.com':
Password for 'https://github.com':
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/nizox/tapcfg/'
fatal: clone of 'https://github.com/nizox/tapcfg' into submodule path '/home/xilinxbox/litex/litex/build/sim/core/modules/ethernet/tapcfg' failed
Failed to clone 'litex/build/sim/core/modules/ethernet/tapcfg' a second time, aborting

And https://github.com/nizox/tapcfg gives a 404...

Add support for ice40 with sdram (CAT board?)

Most ice40 boards have SRAM which makes them useless for running real OSes like Linux. In theory you should be able to use the SDRAM controller from litedram on the ice40.

The CAT board https://hackaday.io/project/7982-cat-board is the only ice40 board I've found with SDRAM so far.

DDR4 / Add command parity support

See tn_4003_DDR4_network_design_guide / p5.

Multiple Timings Ignored

This commit fixes it; however, it's entangled with my auto_precharge pull request. Since timing has to be re-verified anyway, we might as well pull them both.

The specific fix is here: JohnSully@a4be642

1:2 Controller hard to meet timing

The main issue seems to be the bankmachine.hit calculation. Checking the openrow for equality should be registered.

Refactor S7DDRPHY / add proper 1:2 BL8 support

This will improve code sharing, ease understanding and allow implementing BL8 support more easily for 1:2 PHY:

move DDR2/DDR3 latency/phase computation functions to a common file (can be reused by other PHYs)
split code in modules: control / tx datapath / rx datapath
add proper BL8 support for the 1:2 PHY: control path is similar, we can just do some adaptation on the tx datapath / rx datapath signals.

ddr calibration fails when SoC temperature is hot

When die temp is high: 70.0107 (0x0ae5)

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: VexRiscv / 100MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jul 19 2018 15:06:12

BIOS CRC passed (5bf71210)
Initializing SDRAM...
Read bitslip: 3
Read delays scan:
m0: 00000000000000000000000000000000
m1: 00000000001111111111111000000000
m2: 00000000001111111111111000000000
m3: 00000000001111111111111100000000
Read delays: 3:10-24  2:10-24  1:10-24  0:32-33  completed
Memtest bus failed: 156/256 errors
Memtest data failed: 524288/524288 errors
Memtest addr failed: 8192/8192 errors
Memory initialization failed

With the latest release, the board fails to boot:

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: VexRiscv / 100MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jul 19 2018 15:34:22

BIOS CRC passed (317e43f7)
Initializing SDRAM...
Read bitslip: 2
Read delays scan:
m0: 00000000000000000000000000000000
m1: 11111111000000000000000000000000
m2: 11111111000000000000000000000000
m3: 111111110000

Out of order interface is incomplete

This is more of an FYI that out-of-order support is not yet complete. Specifically, the write interface must also output the bank it's using. In addition, OO is disabled in the frontend crossbar.

The interface is also not stable. If we implement re-ordering within banks (e.g. write batching) then we'll have to move to a tag system to track operations.

SDR / fix refresh

Refresh seems to be broken with SDR (working with DDR, DDR2, DDR3, DDR4).

Multirank: add dynamic ODT

To support higher frequencies in dual/quad rank mode, we will need to drive ODT dynamically.
Useful information can be found in:

TN-41-08: Design Guide for Two DDR3-1066 UDIMM Systems
TN-04-54: High-Speed DRAM Controller Design

SDRAM CKE as optional

Make it easy to not export the SDRAM CKE pin: it is optional and may be tied high on the target PCB. And yes, it may be that there is not a single unused pin left to assign this signal to.

migen

You should mention in the README the dependency on migen and how to install it.

Refresh command is not issued.

I get a precharge all, but no refresh command. Reverting 6620a91 seems to get it back.

settings.timing.tWTR expressed in nanoseconds but used as cycles

In multiplexer.py the line: fsm.delayed_enter("WTR", "READ", settings.timing.tWTR-1) uses the tWTR parameter. However, in the modules it's expressed in nanoseconds.

Other parameters such as tRP, which are also expressed in nanoseconds, are converted into cycles elsewhere; tWTR never is. This greatly exaggerates the delay when switching between write and read.
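For illustration, the conversion that tWTR is missing could look like the sketch below. `ns_to_cycles` is a hypothetical helper rather than litedram's own conversion code, and the 7.5 ns value is only an example figure.

```python
import math
from fractions import Fraction


def ns_to_cycles(t_ns, clk_freq_hz):
    """Round a duration in nanoseconds up to a whole number of clock cycles."""
    # Exact rational arithmetic: cycles = t_ns * 1e-9 * clk_freq_hz.
    cycles = Fraction(str(t_ns)) * Fraction(int(clk_freq_hz), 10**9)
    return math.ceil(cycles)


twtr_ns = 7.5                                 # example value only
twtr_cycles = ns_to_cycles(twtr_ns, 100e6)    # 1 cycle at 100 MHz
ten_ns_cycles = ns_to_cycles(10, 100e6)       # exactly 1 cycle
fifteen_ns_cycles = ns_to_cycles(15, 100e6)   # 1.5, rounded up to 2
```

In this example, passing the raw 7.5 to the FSM as if it were a cycle count would make the write-to-read delay several times longer than the single cycle actually required.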

DDR4 / Manage Vref DQ Calibration

Latest litedram fails on the Digilent Atlys board

https://travis-ci.org/mithro/HDMI2USB-litex-firmware/jobs/352035389 and https://api.travis-ci.org/v3/job/352035389/log.txt

Total REAL time to Placer completion: 3 mins 8 secs 
Total CPU  time to Placer completion: 3 mins 8 secs 
Running post-placement packing...
Writing output files...
*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
ERROR:PhysDesignRules:1939 - Issue with pin connections and/or configuration on
   block:<ODDR2_5>:<OLOGIC2_OUTFF>.  The OUTFF Flip-flop SRTYPE_OQ mode must be
   ASYNC in DDR mode with DDR_ALIGNMENT mode of C0 or C1.
ERROR:Pack:1642 - Errors in physical DRC.
Mapping completed.
See MAP report file "top_map.mrp" for details.
Problem encountered during the packing phase.
Design Summary
--------------
Number of errors   :   2
Number of warnings :  10

Problems instantiating on ice40 board

I have made an SDRAM plug-in board for the Lattice ICE40 HX8K EVB. It uses the AS4C16M16 which is already supported.

I am trying to make a simple Litex project to use it, but I get a synthesis error:

$ python3 ice40hx8k_litedram_nn.py 
lxbuildenv: v2019.8.19.1 (run ice40hx8k_litedram_nn.py --lx-help for help)
<__main__.Platform object at 0x7fb78610b7f0>
{'cpu_type': None, 'cpu_variant': None, 'integrated_rom_size': 0, 'integrated_sram_size': 0}
ERROR: Conflicting init values for signal 1'1 (\soc_sdram_master_p0_act_n = 1'1, \soc_sdram_choose_req_want_activates = 1'0).
Traceback (most recent call last):
  File "ice40hx8k_litedram_nn.py", line 224, in <module>
    main()
  File "ice40hx8k_litedram_nn.py", line 220, in main
    vns = builder.build()
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/soc/integration/builder.py", line 185, in build
    toolchain_path=toolchain_path, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/soc/integration/soc_core.py", line 452, in build
    return self.platform.build(self, *args, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/platform.py", line 34, in build
    return self.toolchain.build(self, *args, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/icestorm.py", line 189, in build
    _run_script(script)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/icestorm.py", line 121, in _run_script
    raise OSError("Subprocess failed")
OSError: Subprocess failed

I have created a repo here with a small example file and the build output checked in. I'm using Yosys 0.9+932 (git sha1 9e6632c4, clang 9.0.0 -fPIC -Os).

The issue is solved by the following patch against LiteDRAM, also included in the test repo above:

diff --git a/litedram/phy/dfi.py b/litedram/phy/dfi.py
index 2948556..44bca0b 100644
--- a/litedram/phy/dfi.py
+++ b/litedram/phy/dfi.py
@@ -15,7 +15,7 @@ def phase_cmd_description(addressbits, bankbits, nranks):
         ("cke",          nranks, DIR_M_TO_S),
         ("odt",          nranks, DIR_M_TO_S),
         ("reset_n",           1, DIR_M_TO_S),
-        ("act_n",             1, DIR_M_TO_S)
+        #("act_n",             1, DIR_M_TO_S)
     ]
 
 
@@ -52,7 +52,7 @@ class Interface(Record):
             p.cs_n.reset = (2**nranks-1)
             p.ras_n.reset = 1
             p.we_n.reset = 1
-            p.act_n.reset = 1
+            #p.act_n.reset = 1
 
     # Returns pairs (DFI-mandated signal name, Migen signal object)
     def get_standard_names(self, m2s=True, s2m=True):
@@ -85,11 +85,11 @@ class DDR4DFIMux(Module):
             self.comb += [
                 p_i.connect(p_o),
                 If(~p_i.ras_n & p_i.cas_n & p_i.we_n,
-                   p_o.act_n.eq(0),
+                   #p_o.act_n.eq(0),
                    p_o.we_n.eq(p_i.address[14]),
                    p_o.cas_n.eq(p_i.address[15]),
                    p_o.ras_n.eq(p_i.address[16])
                 ).Else(
-                    p_o.act_n.eq(1),
+                    #p_o.act_n.eq(1),
                 )
             ]

1:2 7-series Phy doesn't issue commands in write mode

Underlying issue appears to be that the wrcmdphase has the same value as the wrphase. The wrphase wins and the command is silently dropped (but the bank machine thinks it was sent).

Forcing wrcmdphase to 1 appears to work.

tFAW is too pessimistic

Haven't had a chance to look into why yet, but I've temporarily disabled it locally. It's a major bottleneck: the difference between 42% and 80% bus efficiency on my random write test. I'm not getting tFAW errors with it disabled, which makes me think it's not correct.
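For reference, the constraint being enforced is a rolling four-activate window: no more than four ACTIVATE commands may fall within any tFAW period. A minimal pure-Python model of that rule, useful for checking a controller trace by hand, could look like the sketch below. The function name and the cycle-based representation are assumptions for illustration and do not reflect litedram's implementation.

```python
def can_activate(history, now, tfaw_cycles, max_activates=4):
    """Return True if an ACTIVATE at cycle `now` is allowed.

    `history` holds the cycles of earlier ACTIVATEs. The command is allowed
    when fewer than `max_activates` of them lie inside the window of
    `tfaw_cycles` cycles that ends at `now`.
    """
    recent = [t for t in history if now - t < tfaw_cycles]
    return len(recent) < max_activates

# Four ACTIVATEs at cycles 0, 2, 4, 6 with tFAW = 10 cycles (example values).
activates = [0, 2, 4, 6]
print(can_activate(activates, 8, 10))   # still inside the window of the first ACTIVATE
print(can_activate(activates, 10, 10))  # the first ACTIVATE has left the window
```

A checker that blocks the fifth ACTIVATE for longer than this model does would be more pessimistic than the rule requires.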

Add tCCD for older DDR technologies

It seems that litedram requires a tCCD value on modules now. From @JohnSully and timvideos/litex-buildenv#77 (comment)

For DDR and SDR tCCD is 1 cycle.
LPDDR depends on the module. It’s burst length/2.
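The rule stated above can be written down as a small lookup. This is a sketch only: the function name `default_tccd` and the memory-type strings are assumptions made for illustration, not part of litedram's module definitions.

```python
def default_tccd(memtype, burst_length=None):
    """Return tCCD in cycles following the rule quoted above.

    Hypothetical helper: SDR and DDR use 1 cycle, LPDDR uses burst length / 2.
    """
    if memtype in ("SDR", "DDR"):
        return 1
    if memtype == "LPDDR":
        if burst_length is None:
            raise ValueError("LPDDR needs the module's burst length")
        return burst_length // 2
    raise ValueError("no default tCCD rule for memtype %r" % memtype)

print(default_tccd("SDR"))
print(default_tccd("LPDDR", burst_length=4))
```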

pin sdram_dm[1] stuck at GND can this be ignored?

Warning (13410): Pin "sdram_dm[1]" is stuck at GND File: X:/GIT/litex-boards/litex_boards/partner/targets/soc_basesoc_c10lprefkit/gateware/top.v Line: 3435

can this warning be ignored?

memtest reports OK, but having 1 out of 2 DM pins stuck at GND does not sound like a warning that should be ignored.

Allow all CL/CWL combinations to be used with DDR3

Some CL/CWL combinations are not possible since cmd_phase and dat_phase will be the same and there is no arbitration.

An assert (assert cmd_phase != dat_phase) is currently implemented in the code to prevent this case, but this limits the possible CL/CWL combinations.

We should probably remove this limitation by doing arbitration between cmd and data phases when both are valid.

Improve SDRAMModule speedgrade definition

From @mithro:

Wouldn't it be nicer to have;

from collections import namedtuple

SpeedGrade = namedtuple("SpeedGrade", ["tRP", "tRCD", "tWR", "tRFC", "tFAW", "tRC", "tRAS"])

# DDR3
class MT41J128M16(SDRAMModule):
     memtype = "DDR3"
     # geometry
     nbanks = 8
     nrows  = 16384
     ncols  = 1024
     # speedgrade invariant timings
     tREFI = 64e6/8192
     tWTR  = (4, 7.5)
     tCCD  = (4, None)
     tRRD  = 10

     speed_grades = {
         800: SpeedGrade(tRP=13.1, tRCD=13.1, tWR=13.1, tRFC=64, tRC=50.625, tRAS=37.5),
         1066:  SpeedGrade(...),
     }
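A self-contained sketch of how such a speed-grade table might be consumed is shown below. It is an illustration of the proposal, not litedram code: the class name `ModuleSketch` is hypothetical, only the 800 entry quoted above is included, and tFAW defaults to None because that entry gives no value for it.

```python
from collections import namedtuple

# tFAW is placed last and defaults to None, since the quoted entry omits it.
SpeedGrade = namedtuple(
    "SpeedGrade",
    ["tRP", "tRCD", "tWR", "tRFC", "tRC", "tRAS", "tFAW"],
    defaults=[None],
)

class ModuleSketch:
    """Hypothetical stand-in for an SDRAMModule subclass."""
    speed_grades = {
        800: SpeedGrade(tRP=13.1, tRCD=13.1, tWR=13.1, tRFC=64, tRC=50.625, tRAS=37.5),
    }

    def __init__(self, speed):
        if speed not in self.speed_grades:
            raise ValueError("unknown speed grade: %r" % speed)
        # Speed-grade dependent timings selected at construction time.
        self.timings = self.speed_grades[speed]

module = ModuleSketch(800)
print(module.timings.tRP, module.timings.tFAW)
```

Selecting the grade in the constructor keeps the geometry and the speed-grade invariant timings as class attributes, which matches the layout suggested in the proposal.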

lpddr: NameError: name 'mr1' is not defined

(LX P=mimasv2 C=vexriscv R=master-clean) tansell@tansell:~/github/timvideos/litex-buildenv-clean$ make firmware                                                                                                   
mkdir -p build/mimasv2_base_vexriscv/
time python -u ./make.py --platform=mimasv2 --target=base --cpu-type=vexriscv --iprange=192.168.100 -Ob toolchain_path /opt/Xilinx/    --no-compile-gateware \                                                    
        2>&1 | tee -a /usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/build/mimasv2_base_vexriscv//output.20191030-132700.log; (exit ${PIPESTATUS[0]})                                       
Traceback (most recent call last):
  File "./make.py", line 164, in <module>
    main()
  File "./make.py", line 148, in main
    vns = builder.build(**dict(args.build_option))
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litex/litex/soc/integration/builder.py", line 167, in build                                                              
    self._generate_includes()
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litex/litex/soc/integration/builder.py", line 129, in _generate_includes                                                 
    self.soc.sdram.controller.settings.timing))
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 434, in get_sdram_phy_c_header                                                          
    init_sequence, mr1 = get_sdram_phy_init_sequence(phy_settings, timing_settings)
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 377, in get_sdram_phy_init_sequence                                                     
    }[phy_settings.memtype](phy_settings, timing_settings)
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 78, in get_lpddr_phy_init_sequence                                                      
    return init_sequence, mr1
NameError: name 'mr1' is not defined

real    0m0.371s
user    0m0.311s
sys     0m0.066s
make: *** [Makefile:303: firmware-cmd] Error 1

Verify DDR3 DQS Write Preamble/Postamble

Current DDR3 PHYs generate the preamble/postamble over 8 DDR cycles.
Verify that this does not violate the specification; if it does, reduce it.

Can't calibrate

This is related to my other issue where calibration hangs. However, even when I work around the hang with a delay between reads and writes, calibration still fails. It was working about a month ago.

Trying to figure it out on my own, but I'm running out of ideas.

Controller refuses commands when in reordering mode

Related to: 06ca53d

Specifically the bank machine command fifo isn't updated to handle the fact that ready can go high when valid is low.

TypeError: phase_cmd_description() missing 1 required positional argument: 'nranks'

litedram revision e5696ad

From https://travis-ci.org/timvideos/litex-buildenv/jobs/430294286

time python -u ./make.py --platform=atlys --target=base --cpu-type=lm32 --iprange=192.168.100     --no-compile-gateware \
		2>&1 | tee -a build/atlys_base_lm32//output.20180918-230444.log; (exit ${PIPESTATUS[0]})
Traceback (most recent call last):
  File "./make.py", line 156, in <module>
    main()
  File "./make.py", line 123, in main
    soc = get_soc(args, platform)
  File "./make.py", line 57, in get_soc
    soc = SoC(platform, ident=SoC.__name__, **soc_sdram_argdict(args), **dict(args.target_option))
  File "/home/travis/build/timvideos/litex-buildenv/targets/atlys/base.py", line 228, in __init__
    dqs_ddr_alignment="C0")
  File "/home/travis/build/timvideos/litex-buildenv/third_party/litedram/litedram/phy/s6ddrphy.py", line 111, in __init__
    r_dfi = Array(Record(phase_cmd_description(addressbits, bankbits)) for i in range(nphases))
  File "/home/travis/build/timvideos/litex-buildenv/third_party/litedram/litedram/phy/s6ddrphy.py", line 111, in <genexpr>
    r_dfi = Array(Record(phase_cmd_description(addressbits, bankbits)) for i in range(nphases))
TypeError: phase_cmd_description() missing 1 required positional argument: 'nranks'

Simulation is broken

Hi,
I'm trying to test a custom configuration in simulation. Before that, I ran sim.py without any changes and found a couple of issues:

The mem_1.init file is not referenced (or copied) anywhere in sim.py.

WARNING: File mem_1.init referenced on $PATH/litex/litedram/examples/sim/litedram_core.v at line 15687 cannot be opened for reading. Please ensure that this file is available in the current working directory.

There is X propagation when litedram_core releases the user reset, which introduces an unknown state into the DDR3 model.

top_tb.dut.ddr3.main: at time 530100.0 ps ERROR: CK and CK_N are not allowed to go to an unknown state.

I'm not sure if I'm doing something wrong or if an extra step is needed, but so far I haven't had a single successful simulation result. As you can see, init_done is never asserted, so the main FSM stays stuck in its idle states forever. Tracing back, that signal ends up being asserted by a CSR, but a) there's no CPU or register write interface and b) both init memories are empty by default.
There are also a lot of redundant writes to variables, which makes the generated code verbose. In this excerpt from top.v, lines 335 to 341, the variable fsm0_next_state is written twice, in addition to being initialized at its declaration. This can be improved.

always @(*) begin
	sdram_checker_addr_gen_ce <= 1'd0;
	sdram_checker_sink_sink_valid <= 1'd0;
	fsm0_next_state <= 2'd0; // Already initialized by 0
	sdram_checker_cmd_counter_fsm0_next_value <= 24'd0;
	sdram_checker_cmd_counter_fsm0_next_value_ce <= 1'd0;
	fsm0_next_state <= fsm0_state; // Written again

Thank you.

DDR4 / Add optional features support

Manage bank groups.
Manage Vref DQ Calibration.
Add data CRC support (See tn_4003_DDR4_network_design_guide / p4).
Add DBI support (See tn_4003_DDR4_network_design_guide / p6).

Add a test design for each of the DDR technologies

It would be good to have at least one test for each technology group:

SDRAM
DDR
LPDDR
DDR2
DDR3

I believe DDR3 is the primary one covered by tests at the moment?

Verify DDR3 MR0 Write Recovery configuration

Write Recovery should be computed as follows:
WR (cycles) = roundup (tWR [ns]/tCK [ns])

We are currently using a fixed value:
mr0 = format_mr0(bl, cl, 14, 1) # wr=8 FIXME: this should be ceiling(tWR/tCK)
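The formula above can be sketched in plain Python. This is a minimal illustration, not the code in litedram's init.py: the helper name `write_recovery_cycles` is hypothetical, and the set of write-recovery values that DDR3 MR0 can encode is an assumption that should be checked against the datasheet of the target part.

```python
import math

# Assumed set of write-recovery values encodable in DDR3 MR0 (verify before use).
DDR3_WR_VALUES = (5, 6, 7, 8, 10, 12, 14, 16)

def write_recovery_cycles(twr_ns, tck_ns, legal=DDR3_WR_VALUES):
    """Return roundup(tWR / tCK), raised to the next encodable value."""
    cycles = math.ceil(twr_ns / tck_ns)
    for value in legal:  # `legal` is sorted in ascending order
        if value >= cycles:
            return value
    raise ValueError("tWR of %d cycles cannot be encoded" % cycles)

# Example inputs (assumed): tWR = 15 ns at two different clock periods.
print(write_recovery_cycles(15, 2.5))
print(write_recovery_cycles(15, 1.25))
```

The rounding to the next encodable value matters because not every cycle count has a mode register encoding in the assumed set; for instance a computed value of 11 would have to be programmed as 12.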

Add migen testbenches for the core

The core is currently tested with a high-level simulation of the full controller, which is slow and impractical.

We need good migen testbenches for the cores to:

ensure we don't introduce regressions when making changes.
ensure all timing parameters are handled correctly.
provide a better way to evaluate efficiency according to the access pattern.
find the best architecture.

DDR4 / Add DBI support

See tn_4003_DDR4_network_design_guide / p6.

Allow sharing bank machines between banks

Currently, each bank has its own bank machine. On designs where efficiency is less critical, being able to reduce the number of bank machines and sharing a bank machine between multiple banks would be useful to reduce resource usage.