enjoy-digital / litedram
Small footprint and configurable DRAM core
License: Other
When I have an in-order port and an out-of-order port, traffic on the out-of-order port corrupts data on the in-order port. The issue goes away when traffic on the out-of-order port is stopped.
See tn_4003_DDR4_network_design_guide / p4.
We get memtest OK results even though one of the DQ pins was not connected due to an error in the platform file, so the dq port in the top-level had only 15 pins. Why is memtest still reporting OK?
The minispartan6+ has SDRAM rather than DDR.
This is hidden when only using a single user port, because the time to complete a command exceeds these timings. However, when operating with multiple ports it is possible to violate both tRRD and tFAW, since the mux will issue a row command every cycle if it receives one from a bank machine.
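The constraint can be checked against a recorded command trace with a minimal sketch; the helper below is hypothetical (not litedram code) and assumes the trace is just the list of cycles at which ACTIVATE commands were issued:

```python
# Hypothetical trace checker (names are illustrative, not litedram code):
# given the cycles at which ACTIVATE commands were issued, flag violations.
def check_activates(act_cycles, trrd, tfaw):
    violations = []
    for i in range(1, len(act_cycles)):
        # tRRD: minimum spacing between two consecutive ACTIVATEs
        if act_cycles[i] - act_cycles[i-1] < trrd:
            violations.append(("tRRD", act_cycles[i]))
        # tFAW: no more than 4 ACTIVATEs in any rolling tFAW window
        if i >= 4 and act_cycles[i] - act_cycles[i-4] < tfaw:
            violations.append(("tFAW", act_cycles[i]))
    return violations
```

A mux that issues a row command every cycle would produce closely spaced entries in act_cycles and trip both checks.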
The controller can be used on various systems which all have specific needs. The current architecture tries to be simple and give reasonable efficiency/performance for most use cases.
Until now, performance was good on the access patterns that were exercised (mostly linear data buffers and CPU code storage). We still need to do #8 and #57 to have better metrics.
With more users, new use cases appear with different access patterns, and we need to understand what the eventual bottlenecks are and how to improve them.
Apply the improvements made to S7DDRPHY and do a similar refactoring to #60.
S7DDRPHY works with:
To support DDR3 / 2 phases / BL8, the PHY needs some modifications.
Here is an old modified version of the PHY that has these modifications and should merge cleanly:
# 1:4, 1:2 frequency-ratio DDR2/DDR3 PHY for Xilinx's Series7
# DDR2: 400, 533, 667, 800 and 1066 MT/s
# DDR3: 1066, 1333 and 1600 MT/s

import math

from migen import *

from litex.soc.interconnect.csr import *

from litedram.common import PhySettings
from litedram.phy.dfi import *


def get_cl_cw(memtype, tck):
    if memtype == "DDR2":
        # ddr2-400
        if tck >= 2/400e6:
            cl, cwl = 3, 2
        # ddr2-533
        elif tck >= 2/533e6:
            cl, cwl = 4, 3
        # ddr2-667
        elif tck >= 2/667e6:
            cl, cwl = 5, 4
        # ddr2-800
        elif tck >= 2/800e6:
            cl, cwl = 6, 5
        # ddr2-1066
        elif tck >= 2/1066e6:
            cl, cwl = 7, 5
        else:
            raise ValueError
    elif memtype == "DDR3":
        # ddr3-1066
        if tck >= 2/1066e6:
            cl, cwl = 7, 6
        # ddr3-1333
        elif tck >= 2/1333e6:
            cl, cwl = 10, 7
        # ddr3-1600
        elif tck >= 2/1600e6:
            cl, cwl = 11, 8
        else:
            raise ValueError
    return cl, cwl


def get_sys_latency(nphases, cas_latency):
    return math.ceil(cas_latency/nphases)


def get_sys_phases(nphases, sys_latency, cas_latency, write=False):
    cmd_phase = 0
    dat_phase = 0
    diff_phase = 0
    while (diff_phase + cas_latency) != sys_latency*nphases:
        dat_phase += 1
        if dat_phase == nphases:
            dat_phase = 0
            cmd_phase += 1
        if write:
            diff_phase = dat_phase - cmd_phase
        else:
            diff_phase = cmd_phase - dat_phase
    return cmd_phase, dat_phase


class S7DDRPHY(Module, AutoCSR):
    def __init__(self, pads, with_odelay, memtype="DDR3", nphases=4, sys_clk_freq=100e6, iodelay_clk_freq=200e6):
        tck = 2/(2*nphases*sys_clk_freq)
        addressbits = len(pads.a)
        bankbits = len(pads.ba)
        databits = len(pads.dq)
        nphases = nphases

        iodelay_tap_average = {
            200e6: 78e-12,
            300e6: 52e-12,
        }
        half_sys8x_taps = math.floor(tck/(4*iodelay_tap_average[iodelay_clk_freq]))
        self._half_sys8x_taps = CSRStorage(4, reset=half_sys8x_taps)

        if with_odelay:
            self._wlevel_en = CSRStorage()
            self._wlevel_strobe = CSR()
        self._dly_sel = CSRStorage(databits//8)

        self._rdly_dq_rst = CSR()
        self._rdly_dq_inc = CSR()
        self._rdly_dq_bitslip_rst = CSR()
        self._rdly_dq_bitslip = CSR()

        if with_odelay:
            self._wdly_dq_rst = CSR()
            self._wdly_dq_inc = CSR()
            self._wdly_dqs_rst = CSR()
            self._wdly_dqs_inc = CSR()

        # compute phy settings
        cl, cwl = get_cl_cw(memtype, tck)
        cl_sys_latency = get_sys_latency(nphases, cl)
        cwl_sys_latency = get_sys_latency(nphases, cwl)

        rdcmdphase, rdphase = get_sys_phases(nphases, cl_sys_latency, cl)
        wrcmdphase, wrphase = get_sys_phases(nphases, cwl_sys_latency, cwl, write=True)
        wrcmdphase = 1  # FIXME: forced to 1 as a workaround (wrcmdphase == wrphase issue)
        print("wrcmdphase: " + str(wrcmdphase) + " wrphase: " + str(wrphase))
        self.settings = PhySettings(
            memtype=memtype,
            dfi_databits=4*databits,
            nphases=nphases,
            rdphase=rdphase,
            wrphase=wrphase,
            rdcmdphase=rdcmdphase,
            wrcmdphase=wrcmdphase,
            cl=cl,
            cwl=cwl,
            read_latency=2 + cl_sys_latency + 2 + 1,
            write_latency=cwl_sys_latency
        )

        self.dfi = Interface(addressbits, bankbits, 4*databits, 4)

        # # #

        bl8_sel = Signal()

        # Clock
        ddr_clk = "sys2x" if nphases == 2 else "sys4x"
        for i in range(len(pads.clk_p)):
            sd_clk_se = Signal()
            self.specials += [
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=sd_clk_se,
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=0, i_D2=1, i_D3=0, i_D4=1,
                    i_D5=0, i_D6=1, i_D7=0, i_D8=1
                ),
                Instance("OBUFDS",
                    i_I=sd_clk_se,
                    o_O=pads.clk_p[i],
                    o_OB=pads.clk_n[i]
                )
            ]

        # Addresses and commands
        for i in range(addressbits):
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=pads.a[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=self.dfi.phases[0].address[i], i_D2=self.dfi.phases[0].address[i],
                    i_D3=self.dfi.phases[1].address[i], i_D4=self.dfi.phases[1].address[i],
                    i_D5=self.dfi.phases[2].address[i], i_D6=self.dfi.phases[2].address[i],
                    i_D7=self.dfi.phases[3].address[i], i_D8=self.dfi.phases[3].address[i]
                )
        for i in range(bankbits):
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=pads.ba[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=self.dfi.phases[0].bank[i], i_D2=self.dfi.phases[0].bank[i],
                    i_D3=self.dfi.phases[1].bank[i], i_D4=self.dfi.phases[1].bank[i],
                    i_D5=self.dfi.phases[2].bank[i], i_D6=self.dfi.phases[2].bank[i],
                    i_D7=self.dfi.phases[3].bank[i], i_D8=self.dfi.phases[3].bank[i]
                )
        controls = ["ras_n", "cas_n", "we_n", "cke", "odt"]
        if hasattr(pads, "reset_n"):
            controls.append("reset_n")
        if hasattr(pads, "cs_n"):
            controls.append("cs_n")
        for name in controls:
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=getattr(pads, name),
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=getattr(self.dfi.phases[0], name), i_D2=getattr(self.dfi.phases[0], name),
                    i_D3=getattr(self.dfi.phases[1], name), i_D4=getattr(self.dfi.phases[1], name),
                    i_D5=getattr(self.dfi.phases[2], name), i_D6=getattr(self.dfi.phases[2], name),
                    i_D7=getattr(self.dfi.phases[3], name), i_D8=getattr(self.dfi.phases[3], name)
                )

        # DQS and DM
        oe_dqs = Signal()
        dqs_serdes_pattern = Signal(8, reset=0b01010101)
        if with_odelay:
            self.comb += \
                If(self._wlevel_en.storage,
                    If(self._wlevel_strobe.re,
                        dqs_serdes_pattern.eq(0b00000001)
                    ).Else(
                        dqs_serdes_pattern.eq(0b00000000)
                    )
                ).Else(
                    dqs_serdes_pattern.eq(0b01010101)
                )
        for i in range(databits//8):
            dm_o_nodelay = Signal()
            dm_data = Signal(8)
            dm_data_d = Signal(8)
            dm_data_muxed = Signal(4)
            self.comb += dm_data.eq(Cat(
                self.dfi.phases[0].wrdata_mask[0*databits//8+i], self.dfi.phases[0].wrdata_mask[1*databits//8+i],
                self.dfi.phases[0].wrdata_mask[2*databits//8+i], self.dfi.phases[0].wrdata_mask[3*databits//8+i],
                self.dfi.phases[1].wrdata_mask[0*databits//8+i], self.dfi.phases[1].wrdata_mask[1*databits//8+i],
                self.dfi.phases[1].wrdata_mask[2*databits//8+i], self.dfi.phases[1].wrdata_mask[3*databits//8+i]),
            )
            self.sync += dm_data_d.eq(dm_data)
            self.comb += \
                If(bl8_sel,
                    dm_data_muxed.eq(dm_data_d[4:])
                ).Else(
                    dm_data_muxed.eq(dm_data[:4])
                )
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=dm_o_nodelay if with_odelay else pads.dm[i],
                    i_OCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=dm_data_muxed[0], i_D2=dm_data_muxed[1],
                    i_D3=dm_data_muxed[2], i_D4=dm_data_muxed[3]
                )
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=0,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i] & self._wdly_dq_rst.re,
                        i_CE=self._dly_sel.storage[i] & self._wdly_dq_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dm_o_nodelay, o_DATAOUT=pads.dm[i]
                    )

            dqs_nodelay = Signal()
            dqs_delayed = Signal()
            dqs_t = Signal()
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OFB=dqs_nodelay if with_odelay else Signal(),
                    o_OQ=Signal() if with_odelay else dqs_nodelay,
                    o_TQ=dqs_t,
                    i_OCE=1, i_TCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk) if with_odelay else ClockSignal(ddr_clk+"_dqs"), i_CLKDIV=ClockSignal(),
                    i_D1=dqs_serdes_pattern[0], i_D2=dqs_serdes_pattern[1],
                    i_D3=dqs_serdes_pattern[2], i_D4=dqs_serdes_pattern[3],
                    i_D5=dqs_serdes_pattern[4], i_D6=dqs_serdes_pattern[5],
                    i_D7=dqs_serdes_pattern[6], i_D8=dqs_serdes_pattern[7],
                    i_T1=~oe_dqs
                )
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=half_sys8x_taps,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i] & self._wdly_dqs_rst.re,
                        i_CE=self._dly_sel.storage[i] & self._wdly_dqs_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dqs_nodelay, o_DATAOUT=dqs_delayed
                    )
            self.specials += \
                Instance("OBUFTDS",
                    i_I=dqs_delayed if with_odelay else dqs_nodelay, i_T=dqs_t,
                    o_O=pads.dqs_p[i], o_OB=pads.dqs_n[i]
                )

        # DQ
        oe_dq = Signal()
        for i in range(databits):
            dq_o_nodelay = Signal()
            dq_o_delayed = Signal()
            dq_i_nodelay = Signal()
            dq_i_delayed = Signal()
            dq_t = Signal()
            dq_data = Signal(8)
            dq_data_d = Signal(8)
            dq_data_muxed = Signal(4)
            self.comb += dq_data.eq(Cat(
                self.dfi.phases[0].wrdata[0*databits+i], self.dfi.phases[0].wrdata[1*databits+i],
                self.dfi.phases[0].wrdata[2*databits+i], self.dfi.phases[0].wrdata[3*databits+i],
                self.dfi.phases[1].wrdata[0*databits+i], self.dfi.phases[1].wrdata[1*databits+i],
                self.dfi.phases[1].wrdata[2*databits+i], self.dfi.phases[1].wrdata[3*databits+i])
            )
            self.sync += dq_data_d.eq(dq_data)
            self.comb += \
                If(bl8_sel,
                    dq_data_muxed.eq(dq_data_d[4:])
                ).Else(
                    dq_data_muxed.eq(dq_data[:4])
                )
            self.specials += \
                Instance("OSERDESE2",
                    p_DATA_WIDTH=2*nphases, p_TRISTATE_WIDTH=1,
                    p_DATA_RATE_OQ="DDR", p_DATA_RATE_TQ="BUF",
                    p_SERDES_MODE="MASTER",

                    o_OQ=dq_o_nodelay, o_TQ=dq_t,
                    i_OCE=1, i_TCE=1,
                    i_RST=ResetSignal(),
                    i_CLK=ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_D1=dq_data_muxed[0], i_D2=dq_data_muxed[1],
                    i_D3=dq_data_muxed[2], i_D4=dq_data_muxed[3],
                    i_T1=~oe_dq
                )
            dq_i_data = Signal(8)
            dq_i_data_d = Signal(8)
            self.specials += \
                Instance("ISERDESE2",
                    p_DATA_WIDTH=2*nphases, p_DATA_RATE="DDR",
                    p_SERDES_MODE="MASTER", p_INTERFACE_TYPE="NETWORKING",
                    p_NUM_CE=1, p_IOBDELAY="IFD",

                    i_DDLY=dq_i_delayed,
                    i_CE1=1,
                    i_RST=ResetSignal() | (self._dly_sel.storage[i//8] & self._rdly_dq_bitslip_rst.re),
                    i_CLK=ClockSignal(ddr_clk), i_CLKB=~ClockSignal(ddr_clk), i_CLKDIV=ClockSignal(),
                    i_BITSLIP=self._dly_sel.storage[i//8] & self._rdly_dq_bitslip.re,
                    o_Q8=dq_i_data[7], o_Q7=dq_i_data[6],
                    o_Q6=dq_i_data[5], o_Q5=dq_i_data[4],
                    o_Q4=dq_i_data[3], o_Q3=dq_i_data[2],
                    o_Q2=dq_i_data[1], o_Q1=dq_i_data[0]
                )
            self.sync += dq_i_data_d.eq(dq_i_data)
            self.comb += [
                self.dfi.phases[0].rddata[0*databits+i].eq(dq_i_data_d[3]), self.dfi.phases[0].rddata[1*databits+i].eq(dq_i_data_d[2]),
                self.dfi.phases[0].rddata[2*databits+i].eq(dq_i_data_d[1]), self.dfi.phases[0].rddata[3*databits+i].eq(dq_i_data_d[0]),
                self.dfi.phases[1].rddata[0*databits+i].eq(dq_i_data[3]), self.dfi.phases[1].rddata[1*databits+i].eq(dq_i_data[2]),
                self.dfi.phases[1].rddata[2*databits+i].eq(dq_i_data[1]), self.dfi.phases[1].rddata[3*databits+i].eq(dq_i_data[0]),
            ]
            if with_odelay:
                self.specials += \
                    Instance("ODELAYE2",
                        p_DELAY_SRC="ODATAIN", p_SIGNAL_PATTERN="DATA",
                        p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                        p_PIPE_SEL="FALSE", p_ODELAY_TYPE="VARIABLE", p_ODELAY_VALUE=0,

                        i_C=ClockSignal(),
                        i_LD=self._dly_sel.storage[i//8] & self._wdly_dq_rst.re,
                        i_CE=self._dly_sel.storage[i//8] & self._wdly_dq_inc.re,
                        i_LDPIPEEN=0, i_INC=1,

                        o_ODATAIN=dq_o_nodelay, o_DATAOUT=dq_o_delayed
                    )
            self.specials += \
                Instance("IDELAYE2",
                    p_DELAY_SRC="IDATAIN", p_SIGNAL_PATTERN="DATA",
                    p_CINVCTRL_SEL="FALSE", p_HIGH_PERFORMANCE_MODE="TRUE", p_REFCLK_FREQUENCY=iodelay_clk_freq/1e6,
                    p_PIPE_SEL="FALSE", p_IDELAY_TYPE="VARIABLE", p_IDELAY_VALUE=0,

                    i_C=ClockSignal(),
                    i_LD=self._dly_sel.storage[i//8] & self._rdly_dq_rst.re,
                    i_CE=self._dly_sel.storage[i//8] & self._rdly_dq_inc.re,
                    i_LDPIPEEN=0, i_INC=1,

                    i_IDATAIN=dq_i_nodelay, o_DATAOUT=dq_i_delayed
                )
            self.specials += \
                Instance("IOBUF",
                    i_I=dq_o_delayed if with_odelay else dq_o_nodelay, o_O=dq_i_nodelay, i_T=dq_t,
                    io_IO=pads.dq[i]
                )

        # Flow control
        #
        # total read latency:
        #  2 cycles through OSERDESE2
        #  cl_sys_latency cycles CAS
        #  2 cycles through ISERDESE2
        rddata_en = self.dfi.phases[self.settings.rdphase].rddata_en
        for i in range(self.settings.read_latency-1):
            n_rddata_en = Signal()
            self.sync += n_rddata_en.eq(rddata_en)
            rddata_en = n_rddata_en
        if with_odelay:
            self.sync += [phase.rddata_valid.eq(rddata_en | self._wlevel_en.storage)
                for phase in self.dfi.phases]
        else:
            self.sync += [phase.rddata_valid.eq(rddata_en)
                for phase in self.dfi.phases]

        oe = Signal()
        last_wrdata_en = Signal(cwl_sys_latency+3)
        wrphase = self.dfi.phases[self.settings.wrphase]
        self.sync += last_wrdata_en.eq(Cat(wrphase.wrdata_en, last_wrdata_en[:-1]))
        self.comb += oe.eq(
            last_wrdata_en[cwl_sys_latency-1] |
            last_wrdata_en[cwl_sys_latency] |
            last_wrdata_en[cwl_sys_latency+1] |
            last_wrdata_en[cwl_sys_latency+2])
        if with_odelay:
            self.sync += \
                If(self._wlevel_en.storage,
                    oe_dqs.eq(1), oe_dq.eq(0)
                ).Else(
                    oe_dqs.eq(oe), oe_dq.eq(oe)
                )
        else:
            self.sync += [
                oe_dqs.eq(oe),
                oe_dq.eq(oe)
            ]
        self.sync += bl8_sel.eq(last_wrdata_en[cwl_sys_latency-1])


class V7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=True, **kwargs)


class K7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=True, **kwargs)


class A7DDRPHY(S7DDRPHY):
    def __init__(self, pads, **kwargs):
        S7DDRPHY.__init__(self, pads, with_odelay=False, **kwargs)
It would be nice to have an example constraint file for high-speed operation (800 MT/s and above).
Specifically to answer the questions:
once YosysHQ/yosys#850 lands in yosys master.
What I know so far:
Only repros in 1:2 mode; works fine in 1:4.
Is sensitive to the frequency (the design meets timing).
Writes succeed; only reads fail.
Ports can now expose banks to the user and can allow reordering accesses to the memory.
To implement reordering, we could create a module that operates on two native ports:
user_port = LiteDRAMNativePort(..., with_reordering=False)
internal_port = LiteDRAMNativePort(..., with_reordering=True)

class LiteDRAMReordering(Module):
    def __init__(self, user_port, internal_port):
        [...]
We could implement this scenario:
For writes:
For reads:
It would be good if the docs had some information about performance of litedram in different configurations.
Maybe it could be added to my spreadsheet here.
Traceback (most recent call last):
  File "core.py", line 275, in <module>
    main()
  File "core.py", line 256, in main
    soc.generate_sdram_phy_py_header()
  File "core.py", line 240, in generate_sdram_phy_py_header
    self.sdram.controller.settings.timing))
  File "/home/john/repos/litedram/litedram/sdram_init.py", line 292, in get_sdram_phy_py_header
    init_sequence, _ = get_sdram_phy_init_sequence(phy_settings, timing_settings)
  File "/home/john/repos/litedram/litedram/sdram_init.py", line 175, in get_sdram_phy_init_sequence
    mr0 = format_mr0(bl, cl, wr, 1)
  File "/home/john/repos/litedram/litedram/sdram_init.py", line 125, in format_mr0
    mr0 |= wr_to_mr0[wr] << 9
KeyError: 4
The formula computing this uses tWTR*nphases, which I think is wrong. Micron says it should be configured with the following formula:
WR (cycles) = roundup (tWR [ns]/tCK [ns]).
The 1:2 PHY only gets 64% bus efficiency with a sequential read pattern. The 1:4 is capable of 97%.
My theory is that the arbiter in the bank machine mux is to blame.
When trying to install LiteX:
Cloning into '/home/xilinxbox/litex/litex/build/sim/core/modules/ethernet/tapcfg'...
Username for 'https://github.com':
Password for 'https://github.com':
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/nizox/tapcfg/'
fatal: clone of 'https://github.com/nizox/tapcfg' into submodule path '/home/xilinxbox/litex/litex/build/sim/core/modules/ethernet/tapcfg' failed
Failed to clone 'litex/build/sim/core/modules/ethernet/tapcfg' a second time, aborting
And https://github.com/nizox/tapcfg gives a 404...
Most ice40 boards have SRAM, which makes them useless for running real OSes like Linux. In theory you should be able to use the SDRAM controller from litedram on the ice40.
The CAT board (https://hackaday.io/project/7982-cat-board) is the only ice40 board I've found with SDRAM so far.
See tn_4003_DDR4_network_design_guide / p5.
This commit fixes it; however, it's entangled with my auto_precharge pull request. Since timing has to be re-verified anyway, we might as well pull them both.
The specific fix is here: JohnSully@a4be642
The main issue seems to be the bankmachine.hit calculation: checking the openrow for equality should be registered.
This will improve code sharing, ease understanding and allow implementing BL8 support more easily for 1:2 PHY:
When die temp is high: 70.0107 (0x0ae5)
        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
SoC BIOS / CPU: VexRiscv / 100MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jul 19 2018 15:06:12
BIOS CRC passed (5bf71210)
Initializing SDRAM...
Read bitslip: 3
Read delays scan:
m0: 00000000000000000000000000000000
m1: 00000000001111111111111000000000
m2: 00000000001111111111111000000000
m3: 00000000001111111111111100000000
Read delays: 3:10-24 2:10-24 1:10-24 0:32-33 completed
Memtest bus failed: 156/256 errors
Memtest data failed: 524288/524288 errors
Memtest addr failed: 8192/8192 errors
Memory initialization failed
With the latest release, the board fails to boot:
        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
SoC BIOS / CPU: VexRiscv / 100MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jul 19 2018 15:34:22
BIOS CRC passed (317e43f7)
Initializing SDRAM...
Read bitslip: 2
Read delays scan:
m0: 00000000000000000000000000000000
m1: 11111111000000000000000000000000
m2: 11111111000000000000000000000000
m3: 111111110000
This is more of an FYI that out-of-order is not yet fully complete. Specifically, the write interface must also output the bank it's using. In addition, OO is disabled in the frontend crossbar.
The interface is also not stable: if we implement re-ordering within banks (e.g. write batching), then we'll have to move to a tag system to track operations.
Refresh seems to be broken with SDR (working with DDR, DDR2, DDR3, DDR4).
To support higher frequencies in dual/quad rank mode, we will need to drive ODT dynamically.
Useful information can be found in:
Make it easy to not export the SDRAM CKE pin: it is optional and may be tied high on the target PCB, and there may not be a single unused pin left to assign this signal to.
You should mention in the README the dependency on migen and how to install it.
I get a precharge-all, but no refresh command. Reverting 6620a91 seems to bring it back.
In multiplexer.py, the line fsm.delayed_enter("WTR", "READ", settings.timing.tWTR-1) uses the tWTR parameter. However, in the modules it is expressed in nanoseconds.
Other parameters such as tRP, also expressed in nanoseconds, undergo a transformation elsewhere to be converted into cycles; tWTR never does. This greatly exaggerates the delay when switching between write and read.
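The conversion the other timings undergo amounts to rounding the nanosecond value up to a whole number of controller cycles; a minimal sketch (the helper name is assumed, not litedram's actual API):

```python
import math

def ns_to_cycles(t_ns, clk_freq):
    # Round up: the resulting delay must be at least t_ns.
    return math.ceil(t_ns*1e-9*clk_freq)

# e.g. tWTR = 7.5 ns at a 100 MHz controller clock needs 1 cycle,
# nowhere near the tWTR*nphases cycles currently used.
```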
https://travis-ci.org/mithro/HDMI2USB-litex-firmware/jobs/352035389 and https://api.travis-ci.org/v3/job/352035389/log.txt
Total REAL time to Placer completion: 3 mins 8 secs
Total CPU time to Placer completion: 3 mins 8 secs
Running post-placement packing...
Writing output files...
*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
ERROR:PhysDesignRules:1939 - Issue with pin connections and/or configuration on
block:<ODDR2_5>:<OLOGIC2_OUTFF>. The OUTFF Flip-flop SRTYPE_OQ mode must be
ASYNC in DDR mode with DDR_ALIGNMENT mode of C0 or C1.
ERROR:Pack:1642 - Errors in physical DRC.
Mapping completed.
See MAP report file "top_map.mrp" for details.
Problem encountered during the packing phase.
Design Summary
--------------
Number of errors : 2
Number of warnings : 10
I have made an SDRAM plug-in board for the Lattice ICE40 HX8K EVB. It uses the AS4C16M16 which is already supported.
I am trying to make a simple Litex project to use it, but I get a synthesis error:
$ python3 ice40hx8k_litedram_nn.py
lxbuildenv: v2019.8.19.1 (run ice40hx8k_litedram_nn.py --lx-help for help)
<__main__.Platform object at 0x7fb78610b7f0>
{'cpu_type': None, 'cpu_variant': None, 'integrated_rom_size': 0, 'integrated_sram_size': 0}
ERROR: Conflicting init values for signal 1'1 (\soc_sdram_master_p0_act_n = 1'1, \soc_sdram_choose_req_want_activates = 1'0).
Traceback (most recent call last):
  File "ice40hx8k_litedram_nn.py", line 224, in <module>
    main()
  File "ice40hx8k_litedram_nn.py", line 220, in main
    vns = builder.build()
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/soc/integration/builder.py", line 185, in build
    toolchain_path=toolchain_path, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/soc/integration/soc_core.py", line 452, in build
    return self.platform.build(self, *args, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/platform.py", line 34, in build
    return self.toolchain.build(self, *args, **kwargs)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/icestorm.py", line 189, in build
    _run_script(script)
  File "/home/niklas/hack/litex/notbuildenv/deps/litex/litex/build/lattice/icestorm.py", line 121, in _run_script
    raise OSError("Subprocess failed")
OSError: Subprocess failed
I have created a repo here with a small example file and the build output checked in. I'm using Yosys 0.9+932 (git sha1 9e6632c4, clang 9.0.0 -fPIC -Os).
The issue is solved by the following patch against LiteDRAM, also included in the test repo above:
diff --git a/litedram/phy/dfi.py b/litedram/phy/dfi.py
index 2948556..44bca0b 100644
--- a/litedram/phy/dfi.py
+++ b/litedram/phy/dfi.py
@@ -15,7 +15,7 @@ def phase_cmd_description(addressbits, bankbits, nranks):
         ("cke",     nranks, DIR_M_TO_S),
         ("odt",     nranks, DIR_M_TO_S),
         ("reset_n", 1,      DIR_M_TO_S),
-        ("act_n",   1,      DIR_M_TO_S)
+        #("act_n",   1,      DIR_M_TO_S)
     ]
@@ -52,7 +52,7 @@ class Interface(Record):
             p.cs_n.reset = (2**nranks-1)
             p.ras_n.reset = 1
             p.we_n.reset = 1
-            p.act_n.reset = 1
+            #p.act_n.reset = 1

     # Returns pairs (DFI-mandated signal name, Migen signal object)
     def get_standard_names(self, m2s=True, s2m=True):
@@ -85,11 +85,11 @@ class DDR4DFIMux(Module):
         self.comb += [
             p_i.connect(p_o),
             If(~p_i.ras_n & p_i.cas_n & p_i.we_n,
-                p_o.act_n.eq(0),
+                #p_o.act_n.eq(0),
                 p_o.we_n.eq(p_i.address[14]),
                 p_o.cas_n.eq(p_i.address[15]),
                 p_o.ras_n.eq(p_i.address[16])
             ).Else(
-                p_o.act_n.eq(1),
+                #p_o.act_n.eq(1),
             )
         ]
Underlying issue appears to be that the wrcmdphase has the same value as the wrphase. The wrphase wins and the command is silently dropped (but the bank machine thinks it was sent).
Forcing wrcmdphase to 1 appears to work.
Haven't had a chance to look into why yet, but I've temporarily disabled it locally. It's a major bottleneck: the difference between 42% and 80% bus efficiency on my random write test. I'm not getting tFAW errors with it disabled, which makes me think it's not correct.
It seems that litedram requires a tCCD value on modules now. From @JohnSully and timvideos/litex-buildenv#77 (comment)
Warning (13410): Pin "sdram_dm[1]" is stuck at GND File: X:/GIT/litex-boards/litex_boards/partner/targets/soc_basesoc_c10lprefkit/gateware/top.v Line: 3435
Can this warning be ignored?
memtest reports OK, but having 1 out of 2 DM pins stuck at GND does not sound like a warning to be ignored.
Some CL/CWL combinations are not possible since cmd_phase and dat_phase will be the same and there is no arbitration.
An assert (assert cmd_phase != dat_phase) is currently implemented in the code to prevent this case, but this limits the possible CL/CWL combinations.
We should probably remove this limitation by doing arbitration between cmd and data phases when both are valid.
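The collision can be reproduced with the get_sys_phases() helper from the PHY code shown earlier in this page (reproduced below so the snippet is self-contained); for example, with nphases=4, CL=8 puts the command and data on the same phase while CL=7 does not:

```python
import math

# Same helpers as in the S7DDRPHY code above.
def get_sys_latency(nphases, cas_latency):
    return math.ceil(cas_latency/nphases)

def get_sys_phases(nphases, sys_latency, cas_latency, write=False):
    cmd_phase = 0
    dat_phase = 0
    diff_phase = 0
    while (diff_phase + cas_latency) != sys_latency*nphases:
        dat_phase += 1
        if dat_phase == nphases:
            dat_phase = 0
            cmd_phase += 1
        if write:
            diff_phase = dat_phase - cmd_phase
        else:
            diff_phase = cmd_phase - dat_phase
    return cmd_phase, dat_phase

# CL=7: command and data land on different phases -> OK
print(get_sys_phases(4, get_sys_latency(4, 7), 7))  # (1, 0)
# CL=8: cmd_phase == dat_phase -> would trip the assert
print(get_sys_phases(4, get_sys_latency(4, 8), 8))  # (0, 0)
```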
From @mithro:
Wouldn't it be nicer to have:
SpeedGrade = namedtuple("SpeedGrade", ["tRP", "tRCD", "tWR", "tRFC", "tFAW", "tRC", "tRAS"])

# DDR3
class MT41J128M16(SDRAMModule):
    memtype = "DDR3"
    # geometry
    nbanks = 8
    nrows  = 16384
    ncols  = 1024
    # speedgrade invariant timings
    tREFI = 64e6/8192
    tWTR  = (4, 7.5)
    tCCD  = (4, None)
    tRRD  = 10
    speed_grades = {
        800:  SpeedGrade(tRP=13.1, tRCD=13.1, tWR=13.1, tRFC=64, tRC=50.625, tRAS=37.5),
        1066: SpeedGrade(...),
    }
(LX P=mimasv2 C=vexriscv R=master-clean) tansell@tansell:~/github/timvideos/litex-buildenv-clean$ make firmware
mkdir -p build/mimasv2_base_vexriscv/
time python -u ./make.py --platform=mimasv2 --target=base --cpu-type=vexriscv --iprange=192.168.100 -Ob toolchain_path /opt/Xilinx/ --no-compile-gateware \
2>&1 | tee -a /usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/build/mimasv2_base_vexriscv//output.20191030-132700.log; (exit ${PIPESTATUS[0]})
Traceback (most recent call last):
  File "./make.py", line 164, in <module>
    main()
  File "./make.py", line 148, in main
    vns = builder.build(**dict(args.build_option))
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litex/litex/soc/integration/builder.py", line 167, in build
    self._generate_includes()
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litex/litex/soc/integration/builder.py", line 129, in _generate_includes
    self.soc.sdram.controller.settings.timing))
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 434, in get_sdram_phy_c_header
    init_sequence, mr1 = get_sdram_phy_init_sequence(phy_settings, timing_settings)
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 377, in get_sdram_phy_init_sequence
    }[phy_settings.memtype](phy_settings, timing_settings)
  File "/usr/local/google/home/tansell/github/timvideos/litex-buildenv-clean/third_party/litedram/litedram/init.py", line 78, in get_lpddr_phy_init_sequence
    return init_sequence, mr1
NameError: name 'mr1' is not defined
real 0m0.371s
user 0m0.311s
sys 0m0.066s
make: *** [Makefile:303: firmware-cmd] Error 1
Current DDR3 PHYs do the Preamble/Postamble over 8 DDR cycles.
Verify that this doesn't violate the specification and, if it does, reduce it.
This is related to my other issue where calibration hangs. However even when working around it with a delay between reads and writes calibration still fails. It was working about a month ago.
Trying to figure it out on my own, but I'm running out of ideas.
Related to: 06ca53d
Specifically, the bank machine command FIFO isn't updated to handle the fact that ready can go high when valid is low.
litedram revision e5696ad
From https://travis-ci.org/timvideos/litex-buildenv/jobs/430294286
time python -u ./make.py --platform=atlys --target=base --cpu-type=lm32 --iprange=192.168.100 --no-compile-gateware \
2>&1 | tee -a build/atlys_base_lm32//output.20180918-230444.log; (exit ${PIPESTATUS[0]})
Traceback (most recent call last):
  File "./make.py", line 156, in <module>
    main()
  File "./make.py", line 123, in main
    soc = get_soc(args, platform)
  File "./make.py", line 57, in get_soc
    soc = SoC(platform, ident=SoC.__name__, **soc_sdram_argdict(args), **dict(args.target_option))
  File "/home/travis/build/timvideos/litex-buildenv/targets/atlys/base.py", line 228, in __init__
    dqs_ddr_alignment="C0")
  File "/home/travis/build/timvideos/litex-buildenv/third_party/litedram/litedram/phy/s6ddrphy.py", line 111, in __init__
    r_dfi = Array(Record(phase_cmd_description(addressbits, bankbits)) for i in range(nphases))
  File "/home/travis/build/timvideos/litex-buildenv/third_party/litedram/litedram/phy/s6ddrphy.py", line 111, in <genexpr>
    r_dfi = Array(Record(phase_cmd_description(addressbits, bankbits)) for i in range(nphases))
TypeError: phase_cmd_description() missing 1 required positional argument: 'nranks'
Hi,
I'm trying to test a custom configuration in simulation. Before that, I ran sim.py without any changes and found a couple of issues:
WARNING: File mem_1.init referenced on $PATH/litex/litedram/examples/sim/litedram_core.v at line 15687 cannot be opened for reading. Please ensure that this file is available in the current working directory.
top_tb.dut.ddr3.main: at time 530100.0 ps ERROR: CK and CK_N are not allowed to go to an unknown state.
I'm not sure if I'm doing something wrong or there's an extra step needed, but as far as I can tell, I haven't had any successful simulation result. As you can see, init_done is never asserted, so the main FSM gets stuck in idle states forever. Tracing back, that signal ends up asserted by a CSR, but a) there's no CPU or register write interface and b) both init memories are empty by default.
There are a lot of redundant writes to variables, so the generated code gets verbose. In this excerpt from top.v (lines 335 to 341), the variable fsm0_next_state is written twice, in addition to the declaration that initializes it. This can be improved.
always @(*) begin
    sdram_checker_addr_gen_ce <= 1'd0;
    sdram_checker_sink_sink_valid <= 1'd0;
    fsm0_next_state <= 2'd0; // Already initialized by 0
    sdram_checker_cmd_counter_fsm0_next_value <= 24'd0;
    sdram_checker_cmd_counter_fsm0_next_value_ce <= 1'd0;
    fsm0_next_state <= fsm0_state; // Written again
Thank you.
It would be good to have at least one test for each of the technology groups;
I believe DDR3 is the primary one covered by tests at the moment?
Write Recovery should be computed as follows:
WR (cycles) = roundup (tWR [ns]/tCK [ns])
We are currently using a fixed value:
mr0 = format_mr0(bl, cl, 14, 1) # wr=8 FIXME: this should be ceiling(tWR/tCK)
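A minimal sketch of the proposed computation (the helper name is assumed, not litedram's actual API):

```python
import math

def compute_wr(twr_ns, tck_ns):
    # WR (cycles) = roundup(tWR [ns] / tCK [ns])
    return math.ceil(twr_ns/tck_ns)

# e.g. DDR3 tWR = 15 ns at DDR3-1066 (tCK = 1.875 ns):
# compute_wr(15, 1.875) -> 8
```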
The core is currently tested with high-level simulation of the full controller, which is slow and impractical.
We need good migen testbenches for the cores to:
See tn_4003_DDR4_network_design_guide / p6.
Currently, each bank has its own bank machine. On designs where efficiency is less critical, being able to reduce the number of bank machines and sharing a bank machine between multiple banks would be useful to reduce resource usage.