
magic-blue-smoke / dual-edge-tpu-adapter

Dual Edge TPU adapter for using the card on a system with a single PCIe port on an m.2 A/B/E/M slot

coral-tpu edge-ai tpu-acceleration tpu-benchmarks tpu pcie-card pcie-interface m2-module m2 coral

dual-edge-tpu-adapter's People

Contributors

magic-blue-smoke

dual-edge-tpu-adapter's Issues

PCIe adaptor for multiple mini-PCIe chips

Hi, thank you for your work!

This might be a short-sighted request, but it reflects current production conditions.
At present, the only module in stock is the single TPU in mini-PCIe form.

With passive adaptors like these, I was able to connect them to my server. However, it is highly wasteful to occupy a multi-lane PCIe slot with a card that uses only a single lane.

Would you be able to produce an adapter card that connects multiple PCIe lanes from a single slot to multiple mini-PCIe slots? Say, 4 mini-PCIe slots, or whatever number makes sense.

Are Edge TPUs useful for NLP (Natural Language Processing)?

I have read that Edge TPUs are basically designed for inference on computer vision models. I have also seen that the most recent versions support the unidirectional LSTM layer.

Is there any NLP inference project that takes advantage of the capabilities of the Edge TPU?

Design an M.2 M-key to dual E-key adapter

Hello, your design is very interesting, but I want to design a passive adapter myself that converts one M.2 M-key slot (plus a retimer) into two E-key slots. That would let me run 8 Coral Dual Edge TPUs on a standard four-port PLX card. Could you share the footprint files for the Coral Dual Edge TPU and the E-key interface from your PCB design software? Thank you very much!

Dual TPU detected, but not functioning

Hardware:
ASRock Rack X470D4U
Ryzen 1700X

So I got the low-profile Dual Edge TPU adapter, along with a Dual TPU off eBay, and installed them in my home NAS (slot PCIE6) to run Frigate in a VM. I got PCIe passthrough set up and working, passing both TPUs through to the VM, which runs the latest Debian. I installed the drivers etc. per the setup instructions, and I see both devices in /dev/ as I'd expect:

# ls -l /dev/apex_*
crw-rw---- 1 root apex 120, 0 Jan  1 13:07 /dev/apex_0
crw-rw---- 1 root apex 120, 1 Jan  1 13:07 /dev/apex_1
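Since Frigate often runs in a container or as a non-root user, it can help to confirm that these nodes are usable by that account (the listing above shows mode crw-rw---- with group apex). Below is a minimal Python sketch with a hypothetical helper name, `check_apex_nodes`; it only inspects the device files with `os.stat`/`os.access` and never opens or talks to the TPU, so it can rule out permission problems but nothing more.

```python
import glob
import grp
import os
import stat


def check_apex_nodes(pattern="/dev/apex_*"):
    """List (path, is_char_device, group_name, can_read_write) for each match.

    can_read_write is evaluated for the user running this script, so run it
    as the same user (or inside the same container) that runs Frigate.
    """
    results = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)  # numeric gid with no /etc/group entry
        results.append((
            path,
            stat.S_ISCHR(st.st_mode),
            group,
            os.access(path, os.R_OK | os.W_OK),
        ))
    return results


if __name__ == "__main__":
    nodes = check_apex_nodes()
    if not nodes:
        print("no /dev/apex_* nodes found")
    for path, is_chr, group, rw in nodes:
        print(f"{path}: char_device={is_chr} group={group} read_write={rw}")
```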

When I start Frigate, it sees both TPUs, but the detector threads keep crashing. So I ran the sample classification model instead, but it just sits there and never finishes:

# python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.

When I look in dmesg, I see what look like PCI interrupt-routing issues, which is very strange:

[   11.325719] apex 0000:00:0a.0: can't derive routing for PCI INT A
[   11.325721] apex 0000:00:0a.0: PCI INT A: no GSI
[   11.330283] apex 0000:00:0b.0: can't derive routing for PCI INT A
[   11.330285] apex 0000:00:0b.0: PCI INT A: no GSI

I also see some gasket messages just before those apex messages:

[   11.270448] gasket: loading out-of-tree module taints kernel.
[   11.270504] gasket: module verification failed: signature and/or required key missing - tainting kernel

Host shows these PCIe devices:

2b:00.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
2c:03.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
2c:07.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
2d:00.0 Non-VGA unclassified device: Global Unichip Corp. Coral Edge TPU
2e:00.0 Non-VGA unclassified device: Global Unichip Corp. Coral Edge TPU

The VM shows these PCIe devices:

00:0a.0 System peripheral: Global Unichip Corp. Coral Edge TPU
00:0b.0 System peripheral: Global Unichip Corp. Coral Edge TPU

Does anyone have any ideas? Is the card bad, and should I ask for a refund? Should I get a basic PCIe holder like this one and try to get just one TPU working, to verify that the Dual Edge TPU card itself is fine?

Test results for B+M key adapter

All 5 B+M key adapters I had for the Coral Dual TPU card have been sent, and as of today 2.5 of them have been delivered.

Please take your time to test, and please report:

  • motherboard and adapter (if mated via PCIe-m.2 adapter)
  • OS
  • Fits? Works?
  • If using 2x single core (non-pipelined, e.g. camera1 -> TPU1, camera2 -> TPU2): is there a decrease in performance compared to a single core only (i.e. only camera1 -> TPU1)?
  • Models (or project) used and inference time
  • Photo, optional, but appreciated
  • What thermal solution are you using?
  • Comments, suggestions for production adapters
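For the inference-time and 2x single-core items, a framework-agnostic timing sketch may make reports easier to compare. `time_inference` is a hypothetical helper: `run_once` is any zero-argument callable you supply (for example a closure around your interpreter's invoke() on an already-loaded model, which is an assumption about your setup). It discards warm-up calls, since the first Edge TPU inference also loads the model into TPU memory, and reports mean, median and an approximate p95 in milliseconds.

```python
import statistics
import time
from typing import Callable, Dict


def time_inference(run_once: Callable[[], None], warmup: int = 1, runs: int = 50) -> Dict[str, float]:
    """Call run_once repeatedly and summarise its latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        run_once()  # warm-up: the first Edge TPU call also loads the model
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    # Simple index-based approximation of the 95th percentile.
    p95_index = round(0.95 * (len(samples_ms) - 1))
    return {
        "runs": float(runs),
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
    }


if __name__ == "__main__":
    # Placeholder CPU workload; swap in a call to your own loaded model.
    summary = time_inference(lambda: sum(range(10_000)), runs=20)
    print({key: round(value, 3) for key, value in summary.items()})
```

To compare the two cases, run it once with only camera1 -> TPU1 active and again while the camera2 -> TPU2 pipeline is also running.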

Next shipment?

Hello,
When will the next shipment be available? The store says everything is sold out and to join the waiting list, but gives no estimate of when it will be available again.
Cheers

Optiplex 3000 Micro SSD+Wifi+EdgeTPU 2 M2 slots

Hi! I have a Dell OptiPlex 3000 Micro with 2 M.2 slots: one for the SSD and another for the Wi-Fi + Bluetooth card.

I'd like to install an Edge TPU or Dual Edge TPU on this machine.

Do you have an adapter for this scenario?

I have only these 2 slots available.

Edit: Just to make it clear, I'd like to use the SSD, Wireless card and the Edge TPU in this machine.

Prevents MB Power Up

I just received 2 of these units. With the Dual Edge TPU installed on the adapter and the PCIe card inserted into the motherboard (ASUS Prime B540M A-AX with a Ryzen 7950X CPU), the system will not power on.

Passthrough adapter to ESXi VM not working

I'm running ESXi 8.0.1 on an HP DL380p Gen8 and want to pass through the Coral devices to a VM.

If I SSH into the ESXi server and run lspci, it shows:

0000:0c:00.0 Non-VGA unclassified device: Global Unichip Corp. Coral Edge TPU
0000:0d:00.0 Non-VGA unclassified device: Global Unichip Corp. Coral Edge TPU

Under ESXi PCI devices I see that passthrough is not supported:

[Screenshot: ESXi PCI devices]

Through vCenter it says "This device cannot be made available for VMs to use" when selecting it.

[Screenshot: vCenter]

If I click toggle passthrough, it displays an error saying "This device cannot be made available for VMs to use"

Has anyone gotten this to work before?

B+M Key Adapter mounting screw

I just received my B+M Key Adapters from MakerFabs today. They appear to work as expected under Windows, with two Coral devices showing up in Device Manager. My only concern is that the screw shipped with the adapter for securing the M.2 Coral does not have the right shape to hold it properly. The supplied screw has a round head that does not clamp down nicely on the M.2 Coral and comes very close to not securing it at all. The correct fastener should have a flat mounting face on its underside.

Will source be available at some point?

I've been waiting for boards to become available for quite a long time, and at this point I would be willing to put in the work to source parts and assemble them myself. Will the source files (schematics and layouts) be available at some point?

TPU (with PCIe adapter) not functioning/Throwing pcieport/Apex errors

This is a "self built" AMD Epyc server:
Motherboard: ASRock Rack ROMED8-2T (on latest BIOS version)
CPU: Epyc 7302
OS: Proxmox 8
TPU adapter installed in PCIe slot 7 (have also tried slot 6)

Install guide I'm following: https://github.com/Bytelake/Coral-in-LXC

Just received the "Dual Edge TPU Adapter - PCIe x1 Low Profile" and installed my Dual TPU (also new; I had no way to test it without this adapter).

Upon booting I get:

[    5.197351] pcieport 0000:80:01.1: Data Link Layer Link Active not set in 1000 msec
[    5.197355] pcieport 0000:80:01.1: AER: subordinate device reset failed
[    5.197367] pcieport 0000:80:01.1: AER: device recovery failed
[    5.197370] pcieport 0000:80:01.1: DPC: containment event, status:0x1f01 source:0x0000
[    5.197371] pcieport 0000:80:01.1: DPC: unmasked uncorrectable error detected
[    5.197378] pcieport 0000:80:01.1: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
[    5.197465] pcieport 0000:80:01.1:   device [1022:1483] error status/mask=00090000/04000000
[    5.197542] pcieport 0000:80:01.1:    [16] UnxCmplt              
[    5.197611] pcieport 0000:80:01.1:    [19] ECRC                   (First)
[    5.197682] pcieport 0000:80:01.1: AER:   TLP Header: 4a008001 84000004 80002100 00000000
[    5.197762] pci 0000:83:00.0: AER: can't recover (no error_detected callback)
[    5.197764] pci 0000:84:00.0: AER: can't recover (no error_detected callback)

[    8.042192] apex 0000:83:00.0: Unable to change power state from D3cold to D0, device inaccessible

[ 1433.786816] apex 0000:83:00.0: Apex performance not throttled due to temperature
[ 1436.346787] apex 0000:84:00.0: Apex performance not throttled due to temperature
[ 1438.906750] apex 0000:83:00.0: Apex performance not throttled due to temperature
[ 1441.466714] apex 0000:84:00.0: Apex performance not throttled due to temperature
[ 1444.026694] apex 0000:83:00.0: Apex performance not throttled due to temperature
[ 1446.586653] apex 0000:84:00.0: Apex performance not throttled due to temperature
[ 1449.146624] apex 0000:83:00.0: Apex performance not throttled due to temperature

[ 2092.722612] apex 0000:83:00.0: RAM did not enable within timeout (12000 ms)
[ 2092.722651] apex 0000:83:00.0: Error in device open cb: -110

After booting, "lspci" sees the 2 TPU cores. Files "/dev/apex_0" and "/dev/apex_1" exist.

When I move adapter to slot 6, "0000:80:01.1" above changes to "0000:c0:01.1".

I'm fairly new to Linux at this level and not sure how to go about debugging this issue.
I've done a lot of Google searching without finding much.
Thank you!
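The `error status/mask=00090000/04000000` line holds the root port's PCIe Uncorrectable Error Status register followed by its mask, and the kernel has already named the set status bits on the next two lines ([16] UnxCmplt, [19] ECRC). To show how such a value maps to those names, here is a minimal Python sketch; the table is a partial copy of the labels the Linux AER driver prints, and unlisted bits are shown as bitN. It only decodes the register and does not diagnose the failure.

```python
# Labels as printed by the Linux AER driver for the PCIe
# Uncorrectable Error Status register (partial table).
UNCORRECTABLE_BITS = {
    4: "DLP",         # Data Link Protocol Error
    5: "SDES",        # Surprise Down Error
    12: "TLP",        # Poisoned TLP
    13: "FCP",        # Flow Control Protocol Error
    14: "CmpltTO",    # Completion Timeout
    15: "CmpltAbrt",  # Completer Abort
    16: "UnxCmplt",   # Unexpected Completion
    17: "RxOF",       # Receiver Overflow
    18: "MalfTLP",    # Malformed TLP
    19: "ECRC",       # ECRC Error
    20: "UnsupReq",   # Unsupported Request
    21: "ACSViol",    # ACS Violation
}


def decode_uncorrectable(status: int) -> list:
    """Return the names of the set bits, lowest bit first."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            names.append(UNCORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    # Status value from "error status/mask=00090000/04000000" in the log above.
    print(decode_uncorrectable(0x00090000))  # ['UnxCmplt', 'ECRC']
```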

No TPUs show up

I finally got around to trying the dual TPU adapter, and I'm seeing 0 of them show up in lspci.

My motherboard is a Gigabyte Z170X-Gaming 7. Gigabyte's description says:

Dual PCIe Gen3 x4 M.2 Connectors with up to 32Gb/s Data Transfer (PCIe NVMe & SATA SSD support)

so I know that the M.2 ports aren't SATA-only.

It's not a Wi-Fi slot, so it shouldn't be CNVio.

Before installing the dual-TPU adapter, I was using an m.2 adapter from Amazon, and that worked for showing just one TPU.

I also tried removing:

blacklist gasket
blacklist apex
options vfio-pci ids=xxxx:089a

from /etc/modprobe.d/blacklist-apex.conf and ran update-initramfs -u -k all, in case that was preventing lspci from showing the card. It didn't help: lspci still shows nothing when I grep for 089a.
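These modprobe.d rules only control which driver binds to the device; lspci lists every device the kernel's PCI core enumerated, whether or not a driver is bound, so removing them cannot make a missing device appear. If you still want to audit which rules are in place, here is a minimal Python sketch (the helper name `find_coral_modprobe_rules` is hypothetical) that scans `*.conf` files for the kinds of lines quoted above.

```python
import glob
import os
import re

# Lines that stop gasket/apex from auto-loading, or hand device IDs to vfio-pci.
RULE = re.compile(r"^\s*(blacklist\s+(gasket|apex)\b|options\s+vfio-pci\s+.*ids=)")


def find_coral_modprobe_rules(conf_dir="/etc/modprobe.d"):
    """Return (file, line) pairs for matching rules in conf_dir/*.conf."""
    hits = []
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if RULE.match(line):
                    hits.append((path, line.strip()))
    return hits


if __name__ == "__main__":
    for path, line in find_coral_modprobe_rules():
        print(f"{path}: {line}")
```

After editing these files, the initramfs still needs regenerating (as done above) for early-boot behaviour to change.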

Any ideas on what could be happening? Maybe some configuration I should update/change?

Here's the full lspci output.

root@proxmox:/home/rhee# lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:191f] (rev 07)
	Subsystem: Gigabyte Technology Co., Ltd Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [1458:5000]
	Kernel driver in use: skl_uncore
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
	Kernel driver in use: pcieport
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [1458:5007]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd 100 Series/C230 Series Chipset Family MEI Controller [1458:1c3a]
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [1458:b005]
	Kernel driver in use: ahci
	Kernel modules: ahci
00:1b.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [8086:a167] (rev f1)
	Kernel driver in use: pcieport
00:1b.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #19 [8086:a169] (rev f1)
	Kernel driver in use: pcieport
00:1b.3 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #20 [8086:a16a] (rev f1)
	Kernel driver in use: pcieport
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
	Kernel driver in use: pcieport
00:1c.1 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #2 [8086:a111] (rev f1)
	Kernel driver in use: pcieport
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 [8086:a112] (rev f1)
	Kernel driver in use: pcieport
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
	Kernel driver in use: pcieport
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
	Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Z170 Chipset LPC/eSPI Controller [8086:a145] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd Z170 Chipset LPC/eSPI Controller [1458:5001]
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd 100 Series/C230 Series Chipset Family Power Management Controller [1458:5001]
00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd 100 Series/C230 Series Chipset Family HD Audio Controller [1458:a036]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd 100 Series/C230 Series Chipset Family SMBus [1458:5001]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
	Subsystem: Gigabyte Technology Co., Ltd Ethernet Connection (2) I219-V [1458:e000]
	Kernel driver in use: e1000e
	Kernel modules: e1000e
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
	Subsystem: PNY GK208B [GeForce GT 710] [196e:118b]
	Kernel driver in use: nouveau
	Kernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
	Subsystem: PNY GK208 HDMI/DP Audio Controller [196e:118b]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
04:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
	Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1062]
	Kernel driver in use: ahci
	Kernel modules: ahci
07:00.0 Ethernet controller [0200]: Qualcomm Atheros Killer E2400 Gigabit Ethernet Controller [1969:e0a1] (rev 10)
	Subsystem: Gigabyte Technology Co., Ltd Killer E2400 Gigabit Ethernet Controller [1458:e000]
	Kernel driver in use: alx
	Kernel modules: alx
08:00.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
	Kernel driver in use: pcieport
09:00.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
	Kernel driver in use: pcieport
09:01.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
	Kernel driver in use: pcieport
09:02.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
	Kernel driver in use: pcieport
09:04.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]
	Kernel driver in use: pcieport
25:00.0 USB controller [0c03]: Intel Corporation DSL6540 USB 3.1 Controller [Alpine Ridge] [8086:15b6]
	Subsystem: Device [2222:1111]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
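In the listing above, no device reports the Coral IDs (`1ac1:089a`). Because lspci reads the same enumeration data the kernel exposes in sysfs, a direct sysfs scan should agree with it; if neither finds the IDs, the TPUs never enumerated, which points to the slot, firmware settings or hardware rather than to drivers. A minimal Python sketch, assuming the standard `/sys/bus/pci/devices/<address>/vendor` and `device` files (the helper name is hypothetical):

```python
import os

CORAL_VENDOR_ID = 0x1AC1  # Global Unichip Corp.
CORAL_DEVICE_ID = 0x089A  # Coral Edge TPU


def find_edge_tpus(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses whose vendor/device IDs match the Coral Edge TPU."""
    found = []
    if not os.path.isdir(sysfs_root):
        return found
    for addr in sorted(os.listdir(sysfs_root)):
        try:
            with open(os.path.join(sysfs_root, addr, "vendor")) as f:
                vendor = int(f.read().strip(), 16)  # file holds e.g. "0x1ac1"
            with open(os.path.join(sysfs_root, addr, "device")) as f:
                device = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue  # not a readable PCI device entry
        if (vendor, device) == (CORAL_VENDOR_ID, CORAL_DEVICE_ID):
            found.append(addr)
    return found


if __name__ == "__main__":
    tpus = find_edge_tpus()
    print(f"{len(tpus)} Edge TPU(s) enumerated:", tpus)
```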

Flexible extension cable for m.2

Hello.
I plan to connect the "coral dual tpu" via your adapter to the "ROCK 3A" single-board computer (https://wiki.radxa.com/Rock3/hardware/3a) in its M.2 E-key slot. There is little space there, only enough for 22x30 boards.
So it makes sense to connect the adapter via a flexible m.2 extension cable.
Which flexible extension cable would be right for this?

There is one more complication: I plan to put a "Penta SATA HAT" board on top (https://wiki.radxa.com/Penta_SATA_HAT ).
It will look something like this: https://hardware.developpez.com/dossiers/NAS/RockPi-3-Penta-Sata-Hat/images/montage-astuce-nappe.jpeg

So the flexible cable would need to exit at a 90-degree angle. Do such cables exist?

ASM1182 datasheet and temperature

I got the m.2 Dual Edge adapter and noticed that the IC gets quite warm even when idle. I have the following questions:

  1. Is this normal behaviour?
  2. Is there a datasheet for this IC? (I couldn't find one.)
  3. Is there a command to check its temperature, or its margin to Tjmax?
  4. Does the IC shutdown if it gets too hot? If so what is its maximum temperature?

Dual Edge TPU Adapter - PCIe x1 Low Profile, ubuntu 22.04

Hi,

I'd like to ask for some advice. I've installed the Dual Edge TPU adapter in an x8 riser card (as I don't have any x1 slots on my motherboard). I have a Fujitsu Siemens Primergy RX200-S7 server running Ubuntu 22.04.

The PCIe adapter is not listed when I run lspci.
Also, ls dev/apex_0 (or apex*) returns that the file or folder does not exist.

Any help would be truly appreciated

SBC/Motherboards that support Dual M.2 / 2 buses

First, super kudos to @magic-blue-smoke for developing these adapters.

I know the google-coral/edgetpu#256 thread discusses supported HW for the Edge TPU, but it's not really specific to the Dual Edge TPU or to HW with 2 buses. The OP notes that the M.2 Dual was tested with Nexcom. Reading the comments, lots of people get only one TPU working on the Dual, but it's difficult to determine which HW has 2 buses and could work with your adapters to support the Dual.

Regarding @magic-blue-smoke's Dual-Edge-TPU-Adapters specifically, I thought we could use this post to record details of any SBCs/motherboards known to work with the adapter (and thus with the Dual M.2 TPU / 2 buses). As @magic-blue-smoke suggests, it's a good idea to record them in the google-coral support thread as well.

increased latency?

One PCIe bus on an m.2 E-key connector is not the end of the world. With a PCIe switch, the bus can be split into two buses at the cost of slightly increased latency and shared bandwidth.

Hello.

  1. What is the difference in milliseconds when using "Coral DUAL tpu" on PCIe x2 without an adapter and on PCIe x1 through an adapter?
  2. I'm not sure why the "Coral DUAL tpu" requires 2 PCIe lanes. Is the data bitrate really that high?
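The thread has no measured answer to question 1, but a rough upper bound on the extra transfer time can be worked out by hand. The sketch below assumes: the adapter's switch has a PCIe Gen2 x1 upstream link (like the ASM1182e seen in one of the lspci listings above), 8b/10b encoding, no packet/protocol overhead, and a 224x224x3 uint8 input like the MobileNet example model. It ignores model weights that may also need to cross the link, so real overhead can be higher. On question 2: the Dual Edge TPU card puts each of its two chips on its own x1 link, so the two lanes reflect two devices rather than one chip needing more than a lane of bandwidth.

```python
def pcie_gen2_bytes_per_s(lanes: int = 1) -> float:
    """Raw PCIe Gen2 data rate per direction, before packet overhead."""
    transfers_per_s = 5e9                     # 5 GT/s per lane
    payload_bits = transfers_per_s * 8 / 10   # 8b/10b: 8 data bits per 10 line bits
    return payload_bits / 8 * lanes           # bits -> bytes


def transfer_ms(num_bytes: int, bytes_per_s: float) -> float:
    """Time to move num_bytes at the given rate, in milliseconds."""
    return num_bytes / bytes_per_s * 1000.0


if __name__ == "__main__":
    link = pcie_gen2_bytes_per_s()         # one Gen2 x1 upstream link
    frame = 224 * 224 * 3                  # uint8 RGB input: 150,528 bytes
    alone = transfer_ms(frame, link)       # one TPU has the link to itself
    shared = transfer_ms(frame, link / 2)  # even split while both transfer at once
    print(f"link={link / 1e6:.0f} MB/s  alone={alone:.3f} ms  shared={shared:.3f} ms")
    # link=500 MB/s  alone=0.301 ms  shared=0.602 ms
```

Without an adapter each TPU has its own x1 link and sees roughly the "alone" figure; behind the switch it approaches the "shared" figure only when both TPUs transfer at the same moment.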

SSD and dual TPU on the same adapter

Hello Magic Blue Smoke, I read you're planning to use more lanes for the upstream port of the PCIe switch.
Do you think it's possible to make a B+M key adapter (22110 format?) that hosts an M key 2242 SSD (with at least 2 lanes) and the Coral Dual Edge TPU using a PCIe Gen 3 switch (on the back side?), like the Asmedia ASM2812I or the Diodes PI7C9X3G808GP? I should say that the first one isn't even listed in distributors' catalogues, and the second is distributed only by Future Electronics, with no availability at the moment. Both come in BGA packages.
I know they cost 3-4 times the price of the ASM1184E, but the idea is to keep the NVMe SSD in SFF or USFF computers and add the Dual TPU, with good bandwidth for both devices.
They could share a single (ad hoc) heatsink and have an additional power input header if needed.
People interested in such an adapter, like me, could contribute to the prototypes, if you think you can build it.

unraid pcie issues

Hello,

I have been trying to get your PCIe adaptor to work for a few months now with no luck. I am using Unraid with the Frigate v0.10 Docker container. I can see both TPUs as apex_0 and apex_1. The symptom is that Frigate runs for a bit, then I get a PCIe error in my Unraid syslog. It then shuts down one of the TPUs and its temperature reading goes negative. I have posted my issues in the Frigate GitHub and the Unraid forums with no luck, and have reposted my Unraid post below. Please let me know what else I can troubleshoot. I love all the work you have done for the community and am hoping to get this working properly.

I am having a similar issue to @AdvancedMobileRepairs, using the Dual TPU in Magic-Blue-smoke's PCIe adapter. Before this I was using a single TPU with a different adapter, which worked fine. I have been monitoring the Coral temperatures and they have not gone above 48 degrees. I have this error in my syslog:

[Screenshot: syslog error]

Does anyone have any insight into this? I already asked in the Frigate GitHub, and we troubleshot it up to a point, but then they told me to ask in the Unraid forum.

Thank you

EDIT EDIT:

Per this thread:

https://forums.unraid.net/topic/103901-solved-aer-pcie-bus-errors/

I disabled PCIe ASPM in my BIOS, restarted the server, and am running Frigate to see how long it works before the Coral shuts down.

And it failed again! That did not fix the issue. Very weird.

It seems temperature is not the issue.

Any insight?
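Logging TPU temperatures with timestamps makes it easier to line a reading up against the moment the PCIe error appears in syslog. The sketch below assumes the apex driver exposes `/sys/class/apex/apex_N/temp` as an integer in millidegrees Celsius; check the path and unit on your system before relying on it. `read_apex_temps` is a hypothetical helper, and unreadable values are reported as None rather than guessed.

```python
import glob
import os


def read_apex_temps(pattern="/sys/class/apex/apex_*/temp"):
    """Map apex device name -> temperature in degrees Celsius (or None).

    Assumes the sysfs 'temp' attribute holds an integer in millidegrees C.
    """
    temps = {}
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(os.path.dirname(path))  # e.g. "apex_0"
        try:
            with open(path) as f:
                temps[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            temps[name] = None  # unreadable or non-numeric value
    return temps


if __name__ == "__main__":
    import time
    # One timestamped sample; call this periodically (e.g. from cron) to build a log.
    print(time.strftime("%Y-%m-%d %H:%M:%S"), read_apex_temps())
```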

Order confirmation

Hi Alexander,

I placed an order with you and got an email saying I had not filled in my last name. I replied twice, but I'm not sure you are receiving my replies: you sent me an email saying I hadn't responded after I had, and I've had no response to my follow-up email.

Cheers
Simon

Verifying before ordering TPUs

Since all of the Coral TPUs are on backorder, I'm going to go ahead and order them now in the hope of getting them by the end of the year. I just want to verify that you're still planning to release another round of the Dual TPU to PCIe adapters before I order either the Dual or the PCIe version of the TPU.

Also, I would be very interested in the 2x dual TPU to PCIe adapter if you still plan to release that later this year. Would you happen to know if individual TPUs could be assigned to different virtual machines (like in Proxmox) using your adapter?

Thank you.
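Whether individual TPUs can go to different VMs on a Linux/KVM host such as Proxmox depends largely on IOMMU grouping: devices in the same IOMMU group are normally passed through together, and whether two devices behind a PCIe switch land in separate groups typically depends on the switch's ACS support. This is worth checking on the actual host. A minimal Python sketch, assuming the standard sysfs `iommu_group` symlink under each PCI device; the helper names are hypothetical, and the example addresses are taken from the host lspci listing in the "Dual TPU detected, but not functioning" issue on this page.

```python
import os


def iommu_groups(addresses, sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to its IOMMU group number, or None if it has none."""
    groups = {}
    for addr in addresses:
        link = os.path.join(sysfs_root, addr, "iommu_group")
        try:
            # The link target ends in .../iommu_groups/<N>.
            groups[addr] = int(os.path.basename(os.readlink(link)))
        except (OSError, ValueError):
            groups[addr] = None  # device missing, IOMMU disabled, or odd target
    return groups


def separately_assignable(groups):
    """True if every device has a group and no two devices share one."""
    values = list(groups.values())
    return None not in values and len(set(values)) == len(values)


if __name__ == "__main__":
    tpus = ["0000:2d:00.0", "0000:2e:00.0"]  # example addresses; use your own
    found = iommu_groups(tpus)
    print(found, "separate groups:", separately_assignable(found))
```

Separate groups are a first check only; the hypervisor's own passthrough rules still apply.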

Which m.2 variants for manufacturing?

This issue separates the m.2 variants from the desktop PCIe version (/issues/4).

Dual Edge TPU to m.2 adapter is possible in following configurations:

  • m.2 A+E key 2242. This form factor is non-standard (the standard for A/E keys is 2230), so you need to check clearance for the longer card.
  • m.2 B+M key 2242, 2260 and 2280.

Putting all of those into production is not easy, so please let me know:

  • which key and form factor you'd prefer
  • quantity
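The size codes in this list encode the card's nominal width and length in millimetres: the first two digits are the width and the rest the length. That is why the 2242 A+E variant needs a clearance check: it is 12 mm longer than a standard 2230 A/E card. A tiny Python sketch of the decoding (the helper name is hypothetical):

```python
def decode_m2_size(code: str) -> tuple:
    """Split an M.2 size code into (width_mm, length_mm), e.g. "2242" -> (22, 42)."""
    if not code.isdigit() or len(code) < 4:
        raise ValueError(f"not an M.2 size code: {code!r}")
    # The first two digits are the width; the remaining digits are the length.
    return int(code[:2]), int(code[2:])


if __name__ == "__main__":
    _, standard_ae_length = decode_m2_size("2230")  # usual A/E-key card size
    for code in ("2242", "2260", "2280"):
        width, length = decode_m2_size(code)
        extra = length - standard_ae_length
        print(f"{code}: {width} x {length} mm ({extra} mm longer than 2230)")
```

The same rule covers five-digit codes such as 22110 (22 x 110 mm).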

Buy now not visible

Hi,

I saw your comment in #8 but for me it still says coming soon, not buy now.

Hopefully they haven't all sold already!?

Simon

Pricing?

Hi, since you haven't got any for sale at the moment, there is no mention of pricing anywhere. It would be nice if you published rough/estimated prices somewhere (including shipping costs).

Only one tpu showing up with PCIe adapter

Hi.

I have just installed the PCIe adapter with the Coral Dual TPU board. However, I'm still only seeing a single Coral TPU. I followed the instructions on the Coral page to install the Linux drivers. Is there anything else I need to do to see both TPUs?

PCIe x1 adapter cooling options

Option 1

Dual Edge TPU cards are distributed by Mouser, so I went through their catalog looking for heat sinks.

[Image: TPU adapter PCIe heat sink]

This one fits, with a few caveats:

  1. The distance between mounting holes on the PCB is 45.25mm; on the heat sink it is 46.7mm. The pushpins tilt a bit to compensate for this difference, but still lock securely
  2. The TPU flip chips are not the tallest components, which is why thermal grease is not recommended and elastic thermal pads should be used
  3. This heat sink is too tall to leave room for another card in the next PCIe slot

Update:
  4. Thermal pads have to be purchased separately, as the existing thermal grease won't reach the TPU and PMIC flip chips
  5. Rev2 boards (with 4 mounting slots) solve distance issue (1) above, but require spacers for the pins to lock securely.
  6. Rev2 board mounting slots accept heat sinks with a square mounting-hole arrangement and 31.3-32.7mm spacing between holes

Heat sink Mfr. No: ATS-CPX040040020-115-C2-R0
Pushpins Mfr. No: ATS-HK127-R0
Also available from Digikey

Option 2

See the 3D-printed Coral TPU Cooler Adapter for 40mm Heatsink by @ZCalilung.

Testers are needed for Dual Edge TPU Adapter board prototypes

Testers are needed for Dual Edge TPU Adapter board prototypes.

Boards available:

  • m.2 2242* AE key
  • m.2 2280* BM key (can be dremeled down to 2242, mounting holes at 42, 60 and 80)
    (*) actual width is 24mm

If interested, DM me on Twitter (preferred) at @magic__smoke (double underscore), or leave a message here.

PCIe x4 to four m.2 E-key slots adapter

Hello,

On the waitlist I saw an option for "PCIe x4 to four m.2 E-key slots adapter".

Will this adapter support 4 x Dual TPUs for a total of 8 TPUs?
Do you have a timeline for when, approximately, this may be available?

Thank you so much for your efforts!

Bring up log

Linux x86_64: nothing shows up with "lspci | grep 1ac1". The VM might be the issue. I'll have to get a new drive for Linux, or one of the SBCs mentioned in google-coral/edgetpu/issues/256.

Raspberry Pi:

pi@raspberrypi:~ $ lspci | grep 1ac1
03:00.0 System peripheral: Device 1ac1:089a
04:00.0 System peripheral: Device 1ac1:089a

What is the advantage of the adapter?

Hey,

First of all, thank you very much for the products! I bought two Dual Edge TPUs and didn't notice that they use the E key. My first question is: why? Why did they use the E key and not the M key for the Dual Edge TPU?

I have a Gigabyte MW34-SP0
https://www.gigabyte.com/de/Enterprise/Server-Motherboard/MW34-SP0-rev-10#Overview
[Image: MW34-SP0 block diagram]

I can only use one Dual Edge TPU, and I'm not sure both TPUs will be recognized. So I ordered two adapters before I found your solution:
https://www.delock.de/produkt/65831/merkmale.html?f=s

I will buy your PCIe adapter for two Dual Edge TPUs, but I want to understand the advantage of your adapter compared to the Delock adapter.

Edit: Can I use the low-profile PCIe x4 card in the PCIe x16 slot? Then the TPUs would be connected directly to the CPU :-)

Dual Edge TPU Adapter Causing PCI issues and BSOD Server 2022

Hello, I recently got a Dual TPU adapter. I plugged it in and installed the drivers, and it worked great in my Lenovo SR250 blade server, where I use it for AI detection in Blue Iris. Both TPUs were detected, and both TPU temperatures looked fine (in the 30-40 °C range). The server ran for about 5 minutes and then hit a BSOD. I checked the seating of both the TPU and the PCIe card and blew everything out with air to make sure there was no dust or contamination. I started the server back up, but after some time the same thing happened. Below is the server's event log showing what is erroring out. Any help would be appreciated, because when it works, it works amazingly.

0 Power FQXSPPW0008I Host Power has been turned off. December 27, 2023 2:18:13 PM
1 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 2:17:38 PM
2 System FQXSPIO0011N An Uncorrectable Error has occurred on CPUs. December 27, 2023 2:17:05 PM
3 Memory FQXSFMA0006I Unqualified DIMM 1 has been detected, the DIMM serial number is 1AA6ECDC-V20. December 27, 2023 2:04:08 PM
4 Memory FQXSFMA0006I Unqualified DIMM 2 has been detected, the DIMM serial number is 1AADD8FF-V20. December 27, 2023 2:04:08 PM
5 Memory FQXSFMA0006I Unqualified DIMM 3 has been detected, the DIMM serial number is 1AAD66FB-V20. December 27, 2023 2:03:58 PM
6 Memory FQXSFMA0006I Unqualified DIMM 4 has been detected, the DIMM serial number is 1AA6EBA8-V20. December 27, 2023 2:03:58 PM
11 Disks FQXSPSD0000I The M2 Drive has been added. December 27, 2023 2:02:48 PM
12 Power FQXSPPW0008I Host Power has been turned off. December 27, 2023 1:26:26 PM
13 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 2:11:11 AM
14 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 2:05:32 AM
15 System FQXSPIO2006I System ThinkSystem SR250 has recovered from an NMI. December 27, 2023 2:01:17 AM
16 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 2:00:51 AM
17 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 2:00:36 AM
18 System FQXSPIO0006N A software NMI has occurred on system ThinkSystem SR250. December 27, 2023 1:58:59 AM
19 System FQXSPIO0015M Fault in slot 2 on system ThinkSystem SR250. December 27, 2023 1:58:56 AM
20 System FQXSFIO0010M An Uncorrectable PCIe Error has Occurred at Bus 0000 Device 01 Function 00. The Vendor ID for the device is 8086 and the Device ID is 1901. The Physical slot number is 2. December 27, 2023 1:58:53 AM
21 System FQXSFIO0010M An Uncorrectable PCIe Error has Occurred at Bus 0000 Device 01 Function 00. The Vendor ID for the device is 8086 and the Device ID is 1901. The Physical slot number is 2. December 27, 2023 1:58:52 AM
22 System FQXSPIO2015I Fault condition removed on slot 2 on system ThinkSystem SR250. December 27, 2023 1:58:51 AM
23 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 12:39:19 AM
24 System FQXSPIO2006I System ThinkSystem SR250 has recovered from an NMI. December 27, 2023 12:19:37 AM
25 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 12:19:09 AM
26 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 12:18:53 AM
27 System FQXSPIO0006N A software NMI has occurred on system ThinkSystem SR250. December 27, 2023 12:17:17 AM
28 System FQXSPIO0015M Fault in slot 2 on system ThinkSystem SR250. December 27, 2023 12:17:17 AM
29 System FQXSFIO0010M An Uncorrectable PCIe Error has Occurred at Bus 0000 Device 01 Function 00. The Vendor ID for the device is 8086 and the Device ID is 1901. The Physical slot number is 2. December 27, 2023 12:17:14 AM
30 System FQXSPCA2015I Sensor CPU Overtemp has deasserted the transition from normal to non-critical state. December 27, 2023 12:07:48 AM
31 System FQXSPCA0015J Sensor CPU Overtemp has transitioned from normal to non-critical state. December 27, 2023 12:07:45 AM
32 System FQXSPPW0009I Host Power has been Power Cycled. December 27, 2023 12:06:07 AM
33 System FQXSPIO0011N An Uncorrectable Error has occurred on CPUs. December 27, 2023 12:05
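Each FQXSFIO0010M entry above reports the bus/device/function, vendor and device IDs, and slot in the same fixed sentence. The sketch below pulls those fields out and counts them. It assumes the event log was exported as plain text with one event per line, as shown. `parse_pcie_errors` and `summarize` are hypothetical helpers, not part of any Lenovo tool.

```python
import re
from collections import Counter

# Field layout copied from the FQXSFIO0010M lines in the log above.
PCIE_ERR = re.compile(
    r"Bus (?P<bus>[0-9A-Fa-f]+) Device (?P<dev>[0-9A-Fa-f]+) Function (?P<fn>[0-9A-Fa-f]+)\."
    r".*?Vendor ID for the device is (?P<vendor>[0-9A-Fa-f]{4})"
    r" and the Device ID is (?P<device>[0-9A-Fa-f]{4})"
    r".*?Physical slot number is (?P<slot>\d+)"
)


def parse_pcie_errors(lines):
    """Extract (bus, device, function, vendor_id, device_id, slot) from uncorrectable PCIe error events."""
    events = []
    for line in lines:
        if "FQXSFIO0010M" not in line:
            continue  # only the uncorrectable-PCIe-error event carries these fields
        m = PCIE_ERR.search(line)
        if m:
            events.append((m["bus"], m["dev"], m["fn"],
                           m["vendor"].lower(), m["device"].lower(), int(m["slot"])))
    return events


def summarize(lines):
    """Count how often each (vendor:device, slot) pair is blamed."""
    return Counter((f"{v}:{d}", slot) for _, _, _, v, d, slot in parse_pcie_errors(lines))
```

In the rows above, every such entry names vendor 8086, device 1901, slot 2. The public PCI ID database lists 8086:1901 as an Intel processor PCIe controller (a root port). That suggests the error is raised by the port the adapter sits behind rather than by a TPU. The log does not say this directly; it is an inference.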

"PCIe error has occurred" at boot with a Dell R230 server

Hi,

I have just installed the PCIe adapter with the Coral Dual Edge TPU board in a Dell R230 server. The server does not boot and reports a PCIe error. Do you think there is anything I can do to make it work?

The adapter works without any issue in a Dell desktop and the 2 TPUs are detected.

Thank you.

PCIe x1 vs PCIe x16 vs M.2 SATA

Hi @magic-blue-smoke

Got my adapters from you today! THANK YOU!

Been some years since I played with PC and PCI specs.
I got the
I use an Intel 11th Gen CPU, and my M.2_1 slot is occupied by an NVMe drive.

Could you please tell me:

  • Can I use the PCIe x1 adapter in the PCIe x16 slot and get the same performance?
  • Can I use the M.2 adapter in the M2_2 slot, listed as: M.2_2 slot (Key M), type 2242/2260/2280 (supports PCIe 3.0 x4 & SATA modes)

On my motherboard, I'm not sure if I can use PCIEX16_1 (I should be able to, since I use an Intel 11th Gen CPU).

Looking forward to your reply :)
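On Linux, you can check the link each Edge TPU actually negotiated by reading the standard `current_link_speed` and `current_link_width` attributes under `/sys/bus/pci/devices/`. The sketch below assumes a Linux host. It also assumes the Coral Edge TPU enumerates as vendor `0x1ac1` (Global Unichip Corp.), device `0x089a`; confirm both with `lspci -nn` before relying on them. `load_sysfs` and `coral_links` are hypothetical helpers.

```python
import os

# Assumed PCI IDs for the Coral Edge TPU; check with `lspci -nn` before relying on them.
CORAL_IDS = ("0x1ac1", "0x089a")
ATTRS = ("vendor", "device", "current_link_speed", "current_link_width")


def load_sysfs(root="/sys/bus/pci/devices"):
    """Read selected attributes for every PCI device listed under root (Linux only)."""
    devices = {}
    if not os.path.isdir(root):
        return devices  # not Linux, or sysfs not mounted
    for addr in sorted(os.listdir(root)):
        attrs = {}
        for name in ATTRS:
            try:
                with open(os.path.join(root, addr, name)) as f:
                    attrs[name] = f.read().strip()
            except OSError:
                pass  # not every device exposes link attributes
        devices[addr] = attrs
    return devices


def coral_links(devices):
    """Map each Edge TPU's PCI address to its negotiated (speed, width)."""
    return {
        addr: (a.get("current_link_speed"), a.get("current_link_width"))
        for addr, a in devices.items()
        if (a.get("vendor"), a.get("device")) == CORAL_IDS
    }


if __name__ == "__main__":
    for addr, (speed, width) in coral_links(load_sysfs()).items():
        print(f"{addr}: {speed}, x{width}")
```

If the TPUs sit behind a switch on the adapter, these values describe the link between each TPU and the switch, not the slot. The slot's own link shows up on the switch's upstream port, which `lspci -vv` reports under `LnkSta`.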

PCIe x1 version pre-order

PCIe x1 (desktop) version is planned to be available for purchase around September.
To estimate production volume, please DM me on Twitter @magic__smoke (double underscore) with the number of boards you want.

Update: the number of boards to be produced has been estimated, and the online store link will be posted here when everything is set and ready. However, I'm keeping this issue and Twitter DMs open for feedback (especially on the x4 Dual Edge TPU adapter), or in case you feel a personal notification works better.

m.2 B+M key -> m.2 E-key Dual TPU cooling options

So far there is only one copper heat spreader for 2230 that is widely available, and it's meant for the Steam Deck SSD. Based on a video review of this heat spreader, it doesn't attach securely to a 2230 card, as it depends on additional pressure from above to keep it in place.

https://www.amazon.com/Steam-Deck-SSD-Heatsink-Thermal/dp/B0C1BQVZW2

There are also 2230-like heat spreaders with somewhat proprietary form factors meant for Dell laptops that don't look like they can be secured without an additional screw somewhere in the 2242 position.

https://www.amazon.com/Heatsink-8F83M-08F83M-Replacement-Alienware/dp/B0B21NK7Z3

Is anyone aware of heatsinks that would fit onto the 2230 Dual TPU variant when mounted on the M.2 B+M adapter?

Page table slot error

Hi - I just installed a newly purchased m.2 B+M Adapter with a dual edge TPU. I'm able to see both of the TPUs from the card but I'm constantly seeing errors in the kernel log:

apex 0000:06:00.0: get user pages failed for addr=0x7f0dae98c000, offset=0x0 [ret=-14]
[  613.240568] apex 0000:06:00.0: gasket_perform_mapping -14
[  613.241044] apex 0000:06:00.0: page table slots 4096 (@ 0x1000000) to 4096 are not available

I don't think this is driver related since my original TPU (a single M.2 A+E version) is working fine in the WiFi slot. Does this error indicate a bad TPU, a bad adapter or some configuration issue?
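The `-14` in these messages is a negative errno value, which is the usual Linux kernel convention for returning errors. On Linux, 14 is `EFAULT` ("Bad address"). Decoding it won't tell you whether the TPU, the adapter, or the configuration is at fault, but it turns the number into a searchable error name. This minimal sketch uses only Python's standard `errno` and `os` modules; `decode_ret` is a hypothetical helper.

```python
import errno
import os
import re

# Matches the "[ret=-N]" field printed in the apex driver messages above.
RET_RE = re.compile(r"\bret=(-?\d+)")


def decode_ret(line):
    """Return (errno name, message) for a 'ret=-N' field in a kernel log line, or None if absent."""
    m = RET_RE.search(line)
    if not m:
        return None
    code = abs(int(m.group(1)))  # kernel code paths return -errno on failure
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)
```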

~$ lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Q370 Chipset LPC/eSPI Controller (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
02:03.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
02:07.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
03:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
04:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
06:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
~$ ls /dev | grep apex
apex_0
apex_1
apex_2

Thanks

PCI Bracket for x1 adapter

Does anyone know of a PCI (metal or otherwise) slot bracket that fits the adapter well? I am hoping to be able to screw it into the case.

I tried:

Keystone 9203

But the mounting holes in the board don't line up. I seem to remember seeing a suggestion on here for one, but I can't track it down again.

I get no Ethernet when the dual PCIe adapter is plugged in

I have a server running Ubuntu 23.04 (kernel 6.2.0-27-generic). My motherboard is an ASUS Prime B450M-A.
When I plug in the PCIe card, with or without an Edge TPU installed on it, my Ethernet just stops working. When I unplug it, Ethernet works again.
I cleared the journal and dumped it to a file while the PCIe card was plugged in, but I can't find anything that helps.
all_logs2.txt

Is there anything else I can do to help diagnose this problem, or does someone already know how to solve it?
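A full journal dump is hard to read by eye. One approach is to pull only kernel messages for the current boot (`journalctl -k -b`) and then keep just the lines about PCIe error reporting and the network driver. The keyword list below is an assumption. `r8169` is the usual in-kernel driver for Realtek gigabit NICs, which may or may not be what this board uses, so adjust the list to whatever driver `lspci -k` reports for the Ethernet controller. `relevant_lines` is a hypothetical helper.

```python
import re

# Hypothetical starting keywords: PCIe AER reports plus common NIC driver/link messages.
KEYWORDS = ("AER", "pcieport", "PCIe Bus Error", "Link is Down", "r8169", "NETDEV WATCHDOG")


def relevant_lines(text, keywords=KEYWORDS):
    """Return log lines that mention any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]
```

For example: `relevant_lines(open("all_logs2.txt", encoding="utf-8", errors="replace").read())`.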
