Jetson Nano Brings AI Computing to Everyone

NVIDIA announced the Jetson Nano Developer Kit at the 2019 NVIDIA GPU Technology Conference (GTC). The $99 computer is available now for embedded designers, researchers, and DIY makers, and it delivers the power of modern AI in a compact, easy-to-use platform with full software programmability. Jetson Nano delivers 472 GFLOPS of computing performance with a quad-core 64-bit ARM CPU and a 128-core integrated NVIDIA GPU. It also includes 4GB of LPDDR4 memory in an efficient, low-power package with 5W/10W power modes and a 5V DC input, as shown below.


Jetson Nano Developer Kit (80x100mm), available now for $99

The Jetson Nano Developer Kit fits in a footprint of just 80x100mm and features four high-speed USB 3.0 ports, a MIPI CSI-2 camera connector, HDMI 2.0 and DisplayPort 1.3, Gigabit Ethernet, an M.2 Key-E module, a MicroSD card slot, and a 40-pin GPIO header. The ports and GPIO header work out of the box with a variety of popular peripherals, sensors, and ready-to-use projects, such as the 3D-printable deep learning JetBot that NVIDIA has open-sourced on GitHub.

The devkit boots from a removable MicroSD card which can be formatted and imaged from any PC with an SD card adapter. The devkit can be conveniently powered via either the Micro USB port or a 5V DC barrel jack adapter. The camera connector is compatible with affordable MIPI CSI sensors including modules based on the 8MP IMX219, available from Jetson ecosystem partners. Also supported is the Raspberry Pi Camera Module v2, which includes driver support in JetPack. Table 1 shows key specifications.

Jetson Nano specifications

The devkit is built around a 260-pin SODIMM-style System-on-Module (SoM), shown in Figure 2. The SoM contains the processor, memory, and power management circuitry. The Jetson Nano compute module is 45x70mm and will be shipping starting in June 2019 for $129 (in 1000-unit volume) for embedded designers to integrate into production systems. The production compute module will include 16GB eMMC onboard storage and enhanced I/O with PCIe Gen2 x4/x2/x1, MIPI DSI, additional GPIO, and 12 lanes of MIPI CSI-2 for connecting up to three x4 cameras or up to four cameras in x4/x2 configurations. Jetson’s unified memory subsystem, which is shared between CPU, GPU, and multimedia engines, provides streamlined ZeroCopy sensor ingest and efficient processing pipelines.

Deep Learning Inference Benchmarks

Jetson Nano can run a wide variety of advanced networks and supports the full native versions of popular ML frameworks such as TensorFlow, PyTorch, Caffe/Caffe2, Keras, MXNet, and others. These networks can be used to build autonomous machines and complex AI systems by implementing robust capabilities such as image recognition, object detection and localization, pose estimation, semantic segmentation, video enhancement, and intelligent analytics.

Figure 3 shows results from inference benchmarks across popular models available online. The inferencing used batch size 1 and FP16 precision, employing NVIDIA’s TensorRT accelerator library included with JetPack 4.2. Jetson Nano attains real-time performance in many scenarios and is capable of processing multiple high-definition video streams.

Multi-Stream Video Analytics

Jetson Nano processes up to eight HD full-motion video streams in real-time and can be deployed as a low-power edge intelligent video analytics platform for Network Video Recorders (NVR), smart cameras, and IoT gateways. NVIDIA’s DeepStream SDK optimizes the end-to-end inferencing pipeline with ZeroCopy and TensorRT to achieve ultimate performance at the edge and for on-premises servers. The video below shows Jetson Nano performing object detection on eight 1080p30 streams simultaneously with a ResNet-based model running at full resolution and a throughput of 500 megapixels per second (MP/s).
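
The 500 MP/s figure can be checked with a little arithmetic. The sketch below assumes that 1080p30 means 1920x1080 pixels at 30 frames per second; the function name pixel_rate is only illustrative.

```shell
#!/bin/sh
# pixel_rate STREAMS WIDTH HEIGHT FPS
# Prints the aggregate number of pixels processed per second.
pixel_rate() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Eight 1080p30 streams (assumed to be 1920x1080 at 30 frames per second)
rate=$(pixel_rate 8 1920 1080 30)
echo "pixels per second: $rate"
echo "megapixels per second (rounded down): $(( rate / 1000000 ))"
```

This works out to 497,664,000 pixels per second, which is consistent with the roughly 500 MP/s quoted above.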

7-Zip benchmark on Raspberry Pi

The 7-Zip Benchmark command

Measures speed of the CPU and checks RAM for errors.

You can install 7-Zip from the Raspbian desktop as follows:

  • Click on the Raspberry icon in the top left of your screen.
  • Go down to “Preferences” and click on “Add / Remove Software”.
  • When the new window opens, type “p7zip” in the search box and press Enter.
  • Tick both of the checkboxes for “p7zip” (they should be the last 2 choices).

You can also install 7-Zip from the command line:

sudo apt-get install p7zip

Syntax

b [number_of_iterations] [-mmt{N}] [-md{N}] [-mm={Method}]

There are two tests:

  1. Compressing with LZMA method
  2. Decompressing with LZMA method

The benchmark shows a rating in MIPS (million instructions per second). The rating value is calculated from the measured CPU speed and is normalized against the results of an Intel Core 2 CPU with the multi-threading option switched off. So if you have an Intel Core 2 Duo, the rating values should be close to the real CPU frequency.

You can change the upper dictionary size, and with it the memory usage, with the -md{N} switch. You can also change the number of threads with the -mmt{N} switch.
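
As a rough sketch of how these two switches combine, the helper below only prints benchmark command lines and does not run them. The name bench_cmd and the chosen thread counts and dictionary size are my own illustrative assumptions; the switches themselves come from the syntax line above.

```shell
#!/bin/sh
# bench_cmd THREADS DICT
# Prints (does not run) a benchmark command that uses THREADS threads
# and an upper dictionary size of 2^DICT bytes.
bench_cmd() {
    echo "7zr b -mmt$1 -md$2"
}

# Print one command per thread count, all with the same dictionary limit.
for threads in 1 2 4; do
    bench_cmd "$threads" 22
done
```

To actually run one of the printed commands, copy it into the terminal. Comparing a single-thread run with a four-thread run is one way to see how well the benchmark scales across the cores.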

The Dict column shows the dictionary size. For example, 21 means 2^21 = 2 MB.
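
The Dict value is a power-of-two exponent, so it can be converted to a size with a shift. A minimal sketch, with illustrative function names of my own:

```shell
#!/bin/sh
# dict_size_bytes N: dictionary size in bytes for Dict value N (2^N).
dict_size_bytes() {
    echo $(( 1 << $1 ))
}

# dict_size_mb N: the same size in MB (1 MB = 1048576 bytes here).
# Integer division, so values of N below 20 print 0.
dict_size_mb() {
    echo $(( (1 << $1) / 1048576 ))
}

dict_size_bytes 21   # prints 2097152
dict_size_mb 21      # prints 2
dict_size_mb 25      # prints 32
```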

The Usage column shows the percentage of time the processor is working. It’s normalized for a one-thread load. For example, 180% CPU Usage for 2 threads can mean that average CPU usage is about 90% for each thread.

The R / U column shows the rating normalized for 100% of CPU usage. That column shows the performance of one average CPU thread.
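
The relationship between the Usage, R / U, and rating columns can be sketched with integer arithmetic. The function names and the sample numbers below are made up for illustration and are not taken from a real run.

```shell
#!/bin/sh
# per_thread_usage USAGE_PERCENT THREADS
# Average CPU usage per thread, in percent.
per_thread_usage() {
    echo $(( $1 / $2 ))
}

# rating_per_usage RATING USAGE_PERCENT
# Rating normalized to 100% CPU usage, i.e. the R / U column.
rating_per_usage() {
    echo $(( $1 * 100 / $2 ))
}

per_thread_usage 180 2      # prints 90
rating_per_usage 3600 180   # prints 2000 (hypothetical rating of 3600 MIPS)
```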

Avr shows averages for different dictionary sizes.

Tot shows averages of the compression and decompression ratings.

Compression speed and rating strongly depend on memory (RAM) latency.

Decompression speed and rating strongly depend on the integer performance of the CPU. For example, the Intel Pentium 4 has a big branch misprediction penalty (an effect of its long pipeline) and fairly slow multiply and shift operations, so it gets fairly low decompression ratings.

You can run a CRC calculation benchmark by specifying -mm=crc. That test shows the speed of CRC calculation in MB/s. The first column shows the size of the block. The next column shows the speed of CRC calculation for one thread. The other columns are results for multi-threaded CRC calculation.

With the -mm=* switch you can run a complex benchmark. It tests the hash calculation methods and the compression and encryption codecs of 7-Zip. Note that the LZMA tests carry a big weight in the “total” results, and that the results of the complex benchmark are normalized against an AMD K8 CPU.

Examples:

# Runs the benchmark once - takes about 75 seconds on my
# Raspberry Pi 3B+, so please be patient...
7zr b
# You can save the output to a file if you wish
# Nothing is printed to the screen while the benchmark
# is running - again, please be patient for about 75 seconds
7zr b > 7zip-basic-benchmark-example.txt
# To view the output later or to share it with others
cat 7zip-basic-benchmark-example.txt
# Runs the benchmark twice, back to back, and prints a separate
# report for each run - this takes about 150 seconds
# (use "7zr b 2" instead if you want a single averaged report)
7zr b ; 7zr b
# Runs the complete 7-Zip benchmark - please be patient...
# There is more information @ http://www.single-board.com
7zr b -mm=*
# Runs the benchmark 30 times and gives you an average
# This test takes a very long time on the Raspberry Pi
# Watch my YouTube video to see all the cores working in
# Conky - I am using SimpleScreenRecorder and
# Asciinema to record everything you're seeing today.
7zr b 30
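
If you saved the output to a file as in the example above, you can pull the overall rating out of it for comparison with other boards. This is only a sketch: it assumes the report ends with a line that starts with "Tot:" and has the total rating as its last number, which is how the output looks on the versions I have seen. The function name total_rating is my own.

```shell
#!/bin/sh
# total_rating FILE
# Prints the last number on the "Tot:" line of a saved benchmark report.
# Assumes the "Tot:" line layout is: usage, R / U, rating.
total_rating() {
    awk '/^Tot:/ { print $NF }' "$1"
}

# Only try to read the report if it has already been saved.
report=7zip-basic-benchmark-example.txt
if [ -f "$report" ]; then
    echo "Total rating (MIPS): $(total_rating "$report")"
fi
```

Check the result against the Tot line in the file itself the first time you use it, in case your 7-Zip version lays the report out differently.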

Asciinema

Click here for a direct link to the Asciicast in a new window.

To learn how to install Asciinema click here.

Here is the Asciicast:

NOTE: The first 70 seconds of the Asciicast show nothing, because during that time I was showing how to install 7-Zip through the Raspberry Pi GUI. You can see that part in the YouTube video below.

To watch this YouTube video of the whole process in a new window, click here.

SimpleScreenRecorder

Otherwise, click on the video below and enjoy!

NOTE 1:

I use several different software programs and pieces of hardware at the same time in this video. It brings together the hardware and software that I have used in my previous Asciicasts, blogs, and videos. If you want to ask me specific questions, I am always available via email, just be patient 🙂

NOTE 2:

If you are interested in testing Single Board Computers like I am, you might just want to head over to “Performance Analysis Methodology” and read what is there. It is very interesting and worth the time if you’re serious about accurate results and not just a stack of data.
