Monday, September 26, 2016

Odroid XU4 - My new NAS

Few thoughts on ARM:

ARM has been kicking ass and taking names for several years now. It's no surprise that other companies would want to acquire it - Japanese company SoftBank finished its acquisition of ARM a couple of days ago, at an astonishing $32 billion.


ARM doesn't make anything physically - but they do create CPU designs and license them. ARM-based processors are found everywhere today - in cameras, routers, smart TVs, cellphones, game consoles, laptops, even servers.


For ARM-based servers, the goal is energy efficiency; raw performance is secondary. Rarely would one pick ARM over standard x86 for a performance advantage. Even newer Atom processors (though the line is effectively dead *supposedly?*) are generally more powerful.


There are many use cases where processing power isn't quite necessary though. When I/O is the bottleneck, or where we need reliability, or need to keep a server running with little power, the importance of fast processors is far less. In multiprocessing environments, where a few heavy CPU bound processes can stymie a fast processor with low core count, a slower processor with high core count can still stay operational, letting a user do whatever else they may need to do.



On the Odroid-xu4:






Good use cases would be a web server, file storage, a network cache system (memcached, cachewho [my own thing...]), or a home automation server. For my own Odroid-xu4, I am using it as a NAS, with minor web/development jobs.


Here are the specs on the little bugger:

CPU     : Samsung Exynos 5422 (2GHz Cortex A15 x4 + 1.3GHz Cortex A7 x4)
RAM     : 2GB LPDDR3, 933MHz (2 x 32bit bus, 14.9 GBps bandwidth)
GPU     : Mali-T628
Video   : HDMI (standard type-A), 1080p capable
Audio   : HDMI, i2s (no audio jack)
USB3.0  : 2 ports
USB2.0  : 1 port
Storage : EMMC, Micro-SD
Network : RJ-45, Gigabit Ethernet
GPIO    : 30pin + 12pin section (i2s, i2c, spi, ADC etc.)


Possible uses:
HTPC - video out via HDMI makes this a compact way to build an HTPC system. Various operating systems are supported - several Linux distributions, and even Android.
NAS - Standard Linux isn't too hard to turn into a NAS, but there are also dedicated storage distros like OpenMediaVault.
Web Server - Python, Ruby, Apache, Nginx, PHP, Node - loads of ways to get a web app running here.
Media Streaming - Plex can turn this into a media streaming server.
Home Automation - Much in the same manner as a Raspberry Pi can be used. This is probably a less optimal fit, since the extra power over a Raspberry Pi isn't really needed.
Robotics - GPIO pins can communicate with servo boards, Arduinos and other devices, giving control over mechanical parts and input from sensors, and the device of course packs a lot of processing capability.

The advantages of this over an x86 device:

Cost - $80 + SD card ($10?). x86 devices are quite cheap if you go with Atom, though not quite at this level. If you do find one, it's usually not a complete system, or lacks USB 3.0/gigabit ethernet.
Noise - very quiet fan, optional no noise case. Not really much of an advantage over Atom which doesn't need a fan in many cases either.
Size - while there ARE Intel Stick PC devices, they are lacking some connectivity - often USB3.0 and/or Gigabit Ethernet. The combination of these is necessary for a decent NAS, or any server where you want fast transfers. Looking at devices that offer USB3.0/Gigabit Ethernet means lots more money.

NB: a recent SolidRun board can give this a run for its money, though it's more expensive, with some options at several times the price.

http://techreport.com/news/30699/solidrun-microsom-offers-braswell-cpus-on-a-tiny-package




Why Odroid-xu4? Why not a Raspberry Pi, which has more community support?


I'm making a huge deal about Gigabit Ethernet and USB3.0 over the usual USB2.0 and 10/100Mbps Ethernet, because it's quite relevant to current media types.



A Raspberry Pi 3 is a powerful device for its size and cost, being the same credit-card size as the XU4 with a retail value of $40. It has a solid quad core CPU, but USB2.0 and 100Mbps Ethernet. It's certainly usable for a storage server, but reading from a disk would be limited to the slowest speed in the chain - 12.5MBps. That's mind-numbingly slow when you're syncing a terabyte or six.



For perspective, let's consider how different bandwidths handle 1 TB of data:

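+------------------+-------------------+-----------+
|    Interface     |     Bandwidth     | Time/TB   |
+------------------+-------------------+-----------+
| "Fast" Ethernet  | 12.5MB per second | 0.97 days |
| USB2.0           |   60MB per second | 4hr 51.3m |
| Gigabit Ethernet |  125MB per second | 2hr 20m   |
| USB3.0           |  625MB per second | 28m       |
+------------------+-------------------+-----------+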
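
For anyone who wants to redo the arithmetic behind that table, here is a minimal Python sketch. It assumes 1 TB is counted as 1024 * 1024 MB (my guess, but it is the convention that reproduces the figures above) and that each interface runs at its theoretical peak with no overhead. The names MB_PER_TB, INTERFACES and hours_per_tb are just illustrative.

```python
# Assumption: 1 TB is counted as 1024 * 1024 MB, which reproduces the table's figures.
MB_PER_TB = 1024 * 1024

# Theoretical peak rates in MB per second, taken from the table.
INTERFACES = {
    '"Fast" Ethernet': 12.5,
    "USB2.0": 60.0,
    "Gigabit Ethernet": 125.0,
    "USB3.0": 625.0,
}


def hours_per_tb(rate_mb_per_s: float) -> float:
    """Hours needed to move 1 TB at a sustained rate, ignoring all overhead."""
    return MB_PER_TB / rate_mb_per_s / 3600


if __name__ == "__main__":
    for name, rate in INTERFACES.items():
        print(f"{name:<17} {rate:>6.1f} MB/s  {hours_per_tb(rate):6.2f} h")
```

Real transfers will be slower than this, since protocol overhead and the drives themselves are left out.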


While you won't always need to sync terabytes of information, you will often be dealing with hundreds of gigabytes. That easily happens when you get back from holiday with a lot of RAW images and uncompressed video. Of course you cannot expect these speeds in practice - the hard drive itself has a limit. And when you're syncing from one external drive to another (as is my plan), you're looking at half the bandwidth in the best case - the same bus handles reading from one drive and writing to the other. So that seemingly acceptable 12.5MBps drops to just over 6MBps - and that's assuming maximum theoretical speeds. With USB3.0 speeds, I can comfortably sync one drive to another and not take much of a hit - I'd most likely bottleneck near the maximum write speed of my drive.



To NTFS or not to NTFS....


Assuming I use the Odroid-xu4 as storage on my LAN, what filesystem should I use? NTFS is relatively limited compared to EXT4. My primary concern was the extra CPU utilization NTFS would take over EXT4; however, NTFS is compatible with my other computers/laptop. My initial sync was over USB anyway, directly connected to the Windows machine that currently holds my library - so NTFS, purely for compatibility's sake.



Setup:


I got the OpenMediaVault image from here.

https://sourceforge.net/projects/openmediavault/files/Odroid-XU3_XU4/

Win32diskimager can write the image to an SD card.

https://sourceforge.net/projects/win32diskimager/

From there, that's it. Put the card in the mini server and boot up. No need for a directly connected mouse and keyboard - just ssh into it, or use the web interface. For most things NAS related, the web interface will suffice.




From the web interface you can make users, create shares on connected drives, mount/unmount the drives, set up cronjobs etc.


I also came across this blog here http://obihoernchen.net/1235/odroid-xu4-with-openmediavault/

which was invaluable in configuring the NTFS drives for better performance:

Use on demand CPU governor -

in: /etc/default/openmediavault
add:
 OMV_CPUFREQUTILS_GOVERNOR="ondemand"
run:
 omv-mkconf cpufrequtils
 update-rc.d cpufrequtils defaults

CPU governor tuning:

run:
 apt-get install sysfsutils

in: /etc/sysfs.conf

add:
 # cpu0 sets cpu[0-3], cpu4 sets cpu[4-7]
 devices/system/cpu/cpu0/cpufreq/ondemand/io_is_busy = 1
 devices/system/cpu/cpu4/cpufreq/ondemand/io_is_busy = 1 
 devices/system/cpu/cpu0/cpufreq/ondemand/sampling_down_factor = 10
 devices/system/cpu/cpu4/cpufreq/ondemand/sampling_down_factor = 10
 devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold = 80
 devices/system/cpu/cpu4/cpufreq/ondemand/up_threshold = 80

run:

 cpufreq-set -g ondemand -c 0
 cpufreq-set -g ondemand -c 4
 service sysfsutils start

NTFS mount options:

in: /etc/default/openmediavault
add:
 OMV_FSTAB_MNTOPS_NTFS="defaults,nofail,noexec,noatime,big_writes"

I strongly suggest you read the original link at Obihörnchen's blog to understand what each command does.
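
After a reboot it's worth checking that the governor tunables actually stuck. Below is a small Python sketch for that, not something from the original guide. It assumes the tunables live under the same sysfs paths used in /etc/sysfs.conf above (this can differ between kernels), and the names EXPECTED and check_ondemand are made up for illustration.

```python
from pathlib import Path

# Tunables and values copied from the sysfs.conf lines above.
EXPECTED = {
    "io_is_busy": "1",
    "sampling_down_factor": "10",
    "up_threshold": "80",
}


def check_ondemand(cpu: int, sysfs_root: Path = Path("/sys")) -> dict:
    """Return {tunable: (expected, actual)} for every mismatch on one CPU policy.

    Assumption: tunables sit under devices/system/cpu/cpuN/cpufreq/ondemand,
    as in the sysfs.conf entries; a missing file is reported as None.
    """
    base = sysfs_root / "devices/system/cpu" / f"cpu{cpu}" / "cpufreq/ondemand"
    mismatches = {}
    for name, expected in EXPECTED.items():
        path = base / name
        actual = path.read_text().strip() if path.exists() else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    for cpu in (0, 4):  # cpu0 covers cores 0-3, cpu4 covers cores 4-7
        problems = check_ondemand(cpu)
        print(f"cpu{cpu}:", "ok" if not problems else problems)
```

An empty result for both cpu0 and cpu4 means the values on the running system match what was configured.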



Drive performance:


root@odroid:~/major# dd if=/dev/zero of=./testfile bs=1000M count=1 oflag=direct

1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 12.7106 s, 82.5 MB/s
root@odroid:~/major# dd if=/dev/zero of=./testfile bs=1000M count=1 oflag=direct
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 12.6035 s, 83.2 MB/s

Getting over 80MBps write - that's much better than I was expecting.

root@odroid:~/major# dd if=./testfile of=/dev/null bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 11.272 s, 93.0 MB/s
root@odroid:~/major# dd if=./testfile of=/dev/null bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 11.313 s, 92.7 MB/s

Getting over 90MBps read - again a really decent result for a single disk.
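
The MB/s figures dd prints are just bytes divided by seconds, using decimal megabytes. Here's a rough Python sketch that recomputes them from the summary lines above, handy for averaging several runs. The regex assumes the GNU dd summary format shown in this post; DD_SUMMARY and dd_rate_mb_per_s are illustrative names.

```python
import re

# Matches the summary line printed by GNU dd, e.g.
# "1048576000 bytes (1.0 GB) copied, 12.7106 s, 82.5 MB/s"
DD_SUMMARY = re.compile(r"(\d+) bytes .* copied, ([\d.]+) s")


def dd_rate_mb_per_s(summary_line: str) -> float:
    """Recompute throughput in decimal MB/s (1 MB = 1,000,000 bytes)."""
    match = DD_SUMMARY.search(summary_line)
    if match is None:
        raise ValueError(f"not a dd summary line: {summary_line!r}")
    size_bytes, seconds = int(match.group(1)), float(match.group(2))
    return size_bytes / seconds / 1_000_000


if __name__ == "__main__":
    runs = [
        "1048576000 bytes (1.0 GB) copied, 12.7106 s, 82.5 MB/s",
        "1048576000 bytes (1.0 GB) copied, 11.272 s, 93.0 MB/s",
    ]
    for line in runs:
        print(f"{dd_rate_mb_per_s(line):.1f} MB/s")
```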

My understanding is the drive I'm using (Western Digital RED 6TB) should handle about double that - 175MB/s read/write - and it's possible I might achieve that with EXT4 on this same platform. More testing is needed when I purchase another drive. For now, I'll enjoy the speeds that are near the maximum of the interface, and easily faster than my AC Wi-Fi provides.


The drive I'm currently using will be accompanied by a few more later, using an external 4-bay hard drive enclosure.
Next to the Odroid, this stack of drive bays looks huge. It isn't :D
This is a Vantec HX4R.

Bays lock in place with a clip. The enclosure supports SATA and USB3.0, and from the sticker you can also see RAID settings - 0, 1, 0+1, 5, JBOD - and, the way I use it, just a hub for all drives. That top drive is one I stripped of all its drive parts and interfaces. It's just a shell - holding some screws for the other bays, but it makes for a good hidden storage unit.



Benchmarks:


These benchmarks were taken using the on demand CPU governor configuration described above. I've compared it against my laptop and desktop.


Desktop: Core i5 2400 (Sandy Bridge), 16GB RAM

Laptop: Celeron N2940 (BayTrail-m), 8GB RAM
Odroid-XU4: Exynos 5422, 2GB RAM

It's interesting to note the specifications of these:



+---------------+---------------------+-------+-----------------+-------+
| CPU           | Power?              | Cores | Frequency (GHz) | Intro |
+---------------+---------------------+-------+-----------------+-------+
| Core i5 2400  | 95W TDP             | 4     | 3.1-3.4         | 2011  |
| Celeron N2940 | 7.5W/4.5W TDP/SDP   | 4     | 1.83-2.25       | 2014  |
| Exynos 5422   | 10W/14W max CPU/GPU | 8     | 2.0/1.3         | 2015  |
+---------------+---------------------+-------+-----------------+-------+

While it may seem that the Exynos 5422 is a higher-wattage CPU, those figures come from tests on the Odroid forum. Typically the entire system hovers at a wattage my laptop's processor can only dream of. There are further tweaks that lower power too - such as downclocking the GPU to cut power even more when running as a server. Also note the difference between Thermal Design Power (TDP) and Scenario Design Power (SDP). The Intel-AMD war brought in marketing departments that spread BS over these numbers. TDP was the rating the silicon was designed to handle. SDP was a typical workload. Nothing was really equal when comparing AMD and Intel CPUs that stated these.


The power supply for the odroid system is 5 Volts, 4 Amps. That's 20 Watts. Consider that in several cases here, the GPU is also adding to power consumption: http://www.mikronauts.com/hardkernel/hardkernel-odroid-xu4-review/11/

Whichever way you compare the Baytrail Celeron against the Exynos 5422, the Core i5 is sorely out of place. The question is: is its performance also that far out there, or do the lower powered processors put it at a disadvantage in efficiency?

I ran each test several times to get 3 close results, and kept the middle one.

+---------------+------------+-----------------+------------+
|               | Odroid-Xu4 | Baytrail Laptop | i5 Desktop |
+---------------+------------+-----------------+------------+
| Mencoder      |    3148    |     2478        |    795     |
| p7zip (text)  |    7.342   |     6.708       |    2.675   |
| p7zip (video) |    174     |     143         |    32      |
| ImageMagick   |            |                 |            |
| Apache bench  |            |                 |            |
+---------------+------------+-----------------+------------+


It's apparent that the i5 desktop is several times faster - but maybe not fast enough to justify its power draw. This CPU can jump to over 70 Watts under heavy load. That puts it at around 10 times the power of the other CPUs. It's really interesting to see the large difference in 7zip on data that can't be compressed well (video) compared to data that compresses a lot (text). My laptop's CPU is never that far ahead of the Xu4 either. Expect laptops with ARM to gain in popularity (there are already Chromebooks and Android devices), especially with the Atom line no longer available for that purpose.
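
To put rough numbers on the speed versus power trade-off, here is a back-of-the-envelope Python sketch. It leans on several assumptions: the benchmark figures are treated as elapsed time (lower is better), since the units aren't stated; and the wattages are the rough figures quoted in this post (10 W max CPU for the Exynos, 7.5 W TDP for the Celeron, about 70 W for the i5 under load), not measurements. Names like RESULTS, WATTS, slowdown_vs and relative_energy are illustrative only.

```python
# Benchmark results from the table above. Assumption: the units are not
# stated in the post; they are treated here as elapsed time, lower is better.
RESULTS = {
    "Mencoder":      {"Odroid-Xu4": 3148,  "Baytrail Laptop": 2478,  "i5 Desktop": 795},
    "p7zip (text)":  {"Odroid-Xu4": 7.342, "Baytrail Laptop": 6.708, "i5 Desktop": 2.675},
    "p7zip (video)": {"Odroid-Xu4": 174,   "Baytrail Laptop": 143,   "i5 Desktop": 32},
}

# Assumed CPU power under load, in watts (rough figures quoted in the post,
# not measurements): 10 W max CPU, 7.5 W TDP, "over 70 Watts".
WATTS = {"Odroid-Xu4": 10.0, "Baytrail Laptop": 7.5, "i5 Desktop": 70.0}


def slowdown_vs(baseline: str, test: str) -> dict:
    """How many times longer each machine takes than the baseline on one test."""
    base = RESULTS[test][baseline]
    return {machine: value / base for machine, value in RESULTS[test].items()}


def relative_energy(test: str) -> dict:
    """Time multiplied by assumed watts: a crude energy-per-job figure."""
    return {machine: value * WATTS[machine] for machine, value in RESULTS[test].items()}


if __name__ == "__main__":
    for test in RESULTS:
        print(test)
        print("  slowdown vs i5:", slowdown_vs("i5 Desktop", test))
        print("  time x watts  :", relative_energy(test))
```

Treat the output as a rough guide only: whole-system power, idle draw and the GPU are all ignored here.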

4 comments:

  1. Hey thanks for mentioning my blog :)
    Nice cpu comparison you did there!

    1. Oh, thank you sir! Your blog information was invaluable to getting the performance I wanted out of this.

      I am curious about how the on demand governor handles the different CPU types for a load. Do you happen to know how the OS prioritizes threads on the A15 vs A7 cores? Are the A15s generally unused until high CPU load and then high load threads get moved to them?

    2. Got curious...my tests show that initial CPU intensive threads populate the A15 cores first. Subsequent threads do run on the A7 cores, and are moved to the A15 cores when they become free. I should do a little write up on that after playing around more... I wonder if heterogeneous designs are really the future...

    3. Yes it works as you described. You can't really predict whether a task will run on a15 or a7.
      I pinned my network and USB ports to A15 cores and could improve performance a little bit more with this.
      I want to finish this blog for like 6 month xD

      I wonder as well. Kernel support is still a mess and there are quite a few problems with such a setup.
