Monday, September 26, 2016

Odroid XU4 - My new NAS

A few thoughts on ARM:

ARM has been kicking ass and taking names for several years now. It's no surprise that other companies would want to acquire it - Japanese company SoftBank finished its acquisition of ARM a couple of days ago, at an astonishing $32 billion.


ARM doesn't make anything physically - but they do create CPU designs and license them. ARM based processors are found everywhere today - in cameras, routers, smart TVs, cellphones, game consoles, laptops, even servers.


For ARM based servers, the goal is energy efficiency, with raw performance secondary. Rarely would one pick ARM over standard x86 for a performance advantage. Even newer Atom processors (though the line is effectively dead *supposedly?*) are generally more powerful.


There are many use cases where processing power isn't quite necessary though. When I/O is the bottleneck, or where we need reliability, or need to keep a server running with little power, the importance of fast processors is far less. In multiprocessing environments, where a few heavy CPU bound processes can stymie a fast processor with low core count, a slower processor with high core count can still stay operational, letting a user do whatever else they may need to do.



On the Odroid-xu4:






A good use case would be a web server, file storage, a network cache system (memcached, cachewho [my own thing...]), or a home automation server. For my own purchase of an Odroid-xu4, I am using it as a NAS, with minor web/development jobs.


Here are the specs on the little bugger:

CPU     : Samsung Exynos 5422 (2GHz Cortex A15 x4 + 1.3GHz Cortex A7 x4)
RAM     : 2GB LPDDR3, 933MHz (16bit interface, 14.9 GBps bandwidth)
GPU     : Mali-T628
Video   : HDMI (standard type-A), 1080p capable
Audio   : HDMI, i2s (no audio jack)
USB3.0  : 2 ports
USB2.0  : 1 port
Storage : EMMC, Micro-SD
Network : RJ-45, Gigabit Ethernet
GPIO    : 30pin + 12pin section (i2s, i2c, spi, ADC etc.)


Possible uses:
HTPC - video out via HDMI means this is a compact way to create an HTPC system. Various operating systems are supported - several Linux distributions, and even Android.
NAS - Standard Linux isn't too hard to turn into a NAS, but there's also dedicated storage distros like Open Media Vault.
Web Server - Python, Ruby, Apache, Nginx, PHP, Node - loads of ways to get a web app running here.
Media Streaming - Plex can turn this into a media streaming server.
Home Automation - Much in the same manner as a Raspberry Pi can be used. This is probably a little less optimal, since the extra power over a Raspberry Pi isn't really needed.
Robotics - GPIO pins can communicate with servo boards, Arduinos and other devices, giving control over mechanical parts and input from sensors, and the device of course packs a lot of processing capability.

The advantages of this over an x86 device:

Cost - $80 + SD card ($10?). x86 devices are quite cheap if you go with Atom, though not quite at this level. If you do find one, it's usually not a complete system, or lacks USB 3.0/gigabit ethernet.
Noise - very quiet fan, optional no noise case. Not really much of an advantage over Atom which doesn't need a fan in many cases either.
Size - while there ARE Intel Stick PC devices, they are lacking some connectivity - often USB3.0 and/or Gigabit Ethernet. The combination of these is necessary for a decent NAS, or any server where you want fast transfers. Looking at devices that offer USB3.0/Gigabit Ethernet means lots more money.

NB: a recent SolidRun board can give this a run for its money, though it's more expensive, with some options at several times the price.

http://techreport.com/news/30699/solidrun-microsom-offers-braswell-cpus-on-a-tiny-package




Why Odroid-xu4? Why not a Raspberry Pi, which has more community support?


I'm making a huge deal about Gigabit Ethernet and USB3.0 over the usual USB2.0 and 10/100Mbps Ethernet, because it's quite relevant to current media types.



A Raspberry Pi 3 is a powerful device for its size and cost: it is the same credit-card size as the XU4, with a retail value of $40. It has a solid quad core CPU, but only USB2.0 and 100Mbps Ethernet. It's certainly usable for a storage server, but reading from a disk over the network would be limited to the slowest speed in the chain - 12.5MBps. That's mind-numbingly slow when you're syncing a terabyte or six.



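For perspective, let's consider how different bandwidths handle 1 TB of data:

+------------------+-------------------+------------+
| Interface        | Bandwidth         | Time/TB    |
+------------------+-------------------+------------+
| "Fast" Ethernet  | 12.5MB per second | 0.97 days  |
| USB2.0           | 60MB per second   | 4hr 51.3m  |
| Gigabit Ethernet | 125MB per second  | 2hr 20m    |
| USB3.0           | 625MB per second  | 28m        |
+------------------+-------------------+------------+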


While you won't always need to sync terabytes of information, you will be concerned with hundreds of gigabytes. That easily happens when you get back from holiday having taken a lot of RAW images and uncompressed video. Of course you cannot expect these speeds in practice - the hard drive itself has a limit. And when you're syncing from one external drive to another (as is my plan), you're looking at half the bandwidth in the best case, since the same bus handles reading from one drive and writing to the other. So that seemingly acceptable 12.5MBps drops to just over 6MBps - and that's assuming maximum theoretical speeds. With USB3.0 speeds, I can comfortably sync one drive to another and not take much of a hit - I'd most likely bottleneck near the maximum write speed of my drive.
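As a rough sketch of that reasoning in plain C++: a drive-to-drive copy over one shared bus gets at best half the bus bandwidth, and can never go faster than either drive. The function name and the drive speeds in the usage note are hypothetical, not measurements.

```cpp
#include <algorithm>

// Best-case rate (MB/s) for a drive-to-drive copy when both drives share
// one bus: the read and the write each get half the bus at most, and the
// slower of the source read speed and destination write speed caps it.
double sharedBusCopyRate(double busMBps, double driveReadMBps, double driveWriteMBps) {
    return std::min({busMBps / 2.0, driveReadMBps, driveWriteMBps});
}
```

With a 12.5MBps bus and drives that can do, say, 90MBps read and 80MBps write, this returns 6.25 (the bus is the limit). With a 625MBps bus and the same drives it returns 80 (the drive's write speed is the limit).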



To NTFS or not to NTFS....


Assuming I use the Odroid-xu4 as a NAS on my LAN, what filesystem should I use? NTFS is relatively limited compared to EXT4. My primary concern was the extra CPU utilization that NTFS takes over EXT4; however, NTFS is compatible with my other computers/laptop. My initial sync was over USB anyway, directly connected to the Windows machine that currently holds my library - so NTFS, purely for compatibility's sake.



Setup:


I got the OpenMediaVault image from here.

https://sourceforge.net/projects/openmediavault/files/Odroid-XU3_XU4/

Win32diskimager can put the ISO on an SD card.

https://sourceforge.net/projects/win32diskimager/

From there that's it. Put the card in the mini server and boot up. No need for a directly connected mouse and keyboard, just ssh into it, or use the web interface. For most things NAS related, the web interface will suffice.




So you can make users, create shares on connected drives, mount/unmount the drives, set up cronjobs etc.


I also came across this blog here http://obihoernchen.net/1235/odroid-xu4-with-openmediavault/

which was invaluable in configuring the NTFS drives for better performance:

Use on demand CPU governor -

in: /etc/default/openmediavault
add:
 OMV_CPUFREQUTILS_GOVERNOR="ondemand"
run:
 omv-mkconf cpufrequtils
 update-rc.d cpufrequtils defaults

CPU governor tuning:

run:
 apt-get install sysfsutils

in: /etc/sysfs.conf

add:
 # cpu0 sets cpu[0-3], cpu4 sets cpu[4-7]
 devices/system/cpu/cpu0/cpufreq/ondemand/io_is_busy = 1
 devices/system/cpu/cpu4/cpufreq/ondemand/io_is_busy = 1 
 devices/system/cpu/cpu0/cpufreq/ondemand/sampling_down_factor = 10
 devices/system/cpu/cpu4/cpufreq/ondemand/sampling_down_factor = 10
 devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold = 80
 devices/system/cpu/cpu4/cpufreq/ondemand/up_threshold = 80

run:

 cpufreq-set -g ondemand -c 0
 cpufreq-set -g ondemand -c 4
 service sysfsutils start

NTFS mount options:

in: /etc/default/openmediavault
add:
 OMV_FSTAB_MNTOPS_NTFS="defaults,nofail,noexec,noatime,big_writes"

I strongly suggest you read the original link at Obihörnchen's blog to understand what each command does.



Drive performance:


root@odroid:~/major# dd if=/dev/zero of=./testfile bs=1000M count=1 oflag=direct

1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 12.7106 s, 82.5 MB/s
root@odroid:~/major# dd if=/dev/zero of=./testfile bs=1000M count=1 oflag=direct
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 12.6035 s, 83.2 MB/s

Getting over 80MBps write - that's much better than I was expecting.

root@odroid:~/major# dd if=./testfile of=/dev/null bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 11.272 s, 93.0 MB/s
root@odroid:~/major# dd if=./testfile of=/dev/null bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 11.313 s, 92.7 MB/s

Getting over 90MBps read - again a really decent result for a single disk.

My understanding is the drive I'm using (Western Digital RED 6TB) should handle double that - 175MB/s read/write - and it's possible I might achieve that with EXT4 on this same platform. More testing is needed when I purchase another drive. For now, I'll enjoy speeds that are near the maximum of the interface, and easily faster than my AC Wifi provides.


The drive I'm currently using will be accompanied by a few more later. This is achieved by using an external 4-bay hard drive enclosure.
Next to the Odroid, this stack of drive bays looks huge. It isn't :D
This is a Vantec HX4R.

Bays lock in place with a clip. The enclosure supports SATA and USB3.0, and from the sticker you can also see the RAID settings - 0, 1, 0+1, 5, JBOD - or, the way I use it, just a hub for all drives. That top bay holds a drive I stripped of all its drive parts and interfaces. It's just a shell - it holds some screws for the other bays, but makes for a good hidden storage unit.



Benchmarks:


These benchmarks were taken using the ondemand CPU governor configuration described above. I've compared the Odroid against my laptop and desktop.


Desktop: Core i5 2400 (Sandy Bridge), 16GB RAM

Laptop: Celeron N2940 (BayTrail-m), 8GB RAM
Odroid-XU4: Exynos 5422, 2GB RAM

It's interesting noting the specifications of these:



+---------------+---------------------+-------+-----------------+-------+
| CPU           | Power?              | Cores | Frequency (GHz) | Intro |
+---------------+---------------------+-------+-----------------+-------+
| Core i5 2400  | 95W TDP             | 4     | 3.1-3.4         | 2011  |
| Celeron N2940 | 7.5W/4.5W TDP/SDP   | 4     | 1.83-2.25       | 2014  |
| Exynos 5422   | 10W/14W max CPU/GPU | 8     | 2.0/1.3         | 2015  |
+---------------+---------------------+-------+-----------------+-------+

While it may seem that the Exynos 5422 is a higher wattage CPU, those are maximum figures taken from tests on the Odroid forum. Typically the entire system hovers at a wattage my laptop's processor can only dream of. There are further tweaks that lower power too, such as downclocking the GPU, which makes sense for a server. A note on Thermal Design Power (TDP) and Scenario Design Power (SDP): the Intel-AMD war brought in marketing departments that spread BS over these numbers. TDP was the rating the silicon was designed to handle. SDP was a typical workload. Nothing was really equal when comparing AMD and Intel CPUs on these stated figures.


The power supply for the odroid system is 5 Volts, 4 Amps. That's 20 Watts. Consider that in several cases here, the GPU is also adding to power consumption: http://www.mikronauts.com/hardkernel/hardkernel-odroid-xu4-review/11/

Either way you compare the Baytrail Celeron against the Exynos 5422, the Core i5 is sorely out of place. The question is whether the performance is also that far out there, or whether the lower powered processors are at a disadvantage in efficiency.

I ran each test several times to get 3 close results, and kept the middle one.

+---------------+------------+-----------------+------------+
|               | Odroid-Xu4 | Baytrail Laptop | i5 Desktop |
+---------------+------------+-----------------+------------+
| Mencoder      |    3148    |     2478        |    795     |
| p7zip (text)  |    7.342   |     6.708       |    2.675   |
| p7zip (video) |    174     |     143         |    32      |
| ImageMagick   |            |                 |            |
| Apache bench  |            |                 |            |
+---------------+------------+-----------------+------------+


It's apparent that the i5 desktop is several times faster - but maybe it's not fast enough. Its CPU power draw can jump over 70 Watts under heavy load. That puts it at roughly 10 times the power of the other CPUs. It's really interesting to see the large difference in 7zip on data that can't be compressed well (video) compared to data that compresses a lot (text). My laptop's CPU is never that far ahead of the Xu4 either. Expect laptops with ARM to gain in popularity (there are already Chromebooks and Android), especially with the Atom line no longer available for that purpose.

Friday, September 9, 2016

Arduino is fun - great for custom Radio Control signal handling

I've been having some fun with Arduino. They're quite cheap boards, with the Arduino Nano coming in at ~$3 per board. That's $3 - THREE. That's pretty awesome for a small amount of custom processing power. Unlike a board like the Raspberry Pi, which has to run a full Linux kernel, this just runs whatever program you write. As such, the timings are very consistent - it's excellent for handling servo PWM signals where you need to measure pulses from ~1000-2000 microseconds accurately. From my experience the resolution was 4μs.

This makes it excellent for managing radio control PWM signals in complex RC vehicles.



That wrapped up red component in my latest RC vehicle (another hobby, I build RC cars...) is an arduino that reads in steering and another channel and processes it to output a new signal that lets me choose if I want the truck to have 4 wheel steering, or crabwalk (both wheels point the same way making it go mostly sideways).




Definitely one of the more interesting vehicles I've built, and a decent summer project (though I did start last year wrecking the transmission pushing myself in a chair). 

This actually started out with me receiving a non-working unit for handling quadsteer that I had purchased on Amazon. When using the unit, the servos lacked power, wouldn't center, and were slow. I used an oscilloscope to diagnose the signal and saw the pulses were 40 milliseconds apart. For radio control PWM signals, the leading edges should be 20 milliseconds apart. What was essentially happening was the servo would get a signal and act on it for 20 milliseconds, and then not do anything for the next 20 milliseconds.

So I turned to the Arduino Nano in the hope of making my own. Below I've set up that Arduino Nano on the breadboard to mix signals as I'd like. Each square on the oscilloscope is 10 ms, so my working prototype is showing the proper waveform. This is what was wrapped up in electrical tape in the first picture.




Below is the code I wrote to handle this. The idea is to read two channels - the steering and a spare channel that decides whether to invert the steering between front and rear. Since I'm reading the time of the pulse, I need a little math to normalize 1000-2000μs to -1 to 1 and back again. This lets me multiply the channels so I can then smoothly transition between crabwalk/quadsteer.
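The normalize-multiply-denormalize step can be sketched on its own in plain C++, so it can be tried on a desktop. The function names here are made up for illustration; the sketch further down does the same math in one integer expression.

```cpp
#include <cmath>

// Map a pulse width in microseconds (nominally 1000-2000) to -1.0..1.0.
double normalize(long pulseUs) {
    return (pulseUs - 1500) / 500.0;
}

// Map a normalized value back to a pulse width in microseconds.
long denormalize(double value) {
    return 1500 + std::lround(value * 500.0);
}

// Rear servo pulse: steering scaled by the second channel. With the second
// channel at one end the rear follows the steering, at the other end it is
// inverted, and at center (1500) the rear stays centered.
long mixRear(long steeringUs, long modeUs) {
    return denormalize(normalize(steeringUs) * normalize(modeUs));
}
```

Because the second channel is a multiplier rather than a switch, moving it gradually from one end to the other fades the rear steering from one mode, through no rear steering at all, into the other mode.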

It should be noted that we're using interrupts to gather input channel data. This is a non-blocking method for gathering the input. It just requires us to determine the length of the pulse by subtracting the time the clock had at the signal's leading edge from the time at the trailing edge.
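Here is a minimal desktop C++ sketch of that edge-timing idea, with the timestamps passed in by hand instead of coming from `micros()`. The `PulseReader` struct is hypothetical and not part of the sketch below. It also shows one detail worth knowing: Arduino's `micros()` returns a 32-bit unsigned value that wraps around after roughly 70 minutes, and unsigned subtraction still gives the right pulse width across the wrap.

```cpp
#include <cstdint>

// Tracks one input channel. onEdge() stands in for the interrupt handler.
struct PulseReader {
    uint32_t leadingEdgeUs = 0;     // timestamp of the last rising edge
    bool     sawLeadingEdge = false;
    uint32_t widthUs = 1500;        // last measured pulse width, center default

    // Call on every pin change with the pin level and the current time.
    void onEdge(bool pinHigh, uint32_t nowUs) {
        if (pinHigh) {
            leadingEdgeUs = nowUs;
            sawLeadingEdge = true;
        } else if (sawLeadingEdge) {
            // Unsigned subtraction stays correct if the counter wrapped
            // between the two edges.
            widthUs = nowUs - leadingEdgeUs;
            sawLeadingEdge = false;
        }
    }
};
```

A rising edge at 1000 followed by a falling edge at 2500 gives a width of 1500, and so does a rising edge 256 microseconds before the counter wraps followed by a falling edge at 1244.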

Edit: 20200907 - code had some bugs, turning off/on interrupts wasn't needed and led to jerky servo output. Tidied it up a bit.


#include <Servo.h> 

volatile unsigned long leadingedge1;
volatile unsigned long leadingedge2;
volatile int pulsetime1; 
volatile int pulsetime2;


//declare servo pins
int servoin1  = 2; // pin 2 - steering input
int servoin2  = 3; // pin 3 - inversion channel input
int servoout1 = 9; // pin 9 - rear servo output (front steering signal is read on pin 2)
Servo rearservo;


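// last three valid pulse widths per channel, seeded at center (1500 us)
// so amode() does not return 0 before the history fills up
int ch1_hist[3] = {1500, 1500, 1500};
int ch2_hist[3] = {1500, 1500, 1500};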

int execute=0;
long RServo=1500; //value to write to rear servo

void setup()
{
  pinMode(servoin1, INPUT);      // sets digital pin 2 as input
  pinMode(servoin2, INPUT);      // sets digital pin 3 as input
  pinMode(servoout1, OUTPUT);    // sets digital pin 9 as output
//  pinMode(servoout2, OUTPUT);    // unused second output
  leadingedge1 = 0;
  leadingedge2 = 0;
  pulsetime1 = 1500;
  pulsetime2 = 1500;
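  // on the Nano, pin 2 is interrupt 0 and pin 3 is interrupt 1
  attachInterrupt(digitalPinToInterrupt(servoin1), chan1, CHANGE);
  attachInterrupt(digitalPinToInterrupt(servoin2), chan2, CHANGE);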
  rearservo.attach(servoout1);
  Serial.begin(115200);        // for debugging
}

int amode(int a[]){ //mode or average of an array - use to smooth glitches
  if (a[0]==a[1]) { return a[0]; }
  if (a[0]==a[2]) { return a[0]; }
  if (a[1]==a[2]) { return a[1]; }
  return (a[0]+a[1]+a[2])/3;
}

int pusharr(int a[],int pushval) {
  a[2]=a[1];
  a[1]=a[0];
  a[0]=pushval;
  return 0;
}

void dostuff(){
  execute=0;
  RServo = 1500+((((long)pulsetime1-1500)*((long)pulsetime2-1500))/500);
  if (RServo > 2050) {
    RServo=2050;
  } else {
    if (RServo < 950) {
      RServo=950;
    }
  }
  Serial.print("P1.");
  Serial.print(pulsetime1);
  Serial.print("--P2.");
  Serial.print(pulsetime2);
  Serial.print("--SRear.");
  Serial.print(RServo);
  Serial.print("\n" );
  execute=1;
}

void chan1()
{
  if(digitalRead(servoin1) == HIGH)
  {
    leadingedge1 = micros();
  } else {
    if (leadingedge1 > 0)
    {
      pulsetime1 = ((volatile long)micros() - leadingedge1)-14;
      // subtract ~14 us of overhead (from other operations?) so that center reads 1500
      if ((pulsetime1 > 800 and pulsetime1 < 2200)) {pusharr(ch1_hist,pulsetime1);}
      leadingedge1 = 0;
      pulsetime1= amode(ch1_hist);
    }
  }
}

void chan2()
{
  if(digitalRead(servoin2) == HIGH)
  {
    leadingedge2 = micros();
  }
  else
  {
    if(leadingedge2 > 0)
    {
      pulsetime2 = ((volatile long)micros() - leadingedge2)-14;
      if ((pulsetime2 > 800 and pulsetime2 < 2200)) {pusharr(ch2_hist,pulsetime2);}
      leadingedge2 = 0;
      pulsetime2= amode(ch2_hist);
    }
  }
}

void loop()
{  
  delay(2);  // delay() pauses loop(), but the input interrupts still fire
  dostuff();  // recompute the rear servo value from the latest pulses
  if (execute==1) { rearservo.writeMicroseconds(RServo);} // RServo is a pulse width in us
}