An article on Python vs Julia for scripting

For those who don’t know, Julia is a very powerful new language, which aims to leverage JIT compilation to generate very fast numerical and general computational code from a well-thought-out language.

I’ve argued for a while that it feels like a better Python than Python. Python, for those who aren’t aware, is a scripting language which has risen in popularity in recent years. It is generally fairly easy to work in, with a few caveats.

Indentation is the killer for me. The language is tolerable though, IMO, not nearly as “simple” as people claim, with a number of lower level abstractions peeking through. I am fine with those. I am not fine with (and have never been fine with) structure by indentation. This isn’t its only issue: there is the global interpreter lock, and the incompatibility between Python 2.x and 3.x. Python does have a very nice interface to C/C++ libraries though, which makes extending it relatively easy.

Julia eschews structure by indentation. It also tries hard to be convenient and consistent, and IMO it does a great job of it. We are experimenting with using it for more than basic analytics, and it is installed on every single machine we ship, at /opt/scalable/bin/julia, and has been for years. As are Python 3 and Perl 5.xx.

These tools are part of our analytics stack, which has a few different versions depending upon physical footprint requirements.

Julia has made interacting with the underlying system trivial, as it should be, with few abstractions peeking out from underneath the syntax. This article discusses the differences from a pragmatic viewpoint.

Overall I agree with the points made. Perl, my go-to scripting language, has some of the Python issues (abstraction leakage). Perl 6 is better. Much better. Really … I have been looking into it in depth … and it is pretty incredible. Julia is better, and much better at the stuff that you’d want to use Python for.


OpenLDAP + sssd … the simple guide

Ok. Here’s the problem. A small environment for customers who are not really sure what they want and need for authentication. Yes, they asked us to use local users for the machines. No, the number of users was not small. AD may or may not be in the picture.

Ok, I am combining two sets of users with common problems here. In one case, they wanted manual installation of many users onto machines without permanent config files. In the other case I have to worry about admins who don’t want the hassle of dealing with administration on many machines.

Enter OpenLDAP. It’s basically a read-heavy directory service, using lots of old (outdated) concepts. It works fairly well once you get it set up, but getting it set up is annoying beyond belief. So much so that people look to Microsoft AD as an easier LDAP: a single unified authentication/authorization panel for their Windows/Linux environment.

For these cases, we don’t have buy-in from the groups running the AD. So we can’t connect to it.

Which means locally hosted LDAP.

This part is doable in appliance form. It is still not user friendly by any measure. I don’t have a problem with configuring the services … but the configs are beyond ugly. Not something we should be using in 2016.

Then there is the client side. Originally on Linux, you used the PADL tools (ldap*). Like the whole LDAP system, they are … well … ugly. They are non-trivial to use. You have to be very careful about how you invoke them, even for testing.

So Red Hat noticed this and wrote what is generally considered a saner replacement: SSSD. And it is generally better … sssd.conf is well documented, but there are few real working examples out there.

So here is one: an sssd.conf talking to a machine named ldap, which hosts an OpenLDAP database. Change ldap_search_base and ldap_uri to point to what you need.

[sssd]
config_file_version = 2
services = nss, pam
domains = LDAP

[nss]
filter_users = root
filter_groups = root

[domain/LDAP]
enumerate = true
cache_credentials = true

id_provider = ldap
auth_provider = ldap
chpass_provider = ldap

ldap_uri = ldap://ldap
ldap_search_base = dc=unison,dc=local
# following is debian specific
ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
entry_cache_timeout = 600
ldap_network_timeout = 2
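
One gotcha before starting things up: sssd (at least the versions we have used) refuses to start if sssd.conf is not root-owned with tight permissions, so lock it down and restart the service:

chmod 600 /etc/sssd/sssd.conf
chown root:root /etc/sssd/sssd.conf
systemctl restart sssd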

Then you need to modify parts of the PAM stack to make sure it makes use of this.

/etc/pam.d/common-password

password        sufficient                      pam_sss.so
password        [success=1 default=ignore]      pam_unix.so obscure try_first_pass sha512
password        requisite                       pam_deny.so
password        required                        pam_permit.so

/etc/pam.d/common-session

session [default=1]   pam_permit.so
session requisite     pam_deny.so
session required      pam_permit.so
session optional      pam_mkhomedir.so skel=/etc/skel umask=0077
session optional      pam_sss.so
session required      pam_unix.so 

Then, when you do this right, your test user is visible.

getent passwd | grep testuser1
testuser1:*:1000:501:testuser1:/home/testuser1:/bin/bash
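
If the user does not show up, it is worth checking that the LDAP server answers at all, independent of sssd. Something like the following works as a quick test (this assumes anonymous binds are allowed; add -D and -W for an authenticated bind):

ldapsearch -x -H ldap://ldap -b "dc=unison,dc=local" "(uid=testuser1)"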


M&A time: HPE buys SGI, mostly for the big data analytics appliances

I do expect more consolidation in this space. There aren’t many players doing what SGI (and the day job) does.

The story is here.

The interesting thing about this is that it is in the high performance data analytics appliance space. As they write:

The explosion of data — in volume and variety, across all sectors and applications — is driving organizations to adopt high-end computing systems to run compute-intensive applications and big data workloads that traditional infrastructure solutions cannot handle. This includes investments in big data analytics to quickly and securely process massive data sets and enable real-time decision making. High-end systems are being used to advance research in weather, genomics and life sciences, and enhance cyber defenses at organizations around the world.

As a result of this demand, according to International Data Corporation (IDC), the $11 billion HPC segment is expected to grow at an estimated 6-8% CAGR over the next three years, with the data analytics segment growing at over twice that rate.

12-16% CAGR for data analytics, which I think is low … and the point they make about the data explosion is exactly what we talk about as well.

I’ve written about this in the past: the cloud model (ultra cheap/deep/inefficient, scaling performance by throwing incredible amounts of hardware at a problem … at a fairly sizeable cost, even if it is OpEx), or … far more efficient, far faster, better designed systems that provide unapologetic massive firepower efficiently, so you need far … far less hardware to accomplish the same thing, with a corresponding savings in CapEx/OpEx.

There aren’t many players in this space, so let’s see what else happens.


@scalableinfo 60 bay Unison with these: 3.6PB raw per 4U box

Color me impressed … Seagate and their 60TB 3.5 inch SAS drive. Yes, the 60 bay Unison units can handle this. That would be 3.6PB per 4U unit. 10x 4U per 48U rack. 36PB raw per rack. 100PB in 3 racks, 30 racks for an exabyte (EB).

The issue would be the height of the storage bandwidth wall. Doing the math: 60TB/(1GB/s) -> 6 x 10^4 seconds to empty/fill a single drive. We can drive these at about 50GB/s in a box, so a full box would be 3600TB/(50GB/s), or 7.2 x 10^4 seconds to empty/fill. Network bandwidth would be the biggest issue … we could get 2x 100Gb NICs going full speed, but even that would still be 20% of where we need to be to keep it fully loaded.

This would need to be for an archive, and you’d need a mixture of object store and erasure codes on this. No way would you even consider a RAID on such a beast.


Raw Unapologetic Firepower: kdb+ from @Kx

While the day job builds (hyperconverged) appliances for big data analytics and storage, our partners build the tools that enable users to work easily with astounding quantities of data, and do so very rapidly, and without a great deal of code.

I’ve always been amazed at the raw power in this tool. Think of a concise functional/vector language, coupled tightly to a SQL database. That’s not quite an exact description; have a look at Kx’s website for a more accurate one.

A few years ago, I took my little Riemann Zeta Function test for a spin with a few languages, including kdb+ just to play with it. I am doing some more work with it now (32 bit version for testing/development).

That said, you need to see what this tool can do. Have a look at Fintan Quill’s (@FintanQuill) video of a talk/demo he gave at a meetup in Seattle in 2015. The demos start around the 20 minute mark.

The people in the audience appear to be blown away by the power they see. While we like to think our machine (running the demo db) has something to do with it, kdb+ is absolutely fantastic at dealing with huge quantities of time series data. You need to be able to store/move/process this quickly (which is where the machine comes in), but being able to use the data so succinctly, compared to what Spark/Hive/etc. do in so many more steps/lines of code, requiring so many more machines …

Tremendous power and power density save money and time. Packing a huge amount of power into a small package lets you use fewer packages to accomplish the same things as a system requiring many more packages. The cloud model is “spin up more instances to get performance by sharding and parallelism”, while kdb+ and the day job suggest “start out using very performant and efficient tools to begin with, so you need fewer of them to do the same things, which costs you less time/effort/money/etc.”

It is, in case you are not sure, the basis for the day job’s Cadence appliance. Massive fire power. Itty bitty box.

Imagine what you could do with this sort of power …


Seagate and ClusterStor: a lesson in not jumping to conclusions based on what was not said

I saw this analysis this morning on the Register’s channel site. This follows on the announcement of other layoffs and shuttering of facilities.

A few things. First, a disclosure: arguably, the day job, and more specifically our Unison product, is in “direct” competition with ClusterStor, though we never see them in deals. This may or may not be a bad thing, and is likely due more to market focus (we do big data, analytics, and insanely fast storage in hyperconverged packages) than anything else. SGI, HPE, and Cray all resell/rebrand ClusterStor under their own brands.

That out of the way, this is speculation on the part of the article. Granted, they are reading into what is, and is not, being said … spokespeople tend to choose words carefully, and work to “correct” (aka spin) what they perceive as an incorrect read on the matter. Indeed, Ken Claffey of Seagate strove to correct this in the first comment.

Even more to the point, the article itself wasn’t updated, but there is a new article indicating precisely this.

Short version: They are fine, just moving production elsewhere.

This actually highlights a danger in our very high frequency world. “Information” gets out into the wild, and it takes someone’s time/effort and a number of resources to bring this “information” to the point of being correct. I have no reason to disbelieve them … large companies move people/processes about all the time, specifically to leverage economies of scale and better cost structures elsewhere.

In the 1980s or so, IBM used to be (internally) nicknamed “I’ve Been Moved”.

I think the issue was assuming that the woes in the PC drive space extended to the enterprise/high performance space. I don’t think they do. Seagate may or may not choose to break out revenues/costs for each business unit; likely they provide some of this in their investor relations material.

I think it unlikely that they would have gone on the spending spree they have in this space, and then just shutter it when the PC space contracts.

All this said, in the bigger picture, the storage market is changing dramatically and quickly. Spinning disk is not necessarily toast, but in many of the designs we’ve worked on it is being relegated to the role that tape has traditionally filled. This is a fairly fundamental change. But remember, tape is still with us now. Think very long tail: very large volumes of data that cannot be effectively moved from tape to disk. Disk to SSD/NVM is possible, though I think disk still has a longer shelf life than NVM.


Systemd and non-desktop scenarios

So we’ve been using Debian 8 as the basis of our SIOS v2 system. Debian has a number of very strong features that make it a fantastic basis for developing a platform … for one, it doesn’t have significant negative baggage/technical debt associated with poor design decisions early on in the development of the system as others do.

But it has systemd.

I’ve been generally non-committal about systemd, as it seemed like it should improve some things, at a fairly minor cost in additional complexity. It provides a number of things in a very nice and straightforward manner.

That is … until … you run into the default config scenarios. These will leave you, as the server guy, asking “seriously … whiskey tango foxtrot???!?”

Well, ok, some of these are built atop Debian, so there is blame to share.

The first is the size of tmpfs (ramdisks). By default, this is controlled in early boot (and not controllable via a kernel boot parameter) by the contents of /etc/default/tmpfs. In it, you see this:

TMPFS_SIZE=20%VM

as the default. That is, each tmpfs you allocate will get 20% of your virtual memory total as its size, unless you specify a size. As it turns out, this is actually a bad thing: the /run directory is allocated early in the boot, is not governed by /etc/fstab (not necessarily a bad thing, as the fstab is a control point), and has no other control points …

root@unison:~# df -h /run
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            13G  2.5M   13G   1% /run

root@unison:~# grep run /etc/fstab
root@unison:~# 

Hey, look at that. It’s 13GB for a /run directory that would struggle to ever hit 1GB.

Ok, it’s tmpfs, so the allocation isn’t locked. But it is backed by swap.

UNLESS YOU TURN SWAP OFF IN WHICH CASE AAAARRRRRRGGGGGHHHHH

So … to recap … Whiskey Tango Foxtrot?

But, before you get all “hey, relax dude, it’s just one mount … chillax” … you have to ask about the interaction with other systemd technology (/run is mounted by systemd … oh yes … it is).

Like, I dunno. Logind mebbe?

So there you are. Logging into your machine. And you notice, curiously, you have this whole /run/user/$UID thing going on. And if you look closely enough, you see these are tmpfs mounts. And they are each getting 20% of VM.
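
You can watch these pile up as people log in; each logind session adds another /run/user/<uid> mount:

df -h -t tmpfs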

Starting to see the problem yet? No?

Ok. So you have these defaults … and a bunch of users. Who log in. And use up these resources.

Now, to add complexity, let’s say you have a swapfile rather than a swap partition. I am not a huge believer in swap … rather the opposite. If you are swapping, that is a strong signal you need more memory. If it is very rare swapping, once a month, on a non-critical system, sure, swapping is fine. If it is a daily occurrence under load on a production box, you need to buy more memory. Or tune your processes so they don’t need so much memory.

This swapfile is sitting atop a file system. This is a non-optimal scenario, but the user insisted upon swap, so we provided it. It is a failure waiting to happen, as filesystem IO requires memory allocations, which, if you think about what swap is and does, will be highly problematic in the context of actually swapping. That is, if you need to allocate memory in order to page out to disk, because you are trying to allocate memory … let’s just say that this is the stuff livelocks are made of.

And, of course, to make things worse, we have a caching layer between the physical device and the file system. One we can’t turn off completely. The caching layer also does allocations. With the same net effect.

Now that I’ve set out the chess pieces for you, let me explain what we’ve seen.

6 or 7 users log in. These tmpfs allocations are made. No swap. vm.overcommit=0. Failure. Ok, add swap. Change vm.overcommit=1. Make the allocatable percentage 85% rather than 50%. Rinse. Repeat.

Eventual failure.

Customer seriously questioning my sanity.

All the logs are showing allocation problems, but no swap. Change to vm.overcommit=2. Stamp a big old FAIL across any process that wants to overallocate. Yeah, it will catch others, not unlike the wild west of OOM killer, but at least we’ll get a real signal now.
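
Spelled out, that shorthand is (as far as I can tell) the kernel’s overcommit sysctls:

# strict accounting: refuse allocations beyond swap + overcommit_ratio percent of RAM
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=85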

… and …

who authorized 20% of RAM for these logins? The failures seem correlated with them.

That’s the /etc/default/tmpfs defaults (which are insane). Ok, we can fix those. But … still a problem, as logind thinks we should give this out.
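
The tmpfs side of the fix is just a saner fixed cap in /etc/default/tmpfs, something like this (512M is an arbitrary figure for illustration, not a recommendation):

TMPFS_SIZE=512M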

Deep in the heart of darkness … er … /etc/systemd/ we find logind.conf. Which has this little gem.

RuntimeDirectorySize=20%

as its default.

Um.

Whiskey. Tango. Foxtrot.

This is where you put user temp files for the session.

Yeah … for Gnome and other desktop use cases, sure, 20% may be reasonable for the vast majority of people.

Not so much for heavily used servers. For the same reasons as above.

Do yourself a favor, and if you have a server, change this to

RuntimeDirectorySize=256M

which may be overkill itself.
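
The setting lives in /etc/systemd/logind.conf, and takes effect for new sessions after restarting logind:

systemctl restart systemd-logind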

We really don’t need these insane (for servers) defaults in place … which is why I am wondering what else in the systemd defaults I am going to have to fix to avoid surprises …

I’ll document them as I run into them. We are building the fixes directly into SIOS, so our users will have our updated firmware on reboot.


You can’t win

Like that old joke about the patient going to the Doctor for a pain …

Patient: Doctor, it hurts when I do this (does some action which hurts)
Doctor: Don’t do it then

Imagine, if you will, a patient who, after being told what is wrong, and why it hurts, and what to do about it, continues to do it. And does it even more intensively. And then complains when it hurts.

This is a rough metaphor for some recent support experiences.

We do our best to convince them not to do the things that cause them pain, as in this case, they are self-inflicted.

I dunno. I try. I just don’t see any way to win here (i.e. for the patient in this case to come out ahead) until they make the changes that we recommended.


That was fun … no wait … the other thing … not fun

Long overdue update of the server this blog runs on. It is no longer running an Ubuntu flavor, but instead SIOSv2, which is the same appliance operating system that powers our products.

This isn’t specifically a case of eating our own dog food, but more a case that Ubuntu, even the LTS versions, has a specific sell-by date, and it is often very hard to update to the newer revs. I know, I know … they have this nice, friendly “upgrade me” button on their updater. So it’s “easy”. I could quote Inigo Montoya here …

Ok, so roll in SIOSv2, based upon Debian 8.x (there is a RHEL/CentOS version, but I am moving away from deploying those by default unless there is a customer request behind it, due to the extra effort in making everything work right; I might post on that sometime soon). Flip the OS disks. Reboot. Configure the network. Start up the VM.

The VM required that I import the disk and create a new config for it. Here, I really wish virsh behaved the same way as the VM system on SmartOS. For a number of reasons this unit couldn’t be a SmartOS box.

Ok. Had to fix the VM. Took about 10 minutes and done. Now name services and other things work. Yay.
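
For the curious, the fix amounted to roughly this (the XML path and guest name are placeholders, not the real ones):

# register the hand-written domain definition, then boot it
virsh define /root/blogvm.xml
virsh start blogvm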

Ok. Now install nginx and other bits for the blog. See, this is where containers would come in handy … and this unit is prepped and ready to go with two different container stacks (depending upon how I want to configure it later). But for the moment, we are building this as a monolith, with the idea of making it a microbox server later.

Install mysql and some php oddity, because WordPress.

Find my daily DB dump, import it, light up the blog and …

Everything is gone. Database connection error.

Ok.

Look at the DB dump. Looks a little small. Look for the blog stuff in it.

AND IT IS MISSING …. OMFG ….

Ok … what happened?

Didn’t I see some mysql error on a table a while ago? One I don’t use anymore in the blog? One that was corrupt?

Could that have blorked the dump?

Swap back to the old boot drives. Bring it up. Run mysqlcheck.

Sure enough, 1 broken table.

Ok, let’s fix it.
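
Which is roughly this (a sketch; adjust credentials to taste):

# check every database and repair what it can
mysqlcheck --all-databases --auto-repair -u root -p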

#insert "sounds_of_birds_and_crickets_chirping.h"

A while later, I redo the dump.

The 75MB file is now a 3.9GB file.

Yeah, was missing some data.

Grrrr… Bad mysql … Bad ….

Swap boot drives. Restart. Reimport. Rinse.

No repeat.

And it works.

Yay.


And this was a good idea … why?

The Debian/Ubuntu update tool is named “apt” with various utilities built around it. For the most part, it works very well, and software upgrades nicely. Sort of like yum and its ilk, but it pre-dates them.

This tool is meant for automated (i.e. lights-out) updates. No keyboard interaction should be required.

Ever.

For any reason.

However … a recent update to one particular package, in both Debian and Ubuntu, has resulted in installs/updates pausing. Because the person who built the update decided that it would be really … really good … if there were a text pager in the update process. So the update pauses until you quit the text pager, or page to the end of it.

That this is moronic is an understatement.

That this is wrong, minimizes how broken it is.

That this ever escaped QA boggles the mind.

Don’t make me interact with my fleet of machines for updates. Just … don’t.

If you feel you must, well … hand over maintenance of your code base to someone who understands how completely wrong this is.

It is 2016. We’ve got automated tooling going across all of our systems. Our systems will break with a forced manual interaction. Which means someone either wasn’t thinking clearly, or was unaware that this is 2016.
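
For reference, our automated runs already use the usual hands-off guards, something like the line below … which, as far as I can tell, is exactly what a package that spawns its own pager manages to defeat:

DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confold" dist-upgrade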

/sigh
