Our little skunkworks project boots!!!
Mwahahahaha! Must check off on our list
- design
- build
- boot
- ???
- profit (or something)!
Note to self: work on eeeeevul laughter …. And get step 4 ironed out too.
… and in the process, take down a drive, 5 of its friends, and our RAID card.
We have backups from before the move (15+ days old … sigh).
We’ve decided to go full monty on the new unit. It’s a JackRabbit JR4 with 12x 2TB drives: 2 hot spares and a 10 disk RAID6 (8x data drives). 2x OS drives (on SSDs, rear mount). Leaves us 12 open bays. We’ll probably put some database bits on it, using some SSDs. Nightly backups to the bigger unit.
Yeah … can’t tell you how unhappy I am with this. More work, less time for revenue generating work, as important data is on vault (our backup unit), which we have to copy back to this machine.
Only 2 weeks or so of changes “lost” and lots of this is recoverable given the way we do our workflows. Still … this is precious time. Maybe the other side of the unit will be a mirror of the first RAID. A big RAID6+1.
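For the record, the capacity arithmetic on that 10 disk RAID6 is simple enough to sanity check. This is just a back-of-the-envelope sketch in Python: the function name is made up for illustration, and it counts raw capacity only (no filesystem overhead, no TB vs TiB fussing).

```python
def raid6_usable_tb(disks_in_array: int, disk_tb: float) -> float:
    """Raw usable capacity of one RAID6 set: two disks' worth goes to parity."""
    if disks_in_array < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks_in_array - 2) * disk_tb


total_drives = 12   # 2TB drives in the unit
hot_spares = 2      # held back, not part of the array
array_disks = total_drives - hot_spares

usable = raid6_usable_tb(array_disks, 2.0)
print(f"{array_disks} disk RAID6 of 2TB drives: {usable:.0f} TB raw usable")
```

Mirroring that array onto the other side of the unit (the RAID6+1 idea) would double the drive count without adding any usable capacity. It buys redundancy, and it still isn't a backup.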
BTW: this machine was/is our mail, web, wiki, …. etc. Not to mention our data store.
If you haven’t heard me say it before, let me be absolutely clear: RAID IS NOT A BACKUP!!!! EVER.
So in order to (really) bring my monitoring app into the modern age, I want to change its flow from a synchronous on-demand event driven analysis and reporting tool, to an asynchronous monitoring and analysis tool, with an on-demand “report” function which is basically a presentation core atop the data set.
There are many reasons for this. Not the least of which is that this should be far more efficient at handling what I want to do … not to mention more responsive. I also don’t really want to do this as many independent processes … past history with debugging many independent but functionally interdependent processes.
What we are fundamentally doing is parsing logs. Right now, it’s Apache logs, but a well designed system should be able to parse any logs with the addition of a basic bit of parser code (no, not a grammar … but something nice and simple).
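To make the “add a basic parser” idea concrete, here is a minimal sketch of a parser registry. It is in Python rather than the Perl the real tool is written in, and every name in it (`PARSERS`, `register`, `parse_line`, the toy `keyvalue` format) is hypothetical, made up for illustration. A parser is just a function that takes a line and hands back a dict of fields, or None if the line doesn’t match.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]
Parser = Callable[[str], Optional[Record]]

# Maps a log type name to the function that knows how to parse one line of it.
PARSERS: Dict[str, Parser] = {}


def register(name: str):
    """Decorator that adds a parser function to the registry under `name`."""
    def wrap(fn: Parser) -> Parser:
        PARSERS[name] = fn
        return fn
    return wrap


@register("keyvalue")
def parse_keyvalue(line: str) -> Optional[Record]:
    """Toy parser for lines like 'level=error msg=disk_full'."""
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    return fields or None


def parse_line(kind: str, line: str) -> Optional[Record]:
    """Dispatch a line to whichever parser is registered for this log type."""
    return PARSERS[kind](line)
```

Supporting a new log type then means writing one more small function and registering it. The rest of the monitoring loop doesn’t have to change.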
So what if we wanted to run the parser when the log gets updated? Ok, I know … there are some codes that are smart enough to trigger an event upon an action. Assume for the moment that we are dealing with something where this isn’t true.
Let me go far afield from Apache. And look at Gluster. Its logs are (at best) a horrible … horrible mess. Extracting anything useful from them is very hard. And unfortunately, with many more people depending upon it, we have to parse the output, and at least generate some sort of signal when dejecta impacts the rotating air movement system.
But the same is true of other servers as well. The issue is that there really is no good standard for this right now. Something with one of the message queues and a nice standard format? Would be nice. Until then, we have to ()*&*(&^%&% parse ()*&*&^%$%$%$ logs.
Apache is my stand in for a good test case.
So, rather than wait for an external query event to look at stuff, why not set up a nice asynchronous inotify based log reader? Maintain local state only during program execution. Read till the end of the file on startup, calculate the offset, turn on an inotify listener, and only scan the changes from the offset to the end of the file on the write event … updating the internal offset, and doing whatever needful thing we need to do after parsing the data?
Yeah, it’s more complex. But it gives us far more power.
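The offset bookkeeping part of that is small. Below is a rough sketch in Python (not the Perl the real tool is written in). The class and method names are made up, and the inotify wiring is deliberately left out: the assumption is that whatever listener you use simply calls `read_new()` on each write event. The handling of truncated files and partial lines is my own addition, not something described above.

```python
import os


class LogTail:
    """Tracks a byte offset into a log file; state lives only in memory."""

    def __init__(self, path: str):
        self.path = path
        # On startup, skip everything that is already in the file.
        self.offset = os.path.getsize(path)

    def read_new(self) -> list:
        """Return the complete lines appended since the last call."""
        size = os.path.getsize(self.path)
        if size < self.offset:
            # File shrank (rotated or truncated): start over from the top.
            self.offset = 0
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
        # Consume only up to the last newline, so a half-written line
        # is picked up whole on the next event.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8", errors="replace").splitlines()
```

Each call returns only what was written since the previous one, which is exactly what you want to hand to the parser on a write event.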
Good article from Matt Asay in The Register today.
While there are some stand-out success stories in Silicon Valley, there is also a raft of startups pushing product features masquerading as companies. Some of these will be acquired by the likes of Zynga and Amazon, and some will go public. But most won’t. In fact, building in the shadow of the giants in the hopes of getting acquired, or of beating them, is a losing strategy for the vast majority of companies.
This is about as truthful as it gets. There are many tiny startups, pulling in various fractions of $1M to more than $10M to develop … product features.
Is this really the right approach for VCs?
And this opens up some interesting new questions on startups and their product offerings themselves. Take Netflix. Running on Amazon S3. And what does Amazon do? Decide they want that market and go after it. To wit
(this was actually a while ago, just getting to publishing it now).
Friday, I drove up to a local University to drop off our bid. I sent a note beforehand to let them know I might be a few minutes late, there was construction. Sure enough, got caught in a 30 minute slowdown.
I was 13 minutes late.
They said, “hey that’s great. We won’t look at it”
Then on the way back, the old landlord refused to acknowledge that we were tenants, so they refused to refund our deposit.
As I told my wife later in the day, had I been kicked in the groin, my day would have improved some.
No, not Meatloaf lyrics. A few years ago, I guessed that the HPC market was going to bifurcate or possibly trifurcate. Well, it’s about 3 years on, and bifurcate it did.
Accelerators (in the form of GPUs) are everywhere. I was dead on correct in almost every aspect of what I had predicted (privately to VCs, from whom we couldn’t raise a cent in the early/mid 2000s for this market).
Remote cluster/clouds with dropping prices per CPU hour are taking over sections of HPC, and we see some impact upon purchase decisions made by people buying clusters. Buy what you will use day to day, and buy the extra cycles you need when you need em. Just in time cycle acquisition.
Got these right. And yes, we even tried raising money for the cloud bit in 2005/2006. This time from a (short sighted) state program and VCs. Had a large customer lined up, had a VC willing to chip in, just needed the state program to agree to this.
That state program is now generally seen as an abject failure in its previous incarnation … it was supposed to help start up companies with good ideas, VCs, and likely customers. Go figure.
Of course, I got some things wrong.
I guessed that “muscular desktops” and “personal supercomputers” would become the norm.
Boy was I wrong.
Desktops, the ones that people bought, were cheap units for the most part. The big powerful supercomputer in a deskside chassis? Not selling so much. More than 8 processor cores and more than 16 GB ram? Not so interesting to people.
I had bet they were, and we built the Pegasus deskside units around them. These are basically very powerful computers with many cores, huge amounts of ram, accelerators, IO, networking, and graphics.
Seems I’m not alone in the world wanting to parse apache log files. I googled lots of people bitterly complaining about it. Some folks wanted to write a grammar, and a flex/yacc/bison thingy. I am sure that there are some Java programmers who’ve been working on this … oh … 6 or 7 years or so, and may be approaching a solution, with a Java byte code only slightly below 1 PB in size.
But I digress. This is the core of the code I’ve mentioned before, and darn it, I wanted to get the logging in shape. So I looked at the horrible morass of terrible … ancient code. Really horrible stuff that. And I looked at the logs.
And thought to myself … dammit, I can make a regex that handles this.
So I tried, and … sure enough, it works.
@column = ($line =~ /(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\S+)\s+\[(\d+\/\S+\/\d+):(\d+:\d+:\d+)\s+([-+]?\d+)\]\s+\"(.*?)\s+HTTP\/\d+\.\d+\"\s+(\d+)\s+(\d+|-)\s+\"(.*?)\"\s+\"(.*?)\"/);
# parsed it BABY!!!
# c[n] below is $column[n]
# c[0] = IP address (dots escaped so they match only literal dots)
# c[1] = remote logname from identd (%l, usually -)
# c[2] = authenticated user name (%u, or - for none)
# c[3] = date
# c[4] = time
# c[5] = timezone (relative to GMT)
# c[6] = incoming request (GET, PUT, HEAD, ... with relative URI part)
# c[7] = return code (200, 404, ...)
# c[8] = size of returned data in bytes (or - when no body was sent)
# c[9] = referrer (or - for none)
# c[10]= User Agent string
As Chris pointed out, there’s an XKCD for that.
Yeah. Baby! My inner loop just lost 80% of its lines. Much easier to understand (is it wrong that I can parse some subset of regexes in my head? The recursive ones give me a headache and I have to start banging my head against the wall to stop them).
A minor error in my edits in the loop, will fix now. Nice that this works so well …
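For anyone who wants to poke at this outside of Perl, here is a rough Python port of the same idea using named groups, so the column numbers go away. Treat it as a sketch and not the code that runs in my loop. The group names, the `parse_combined` helper, and the sample line are all made up for illustration. It also escapes the dots in the IP address and accepts `-` in the size field, which Apache writes when no body is sent.

```python
import re

# One combined-format access log line, split into named fields.
COMBINED = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<date>\d+/\S+/\d+):(?P<time>\d+:\d+:\d+)\s+(?P<tz>[-+]?\d+)\]\s+'
    r'"(?P<request>.*?)\s+HTTP/\d+\.\d+"\s+'
    r'(?P<status>\d+)\s+(?P<size>\d+|-)\s+'
    r'"(?P<referrer>.*?)"\s+"(?P<agent>.*?)"'
)


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


# Made-up sample line, for illustration only.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

record = parse_combined(sample)
if record:
    print(record["ip"], record["status"], record["request"])
```

Lines that don’t match come back as None, so the caller can count or log the misses rather than silently dropping them.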
Way back in my early days at web programming stuff, I started out with HTML::Mason as a templating engine. There is nothing wrong with Mason, it’s actually quite good. But it encourages the same sort of “code-in-page” designs that the entire language of PHP was built around.
I’m mostly a Perl guy for application level stuff these days … have done my time with Fortran, Python, x86 assembly, C/C++, and many others. I have my biases, and I understand them. I simply don’t have time to let a language get in the way of expressing what I want to do.
So I wrote a web log analyzer a while ago. It was a Mason application, with the logic and the page intertwined. It was a mess (just like most PHP turns out). Very hard to understand, but it worked, and darn it, if I wasn’t using it for the last ~8 years or so.
About 5 years ago, I started working with the Catalyst framework for web sites. And I tried rewriting this app in it. But it was, unfortunately, not easy. So I gave up and left it alone.
With my recent EC2 foray, I thought about revisiting this code. I’ve since switched all of our development over to Mojolicious, which is very easy to write code for. And easy to separate out the view and controller functions (PHP and Mason encourage a fused view and controller).
Mojolicious is better in that it’s very easy to write and deploy … its installation dependency radius is minimal (i.e. how much extra crap you have to install to make it work). Catalyst is huge in this regard, and Jifty, another framework I’ve liked in the past, is Ginormous. There are some other frameworks similar to Mojolicious out there, like Dancer. But Dancer (and Catalyst) all use Moose, and Moose has a fairly huge dependency radius (and it is slow).
Turns out Comcast doesn’t follow through (even when you call them many times to try to get them to). Thanks #Comcast .
On Thursday, I bought a Mifi (pay as you go) from Verizon. Got it into the office. Had moved the web/mail stuff to Amazon EC2 “just in case” Comcast pulled a … well … Comcast.
Yeah, took me a little while to fix the email and web side. We’ve been using our router appliance as our SOA for dns, and I had to unplug it at the old site (got everything out before 5pm Friday). So we now have a pair of machines running in EC2 (not reserved instances yet, I don’t think we need them until I get a read from Comcast on exactly how long it will take to get fracking feedback from them on when they might even deign to come out to wire us up … the box is right fracking next to my rear door, just one hole … and some wiring … geez).
ok, enough grousing.
Got some of the net infrastructure up there. Have the Comcast cable modem tied into the appliance, and tied into our GbE switch. Also have the Wifi up.
I may connect another appliance to this, configure it to use the Mifi as its gateway, and then just wire up the rest of the machines. Won’t be able to get customers into the site, and we have a few waiting.
But the EC2 setup was pretty painless. I’d like more options though, in terms of number of virtual cores with a 4GB memory size. Mail doesn’t take much, but web serving with Drupal can. We have a database, a web server, and other bits. Running on a single instance. Single virtual core. Sometimes it bogs down.
Might cycle it for a 4 core instance for a while. I can deal with a couple of days like that if needed.
I was amused by the Amazon cost comparison. Apparently, it costs me more than $100k to run my own servers per year. Who knew?
Ok, those are marketing numbers, and they are complete crap. They have little bearing on reality. We know that. Running our web/email presence on EC2 in perpetuity would be more than an order of magnitude more costly than running it ourselves.
The real value in EC2 is that you can spin it up pretty quickly to handle crap like this.
Even after we get our service back, I’ll probably keep these instances at least stored somewhere, so I can spin em up quickly if needed.
What would be awesome, from my perspective, and I am sure someone has something like this (not the VMware product) … is to take a real machine, and convert it into an AMI, so we can upload it. That would be useful. Haven’t tried this yet, haven’t searched for the code to do this with. VMware has some product like this … we tried the 4.0x version that sorta kinda almost worked.
The inverse process would be nice as well, but right now I want to take physical and make virtual out of it.
Even better would be being able to run these locally. I think Eucalyptus does the “run locally” version, but last I’ve heard they aren’t doing well. Might prefer to go with OpenStack like things and kvm.
Anyone know of a nice physical -> kvm VM converter? And last I saw, Amazon doesn’t (yet) run kvm machines. Hopefully this will change, or AMI <-> kvm converters will start working well.