Riaking to Docker

For me, Riak is the true key-value store, the one that epitomizes everything about a distributed database system. I gave a presentation on it in Feb 2010.

Docker is the new kid on the block, with a fantastic mechanism for delivering packages as self-contained images that are ready to run anywhere Docker is installed (Linux).

Here is the Dockerfile for creating a Docker image with [Riak] in it.

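FROM ubuntu:12.04
RUN apt-get update
# curl and lsb-release are not in the base image; both are needed below
RUN apt-get install -y curl lsb-release
# Trust the Basho signing key and register the Basho apt repository
RUN curl http://apt.basho.com/gpg/basho.apt.key | apt-key add -
RUN bash -c "echo deb http://apt.basho.com $(lsb_release -sc) main > /etc/apt/sources.list.d/basho.list"
# Refresh the package index and install Riak from the new repository
RUN apt-get update
RUN apt-get install -y riak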

The Dockerfile above is available as a gist here.

Why ubuntu:12.04 and not ubuntu:latest?

There is no Riak build for Trusty, the latest LTS release of Ubuntu. And yes, the Riak documentation does not mention this explicitly.

Why install curl and lsb-release?

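These packages are not part of the base Ubuntu Docker image, and the Dockerfile needs both: curl fetches the Basho signing key, and lsb_release supplies the release codename for the apt source line. I would have expected at least lsb-release to be in the base image, and why not curl as well?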

How to create an image from this Dockerfile?

I put the Dockerfile in a directory created exclusively for the build, and then run this command from inside that directory.

$ sudo docker build -t samof76/riak .

How to run an instance of this container?

This is the simplest way; there are other options, depending on what you need.

$ sudo docker run -i -t samof76/riak /bin/bash

Since the image is pushed to the Docker Hub, it can be run anywhere Docker is installed.
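If you want to drive the two commands above from a script, here is a minimal sketch in Python. The helper names build_command and run_command are my own invention; they only assemble the same argument lists shown above and do not talk to Docker themselves.

```python
import shlex

IMAGE = "samof76/riak"  # image tag used in this post


def build_command(image=IMAGE, context="."):
    """Return the argv list for building the image from a build directory."""
    return ["sudo", "docker", "build", "-t", image, context]


def run_command(image=IMAGE, shell="/bin/bash"):
    """Return the argv list for starting an interactive container."""
    return ["sudo", "docker", "run", "-i", "-t", image, shell]


if __name__ == "__main__":
    # Only print the commands; hand a list to subprocess.run(...) to
    # actually invoke Docker.
    for argv in (build_command(), run_command()):
        print(" ".join(shlex.quote(part) for part in argv))
```

Keeping the commands as lists rather than strings means they can be passed to subprocess.run without going through a shell.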

Out to Control Flask

Pocoo has finally decided to exercise some control over Flask and its extensions.

Metaflask is out to help in this regard. It is an interesting project for governing the Flask extensions, whose number keeps growing along with the popularity of Flask itself.

Stewardship is an interesting concept to watch: members who volunteer as stewards would be responsible for the sanity of an extension.

PyPI Blues!

There are times when your administrator blocks the sites that host package indexes. This happened to me a couple of days back, when the main PyPI was blocked and I could not install any Python packages with either pip or easy_install. I tussled with the admins, and all I was doing was hitting a wall. If this sounds familiar, I have a solution for you.

Luckily, Python package indexes are mirrored, and there are many mirrors around. You can find some of the popular ones here. All you have to do is the following.

In the ~/.pip directory, create a pip.conf file if you don't have one yet, or edit the existing one, so that it has the following lines.

[global]
index-url = http://mirror.picosecond.org/pypi/simple
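If you would rather script that edit, here is a minimal sketch using Python's standard configparser module. The function name set_index_url is my own, and the sketch assumes pip reads the setting from a [global] section of pip.conf. Any comments in an existing file are lost when it is rewritten.

```python
import configparser
from pathlib import Path

MIRROR = "http://mirror.picosecond.org/pypi/simple"  # mirror chosen in this post


def set_index_url(conf_dir, index_url=MIRROR):
    """Create or update pip.conf in conf_dir so that pip uses index_url."""
    conf_dir = Path(conf_dir).expanduser()
    conf_dir.mkdir(parents=True, exist_ok=True)
    conf_path = conf_dir / "pip.conf"

    # interpolation=None keeps any '%' characters in existing values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)  # a missing file is silently skipped
    if not parser.has_section("global"):
        parser.add_section("global")
    parser.set("global", "index-url", index_url)

    with conf_path.open("w") as handle:
        parser.write(handle)
    return conf_path
```

Calling set_index_url("~/.pip") would apply it to the directory used in this post; pass a different second argument to pick another mirror.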

Now try installing something with pip.

$ pip install wtf
Downloading/unpacking wtf
  http://mirror.picosecond.org/pypi/simple/wtf/ uses an insecure transport scheme (http). Consider using https if mirror.picosecond.org has it available
  Downloading wtf-0.1.tar.gz
  Running setup.py (path:/opt/sandbox/python/build/wtf/setup.py) egg_info for package wtf

Installing collected packages: wtf
  Running setup.py install for wtf

Successfully installed wtf
Cleaning up..

I chose the mirror.picosecond.org/pypi mirror, but you are free to pick any mirror you wish. Enjoy!

Befriending Pandas - I

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Why this kung fu?

Some kung fu this is gonna be. I promised a friend I would write about pandas, so that people like her, who gracefully accept their noviceness (no experience in programming, or no experience in statistics), will be able to understand, grasp, and wield pandas.

Some hair has fallen out since I made that promise, from all the scratching and thinking about how the hell to start and where the hell to start. Finally I said to myself: I will start writing and see where that takes me. So here I am, doing kung fu with my alter ego, the novice.

Eats Shoots and Leaves

The key aspect of all this starts from the very beginning, your data. It is absolutely imperative to have some kind of data to work on. And it’s absolutely essential that you understand this data.

This, I guess, has nothing to do with programming, and nothing to do with statistics (as yet). So make sure you have your data. And since I cannot see yours, I have my own: The Bamboo Shoot Heights. What better data could I find for pandas to munch on?

A disclaimer: this is not the best data, but it makes a good introductory diet for pandas. Isn’t that cute?

Pandas eating shoots

Slither a Bit

Try moving without any limbs: you can do nothing but slither. That is best left to snakes, which is why we take the help of two huge ones to wield pandas. These are Python and Anaconda.

Python is the programming language, and Anaconda is a set of tools for data analysis, pandas included, bundled with Python. So if you get and install [Anaconda], you have everything you need to get started.
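Once the install is done, a quick way to confirm that the bundle is in place is to ask Python whether it can find the modules. This is a small sketch of my own using only the standard library; the helper name is_installed is made up for this post.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pandas depends on numpy, so both should be found after the install.
    for name in ("pandas", "numpy"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either one shows up as missing, the Python you are running is probably not the one that Anaconda installed.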

Next up is petting Anaconda until it works for you.

GRSecurity: Harden those Boxes


For over the past decade, grsecurity has provided webhosting companies and other users of Linux the highest level of security available for any mainstream OS.

Unlike other expensive security “solutions” that pretend to achieve security through known-vulnerability patching, signature-based detection, or other reactive methods, grsecurity provides real proactive security. The only solution that hardens both your applications and operating system, grsecurity is essential for public-facing servers and shared-hosting environments.

Only grsecurity provides protection against zero-day and other advanced threats that buys administrators valuable time while vulnerability fixes make their way out to distributions and production testing.

Add increased authentication for administrators, audit important system events, and confine your system with no manual configuration through advanced Role-Based Access Control.

Use Trusted Path Execution to prevent users from executing their own binaries or binaries in unsafe locations.

Invisibly reinforce the most common filesystem isolation, turning it into a true jail.

Through partnership with the PaX project, creators of ASLR and many other exploit prevention techniques — some now imitated by Microsoft and Apple, grsecurity makes many attacks technically and economically infeasible by introducing unpredictability and complexity to attempted attacks, while actively responding in ways that deny the attacker another chance.

Available for free under the GNU GPL version 2 with commercial support and the opportunity to sponsor our work, grsecurity brings you the security of the next decade, today.

Boxen: Automatic for Mac


Boxen is your team’s IT robot. It’s a dangerously opinionated framework that automates every piece of your development environment. GitHub, Inc. wrote the first version of Boxen (imaginatively called “The Setup”) to help employees start shipping on day one. It’s configuration management for everyone: Designers, HR mavens, legal eagles, and developers. We believe that development is production, so we value consistency, predictability, and reproducibility over artisanal, hand-tweaked development environments.

We ditched The Setup and wrote Boxen so it’s easily usable by any company, not just GitHub. We’ve extracted most Boxen features into modules that can be mixed and matched to create your perfect environment, and custom behavior is always just a module away.

Astyanax: Brother to Cassandra


Astyanax is a Java Cassandra client library. Astyanax was the son of Hector in Greek mythology. As such, Astyanax is a refactoring of Hector into a cleaner abstraction for the connection manager and a simpler API.

Astyanax provides a complete abstraction of the connection pool implementation from the API layer. Some key features include:

  1. Automatic failover with context
  2. Pinning request to a specific host
  3. Host partitions based on token ranges
  4. Pluggable latency tracking strategy
  5. Pluggable host selection (ex. Round Robin, Lowest latency first)
  6. Pluggable bad host detector to determine when to mark a host as down (ex. if it times out too frequently)
  7. Pluggable monitor interface. There is no logging inside the connection pool.
  8. Pluggable host retry backoff strategy.
  9. Pluggable node discovery strategy. Can use ring_describe or custom node registry service.
  10. Minimal use of synchronized by using non-blocking data structures
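To make "pluggable host selection" and "pluggable bad host detector" a little more concrete, here is a rough sketch of how the two ideas can fit together. It is written in Python for brevity and is not Astyanax's Java API; the class names, the max_timeouts threshold, and the host names are all invented for illustration.

```python
import itertools


class TimeoutCountDetector:
    """Marks a host as down once it has timed out max_timeouts times."""

    def __init__(self, max_timeouts=3):
        self.max_timeouts = max_timeouts
        self.timeouts = {}

    def record_timeout(self, host):
        self.timeouts[host] = self.timeouts.get(host, 0) + 1

    def is_down(self, host):
        return self.timeouts.get(host, 0) >= self.max_timeouts


class RoundRobinSelector:
    """Cycles through hosts in order, skipping hosts the detector marks down."""

    def __init__(self, hosts, detector):
        self.hosts = list(hosts)
        self.detector = detector  # any object with an is_down(host) method
        self._cycle = itertools.cycle(self.hosts)

    def next_host(self):
        # Try each host at most once per call so the loop always terminates.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if not self.detector.is_down(host):
                return host
        raise RuntimeError("no live hosts available")


if __name__ == "__main__":
    detector = TimeoutCountDetector(max_timeouts=2)
    selector = RoundRobinSelector(["node1", "node2", "node3"], detector)
    detector.record_timeout("node2")
    detector.record_timeout("node2")  # node2 is now considered down
    print([selector.next_host() for _ in range(4)])
```

The selector only depends on the detector's is_down method, so a different detection policy can be swapped in without touching the selection logic, which is the point of making both pieces pluggable.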

Provided implementations

  1. Basic round robin
  2. Token aware
  3. Bag of connections
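For the token aware idea, the sketch below shows one way to route a key to the host that owns its token. Again this is Python, not Astyanax code. It assumes each host owns the range from the previous host's token (exclusive) up to its own token (inclusive), and the MD5-based token_for hash, the class name, and the ring_size default are illustrative choices of mine rather than anything taken from Astyanax.

```python
import bisect
import hashlib


class TokenAwareSelector:
    """Maps a key to the host whose token range contains the key's token."""

    def __init__(self, host_tokens, ring_size=2**127):
        # host_tokens: {host: token}; a host owns (previous token, its token].
        self.ring_size = ring_size
        self._ring = sorted((token, host) for host, token in host_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def token_for(self, key):
        # Illustrative hash only: an MD5 digest reduced onto the ring.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.ring_size

    def host_for_token(self, token):
        # First host whose token is >= the key's token owns the key.
        index = bisect.bisect_left(self._tokens, token)
        if index == len(self._tokens):
            index = 0  # wrap around past the highest token
        return self._ring[index][1]

    def host_for_key(self, key):
        return self.host_for_token(self.token_for(key))
```

With hosts at tokens 100, 200, and 300, a key whose token is 150 goes to the host at 200, and a key whose token is 350 wraps around to the host at 100.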