Sure Easy! Being Sam.

Now this is what happens when you don’t know cloud.

Out to Control Flask

Pocoo has finally decided to exert some control over Flask and its extensions.

Metaflask is meant to help with this. It is an interesting project for governing Flask extensions, whose number keeps growing along with the popularity of Flask itself.

Stewardship is an interesting concept to watch: members who volunteer as stewards are responsible for the sanity of an extension.

Migrating From AWS to FB

instagram-engineering:

When Instagram joined Facebook in 2012, we quickly found numerous integration points with Facebook’s infrastructure that allowed us to accelerate our product development and make our community safer. In the beginning, we built these integrations by effectively bouncing through Facebook web…

This is a great effort, moving such a big elephant from AWS to Facebook. I searched GitHub for Neti but could not find it. A couple of interesting things to note here:

  1. The migration within AWS from Classic EC2 to VPC is itself a major undertaking.
  2. Instagram was running Ubuntu on EC2.
  3. Facebook runs on CentOS.

It would be interesting if Instagram threw some light on the issues they faced when migrating from one distro to another.

PyPI Blues!

There are times when your administrator blocks the sites that host package indexes. This happened to me a couple of days back: the main PyPI was blocked and I could not pip or easy_install any Python packages. I tussled with the admins, and all I was doing was hitting a wall. If this sounds familiar, I have a solution for you.

Luckily, Python package indexes are mirrored and there are many mirrors around. You can find some of the popular ones here. All you have to do is the following.

In the ~/.pip directory, create a pip.conf file if you don't have one yet, or edit the existing one, so that it has the following lines.

[global]
index-url = http://mirror.picosecond.org/pypi/simple

Now try to install something using pip.

$ pip install wtf
Downloading/unpacking wtf
  http://mirror.picosecond.org/pypi/simple/wtf/ uses an insecure transport scheme (http). Consider using https if mirror.picosecond.org has it available
  Downloading wtf-0.1.tar.gz
  Running setup.py (path:/opt/sandbox/python/build/wtf/setup.py) egg_info for package wtf

Installing collected packages: wtf
  Running setup.py install for wtf

Successfully installed wtf
Cleaning up..

I chose the mirror.picosecond.org/pypi mirror, but you are free to choose any mirror you wish. Enjoy!
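If you would rather script the pip.conf step than edit the file by hand, here is a minimal sketch using only the standard library. The function name `set_pip_index` and its `conf_path` parameter are my own inventions for illustration, and the default path simply follows the ~/.pip/pip.conf location used in this post.

```python
import configparser
from pathlib import Path


def set_pip_index(index_url, conf_path=None):
    """Set index-url in the [global] section of a pip.conf file.

    Hypothetical helper: the name and the conf_path parameter are
    assumptions made for this sketch, not part of pip itself.
    """
    # Default to the ~/.pip/pip.conf location described in the post.
    path = Path(conf_path) if conf_path else Path.home() / ".pip" / "pip.conf"
    path.parent.mkdir(parents=True, exist_ok=True)

    # Disable interpolation so a '%' in a URL is stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently skipped

    # Keep any existing settings and only touch index-url.
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "index-url", index_url)

    with path.open("w") as handle:
        config.write(handle)
    return path


if __name__ == "__main__":
    written = set_pip_index("http://mirror.picosecond.org/pypi/simple")
    print("wrote", written)
```

For a one-off install you can skip the file altogether and pass the mirror on the command line with pip's --index-url option.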

Befriending Pandas - I

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Why this kung fu?

Some kung fu this is gonna be. I promised a friend I would write about pandas, so that people like her, who gracefully accept their noviceness (no experience in programming, or no experience in statistics), will be able to understand, grasp and wield pandas.

Some hair has fallen off since I made that promise, just scratching and thinking about how the hell and where the hell to start. Finally I said to myself: I will start writing and see where that takes me. So here I am, doing kung fu with my alter ego, the novice.

Eats Shoots and Leaves

The key aspect of all this starts from the very beginning, your data. It is absolutely imperative to have some kind of data to work on. And it’s absolutely essential that you understand this data.

This, I guess, has nothing to do with programming, and nothing to do with statistics (as yet). So make sure you have your data. And since I cannot see yours, I have my own: The Bamboo Shoot Heights. What better data could I find for pandas to munch on?

I should add a disclaimer here: this is not the best data, but it makes a good introductory diet for pandas. Isn’t that cute?

Pandas eating shoots

Slither a Bit

Try moving without any limbs: there is nothing to do but slither. That is best left to snakes, which is why we take the help of two huge ones to wield pandas. These are Python and Anaconda.

Python is the programming language, and Anaconda is a set of tools for data analysis, pandas included, bundled with Python. So if you get and install [Anaconda], you have everything you need to get started.
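Once the install is done, a quick sanity check is to ask Python whether it can actually find pandas. This is a small sketch using only the standard library; the helper name `missing_packages` is made up for this post, and the default package list is just my guess at what a novice would want to confirm first.

```python
import importlib.util


def missing_packages(names=("pandas", "numpy")):
    """Return the top-level package names that Python cannot find.

    Hypothetical helper for illustration; pass only top-level names
    (for example "pandas", not "pandas.io").
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All set, the pandas can eat.")
```

The check only looks the packages up without importing them, so it stays fast even on a large install.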

So next up is petting Anaconda to work for you.

Metalsmith: Working now on Static Sites

GRSecurity: Harden those Boxes

GRSecurity

For over the past decade, grsecurity has provided webhosting companies and other users of Linux the highest level of security available for any mainstream OS.

Unlike other expensive security “solutions” that pretend to achieve security through known-vulnerability patching, signature-based detection, or other reactive methods, grsecurity provides real proactive security. The only solution that hardens both your applications and operating system, grsecurity is essential for public-facing servers and shared-hosting environments.

Only grsecurity provides protection against zero-day and other advanced threats that buys administrators valuable time while vulnerability fixes make their way out to distributions and production testing.

Add increased authentication for administrators, audit important system events, and confine your system with no manual configuration through advanced Role-Based Access Control.

Use Trusted Path Execution to prevent users from executing their own binaries or binaries in unsafe locations.

Invisibly reinforce the most common filesystem isolation, turning it into a true jail.

Through partnership with the PaX project, creators of ASLR and many other exploit prevention techniques — some now imitated by Microsoft and Apple, grsecurity makes many attacks technically and economically infeasible by introducing unpredictability and complexity to attempted attacks, while actively responding in ways that deny the attacker another chance.

Available for free under the GNU GPL version 2 with commercial support and the opportunity to sponsor our work, grsecurity brings you the security of the next decade, today.

Boxen: Automatic for Mac

Boxen

Boxen is your team’s IT robot. It’s a dangerously opinionated framework that automates every piece of your development environment. GitHub, Inc. wrote the first version of Boxen (imaginatively called “The Setup”) to help employees start shipping on day one. It’s configuration management for everyone: Designers, HR mavens, legal eagles, and developers. We believe that development is production, so we value consistency, predictability, and reproducibility over artisanal, hand-tweaked development environments.

We ditched The Setup and wrote Boxen so it’s easily usable by any company, not just GitHub. We’ve extracted most Boxen features into modules that can be mixed and matched to create your perfect environment, and custom behavior is always just a module away.

Astyanax: Brother to Cassandra

Astyanax

Astyanax is a Java Cassandra client library. Astyanax was the son of Hector in Greek mythology. As such, Astyanax is a refactoring of Hector into a cleaner abstraction for the connection manager and a simpler API.

Astyanax provides a complete abstraction of the connection pool implementation from the API layer. Some key features include,

  1. Automatic failover with context
  2. Pinning request to a specific host
  3. Host partitions based on token ranges
  4. Pluggable latency tracking strategy
  5. Pluggable host selection (ex. Round Robin, Lowest latency first)
  6. Pluggable bad host detector to determine when to mark a host as down (ex. if it times out too frequently)
  7. Pluggable monitor interface. There is no logging inside the connection pool.
  8. Pluggable host retry backoff strategy.
  9. Pluggable node discovery strategy. Can use ring_describe or custom node registry service.
  10. Minimal use of synchronized by using non-blocking data structures

Provided implementations

  1. Basic round robin
  2. Token aware
  3. Bag of connections
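To get a feel for what round robin host selection, failover and a pluggable bad host detector mean together, here is a toy sketch. It is written in Python to match the rest of this blog, not Java, and it is not Astyanax's API: the class `RoundRobinPool`, its methods and the failure threshold of 3 are all assumptions made purely for illustration.

```python
class RoundRobinPool:
    """Toy round-robin host selector with failover (not Astyanax code)."""

    def __init__(self, hosts, bad_host_detector=None):
        self.hosts = list(hosts)
        self.failures = {host: 0 for host in self.hosts}
        # Pluggable detector: takes a consecutive-failure count, returns
        # True when the host should be treated as down. The default
        # threshold of 3 is an arbitrary choice for this sketch.
        self.bad_host_detector = bad_host_detector or (lambda count: count >= 3)
        self._cursor = 0

    def pick(self):
        """Return the next live host, skipping hosts marked as down."""
        for _ in range(len(self.hosts)):
            host = self.hosts[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.hosts)
            if not self.bad_host_detector(self.failures[host]):
                return host
        raise RuntimeError("no live hosts")

    def execute(self, operation):
        """Run operation(host), failing over to the next host on error."""
        last_error = None
        for _ in range(len(self.hosts)):
            host = self.pick()
            try:
                result = operation(host)
            except ConnectionError as error:
                self.failures[host] += 1  # feeds the bad host detector
                last_error = error
            else:
                self.failures[host] = 0  # a success clears the count
                return result
        raise RuntimeError("all hosts failed") from last_error


if __name__ == "__main__":
    pool = RoundRobinPool(["cass1", "cass2", "cass3"])  # made-up host names
    print([pool.pick() for _ in range(4)])
```

The detector is passed in as a plain function so it can be swapped without touching the selection loop, which is the spirit of the "pluggable" items in the feature list.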

Bamboo: Data Bamboozled

Bamboo

bamboo is an application that systematizes realtime data analysis. bamboo provides an interface for merging, aggregating and adding algebraic calculations to dynamic datasets. Clients can interact with bamboo through a REST web interface and through Python.

bamboo supports a simple querying language to build calculations (e.g. student teacher ratio) and aggregations (e.g. average number of students per district) from datasets. These are updated as new data is received.
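To make the difference between a calculation and an aggregation concrete, here is a tiny plain-Python sketch. It does not use bamboo, its REST interface or its query language; the `Dataset` class, its method names and the sample rows are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


class Dataset:
    """Toy dynamic dataset: results are recomputed from the current rows."""

    def __init__(self):
        self.rows = []

    def add(self, **row):
        """Append a new row, as if fresh data had just been received."""
        self.rows.append(row)

    def calculate(self, formula):
        """Calculation: one derived value per row."""
        return [formula(row) for row in self.rows]

    def aggregate(self, group_key, value_key, reducer=mean):
        """Aggregation: one summary value per group of rows."""
        groups = defaultdict(list)
        for row in self.rows:
            groups[row[group_key]].append(row[value_key])
        return {group: reducer(values) for group, values in groups.items()}


if __name__ == "__main__":
    schools = Dataset()
    # Made-up sample rows, not real data.
    schools.add(district="north", students=300, teachers=10)
    schools.add(district="north", students=500, teachers=20)

    # Calculation: student teacher ratio for each school.
    print(schools.calculate(lambda row: row["students"] / row["teachers"]))

    # Aggregation: average number of students per district.
    print(schools.aggregate("district", "students"))

    # New data arrives; asking again reflects it.
    schools.add(district="south", students=200, teachers=10)
    print(schools.aggregate("district", "students"))
```

Because nothing is cached, every call sees the latest rows, which is a crude stand-in for results being updated as new data is received.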

bamboo uses pandas for data analysis, pyparsing to read formulas, and mongodb to serialize data.

bamboo is open source software released under the 3-clause BSD license, which is also known as the “Modified BSD License”.