o l l a v: June 2016

Monday, June 27, 2016

The microS Way

As a conclusion to my notes and reflections on the CA Microservices Conference, I wanted to share the MicroS Way that was presented there.

1. Establish the right boundaries in your code and organization

2. Have a system that balances ease and speed with safety

3. Have the right processes and standards

4. Steer the system and measure it

5. Accept that right, easy, and safe are a product of time and context

MicroS - lessons learned part 3

This is a continuation of my notes from (and reflections on) the CA conference on microservices.

How to Change

Use shock and awe to make change. Be a little outrageous. Do not pivot by spinning on your heel: sprint with a two-by-four, slam it in the ground, and spin round to a new direction. An enterprise is about maintaining stability, so it won't change just because you casually mention it. You need to prove this change is happening and motivate the teams. Set a crazy challenge as the motivator.

And make the rules of change embody your bold statement. For instance, Gilt moved to microservices by saying you cannot use the main database or commit to the Ruby on Rails repo. These two rules helped to build a completely autonomous system. The trick is to find something simple that forces a new direction.

Besides changing the technology focus, change the way you frame the problem. Gilt doesn't say "do this" but instead gets the best out of a team by asking, "how do you solve this?" You want to tell your team where you are headed, but not how to get there. This is partly because it encourages ingenuity, but also because the people on the ground floor always have great knowledge of how things work. Even if management rose from within, things change, and those who know the day-to-day need to be consulted.

Another way to get the best out of a team is to have the right team skills. Agile teaches that you put the key players on the team. You choose the engineers and developers that have the skills for the software and technology that you need. The expertise is built within the team so they don't have to reach out and wait for someone outside the team to have the time to help. But there's an expertise that can be overlooked in the team: the business. Close alignment with the business comes naturally when someone with a business background sits as a member of the team. Furthermore, that person will gain an understanding of the pressures on IT, so that the whole team can make wise decisions that are valuable to the business without being shortsighted about long-term IT cost.

Part of long term IT cost is the fact that once something exists it seems to be able to justify its continued existence based only on the fact that it exists. "Someone might need it" justifies going to large cost to maintain an unnecessary system, while half of that cost will not be spent to build a necessary system if it does not exist yet. There are very human reasons for thinking this way that all of us are prone to. The way to combat this natural tendency is to have a Sunset Team. This team is charged with responsibly shutting down systems that are no longer needed. The entire organization should celebrate shutting down an old service as much as they celebrate a new one.

Building Teams

It can sound easy to just put together a team, but there is a lot to consider. Besides what I already said about having the right skills on the team, there is also attitude and size to consider. At the conference a team size of four to five people and a department size of around 20 was recommended. If you do not have at least ten engineers working on microS, then your group likely won't be able to handle the increased overhead that comes with microS. As for choosing the people, watch how they naturally gravitate to each other. For instance, watch for those that want to work on the new and shiny stuff versus those that want to build a solid infrastructure that is hardened. Put them in separate teams and give them the projects that fit them best.

But do not build new teams and ask them to build microS right from the start. The team should always better understand the work than the technology. In other words you can be new to building a shopping cart, but you better be experienced in building websites and never vice-versa. Your first microS should be ones that you build to solve a business problem that your team already understands. Always know the business better than the technology.

One reason for knowing the business better than the technology is that debugging is always more difficult with something you do not know. This is especially true of microS, but true of anything. If you know the business part well, you at least have a firm understanding of what going right means, even if you don't know why something is going wrong when a bug emerges.

Focus on team culture. Change that goes against the culture will not be lasting and likely will not even take place. It is an attitude challenge. Culture can cause a team that always codes in Java to write Python as if it were Java. Changing technology does not change culture. Over time a balance will grow in a group between culture, technology, and structure. When you change one the balance is lost, but the group will try to find the balance they once knew by modifying the other two to make up for the change in the one. Unless of course the change to all three is thought out and the team is properly prepared for it.

You properly prepare by knowing that these three steps must happen in this order:

1. Stabilize the technology and culture. Things in flux are actually less able to change because everyone recognizes that the last thing you want to do with a shaky house is change the foundation.

2. Optimize them. Once you have stabilized you can build on that and have a better system, which helps motivate the team to believe in their ability to create effective change.

3. Transform them. Now you are ready for real change.

Embrace Change and Risk

I wrote in a previous post about how to talk about failure. Netflix has a great culture that embraces this at all levels. The fact that they can openly talk about Qwikster proves they understand that innovation comes with risk and the best way to learn from it is to talk about it openly. They trade lessons learned in an open way, which transfers knowledge between teams in a radically better way than in a workplace where competition means not admitting what went wrong or could have been better.

Sunday, June 26, 2016

MicroS - lessons learned part 2

At the CA Microservices Conference, Vijay Alagarasan spoke about some anti-patterns. I have reworded them to be patterns to follow and added a couple of my own thoughts.

Automation is Litmus

The litmus test on whether you can handle microS is if you can automate. Automated testing and deployment are key to microS, but beyond that, if what you have is not regular enough, or is so eccentric, that automation is not possible, then you are either still in R&D land or you simply are not ready yet. You do not need to automate everything, but if there is something about the software that inhibits automation then that will be a problem for stability, scale, or your own sanity when problems occur.

Centrally Manage Config

Design a configuration manager. There should be one server (not literally one, you're allowed redundancy) that can send configuration changes to your services. If you don't start from day one then you will be unable to control scale. The time to centrally manage the configuration of a service is when you deploy the first instance. Once you are running 15 instances of it, you are likely to miss an instance when you need to change a setting on it. Just think back to the last time you were debugging an intermittent issue and you checked everything, only to finally discover that one of the servers was configured differently. Sometimes it is obvious, but because you always assume someone was diligent and set them all up identically, you can often overlook that sort of bug. Avoid the headache and always manage configuration centrally.
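To make the idea concrete, here is a minimal in-memory sketch of central configuration. Everything in it (the ConfigServer and ServiceInstance classes, the rate_limit setting, the "search-N" instance names) is hypothetical and invented for illustration; a real setup would push changes over the network and probably use an existing tool rather than hand-rolled code.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    """One running copy of a service; holds whatever config it was last sent."""
    name: str
    config: dict = field(default_factory=dict)

    def apply(self, config: dict) -> None:
        # Store a copy so the instance and the server never share one dict.
        self.config = dict(config)


class ConfigServer:
    """Hypothetical central config manager: one source of truth, pushed to all."""

    def __init__(self, initial: dict):
        self._config = dict(initial)
        self._instances = []

    def register(self, instance: ServiceInstance) -> None:
        # A new instance receives the current config the moment it registers.
        self._instances.append(instance)
        instance.apply(self._config)

    def set(self, key: str, value) -> None:
        # Change the setting once, then push it to every registered instance.
        self._config[key] = value
        for instance in self._instances:
            instance.apply(self._config)

    def drifted(self) -> list:
        # Names of instances whose config no longer matches the central copy.
        return [i.name for i in self._instances if i.config != self._config]


server = ConfigServer({"rate_limit": 100})
instances = [ServiceInstance(f"search-{n}") for n in range(15)]
for inst in instances:
    server.register(inst)

# One call updates all 15 instances; nobody has to remember each server.
server.set("rate_limit", 250)
```

The drifted() check is the part that speaks to the intermittent-bug story: if someone hand-edits one instance, the server can name the odd one out instead of leaving you to find it by accident.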

The one argument you can make against central config is security. As I write this it does occur to me that a malicious actor who merely wanted to wreak havoc on your network could do it through the config server, but that is more the case of a disgruntled employee with system knowledge. Chances are your hack is going to be by someone that wants to find their way to something valuable, and changing something like a rate limit on an API is not likely to help much. Maybe talk to your security officer about this if you are concerned.

API Gateway

An API Gateway has many benefits. It can reveal a lot about network traffic and help to control it with things like caching, routing, throttling (limiting particular calls), and authorization. It can also help you transition from one version of a service to another by slowly moving traffic to the new one. For example: consumer A is routed to version 2, but all other consumers are routed to version 1. Then consumer B moves to version 2 and so on. A good version strategy allows you to introduce new versions of a service with a minimum of risk.
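The consumer-by-consumer migration described above can be sketched as a small routing table. This is only an illustration under my own assumptions: the Gateway class, its method names, and the consumer and version labels are all made up, and a real gateway product would express the same thing as configuration rather than Python.

```python
class Gateway:
    """Hypothetical gateway routing table: consumer -> service version."""

    def __init__(self, default_version: str):
        self.default_version = default_version
        self._overrides = {}

    def migrate(self, consumer: str, version: str) -> None:
        # Move a single consumer to a new version; everyone else is untouched.
        self._overrides[consumer] = version

    def rollback(self, consumer: str) -> None:
        # Send the consumer back to the default version if the new one misbehaves.
        self._overrides.pop(consumer, None)

    def route(self, consumer: str) -> str:
        # Consumers without an override get the default version.
        return self._overrides.get(consumer, self.default_version)


gateway = Gateway(default_version="v1")

# Consumer A goes first; all other consumers still reach version 1.
gateway.migrate("consumer-a", "v2")
```

Note that the routing decision depends only on who is calling, not on what the call means to the business. That keeps the gateway a traffic tool, and a rollback is a one-line change.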

However all of the stability a gateway can offer can be reversed through misuse. Every bit of business logic that works its way into the gateway is a potential production issue. The point of the gateway is to abstract you from the business logic so that you can change your business logic carefully (as mentioned with transitioning to a new version of a service). If the business logic is in the gateway then you cannot change that logic without risking all of your traffic being negatively impacted. There will be some business logic in your gateway, but assume anything you put in there may need to be changed and cause a full system outage. Each IF statement should be put in place with that realization.

I'm unfamiliar with these products, but CA Layer 7 and AWS Mobile Gateway are examples of API Gateways.

Question Every Layer

Abstraction is a useful tool, but it needs to be used purposefully. For instance, having a policy to abstract the connection to every database is not good. Each use of abstraction should be for a strong reason about that particular case. Unnecessary layers should be avoided. A coworker told me "no one ever wants to be the middle man" and you should think of layers like middle men. Most of the objections to middle men are the same reasons you don't want extra layers in your software. They introduce complexity and the potential for translation issues, they constrain your message into their terms (admittedly this can be a pro), and they consume resources. People would always rather go to the source, but they will put up with a middle man when there's a good reason. Don't just have an extra layer in place like in Office Space.

Do not separate your layers by business logic, data access, or orchestration. One layer should likely embrace all three of these things. The boundaries of a service should be based on a business solution and not a technical one. You are building services to solve business needs. Being able to switch your database out to another one should not be the guiding star by which you design your services. If you have a good version strategy then that should be all you need for technical changes.

Saturday, June 25, 2016

Microservices (microS) - lessons learned part 1

This is a continuation of what I learned from the API Academy's microservices conference. None of these are rules; they are all principles. My personal belief is that the only rule should be to embrace continuous improvement.

wAgile

Holger Reinhardt introduced the concept of wAgile, which is a blend of waterfall and agile. On the one hand this sounds like an oxymoron and certainly there were laughs at the term. (A colleague of mine suggested perhaps watgile for WATerfall aGILE would be more successful as a term.) If you consider, however, how the triad of thesis, antithesis, and synthesis works in history, it makes sense.

The industry started with waterfall, which is great for project management: it helps managers place timelines, staffing, and schedules of other projects. But waterfall has the danger of mapping things out in such detail and so far into the future that by the time you get to the end of the project you find it fell victim to the rapid pace of change in technology and business, which did not respect your careful planning. A well planned and executed project that solves a problem that no longer exists is sadly as much a failure as a disorganized one that couldn't even complete.

There have been various solutions to this, but agile is the recent darling of the industry. For startup culture, the focus on MVP (minimum viable product), which gives early results, means you begin to learn and adapt to your market rapidly. Not having a timeline of more than a few weeks out is fine for a company which isn't entirely sure what the business will be by then. Removing the long term calendar and only deciding on when you will complete the next task vastly increases the accuracy of a developer or engineer's time estimates.

In agile training I picked up the razor for separating a candidate for waterfall from agile, which is to ask where the unknowns are. If you know your technology well and know the business problem well then you can probably have a very successful waterfall project. Waterfall depends on your ability to predict and if you have previous experience that applies to all aspects of the project then you should be able to reasonably predict. However if you are building a new business model with new technology then you will do better with agile.

In the enterprise world though things are rarely so clear. To balance innovation with safety you are likely introducing new technology carefully and not all at once, so it is a partial unknown. The same goes for the business which tends to solve problems with mostly familiar solutions and doesn't jump into completely unknown markets in big ways. Also my experience is that no one likes to hear "we'll be done when we are done."

What wAgile says is do the planning and accept the unknown. You do that with the old trick of padding the schedule, but you do it in a smarter way. If you pad each task or phase on a Gantt chart then invariably the time will get used. If someone is done early they likely have pressures on them to get other work done for the business and they will do that work. Yes, it is time spent productively, but it is not helping to get the project done.

Instead of padding parts within the schedule, Holger says, you pad the entire schedule at the end. Keep every task or part tight without any extra time and try to stick to that. When the inevitable happens though and something does not work as planned or the time allocated was simply not enough then take some time out of the padding at the end of the schedule.
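A toy calculation shows why the placement of the padding matters. The model and the numbers below are my own assumptions, not anything presented at the conference: tasks run back to back, durations are in days, and padding inside a task always gets used (a task that finishes early still occupies its full padded slot).

```python
def per_task_padding(estimates, actuals, pad_each):
    # Padding inside each task tends to get used: a task that finishes early
    # still occupies its padded slot, and a task that overruns takes longer.
    return sum(max(actual, estimate + pad_each)
               for estimate, actual in zip(estimates, actuals))


def end_padding(estimates, actuals, buffer_days):
    # Tasks are scheduled tight; only the net overrun draws on the shared buffer.
    planned = sum(estimates)
    total = sum(actuals)
    buffer_used = max(0, total - planned)
    return total, buffer_days - buffer_used


# Made-up example: four tasks, 22 days of tight estimates, 8 days of padding
# in both plans, so both plans promise delivery on day 30.
estimates = [5, 8, 3, 6]
actuals = [4, 11, 3, 7]

padded_finish = per_task_padding(estimates, actuals, pad_each=2)
tight_finish, buffer_left = end_padding(estimates, actuals, buffer_days=8)
print(padded_finish, tight_finish, buffer_left)
```

With these numbers the per-task plan finishes on day 31, a day late, because the early finish on the first task was absorbed by its own padding while the second task still overran. The end-padded plan finishes on day 25 with 5 days of buffer left.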

New Technology

How do you keep current with new technology and have standards? On the one hand you want to pick a few tools and make those the standard so everyone can know them well and technology can be reused or repurposed easily. Yet this closes the door on introducing new technology and keeping current.

Gilt does this by having an architecture board that brings various groups together to talk about what works and what doesn't. They celebrate when you try and fail (more on that below), because that is how you learn and you cannot innovate without trying new things. The process is to drive consensus on the standards.

There was support at the conference for what I believe is also Amazon's take on this. The policy should be if you adhere to standards you can expect full support. So things like monitoring, off hours support staff, production hardened environments, and infrastructure automation are all available to you. However if you want to code in a new language then all bets are off. If the system crashes at 3 AM you take the phone call and fix it. Sure your colleagues may help you, but you cannot expect it and you certainly cannot complain.

This not only is a policy that allows the introduction of new technology, it is also a policy for retaining key talent. Teams often have a mix of those who constantly play with the latest tech and those who hone their craft on their current skills, but the former will be discouraged if there are no opportunities to use the new tech in their jobs.

When evaluating new technology have metrics on the new and old. If new technology is a narrow improvement then be skeptical. There is a lot of overhead in learning new technology and implementing it, which has to be outweighed by the improvement.

I cannot speak to how useful it is, but a few people said that ThoughtWorks' Technology Radar is a good start to conversations on new technology.

How to Talk about Failure

A postmortem so blameless that no one owns the mistake is pointless. You need to learn from mistakes, but you need to also accept that everyone is human and mistakes happen. Pointing fingers is not helpful. People need to be encouraged to feel comfortable saying, "it was me, I broke it, and here's how I fixed it." A good programmer is one who knows how to recover well from a failure. A programmer who hasn't had failures is lucky and lacks the experience needed to handle the inevitable failure when it happens. This is what is at the heart of the concept of "fail quickly" which assumes you will recover quickly. You can only do this in an environment where failure can be discussed openly.

Saturday, June 18, 2016

Microservices (or microS) - overview

The API Academy hosted a conference on microservices in Manhattan. I only stayed for the morning due to a busy schedule for myself that afternoon, but I was impressed. When a presentation includes pros and cons or includes stories of what went wrong I regard it as honest (more on that in my next post). A flawless IT project or technology is a rare thing and probably more about luck than skill. If you are solving a challenging problem with new tools then things will go wrong. I have always felt the mark of professionalism is not avoiding problems, but quickly recovering from them. (When you only seek to avoid problems then you stop innovating.)

What follows is what I learned from the conference about microS along with some of my own thinking. The presenters at the conference are owed credit for the good ideas; anything mistaken or flawed should be credited to me.

To cut down on typing and just because I like it, I'm going to refer to microservices as microS from now on.

Why microS?
As we have embraced SOA (Service Oriented Architecture) we now have a mess of confusion on the network. Just as when the automobile first hit the road, it was fine when you just had a few cars, but once everyone had a car on the road there was chaos and traffic rules needed to be introduced. It is not sustainable to blindly create a service whenever there is a need if you lack an overall structure and plan. The hope is that microS give you "speed, safety, and scale in harmony" which is what the highway system, DOT, traffic regulations, and traffic conventions brought to the roadway.

What are microS?
As an evolving buzzword about an evolving concept it is difficult to define. The discussion seems to be centered around the concepts in the image below. Everyone's definition is basically a collection of a handful of these concepts. (ignore that small is highlighted)

A word about small: microS are not as their name suggests all about size. It is more about thoughtfully setting boundaries. If you divide up your services based on rules such as "it's a different service each time a different system is involved" or "it's a different service each time it's a different dev team" then you are setting boundaries based on an irrelevant thing. Set the boundaries based on what the software is doing. Systems suffer when the service size is made too small (often caused by trying to abstract and layer everything, which ironically makes change difficult) and they suffer when the service size is too large.

Another misconception that microS are trying to fight is the urge to not repeat code. Multiple versions of the same thing are not only acceptable but encouraged. MicroS are trying to manage change. The desire to have, for instance, just one search engine might be making change management intolerable. If having your HR system and product system share a search engine results in exponential complexity then don't solve search with just one service. Build a service, clone it, and have two.

Startups vs. Endups - speed vs. safety
When you are a startup you can afford to be less safe and you can be like Facebook and declare that you are going to "move fast and break things." But as you grow stability becomes part of the business model and even Facebook finds itself with a less sexy slogan like "move fast with stable infrastructure." Endups (established businesses) need to balance speed and safety. MicroS exist in this space where they are not like monolith system design where safety requires a very slow pace of change and they are not like the constantly updating system that 10 developers can maintain in a startup mode.

For instance, one difference is that unlike at a startup, where you might edit code directly in production, you do not edit production code in microS. However, unlike in a monolith culture where you need sign off from lots of parties and full system testing, with microS you have automated testing, which allows you to safely push your code live without that bureaucracy in the way. It is a balance.

The MicroS Way
Rather than define what microS are we can define what it looks like when you have the right environment for them:

establish the right boundaries (size) in code and your organization (developers and business people)
balance safety with ease of change in the system
have the right processes and standards (more on this in my next blog post)
steer the system (from a high level) and measure it (data driven)
accept this: right, easy, and safe are a product of time and context

The word you see continually is "balance" and when you think of the old monolith design (one huge system that ran everything) contrasted with the small individualized world of SOA, what microS offers is a balance between the two. We will also see this in my next post in the concept of wAgile (waterfall agile) which attempts to balance the approaches.
