Saturday, June 25, 2016

Microservices (microS) - lessons learned part 1

This is a continuation on what I learned from the API Academy's microservices conference. None of these are rules, they are all principles. My personal belief is that the only rule should be to embrace continuous improvement.


wAgile

Holger Reinhardt introduced the concept of wAgile which is a blend of waterfall and agile. On the one hand this sounds like an oxymoron and certainly there were laughs at the term. (A colleague of mine suggested perhaps watgile for WATerfall aGILE would be more successful as a term.) If you consider however how the triad of thesis, antithesis, synthesis works of history works though it makes sense.

The industry started with waterfall which is great for project management and it helps managers place timelines, staffing, and schedules of other projects, but waterfall has the danger of mapping things out in such detail and so far into the future that by the time you get to the end of the project you find it fell victim to the rapid pace of change of technology and business, which did not respect your careful planning. A well planed and executed project that solves a problem that no longer exists is sadly as much a failure as a disorganized one that couldn't even complete.

There have been various solutions to this, but agile is the recent darling of the industry. For startup culture the focus on MVP (minimum viable product), which gives early results means you begin to learn and adapt to your market rapidly. Not having a timeline of more than a few weeks out is fine for a company which isn't entirely sure what the business will be by then. Removing the long term calendar and only deciding on when you will complete the next task vastly increases the accuracy of a developer or engineer's time estimates.

In agile training I picked up the razor for separating a candidate for waterfall from agile, which is to ask where the unknowns are. If you know your technology well and know the business problem well then you can probably have a very successful waterfall project. Waterfall depends on your ability to predict and if you have previous experience that applies to all aspects of the project then you should be able to reasonably predict. However if you are building a new business model with new technology then you will do better with agile.

In the enterprise world though things are rarely so clear. To balance innovation with safety you are likely introducing new technology carefully and not all at once, so it is a partial unknown. The same goes for the business which tends to solve problems with mostly familiar solutions and doesn't jump into completely unknown markets in big ways. Also my experience is that no one likes to hear "we'll be done when we are done."

What wAgile says is do the planning and accept the unknown. You do that with the old trick of padding the schedule, but you do it in a smarter way. If you pad each task or phase on a Gantt chart then invariably the time will get used. If someone is done early they likely have pressures on them to get other work done for the business and they will do that work. Yes, it is time spent productively, but it is not helping to get the project done.

Instead of padding parts within the schedule, Holger says, you pad the entire schedule at the end. Keep every task or part tight without any extra time and try to stick to that. When the inevitable happens though and something does not work as planned or the time allocated was simply not enough then take some time out of the padding at the end of the schedule.


New Technology

How do you keep current with new technology and have standards? On the one hand you want to pick a few tools and make those the standard so everyone can know them well and technology can be reused or repurposed easily. Yet this closes the door on introducing new technology and keeping current.

Gilt does this by having an architecture board that brings various groups together to talk about what works and what doesn't. They celebrate when you try and fail (more on that below), because that is how you learn and you cannot innovate without trying new things. The process is to drive consensus on the standards.

There was support at the conference for what I believe is also Amazon's take on this. The policy should be if you adhere to standards you can expect full support. So things like monitoring, off hours support staff, production hardened environments, and infrastructure automation are all available to you. However if you want to code in a new language then all bets are off. If the system crashes at 3 AM you take the phone call and fix it. Sure your colleagues may help you, but you cannot expect it and you certainly cannot complain.

This not only is a policy that allows the introduction of new technology, it is also a policy for retaining key talent. Teams often have a mix of those who constantly play with the latest tech and those who hone their craft on their current skills, but the former will be discouraged if there are no opportunities to use the new tech in their jobs.

When evaluating new technology have metrics on the new and old. If new technology is a narrow improvement then be skeptical. There is a lot of overheard in learning new technology and implementing it which has to be outweighed by the improvement.

I cannot speak to how useful it is, but a few people said that Thought Works' Tech Radar is a good start to conversations on new technology.


How to Talk about Failure

A blameless postmortem is pointless. You need to learn from mistakes, but you need to also accept that everyone is human and mistakes happen. Pointing fingers is not helpful. People need to be encouraged to feel comfortable saying, "it was me, I broke it, and here's how I fixed it." A good programer is one who knows how to recover well from a failure. A programer who hasn't had failures is lucky and lacks that experience needed to handle that inevitable failure when it happens. This is what is at the heart of the concept of "fail quickly" which assumes you will recover quickly. You can only do this in an environment where failure can be discussed openly.

No comments: