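Spotify Engineering Culture, Part 1
Original text: the original video is fast and long and has no subtitles, so with the help of YouTube I listened to it and typed out the transcript. I will translate it into Chinese later when I have time.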
One of the big success factors here at Spotify is our agile engineering culture. Culture tends to be invisible; we don't notice it because it's there all the time, kind of like the air we breathe.
But if everyone understands the culture, we're more likely to be able to keep it and even strengthen it as we grow. So that's the purpose of this video.
When our first music player was launched in 2008, we were pretty much a Scrum company. Scrum is a well-established agile development approach, and it gave us a nice team-based culture.
However, a few years later, we had grown into a bunch of teams and found that some of the standard Scrum practices were actually getting in the way. So we decided to make all this optional.
Rules are a good start, but then break them when needed.
We decided:
agile matters more than Scrum,
and agile principles matter more than any specific practices.
We renamed the Scrum master role to agile coach, because we wanted servant leaders more than process masters. We also started using the term squad instead of Scrum team. And our key driving force became autonomy.
So what is an autonomous squad?
A squad is a small, cross-functional, self-organizing team, usually fewer than eight people. They sit together and they have end-to-end responsibility for the stuff they build: design, commit, deploy, maintenance, operations, the whole thing.
Each squad has a long-term mission, such as "make Spotify the best place to discover music", or internal stuff like infrastructure for A/B testing. Autonomy basically means that the squad decides what to build, how to build it, and how to work together while doing it.
There are of course some boundaries to this, such as the squad mission, the overall product strategy for whatever area they're working on, and short-term goals that are renegotiated every quarter.
Our office is optimized for collaboration. Here's a typical squad area: the squad members work closely together here, with adjustable desks and easy access to each other's screens. They gather over here in the lounge for things like planning sessions and retrospectives. And back there is a huddle room for smaller meetings or just to get some quiet time. Almost all walls are whiteboards.
So why is autonomy so important? Well, because it's motivating, and motivated people build better stuff. Also, autonomy makes us fast by letting decisions happen locally in the squad instead of via a bunch of managers and committees and stuff. It helps us minimize handoffs and waiting, so we can scale without getting bogged down with dependencies and coordination.
Although each squad has its own mission, they need to be aligned with product strategy, company priorities, and other squads. Basically, be a good citizen in the Spotify ecosystem.
Spotify's overall mission is more important than any individual squad.
So the key principle is really be autonomous but don't sub-optimize.
It's kind of like a jazz band: although each musician is autonomous and plays their own instrument, they listen to each other and focus on the whole song together. That's how great music is created.
So our goal is loosely coupled but tightly aligned squads.
We're not all there yet, but we experiment a lot with different ways of getting closer.
In fact, that applies to most things in this video: this culture description is really a mix of what we are today and what we're trying to become in the future.
Alignment and autonomy may seem like different ends of a scale, as in more autonomy equals less alignment. However, we think of it more like two different dimensions. Down here is low alignment and low autonomy: a micromanagement culture, no high-level purpose, just shut up and follow orders. Up here is high alignment but still low autonomy, so leaders are good at communicating what problem needs to be solved, but they're also telling people how to solve it. High alignment and high autonomy means leaders focus on what problem to solve, but let the teams figure out how to solve it. What about down here? Low alignment and high autonomy means teams do whatever they want and basically all run in different directions. Leaders are helpless and our product becomes a Frankenstein.
We're trying hard to be up here, at aligned autonomy, and we keep experimenting with different ways of doing that. So alignment enables autonomy: the stronger alignment we have, the more autonomy we can afford to grant. That means the leader's job is to communicate what problem needs to be solved and why, and the squads collaborate with each other to find the best solution.
One consequence of autonomy is that we have very little standardization. When people ask things like "which code editor do you use?" or "how do you plan?", the answer is mostly "depends on which squad". Some do Scrum sprints, others do Kanban; some estimate stories and measure velocity, others don't. It's really up to each squad.
Instead of formal standards, we have a strong culture of cross-pollination. When enough squads use a specific practice or tool, such as Git, that becomes the path of least resistance, and other squads tend to pick the same tool. Squads start supporting that tool and helping each other, and it becomes like a de facto standard. This informal approach gives us a balance between consistency and flexibility.
Our architecture is based on over a hundred separate systems, coded and deployed independently. There's plenty of interaction, but each system focuses on one specific need, such as playlist management, search, or monitoring. We try to keep them small and decoupled, with clear interfaces and protocols.
Technically each system is owned by one squad; in fact most squads own several. But we have an internal open source model, and our culture is more about sharing than owning. Suppose squad 1 here needs something done in system B, and squad 2 knows that code best. They'll typically ask squad 2 to do it. However, if squad 2 doesn't have time or has other priorities, then squad 1 doesn't necessarily need to wait. We hate waiting. Instead, they're welcome to go ahead and edit the code themselves, and then ask squad 2 to review the changes. So anyone can edit any code, but we have a culture of peer code review. This improves quality and, more importantly, spreads knowledge.
Over time we've evolved design guidelines, code standards, and other things to reduce engineering friction, but only when badly needed.
So on a scale from authoritative to liberal, we're definitely more on the liberal side.
Now, none of this would work if it wasn't for the people. We have a really strong culture of mutual respect. I keep hearing comments like "my colleagues are awesome". People often give credit to each other for great work and seldom take credit for themselves. Considering how much talent we have, there is surprisingly little ego.
One big aha for new hires is that autonomy is kind of scary at first. You and your squadmates are expected to find your own solution. No one will tell you what to do. But it turns out that if you ask for help, you get lots of it, and fast. There's genuine respect for the fact that we're all in this boat together and need to help each other succeed.
We focus a lot on motivation. Here is an example, an actual email from the head of people operations:
Hi everyone,
Our employee satisfaction survey says 91% enjoy working here and 4% don't. (Now, that may seem like a pretty high satisfaction rate, especially considering our growth pain: from 2006 to 2013 we've doubled every year, and now have over 2100 people. But then he continues.)
This is of course not satisfactory and we want to fix it.
If you're one of those unhappy 4%, please contact us. We're here for your sake, and nothing else.
So good enough isn't good enough. Half a year later, things had improved, and the satisfaction rate was up to 94%.
With this strong focus on motivation, it's no coincidence that we have an awesome reputation as a workplace. Nevertheless, we do have plenty of problems to deal with, so we need to keep improving.
OK, so we have over 50 squads spread across four cities. Some kind of structure is needed. Currently, squads are grouped into tribes. A tribe is a lightweight matrix: each person is a member of a squad as well as a chapter. The squad is the primary dimension, focusing on product delivery and quality, while the chapter is a competency area, such as quality assistance, agile coaching, or web development. As a squad member, my chapter lead is my formal line manager, a servant leader focusing on coaching and mentoring me as an engineer, so I can switch squads without getting a new manager.
It's a pretty picture, huh? Except that it's not really true. In reality, the lines aren't nice and straight, and things keep changing. Here's a real-life example from one moment in time for one tribe, and of course it's all different by now. And that's OK. The most valuable communication happens in informal and unpredictable ways. To support this, we also have guilds. A guild is a lightweight community of interest where people across the whole company gather and share knowledge within a specific area, for example leadership, web development, or continuous delivery. Anyone can join or leave a guild at any time. Guilds typically have a mailing list, biannual unconferences, and other informal communication methods. Most organizational charts are an illusion, so our main focus is community rather than hierarchical structure.
We've found that a strong enough community can get away with an informal, volatile structure. If you always need to know exactly who is making decisions, you are in the wrong place.
One thing that matters a lot for autonomy is how easily we can get our stuff into production. If releasing is hard, we'll be tempted to release seldom to avoid the pain. That means each release is bigger, and therefore even harder. It's a vicious cycle. But if releasing is easy, we can release often. That means each release is smaller, and therefore easier. To stay in this loop and avoid that one, we encourage small, frequent releases and invest heavily in test automation and continuous delivery infrastructure. Releasing should be routine, not drama.
Sometimes we make big investments to make releasing easier. For example, the original Spotify desktop client was a single monolithic application. In the early days, with just a handful of developers, that was fine. But as we grew, this became a huge problem. Dozens of squads had to synchronize with each other for each release, and it could take months to get a stable version.
Instead of creating lots of process and rules and stuff to manage this, we changed the architecture to enable decoupled releases. Using Chromium Embedded Framework, the client is now basically a web browser in disguise. Each section is like a frame on a website, and squads can release their own stuff directly.
As part of this architectural change, we started seeing each client platform as a client app, and evolved three different flavors of squads: client app squads, feature squads, and infrastructure squads. A feature squad focuses on one feature area, such as search. This squad will build, ship, and maintain search-related features on all platforms. A client app squad focuses on making releases easy on one specific client platform, such as desktop, iOS, or Android. Infrastructure squads focus on making other squads more effective. They provide tools and routines for things like continuous delivery, A/B testing, monitoring, and operations.
Regardless of the current structure, we always strive for a self-service model, kind of like a buffet. The restaurant staff don't serve you directly; they enable you to serve yourself. So we avoid handoffs like the plague. For example, an operations squad or client app squad does not put code into production for people. Instead, their job is to make it easy for feature squads to put their own code into production.
Despite the self-service model, we sometimes need a bit of sync between squads when doing releases. We manage this using release trains and feature toggles. Each client app has a release train that departs on a regular schedule, typically every week or every three weeks depending on which client. Just like in the physical world, if trains depart frequently and reliably, you don't need much up-front planning. Just show up and take the next train.
Suppose these three squads are building stuff, and when the next release train arrives, features A, B, and C are done, while D is still in progress. The release train will include all four features, but the unfinished one is hidden using a feature toggle. It may sound weird to release unfinished features and hide them, but it's nice because it exposes integration problems early and minimizes the need for code branches. Unmerged code hides problems and is a form of technical debt. Feature toggles let us dynamically show and hide stuff in tests as well as in production.
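As a rough illustration of the idea above, here is a minimal Python sketch of a release train that ships every merged feature but hides the unfinished one behind a toggle. It is not Spotify's implementation: the `FeatureToggle` class, the `board_release_train` function, and the `rollout_percent` field are all hypothetical names invented for this example. The `rollout_percent` field is an extra assumption showing how the same toggle could also expose a finished feature to only a share of users.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureToggle:
    """Hypothetical toggle: controls whether a shipped feature is visible."""
    name: str
    finished: bool              # unfinished features ship but stay hidden
    rollout_percent: int = 100  # assumed knob: share of users who see it

    def is_visible_to(self, user_id: str) -> bool:
        if not self.finished:
            return False
        # Hash the feature name and user id so a given user always
        # lands in the same bucket (0..99) for this feature.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent


def board_release_train(toggles):
    """All merged features ship; the toggles decide what is hidden."""
    shipped = [t.name for t in toggles]
    hidden = [t.name for t in toggles if not t.finished]
    return shipped, hidden


# Features A, B, and C are done; D is still in progress.
toggles = [
    FeatureToggle("A", finished=True),
    FeatureToggle("B", finished=True),
    FeatureToggle("C", finished=True),
    FeatureToggle("D", finished=False),
]
shipped, hidden = board_release_train(toggles)
print("shipped:", shipped)
print("hidden:", hidden)
```

In this sketch all four features are on the train, so D's code is integrated with everything else even though no user can see it yet. Flipping `finished` to `True` later would reveal it without needing a separate branch.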
In addition to hiding unfinished work, we use this to A/B test and gradually roll out finished features. All in all, our release process is better than it used to be, but we still see plenty of improvement areas, so we'll keep experimenting. This may seem like a scary model, letting each squad put their own stuff into production without any form of centralized control, and we do screw up sometimes. But we've learned that trust is more important than control. Why would we hire someone who we don't trust? Agile at scale requires trust at scale, and that means no politics. It also means no fear. Fear doesn't just kill trust, it kills innovation, because if failure gets punished, people won't dare try new things.
So let's talk about failure. Actually, no. Let's take a break. Get on your feet, get some coffee, let this stuff sink in for a bit, and then come back when you're ready for part two, OK?
to be continued...