Role of Technical Lead On-Call (Slack)

Falit Jain
May 24, 2024

TLOCs: who are they?

TLOCs are technical experts drawn from the different service areas. Their job is to diagnose, mitigate, and resolve SEVs quickly and safely. They are not responsible for reassuring engineers or updating management; that is the job of the IMOCs (Incident Manager On-Calls). Instead, a TLOC digs in and focuses solely on solving the technical problem, reaching out to the IMOC only to ask for help or to give a status update when absolutely necessary.

The TLOC leads the technical response, but other engineers are willing to step in and help as needed. They respect the TLOC's need for focus.

After a SEV, the TLOCs work with their service teams to identify the root cause and produce action items, such as bug fixes and the decommissioning of outdated systems. To make sure the SEV doesn't happen again, the TLOCs run chaos experiments after any changes. These experiments are like integration tests, but for your whole application stack.

Over time, this post-SEV process improves MTBF (mean time between failures) and MTTP (mean time to prevention).
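To make "improves MTBF" concrete, here is a minimal sketch of how a team might compute it from an outage log. It assumes the common definition (uptime in an observation window divided by the number of failures); the function name and the sample outage data are hypothetical. MTTP is left out because the post does not say how it is measured.

```python
def mtbf_hours(window_hours: float, outages: list[tuple[float, float]]) -> float:
    """Mean time between failures over an observation window.

    `outages` holds (start_hour, end_hour) pairs measured from the
    start of the window. Assumes outages do not overlap.
    """
    if not outages:
        raise ValueError("MTBF is undefined when no failures were recorded")
    downtime = sum(end - start for start, end in outages)
    uptime = window_hours - downtime
    return uptime / len(outages)


# Hypothetical 30-day window (720 hours) with three SEVs.
sample_outages = [(10, 12), (300, 301), (500, 503)]
print(mtbf_hours(720, sample_outages))  # 238.0
```

Tracking this number from one quarter to the next is one way to check whether the post-SEV action items are paying off.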

Establishing Rotations

TLOCs, of course, take turns being on call. If your team has only a few engineers (say, five), create a single TLOC rotation that covers every service area. If your engineering team is larger (say, 50 engineers), create one TLOC rotation per service area. How you divide up those service areas depends on the size of your team.

Let's say you have ten engineers. An ideal rotation has five TLOCs, so that is enough for two rotations. (Too many, and no TLOC is on call often enough to stay sharp; too few, and the TLOCs risk burnout.) To support two rotations, you must divide your services into two buckets. For example:

TLOC Rotation 1 - Infrastructure Engineering Services: in charge of internal services, including MySQL, Memcache, Amazon S3, Kafka, Monitoring, and Self-Healing Software.

TLOC Rotation 2 - Product Engineering Services: in charge of customer-facing services, including user interface, billing, web, desktop, and mobile applications.
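The sizing rule and the two buckets can be sketched in a few lines. This is only an illustration: the five-person rotation size and the service names come from the example, while the function names and bucket keys are made up here.

```python
IDEAL_ROTATION_SIZE = 5  # the rotation size suggested in this post

def rotation_count(engineers: int, size: int = IDEAL_ROTATION_SIZE) -> int:
    """How many full rotations a team can staff; always at least one."""
    return max(1, engineers // size)

# Hypothetical bucket keys; the services are the ones listed in the example.
SERVICE_BUCKETS = {
    "infrastructure": ["MySQL", "Memcache", "Amazon S3", "Kafka",
                       "Monitoring", "Self-Healing Software"],
    "product": ["User Interface", "Billing", "Web", "Desktop", "Mobile"],
}

def rotation_for(service: str) -> str:
    """Return the rotation that owns a service."""
    for rotation, services in SERVICE_BUCKETS.items():
        if service in services:
            return rotation
    raise KeyError(f"no rotation owns {service!r}")

print(rotation_count(10))     # 2
print(rotation_for("Kafka"))  # infrastructure
```

Treat the count as an upper bound on what the team can staff, not a target: how many buckets you actually create still depends on how your services divide up.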

Each rotation assigns a Primary and a Secondary TLOC for any given week, but only one TLOC is acting for a rotation at any one time. Having a single engineer in charge keeps everything moving forward, which reduces your mean time to diagnosis (MTTD) and mean time to resolution (MTTR).

Each TLOC serves as Secondary for one week before becoming Primary the next. This gives a possibly rusty TLOC time to warm up before going back on duty. The TLOCs should also meet once a week so that the most recent on-calls can share lessons learned and hand off action items to the upcoming on-calls.

As your engineering team grows, you will add more five-person rotations and reorganise your service buckets in whatever way best suits your business. For instance, once your team reaches fifteen engineers, your mobile application may need its own TLOC rotation if it is more complex and less stable than other parts of your stack. Another company might carve out a separate rotation for its web app instead.
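One simple way to get the Secondary-then-Primary pattern is to walk a fixed roster with the Secondary always one position ahead of the Primary. The sketch below assumes a zero-based week counter and a hypothetical five-person roster; it is not tied to any particular scheduling tool.

```python
def weekly_assignment(roster: list[str], week: int) -> tuple[str, str]:
    """Return (primary, secondary) for a zero-based week number.

    The secondary sits one slot ahead of the primary in the roster, so
    this week's secondary becomes next week's primary.
    """
    if len(roster) < 2:
        raise ValueError("a rotation needs at least two TLOCs")
    n = len(roster)
    return roster[week % n], roster[(week + 1) % n]


roster = ["ana", "ben", "chen", "dee", "eli"]  # hypothetical names
print(weekly_assignment(roster, 0))  # ('ana', 'ben')
print(weekly_assignment(roster, 1))  # ('ben', 'chen')
```

With five people, each TLOC spends two consecutive weeks on the pager (one as Secondary, one as Primary) and then has three weeks off.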

Getting New TLOCs Ready

New TLOCs must complete training before their first on-call shift, because they are solely responsible for driving the technical resolution of SEVs. A one-hour in-person training session led by one or more seasoned TLOCs should cover the following topics:

  • SEV Management Programme Fundamentals
  • The fundamentals of GameDays
  • Examples of the company's prior SEV 0s, together with debugging methods
  • Strategies for mitigating SEV 0s

After training, add every new TLOC to the pager rotation for their service area and confirm that they are receiving pages. Also verify that pages escalate to the Secondary TLOC if the Primary fails to respond within one minute.
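The escalation rule is worth writing down precisely before you test it in your paging tool. The sketch below captures only the decision itself. The function and parameter names are hypothetical and do not correspond to any real pager API; the 60-second timeout is the one-minute window mentioned above.

```python
ESCALATION_TIMEOUT_SECONDS = 60  # the one-minute window described above

def page_target(primary: str, secondary: str,
                seconds_since_page: float, primary_acked: bool) -> str:
    """Decide who should currently hold the page.

    The page stays with the primary while they have acknowledged it or
    the timeout has not yet elapsed; otherwise it moves to the secondary.
    """
    if primary_acked or seconds_since_page < ESCALATION_TIMEOUT_SECONDS:
        return primary
    return secondary


print(page_target("ana", "ben", 30, primary_acked=False))   # ana
print(page_target("ana", "ben", 75, primary_acked=False))   # ben
print(page_target("ana", "ben", 75, primary_acked=True))    # ana
```

A dry run with a test page, left unacknowledged by the Primary, confirms that the real tool behaves the same way.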

Lastly, grant complete access to all networking, performance, reliability, and monitoring tools and dashboards to each TLOC.

In summary

The Technical Lead On-Call (TLOC) is a technical expert who quickly and safely diagnoses and resolves high-severity incidents (SEVs). Having read this post, you now know how to think about the TLOC role and how to set up TLOC rotations at your organisation. If you'd like to become a TLOC at your organisation, ask your engineering manager. It's a great opportunity for any engineer. And if you are already a TLOC, please share your war stories in the comments!
