Sr. Site Reliability Engineer

Company: NBCUniversal
Location: Beverly Hills, CA
Posted on: May 29, 2023

Job Description:

NBCUniversal owns and operates over 20 different businesses across 30 countries including a valuable portfolio of news and entertainment television networks, a premier motion picture company, significant television production operations, a leading television stations group, world-renowned theme parks and a premium ad-supported streaming service.
Here you can be your authentic self. As a company uniquely positioned to educate, entertain and empower through our platforms, Comcast NBCUniversal stands for including everyone. We strive to foster a diverse and inclusive culture where our employees feel supported, embraced and heard. We believe that our workforce should represent the communities we live in, so that together, we can continue to create and deliver content that reflects the current and ever-changing face of the world.

  • Create solutions to improve performance, scalability, and reliability (15%).

  • Develop scripts for monitoring tools, maintenance, and for deploying code on servers.

  • Make design recommendations for internal software and hardware projects.

  • Attend technical meetings and execute strategic and tactical plans.

  • Fine-tune applications and databases so that they run efficiently and perform optimally.

  • Contribute to developing deployment tools that are smart enough to understand the difference between old and new code; they should be able to stop the services on servers gracefully before we make changes on the servers. This is achieved with a combination of Python and Ansible working together.

  • Work closely with development teams to support continuous integration and deployment (15%).

  • Integrate automation tools such as Foreman with homegrown tools to automate the server provisioning process.

  • Make sure the correct software version is deployed on all servers. We rely on the software version report received from monitoring tools and act on the discrepancies found in that report.

  • Work closely with developers on resolution if any new changes cause performance degradation. Solve problems quickly and automate processes (15%).

  • Develop and maintain containers with the help of Mesos, develop Docker images, and maintain a private Docker registry to store all the Docker images.

  • Monitor site stability and performance (15%). Monitoring tools play a significant role in the job as we rely on them to detect service issues and alert us in a timely manner.

  • Work on production impacting alerts received from various monitoring tools.

  • Test and integrate new tools such as Nagios, PagerDuty, Grafana, ELK, Splunk, etc. into the environment to monitor and detect issues quickly.

  • Perform patching on production servers and containers to keep the Operating System and Applications safe and up to date.

  • Work on external monitoring services such as Dynatrace and Website Pulse to detect customer-impacting issues and submit a fix swiftly.

  • Provide analytical ad hoc support (15%). Work with various third parties to integrate their services with Vudu, which requires understanding how their services work and making appropriate changes on our end.

  • Deploy latest software on production servers to push out new features and promotions.

  • Work on securing services by deploying/renewing SSL certs on servers/services.

  • Manage user/group activity and provide limited, appropriate access to users and groups with the help of Active Directory.

  • Identify the root cause of issues observed in monitoring tools and submit a fix for them. 24x7 weekly on-call rotation (25%).

  • Work closely with other teams such as development and QA to resolve issues noticed in production, making sure a proper solution and monitors are in place to avoid recurrence of the issue.

  • Create new monitor plugins in Nagios, adjust monitor thresholds to avoid false positive alerts, and create new alerts, reports, and dashboards in Splunk as we bring up new services/features in Vudu.

  • Catch database issues, set up replication among DB clusters, and fine-tune them for better performance. As part of the disaster recovery (DR) plan, support the DR site by creating replication between the active and standby data centers.

  • Distribute traffic across web servers and database servers by configuring load balancers, creating virtual IPs, and binding them to the respective servers.

  • Perform various activities on web servers to ensure our main service is secure and available to our customers; this is achieved by administering and maintaining Apache and Nginx.

  • Install and update SSL certs, enable specific TLS (Transport Layer Security) versions, load modules, and limit request size.

    This position is eligible for company sponsored benefits, including medical, dental and vision insurance, 401(k), paid leave, tuition reimbursement, and a variety of other discounts and perks. Learn more about the benefits offered by NBCUniversal by visiting the Benefits page of the Careers website. Salary Range: $155,000-$165,000

    • Bachelor's degree in Computer Science, Information Systems/IT, Software Engineering or closely related technical field (or foreign equivalent), and 5 (five) years of experience in the job offered or closely related occupation; OR

    • Master's degree in Computer Science, Information Systems/IT, Software Engineering or closely related technical field (or foreign equivalent), and 2 (two) years of experience in the job offered or closely related occupation.

      Special Requirements:

      • Must possess expertise/knowledge sufficient to adequately perform the duties of the job being offered. Expertise/knowledge may be gained through employment experience or education. Such expertise/knowledge cannot be "quantified" by "time." Required expertise/knowledge includes: Experience with Continuous Integration, Continuous Delivery practices.

      • Understanding of DevOps principles, experience with operational tools (Ansible, Puppet, Terraform) and best practices for infrastructure and software deployment.

      • Experience with Docker Containerization.

      • Operational experience at large-scale web sites.

      • Scripting skills in at least two languages (Bash, Python, Ruby, Perl, etc.).

      • Knowledge of monitoring and metrics.

      • Background with relational databases and database administration, with MySQL and Postgres.

      • Working knowledge of NoSQL data stores (MongoDB, Cassandra, DynamoDB, Couchbase, Postgres etc.)

      • Systems engineering experience on Linux and/or Windows platforms.

      • Experience using AWS in a production environment.

        JOB LOCATION: -
        Fandango Media, LLC
        407 N Maple Dr., Beverly Hills, CA 90210
        40 hours per week / If offered employment, must have legal right to work in U.S.
        *Telecommuting/Remote work permitted 100% of the time from anywhere in the United States

        CONTACT:
        Qualified applicants please send resume to Elsbeth Velasco at
        **Must reference JOB CODE# AV23PN: when applying.
        This notice is being provided as a result of the filing of a permanent alien labor certification application for this job opportunity. Any person may provide documentary evidence bearing on the application to the Certifying Officer of the Department of Labor at:
        U.S. Department of Labor
        Employment and Training Administration
        Office of Foreign Labor Certification
        200 Constitution Avenue, NW
        Room N-5311
        Washington, DC 20210
        Phone: - 202-693-8200

        NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law. NBCUniversal will consider for employment qualified applicants with criminal histories in a manner consistent with relevant legal requirements, including the City of Los Angeles Fair Chance Initiative For Hiring Ordinance, where applicable.
        If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access as a result of your disability. You can request reasonable accommodations in the US by calling 1-818-777-4107 and in the UK by calling +44 2036185726.
