Senior Site Reliability Engineer
Company: The MathWorks, Inc.
Location: Natick, Massachusetts
Posted on: May 16, 2022
Job Description:
Working under the direction of the Manager or Senior Team Lead,
will be responsible for designing, developing and testing
sophisticated software to solve complicated infrastructure and
architecture problems, with a focus on improving site reliability;
designing, implementing, enhancing, and administering enterprise
Observability (monitoring, logging, and metrics) software tools;
selecting products from both commercial and open source
applications; writing automation scripts using Python and Ansible;
designing resilient and performant systems; ensuring MathWorks
users have better visibility and insight into the health of their
systems; designing and implementing software monitoring and logging
solutions to ensure resiliency and high availability; making
software tooling recommendations based on internal needs and
industry trends; developing integrations between observability
tools; administering observability tools, including Splunk and
InfluxDB, to ensure they are available and performant; and writing
scripts to automate repeatable tasks.
Education and Experience:
Master's degree in Engineering, Computer Science, or a closely
related field (or foreign education equivalent) and two (2) years
of experience in job offered or two (2) years of experience
developing and deploying security monitoring tools.
Special Requirements:
Demonstrated expertise installing Splunk-based applications and
Splunk Add-ons and developing custom Splunk applications using XML,
JavaScript and HTML to monitor, search, analyze and visualize
machine-generated data and provide operational intelligence for
business operations.
Demonstrated expertise creating and executing configurations for
data onboarding and user onboarding, including onboarding data
from data inputs -- database connections, HTTP event collectors,
Syslog servers, network sources (TCP/UDP), scripted inputs, and
files and directories.
Demonstrated expertise applying Splunk best practices by
optimizing searches, manual and automatic delimited extractions,
hardware and data storage management, data routing, and UI
performance remedies for accuracy and fast results.
Demonstrated expertise using the Chef configuration management tool
to install, upgrade and maintain Splunk on Windows, Linux, AIX and
Debian servers; installing and maintaining monitoring applications
-- Grafana and the ELK stack -- as alternative logging solutions; and
customizing the Nagios application to alert on Splunk crashes and
reboots. [Expertise may be gained during Graduate program.]
For the position listed above, interested candidates may search
by job code 28680 for specific job details and requirements and
apply online on the Careers Page at
https://www.mathworks.com/company/jobs/opportunities/search/