Government Careers
  • Staff Engineer, Command Center Insights & Actions

  • Epoch Biodesign
  • San Francisco, California 94199 United States

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About the Role

We are looking for a Staff Engineer to be the detection authority for Crusoe's Command Center platform: the person who owns what "something is wrong" means. You will define the heuristics, thresholds, and rules that power our alerting and anomaly detection systems, translating raw infrastructure telemetry into precise, actionable signals.

This is a full software engineering role. You will ship customer-facing features, build systems from 0 to 1, and scale existing services alongside a team of strong generalist engineers. What sets you apart is deep expertise in anomaly detection, heuristics, and machine/reinforcement learning, applied to real infrastructure at global scale. Beyond your technical contributions, you will be a cultural bar-raiser and force multiplier: someone who elevates the entire team's craft.

You will report to the Engineering Manager of the CCIA team and operate as a peer to the Distributed Systems Staff Engineer on the team.

What You'll Be Working On

  • Detection & Intelligence Ownership: Own the full detection stack (heuristics, threshold calibration, precision/recall tuning, and the rule systems that define what "something is wrong" means for the platform).
  • Anomaly Detection Pipelines: Design and maintain detection systems including straggler node detection, GPU health signals, and fleet-level behavioral baselines.
  • Signal Calibration: Drive detection fidelity by reducing false positives, increasing signal coverage, and building feedback loops that keep thresholds accurate as the fleet grows.
  • ML/RL Integration: Evaluate and integrate machine learning and reinforcement learning techniques where they outperform rule-based approaches, and know when not to reach for a model.
  • Product Engineering: Ship customer-facing features end-to-end across the CCIA stack: alert rule engine, control plane APIs, automated action systems, and insights delivery surfaces.
  • 0-to-1 & Scale: Build new systems from scratch and scale existing ones to support Crusoe's rapidly growing global fleet.
  • Cross-Functional Collaboration: Work closely with product counterparts to shape requirements early and partner with the data science team to develop and validate detection models.
  • System Design: Participate in design discussions across teams, contribute architectural perspective, and help evaluate technical trade-offs.
  • Technical Mentorship: Mentor engineers at all levels through code review, design feedback, and direct coaching, and contribute to hiring by helping define what great looks like.

What You'll Bring to the Team

  • Anomaly Detection & Heuristics Expertise: Deep experience building anomaly detection systems, heuristics-based rule engines, or ML/RL systems for infrastructure or data-intensive domains.
  • Threshold & Signal Calibration: Demonstrated ability to reason about precision/recall trade-offs and build feedback loops that keep detection systems accurate over time.
  • Distributed Systems Fundamentals: Strong foundations in the building blocks of reliable, scalable backend systems. You can hold your own in any system design conversation.
  • Full Software Engineering Craft: 5+ years shipping production software; experience with modern compiled or systems languages (Go, Rust, C++, Java, or similar).
  • Data & Observability Fluency: Comfortable with time-series data, telemetry pipelines, and observability primitives. You understand how raw metrics become actionable insights.
  • Communication: You can explain detection logic, trade-offs, and system behavior clearly to both engineers and non-technical partners.
  • Force Multiplier Mindset: You make the team better through mentorship, clear technical vision, and a genuine investment in the people around you.

Bonus Points

  • Experience with GPU profiling tools (Nsight, NCCL Inspector) or hardware-level infrastructure diagnostics.
  • Background in observability platforms or products.
  • Experience with reinforcement learning applied to operational or infrastructure problems.
  • Familiarity with large-scale fleet management or cloud infrastructure.
  • Passion for building team culture and engineering quality of life.

Benefits

  • Competitive compensation and equity packages
  • Restricted Stock Units
  • Paid time off, paid holidays & leave of absence programs
  • Comprehensive health, dental & vision insurance
  • Employer contributions to HSA account
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
  • Cell phone stipend
  • 401(k) Retirement plan with company match up to 4% of salary
  • Volunteer time off
  • Global travel insurance & emergency assistance
  • Daily meals allowance
  • Additional perks & programs specific to location

Compensation Range

Compensation will be paid in the range of $210,000 - $255,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Government Careers

Government jobs offer stability, competitive benefits, and the chance to make a meaningful impact on your community and country.

Whether you’re starting your career or seeking new opportunities, these roles provide pathways for growth, security, and service.

Explore positions across a wide range of fields and take the first step toward a rewarding future in public service.
