Under the general direction of the Senior Director of Computing Services, the High Performance Computer (HPC) System Administrator is responsible for implementing and maintaining the HPC environments for San Jose State University.
The HPC System Administrator makes recommendations and provides specifications for the procurement, architecture, installation, support, and maintenance of all requisite hardware and software for the HPC environments. Day-to-day operations include supporting end users with issues using the cluster, optimizing batch job submissions, gaining a full understanding of workflows and pipelines, and finding how best to optimize resources.
The incumbent provides updates to the Senior Director of Computing Services on the current status of the HPC environments for SJSU.
Key Responsibilities
- Ensure effective, reliable, efficient, and continuous operation of the various HPC environments for SJSU.
- Maintain operational policies, procedures and practices necessary for reliable delivery of HPC services and coordinate technology projects with the appropriate faculty, staff, and students.
- Develop and maintain databases, records, documents and files associated with HPC environments.
- Build consensus and solicit input when making significant changes, and maintain good channels of communication in terms of decisions and policies associated with the delivery of HPC services within the SJSU colleges, departments and auxiliaries.
- Recommend and generate proposals for enhancements, updates and replacements.
- Plan, design, procure, install, monitor, maintain, troubleshoot and upgrade HPC compute and GPU nodes, high performance cluster storage, and cluster-related network equipment associated with the HPC.
- Ensure HPC configurations are relevant to and adequately support both academic and research requirements for faculty and students.
- Research and stay abreast of recent trends in software and hardware for the HPC environments in private data centers and public clouds.
- Implement complex projects and time-sensitive tasks under minimal supervision.
- Develop and implement requests and proposals for acquisition of equipment, software, supplies, and services.
- Create and manage user accounts and access for the different HPC environments.
- Ensure 24/7 operation of highly available customer-facing and internal server environments at multiple data centers on campus and in public clouds.
- Plan and implement patching schedule for all components of the HPC including software and firmware updates.
- Assist users with experiment and application setup using a variety of development, performance analysis, and hardware configuration tools.
- Follow defined processes for software installations on the HPC for applications that support both academic classroom support and research support.
- Diagnose and repair software/hardware failures for various HPC environments.
- Provide escalated support to SJSU faculty to help resolve HPC issues.
- Utilize/manage iSupport for all incidents related to the HPC.
Knowledge, Skills & Abilities
- Advanced knowledge and proven ability to design, install, and maintain large scale Linux compute clusters and commercial supercomputing/HPC systems.
- Strong customer service skills, team skills, and the ability to collaborate within cross-functional teams.
- Knowledge of systems administration tools and languages, such as Perl, Python, C, PHP and shell scripts.
- Advanced knowledge of designing, installing, configuring, managing and using high performance computers such as clusters of Linux machines, and parallel and cluster file systems such as Lustre, GPFS and GFS.
- Knowledge of OpenHPC and scheduling software such as Slurm.
- Knowledge of and expertise in high performance networking technologies such as Omni-Path and InfiniBand.
- Extensive knowledge of and proficiency in installing, configuring and troubleshooting computer hardware and peripherals.
- Thorough knowledge of security access and network protocols such as TCP/IP, DHCP, DNS, NFS and VPN.
- Ability to complete large complex projects with strict time constraints.
- Ability to analyze and organize large complex projects and prioritize subtasks.
- Ability to develop methodologies that enhance utility and operation of computer systems.
- Thorough knowledge of using Microsoft Active Directory and binding it to Linux Operating Systems.
- Ability to effectively and quickly troubleshoot minor to complex operating system, software and hardware failures and take appropriate corrective actions and/or develop sound workarounds.
- Ability to research problem solutions and maintain knowledge of current technologies.
- Knowledge of adult learning methods and documentation standards; strong written and oral communication skills.
- A bachelor's degree in computer science, information systems, educational technology, communications, or related fields, or similar certified coursework in applicable fields of study
- Five years of experience supporting information systems and technology
- 5+ years of experience in relevant field
- Experience working with HPC systems
- Experience maintaining Linux computer lab workstations
- Experience maintaining databases, records, documents and files associated with HPC environments
Classification: Information Technology Consultant - Expert
Hiring Range: $6,249/month - $12,100/month
San José State University offers employees a comprehensive benefits package typically worth 30-35% of your base salary. For more information on programs available, please see the Employee Benefits Summary
Click Apply Now to complete the SJSU Online Employment Application and attach the following documents:
- Letter of Interest
All applicants must apply within the specified application period: March 19, 2021 through April 2, 2021. This position is open until filled; however, applications received after screening has begun will be considered at the discretion of the university.
Contact Information
Satisfactory completion of a background check (including a criminal records check) is required for employment. SJSU will issue a contingent offer of employment to the selected candidate, which may be rescinded if the background check reveals disqualifying information, and/or it is discovered that the candidate knowingly withheld or falsified information. Failure to satisfactorily complete the background check may affect the continued employment of a current CSU employee who was offered the position on a contingent basis.
The standard background check includes: criminal check, employment and education verification. Depending on the position, a motor vehicle and/or credit check may be required. All background checks are conducted through the university's third party vendor, Accurate Background. Some positions may also require fingerprinting. SJSU will pay all costs associated with this procedure. Evidence of required degree(s) or certification(s) will be required at time of hire.
SJSU IS NOT A SPONSORING AGENCY FOR STAFF OR MANAGEMENT POSITIONS (e.g. H-1B VISAS).
All San José State University employees are considered mandated reporters under the California Child Abuse and Neglect Reporting Act and are required to comply with the requirements set forth in CSU Executive Order 1083 as a condition of employment.
Equal Employment Statement
San José State University (SJSU) is an Equal Opportunity/Affirmative Action employer committed to nondiscrimination on the basis of age, ancestry, citizenship status, color, creed, disability, ethnicity, gender, genetic information, marital status, medical condition, national origin, race, religion or lack thereof, sex, sexual orientation, transgender, or protected veteran status consistent with applicable federal and state laws. This policy applies to all SJSU students, faculty and staff programs and activities. Title IX of the Education Amendments of 1972, and certain other federal and state laws, prohibit discrimination on the basis of sex in all education programs and activities operated by the university (both on and off campus).
Closing Date/Time: Open until filled