
Research Data Scientist I

Cleveland Clinic
Cleveland, OH 44114

Job Summary:

The Eng Laboratory is seeking a highly motivated data scientist for the Genomic Medicine Institute at Cleveland Clinic Lerner Research Institute. The successful candidate will be involved in multiple projects to interrogate how a single gene, PTEN, contributes to disparate clinical outcomes, from autism spectrum disorder to multiple cancers, in individuals who carry inborn (germline) PTEN alterations. As part of these studies, the data scientist will work closely with multiple Eng lab team members under Dr. Eng’s mentorship.

The Genomic Medicine Institute at the Cleveland Clinic is an interdisciplinary institute and department committed to performing translational and clinical human genetics and genomics investigation. A major focus of the Eng Lab is to utilize multi-disciplinary approaches (classical genetics, genomics, metagenomics, metabolomics, transcriptomics, computational/systems biology, computational biophysics, immunology, molecular biology, and animal modeling) to elucidate the processes that contribute to cancer predisposition, neurodevelopmental disorders (e.g., autism spectrum disorder), and other PTEN-related phenotypes in the context of germline PTEN disruption (reviewed in detail in Yehia et al., 2019 in The Journal of Clinical Investigation).



  • Gathers requirements and develops program models under the direction of other Data Scientists to inform problem formulation.
  • Generates relevant outputs based on specific computational pipelines, including graphical representation of quality control and downstream data.
  • Troubleshoots problems associated with the project design and/or computational pipelines.
  • Participates in model building.
  • Applies methods in modeling, Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP), drawing on experience designing and implementing solutions.
  • Provides insight and recommendations to solve business problems and help inform business decisions.
  • Documents best practices and solution frameworks. Replicates and scales solutions to projects with common or similar scientific questions.
  • Uses expertise in defining (with internal lab members) the challenge that needs to be solved, conducting the analysis/modeling/experiment and then providing the “answer” in a clear and scientifically friendly manner for our lab members to discuss and take actionable steps.
  • Other duties as assigned.


  • Bachelor’s Degree in Statistics, Actuarial Science, Econometrics, Physics, Biostatistics, Computer Science, Applied Mathematics, Engineering, Business Analytics, Economics, Finance or related field required. Degree in Genetics/Genomics or Biomedical Informatics preferred.
  • Master’s or PhD preferred.

Technical Languages:

  • Excellent written, verbal, and presentation skills in English and ability to explain the value of Machine Learning (ML) and Artificial Intelligence (AI) to business leaders required.


  • Certification or fellowship in analytics, big data, data science or related subject preferred.
  • Technology partner certification (in technology, big data, advanced analytics, data science) – Microsoft, Oracle, Teradata, IBM, EMC, Cloudera, Hortonworks, Informatica, Tableau, SAS, R, Python preferred.

Complexity of Work:

  • Requires critical thinking skills, decisive judgment and the ability to work unsupervised or with minimal supervision.
  • Must be able to work in a stressful environment and take appropriate action.
  • Strong business orientation, with the ability to select appropriate complex quantitative methodologies in response to specific business goals.
  • Demonstrated team, leadership, organizational, and problem-solving abilities.

Work Experience:

  • A minimum of 18 months of related experience working with relational databases and/or distributed computing platforms, and their query interfaces, such as SQL, Teradata, MapReduce, PIG, and Hive required.
  • Offset: A Master’s degree may substitute for the required experience.
  • Healthcare or life science experience preferred.
  • Knowledge of and experience working with and optimizing -omics/NGS data pipelines (e.g., whole-exome/genome sequencing, RNA-seq, GWAS, metabolomics) preferred.
  • Ability to integrate multi-omics and clinical data preferred.
  • Basic knowledge and implementation of statistical and machine learning concepts as pertinent to big data analysis (e.g., ANOVA, mixed models, and machine learning algorithms such as SVM, Random Forest, and PCA) is a plus.
  • Experience working with a variety of statistical languages/packages, e.g., SAS, R, Python, Spark, and/or SPSS required.
  • Ability to troubleshoot problems associated with the project design and/or computational pipelines preferred.
  • Knowledge of applying advanced statistics to complex business problems required (e.g., modeling, AI, ML, Deep Learning [DL], and/or Natural Language Processing [NLP]).
  • Familiarity with additional programming languages, including Python, Java, or C/C++.
  • Experience leveraging visualization software and techniques and business intelligence (BI) software.
  • Technical knowledge of distributed computing platforms and common data process flows, from data instrumentation and generation, to ETL, to the data warehouse itself.
  • Demonstrated leadership qualities, including presentation, influencing and negotiation.

Physical Requirements:

  • Manual dexterity to operate office equipment.
  • May require periods of sitting, standing and the ability to walk to various locations throughout the Foundation to attend meetings.
  • Must have normal or corrected vision and ability to clearly communicate verbally by phone or in person.

Personal Protective Equipment:

  • Follows standard precautions using personal protective equipment as required.

Keywords: PTEN, genetics, genomics, data, integrative omics, statistics, relational databases, next-generation sequencing

The policy of Cleveland Clinic and its system hospitals (Cleveland Clinic) is to provide equal opportunity to all of our employees and applicants for employment in our tobacco-free and drug-free environment. All offers of employment are followed by testing for controlled substances and nicotine. Job offers will be rescinded for candidates for employment who test positive for nicotine. Candidates for employment who are impacted by Cleveland Clinic’s Smoking Policy will be permitted to reapply for open positions after 90 days. Decisions concerning employment, transfers and promotions are made upon the basis of the best qualified candidate without regard to color, race, religion, national origin, age, sex, sexual orientation, marital status, ancestry, status as a disabled or Vietnam era veteran or any other characteristic protected by law. Information provided on this application may be shared with any Cleveland Clinic facility.

Cleveland Clinic is pleased to be an equal employment employer: Women/Minorities/Veterans/Individuals with Disabilities



Posted: 2020-07-08 Expires: 2020-09-06
