Big Data Engineer
The Big Data ML Engineer fills a critical data, analytics, technology support, and innovation role for the business analytics and advanced analytics functions within the organization. The Engineer is primarily responsible for end-user product development, deployment to the production framework, and technical support for data analytics, as well as for applying the best available tools and techniques and training end users on emerging open source analytics technologies. S/he is also the primary conduit for identifying, researching, and evaluating new and innovative technologies that enhance the organization’s enterprise analytics and advanced analytics capabilities.
Essential Job Functions
The ML Engineer works both independently and in collaboration with a cross-functional team of data scientists and solution system architects to develop, deploy, monitor, manage, and support AI/ML models, advanced analytics technology, data infrastructures, and the underlying analytics use cases, with a primary focus on open source technologies, including cloud infrastructures.
This individual evaluates the short- and long-term business needs required to support key business goals and priorities, and works to ensure advanced analytics solutions are built and deployed effectively and efficiently on key enterprise systems. Under the guidance of the Group’s Director and in cooperation with partners in decision science, technology, and data, the Engineer will coordinate the development of on-premises and cloud-based analytical infrastructure and tools, both non-production and production, that provide computational and statistical capabilities to enhance business results and monetize key data assets for business decision management solutions.
The Engineer will work closely with data scientists, data mining experts, and business partners, supporting the design of experiments and analytics, data sampling and mining, verification of data quality and information integrity, and best practices around the development and deployment of predictive/prescriptive models, DevOps operational systems and practices, and data visualization solutions.
The Engineer is responsible for advising data scientists, Agile project teams, and solution architects on the integration of analytical models/methods into decision management solutions. The Engineer will assist peers with best practices and with the selection and integration of appropriate tools to support required analytic products, in close coordination with the organization’s AI/AutoML analytics, digital intelligence engineers, solution/data architects, data integration developers, and data science community, ensuring tight integration of functionality and toolsets.
- Bachelor’s degree in computer science, electrical/electronic engineering or other engineering or technical discipline is required.
- Minimum of 8 years of experience in IT and Big data software development is required
- Minimum 3+ Predictive Analytics model implementation experience in production environments using ML/DL libraries like TensorFlow, H2O, PyTorch, scikit-learn.
- Experience in using NLP, BI/visual analytics, and graph databases like Neo4j/TigerGraph is preferred.
- Experience in designing, developing, optimizing, and troubleshooting complex data analytic pipelines and ML model applications using Spark, HDFS, and other big data related technologies
- Programming in Python, R, or Scala using distributed frameworks like PySpark, Spark, SparkR
- Working knowledge of IDE environments/tools like Jupyter, RStudio, GitHub, Docker, Jenkins
- Solid knowledge of data warehousing such as Hadoop, MapReduce, Hive, Apache Spark, as well as cloud-based data storage: Google Cloud Storage with various formats (Parquet, JSON, ORC, Avro, delimited)
- Solid understanding of databases such as DB2, Oracle, Teradata, MySQL, PostgreSQL
- Extensive experience with R and Python, including language-specific and data science-oriented packages, is required.
- Experience with Hadoop and Spark clusters, Spark SQL, Spark ML, and other third-party machine learning algorithms using Scala, PySpark, and/or SparkR
- Experience with Linux/Unix required
- Exposure to Google Cloud Platform (GCP) services or any other cloud environment.
- Working experience with Apache Airflow
- Experience in enterprise-scale analytic solution development and deployment with high performance, scalability, availability, and reliability.
- Google Professional Data Engineer certification preferred
- Address: Independence, OH