11+ Apache Flume Jobs in Chennai | Apache Flume Job openings in Chennai
Apply to 11+ Apache Flume Jobs in Chennai on CutShort.io. Explore the latest Apache Flume Job opportunities across top companies like Google, Amazon & Adobe.
Location: Chennai
Education: BE/BTech
Experience: Minimum 3 years of experience as a Data Scientist/Data Engineer
Domain knowledge: Data cleaning, modelling, analytics, statistics, machine learning, AI
Requirements:
- Be part of Digital Manufacturing and Industrie 4.0 projects across the client's group of companies
- Design and develop AI/ML models to be deployed across factories
- Knowledge of Hadoop, Apache Spark, MapReduce, Scala, Python programming, SQL and NoSQL databases is required
- Should be strong in statistics, data analysis, data modelling, machine learning techniques and Neural Networks
- Prior experience in developing AI and ML models is required
- Experience with data from the Manufacturing Industry would be a plus
Roles and Responsibilities:
- Develop AI and ML models for the Manufacturing Industry with a focus on Energy, Asset Performance Optimization and Logistics
- Multitasking, good communication necessary
- Entrepreneurial attitude
Additional Information:
- Travel: Must be willing to travel for short durations within India and abroad
- Job Location: Chennai
- Reporting to: Team Leader, Energy Management System
We are hiring a software developer for a Tier 1 MNC, with good knowledge of Spark, Hadoop and Scala.
Title: Platform Engineer
Location: Chennai
Work Mode: Hybrid (Remote and Chennai Office)
Experience: 4+ years
Budget: 16 - 18 LPA
Responsibilities:
- Parse data using Python, create dashboards in Tableau.
- Utilize Jenkins for Airflow pipeline creation and CI/CD maintenance.
- Migrate Datastage jobs to Snowflake, optimize performance.
- Work with HDFS, Hive, Kafka, and basic Spark.
- Develop Python scripts for data parsing, quality checks, and visualization.
- Conduct unit testing and web application testing.
- Implement Apache Airflow and handle production migration.
- Apply data warehousing techniques for data cleansing and dimension modeling.
Requirements:
- 4+ years of experience as a Platform Engineer.
- Strong Python skills, knowledge of Tableau.
- Experience with Jenkins, Snowflake, HDFS, Hive, and Kafka.
- Proficient in Unix Shell Scripting and SQL.
- Familiarity with ETL tools like DataStage and DMExpress.
- Understanding of Apache Airflow.
- Strong problem-solving and communication skills.
Note: Only candidates willing to work in Chennai and available for immediate joining will be considered. Budget for this position is 16 - 18 LPA.
Responsibilities:
- Must be able to write quality code and build secure, highly available systems.
- Assemble large, complex datasets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements, with guidance: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Monitoring performance and advising any necessary infrastructure changes.
- Defining data retention policies.
- Implementing the ETL process and optimal data pipeline architecture
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Create design documents that describe the functionality, capacity, architecture, and process.
- Develop, test, and implement data solutions based on finalized design documents.
- Work with data and analytics experts to strive for greater functionality in our data
- Proactively identify potential production issues and recommend and implement solutions
Skillsets:
- Good understanding of optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Proficient understanding of distributed computing principles
- Experience working with batch processing and real-time systems using open-source technologies such as NoSQL, Spark, Pig, Hive, and Apache Airflow.
- Experience implementing complex projects dealing with considerable data sizes (PB scale).
- Optimization techniques (performance, scalability, monitoring, etc.)
- Experience with integration of data from multiple data sources
- Experience with NoSQL databases, such as HBase, Cassandra, MongoDB, etc.
- Knowledge of various ETL techniques and frameworks, such as Flume
- Experience with various messaging systems, such as Kafka or RabbitMQ
- Good understanding of Lambda Architecture, along with its advantages and drawbacks
- Creation of DAGs for data engineering
- Expert at Python/Scala programming, especially for data engineering/ETL purposes
Python + Data Scientist:
• Build data-driven models to understand the characteristics of engineering systems
• Train, tune, validate, and monitor predictive models
• Sound knowledge of statistics
• Experience in developing data processing tasks using PySpark, such as reading, merging, enriching, and loading data from external systems to target data destinations
• Working knowledge of Big Data and/or Hadoop environments
• Experience creating CI/CD pipelines using Jenkins or similar tools
• Practiced in eXtreme Programming (XP) disciplines
GCP Data Analyst profile must have the following skill sets:
- Knowledge of programming languages like SQL, Oracle, R, MATLAB, Java and Python
- Data cleansing, data visualization, data wrangling
- Data modeling, data warehouse concepts
- Ability to adapt to big data platforms like Hadoop and Spark for stream and batch processing
- GCP (Cloud Dataproc, Cloud Dataflow, Cloud Datalab, Cloud Dataprep, BigQuery, Cloud Datastore, Cloud Data Fusion, AutoML, etc.)
We are looking for an outstanding ML Architect (Deployments) with expertise in deploying Machine Learning solutions/models into production and scaling them to serve millions of customers. The ideal candidate has an adaptable and productive working style that fits a fast-moving environment.
Skills:
- 5+ years deploying Machine Learning pipelines in large enterprise production systems.
- Experience developing end to end ML solutions from business hypothesis to deployment / understanding the entirety of the ML development life cycle.
- Expert in modern software development practices; solid experience using source control management and CI/CD.
- Proficient in designing relevant architecture / microservices to fulfil application integration, model monitoring, training / re-training, model management, model deployment, model experimentation/development, alert mechanisms.
- Experience with public cloud platforms (Azure, AWS, GCP).
- Serverless services like Lambda, Azure Functions, and/or Cloud Functions.
- Orchestration services like Data Factory, Data Pipeline, and/or Dataflow.
- Data science workbench/managed services like Azure Machine Learning, SageMaker, and/or AI Platform.
- Data warehouse services like Snowflake, Redshift, BigQuery, Azure SQL DW.
- Distributed computing services like PySpark, EMR, Databricks.
- Data storage services like Cloud Storage, S3, Blob Storage, S3 Glacier.
- Data visualization tools like Power BI, Tableau, QuickSight, and/or Qlik.
- Proven experience serving up predictive algorithms and analytics through batch and real-time APIs.
- Solid working experience with software engineers, data scientists, product owners, business analysts, project managers, and business stakeholders to design the holistic solution.
- Strong technical acumen around automated testing.
- Extensive background in statistical analysis and modeling (distributions, hypothesis testing, probability theory, etc.)
- Strong hands-on experience with statistical packages and ML libraries (e.g., Python scikit-learn, Spark MLlib, etc.)
- Experience in effective data exploration and visualization (e.g., Excel, Power BI, Tableau, Qlik, etc.)
- Experience in developing and debugging in one or more of the languages Java, Python.
- Ability to work in cross functional teams.
- Apply Machine Learning techniques in production including, but not limited to, neural nets, regression, decision trees, random forests, ensembles, SVM, Bayesian models, K-Means, etc.
Roles and Responsibilities:
- Deploying ML models into production, and scaling them to serve millions of customers.
- Technical solutioning skills with deep understanding of technical API integrations, AI / Data Science, Big Data and public cloud architectures / deployments in a SaaS environment.
- Strong stakeholder relationship management skills: able to influence and manage the expectations of senior executives.
- Strong networking skills with the ability to build and maintain strong relationships with business, operations and technology teams, internally and externally.
- Provide software design and programming support to projects.
Qualifications & Experience:
Engineering and postgraduate candidates, preferably in Computer Science, from premier institutions, with 5-7 years of proven work experience as a Machine Learning Architect (Deployments) or in a similar role.