- KSQL
- Data Engineering spectrum (Java/Spark)
- Spark Scala / Kafka Streaming
- Confluent Kafka components
- Basic understanding of Hadoop
Hi,
We are hiring a Data Scientist for Bangalore.
Req Skills:
- NLP
- ML programming
- Spark
- Model Deployment
- Experience processing unstructured data and building NLP models
- Experience with big data tools such as PySpark
- Pipeline orchestration using Airflow and model deployment experience is preferred
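As a hedged illustration of the Airflow orchestration and model deployment experience asked for above, here is a minimal DAG sketch; the DAG name and task callables are hypothetical placeholders, not part of the role description.

```python
# Minimal, illustrative Airflow DAG: train an NLP model, then deploy it.
# All names (dag_id, callables) are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def train_nlp_model():
    # Placeholder: fit and persist an NLP model (e.g. a Spark ML pipeline).
    print("training model...")


def deploy_model():
    # Placeholder: promote the persisted model to a serving endpoint.
    print("deploying model...")


with DAG(
    dag_id="nlp_train_and_deploy",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    train = PythonOperator(task_id="train", python_callable=train_nlp_model)
    deploy = PythonOperator(task_id="deploy", python_callable=deploy_model)
    train >> deploy
```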
Company Overview: An 8-year-old IT services and consulting company based in Hyderabad, providing services that maximize product value while delivering rapid incremental innovation, with extensive SaaS company M&A experience including 20+ closed transactions on both the buy and sell sides. They have over 100 employees and are looking to grow the team.
Location: Hyderabad and Bengaluru
Position: Data Pipeline Engineer
Experience: 8+ years of commercial experience
Experience in the below (minimum 8 years):
- Designing and implementing a real-time, robust, and scalable data pipeline that can handle large volumes of unstructured/semi-structured geospatial data (see the sketch after the skills list below).
- Building the required infrastructure for optimal extraction, transformation, and loading of data from various remote sensing devices using AWS and geospatial technologies.
Skills:
- Python
- ETLs
- SQL (Postgres, MySQL)
- NoSQL (MongoDB)
- Spark, Kafka (any one)
- Airflow (good to have)
- AWS (Lambda, S3, ECS)
- Docker (good to have)
- QGIS, ArcGIS, LASTools, Bayesmap, Postpac, Pointerra (good to have)
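As a hedged sketch of the AWS-based ingestion work described above (Lambda, S3), the following is a minimal Lambda handler reacting to a geospatial file landing in S3; bucket and key names and the downstream target are hypothetical placeholders.

```python
# Illustrative AWS Lambda handler: triggered by an S3 upload of a geospatial
# file, it collects basic object metadata for downstream processing.
# Bucket/key names and the downstream target are hypothetical placeholders.
import json

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    records = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        head = s3.head_object(Bucket=bucket, Key=key)
        records.append({"bucket": bucket, "key": key, "size_bytes": head["ContentLength"]})
    # Downstream (not shown): register in Postgres/PostGIS or queue for Spark.
    return {"statusCode": 200, "body": json.dumps(records)}
```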
Roles and Responsibilities
Seeking an AWS Cloud Engineer / Data Warehouse Developer for our Data CoE team to help us configure and develop new AWS environments for our Enterprise Data Lake and migrate on-premise traditional workloads to the cloud. Must have a sound understanding of BI best practices, relational structures, dimensional data modelling, structured query language (SQL) skills, data warehousing, and reporting techniques.
- Extensive experience in providing AWS Cloud solutions to various business use cases.
- Creating star schema data models, performing ETLs, and validating results with business representatives (a brief sketch follows).
- Supporting implemented BI solutions by monitoring and tuning queries and data loads, addressing user questions concerning data integrity, and monitoring performance and communicating functional and technical issues.
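The star-schema item above, made concrete: a minimal, illustrative pandas sketch of deriving a dimension with surrogate keys and a fact table that references it. Table and column names are invented for illustration; production loads would typically run through ETL tooling against Redshift.

```python
# Illustrative star-schema derivation with pandas (hypothetical column names).
import pandas as pd

orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_name": ["Asha", "Ravi", "Asha"],
        "amount": [120.0, 80.5, 42.0],
    }
)

# Dimension table: one row per customer, with a surrogate key.
dim_customer = (
    orders[["customer_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
    .rename_axis("customer_key")
    .reset_index()
)

# Fact table: join back to pick up the surrogate key, keep measures only.
fact_orders = orders.merge(dim_customer, on="customer_name")[
    ["order_id", "customer_key", "amount"]
]

print(dim_customer)
print(fact_orders)
```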
Job Description:
This position is responsible for the successful delivery of business intelligence information to the entire organization and requires experience in BI development and implementations, data architecture, and data warehousing.
Requisite Qualification
- Essential: AWS Certified Database Specialty or AWS Certified Data Analytics
- Preferred: Any other Data Engineer certification
Requisite Experience
- Essential: 4-7 years of experience
- Preferred: 2+ years of experience in ETL & data pipelines
Skills Required
Special Skills Required
- AWS: S3, DMS, Redshift, EC2, VPC, Lambda, Delta Lake, CloudWatch, etc.
- Big data: Databricks, Spark, Glue, and Athena
- Expertise in Lake Formation, Python programming, Spark, and shell scripting
- Minimum Bachelor’s degree with 5+ years of experience in designing, building, and maintaining AWS data components
- 3+ years of experience in data component configuration, related roles, and access setup
- Expertise in Python programming
- Knowledge of all aspects of DevOps (source control, continuous integration, deployments, etc.)
- Comfortable working with DevOps tools: Jenkins, Bitbucket, CI/CD
- Hands-on ETL development experience, preferably using tools such as SSIS
- SQL Server experience required
- Strong analytical skills to solve and model complex business requirements
- Sound understanding of BI best practices/methodologies, relational structures, dimensional data modelling, structured query language (SQL) skills, data warehousing, and reporting techniques
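As a hedged example of the kind of S3-based Spark job these skills imply, here is a minimal PySpark sketch that reads raw CSV and writes partitioned Parquet to a curated zone. The s3:// paths are hypothetical, and running against S3 assumes an EMR/Glue-style environment with the S3 connector configured.

```python
# Illustrative PySpark job for an AWS data lake (hypothetical S3 paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("curate_sales").getOrCreate()

# Read raw CSV from the landing zone.
raw = spark.read.csv("s3://example-raw-bucket/sales/", header=True, inferSchema=True)

# Light curation: de-duplicate and stamp the load date.
curated = raw.dropDuplicates().withColumn("load_date", F.current_date())

# Write partitioned Parquet to the curated zone.
curated.write.mode("overwrite").partitionBy("load_date").parquet(
    "s3://example-curated-bucket/sales/"
)
```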
Preferred Skills
- Experience working in a Scrum environment.
- Experience in administration (Windows/Unix/Network) is a plus.
- Experience in SQL Server, SSIS, SSAS, SSRS.
- Comfortable with creating data models and visualizations using Power BI.
- Hands-on experience in relational and multi-dimensional data modelling, including multiple source systems from databases and flat files, and the use of standard data modelling tools.
- Ability to collaborate on a team with infrastructure, BI report development, and business analyst resources, and clearly communicate solutions to both technical and non-technical team members.
• Strong experience working with Big Data technologies like Spark (Scala/Java), Apache Solr, HIVE, HBase, ElasticSearch, MongoDB, Airflow, Oozie, etc.
• Experience working with Relational databases like MySQL, SQLServer, Oracle etc.
• Good understanding of large system architecture and design
• Experience working in AWS/Azure cloud environment is a plus
• Experience using Version Control tools such as Bitbucket/GIT code repository
• Experience using tools like Maven/Jenkins, JIRA
• Experience working in an Agile software delivery environment, with exposure to
continuous integration and continuous delivery tools
• Passionate about technology and delivering solutions to solve complex business
problems
• Great collaboration and interpersonal skills
• Ability to work with team members and lead by example in code, feature
development, and knowledge sharing
Technical/Core skills
- Minimum 3 years of experience with Informatica Big Data Developer (BDM) in a Hadoop environment.
- Knowledge of Informatica PowerExchange (PWX).
- Minimum 3 years of experience with big data querying tools like Hive and Impala.
- Ability to design and develop complex mappings using Informatica Big Data Developer.
- Create and manage Informatica PowerExchange and CDC real-time implementations.
- Strong Unix skills for writing shell scripts and troubleshooting existing scripts.
- Good knowledge of big data platforms and their frameworks.
- Good to have experience with Cloudera Data Platform (CDP).
- Experience with building stream processing systems using Kafka and Spark.
- Excellent SQL knowledge
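For the Kafka/Spark stream-processing item above, a minimal Spark Structured Streaming sketch; the broker address, topic, and checkpoint path are hypothetical placeholders, and the spark-sql-kafka package is assumed to be on the classpath.

```python
# Illustrative Spark Structured Streaming job reading from Kafka and printing
# payloads to the console. Broker, topic, and checkpoint path are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_stream_demo").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

query = (
    events.writeStream.format("console")
    .option("checkpointLocation", "/tmp/checkpoints/kafka_stream_demo")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```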
Soft skills :
- Ability to work independently
- Strong analytical and problem solving skills
- Eagerness to learn new technologies
- Regular interaction with vendors, partners and stakeholders
Title: Data Engineer (Azure) (Location: Gurgaon/Hyderabad)
Salary: Competitive as per Industry Standard
We are expanding our Data Engineering Team and hiring passionate professionals with extensive
knowledge and experience in building and managing large enterprise data and analytics platforms. We
are looking for creative individuals with strong programming skills, who can understand complex
business and architectural problems and develop solutions. The individual will work closely with the rest
of our data engineering and data science team in implementing and managing Scalable Smart Data
Lakes, Data Ingestion Platforms, Machine Learning and NLP based Analytics Platforms, Hyper-Scale
Processing Clusters, Data Mining and Search Engines.
What You’ll Need:
- 3+ years of industry experience in creating and managing end-to-end data solutions, optimal data processing pipelines, and architecture dealing with large-volume, big data sets of varied data types.
- Proficiency in Python, Linux and shell scripting.
- Strong knowledge of working with PySpark and pandas dataframes for writing efficient pre-processing and other data manipulation tasks (a brief sketch follows this list).
- Strong experience in developing the infrastructure required for data ingestion and for optimal extraction, transformation, and loading of data from a wide variety of data sources using tools like Azure Data Factory or Azure Databricks (or Jupyter notebooks / Google Colab, or other similar tools).
- Working knowledge of GitHub or other version control tools.
- Experience with creating RESTful web services and API platforms.
- Work with data science and infrastructure team members to implement practical machine
learning solutions and pipelines in production.
- Experience with cloud providers like Azure/AWS/GCP.
- Experience with SQL and NoSQL databases: MySQL, Azure Cosmos DB, HBase, MongoDB, Elasticsearch, etc.
- Experience with stream-processing systems such as Spark Streaming and Kafka, and working experience with event-driven architectures.
- Strong analytic skills related to working with unstructured datasets.
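The PySpark/pandas pre-processing requirement above, sketched minimally (column names and values are hypothetical): clean a small frame in Spark, then hand a bounded sample to pandas for feature work.

```python
# Illustrative pre-processing: clean in PySpark, sample into pandas.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("preprocess_demo").getOrCreate()

df = spark.createDataFrame(
    [("a", None, "2024-01-01"), ("b", 3.5, "2024-01-02")],
    "id string, score double, event_date string",
)

clean = (
    df.fillna({"score": 0.0})
    .withColumn("event_date", F.to_date("event_date"))
    .filter(F.col("id").isNotNull())
)

# Hand a bounded, cleaned sample to pandas for feature work or model training.
sample_pdf = clean.limit(1000).toPandas()
print(sample_pdf.head())
```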
Good to have (to filter or prioritize candidates)
- Experience with testing libraries such as pytest for writing unit-tests for the developed code.
- Knowledge of Machine Learning algorithms and libraries would be good to have; implementation experience would be an added advantage.
- Knowledge and experience of data lakes, Docker, and Kubernetes would be good to have.
- Knowledge of Azure Functions, Elasticsearch, etc. would be good to have.
- Experience with model versioning (MLflow) and data versioning would be beneficial.
- Experience with microservices libraries or with Python libraries such as Flask for hosting ML services and models would be great.
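For the Flask-based model hosting mentioned above, a minimal sketch; the /predict route and the stand-in predict_one function are hypothetical, and a real service would load a trained model (for example via MLflow) instead.

```python
# Illustrative Flask service exposing a prediction endpoint.
# predict_one is a stand-in; a real service would call a trained model.
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_one(features):
    # Placeholder scoring logic.
    return sum(features)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    score = predict_one(payload["features"])
    return jsonify({"score": score})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```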
Datametica is looking for talented SQL engineers who would get training & the opportunity to work on Cloud and Big Data Analytics.
Mandatory Skills:
- Strong in SQL development
- Hands-on at least one scripting language - preferably shell scripting
- Development experience in Data warehouse projects
Opportunities:
- Selected candidates will be provided training opportunities on one or more of the following: Google Cloud, AWS, DevOps tools, and Big Data technologies like Hadoop, Pig, Hive, Spark, Sqoop, Flume, and Kafka
- Would get a chance to be part of the enterprise-grade implementation of Cloud and Big Data systems
- Will play an active role in setting up the Modern data platform based on Cloud and Big Data
- Would be part of teams with rich experience in various aspects of distributed systems and computing
- 1-5 years of experience in building and maintaining robust data pipelines, enriching data, and building low-latency, high-performance data analytics applications.
- Experience handling complex, high-volume, multi-dimensional data and architecting data products in streaming, serverless, and microservices-based architectures and platforms.
- Experience in Data warehousing, Data modeling, and Data architecture.
- Expert-level proficiency with relational and NoSQL databases.
- Expert-level proficiency in Python and PySpark.
- Familiarity with Big Data technologies and utilities (Spark, Hive, Kafka, Airflow).
- Familiarity with cloud services (preferably AWS)
- Familiarity with MLOps processes such as data labeling, model deployment, data-model feedback loop, data drift.
Key Roles/Responsibilities:
- Act as a technical leader for resolving problems, with both technical and non-technical audiences.
- Identifying and solving issues with data pipelines regarding consistency, integrity, and completeness.
- Lead data initiatives, architecture design discussions, and implementation of next-generation BI solutions.
- Partner with data scientists and tech architects to build advanced, scalable, efficient self-service BI infrastructure.
- Provide thought leadership and mentor data engineers in information presentation and delivery.
ETL Developer – Talend
Job Duties:
- The ETL Developer is responsible for the design and development of ETL jobs that follow standards and best practices and are maintainable, modular, and reusable.
- Proficiency with Talend or Pentaho Data Integration / Kettle.
- ETL Developer will analyze and review complex object and data models and the metadata
repository in order to structure the processes and data for better management and efficient
access.
- Working on multiple projects, and delegating work to Junior Analysts to deliver projects on time.
- Training and mentoring Junior Analysts and building their proficiency in the ETL process.
- Preparing mapping document to extract, transform, and load data ensuring compatibility with
all tables and requirement specifications.
- Experience in ETL system design and development with Talend / Pentaho PDI is essential.
- Create quality rules in Talend.
- Tune Talend / Pentaho jobs for performance optimization.
- Write relational (SQL) and multidimensional (MDX) database queries.
- Functional knowledge of Talend Administration Center / Pentaho Data Integrator, job servers and load-balancing setup, and all related administrative functions.
- Develop, maintain, and enhance unit test suites to verify the accuracy of ETL processes, dimensional data, OLAP cubes, and various forms of BI content including reports, dashboards, and analytical models (a brief testing sketch follows this list).
- Exposure to MapReduce components of Talend / Pentaho PDI.
- Comprehensive understanding and working knowledge in Data Warehouse loading, tuning, and
maintenance.
- Working knowledge of relational database theory and dimensional database models.
- Creating and deploying Talend / Pentaho custom components is an added advantage.
- Java knowledge is nice to have.
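On the unit-testing duty above: Talend/Pentaho jobs are normally tested with their own tooling, but the general pattern of asserting on a transformation's output can be sketched in Python with pytest (the transform and test names below are hypothetical).

```python
# Illustrative pytest unit test for a row-level transformation (hypothetical).
def standardise_customer(row):
    # Transform under test: trim whitespace and upper-case the customer name.
    return {**row, "name": row["name"].strip().upper()}


def test_standardise_customer_trims_and_uppercases():
    row = {"id": 1, "name": "  asha  "}
    assert standardise_customer(row)["name"] == "ASHA"
```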
Skills and Qualification:
- BE, B.Tech / MS Degree in Computer Science, Engineering or a related subject.
- 3+ years of experience.
- Proficiency with Talend or Pentaho Data Integration / Kettle.
- Ability to work independently.
- Ability to handle a team.
- Good written and oral communication skills.