Qualifications & Experience:
▪ 2-4 years of overall experience in ETL, data pipelines, data warehouse development and database design
▪ Software solution development using Hadoop technologies such as MapReduce, Hive, Spark, Kafka, YARN/Mesos, etc. (a minimal Spark sketch follows this list)
▪ Expert in SQL, with at least 2 years of hands-on work with advanced SQL
▪ Good development skills in Java, Python or other languages
▪ Experience with EMR, S3
▪ Knowledge of and exposure to BI applications, e.g. Tableau, QlikView
▪ Comfortable working in an agile environment
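To illustrate the kind of Spark-on-Hive work described above, a minimal sketch follows; the table, column, and bucket names are hypothetical.

# Minimal PySpark sketch: aggregate a Hive table and write the result to S3.
# Table, column, and bucket names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-event-rollup")
    .enableHiveSupport()          # read managed Hive tables via the metastore
    .getOrCreate()
)

daily_counts = (
    spark.table("analytics.user_events")                 # hypothetical Hive table
    .where(F.col("event_date") == "2024-01-01")
    .groupBy("event_type")
    .agg(F.count("*").alias("events"), F.countDistinct("user_id").alias("users"))
)

daily_counts.write.mode("overwrite").parquet("s3a://example-bucket/rollups/2024-01-01/")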
Roles and Responsibilities:
• Work with stakeholders across the organization to identify opportunities for leveraging company data to drive business solutions.
• Mine and analyze data from company databases to drive optimization and improvement of processes, marketing techniques and business strategies.
• Develop custom data models and algorithms to apply to data sets.
• Use predictive modeling to improve and optimize business processes and solutions.
• Research and develop AI algorithms and assess their applicability to business problems in order to build intelligent systems.
Desired Profile:
• Need to have experience in data analytics, descriptive analytics and predictive analytics
• Need to have experience with regression/classification models: Linear Regression, Logistic Regression, Decision Trees, Random Forest, Neural Networks and other ML models (a minimal sketch follows this list)
• Need to have experience coding in Python, R, SQL, Java, C and C++
• Experience using AI/ML tools available from cloud service providers like AWS/Azure/GCP, including TensorFlow, SageMaker and Azure ML
• Experience with Data Visualization
• Strong problem-solving skills with an emphasis on product development.
• Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
• Proven track record of supporting global clients or internal stakeholders in data science projects.
• Technical knowledge in Artificial Intelligence (AI), image processing and/or video analytics
• Good understanding of the latest research and technologies in Artificial Intelligence
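To illustrate the model families listed above, a minimal sketch follows; it uses synthetic data and assumed parameters rather than anything specific to this role.

# Illustrative sketch only: fit and compare two of the model families named above
# on a synthetic dataset; all data and parameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=200)):
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{model.__class__.__name__}: test AUC = {auc:.3f}")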
Good to Have:
• Need to have background/education/experience in Statistics
• Experience in working across multiple geographic borders and time zones
• Outstanding spoken and written communication skills: able to deal confidently, tactfully and appropriately with people of different disciplines and at all levels of the organization.
• Communicate technical concepts effectively to non-technical audiences
• Able to build strong relationships with multiple global stakeholders quickly via virtual tools, listen, understand and respond to any concerns.
• Comfortable working within a large and complex environment with multiple stakeholders and interest groups. Ability to motivate and influence others to achieve results.
We are seeking an experienced Senior Data Platform Engineer to join our team. The ideal candidate should have extensive experience with PySpark, Airflow, Presto, Hive, Kafka and Debezium, and should be passionate about developing scalable and reliable data platforms.
Responsibilities:
- Design, develop, and maintain our data platform architecture using PySpark, Airflow, Presto, Hive, Kafka, and Debezium (a minimal Airflow sketch follows this list).
- Develop and maintain ETL processes to ingest, transform, and load data from various sources into our data platform.
- Work closely with data analysts, data scientists, and other stakeholders to understand their requirements and design solutions that meet their needs.
- Implement and maintain data governance policies and procedures to ensure data quality, privacy, and security.
- Continuously monitor and optimize the performance of our data platform to ensure scalability, reliability, and cost-effectiveness.
- Keep up-to-date with the latest trends and technologies in the field of data engineering and share knowledge and best practices with the team.
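To illustrate the kind of pipeline orchestration described above, a minimal Airflow sketch follows; the DAG id, schedule, and job script path are assumptions, and it assumes Airflow 2.4+ where the schedule parameter replaces schedule_interval.

# Hedged sketch of an Airflow DAG that schedules a daily PySpark ETL job.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_ingest_to_hive",      # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ style
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="spark_etl",
        # Submit a (hypothetical) PySpark job that merges Debezium change events
        # landed in object storage into a Hive table for the execution date.
        bash_command="spark-submit --master yarn /opt/jobs/merge_cdc_events.py {{ ds }}",
    )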
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or related field.
- 5+ years of experience in data engineering or related fields.
- Strong proficiency in PySpark, Airflow, Presto, Hive, data lakes, and Debezium.
- Experience with data warehousing, data modeling, and data governance.
- Experience working with large-scale distributed systems and cloud platforms (e.g., AWS, GCP, Azure).
- Strong problem-solving skills and ability to work independently and collaboratively.
- Excellent communication and interpersonal skills.
If you are a self-motivated and driven individual with a passion for data engineering and a strong background in PySpark, Airflow, Presto, Hive, data lakes, and Debezium, we encourage you to apply for this exciting opportunity. We offer competitive compensation, comprehensive benefits, and a collaborative work environment that fosters innovation and growth.
Data Engineer
Required skill set: AWS Glue, AWS Lambda, AWS SNS/SQS, AWS Athena, Spark, Snowflake, Python
Mandatory Requirements
- Experience in AWS Glue
- Experience in Apache Parquet
- Proficient in AWS S3 and data lake
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices (a minimal sketch follows this list).
- Scripting languages: Python and PySpark
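To illustrate file-based ingestion into an S3 data lake as Parquet, a minimal sketch follows; file names, bucket paths, and columns are assumptions, and reading/writing s3:// paths with pandas assumes s3fs is installed.

# Hedged sketch of file-based ingestion: read a delivered CSV and write partitioned
# Parquet to an S3 data lake path. Bucket, file, and column names are assumptions.
import pandas as pd

df = pd.read_csv("s3://example-landing-bucket/incoming/orders_2024-01-01.csv")  # needs s3fs
df["ingest_date"] = "2024-01-01"

df.to_parquet(
    "s3://example-lake-bucket/raw/orders/",
    partition_cols=["ingest_date"],   # partitioned layout keeps Athena/Snowflake scans cheap
    index=False,
)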
CORE RESPONSIBILITIES
- Create and manage cloud resources in AWS
- Ingest data from sources that expose it through different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems; implement data ingestion and processing with the help of big data technologies
- Process and transform data using various technologies such as Spark and cloud services; understand your part of the business logic and implement it using the language supported by the base data platform
- Develop automated data quality checks to make sure the right data enters the platform and to verify the results of calculations (a minimal check is sketched after this list)
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Key participant in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior members and bring best industry practices
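To illustrate the automated data quality checks mentioned above, a minimal sketch follows; the thresholds, table path, and column names are assumptions.

# Hedged sketch of a data quality gate run before data enters the platform;
# path and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3a://example-lake/raw/orders/ingest_date=2024-01-01/")

checks = {
    "non_empty": df.count() > 0,
    "no_null_ids": df.filter(F.col("order_id").isNull()).count() == 0,
    "amounts_positive": df.filter(F.col("amount") <= 0).count() == 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")  # block the downstream load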
QUALIFICATIONS
- 5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of at least one language: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
  - Data mining/programming tools (e.g. SAS, SQL, R, Python)
  - Database technologies (e.g. PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization (e.g. Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD: Jenkins, GitLab, Jira, Confluence, and other related tools
We are looking for a Quantitative Developer who is passionate about financial markets and wants to join a scale-up with an excellent track record and growth potential in an innovative and fast-growing industry.
As a Quantitative Developer, you will be working on the infrastructure of our platform, as part of a very ambitious team.
At QCAlpha you have the freedom to choose the path that leads to the solution and get a lot of responsibility.
Responsibilities
• Design, develop, test, and deploy elegant software solutions for automated trading systems
• Building high-performance, bullet-proof components for both live trading and simulation
• Responsible for technology infrastructure systems development, which includes connectivity, maintenance, and internal automation processes
• Achieving trading system robustness through automated reconciliation and system-wide alerts
Requirements
• Bachelor’s degree or higher in computer science or other quantitative discipline
• Strong fundamental knowledge of object-oriented programming, algorithms, data structures and design patterns.
• Familiar with the following technology stacks: Linux shell, Python and its ecosystem, NumPy, Pandas, SQL, Redis, Docker or similar system
• Experience with Python frameworks such as Django or Flask.
• Solid understanding of Git and CI/CD.
• Excellent design, debugging and problem-solving skills.
• Proven versatility and ability to pick up new technologies and learn systems quickly.
• Trading Execution development and support experience is a plus.
Job Responsibilities:-
- Develop robust, scalable and maintainable machine learning models to answer business problems against large data sets.
- Build methods for document clustering, topic modeling, text classification, named entity recognition, sentiment analysis, and POS tagging.
- Perform elements of data cleaning, feature selection and feature engineering and organize experiments in conjunction with best practices.
- Benchmark, apply, and test algorithms against success metrics. Interpret the results in terms of relating those metrics to the business process.
- Work with development teams to ensure models can be implemented as part of a delivered solution replicable across many clients.
- Knowledge of Machine Learning, NLP, Document Classification, Topic Modeling and Information Extraction with a proven track record of applying them to real problems.
- Experience working with big data systems and big data concepts.
- Ability to provide clear and concise communication both with other technical teams and non-technical domain specialists.
- Strong team player; ability to provide both a strong individual contribution but also work as a team and contribute to wider goals is a must in this dynamic environment.
- Experience with noisy and/or unstructured textual data.
- Knowledge of knowledge graphs and NLP, including summarization, topic modelling, etc.
- Strong coding ability with statistical analysis tools in Python or R, and general software development skills (source code management, debugging, testing, deployment, etc.)
- Working knowledge of various text mining algorithms and their use-cases such as keyword extraction, PLSA, LDA, HMM, CRF, deep learning & recurrent ANN, word2vec/doc2vec, Bayesian modeling.
- Strong understanding of text pre-processing and normalization techniques, such as tokenization, POS tagging and parsing, and how they work at a low level (see the sketch after this list).
- Excellent problem solving skills.
- Strong verbal and written communication skills
- Master's or higher in data mining or machine learning, or equivalent practical analytics / modelling experience
- Practical experience in using NLP related techniques and algorithms
- Experience in open source coding and communities desirable.
- Able to containerize models and associated modules and work in a microservices environment
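To illustrate the topic-modeling and text pre-processing work described above, a minimal sketch follows; the documents and parameters are made up for illustration.

# Illustrative sketch of topic modeling on a tiny, made-up corpus; in practice this
# would run over the client's document store with tuned parameters.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "claims adjuster reviewed the auto policy and approved the payout",
    "quarterly earnings call discussed revenue growth and new markets",
    "the underwriter flagged missing documents on the home insurance claim",
]

# Tokenize and normalize: lowercase, drop English stop words, keep unigrams.
vectorizer = CountVectorizer(stop_words="english", lowercase=True)
term_matrix = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(term_matrix)

terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {topic_id}: {top_terms}")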
- 5+ years of experience in a Data Engineering role on cloud environment
- Must have good experience in Scala/PySpark (preferably in a Databricks environment)
- Extensive experience with Transact-SQL.
- Experience in Databricks/Spark.
- Strong experience in data warehouse projects
- Expertise in database development projects with ETL processes.
- Manage and maintain data engineering pipelines
- Develop batch processing, streaming and integration solutions
- Experienced in building and operationalizing large-scale enterprise data solutions and applications
- Using one or more of Azure data and analytics services in combination with custom solutions
- Azure Data Lake, Azure SQL DW (Synapse), and SQL Database products or equivalent products from other cloud services providers
- In-depth understanding of data management (e.g. permissions, security, and monitoring).
- Cloud code repositories, e.g. Azure DevOps/GitHub, Git
- Experience in an agile environment (Azure DevOps preferred).
Good to have
- Manage source data access security
- Automate Azure Data Factory pipelines
- Continuous Integration/Continuous deployment (CICD) pipelines, Source Repositories
- Experience in implementing and maintaining CICD pipelines
- Understanding of Power BI and the Delta Lakehouse architecture
- Knowledge of software development best practices.
- Excellent analytical and organization skills.
- Effective working in a team as well as working independently.
- Strong written and verbal communication skills.
A degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field. They should also have experience using the following software/tools:
● Experience with big data tools: Hadoop, Hive, Spark, Kafka, etc.
● Experience querying multiple SQL and NoSQL databases, including Oracle, MySQL, MongoDB, etc.
● Experience with Redis, RabbitMQ, and Elasticsearch is desirable.
● Strong experience with object-oriented/functional/scripting languages: Python (preferred), Core Java, JavaScript, Scala, Shell scripting, etc.
● Must have skills in debugging complex code; experience with ML/AI algorithms is a plus.
● Experience with a version control tool such as Git is mandatory.
● Experience with AWS cloud services: EC2, EMR, RDS, Redshift, S3
● Experience with stream-processing systems such as Storm and Spark Streaming (see the sketch after this list)
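To illustrate the stream-processing experience mentioned above, a minimal Spark Structured Streaming sketch follows; broker addresses, topic, and checkpoint path are assumptions, and the Kafka source also requires the spark-sql-kafka package on the classpath.

# Hedged sketch of a Spark Structured Streaming job reading from Kafka and
# counting events per one-minute window; names and addresses are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clickstream-stream").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "clickstream")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
)

counts = events.groupBy(F.window("timestamp", "1 minute")).count()

query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")   # console sink for illustration; a real job would write to a table
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
    .start()
)
query.awaitTermination()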
This is the first senior person we are bringing for this role. This person will start with the training program but will go on to build a team and eventually also be responsible for the entire training program + Bootcamp.
We are looking for someone who is fairly senior and has experience in data and tech. At some level, we have all the technical expertise to teach you the data stack as needed. So it's not super important that you know all the tools. However, basic knowledge of the stack is a requirement. The training program covers 2 parts - Technology (our stack) and Process (how we work with clients). Both are super important.
- Full-time flexible working schedule and own end-to-end training
- Self-starter - who can communicate effectively and proactively
- Function effectively with minimal supervision.
- You can train and mentor potential 5x engineers on Data Engineering skillsets
- You can spend time on self-learning and teaching for new technology when needed
- You are an extremely proactive communicator, who understands the challenges of remote/virtual classroom training and the need to over-communicate to offset those challenges.
Requirements
- Proven experience as a corporate trainer, or a passion for teaching and providing training
- Expertise in the data engineering space, with good experience in data collection, data ingestion, data modeling, data transformation, and data visualization technologies and techniques
- Experience training working professionals on in-demand skills like Snowflake, dbt, Fivetran, Google Data Studio, etc.
- Training/Implementation Experience using Fivetran, DBT Cloud, Heap, Segment, Airflow, Snowflake is a big plus
- Insurance P&C and Specialty domain experience a plus
- Experience in a cloud-based architecture preferred, such as Databricks, Azure Data Lake, Azure Data Factory, etc.
- Strong understanding of ETL fundamentals and solutions. Should be proficient in writing advanced/complex SQL; expertise in performance tuning and optimization of SQL queries is required.
- Strong experience in Python/PySpark and Spark SQL
- Experience in troubleshooting data issues, analyzing end to end data pipelines, and working with various teams in resolving issues and solving complex problems.
- Strong experience developing Spark applications using PySpark and SQL for data extraction, transformation, and aggregation from multiple formats, analyzing and transforming the data to uncover insights and actionable intelligence for internal and external use (a minimal sketch follows this list)
- Must have 5-8 years of experience in handling data
- Must have the ability to interpret large amounts of data and to multi-task
- Must have strong knowledge of and experience with programming (Python), Linux/Bash scripting, and databases (SQL, etc.)
- Must have strong analytical and critical thinking to resolve business problems using data and tech
- Must have domain familiarity with and interest in cloud technologies (GCP, Microsoft Azure, AWS), open-source technologies, and enterprise technologies
- Must have the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy.
- Must have good communication skills
- Working knowledge/exposure to ElasticSearch, PostgreSQL, Athena, PrestoDB, Jupyter Notebook
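To illustrate the PySpark + Spark SQL pattern described above, a minimal sketch follows; the table, columns, and S3 paths are illustrative assumptions.

# Hedged sketch of extract, transform, and aggregate with PySpark and Spark SQL;
# all names and paths are made up for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims-aggregation").getOrCreate()

spark.read.parquet("s3a://example-lake/curated/claims/").createOrReplaceTempView("claims")

summary = spark.sql("""
    SELECT policy_type,
           COUNT(*)            AS claim_count,
           SUM(paid_amount)    AS total_paid,
           AVG(days_to_settle) AS avg_days_to_settle
    FROM claims
    WHERE claim_status = 'CLOSED'
    GROUP BY policy_type
""")

summary.write.mode("overwrite").parquet("s3a://example-lake/marts/claims_summary/")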