We are looking for a Data Engineer who will be responsible for collecting, storing, processing, and analyzing large data sets coming from different sources.
- Work with Big Data tools and frameworks to provide the requested capabilities
- Identify development needs to improve and streamline operations
- Develop and manage BI solutions
- Implement ETL processes and data warehousing
- Monitor performance and manage infrastructure
- Proficient understanding of distributed computing principles
- Proficiency with Hadoop and Spark
- Experience building stream-processing systems using solutions such as Kafka and Spark Streaming
- Good knowledge of data querying tools such as SQL and Hive
- Knowledge of various ETL techniques and frameworks
- Experience with at least one of Python, Java, or Scala
- Experience with cloud services such as AWS or GCP
- Experience with NoSQL databases such as DynamoDB and MongoDB will be an advantage
- Excellent written and verbal communication skills
About Yulu Bikes Pvt Ltd
Smart and Sustainable Micro Mobility for Urban India.
A company driven by the passion to transform the mobility landscape, Yulu truly believes in achieving the impossible. We have carefully handpicked individuals who place their confidence and commitment in Yulu's vision: to reduce traffic congestion, and to do so sustainably. Yuluites challenge the everyday routine and work hard to create meaningful solutions that make mobility a seamless experience for everyone.
If you’re passionate about technology, the environment, and changing the way the world moves, we’d love to welcome you! Join us, and together we'll reinvent every city's mobility landscape.
We are currently seeking talented and highly motivated Data Engineers to lead the development of our discovery and support platform. The successful candidate will join a small, global team of data-focused associates who have built and maintained a best-in-class traditional, Kimball-based data warehouse on SQL Server. The successful candidate will lead the conversion of the existing data structure into an AWS-focused big data framework and assist in identifying existing and augmented data sets and building pipelines to bring them into this environment. The successful candidate must be able to lead and assist in architecting and constructing the AWS foundation and initial data ports.
Specific responsibilities will be to:
- Lead and assist in designing, deploying, and maintaining robust methods for data management and analysis, primarily using the AWS cloud
- Develop computational methods for integrating multiple data sources to facilitate target and algorithmic
- Provide computational tools to ensure trustworthy data sources and facilitate reproducible
- Provide leadership in architecting, designing, and building the target AWS data environment (such as a data lake and data warehouse).
- Work with on-staff subject-matter experts to evaluate existing data sources, the DW, ETL ports, existing stovepipe data sources, and available augmentation data sets.
- Implement methods for execution of high-throughput assays and subsequent acquisition, management, and analysis of the
- Assist in the communications of complex scientific, software and data concepts and
- Assist in the identification and hiring of additional data engineer associates.
- Master’s degree (or equivalent experience) in computer science, data science, or a scientific field relevant to healthcare in the United States
- Extensive experience with a high-level programming language (e.g., Python or Scala) and relevant AWS services.
- Experience in AWS cloud services like S3, Glue, Lake Formation, Athena, and others.
- Experience in creating and managing Data Lakes and Data Warehouses.
- Experience with big data tools like Hadoop, Hive, Talend, Apache Spark, Kafka.
- Advanced SQL scripting.
- Database Management Systems (for example, Oracle, MySQL or MS SQL Server)
- Hands-on experience with data transformation tools, data processing, and data modeling in a big data environment.
- Understanding of the basics of distributed systems.
- Experience working and communicating with subject-matter experts
- The ability to work independently as well as to collaborate on multidisciplinary, global teams in a startup fashion, alongside data associates skilled in traditional data warehousing and business teams unfamiliar with data science techniques
- Strong communication, data presentation, and visualization skills
- Role: Machine Learning Lead
- Experience: 5+ Years
- Employee strength: 80+
- Remuneration: Most competitive in the market
• Advanced knowledge of Python.
• Object Oriented Programming skills.
• Mathematical understanding of machine learning and deep learning algorithms.
• Thorough grasp of statistical terminology.
• Libraries: TensorFlow, Keras, PyTorch, Statsmodels, Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn, Plotly
• Algorithms: Ensemble Algorithms, Artificial Neural Networks and Deep Learning, Clustering Algorithms, Decision Tree Algorithms, Dimensionality Reduction Algorithms, etc.
• MySQL, MongoDB, Elasticsearch, or other SQL and NoSQL database implementations.
If interested, kindly share your CV at tanya@tigihr.com
- Experience in AWS Glue
- Experience with Apache Parquet
- Proficient in AWS S3 and data lakes
- Knowledge of Snowflake
- Understanding of file-based ingestion best practices.
- Scripting languages: Python and PySpark
- Create and manage cloud resources in AWS
- Ingest data from sources that expose it through different technologies, such as RDBMS, REST HTTP APIs, flat files, streams, and time-series data from various proprietary systems. Implement data ingestion and processing using Big Data technologies.
- Process and transform data using technologies such as Spark and cloud services. You will need to understand your part of the business logic and implement it in the language supported by the base data platform.
- Develop automated data quality checks to ensure the right data enters the platform, and verify the results of the calculations.
- Develop an infrastructure to collect, transform, combine and publish/distribute customer data.
- Define process improvement opportunities to optimize data collection, insights and displays.
- Ensure data and results are accessible, scalable, efficient, accurate, complete and flexible
- Identify and interpret trends and patterns from complex data sets
- Construct a framework utilizing data visualization tools and techniques to present consolidated analytical and actionable results to relevant stakeholders.
- Participate actively in regular Scrum ceremonies with the agile teams
- Proficient at developing queries, writing reports and presenting findings
- Mentor junior team members and bring in industry best practices
- 5-7+ years’ experience as a data engineer in consumer finance or an equivalent industry (consumer loans, collections, servicing, optional products, and insurance sales)
- Strong background in math, statistics, computer science, data science or related discipline
- Advanced knowledge of at least one of the following languages: Java, Scala, Python, C#
- Production experience with: HDFS, YARN, Hive, Spark, Kafka, Oozie / Airflow, Amazon Web Services (AWS), Docker / Kubernetes, Snowflake
- Proficient with:
  - Data mining/programming tools (e.g., SAS, SQL, R, Python)
  - Database technologies (e.g., PostgreSQL, Redshift, Snowflake, and Greenplum)
  - Data visualization (e.g., Tableau, Looker, MicroStrategy)
- Comfortable learning about and deploying new technologies and tools.
- Organizational skills and the ability to handle multiple projects and priorities simultaneously and meet established deadlines.
- Good written and oral communication skills and the ability to present results to non-technical audiences
- Knowledge of business intelligence and analytical tools, technologies and techniques.
Familiarity and experience in the following is a plus:
- AWS certification
- Spark Streaming
- Kafka Streaming / Kafka Connect
- ELK Stack
- Cassandra / MongoDB
- CI/CD and related tools: Jenkins, GitLab, Jira, Confluence, or similar
SteelEye is the only regulatory compliance technology and data analytics firm that offers transaction reporting, record keeping, trade reconstruction, best execution and data insight in one comprehensive solution. The firm’s scalable secure data storage platform offers encryption at rest and in flight and best-in-class analytics to help financial firms meet regulatory obligations and gain competitive advantage.
The company has a highly experienced management team and a strong board, who have decades of technology and management experience and have worked in senior positions at many leading international financial businesses. We are a young company that shares a commitment to learning, being smart, working hard, and being honest in all we do, and to striving to do that better each day. We value all our colleagues equally, and everyone should feel able to speak up, propose an idea, or point out a mistake, and feel safe, happy, and able to be themselves at work.
Being part of a start-up can be as exciting as it is challenging. You will be part of the SteelEye team not just because of your talent but also because of your entrepreneurial flair, which we thrive on at SteelEye. This means we want you to be curious, contribute, ask questions, and share ideas. We encourage you to get involved in helping shape our business.
What you'll do
- Deliver plugins for our Python-based ETL pipelines.
- Deliver Python services for provisioning and managing cloud infrastructure.
- Design, develop, unit test, and support code in production.
- Deal with challenges associated with large volumes of data.
- Manage expectations with internal stakeholders and context switch between multiple deliverables as priorities change.
- Thrive in an environment that uses AWS and Elasticsearch extensively.
- Keep abreast of technology and contribute to the evolution of the product.
- Champion best practices and provide mentorship.
What we're looking for
- Python 3.
- Python libraries used for data (such as pandas, numpy).
- Performance tuning.
- Object Oriented Design and Modelling.
- Delivering complex software, ideally in a FinTech setting.
- CI/CD tools.
- Knowledge of design patterns.
- Sharp analytical and problem-solving skills.
- Strong sense of ownership.
- Demonstrable desire to learn and grow.
- Excellent written and oral communication skills.
- Mature collaboration and mentoring abilities.
What will you get?
- This is an individual contributor role, so if you love to code, solve complex problems, and build amazing products without worrying about anything else, this is the role for you.
- You will have the chance to learn from the best in the business who have worked across the world and are technology geeks.
- A company that always appreciates ownership and initiative. If you are full of ideas, this role is for you.
About the Company:
This opportunity is with an AI drone technology startup funded by the Indian Army. It is developing cutting-edge products to help the Indian Army gain an edge in new-age warfare.
The company is working on using drones to neutralize terrorists hidden in deep forests. This is a chance to contribute to securing our borders against the enemy.
- Extensive knowledge of machine learning and deep learning techniques
- Solid background in image processing/computer vision
- Experience in building datasets for computer vision tasks
- Experience working with and creating data structures/architectures
- Proficiency in at least one major machine learning framework, such as TensorFlow or PyTorch
- Experience visualizing data for stakeholders
- Ability to analyze and debug complex algorithms
- Highly skilled in Python scripting
- Creativity and curiosity for solving highly complex problems
- Excellent communication and collaboration skills
An MS in Engineering, Applied Mathematics, Data Science, Computer Science, or an equivalent field with 3 years of industry experience, a PhD, or equivalent industry experience.
- Experience with relational SQL & NoSQL databases including MySQL & MongoDB.
- Familiar with the basic principles of distributed computing and data modeling.
- Experience with distributed data pipeline frameworks like Celery, Apache Airflow, etc.
- Experience with NLP and NER models is a bonus.
- Experience building reusable code and libraries for future use.
- Experience building REST APIs.
Preference for candidates working in tech product companies
Job Role: Associate Manager (Database Development)
- Optimizing the performance of stored procedures and SQL queries to deliver large amounts of data within a few seconds.
- Designing and developing complex queries, views, functions, and stored procedures that work seamlessly with the application/development team’s data needs.
- Responsible for providing solutions to all data-related needs to support existing and new
- Creating scalable structures to cater to large user bases and manage high workloads
- Responsible for every stage of a project, from requirements gathering to implementation and maintenance.
- Developing custom stored procedures and packages to support new enhancement needs.
- Working with multiple teams to design, develop and deliver early warning systems.
- Reviewing query performance and optimizing code
- Writing queries used for front-end applications
- Designing and coding database tables to store the application data
- Data modelling to visualize database structure
- Working with application developers to create optimized queries
- Maintaining database performance by troubleshooting problems.
- Accomplishing platform upgrades and improvements by supervising system programming.
- Securing database by developing policies, procedures, and controls.
- Designing and managing deep statistical systems.
Desired Skills and Experience:
- 7+ years of experience in database development
- A minimum of 4 years of experience with PostgreSQL is a must
- Experience with and in-depth knowledge of PL/SQL
- Ability to come up with multiple possible ways of solving a problem and decide on the implementation approach that best suits the use case
- Knowledge of database administration and experience using CLI tools for administration
- Experience in Big Data technologies is an added advantage
- Secondary platforms: MS SQL 2005/2008, Oracle, MySQL
- Ability to take ownership of tasks and flexibility to work individually or in a team
- Ability to communicate with teams and clients across time zones and global regions
- Good communication skills and self-motivation
- Should have the ability to work under pressure
- Knowledge of NoSQL and Cloud Architecture will be an advantage
- Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world.
- Verifying data quality, and/or ensuring it via data cleaning.
- Able to adapt and work quickly to produce ML-driven output that improves stakeholders' decision-making.
- To design and develop Machine Learning systems and schemes.
- To perform statistical analysis and fine-tune models using test results.
- To train and retrain ML systems and models as and when necessary.
- To deploy ML models in production and maintain the cost of cloud infrastructure.
- To develop Machine Learning apps according to client and data scientist requirements.
- To analyze the problem-solving capabilities and use-cases of ML algorithms and rank them by how successful they are in meeting the objective.
- Experience solving real-time problems using ML and deep learning models deployed in real time, with strong projects to showcase.
- Proficiency in Python and experience working with Jupyter, Google Colab, and cloud-hosted notebooks such as AWS SageMaker and Databricks.
- Proficiency in working with libraries such as scikit-learn, TensorFlow, OpenCV (cv2), PySpark, pandas, NumPy, and related libraries.
- Expert in visualising and manipulating complex datasets.
- Proficiency in working with visualisation libraries such as seaborn, Plotly, and Matplotlib.
- Proficiency in Linear Algebra, statistics and probability required for Machine Learning.
- Proficiency in ML algorithms, for example gradient boosting, stacked ensembles, classification algorithms, and deep learning algorithms. Experience in hyperparameter tuning of various models and comparing algorithm performance is required.
- Big data Technologies such as Hadoop stack and Spark.
- Basic use of cloud platforms (e.g., VMs such as EC2).
- Brownie points for Kubernetes and Task Queues.
- Strong written and verbal communication skills.
- Experience working in an Agile environment.