Data Analyst
Job Description
Summary
Are you passionate about handling large & complex data problems, want to make an impact and have the desire to work on ground-breaking big data technologies? Then we are looking for you.
At Amagi, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Would you like to work in a fast-paced environment where your technical abilities will be challenged on a day-to-day basis? If so, Amagi’s Data Engineering and Business Intelligence team is looking for passionate, detail-oriented, technical savvy, energetic team members who like to think outside the box.
Amagi’s Data warehouse team deals with petabytes of data catering to a wide variety of real-time, near real-time and batch analytical solutions. These solutions are an integral part of business functions such as Sales/Revenue, Operations, Finance, Marketing and Engineering, enabling critical business decisions. Designing, developing, scaling and running these big data technologies using native technologies of AWS and GCP are a core part of our daily job.
Key Qualifications
- Experience in building highly cost optimised data analytics solutions
- Experience in designing and building dimensional data models to improve accessibility, efficiency and quality of data
- Experience (hands on) in building high quality ETL applications, data pipelines and analytics solutions ensuring data privacy and regulatory compliance.
- Experience in working with AWS or GCP
- Experience with relational and NoSQL databases
- Experience to full stack web development (Preferably Python)
- Expertise with data visualisation systems such as Tableau and Quick Sight
- Proficiency in writing advanced SQL queries with expertise in performance tuning handling large data volumes
- Familiarity with ML/AÍ technologies is a plus
- Demonstrate strong understanding of development processes and agile methodologies
- Strong analytical and communication skills. Should be self-driven, highly motivated and ability to learn quickly
Description
Data Analytics is at the core of our work, and you will have the opportunity to:
- Design Data-warehousing solutions on Amazon S3 with Athena, Redshift, GCP Bigtable etc
- Lead quick prototypes by integrating data from multiple sources
- Do advanced Business Analytics through ad-hoc SQL queries
- Work on Sales Finance reporting solutions using tableau, HTML5, React applications
We build amazing experiences and create depth in knowledge for our internal teams and our leadership. Our team is a friendly bunch of people that help each other grow and have a passion for technology, R&D, modern tools and data science.
Our work relies on deep understanding of the company needs and an ability to go through vast amounts of internal data such as sales, KPIs, forecasts, Inventory etc. One of the key expectations of this role would be to do data analytics, building data lakes, end to end reporting solutions etc. If you have a passion for cost optimised analytics and data engineering and are eager to learn advanced data analytics at a large scale, this might just be the job for you..
Education & Experience
A bachelor’s/master’s degree in Computer Science with 5 to 7 years of experience and previous experience in data engineering is a plus.
Similar jobs
Job Description: Data Engineer
We are looking for a curious Data Engineer to join our extremely fast-growing Tech Team at StanPlus
About RED.Health (Formerly Stanplus Technologies)
Get to know the team:
Join our team and help us build the world’s fastest and most reliable emergency response system using cutting-edge technology.
Because every second counts in an emergency, we are building systems and flows with 4 9s of reliability to ensure that our technology is always there when people need it the most. We are looking for distributed systems experts who can help us perfect the architecture behind our key design principles: scalability, reliability, programmability, and resiliency. Our system features a powerful dispatch engine that connects emergency service providers with patients in real-time
.
Key Responsibilities
● Build Data ETL Pipelines
● Develop data set processes
● Strong analytic skills related to working with unstructured datasets
● Evaluate business needs and objectives
● Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery
● Interpret trends and patterns
● Work with data and analytics experts to strive for greater functionality in our data system
● Build algorithms and prototypes
● Explore ways to enhance data quality and reliability
● Work with the Executive, Product, Data, and D esign teams, to assist with data-related technical issues and support their data infrastructure needs.
● Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
Key Requirements
● Proven experience as a data engineer, software developer, or similar of at least 3 years.
● Bachelor's / Master’s degree in data engineering, big data analytics, computer engineering, or related field.
● Experience with big data tools: Hadoop, Spark, Kafka, etc.
● Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
● Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
● Experience with Azure, AWS cloud services: EC2, EMR, RDS, Redshift
● Experience with BigQuery
● Experience with stream-processing systems: Storm, Spark-Streaming, etc.
● Experience with languages: Python, Java, C++, Scala, SQL, R, etc.
● Good hands-on with Hive, Presto.
Roles & Responsibilities:
-Adopt novel and breakthrough Deep Learning/Machine Learning technology to fully solve real world problems for different industries. -Develop prototypes of machine learning models based on existing research papers.
-Utilize published/existing models to meet business requirements. Tweak existing implementations to improve efficiencies and adapt for use-case variations.
-Optimize machine learning model training and inference time. -Work closely with development and QA teams in transitioning prototypes to commercial products
-Independently work end-to-end from data collection, preparation/annotation to validation of outcomes.
-Define and develop ML infrastructure to improve efficiency of ML development workflows.
Must Have:
- Experience in productizing and deployment of ML solutions.
- AI/ML expertise areas: Computer Vision with Deep Learning. Experience with object detection, classification, recognition; document layout and understanding tasks, OCR/ICR
. - Thorough understanding of full ML pipeline, starting from data collection to model building to inference.
- Experience with Python, OpenCV and at least a few framework/libraries (TensorFlow / Keras / PyTorch / spaCy / fastText / Scikit-learn etc.)
- Years with relevant experience:
5+ -Experience or Knowledge in ML OPS.
Good to Have: NLP: Text classification, entity extraction, content summarization. AWS, Docker.
Technical/Core skills
- Minimum 3 yrs of exp in Informatica Big data Developer(BDM) in Hadoop environment.
- Have knowledge of informatica Power exchange (PWX).
- Minimum 3 yrs of exp in big data querying tool like Hive and Impala.
- Ability to designing/development of complex mappings using informatica Big data Developer.
- Create and manage Informatica power exchange and CDC real time implementation
- Strong Unix knowledge skills for writing shell scripts and troubleshoot of existing scripts.
- Good knowledge of big data platforms and its framework.
- Good to have an experience in cloudera data platform (CDP)
- Experience with building stream processing systems using Kafka and spark
- Excellent SQL knowledge
Soft skills :
- Ability to work independently
- Strong analytical and problem solving skills
- Attitude of learning new technology
- Regular interaction with vendors, partners and stakeholders
- Handling Survey Scripting Process through the use of survey software platform such as Toluna, QuestionPro, Decipher.
- Mining large & complex data sets using SQL, Hadoop, NoSQL or Spark.
- Delivering complex consumer data analysis through the use of software like R, Python, Excel and etc such as
- Working on Basic Statistical Analysis such as:T-Test &Correlation
- Performing more complex data analysis processes through Machine Learning technique such as:
- Classification
- Regression
- Clustering
- Text
- Analysis
- Neural Networking
- Creating an Interactive Dashboard Creation through the use of software like Tableau or any other software you are able to use.
- Working on Statistical and mathematical modelling, application of ML and AI algorithms
What you need to have:
- Bachelor or Master's degree in highly quantitative field (CS, machine learning, mathematics, statistics, economics) or equivalent experience.
- An opportunity for one, who is eager of proving his or her data analytical skills with one of the Biggest FMCG market player.
About the Company
Blue Sky Analytics is a Climate Tech startup that combines the power of AI & Satellite data to aid in the creation of a global environmental data stack. Our funders include Beenext and Rainmatter. Over the next 12 months, we aim to expand to 10 environmental data-sets spanning water, land, heat, and more!
We are looking for a data scientist to join its growing team. This position will require you to think and act on the geospatial architecture and data needs (specifically geospatial data) of the company. This position is strategic and will also require you to collaborate closely with data engineers, data scientists, software developers and even colleagues from other business functions. Come save the planet with us!
Your Role
Manage: It goes without saying that you will be handling large amounts of image and location datasets. You will develop dataframes and automated pipelines of data from multiple sources. You are expected to know how to visualize them and use machine learning algorithms to be able to make predictions. You will be working across teams to get the job done.
Analyze: You will curate and analyze vast amounts of geospatial datasets like satellite imagery, elevation data, meteorological datasets, openstreetmaps, demographic data, socio-econometric data and topography to extract useful insights about the events happening on our planet.
Develop: You will be required to develop processes and tools to monitor and analyze data and its accuracy. You will develop innovative algorithms which will be useful in tracking global environmental problems like depleting water levels, illegal tree logging, and even tracking of oil-spills.
Demonstrate: A familiarity with working in geospatial libraries such as GDAL/Rasterio for reading/writing of data, and use of QGIS in making visualizations. This will also extend to using advanced statistical techniques and applying concepts like regression, properties of distribution, and conduct other statistical tests.
Produce: With all the hard work being put into data creation and management, it has to be used! You will be able to produce maps showing (but not limited to) spatial distribution of various kinds of data, including emission statistics and pollution hotspots. In addition, you will produce reports that contain maps, visualizations and other resources developed over the course of managing these datasets.
Requirements
These are must have skill-sets that we are looking for:
- Excellent coding skills in Python (including deep familiarity with NumPy, SciPy, pandas).
- Significant experience with git, GitHub, SQL, AWS (S3 and EC2).
- Worked on GIS and is familiar with geospatial libraries such as GDAL and rasterio to read/write the data, a GIS software such as QGIS for visualisation and query, and basic machine learning algorithms to make predictions.
- Demonstrable experience implementing efficient neural network models and deploying them in a production environment.
- Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
- Capable of writing clear and lucid reports and demystifying data for the rest of us.
- Be curious and care about the planet!
- Minimum 2 years of demonstrable industry experience working with large and noisy datasets.
Benefits
- Work from anywhere: Work by the beach or from the mountains.
- Open source at heart: We are building a community where you can use, contribute and collaborate on.
- Own a slice of the pie: Possibility of becoming an owner by investing in ESOPs.
- Flexible timings: Fit your work around your lifestyle.
- Comprehensive health cover: Health cover for you and your dependents to keep you tension free.
- Work Machine of choice: Buy a device and own it after completing a year at BSA.
- Quarterly Retreats: Yes there's work-but then there's all the non-work+fun aspect aka the retreat!
- Yearly vacations: Take time off to rest and get ready for the next big assignment by availing the paid leaves.
ears of Exp: 3-6+ Years
Skills: Scala, Python, Hive, Airflow, SparkLanguages: Java, Python, Shell Scripting
GCP: BigTable, DataProc, BigQuery, GCS, Pubsub
OR
AWS: Athena, Glue, EMR, S3, RedshiftMongoDB, MySQL, Kafka
Platforms: Cloudera / Hortonworks
AdTech domain experience is a plus.
Job Type - Full Time
-
Owns the end to end implementation of the assigned data processing components/product features i.e. design, development, dep
loyment, and testing of the data processing components and associated flows conforming to best coding practices -
Creation and optimization of data engineering pipelines for analytics projects.
-
Support data and cloud transformation initiatives
-
Contribute to our cloud strategy based on prior experience
-
Independently work with all stakeholders across the organization to deliver enhanced functionalities
-
Create and maintain automated ETL processes with a special focus on data flow, error recovery, and exception handling and reporting
-
Gather and understand data requirements, work in the team to achieve high-quality data ingestion and build systems that can process the data, transform the data
-
Be able to comprehend the application of database index and transactions
-
Involve in the design and development of a Big Data predictive analytics SaaS-based customer data platform using object-oriented analysis
, design and programming skills, and design patterns -
Implement ETL workflows for data matching, data cleansing, data integration, and management
-
Maintain existing data pipelines, and develop new data pipeline using big data technologies
-
Responsible for leading the effort of continuously improving reliability, scalability, and stability of microservices and platform