Senior Big Data Engineer
Note: Notice Period: 45 days
Banyan Data Services (BDS) is a US-based, data-focused company that specializes in comprehensive data solutions and services, headquartered in San Jose, California, USA.
We are looking for a Senior Hadoop Big Data Engineer who has expertise in solving complex data problems across a big data platform. You will be a part of our development team based out of Bangalore. This team focuses on the most innovative and emerging data infrastructure software and services to support highly scalable and available infrastructure.
It's a once-in-a-lifetime opportunity to join our rocket ship startup run by a world-class executive team. We are looking for candidates who aspire to be a part of the cutting-edge solutions and services we offer that address next-gen data evolution challenges.
· 5+ years of experience working with Java and Spring technologies
· At least 3 years of programming experience working with Spark on big data, including experience with data profiling and building transformations
· Knowledge of microservices architecture is a plus
· Experience with any NoSQL databases such as HBase, MongoDB, or Cassandra
· Experience with Kafka or any streaming tools
· Knowledge of Scala would be preferable
· Experience with agile application development
· Exposure to cloud technologies, including containers and Kubernetes
· Demonstrated experience performing DevOps for platforms
· Strong skills in data structures and algorithms, with the ability to write code of efficient complexity
· Exposure to Graph databases
· Passion for learning new technologies and the ability to do so quickly
· A Bachelor's degree in a computer-related field or equivalent professional experience is required
· Scope and deliver solutions with the ability to design solutions independently based on high-level architecture
· Design and develop big data-focused microservices
· Be involved in big data infrastructure, distributed systems, data modeling, and query processing
· Build software with cutting-edge technologies on the cloud
· Willingness to learn new technologies and work on research-oriented projects
· Proven interpersonal skills while contributing to team effort by accomplishing related results as needed
About Banyan Data Services
We're hell-bent on making this the most enjoyable job you've ever had. Send your resume to [email protected]
We foster a positive leadership culture and ensure that employees at all levels feel comfortable collaborating with one another.
Grow & Learn
We develop our employees by instilling a startup culture in them and by providing them with tech-savvy mentors and a passionate team to drive the highest quality of work.
The success and pleasure of employees are top concerns. No matter their level, employees feel valued in all aspects of their lives, including both their professional and personal aspirations.
We strive to create a diverse and inclusive workplace in which everyone, regardless of who they are or what they do for the company, feels equally involved and supported in all aspects of the workplace.
- 3+ years of industry experience in administering (including setting up, managing, monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka, ELK Stack, Fluentd and streaming databases like Druid
- Strong industry expertise with containerization technologies including Kubernetes and Docker Compose
- 2+ years of industry experience in developing scalable data ingestion processes and ETLs
- Experience with cloud platform services such as AWS, Azure or GCP, especially with EKS and managed Kafka
- Experience with scripting languages. Python experience highly desirable.
- 2+ years of industry experience in Python
- Experience with popular modern web frameworks such as Spring Boot, Play Framework, or Django
- Demonstrated expertise in building cloud-native applications
- Experience in administering (including setting up, managing, monitoring) data processing pipelines (both streaming and batch) using frameworks such as Kafka, ELK Stack, Fluentd
- Experience in API development using Swagger
- Strong expertise with containerization technologies including Kubernetes and Docker Compose
- Experience with cloud platform services such as AWS, Azure or GCP.
- Implementing automated testing platforms and unit tests
- Proficient understanding of code versioning tools, such as Git
- Familiarity with continuous integration tools such as Jenkins
- Design and implement large-scale data processing pipelines using Kafka, Fluentd and Druid
- Assist in DevOps operations
- Develop data ingestion processes and ETLs
- Design and implement APIs
- Identify performance bottlenecks and bugs, and devise solutions to these problems
- Help maintain code quality, organization, and documentation
- Communicate with stakeholders regarding various aspects of the solution.
- Mentor team members on best practices
Build out and manage a young data science vertical within the organization
Provide technical leadership in the areas of machine learning, analytics, and data sciences
Work with the team to create a roadmap that addresses the company’s requirements: identify business problems that can be solved using data science (data mining, analytics, and ML) and scope them out end to end.
Solve business problems by applying advanced Machine Learning algorithms and complex statistical models on large volumes of data.
Develop heuristics, algorithms, and models to deanonymize entities on public blockchains
Data Mining - Extend the organization’s proprietary dataset by introducing new data collection methods and by identifying new data sources.
Keep track of the latest trends in cryptocurrency usage on the open web and dark web, and develop countermeasures to defeat concealment techniques used by criminal actors.
Develop in-house algorithms to generate risk scores for blockchain transactions.
Work with data engineers to implement the results of your work.
Assemble large, complex data sets that meet functional / non-functional business requirements.
Build, scale and deploy holistic data science products after successful prototyping.
Clearly articulate and present recommendations to business partners, and influence future plans based on insights.
8+ years of relevant experience as a Data Scientist or Analyst. A few years of work experience solving NLP problems or other ML problems is a plus.
Must have previously managed a team of at least 5 data scientists or analysts, or be able to demonstrate prior experience in scaling a data science function from the ground up
Good understanding of Python, bash scripting, and basic cloud platform skills (on GCP or AWS)
Excellent communication skills and analytical skills
What you’ll get
Work closely with the Founders in helping grow the organization to the next level alongside some of the best and brightest talents around you
An excellent culture: we encourage collaboration, growth, and learning amongst the team
Competitive salary and equity
An autonomous and flexible role where you will be trusted with key tasks.
An opportunity to have a real impact and be part of a company with purpose.
Must Have Skills:
- Solid knowledge of DWH, ETL and big data concepts
- Excellent SQL skills (with knowledge of SQL analytic functions)
- Working experience with any ETL tool, e.g. SSIS / Informatica
- Working experience with any Azure or AWS big data tools.
- Experience implementing data jobs (batch / real-time streaming)
- Excellent written and verbal communication skills in English; self-motivated, with a strong sense of ownership and readiness to learn new tools and technologies
- Experience with PySpark / Spark SQL
- AWS Data Tools (AWS Glue, AWS Athena)
- Azure Data Tools (Azure Databricks, Azure Data Factory)
- Knowledge of Azure Blob, Azure File Storage, AWS S3, Elasticsearch / Redis Search
- Knowledge of the domain/function (across pricing, promotions and assortment).
- Implementation experience with schema and data validator frameworks (Python / Java / SQL)
- Knowledge of DQS and MDM.
- Independently work on ETL / DWH / big data projects
- Gather and process raw data at scale.
- Design and develop data applications using selected tools and frameworks as required and requested.
- Read, extract, transform, stage and load data to selected tools and frameworks as required and requested.
- Perform tasks such as writing scripts, web scraping, calling APIs, writing SQL queries, etc.
- Work closely with the engineering team to integrate your work into our production systems.
- Process unstructured data into a form suitable for analysis.
- Analyse processed data.
- Support business decisions with ad hoc analysis as needed.
- Monitor data performance and modify infrastructure as needed.
Responsibility: Smart Resource, having excellent communication skills
About the Company
Blue Sky Analytics is a Climate Tech startup that combines the power of AI & Satellite data to aid in the creation of a global environmental data stack. Our funders include Beenext and Rainmatter. Over the next 12 months, we aim to expand to 10 environmental datasets spanning water, land, heat, and more!
We are looking for a Data Lead - someone who works at the intersection of data science, GIS, and engineering. We want a leader who not only understands environmental data but can also quickly assemble large-scale datasets that are crucial to the well-being of our planet. Come save the planet with us!
Manage: This is a leadership position that requires long-term strategic thinking. You will be in charge of the daily operations of the data team. This includes running team standups, planning the execution of data generation, and ensuring the algorithms are put into production. You will also be the person in charge of dumbing down the data science for the rest of us who do not know what it means.
Love and Live Data: You will also take full responsibility for ensuring that the data we generate is accurate, clean, and ready to use for our clients. This entails understanding what the market needs, calculating feasibility, and building data pipelines. You should understand the algorithms that we use or need to use and take decisions on what would serve the needs of our clients well. We also want our Data Lead to be constantly probing for newer and more optimized ways of generating datasets. It would help if you were abreast of all the latest developments in the data science and environmental worlds. The Data Lead also has to be able to work with our Platform team on integrating the data into our platform and API portal.
Collaboration: We use Clubhouse to track and manage our projects across our organization, so you will need to collaborate with the team and follow up with members on a regular basis. For about 50% of the work, you need to be the pulse of the platform team. You'll collaborate closely with peers from other functions (Design, Product, Marketing, Sales, and Support, to name a few) on our overall product roadmap, on product launches, and on ongoing operations. You will find yourself working with the product management team to define and execute the feature roadmap. You will be expected to work closely with the CTO, reporting on daily operations and development. We don't believe in a top-down hierarchical approach and are transparent with everyone. This means honest and mutual feedback and the ability to adapt.
Teaching: Not exactly in the traditional sense. You'll recruit, coach, and develop engineers while ensuring that they are regularly receiving feedback and making rapid progress on personal and professional goals.
Humble and cool: Look, we will be upfront with you about one thing - our team is fairly young and is always buzzing with work. In this fast-paced setting, we are looking for someone who can stay cool, is humble, and is willing to learn. You are adaptable, can skill up fast, and are fearless at trying new methods. After all, you're in the business of saving the planet!
- A minimum of 5 years of industry experience.
- Exceptional at remote sensing data, GIS, and data science.
- Must have big data & data analytics experience
- Very good at documentation & speccing datasets
- Experience with AWS Cloud, Linux, Infra as Code & Docker (containers) is a must
- Coordinate with cross-functional teams (DevOps, QA, Design etc.) on planning and execution
- Lead, mentor and manage deliverables of a talented and highly motivated team of developers
- Must have experience in building, managing, growing & hiring data teams. Has built large-scale datasets from scratch
- Manage work on the team's Clubhouse and follow up with the team. For ~50% of the work, you need to be the pulse of the platform team
- Exceptional communication skills & ability to abstract away problems & build systems. Should be able to explain to the management anything & everything
- Quality control - you'll be responsible for maintaining a high quality bar for everything your team ships. This includes documentation and data quality
- Experience leading smaller teams would be a plus.
- Work from anywhere: Work by the beach or from the mountains.
- Open source at heart: We are building a community that you can use, contribute to, and collaborate on.
- Own a slice of the pie: Possibility of becoming an owner by investing in ESOPs.
- Flexible timings: Fit your work around your lifestyle.
- Comprehensive health cover: Health cover for you and your dependents to keep you tension free.
- Work Machine of choice: Buy a device and own it after completing a year at BSA.
- Quarterly Retreats: Yes, there's work, but then there's all the non-work and fun, aka the retreat!
- Yearly vacations: Take time off to rest and get ready for the next big assignment by availing your paid leave.
YOU'LL BE OUR: Data Scientist
YOU'LL BE BASED AT: IBC Knowledge Park, Bangalore
YOU'LL BE ALIGNED WITH: Engineering Manager
YOU'LL BE A MEMBER OF: Data Intelligence
WHAT YOU'LL DO AT ATHER:
Work with the vehicle intelligence platform to evolve the algorithms and the platform enhancing ride experience.
Provide data-driven solutions, from simple to fairly complex insights, on the data collected from the vehicle.
Identify measures and metrics that could be used insightfully to make decisions across firmware components and productionize these.
Support the data science lead and manager and partner in fairly intensive projects around diagnostics, predictive modeling, BI and Engineering data sciences.
Build and automate scripts that could be re-used efficiently.
Build interactive reports/dashboards that could be re-used across engineering teams for their discussions/ explorations iteratively
Support monitoring and measuring the success of the algorithms and features built, and lead innovation through objective reasoning and thinking.
Engage with the data science lead and the engineering team stakeholders on the solution approach and draft a plan of action.
Contribute to product/team roadmap by generating and implementing innovative data and analysis based ideas as product features
Handhold/Guide team in successful conceptualization and implementation of key product differentiators through effective benchmarking.
HERE'S WHAT WE ARE LOOKING FOR:
• Good C++ and Golang programming skills and an understanding of system architecture
• Experience with IoT and telemetry will be a plus
• Proficient in R Markdown / Python / Grafana
• Proficient in SQL and NoSQL
• Proficient in R / Python programming
• Good understanding of ML techniques / Spark ML
YOU BRING TO ATHER:
• B.E/B.Tech preferably in Computer Science
• 3 to 5 yrs of work experience as Data Scientist
Job Description for :
Role: Data/Integration Architect
Experience – 8-10 Years
Notice Period: Under 30 days
Key Responsibilities: Designing and developing frameworks for batch and real-time jobs on Talend. Leading migration of these jobs from MuleSoft to Talend, maintaining best practices for the team, conducting code reviews and demos.
Talend Data Fabric - Application, API Integration, Data Integration. Knowledge of Talend Management Cloud, deployment and scheduling of jobs using TMC or Autosys.
Programming Languages - Python/Java
Databases: SQL Server, Other Databases, Hadoop
Should have worked in an Agile environment
Sound communication skills
Should be open to learning new technologies based on business needs on the job
Awareness of other data/integration platforms like MuleSoft, Camel
Awareness of Hadoop, Snowflake, S3