We have an urgent requirement for Big Data Developer profiles at our reputed MNC.
Location: Pune/Bangalore/Hyderabad/Nagpur
Experience: 4-9 years
Skills: PySpark + AWS, or Spark + Scala + AWS, or Python + AWS
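As a rough, hypothetical illustration of the PySpark + AWS skill set these profiles call for, here is a minimal batch-job sketch that reads raw CSVs from S3 and writes curated Parquet back; the bucket paths and column names are placeholders, not part of the listing.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-orders").getOrCreate()

# Read raw CSVs from S3 (bucket and path are placeholders).
orders = (spark.read
          .option("header", True)
          .csv("s3a://example-raw-bucket/orders/2024-01-01/"))

# Aggregate revenue per customer and write back as Parquet.
daily = (orders
         .withColumn("amount", F.col("amount").cast("double"))
         .groupBy("customer_id")
         .agg(F.sum("amount").alias("revenue")))

daily.write.mode("overwrite").parquet("s3a://example-curated-bucket/daily_revenue/")
```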
- Extensive exposure to at least one Business Intelligence platform (ideally QlikView/Qlik Sense); if not Qlik, then knowledge of an ETL tool such as Informatica or Talend
- At least one data query language: SQL or Python
- Experience in creating breakthrough visualizations
- Understanding of RDBMS, Data Architecture/Schemas, Data Integrations, Data Models and Data Flows is a must
Proficiency in Linux.
Must have SQL knowledge and experience working with relational databases, including query authoring (SQL) and familiarity with databases such as MySQL, MongoDB, Cassandra, and Athena.
Must have experience with Python/Scala.
Must have experience with Big Data technologies like Apache Spark.
Must have experience with Apache Airflow (a minimal DAG sketch follows this list).
Experience with data pipeline and ETL tools like AWS Glue.
Experience working with AWS cloud services: EC2, S3, RDS, Redshift.
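Since the list above pairs Apache Airflow with Glue and the broader AWS stack, a minimal DAG sketch may help calibrate expectations; the DAG name and task bodies below are hypothetical placeholders, not a prescribed implementation.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull raw files from S3 or an upstream API.
    print("extracting")

def load():
    # Placeholder: load transformed data into Redshift/RDS.
    print("loading")

with DAG(
    dag_id="nightly_etl",             # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task         # extract runs before load
```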
Qualifications
- 5+ years of professional experience in experiment design and applied machine learning predicting outcomes in large-scale, complex datasets.
- Proficiency in Python, Azure ML, or other statistics/ML tools (see the sketch after this list).
- Proficiency in deep neural networks and Python-based frameworks.
- Proficiency in Azure Databricks, Hive, Spark.
- Proficiency in deploying models into production (Azure stack).
- Moderate coding skills. SQL or similar required. C# or other languages strongly preferred.
- Outstanding communication and collaboration skills. You can learn from and teach others.
- Strong drive for results. You have a proven record of shepherding experiments to create successful shipping products/services.
- Experience with prediction in adversarial (energy) environments highly desirable.
- Understanding of the model development ecosystem across platforms, including development, distribution, and best practices, highly desirable.
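To make the experiment-design bullet concrete, here is a minimal supervised-learning sketch, using scikit-learn purely as a stand-in for whichever toolkit (Azure ML, Databricks, Spark) the team actually uses; the synthetic data and held-out split are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large, complex dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=1000)

# Hold out a test set so the metric reflects generalization, not memorization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```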
As a dedicated Data Scientist on our Research team, you will apply data science and your machine learning expertise to enhance our intelligent systems to predict and provide proactive advice. You’ll work with the team to identify and build features, create experiments, vet ML models, and ship successful models that provide value additions for hundreds of EE customers.
At EE, you’ll have access to vast amounts of energy-related data from our sources. Our data pipelines are curated and supported by engineering teams (so you won't have to do much data engineering - you get to do the fun stuff.) We also offer many company-sponsored classes and conferences that focus on data science and ML. There’s great growth opportunity for data science at EE.
We are looking for an exceptionally talented Lead Data Engineer with experience implementing AWS services to build data pipelines, integrate APIs, and design data warehouses. A candidate with both hands-on and leadership capabilities will be ideal for this position.
Qualification: At least a bachelor's degree in Science, Engineering, or Applied Mathematics; a Master's degree is preferred.
Job Requirements:
• Total 6+ years of experience as a Data Engineer and 2+ years of experience in managing a team
• Have minimum 3 years of AWS Cloud experience.
• Well versed in languages such as Python, PySpark, SQL, Node.js, etc.
• Has extensive experience in the Spark ecosystem and has worked on both real-time and batch processing
• Experience with AWS Glue, EMR, DMS, Lambda, S3, DynamoDB, Step Functions, Airflow, RDS, Aurora, etc. (see the boto3 sketch after this list)
• Experience with modern database systems such as Redshift, Presto, Hive, etc.
• Has built data lakes on S3 or Apache Hudi
• Solid understanding of Data Warehousing Concepts
• Good to have experience with tools such as Kafka or Kinesis
• Good to have AWS Developer Associate or Solutions Architect Associate Certification
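As one concrete slice of the AWS Glue experience listed above, a small boto3 sketch along these lines triggers a Glue job and checks its run state; the job name and region are hypothetical, and AWS credentials are assumed to be configured.

```python
import boto3

# Hypothetical job name and region; credentials come from the environment.
glue = boto3.client("glue", region_name="ap-south-1")

run = glue.start_job_run(JobName="curate-orders")
run_id = run["JobRunId"]

# Poll once for the run state (a real pipeline would poll in a loop or
# let Step Functions / Airflow track completion).
status = glue.get_job_run(JobName="curate-orders", RunId=run_id)
print(status["JobRun"]["JobRunState"])  # e.g. RUNNING, SUCCEEDED, FAILED
```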
Data Analyst (Python, ETL & SQL)
at Top Management Consulting Company
Python, SQL, ETL (building ETL pipelines)
EXPERIENCE: 3-7 years of experience in Data Engineering or Data Warehousing
LOCATION: Bangalore and Gurugram
REQUIREMENTS:
- Bachelor’s degree in Engineering or Computer Science; Master’s degree is a plus
- 3+ years of professional work experience with a reputed analytics firm
- Expertise in handling large amount of data through Python
- Conduct data assessment, perform data quality checks, and transform data using SQL and ETL tools (a minimal sketch follows this list)
- Experience deploying ETL/data pipelines and workflows on cloud technologies and architectures such as Azure and Amazon Web Services will be valued
- Comfort with data modelling principles (e.g. database structure, entity relationships, UIDs, etc.) and software development principles (e.g. modularization, testing, refactoring, etc.)
- A thoughtful and comfortable communicator (verbal and written) with the ability to facilitate discussions and conduct training
- Track record of strong problem-solving, requirement gathering, and leading by example
- Ability to thrive in a flexible and collaborative environment
- Track record of completing projects successfully on time, within budget and as per scope
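A minimal sketch of the data-assessment and SQL-transformation work described above, with pandas for the quality checks and SQLite standing in for the client warehouse; the file, table, and column names are illustrative only.

```python
import sqlite3

import pandas as pd

df = pd.read_csv("transactions.csv")  # placeholder input file

# Data quality checks before loading.
assert df["txn_id"].is_unique, "duplicate transaction IDs"
null_rates = df.isna().mean()
print(null_rates[null_rates > 0.05])  # flag columns with >5% nulls

# Transform with SQL (assumes txn_date is stored as ISO-8601 text).
conn = sqlite3.connect(":memory:")
df.to_sql("transactions", conn, index=False)
monthly = pd.read_sql(
    """
    SELECT strftime('%Y-%m', txn_date) AS month, SUM(amount) AS total
    FROM transactions
    GROUP BY 1
    ORDER BY 1
    """,
    conn,
)
print(monthly.head())
```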
Data Engineer (PySpark + SQL)
Job Description
Do you have a passion for computer vision and deep learning problems? We are looking for someone who thrives on collaboration and wants to push the boundaries of what is possible today! Material Depot (materialdepot.in) is on a mission to be India’s largest tech company in the Architecture, Engineering and Construction space by democratizing the construction ecosystem and bringing stakeholders onto a common digital platform. Our engineering team is responsible for developing Computer Vision and Machine Learning tools to enable digitization across the construction ecosystem. The founding team includes people from top management consulting firms and top colleges in India (such as BCG and IIT Bombay), has worked extensively in the construction space globally, and is funded by top Indian VCs.
Our team empowers Architectural and Design Businesses to effectively manage their day to day operations. We are seeking an experienced, talented Data Scientist to join our team. You’ll be bringing your talents and expertise to continue building and evolving our highly available and distributed platform.
Our solutions need complex problem solving in computer vision that requires robust, efficient, well-tested, and clean solutions. The ideal candidate will possess the self-motivation, curiosity, and initiative to achieve those goals. Likewise, the candidate is a lifelong learner who passionately seeks to improve themselves and the quality of their work. You will work together with similar minds in a unique team where your skills and expertise can be used to influence future user experiences that will be used by millions.
In this role, you will:
- Extensive knowledge in machine learning and deep learning techniques
- Solid background in image processing/computer vision
- Experience in building datasets for computer vision tasks
- Experience working with and creating data structures / architectures
- Proficiency in at least one major machine learning framework
- Experience visualizing data to stakeholders
- Ability to analyze and debug complex algorithms
- Good understanding and applied experience in classic 2D image processing and segmentation (a minimal OpenCV sketch follows this list)
- Robust semantic object detection under different lighting conditions
- Segmentation of non-rigid contours in challenging/low contrast scenarios
- Sub-pixel accurate refinement of contours and features
- Experience in image quality assessment
- Experience with in depth failure analysis of algorithms
- Highly skilled in at least one scripting language such as Python or MATLAB, and solid experience in C++
- Creativity and curiosity for solving highly complex problems
- Excellent communication and collaboration skills
- Mentor and support other technical team members in the organization
- Create, improve, and refine workflows and processes for delivering quality software on time and with carefully calculated debt
- Work closely with product managers, customer support representatives, and account executives to help the business move fast and efficiently through relentless automation.
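For the classic 2D segmentation and sub-pixel refinement items above, a minimal OpenCV sketch along these lines shows the general shape of the work; the input image and all parameters are placeholders, not tuned values.

```python
import cv2

# Placeholder input; assumes a grayscale image on disk.
img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# Segment foreground with Otsu thresholding, then extract contours.
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)

# Detect corner features, then refine them to sub-pixel accuracy.
corners = cv2.goodFeaturesToTrack(img, maxCorners=50, qualityLevel=0.01,
                                  minDistance=5)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
refined = cv2.cornerSubPix(img, corners, (5, 5), (-1, -1), criteria)
print(len(largest), len(refined))
```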
How you will do this:
- You’re part of an agile, multidisciplinary team.
- You bring your own unique skill set to the table and collaborate with others to accomplish your team’s goals.
- You prioritize your work with the team and its product owner, weighing both the business and technical value of each task.
- You experiment, test, try, fail, and learn continuously.
- You don’t do things just because they were always done that way; you bring your experience and expertise with you and help the team make the best decisions.
For this role, you must have:
- Strong knowledge of and experience with the functional programming paradigm (a small illustration follows this list).
- Experience conducting code reviews, providing feedback to other engineers.
- Great communication skills and a proven ability to work as part of a tight-knit team.
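A small Python illustration of the functional style this role emphasizes: pure functions composed with map/filter/reduce rather than mutable shared state. The data and field names are invented.

```python
from functools import reduce

# Pure functions: outputs depend only on inputs, no shared state mutated.
def parse(line: str) -> dict:
    ts, value = line.split(",")
    return {"ts": ts, "value": float(value)}

def is_valid(row: dict) -> bool:
    return row["value"] >= 0

lines = ["2024-01-01,3.5", "2024-01-02,-1.0", "2024-01-03,2.0"]

total = reduce(
    lambda acc, row: acc + row["value"],
    filter(is_valid, map(parse, lines)),
    0.0,
)
print(total)  # 5.5 -- the negative reading is filtered out
```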
• 5+ years’ experience developing and maintaining modern ingestion pipelines using technologies like Spark, Apache NiFi, etc.
• 2+ years’ experience with Healthcare Payors (focusing on Membership, Enrollment, Eligibility, Claims, Clinical)
• Hands-on experience with AWS Cloud and its native components like S3, Athena, Redshift & Jupyter Notebooks
• Strong in Spark (Scala & Python) pipelines (ETL & streaming; see the streaming sketch after this list)
• Strong experience in metadata management tools like AWS Glue
• Strong experience in coding with languages like Java and Python
• Worked on designing ETL & streaming pipelines in Spark Scala / Python
• Good experience in Requirements gathering, Design & Development
• Working with cross-functional teams to meet strategic goals.
• Experience in high volume data environments
• Critical thinking and excellent verbal and written communication skills
• Strong problem-solving and analytical abilities; should be able to work and deliver individually
• Good to have: AWS Developer certification, Scala coding experience, and experience with Postman (API testing) and Apache Airflow or similar schedulers
• Nice to have: experience in healthcare messaging standards like HL7, CCDA, and EDI (834, 835, 837)
• Good communication skills
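For the streaming-pipeline requirement above, here is a minimal PySpark Structured Streaming sketch that reads a hypothetical claims topic from Kafka and lands it on S3; the broker, topic, and paths are placeholders, and the job assumes the Spark-Kafka connector package is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Placeholder broker and topic names.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "claims")
       .load())

# Kafka delivers bytes; cast the message value to a string payload.
parsed = raw.select(F.col("value").cast("string").alias("payload"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/claims/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/claims/")
         .outputMode("append")
         .start())
query.awaitTermination()
```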
Data Engineer
at NeenOpal Intelligent Solutions Private Limited
We are actively seeking a Senior Data Engineer experienced in building data pipelines and integrations from 3rd party data sources by writing custom automated ETL jobs using Python. The role will work in partnership with other members of the Business Analytics team to support the development and implementation of new and existing data warehouse solutions for our clients. This includes designing database import/export processes used to generate client data warehouse deliverables.
- 2+ Years experience as an ETL developer with strong data architecture knowledge around data warehousing concepts, SQL development and optimization, and operational support models.
- Experience using Python to automate ETL/data processing jobs (a minimal sketch follows this list).
- Design and develop ETL and data processing solutions using data integration tools, python scripts, and AWS / Azure / On-Premise Environment.
- Experience / Willingness to learn AWS Glue / AWS Data Pipeline / Azure Data Factory for Data Integration.
- Develop and create transformation queries, views, and stored procedures for ETL processes, and process automation.
- Document data mappings, data dictionaries, processes, programs, and solutions as per established standards for data governance.
- Work with the data analytics team to assess and troubleshoot potential data quality issues at key intake points (such as validating control totals at intake and again after transformation), and transparently build lessons learned into future data quality assessments
- Solid experience with data modeling, business logic, and RESTful APIs.
- Solid experience in the Linux environment.
- Experience with NoSQL / PostgreSQL preferred
- Experience working with databases such as MySQL, NoSQL, and Postgres, and enterprise-level connectivity experience (such as connecting over TLS and through proxies).
- Experience with NGINX and SSL.
- Performance tune data processes and SQL queries, and recommend and implement data process optimization and query tuning techniques.
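A minimal sketch of the custom automated ETL job in Python that this role describes: extract from a 3rd-party API, apply basic transformations, and load into PostgreSQL. The endpoint, connection string, and table are hypothetical.

```python
import psycopg2
import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint

def extract():
    resp = requests.get(API_URL, params={"page_size": 100}, timeout=30)
    resp.raise_for_status()
    return resp.json()["results"]

def transform(rows):
    # Drop rows missing a primary key; normalize amounts to float.
    return [
        (r["order_id"], r["customer_id"], float(r.get("amount", 0)))
        for r in rows
        if r.get("order_id")
    ]

def load(records):
    # Hypothetical connection string and staging table.
    conn = psycopg2.connect("dbname=warehouse user=etl")
    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO staging.orders (order_id, customer_id, amount) "
            "VALUES (%s, %s, %s) ON CONFLICT (order_id) DO NOTHING",
            records,
        )
    conn.close()

if __name__ == "__main__":
    load(transform(extract()))
```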