About Accion Labs
Accion Labs, Inc. ranked number one IT Company based out of Pittsburgh headquartered global technology firm.
Accion labs Inc: Winner of Fastest growing Company in Pittsburgh, Raked as #1 IT services company two years in a row (2014, 2015), by Pittsburgh Business Times Accion Labs is venture-funded, profitable and fast-growing- allowing you an opportunity to grow with us 11 global offices, 1300+ employees, 80+ tech company clients 90% of our clients we work with are Direct Clients and project based. Offering a full range of product life-cycle services in emerging technology segments including Web 2.0, Open Source, SaaS /Cloud, Mobility, IT Operations Management/ITSM, Big Data and traditional BI/DW, Automation engineering (Rackspace team), devops engineering.
Employee strength: 1300+ employees
Why Accion Labs:
- Emerging technology projects i.e. Web 2.0, SaaS, cloud, mobility, BI/DW and big data
- Great learning environment
- Onsite opportunity it totally depends on project requirement
- We invest in training our resources in latest frameworks, tools, processes and best-practices and also cross-training our resources across a range of emerging technologies – enabling you to develop more marketable skill
- Employee friendly environment with 100% focus on work-life balance, life-long learning and open communication
- Allow our employees to directly interact with clients
The Platform Data Science team works at the intersection of data science and engineering. Domain experts develop and advance platforms, including the data platforms, machine learning platform, other platforms for Forecasting, Experimentation, Anomaly Detection, Conversational AI, Underwriting of Risk, Portfolio Management, Fraud Detection & Prevention and many more. We also are the Data Science and Analytics partners for Product and provide Behavioural Science insights across Jupiter.
About the role:
We’re looking for strong Software Engineers that can combine EMR, Redshift, Hadoop, Spark, Kafka, Elastic Search, Tensorflow, Pytorch and other technologies to build the next generation Data Platform, ML Platform, Experimentation Platform. If this sounds interesting we’d love to hear from you!
This role will involve designing and developing software products that impact many areas of our business. The individual in this role will have responsibility help define requirements, create software designs, implement code to these specifications, provide thorough unit and integration testing, and support products while deployed and used by our stakeholders.
Participate, Own & Influence in architecting & designing of systems
Collaborate with other engineers, data scientists, product managers
Build intelligent systems that drive decisions
Build systems that enable us to perform experiments and iterate quickly
Build platforms that enable scientists to train, deploy and monitor models at scale
Build analytical systems that drives better decision making
Programming experience with at least one modern language such as Java, Scala including object-oriented design
Experience in contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
Bachelor’s degree in Computer Science or related field
Computer Science fundamentals in object-oriented design
Computer Science fundamentals in data structures
Computer Science fundamentals in algorithm design, problem solving, and complexity analysis
Experience in databases, analytics, big data systems or business intelligence products:
Data lake, data warehouse, ETL, ML platform
Big data tech like: Hadoop, Apache Spark
We are looking for a Data Engineer to join our data team to solve data-driven critical
business problems. The hire will be responsible for expanding and optimizing the existing
end-to-end architecture including the data pipeline architecture. The Data Engineer will
collaborate with software developers, database architects, data analysts, data scientists and platform team on data initiatives and will ensure optimal data delivery architecture is
consistent throughout ongoing projects. The right candidate should have hands on in
developing a hybrid set of data-pipelines depending on the business requirements.
- Develop, construct, test and maintain existing and new data-driven architectures.
- Align architecture with business requirements and provide solutions which fits best
- to solve the business problems.
- Build the infrastructure required for optimal extraction, transformation, and loading
- of data from a wide variety of data sources using SQL and Azure ‘big data’
- Data acquisition from multiple sources across the organization.
- Use programming language and tools efficiently to collate the data.
- Identify ways to improve data reliability, efficiency and quality
- Use data to discover tasks that can be automated.
- Deliver updates to stakeholders based on analytics.
- Set up practices on data reporting and continuous monitoring
Required Technical Skills
- Graduate in Computer Science or in similar quantitative area
- 1+ years of relevant work experience as a Data Engineer or in a similar role.
- Advanced SQL knowledge, Data-Modelling and experience working with relational
- databases, query authoring (SQL) as well as working familiarity with a variety of
- Experience in developing and optimizing ETL pipelines, big data pipelines, and datadriven
- Must have strong big-data core knowledge & experience in programming using Spark - Python/Scala
- Experience with orchestrating tool like Airflow or similar
- Experience with Azure Data Factory is good to have
- Build processes supporting data transformation, data structures, metadata,
- dependency and workload management.
- Experience supporting and working with cross-functional teams in a dynamic
- Good understanding of Git workflow, Test-case driven development and using CICD
- is good to have
- Good to have some understanding of Delta tables It would be advantage if the candidate also have below mentioned experience using
- the following software/tools:
- Experience with big data tools: Hadoop, Spark, Hive, etc.
- Experience with relational SQL and NoSQL databases
- Experience with cloud data services
- Experience with object-oriented/object function scripting languages: Python, Scala, etc.
As a Data Engineer, your role will encompass:
- Designing and building production data pipelines from ingestion to consumption within a hybrid big data architecture using Scala, Python, Talend etc.
- Gather and address technical and design requirements.
- Refactor existing applications to optimize its performance through setting the appropriate architecture and integrating the best practices and standards.
- Participate in the entire data life-cycle mainly focusing on coding, debugging, and testing.
- Troubleshoot and debug ETL Pipelines.
- Documentation of each process.
Technical Requirements: -
- BSc degree in Computer Science/Computer Engineering. (Masters is a plus.)
- 2+ years of experience as a Data Engineer.
- In-depth understanding of core ETL concepts, Data Modelling, Data Lineage, Data Governance, Data Catalog, etc.
- 2+ years of work experience in Scala, Python, Java.
- Good Knowledge on Big Data Tools such as Spark/HDFS/Hive/Flume, etc.
- Hands on experience on ETL tools like Talend/Informatica is a plus.
- Good knowledge in Kafka and spark streaming is a big plus.
- 2+ years of experience in using Azure cloud and its resources/services (like Azure Data factory, Azure Databricks, SQL Synapse, Azure Devops, Logic Apps, Power Bi, Azure Event Hubs, etc).
- Strong experience in Relational Databases (MySQL, SQL Server)
- Exposure on data visualization tools like Power BI / Qlik sense / MicroStrategy
- 2+ years of experience in developing APIs (REST & SOAP protocols).
- Strong knowledge in Continuous Integration & Continuous Deployment (CI/CD) utilizing Docker containers, Jenkins, etc.
- Strong competencies in algorithms and software architecture.
- Excellent analytical and teamwork skills.
Good to have: -
- Previous on-prem working experience is a plus.
- In-depth understanding of the entire web development process (design, development, and deployment)
- Previous experience in automated testing including unit testing & UI testing.
Skills- Informatica with Big Data Management
1.Minimum 6 to 8 years of experience in informatica BDM development
2.Experience working on Spark/SQL
3.Develops informtica mapping/Sql
1. Communicate with the clients and understand their business requirements.
2. Build, train, and manage your own team of junior data engineers.
3. Assemble large, complex data sets that meet the client’s business requirements.
4. Identify, design and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
5. Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources, including the cloud.
6. Assist clients with data-related technical issues and support their data infrastructure requirements.
7. Work with data scientists and analytics experts to strive for greater functionality.
Skills required: (experience with at least most of these)
1. Experience with Big Data tools-Hadoop, Spark, Apache Beam, Kafka etc.
2. Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
3. Experience in ETL and Data Warehousing.
4. Experience and firm understanding of relational and non-relational databases like MySQL, MS SQL Server, Postgres, MongoDB, Cassandra etc.
5. Experience with cloud platforms like AWS, GCP and Azure.
6. Experience with workflow management using tools like Apache Airflow.
Responsibilities for Data Engineer
- Create and maintain optimal data pipeline architecture,
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
Qualifications for Data Engineer
- Advanced working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- A successful history of manipulating, processing and extracting value from large disconnected datasets.
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- We are looking for a candidate with 5+ years of experience in a Data Engineer role, who has attained a Graduate degree in Computer Science, Statistics, Informatics, Information Systems or another quantitative field. They should also have experience using the following software/tools:
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with stream-processing systems: Storm, Spark-Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
- Building and operationalizing large scale enterprise data solutions and applications using one or more of AZURE data and analytics services in combination with custom solutions - Azure Synapse/Azure SQL DWH, Azure Data Lake, Azure Blob Storage, Spark, HDInsights, Databricks, CosmosDB, EventHub/IOTHub.
- Experience in migrating on-premise data warehouses to data platforms on AZURE cloud.
- Designing and implementing data engineering, ingestion, and transformation functions
- Experience with Azure Analysis Services
- Experience in Power BI
- Experience with third-party solutions like Attunity/Stream sets, Informatica
- Experience with PreSales activities (Responding to RFPs, Executing Quick POCs)
- Capacity Planning and Performance Tuning on Azure Stack and Spark.
Technical Knowledge (Must Have)
- Strong experience in SQL / HiveQL/ AWS Athena,
- Strong expertise in the development of data pipelines (snaplogic is preferred).
- Design, Development, Deployment and administration of data processing applications.
- Good Exposure towards AWS and Azure Cloud computing environments.
- Knowledge around BigData, AWS Cloud Architecture, Best practices, Securities, Governance, Metadata Management, Data Quality etc.
- Data extraction through various firm sources (RDBMS, Unstructured Data Sources) and load to datalake with all best practices.
- Knowledge in Python
- Good knowledge in NoSQL technologies (Neo4J/ MongoDB)
- Experience/knowledge in SnapLogic (ETL Technologies)
- Working knowledge on Unix (AIX, Linux), shell scripting
- Experience/knowledge in Data Modeling. Database Development
- Experience/knowledge creation of reports and dashboards in Tableau/ PowerBI
• Drive the data engineering implementation
• Strong experience in building data pipelines
• AWS stack experience is must
• Deliver Conceptual, Logical and Physical data models for the implementation
• SQL stronghold is must. Advanced SQL working knowledge and experience
working with a variety of relational databases, SQL query authoring
• AWS Cloud data pipeline experience is must. Data pipelines and data centric
applications using distributed storage platforms like S3 and distributed processing
platforms like Spark, Airflow, Kafka
• Working knowledge of AWS technologies such as S3, EC2, EMR, RDS, Lambda,
• Ability to use a major programming (e.g. Python /Java) to process data for
Your mission is to help lead team towards creating solutions that improve the way our business is run. Your knowledge of design, development, coding, testing and application programming will help your team raise their game, meeting your standards, as well as satisfying both business and functional requirements. Your expertise in various technology domains will be counted on to set strategic direction and solve complex and mission critical problems, internally and externally. Your quest to embracing leading-edge technologies and methodologies inspires your team to follow suit.
Responsibilities and Duties :
- As a Data Engineer you will be responsible for the development of data pipelines for numerous applications handling all kinds of data like structured, semi-structured &
unstructured. Having big data knowledge specially in Spark & Hive is highly preferred.
- Work in team and provide proactive technical oversight, advice development teams fostering re-use, design for scale, stability, and operational efficiency of data/analytical solutions
Education level :
- Bachelor's degree in Computer Science or equivalent
- Minimum 5+ years relevant experience working on production grade projects experience in hands on, end to end software development
- Expertise in application, data and infrastructure architecture disciplines
- Expert designing data integrations using ETL and other data integration patterns
- Advanced knowledge of architecture, design and business processes
Proficiency in :
- Modern programming languages like Java, Python, Scala
- Big Data technologies Hadoop, Spark, HIVE, Kafka
- Writing decently optimized SQL queries
- Orchestration and deployment tools like Airflow & Jenkins for CI/CD (Optional)
- Responsible for design and development of integration solutions with Hadoop/HDFS, Real-Time Systems, Data Warehouses, and Analytics solutions
- Knowledge of system development lifecycle methodologies, such as waterfall and AGILE.
- An understanding of data architecture and modeling practices and concepts including entity-relationship diagrams, normalization, abstraction, denormalization, dimensional
modeling, and Meta data modeling practices.
- Experience generating physical data models and the associated DDL from logical data models.
- Experience developing data models for operational, transactional, and operational reporting, including the development of or interfacing with data analysis, data mapping,
and data rationalization artifacts.
- Experience enforcing data modeling standards and procedures.
- Knowledge of web technologies, application programming languages, OLTP/OLAP technologies, data strategy disciplines, relational databases, data warehouse development and Big Data solutions.
- Ability to work collaboratively in teams and develop meaningful relationships to achieve common goals
Must Know :
- Core big-data concepts
- Spark - PySpark/Scala
- Data integration tool like Pentaho, Nifi, SSIS, etc (at least 1)
- Handling of various file formats
- Cloud platform - AWS/Azure/GCP
- Orchestration tool - Airflow