As a Data Engineer you will:
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional and non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS 'big data' technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Work with stakeholders, including the Executive, Product, Data and Design teams, to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
Qualifications for a Data Engineer:
- 4+ years of experience in a Data Engineer role
- Advanced working SQL knowledge and experience with relational databases and query authoring, as well as working familiarity with a variety of databases
- Experience building and optimizing 'big data' data pipelines, architectures and data sets
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
- Strong analytic skills related to working with unstructured datasets
- Experience building processes supporting data transformation, data structures, metadata, dependency and workload management
- A successful history of manipulating, processing and extracting value from large, disconnected datasets
- Working knowledge of message queuing, stream processing, and highly scalable 'big data' data stores
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift, Kinesis
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented and functional scripting languages: Python, Java, C++, Scala, etc.
Rupeek Tech Stack: You can take a look at our tech stack here: http://stackshare.io/AmarPrabhu/rupeek
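To make the SQL and ETL expectations above concrete, here is a minimal extract-transform-load sketch using Python's built-in sqlite3 module in place of a production warehouse; all table and column names are hypothetical, and real pipelines would run against systems like Postgres or Redshift.

```python
import sqlite3

# Hypothetical ETL pass over an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Extract: a raw source table as it might arrive from an upstream system.
conn.execute("CREATE TABLE raw_events (user_id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [(1, 10.0, "ok"), (1, 5.0, "ok"), (2, 7.5, "error"), (3, 2.5, "ok")],
)

# Transform and load: filter bad rows, aggregate per user, and write the
# result into a reporting table that downstream analytics could query.
conn.execute("""
    CREATE TABLE user_totals AS
    SELECT user_id, SUM(amount) AS total_amount, COUNT(*) AS event_count
    FROM raw_events
    WHERE status = 'ok'
    GROUP BY user_id
""")

totals = dict(conn.execute("SELECT user_id, total_amount FROM user_totals"))
```

The same filter-aggregate-load shape recurs at scale in Spark or Redshift jobs; only the engine and data volume change.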
Description At LogMeIn, we build beautifully simple, easy-to-use, Cloud-based, cross-platform Web, Mobile and Desktop software products. You probably know us by such industry-defining brand names as GoToMeeting®, GoToWebinar®, JoinMe®, LastPass®, Rescue® and BoldChat®, as well as other award-winning products and services. LogMeIn enables customers around the world to enjoy highly productive, mobile workstyles. Currently, we're searching for a high-caliber and innovative Big Data and Analytics Engineer who will provide useful insights into the data and enable stakeholders to make data-driven decisions. They'll be part of the team building the next-generation data platform on the Cloud using cutting-edge technologies like Spark, Presto, Kinesis, EMR, Pig, Hive and Redshift. If you're passionate about building high-quality software for data, thrive in an innovative, cutting-edge, startup-like environment, and consider yourself to be a top-notch Data Engineer, then LogMeIn could very well be the perfect fit for you and your career.
Responsibilities
• Responsible for analysis, design and development activities on multiple projects; plans, organizes, and performs the technical work within the area of specialization
• Participates in design activity with other programmers on technical aspects relating to the project, including functional specifications, design parameters, feature enhancements, and alternative solutions
• Meets or exceeds standards for the quality and timeliness of the work products that they create (e.g., requirements, designs, code, fixes)
• Implements, unit tests, debugs and integrates complex code; designs, writes, conducts, and directs the development of tests to verify the functionality, accuracy, and efficiency of developed or enhanced software; analyzes results for conformance to plans and specifications, making recommendations based on the results
• Generally provides technical direction and project management within a project/scrum team with increased leadership of others; provides guidance in methodology selection, project planning, and the review of work products; may serve in a part-time technical lead capacity to a limited number of junior engineers, providing immediate direction and guidance
• Keeps technically abreast of trends and advancements within the area of specialization, incorporating these improvements where applicable; attends technical conferences as appropriate
Requirements
• Bachelor's degree or equivalent in computer science or a related field is preferred, with 5-8 years of directly related work experience
• Hands-on experience designing, developing and maintaining high-volume ETL processes using Big Data technologies like Pig, Hive, Oozie, Spark and MapReduce
• Solid understanding of Data Warehousing concepts
• Strong understanding of Dimensional Data Modeling
• Experience using Hadoop, S3, MapReduce, Redshift and RDS on AWS
• Expertise in at least one visualization tool, such as Tableau, Quicksight, Power BI, Sisense, Birst, QlikView or Looker
• Experience processing real-time streaming data
• Strong SQL and stored procedure development skills
• Knowledge of NoSQL is a plus
• Knowledge of Java to leverage Big Data technologies is desired
• Knowledge of a scripting language (preferably Python) or the statistical programming language R is desired
• Working knowledge of the Linux environment
• Knowledge of the SDLC and Agile development methodologies
• Expertise in OOAD principles and methodologies (e.g., UML) and OS concepts
• Extensive knowledge of and discipline in the software engineering process; experience as a technical lead on complex projects, providing guidance on design and development approach
• Expertise implementing, unit testing, debugging and integrating code of moderate complexity
• Experience helping others to design, write, conduct, and direct the development of tests
• Experience independently publishing papers and blogs, and creating and presenting briefings to technical audiences
• Strong critical thinking and problem-solving skills
• Approaches problems with curiosity and open-mindedness
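As an illustration of the core idea behind the stream-processing experience requested above (not the actual API of Spark Streaming or Kinesis), a tumbling-window aggregation can be sketched in plain Python; the event shape and names are hypothetical.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed-size, non-overlapping
    windows and count occurrences per key — the same operation that
    stream-processing engines perform continuously at scale."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event lands in the window whose start precedes its timestamp.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in windows.items()}

events = [(0, "click"), (3, "view"), (7, "click"), (12, "click")]
result = tumbling_window_counts(events, window_seconds=10)
```

In a real engine the windows are emitted incrementally as event time advances, rather than computed over a finished list as here.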
Critical Tasks and Expected Contributions/Results:
The role will be primarily focused on the design, development and testing of ETL workflows (using Talend), as well as the batch management and error handling processes, and on building Business Intelligence applications using tools like Power BI. Additional responsibilities include the documentation of technical specifications and related project artefacts.
- Gather requirements and propose possible ETL solutions for the in-house designed Data Warehouse
- Analyze and translate functional specifications and change requests into technical specifications
- Design and create star schema data models
- Design, build and implement Business Intelligence solutions using Power BI
- Develop, implement and test ETL program logic
- Handle deployment and support any related issues
Key Competencies:
- A good understanding of the concepts and best practices of data warehouse ETL design, and the ability to apply these suitably to solve specific business needs
- Expert knowledge of an ETL tool such as Talend
- More than 8 years of experience designing and developing ETL work packages, with demonstrable expertise in the Talend ETL tool
- Knowledge of BI tools like Power BI is required
- Ability to follow functional ETL specifications, challenge business logic and schema design where appropriate, and manage time effectively
- Exposure to performance tuning is essential
- Good organisational skills
- Methodical and structured approach to design and development
- Good interpersonal skills
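The star schema mentioned above separates measures (a fact table) from descriptive attributes (dimension tables). A minimal sketch, using Python's sqlite3 as a stand-in for a real warehouse, shows the shape; all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical star-schema fragment: one fact table keyed to a date dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date(date_key), amount REAL);
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES (20240101, 100.0), (20240101, 50.0), (20240201, 75.0);
""")

# A typical BI query rolls the fact table up along a dimension attribute —
# exactly the kind of aggregate a Power BI report would issue.
monthly = conn.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month
    ORDER BY d.month
""").fetchall()
```

The design choice is that facts stay narrow and numeric while dimensions carry the descriptive attributes users slice by, which keeps such roll-up queries simple and fast.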
The candidate will be responsible for all aspects of data acquisition, data transformation, and analytics scheduling and operationalization to drive high-visibility, cross-division outcomes. Expected deliverables include the development of Big Data ELT jobs using a mix of technologies, stitching together complex and seemingly unrelated data sets for mass consumption, and automating and scaling analytics into GRAND's Data Lake.
Key Responsibilities:
- Create a GRAND Data Lake and Warehouse that pools the data from GRAND's different regions and stores in the GCC
- Ensure source data quality measurement, enrichment and reporting of data quality
- Manage all ETL and data model update routines
- Integrate new data sources into the DWH
- Manage the DWH cloud (AWS/Azure/Google) and infrastructure
Skills Needed:
- Very strong SQL; demonstrated experience with RDBMSs (e.g., Postgres) and NoSQL stores (e.g., MongoDB)
- Experience with UNIX and comfort working with the shell (bash or Korn shell preferred); Unix shell scripting preferred
- Good understanding of data warehousing concepts and big data systems: Hadoop, NoSQL, HBase, HDFS, MapReduce
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop, and to expand existing environments
- Working with data delivery teams to set up new Hadoop users, including setting up Linux users and setting up and testing HDFS, Hive, Pig and MapReduce access for the new users
- Cluster maintenance, as well as creation and removal of nodes, using tools like Ganglia, Nagios, Cloudera Manager Enterprise, and other tools
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines
- Screening Hadoop cluster job performance and capacity planning
- Monitoring Hadoop cluster connectivity and security
- File system management and monitoring
- HDFS support and maintenance
- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required
- Defining, developing, documenting and maintaining Hive-based ETL mappings and scripts
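The source data quality measurement responsibility above can be sketched as a simple profiling pass over incoming records; the metrics chosen (row count, missing values, duplicate keys) and the field names are illustrative assumptions, not any particular team's standard.

```python
def data_quality_report(rows, key_field):
    """Compute basic data-quality metrics of the kind a source-quality
    measurement step might report before data enters the lake."""
    missing = {}
    seen, duplicates = set(), 0
    for row in rows:
        # Count missing (None) values per field.
        for field, value in row.items():
            if value is None:
                missing[field] = missing.get(field, 0) + 1
        # Count rows whose business key was already seen.
        key = row.get(key_field)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicates}

rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": None},
    {"id": 1, "region": "US"},
]
report = data_quality_report(rows, key_field="id")
```

At scale the same checks would run as Hive or Spark jobs over HDFS partitions, with the report feeding a monitoring dashboard rather than a Python dict.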