About Us

DataWeave is a data platform that aggregates publicly available data from disparate sources and makes it available in the right format to enable companies to take strategic decisions using trans-firewall analytics. It's hard to tell what we love more, problems or solutions! Every day, we choose to address some of the hardest data problems that there are. We are in the business of making sense of messy public data on the web. At serious scale!

Roles and Responsibilities:
● Inclined towards working in a start-up-like environment.
● Comfortable with frequent, incremental code testing and deployment; data management skills.
● Design and build robust and scalable data engineering solutions for structured and unstructured data to deliver business insights, reporting and analytics.
● Expertise in troubleshooting, debugging, resolving data completeness and quality issues, and scaling overall system performance.
● Build robust APIs that power our delivery points (dashboards, visualizations and other integrations).

Skills and Requirements:
● Good communication and collaboration skills, with 3-5 years of experience.
● Ability to code and script, with a strong grasp of CS fundamentals and excellent problem-solving abilities.
● Good understanding of RDBMS.
● Experience in building data pipelines and processing large datasets.
● Knowledge of building crawlers and data mining is a plus.
● Working knowledge of open-source data stores such as MySQL, Solr, Elasticsearch and Cassandra would be a plus.
Sr Data Engineer Job Description

About Us

DataWeave is a data platform that aggregates publicly available data from disparate sources and makes it available in the right format to enable companies to take strategic decisions using trans-firewall analytics. It's hard to tell what we love more, problems or solutions! Every day, we choose to address some of the hardest data problems that there are. We are in the business of making sense of messy public data on the web. At serious scale!

Requirements:
• Building an intelligent and highly scalable crawling platform.
• Data extraction and processing at scale.
• Enhancing existing data stores and data models.
• Building a low-latency API layer for serving data to power dashboards, reports, and analytics functionality.
• Constantly evolving our data platform to support new features.

Expectations:
• 4+ years of relevant industry experience.
• Strong in algorithms and problem-solving skills.
• Software development experience in one or more general-purpose programming languages (e.g. Python, C/C++, Ruby, Java, C#).
• Exceptional coding abilities and experience with building large-scale, high-availability applications.
• Experience in search/information retrieval platforms like Solr, Lucene and Elasticsearch.
• Experience in building and maintaining large-scale web crawlers.
• In-depth knowledge of SQL and NoSQL datastores.
• Ability to design and build quick prototypes.
• Experience in working on cloud-based infrastructure like AWS and GCE.

Growth at DataWeave
• Fast-paced growth opportunities at a dynamically evolving start-up.
• You have the opportunity to work in many different areas and explore a wide variety of tools to figure out what really excites you.
Critical Tasks and Expected Contributions/Results:

The role will be primarily focused on the design, development and testing of ETL workflows (using Talend) as well as the batch management and error handling processes. It also involves building business intelligence applications using tools like Power BI. Additional responsibilities include the documentation of technical specifications and related project artefacts.

- Gather requirements and propose possible ETL solutions for the in-house designed data warehouse.
- Analyze and translate functional specifications and change requests into technical specifications.
- Design and create star schema data models.
- Design, build and implement business intelligence solutions using Power BI.
- Develop, implement and test ETL program logic.
- Deploy solutions and support any related issues.

Key Competency:
- A good understanding of the concepts and best practices of data warehouse ETL design, and the ability to apply these suitably to solve specific business needs.
- Expert knowledge of an ETL tool like Talend.
- More than 8 years of experience in designing and developing ETL work packages, with demonstrable expertise in the ETL tool Talend.
- Knowledge of BI tools like Power BI is required.
- Ability to follow functional ETL specifications and challenge business logic and schema design where appropriate, as well as manage time effectively.
- Exposure to performance tuning is essential.
- Good organisational skills.
- Methodical and structured approach to design and development.
- Good interpersonal skills.
The candidate will be responsible for all aspects of data acquisition, data transformation, and analytics scheduling and operationalization to drive high-visibility, cross-division outcomes. Expected deliverables will include the development of Big Data ELT jobs using a mix of technologies, stitching together complex and seemingly unrelated data sets for mass consumption, and automating and scaling analytics into GRAND's Data Lake.

Key Responsibilities:
- Create a GRAND Data Lake and Warehouse which pools all the data from different regions and stores of GRAND in GCC.
- Ensure source data quality measurement, enrichment and reporting of data quality.
- Manage all ETL and data model update routines.
- Integrate new data sources into the DWH.
- Manage the DWH cloud (AWS/Azure/Google) and infrastructure.

Skills Needed:
- Very strong in SQL. Demonstrated experience with databases (e.g., SQL, Postgres, MongoDB); Unix shell scripting preferred.
- Experience with UNIX and comfortable working with the shell (bash or Korn shell preferred).
- Good understanding of data warehousing concepts.
- Big data systems: Hadoop, NoSQL, HBase, HDFS, MapReduce.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users. This includes setting up Linux users, and setting up and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise, and other tools.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screening Hadoop cluster job performance and capacity planning.
- Monitoring Hadoop cluster connectivity and security.
- File system management and monitoring.
- HDFS support and maintenance.
- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Defining, developing, documenting and maintaining Hive-based ETL mappings and scripts.