"candidate will be responsible for all aspects of data acquisition, data transformation, and analytics scheduling and operationalization to drive high-visibility, cross-division outcomes. Expected deliverables will include the development of Big Data ELT jobs using a mix of technologies, stitching together complex and seemingly unrelated data sets for mass consumption, and automating and scaling analytics into the GRAND's Data Lake.\n\nKey Responsibilities :\n\n- Create a GRAND Data Lake and Warehouse which pools all the data from different regions and stores of GRAND in GCC\n\n- Ensure Source Data Quality Measurement, enrichment and reporting of Data Quality\n\n- Manage All ETL and Data Model Update Routines\n\n- Integrate new data sources into DWH\n\n- Manage DWH Cloud (AWS/AZURE/Google) and Infrastructure\n\nSkills Needed :\n\n- Very strong in SQL. Demonstrated experience with RDBMS, Unix Shell scripting preferred (e.g., SQL, Postgres, Mongo DB etc)\n\n- Experience with UNIX and comfortable working with the shell (bash or KRON preferred)\n\n- Good understanding of Data warehousing concepts.\n\nBig data systems : Hadoop, NoSQL, HBase, HDFS, MapReduce\n\n- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.\n\n- Working with data delivery teams to set up new Hadoop users. 
This job includes setting up Linux users, and setting up and testing HDFS, Hive, Pig and MapReduce access for the new users.\n\n- Cluster maintenance as well as creation and removal of nodes using tools such as Ganglia, Nagios, and Cloudera Manager Enterprise.\n\n- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.\n\n- Screen Hadoop cluster job performance and carry out capacity planning\n\n- Monitor Hadoop cluster connectivity and security\n\n- File system management and monitoring.\n\n- HDFS support and maintenance.\n\n- Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.\n\n- Defines, develops, documents and maintains Hive-based ETL mappings and scripts"
"You will be working in the Cancer Information Data Trust (CIDT) business unit within Inspirata, India at Bangalore. Inspirata is creating the most innovative cancer information big data with associated analytics that renders data to various portals. The Data Integrator will be a critical role in bringing accurate and conditioned Data to CIDT and Digital Pathology Products.\n\nYour Role\nSr. Software Engineer – Data Integration is an active and influential member of CIDT team and is responsible for the development of Inspirata’s Data Integration Bus Solution. Ideal candidate is a highly experienced engineer with exceptional skills, has an aptitude for integration technologies and the drive and desire to push the boundaries to solve complex problems. \n\nYour Responsibilities\n•\tDesign and development of a middleware bus technology\n•\tHave the developed product acquire data residing in different sources of medical data elements and provide a unified and trusted view of the data\n•\tExperience with Integration/ESB tools such as Orion Rhapsody, CorePoint, Informatica, TIBCO, Talend or similar tools\n•\tHave deep understanding of data management best practices, including ETL, data modeling, data management, file management, reference data management, etc.\n•\tShould have at least working knowledge of integrating unstructured data (NO SQL) along with structured data (RDBMS)\n•\tAbility to identify multiple approaches to problem solving and recommend the best case solution\n \nRequirements\n•\tB.E. / M.Tech in Computer Science or related field with 5 - 8 years of product development experience in ETL, Data Modeling, SQL, Data Analysis in a health care industry (preferred)\n•\tKnowledge of HL7 protocols, PHI regulations \n•\tGood communication skills"