Big Data Hadoop
Job Number: 3851
External Description:
A career at our company is an ongoing journey of discovery: our 57,000 people are shaping how the world lives, works and plays through next-generation advancements in healthcare, life science and electronics. For more than 350 years and across the world we have passionately pursued our curiosity to find novel and vibrant ways of enhancing the lives of others.
Job Title: Data Engineer / DevOps - Enterprise Big Data Platform
Job Location: Bangalore
In this role, you will be part of a growing, global team of data engineers who collaborate in DevOps mode to enable our Life Science business with state-of-the-art technology to leverage data as an asset and make better-informed decisions.
The Life Science Data Engineering Team is responsible for designing, developing, testing, and supporting automated end-to-end data pipelines and applications on Life Science's data management and analytics platform (Palantir Foundry, Hadoop and other components).
The Foundry platform comprises multiple technology stacks, hosted either on Amazon Web Services (AWS) infrastructure or on premises in our own data centers. Developing pipelines and applications on Foundry requires:
- Proficiency in SQL / Java / Python (Python is required; proficiency in all three is not)
- Proficiency in PySpark for distributed computation
- Familiarity with Postgres and ElasticSearch
- Familiarity with HTML, CSS, and JavaScript and basic design/visual competency
- Familiarity with common databases and access methods (e.g. JDBC, MySQL, Microsoft SQL Server); not all types are required
This position is project-based and may span multiple smaller projects or a single large project, following an agile project methodology.
Roles & Responsibilities:
- Develop data pipelines by ingesting various structured and unstructured data sources into Palantir Foundry
- Participate in the end-to-end project lifecycle, from requirements analysis to go-live and operations of an application
- Act as a business analyst, developing requirements for Foundry pipelines
- Review code developed by other data engineers against platform-specific standards, cross-cutting concerns, coding and configuration standards, and the functional specification of the pipeline
- Document technical work in a professional and transparent way; create high-quality technical documentation
- Work out the best possible balance between technical feasibility and business requirements (the latter can be quite strict)
- Deploy applications on Foundry platform infrastructure with clearly defined checks
- Implement changes and bug fixes via our change management framework and according to system engineering practices (additional training will be provided)
- Set up DevOps projects following Agile principles (e.g. Scrum)
- Beyond project work, act as third-level support for critical applications; analyze and resolve complex incidents/problems, and debug issues across the full Foundry stack and code based on Python, PySpark, and Java
- Work closely with business users and data scientists/analysts to design physical data models
Education
- B.Sc. (or higher) degree in Computer Science, Engineering, Mathematics, Physical Sciences or related fields
Professional Experience
- 5+ years of experience in system engineering or software development
- 3+ years of hands-on engineering experience, including ETL-type work with databases and Hadoop platforms
Skills
Hadoop General - Deep knowledge of distributed file system concepts, MapReduce principles and distributed computing. Knowledge of Spark and the differences between Spark and MapReduce. Familiarity with encryption and security in a Hadoop cluster.
Data management / data structures - Must be proficient in technical data management tasks, i.e. writing code to read, transform and store data
XML/JSON knowledge
Experience working with REST APIs
Spark - Experience launching Spark jobs in client mode and cluster mode. Familiarity with Spark job property settings and their implications for performance.
Application Development - Familiarity with HTML, CSS, and JavaScript and basic design/visual competency
SCC/Git - Must be experienced in the use of source code control systems such as Git
ETL - Experience developing ELT/ETL processes, including loading data from enterprise-scale RDBMSs such as Oracle, DB2 and MySQL
Authorization - Basic understanding of user authorization (Apache Ranger preferred)
Programming - Must be able to code in Python, or be an expert in at least one high-level language such as Java, C or Scala. Must have experience using REST APIs.
SQL - Must be an expert in manipulating database data using SQL. Familiarity with views, functions, stored procedures and exception handling.
AWS - General knowledge of the AWS stack (EC2, S3, EBS, ...)
IT Process Compliance - SDLC experience and formalized change controls
Working in DevOps teams, based on Agile principles (e.g. Scrum)
ITIL knowledge (especially incident, problem and change management)
Specific information related to the position:
- Physical presence at the primary work location (Bangalore)
- Flexibility to work in CEST and US EST time zones (according to the team rotation plan)
- Willingness to travel to Germany, the US and potentially other locations (as per project demand)
What we offer: With us, there are always opportunities to break new ground. We empower you to fulfil your ambitions, and our diverse businesses offer various career moves to seek new horizons. We trust you with responsibility early on and support you to draw your own career map that is responsive to your aspirations and priorities in life. Join us and bring your curiosity to life!
Curious? Apply and find more information at https://jobs.vibrantm.com
Job Requisition ID: 216627
Location: Bangalore
Career Level: D - Professional (4-9 years)
Working time model: full-time
Careers during Covid-19
Thank you for visiting our careers website, we are always looking for curious minds to join our teams. We understand how much the world is being impacted by the Covid-19 crisis and we want to assure you that your safety is very important to us. To ensure that everyone's health is protected, instead of a standard face-to-face interview, it is likely that you will be offered alternative digital interview options.
US Disclosure
The Company is an Equal Employment Opportunity employer. No employee or applicant for employment will be discriminated against on the basis of race, color, religion, age, sex, sexual orientation, national origin, ancestry, disability, military or veteran status, genetic information, gender identity, transgender status, marital status, or any other classification protected by applicable federal, state, or local law. This policy of Equal Employment Opportunity applies to all policies and programs relating to recruitment and hiring, promotion, compensation, benefits, discipline, termination, and all other terms and conditions of employment. Any applicant or employee who believes they have been discriminated against by the Company or anyone acting on behalf of the Company must report any concerns to their Human Resources Business Partner, Legal, or Compliance immediately. The Company will not retaliate against any individual because they made a good faith report of discrimination.
North America Disclosure
The Company is committed to accessibility in its workplaces, including during the job application process. Applicants who may require accommodation during the application process should speak with our HR Services team at 855 444 5678 from 8:00am to 5:30pm ET Monday through Friday.