Surviving survival analysis with Apache Spark
Learn about survival analysis in Apache Spark and some of the questions it can help answer. For instance: what proportion of individuals will be affected by a phenomenon, at what rate will they be affected, and how do certain events change the probability of survival?
Survival analysis involves the modeling of time-to-event data. An event in this context could be the occurrence of a phenomenon or any experience of interest, with time measured from the beginning of the observation period until the event occurs.
The Cox Proportional Hazards Model is a very popular survival analysis model, available in common data science tools like R and SAS, that estimates relative rather than absolute risk. The main challenge in implementing this model in a distributed framework is coming up with an efficient algorithm that minimizes scans over the sorted data.
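The scan-minimization idea can be sketched as follows: once records are sorted by event time in descending order, the risk-set denominators in the Cox partial log-likelihood become running sums, so the whole likelihood can be evaluated in a single pass. Below is a minimal single-machine sketch in plain Python (an illustration of the idea, not the actual Spark implementation from the talk; it uses the Breslow convention and ignores careful tie handling):

```python
import math

def cox_partial_log_likelihood(records, beta):
    """One-pass Cox partial log-likelihood (Breslow-style sketch).

    records: list of (time, event, covariates) tuples, where event is
             1 if the event occurred and 0 if the record is censored.
    beta:    list of coefficients, one per covariate.
    """
    # Sort by time descending so that every record already visited
    # belongs to the risk set of the record currently being processed.
    ordered = sorted(records, key=lambda r: r[0], reverse=True)

    log_lik = 0.0
    risk_sum = 0.0  # running sum of exp(x . beta) over the risk set
    for time, event, x in ordered:
        score = sum(b * xi for b, xi in zip(beta, x))
        risk_sum += math.exp(score)  # current record joins its own risk set
        if event:
            # Censored records contribute to the denominator only.
            log_lik += score - math.log(risk_sum)
    return log_lik
```

The same cumulative-sum structure is what makes a distributed version plausible: each partition of time-sorted data can compute partial sums locally, which are then combined, avoiding repeated scans of the full dataset.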
This talk aims to:
• Give an overview of a distributed, big-data-centric implementation of the Cox Model in Apache Spark, a fast, large-scale data processing engine
• Demonstrate how the Cox Model can be used to generate useful insights with medical data. For example, to determine if a vaccination program is ‘better’ at treating a certain demographic of individuals than another, or to map the ‘effect’ of age on reversion to drug use.
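To make the second point concrete: in a fitted Cox model, the effect of a covariate such as age is read off through the hazard ratio exp(beta). A tiny illustration (the coefficient value here is hypothetical, chosen purely for demonstration, not a result from the talk):

```python
import math

# Hypothetical fitted Cox coefficient for the covariate "age"
# (an assumed value for illustration only).
beta_age = 0.05

# exp(beta) is the hazard ratio: the multiplicative change in
# instantaneous risk per one-unit increase in the covariate.
hazard_ratio_per_year = math.exp(beta_age)

# Effects compound multiplicatively, so a 10-year age difference
# corresponds to exp(10 * beta).
hazard_ratio_per_decade = math.exp(10 * beta_age)
```

Reading coefficients this way is what lets the model answer questions like whether one demographic responds 'better' to a vaccination program than another.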
Conducted multiple knowledge sharing talks internally at Intel Corporation.
Conducted a webinar (my segment begins at 19:07): https://www.brighttalk.com/webcast/10773/192209
Anahita is a Software Engineer in Intel’s Big Data Solutions group, currently working on the Trusted Analytics Platform. She holds a Master’s degree in Computer Science from Columbia University with a specialization in Machine Learning. Her main interests are Machine Learning, Natural Language Processing, Speech Recognition, and Data Mining. She is keen on applying machine learning principles to real-world applications and on solving the challenges that arise as data scales.