The past decade has witnessed the successful application of many AI techniques at "web scale", on what are popularly called big data platforms. These platforms are built on the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, NoSQL databases, and stream computing engines. Online advertising, machine translation, natural language understanding, sentiment mining, personalized medicine, and national security are some examples of such AI-based web-intelligence applications that are already in the public eye.
This is an intensive, advanced summer school, in the sense that scientists use the term, covering some of the methods of computational, data-intensive science. It draws on topics from applied computer science, engineering, and statistics, and it requires a strong background in computing, statistics, and data-intensive research.
Process mining is the missing link between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Through concrete data sets and easy-to-use software, the course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains. Data science is the profession of the future because organizations that cannot use (big) data in a smart way will not survive. Yet it is not sufficient to focus on data storage and data analysis: the data scientist also needs to relate data to process analysis, and process mining provides exactly that bridge.
This is an applied statistics course focused on data analysis. It begins with an overview of how to organize, perform, and write up data analyses. It then covers some of the most widely used statistical methods, such as linear regression, principal components analysis, cross-validation, and p-values. Rather than dwelling on mathematical details, the lectures are designed to help you apply these techniques to real data using the R statistical programming language, interpret the results, and diagnose potential problems in your analysis. You will also have the opportunity to critique and help your classmates with their data analyses.
This specialization covers the concepts and tools needed to understand, analyze, and interpret data from next-generation sequencing experiments. It teaches the most common tools used in genomic data science, including the command line, Python, R, Bioconductor, and Galaxy. The specialization works as a standalone introduction to genomic data science or as a complement to a primary degree or postdoc in biology, molecular biology, or genetics.