The past decade has witnessed the successful of application of many AI techniques used at `web-scale’, on what are popularly referred to as big data platforms based on the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, no-SQL databases and stream computing engines. Online advertising, machine translation, natural language understanding, sentiment mining, personalized medicine, and national security are some examples of such AI-based web-intelligence applications that are already in the public eye.
Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains. Data science is the profession of the future, because organizations that are unable to use (big) data in a smart way will not survive. It is not sufficient to focus on data storage and data analysis. The data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining.
This course is an applied statistics course focusing on data analysis. The course will begin with an overview of how to organize, perform, and write-up data analyses. Then we will cover some of the most popular and widely used statistical methods like linear regression, principal components analysis, cross-validation, and p-values. Instead of focusing on mathematical details, the lectures will be designed to help you apply these techniques to real data using the R statistical programming language, interpret the results, and diagnose potential problems in your analysis. You will also have the opportunity to critique and assist your fellow classmates with their data analyses.
Commerce and research is being transformed by data-driven discovery and prediction. Skills required for data analytics at massive levels – scalable data management on and off the cloud, parallel algorithms, statistical modeling, and proficiency with a complex ecosystem of tools and platforms – span a variety of disciplines and are not easy to obtain through conventional curricula.
This specialization covers the concepts and tools to understand, analyze, and interpret data from next generation sequencing experiments. It teaches the most common tools used in genomic data science including how to use the command line, Python, R, Bioconductor, and Galaxy. The sequence is a stand alone introduction to genomic data science or a perfect compliment to a primary degree or postdoc in biology, molecular biology, or genetics.
If you need to retrieve data from a SQL Server database then you need to know how to use the SELECT statement. This course starts with the basics of a SELECT statement and its various sub-clauses, and progresses to how to select from multiple data sources in the same statement and a comprehensive section on the functions available to manipulate, aggregate, and convert data during the select operation. More then fifty demos help to give you a thorough understanding of how to perform these essential operations, all using a freely-available demo environment that you're shown how to set up and configure. This course is perfect for developers who need to query SQL Server databases to retrieve data, from complete beginners through to more experienced developers who can use some of the modules as reference material. The information in the course applies to all versions from SQL Server 2005 onwards.