He was very informative and helpful.

*Pratheep Ravy - UPC Schweiz GmbH*

Practical Applied Statistics courses

Code | Name | Duration | Overview |
---|---|---|---|

xcelsius | Xcelsius | 14 hours | Description: In this Xcelsius Training course, students will use Xcelsius Present to create interactive visualizations for presenting complex data in a simple way, and to conduct analysis to make critical decisions. Students will also create complete dashboards that present business, project, and human resources information, all consolidated and presented in a user-friendly manner. Finally, students will publish dashboards into various file formats such as Adobe Flash, Microsoft Office PowerPoint, Adobe PDF, and also to the web. Objectives: Upon successful completion of this course, students will be able to: Explore the Xcelsius workspace and an already created dashboard. Create simple visualizations. Conduct data analysis using Xcelsius components that give dynamic functionality to the specified data. Create a Project Management dashboard. Create a dashboard to consolidate and present the Human Resources information of an organization. Finalize dashboards and export them to different file formats. Audience: This course is designed for professionals who conduct data analysis and need to present robust and timely data in an interactive display. 
1: Getting Started with Xcelsius 1A: Explore the Xcelsius Interface 1B: Explore a Dashboard 2: Creating Simple and Interactive Visualizations 2A: Create a Simple Xcelsius Chart 2B: Manage Personal Finance Using Value Box 2C: Organize Levels of Information Using Filters 2D: Conduct a Comparative Study Using List Builder and Line Chart 3: Conducting Data Analysis 3A: Conduct Trend Analysis Using Combo Box 3B: Conduct Demand Analysis Using Label Based Menu 3C: Conduct a Region Based Demand Analysis Using Maps 3D: Forecast Revenue Using Sliders and Gauge 4: Creating a Project Management Dashboard 4A: Drill Down the Status of Current Projects Using the Drill Down Function 4B: Analyze Resource Efficiency Using Fisheye Picture Menu and Other Tools 4C: Analyze Resource Utilization Using Combination Chart 5: Creating a Human Resources Dashboard 5A: Create an Organization Dashboard Using Organization Chart 5B: Conduct Attrition Analysis 6: Finalizing Dashboards 6A: Create a Snapshot 6B: Publish Dashboards |

statdm | Statistical Thinking for Decision Makers | 7 hours | This course has been created for decision makers whose primary goal is not to do the calculation and the analysis, but to understand them and to be able to choose which statistical methods are relevant to the strategic planning of the organization. For example, a prospective participant may need to decide how many samples must be collected before deciding whether or not to launch a product. If you need a longer course that covers the very basics of statistical thinking, have a look at the 5-day "Statistics for Managers" training. What statistics can offer to Decision Makers Descriptive Statistics Basic statistics - which of the statistics (e.g. median, average, percentiles etc.) are more relevant to different distributions Graphs - significance of getting it right (e.g. how the way the graph is created reflects the decision) Variable types - what variables are easier to deal with Ceteris paribus, things are always in motion Third variable problem - how to find the real influencer Inferential Statistics Probability value - what is the meaning of P-value Repeated experiment - how to interpret repeated experiment results Data collection - you can minimize bias, but not get rid of it Understanding confidence level Statistical Thinking Decision making with limited information how to check how much information is enough prioritizing goals based on probability and potential return (benefit/cost ratio, decision trees) How errors add up Butterfly effect Black swans What is Schrödinger's cat and what is Newton's Apple in business Cassandra Problem - how to measure a forecast if the course of action has changed Google Flu trends - how it went wrong How decisions make forecasts outdated Forecasting - methods and practicality ARIMA Why naive forecasts are usually more responsive How far a forecast should look into the past? Why more data can mean a worse forecast? Statistical Methods useful for Decision Makers Describing Bivariate Data Univariate data and bivariate data Probability why things differ each time we measure them? Normal Distributions and normally distributed errors Estimation Independent sources of information and degrees of freedom Logic of Hypothesis Testing What can be proven, and why it is always the opposite of what we want (Falsification) Interpreting the results of Hypothesis Testing Testing Means Power How to determine a good (and cheap) sample size False positive and false negative and why it is always a trade-off |

bigddbsysfun | Big Data & Database Systems Fundamentals | 14 hours | The course is part of the Data Scientist skill set (Domain: Data and Technology). Data Warehousing Concepts What is a Data Warehouse? Difference between OLTP and Data Warehousing Data Acquisition Data Extraction Data Transformation Data Loading Data Marts Dependent vs Independent Data Marts Database design ETL Testing Concepts: Introduction. Software development life cycle. Testing methodologies. ETL Testing Work Flow Process. ETL Testing Responsibilities in DataStage. Big Data Fundamentals Big Data and its role in the corporate world The phases of development of a Big Data strategy within a corporation Explain the rationale underlying a holistic approach to Big Data Components needed in a Big Data Platform Big Data storage solution Limits of Traditional Technologies Overview of database types NoSQL Databases Hadoop MapReduce Apache Spark |

datamodeling | Pattern Recognition | 35 hours | This course provides an introduction to the field of pattern recognition and machine learning. It touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. The course is interactive and includes plenty of hands-on exercises, instructor feedback, and testing of knowledge and skills acquired. Audience Data analysts PhD students, researchers and practitioners Introduction Probability theory, model selection, decision and information theory Probability distributions Linear models for regression and classification Neural networks Kernel methods Sparse kernel machines Graphical models Mixture models and EM Approximate inference Sampling methods Continuous latent variables Sequential data Combining models |

67795 | Numerical Methods | 14 hours | This course is for data scientists and statisticians who have some familiarity with numerical methods and know at least one programming language from among R, Python, Octave, and some C++ options. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose of this course is to give a practical introduction to numerical methods to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. Topics Covered: curve fitting regression robust regression linear algebra: matrix operations eigenvalue/eigenvector matrix decompositions ordinary & partial differential equations Fourier analysis interpolation & splines |

rprogadv | Advanced R Programming | 7 hours | This course is designed for data scientists and statisticians who already have basic R and C++ coding skills and need advanced R coding skills. It is a practice-oriented advanced course in the R programming language for anyone who needs these methods at work. Sector-specific examples make the training more relevant to participants. * Development environment * Object-oriented programming in R * S3 * S4 * Reference classes * Performance profiling * Error handling * Debugging R code * Creating R packages * Unit testing * C/C++ programming in R * SEXPRs * Calling dynamically loaded libraries from R * Writing and compiling C/C++ code from R * Improving R performance with a C++ linear algebra library |

rprogda | R Programming for Data Analysis | 14 hours | This course is part of the Data Scientist skill set (Domain: Data and Technology). Introduction and preliminaries Making R more friendly, R and available GUIs RStudio Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Frequency tables from factors Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Data manipulation Selecting, subsetting observations and variables Filtering, grouping Recoding, transformations Aggregation, combining data sets Character manipulation, stringr package Reading data Txt files CSV files XLS, XLSX files SPSS, SAS, Stata and other data formats Exporting data to txt, csv and other formats Accessing data from databases using the SQL language Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...' argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Basic visualisation graphs Multivariate relations with lattice and ggplot packages Using graphics parameters Graphics parameters list Automated and interactive reporting Combining output from R with text |

bigdatar | Programming with Big Data in R | 21 hours | Introduction to Programming Big Data with R (pbdR) Setting up your environment to use pbdR Scope and tools available in pbdR Packages commonly used with Big Data alongside pbdR Message Passing Interface (MPI) Using pbdR MPI 5 Parallel processing Point-to-point communication Send Matrices Summing Matrices Collective communication Summing Matrices with Reduce Scatter / Gather Other MPI communications Distributed Matrices Creating a distributed diagonal matrix SVD of a distributed matrix Building a distributed matrix in parallel Statistics Applications Monte Carlo Integration Reading Datasets Reading on all processes Broadcasting from one process Reading partitioned data Distributed Regression Distributed Bootstrap |

mlintro | Introduction to Machine Learning | 7 hours | This training course is for people who would like to apply basic Machine Learning techniques in practical applications. Audience Data scientists and statisticians who have some familiarity with machine learning and know how to program in R. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give a practical introduction to machine learning to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additive Models Classification KNN Ridge regression Clustering |

dmmlr | Data Mining & Machine Learning with R | 14 hours | Introduction to Data mining and Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Discriminant analysis Logistic regression K-Nearest neighbors Support Vector Machines Neural networks Decision trees Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means Advanced topics Ensemble models Mixed models Boosting Examples Multidimensional reduction Factor Analysis Principal Component Analysis Examples |

stats2 | Statistics Level 2 | 28 hours | This course covers advanced statistics. It explains most of the tools commonly used in research, analysis and forecasting, with a very short explanation of the theory behind the formulas. The course is not tied to any specific field of knowledge, but it can be tailored if all the delegates have the same background and goals. Some basic computer tools are used during this course (notably Excel and OpenOffice). Describing Bivariate Data Introduction to Bivariate Data Values of the Pearson Correlation Guessing Correlations Simulation Properties of Pearson's r Computing Pearson's r Restriction of Range Demo Variance Sum Law II Exercises Probability Introduction Basic Concepts Conditional Probability Demo Gambler's Fallacy Simulation Birthday Demonstration Binomial Distribution Binomial Demonstration Base Rates Bayes' Theorem Demonstration Monty Hall Problem Demonstration Exercises Normal Distributions Introduction History Areas of Normal Distributions Varieties of Normal Distribution Demo Standard Normal Normal Approximation to the Binomial Normal Approximation Demo Exercises Sampling Distributions Introduction Basic Demo Sample Size Demo Central Limit Theorem Demo Sampling Distribution of the Mean Sampling Distribution of Difference Between Means Sampling Distribution of Pearson's r Sampling Distribution of a Proportion Exercises Estimation Introduction Degrees of Freedom Characteristics of Estimators Bias and Variability Simulation Confidence Intervals Exercises Logic of Hypothesis Testing Introduction Significance Testing Type I and Type II Errors One- and Two-Tailed Tests Interpreting Significant Results Interpreting Non-Significant Results Steps in Hypothesis Testing Significance Testing and Confidence Intervals Misconceptions Exercises Testing Means Single Mean t Distribution Demo Difference between Two Means (Independent Groups) Robustness Simulation All Pairwise Comparisons Among Means Specific Comparisons Difference between Two Means (Correlated Pairs) Correlated t Simulation Specific Comparisons (Correlated Observations) Pairwise Comparisons (Correlated Observations) Exercises Power Introduction Factors Affecting Power Why power matters Exercises Prediction Introduction to Simple Linear Regression Linear Fit Demo Partitioning Sums of Squares Standard Error of the Estimate Prediction Line Demo Inferential Statistics for b and r Exercises ANOVA Introduction ANOVA Designs One-Factor ANOVA (Between-Subjects) One-Way Demo Multi-Factor ANOVA (Between-Subjects) Unequal Sample Sizes Tests Supplementing ANOVA Within-Subjects ANOVA Power of Within-Subjects Designs Demo Exercises Chi Square Chi Square Distribution One-Way Tables Testing Distributions Demo Contingency Tables 2 x 2 Table Simulation Exercises |

datascience | Data Science Training | 21 hours | Aim: To obtain the knowledge required to apply Data Science methods, and to receive consultancy on establishing a Data Science team in an insurance company. Order: 2-3 days of training and consulting in Data Science. One goal is to receive consultancy on the introduction and establishment of Data Science, and of the statistical environment R as a Data Science tool, within a company or organization. Another goal is the prediction of typical Key Performance Indicators (KPIs) and their confidence intervals with R. Suitable reporting and communication of these KPIs to the management board should also be trained. The respective methods and their implementation in R should be trained and discussed on the basis of use cases derived from actual problems in Actuarial Science and Data Science. Content: 1) Modelling KPIs. Based on a use case, the modelling of the respective KPIs in R shall be discussed. In particular, the following topics are to be covered: - Using R as a tool to analyze the performance of insurance portfolios - Suitable data organization within R - Application of Bayesian theory (preferably using the Stan library in R) - Validation of statistical models - Suitable reporting of KPIs, visualization and communication of models and statistical results to the management board. Target group: Data Scientists. 2) Establishing a Data Science team within an organization. Based on practical experience, it should be taught how to establish a Data Science team, and R as a Data Science tool, within a larger company. In particular, the following topics are to be covered: - Required hardware and software - Definition of interfaces to other teams (Data Integration / Data Governance / IT) - Standardization (Projects / Coding Styles / Methods) - Information Management - Documentation, reproducibility, allocation of tasks - Networking - Compliance. Target group: Data Scientists, management board. 3) Claims reserving with R using state-of-the-art methods. Reserving shall be conducted using the ChainLadder R package. The focus lies on: - Application of state-of-the-art claims reserving methods, including Basic Chain-Ladder, Mack Chain-Ladder, generalized linear modelling and the Bayesian approach - Estimation of claim severity in the case of quickly growing portfolios - Prediction of future claim severity in the case of a fixed portfolio - Modelling cancellation. Target group: Data Scientists, Actuaries. Extent: 2-3 days of training / consulting. Requirements: - In-house training is preferred - Training is based on real-life insurance data / experience |

appliedml | Applied Machine Learning | 14 hours | This hands-on course is intended for anyone who wants to apply machine learning in practical applications. Audience This course is for data scientists and statisticians who have basic knowledge of statistics and know how to program in R. The emphasis of the course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The aim is to give participants practical knowledge of machine learning. Sector-specific examples make the training more relevant to participants. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additive Models Classification KNN Bayesian Graphical Models Factor Analysis (FA) Principal Component Analysis (PCA) Independent Component Analysis (ICA) Support Vector Machines (SVM) for regression and classification Boosting Ensemble models Neural networks Hidden Markov Models (HMM) State Space Models Clustering |

predmodr | Predictive Modelling with R | 14 hours | Problems facing forecasters Customer demand planning Investor uncertainty Economic planning Seasonal changes in demand/utilization Roles of risk and uncertainty Time series Forecasting Seasonal adjustment Moving average Exponential smoothing Extrapolation Linear prediction Trend estimation Stationarity and ARIMA modelling Econometric methods (causal methods) Regression analysis Multiple linear regression Multiple non-linear regression Regression validation Forecasting from regression Judgemental methods Surveys Delphi method Scenario building Technology forecasting Forecast by analogy Simulation and other methods Simulation Prediction market Probabilistic forecasting and Ensemble forecasting |

stats1 | Statistics Level 1 | 14 hours | This course has been created for people who require general statistics skills. This course can be tailored to a specific area of expertise like market research, biology, manufacturing, public sector research, etc. Introduction Descriptive Statistics Inferential Statistics Sampling Demonstration Variables Percentiles Measurement Levels of Measurement Measurement Demonstration Basics of Data Collection Distributions Summation Notation Linear Transformations Exercises Graphing Distributions Qualitative Variables Quantitative Variables Stem and Leaf Displays Histograms Frequency Polygons Box Plots Box Plot Demonstration Bar Charts Line Graphs Exercises Summarizing Distributions Central Tendency What is Central Tendency Measures of Central Tendency Balance Scale Simulation Absolute Difference Simulation Squared Differences Simulation Median and Mean Mean and Median Simulation Additional Measures Comparing measures Variability Measures of Variability Estimating Variance Simulation Shape Comparing Distributions Demo Effects of Transformations Variance Sum Law I Exercises Normal Distributions History Areas of Normal Distributions Varieties of Normal Distribution Demo Standard Normal Normal Approximation to the Binomial Normal Approximation Demo Exercises |

kdd | Knowledge Discovery in Databases (KDD) | 21 hours | Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes. Audience Data analysts or anyone interested in learning how to interpret data to solve problems Format of the course After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations. Introduction KDD vs data mining Establishing the application domain Establishing relevant prior knowledge Understanding the goal of the investigation Creating a target data set Data cleaning and preprocessing Data reduction and projection Choosing the data mining task Choosing the data mining algorithms Interpreting the mined patterns |

dataminr | Data Mining with R | 14 hours | Sources of methods Artificial intelligence Machine learning Statistics Sources of data Preprocessing of data Data Import/Export Data Exploration and Visualization Dimensionality Reduction Dealing with missing values R Packages Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Frequent Pattern Mining Text Mining Decision Trees Regression Neural Networks Sequence Mining Frequent Pattern Mining Data dredging, data fishing, data snooping |

tableauvra | Visual Reporting and Analysis with Tableau | 7 hours | Connecting to Data Connecting to various databases – data connection types Multiple data sources & data blending Creating Basic Visualizations Sorting, Filtering, Organizing data Using Multiple Measures on the Same Axis Showing the Relationship between Numerical Values Mapping Data Geographically Tableau geocoding – advanced mapping + using Background Images Basic calculations and aggregations Parameters, reference lines Overview of additional visualizations Dashboards: quick filters, actions, and parameters Advanced calculations Tips & tricks – parameters, calculations, sorting, filtering etc. Best practices when using Tableau |

statsres | Statistics for Researchers | 35 hours | This course aims to give researchers an understanding of the principles of statistical design and analysis and their relevance to research in a range of scientific disciplines. It covers some probability and statistical methods, mainly through examples. The training consists of around 30% lectures and 70% guided quizzes and labs. For closed courses, we can tailor the examples and materials to a specific branch (such as psychology tests, public sector, biology, genetics, etc.). For public courses, mixed examples are used. Although various software is used during this course (from Microsoft Excel to SPSS, Statgraphs, etc.), its main focus is on understanding the principles and processes guiding research, reasoning and conclusions. This course can be delivered as a blended course, i.e. with homework and assignments. Scientific Method, Probability & Statistics Very short history of statistics Why we can be "confident" about the conclusions Probability and decision making Preparation for research (deciding "what" and "how") The big picture: research is a part of a process with inputs and outputs Gathering data Questionnaires and measurement What to measure Observational Studies Design of Experiments Analysis of Data and Graphical Methods Research Skills and Techniques Research Management Describing Bivariate Data Introduction to Bivariate Data Values of the Pearson Correlation Guessing Correlations Simulation Properties of Pearson's r Computing Pearson's r Restriction of Range Demo Variance Sum Law II Exercises Probability Introduction Basic Concepts Conditional Probability Demo Gambler's Fallacy Simulation Birthday Demonstration Binomial Distribution Binomial Demonstration Base Rates Bayes' Theorem Demonstration Monty Hall Problem Demonstration Exercises Normal Distributions Introduction History Areas of Normal Distributions Varieties of Normal Distribution Demo Standard Normal Normal Approximation to the Binomial Normal Approximation Demo Exercises Sampling Distributions Introduction Basic Demo Sample Size Demo Central Limit Theorem Demo Sampling Distribution of the Mean Sampling Distribution of Difference Between Means Sampling Distribution of Pearson's r Sampling Distribution of a Proportion Exercises Estimation Introduction Degrees of Freedom Characteristics of Estimators Bias and Variability Simulation Confidence Intervals Exercises Logic of Hypothesis Testing Introduction Significance Testing Type I and Type II Errors One- and Two-Tailed Tests Interpreting Significant Results Interpreting Non-Significant Results Steps in Hypothesis Testing Significance Testing and Confidence Intervals Misconceptions Exercises Testing Means Single Mean t Distribution Demo Difference between Two Means (Independent Groups) Robustness Simulation All Pairwise Comparisons Among Means Specific Comparisons Difference between Two Means (Correlated Pairs) Correlated t Simulation Specific Comparisons (Correlated Observations) Pairwise Comparisons (Correlated Observations) Exercises Power Introduction Example Calculations Factors Affecting Power Exercises Prediction Introduction to Simple Linear Regression Linear Fit Demo Partitioning Sums of Squares Standard Error of the Estimate Prediction Line Demo Inferential Statistics for b and r Exercises ANOVA Introduction ANOVA Designs One-Factor ANOVA (Between-Subjects) One-Way Demo Multi-Factor ANOVA (Between-Subjects) Unequal Sample Sizes Tests Supplementing ANOVA Within-Subjects ANOVA Power of Within-Subjects Designs Demo Exercises Chi Square Chi Square Distribution One-Way Tables Testing Distributions Demo Contingency Tables 2 x 2 Table Simulation Exercises Case Studies Analysis of selected case studies |

druid | Druid: Build a fast, real-time data analysis system | 21 hours | Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo. In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment. Audience Application developers Software engineers Technical consultants DevOps professionals Architecture engineers Format of the course Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction Installing and starting Druid Druid architecture and design Real-time ingestion of event data Sharding and indexing Loading data Querying data Visualizing data Running a distributed cluster Druid + Apache Hive Druid + Apache Kafka Druid + others Troubleshooting Administrative tasks |

MLFWR1 | Machine Learning Fundamentals with R | 14 hours | The aim of this course is to provide basic proficiency in applying Machine Learning methods in practice. Using the R programming platform and its various libraries, and drawing on a multitude of practical examples, this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, how to interpret the outputs of the algorithms and how to validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and to avoid the common pitfalls of Data Science applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means |

spssanal | Statistical Analysis using SPSS | 21 hours | Getting started with SPSS Obtaining, Editing, and Saving Statistical Output Manipulating Data Descriptive Statistics Procedures Evaluating Score Distribution Assumptions t Tests Univariate Group Differences: ANOVA and ANCOVA Multivariate Group Differences: MANOVA Nonparametric Procedures for Analysing Frequency Data Correlations Regression with Quantitative Variables Regression with Categorical Variables Principal Components Analysis and Factor Analysis |

statsman | Statistics for Managers | 35 hours | This course has been created for decision makers whose primary goal is not to do the calculation and the analysis, but to understand them. The course uses a lot of pictures, diagrams, computer simulations, anecdotes and a sense of humour to explain the concepts and pitfalls of statistics. Introduction to Statistics What are Statistics? Importance of Statistics Descriptive Statistics Inferential Statistics Variables Percentiles Measurement Levels of Measurement Basics of Data Collection Distributions Summation Notation Linear Transformations Common Pitfalls Biased samples Average, mean or median? Misleading graphs Semi-attached figures Third variable problem Ceteris paribus Errors in reasoning Understanding confidence level Understanding Results Describing Bivariate Data Probability Normal Distributions Sampling Distributions Estimation Logic of Hypothesis Testing Testing Means Power Prediction ANOVA Chi Square Case Studies Discussion about case studies chosen by the delegates. |

mlentre | Applied Machine Learning | 21 hours | This hands-on course is intended for anyone who wants to apply machine learning in practical applications. Audience This course is for data scientists and statisticians who have basic knowledge of statistics and know how to program in R. The emphasis of the course is on the practical aspects of data and model preparation, execution, post hoc analysis and visualization. The goal is to give participants practical knowledge of machine learning. Domain-specific examples increase the relevance of the training for participants. Naive Bayes Multinomial models Bayesian categorical data analysis Discriminant analysis Linear regression Logistic regression GLM EM Algorithm Mixed Models Additional Models Classification KNN Bayesian Graphical Models Factor Analysis (FA) Principal Component Analysis (PCA) Independent Component Analysis (ICA) Support Vector Machines (SVM) for regression and classification Boosting Ensemble models Neural networks Hidden Markov Models (HMM) State Space Models Clustering |

octaveda | Octave for Data Analysis | 14 hours | Audience: This course is for data scientists and statisticians who have some familiarity with statistical methods and would like to use the Octave programming language at work. The purpose of this course is to give a practical introduction to Octave programming to participants interested in using this programming language at work. Environment Data types: numeric, string, arrays, matrices Variables Expressions Control flow Functions Exception handling Debugging Input/output Linear algebra Optimization Statistical distributions Regression Plotting |

intror | Introduction to R with Time Series Analysis | 21 hours | Introduction and preliminaries Making R more friendly, R and available GUIs RStudio Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Frequency tables from factors Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Data manipulation Selecting, subsetting observations and variables Filtering, grouping Recoding, transformations Aggregation, combining data sets Character manipulation, stringr package Reading data Txt files CSV files XLS, XLSX files SPSS, SAS, Stata and other data formats Exporting data to txt, csv and other formats Accessing data from databases using SQL language Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...' argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Basic visualisation graphs Multivariate relations with lattice and ggplot package Using graphics parameters Graphics parameters list Time series Forecasting Seasonal adjustment Moving average Exponential smoothing Extrapolation Linear prediction Trend estimation Stationarity and ARIMA modelling Econometric methods (causal methods) Regression analysis Multiple linear regression Multiple non-linear regression Regression validation Forecasting from regression |

sixsigmagb | Six Sigma Green Belt | 70 hours | Green Belts participate in and lead Lean and Six Sigma projects from within their regular job function. They can tackle projects as part of a cross functional team or projects scoped within their normal job. Each session of Green Belt training is separated by 3 or 4 weeks when the Green Belts apply their training to their improvement projects. We recommend supporting the Green Belts on their projects in between training sessions and holding stage gate reviews along with leadership and Lean Six Sigma Champions to ensure DMAIC methodology is being rigorously applied. Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach enabling participants to take part and lead waste and defect reduction projects and initiatives. Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well scoped process improvement projects related to their regular job function. Block 1 Day 1 Introduction to Six Sigma Project Chartering & VOC Process Mapping Stakeholder analysis Day 2 Team Start Up Prioritisation Matrix Lean Thinking Value Stream Mapping Day 3 Data Collection Minitab and Graphical Analysis Descriptive Statistics Day 4 Measurement System Evaluation Process Capability Cp, CpK Six Sigma Metrics Day 5 5 Why FMEA Block 2 Day 1 Review of Block 1 Multivari Inferential Statistics Intro to Hypothesis Testing Day 2 2 sample t-tests F tests Hypothesis Testing – Chi Sq Day 3 Hypothesis Testing - Anova Day 4 Correlation and Regression Multiple Regression Introduction to Design Of Experiments Day 5 Mistake Proofing Control Plans Control Charts |

statsqa | Statistical Quality Analysis | 7 hours | This course covers the fundamentals of statistical process control and how these quality tools can provide the necessary evidence to improve and control processes. Know when and where to use the various types of control charts available in Minitab for your own processes. And learn how to use capability analysis tools to evaluate your processes. Gage R&R, Destructive Testing, Gage Linearity and Bias, Attribute Agreement, Variables and Attribute Control Charts, Capability Analysis for Normal, Non-normal and Attribute data |

kylin | Apache Kylin: From classic OLAP to real-time data warehouse | 14 hours | Apache Kylin is an extreme, distributed analytics engine for big data. In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse. By the end of this training, participants will be able to: Consume real-time streaming data using Kylin Utilize Apache Kylin's powerful features, including snowflake schema support, a rich SQL interface, spark cubing and subsecond query latency Note We use the latest version of Kylin (as of this writing, Apache Kylin v2.0) Audience Big data engineers Big Data analysts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice To request a customized course outline for this training, please contact us. |

mrkfct | Market Forecasting | 14 hours | Audience This course has been created for analysts and forecasters who want to introduce or improve forecasting in areas such as sales forecasting, economic forecasting, technology forecasting, supply chain management and demand or supply forecasting. Description This course guides delegates through a series of methodologies, frameworks and algorithms which are useful when choosing how to predict the future based on historical data. It uses standard tools like Microsoft Excel or some Open Source programs (notably the R project). The principles covered in this course can be implemented in any software (e.g. SAS, SPSS, Statistica, MINITAB ...) Problems facing forecasters Customer demand planning Investor uncertainty Economic planning Seasonal changes in demand/utilization Roles of risk and uncertainty Time series methods Moving average Exponential smoothing Extrapolation Linear prediction Trend estimation Growth curve Econometric methods (causal methods) Regression analysis using linear regression or non-linear regression Autoregressive moving average (ARMA) Autoregressive integrated moving average (ARIMA) Econometrics Judgemental methods Surveys Delphi method Scenario building Technology forecasting Forecast by analogy Simulation and other methods Simulation Prediction market Probabilistic forecasting and Ensemble forecasting Reference class forecasting |

advspsspas | Advanced Statistics using SPSS Predictive Analytics Software | 28 hours | Goal: To master working independently with SPSS at an advanced level, using dialog boxes and command language syntax for the selected analytical techniques. Audience: Analysts, researchers, scientists, students and all those who want to acquire the ability to use the SPSS package at an advanced level and learn the selected statistical models. The training works through universal analysis problems and is dedicated to a specific industry. Preparation of a database for analysis Management of data collection Operations on variables Transforming variables with selected functions (logarithmic, exponential, etc.) Parametric and nonparametric statistics, or how to fit a model to the data Measurement scale Distribution type Outliers and influential observations Sample size Central limit theorem Study of the differences between the characteristics of statistical tests based on the mean and the median Analysis of correlation and similarities Correlations Principal component analysis Cluster analysis Prediction: simple and multivariate regression analysis Method of least squares Linear model Instrumental variable regression models (dummy, effect, orthogonal coding) Statistical Inference |

StaEcoMod | Statistical and Econometric Modelling | 21 hours | The Nature of Econometrics and Economic Data Econometrics and models Steps in econometric modelling Types of economic data: time series, cross-sectional, panel Causality in econometric analysis Specification and Data Issues Functional form Proxy variables Measurement error in variables Missing data, outliers, influential observations Regression Analysis Estimation Ordinary least squares (OLS) estimators Classical OLS assumptions, Gauss-Markov theorem Best Linear Unbiased Estimators Inference Testing statistical significance of parameters t-test (single, group) Confidence intervals Testing multiple linear restrictions, F-test Goodness of fit Testing functional form Missing variables Binary variables Testing for violation of assumptions and their implications: Heteroscedasticity Autocorrelation Multicollinearity Endogeneity Other Estimation techniques Instrumental Variables Estimation Generalised Least Squares Maximum Likelihood Generalised Method of Moments Models for Binary Response Variables Linear Probability Model Probit Model Logit Model Estimation Interpretation of parameters, Marginal Effects Goodness of Fit Limited Dependent Variables Tobit Model Truncated Normal Distribution Interpretation of Tobit Model Specification and Estimation Issues Time Series Models Characteristics of Time Series Decomposition of Time Series Exponential Smoothing Stationarity ARIMA models Co-Integration ECM model Predictive Analysis Forecasting, Planning and Goals Steps in Forecasting Evaluating Forecast Accuracy Residual Diagnostics Prediction Intervals |

tbladv | Tableau Advanced | 14 hours | Introduction and Getting Started Filtering, Sorting & Grouping Advanced options for filtering and hiding Understanding the many options for ordering and grouping your data Sort, Groups, Bins, Sets Interrelation between all options Working with Data in Tableau Dimensions versus Measures Data types, Discrete versus Continuous Joining Database sources, Inner, Left, Right join Blending different data sources in a single worksheet Working with extracts instead of live connections Data quality problems Metadata and sharing a connection Calculations on Data and Statistics Row-level calculations Aggregate calculations Arithmetic, string, date calculations Custom aggregations and calculated fields Control-flow calculations What is behind the scenes Advanced Statistics Working with dates and times Table Calculations Quick table calculations Scope and direction Addressing and partitioning Advanced table calculations Advanced Geo techniques Building basic maps Geographic fields, map options Customizing a geographic view Web Map Service Visualizing non-geographical data with background images Mapping tips Distance Calculations Parameters in Tableau Creating parameters Parameters in calculated fields Parameter control options Enhancing analysis and visualizations with parameters Building Advanced Chart Visualizations Bar chart variations: bullet, bar-in-bar, highlights chart Date and time visualizations, Gantt charts Stacked bars, treemaps, area charts, pie charts Heat map KPI chart Pareto chart Bullet chart Advanced formatting Labels Legends Highlighting Annotations Telling a data story with Dashboards Dashboard framework Filter actions Highlight actions URL actions Cascading filters Trends and Forecasting Understanding and Customizing trend lines Distributions Forecasting Integrating Tableau and R for advanced data analytics Possibility to include different data analytics methods in R at participants' request |

nlpwithr | NLP: Natural Language Processing with R | 21 hours | It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data. This course centers around extracting insights and meaning from this data. Utilizing the R Language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. Data samples are available in various languages per customer requirements. By the end of this training participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze and report on its significance. Audience Linguists and programmers Format of the course Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction NLP and R vs Python Installing and configuring R Studio Installing R packages related to Natural Language Processing (NLP). An overview of R’s text manipulation capabilities Getting started with an NLP project in R Reading and importing data files into R Text manipulation with R Document clustering in R Parts of speech tagging in R Sentence parsing in R Working with regular expressions in R Named-entity recognition in R Topic modeling in R Text classification in R Working with very large data sets Visualizing your results Optimization Integrating R with other languages (Java, Python, etc.) Closing remarks |

datashrinkgov | Data Shrinkage for Government | 14 hours | Why shrink data Relational databases Introduction Aggregation and disaggregation Normalisation and denormalisation Null values and zeroes Joining data Complex joins Cluster analysis Applications Strengths and weaknesses Measuring distance Hierarchical clustering K-means and derivatives Applications in Government Factor analysis Concepts Exploratory factor analysis Confirmatory factor analysis Principal component analysis Correspondence analysis Software Applications in Government Predictive analytics Timelines and naming conventions Holdout samples Weights of evidence Information value Scorecard building demonstration using a spreadsheet Regression in predictive analytics Logistic regression in predictive analytics Decision Trees in predictive analytics Neural networks Measuring accuracy Applications in Government |

dataar | Data Analytics With R | 21 hours | R is a very popular, open source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students. It covers language fundamentals, libraries and advanced concepts, as well as advanced data analytics and graphing with real world data. Audience Developers / data analysts Duration 3 days Format Lectures and Hands-on Day One: Language Basics Course Introduction About Data Science Data Science Definition Process of Doing Data Science. Introducing R Language Variables and Types Control Structures (Loops / Conditionals) R Scalars, Vectors, and Matrices Defining R Vectors Matrices String and Text Manipulation Character data type File IO Lists Functions Introducing Functions Closures lapply/sapply functions DataFrames Labs for all sections Day Two: Intermediate R Programming DataFrames and File I/O Reading data from files Data Preparation Built-in Datasets Visualization Graphics Package plot() / barplot() / hist() / boxplot() / scatter plot Heat Map ggplot2 package ( qplot(), ggplot()) Exploration With Dplyr Labs for all sections Day Three: Advanced Programming With R Statistical Modeling With R Statistical Functions Dealing With NA Distributions (Binomial, Poisson, Normal) Regression Introducing Linear Regressions Recommendations Text Processing (tm package / Wordclouds) Clustering Introduction to Clustering KMeans Classification Introduction to Classification Naive Bayes Decision Trees Training using caret package Evaluating Algorithms R and Big Data Connecting R to databases Big Data Ecosystem Labs for all sections |

excelafd | Financial Data Analysis in Excel | 14 hours | Audience Financial or market analysts, managers, accountants Course Objectives Facilitate and automate all kinds of financial analysis with Microsoft Excel Advanced functions Logical functions Math and statistical functions Financial functions Lookups and data tables Using lookup functions Using MATCH and INDEX Advanced list management Validating cell entries Exploring database functions PivotTables and PivotCharts Creating Pivot Tables Calculated Item and Calculated Field Working with External Data Exporting and importing Exporting and importing XML data Querying external databases Linking to a database Linking to an XML data source Analysing online data (Web Queries) Analytical options Goal Seek Solver The Analysis ToolPak Scenarios Macros and custom functions Running and recording a macro Working with VBA code Creating functions Conditional formatting and SmartArt Conditional formatting with graphics SmartArt graphics |

surveyrste | Survey Research, Sampling Techniques & Estimation | 14 hours | Survey research: Principle of sample survey design and implementation survey preliminaries sampling methods (probability & non-probability methods) population & sampling frames survey data collection methods Questionnaire design Design and writing of questionnaires Pre-tests & piloting Planning & organisation of surveys Minimising errors, bias & non-response at the design stage Survey data processing Commissioning surveys/research Sample Techniques & Estimation: Sampling techniques and their strengths/weaknesses (may overlap above sampling methods) Simple Random Sampling Unequal Probability Sampling Stratified Sampling (with proportional to size & disproportional selection) Systematic Sampling Cluster sampling Multi-stage Sampling Quota Sampling Estimation Methods of estimating sample sizes Estimating population parameters using sample estimates Variance and confidence intervals estimation Estimating bias/precision Methods of correcting bias Methods of handling missing data Non-response analysis |

datavisR1 | Introduction to Data Visualization with R | 28 hours | This course is intended for data engineers, decision makers and data analysts and will lead you to create very effective plots using R studio that appeal to decision makers and help them find out hidden information and take the right decisions Day 1: overview of R programming introduction to data visualization scatter plots and clusters the use of noise and jitters Day 2: other type of 2D and 3D plots histograms heat charts categorical data plotting Day 3: plotting KPIs with data R and X charts examples dashboards parallel axes mixing categorical data with numeric data Day 4: different hats of data visualization disguised and hidden trends case studies saving plots and loading Excel files |

pgmt | A Practitioner's Guide to Multivariate Techniques | 14 hours | The introduction of the digital computer, and now the widespread availability of computer packages, has opened up a hitherto difficult area of statistics: multivariate analysis. Previously the formidable computing effort associated with these procedures presented a real barrier. That barrier has now disappeared and the analyst can therefore concentrate on an appreciation and an interpretation of the findings. Multivariate Analysis of Variance (MANOVA) Whereas the Analysis of Variance technique (ANOVA) investigates possible systematic differences between prescribed groups of individuals on a single variable, the technique of Multivariate Analysis of Variance is simply an extension of that procedure to numerous variates viewed collectively. These variates could be distinct in nature, for example height, weight, etc., or repeated measures of a single variate over time or over space. When the variates are repeated measures over time or space, the analyses may often be reduced to a succession of univariate analyses, with easier interpretation. This procedure is often referred to as Repeated Measures Analysis. Principal Component Analysis If only two variates are recorded for a number of individuals, the data may conveniently be represented on a two-dimensional plot. If there are ‘p’ variates then one could imagine a plot of the data in ‘p’ dimensional space. The technique of Principal Component Analysis corresponds to a rotation of the axes so that the maximum amounts of variation are progressively represented along the new axes. It has been described as ‘peering into multidimensional space, from every conceivable angle, and selecting as the viewing angle that which contains the maximum amount of variation’. The aim therefore is a reduction of the dimensionality of multivariate data.
If for example a very high percentage (say 90%) of the variability is contained in the first two principal components, a plot of these components would be a virtually complete pictorial representation of the variability. Discriminant Analysis Suppose that several variates are observed on individuals from two identified groups. The technique of discriminant analysis involves calculating that linear function of the variates that best separates out the groups. The linear function may therefore be used to identify group membership simply from the pattern of variates. Various methods are available to estimate the success in general of this identification procedure. Canonical Variate Analysis Canonical Variate Analysis is in essence an extension of Discriminant Analysis to accommodate the situation where there are more than two groups of individuals. Cluster Analysis Cluster Analysis as the name suggests involves identifying groupings (or clusters) of individuals in multidimensional space. Since here there is no ‘a priori’ grouping of individuals, the identification of so called clusters is a subjective process subject to various assumptions. Most computer packages offer several clustering procedures that may often give differing results. However the pictorial representation of the so called ‘clusters’, in diagrams called dendrograms, provides a very useful diagnostic. Factor Analysis If ‘p’ variates are observed on each of ‘n’ individuals, the technique of factor analysis attempts to identify say ‘r’ (< p) so called factors which determine to a large extent the variate values. The implicit assumption here therefore is that the entire array of ‘p’ variates is controlled by ‘r’ factors. For example the ‘p’ variates could represent the performance of students in numerous examination subjects, and we wish to determine whether a few attributes such as numerical ability, linguistic ability could account for much of the variability. 
The difficulties here stem from the fact that the so-called factors are not directly observable, and indeed may not really exist. Factor analysis has been viewed very suspiciously over the years, because of the measure of speculation involved in the identification of factors. One popular numerical procedure starts with the rotation of axes using principal components (described above) followed by a rotation of the factors identified. |
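
The rotation of axes described under Principal Component Analysis above can be sketched for the two-variate case. This is a minimal illustration in plain Python, not course material: the function name `principal_axes_2d` and the height and weight figures are made up for the example, and a statistics package would normally be used for ‘p’ variates.

```python
import math

def principal_axes_2d(xs, ys):
    """Rotate two variates onto their principal axes (hypothetical helper).

    Returns the rotation angle in radians, the variance along each new
    axis (largest first), and the rotated coordinates of every point.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance of the two variates.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Direction along which the variance is greatest.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(angle), math.sin(angle)
    # Coordinates of each centred point on the rotated axes.
    scores = [((x - mx) * c + (y - my) * s, -(x - mx) * s + (y - my) * c)
              for x, y in zip(xs, ys)]
    var1 = sum(a * a for a, _ in scores) / (n - 1)
    var2 = sum(b * b for _, b in scores) / (n - 1)
    return angle, (var1, var2), scores

# Made-up measurements for six individuals.
heights = [150.0, 160.0, 165.0, 170.0, 180.0, 185.0]
weights = [52.0, 58.0, 63.0, 66.0, 77.0, 80.0]

angle, (var1, var2), scores = principal_axes_2d(heights, weights)
share_first = var1 / (var1 + var2)
print(f"rotation: {math.degrees(angle):.1f} degrees")
print(f"share of variation on first axis: {share_first:.1%}")
```

Because the rotation only changes the viewing angle, the two new variances still sum to the total variance of the original variates; when the first share is very high, a plot of the first component alone loses little information.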

mrkanar | Marketing Analytics with R | 21 hours | Audience: Business owners (marketing managers, product managers, customer base managers) and their teams; customer insights professionals. Overview: The course follows the customer life cycle from acquiring new customers, managing the existing customers for profitability, retaining good customers, and finally understanding which customers are leaving us and why. We will be working with real (if anonymous) data from a variety of industries including telecommunications, insurance, media, and high tech. Format: Instructor-led training over the course of five half-day sessions with in-class exercises as well as homework. It can be delivered as a classroom or distance (online) course. Trainer: The instructor has gained 19 years of experience in customer insights and customer relationship management after originally graduating in experimental particle physics and working at the CERN laboratory. He has worked with large corporations across Europe and North America to transform the way they look at their customers and derive value from data, and has held interim management roles at leading Dutch and Irish mobile phone operators building their Insights and Customer Base Management teams. He is one of the founders of The PCA Group, which helps large corporations in Europe and South America transform their approach to marketing, and of CYBAEA, which provides analytics-as-a-service across the globe with a strong focus on commercial results. His teaching style focuses on practical examples and emphasizes results over theoretical sophistication: his courses are for practitioners who need to deliver value to their organizations, and while he covers just enough theory to make sure his students are on a firm footing, his teaching is not geared to more theoretical students. Expect much hands-on work and very few formulas.
Part 1: Inflow - acquiring new customers Our focus is direct marketing, so we will not look at advertising campaigns but instead focus on understanding marketing campaigns (e.g. direct mail). This is the foundation for almost everything else in the course. We look at measuring and improving campaign effectiveness, including: The importance of test and control groups. Universal control group. Techniques: Lift curves, AUC. Return on investment. Optimizing marketing spend. Part 2: Base Management: managing existing customers Considering the cost of acquiring new customers, for many businesses there are probably few assets more valuable than their existing customer base, though few think of it in this way. Topics include: 1. Cross-selling and up-selling: Offering the right product or service to the customer at the right time. Techniques: RFM models. Multinomial regression. Value of lifetime purchases. 2. Customer segmentation: Understanding the types of customers that you have. Classification models using first simple decision trees, and then random forests and other, newer techniques. Part 3: Retention: Keeping your good customers Understanding which customers are likely to leave and what you can do about it is key to profitability in many industries, especially where there are repeat purchases or subscriptions. We look at propensity to churn models, including Logistic regression: glm (package stats) and newer techniques (especially gbm as a general tool) Tuning models (caret) and introduction to ensemble models. Part 4: Outflow: Understanding who is leaving and why Customers will leave you; that is a fact of life. What is important is to understand who is leaving and why. Is it low value customers who are leaving or is it your best customers? Are they leaving to competitors or because they no longer need your products and services?
Topics include: Customer lifetime value models: Combining value of purchases with propensity to churn and the cost of servicing and retaining the customer. Analysing survey data. (Generally useful, but we will do a brief introduction here in the context of exit surveys.) |
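
The role of test and control groups in Part 1 can be illustrated with a small calculation. This is a sketch under stated assumptions, not the course's own method: the function `campaign_uplift`, its parameters and all figures are hypothetical, and it uses a plain difference in response rates rather than a full lift curve.

```python
def campaign_uplift(test_size, test_responses, control_size, control_responses,
                    cost_per_contact, value_per_response):
    """Compare a contacted test group with an uncontacted control group.

    Hypothetical helper: the control group's response rate estimates what
    would have happened without the campaign, so only the difference is
    credited to the campaign.
    """
    test_rate = test_responses / test_size
    control_rate = control_responses / control_size
    uplift = test_rate - control_rate
    # Responses in the test group that the campaign itself generated.
    incremental_responses = uplift * test_size
    cost = cost_per_contact * test_size
    roi = (incremental_responses * value_per_response - cost) / cost
    return {
        "test_rate": test_rate,
        "control_rate": control_rate,
        "uplift": uplift,
        "incremental_responses": incremental_responses,
        "roi": roi,
    }

# Made-up direct mail campaign: 10,000 customers mailed, 2,000 held back.
result = campaign_uplift(
    test_size=10_000, test_responses=500,
    control_size=2_000, control_responses=60,
    cost_per_contact=0.50, value_per_response=40.0,
)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Without the control group, all 500 responses would be credited to the campaign; with it, only the responses above the control rate count, which changes the return on investment considerably.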

rlang | R | 21 hours | Day 1 Introduction and preliminaries Making R more friendly, R and available GUIs Rstudio Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Ordered and unordered factors A specific example The function tapply() and ragged arrays Ordered factors Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function Mixed vector and array arithmetic. 
The recycling rule The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Frequency tables from factors Day 2 Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Data manipulation Selecting, subsetting observations and variables Filtering, grouping Recoding, transformations Aggregation, combining data sets Character manipulation, stringr package Reading data Txt files CSV files XLS, XLSX files SPSS, SAS, Stata and other data formats Exporting data to txt, csv and other formats Accessing data from databases using SQL language Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Day 3 Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...' 
argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Statistical analysis in R Linear regression models Generic functions for extracting model information Updating fitted models Generalized linear models Families The glm() function Classification Logistic Regression Linear Discriminant Analysis Unsupervised learning Principal Components Analysis Clustering Methods (k-means, hierarchical clustering, k-medoids) Survival analysis Survival objects in R Kaplan-Meier estimate Confidence bands Cox PH models, constant covariates Cox PH models, time-dependent covariates Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Basic visualisation graphs Multivariate relations with lattice and ggplot2 packages Using graphics parameters Graphics parameters list Automated and interactive reporting Combining output from R with text Creating HTML and PDF documents |

advr | Advanced R | 7 hours | RStudio IDE Data manipulation with dplyr, tidyr, reshape2 Object-oriented programming in R Performance profiling Exception handling Debugging R code Creating R packages Reproducible research with knitr and R Markdown C/C++ coding in R Writing and compiling C/C++ code from R |

sixsigmayb | Six Sigma Yellow Belt | 21 hours | Yellow Belt covers the basics of the Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling delegates to take part in and lead team-based waste and defect reduction projects and initiatives. In addition, emphasis is placed on applying the problem-solving tools in daily roles. At the end of the course you will be equipped to look at your immediate team and role, determine what can be improved and create a business improvement project on a selected opportunity that is aligned to customer requirements. You will be able to analyse the process using visualization tools, identify the waste (non-value adding) components and work to eliminate these from the process. You will apply root cause analysis techniques to identify the underlying causes of defects in the process. The course uses simulations, case study exercises and work-based projects to enable delegates to 'learn through doing'. Notes: This course has a minimum class size of 4. If requested, this course can be delivered in 2 days with some reductions to the course content and level of detail in some areas, notably Customer needs, Graphical analysis and Process handover. An overview of project selection and scoping Understanding customer needs and how they impact project aims Discovering processes using visualisation techniques Understanding the causes of work and how to simplify Finding and removing process waste Graphical analysis to understand process performance Problem solving tools to determine root cause Basic solution creation Piloting & implementation Process handover |

sspsspas | Statistics with SPSS Predictive Analytics Software | 14 hours | Goal: Learn to work with SPSS independently. Audience: Analysts, researchers, scientists, students and anyone who wants to learn to use the SPSS package and popular data mining techniques. Using the program Dialog boxes Entering / loading data The concept of a variable and measurement scales Preparing a database Generating tables and graphs Formatting the report Command language syntax Automated analysis Storing and modifying procedures Creating your own analytical procedures Data analysis Descriptive statistics Key terms: e.g. variable, hypothesis, statistical significance Measures of central tendency Measures of dispersion Standardization Introduction to researching the relationships between variables Correlational and experimental methods Summary: Case study and discussion |

tableau1 | Data analysis with Tableau | 14 hours | Connecting to various databases Data connection types Working with Single Data Sources Multiple data sources & data blending Tableau geocoding Advanced mapping + using Background Images Overview of additional visualizations Dashboards: quick filters, actions, and parameters Advanced calculations Parameters, calculations, sorting, filtering etc. Best practices when using Tableau R programming |

dsbda | Data Science for Big Data Analytics | 35 hours | Introduction to Data Science for Big Data Analytics Data Science Overview Big Data Overview Data Structures Drivers and complexities of Big Data Big Data ecosystem and a new approach to analytics Key technologies in Big Data Data Mining process and problems Association Pattern Mining Data Clustering Outlier Detection Data Classification Introduction to Data Analytics lifecycle Discovery Data preparation Model planning Model building Presentation/Communication of results Operationalization Exercise: Case study From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology. Getting started with R Installing R and RStudio Features of R language Objects in R Data in R Data manipulation Big data issues Exercises Getting started with Hadoop Installing Hadoop Understanding Hadoop modes HDFS MapReduce architecture Hadoop related projects overview Writing programs in Hadoop MapReduce Exercises Integrating R and Hadoop with RHadoop Components of RHadoop Installing RHadoop and connecting with Hadoop The architecture of RHadoop Hadoop streaming with R Data analytics problem solving with RHadoop Exercises Pre-processing and preparing data Data preparation steps Feature extraction Data cleaning Data integration and transformation Data reduction – sampling, feature subset selection, Dimensionality reduction Discretization and binning Exercises and Case study Exploratory data analytic methods in R Descriptive statistics Exploratory data analysis Visualization – preliminary steps Visualizing single variable Examining multiple variables Statistical methods for evaluation Hypothesis testing Exercises and Case study Data Visualizations Basic visualizations in R Packages for data visualization: ggplot2, lattice, plotly Formatting plots in R Advanced graphs Exercises Regression (Estimating future values) Linear regression Use cases Model description Diagnostics 
Problems with linear regression Shrinkage methods, ridge regression, the lasso Generalizations and nonlinearity Regression splines Local polynomial regression Generalized additive models Regression with RHadoop Exercises and Case study Classification The classification related problems Bayesian refresher Naïve Bayes Logistic regression K-nearest neighbors Decision trees algorithm Neural networks Support vector machines Diagnostics of classifiers Comparison of classification methods Scalable classification algorithms Exercises and Case study Assessing model performance and selection Bias, Variance and model complexity Accuracy vs Interpretability Evaluating classifiers Measures of model/algorithm performance Hold-out method of validation Cross-validation Tuning machine learning algorithms with caret package Visualizing model performance with Profit ROC and Lift curves Ensemble Methods Bagging Random Forests Boosting Gradient boosting Exercises and Case study Support vector machines for classification and regression Maximal Margin classifiers Support vector classifiers Support vector machines SVM’s for classification problems SVM’s for regression problems Exercises and Case study Identifying unknown groupings within a data set Feature Selection for Clustering Representative based algorithms: k-means, k-medoids Hierarchical algorithms: agglomerative and divisive methods Probabilistic base algorithms: EM Density based algorithms: DBSCAN, DENCLUE Cluster validation Advanced clustering concepts Clustering with RHadoop Exercises and Case study Discovering connections with Link Analysis Link analysis concepts Metrics for analyzing networks The Pagerank algorithm Hyperlink-Induced Topic Search Link Prediction Exercises and Case study Association Pattern Mining Frequent Pattern Mining Model Scalability issues in frequent pattern mining Brute Force algorithms Apriori algorithm The FP growth approach Evaluation of Candidate Rules Applications of Association Rules Validation 
and Testing Diagnostics Association rules with R and Hadoop Exercises and Case study Constructing recommendation engines Understanding recommender systems Data mining techniques used in recommender systems Recommender systems with recommenderlab package Evaluating the recommender systems Recommendations with RHadoop Exercise: Building recommendation engine Text analysis Text analysis steps Collecting raw text Bag of words Term Frequency-Inverse Document Frequency (TF-IDF) Determining Sentiments Exercises and Case study |

sixsigmabb | Six Sigma Black Belt | 84 hours | Six Sigma is a data driven approach that tackles variation to improve the performance of products, services and processes, combining practical problem solving and the best scientific approaches found in experimentation and optimisation of systems. The approach has been widely and successfully applied in industry, notably by Motorola, AlliedSignal & General Electric. Black Belt is a qualification for improvement managers in a Six Sigma organisation. You will learn the tools and techniques to take an improvement project through the Define, Measure, Analyse, Improve and Control phases (DMAIC). These techniques include Process Mapping, Measurement System Evaluation, Regression Analysis, Design of Experiments, Statistical Tolerancing, Monte Carlo Simulation and Lean Thinking. The content of the course takes the participants through the DMAIC phases as well as introducing subjects such as Lean Thinking, Design for Six Sigma and discussing important leadership issues and experiences in deploying a Six Sigma programme. Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach enabling participants to take part and lead waste and defect reduction projects and initiatives. Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well scoped process improvement projects related to their regular job function. Week 3 Expert: provides regression, design of experiment and data analysis techniques to enable participants to tackle complex problem solving projects that require understanding of the relationships between multiple variables. The trainer has 16 years experience with Six Sigma and as well as leading the deployment of Six Sigma at a number of businesses he has trained and coached over 300 Black Belts. 
Here are a few comments from previous participants: “Probably the most valuable course I will ever pass” “The content was very well delivered. The examples very relevant. Thank you” “The course was excellent and I am able to use part of it to coach my lean teams here” (Company supervisor who attended with KTP associate) Block 1 Day 1 Introduction to Six Sigma Project Chartering & VOC Process Mapping Stakeholder analysis Day 2 Team Start Up Prioritisation Matrix Lean Thinking Value Stream Mapping Day 3 Data Collection Minitab and Graphical Analysis Descriptive Statistics Day 4 Measurement System Evaluation Process Capability Cp, CpK Six Sigma Metrics Day 5 5 Why FMEA Block 2 Day 1 Review of Block 1 Multivari Inferential Statistics Intro to Hypothesis Testing Day 2 2 sample t-tests F tests Hypothesis Testing – Chi Sq Day 3 Hypothesis Testing - Anova Day 4 Correlation and Regression Multiple Regression Introduction to Design Of Experiments Day 5 Mistake Proofing Control Plans Control Charts Block 3 Day 1 Review of Block 2 2K Factorial Experiments Box Cox Transformations Hypothesis Testing – Non Parametric Day 2 2K Factorial Experiments Fractional Factorial Experiments Day 3 Noise Blocking Robustness Centre Points General Full Factorial Experiments Day 4 Response Surface Experiments Implementing Improvements Creative Solutions Day 5 Intro to Design for Six Sigma Statistical Tolerancing Monte Carlo Simulation Certification Six Sigma is a practical qualification, to demonstrate knowledge of what has been learnt on the course you will need to undertake 2 coursework projects. There is no report to produce but you will be required to present a PowerPoint presentation to the trainer and examiner showing results and method. The projects can cover work you would complete in your normal work, however you will need to show use of the DMAIC problem solving approach and application of Six Sigma and Lean tools. 
This provides a good balance between the practical approach and more rigorous analysis which together lead to robust solutions. You will be able to contact the trainer for discussions of how Six Sigma tools could benefit you in your project. Examples of projects from previous participants include: Formulating cream texture for seasonality in dairy feeds. Housing Association complaints reduction Multi-variable (cost, efficiency, size) optimisation of a fuel cell Job Scheduling improvement in a factory Ambulance waiting time reduction Reduction in resin thickness variation in glass manufacture NobleProg & Redlands provide Black Belt certification. For delegates that require independent accreditation, NobleProg & Redlands have partnered with the British Quality Foundation (BQF) to provide Lean Six Sigma Black Belt certification. Certification requires passing an exam at the end of the course and completing and presenting two improvement projects that demonstrate understanding and application of the Six Sigma approach and techniques. An additional charge of £600 plus VAT is levied for BQF independent accreditation. |

samr | Statistical Analysis in Market Research | 28 hours | Goal: Improve the working methods of researchers who study consumer behaviour regarding products and services. Audience: Researchers, market analysts, managers and employees of marketing and sales departments (primarily pharmaceutical and FMCG), students of socio-economic subjects and anyone interested in market research. Module 1 Quantitative research Preliminary processing of results Checking the accuracy of the database Control of missing data Weighting observations Statistical models Multiple regression Conjoint analysis Classification trees Automating procedures in tracking studies Analysis of data from a marketing experiment Reporting and drawing conclusions Module 2 Qualitative research Transforming qualitative data into quantitative data Statistical models for qualitative data |

datama | Data Mining and Analysis | 28 hours | Objective: Delegates will be able to analyse big data sets, extract patterns, and choose the right variables impacting the results so that a new predictive model can be built. Data preprocessing Data Cleaning Data integration and transformation Data reduction Discretization and concept hierarchy generation Statistical inference Probability distributions, Random variables, Central limit theorem Sampling Confidence intervals Statistical Inference Hypothesis testing Multivariate linear regression Specification Subset selection Estimation Validation Prediction Classification methods Logistic regression Linear discriminant analysis K-nearest neighbours Naive Bayes Comparison of Classification methods Neural Networks Fitting neural networks Training neural networks issues Decision trees Regression trees Classification trees Trees Versus Linear Models Bagging, Random Forests, Boosting Bagging Random Forests Boosting Support Vector Machines and Flexible Discriminants Maximal Margin classifier Support vector classifiers Support vector machines SVMs with two or more classes Relationship to logistic regression Principal Components Analysis Clustering K-means clustering K-medoids clustering Hierarchical clustering Density based clustering Model Assessment and Selection Bias, Variance and Model complexity In-sample prediction error The Bayesian approach Cross-validation Bootstrap methods |

tidyverse | Introduction to Data Visualization with Tidyverse and R | 7 hours | The Tidyverse is a collection of versatile R packages for cleaning, processing, modeling, and visualizing data. Some of the packages included are: ggplot2, dplyr, tidyr, readr, purrr, and tibble. In this instructor-led, live training, participants will learn how to manipulate and visualize data using the tools included in the Tidyverse. By the end of this training, participants will be able to: Perform data analysis and create appealing visualizations Draw useful conclusions from various datasets of sample data Filter, sort and summarize data to answer exploratory questions Turn processed data into informative line plots, bar plots, histograms Import and filter data from diverse data sources, including Excel, CSV, and SPSS files Audience Beginners to the R language Beginners to data analysis and data visualization Format of the course Part lecture, part discussion, exercises and heavy hands-on practice Introduction Tidyverse vs traditional R plotting Setting up your working environment Preparing the dataset Importing and filtering data Wrangling the data Visualizing the data (graphs, scatter plots) Grouping and summarizing the data Visualizing the data (line plots, bar plots, histograms, boxplots) Working with non-standard data Closing remarks |

mtstatda | Minitab for Statistical Data Analysis | 14 hours | The course is aimed at anyone interested in statistical analysis. It provides familiarity with Minitab and will increase the effectiveness and efficiency of your data analysis and improve your knowledge of statistics. Descriptive Statistics Normal Distribution Correlation Regression Trend analysis & forecasting Confidence intervals t-tests Proportion tests Variance tests ANOVA Chi-squared tests |

rneuralnet | Training Neural Network in R | 14 hours | This course is an introduction to applying neural networks to real-world problems using the R software. Introduction to Neural Networks What are Neural Networks Current status of neural network applications Neural Networks vs regression models Supervised and Unsupervised learning Overview of packages available nnet, neuralnet and others Differences between packages and their limitations Visualizing neural networks Applying Neural Networks Concept of neurons and neural networks A simplified model of the brain Capabilities of a neuron The XOR problem and the nature of the distribution of values The polymorphic nature of the sigmoid function Other activation functions Construction of neural networks Concept of neuron connections Neural network as nodes Building a network Neurons Layers Weights Input and output data Range 0 to 1 Normalization Learning Neural Networks Backward propagation Propagation steps Network training algorithms Range of application Estimation Problems with the possibility of approximation by Examples OCR and image pattern recognition Other applications Implementing a neural network modeling job predicting stock prices of listed |

webappsr | Building Web Applications in R with Shiny | 7 hours | Description: This is a course designed to teach R users how to create web apps without needing to learn cross-browser HTML, JavaScript, and CSS. Objective: Covers the basics of how Shiny apps work. Covers all commonly used input/output/rendering/paneling functions from the Shiny library. An overview of Shiny Installation of Shiny for local use Basic Shiny concepts Basic control accessories - Buttons, sliders, drop down menus Program structure ui.R, server.R Building first application Running your application Customizing interface HTML links in Shiny JavaScript and Shiny Advanced control accessories Showing and Hiding elements of UI Dynamic user interfaces Advanced reactivity Animation Downloading and uploading data Sharing Shiny web applications An overview of Shiny extensions |

BigData_ | A practical introduction to Data Analysis and Big Data | 35 hours | Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability. Audience Developers / programmers IT consultants Format of the course Part lecture, part discussion, hands-on practice and implementation, occasional quizzing to measure progress. Introduction to Data Analysis and Big Data What makes Big Data "big"? Velocity, Volume, Variety, Veracity (VVVV) Limits to traditional Data Processing Distributed Processing Statistical Analysis Types of Machine Learning Analysis Data Visualization Languages used for Data Analysis R language Why R for Data Analysis? Data manipulation, calculation and graphical display Python Why Python for Data Analysis? Manipulating, processing, cleaning, and crunching data Approaches to Data Analysis Statistical Analysis Time Series analysis Forecasting with Correlation and Regression models Inferential Statistics (estimating) Descriptive Statistics in Big Data sets (e.g. 
calculating mean) Machine Learning Supervised vs unsupervised learning Classification and clustering Estimating cost of specific methods Filtering Natural Language Processing Processing text Understanding meaning of the text Automatic text generation Sentiment analysis / Topic analysis Computer Vision Acquiring, processing, analyzing, and understanding images Reconstructing, interpreting and understanding 3D scenes Using image data to make decisions Big Data infrastructure Data Storage Relational databases (SQL) MySQL Postgres Oracle Non-relational databases (NoSQL) Cassandra MongoDB Neo4j Understanding the nuances Hierarchical databases Object-oriented databases Document-oriented databases Graph-oriented databases Other Distributed Processing Hadoop HDFS as a distributed filesystem MapReduce for distributed processing Spark All-in-one in-memory cluster computing framework for large-scale data processing Structured streaming Spark SQL Machine Learning libraries: MLlib Graph processing with GraphX Scalability Public cloud AWS, Google, Aliyun, etc. Private cloud OpenStack, Cloud Foundry, etc. Auto-scalability Choosing the right solution for the problem The future of Big Data Closing remarks |

excelstatsda | Excel for Statistical Data Analysis | 14 hours | Audience Analysts, researchers, scientists, graduates and students and anyone who is interested in learning how to facilitate statistical analysis in Microsoft Excel. Course Objectives This course will help improve your familiarity with Excel and statistics and as a result increase the effectiveness and efficiency of your work or research. This course describes how to use the Analysis ToolPak in Microsoft Excel, statistical functions and how to perform basic statistical procedures. It will explain what Excel's limitations are and how to overcome them. Aggregating Data in Excel Statistical Functions Outlines Subtotals Pivot Tables Data Relation Analysis Normal Distribution Descriptive Statistics Linear Correlation Regression Analysis Covariance Analysing Data in Time Trends/Regression line Linear, Logarithmic, Polynomial, Power, Exponential, Moving Average Smoothing Seasonal fluctuations analysis Comparing Populations Confidence Interval for the Mean Test of Hypothesis Concerning the Population Mean Difference Between Mean of Two Populations ANOVA: Analysis of Variance Goodness-of-Fit Test for Discrete Random Variables Test of Independence: Contingency Tables Test Hypothesis Concerning the Variance of Two Populations Forecasting Extrapolation |

apacheh | Administrator Training for Apache Hadoop | 35 hours | Audience: The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment Goal: Deep knowledge on Hadoop cluster administration. 1: HDFS (17%) Describe the function of HDFS Daemons Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing. Identify current features of computing systems that motivate a system like Apache Hadoop. Classify major goals of HDFS Design Given a scenario, identify appropriate use case for HDFS Federation Identify components and daemon of an HDFS HA-Quorum cluster Analyze the role of HDFS security (Kerberos) Determine the best data serialization choice for a given scenario Describe file read and write paths Identify the commands to manipulate files in the Hadoop File System Shell 2: YARN and MapReduce version 2 (MRv2) (17%) Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons Understand basic design strategy for MapReduce v2 (MRv2) Determine how YARN handles resource allocations Identify the workflow of MapReduce job running on YARN Determine which files you must change and how in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN. 3: Hadoop Cluster Planning (16%) Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster. 
Analyze the choices in selecting an OS Understand kernel tuning and disk swapping Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario 4: Hadoop Cluster Installation and Administration (25%) Given a scenario, identify how the cluster will handle disk and machine failures Analyze a logging configuration and logging configuration file format Understand the basics of Hadoop metrics and cluster health monitoring Identify the function and purpose of available tools for cluster monitoring Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Manager, Sqoop, Hive, and Pig Identify the function and purpose of available tools for managing the Apache Hadoop file system 5: Resource Management (10%) Understand the overall design goals of each of Hadoop schedulers Given a scenario, determine how the FIFO Scheduler allocates cluster resources Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN Given a scenario, determine how the Capacity Scheduler allocates cluster resources 6: Monitoring and Logging (15%) Understand the functions and features of Hadoop’s metric collection abilities Analyze the NameNode and JobTracker Web UIs Understand how to monitor cluster Daemons Identify and monitor CPU usage on master nodes Describe how to monitor swap and memory allocation on all nodes Identify how to view and manage 
Hadoop’s log files Interpret a log file |

rintrob | Introductory R for Biologists | 28 hours | I. Introduction and preliminaries 1. Overview Making R more friendly, R and available GUIs RStudio Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Good programming practice: Self-contained scripts, good readability e.g. structured scripts, documentation, markdown Installing packages; CRAN and Bioconductor 2. Reading data Txt files (read.delim) CSV files 3. Simple manipulations; numbers and vectors + arrays Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Arrays Array indexing. Subsections of an array Index matrices The array() function + simple operations on arrays e.g. multiplication, transposition Other types of objects 4. Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames Working with data frames Attaching arbitrary lists Managing the search path 5. Data manipulation Selecting, subsetting observations and variables Filtering, grouping Recoding, transformations Aggregation, combining data sets Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Character manipulation, stringr package Short intro to grep and regexpr 6. More on Reading data XLS, XLSX files readr and readxl packages SPSS, SAS, Stata, … and other data formats Exporting data to txt, csv and other formats 6. Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Intro to apply, lapply, sapply, tapply 7. 
Functions Creating functions Optional arguments and default values Variable number of arguments Scope and its consequences 8. Simple graphics in R Creating a Graph Density Plots Dot Plots Bar Plots Line Charts Pie Charts Boxplots Scatter Plots Combining Plots II. Statistical analysis in R 1. Probability distributions R as a set of statistical tables Examining the distribution of a set of data 2. Testing of Hypotheses Tests about a Population Mean Likelihood Ratio Test One- and two-sample tests Chi-Square Goodness-of-Fit Test Kolmogorov-Smirnov One-Sample Statistic Wilcoxon Signed-Rank Test Two-Sample Test Wilcoxon Rank Sum Test Mann-Whitney Test Kolmogorov-Smirnov Test 3. Multiple Testing of Hypotheses Type I Error and FDR ROC curves and AUC Multiple Testing Procedures (BH, Bonferroni etc.) 4. Linear regression models Generic functions for extracting model information Updating fitted models Generalized linear models Families The glm() function Classification Logistic Regression Linear Discriminant Analysis Unsupervised learning Principal Components Analysis Clustering Methods (k-means, hierarchical clustering, k-medoids) 5. Survival analysis (survival package) Survival objects in R Kaplan-Meier estimate, log-rank test, parametric regression Confidence bands Censored (interval censored) data analysis Cox PH models, constant covariates Cox PH models, time-dependent covariates Simulation: Model comparison (Comparing regression models) 6. Analysis of Variance One-Way ANOVA Two-Way Classification of ANOVA MANOVA III. Worked problems in bioinformatics Short introduction to limma package Microarray data analysis workflow Data download from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397 Data processing (QC, normalisation, differential expression) Volcano plot Clustering examples + heatmaps |

Piwik | Getting started with Piwik | 21 hours | Audience Web analysts Data analysts Market researchers Marketing and sales professionals System administrators Format of the course Part lecture, part discussion, heavy hands-on practice Introduction to Piwik Why use Piwik? Piwik vs Google Analytics Setting up Piwik Selecting which websites to monitor Working with the dashboard Understanding visitor activity Actions Referrals Generating reports |

rintro | Introduction to R | 21 hours | Audience: Forecasters, statisticians, managers and analysts who want to use the R software (http://www.r-project.org/). The course shows how to use the software through the available GUIs and the command line. Introduction and preliminaries Making R more friendly, R and available GUIs The R environment Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Ordered and unordered factors A specific example The function tapply() and ragged arrays Ordered factors Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function Mixed vector and array arithmetic. 
The recycling rule The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, (), with arrays Frequency tables from factors Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Reading data from files The read.table()function The scan() function Accessing builtin datasets Loading data from other R packages Editing data Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...' 
argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Statistical models in R Defining statistical models; formulae Contrasts Linear models Generic functions for extracting model information Analysis of variance and model comparison ANOVA tables Updating fitted models Generalized linear models Families The glm() function Nonlinear least squares and maximum likelihood models Least squares Maximum likelihood Some non-standard models Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Low-level plotting commands Mathematical annotation Hershey vector fonts Interacting with graphics Using graphics parameters Permanent changes: The par() function Temporary changes: Arguments to graphics functions Graphics parameters list Graphical elements Axes and tick marks Figure margins Multiple figure environment Device drivers PostScript diagrams for typeset documents Multiple graphics devices Dynamic graphics Packages Standard packages Contributed packages and CRAN Namespaces |

rdataana | R for Data Analysis and Research | 7 hours | Audience: managers, developers, scientists, students. Format of the course: online instruction and discussion OR face-to-face workshops. The list below gives an idea of the topics that will be covered in the workshop. The number of topics covered depends on the duration of the workshop (i.e. one, two or three days). In a one- or two-day workshop it may not be possible to cover all topics, so the workshop will be tailored to the specific needs of the learners. A first R session Syntax for analysing one-dimensional data arrays Syntax for analysing two-dimensional data arrays Reading and writing data files Sub-setting data, sorting, ranking and ordering data Merging arrays Set membership The main statistical functions in R The Normal Distribution (correlation, probabilities, tests for normality and confidence intervals) Ordinary Least Squares Regression T-tests, Analysis of Variance and Multivariate Analysis of Variance Chi-square tests for categorical variables Writing functions in R Writing software (scripts) in R Control structures (e.g. loops) Graphical methods (including scatterplots, bar charts, pie charts, histograms, box plots and dot charts) Graphical User Interfaces for R |

frcr | Forecasting with R | 14 hours | This course enables delegates to fully automate the forecasting process with R. Forecasting with R Data inspection Plotting in R Data transformations Adjustment for calendar days Adjustment for demographic data Simple forecasting methods Naive method Mean method Drift method Seasonal method Evaluating forecast accuracy Commonly used benchmarks Splitting into training and test sets Cross-validation Regression Linear regression Multiple regression Time series regression Exponential smoothing Simple exponential smoothing Holt's linear trend method Exponential trend method Damped trend method Holt-Winters seasonal method ARIMA Autoregressive models Moving average models Non-seasonal ARIMA models Seasonal ARIMA models |
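
For readers unfamiliar with the simple forecasting methods named in the last outline above (naive, mean, drift, seasonal), the following is a minimal sketch of what each one computes. It is written in plain Python rather than R, it is not taken from the course material, and the function names and the sample data are illustrative assumptions. The seasonal method is assumed here to mean the usual "seasonal naive" approach of repeating the last observed season.

```python
from statistics import mean


def mean_forecast(history, horizon):
    """Mean method: every future value equals the historical average."""
    return [mean(history)] * horizon


def naive_forecast(history, horizon):
    """Naive method: every future value equals the last observation."""
    return [history[-1]] * horizon


def drift_forecast(history, horizon):
    """Drift method: extend the line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * step for step in range(1, horizon + 1)]


def seasonal_naive_forecast(history, horizon, period):
    """Seasonal (naive) method: repeat the last full season of length `period`."""
    last_season = history[-period:]
    return [last_season[step % period] for step in range(horizon)]


if __name__ == "__main__":
    sales = [10, 12, 14, 16]  # made-up data, for illustration only
    print(mean_forecast(sales, 2))
    print(naive_forecast(sales, 2))
    print(drift_forecast(sales, 2))
    print(seasonal_naive_forecast(sales, 3, period=2))
```

These methods are typically used as benchmarks: a more elaborate model (regression, exponential smoothing, ARIMA) is only worth adopting if it beats them on a held-out test set.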

Course | Course Date | Course Price (Remote / Classroom) |
---|---|---|
The Practitioner's Guide to Multivariate Techniques - Leipzig | Mon, 2017-12-04 09:30 | 2090 EUR / 2590 EUR |
Six Sigma Yellow Belt - Erfurt | Mon, 2017-12-04 09:30 | 2860 EUR / 3180 EUR |

Course | Location | Course Date | Course Price (Remote / Classroom) |
---|---|---|---|
Big Data Business Intelligence for Telecom & Communication Service Providers | Hamburg | Mon, 2017-11-20 09:30 | 6174 EUR / 7124 EUR |
UML in Enterprise Architect (Workshops) | Frankfurt am Main | Wed, 2017-11-22 09:30 | 2792 EUR / 3442 EUR |
Introduction to Machine Learning | Leipzig | Fri, 2017-12-01 09:30 | 980 EUR / 1330 EUR |
UML 2.0 Certification - Advanced Exam Preparation | Köln | Thu, 2017-12-21 09:30 | 1872 EUR / 2372 EUR |
DTP (InDesign, Photoshop, Illustrator, Acrobat) | Düsseldorf | Mon, 2018-02-12 09:30 | 2624 EUR / 3574 EUR |
Market Forecasting | Hannover | Wed, 2018-02-14 09:30 | 1872 EUR / 2372 EUR |