[ { "question": "You are developing a hands-on workshop to introduce Docker for Windows to attendees. You need to ensure that workshop attendees can inst all Docker on their devices. Which two prerequisite components should attendees install on the devices? Each correct answer present s part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Microsoft Hardware-Assisted Virtualization Detect ion Tool", "B. Kitematic", "C. BIOS-enabled virtualization", "D. VirtualBox" ], "correct": "", "explanation": "C: Make sure your Windows system supports Hardware Virtualization Technology and that virtualization i s enabled. Ensure that hardware virtualization support is turn ed on in the BIOS settings. For example: E: To run Docker, your machine must have a 64-bit o perating system running Windows 7 or higher.", "references": "https://docs.docker.com/toolbox/toolbox_install_win dows/ D283ABFBEDB32CDCE3B3406B9C29DB2F https://blogs.technet.microsoft.com/canitpro/2015/0 9/08/step-by-step-enabling-hyper-v-for-use-on-windo ws-10/" }, { "question": "Your team is building a data engineering and data s cience development environment. The environment must support the following requirem ents: support Python and Scala compose data storage, movement, and processing serv ices into automated data pipelines the same tool should be used for the orchestration of both data engineering and data science support workload isolation and interactive workload s enable scaling across a cluster of machines You need to create the environment. What should you do?", "options": [ "A. Build the environment in Apache Hive for HDInsigh t and use Azure Data Factory for orchestration.", "B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.", "C. Build the environment in Apache Spark for HDInsig ht and use Azure Container Instances for orchestrat ion.", "D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration." ], "correct": "B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.", "explanation": "In Azure Databricks, we can create two different ty pes of clusters. Standard, these are the default cl usters and can be used with Python, R, Scala and SQL High-concurrency Azure Databricks is fully integrated with Azure Dat a Factory. Incorrect Answers: D: Azure Container Instances is good for developmen t or testing. Not suitable for production workloads . Reference: https://docs.microsoft.com/en-us/azure/architecture /data-guide/technology-choices/data-science-and-mac hine- learning", "references": "" }, { "question": "DRAG DROP You are building an intelligent solution using mach ine learning models. The environment must support the following requirem ents: Data scientists must build notebooks in a cloud env ironment Data scientists must use automatic feature engineer ing and model building in machine learning pipeline s. Notebooks must be deployed to retrain using Spark i nstances with dynamic worker allocation. Notebooks must be exportable to be version controll ed locally. You need to create the environment. Which four actions should you perform in sequence? To answer, move the appropriate actions from the li st of actions to the answer area and arrange them in the correct order. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." 
], "correct": "", "explanation": "Step 1: Create an Azure HDInsight cluster to includ e the Apache Spark Mlib library Step 2: Install Microsot Machine Learning for Apach e Spark You install AzureML on your Azure HDInsight cluster . D283ABFBEDB32CDCE3B3406B9C29DB2F Microsoft Machine Learning for Apache Spark (MMLSpa rk) provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines wit h Microsoft Cognitive Toolkit (CNTK) and OpenCV, enab ling you to quickly create powerful, highly-scalabl e predictive and analytical models for large image an d text datasets. Step 3: Create and execute the Zeppelin notebooks o n the cluster Step 4: When the cluster is ready, export Zeppelin notebooks to a local environment. Notebooks must be exportable to be version controlled locally.", "references": "https://docs.microsoft.com/en-us/azure/hdinsight/sp ark/apache-spark-zeppelin-notebook https://azuremlbuild.blob.core.windows.net/pysparka pi/intro.html" }, { "question": "You plan to build a team data science environment. Data for training models in machine learning pipeli nes will be over 20 GB in size. You have the following requirements: Models must be built using Caffe2 or Chainer framew orks. Data scientists must be able to use a data science environment to build the machine learning pipelines and train models on their personal devices in both conn ected and disconnected network environments. Personal devices must support updating machine lear ning pipelines when connected to a network. You need to select a data science environment. Which environment should you use?", "options": [ "A. Azure Machine Learning Service", "B. Azure Machine Learning Studio", "C. Azure Databricks", "D. Azure Kubernetes Service (AKS)" ], "correct": "A. Azure Machine Learning Service", "explanation": "The Data Science Virtual Machine (DSVM) is a custom ized VM image on Microsoft's Azure cloud built specifically for doing data science. Caffe2 and Cha iner are supported by DSVM. DSVM integrates with Azure Machine Learning. Incorrect Answers: B: Use Machine Learning Studio when you want to exp eriment with machine learning models quickly and ea sily, and the built-in machine learning algorithms are su fficient for your solutions.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/overview" }, { "question": "You are implementing a machine learning model to pr edict stock prices. The model uses a PostgreSQL database and requires G PU processing. You need to create a virtual machine that is pre-co nfigured with the required tools. D283ABFBEDB32CDCE3B3406B9C29DB2F What should you do?", "options": [ "A. Create a Data Science Virtual Machine (DSVM) Wind ows edition.", "B. Create a Geo Al Data Science Virtual Machine (Geo -DSVM) Windows edition.", "C. Create a Deep Learning Virtual Machine (DLVM) Lin ux edition.", "D. Create a Deep Learning Virtual Machine (DLVM) Win dows edition." ], "correct": "A. Create a Data Science Virtual Machine (DSVM) Wind ows edition.", "explanation": "In the DSVM, your training models can use deep lear ning algorithms on hardware that's based on graphic s processing units (GPUs). PostgreSQL is available for the following operating systems: Linux (all recent distributions), 64-bit installers available for macOS (OS X) version 10.6 and newer Windows (with installers available for 64-bit versi on; tested on latest versions and back to Windows 2012 R2. 
Incorrect Answers: B: The Azure Geo AI Data Science VM (Geo-DSVM) deli vers geospatial analytics capabilities from Microso ft's Data Science VM. Specifically, this VM extends the AI and data science toolkits in the Data Science VM by adding ESRI's market-leading ArcGIS Pro Geographic Information System. C, D: DLVM is a template on top of DSVM image. In t erms of the packages, GPU drivers etc are all there in the DSVM image. Mostly it is for convenience during cre ation where we only allow DLVM to be created on GPU VM instances on Azure.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/overview" }, { "question": "You are developing deep learning models to analyze semi-structured, unstructured, and structured data types. You have the following data available for model bui lding: Video recordings of sporting events Transcripts of radio commentary about events Logs from related social media feeds captured durin g sporting events You need to select an environment for creating the model. Which environment should you use?", "options": [ "A. Azure Cognitive Services", "B. Azure Data Lake Analytics C. Azure HDInsight with Spark MLib", "D. Azure Machine Learning Studio" ], "correct": "A. Azure Cognitive Services", "explanation": "Azure Cognitive Services expand on Microsoft's evol ving portfolio of machine learning APIs and enable D283ABFBEDB32CDCE3B3406B9C29DB2F developers to easily add cognitive features such a s emotion and video detection; facial, speech, and vision recognition; and speech and language understanding into their applications. The goal of Azure Cogniti ve Services is to help developers create applications that can see, hear, speak, understand, and even beg in to reason. The catalog of services within Azure Cognit ive Services can be categorized into five main pill ars - Vision, Speech, Language, Search, and Knowledge.", "references": "https://docs.microsoft.com/en-us/azure/cognitive-se rvices/welcome" }, { "question": "You must store data in Azure Blob Storage to suppor t Azure Machine Learning. You need to transfer the data into Azure Blob Stora ge. What are three possible ways to achieve the goal? E ach correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Bulk Insert SQL Query", "B. AzCopy", "C. Python script", "D. Azure Storage Explorer" ], "correct": "", "explanation": "You can move data to and from Azure Blob storage us ing different technologies: Azure Storage-Explorer AzCopy Python SSIS", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/team-data-science-process/move-azure-blob" }, { "question": "You are moving a large dataset from Azure Machine L earning Studio to a Weka environment. You need to format the data for the Weka environmen t. Which module should you use?", "options": [ "A. Convert to CSV", "B. Convert to Dataset", "C. Convert to ARFF", "D. Convert to SVMLight" ], "correct": "C. Convert to ARFF", "explanation": "D283ABFBEDB32CDCE3B3406B9C29DB2F Use the Convert to ARFF module in Azure Machine Lea rning Studio, to convert datasets and results in Az ure Machine Learning to the attribute-relation file for mat used by the Weka toolset. This format is known as ARFF. The ARFF data specification for Weka supports multi ple machine learning tasks, including data preproce ssing, classification, and feature selection. 
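The "Python script" option in the data-transfer question above can be illustrated with the azure-storage-blob package. A minimal sketch, not the only possible approach; the container name training-data, the file train.csv, and the environment variable are all illustrative assumptions:

```python
import os
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Connect using a connection string taken from the storage account's access keys.
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("training-data")  # hypothetical container name

# Upload a local CSV so Azure Machine Learning can later read it from Blob Storage.
with open("train.csv", "rb") as data:
    container.upload_blob(name="train.csv", data=data, overwrite=True)
```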
{ "question": "You are moving a large dataset from Azure Machine Learning Studio to a Weka environment. You need to format the data for the Weka environment. Which module should you use?", "options": [ "A. Convert to CSV", "B. Convert to Dataset", "C. Convert to ARFF", "D. Convert to SVMLight" ], "correct": "C. Convert to ARFF", "explanation": "Use the Convert to ARFF module in Azure Machine Learning Studio to convert datasets and results in Azure Machine Learning to the attribute-relation file format used by the Weka toolset. This format is known as ARFF. The ARFF data specification for Weka supports multiple machine learning tasks, including data preprocessing, classification, and feature selection. In this format, data is organized by entities and their attributes, and is contained in a single text file.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-arff" },
{ "question": "You plan to create a speech recognition deep learning model. The model must support the latest version of Python. You need to recommend a deep learning framework for speech recognition to include in the Data Science Virtual Machine (DSVM). What should you recommend?", "options": [ "A. Rattle", "B. TensorFlow", "C. Weka", "D. Scikit-learn" ], "correct": "B. TensorFlow", "explanation": "TensorFlow is an open-source library for numerical computation and large-scale machine learning. It uses Python to provide a convenient front-end API for building applications with the framework. TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and PDE (partial differential equation) based simulations. Incorrect Answers: A: Rattle is the R analytical tool that gets you started with data analytics and machine learning. C: Weka is used for visual data mining and machine learning software in Java. D: Scikit-learn is one of the most useful libraries for machine learning in Python. Built on NumPy, SciPy, and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.", "references": "https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html" },
{ "question": "You plan to use a Deep Learning Virtual Machine (DLVM) to train deep learning models using Compute Unified Device Architecture (CUDA) computations. You need to configure the DLVM to support CUDA. What should you implement?", "options": [ "A. Solid State Drives (SSD)", "B. Computer Processing Unit (CPU) speed increase by using overclocking", "C. Graphic Processing Unit (GPU)", "D. High Random Access Memory (RAM) configuration" ], "correct": "C. Graphic Processing Unit (GPU)", "explanation": "A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU instances.", "references": "https://azuremarketplace.microsoft.com/en-au/marketplace/apps/microsoft-ads.dsvm-deep-learning" },
{ "question": "You plan to use a Data Science Virtual Machine (DSVM) with the open source deep learning frameworks Caffe2 and PyTorch. You need to select a pre-configured DSVM to support the frameworks. What should you create?", "options": [ "A. Data Science Virtual Machine for Windows 2012", "B. Data Science Virtual Machine for Linux (CentOS)", "C. Geo AI Data Science Virtual Machine with ArcGIS", "D. Data Science Virtual Machine for Windows 2016" ], "correct": "", "explanation": "Caffe2 and PyTorch are supported by the Data Science Virtual Machine for Linux. Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4. Only the DSVM on Ubuntu is preconfigured for Caffe2 and PyTorch. Incorrect Answers: D: Caffe2 and PyTorch are only supported in the Data Science Virtual Machine for Linux.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview" },
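Since the CUDA questions above hinge on the VM exposing a GPU, a quick sanity check on a DSVM/DLVM instance is shown below. This is a minimal sketch that assumes TensorFlow (and optionally PyTorch) is already installed on the image:

```python
# Check whether CUDA-capable GPUs are visible to the deep learning frameworks.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")

try:
    import torch
    print("PyTorch CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed on this image.")
```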
{ "question": "HOTSPOT You are performing sentiment analysis using a CSV file that includes 12,000 customer reviews written in a short sentence format. You add the CSV file to Azure Machine Learning Studio and configure it as the starting point dataset of an experiment. You add the Extract N-Gram Features from Text module to the experiment to extract key phrases from the customer review column in the dataset. You must create a new n-gram dictionary from the customer review text and set the maximum n-gram size to trigrams. What should you select? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Vocabulary mode: Create. For Vocabulary mode, select Create to indicate that you are creating a new list of n-gram features. N-Grams size: 3. For N-Grams size, type a number that indicates the maximum size of the n-grams to extract and store. For example, if you type 3, unigrams, bigrams, and trigrams will be created. Weighting function: leave blank. The Weighting function option is required only if you merge or update vocabularies; it specifies how terms in the two vocabularies and their scores should be weighted against each other.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/extract-n-gram-features-from-text" },
{ "question": "You are developing a data science workspace that uses an Azure Machine Learning service. You need to select a compute target to deploy the workspace. What should you use?", "options": [ "A. Azure Data Lake Analytics", "B. Azure Databricks", "C. Azure Container Service", "D. Apache Spark for HDInsight" ], "correct": "C. Azure Container Service", "explanation": "Azure Container Instances can be used as a compute target for testing or development. Use it for low-scale CPU-based workloads that require less than 48 GB of RAM.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where" },
{ "question": "You are solving a classification task. The dataset is imbalanced. You need to select an Azure Machine Learning Studio module to improve the classification accuracy. Which module should you use?", "options": [ "A. Permutation Feature Importance", "B. Filter Based Feature Selection", "C. Fisher Linear Discriminant Analysis", "D. Synthetic Minority Oversampling Technique (SMOTE)" ], "correct": "D. Synthetic Minority Oversampling Technique (SMOTE)", "explanation": "Use the SMOTE module in Azure Machine Learning Studio (classic) to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases. You connect the SMOTE module to a dataset that is imbalanced. There are many reasons why a dataset might be imbalanced: the category you are targeting might be very rare in the population, or the data might simply be difficult to collect. Typically, you use SMOTE when the class you want to analyze is under-represented.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote" },
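The SMOTE module described above has a code-level analogue in the imbalanced-learn package. A minimal sketch on synthetic data; the class proportions and random seeds are made up for illustration:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# Build an imbalanced two-class dataset (roughly 9:1).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("Before:", Counter(y))

# Oversample the minority class by synthesizing new examples between nearest neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("After:", Counter(y_res))
```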
{ "question": "DRAG DROP You configure a Deep Learning Virtual Machine for Windows. You need to recommend tools and frameworks to perform the following: build deep neural network (DNN) models; perform interactive data exploration and visualization. Which tools and frameworks should you recommend? To answer, drag the appropriate tools to the correct tasks. Each tool may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Vowpal Wabbit. Use the Train Vowpal Wabbit Version 8 module in Azure Machine Learning Studio (classic) to create a machine learning model by using Vowpal Wabbit. Box 2: PowerBI Desktop. Power BI Desktop is a powerful visual data exploration and interactive reporting tool. BI is a name given to a modern approach to business decision making in which users are empowered to find, explore, and share insights from data across the enterprise.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/train-vowpal-wabbit-version-8-model https://docs.microsoft.com/en-us/azure/architecture/data-guide/scenarios/interactive-data-exploration" },
{ "question": "You use Azure Machine Learning Studio to build a machine learning experiment. You need to divide data into two distinct datasets. Which module should you use?", "options": [ "A. Assign Data to Clusters", "B. Load Trained Model", "C. Partition and Sample", "D. Tune Model Hyperparameters" ], "correct": "C. Partition and Sample", "explanation": "Partition and Sample with the Stratified split option outputs multiple datasets, partitioned using the rules you specified.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample" },
{ "question": "DRAG DROP You are creating an experiment by using Azure Machine Learning Studio. You must divide the data into four subsets for evaluation. There is a high degree of missing values in the data. You must prepare the data for analysis. You need to select appropriate methods for producing the experiment. Which three modules should you run in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order. NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Use the Clean Missing Data module in Azure Machine Learning Studio to remove, replace, or infer missing values. Incorrect Answers: Latent Dirichlet Allocation: the Latent Dirichlet Allocation module in Azure Machine Learning Studio is used to group otherwise unclassified text into a number of categories. Latent Dirichlet Allocation (LDA) is often used in natural language processing (NLP) to find texts that are similar; another common term is topic modeling. Build Counting Transform: the Build Counting Transform module in Azure Machine Learning Studio is used to analyze training data. From this data, the module builds a count table as well as a set of count-based features that can be used in a predictive model. Missing Value Scrubber: the Missing Values Scrubber module is deprecated. Feature hashing: feature hashing is used for linguistics, and works by converting unique tokens into integers. Replace discrete values: the Replace Discrete Values module in Azure Machine Learning Studio is used to generate a probability score that can be used to represent a discrete value. This score can be useful for understanding the information value of the discrete values.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data" },
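Outside Studio, the Clean Missing Data operations referenced above map onto simple pandas calls. A minimal sketch on a small, hypothetical DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 47, 31], "income": [50_000, 62_000, np.nan, np.nan]})

dropped = df.dropna()                            # "Remove entire row": drop any row with a missing value
filled = df.fillna(df.mean(numeric_only=True))   # replace missing values with the column mean
print(dropped, filled, sep="\n\n")
```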
{ "question": "HOTSPOT You are retrieving data from a large datastore by using Azure Machine Learning Studio. You must create a subset of the data for testing purposes using a random sampling seed based on the system clock. You add the Partition and Sample module to your experiment. You need to select the properties for the module. Which values should you select? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Sampling. The Create a sample of data option supports simple random sampling or stratified random sampling; this is useful if you want to create a smaller representative sample dataset for testing. 1. Add the Partition and Sample module to your experiment in Studio, and connect the dataset. 2. Partition or sample mode: set this to Sampling. 3. Rate of sampling: see Box 2 below. Box 2: 0. Random seed for sampling: optionally, type an integer to use as a seed value. This option is important if you want the rows to be divided the same way every time. The default value is 0, meaning that a starting seed is generated based on the system clock; this can lead to slightly different results each time you run the experiment.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample" },
{ "question": "You are creating a machine learning model. You have a dataset that contains null rows. You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and resolve the null and missing data in the dataset. Which parameter should you use?", "options": [ "A. Replace with mean", "B. Remove entire column", "C. Remove entire row", "D. Hot Deck" ], "correct": "C. Remove entire row", "explanation": "Remove entire row: completely removes any row in the dataset that has one or more missing values. This is useful if the missing value can be considered randomly missing.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data" },
{ "question": "HOTSPOT The finance team asks you to train a model using data in an Azure Storage blob container named finance-data. You need to register the container as a datastore in an Azure Machine Learning workspace and ensure that an error will be raised if the container does not exist. How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: register_azure_blob_container. Register an Azure Blob container to the datastore. Box 2: create_if_not_exists = False. Create the blob container if it does not exist; defaults to False, so an error is raised when the container is missing.", "references": "https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore" },
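Putting the two boxes from the datastore question together, here is a minimal sketch using the Azure ML SDK v1; the datastore name and the storage-account credentials are placeholders, not values from the question:

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()  # reads config.json for the target workspace

# Register the existing blob container; create_if_not_exists=False makes the call
# fail if the container is missing instead of silently creating it.
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="finance_datastore",      # hypothetical datastore name
    container_name="finance-data",
    account_name="<storage-account-name>",   # placeholder
    account_key="<storage-account-key>",     # placeholder
    create_if_not_exists=False,
)
```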
{ "question": "You plan to provision an Azure Machine Learning Basic edition workspace for a data science project. You need to identify the tasks you will be able to perform in the workspace. Which three tasks will you be able to perform? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Create a Compute Instance and use it to run code in Jupyter notebooks.", "B. Create an Azure Kubernetes Service (AKS) inference cluster.", "C. Use the designer to train a model by dragging and dropping pre-defined modules.", "D. Create a tabular dataset that supports versioning." ], "correct": "", "explanation": "Incorrect Answers: C, E: The UI is included in the Enterprise edition only.", "references": "https://azure.microsoft.com/en-us/pricing/details/machine-learning/" },
{ "question": "HOTSPOT A coworker registers a datastore in a Machine Learning services workspace by using the following code: You need to write code to access the datastore from a notebook. How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Datastore. To get a specific datastore registered in the current workspace, use the get() static method on the Datastore class: # Get a named datastore from the current workspace datastore = Datastore.get(ws, datastore_name='your datastore name') Box 2: ws Box 3: demo_datastore", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data" },
{ "question": "A set of CSV files contains sales records. All the CSV files have the same data schema. Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure: At the end of each month, a new folder with that month's sales file is added to the sales folder. You plan to use the sales data to train a machine learning model based on the following requirements: You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe. You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month. You must register the minimum number of datasets possible. You need to register the sales data as a dataset in the Azure Machine Learning service workspace. What should you do?", "options": [ "A. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/", "B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the", "C. Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/", "D. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/" ], "correct": "B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the", "explanation": "Specify the path. Example: the following code gets the existing workspace and the desired datastore by name, and then passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds. from azureml.core import Workspace, Datastore, Dataset datastore_name = 'your datastore name' # get existing workspace workspace = Workspace.from_config() # retrieve an existing datastore in the workspace by name datastore = Datastore.get(workspace, datastore_name) # create a TabularDataset from 3 file paths in datastore datastore_paths = [(datastore, 'weather/2018/11.csv'), (datastore, 'weather/2018/12.csv'), (datastore, 'weather/2019/*.csv')] weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)", "references": "" },
{ "question": "DRAG DROP An organization uses Azure Machine Learning service and wants to expand their use of machine learning. You have the following compute environments. The organization does not want to create another compute environment. You need to determine which compute environment to use for the following scenarios. Which compute types should you use? To answer, drag the appropriate compute environments to the correct scenarios. Each compute environment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: nb_server Box 2: mlc_cluster With Azure Machine Learning, you can train your model on a variety of resources or environments, collectively referred to as compute targets. A compute target can be a local machine or a cloud resource, such as an Azure Machine Learning Compute cluster, Azure HDInsight, or a remote virtual machine.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets" },
{ "question": "HOTSPOT You create an Azure Machine Learning compute target named ComputeOne by using the STANDARD_D1 virtual machine image. ComputeOne is currently idle and has zero active nodes. You define a Python variable named ws that references the Azure Machine Learning workspace. You run the following Python code: For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Yes. The ComputeTargetException class is an exception related to failures when creating, interacting with, or configuring a compute target. This exception is commonly raised for failures attaching a compute target, missing headers, and unsupported configuration values. Create(workspace, name, provisioning_configuration) provisions a Compute object by specifying a compute type and related configuration; this method creates a new compute target rather than attaching an existing one. Box 2: Yes. Box 3: No. The line before print('Step1') will fail.", "references": "https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget" },
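A minimal sketch of the pattern the explanation above describes: try to attach an existing compute target and provision a new AmlCompute cluster only when a ComputeTargetException is raised. The cluster name and VM size here are illustrative, not from the question:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()
cluster_name = "cpu-cluster"  # illustrative name

try:
    # Reuse the target if it is already attached to the workspace.
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target.")
except ComputeTargetException:
    # Otherwise provision a new managed cluster.
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2", min_nodes=0, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, config)
    compute_target.wait_for_completion(show_output=True)
```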
{ "question": "HOTSPOT You are developing a deep learning model by using TensorFlow. You plan to run the model training workload on an Azure Machine Learning Compute Instance. You must use CUDA-based model training. You need to provision the Compute Instance. Which two virtual machine sizes can you use? To answer, select the appropriate virtual machine sizes in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "CUDA is a parallel computing platform and programming model developed by Nvidia for general computing on its own GPUs (graphics processing units). CUDA enables developers to speed up compute-intensive applications by harnessing the power of GPUs for the parallelizable part of the computation.", "references": "https://www.infoworld.com/article/3299703/what-is-cuda-parallel-programming-for-gpus.html" },
{ "question": "DRAG DROP You are analyzing a raw dataset that requires cleaning. You must perform transformations and manipulations by using Azure Machine Learning Studio. You need to identify the correct modules to perform the transformations. Which modules should you choose? To answer, drag the appropriate modules to the correct scenarios. Each module may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Clean Missing Data. Box 2: SMOTE. Use the SMOTE module in Azure Machine Learning Studio to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases. Box 3: Convert to Indicator Values. Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model. Box 4: Remove Duplicate Rows.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-indicator-values" },
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are using Azure Machine Learning Studio to perform feature engineering on a dataset. You need to normalize values to produce a feature column grouped into bins. Solution: Apply an Entropy Minimum Description Length (MDL) binning mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Entropy MDL binning mode: this method requires that you select the column you want to predict and the column or columns that you want to group into bins. It then makes a pass over the data and attempts to determine the number of bins that minimizes the entropy. In other words, it chooses a number of bins that allows the data column to best predict the target column. It then returns the bin number associated with each row of your data in a quantized column.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins" },
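For readers who want to see the "group values into bins" idea outside Studio, here is a minimal pandas sketch of quantile and equal-width binning; the column and bin count are arbitrary, and Entropy MDL itself is a Studio-specific supervised binning mode that is not reproduced here:

```python
import numpy as np
import pandas as pd

values = pd.Series(np.random.default_rng(0).normal(size=1000))

# Quantile binning: four bins with roughly equal numbers of rows each.
quantile_bins = pd.qcut(values, q=4, labels=False)

# Equal-width binning: four bins of equal numeric width.
equal_width_bins = pd.cut(values, bins=4, labels=False)
print(quantile_bins.value_counts(), equal_width_bins.value_counts(), sep="\n\n")
```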
{ "question": "HOTSPOT You are preparing to use the Azure ML SDK to run an experiment and need to create compute. You run the following code: For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: No. If a compute cluster already exists, it will be used. Box 2: Yes. The wait_for_completion method waits for the current provisioning operation to finish on the cluster. Box 3: Yes. Low-priority VMs use Azure's excess capacity and are thus cheaper, but risk your run being pre-empted. Box 4: No. You need to use training_compute.delete() to deprovision and delete the AmlCompute target.", "references": "https://notebooks.azure.com/azureml/projects/azureml-getting-started/html/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.computetarget" },
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are a data scientist using Azure Machine Learning Studio. You need to normalize values to produce an output column into bins to predict a target column. Solution: Apply a Quantiles normalization with a QuantileIndex normalization. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Use the Entropy MDL binning mode, which has a target column.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins" },
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling strategy to compensate for the class imbalance. Solution: You use the Scale and Reduce sampling mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode. Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases. Incorrect Answers: common data tasks for the Scale and Reduce sampling mode include clipping, binning, and normalizing numerical values.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/data-transformation-scale-and-reduce" },
{ "question": "You are analyzing a dataset by using Azure Machine Learning Studio. You need to generate a statistical summary that contains the p-value and the unique count for each feature column. Which two modules can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Compute Linear Correlation", "B. Export Count Table", "C. Execute Python Script", "D. Convert to Indicator Values" ], "correct": "", "explanation": "The Export Count Table module is provided for backward compatibility with experiments that use the Build Count Table (deprecated) and Count Featurizer (deprecated) modules. E: Summarize Data statistics are useful when you want to understand the characteristics of the complete dataset. For example, you might need to know: How many missing values are there in each column? How many unique values are there in a feature column? What is the mean and standard deviation for each column? The module calculates the important scores for each column, and returns a row of summary statistics for each variable (data column) provided as input. Incorrect Answers: A: The Compute Linear Correlation module in Azure Machine Learning Studio is used to compute a set of Pearson correlation coefficients for each possible pair of variables in the input dataset. C: With Python, you can perform tasks that aren't currently supported by existing Studio modules, such as visualizing data using matplotlib, using Python libraries to enumerate datasets and models in your workspace, and reading, loading, and manipulating data from sources not supported by the Import Data module. D: The purpose of the Convert to Indicator Values module is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/export-count-table https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/summarize-data" },
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are analyzing a numerical dataset which contains missing values in several columns. You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set. You need to analyze a full dataset to include all values. Solution: Use the Last Observation Carried Forward (LOCF) method to impute the missing data points. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use the Multiple Imputation by Chained Equations (MICE) method. Replace using MICE: for each missing value, this option assigns a new value, which is calculated by using a method described in the statistical literature as \"Multivariate Imputation using Chained Equations\" or \"Multiple Imputation by Chained Equations\". With a multiple imputation method, each variable with missing data is modeled conditionally using the other variables in the data before filling in the missing values. Note: Last observation carried forward (LOCF) is a method of imputing missing data in longitudinal studies. If a person drops out of a study before it ends, then his or her last observed score on the dependent variable is used for all subsequent (i.e., missing) observation points. LOCF is used to maintain the sample size and to reduce the bias caused by the attrition of participants in a study.", "references": "https://methods.sagepub.com/reference/encyc-of-research-design/n211.xml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/" },
{ "question": "HOTSPOT You are creating a machine learning model in Python. The provided dataset contains several numerical columns and one text column. The text column represents a product's category. The product category will always be one of the following: Bikes, Cars, Vans, Boats. You are building a regression model using the scikit-learn Python package. You need to transform the text data to be compatible with the scikit-learn Python package. How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: pandas as df. Pandas takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns called a data frame that looks very similar to a table in statistical software (think Excel or SPSS, for example). Box 2: transpose[ProductCategoryMapping]. Reshape the data from the pandas Series to columns.", "references": "https://datascienceplus.com/linear-regression-in-python/" },
{ "question": "You plan to deliver a hands-on workshop to several students. The workshop will focus on creating data visualizations using Python. Each student will use a device that has internet access. Student devices are not configured for Python development. Students do not have administrator access to install software on their devices. Azure subscriptions are not available for students. You need to ensure that students can run Python-based data visualization code. Which Azure tool should you use?", "options": [ "A. Anaconda Data Science Platform", "B. Azure BatchAI", "C. Azure Notebooks", "D. Azure Machine Learning Service" ], "correct": "C. Azure Notebooks", "explanation": "Azure Notebooks is a free, browser-based Jupyter notebook service, so students can run Python visualization code without installing software or having an Azure subscription.", "references": "https://notebooks.azure.com/" },
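Several of the questions above settle on MICE-style imputation. Outside Studio, scikit-learn's IterativeImputer implements a comparable multivariate, chained-equations approach (it is still flagged as experimental); a minimal sketch on a tiny array:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables the estimator)
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [np.nan, 6.0], [8.0, 9.0]])

# Each column with missing values is modeled from the other columns, iteratively,
# so the dimensionality of the feature set is preserved.
imputer = IterativeImputer(max_iter=10, random_state=0)
print(imputer.fit_transform(X))
```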
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are analyzing a numerical dataset which contains missing values in several columns. You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set. You need to analyze a full dataset to include all values. Solution: Replace each missing value using the Multiple Imputation by Chained Equations (MICE) method. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Replace using MICE: for each missing value, this option assigns a new value, which is calculated by using a method described in the statistical literature as \"Multivariate Imputation using Chained Equations\" or \"Multiple Imputation by Chained Equations\". With a multiple imputation method, each variable with missing data is modeled conditionally using the other variables in the data before filling in the missing values. Note: Multivariate imputation by chained equations (MICE), sometimes called \"fully conditional specification\" or \"sequential regression multiple imputation\", has emerged in the statistical literature as one principled method of addressing missing data. Creating multiple imputations, as opposed to single imputations, accounts for the statistical uncertainty in the imputations. In addition, the chained equations approach is very flexible and can handle variables of varying types (e.g., continuous or binary) as well as complexities such as bounds or survey skip patterns.", "references": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/ https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data" },
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are analyzing a numerical dataset which contains missing values in several columns. You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set. You need to analyze a full dataset to include all values. Solution: Remove the entire column that contains the missing data point. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Use the Multiple Imputation by Chained Equations (MICE) method.", "references": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/ https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data" },
{ "question": "You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module. You need to select a data cleaning method. Which method should you use?", "options": [ "A. Replace using Probabilistic PCA", "B. Normalization", "C. Synthetic Minority Oversampling Technique (SMOTE)", "D. Replace using MICE" ], "correct": "A. Replace using Probabilistic PCA", "explanation": "Replace using Probabilistic PCA: compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for each column. Instead, it approximates the covariance for the full dataset. Therefore, it might offer better performance for datasets that have missing values in many columns.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data" },
{ "question": "You use Azure Machine Learning Studio to build a machine learning experiment. You need to divide data into two distinct datasets. Which module should you use?", "options": [ "A. Split Data", "B. Load Trained Model", "C. Assign Data to Clusters", "D. Group Data into Bins" ], "correct": "D. Group Data into Bins", "explanation": "The Group Data into Bins module supports multiple options for binning data. You can customize how the bin edges are set and how values are apportioned into the bins.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins" },
{ "question": "You are creating an image classification deep learning model that uses a set of labeled bird photographs collected by experts. You have 100,000 photographs of birds. All photographs use the JPG format and are stored in an Azure blob container in an Azure subscription. You need to access the bird photograph files in the Azure blob container from the Azure Machine Learning service workspace that will be used for deep learning model training. You must minimize data movement. What should you do?", "options": [ "A. Create an Azure Data Lake store and move the bird photographs to the store.", "B. Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to", "C. Create and register a dataset by using TabularDataset class that references the Azure blob storage", "D. Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning" ], "correct": "D. Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning", "explanation": "We recommend creating a datastore for an Azure Blob container. When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data" },
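For the bird-photograph scenario above, once the blob container is registered as a datastore the image files are typically referenced in place (not copied) through a FileDataset. A minimal SDK v1 sketch; the datastore name, folder pattern, and dataset name are placeholders:

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, "bird_photos_datastore")  # hypothetical datastore name

# Reference the JPG files where they are; no data is moved out of the blob container.
bird_images = Dataset.File.from_files(path=(datastore, "birds/**/*.jpg"))
bird_images = bird_images.register(workspace=ws, name="bird-photographs", create_new_version=True)
```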
No", "explanation": "Use the Multiple Imputation by Chained Equations (M ICE) method.", "references": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC307424 1/ https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/clean-missing-data" }, { "question": "You create an Azure Machine Learning workspace. You must create a custom role named DataScientist t hat meets the following requirements: Role members must not be able to delete the workspa ce. Role members must not be able to create, update, or delete compute resources in the workspace. Role members must not be able to add new users to t he workspace. You need to create a JSON file for the DataScientis t role in the Azure Machine Learning workspace. The custom role must enforce the restrictions speci fied by the IT Operations team. Which JSON code segment should you use? D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "The following custom role can do everything in the workspace except for the following actions: It can't create or update a compute resource. It can't delete a compute resource. It can't add, delete, or alter role assignments. It can't delete the workspace. To create a custom role, first construct a role def inition JSON file that specifies the permission and scope for the role. The following example defines a custom ro le named \"Data Scientist Custom\" scoped at a specif ic workspace level: data_scientist_custom_role.json : { \"Name\": \"Data Scientist Custom\", \"IsCustom\": true, \"Description\": \"Can run experiment but can't create or delete compute.\", \"Actions\": [\"*\"], \"NotActions\": [ \"Microsoft.MachineLearningServices/workspaces/*/del ete\", \"Microsoft.MachineLearningServices/workspaces/write \", \"Microsoft.MachineLearningServices/workspaces/compu tes/*/write\", \"Microsoft.MachineLearningServices/ workspaces/computes/*/delete\", \"Microsoft.Authoriza tion/*/write\" ], \"AssignableScopes\": [ \"/subscriptions//resourceGroups//providers/ Microsoft.MachineLearningServices/workspaces/\" ] }", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-assign-roles" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are a data scientist using Azure Machine Learni ng Studio. You need to normalize values to produce an output c olumn into bins to predict a target column. Solution: Apply an Equal Width with Custom Start an d Stop binning mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Use the Entropy MDL binning mode which has a target column.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/group-data-into-bins" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. 
{ "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are a data scientist using Azure Machine Learning Studio. You need to normalize values to produce an output column into bins to predict a target column. Solution: Apply a Quantiles binning mode with a PQuantile normalization. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Use the Entropy MDL binning mode, which has a target column.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins" },
{ "question": "HOTSPOT You are evaluating a Python NumPy array that contains six data points defined as follows: data = [10, 20, 30, 40, 50, 60] You must generate the following output by using the k-fold algorithm implementation in the Python scikit-learn machine learning library: train: [10 40 50 60], test: [20 30] train: [20 30 40 60], test: [10 50] train: [10 20 30 50], test: [40 60] You need to implement a cross-validation to generate the output. How should you complete the code segment? To answer, select the appropriate code segment in the dialog box in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: k-fold. Box 2: 3. The K-Folds cross-validator provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default). The parameter n_splits (int, default=3) is the number of folds and must be at least 2. Box 3: data. Example: >>> from sklearn.model_selection import KFold >>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) >>> y = np.array([1, 2, 3, 4]) >>> kf = KFold(n_splits=2) >>> kf.get_n_splits(X) >>> print(kf) KFold(n_splits=2, random_state=None, shuffle=False) >>> for train_index, test_index in kf.split(X): ... print(\"TRAIN:\", train_index, \"TEST:\", test_index) ... X_train, X_test = X[train_index], X[test_index] ... y_train, y_test = y[train_index], y[test_index] TRAIN: [2 3] TEST: [0 1] TRAIN: [0 1] TEST: [2 3]", "references": "https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html" },
{ "question": "You are working with a time series dataset in Azure Machine Learning Studio. You need to split your dataset into training and testing subsets by using the Split Data module. Which splitting mode should you use?", "options": [ "A. Recommender Split", "B. Regular Expression Split", "C. Relative Expression Split", "D. Split Rows with the Randomized split parameter set to true" ], "correct": "D. Split Rows with the Randomized split parameter set to true", "explanation": "Split Rows: use this option if you just want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50. Incorrect Answers: B: Regular Expression Split: choose this option when you want to divide your dataset by testing a single column for a value. C: Relative Expression Split: use this option whenever you want to apply a condition to a number column.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data" },
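The Split Rows behaviour discussed above (a randomized row split at a given ratio) corresponds to scikit-learn's train_test_split. A minimal sketch producing a 0.75:0.25 split on a synthetic dataset; the data and seed are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)

# 75% training / 25% testing, shuffled; stratify keeps class proportions balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```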
You must submit an experiment that runs this script in the Azure Machine Learning workspace. The following compute resources are available: a Microsoft Surface device on which Microsoft Offic e has been installed. Corporate IT policies prevent the installation of additional software a Compute Instance named ds-workstation in the work space with 2 CPUs and 8 GB of memory an Azure Machine Learning compute target named cpu- cluster with eight CPU-based nodes an Azure Machine Learning compute target named gpu- cluster with four CPU and GPU-based nodes You need to specify the compute resources to be use d for running the code to submit the experiment, an d for running the script in order to minimize model train ing time. Which resources should the data scientist use? To a nswer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D. Correct Answer:" ], "correct": "", "explanation": "Box 1: the ds-workstation compute instance A workstation notebook instance is good enough to r un experiments. Box 2: the gpu-cluster compute target Just as GPUs revolutionized deep learning through u nprecedented training and inferencing performance, RAPIDS enables traditional machine learning practit ioners to unlock game-changing performance with GPU s. With RAPIDS on Azure Machine Learning service, user s can accelerate the entire machine learning pipeli ne, including data processing, training and inferencing , with GPUs from the NC_v3, NC_v2, ND or ND_v2 fami lies. Users can unlock performance gains of more than 20X (with 4 GPUs), slashing training times from hours to minutes and dramatically reducing time-to-insight.", "references": "https://azure.microsoft.com/sv-se/blog/azure-machin e-learning-service-now-supports-nvidia-s-rapids/" }, { "question": "You create an Azure Machine Learning workspace. You are preparing a local Python environment on a lapt op computer. You want to use the laptop to connect to the workspace and run experiments. You create the following config.json file. { \"workspace_name\" : \"ml-workspace\" } You must use the Azure Machine Learning SDK to inte ract with data and experiments in the workspace. D283ABFBEDB32CDCE3B3406B9C29DB2F You need to configure the config.json file to conne ct to the workspace from the Python environment. Which two additional parameters must you add to the config.json file in order to connect to the worksp ace? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. login", "B. resource_group", "C. subscription_id", "D. key" ], "correct": "", "explanation": "To use the same workspace in multiple environments, create a JSON configuration file. The configuratio n file saves your subscription (subscription_id), resource (resource_group), and workspace name so that it ca n be easily loaded. The following sample shows how to create a workspac e. from azureml.core import Workspace ws = Workspace.create(name='myworkspace', subscription_id='', resource_group='myresourcegroup', create_resource_group=True, location='eastus2' )", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.workspace.workspace" }, { "question": "HOTSPOT You are performing a classification task in Azure M achine Learning Studio. You must prepare balanced testing and training samp les based on a provided data set. You need to split the data with a 0.75:0.25 ratio. Which value should you use for each parameter? 
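Relating back to the config.json question above, a minimal sketch of loading a workspace from that file once the two additional parameters are added; the placeholder values and file path are illustrative and assume the azureml-core package is installed:
# config.json (placeholder values)
# { "workspace_name": "ml-workspace", "subscription_id": "<subscription-id>", "resource_group": "<resource-group>" }
from azureml.core import Workspace
ws = Workspace.from_config(path="./config.json")  # reads workspace_name, subscription_id and resource_group
print(ws.name)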
To a nswer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Split rows Use the Split Rows option if you just want to divid e the data into two parts. You can specify the perc entage of data to put in each split, but by default, the data is divided 50-50. You can also randomize the selection of rows in eac h group, and use stratified sampling. In stratified sampling, you must select a single column of data for which y ou want values to be apportioned equally among the two result datasets. Box 2: 0.75 If you specify a number as a percentage, or if you use a string that contains the \"%\" character, the v alue is interpreted as a percentage. All percentage values must be within the range (0, 100), not including th e values 0 and 100. Box 3: Yes To ensure splits are balanced. Box 4: No If you use the option for a stratified split, the o utput datasets can be further divided by subgroups, by selecting a strata column. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/split-data" }, { "question": "You create an Azure Machine Learning compute resour ce to train models. The compute resource is configu red as follows: Minimum nodes: 2 Maximum nodes: 4 You must decrease the minimum number of nodes and i ncrease the maximum number of nodes to the following values: Minimum nodes: 0 Maximum nodes: 8 You need to reconfigure the compute resource. What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Use the Azure Machine Learning studio.", "B. Run the update method of the AmlCompute class in the Python SDK.", "C. Use the Azure portal.", "D. Use the Azure Machine Learning designer." ], "correct": "", "explanation": "A: You can manage assets and resources in the Azure Machine Learning studio. B: The update(min_nodes=None, max_nodes=None, idle_ seconds_before_scaledown=None) of the AmlCompute class updates the ScaleSettings for this AmlCompute target. C: To change the nodes in the cluster, use the UI f or your cluster in the Azure portal. Reference: https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.compute.amlcompute(class)", "references": "" }, { "question": "HOTSPOT You have a dataset that contains 2,000 rows. You ar e building a machine learning classification model by using Azure Learning Studio. You add a Partition and Samp le module to the experiment. You need to configure the module. You must meet the following requirements: Divide the data into subsets Assign the rows into folds using a round-robin meth od Allow rows in the dataset to be reused How should you configure the module? To answer, sel ect the appropriate options in the dialog box in th e D283ABFBEDB32CDCE3B3406B9C29DB2F answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Use the Split data into partitions option when you want to divide the dataset into subsets of the data . This option is also useful when you want to create a custom num ber of folds for cross-validation, or to split rows into several groups. 1. Add the Partition and Sample module to your expe riment in Studio (classic), and connect the dataset . 2. 
For Partition or sample mode, select Assign to F olds. 3. Use replacement in the partitioning: Select this option if you want the sampled row to be put back into the pool of rows for potential reuse. As a result, the same row might be assigned to several folds. 4. If you do not use replacement (the default optio n), the sampled row is not put back into the pool o f rows for potential reuse. As a result, each row can be assig ned to only one fold. 5. Randomized split: Select this option if you want rows to be randomly assigned to folds. If you do n ot select this option, rows are assigned to folds using the r ound-robin method.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/partition-and-sample" }, { "question": "You create a new Azure subscription. No resources a re provisioned in the subscription. You need to create an Azure Machine Learning worksp ace. D283ABFBEDB32CDCE3B3406B9C29DB2F What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Run Python code that uses the Azure ML SDK librar y and calls the Workspace.create method with name,", "B. Navigate to Azure Machine Learning studio and cre ate a workspace.", "C. Use the Azure Command Line Interface (CLI) with t he Azure Machine Learning extension to call the az", "D. Navigate to Azure Machine Learning studio and cre ate a workspace." ], "correct": "", "explanation": "B: You can create a workspace in the Azure Machine Learning studio C: You can create a workspace for Azure Machine Lea rning with Azure CLI Install the machine learning extension. Create a resource group: az group create --name --location To create a new workspace where the services are au tomatically created, use the following command: az ml workspace create -w -g D: You can create and manage Azure Machine Learning workspaces in the Azure portal. 1. Sign in to the Azure portal by using the credent ials for your Azure subscription. 2. In the upper-left corner of Azure portal, select + Create a resource. 3. Use the search bar to find Machine Learning. 4. Select Machine Learning. 5. In the Machine Learning pane, select Create to b egin. D283ABFBEDB32CDCE3B3406B9C29DB2F Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-workspace-template https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-manage-workspace-cli https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-manage-workspace", "references": "" }, { "question": "HOTSPOT You create an Azure Machine Learning workspace and set up a development environment. You plan to train a deep neural network (DNN) by using the Tensorflow f ramework and by using estimators to submit training scripts. You must optimize computation speed for training ru ns. You need to choose the appropriate estimator to use as well as the appropriate training compute target configuration. Which values should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. D283ABFBEDB32CDCE3B3406B9C29DB2F Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Tensorflow TensorFlow represents an estimator for training in TensorFlow experiments. Box 2: 12 vCPU, 112 GB memory..,2 GPU,.. 
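As a hedged illustration of the TensorFlow estimator named in the explanation above (a sketch only; the azureml.train.dnn estimators are deprecated in later SDK releases in favour of ScriptRunConfig, and the cluster name, script folder, and script name here are placeholders):
from azureml.core.compute import ComputeTarget
from azureml.train.dnn import TensorFlow
# 'ws' is an existing Workspace; 'gpu-cluster' an existing GPU-enabled compute target (placeholder names)
gpu_cluster = ComputeTarget(workspace=ws, name="gpu-cluster")
estimator = TensorFlow(source_directory="./scripts",
                       entry_script="train.py",
                       compute_target=gpu_cluster,
                       use_gpu=True)  # request the GPU base image so training runs on the GPUs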
Use GPUs for the deep neural network.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-core/azureml.train.dnn D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "HOTSPOT You have an Azure Machine Learning workspace named workspace1 that is accessible from a public endpoin t. The workspace contains an Azure Blob storage datast ore named store1 that represents a blob container i n an Azure storage account named account1. You configure workspace1 and account1 to be accessible by using private endpoints in the same virtual network. You must be able to access the contents of store1 b y using the Azure Machine Learning SDK for Python. You must be able to preview the contents of store1 by u sing Azure Machine Learning studio. You need to configure store1. What should you do? To answer, select the appropria te options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "D283ABFBEDB32CDCE3B3406B9C29DB2F Box 1: Regenerate the keys of account1. Azure Blob Storage support authentication through A ccount key or SAS token. To authenticate your acces s to the underlying storage service, you can provide eit her your account key, shared access signatures (SAS ) tokens, or service principal Box 2: Update the authentication for store1. For Azure Machine Learning studio users, several fe atures rely on the ability to read data from a data set; such as dataset previews, profiles and automated machine learning. For these features to work with storage behind virtual networks, use a workspace managed identity in the studio to allow Azure Machine Learning to ac cess the storage account from outside the virtual networ k. Note: Some of the studio's features are disabled by default in a virtual network. To re-enable these f eatures, you must enable managed identity for storage accoun ts you intend to use in the studio. The following operations are disabled by default in a virtual network: Preview data in the studio.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-access-data" }, { "question": "DRAG DROP You are building an intelligent solution using mach ine learning models. The environment must support the following requirem ents: Data scientists must build notebooks in a cloud env ironment Data scientists must use automatic feature engineer ing and model building in machine learning pipeline s. Notebooks must be deployed to retrain using Spark i nstances with dynamic worker allocation. Notebooks must be exportable to be version controlled locally. You need to create the environment. Which four actions should you perform in sequence? To answer, move the appropriate actions from the li st of actions to the answer area and arrange them in the correct order. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Create an Azure HDInsight cluster to includ e the Apache Spark Mlib library Step 2: Install Microsot Machine Learning for Apach e Spark You install AzureML on your Azure HDInsight cluster . 
D283ABFBEDB32CDCE3B3406B9C29DB2F Microsoft Machine Learning for Apache Spark (MMLSpa rk) provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines wit h Microsoft Cognitive Toolkit (CNTK) and OpenCV, enab ling you to quickly create powerful, highly-scalabl e predictive and analytical models for large image an d text datasets. Step 3: Create and execute the Zeppelin notebooks o n the cluster Step 4: When the cluster is ready, export Zeppelin notebooks to a local environment. Notebooks must be exportable to be version controlled locally.", "references": "https://docs.microsoft.com/en-us/azure/hdinsight/sp ark/apache-spark-zeppelin-notebook https://azuremlbuild.blob.core.windows.net/pysparka pi/intro.html" }, { "question": "HOTSPOT You have a dataset that contains 2,000 rows. You ar e building a machine learning classification model by using Azure Learning Studio. You add a Partition and Samp le module to the experiment. You need to configure the module. You must meet the following requirements: Divide the data into subsets Assign the rows into folds using a round-robin meth od Allow rows in the dataset to be reused How should you configure the module? To answer, sel ect the appropriate options in the dialog box in th e answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Use the Split data into partitions option when you want to divide the dataset into subsets of the data . This option is also useful when you want to create a custom num ber of folds for cross-validation, or to split rows into several groups. Add the Partition and Sample module to your experim ent in Studio (classic), and connect the dataset. For Partition or sample mode, select Assign to Fold s. Use replacement in the partitioning: Select this op tion if you want the sampled row to be put back int o the pool of rows for potential reuse. As a result, the same row might be assigned to several folds. If you do n ot use replacement (the default option), the sampled row i s not put back into the pool of rows for potential reuse. As a result, each row can be assigned to only one fold. Randomized split: Select this option if you want ro ws to be randomly assigned to folds. If you do not select th is option, rows are assigned to folds using the rou nd-robin method.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/partition-and-sample" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are using Azure Machine Learning Studio to perf orm feature engineering on a dataset. You need to normalize values to produce a feature c olumn grouped into bins. Solution: Apply an Entropy Minimum Description Leng th (MDL) binning mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Entropy MDL binning mode: This method requires that you select the column you want to predict and the column or columns that you want to group into bins. 
It then makes a pass over the data and attempts to determine the number of bins that minimizes the ent ropy. In other words, it chooses a number of bins t hat allows the data column to best predict the target c olumn. It then returns the bin number associated wi th each row of your data in a column named quantiz ed.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/group-data-into-bins D283ABFBEDB32CDCE3B3406B9C29DB2F Run experiments and train models Question Set 1" }, { "question": "You are analyzing a dataset containing historical d ata from a local taxi company. You are developing a regression model. You must predict the fare of a taxi trip. You need to select performance metrics to correctly evaluate the regression model. Which two metrics can you use? Each correct answer presents a complete solution? NOTE: Each correct selection is worth one point.", "options": [ "A. a Root Mean Square Error value that is low", "B. an R-Squared value close to 0", "C. an F1 score that is low", "D. an R-Squared value close to 1" ], "correct": "", "explanation": "RMSE and R2 are both metrics for regression models. A: Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-predic tion. D: Coefficient of determination, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (ex plains nothing); 1 means there is a perfect fit. Ho wever, caution should be used in interpreting R2 values, a s low values can be entirely normal and high values can be suspect. Incorrect Answers: C, E: F-score is used for classification models, no t for regression models.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are using Azure Machine Learning to run an expe riment that trains a classification model. You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configur e a HyperDriveConfig for the experiment by running the following code: D283ABFBEDB32CDCE3B3406B9C29DB2F You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validatio n data are stored in a variable named y_test variab le, and the predicted probabilities from the model are stored i n a variable named y_predicted. You need to add logging to the script to allow Hype rdrive to optimize hyperparameters for the AUC metr ic. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "", "explanation": "Python printing/logging example: logging.info(message) Destination: Driver logs, Azure Machine Learning de signer", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-debug-pipelines" }, { "question": "Note: This question is part of a series of question s that present the same scenario. 
Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are using Azure Machine Learning to run an expe riment that trains a classification model. D283ABFBEDB32CDCE3B3406B9C29DB2F You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configur e a HyperDriveConfig for the experiment by running the following code: You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validatio n data are stored in a variable named y_test variab le, and the predicted probabilities from the model are stored i n a variable named y_predicted. You need to add logging to the script to allow Hype rdrive to optimize hyperparameters for the AUC metr ic. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Explanation Use a solution with logging.info(message) instead. Note: Python printing/logging example: logging.info(message) Destination: Driver logs, Azure Machine Learning de signer", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-debug-pipelines" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. D283ABFBEDB32CDCE3B3406B9C29DB2F After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are using Azure Machine Learning to run an expe riment that trains a classification model. You want to use Hyperdrive to find parameters that optimize the AUC metric for the model. You configur e a HyperDriveConfig for the experiment by running the following code: You plan to use this configuration to run a script that trains a random forest model and then tests it with validation data. The label values for the validatio n data are stored in a variable named y_test variab le, and the predicted probabilities from the model are stored i n a variable named y_predicted. You need to add logging to the script to allow Hype rdrive to optimize hyperparameters for the AUC metr ic. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Explanation Use a solution with logging.info(message) instead. Note: Python printing/logging example: logging.info(message) Destination: Driver logs, Azure Machine Learning de signer", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-debug-pipelines" }, { "question": "D283ABFBEDB32CDCE3B3406B9C29DB2F You use the following code to run a script as an ex periment in Azure Machine Learning: You must identify the output files that are generat ed by the experiment run. You need to add code to retrieve the output file na mes. Which code segment should you add to the script?", "options": [ "A. files = run.get_properties()", "B. files= run.get_file_names()", "C. files = run.get_details_with_logs()", "D. 
files = run.get_metrics()" ], "correct": "B. files= run.get_file_names()", "explanation": "You can list all of the files that are associated w ith this run record by called run.get_file_names()", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-track-experiments" }, { "question": "You write five Python scripts that must be processe d in the order specified in Exhibit A which allows the same modules to run in parallel, but will wait for modul es with dependencies. You must create an Azure Machine Learning pipeline using the Python SDK, because you want to script to create the pipeline to be tracked in your version c ontrol system. You have created five PythonScriptSt eps and have named the variables to match the module names.You need to create the pipeline shown. Assume all r elevant imports have been done. D283ABFBEDB32CDCE3B3406B9C29DB2F Which Python code segment should you use?", "options": [ "A.", "B.", "C.", "D." ], "correct": "A.", "explanation": "The steps parameter is an array of steps. To build pipelines that have multiple steps, place the steps in order in this array.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-parallel-run-step" }, { "question": "You create a datastore named training_data that ref erences a blob container in an Azure Storage accoun t. The blob container contains a folder named csv_files in which multiple comma-separated values (CSV) files are stored. You have a script named train.py in a local folder named ./script that you plan to run as an experimen t using an estimator. The script includes the following code t o read data from the csv_files folder: D283ABFBEDB32CDCE3B3406B9C29DB2F You have the following script. You need to configure the estimator for the experim ent so that the script can read the data from a dat a reference named data_ref that references the csv_fi les folder in the training_data datastore. Which code should you use to configure the estimato r?", "options": [ "A.", "B.", "C.", "D." ], "correct": "B.", "explanation": "Besides passing the dataset through the input param eters in the estimator, you can also pass the datas et through script_params and get the data path (mounti ng point) in your training script via arguments. Th is way, you can keep your training script independent of az ureml-sdk. In other words, you will be able use the same training script for local debugging and remote trai ning on any cloud platform. Example: from azureml.train.sklearn import SKLearn script_params = { # mount the dataset on the remote compute and pass the mounted path as an argument to the training scr ipt '--data-folder': mnist_ds.as_named_input('mnist').a s_mount(), '--regularization': 0.5 } est = SKLearn(source_directory=script_folder, script_params=script_params, compute_target=compute_target, environment_definition=env, entry_script='train_mnist.py') # Run the experiment run = experiment.submit(est) run.wait_for_completion(show_output=True) Incorrect Answers: A: Pandas DataFrame not used.", "references": "https://docs.microsoft.com/es-es/azure/machine-lear ning/how-to-train-with-datasets" }, { "question": "DRAG DROP You create a multi-class image classification deep learning experiment by using the PyTorch framework. You plan to run the experiment on an Azure Compute clus ter that has nodes with GPU's. You need to define an Azure Machine Learning servic e pipeline to perform the monthly retraining of the image classification model. 
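Expanding on the pipeline-steps explanation above (the steps parameter is an array of steps), a minimal sketch assuming step_1 through step_5 are PythonScriptStep objects already defined elsewhere; names are placeholders:
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
ws = Workspace.from_config()
# Steps without data dependencies can run in parallel; an explicit ordering can also be set with step.run_after(other_step).
pipeline = Pipeline(workspace=ws, steps=[step_1, step_2, step_3, step_4, step_5])
run = Experiment(workspace=ws, name="pipeline-demo").submit(pipeline)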
The pipeline must run with mi nimal cost and minimize the time required to train the model. Which three pipeline steps should you run in sequen ce? To answer, move the appropriate actions from th e list D283ABFBEDB32CDCE3B3406B9C29DB2F of actions to the answer area and arrange them in t he correct order. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Configure a DataTransferStep() to fetch new image data... Step 2: Configure a PythonScriptStep() to run image _resize.y on the cpu-compute compute target. Step 3: Configure the EstimatorStep() to run traini ng script on the gpu_compute computer target. The PyTorch estimator provides a simple way of laun ching a PyTorch training job on a compute target.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-train-pytorch" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. An IT department creates the following Azure resour ce groups and resources: D283ABFBEDB32CDCE3B3406B9C29DB2F The IT department creates an Azure Kubernetes Servi ce (AKS)-based inference compute target named aks- cluster in the Azure Machine Learning workspace. You have a Microsoft Surface Book computer with a G PU. Python 3.6 and Visual Studio Code are installed . You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy metrics. Solution: Attach the mlvm virtual machine as a comp ute target in the Azure Machine Learning workspace. Install the Azure ML SDK on the Surface Book and ru n Python code to connect to the workspace. Run the training script as an experiment on the mlvm remote compute resource. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Use the VM as a compute target. Note: A compute target is a designated compute reso urce/environment where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resourc e.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/concept-compute-target" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. An IT department creates the following Azure resour ce groups and resources: D283ABFBEDB32CDCE3B3406B9C29DB2F The IT department creates an Azure Kubernetes Servi ce (AKS)-based inference compute target named aks- cluster in the Azure Machine Learning workspace. You have a Microsoft Surface Book computer with a G PU. Python 3.6 and Visual Studio Code are installed . You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy metrics. Solution: Install the Azure ML SDK on the Surface B ook. 
Run Python code to connect to the workspace an d then run the training script as an experiment on lo cal compute. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Need to attach the mlvm virtual machine as a comput e target in the Azure Machine Learning workspace.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/concept-compute-target" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. An IT department creates the following Azure resour ce groups and resources: D283ABFBEDB32CDCE3B3406B9C29DB2F The IT department creates an Azure Kubernetes Servi ce (AKS)-based inference compute target named aks- cluster in the Azure Machine Learning workspace. You have a Microsoft Surface Book computer with a G PU. Python 3.6 and Visual Studio Code are installed . You need to run a script that trains a deep neural network (DNN) model and logs the loss and accuracy metrics. Solution: Install the Azure ML SDK on the Surface B ook. Run Python code to connect to the workspace. R un the training script as an experiment on the aks-clu ster compute target. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Need to attach the mlvm virtual machine as a comput e target in the Azure Machine Learning workspace.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/concept-compute-target" }, { "question": "HOTSPOT You plan to use Hyperdrive to optimize the hyperpar ameters selected when training a model. You create the following code to define options for the hyperparam eter experiment: D283ABFBEDB32CDCE3B3406B9C29DB2F For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: No max_total_runs (50 here) The maximum total number of runs to create. This is the upper bound; there may be fewer runs when the sample space is smaller than this value. Box 2: Yes Policy EarlyTerminationPolicy The early termination policy to use. If None - the default, no early termination policy will be used. Box 3: No Discrete hyperparameters are specified as a choice among discrete values. choice can be: one or more comma-separated values a range object any arbitrary list object", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-core/azureml.train.hyperdrive.hyperdriveconf ig https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-tune-hyperparameters" }, { "question": "HOTSPOT You are using Azure Machine Learning to train machi ne learning models. You need a compute target on wh ich to remotely run the training script. You run the following Python code: For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." 
], "correct": "", "explanation": "Box 1: Yes The compute is created within your workspace region as a resource that can be shared with other users. Box 2: Yes It is displayed as a compute cluster. View compute targets 1. To see all compute targets for your workspace, u se the following steps: 2. Navigate to Azure Machine Learning studio. 3. Under Manage, select Compute. 4. Select tabs at the top to show each type of comp ute target. Box 3: Yes min_nodes is not specified, so it defaults to 0. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/ azureml.core.compute.amlcompute.amlcomputeprovision ingconfiguration https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-attach-compute-studio" }, { "question": "HOTSPOT You have an Azure blob container that contains a se t of TSV files. The Azure blob container is registe red as a datastore for an Azure Machine Learning service wor kspace. Each TSV file uses the same data schema. You plan to aggregate data for all of the TSV files together and then register the aggregated data as a dataset in an Azure Machine Learning workspace by using the Azure Machine Learning SDK for Python. You run the following code. For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: No FileDataset references single or multiple files in datastores or from public URLs. The TSV files need to be parsed. Box 2: Yes to_path() gets a list of file paths for each file s tream defined by the dataset. Box 3: Yes TabularDataset.to_pandas_dataframe loads all record s from the dataset into a pandas DataFrame. TabularDataset represents data in a tabular format created by parsing the provided file or list of fil es. Note: TSV is a file extension for a tab-delimited f ile used with spreadsheet software. TSV stands for Tab Separated Values. TSV files are used for raw data a nd can be imported into and exported from spreadshe et software. TSV files are essentially text files, and the raw data can be viewed by text editors, though they are often used when moving raw data between spreadsheet s.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.data.tabulardataset" }, { "question": "You create a batch inference pipeline by using the Azure ML SDK. You configure the pipeline parameters by executing the following code: D283ABFBEDB32CDCE3B3406B9C29DB2F You need to obtain the output from the pipeline exe cution. Where will you find the output?", "options": [ "A. the digit_identification.py script", "B. the debug log", "C. the Activity Log in the Azure portal for the Mach ine Learning workspace", "D. the Inference Clusters tab in Machine Learning st udio" ], "correct": "", "explanation": "output_action (str): How the output is to be organi zed. Currently supported values are 'append_row' an d 'summary_only'. 'append_row' All values output by run() method inv ocations will be aggregated into one unique file na med parallel_run_step.txt that is created in the output location. 'summary_only'", "references": "https://docs.microsoft.com/en-us/python/api/azureml -contrib-pipeline-steps/ azureml.contrib.pipeline.steps.parallelrunconfig" }, { "question": "DRAG DROP You create a multi-class image classification deep learning model. The model must be retrained monthly with the new im age data fetched from a public web portal. 
You crea te an Azure Machine Learning pipeline to fetch new data, standardize the size of images, and retrain the mod el. You need to use the Azure Machine Learning SDK to c onfigure the schedule for the pipeline. Which four actions should you perform in sequence? To answer, move the appropriate actions from the li st of actions to the answer area and arrange them in the correct order. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Publish the pipeline. To schedule a pipeline, you'll need a reference to your workspace, the identifier of your published pi peline, and D283ABFBEDB32CDCE3B3406B9C29DB2F the name of the experiment in which you wish to cre ate the schedule. Step 2: Retrieve the pipeline ID. Needed for the schedule. Step 3: Create a ScheduleRecurrence.. To run a pipeline on a recurring basis, you'll crea te a schedule. A Schedule associates a pipeline, an experiment, and a trigger. First create a schedule. Example: Create a Schedule that begins a run every 15 minutes: recurrence = ScheduleRecurrence(frequency=\"Minute\", interval=15) Step 4: Define an Azure Machine Learning pipeline s chedule.. Example, continued: recurring_schedule = Schedule.create(ws, name=\"MyRe curringSchedule\", description=\"Based on time\", pipeline_id=pipeline_id, experiment_name=experiment_name, recurrence=recurrence)", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-schedule-pipelines" }, { "question": "HOTSPOT You create a script for training a machine learning model in Azure Machine Learning service. You create an estimator by running the following co de: For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C. D." ], "correct": "", "explanation": "Box 1: Yes Parameter source_directory is a local directory con taining experiment configuration and code files nee ded for a training job. Box 2: Yes script_params is a dictionary of command-line argum ents to pass to the training script specified in en try_script. Box 3: No Box 4: Yes The conda_packages parameter is a list of strings r epresenting conda packages to be added to the Pytho n environment for the experiment.", "references": "" }, { "question": "HOTSPOT You have a Python data frame named salesData in the following format: D283ABFBEDB32CDCE3B3406B9C29DB2F The data frame must be unpivoted to a long data for mat as follows: You need to use the pandas.melt() function in Pytho n to perform the transformation. How should you complete the code segment? To answer , select the appropriate options in the answer area . NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: dataFrame Syntax: pandas.melt(frame, id_vars=None, value_vars =None, var_name=None, value_name='value', col_level=None)[source] Where frame is a DataFrame Box 2: shop Paramter id_vars id_vars : tuple, list, or ndarray, optional Column(s) to use as identifier variables. Box 3: ['2017','2018'] value_vars : tuple, list, or ndarray, optional Column(s) to unpivot. If not specified, uses all co lumns that are not set as id_vars. Example: df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'}, ... 'B': {0: 1, 1: 3, 2: 5}, ... 
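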
'C': {0: 2, 1: 4, 2: 6}}) pd.melt(df, id_vars=['A'], value_vars=['B', 'C']) A variable value 0 a B 1 1 b B 3 2 c B 5 3 a C 2 4 b C 4 5 c C 6", "references": "https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html" }, { "question": "HOTSPOT You are working on a classification task. You have a dataset indicating whether a student would like to play soccer and associated attributes. The dataset includes the following columns: You need to classify variables by type. Which variable should you add to each category? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Explanation/Reference:", "references": "https://www.edureka.co/blog/classification-algorithms/" }, { "question": "HOTSPOT You plan to preprocess text from CSV files. You load the Azure Machine Learning Studio default stop words list. You need to configure the Preprocess Text module to meet the following requirements: Ensure that multiple related words map to a single canonical form. Remove pipe characters from text. Remove words to optimize information retrieval. Which three options should you select? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Remove stop words Remove words to optimize information retrieval. Remove stop words: Select this option if you want to apply a predefined stopword list to the text column. Stop word removal is performed before any other processes. Box 2: Lemmatization Ensure that multiple related words map to a single canonical form. Lemmatization converts multiple related words to a single canonical form. Box 3: Remove special characters Remove special characters: Use this option to replace any non-alphanumeric special characters with the pipe | character.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/preprocess-text" }, { "question": "You plan to run a script as an experiment using a Script Run Configuration. The script uses modules from the scipy library as well as several Python packages that are not typically installed in a default conda environment. You plan to run the experiment on your local workstation for small datasets and scale out the experiment by running it on more powerful remote compute clusters for larger datasets. You need to ensure that the experiment runs successfully on local and remote compute with the least administrative effort. What should you do?", "options": [ "A. Do not specify an environment in the run configuration for the experiment. Run the experiment by using the", "B. Create a virtual machine (VM) with the required Python configuration and attach the VM as a compute", "C. Create and register an Environment that includes the required packages. Use this Environment for all", "D. Create a config.yaml file defining the conda packages that are required and save the file in the experiment" ], "correct": "C. Create and register an Environment that includes the required packages. Use this Environment for all", "explanation": "If you have an existing Conda environment on your local computer, then you can use the service to create an environment object.
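A minimal sketch of that approach, assuming the azureml-core SDK is installed and a local conda environment already contains the required packages; the environment names are placeholders:
from azureml.core import Environment, Workspace
ws = Workspace.from_config()
# Build an Environment object from a conda environment that already exists on the local machine
env = Environment.from_existing_conda_environment(name="experiment-env",
                                                  conda_environment_name="my-local-env")
env.register(workspace=ws)  # register once, then reference the same Environment for local and remote runs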
By using this strategy, you can reuse your local interactive environment on remote runs.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-environments" }, { "question": "You write a Python script that processes data in a comma-separated values (CSV) file. You plan to run this script as an Azure Machine Lea rning experiment. The script loads the data and determines the number of rows it contains using the following code: D283ABFBEDB32CDCE3B3406B9C29DB2F You need to record the row count as a metric named row_count that can be returned using the get_metric s method of the Run object after the experiment run c ompletes. Which code should you use?", "options": [ "A. run.upload_file(T3 row_count', './data.csv')", "B. run.log('row_count', rows)", "C. run.tag('row_count', rows)", "D. run.log_table('row_count', rows)" ], "correct": "B. run.log('row_count', rows)", "explanation": "Log a numerical or string value to the run with the given name using log(name, value, description=''). Logging a metric to a run causes that metric to be stored in the run record in the experiment. You can log the s ame metric multiple times within a run, the result being consi dered a vector of that metric. Example: run.log(\"accuracy\", 0.95) Incorrect Answers: E: Using log_row(name, description=None, **kwargs) creates a metric with multiple columns as described in kwargs. Each named parameter generates a column wit h the value specified. log_row can be called once t o log an arbitrary tuple, or multiple times in a loop to generate a complete table. Example: run.log_row(\"Y over X\", x=1, y=0.4)", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.run" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling str ategy to compensate for the class imbalance. D283ABFBEDB32CDCE3B3406B9C29DB2F Solution: You use the Synthetic Minority Oversampli ng Technique (SMOTE) sampling mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "SMOTE is used to increase the number of undereprese nted cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of r are cases than simply duplicating existing cases.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/smote" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. 
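Tying together the run.log discussion above, a minimal sketch of logging the row count from the experiment script; it assumes azureml-core and pandas are installed and reuses the data.csv path from the question:
from azureml.core import Run
import pandas as pd
run = Run.get_context()
data = pd.read_csv("./data.csv")
rows = len(data)
run.log("row_count", rows)  # retrievable later via run.get_metrics()["row_count"]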
One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling str ategy to compensate for the class imbalance. Solution: You use the Stratified split for the samp ling mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use the Synthetic Minority Oversampling Tec hnique (SMOTE) sampling mode. Note: SMOTE is used to increase the number of under epresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the n umber of rare cases than simply duplicating existin g cases.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/smote" }, { "question": "You are creating a machine learning model. D283ABFBEDB32CDCE3B3406B9C29DB2F You need to identify outliers in the data. Which two visualizations can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Venn diagram", "B. Box plot", "C. ROC curve", "D. Random forest diagram" ], "correct": "", "explanation": "The box-plot algorithm can be used to display outli ers. One other way to quickly identify Outliers visually is to create scatter plots.", "references": "https://blogs.msdn.microsoft.com/azuredev/2017/05/2 7/data-cleansing-tools-in-azure-machine-learning/" }, { "question": "You are evaluating a completed binary classificatio n machine learning model. You need to use the precision as the evaluation met ric. Which visualization should you use?", "options": [ "A. Violin plot", "B. Gradient descent", "C. Box plot", "D. Binary classification confusion matrix" ], "correct": "D. Binary classification confusion matrix", "explanation": "Incorrect Answers: A: A violin plot is a visual that traditionally com bines a box plot and a kernel density plot. B: Gradient descent is a first-order iterative opti mization algorithm for finding the minimum of a fun ction. To find a local minimum of a function using gradient descen t, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. C: A box plot lets you see basic distribution infor mation about your data, such as median, mean, range and quartiles but doesn't show you how your data looks throughout its range.", "references": "https://machinelearningknowledge.ai/confusion-matri x-and-performance-metrics-machine-learning/" }, { "question": "D283ABFBEDB32CDCE3B3406B9C29DB2F You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework. You must configure Azure Machine Learning Hyperdriv e to optimize the hyperparameters for the classific ation model. You need to define a primary metric to determine th e hyperparameter values that result in the model wi th the best accuracy score. Which three actions must you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maxi mize.", "B. Add code to the bird_classifier_train.py script t o calculate the validation loss of the model and lo g it as a", "C. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to mini mize.", "D. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accu racy." 
], "correct": "", "explanation": "AD: primary_metric_name=\"accuracy\", primary_metric_goal=PrimaryMetricGoal.MAXIMIZE Optimize the runs to maximize \"accuracy\". Make sure to log this value in your training script. Note: primary_metric_name: The name of the primary metric to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script. primary_metric_goal: It can be either PrimaryMetric Goal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maxim ized or minimized when evaluating the runs. F: The training script calculates the val_accuracy and logs it as \"accuracy\", which is used as the pri mary metric.", "references": "" }, { "question": "DRAG DROP You have a dataset that contains over 150 features. You use the dataset to train a Support Vector Mach ine (SVM) binary classifier. You need to use the Permutation Feature Importance module in Azure Machine Learning Studio to compute a set of feature importance scores for the dataset. In which order should you perform the actions? To a nswer, move all actions from the list of actions to the answer area and arrange them in the correct order. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D. Correct Answer:" ], "correct": "", "explanation": "Step 1: Add a Two-Class Support Vector Machine modu le to initialize the SVM classifier. Step 2: Add a dataset to the experiment Step 3: Add a Split Data module to create training and test dataset. To generate a set of feature scor es requires that you have an already trained model, as well as a test dataset. Step 4: Add a Permutation Feature Importance module and connect to the trained model and test dataset. Step 5: Set the Metric for measuring performance pr operty to Classification - Accuracy and then run th e experiment.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/two-class-support-vect or- machine https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/permutation-feature- importance D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "You are using the Hyperdrive feature in Azure Machi ne Learning to train a model. You configure the Hyperdrive experiment by running the following code: For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "D283ABFBEDB32CDCE3B3406B9C29DB2F Box 1: Yes In random sampling, hyperparameter values are rando mly selected from the defined search space. Random sampling allows the search space to include both di screte and continuous hyperparameters. Box 2: Yes learning_rate has a normal distribution with mean v alue 10 and a standard deviation of 3. Box 3: No keep_probability has a uniform distribution with a minimum value of 0.05 and a maximum value of 0.1. Box 4: No number_of_hidden_layers takes on one of the values [3, 4, 5].", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-tune-hyperparameters" }, { "question": "You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio. The dataset contains categorical features that are highly correlated to the output label column. You need to select the appropriate feature scoring statistical method to identify the key predictors. 
Which method should you use?", "options": [ "A. Kendall correlation", "B. Spearman correlation", "C. Chi-squared", "D. Pearson correlation" ], "correct": "D. Pearson correlation", "explanation": "Pearson's correlation statistic, or Pearson's correlation coefficient, is also known in statistical models as the r value. For any two variables, it returns a value that indicates the strength of the correlation. Pearson's correlation coefficient is the test statistic that measures the statistical relationship, or association, between two continuous variables. It is known as the best method of measuring the association between variables of interest because it is based on the method of covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship. Incorrect Answers: C: The two-way chi-squared test is a statistical method that measures how close expected values are to actual results.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-based-feature-selection https://www.statisticssolutions.com/pearsons-correlation-coefficient/" }, { "question": "HOTSPOT You create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors. Which error type should you choose for each description? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: True Positive A true positive is an outcome where the model correctly predicts the positive class. Box 2: True Negative A true negative is an outcome where the model correctly predicts the negative class. Box 3: False Positive A false positive is an outcome where the model incorrectly predicts the positive class. Box 4: False Negative A false negative is an outcome where the model incorrectly predicts the negative class. Note: Let's make the following definitions: \"Wolf\" is a positive class. \"No wolf\" is a negative class. We can summarize our \"wolf-prediction\" model using a 2x2 confusion matrix that depicts all four possible outcomes:", "references": "https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative" }, { "question": "HOTSPOT You are using the Azure Machine Learning Service to automate hyperparameter exploration of your neural network classification model. You must define the hyperparameter space to automatically tune hyperparameters using random sampling according to the following requirements: The learning rate must be selected from a normal distribution with a mean value of 10 and a standard deviation of 3. Batch size must be 16, 32 and 64. Keep probability must be a value selected from a uniform distribution between the range of 0.05 and 0.1. You need to use the param_sampling method of the Python API for the Azure Machine Learning Service. How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: normal(10,3) Box 2: choice(16, 32, 64) Box 3: uniform(0.05, 0.1) In random sampling, hyperparameter values are randomly selected from the defined search space.
Random sampling allows the search space to include both di screte and continuous hyperparameters. Example: from azureml.train.hyperdrive import RandomParamete rSampling param_sampling = RandomParameterSampling( { \"learning_rate\": normal(10, 3), \"keep_probability\": uniform(0.05, 0.1), \"batch_size\": choice(16, 32, 64) }", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/service/how-to-tune-hyperparameters" }, { "question": "You plan to use automated machine learning to train a regression model. You have data that has feature s which have missing values, and categorical features with few distinct values. D283ABFBEDB32CDCE3B3406B9C29DB2F You need to configure automated machine learning to automatically impute missing values and encode categorical features as part of the training task. Which parameter and value pair should you use in th e AutoMLConfig class?", "options": [ "A. featurization = 'auto'", "B. enable_voting_ensemble = True", "C. task = 'classification'", "D. exclude_nan_labels = True" ], "correct": "A. featurization = 'auto'", "explanation": "Featurization str or FeaturizationConfig Values: 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized fe aturization should be used. Column type is automatically detected. Based on the detected column type preprocessing/featurization i s done as follows: Categorical: Target encoding, one hot encoding, dro p high cardinality categories, impute missing value s. Numeric: Impute missing values, cluster distance, w eight of evidence. DateTime: Several features such as day, seconds, minutes, hours etc. Text: Bag of words, pre-trained Word embedding, tex t target encoding.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-automl-client/ azureml.train.automl.automlconfig.automlconfig" }, { "question": "DRAG DROP You create a training pipeline using the Azure Mach ine Learning designer. You upload a CSV file that c ontains the data from which you want to train your model. You need to use the designer to create a pipeline t hat includes steps to perform the following tasks: Select the training features using the pandas filte r method. Train a model based on the naive_bayes.GaussianNB a lgorithm. Return only the Scored Labels column by using the q uery SELECT [Scored Labels] FROM t1; Which modules should you use? To answer, drag the a ppropriate modules to the appropriate locations. Ea ch module name may be used once, more than once, or no t at all. You may need to drag the split bar betwee n panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Two-Class Neural Network The Two-Class Neural Network creates a binary class ifier using a neural network algorithm. Train a mod el based on the naive_bayes.GaussianNB algorithm. Box 2: Execute python script Select the training features using the pandas filte r method Box 3: Select Columns in DataSet Return only the Scored Labels column by using the q uery SELECT [Scored Labels] FROM t1;", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/two-class-neural-netwo rk" }, { "question": "You are building a regression model for estimating the number of calls during an event. You need to determine whether the feature values ac hieve the conditions to build a Poisson regression model. 
Which two conditions must the feature set contain? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. The label data must be a negative value.", "B. The label data must be whole numbers.", "C. The label data must be non-discrete.", "D. The label data must be a positive value." ], "correct": "", "explanation": "Poisson regression is intended for use in regression models that are used to predict numeric values, typically counts. Therefore, you should use this module to create your regression model only if the values you are trying to predict fit the following conditions: The response variable has a Poisson distribution. Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels. A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression" }, { "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a new experiment in Azure Machine Learning Studio. One class has a much smaller number of observations than the other classes in the training set. You need to select an appropriate data sampling strategy to compensate for the class imbalance. Solution: You use the Principal Components Analysis (PCA) sampling mode. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode. Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases. Incorrect Answers: The Principal Component Analysis module in Azure Machine Learning Studio (classic) is used to reduce the dimensionality of your training data. The module analyzes your data and creates a reduced feature set that captures all the information contained in the dataset, but in a smaller number of features.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis" }, { "question": "You are performing feature engineering on a dataset. You must add a feature named CityName and populate the column value with the text London. You need to add the new feature to the dataset. Which Azure Machine Learning Studio module should you use?", "options": [ "A. Edit Metadata", "B. Filter Based Feature Selection", "C. Execute Python Script", "D. Latent Dirichlet Allocation" ], "correct": "A.
Edit Metadata", "explanation": "Typical metadata changes might include marking colu mns as features.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/edit-metadata" }, { "question": "You are evaluating a completed binary classificatio n machine learning model. You need to use the precision as the evaluation met ric. Which visualization should you use?", "options": [ "A. violin plot", "B. Gradient descent", "C. Scatter plot", "D. Receiver Operating Characteristic (ROC) curve" ], "correct": "D. Receiver Operating Characteristic (ROC) curve", "explanation": "Receiver operating characteristic (or ROC) is a plo t of the correctly classified labels vs. the incorr ectly classified labels for a particular model. Incorrect Answers: A: A violin plot is a visual that traditionally com bines a box plot and a kernel density plot. D283ABFBEDB32CDCE3B3406B9C29DB2F B: Gradient descent is a first-order iterative opti mization algorithm for finding the minimum of a fun ction. To find a local minimum of a function using gradient descen t, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. C: A scatter plot graphs the actual values in your data against the values predicted by the model. The scatter plot displays the actual values along the X-axis, a nd displays the predicted values along the Y-axis. It also displays a line that illustrates the perfect predic tion, where the predicted value exactly matches the actual value.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-understand-automated-ml#confusion-matri x" }, { "question": "You are solving a classification task. You must evaluate your model on a limited data samp le by using k-fold cross-validation. You start by configuring a k parameter as the number of splits. You need to configure the k parameter for the cross -validation. Which value should you use?", "options": [ "A. k=1", "B. k=10", "C. k=0.5", "D. k=0.9" ], "correct": "B. k=10", "explanation": "Leave One Out (LOO) cross-validation Setting K = n (the number of observations) yields n -fold and is called leave-one out cross-validation (LOO), a special case of the K-fold approach. LOO CV is sometimes useful but typically doesn't sh ake up the data enough. The estimates from each fol d are highly correlated and hence their average can have high variance. This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance tr adeoff.", "references": "" }, { "question": "HOTSPOT You have a dataset created for multiclass classific ation tasks that contains a normalized numerical fe ature set with 10,000 data points and 150 features. You use 75 percent of the data points for training and 25 percent for testing. You are using the sciki t-learn machine learning library in Python. You use X to de note the feature set and Y to denote class labels. You create the following Python data frames: D283ABFBEDB32CDCE3B3406B9C29DB2F You need to apply the Principal Component Analysis (PCA) method to reduce the dimensionality of the fe ature set to 10 features in both training and testing set s. How should you complete the code segment? To answer , select the appropriate options in the answer area . NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." 
], "correct": "", "explanation": "Box 1: PCA(n_components = 10) Need to reduce the dimensionality of the feature se t to 10 features in both training and testing sets. Example: from sklearn.decomposition import PCA pca = PCA(n_components=2) ;2 dimensions principalComponents = pca.fit_transform(x) Box 2: pca fit_transform(X[, y]) fits the model with X and app ly the dimensionality reduction on X. Box 3: transform(x_test) transform(X) applies dimensionality reduction to X.", "references": "https://scikit-learn.org/stable/modules/generated/s klearn.decomposition.PCA.html" }, { "question": "HOTSPOT You have a feature set containing the following num erical features: X, Y, and Z. The Poisson correlation coefficient (r-value) of X, Y, and Z features is shown in the following image: D283ABFBEDB32CDCE3B3406B9C29DB2F Use the drop-down menus to select the answer choice that answers each question based on the informatio n presented in the graphic. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: 0.859122 Box 2: a positively linear relationship D283ABFBEDB32CDCE3B3406B9C29DB2F +1 indicates a strong positive linear relationship -1 indicates a strong negative linear correlation 0 denotes no linear relationship between the two va riables.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/compute-linear-correla tion" }, { "question": "DRAG DROP You plan to explore demographic data for home owner ship in various cities. The data is in a CSV file w ith the following format: age,city,income,home_owner 21,Chicago,50000,0 35,Seattle,120000,1 23,Seattle,65000,0 45,Seattle,130000,1 18,Chicago,48000,0 You need to run an experiment in your Azure Machine Learning workspace to explore the data and log the results. The experiment must log the following info rmation: the number of observations in the dataset a box plot of income by home_owner a dictionary containing the city names and the aver age income for each city You need to use the appropriate logging methods of the experiment's run object to log the required inf ormation. How should you complete the code? To answer, drag t he appropriate code segments to the correct locatio ns. Each code segment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A. B.", "C.", "D." ], "correct": "", "explanation": "Box 1: log The number of observations in the dataset. run.log(name, value, description='') Scalar values: Log a numerical or string value to t he run with the given name. Logging a metric to a r un causes D283ABFBEDB32CDCE3B3406B9C29DB2F that metric to be stored in the run record in the e xperiment. You can log the same metric multiple tim es within a run, the result being considered a vector of that m etric. Example: run.log(\"accuracy\", 0.95) Box 2: log_image A box plot of income by home_owner. log_image Log an image to the run record. Use log_i mage to log a .PNG image file or a matplotlib plot to the run. These images will be visible and comparable in the run record. Example: run.log_image(\"ROC\", plot=plt) Box 3: log_table A dictionary containing the city names and the aver age income for each city. 
log_table: Log a dictionary object to the run with the given name.", "references": "" }, { "question": "You use the Azure Machine Learning service to creat e a tabular dataset named training_data. You plan t o use this dataset in a training script. You create a variable that references the dataset u sing the following code: training_ds = workspace.datasets.get(\"training_data \") You define an estimator to run the script. You need to set the correct property of the estimat or to ensure that your script can access the traini ng_data dataset. Which property should you set?", "options": [ "A. environment_definition = {\"training_data\":trainin g_ds}", "B. inputs = [training_ds.as_named_input('training_ds ')]", "C. script_params = {\"--training_ds\":training_ds}", "D. source_directory = training_ds" ], "correct": "B. inputs = [training_ds.as_named_input('training_ds ')]", "explanation": "Example: # Get the training dataset diabetes_ds = ws.datasets.get(\"Diabetes Dataset\") # Create an estimator that uses the remote compute hyper_estimator = SKLearn(source_directory=experime nt_folder, inputs=[diabetes_ds.as_named_input ('diabetes')], # Pass the dataset as an input compu te_target = cpu_cluster, conda_packages=['pandas','ipykernel','matplotlib'], pip_packages=['azureml-sdk','argparse','pyarrow'], entry_script='diabetes_training.py')", "references": "https://notebooks.azure.com/GraemeMalcolm/projects/ azureml-primers/html/04%20-%20Optimizing%20Model %20Training.ipynb D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "You register a file dataset named csv_folder that r eferences a folder. The folder includes multiple co mma- separated values (CSV) files in an Azure storage bl ob container. You plan to use the following code to run a script that loads data from the file dataset. You create a nd instantiate the following variables: You have the following code: You need to pass the dataset to ensure that the scr ipt can read the files it references. Which code segment should you insert to replace the code comment?", "options": [ "A. inputs=[file_dataset.as_named_input('training_fil es')],", "B. inputs=[file_dataset.as_named_input('training_fil es').as_mount()],", "C. inputs=[file_dataset.as_named_input('training_fil es').to_pandas_dataframe()],", "D. script_params={'--training_files': file_dataset}," ], "correct": "B. inputs=[file_dataset.as_named_input('training_fil es').as_mount()],", "explanation": "Example: from azureml.train.estimator import Estimator script_params = { # to mount files referenced by mnist dataset '--data-folder': mnist_file_dataset.as_named_input( 'mnist_opendataset').as_mount(), '--regularization' : 0.5 } est = Estimator(source_directory=script_folder, script_params=script_params, compute_target=compute_target, environment_definition=env, entry_script='train.py')", "references": "D283ABFBEDB32CDCE3B3406B9C29DB2F https://docs.microsoft.com/en-us/azure/machine-lear ning/tutorial-train-models-with-aml" }, { "question": "You are creating a new Azure Machine Learning pipel ine using the designer. The pipeline must train a model using data in a com ma-separated values (CSV) file that is published on a website. You have not created a dataset for this fi le. You need to ingest the data from the CSV file into the designer pipeline using the minimal administrat ive effort. Which module should you add to the pipeline in Desi gner?", "options": [ "A. Convert to CSV", "B. Enter Data Manually", "C. Import Data", "D. Dataset" ], "correct": "D. 
Dataset", "explanation": "The preferred way to provide data to a pipeline is a Dataset object. The Dataset object points to data that lives in or is accessible from a datastore or at a Web UR L. The Dataset class is abstract, so you will creat e an instance of either a FileDataset (referring to one or more files) or a TabularDataset that's created b y from one or more files with delimited columns of data. Example: from azureml.core import Dataset iris_tabular_dataset = Dataset.Tabular.from_delimit ed_files([(def_blob_store, 'train-dataset/iris.csv' )])", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-your-first-pipeline" }, { "question": "You define a datastore named ml-data for an Azure S torage blob container. In the container, you have a folder named train that contains a file named data.csv. Yo u plan to use the file to train a model by using th e Azure Machine Learning SDK. You plan to train the model by using the Azure Mach ine Learning SDK to run an experiment on local comp ute. You define a DataReference object by running the fo llowing code: D283ABFBEDB32CDCE3B3406B9C29DB2F You need to load the training data. Which code segment should you use?", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Example: data_folder = args.data_folder # Load Train and Test data train_data = pd.read_csv(os.path.join(data_folder, 'data.csv'))", "references": "https://www.element61.be/en/resource/azure-machine- learning-services-complete-toolbox-ai" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have D283ABFBEDB32CDCE3B3406B9C29DB2F more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create an Azure Machine Learning service datast ore in a workspace. The datastore contains the foll owing files: /data/2018/Q1.csv /data/2018/Q2.csv /data/2018/Q3.csv /data/2018/Q4.csv /data/2019/Q1.csv All files store data in the following format: id,f1,f2,I 1,1,2,0 2,1,1,1 3,2,1,0 4,2,2,1 You run the following code: You need to create a dataset named training_data an d load the data from all files into a single data f rame by using the following code: Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Define paths with two file paths instead. Use Dataset.Tabular_from_delimeted as the data isn' t cleansed. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-register-datasets" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create an Azure Machine Learning service datast ore in a workspace. 
The datastore contains the foll owing files: /data/2018/Q1.csv /data/2018/Q2.csv /data/2018/Q3.csv /data/2018/Q4.csv /data/2019/Q1.csv All files store data in the following format: id,f1,f2,I 1,1,2,0 2,1,1,1 3,2,1,0 4,2,2,1 You run the following code: You need to create a dataset named training_data an d load the data from all files into a single data f rame by using the following code: Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Use two file paths. Use Dataset.Tabular_from_delimeted, instead of Data set.File.from_files as the data isn't cleansed. Note: A FileDataset references single or multiple files i n your datastores or public URLs. If your data is a lready cleansed, and ready to use in training experiments, you can download or mount the files to your comput e as a FileDataset object. A TabularDataset represents data in a tabular forma t by parsing the provided file or list of files. Th is provides you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with fam iliar data preparation and training libraries without having t o leave your notebook. You can create a TabularData set object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-register-datasets" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create an Azure Machine Learning service datast ore in a workspace. The datastore contains the foll owing files: /data/2018/Q1.csv /data/2018/Q2.csv /data/2018/Q3.csv /data/2018/Q4.csv /data/2019/Q1.csv All files store data in the following format: id,f1,f2,I 1,1,2,0 2,1,1,1 3,2,1,0 4,2,2,1 You run the following code: You need to create a dataset named training_data an d load the data from all files into a single data f rame by using the following code: D283ABFBEDB32CDCE3B3406B9C29DB2F Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Use two file paths. Use Dataset.Tabular_from_delimeted as the data isn' t cleansed. Note: A TabularDataset represents data in a tabular forma t by parsing the provided file or list of files. Th is provides you with the ability to materialize the data into a pandas or Spark DataFrame so you can work with fam iliar data preparation and training libraries without having t o leave your notebook. You can create a TabularData set object from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-register-datasets" }, { "question": "You plan to use the Hyperdrive feature of Azure Mac hine Learning to determine the optimal hyperparamet er values when training a model. You must use Hyperdrive to try combinations of the following hyperparameter values: learning_rate: any value between 0.001 and 0.1 batch_size: 16, 32, or 64 You need to configure the search space for the Hype rdrive experiment. Which two parameter expressions should you use? 
Eac h correct answer presents part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. a choice expression for learning_rate", "B. a uniform expression for learning_rate", "C. a normal expression for batch_size", "D. a choice expression for batch_size" ], "correct": "", "explanation": "B: Continuous hyperparameters are specified as a di stribution over a continuous range of values. Suppo rted distributions include: uniform(low, high) - Returns a value uniformly dist ributed between low and high D: Discrete hyperparameters are specified as a choi ce among discrete values. choice can be: one or more comma-separated values a range object any arbitrary list object", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-tune-hyperparameters" }, { "question": "HOTSPOT Your Azure Machine Learning workspace has a dataset named real_estate_data. A sample of the data in th e dataset follows. You want to use automated machine learning to find the best regression model for predicting the price column. You need to configure an automated machine learning experiment using the Azure Machine Learning SDK. How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: training_data The training data to be used within the experiment. It should contain both training features and a lab el column (optionally a sample weights column). If training_d ata is specified, then the label_column_name parame ter must also be specified. Box 2: validation_data Provide validation data: In this case, you can eith er start with a single data file and split it into training and validation sets or you can provide a separate data file for the validation set. Either way, the valida tion_data parameter in your AutoMLConfig object assigns which data to use as your validation set. Example, the following code example explicitly defi nes which portion of the provided data in dataset t o use for training and validation. dataset = Dataset.Tabular.from_delimited_files(data ) D283ABFBEDB32CDCE3B3406B9C29DB2F training_data, validation_data = dataset.random_spl it(percentage=0.8, seed=1) automl_config = AutoMLConfig(compute_target = aml_r emote_compute, task = 'classification', primary_metric = 'AUC_weighted', training_data = training_data, validation_data = validation_data, label_column_name = 'Class' ) Box 3: label_column_name label_column_name: The name of the label column. If the input data is from a pandas.DataFrame which doesn't have column names, column indices can be used instead, expresse d as integers. This parameter is applicable to training_data and v alidation_data parameters. Incorrect Answers: X: The training features to use when fitting pipeli nes during an experiment. This setting is being dep recated. Please use training_data and label_column_name inst ead. Y: The training labels to use when fitting pipeline s during an experiment. This is the value your mode l will predict. This setting is being deprecated. Please u se training_data and label_column_name instead. X_valid: Validation features to use when fitting pi pelines during an experiment. If specified, then y_ valid or sample_weight_valid must also be specified. Y_valid: Validation labels to use when fitting pipe lines during an experiment. Both X_valid and y_valid must be specified together . 
exclude_nan_labels: Whether to exclude rows with Na N values in the label. The default is True. y_max: y_max (float) Maximum value of y for a regression experiment. The combination of y_min and y_max are used to normali ze test set metrics based on the input data range. If not specified, the maximum value is inferred from t he data.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-automl-client/ azureml.train.automl.automlconfig.automlconfig?view =azure-ml-py" }, { "question": "HOTSPOT You have a multi-class image classification deep le arning model that uses a set of labeled photographs . You create the following code to select hyperparameter values when training the model. For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Explanation/Reference: Box 1: Yes Hyperparameters are adjustable parameters you choos e to train a model that govern the training process itself. Azure Machine Learning allows you to automate hyper parameter exploration in an efficient manner, savin g you significant time and resources. You specify the ran ge of hyperparameter values and a maximum number of training runs. The system then automatically launch es multiple simultaneous runs with different parame ter configurations and finds the configuration that res ults in the best performance, measured by the metri c you choose. Poorly performing training runs are automat ically early terminated, reducing wastage of comput e resources. These resources are instead used to expl ore other hyperparameter configurations. Box 2: Yes uniform(low, high) - Returns a value uniformly dist ributed between low and high Box 3: No Bayesian sampling does not currently support any ea rly termination policy.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-tune-hyperparameters" }, { "question": "You run an automated machine learning experiment in an Azure Machine Learning workspace. Information about the run is listed in the table below: D283ABFBEDB32CDCE3B3406B9C29DB2F You need to write a script that uses the Azure Mach ine Learning SDK to retrieve the best iteration of the experiment run. Which Python code segment should you use?", "options": [ "A.", "B.", "C.", "D." ], "correct": "D.", "explanation": "The get_output method on automl_classifier returns the best run and the fitted model for the last invo cation. Overloads on get_output allow you to retrieve the b est run and fitted model for any logged metric or f or a particular iteration. In [ ]: best_run, fitted_model = local_run.get_output()", "references": "https://notebooks.azure.com/azureml/projects/azurem l-getting-started/html/how-to-use-azureml/automated - machine-learning/classification-with-deployment/aut o-ml-classification-with-deployment.ipynb" }, { "question": "You have a comma-separated values (CSV) file contai ning data from which you want to train a classifica tion model. You are using the Automated Machine Learning interf ace in Azure Machine Learning studio to train the classification model. You set the task type to Clas sification. D283ABFBEDB32CDCE3B3406B9C29DB2F You need to ensure that the Automated Machine Learn ing process evaluates only linear models. What should you do?", "options": [ "A. Add all algorithms other than linear ones to the blocked algorithms list.", "B. 
Set the Exit criterion option to a metric score threshold.", "C. Clear the option to perform automatic featurization.", "D. Clear the option to enable deep learning." ], "correct": "C. Clear the option to perform automatic featurization.", "explanation": "Automatic featurization can fit non-linear models.", "references": "https://econml.azurewebsites.net/spec/estimation/dml.html https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-models" }, { "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You plan to use a Python script to run an Azure Machine Learning experiment. The script creates a reference to the experiment run context, loads data from a file, identifies the set of unique values for the label column, and completes the experiment run: The experiment must record the unique labels in the data as metrics for the run that can be reviewed later. You must add code to the script to record the unique label values as run metrics at the point indicated by the comment. Solution: Replace the comment with the following code: run.upload_file('outputs/labels.csv', './data.csv') Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "label_vals has the unique labels (from the statement label_vals = data['label'].unique()), and it has to be logged. Note: Instead use the run.log function to log the contents of label_vals: for label_val in label_vals: run.log('Label Values', label_val)", "references": "https://www.element61.be/en/resource/azure-machine-learning-services-complete-toolbox-ai" }, { "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You plan to use a Python script to run an Azure Machine Learning experiment. The script creates a reference to the experiment run context, loads data from a file, identifies the set of unique values for the label column, and completes the experiment run: The experiment must record the unique labels in the data as metrics for the run that can be reviewed later. You must add code to the script to record the unique label values as run metrics at the point indicated by the comment. Solution: Replace the comment with the following code: run.log_table('Label Values', label_vals) Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B.
No", "explanation": "Instead use the run_log function to log the content s in label_vals: for label_val in label_vals: run.log('Label Values', label_val)", "references": "https://www.element61.be/en/resource/azure-machine- learning-services-complete-toolbox-ai" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You plan to use a Python script to run an Azure Mac hine Learning experiment. The script creates a refe rence to the experiment run context, loads data from a file, identifies the set of unique values for the label column, and completes the experiment run: The experiment must record the unique labels in the data as metrics for the run that can be reviewed l ater. You must add code to the script to record the uniqu e label values as run metrics at the point indicate d by the comment. Solution: Replace the comment with the following co de: for label_val in label_vals: run.log('Label Values', label_val) D283ABFBEDB32CDCE3B3406B9C29DB2F Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "The run_log function is used to log the contents in label_vals: for label_val in label_vals: run.log('Label Values', label_val)", "references": "https://www.element61.be/en/resource/azure-machine- learning-services-complete-toolbox-ai" }, { "question": "HOTSPOT You publish a batch inferencing pipeline that will be used by a business application. The application developers need to know which infor mation should be submitted to and returned by the R EST interface for the published pipeline. You need to identify the information required in th e REST request and returned as a response from the published pipeline. Which values should you use in the REST request and to expect in the response? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: JSON containing an OAuth bearer token Specify your authentication header in the request. To run the pipeline from the REST endpoint, you nee d an OAuth2 Bearer-type authentication header. Box 2: JSON containing the experiment name Add a JSON payload object that has the experiment n ame. Example: rest_endpoint = published_pipeline.endpoint response = requests.post(rest_endpoint, headers=auth_header, json={\"ExperimentName\": \"batch_scoring\", \"ParameterAssignments\": {\"process_count_per_node\": 6}}) run_id = response.json()[\"Id\"] Box 3: JSON containing the run ID Make the request to trigger the run. Include code t o access the Id key from the response dictionary to get the value of the run ID. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/tutorial-pipeline-batch-scoring-classification" }, { "question": "HOTSPOT You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10 ,000 rows. The first 9,000 rows represent class 0 (90 pe rcent). The remaining 1,000 rows represent class 1 (10 perc ent). 
The training set is imbalances between two classes. You must increase the number of training examples for class 1 to 4,000 by using 5 data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment. You need to configure the module. Which values should you use? To answer, select the appropriate options in the dialog box in the answer area. NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D.", "A. k=0.5", "B. k=0.01", "C. k=5", "D. k=1" ], "correct": "C. k=5", "explanation": "Leave One Out (LOO) cross-validation Setting K = n (the number of observations) yields n -fold and is called leave-one out cross-validation (LOO), a special case of the K-fold approach. LOO CV is sometimes useful but typically doesn't sh ake up the data enough. The estimates from each fol d are highly correlated and hence their average can have high variance. This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance tr adeoff.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/smote D283ABFBEDB32CDCE3B3406B9C29DB2F QUESTION 117 You are solving a classification task. You must evaluate your model on a limited data samp le by using k-fold cross-validation. You start by configuring a k parameter as the number of splits. You need to configure the k parameter for the cross -validation. Which value should you use?" }, { "question": "HOTSPOT You are running Python code interactively in a Cond a environment. The environment includes all require d Azure Machine Learning SDK and MLflow packages. You must use MLflow to log metrics in an Azure Mach ine Learning experiment named mlflow-experiment. How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: mlflow.set_tracking_uri(ws.get_mlflow_tracki ng_uri()) In the following code, the get_mlflow_tra cking_uri () method assigns a unique tracking URI address to the workspace, ws, and set_tracking_uri() points th e MLflow tracking URI to that address. mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri( )) Box 2: mlflow.set_experiment(experiment_name) Set the MLflow experiment name with set_experiment( ) and start your training run with start_run(). Box 3: mlflow.start_run() Box 4: mlflow.log_metric Then use log_metric() to activate the MLflow loggin g API and begin logging your training run metrics. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-mlflow" }, { "question": "DRAG DROP You are creating a machine learning model that can predict the species of a penguin from its measureme nts. You have a file that contains measurements for thre e species of penguin in comma-delimited format. The model must be optimized for area under the rece ived operating characteristic curve performance met ric, averaged for each class. You need to use the Automated Machine Learning user interface in Azure Machine Learning studio to run an experiment and find the best performing model. Which five actions should you perform in sequence? To answer, move the appropriate actions from the li st of actions to the answer area and arrange them in the correct order. Select and Place: A.", "options": [ "B.", "C.", "D." 
], "correct": "", "explanation": "Step 1:Create and select a new dataset by uploading he command-delimited file of penguin data. Step 2: Select the Classification task type Step 3: Set the Primary metric configuration settin g to Accuracy. The available metrics you can select is determined by the task type you choose. Primary metrics for classification scenarios: Post thresholded metrics, like accuracy, average_pr ecision_score_weighted, norm_macro_recall, and precision_score_weighted may not optimize as well f or datasets which are very small, have very large c lass skew (class imbalance), or when the expected metric value is very close to 0.0 or 1.0. In those cases, AUC_weighted can be a better choice for the primary metric. Step 4: Configure the automated machine learning ru n by selecting the experiment name, target column, and compute target Step 5: Run the automated machine learning experime nt and review the results.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-configure-auto-train" }, { "question": "HOTSPOT You are tuning a hyperparameter for an algorithm. T he following table shows a data set with different hyperparameter, training error, and validation erro rs. Use the drop-down menus to select the answer choice that answers each question based on the informatio n presented in the graphic. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B. C.", "D." ], "correct": "", "explanation": "Box 1: 4 Choose the one which has lower training and validat ion error and also the closest match. Minimize vari ance (difference between validation error and train erro r). Box 2: 5 Minimize variance (difference between validation er ror and train error).", "references": "https://medium.com/comet-ml/organizing-machine-lear ning-projects-project-management-guidelines- 2d2b85651bbd" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create a model to forecast weather conditions b ased on historical data. You need to create a pipeline that runs a processin g script to load data from a datastore and pass the processed data to a machine learning model training script. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "The two steps are present: process_step and train_s tep The training data input is not setup correctly. Note: Data used in pipeline can be produced by one step a nd consumed in another step by providing a Pipeline Data object as an output of one step and an input of one or more subsequent steps. PipelineData objects are also used when constructin g Pipelines to describe step dependencies. To speci fy that a step requires the output of another step as input , use a PipelineData object in the constructor of b oth steps. 
For example, the pipeline train step depends on the process_step_output output of the pipeline process step: from azureml.pipeline.core import Pipeline, Pipelin eData from azureml.pipeline.steps import PythonScriptStep datastore = ws.get_default_datastore() process_step_output = PipelineData(\"processed_data\" , datastore=datastore) process_step = PythonScriptS tep (script_name=\"process.py\", arguments=[\"--data_for_train\", process_step_output] , outputs=[process_step_output], compute_target=aml_compute, source_directory=process_directory) train_step = PythonScriptStep(script_name=\"train.py \", arguments=[\"--data_for_train\", process_step_output] , inputs=[process_step_output], compute_target=aml_compute, source_directory=train_directory) pipeline = Pipeline(workspace=ws, steps=[process_st ep, train_step])", "references": "https://docs.microsoft.com/en-us/python/api/azureml -pipeline-core/azureml.pipeline.core.pipelinedata? view=azure-ml-py" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create a model to forecast weather conditions b ased on historical data. You need to create a pipeline that runs a processin g script to load data from a datastore and pass the processed data to a machine learning model training script. Solution: Run the following code: D283ABFBEDB32CDCE3B3406B9C29DB2F Does the solution meet the goal?", "options": [ "A. Yes", "B. No Correct Answer: B" ], "correct": "", "explanation": "train_step is missing.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -pipeline-core/azureml.pipeline.core.pipelinedata? view=azure-ml-py" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create a model to forecast weather conditions b ased on historical data. You need to create a pipeline that runs a processin g script to load data from a datastore and pass the processed data to a machine learning model training script. Solution: Run the following code: D283ABFBEDB32CDCE3B3406B9C29DB2F Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "", "explanation": "Note: Data used in pipeline can be produced by one step and consumed in another step by providing a PipelineData object as an output of one step and an input of one or more subsequent steps. 
Compare with this example, the pipeline train step depends on the process_step_output output of the pi peline process step: from azureml.pipeline.core import Pipeline, Pipelin eData from azureml.pipeline.steps import PythonScriptStep datastore = ws.get_default_datastore() process_step_output = PipelineData(\"processed_data\" , datastore=datastore) process_step = PythonScriptS tep (script_name=\"process.py\", arguments=[\"--data_for_train\", process_step_output] , outputs=[process_step_output], compute_target=aml_compute, source_directory=process_directory) train_step = PythonScriptStep(script_name=\"train.py \", arguments=[\"--data_for_train\", process_step_output] , inputs=[process_step_output], compute_target=aml_compute, source_directory=train_directory) pipeline = Pipeline(workspace=ws, steps=[process_st ep, train_step])", "references": "https://docs.microsoft.com/en-us/python/api/azureml -pipeline-core/azureml.pipeline.core.pipelinedata? view=azure-ml-py" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have D283ABFBEDB32CDCE3B3406B9C29DB2F more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You have a Python script named train.py in a local folder named scripts. The script trains a regressio n model by using scikit-learn. The script includes code to loa d a training data file which is also located in the scripts folder. You must run the script as an Azure ML experiment o n a compute cluster named aml-compute. You need to configure the run to ensure that the en vironment includes the required packages for model training. You have instantiated a variable named am l-compute that references the target compute cluste r. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a compute target. It is implemented through the SKLearn class, which can be used to support single-node CPU training. Example: from azureml.train.sklearn import SKLearn } estimator = SKLearn(source_directory=project_folder , compute_target=compute_target, entry_script='train_iris.py' )", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-train-scikit-learn" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You have a Python script named train.py in a local folder named scripts. The script trains a regressio n model by using scikit-learn. The script includes code to loa d a training data file which is also located in the scripts folder. D283ABFBEDB32CDCE3B3406B9C29DB2F You must run the script as an Azure ML experiment o n a compute cluster named aml-compute. You need to configure the run to ensure that the en vironment includes the required packages for model training. 
You have instantiated a variable named aml-compute that references the target compute cluster. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a compute target. It is implemented through the SKLearn class, which can be used to support single-node CPU training. Example: from azureml.train.sklearn import SKLearn estimator = SKLearn(source_directory=project_folder, compute_target=compute_target, entry_script='train_iris.py')", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn" }, { "question": "Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You have a Python script named train.py in a local folder named scripts. The script trains a regression model by using scikit-learn. The script includes code to load a training data file which is also located in the scripts folder. You must run the script as an Azure ML experiment on a compute cluster named aml-compute. You need to configure the run to ensure that the environment includes the required packages for model training. You have instantiated a variable named aml-compute that references the target compute cluster. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "The scikit-learn estimator provides a simple way of launching a scikit-learn training job on a compute target. It is implemented through the SKLearn class, which can be used to support single-node CPU training. Example: from azureml.train.sklearn import SKLearn estimator = SKLearn(source_directory=project_folder, compute_target=compute_target, entry_script='train_iris.py')", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn" }, { "question": "DRAG DROP You create machine learning models by using Azure Machine Learning. You plan to train and score models by using a variety of compute contexts. You also plan to create a new compute resource in Azure Machine Learning studio. You need to select the appropriate compute types. Which compute types should you select? To answer, drag the appropriate compute types to the correct requirements. Each compute type may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Compute cluster Create a single or multi node compute cluster for your training, batch inferencing or reinforcement learning workloads.
Box 2: Inference cluster Box 3: Attached compute The compute types that can currently be attached fo r training include: A remote VM Azure Databricks (for use in machine learning pipel ines) Azure Data Lake Analytics (for use in machine learn ing pipelines) Azure HDInsight Box 4: Compute cluster Note: There are four compute types: Compute instance Compute clusters Inference clusters Attached compute Note 2: Compute clusters Create a single or multi node compute cluster for y our training, batch inferencing or reinforcement D283ABFBEDB32CDCE3B3406B9C29DB2F learning workloads. Attached compute To use compute targets created outside the Azure Ma chine Learning workspace, you must attach them. Attaching a compute target makes it available to yo ur workspace. Use Attached compute to attach a comp ute target for training. Use Inference clusters to atta ch an AKS cluster for inferencing. Inference clusters Create or attach an Azure Kubernetes Service (AKS) cluster for large scale inferencing.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-attach-compute-studio" }, { "question": "DRAG DROP You are building an experiment using the Azure Mach ine Learning designer. You split a dataset into training and testing sets. You select the Two-Class Boosted Decision Tree as the algorithm. You need to determine the Area Under the Curve (AUC ) of the model. Which three modules should you use in sequence? To answer, move the appropriate modules from the list of modules to the answer area and arrange them in the correct order. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Train Model Two-Class Boosted Decision Tree First, set up the boosted decision tree model. 1. Find the Two-Class Boosted Decision Tree module in the module palette and drag it onto the canvas. 2. Find the Train Model module, drag it onto the ca nvas, and then connect the output of the Two-Class Boosted Decision Tree module to the left input port of the Train Model module. The Two-Class Boosted Decision Tree module initializes the generic model, and Train Mod el uses training data to train the model. 3. Connect the left output of the left Execute R Sc ript module to the right input port of the Train Mo del module (in this tutorial you used the data coming from the left side of the Split Data module for training). This portion of the experiment now looks something like this: Step 2: Score Model Score and evaluate the models You use the testing data that was separated out by the Split Data module to score our trained models. You can then compare the results of the two models to see w hich generated better results. Add the Score Model modules 1. Find the Score Model module and drag it onto the canvas. 2. Connect the Train Model module that's connected to the Two-Class Boosted Decision Tree module to th e left input port of the Score Model module. 3. Connect the right Execute R Script module (our t esting data) to the right input port of the Score M odel module. D283ABFBEDB32CDCE3B3406B9C29DB2F Step 3: Evaluate Model To evaluate the two scoring results and compare the m, you use an Evaluate Model module. 1. Find the Evaluate Model module and drag it onto the canvas. 2. Connect the output port of the Score Model modul e associated with the boosted decision tree model t o the left input port of the Evaluate Model module. 3. 
Connect the other Score Model module to the righ t input port.", "references": "" }, { "question": "You create a multi-class image classification deep learning model that uses a set of labeled images. Y ou create a script file named train.py that uses the PyTorch 1.3 framework to train the model. You must run the script by using an estimator. The code must not require any additional Python librari es to be installed in the environment for the estimator. The time required for model training must be minimized . You need to define the estimator that will be used to run the script. Which estimator type should you use?", "options": [ "A. TensorFlow", "B. PyTorch", "C. SKLearn", "D. Estimator" ], "correct": "", "explanation": "For PyTorch, TensorFlow and Chainer tasks, Azure Ma chine Learning provides respective PyTorch, TensorFlow, and Chainer estimators to simplify usin g these frameworks.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-train-ml-models" }, { "question": "You create a pipeline in designer to train a model that predicts automobile prices. Because of non-linear relationships in the data, th e pipeline calculates the natural log (Ln) of the p rices in the training data, trains a model to predict this natur al log of price value, and then calculates the expo nential of the scored label to get the predicted price. The training pipeline is shown in the exhibit. (Cli ck the Training pipeline tab.) Training pipeline D283ABFBEDB32CDCE3B3406B9C29DB2F You create a real-time inference pipeline from the training pipeline, as shown in the exhibit. (Click the Real-time pipeline tab.) Real-time pipeline You need to modify the inference pipeline to ensure that the web service returns the exponential of th e scored label as the predicted automobile price and that cl ient applications are not required to include a pri ce value in the input values. Which three modifications must you make to the infe rence pipeline? Each correct answer presents part o f the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Connect the output of the Apply SQL Transformatio n to the Web Service Output module.", "B. Replace the Web Service Input module with a data input that does not include the price column.", "C. Add a Select Columns module before the Score Mode l module to select all columns other than price.", "D. Replace the training dataset module with a data i nput that does not include the price column.", "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Explanation/Reference:", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-deploy-and-where" }, { "question": "You are creating a classification model for a banki ng company to identify possible instances of credit card D283ABFBEDB32CDCE3B3406B9C29DB2F fraud. You plan to create the model in Azure Machin e Learning by using automated machine learning. The training dataset that you are using is highly u nbalanced. You need to evaluate the classification model. Which primary metric should you use?", "options": [ "A. normalized_mean_absolute_error", "B. AUC_weighted", "C. accuracy", "D. normalized_root_mean_squared_error" ], "correct": "B. AUC_weighted", "explanation": "AUC_weighted is a Classification metric. Note: AUC is the Area under the Receiver Operating Characteristic Curve. Weighted is the arithmetic me an of the score for each class, weighted by the number of true instances in each class. 
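For illustration only, the metric is selected when the automated ML run is configured; the dataset variable, label column, and compute target below are hypothetical and not part of the question:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task='classification',
    primary_metric='AUC_weighted',      # robust primary metric for a highly unbalanced dataset
    training_data=credit_card_dataset,  # hypothetical registered TabularDataset
    label_column_name='is_fraud',       # hypothetical label column
    n_cross_validations=5,
    compute_target=compute_target)      # assumed existing compute target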
Incorrect Answers: A: normalized_mean_absolute_error is a regression m etric, not a classification metric. C: When comparing approaches to imbalanced classifi cation problems, consider using metrics beyond accuracy such as recall, precision, and AUROC. It m ay be that switching the metric you optimize for du ring parameter selection or model selection is enough to provide desirable performance detecting the minori ty class. D: normalized_root_mean_squared_error is a regressi on metric, not a classification metric.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-understand-automated-ml" }, { "question": "You create a machine learning model by using the Az ure Machine Learning designer. You publish the mode l as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no change s to the deployed endpoint configuration. You need to provide application developers with the information they need to consume the endpoint. Which two values should you provide to application developers? Each correct answer presents part of th e solution. NOTE: Each correct selection is worth one point.", "options": [ "A. The name of the AKS cluster where the endpoint is hosted.", "B. The name of the inference pipeline for the endpoi nt.", "C. The URL of the endpoint.", "D. The run ID of the inference pipeline experiment f or the endpoint.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: forcasting Task: The type of task to run. Values can be 'class ification', 'regression', or 'forecasting' dependin g on the type of automated ML problem to solve. Box 2: temperature The training data to be used within the experiment. It should contain both training features and a lab el column (optionally a sample weights column). Box 3: observation_time time_column_name: The name of the time column. This parameter is required when forecasting to specify the datetime column in the input data used for building the time series and inferring its frequency. This setting is being deprecated. Please use forecasting_parameters instead. D283ABFBEDB32CDCE3B3406B9C29DB2F Box 4: 7 \"predicts temperature over the next seven days\" max_horizon: The desired maximum forecast horizon i n units of time-series frequency. The default value is 1. Units are based on the time interval of your traini ng data, e.g., monthly, weekly that the forecaster should predict out. When task type is forecasting, this pa rameter is required. Box 5: 50 \"For the initial round of training, you want to tra in a maximum of 50 different models.\" Iterations: The total number of different algorithm and parameter combinations to test during an autom ated ML experiment.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-automl-client/ azureml.train.automl.automlconfig.automlconfig" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You create a model to forecast weather conditions b ased on historical data. You need to create a pipeline that runs a processin g script to load data from a datastore and pass the processed data to a machine learning model training script. 
Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "The two steps are present: process_step and train_s tep Data_input correctly references the data in the dat a store. Note: Data used in pipeline can be produced by one step a nd consumed in another step by providing a Pipeline Data object as an output of one step and an input of one or more subsequent steps. PipelineData objects are also used when constructin g Pipelines to describe step dependencies. To speci fy that a step requires the output of another step as input , use a PipelineData object in the constructor of b oth steps. For example, the pipeline train step depends on the process_step_output output of the pipeline process step: from azureml.pipeline.core import Pipeline, Pipelin eData from azureml.pipeline.steps import PythonScriptStep datastore = ws.get_default_datastore() process_step_output = PipelineData(\"processed_data\" , datastore=datastore) process_step = PythonScriptS tep (script_name=\"process.py\", arguments=[\"--data_for_train\", process_step_output] , outputs=[process_step_output], compute_target=aml_compute, source_directory=process_directory) train_step = PythonScriptStep(script_name=\"train.py \", arguments=[\"--data_for_train\", process_step_output] , inputs=[process_step_output], compute_target=aml_compute, source_directory=train_directory) pipeline = Pipeline(workspace=ws, steps=[process_st ep, train_step])", "references": "https://docs.microsoft.com/en-us/python/api/azureml -pipeline-core/azureml.pipeline.core.pipelinedata? view=azure-ml-py" }, { "question": "You run an experiment that uses an AutoMLConfig cla ss to define an automated machine learning task wit h a maximum of ten model training iterations. The task will attempt to find the best performing model base d on a metric named accuracy. You submit the experiment with the following code: You need to create Python code that returns the bes t model that is generated by the automated machine learning task. Which code segment should you use?", "options": [ "A. best_model = automl_run.get_details()", "B. best_model = automl_run.get_metrics()", "C. best_model = automl_run.get_file_names()[1]", "D. best_model = automl_run.get_output()[1]" ], "correct": "D. best_model = automl_run.get_output()[1]", "explanation": "The get_output method returns the best run and the fitted model.", "references": "https://notebooks.azure.com/azureml/projects/azurem l-getting-started/html/how-to-use-azureml/automated - machine-learning/classification/auto-ml-classificat ion.ipynb" }, { "question": "You plan to use the Hyperdrive feature of Azure Mac hine Learning to determine the optimal hyperparamet er values when training a model. You must use Hyperdrive to try combinations of the following hyperparameter values. You must not apply an early termination policy. learning_rate: any value between 0.001 and 0.1 batch_size: 16, 32, or 64 You need to configure the sampling method for the H yperdrive experiment. Which two sampling methods can you use? Each correc t answer is a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. No sampling", "B. Grid sampling", "C. Bayesian sampling", "D. Random sampling" ], "correct": "", "explanation": "Explanation/Reference: C: Bayesian sampling is based on the Bayesian optim ization algorithm and makes intelligent choices on the hyperparameter values to sample next. 
It picks the sample based on how the previous samples performed, such that the new sample improves the reported prim ary metric. Bayesian sampling does not support any early termination policy Example: from azureml.train.hyperdrive import BayesianParame terSampling from azureml.train.hyperdrive import uniform, choice param_sampling = BayesianParameterSampling( { \"learning_rate\": uniform(0.05, 0.1), \"batch_size\": choice(16, 32, 64, 128) } ) D: In random sampling, hyperparameter values are ra ndomly selected from the defined search space. Rand om sampling allows the search space to include both di screte and continuous hyperparameters. D283ABFBEDB32CDCE3B3406B9C29DB2F Incorrect Answers: B: Grid sampling can be used if your hyperparameter space can be defined as a choice among discrete va lues and if you have sufficient budget to exhaustively s earch over all values in the defined search space. Additionally, one can use automated early terminati on of poorly performing runs, which reduces wastage of resources. Example, the following space has a total of six sam ples: from azureml.train.hyperdrive import GridParameterS ampling from azureml.train.hyperdrive import choice param_sampling = GridParameterSampling( { \"num_hidden_layers\": choice(1, 2, 3), \"batch_size\": choice(16, 32) } )Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-tune-hyperparameters", "references": "" }, { "question": "You are training machine learning models in Azure M achine Learning. You use Hyperdrive to tune the hyperparameters. In previous model training and tuning runs, many mo dels showed similar performance. You need to select an early termination policy that meets the following requirements: accounts for the performance of all previous runs w hen evaluating the current run avoids comparing the current run with only the best performing run to date Which two early termination policies should you use ? Each correct answer presents part of the solution . NOTE: Each correct selection is worth one point.", "options": [ "A. Median stopping", "B. Bandit", "C. Default", "D. Truncation selection" ], "correct": "", "explanation": "The Median Stopping policy computes running average s across all runs and cancels runs whose best performance is worse than the median of the running averages. If no policy is specified, the hyperparameter tunin g service will let all training runs execute to com pletion. Incorrect Answers: B: BanditPolicy defines an early termination policy based on slack criteria, and a frequency and delay interval for evaluation. The Bandit policy takes the following configuration parameters: slack_factor: The amount of slack allowed with resp ect to the best performing training run. This facto r specifies the slack as a ratio. D: The Truncation selection policy periodically can cels the given percentage of runs that rank the low est for their performance on the primary metric. The policy strives for fairness in ranking the runs by accoun ting for D283ABFBEDB32CDCE3B3406B9C29DB2F improving model performance with training time. Whe n ranking a relatively young run, the policy uses t he corresponding (and earlier) performance of older ru ns for comparison. 
Therefore, runs aren't terminated for having a lower performance because they have run for less time than other runs.", "references": "https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.medianstoppingpolicy https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.truncationselectionpolicy https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.banditpolicy" }, { "question": "HOTSPOT You are hired as a data scientist at a winery. The previous data scientist used Azure Machine Learning. You need to review the models and explain how each model makes decisions. Which explainer modules should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Meta explainers automatically select a suitable direct explainer and generate the best explanation info based on the given model and data sets. The meta explainers leverage all the libraries (SHAP, LIME, Mimic, etc.) that we have integrated or developed. The following are the meta explainers available in the SDK: Tabular Explainer: Used with tabular datasets. Text Explainer: Used with text datasets. Image Explainer: Used with image datasets. Box 1: Tabular Box 2: Text Box 3: Image Incorrect Answers: Hierarchical Attention Network (HAN): HAN was proposed by Yang et al. in 2016. Key features of HAN that differentiate it from existing approaches to document classification are (1) it exploits the hierarchical nature of text data and (2) an attention mechanism is adapted for document classification. Reference: https://medium.com/microsoftazure/automated-and-interpretable-machine-learning-d07975741298", "references": "" }, { "question": "HOTSPOT You have a dataset that includes home sales data for a city. The dataset includes the following columns. Each row in the dataset corresponds to an individual home sales transaction. You need to use automated machine learning to generate the best model for predicting the sales price based on the features of the house. Which values should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Regression Regression is a supervised machine learning technique used to predict numeric values. Box 2: Price", "references": "https://docs.microsoft.com/en-us/learn/modules/create-regression-model-azure-machine-learning-designer" }, { "question": "You use the Azure Machine Learning SDK in a notebook to run an experiment using a script file in an experiment folder. The experiment fails. You need to troubleshoot the failed experiment. What are two possible ways to achieve this goal? Each correct answer presents a complete solution.", "options": [ "A. Use the get_metrics() method of the run object to retrieve the experiment run logs.", "B. Use the get_details_with_logs() method of the run object to display the experiment run logs.", "C. View the log files for the experiment run in the experiment folder.", "D. View the logs for the experiment run in Azure Machine Learning studio." ], "correct": "", "explanation": "Use get_details_with_logs() to fetch the run details and logs created by the run.
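As a rough sketch of the get_details_with_logs() approach (the experiment name is hypothetical, and the exact keys returned can vary by SDK version):
from azureml.core import Experiment, Workspace

ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='failed-experiment')  # hypothetical experiment name

# Take the most recent run of the experiment and fetch its details together with log contents
run = next(experiment.get_runs())
details = run.get_details_with_logs()
print(details['status'])     # run status, for example Failed
print(details['logFiles'])   # log contents keyed by log file name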
You can monitor Azure Machine Learning runs and vie w their logs with the Azure Machine Learning studio . Incorrect Answers: A: You can view the metrics of a trained model usin g run.get_metrics(). E: get_output() gets the output of the step as Pipe lineData.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -pipeline-core/azureml.pipeline.core.steprun https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-monitor-view-training-logs" }, { "question": "DRAG DROP You have an Azure Machine Learning workspace that c ontains a CPU-based compute cluster and an Azure Kubernetes Service (AKS) inference cluster. You cre ate a tabular dataset containing data that you plan to use to create a classification model. You need to use the Azure Machine Learning designer to create a web service through which client appli cations can consume the classification model by submitting new data and getting an immediate prediction as a response. Which three actions should you perform in sequence? To answer, move the appropriate actions from the l ist of actions to the answer area and arrange them in the correct order. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Create and start a Compute Instance To train and deploy models using Azure Machine Lear ning designer, you need compute on which to run the training process, test the model, and host the mode l in a deployed service. There are four kinds of compute resource you can cr eate: Compute Instances: Development workstations that da ta scientists can use to work with data and models. Compute Clusters: Scalable clusters of virtual mach ines for on-demand processing of experiment code. Inference Clusters: Deployment targets for predicti ve services that use your trained models. Attached Compute: Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters. Step 2: Create and run a training pipeline.. After you've used data transformations to prepare t he data, you can use it to train a machine learning model. Create and run a training pipeline Step 3: Create and run a real-time inference pipeli ne After creating and running a pipeline to train the model, you need a second pipeline that performs the same data transformations for new data, and then uses th e trained model to inference (in other words, predi ct) label values based on its features. This pipeline will fo rm the basis for a predictive service that you can publish for applications to use.", "references": "https://docs.microsoft.com/en-us/learn/modules/crea te-classification-model-azure-machine-learning-desi gner/" }, { "question": "You use the Two-Class Neural Network module in Azur e Machine Learning Studio to build a binary classification model. You use the Tune Model Hyperp arameters module to tune accuracy for the model. You need to configure the Tune Model Hyperparameter s module. Which two values should you use? Each correct answe r presents part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Number of hidden nodes", "B. Learning Rate", "C. The type of the normalizer", "D. Number of learning iterations" ], "correct": "", "explanation": "D: For Number of learning iterations, specify the m aximum number of times the algorithm should process the training cases. E: For Hidden layer specification, select the type of network architecture to create. Between the inpu t and output layers you can insert multiple hidden layers . 
Most predictive tasks can be accomplished easily with only one or a few hidden layers.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/two-class-neural-netwo rk" }, { "question": "HOTSPOT You are running a training experiment on remote com pute in Azure Machine Learning. The experiment is configured to use a conda environ ment that includes the mlflow and azureml-contrib-r un packages. You must use MLflow as the logging package for trac king metrics generated in the experiment. You need to complete the script for the experiment. How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: import mlflow Import the mlflow and Workspace classes to access M Lflow's tracking URI and configure your workspace. Box 2: mlflow.start_run() Set the MLflow experiment name with set_experiment( ) and start your training run with start_run(). Box 3: mlflow.log_metric(' ..') Use log_metric() to activate the MLflow logging API and begin logging your training run metrics. Box 4: mlflow.end_run() Close the run: run.endRun()", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-mlflow D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "You create a binary classification model by using A zure Machine Learning Studio. You must tune hyperparameters by performing a param eter sweep of the model. The parameter sweep must meet the following requirements: iterate all possible combinations of hyperparameter s minimize computing resources required to perform th e sweep You need to perform a parameter sweep of the model. Which parameter sweep mode should you use?", "options": [ "A. Random sweep", "B. Sweep clustering", "C. Entire grid", "D. Random grid" ], "correct": "D. Random grid", "explanation": "Maximum number of runs on random grid: This option also controls the number of iterations over a rando m sampling of parameter values, but the values are no t generated randomly from the specified range; inst ead, a matrix is created of all possible combinations of p arameter values and a random sampling is taken over the matrix. This method is more efficient and less pron e to regional oversampling or undersampling. If you are training a model that supports an integr ated parameter sweep, you can also set a range of s eed values to use and iterate over the random seeds as well. This is optional, but can be useful for avoid ing bias introduced by seed selection. Incorrect Answers: B: If you are building a clustering model, use Swee p Clustering to automatically determine the optimum number of clusters and other parameters. C: Entire grid: When you select this option, the mo dule loops over a grid predefined by the system, to try different combinations and identify the best learne r. This option is useful for cases where you don't know what the best parameter settings might be and want to tr y all possible combination of values.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/tune-model- hyperparameters" }, { "question": "You are building a recurrent neural network to perf orm a binary classification. You review the training loss, validation loss, trai ning accuracy, and validation accuracy for each tra ining epoch. You need to analyze model performance. 
You need to identify whether the classification mod el is overfitted. Which of the following is correct?", "options": [ "A. The training loss stays constant and the validati on loss stays on a constant value and close to the training", "B. The training loss decreases while the validation loss increases when training the model.", "C. The training loss stays constant and the validati on loss decreases when training the model.", "D. The training loss increases while the validation loss decreases when training the model." ], "correct": "B. The training loss decreases while the validation loss increases when training the model.", "explanation": "An overfit model is one where performance on the tr ain set is good and continues to improve, whereas performance on the validation set improves to a poi nt and then begins to degrade.", "references": "https://machinelearningmastery.com/diagnose-overfit ting-underfitting-lstm-models/" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You have a Python script named train.py in a local folder named scripts. The script trains a regressio n model by using scikit-learn. The script includes code to loa d a training data file which is also located in the scripts folder. You must run the script as an Azure ML experiment o n a compute cluster named aml-compute. You need to configure the run to ensure that the en vironment includes the required packages for model training. You have instantiated a variable named am l-compute that references the target compute cluste r. Solution: Run the following code: Does the solution meet the goal?", "options": [ "A. Yes B. No" ], "correct": "", "explanation": "There is a missing line: conda_packages=['scikit-le arn'], which is needed. Correct example: sk_est = Estimator(source_directory='./my-sklearn-p roj', script_params=script_params, compute_target=compute_target, D283ABFBEDB32CDCE3B3406B9C29DB2F entry_script='train.py', conda_packages=['scikit-learn']) Note: The Estimator class represents a generic estimator to train data using any supplied framework. This class is designed for use with machine learnin g frameworks that do not already have an Azure Mach ine Learning pre-configured estimator. Pre-configured e stimators exist for Chainer, PyTorch, TensorFlow, a nd SKLearn. Example: from azureml.train.estimator import Estimator script_params = { # to mount files referenced by mnist dataset '--data-folder': ds.as_named_input('mnist').as_moun t(), '--regularization': 0.8 }", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-core/azureml.train.estimator.estimator" }, { "question": "You are performing clustering by using the K-means algorithm. You need to define the possible termination conditi ons. Which three conditions can you use? Each correct an swer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Centroids do not change between iterations.", "B. The residual sum of squares (RSS) rises above a t hreshold.", "C. The residual sum of squares (RSS) falls below a t hreshold.", "D. A fixed number of iterations is executed." 
], "correct": "", "explanation": "AD: The algorithm terminates when the centroids sta bilize or when a specified number of iterations are completed. C: A measure of how well the centroids represent th e members of their clusters is the residual sum of squares or RSS, the squared distance of each vector from it s centroid summed over all vectors. RSS is the obje ctive function and our goal is to minimize it.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/k-means-clustering https://nlp.stanford.edu/IR-book/html/htmledition/k -means-1.html" }, { "question": "HOTSPOT D283ABFBEDB32CDCE3B3406B9C29DB2F You are using C-Support Vector classification to do a multi-class classification with an unbalanced tr aining dataset. The C-Support Vector classification using Python code shown below: You need to evaluate the C-Support Vector classific ation code. Which evaluation statement should you use? To answe r, select the appropriate options in the answer are a. NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Automatically adjust weights inversely propo rtional to class frequencies in the input data D283ABFBEDB32CDCE3B3406B9C29DB2F The \"balanced\" mode uses the values of y to automat ically adjust weights inversely proportional to cla ss frequencies in the input data as n_samples / (n_cla sses * np.bincount(y)). Box 2: Penalty parameter Parameter: C : float, optional (default=1.0) Penalty parameter C of the error term.", "references": "https://scikit-learn.org/stable/modules/generated/s klearn.svm.SVC.html" }, { "question": "You are building a machine learning model for trans lating English language textual content into French language textual content. You need to build and train the machine learning mo del to learn the sequence of the textual content. Which type of neural network should you use?", "options": [ "A. Multilayer Perceptions (MLPs)", "B. Convolutional Neural Networks (CNNs)", "C. Recurrent Neural Networks (RNNs)", "D. Generative Adversarial Networks (GANs)" ], "correct": "C. Recurrent Neural Networks (RNNs)", "explanation": "To translate a corpus of English text to French, we need to build a recurrent neural network (RNN). Note: RNNs are designed to take sequences of text a s inputs or return sequences of text as outputs, or both. They're called recurrent because the network's hidd en layers have a loop in which the output and cell state from each time step become inputs at the next time step. This recurrence serves as a form of memory. It all ows contextual information to flow through the network so that relevant outputs from previous time steps c an be applied to network operations at the current time s tep.", "references": "https://towardsdatascience.com/language-translation -with-rnns-d84d43b40571" }, { "question": "You create a binary classification model. You need to evaluate the model performance. Which two metrics can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. relative absolute error", "B. precision", "C. accuracy", "D. mean absolute error" ], "correct": "", "explanation": "The evaluation metrics available for binary classif ication models are: Accuracy, Precision, Recall, F1 Score, and AUC. Note: A very natural question is: 'Out of the indiv iduals whom the model, how many were classified cor rectly (TP)?' 
This question can be answered by looking at the Pre cision of the model, which is the proportion of pos itives that are classified correctly.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio/evaluate-model-performance" }, { "question": "You create a script that trains a convolutional neu ral network model over multiple epochs and logs the validation loss after each epoch. The script includ es arguments for batch size and learning rate. You identify a set of batch size and learning rate values that you want to try. You need to use Azure Machine Learning to find the combination of batch size and learning rate that re sults in the model with the lowest validation loss. What should you do?", "options": [ "A. Run the script in an experiment based on an AutoM LConfig object", "B. Create a PythonScriptStep object for the script a nd run it in a pipeline", "C. Use the Automated Machine Learning interface in A zure Machine Learning studio", "D. Run the script in an experiment based on a Script RunConfig object" ], "correct": "", "explanation": "Explanation/Reference: Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-tune-hyperparameters", "references": "" }, { "question": "You use the Azure Machine Learning Python SDK to de fine a pipeline to train a model. The data used to train the model is read from a fol der in a datastore. You need to ensure the pipeline runs automatically whenever the data in the folder changes. What should you do?", "options": [ "A. Set the regenerate_outputs property of the pipeli ne to True", "B. Create a ScheduleRecurrance object with a Frequen cy of auto. Use the object to create a Schedule for the", "C. Create a PipelineParameter with a default value t hat references the location where the training data is", "D. Create a Schedule for the pipeline. Specify the d atastore in the datastore property, and the folder containing" ], "correct": "D. Create a Schedule for the pipeline. Specify the d atastore in the datastore property, and the folder containing", "explanation": "Explanation/Reference:", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-trigger-published-pipeline" }, { "question": "You plan to run a Python script as an Azure Machine Learning experiment. The script must read files from a hierarchy of fold ers. The files will be passed to the script as a da taset argument. You must specify an appropriate mode for the datase t argument. Which two modes can you use? Each correct answer pr esents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. to_pandas_dataframe()", "B. as_download()", "C. as_upload()", "D. as_mount()" ], "correct": "B. as_download()", "explanation": "Explanation Explanation/Reference:", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.data.filedataset?view=azure-ml-py" }, { "question": "You create a Python script that runs a training exp eriment in Azure Machine Learning. The script uses the Azure Machine Learning SDK for Python. You must add a statement that retrieves the names o f the logs and outputs generated by the script. You need to reference a Python class object from th e SDK for the statement. Which class object should you use?", "options": [ "A. Run", "B. ScriptRunConfig", "C. Workspace", "D. Experiment" ], "correct": "A. Run", "explanation": "A run represents a single trial of an experiment. 
R uns are used to monitor the asynchronous execution of a trial, log metrics and store output of the trial, a nd to analyze results and access artifacts generate d by the trial. D283ABFBEDB32CDCE3B3406B9C29DB2F The run Class get_all_logs method downloads all log s for the run to a directory. Incorrect Answers: A: A run represents a single trial of an experiment . Runs are used to monitor the asynchronous executi on of a trial, log metrics and store output of the trial, a nd to analyze results and access artifacts generate d by the trial. B: A ScriptRunConfig packages together the configur ation information needed to submit a run in Azure M L, including the script, compute target, environment, and any distributed job-specific configs.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.run(class)" }, { "question": "You run a script as an experiment in Azure Machine Learning. You have a Run object named run that references the experiment run. You must review the log files that were generated during the experiment run. You need to download the log files to a local folde r for review. Which two code segments can you run to achieve this goal? Each correct answer presents a complete solu tion. NOTE: Each correct selection is worth one point.", "options": [ "A. run.get_details() B. run.get_file_names()", "C. run.get_metrics()", "D. run.download_files(output_directory='./runfiles')" ], "correct": "", "explanation": "The run Class get_all_logs method downloads all log s for the run to a directory. The run Class get_details gets the definition, stat us information, current log files, and other detail s of the run. Incorrect Answers: B: The run get_file_names list the files that are s tored in association with the run.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.run(class)" }, { "question": "You have the following code. The code prepares an e xperiment to run a script: D283ABFBEDB32CDCE3B3406B9C29DB2F The experiment must be run on local computer using the default environment. You need to add code to start the experiment and ru n the script. Which code segment should you use?", "options": [ "A. run = script_experiment.start_logging()", "B. run = Run(experiment=script_experiment)", "C. ws.get_run(run_id=experiment.id)", "D. run = script_experiment.submit(config=script_conf ig)" ], "correct": "D. run = script_experiment.submit(config=script_conf ig)", "explanation": "The experiment class submit method submits an exper iment and return the active created run. Syntax: submit(config, tags=None, **kwargs)", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.experiment.experiment" }, { "question": "You use the following code to define the steps for a pipeline: from azureml.core import Workspace, Experiment, Run from azureml.pipeline.core import Pipeline from azureml.pipeline.steps import PythonScriptStep ws = Workspace.from_config() . . . step1 = PythonScriptStep(name=\"step1\", ...) step2 = PythonScriptsStep(name=\"step2\", ...) pipeline_steps = [step1, step2] You need to add code to run the steps. Which two code segments can you use to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. experiment = Experiment(workspace=ws,", "B. run = Run(pipeline_steps)", "C. pipeline = Pipeline(workspace=ws, steps=pipeline_ steps)", "D. 
pipeline = Pipeline(workspace=ws, steps=pipeline_ steps)" ], "correct": "", "explanation": "After you define your steps, you build the pipeline by using some or all of those steps. # Build the pipeline. Example: pipeline1 = Pipeline(workspace=ws, steps=[compare_m odels]) # Submit the pipeline to be run pipeline_run1 = Experiment(ws, 'Compare_Models_Exp' ).submit(pipeline1)", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-machine-learning-pipelines" }, { "question": "HOTSPOT You create an Azure Databricks workspace and a link ed Azure Machine Learning workspace. You have the following Python code segment in the A zure Machine Learning workspace: import mlflow import mlflow.azureml import azureml.mlflow import azureml.core from azureml.core import Workspace subscription_id = 'subscription_id' resourse_group = 'resource_group_name' workspace_name = 'workspace_name' ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group) experimentName = \"/Users/{user_name}/{experiment_fo lder}/{experiment_name}\" mlflow.set_experiment (experimentName) uri = ws.get_mlflow_tracking_uri() mlflow.set_tracking_uri(uri) Instructions: For each of the following statements, select Yes if the statement is true. Otherwise, se lect No. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B. C.", "D." ], "correct": "", "explanation": "Box 1: No The Workspace.get method loads an existing workspac e without using configuration files. ws = Workspace.get(name=\"myworkspace\", subscription_id='', resource_group='myresourcegroup') Box 2: Yes MLflow Tracking with Azure Machine Learning lets yo u store the logged metrics and artifacts from your local runs into your Azure Machine Learning workspace. The get_mlflow_tracking_uri() method assigns a uniq ue tracking URI address to the workspace, ws, and set_tracking_uri() points the MLflow tracking URI t o that address. Box 3: Yes Note: In Deep Learning, epoch means the total datas et is passed forward and backward in a neural netwo rk once.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.workspace.workspace https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-mlflow" }, { "question": "You create and register a model in an Azure Machine Learning workspace. You must use the Azure Machine Learning SDK to impl ement a batch inference pipeline that uses a ParallelRunStep to score input data using the model . You must specify a value for the ParallelRunConfi g compute_target setting of the pipeline step. You need to create the compute target. Which class should you use?", "options": [ "A. BatchCompute", "B. AdlaCompute", "C. AmlCompute", "D. AksCompute" ], "correct": "C. AmlCompute", "explanation": "Compute target to use for ParallelRunStep. This par ameter may be specified as a compute target object or the string name of a compute target in the workspace. The compute_target target is of AmlCompute or strin g. Note: An Azure Machine Learning Compute (AmlCompute ) is a managed-compute infrastructure that allows you to easily create a single or multi-node compute . 
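As an illustrative sketch only (the cluster name, folder, script, and environment object below are hypothetical, and in recent SDK versions ParallelRunConfig is imported from azureml.pipeline.steps rather than the contrib namespace), the compute_target setting is typically wired up roughly like this:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget
from azureml.pipeline.steps import ParallelRunConfig

ws = Workspace.from_config()

# Reference an existing AmlCompute cluster by name (the name is hypothetical)
compute_target = ComputeTarget(workspace=ws, name='batch-score-cluster')

parallel_run_config = ParallelRunConfig(
    source_directory='scripts',       # hypothetical folder containing the scoring script
    entry_script='batch_score.py',    # hypothetical scoring script
    mini_batch_size='5',
    error_threshold=10,
    output_action='append_row',
    environment=batch_env,            # assumed pre-built Environment with the model's dependencies
    compute_target=compute_target,    # AmlCompute object (or its name as a string)
    node_count=2)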
The compute is created within your workspace regi on as a resource that can be shared with other users", "references": "https://docs.microsoft.com/en-us/python/api/azureml -contrib-pipeline-steps/ azureml.contrib.pipeline.steps.parallelrunconfig https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.compute.amlcompute(class)" }, { "question": "DRAG DROP You previously deployed a model that was trained us ing a tabular dataset named training-dataset, which is based on a folder of CSV files. Over time, you have collected the features and pred icted labels generated by the model in a folder con taining a CSV file for each month. You have created two tabul ar datasets based on the folder containing the infe rence data: one named predictions-dataset with a schema t hat matches the training data exactly, including th e predicted label; and another named features-dataset with a schema containing all of the feature column s and a timestamp column based on the filename, which inclu des the day, month, and year. You need to create a data drift monitor to identify any changing trends in the feature data since the model was trained. To accomplish this, you must define the re quired datasets for the data drift monitor. Which datasets should you use to configure the data drift monitor? To answer, drag the appropriate dat asets to the correct data drift monitor options. Each source may be used once, more than once, or not at all. Y ou may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: training-dataset Baseline dataset - usually the training dataset for a model. Box 2: predictions-dataset Target dataset - usually model input data - is comp ared over time to your baseline dataset. This compa rison means that your target dataset must have a timestam p column specified. The monitor will compare the baseline and target da tasets.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-monitor-datasets" }, { "question": "You plan to run a Python script as an Azure Machine Learning experiment. The script contains the following code: You must specify a file dataset as an input to the script. The dataset consists of multiple large imag e files and must be streamed directly from its source. You need to write code to define a ScriptRunConfig object for the experiment and pass the ds dataset a s an argument. Which code segment should you use?", "options": [ "A. arguments = ['--input-data', ds.to_pandas_datafra me()]", "B. arguments = ['--input-data', ds.as_mount()]", "C. arguments = ['--data-data', ds]", "D. arguments = ['--input-data', ds.as_download()]" ], "correct": "A. arguments = ['--input-data', ds.to_pandas_datafra me()]", "explanation": "If you have structured data not yet registered as a dataset, create a TabularDataset and use it direct ly in your training script for your local or remote experiment . To load the TabularDataset to pandas DataFrame df = dataset.to_pandas_dataframe() Note: TabularDataset represents data in a tabular f ormat created by parsing the provided file or list of files.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-train-with-datasets D283ABFBEDB32CDCE3B3406B9C29DB2F Run experiments and train models Testlet 2 Case study Overview You are a data scientist in a company that provides data science for professional sporting events. 
Models will use global and local market data to meet the following business goals:
Understand sentiment of mobile device users at sporting events based on audio from crowd reactions.
Assess a user's tendency to respond to an advertisement.
Customize styles of ads served on mobile devices.
Use video to detect penalty events.
Current environment
Media used for penalty event detection will be provided by consumer devices. Media may include images and videos captured during the sporting event and shared using social media. The images and videos will have varying sizes and formats.
The data available for model building comprises seven years of sporting event media. The sporting event media includes recorded video transcripts or radio commentary, and logs from related social media feeds captured during the sporting events.
Crowd sentiment will include audio recordings submitted by event attendees in both mono and stereo formats.
Penalty detection and sentiment
Data scientists must build an intelligent solution by using multiple machine learning models for penalty event detection.
Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines.
Notebooks must be deployed to retrain by using Spark instances with dynamic worker allocation.
Notebooks must execute with the same code on new Spark instances to recode only the source of the data.
Global penalty detection models must be trained by using dynamic runtime graph computation during training.
Local penalty detection models must be written by using BrainScript.
Experiments for local crowd sentiment models must combine local penalty detection data.
Crowd sentiment models must identify known sounds such as cheers and known catch phrases. Individual crowd sentiment models will detect similar sounds.
All shared features for local models are continuous variables.
Shared features must use double precision. Subsequent layers must have aggregate running mean and standard deviation metrics available.
Advertisements
During the initial weeks in production, the following was observed:
Ad response rates declined.
Drops were not consistent across ad styles.
The distribution of features across training and production data is not consistent.
Analysis shows that, of the 100 numeric features on user location and behavior, the 47 features that come from location sources are being used as raw features. A suggested experiment to remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.
Initial data discovery shows a wide range of densities of target states in the training data used for crowd sentiment models.
All penalty detection models show inference phases using Stochastic Gradient Descent (SGD) that are running too slow.
Audio samples show that the length of a catch phrase varies between 25%-47% depending on region.
The performance of the global penalty detection models shows lower variance but higher bias when comparing training and validation sets. Before implementing any feature changes, you must confirm the bias and variance using all training and validation cases.
Ad response models must be trained at the beginning of each event and applied during the sporting event.
Market segmentation models must optimize for similar ad response history.
Sampling must guarantee mutual and collective exclusivity between local and global segmentation models that share the same features.
Individual crowd sentiment models will detect similar sounds. Note: Evaluate the changed in correlation between m odel error rate and centroid distance In machine le arning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns t o observations the label of the class of training samples whose me an (centroid) is closest to the observation.", "references": "https://en.wikipedia.org/wiki/Nearest_centroid_clas sifier https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/sweep-clustering" }, { "question": "You need to implement a feature engineering strateg y for the crowd sentiment local models. What should you do?", "options": [ "A. Apply an analysis of variance (ANOVA).", "B. Apply a Pearson correlation coefficient.", "C. Apply a Spearman correlation coefficient.", "D. Apply a linear discriminant analysis." ], "correct": "D. Apply a linear discriminant analysis.", "explanation": "The linear discriminant analysis method works only on continuous variables, not categorical or ordinal variables. D283ABFBEDB32CDCE3B3406B9C29DB2F Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables. Scenario: Data scientists must build notebooks in a local env ironment using automatic feature engineering and mo del building in machine learning pipelines. Experiments for local crowd sentiment models must c ombine local penalty detection data. All shared fea tures for local models are continuous variables. Incorrect Answers: B: The Pearson correlation coefficient, sometimes c alled Pearson's R test, is a statistical value that measures the linear relationship between two variables. By e xamining the coefficient values, you can infer some thing about the strength of the relationship between the two variables, and whether they are positively corr elated or negatively correlated. C: Spearman's correlation coefficient is designed f or use with non-parametric and non-normally distrib uted data. Spearman's coefficient is a nonparametric mea sure of statistical dependence between two variable s, and is sometimes denoted by the Greek letter rho. The S pearman's coefficient expresses the degree to which two variables are monotonically related. It is also cal led Spearman rank correlation, because it can be us ed with ordinal variables.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/fisher-linear-discrimi nant- analysis https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/compute-linear-correla tion" }, { "question": "DRAG DROP You need to define a modeling strategy for ad respo nse. Which three actions should you perform in sequence? To answer, move the appropriate actions from the l ist of actions to the answer area and arrange them in the correct order. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Implement a K-Means Clustering model Step 2: Use the cluster as a feature in a Decision jungle model. Decision jungles are non-parametric m odels, which can represent non-linear decision boundaries. Step 3: Use the raw score as a feature in a Score M atchbox Recommender model The goal of creating a recommendation system is to recommend one or more \" items\" to \"users\" of the system. Examples of an ite m could be a movie, restaurant, book, or song. A user could be a person, group of persons, or other enti ty with item preferences. 
Scenario: Ad response rated declined. Ad response models must be trained at the beginning of each event and applied during the sporting even t. Market segmentation models must optimize for simila r ad response history. Ad response models must supp ort non-linear boundaries of features.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/multiclass-decision-ju ngle https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/score-matchbox- recommender" }, { "question": "DRAG DROP You need to define an evaluation strategy for the c rowd sentiment models. Which three actions should you perform in sequence? To answer, move the appropriate actions from the l ist of actions to the answer area and arrange them in the correct order. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Define a cross-entropy function activation When using a neural network to perform classificati on and prediction, it is usually better to use cros s-entropy error than classification error, and somewhat bette r to use cross-entropy error than mean squared erro r to evaluate the quality of the neural network. Step 2: Add cost functions for each target state. Step 3: Evaluated the distance error metric.", "references": "https://www.analyticsvidhya.com/blog/2018/04/fundam entals-deep-learning-regularization-techniques/" }, { "question": "You need to implement a model development strategy to determine a user's tendency to respond to an ad. Which technique should you use?", "options": [ "A. Use a Relative Expression Split module to partiti on the data based on centroid distance.", "B. Use a Relative Expression Split module to partiti on the data based on distance travelled to the even t.", "C. Use a Split Rows module to partition the data bas ed on distance travelled to the event.", "D. Use a Split Rows module to partition the data bas ed on centroid distance." ], "correct": "A. Use a Relative Expression Split module to partiti on the data based on centroid distance.", "explanation": "Split Data partitions the rows of a dataset into tw o distinct sets. The Relative Expression Split opti on in the Split Data module of Azure Machine Learning Studio is hel pful when you need to divide a dataset into trainin g and testing datasets using a numerical expression. Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number could be a date/time field, a column contain ing age or dollar amounts, or even a percentage. Fo r example, you might want to divide your data set dep ending on the cost of the items, group people by ag e ranges, or separate data by a calendar date. Scenario: Local market segmentation models will be applied be fore determining a user's propensity to respond to an advertisement. The distribution of features across training and pr oduction data are not consistent", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/split-data" }, { "question": "You need to implement a new cost factor scenario fo r the ad response models as illustrated in the perf ormance curve exhibit. Which technique should you use? A. Set the threshold to 0.5 and retrain if weighted Kappa deviates +/- 5% from 0.45.", "options": [ "B. Set the threshold to 0.05 and retrain if weighted Kappa deviates +/- 5% from 0.5.", "C. Set the threshold to 0.2 and retrain if weighted Kappa deviates +/- 5% from 0.6.", "D. 
Set the threshold to 0.75 and retrain if weighted Kappa deviates +/- 5% from 0.15." ], "correct": "", "explanation": "Scenario: Performance curves of current and proposed cost fac tor scenarios are shown in the following diagram: D283ABFBEDB32CDCE3B3406B9C29DB2F The ad propensity model uses a cut threshold is 0.4 5 and retrains occur if weighted Kappa deviated fro m 0.1 +/- 5%. D283ABFBEDB32CDCE3B3406B9C29DB2F Run experiments and train models Testlet 3 Case study This is a case study. Case studies are not timed se parately. You can use as much exam time as you woul d like to complete each case. However, there may be additi onal case studies and sections on this exam. You mu st manage your time to ensure that you are able to com plete all questions included on this exam in the ti me provided. To answer the questions included in a case study, y ou will need to reference information that is provi ded in the case study. Case studies might contain exhibits and other resources that provide more information abou t the scenario that is described in the case study. Each question is independent of the other questions in t his case study. At the end of this case study, a review screen will appear. This screen allows you to review your answ ers and to make changes before you move to the next section of the exam. After you begin a new section, you canno t return to this section. To start the case study To display the first question in this case study, c lick the Next button. Use the buttons in the left p ane to explore the content of the case study before you answer the questions. Clicking these buttons displays informa tion such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displaye d is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, clic k the Question button to return to the question. Overview You are a data scientist for Fabrikam Residences, a company specializing in quality private and commer cial property in the United States. Fabrikam Residences is considering expanding into Europe and has asked you to investigate prices for private residences in major European cities. You use Azure Machine Learning Stu dio to measure the median value of properties. You produce a regression model to predict property prices by u sing the Linear Regression and Bayesian Linear Regressio n modules. Datasets There are two datasets in CSV format that contain p roperty details for two cities, London and Paris. Y ou add both files to Azure Machine Learning Studio as sepa rate datasets to the starting point for an experime nt. Both datasets contain the following columns: D283ABFBEDB32CDCE3B3406B9C29DB2F An initial investigation shows that the datasets ar e identical in structure apart from the MedianValue column. The smaller Paris dataset contains the MedianValue in text format, whereas the larger London dataset c ontains the MedianValue in numerical format. Data issues Missing values The AccessibilityToHighway column in both datasets contains missing values. The missing data must be replaced with new data so that it is modeled condit ionally using the other variables in the data befor e filling in the missing values. Columns in each dataset contain missing and null va lues. The datasets also contain many outliers. The Age column has a high proportion of outliers. You need to remove the rows that have outliers in the Age co lumn. 
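(For orientation only: the case study later handles these outliers with Studio modules, but the row-removal requirement above can also be sketched in plain Python. The file name, the 1.5 × IQR cutoff, and the use of pandas are illustrative assumptions and are not part of the scenario.)

```python
import pandas as pd

# Illustrative file name; in the case study the data arrives as two CSV datasets.
df = pd.read_csv("london_properties.csv")

# Flag Age outliers with the interquartile-range rule (1.5 * IQR is an assumed cutoff).
q1, q3 = df["Age"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the rows whose Age falls inside the acceptable range.
df_clean = df[df["Age"].between(lower, upper)]
print(f"Removed {len(df) - len(df_clean)} outlier rows")
```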
The MedianValue and AvgRoomsInHouse columns both ho ld data in numeric format. You need to select a feature selection algorithm to analyze the relation ship between the two columns in more detail. Model fit The model shows signs of overfitting. You need to p roduce a more refined regression model that reduces the overfitting. Experiment requirements You must set up the experiment to cross-validate th e Linear Regression and Bayesian Linear Regression modules to evaluate performance. In each case, the predictor of the dataset is the column named MedianValue. You must ensure that the datatype of t he MedianValue column of the Paris dataset matches the structure of the London dataset. You must prioritize the columns of data for predict ing the outcome. You must use non-parametric statis tics to measure relationships. You must use a feature selection algorithm to analy ze the relationship between the MedianValue and AvgRoomsInHouse columns. Model training Permutation Feature Importance D283ABFBEDB32CDCE3B3406B9C29DB2F Given a trained model and a test dataset, you must compute the Permutation Feature Importance scores o f feature variables. You must be determined the absol ute fit for the model. Hyperparameters You must configure hyperparameters in the model lea rning process to speed the learning phase. In addit ion, this configuration should cancel the lowest perform ing runs at each evaluation interval, thereby direc ting effort and resources towards models that are more likely t o be successful. You are concerned that the model might not efficien tly use compute resources in hyperparameter tuning. You also are concerned that the model might prevent an increase in the overall tuning time. Therefore, mus t implement an early stopping criterion on models tha t provides savings without terminating promising jo bs. Testing You must produce multiple partitions of a dataset b ased on sampling using the Partition and Sample mod ule in Azure Machine Learning Studio. Cross-validation You must create three equal partitions for cross-va lidation. You must also configure the cross-validat ion process so that the rows in the test and training d atasets are divided evenly by properties that are n ear each city's main river. You must complete this task befo re the data goes through the sampling process. Linear regression module When you train a Linear Regression module, you must determine the best features to use in a model. You can choose standard metrics provided to measure perform ance before and after the feature importance proces s completes. The distribution of features across mult iple training models must be consistent. Data visualization You need to provide the test results to the Fabrika m Residences team. You create data visualizations t o aid in presenting the results. You must produce a Receiver Operating Characteristi c (ROC) curve to conduct a diagnostic test evaluati on of the model. You need to select appropriate methods f or producing the ROC curve in Azure Machine Learnin g Studio to compare the Two-Class Decision Forest and the Two-Class Decision Jungle modules with one another.", "references": "" }, { "question": "HOTSPOT You need to replace the missing data in the Accessi bilityToHighway columns. How should you configure the Clean Missing Data mod ule? To answer, select the appropriate options in t he answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." 
], "correct": "", "explanation": "Box 1: Replace using MICE Replace using MICE: For each missing value, this op tion assigns a new value, which is calculated by us ing a method described in the statistical literature as \" Multivariate Imputation using Chained Equations\" or \"Multiple Imputation by Chained Equations\". With a multiple i mputation method, each variable with missing data i s modeled conditionally using the other variables in the data before filling in the missing values. Scenario: The AccessibilityToHighway column in both datasets contains missing values. The missing data must be replaced with new data so that it is modeled con ditionally using the other variables in the data be fore filling in the missing values. Box 2: Propagate Cols with all missing values indicate if columns of all missing values should be preserved in the outp ut.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/clean-missing-data" }, { "question": "DRAG DROP You need to produce a visualization for the diagnos tic test evaluation according to the data visualiza tion requirements. Which three modules should you recommend be used in sequence? To answer, move the appropriate modules from the list of modules to the answer area and arr ange them in the correct order. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Sweep Clustering Start by using the \"Tune Model Hyperparameters\" mod ule to select the best sets of parameters for each of the models we're considering. One of the interesting things about the \"Tune Model Hyperparameters\" module is that it not only output s the results from the Tuning, it also outputs the Traine d Model. Step 2: Train Model Step 3: Evaluate Model Scenario: You need to provide the test results to t he Fabrikam Residences team. You create data visual izations to aid in presenting the results. You must produce a Receiver Operating Characteristi c (ROC) curve to conduct a diagnostic test evaluati on of the model. You need to select appropriate methods f or producing the ROC curve in Azure Machine Learnin g Studio to compare the Two-Class Decision Forest and the Two-Class Decision Jungle modules with one another.", "references": "http://breaking-bi.blogspot.com/2017/01/azure-machi ne-learning-model-evaluation.html" }, { "question": "You need to visually identify whether outliers exis t in the Age column and quantify the outliers befor e the outliers are removed. Which three Azure Machine Learning Studio modules s hould you use? Each correct answer presents part of the solution. D283ABFBEDB32CDCE3B3406B9C29DB2F NOTE: Each correct selection is worth one point.", "options": [ "A. Create Scatterplot", "B. Summarize Data", "C. Clip Values", "D. Replace Discrete Values" ], "correct": "", "explanation": "B: To have a global view, the summarize data module can be used. Add the module and connect it to the data set that needs to be visualized. A: One way to quickly identify Outliers visually is to create scatter plots. C: The easiest way to treat the outliers in Azure M L is to use the Clip Values module. It can identify and optionally replace data values that are above or be low a specified threshold. You can use the Clip Values module in Azure Machine Learning Studio, to identify and optionally replac e data values that are above or below a specified threshol d. 
This is useful when you want to remove outliers or replace them with a mean, a constant, or other substitute v alue.", "references": "https://blogs.msdn.microsoft.com/azuredev/2017/05/2 7/data-cleansing-tools-in-azure-machine-learning/ https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/clip-values" }, { "question": "HOTSPOT You need to identify the methods for dividing the d ata according to the testing requirements. Which properties should you select? To answer, sele ct the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Scenario: Testing You must produce multiple partitions of a dataset b ased on sampling using the Partition and Sample mod ule in Azure Machine Learning Studio. Box 1: Assign to folds Use Assign to folds option when you want to divide the dataset into subsets of the data. This option i s also useful when you want to create a custom number of f olds for cross-validation, or to split rows into se veral groups. Not Head: Use Head mode to get only the first n row s. This option is useful if you want to test a pipe line on a small number of rows, and don't need the data to be balanced or sampled in any way. Not Sampling: The Sampling option supports simple r andom sampling or stratified random sampling. This is useful if you want to create a smaller representati ve sample dataset for testing. Box 2: Partition evenly Specify the partitioner method: Indicate how you wa nt data to be apportioned to each partition, using these options: Partition evenly: Use this option to place an equal number of rows in each partition. To specify the n umber of output partitions, type a whole number in the Sp ecify number of folds to split evenly into text box .", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/algorithm-module-reference/partition-and-sampl e" }, { "question": "HOTSPOT You need to configure the Edit Metadata module so t hat the structure of the datasets match. Which configuration options should you select? To a nswer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Floating point Need floating point for Median values. Scenario: An initial investigation shows that the d atasets are identical in structure apart from the M edianValue column. The smaller Paris dataset contains the Medi anValue in text format, whereas the larger London d ataset contains the MedianValue in numerical format. Box 2: Unchanged D283ABFBEDB32CDCE3B3406B9C29DB2F Note: Select the Categorical option to specify that the values in the selected columns should be treat ed as categories. For example, you might have a column that contains the numbers 0,1 and 2, but know that the numbers actually mean \"Smoker\", \"Non smoker\" and \"Unknown\". In that case, by flagging the column as categorica l you can ensure that the values are not used in numeric calculations, only to group data.", "references": "" }, { "question": "HOTSPOT You need to configure the Permutation Feature Impor tance module for the model training requirements. What should you do? To answer, select the appropria te options in the dialog box in the answer area. NOTE: Each correct selection is worth one point. 
Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: 500 For Random seed, type a value to use as seed for ra ndomization. If you specify 0 (the default), a numb er is generated based on the system clock. A seed value is optional, but you should provide a value if you want reproducibility across runs of th e same experiment. Here we must replicate the findings. Box 2: Mean Absolute Error Scenario: Given a trained model and a test dataset, you must compute the Permutation Feature Importanc e scores of feature variables. You need to set up the Permutation Feature Importance module to select th e correct metric to investigate the model's accuracy and replicate the findings. Regression. Choose one of the following: Precision, Recall, Mean Absolute Error, Root Mean Squared Err or, Relative Absolute Error, Relative Squared Error, Co efficient of Determination", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/permutation-feature- importance" }, { "question": "D283ABFBEDB32CDCE3B3406B9C29DB2F You need to select a feature extraction method. Which method should you use?", "options": [ "A. Mutual information", "B. Pearson's correlation", "C. Spearman correlation", "D. Fisher Linear Discriminant Analysis", "A.", "B.", "C.", "D. Correct Answer:" ], "correct": "C. Spearman correlation", "explanation": "Box 1: Accuracy Scenario: You want to configure hyperparameters in the model learning process to speed the learning ph ase by using hyperparameters. In addition, this configurat ion should cancel the lowest performing runs at eac h evaluation interval, thereby directing effort and r esources towards models that are more likely to be successful. Box 2: R-Squared", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/feature-selection-modu les QUESTION 178 HOTSPOT You need to set up the Permutation Feature Importan ce module according to the model training requireme nts. Which properties should you select? To answer, sele ct the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "HOTSPOT You need to configure the Feature Based Feature Sel ection module based on the experiment requirements and datasets. D283ABFBEDB32CDCE3B3406B9C29DB2F How should you configure the module properties? To answer, select the appropriate options in the dialo g box in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Mutual Information. The mutual information score is particularly useful in feature selection because it maximizes the mutu al information between the joint distribution and targ et variables in datasets with many dimensions. Box 2: MedianValue MedianValue is the feature column, , it is the pred ictor of the dataset. Scenario: The MedianValue and AvgRoomsinHouse colum ns both hold data in numeric format. You need to select a feature selection algorithm to analyze the relationship between the two columns in more detai l.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/filter-based-feature-s election" }, { "question": "You need to select a feature extraction method. Which method should you use?", "options": [ "A. Mutual information B. Mood's median test", "C. Kendall correlation", "D. Permutation Feature Importance" ], "correct": "C. 
Kendall correlation", "explanation": "In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's tau coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association between two measured quantities. It is a supported method of the Azure Machine Learning Feature selection. Note: Both Spearman's and Kendall's can be formulated as special cases of a more general correlation coefficient, and they are both appropriate in this scenario. Scenario: The MedianValue and AvgRoomsInHouse columns both hold data in numeric format. You need to select a feature selection algorithm to analyze the relationship between the two columns in more detail.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/feature-selection-modules" }, { "question": "DRAG DROP You need to implement an early stopping policy for model training. Which three code segments should you use to develop the solution? To answer, move the appropriate code segments from the list of code segments to the answer area and arrange them in the correct order. NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "You need to implement an early stopping criterion on models that provides savings without terminating promising jobs. Truncation selection cancels a given percentage of lowest performing runs at each evaluation interval. Runs are compared based on their performance on the primary metric and the lowest X% are terminated. Example: from azureml.train.hyperdrive import TruncationSelectionPolicy early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20, delay_evaluation=5) Incorrect Answers: Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The policy early terminates any runs where the primary metric is not within the specified slack factor / slack amount with respect to the best performing training run. Example: from azureml.train.hyperdrive import BanditPolicy early_termination_policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=5)", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters" }, { "question": "You need to implement early stopping criteria as stated in the model training requirements. Which three code segments should you use to develop the solution? To answer, move the appropriate code segments from the list of code segments to the answer area and arrange them in the correct order. NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: from azureml.train.hyperdrive Step 2: Import TruncationSelectionPolicy Truncation selection cancels a given percentage of lowest performing runs at each evaluation interval. Runs are compared based on their performance on the primary metric and the lowest X% are terminated. Scenario: You must configure hyperparameters in the model learning process to speed the learning phase.
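(For context, the truncation-selection snippet quoted in the explanations above would normally be attached to a HyperDrive run. The following is a minimal sketch only; the training script, compute target name, search space, and primary metric name are assumptions and are not taken from the scenario.)

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.train.hyperdrive import (
    HyperDriveConfig, RandomParameterSampling, TruncationSelectionPolicy,
    PrimaryMetricGoal, choice, uniform,
)

ws = Workspace.from_config()

# Training script and compute target are illustrative assumptions.
src = ScriptRunConfig(source_directory=".", script="train.py", compute_target="cpu-cluster")

# Hypothetical search space for two hyperparameters passed to train.py.
param_sampling = RandomParameterSampling({
    "--learning-rate": uniform(0.001, 0.1),
    "--batch-size": choice(16, 32, 64),
})

# Cancel the lowest-performing 20% of runs at each evaluation interval,
# starting after the first five intervals.
early_termination_policy = TruncationSelectionPolicy(
    evaluation_interval=1, truncation_percentage=20, delay_evaluation=5
)

hyperdrive_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name="accuracy",  # metric logged by train.py (assumed)
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
)

run = Experiment(ws, "hyperdrive_test").submit(hyperdrive_config)
run.wait_for_completion(show_output=True)
```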
In addition, this configuration should cancel the lowe st performing runs at each evaluation interval, the reby directing effort and resources towards models that are more likely to be successful. Step 3: early_terminiation_policy = TruncationSelec tionPolicy.. Example: from azureml.train.hyperdrive import TruncationSele ctionPolicy early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, tr uncation_percentage=20, delay_evaluation=5) In this example, the early termination policy is ap plied at every interval starting at evaluation inte rval 5. A run will be terminated at interval 5 if its performance at i nterval 5 is in the lowest 20% of performance of al l runs at interval 5. Incorrect Answers: D283ABFBEDB32CDCE3B3406B9C29DB2F Median: Median stopping is an early termination policy base d on running averages of primary metrics reported b y the runs. This policy computes running averages across all training runs and terminates runs whose perform ance is worse than the median of the running averages. Slack: Bandit is a termination policy based on slack facto r/slack amount and evaluation interval. The policy early terminates any runs where the primary metric is not within the specified slack factor / slack amount w ith respect to the best performing training run.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/service/how-to-tune-hyperparameters D283ABFBEDB32CDCE3B3406B9C29DB2F Deploy and operationalize machine learning solution s Question Set 1" }, { "question": "HOTSPOT You are a lead data scientist for a project that tr acks the health and migration of birds. You create a multi-image classification deep learning model that uses a set of labeled bird photos collected by experts. You pl an to use the model to develop a cross-platform mobile app th at predicts the species of bird captured by app use rs. You must test and deploy the trained model as a web service. The deployed model must meet the followin g requirements: An authenticated connection must not be required fo r testing. The deployed model must perform with low latency du ring inferencing. The REST endpoints must be scalable and should have a capacity to handle large number of requests when multiple end users are using the mobile applic ation. You need to verify that the web service returns pre dictions in the expected JSON format when a valid R EST request is submitted. Which compute resources should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: ds-workstation notebook VM An authenticated connection must not be required fo r testing. On a Microsoft Azure virtual machine (VM ), including a Data Science Virtual Machine (DSVM), yo u create local user accounts while provisioning the VM. Users then authenticate to the VM by using these cr edentials. Box 2: gpu-compute cluster Image classification is well suited for GPU compute clusters", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/dsvm-common-ident ity https://docs.microsoft.com/en-us/azure/architecture /reference-architectures/ai/training-deep-learning" }, { "question": "You create a deep learning model for image recognit ion on Azure Machine Learning service using GPU-bas ed training. You must deploy the model to a context that allows for real-time GPU-based inferencing. 
You need to configure compute resources for model i nferencing. Which compute type should you use?", "options": [ "A. Azure Container Instance", "B. Azure Kubernetes Service", "C. Field Programmable Gate Array", "D. Machine Learning Compute Correct Answer: B" ], "correct": "", "explanation": "You can use Azure Machine Learning to deploy a GPU- enabled model as a web service. Deploying a model o n Azure Kubernetes Service (AKS) is one option. The A KS cluster provides a GPU resource that is used by the model for inference. Inference, or model scoring, is the phase where the deployed model is used to make predictions. Using GPUs instead of CPUs offers performance advantages on hi ghly parallelizable computation.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-deploy-inferencing-gpus" }, { "question": "You create a batch inference pipeline by using the Azure ML SDK. You run the pipeline by using the fol lowing code: from azureml.pipeline.core import Pipeline from azureml.core.experiment import Experiment pipeline = Pipeline(workspace=ws, steps=[parallelru n_step]) pipeline_run = Experiment(ws, 'batch_pipeline').sub mit(pipeline) You need to monitor the progress of the pipeline ex ecution. What are two possible ways to achieve this goal? Ea ch correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Run the following code in a notebook:", "B. Use the Inference Clusters tab in Machine Learnin g Studio.", "C. Use the Activity log in the Azure portal for the Machine Learning workspace.", "D. Run the following code in a notebook: E. Run the following code and monitor the console ou tput from the PipelineRun object:" ], "correct": "", "explanation": "A batch inference job can take a long time to finis h. This example monitors progress by using a Jupyte r widget. You can also manage the job's progress by using: Azure Machine Learning Studio. Console output from the PipelineRun object. D283ABFBEDB32CDCE3B3406B9C29DB2F from azureml.widgets import RunDetails RunDetails(pipeline_run).show() pipeline_run.wait_for_completion(show_output=True)", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-parallel-run-step#monitor-the-paral lel-run- job" }, { "question": "You train and register a model in your Azure Machin e Learning workspace. You must publish a pipeline that enables client app lications to use the model for batch inferencing. Y ou must use a pipeline with a single ParallelRunStep step t hat runs a Python inferencing script to get predict ions from the input data. You need to create the inferencing script for the P arallelRunStep pipeline step. Which two functions should you include? Each correc t answer presents part of the solution. NOTE: Each correct selection is worth one point.", "options": [ "A. run(mini_batch)", "B. main()", "C. batch()", "D. init()" ], "correct": "", "explanation": "Explanation/Reference:", "references": "https://github.com/Azure/MachineLearningNotebooks/t ree/master/how-to-use-azureml/machine-learning- pipelines/parallel-run" }, { "question": "You deploy a model as an Azure Machine Learning rea l-time web service using the following code. The deployment fails. You need to troubleshoot the deployment failure by determining the actions that were performed during deployment and identifying the specific action that failed. Which code segment should you run?", "options": [ "A. service.get_logs()", "B. service.state", "C. service.serialize()", "D. 
service.update_deployment_state()" ], "correct": "A. service.get_logs()", "explanation": "You can print out detailed Docker engine log messag es from the service object. You can view the log fo r ACI, AKS, and Local deployments. The following example d emonstrates how to print the logs. # if you already have the service object handy print(service.get_logs()) # if you only know the name of the service (note th ere might be multiple services with the same name b ut different version number) print(ws.webservices['mysvc'].get_logs())", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-troubleshoot-deployment" }, { "question": "You deploy a model in Azure Container Instance. You must use the Azure Machine Learning SDK to call the model API. You need to invoke the deployed model using native SDK classes and methods. How should you complete the command? To answer, sel ect the appropriate options in the answer areas. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: from azureml.core.webservice import Webservi ce The following code shows how to use the SDK to upda te the model, environment, and entry script for a w eb service to Azure Container Instances: from azureml.core import Environment from azureml.core.webservice import Webservice from azureml.core.model import Model, InferenceConf ig Box 2: predictions = service.run(input_json) Example: The following code demonstrates sending da ta to the service: import json test_sample = json.dumps({'data': [ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1] ]}) test_sample = bytes(test_sample, encoding='utf8') prediction = service.run(input_data=test_sample) print(prediction)", "references": "https://docs.microsoft.com/bs-latn-ba/azure/machine -learning/how-to-deploy-azure-container-instance https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-troubleshoot-deployment" }, { "question": "You create a multi-class image classification deep learning model. D283ABFBEDB32CDCE3B3406B9C29DB2F You train the model by using PyTorch version 1.2. You need to ensure that the correct version of PyTo rch can be identified for the inferencing environme nt when the model is deployed. What should you do?", "options": [ "A. Save the model locally as a.pt file, and deploy t he model as a local web service.", "B. Deploy the model on computer that is configured t o use the default Azure Machine Learning conda", "C. Register the model with a .pt file extension and the default version property.", "D. Register the model, specifying the model_framewor k and model_framework_version properties." ], "correct": "D. Register the model, specifying the model_framewor k and model_framework_version properties.", "explanation": "framework_version: The PyTorch version to be used f or executing training code.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -train-core/azureml.train.dnn.pytorch?view=azure-ml -py" }, { "question": "You train a machine learning model. You must deploy the model as a real-time inference service for testing. The service requires low CPU u tilization and less than 48 MB of RAM. The compute target for the deployed service must initialize automatically while minimizing cost and administrative overhead. Which compute target should you use?", "options": [ "A. Azure Container Instance (ACI)", "B. attached Azure Databricks cluster", "C. Azure Kubernetes Service (AKS) inference cluster", "D. 
Azure Machine Learning compute cluster" ], "correct": "A. Azure Container Instance (ACI)", "explanation": "Azure Container Instances (ACI) are suitable only f or small models less than 1 GB in size. Use it for low-scale CPU-based workloads that require less than 48 GB of RAM. Note: Microsoft recommends using single-node Azure Kubernetes Service (AKS) clusters for dev-test of l arger models.", "references": "https://docs.microsoft.com/id-id/azure/machine-lear ning/how-to-deploy-and-where" }, { "question": "You register a model that you plan to use in a batc h inference pipeline. D283ABFBEDB32CDCE3B3406B9C29DB2F The batch inference pipeline must use a ParallelRun Step step to process files in a file dataset. The s cript has the ParallelRunStep step runs must process six inpu t files each time the inferencing function is calle d. You need to configure the pipeline. Which configuration setting should you specify in t he ParallelRunConfig object for the PrallelRunStep step?", "options": [ "A. process_count_per_node= \"6\"", "B. node_count= \"6\"", "C. mini_batch_size= \"6\"", "D. error_threshold= \"6\"" ], "correct": "B. node_count= \"6\"", "explanation": "node_count is the number of nodes in the compute ta rget used for running the ParallelRunStep. Incorrect Answers: A: process_count_per_node Number of processes executed on each node. (optiona l, default value is number of cores on node.) C: mini_batch_size For FileDataset input, this field is the number of files user script can process in one run() call. Fo r TabularDataset input, this field is the approximate size of data the user script can process in one ru n() call. Example values are 1024, 1024KB, 10MB, and 1GB. D: error_threshold The number of record failures for TabularDataset an d file failures for FileDataset that should be igno red during processing. If the error count goes above this valu e, then the job will be aborted.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -contrib-pipeline-steps/ azureml.contrib.pipeline.steps.parallelrunconfig?vi ew=azure-ml-py" }, { "question": "You deploy a real-time inference service for a trai ned model. The deployed model supports a business-critical app lication, and it is important to be able to monitor the data submitted to the web service and the predictions th e data generates. You need to implement a monitoring solution for the deployed model using minimal administrative effort . What should you do?", "options": [ "A. View the explanations for the registered model in A zure ML studio.", "B. Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal.", "C. View the log files generated by the experiment used to train the model.", "D. Create an ML Flow tracking URI that references th e endpoint, and view the data logged by ML Flow." ], "correct": "B. Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal.", "explanation": "D283ABFBEDB32CDCE3B3406B9C29DB2F Configure logging with Azure Machine Learning studi o You can also enable Azure Application Insights from Azure Machine Learning studio. When you're ready t o deploy your model as a web service, use the followi ng steps to enable Application Insights: 1. Sign in to the studio at https://ml.azure.com. 2. Go to Models and select the model you want to de ploy. 3. Select +Deploy. 4. Populate the Deploy model form. 5. Expand the Advanced menu. 6. Select Enable Application Insights diagnostics a nd data collection. 
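The steps above use the studio UI; an already-deployed endpoint can also be switched on from the SDK. A minimal sketch, assuming the service is named 'my-service' in the current workspace:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# 'my-service' is an assumed name for the already-deployed endpoint.
service = Webservice(workspace=ws, name="my-service")

# Turn on Azure Application Insights telemetry for the endpoint.
service.update(enable_app_insights=True)
print(service.state)
```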
Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-enable-app-insights", "references": "" }, { "question": "HOTSPOT You use Azure Machine Learning to train and registe r a model. You must deploy the model into production as a real -time web service to an inference cluster named ser vice- compute that the IT department has created in the A zure Machine Learning workspace. Client applications consuming the deployed web serv ice must be authenticated based on their Azure Acti ve Directory service principal. You need to write a script that uses the Azure Mach ine Learning SDK to deploy the model. The necessary modules have been imported. How should you complete the code? To answer, select the appropriate options in the answer area. D283ABFBEDB32CDCE3B3406B9C29DB2F NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: AksCompute Example: aks_target = AksCompute(ws,\"myaks\") # If deploying to a cluster configured for dev/test , ensure that it was created with enough # cores an d memory to handle this deployment configuration. Note that memory is also used by D283ABFBEDB32CDCE3B3406B9C29DB2F # things such as dependencies and AML components. deployment_config = AksWebservice.deploy_configurat ion(cpu_cores = 1, memory_gb = 1) service = Model.deploy(ws, \"myservice\", [model], inference_co nfig, deployment_config, aks_target) Box 2: AksWebservice Box 3: token_auth_enabled=Yes Whether or not token auth is enabled for the Webser vice. Note: A Service principal defined in Azure Active D irectory (Azure AD) can act as a principal on which authentication and authorization policies can be en forced in Azure Databricks. The Azure Active Directory Authentication Library ( ADAL) can be used to programmatically get an Azure AD access token for a user. Incorrect Answers: auth_enabled (bool): Whether or not to enable key a uth for this Webservice. Defaults to True.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-deploy-azure-kubernetes-service https://docs.microsoft.com/en-us/azure/databricks/d ev-tools/api/latest/aad/service-prin-aad-token" }, { "question": "An organization creates and deploys a multi-class i mage classification deep learning model that uses a set of labeled photographs. The software engineering team reports there is a he avy inferencing load for the prediction web service s during the summer. The production web service for the mode l fails to meet demand despite having a fully-utili zed compute cluster where the web service is deployed. You need to improve performance of the image classi fication web service with minimal downtime and mini mal administrative effort. What should you advise the IT Operations team to do ?", "options": [ "A. Create a new compute cluster by using larger VM s izes for the nodes, redeploy the web service to tha t", "B. Increase the node count of the compute cluster wh ere the web service is deployed.", "C. Increase the minimum node count of the compute cl uster where the web service is deployed.", "D. Increase the VM size of nodes in the compute clus ter where the web service is deployed." ], "correct": "B. Increase the node count of the compute cluster wh ere the web service is deployed.", "explanation": "The Azure Machine Learning SDK does not provide sup port scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azu re Machine Learning studio. 
You can only change the node count, not the VM size of the cluster.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-attach-kubernetes" }, { "question": "You use Azure Machine Learning designer to create a real-time service endpoint. You have a single Azur e Machine Learning service compute resource. D283ABFBEDB32CDCE3B3406B9C29DB2F You train the model and prepare the real-time pipel ine for deployment. You need to publish the inference pipeline as a web service. Which compute type should you use?", "options": [ "A. a new Machine Learning Compute resource", "B. Azure Kubernetes Services", "C. HDInsight D. the existing Machine Learning Compute resource" ], "correct": "B. Azure Kubernetes Services", "explanation": "Azure Kubernetes Service (AKS) can be used real-tim e inference.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/concept-compute-target" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You train and register a machine learning model. You plan to deploy the model as a real-time web ser vice. Applications must use key-based authenticatio n to use the model. You need to deploy the web service. Solution: Create an AciWebservice instance. Set the value of the ssl_enabled property to True. Deploy the model to the service. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use only auth_enabled = TRUE Note: Key-based authentication. D283ABFBEDB32CDCE3B3406B9C29DB2F Web services deployed on AKS have key-based auth en abled by default. ACI-deployed services have key- based auth disabled by default, but you can enable it by setting auth_enabled = TRUE when creating the ACI web service. The following is an example of creatin g an ACI deployment configuration with key-based au th enabled. deployment_config <- aci_webservice_deployment_conf ig(cpu_cores = 1, memory_gb = 1, auth_enabled = TRUE)", "references": "https://azure.github.io/azureml-sdk-for-r/articles/ deploying-models.html" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You train and register a machine learning model. You plan to deploy the model as a real-time web ser vice. Applications must use key-based authenticatio n to use the model. You need to deploy the web service. Solution: Create an AciWebservice instance. Set the value of the auth_enabled property to True. Deploy the model to the service. Does the solution meet the goal?", "options": [ "A. Yes", "B. No", "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use only auth_enabled = TRUE Note: Key-based authentication. Web services deployed on AKS have key-based auth en abled by default. 
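(The explanation continues below with the R SDK example from the referenced article; as a Python-SDK counterpart, a hedged sketch of the same idea is shown here. The registered model name, entry script, and conda file are illustrative assumptions.)

```python
from azureml.core import Workspace
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # assumed registered model name

# score.py and env.yml are assumed to exist alongside this script.
inference_config = InferenceConfig(runtime="python", entry_script="score.py", conda_file="env.yml")

# Key-based auth is off by default on ACI; auth_enabled=True switches it on.
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1, memory_gb=1, auth_enabled=True
)

service = Model.deploy(ws, "my-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.get_keys())  # retrieve the primary and secondary auth keys
```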
ACI-deployed services have key- based auth disabled by default, but you can enable it by setting auth_enabled = TRUE when creating the ACI web service. The following is an example of creatin g an ACI deployment configuration with key-based au th enabled. deployment_config <- aci_webservice_deployment_conf ig(cpu_cores = 1, memory_gb = 1, auth_enabled = TRUE)", "references": "https://azure.github.io/azureml-sdk-for-r/articles/ deploying-models.html" }, { "question": "You use the following Python code in a notebook to deploy a model as a web service: from azureml.core.webservice import AciWebservice from azureml.core.model import InferenceConfig inference_config = InferenceConfig(runtime='python' , source_directory='model_files', entry_script='score .py', conda_file='env.yml') deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, mem ory_gb=1) service = Model.deploy(ws, 'my-service', [model], inference_config, deployment_config) service.wait_for_deployment(True) The deployment fails. D283ABFBEDB32CDCE3B3406B9C29DB2F You need to use the Python SDK in the notebook to d etermine the events that occurred during service deployment an initialization. Which code segment should you use?", "options": [ "A. service.state", "B. service.get_logs()", "C. service.serialize()", "D. service.environment" ], "correct": "B. service.get_logs()", "explanation": "The first step in debugging errors is to get your d eployment logs. In Python: service.get_logs()", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-troubleshoot-deployment" }, { "question": "You use the Azure Machine Learning Python SDK to de fine a pipeline that consists of multiple steps. When you run the pipeline, you observe that some st eps do not run. The cached output from a previous r un is used instead. You need to ensure that every step in the pipeline is run, even if the parameters and contents of the source directory have not changed since the previous run. What are two possible ways to achieve this goal? Ea ch correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Use a PipelineData object that references a datas tore other than the default datastore.", "B. Set the regenerate_outputs property of the pipeli ne to True.", "C. Set the allow_reuse property of each step in the pipeline to False.", "D. Restart the compute cluster where the pipeline ex periment is configured to run." ], "correct": "", "explanation": "B: If regenerate_outputs is set to True, a new subm it will always force generation of all step outputs , and disallow data reuse for any step of this run. Once this run is complete, however, subsequent runs may reuse the results of this run. C: Keep the following in mind when working with pip eline steps, input/output data, and step reuse. If data used in a step is in a datastore and allow_reuse is True , then changes to the data change won't be detected. If the data is uploaded as part of the sn apshot (under the step's source_directory), though this is not recommended, then the hash will change and will tri gger a rerun.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -pipeline-core/azureml.pipeline.core.pipelinestep D283ABFBEDB32CDCE3B3406B9C29DB2F https://github.com/Azure/MachineLearningNotebooks/b lob/master/how-to-use-azureml/machine-learning- pipelines/intro-to-pipelines/aml-pipelines-getting- started.ipynb" }, { "question": "You train a model and register it in your Azure Mac hine Learning workspace. 
You are ready to deploy th e model as a real-time web service. You deploy the model to an Azure Kubernetes Service (AKS) inference cluster, but the deployment fails because an error occurs when the service runs the e ntry script that is associated with the model deplo yment. You need to debug the error by iteratively modifyin g the code and reloading the service, without requi ring a re- deployment of the service for each code update. What should you do?", "options": [ "A. Modify the AKS service deployment configuration t o enable application insights and re-deploy to AKS.", "B. Create an Azure Container Instances (ACI) web ser vice deployment configuration and deploy the model on", "C. Add a breakpoint to the first line of the entry s cript and redeploy the service to AKS.", "D. Create a local web service deployment configurati on and deploy the model to a local Docker container ." ], "correct": "B. Create an Azure Container Instances (ACI) web ser vice deployment configuration and deploy the model on", "explanation": "How to work around or solve common Docker deploymen t errors with Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) using Azure Machine Learning. The recommended and the most up to date approach fo r model deployment is via the Model.deploy() API us ing an Environment object as an input parameter. In thi s case our service will create a base docker image for you during deployment stage and mount the required mode ls all in one call. The basic deployment tasks are: 1. Register the model in the workspace model regist ry. 2. Define Inference Configuration: a) Create an Environment object based on the depend encies you specify in the environment yaml file or use one of our procured environments. b) Create an inference configuration (InferenceConf ig object) based on the environment and the scoring script. 3. Deploy the model to Azure Container Instance (AC I) service or to Azure Kubernetes Service (AKS).", "references": "" }, { "question": "You use Azure Machine Learning designer to create a training pipeline for a regression model. You need to prepare the pipeline for deployment as an endpoint that generates predictions asynchronous ly for a dataset of input data values. What should you do?", "options": [ "A. Clone the training pipeline.", "B. Create a batch inference pipeline from the traini ng pipeline.", "C. Create a real-time inference pipeline from the tr aining pipeline.", "D. Replace the dataset in the training pipeline with an Enter Data Manually module." ], "correct": "C. Create a real-time inference pipeline from the tr aining pipeline.", "explanation": "You must first convert the training pipeline into a real-time inference pipeline. This process removes training modules and adds web service inputs and outputs to handle requests. Incorrect Answers: A: Use the Enter Data Manually module to create a s mall dataset by typing values.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/tutorial-designer-automobile-price-deploy https://docs.microsoft.com/en-us/azure/machine-lear ning/algorithm-module-reference/enter-data-manually" }, { "question": "You retrain an existing model. You need to register the new version of a model whi le keeping the current version of the model in the registry. What should you do?", "options": [ "A. Register a model with a different name from the e xisting model and a custom property named version w ith", "B. Register the model with the same name as the exis ting model.", "C. 
Save the new model in the default datastore with the same name as the existing model. Do not registe r the", "D. Delete the existing model and register the new on e with the same name." ], "correct": "", "explanation": "Model version: A version of a registered model. Whe n a new model is added to the Model Registry, it is added as Version 1. Each model registered to the same mod el name increments the version number.", "references": "https://docs.microsoft.com/en-us/azure/databricks/a pplications/mlflow/model-registry" }, { "question": "You use the Azure Machine Learning SDK to run a tra ining experiment that trains a classification model and calculates its accuracy metric. The model will be retrained each month as new data is available. You must register the model for use in a batch infe rence pipeline. You need to register the model and ensure that the models created by subsequent retraining experiments are registered only if their accuracy is higher than th e currently registered model. What are two possible ways to achieve this goal? Ea ch correct answer presents a complete solution. NOTE: Each correct selection is worth one point. D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A. Specify a different name for the model each time you register it.", "B. Register the model with the same name each time r egardless of accuracy, and always use the latest", "C. Specify the model framework version when register ing the model, and only register subsequent models if", "D. Specify a property named accuracy with the accura cy metric as a value when registering the model, an d" ], "correct": "", "explanation": "E: Using tags, you can track useful information suc h as the name and version of the machine learning l ibrary used to train the model. Note that tags must be alp hanumeric.", "references": "https://notebooks.azure.com/xavierheriat/projects/a zureml-getting-started/html/how-to-use-azureml/ deployment/register-model-create-image-deploy-servi ce/register-model-create-image-deploy-service.ipynb" }, { "question": "You are a data scientist working for a hotel bookin g website company. You use the Azure Machine Learni ng service to train a model that identifies fraudulent transactions. You must deploy the model as an Azure Machine Learn ing real-time web service using the Model.deploy method in the Azure Machine Learning SDK. The deplo yed web service must return real-time predictions o f fraud based on transaction data input. You need to create the script that is specified as the entry_script parameter for the InferenceConfig class used to deploy the model. What should the entry script do?", "options": [ "A. Register the model with appropriate tags and prop erties.", "B. Create a Conda environment for the web service co mpute and install the necessary Python packages.", "C. Load the model and use it to predict labels from input data.", "D. Start a node on the inference cluster where the w eb service is deployed." ], "correct": "C. Load the model and use it to predict labels from input data.", "explanation": "The entry script receives data submitted to a deplo yed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model. I t must understand the data that the model expects and retu rns. 
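As a concrete illustration (not the scenario's actual script), a minimal entry script for this kind of fraud model might look like the following sketch; the registered model name, the joblib format, and the input schema are assumptions:

```python
import json
import joblib
import numpy as np
from azureml.core.model import Model

def init():
    # Called once when the service starts: load the registered model.
    global model
    model_path = Model.get_model_path("fraud-model")  # assumed model name
    model = joblib.load(model_path)

def run(raw_data):
    # Called for every scoring request: parse input, predict, return a JSON-serializable result.
    data = np.array(json.loads(raw_data)["data"])
    predictions = model.predict(data)
    return {"fraudulent": predictions.tolist()}
```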
The two things you need to accomplish in your entry script are: Loading your model (using a function called init()) Running your model on input data (using a function called run()) D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-deploy-and-where" }, { "question": "DRAG DROP You use Azure Machine Learning to deploy a model as a real-time web service. You need to create an entry script for the service that ensures that the model is loaded when the serv ice starts and is used to score new data as it is received. Which functions should you include in the script? T o answer, drag the appropriate functions to the cor rect actions. Each function may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: init() The entry script has only two required functions, i nit() and run(data). These functions are used to in itialize the service at startup and run the model using request data passed in by a client. The rest of the script handles loading and running the model(s). Box 2: run()", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-deploy-existing-model" }, { "question": "You develop and train a machine learning model to p redict fraudulent transactions for a hotel booking website. Traffic to the site varies considerably. The site e xperiences heavy traffic on Monday and Friday and m uch lower traffic on other days. Holidays are also high web t raffic days. You need to deploy the model as an Azure Machine Le arning real-time web service endpoint on compute th at can dynamically scale up and down to support demand . Which deployment compute option should you use?", "options": [ "A. attached Azure Databricks cluster", "B. Azure Container Instance (ACI)", "C. Azure Kubernetes Service (AKS) inference cluster", "D. Azure Machine Learning Compute Instance" ], "correct": "D. Azure Machine Learning Compute Instance", "explanation": "Azure Machine Learning compute cluster is a managed -compute infrastructure that allows you to easily c reate a single or multi-node compute. The compute is crea ted within your workspace region as a resource that can be shared with other users in your workspace. The c ompute scales up automatically when a job is submit ted, and can be put in an Azure Virtual Network.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-create-attach-compute-sdk" }, { "question": "You use the designer to create a training pipeline for a classification model. The pipeline uses a dat aset that includes the features and labels required for model training. You create a real-time inference pipeline from the training pipeline. You observe that the schema for the generated web service input is based on the dataset and includes the label column that the model predi cts. Client applications that use the service must not b e required to submit this value. You need to modify the inference pipeline to meet t he requirement. What should you do? A. Add a Select Columns in Dataset module to the inf erence pipeline after the dataset and use it to sel ect all D283ABFBEDB32CDCE3B3406B9C29DB2F columns other than the label.", "options": [ "B. Delete the dataset from the training pipeline and recreate the real-time inference pipeline.", "C. 
Delete the Web Service Input module from the infe rence pipeline.", "D. Replace the dataset in the inference pipeline wit h an Enter Data Manually module that includes data for the" ], "correct": "", "explanation": "By default, the Web Service Input will expect the s ame data schema as the module output data which con nects to the same downstream port as it. You can remove t he target variable column in the inference pipeline using Select Columns in Dataset module. Make sure that th e output of Select Columns in Dataset removing targ et variable column is connected to the same port as th e output of the Web Service Intput module.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/tutorial-designer-automobile-price-deploy" }, { "question": "You use the Azure Machine Learning designer to crea te and run a training pipeline. You then create a r eal-time inference pipeline. You must deploy the real-time inference pipeline as a web service. What must you do before you deploy the real-time in ference pipeline?", "options": [ "A. Run the real-time inference pipeline.", "B. Create a batch inference pipeline.", "C. Clone the training pipeline.", "D. Create an Azure Machine Learning compute cluster." ], "correct": "D. Create an Azure Machine Learning compute cluster.", "explanation": "You need to create an inferencing cluster. Deploy the real-time endpoint After your AKS service has finished provisioning, r eturn to the real-time inferencing pipeline to comp lete deployment. 1. Select Deploy above the canvas. 2. Select Deploy new real-time endpoint. 3. Select the AKS cluster you created. 4. Select Deploy. Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/tutorial-designer-automobile-price-deploy", "references": "" }, { "question": "You create an Azure Machine Learning workspace name d ML-workspace. You also create an Azure Databricks workspace named DB-workspace. DB-workspace contains a cluster named DB-cluster. You must use DB-cluster to run experiments from not ebooks that you import into DB-workspace. D283ABFBEDB32CDCE3B3406B9C29DB2F You need to use ML-workspace to track MLflow metric s and artifacts generated by experiments running on DB- cluster. The solution must minimize the need for cu stom code. What should you do?", "options": [ "A. From DB-cluster, configure the Advanced Logging o ption.", "B. From DB-workspace, configure the Link Azure ML wo rkspace option.", "C. From ML-workspace, create an attached compute.", "D. From ML-workspace, create a compute cluster." ], "correct": "B. From DB-workspace, configure the Link Azure ML wo rkspace option.", "explanation": "Connect your Azure Databricks and Azure Machine Lea rning workspaces: Linking your ADB workspace to your Azure Machine Le arning workspace enables you to track your experime nt data in the Azure Machine Learning workspace. To link your ADB workspace to a new or existing Azu re Machine Learning workspace 1. Sign in to Azure portal. 2. Navigate to your ADB workspace's Overview page. 3. Select the Link Azure Machine Learning workspace button on the bottom right. Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-use-mlflow-azure-databricks", "references": "" }, { "question": "HOTSPOT You create an Azure Machine Learning workspace. You need to detect data drift between a baseline da taset and a subsequent target dataset by using the D283ABFBEDB32CDCE3B3406B9C29DB2F DataDriftDetector class. How should you complete the code segment? 
To answer , select the appropriate options in the answer area . NOTE: Each correct selection is worth one point. Hot Area: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: create_from_datasets The create_from_datasets method creates a new DataD riftDetector object from a baseline tabular dataset and a target time series dataset. Box 2: backfill The backfill method runs a backfill job over a give n specified start and end date. Syntax: backfill(start_date, end_date, compute_targ et=None, create_compute_target=False) Incorrect Answers: List and update do not have datetime parameters.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -datadrift/azureml.datadrift.datadriftdetector(clas s)" }, { "question": "You are planning to register a trained model in an Azure Machine Learning workspace. You must store additional metadata about the model in a key-value format. You must be able to add new metadata and modify or delete metadata after creati on. You need to register the model. Which parameter should you use?", "options": [ "A. description", "B. model_framework", "C. tags", "D. properties" ], "correct": "D. properties", "explanation": "azureml.core.Model.properties: Dictionary of key value properties for the Model. T hese properties cannot be changed after registratio n, however new key value pairs can be added.", "references": "https://docs.microsoft.com/en-us/python/api/azureml -core/azureml.core.model.model" }, { "question": "You have a Python script that executes a pipeline. The script includes the following code: from azureml.core import Experiment pipeline_run = Experiment(ws, 'pipeline_test').subm it(pipeline) You want to test the pipeline before deploying the script. You need to display the pipeline run details writte n to the STDOUT output when the pipeline completes. D283ABFBEDB32CDCE3B3406B9C29DB2F Which code segment should you add to the test scrip t?", "options": [ "A. pipeline_run.get.metrics()", "B. pipeline_run.wait_for_completion(show_output=True )", "C. pipeline_param = PipelineParameter(name=\"stdout\",", "D. pipeline_run.get_status()" ], "correct": "B. pipeline_run.wait_for_completion(show_output=True )", "explanation": "Explanation/Reference: wait_for_completion: Wait for the completion of thi s run. Returns the status object after the wait. Syntax: wait_for_completion(show_output=False, wait _post_processing=False, raise_on_error=True) Parameter: show_output Indicates whether to show the run output on sys.std out. D283ABFBEDB32CDCE3B3406B9C29DB2F Implement Responsible ML Question Set 1", "references": "" }, { "question": "You are a data scientist working for a bank and hav e used Azure ML to train and register a machine lea rning model that predicts whether a customer is likely to repay a loan. You want to understand how your model is making sel ections and must be sure that the model does not vi olate government regulations such as denying loans based on where an applicant lives. You need to determine the extent to which each feat ure in the customer data is influencing predictions . What should you do?", "options": [ "A. Enable data drift monitoring for the model and it s training dataset.", "B. Score the model against some test data with known label values and use the results to calculate a", "C. Use the Hyperdrive library to test the model with multiple hyperparameter values.", "D. Use the interpretability package to generate an e xplainer for the model.", "B.", "C.", "D." ], "correct": "D. 
Use the interpretability package to generate an e xplainer for the model.", "explanation": "Box 1: from_run_id from_run_id(workspace, experiment_name, run_id) Create the client with factory method given a run I D. Returns an instance of the ExplanationClient. Parameters Workspace Workspace - An object that represents a w orkspace. experiment_name str - The name of an experiment. run_id str - A GUID that represents a run. Box 2: list_model_explanations list_model_explanations returns a dictionary of met adata for all model explanations available. Returns A dictionary of explanation metadata such as id, da ta type, explanation method, model type, and upload time, sorted by upload time Box 3: explanation", "references": "https://docs.microsoft.com/en-us/python/api/azureml -contrib-interpret/ azureml.contrib.interpret.explanation.explanation_c lient.explanationclient?view=azure-ml-py" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these D283ABFBEDB32CDCE3B3406B9C29DB2F questions will not appear in the review screen. You train a classification model by using a logisti c regression algorithm. You must be able to explain the model's predictions by calculating the importance of each feature, bot h as an overall global relative importance value and as a m easure of local importance for a specific set of pr edictions. You need to create an explainer that you can use to retrieve the required global and local feature imp ortance values. Solution: Create a MimicExplainer. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use Permutation Feature Importance Explaine r (PFI). Note 1: Mimic explainer is based on the idea of tra ining global surrogate models to mimic blackbox mod els. A global surrogate model is an intrinsically interpre table model that is trained to approximate the pred ictions of any black box model as accurately as possible. Data scientists can interpret the surrogate model to dr aw conclusions about the black box model. Note 2: Permutation Feature Importance Explainer (P FI): Permutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuf fling data one feature at a time for the entire dataset and ca lculating how much the performance metric of intere st changes. The larger the change, the more important that feature is. PFI can explain the overall behavi or of any underlying model but does not explain individual pr edictions.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-machine-learning-interpretability" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You train a classification model by using a logisti c regression algorithm. 
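As an aside, a hedged sketch of the Permutation Feature Importance explainer referenced in the explanations of this question series; the model, data, feature, and class variables are placeholders rather than code from the exam, and parameter names may vary slightly between interpret-community versions:

from interpret.ext.blackbox import PFIExplainer

# PFI shuffles one feature at a time and measures the change in the performance
# metric, yielding global (overall) feature importance only.
pfi_explainer = PFIExplainer(model,
                             features=feature_names,
                             classes=class_names)

# True labels are required so the metric change can be computed after shuffling.
global_explanation = pfi_explainer.explain_global(x_test, true_labels=y_test)
print(global_explanation.get_feature_importance_dict())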
You must be able to explain the model's predictions by calculating the importance of each feature, bot h as an overall global relative importance value and as a m easure of local importance for a specific set of pr edictions. You need to create an explainer that you can use to retrieve the required global and local feature imp ortance values. Solution: Create a TabularExplainer. Does the solution meet the goal? D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Instead use Permutation Feature Importance Explaine r (PFI). Note 1: Note 2: Permutation Feature Importance Explainer (P FI): Permutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuf fling data one feature at a time for the entire dataset and ca lculating how much the performance metric of intere st changes. The larger the change, the more important that feature is. PFI can explain the overall behavi or of any underlying model but does not explain individual pr edictions. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-machine-learning-interpretability" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You train a classification model by using a logisti c regression algorithm. You must be able to explain the model's predictions by calculating the importance of each feature, bot h as an overall global relative importance value and as a m easure of local importance for a specific set of pr edictions. You need to create an explainer that you can use to retrieve the required global and local feature imp ortance values. Solution: Create a PFIExplainer. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Permutation Feature Importance Explainer (PFI): Per mutation Feature Importance is a technique used to explain classification and regression models. At a high level, the way it works is by randomly shuffli ng data one feature at a time for the entire dataset and calcul ating how much the performance metric of interest c hanges. The larger the change, the more important that feat ure is. PFI can explain the overall behavior of any underlying model but does not explain individual predictions.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-machine-learning-interpretability" }, { "question": "HOTSPOT You are performing feature scaling by using the sci kit-learn Python library for x.1 x2, and x3 feature s. Original and scaled data is shown in the following image. D283ABFBEDB32CDCE3B3406B9C29DB2F Use the drop-down menus to select the answer choice that answers each question based on the informatio n presented in the graphic. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A. B.", "C.", "D." 
], "correct": "", "explanation": "Box 1: StandardScaler The StandardScaler assumes your data is normally di stributed within each feature and will scale them s uch that the distribution is now centred around 0, with a st andard deviation of 1. Example: D283ABFBEDB32CDCE3B3406B9C29DB2F All features are now on the same scale relative to one another. Box 2: Min Max Scaler Notice that the skewness of the distribution is mai ntained but the 3 distributions are brought into th e same scale so that they overlap. Box 3: Normalizer", "references": "http://benalexkeen.com/feature-scaling-with-scikit- learn/" }, { "question": "You are determining if two sets of data are signifi cantly different from one another by using Azure Ma chine Learning Studio. D283ABFBEDB32CDCE3B3406B9C29DB2F Estimated values in one set of data may be more tha n or less than reference values in the other set of data. You must produce a distribution that has a constant Type I error as a function of the correlation. You need to produce the distribution. Which type of distribution should you produce?", "options": [ "A. Unpaired t-test with a two-tail option", "B. Unpaired t-test with a one-tail option", "C. Paired t-test with a one-tail option", "D. Paired t-test with a two-tail option" ], "correct": "D. Paired t-test with a two-tail option", "explanation": "Choose a one-tail or two-tail test. The default is a two-tailed test. This is the most common type of test, in which the expected distribution is symmetric around zero. Example: Type I error of unpaired and paired two-sa mple t-tests as a function of the correlation. The simulated random numbers originate from a bivariate normal di stribution with a variance of 1. Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/test-hypothesis-using- t-test D283ABFBEDB32CDCE3B3406B9C29DB2F https://en.wikipedia.org/wiki/Student%27s_t-test", "references": "" }, { "question": "DRAG DROP You are producing a multiple linear regression mode l in Azure Machine Learning Studio. Several independent variables are highly correlated . You need to select appropriate methods for conducti ng effective feature engineering on all the data. Which three actions should you perform in sequence? To answer, move the appropriate actions from the l ist of actions to the answer area and arrange them in the correct order. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Explanation/Reference: Step 1: Use the Filter Based Feature Selection modu le Filter Based Feature Selection identifies the featu res in a dataset with the greatest predictive power . The module outputs a dataset that contains the best fea ture columns, as ranked by predictive power. It als o outputs the names of the features and their scores from the selected metric. Step 2: Build a counting transform A counting transform creates a transformation that turns count tables into features, so that you can a pply the transformation to multiple datasets. Step 3: Test the hypothesis using t-Test", "references": "https://docs.microsoft.com/bs-latn-ba/azure/machine -learning/studio-module-reference/filter-based-feat ure- selection https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/build-counting-transfo rm" }, { "question": "You are performing feature engineering on a dataset . You must add a feature named CityName and populate the column value with the text London. You need to add the new feature to the dataset. 
Which Azure Machine Learning Studio module should y ou use?", "options": [ "A. Extract N-Gram Features from Text", "B. Edit Metadata", "C. Preprocess Text", "D. Apply SQL Transformation" ], "correct": "B. Edit Metadata", "explanation": "Typical metadata changes might include marking colu mns as features.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/edit-metadata" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variab les: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. D283ABFBEDB32CDCE3B3406B9C29DB2F Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Dete rmination. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "The following metrics are reported for evaluating r egression models. When you compare models, they are ranked by the metric you select for evaluation. Mean absolute error (MAE) measures how close the pr edictions are to the actual outcomes; thus, a lower score is better. Root mean squared error (RMSE) creates a single val ue that summarizes the error in the model. By squar ing the difference, the metric disregards the differenc e between over-prediction and under-prediction. Relative absolute error (RAE) is the relative absol ute difference between expected and actual values; relative because the mean difference is divided by the arith metic mean. Relative squared error (RSE) similarly normalizes t he total squared error of the predicted values by d ividing by the total squared error of the actual values. Mean Zero One Error (MZOE) indicates whether the pr ediction was correct or not. In other words: ZeroOneLoss(x,y) = 1 when x!=y; otherwise 0. Coefficient of determination, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (ex plains nothing); 1 means there is a perfect fit. Ho wever, caution should be used in interpreting R2 values, a s low values can be entirely normal and high values can be suspect. AUC.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variab les: the student's length of education, degree type, and art form. You start by creating a linear regression model. 
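For reference (a hedged aside, not part of the exam item), the regression metrics named in the solutions of this series can be computed directly with scikit-learn and NumPy; the arrays below are illustrative placeholders:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Placeholder arrays of actual and predicted prices.
y_true = np.array([12.0, 15.5, 9.0, 20.0])
y_pred = np.array([11.0, 16.0, 10.5, 18.5])

mae = mean_absolute_error(y_true, y_pred)                                      # Mean Absolute Error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))                             # Root Mean Squared Error
rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()     # Relative Absolute Error
rse = ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()   # Relative Squared Error
r2 = r2_score(y_true, y_pred)                                                  # Coefficient of Determination

print(mae, rmse, rae, rse, r2)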
You need to evaluate the linear regression model. Solution: Use the following metrics: Accuracy, Prec ision, Recall, F1 score, and AUC. D283ABFBEDB32CDCE3B3406B9C29DB2F Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Those are metrics for evaluating classification mod els, instead use: Mean Absolute Error, Root Mean Ab solute Error, Relative Absolute Error, Relative Squared Er ror, and the Coefficient of Determination.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model" }, { "question": "Note: This question is part of a series of question s that present the same scenario. Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variab les: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. Solution: Use the following metrics: Relative Squar ed Error, Coefficient of Determination, Accuracy, P recision, Recall, F1 score, and AUC. Does the solution meet the goal?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Relative Squared Error, Coefficient of Determinatio n are good metrics to evaluate the linear regressio n model, but the others are metrics for classification model s.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model" }, { "question": "You are a data scientist creating a linear regressi on model. You need to determine how closely the data fits the regression line. Which metric should you review? D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A. Root Mean Square Error", "B. Coefficient of determination", "C. Recall", "D. Precision" ], "correct": "B. Coefficient of determination", "explanation": "Coefficient of determination, often referred to as R2, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (ex plains nothing); 1 means there is a perfect fit. Ho wever, caution should be used in interpreting R2 values, a s low values can be entirely normal and high values can be suspect. Incorrect Answers: A: Root mean squared error (RMSE) creates a single value that summarizes the error in the model. By squaring the difference, the metric disregards the difference between over-prediction and under-predic tion. C: Recall is the fraction of all correct results re turned by the model. D: Precision is the proportion of true results over all positive results. E: Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lo wer score is better.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model" }, { "question": "You are creating a binary classification by using a two-class logistic regression model. You need to evaluate the model results for imbalanc e. Which evaluation metric should you use?", "options": [ "A. Relative Absolute Error", "B. AUC Curve", "C. Mean Absolute Error", "D. Relative Squared Error" ], "correct": "B. 
AUC Curve", "explanation": "One can inspect the true positive rate vs. the fals e positive rate in the Receiver Operating Character istic (ROC) curve and the corresponding Area Under the Curve (A UC) value. The closer this curve is to the upper le ft corner; the better the classifier's performance is (that is maximizing the true positive rate while mi nimizing the false positive rate). Curves that are close to the diagonal of the plot, result from classifiers that tend to make predictions that are close to random guessing. D283ABFBEDB32CDCE3B3406B9C29DB2F", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio/evaluate-model-performance#evaluating-a - binary-classification-model" }, { "question": "HOTSPOT You are developing a linear regression model in Azu re Machine Learning Studio. You run an experiment t o compare different algorithms. The following image displays the results dataset ou tput: Use the drop-down menus to select the answer choice that answers each question based on the informatio n presented in the image. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Boosted Decision Tree Regression Mean absolute error (MAE) measures how close the pr edictions are to the actual outcomes; thus, a lower score is better. Box 2: Online Gradient Descent: If you want the algorithm to find the best parameters for you, set Create tra iner mode option to Parameter Range. You can then specify mul tiple values for the algorithm to try.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/linear-regression" }, { "question": "HOTSPOT You are using a decision tree algorithm. You have t rained a model that generalizes well at a tree dept h equal to 10. You need to select the bias and variance properties of the model with varying tree depth values. Which properties should you select for each tree de pth? To answer, select the appropriate options in t he answer area. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "In decision trees, the depth of the tree determines the variance. A complicated decision tree (e.g. de ep) has low bias and high variance. Note: In statistics and machine learning, the biasv ariance tradeoff is the property of a set of predic tive models whereby models with a lower bias in parameter estim ation have a higher variance of the parameter estim ates across samples, and vice versa. Increasing the bias will decrease the variance. Increasing the varianc e will decrease the bias.", "references": "D283ABFBEDB32CDCE3B3406B9C29DB2F https://machinelearningmastery.com/gentle-introduct ion-to-the-bias-variance-trade-off-in-machine-learn ing/" }, { "question": "DRAG DROP You have a model with a large difference between th e training and validation error values. You must create a new model and perform cross-valid ation. You need to identify a parameter set for the new mo del using Azure Machine Learning Studio. Which module you should use for each step? To answe r, drag the appropriate modules to the correct step s. Each module may be used once or more than once, or not at all. You may need to drag the split bar betw een panes or scroll to view content. NOTE: Each correct selection is worth one point. Select and Place: A.", "options": [ "B.", "C.", "D." 
], "correct": "", "explanation": "Box 1: Split data Box 2: Partition and Sample D283ABFBEDB32CDCE3B3406B9C29DB2F Box 3: Two-Class Boosted Decision Tree Box 4: Tune Model Hyperparameters Integrated train and tune: You configure a set of p arameters to use, and then let the module iterate o ver multiple combinations, measuring accuracy until it finds a \"best\" model. With most learner modules, yo u can choose which parameters should be changed during th e training process, and which should remain fixed. We recommend that you use Cross-Validate Model to e stablish the goodness of the model given the specif ied parameters. Use Tune Model Hyperparameters to ident ify the optimal parameters.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/partition-and-sample" }, { "question": "HOTSPOT You are analyzing the asymmetry in a statistical di stribution. The following image contains two density curves tha t show the probability distribution of two datasets . Use the drop-down menus to select the answer choice that answers each question based on the informatio n presented in the graphic. NOTE: Each correct selection is worth one point. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Positive skew Positive skew values means the distribution is skew ed to the right. Box 2: Negative skew Negative skewness values mean the distribution is s kewed to the left.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/compute-elementary- statistics" }, { "question": "You are a data scientist building a deep convolutio nal neural network (CNN) for image classification. The CNN model you build shows signs of overfitting. You need to reduce overfitting and converge the mod el to an optimal fit. Which two actions should you perform? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Add an additional dense layer with 512 input unit s.", "B. Add L1/L2 regularization.", "C. Use training data augmentation.", "D. Reduce the amount of training data." ], "correct": "", "explanation": "B: Weight regularization provides an approach to re duce the overfitting of a deep learning neural netw ork model on the training data and improve the performance of the model on new data, such as the holdout test se t. Keras provides a weight regularization API that allows yo u to add a penalty for weight size to the loss func tion. Three different regularizer instances are provided; they are: L1: Sum of the absolute weights. L2: Sum of the squared weights. L1L2: Sum of the absolute and the squared weights. D: Because a fully connected layer occupies most of the parameters, it is prone to overfitting. One me thod to reduce overfitting is dropout. At each training sta ge, individual nodes are either \"dropped out\" of th e net with probability 1-p or kept with probability p, so that a reduced network is left; incoming and outgoing e dges to a dropped-out node are also removed. By avoiding training all nodes on all training data , dropout decreases overfitting.", "references": "https://machinelearningmastery.com/how-to-reduce-ov erfitting-in-deep-learning-with-weight-regularizati on/ https://en.wikipedia.org/wiki/Convolutional_neural_ network" }, { "question": "Note: This question is part of a series of question s that present the same scenario. 
Each question in the series contains a unique solution that might meet the stat ed goals. Some question sets might have more than o ne correct solution, while others might not have a cor rect solution. After you answer a question in this section, you wi ll NOT be able to return to it. As a result, these questions will not appear in the review screen. You are creating a model to predict the price of a student's artwork depending on the following variab les: the student's length of education, degree type, and art form. You start by creating a linear regression model. You need to evaluate the linear regression model. Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC. Does the solution meet the goal?", "options": [ "A. Yes", "B. No Correct Answer: B" ], "correct": "", "explanation": "Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models. Note: Mean Absolute Error, Root Mean Absolute Error , Relative Absolute Error are OK for the linear reg ression model.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/evaluate-model D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "You are building a binary classification model by u sing a supplied training set. The training set is imbalanced between two classes. You need to resolve the data imbalance. What are three possible ways to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.", "options": [ "A. Penalize the classification", "B. Resample the dataset using undersampling or overs ampling", "C. Normalize the training feature set", "D. Generate synthetic samples in the minority class" ], "correct": "", "explanation": "A: Try Penalized Models You can use the same algorithms but give them a dif ferent perspective on the problem. Penalized classi fication imposes an additional cost on the model for making classification mistakes on the minority class durin g training. These penalties can bias the model to pay more atte ntion to the minority class. B: You can change the dataset that you use to build your predictive model to have more balanced data. This change is called sampling your dataset and there ar e two main methods that you can use to even-up the classes: Consider testing under-sampling when you have an a lot data (tens- or hundreds of thousands of instanc es or more) Consider testing over-sampling when you don't have a lot of data (tens of thousands of records or less ) D: Try Generate Synthetic Samples A simple way to generate synthetic samples is to ra ndomly sample the attributes from instances in the minority class.", "references": "https://machinelearningmastery.com/tactics-to-comba t-imbalanced-classes-in-your-machine-learning-datas et/" }, { "question": "HOTSPOT You train a classification model by using a decisio n tree algorithm. You create an estimator by running the following Py thon code. The variable feature_names is a list of all feature names, and class_names is a list of all class names . from interpret.ext.blackbox import TabularExplainer explainer = TabularExplainer(model, x_train, features=feature_names, classes=class_names) You need to explain the predictions made by the mod el for all classes by determining the importance of all features. D283ABFBEDB32CDCE3B3406B9C29DB2F For each of the following statements, select Yes if the statement is true. Otherwise, select No. 
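As a hedged illustration of how the estimator shown above is typically used (not part of the exam item; x_train, x_test, model, feature_names, and class_names are assumed placeholders):

from interpret.ext.blackbox import TabularExplainer

explainer = TabularExplainer(model, x_train,
                             features=feature_names,
                             classes=class_names)

# Global (all-up) feature importance across all classes.
global_explanation = explainer.explain_global(x_test)
print(global_explanation.get_feature_importance_dict())

# Local importance values for a specific subset of predictions.
local_explanation = explainer.explain_local(x_test[0:5])
print(local_explanation.local_importance_values)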
NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Yes TabularExplainer calls one of the three SHAP explai ners underneath (TreeExplainer, DeepExplainer, or KernelExplainer). Box 2: Yes To make your explanations and visualizations more i nformative, you can choose to pass in feature names and output class names if doing classification. Box 3: No TabularExplainer automatically selects the most app ropriate one for your use case, but you can call ea ch of its three underlying explainers underneath (TreeExplain er, DeepExplainer, or KernelExplainer) directly.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-machine-learning-interpretability-aml" }, { "question": "DRAG DROP You have several machine learning models registered in an Azure Machine Learning workspace. D283ABFBEDB32CDCE3B3406B9C29DB2F You must use the Fairlearn dashboard to assess fair ness in a selected model. Which three actions should you perform in sequence? To answer, move the appropriate actions from the l ist of actions to the answer area and arrange them in the correct order. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Select a model feature to be evaluated. Step 2: Select a binary classification or regressio n model. Register your models within Azure Machine Learning. For convenience, store the results in a dictionary , which maps the id of the registered model (a string in na me:version format) to the predictor itself. Example: model_dict = {} lr_reg_id = register_model(\"fairness_logistic_regre ssion\", lr_predictor) model_dict[lr_reg_id] = lr_pr edictor svm_reg_id = register_model(\"fairness_svm\", svm_pre dictor) model_dict[svm_reg_id] = svm_predictor Step 3: Select a metric to be measured Precompute fairness metrics. Create a dashboard dictionary using Fairlearn's met rics package.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-machine-learning-fairness-aml D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "HOTSPOT A biomedical research company plans to enroll peopl e in an experimental medical treatment trial. You create and train a binary classification model to support selection and admission of patients to t he trial. The model includes the following features: Age, Gender, and Ethnicity. The model returns different performance metrics for people from different ethnic groups. You need to use Fairlearn to mitigate and minimize disparities for each category in the Ethnicity feat ure. Which technique and constraint should you use? To a nswer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point. Hot Area:", "options": [ "A. B.", "C.", "D." ], "correct": "", "explanation": "Box 1: Grid Search Fairlearn open-source package provides postprocessi ng and reduction unfairness mitigation algorithms: ExponentiatedGradient, GridSearch, and ThresholdOpt imizer. Note: The Fairlearn open-source package provides po stprocessing and reduction unfairness mitigation algorithms types: Reduction: These algorithms take a standard black-b ox machine learning estimator (e.g., a LightGBM model) and generate a set of retrained models using a sequence of re-weighted training datasets. Post- processing: These algorithms take an existing class ifier and the sensitive feature as input. 
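A hedged sketch of the GridSearch reduction named in Box 1, paired with the parity constraint covered in Box 2 below; the estimator, training data, and sensitive-feature variables are placeholders:

from fairlearn.reductions import GridSearch, DemographicParity
from sklearn.linear_model import LogisticRegression

# X, y are the training features/labels and A is the sensitive feature (e.g. Ethnicity); all placeholders.
mitigator = GridSearch(LogisticRegression(max_iter=1000),
                       constraints=DemographicParity(),
                       grid_size=20)
mitigator.fit(X, y, sensitive_features=A)

# Each retrained predictor in the sweep trades accuracy against the fairness constraint.
predictors = mitigator.predictors_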
Box 2: Demographic parity The Fairlearn open-source package supports the foll owing types of parity constraints: Demographic pari ty, Equalized odds, Equal opportunity, and Bounded grou p loss.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/concept-fairness-ml D283ABFBEDB32CDCE3B3406B9C29DB2F Implement Responsible ML Testlet 2 Case study This is a case study. Case studies are not timed se parately. You can use as much exam time as you woul d like to complete each case. However, there may be additi onal case studies and sections on this exam. You mu st manage your time to ensure that you are able to com plete all questions included on this exam in the ti me provided. To answer the questions included in a case study, y ou will need to reference information that is provi ded in the case study. Case studies might contain exhibits and other resources that provide more information abou t the scenario that is described in the case study. Each question is independent of the other questions in t his case study. At the end of this case study, a review screen will appear. This screen allows you to review your answ ers and to make changes before you move to the next section of the exam. After you begin a new section, you canno t return to this section. To start the case study To display the first question in this case study, c lick the Next button. Use the buttons in the left p ane to explore the content of the case study before you answer the questions. Clicking these buttons displays informa tion such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displaye d is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, clic k the Question button to return to the question. Overview You are a data scientist for Fabrikam Residences, a company specializing in quality private and commer cial property in the United States. Fabrikam Residences is considering expanding into Europe and has asked you to investigate prices for private residences in major European cities. You use Azure Machine Learning Stu dio to measure the median value of properties. You produce a regression model to predict property prices by u sing the Linear Regression and Bayesian Linear Regressio n modules. Datasets There are two datasets in CSV format that contain p roperty details for two cities, London and Paris. Y ou add both files to Azure Machine Learning Studio as sepa rate datasets to the starting point for an experime nt. Both datasets contain the following columns: D283ABFBEDB32CDCE3B3406B9C29DB2F An initial investigation shows that the datasets ar e identical in structure apart from the MedianValue column. The smaller Paris dataset contains the MedianValue in text format, whereas the larger London dataset c ontains the MedianValue in numerical format. Data issues Missing values The AccessibilityToHighway column in both datasets contains missing values. The missing data must be replaced with new data so that it is modeled condit ionally using the other variables in the data befor e filling in the missing values. Columns in each dataset contain missing and null va lues. The datasets also contain many outliers. The Age column has a high proportion of outliers. You need to remove the rows that have outliers in the Age co lumn. The MedianValue and AvgRoomsInHouse columns both ho ld data in numeric format. 
You need to select a feature selection algorithm to analyze the relation ship between the two columns in more detail. Model fit The model shows signs of overfitting. You need to p roduce a more refined regression model that reduces the overfitting. Experiment requirements You must set up the experiment to cross-validate th e Linear Regression and Bayesian Linear Regression modules to evaluate performance. In each case, the predictor of the dataset is the column named MedianValue. You must ensure that the datatype of t he MedianValue column of the Paris dataset matches the structure of the London dataset. You must prioritize the columns of data for predict ing the outcome. You must use non-parametric statis tics to measure relationships. You must use a feature selection algorithm to analy ze the relationship between the MedianValue and AvgRoomsInHouse columns. Model training Permutation Feature Importance D283ABFBEDB32CDCE3B3406B9C29DB2F Given a trained model and a test dataset, you must compute the Permutation Feature Importance scores o f feature variables. You must be determined the absol ute fit for the model. Hyperparameters You must configure hyperparameters in the model lea rning process to speed the learning phase. In addit ion, this configuration should cancel the lowest perform ing runs at each evaluation interval, thereby direc ting effort and resources towards models that are more likely t o be successful. You are concerned that the model might not efficien tly use compute resources in hyperparameter tuning. You also are concerned that the model might prevent an increase in the overall tuning time. Therefore, mus t implement an early stopping criterion on models tha t provides savings without terminating promising jo bs. Testing You must produce multiple partitions of a dataset b ased on sampling using the Partition and Sample mod ule in Azure Machine Learning Studio. Cross-validation You must create three equal partitions for cross-va lidation. You must also configure the cross-validat ion process so that the rows in the test and training d atasets are divided evenly by properties that are n ear each city's main river. You must complete this task befo re the data goes through the sampling process. Linear regression module When you train a Linear Regression module, you must determine the best features to use in a model. You can choose standard metrics provided to measure perform ance before and after the feature importance proces s completes. The distribution of features across mult iple training models must be consistent. Data visualization You need to provide the test results to the Fabrika m Residences team. You create data visualizations t o aid in presenting the results. You must produce a Receiver Operating Characteristi c (ROC) curve to conduct a diagnostic test evaluati on of the model. You need to select appropriate methods f or producing the ROC curve in Azure Machine Learnin g Studio to compare the Two-Class Decision Forest and the Two-Class Decision Jungle modules with one another." }, { "question": "DRAG DROP You need to correct the model fit issue. Which three actions should you perform in sequence? To answer, move the appropriate actions from the l ist of actions to the answer area and arrange them in the correct order. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Step 1: Augment the data Scenario: Columns in each dataset contain missing a nd null values. 
The datasets also contain many outl iers. Step 2: Add the Bayesian Linear Regression module. Scenario: You produce a regression model to predict property prices by using the Linear Regression and Bayesian Linear Regression modules. Step 3: Configure the regularization weight. Regularization typically is used to avoid overfitti ng. For example, in L2 regularization weight, type the value to use as the weight for L2 regularization. We recomme nd that you use a non-zero value to avoid overfitti ng. Scenario: Model fit: The model shows signs of overfitting. Yo u need to produce a more refined regression model t hat reduces the overfitting. Incorrect Answers: Multiclass Decision Jungle module: Decision jungles are a recent extension to decision forests. A decision jungle consists of an ensemble of decision directed acyclic graphs (DAGs). L-BFGS: L-BFGS stands for \"limited memory Broyden-Fletcher- Goldfarb-Shanno\". It can be found in the wwo-Class D283ABFBEDB32CDCE3B3406B9C29DB2F Logistic Regression module, which is used to create a logistic regression model that can be used to pr edict two (and only two) outcomes.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/linear-regression D283ABFBEDB32CDCE3B3406B9C29DB2F Mixed Questions Question Set 1" }, { "question": "DRAG DROP You are planning to host practical training to acqu aint staff with Docker for Windows. Staff devices must support the installation of Dock er. Which of the following are requirements for this in stallation? Answer by dragging the correct options from the list to the answer area. Select and Place: A.", "options": [ "B.", "C.", "D." ], "correct": "", "explanation": "Explanation/Reference:", "references": "https://docs.docker.com/toolbox/toolbox_install_win dows/ https://blogs.technet.microsoft.com/canitpro/2015/0 9/08/step-by-step-enabling-hyper-v-for-use-on-windo ws-10/ https://docs.docker.com/docker-for-windows/install/ D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "HOTSPOT Complete the sentence by selecting the correct opti on in the answer area. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "A Deep Learning Virtual Machine is a pre-configured environment for deep learning using GPU instances.", "references": "" }, { "question": "You need to implement a Data Science Virtual Machin e (DSVM) that supports the Caffe2 deep learning framework. Which of the following DSVM should you create?", "options": [ "A. Windows Server 2012 DSVM", "B. Windows Server 2016 DSVM", "C. Ubuntu 16.04 DSVM", "D. CentOS 7.4 DSVM" ], "correct": "C. Ubuntu 16.04 DSVM", "explanation": "Caffe2 is supported by Data Science Virtual Machine for Linux. Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4. However, only the DSVM on Ubuntu is preconfigured for Caffe2.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/overview" }, { "question": "This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. D283ABFBEDB32CDCE3B3406B9C29DB2F You have been tasked with employing a machine learn ing model, which makes use of a PostgreSQL database and needs GPU processing, to forecast prices. You are preparing to create a virtual machine that has the necessary tools built into it. You need to make use of the correct virtual machine type. 
Recommendation: You make use of a Geo AI Data Scien ce Virtual Machine (Geo-DSVM) Windows edition. Will the requirements be satisfied?", "options": [ "A. Yes", "B. No Correct Answer: B" ], "correct": "", "explanation": "The Azure Geo AI Data Science VM (Geo-DSVM) deliver s geospatial analytics capabilities from Microsoft' s Data Science VM. Specifically, this VM extends the AI and data science toolkits in the Data Science VM by adding ESRI's market-leading ArcGIS Pro Geographic Information System.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/overview" }, { "question": "This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with employing a machine learn ing model, which makes use of a PostgreSQL database and needs GPU processing, to forecast prices. You are preparing to create a virtual machine that has the necessary tools built into it. You need to make use of the correct virtual machine type. Recommendation: You make use of a Deep Learning Vir tual Machine (DLVM) Windows edition. Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "DLVM is a template on top of DSVM image. In terms o f the packages, GPU drivers etc are all there in th e DSVM image. Mostly it is for convenience during cre ation where we only allow DLVM to be created on GPU VM instances on Azure.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/overview D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with employing a machine learn ing model, which makes use of a PostgreSQL database and needs GPU processing, to forecast prices. You are preparing to create a virtual machine that has the necessary tools built into it. You need to make use of the correct virtual machine type. Recommendation: You make use of a Data Science Virt ual Machine (DSVM) Windows edition. Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "In the DSVM, your training models can use deep lear ning algorithms on hardware that's based on graphic s processing units (GPUs). PostgreSQL is available for the following operating systems: Linux (all recent distributions), 64-bit installers available for macOS (OS X) version 10.6 and newer Windows (with installers available for 64-bit versi on; tested on latest versions and back to Windows 2012 R2.", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/data-science-virtual-machine/overview" }, { "question": "DRAG DROP You have been tasked with moving data into Azure Bl ob Storage for the purpose of supporting Azure Mach ine Learning. Which of the following can be used to complete your task? Answer by dragging the correct options from the list to the answer area. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D." 
], "correct": "", "explanation": "You can move data to and from Azure Blob storage us ing different technologies: Azure Storage-Explorer AzCopy Python SSIS Reference: https://docs.microsoft.com/en-us/azure/machine-lear ning/team-data-science-process/move-azure-blob", "references": "" }, { "question": "HOTSPOT Complete the sentence by selecting the correct opti on in the answer area. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Use the Convert to ARFF module in Azure Machine Lea rning Studio, to convert datasets and results in Az ure Machine Learning to the attribute-relation file for mat used by the Weka toolset. This format is known as ARFF. The ARFF data specification for Weka supports multi ple machine learning tasks, including data preproce ssing, classification, and feature selection. In this form at, data is organized by entities and their attribu tes, and is contained in a single text file.", "references": "https://docs.microsoft.com/en-us/azure/m achine-learning/studio-module-reference/convert-to- arff D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "You have been tasked with designing a deep learning model, which accommodates the most recent edition of Python, to recognize language. You have to include a suitable deep learning framew ork in the Data Science Virtual Machine (DSVM). Which of the following actions should you take?", "options": [ "A. You should consider including Rattle.", "B. You should consider including TensorFlow.", "C. You should consider including Theano.", "D. You should consider including Chainer.", "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Explanation/Reference:", "references": "https://www.infoworld.com/article/327800 8/what-is-tensorflow-the-machine-learning-library- explained.html QUESTION 248 This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation. You have already configured a k parameter as the nu mber of splits. You now have to configure the k par ameter for the cross-validation with the usual value choic e. Recommendation: You configure the use of the value k=3. Will the requirements be satisfied? D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation. You have already configured a k parameter as the nu mber of splits. You now have to configure the k par ameter for the cross-validation with the usual value choic e. Recommendation: You configure the use of the value k=10. Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Leave One Out (LOO) cross-validation Setting K = n (the number of observations) yields n -fold and is called leave-one out cross-validation (LOO), a special case of the K-fold approach. LOO CV is sometimes useful but typically doesn't sh ake up the data enough. The estimates from each fol d are highly correlated and hence their average can have high variance. This is why the usual choice is K=5 or 10. 
A value of k in this range provides a good compromise for the bias-variance tradeoff.", "references": "" }, { "question": "You construct a machine learning experiment via Azure Machine Learning Studio. You would like to split data into two separate datasets. Which of the following actions should you take?", "options": [ "A. You should make use of the Split Data module.", "B. You should make use of the Group Categorical Values module.", "C. You should make use of the Clip Values module.", "D. You should make use of the Group Data into Bins module." ], "correct": "A. You should make use of the Split Data module.", "explanation": "The Split Data module divides a dataset into two distinct sets, for example a training set and a test set. Rows can be split by percentage, by a regular expression, or by a relative expression. The Group Data into Bins module, by contrast, discretizes numeric values into bins and does not produce two separate datasets.", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data" }, { "question": "You have been tasked with creating a new Azure pipeline via the Machine Learning designer. You have to make sure that the pipeline trains a model using data in a comma-separated values (CSV) file that is published on a website. A dataset for this file does not exist. Data from the CSV file must be ingested into the designer pipeline with the least amount of administrative effort possible. Which of the following actions should you take?", "options": [ "A. You should make use of the Convert to TXT module.", "B. You should add the Copy Data object to the pipeline.", "C. You should add the Import Data object to the pipeline.", "D. You should add the Dataset object to the pipeline." ], "correct": "D. You should add the Dataset object to the pipeline.", "explanation": "The preferred way to provide data to a pipeline is a Dataset object. The Dataset object points to data that lives in or is accessible from a datastore or at a web URL. The Dataset class is abstract, so you will create an instance of either a FileDataset (referring to one or more files) or a TabularDataset that is created from one or more files with delimited columns of data. Example: from azureml.core import Dataset iris_tabular_dataset = Dataset.Tabular.from_delimited_files([(def_blob_store, 'train-dataset/iris.csv')])", "references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline" }, { "question": "This question is included in a number of questions that depicts the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are in the process of creating a machine learning model. Your dataset includes rows with null and missing values. 
You plan to make use of the Clean Missing Data modu le in Azure Machine Learning Studio to detect and f ix the null and missing values in the dataset. Recommendation: You make use of the Custom substitu tion value option. Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Explanation/Reference:", "references": "https://docs.microsoft.com/en-us/azure/m achine-learning/studio-module-reference/clean-missi ng- data D283ABFBEDB32CDCE3B3406B9C29DB2F" }, { "question": "This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. You are in the process of creating a machine learni ng model. Your dataset includes rows with null and missing values. You plan to make use of the Clean Missing Data modu le in Azure Machine Learning Studio to detect and f ix the null and missing values in the dataset. Recommendation: You make use of the Remove entire r ow option. Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Remove entire row: Completely removes any row in th e dataset that has one or more missing values. This is useful if the missing value can be considered rando mly missing.", "references": "https://docs.microsoft.com/en-us/azure/m achine-learning/studio-module-reference/clean-missi ng- data" }, { "question": "You need to consider the underlined segment to esta blish whether it is accurate. To transform a categorical feature into a binary in dicator, you should make use of the Clean Missing D ata module. D283ABFBEDB32CDCE3B3406B9C29DB2F Select \"No adjustment required\" if the underlined s egment is accurate. If the underlined segment is in accurate, select the accurate option.", "options": [ "A. No adjustment required.", "B. Convert to Indicator Values", "C. Apply SQL Transformation", "D. Group Categorical Values" ], "correct": "B. Convert to Indicator Values", "explanation": "Use the Convert to Indicator Values module in Azure Machine Learning Studio. The purpose of this modul e is to convert columns that contain categorical values into a series of binary indicator columns that can more easily be used as features in a machine learning model.", "references": "https://docs.microsoft.com/en-us/azure/m achine-learning/studio-module-reference/convert-to- indicator-values" }, { "question": "You need to consider the underlined segment to esta blish whether it is accurate. To improve the amount of low incidence cases in a d ataset, you should make use of the SMOTE module. Select \"No adjustment required\" if the underlined s egment is accurate. If the underlined segment is in accurate, select the accurate option.", "options": [ "A. No adjustment required.", "B. Remove Duplicate Rows", "C. Join Data", "D. Edit Metadata" ], "correct": "A. No adjustment required.", "explanation": "Use the SMOTE module in Azure Machine Learning Stud io to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases t han simply duplicating existing cases.", "references": "https://docs.microsoft.com/en-us/azure/m achine-learning/studio-module-reference/smote" }, { "question": "HOTSPOT You need to consider the underlined segment to esta blish whether it is accurate. Hot Area: D283ABFBEDB32CDCE3B3406B9C29DB2F", "options": [ "A.", "B. C.", "D." 
], "correct": "", "explanation": "The box-plot algorithm can be used to display outli ers.", "references": "https://medium.com/analytics-vidhya/what -is-an-outliers-how-to-detect-and-remove-them-which - algorithm-are-sensitive-towards-outliers-2d501993d5 9" }, { "question": "You are planning to host practical training to acqu aint learners with data visualization creation usin g Python. Learner devices are able to connect to the internet . Learner devices are currently NOT configured for Py thon development. Also, learners are unable to inst all software on their devices as they lack administrato r permissions. Furthermore, they are unable to acce ss Azure subscriptions. It is imperative that learners are able to execute Python-based data visualization code. Which of the following actions should you take?", "options": [ "A. You should consider configuring the use of Azure Container Instance.", "B. You should consider configuring the use of Azure BatchAI.", "C. You should consider configuring the use of Azure Notebooks.", "D. You should consider configuring the use of Azure Kubernetes Service." ], "correct": "", "explanation": "Explanation/Reference:", "references": "https://notebooks.azure.com/" }, { "question": "HOTSPOT Complete the sentence by selecting the correct opti on in the answer area. Hot Area:", "options": [ "A.", "B.", "C.", "D." ], "correct": "", "explanation": "Replace using Probabilistic PCA: Compared to other options, such as Multiple Imputation using Chained Equations (MICE), this option has the advantage of not requiring the application of predictors for eac h column. Instead, it approximates the covariance for the ful l dataset. Therefore, it might offer better perform ance for D283ABFBEDB32CDCE3B3406B9C29DB2F datasets that have missing values in many columns.", "references": "https://docs.microsoft.com/en-us/azure/m achine-learning/studio-module-reference/clean-missi ng- data" }, { "question": "You have recently concluded the construction of a b inary classification machine learning model. You are currently assessing the model. You want to make use of a visualization that allows for precisi on to be used as the measurement for the assessment. Which of the following actions should you take?", "options": [ "A. You should consider using Venn diagram visualizat ion.", "B. You should consider using Receiver Operating Char acteristic (ROC) curve visualization.", "C. You should consider using Box plot visualization.", "D. You should consider using the Binary classificati on confusion matrix visualization.", "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Explanation/Reference:", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/how-to-understand-automated-ml#confusion-matri x QUESTION 261 This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation. You have already configured a k parameter as the nu mber of splits. You now have to configure the k par ameter for the cross-validation with the usual value choic e. Recommendation: You configure the use of the value k=1. Will the requirements be satisfied?" }, { "question": "DRAG DROP You are in the process of constructing a regression model. You would like to make it a Poisson regression mode l. 
{
"question": "DRAG DROP You are in the process of constructing a regression model. You would like to make it a Poisson regression model. To achieve your goal, the label values need to meet certain conditions. Which of the following are relevant conditions with regards to the label data? Answer by dragging the correct options from the list to the answer area. Select and Place:",
"options": [
"A.",
"B.",
"C.",
"D."
],
"correct": "",
"explanation": "Poisson regression is intended for use in regression models that are used to predict numeric values, typically counts. Therefore, you should use this module to create your regression model only if the values you are trying to predict fit the following conditions: The response variable has a Poisson distribution. Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels. A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression" }, {
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are in the process of carrying out feature engineering on a dataset. You want to add a feature to the dataset and fill the column value. Recommendation: You must make use of the Group Categorical Values Azure Machine Learning Studio module. Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "B. No",
"explanation": "",
"references": "" }, {
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are in the process of carrying out feature engineering on a dataset. You want to add a feature to the dataset and fill the column value. Recommendation: You must make use of the Join Data Azure Machine Learning Studio module. Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "B. No",
"explanation": "",
"references": "" }, {
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are in the process of carrying out feature engineering on a dataset. You want to add a feature to the dataset and fill the column value. Recommendation: You must make use of the Edit Metadata Azure Machine Learning Studio module. Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "A. Yes",
"explanation": "Typical metadata changes might include marking columns as features.",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/edit-metadata https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/join-data https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-categorical-values" },
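For the Poisson regression question earlier in this group, a minimal scikit-learn sketch; it assumes the label column holds non-negative integer counts, in line with the conditions listed in that explanation, and the file and column names are hypothetical:

import pandas as pd
from sklearn.linear_model import PoissonRegressor

df = pd.read_csv("claims.csv")          # hypothetical data file
X = df[["age", "vehicle_power"]]        # hypothetical numeric features
y = df["claim_count"]                   # non-negative integer counts (Poisson-distributed label)

# PoissonRegressor fits a generalized linear model with a Poisson distribution and log link
model = PoissonRegressor(alpha=1e-3, max_iter=300)
model.fit(X, y)
print(model.predict(X[:5]))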
{
"question": "You have been tasked with ascertaining if two sets of data differ considerably. You will make use of Azure Machine Learning Studio to complete your task. You plan to perform a paired t-test. Which of the following are conditions that must apply to use a paired t-test? (Choose all that apply.)",
"options": [
"A. All scores are independent from each other.",
"B. You have matched pairs of scores.",
"C. The sampling distribution of d is normal.",
"D. The sampling distribution of x1 - x2 is normal."
],
"correct": "B. You have matched pairs of scores. C. The sampling distribution of d is normal.",
"explanation": "A paired t-test applies when the same subjects are measured twice, giving matched pairs of scores, and it assumes that the distribution of the differences (d) is approximately normal.",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/test-hypothesis-using-t-test" }, {
"question": "You want to train a classification model using data located in a comma-separated values (CSV) file. The classification model will be trained via the Automated Machine Learning interface using the Classification task type. You have been informed that only linear models need to be assessed by the Automated Machine Learning. Which of the following actions should you take?",
"options": [
"A. You should disable deep learning.",
"B. You should enable automatic featurization.",
"C. You should disable automatic featurization.",
"D. You should set the task type to Forecasting."
],
"correct": "C. You should disable automatic featurization.",
"explanation": "",
"references": "https://econml.azurewebsites.net/spec/estimation/dml.html https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-models" }, {
"question": "You are preparing to train a regression model via automated machine learning. The data available to you has features with missing values, as well as categorical features with few discrete values. You want to make sure that automated machine learning is configured as follows: missing values must be automatically imputed. categorical features must be encoded as part of the training task. Which of the following actions should you take?",
"options": [
"A. You should make use of the featurization parameter with the 'auto' value pair.",
"B. You should make use of the featurization parameter with the 'off' value pair.",
"C. You should make use of the featurization parameter with the 'on' value pair.",
"D. You should make use of the featurization parameter with the 'FeaturizationConfig' value pair."
],
"correct": "A. You should make use of the featurization parameter with the 'auto' value pair.",
"explanation": "When featurization is set to 'auto', automated machine learning applies featurization automatically as part of the training task, which includes imputing missing values and encoding categorical features.",
"references": "" }, {
"question": "",
"options": [
"A. Fast Forest Quantile Regression",
"B. Poisson Regression",
"C. Boosted Decision Tree Regression",
"D. Linear Regression"
],
"correct": "C. Boosted Decision Tree Regression",
"explanation": "Mean absolute error (MAE) measures how close the predictions are to the actual outcomes; thus, a lower score is better.",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/boosted-decision-tree-regression https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/linear-regression" }, {
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You have been tasked with constructing a machine learning model that translates language text into a different language text. The machine learning model must be constructed and trained to learn the sequence of the text. Recommendation: You make use of Convolutional Neural Networks (CNNs). Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "B. No",
"explanation": "",
"references": "" },
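For the automated machine learning featurization question above, a minimal sketch of an AutoMLConfig with featurization set to 'auto' using the azureml SDK; the workspace, registered dataset, label column, and compute cluster names are hypothetical placeholders:

from azureml.core import Workspace, Dataset
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_dataset = Dataset.get_by_name(ws, "training-dataset")        # hypothetical registered dataset
compute_target = ComputeTarget(workspace=ws, name="cpu-cluster")   # hypothetical existing cluster

automl_config = AutoMLConfig(
    task="regression",
    primary_metric="normalized_root_mean_squared_error",
    training_data=train_dataset,
    label_column_name="target",        # hypothetical label column
    featurization="auto",              # impute missing values and encode categorical features
    compute_target=compute_target,
    n_cross_validations=5,
)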
No", "explanation": "Explanation/Reference:", "references": "" }, { "question": "This question is included in a number of questions that depicts the identical set-up. However, every q uestion has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with constructing a machine le arning model that translates language text into a d ifferent language text. The machine learning model must be constructed and trained to learn the sequence of the. Recommendation: You make use of Generative Adversar ial Networks (GANs). Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "B. No", "explanation": "Explanation/Reference:", "references": "" }, { "question": "has a distinctive result. Establish if the recommen dation satisfies the requirements. You have been tasked with constructing a machine le arning model that translates language text into a d ifferent language text. The machine learning model must be constructed and trained to learn the sequence of the. D283ABFBEDB32CDCE3B3406B9C29DB2F Recommendation: You make use of Recurrent Neural Ne tworks (RNNs). Will the requirements be satisfied?", "options": [ "A. Yes", "B. No" ], "correct": "A. Yes", "explanation": "Note: RNNs are designed to take sequences of text a s inputs or return sequences of text as outputs, or both. They're called recurrent because the network's hidd en layers have a loop in which the output and cell state from each time step become inputs at the next time step. This recurrence serves as a form of memory. It all ows contextual information to flow through the network so that relevant outputs from previous time steps c an be applied to network operations at the current time s tep.", "references": "https://towardsdatascience.com/language-translation -with-rnns-d84d43b40571" }, { "question": "DRAG DROP You have been tasked with evaluating the performanc e of a binary classification model that you created . You need to choose evaluation metrics to achieve yo ur goal. Which of the following are the metrics you would ch oose? Answer by dragging the correct options from t he list to the answer area. Select and Place: D283ABFBEDB32CDCE3B3406B9C29DB2F A.", "options": [ "B.", "C.", "D.", "B. C.", "D." ], "correct": "", "explanation": "Explanation Explanation/Reference:", "references": "https://docs.microsoft.com/en-us/azure/machine-lear ning/studio-module-reference/two-class-neural-netwo rk" }, { "question": "You make use of Azure Machine Learning Studio to cr eate a binary classification model. You are preparing to carry out a parameter sweep of the model to tune hyperparameters. You have to mak e sure that the sweep allows for every possible combi nation of hyperparameters to be iterated. Also, the computing resources needed to carry out the sweep m ust be reduced. Which of the following actions should you take?", "options": [ "A. You should consider making use of the Selective g rid sweep mode.", "B. You should consider making use of the Measured gr id sweep mode.", "C. You should consider making use of the Entire grid sweep mode.", "D. You should consider making use of the Random grid sweep mode." ], "correct": "D. 
{
"question": "You make use of Azure Machine Learning Studio to create a binary classification model. You are preparing to carry out a parameter sweep of the model to tune hyperparameters. You have to make sure that the sweep allows for every possible combination of hyperparameters to be iterated. Also, the computing resources needed to carry out the sweep must be reduced. Which of the following actions should you take?",
"options": [
"A. You should consider making use of the Selective grid sweep mode.",
"B. You should consider making use of the Measured grid sweep mode.",
"C. You should consider making use of the Entire grid sweep mode.",
"D. You should consider making use of the Random grid sweep mode."
],
"correct": "D. You should consider making use of the Random grid sweep mode.",
"explanation": "Maximum number of runs on random grid: This option also controls the number of iterations over a random sampling of parameter values, but the values are not generated randomly from the specified range; instead, a matrix is created of all possible combinations of parameter values and a random sampling is taken over the matrix. This method is more efficient and less prone to regional oversampling or undersampling. If you are training a model that supports an integrated parameter sweep, you can also set a range of seed values to use and iterate over the random seeds as well. This is optional, but can be useful for avoiding bias introduced by seed selection. C: Entire grid: When you select this option, the module loops over a grid predefined by the system, to try different combinations and identify the best learner. This option is useful for cases where you don't know what the best parameter settings might be and want to try all possible combinations of values.",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters" }, {
"question": "You are in the process of constructing a deep convolutional neural network (CNN). The CNN will be used for image classification. You notice that the CNN model you constructed displays hints of overfitting. You want to make sure that overfitting is minimized, and that the model converges to an optimal fit. Which of the following is TRUE with regards to achieving your goal?",
"options": [
"A. You have to add an additional dense layer with 512 input units, and reduce the amount of training data.",
"B. You have to add L1/L2 regularization, and reduce the amount of training data.",
"C. You have to reduce the amount of training data and make use of training data augmentation.",
"D. You have to add L1/L2 regularization, and make use of training data augmentation."
],
"correct": "D. You have to add L1/L2 regularization, and make use of training data augmentation.",
"explanation": "Weight regularization provides an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set. Keras provides a weight regularization API that allows you to add a penalty for weight size to the loss function. Three different regularizer instances are provided; they are: L1: Sum of the absolute weights. L2: Sum of the squared weights. L1L2: Sum of the absolute and the squared weights. Training data augmentation expands the effective training set, which further reduces overfitting, whereas reducing the amount of training data makes overfitting worse. Because a fully connected layer occupies most of the parameters, it is prone to overfitting. One method to reduce overfitting is dropout. At each training stage, individual nodes are either \"dropped out\" of the net with probability 1-p or kept with probability p, so that a reduced network is left; incoming and outgoing edges to a dropped-out node are also removed. By avoiding training all nodes on all training data, dropout decreases overfitting.",
"references": "https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-regularization/ https://en.wikipedia.org/wiki/Convolutional_neural_network" },
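For the overfitting question above, a minimal TensorFlow 2.x Keras sketch combining L2 weight regularization with training data augmentation; the layer sizes, image shape, and augmentation settings are illustrative assumptions, not values from the question:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Augmentation layers expand the effective training set with random transforms
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),          # hypothetical input image size
    data_augmentation,
    layers.Conv2D(32, 3, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),   # L2 penalty on weights
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])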
{
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are planning to make use of Azure Machine Learning designer to train models. You need to choose a suitable compute type. Recommendation: You choose Attached compute. Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "B. No",
"explanation": "",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-studio" }, {
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are planning to make use of Azure Machine Learning designer to train models. You need to choose a suitable compute type. Recommendation: You choose Inference cluster. Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "B. No",
"explanation": "",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-studio" }, {
"question": "This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements. You are planning to make use of Azure Machine Learning designer to train models. You need to choose a suitable compute type. Recommendation: You choose Compute cluster. Will the requirements be satisfied?",
"options": [
"A. Yes",
"B. No"
],
"correct": "A. Yes",
"explanation": "Training pipelines in the designer run on an Azure Machine Learning compute cluster; inference clusters are intended for deploying trained models as real-time endpoints.",
"references": "https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-studio" }, {
"question": "You are making use of the Azure Machine Learning designer to construct an experiment. After dividing a dataset into training and testing sets, you configure the algorithm to be Two-Class Boosted Decision Tree. You are preparing to ascertain the Area Under the Curve (AUC). Which of the following is the sequential combination of modules required to achieve your goal?",
"options": [
"A. Train, Score, Evaluate.",
"B. Score, Evaluate, Train.",
"C. Evaluate, Export Data, Train.",
"D. Train, Score, Export Data."
],
"correct": "A. Train, Score, Evaluate.",
"explanation": "The model must first be trained with the Train Model module, then predictions are generated with the Score Model module, and finally the AUC is computed by the Evaluate Model module.",
"references": "" } ]
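As a closing illustration for the Train, Score, Evaluate question above, a minimal scikit-learn analogue of that module order, training a boosted-tree classifier, scoring the test set, and then evaluating the AUC; the dataset and model choice are illustrative placeholders:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train (analogous to the Train Model module)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Score (analogous to the Score Model module)
scores = model.predict_proba(X_test)[:, 1]

# Evaluate (analogous to the Evaluate Model module): Area Under the Curve
print("AUC:", roc_auc_score(y_test, scores))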