File size: 218,174 Bytes
6493548
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
2027
2028
2029
2030
2031
2032
2033
2034
2035
2036
2037
2038
2039
2040
2041
2042
2043
2044
2045
2046
2047
2048
2049
2050
2051
2052
2053
2054
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
2070
2071
2072
2073
2074
2075
2076
2077
2078
2079
2080
2081
2082
2083
2084
2085
2086
2087
2088
2089
2090
2091
2092
2093
2094
2095
2096
2097
2098
2099
2100
2101
2102
2103
2104
2105
2106
2107
2108
2109
2110
2111
2112
2113
2114
2115
2116
2117
2118
2119
2120
2121
2122
2123
2124
2125
2126
2127
2128
2129
2130
2131
2132
2133
2134
2135
2136
2137
2138
2139
2140
2141
2142
2143
2144
2145
2146
2147
2148
2149
2150
2151
2152
2153
2154
2155
2156
2157
2158
2159
2160
2161
2162
2163
2164
2165
2166
2167
2168
2169
2170
2171
2172
2173
2174
2175
2176
2177
2178
2179
2180
2181
2182
2183
2184
2185
2186
2187
2188
2189
2190
2191
2192
2193
2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204
2205
2206
2207
2208
2209
2210
2211
2212
2213
2214
2215
2216
2217
2218
2219
2220
2221
2222
2223
2224
2225
2226
2227
2228
2229
2230
2231
2232
2233
2234
2235
2236
2237
2238
2239
2240
2241
2242
2243
2244
2245
2246
2247
2248
2249
2250
2251
2252
2253
2254
2255
2256
2257
2258
2259
2260
2261
2262
2263
2264
2265
2266
2267
2268
2269
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
2280
2281
2282
2283
2284
2285
2286
2287
2288
2289
2290
2291
2292
2293
2294
2295
2296
2297
2298
2299
2300
2301
2302
2303
2304
2305
2306
2307
2308
2309
2310
2311
2312
2313
2314
2315
2316
2317
2318
2319
2320
2321
2322
2323
2324
2325
2326
2327
2328
2329
2330
2331
2332
2333
2334
2335
2336
2337
2338
2339
2340
2341
2342
2343
2344
2345
2346
2347
2348
2349
2350
2351
2352
2353
2354
2355
2356
2357
2358
2359
2360
2361
2362
2363
[
    {
        "question": "A Machine Learning Specialist is working with multi ple data sources containing billions of records tha t need to be joined. What feature engineering and model devel opment approach should the Specialist take with a d ataset this large?",
        "options": [
            "A. Use an Amazon SageMaker notebook for both feature  engineering and model development",
            "B. Use an Amazon SageMaker notebook for feature engi neering and Amazon ML for model development",
            "C. Use Amazon EMR for feature engineering and Amazon  SageMaker SDK for model development",
            "D. Use Amazon ML for both feature engineering and mo del development."
        ],
        "correct": "B. Use an Amazon SageMaker notebook for feature engi neering and Amazon ML for model development",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist has completed a proof  of concept for a company using a small data sample  and now the Specialist is ready to implement an end-to- end solution in AWS using Amazon SageMaker The historical training data is stored in Amazon RDS Which approach should the Specialist use for traini ng a model using that data?",
        "options": [
            "A. Write a direct connection to the SQL database wit hin the notebook and pull data in",
            "B. Push the data from Microsoft SQL Server to Amazon  S3 using an AWS Data Pipeline and provide the S3",
            "C. Move the data to Amazon DynamoDB and set up a con nection to DynamoDB within the notebook to pull",
            "D. Move the data to Amazon ElastiCache using AWS DMS  and s t up a connection within the notebook to pul l"
        ],
        "correct": "B. Push the data from Microsoft SQL Server to Amazon  S3 using an AWS Data Pipeline and provide the S3",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "Which of the following metrics should a Machine Lea rning Specialist generally use to compare/evaluate machine learning classification models against each  other?",
        "options": [
            "A. Recall",
            "B. Misclassification rate",
            "C. Mean absolute percentage error (MAPE)",
            "D. Area Under the ROC Curve (AUC)",
            "A. Create a SageMaker endpoint and configuration for  the new model version. Redirect production traffic  to the",
            "B. Create a SageMaker endpoint and configuration for  the new model version. Redirect production traffic  to the",
            "C. Update the existing SageMaker endpoint to use a n ew configuration that is weighted to send 5% of the",
            "D. Update the existing SageMaker endpoint to use a n ew configuration that is weighted to send 100% of t he"
        ],
        "correct": "A. Create a SageMaker endpoint and configuration for  the new model version. Redirect production traffic  to the",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter Which machine learning app roach should be used to solve this problem?",
        "options": [
            "A. Logistic regression",
            "B. Random Cut Forest (RCF)",
            "C. Principal component analysis (PCA)",
            "D. Linear regression"
        ],
        "correct": "D. Linear regression",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturing company has structured and unstruct ured data stored in an Amazon S3 bucket A Machine Learning Specialist wants to use SQL to run queries  on this data. Which solution requires the LEAST ef fort to be able to query this data?",
        "options": [
            "A. Use AWS Data Pipeline to transform th data and Am azon RDS to run queries.",
            "B. Use AWS Glue to catalogue the data and Amazon Ath ena to run queries",
            "C. Use AWS Batch to run ETL on he data and Amazon Au rora to run the queries",
            "D. Use AWS Lambda to transform the data and Amazon K inesis Data Analytics to run queries"
        ],
        "correct": "D. Use AWS Lambda to transform the data and Amazon K inesis Data Analytics to run queries",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is packaging a custom  ResNet model into a Docker container so the compan y can leverage Amazon SageMaker for training The Spec ialist is using Amazon EC2 P3 instances to train th e model and needs to properly configure the Docker co ntainer to leverage the NVIDIA GPUs What does the Specialist need to do1?",
        "options": [
            "A. Bundle the NVIDIA drivers with the Docker image",
            "B. Build the Docker container to be NVIDIA-Docker co mpatible",
            "C. Organize the Docker container's file structure to  execute on GPU instances.",
            "D. Set the GPU flag in the Amazon SageMaker Create T rainingJob request body"
        ],
        "correct": "A. Bundle the NVIDIA drivers with the Docker image",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A large JSON dataset for a project has been uploade d to a private Amazon S3 bucket The Machine Learnin g Specialist wants to securely access and explore the  data from an Amazon SageMaker notebook instance A new VPC was created and assigned to the Specialist How can the privacy and integrity of the data store d in Amazon S3 be maintained while granting access to the Specialist for analysis?",
        "options": [
            "A. Launch the SageMaker notebook instance within the  VPC with SageMaker-provided internet access",
            "B. Launch the SageMaker notebook instance within the  VPC and create an S3 VPC endpoint for the notebook",
            "C. Launch the SageMaker notebook instance within the  VPC and create an S3 VPC endpoint for the notebook",
            "D. Launch the SageMaker notebook instance within the  VPC with SageMaker-provided internet access"
        ],
        "correct": "B. Launch the SageMaker notebook instance within the  VPC and create an S3 VPC endpoint for the notebook",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "Given the following confusion matrix for a movie cl assification model, what is the true class frequenc y for Romance and the predicted class frequency for Adventure?",
        "options": [
            "A. The true class frequency for Romance is 77.56% an d the predicted class frequency for Adventure is 20",
            "B. The true class frequency for Romance is 57.92% an d the predicted class frequency for Adventure is 13 12%",
            "C. The true class frequency for Romance is 0 78 and the predicted class frequency for Adventure is (0 4 7 -",
            "D. The true class frequency for Romance is 77.56% * 0.78 and the predicted class frequency for Adventur e is"
        ],
        "correct": "B. The true class frequency for Romance is 57.92% an d the predicted class frequency for Adventure is 13 12%",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/machine-learning/latest /dg/multiclass-model-insights.html",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is building a supervi sed model that will evaluate customers' satisfactio n with their mobile phone service based on recent usage. T he model's output should infer whether or not a cus tomer is likely to switch to a competitor in the next 30 days Which of the following modeling techniques should t he Specialist use1?",
        "options": [
            "A. Time-series prediction",
            "B. Anomaly detection",
            "C. Binary classification",
            "D. Regression",
            "A. Increase the randomization of training data in th e mini-batches used in training.",
            "B. Allocate a higher proportion of the overall data to the training dataset",
            "C. Apply L1 or L2 regularization and dropouts to the  training.",
            "D. Reduce the number of layers and units (or neurons ) from the deep learning network."
        ],
        "correct": "C. Apply L1 or L2 regularization and dropouts to the  training.",
        "explanation": "Explanation/Reference: If this is a ComputerVision problem augmentation ca n help and we may consider A an option. However in analyzing customer historic data, there is no easy way to increase randomization in training. If you g o deep into modelling and coding. When you build model with ten sorflow/pytorch, most of the time the trainloader i s already sampling in data in random manner (with shuffle ena ble). What we usually do to reduce overfitting is b y adding dropout. https://docs.aws.amazon.com/machine-learni ng/latest/dg/model-fit-underfitting-vsoverfitting. html",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist was given a dataset c onsisting of unlabeled data The Specialist must cre ate a model that can help the team classify the data into  different buckets What model should be used to com plete this work?",
        "options": [
            "A. K-means clustering",
            "B. Random Cut Forest (RCF)",
            "C. XGBoost",
            "D. BlazingText"
        ],
        "correct": "A. K-means clustering",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A retail company intends to use machine learning to  categorize new products A labeled dataset of curre nt products was provided to the Data Science team The dataset includes 1 200 products The labeled dataset  has 15 features for each product such as title dimensio ns, weight, and price Each product is labeled as be longing to one of six categories such as books, games, electro nics, and movies. Which model should be used for categorizing new pro ducts using the provided dataset for training?",
        "options": [
            "A. An XGBoost model where the objective parameter is  set to multi: softmax",
            "B. A deep convolutional neural network (CNN) with a softmax activation function for the last layer",
            "C. A regression forest where the number of trees is set equal to the number of product categories",
            "D. A DeepAR forecasting model based on a recurrent neu ral network (RNN) Correct Answer: A"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is building a model t o predict future employment rates based on a wide r ange of economic factors While exploring the data, the Spec ialist notices that the magnitude of the input feat ures vary greatly The Specialist does not want variables with  a larger magnitude to dominate the model. What sho uld the Specialist do to prepare the data for model trainin g'?",
        "options": [
            "A. Apply quantile binning to group the data into cat egorical bins to keep any relationships in the data  by",
            "B. Apply the Cartesian product transformation to cre ate new combinations of fields that are independent  of the",
            "C. Apply normalization to ensure each field will hav e a mean of 0 and a variance of 1 to remove any sig nificant",
            "D. Apply the orthogonal sparse Diagram (OSB) transfo rmation to apply a fixed-size sliding window to gen erate"
        ],
        "correct": "C. Apply normalization to ensure each field will hav e a mean of 0 and a variance of 1 to remove any sig nificant",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/machine-learning/latest /dg/data-transformationsreference. html",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist prepared the followin g graph displaying the results of k-means for k = [ 1:10] Considering the graph, what is a reasonable selecti on for the optimal choice of k? A. 1",
        "options": [
            "B. 4",
            "C. 7",
            "D. 10"
        ],
        "correct": "C. 7",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is using Amazon Polly to translate plaint ext documents to speech for automated company announcements However company acronyms are being mi spronounced in the current documents How should a Machine Learning Specialist address this issue fo r future documents'?",
        "options": [
            "A. Convert current documents to SSML with pronunciat ion tags",
            "B. Create an appropriate pronunciation lexicon.",
            "C. Output speech marks to guide in pronunciation",
            "D. Use Amazon Lex to preprocess the text files for p ronunciation"
        ],
        "correct": "A. Convert current documents to SSML with pronunciat ion tags",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/polly/latest/dg/ssml.ht ml",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is using Apache Spark  for pre-processing training data As part of the Sp ark pipeline, the Specialist wants to use Amazon SageMa ker for training a model and hosting it Which of th e following would the Specialist do to integrate the Spark application with SageMaker? (Select THREE )",
        "options": [
            "A. Download the AWS SDK for the Spark environment",
            "B. Install the SageMaker Spark library in the Spark environment.",
            "C. Use the appropriate estimator from the SageMaker Spark Library to train a model.",
            "D. Compress the training data into a ZIP file and up load it to a pre-defined Amazon S3 bucket."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is working with a lar ge cybersecurily company that manages security even ts in real time for companies around the world The cybers ecurity company wants to design a solution that wil l allow it to use machine learning to score malicious events a s anomalies on the data as it is being ingested The company also wants be able to save the results in i ts data lake for later processing and analysis What  is the MOST efficient way to accomplish these tasks'? A. Ingest the data using Amazon Kinesis Data Firehos e, and use Amazon Kinesis Data Analytics Random Cut Forest (RCF) for anomaly detection Then use Kinesis  Data Firehose to stream the results to Amazon S3",
        "options": [
            "B. Ingest the data into Apache Spark Streaming using  Amazon EMR. and use Spark MLlib with kmeans to",
            "C. Ingest the data and store it in Amazon S3 Use AWS  Batch along with the AWS Deep Learning AMIs to tra in",
            "D. Ingest the data and store it in Amazon S3. Have a n AWS Glue job that is triggered on demand transfor m"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist works for a cred t ca rd processing company and needs to predict which transactions may be fraudulent in near-real time. S pecifically, the Specialist must train a model that  returns the probability that a given transact on may be fraudul ent. How should the Specialist frame this business probl em'?",
        "options": [
            "A. Streaming classification",
            "B. Binary classification",
            "C. Multi-category classification",
            "D. Regression classification"
        ],
        "correct": "A. Streaming classification",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "Amazon Connect has recently been tolled out across a company as a contact call center The solution has  been configured to store voice call recordings on Amazon  S3 The content of the voice calls are being analyzed f or the incidents being discussed by the call operat ors Amazon Transcribe is being used to convert the audi o to text, and the output is stored on Amazon S3 Wh ich approach will provide the information required for further analysis?",
        "options": [
            "A. Use Amazon Comprehend with the transcribed files to build the key topics",
            "B. Use Amazon Translate with the transcribed files t o train and build a model for the key topics",
            "C. Use the AWS Deep Learning AMI with Gluon Semantic  Segmentation on the transcribed files to train and",
            "D. Use the Amazon SageMaker k-Nearest-Neighbors (kNN ) algorithm on the transcribed files to generate a",
            "A. Perform one-hot encoding on highly correlated fea tures",
            "B. Use matrix multiplication on highly correlated fe atures.",
            "C. Create a new feature space using principal compon ent analysis (PCA)",
            "D. Apply the Pearson correlation coefficient"
        ],
        "correct": "B. Use matrix multiplication on highly correlated fe atures.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist wants to determine th e appropriat SageMakerVariant Invocations Per Insta nce setting for an endpoint automatic scaling configura tion. The Specialist has performed a load test on a  single instance and determined that peak requests per seco nd (RPS) without service degradation is about 20 RP S As this is the first deployment, the Specialist intend s to set the invocation safety factor to 0 5. Based  on the stated parameters and given that the invocations per insta nce setting is measured on a per-minute basis, what  should the Specialist set as the sageMakervariantinvocatio nsPerinstance setting?",
        "options": [
            "A. 10",
            "B. 30",
            "C. 600",
            "D. 2,400"
        ],
        "correct": "C. 600",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist deployed a model that  provides product recommendations on a company's website Initially, the model was performing very we ll and resulted in customers buying more products o n average However within the past few months the Spec ialist has noticed that the effect of product recommendations has diminished and customers are st arting to return to their original habits of spendi ng less The Specialist is unsure of what happened, as the m odel has not changed from its initial deployment ov er a year ago. Which method should the Specialist try to  improve model performance?",
        "options": [
            "A. The model needs to be completely re-engineered be cause it is unable to handle product inventory chan ges",
            "B. The model's hyper parameters should be periodical ly updated to prevent drift",
            "C. The model should be periodically retrained from s cratch using the original data while adding a regul arization",
            "D. The model should be periodically retrained using the original training data plus new dataas product inventory"
        ],
        "correct": "D. The model should be periodically retrained using the original training data plus new dataas product inventory",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturer of car engines collects data from ca rs as they are being driven The data collected incl udes timestamp, engine temperature, rotations per minute  (RPM), and other sensor readings The company wants  to predict when an engine is going to have a problem s o it can notify drivers in advance to get engine maintenance The engine data is loaded into a data l ake for training Which is the MOST suitable predictive model that ca n be deployed into production'?",
        "options": [
            "A. Add labels over time to indicate which engine fau lts occur at what time in the future to turn this i nto a",
            "B. This data requires an unsupervised learning algor ithm Use Amazon SageMaker k-means to cluster the da ta",
            "C. Add labels over time to indicate which engine fau lts occur at what time in the future to turn this i nto a",
            "D. This data is already formulated as a time series Use Amazon SageMaker seq2seq to model the time seri es."
        ],
        "correct": "B. This data requires an unsupervised learning algor ithm Use Amazon SageMaker k-means to cluster the da ta",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist is working on an application that performs sentiment analysis. The validation accurac y is poor and the Data Scientist thinks that the cause may be  a rich vocabulary and a low average frequency of w ords in the dataset. Which tool should be used to improve the validation  accuracy?",
        "options": [
            "A. Amazon Comprehend syntax analysts and entity dete ction",
            "B. Amazon SageMaker BlazingText allow mode",
            "C. Natural Language Toolkit (NLTK) stemming and stop  word removal",
            "D. Scikit-learn term frequency-inverse document freq uency (TF-IDF)vectorizers"
        ],
        "correct": "A. Amazon Comprehend syntax analysts and entity dete ction",
        "explanation": "Explanation/Reference: https://monkeylearn.com/sentiment-analysis/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is developing recomme ndation engine for a photography blog Given a pictu re, the recommendation engine should show a picture tha t captures similar objects The Specialist would lik e to create a numerical representation feature to perfor m nearest-neighbor searches. What actions would all ow the Specialist to get relevant numerical representation s?",
        "options": [
            "A. Reduce image resolution and use reduced resolutio n pixel values as features",
            "B. Use Amazon Mechanical Turk to label image content  and create a one-hot representation indicating the",
            "C. Run images through a neural network pie-trained o n ImageNet, and collect the feature vectors from th e"
        ],
        "correct": "A. Reduce image resolution and use reduced resolutio n pixel values as features",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A gaming company has launched an online game where people can start playing for free but they need to pay if they choose to use certain features The company nee ds to build an automated system to predict whether or not a new user will become a paid user within 1 year Th e company has gathered a labeled dataset from 1 mil lion users. The training dataset consists of 1.000 posit ive samples (from users who ended up paying within 1 year) and 999.000 negative samples (from users who did no t use any paid features) Each data sample consists of 200 features including user age, device, location, and play patterns Using this dataset for training, the Data Science t eam trained a random forest model that converged wi th over 99% accuracy on the training set However, the predi ction results on a test dataset were not satisfacto ry. Which of the following approaches should the Data Science  team take to mitigate this issue? (Select TWO.)",
        "options": [
            "A. Add more deep trees to the random forest to enabl e the model to learn more features.",
            "B. indicate a copy of the samples in the test databa se in the training dataset",
            "C. Generate more positive samples by duplicating the  positive samples and adding a small amount of nois e to",
            "D. Change the cost function so that false negatives have a higher impact on the cost value than false p ositives"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "While reviewing the histogram for residuals on regr ession evaluation data a Machine Learning Specialis t notices that the residuals do not form a zero-cente red bell shape as shown What does this mean? A. The model might have prediction errors over a ran ge of target values.",
        "options": [
            "B. The dataset cannot be accurately represented usin g the regression model",
            "C. There are too many variables in the model",
            "D. The model is predicting its target values perfect ly."
        ],
        "correct": "D. The model is predicting its target values perfect ly.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates. What is the MOST likely cause of this issue?",
        "options": [
            "A. The class distribution in the dataset is imbalanc ed",
            "B. Dataset shuffling is disabled",
            "C. The batch size is too big",
            "D. The learning rate is very high"
        ],
        "correct": "B. Dataset shuffling is disabled",
        "explanation": "Explanation/Reference: https://towardsdatascience.com/deep-learning-person al-notes-part-1-lesson-2-8946fe970b95",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist observes several perf ormance problems with the training portion of a mac hine learning solution on Amazon SageMaker The solution uses a large training dataset 2 TB in size and is u sing the SageMaker k-means algorithm The observed issues inc lude the unacceptable length of time it takes befor e the training job launches and poor I/O throughput while  training the model. What should the Specialist do to address the perfor mance issues with the current solution?",
        "options": [
            "A. Use the SageMaker batch transform feature",
            "B. Compress the training data into Apache Parquet fo rmat.",
            "C. Ensure that the input mode for the training job i s set to Pipe.",
            "D. Copy the training dataset to an Amazon EFS volume  mounted on the SageMaker instance."
        ],
        "correct": "B. Compress the training data into Apache Parquet fo rmat.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is building a convolu tional neural network (CNN) that will classify 10 t ypes of animals. The Specialist has built a series of layer s in a neural network that will take an input image  of an animal, pass it through a series of convolutional a nd pooling layers, and then finallyit through a den se and fully connected layer with 10 nodes The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that t he input image belongs to each of the 10 classes. Which function will produce the desired output?",
        "options": [
            "A. Dropout",
            "B. Smooth L1 loss",
            "C. Softmax",
            "D. Rectified linear units (ReLU)"
        ],
        "correct": "C. Softmax",
        "explanation": "Explanation/Reference: https://towardsdatascience.com/building-a-convoluti onal-neural-network-cnn-in-keras-329fbbadc5f5",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is building a model t hat will perform time series forecasting using Amaz on SageMaker The Specialist has finished training the model and is now planning to perform load testing o n the endpoint so they can configure Auto Scaling for the  model variant. Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilizatio n during the load test\"?",
        "options": [
            "A. Review SageMaker logs that have been written to A mazon S3 by leveraging Amazon Athena and Amazon",
            "B. Generate an Amazon CloudWatch dashboard o create a single view for the latency, memory utilization, and",
            "C. Build custom Amazon CloudWatch Logs and then leve rage Amazon ES and Kibana to query and visualize",
            "D. Send Amazon CloudWatch Logs that were generated b y Amazon SageMaker lo Amazon ES and use"
        ],
        "correct": "B. Generate an Amazon CloudWatch dashboard o create a single view for the latency, memory utilization, and",
        "explanation": "Explanation Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/mon itoring-cloudwatch.html",
        "references": ""
    },
    {
        "question": "An Amazon SageMaker notebook instance is launched i nto Amazon VPC The SageMaker notebook references data contained in an Amazon S3 bucket in another ac count The bucket is encrypted using SSE-KMS The instance returns an access denied error when trying  to access data in Amazon S3. Which of the following are required to access the b ucket and avoid the access denied error? (Select TH REE )",
        "options": [
            "A. An AWS KMS key policy that allows access to the c ustomer master key (CMK)",
            "B. A SageMaker notebook security group that allows a ccess to Amazon S3",
            "C. An 1AM role that allows access to the specific S3  bucket",
            "D. A permissive S3 bucket policy"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A monitoring service generates 1 TB of scale metric s record data every minute A Research team performs queries on this data using Amazon Athena The querie s run slowly due to the large volume of data, and t he team requires better performance. How should the records be stored in Amazon S3 to im prove query performance?",
        "options": [
            "A. CSV files",
            "B. Parquet files",
            "C. Compressed JSON",
            "D. RecordIO"
        ],
        "correct": "D. RecordIO",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist needs to create a dat a repository to hold a large amount of time-based t raining data for a new model. In the source system, new fil es are added every hour Throughout a single 24-hour period, the volume of hourly updates will change si gnificantly. The Specialist always wants to train o n the last 24 hours of the data. Which type of data repository is  the MOST cost-effective solution?",
        "options": [
            "A. An Amazon EBS-backed Amazon EC2 instance with hou rly directories",
            "B. An Amazon RDS database with hourly table partitio ns",
            "C. An Amazon S3 data lake with hourly object prefixe s",
            "D. An Amazon EMR cluster with hourly hive partitions  on Amazon EBS volumes"
        ],
        "correct": "C. An Amazon S3 data lake with hourly object prefixe s",
        "explanation": "Explanation Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A retail chain has been ingesting purchasing record s from its network of 20,000 stores to Amazon S3 us ing Amazon Kinesis Data Firehose To support training an  improved machine learning model, training records will require new but simple transformations, and some at tributes will be combined The model needs to be ret rained daily. Given the large number of stores and the leg acy data ingestion, which change will require the L EAST amount of development effort?",
        "options": [
            "A. Require that the stores to switch to capturing th eir data locally on AWS Storage Gateway for loading  into",
            "B. Deploy an Amazon EMR cluster running Apache Spark  with the transformation logic, and have the cluste r",
            "C. Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data",
            "D. Insert an Amazon Kinesis Data Analytics stream do wnstream of the Kinesis Data Firehouse stream that"
        ],
        "correct": "D. Insert an Amazon Kinesis Data Analytics stream do wnstream of the Kinesis Data Firehouse stream that",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A city wants to monitor its air quality to address the consequences of air pollution A Machine Learnin g Specialist needs to forecast the air quality in parts per mill ion of contaminates for the next 2 days in the city  As this is a prototype, only daily data from the last year is av ailable. Which model is MOST likely to provide the best resu lts in Amazon SageMaker?",
        "options": [
            "A. Use the Amazon SageMaker k-Nearest-Neighbors (kNN ) algorithm on the single time series consisting of",
            "B. Use Amazon SageMaker Random Cut Forest (RCF) on t he single time series consisting of the full year o f",
            "C. Use the Amazon SageMaker Linear Learner algorithm  on the single time series consisting of the full y ear of",
            "D. Use the Amazon SageMaker Linear Learner algorithm  on the single time series consisting of the full y ear of"
        ],
        "correct": "C. Use the Amazon SageMaker Linear Learner algorithm  on the single time series consisting of the full y ear of",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/build -a model-to-predict-the-impactof-weather-on-urban-a ir- quality-using-amazon-sagemaker/?ref=Welcome.AI",
        "references": ""
    },
    {
        "question": "For the given confusion matrix, what is the recall and precision of the model? A. Recall = 0.92 Precision = 0.84",
        "options": [
            "B. Recall = 0.84 Precision = 0.8",
            "C. Recall = 0.92 Precision = 0.8",
            "D. Recall = 0.8 Precision = 0.92"
        ],
        "correct": "C. Recall = 0.92 Precision = 0.8",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is working with a med ia company to perform classification on popular art icles from the company's website. The company is using ra ndom forests to classify how popular an article wil l be before it is published A sample of the data being u sed is below. Given the dataset, the Specialist wants to convert the Day-Of_Week column to binary values. What techn ique should be used to convert this column to binary val ues.",
        "options": [
            "A. Binarization",
            "B. One-hot encoding",
            "C. Tokenization",
            "D. Normalization transformation"
        ],
        "correct": "B. One-hot encoding",
        "explanation": "Explanation Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company has raw user and transaction data stored in AmazonS3 a MySQL database, and Amazon RedShift A Data Scientist needs to perform an analysis by jo ining the three datasets from Amazon S3, MySQL, and Amazon RedShift, and then calculating the average-o f a few selected columns from the joined data. Which AWS service should the Data Scientist use?",
        "options": [
            "A. Amazon Athena",
            "B. Amazon Redshift Spectrum",
            "C. AWS Glue",
            "D. Amazon QuickSight"
        ],
        "correct": "A. Amazon Athena",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operat ions using Amazon Athena and Amazon S3. The source systems send data in CSV format in real lime The Data Engineering team wants to transform t he data to the Apache Parquet format before storing it  on Amazon S3 Which solution takes the LEAST effort to implement?",
        "options": [
            "A. Ingest .CSV data using Apache Kafka Streams on Am azon EC2 instances and use Kafka Connect S3 to",
            "B. Ingest .CSV data from Amazon Kinesis Data Streams  and use Amazon Glue to convert data into Parquet.",
            "C. Ingest .CSV data using Apache Spark Structured St reaming in an Amazon EMR cluster and use Apache",
            "D. Ingest .CSV data from Amazon Kinesis Data Streams  and use Amazon Kinesis Data Firehose to convert"
        ],
        "correct": "B. Ingest .CSV data from Amazon Kinesis Data Streams  and use Amazon Glue to convert data into Parquet.",
        "explanation": "Explanation/Reference: https://medium.com/searce/convert-csv-json-files-to -apache-parquet-using-aws-glue-a760d177b45f https:/ / github.com/ecloudvalley/Building-a-Data-Lake-with-A WS-Glue-and-Amazon-S3",
        "references": ""
    },
    {
        "question": "An e-commerce company needs a customized training m odel to classify images of its shirts and pants pro ducts The company needs a proof of concept in 2 to 3 days  with good accuracy Which compute choice should the Machine Learning Specialist select to train and ach ieve good accuracy on the model quickly?",
        "options": [
            "A. m5 4xlarge (general purpose)",
            "B. r5.2xlarge (memory optimized)",
            "C. p3.2xlarge (GPU accelerated computing)",
            "D. p3 8xlarge (GPU accelerated computing)"
        ],
        "correct": "C. p3.2xlarge (GPU accelerated computing)",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Marketing Manager at a pet insurance company plan s to launch a targeted marketing campaign on social media to acquire new customers Currently, the compa ny has the following data in Amazon Aurora. Profiles for all past and existing customers. Profiles for all past and existing insured pets. Policy-level information. Premiums received. Claims paid. What steps should be taken to implement a machine l earning model to identify potential new customers o n social media?",
        "options": [
            "A. Use regression on customer profile data to unders tand key characteristics of consumer segments Find",
            "B. Use clustering on customer profile data to unders tand key characteristics of consumer segments Find",
            "C. Use a recommendation engine on customer profile d ata to understand key characteristics of consumer",
            "D. Use a decision tree classifier engine on customer  profile data to understand key characteristics of consumer"
        ],
        "correct": "C. Use a recommendation engine on customer profile d ata to understand key characteristics of consumer",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is running an Amazon SageMaker training j ob that will access data stored in its Amazon S3 bu cket A compliance policy requires that the data never be  transmitted across the internet How should the com pany set up the job?",
        "options": [
            "A. Launch the notebook instances in a public subnet and access the data through the public S3 endpoint",
            "B. Launch the notebook instances in a private subnet  and access the data through a NAT gateway",
            "C. Launch the notebook instances in a public subnet and access the data through a NAT gateway",
            "D. Launch the notebook instances in a private subnet  and access the data through an S3 VPC endpoint."
        ],
        "correct": "D. Launch the notebook instances in a private subnet  and access the data through an S3 VPC endpoint.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is preparing data for  training on Amazon SageMaker The Specialist is transformed into a numpy .array, which appears to b e negatively affecting the speed of the training Wh at should the Specialist do to optimize the data for training  on SageMaker'?",
        "options": [
            "A. Use the SageMaker batch transform feature to tran sform the training data into a DataFrame",
            "B. Use AWS Glue to compress the data into the Apache P arquet format C. Transform the dataset into the Recordio protobuf fo rmat",
            "D. Use the SageMaker hyperparameter optimization fea ture to automatically optimize the data"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is training a model t o identify the make and model of vehicles in images  The Specialist wants to use transfer learning and an ex isting model trained on images of general objects T he Specialist collated a large custom dataset of pictu res containing different vehicle makes and models.",
        "options": [
            "A. Initialize the model with random weights in all l ayers including the last fully connected layer",
            "B. Initialize the model with pre-trained weights in all layers and replace the last fully connected lay er.",
            "C. Initialize the model with random weights in all l ayer and replace the last fully connected layer",
            "D. Initialize the model with pre-trained weights in all layers including the last fully connected layer"
        ],
        "correct": "D. Initialize the model with pre-trained weights in all layers including the last fully connected layer",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is developing a custo m video recommendation model for an application The dataset used to train this model is very large with  millions of data points and is hosted in an Amazon  S3 bucket The Specialist wants to avoid loading all of this d ata onto an Amazon SageMaker notebook instance beca use it would take hours to move and will exceed the attach ed 5 GB Amazon EBS volume on the notebook instance. Which approach allows the Specialist to use all the  data to train the model?",
        "options": [
            "A. Load a smaller subset of the data into the SageMa ker notebook and train locally. Confirm that the tr aining",
            "B. Launch an Amazon EC2 instance with an AWS Deep Le arning AMI and attach the S3 bucket to the",
            "C. Use AWS Glue to train a model using a small subse t of the data to confirm that the data will be comp atible",
            "D. Load a smaller subset of the data into the SageMa ker notebook and train locally. Confirm that the tr aining"
        ],
        "correct": "A. Load a smaller subset of the data into the SageMa ker notebook and train locally. Confirm that the tr aining",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is creating a new nat ural language processing application that processes  a dataset comprised of 1 million sentences The aim is  to then run Word2Vec to generate embeddings of thesentences and enable different types of predictions . Here is an example from the dataset. \"The quck BROWN FOX jumps over the lazy dog \". Which of the following are the operations the Speci alist needs to perform to correctly sanitize and pr epare the data in a repeatable manner? (Select THREE)",
        "options": [
            "A. Perform part-of-speech tagging and keep the actio n verb and the nouns only",
            "B. Normalize all words by making the sentence lowerc ase",
            "C. Remove stop words using an English stopword dicti onary.",
            "D. Correct the typography on \"quck\" to \"quick.\""
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "This graph shows the training and validation loss a gainst the epochs for a neural network The network being trained is as follows. Two dense layers one output neuron. 100 neurons in each layer. 100 epochs. Random initialization of weights. Which technique can be used to improve model perfor mance in terms of accuracy in the validation set? A. Early stopping",
        "options": [
            "B. Random initialization of weights with appropriate  seed",
            "C. Increasing the number of epochs",
            "D. Adding another layer with the 100 neurons"
        ],
        "correct": "D. Adding another layer with the 100 neurons",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturing company asks its Machine Learning S pecialist to develop a model that classifies defect ive parts into one of eight defect types. The company h as provided roughly 100000 images per defect type f or training During the injial training of the image cl assification model the Specialist notices that the validation accuracy is 80%, while the training accuracy is 90%  It is known that human-level performance for this type of image classification is around 90% What should the Specialist consider to fix this issue1?",
        "options": [
            "A. A longer training time",
            "B. Making the network larger",
            "C. Using a different optimizer",
            "D. Using some form of regularization"
        ],
        "correct": "D. Using some form of regularization",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML t o predict the sales for this year's upcoming event.  Which method should Example Corp use to split the data in to a training dataset and evaluation dataset?",
        "options": [
            "A. Pre-split the data before uploading to Amazon S3",
            "B. Have Amazon ML split the data randomly.",
            "C. Have Amazon ML split the data sequentially.",
            "D. Perform custom cross-validation on the data"
        ],
        "correct": "C. Have Amazon ML split the data sequentially.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is running a machine learning prediction service that generates 100 TB of predictions every day A Machine Learning Specialist must generate a visuali zation of the daily precision-recall curve from the predictions, and forward a read-only version to the  Business team. Which solution requires the LEAST coding effort?",
        "options": [
            "A. Run a daily Amazon EMR workflow to generate preci sion-recall data, and save the results in Amazon S3",
            "C. Run a daily Amazon EMR workflow to generate preci sion-recall data, and save the results in Amazon S3",
            "D. Generate daily precision-recall data in Amazon ES , and publish the results in a dashboard shared wit h the"
        ],
        "correct": "C. Run a daily Amazon EMR workflow to generate preci sion-recall data, and save the results in Amazon S3",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist has built a model usi ng Amazon SageMaker built-in algorithms and is not getting expected accurate results The Specialist wants to u se hyperparameter optimization to increase the mode l's accuracy. Which method is the MOST repeatable and r equires the LEAST amount of effort to achieve this?",
        "options": [
            "A. Launch multiple training jobs in parallel with di fferent hyperparameters",
            "B. Create an AWS Step Functions workflow that monito rs the accuracy in Amazon CloudWatch Logs and",
            "C. Create a hyperparameter tuning job and set the ac curacy as an objective metric.",
            "D. Create a random walk in the parameter space to it erate through a range of values that should be used  for"
        ],
        "correct": "B. Create an AWS Step Functions workflow that monito rs the accuracy in Amazon CloudWatch Logs and",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "IT leadership wants Jo transition a company's exist ing machine le rning data storage environment to AW S as a temporary ad hoc solution The company currently use s a custom software process that heavily leverages SOL as a query language and exclusively s ores generate d csv documents for machine learning. The ideal sta te for the company would be a solution tha allows it to co ntinue to use the current workforce of SQL experts The solution must also support the storage of csv and J SON files, and be able to query over semi-structure d data The following are high priorities for the company: Solution simplicity. Fast development time. Low cost High flexibility What technologies meet the company's requirements?",
        "options": [
            "A. Amazon S3 and Amazon Athena",
            "B. Amazon Redshift and AWS Glue",
            "C. Amazon DynamoDB and DynamoDB Accelerator (DAX)",
            "D. Amazon RDS and Amazon ES",
            "A. Precision",
            "B. Recall",
            "C. Area Under the ROC Curve (AUC)",
            "D. Root Mean Square Error (RMSE)"
        ],
        "correct": "A. Precision",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A bank's Machine Learning team is developing an app roach for credit card fraud detection The company h as a large dataset of historical data labeled as fraudul ent The goal is to build a model to take the inform ation from new transactions and predict whether each transacti on is fraudulent or not. Which built-in Amazon Sage Maker machine learning algorithm should be used for model ing this problem?",
        "options": [
            "A. Seq2seq",
            "B. XGBoost",
            "C. K-means",
            "D. Random Cut Forest (RCF)"
        ],
        "correct": "C. K-means",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "While working on a neural network project, a Machin e Learning Spe iali t discovers thai some features in the data have very high magnitude resulting in thi data  being weighted more in the cost function What shou ld the Specialist do to ensure better converg nce during b ackpropagation?",
        "options": [
            "A. Dimensionality reduction",
            "B. Data normalization",
            "C. Model regulanzation",
            "D. Data augmentation for the minority class",
            "A. Listwise deletion",
            "B. Last observation carried forward",
            "C. Multiple imputation",
            "D. Mean substitution"
        ],
        "correct": "C. Multiple imputation",
        "explanation": "Explanation/Reference: https://worldwidescience.org/topicpages/i/imputing+ missing+values.html",
        "references": ""
    },
    {
        "question": "An Machine Learning Specialist discover the followi ng statistics while experimenting on a model. What can the Specialist from the experiments?",
        "options": [
            "A. The model In Experiment 1 had a high variance err or lhat was reduced in Experiment 3 by regularizati on",
            "B. The model in Experiment 1 had a high bias error t hat was reduced in Experiment 3 by regularization",
            "C. The model in Experiment 1 had a high bias error a nd a high variance error that were reduced in Exper iment",
            "D. The model in Experiment 1 had a high random noise  error that was reduced in Expenment 3 by"
        ],
        "correct": "C. The model in Experiment 1 had a high bias error a nd a high variance error that were reduced in Exper iment",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist needs to be able to i ngest stre ming data and store it in Apache Parquet  files for exploration and analysis. Which of the following se rvices would both ingest and store this data in the  correct format?",
        "options": [
            "A. AWSDMS B. Amazon Kinesis Data Streams",
            "C. Amazon Kinesis Data Firehose",
            "D. Amazon Kinesis Data Analytics"
        ],
        "correct": "C. Amazon Kinesis Data Firehose",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist needs to move and tra nsform data in preparation for training Some of the  data needs to be processed in near-real time and other d ata can be moved hourly There are existing Amazon E MR MapReduce jobs to clean and feature engineering to perform on the data Which of the following services  can feed data to the MapReduce jobs? (Select TWO )",
        "options": [
            "A. AWSDMS",
            "B. Amazon Kinesis",
            "C. AWS Data Pipeline",
            "D. Amazon Athena"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://aws.amazon.com/jp/emr/?whats-new-cards.sort by= item.additionalFields.postDateTime&whats-new- cards.sort-order=desc",
        "references": ""
    },
    {
        "question": "An insurance company is developing a new device for  vehicles that uses a camera to observe drivers' be havior and alert them when they appear distracted The comp any created approximately 10,000 training images in  a controlled environment that a Machine Learning Spec ialist will use to train and evaluate machine learn ing models During the model evaluation the Specialist n otices that the training error rate diminishes fast er as the number of epochs increases and the model is not acc urately inferring on the unseen test images Which o f the following should be used to resolve this issue? (Se lect TWO)",
        "options": [
            "A. Add vanishing gradient to the model",
            "B. Perform data augmentation on the training data",
            "C. Make the neural network architecture complex.",
            "D. Use gradient checking in the model"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "The Chief Editor for a product catalog wants the Re search and Developm nt team to build a machine lear ning system that can be used to detect whether or not in dividuals in a collection of images are wearing the company's retail brand The team has a set of traini ng data. Which machine learning algorithm should the researc hers use tha BEST meets their requirements?",
        "options": [
            "A. Latent Dirichlet Allocation (LDA)",
            "B. Recurrent neural network (RNN)",
            "C. K-means",
            "D. Convolutional neural network (CNN)"
        ],
        "correct": "C. K-means",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist kicks off a hyperpara meter tuning job for a tree-based ensemble model us ing Amazon SageMaker with Area Under the ROC Curve (AUC ) as the objective metric This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click -through on data that goes stale every 24 hours With the goa l of decreasing the amount of time it takes to trai n these models, and ultimately to decrease costs, the Speci alist wants to reconfigure the input hyperparameter  range(s) Which visualization will accomplish this?",
        "options": [
            "A. A histogram showing whether the most important in put feature is Gaussian.",
            "B. A scatter plot with points colored by target vari able that uses (-Distributed Stochastic Neighbor Em bedding",
            "C. A scatter plot showing (he performance of the obj ective metric over each training iteration",
            "D. A scatter plot showing the correlation between ma ximum tree depth and the objective metric."
        ],
        "correct": "D. A scatter plot showing the correlation between ma ximum tree depth and the objective metric.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is configuring automa tic model tuning in Amazon SageMaker When using the hyperparameter optimization feature, which of the f ollowing guidelines should be followed to improve optimization? Choose the maximum number of hyperpar ameters supported by",
        "options": [
            "A. Amazon SageMaker to search the largest number of combinations possible",
            "B. Specify a very large hyperparameter range to allo w Amazon SageMaker to cover every possible value.",
            "C. Use log-scaled hyperparameters to allow the hyper parameter space to be searched as quickly as possib le",
            "D. Execute only one hyperparameter tuning job at a t ime and improve tuning through successive rounds of"
        ],
        "correct": "C. Use log-scaled hyperparameters to allow the hyper parameter space to be searched as quickly as possib le",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A large mobile network operating company is buildin g a machine learning model to predict customers who  are likely to unsubscribe from the service. The company  plans to offer an incentive for these customers as  the cost of churn is far greater than the cost of the incent ive. The model produces the following confusion matrix a fter evaluating on a tes dataset of 100 customers: Based on the model evaluation results, why is this a viable model for p oduc ion?",
        "options": [
            "A. The model is 86% accurate and the cost incurred b y he company as a result of false negatives is less  than",
            "B. The precision of the model is 86%, which is l ss than the accuracy of the model.",
            "C. The model is 86% accurate and the cost incurred b y the company as a result of false positives is les s than",
            "D. The precision of the model is 86%, which is great er than the accuracy of the model."
        ],
        "correct": "A. The model is 86% accurate and the cost incurred b y he company as a result of false negatives is less  than",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is designing a system  for improving sales for a company. The objective i s to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' simil arity to other users. What should the Specialist do  to meet this objective?",
        "options": [
            "A. Build a content-based filtering recommendation en gine with Apache Spark ML on Amazon EMR.",
            "B. Build a collaborative filtering recommendation en gine with Apache Spark ML on Amazon EMR.",
            "C. Build a model-based filtering recommendation engi ne with Apache Spark ML on Amazon EMR.",
            "D. Build a combinative filtering recommendation engi ne with Apache Spark ML on Amazon EMR."
        ],
        "correct": "B. Build a collaborative filtering recommendation en gine with Apache Spark ML on Amazon EMR.",
        "explanation": "Explanation/Reference: Many developers want to implement the famous Amazon  model that was used to power the \"People who bought this also bought these items\" feature on Ama zon.com. This model is based on a method called Collaborative Filtering. It takes items such as mov ies, books, and products that were rated highly by a set of users and recommending them to other users who also  gave them high ratings. This method works well in domains where explicit ratings or implicit user act ions can be gathered and analyzed. https://aws.amaz on.com/ blogs/big-data/building-a-recommendation-engine-wit h-sparkml-on-amazon-emr-using-zeppelin/",
        "references": ""
    },
    {
        "question": "A Data Engineer needs to build a model using a data set containing customer credit card information. Ho w can the Data Engineer ensure the data remains encrypted  and the credit card information is secure? Use a c ustom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the cre dit card numbers.",
        "options": [
            "A. Use an IAM policy to encrypt the data on the Amaz on S3 bucket and Amazon Kinesis to automatically",
            "B. Use an Amazon SageMaker launch configuration to e ncrypt the data once it is copied to the Sage Makerinstance in a VPC. Use the SageMaker principal comp onent analysis (PCA) algorithm to reduce the length",
            "C. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/pca .html",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is using an Amazon Sa geMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data  st red on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS v olume or Amazon EC2 instance within the VPC. Why is the ML Specialist not seeing the instance visible i n the VPC?",
        "options": [
            "A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but",
            "B. Amazon SageMaker notebook inst nces are based on the Amazon ECS service within customer accounts.",
            "C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service",
            "D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service"
        ],
        "correct": "C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/gs- setup-working-env.html",
        "references": ""
    },
    {
        "question": "A manufacturing company has structured and unstruct ured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries  on this data. Which solution requires the LEAST effort to be able  to query this data?",
        "options": [
            "A. Use AWS Data Pipeline to transform the data and A mazon RDS to run queries.",
            "B. Use AWS Glue to catalogue the data and Amazon Ath ena to run queries.",
            "C. Use AWS Batch to run ETL on the data and Amazon A urora to run the queries.",
            "D. Use AWS Lambda to transform the data and Amazon K inesis Data Analytics to run queries."
        ],
        "correct": "B. Use AWS Glue to catalogue the data and Amazon Ath ena to run queries.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist receives customer dat a for an online shopping website. The data includes demographics, past visits, and locality information . The Specialist must develop a machine learning ap proach to identify the customer shopping patterns, preference s and trends to enhance the website for better serv ice and smart recommendations. Which solution should the Sp ecialist recommend? A. Latent Dirichlet Allocation (LDA) for the given c ollection of discrete data to identify patterns in the customer database.",
        "options": [
            "B. A neural network with a minimum of three layers a nd random initial weights to identify patterns in t he",
            "C. Collaborative filtering based on user interaction s and correlations to identify patte ns in the cust omer",
            "D. Random Cut Forest (RCF) over random subsamples to  identify patterns in the customer database"
        ],
        "correct": "C. Collaborative filtering based on user interaction s and correlations to identify patte ns in the cust omer",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is working with a lar ge comp ny to leverage machine learning within its products. The company wants to group its customers into cat g ories based on which customers will and will not ch urn within the next 6 months. The company has labeled t he data available to the Specialist. Which machine learning model type should the Speci list use to ac complish this task?",
        "options": [
            "A. Linear regression",
            "B. Classification",
            "C. Clustering",
            "D. Reinforcement learning"
        ],
        "correct": "B. Classification",
        "explanation": "Explanation/Reference: The goal of classification is to determine to which  class or category a data point (customer in our ca se) belongs to. For classification problems, data scientists wo uld use historical data with predefined target vari ables AKA labels (churner/non-churner) -answers that need to be predicted -to train an algorithm. With classific ation, businesses can answer the following questions: Will this customer churn or not? Will a customer renew their subscription? Will a us er downgrade a pricing plan? Are there any signs of  unusual customer behavior? https://www.kdnuggets.com9/05/churn-prediction-mach ine-learning.html",
        "references": ""
    },
    {
        "question": "The displayed graph is from a foresting model for t esting a time series. Considering the graph only, which conclusion should  a Machine Learning Specialist make about the behav ior of the model?",
        "options": [
            "A. The model predicts both the trend and the seasona lity well.",
            "B. The model predicts the trend well, but not the se asonality.",
            "C. The model predicts the seasonality well, but not the trend.",
            "D. The model does not predict the trend or the seaso nality well."
        ],
        "correct": "D. The model does not predict the trend or the seaso nality well.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company wants to classify user behavior as either  fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a b inary classifier based on two features: age of acco unt and transaction month. The class distribution for these  features is illustrated in the figure provided. Based on this information which model would have th e HIGHEST accuracy?",
        "options": [
            "A. Long short-term memory (LSTM) model with scaled e xponential linear unit (SELL))",
            "B. Logistic regression",
            "C. Support vector machine (SVM) with non-linear kern  l",
            "D. Single perceptron with Tanh activation function"
        ],
        "correct": "C. Support vector machine (SVM) with non-linear kern  l",
        "explanation": "Explanation/Reference: https://machinelearningmastery com/logistic-regress ion-for-machine-learning/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist at a company sensitiv e to security is preparing a dataset for model trai ning. The dataset is stored in Amazon S3 and contains Persona lly Identifiable Information (Pll). The dataset: * Must be accessible from a VPC only. * Must not traverse the public internet. How can th ese requirements be satisfied?",
        "options": [
            "A. Create a VPC endpoint and apply a bucket access p olicy that restricts access to the given VPC endpoi nt",
            "B. Create a VPC endpoint and apply a bucket access p olicy that allows access from the given VPC endpoin t",
            "C. Create a VPC endpoint and use Network Access Cont rol Lists (NACLs) to allow traffic between only thegiven VPC endpoint and an Amazon EC2 instance.",
            "D. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an"
        ],
        "correct": "B. Create a VPC endpoint and apply a bucket access p olicy that allows access from the given VPC endpoin t",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/AmazonS3/latest/dev/exa mple-bucket-policies-vpcendpoint. html",
        "references": ""
    },
    {
        "question": "An employee found a video clip with audio on a comp any's social media feed. The language used in the v ideo is Spanish. English is the employee's first languag e, and they do not understand Spanish. The employee  wants to do a sentiment analysis. What combination of services is the MOST efficient to accomplish the task?",
        "options": [
            "A. Amazon Transcribe, Amazon Translate, and Amazon C omprehend",
            "B. Amazon Transcribe, Amazon Comprehend, and Amazon SageMaker seq2seq",
            "C. Amazon Transcribe, Amazon Translate, and Amazon S ageMaker Neural Topic Model (NTM)",
            "D. Amazon Transcribe, Amazon Translate, and Amazon S ageMaker BlazingText"
        ],
        "correct": "A. Amazon Transcribe, Amazon Translate, and Amazon C omprehend",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is packaging a custom  ResNet model into a Docker container so the compan y can leverage Amazon SageMaker for training. The Spe cialist is using Amazon EC2 P3 instances to train t he model and needs to properly configure the Docker co ntainer to leverage the NVIDIA GPUs. What does the Specialist need to do?",
        "options": [
            "A. Bundle the NVIDIA drivers with the Docker image.",
            "B. Build the Docker container to be NVIDIA-Docker co mpatibl",
            "C. Organize the Docker container's file structure to  execute on GPU instances.",
            "D. Set the GPU flag in the Amazon SageMaker CreateTr ainingJob request body"
        ],
        "correct": "B. Build the Docker container to be NVIDIA-Docker co mpatibl",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is building a logisti c regression model that will predict whether or not  a person will order a pizza. The Specialist is trying to bui ld the optimal model with an ideal classification t hreshold. What model evaluation technique should the Specialist us e to understand how different classification thresh olds will impact the model's performance?",
        "options": [
            "A. Receiver operating characteristic (ROC) curve",
            "B. Misclassification rate",
            "C. Root Mean Square Error (RM&) D. L1 norm"
        ],
        "correct": "A. Receiver operating characteristic (ROC) curve",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/machine-learning/latest /dg/binary-model-insights.html",
        "references": ""
    },
    {
        "question": "An interactive online dictionary wants to add a wid get that displays words used in similar contexts. A  Machine Learning Specialist is asked to provide word featur es for the downstream nearest neighbor model poweri ng the widget. What should the Specialist do to meet these require ments?",
        "options": [
            "A. Create one-hot word encoding vectors.",
            "B. Produce a set of synonyms for every word using Am azon Mechanical Turk.",
            "C. Create word embedding factors that store edit dis tance with every other word.",
            "D. Download word embedding's pre-trained on a large corpus."
        ],
        "correct": "D. Download word embedding's pre-trained on a large corpus.",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/amazo n-sagemaker-object2vec-addsnew-features-that- support-automatic-negative-sampling-and-speed-up-tr aining/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is configuring Amazon  SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To e nsure the best operational performance, the Special ist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on th e deployed SageMaker endpoints, and all errors that a re generated when an endpoint is invoked. Which services are integrated with Amazon SageMaker  to track this information? (Select TWO.)",
        "options": [
            "A. AWS CloudTrail",
            "B. AWS Health",
            "C. AWS Trusted Advisor",
            "D. Amazon CloudWatch"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://aws.amazon.com/sagemaker/faqs/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist trained a regress on model, but the first iteration needs optimizing. Th e Specialist needs to understand whether the model is  more frequently overestimating or underestimating the target. What option can the Specialist use o determ ine whether it is overestimating or underestimating  the target value?",
        "options": [
            "A. Root Mean Square Error (RMSE) B. Residual plots",
            "C. Area under the curve",
            "D. Confusion matrix"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company wants to classify user behavior as either  fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a b inary classifier based on two features: age of acco unt and transaction month. The class distribution for these  features is illustrated in the figure provided. Based on this information, which model would have t he HIGHEST recall with respect to the fraudulent cl ass?",
        "options": [
            "A. Decision tree",
            "B. Linear support vector machine (SVM)",
            "C. Naive Bayesian classifier",
            "D. Single Perceptron with sigmoidal activation funct ion"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "When submitting Amazon SageMaker training jobs usin g one of the built-in algorithms, which common parameters MUST be specified? (Select THREE.)",
        "options": [
            "A. The training channel identifying the location of training data on an Amazon S3 bucket.",
            "B. The validation channel identifying the location o f validation data on an Amazon S3 bucket.",
            "C. The 1AM role that Amazon SageMaker can assume to perform tasks on behalf of the users.",
            "D. Hyperparameters in a JSON array as documented for  the algorithm used."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist is developing a machine learning m odel to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuo us value as its prediction. The data available includes labe led outcomes for a set of 4,000 patients. The study  was conducted on a group of individuals over the age of  65 who have a particular disease that is known to worsen with age. Initial models have performed poorly. While reviewi ng the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the  patient age has been input as 0. The other feature s for these observations appear normal compared to the re st of the sample population. How should the Data Scientist correct this issue?",
        "options": [
            "A. Drop all records from the dataset where age has b een set to 0.",
            "B. Replace the age field value for records with a va lue of 0 with the mean or median value from the dat aset.",
            "C. Drop the age feature from the dataset and train t he model using the rest of the features.",
            "D. Use k-means clustering to handle missing features ."
        ],
        "correct": "A. Drop all records from the dataset where age has b een set to 0.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Science team is designing a dataset reposito ry where it will store a large amount of training d ata commonly used in its machine learning models. As Da ta Scientists may create an arbitrary number of new datasets every day the solution has to scale automa tically and be cost-effective. Also, it must be pos sible to explore the data using SQL. Which storage scheme is MOST adapted to this scenar io?",
        "options": [
            "A. Store datasets as files in Amazon S3.",
            "B. Store datasets as files in an Amazon EBS volume att ached to an Amazon EC2 instance. C. Store datasets as tables in a multi-node Amazon Red shift cluster.",
            "D. Store datasets as global tables in Amazon DynamoD B."
        ],
        "correct": "A. Store datasets as files in Amazon S3.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist working for an online  fashion company wants to build a data ingestion so lution for the company's Amazon S3-based data lake. The Specialist wants to create a set of ingestion m echanisms that will enable future capabilities comp rised of: Real-time analytics Interactive analytics of historical data Clickstrea m analytics Product recommendations Which services should the Specialist use?",
        "options": [
            "A. AWS Glue as the data dialog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-",
            "B. Amazon Athena as the data catalog; Amazon Kinesis  Data Streams and Amazon Kinesis Data Analytics for",
            "C. AWS Glue as the data catalog; Amazon Kinesis Data  Streams and Amazon Kinesis Data Analyticsfor",
            "D. Amazon Athena as the data catalog; Amazon Kinesis  Data Streams and Amazon Kinesis Data Analytics for"
        ],
        "correct": "A. AWS Glue as the data dialog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is observing low accuracy while training on the default built-in image classification algori thm in Amazon SageMaker. The Data Science team wants to us e an Inception neural network architecture instead of a ResNet architecture. Which of the following will accomplish this? (Selec t TWO.)",
        "options": [
            "A. Customize the built-in image classification algor ithm to use Inception and use this for model traini ng.",
            "B. Create a support case with the SageMaker team to change the default image classification algorithm t o",
            "C. Bundle a Docker container with TensorFlow Estimat or loaded with an Inception network and use this fo r",
            "D. Use custom code in Amazon SageMaker with TensorFl ow Estimator to load the model with an Inception"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist built an image classi fication deep learning model. However the Specialis t ran into an overfitting problem in which the training a nd testing accuracies were 99% and 75%r respectivel y. How should the Specialist address this issue and what i s the reason behind it?",
        "options": [
            "A. The learning rate should be increased because the  optimization process was trapped at a local minimu m.",
            "B. The dropout rate at the flatten layer should be i ncreased because the model is not generalized enoug h.",
            "C. The dimensionality of dense layer next to the fla tten layer should be increased because the model is  not",
            "D. The epoch number should be increased because the optimization process was terminated before it reach ed"
        ],
        "correct": "A. The learning rate should be increased because the  optimization process was trapped at a local minimu m.",
        "explanation": "Explanation/Reference: https://www.tensorflow.org/tutorials/keras/overfit_ and_underfit",
        "references": ""
    },
    {
        "question": "A Machine Learning team uses Amazon SageMaker to tr ain an Apache MXNet handwritten digit classifier model using a research dataset. The team wants to r eceive a notification when the model is overfitting . Auditors want to view the Amazon SageMaker log activity repo rt to ensure there are no unauthorized API calls. W hat should the Machine Learning team do to address the requirements with the least amount of code and fewe st steps?",
        "options": [
            "A. Implement an AWS Lambda function to long Amazon S ageMaker API calls to Amazon S3. Add code to",
            "B. Use AWS CloudTrail to log Amazon SageMaker API ca lls to Amazon S3. Add code to push a custom metric",
            "C. Implement an AWS Lambda function to log Amazon Sa geMaker API calls to AWS CloudTrail. Add code to",
            "D. Use AWS CloudTrail to log Amazon SageMaker API ca lls to Amazon S3. Set up Amazon SNS to receive a"
        ],
        "correct": "C. Implement an AWS Lambda function to log Amazon Sa geMaker API calls to AWS CloudTrail. Add code to",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is implementing a ful l Bayesian network on a dataset that describes publ ic transit in New York City. One of the random variables is di screte, and represents the number of minutes New Yo rkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes. Which prior prob ability distribution should the ML Specialist use for this variable?",
        "options": [
            "A. Poisson distribution , B. Uniform distribution",
            "C. Normal distribution",
            "D. Binomial distribution"
        ],
        "correct": "A. Poisson distribution , B. Uniform distribution",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Science team within a large company uses Ama zon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerne d that internet-enabled notebook instances create a security vulnerability where malicious code running  on the instances could compromise data privacy. Th e company mandates that all instances stay within a s ecured VPC with no internet access, and data communication traffic must stay within the AWS netw ork. How should the Data Science team configure the note book instance placement to meet these requirements?",
        "options": [
            "A. Associate the Amazon SageMaker notebook with a pr ivate subnet in a VPC. Place the Amazon SageMaker",
            "B. Associate the Amazon SageMaker notebook with a pr ivate subnet in a VPC. Use 1AM policies to grant",
            "C. Associate the Amazon SageMaker notebook with a pr ivate subnet in a VPC. Ensure the VPC has S3 VPC",
            "D. Associate the Amazon SageMaker notebook with a pr ivate subnet in a VPC. Ensure the VPC has a NAT"
        ],
        "correct": "D. Associate the Amazon SageMaker notebook with a pr ivate subnet in a VPC. Ensure the VPC has a NAT",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist has created a deep le arning neural network model that performs well on t he training data but performs poorly on the test data. Which of the following methods should the Specialis t consider using to correct this? (Select THREE.)",
        "options": [
            "A. Decrease regularization.",
            "B. Increase regularization.",
            "C. Increase dropout.",
            "D. Decrease dropout."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "streaming data. The ingestion process must buffer and convert incom ing records from JSON to a query-optimized, columna r format without data loss. The output datastore must  be highly available, and Analysts must be able to run SQL queries against the data and connect to existing bu siness intelligence dashboards. Which solution shou ld the Data Scientist build to satisfy the requirements?",
        "options": [
            "A. Create a schema in the AWS Glue Data Catalog of t he incoming data format. Use an Amazon Kinesis Data",
            "B. Write each JSON record to a staging location in A mazon S3. Use the S3 Put event to trigger an AWS",
            "C. Write each JSON record to a staging location in A mazon S3. Use the S3 Put event to trigger an AWS",
            "D. Use Amazon Kinesis Data Analytics to ingest the s treaming data and perform real-time SQL queries to"
        ],
        "correct": "A. Create a schema in the AWS Glue Data Catalog of t he incoming data format. Use an Amazon Kinesis Data",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is setting up an Amazon SageMaker environ ment. The corporate data security policy does not a llow communication over the internet. How can the company enable the Amazon SageMaker ser vice without enabling direct internet access to Amazon SageMaker notebook instances?",
        "options": [
            "A. Create a NAT gateway within the corporate VPC.",
            "B. Route Amazon SageMaker traffic through an on-prem ises network.",
            "C. Create Amazon SageMaker VPC interface endpoints w ithin the corporate VPC.",
            "D. Create VPC peering with Amazon VPC hosting Amazon  SageMaker."
        ],
        "correct": "A. Create a NAT gateway within the corporate VPC.",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/sag emaker-dg.pdf (46)",
        "references": ""
    },
    {
        "question": "An office security agency conducted a successful pi lot using 100 cameras installed at key locations wi thin the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon Rekognition, and the results were stored in Amazon ES. The agenc y is now looking to expand the pilot into a full pr oduction system using thousands of video cameras in its offi ce locations globally. The goal is to identify activities perfor med by non-employees in real time. Which solution s hould the agency consider? A. Use a proxy server at each local office and for e ach camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collectio n of known employees, and alert when nonemployees are detected.",
        "options": [
            "B. Use a proxy server at each local office and for e ach camera, and stream the RTSP feed to a unique",
            "C. Install AWS DeepLens cameras and use the DeepLens _Kinesis_Video module to stream video to Amazon",
            "D. Install AWS DeepLens cameras and use the DeepLens _Kinesis_Video module to stream video to Amazon"
        ],
        "correct": "C. Install AWS DeepLens cameras and use the DeepLens _Kinesis_Video module to stream video to Amazon",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/video -analytics-in-the-cloud-and-atthe-edge-with-aws- deeplens-and-kinesis-video-streams/",
        "references": ""
    },
    {
        "question": "A financial services company is building a robust s erverless data lake on Amazon S3. The data lake sho uld be flexible and meet the following requirements: * Support querying old and new data on Amazon S3 th rough Amazon Athena and Amazon Redshift Spectrum. * Support event-driven ETL pipelines. * Provide a quick and easy way to understand metada ta. Which approach meets these requirements?",
        "options": [
            "A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and",
            "B. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Batch job, and an",
            "C. Use an AWS Glue crawler to crawl S3 data, an Amaz on CloudWatch alarm to trigger an AWS Batch job,",
            "D. Use an AWS Glue crawler to crawl S3 data, an Amaz on CloudWatch alarm to trigger an AWS Glue ETL"
        ],
        "correct": "A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company's Machine Learning Specialist needs to im prove the training speed of a time-series forecasti ng model using TensorFlow. The training is currently i mplemented on a single-GPU machine and takes approximately 23 hours to complete. The training ne eds to be run daily. The model accuracy js acceptable, but the company a nticipates a continuous increase in the size of the  training data and a need to update the model on an hourly, r ather than a daily, basis. The company also wants t o minimize coding effort and infrastructure changes What should the Machine Learning Specialist do to t he training solution to allow it to scale for futur e demand?",
        "options": [
            "A. Do not change the TensorFlow code. Change the mac hine to one with a more powerful GPU to speed up the training.",
            "B. Change the TensorFlow code to implement a Horovod  distributed framework supported by Amazon",
            "C. Switch to using a built-in AWS SageMaker DeepAR m odel. Parallelize the training to as many machines as",
            "D. Move the training to Amazon EMR and distribute th e workload to as many machines as needed to achieve"
        ],
        "correct": "B. Change the TensorFlow code to implement a Horovod  distributed framework supported by Amazon",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is required to build a supervised image-cognition model to identify a ca t. The ML Specialist performs some tests and records the foll owing results for a neural network based image clas sifier: Total number of images available = 1,000 Test set i mages = 100 (constant test set) The ML Specialist n otices that, in over 75% of the misclassified images, the cats were held upside down by their owners. Which techniques can be used by the ML Specialist t o improve this specific test error?",
        "options": [
            "A. Increase the training data by adding variation in  rotation for training images.",
            "B. Increase the number of epochs for model training.",
            "C. Increase the number of layers for the neural netw ork.",
            "D. Increase the dropout rate for the second-to-last layer."
        ],
        "correct": "A. Increase the training data by adding variation in  rotation for training images.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist is developing a machine learning m odel to classify whether a financial transaction is  fraudulent. The labeled data available for training consists of  100,000 non-fraudulent observations and 1,000 frau dulent observations. The Data Scientist applies the XGBoos t algorithm to the data, resulting in the following  confusion matrix when the trained model is applied to a previ ously unseen validation dataset. The accuracy of th e model is 99.1%, but the Data Scientist has been asked to reduce the number of false negatives. Which combination of steps should the Data Scientis t take to reduce the number of false positive predi ctions by the model? (Select TWO.)",
        "options": [
            "A. Change the XGBoost eval_metric parameter to optim ize based on rmse instead of error.",
            "B. Increase the XGBoost scale_pos_weight parameter t o adjust the balance of positive and negative weigh ts.",
            "C. Increase the XGBoost max_depth parameter because the model is currently underfitting the data.",
            "D. Change the XGBoost evaljnetric parameter to optimiz e based on AUC instead of error. E. Decrease the XGBoost max_depth parameter because th e model is currently overfitting the data."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is assigned a TensorF low project using Amazon SageMaker for training, an d needs to continue working for an extended period wi th no Wi-Fi access. Which approach should the Specialist use to continu e working?",
        "options": [
            "A. Install Python 3 and boto3 on their laptop and co ntinue the code development using that environment.",
            "B. Download the TensorFlow Docker container used in Amazon SageMaker from GitHub to their local",
            "C. Download TensorFlow from tensorflow.org to emulat e the TensorFlow kernel in the SageMaker",
            "D. Download the SageMaker notebook to their local en vironment then install Jupyter Notebooks on their l aptop"
        ],
        "correct": "D. Download the SageMaker notebook to their local en vironment then install Jupyter Notebooks on their l aptop",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist wants to gain real-time insights i nto data stream of GZIP files. Which solution would  allow the use of SQL to query the stream with the LEAST laten cy?",
        "options": [
            "A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.",
            "B. AWS Glue with a custom ETL script to transform th e data.",
            "C. An Amazon Kinesis Client Library to transform the  data and save it to an Amazon ES cluster.",
            "D. Amazon Kinesis Data Firehose to transform the dat a and put it into an Amazon S3 bucket."
        ],
        "correct": "A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.",
        "explanation": "Explanation/Reference: https://aws.amazon.com/big-data/real-time-analytics -featured-partners/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist must build out a proc ess to query a dataset on Amazon S3 using Amazon Athena The dataset contains more than 800.000 recor ds stored as plaintext CSV files Each record contai ns 200 columns and is approximately 1 5 MB in size Mos t queries will span 5 to 10 columns only How should  the Machine Learning Specialist transform the dataset t o minimize query runtime?",
        "options": [
            "A. Convert the records to Apache Parquet format",
            "B. Convert the records to JSON format",
            "C. Convert the records to GZIP CSV format",
            "D. Convert the records to XML format Correct Answer: A"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: Using compressions will reduce the amount of data s canned by Amazon Athena, and also reduce your S3 bucket storage. It's a Win-Win for your AWS bill. S upported formats: GZIP, LZO, SNAPPY (Parquet) and Z LIB. https://www.cloudforecast.io/blog/using-parquet-on- athena-to-save-money-on-aws/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is developing a daily  ETL workflow containing multiple ETL jobs The work flow consists of the following processes * Start the workflow as soon as data is uploaded to  Amazon S3 * When all the datasets are available in Amazon S3,  start an ETL job to join the uploaded datasets wit h multiple terabyte-sized datasets already stored in Amazon S3 * Store the results of joining datasets in Amazon S 3 * If one of the jobs fails, send a notification to the Administrator Which configuration will meet the se requirements?",
        "options": [
            "A. Use AWS Lambda to trigger an AWS Step Functions w orkflow to wait for dataset uploads to complete in",
            "B. Develop the ETL workflow using AWS Lambda to star t an Amazon SageMaker notebook instance Use a",
            "C. Develop the ETL workflow using AWS Batch to trigg er the start of ETL jobs when data is uploaded to",
            "D. Use AWS Lambda to chain other Lambda functions to  read and join the datasets in Amazon S3 as soon as"
        ],
        "correct": "A. Use AWS Lambda to trigger an AWS Step Functions w orkflow to wait for dataset uploads to complete in",
        "explanation": "Explanation/Reference: https://aws.amazon.com/step-functions/use-cases/",
        "references": ""
    },
    {
        "question": "An agency collects census information within a coun try to determine healthcare and social program need s by province and city. The census form collects respons es for approximately 500 questions from each citize n. Which combination of algorithms would provide the a ppropriate insights? (Select TWO )",
        "options": [
            "A. The factorization machines (FM) algorithm",
            "B. The Latent Dirichlet Allocation (LDA) algorithm",
            "C. The principal component analysis (PCA) algorithm",
            "D. The k-means algorithm",
            "A. Train a custom ARIMA model to forecast demand for  the new product.",
            "B. Train an Amazon SageMaker DeepAR algorithm to for ecast demand for the new product",
            "C. Train an Amazon SageMaker k-means clustering algo rithm to forecast demand for the new product.",
            "D. Train a custom XGBoost model to forecast demand f or the new product"
        ],
        "correct": "B. Train an Amazon SageMaker DeepAR algorithm to for ecast demand for the new product",
        "explanation": "Explanation/Reference: The Amazon SageMaker DeepAR forecasting algorithm i s a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurren t neural networks (RNN). Classical forecasting meth ods, such as autoregressive integrated moving average (A RIMA) or exponential smoothing (ETS), fit a single model to each individual time series. They then use that model to extrapolate the time series into the futur e.",
        "references": ""
    },
    {
        "question": "A Data Scientist needs to migrate an existing on-pr emises ETL process to the cloud The current process  runs at regular time intervals and uses PySpark to combi ne and format multiple large data sources into a si ngle consolidated output for downstream processing. The Data Scientist has been given the following req uirements for the cloud solution. * Combine multiple data sources. * Reuse existing PySpark logic. * Run the solution on the existing schedule. * Minimize the number of servers that will need to be managed. Which architecture should the Data Scie ntist use t build this solution?",
        "options": [
            "A. Write the raw data to Amazon S3 Schedule an AWS L ambda function to submit a Spark step to a persiste nt",
            "B. Write the raw data to Amazon S3 Create an AWS Glu e ETL job to perform the ETL processing against the",
            "C. Write the raw data to Amazon S3 Schedule an AWS L ambda function to run on the existing schedule and",
            "D. Use Amazon Kinesis Data Analytics to stream the i nput data and perform realtime SQL queries against the",
            "A. Alexa for Business",
            "B. Amazon Connect",
            "C. Amazon Lex",
            "D. Amazon Poly"
        ],
        "correct": "A. Write the raw data to Amazon S3 Schedule an AWS L ambda function to submit a Spark step to a persiste nt",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is applying a linear least squares regression model to a dataset with 1 000 records and 50 features Prior to training, the ML S pecialist notices that two features are perfectly l inearly dependent. Why could this be an issue for the linea r least squares regression model?",
        "options": [
            "A. It could cause the back propagation algorithm to fail during training",
            "B. It could create a singular matrix during optimiza tion which fails to define a unique solution",
            "C. It could modify the loss function during optimiza tion causing i to fail during training",
            "D. It could introduce non-linear dependencies within  the data which could invalidate the linear assumpt ions of"
        ],
        "correct": "C. It could modify the loss function during optimiza tion causing i to fail during training",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist uploads a data et to an Amazon S3 bucket protected with server-side encr yption using AWS KMS. How should the ML Specialist define the Amazon Sage Maker notebook instance so it can read the same dataset from Amazon S3?",
        "options": [
            "A. Define security group(s) to allow all HTTP inboun d/outbound traffic and assign those security group( s) to the",
            "B. ?onfigure the Amazon SageMaker notebook instance to have access to the VPC. Grant permission in the",
            "C. Assign an IAM role to the Amazon SageMaker notebo ok with S3 read access to the dataset. Grant",
            "D. Assign the same KMS key used to encrypt data in A mazon S3 to the Amazon SageMakernotebook"
        ],
        "correct": "D. Assign the same KMS key used to encrypt data in A mazon S3 to the Amazon SageMakernotebook",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/enc ryption-at-rest.html",
        "references": ""
    },
    {
        "question": "A Data Scientist is building a model to predict cus tomer churn using a dataset of 100 continuous numer ical features. The Marketing team has not provided any i nsight about which features are relevant for churn prediction. The Marketing team wants to interpret t he model and see the direct impact of relevant feat ures on the model outcome. While training a logistic regres sion model, the Data Scientist observes that there is a wide gap between the training and validation set accurac y. Which methods can the Data Scientist use to improve  the model performance and satisfy the Marketing te am's needs? (Choose two.)",
        "options": [
            "A. Add L1 regularization to the classifier",
            "B. Add features to the dataset",
            "C. Perform recursive feature elimination",
            "D. Perform t-distributed stochastic neighbor embeddi ng (t-SNE)"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "An aircraft engine manufacturing company is measuri ng 200 performance metrics in a time-series. Engine ers want to detect critical manufacturing defects in ne ar-real time during testing. All of the data needs to be stored for offline analysis. What approach would be the MOST effective to perfor m near-real time defect detection?",
        "options": [
            "A. Use AWS IoT Analytics for ingestion, storage, and  further analysis. Use Jupyter notebooks from withi n AWS",
            "B. Use Amazon S3 for ingestion, storage, and further  analysis. Use an Amazon EMR cluster to carry out",
            "C. Use Amazon S3 for ingestion, storage, and further  analysis. Use the Amazon SageMaker Random Cut",
            "D. Use Amazon Kinesis Data Firehose for ingestion an d Amazon Kinesis Data Analytics Random Cut Forest"
        ],
        "correct": "B. Use Amazon S3 for ingestion, storage, and further  analysis. Use an Amazon EMR cluster to carry out",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning team runs its own training algor ithm on Amazon SageMaker. The training algorithm requires external assets. The team needs to submit both its own algorithm code and algorithmspecific parameters to Amazon SageMaker. What combination of services should the team use to  build a custom algorithm in Amazon SageMaker? (Choose two.) A. AWS Secrets Manager",
        "options": [
            "B. AWS CodeStar",
            "C. Amazon ECR",
            "D. Amazon ECS"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company uses a long short-term memory (LSTM) mode l to evaluate the risk factors of a particular ener gy sector. The model reviews multi-page text documents  to analyze each sentence of the text and categoriz e it as either a potential risk or no risk. The model is no t performing well, even though the Data Scientist h as experimented with many different network structures  and tuned the corresponding hyperparameters. Which approach will provide the MAXIMUM performance  boost?",
        "options": [
            "A. Initialize the words by term frequency-inverse do cument frequency (TF-IDF) vectors pretrained on a l arge",
            "B. Use gated recurrent units (GRUs) instead of LSTM and run the training process until the validation l oss",
            "C. Reduce the learning rate and run the training pro cess until the training loss stops decreasing.",
            "D. Initialize the words by word2vec embeddings pretr ained on a large collection of news articles relate d to the"
        ],
        "correct": "C. Reduce the learning rate and run the training pro cess until the training loss stops decreasing.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist previously trained a logistic regression model using scikit-learn on a l ocal machine, and the Specialist now wants to deploy it to production for inference only. What steps should  be taken to ensure Amazon SageMaker can host a model that wa s trained locally?",
        "options": [
            "A. Build the Docker image with the inference code. T ag the Docker image with the registryhostname and",
            "B. Serialize the trained model so the format is comp ressed for deployment. Tag the Docker image with th e",
            "C. Serialize the trained model so the forma is compr essed for deployment. Build the image and upload it  to",
            "D. Build the Docker image with the inference code. C onfigure Docker Hub and upload the image to Amazon",
            "A. Use a database, such as Amazon DynamoDB, to store  the images, and set the IAM policies to restrict",
            "B. Use an Amazon S3-backed data lake to store the ra w images, and set up the permissions using bucket",
            "C. Setup up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict acce ss",
            "D. Configure Amazon EFS with IAM policies to make th e data available to Amazon EC2 instances owned by"
        ],
        "correct": "C. Setup up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict acce ss",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A credit card company wants to build a credit scori ng model to help predict whether a new credit card applicant will default on a credit card payment. The company has collected data from a large number of sources w ith thousands of raw attributes. Early experiments to t rain a classification model revealed that many attr ibutes are highly correlated, the large number of features slo ws down the training speed significantly, and that there are some overfitting issues. The Data Scientist on this project would like to sp eed up the model training time without losing a lot  of information from the original dataset. Which feature engineering technique should the Data  Scientist use to meet the objectives?",
        "options": [
            "A. Run self-correlation on all features and remove h ighly correlated features",
            "B. Normalize all numerical values to be between 0 an d 1",
            "C. Use an autoencoder or principal component analysi s (PCA) to replace original features with new featu res",
            "D. Cluster raw data using k-means and use sample dat a from each cluster to build a new dataset"
        ],
        "correct": "B. Normalize all numerical values to be between 0 an d 1",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist is training a multilayer perceptio n (MLP) on a dataset with multiple classes. The tar get class of interest is unique compared to the other classes wi thin the dataset, but it does not achieve and accep table recall metric. The Data Scientist has already tried  varying the number and size of the MLP's hidden la yers, which has not significantly improved the results. A  solution to improve recall must be implemented as quickly as possible. Which techniques should be used to meet these requi rements?",
        "options": [
            "A. Gather more data using Amazon Mechanical Turk and  then retrain",
            "B. Train an anomaly detection model instead of an ML P",
            "C. Train an XGBoost model instead of an MLP",
            "D. Add class weights to the MLP's loss function and th en retrain Correct Answer: C"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist works for a credit ca rd processing company and needs to predict which transactions may be fraudulent in near-real time. S pecifically, the Specialist must train a model that  returns the probability that a given transaction may fraudulent . How should the Specialist frame this business probl em?",
        "options": [
            "A. Streaming classification",
            "B. Binary classification",
            "C. Multi-category classification",
            "D. Regression classification"
        ],
        "correct": "C. Multi-category classification",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A real estate company wants to create a machine lea rning model for predicting housing prices based on a historical dataset. The dataset contains 32 feature s. Which model will meet the business requirement?",
        "options": [
            "A. Logistic regression",
            "B. Linear regression",
            "C. K-means",
            "D. Principal component analysis (PCA)"
        ],
        "correct": "B. Linear regression",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist wants to bring a cust om algorithm to Amazon SageMaker. The Specialist implements the algorithm in a Docker container supp orted by Amazon SageMaker. How should the Specialis t package the Docker container so that Amazon SageMak er an launch the training correctly?",
        "options": [
            "A. Modify the bash_profile file in the container and  add a bash command to start the training program",
            "B. Use CMD config in the Dockerfile to add the train ing program as CMD of the image",
            "C. Configure the training program as an ENTRYPOINT n amed train",
            "D. Copy the training program to directory /opt/ml/tr ain"
        ],
        "correct": "B. Use CMD config in the Dockerfile to add the train ing program as CMD of the image",
        "explanation": "Explanation Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist needs to analyze employment d ta. The dataset contains approximately 10 million obser vations on people across 10 different features. During the preliminary analysis, the Data Scientist notices th at income and age distributions are not normal. While income levels shows a right skew as expected, with fewer individuals having a higher income, the age distrib ution also show a right skew, with fewer older indi viduals participating in the workforce. Which feature transformations can the Data Scientis t apply to fix the incorrectly skewed data? (Choose  two.)",
        "options": [
            "A. Cross-validation",
            "B. Numerical value binning",
            "C. High-degree polynomial transformation",
            "D. Logarithmic transformation"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is given a structured  dataset on the shopping habits of a company's cust omer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns acros s all customers and visualize the results as quickly as p ossible. What approach should the Specialist take t o accomplish these tasks?",
        "options": [
            "A. Embed the numerical features using the t-distribu ted stochastic neighbor embedding (t-SNE) algorithm  and",
            "B. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.",
            "C. Embed the numerical features using the t-distribu ted stochastic neighbor embedding (t-SNE) algorithm  and",
            "D. Run k-means using the Euclidean distance measure for different values of k and create box plots for each"
        ],
        "correct": "B. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is planning to create  a long-running Amazon EMR cluster. The EMRcluster will have 1 master node, 10 core nodes, and 20 task node s. To save on costs, the Specialist will use Spot Instances in the EMR cluster. Which nodes should the Specialist launch on Spot In stances?",
        "options": [
            "A. Master node",
            "B. Any of the core nodes",
            "C. Any of the task nodes D. Both core and task nodes"
        ],
        "correct": "A. Master node",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company wants to predict the sale prices of house s based on available historical sales data. The tar get variable in the company's dataset is the sale price . The features include parameters such as the lot s ize, living area measurements, non-living area measurements, nu mber of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-var iable linear regression to predict house sale price s. Which step should a machine learning specialist take to r emove features that are irrelevant for the analysis  and reduce the model's complexity?",
        "options": [
            "A. Plot a histogram of the features and compute thei r standard deviation. Remove features with high var iance.",
            "B. Plot a histogram of the features and compute thei r standard deviation. Remove features with low vari ance.",
            "C. Build a heatmap showing the correlation of the da taset against itself. Remove features with low mutu al",
            "D. Run a correlation check of all features against t he target variable. Remove features with low target  variable"
        ],
        "correct": "D. Run a correlation check of all features against t he target variable. Remove features with low target  variable",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A health care company is planning to use neural net works to classify their X-ray images into normal an d abnormal classes. The labeled data is divided into a training set of 1,000 images and a test set of 20 0 images. The initial training of a neural network model with  50 hidden layers yielded 99% accuracy on the train ing set, but only 55% accuracy on the test set. What changes should the Specialist consider to solv e this issue? (Choose three.)",
        "options": [
            "A. Choose a higher number of layers",
            "B. Choose a lower number of layers",
            "C. Choose a smaller learning rate",
            "D. Enable dropout"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is attempting to buil d a linear regression model. Given the displayed re sidual plot only, what is the MOST likely problem with the mode l? A. Linear regression is inappropriate. The residuals d o not have constant variance.",
        "options": [
            "B. Linear regression is inappropriate. The underlyin g data has outliers.",
            "C. Linear regression is appropriate. The residuals h ave a zero mean.",
            "D. Linear regression is appropriate. The residuals h ave constant variance."
        ],
        "correct": "D. Linear regression is appropriate. The residuals h ave constant variance.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning specialist works for a fruit pro cessing company and needs to build a system that categorizes apples into three types. The specialist  has collected a dataset that contains 150 images f or each type of apple and applied transfer learning on a ne ural network that was pretrained on ImageNet with t his dataset. The company requires at least 85% accuracy to make use of the model. After an exhaustive grid search, the optima hyperparameters produced the following: 68% accuracy on the training set 67% accuracy on the validation set What can the machine learning specialist do to impr ove the system's accuracy?",
        "options": [
            "A. Upload the model to an Amazon SageMaker notebook instance and use the AmazonSageMaker HPO",
            "B. Add more data to the training set and retrain the  model using transfer learning to reduce the bias.",
            "C. Use a neural network model with more layers that are pretrained on ImageNet and apply transfer learn ing to",
            "D. Train a new model using the current neural networ k architecture."
        ],
        "correct": "B. Add more data to the training set and retrain the  model using transfer learning to reduce the bias.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company uses camera images of the tops of items d isplayed on store shelves to determine which items were removed and which ones still remain. After several hours of data labeling, the company has a total of 1,000 hand-labeled images covering 10 distinct items. The  training results were poor. Which machine learning approach fulfills the company's long-term needs?",
        "options": [
            "A. Convert the images to grayscale and retrain the m odel",
            "B. Reduce the number of distinct items from 10 to 2,  build the model, and iterate",
            "C. Attach different colored labels to each item, tak e the images again, and build the model",
            "D. Augment training data for each item using image v ariants like inversions and translations, build the  model,"
        ],
        "correct": "A. Convert the images to grayscale and retrain the m odel",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "series of test results. The Data Scientist has data  on 400 patients randomly selected from the populat ion. The disease is seen in 3% of the population. Which cross-validation strategy should the Data Sci entist adopt?",
        "options": [
            "A. A k-fold cross-validation strategy with k=5",
            "B. A stratified k-fold cross-validation strategy wit h k=5",
            "C. A k-fold cross-validation strategy with k=5 and 3  repeats",
            "D. An 80 stratified split between training and valid ation"
        ],
        "correct": "B. A stratified k-fold cross-validation strategy wit h k=5",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A technology startup is using complex deep neural n etworks and GPU compute to recommend the company's products to its existing customers based upon each customer's habits and interactions. The solution cu rrently pulls each dataset from an Amazon S3 bucket before loading the data into a TensorFlow model pulled fro m the company's Git repository that runs locally This job  then runs for several hours while continually outp utting its progress to the same S3 bucket. The job c n be paus ed, restarted, and continued at any time in the eve nt of a failure, and is run from a central queue. Senior managers are concerned about the complexity f the solution's resource management and the costs involved in repeating the process regularly. They a sk for the workload to be automated so it runs once  a week, starting Monday and completing by the close of busi ness Friday. Which architecture should be used to scale the solu tion at the lowest cost?",
        "options": [
            "A. Implement the solution using AWS Deep Learning Co ntainers and run the container as a job using AWS",
            "B. Implement the solution using a low-cost GPU-compa tible Amazon EC2 instance and use the AWS Instance",
            "C. Implement the solution using AWS Deep Learning Co ntainers, run the workload using AWSFargate running",
            "D. Implement the solution using Amazon ECS running o n Spot Instances and schedule the task using the EC S"
        ],
        "correct": "C. Implement the solution using AWS Deep Learning Co ntainers, run the workload using AWSFargate running",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A media company with a very large archive of unlabe led images, text, audio, and video footage wishes t o index its assets to allow rapid identification of relevan t content by the Research team. The company wants t o use machine learning to accelerate the efforts of its i n-house researchers who have limited machine learni ng expertise. Which is the FASTEST route to index the assets?",
        "options": [
            "A. Use Amazon Rekognition, Amazon Comprehend, and Am azon Transcribe to tag data into distinct",
            "B. Create a set of Amazon Mechanical Turk Human Inte lligence Tasks to label all footage.",
            "C. Use Amazon Transcribe to convert speech to text. Use the Amazon SageMaker Neural TopicModel (NTM)"
        ],
        "correct": "A. Use Amazon Rekognition, Amazon Comprehend, and Am azon Transcribe to tag data into distinct",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is working for an onl ine retailer that wants to run analytics on every c ustomer visit, processed through a machine learning pipelin e. The data needs to be ingested by Amazon Kinesis Data Streams at up to 100 transactions per second, and t he JSON data blob is 100 KB in size. What is the MINIMUM number of shards in Kinesis Data Streams th e Specialist should use to successfully ingest this  data?",
        "options": [
            "A. 1 shards",
            "B. 10 shards",
            "C. 100 shards",
            "D. 1,000 shards"
        ],
        "correct": "B. 10 shards",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is deciding between b uilding a naive Bayesian model or a full Bayesian n etwork for a classification problem. The Specialist comput es the Pearson correlation coefficients between eac h feature and finds that their absolute values range between 0.1 to 0.95. Which model describes the underlying data in this s ituation?",
        "options": [
            "A. A naive Bayesian model, since the features are al l conditionally independent.",
            "B. A full Bayesian network, since the features are a ll conditionally independent.",
            "C. A naive Bayesian model, since some of the feature s are statistically dependent.",
            "D. A full Bayesian network, since some of the featur es are statistically dependent."
        ],
        "correct": "C. A naive Bayesian model, since some of the feature s are statistically dependent.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist is building a linear regression mo del and will use resulting p-values to evaluate the  statistical significance of each coefficient. Upon inspection o f the dataset, the Data Scientist discovers that mo st of the features are normally distributed. The plot of one feature in the dataset is shown in the graphic. Wha t transformation should the Data Scientist apply to s atisfy the statistical assumptions of the linear re gression model? A. Exponential transformation",
        "options": [
            "B. Logarithmic transformation",
            "C. Polynomial transformation",
            "D. Sinusoidal transformation"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is assigned to a Frau d Detection team and must tune an XGBoost model, wh ich is working appropriately for test data. However, wi th unknown data, it is not working as expected. The  existing parameters are provided as follows. Which parameter tuning guidelines should the Specia list follow to avoid overfitting?",
        "options": [
            "A. Increase the max_depth parameter value.",
            "B. Lower the max_depth parameter value.",
            "C. Update the objective to binary:logistic.",
            "D. Lower the min_child_weight parameter value."
        ],
        "correct": "B. Lower the max_depth parameter value.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data scientist is developing a pipeline to ingest  streaming web traffic data. The data scientist nee ds to implement a process to identify unusual web traffic  patterns as part of the pipeline. The patterns wil l be used downstream for alerting and incident response. The data scientist has access to unlabeled historic dat a to use, if needed. The solution needs to do the following: Calculate an anomaly score for each web traffic ent ry. Adapt unusual event identification to changing web patterns over time. Which approach should the data scientist implement to meet these requirements?",
        "options": [
            "A. Use historic web traffic data to train an anomaly  detection model using the Amazon SageMaker Random",
            "B. Use historic web traffic data to train an anomaly  detection model using the Amazon SageMaker built-i n XG",
            "C. Collect the streaming data using Amazon Kinesis D ata Firehose. Map the delivery stream as an input",
            "D. Collect the streaming data using Amazon Kinesis D ata Firehose. Map he delivery stream as an input so urce"
        ],
        "correct": "D. Collect the streaming data using Amazon Kinesis D ata Firehose. Map he delivery stream as an input so urce",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A Data Scientist received a set of insurance record s, each consisting of a record ID, the final outcom e among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records dis tributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance. What type of machine learning model should be used?",
        "options": [
            "A. Classification month-to-month using supervised lear ning of the 200 categories based on claim contents.B. Reinforcement learning using claim IDs and timest amps where the agent will identify how many claims in",
            "C. Forecasting using claim IDs and timestamps to ide ntify how many claims in each category to expect fr om",
            "D. Classification with supervised learning of the ca tegories for which partial information on claim con tents is"
        ],
        "correct": "C. Forecasting using claim IDs and timestamps to ide ntify how many claims in each category to expect fr om",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company that promotes healthy sleep patterns by p roviding cloud-connected devices currently hosts a sleep tracking application on AWS. The application collec ts device usage information from device users. The company's Data Science team is building a machine l earning model to predict if and when a user will st op utilizing the company's devices. Predictions from t his model are used by a downstream application that determines the best approach for contacting users. The Data Science team is building multiple versions  of the machine learning model to evaluate each ver sion against the company's business goals. To measure lo ng-term effectiveness, the team wants to run multip le versions of the model in parallel for long periods of time, with the ability to control the portion of  inferences served by the models. Which solution satisfies thes e requirements with MINIMAL effort?",
        "options": [
            "A. Build and host multiple models in Amazon SageMake r. Create multiple Amazon SageMaker endpoints, one",
            "B. Build and host multiple models in Amazon SageMake r. Create an Amazon SageMaker endpoint",
            "C. Build and host multiple models in Amazon SageMake r Neo to take into account different types of medic al",
            "D. Build and host multiple models in Amazon SageMake r. Create a single endpoint that accesses multiple"
        ],
        "correct": "B. Build and host multiple models in Amazon SageMake r. Create an Amazon SageMaker endpoint",
        "explanation": "Explanation/Reference: A/B testing with Amazon SageMaker is required in th e Exam. In A/B t sting, you test different variants  of your models and compare how each variant performs Amazon  SageMaker enables you to test multiple models or model versions behind the `same endpoint using `pro duction variants`. Each production variant identifi es a machine learning (ML) model and the resources deplo yed for hosting the model. To test multiple models by `distributing traffic` between them, specify the `p ercentage of the traffic` that gets routed to each model by specifying he `weight` for each `production variant ` in the endpoint configuration. https://docs.aws.amazon.com/sagemaker/latest/dg/mod el-ab-testing.html#model-testing-target/ variant",
        "references": ""
    },
    {
        "question": "An agricultural company is interested in using mach ine learning to detect specific types of weeds in a  100-acre grassland field. Currently, the company uses tracto r-mounted cameras to capture multiple images of the  field as 10 like broadleaf and non-broadleaf docks. The company wants to build a weed detection model t hat will detect specific types of weeds and the loc ation of each type within the field. Once the model is ready , it will be hosted on Amazon SageMaker endpoints. The model will perform real-time inferencing using the images captured by the cameras. Which approach shou ld a Machine Learning Specialist take to obtain accurate  predictions?",
        "options": [
            "A. Prepare the images in RecordIO format and upload them to Amazon S3. Use Amazon SageMaker to train,",
            "B. Prepare the images in Apache Parquet format and u pload them to Amazon S3. Use Amazon SageMaker to",
            "C. Prepare the images in RecordIO format and upload them to Amazon S3. Use Amazon SageMaker to train,",
            "D. Prepare the images in Apache Parquet format and u pload them to Amazon S3. Use Amazon SageMaker to"
        ],
        "correct": "C. Prepare the images in RecordIO format and upload them to Amazon S3. Use Amazon SageMaker to train,",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturer is operating a large number of facto ries with a complex supply chain relationship where unexpected downtime of a machine can cause producti on to stop at several factories. A data scientist w ants to analyze sensor data from the factories to identify equipment in need of preemptive maintenance and the n dispatch a service team to prevent unplanned downti me. The sensor readings from a single machine can include up to 200 data points including temperature s, voltages, vibrations, RPMs, and pressure reading s. To collect this sensor data, the manufacturer deplo yed Wi-Fi and LANs across the factories. Even thoug h many factory locations do not have reliable or high-spee d internet connectivity, the manufacturer would lik e to maintain near-real-time inference capabilities. Which deployment architecture for the model will ad dress these business requirements?",
        "options": [
            "A. Deploy the model in Amazon SageMaker. Run sensor data through this model to predict which machines",
            "B. Deploy the model on AWS IoT Greengrass in each fa ctory. Run sensor data through this model to infer",
            "C. Deploy the model to an Amazon SageMaker batch tra nsformation job Generate inferences in a daily batc h",
            "D. Deploy the model in Amazon SageMaker and use an I oT rule to write data to an Amazon DynamoDB table."
        ],
        "correct": "B. Deploy the model on AWS IoT Greengrass in each fa ctory. Run sensor data through this model to infer",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/iot/industrial-iot-fro m-condition-based-monitoring-to-predictivequality-t o-digitize- your-factory-with-aws-iot-services/ https://aws.amazon.com/blogs/iot/using-aws-iot-for- predictive-maintenance/",
        "references": ""
    },
    {
        "question": "A Machine Learning Specialist is designing a scalab le data storage solution for Amazon SageMaker. Ther e is an existing TensorFlow-based model implemented as a  train.py script that relies on static training dat a that is currently stored as TFRecords. Which method of providing training data to Amazon S ageMaker would meet the business requirements with the LEAST development overhead? A. Use Amazon SageMaker script mode and use train.py  unchanged. Point the Amazon SageMaker training invocation to the local path of the data without re formatting the training data.",
        "options": [
            "B. Use Amazon SageMaker script mode and use train.py  unchanged. Put the TFRecord data into an Amazon",
            "C. Rewrite the train.py script to add a section that  converts TFRecords to protobuf and ingests the pro tobuf",
            "D. Prepare the data in the format accepted by Amazon  SageMaker. Use AWS Glue or AWS Lambda to"
        ],
        "correct": "B. Use Amazon SageMaker script mode and use train.py  unchanged. Put the TFRecord data into an Amazon",
        "explanation": "Explanation/Reference: https://github.com/aws-samples/amazon-sagemaker-scr ipt-mode/blob/master/tf-horovodinference-pipeline/ train.py",
        "references": ""
    },
    {
        "question": "The chief editor for a product catalog wants the re search and development team to build a machine lear ning system that can be used to detect whether or not in dividuals in a collection of images are wearing the company's retail brand. The team has a set of training data. Which machine learning algorithm should the researc hers use that BEST meets their requirements?",
        "options": [
            "A. Latent Dirichlet Allocation (LDA)",
            "B. Recurrent neural network (RNN)",
            "C. K-means",
            "D. Convolutional neural network (CNN)"
        ],
        "correct": "D. Convolutional neural network (CNN)",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A retail company is using Amazon Personalize to pro vide personalized product recommendations for its customers during a marketing campaign. The company sees a significant increase in sales of recommended items to existing customers immediately after deplo ying a new solution version, but these sales decrea se a short time after deployment. Only historical data f rom before the marketing campaign is available for training. How should a data scientist adjust the solution?",
        "options": [
            "A. Use the event tracker in Amazon Personalize to in clude real-time us r interactions.",
            "B. Add user metadata and use the HRNN-Metadata recip e in Amazon Personalize.",
            "C. Implement a new solution using the built-in facto rization machines (FM) algorithm in Amazon SageMake r.",
            "D. Add event type and event value fields to the inte ractions dataset in Amazon Personalize.",
            "A. Add a VPC endpoint policy to allow access to the IAM users.",
            "B. Modify the users' IAM policy to allow access to A mazon SageMaker Service API calls only.",
            "C. Modify the security group on the endpoint network  interface to restrict access to the instances.",
            "D. Modify the ACL on the endpoint network interface to restrict access to the instances."
        ],
        "correct": "A. Use the event tracker in Amazon Personalize to in clude real-time us r interactions.",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/priva te-package-installation-inamazon-sagemaker-running- in- internet-free-mode/",
        "references": ""
    },
    {
        "question": "An e commerce company wants to launch a new cloud-b ased product recommendation feature for its web application. Due to data localization regulations, any sensitive data must not leave its onpremises da ta center, and the product recommendation model must be traine d and tested using nonsensitive data only. Data tra nsfer to the cloud must use IPsec. The web application is  hosted on premises with a PostgreSQL database that contains all the data. The company wants the data t o be uploaded securely to Amazon S3 each day for mo del retraining. How should a machine learning specialis t meet these requirements?",
        "options": [
            "A. Create an AWS Glue job to connect to the PostgreS QL DB instance. Ingest tables without sensitive dat a",
            "B. Create an AWS Glue job to connect to the PostgreS QL DB instance. Ingest all data through an AWS Site -",
            "C. Use AWS Database Migration Service (AWS DMS) with  table mapping to select PostgreSQL tables with no",
            "D. Use PostgreSQL logical replication to replicate a ll data to PostgreSQL in Amazon EC2 through AWS Dir ect"
        ],
        "correct": "C. Use AWS Database Migration Service (AWS DMS) with  table mapping to select PostgreSQL tables with no",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/dms/latest/userguide/CH AP_Source.PostgreSQL.html",
        "references": ""
    },
    {
        "question": "A logistics company needs a forecast model to predi ct next month's inventory requirements for a single  item in 10 warehouses. A machine learning specialist uses A mazon Forecast to develop a forecast model from 3 y ears of monthly data. There is no missing data. The spec ialist selects the DeepAR+ algorithm to train a pre dictor. The predictor means absolute percentage error (MAPE ) is much larger than the MAPE produc d by the curr ent human forecasters. Which changes to the CreatePredictor API call could  improve the MAPE? (Choose two.)",
        "options": [
            "A. Set PerformAutoML to true.",
            "B. Set ForecastHorizon to 4.",
            "C. Set ForecastFrequency to W for weekly.",
            "D. Set PerformHPO to true. E. Set FeaturizationMethodName to filling"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/forecast/latest/dg/fore cast.dg.pdf",
        "references": ""
    },
    {
        "question": "A data scientist wants to use Amazon Forecast to bu ild a forecasting model for inventory demand for a retail company. The company has provided a dataset of hist oric inventory demand for its products as a .csv fi le stored in an Amazon S3 bucket. The table below show s a sample of the dataset. How should the data scientist transform the data?",
        "options": [
            "A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metad ata",
            "B. Use a Jupyter notebook in Amazon SageMaker to sep arate the dataset into a related time series datase t",
            "C. Use AWS Batch jobs to separate the dataset into a  target time series dataset, a related time series dataset,",
            "D. Use a Jupyter notebook in Amazon SageMaker to tra nsform the data into the optimized protobuf recordI O"
        ],
        "correct": "A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metad ata",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/forecast/latest/dg/data set-import-guidelines-troubleshooting.html",
        "references": ""
    },
    {
        "question": "A machine learning specialist is running an Amazon SageMaker endpoint using the built-in object detect ion algorithm on a P3 instance for real-time prediction s in a company's production application. When evalu ating the model's resource utilization, the specialist notice s that the model is using only a fraction of the GP U. Which architecture changes would ensure that provisioned resources are being utilized effectively?",
        "options": [
            "A. Redeploy the model as a batch transform job on an  M5 instance.",
            "B. Redeploy the model on an M5 instance. Attach Amaz on Elastic Inference to the instance.",
            "C. Redeploy the model on a P3dn instance.",
            "D. Deploy the model onto an Amazon Elastic Container  Service (Amazon ECS) cluster using a P3 instance."
        ],
        "correct": "B. Redeploy the model on an M5 instance. Attach Amaz on Elastic Inference to the instance.",
        "explanation": "Explanation Explanation/Reference: https://aws.amazon.com/machine-learning/elastic-inf erence/",
        "references": ""
    },
    {
        "question": "A data scientist uses an Amazon SageMaker notebook instance to conduct data exploration and analysis. This requires certain Python packages th t are not nativ ely available on Amazon SageMaker to be installed o n the notebook instance How can a machine learning specia list ensure that required packages are automaticall y available on the notebook instance for the data sci entist o use?",
        "options": [
            "A. Install AWS Systems Manager Agent on the underlyi ng Amazon EC2 instance and use Systems Manager",
            "B. Create a Jupyter notebook file (.ipynb) with cell s containing the package installation commands to e xecute",
            "C. Use the conda package manager from within the Jup yter notebook console to apply the necessary conda",
            "D. Create an Amazon SageMaker lifecycle configuratio n with package installation commands and assign the"
        ],
        "correct": "D. Create an Amazon SageMaker lifecycle configuratio n with package installation commands and assign the",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/nbi -add-external.html https://towardsdatascience.com/ automating-aws-sagemaker-notebooks-2dec62bc2c84",
        "references": ""
    },
    {
        "question": "A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The co mpany wants the ability to determine if a newly created a ccount is associated with a previously known fraudu lent user. The data scientist is using AWS Glue to cleanse the  company's application logs during ingestion. Which strategy will allow the data scientist to identify fraudulent accounts?",
        "options": [
            "A. Execute the built-in FindDuplicates Amazon Athena  query.",
            "B. Create a FindMatches machine learning transform i n AWS Glue.",
            "C. Create an AWS Glue crawler to infer duplicate acc ounts in the source data.",
            "D. Search for duplicate accounts in the AWS Glue Dat a Catalog."
        ],
        "correct": "B. Create a FindMatches machine learning transform i n AWS Glue.",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/glue/latest/dg/machine- learning.html",
        "references": ""
    },
    {
        "question": "A Data Scientist is developing a machine learning m odel to classify whether a financial transaction is  fraudulent. The labeled data available for training consists of  100,000 non-fraudulent observations and 1,000 frau dulent observations. The Data Scientist applies the XGBoos t algorithm to the data, resulting in the following  confusion matrix when the trained model is applied to a previ ously unseen validation dataset The accuracy of the  model is 99.1%, but the Data Scientist needs to reduce the n umber of false negatives. Which combination of steps should the Data Scientis t take to reduce the number of false negative predi ctions by the model? (Choose two.) A. Change the XGBoost eval_metric parameter to optim ize based on Root Mean Square Error (RMSE).",
        "options": [
            "B. Increase the XGBoost scale_pos_weight parameter t o adjust the balance of positive and negative weigh ts.",
            "C. Increase the XGBoost max_depth parameter because the model is currently underfitting the data.",
            "D. Change the XGBoost eval_metric parameter to optim ize based on Area Under the ROC Curve (AUC)."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data scientist has developed a machine learning t ranslation model for English to Japanese by using A mazon SageMaker's built-in seq2seq algorithm with 500,000  aligned sentence pairs. While testing with sample sentences, the data scientist finds that the transl ation quality is reasonable for an example as short  as five words. However, the quality becomes unacceptable if  the sentence is 100 words long. Which action will resolve the problem?",
        "options": [
            "A. Change preprocessing to use n-grams.",
            "B. Add more nodes to the recurrent neural network (R NN) than the largest sentence's word count.",
            "C. Adjust hyperparameters related to the attention m echanism.",
            "D. Choose a different weight initialization type."
        ],
        "correct": "C. Adjust hyperparameters related to the attention m echanism.",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/seq -2-seq-howitworks.html",
        "references": ""
    },
    {
        "question": "A financial company is trying to detect credit card  fraud. The company observed that, on average, 2% o f credit card transactions were fraudulent. A data scientist  trained a classifier on a year's worth of credit c ard transactions data. The model needs to identify the fraudulent transactions (positives) from the regula r ones (negatives). The company's goal is to accurately ca pture as many positives as possible. Which metrics should the data scientist use to optimize the model? (Choo se two.)",
        "options": [
            "A. Specificity",
            "B. False positive rate",
            "C. Accuracy",
            "D. Area under the precision-recall curve"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning specialist is developing a proof  of concept for government users whose primary conc ern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model  for a photo classifier application. The specialist wants o protect the data so that it cannot be accessed an d transferred to a remote host by malicious ode a cid entally installed on the training container. Which action will provide the MOST secure protectio n?",
        "options": [
            "A. Remove Amazon S3 access permissions from the Sage Maker execution role.",
            "B. Encrypt the weights of the CNN model.",
            "C. Encrypt the training and validation dataset.",
            "D. Enable network isolation for training jobs."
        ],
        "correct": "D. Enable network isolation for training jobs.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A medical imaging company wants to train a computer  vision model to detect areas of concern on patient s' CT scans. The company has a large collection of unlabe led CT scans that are linked to each patient and st ored in an Amazon S3 bucket. The scans must be accessible t o authorized users only. A machine learning enginee r needs to build a labeling pipeline. Which set of steps should the engineer take to buil d the labeling pipeline with the LEAST effort?",
        "options": [
            "A. Create a workforce with AWS Identity and Access M anagement (IAM). Build a labeling tool on Amazon EC 2",
            "B. Create an Amazon Mechanical Turk workforce and ma nifest file. Create a labeling job by using the bui lt-in",
            "C. Create a private workforce and manifest file. Cre ate a labeling job by using the built-in bounding b ox task",
            "D. Create a workforce with Amazon Cognito. Build a l abeling web application with AWS Amplify. Build a"
        ],
        "correct": "C. Create a private workforce and manifest file. Cre ate a labeling job by using the built-in bounding b ox task",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/sms -workforce-private.html",
        "references": ""
    },
    {
        "question": "A company is using Amazon Textract to extract textu al data from thousands of scanned text-heavy legal documents daily. The company uses this information to process loan applications automatically. Some of  the documents fail business validation and are returned  to human reviewers, who investigate the errors. Th is activity increases the time to process the loan app lications. What should the company do to reduce the processing  time of loan applications?",
        "options": [
            "A. Configure Amazon Textract to route low-confidence  predictions to Amazon SageMaker Ground Truth.",
            "B. Use an Amazon Textract synchronous operation inst ead of an asynchronous operation.",
            "C. Configure Amazon Textract to route low-confidence  predictions to Amazon Augmented AI (Amazon A2I).",
            "D. Use Amazon Rekognition's feature to detect text i n an image to extract the data from scanned images.  Use"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company ingests machine learning (ML) data from w eb advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by u sing the Kinesis Producer Library (KPL). The data i s loaded into the S3 data lake from the data stream b y using an Amazon Kinesis Data Firehose delivery st ream. As the data volume increases, an ML specialist noti ces that the rate of data ingested into Amazon S3 i s relatively constant. There also is an increasing ba cklog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest. Which next step is MOST likely to improve the data ingestion rate into Amazon S3?",
        "options": [
            "A. Increase the number of S3 prefixes for the delive ry stream to write to.",
            "B. Decrease the retention period for the data stream .",
            "C. Increase the number of shards for the da a stream .",
            "D. Add more consumers using the Kinesis Client Libra ry (KCL)."
        ],
        "correct": "C. Increase the number of shards for the da a stream .",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data scientist must build a custom recommendation  model in Amazon SageMaker for an online retail company. Due to the nature of the company's product s, customers buy only 4-5 products every 5-10 years . So, the company relies on a steady stream of new custom ers. When a new customer signs up, the company collects data on the customer's preferences. Below is a sample of the data available to the data scien tist. How should the data scientist split the dataset int o a training and test set for this use case?",
        "options": [
            "A. Shuffle all interaction data. Split off the last 10% of the interaction data for the test set.",
            "B. Identify the most recent 10% of interactions for each user. Split off these interactions for the tes t set.",
            "C. Identify the 10% of users with the least interact ion data. Split off all interaction data from these  users for the",
            "D. Randomly select 10% of the users. Split off all i nteraction data from these users for the test set."
        ],
        "correct": "B. Identify the most recent 10% of interactions for each user. Split off these interactions for the tes t set.",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/build ing-a-customized-recommender-system-inamazon- sagemaker/",
        "references": ""
    },
    {
        "question": "company's data scientists run machine learning (ML)  models on confidential financial data. The company  is worried about data egress and wants n ML engineer t o secure the environment. Which mechanisms can the ML engineer use to control data egress from SageMak er? (Choose three.)",
        "options": [
            "A. Connect to SageMaker by using a VPC interface end poin powered by AWS PrivateLink.",
            "B. Use SCPs to restrict access to SageMaker.",
            "C. Disable root access on the SageMaker notebook ins tances.",
            "D. Enable network isolation for training jobs and mo dels."
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/mille nnium-management-secure-machinelearning-using- amazon-sagemaker/",
        "references": ""
    },
    {
        "question": "A company needs to quickly make sense of a large am ount of data and gain insight from it. The data is in different formats, the schemas change frequently, a nd new data sources are added regularly. The compan y wants to use AWS services to explore multiple data sources, suggest schemas, and enrich and transform the data. The solution should require the least possibl e coding effort for the data flows and the least po ssible infrastructure management. Which combination of AWS services will meet these r equirements? A. Amazon EMR for data discovery, enrichment, and tran sformation Amazon Athena for querying and analyzing the result s in Amazon S3 using standard SQL Amazon QuickSight for reporting and getting insights",
        "options": [
            "A. Amazon Kinesis Data Analytics for data ingestion",
            "B. AWS Data Pipeline for data transfer"
        ],
        "correct": "A. Amazon Kinesis Data Analytics for data ingestion",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is converting a large number of unstructu red paper receipts into images. The company wants t o create a model based on natural language processing  (NLP) to find relevant entities such as date, loca tion, and notes, as well as some custom entities such as rece ipt numbers. The company is using optical character recognition (OCR) to extract text for data labeling. However, d ocuments are in different structures and formats, and the co mpany is facing challenges with setting up the manu al workflows for each document type. Additionally, the  company trained a named entity recognition (NER) m odel for custom entity detection using a small sampl siz e This model has a very low confidence score and wi ll require retraining with a large dataset. Which solution for text extraction and entity detec tion will require th LEAST amount of effort?",
        "options": [
            "A. Extract text from receipt images by using Amazon Textract. Use the Amazon SageMaker BlazingText",
            "B. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use th e",
            "C. Extract text from receipt images by using Amazon Textract. Use Amazon Comprehend for entity detectio n,",
            "D. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use"
        ],
        "correct": "C. Extract text from receipt images by using Amazon Textract. Use Amazon Comprehend for entity detectio n,",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/build ing-an-nlp-powered-searchindex-with-amazon-textract - and-amazon-comprehend/",
        "references": ""
    },
    {
        "question": "A company is building a predictive maintenance mode l based on machine learning (ML). The data is store d in a fully private Amazon S3 bucket that is encrypted at  rest with AWS Key Management Service (AWS KMS) CMKs. An ML specialist must run data preprocessing by using an Amazon SageMaker Processing job that is triggered from code in an Amazon SageMaker notebook . The job should read data from Amazon S3, process it, and upload it back to the same S3 bucket. The p reprocessing code is stored in a container image in  Amazon Elastic Container Registry (Amazon ECR). The ML spe cialist needs to grant permissions to ensure a smoo th data preprocessing workflow. Which set of actions s hould the ML specialist take to meet these requirem ents?",
        "options": [
            "A. Create an IAM role that has permissions to create  Amazon SageMaker Processing jobs, S3 read and writ e",
            "B. Create an IAM role that has permissions to create  Amazon SageMaker Processing jobs. Attach the role to",
            "C. Create an IAM role that has permissions to create  Amazon SageMaker Processing jobs and to access",
            "D. Create an IAM role that has permissions to create  Amazon SageMaker Processing jobs. Attach the role to"
        ],
        "correct": "D. Create an IAM role that has permissions to create  Amazon SageMaker Processing jobs. Attach the role to",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data scientist has been running an Amazon SageMak er notebook instance for a few weeks. During this t ime, a new version of Jupyter Notebook was released alon g with additional software updates. The security te am mandates that all running SageMaker notebook instan ces use the latest security and software updates provided by SageMaker. How can the data scientist meet these requirements?A. Call the Create Notebook Instance Lifecycle Confi g API operation",
        "options": [
            "B. Create a new SageMaker notebook instance and moun t the Amazon Elastic Block Store (Amazon EBS)",
            "C. Stop and then restart the SageMaker notebook inst ance",
            "D. Call the Update Notebook Instance Lifecycle Confi g API operation"
        ],
        "correct": "C. Stop and then restart the SageMaker notebook inst ance",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/segemaker/latest/dg/nbi -software-updates.html",
        "references": ""
    },
    {
        "question": "A library is developing an automatic book-borrowing  system that uses Amazon Rekognition. Images of lib rary members' faces are stored in an Amazon S3 bucket. W hen members borrow books, the Amazon Rekognition CompareFaces API operation compares real faces agai nst the stored faces in Amazon S3. The library need s to improve security by making sure that images are enc rypted at rest. Also, when the images are used with Amazon Rekognition. they need to be encrypted in tr ansit. The library also must ensure that the images  are not used to improve Amazon Rekognition as a service. How should a machine learning specialist architect the solution to satisfy these requirements?",
        "options": [
            "A. Enable server-side encryption on the S3 bucket. S ubmit an AWS Support ticket to opt out of allowing",
            "B. Switch to using an Amazon Rekognition collection to store the images. Use the IndexFaces and",
            "C. Switch to using the AWS GovCloud (US) Region for Amazon S3 to store images and for Amazon",
            "D. Enable client-side encryption on the S3 bucket. S et up a VPN connection and only call the Amazon"
        ],
        "correct": "B. Switch to using an Amazon Rekognition collection to store the images. Use the IndexFaces and",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is building a line-counting application f or use in a quick-service restaurant. The company w ants to use video cameras pointed at the line of customers at a given register to measure how many people are in line and deliver notifications to managers if the line g rows too long. The restaurant locations have limite d bandwidth for connections to external services and cannot acc ommodate multiple video streams without impacting o ther operations. Which solution should a machine learnin g specialist implement to meet these requirements?",
        "options": [
            "A. Install cameras compatible with Amazon Kinesis Vi deo Streams to stream the data to AWS over the",
            "B. Deploy AWS DeepLens cameras in the restaurant to capture video. Enable Amazon Rekognition on the",
            "C. Build a custom model in Amazon SageMaker to recog nize the number of people in an image. Install",
            "D. Build a custom model in Amazon SageMaker to recog nize the number of people in an image. Deploy AWS"
        ],
        "correct": "A. Install cameras compatible with Amazon Kinesis Vi deo Streams to stream the data to AWS over the",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company has set up and deployed its machine learn ing (ML) model into production with an endpoint usi ng Amazon SageMake hosting services. The ML team has c onfigured automatic scaling for its SageMaker instances to support workload changes. During testi ng, the team notices that additional instances are being launched before the new instances are ready. This b ehavior needs to change as soon as possible. How can the ML team solve this issue?",
        "options": [
            "A. Decrease the cooldown period for the scale-in act ivity. Increase the configured maximum capacity of",
            "B. Replace the current endpoint with a multi-model e ndpoint using SageMaker.",
            "C. Set up Amazon API Gateway and AWS Lambda to trigg er the SageMaker inference endpoint.",
            "D. Increase the cooldown period for the scale-out ac tivity."
        ],
        "correct": "A. Decrease the cooldown period for the scale-in act ivity. Increase the configured maximum capacity of",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/confi guring-autoscaling-inferenceendpoints-in-amazon- sagemaker/",
        "references": ""
    },
    {
        "question": "A telecommunications company is developing a mobile  app for its customers. The company is using an Amazon SageMaker hosted endpoint for machine learni ng model inferences. Developers want to introduce a new version of the model for a limited number of us ers who subscribed to a preview feature of the app.  After the new version of the model is tested as a preview , developers will evaluate its accuracy. If a new v ersion of the model has better accuracy, developers need to b e able to gradually release the new version for all  users over a fixed period of time. How can the company implement the testing model wit h the LEAST amount of operational overhead?",
        "options": [
            "A. Update the ProductionVariant data type with the n ew version of the model by using the",
            "B. Configure two SageMaker hosted endpoints that ser ve the different versions of the model. Create an",
            "D. Configure two SageMaker hosted endpoints that ser ve the different ve sions of the model. Create an"
        ],
        "correct": "D. Configure two SageMaker hosted endpoints that ser ve the different ve sions of the model. Create an",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company offers an online shopping service to its customers. The company wants to enhance the site's security by requesting additional information when customers access the site from locations that are d ifferent from their normal to action. The company wants to u pdate the process to call a machine learning (ML) m odel to determine when additional information should be req uested. The company has several terabytes of data from its existing ecommerce web servers containing the sourc e IP addresses for each request made to the web server. For authenticated requests, the records also contai n the login name of the requesting user. Which approach should an ML specialist take to impl ement the new security feature in the web applicati on?",
        "options": [
            "A. Use Amazon SageMaker Ground Truth to label each r ecord as either a successful or failed access attem pt.",
            "B. Use Amazon SageMaker to train a model using the I P Insights algorithm. Schedule updates and retraini ng",
            "C. Use Amazon SageMaker Ground Truth to label each r ecord as either a successful or failed access attem pt.",
            "D. Use Amazon SageMaker to train a model using the O bject2Vec algorithm. Schedule updates and retrainin g"
        ],
        "correct": "C. Use Amazon SageMaker Ground Truth to label each r ecord as either a successful or failed access attem pt.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A retail company wants to combine its customer orde rs with the product description data from its produ ct catalog. The structure and format of the records in  each dataset is different. A data analyst tried to  use a spreadsheet to combine the datasets, but the effort  resulted in duplicate records and records that wer e not properly combined. The company needs a solution tha t it can use to combine similar records from the tw o datasets and remove any duplicates. Which solution will meet these requirements?",
        "options": [
            "A. Use an AWS Lambda function to process the data. U se two arrays to compare equal strings in the field s",
            "B. Create AWS Glue crawlers for reading and populati ng the AWS Glue Data Catalog. Call the AWS Glue",
            "C. Create AWS Glue crawlers for reading and populati ng the AWS Glue Data Catalog. Use the FindMatches",
            "D. Create an AWS Lake Formation custom transform. Ru n a transformation for matching products from the"
        ],
        "correct": "D. Create an AWS Lake Formation custom transform. Ru n a transformation for matching products from the",
        "explanation": "Explanation/Reference: https://aws.amazon.com/lake-formation/features/",
        "references": ""
    },
    {
        "question": "A company provisions Amazon SageMaker notebook inst ances for its data science team and creates Amazon VPC interface endpoints to ensure communication bet ween the VPC and the notebook instances. All connections to the Amazon SageMaker API are contain ed entirely and securely using the AWS network. However, the data science team realizes that indivi duals outside the VPC can still connect to the note book instances across the internet. Which set of actions  should the data science team take to fix the issue ?",
        "options": [
            "A. Modify the notebook instances' security gr up to allow traffic only from the CIDR ranges of the VPC.  Apply",
            "B. Create an IAM policy that allows the sagemaker:Cr eatePresignedNotebooklnstanceUrl and",
            "C. Add a NAT gateway to the VPC. Convert all of the subnets where the Amazon SageMaker notebook",
            "D. Change the network ACL of the subnet the notebook  is hosted in to restrict access to anyone outside the"
        ],
        "correct": "B. Create an IAM policy that allows the sagemaker:Cr eatePresignedNotebooklnstanceUrl and",
        "explanation": "Explanation/Reference: https://gmoein.github.io/files/Amazon%20SageMaker.p df",
        "references": ""
    },
    {
        "question": "A company will use Amazon SageMaker to train and ho st a machine learning (ML) model for a marketing campaign. The majority of data is sensitive custome r data. The data must be encrypted at rest. The com pany wants AWS to maintain the root of trust for the mas ter keys and wants encryption key usage to be logge d. Which implementation will meet these requirements?",
        "options": [
            "A. Use encryption keys that are stored in AWS Cloud HSM to encrypt the ML data volumes, and to encrypt the",
            "B. Use SageMaker built-in transient keys to encrypt the ML data volumes. Enable default encryption for new",
            "C. Use customer managed keys in AWS Key Management S ervice (AWS KMS) to encrypt the ML data",
            "D. Use AWS Security Token Service (AWS STS) to creat e temporary tokens to encrypt the ML storage"
        ],
        "correct": "C. Use customer managed keys in AWS Key Management S ervice (AWS KMS) to encrypt the ML data",
        "explanation": "Explanation Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning specialist stores IoT soil senso r data in Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in Dyn amoDB is 10 GB in size and the dataset in Amazon S3  is 5 GB in size. The specialist wants to train a model  on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker. Which solution will accomplish the necessary transf ormation to train the Amazon SageMaker model with t he LEAST amount of administrative overhead?",
        "options": [
            "A. Launch an Amazon EMR cluster. Create an Apache Hi ve external table for the DynamoDB table and S3",
            "B. Crawl the data using AWS Glue crawlers. Write an AWS Glue ETL job that merges the two tables and",
            "C. Enable Amazon DynamoDB Streams on the sensor tabl e. Writ an AWS Lambda function that consumes the",
            "D. Crawl the data using AWS Glue crawlers. Write an AWS Glue ETL job that merges the two tables and"
        ],
        "correct": "C. Enable Amazon DynamoDB Streams on the sensor tabl e. Writ an AWS Lambda function that consumes the",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company sells thousands of products on a public w ebsite and wants to automatically identify products  with potential durability problems. The company has 1.00 0 reviews with date, star rating, review text, revi ew summary, and customer email fields, but many review s are incomplete and have empty fields. Each review  has already been labeled with the correct durability re sult. A machine learning specialist must train a model to  identify reviews expressing concerns over product durability. The first model needs to be trained and  ready to review in 2 days. What is the MOST direct approach to solve this prob lem within 2 days?",
        "options": [
            "A. Train a custom classifier by using Amazon Compreh end.",
            "B. Build a recurrent neural network (RNN) in Amazon SageMaker by using Gluon and Apache MXNet.",
            "C. Train a built-in BlazingText model using Word2Vec  mode in Amazon SageMaker.",
            "D. Use a built-in seq2seq model in Amazon SageMaker."
        ],
        "correct": "B. Build a recurrent neural network (RNN) in Amazon SageMaker by using Gluon and Apache MXNet.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company that runs an online library is implementi ng a chatbot using Amazon Lex to provide book recommendations based on category. This intent is f ulfilled by an AWS Lambda function that queries an Amazon DynamoDB table for a list of book titles, gi ven a particular category. For testing, there are o nly three categories implemented as the custom slot types: \"c omedy,\" \"adventure,\" and \"documentary.\" A machine learning (ML) specialist notices that sometimes the  request cannot be fulfilled because Amazon Lex can not understand the category spoken by users with uttera nces such as \"funny,\" \"fun,\" and \"humor.\" The ML specialist needs to fix the problem without changin g the Lambda code or data in DynamoDB. How should the ML specialist fix the problem? A. Add the unrecognized words in the enumeration val ues list as new values in the slot type.",
        "options": [
            "B. Create a new custom slot type, add the unrecogniz ed words to this slot type as enumeration values, a nd",
            "C. Use the AMAZON.SearchQuery built-in slot types fo r custom searches in the database.",
            "D. Add the unrecognized words as synonyms in the cus tom slot type."
        ],
        "correct": "C. Use the AMAZON.SearchQuery built-in slot types fo r custom searches in the database.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturing company uses machine learning (ML) models to detect quality issues The models use imag es that are taken of the company's product at the end of each production step. The company has thousands of machines at the production site that generate one i mage per second on average. The company ran a successful pilot with a single ma nufacturing machine For the pilot, ML specialists u sed an industrial PC that ran AWS IoT Greengrass with a lo ng-running AWS Lambda function that uploaded the images to Amazon S3. The uploaded images invoked a Lambda function that was written in Python to perfo rm inference by using an Amazon SageMaker endpoint tha t ran a custom model. The inference results were forwarded back to a web service that was hosted at the production site to prevent faulty products from bei ng shipped The company scaled the solution out to all manufact uring machines by installing similarly configured i ndustrial PCs on each production machine. However, latency fo r predictions increased beyond acceptable limits. Analysis shows that the internet connection is at i ts capacity limit. How can the company resolve this issue MOST cost-ef fectively?",
        "options": [
            "A. Set up a 10 Gbps AWS Direct Connect connection be tween the production site and the nearest AWS",
            "B. Extend the long-running Lambda function that runs  on AWS IoT Greengrass to compress the images and",
            "C. Use auto scaling for SageMaker. Set up an AWS Dir ect Connect connection between the production site",
            "D. Deploy the Lambda function and the ML models onto  the AWS IoT Greengrass core that is running on the"
        ],
        "correct": "D. Deploy the Lambda function and the ML models onto  the AWS IoT Greengrass core that is running on the",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data scientist is using an Amazon SageMaker noteb ook instance and needs to securely access data stor ed in a specific Amazon S3 bucket. How should the data scientist accomplish this?",
        "options": [
            "A. Add an S3 bucket policy allowing GetObject, PutOb ject, and ListBucket permissions to the Amazon",
            "B. Encrypt the objects in the S3 bucket with a custo m AWS Key Management Service (AWS KMS) key that only the notebook owner has access to.",
            "C. Attach the policy to the IAM role associated with  the notebook that allows GetObject,PutObject, and",
            "D. Use a script in a lifecycle configuration to conf igure the AWS CLI on the instance with an access ke y ID and"
        ],
        "correct": "C. Attach the policy to the IAM role associated with  the notebook that allows GetObject,PutObject, and",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company is launching a new product and needs to b uild a mechanism to monitor comments about the company and its new product on social media. The co mpany needs to be able to evaluate the sentiment expressed in social media posts, and visualize tren ds and configure alarms based on various thresholds . The company needs to implement this solution quickly, a nd wants to minimize the infrastructure and data sc ience resources needed to evaluate the messages. The comp any already has a solution in place to collect post s and store them within an Amazon S3 bucket. What services should the data science team use to d eliver this solution?",
        "options": [
            "A. Train a model in Amazon SageMaker by using the Bl azingText algorithm to detect sentiment in the corp us",
            "B. Train a model in Amazon SageMaker by using the se mantic c segmentation algorithm to model the",
            "C. Trigger an AWS Lambda function when social media posts are added to the S3 bucket. Call Amazon",
            "D. Trigger an AWS Lambda function when social media posts are added to the S3 bucket. Call Amazon"
        ],
        "correct": "A. Train a model in Amazon SageMaker by using the Bl azingText algorithm to detect sentiment in the corp us",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A bank wants to launch a low-rate credit promotion.  The bank is located in a town that recently experi enced economic hardship. Only some of the bank's customer s were affected by the crisis, so the bank's credit  team must identify which customers to target with the pr omotion. However, the credit team wants to make sur e that loyal customers' full credit history is considered when the decision is made. The bank's data science team developed a model that  classifies account transactions and understands cr edit eligibility. The data science team used the XGBoost  algorithm to train the model. The team used 7 year s of bank transaction historical data for training and h yperparameter tuning over the course of several day s. The accuracy of the model is sufficient, but the credit  team is struggling to explain accurately why the m odel denies credit to some customers. The credit team has almos t no skill in data science. What should the data sc ience team do to address this issue in the MOST operation ally efficient manner?",
        "options": [
            "A. Use Amazon SageMaker Studio to rebuild the model.  Create a notebook that uses the XGBoost training",
            "B. Use Amazon SageMaker Studio to rebuild the model.  Create a notebook that uses the XGBoost training",
            "C. Create an Amazon SageMaker notebook instance. Use  the notebook instance and the XGBoost library to",
            "D. Use Amazon SageMaker Studio to rebuild the model.  Create a notebook that uses the XGBoost training"
        ],
        "correct": "C. Create an Amazon SageMaker notebook instance. Use  the notebook instance and the XGBoost library to",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data science team is planning to build a natural language processing (NLP) application. The applicat ion's text preprocessing stage will include part-of-speech tag ging and key phase extraction. The preprocessed tex t will be input to a custom classification algorithm that the  data science team has already written and trained using Apache MXNet. Which solution can the team build MOST quickly to m eet these requirements?",
        "options": [
            "A. Use Amazon Comprehend for the part-of-speech tagg ing, key phase extraction, and classification tasks .",
            "B. Use an NLP library in Amazon SageMaker for the pa rt-of-speech tagging. Use Amazon Comprehend for the",
            "C. Use Amazon Comprehend for the part-of-speech tagg ing and key phase extraction tasks. Use Amazon",
            "D. Use Amazon Comprehend for the part-of-speech tagg ing and key phase extraction tasks. UseAWS Deep"
        ],
        "correct": "B. Use an NLP library in Amazon SageMaker for the pa rt-of-speech tagging. Use Amazon Comprehend for the",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning (ML) specialist must develop a c lassification model for a financial services compan y. A domain expert provides the dataset, which is tabula r with 10,000 rows and 1,020 features. During explo ratory data analysis, the specialist finds no missing valu es and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50 th percentile. Which feature engineering strategy should the ML sp ecialist use with Amazon SageMaker? A. Apply dimensionality reduction by using the princ ipal component analysis (PCA) algorithm.",
        "options": [
            "B. Drop the features with low correlation scores by using a Jupyter notebook.",
            "C. Apply anomaly detection by using the Random Cut F orest (RCF) algorithm.",
            "D. Concatenate the features with high correlation sc ores by using a Jupyter notebook."
        ],
        "correct": "C. Apply anomaly detection by using the Random Cut F orest (RCF) algorithm.",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning specialist needs to analyze comm ents on a news website with users across the globe.  The specialist must find the most discussed topics in t he comments that are in either English or Spanish. What steps could be used to accomplish this task? ( Choose two.)",
        "options": [
            "A. Use an Amazon SageMaker BlazingText algorithm to find the topics independently from language. Procee d",
            "B. Use an Amazon SageMaker seq2seq algorithm to tran slate from Spanish to English, if necessary. Use a",
            "C. Use Amazon Translate to translate from Spanish to  English, if necessary. Use Amazon Comprehend topic",
            "D. Use Amazon Translate to translate from Spanish to  English, if necessa y. Use Amazon Lex to extract t opics"
        ],
        "correct": "B. Use an Amazon SageMaker seq2seq algorithm to tran slate from Spanish to English, if necessary. Use a",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/lda .html",
        "references": ""
    },
    {
        "question": "A machine learning (ML) specialist is administering  a production Amazon SageMaker endpoint with model monitoring configured. Amazon SageMaker Model Monit or detects violations on the SageMaker endpoint, so the ML specialist entrains the model with the lates t dataset. This dataset is statistically representa tive of the current production traffic. The ML specialist notic es that even after deploying the new SageMaker mode l and running the first monitoring job, the SageMaker end point still has violations. What should the ML specialist do to resolve the vio lations?",
        "options": [
            "A. Manually trigger the monitoring job to re-evaluat e the SageMaker endpoint traffic sample.",
            "B. Run the Model Monitor baseline job again on the n ew training set. Configure Model Monitor to use the  new",
            "C. Delete the endpoint and recreate it with the orig inal configuration.",
            "D. Retrain the model again by using a combination of  the original training set and the new training set .",
            "A. Detecting seasonality for the majority of stores will be an issue. Request categorical data to relat e new",
            "B. The sales data does not have enough variance. Req uest external sales data from other industries to",
            "C. Sales data is aggregated by week. Request daily s ales data from the source database to enable buildi ng a",
            "D. The sales data is missing zero entries for item s ales. Request that item sales data from the source"
        ],
        "correct": "B. Run the Model Monitor baseline job again on the n ew training set. Configure Model Monitor to use the  new",
        "explanation": "Explanation/Reference: https://towardsdatascience.com/sales-forecasting-fr om-time-series-to-deep-learning-5d115514bfac https: // arxiv.org/ftp/arxiv/papers22.6613.pdf",
        "references": ""
    },
    {
        "question": "An ecommerce company is automating the categorizati on of its produces based on images. A data scientis t has trained a computer vision model using the Amazo n SageMaker image classification algorithm. The ima ges for each product are classified according to specif ic product lines. The accuracy of the model is too low when categorizing new products. All of the product image s have the same dimensions and are stored within an Amazon S3 bucket. The company wants to improve the model so it can be used for new products as soon as possible. Which steps would improve the accuracy of  the solution? (Choose three.)",
        "options": [
            "A. Use the SageMaker semantic segmentation algorithm  to train a new model to achieve improved accuracy.",
            "B. Use the Amazon Rekognition DetectLabels API to cl assify the products in the dataset.",
            "C. Augment the images in the dataset. Use open sourc e libraries to crop, resize, flip, rotate, and adju st the",
            "D. Use a SageMaker notebook to implement the normali zation of pixels and scaling of the images. Store t he"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/rekognition/latest/dg/h ow-it-works-types.html https://towardsdatascience.c om/ image-processing-techniques-for-computer-vision-11f 92f511e21 https://docs.aws.amazon.com/rekognition/ latest/customlabels-dg/training-model.html",
        "references": ""
    },
    {
        "question": "algorithm. There are 5 classes in the dataset, with  300 samples for category A, 292 samples for catego ry B, 240 samples for category C, 258 samples for categor y D, and 310 samples for category E. The data scien tist shuffles the data and splits off 10% for testing. A fter training the model, the data scientist generat es confusion matrices for the training and test sets. What could the data scientist conclude form these r esults?",
        "options": [
            "A. Classes C and D are too similar. B. The dataset is too small for holdout cross-validati on.",
            "C. The data distribution is skewed.",
            "D. The model is overfitting for classes B and E."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A company that manufactures mobile devices wants to  determine and calibrate the appropriate sales pric e for its devices. The company is collecting the relevant  data and is determining data features that it can use to train machine learning (ML) models. There are more than 1 ,000 features, and the company wants to determine t he primary features that contribute to the sales price . Which techniques should the company use for feature  selection? (Choose three.)",
        "options": [
            "A. Data scaling with standardization and normalizati on",
            "B. Correlation plot with heat maps",
            "C. Data binning",
            "D. Univariate selection"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://towardsdatascience.com/an-overview-of-data- preprocessing-featuresenrichment-automatic-feature- selection-60b0c12d75ad https://towardsdatascience.com/feature-selection-us ing-python-for-classification-problemb5f00a1c7028#: ~: text=Univariate%20feature%20selection%20works%20by, analysis%20of%20varian ce%20 (ANOVA).&text=That%20is%20why%20it%20is%20called%20 'univariate' https://arxiv.org/abs1.04530",
        "references": ""
    },
    {
        "question": "A power company wants to forecast future energy con sumption for its customers in residential propertie s and commercial business properties. Historical power co nsumption data for the last 10 years is available. A team of data scientists who performed the initial data anal ysis and feature selection will include the histori cal power consumption data and data such as weather, number o f individuals on the property, and public holidays.  The data scientists are using Amazon Forecast to genera te the forecasts. Which algorithm in Forecast shoul d the data scientists use t meet these requirements?",
        "options": [
            "A. Autoregressive Integrated Moving Average (A RMA)",
            "B. Exponential Smoothing (ETS)",
            "C. Convolutional Neural Network -Quintile Regression  (CNN-QR)",
            "D. Prophet",
            "A. Use a voice-driven Amazon Lex bot to perform the ASR customization. Create customer slots within the  bot",
            "B. Use Amazon Transcribe to perform the ASR customiz ation. Analyze the word confidence scores in the",
            "C. Create a custom vocabulary file containing each p roduct name with phonetic pronunciations, and use i t with",
            "D. Use the audio transcripts to create a training da taset and build an Amazon Transcribe custom languag e"
        ],
        "correct": "A. Use a voice-driven Amazon Lex bot to perform the ASR customization. Create customer slots within the  bot",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/lex/latest/dg/lex-dg.pd f",
        "references": ""
    },
    {
        "question": "A company is building a demand forecasting model ba sed on machine learning (ML). In the development st age, an ML specialist uses an Amazon SageMaker notebook to perform feature engineering during work hours th at consumes low amounts of CPU and memory resources. A  data engineer uses the same notebook to perform data preprocessing once a day on average that requi res very high memory and completes in only 2 hours.  The data preprocessing is not configured to use GPU. Al l the processes are running well on an ml.m5.4xlarg e notebook instance. The company receives an AWS Budg ets alert that the billing for this month exceeds t he allocated budget. Which solution will result in the MOST cost savings ?",
        "options": [
            "A. Change the notebook instance type to a memory opt imized instance with the same vCPU number as the",
            "B. Keep the notebook instance type and size the same  Stop the notebook when it is not in use. Run data",
            "C. Change the notebook instance type to a smaller ge neral purpose instance. Stop the notebook when it i s not",
            "D. Change the notebook instance type to a smaller ge neral purpose instance. Stop the notebook when it i s not"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning specialist is developing a regre ssion model to predict rental rates from rental lis tings. A variable named Wall_Color represents the most promi nent exterior wall color of the property. The follo wing is the sample data, excluding all other variables: The specialist chose a model that needs numerical i nput data. Which feature engineering approaches should the spe cialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)",
        "options": [
            "A. Apply integer transformation and set Red = 1, Whi te = 5, and Green = 10.",
            "B. Add new columns that store one-hot representation  of colors.",
            "C. Replace the color name string by its length.",
            "D. Create three columns to encode the color in RGB f ormat."
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A data scientist is working on a public sector proj ect for an urban traffic system. While studying the  traffic patterns, it is clear to the data scientist that th e traffic behavior at each light is correlated, sub ject to a small stochastic error term. The data scientist must mode l the traffic behavior to analyze the traffic patte rns and reduce congestion. How will the data scientist MOST effectively model the problem?",
        "options": [
            "A. The data scientist should obtain a correlated equ ilibrium policy by formulating this problem as a mu lti-agent",
            "B. The data scientist should obtain the optimal equi librium policy by formulating his problem as a sing le-agent",
            "C. Rather than finding an equilibrium policy, the da ta scientist should obtain accurate predictors of t raffic flow",
            "D. Rather than finding an equilibrium policy, the da ta scientist should obtain accurate predictors of t raffic flow"
        ],
        "correct": "D. Rather than finding an equilibrium policy, the da ta scientist should obtain accurate predictors of t raffic flow",
        "explanation": "Explanation Explanation/Reference: https://www.hindawi.com/journals/jat1 8011/",
        "references": ""
    },
    {
        "question": "A data scientist is using the Amazon SageMaker Neur al Topic Model (NTM) algorithm to build a model tha t recommends tags from blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON forma t. During model evaluation, the data scientist discove red that the model recommends certain stopwords such as \"a,\" \"an,\" and  \"the\" as tags to certain blog posts, along with a few rare words that are present only in certain blog entries . After a few iterations of tag review with the con tent team, the data scientist notices that the rare words are unus ual but feasible. The data scientist also must ensu re that the tag recommendations of the generated model do not i nclude the stopwords. What should the data scientist do to meet these req uirements?",
        "options": [
            "A. Use the Amazon Comprehend entity recognition API operations. Remove the detected words from the blog",
            "B. Run the SageMaker built-in principal component an alysis (PCA) algorithm with the blog post data from  the",
            "C. Use the SageMaker built-in Object Detection algor ithm instead of the NTM algorithm for the training job to",
            "D. Remove the stopwords from the blog post data by u sing the Count Vectorizer function in the scikit -l earn"
        ],
        "correct": "D. Remove the stopwords from the blog post data by u sing the Count Vectorizer function in the scikit -l earn",
        "explanation": "Explanation/Reference: https://towardsdatascience.com/natural-language-pro cessing-count-vectorization-withscikit-learn- e7804269bb5e",
        "references": ""
    },
    {
        "question": "A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to use AWS to perform complete ML lif ecycles and wants to use Amazon S3 for the data storage. All of the company's data currently reside s on premises and is 40 in size. The company wants a solution that can transfer and automatically update  data between the on-premises object storage and Am azon S3. The solution must support encryption, schedulin g, monitoring, and data integrity validation. Which  solution meets these requirements?",
        "options": [
            "A. Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine",
            "B. Use AWS Transfer for FTPS to transfer the files f rom the on-premises storage to Amazon S3.",
            "C. Use AWS DataSync to make an initial copy of the e ntire dataset. Schedule subsequent incremental",
            "D. Use S3 Batch Operations to pull data periodically  from the on-premises storage.Enable S3 Versioning on",
            "A. Use Amazon Rekognition Custom Labels to label the  dataset and create a custom Amazon Rekognition",
            "B. Use an Amazon SageMaker Ground Truth object detec tion labeling task. Use Amazon Mechanical Turk as",
            "C. Use Amazon Rekognition Custom Labels to label the  dataset and create a custom Amazon Rekognition",
            "D. Use an Amazon SageMaker Ground Truth semantic seg mentation labeling task. Use a private workforce as"
        ],
        "correct": "B. Use an Amazon SageMaker Ground Truth object detec tion labeling task. Use Amazon Mechanical Turk as",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/sms -workforce-managementpublic. html",
        "references": ""
    },
    {
        "question": "A data engineer at a bank is evaluating a new tabul ar dataset that includes customer data. The data en gineer will use the customer data to create a new model to  predict customer behavior. After creating a correl ation matrix for the variables, the data engineer notices  that many of the 100 features are highly correlate d with each other. Which steps should the data engineer take to addres s this issue? (Choose two.)",
        "options": [
            "A. Use a linear-based algorithm to train the model.",
            "B. Apply principal component analysis (PCA).",
            "C. Remove a portion of highly correlated features fr om the dataset.",
            "D. Apply min-max feature scaling to the dataset."
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://royalsocietypublishing.org/doi.1098/rsta.20 15.0202 https://scikit-learn.org/stable/auto_exampl es/ preprocessing/plot_all_scaling.html",
        "references": ""
    },
    {
        "question": "A company is building a new version of a recommenda tion engine. Mach ne learning (ML) specialists need  to keep adding new data from users to improve personal ized recommendations. The ML specialists gather dat a from the users' interactions on the platform and fr om sources such as external websites and social med ia. The pipeline cleans, transforms, enriches, and compress es terabytes of da a daily, and this data is stored  in Amazon S3. A set of Python scripts was coded to d t he job and is stored in a large Amazon EC2 instance . The whole process takes more than 20 hours to finish, w ith each script taking at least an hour. The compan y wants to move the scripts out f Amazon EC2 into a more ma naged solution that will eliminate the need to main tain servers. Which approach will address all of these r equirements with the LEAST development effort?",
        "options": [
            "A. Load the data into an Amazon Redshift cluster. Ex ecute the pipeline by using SQL. Store the results in",
            "B. Load the data into Amazon Dyn moDB Convert the sc ripts to an AWS Lambda function. Execute the",
            "C. Create an AWS Glue job. Convert the scripts to Py Spark. Execute the pipeline. Store the results in A mazon",
            "D. Create a set of individual AWS Lambda functions t o execute each of the scripts. Build a step functio n by"
        ],
        "correct": "B. Load the data into Amazon Dyn moDB Convert the sc ripts to an AWS Lambda function. Execute the",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/lambda/latest/dg/with-s 3-example.html",
        "references": ""
    },
    {
        "question": "A retail company is selling products through a glob al online marketplace. The company wants to use mac hine learning (ML) to analyze customer feedback and iden tify specific areas for improvement. A developer ha s built a tool that collects customer reviews from the onli ne marketplace and stores them in an Amazon S3 buck et. This process yields a dataset of 40 reviews. A data  scientist building the ML models must identify add itional sources of data to increase the size of the dataset . Which data sources should the data scientist use to  augment the dataset of reviews? (Choose three.)",
        "options": [
            "A. Emails exchanged by customers and the company's c ustomer service agents",
            "B. Social media posts containing the name of the com pany or its products",
            "C. A publicly available collection of news articles",
            "D. A publicly available collection of customer revie ws"
        ],
        "correct": "",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning (ML) specialist wants to create a data preparation job that uses a PySpark script w ith complex window aggregation operations to create dat a for training and testing. The ML specialist needs  to evaluate the impact of the number of features and t he sample count on model performance. Which approac h should the ML specialist use to determine the ideal  data transformations for the model?",
        "options": [
            "A. Add an Amazon SageMaker Debugger hook to the scri pt to capture key metrics. Run the script as an AWS",
            "B. Add an Amazon SageMaker Experiments tracker to th e script to capture key metrics. Run the script as an",
            "C. Add an Amazon SageMaker Debugger hook to the scri pt to capture key parameters. Run the script as a",
            "D. Add an Amazon SageMaker Experiments tracker to th e script to capture key parameters. Run the script as"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/exp eriments.html",
        "references": ""
    },
    {
        "question": "A data scientist has a dataset of machine part imag es stored in Amazon Elastic File System (Amazon EFS ). The data scientist needs to use Amazon SageMaker to  create and train an image classification machine learning model based on this dataset. Because of bu dget and time constraints, management wants the data scientist t create and tr ain a model with the least number of steps and inte gration work required. How should the data scientist meet these requiremen ts?",
        "options": [
            "A. Mount the EFS file system to a SageMaker notebook  and run a script that copies the data to an Amazon",
            "B. Launch a transient Amazon EMR cluster. Configure steps to mount the EFS file system and copy the dat a",
            "C. Mount the EFS file system to an Amazon EC2 instan ce and use the AWS CLI to copy the data to an",
            "D. Run a SageMaker training job with an EFS file sys tem as the data source."
        ],
        "correct": "A. Mount the EFS file system to a SageMaker notebook  and run a script that copies the data to an Amazon",
        "explanation": "Explanation/Reference: https://aws.amazon.com/about-aws/whats-new9/08/amaz on-sagemaker-workswith-amazon-fsx-lustre-amazon- efs-model-training/",
        "references": ""
    },
    {
        "question": "A retail company uses a machine learning (ML) model  for daily sales forecasting. The company's brand manager reports that the model has provided inaccur ate results for the past 3 weeks. At the end of eac h day, an AWS Glue job consolidates the input data that is  used for the forecasting with the actual daily sal es data and the predictions of the model. The AWS Glue job stor es the data in Amazon S3. The company's ML team is using an Amazon SageMaker Studio notebook to gain a n understanding about the source of the model's inaccuracies. What should the ML team do on the SageMaker Studio notebook to visualize the model's degradation MOST accurately?",
        "options": [
            "A. Create a histogram of the daily sales over the la st 3 weeks. In addition, create a histogram of the daily sales",
            "B. Create a histogram of the model errors over the l ast 3 weeks. In addition, create a histogram of the  model",
            "C. Create a line chart with the weekly mean absolute  error (MAE) of the model.",
            "D. Create a scatter plot of daily sales versus model  error for the last 3 weeks. In addition, create a scatter plot"
        ],
        "correct": "C. Create a line chart with the weekly mean absolute  error (MAE) of the model.",
        "explanation": "Explanation/Reference: https://machinelearningmastery.com/time-series-fore casting-performance-measureswith-python/",
        "references": ""
    },
    {
        "question": "An ecommerce company sends a weekly email newslette r to all of its customers. Management has hired a team of writers to create additional targeted conte nt. A d ta scientist needs to identify five custome r segments based on age, income, and location. The customers' current segmentation is unknown. The data scientist previously built an XGBoost model to predict the li kelihood of a customer responding to an email based  on age, income, and location. Why does the XGBoost model NO T meet the current requirements, and how can this b e fixed?",
        "options": [
            "A. The XGBoost model provides a true/false binary ou tput. Apply principal component analysis (PCA) with  five",
            "B. The XGBoost model provides a true/false binary ou tput. Increase the number of classes the XGBoost",
            "C. The XGBoost model is a supervised machine learnin g algorithm. Train a k-Nearest-Neighbors (kNN) mode l",
            "D. The XGBoost model is a supervised machine learnin g algorithm. Train a k-means model with K = 5 on th e"
        ],
        "correct": "D. The XGBoost model is a supervised machine learnin g algorithm. Train a k-means model with K = 5 on th e",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A global financial company is using machine learnin g to automate its loan approval process. The compan y has a dataset of customer information. The dataset cont ains some categorical fields, such as customer loca tion by city and housing status. The dataset also includes financial fields in different units, such as accoun t balances in US dollars and monthly interest in US cents. The company's data scientists are using a gradient boosting regression model to infer the credit score  for each customer. The model has a training accuracy of 99% and a testing accuracy of 75%. The data scientists want to improve the model's testing accuracy. Which process will improve the testing accuracy the  MOST?",
        "options": [
            "A. Use a one-hot encoder for the categorical fields in the dataset. Perform standardization on the fina ncial",
            "B. Use tokenization of the categorical fields in the  dataset. Perform binning on the financial fields i n the",
            "C. Use a label encoder for the categorical fields in  the dataset. Perform L1 regularization on the fina ncial fields",
            "D. Use a logarithm transformation on the categorical  fields in the dataset. Perform binning on the fina ncial"
        ],
        "correct": "A. Use a one-hot encoder for the categorical fields in the dataset. Perform standardization on the fina ncial",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A machine learning (ML) specialist needs to extract  embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive model s. The text consists of curated sentences in English. Many  sentences use similar words but in different conte xts. There are questions and answers among the sentences, and the embedding space must differentiate between them . Which options can produce the required embedding ve ctors that capture word context and sequential QA information? (Choose two.)",
        "options": [
            "A. Amazon SageMaker seq2seq algorithm",
            "B. Amazon SageMaker BlazingText algorithm in Skip-gr am mode",
            "C. Amazon SageMaker Object2Vec algorithm",
            "D. Amazon SageMaker BlazingText algorithm in continu ous bag-of words (CBOW) mode"
        ],
        "correct": "",
        "explanation": "Explanation/Reference: https://aws.amazon.com/blogs/machine-learning/creat e-a-word-pronunciation-sequence-to-sequence-model- using-amazon-sagemaker/ https://docs.aws.amazon.com/sagemaker/latest/dg/obj ect2vec.html",
        "references": ""
    },
    {
        "question": "A retail company wants to update its customer suppo rt system. The company wants to implement automatic routing of customer claims to different queues to p rioritize the claims by category. Currently, an ope rator manually performs the category assignment and routi ng. After the operator classifies and routes the cl aim, the company stores the claim's record in a central data base. The claim's record includes the claim's categ ory. The company has no data science team or experience in the field of machine learning (ML). The company' s small development team needs a solution that requir es no ML expertise. Which solution meets these requirements?",
        "options": [
            "A. Export the database to a .csv file with two colum ns: claim_label and claim_text. Use the Amazon",
            "B. Export the database to a .csv file with one colum n: claim_text. Use the Amazon SageMaker Latent Diri chlet",
            "C. Use Amazon Textract to process the database and a utomatically detect two columns: claim_label and",
            "D. Export the database to a .csv file with two colum ns: claim_label and claim_text. Use Amazon Comprehe nd",
            "A. Modify the HPO configuration as follows:",
            "B. Run three different HPO jobs that use different l earning rates form the following intervals for MinV alue and",
            "C. Modify the HPO configuration as follows:",
            "D. Run three different HPO jobs that use different l earning rates form the following intervals for MinV alue and"
        ],
        "correct": "D. Export the database to a .csv file with two colum ns: claim_label and claim_text. Use Amazon Comprehe nd",
        "explanation": "Explanation/Reference:",
        "references": ""
    },
    {
        "question": "A manufacturing company wants to use machine learni ng (ML) to automate quality control in its faciliti es. The facilities are in remote locations and have limited  internet connectivity. The company corporate on-premises data center. The company will use this data to train a model for  real-time defect detection in new parts as the par ts move on a conveyor belt in the facilities. The company need s a solution that minimizes costs for compute infra structure and that maximizes the scalability of resources for  training. The solution also must facilitate the co mpany's use of an ML model in the low-connectivity environments . Which solution will meet these requirements?",
        "options": [
            "A. Move the training data to an Amazon S3 bucket. Tr ain and evaluate the model by using Amazon",
            "B. Train and evaluate the model on-premises. Upload the model to an Amazon S3 bucket. Deploy the model",
            "C. Move the training data to an Amazon S3 bucket. Tr ain and evaluate the model by using Amazon",
            "D. Train the model on premises. Upload the model to an Amazon S3 bucket. Set up an edge device in the"
        ],
        "correct": "A. Move the training data to an Amazon S3 bucket. Tr ain and evaluate the model by using Amazon",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/how -it-works-deployment.html",
        "references": ""
    },
    {
        "question": "A company has an ecommerce website with a product r ecommendation engine built in TensorFlow. The recommendation engine endpoint is hosted by Amazon SageMaker. Three compute-optimized instances support the expected peak load of the website. Response times on the product recommendation page a re increasing at the beginning of each month. Some users are encountering errors. The website receives  the majority of its traffic between 8 AM and 6 PM on weekdays in a single time zone. Which of the following options are the MOST effecti ve in solving the issue while keeping costs to a mi nimum? (Choose two.)",
        "options": [
            "A. Configure the endpoint to use Amazon Elastic Infe rence (EI) accelerators.",
            "B. Create a new endpoint configuration with two prod uction variants.",
            "C. Configure the endpoint to automatically scale wit h the Invocations PerInstance metric.",
            "D. Deploy a second instance pool to support a blue/g reen deployment of models."
        ],
        "correct": "",
        "explanation": "Explanation Explanation/Reference: https://docs.aws.amazon.com/sagemaker/latest/APIRef erence/API_ProductionVariant.html https://www.redhat.com/en/topics/devops/what-is-blu e-green-deployment",
        "references": ""
    },
    {
        "question": "A real-estate company is launching a new product th at predicts the prices of new houses. The historica l data for the properties and prices is stored in .csv format in an Amazon S3 bucket. The data has a header, some categorical fields, and some missing values. The co mpany's data scientists have used Python with a com mon open-source library to fill the missing values with  zeros The data scientists have dropped all of the categorical fields and have trained a model by using the openso urce linear regression algorithm with the default parameters. The accuracy of the predictions with the current mo del is below 50%. The company wants to improve the model performance and launch the new product as soon as p ossible. Which solution will meet these requirements with th e LEAST operational overhead?",
        "options": [
            "A. Create a service-linked role for Amazon Elastic C ontainer Service (Amazon ECS) with access to the S3",
            "B. Create an Amazon SageMaker notebook with a new IA M role that is associated with the notebook. Pull t he",
            "C. Create an IAM role with access Amazon S3, Amazon SageMaker, and AWS Lambda. Create a training job",
            "D. Create an IAM role for Amazon SageMaker with acce ss to the S3 bucket. Create a SageMaker AutoML job"
        ],
        "correct": "A. Create a service-linked role for Amazon Elastic C ontainer Service (Amazon ECS) with access to the S3",
        "explanation": "Explanation/Reference: https://docs.aws.amazon.com/deep-learning-container s/latest/devguide/deep-learningcontainers-ecs- setup.html",
        "references": ""
    }
]