File size: 32,825 Bytes
557ba5e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "given-adoption",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "<picture>\n",
    "  <source media=\"(prefers-color-scheme: dark)\" srcset=\"https://vespa.ai/assets/vespa-ai-logo-heather.svg\">\n",
    "  <source media=\"(prefers-color-scheme: light)\" srcset=\"https://vespa.ai/assets/vespa-ai-logo-rock.svg\">\n",
    "  <img alt=\"#Vespa\" width=\"200\" src=\"https://vespa.ai/assets/vespa-ai-logo-rock.svg\" style=\"margin-bottom: 25px;\">\n",
    "</picture>\n",
    "\n",
    "# Deploy a sample app to Vespa Cloud\n",
    "\n",
    "This is the same guide as [getting-started-pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa.html), deploying to Vespa Cloud.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f8c1448",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    Refer to <a href=\"https://pyvespa.readthedocs.io/en/latest/troubleshooting.html\">troubleshooting</a>\n",
    "    for any problem when running this guide.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "148d275b",
   "metadata": {},
   "source": [
    "**Pre-requisite**: Create a tenant at [cloud.vespa.ai](https://cloud.vespa.ai/), save the tenant name.\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vespa-engine/pyvespa/blob/master/docs/sphinx/source/getting-started-pyvespa-cloud.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "366b0d83",
   "metadata": {},
   "source": [
    "## Install\n",
    "\n",
    "Install [pyvespa](https://pyvespa.readthedocs.io/) >= 0.45\n",
    "and the [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).\n",
    "The Vespa CLI is used for data and control plane key management ([Vespa Cloud Security Guide](https://cloud.vespa.ai/en/security/guide)).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "136750de",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install pyvespa vespacli datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02f706ff",
   "metadata": {},
   "source": [
    "## Configure application\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "9ca4da83",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace with your tenant name from the Vespa Cloud Console\n",
    "tenant_name = \"mytenant\"\n",
    "# Replace with your application name (does not need to exist yet)\n",
    "application = \"fasthtml\"\n",
    "# Token id (from Vespa Cloud Console)\n",
    "token_id = \"fasthtmltoken\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db637322",
   "metadata": {},
   "source": [
    "## Create an application package\n",
    "\n",
    "The [application package](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage)\n",
    "has all the Vespa configuration files -\n",
    "create one from scratch:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bd5c2629",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.package import (\n",
    "    ApplicationPackage,\n",
    "    Field,\n",
    "    Schema,\n",
    "    Document,\n",
    "    HNSW,\n",
    "    RankProfile,\n",
    "    Component,\n",
    "    Parameter,\n",
    "    FieldSet,\n",
    "    GlobalPhaseRanking,\n",
    "    Function,\n",
    "    AuthClient,\n",
    ")\n",
    "\n",
    "package = ApplicationPackage(\n",
    "    name=application,\n",
    "    schema=[\n",
    "        Schema(\n",
    "            name=\"doc\",\n",
    "            document=Document(\n",
    "                fields=[\n",
    "                    Field(name=\"id\", type=\"string\", indexing=[\"summary\"]),\n",
    "                    Field(\n",
    "                        name=\"title\",\n",
    "                        type=\"string\",\n",
    "                        indexing=[\"index\", \"summary\"],\n",
    "                        index=\"enable-bm25\",\n",
    "                    ),\n",
    "                    Field(\n",
    "                        name=\"body\",\n",
    "                        type=\"string\",\n",
    "                        indexing=[\"index\", \"summary\"],\n",
    "                        index=\"enable-bm25\",\n",
    "                        bolding=True,\n",
    "                    ),\n",
    "                    Field(\n",
    "                        name=\"embedding\",\n",
    "                        type=\"tensor<float>(x[384])\",\n",
    "                        indexing=[\n",
    "                            'input title . \" \" . input body',\n",
    "                            \"embed\",\n",
    "                            \"index\",\n",
    "                            \"attribute\",\n",
    "                        ],\n",
    "                        ann=HNSW(distance_metric=\"angular\"),\n",
    "                        is_document_field=False,\n",
    "                    ),\n",
    "                ]\n",
    "            ),\n",
    "            fieldsets=[FieldSet(name=\"default\", fields=[\"title\", \"body\"])],\n",
    "            rank_profiles=[\n",
    "                RankProfile(\n",
    "                    name=\"bm25\",\n",
    "                    inputs=[(\"query(q)\", \"tensor<float>(x[384])\")],\n",
    "                    functions=[\n",
    "                        Function(name=\"bm25sum\", expression=\"bm25(title) + bm25(body)\")\n",
    "                    ],\n",
    "                    first_phase=\"bm25sum\",\n",
    "                ),\n",
    "                RankProfile(\n",
    "                    name=\"semantic\",\n",
    "                    inputs=[(\"query(q)\", \"tensor<float>(x[384])\")],\n",
    "                    first_phase=\"closeness(field, embedding)\",\n",
    "                ),\n",
    "                RankProfile(\n",
    "                    name=\"fusion\",\n",
    "                    inherits=\"bm25\",\n",
    "                    inputs=[(\"query(q)\", \"tensor<float>(x[384])\")],\n",
    "                    first_phase=\"closeness(field, embedding)\",\n",
    "                    global_phase=GlobalPhaseRanking(\n",
    "                        expression=\"reciprocal_rank_fusion(bm25sum, closeness(field, embedding))\",\n",
    "                        rerank_count=1000,\n",
    "                    ),\n",
    "                ),\n",
    "            ],\n",
    "        )\n",
    "    ],\n",
    "    components=[\n",
    "        Component(\n",
    "            id=\"e5\",\n",
    "            type=\"hugging-face-embedder\",\n",
    "            parameters=[\n",
    "                Parameter(\n",
    "                    \"transformer-model\",\n",
    "                    {\n",
    "                        \"url\": \"https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx\"\n",
    "                    },\n",
    "                ),\n",
    "                Parameter(\n",
    "                    \"tokenizer-model\",\n",
    "                    {\n",
    "                        \"url\": \"https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json\"\n",
    "                    },\n",
    "                ),\n",
    "            ],\n",
    "        )\n",
    "    ],\n",
    "    auth_clients=[\n",
    "        AuthClient(\n",
    "            id=\"mtls\",\n",
    "            permissions=[\"read\", \"write\"],\n",
    "            parameters=[Parameter(\"certificate\", {\"file\": \"security/clients.pem\"})],\n",
    "        ),\n",
    "        AuthClient(\n",
    "            id=\"token\",\n",
    "            permissions=[\"read\", \"write\"],\n",
    "            parameters=[Parameter(\"token\", {\"id\": token_id})],\n",
    "        ),\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c5e2943",
   "metadata": {},
   "source": [
    "Note that the name cannot have `-` or `_`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "careful-savage",
   "metadata": {},
   "source": [
    "## Deploy to Vespa Cloud\n",
    "\n",
    "The app is now defined and ready to deploy to Vespa Cloud.\n",
    "\n",
    "Deploy `package` to Vespa Cloud, by creating an instance of\n",
    "[VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "canadian-blood",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setting application...\n",
      "Running: vespa config set application scoober.fasthtml\n",
      "Setting target cloud...\n",
      "Running: vespa config set target cloud\n",
      "\n",
      "No api-key found for control plane access. Using access token.\n",
      "Checking for access token in auth.json...\n",
      "Successfully obtained access token for control plane access.\n"
     ]
    }
   ],
   "source": [
    "from vespa.deployment import VespaCloud\n",
    "\n",
    "vespa_cloud = VespaCloud(\n",
    "    tenant=tenant_name,\n",
    "    application=application,\n",
    "    application_package=package,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "197c0a27",
   "metadata": {},
   "source": [
    "The following will upload the application package to Vespa Cloud Dev Zone (`aws-us-east-1c`), read more about [Vespa Zones](https://cloud.vespa.ai/en/reference/zones.html).\n",
    "The Vespa Cloud Dev Zone is considered as a sandbox environment where resources are down-scaled and idle deployments are expired automatically.\n",
    "For information about production deployments, see the following [example](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html#Example:-Deploy-the-app-to-the-prod-environment).\n",
    "\n",
    "> Note: Deployments to dev and perf expire after 7 days of inactivity, i.e., 7 days after running deploy. This applies to all plans, not only the Free Trial. Use the Vespa Console to extend the expiry period, or redeploy the application to add 7 more days.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "752166fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Deployment started in run 13 of dev-aws-us-east-1c for scoober.fasthtml. This may take a few minutes the first time.\n",
      "INFO    [10:44:08]  Deploying platform version 8.397.20 and application dev build 7 for dev-aws-us-east-1c of default ...\n",
      "INFO    [10:44:08]  Using CA signed certificate version 1\n",
      "INFO    [10:44:08]  Using 1 nodes in container cluster 'fasthtml_container'\n",
      "INFO    [10:44:11]  Validating Onnx models memory usage for container cluster 'fasthtml_container', percentage of available memory too low (10 < 15) to avoid restart, consider a flavor with more memory to avoid this\n",
      "INFO    [10:44:13]  Session 4210 for tenant 'scoober' prepared and activated.\n",
      "INFO    [10:44:13]  ######## Details for all nodes ########\n",
      "INFO    [10:44:13]  h94419b.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
      "INFO    [10:44:13]  --- platform vespa/cloud-tenant-rhel8:8.397.20\n",
      "INFO    [10:44:13]  --- storagenode on port 19102 has config generation 4210, wanted is 4210\n",
      "INFO    [10:44:13]  --- searchnode on port 19107 has config generation 4210, wanted is 4210\n",
      "INFO    [10:44:13]  --- distributor on port 19111 has config generation 4210, wanted is 4210\n",
      "INFO    [10:44:13]  --- metricsproxy-container on port 19092 has config generation 4210, wanted is 4210\n",
      "INFO    [10:44:13]  h93281d.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
      "INFO    [10:44:13]  --- platform vespa/cloud-tenant-rhel8:8.397.20\n",
      "INFO    [10:44:13]  --- container-clustercontroller on port 19050 has config generation 4209, wanted is 4210\n",
      "INFO    [10:44:13]  --- metricsproxy-container on port 19092 has config generation 4210, wanted is 4210\n",
      "INFO    [10:44:13]  h93281b.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
      "INFO    [10:44:13]  --- platform vespa/cloud-tenant-rhel8:8.397.20\n",
      "INFO    [10:44:13]  --- logserver-container on port 4080 has config generation 4209, wanted is 4210\n",
      "INFO    [10:44:13]  --- metricsproxy-container on port 19092 has config generation 4210, wanted is 4210\n",
      "INFO    [10:44:13]  h95982a.dev.aws-us-east-1c.vespa-external.aws.oath.cloud: expected to be UP\n",
      "INFO    [10:44:13]  --- platform vespa/cloud-tenant-rhel8:8.397.20\n",
      "INFO    [10:44:13]  --- container on port 4080 has config generation 4209, wanted is 4210\n",
      "INFO    [10:44:13]  --- metricsproxy-container on port 19092 has config generation 4209, wanted is 4210\n",
      "INFO    [10:44:23]  Found endpoints:\n",
      "INFO    [10:44:23]  - dev.aws-us-east-1c\n",
      "INFO    [10:44:23]   |-- https://d14d3ce0.ba4a39d8.z.vespa-app.cloud/ (cluster 'fasthtml_container')\n",
      "INFO    [10:44:23]  Deployment of new application complete!\n",
      "Found mtls endpoint for fasthtml_container\n",
      "URL: https://d14d3ce0.ba4a39d8.z.vespa-app.cloud/\n",
      "Connecting to https://d14d3ce0.ba4a39d8.z.vespa-app.cloud/\n",
      "Using Mutual TLS with key and cert to connect to Vespa endpoint https://d14d3ce0.ba4a39d8.z.vespa-app.cloud/\n",
      "Application is up!\n",
      "Finished deployment.\n"
     ]
    }
   ],
   "source": [
    "app = vespa_cloud.deploy()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aaae2f91",
   "metadata": {},
   "source": [
    "If the deployment failed, it is possible you forgot to add the key in the Vespa Cloud Console in the `vespa auth api-key` step above.\n",
    "\n",
    "If you can authenticate, you should see lines like the following\n",
    "\n",
    "```\n",
    " Deployment started in run 1 of dev-aws-us-east-1c for mytenant.hybridsearch.\n",
    "```\n",
    "\n",
    "The deployment takes a few minutes the first time while Vespa Cloud sets up the resources for your Vespa application\n",
    "\n",
    "`app` now holds a reference to a [Vespa](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa) instance. We can access the\n",
    "mTLS protected endpoint name using the control-plane (vespa_cloud) instance. This endpoint we can query and feed to (data plane access) using the\n",
    "mTLS certificate generated in previous steps.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sealed-mustang",
   "metadata": {},
   "source": [
    "### Feeding documents to Vespa\n",
    "\n",
    "In this example we use the [HF Datasets](https://huggingface.co./docs/datasets/index) library to stream the\n",
    "[BeIR/nfcorpus](https://huggingface.co./datasets/BeIR/nfcorpus) dataset and index in our newly deployed Vespa instance. Read\n",
    "more about the [NFCorpus](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/):\n",
    "\n",
    "> NFCorpus is a full-text English retrieval data set for Medical Information Retrieval.\n",
    "\n",
    "The following uses the [stream](https://huggingface.co./docs/datasets/stream) option of datasets to stream the data without\n",
    "downloading all the contents locally. The `map` functionality allows us to convert the\n",
    "dataset fields into the expected feed format for `pyvespa` which expects a dict with the keys `id` and `fields`:\n",
    "\n",
    "`{ \"id\": \"vespa-document-id\", \"fields\": {\"vespa_field\": \"vespa-field-value\"}}`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9a49fa8e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found token endpoint for fasthtml_container\n",
      "URL: https://d3f601e7.ba4a39d8.z.vespa-app.cloud/\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'https://d3f601e7.ba4a39d8.z.vespa-app.cloud/'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "token_endpoint = vespa_cloud.get_token_endpoint()\n",
    "token_endpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "126c0c29",
   "metadata": {},
   "source": [
    "Add this endpoint to your `.env.example` file:\n",
    "\n",
    "```bash\n",
    "VESPA_APP_URL=https://d3f601e7.ba4a39d8.z.vespa-app.cloud/\n",
    "```\n",
    "\n",
    "Remember to rename the file to `.env`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "775f3dd4",
   "metadata": {},
   "source": [
    "## Feed data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "executed-reservoir",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/thomas/.pyenv/versions/3.9.19/envs/pyvespa-dev/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"BeIR/nfcorpus\", \"corpus\", split=\"corpus\", streaming=True)\n",
    "vespa_feed = dataset.map(\n",
    "    lambda x: {\n",
    "        \"id\": x[\"_id\"],\n",
    "        \"fields\": {\"title\": x[\"title\"], \"body\": x[\"text\"], \"id\": x[\"_id\"]},\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f0ca33f",
   "metadata": {},
   "source": [
    "Now we can feed to Vespa using `feed_iterable` which accepts any `Iterable` and an optional callback function where we can\n",
    "check the outcome of each operation. The application is configured to use [embedding](https://docs.vespa.ai/en/embedding.html)\n",
    "functionality, that produce a vector embedding using a concatenation of the title and the body input fields. This step is resource intensive.\n",
    "\n",
    "Read more about embedding inference in Vespa in the [Accelerating Transformer-based Embedding Retrieval with Vespa](https://blog.vespa.ai/accelerating-transformer-based-embedding-retrieval-with-vespa/)\n",
    "blog post.\n",
    "\n",
    "Default node resources in Vespa Cloud have 2 v-cpu for the Dev Zone.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "bottom-memorabilia",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using mtls_key_cert Authentication against endpoint https://d14d3ce0.ba4a39d8.z.vespa-app.cloud//ApplicationStatus\n"
     ]
    }
   ],
   "source": [
    "from vespa.io import VespaResponse\n",
    "\n",
    "\n",
    "def callback(response: VespaResponse, id: str):\n",
    "    if not response.is_successful():\n",
    "        print(f\"Error when feeding document {id}: {response.get_json()}\")\n",
    "\n",
    "\n",
    "app.feed_iterable(vespa_feed, schema=\"doc\", namespace=\"tutorial\", callback=callback)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "336e339d",
   "metadata": {},
   "source": [
    "### Run a test query\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "11faeacf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'root': {'id': 'toplevel',\n",
       "  'relevance': 1.0,\n",
       "  'fields': {'totalCount': 1387},\n",
       "  'coverage': {'coverage': 100,\n",
       "   'documents': 3633,\n",
       "   'full': True,\n",
       "   'nodes': 1,\n",
       "   'results': 1,\n",
       "   'resultsFull': 1},\n",
       "  'children': [{'id': 'id:tutorial:doc::MED-2464',\n",
       "    'relevance': 0.03200204813108039,\n",
       "    'source': 'fasthtml_content',\n",
       "    'fields': {'sddocname': 'doc',\n",
       "     'body': \"BACKGROUND: In recent decades, children's diet quality has changed <hi>and</hi> <hi>asthma</hi> prevalence has increased, although it remains unclear if these events are associated. OBJECTIVE: To examine children's total <hi>and</hi> component diet quality <hi>and</hi> <hi>asthma</hi> <hi>and</hi> airway hyperresponsiveness (AHR), a proxy for <hi>asthma</hi> severity. METHODS: Food frequency questionnaires adapted from the Nurses' Health Study <hi>and</hi> supplemented with foods whose nutrients which have garnered interest of late in relation to <hi>asthma</hi> were administered. From these data, diet quality scores (total <hi>and</hi> component), based on the Youth Healthy Eating Index (YHEI adapted) were developed. <hi>Asthma</hi> assessments were performed by pediatric allergists <hi>and</hi> classified by atopic status: Allergic <hi>asthma</hi> (≥1 positive skin prick test to common allergens >3 mm compared to negative control) versus non-allergic <hi>asthma</hi> (negative skin prick test). AHR was assessed via the Cockcroft technique. Participants included 270 boys (30% with <hi>asthma</hi>) <hi>and</hi> 206 girls (33% with <hi>asthma</hi>) involved in the 1995 Manitoba Prospective Cohort Study nested case-control study. Logistic regression was used to examine associations between diet quality <hi>and</hi> <hi>asthma</hi>, <hi>and</hi> multinomial logistic regression was used to examine associations between diet quality <hi>and</hi> AHR. RESULTS: Four hundred seventy six children (56.7% boys) were seen at 12.6 ± 0.5 years. <hi>Asthma</hi> <hi>and</hi> AHR prevalence were 26.2 <hi>and</hi> 53.8%, respectively. In fully adjusted models, high <hi>vegetable</hi> intake was protective against allergic <hi>asthma</hi> (OR 0.49; 95% CI 0.29-0.84; P < 0.009) <hi>and</hi> moderate/severe AHR (OR 0.58; 0.37-0.91; P < 0.019). CONCLUSIONS: <hi>Vegetable</hi> intake is inversely associated with allergic <hi>asthma</hi> <hi>and</hi> moderate/severe AHR. Copyright © 2012 Wiley Periodicals, Inc.\",\n",
       "     'documentid': 'id:tutorial:doc::MED-2464',\n",
       "     'id': 'MED-2464',\n",
       "     'title': 'Low vegetable intake is associated with allergic asthma and moderate-to-severe airway hyperresponsiveness.'}},\n",
       "   {'id': 'id:tutorial:doc::MED-2450',\n",
       "    'relevance': 0.03177805800756621,\n",
       "    'source': 'fasthtml_content',\n",
       "    'fields': {'sddocname': 'doc',\n",
       "     'body': \"Background Atopy is not uncommon among children living in rural Crete, but wheeze <hi>and</hi> rhinitis are rare. A study was undertaken to examine whether this discrepancy could be attributed to a high consumption of fresh fruit <hi>and</hi> <hi>vegetables</hi> or adherence to a traditional Mediterranean diet. Methods A cross‐sectional survey was performed in 690 children aged 7–18\\u2005years in rural Crete. Parents completed a questionnaire on their child's respiratory <hi>and</hi> allergic symptoms <hi>and</hi> a 58‐item food frequency questionnaire. Adherence to a Mediterranean diet was measured using a scale with 12 dietary items. Children underwent skin prick tests with 10 common aeroallergens. Results 80% of children ate fresh fruit (<hi>and</hi> 68% <hi>vegetables</hi>) at least twice a day. The intake of grapes, oranges, apples, <hi>and</hi> fresh tomatoes—the main local products in Crete—had no association with atopy but was protective for wheezing <hi>and</hi> rhinitis. A high consumption of nuts was found to be inversely associated with wheezing (OR 0.46; 95% CI 0.20 to 0.98), whereas margarine increased the risk of both wheeze (OR 2.19; 95% CI 1.01 to 4.82) <hi>and</hi> allergic rhinitis (OR 2.10; 95% CI 1.31 to 3.37). A high level of adherence to the Mediterranean diet was protective for allergic rhinitis (OR 0.34; 95% CI 0.18 to 0.64) while a more modest protection was observed for wheezing <hi>and</hi> atopy. Conclusion The results of this study suggest a beneficial effect of commonly consumed <hi>fruits</hi>, <hi>vegetables</hi> <hi>and</hi> nuts, <hi>and</hi> of a high adherence to a traditional Mediterranean diet during childhood on symptoms of <hi>asthma</hi> <hi>and</hi> rhinitis. Diet may explain the relative lack of allergic symptoms in this population.\",\n",
       "     'documentid': 'id:tutorial:doc::MED-2450',\n",
       "     'id': 'MED-2450',\n",
       "     'title': 'Protective effect of fruits, vegetables and the Mediterranean diet on asthma and allergies among children in Crete'}},\n",
       "   {'id': 'id:tutorial:doc::MED-2458',\n",
       "    'relevance': 0.030776515151515152,\n",
       "    'source': 'fasthtml_content',\n",
       "    'fields': {'sddocname': 'doc',\n",
       "     'body': 'BACKGROUND: Antioxidant-rich diets are associated with reduced <hi>asthma</hi> prevalence in epidemiologic studies. We previously showed that short-term manipulation of antioxidant defenses leads to changes in <hi>asthma</hi> outcomes. OBJECTIVE: The objective was to investigate the effects of a high-antioxidant diet compared with those of a low-antioxidant diet, with or without lycopene supplementation, in <hi>asthma</hi>. DESIGN: <hi>Asthmatic</hi> adults (n = 137) were randomly assigned to a high-antioxidant diet (5 servings of <hi>vegetables</hi> <hi>and</hi> 2 servings of fruit daily; n = 46) or a low-antioxidant diet (≤2 servings of <hi>vegetables</hi> <hi>and</hi> 1 serving of fruit daily; n = 91) for 14 d <hi>and</hi> then commenced a parallel, randomized, controlled supplementation trial. Subjects who consumed the high-antioxidant diet received placebo. Subjects who consumed the low-antioxidant diet received placebo or tomato extract (45 mg lycopene/d). The intervention continued until week 14 or until an exacerbation occurred. RESULTS: After 14 d, subjects consuming the low-antioxidant diet had a lower percentage predicted forced expiratory volume in 1 s <hi>and</hi> percentage predicted forced vital capacity than did those consuming the high-antioxidant diet. Subjects in the low-antioxidant diet group had increased plasma C-reactive protein at week 14. At the end of the trial, time to exacerbation was greater in the high-antioxidant than in the low-antioxidant diet group, <hi>and</hi> the low-antioxidant diet group was 2.26 (95% CI: 1.04, 4.91; P = 0.039) times as likely to exacerbate. Of the subjects in the low-antioxidant diet group, no difference in airway or systemic inflammation or clinical outcomes was observed between the groups that consumed the tomato extract <hi>and</hi> those who consumed placebo. CONCLUSIONS: Modifying the dietary intake of carotenoids alters clinical <hi>asthma</hi> outcomes. Improvements were evident only after increased fruit <hi>and</hi> <hi>vegetable</hi> intake, which suggests that whole-food interventions are most effective. This trial was registered at http://www.actr.org.au as ACTRN012606000286549.',\n",
       "     'documentid': 'id:tutorial:doc::MED-2458',\n",
       "     'id': 'MED-2458',\n",
       "     'title': 'Manipulating antioxidant intake in asthma: a randomized controlled trial.'}},\n",
       "   {'id': 'id:tutorial:doc::MED-2461',\n",
       "    'relevance': 0.03055037313432836,\n",
       "    'source': 'fasthtml_content',\n",
       "    'fields': {'sddocname': 'doc',\n",
       "     'body': 'This study aimed to evaluate the association of diet with respiratory symptoms <hi>and</hi> <hi>asthma</hi> in schoolchildren in Taipei, Taiwan. An in-class interview survey elicited experiences of <hi>asthma</hi> <hi>and</hi> respiratory symptoms <hi>and</hi> consumption frequencies of the major food categories in 2290 fifth graders. Respiratory symptoms surveyed included persistent cough, chest tightness, wheezing with cold, wheezing without cold, dyspnea-associated wheezing, <hi>and</hi> exercise-induced cough or wheezing. Results showed that the consumption of sweetened beverages had the strongest association with respiratory symptoms <hi>and</hi> was positively associated with six of the seven respiratory symptoms (all p < 0.05). The adjusted odds ratios (aOR) ranged from 1.05 (95% confidence interval (CI = 1.01-1.09) for exercise-induced cough to 1.09 (95% CI = 1.03-1.16) for wheezing without cold. Egg consumption was associated with 5 of the 7 respiratory symptoms. Consumptions of seafood, soy products, <hi>and</hi> <hi>fruits</hi> were each negatively associated with one of the seven respiratory symptoms (all p < 0.05). Consumption of seafood was negatively associated with physician-diagnosed <hi>asthma</hi> <hi>and</hi> consumptions of sweetened beverages <hi>and</hi> eggs were positively associated with suspected <hi>asthma</hi> (p < 0.05). In conclusion, the study suggests that diet is associated with the respiratory symptoms in schoolchildren in Taipei. Consumptions of sweetened beverages <hi>and</hi> eggs are associated with increased risk of respiratory symptoms <hi>and</hi> <hi>asthma</hi> whereas consumptions of soy products <hi>and</hi> <hi>fruits</hi> are associated with reduced risk of respiratory symptoms.',\n",
       "     'documentid': 'id:tutorial:doc::MED-2461',\n",
       "     'id': 'MED-2461',\n",
       "     'title': 'The association of diet with respiratory symptoms and asthma in schoolchildren in Taipei, Taiwan.'}},\n",
       "   {'id': 'id:tutorial:doc::MED-5072',\n",
       "    'relevance': 0.027757078986587184,\n",
       "    'source': 'fasthtml_content',\n",
       "    'fields': {'sddocname': 'doc',\n",
       "     'body': 'Antioxidant-rich diets are associated with reduced <hi>asthma</hi> prevalence. However, direct evidence that altering intake of antioxidant-rich foods affects <hi>asthma</hi> is lacking. The objective was to investigate changes in <hi>asthma</hi> <hi>and</hi> airway inflammation resulting from a low antioxidant diet <hi>and</hi> subsequent use of lycopene-rich treatments. <hi>Asthmatic</hi> adults (n=32) consumed a low antioxidant diet for 10 days, then commenced a randomized, cross-over trial involving 3 x 7 day treatment arms (placebo, tomato extract (45 mg lycopene/day) <hi>and</hi> tomato juice (45 mg lycopene/day)). With consumption of a low antioxidant diet, plasma carotenoid concentrations decreased, <hi>Asthma</hi> Control Score worsened, %FEV(1) <hi>and</hi> %FVC decreased <hi>and</hi> %sputum neutrophils increased. Treatment with both tomato juice <hi>and</hi> extract reduced airway neutrophil influx. Treatment with tomato extract also reduced sputum neutrophil elastase activity. In conclusion, dietary antioxidant consumption modifies clinical <hi>asthma</hi> outcomes. Changing dietary antioxidant intake may be contributing to rising <hi>asthma</hi> prevalence. Lycopene-rich supplements should be further investigated as a therapeutic intervention.',\n",
       "     'documentid': 'id:tutorial:doc::MED-5072',\n",
       "     'id': 'MED-5072',\n",
       "     'title': 'Lycopene-rich treatments modify noneosinophilic airway inflammation in asthma: proof of concept.'}}]}}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with app.syncio(connections=1) as session:\n",
    "    query = \"How Fruits and Vegetables Can Treat Asthma?\"\n",
    "    response = session.query(\n",
    "        yql=\"select * from sources * where userQuery() or ({targetHits:1000}nearestNeighbor(embedding,q)) limit 5\",\n",
    "        query=query,\n",
    "        ranking=\"fusion\",\n",
    "        body={\"input.query(q)\": f\"embed({query})\"},\n",
    "    )\n",
    "    assert response.is_successful()\n",
    "response.json"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "072a12ac",
   "metadata": {},
   "source": [
    "Now, you should be all set to run your frontend against the Vespa Cloud application.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  },
  "nbsphinx": {
   "allow_errors": true
  },
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}