bintangyosua committed on
Commit
cc4814b
•
1 Parent(s): 65dc541

Added files

Files changed (8)
  1. Dockerfile +16 -16
  2. README.md +13 -13
  3. app.py +329 -470
  4. development.md +8 -8
  5. requirements.txt +15 -5
  6. test.parquet +3 -0
  7. train.parquet +3 -0
  8. val.parquet +3 -0
Dockerfile CHANGED
@@ -1,16 +1,16 @@
1
- FROM python:3.12
2
- COPY --from=ghcr.io/astral-sh/uv:0.4.20 /uv /bin/uv
3
-
4
- RUN useradd -m -u 1000 user
5
- ENV PATH="/home/user/.local/bin:$PATH"
6
- ENV UV_SYSTEM_PYTHON=1
7
-
8
- WORKDIR /app
9
-
10
- COPY --chown=user ./requirements.txt requirements.txt
11
- RUN uv pip install -r requirements.txt
12
-
13
- COPY --chown=user . /app
14
- USER user
15
-
16
- CMD ["marimo", "run", "app.py", "--host", "0.0.0.0", "--port", "7860"]
 
1
+ FROM python:3.12
2
+ COPY --from=ghcr.io/astral-sh/uv:0.4.20 /uv /bin/uv
3
+
4
+ RUN useradd -m -u 1000 user
5
+ ENV PATH="/home/user/.local/bin:$PATH"
6
+ ENV UV_SYSTEM_PYTHON=1
7
+
8
+ WORKDIR /app
9
+
10
+ COPY --chown=user ./requirements.txt requirements.txt
11
+ RUN uv pip install -r requirements.txt
12
+
13
+ COPY --chown=user . /app
14
+ USER user
15
+
16
+ CMD ["marimo", "run", "app.py", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,13 +1,13 @@
1
- ---
2
- title: Political Ideology
3
- emoji: 🍃
4
- colorFrom: indigo
5
- colorTo: purple
6
- sdk: docker
7
- pinned: true
8
- license: mit
9
- short_description: Political Ideology Analysis
10
- ---
11
-
12
- Check out marimo at <https://github.com/marimo-team/marimo>
13
- Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>
 
1
+ ---
2
+ title: marimo app template
3
+ emoji: 🍃
4
+ colorFrom: indigo
5
+ colorTo: purple
6
+ sdk: docker
7
+ pinned: true
8
+ license: mit
9
+ short_description: Template for deploying a marimo application to HF
10
+ ---
11
+
12
+ Check out marimo at <https://github.com/marimo-team/marimo>
13
+ Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>
app.py CHANGED
@@ -1,470 +1,329 @@
1
- import marimo
2
-
3
- __generated_with = "0.9.2"
4
- app = marimo.App()
5
-
6
-
7
- @app.cell
8
- def __():
9
- import marimo as mo
10
-
11
- mo.md("# Welcome to marimo! 🌊🍃")
12
- return (mo,)
13
-
14
-
15
- @app.cell
16
- def __(mo):
17
- slider = mo.ui.slider(1, 22)
18
- return (slider,)
19
-
20
-
21
- @app.cell
22
- def __(mo, slider):
23
- mo.md(
24
- f"""
25
- marimo is a **reactive** Python notebook.
26
-
27
- This means that unlike traditional notebooks, marimo notebooks **run
28
- automatically** when you modify them or
29
- interact with UI elements, like this slider: {slider}.
30
-
31
- {"##" + "🍃" * slider.value}
32
- """
33
- )
34
- return
35
-
36
-
37
- @app.cell(hide_code=True)
38
- def __(mo):
39
- mo.accordion(
40
- {
41
- "Tip: disabling automatic execution": mo.md(
42
- rf"""
43
- marimo lets you disable automatic execution: just go into the
44
- notebook settings and set
45
-
46
- "Runtime > On Cell Change" to "lazy".
47
-
48
- When the runtime is lazy, after running a cell, marimo marks its
49
- descendants as stale instead of automatically running them. The
50
- lazy runtime puts you in control over when cells are run, while
51
- still giving guarantees about the notebook state.
52
- """
53
- )
54
- }
55
- )
56
- return
57
-
58
-
59
- @app.cell(hide_code=True)
60
- def __(mo):
61
- mo.md(
62
- """
63
- Tip: This is a tutorial notebook. You can create your own notebooks
64
- by entering `marimo edit` at the command line.
65
- """
66
- ).callout()
67
- return
68
-
69
-
70
- @app.cell(hide_code=True)
71
- def __(mo):
72
- mo.md(
73
- """
74
- ## 1. Reactive execution
75
-
76
- A marimo notebook is made up of small blocks of Python code called
77
- cells.
78
-
79
- marimo reads your cells and models the dependencies among them: whenever
80
- a cell that defines a global variable is run, marimo
81
- **automatically runs** all cells that reference that variable.
82
-
83
- Reactivity keeps your program state and outputs in sync with your code,
84
- making for a dynamic programming environment that prevents bugs before they
85
- happen.
86
- """
87
- )
88
- return
89
-
90
-
91
- @app.cell(hide_code=True)
92
- def __(changed, mo):
93
- (
94
- mo.md(
95
- f"""
96
- **✨ Nice!** The value of `changed` is now {changed}.
97
-
98
- When you updated the value of the variable `changed`, marimo
99
- **reacted** by running this cell automatically, because this cell
100
- references the global variable `changed`.
101
-
102
- Reactivity ensures that your notebook state is always
103
- consistent, which is crucial for doing good science; it's also what
104
- enables marimo notebooks to double as tools and apps.
105
- """
106
- )
107
- if changed
108
- else mo.md(
109
- """
110
- **🌊 See it in action.** In the next cell, change the value of the
111
- variable `changed` to `True`, then click the run button.
112
- """
113
- )
114
- )
115
- return
116
-
117
-
118
- @app.cell
119
- def __():
120
- changed = False
121
- return (changed,)
122
-
123
-
124
- @app.cell(hide_code=True)
125
- def __(mo):
126
- mo.accordion(
127
- {
128
- "Tip: execution order": (
129
- """
130
- The order of cells on the page has no bearing on
131
- the order in which cells are executed: marimo knows that a cell
132
- reading a variable must run after the cell that defines it. This
133
- frees you to organize your code in the way that makes the most
134
- sense for you.
135
- """
136
- )
137
- }
138
- )
139
- return
140
-
141
-
142
- @app.cell(hide_code=True)
143
- def __(mo):
144
- mo.md(
145
- """
146
- **Global names must be unique.** To enable reactivity, marimo imposes a
147
- constraint on how names appear in cells: no two cells may define the same
148
- variable.
149
- """
150
- )
151
- return
152
-
153
-
154
- @app.cell(hide_code=True)
155
- def __(mo):
156
- mo.accordion(
157
- {
158
- "Tip: encapsulation": (
159
- """
160
- By encapsulating logic in functions, classes, or Python modules,
161
- you can minimize the number of global variables in your notebook.
162
- """
163
- )
164
- }
165
- )
166
- return
167
-
168
-
169
- @app.cell(hide_code=True)
170
- def __(mo):
171
- mo.accordion(
172
- {
173
- "Tip: private variables": (
174
- """
175
- Variables prefixed with an underscore are "private" to a cell, so
176
- they can be defined by multiple cells.
177
- """
178
- )
179
- }
180
- )
181
- return
182
-
183
-
184
- @app.cell(hide_code=True)
185
- def __(mo):
186
- mo.md(
187
- """
188
- ## 2. UI elements
189
-
190
- Cells can output interactive UI elements. Interacting with a UI
191
- element **automatically triggers notebook execution**: when
192
- you interact with a UI element, its value is sent back to Python, and
193
- every cell that references that element is re-run.
194
-
195
- marimo provides a library of UI elements to choose from under
196
- `marimo.ui`.
197
- """
198
- )
199
- return
200
-
201
-
202
- @app.cell
203
- def __(mo):
204
- mo.md("""**🌊 Some UI elements.** Try interacting with the below elements.""")
205
- return
206
-
207
-
208
- @app.cell
209
- def __(mo):
210
- icon = mo.ui.dropdown(["🍃", "🌊", "✨"], value="🍃")
211
- return (icon,)
212
-
213
-
214
- @app.cell
215
- def __(icon, mo):
216
- repetitions = mo.ui.slider(1, 16, label=f"number of {icon.value}: ")
217
- return (repetitions,)
218
-
219
-
220
- @app.cell
221
- def __(icon, repetitions):
222
- icon, repetitions
223
- return
224
-
225
-
226
- @app.cell
227
- def __(icon, mo, repetitions):
228
- mo.md("# " + icon.value * repetitions.value)
229
- return
230
-
231
-
232
- @app.cell(hide_code=True)
233
- def __(mo):
234
- mo.md(
235
- """
236
- ## 3. marimo is just Python
237
-
238
- marimo cells parse Python (and only Python), and marimo notebooks are
239
- stored as pure Python files; outputs are _not_ included. There's no
240
- magical syntax.
241
-
242
- The Python files generated by marimo are:
243
-
244
- - easily versioned with git, yielding minimal diffs
245
- - legible for both humans and machines
246
- - formattable using your tool of choice,
247
- - usable as Python scripts, with UI elements taking their default
248
- values, and
249
- - importable by other modules (more on that in the future).
250
- """
251
- )
252
- return
253
-
254
-
255
- @app.cell(hide_code=True)
256
- def __(mo):
257
- mo.md(
258
- """
259
- ## 4. Running notebooks as apps
260
-
261
- marimo notebooks can double as apps. Click the app window icon in the
262
- bottom-right to see this notebook in "app view."
263
-
264
- Serve a notebook as an app with `marimo run` at the command-line.
265
- Of course, you can use marimo just to level-up your
266
- notebooking, without ever making apps.
267
- """
268
- )
269
- return
270
-
271
-
272
- @app.cell(hide_code=True)
273
- def __(mo):
274
- mo.md(
275
- """
276
- ## 5. The `marimo` command-line tool
277
-
278
- **Creating and editing notebooks.** Use
279
-
280
- ```
281
- marimo edit
282
- ```
283
-
284
- in a terminal to start the marimo notebook server. From here
285
- you can create a new notebook or edit existing ones.
286
-
287
-
288
- **Running as apps.** Use
289
-
290
- ```
291
- marimo run notebook.py
292
- ```
293
-
294
- to start a webserver that serves your notebook as an app in read-only mode,
295
- with code cells hidden.
296
-
297
- **Convert a Jupyter notebook.** Convert a Jupyter notebook to a marimo
298
- notebook using `marimo convert`:
299
-
300
- ```
301
- marimo convert your_notebook.ipynb > your_app.py
302
- ```
303
-
304
- **Tutorials.** marimo comes packaged with tutorials:
305
-
306
- - `dataflow`: more on marimo's automatic execution
307
- - `ui`: how to use UI elements
308
- - `markdown`: how to write markdown, with interpolated values and
309
- LaTeX
310
- - `plots`: how plotting works in marimo
311
- - `sql`: how to use SQL
312
- - `layout`: layout elements in marimo
313
- - `fileformat`: how marimo's file format works
314
- - `markdown-format`: for using `.md` files in marimo
315
- - `for-jupyter-users`: if you are coming from Jupyter
316
-
317
- Start a tutorial with `marimo tutorial`; for example,
318
-
319
- ```
320
- marimo tutorial dataflow
321
- ```
322
-
323
- In addition to tutorials, we have examples in
324
- [our GitHub repo](https://www.github.com/marimo-team/marimo/tree/main/examples).
325
- """
326
- )
327
- return
328
-
329
-
330
- @app.cell(hide_code=True)
331
- def __(mo):
332
- mo.md(
333
- """
334
- ## 6. The marimo editor
335
-
336
- Here are some tips to help you get started with the marimo editor.
337
- """
338
- )
339
- return
340
-
341
-
342
- @app.cell
343
- def __(mo, tips):
344
- mo.accordion(tips)
345
- return
346
-
347
-
348
- @app.cell(hide_code=True)
349
- def __(mo):
350
- mo.md("""## Finally, a fun fact""")
351
- return
352
-
353
-
354
- @app.cell(hide_code=True)
355
- def __(mo):
356
- mo.md(
357
- """
358
- The name "marimo" is a reference to a type of algae that, under
359
- the right conditions, clumps together to form a small sphere
360
- called a "marimo moss ball". Made of just strands of algae, these
361
- beloved assemblages are greater than the sum of their parts.
362
- """
363
- )
364
- return
365
-
366
-
367
- @app.cell(hide_code=True)
368
- def __():
369
- tips = {
370
- "Saving": (
371
- """
372
- **Saving**
373
-
374
- - _Name_ your app using the box at the top of the screen, or
375
- with `Ctrl/Cmd+s`. You can also create a named app at the
376
- command line, e.g., `marimo edit app_name.py`.
377
-
378
- - _Save_ by clicking the save icon on the bottom right, or by
379
- inputting `Ctrl/Cmd+s`. By default marimo is configured
380
- to autosave.
381
- """
382
- ),
383
- "Running": (
384
- """
385
- 1. _Run a cell_ by clicking the play ( ▷ ) button on the top
386
- right of a cell, or by inputting `Ctrl/Cmd+Enter`.
387
-
388
- 2. _Run a stale cell_ by clicking the yellow run button on the
389
- right of the cell, or by inputting `Ctrl/Cmd+Enter`. A cell is
390
- stale when its code has been modified but not run.
391
-
392
- 3. _Run all stale cells_ by clicking the play ( ▷ ) button on
393
- the bottom right of the screen, or input `Ctrl/Cmd+Shift+r`.
394
- """
395
- ),
396
- "Console Output": (
397
- """
398
- Console output (e.g., `print()` statements) is shown below a
399
- cell.
400
- """
401
- ),
402
- "Creating, Moving, and Deleting Cells": (
403
- """
404
- 1. _Create_ a new cell above or below a given one by clicking
405
- the plus button to the left of the cell, which appears on
406
- mouse hover.
407
-
408
- 2. _Move_ a cell up or down by dragging on the handle to the
409
- right of the cell, which appears on mouse hover.
410
-
411
- 3. _Delete_ a cell by clicking the trash bin icon. Bring it
412
- back by clicking the undo button on the bottom right of the
413
- screen, or with `Ctrl/Cmd+Shift+z`.
414
- """
415
- ),
416
- "Disabling Automatic Execution": (
417
- """
418
- Via the notebook settings (gear icon) or footer panel, you
419
- can disable automatic execution. This is helpful when
420
- working with expensive notebooks or notebooks that have
421
- side-effects like database transactions.
422
- """
423
- ),
424
- "Disabling Cells": (
425
- """
426
- You can disable a cell via the cell context menu.
427
- marimo will never run a disabled cell or any cells that depend on it.
428
- This can help prevent accidental execution of expensive computations
429
- when editing a notebook.
430
- """
431
- ),
432
- "Code Folding": (
433
- """
434
- You can collapse or fold the code in a cell by clicking the arrow
435
- icons in the line number column to the left, or by using keyboard
436
- shortcuts.
437
-
438
- Use the command palette (`Ctrl/Cmd+k`) or a keyboard shortcut to
439
- quickly fold or unfold all cells.
440
- """
441
- ),
442
- "Code Formatting": (
443
- """
444
- If you have [ruff](https://github.com/astral-sh/ruff) installed,
445
- you can format a cell with the keyboard shortcut `Ctrl/Cmd+b`.
446
- """
447
- ),
448
- "Command Palette": (
449
- """
450
- Use `Ctrl/Cmd+k` to open the command palette.
451
- """
452
- ),
453
- "Keyboard Shortcuts": (
454
- """
455
- Open the notebook menu (top-right) or input `Ctrl/Cmd+Shift+h` to
456
- view a list of all keyboard shortcuts.
457
- """
458
- ),
459
- "Configuration": (
460
- """
461
- Configure the editor by clicking the gears icon near the top-right
462
- of the screen.
463
- """
464
- ),
465
- }
466
- return (tips,)
467
-
468
-
469
- if __name__ == "__main__":
470
- app.run()
 
1
+ import marimo
2
+
3
+ __generated_with = "0.9.14"
4
+ app = marimo.App(width="full")
5
+
6
+
7
+ @app.cell(hide_code=True)
8
+ def __(mo):
9
+ mo.md(
10
+ """
11
+ # Political Ideologies Analysis
12
+
13
+ This project analyzes political ideologies using the Political Ideologies dataset hosted on Hugging Face. The code uses data science and visualization libraries to map, embed, and visualize political ideology text data.
14
+ Project Structure
15
+
16
+ This analysis is based on a Hugging Face dataset repository. <br>
17
+ You can find it [here](https://huggingface.co/datasets/JyotiNayak/political_ideologies).
18
+ """
19
+ )
20
+ return
21
+
22
+
23
+ @app.cell(hide_code=True)
24
+ def __():
25
+ import marimo as mo
26
+ import pandas as pd
27
+ import numpy as np
28
+
29
+ import matplotlib.pyplot as plt
30
+ import seaborn as sns
31
+ import altair as alt
32
+
33
+ from gensim.models import Word2Vec
34
+ from sklearn.manifold import TSNE
35
+
36
+ mo.md("""
37
+ ## 1. Import all libraries needed
38
+
39
+ The initial cells import the necessary libraries for data handling, visualization, and word embedding.
40
+ """)
41
+ return TSNE, Word2Vec, alt, mo, np, pd, plt, sns
42
+
43
+
44
+ @app.cell(hide_code=True)
45
+ def __(mo):
46
+ mo.md(
47
+ """
48
+ Here are the mappings for the label and issue type columns.
49
+
50
+ ```yaml
51
+ Label Mapping: {'conservative': 0, 'liberal': 1 }
52
+ Issue Type Mapping: {
53
+ 'economic': 0, 'environmental': 1,
54
+ 'family/gender': 2, 'geo-political and foreign policy': 3,
55
+ 'political': 4, 'racial justice and immigration': 5,
56
+ 'religious': 6, 'social, health and education': 7
57
+ }
58
+ ```
59
+ """
60
+ )
61
+ return
62
+
63
+
64
+ @app.cell(hide_code=True)
65
+ def __(mo, pd):
66
+ df = pd.concat(
67
+ [pd.read_parquet(f'{name}.parquet') for name in ['train', 'val', 'test']],
68
+ axis=0,
69
+ )
70
+
71
+ df = df.drop('__index_level_0__', axis=1)
72
+
73
+ mo.md("""
74
+ ## 2. Dataset Loading
75
+
76
+ The dataset files (`train.parquet`, `val.parquet`, and `test.parquet`) are loaded, concatenated, and cleaned to form a single DataFrame (df). Columns are mapped to readable labels for ease of understanding.
77
+ """)
78
+ return (df,)
79
+
80
+
81
+ @app.cell(hide_code=True)
82
+ def __():
83
+ label_mapping = {
84
+ 'conservative': 0,
85
+ 'liberal': 1
86
+ }
87
+
88
+ issue_type_mapping = {
89
+ 'economic': 0,
90
+ 'environmental': 1,
91
+ 'family/gender': 2,
92
+ 'geo-political and foreign policy': 3,
93
+ 'political': 4,
94
+ 'racial justice and immigration': 5,
95
+ 'religious': 6,
96
+ 'social, health and education': 7
97
+ }
98
+ return issue_type_mapping, label_mapping
99
+
100
+
101
+ @app.cell(hide_code=True)
102
+ def __(issue_type_mapping, label_mapping):
103
+ label_mapping_reversed = {v: k for k, v in label_mapping.items()}
104
+ issue_type_mapping_reversed = {v: k for k, v in issue_type_mapping.items()}
105
+
106
+ print(label_mapping_reversed)
107
+ print(issue_type_mapping_reversed)
108
+ return issue_type_mapping_reversed, label_mapping_reversed
109
+
110
+
111
+ @app.cell(hide_code=True)
112
+ def __(df, issue_type_mapping_reversed, label_mapping_reversed, mo):
113
+ df['label_text'] = df['label'].replace(label_mapping_reversed)
114
+ df['issue_type_text'] = df['issue_type'].replace(issue_type_mapping_reversed)
115
+
116
+ labels_grouped = df['label_text'].value_counts().rename_axis('label_text').reset_index(name='counts')
117
+ issue_types_grouped = (
118
+ df["issue_type_text"]
119
+ .value_counts()
120
+ .rename_axis("issue_type_text")
121
+ .reset_index(name="counts")
122
+ )
123
+
124
+ mo.md("""
125
+ ## 3. Mapping Labels and Issue Types
126
+
127
+ Two dictionaries map labels (conservative and liberal) and issue types (e.g., economic, environmental, etc.) to numerical values for machine learning purposes. Reversed mappings are created to convert numerical labels back into their text form.
128
+ """)
129
+ return issue_types_grouped, labels_grouped
130
+
131
+
132
+ @app.cell(hide_code=True)
133
+ def __(df):
134
+ df.iloc[:, :6].head(7)
135
+ return
136
+
137
+
138
+ @app.cell(hide_code=True)
139
+ def __(mo):
140
+ mo.md(
141
+ """
142
+ ## 4. Visualizing Data Distributions
143
+
144
+ Bar plots visualize the proportions of conservative vs. liberal ideologies and the count of different issue types. These provide an overview of the dataset composition.
145
+ """
146
+ )
147
+ return
148
+
149
+
150
+ @app.cell(hide_code=True)
151
+ def __(alt, labels_grouped, mo):
152
+ mo.ui.altair_chart(
153
+ alt.Chart(labels_grouped).mark_bar(
154
+ fill='#4C78A8',
155
+ cursor='pointer',
156
+ ).encode(
157
+ x=alt.X('label_text', axis=alt.Axis(labelAngle=0)),
158
+ y='counts:Q'
159
+ )
160
+ )
161
+ return
162
+
163
+
164
+ @app.cell(hide_code=True)
165
+ def __(alt, issue_types_grouped, mo):
166
+ mo.ui.altair_chart(
167
+ alt.Chart(issue_types_grouped)
168
+ .mark_bar(
169
+ fill="#4C78A8",
170
+ cursor="pointer",
171
+ )
172
+ .encode(
173
+ x=alt.X(
174
+ "issue_type_text:O",
175
+ axis=alt.Axis(
176
+ labelAngle=-10, labelAlign="center", labelPadding=10
177
+ ),
178
+ ),
179
+ y="counts:Q",
180
+ )
181
+ )
182
+ return
183
+
184
+
185
+ @app.cell(hide_code=True)
186
+ def __(mo):
187
+ mo.md(
188
+ """
189
+ ## 5. Word Embedding with Word2Vec
190
+
191
+ Using Word2Vec, word embeddings are created from the text statements in the dataset. The model trains on tokenized sentences, generating a 100-dimensional embedding for each word. The word vectors of each statement are then averaged to form a document-level embedding.
192
+ """
193
+ )
194
+ return
195
+
196
+
197
+ @app.cell(hide_code=True)
198
+ def __(Word2Vec, df):
199
+ df['tokens'] = df['statement'].apply(lambda x: x.lower().split())
200
+ word2vec_model = Word2Vec(sentences=df['tokens'], vector_size=100, window=5, min_count=1, seed=0)
201
+ return (word2vec_model,)
202
+
203
+
204
+ @app.cell(hide_code=True)
205
+ def __(np, word2vec_model):
206
+ def get_doc_embedding(tokens):
207
+ vectors = [word2vec_model.wv[word] for word in tokens if word in word2vec_model.wv]
208
+ if vectors:
209
+ return np.mean(vectors, axis=0)
210
+ else:
211
+ return np.zeros(word2vec_model.vector_size)
212
+ return (get_doc_embedding,)
213
+
214
+
215
+ @app.cell(hide_code=True)
216
+ def __(df, get_doc_embedding, np):
217
+ df['embedding'] = df['tokens'].apply(get_doc_embedding)
218
+ embeddings_matrix = np.vstack(df['embedding'].values)
219
+ return (embeddings_matrix,)
220
+
221
+
222
+ @app.cell(hide_code=True)
223
+ def __(mo):
224
+ mo.md(
225
+ """
226
+ ## 6. Dimensionality Reduction with TSNE
227
+
228
+ Embeddings are projected into a 2D space using TSNE for visualization. The embeddings are colored by issue type, showing clusters of similar statements.
229
+ """
230
+ )
231
+ return
232
+
233
+
234
+ @app.cell(hide_code=True)
235
+ def __(TSNE, alt, df, embeddings_matrix, plt, sns):
236
+ tsne = TSNE(n_components=2, random_state=0)
237
+ tsne_results = tsne.fit_transform(embeddings_matrix)
238
+ df['x'] = tsne_results[:, 0]
239
+ df['y'] = tsne_results[:, 1]
240
+
241
+ # Brush for selection
242
+ brush = alt.selection_interval()
243
+ size = 350
244
+
245
+ plt.figure(figsize=(10, 6))
246
+ sns.scatterplot(data=df, x='x', y='y', hue='issue_type_text', palette='Set1', s=100)
247
+ plt.title("2D Visualization of Text Data by Issue Type (Word2Vec Embeddings)")
248
+ plt.xlabel("t-SNE Dimension 1")
249
+ plt.ylabel("t-SNE Dimension 2")
250
+ plt.legend(title='Issue Type')
251
+ plt.show()
252
+ return brush, size, tsne, tsne_results
253
+
254
+
255
+ @app.cell(hide_code=True)
256
+ def __(mo):
257
+ mo.md(
258
+ """
259
+ ## 7. Interactive Visualizations
260
+
261
+ Interactive scatter plots in Altair show ideology and issue types in 2D space. A brush selection tool allows users to explore specific points and view tooltip information.
262
+
263
+ ### Combined Scatter Plot
264
+
265
+ Combines the two scatter plots into a side-by-side visualization for direct comparison of ideologies vs. issue types.
266
+ ### Running the Code
267
+
268
+ Run the notebook as an app with `marimo run app.py`. It can also be run as a standalone Python script with `python app.py`.
269
+ """
270
+ )
271
+ return
272
+
273
+
274
+ @app.cell(hide_code=True)
275
+ def __(alt, brush, df, mo, size):
276
+ points1 = alt.Chart(df, height=size, width=size).mark_point().encode(
277
+ x='x:Q',
278
+ y='y:Q',
279
+ color=alt.condition(brush, 'label_text', alt.value('grey')),
280
+ tooltip=['x:Q', 'y:Q', 'statement:N', 'label_text:N']
281
+ ).add_params(brush).properties(title='By Political Ideologies')
282
+
283
+ scatter_chart1 = mo.ui.altair_chart(points1)
284
+
285
+ points2 = alt.Chart(df, height=size, width=size).mark_point().encode(
286
+ x='x:Q',
287
+ y='y:Q',
288
+ color=alt.condition(brush, 'issue_type_text', alt.value('grey')),
289
+ tooltip=['x:Q', 'y:Q', 'statement:N', 'issue_type_text:N']
290
+ ).add_params(brush).properties(title='By Issue Types')
291
+
292
+ scatter_chart2 = mo.ui.altair_chart(points2)
293
+
294
+ combined_chart = (scatter_chart1 | scatter_chart2)
295
+ combined_chart
296
+ return combined_chart, points1, points2, scatter_chart1, scatter_chart2
297
+
298
+
299
+ @app.cell(hide_code=True)
300
+ def __(combined_chart):
301
+ combined_chart.value[['statement', 'label_text', 'issue_type_text']]
302
+ return
303
+
304
+
305
+ @app.cell(hide_code=True)
306
+ def __(combined_chart):
307
+ combined_chart.value['statement']
308
+ return
309
+
310
+
311
+ @app.cell(hide_code=True)
312
+ def __(mo):
313
+ mo.md(
314
+ r"""
315
+ ## Data Insights
316
+
317
+ - Ideology Distribution: Visualizes proportions of conservative and liberal ideologies.
318
+ - Issue Types: Bar plot reveals the diversity and frequency of issue types in the dataset.
319
+ - Word Embeddings: Using TSNE for 2D projections helps identify clusters in political statements.
320
+ - Interactive Exploration: Offers detailed, interactive views on ideology vs. issue type distribution.
321
+
322
+ This code provides a thorough analysis pipeline, from data loading to interactive visualizations, enabling an in-depth exploration of political ideologies.
323
+ """
324
+ )
325
+ return
326
+
327
+
328
+ if __name__ == "__main__":
329
+ app.run()
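The `get_doc_embedding` cell in the new `app.py` averages the Word2Vec vectors of a statement's tokens and falls back to a zero vector when none of the tokens is in the vocabulary. The same idea can be sketched without gensim or NumPy. This is only an illustration: `toy_vectors`, `VECTOR_SIZE`, and `doc_embedding` are hypothetical names, and the 3-dimensional vectors stand in for the 100-dimensional ones the notebook trains.

```python
from statistics import fmean

# Hypothetical 3-dimensional word vectors; the notebook instead looks words up
# in a trained gensim Word2Vec model (word2vec_model.wv).
toy_vectors = {
    "tax": [1.0, 0.0, 2.0],
    "cuts": [3.0, 2.0, 0.0],
}
VECTOR_SIZE = 3


def doc_embedding(tokens, vectors=toy_vectors, size=VECTOR_SIZE):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * size
    # zip(*known) groups the values of each dimension across all known tokens.
    return [fmean(dim) for dim in zip(*known)]


print(doc_embedding("Tax cuts now".lower().split()))  # [2.0, 1.0, 1.0]
print(doc_embedding(["unseen"]))                      # [0.0, 0.0, 0.0]
```

Because the notebook trains with `min_count=1` on the same statements it later embeds, every token should be in the vocabulary, so the zero-vector branch would mainly matter for empty statements.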
development.md CHANGED
@@ -1,8 +1,8 @@
1
- # Development
2
-
3
- ## Testing your Dockerfile locally
4
-
5
- ```bash
6
- docker build -t marimo-app .
7
- docker run -it --rm -p 7860:7860 marimo-app
8
- ```
 
1
+ # Development
2
+
3
+ ## Testing your Dockerfile locally
4
+
5
+ ```bash
6
+ docker build -t marimo-app .
7
+ docker run -it --rm -p 7860:7860 marimo-app
8
+ ```
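Once the container started by the `docker run` command above is up, one quick way to confirm that the app is listening on port 7860 is a plain TCP connection check. A minimal sketch, assuming the port is published on localhost as in that command; `is_port_open` is an illustrative helper, not part of marimo or Docker:

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print(is_port_open())
```

A successful connection only shows that something is listening on the port; it does not confirm that the notebook itself rendered correctly.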
requirements.txt CHANGED
@@ -1,5 +1,15 @@
1
- marimo
2
- # Or a specific version
3
- # marimo>=0.9.0
4
-
5
- # Add other dependencies as needed
 
1
+ marimo
2
+ pandas
3
+ numpy
4
+
5
+ matplotlib
6
+ seaborn
7
+ altair
8
+
9
+ gensim
10
+ scikit-learn
11
+
12
+ # Or a specific version
13
+ # marimo>=0.9.0
14
+
15
+ # Add other dependencies as needed
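The names in `requirements.txt` are distribution names, which do not always match the module names imported in `app.py` (for example, `scikit-learn` is imported as `sklearn`). The sketch below reports which requirements cannot be imported in the current environment. It assumes a hand-written mapping, `IMPORT_NAMES`, which is hypothetical and not part of this commit, and it handles only bare names, not version specifiers such as `marimo>=0.9.0`.

```python
import importlib.util

# Hypothetical mapping from distribution name to top-level import name.
IMPORT_NAMES = {"scikit-learn": "sklearn"}


def parse_requirements(text):
    """Return package names, skipping blank lines and comments."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names


def missing_modules(names):
    """Return the requirements whose import name cannot be found."""
    return [
        n for n in names
        if importlib.util.find_spec(IMPORT_NAMES.get(n, n)) is None
    ]


requirements = "marimo\npandas\n\n# marimo>=0.9.0\nscikit-learn\n"
print(parse_requirements(requirements))
```

Running `missing_modules(parse_requirements(...))` inside the built image would list any requirement that did not install or whose name does not match an importable module.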
test.parquet ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:456a6384233fbbabd92593dc17ff9b5aec305a51a63aea36c621c0142c2d0ac3
3
+ size 71633
train.parquet ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a81fd1847b0a8b57e908bf8e03bc0e020c8f876aedbed45126a17af89adae18e
3
+ size 552587
val.parquet ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:20dc1daddb20ac5749d4984e9685bbc9e96e1d5d44afec5a144e3acfcf5d7f9e
3
+ size 75360
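The three `.parquet` entries above are Git LFS pointer files rather than the data itself: each records the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch, assuming the pointer text is available as a string, of parsing a pointer and checking downloaded bytes against it; `parse_lfs_pointer` and `matches_pointer` are illustrative helper names, not part of Git LFS, and the check assumes the `sha256` algorithm used in this commit:

```python
import hashlib


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest in the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents for val.parquet, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:20dc1daddb20ac5749d4984e9685bbc9e96e1d5d44afec5a144e3acfcf5d7f9e\n"
    "size 75360\n"
)
print(pointer["algo"], pointer["size"])
```

If `pd.read_parquet` fails on one of these files, a check like this can help tell whether the working copy holds the real data or only the pointer text.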