Update README.md
README.md CHANGED
@@ -114,7 +114,14 @@ The model was trained on [Pre-Training Dataset](https://huggingface.co/datasets/
 
 #### Preprocessing
 
-
+---
+
+**Preprocessing of Large-Scale Image Data for Photorealism Enhancement**
+
+This section details our methodology for preprocessing a large-scale dataset of approximately 117 million game-rendered frames from 9 AAA video games and 1.24 billion real-world images from Mapillary Vistas and Cityscapes, all at 4K resolution. The goal is to pair each game frame with the real images that exhibit the highest cosine similarity in structural and visual features, ensuring alignment of fine details such as object positions.
+
+Images and their corresponding style semantic maps were resized to **512 x 512** pixels and converted to **24-bit** depth (3 channels) where they exceeded it. We employ a novel **feature-mapped channel-split PSNR matching** approach that combines **EfficientNet** feature extraction, channel splitting, and dual-metric computation of PSNR and cosine similarity, with **Locality-Sensitive Hashing** (LSH) used to efficiently identify the **top-10 nearest neighbors** for each frame. This produced **1.17 billion** frame-image pairs and **12.4 billion** image-frame pairs. The final selection step assesses similarity consistency across channels to ensure accurate pairings. This scalable preprocessing pipeline enables efficient pairing while preserving critical visual details, laying the foundation for subsequent **contrastive learning** to enhance **photorealism in game-rendered frames**.
 
 #### Training Hyperparameters
 
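
For readers who want to see the pairing step in code, below is a minimal, self-contained sketch of the channel-split PSNR/cosine scoring described above. It is an illustration, not the released pipeline: torchvision's `efficientnet_b0` stands in for the exact EfficientNet variant, the file names and acceptance thresholds are hypothetical, and the LSH candidate-retrieval stage is omitted (only the scoring of a single game-frame/real-image pair is shown).

```python
# Minimal sketch of channel-split feature matching (illustrative only).
# Assumptions: torchvision's efficientnet_b0 as the feature extractor,
# hypothetical file names, and an arbitrary acceptance rule.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained backbone used purely as a feature extractor (classifier ignored).
backbone = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
backbone.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),   # resize to 512 x 512
    transforms.ToTensor(),           # 3-channel float tensor in [0, 1]
])                                   # ImageNet normalization omitted for brevity

def load_image(path: str) -> torch.Tensor:
    """Load an image, force 24-bit depth (3 channels), resize to 512 x 512."""
    img = Image.open(path).convert("RGB")   # drops alpha / higher bit depths
    return preprocess(img).unsqueeze(0).to(device)

@torch.no_grad()
def feature_map(x: torch.Tensor) -> torch.Tensor:
    """EfficientNet feature maps of shape [1, C, H, W]."""
    return backbone.features(x)

def channel_split_scores(fa: torch.Tensor, fb: torch.Tensor):
    """Per-channel PSNR and cosine similarity between two feature maps."""
    a, b = fa.flatten(2), fb.flatten(2)             # [1, C, H*W]
    mse = ((a - b) ** 2).mean(dim=2)                # [1, C]
    peak = torch.maximum(a.amax(dim=2), b.amax(dim=2)).clamp(min=1e-8)
    psnr = 10.0 * torch.log10(peak ** 2 / mse.clamp(min=1e-12))
    cos = F.cosine_similarity(a, b, dim=2)          # [1, C]
    return psnr.squeeze(0), cos.squeeze(0)

# Hypothetical file names, for illustration only.
game_feats = feature_map(load_image("game_frame_000001.png"))
real_feats = feature_map(load_image("real_image_000001.jpg"))
psnr_c, cos_c = channel_split_scores(game_feats, real_feats)

# Similarity-consistency check across channels; this acceptance rule is an
# assumption, not the recipe used for the released dataset.
keep = cos_c.mean().item() > 0.8 and cos_c.std().item() < 0.1
print(f"mean cosine={cos_c.mean().item():.3f}  "
      f"mean PSNR={psnr_c.mean().item():.1f} dB  keep={keep}")
```

In the full pipeline described above, an LSH index over pooled feature vectors would first shortlist roughly the top-10 candidate real images per game frame; per-channel scores like these would then be checked for consistency to accept or reject each candidate pair.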