BitcoinPaper / Raw Data /README.txt
cgrumbach's picture
Upload 59 files
3327b57 verified
# bitcointalk_crawler
---
## DataFrame Columns Description
### 1. `start_edit`
- **Description**: This column represents the date when the post or content was initially created.
- **Type**: Date (format: YYYY-MM-DD)
- **Example**: `2013-11-02`
### 2. `last_edit`
- **Description**: This column represents the last date when the post or content was edited.
- **Type**: Date (format: YYYY-MM-DD)
- **Example**: `2013-11-02`
### 3. `author`
- **Description**: The user who created the post.
- **Type**: String
- **Example**: `guyver`
### 4. `post`
- **Description**: The actual content or message of the post.
- **Type**: String
- **Example**: `before we all get excited about the second batch...`
### 5. `topic`
- **Description**: The topic or title of the thread in which the post was made.
- **Type**: String
- **Example**: `[EU/UK GROUP BUY] Blue Fury USB miner 2.2 ...`
### 6. `attachment`
- **Description**: Indicates whether the post has an attachment or not. A value of `1` means there's an attachment(image or video), and `0` means there isn't. On the website, it uses img tag to show the emoji although that's not an attachment. The column here ignores the emojis, so '1' indicates a true attachment.
- **Type**: Integer (0 or 1)
- **Example**: `0`
- **Note**: The script 'attachment_fix.py' is run subsequent to the crawling process, as the initial values populated in this column post-crawling are not accurate.
### 7. `link`
- **Description**: Indicates whether the post contains a link or not. A value of `1` means there's a link, and `0` means there isn't.
- **Type**: Integer (0 or 1)
- **Example**: `0`
### 8. `original_info`
- **Description**: This column contains raw HTML or metadata related to the post. It may contain styling and layout information.
- **Type**: String (HTML format)
- **Example**: `<td class="td_headerandpost" height="100%" sty...`
### 9. `preprocessed_post`
- **Description**: Preprocessed of `post` column that for analysis or other tasks.
- **Type**: String
- **Example**: `get excited second batch.let us wait first bat...`
---