# bitcointalk_crawler

---

## DataFrame Columns Description

### 1. `start_edit`
- **Description**: The date on which the post was originally created.
- **Type**: Date (format: YYYY-MM-DD)
- **Example**: `2013-11-02`

### 2. `last_edit`
- **Description**: The date on which the post was last edited.
- **Type**: Date (format: YYYY-MM-DD)
- **Example**: `2013-11-02`

### 3. `author`
- **Description**: The user who created the post.
- **Type**: String
- **Example**: `guyver`

### 4. `post`
- **Description**: The text content of the post.
- **Type**: String
- **Example**: `before we all get excited about the second batch...`

### 5. `topic`
- **Description**: The title of the thread in which the post was made.
- **Type**: String
- **Example**: `[EU/UK GROUP BUY] Blue Fury USB miner 2.2 ...`

### 6. `attachment`
- **Description**: Indicates whether the post has an attachment. A value of `1` means the post has an attachment (image or video), and `0` means it does not. The website renders emojis with `img` tags even though they are not attachments; this column ignores emojis, so `1` indicates a true attachment.
- **Type**: Integer (0 or 1)
- **Example**: `0`
- **Note**: The script `attachment_fix.py` is run after crawling, because the values the crawler initially writes to this column are not accurate.

### 7. `link`
- **Description**: Indicates whether the post contains a link. A value of `1` means the post contains a link, and `0` means it does not.
- **Type**: Integer (0 or 1)
- **Example**: `0`

### 8. `original_info`
- **Description**: The raw HTML or metadata of the post. It may include styling and layout information.
- **Type**: String (HTML format)
- **Example**: `<td class="td_headerandpost" height="100%" sty...`

### 9. `preprocessed_post`
- **Description**: A preprocessed version of the `post` column, intended for analysis or other downstream tasks.
- **Type**: String
- **Example**: `get excited second batch.let us wait first bat...`

---
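
The column descriptions above can be turned into a quick sanity check on crawled data. The following is a minimal sketch, not part of the crawler itself: `validate_row`, the `*_COLUMNS` constants, and `sample_row` are hypothetical names introduced here for illustration. It assumes each row is available as a plain dict (for example, one element of pandas' `df.to_dict("records")`) and that the two date columns are stored as `YYYY-MM-DD` strings rather than parsed datetime objects.

```python
from datetime import datetime

# Column groups taken from the descriptions above (constant names are hypothetical).
DATE_COLUMNS = ("start_edit", "last_edit")
STRING_COLUMNS = ("author", "post", "topic", "original_info", "preprocessed_post")
FLAG_COLUMNS = ("attachment", "link")


def validate_row(row):
    """Return a list of problems found in one row; an empty list means none were found."""
    problems = []

    # Every documented column should be present.
    for col in DATE_COLUMNS + STRING_COLUMNS + FLAG_COLUMNS:
        if col not in row:
            problems.append(f"missing column: {col}")

    # Date columns are assumed to be strings that parse as YYYY-MM-DD.
    for col in DATE_COLUMNS:
        if col in row:
            try:
                datetime.strptime(str(row[col]), "%Y-%m-%d")
            except ValueError:
                problems.append(f"{col} is not a YYYY-MM-DD date: {row[col]!r}")

    # Text columns should hold strings.
    for col in STRING_COLUMNS:
        if col in row and not isinstance(row[col], str):
            problems.append(f"{col} is not a string: {row[col]!r}")

    # Flag columns should be 0 or 1.
    for col in FLAG_COLUMNS:
        if col in row and row[col] not in (0, 1):
            problems.append(f"{col} is not 0 or 1: {row[col]!r}")

    return problems


# A row built from the example values listed in the column descriptions.
sample_row = {
    "start_edit": "2013-11-02",
    "last_edit": "2013-11-02",
    "author": "guyver",
    "post": "before we all get excited about the second batch...",
    "topic": "[EU/UK GROUP BUY] Blue Fury USB miner 2.2 ...",
    "attachment": 0,
    "link": 0,
    "original_info": '<td class="td_headerandpost" height="100%" sty...',
    "preprocessed_post": "get excited second batch.let us wait first bat...",
}

print(validate_row(sample_row))  # → []
```

Running a check like this after `attachment_fix.py` would flag rows whose `attachment` or `link` values fall outside 0 and 1, or whose dates do not match the documented format. It does not verify that the flags are correct for the post content, only that they have the documented type.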