graphchem.preprocessing.Tokenizer
Bases: object
Source code in graphchem/preprocessing/features.py
vocab_size: int
property
Returns the total number of unique atom/bond strings in the tokenizer's vocabulary.
Returns:
Name | Type | Description |
---|---|---|
int | int | number of strings in vocabulary |
__call__(item)
Returns the integer value of an atom/bond string. If the string is not in the vocabulary, returns the 'unknown' value, 1. If the tokenizer is being trained, the item is added to the vocabulary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
item | str | atom/bond string | required |
Returns:
Name | Type | Description |
---|---|---|
int | int | integer value of atom/bond string |
Source code in graphchem/preprocessing/features.py
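The lookup behavior described for `__call__` and `vocab_size` can be illustrated with a minimal sketch. This is not graphchem's source: `SketchTokenizer`, the `train` flag, the `_vocab` dictionary, and the reserved `"unk"` entry are all hypothetical names chosen for illustration. Only the documented behavior (unknown strings map to 1, training adds items to the vocabulary) comes from the reference above.

```python
class SketchTokenizer:
    """Illustrative stand-in; NOT graphchem's actual implementation."""

    UNKNOWN_ID = 1  # documented: unknown strings are returned as 1

    def __init__(self):
        # Assumption: a reserved "unk" entry holds the unknown ID.
        self._vocab = {"unk": self.UNKNOWN_ID}
        # Assumption: a boolean attribute switches training on and off.
        self.train = True

    @property
    def vocab_size(self) -> int:
        # Assumption: the reserved "unk" entry is counted in the size.
        return len(self._vocab)

    def __call__(self, item: str) -> int:
        # Known string: return its stored integer value.
        if item in self._vocab:
            return self._vocab[item]
        # Unseen string while training: assign the next unused integer.
        if self.train:
            self._vocab[item] = len(self._vocab) + 1
            return self._vocab[item]
        # Unseen string outside training: fall back to the unknown ID.
        return self.UNKNOWN_ID


tok = SketchTokenizer()
carbon_id = tok("C")         # new string, added during training
oxygen_id = tok("O")         # second new string, gets a different integer
repeat_id = tok("C")         # already in the vocabulary, same integer as before

tok.train = False            # stop growing the vocabulary
unseen_id = tok("N")         # not in the vocabulary, so the unknown ID is returned
size = tok.vocab_size        # "unk", "C", and "O"
```

Freezing the vocabulary after training keeps integer IDs stable, so strings first seen at inference time share one unknown ID instead of shifting the index space.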
__init__()
Tokenizer object: integer tokenizer for unique atom/bond strings
Source code in graphchem/preprocessing/features.py