machine learning - SVM - Passing a string to the CountVectorizer in Python vectorizes each character?
I have a working SVM, and CountVectorizer works fine when the input to the transform function is a list of strings. However, if I pass a single string to it, the vectorizer iterates through each character in the string and vectorizes each one, even though I set the analyzer parameter to word when constructing the CountVectorizer.
    for x in range(0, 3):
        test = raw_input("type message classify: ")
        v = vectorizer.transform(test).toarray()
        print(v)
        print(len(v))
        print(svm.predict(vectorizer.transform(test).toarray()))
I'm able to fix the issue by changing the second line in the above code to:
    test = [raw_input("type message classify: ")]
But it seems strange to have a 1-item list. Isn't there a better way to do this without constructing a list?
It expects a list or array of documents, so when you pass in a single string it assumes each element of that string is a document (i.e. a character).
Try changing
    svm.predict(vectorizer.transform(test).toarray())
to
    svm.predict(vectorizer.transform([test]).toarray())
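To see why the wrapping is needed, here is a minimal sketch in plain Python with no scikit-learn involved. It only illustrates that iterating over a bare string yields characters, while iterating over a one-item list yields the whole message. The helper name as_documents is hypothetical (it is not part of scikit-learn), and the str check assumes Python 3; under Python 2, as in the question, basestring would be the analogous check.

```python
def as_documents(text_or_docs):
    """Return a list of documents.

    Hypothetical helper: a bare string is wrapped in a one-item list,
    any other iterable of strings is copied into a list unchanged.
    """
    if isinstance(text_or_docs, str):
        return [text_or_docs]
    return list(text_or_docs)


message = "free prize"

# Iterating a bare string yields one "document" per character.
chars_as_docs = list(message)

# Wrapping it first yields a single document containing the whole message.
docs = as_documents(message)

print(len(chars_as_docs), len(docs))  # prints "10 1"
```

With a helper like this, the call in the loop could be written as vectorizer.transform(as_documents(test)), which is the same one-item list as before, just built in one place.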
PS: The toarray() part is not going to scale if you use a real-world corpus. The SVMs in sklearn can operate on sparse matrices, so I'd drop that part altogether.
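As a rough sketch of the sparse route, the snippet below fits and predicts without ever calling toarray(). The training messages, the labels, and the choice of SVC with a linear kernel are all made up for illustration; the question does not show how its vectorizer or svm were built.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy corpus and labels, assumed purely for illustration (1 = spam, 0 = not spam).
train_docs = [
    "win a free prize now",
    "meeting at noon today",
    "free money win big",
    "lunch at noon tomorrow",
]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer(analyzer="word")
X = vectorizer.fit_transform(train_docs)  # SciPy sparse matrix, one row per document

svm = SVC(kernel="linear")
svm.fit(X, labels)  # fitted directly on the sparse matrix

# Note the one-item list around the message, and no toarray() call.
pred = svm.predict(vectorizer.transform(["free prize"]))
print(issparse(X), pred.shape)
```

Keeping the matrix sparse matters because a real vocabulary can run to tens of thousands of columns, almost all of them zero for any single message.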