How do I get a subjectivity score for a text in NLTK?

I need a method in NLTK that computes a subjectivity score (a real number) for a text. Does NLTK have such a method?

some_magic_method(my_text):
    ...

# 0.34

Solution:

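A quick Google search turns up https://www.nltk.org/api/nltk.sentiment.html, which includes a subjectivity predictor. That predictor is framed in the context of sentiment analysis. If you want something independent of sentiment, look at the Pang and Lee 2004 dataset: with a simple SVM on count-vectorized features, I reached 90% accuracy on it. Here is the code that defines the class (from my GitHub); I can share more if you want the full code.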
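To get a real-valued score instead of a hard label, the usual approach is to train a probabilistic classifier on labelled subjective/objective sentences and read off the probability of the subjective class. In NLTK, the Pang and Lee 2004 sentences ship as `nltk.corpus.subjectivity` (categories `'subj'` and `'obj'`, after `nltk.download('subjectivity')`), and `NaiveBayesClassifier.prob_classify(features).prob('subj')` returns such a probability. The sketch below spells the same idea out in plain Python with a multinomial Naive Bayes, so the scoring step is visible. It is not NLTK's implementation; the class name `NaiveBayesSubjectivity` and the six toy training sentences are hypothetical stand-ins for the real corpus.

```python
import math
from collections import Counter


class NaiveBayesSubjectivity:
    """Hypothetical multinomial Naive Bayes returning P(label | sentence).

    Illustrates the "train a classifier, then read a class probability"
    idea; it is a plain-Python sketch, not NLTK's own classifier.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training sentences per class
        self.word_counts = {c: Counter() for c in self.classes}
        for sent, label in zip(sentences, labels):
            self.word_counts[label].update(sent.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_joint(self, tokens, c):
        # log P(c) + sum of log P(token | c), with additive smoothing
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def prob(self, sentence, label="subj"):
        tokens = sentence.lower().split()
        logs = {c: self._log_joint(tokens, c) for c in self.classes}
        m = max(logs.values())  # subtract the max before exp() for numerical stability
        exps = {c: math.exp(v - m) for c, v in logs.items()}
        return exps[label] / sum(exps.values())


# Hypothetical toy data standing in for nltk.corpus.subjectivity
subj = ["i loved this wonderful film", "a boring and awful movie", "the acting is brilliant"]
obj = ["the film was released in 1998", "the story takes place in paris", "the movie runs two hours"]
clf = NaiveBayesSubjectivity().fit(subj + obj, ["subj"] * 3 + ["obj"] * 3)

print(clf.prob("what a wonderful and brilliant film"))  # a float in [0, 1]
```

The returned value plays the role of the `0.34` in the question: a probability that the sentence is subjective. With the real corpus you would train on thousands of sentences rather than six.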

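from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score


def certainty_(probs):
    # Helper not shown in the original snippet; assumed here to be the
    # probability of the predicted class (row-wise maximum of predict_proba).
    return probs.max(axis=1)


class ObjectivityDetector():
    '''SVM predicts the objectivity/subjectivity of a sentence. Trained on Pang/Lee 2004 with NER removal. Pre-grid-searched and 5-fold validated; 90% accuracy and 0.89 macro F1.'''
    def __init__(self, train, model_file=None):
        # model_file is not used in this snippet
        self.pipeline = Pipeline(
            [
                ('vect', CountVectorizer()),       # bag-of-words counts
                ('tfidf', TfidfTransformer()),     # reweight counts by TF-IDF
                ('clf', CalibratedClassifierCV(    # calibration wrapper so the SVM yields probabilities
                    SGDClassifier(
                        loss='hinge',              # hinge loss = linear SVM
                        penalty='l2',
                        alpha=1e-4,
                        max_iter=1000,
                        learning_rate='optimal',
                        tol=None),
                    cv=5)),
            ]
        )
        self.train(train)

    def train(self, train):
        # train: DataFrame with a 'text' column and a 'truth' label column
        self.learner = self.pipeline.fit(train['text'], train['truth'])

    def predict(self, test):
        # test: iterable of raw sentences
        predicted = self.learner.predict(test)
        probs = self.learner.predict_proba(test)
        certainty = certainty_(probs)
        return predicted, certainty

    def score(self, predicted, test):
        # predicted: the (labels, certainty) tuple returned by predict()
        acc = accuracy_score(test['truth'].to_numpy(), predicted[0]) * 100
        f1 = f1_score(test['truth'].to_numpy(), predicted[0], average='macro')
        print("Accuracy: {}\nMacro F1-score: {}".format(acc, f1))
        return acc, f1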
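The class above returns a hard label plus a certainty value, while the question asks for one real number per text. Assuming a fitted scikit-learn classifier that exposes `predict_proba` and `classes_` (true for the pipeline above), one way to get that number is to take the probability column of the subjective class. The sketch below rebuilds the same pipeline on a hypothetical 12-sentence toy set; the helper name `subjectivity_score` and the label values `'subj'`/`'obj'` are assumptions (use whatever your `truth` column holds), and `cv=3` replaces `cv=5` only because the toy set is so small.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline


def subjectivity_score(model, texts, subjective_label="subj"):
    """Hypothetical helper: P(subjective) for each text, as floats in [0, 1]."""
    probs = model.predict_proba(texts)                  # shape (n_texts, n_classes)
    col = list(model.classes_).index(subjective_label)  # column of the subjective class
    return probs[:, col]


# Hypothetical toy data standing in for the Pang & Lee 2004 sentences
texts = [
    "i absolutely loved this wonderful film",
    "a dull and boring mess of a movie",
    "the acting was brilliant and moving",
    "what a terrible waste of my time",
    "i think it is the best comedy this year",
    "an awful script that i hated",
    "the film was released in 1998",
    "the story takes place in new york",
    "the director previously worked in television",
    "the movie runs for two hours",
    "she plays a lawyer in the second act",
    "the sequel was filmed in canada",
]
labels = ["subj"] * 6 + ["obj"] * 6

model = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", CalibratedClassifierCV(
        SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4,
                      max_iter=1000, tol=None, random_state=0),
        cv=3)),  # 3 folds only because the toy set is tiny
]).fit(texts, labels)

scores = subjectivity_score(model, ["a truly wonderful and moving film",
                                    "the film was shot in 1998"])
print(scores)
```

Because the calibrated probabilities of the two classes sum to 1, the subjective-class column alone is enough to serve as the score.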

This post was contributed by a user and does not represent the position of 实战宝典. If reposting, please credit the source: https://www.shizhanbaodian.com/22560.html
