
Evaluating Safety, Soundness and Sensibleness of Obfuscation Systems

Matthias Liebeck, Pashutan Modaresi, Stefan Conrad
Conference Paper: CLEF 2016 Evaluation Labs and Workshop – Working Notes Paper, Pages 920-928

Abstract

Author masking is the task of paraphrasing a document so that its writing style no longer matches that of its original author. This task was introduced as part of the 2016 PAN Lab on Digital Text Forensics, for which a total of three research teams submitted their results. This work describes our methodology to evaluate the submitted obfuscation systems based on their safety, soundness and sensibleness. For the first two dimensions, we introduce automatic evaluation measures and for sensibleness we report our manual evaluation results.
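The measures themselves are defined in the paper; purely as a hedged illustration of the kind of automatic check such an evaluation might involve, the sketch below compares an original text with its obfuscated version via TF-IDF cosine similarity as a rough content-preservation proxy. The function name and example sentences are hypothetical and not taken from the paper.

```python
# Illustrative only: a generic content-preservation proxy (TF-IDF cosine
# similarity between original and obfuscated text). This is NOT one of the
# measures defined in the paper; the function name and sentences are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_preservation_score(original, obfuscated):
    vectors = TfidfVectorizer().fit_transform([original, obfuscated])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

print(content_preservation_score(
    "The airport area should stay a public park.",
    "The airfield grounds ought to remain a park open to everyone.",
))
```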

Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016.

Pashutan Modaresi, Matthias Liebeck, Stefan Conrad
Conference Paper: CLEF 2016 Evaluation Labs and Workshop – Working Notes Paper, Pages 970-977

Abstract

Author profiling deals with the study of various profile dimensions of an author, such as age and gender. This work describes the methodology we propose for the cross-genre author profiling task at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, we report the effects of our cross-genre machine learning approach on the author profiling task. With our approach, we achieved first place for gender detection in English and tied for second place in terms of joint accuracy. For Spanish, we tied for first place.
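As a minimal sketch of the kind of pipeline the abstract describes (lexical features feeding a logistic regression classifier), assuming scikit-learn; the toy texts, labels, and parameters are illustrative stand-ins, and the paper's actual feature set also includes stylistic features.

```python
# Minimal sketch: lexical features + logistic regression for author profiling,
# assuming scikit-learn. Texts, labels, and parameters are toy stand-ins; the
# paper additionally uses stylistic features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "omg just got my exam results back!!!",
    "Looking forward to the quarterly board meeting next week.",
]
train_labels = ["18-24", "35-49"]  # toy age classes, not real annotations

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # simple lexical word/bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)
print(model.predict(["Preparing slides for the board meeting"]))
```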

What to Do with an Airport? Mining Arguments in the German Online Participation Project Tempelhofer Feld

Matthias Liebeck, Katharina Esau, Stefan Conrad
Conference Paper: Proceedings of the Third Workshop on Argument Mining, Pages 144-153

Abstract

This paper focuses on the automated extraction of argument components from user content in the German online participation project Tempelhofer Feld. We adapt existing argumentation models into a new model for decision-oriented online participation. Our model consists of three categories: major positions, claims, and premises. We create a new German corpus for argument mining by annotating our dataset with our model. Afterwards, we focus on the two classification tasks of identifying argumentative sentences and predicting argument components in sentences. We achieve macro-averaged F1 measures of 69.77% and 68.5%, respectively.
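Since the results above are macro-averaged F1 scores, the following sketch shows how such a score can be computed with scikit-learn over the three annotation categories; the gold and predicted labels are made-up toy values.

```python
# Sketch: computing a macro-averaged F1 score with scikit-learn over the three
# argument component categories; gold and predicted labels are toy values.
from sklearn.metrics import f1_score

gold = ["major_position", "claim", "premise", "claim", "premise"]
pred = ["major_position", "premise", "premise", "claim", "claim"]

# average="macro" takes the unweighted mean of the per-class F1 scores
print(f1_score(gold, pred, average="macro"))
```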

HHU at SemEval-2016 Task 1: Multiple Approaches to Measuring Semantic Textual Similarity

Matthias Liebeck, Philipp Pollack, Pashutan Modaresi, Stefan Conrad
Conference Paper: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), Pages 595-601

Abstract

This paper describes our participation in SemEval-2016 Task 1: Semantic Textual Similarity (STS). We developed three methods for the English subtask (STS Core). The first method is unsupervised and uses WordNet and word2vec to measure a token-based overlap. In our second approach, we train a neural network on two features. The third method uses word2vec and LDA with regression splines.
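As a rough sketch of an unsupervised, embedding-based token overlap in the spirit of the first method (not the exact system described in the paper), assuming gensim word2vec vectors; the model path in the usage comment is a placeholder.

```python
# Sketch of an unsupervised, embedding-based token overlap: each token of one
# sentence is aligned with its most similar token in the other sentence. This
# is only in the spirit of the first method; the model path is a placeholder.
from gensim.models import KeyedVectors

def overlap_similarity(kv, tokens_a, tokens_b):
    scores = []
    for a in tokens_a:
        candidates = [kv.similarity(a, b) for b in tokens_b if a in kv and b in kv]
        scores.append(max(candidates) if candidates else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Usage (hypothetical pre-trained vectors, not shipped with this sketch):
# kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
# print(overlap_similarity(kv, "a dog runs".split(), "the hound is running".split()))
```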

IWNLP: Inverse Wiktionary for Natural Language Processing

Matthias Liebeck, Stefan Conrad
Conference Paper: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), ACL 2015, Pages 414-418

Abstract

Nowadays, many natural language processing pipelines are based on training data created by a few experts. This paper examines how the proliferation of the internet and its collaborative application possibilities can be put to practical use for NLP. For that purpose, we examine how the German version of Wiktionary can be used for a lemmatization task. We introduce IWNLP, an open-source parser for Wiktionary that reimplements several MediaWiki markup language templates for conjugated verbs and declined adjectives. We evaluate the lemmatization task on three German corpora and compare our results with existing lemmatization software. With Wiktionary as a resource, we obtain high accuracy for the lemmatization of nouns and even improve on the results of existing software.
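As a minimal sketch of how a Wiktionary-derived mapping can be used for lemmatization once the dump has been parsed; the tiny dictionary below is a hand-written stand-in, not actual IWNLP output.

```python
# Sketch of dictionary-based lemmatization with a Wiktionary-derived mapping.
# The tiny mapping is a hand-written stand-in for parsed IWNLP data.
INFLECTED_TO_LEMMA = {
    "häuser": "Haus",    # plural noun -> singular lemma
    "ging": "gehen",     # conjugated verb -> infinitive
    "schönen": "schön",  # declined adjective -> base form
}

def lemmatize(token):
    """Return the lemma if the lowercased token is known, otherwise the token itself."""
    return INFLECTED_TO_LEMMA.get(token.lower(), token)

print([lemmatize(t) for t in ["Häuser", "ging", "schönen", "und"]])
```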

Ansätze zur Erkennung von Kommunikationsmodi in Online-Diskussionen (Approaches to Detecting Communication Modes in Online Discussions)

Matthias Liebeck
Conference Paper: Proceedings of the 27th GI-Workshop Grundlagen von Datenbanken, 2015, Pages 42-47

Abstract

In the automated analysis of text contributions from online platforms, posts are often divided into positive and negative statements. When analyzing contributions from a municipal online participation process, it is useful to split the expressed opinions into communication modes so that arguments and expressions of emotion can be filtered for subsequent processing steps. This work presents two approaches for detecting communication modes. The first method distinguishes communication modes by means of word lists. The second method takes parts of speech into account and extracts further linguistic features. To evaluate the approaches, a dataset is built from headlines of news articles from the website ZEIT ONLINE and the satire website Postillon. The approaches are applied to detect the communication mode satire. The best result, an average F1 of 75.5%, is achieved by the second approach with a support vector machine.
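As a hedged sketch of the second approach's general shape (a support vector machine classifying headlines), assuming scikit-learn; the paper relies on part-of-speech and further linguistic features, whereas this toy version uses only TF-IDF n-grams, and the headlines and labels are invented.

```python
# Sketch of an SVM-based satire detector over headlines, assuming scikit-learn.
# The paper uses part-of-speech and further linguistic features; this toy
# version uses only TF-IDF n-grams, and the headlines and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

headlines = [
    "Bundestag beschließt neues Haushaltsgesetz",
    "Regierung verbietet Montage per Dekret",
]
labels = ["news", "satire"]  # toy labels standing in for ZEIT ONLINE vs. Postillon

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(headlines, labels)
print(clf.predict(["Neues Gesetz verbietet Dienstage"]))
```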

Aspekte einer automatischen Meinungsbildungsanalyse von Online-Diskussionen (Aspects of an Automated Opinion Formation Analysis of Online Discussions)

Matthias Liebeck
Conference Paper: Datenbanksysteme für Business, Technologie und Web (BTW 2015) - Studierendenprogramm, 2015, Pages 203-212

Abstract

Today, people have the opportunity to voice their opinions on a wide range of topics on online discussion platforms. These opinions can be examined more closely in the form of an opinion formation analysis. This paper investigates several aspects of automated discussion tracking. To this end, analysis criteria are defined and the presented approaches are applied to two German-language datasets.