Overview


I have been working at FEUP InfoLab since 2015, always focusing on entity-oriented search. I began this chapter as a researcher on the ANT search engine. I then became a MAP-i PhD student and am now working towards my thesis on "Graph-Based Entity-Oriented Search". My doctoral work explores the viability of graphs and hypergraphs as a joint representation for text and knowledge, and as a way to generalize entity-oriented search tasks.

I have also done research for nearly two years at Laboratório SAPO/U.Porto, at the Faculty of Engineering of the University of Porto. I've worked across many areas, but my main focus has always been network science. I'm deeply interested in the relationships between real-world entities and their grouping behavior.

I have also worked for a year at CRACS/INESC TEC, at the Faculty of Sciences of the University of Porto, pursuing a similar line of work in the context of the Breadcrumbs Project, a computational journalism platform that takes advantage of social networking behavior to organize news fragments. In addition to community detection and data visualization, I was able to devote some time to machine learning and data mining, and to develop skills in both areas.

Information Retrieval

Much of my work has been done in an information retrieval setting, where I focused on link analysis. I participated in the TREC 2010 Blog Track, experimenting with the h-index, a metric typically used to measure the impact of a scientist's work, as a query-independent feature for the blog distillation task. Continuously working with blog networks and other semantically rich networks drew my attention to the importance of better methodologies for studying this kind of heterogeneous network.

Network Science

I devoted a couple of years to the study of community detection methodologies and the analysis of groups induced by social behavior in different types of data (e.g. networks of people, places, tags and documents). I began exploring this subject in my Master's thesis, where I studied the link ecosystem of the Portuguese blogosphere. I have always been very interested in understanding human relationships and in using networks and their community structure to solve problems in different areas. I have used community detection to organize documents, to disambiguate names in a who's who situation, to integrate multidimensional information, and to enable topic detection through the clustering of similar documents.

Machine Learning

I have done some research in the area of machine learning. I have a general understanding of classification with training, but my main focus has been on topic models, frequently working with methodologies such as Latent Dirichlet Allocation. I have also experimented with alternative topic modeling techniques based on networks of words and community detection.

Data Visualization

After working with so much data, I developed a strong interest in creating rich visualizations, so I worked for six months on real-time data summarization and visualization. I had the chance to face the problems that a high-volume data stream presents, both in storage and in visualization. With this knowledge I built a prototype for Laboratório SAPO, codenamed Ciclope, where I used the SAPO Blogs clickstream to create a group of chart and tree visualizations that would allow an author to monitor the traffic flow of his or her blog. I have also developed two visualization systems capable of displaying a multidimensional network of news clips, with relationships based on the coreference of entities of different types, and defining three dimensions corresponding to three of the five Ws of journalism: who, where and when. These are available as two widgets in the user dashboard of the Breadcrumbs system.

Affiliation


2003-2011, 2012-2013, 2015-Present

I have worked at the Faculty of Engineering of the University of Porto, as an external researcher for Laboratório SAPO/U.Porto, during a one-year period, from June 2010 to June 2011. I got my MSc in Informatics and Computing Engineering from this same institution, where I was accepted as a PhD candidate for ProDEI, the Doctoral Program in Informatics Engineering, but unfortunately had to drop out for lack of funding.

The Faculty of Engineering, and more specifically the Department of Informatics Engineering, is increasingly investing in research, having inaugurated new labs and improved existing ones during these last few years. A great effort has been put into the people of this institution, giving them the conditions, the motivation and the guidance necessary to pursue their research interests, in an attempt to positively contribute to the international engineering and scientific community.

I then returned to Laboratório SAPO/U.Porto to pursue new research goals in the area of music retrieval and recommendation, while attempting to maintain a relationship with the Department of Computer Science at the Faculty of Sciences, through PDCC, the Doctoral Program in Computer Science.

For a brief period, I worked in industry as a software engineer and data scientist, but I have recently come back to the Department of Informatics Engineering to work as a researcher on entity-oriented search at FEUP InfoLab.

Faculdade de Ciências da Universidade do Porto

2011-2012

I have worked on the Breadcrumbs project at the Center for Research in Advanced Computing Systems, an associate unit of INESC TEC, which operates at the Faculty of Sciences of the University of Porto.

The Faculty of Sciences has a privileged location, surrounded by activity and covered in a wide range of tree species, nurtured by the presence of the Botanical Garden, one of the most beautiful green areas of the city of Porto, and an attraction for the students of plant biology and landscape architecture.

Publications


Journals

Devezas, J., and S. Nunes (2017). Graph-Based Entity-Oriented Search: Imitating the Human Process of Seeking and Cross Referencing Information. In ERCIM News (Special Theme: Digital Humanities), 111(October 2017), pp.14-15.

Devezas, J., and Á. Figueira (2013). The Community Structure of a Multidimensional Network of News Clips. In International Journal of Web Based Communities (Special Issue: Community Structure in Complex Networks), 9(3), pp.411-429.

Devezas, J., and Á. Figueira (2012). Finding Language-Independent Contextual Supernodes on Coreference Networks. In IAENG International Journal of Computer Science, 39(2), pp.200-207.

Conference Proceedings

Devezas, J., A. Guillén, Y. Gutiérrez, R. Muñoz, and S. Nunes (2018). FEUP at TREC 2018 Common Core Track: Reranking for Diversity using Hypergraph-of-Entity and Document Profiling. In The Twenty-Seventh Text REtrieval Conference Proceedings (TREC 2018), Gaithersburg, MD, USA.

Devezas, J. and S. Nunes (2018). Social Media and Information Consumption Diversity. In Proceedings of the Second International Workshop on Recent Trends in News Information Retrieval (NewsIR 2018), co-located with the 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France.

Devezas, J., C. T. Lopes, and S. Nunes (2017). FEUP at TREC 2017 OpenSearch Track: Graph-Based Models for Entity-Oriented Search. In The Twenty-Sixth Text REtrieval Conference Proceedings (TREC 2017), Gaithersburg, MD, USA.

Devezas, J. and S. Nunes (2017). Information Extraction for Event Ranking. In Proceedings of the 6th Symposium on Languages, Applications and Technologies (SLATE 2017), Vila do Conde, Portugal.

Devezas, J. and S. Nunes (2016). Index-Based Semantic Tagging for Efficient Query Interpretation. In Proceedings of the 6th International Conference of the CLEF Initiative (CLEF 2016), Évora, Portugal.

Devezas, T., J. Devezas, and S. Nunes (2016). Exploring a Large News Collection Using Visualization Tools. In Proceedings of the First International Workshop on Recent Trends in News Information Retrieval (NewsIR 2016), co-located with the 38th European Conference on Information Retrieval (ECIR 2016), Padua, Italy.

Coelho, F., J. Devezas, and C. Ribeiro (2013). Large-scale Crossmedia Retrieval for Playlist Generation and Song Discovery. In Proceedings of the 10th International Conference in the RIAO Series (OAIR 2013), Lisbon, Portugal.

Gomes, F., J. Devezas, and Á. Figueira (2013). Temporal Visualization of a Multidimensional Network of News Clips. In Proceedings of the 2013 World Conference on Information Systems and Technologies (WorldCIST 2013), Algarve, Portugal.

Devezas, J., and Á. Figueira (2012). Interactive Visualization of a News Clips Network: A Journalistic Research and Knowledge Discovery Tool. In Proceedings of the 4th International Conference on Knowledge Discovery and Information Retrieval (KDIR 2012), Barcelona, Spain.

Cravino, N., J. Devezas, and Á. Figueira (2012). Using the Overlapping Community Structure of a Network of Tags to Improve Text Clustering. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media (HT 2012), Milwaukee, WI, USA.

Devezas, J., H. Alves, and Á. Figueira (2012). Creating News Context From a Folksonomy of Web Clipping. In Lecture Notes in Engineering and Computer Science: Proceedings of The International MultiConference of Engineers and Computer Scientists 2012 (IMECS 2012), Hong Kong.

Devezas, J., F. Coelho, S. Nunes, and C. Ribeiro (2012). Studying a Personality Coreference Network in a News Stories Photo Collection. In Lecture Notes in Computer Science: Proceedings of the 34th European Conference on Information Retrieval (ECIR 2012), Barcelona, Spain.

Devezas, J., S. Nunes, and C. Ribeiro (2011). Using the H-index to Estimate Blog Authority. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM 2011), Barcelona, Spain.

Devezas, J., S. Nunes, and C. Ribeiro (2010). FEUP at TREC 2010 Blog Track: Using h-index for blog ranking. In The Nineteenth Text REtrieval Conference Proceedings (TREC 2010), Gaithersburg, MD, USA.

Devezas, J., C. Ribeiro, and S. Nunes (2010). Studying Blog Features over Link Popularity. In Proceedings of the First Workshop on Social Media Analytics (SOMA 2010), held in conjunction with the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), Washington, DC, USA.

Master's Thesis

Devezas, J. (2010). Link Ecosystem of the Portuguese Blogosphere. Master's Thesis, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal.

Book Chapters

Figueira, Á., J. Devezas, N. Cravino, and L. Francisco-Revilla (2013). Creating and Analysing a Social Network Built from Clips of Online News. In Information Systems and Technology for Organizations in a Networked Society, pp.67-86.

Demos

Coelho, F., J. Devezas, and C. Ribeiro (2013). Juggle: Large-scale Discovery in Music Recommendation. In Proceedings of the 10th International Conference in the RIAO Series (OAIR 2013), Lisbon, Portugal.

Technical Reports

Devezas, J. (2011). An Overview of the Graph Database Paradigm - Breadcrumbs Project Report, CRACS/INESC TEC, Faculdade de Ciências da Universidade do Porto, Porto, Portugal.

Devezas, J., S. Nunes, and C. Ribeiro (2011). Overlapping Community Detection - Labs SAPO/UP Report, Laboratório SAPO/U.Porto, Universidade do Porto, Porto, Portugal.

Unpublished Work

Devezas, J. (2013). A State of the Art for Community-Driven Music Discovery.

Devezas, J. and S. Nunes (2013). Using Visualization Techniques to Characterize Music Listening Behavior.

Devezas, J. and S. Nunes (2013). Juggle Mobile: Recommending Music to Individuals and Groups.

Devezas, J., F. Coelho, S. Nunes, and C. Ribeiro (2013). Music Discovery: Exploiting TF-IDF to Boost Results in the Long Tail of the Tags Distribution.

Presentations


Talks

Posters

Master's Thesis

Demonstrations

Projects


As a researcher at Laboratório SAPO/U.Porto, during the one-year period from June 2010 to June 2011, I was able to work on two main projects and participate in an information retrieval contest. From November 2011 to November 2012, I worked on the Breadcrumbs Project, researching and implementing several algorithms for community detection, topic modeling, event detection, text mining and search. I also developed two visualization tools to display and explore the produced results. From December 2012 to November 2013, I went back to Laboratório SAPO/U.Porto to work on music information retrieval and recommender systems. I worked on the Juggle project, for music discovery and location-based recommendation to groups.

Juggle

The Juggle project aimed to improve music discovery through a hybrid large-scale recommender system, capable of handling and combining different types of data, namely text and audio content, context from elements such as tags or location, and collaborative information from user profiles.

We tackled the challenges of multimodality and scale by developing a graph-based recommender system built on Neo4j, a popular and robust graph database that facilitated the modeling of content, context and collaborative information as nodes and edges in a graph. One of the biggest challenges was the translation of audio content into relationships in a graph, specifically the comparison of the audio features of a million songs with each other, which we solved by using an approximate search algorithm from image retrieval.

Our recommendation algorithm relied mainly on neighborhood methods for collaborative filtering, but we also used metrics from text retrieval to boost the relevance of tags in the long tail, while not completely disregarding tag popularity, in order to offer a playlist that better fostered the discovery of music.
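
As a rough illustration of the tag boosting idea, here is a minimal Python sketch of a TF-IDF style tag weight. It is not the Juggle implementation: the function name, the input layout and the smoothed `log(1 + N/df)` factor (chosen here so that very popular tags keep a small non-zero weight) are all assumptions made for this example.

```python
import math
from collections import Counter


def tag_weights(item_tags):
    """Weight each tag of each item with a TF-IDF style score.

    item_tags maps an item (e.g. a song) to a dict of tag -> count.
    Rare (long-tail) tags receive a larger boost than popular ones.
    """
    n_items = len(item_tags)
    # Document frequency: in how many items does each tag occur?
    df = Counter(tag for tags in item_tags.values() for tag in tags)
    weights = {}
    for item, tags in item_tags.items():
        total = sum(tags.values())
        weights[item] = {
            # Hypothetical smoothing: log(1 + N/df) never reaches zero,
            # so popular tags are dampened rather than discarded.
            tag: (count / total) * math.log(1 + n_items / df[tag])
            for tag, count in tags.items()
        }
    return weights
```

With equal counts on the same song, a tag that appears on few songs ends up with a higher weight than one that appears on every song, which is the long-tail boost described above.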

Juggle Mobile

Juggle Mobile was developed as a new branch of the Juggle project. Although the name suggests that it only runs on mobile devices, we decided to build it using responsive design, so it was also made available on the PC as a web application. Juggle Mobile aimed to deliver an artist-based experience to mobile devices, supporting both individual and group-based discovery of artists through their biographies.

In Juggle Mobile, users could create an account and fill in their taste profiles either through our random artist rating system or by importing their existing music information from Facebook or Last.fm. All the data from these different sources was combined using our weighting model and used to provide recommendations to the user or to a group of nearby users.

Our experiments were based on a linear algebra approach where, instead of a graph, we used a user-item matrix, applying singular value decomposition to build a latent factor model that supported individual and group recommendations. For groups, we proposed a rating aggregation method that ensured an equal chance for every group member to have a relevant influence on the recommendation outcome.

Breadcrumbs

As a Breadcrumbs researcher, I was able to make contributions in several different areas. I implemented a language-independent named entity recognition system based on DBpedia entity lists. This system enabled the identification of three different types of entities (people, places and dates) tied to three of the five dimensions (the five Ws) of journalism: who, where and when. Using this data, a multidimensional entity coreference network was built, connecting news clips that cited the same entity. Next, I implemented the community detection methodologies for multidimensional networks proposed by Tang et al. This included the dimension integration strategies proposed by the authors, based on their unified view of four traditional community detection methodologies. These algorithms were also implemented in the system, along with the Louvain method, one of the state-of-the-art algorithms for community detection.

Next, two visualization tools were developed to display and explore the acquired data. The first was analogous to a map, where communities were visualized as countries resulting from the aggregation of a node population. The second enabled the exploration of the multidimensional network based on the three identified dimensions: who, where and when. Some simple chart visualizations were created to display statistics about the top user and system tags and entities.
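
To make the construction of the coreference network more concrete, here is a minimal Python sketch that links clips citing the same entity, keeping one edge set per dimension. It is an illustration under my own assumptions (function name, input layout, edge weights counting shared entities), not the Breadcrumbs code.

```python
from collections import defaultdict
from itertools import combinations


def coreference_edges(clip_entities):
    """Build a multidimensional coreference network.

    clip_entities maps clip_id -> {dimension: set of entities}, where a
    dimension is e.g. "who", "where" or "when". Returns
    {dimension: {(clip_a, clip_b): number of shared entities}}.
    """
    # Inverted index: (dimension, entity) -> clips that cite the entity.
    index = defaultdict(set)
    for clip, dimensions in clip_entities.items():
        for dimension, entities in dimensions.items():
            for entity in entities:
                index[(dimension, entity)].add(clip)

    # Every pair of clips citing the same entity gets an edge in that
    # entity's dimension; the weight counts the shared entities.
    edges = defaultdict(lambda: defaultdict(int))
    for (dimension, _entity), clips in index.items():
        for a, b in combinations(sorted(clips), 2):
            edges[dimension][(a, b)] += 1
    return {dimension: dict(pairs) for dimension, pairs in edges.items()}
```

Each per-dimension edge set can then be handed to a community detection method, or the dimensions can be integrated first, as in the strategies mentioned above.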

We used a topic model, based on Latent Dirichlet Allocation, to suggest titles for each collection of news clips; a simplistic event detection system was also created, in order to find relevant peaks of activity in a time series of entity frequencies. Some other trivial systems, such as an administration panel, capable of scheduling tasks, and a widget dashboard were also implemented.
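
The peak-finding step of the event detection system can be sketched in a few lines of Python. The rule used here (flag any point more than `k` standard deviations above the mean of the series) is an assumption chosen for illustration; the original system may have used a different criterion.

```python
from statistics import mean, pstdev


def find_peaks(frequencies, k=2.0):
    """Return the indices where an entity's frequency spikes.

    frequencies is a list of counts per time step (e.g. mentions per day).
    A point is a peak when it exceeds mean + k * standard deviation.
    """
    if not frequencies:
        return []
    sigma = pstdev(frequencies)
    if sigma == 0:
        return []  # A flat series has no peaks.
    threshold = mean(frequencies) + k * sigma
    return [i for i, count in enumerate(frequencies) if count > threshold]
```

For a series such as `[1, 1, 1, 1, 10, 1, 1, 1]`, only the spike at index 4 crosses the threshold with the default `k`.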

These algorithms were all developed using a web services architecture, communicating through either XML or JSON. Several scientific papers were published as a result of the described research. Below are some screenshots of the Breadcrumbs modules I contributed to in some way.

Ciclope

From June 2010 to December 2010, I worked on Ciclope, a real-time data visualization project aimed at gathering information from SAPO Blogs clickstream and displaying it in a useful way, allowing the blog owner to have an understanding of how the traffic flow of his or her blog behaves.

Unite

From January 2011 to June 2011, I focused on graph mining and community detection. As part of my work, I developed Unite, a Java library intended to provide an agile platform for link extraction and graph mining. Unite was built with out-of-the-box support for the most common graph building use cases, while providing an extensible and highly modular interface that allows developers to adapt the framework to their own needs. Below is an example of how to read content from a MySQL database, parse it and write the resulting graph to a Blueprints-enabled graph database, in this case Neo4j.

TREC 2010 Blog Track

In 2010, I had the opportunity to participate in the Blog Track of the Text REtrieval Conference. TREC is an information retrieval conference that holds several competitive tracks every year, providing its own datasets (at a price) together with human assessments of the relevance of each resource, given the search topics of the competition. This allows participants to calculate metrics such as the mean average precision in order to evaluate their retrieval systems. For a novice in the area, the experience gained by participating in TREC is immense.
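
For readers unfamiliar with the metric, here is a minimal Python sketch of mean average precision. The function names and the input layout (ranked lists and sets of relevant documents, both keyed by topic) are illustrative assumptions; official TREC results are computed with the organizers' own evaluation tooling.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # Precision at this relevant hit.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of the average precision over all topics in a run.

    runs maps topic -> ranked list of documents.
    qrels maps topic -> set of documents judged relevant.
    """
    if not runs:
        return 0.0
    scores = [
        average_precision(ranked, qrels.get(topic, set()))
        for topic, ranked in runs.items()
    ]
    return sum(scores) / len(scores)
```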

Our work focused on using some of the structural properties of the blog graph as query-independent features for the blog distillation process. Specifically, we ranked each blog according to the in-degree and then compared it to the rank according to the h-index (a metric commonly used in bibliometrics to measure the scientific output of a researcher). We then introduced a weighted and normalized value, for each of these link-based metrics, in the final document score, obtaining result improvements for the h-index, but not the in-degree.
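
The idea can be sketched in Python as follows. I assume here that the h-index of a blog is the largest h such that h of its posts have at least h in-links each, and that the link-based metric is normalized by its maximum and added to the base retrieval score with a fixed weight. The function names, the default weight and the exact combination formula are illustrative, not the ones used in our TREC runs.

```python
def h_index(inlinks_per_post):
    """Largest h such that h posts have at least h in-links each."""
    counts = sorted(inlinks_per_post, reverse=True)
    h = 0
    for rank, inlinks in enumerate(counts, start=1):
        if inlinks >= rank:
            h = rank
        else:
            break
    return h


def rescore(base_scores, link_metric, weight=0.1):
    """Add a weighted, max-normalized link metric to each blog's score.

    base_scores maps blog -> query-dependent retrieval score.
    link_metric maps blog -> query-independent value (h-index or in-degree).
    """
    max_metric = max(link_metric.values(), default=0) or 1
    return {
        blog: score + weight * link_metric.get(blog, 0) / max_metric
        for blog, score in base_scores.items()
    }
```

For example, a blog whose posts received 10, 8, 5, 4 and 3 in-links has an h-index of 4, even though its total in-degree is 30.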

While participating in the TREC 2010 Blog Track, I gained skills in link extraction, graph mining and blog distillation, and also learned about methodologies for indexing, searching and parsing large-scale collections.