Doctoral College: CrowdAnalyser
Spatio-temporal Analysis of User-generated Content
A key characteristic of Web 2.0 is that users voluntarily provide data on the Internet, at an unprecedented scale and rate, through portals such as Wikipedia, YouTube, Flickr, Twitter, blogs, OpenStreetMap, and various social networks. In today’s information society and knowledge economy, these portals are a valuable resource for diverse application domains. The enormous potential of this voluntarily generated (crowdsourced) data, contributed by masses of volunteers (the crowd), is increasingly recognized, but in many areas, especially in science, it is not yet fully exploited. Several unsolved issues arise from these rapidly growing, very dynamic, and highly heterogeneous streams of user-created content. The goal in addressing them is to automatically assess this new type of poorly structured data and make it usable for different application domains, in particular to infer new information from it. The participating research groups in Heidelberg have done pioneering work in these directions, especially in utilizing geographic data. The objective of the college is to develop novel methods and approaches for the quality-oriented analysis and exploration of crowdsourced Web 2.0 data, as well as to further improve and scale existing methods. Compared with existing efforts, the following two key points are considered new aspects of such approaches:
a.) The temporal aspect of dynamically changing data, in addition to the more typical geospatial and semantic aspects, needs to be fully integrated into these approaches.
b.) To significantly improve analytical approaches, it is essential to combine heterogeneous streams of data (text, video, images, geospatial data). So far, such data streams have only been studied in isolation. Considering possible relationships between data streams can significantly improve both the quality of information extraction and the enrichment of the base data.
The reference frame considered in this research is composed of space, time, and semantics. By combining these axes, we expect major improvements in data analysis techniques and novel insights into the exploration processes. Joining the expertise of the research groups participating in this college allows research on the above topics and problem settings to be conducted effectively.
Potential Research Topics
- Extraction and enrichment of event data
- Real-time prediction and finding of alternative routes
- Extending visual object recognition with textual metadata
- Improving OpenStreetMap through machine learning
- Crowdsourcing 3D: Fusion of 3D and dynamic geodata from technical and human sensors