Pedestrian trajectory prediction is a critical research challenge in autonomous driving, intelligent security surveillance, and human-robot interaction. The ability to accurately anticipate pedestrian movement directly influences the operational safety of autonomous vehicles, the responsiveness of surveillance systems, and the adaptability of social robots in dynamic environments. Existing approaches predominantly focus on exploiting sequential data patterns and optimizing model architectures built on recurrent neural networks, and they often overlook the intrinsic social-semantic characteristics of real-world pedestrian interactions. By treating trajectory prediction as a purely sequential modeling task, these methods neglect three fundamental aspects: 1) the social constraints governing crowd movement patterns, 2) the intentional, destination-oriented nature of human locomotion, and 3) the dynamic adaptation mechanisms pedestrians employ during path navigation. This oversight leads to suboptimal performance, particularly in dense pedestrian scenarios where social interactions and environmental adaptation strongly influence movement decisions. To address these limitations, this paper proposes the path stepwise estimation network (PSEN), a novel framework that systematically integrates social relationship modeling, endpoint-aware trajectory planning, and environment-adaptive path refinement. The proposed model bridges the gap between conventional sequence prediction paradigms and the complex socio-spatial dynamics inherent in real-world pedestrian navigation scenarios.
This paper incorporates three characteristics of the path planning observed in everyday human walking. First, social restrictions are considered. To reflect these restrictions, the crowd is categorized by movement direction, speed, and distance, and intra-class feature learning is then performed on the resulting groups. The social relationships between the predicted pedestrian and other pedestrians are calculated using social weights to obtain social attention, which affects the subsequent path estimation network. Second, an endpoint estimation network is introduced to simulate the tendency of pedestrians to first identify a destination and then purposefully plan their walking path. This network leverages the strengths of sequential prediction by using spatiotemporal sequences to predict an endpoint. The estimated endpoint serves as a reference condition within the overall network model, guiding the complete path planning process. Third, this paper addresses the fact that pedestrians constantly fine-tune their local paths and adjust their focus based on environmental context and destination. To model this behavior, an endpoint and path fine-tuning network is constructed using a conditional variational autoencoder (CVAE) and a multilayer perceptron (MLP). This module takes the output of the endpoint estimation network as a condition and uses the output of the social restriction module, along with the historical trajectory, as inputs for feature learning. After every three frames of prediction, the outputs of the social restriction and endpoint modules are updated according to the current environment of the pedestrians. This update allows the model to automatically fine-tune the planned path in response to dynamic surroundings.
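The stepwise update schedule described above can be illustrated with a minimal control-flow sketch. It is not the PSEN implementation: the learned endpoint estimation network, social restriction module, and CVAE/MLP refinement are replaced by simple hand-written stand-ins, and all names (`estimate_endpoint`, `social_context`, `predict_stepwise`, `block`, `repulsion`, `radius`) as well as the assumption of static neighbours are hypothetical. The only element taken from the text is that the endpoint and social conditions are refreshed after every three predicted frames.

```python
import math

def estimate_endpoint(path, remaining):
    """Hypothetical stand-in for the endpoint network:
    constant-velocity extrapolation of the last observed step."""
    (x0, y0), (x1, y1) = path[-2], path[-1]
    return (x1 + (x1 - x0) * remaining, y1 + (y1 - y0) * remaining)

def social_context(position, neighbours, radius=2.0):
    """Hypothetical stand-in for the social module:
    mean offset towards neighbours that lie within `radius`."""
    px, py = position
    near = [(nx - px, ny - py) for nx, ny in neighbours
            if math.hypot(nx - px, ny - py) <= radius]
    if not near:
        return (0.0, 0.0)
    return (sum(dx for dx, _ in near) / len(near),
            sum(dy for _, dy in near) / len(near))

def predict_stepwise(history, neighbours, horizon=12, block=3, repulsion=0.1):
    """Predict `horizon` frames, refreshing both conditions every `block` frames."""
    path = list(history)
    predicted = []
    remaining = horizon
    while remaining > 0:
        # Refresh the endpoint and social conditions at the start of each block.
        endpoint = estimate_endpoint(path, remaining)
        ctx = social_context(path[-1], neighbours)
        for _ in range(min(block, remaining)):
            x, y = path[-1]
            # Move towards the endpoint, nudged away from nearby neighbours.
            step = ((endpoint[0] - x) / remaining - repulsion * ctx[0],
                    (endpoint[1] - y) / remaining - repulsion * ctx[1])
            point = (x + step[0], y + step[1])
            path.append(point)
            predicted.append(point)
            remaining -= 1
    return predicted

# Usage: two observed frames, one static neighbour, 12 predicted frames.
future = predict_stepwise([(0.0, 0.0), (1.0, 0.0)], [(2.0, 0.5)])
```

With no neighbours, this sketch reduces to constant-velocity extrapolation. A neighbour inside the radius bends the path away from it, and the bend is re-evaluated at each three-frame boundary.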
The proposed method is compared with six baseline methods on the ETH/UCY dataset, five baseline methods on the SDD dataset, and four baseline methods on the NBA SportVU dataset. The evaluation metrics are average displacement error (ADE) and final displacement error (FDE). Across the entire ETH/UCY dataset, ADE and FDE are reduced by an average of 5.1% and 7.5%, respectively. On the SDD dataset, average reductions of 1% in ADE and 2% in FDE are observed. On individual subsets, the improvements are most pronounced in scenarios with denser pedestrian traffic: on the ZARA1, ZARA2, and UNIV subsets, the proposed method improves prediction accuracy by more than 10%. Ablation experiments are also conducted on the ETH/UCY dataset to evaluate the contributions of the individual components of the PSEN framework. The results demonstrate that each module of PSEN notably improves pedestrian trajectory prediction, achieving average reductions of 19% and 31% in ADE and FDE, respectively. Further ablation experiments are performed on parameters such as social distance, social attention weights, and the number of frames used in stepwise trajectory generation. These experiments confirm that all aspects of the network design positively impact pedestrian trajectory prediction. However, the model does not perform as well on the NBA SportVU dataset. This dataset is characterized by 10 players moving at high speeds, with trajectory endpoints changing dynamically based on in-game situations and players' intentions. Unlike the ETH/UCY and SDD datasets, where movement is predictable and socially constrained, the varying roles and tactical decisions of the agents in the NBA dataset play a crucial role in path planning, making prediction highly challenging.
Therefore, achieving accurate predictions by relying solely on time-position information is difficult, because the characteristics of the agents in this setting differ markedly from those in typical pedestrian scenes. In sports scenes, athletes actively seek collisions and obstructions as part of their strategic movement. PSEN does not model the role-specific behaviors of agents, which limits its effectiveness in such environments.
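For reference, the two evaluation metrics used in the experiments above can be computed as in the following minimal sketch for a single predicted trajectory. ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted frames, and FDE is the distance at the final frame. The function names and sample values are illustrative assumptions, not the paper's data or code. The source does not state whether multiple samples per pedestrian are drawn and the best one is scored, so that step is omitted here.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points."""
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-frame Euclidean distance between prediction and ground truth.
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred, truth)]
    ade = sum(dists) / len(dists)   # average over all predicted frames
    fde = dists[-1]                 # error at the final frame only
    return ade, fde

def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Illustrative values only (not taken from the paper).
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)  # ADE = 1.0, FDE = 2.0
```

The percentage reductions reported in the text correspond to `relative_reduction` applied to a baseline error and the proposed method's error on the same split.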
The PSEN model proposed in this paper integrates the sequential prediction task with three key features of real-world pedestrian scenes. By combining recurrent neural networks with a CVAE, PSEN captures the complex features of pedestrian trajectories in realistic scenarios. The model achieves superior performance on the ETH/UCY and SDD datasets, providing a new direction for subsequent work on pedestrian trajectory prediction. However, this study focuses only on interactions among pedestrians and does not consider the relationships between pedestrians and other objects, such as vehicles and obstacles. In novel environments, or in scenes where pedestrians are sparse but other dynamic or static objects are abundant, the performance of the model may degrade. Further research is needed on the relationships between pedestrians and objects, along with their associated feature information.
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |