🎯Temporal action segmentation translates an untrimmed video into action segments by classifying action labels for each frame.
🌐Existing models often suffer from out-of-context errors and do not effectively consider the visual features and activity context.
📜Our method introduces an activity grammar induction algorithm and an effective parser to improve action sequence generation and refinement.
⏲️We employ a dynamic program optimization approach to find optimal action probabilities that fit the activity context.
📊Experimental results show that our approach outperforms existing models, effectively removing out-of-context errors and improving overall segmentation performance.