Packetization Help - MediaStreamFile - FileSource

Jan 9, 2013 at 1:37 AM
Edited Jan 9, 2013 at 4:45 AM

Hi all.

Is it hard to make a MediaFileStream?

MediaFileStream means...

'If you wanted to stream from a file you would make a new type of SourceStream e.g. MediaFileStream and inherit from SourceStream.'


And a MediaLiveStream as well...

Please give me some advice...

Jan 9, 2013 at 8:30 AM
Edited Jan 10, 2013 at 1:30 AM

You would need to implement a parser for the format you wanted to save from the `MediaFileStream`. This is also known as a packetizer or de-packetizer.
You can use existing container libraries for the majority of these formats as far as reading and writing the end files; however, to get the data (over Rtp) that you would want to put into those files when writing, you will need the packetizer for the specific format. This is because in some cases the Rtp format is different from the end storage format, while other formats differ little beyond the Network Abstraction Layer.
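For reference, the fixed header that every RTP packetizer and de-packetizer deals with can be read in a few lines. This is a minimal sketch based on RFC 3550, Section 5.1; the class name is illustrative and is not part of this library's API:

```csharp
using System;

// Minimal sketch of parsing the fixed 12-byte RTP header (RFC 3550, 5.1).
// A de-packetizer reads these fields first, then hands the remaining payload
// to a format-specific parser.
static class RtpHeaderSketch
{
    public static (int Version, bool Marker, int PayloadType,
                   ushort SequenceNumber, uint Timestamp, uint Ssrc) Parse(byte[] p)
    {
        if (p.Length < 12) throw new ArgumentException("RTP header is at least 12 bytes.");
        return (
            Version: p[0] >> 6,                              // should be 2
            Marker: (p[1] & 0x80) != 0,                      // e.g. end of a video frame
            PayloadType: p[1] & 0x7F,                        // 26 = JPEG, 96+ = dynamic
            SequenceNumber: (ushort)((p[2] << 8) | p[3]),
            Timestamp: ((uint)p[4] << 24) | ((uint)p[5] << 16) | ((uint)p[6] << 8) | p[7],
            Ssrc: ((uint)p[8] << 24) | ((uint)p[9] << 16) | ((uint)p[10] << 8) | p[11]);
    }
}
```

The format-specific packetizer then interprets whatever follows these 12 bytes (plus any CSRC list and extension).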

BaseMedia Library also has support for the Network Abstraction Layer packets.
The Network Abstraction Layer was created to deal with the transport of streams across networks, but for some reason it is bundled with Rtp and embedded within it, whereas they could have just defined what a Box / Atom was and forced people to send those, resulting in a much smaller encapsulation with less work required to do something like this.

Why it was decided to go the route they did I am not sure, other than to say they were attempting to make the payload as small and as standard as possible. It is as if the protocol was not compressed or defined enough already and required an internet standard above the ISO standard to ensure that the transport was correct. The other way to look at it is that there are so many variations of how a stream can be encapsulated that they wanted to ensure everyone was using the same encapsulation.
It could have been a brief reference in most of the RFCs, which basically would have resulted in dumping data into a file / decoder; however, I guess in some cases the encapsulation they give has some additional 'benefits' besides size reduction, such as breaking the maximum size of a format, inter alia ;)

Users would already have had the SessionDescription information, which is more than enough of a supplement for a content-type as used in Http. However, just as RFC4571 defines the length but not the framing or control characters and leaves those to RFC2326, there are quite a few things you will find you might not agree with or understand, but have to cope with in order to achieve what you need to.

E.g. I don't agree with SDP in general as the Rtcp SourceDescription is more than enough to convey the same information however I am just implementing a protocol, there would be little gained from making comments in the working group because the protocol is so widely adopted and utilized. I think the protocol creep between Rtp and Rtsp in general is very confusing (especially when using Tcp Interleaving)...
I could go on and on about this but I am not sure what information you are looking for specifically besides what you have stated.
In the future I hope to work on more of the decoding / encoding and some additional DSP related algorithms to bring even more functionality to this library however that is subject to me having the time required to implement such. 
The last advice I can give is to check out Stack Overflow.

Live streaming is much easier and completely implemented due to how it aggregates the packets and does not completely parse them because it is not required. 

As far as `LiveMediaStream` goes:
The server currently works with VLC, FFMPEG (LibAv), etc., and supports multiple streams at a time, e.g. audio and video, just one or the other, or some other combination per live stream. This functionality can be found in the `RtspSourceStream` class, which just needs a Uri for where the media is located, e.g. 'rtsp://', and any username / password in the form of a NetworkCredential if required.

The examples show how to use the RtspClient, RtpClient, RtspServer -> ClientSession and individual packet classes so beyond those examples you must implement a parser for the specific frame types you are dealing with.
Rtp is one of my current interests at the moment so if you need any other advice or opinions please feel free to ask.

Jan 10, 2013 at 1:09 AM
Edited Jan 10, 2013 at 1:11 AM

Thanks. ㅠㅠ

You are very helpful.


Jan 10, 2013 at 10:19 AM


I have a question.
What is the DataExtension array in RtpPacket?
And is an RtpFrame one frame of video?
If yes:
if I want to provide a live media stream from a live source like a camera,
I should make a frame packet from the live source and generate an RtpPacket from the frame packet,
and transfer the RtpPacket through your RtspServer / RtpClient?

Please give me some advice :)
Jan 10, 2013 at 3:27 PM

The DataExtension array is there for any extension headers on the RtpPacket (which will be present if Extensions is true); otherwise it is not used. The format of the extension data stays the same; usually it's the data inside which changes. Personally I am not a fan of extension headers, and I don't believe they have much purpose; in most of the other formats they are simply there for testing and development.
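For the curious, locating that extension data is mechanical. This is a sketch of the RFC 3550, Section 5.3.1 layout (the method name is made up for illustration): after the fixed header and any CSRC list come a 16-bit profile-defined id and a 16-bit length counted in 32-bit words, then the data itself.

```csharp
using System;

// Sketch: locate the extension data in a raw RTP packet (RFC 3550, 5.3.1).
static class RtpExtensionSketch
{
    public static byte[] ReadExtensionData(byte[] p)
    {
        bool hasExtension = (p[0] & 0x10) != 0;   // the X bit in the first byte
        if (!hasExtension) return Array.Empty<byte>();
        int csrcCount = p[0] & 0x0F;
        int offset = 12 + csrcCount * 4;          // skip fixed header + CSRC list
        int words = (p[offset + 2] << 8) | p[offset + 3];  // length in 32-bit words
        var data = new byte[words * 4];
        Array.Copy(p, offset + 4, data, 0, data.Length);
        return data;
    }
}
```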

You essentially have the idea of it, you need to take the packets that come in and send them to each client (updating the ssrc), this is the RtspServer's function and it already does that for you if you add a stream to the server then you can connect with VLC / Quicktime and see your original stream through your server.

If you need anything else let me know!

Jan 11, 2013 at 12:41 PM

I am making a new class, LiveStream, because I want to stream from a live source like a camera.

I followed your code. I handled the OPTIONS request/response and the DESCRIBE request/response. I tested with VLC Player (Uri="rtsp://localhost:10000/live/videoOut_1").

But I have trouble with "ProcessRtspSetup();". What are the "Interleaves"? I want to call "ProcessSendRtspResponse()" for the PLAY request.

Is there a protocol like SDP in the DESCRIBE request/response? Please let me know the scenario of "ProcessRtspSetup()" and what comes next.

Please give me some advice... :)

Jan 11, 2013 at 3:10 PM
Edited Jan 11, 2013 at 3:28 PM

The live stream is already implemented in the RtspServer as RtspSourceStream.

There is an example of this in the Tests.Program.

The SDP is used in Describe. It re-writes the SDP from the live stream so it can be delivered from the RtspServer.

ProcessSendRtspResponse is a Server method which means you are using the server I assume.

Interleaves are used in Tcp/Udp for separating the counters and information for each sub-stream in the stream, e.g. Audio and Video. They also allow you to know what kind of media each session is using on each socket. Technically this allows an RtpClient to have Udp and Tcp at the same time, if supported by the server.
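When interleaving over TCP, each sub-stream's data is framed as defined in RFC 2326, Section 10.12: a '$' byte, a one-byte channel id (agreed during SETUP), and a two-byte big-endian length precede each RTP or RTCP message on the shared connection. A minimal reading sketch (names are illustrative, not this library's API):

```csharp
using System;

// Sketch: read one RTSP interleaved binary frame (RFC 2326, 10.12).
static class InterleavedFrameSketch
{
    public static (int Channel, byte[] Payload) Read(byte[] buffer)
    {
        if (buffer[0] != 0x24) throw new InvalidOperationException("Expected '$' (0x24).");
        int channel = buffer[1];                      // maps to a media sub-stream
        int length = (buffer[2] << 8) | buffer[3];    // big-endian payload length
        var payload = new byte[length];
        Array.Copy(buffer, 4, payload, 0, length);
        return (channel, payload);
    }
}
```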

The RtspServer is already set up to handle `Play`, so if you are deriving from the RtspServer you can just call `base.ProcessRtspRequest` and it will handle all supported requests (OPTIONS, DESCRIBE, SETUP, PLAY, GET_PARAMETER, SET_PARAMETER and TEARDOWN). The RtspServer should be standard enough that you do not have to derive from it or re-implement it.

What is the specific problem you are having with ProcessRtspSetup? Do you have an Rtsp Uri to the camera you are testing against that you can share, so I can try it locally?

Jan 12, 2013 at 1:32 AM
Edited Jan 12, 2013 at 1:34 AM

Actually I don't have any Rtsp Uri. The 'camera' is not a network camera. It is a device that generates images locally. The image exists only in memory (it doesn't provide rtsp/rtp).

So I was searching your code before ("The live stream is already implemented in the RtspServer as RtspSourceStream."), but I couldn't find it. I think your code re-transfers Rtp packets; it does not generate Rtp packets from a live source (like a byte[]). So I think I have to make a new LiveStream class for streaming a live source. The LiveStream class's constructor would have two arguments: a string name and an int sourceNo (like a descriptor in Linux). The string name will be assigned to m_Name (like "Alpha", "Beta" in your test code). The int sourceNo is used as the unique id of the live source. Of course, the sourceNo is in that device.

Could you tell me whether your library already implements this?
Of course, the "MMA" is very helpful for me. :)

Jan 12, 2013 at 1:50 AM
Edited Jan 12, 2013 at 2:28 AM

You are correct, the RtspServer does transfer the RtpPackets from a RtspSource, this means if your camera did provide a Rtsp Uri your work would be done...

Since it does not, you are correct again. E.g. I imagine you are using something that creates images and provides them to you as a System.Drawing.Image.

You can generate RtpPackets if you have a construct to create the associated RtpFrames for the payload type, this is known as Packetization.

See JpegFrame and the TestJpeg example, which will help you when working with System.Drawing.Images.

With that class (JpegFrame) for any System.Drawing.Image you have you can create a RtpFrame and subsequent RtpPackets and then send them to your sources.

Your derived class -> 'ImageSource' would use JpegFrame to create a RtpFrame for every new Image.

Then you would send the created RtpFrame's packets to the end user.

I think I could implement an example very quickly showing you how to do this; however, I think it's best for you to gain an understanding of the library by creating it yourself. When you are done, definitely submit it back for inclusion in the examples!

If you get stuck, check out the examples.

Please remember to provide feedback so I can get an idea of what you are trying to accomplish and whether the example is generic enough for broad use.

If you need any further help let me know!

Jan 12, 2013 at 4:25 AM
Edited Jan 12, 2013 at 5:00 AM

RtspSourceStream and RtspStreams work well, but other streams like 'ImageStreams' and 'LiveStream' do not work in RtspStream, because only RtspSourceStream is used in the RtspServer. So I fixed 'RtspServer' and made a 'LiveStream' before you made an 'ImageStream'.

Anyway, I think "ImageStreams" is exactly what I'm going to do. Actually I'm making a "LiveStream". Even though there is an error in ImageStreams (line 70: Rtp.RtpClient.Sender), I'll test your code.

Thanks :)

Jan 12, 2013 at 5:43 AM
Edited Jan 12, 2013 at 5:50 AM

Hey I noticed that to support what I was advising I needed to update the code and I have!

Let me know if the latest sources help you or if you find something you did differently!

The reason I went about it the way I did is that eventually other transports may be added besides Rtp/Avp, such as RTMP or Real or something else (especially if someone contributes it).

So to encapsulate that for now I have just inherited from RtpSourceStream and exposed that on the server.

Eventually I foresee a 'Transport' base class which provides all of those semantics; `RtpSourceStream` would then provide an `RtpTransport` implementation, deriving from `Transport`, to be used for Rtp/Avp sessions over Rtsp.

This is all very preliminary though and I will definitely wait for more feedback before progressing too much in those areas.

If you need anything else let me know and thank you for taking the time to provide feedback and ask questions!

Jan 14, 2013 at 12:01 AM
Edited Jan 14, 2013 at 12:06 AM

There is a problem: 'RtspServer' only handles 'RtspSourceStream'. So I fixed 'RtspServer' to work with either my 'liveStream' or 'RtspSourceStream'.
But that is not an efficient way of extending a 'source stream'. I think there are two ways to make a more generic rtsp server.
First, make 'rtspServer' generic enough for every stream. I chose this way, but I think it is a temporary fix for me.
Second, make a new server class like 'ImageRtspServer' or 'liveRtspServer'. That is not a generic way, but it will definitely work fine.
Maybe I have the wrong idea of your work, but I think it is not enough for 'ImageStreams' or my 'liveStream'.

And... I think more logging is needed for Debug mode. It is difficult to debug for a foolish beginner like me.

Jan 14, 2013 at 3:41 AM


Thanks for your input!

I realized the RtspServer only took RtspSourceStreams shortly after I had made the post :)

In the latest version of the code I have changed that so `RtpSourceStreams` are exposed on the RtspServer.

Eventually though it should be `SourceStream` and not `RtpSourceStream` exposed on the RtspServer.

The generic server idea is good, but it gets you into the same predicament as soon as you want to work with more than one stream type.

E.g. I would have Server&lt;ImageSource&gt;, Server&lt;RtspSource&gt;, etc., but then I have to have separate servers for each type unless I had a class like `Source` or `Stream` to constrain the Server to (which gets me back to my current design).

If you can give me more information on what you are doing and why the current ImageSourceStream doesn't work I will do my best to help!

Thanks again for your input!

Jan 14, 2013 at 4:57 AM
Edited Jan 14, 2013 at 8:26 AM

My goal is a live stream from a file that is generated by 'my device module'.

There are two issues for my job.

First, the generic server issue. I think you know what I mean.

Second, a race condition in file I/O. 'My device module' generates files frequently, and your Packetize() in 'ImageStream' reads my file and enqueues the packet to the RtpClient of 'ClientSession'. So my file is written by the device module and read in 'ClientSession' for sending. This is the problem: my file is used by different threads. Maybe I have the wrong idea of the scenario between ImageStream and ClientSession, but when I changed my file names with numbering, the program works (even though so many files are generated!). I will fix my file I/O module for my goal. If you are willing to fix this, I think generic file I/O is needed.

Of course, maybe I am wrong.

And another question: when I test my 'LiveStream', VLC Player prints the errors "[mjpeg @ 4fe3d20] Found EOI before any SOF, ignoring" and "[mjpeg @ 4fe3d20] No JPEG data found in image". What does it mean? Is it impossible to transfer JPEG images continuously?

Jan 14, 2013 at 3:41 PM
Edited Jan 14, 2013 at 10:51 PM

Send me an example of your live stream so I can take a look at how you are doing it!

The files should no longer be needed after calling the `Add` method on the ImageSourceStream, as the image is loaded in memory!

So it should be safe to use that image elsewhere...  in say a UI thread or wherever.

As for the last problem, use the latest code... EOI markers were previously taken out, which caused defects in VLC when viewing the stream. When I kept them in, the defects stopped, which I believe points to a bug in VLC, although it could still be a bug in my JPEG code.

There was also a bug in RtpFrame which caused some packets to be lost when multiple threads were calling Add for an RtpPacket when using ImageSourceStream. I have since corrected this by performing the check for Complete inside the lock, which should stop those errors.

The latest code plays for as long as you watch in VLC continuously :)

If you can explain more about the generic file IO I will definitely try to help! The file IO is generic in the sense that every ImageSourceStream can take any kind of file, even a proprietary one, if you just convert it to a System.Drawing.Image before adding it!

I have updated the source again to make the names of things more relevant, check it out and then at least re-ask the question so I know you are using the latest names if you can't send an example.

Thank you again!

Jan 15, 2013 at 12:38 AM
Edited Jan 15, 2013 at 12:43 AM

Also, the error message (from VLC) is the same. Since you mentioned it may be VLC's bug, I will ignore it. I checked your RtspServer code; it has changed generally (RtspSourceStream -> RtpStream).

Unfortunately, my goal has changed. The source was a JPEG image, but now it is an mkv or mp4 video file with a play time of a few seconds. So I have to make new classes, 'VideoStream' and 'MkvFrame' or 'H264Frame'. It is very challenging for me.

I think the generic file I/O issue is quite simple to state but difficult to implement. In your sample code 'Programs', 'JpegRtpImageSource' generates RtpPackets from files in a directory and sends the packets to the client (ImageStream reads the image file and passes it to the queue; finally ClientSession uses the queue). There are no problems there.
But my goal is a little different. The image source (it was JPEG, but it is a video file now) changes frequently. I think that will be a problem: some code reads the file while other code re-writes it (a race condition).

I'm very confused... This is what I have to do:

First, make VideoStream and VideoFrame.
Second, think about file I/O.
Third, forget JPEG.

Jan 15, 2013 at 6:08 AM

Dear Sir!

Do not be overwhelmed!

I initially stated my opinion on how confusing video was in the article at Code Project, and I also stated that I am not sure why certain decisions were made in the various standards.

I was also disappointed and overwhelmed in the beginning; however, I used my disapproval to force myself to find an agreeable way to engineer the solution!

Before I respond I will again single out Jpeg over Rtp. There are numerous problems with the current spec / standard.

1) Frame width and height are limited for an unknown reason. The format stores each dimension as a single byte holding the value divided by 8, which forces the width and height to max out at much less than what a JPEG can actually hold.

Take this simple math as the proof:

Let byteMaxValue = 255;

MaxWidth = 255 * 8 = 2040;

MaxHeight = 255 * 8 = 2040;

A 2048 x 4096 image cannot even be signaled, because 2048 / 8 = 256 and 4096 / 8 = 512, both of which overflow a byte. Why '8' was chosen I do not know. If they had used 256 as the divisor it would have raised this limit 32-fold, for instance:

MaxWidth = 2048 / 256 = 8;

MaxHeight = 4096 / 256 = 16;

both of which fit in a byte with plenty of room to spare.
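A minimal sketch of that byte encoding, following the RFC 2435 rule that the field holds the dimension divided by 8 (the class and helper names are made up for illustration):

```csharp
using System;

// Sketch of the RFC 2435 dimension encoding discussed above: each dimension
// is stored as one byte holding pixels / 8, capping it at 255 * 8 = 2040.
static class RtpJpegSketch
{
    public const int MaxDimension = byte.MaxValue * 8;   // 2040 pixels

    public static byte EncodeDimension(int pixels)
    {
        if (pixels % 8 != 0 || pixels > MaxDimension)
            throw new ArgumentOutOfRangeException(nameof(pixels),
                "Must be a multiple of 8 and at most 2040.");
        return (byte)(pixels / 8);
    }
}
```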

2) Comments are lost in the encoding and decoding process. This is significant because MPEG data does not lose its user-specific data during conversion, and in fact undergoes much less 'RTP' overhead or restriction because of this. You can currently include EXIF comments; however, since that is not standard it may or may not work depending on who implemented the library. I will admit the standard technique does cut down a significant amount of data, but whether the losses outweigh the gains has to be determined per application.

I am sorry to hear your goal has changed, I originally almost went down the same exact path as you! I had to implement all this and then I found out I would also need not only Video Decoders but also Video Encoders.

That being said I do not know anything about your specific project however I can tell you what I imagine and you can tell me if I am right or wrong.

You have a camera which generates individual H264 frames as NalUnits and sends them to you over the network.*

Have you successfully taken the NALs and put the data into a file which can be played with VLC?

If not, you have a camera which writes to a .MP4 file, and it grows and grows until you stop it.*

Or the file gets delivered from the camera when certain events happen, e.g. motion, and the file is delivered with a date and time name but never gets written to again.

It sounds like the first one, but whichever of these it is, Memory Mapped Files will help you.

Combined with a ManualResetEvent or a Semaphore or two, you should have NO problem with concurrent access.

They allow concurrent reading and writing and should be less of a headache than using the file system for all of the work.
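A hypothetical sketch of that pattern in .NET, using System.IO.MemoryMappedFiles with a reset event to coordinate a producer and consumer; the 1 MB capacity and the single placeholder byte stand in for real frame data:

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

// A producer thread writes a 'frame' into an anonymous memory-mapped region
// and signals an event; the consumer waits for the signal, then reads.
// No file on disk is involved, so there is no file-system race to worry about.
static class SharedFrameSketch
{
    public static byte ProduceAndConsume()
    {
        using var mmf = MemoryMappedFile.CreateNew(null, 1024 * 1024); // anonymous region
        using var frameReady = new ManualResetEventSlim(false);

        var producer = new Thread(() =>
        {
            using var view = mmf.CreateViewAccessor();
            view.Write(0, (byte)42);   // pretend this byte is an encoded frame
            frameReady.Set();          // tell the consumer a frame is available
        });
        producer.Start();

        frameReady.Wait();             // consumer blocks until the frame exists
        producer.Join();
        using var reader = mmf.CreateViewAccessor();
        return reader.ReadByte(0);
    }
}
```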

From the memory mapped file you can then create your RtpPackets and put them in an RtpFrame one by one, using the individual frames you are getting from your camera. After putting them in packets you can then write the data to a real file on the file system, or do whatever you need to do with it, and then handle the next chunk of data. There are existing libraries with tools for parsing the NALs and the MP4 files.

You can read the files using either of those and then all you need to implement is really a single class `H264Frame`.

You can implement this class by following the RFC.

Since you are not receiving, all you have to do is choose a method of 'Packetization', e.g. (None):


         This parameter signals the properties of an RTP payload type or
         the capabilities of a receiver implementation.  Only a single
         configuration point can be indicated; thus, when capabilities
         to support more than one packetization-mode are declared,
         multiple configuration points (RTP payload types) must be used.

         When the value of packetization-mode is equal to 0 or
         packetization-mode is not present, the single NAL mode MUST be
         used.  This mode is in use in standards using ITU-T
         Recommendation H.241 [3] (see Section 12.1).  When the value of
         packetization-mode is equal to 1, the non-interleaved mode MUST
         be used.  When the value of packetization-mode is equal to 2,
         the interleaved mode MUST be used.  The value of packetization-
         mode MUST be an integer in the range of 0 to 2, inclusive.


This can be accomplished very easily depending on how the data is being sent to you from the camera.

If it's in files it is a bit more complex to get the RtpPackets and frames; however, if you can get the frames in native format with the NALs then you are already 99% there: you just need to take the data, apply a small RtpH264Header to each packet, put that packet in the frame, send it, and you are done.

Keep in mind my library lets you send packet by packet also, so you can just keep an H264RtpHeader around like I do in the JpegFrame class; then when I `ProcessPackets()` I use that header and correct what I need to in it along the way.
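As a rough illustration of what "apply a small header to each packet" looks like when a NAL unit is larger than the packet budget, here is a sketch of FU-A fragmentation per RFC 6184, Section 5.8. The class name and the maxPayload budget are illustrative, not part of this library:

```csharp
using System;
using System.Collections.Generic;

// FU-A fragmentation sketch (RFC 6184, 5.8): a large NAL unit is split into
// fragments, each prefixed with an FU indicator (type 28, keeping the F/NRI
// bits) and an FU header carrying Start/End bits plus the original NAL type.
static class FuASketch
{
    public static List<byte[]> Fragment(byte[] nal, int maxPayload)
    {
        var packets = new List<byte[]>();
        byte fuIndicator = (byte)((nal[0] & 0xE0) | 28);  // keep F/NRI, type 28 = FU-A
        byte nalType = (byte)(nal[0] & 0x1F);
        int offset = 1;                                   // original NAL header not repeated
        while (offset < nal.Length)
        {
            int chunk = Math.Min(maxPayload - 2, nal.Length - offset);
            byte fuHeader = nalType;
            if (offset == 1) fuHeader |= 0x80;                   // S bit: first fragment
            if (offset + chunk == nal.Length) fuHeader |= 0x40;  // E bit: last fragment
            var packet = new byte[chunk + 2];
            packet[0] = fuIndicator;
            packet[1] = fuHeader;
            Array.Copy(nal, offset, packet, 2, chunk);
            packets.Add(packet);
            offset += chunk;
        }
        return packets;
    }
}
```

Each returned payload would then go out in its own RtpPacket with an incrementing sequence number, the marker bit set on the last fragment of a frame.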

I would like to work more on the video stuff, and I have, however I don't really have anything I can release as part of this library yet, nor will I probably for some time if there is no help or contribution.


In short, you only need to implement a single class (H264Frame) and have it operate as the RFC defines.

There are some Java examples of this out there.

There is also a C# example around.

I am not sure what else you need because you have not told me much and I am making a lot of assumptions already!

If there is anything else I can do please let me know and also please don't forget to contribute back anything you can! Even if it's not perfect it will still help a lot!

Jan 15, 2013 at 6:33 AM
Edited Jan 15, 2013 at 7:16 AM

Thank god!

Unfortunately, I can't tell you about my camera-like device, because it does not exist yet! My friend is still developing it.

But your assumptions are exactly the same as my idea. That device will generate a video file (he changed the format... memory > JPEG > file... forget the memory ㅜ.ㅜ), and it must play in VLC; that is his job. I just focus on the new class 'H264Frame'. So I will test my new class with a sample mp4 file.

Your comments are very helpful, and exactly what I want to do.

Thanks! :)


Jan 15, 2013 at 6:48 AM
Edited Jan 15, 2013 at 7:02 AM

No problem @ all!

Please do let me know if you implement H264Frame! I would definitely like to add it to the library. (Even if it is not complete and only works for limited packetization modes)

I will also say that getting the data from a file is harder than having him just give you the NALs, so if you can get those it would be much easier on you. He can provide them fairly easily if he is already doing the H264 encoding; he can allow you to get them either via a special port or using a special type of connection.

I would especially like to add an implementation for System.Drawing.Images however that is a MUCH MUCH MUCH larger undertaking.

If you need anything else let me know and maybe we can even work together on the H264 Frame :)


Jan 18, 2013 at 5:44 PM

hjkim is asking about things over on my project so I thought I'd pop in here and comment.  I had started fiddling with creating RTP frames for h.264 before you posted your project.  I'll dust off that code and see how much can be used with your library tomorrow while I'm doing the stream extractor sample.  There are 3 different modes and several different frame types for h.264 including essentially "bare" NAL Units with a small RTP header (NAL is what you get out of my demuxers) and multiple aggregated and chunked frame types.

I haven't even begun to look at the RTP framing for AAC, AC3 and MP3 audio.

Just so you know, there IS a profile for TS over RTP.  It's used for trunking TS streams with better error correction than TS over UDP provides for.  I'm not sure of the frame format but I expect it's pretty simple and far easier than demuxing TS and outputting all the programs and tracks as separate RTP streams.  RTSP and TS have a lot of overlap in purpose but TS has no protocol for packet re-transmission or seeking, pausing and trick play -- it's a one-way live only protocol.

Oh, and the reason why NAL is used with RTP instead of MP4...  

MP4 can contain multiple tracks of many different codec types.  Its timing information is designed more for random access, too, and with all the metadata it can contain it'd be difficult to live stream properly as is.  Segmented MP4 is a little better (it's used for Smooth Streaming and DASH for HTML5) but it's still overkill for a lightweight udp based stream protocol and it's useless for low latency applications.

NAL is the lowest level framing for h.264 (akin to MPEG2's Elementary Stream) and it's a bitstream format (not byte-oriented), so the packets are rather a pain to parse.  Usually only encoders parse into them enough to do more than separate them from one another.  NAL is what gets stuffed inside MP4, PES+TS and MKV containers.  The only modification that is usually done to it is swapping between start codes (hex 00 00 01) and 32-bit big-endian length prefixes.  TS uses the start codes and MP4 uses the length prefixes.  RTP uses neither but has its own length field for the same purpose.  There are also emulation prevention 0x03 bytes that are stuffed into the NAL unit itself if two consecutive 0x00s are followed by a byte from 0x00 through 0x03.  Those are removed in some containers and transports.
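The start-code to length-prefix swap described above can be sketched in a few lines; the helper name is invented for illustration, and this handles one NAL at a time (a real demuxer would also scan for the next start code and strip emulation prevention bytes):

```csharp
using System;

// Sketch: replace an Annex B start code (00 00 01 or 00 00 00 01) with the
// 32-bit big-endian length prefix used by MP4 to store NAL units.
static class NalPrefixSketch
{
    public static byte[] StartCodeToLengthPrefix(byte[] annexB)
    {
        int start = annexB[2] == 1 ? 3 : 4;   // accept 3- or 4-byte start codes
        int length = annexB.Length - start;
        var result = new byte[4 + length];
        result[0] = (byte)(length >> 24);
        result[1] = (byte)(length >> 16);
        result[2] = (byte)(length >> 8);
        result[3] = (byte)length;
        Array.Copy(annexB, start, result, 4, length);
        return result;
    }
}
```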

NAL has no timing or sequence information which is why it needs to get packed into some other format.  Within a GOP there is some sequence information for the frames since they are sent out of order on purpose but it's difficult to parse and not good enough for error detection and correction.  Also, the NAL units are variable length and some are exceedingly large so they have to get broken up into smaller packets for efficient transport over UDP.

Jan 25, 2013 at 1:58 AM
Edited Jan 25, 2013 at 3:02 AM

I will happily implement the TS packet for the Rtp profile, even if it's only basic and does not contain the ToImage method like JpegFrame; something like TransportStreamFrame. However, I think the result of the `Assemble` method in the base RtpFrame should be more than enough for the existing TSPacket and Nal classes to work as is, unless there is something I am missing from my quick reading.

The only thing I could find when looking very quickly was something for Elementary Streams and not Transport Streams specifically... which encompass transport streams, I do believe.

I was also able to find something which relates to the previous one, but apparently just the packet format including the header; the other RFC is what to do once you assemble the frame, I assume.

I imagine that once we deal with parsing the MP4 Atoms we will have a Packetized Elementary Mpeg Stream which contains NALs. This is fine for Rtp; it only matters if you want to do something more, like get the data out of the stream, and should not be a problem just putting them in so long as the PES packet is formed properly.

I see things usually implemented as ES_Descriptors or ES_Objects, but I am sure that PES packets contain those, and they are likely only relevant when doing something more with the stream, as stated above.

In his case you will need to make RtpFrames which contain RtpPackets from the (Packetized) Elementary Stream, which is a single (program) stream with no audio, stored in a single track of the container file.

This can be done by parsing the NALs up to the size of MaxPayloadSize minus the RFC3640 header bytes (on a boundary) and then sending them with the RtpClient.

You can already read the Atoms/Boxes easily you just have to know where to look in the BaseMediaFile for the audio or video samples, as JClary has told you.

You need to read the sps and pps from a place in the file so you can create an SDP with the correct attributes, then move on and find the samples for the audio or video track you are interested in.

After you have the sample you then make RtpPackets with the sample's data and the profile header and send them along, consuming NALs from the sample until you reach the max packet size, then incrementing the sequence number and continuing until all NALs are sent (similar to JpegFrame).

Audio will do the same thing, but only for the audio if constructed to do so. When reading samples you will need to know whether you are going to be reading two tracks or just one, and these will have to be synchronized in sending.

I think maybe in a day or two tops I could make the example to read the boxes, get the samples and create packets, but I am pressed for time. I hope to have a lot more time in the coming weeks to work on this (hopefully with jclary), but at the current time I am in the middle of moving so things are quite hectic.

If you need help from there, or if I stated something wrong, let me know. You may not even need to derive from RtpFrame; you could just use RtpPackets inside a class called 'H264FileSource' (derived from an 'ElementaryStreamSource') which accepts a stream with NALs and creates the packets similar to `ImageSource`, consuming and writing the NALs and setting the marker bits where appropriate, which I believe is at the end of samples.

MediaFileSource would have to be a very high level class which supports all known codecs and creates packets correctly based on the codec used in the file. It's best to start by implementing the various 'CodecFileSources' such as MPEG1, MPEG2, MPEG4, H264, AAC, etc., as stated, before creating a super-class which could encompass the logic necessary to do this for all known codecs.

Let me know if you have questions and hopefully I will get the examples (H264 first) done soon!

This has the added benefit of allowing the later addition of something like Mpeg4Frame, H264Frame etc while not having to change existing logic.

Thanks JClary (and hjkim) and let me know if I was incorrect in anything I said or if I misunderstood!

Jan 25, 2013 at 7:48 AM

TS can be encapsulated over RTP as outlined in RFC2250.  The MIME type for RTP is video/MP2T.  It's overkill but it's good if you need to avoid remuxing and just want the extra features RTP offers for retries and seeking.

You don't really need a TS library if you are going to do that.  If you want to demux and do multiple RTP streams, though, my TS library should do what's needed already.  TS comes interleaved nicely so even reading from a file you don't need multiple threads but you will have to do something to slow it down and send in real-time.
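The RFC 2250 payload rule mentioned above is simple: each RTP payload carries an integral number of 188-byte TS packets. A rough sketch, where the 1400-byte payload budget is an assumed value (giving 7 TS packets, 1316 bytes, per RTP packet):

```csharp
using System;
using System.Collections.Generic;

// Sketch: group a run of 188-byte TS packets into RTP payloads per RFC 2250.
static class TsOverRtpSketch
{
    const int TsPacketSize = 188;

    public static List<byte[]> Pack(byte[] tsData, int maxPayload = 1400)
    {
        int bytesPerRtp = (maxPayload / TsPacketSize) * TsPacketSize;  // 7 * 188 = 1316
        var payloads = new List<byte[]>();
        for (int offset = 0; offset < tsData.Length; offset += bytesPerRtp)
        {
            int count = Math.Min(bytesPerRtp, tsData.Length - offset);
            var payload = new byte[count];
            Array.Copy(tsData, offset, payload, 0, count);
            payloads.Add(payload);
        }
        return payloads;
    }
}
```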

NAL is just a sort of updated version of ES designed for h.264.  It has similar types of packets but with different layouts and identifiers.  h.264 was a little too different to be stuffed into ES, I think.  Both can be used to create a program stream and when that feature isn't used, they are both sometimes just referred to as ES despite being incompatible.  Both use the 00 00 01 start code followed by a 1 byte type identifier but those identifiers are totally different and likely overlap so you have to know which you've been given.

Both NAL and ES can be encapsulated in PES.  And PES is what's encapsulated in TS.  Don't ask me why they needed 3 layers of packets.  Adding a 4th layer for RTP is pushing ridiculous.

I'm working on a stream extractor for mp4.  Getting the iteration right is a bit tricky because the tracks can either be interleaved (like TS) or sequential and if they are interleaved you'll want to read them out in order without seeking the file or threading... If they aren't, though, the best you can do is open the file multiple times on multiple threads -- one for each track -- or try to interleave them on the fly with a lot of seeking.  Any way you look at it, it'll hit a hard drive pretty hard if the file isn't interleaved. 

Getting the chunks out is pretty easy aside from that...  working out the timecodes for each is a tad tricky, though, as you have to add up all the durations as you go and the timebase is essentially arbitrary.  Each track has a divisor defined for seconds so the divisor would be 10000000 for 100ns ticks which is fairly common but it could be anything and often is one of a handful of clock crystal rates commonly used in hardware encoders.

I'll have a method to get a TimeSpan based on 100ns ticks regardless of the divisor.  That way it'll be easy to stop and wait until more needs to be sent.
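The divisor-to-ticks conversion described above is a one-liner; here is a sketch (Python; function names are mine) of the 100ns-tick approach:

```python
from datetime import timedelta

def to_100ns_ticks(units: int, timescale: int) -> int:
    """Convert a track time value to 100ns ticks.

    timescale is the track's units-per-second divisor, e.g. 10_000_000
    means the units already are 100ns ticks, while 90_000 is a clock
    rate commonly seen from hardware encoders."""
    return units * 10_000_000 // timescale

def to_timespan(units: int, timescale: int) -> timedelta:
    # Python's timedelta counts microseconds; 1 tick = 0.1 microseconds
    return timedelta(microseconds=to_100ns_ticks(units, timescale) / 10)
```

With durations normalized this way the sender can simply sleep until the next sample's tick value is due.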


Jan 25, 2013 at 3:20 PM
Edited Jan 25, 2013 at 3:46 PM

Hey sorry for the confusion in my post and thanks for clearing it up.

I am in total agreement with the ridiculousness, not to mention the 8 or so bytes used for the profile in RTP which comes after the header in the payload...

One way to possibly get around the interleaved threading issue (or at least mitigate it) is through the use of the RtpClient. It sends packets in intervals of 1024 and would allow you to just add packets and not worry about when they go out. Rtcp will automatically cut packets which are too late and remove them from the List before sending, and it is already using a thread of its own for this purpose.

If you wanted to have the RtpClient automatically order the packets it can do that too just by using RtpFrame to hold the packets you find / create before sending / queuing them.

The method would still be good for generating the timestamp though (if the PTS / DTS is not used)

Thanks for pointing out the RFC about RTP TS. It still references elementary streams, though; that must be for MPEG1 and 2, because I think MPEG4 streams differ more from those than MPEG1 did from MPEG2. About the retries, I cannot find the text in the spec outlining how that is supposed to work, so as far as I can see it's just a normal Rtp stream, or so it seems.

But I guess you could have a MPEG2 TS stream with h264 Nals and that would work too; it would just be in the MPEG2 bitstream format rather than the MPEG4 format, which is a little weird, but I'm sure that is beyond the scope of this...

Anyway it sounds like we have all of the talent we need here to accomplish this (especially thanks to JClary)!

I will email him when I have more time (within a week or so) and then after that we should have something we can release in no time.

Thanks again for all the help!

Jan 25, 2013 at 7:44 PM

TS is an "MPEG-2 Part 1, Systems" Transport Stream.  One thing that confuses people is it's not specifically related to "MPEG-2 Part 2, Video" (commonly referred to as just MPEG2 or sometimes h.262), just as "MPEG-4 Part 2, Visual" (a.k.a. MPEG4, h.263, DivX/XviD) isn't related to "MPEG-4 Part 10, Advanced Video Coding" (a.k.a. AVC, h.264) or "MPEG-4 Part 14, MP4 File Format" (a.k.a. MP4, built on the Part 12 ISO Base Media File Format).  Don't get too hung up on the MPEG number.  At best it indicates the age of the spec, not what it is or what it works with.  MP4 can contain any codec as well, like TS; it's just better suited as a file format for random-access playback.

Sometimes I think it would be a lot less confusing just to use the ITU's h.261 - h.265 designations for the 5 MPEG video codecs but h.263 is technically a subset (baseline profile) of MPEG-4 Part 2 so that doesn't really work either.

TS works fine as is with all of the various audio and video formats in MPEG 1, 2 and 4, plus lots of other non-MPEG codecs, and it'll be sticking around for a good long time since it's the basis of ATSC broadcast digital television, DVB, 3GPP, Blu-ray and Apple's HTTP Live Streaming among other things.  It does roughly the same job as RTSP and SIP, just for one-way broadcast-only transports that have no concept of multiple streams, ports or even byte alignment.  TS is actually considerably more efficient than wrapping an elementary stream successively in RTP, UDP and IP packets and then putting that in PPP, Ethernet, DS or SONET frames.

It gets trunked over UDP or RTP but only to avoid repeated transmuxing when both the source and final destination need to be TS -- not for efficiency.  The good thing about trunking TS over RTP is you don't need to implement dozens of RTP profiles for the very long list of audio, video, text and image codecs it can contain.

PES (Packetized Elementary Stream) is really part of TS (I've never seen it used outside that context) and is also codec agnostic despite the seeming reference to an MPEG-2 Part 2, Video Elementary Stream.  As I've said before, the term "elementary stream" is often applied generically to any codec's raw minimal bitstream format -- NAL is an "elementary stream" in that respect.

Nothing special is done to NAL when wrapping it in PES and TS -- though the AVC/h.264 spec does define two ways to prefix a NALU.  PES+TS uses a different one from MP4 and MKV but it's not TS-specific and it's the same as what you see in .264 files.  TS is designed for bit-oriented (not byte-aligned) transports so it sort of makes some sense it uses the 00 00 01 prefixed start codes instead of lengths.  The 00 00 01 prefix is designed to make it easy to synchronize byte alignment in a bit-oriented stream like broadcast television.  The length prefix is much simpler if you already know your bitstream is byte aligned.
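The two prefixing schemes are easy to sketch side by side (Python; function names are mine):

```python
import struct

def to_length_prefixed(nals):
    """Prefix each NAL with a 4-byte big-endian length (MP4/MKV style)."""
    return b''.join(struct.pack('>I', len(n)) + n for n in nals)

def to_annexb(nals):
    """Prefix each NAL with the 00 00 01 start code (TS / .264 style)."""
    return b''.join(b'\x00\x00\x01' + n for n in nals)
```

Converting between the two containers' conventions is just a matter of swapping one prefix for the other, since the NAL bytes themselves are untouched.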

Jan 25, 2013 at 8:18 PM

Check out a doc I posted on my project, Taming Confusing Codec and Container Terminology, for a list of some of the more common codecs and containers with most of their various names.  I've never seen anyone try to break it down that way before so I decided to do it myself since it's so confusing.

Mar 27, 2013 at 7:42 AM
Hey guys,

RC 2 (Flawless) is approaching fast...

I was wondering if you guys had any updates regarding your projects so I possibly could include more samples?

Let me know ASAP!

Jan 7, 2014 at 10:48 PM
I have released a new version of the source code!

I will resume working on the other packetization and decoding when time permits, let me know if these changes caused any issue!
Jun 14, 2014 at 1:12 AM
Hey guys, I just wanted to let you know archiving will probably take place in the RtpDump format... the reader is all but done and so is the writer; I just need to finish up some things for the text-based format and the short format and it will be complete.

From there all that would be needed is a set of extension methods to turn a RtpDump file into a Mp4 file etc.

For existing ISO Compliant Files e.g. rtsp:// I will probably rely on jclary and his library.

I am also looking into adding a h264 Encoder / Decoder... If I do that I will also probably remove System.Drawing support all but completely and then rely on FluxJpeg and the H264 code as means for doing all that.

Besides that there is little to report, hope all else is well and I hope to hear from you guys soon!
Jun 21, 2014 at 11:13 PM
Changeset 108560 added support for the 'RECORD' request.

I am hoping to finish up support for playback of recorded streams in the next week or so. After that I will have a go at reading files for video on demand and performing packetization as necessary. Right now, the way I see it, a general read of the file is performed, the codec is determined, and based on the codec a type of source stream is selected as the packetizer. The stream data is then read and split into rtp packets to be sent out as required.
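That dispatch step might look something like this (Python sketch; the mapping and names are assumptions, not the library's API, and the RFCs listed are the commonly used RTP payload formats for each codec):

```python
# Hypothetical codec -> packetizer dispatch; names are illustrative only.
PACKETIZERS = {
    'avc1': 'H264FileSource',   # RFC 6184 (H.264)
    'mp4v': 'Mpeg4FileSource',  # RFC 3016 (MPEG-4 Visual)
    'mp4a': 'AacFileSource',    # RFC 3640 (MPEG-4 elementary streams / AAC)
}

def select_packetizer(codec_fourcc: str) -> str:
    """Map the codec FourCC found in the file to a source-stream type."""
    try:
        return PACKETIZERS[codec_fourcc]
    except KeyError:
        raise NotImplementedError("no packetizer for " + codec_fourcc)
```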

I am sure there will be some niche cases to attend to but that will be the gist of the process.

If you guys notice anything which needs attention let me know, as I am also in the process of working on Rtmp support so Rtmp -> Rtsp/Rtp can be performed, and also finishing up a few other things!
Jul 9, 2014 at 7:21 PM
Edited Jul 9, 2014 at 7:22 PM
Implementing Boxes on a level that jclary has is outside the scope of this project...

I however will be adding a general IsoReader and IsoWriter which work with Atom objects.

You will be able to do something like

new IsoReader(myfile).Select("stco","stcc","trak", "mdat", "mdia", "iods", "avcc")

new IsoWriter.Write(new Atom("mdat", new byte[]{....}))

Here is what my experimental tests are showing...
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class Program
{
    // Container atoms whose children should be descended into
    static readonly List<string> ParentNodes = new List<string>() { "moov", "trak", "mdia", "minf", "stbl" };

    static void Main(string[] args)
    {
        string[] tests = new[] {
@"H:\Video\Movies\Interview with the Vampire The Vampire Chronicles (1994) BRRip 720p x264 AAC-Ameet6233\Interview with the Vampire The Vampire Chronicles (1994) BRRip 720p x264 AAC-Ameet6233.mp4",
@"H:\Video\Movies\51 2011 BRRip 720p x264 -MgB\51 2011 BRRip 720p x264 -MgB.mp4",
@"H:\Video\Movies\War Of The Worlds 2005 BRRip 720p H264-3Li\War Of The Worlds 2005 3Li BRRip.mp4",
@"H:\Video\Movies\Forbidden Planet 1956 BluRay 1080p DTS x264-LoNeWoLf\Forbidden Planet 1956 BluRay 1080p DTS x264-LoNeWoLf.mkv",
@"H:\Video\Movies\11-11-11 2011 DVDRip XViD DTRG\11-11-11 2011 DVDRip XViD DTRG.avi"
        };

        foreach (string test in tests)
        {
            using (BufferedStream bs = new BufferedStream(new FileStream(test, FileMode.Open), 4096))
            {
                Console.WriteLine("Total:" + bs.Length);
                PrintAtoms(bs);
            }
        }
    }

    static void PrintAtoms(Stream bs, bool readChildren = true, int depth = 0)
    {
        byte[] buffer = new byte[8];

        while (bs.Position < bs.Length)
        {
            // Read the 32-bit big-endian length
            if (bs.Read(buffer, 0, 4) < 4) break;
            long length = ((long)buffer[0] << 24) | ((long)buffer[1] << 16) | ((long)buffer[2] << 8) | buffer[3];

            // Read 32 bits to obtain the 4 cc
            if (bs.Read(buffer, 0, 4) < 4) break;

            // Decode UTF8 FourCC
            string fourcc = Encoding.UTF8.GetString(buffer, 0, 4);

            // length == 1 means a 64-bit extended length follows the FourCC
            long headerSize = 8;
            if (length == 1)
            {
                if (bs.Read(buffer, 0, 8) < 8) break;
                length = 0;
                for (int i = 0; i < 8; i++) length = (length << 8) | buffer[i];
                headerSize = 16;
            }
            // length == 0 means the atom extends to the end of the file
            else if (length == 0) length = bs.Length - bs.Position + headerSize;

            // Write Atom info
            Console.WriteLine(new String('\t', depth) + "Atom=>" + fourcc + " @" + bs.Position + " Length=>" + length);

            // Read children of container atoms (moov, trak, etc.), otherwise seek past the payload
            if (readChildren && ParentNodes.Contains(fourcc)) PrintAtoms(bs, true, depth + 1);
            else if (length > headerSize) bs.Seek(length - headerSize, SeekOrigin.Current);
        }
    }
}
Essentially, if that is really all that is needed to extract the Format / Codec and Duration information, then the only thing left would be to extract the NALs and put them in RtpPackets.

For each Track a Thread would be created which seeks to the offsets defined in the offset table, reads the samples for each chunk, and reads the NALs so the packetization can be performed.

The H264 SPS and PPS come from the AVCC box which is inside the AVC1 box; once read they would also need to be saved for inclusion in the SDP file. For other codecs such as MPEG 4 you may need the IODS box etc.

Once the SDP is created the threads start reading and packetization is performed.
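Building the SDP entry from the saved SPS/PPS can be sketched as follows (Python; the function name and payload type 96 are mine, but the sprop-parameter-sets / profile-level-id layout is per RFC 6184):

```python
import base64

def h264_fmtp(sps: bytes, pps: bytes, payload_type: int = 96) -> str:
    """Build an SDP fmtp line carrying SPS/PPS as sprop-parameter-sets
    (RFC 6184): each parameter set base64 encoded, comma separated."""
    sprop = ','.join(base64.b64encode(p).decode('ascii') for p in (sps, pps))
    # profile-level-id is the 3 bytes after the SPS NAL header:
    # profile_idc, constraint flags, level_idc
    profile = sps[1:4].hex()
    return ("a=fmtp:%d packetization-mode=1;profile-level-id=%s;"
            "sprop-parameter-sets=%s" % (payload_type, profile, sprop))
```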

Essentially ripping a stream would be as easy as then looking at the codec and doing something like this:
public class RtspIsoArchiver {
    var writer = new IsoWriter(someFile);
    writer.WriteAtoms(new Atom("ftype", ...).Concat(new Atom("mdat", ...)).Concat(new Atom("moov", ...)));
    while (Listening) writer.Write(nalData);
    writer.Close(true); // true updates table offset and length information.
}
Then you would technically be able to save the streams to an Iso .mp4 file and play them back to clients using the IsoReader / IsoArchivedStream.

I would then also probably end up providing an IsoArchiver to handle the 'Record' command instead of 'rtpdump', since the Iso files have boxes for MJPEG and basically every type of codec available.

There would still need to be separate classes for mpg, mkv and avi, however; but if I went that route I would probably only implement MKV and AVI, thus cutting down some of the work I would need to do.

Either way I will update you guys when I have more!
Sep 17, 2014 at 3:36 AM
Changeset 109400 adds support for H264 Packetization and Depacketization with the RFC6184Frame class. There is also an experimental RFC6184Stream which provides intraframe-based encoding; decoding is soon to follow.

For now you can definitely take a h264 stream and save it to a .264 file with the library in its current state; then VLC, or any other player which supports it, can play it back!
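For reference, the depacketization a .264 writer needs can be sketched like this (Python; a simplified take on RFC 6184 handling only single NAL unit packets and FU-A fragments, ignoring STAP/MTAP aggregation and packet reordering, and not the RFC6184Frame implementation itself):

```python
def rfc6184_to_annexb(payloads):
    """Rebuild Annex B NAL units from RTP H.264 payloads (RFC 6184).

    Handles single NAL unit packets (types 1-23) and FU-A fragments
    (type 28); assumes payloads arrive in order."""
    out = bytearray()
    fragment = bytearray()
    for p in payloads:
        nal_type = p[0] & 0x1F
        if 1 <= nal_type <= 23:
            # Single NAL unit packet: the payload is the NAL itself
            out += b'\x00\x00\x01' + p
        elif nal_type == 28:
            # FU-A: p[0] is the FU indicator, p[1] the FU header
            fu_header = p[1]
            if fu_header & 0x80:
                # Start bit: rebuild the original NAL header from the
                # indicator's F/NRI bits and the FU header's type bits
                fragment = bytearray([(p[0] & 0xE0) | (fu_header & 0x1F)])
            fragment += p[2:]
            if fu_header & 0x40:
                # End bit: emit the reassembled NAL
                out += b'\x00\x00\x01' + bytes(fragment)
    return bytes(out)
```

Writing the returned bytes straight to a file yields the .264 elementary stream VLC can open.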

A more developed Encoder and Decoder will also follow in the future which will support all profiles.

There are also now classes for BaseMedia, Riff and Matroska file streams, all streams share a common class Container.ContainerElement which can be used with all file sources. (And eventually ASF)

Work is underway to support writing data to those same containers, in the future it will be possible to convert one type of file to another when I can get that API figured out.

I am also thinking about HttpEndPoint support for on-demand viewing of those containers, which would hint and fragment them as required.

Anyway it should only be another week or two before playback from files is supported in the RtspServer using the new classes, then off to decoding and then file writing API!

If you guys have any questions or comments please feel free to chime in!
Marked as answer by juliusfriedman on 9/17/2014 at 6:55 AM
Sep 18, 2014 at 5:25 PM
I think you mean MediaElement, and that should be fairly easy with the current setup; you just need to depacketize the MP4, which actually has no scheme for doing so.

I just added support for reading files in 109421. The support should achieve around the same type of thing you can do with jclary's library, but without the specific information for each box; developers will have to extract the information they need when they encounter a box.

When container writing support is finalized you will then be able to mux the two streams into a container, but for now I am just trying to get the API down to where it's easy enough. Currently IMediaContainer is good, but I need to further designate the difference between a Reader and a Writer, though I guess every MediaContainer can be written to so long as the underlying stream is writable...

Anyway I will keep you updated,

Nov 21, 2014 at 5:34 AM
In the latest release I have support for TransportStreams, ProgramStreams and PacketizedElementaryStreams.

I would like some feedback on the API if possible; changeset 110192 adds quite a bit and I am trying to make sure I am getting everything implemented while staying efficient.

This is my next major Todo

Let me know what you think!
Dec 5, 2014 at 12:07 AM
Support is very good at the moment, ATSC, M2TS, VOB, BLURAY Etc all seem to be working without any issue.

Hard Drive IO is reduced by only reading data once if possible and only for the nodes required, e.g. nodes in the way are skipped over.

TransportStream and ProgramStream actually cache their nodes as they find them and parse them for relevant properties, this is basically done.

I just need to extract the correct packets to retrieve video information and duration and I will be moving on to demuxing very shortly.

Which reminds me...I also have to handle fragmented Base Media Files (reading and writing) :)

I suppose you guys are also busy but if you notice anything you need let me know!
Marked as answer by juliusfriedman on 12/4/2014 at 4:07 PM