Apex-core build/release steps improvements proposal

chinmay
Administrator
Hello Community,

I want to propose the following improvements to the apex-core build and
related steps:

1. Most (probably all) open source projects provide a binary release
package of the software, not just a source release package. Currently
we have only a source package. Luckily, there are a few places (outside of
Apache Apex) where binary packages of Apex have been created for different
purposes: https://github.com/atrato/apex-cli-package and
https://github.com/apache/bigtop.

The proposal here is to generate this binary release package as part of the
apex-core build process.


2. Currently, the Docker image for Apex is built from one of my personal
repositories (https://github.com/chinmaykolhatkar/docker-pool).
While I don't mind hosting the content (Dockerfile etc.) in my
repository, I believe it makes sense to host it in the apex-core repository.
This way, we could use Docker's GitHub build triggers to build the
Docker image from release branches.


3. Currently, the Docker build uses Hadoop and Apex packages from the
Bigtop deb repository and CI (see
https://github.com/chinmaykolhatkar/docker-pool/blob/master/apex/ubuntu/app/setup.sh
for more details).
While using the Hadoop packages from the Bigtop repository is fine, we also
have to rely on a Bigtop contribution to update the Apex component and then
on a Bigtop CI build to get the apex.deb package. In effect, our Docker image
generation is blocked until the Bigtop source is updated to produce an
updated Apex deb.
As we technically don't need Bigtop to generate the Apex binary, the
proposal here is to generate the binary package during the build process
(point 1) and use it in the Docker image build instead of the ready-made
deb package from Bigtop CI.
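A rough sketch of how points 1 to 3 could fit together. It assumes a small Python build helper, and everything named in it is hypothetical rather than existing apex-core tooling: the function names, the `release-X.Y` branch convention, the `apex-<version>` tarball layout, the `ubuntu:16.04` base image and the `/opt` install path. It packs the staged build output into a tarball, derives an image tag from a release branch, and renders a Dockerfile that adds that tarball instead of installing apex.deb from Bigtop CI. In practice apex-core would more likely do the packaging in its Maven build; this only illustrates the flow.

```python
import re
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def build_binary_package(stage_dir: Path, out_file: Path, version: str) -> Path:
    """Pack a staged Apex install tree into a gzipped tarball.

    Hypothetical layout: stage_dir holds whatever the build staged
    (e.g. bin/, lib/, conf/); every file is stored under apex-<version>/.
    """
    root = f"apex-{version}"
    with tarfile.open(out_file, "w:gz") as tar:
        for path in sorted(stage_dir.rglob("*")):
            if path.is_file():  # directories are implied by the file paths
                rel = path.relative_to(stage_dir).as_posix()
                tar.add(path, arcname=f"{root}/{rel}")
    return out_file


def image_tag_for_branch(branch: str) -> Optional[str]:
    """Map an assumed branch name like 'release-3.7' to the image tag '3.7'.

    Returns None for branches that should not trigger an image build.
    """
    match = re.fullmatch(r"release-(\d+\.\d+)", branch)
    return match.group(1) if match else None


def render_dockerfile(package_file: str, version: str,
                      base_image: str = "ubuntu:16.04") -> str:
    """Render a Dockerfile that uses the locally built tarball, not apex.deb."""
    lines = [
        f"FROM {base_image}",
        "# Hadoop can still come from the Bigtop deb repository (not shown).",
        # ADD unpacks a local tar.gz, giving /opt/apex-<version>/
        f"ADD {package_file} /opt/",
        f"ENV APEX_HOME=/opt/apex-{version}",
        "ENV PATH=$APEX_HOME/bin:$PATH",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Version 3.7.0 and the file names below are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        stage = Path(tmp, "stage")
        (stage / "bin").mkdir(parents=True)
        (stage / "bin" / "apex").write_text("#!/bin/sh\n")
        pkg = build_binary_package(stage, Path(tmp, "apex-3.7.0-bin.tar.gz"), "3.7.0")
        print(pkg.name, image_tag_for_branch("release-3.7"))
        print(render_dockerfile(pkg.name, "3.7.0"))
```

Keeping the packaging step separate from the Dockerfile means the same tarball could be published as the binary release (point 1) and consumed by the image build (point 3).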


I understand that there are multiple items mentioned in a single mail,
but they seem related, hence the single mail.

Please let me know your opinion on the above items.

Thanks,
Chinmay.

Re: Apex-core build/release steps improvements proposal

Thomas Weise
Administrator
+1 to all of this

There are existing JIRAs that you can assign / add to:

https://issues.apache.org/jira/browse/APEXCORE-727

Thanks!



(In reply to Chinmay Kolhatkar, Thu, May 3, 2018 at 4:26 AM)

Re: Apex-core build/release steps improvements proposal

Vlad Rozov
+1 to all 3.

Thank you,

Vlad

(In reply to Thomas Weise, 5/3/18 07:03)


Re: Apex-core build/release steps improvements proposal

Ananth G
+1 to all 3, considering we are trying to centralise the code.

Should 2 eventually be redone as part of
https://issues.apache.org/jira/browse/APEXCORE-796? Either way, the design
for this needs to be seen in the broader context of the points
mentioned below.

Regarding 3, I agree that the current image is tightly coupled to Bigtop.
While making it independent of Bigtop is a good first step, I believe we
might need to revisit how we would like to implement containerisation
for Apex in the first place.


There are multiple design items to be resolved for Apex containerisation:

1. The Apex community needs to evaluate both Hadoop-based and Hadoop-free
architectures. For Hadoop-free architectures, we need to find DFS
alternatives as well as resource manager alternatives. Tickets like
https://issues.apache.org/jira/browse/APEXCORE-724 will, I believe, bring
out this design issue in more detail.

2. Consider how Apex applications will be built as part of a build
process that results in a Docker image of the Apex application (which would
contain the application code, Malhar operators, etc.).

3. Consider how we would like to make use of the Docker support in Hadoop 3:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html
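To make point 3 a little more concrete: with the Hadoop 3 Docker runtime, a YARN container is asked to run inside a Docker image through its launch environment, by setting `YARN_CONTAINER_RUNTIME_TYPE=docker` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE`, provided the NodeManagers have the Docker runtime enabled (for example through `yarn.nodemanager.runtime.linux.allowed-runtimes`). The sketch below only builds that environment map in Python. In Apex it would presumably be set from Java on the YARN `ContainerLaunchContext`; the helper name and the image name are hypothetical.

```python
from typing import Dict, Optional

# Launch-environment variables read by the Hadoop 3 YARN Docker runtime.
RUNTIME_TYPE = "YARN_CONTAINER_RUNTIME_TYPE"
DOCKER_IMAGE = "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE"


def docker_container_env(image: str,
                         base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a launch environment that asks YARN to run the container in `image`.

    base_env holds any other variables the container needs; it is copied,
    not modified.
    """
    if not image:
        raise ValueError("a Docker image name is required")
    env = dict(base_env or {})
    env[RUNTIME_TYPE] = "docker"
    env[DOCKER_IMAGE] = image
    return env


if __name__ == "__main__":
    # 'apex-sandbox:3.7' is a hypothetical image name.
    print(docker_container_env("apex-sandbox:3.7", {"JAVA_HOME": "/opt/java"}))
```

Whether an application image or a generic Apex image is passed here is exactly the design question raised in point 2.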




Just curious about the Docker implementation: is the end goal of the Docker
image to provide a sandbox for

1. evaluating Apex,
2. making the installable Apex binary available as an image, or
3. aligning Apex applications with a Docker build process (e.g. Python
libraries installed on the image as part of the application code)?

The reason I raise these questions is that it does not make much sense to
bundle a cluster in a box with any distribution (dockerizing a Hadoop
cluster is non-trivial, and so far I have not heard of good success stories
with this approach that could be carried over to production). A Docker image
that embeds a Hadoop binary is thus only useful for evaluation, where
everything is contained in the same image, and nothing more.

My suspicion is that we would revisit this approach anyway if our
goals are 2 and/or 3 as well. Perhaps we will address these questions as
part of https://issues.apache.org/jira/browse/APEXCORE-724 and
https://issues.apache.org/jira/browse/APEXCORE-796.

Regards,
Ananth

(In reply to Vlad Rozov, Fri, May 4, 2018 at 10:31 AM)

Re: Apex-core build/release steps improvements proposal

Thomas Weise
Administrator
Thanks for bringing that up. Docker in this context is only a convenient
way to create a sandbox. Other work would need to happen to package
applications as Docker images and deploy them on platforms such as
Kubernetes.

Thanks


(In reply to Ananth G, Thu, May 3, 2018 at 8:12 PM)